npcompare package documentation¶

class
npcompare.
Compare
(f1, f2, psamples1, psamples2, lower=0, upper=1, weights1=None, weights2=None, metric=None)¶ Bases:
object
Compare two samples.
Parameters: f1 : function
Density function for the first population.
f2 : function
Density function for the second population.
psamples1 : array like
Must be either a one dimensional numpy.array (in which case, each element will be passed one at a time to f1) or a two dimensional numpy.array (in which case, each row will be passed one at a time to f1).
psamples2 : array like
Analog to psamples1.
weights1 : array like
Give weights to posterior psamples1. Set to None if each posterior sample has the same weight (the usual case for MCMC methods).
weights2 : array like
Analog to weights2.
metric : function
Metric function to be used.
Defaults to:
def metric(f1, f2, param1, param2): return quad(lambda x: (f1(x, param1)  f2(x, param2))**2, a, b)[0]
Can be set to a userdefined function of the same signature.
lower : float
Lower integration limit passed to default metric function.
upper : float
Upper integration limit passed to default metric function.

classmethod
frombfs
(bfsobj1, bfsobj2, transformation=True, metric=None)¶ Create a class Compare from two EstimateBFS objects.
Parameters: bfsobj1 : EstimateBFS object
bfsobj2 : EstimateBFS object
transformation :
If set to False, metric will be evaluated in [0, 1] without transformation (EstimateBFS class documentation). Otherwise, transformation will be applied.
Ignored if bfsobj1 has no transformation (which implies sample space of [0, 1]).
metric : function
Metric function to be used, defaults to class’s default.
Returns: New instance of class Compare

plot
(ax=None, **kwargs)¶ Plot empirical CDF of metric samples.
Parameters: ax : matplotlib axes
Axis to plot, defaults to axes of a new figure.
show : bool
If True, calls matplotlib.pyplot plt.show() at end.
**kwargs :
Aditional named arguments passed to matplotlib.axes.Axes.step.
Returns: matplotlib.axes.Axes object

sampleposterior
(niter=1000, refresh=100)¶ Compare two samples.
Parameters: niter : integer
Number of simulations to be draw.
refresh : integer
Interval of samples to print the amount of samples obtained so far. Set to 0 to disable printing.
Returns: None

classmethod

class
npcompare.
EstimateBFS
(obs=None, nmaxcomp=10, hpp=1, hpgamma=0, transformation=None, mixture=True, **kwargs)¶ Bases:
object
Estimate univariate density using Bayesian Fourier Series. This method only works with data the lives in [0, 1], however, the class implements methods to automatically transform user inputted data to [0, 1]. See parameter transform below.
Parameters: obs : array like
List/tuple or numpy.array 1D array of observations. If unset, the constructor will not call method fit, and you will need to call it manually later to configure your model data (this can be usefull for using it with scikitlearns GridSearchCV). nmaxcomp : maximum number of components of the Fourier series expansion.
nmaxcomp : integer
Maximum number of components of the Fourier series expansion.
hpp : float
Hyperparameter p (defaults to “conservative” value of 1).
hpgamma : float
Hyperparameter p (defaults to “conservative” value of 0).
transform :
Transformation function to use. Can be either:
String “logit” to use logit transformation. Usefull if the sample space is the real line. However the obs must actually assume only low values like, for example, between [5, 5], this is due to the fact that inverse logit starts getting (computationally) very close to 1 (or 0) after 10 (or 10).
Dictionary {“transf”: “fixed”, “vmin”: vmin, “vmax”: vmax} for fixed value transformation, where vmin and vmax are the minimun and maximum values of the sample space (for obs), respectivelly.
A user defined transformation function with a dict with elements transf, itransf and laditransf, where transf is a function that transforms [0, 1] to sample space, itransf is its inverse, and laditransf is the log absolute derivative of itransf. These 3 functions must accept and return numpy 1D arrays.
mixture : bool
If True, will work with a mixture of Fourier series models with up to nmaxcomponent components (that is, a prior will be set on the number of components of the Fourier Series). If False, will work with a single Fourier series model with exactly nmaxcomponent components.
**kwargs:
Additional keyword arguments passed to method fit.

compilestanmodel
()¶ Compile Stan model necessary for method sample. This method is called automatically by obj.sample().
Returns: self

evalgrid
(gridsize=1000)¶ Calculates posterior estimated mean density value at grid points so they can be used later by method plot or directly by the user.
Parameters: gridsize : size of the grid. Returns: self Notes
The results are store in an instance variable called egresults which is a dictionary with the following objects:
If object was initialized with mixture=True, you will have:
Log densities for full mixture of components stored as logdensitymixmean.
Densities for full mixture of components stored as densitymixmean.
Log densities for individual components stored as logdensityindivmean.
Densities for individual components stored as densityindivmean.
If object was initialized with mixture=False, you will have:
Log densities stored as logdensitymean.
Densities stored as densitymean.

evaluate
(points, logdensity=False, transformed=True, component=None)¶ Predict posterior density for estimated model.
Parameters: points : array like or scalar
Scalar or 1D numpy array of points where density will be evaluated.
transformed : bool
True (default) if points live in the sample space of your inputted data. False if your points live in the sample space of Bayesian Fourier Series sample space ([0, 1]).
component : NoneType or integer
Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.
logdensity : bool
If True, will return the logdensity instead of the density.
Returns: Numpy 1D array with predicted mean density.

fit
(obs, nmaxcomp=None, hpp=None, hpgamma=None, transformation=None, niter=5000, **kwargs)¶ Configure object model data and automatically calls method sampleposterior (to obtain MCMC samples from the posterior).
Parameters: obs : array like
List/tuple or numpy.array 1D array of observations.
nmaxcomp : integer
Maximum number of components of the Fourier series expansion. Defaults to the one already set in object construction.
hpp : float
Hyperparameter p (defaults to “conservative” value of 1). Defaults to the one already set in object construction.
hpgamma : float
Hyperparameter p (defaults to “conservative” value of 0). Defaults to the one already set in object construction.
transform :
Transformation function to use. Can be either:
String “logit” to use logit transformation. Usefull if the sample space is the real line. However the obs must actually assume only low values like, for example, between [5, 5], this is due to the fact that inverse logit starts getting (computationally) very close to 1 (or 0) after 10 (or 10).
Dictionary {“transf”: “fixed”, “vmin”: vmin, “vmax”: vmax} for fixed value transformation, where vmin and vmax are the minimun and maximum values of the sample space (for obs), respectivelly.
A user defined transformation function with a dict with elements transf, itransf and laditransf, where transf is a function that transforms [0, 1] to sample space, itransf is its inverse, and laditransf is the log absolute derivative of itransf. These 3 functions must accept and return numpy 1D arrays.
Defaults to the one already set in object construction.
niter : integer
Number of iterations to sample. If set to 0, then will not call method sample.
**kwargs:
Additional keyword arguments passed to method sample.
Returns: self

plot
(ax=None, component=None, **kwargs)¶ Plot samples.
Parameters: ax : matplotlib axes
Axis to plot, defaults to axes of a new figure.
component : NoneType or integer
Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.
**kwargs :
Aditional named arguments passed to matplotlib.axes.Axes.step.
Returns: matplotlib.axes.Axes object

sampleposterior
(niter=5000, nchains=4, njobs=1, tolrhat=0.02, **kwargs)¶ Samples from posterior.
Parameters: niter :
Number of simulations to be draw.
nchains :
Number of MCMC chains.
njobs :
Number of CPUs to be used in parallel. If 1 (default), all CPUs will be used.
tolrhat :
Maximum tolerable distance an Rhat of any sampled parameter can have from 1 (we will resample model approximatedly 10% more iterations until this convergence criteria is met).
**kwargs :
Additional keyword arguments passed to pystan (e.g.: refresh parameter to configure sampler printing status).
Returns: self

score
(points, transformed=True, component=None)¶ Return sum of average logdensity of points.
This is equivalent to calling:
obj.evaluate(points, logdensity=True, transformed, component).sum()[()]
Parameters: points : array like or scalar
Scalar or 1D numpy array of points where density will be evaluated.
transformed : bool
True (default) if points live in the sample space of your inputted data. False if your points live in the sample space of Bayesian Fourier Series sample space ([0, 1]).
component : NoneType or integer
Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.
Returns: Numpy float with sum of predicted mean log densities.


class
npcompare.
EstimateLindleyBFS
(obs0=None, obs1=None, nmaxcomp=10, hpp=1, hpgamma=0, hplindley=0.5, transformation=None, mixture=True, **kwargs)¶ Bases:
object
Estimate univariate density using Bayesian Fourier Series. This method only works with data the lives in [0, 1], however, the class implements methods to automatically transform user inputted data to [0, 1]. See parameter transform below.
Parameters: obs0 : array like
List/tuple or numpy.array 1D array of observations for model 1.
obs1 : array like
List/tuple or numpy.array 1D array of observations for model 2.
hplindley : float
A priori probability that obs0 and obs1 came from different populations.
**kwargs:
Additional keyword arguments passed to method fit.
Notes
For other parameters, see documentation for EstimateBFS.

compilestanmodel
()¶ Compile Stan model necessary for method sample. This method is called automatically by obj.sample().
Returns: self

evalgrid
(gridsize=1000)¶ Call method evalgrid for each EstimateBFS object that the self holds, that is, this is equivalent to calling:
obj.bfs0.evalgrid(gridsize) obj.bfs1.evalgrid(gridsize) obj.bfsconcat.evalgrid(gridsize)
Parameters: gridsize : size of the grid. Returns: self

fit
(obs0, obs1, nmaxcomp=None, hpp=None, hpgamma=None, hplindley=None, transformation=None, niter=5000, **kwargs)¶ Configure object model data and automatically calls method sampleposterior (to obtain MCMC samples from the posterior).
Parameters: obs0 : array like
List/tuple or numpy.array 1D array of observations for model 1.
obs1 : array like
List/tuple or numpy.array 1D array of observations for model 2.
hplindley : float
A priori probability that obs0 and obs1 came from different populations.
**kwargs:
Additional keyword arguments passed to method sample.
Returns: self
Notes
For other parameters, see documentation for method fit of EstimateBFS.

sampleposterior
(niter=5000, nchains=4, njobs=1, tolrhat=0.02, **kwargs)¶ Samples from posterior.
Parameters: niter :
Number of simulations to be draw.
nchains :
Number of MCMC chains.
njobs :
Number of CPUs to be used in parallel. If 1 (default), all CPUs will be used.
tolrhat :
Maximum tolerable distance an Rhat of any sampled parameter can have from 1 (we will resample model approximatedly 10% more iterations until this convergence criteria is met).
**kwargs :
Additional keyword arguments passed to pystan (e.g.: refresh parameter to configure sampler printing status).
Returns: self


npcompare.
fourierseries
(x, ncomponents)¶ Calculate Fourier Series Expansion.
Parameters: x : 1D numpy.array, list or tuple of numbers to calculate
fourier series expansion
ncomponents : int
number of components of the series
Returns: 2D numpy.array where each line is the Fourier series expansion of
each component of x.