npcompare package documentation

class npcompare.Compare(f1, f2, psamples1, psamples2, lower=0, upper=1, weights1=None, weights2=None, metric=None)

Bases: object

Compare two samples.

Parameters:

f1 : function

Density function for the first population.

f2 : function

Density function for the second population.

psamples1 : array like

Must be either a one dimensional numpy.array (in which case, each element will be passed one at a time to f1) or a two dimensional numpy.array (in which case, each row will be passed one at a time to f1).

psamples2 : array like

Analog to psamples1.

weights1 : array like

Give weights to posterior psamples1. Set to None if each posterior sample has the same weight (the usual case for MCMC methods).

weights2 : array like

Analog to weights2.

metric : function

Metric function to be used.

Defaults to:

def metric(f1, f2, param1, param2): return quad(lambda x:
(f1(x, param1) - f2(x, param2))**2, a, b)[0]

Can be set to a user-defined function of the same signature.

lower : float

Lower integration limit passed to default metric function.

upper : float

Upper integration limit passed to default metric function.

classmethod frombfs(bfsobj1, bfsobj2, transformation=True, metric=None)

Create a class Compare from two EstimateBFS objects.

Parameters:

bfsobj1 : EstimateBFS object

bfsobj2 : EstimateBFS object

transformation :

If set to False, metric will be evaluated in [0, 1] without transformation (EstimateBFS class documentation). Otherwise, transformation will be applied.

Ignored if bfsobj1 has no transformation (which implies sample space of [0, 1]).

metric : function

Metric function to be used, defaults to class’s default.

Returns:

New instance of class Compare

plot(ax=None, **kwargs)

Plot empirical CDF of metric samples.

Parameters:

ax : matplotlib axes

Axis to plot, defaults to axes of a new figure.

show : bool

If True, calls matplotlib.pyplot plt.show() at end.

**kwargs :

Aditional named arguments passed to matplotlib.axes.Axes.step.

Returns:

matplotlib.axes.Axes object

sampleposterior(niter=1000, refresh=100)

Compare two samples.

Parameters:

niter : integer

Number of simulations to be draw.

refresh : integer

Interval of samples to print the amount of samples obtained so far. Set to 0 to disable printing.

Returns:

self

class npcompare.EstimateBFS(obs=None, nmaxcomp=10, hpp=1, hpgamma=0, transformation=None, mixture=True, **kwargs)

Bases: object

Estimate univariate density using Bayesian Fourier Series. This method only works with data the lives in [0, 1], however, the class implements methods to automatically transform user inputted data to [0, 1]. See parameter transform below.

Parameters:

obs : array like

List/tuple or numpy.array 1D array of observations. If unset, the constructor will not call method fit, and you will need to call it manually later to configure your model data (this can be usefull for using it with scikit-learns GridSearchCV). nmaxcomp : maximum number of components of the Fourier series expansion.

nmaxcomp : integer

Maximum number of components of the Fourier series expansion.

hpp : float

Hyperparameter p (defaults to “conservative” value of 1).

hpgamma : float

Hyperparameter p (defaults to “conservative” value of 0).

transform :

Transformation function to use. Can be either:

String “logit” to use logit transformation. Usefull if the sample space is the real line. However the obs must actually assume only low values like, for example, between [-5, 5], this is due to the fact that inverse logit starts getting (computationally) very close to 1 (or 0) after 10 (or -10).

Dictionary {“transf”: “fixed”, “vmin”: vmin, “vmax”: vmax} for fixed value transformation, where vmin and vmax are the minimun and maximum values of the sample space (for obs), respectivelly.

A user defined transformation function with a dict with elements transf, itransf and laditransf, where transf is a function that transforms [0, 1] to sample space, itransf is its inverse, and laditransf is the log absolute derivative of itransf. These 3 functions must accept and return numpy 1D arrays.

mixture : bool

If True, will work with a mixture of Fourier series models with up to nmaxcomponent components (that is, a prior will be set on the number of components of the Fourier Series). If False, will work with a single Fourier series model with exactly nmaxcomponent components.

**kwargs:

Additional keyword arguments passed to method fit.

compilestanmodel()

Compile Stan model necessary for method sample. This method is called automatically by obj.sample().

Returns:self
evalgrid(gridsize=1000)

Calculates posterior estimated mean density value at grid points so they can be used later by method plot or directly by the user.

Parameters:gridsize : size of the grid.
Returns:self

Notes

The results are store in an instance variable called egresults which is a dictionary with the following objects:

If object was initialized with mixture=True, you will have:

Log densities for full mixture of components stored as logdensitymixmean.

Densities for full mixture of components stored as densitymixmean.

Log densities for individual components stored as logdensityindivmean.

Densities for individual components stored as densityindivmean.

If object was initialized with mixture=False, you will have:

Log densities stored as logdensitymean.

Densities stored as densitymean.

evaluate(points, logdensity=False, transformed=True, component=None)

Predict posterior density for estimated model.

Parameters:

points : array like or scalar

Scalar or 1D numpy array of points where density will be evaluated.

transformed : bool

True (default) if points live in the sample space of your inputted data. False if your points live in the sample space of Bayesian Fourier Series sample space ([0, 1]).

component : NoneType or integer

Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.

logdensity : bool

If True, will return the logdensity instead of the density.

Returns:

Numpy 1D array with predicted mean density.

fit(obs, nmaxcomp=None, hpp=None, hpgamma=None, transformation=None, niter=5000, **kwargs)

Configure object model data and automatically calls method sampleposterior (to obtain MCMC samples from the posterior).

Parameters:

obs : array like

List/tuple or numpy.array 1D array of observations.

nmaxcomp : integer

Maximum number of components of the Fourier series expansion. Defaults to the one already set in object construction.

hpp : float

Hyperparameter p (defaults to “conservative” value of 1). Defaults to the one already set in object construction.

hpgamma : float

Hyperparameter p (defaults to “conservative” value of 0). Defaults to the one already set in object construction.

transform :

Transformation function to use. Can be either:

String “logit” to use logit transformation. Usefull if the sample space is the real line. However the obs must actually assume only low values like, for example, between [-5, 5], this is due to the fact that inverse logit starts getting (computationally) very close to 1 (or 0) after 10 (or -10).

Dictionary {“transf”: “fixed”, “vmin”: vmin, “vmax”: vmax} for fixed value transformation, where vmin and vmax are the minimun and maximum values of the sample space (for obs), respectivelly.

A user defined transformation function with a dict with elements transf, itransf and laditransf, where transf is a function that transforms [0, 1] to sample space, itransf is its inverse, and laditransf is the log absolute derivative of itransf. These 3 functions must accept and return numpy 1D arrays.

Defaults to the one already set in object construction.

niter : integer

Number of iterations to sample. If set to 0, then will not call method sample.

**kwargs:

Additional keyword arguments passed to method sample.

Returns:

self

plot(ax=None, component=None, **kwargs)

Plot samples.

Parameters:

ax : matplotlib axes

Axis to plot, defaults to axes of a new figure.

component : NoneType or integer

Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.

**kwargs :

Aditional named arguments passed to matplotlib.axes.Axes.step.

Returns:

matplotlib.axes.Axes object

sampleposterior(niter=5000, nchains=4, njobs=-1, tolrhat=0.02, **kwargs)

Samples from posterior.

Parameters:

niter :

Number of simulations to be draw.

nchains :

Number of MCMC chains.

njobs :

Number of CPUs to be used in parallel. If -1 (default), all CPUs will be used.

tolrhat :

Maximum tolerable distance an Rhat of any sampled parameter can have from 1 (we will resample model approximatedly 10% more iterations until this convergence criteria is met).

**kwargs :

Additional keyword arguments passed to pystan (e.g.: refresh parameter to configure sampler printing status).

Returns:

self

score(points, transformed=True, component=None)

Return sum of average logdensity of points.

This is equivalent to calling:

obj.evaluate(points, logdensity=True, transformed,
component).sum()[()]
Parameters:

points : array like or scalar

Scalar or 1D numpy array of points where density will be evaluated.

transformed : bool

True (default) if points live in the sample space of your inputted data. False if your points live in the sample space of Bayesian Fourier Series sample space ([0, 1]).

component : NoneType or integer

Which individual component of the mixture to use. Set to None (default) to full mixture with posterior with sieve prior (strongly recomended). Ignored if object was initialized with mixture=False.

Returns:

Numpy float with sum of predicted mean log densities.

class npcompare.EstimateLindleyBFS(obs0=None, obs1=None, nmaxcomp=10, hpp=1, hpgamma=0, hplindley=0.5, transformation=None, mixture=True, **kwargs)

Bases: object

Estimate univariate density using Bayesian Fourier Series. This method only works with data the lives in [0, 1], however, the class implements methods to automatically transform user inputted data to [0, 1]. See parameter transform below.

Parameters:

obs0 : array like

List/tuple or numpy.array 1D array of observations for model 1.

obs1 : array like

List/tuple or numpy.array 1D array of observations for model 2.

hplindley : float

A priori probability that obs0 and obs1 came from different populations.

**kwargs:

Additional keyword arguments passed to method fit.

Notes

For other parameters, see documentation for EstimateBFS.

compilestanmodel()

Compile Stan model necessary for method sample. This method is called automatically by obj.sample().

Returns:self
evalgrid(gridsize=1000)

Call method evalgrid for each EstimateBFS object that the self holds, that is, this is equivalent to calling:

obj.bfs0.evalgrid(gridsize)
obj.bfs1.evalgrid(gridsize)
obj.bfsconcat.evalgrid(gridsize)
Parameters:gridsize : size of the grid.
Returns:self
fit(obs0, obs1, nmaxcomp=None, hpp=None, hpgamma=None, hplindley=None, transformation=None, niter=5000, **kwargs)

Configure object model data and automatically calls method sampleposterior (to obtain MCMC samples from the posterior).

Parameters:

obs0 : array like

List/tuple or numpy.array 1D array of observations for model 1.

obs1 : array like

List/tuple or numpy.array 1D array of observations for model 2.

hplindley : float

A priori probability that obs0 and obs1 came from different populations.

**kwargs:

Additional keyword arguments passed to method sample.

Returns:

self

Notes

For other parameters, see documentation for method fit of EstimateBFS.

sampleposterior(niter=5000, nchains=4, njobs=-1, tolrhat=0.02, **kwargs)

Samples from posterior.

Parameters:

niter :

Number of simulations to be draw.

nchains :

Number of MCMC chains.

njobs :

Number of CPUs to be used in parallel. If -1 (default), all CPUs will be used.

tolrhat :

Maximum tolerable distance an Rhat of any sampled parameter can have from 1 (we will resample model approximatedly 10% more iterations until this convergence criteria is met).

**kwargs :

Additional keyword arguments passed to pystan (e.g.: refresh parameter to configure sampler printing status).

Returns:

self

npcompare.fourierseries(x, ncomponents)

Calculate Fourier Series Expansion.

Parameters:

x : 1D numpy.array, list or tuple of numbers to calculate

fourier series expansion

ncomponents : int

number of components of the series

Returns:

2D numpy.array where each line is the Fourier series expansion of

each component of x.