Distributions

This page serves as a reference for all distribution objects implemented in the ondil package.

Note

This page is somewhat under construction, since MkDocs does not support docstring inheritance at the moment.

All distributions are based on scipy.stats distributions. We implement the probability density function (PDF), the cumulative distribution function (CDF), the percentage point or quantile function (PPF), and random variate generation (RVS) as pass-throughs to scipy. The link functions are implemented in the same way as in GAMLSS (Rigby & Stasinopoulos, 2005). The link functions and their derivatives derive from the LinkFunction base class.

Base Classes

| Base Distribution | Description |
| --- | --- |
| Distribution | Base class for all distributions. |
| ScipyMixin | Base class for all distributions that are based on scipy. |

List of Distributions

| Distribution | Description | scipy Equivalent |
| --- | --- | --- |
| DistributionNormal | Gaussian (mean and standard deviation) | scipy.stats.norm |
| DistributionNormalMeanVariance | Gaussian (mean and variance) | scipy.stats.norm |
| DistributionT | Student's \(t\) distribution | scipy.stats.t |
| DistributionJSU | Johnson's SU distribution | scipy.stats.johnsonsu |
| DistributionGamma | Gamma distribution | scipy.stats.gamma |
| DistributionLogNormal | Log-normal distribution | scipy.stats.lognorm |
| DistributionLogNormalMedian | Log-normal distribution (median) | - |
| DistributionLogistic | Logistic distribution | scipy.stats.logistic |
| DistributionExponential | Exponential distribution | scipy.stats.expon |
| DistributionBeta | Beta distribution | scipy.stats.beta |
| DistributionGumbel | Gumbel distribution | scipy.stats.gumbel_r |
| DistributionInverseGaussian | Inverse Gaussian distribution | scipy.stats.invgauss |
| DistributionBetaInflated | Beta Inflated distribution | - |
| DistributionReverseGumbel | Reverse Gumbel distribution | scipy.stats.gumbel_r |
| DistributionInverseGamma | Inverse Gamma distribution | scipy.stats.invgamma |

API Reference

ondil.DistributionNormal

Bases: ScipyMixin, Distribution

The Normal distribution with mean and standard deviation parameterization.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1^2}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.

This distribution corresponds to the NO() distribution in GAMLSS.

__init__

__init__(loc_link: LinkFunction = IdentityLink(), scale_link: LinkFunction = LogLink()) -> None

Initialize the DistributionNormal.

Parameters:

Name Type Description Default
loc_link LinkFunction

Location link. Defaults to IdentityLink().

IdentityLink()
scale_link LinkFunction

Scale link. Defaults to LogLink().

LogLink()

ondil.DistributionNormalMeanVariance

Bases: ScipyMixin, Distribution

The Normal distribution with mean and variance parameterization.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma^2 = \theta_1\) is the variance parameter.

__init__

__init__(loc_link: LinkFunction = IdentityLink(), scale_link: LinkFunction = LogLink()) -> None

Initialize the DistributionNormalMeanVariance.

Parameters:

Name Type Description Default
loc_link LinkFunction

Location link. Defaults to IdentityLink().

IdentityLink()
scale_link LinkFunction

Scale link. Defaults to LogLink().

LogLink()

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

Name Type Description Default
theta ndarray

parameters

required

Returns:

Name Type Description
dict dict

Dict of (loc, scale) for scipy.stats.norm(loc, scale)
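The mapping from the mean-variance parameterization to scipy-style (loc, scale) amounts to taking the square root of the variance. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and are not part of ondil.

```python
import math
from statistics import NormalDist


def mean_variance_to_loc_scale(mu, variance):
    """Hypothetical helper: map (mean, variance) to scipy-style (loc, scale)."""
    return {"loc": mu, "scale": math.sqrt(variance)}


def pdf_mean_variance(y, mu, variance):
    """Normal PDF written directly in the mean-variance form."""
    return math.exp(-((y - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


params = mean_variance_to_loc_scale(mu=1.0, variance=4.0)
# NormalDist stands in for scipy.stats.norm(loc, scale) here.
reference = NormalDist(mu=params["loc"], sigma=params["scale"]).pdf(2.5)
direct = pdf_mean_variance(2.5, mu=1.0, variance=4.0)
```

Both `reference` and `direct` evaluate the same density, once through the (loc, scale) form and once through the (mean, variance) form.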

ondil.DistributionT

Bases: ScipyMixin, Distribution

Corresponds to GAMLSS TF() and scipy.stats.t()

ondil.DistributionJSU

Bases: ScipyMixin, Distribution

Corresponds to GAMLSS JSUo() and scipy.stats.johnsonsu()

Distribution parameters:

  • 0: Location
  • 1: Scale (close to standard deviation)
  • 2: Skewness
  • 3: Tail behaviour

ondil.DistributionGamma

Bases: ScipyMixin, Distribution

The Gamma Distribution for GAMLSS.

The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{y^{(1/\sigma^2-1)}\exp[-y/(\sigma^2 \mu)]}{(\sigma^2 \mu)^{(1/\sigma^2)} \Gamma(1/\sigma^2)} $$

with the location and shape parameters \(\mu, \sigma > 0\).

Note

The function is parameterized as GAMLSS' GA() distribution.

This parameterization is different to the scipy.stats.gamma(alpha, loc, scale) parameterization.

We can use DistributionGamma().theta_to_scipy_params(theta) to map the distribution parameters to scipy.

The scipy.stats.gamma() distribution is defined as: $$ f(x, \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} \exp[-\beta x]}{\Gamma(\alpha)} $$ with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows: $$ \alpha = 1/\sigma^2 \Leftrightarrow \sigma = \sqrt{1 / \alpha} $$ and $$ \beta = 1/(\sigma^2\mu). $$
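The equivalence of the two parameterizations can be checked numerically. The sketch below uses only the standard library and evaluates both log-densities at the same point; the function names are hypothetical, and the final dictionary assumes loc = 0 and that scipy's scale is the reciprocal of the rate \(\beta\).

```python
import math


def gamlss_gamma_logpdf(y, mu, sigma):
    """Log of the GA() density written in terms of mu and sigma."""
    k = 1.0 / sigma**2   # exponent 1 / sigma^2
    s = sigma**2 * mu    # sigma^2 * mu
    return (k - 1.0) * math.log(y) - y / s - k * math.log(s) - math.lgamma(k)


def shape_rate_gamma_logpdf(x, alpha, beta):
    """Log of the shape/rate density with parameters alpha and beta."""
    return alpha * math.log(beta) + (alpha - 1.0) * math.log(x) - beta * x - math.lgamma(alpha)


def gamlss_to_shape_rate(mu, sigma):
    """Mapping alpha = 1 / sigma^2 and beta = 1 / (sigma^2 * mu)."""
    return 1.0 / sigma**2, 1.0 / (sigma**2 * mu)


mu, sigma, y = 3.0, 0.5, 2.2
alpha, beta = gamlss_to_shape_rate(mu, sigma)
lhs = gamlss_gamma_logpdf(y, mu, sigma)
rhs = shape_rate_gamma_logpdf(y, alpha, beta)

# Assumption: scipy.stats.gamma takes a scale rather than a rate, so scale = 1 / beta.
scipy_style = {"a": alpha, "loc": 0.0, "scale": 1.0 / beta}
```

With this mapping the mean `a * scale` equals \(\mu\), which is why \(\mu\) acts as the location parameter in the GAMLSS form.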

Parameters:

Name Type Description Default
loc_link LinkFunction

The link function for \(\mu\). Defaults to LogLink().

LogLink()
scale_link LinkFunction

The link function for \(\sigma\). Defaults to LogLink().

LogLink()

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

Name Type Description Default
theta ndarray

parameters

required

Returns:

Name Type Description
dict dict

Dict of (a, loc, scale) for scipy.stats.gamma(a, loc, scale)

ondil.DistributionLogNormal

Bases: ScipyMixin, Distribution

The Log-Normal distribution with mean and standard deviation parameterization in the log-space.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.

Note

Note that the re-parameterization used to move from scipy.stats to GAMLSS is: $$ \mu = \exp(\theta_0) $$ and can therefore be numerically unstable for large values of \(\theta_0\). We have re-implemented the PDF, CDF and PPF to avoid this issue; however, the rvs method still uses the scipy.stats implementation, which is not numerically stable for large values of \(\theta_0\).
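To illustrate the stability issue, the following sketch evaluates the log-density directly in log-space, so \(\exp(\theta_0)\) is never formed. It is an illustration of the idea under that assumption, not the ondil implementation, and the function name is hypothetical.

```python
import math


def lognormal_logpdf(y, theta0, theta1):
    """Log-density of the log-normal, evaluated without computing exp(theta0)."""
    z = (math.log(y) - theta0) / theta1
    return -math.log(y) - math.log(theta1) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


theta0, theta1 = 800.0, 1.0  # large log-space location

try:
    math.exp(theta0)  # the exponentiated location exceeds the float range
    overflowed = False
except OverflowError:
    overflowed = True

# The log-space formula stays finite for the same theta0.
value = lognormal_logpdf(y=1e300, theta0=theta0, theta1=theta1)
```

Here `math.exp(theta0)` raises an `OverflowError`, while `value` remains a finite float.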

cdf

cdf(y: ndarray, theta: ndarray) -> np.ndarray

Cumulative distribution function of the Log-Normal distribution.

logpdf

logpdf(y: ndarray, theta: ndarray) -> np.ndarray

Logarithm of the probability density function of the Log-Normal distribution.

pdf

pdf(y: ndarray, theta: ndarray) -> np.ndarray

Probability density function of the Log-Normal distribution.

ppf

ppf(p: ndarray, theta: ndarray) -> np.ndarray

Percent-point function (quantile function) of the Log-Normal distribution.

ondil.DistributionLogNormalMedian

Bases: ScipyMixin, Distribution

The Log-Normal distribution with median and standard deviation parameterization in the log-space.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log y - \log \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \log \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the median parameter and \(\sigma = \theta_1\) is the scale parameter.

ondil.DistributionLogistic

Bases: ScipyMixin, Distribution

The Logistic distribution with location and scale parameterization.

The probability density function is: $$ f(y | \mu, \sigma) = \frac{\exp\left(-\frac{y - \mu}{\sigma}\right)}{\sigma \left(1 + \exp\left(-\frac{y - \mu}{\sigma}\right)\right)^2} $$

This distribution corresponds to the LO() distribution in GAMLSS.

ondil.DistributionExponential

Bases: ScipyMixin, Distribution

The Exponential distribution parameterized by the mean (mu).

The probability density function is: $$ f(y | \mu) = \frac{1}{\mu} \exp\left(-\frac{y}{\mu}\right) $$ for \(y > 0\) and \(\mu > 0\).

This distribution corresponds to the EXP() distribution in GAMLSS.

ondil.DistributionInverseGaussian

Bases: ScipyMixin, Distribution

Inverse Gaussian (Wald) distribution for GAMLSS.

This distribution is characterized by two parameters:

  • \(\mu\): the mean of the distribution.
  • \(\sigma\): the scale parameter, which is related to the variance.

The probability density function (PDF) is given by: $$ f(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y - \mu)^2}{2\mu^2\sigma^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\sigma > 0\).

Note that the Inverse Gaussian distribution in scipy.stats is parameterized differently:

  • mu is the mean of the distribution.
  • scale is the scale parameter

and the PDF is given by: $$ f(y; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi y^3}} \exp\left(-\frac{\lambda (y - \mu)^2}{2\mu^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\lambda > 0\).

The relationship between the parameters is:

  • mu in scipy.stats corresponds to \(\mu \sigma^2\) in this implementation,
  • scale in scipy.stats corresponds to \(1 / \sigma^2\) in this implementation.
  • The loc parameter in scipy.stats is always 0.
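The parameter relationship listed above can be verified numerically. The sketch below compares the \((\mu, \lambda)\) density with \(\lambda = 1/\sigma^2\) against a scipy-style standardised density evaluated at `y / scale`. It uses only the standard library; the function names are hypothetical, and the standardised form is assumed to follow the scipy.stats.invgauss convention with loc = 0.

```python
import math


def wald_pdf(y, mu, lam):
    """Inverse Gaussian density in the (mean mu, shape lambda) form."""
    return math.sqrt(lam / (2 * math.pi * y**3)) * math.exp(-lam * (y - mu) ** 2 / (2 * mu**2 * y))


def scipy_style_pdf(y, m, scale):
    """Standardised density f(x; m) with x = y / scale, divided by scale."""
    x = y / scale
    standard = math.exp(-((x - m) ** 2) / (2 * x * m**2)) / math.sqrt(2 * math.pi * x**3)
    return standard / scale


mu, sigma, y = 2.0, 0.7, 1.3
m, scale = mu * sigma**2, 1.0 / sigma**2  # mapping from the list above
direct = wald_pdf(y, mu, lam=1.0 / sigma**2)
mapped = scipy_style_pdf(y, m, scale)
```

Both evaluations agree, and the product `m * scale` recovers the mean \(\mu\).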

ondil.DistributionBeta

Bases: ScipyMixin, Distribution

The Beta Distribution for GAMLSS.

The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{\Gamma(\frac{1 - \sigma^2}{\sigma^2})} { \Gamma(\frac{\mu (1 - \sigma^2)}{\sigma^2}) \Gamma(\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2})} y^{\frac{\mu (1 - \sigma^2)}{\sigma^2} - 1} (1-y)^{\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2} - 1} $$

with the location and shape parameters \(0 < \mu < 1\) and \(0 < \sigma < 1\).

Note

The function is parameterized as GAMLSS' BE() distribution.

This parameterization is different to the scipy.stats.beta(alpha, beta, loc, scale) parameterization.

We can use DistributionBeta().theta_to_scipy_params(theta) to map the distribution parameters to scipy.

The scipy.stats.beta() distribution is defined as: $$ f(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta) x^{\alpha - 1} {(1 - x)}^{\beta - 1}}{\Gamma(\alpha) \Gamma(\beta)} $$

with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows: $$ \alpha = \mu (1 - \sigma^2) / \sigma^2 \Leftrightarrow \mu = \alpha / (\alpha + \beta) $$ and $$ \beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2 \Leftrightarrow \sigma = \sqrt{1 / (\alpha + \beta + 1)} $$
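A round trip through the mapping makes the relationship concrete. Since \(\alpha + \beta = (1 - \sigma^2)/\sigma^2\), it follows that \(\alpha + \beta + 1 = 1/\sigma^2\), which gives the inverse for \(\sigma\). The helpers below are a hypothetical sketch in plain Python, not ondil functions.

```python
import math


def gamlss_to_alpha_beta(mu, sigma):
    """Map GAMLSS BE() parameters (mu, sigma) to the shape parameters (alpha, beta)."""
    common = (1.0 - sigma**2) / sigma**2
    return mu * common, (1.0 - mu) * common


def alpha_beta_to_gamlss(alpha, beta):
    """Inverse mapping from (alpha, beta) back to (mu, sigma)."""
    mu = alpha / (alpha + beta)
    sigma = math.sqrt(1.0 / (alpha + beta + 1.0))
    return mu, sigma


alpha, beta = gamlss_to_alpha_beta(mu=0.3, sigma=0.4)
mu_back, sigma_back = alpha_beta_to_gamlss(alpha, beta)
```

For \(\mu = 0.3\) and \(\sigma = 0.4\) this yields \(\alpha = 1.575\) and \(\beta = 3.675\), and the inverse mapping recovers the original values.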

Parameters:

Name Type Description Default
loc_link LinkFunction

The link function for \(\mu\). Defaults to LogitLink().

LogitLink()
scale_link LinkFunction

The link function for \(\sigma\). Defaults to LogitLink().

LogitLink()

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

Name Type Description Default
theta ndarray

parameters

required

Returns:

Name Type Description
dict dict

Dict of (a, b, loc, scale) for scipy.stats.beta(a, b, loc, scale)

ondil.DistributionGumbel

Bases: ScipyMixin, Distribution

The Gumbel distribution.

The probability density function is given by: $$ f(y|\mu, \sigma) = \frac{1}{\sigma} \exp\left(-\left(z + \exp(-z)\right)\right) $$ where \(z = (y - \mu)/\sigma\) and has the following parameters:

  • \(\mu\): location
  • \(\sigma\): scale (>0)

ondil.DistributionBetaInflated

Bases: Distribution

The Beta inflated Distribution for GAMLSS.

ondil.DistributionReverseGumbel

Bases: ScipyMixin, Distribution

The Reverse Gumbel (Type I minimum extreme value) distribution with location (mu) and scale (sigma) parameters.

The probability density function is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sigma} \exp\left( \frac{y - \mu}{\sigma} - \exp\left( \frac{y - \mu}{\sigma} \right) \right) $$

This distribution corresponds to the RG() distribution in GAMLSS.

Notes
  • Mean = mu + digamma(1) * sigma ≈ mu - 0.5772157 * sigma
  • Variance = (pi^2 * sigma^2) / 6 ≈ 1.64493 * sigma^2
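The numerical constants in these notes can be checked against the density given above by direct integration. The following is a rough sketch using a midpoint rule in plain Python. The integration range and step count are arbitrary choices made for illustration.

```python
import math


def reverse_gumbel_pdf(y, mu, sigma):
    """Density exp(z - exp(z)) / sigma with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    return math.exp(z - math.exp(z)) / sigma


def moments(mu, sigma, lo=-40.0, hi=6.0, steps=100_000):
    """Midpoint-rule estimates of total mass, mean and variance.

    The bounds lo and hi are on the standardised scale z.
    """
    h = (hi - lo) / steps
    mass = first = second = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        y = mu + sigma * z
        w = reverse_gumbel_pdf(y, mu, sigma) * sigma * h  # probability weight of the cell
        mass += w
        first += w * y
        second += w * y * y
    mean = first / mass
    return mass, mean, second / mass - mean**2


mass, mean, variance = moments(mu=2.0, sigma=1.5)
```

For `mu=2.0` and `sigma=1.5` the estimates land close to `2.0 - 0.5772157 * 1.5` for the mean and `1.64493 * 1.5**2` for the variance.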

ondil.DistributionInverseGamma

Bases: ScipyMixin, Distribution

The Inverse Gamma distribution as parameterized in GAMLSS.

Distribution parameters:

  • mu: mean-related parameter
  • sigma: dispersion parameter

Reparameterization:

  • α = 1 / sigma²
  • scale = mu * (1 + sigma²) / sigma²

This distribution corresponds to IGAMMA() in GAMLSS.
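The reparameterization can be sketched in a few lines of plain Python. The helper names are hypothetical. One consequence of the formulas, stated here as an observation rather than something the source spells out, is that the mode of the resulting inverse gamma density, `scale / (α + 1)`, equals mu.

```python
import math


def gamlss_to_shape_scale(mu, sigma):
    """Reparameterization: alpha = 1 / sigma^2, scale = mu * (1 + sigma^2) / sigma^2."""
    alpha = 1.0 / sigma**2
    scale = mu * (1.0 + sigma**2) / sigma**2
    return alpha, scale


def inverse_gamma_logpdf(y, alpha, scale):
    """Log-density of an inverse gamma with shape alpha and the given scale."""
    return alpha * math.log(scale) - math.lgamma(alpha) - (alpha + 1.0) * math.log(y) - scale / y


alpha, scale = gamlss_to_shape_scale(mu=2.0, sigma=0.5)
mode = scale / (alpha + 1.0)  # standard formula for the inverse gamma mode
```

With `mu=2.0` and `sigma=0.5` this gives `alpha = 4.0`, `scale = 10.0` and a mode of `2.0`.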

Base Class

ondil.base.Distribution

Bases: ABC

corresponding_gamlss property

corresponding_gamlss: str | None

The name of the corresponding implementation in 'gamlss.dist' R package.

distribution_support abstractmethod property

distribution_support: Tuple[float, float]

The support of the distribution.

n_params property

n_params: int

Each subclass must define 'n_params'.

parameter_names abstractmethod property

parameter_names: dict

Parameter name for each column of theta.

parameter_support abstractmethod property

parameter_support: dict

The support of each parameter of the distribution.

calculate_conditional_initial_values abstractmethod

calculate_conditional_initial_values(y: ndarray, theta: ndarray, param: int) -> np.ndarray

Calculate the conditional initial values for the GAMLSS fit.

cdf abstractmethod

cdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the cumulative distribution function (CDF) for the given data.

Parameters:

Name Type Description Default
y ndarray

The data points at which to evaluate the CDF.

required
theta ndarray

The parameters of the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: The CDF evaluated at the given data points.

dl1_dp1 abstractmethod

dl1_dp1(y: ndarray, theta: ndarray, param: int) -> np.ndarray

Take the first derivative of the likelihood function with respect to the param.

dl2_dp2 abstractmethod

dl2_dp2(y: ndarray, theta: ndarray, param: int) -> np.ndarray

Take the second derivative of the likelihood function with respect to the param.

dl2_dpp abstractmethod

dl2_dpp(y: ndarray, theta: ndarray, params: Tuple[int, int]) -> np.ndarray

Take the second cross derivative of the likelihood function with respect to both parameters.
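As an illustration of what such derivative methods compute, the sketch below writes out the first derivatives for a Normal distribution and checks them against central differences. It assumes, as in GAMLSS, that the derivatives refer to the log-likelihood; the function names are hypothetical and do not mirror the ondil signatures.

```python
import math


def normal_loglik(y, mu, sigma):
    """Log-likelihood of a single observation under the Normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def dl1_dmu(y, mu, sigma):
    """Analytic first derivative with respect to mu."""
    return (y - mu) / sigma**2


def dl1_dsigma(y, mu, sigma):
    """Analytic first derivative with respect to sigma."""
    return ((y - mu) ** 2 - sigma**2) / sigma**3


def central_difference(f, x, eps=1e-6):
    """Numerical derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


y, mu, sigma = 1.2, 0.5, 2.0
num_mu = central_difference(lambda m: normal_loglik(y, m, sigma), mu)
num_sigma = central_difference(lambda s: normal_loglik(y, mu, s), sigma)
```

Comparing analytic and numerical derivatives in this way is a common sanity check when implementing a new distribution.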

initial_values abstractmethod

initial_values(y: ndarray, param: int = 0, axis: Optional[int | None] = None) -> np.ndarray

Calculate the initial values for the GAMLSS fit.

link_function

link_function(y: ndarray, param: int = 0) -> np.ndarray

Apply the link function for param on y.

link_function_derivative

link_function_derivative(y: ndarray, param: int = 0) -> np.ndarray

Apply the derivative of the link function for param on y.

link_inverse

link_inverse(y: ndarray, param: int = 0) -> np.ndarray

Apply the inverse of the link function for param on y.

link_inverse_derivative

link_inverse_derivative(y: ndarray, param: int = 0) -> np.ndarray

Apply the derivative of the inverse link function for param on y.
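The four link operations fit together as shown in the sketch below for a log link. The class is a hypothetical stand-in written in plain Python; it is not the ondil LogLink class and its method names are assumptions.

```python
import math


class LogLinkSketch:
    """Hypothetical stand-in for a log link; not the ondil LogLink class."""

    def link(self, x):
        # Map the parameter to the predictor scale.
        return math.log(x)

    def inverse(self, eta):
        # Map the predictor back to the parameter scale.
        return math.exp(eta)

    def link_derivative(self, x):
        # d/dx log(x)
        return 1.0 / x

    def inverse_derivative(self, eta):
        # d/d(eta) exp(eta)
        return math.exp(eta)


link = LogLinkSketch()
sigma = 2.5
eta = link.link(sigma)
round_trip = link.inverse(eta)
```

By the inverse function rule, `link.link_derivative(sigma) * link.inverse_derivative(eta)` equals 1 at matching points, which is a convenient consistency check for any link.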

logcdf abstractmethod

logcdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the cumulative distribution function (CDF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the log CDF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of log CDF values corresponding to the data points in y.

pdf abstractmethod

pdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the probability density function (PDF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the PDF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of PDF values corresponding to the data points in y.

pmf abstractmethod

pmf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the probability mass function (PMF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the PMF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of PMF values corresponding to the data points in y.

ppf abstractmethod

ppf(q: ndarray, theta: ndarray) -> np.ndarray

Percent Point Function (Inverse of CDF).

Parameters:

Name Type Description Default
q ndarray

Probabilities at which to evaluate the quantile function.

required
theta ndarray

Distribution parameters.

required

Returns:

Type Description
ndarray

np.ndarray: The quantile corresponding to the given probabilities.

quantile

quantile(q: ndarray, theta: ndarray) -> np.ndarray

Compute the quantile function for the given probabilities.

This is an alias for the ppf method.

Parameters:

Name Type Description Default
q ndarray

The probabilities for which to compute quantiles.

required
theta ndarray

The parameters of the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: The quantiles corresponding to the given probabilities.
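The relationship between the quantile function and the CDF can be illustrated with the standard library's `statistics.NormalDist`, used here as a stand-in for a fitted distribution. This is a sketch of the inverse relationship only and does not call ondil.

```python
from statistics import NormalDist

# Stand-in distribution with mean 1.0 and standard deviation 2.0.
dist = NormalDist(mu=1.0, sigma=2.0)

probabilities = [0.05, 0.5, 0.95]
# Quantile function: probabilities -> values on the data scale.
quantiles = [dist.inv_cdf(p) for p in probabilities]
# Applying the CDF to the quantiles recovers the probabilities.
recovered = [dist.cdf(q) for q in quantiles]
```

The median (probability 0.5) coincides with the mean for the Normal distribution, so `quantiles[1]` is 1.0 up to floating-point error.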

rvs abstractmethod

rvs(size: int, theta: ndarray) -> np.ndarray

Generate random variates of given size and parameters.

Parameters:

Name Type Description Default
size int

The number of random variates to generate.

required
theta ndarray

The parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: A 2D array of random variates with shape (theta.shape[0], size).

theta_to_params

theta_to_params(theta: ndarray) -> Tuple[np.ndarray, ...]

Take the fitted values and return tuple of vectors for distribution parameters.

ondil.base.ScipyMixin

Bases: ABC

parameter_names abstractmethod property

parameter_names: dict

Parameter name for each column of theta.

scipy_dist abstractmethod property

scipy_dist: rv_continuous

The scipy.stats distribution object that backs this distribution.

scipy_names abstractmethod property

scipy_names: Tuple[str]

The names of the parameters in the scipy.stats distribution and the corresponding column in theta.

logcdf

logcdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the cumulative distribution function (CDF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the log CDF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of log CDF values corresponding to the data points in y.

logpdf

logpdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the probability density function (PDF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the log PDF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of log PDF values corresponding to the data points in y.

logpmf

logpmf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the probability mass function (PMF) for the given data points.

Parameters:

Name Type Description Default
y ndarray

An array of data points at which to evaluate the log PMF.

required
theta ndarray

An array of parameters for the distribution.

required

Returns:

Type Description
ndarray

np.ndarray: An array of log PMF values corresponding to the data points in y.

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> Dict[str, np.ndarray]

Maps \(\theta\) to the scipy parameters.

Parameters:

Name Type Description Default
theta ndarray

\(\theta\) as estimated by OnlineGamlss() estimator

required

Raises:

Type Description
ValueError

If the scipy_names attribute is not defined.

Returns:

Name Type Description
dict Dict[str, ndarray]

Dictionary that can be unrolled into scipy distribution class as in st.some_dist(**return_value)
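The idea of turning columns of \(\theta\) into a keyword dictionary can be sketched without scipy. In the example below, `theta_to_named_params` is a hypothetical helper, nested lists stand in for the ndarray, and `statistics.NormalDist` stands in for a scipy distribution. The real method may differ in details such as parameter transformations.

```python
from statistics import NormalDist


def theta_to_named_params(theta, names):
    """Hypothetical helper: column j of theta becomes the entry for names[j]."""
    if not names:
        raise ValueError("names must not be empty")
    return {name: [row[j] for row in theta] for j, name in enumerate(names)}


theta = [[0.0, 1.0], [2.0, 0.5]]  # one row per observation
params = theta_to_named_params(theta, ("mu", "sigma"))

# NormalDist takes scalars, so the dictionary is unrolled row by row.
dists = [NormalDist(**{k: v[i] for k, v in params.items()}) for i in range(len(theta))]
```

With array-aware distributions such as those in scipy.stats, the whole dictionary can be unrolled in one call instead of row by row.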