Distributions

This serves as reference for all distribution objects that we implement in the ondil package.

Note

This page is somewhat under construction, since MkDocs does not support docstring inheritance at the moment.

All distributions are based on scipy.stats distributions. We implement the probability density function (PDF), the cumulative distribution function (CDF), the percent point or quantile function (PPF) and the random variates (RVS) as pass-throughs to scipy. The link functions are implemented in the same way as in GAMLSS (Rigby & Stasinopoulos, 2005). The link functions and their derivatives derive from the LinkFunction base class.

Base Classes

| Base Distribution | Description |
| --- | --- |
| Distribution | Base class for all distributions. |
| ScipyMixin | Base class for all distributions that are based on scipy. |

List of Distributions

| Distribution | Description | scipy Base |
| --- | --- | --- |
| Normal | Gaussian (mean and standard deviation) | scipy.stats.norm |
| NormalMeanVariance | Gaussian (mean and variance) | scipy.stats.norm |
| StudentT | Student's \(t\) distribution | scipy.stats.t |
| JSU | Johnson's SU distribution | scipy.stats.johnsonsu |
| Gamma | Gamma distribution | scipy.stats.gamma |
| LogNormal | Log-normal distribution | scipy.stats.lognorm |
| LogNormalMedian | Log-normal distribution (median) | - |
| Logistic | Logistic distribution | scipy.stats.logistic |
| Exponential | Exponential distribution | scipy.stats.expon |
| Beta | Beta distribution | scipy.stats.beta |
| Gumbel | Gumbel distribution | scipy.stats.gumbel_r |
| InverseGaussian | Inverse Gaussian distribution | scipy.stats.invgauss |
| BetaInflated | Beta Inflated distribution | - |
| ReverseGumbel | Reverse Gumbel distribution | scipy.stats.gumbel_r |
| InverseGamma | Inverse Gamma distribution | scipy.stats.invgamma |
| BetaInflatedZero | Zero Inflated Beta distribution | - |
| ZeroAdjustedGamma | Zero Adjusted Gamma distribution | - |

API Reference

ondil.distributions.Normal

Bases: ScipyMixin, Distribution

The Normal distribution with mean and standard deviation parameterization.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1^2}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.

This distribution corresponds to the NO() distribution in GAMLSS.

__init__

__init__(
    loc_link: LinkFunction = Identity(),
    scale_link: LinkFunction = Log(),
) -> None

Initialize the Normal.

Parameters:

ondil.distributions.NormalMeanVariance

Bases: ScipyMixin, Distribution

The Normal distribution with mean and variance parameterization.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma^2 = \theta_1\) is the variance, which takes the place of the scale parameter.

__init__

__init__(
    loc_link: LinkFunction = Identity(),
    scale_link: LinkFunction = Log(),
) -> None

Initialize the NormalMeanVariance.

Parameters:

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

  • theta (ndarray) –

    parameters

Returns:

  • dict ( dict ) –

    Dict of (loc, scale) for scipy.stats.norm(loc, scale)
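
As a minimal sketch of the mapping described above, assuming theta holds (mean, variance): scipy.stats.norm expects a standard deviation as its scale, so the variance has to be square-rooted. The helper names below are hypothetical, and the check uses only the Python standard library rather than ondil's own implementation.

```python
import math
from statistics import NormalDist


def mean_variance_to_loc_scale(mu: float, variance: float) -> dict:
    """Hypothetical helper: map (mean, variance) to norm-style (loc, scale)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return {"loc": mu, "scale": math.sqrt(variance)}


def normal_pdf_mean_variance(y: float, mu: float, variance: float) -> float:
    """Normal density written directly in the mean/variance parameterization."""
    return math.exp(-((y - mu) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


params = mean_variance_to_loc_scale(mu=1.0, variance=4.0)
# Reference density in the usual (mean, standard deviation) form.
reference = NormalDist(mu=params["loc"], sigma=params["scale"]).pdf(0.5)
direct = normal_pdf_mean_variance(0.5, mu=1.0, variance=4.0)
```

Both `reference` and `direct` evaluate the same density, which is the property the mapping needs to preserve.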

ondil.distributions.StudentT

Bases: ScipyMixin, Distribution

Corresponds to GAMLSS TF() and scipy.stats.t()

ondil.distributions.JSU

Bases: ScipyMixin, Distribution

Corresponds to GAMLSS JSUo() and scipy.stats.johnsonsu()

Distribution parameters:

  • 0 : Location
  • 1 : Scale (close to standard deviation)
  • 2 : Skewness
  • 3 : Tail behaviour

ondil.distributions.Gamma

Bases: ScipyMixin, Distribution

The Gamma Distribution for GAMLSS.

The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{y^{(1/\sigma^2-1)}\exp[-y/(\sigma^2 \mu)]}{(\sigma^2 \mu)^{(1/\sigma^2)} \Gamma(1/\sigma^2)} $$

with the location and shape parameters \(\mu, \sigma > 0\).

Note

The function is parameterized as GAMLSS' GA() distribution.

This parameterization is different to the scipy.stats.gamma(alpha, loc, scale) parameterization.

We can use Gamma().theta_to_scipy_params(theta) to map the distribution parameters to scipy.

The scipy.stats.gamma() distribution is defined as: $$ f(x, \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} \exp[-\beta x]}{\Gamma(\alpha)} $$ with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows: $$ \alpha = 1/\sigma^2 \Leftrightarrow \sigma = \sqrt{1 / \alpha} $$ and $$ \beta = 1/(\sigma^2\mu). $$ In scipy's own arguments, the shape is \(a = \alpha\) and the scale is \(1/\beta = \sigma^2\mu\).

Parameters:

  • loc_link (LinkFunction, default: Log() ) –

    The link function for \(\mu\). Defaults to Log().

  • scale_link (LinkFunction, default: Log() ) –

    The link function for \(\sigma\). Defaults to Log().

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

  • theta (ndarray) –

    parameters

Returns:

  • dict ( dict ) –

    Dict of (a, loc, scale) for scipy.stats.gamma(a, loc, scale)
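
The mapping between the GAMLSS GA() parameterization and the shape/rate form can be checked numerically. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the final dictionary assumes loc = 0 and scipy's convention that scale is the reciprocal of the rate.

```python
import math


def gamlss_gamma_pdf(y: float, mu: float, sigma: float) -> float:
    """Gamma density in the GAMLSS GA(mu, sigma) parameterization."""
    shape = 1.0 / sigma**2
    scale = sigma**2 * mu
    log_pdf = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
    )
    return math.exp(log_pdf)


def rate_gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density in the shape/rate form quoted in the note."""
    log_pdf = (
        alpha * math.log(beta)
        + (alpha - 1.0) * math.log(x)
        - beta * x
        - math.lgamma(alpha)
    )
    return math.exp(log_pdf)


def gamlss_to_shape_rate(mu: float, sigma: float) -> tuple:
    """alpha = 1 / sigma^2 and beta = 1 / (sigma^2 * mu)."""
    return 1.0 / sigma**2, 1.0 / (sigma**2 * mu)


mu, sigma = 2.0, 0.5
alpha, beta = gamlss_to_shape_rate(mu, sigma)
# Assumed scipy-style arguments: shape a, loc fixed at 0, scale = 1 / rate.
scipy_like = {"a": alpha, "loc": 0.0, "scale": 1.0 / beta}
```

With these values the mean alpha / beta equals mu, which matches the role of \(\mu\) as the mean in the GAMLSS parameterization.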

ondil.distributions.LogNormal

Bases: ScipyMixin, Distribution

The Log-Normal distribution with mean and standard deviation parameterization in the log-space.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.

Note

The re-parameterization used to move between scipy.stats and GAMLSS is: $$ \mu = \exp(\theta_0) $$ and can therefore be numerically unstable for large values of \(\theta_0\). We have re-implemented the PDF, CDF, and PPF to avoid this issue; however, the rvs method still uses the scipy.stats implementation, which is not numerically stable for large values of \(\theta_0\).
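
To illustrate why the exponentiation matters, here is a minimal sketch that evaluates the log-density directly in log-space and compares it with a version routed through scale = exp(theta0). Both functions are hypothetical illustrations written with the standard library; they are not ondil's actual implementation.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def lognormal_logpdf(y: float, theta0: float, theta1: float) -> float:
    """Log-density evaluated in log-space; exp(theta0) is never formed."""
    z = (math.log(y) - theta0) / theta1
    return -math.log(y) - math.log(theta1) - LOG_SQRT_2PI - 0.5 * z * z


def lognormal_logpdf_via_scale(y: float, theta0: float, theta1: float) -> float:
    """Same density, but routed through scale = exp(theta0)."""
    scale = math.exp(theta0)  # raises OverflowError once theta0 is large
    z = math.log(y / scale) / theta1
    return -math.log(y) - math.log(theta1) - LOG_SQRT_2PI - 0.5 * z * z


# The log-space version stays finite for a large location parameter.
stable = lognormal_logpdf(1e300, theta0=800.0, theta1=1.0)

try:
    lognormal_logpdf_via_scale(1e300, theta0=800.0, theta1=1.0)
    overflowed = False
except OverflowError:
    overflowed = True
```

For moderate values of theta0 the two functions agree, so the log-space form changes only the numerical behaviour, not the density.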

pdf

pdf(y: ndarray, theta: ndarray) -> np.ndarray

Probability density function of the Log-Normal distribution.

cdf

cdf(y: ndarray, theta: ndarray) -> np.ndarray

Cumulative distribution function of the Log-Normal distribution.

ppf

ppf(p: ndarray, theta: ndarray) -> np.ndarray

Percent-point function (quantile function) of the Log-Normal distribution.

logpdf

logpdf(y: ndarray, theta: ndarray) -> np.ndarray

Logarithm of the probability density function of the Log-Normal distribution.

ondil.distributions.LogNormalMedian

Bases: ScipyMixin, Distribution

The Log-Normal distribution with median and standard deviation parameterization in the log-space.

The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log y - \log \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \log \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the median parameter and \(\sigma = \theta_1\) is the scale parameter.

ondil.distributions.Logistic

Bases: ScipyMixin, Distribution

The Logistic distribution with location and scale parameterization.

The probability density function is: $$ f(y | \mu, \sigma) = \frac{\exp\left(-\frac{y - \mu}{\sigma}\right)}{\sigma \left(1 + \exp\left(-\frac{y - \mu}{\sigma}\right)\right)^2} $$

This distribution corresponds to the LO() distribution in GAMLSS.

ondil.distributions.Exponential

Bases: ScipyMixin, Distribution

The Exponential distribution parameterized by the mean (mu).

The probability density function is: $$ f(y | \mu) = \frac{1}{\mu} \exp\left(-\frac{y}{\mu}\right) $$ where \(y > 0\) and \(\mu > 0\).

This distribution corresponds to the EXP() distribution in GAMLSS.

ondil.distributions.InverseGaussian

Bases: ScipyMixin, Distribution

Inverse Gaussian (Wald) distribution for GAMLSS.

This distribution is characterized by two parameters:

  • \(\mu\): the mean of the distribution.
  • \(\sigma\): the scale parameter, which is related to the variance.

The probability density function (PDF) is given by: $$ f(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y - \mu)^2}{2\mu^2\sigma^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\sigma > 0\).

Note that the Inverse Gaussian distribution in scipy.stats is parameterized differently:

  • mu is the mean of the distribution.
  • scale is the scale parameter

and the PDF is given by: $$ f(y; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi y^3}} \exp\left(-\frac{\lambda (y - \mu)^2}{2\mu^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\lambda > 0\).

The relationship between the parameters is:

  • mu in scipy.stats corresponds to \(\mu \sigma^2\) in this implementation,
  • scale in scipy.stats corresponds to \(1 / \sigma^2\) in this implementation.
  • The loc parameter in scipy.stats is always 0.
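
The parameter relationship listed above can be verified numerically. This is a minimal sketch that assumes the GAMLSS IG(mu, sigma) density \(\frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y-\mu)^2}{2\mu^2\sigma^2 y}\right)\) and the usual location-scale convention with loc = 0; the function names are hypothetical and only the standard library is used.

```python
import math


def gamlss_ig_pdf(y: float, mu: float, sigma: float) -> float:
    """Inverse Gaussian density in the assumed GAMLSS IG(mu, sigma) form."""
    exponent = -((y - mu) ** 2) / (2.0 * mu**2 * sigma**2 * y)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * sigma**2 * y**3)


def standard_invgauss_pdf(x: float, m: float) -> float:
    """Unit-scale inverse Gaussian density with shape parameter m."""
    exponent = -((x - m) ** 2) / (2.0 * x * m**2)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * x**3)


def scaled_invgauss_pdf(y: float, m: float, scale: float) -> float:
    """Location-scale convention with loc = 0: f(y) = f0(y / scale) / scale."""
    return standard_invgauss_pdf(y / scale, m) / scale


mu, sigma = 1.5, 0.8
scipy_mu = mu * sigma**2       # mu argument on the scipy side
scipy_scale = 1.0 / sigma**2   # scale argument on the scipy side

lhs = gamlss_ig_pdf(0.7, mu, sigma)
rhs = scaled_invgauss_pdf(0.7, scipy_mu, scipy_scale)
```

The two values agree up to floating-point error, and the product scipy_mu * scipy_scale recovers the GAMLSS mean mu.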

ondil.distributions.Beta

Bases: ScipyMixin, Distribution

The Beta Distribution for GAMLSS.

The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{\Gamma(\frac{1 - \sigma^2}{\sigma^2})} { \Gamma(\frac{\mu (1 - \sigma^2)}{\sigma^2}) \Gamma(\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2})} y^{\frac{\mu (1 - \sigma^2)}{\sigma^2} - 1} (1-y)^{\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2} - 1} $$

with the location and shape parameters \(\mu, \sigma \in (0, 1)\).

Note

The function is parameterized as GAMLSS' BE() distribution.

This parameterization is different to the scipy.stats.beta(alpha, beta, loc, scale) parameterization.

We can use Beta().theta_to_scipy_params(theta) to map the distribution parameters to scipy.

The scipy.stats.beta() distribution is defined as: $$ f(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta) x^{\alpha - 1} {(1 - x)}^{\beta - 1}}{\Gamma(\alpha) \Gamma(\beta)} $$

with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows: $$ \alpha = \mu (1 - \sigma^2) / \sigma^2 \Leftrightarrow \mu = \alpha / (\alpha + \beta) $$ and $$ \beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2 \Leftrightarrow \sigma = \frac{1}{\sqrt{\alpha + \beta + 1}} $$

Parameters:

  • loc_link (LinkFunction, default: Logit() ) –

    The link function for \(\mu\). Defaults to Logit().

  • scale_link (LinkFunction, default: Logit() ) –

    The link function for \(\sigma\). Defaults to Logit().

theta_to_scipy_params

theta_to_scipy_params(theta: ndarray) -> dict

Map GAMLSS Parameters to scipy parameters.

Parameters:

  • theta (ndarray) –

    parameters

Returns:

  • dict ( dict ) –

    Dict of (a, b, loc, scale) for scipy.stats.beta(a, b, loc, scale)
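
A round trip through the mapping makes the relationship concrete. The sketch below is a hypothetical illustration using only the standard library. The inverse for sigma, 1 / sqrt(alpha + beta + 1), follows from alpha + beta = (1 - sigma^2) / sigma^2.

```python
import math


def gamlss_to_alpha_beta(mu: float, sigma: float) -> tuple:
    """Map GAMLSS BE(mu, sigma) to the (alpha, beta) shape parameters."""
    precision = (1.0 - sigma**2) / sigma**2  # equals alpha + beta
    return mu * precision, (1.0 - mu) * precision


def alpha_beta_to_gamlss(alpha: float, beta: float) -> tuple:
    """Inverse mapping back to (mu, sigma)."""
    return alpha / (alpha + beta), 1.0 / math.sqrt(alpha + beta + 1.0)


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Beta density in the (alpha, beta) form, evaluated via log-gamma."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x)
    )


alpha, beta = gamlss_to_alpha_beta(mu=0.3, sigma=0.5)
mu_back, sigma_back = alpha_beta_to_gamlss(alpha, beta)
density = beta_pdf(0.4, alpha, beta)
```

Here `mu_back` and `sigma_back` recover the inputs 0.3 and 0.5 up to floating-point error.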

ondil.distributions.Gumbel

Bases: ScipyMixin, Distribution

The Gumbel distribution.

The probability density function is given by: $$ f(y|\mu, \sigma) = \frac{1}{\sigma} \exp\left(-\left(z + \exp(-z)\right)\right) $$ where \(z = (y - \mu)/\sigma\) and has the following parameters:

  • \(\mu\): location
  • \(\sigma\): scale (>0)

ondil.distributions.BetaInflated

Bases: Distribution

The Beta Inflated Distribution for GAMLSS.

The distribution function is defined as in GAMLSS as: $$ f_Y(y \mid \mu, \sigma, \nu, \tau) = \begin{cases} p_0 & \text{if } y = 0 \\ (1 - p_0 - p_1) \dfrac{1}{B(\alpha, \beta)} y^{\alpha - 1}(1 - y)^{\beta - 1} & \text{if } 0 < y < 1 \\ p_1 & \text{if } y = 1 \end{cases} $$

where \(\alpha = \mu (1 - \sigma^2) / \sigma^2\), \(\beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2\), \(p_0 = \nu (1 + \nu + \tau)^{-1}\) and \(p_1 = \tau (1 + \nu + \tau)^{-1}\),

with \(\mu, \sigma \in (0,1)\) and \(\nu, \tau > 0\).

The parameter tuple \(\theta\) in Python is defined as:

\(\theta = (\theta_0, \theta_1, \theta_2, \theta_3) = (\mu, \sigma, \nu, \tau)\) where \(\mu = \theta_0\) is the location parameter, \(\sigma = \theta_1\) is the scale parameter and \(\nu, \tau = \theta_2, \theta_3\) are shape parameters which together define the inflation at 0 and 1.

This distribution corresponds to the BEINF() distribution in GAMLSS.
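
The way \(\nu\) and \(\tau\) translate into point masses can be sketched in a few lines. The function name is hypothetical; the formulas are the ones for \(p_0\) and \(p_1\) given above.

```python
def inflation_probabilities(nu: float, tau: float) -> tuple:
    """Return (p0, p1, interior mass) for the inflation parameters nu, tau > 0."""
    if nu <= 0 or tau <= 0:
        raise ValueError("nu and tau must be positive")
    total = 1.0 + nu + tau
    p0 = nu / total   # probability of observing exactly 0
    p1 = tau / total  # probability of observing exactly 1
    return p0, p1, 1.0 - p0 - p1


p0, p1, p_interior = inflation_probabilities(nu=0.5, tau=0.25)
```

The remaining mass `p_interior` equals 1 / (1 + nu + tau) and is what scales the beta density on the open interval (0, 1).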

ondil.distributions.ReverseGumbel

Bases: ScipyMixin, Distribution

The Reverse Gumbel (Type I minimum extreme value) distribution with location (mu) and scale (sigma) parameters.

The probability density function is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sigma} \exp\left( \frac{y - \mu}{\sigma} - \exp\left( \frac{y - \mu}{\sigma} \right) \right) $$

This distribution corresponds to the RG() distribution in GAMLSS.

Notes
  • Mean = mu + digamma(1) * sigma ≈ mu - 0.5772157 * sigma (digamma(1) ≈ -0.5772157)
  • Variance = (pi^2 * sigma^2) / 6 ≈ 1.64493 * sigma^2

ondil.distributions.InverseGamma

Bases: ScipyMixin, Distribution

The Inverse Gamma distribution as parameterized in GAMLSS:

Parameters:

  • mu –

    mean-related parameter

  • sigma –

    dispersion parameter

Reparameterization

  • α = 1 / sigma²
  • scale = mu * (1 + sigma²) / sigma²

This distribution corresponds to IGAMMA() in GAMLSS.
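
The reparameterization can be sketched as follows, assuming the standard inverse-gamma density with shape α and scale as the target. The function names are hypothetical. The observation that the mode of that density, scale / (α + 1), equals mu is a property of the standard density under this mapping and is not taken from the ondil documentation.

```python
import math


def gamlss_to_invgamma(mu: float, sigma: float) -> tuple:
    """Map (mu, sigma) to the (alpha, scale) pair given above."""
    alpha = 1.0 / sigma**2
    scale = mu * (1.0 + sigma**2) / sigma**2
    return alpha, scale


def invgamma_pdf(y: float, alpha: float, scale: float) -> float:
    """Standard inverse-gamma density with shape alpha and the given scale."""
    log_pdf = (
        alpha * math.log(scale)
        - math.lgamma(alpha)
        - (alpha + 1.0) * math.log(y)
        - scale / y
    )
    return math.exp(log_pdf)


alpha, scale = gamlss_to_invgamma(mu=2.0, sigma=0.5)
mode = scale / (alpha + 1.0)  # mode of the standard inverse-gamma density
```

With mu = 2.0 and sigma = 0.5 this gives alpha = 4.0 and scale = 10.0, so the density peaks at y = 2.0.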

ondil.distributions.ZeroAdjustedGamma

Bases: Distribution

The Zero Adjusted Gamma Distribution for GAMLSS.

The zero adjusted gamma distribution is a mixture of a discrete value 0 with probability \(\nu\), and a gamma \(GA(\mu, \sigma)\) distribution on the positive real line \((0, \infty)\) with probability \((1 - \nu)\).

\[ f_Y(y \mid \mu, \sigma, \nu) = \begin{cases} \nu & \text{if } y = 0 \\ (1 - \nu) f_W(y \mid \mu, \sigma) & \text{if } y > 0 \end{cases} \]

where \(y\) is the observed data, \(f_W\) is the density of the gamma \(GA(\mu, \sigma)\) distribution, \(\mu > 0\) is the location parameter, \(\sigma > 0\) is the scale parameter, and \(\nu \in (0, 1)\) is the inflation parameter, that is, the probability of observing exactly zero.
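
The mixture structure can be sketched with a point mass at zero plus a scaled gamma density. This is a minimal standard-library illustration with hypothetical function names; the gamma part uses the GA(mu, sigma) parameterization with shape 1 / sigma^2 and scale sigma^2 * mu, and the midpoint-rule integration is only a sanity check that the total mass is one.

```python
import math


def gamma_pdf(y: float, mu: float, sigma: float) -> float:
    """Gamma density in the GAMLSS GA(mu, sigma) parameterization."""
    shape = 1.0 / sigma**2
    scale = sigma**2 * mu
    log_pdf = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
    )
    return math.exp(log_pdf)


def zaga_density(y: float, mu: float, sigma: float, nu: float) -> float:
    """Point mass nu at zero; (1 - nu) times the gamma density for y > 0."""
    if y < 0:
        return 0.0
    if y == 0:
        return nu
    return (1.0 - nu) * gamma_pdf(y, mu, sigma)


def total_mass(mu, sigma, nu, upper=40.0, steps=40_000):
    """Point mass plus a midpoint-rule integral of the continuous part."""
    width = upper / steps
    continuous = width * sum(
        zaga_density((i + 0.5) * width, mu, sigma, nu) for i in range(steps)
    )
    return zaga_density(0.0, mu, sigma, nu) + continuous


mass = total_mass(mu=2.0, sigma=0.5, nu=0.3)
```

The value of `mass` is numerically close to one, which only holds when \(\nu\) is treated as a probability.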

ondil.distributions.BetaInflatedZero

Bases: Distribution

The Zero Inflated Beta Distribution for GAMLSS.

\[ f_Y(y \mid \mu, \sigma, \nu) = \begin{cases} p_0 & \text{if } y = 0 \\ (1 - p_0) f_W(y \mid \mu, \sigma) & \text{if } 0 < y < 1 \end{cases} \]

where \(p_0 = \nu (1 + \nu)^{-1}\),

with \(\mu, \sigma \in (0,1)\) and \(\nu > 0\).

Base Class

ondil.base.Distribution

Bases: ABC

corresponding_gamlss property

corresponding_gamlss: str | None

The name of the corresponding implementation in 'gamlss.dist' R package.

parameter_names abstractmethod property

parameter_names: dict

Parameter name for each column of theta.

n_params property

n_params: int

Each subclass must define 'n_params'.

distribution_support abstractmethod property

distribution_support: Tuple[float, float]

The support of the distribution.

parameter_support abstractmethod property

parameter_support: dict

The support of each parameter of the distribution.

theta_to_params

theta_to_params(theta: ndarray) -> Tuple[np.ndarray, ...]

Take the fitted values and return tuple of vectors for distribution parameters.

dl1_dp1 abstractmethod

dl1_dp1(
    y: ndarray, theta: ndarray, param: int
) -> np.ndarray

Take the first derivative of the likelihood function with respect to the param.

dl2_dp2 abstractmethod

dl2_dp2(
    y: ndarray, theta: ndarray, param: int
) -> np.ndarray

Take the second derivative of the likelihood function with respect to the param.

dl2_dpp abstractmethod

dl2_dpp(
    y: ndarray, theta: ndarray, params: Tuple[int, int]
) -> np.ndarray

Take the mixed second derivative (cross derivative) of the likelihood function with respect to both parameters.

link_function(y: ndarray, param: int = 0) -> np.ndarray

Apply the link function for param on y.

link_inverse(y: ndarray, param: int = 0) -> np.ndarray

Apply the inverse of the link function for param on y.

link_function_derivative(
    y: ndarray, param: int = 0
) -> np.ndarray

Apply the derivative of the link function for param on y.

link_inverse_derivative(
    y: ndarray, param: int = 0
) -> np.ndarray

Apply the derivative of the inverse link function for param on y.
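
To make the four link-related methods concrete, here is a minimal sketch of a log link and the identities that connect them. The class `LogLinkSketch` and its method names are hypothetical stand-ins and do not reproduce ondil's LinkFunction interface.

```python
import math


class LogLinkSketch:
    """Hypothetical log link: eta = log(x), with inverse x = exp(eta)."""

    def link(self, x: float) -> float:
        return math.log(x)

    def inverse(self, eta: float) -> float:
        return math.exp(eta)

    def link_derivative(self, x: float) -> float:
        # d/dx log(x) = 1 / x
        return 1.0 / x

    def inverse_derivative(self, eta: float) -> float:
        # d/d(eta) exp(eta) = exp(eta)
        return math.exp(eta)


link = LogLinkSketch()
sigma = 2.5
eta = link.link(sigma)            # move to the unconstrained scale
round_trip = link.inverse(eta)    # and back to the parameter scale
# By the inverse function theorem the two derivatives multiply to one.
product = link.link_derivative(sigma) * link.inverse_derivative(eta)
```

A log link keeps a positive parameter such as a scale positive, since any real value of eta maps back to exp(eta) > 0.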

initial_values abstractmethod

initial_values(
    y: ndarray,
    param: int = 0,
    axis: Optional[int | None] = None,
) -> np.ndarray

Calculate the initial values for the GAMLSS fit.

quantile

quantile(q: ndarray, theta: ndarray) -> np.ndarray

Compute the quantile function for the given data.

This is an alias for the ppf method.

Parameters:

  • q (ndarray) –

    The probabilities at which to compute the quantiles.

  • theta (ndarray) –

    The parameters of the distribution.

Returns:

  • ndarray

    np.ndarray: The quantiles corresponding to the given probabilities.

calculate_conditional_initial_values abstractmethod

calculate_conditional_initial_values(
    y: ndarray, theta: ndarray, param: int
) -> np.ndarray

Calculate the conditional initial values for the GAMLSS fit.

cdf abstractmethod

cdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the cumulative distribution function (CDF) for the given data.

Parameters:

  • y (ndarray) –

    The data points at which to evaluate the CDF.

  • theta (ndarray) –

    The parameters of the distribution.

Returns:

  • ndarray

    np.ndarray: The CDF evaluated at the given data points.

pdf abstractmethod

pdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the probability density function (PDF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the PDF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of PDF values corresponding to the data points in y.

pmf abstractmethod

pmf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the probability mass function (PMF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the PMF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of PMF values corresponding to the data points in y.

ppf abstractmethod

ppf(q: ndarray, theta: ndarray) -> np.ndarray

Percent Point Function (Inverse of CDF).

Parameters:

  • q (ndarray) –

    Probabilities at which to evaluate the quantile function.

  • theta (ndarray) –

    Distribution parameters.

Returns:

  • ndarray

    np.ndarray: The quantile corresponding to the given probabilities.

rvs abstractmethod

rvs(size: int, theta: ndarray) -> np.ndarray

Generate random variates of given size and parameters.

Parameters:

  • size (int) –

    The number of random variates to generate.

  • theta (ndarray) –

    The parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: A 2D array of random variates with shape (theta.shape[0], size).

logcdf abstractmethod

logcdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the cumulative distribution function (CDF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the log CDF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of log CDF values corresponding to the data points in y.

ondil.base.ScipyMixin

Bases: ABC

parameter_names abstractmethod property

parameter_names: dict

Parameter name for each column of theta.

scipy_dist abstractmethod property

scipy_dist: rv_continuous

The scipy.stats distribution object that this distribution wraps.

scipy_names abstractmethod property

scipy_names: Tuple[str]

The names of the parameters in the scipy.stats distribution and the corresponding column in theta.

theta_to_scipy_params

theta_to_scipy_params(
    theta: ndarray,
) -> Dict[str, np.ndarray]

Maps \(\theta\) to the scipy parameters.

Parameters:

  • theta (ndarray) –

    \(\theta\) as estimated by OnlineDistributionalRegression() estimator

Raises:

  • ValueError

    If the scipy_names attribute is not defined.

Returns:

  • dict ( Dict[str, ndarray] ) –

    Dictionary that can be unrolled into scipy distribution class as in st.some_dist(**return_value)

logpmf

logpmf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the probability mass function (PMF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the log PMF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of log PMF values corresponding to the data points in y.

logpdf

logpdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the probability density function (PDF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the log PDF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of log PDF values corresponding to the data points in y.

logcdf

logcdf(y: ndarray, theta: ndarray) -> np.ndarray

Compute the log of the cumulative distribution function (CDF) for the given data points.

Parameters:

  • y (ndarray) –

    An array of data points at which to evaluate the log CDF.

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of log CDF values corresponding to the data points in y.

mean

mean(theta: ndarray) -> np.ndarray

Compute the mean of the distribution for the given parameters.

Parameters:

  • theta (ndarray) –

    An array of parameters for the distribution.

Returns:

  • ndarray

    np.ndarray: An array of means corresponding to the parameters in theta.