Distributions
This page is the reference for all distribution objects implemented in the ondil package.
Note
This page is somewhat under construction, since MkDocs
does not support docstring inheritance at the moment.
All distributions are based on scipy.stats distributions. We implement the probability density function (PDF), the cumulative distribution function (CDF), the percent point (quantile) function (PPF) and random variate generation (RVS) as pass-throughs to scipy.stats. The link functions are implemented in the same way as in GAMLSS (Rigby & Stasinopoulos, 2005). The link functions and their derivatives derive from the LinkFunction base class.
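As a rough illustration of what a link function provides, the sketch below pairs a link with its inverse and its derivative. The class names `IdentityLinkSketch` and `LogLinkSketch` and their method names are hypothetical stand-ins chosen for this example; they are not the ondil LinkFunction API.

```python
import math


class IdentityLinkSketch:
    """Hypothetical identity link: eta = theta."""

    def link(self, theta: float) -> float:
        return theta

    def inverse(self, eta: float) -> float:
        return eta

    def link_derivative(self, theta: float) -> float:
        # d eta / d theta
        return 1.0


class LogLinkSketch:
    """Hypothetical log link: eta = log(theta), for parameters with theta > 0."""

    def link(self, theta: float) -> float:
        return math.log(theta)

    def inverse(self, eta: float) -> float:
        # exp maps any real-valued predictor back to a positive parameter
        return math.exp(eta)

    def link_derivative(self, theta: float) -> float:
        # d eta / d theta = 1 / theta
        return 1.0 / theta


log_link = LogLinkSketch()
sigma = 2.5
eta = log_link.link(sigma)          # predictor scale, unconstrained
sigma_back = log_link.inverse(eta)  # back on the parameter scale
```

A log link is a natural default for scale parameters because its inverse is positive for every real-valued predictor.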
Base Classes
Base Distribution | Description |
---|---|
Distribution | Base class for all distributions. |
ScipyMixin | Base class for all distributions that are based on scipy. |
List of Distributions
Distribution | Description | scipy Equivalent |
---|---|---|
DistributionNormal | Gaussian (mean and standard deviation) | scipy.stats.norm |
DistributionNormalMeanVariance | Gaussian (mean and variance) | scipy.stats.norm |
DistributionT | Student's \(t\) distribution | scipy.stats.t |
DistributionJSU | Johnson's SU distribution | scipy.stats.johnsonsu |
DistributionGamma | Gamma distribution | scipy.stats.gamma |
DistributionLogNormal | Log-normal distribution | scipy.stats.lognorm |
DistributionLogNormalMedian | Log-normal distribution (median) | - |
DistributionLogistic | Logistic distribution | scipy.stats.logistic |
DistributionExponential | Exponential distribution | scipy.stats.expon |
DistributionBeta | Beta distribution | scipy.stats.beta |
DistributionGumbel | Gumbel distribution | scipy.stats.gumbel_r |
DistributionInverseGaussian | Inverse Gaussian distribution | scipy.stats.invgauss |
DistributionBetaInflated | Beta Inflated distribution | - |
DistributionReverseGumbel | Reverse Gumbel distribution | scipy.stats.gumbel_r |
DistributionInverseGamma | Inverse Gamma distribution | scipy.stats.invgamma |
API Reference
ondil.DistributionNormal
Bases: ScipyMixin, Distribution
The Normal distribution with mean and standard deviation parameterization.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1^2}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.
This distribution corresponds to the NO() distribution in GAMLSS.
__init__
Initialize the DistributionNormal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loc_link | LinkFunction | Location link. Defaults to IdentityLink(). | IdentityLink() |
scale_link | LinkFunction | Scale link. Defaults to LogLink(). | LogLink() |
ondil.DistributionNormalMeanVariance
Bases: ScipyMixin, Distribution
The Normal distribution with mean and variance parameterization.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma^2 = \theta_1\) is the scale parameter.
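The two Normal parameterizations describe the same density once \(\theta_1\) is read as \(\sigma\) in one case and \(\sigma^2\) in the other. The following sketch checks this with plain Python; the function names are made up for the example and nothing here calls ondil.

```python
import math


def normal_pdf_mean_sd(y: float, mu: float, sigma: float) -> float:
    """Normal density with theta_1 = sigma (standard deviation)."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def normal_pdf_mean_var(y: float, mu: float, variance: float) -> float:
    """Normal density with theta_1 = sigma^2 (variance)."""
    return math.exp(-((y - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


y, mu, sigma = 1.3, 0.5, 2.0
pdf_sd = normal_pdf_mean_sd(y, mu, sigma)
# Passing sigma**2 as the second parameter gives the same density value.
pdf_var = normal_pdf_mean_var(y, mu, sigma**2)
```

In scipy.stats.norm terms, both cases correspond to `loc=mu` with `scale` equal to the standard deviation, so the variance parameterization needs a square root before the pass-through.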
__init__
Initialize the DistributionNormalMeanVariance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loc_link | LinkFunction | Location link. Defaults to IdentityLink(). | IdentityLink() |
scale_link | LinkFunction | Scale link. Defaults to LogLink(). | LogLink() |
ondil.DistributionT
ondil.DistributionJSU
Bases: ScipyMixin, Distribution
Corresponds to GAMLSS JSUo() and scipy.stats.johnsonsu()
Distribution parameters:
- 0: Location
- 1: Scale (close to standard deviation)
- 2: Skewness
- 3: Tail behaviour
ondil.DistributionGamma
Bases: ScipyMixin, Distribution
The Gamma Distribution for GAMLSS.
The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{y^{(1/\sigma^2-1)}\exp[-y/(\sigma^2 \mu)]}{(\sigma^2 \mu)^{(1/\sigma^2)} \Gamma(1/\sigma^2)} $$
with the location and shape parameters \(\mu, \sigma > 0\).
Note
The function is parameterized as GAMLSS' GA() distribution.
This parameterization is different from the scipy.stats.gamma(alpha, loc, scale) parameterization.
We can use DistributionGamma().theta_to_scipy_params(theta) to map the distribution parameters to scipy.
The scipy.stats.gamma() distribution is defined as:
$$
f(x, \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} \exp[-\beta x]}{\Gamma(\alpha)}
$$
with the parameters \(\alpha, \beta >0\). The parameters can be mapped as follows:
$$
\alpha = 1/\sigma^2 \Leftrightarrow \sigma = \sqrt{1 / \alpha}
$$
and
$$
\beta = 1/(\sigma^2\mu).
$$
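The mapping above can be checked numerically. This is a minimal sketch in plain Python: the helper names are hypothetical, and the two log-densities are written out by hand from the formulas in this section rather than taken from ondil or scipy.

```python
import math


def gamlss_gamma_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log-density of the GA(mu, sigma) parameterization."""
    inv_s2 = 1.0 / sigma**2
    return (
        (inv_s2 - 1.0) * math.log(y)
        - y / (sigma**2 * mu)
        - inv_s2 * math.log(sigma**2 * mu)
        - math.lgamma(inv_s2)
    )


def shape_rate_gamma_logpdf(x: float, alpha: float, beta: float) -> float:
    """Log-density of the shape/rate form with shape alpha and rate beta."""
    return alpha * math.log(beta) + (alpha - 1.0) * math.log(x) - beta * x - math.lgamma(alpha)


def gamlss_to_shape_rate(mu: float, sigma: float) -> tuple:
    """Apply alpha = 1 / sigma^2 and beta = 1 / (sigma^2 * mu)."""
    return 1.0 / sigma**2, 1.0 / (sigma**2 * mu)


mu, sigma, y = 3.0, 0.5, 2.2
alpha, beta = gamlss_to_shape_rate(mu, sigma)
lp_gamlss = gamlss_gamma_logpdf(y, mu, sigma)
lp_shape_rate = shape_rate_gamma_logpdf(y, alpha, beta)
# The mean of the shape/rate form is alpha / beta, which recovers mu.
mean_from_shape_rate = alpha / beta
```

Note that scipy.stats.gamma takes a scale rather than a rate, so the rate \(\beta\) would be passed as `scale = 1 / beta`.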
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loc_link | LinkFunction | The link function for \(\mu\). Defaults to LogLink(). | LogLink() |
scale_link | LinkFunction | The link function for \(\sigma\). Defaults to LogLink(). | LogLink() |
ondil.DistributionLogNormal
Bases: ScipyMixin, Distribution
The Log-Normal distribution with mean and standard deviation parameterization in the log-space.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.
Note
The re-parameterization used to move from scipy.stats to GAMLSS is: $$ \mu = \exp(\theta_0) $$ and can therefore be numerically unstable for large values of \(\theta_0\). We have re-implemented the PDF, CDF and PPF to avoid this issue; however, the rvs method still uses the scipy.stats implementation, which is not numerically stable for large values of \(\theta_0\).
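To see why working in log-space helps, the sketch below evaluates the log-density without ever forming \(\exp(\theta_0)\). It is an illustration written from the density formula above, not the ondil implementation, and the function name is hypothetical.

```python
import math


def lognormal_logpdf(y: float, theta0: float, theta1: float) -> float:
    """Log-density evaluated in log-space; exp(theta0) is never computed."""
    z = (math.log(y) - theta0) / theta1
    return -math.log(y) - math.log(theta1) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


theta0, theta1 = 800.0, 1.0

# Forming exp(theta0) directly exceeds the range of double precision.
try:
    math.exp(theta0)
    overflowed = False
except OverflowError:
    overflowed = True

# The log-space evaluation stays finite for the same theta0.
stable_value = lognormal_logpdf(1e300, theta0, theta1)

# For moderate parameters it agrees with the log of the density formula.
direct = math.log(
    math.exp(-((math.log(2.0) - 0.3) ** 2) / (2 * 0.7**2)) / (2.0 * 0.7 * math.sqrt(2 * math.pi))
)
via_logspace = lognormal_logpdf(2.0, 0.3, 0.7)
```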
cdf
Cumulative distribution function of the Log-Normal distribution.
logpdf
Logarithm of the probability density function of the Log-Normal distribution.
pdf
Probability density function of the Log-Normal distribution.
ondil.DistributionLogNormalMedian
Bases: ScipyMixin, Distribution
The Log-Normal distribution with median and standard deviation parameterization in the log-space.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log y - \log \mu)^2}{2\sigma^2}\right). $$ respectively $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \log \theta_0)^2}{2\theta_1^2}\right). $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the median parameter and \(\sigma = \theta_1\) is the scale parameter.
ondil.DistributionLogistic
Bases: ScipyMixin, Distribution
The Logistic distribution with location and scale parameterization.
The probability density function is: $$ f(y | \mu, \sigma) = \frac{\exp\left(-\frac{y - \mu}{\sigma}\right)}{\sigma \left(1 + \exp\left(-\frac{y - \mu}{\sigma}\right)\right)^2} $$
This distribution corresponds to the LO() distribution in GAMLSS.
ondil.DistributionExponential
Bases: ScipyMixin, Distribution
The Exponential distribution parameterized by the mean (mu).
The probability density function is: $$ f(y | \mu) = \frac{1}{\mu} \exp\left(-\frac{y}{\mu}\right), \quad y > 0, \ \mu > 0. $$
This corresponds to EXP() in GAMLSS.
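With the mean parameterization, the PDF, CDF and PPF all have closed forms. The sketch below writes them out in plain Python (the function names are hypothetical) and checks that the PPF inverts the CDF.

```python
import math


def exp_pdf(y: float, mu: float) -> float:
    """Density (1 / mu) * exp(-y / mu) for y > 0."""
    return math.exp(-y / mu) / mu


def exp_cdf(y: float, mu: float) -> float:
    """CDF 1 - exp(-y / mu); expm1 keeps precision for small y / mu."""
    return -math.expm1(-y / mu)


def exp_ppf(q: float, mu: float) -> float:
    """Quantile function -mu * log(1 - q) for a probability q in [0, 1)."""
    return -mu * math.log1p(-q)


mu = 2.0
y = 1.7
round_trip = exp_ppf(exp_cdf(y, mu), mu)  # should recover y
median = exp_ppf(0.5, mu)                 # equals mu * log(2)
```

In scipy.stats.expon the mean enters as the `scale` argument, so this parameterization corresponds to `scale=mu` with `loc=0`.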
ondil.DistributionInverseGaussian
Bases: ScipyMixin, Distribution
Inverse Gaussian (Wald) distribution for GAMLSS.
This distribution is characterized by two parameters:
- \(\mu\): the mean of the distribution.
- \(\sigma\): the scale parameter, which is related to the variance.

The probability density function (PDF) is given by: $$ f(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y - \mu)^2}{2\mu^2\sigma^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\sigma > 0\).
Note that the Inverse Gaussian distribution in scipy.stats is parameterized differently:
- mu is the mean of the distribution.
- scale is the scale parameter

and the PDF is given by: $$ f(y; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi y^3}} \exp\left(-\frac{\lambda (y - \mu)^2}{2\mu^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\lambda > 0\).
The relationship between the parameters is:
- mu in scipy.stats corresponds to \(\mu \sigma^2\) in this implementation,
- scale in scipy.stats corresponds to \(1 / \sigma^2\) in this implementation.
- The loc parameter in scipy.stats is always 0.
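The parameter relationship can be verified by hand. The sketch below assumes the GAMLSS IG() density \(\frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y-\mu)^2}{2\mu^2\sigma^2 y}\right)\) and the standardized density used by scipy.stats.invgauss, \(\frac{1}{\sqrt{2\pi x^3}} \exp\left(-\frac{(x-m)^2}{2 x m^2}\right)\), with a scale applied and loc fixed at 0. Both are coded directly; the function names are hypothetical.

```python
import math


def gamlss_ig_pdf(y: float, mu: float, sigma: float) -> float:
    """Inverse Gaussian density in the (mu, sigma) parameterization."""
    exponent = -((y - mu) ** 2) / (2 * mu**2 * sigma**2 * y)
    return math.exp(exponent) / math.sqrt(2 * math.pi * sigma**2 * y**3)


def scaled_standard_ig_pdf(y: float, mu_s: float, scale: float) -> float:
    """Standardized density evaluated at y / scale and divided by scale (loc = 0)."""
    x = y / scale
    g = math.exp(-((x - mu_s) ** 2) / (2 * x * mu_s**2)) / math.sqrt(2 * math.pi * x**3)
    return g / scale


mu, sigma, y = 2.0, 0.6, 1.4
mu_s = mu * sigma**2     # "mu" on the scipy side
scale = 1.0 / sigma**2   # "scale" on the scipy side
pdf_gamlss = gamlss_ig_pdf(y, mu, sigma)
pdf_scaled = scaled_standard_ig_pdf(y, mu_s, scale)
```

The product `mu_s * scale` equals \(\mu\), which is why the mean is preserved under the mapping.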
ondil.DistributionBeta
Bases: ScipyMixin, Distribution
The Beta Distribution for GAMLSS.
The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{\Gamma(\frac{1 - \sigma^2}{\sigma^2})} { \Gamma(\frac{\mu (1 - \sigma^2)}{\sigma^2}) \Gamma(\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2})} y^{\frac{\mu (1 - \sigma^2)}{\sigma^2} - 1} (1-y)^{\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2} - 1} $$
with the location and scale parameters \(0 < \mu < 1\) and \(0 < \sigma < 1\).
Note
The function is parameterized as GAMLSS' BE() distribution.
This parameterization is different from the scipy.stats.beta(alpha, beta, loc, scale) parameterization.
We can use DistributionBeta().gamlss_to_scipy(mu, sigma) to map the distribution parameters to scipy.
The scipy.stats.beta() distribution is defined as:
$$
f(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta) x^{\alpha - 1} {(1 - x)}^{\beta - 1}}{\Gamma(\alpha) \Gamma(\beta)}
$$
with the parameters \(\alpha, \beta >0\). The parameters can be mapped as follows: $$ \alpha = \mu (1 - \sigma^2) / \sigma^2 \Leftrightarrow \mu = \alpha / (\alpha + \beta) $$ and $$ \beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2 \Leftrightarrow \sigma = \sqrt{1 / (\alpha + \beta + 1)} $$
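A round trip through the mapping is a quick sanity check. In the sketch below the inverse for \(\sigma\) is \(\sqrt{1/(\alpha+\beta+1)}\), which follows from \(\alpha + \beta = (1-\sigma^2)/\sigma^2\). The helper names are hypothetical and do not refer to the ondil gamlss_to_scipy method.

```python
import math


def gamlss_to_alpha_beta(mu: float, sigma: float) -> tuple:
    """Map (mu, sigma) with 0 < mu < 1 and 0 < sigma < 1 to (alpha, beta)."""
    k = (1.0 - sigma**2) / sigma**2  # equals alpha + beta
    return mu * k, (1.0 - mu) * k


def alpha_beta_to_gamlss(alpha: float, beta: float) -> tuple:
    """Inverse map: mu = alpha / (alpha + beta), sigma = sqrt(1 / (alpha + beta + 1))."""
    return alpha / (alpha + beta), math.sqrt(1.0 / (alpha + beta + 1.0))


mu, sigma = 0.3, 0.4
alpha, beta = gamlss_to_alpha_beta(mu, sigma)
mu_back, sigma_back = alpha_beta_to_gamlss(alpha, beta)
```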
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loc_link | LinkFunction | The link function for \(\mu\). Defaults to LogitLink(). | LogitLink() |
scale_link | LinkFunction | The link function for \(\sigma\). Defaults to LogitLink(). | LogitLink() |
ondil.DistributionGumbel
Bases: ScipyMixin, Distribution
The Gumbel distribution.
The probability density function is given by: $$ f(y|\mu, \sigma) = \frac{1}{\sigma} \exp\left(-\left(z + \exp(-z)\right)\right) $$ where \(z = (y - \mu)/\sigma\). The distribution has the following parameters:
- \(\mu\): location
- \(\sigma\): scale (>0)
ondil.DistributionBetaInflated
ondil.DistributionReverseGumbel
Bases: ScipyMixin, Distribution
The Reverse Gumbel (Type I minimum extreme value) distribution with location (mu) and scale (sigma) parameters.
The probability density function is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sigma} \exp\left( \frac{y - \mu}{\sigma} - \exp\left( \frac{y - \mu}{\sigma} \right) \right) $$
This distribution corresponds to the RG() distribution in GAMLSS.
Notes
- Mean = mu + digamma(1) * sigma ≈ mu - 0.5772157 * sigma
- Variance = (pi^2 * sigma^2) / 6 ≈ 1.64493 * sigma^2
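The numerical values in the notes can be reproduced by integrating the density given above. The sketch uses a simple midpoint rule in plain Python; the integration range and step count are choices made for this example, and the function names are hypothetical.

```python
import math


def reverse_gumbel_pdf(y: float, mu: float, sigma: float) -> float:
    """Density (1 / sigma) * exp(z - exp(z)) with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    return math.exp(z - math.exp(z)) / sigma


def midpoint_moments(mu: float, sigma: float, n: int = 200_000) -> tuple:
    """Approximate total mass, mean and variance with the midpoint rule."""
    # The left tail is long and the right tail is short, hence the asymmetric range.
    lower, upper = mu - 40.0 * sigma, mu + 10.0 * sigma
    h = (upper - lower) / n
    total = first = second = 0.0
    for i in range(n):
        y = lower + (i + 0.5) * h
        w = reverse_gumbel_pdf(y, mu, sigma) * h
        total += w
        first += y * w
        second += y * y * w
    return total, first, second - first**2


mu, sigma = 1.0, 2.0
total, mean, variance = midpoint_moments(mu, sigma)
expected_mean = mu - 0.5772156649 * sigma
expected_variance = math.pi**2 * sigma**2 / 6.0
```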
ondil.DistributionInverseGamma
Bases: ScipyMixin, Distribution
The Inverse Gamma distribution as parameterized in GAMLSS.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mu | - | mean-related parameter | required |
sigma | - | dispersion parameter | required |
Reparameterization
- α = 1 / sigma²
- scale = mu * (1 + sigma²) / sigma²
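The reparameterization can be applied as in the following sketch, which uses the standard inverse gamma density with shape α and a scale parameter, written out by hand. The helper names are hypothetical. One consequence worth noting is that the mode of that density, scale / (α + 1), equals mu under this mapping.

```python
import math


def gamlss_to_inverse_gamma(mu: float, sigma: float) -> tuple:
    """Apply alpha = 1 / sigma^2 and scale = mu * (1 + sigma^2) / sigma^2."""
    alpha = 1.0 / sigma**2
    scale = mu * (1.0 + sigma**2) / sigma**2
    return alpha, scale


def inverse_gamma_logpdf(y: float, alpha: float, scale: float) -> float:
    """Log-density of the inverse gamma with shape alpha and the given scale."""
    return alpha * math.log(scale) - math.lgamma(alpha) - (alpha + 1.0) * math.log(y) - scale / y


mu, sigma = 2.0, 0.5
alpha, scale = gamlss_to_inverse_gamma(mu, sigma)
mode = scale / (alpha + 1.0)  # closed-form mode of the inverse gamma

# The log-density is highest at the mode and lower on either side.
at_mode = inverse_gamma_logpdf(mode, alpha, scale)
left = inverse_gamma_logpdf(mode - 0.1, alpha, scale)
right = inverse_gamma_logpdf(mode + 0.1, alpha, scale)
```

In scipy.stats.invgamma terms the shape is passed as `a` and the scale as `scale`.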
This distribution corresponds to IGAMMA() in GAMLSS.
Base Class
ondil.base.Distribution
Bases: ABC
corresponding_gamlss
property
The name of the corresponding implementation in 'gamlss.dist' R package.
distribution_support
abstractmethod
property
The support of the distribution.
parameter_names
abstractmethod
property
Parameter name for each column of theta.
parameter_support
abstractmethod
property
The support of each parameter of the distribution.
calculate_conditional_initial_values
abstractmethod
Calculate the conditional initial values for the GAMLSS fit.
cdf
abstractmethod
Compute the cumulative distribution function (CDF) for the given data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | The data points at which to evaluate the CDF. | required |
theta | ndarray | The parameters of the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | The CDF evaluated at the given data points. |
dl1_dp1
abstractmethod
Take the first derivative of the likelihood function with respect to the param.
dl2_dp2
abstractmethod
Take the second derivative of the likelihood function with respect to the param.
dl2_dpp
abstractmethod
Take the cross derivative (mixed second derivative) of the likelihood function with respect to both parameters.
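To make the three kinds of derivative concrete, the sketch below writes out the analytic derivatives of the Normal log-likelihood with respect to \(\mu\) and \(\sigma\) and checks them against central finite differences. These are observed derivatives derived by hand for this example; whether the ondil methods return observed or expected second derivatives is not stated on this page, and the function names here are hypothetical.

```python
import math


def normal_loglik(y: float, mu: float, sigma: float) -> float:
    """Log-likelihood of one observation under the Normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (y - mu) ** 2 / (2 * sigma**2)


def dl_dmu(y: float, mu: float, sigma: float) -> float:
    """First derivative with respect to mu."""
    return (y - mu) / sigma**2


def dl_dsigma(y: float, mu: float, sigma: float) -> float:
    """First derivative with respect to sigma."""
    return ((y - mu) ** 2 - sigma**2) / sigma**3


def d2l_dmu_dsigma(y: float, mu: float, sigma: float) -> float:
    """Cross derivative with respect to mu and sigma."""
    return -2.0 * (y - mu) / sigma**3


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Numerical derivative of a one-argument function f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


y, mu, sigma = 1.2, 0.4, 1.5
num_dmu = central_difference(lambda m: normal_loglik(y, m, sigma), mu)
num_dsigma = central_difference(lambda s: normal_loglik(y, mu, s), sigma)
# Differentiating the mu-score with respect to sigma gives the cross derivative.
num_cross = central_difference(lambda s: dl_dmu(y, mu, s), sigma)
```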
initial_values
abstractmethod
Calculate the initial values for the GAMLSS fit.
link_function
Apply the link function for param on y.
link_function_derivative
Apply the derivative of the link function for param on y.
link_inverse
Apply the inverse of the link function for param on y.
link_inverse_derivative
Apply the derivative of the inverse link function for param on y.
logcdf
abstractmethod
Compute the log of the cumulative distribution function (CDF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the log CDF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of log CDF values corresponding to the data points. |
pdf
abstractmethod
Compute the probability density function (PDF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the PDF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of PDF values corresponding to the data points. |
pmf
abstractmethod
Compute the probability mass function (PMF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the PMF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of PMF values corresponding to the data points. |
ppf
abstractmethod
Percent Point Function (Inverse of CDF).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
q | ndarray | Probabilities at which to evaluate the quantile function. | required |
theta | ndarray | Distribution parameters. | required |
Returns:
Type | Description |
---|---|
ndarray | The quantiles corresponding to the given probabilities. |
quantile
Compute the quantile function for the given data.
This is an alias for the ppf method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
q | ndarray | The probabilities for which to compute quantiles. | required |
theta | ndarray | The parameters of the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | The quantiles corresponding to the given probabilities. |
rvs
abstractmethod
Generate random variates of given size and parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | int | The number of random variates to generate. | required |
theta | ndarray | The parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | A 2D array of random variates with shape (theta.shape[0], size). |
ondil.base.ScipyMixin
Bases: ABC
parameter_names
abstractmethod
property
Parameter name for each column of theta.
scipy_dist
abstractmethod
property
The underlying scipy.stats distribution.
scipy_names
abstractmethod
property
The names of the parameters in the scipy.stats distribution and the corresponding column in theta.
logcdf
Compute the log of the cumulative distribution function (CDF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the log CDF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of log CDF values corresponding to the data points. |
logpdf
Compute the log of the probability density function (PDF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the log PDF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of log PDF values corresponding to the data points. |
logpmf
Compute the log of the probability mass function (PMF) for the given data points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | ndarray | An array of data points at which to evaluate the log PMF. | required |
theta | ndarray | An array of parameters for the distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of log PMF values corresponding to the data points. |
theta_to_scipy_params
Maps \(\theta\) to the scipy parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
theta | ndarray | \(\theta\) as estimated by | required |
Raises:
Type | Description |
---|---|
ValueError | If we don't define the |
Returns:
Name | Type | Description |
---|---|---|
dict | Dict[str, ndarray] | Dictionary that can be unrolled into the scipy distribution class. |
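For distributions whose parameters coincide with scipy's, the mapping can be pictured as a plain lookup from keyword name to column of \(\theta\). The sketch below shows that idea with nested lists; `theta_to_params_sketch` and the name-to-column dictionary are hypothetical and are not the ondil implementation, which may differ (for example, the Gamma distribution needs a genuine transformation rather than a lookup).

```python
from typing import Dict, List, Sequence


def theta_to_params_sketch(
    theta: Sequence[Sequence[float]], names: Dict[str, int]
) -> Dict[str, List[float]]:
    """Return one list per keyword name, taken from the matching column of theta."""
    return {name: [row[col] for row in theta] for name, col in names.items()}


# Three observations, two parameters each: column 0 is location, column 1 is scale.
theta = [[0.0, 1.0], [0.5, 2.0], [1.0, 3.0]]
params = theta_to_params_sketch(theta, {"loc": 0, "scale": 1})
```

A dictionary of this shape can be unpacked with `**params` into a scipy.stats distribution such as scipy.stats.norm, which accepts array-like `loc` and `scale` arguments.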