Distributions
This page serves as a reference for all distribution objects implemented in the ondil package.
Note
This page is still partly under construction, since MkDocs does not currently support docstring inheritance.
Most distributions are built on scipy.stats distributions. For these, the probability density function (PDF), the cumulative distribution function (CDF), the percent-point or quantile function (PPF) and random variate generation (RVS) are implemented as pass-throughs to scipy.stats. The link functions are implemented in the same way as in GAMLSS (Rigby & Stasinopoulos, 2005). The link functions and their derivatives derive from the LinkFunction base class.
Base Classes
Base Distribution | Description |
---|---|
Distribution | Base class for all distributions. |
ScipyMixin | Base class for all distributions that are based on scipy. |
List of Distributions
Distribution | Description | scipy Base |
---|---|---|
Normal | Gaussian (mean and standard deviation) | scipy.stats.norm |
NormalMeanVariance | Gaussian (mean and variance) | scipy.stats.norm |
StudentT | Student's \(t\) distribution | scipy.stats.t |
JSU | Johnson's SU distribution | scipy.stats.johnsonsu |
Gamma | Gamma distribution | scipy.stats.gamma |
LogNormal | Log-normal distribution | scipy.stats.lognorm |
LogNormalMedian | Log-normal distribution (median) | - |
Logistic | Logistic distribution | scipy.stats.logistic |
Exponential | Exponential distribution | scipy.stats.expon |
Beta | Beta distribution | scipy.stats.beta |
Gumbel | Gumbel distribution | scipy.stats.gumbel_r |
InverseGaussian | Inverse Gaussian distribution | scipy.stats.invgauss |
BetaInflated | Beta Inflated distribution | - |
ReverseGumbel | Reverse Gumbel distribution | scipy.stats.gumbel_r |
InverseGamma | Inverse Gamma distribution | scipy.stats.invgamma |
BetaInflatedZero | Zero Inflated Beta distribution | - |
ZeroAdjustedGamma | Zero Adjusted Gamma distribution | - |
Distribution | Description | Scale Matrix Parameterization | Formula |
---|---|---|---|
MultivariateNormalInverseCholesky | Multivariate normal (inverse Cholesky) | Inverse Cholesky factorization | $\Sigma = (L L^{\top})^{-1}$, where $L$ is lower triangular |
MultivariateNormalInverseModifiedCholesky | Multivariate normal (inverse modified Cholesky) | Inverse modified Cholesky factorization | $\Sigma = (T^{\top} D T)^{-1}$, $T$ unit lower triangular, $D$ diagonal |
MultivariateNormalInverseLowRank | Multivariate normal (inverse low-rank) | Inverse low-rank factorization | $\Sigma = (U U^{\top} + D)^{-1}$, $U$ low-rank, $D$ diagonal |
MultivariateStudentTInverseCholesky | Multivariate Student's $t$ (inverse Cholesky) | Inverse Cholesky factorization | $\Sigma = (L L^{\top})^{-1}$, where $L$ is lower triangular |
MultivariateStudentTInverseModifiedCholesky | Multivariate Student's $t$ (inverse modified Cholesky) | Inverse modified Cholesky factorization | $\Sigma = (T^{\top} D T)^{-1}$, $T$ unit lower triangular, $D$ diagonal |
MultivariateStudentTInverseLowRank | Multivariate Student's $t$ (inverse low-rank) | Inverse low-rank factorization | $\Sigma = (U U^{\top} + D)^{-1}$, $U$ low-rank, $D$ diagonal |
API Reference
ondil.distributions.Normal
Bases: ScipyMixin, Distribution
The Normal distribution with mean and standard deviation parameterization.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right), $$ or, equivalently, in terms of \(\theta\): $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1^2}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1^2}\right), $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.
This distribution corresponds to the NO() distribution in GAMLSS.
__init__
Initialize the Normal.
Parameters:
- loc_link (LinkFunction, default: Identity()) – Location link. Defaults to Identity().
- scale_link (LinkFunction, default: Log()) – Scale link. Defaults to Log().
ondil.distributions.NormalMeanVariance
Bases: ScipyMixin, Distribution
The Normal distribution with mean and variance parameterization.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right), $$ or, equivalently, in terms of \(\theta\): $$ f(y | \theta_0, \theta_1) = \frac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\frac{(y - \theta_0)^2}{2\theta_1}\right), $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma^2 = \theta_1\) is the variance (scale) parameter.
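Because scipy.stats.norm is parameterized by the standard deviation, the variance \(\theta_1\) has to be square-rooted before it is passed through to scipy. The sketch below does this mapping by hand and checks it against the mean-variance density. The helper `mean_variance_to_scipy` is hypothetical (not part of ondil) and assumes that `theta` holds one column per parameter; the returned dictionary is unrolled into scipy in the `st.some_dist(**params)` style described for `theta_to_scipy_params`.

```python
import numpy as np
from scipy import stats


def mean_variance_to_scipy(theta):
    """Hypothetical helper: map columns [mean, variance] to scipy.stats.norm kwargs."""
    # scipy.stats.norm expects the standard deviation as `scale`
    return {"loc": theta[:, 0], "scale": np.sqrt(theta[:, 1])}


def normal_mean_variance_pdf(y, theta):
    """Density written directly in (theta_0, theta_1) = (mean, variance)."""
    mu, var = theta[:, 0], theta[:, 1]
    return np.exp(-((y - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)


theta = np.array([[0.0, 1.0], [2.0, 4.0], [-1.0, 0.25]])  # one row per observation
y = np.array([0.5, 3.0, -1.2])
params = mean_variance_to_scipy(theta)
print(np.allclose(stats.norm(**params).pdf(y), normal_mean_variance_pdf(y, theta)))
```

The same pattern (map \(\theta\) to keyword arguments once, then call scipy) applies to the other scipy-backed distributions; only the mapping changes.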
__init__
Initialize the NormalMeanVariance.
Parameters:
- loc_link (LinkFunction, default: Identity()) – Location link. Defaults to Identity().
- scale_link (LinkFunction, default: Log()) – Scale link. Defaults to Log().
ondil.distributions.StudentT
ondil.distributions.JSU
Bases: ScipyMixin, Distribution
Corresponds to GAMLSS JSUo() and scipy.stats.johnsonsu()
Distribution parameters:
- 0: Location
- 1: Scale (close to standard deviation)
- 2: Skewness
- 3: Tail behaviour
ondil.distributions.Gamma
Bases: ScipyMixin, Distribution
The Gamma Distribution for GAMLSS.
The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{y^{(1/\sigma^2-1)}\exp[-y/(\sigma^2 \mu)]}{(\sigma^2 \mu)^{(1/\sigma^2)} \Gamma(1/\sigma^2)} $$
with the location and scale parameters \(\mu, \sigma > 0\).
Note
The function is parameterized as GAMLSS' GA() distribution.
This parameterization is different from the scipy.stats.gamma(alpha, loc, scale) parameterization.
We can use Gamma().theta_to_scipy_params(theta) to map the distribution parameters to scipy.
The scipy.stats.gamma() distribution is defined as:
$$
f(x, \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} \exp[-\beta x]}{\Gamma(\alpha)}
$$
with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows:
$$
\alpha = 1/\sigma^2 \Leftrightarrow \sigma = \sqrt{1 / \alpha}
$$
and
$$
\beta = 1/(\sigma^2\mu).
$$
Since scipy.stats.gamma takes the scale \(1/\beta\) rather than the rate \(\beta\), this corresponds to scale \(= \sigma^2\mu\).
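The mapping above can be checked numerically. The sketch below writes the GA() log-density directly, maps \((\mu, \sigma)\) to scipy.stats.gamma keyword arguments with `a` \(= 1/\sigma^2\) and `scale` \(= 1/\beta = \sigma^2\mu\), and confirms that the two agree, that \(\mu\) is the mean and that \(\sigma\) is the coefficient of variation. The helper names are illustrative only and not part of ondil.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def gamlss_gamma_to_scipy(mu, sigma):
    """Illustrative helper: GA(mu, sigma) -> scipy.stats.gamma kwargs."""
    alpha = 1.0 / sigma**2          # shape
    beta = 1.0 / (sigma**2 * mu)    # rate
    return {"a": alpha, "scale": 1.0 / beta}  # scipy uses scale = 1 / rate


def gamlss_gamma_logpdf(y, mu, sigma):
    """Log of the GA() density as written in GAMLSS."""
    s2 = sigma**2
    return ((1.0 / s2 - 1.0) * np.log(y) - y / (s2 * mu)
            - (1.0 / s2) * np.log(s2 * mu) - gammaln(1.0 / s2))


y = np.array([0.5, 1.0, 2.5])
mu, sigma = 2.0, 0.5
dist = stats.gamma(**gamlss_gamma_to_scipy(mu, sigma))
print(np.allclose(dist.logpdf(y), gamlss_gamma_logpdf(y, mu, sigma)))
print(dist.mean(), dist.std() / dist.mean())  # mean mu, coefficient of variation sigma
```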
Parameters:
- loc_link (LinkFunction, default: Log()) – The link function for \(\mu\). Defaults to Log().
- scale_link (LinkFunction, default: Log()) – The link function for \(\sigma\). Defaults to Log().
ondil.distributions.LogNormal
Bases: ScipyMixin, Distribution
The Log-Normal distribution with mean and standard deviation parameterization in the log-space.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right), $$ or, equivalently, in terms of \(\theta\): $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \theta_0)^2}{2\theta_1^2}\right), $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the location parameter and \(\sigma = \theta_1\) is the scale parameter.
Note
The re-parameterization used to map this distribution to scipy.stats.lognorm is $$ \text{scale} = \exp(\theta_0), $$ which can be numerically unstable for large values of \(\theta_0\). We have re-implemented the PDF, CDF and PPF accordingly to avoid this issue; however, the rvs method still uses the scipy.stats implementation, which is not numerically stable for large values of \(\theta_0\).
pdf
Probability density function of the Log-Normal distribution.
cdf
Cumulative distribution function of the Log-Normal distribution.
ppf
Percent-point function (quantile function) of the Log-Normal distribution.
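The idea behind the re-implemented CDF and PPF can be illustrated with scipy alone: both can be written through the standard normal in terms of \(\theta_0\), so \(\exp(\theta_0)\) is never formed on its own. The sketch below is not ondil's code; the helper names are hypothetical. It compares the log-space versions with the scipy pass-through (`s` \(= \theta_1\), `scale` \(= \exp(\theta_0)\)) for moderate values, then evaluates the CDF for a \(\theta_0\) whose exponential would overflow a float64.

```python
import numpy as np
from scipy import stats


def lognormal_cdf(y, theta0, theta1):
    # Phi((log y - theta_0) / theta_1): exp(theta_0) is never formed
    return stats.norm.cdf((np.log(y) - theta0) / theta1)


def lognormal_ppf(q, theta0, theta1):
    # exp(theta_0 + theta_1 * Phi^{-1}(q)): only the final quantile is exponentiated
    return np.exp(theta0 + theta1 * stats.norm.ppf(q))


theta0, theta1 = 1.5, 0.4
y = np.array([1.0, 4.0, 10.0])
q = np.array([0.1, 0.5, 0.9])

# scipy pass-through for comparison: s = theta_1, scale = exp(theta_0)
ref = stats.lognorm(s=theta1, scale=np.exp(theta0))
print(np.allclose(lognormal_cdf(y, theta0, theta1), ref.cdf(y)))
print(np.allclose(lognormal_ppf(q, theta0, theta1), ref.ppf(q)))

# exp(710) exceeds the largest float64, so scale = exp(theta_0) would overflow,
# but the log-space CDF at y = exp(709) is simply Phi(-1)
print(710.0 > np.log(np.finfo(np.float64).max))
print(lognormal_cdf(np.exp(709.0), 710.0, 1.0))
```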
ondil.distributions.LogNormalMedian
Bases: ScipyMixin, Distribution
The Log-Normal distribution with median and standard deviation parameterization in the log-space.
The probability density function of the distribution is defined as: $$ f(y | \mu, \sigma) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log y - \log \mu)^2}{2\sigma^2}\right), $$ or, equivalently, in terms of \(\theta\): $$ f(y | \theta_0, \theta_1) = \frac{1}{y\theta_1\sqrt{2\pi}}\exp\left(-\frac{(\log y - \log \theta_0)^2}{2\theta_1^2}\right), $$ where \(y\) is the observed data, \(\mu = \theta_0\) is the median parameter and \(\sigma = \theta_1\) is the scale parameter.
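Because \(\log \mu\) enters the density directly, this parameterization coincides with scipy.stats.lognorm using `s` \(= \theta_1\) and `scale` \(= \theta_0\), with no exponential of a fitted value involved. The check below is for illustration only: the table lists no scipy base for LogNormalMedian, so this is not a statement about how ondil implements it, and the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def lognormal_median_pdf(y, theta0, theta1):
    # Density above: theta_0 is the median, theta_1 the log-scale standard deviation
    return np.exp(-((np.log(y) - np.log(theta0)) ** 2) / (2 * theta1**2)) / (
        y * theta1 * np.sqrt(2 * np.pi)
    )


theta0, theta1 = 3.0, 0.5
y = np.array([0.5, 3.0, 8.0])
ref = stats.lognorm(s=theta1, scale=theta0)
print(np.allclose(lognormal_median_pdf(y, theta0, theta1), ref.pdf(y)))
print(ref.median())  # equals theta_0
```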
ondil.distributions.Logistic
Bases: ScipyMixin, Distribution
The Logistic distribution with location and scale parameterization.
The probability density function is: $$ f(y | \mu, \sigma) = \frac{\exp\left(-\frac{y - \mu}{\sigma}\right)}{\sigma \left(1 + \exp\left(-\frac{y - \mu}{\sigma}\right)\right)^2} $$
This distribution corresponds to the LO() distribution in GAMLSS.
ondil.distributions.Exponential
Bases: ScipyMixin, Distribution
The Exponential distribution parameterized by the mean (mu).
The probability density function is: $$ f(y \mid \mu) = \frac{1}{\mu} \exp\left(-\frac{y}{\mu}\right), \quad y > 0,\ \mu > 0. $$
This corresponds to EXP() in GAMLSS.
ondil.distributions.InverseGaussian
Bases: ScipyMixin, Distribution
Inverse Gaussian (Wald) distribution for GAMLSS.
This distribution is characterized by two parameters:
- \(\mu\): the mean of the distribution.
- \(\sigma\): the scale parameter; the variance is \(\sigma^2 \mu^3\).

The probability density function (PDF) is given by: $$ f(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2 y^3}} \exp\left(-\frac{(y - \mu)^2}{2\mu^2\sigma^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\sigma > 0\).
Note that the inverse Gaussian distribution in scipy.stats is parameterized differently. Written in terms of its mean \(\mu\) and shape \(\lambda\), the PDF is: $$ f(y; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi y^3}} \exp\left(-\frac{\lambda (y - \mu)^2}{2\mu^2 y}\right) $$ where \(y > 0\), \(\mu > 0\), and \(\lambda > 0\). scipy.stats.invgauss(mu, loc, scale) instead uses a shape parameter mu and a scale parameter, with mean mu · scale; the \((\mu, \lambda)\) form corresponds to mu \(= \mu / \lambda\) and scale \(= \lambda\).
In terms of this implementation's parameters, \(\lambda = 1/\sigma^2\), so the relationship between the parameters is:
- mu in scipy.stats corresponds to \(\mu \sigma^2\) in this implementation,
- scale in scipy.stats corresponds to \(1 / \sigma^2\) in this implementation,
- the loc parameter in scipy.stats is always 0.
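As a sanity check of the parameter mapping, the sketch below evaluates the GAMLSS IG() log-density, \(-\tfrac{1}{2}\log(2\pi\sigma^2 y^3) - (y-\mu)^2/(2\mu^2\sigma^2 y)\), and compares it with scipy.stats.invgauss using mu \(= \mu\sigma^2\), scale \(= 1/\sigma^2\) and loc \(= 0\). The density is taken from the GAMLSS IG() definition rather than from ondil's code, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def gamlss_ig_to_scipy(mu, sigma):
    """Hypothetical helper: IG(mu, sigma), i.e. mean mu and shape 1 / sigma^2, -> scipy kwargs."""
    return {"mu": mu * sigma**2, "loc": 0.0, "scale": 1.0 / sigma**2}


def gamlss_ig_logpdf(y, mu, sigma):
    # GAMLSS IG() log-density
    return (-0.5 * np.log(2 * np.pi * sigma**2 * y**3)
            - (y - mu) ** 2 / (2 * mu**2 * sigma**2 * y))


mu, sigma = 2.0, 0.3
y = np.array([0.5, 1.5, 4.0])
dist = stats.invgauss(**gamlss_ig_to_scipy(mu, sigma))
print(np.allclose(dist.logpdf(y), gamlss_ig_logpdf(y, mu, sigma)))
print(dist.mean(), dist.var())  # mean mu, variance sigma^2 * mu^3
```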
ondil.distributions.Beta
Bases: ScipyMixin, Distribution
The Beta Distribution for GAMLSS.
The distribution function is defined as in GAMLSS as: $$ f(y|\mu,\sigma)=\frac{\Gamma(\frac{1 - \sigma^2}{\sigma^2})} { \Gamma(\frac{\mu (1 - \sigma^2)}{\sigma^2}) \Gamma(\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2})} y^{\frac{\mu (1 - \sigma^2)}{\sigma^2} - 1} (1-y)^{\frac{(1 - \mu) (1 - \sigma^2)}{\sigma^2} - 1} $$
with the location and scale parameters \(\mu, \sigma \in (0, 1)\).
Note
The function is parameterized as GAMLSS' BE() distribution.
This parameterization is different from the scipy.stats.beta(alpha, beta, loc, scale) parameterization.
We can use Beta().gamlss_to_scipy(mu, sigma) to map the distribution parameters to scipy.
The scipy.stats.beta() distribution is defined as:
$$
f(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta) x^{\alpha - 1} {(1 - x)}^{\beta - 1}}{\Gamma(\alpha) \Gamma(\beta)}
$$
with the parameters \(\alpha, \beta > 0\). The parameters can be mapped as follows: $$ \alpha = \mu (1 - \sigma^2) / \sigma^2 \Leftrightarrow \mu = \alpha / (\alpha + \beta) $$ and $$ \beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2 \Leftrightarrow \sigma = 1 / \sqrt{\alpha + \beta + 1}, $$ where the inverse maps follow from \(\alpha + \beta = (1 - \sigma^2)/\sigma^2\).
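The forward and inverse maps between BE\((\mu, \sigma)\) and scipy.stats.beta\((\alpha, \beta)\) can be sketched as below. The inverse map uses \(\alpha + \beta = (1 - \sigma^2)/\sigma^2\), so \(\sigma = 1/\sqrt{\alpha + \beta + 1}\). The sketch also checks that the mean is \(\mu\) and the variance is \(\sigma^2 \mu (1 - \mu)\). Both helpers are hypothetical stand-ins, not ondil's `gamlss_to_scipy`.

```python
import numpy as np
from scipy import stats


def gamlss_beta_to_scipy(mu, sigma):
    """Hypothetical helper: BE(mu, sigma) -> scipy.stats.beta(a, b) kwargs."""
    s2 = sigma**2
    return {"a": mu * (1 - s2) / s2, "b": (1 - mu) * (1 - s2) / s2}


def scipy_beta_to_gamlss(a, b):
    """Inverse map, using a + b = (1 - sigma^2) / sigma^2."""
    return a / (a + b), 1.0 / np.sqrt(a + b + 1)


mu, sigma = 0.3, 0.4
params = gamlss_beta_to_scipy(mu, sigma)
mu_back, sigma_back = scipy_beta_to_gamlss(params["a"], params["b"])
dist = stats.beta(**params)
print(np.isclose(mu_back, mu), np.isclose(sigma_back, sigma))
print(np.isclose(dist.mean(), mu), np.isclose(dist.var(), sigma**2 * mu * (1 - mu)))
```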
Parameters:
- loc_link (LinkFunction, default: Logit()) – The link function for \(\mu\). Defaults to Logit().
- scale_link (LinkFunction, default: Logit()) – The link function for \(\sigma\). Defaults to Logit().
ondil.distributions.Gumbel
Bases: ScipyMixin, Distribution
The Gumbel distribution.
The probability density function is given by: $$ f(y|\mu, \sigma) = \frac{1}{\sigma} \exp\left(-\left(z + e^{-z}\right)\right) $$ where \(z = (y - \mu)/\sigma\). The distribution has the following parameters:
- \(\mu\): location
- \(\sigma\): scale (>0)
ondil.distributions.BetaInflated
Bases: Distribution
The Beta Inflated Distribution for GAMLSS.
The distribution function is defined as in GAMLSS as: $$ f_Y(y \mid \mu, \sigma, \nu, \tau) = \begin{cases} p_0 & \text{if } y = 0 \\ (1 - p_0 - p_1) \dfrac{1}{B(\alpha, \beta)} y^{\alpha - 1}(1 - y)^{\beta - 1} & \text{if } 0 < y < 1 \\ p_1 & \text{if } y = 1 \end{cases} $$
where \(\alpha = \mu (1 - \sigma^2) / \sigma^2\), \(\beta = (1 - \mu) (1 - \sigma^2)/ \sigma^2\), \(p_0 = \nu (1 + \nu + \tau)^{-1}\) and \(p_1 = \tau (1 + \nu + \tau)^{-1}\),
and \(\mu, \sigma \in (0,1)\) and \(\nu, \tau > 0\).
The parameter tuple \(\theta\) in Python is defined as:
\(\theta = (\theta_0, \theta_1, \theta_2, \theta_3) = (\mu, \sigma, \nu, \tau)\) where \(\mu = \theta_0\) is the location parameter, \(\sigma = \theta_1\) is the scale parameter and \(\nu, \tau = \theta_2, \theta_3\) are shape parameters which together define the inflation at 0 and 1
This distribution corresponds to the BEINF() distribution in GAMLSS.
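The mixture structure of BEINF can be sketched with scipy: point masses \(p_0\) and \(p_1\) at the boundaries, and a BE\((\mu, \sigma)\) beta density (with \(\alpha, \beta\) as defined above) carrying the remaining mass \(1 - p_0 - p_1\). The helper names are hypothetical and this is not ondil's implementation; it only checks that the total probability is 1 and computes the mean implied by the mixture.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def beinf_components(mu, sigma, nu, tau):
    """Hypothetical helper: point masses p0, p1 and the continuous beta part."""
    p0 = nu / (1 + nu + tau)
    p1 = tau / (1 + nu + tau)
    s2 = sigma**2
    beta_part = stats.beta(a=mu * (1 - s2) / s2, b=(1 - mu) * (1 - s2) / s2)
    return p0, p1, beta_part


def beinf_mean(mu, sigma, nu, tau):
    # E[Y] = 0 * p0 + 1 * p1 + (1 - p0 - p1) * E[beta part], and the beta part has mean mu
    p0, p1, _ = beinf_components(mu, sigma, nu, tau)
    return p1 + (1 - p0 - p1) * mu


mu, sigma, nu, tau = 0.4, 0.3, 0.5, 0.25
p0, p1, beta_part = beinf_components(mu, sigma, nu, tau)
# The continuous part integrates to the remaining mass 1 - p0 - p1
cont_mass, _ = quad(lambda v: (1 - p0 - p1) * beta_part.pdf(v), 0, 1)
print(p0, p1, p0 + p1 + cont_mass)
print(beinf_mean(mu, sigma, nu, tau))
```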
ondil.distributions.ReverseGumbel
Bases: ScipyMixin, Distribution
The Reverse Gumbel (Type I minimum extreme value) distribution with location (mu) and scale (sigma) parameters.
The probability density function is defined as: $$ f(y | \mu, \sigma) = \frac{1}{\sigma} \exp\left( \frac{y - \mu}{\sigma} - \exp\left( \frac{y - \mu}{\sigma} \right) \right) $$
This distribution corresponds to the RG() distribution in GAMLSS.
Notes
- Mean = mu + digamma(1) * sigma ≈ mu - 0.5772157 * sigma
- Variance = (pi^2 * sigma^2) / 6 ≈ 1.64493 * sigma^2
ondil.distributions.InverseGamma
Bases: ScipyMixin, Distribution
The Inverse Gamma distribution as parameterized in GAMLSS.
The distribution has two parameters:
- mu: location parameter (the mode of the distribution)
- sigma: dispersion parameter

Reparameterization to scipy.stats.invgamma:
- α = 1 / sigma² (the shape a)
- scale = mu * (1 + sigma²) / sigma²
This distribution corresponds to IGAMMA() in GAMLSS.
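The reparameterization to scipy.stats.invgamma (shape `a` \(= 1/\sigma^2\), `scale` \(= \mu(1 + \sigma^2)/\sigma^2\)) can be written as a small mapping. One consequence worth checking: scipy's invgamma has its mode at scale/(a + 1), which here reduces to \(\mu\), so \(\mu\) is the mode. The helper is hypothetical and for illustration only.

```python
import numpy as np
from scipy import stats


def gamlss_igamma_to_scipy(mu, sigma):
    """Hypothetical helper: IGAMMA(mu, sigma) -> scipy.stats.invgamma(a, scale) kwargs."""
    s2 = sigma**2
    return {"a": 1.0 / s2, "scale": mu * (1 + s2) / s2}


mu, sigma = 2.0, 0.5
params = gamlss_igamma_to_scipy(mu, sigma)
dist = stats.invgamma(**params)

# Mode of scipy's invgamma is scale / (a + 1); with this mapping it equals mu
mode = params["scale"] / (params["a"] + 1)
print(mode)
# Numerical sanity check: the density is higher at mu than slightly to either side
print(dist.pdf(mu) > dist.pdf(mu - 0.01), dist.pdf(mu) > dist.pdf(mu + 0.01))
```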
ondil.distributions.ZeroAdjustedGamma
Bases: Distribution
The Zero Adjusted Gamma Distribution for GAMLSS.
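The zero adjusted gamma distribution is a mixture of a discrete value 0 with probability \(\nu\), and a gamma \(\text{GA}(\mu, \sigma)\) distribution on the positive real line \((0, \infty)\) with probability \((1 - \nu)\): $$ f_Y(y \mid \mu, \sigma, \nu) = \begin{cases} \nu & \text{if } y = 0 \\ (1 - \nu) f_{GA}(y \mid \mu, \sigma) & \text{if } y > 0 \end{cases} $$
where \(y\) is the observed data, \(\mu > 0\) is the location parameter, \(\sigma > 0\) is the scale parameter, \(\nu \in (0, 1)\) is the inflation parameter (the probability of a zero), and \(f_{GA}\) is the \(\text{GA}(\mu, \sigma)\) density.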
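The zero/gamma mixture can be sketched directly with scipy, using the GA mapping from the Gamma section (shape `a` \(= 1/\sigma^2\), `scale` \(= \sigma^2\mu\)). The CDF jumps by \(\nu\) at zero, and sampling draws a zero with probability \(\nu\), otherwise a gamma variate, so the mean is \((1 - \nu)\mu\). The helpers `zaga_cdf` and `zaga_rvs` are hypothetical illustrations, not ondil methods.

```python
import numpy as np
from scipy import stats


def _ga(mu, sigma):
    # GA(mu, sigma) expressed as a scipy.stats.gamma distribution
    return stats.gamma(a=1 / sigma**2, scale=sigma**2 * mu)


def zaga_cdf(y, mu, sigma, nu):
    """F(y) = nu + (1 - nu) * F_GA(y) for y >= 0, and 0 for y < 0."""
    return np.where(y < 0, 0.0, nu + (1 - nu) * _ga(mu, sigma).cdf(np.maximum(y, 0.0)))


def zaga_rvs(size, mu, sigma, nu, rng):
    """Zero with probability nu, otherwise a GA(mu, sigma) draw."""
    draws = _ga(mu, sigma).rvs(size=size, random_state=rng)
    return np.where(rng.random(size) < nu, 0.0, draws)


mu, sigma, nu = 3.0, 0.6, 0.3
rng = np.random.default_rng(0)
sample = zaga_rvs(200_000, mu, sigma, nu, rng)
print((sample == 0).mean())          # close to nu
print(sample.mean(), (1 - nu) * mu)  # close to the mixture mean (1 - nu) * mu
print(zaga_cdf(np.array([-1.0, 0.0, 3.0]), mu, sigma, nu))
```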
ondil.distributions.BetaInflatedZero
Bases: Distribution
The Zero Inflated Beta Distribution for GAMLSS.
$$ f_Y(y \mid \mu, \sigma, \nu) = \begin{cases} p_0 & \text{if } y = 0 \\ (1 - p_0) f_W(y \mid \mu, \sigma) & \text{if } 0 < y < 1 \end{cases} $$
where \(f_W(y \mid \mu, \sigma)\) is the BE\((\mu, \sigma)\) beta density and \(p_0 = \nu (1 + \nu)^{-1}\),
and \(\mu, \sigma \in (0,1)\) and \(\nu > 0\).
Multivariate Distributions
ondil.distributions.MultivariateNormalInverseCholesky
Bases: MultivariateDistributionMixin, Distribution
The multivariate normal (Gaussian) distribution parameterized by the inverse Cholesky factor of the precision (inverse scale) matrix.
The PDF of the multivariate normal distribution is given by: $$ p(y \mid \mu, L) = |L| \cdot (2\pi)^{-k/2} \exp\left(-\frac{1}{2} (y - \mu)^T (L L^T) (y - \mu)\right) $$
where \( k \) is the dimensionality of the data, \( \mu \) is the location parameter, and \( L \) is the inverse Cholesky factor of the precision matrix (so the precision is \( L L^T \)).
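To make the role of \(L\) concrete, the density above can be evaluated directly from \(L\) and compared with scipy.stats.multivariate_normal, which expects the covariance \(\Sigma = (L L^\top)^{-1}\). This is a standalone numpy/scipy sketch, assuming `y` is an n × k array; it does not model how ondil stores \(\theta\) internally, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def mvn_logpdf_precision_cholesky(y, mu, L):
    """Log-density with precision L @ L.T, L lower triangular."""
    k = y.shape[-1]
    z = (y - mu) @ L  # each row equals (L^T (y - mu))^T
    quad = np.sum(z**2, axis=-1)  # (y - mu)^T L L^T (y - mu)
    log_det_L = np.sum(np.log(np.abs(np.diag(L))))  # log |L| for a triangular matrix
    return log_det_L - 0.5 * k * np.log(2 * np.pi) - 0.5 * quad


L = np.array([[1.5, 0.0, 0.0],
              [0.3, 0.8, 0.0],
              [-0.2, 0.4, 1.1]])
mu = np.array([0.5, -1.0, 2.0])
y = np.random.default_rng(1).normal(size=(4, 3))

# Reference: scipy with the covariance Sigma = (L L^T)^{-1}, symmetrized against round-off
cov = np.linalg.inv(L @ L.T)
cov = 0.5 * (cov + cov.T)
ref = stats.multivariate_normal(mean=mu, cov=cov).logpdf(y)
print(np.allclose(mvn_logpdf_precision_cholesky(y, mu, L), ref))
```

Working with the triangular factor keeps the log-determinant a sum over the diagonal and avoids inverting the precision matrix.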
set_theta_element
staticmethod
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
dl1_dp1
Return the first derivatives with respect to the parameter.
Note
We expect the fitted (L^-1)^T to be handed in matrix/cube form, i.e. of shape n x d x d, but we return the derivatives in flat format.
Parameters:
- y (ndarray) – Y values of shape n x d.
- theta (Dict) – Dict with {0 : fitted mu, 1 : fitted (L^-1)^T}.
- param (int, default: 0) – Which parameter derivatives to return. Defaults to 0.
Returns:
- derivative (ndarray) – The first derivatives.
dl2_dp2
Return the second derivatives with respect to the parameter.
Note
We expect the fitted (L^-1)^T to be handed in matrix/cube form, i.e. of shape n x d x d, but we return the derivatives in flat format.
Parameters:
- y (ndarray) – Y values of shape n x d.
- theta (Dict) – Dict with {0 : fitted mu, 1 : fitted (L^-1)^T}.
- param (int, default: 0) – Which parameter derivatives to return. Defaults to 0.
Returns:
- derivative (ndarray) – The second derivatives.
param_conditional_likelihood
Calculate the log-likelihood for (flat) eta for parameter (param) and theta for all other parameters.
Parameters:
- y (ndarray) – True values.
- theta (Dict) – Fitted theta.
- eta (ndarray) – Fitted eta.
- param (int) – Param for which we take eta.
Returns:
- ndarray – Log-likelihood.
ondil.distributions.MultivariateNormalInverseModifiedCholesky
Bases: MultivariateDistributionMixin, Distribution
The multivariate normal (Gaussian) distribution parameterized by the modified Cholesky decomposition of the precision (inverse scale) matrix.
In the modified Cholesky decomposition, the precision matrix \( \Omega \) is written as: $$ \Omega = T^T D T $$ where \( T \) is a lower triangular matrix with ones on the diagonal, and \( D \) is a diagonal matrix with positive entries.
The PDF of the multivariate normal distribution is then: $$ p(y \mid \mu, T, D) = |T| \cdot |D|^{1/2} \cdot (2\pi)^{-k/2} \exp\left(-\frac{1}{2} (y - \mu)^T T^T D T (y - \mu)\right) $$ where \( k \) is the dimensionality of the data, \( \mu \) is the mean vector, \( T \) and \( D \) are the modified Cholesky factors of the precision matrix.
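With \(T\) unit lower triangular, \(|T| = 1\) and the log-determinant of the precision reduces to \(\sum_i \log d_i\), so the density needs only \(T(y - \mu)\) and the diagonal of \(D\). The sketch below evaluates it that way and compares it with scipy.stats.multivariate_normal using \(\Sigma = (T^\top D T)^{-1}\). It is an illustration under these assumptions (`d` holds the diagonal of \(D\)); the function name is hypothetical and not ondil's API.

```python
import numpy as np
from scipy import stats


def mvn_logpdf_modified_cholesky(y, mu, T, d):
    """Log-density with precision T^T D T, T unit lower triangular, D = diag(d), d > 0."""
    k = y.shape[-1]
    z = (y - mu) @ T.T  # each row equals (T (y - mu))^T
    quad = np.sum(d * z**2, axis=-1)  # (y - mu)^T T^T D T (y - mu)
    # det(T) = 1, so log det(precision) = sum(log d)
    return 0.5 * np.sum(np.log(d)) - 0.5 * k * np.log(2 * np.pi) - 0.5 * quad


T = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.3, 0.2, 1.0]])
d = np.array([2.0, 0.5, 1.5])
mu = np.zeros(3)
y = np.random.default_rng(2).normal(size=(5, 3))

precision = T.T @ np.diag(d) @ T
cov = np.linalg.inv(precision)
cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
ref = stats.multivariate_normal(mean=mu, cov=cov).logpdf(y)
print(np.allclose(mvn_logpdf_modified_cholesky(y, mu, T, d), ref))
```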
set_theta_element
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
param_conditional_likelihood
Calculate the log-likelihood for (flat) eta for parameter (param) and theta for all other parameters.
Parameters:
- y (ndarray) – True values.
- theta (Dict) – Fitted theta.
- eta (ndarray) – Fitted eta.
- param (int) – Param for which we take eta.
Returns:
- ndarray – Log-likelihood.
ondil.distributions.MultivariateNormalInverseLowRank
Bases: MultivariateDistributionMixin, Distribution
The multivariate normal (Gaussian) distribution parameterized by a precision matrix of diagonal-plus-low-rank form.
The PDF of the multivariate normal distribution is given by: $$ p(y \mid \mu, D, V) = |D + V V^T|^{1/2} \cdot (2\pi)^{-k/2} \exp\left(-\frac{1}{2} (y - \mu)^T (D + V V^T) (y - \mu)\right) $$
where \( k \) is the dimensionality of the data, \( \mu \) is the location parameter, \( D \) is a diagonal matrix, and \( V \) is a low-rank matrix such that the precision is \( D + V V^T \).
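One way to evaluate this density without forming the dense \(k \times k\) precision is to split the quadratic form into \((y-\mu)^\top D (y-\mu) + \lVert V^\top (y-\mu) \rVert^2\) and use the matrix determinant lemma, \(\det(D + VV^\top) = \det(D)\det(I_r + V^\top D^{-1} V)\). The page does not say how ondil evaluates it, so treat this as an illustrative sketch (hypothetical function name, `d` holding the diagonal of \(D\), `V` of shape \(k \times r\)); it is checked against the dense computation in scipy.

```python
import numpy as np
from scipy import stats


def mvn_logpdf_lowrank_precision(y, mu, d, V):
    """Log-density with precision D + V V^T, D = diag(d) with d > 0, V of shape (k, r)."""
    k, r = V.shape
    res = y - mu
    # Quadratic form without the k x k matrix: res^T D res + ||V^T res||^2
    quad = np.sum(d * res**2, axis=-1) + np.sum((res @ V) ** 2, axis=-1)
    # Matrix determinant lemma: det(D + V V^T) = det(D) * det(I_r + V^T D^{-1} V)
    small = np.eye(r) + V.T @ (V / d[:, None])
    _, logdet_small = np.linalg.slogdet(small)
    logdet_precision = np.sum(np.log(d)) + logdet_small
    return 0.5 * logdet_precision - 0.5 * k * np.log(2 * np.pi) - 0.5 * quad


rng = np.random.default_rng(3)
k, r = 5, 2
d = rng.uniform(0.5, 2.0, size=k)
V = rng.normal(size=(k, r))
mu = rng.normal(size=k)
y = rng.normal(size=(6, k))

dense_cov = np.linalg.inv(np.diag(d) + V @ V.T)
dense_cov = 0.5 * (dense_cov + dense_cov.T)  # symmetrize against round-off
ref = stats.multivariate_normal(mean=mu, cov=dense_cov).logpdf(y)
print(np.allclose(mvn_logpdf_lowrank_precision(y, mu, d, V), ref))
```

The cost per observation is \(O(kr)\) plus a one-off \(r \times r\) determinant, which is the usual motivation for this parameterization when \(r \ll k\).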
set_theta_element
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
theta_to_scipy
Map theta to scipy distribution parameters for the multivariate normal distribution.
Parameters:
- theta (Dict[int, ndarray]) – Fitted / predicted theta.
Returns:
- Dict[str, ndarray] – Mapped predicted
ondil.distributions.MultivariateStudentTInverseCholesky
Bases: MultivariateDistributionMixin, Distribution
The multivariate \( t \)-distribution parameterized by the inverse Cholesky factor of the precision (inverse scale) matrix.
The PDF of the multivariate \( t \)-distribution is given by: $$ p(y \mid \mu, L, \nu) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)} {\Gamma\left(\frac{\nu}{2}\right) \left(\pi \nu\right)^{k/2}} \cdot |L| \left(1 + \frac{1}{\nu} (y - \mu)^T (L L^T) (y - \mu)\right)^{-\frac{\nu + k}{2}} $$
where \( k \) is the dimensionality of the data, \( \mu \) is the location parameter, \( L \) is the inverse Cholesky factor of the precision matrix (so the precision is \( L L^T \)), and \( \nu \) is the degrees of freedom.
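The same precision-factor trick carries over to the \(t\) case: the density above needs only \(\log|L|\) and the quadratic form \(\lVert L^\top (y-\mu) \rVert^2\). The sketch below compares it with scipy.stats.multivariate_t (available since SciPy 1.6), which takes the shape matrix \((L L^\top)^{-1}\). The function name is hypothetical and the sketch does not reflect ondil's internals.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def mvt_logpdf_precision_cholesky(y, mu, L, nu):
    """Log-density with precision L @ L.T and nu degrees of freedom."""
    k = y.shape[-1]
    z = (y - mu) @ L
    quad = np.sum(z**2, axis=-1)  # (y - mu)^T L L^T (y - mu)
    log_det_L = np.sum(np.log(np.abs(np.diag(L))))
    return (gammaln((nu + k) / 2) - gammaln(nu / 2) - 0.5 * k * np.log(np.pi * nu)
            + log_det_L - 0.5 * (nu + k) * np.log1p(quad / nu))


L = np.array([[1.2, 0.0], [-0.4, 0.9]])
mu = np.array([1.0, -0.5])
nu = 5.0
y = np.random.default_rng(4).normal(size=(4, 2))

# scipy's multivariate_t takes the shape (scale) matrix, the inverse of L L^T
shape = np.linalg.inv(L @ L.T)
shape = 0.5 * (shape + shape.T)  # symmetrize against round-off
ref = stats.multivariate_t(loc=mu, shape=shape, df=nu).logpdf(y)
print(np.allclose(mvt_logpdf_precision_cholesky(y, mu, L, nu), ref))
```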
set_theta_element
staticmethod
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
ondil.distributions.MultivariateStudentTInverseModifiedCholesky
Bases: MultivariateDistributionMixin, Distribution
Multivariate Student's t distribution with modified Cholesky decomposition.
The PDF of the multivariate \( t \)-distribution with precision matrix parameterized as \( T^T D T \) is: $$ p(y \mid \mu, D, T, \nu) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)} {\Gamma\left(\frac{\nu}{2}\right) \left(\pi \nu\right)^{k/2}} \cdot \sqrt{\det(D)} \left(1 + \frac{1}{\nu} (y - \mu)^T T^T D T (y - \mu)\right)^{-\frac{\nu + k}{2}} $$ where \( k \) is the dimensionality, \( \mu \) is the location, \( D \) is a diagonal matrix, \( T \) is a unit lower triangular matrix (ones on the diagonal, so \( \det(T) = 1 \)), and \( \nu \) is the degrees of freedom.
__init__
__init__(
loc_link: LinkFunction = Identity(),
scale_link_1: LinkFunction = MatrixDiag(Log()),
scale_link_2: LinkFunction = MatrixDiagTril(
Identity(), Identity()
),
tail_link: LinkFunction = LogShiftTwo(),
dof_guesstimate: float = 10,
use_gaussian_for_location: bool = False,
)
Initializes the distribution with the specified link functions and parameters.
Parameters:
- loc_link (LinkFunction, default: Identity()) – Link function for the location parameter.
- scale_link_1 (LinkFunction, default: MatrixDiag(Log())) – Link function for the first scale parameter (diagonal).
- scale_link_2 (LinkFunction, default: MatrixDiagTril(Identity(), Identity())) – Link function for the second scale parameter (lower-triangular).
- tail_link (LinkFunction, default: LogShiftTwo()) – Link function for the tail (degrees of freedom) parameter.
- dof_guesstimate (float, default: 10) – Initial guess for the degrees of freedom.
- use_gaussian_for_location (bool, default: False) – Whether to use a Gaussian approximation for the location parameter.

Attributes:
- dof_guesstimate (float) – Stores the initial guess for the degrees of freedom.
- dof_independence (float) – Large value to represent independence in degrees of freedom.
- use_gaussian_for_location (bool) – Indicates if a Gaussian is used for the location parameter.
- _regularization_allowed (dict) – Specifies which parameters allow regularization.
set_theta_element
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
param_conditional_likelihood
Calculate the log-likelihood for (flat) eta for parameter (param) and theta for all other parameters.
Parameters:
- y (ndarray) – True values.
- theta (Dict) – Fitted theta.
- eta (ndarray) – Fitted eta.
- param (int) – Param for which we take eta.
Returns:
- ndarray – Log-likelihood.
ondil.distributions.MultivariateStudentTInverseLowRank
Bases: MultivariateDistributionMixin, Distribution
The multivariate \( t \)-distribution using a low-rank approximation (LRA) of the precision (inverse scale) matrix.
The PDF of the multivariate \( t \)-distribution is given by: $$ p(y \mid \mu, D, V, \nu) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)} {\Gamma\left(\frac{\nu}{2}\right) \left(\pi \nu\right)^{k/2}} \cdot \sqrt{\det(D + V V^T)} \left(1 + \frac{1}{\nu} (y - \mu)^T (D + V V^T) (y - \mu)\right)^{-\frac{\nu + k}{2}} $$
where \( k \) is the dimensionality of the data, \( \mu \) is the location parameter, \( D \) is a diagonal matrix, \( V \) is a low-rank matrix such that the precision is \( D + V V^T \), and \( \nu \) is the degrees of freedom.
set_theta_element
Sets an element of theta for parameter param and place k.
Note
This will mutate theta!
Parameters:
- theta (Dict) – Current fitted \(\theta\).
- value (ndarray) – Value to set.
- param (int) – Distribution parameter.
- k (int) – Flat element index \(k\).
Returns:
- Dict – Theta where element (param, k) is set to value.
Base Class
ondil.base.Distribution
Bases: ABC
corresponding_gamlss
property
The name of the corresponding implementation in the gamlss.dist R package.
parameter_names
abstractmethod
property
Parameter name for each column of theta.
parameter_shape
abstractmethod
property
Parameter name for each column of theta.
distribution_support
abstractmethod
property
The support of the distribution.
parameter_support
abstractmethod
property
The support of each parameter of the distribution.
theta_to_params
Take the fitted values and return tuple of vectors for distribution parameters.
dl1_dp1
abstractmethod
Take the first derivative of the likelihood function with respect to the param.
dl2_dp2
abstractmethod
Take the second derivative of the likelihood function with respect to the param.
dl2_dpp
abstractmethod
Take the second (cross) derivative of the likelihood function with respect to both parameters.
link_function
Apply the link function for param on y.
link_inverse
Apply the inverse of the link function for param on y.
link_function_derivative
Apply the derivative of the link function for param on y.
link_inverse_derivative
Apply the derivative of the inverse link function for param on y.
link_function_second_derivative
Apply the second derivative of the link function for param on y.
initial_values
abstractmethod
Calculate the initial values for the GAMLSS fit.
quantile
Compute the quantile function for the given data.
This is an alias for the ppf method.
Parameters:
- q (ndarray) – The probabilities at which to compute the quantiles.
- theta (ndarray) – The parameters of the distribution.
Returns:
- ndarray – The quantiles corresponding to the given probabilities.
calculate_conditional_initial_values
abstractmethod
Calculate the conditional initial values for the GAMLSS fit.
cdf
abstractmethod
Compute the cumulative distribution function (CDF) for the given data.
Parameters:
- y (ndarray) – The data points at which to evaluate the CDF.
- theta (ndarray) – The parameters of the distribution.
Returns:
- ndarray – The CDF evaluated at the given data points.
pdf
abstractmethod
Compute the probability density function (PDF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the PDF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of PDF values corresponding to the data points in y.
pmf
abstractmethod
Compute the probability mass function (PMF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the PMF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of PMF values corresponding to the data points in y.
ppf
abstractmethod
Percent Point Function (Inverse of CDF).
Parameters:
- q (ndarray) – Probabilities at which to evaluate the quantile function.
- theta (ndarray) – Distribution parameters.
Returns:
- ndarray – The quantiles corresponding to the given probabilities.
rvs
abstractmethod
Generate random variates of given size and parameters.
Parameters:
- size (int) – The number of random variates to generate.
- theta (ndarray) – The parameters for the distribution.
Returns:
- ndarray – A 2D array of random variates with shape (theta.shape[0], size).
logcdf
abstractmethod
Compute the log of the cumulative distribution function (CDF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the log CDF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of log CDF values corresponding to the data points in y.
ondil.base.ScipyMixin
Bases: ABC
parameter_names
abstractmethod
property
Parameter name for each column of theta.
scipy_dist
abstractmethod
property
The corresponding scipy.stats distribution.
scipy_names
abstractmethod
property
The names of the parameters in the scipy.stats distribution and the corresponding column in theta.
theta_to_scipy_params
Maps \(\theta\) to the scipy parameters.
Parameters:
- theta (ndarray) – \(\theta\) as estimated by the OnlineDistributionalRegression() estimator.
Raises:
- ValueError – If the scipy_names attribute is not defined.
Returns:
- dict (Dict[str, ndarray]) – Dictionary that can be unrolled into the scipy distribution class, as in st.some_dist(**return_value).
logpmf
Compute the log of the probability mass function (PMF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the log PMF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of log PMF values corresponding to the data points in y.
logpdf
Compute the log of the probability density function (PDF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the log PDF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of log PDF values corresponding to the data points in y.
logcdf
Compute the log of the cumulative distribution function (CDF) for the given data points.
Parameters:
- y (ndarray) – An array of data points at which to evaluate the log CDF.
- theta (ndarray) – An array of parameters for the distribution.
Returns:
- ndarray – An array of log CDF values corresponding to the data points in y.