Estimators

Estimator classes provide an sklearn-like API: models are fitted, updated and used for prediction through the fit, update and predict methods.

Online GAMLSS

ondil.estimators.OnlineDistributionalRegression

Bases: OndilEstimatorMixin, RegressorMixin, BaseEstimator

The online/incremental GAMLSS class.

__init__

__init__(
    distribution: Distribution = Normal(),
    equation: Dict[int, Union[str, ndarray, list]] = None,
    forget: float | Dict[int, float] = 0.0,
    method: Union[
        str,
        EstimationMethod,
        Dict[int, str],
        Dict[int, EstimationMethod],
    ] = "ols",
    scale_inputs: bool | ndarray = True,
    fit_intercept: Union[bool, Dict[int, bool]] = True,
    regularize_intercept: Union[
        bool, Dict[int, bool]
    ] = False,
    ic: Union[str, Dict] = "aic",
    model_selection: Literal[
        "local_rss", "global_ll"
    ] = "local_rss",
    prefit_initial: int = 0,
    prefit_update: int = 0,
    step_size: float | Dict[int, float] = 1.0,
    verbose: int = 0,
    debug: bool = False,
    param_order: ndarray | None = None,
    cautious_updates: bool = False,
    cond_start_val: bool = False,
    max_it_outer: int = 30,
    max_it_inner: int = 30,
    abs_tol_outer: float = 0.001,
    abs_tol_inner: float = 0.001,
    rel_tol_outer: float = 1e-05,
    rel_tol_inner: float = 1e-05,
    min_it_outer: int = 1,
) -> OnlineDistributionalRegression

OnlineDistributionalRegression provides the fit, update and predict methods for linear parametric GAMLSS models.

For a response variable \(Y\) which is distributed according to the distribution \(\mathcal{F}(\theta)\) with the distribution parameters \(\theta\), we model:

\[g_k(\theta_k) = \eta_k = X_k\beta_k\]

where \(g_k(\cdot)\) is a link function that keeps the predicted distribution parameters in a sensible range (for example, it rules out negative standard deviations), and \(\eta_k\) is the predictor on the scale of the link function. The model is fitted using iteratively reweighted least squares (IRLS).
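As a worked illustration, assume a Normal response with an identity link for the mean and a log link for the standard deviation. This is a common choice and is stated here as an assumption, not as the library's configured default. The model then reads:

\[\mu = \eta_1 = X_1\beta_1, \qquad \log(\sigma) = \eta_2 = X_2\beta_2 \quad\Longrightarrow\quad \sigma = \exp(X_2\beta_2) > 0\]

Whatever value the linear predictor \(\eta_2\) takes, the inverse link maps it to a strictly positive standard deviation.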

Tips and Tricks

If you're facing issues with non-convergence and/or matrix inversion problems, please enable the debug mode and raise the logging level by increasing verbose. In debug mode, the estimator saves the weights, working vectors and derivatives of each iteration in a corresponding dictionary, e.g. self._debug_weights. The keys are tuples of ints of the form (parameter, outer_iteration, inner_iteration). Very small and/or very large weights (implicitly second derivatives) can be a sign that either the start values are not chosen appropriately or that the distributional assumption does not fit the data well.
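A minimal sketch of how such a dictionary could be scanned for extreme weights. The dictionary below is a hand-made stand-in for self._debug_weights, and the thresholds low and high as well as the helper find_extreme_weights are hypothetical choices for illustration, not part of the library.

```python
# Stand-in for a debug dictionary: keys are
# (parameter, outer_iteration, inner_iteration), values are per-observation weights.
# The numbers are made up for illustration.
debug_weights = {
    (0, 1, 1): [0.9, 1.1, 1.0],
    (1, 1, 1): [2.0, 1e-12, 1.5],
    (1, 2, 1): [1.8, 3e9, 1.4],
}


def find_extreme_weights(weights_by_key, low=1e-8, high=1e8):
    """Return the keys whose weights fall outside [low, high] (hypothetical thresholds)."""
    flagged = []
    for key, weights in sorted(weights_by_key.items()):
        if min(weights) < low or max(weights) > high:
            flagged.append(key)
    return flagged


suspicious = find_extreme_weights(debug_weights)
for param, it_outer, it_inner in suspicious:
    print(f"parameter={param}, outer={it_outer}, inner={it_inner}: extreme weights")
```

Iterations flagged this way are a reasonable place to start when checking start values or the distributional assumption.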

Debug Mode

Please don't use debug mode for production models: it saves the X matrix and its scaled counterpart, so the estimator objects become large.

Conditional start values cond_start_val=False

The cond_start_val parameter is considered experimental and may not work as expected.

Cautious updates cautious_updates=True

The cautious_updates parameter is considered experimental and may not work as expected.

Parameters:

distribution (Distribution, default: Normal() ) –

The parametric distribution to use for modeling the response variable.
equation (Dict[int, Union[str, ndarray, list]], default: None ) –

The modeling equation for each distribution parameter. The dictionary should map parameter indices to either the strings 'all', 'intercept', a numpy array of column indices, or a list of column names. Defaults to None, which uses all covariates for the first parameter and intercepts for others.
forget (float | Dict[int, float], default: 0.0 ) –

The forget factor for exponential weighting of past observations. Can be a single float for all parameters or a dictionary mapping parameter indices to floats. Defaults to 0.0.
method (str | EstimationMethod | Dict[int, str] | Dict[int, EstimationMethod], default: 'ols' ) –

The estimation method for each parameter. Can be a string, EstimationMethod, or a dictionary mapping parameter indices. Defaults to "ols".
scale_inputs (bool | ndarray, default: True ) –

Whether to scale the input features. Can be a boolean or a numpy array specifying scaling per feature. Defaults to True.
fit_intercept (bool | Dict[int, bool], default: True ) –

Whether to fit an intercept for each parameter. Can be a boolean or a dictionary mapping parameter indices. Defaults to True.
regularize_intercept (bool | Dict[int, bool], default: False ) –

Whether to regularize the intercept for each parameter. Can be a boolean or a dictionary mapping parameter indices. Defaults to False.
ic (str | Dict, default: 'aic' ) –

Information criterion for model selection (e.g., "aic", "bic"). Can be a string or a dictionary mapping parameter indices. Defaults to "aic".
model_selection (Literal['local_rss', 'global_ll'], default: 'local_rss' ) –

Model selection strategy. "local_rss" selects based on local residual sum of squares, "global_ll" uses global log-likelihood. Defaults to "local_rss".
prefit_initial (int, default: 0 ) –

Number of initial outer iterations with only one inner iteration (for stabilization). Defaults to 0.
prefit_update (int, default: 0 ) –

Number of initial outer iterations with only one inner iteration during updates. Defaults to 0.
step_size (float | Dict[int, float], default: 1.0 ) –

Step size for parameter updates. Can be a float or a dictionary mapping parameter indices. Defaults to 1.0.
verbose (int, default: 0 ) –

Verbosity level for logging. 0 = silent, 1 = high-level, 2 = per-parameter, 3 = per-iteration. Defaults to 0.
debug (bool, default: False ) –

Enable debug mode. Debug mode saves additional data to the estimator object. Currently, the following attributes are saved:

self._debug_X_dict
self._debug_X_scaled
self._debug_weights
self._debug_working_vectors
self._debug_dl1dlp1
self._debug_dl2dlp2
self._debug_eta
self._debug_fv
self._debug_coef
self._debug_coef_path

Debug mode works in batch and online settings. Note that debug mode is not recommended for production use. Defaults to False.
param_order (ndarray | None, default: None ) –

Order in which to fit the distribution parameters. Defaults to None (natural order).
cautious_updates (bool, default: False ) –

If True, use smaller step sizes and more iterations when new data are outliers. Defaults to False.
cond_start_val (bool, default: False ) –

If True, use conditional start values for parameters (experimental). Defaults to False.
max_it_outer (int, default: 30 ) –

Maximum number of outer iterations for the fitting algorithm. Defaults to 30.
max_it_inner (int, default: 30 ) –

Maximum number of inner iterations for the fitting algorithm. Defaults to 30.
abs_tol_outer (float, default: 0.001 ) –

Absolute tolerance for convergence in the outer loop. Defaults to 1e-3.
abs_tol_inner (float, default: 0.001 ) –

Absolute tolerance for convergence in the inner loop. Defaults to 1e-3.
rel_tol_outer (float, default: 1e-05 ) –

Relative tolerance for convergence in the outer loop. Defaults to 1e-5.
rel_tol_inner (float, default: 1e-05 ) –

Relative tolerance for convergence in the inner loop. Defaults to 1e-5.
min_it_outer (int, default: 1 ) –

Minimum number of outer iterations before checking for convergence. Defaults to 1.
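To make the forget parameter more concrete, here is a small sketch of exponential weighting of past observations. It assumes the common convention that an observation that is k steps old receives the weight (1 - forget)^k; the function forgetting_weights is hypothetical and the library's internal weighting may differ in detail.

```python
def forgetting_weights(n_obs: int, forget: float) -> list[float]:
    """Weights for n_obs observations ordered from oldest to newest.

    Assumed convention: weight = (1 - forget) ** age, with age 0 for the newest row.
    """
    if not 0.0 <= forget < 1.0:
        raise ValueError("forget must be in [0, 1)")
    return [(1.0 - forget) ** (n_obs - 1 - i) for i in range(n_obs)]


# forget=0.0 (the default) keeps every observation at full weight.
no_forgetting = forgetting_weights(4, forget=0.0)

# forget=0.1 discounts each older observation by a further factor of 0.9.
with_forgetting = forgetting_weights(4, forget=0.1)
```

Under this convention, a larger forget value makes the fit react faster to recent data at the cost of a shorter effective history.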

Attributes:

distribution (Distribution) –

The distribution used for modeling.
equation (Dict[int, Union[str, ndarray, list]]) –

The modeling equation for each distribution parameter.
forget (Dict[int, float]) –

Forget factor for each distribution parameter.
fit_intercept (Dict[int, bool]) –

Whether to fit an intercept for each parameter.
regularize_intercept (Dict[int, bool]) –

Whether to regularize the intercept for each parameter.
ic (Dict[int, str]) –

Information criterion for model selection for each parameter.
method (Dict[int, EstimationMethod]) –

Estimation method for each parameter.
scale_inputs (bool | ndarray) –

Whether to scale the input features.
param_order (ndarray | None) –

Order in which to fit the distribution parameters.
n_observations_ (float) –

Total number of observations used for fitting.
n_training_ (Dict[int, int]) –

Effective training length for each distribution parameter.
n_features_ (Dict[int, int]) –

Number of features used for each distribution parameter.
coef_ (ndarray) –

Coefficients for the fitted model, shape (n_params, n_features).
coef_path_ (ndarray) –

Coefficients path for the fitted model, shape (n_params, n_iterations, n_features). Only available if method is a path-based method like LASSO.

Returns:

OnlineDistributionalRegression ( OnlineDistributionalRegression ) –

The OnlineDistributionalRegression instance.

fit

fit(
    X: ndarray,
    y: ndarray,
    sample_weight: Optional[ndarray] = None,
) -> OnlineDistributionalRegression

Fit the online GAMLSS model.

This method initializes the model with the given covariate data matrix \(X\) and response variable \(Y\).

Parameters:

X (ndarray) –

Covariate data matrix \(X\).
y (ndarray) –

Response variable \(Y\).
sample_weight (Optional[ndarray], default: None ) –

User-defined sample weights. Defaults to None.

Returns:

OnlineDistributionalRegression ( OnlineDistributionalRegression ) –

The fitted OnlineDistributionalRegression instance.

Raises:

ValueError –

If the equation is not specified correctly.
OutOfSupportError –

If the values of \(y\) are below or above the distribution's support.

update

update(
    X: ndarray,
    y: ndarray,
    sample_weight: Optional[ndarray] = None,
)

Update the fit of the online GAMLSS model.

Parameters:

X (ndarray) –

Covariate data matrix \(X\).
y (ndarray) –

Response variable \(Y\).
sample_weight (Optional[ndarray], default: None ) –

User-defined sample weights. Defaults to None (all observations have the same weight).

predict

predict(X: ndarray) -> np.ndarray

Predict the mean of the response distribution.

Parameters:

X (ndarray) –

Covariate matrix \(X\). Shape should be (n_samples, n_features).

Raises:

NotFittedError –

If the model is not fitted yet.

Returns:

Predictions ( ndarray ) –

Predicted mean of the response distribution.

predict_median

predict_median(X: ndarray)

Predict the median of the distribution.

Parameters:

X (ndarray) –

Covariate matrix \(X\). Shape should be (n_samples, n_features).

Raises:

NotFittedError –

If the model is not fitted yet.

Returns:

Predictions ( ndarray ) –

Predicted median of the distribution. Shape will be (n_samples,).

predict_distribution_parameters

predict_distribution_parameters(
    X: ndarray,
    what: str = "response",
    return_contributions: bool = False,
) -> np.ndarray

Predict the distribution parameters given input data.

Parameters:

X (ndarray) –

Design matrix.
what (str, default: 'response' ) –

Whether to predict on the response scale or on the link scale. Defaults to "response". Recall that GAMLSS models \(g(\theta) = X\beta\): what="link" returns \(X\beta\), and what="response" returns \(g^{-1}(X\beta)\). Usually, you want what="response".
return_contributions (bool, default: False ) –

Whether to return a Tuple[prediction, contributions] where the contributions of the individual covariates for each distribution parameter's predicted value is specified. Defaults to False.

Raises:

ValueError –

If what is not one of "link" or "response".

Returns:

Predictions ( ndarray ) –

Predicted values for the distribution of shape (n_samples, n_params) where n_params is the number of distribution parameters.
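The difference between the two values of what can be sketched without the library. The inverse links below (identity for parameter 0, exponential for parameter 1) and the helper predict_parameters are assumptions chosen for illustration; the actual links depend on the distribution in use.

```python
import math

# Hypothetical inverse links for a two-parameter distribution:
# identity for parameter 0 (location), exp for parameter 1 (scale with a log link).
INVERSE_LINKS = {0: lambda eta: eta, 1: math.exp}


def predict_parameters(eta_rows, what="response"):
    """Map rows of linear predictors to the link scale or the response scale."""
    if what not in ("link", "response"):
        raise ValueError('what must be "link" or "response"')
    if what == "link":
        return [list(row) for row in eta_rows]
    return [[INVERSE_LINKS[k](eta) for k, eta in enumerate(row)] for row in eta_rows]


eta = [[1.5, 0.0], [2.0, -1.0]]  # shape (n_samples, n_params)
on_link = predict_parameters(eta, what="link")
on_response = predict_parameters(eta, what="response")
```

On the link scale the second column may be negative; on the response scale it is always positive, which is what a scale parameter requires.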

predict_quantile

predict_quantile(
    X: ndarray, quantile: float | ndarray
) -> np.ndarray

Predict the quantile(s) of the distribution.

Parameters:

X (ndarray) –

Covariate matrix \(X\). Shape should be (n_samples, n_features).
quantile (float | ndarray) –

Quantile(s) to predict.

Returns:

ndarray –

Predicted quantile(s) of the distribution. Shape will be (n_samples, n_quantiles).
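The (n_samples, n_quantiles) layout can be illustrated with the standard library alone. The sketch assumes a Normal distribution whose mean and standard deviation have already been predicted on the response scale; normal_quantiles is a hypothetical helper, not a library function.

```python
from statistics import NormalDist


def normal_quantiles(params, quantiles):
    """params: rows of (mean, standard deviation).

    Returns nested lists of shape (n_samples, n_quantiles).
    """
    return [
        [NormalDist(mu, sigma).inv_cdf(q) for q in quantiles]
        for mu, sigma in params
    ]


predicted_params = [(0.0, 1.0), (10.0, 2.0)]  # made-up predictions
result = normal_quantiles(predicted_params, [0.1, 0.5, 0.9])
```

Each row of result holds the requested quantiles for one sample, and the 0.5 quantile coincides with the mean because the Normal distribution is symmetric.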

get_debug_information

get_debug_information(
    variable: str = "coef",
    param: int = 0,
    it_outer: int = 1,
    it_inner: int = 1,
)

Get debug information for a specific variable, parameter, outer iteration and inner iteration.

We currently support the following variables:

"X_dict": The design matrix for the distribution parameter.
"X_scaled": The scaled design matrix.
"weights": The sample weights for the distribution parameter.
"working_vectors": The working vectors for the distribution parameter.
"dl1dlp1": The first derivative of the log-likelihood with respect to the distribution parameter.
"dl2dlp2": The second derivative of the log-likelihood with respect to the distribution parameter.
"eta": The linear predictor for the distribution parameter.
"fv": The fitted values for the distribution parameter.
"dv": The deviance for the distribution parameter.
"coef": The coefficients for the distribution parameter.
"coef_path": The coefficients path for the distribution parameter.

Parameters:

variable (str, default: 'coef' ) –

The variable to get debug information for. Defaults to "coef".
param (int, default: 0 ) –

The distribution parameter to get debug information for. Defaults to 0.
it_outer (int, default: 1 ) –

The outer iteration to get debug information for. Defaults to 1.
it_inner (int, default: 1 ) –

The inner iteration to get debug information for. Defaults to 1.

Returns:

Any –

The debug information for the specified variable, parameter, outer iteration and inner iteration.

Raises:

ValueError –

If debug mode is not enabled.

Linear Models

ondil.estimators.OnlineLinearModel

Bases: OndilEstimatorMixin, RegressorMixin, BaseEstimator

Simple Online Linear Regression for the expected value.

__init__

__init__(
    forget: float = 0.0,
    scale_inputs: bool | ndarray = True,
    fit_intercept: bool = True,
    regularize_intercept: bool = False,
    method: EstimationMethod | str = "ols",
    ic: Literal["aic", "bic", "hqc", "max"] = "bic",
)

The basic linear model for many different estimation techniques.

Parameters:

forget (float, default: 0.0 ) –

Exponential discounting of old observations. Defaults to 0.
scale_inputs (bool, default: True ) –

Whether to scale the \(X\) matrix. Defaults to True.
fit_intercept (bool, default: True ) –

Whether to add an intercept in the estimation. Defaults to True.
regularize_intercept (bool, default: False ) –

Whether to regularize the intercept. Defaults to False.
method (EstimationMethod | str, default: 'ols' ) –

The estimation method. Can be a string or EstimationMethod class. Defaults to "ols".
ic (Literal['aic', 'bic', 'hqc', 'max'], default: 'bic' ) –

The information criteria for model selection. Defaults to "bic".

Raises:

ValueError –

If you try to regularize the intercept without fitting it.

fit

fit(
    X: ndarray,
    y: ndarray,
    sample_weight: Optional[ndarray] = None,
) -> OnlineLinearModel

Initial fit of the online regression model.

Parameters:

X (ndarray) –

The design matrix \(X\).
y (ndarray) –

The response vector \(y\).
sample_weight (Optional[ndarray], default: None ) –

The sample weights. Defaults to None.

update

update(
    X: ndarray,
    y: ndarray,
    sample_weight: Optional[ndarray] = None,
) -> None

Update the regression model.

Parameters:

X (ndarray) –

The new row of the design matrix \(X\). Needs to be of shape 1 x n_features or n_obs_new x n_features.
y (ndarray) –

The new observation of \(y\). Needs to be the same shape as X or a single observation.
sample_weight (Optional[ndarray], default: None ) –

The weight for the new observations. None implies all observations have weight 1. Defaults to None.
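The idea of updating a fit row by row, with optional discounting and sample weights, can be sketched with running sums for a regression on a single feature. This is a hand-rolled illustration and not the library's implementation: the class RunningSimpleRegression and the convention of multiplying the old sums by 1 - forget before adding each new row are assumptions.

```python
class RunningSimpleRegression:
    """Weighted least squares for y = a + b * x, stored as running sums."""

    def __init__(self, forget=0.0):
        self.gamma = 1.0 - forget  # discount applied to the old sums per new row
        self.sw = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys, sample_weight=None):
        weights = sample_weight if sample_weight is not None else [1.0] * len(xs)
        for x, y, w in zip(xs, ys, weights):
            # Discount the past, then add the new observation.
            self.sw = self.gamma * self.sw + w
            self.sx = self.gamma * self.sx + w * x
            self.sy = self.gamma * self.sy + w * y
            self.sxx = self.gamma * self.sxx + w * x * x
            self.sxy = self.gamma * self.sxy + w * x * y

    @property
    def coef(self):
        """Return (intercept, slope) from the weighted normal equations."""
        det = self.sw * self.sxx - self.sx ** 2
        slope = (self.sw * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.sw
        return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]

# One pass over all rows at once.
batch = RunningSimpleRegression()
batch.update(xs, ys)

# Initial rows first, then a single new row, then another.
online = RunningSimpleRegression()
online.update(xs[:3], ys[:3])
online.update(xs[3:4], ys[3:4])
online.update(xs[4:], ys[4:])
```

Without forgetting, the incrementally updated coefficients match the ones obtained from all rows at once, which is the property an online estimator relies on.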

score

score(X: ndarray, y: ndarray) -> float

Calculate the coefficient of determination \(R^2\).

Parameters:

X (ndarray) –

The design matrix \(X\).
y (ndarray) –

The response vector \(y\).

Returns:

float ( float ) –

The coefficient of determination \(R^2\).
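For reference, the standard unweighted definition \(R^2 = 1 - SS_{res} / SS_{tot}\) can be written in a few lines. The helper r_squared is a plain illustration of that formula; whether the estimator applies sample weights or other adjustments is not stated here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)            # predictions equal to the observations
mean_only = r_squared(y, [2.5] * 4)  # always predicting the mean of y
```

A perfect fit scores 1, while a model that always predicts the mean of y scores 0.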

predict

predict(X: ndarray) -> np.ndarray

Predict using the optimal IC selection.

Parameters:

X (ndarray) –

The design matrix \(X\).

Returns:

ndarray –

The predictions for the optimal IC.
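Selecting a model along a path by information criterion can be sketched as follows. The formulas are the usual Gaussian forms of AIC, BIC and HQC and are an assumption about the general approach, not a transcription of the library's code; the "max" option is left out because its meaning is not defined in this section. The function information_criterion and all numbers are made up for illustration.

```python
import math


def information_criterion(rss, n_obs, n_nonzero, ic="bic"):
    """Gaussian-likelihood information criterion for one model on the path."""
    fit_term = n_obs * math.log(rss / n_obs)
    penalties = {
        "aic": 2.0 * n_nonzero,
        "bic": math.log(n_obs) * n_nonzero,
        "hqc": 2.0 * math.log(math.log(n_obs)) * n_nonzero,
    }
    return fit_term + penalties[ic]


# Made-up path: residual sum of squares and number of non-zero coefficients
# for each regularization strength, from strongest to weakest.
rss_path = [120.0, 60.0, 40.0, 39.5, 39.4]
nonzero_path = [1, 2, 3, 4, 5]
n_obs = 50

scores = [
    information_criterion(rss, n_obs, k, ic="bic")
    for rss, k in zip(rss_path, nonzero_path)
]
best_index = min(range(len(scores)), key=scores.__getitem__)
```

In this toy path the criterion settles on the third model: adding further coefficients barely reduces the residual sum of squares, so the penalty term dominates.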

predict_path

predict_path(X: ndarray) -> np.ndarray

Predict the full regularization path.

Parameters:

X (ndarray) –

The design matrix \(X\).

Returns:

ndarray –

The predictions for the full path.

ondil.estimators.OnlineLasso

Bases: OnlineLinearModel

__init__

__init__(
    forget: float = 0,
    scale_inputs: bool = True,
    fit_intercept: bool = True,
    regularize_intercept: bool = False,
    ic: Literal["aic", "bic", "hqc", "max"] = "bic",
    early_stop: int = 0,
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    lambda_n: int = 100,
    lambda_eps: float = 0.0001,
    start_value: str = "previous_fit",
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
    selection: Literal["cyclic", "random"] = "cyclic",
)

Online LASSO estimator class.

This class initializes the online linear regression fitted using LASSO. The estimator object provides three main methods, estimator.fit(X, y), estimator.update(X, y) and estimator.predict(X).

Parameters:

forget (float, default: 0 ) –

Exponential discounting of old observations. Defaults to 0.
scale_inputs (bool, default: True ) –

Whether to scale the \(X\) matrix. Defaults to True.
fit_intercept (bool, default: True ) –

Whether to add an intercept in the estimation. Defaults to True.
regularize_intercept (bool, default: False ) –

Whether to regularize the intercept. Defaults to False.
ic (Literal['aic', 'bic', 'hqc', 'max'], default: 'bic' ) –

The information criteria for model selection. Defaults to "bic".
early_stop (int, default: 0 ) –

Early stopping criterion. If we reach early_stop non-zero coefficients, we stop. Defaults to 0 (no early stopping).
beta_lower_bound (ndarray | None, default: None ) –

Lower bounds for beta. Keep in mind the size of X and whether you want to fit an intercept. None corresponds to unconstrained estimation. Defaults to None.
beta_upper_bound (ndarray | None, default: None ) –

Upper bounds for beta. Keep in mind the size of X and whether you want to fit an intercept. None corresponds to unconstrained estimation. Defaults to None.
lambda_n (int, default: 100 ) –

Length of the regularization path. Defaults to 100.
lambda_eps (float, default: 0.0001 ) –

The largest regularization \(\lambda^{\max}\) is determined automatically such that the solution is fully regularized. The smallest regularization is taken as \(\varepsilon \lambda^{\max}\), and an exponential grid is used in between. Defaults to 1e-4.
start_value (str, default: 'previous_fit' ) –

Whether to choose the previous fit or the previous regularization as start value. Defaults to "previous_fit".
tolerance (float, default: 0.0001 ) –

Convergence tolerance for stopping the coordinate descent (CD). Defaults to 1e-4.
max_iterations (int, default: 1000 ) –

Max number of CD iterations. Defaults to 1000.
selection (Literal['cyclic', 'random'], default: 'cyclic' ) –

Whether to cycle through all coordinates in order or at random. For large problems, random selection might speed up convergence. Defaults to "cyclic".
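The regularization grid described by lambda_n and lambda_eps can be sketched as follows. The value of lambda_max is a made-up input here, since the estimator determines it from the data, and the helper lambda_grid is hypothetical.

```python
import math


def lambda_grid(lambda_max, lambda_n=100, lambda_eps=1e-4):
    """Exponentially spaced grid from lambda_max down to lambda_eps * lambda_max."""
    if lambda_n < 2:
        return [lambda_max]
    log_max = math.log(lambda_max)
    log_min = math.log(lambda_eps * lambda_max)
    step = (log_min - log_max) / (lambda_n - 1)
    return [math.exp(log_max + i * step) for i in range(lambda_n)]


grid = lambda_grid(lambda_max=2.5)  # 100 values between 2.5 and 2.5e-4
```

Spacing the grid evenly on the log scale places more points at small regularization strengths, where the coefficient path typically changes the most.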