# Estimation Methods

## Overview

`EstimationMethod` classes do the actual heavy lifting of fitting coefficients (or weights). They take the more technical parameters, such as the length of the regularization path or upper bounds on certain coefficients; which parameters are available depends on the individual estimation method. In general, we aim to provide sensible out-of-the-box defaults. This page explains the available methods in detail. `Estimator` classes often take a `method` parameter, to which either a string or an instance of an `EstimationMethod` can be passed, e.g.
```python
from ondil import OnlineLinearModel, LassoPathMethod

fit_intercept = True
scale_inputs = True

model = OnlineLinearModel(
    method="lasso",  # default parameters
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)

# or equivalently
model = OnlineLinearModel(
    method=LassoPathMethod(),  # default parameters
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)

# or with user-defined parameters
model = OnlineLinearModel(
    method=LassoPathMethod(lambda_n=10),  # only 10 different regularization strengths
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)
```
More information on coordinate descent can be found in the API Reference below.
## API Reference

!!! note
    We don't document the classmethods of `EstimationMethod`, since these are only used internally.
### ondil.OrdinaryLeastSquaresMethod

Bases: `EstimationMethod`

Simple ordinary least squares, respectively recursive least squares for online updates. Takes no further parameters.
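A minimal usage sketch:

```python
from ondil import OnlineLinearModel, OrdinaryLeastSquaresMethod

# OLS takes no tuning parameters, so the default instance suffices.
model = OnlineLinearModel(
    method=OrdinaryLeastSquaresMethod(),
    fit_intercept=True,
)
```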
### ondil.LassoPathMethod

Bases: `ElasticNetPathMethod`

Path-based lasso estimation.

The lasso method runs coordinate descent along a geometrically decreasing grid of regularization strengths (lambdas). We automatically calculate the maximum regularization strength \(\lambda_\max\) for which all (non-regularized) coefficients are 0. The lower end of the lambda grid is defined as

$$\lambda_\min = \lambda_\max \cdot \varepsilon_\lambda.$$
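For intuition, such a geometric grid can be constructed as follows. This is only a sketch of the construction, not `ondil`'s internal code, and `lambda_max` is a placeholder (the library derives it from the data):

```python
import numpy as np

lambda_max = 1.0    # placeholder; ondil computes this value from the data
lambda_n = 100      # number of grid points
lambda_eps = 1e-4   # ratio lambda_min / lambda_max

# Geometrically decreasing grid from lambda_max down to lambda_max * lambda_eps.
lambdas = lambda_max * lambda_eps ** (np.arange(lambda_n) / (lambda_n - 1))
```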
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
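For example, with two features and an intercept, non-negativity constraints could be passed like this (a sketch; all entries are equal here, so the position of the intercept within the array does not matter):

```python
import numpy as np
from ondil import OnlineLinearModel, LassoPathMethod

n_features = 2  # assumed number of columns in X

# One bound per coefficient: the intercept plus each feature.
method = LassoPathMethod(
    beta_lower_bound=np.zeros(n_features + 1),         # non-negative coefficients
    beta_upper_bound=np.full(n_features + 1, np.inf),  # effectively unbounded above
)

model = OnlineLinearModel(method=method, fit_intercept=True)
```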
Furthermore, we allow choosing the start value, i.e. whether an update is warm-started on the previous fit's path, on the previous regularization strength, or on an average of both. If your data-generating process is rather stable, `"previous_fit"` should give considerable speed gains, since warm-starting on the previous strength is effectively batch fitting.
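Spelled out explicitly, this warm-start behaviour corresponds to the defaults documented below:

```python
from ondil import LassoPathMethod

method = LassoPathMethod(
    start_value_initial="previous_lambda",  # first fit: warm-start along the path
    start_value_update="previous_fit",      # updates: warm-start on the previous fit's path
)
```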
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update for each regularization strength, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    lambda_n: int = 100,
    lambda_eps: float = 0.0001,
    early_stop: int = 0,
    start_value_initial: Literal["previous_lambda", "previous_fit", "average"] = "previous_lambda",
    start_value_update: Literal["previous_lambda", "previous_fit", "average"] = "previous_fit",
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the lasso method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `lambda_n` | `int` | Number of lambda values to use in the path. | `100` |
| `lambda_eps` | `float` | Minimum lambda value as a fraction of the maximum lambda. | `0.0001` |
| `early_stop` | `int` | Early stopping criterion: the path stops once this number of non-zero parameters is reached; `0` disables early stopping. | `0` |
| `start_value_initial` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to initialize the start value for the first lambda. | `'previous_lambda'` |
| `start_value_update` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to update the start value for subsequent lambdas. | `'previous_fit'` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the path. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |
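Putting several of these parameters together, a configured instance might look like this (a sketch using only the parameters documented above):

```python
from ondil import OnlineLinearModel, LassoPathMethod

method = LassoPathMethod(
    lambda_n=50,         # shorter regularization path
    lambda_eps=1e-3,     # raise the lower end of the lambda grid
    early_stop=10,       # stop once 10 coefficients are non-zero
    selection="random",  # cycle through coordinates in random order
)
model = OnlineLinearModel(method=method, fit_intercept=True)
```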
### ondil.RidgeMethod

Bases: `EstimationMethod`

Single-lambda ridge estimation.

The ridge method runs coordinate descent for a single regularization strength (lambda).
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    lambda_reg: float | None = None,
    start_beta: ndarray | None = None,
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the Ridge method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `lambda_reg` | `float \| None` | Regularization parameter; must be greater than 0. Higher values lead to more regularization. If not set, the average variance of the features is used as the default. | `None` |
| `start_beta` | `ndarray \| None` | Start value for the coefficients. | `None` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the optimization. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |
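A short usage sketch; if `lambda_reg` is omitted, the average variance of the features is used, as documented above:

```python
from ondil import OnlineLinearModel, RidgeMethod

model = OnlineLinearModel(
    method=RidgeMethod(lambda_reg=0.1),  # fixed regularization strength
    fit_intercept=True,
)
```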
### ondil.ElasticNetPathMethod

Bases: `EstimationMethod`

Path-based elastic net estimation.

The elastic net method runs coordinate descent along a geometrically decreasing grid of regularization strengths (lambdas). We automatically calculate the maximum regularization strength \(\lambda_\max\) for which all (non-regularized) coefficients are 0. The lower end of the lambda grid is defined as

$$\lambda_\min = \lambda_\max \cdot \varepsilon_\lambda.$$

The elastic net is a combination of LASSO and ridge regression. The parameter \(\alpha\) controls the balance between the two: \(\alpha = 0\) corresponds to ridge regression and \(\alpha = 1\) to LASSO regression.
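For example, an equal mix of the two penalties (a sketch; `alpha` is the only required argument, see the signature below):

```python
from ondil import OnlineLinearModel, ElasticNetPathMethod

model = OnlineLinearModel(
    method=ElasticNetPathMethod(alpha=0.5),  # halfway between ridge and LASSO
    fit_intercept=True,
)
```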
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
Furthermore, we allow choosing the start value, i.e. whether an update is warm-started on the previous fit's path, on the previous regularization strength, or on an average of both. If your data-generating process is rather stable, `"previous_fit"` should give considerable speed gains, since warm-starting on the previous strength is effectively batch fitting.
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update for each regularization strength, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    alpha: float,
    lambda_n: int = 100,
    lambda_eps: float = 0.0001,
    early_stop: int = 0,
    start_value_initial: Literal["previous_lambda", "previous_fit", "average"] = "previous_lambda",
    start_value_update: Literal["previous_lambda", "previous_fit", "average"] = "previous_fit",
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the ElasticNet method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `alpha` | `float` | Balance between ridge (\(\alpha = 0\)) and LASSO (\(\alpha = 1\)) regression. | *required* |
| `lambda_n` | `int` | Number of lambda values to use in the path. | `100` |
| `lambda_eps` | `float` | Minimum lambda value as a fraction of the maximum lambda. | `0.0001` |
| `early_stop` | `int` | Early stopping criterion: the path stops once this number of non-zero parameters is reached; `0` disables early stopping. | `0` |
| `start_value_initial` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to initialize the start value for the first lambda. | `'previous_lambda'` |
| `start_value_update` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to update the start value for subsequent lambdas. | `'previous_fit'` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the path. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |