# Estimation Methods

## Overview

`EstimationMethod` classes do the actual heavy lifting of fitting coefficients (or weights). They take the more technical parameters, such as the length of the regularization path or upper bounds on certain coefficients; which parameters are available depends on the individual estimation method. In general, we aim to provide sensible out-of-the-box defaults. This page explains the available methods in detail. `Estimator` classes often take a `method` parameter, to which either a string or an instance of an `EstimationMethod` can be passed, e.g.
```python
from ondil import OnlineLinearModel, LassoPathMethod

fit_intercept = True
scale_inputs = True

model = OnlineLinearModel(
    method="lasso",  # default parameters
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)

# or equivalently
model = OnlineLinearModel(
    method=LassoPathMethod(),  # default parameters
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)

# or with user-defined parameters
model = OnlineLinearModel(
    method=LassoPathMethod(lambda_n=10),  # only 10 different regularization strengths
    fit_intercept=fit_intercept,
    scale_inputs=scale_inputs,
)
```
More information on coordinate descent can be found in the API Reference below.
## API Reference

!!! note
    We don't document the classmethods of `EstimationMethod`, since these are only used internally.
### ondil.OrdinaryLeastSquaresMethod

Bases: `EstimationMethod`

Simple ordinary least squares, respectively recursive least squares for online updates. Takes no further parameters.
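A minimal usage sketch:

```python
from ondil import OnlineLinearModel, OrdinaryLeastSquaresMethod

# OLS takes no tuning parameters, so the default instance suffices.
model = OnlineLinearModel(
    method=OrdinaryLeastSquaresMethod(),
    fit_intercept=True,
)
```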
### ondil.LassoPathMethod

Bases: `ElasticNetPathMethod`

Path-based lasso estimation.

The lasso method runs coordinate descent along a geometrically decreasing grid of regularization strengths (lambdas). We automatically calculate the maximum regularization strength \(\lambda_\max\) for which all (non-regularized) coefficients are 0. The lower end of the lambda grid is defined as

$$\lambda_\min = \lambda_\max \cdot \varepsilon_\lambda.$$
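For intuition, such a geometric grid can be constructed as follows. This is only a sketch of the construction, not `ondil`'s internal code, and `lambda_max` is a placeholder (the library derives it from the data):

```python
import numpy as np

lambda_max = 1.0    # placeholder; ondil computes this value from the data
lambda_n = 100      # number of grid points
lambda_eps = 1e-4   # ratio lambda_min / lambda_max

# Geometrically decreasing grid from lambda_max down to lambda_max * lambda_eps.
lambdas = lambda_max * lambda_eps ** (np.arange(lambda_n) / (lambda_n - 1))
```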
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
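For example, with two features and an intercept, non-negativity constraints could be passed like this (a sketch; all entries are equal here, so the position of the intercept within the array does not matter):

```python
import numpy as np
from ondil import OnlineLinearModel, LassoPathMethod

n_features = 2  # assumed number of columns in X

# One bound per coefficient: the intercept plus each feature.
method = LassoPathMethod(
    beta_lower_bound=np.zeros(n_features + 1),         # non-negative coefficients
    beta_upper_bound=np.full(n_features + 1, np.inf),  # effectively unbounded above
)

model = OnlineLinearModel(method=method, fit_intercept=True)
```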
Furthermore, we allow choosing the start value, i.e. whether an update is warm-started on the previous fit's path, on the previous regularization strength, or on an average of both. If your data-generating process is rather stable, `"previous_fit"` should give considerable speed gains, since warm-starting on the previous strength is effectively batch fitting.
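Spelled out explicitly, this warm-start behaviour corresponds to the defaults documented below:

```python
from ondil import LassoPathMethod

method = LassoPathMethod(
    start_value_initial="previous_lambda",  # first fit: warm-start along the path
    start_value_update="previous_fit",      # updates: warm-start on the previous fit's path
)
```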
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update for each regularization strength, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    lambda_n: int = 100,
    lambda_eps: float = 0.0001,
    early_stop: int = 0,
    start_value_initial: Literal["previous_lambda", "previous_fit", "average"] = "previous_lambda",
    start_value_update: Literal["previous_lambda", "previous_fit", "average"] = "previous_fit",
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the lasso method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `lambda_n` | `int` | Number of lambda values to use in the path. | `100` |
| `lambda_eps` | `float` | Minimum lambda value as a fraction of the maximum lambda. | `0.0001` |
| `early_stop` | `int` | Early stopping criterion: the path stops once this number of non-zero parameters is reached; `0` disables early stopping. | `0` |
| `start_value_initial` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to initialize the start value for the first lambda. | `'previous_lambda'` |
| `start_value_update` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to update the start value for subsequent lambdas. | `'previous_fit'` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the path. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |
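Putting several of these parameters together, a configured instance might look like this (a sketch using only the parameters documented above):

```python
from ondil import OnlineLinearModel, LassoPathMethod

method = LassoPathMethod(
    lambda_n=50,         # shorter regularization path
    lambda_eps=1e-3,     # raise the lower end of the lambda grid
    early_stop=10,       # stop once 10 coefficients are non-zero
    selection="random",  # cycle through coordinates in random order
)
model = OnlineLinearModel(method=method, fit_intercept=True)
```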
### ondil.RidgeMethod

Bases: `EstimationMethod`

Single-lambda ridge estimation.

The ridge method runs coordinate descent for a single regularization strength (lambda).
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    lambda_reg: float | None = None,
    start_beta: ndarray | None = None,
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the Ridge method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `lambda_reg` | `float \| None` | Regularization parameter; must be greater than 0. Higher values lead to more regularization. If not set, the average variance of the features is used as the default. | `None` |
| `start_beta` | `ndarray \| None` | Start value for the coefficients. | `None` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the optimization. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |
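A short usage sketch; if `lambda_reg` is omitted, the average variance of the features is used, as documented above:

```python
from ondil import OnlineLinearModel, RidgeMethod

model = OnlineLinearModel(
    method=RidgeMethod(lambda_reg=0.1),  # fixed regularization strength
    fit_intercept=True,
)
```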
### ondil.ElasticNetPathMethod

Bases: `EstimationMethod`

Path-based elastic net estimation.

The elastic net method runs coordinate descent along a geometrically decreasing grid of regularization strengths (lambdas). We automatically calculate the maximum regularization strength \(\lambda_\max\) for which all (non-regularized) coefficients are 0. The lower end of the lambda grid is defined as

$$\lambda_\min = \lambda_\max \cdot \varepsilon_\lambda.$$

The elastic net is a combination of LASSO and ridge regression. The parameter \(\alpha\) controls the balance between the two: \(\alpha = 0\) corresponds to ridge regression and \(\alpha = 1\) to LASSO regression.
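For example, an equal mix of the two penalties (a sketch; `alpha` is the only required argument, see the signature below):

```python
from ondil import OnlineLinearModel, ElasticNetPathMethod

model = OnlineLinearModel(
    method=ElasticNetPathMethod(alpha=0.5),  # halfway between ridge and LASSO
    fit_intercept=True,
)
```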
We allow passing user-defined lower and upper bounds for the coefficients. The bounds must be a `numpy` array whose length equals the number of variables in the equation, plus one for the intercept if you fit one. This allows box-constraining the coefficients to a certain range.
Furthermore, we allow choosing the start value, i.e. whether an update is warm-started on the previous fit's path, on the previous regularization strength, or on an average of both. If your data-generating process is rather stable, `"previous_fit"` should give considerable speed gains, since warm-starting on the previous strength is effectively batch fitting.
Lastly, there are some rather technical parameters, such as the maximum number of coordinate descent iterations, whether to cycle through the coordinates randomly, and the tolerance at which to stop. We use active-set iterations, i.e. after the first coordinate-wise update for each regularization strength, only non-zero coefficients are updated. We use `numba` to speed up the coordinate descent algorithm.
#### __init__

```python
__init__(
    alpha: float,
    lambda_n: int = 100,
    lambda_eps: float = 0.0001,
    early_stop: int = 0,
    start_value_initial: Literal["previous_lambda", "previous_fit", "average"] = "previous_lambda",
    start_value_update: Literal["previous_lambda", "previous_fit", "average"] = "previous_fit",
    selection: Literal["cyclic", "random"] = "cyclic",
    beta_lower_bound: ndarray | None = None,
    beta_upper_bound: ndarray | None = None,
    tolerance: float = 0.0001,
    max_iterations: int = 1000,
)
```
Initializes the ElasticNet method with the specified parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `alpha` | `float` | Balance between ridge (\(\alpha = 0\)) and LASSO (\(\alpha = 1\)) regression. | *required* |
| `lambda_n` | `int` | Number of lambda values to use in the path. | `100` |
| `lambda_eps` | `float` | Minimum lambda value as a fraction of the maximum lambda. | `0.0001` |
| `early_stop` | `int` | Early stopping criterion: the path stops once this number of non-zero parameters is reached; `0` disables early stopping. | `0` |
| `start_value_initial` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to initialize the start value for the first lambda. | `'previous_lambda'` |
| `start_value_update` | `Literal['previous_lambda', 'previous_fit', 'average']` | Method to update the start value for subsequent lambdas. | `'previous_fit'` |
| `selection` | `Literal['cyclic', 'random']` | Method to select features during the path. | `'cyclic'` |
| `beta_lower_bound` | `ndarray \| None` | Lower bound for the coefficients. | `None` |
| `beta_upper_bound` | `ndarray \| None` | Upper bound for the coefficients. | `None` |
| `tolerance` | `float` | Tolerance for the optimization. | `0.0001` |
| `max_iterations` | `int` | Maximum number of iterations for the optimization. | `1000` |