sklearn_utilities package
- class sklearn_utilities.AppendPredictionToX(estimators: Sequence[TEstimator] | TEstimator, *, concat: bool = True, n_jobs: int | None = -1)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of the estimators to X.
- estimators: Sequence[TEstimator]
- estimators_: Sequence[TEstimator]
- transform(X: TX, y: Any = None, **predict_params: Any) TX [source]
Append the prediction of the estimators to X.
If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}_{estimator_index}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}_{estimator_index}” if the prediction is 2D.
- class sklearn_utilities.AppendPredictionToXSingle(estimator: TEstimator, *, concat: bool = True)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of the estimator to X. To use multiple estimators, use AppendPredictionToX instead.
- estimator: TEstimator
- estimator_: TEstimator
The fitted estimator.
- transform(X: TX, y: Any = None, **predict_params: Any) TX [source]
Append the prediction of the estimator to X. If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}” if the prediction is 2D.
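The naming pattern above can be illustrated with a small sketch. The helper `prediction_column_names` and the stand-in class `Ridge` are hypothetical and not part of sklearn_utilities; the meaning of `{i}` is not stated in this reference, so the sketch simply passes it through.

```python
def prediction_column_names(estimator, i, column_names=None):
    """Build column names following the documented pattern.

    `i` is passed through unchanged (the reference does not say what it counts).
    `column_names` is None for a 1D prediction, otherwise the columns of a 2D prediction.
    """
    base = f"y_pred_{estimator.__class__.__name__}_{i}"
    if column_names is None:
        return [base]
    return [f"{base}_{name}" for name in column_names]


class Ridge:
    """Empty stand-in class so the example needs no third-party import."""


names_1d = prediction_column_names(Ridge(), 0)
names_2d = prediction_column_names(Ridge(), 0, ["a", "b"])
print(names_1d)
print(names_2d)
```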
- class sklearn_utilities.AppendXPredictionToX(estimator: TEstimator, *, variables: Sequence[Hashable] | None = None, append: bool = True, append_pred_diff: bool = True, append_pred_real_diff: bool = True)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of X by the estimator to X.
- estimator: TEstimator
- class sklearn_utilities.ComposeVarEstimator(estimator: TEstimator, estimator_var: TEstimatorVar = DummyRegressorVar())[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator, TEstimatorVar]
Compose an estimator with a variance estimator.
- estimator: TEstimator
- predict(X: TX, return_std: Literal[False] = False, **predict_params: Any) TY [source]
- predict(X: TX, return_std: Literal[True], **predict_params: Any) tuple[TY, TY]
- set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') ComposeVarEstimator
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- class sklearn_utilities.DataFrameWrapper(estimator: TEstimator, *, pattern_x: str = '^(:?fit|transform|fit_transform)$', pattern_y: str = '^predict.*?$')[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
- pattern_x: str
- y_columns_or_name: Index[Any] | Hashable | None = None
- class sklearn_utilities.DropByNoisePrediction(estimator: Any | None = None, *, drop_rate: float = 0.1, distribution: Literal['uniform', 'normal', 'arange'] = 'uniform', random_state: RandomState | int | None = None)[source]
Bases: SelectFromModel
Remove features based on their importance to a model’s prediction of noise.
“Unsupervised Learning by Predicting Noise” https://arxiv.org/pdf/1704.05310.pdf https://ar5iv.labs.arxiv.org/html/1704.05310
“Neural Architecture Search with Random Labels” https://arxiv.org/abs/2101.11834 https://ar5iv.labs.arxiv.org/html/2101.11834
Original Implementation: https://gist.github.com/richmanbtc/075178cd0e6d15c4a251128068991d47
- fit(X: Any, y: Any | None = None, **fit_params: Any) Self [source]
Fit the SelectFromModel meta-transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,), default=None) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (dict) –
If enable_metadata_routing=False (default):
Parameters directly passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
If enable_metadata_routing=True:
Parameters safely routed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
Changed in version 1.4: See Metadata Routing User Guide for more details.
- Returns:
self – Fitted estimator.
- Return type:
object
- fit_transform(X: Any, y: Any = None, **fit_params: Any) Any [source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- sklearn_utilities.DropMissingColumns(threshold_not_missing: float = 0.9) LooseGenericUnivariateSelect [source]
Drop columns with missing values above a threshold.
- Parameters:
threshold_not_missing (float, optional) – If the ratio of non-missing values in a column is less than or equal to this threshold, the column is dropped. Defaults to 0.9.
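The threshold rule can be sketched in plain Python. This is only an illustration of the documented comparison, not the library's implementation: the helper `columns_to_keep` and the dict-of-lists table are assumptions, and `None` or NaN is treated as missing.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_to_keep(table, threshold_not_missing=0.9):
    """Return column names whose non-missing ratio is strictly above the threshold."""
    kept = []
    for name, values in table.items():
        ratio = sum(1 for v in values if not is_missing(v)) / len(values)
        # A ratio below or equal to the threshold means the column is dropped.
        if ratio > threshold_not_missing:
            kept.append(name)
    return kept


table = {
    "full": [float(i) for i in range(10)],
    "one_missing": [None] + [float(i) for i in range(9)],
    "mostly_missing": [1.0, 2.0] + [float("nan")] * 8,
}
kept = columns_to_keep(table)
print(kept)
```

Note that "one_missing" has a non-missing ratio of exactly 0.9, so it is dropped under the "below or equal" rule.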
- class sklearn_utilities.DropMissingRowsY(estimator: TEstimator)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
A wrapper for estimators that drops NaN values from y before fitting.
- estimator: TEstimator
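As a rough illustration of the idea behind DropMissingRowsY, the sketch below removes the rows whose target is NaN before the data would be handed to an estimator's fit. The function `drop_missing_rows_y` and the list-based data are hypothetical; the wrapper itself presumably works on array or pandas inputs.

```python
import math


def drop_missing_rows_y(X, y):
    """Keep only the rows of X and y where the target is not NaN."""
    keep = [i for i, target in enumerate(y) if not math.isnan(target)]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[1.0], [2.0], [3.0]]
y = [10.0, float("nan"), 30.0]
X_clean, y_clean = drop_missing_rows_y(X, y)
print(X_clean, y_clean)
```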
- class sklearn_utilities.DummyRegressorVar(*, strategy: Literal['mean', 'median', 'quantile', 'constant', 'mean'] = 'mean', constant: float | None | ArrayLike | int = None, quantile: float | None = None, allow_nan: bool = True)[source]
Bases: DummyRegressor
DummyRegressor with 1.0 variance.
- fit(X: ArrayLike, y: ArrayLike, sample_weight: ArrayLike | None = None) Self [source]
Fit the random regressor.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
self – Fitted estimator.
- Return type:
object
- predict(X: ArrayLike, return_std: bool = False) ndarray | tuple[ndarray, ndarray] [source]
Predict target values for test vectors X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test data.
return_std (bool, default=False) –
Whether to return the standard deviation of posterior prediction. All zeros in this case.
New in version 0.20.
- Returns:
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Predicted target values for X.
y_std (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Standard deviation of predictive distribution of query points.
- predict_var(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]) _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] [source]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') DummyRegressorVar
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') DummyRegressorVar
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') DummyRegressorVar
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class sklearn_utilities.EstimatorWrapperBase(estimator: TEstimator)[source]
Bases: BaseEstimator, MetaEstimatorMixin, Generic[TEstimator]
A base class for estimator wrappers that delegates all attributes to the wrapped estimator.
- estimator: TEstimator
- class sklearn_utilities.EvalSetWrapper(estimator: TEstimator, *, test_size: float | int | None = None, train_size: float | int | None = None, random_state: int | RandomState | None = None, shuffle: bool = True, stratify: bool = False, **kwargs: Any)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
A wrapper that splits the data into train and validation sets and passes the validation set to the eval_set parameter of the estimator.
- estimator: TEstimator
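The behaviour described for EvalSetWrapper can be sketched without any third-party dependency. Everything here is an assumption made for illustration: the `RecordingEstimator` class, the `fit_with_eval_set` helper, and the `eval_set=[(X_val, y_val)]` list-of-tuples convention (the form used by some gradient boosting libraries) are not taken from this reference.

```python
import random


class RecordingEstimator:
    """Hypothetical estimator whose fit accepts an eval_set argument."""

    def fit(self, X, y, eval_set=None):
        self.n_train_ = len(X)
        self.eval_set_ = eval_set
        return self


def fit_with_eval_set(estimator, X, y, test_size=0.25, seed=0):
    """Hold out a validation split and forward it to fit as eval_set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(len(X) * test_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    return estimator.fit(X_train, y_train, eval_set=[(X_val, y_val)])


X = [[float(i)] for i in range(8)]
y = [2.0 * i for i in range(8)]
model = fit_with_eval_set(RecordingEstimator(), X, y)
print(model.n_train_, len(model.eval_set_[0][0]))
```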
- class sklearn_utilities.ExcludedColumnTransformerPandas(estimator: Any = IdTransformer(), exclude_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases: BaseEstimator, TransformerMixin
A transformer that excludes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame [source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- class sklearn_utilities.FeatureUnionPandas(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)[source]
Bases: FeatureUnion
- fit_transform(X: Any, y: Any = None, **fit_params: Any) Any [source]
Fit all transformers, transform the data and concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
y (array-like of shape (n_samples, n_outputs), default=None) – Targets for supervised learning.
**fit_params (dict, default=None) – Parameters to pass to the fit method of the estimator.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
- steps: List[Any]
- transform(X: Any) Any [source]
Transform X separately by each transformer, concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
- class sklearn_utilities.IdTransformer[source]
Bases: BaseEstimator, TransformerMixin
A transformer that does nothing.
- class sklearn_utilities.IncludedColumnTransformerPandas(estimator: Any = IdTransformer(), include_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases: BaseEstimator, TransformerMixin
A transformer that includes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame [source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- class sklearn_utilities.IntersectXY(estimator: TEstimator)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
Estimator wrapper that intersects X and y indices before fitting.
- estimator: TEstimator
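A minimal sketch of index intersection, assuming X and y are keyed by index label. The helper `intersect_xy` and the dict representation are hypothetical; with pandas objects the same idea would operate on the index of each input.

```python
def intersect_xy(X, y):
    """Keep only the index labels present in both X and y, in X's order."""
    common = [label for label in X if label in y]
    return {label: X[label] for label in common}, {label: y[label] for label in common}


X = {"a": [1.0], "b": [2.0], "c": [3.0]}
y = {"b": 20.0, "c": 30.0, "d": 40.0}
X_common, y_common = intersect_xy(X, y)
print(X_common, y_common)
```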
- class sklearn_utilities.PipelineVar(steps, *, memory=None, verbose=False)[source]
Bases: Pipeline
Pipeline that supports the predict_var method.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PipelineVar
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class sklearn_utilities.RecursiveFitSubtractRegressor(estimators: Sequence[TEstimator], *, n_jobs: int | None = None)[source]
Bases: Generic[TEstimator], EstimatorWrapperBase[TEstimator]
Regressor that fits the residual of the prediction of the previous model.
- property estimator: TEstimator
- estimators: Sequence[TEstimator]
- fit(X: Any, y: Any, predict_params: Mapping[str, Any] | None = None, **fit_params: Any) Self [source]
- set_fit_request(*, predict_params: bool | None | str = '$UNCHANGED$') RecursiveFitSubtractRegressor
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predict_params (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predict_params parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
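The residual-fitting idea behind RecursiveFitSubtractRegressor can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `MeanRegressor`, `ProportionalRegressor`, `fit_residual_chain` and `predict_residual_chain` are hypothetical names, and summing the individual predictions at predict time is an assumption that the reference does not confirm.

```python
class MeanRegressor:
    """Stand-in regressor that predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ProportionalRegressor:
    """Stand-in regressor: least-squares fit of y = w * x on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        self.w_ = sum(x * t for x, t in zip(xs, y)) / sum(x * x for x in xs)
        return self

    def predict(self, X):
        return [self.w_ * row[0] for row in X]


def fit_residual_chain(estimators, X, y):
    """Fit each estimator on what the previous estimators left unexplained."""
    residual = list(y)
    for estimator in estimators:
        estimator.fit(X, residual)
        prediction = estimator.predict(X)
        residual = [r - p for r, p in zip(residual, prediction)]
    return estimators


def predict_residual_chain(estimators, X):
    """Add up the stage predictions (assumed combination rule)."""
    total = [0.0] * len(X)
    for estimator in estimators:
        total = [t + p for t, p in zip(total, estimator.predict(X))]
    return total


X = [[-1.0], [0.0], [1.0]]
y = [1.0, 3.0, 5.0]  # y = 3 + 2 * x
chain = fit_residual_chain([MeanRegressor(), ProportionalRegressor()], X, y)
y_hat = predict_residual_chain(chain, X)
print(y_hat)
```

Here the first stage captures the offset 3 and the second stage fits the remaining slope 2, so the combined prediction reproduces y.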
- class sklearn_utilities.ReindexMissingColumns(*, if_missing: Literal['warn', 'raise'] | Callable[[Index[Any], Index[Any]], None] = 'warn', reindex_kwargs: dict[Literal['method', 'copy', 'level', 'fill_value', 'limit', 'tolerance'], Any] = {})[source]
Bases: BaseEstimator, TransformerMixin
Reindex X to match the columns of the training data to avoid errors.
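The sketch below illustrates the general idea of reindexing to the training columns using plain dictionaries as rows. The helper `reindex_columns` is hypothetical; the warning mirrors the documented default if_missing='warn', while filling absent columns with NaN is an assumption borrowed from the default behaviour of pandas reindexing.

```python
import warnings


def reindex_columns(rows, train_columns, fill_value=float("nan")):
    """Return rows restricted to train_columns, filling absent columns."""
    seen = {name for row in rows for name in row}
    missing = [column for column in train_columns if column not in seen]
    if missing:
        warnings.warn(f"Columns missing from X were filled: {missing}")
    return [
        {column: row.get(column, fill_value) for column in train_columns}
        for row in rows
    ]


train_columns = ["a", "b"]
rows = [{"a": 1.0, "c": 9.0}]  # "b" is missing, "c" was not seen in training
reindexed = reindex_columns(rows, train_columns)
print(reindexed)
```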
- class sklearn_utilities.ReportNonFinite(*, on_fit: bool = False, on_fit_y: bool = False, on_transform: bool = True, plot: bool = True, calc_corr: bool = False, callback: Callable[[dict[str, DataFrame | Series]], None] | None = None, callback_figure: Callable[[Figure], None] | None = <function ReportNonFinite.<lambda>>)[source]
Bases: BaseEstimator, TransformerMixin
Report non-finite values in X or y.
- class sklearn_utilities.SmartMultioutputEstimator(estimator: TEstimator, *, n_jobs: int | None = -1, verbose: int = 1, pass_numpy: bool = False)[source]
Bases: BaseEstimator, RegressorMixin, Generic[TEstimator]
- estimator: TEstimator
- estimators_: list[TEstimator]
- predict(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...] [source]
- predict_var(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...] [source]
- score(X: DataFrame, y: DataFrame, **score_params: Any) ndarray[Any, dtype[Any]] [source]
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
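The definition of \(R^2\) given for score can be checked with a few lines of plain Python. The function name `r2_score_simple` is hypothetical and handles a single output only; it is a worked example of the formula, not the method used by SmartMultioutputEstimator.

```python
def r2_score_simple(y_true, y_pred):
    """Compute 1 - u / v for a single output."""
    mean_true = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_simple(y_true, y_true)  # u is 0
constant = r2_score_simple(y_true, [2.5] * 4)  # always predicts the mean, so u equals v
reversed_order = r2_score_simple(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
print(perfect, constant, reversed_order)
```

The three cases correspond to the best possible score, the constant-model score, and a negative score.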
- class sklearn_utilities.StandardScalerVar(*, copy: bool = True, with_mean: bool = True, with_std: bool = True, var_type: Literal['std', 'var'] = 'std')[source]
Bases: StandardScaler
- inverse_transform(X: Any, copy: bool | None = None, return_std: bool = False) Any [source]
Scale back the data to the original representation.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to scale along the features axis.
copy (bool, default=None) – Copy the input X or not.
- Returns:
X_tr – Transformed array.
- Return type:
{ndarray, sparse matrix} of shape (n_samples, n_features)
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScalerVar
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$', return_std: bool | None | str = '$UNCHANGED$') StandardScalerVar
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in inverse_transform.
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in inverse_transform.
- Returns:
self – The updated object.
- Return type:
object
- set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScalerVar
Request metadata passed to the partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') StandardScalerVar
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- with_mean: bool
- class sklearn_utilities.TransformedTargetEstimatorVar(estimator: TEstimator, *, transformer: TTransformer = IdTransformer(), inverse_transform_separately: bool = False)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator, TTransformer]
TransformedTargetRegressor with std/var support.
- estimator: TEstimator
- sklearn_utilities.since_event(event: Series[bool]) Series[float] [source]
Calculate the elapsed time since the last event.
- Parameters:
event (Series[bool]) – Whether the event occurred.
- Returns:
The difference between the index of the last event and the current index. If the index is a DatetimeIndex, the unit is hours.
- Return type:
Series[float]
- sklearn_utilities.until_event(event: Series[bool]) Series[float] [source]
Calculate the elapsed time until the next event.
- Parameters:
event (Series[bool]) – Whether the event occurred.
- Returns:
The difference between the index of the next event and the current index. If the index is a DatetimeIndex, the unit is hours.
- Return type:
Series[float]
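The two helpers can be illustrated on a plain list of booleans with integer positions as the index. This is a sketch under assumptions that the reference does not state: distances are reported as non-negative numbers, a row where the event occurs gets 0, and rows with no previous (or next) event get NaN. The `_sketch` function names are hypothetical, and the hours conversion for a DatetimeIndex is not reproduced.

```python
def since_event_sketch(event):
    """Positions elapsed since the most recent True value (NaN before the first)."""
    result, last = [], None
    for position, happened in enumerate(event):
        if happened:
            last = position
        result.append(float("nan") if last is None else float(position - last))
    return result


def until_event_sketch(event):
    """Positions remaining until the next True value (NaN after the last)."""
    # Looking forward in time is the same as looking backward on the reversed list.
    return since_event_sketch(event[::-1])[::-1]


event = [False, True, False, False, True, False]
since = since_event_sketch(event)
until = until_event_sketch(event)
print(since)
print(until)
```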
Subpackages
- sklearn_utilities.pandas package
DataFrameWrapper
ExcludedColumnTransformerPandas
FeatureUnionPandas
IncludedColumnTransformerPandas
SmartMultioutputEstimator
- Submodules
- sklearn_utilities.pandas.column_transformer_pandas module
- sklearn_utilities.pandas.dataframe_wrapper module
- sklearn_utilities.pandas.feature_union_pandas module
- sklearn_utilities.pandas.multioutput module
- sklearn_utilities.proba package
ComposeVarEstimator
DummyRegressorVar
PipelineVar
StandardScalerVar
TransformedTargetEstimatorVar
- Submodules
- sklearn_utilities.proba.compose_var module
- sklearn_utilities.proba.dummy_regressor module
- sklearn_utilities.proba.pipeline_var module
- sklearn_utilities.proba.standard_scaler_var module
- sklearn_utilities.proba.transformed_target_estimator module
- sklearn_utilities.torch package
Submodules
sklearn_utilities.append_prediction_to_x module
- class sklearn_utilities.append_prediction_to_x.AppendPredictionToX(estimators: Sequence[TEstimator] | TEstimator, *, concat: bool = True, n_jobs: int | None = -1)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of the estimators to X.
- estimators: Sequence[TEstimator]
- estimators_: Sequence[TEstimator]
- transform(X: TX, y: Any = None, **predict_params: Any) TX [source]
Append the prediction of the estimators to X.
If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}_{estimator_index}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}_{estimator_index}” if the prediction is 2D.
- class sklearn_utilities.append_prediction_to_x.AppendPredictionToXSingle(estimator: TEstimator, *, concat: bool = True)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of the estimator to X. To use multiple estimators, use AppendPredictionToX instead.
- estimator: TEstimator
- estimator_: TEstimator
The fitted estimator.
- transform(X: TX, y: Any = None, **predict_params: Any) TX [source]
Append the prediction of the estimator to X. If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}” if the prediction is 2D.
sklearn_utilities.append_x_prediction_to_x module
- class sklearn_utilities.append_x_prediction_to_x.AppendXPredictionToX(estimator: TEstimator, *, variables: Sequence[Hashable] | None = None, append: bool = True, append_pred_diff: bool = True, append_pred_real_diff: bool = True)[source]
Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]
Append the prediction of X by the estimator to X.
- estimator: TEstimator
sklearn_utilities.dataset module
- sklearn_utilities.dataset.add_missing_values(dataset: tuple[TX, TY], *, missing_rate_x: float = 0.5, missing_rate_y: float = 0.5, random_state: RandomState | int | None = None) tuple[TX, TY] [source]
Add missing values to a dataset.
- Parameters:
dataset (tuple[TX, TY]) – The dataset to add missing values to.
missing_rate_x (float, optional) – The rate of missing values to add to X, by default 0.5
missing_rate_y (float, optional) – The rate of missing values to add to y, by default 0.5
random_state (RandomState | int | None, optional) – The random state to use, by default None
- Returns:
The dataset with missing values added.
- Return type:
tuple[TX, TY]
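A minimal sketch of the idea, assuming each entry is masked independently with probability equal to the missing rate (the library may select entries differently). `add_missing_values_sketch` is a hypothetical name and plain lists stand in for arrays.

```python
import math
import random


def add_missing_values_sketch(X, y, *, missing_rate_x=0.5, missing_rate_y=0.5, random_state=None):
    """Replace each entry of X and y with NaN with the given probability."""
    rng = random.Random(random_state)  # seeded generator for reproducibility
    X_new = [[math.nan if rng.random() < missing_rate_x else v for v in row] for row in X]
    y_new = [math.nan if rng.random() < missing_rate_y else v for v in y]
    return X_new, y_new


X = [[1.0, 2.0], [3.0, 4.0]]
y = [0.0, 1.0]
X_missing, y_missing = add_missing_values_sketch(X, y, random_state=0)
print(X_missing, y_missing)
```

A rate of 0.0 leaves the data unchanged and a rate of 1.0 masks every entry, because `rng.random()` always lies in [0, 1).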
sklearn_utilities.drop_by_noise_prediction module
- class sklearn_utilities.drop_by_noise_prediction.DropByNoisePrediction(estimator: Any | None = None, *, drop_rate: float = 0.1, distribution: Literal['uniform', 'normal', 'arange'] = 'uniform', random_state: RandomState | int | None = None)[source]
Bases: SelectFromModel
Remove features based on their importance to a model’s prediction of noise.
“Unsupervised Learning by Predicting Noise” https://arxiv.org/pdf/1704.05310.pdf https://ar5iv.labs.arxiv.org/html/1704.05310
“Neural Architecture Search with Random Labels” https://arxiv.org/abs/2101.11834 https://ar5iv.labs.arxiv.org/html/2101.11834
Original Implementation: https://gist.github.com/richmanbtc/075178cd0e6d15c4a251128068991d47
- fit(X: Any, y: Any | None = None, **fit_params: Any) Self [source]
Fit the SelectFromModel meta-transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,), default=None) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (dict) –
If enable_metadata_routing=False (default):
Parameters directly passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
If enable_metadata_routing=True:
Parameters safely routed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
Changed in version 1.4: See Metadata Routing User Guide for more details.
- Returns:
self – Fitted estimator.
- Return type:
object
- fit_transform(X: Any, y: Any = None, **fit_params: Any) Any [source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
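The idea behind DropByNoisePrediction can be sketched without a model. This is an illustration under loud assumptions, not the class's implementation: absolute Pearson correlation with a uniform noise target stands in for a fitted model's feature importances, the features scoring highest are assumed to be the ones dropped, and rounding the number of dropped features up is an arbitrary choice. `abs_corr` and `drop_by_noise_sketch` are hypothetical names.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation (0.0 if either sequence is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def drop_by_noise_sketch(columns, *, drop_rate=0.1, random_state=None):
    """Drop the columns most associated with a random noise target."""
    rng = random.Random(random_state)
    n_samples = len(next(iter(columns.values())))
    noise = [rng.random() for _ in range(n_samples)]  # 'uniform' noise target
    # Stand-in importance: how well each feature alone tracks the noise.
    scores = {name: abs_corr(col, noise) for name, col in columns.items()}
    n_drop = math.ceil(drop_rate * len(columns))
    dropped = set(sorted(scores, key=scores.get, reverse=True)[:n_drop])
    return {name: col for name, col in columns.items() if name not in dropped}


data_rng = random.Random(1)
features = {f"f{i}": [data_rng.gauss(0, 1) for _ in range(50)] for i in range(4)}
print(list(drop_by_noise_sketch(features, drop_rate=0.5, random_state=0)))
```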
sklearn_utilities.drop_missing_columns module
- sklearn_utilities.drop_missing_columns.DropMissingColumns(threshold_not_missing: float = 0.9) LooseGenericUnivariateSelect [source]
Drop columns whose ratio of non-missing values does not exceed a threshold.
- Parameters:
threshold_not_missing (float, optional) – A column is dropped if its ratio of non-missing values is less than or equal to this threshold, by default 0.9
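The threshold rule can be sketched as follows. This is a plain-Python illustration, not the selector returned by DropMissingColumns: `drop_missing_columns_sketch` is a hypothetical name and a dict of lists stands in for a DataFrame.

```python
import math


def is_missing(v):
    """Treat None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_missing_columns_sketch(columns, threshold_not_missing=0.9):
    """Keep only columns whose non-missing ratio is strictly above the threshold."""
    kept = {}
    for name, values in columns.items():
        n_present = sum(1 for v in values if not is_missing(v))
        if n_present / len(values) > threshold_not_missing:
            kept[name] = values
    return kept


data = {
    "a": [1.0, 2.0, 3.0, 4.0],       # ratio 1.0  -> kept
    "b": [1.0, math.nan, 3.0, 4.0],  # ratio 0.75 -> dropped
}
print(list(drop_missing_columns_sketch(data)))  # ['a']
```

A column whose ratio equals the threshold exactly is dropped, matching the "less than or equal to" wording of the parameter.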
- class sklearn_utilities.drop_missing_columns.LooseGenericUnivariateSelect(score_func=<function f_classif>, *, mode='percentile', param=1e-05)[source]
Bases: GenericUnivariateSelect
A GenericUnivariateSelect that does not require y and accepts missing values in X.
- fit(X: Any, y: Any | None = None) Self [source]
Run score function on (X, y) and get the appropriate features.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,) or None) – The target values (class labels in classification, real numbers in regression). If the selector is unsupervised then y can be set to None.
- Returns:
self – Returns the instance itself.
- Return type:
object
sklearn_utilities.drop_missing_rows_y module
- class sklearn_utilities.drop_missing_rows_y.DropMissingRowsY(estimator: TEstimator)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
A wrapper for estimators that drops NaN values from y before fitting.
- estimator: TEstimator
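A minimal sketch of the wrapping pattern, assuming rows of X are dropped together with the NaN targets. `MeanRegressor` and `DropNaNYWrapper` are hypothetical names, not part of the package.

```python
import math


class MeanRegressor:
    """Hypothetical estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class DropNaNYWrapper:
    """Drop rows whose target is NaN, then delegate to the wrapped estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        rows = [(x, t) for x, t in zip(X, y) if not math.isnan(t)]
        X_kept = [x for x, _ in rows]
        y_kept = [t for _, t in rows]
        self.estimator.fit(X_kept, y_kept)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


model = DropNaNYWrapper(MeanRegressor()).fit([[0.0], [1.0], [2.0]], [1.0, math.nan, 3.0])
print(model.predict([[5.0]]))  # [2.0]
```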
sklearn_utilities.estimator_wrapper module
sklearn_utilities.eval_set module
- class sklearn_utilities.eval_set.CatBoostProgressBarWrapper(estimator: TEstimator, *, tqdm_cls: Literal['auto', 'autonotebook', 'std', 'notebook', 'asyncio', 'keras', 'dask', 'tk', 'gui', 'rich', 'contrib.slack', 'contrib.discord', 'contrib.telegram', 'contrib.bells'] | type[tqdm.std.tqdm] = 'auto', tqdm_kwargs: dict[str, Any] | None = None, verbose: bool = True)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
A wrapper that splits the data into train and validation sets, passes the validation set to the eval_set parameter of the estimator, and shows a progress bar using tqdm.
Setting iterations in CatBoost.__init__ is recommended so that the progress bar can be shown. Setting early_stopping_rounds in CatBoost.__init__ is recommended to enable early stopping.
- estimator: TEstimator
- class sklearn_utilities.eval_set.EvalSetWrapper(estimator: TEstimator, *, test_size: float | int | None = None, train_size: float | int | None = None, random_state: int | RandomState | None = None, shuffle: bool = True, stratify: bool = False, **kwargs: Any)[source]
Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]
A wrapper that splits the data into train and validation sets and passes the validation set to the eval_set parameter of the estimator.
- estimator: TEstimator
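The split-and-forward pattern can be sketched as below. This is an illustration, not EvalSetWrapper itself: `RecordingEstimator` and `EvalSetSketch` are hypothetical names, and passing eval_set as a list of (X, y) tuples is an assumption based on the convention of common gradient-boosting libraries.

```python
import random


class RecordingEstimator:
    """Hypothetical estimator that only records what fit() receives."""

    def fit(self, X, y, eval_set=None):
        self.n_train_ = len(X)
        self.eval_set_ = eval_set
        return self


class EvalSetSketch:
    """Hold out part of the data and pass it to the estimator as eval_set."""

    def __init__(self, estimator, *, test_size=0.25, random_state=None):
        self.estimator = estimator
        self.test_size = test_size
        self.random_state = random_state

    def fit(self, X, y):
        idx = list(range(len(X)))
        random.Random(self.random_state).shuffle(idx)
        n_val = max(1, round(len(X) * self.test_size))
        val, train = idx[:n_val], idx[n_val:]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        X_val, y_val = [X[i] for i in val], [y[i] for i in val]
        self.estimator.fit(X_train, y_train, eval_set=[(X_val, y_val)])
        return self


X = [[float(i)] for i in range(8)]
y = [float(i) for i in range(8)]
wrapped = EvalSetSketch(RecordingEstimator(), random_state=0).fit(X, y)
print(wrapped.estimator.n_train_, len(wrapped.estimator.eval_set_[0][0]))  # 6 2
```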
sklearn_utilities.event module
- sklearn_utilities.event.since_event(event: Series[bool]) Series[float] [source]
Calculate the elapsed time since the last event.
- Parameters:
event (Series[bool]) – Whether the event occurred.
- Returns:
The difference between the index of the last event and the current index. If the index is a DatetimeIndex, the unit is hours.
- Return type:
Series[float]
- sklearn_utilities.event.until_event(event: Series[bool]) Series[float] [source]
Calculate the time remaining until the next event.
- Parameters:
event (Series[bool]) – Whether the event occurred.
- Returns:
The difference between the index of the next event and the current index. If the index is a DatetimeIndex, the unit is hours.
- Return type:
Series[float]
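Both functions can be illustrated with a positional sketch. Assumptions: a plain list of booleans stands in for the Series, distances are counted in positions rather than hours, results are non-negative, and positions with no previous (or next) event yield NaN. The function names are hypothetical.

```python
import math


def since_event_sketch(event):
    """Distance from each position back to the most recent True."""
    out, last = [], None
    for i, flag in enumerate(event):
        if flag:
            last = i
        out.append(math.nan if last is None else float(i - last))
    return out


def until_event_sketch(event):
    """Distance from each position forward to the next True."""
    # Mirror image: scan the reversed sequence, then reverse the result back.
    return since_event_sketch(event[::-1])[::-1]


event = [False, True, False, False, True, False, False]
print(since_event_sketch(event))  # [nan, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
print(until_event_sketch(event))  # [1.0, 0.0, 2.0, 1.0, 0.0, nan, nan]
```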
sklearn_utilities.id_transformer module
sklearn_utilities.intersect module
sklearn_utilities.recursive_fit_subtract_regressor module
- class sklearn_utilities.recursive_fit_subtract_regressor.RecursiveFitSubtractRegressor(estimators: Sequence[TEstimator], *, n_jobs: int | None = None)[source]
Bases: Generic[TEstimator], EstimatorWrapperBase[TEstimator]
Regressor that fits the residual of the prediction of the previous model.
- property estimator: TEstimator
- estimators: Sequence[TEstimator]
- fit(X: Any, y: Any, predict_params: Mapping[str, Any] | None = None, **fit_params: Any) Self [source]
- set_fit_request(*, predict_params: bool | None | str = '$UNCHANGED$') RecursiveFitSubtractRegressor
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predict_params (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predict_params parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
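The residual-fitting scheme of RecursiveFitSubtractRegressor can be sketched as follows. This is not the class's implementation: `MeanRegressor`, `SlopeRegressor`, `fit_recursive` and `predict_recursive` are hypothetical names, and summing the individual predictions at predict time is an assumption.

```python
class MeanRegressor:
    """Hypothetical estimator: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SlopeRegressor:
    """Hypothetical estimator: least squares through the origin on the first feature."""

    def fit(self, X, y):
        den = sum(row[0] ** 2 for row in X)
        self.slope_ = sum(row[0] * t for row, t in zip(X, y)) / den if den else 0.0
        return self

    def predict(self, X):
        return [self.slope_ * row[0] for row in X]


def fit_recursive(estimators, X, y):
    """Fit each estimator on what the previous ones left unexplained."""
    residual = list(y)
    for est in estimators:
        est.fit(X, residual)
        residual = [r - p for r, p in zip(residual, est.predict(X))]
    return estimators


def predict_recursive(estimators, X):
    """Sum the predictions of all fitted estimators."""
    total = [0.0] * len(X)
    for est in estimators:
        total = [t + p for t, p in zip(total, est.predict(X))]
    return total


X = [[-1.0], [0.0], [1.0]]
y = [1.0, 3.0, 5.0]
models = fit_recursive([MeanRegressor(), SlopeRegressor()], X, y)
print(predict_recursive(models, X))  # [1.0, 3.0, 5.0]
```

Here the first model captures the mean (3.0) and the second fits the remaining linear trend (slope 2.0).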
sklearn_utilities.reindex_missing_columns module
- class sklearn_utilities.reindex_missing_columns.ReindexMissingColumns(*, if_missing: Literal['warn', 'raise'] | Callable[[Index[Any], Index[Any]], None] = 'warn', reindex_kwargs: dict[Literal['method', 'copy', 'level', 'fill_value', 'limit', 'tolerance'], Any] = {})[source]
Bases: BaseEstimator, TransformerMixin
Reindex X to match the columns of the training data to avoid errors.
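A minimal sketch of the reindexing idea, assuming missing columns are filled with NaN and extra columns are discarded (as pandas reindexing by columns would do). `reindex_columns_sketch` is a hypothetical name, a dict of lists stands in for a DataFrame, and raising KeyError for "raise" is an assumption.

```python
import math
import warnings


def reindex_columns_sketch(columns, train_columns, *, if_missing="warn", fill_value=math.nan):
    """Return the data restricted and ordered to match the training columns."""
    n_rows = len(next(iter(columns.values()), []))
    missing = [c for c in train_columns if c not in columns]
    if missing:
        if if_missing == "raise":
            raise KeyError(f"Missing columns: {missing}")
        warnings.warn(f"Missing columns: {missing}")
    # Columns absent from the input are filled; columns unseen in training are dropped.
    return {c: columns.get(c, [fill_value] * n_rows) for c in train_columns}


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = reindex_columns_sketch({"b": [1.0, 2.0], "z": [0.0, 0.0]}, ["a", "b"])
print(list(result))  # ['a', 'b']
```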
sklearn_utilities.report_non_finite module
- class sklearn_utilities.report_non_finite.ReportNonFinite(*, on_fit: bool = False, on_fit_y: bool = False, on_transform: bool = True, plot: bool = True, calc_corr: bool = False, callback: Callable[[dict[str, DataFrame | Series]], None] | None = None, callback_figure: Callable[[Figure], None] | None = <function ReportNonFinite.<lambda>>)[source]
Bases: BaseEstimator, TransformerMixin
Report non-finite values in X or y.
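The kind of summary such a report could contain is sketched below. This is only an illustration of counting non-finite values per column: `count_non_finite` is a hypothetical name, and the plotting, correlation and callback options of ReportNonFinite are not reproduced.

```python
import math


def count_non_finite(columns):
    """Count NaN and infinite entries for each column."""
    report = {}
    for name, values in columns.items():
        report[name] = {
            "nan": sum(1 for v in values if math.isnan(v)),
            "inf": sum(1 for v in values if math.isinf(v)),
        }
    return report


data = {"a": [1.0, math.nan, math.inf, -math.inf], "b": [0.0, 1.0, 2.0, 3.0]}
print(count_non_finite(data))  # {'a': {'nan': 1, 'inf': 2}, 'b': {'nan': 0, 'inf': 0}}
```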