sklearn_utilities package

class sklearn_utilities.AppendPredictionToX(estimators: Sequence[TEstimator] | TEstimator, *, concat: bool = True, n_jobs: int | None = -1)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of the estimators to X.

estimators: Sequence[TEstimator]

estimators_: Sequence[TEstimator]

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]: Fit the estimators.

transform(X: TX, y: Any = None, **predict_params: Any) → TX[source]

Append the prediction of the estimators to X.

If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}_{estimator_index}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}_{estimator_index}” if the prediction is 2D.

class sklearn_utilities.AppendPredictionToXSingle(estimator: TEstimator, *, concat: bool = True)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of the estimator to X. To use multiple estimators, use AppendPredictionToX instead.

estimator: TEstimator

estimator_: TEstimator: The fitted estimator.

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]: Fit the estimator.

transform(X: TX, y: Any = None, **predict_params: Any) → TX[source]: Append the prediction of the estimator to X. If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}” if the prediction is 2D.
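
The column-appending behaviour described above can be illustrated with plain pandas and scikit-learn. This is a minimal sketch, not the library's implementation: `append_prediction` is a hypothetical helper, and the column name is a simplified stand-in that omits the `{i}` part of the documented pattern.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def append_prediction(X: pd.DataFrame, fitted_estimator) -> pd.DataFrame:
    """Return a copy of X with the estimator's 1D prediction as a new column.

    Hypothetical helper; the column name is a simplified stand-in for the
    documented "y_pred_{class name}_{i}" pattern.
    """
    y_pred = fitted_estimator.predict(X)
    name = f"y_pred_{fitted_estimator.__class__.__name__}"
    # assign() leaves the original frame untouched and appends the column last.
    return X.assign(**{name: y_pred})


X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)
X_new = append_prediction(X, model)
```

Downstream pipeline steps then see the prediction as one more feature column.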

class sklearn_utilities.AppendXPredictionToX(estimator: TEstimator, *, variables: Sequence[Hashable] | None = None, append: bool = True, append_pred_diff: bool = True, append_pred_real_diff: bool = True)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of X by the estimator to X.

estimator: TEstimator

fit(X: DataFrame, y: Series | None = None, **fit_params: Any) → Self[source]

transform(X: DataFrame, y: Series | None = None, **transform_params: Any) → DataFrame[source]

class sklearn_utilities.ComposeVarEstimator(estimator: TEstimator, estimator_var: TEstimatorVar = DummyRegressorVar())[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator, TEstimatorVar]

Compose an estimator with a variance estimator.

estimator: TEstimator

fit(X: TX, y: TY, **fit_params: Any) → Self[source]

predict(X: TX, return_std: Literal[False] = False, **predict_params: Any) → TY[source]
predict(X: TX, return_std: Literal[True], **predict_params: Any) → tuple[TY, TY]

predict_var(X: TX, **predict_params: Any) → TY[source]
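
One plausible way to pair a mean estimator with a variance estimator is sketched below. The class `MeanVarSketch` is hypothetical, and training the variance model on squared residuals is an assumption that the documentation above does not confirm.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class MeanVarSketch:
    """Hypothetical stand-in, not the library class."""

    def __init__(self, estimator, estimator_var):
        self.estimator = estimator
        self.estimator_var = estimator_var

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        # Assumption: the variance model is trained on squared residuals.
        residual_sq = (y - self.estimator_.predict(X)) ** 2
        self.estimator_var_ = clone(self.estimator_var).fit(X, residual_sq)
        return self

    def predict_var(self, X):
        # Clip at zero because a generic regressor may predict negatives.
        return np.clip(self.estimator_var_.predict(X), 0.0, None)

    def predict(self, X, return_std=False):
        y_pred = self.estimator_.predict(X)
        if return_std:
            return y_pred, np.sqrt(self.predict_var(X))
        return y_pred


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
# Noise whose scale grows with |x|, so the variance is worth modelling.
y = 2.0 * X[:, 0] + rng.normal(scale=0.1 + np.abs(X[:, 0]), size=200)
model = MeanVarSketch(
    LinearRegression(), DecisionTreeRegressor(max_depth=2, random_state=0)
).fit(X, y)
y_pred, y_std = model.predict(X, return_std=True)
```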

set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') → ComposeVarEstimator

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
Returns: self – The updated object.
Return type: object

class sklearn_utilities.DataFrameWrapper(estimator: TEstimator, *, pattern_x: str = '^(:?fit|transform|fit_transform)$', pattern_y: str = '^predict.*?$')[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

pattern_x: str

y_columns_or_name: Index[Any] | Hashable | None = None

class sklearn_utilities.DropByNoisePrediction(estimator: Any | None = None, *, drop_rate: float = 0.1, distribution: Literal['uniform', 'normal', 'arange'] = 'uniform', random_state: RandomState | int | None = None)[source]

Bases: SelectFromModel

Remove features based on their importance to a model’s prediction of noise.

“Unsupervised Learning by Predicting Noise” https://arxiv.org/pdf/1704.05310.pdf https://ar5iv.labs.arxiv.org/html/1704.05310

“Neural Architecture Search with Random Labels” https://arxiv.org/abs/2101.11834 https://ar5iv.labs.arxiv.org/html/2101.11834

Original Implementation: https://gist.github.com/richmanbtc/075178cd0e6d15c4a251128068991d47
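
The idea behind noise-based feature dropping can be sketched with a standard scikit-learn model. This is an illustration under stated assumptions, not the library's code: it assumes uniform noise as the target, a random forest as the estimator, and that the features most useful for predicting noise are the ones dropped.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
drop_rate = 0.1

# Fit a model to a target that is pure noise.
noise = rng.uniform(size=X.shape[0])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, noise)

# Assumption: drop the features with the highest importance for the noise target.
n_drop = int(round(drop_rate * X.shape[1]))
order = np.argsort(model.feature_importances_)  # ascending importance
keep = np.sort(order[: X.shape[1] - n_drop])
X_selected = X[:, keep]
```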

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]

Fit the SelectFromModel meta-transformer.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,), default=None) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (dict) –
- If enable_metadata_routing=False (default):
  
  Parameters directly passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
- If enable_metadata_routing=True:
  
  Parameters safely routed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
  
  Changed in version 1.4: See Metadata Routing User Guide for more details.

Returns:

self – Fitted estimator.

Return type:

object

fit_transform(X: Any, y: Any = None, **fit_params: Any) → Any[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

sklearn_utilities.DropMissingColumns(threshold_not_missing: float = 0.9) → LooseGenericUnivariateSelect[source]

Drop columns with missing values above a threshold.

Parameters: threshold_not_missing (float, optional) – A column is dropped if its ratio of non-missing values is less than or equal to this threshold. Defaults to 0.9.
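
The dropping rule can be reproduced in plain pandas. The sketch below uses made-up data and is not the library's implementation; it only shows that a column whose non-missing ratio equals the threshold is dropped.

```python
import numpy as np
import pandas as pd

threshold_not_missing = 0.9
X = pd.DataFrame(
    {
        "full": np.arange(10, dtype=float),        # 100% present
        "one_missing": [np.nan] + [1.0] * 9,       # exactly 90% present
        "half_missing": [np.nan] * 5 + [1.0] * 5,  # 50% present
    }
)

# Ratio of non-missing values per column.
ratio_not_missing = X.notna().mean()
# Keep only columns strictly above the threshold.
kept = X.loc[:, ratio_not_missing > threshold_not_missing]
```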

class sklearn_utilities.DropMissingRowsY(estimator: TEstimator)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

A wrapper for estimators that drops NaN values from y before fitting.

estimator: TEstimator

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]
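
The effect of dropping NaN targets before fitting can be shown with a boolean mask. This is a minimal sketch with made-up data; the wrapper's internals may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0, 4.0]})
y = pd.Series([1.0, np.nan, 5.0, 7.0, np.nan])

# Keep only the rows whose target is present, in both X and y.
mask = y.notna()
model = LinearRegression().fit(X.loc[mask], y.loc[mask])
n_used = int(mask.sum())
```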

class sklearn_utilities.DummyRegressorVar(*, strategy: Literal['mean', 'median', 'quantile', 'constant', 'mean'] = 'mean', constant: float | None | ArrayLike | int = None, quantile: float | None = None, allow_nan: bool = True)[source]

Bases: DummyRegressor

DummyRegressor with 1.0 variance.

fit(X: ArrayLike, y: ArrayLike, sample_weight: ArrayLike | None = None) → Self[source]

Fit the random regressor.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target values.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

self – Fitted estimator.

Return type:

object

predict(X: ArrayLike, return_std: bool = False) → ndarray | tuple[ndarray, ndarray][source]

Predict target values for test vectors X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test data.
return_std (bool, default=False) –
Whether to return the standard deviation of posterior prediction. All zeros in this case.

New in version 0.20.

Returns:

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Predicted target values for X.
y_std (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Standard deviation of predictive distribution of query points.

predict_var(X: ArrayLike) → ArrayLike[source]

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DummyRegressorVar

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns: self – The updated object.
Return type: object

set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$') → DummyRegressorVar

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
Returns: self – The updated object.
Return type: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DummyRegressorVar

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class sklearn_utilities.EstimatorWrapperBase(estimator: TEstimator)[source]

Bases: BaseEstimator, MetaEstimatorMixin, Generic[TEstimator]

A base class for estimator wrappers that delegates all attributes to the wrapped estimator.

estimator: TEstimator

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

A wrapper that splits the data into train and validation sets and passes the validation set to eval_set parameter of the estimator.

estimator: TEstimator

fit(X: Any, y: Any, **fit_params: Any) → Self[source]

Fit the estimator with eval_set set to the validation set.

Parameters:

X (Any) – The training input samples.
y (Any) – The target values.

Returns:

The fitted estimator.

Return type:

Self
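
The split-and-forward behaviour can be sketched as follows. `RecordingEstimator` and `fit_with_eval_set` are hypothetical names introduced for illustration, and the split ratio and the `eval_set=[(X_valid, y_valid)]` form are assumptions borrowed from common gradient boosting libraries.

```python
import numpy as np
from sklearn.model_selection import train_test_split


class RecordingEstimator:
    """Hypothetical estimator whose fit accepts eval_set and records sizes."""

    def fit(self, X, y, eval_set=None):
        self.n_train_ = len(X)
        self.n_valid_ = len(eval_set[0][0]) if eval_set else 0
        return self


def fit_with_eval_set(estimator, X, y, test_size=0.2, random_state=0):
    # Hold out a validation split and hand it to the estimator as eval_set.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    return estimator.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])


X = np.arange(100, dtype=float).reshape(50, 2)
y = np.arange(50, dtype=float)
est = fit_with_eval_set(RecordingEstimator(), X, y)
```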

class sklearn_utilities.ExcludedColumnTransformerPandas(estimator: Any = IdTransformer(), exclude_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]

Bases: BaseEstimator, TransformerMixin

A transformer that excludes columns from the input data frame.

feature_names_in_: Sequence[str]

feature_names_out_: Sequence[str]

fit(X: DataFrame, **fit_params: Any) → Self[source]

fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) → DataFrame[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

transform(X: DataFrame, y: Any = None, **transform_params: Any) → DataFrame[source]

class sklearn_utilities.FeatureUnionPandas(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)[source]

Bases: FeatureUnion

fit_transform(X: Any, y: Any = None, **fit_params: Any) → Any[source]

Fit all transformers, transform the data and concatenate results.

Parameters:

X (iterable or array-like, depending on transformers) – Input data to be transformed.
y (array-like of shape (n_samples, n_outputs), default=None) – Targets for supervised learning.
**fit_params (dict, default=None) – Parameters to pass to the fit method of the estimator.

Returns:

X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

Return type:

array-like or sparse matrix of shape (n_samples, sum_n_components)

steps: List[Any]

transform(X: Any) → Any[source]

Transform X separately by each transformer, concatenate results.

Parameters: X (iterable or array-like, depending on transformers) – Input data to be transformed.
Returns: X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
Return type: array-like or sparse matrix of shape (n_samples, sum_n_components)

class sklearn_utilities.IdTransformer[source]

Bases: BaseEstimator, TransformerMixin

A transformer that does nothing.

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]

inverse_transform(X: T, **transform_params: Any) → T[source]

inverse_transform_var(X: T, **transform_params: Any) → T[source]

transform(X: T, **transform_params: Any) → T[source]

class sklearn_utilities.IncludedColumnTransformerPandas(estimator: Any = IdTransformer(), include_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]

Bases: BaseEstimator, TransformerMixin

A transformer that includes columns from the input data frame.

feature_names_in_: Sequence[str]

feature_names_out_: Sequence[str]

fit(X: DataFrame, **fit_params: Any) → Self[source]

fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) → DataFrame[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

transform(X: DataFrame, y: Any = None, **transform_params: Any) → DataFrame[source]

class sklearn_utilities.IntersectXY(estimator: TEstimator)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

Estimator wrapper that intersects X and y indices before fitting.

estimator: TEstimator

fit(X: Any, y: Any, **fit_params: Any) → Self[source]
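
Intersecting the indices of X and y before fitting can be shown in plain pandas. This is a minimal sketch with made-up data, not the wrapper's actual code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0]}, index=[0, 1, 2, 3])
y = pd.Series([3.0, 5.0, 7.0, 9.0], index=[1, 2, 3, 4])

# Only labels present in both X and y are used for fitting.
common = X.index.intersection(y.index)
model = LinearRegression().fit(X.loc[common], y.loc[common])
```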

class sklearn_utilities.PipelineVar(steps, *, memory=None, verbose=False)[source]

Bases: Pipeline

Pipeline that supports the predict_var method.

predict_var(X: Any, **predict_params: Any) → Any[source]

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PipelineVar

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

class sklearn_utilities.RecursiveFitSubtractRegressor(estimators: Sequence[TEstimator], *, n_jobs: int | None = None)[source]

Bases: Generic[TEstimator], EstimatorWrapperBase[TEstimator]

Regressor that fits the residual of the prediction of the previous model.

property estimator: TEstimator

estimators: Sequence[TEstimator]

fit(X: Any, y: Any, predict_params: Mapping[str, Any] | None = None, **fit_params: Any) → Self[source]

predict(X: Any, **predict_params: Any) → ndarray[Any, dtype[Any]][source]

predict_var(X: Any, **predict_params: Any) → ndarray[Any, dtype[Any]][source]
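
Residual fitting can be sketched as a chain in which each model is trained on what the previous models left unexplained. The helpers `fit_residual_chain` and `predict_residual_chain` are hypothetical, and summing the individual predictions is an assumption about how the final prediction is formed.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def fit_residual_chain(estimators, X, y):
    """Fit each estimator on the residual left by the previous ones."""
    fitted = []
    residual = np.asarray(y, dtype=float)
    for est in estimators:
        model = clone(est).fit(X, residual)
        residual = residual - model.predict(X)
        fitted.append(model)
    return fitted


def predict_residual_chain(fitted, X):
    """Sum the predictions of all fitted stages."""
    return np.sum([m.predict(X) for m in fitted], axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = X[:, 0] + np.sin(3 * X[:, 0])
chain = fit_residual_chain(
    [LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)], X, y
)
mse_first = np.mean((y - chain[0].predict(X)) ** 2)
mse_chain = np.mean((y - predict_residual_chain(chain, X)) ** 2)
```

On the training data, the second stage can only reduce the error left by the first, which is the motivation for chaining a simple model with a more flexible one.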

set_fit_request(*, predict_params: bool | None | str = '$UNCHANGED$') → RecursiveFitSubtractRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: predict_params (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predict_params parameter in fit.
Returns: self – The updated object.
Return type: object

class sklearn_utilities.ReindexMissingColumns(*, if_missing: Literal['warn', 'raise'] | Callable[[Index[Any], Index[Any]], None] = 'warn', reindex_kwargs: dict[Literal['method', 'copy', 'level', 'fill_value', 'limit', 'tolerance'], Any] = {})[source]

Bases: BaseEstimator, TransformerMixin

Reindex X to match the columns of the training data to avoid errors.

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]

transform(X: TXPandas, y: Any = None, **fit_params: Any) → TXPandas[source]
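
Aligning new data to the training columns can be illustrated with DataFrame.reindex. This is a sketch with made-up frames; it only collects the missing column names where the 'warn' or 'raise' handling would take place.

```python
import pandas as pd

X_train = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [3.0]})
X_new = pd.DataFrame({"c": [30.0], "a": [10.0], "extra": [99.0]})

train_columns = X_train.columns
# Columns seen during fit but absent now; a warning or error would go here.
missing_columns = list(train_columns.difference(X_new.columns))

# Reorder to the training layout, fill missing columns with NaN, drop extras.
X_aligned = X_new.reindex(columns=train_columns)
```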

class sklearn_utilities.ReportNonFinite(*, on_fit: bool = False, on_fit_y: bool = False, on_transform: bool = True, plot: bool = True, calc_corr: bool = False, callback: Callable[[dict[str, DataFrame | Series]], None] | None = None, callback_figure: Callable[[Figure], None] | None = <function ReportNonFinite.<lambda>>)[source]

Bases: BaseEstimator, TransformerMixin

Report non-finite values in X or y.

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]

transform(X: TXPandas, y: Any = None, **fit_params: Any) → TXPandas[source]

class sklearn_utilities.SmartMultioutputEstimator(estimator: TEstimator, *, n_jobs: int | None = -1, verbose: int = 1, pass_numpy: bool = False)[source]

Bases: BaseEstimator, RegressorMixin, Generic[TEstimator]

estimator: TEstimator

estimators_: list[TEstimator]

fit(X: DataFrame, y: DataFrame, **fit_params: Any) → Self[source]

predict(X: DataFrame, **predict_params: Any) → DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]

predict_var(X: DataFrame, **predict_params: Any) → DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]

score(X: DataFrame, y: DataFrame, **score_params: Any) → ndarray[Any, dtype[Any]][source]

Return the coefficient of determination of the prediction.

The coefficient of determination $R^2$ is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – $R^2$ of self.predict(X) w.r.t. y.

Return type:

float

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
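
The definition of $R^2$ given for score can be checked by hand on a small made-up example, using the same expressions for $u$ and $v$.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
r2 = 1 - u / v

# A constant model that always predicts the mean of y scores exactly zero.
y_const = np.full_like(y_true, y_true.mean())
r2_constant = 1 - ((y_true - y_const) ** 2).sum() / v
```

Here $u = 0.1$ and $v = 5$, so $R^2 = 0.98$.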

class sklearn_utilities.StandardScalerVar(*, copy: bool = True, with_mean: bool = True, with_std: bool = True, var_type: Literal['std', 'var'] = 'std')[source]

Bases: StandardScaler

inverse_transform(X: Any, copy: bool | None = None, return_std: bool = False) → Any[source]

Scale back the data to the original representation.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to scale along the features axis.
copy (bool, default=None) – Copy the input X or not.

Returns:

X_tr – Transformed array.

Return type:

{ndarray, sparse matrix} of shape (n_samples, n_features)
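
When a predicted mean is mapped back through a standard scaler, any accompanying uncertainty must be rescaled but not shifted. The sketch below shows that arithmetic with scikit-learn's StandardScaler; how StandardScalerVar applies it through return_std and var_type is not stated above, so this only illustrates the underlying math.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y = np.array([[10.0], [20.0], [30.0], [40.0]])
scaler = StandardScaler().fit(y)

# A prediction and its uncertainty expressed in the scaled space.
mean_scaled = np.array([[0.5]])
std_scaled = np.array([[0.1]])

# The mean is scaled and shifted; a standard deviation is only scaled.
mean_original = scaler.inverse_transform(mean_scaled)
std_original = std_scaled * scaler.scale_
# A variance is scaled by the square of the scale factor.
var_original = std_scaled**2 * scaler.scale_**2
```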

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StandardScalerVar

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
Returns: self – The updated object.
Return type: object

set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$', return_std: bool | None | str = '$UNCHANGED$') → StandardScalerVar

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in inverse_transform.
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in inverse_transform.

Returns:

self – The updated object.

Return type:

object

set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StandardScalerVar

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.
Returns: self – The updated object.
Return type: object

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') → StandardScalerVar

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.
Returns: self – The updated object.
Return type: object

with_mean: bool

class sklearn_utilities.TransformedTargetEstimatorVar(estimator: TEstimator, *, transformer: TTransformer = IdTransformer(), inverse_transform_separately: bool = False)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator, TTransformer]

TransformedTargetRegressor with std/var support.

estimator: TEstimator

fit(X: TX, y: TY, **fit_params: Any) → Self[source]

predict(X: TX, **predict_params: Any) → TY | tuple[TY, TY][source]

predict_var(X: TX, **predict_params: Any) → TY[source]

sklearn_utilities.since_event(event: Series[bool]) → Series[float][source]

Calculate the elapsed time since the last event.

Parameters: event (Series[bool]) – Whether the event occurred.
Returns: The difference between the index of the last event and the current index. If the index is a DatetimeIndex, the unit is hours.
Return type: Series[float]
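
The elapsed-since-last-event calculation can be sketched for an integer index with a forward fill. `since_event_sketch` is a hypothetical re-implementation for illustration, and leaving NaN before the first event is an assumption about the edge case.

```python
import numpy as np
import pandas as pd


def since_event_sketch(event: pd.Series) -> pd.Series:
    """Distance from each index label back to the most recent event."""
    index_values = pd.Series(event.index, index=event.index, dtype=float)
    # Keep the label only where an event occurred, then carry it forward.
    last_event = index_values.where(event).ffill()
    return index_values - last_event


event = pd.Series([False, True, False, False, True, False])
elapsed = since_event_sketch(event)
```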

sklearn_utilities.until_event(event: Series[bool]) → Series[float][source]

Calculate the elapsed time until the next event.

Parameters: event (Series[bool]) – Whether the event occurred.
Returns: The difference between the index of the next event and the current index. If the index is a DatetimeIndex, the unit is hours.
Return type: Series[float]
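
For a DatetimeIndex, the time until the next event can be sketched with a backward fill and a conversion to hours. `until_event_hours_sketch` is a hypothetical helper written for illustration, not the library function.

```python
import pandas as pd


def until_event_hours_sketch(event: pd.Series) -> pd.Series:
    """Hours from each timestamp forward to the next event."""
    timestamps = pd.Series(event.index, index=event.index)
    # Keep the timestamp only where an event occurred, then fill backwards.
    next_event = timestamps.where(event).bfill()
    # Dividing a timedelta by one hour yields a float number of hours.
    return (next_event - timestamps) / pd.Timedelta(hours=1)


index = pd.date_range("2024-01-01", periods=5, freq="30min")
event = pd.Series([False, False, True, False, True], index=index)
hours = until_event_hours_sketch(event)
```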

Submodules

sklearn_utilities.append_prediction_to_x module

class sklearn_utilities.append_prediction_to_x.AppendPredictionToX(estimators: Sequence[TEstimator] | TEstimator, *, concat: bool = True, n_jobs: int | None = -1)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of the estimators to X.

estimators: Sequence[TEstimator]

estimators_: Sequence[TEstimator]

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]: Fit the estimators.

transform(X: TX, y: Any = None, **predict_params: Any) → TX[source]

Append the prediction of the estimators to X.

If pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}_{estimator_index}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}_{estimator_index}” if the prediction is 2D.

class sklearn_utilities.append_prediction_to_x.AppendPredictionToXSingle(estimator: TEstimator, *, concat: bool = True)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of the estimator to X. To use multiple estimators, use AppendPredictionToX instead.

estimator: TEstimator

estimator_: TEstimator: The fitted estimator.

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]: Fit the estimator.

transform(X: TX, y: Any = None, **predict_params: Any) → TX[source]: Append the prediction of the estimator to X. If a pandas DataFrame is given, the prediction is added as a new column with the name “y_pred_{estimator.__class__.__name__}_{i}” if the prediction is 1D, or “y_pred_{estimator.__class__.__name__}_{i}_{column_name}” if the prediction is 2D.
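The naming pattern for the single-estimator case can be sketched as a small helper. This only illustrates the documented pattern: the function prediction_column_names and the class MyRegressor are hypothetical, and the meaning of i (treated here as a caller-supplied integer) is an assumption, since the text above does not define it.

```python
class MyRegressor:
    """Stand-in estimator; only its class name matters here."""


def prediction_column_names(estimator, i, column_names=None):
    """Build column names following the documented y_pred_ pattern."""
    base = f"y_pred_{estimator.__class__.__name__}_{i}"
    if column_names is None:
        # 1D prediction: a single new column.
        return [base]
    # 2D prediction: one new column per predicted output column.
    return [f"{base}_{name}" for name in column_names]


names_1d = prediction_column_names(MyRegressor(), 0)
names_2d = prediction_column_names(MyRegressor(), 0, ["a", "b"])
```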

sklearn_utilities.append_prediction_to_x.generate_new_prefix(X: DataFrame, prefix: str = 'y_pred_') → str[source]: Generate a new column name for the prediction.

sklearn_utilities.append_x_prediction_to_x module

class sklearn_utilities.append_x_prediction_to_x.AppendXPredictionToX(estimator: TEstimator, *, variables: Sequence[Hashable] | None = None, append: bool = True, append_pred_diff: bool = True, append_pred_real_diff: bool = True)[source]

Bases: BaseEstimator, TransformerMixin, Generic[TEstimator]

Append the prediction of X by the estimator to X.

estimator: TEstimator

fit(X: DataFrame, y: Series | None = None, **fit_params: Any) → Self[source]

transform(X: DataFrame, y: Series | None = None, **transform_params: Any) → DataFrame[source]

sklearn_utilities.dataset module

sklearn_utilities.dataset.add_missing_values(dataset: tuple[TX, TY], *, missing_rate_x: float = 0.5, missing_rate_y: float = 0.5, random_state: RandomState | int | None = None) → tuple[TX, TY][source]

Add missing values to a dataset.

Parameters:

dataset (tuple[TX, TY]) – The dataset to add missing values to.
missing_rate_x (float, optional) – The rate of missing values to add to X, by default 0.5
missing_rate_y (float, optional) – The rate of missing values to add to y, by default 0.5
random_state (RandomState | int | None, optional) – The random state to use, by default None

Returns:

The dataset with missing values added.

Return type:

tuple[TX, TY]
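A dependency-free sketch of the masking idea follows, using a flat list in place of X or y. The helper add_missing_values_sketch is hypothetical, and masking each entry independently with probability missing_rate is an assumption about how the rate is applied.

```python
import math
import random


def add_missing_values_sketch(values, missing_rate=0.5, seed=None):
    """Return a copy of `values` with about `missing_rate` of entries set to NaN."""
    rng = random.Random(seed)
    # Assumption: each entry is masked independently with probability missing_rate.
    return [math.nan if rng.random() < missing_rate else v for v in values]


x = [float(v) for v in range(1000)]
x_missing = add_missing_values_sketch(x, missing_rate=0.5, seed=0)
n_missing = sum(math.isnan(v) for v in x_missing)
```

Passing the same seed reproduces the same mask, which is the role random_state plays in the documented signature.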

sklearn_utilities.drop_by_noise_prediction module

class sklearn_utilities.drop_by_noise_prediction.DropByNoisePrediction(estimator: Any | None = None, *, drop_rate: float = 0.1, distribution: Literal['uniform', 'normal', 'arange'] = 'uniform', random_state: RandomState | int | None = None)[source]

Bases: SelectFromModel

Remove features based on their importance to a model’s prediction of noise.

“Unsupervised Learning by Predicting Noise” https://arxiv.org/pdf/1704.05310.pdf https://ar5iv.labs.arxiv.org/html/1704.05310

“Neural Architecture Search with Random Labels” https://arxiv.org/abs/2101.11834 https://ar5iv.labs.arxiv.org/html/2101.11834

Original Implementation: https://gist.github.com/richmanbtc/075178cd0e6d15c4a251128068991d47

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]

Fit the SelectFromModel meta-transformer.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,), default=None) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (dict) –
- If enable_metadata_routing=False (default):
  
  Parameters directly passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
- If enable_metadata_routing=True:
  
  Parameters safely routed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
  
  Changed in version 1.4: See Metadata Routing User Guide for more details.

Returns:

self – Fitted estimator.

Return type:

object

fit_transform(X: Any, y: Any = None, **fit_params: Any) → Any[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

sklearn_utilities.drop_missing_columns module

sklearn_utilities.drop_missing_columns.DropMissingColumns(threshold_not_missing: float = 0.9) → LooseGenericUnivariateSelect[source]

Drop columns whose ratio of non-missing values is at or below a threshold.

Parameters:: threshold_not_missing (float, optional) – If the ratio of non-missing values is less than or equal to this threshold, the column is dropped. Defaults to 0.9.
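The threshold rule can be sketched in plain Python, with a dict of lists standing in for a DataFrame. The helper drop_missing_columns_sketch is hypothetical and does not go through LooseGenericUnivariateSelect as the real function does.

```python
import math


def drop_missing_columns_sketch(columns, threshold_not_missing=0.9):
    """Keep only columns whose non-missing ratio is above the threshold."""
    kept = {}
    for name, values in columns.items():
        not_missing = sum(not math.isnan(v) for v in values) / len(values)
        # Documented rule: drop when the ratio is below or equal to the threshold.
        if not_missing > threshold_not_missing:
            kept[name] = values
    return kept


nan = math.nan
table = {
    "full": [1.0, 2.0, 3.0, 4.0],    # non-missing ratio 1.0
    "sparse": [1.0, nan, nan, 4.0],  # non-missing ratio 0.5
}
result = drop_missing_columns_sketch(table, threshold_not_missing=0.9)
```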

class sklearn_utilities.drop_missing_columns.LooseGenericUnivariateSelect(score_func=<function f_classif>, *, mode='percentile', param=1e-05)[source]

Bases: GenericUnivariateSelect

A GenericUnivariateSelect that does not require y and accepts missing values in X.

fit(X: Any, y: Any | None = None) → Self[source]

Run score function on (X, y) and get the appropriate features.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,) or None) – The target values (class labels in classification, real numbers in regression). If the selector is unsupervised then y can be set to None.

Returns:

self – Returns the instance itself.

Return type:

object

sklearn_utilities.drop_missing_rows_y module

class sklearn_utilities.drop_missing_rows_y.DropMissingRowsY(estimator: TEstimator)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

A wrapper for estimators that drops NaN values from y before fitting.

estimator: TEstimator

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]
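A minimal sketch of the wrapping idea, assuming that rows of X are removed together with the NaN targets so both stay aligned. The classes MeanEstimator and DropMissingRowsYSketch are hypothetical stand-ins, and lists replace the DataFrame.

```python
import math


class MeanEstimator:
    """Toy estimator that stores the mean of the targets it was fitted on."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


class DropMissingRowsYSketch:
    """Hypothetical wrapper: remove rows with NaN in y, then fit."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Keep only the (x, y) pairs whose target is not NaN.
        pairs = [(x, t) for x, t in zip(X, y) if not math.isnan(t)]
        X_kept = [x for x, _ in pairs]
        y_kept = [t for _, t in pairs]
        self.estimator.fit(X_kept, y_kept)
        return self


wrapper = DropMissingRowsYSketch(MeanEstimator())
wrapper.fit([[0], [1], [2]], [1.0, math.nan, 3.0])
```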

sklearn_utilities.estimator_wrapper module

class sklearn_utilities.estimator_wrapper.EstimatorWrapperBase(estimator: TEstimator)[source]

Bases: BaseEstimator, MetaEstimatorMixin, Generic[TEstimator]

A base class for estimator wrappers that delegates all attributes to the wrapped estimator.

estimator: TEstimator
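One common way to delegate attributes in Python is __getattr__, sketched below. Whether EstimatorWrapperBase uses this exact mechanism is an assumption, and the classes Inner and WrapperSketch are hypothetical.

```python
class Inner:
    """Toy estimator with an attribute and a method to delegate to."""

    coef_ = [1.0, 2.0]

    def predict(self, X):
        return [0.0 for _ in X]


class WrapperSketch:
    """Hypothetical wrapper forwarding unknown attributes to `estimator`."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # Called only when normal lookup fails, so the wrapper's own
        # attributes take precedence over the wrapped estimator's.
        if name == "estimator":
            # Avoid infinite recursion if `estimator` has not been set yet.
            raise AttributeError(name)
        return getattr(self.estimator, name)


wrapped = WrapperSketch(Inner())
```

With this pattern, wrapped.coef_ and wrapped.predict resolve on the inner estimator, while wrapped.estimator resolves on the wrapper itself.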

sklearn_utilities.eval_set module

class sklearn_utilities.eval_set.CatBoostProgressBarWrapper(estimator: TEstimator, *, tqdm_cls: Literal['auto', 'autonotebook', 'std', 'notebook', 'asyncio', 'keras', 'dask', 'tk', 'gui', 'rich', 'contrib.slack', 'contrib.discord', 'contrib.telegram', 'contrib.bells'] | type[tqdm.std.tqdm] = 'auto', tqdm_kwargs: dict[str, Any] | None = None, verbose: bool = True)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

A wrapper that splits the data into training and validation sets, passes the validation set to the eval_set parameter of the estimator, and shows a progress bar using tqdm.

Setting iterations in CatBoost.__init__ is recommended so that the progress bar can be shown, and setting early_stopping_rounds in CatBoost.__init__ is recommended to enable early stopping.

estimator: TEstimator

fit(X: Any, y: Any, **fit_params: Any) → Self[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

A wrapper that splits the data into training and validation sets and passes the validation set to the eval_set parameter of the estimator.

estimator: TEstimator

fit(X: Any, y: Any, **fit_params: Any) → Self[source]

Fit the estimator with eval_set set to the validation set.

Parameters:

X (Any) – The training input samples.
y (Any) – The target values.

Returns:

The fitted estimator.

Return type:

Self
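The split-and-forward behaviour described for this fit method can be sketched as follows. Everything here is illustrative: the classes RecordingEstimator and EvalSetWrapperSketch, the test_size parameter, and the unshuffled tail split are assumptions, and eval_set is passed as a list of (X, y) tuples, a form accepted by common gradient boosting libraries.

```python
class RecordingEstimator:
    """Toy estimator that records what it receives in fit."""

    def fit(self, X, y, eval_set=None):
        self.n_train_ = len(X)
        self.eval_set_ = eval_set
        return self


class EvalSetWrapperSketch:
    """Hypothetical wrapper: hold out a validation tail and pass it as eval_set."""

    def __init__(self, estimator, test_size=0.25):
        self.estimator = estimator
        self.test_size = test_size

    def fit(self, X, y, **fit_params):
        # Assumption: an unshuffled split, with the last rows used for validation.
        n_valid = max(1, int(len(X) * self.test_size))
        X_train, X_valid = X[:-n_valid], X[-n_valid:]
        y_train, y_valid = y[:-n_valid], y[-n_valid:]
        self.estimator.fit(
            X_train, y_train, eval_set=[(X_valid, y_valid)], **fit_params
        )
        return self


est = RecordingEstimator()
EvalSetWrapperSketch(est, test_size=0.25).fit([[0], [1], [2], [3]], [0, 1, 2, 3])
```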

sklearn_utilities.event module

sklearn_utilities.event.since_event(event: Series[bool]) → Series[float][source]

Calculate the elapsed time since the last event.

Parameters:: event (Series[bool]) – Whether the event occurred.
Returns:: The difference between the index of the last event and the current index. If the index is a DatetimeIndex, the unit is hours.
Return type:: Series[float]

sklearn_utilities.event.until_event(event: Series[bool]) → Series[float][source]

Calculate the elapsed time until the next event.

Parameters:: event (Series[bool]) – Whether the event occurred.
Returns:: The difference between the index of the next event and the current index. If the index is a DatetimeIndex, the unit is hours.
Return type:: Series[float]

sklearn_utilities.id_transformer module

class sklearn_utilities.id_transformer.IdTransformer[source]

Bases: BaseEstimator, TransformerMixin

A transformer that does nothing.

fit(X: Any, y: Any | None = None, **fit_params: Any) → Self[source]

inverse_transform(X: T, **transform_params: Any) → T[source]

inverse_transform_var(X: T, **transform_params: Any) → T[source]

transform(X: T, **transform_params: Any) → T[source]

sklearn_utilities.intersect module

class sklearn_utilities.intersect.IntersectXY(estimator: TEstimator)[source]

Bases: EstimatorWrapperBase[TEstimator], Generic[TEstimator]

Estimator wrapper that intersects X and y indices before fitting.

estimator: TEstimator

fit(X: Any, y: Any, **fit_params: Any) → Self[source]
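Index intersection can be sketched with dicts keyed by index label in place of pandas objects. The helper intersect_sketch is hypothetical, and preserving the order of X's index is an assumption.

```python
def intersect_sketch(X, y):
    """Keep only the entries whose index label appears in both X and y."""
    # Dicts stand in for indexed pandas objects; X's label order is preserved.
    common = [label for label in X if label in y]
    return {k: X[k] for k in common}, {k: y[k] for k in common}


X = {"a": [1.0], "b": [2.0], "c": [3.0]}
y = {"b": 20.0, "c": 30.0, "d": 40.0}
X_common, y_common = intersect_sketch(X, y)
```

After this step X_common and y_common share the labels "b" and "c", so a wrapped estimator would see aligned rows.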

sklearn_utilities.recursive_fit_subtract_regressor module

class sklearn_utilities.recursive_fit_subtract_regressor.RecursiveFitSubtractRegressor(estimators: Sequence[TEstimator], *, n_jobs: int | None = None)[source]

Bases: Generic[TEstimator], EstimatorWrapperBase[TEstimator]

Regressor in which each estimator is fitted to the residual left by the prediction of the previous estimator.

property estimator: TEstimator

estimators: Sequence[TEstimator]

fit(X: Any, y: Any, predict_params: Mapping[str, Any] | None = None, **fit_params: Any) → Self[source]

predict(X: Any, **predict_params: Any) → ndarray[Any, dtype[Any]][source]

predict_var(X: Any, **predict_params: Any) → ndarray[Any, dtype[Any]][source]

set_fit_request(*, predict_params: bool | None | str = '$UNCHANGED$') → RecursiveFitSubtractRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: predict_params (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predict_params parameter in fit.
Returns:: self – The updated object.
Return type:: object
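The residual-fitting scheme of RecursiveFitSubtractRegressor can be sketched with two toy regressors. This is not the library's code: the helper functions and toy classes are hypothetical, and taking the final prediction as the sum of all stage predictions is an assumption.

```python
class MeanRegressor:
    """Toy regressor predicting the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class GroupMeanRegressor:
    """Toy regressor predicting the mean target of each distinct x[0]."""

    def fit(self, X, y):
        groups = {}
        for x, t in zip(X, y):
            groups.setdefault(x[0], []).append(t)
        self.means_ = {k: sum(v) / len(v) for k, v in groups.items()}
        return self

    def predict(self, X):
        return [self.means_[x[0]] for x in X]


def fit_residual_chain(estimators, X, y):
    """Fit each estimator on what the previous ones left unexplained."""
    residual = list(y)
    for est in estimators:
        est.fit(X, residual)
        residual = [r - p for r, p in zip(residual, est.predict(X))]
    return estimators


def predict_residual_chain(estimators, X):
    # Assumption: the final prediction is the sum of all stage predictions.
    total = [0.0] * len(X)
    for est in estimators:
        total = [t + p for t, p in zip(total, est.predict(X))]
    return total


X = [[0], [0], [1], [1]]
y = [1.0, 3.0, 5.0, 7.0]
chain = fit_residual_chain([MeanRegressor(), GroupMeanRegressor()], X, y)
y_hat = predict_residual_chain(chain, X)
```

Here the first stage captures the overall mean and the second stage corrects it per group using only the residuals.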

sklearn_utilities.reindex_missing_columns module

class sklearn_utilities.reindex_missing_columns.ReindexMissingColumns(*, if_missing: Literal['warn', 'raise'] | Callable[[Index[Any], Index[Any]], None] = 'warn', reindex_kwargs: dict[Literal['method', 'copy', 'level', 'fill_value', 'limit', 'tolerance'], Any] = {})[source]

Bases: BaseEstimator, TransformerMixin

Reindex X to match the columns of the training data to avoid errors.

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]

transform(X: TXPandas, y: Any = None, **fit_params: Any) → TXPandas[source]
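Column alignment can be sketched without pandas, using a dict of lists as the table. The class ReindexColumnsSketch is hypothetical. It imitates the usual reindex semantics (missing columns become NaN, unknown columns are dropped) and the 'warn' setting of if_missing; the warning text is made up for the example.

```python
import math
import warnings


class ReindexColumnsSketch:
    """Hypothetical transformer: align columns to those seen during fit."""

    def fit(self, X):
        self.columns_ = list(X)  # dict keys stand in for DataFrame columns
        return self

    def transform(self, X):
        missing = [c for c in self.columns_ if c not in X]
        if missing:
            warnings.warn(f"Missing columns filled with NaN: {missing}")
        n_rows = len(next(iter(X.values()), []))
        # Unknown extra columns are dropped; missing ones become all-NaN.
        return {c: X.get(c, [math.nan] * n_rows) for c in self.columns_}


reindexer = ReindexColumnsSketch().fit({"a": [1.0], "b": [2.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = reindexer.transform({"b": [5.0, 6.0], "c": [7.0, 8.0]})
```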

sklearn_utilities.report_non_finite module

class sklearn_utilities.report_non_finite.ReportNonFinite(*, on_fit: bool = False, on_fit_y: bool = False, on_transform: bool = True, plot: bool = True, calc_corr: bool = False, callback: Callable[[dict[str, DataFrame | Series]], None] | None = None, callback_figure: Callable[[Figure], None] | None = <function ReportNonFinite.<lambda>>)[source]

Bases: BaseEstimator, TransformerMixin

Report non-finite values in X or y.

fit(X: DataFrame, y: Any | None = None, **fit_params: Any) → Self[source]

transform(X: TXPandas, y: Any = None, **fit_params: Any) → TXPandas[source]
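The core of such a report is a per-column count of NaN and infinite entries, which can be sketched as below. The helper count_non_finite is hypothetical and leaves out the plotting, correlation, and callback options listed in the signature.

```python
import math


def count_non_finite(columns):
    """Count NaN and infinite entries per column."""
    return {
        name: sum(not math.isfinite(v) for v in values)
        for name, values in columns.items()
    }


report = count_non_finite(
    {"a": [1.0, math.nan, math.inf], "b": [0.0, 1.0, 2.0]}
)
```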

sklearn_utilities.types module

sklearn_utilities.utils module

sklearn_utilities.utils.drop_X_y(X: TXPandas, y: TYPandas) → tuple[TXPandas, TYPandas][source]

sklearn_utilities.utils.intersect_X_y(X: TXPandas, y: TYPandas) → tuple[TXPandas, TYPandas][source]