Welcome to the Sklearn Utilities documentation!

Installation & Usage

Sklearn Utilities

Utilities for scikit-learn.

Installation

Install this via pip (or your favourite package manager):

pip install sklearn-utilities

API

See Docs for more information.

  • EstimatorWrapperBase: base class for wrappers. Redirects all attributes which are not in the wrapper to the wrapped estimator.

  • DataFrameWrapper: tries to convert every estimator output to a pandas DataFrame or Series.

  • FeatureUnionPandas: a FeatureUnion that works with pandas DataFrames.

  • IncludedColumnTransformerPandas, ExcludedColumnTransformerPandas: select columns by name.

  • AppendPredictionToX: appends the prediction of y to X.

  • AppendXPredictionToX: appends the prediction of X to X.

  • DropByNoisePrediction: drops columns that have high importance in predicting noise.

  • DropMissingColumns: drops columns with missing values above a threshold.

  • DropMissingRowsY: drops rows with missing values in y. Use feature_engine.DropMissingData for X.

  • IntersectXY: drops rows whose index label does not appear in both X and y. Use with feature_engine.DropMissingData.

  • ReindexMissingColumns: reindexes columns of X in transform() to match the columns of X in fit().

  • ReportNonFinite: reports non-finite values in X and/or y.

  • IdTransformer: a transformer that does nothing.

  • RecursiveFitSubtractRegressor: a regressor that recursively fits a regressor and subtracts the prediction from the target.

  • SmartMultioutputEstimator: a MultiOutputEstimator that supports tuples of arrays in predict() as well as pandas Series and DataFrames.

  • until_event(), since_event(): calculate the time until or since events (given as a Series[bool]).

  • ComposeVarEstimator: composes mean and std/var estimators.

  • DummyRegressorVar: a DummyRegressor that returns 1.0 for std/var.

  • TransformedTargetRegressorVar: a TransformedTargetRegressor with std/var support.

  • StandardScalerVar: a StandardScaler with std/var support.

  • EvalSetWrapper, CatBoostProgressBarWrapper: wrappers that split off a validation set with train_test_split() and pass it to fit() as eval_set, mainly for CatBoost. This is useful for early stopping. CatBoostProgressBarWrapper also shows a progress bar (using tqdm). For LightGBM, see lightgbm-callbacks.
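
EstimatorWrapperBase is described as redirecting every attribute the wrapper does not define to the wrapped estimator. One common way to get that behaviour in Python is __getattr__, which is consulted only after normal attribute lookup fails. The following is a minimal sketch of the idea, not the library's implementation. DelegatingWrapper and FittedModelStub are hypothetical names.

```python
class DelegatingWrapper:
    """Hypothetical wrapper: forwards unknown attributes to self.estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # Called only when normal lookup on the wrapper fails, so anything
        # defined on the wrapper itself takes precedence over the estimator.
        if name == "estimator":
            # Guard against infinite recursion before `estimator` is set
            # (for example while an instance is being unpickled).
            raise AttributeError(name)
        return getattr(self.estimator, name)


class FittedModelStub:
    """Hypothetical stand-in for a fitted estimator."""

    coef_ = [1.0, 2.0]

    def predict(self, X):
        return [sum(row) for row in X]


wrapper = DelegatingWrapper(FittedModelStub())
print(wrapper.coef_)                      # [1.0, 2.0]
print(wrapper.predict([[1, 2], [3, 4]]))  # [3, 7]
```

Because __getattr__ is a fallback rather than an override, methods added to the wrapper (for example a custom fit()) still take priority over the wrapped estimator's.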
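
The DropMissingColumns behaviour can be sketched as a scikit-learn transformer. This sketch assumes the threshold is the fraction of missing values per column; the real parameter name and semantics may differ. DropMissingColumnsSketch, threshold and columns_to_keep_ are hypothetical. The columns are chosen in fit() and reused in transform(), so training and test data end up with the same columns.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DropMissingColumnsSketch(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: drops columns whose missing fraction exceeds threshold."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def fit(self, X: pd.DataFrame, y=None):
        # Fraction of missing values in each column (mean of a boolean mask).
        missing_ratio = X.isna().mean()
        # Remember the surviving columns so transform() keeps exactly these.
        self.columns_to_keep_ = missing_ratio[missing_ratio <= self.threshold].index
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return X[self.columns_to_keep_]


X = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],           # 0% missing
        "b": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
        "c": [1.0, np.nan, 3.0, 4.0],        # 25% missing
    }
)
print(DropMissingColumnsSketch(threshold=0.5).fit_transform(X).columns.tolist())  # ['a', 'c']
```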
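
The time until or since an event can be computed from a boolean Series with a datetime index. Keep the timestamps where an event occurred, then fill them forward (for "since") or backward (for "until"). This sketches the technique only. time_since_event_sketch and time_until_event_sketch are hypothetical names and do not reflect the actual signatures of until_event() and since_event().

```python
import pandas as pd


def time_since_event_sketch(events: pd.Series) -> pd.Series:
    # Index values as a Series, so they can be masked and filled.
    times = events.index.to_series()
    # Keep timestamps only where an event happened, then carry the latest
    # event time forward; rows before the first event stay NaT.
    last_event = times.where(events).ffill()
    return times - last_event


def time_until_event_sketch(events: pd.Series) -> pd.Series:
    times = events.index.to_series()
    # Carry the next event time backward; rows after the last event stay NaT.
    next_event = times.where(events).bfill()
    return next_event - times


index = pd.date_range("2024-01-01", periods=5, freq="D")
events = pd.Series([False, True, False, False, True], index=index)
print(time_until_event_sketch(events).dt.days.tolist())  # [1, 0, 2, 1, 0]
```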
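
One reading of RecursiveFitSubtractRegressor is a stagewise scheme. Each stage is fit on the target minus the predictions of the earlier stages, and predict() adds the stages back together. This sketch assumes that reading. RecursiveFitSubtractSketch, n_stages and the default DecisionTreeRegressor(max_depth=2) are hypothetical choices, not the library's API.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.tree import DecisionTreeRegressor


class RecursiveFitSubtractSketch(RegressorMixin, BaseEstimator):
    """Hypothetical stagewise regressor: each stage fits what earlier stages left over."""

    def __init__(self, estimator=None, n_stages: int = 3):
        self.estimator = estimator
        self.n_stages = n_stages

    def fit(self, X, y):
        base = (
            self.estimator
            if self.estimator is not None
            else DecisionTreeRegressor(max_depth=2, random_state=0)
        )
        residual = np.asarray(y, dtype=float)
        self.estimators_ = []
        for _ in range(self.n_stages):
            # Fit a fresh copy of the base estimator on the current residual.
            stage = clone(base).fit(X, residual)
            # Subtract its prediction; the next stage models what is left.
            residual = residual - stage.predict(X)
            self.estimators_.append(stage)
        return self

    def predict(self, X):
        # Sum the predictions of all stages.
        return np.sum([stage.predict(X) for stage in self.estimators_], axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
model = RecursiveFitSubtractSketch(n_stages=5).fit(X, y)
print(model.predict(X[:3]).shape)  # (3,)
```

With regression trees as the base estimator, no stage can increase the training error. Each leaf predicts the mean of its residuals, which never fits them worse than predicting zero.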

sklearn_utilities.dataset

  • add_missing_values(): adds missing values to a dataset.
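
One way to inject missing values, for example to test imputers or NaN-tolerant pipelines, is to mask each cell independently with a fixed probability. This sketch assumes that scheme. add_missing_values_sketch and missing_rate are hypothetical and may not match the behaviour of sklearn_utilities.dataset.add_missing_values().

```python
from typing import Optional

import numpy as np
import pandas as pd


def add_missing_values_sketch(
    X: pd.DataFrame, missing_rate: float = 0.1, random_state: Optional[int] = None
) -> pd.DataFrame:
    rng = np.random.default_rng(random_state)
    # Each cell is marked missing independently with probability missing_rate.
    mask = rng.random(X.shape) < missing_rate
    # DataFrame.mask() replaces cells where the mask is True with NaN and
    # returns a new frame; the input is left unchanged.
    return X.mask(mask)


X = pd.DataFrame(np.ones((100, 5)), columns=list("abcde"))
X_missing = add_missing_values_sketch(X, missing_rate=0.2, random_state=0)
print(X_missing.isna().to_numpy().mean())  # observed fraction of missing cells
```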

sklearn_utilities.torch

  • PCATorch: faster PCA using PyTorch with GPU support.
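
PCA itself reduces to a singular value decomposition of the mean-centered data, the kind of dense linear algebra a GPU can accelerate. This NumPy sketch shows that computation on the CPU. It is not PCATorch's code, and which SVD variant PCATorch uses (full, truncated or randomized) is an assumption left open here. pca_svd is a hypothetical name.

```python
import numpy as np


def pca_svd(X: np.ndarray, n_components: int):
    """Hypothetical helper: PCA of X via a thin SVD."""
    # PCA works on mean-centered features.
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Thin SVD: X_centered = U @ diag(S) @ Vt; rows of Vt are the principal
    # axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance along each axis (n - 1 denominator, as in scikit-learn's PCA).
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    # Project the centered data onto the principal axes.
    X_transformed = X_centered @ components.T
    return X_transformed, components, explained_variance, mean


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
X_2d, components, explained_variance, mean = pca_svd(X, n_components=2)
print(X_2d.shape, components.shape)  # (100, 2) (2, 5)
```

The sign of each component is arbitrary. scikit-learn's PCA additionally fixes the signs, so its components may differ from this sketch by a sign flip per component.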

sklearn_utilities.torch.skorch

  • SkorchReshaper, SkorchCNNReshaper: reshape X and y for nn.Linear and nn.Conv1d/2d, respectively. (For nn.Conv2d, uses np.sliding_window_view().)

  • AllowNaN: wraps a loss module and, in forward(), assigns 0 to y and y_hat at indices where y contains NaN.
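
The masking step that AllowNaN describes can be illustrated with NumPy in place of PyTorch. Wherever the target is NaN, both the target and the prediction are set to 0, so the wrapped loss sees zero error there. mse_allow_nan is a hypothetical name, and mean squared error stands in for the wrapped loss module.

```python
import numpy as np


def mse_allow_nan(y_hat: np.ndarray, y: np.ndarray) -> float:
    # Positions where the target is missing.
    nan_mask = np.isnan(y)
    # Set both target and prediction to 0 there, so those positions
    # contribute zero error to the loss.
    y = np.where(nan_mask, 0.0, y)
    y_hat = np.where(nan_mask, 0.0, y_hat)
    return float(np.mean((y_hat - y) ** 2))


y = np.array([1.0, np.nan, 3.0])
y_hat = np.array([2.0, 5.0, 3.0])
print(mse_allow_nan(y_hat, y))  # 0.3333333333333333
```

In this sketch the zeroed positions still count toward the mean, so the loss shrinks in proportion to the number of missing targets. Whether AllowNaN compensates for this is an assumption to verify against its source.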

See also

Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!