sklearn_utilities.pandas package
- class sklearn_utilities.pandas.DataFrameWrapper(estimator: TEstimator, *, pattern_x: str = '^(:?fit|transform|fit_transform)$', pattern_y: str = '^predict.*?$')[source]
Bases:
EstimatorWrapperBase[TEstimator], Generic[TEstimator]
- estimator: TEstimator
- pattern_x: str
- y_columns_or_name: Index[Any] | Hashable | None = None
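The defaults for pattern_x and pattern_y are regular expressions, and their shape suggests they are matched against method names. The sketch below is a minimal illustration that only checks which strings the two default patterns match, using Python's re module. The list of method names is an assumption chosen for illustration. How DataFrameWrapper itself applies the patterns is not stated in this reference.

```python
import re

# Default patterns copied verbatim from the DataFrameWrapper signature.
PATTERN_X = r"^(:?fit|transform|fit_transform)$"
PATTERN_Y = r"^predict.*?$"

# Illustrative method names (an assumption, not taken from the library).
method_names = ["fit", "transform", "fit_transform", "predict", "predict_proba", "score"]

# Names matched by each default pattern.
matches_x = [name for name in method_names if re.search(PATTERN_X, name)]
matches_y = [name for name in method_names if re.search(PATTERN_Y, name)]

print(matches_x)
print(matches_y)
```

Note that `(:?fit|...)` is a capturing group with an optional leading colon, not the non-capturing form `(?:...)`, so the string ":fit" also matches pattern_x.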
- class sklearn_utilities.pandas.ExcludedColumnTransformerPandas(estimator: Any = IdTransformer(), exclude_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases:
BaseEstimator, TransformerMixin
A transformer that excludes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
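The exclude_columns parameter is typed as either a sequence of column names or a callable that maps the column names to a sequence of booleans. A minimal sketch of how such a specification could be resolved into a per-column mask follows. The helper resolve_excluded_mask is hypothetical and not part of sklearn_utilities, and the library's actual resolution logic may differ.

```python
from typing import Callable, List, Sequence, Union

# Same shape as the annotation on exclude_columns.
ColumnSpec = Union[Sequence[str], Callable[[Sequence[str]], Sequence[bool]]]


def resolve_excluded_mask(columns: Sequence[str], spec: ColumnSpec) -> List[bool]:
    """Return one boolean per column; True marks a column as excluded.

    Hypothetical helper for illustration only.
    """
    if callable(spec):
        # Callable form: it receives all column names and returns a mask.
        mask = [bool(flag) for flag in spec(columns)]
        if len(mask) != len(columns):
            raise ValueError("mask length must equal the number of columns")
        return mask
    # Sequence form: a column is excluded when its name is listed.
    names = set(spec)
    return [column in names for column in columns]


columns = ["age", "income", "id"]
mask_from_names = resolve_excluded_mask(columns, ["id"])
mask_from_callable = resolve_excluded_mask(
    columns, lambda cols: [c.startswith("i") for c in cols]
)
print(mask_from_names)
print(mask_from_callable)
```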
- class sklearn_utilities.pandas.FeatureUnionPandas(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)[source]
Bases:
FeatureUnion
- fit_transform(X: Any, y: Any = None, **fit_params: Any) Any[source]
Fit all transformers, transform the data and concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
y (array-like of shape (n_samples, n_outputs), default=None) – Targets for supervised learning.
**fit_params (dict, default=None) – Parameters to pass to the fit method of the estimator.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
- steps: List[Any]
- transform(X: Any) Any[source]
Transform X separately by each transformer, concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
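The return descriptions above refer to an "hstack" whose width is sum_n_components. The sketch below illustrates that idea with plain Python lists of rows. It is not FeatureUnionPandas' implementation, and hstack_rows is a hypothetical helper. Each block stands in for one transformer's output on the same samples.

```python
from typing import List, Sequence


def hstack_rows(blocks: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Concatenate per-transformer outputs column-wise, row by row.

    Hypothetical helper for illustration only.
    """
    if not blocks:
        return []
    n_samples = len(blocks[0])
    if any(len(block) != n_samples for block in blocks):
        raise ValueError("all blocks must have the same number of rows")
    # For each sample, join that sample's row from every block.
    return [
        [value for block in blocks for value in block[i]]
        for i in range(n_samples)
    ]


block_a = [[1.0, 2.0], [3.0, 4.0]]  # 2 samples, 2 components
block_b = [[10.0], [20.0]]          # 2 samples, 1 component
stacked = hstack_rows([block_a, block_b])

# The output width is the sum of the block widths (sum_n_components).
sum_n_components = len(block_a[0]) + len(block_b[0])
print(stacked, sum_n_components)
```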
- class sklearn_utilities.pandas.IncludedColumnTransformerPandas(estimator: Any = IdTransformer(), include_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases:
BaseEstimator, TransformerMixin
A transformer that includes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- class sklearn_utilities.pandas.SmartMultioutputEstimator(estimator: TEstimator, *, n_jobs: int | None = -1, verbose: int = 1, pass_numpy: bool = False)[source]
Bases:
BaseEstimator, RegressorMixin, Generic[TEstimator]
- estimator: TEstimator
- estimators_: list[TEstimator]
- predict(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]
- predict_var(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]
- score(X: DataFrame, y: DataFrame, **score_params: Any) ndarray[Any, dtype[Any]][source]
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
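The definition of \(R^2\) above can be checked by hand. The following is a minimal single-output sketch in plain Python that computes \(1 - \frac{u}{v}\) directly from the two sums of squares. It does not reproduce SmartMultioutputEstimator.score, whose signature returns an ndarray. The function name r2 and the sample values are assumptions for illustration.

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - u/v for a single output."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean = sum(y_true) / len(y_true)
    # u: residual sum of squares; v: total sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean) ** 2 for t in y_true)
    if v == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])   # exact predictions
constant = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean of y_true
reversed_ = r2(y_true, [4.0, 3.0, 2.0, 1.0]) # worse than the constant model
print(perfect, constant, reversed_)
```

The three cases correspond to the statements above: a perfect model scores 1.0, a constant model predicting the mean scores 0.0, and a sufficiently poor model scores below zero.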
Submodules
sklearn_utilities.pandas.column_transformer_pandas module
- class sklearn_utilities.pandas.column_transformer_pandas.ExcludedColumnTransformerPandas(estimator: Any = IdTransformer(), exclude_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases:
BaseEstimator, TransformerMixin
A transformer that excludes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- class sklearn_utilities.pandas.column_transformer_pandas.IncludedColumnTransformerPandas(estimator: Any = IdTransformer(), include_columns: Sequence[str] | Callable[[Sequence[str]], Sequence[bool]] = [])[source]
Bases:
BaseEstimator, TransformerMixin
A transformer that includes columns from the input data frame.
- feature_names_in_: Sequence[str]
- feature_names_out_: Sequence[str]
- fit_transform(X: DataFrame, y: Any = None, **fit_params: Any) DataFrame[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
sklearn_utilities.pandas.dataframe_wrapper module
- class sklearn_utilities.pandas.dataframe_wrapper.DataFrameWrapper(estimator: TEstimator, *, pattern_x: str = '^(:?fit|transform|fit_transform)$', pattern_y: str = '^predict.*?$')[source]
Bases:
EstimatorWrapperBase[TEstimator], Generic[TEstimator]
- estimator: TEstimator
- pattern_x: str
- y_columns_or_name: Index[Any] | Hashable | None = None
sklearn_utilities.pandas.feature_union_pandas module
- class sklearn_utilities.pandas.feature_union_pandas.FeatureUnionPandas(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)[source]
Bases:
FeatureUnion
- fit_transform(X: Any, y: Any = None, **fit_params: Any) Any[source]
Fit all transformers, transform the data and concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
y (array-like of shape (n_samples, n_outputs), default=None) – Targets for supervised learning.
**fit_params (dict, default=None) – Parameters to pass to the fit method of the estimator.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
- steps: List[Any]
- transform(X: Any) Any[source]
Transform X separately by each transformer, concatenate results.
- Parameters:
X (iterable or array-like, depending on transformers) – Input data to be transformed.
- Returns:
X_t – The hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
- Return type:
array-like or sparse matrix of shape (n_samples, sum_n_components)
sklearn_utilities.pandas.multioutput module
- class sklearn_utilities.pandas.multioutput.SmartMultioutputEstimator(estimator: TEstimator, *, n_jobs: int | None = -1, verbose: int = 1, pass_numpy: bool = False)[source]
Bases:
BaseEstimator, RegressorMixin, Generic[TEstimator]
- estimator: TEstimator
- estimators_: list[TEstimator]
- predict(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]
- predict_var(X: DataFrame, **predict_params: Any) DataFrame | Series | NDArray[Any] | tuple[DataFrame | Series | NDArray[Any], ...][source]
- score(X: DataFrame, y: DataFrame, **score_params: Any) ndarray[Any, dtype[Any]][source]
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).