sklearn_utilities.torch package

class sklearn_utilities.torch.PCATorch(n_components: int | None = None, *, qr: bool = False, svd_flip: bool | None = None, device: torch.device | int | str = 'cpu', dtype: torch.dtype = torch.float32, **kwargs: Any)

Bases: Module, BaseEstimator, TransformerMixin

PCA using torch.linalg.svd.
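As a rough illustration of what PCA via torch.linalg.svd involves, the sketch below centers the data, takes a thin SVD, and keeps the leading rows of Vh as principal axes. The class MinimalTorchPCA and its singular_values_ attribute are hypothetical; this is not the PCATorch implementation, and it ignores PCATorch's qr, svd_flip, device and dtype options.

```python
from __future__ import annotations

import torch


class MinimalTorchPCA:
    """Hypothetical minimal PCA built on torch.linalg.svd (not the library's class)."""

    def __init__(self, n_components: int | None = None) -> None:
        self.n_components = n_components

    def fit(self, X: torch.Tensor) -> MinimalTorchPCA:
        # PCA works on mean-free columns, so store and subtract the mean.
        self.mean_ = X.mean(dim=0)
        Xc = X - self.mean_
        # Thin SVD: Xc = U @ diag(S) @ Vh; the rows of Vh are the principal axes.
        _, S, Vh = torch.linalg.svd(Xc, full_matrices=False)
        k = self.n_components if self.n_components is not None else Vh.shape[0]
        self.components_ = Vh[:k]
        self.singular_values_ = S[:k]
        return self

    def transform(self, X: torch.Tensor) -> torch.Tensor:
        # Project the centered data onto the kept axes: Y = (X - mean) V_k
        return (X - self.mean_) @ self.components_.T

    def inverse_transform(self, Y: torch.Tensor) -> torch.Tensor:
        # Map scores back to feature space: X_hat = Y V_k^T + mean
        return Y @ self.components_ + self.mean_


torch.manual_seed(0)
X = torch.randn(200, 5, dtype=torch.float64)
pca = MinimalTorchPCA(n_components=2).fit(X)
Y = pca.transform(X)
print(Y.shape)  # torch.Size([200, 2])
```

Keeping every component makes inverse_transform an exact round trip, which is a quick sanity check for any implementation of this kind.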

If using CUDA, the first call may take significantly longer (about 2 s) because of CUDA initialization, but subsequent calls should be faster than sklearn.decomposition.PCA, even though the algorithm itself may be less efficient.

Run python -m sklearn_utilities.torch.pca 10000x100 to benchmark the performance.

If we could easily replace np with torch in sklearn…
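The CUDA warm-up effect mentioned above shows up when repeated calls are timed individually instead of averaged. The helper time_svd_calls is a hypothetical sketch, not the benchmark run by python -m sklearn_utilities.torch.pca. It copies the data to the device inside the timed region, so on a GPU the first iteration also pays for lazy CUDA initialization, and it falls back to the CPU when no GPU is available.

```python
from __future__ import annotations

import time

import torch


def time_svd_calls(n_samples: int = 10000, n_features: int = 100, repeats: int = 3) -> list[float]:
    """Hypothetical benchmark: wall-clock seconds per centered thin SVD."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    X_host = torch.randn(n_samples, n_features)  # data starts in host memory
    timings: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # The host-to-device copy is timed as well, so the first GPU
        # iteration includes one-time CUDA setup costs.
        X = X_host.to(device)
        torch.linalg.svd(X - X.mean(dim=0), full_matrices=False)
        if device == "cuda":
            # CUDA kernels run asynchronously; wait for them before reading the clock.
            torch.cuda.synchronize()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    print(time_svd_calls())
```

On a GPU the first entry of the returned list is typically much larger than the others; on the CPU the entries should be roughly similar.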

mean_

The mean vector.

Type:

torch.Tensor

components_

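$V^\top$ (the Vh returned by torch.linalg.svd), where the centered data factorizes as $X = U D V^\top$; the transformed data is $Y = X V = U D$.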

Type:

torch.Tensor
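The relation between components_ and the transformed data can be checked numerically. This standalone snippet uses only torch (not PCATorch) to confirm that projecting the centered data onto the rows of Vh, i.e. computing $X V$, yields the same scores as $U D$.

```python
import torch

torch.manual_seed(0)
X = torch.randn(50, 4, dtype=torch.float64)
Xc = X - X.mean(dim=0)  # centered data

# torch.linalg.svd returns Vh, i.e. V^T, as its third output.
U, S, Vh = torch.linalg.svd(Xc, full_matrices=False)

Y = Xc @ Vh.T  # Y = X V: scores in the principal-axis basis
UD = U * S     # U D: column j of U scaled by the j-th singular value
print(torch.allclose(Y, UD))  # True
```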

fit(*args: Any, **kwargs: Any) → Any
forward(X: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
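The difference described in this note is general torch.nn.Module behavior. The toy module Doubler below is hypothetical and exists only to show it: a forward hook fires when the instance is called, but not when forward is invoked directly.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only to illustrate hook behavior."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


calls = []
module = Doubler()
# A forward hook receives (module, inputs, output) after forward has run;
# returning None leaves the output unchanged.
module.register_forward_hook(lambda mod, inp, out: calls.append(out))

x = torch.ones(3)
module(x)          # __call__ runs forward and then the registered hook
module.forward(x)  # calling forward directly bypasses the hook
print(len(calls))  # 1
```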

inverse_transform(*args: Any, **kwargs: Any) → Any
training: bool
transform(*args: Any, **kwargs: Any) → Any

Submodules

sklearn_utilities.torch.pca module

class sklearn_utilities.torch.pca.PCATorch(n_components: int | None = None, *, qr: bool = False, svd_flip: bool | None = None, device: torch.device | int | str = 'cpu', dtype: torch.dtype = torch.float32, **kwargs: Any)

Identical to sklearn_utilities.torch.PCATorch; see its documentation above.

sklearn_utilities.torch.pca.pca_performance_test() → None
sklearn_utilities.torch.pca.svd_flip(u: Tensor, v: Tensor, u_based_decision: bool = True) → tuple[Tensor, Tensor]

Sign correction to ensure deterministic output from SVD.

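Adjusts the columns of u and the rows of v so that, in each column of u, the entry with the largest absolute value is positive. If u_based_decision is False, the rows of v are used instead: the largest-magnitude entry of each row of v is made positive.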

Parameters:
  • u (torch.Tensor) – Parameters u and v are the output of linalg.svd or randomized_svd(), with matching inner dimensions so one can compute np.dot(u * s, v).

  • v (torch.Tensor) – Parameters u and v are the output of linalg.svd or randomized_svd(), with matching inner dimensions so one can compute np.dot(u * s, v). The input v should really be called vt to be consistent with scipy’s output.

  • u_based_decision (bool, default=True) – If True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.

Returns:

  • u_adjusted (torch.Tensor) – Array u with adjusted columns and the same dimensions as u.

  • v_adjusted (torch.Tensor) – Array v with adjusted rows and the same dimensions as v.
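The sign rule above can be sketched in a few lines of torch. svd_flip_sketch is a hypothetical name; it follows the rule as described (the same one scikit-learn's NumPy svd_flip uses), but it is not necessarily identical to sklearn_utilities.torch.pca.svd_flip.

```python
from __future__ import annotations

import torch


def svd_flip_sketch(
    u: torch.Tensor, v: torch.Tensor, u_based_decision: bool = True
) -> tuple[torch.Tensor, torch.Tensor]:
    if u_based_decision:
        # For each column of u, find the row holding its largest |entry|.
        rows = torch.argmax(torch.abs(u), dim=0)
        cols = torch.arange(u.shape[1], device=u.device)
        signs = torch.sign(u[rows, cols])
    else:
        # For each row of v, find the column holding its largest |entry|.
        cols = torch.argmax(torch.abs(v), dim=1)
        rows = torch.arange(v.shape[0], device=v.device)
        signs = torch.sign(v[rows, cols])
    # Flip column j of u and row j of v together, so (u * s) @ v is unchanged.
    return u * signs, v * signs[:, None]


torch.manual_seed(0)
A = torch.randn(6, 4, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
u_adj, v_adj = svd_flip_sketch(U, Vh)
print(torch.allclose((u_adj * S) @ v_adj, A))  # True
```

Because SVD is only unique up to a simultaneous sign change of paired singular vectors, feeding (-U, -Vh) into the sketch produces the same adjusted pair as (U, Vh), which is what makes the output deterministic.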

sklearn_utilities.torch.pca.wrap_torch(func: Callable[..., Any]) → Callable[..., Any]

Wrap a function to convert all non-torch.Tensor arguments to torch.Tensor and convert the result to numpy.ndarray.

Parameters:

func (Callable[..., Any]) – The function to wrap.

Returns:

The wrapped function.

Return type:

Callable[…, Any]
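A decorator with the behavior described above might look like the following sketch. wrap_torch_sketch is hypothetical and is not the library's code. It assumes a plain function (not a method) whose arguments are all array-like so that torch.as_tensor accepts them, and, as an additional assumption beyond the docstring, it returns non-tensor results unchanged.

```python
from __future__ import annotations

import functools
from typing import Any, Callable

import numpy as np
import torch


def wrap_torch_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: array-likes in as tensors, tensor result out as NumPy."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Convert every non-tensor positional and keyword argument to a tensor.
        args = tuple(a if isinstance(a, torch.Tensor) else torch.as_tensor(a) for a in args)
        kwargs = {
            k: (v if isinstance(v, torch.Tensor) else torch.as_tensor(v))
            for k, v in kwargs.items()
        }
        result = func(*args, **kwargs)
        # Detach, move to host memory and hand the result back as a NumPy array.
        if isinstance(result, torch.Tensor):
            return result.detach().cpu().numpy()
        return result

    return wrapper


@wrap_torch_sketch
def center(X: torch.Tensor) -> torch.Tensor:
    return X - X.mean(dim=0)


out = center(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(type(out).__name__, out.tolist())  # ndarray [[-1.0, -1.0], [1.0, 1.0]]
```

functools.wraps keeps the wrapped function's name and docstring, so documentation tools still see the original signature.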