bigframes.ml.decomposition.PCA

class bigframes.ml.decomposition.PCA(n_components: int | float | None = None, *, svd_solver: Literal['full', 'randomized', 'auto'] = 'auto')

Principal component analysis (PCA).

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X)
    principal_component_1  principal_component_2
0              -0.755243               0.157628
1               -1.05405              -0.141179
2              -1.809292               0.016449
3               0.755243              -0.157628
4                1.05405               0.141179
5               1.809292              -0.016449

[6 rows x 2 columns]
>>> pca.explained_variance_ratio_
    principal_component_id  explained_variance_ratio
0                       1                   0.00901
1                       0                   0.99099

[2 rows x 2 columns]

Parameters:
  • n_components (int, float or None, default None) – Number of components to keep. If n_components is None, all components are kept, that is, n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is chosen so that the cumulative fraction of variance explained is greater than n_components.

  • svd_solver ("full", "randomized" or "auto", default "auto") – The solver to use to calculate the principal components. Details: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver.
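As an illustration of the fractional form of n_components, the sketch below picks the number of components from a list of explained variance ratios. It is plain Python, not the BigQuery ML implementation; the helper name components_for_variance is hypothetical, and the strict comparison is an assumption taken from the "greater than" wording above.

```python
def components_for_variance(ratios, threshold):
    """Return how many leading components are needed to exceed `threshold`.

    `ratios` are explained variance ratios. They are sorted in descending
    order first, because the example output lists them out of order.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    ordered = sorted(ratios, reverse=True)
    cumulative = 0.0
    for count, ratio in enumerate(ordered, start=1):
        cumulative += ratio
        # Strict comparison is an assumption based on the parameter description.
        if cumulative > threshold:
            return count
    # Fall back to keeping every component if the threshold is never exceeded.
    return len(ordered)


ratios = [0.00901, 0.99099]  # values copied from the example output
n_kept = components_for_variance(ratios, 0.95)
n_all = components_for_variance(ratios, 0.995)
print(n_kept, n_all)
```

With the ratios from the example, a threshold of 0.95 is already exceeded by the first component, while 0.995 requires both.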

Attributes

components_

Principal axes in feature space, representing the directions of maximum variance in the data.

explained_variance_

The amount of variance explained by each of the selected components.

explained_variance_ratio_

Fraction of the total variance explained by each of the selected components.
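The ratios shown in the example (0.99099 and 0.00901) are consistent with PCA applied to standardized features. The following plain-Python sketch reproduces them for the two-feature case. It assumes standardization and uses the fact that the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r; it is an illustration only and does not call bigframes.

```python
import math

feat0 = [-1, -2, -3, 1, 2, 3]
feat1 = [-1, -1, -2, 1, 1, 2]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(feat0, feat1)
# Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]].
variances = [1 + r, 1 - r]
total = sum(variances)
# Each ratio is one eigenvalue divided by the sum of all eigenvalues.
variance_ratios = [v / total for v in variances]
print([round(v, 5) for v in variance_ratios])  # [0.99099, 0.00901]
```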

Methods

__init__([n_components, svd_solver])

detect_anomalies(X, *[, contamination])

Detect anomalous data points in the input.

fit(X[, y])

Fit the model according to the given training data.

get_params([deep])

Get parameters for this estimator.

predict(X)

Project each sample in X onto the principal components.

register([vertex_ai_model_id])

Register the model to Vertex AI.

score([X, y])

Calculate evaluation metrics of the model.

to_gbq(model_name[, replace])

Save the model to BigQuery.
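
The predict output in the example near the top of this page can be reproduced by hand, which may help in interpreting the principal_component_N columns. The sketch below is plain Python and does not call bigframes. It assumes the features are standardized with the sample standard deviation (n - 1), an assumption inferred from the example numbers, and uses the principal axes (1, 1)/sqrt(2) and (1, -1)/sqrt(2) that hold for two positively correlated standardized features.

```python
import math

feat0 = [-1, -2, -3, 1, 2, 3]
feat1 = [-1, -1, -2, 1, 1, 2]


def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation (n - 1)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / std for v in values]


z0, z1 = standardize(feat0), standardize(feat1)
# Principal axes for this two-feature case; the sign of each axis is arbitrary,
# so another solver could return the same columns with flipped signs.
scale = 1 / math.sqrt(2)
projected = [((a + b) * scale, (a - b) * scale) for a, b in zip(z0, z1)]
for pc1, pc2 in projected:
    print(f"{pc1: .6f} {pc2: .6f}")
```

The first printed row is approximately -0.755243 and 0.157628, matching row 0 of the example output.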