bigframes.ml.preprocessing.StandardScaler

class bigframes.ml.preprocessing.StandardScaler

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples, or zero if with_mean=False, and s is the standard deviation of the training samples, or one if with_std=False.
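The formula above can be sketched in plain Python. This is only an illustration under stated assumptions, not the bigframes implementation: `standard_scores` is a hypothetical helper, and its `ddof` argument is an assumption introduced here because this page does not say whether s is the population or the sample standard deviation.

```python
import math


def standard_scores(samples, with_mean=True, with_std=True, ddof=0):
    """Return z = (x - u) / s for each x in samples.

    Hypothetical helper for illustration only. ddof is an assumption of
    this sketch: 0 gives the population standard deviation, 1 the sample
    standard deviation. A constant feature (s == 0) is not handled.
    """
    n = len(samples)
    mean = sum(samples) / n
    # u is the training mean, or zero when centering is disabled.
    u = mean if with_mean else 0.0
    if with_std:
        # The spread is always measured around the actual mean.
        variance = sum((x - mean) ** 2 for x in samples) / (n - ddof)
        s = math.sqrt(variance)
    else:
        # s is one when scaling is disabled.
        s = 1.0
    return [(x - u) / s for x in samples]


scores = standard_scores([0, 0, 1, 1])
print(scores)  # [-1.0, -1.0, 1.0, 1.0]
```

With `ddof=1` the same input yields values of about ±0.866 instead of ±1.0, so the convention matters when comparing results across libraries.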

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform().
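The fit-then-transform behaviour described above can be sketched as a small class that stores one mean and one standard deviation per feature and reuses them on later data. `TinyScaler` is a hypothetical stand-in written for this illustration, and it assumes the population standard deviation; it does not reflect how bigframes computes or stores its statistics.

```python
import math


class TinyScaler:
    """Hypothetical stand-in for illustration, not the bigframes class."""

    def fit(self, columns):
        # columns maps a feature name to its list of training values.
        self.means_ = {}
        self.stds_ = {}
        for name, values in columns.items():
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            # Each feature gets its own statistics, independent of the others.
            self.means_[name] = mean
            self.stds_[name] = math.sqrt(variance)
        return self

    def transform(self, columns):
        # Reuse the statistics stored by fit(); nothing is recomputed here.
        return {
            name: [(v - self.means_[name]) / self.stds_[name] for v in values]
            for name, values in columns.items()
        }


scaler = TinyScaler().fit({"a": [0, 0, 1, 1], "b": [0, 10, 20, 30]})
new_rows = scaler.transform({"a": [2], "b": [15]})
print(new_rows)  # {'a': [3.0], 'b': [0.0]}
```

The value 2 in column "a" maps to 3.0 because it lies three training standard deviations (0.5 each) above the training mean of 0.5, while 15 in column "b" equals that column's training mean and maps to 0.0.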

Standardization of a dataset is a common requirement for many machine learning estimators: they may perform poorly if the individual features do not look roughly like standard normally distributed data (for example, Gaussian with zero mean and unit variance).

Examples:

from bigframes.ml.preprocessing import StandardScaler
import bigframes.pandas as bpd

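scaler = StandardScaler()
data = bpd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 0, 1, 1]})
# Learn the per-column mean and standard deviation from the training data.
scaler.fit(data)
# Standardize the training data itself.
print(scaler.transform(data))
# Reuse the stored statistics on data not seen during fit().
print(scaler.transform(bpd.DataFrame({"a": [2], "b": [2]})))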

fit(X: bigframes.dataframe.DataFrame | bigframes.series.Series | pandas.core.frame.DataFrame | pandas.core.series.Series, y=None) → StandardScaler

Compute the mean and standard deviation to be used for later scaling.

Parameters:

X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – The DataFrame or Series containing the training data.
y (default None) – Ignored.

Returns:

Fitted scaler.

Return type:

StandardScaler

transform(X: bigframes.dataframe.DataFrame | bigframes.series.Series | pandas.core.frame.DataFrame | pandas.core.series.Series) → bigframes.dataframe.DataFrame

Perform standardization by centering and scaling.

Parameters:

X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – The DataFrame or Series to be transformed.

Returns:

Transformed result.

Return type:

bigframes.dataframe.DataFrame
