bigframes.ml.pipeline.Pipeline

class bigframes.ml.pipeline.Pipeline(steps: List[Tuple[str, BaseEstimator]])

Pipeline of transforms with a final estimator.

Sequentially apply a list of transforms and a final estimator. Every intermediate step of the pipeline must be a transform, that is, it must implement both the fit and transform methods. The final estimator only needs to implement fit.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. This simplifies code and allows for deploying an estimator and preprocessing together, e.g. with Pipeline.to_gbq(…).
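
As an illustration of the contract described above, the following minimal pure-Python sketch mimics the step sequencing. It is not the bigframes implementation: ToyScaler, ToyMeanEstimator, and ToyPipeline are hypothetical names invented for this example, and plain lists stand in for DataFrames.

```python
from typing import Any, List, Optional, Sequence, Tuple


class ToyScaler:
    """Hypothetical transform: implements both fit and transform."""

    def fit(self, X: Sequence[float], y: Optional[Sequence[float]] = None) -> "ToyScaler":
        # Remember the largest training value so transform can rescale by it.
        self.max_ = max(X)
        return self

    def transform(self, X: Sequence[float]) -> List[float]:
        return [x / self.max_ for x in X]


class ToyMeanEstimator:
    """Hypothetical final estimator: only implements fit."""

    def fit(self, X: Sequence[float], y: Optional[Sequence[float]] = None) -> "ToyMeanEstimator":
        self.mean_ = sum(X) / len(X)
        return self


class ToyPipeline:
    """Hypothetical stand-in that shows the order in which steps are applied."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self.steps = steps

    def fit(self, X: Sequence[float], y: Optional[Sequence[float]] = None) -> "ToyPipeline":
        # Fit each intermediate transform, then pass its output to the next step.
        for _name, transform in self.steps[:-1]:
            X = transform.fit(X, y).transform(X)
        # The last step sees only the transformed data and only needs fit.
        self.steps[-1][1].fit(X, y)
        return self


scaler = ToyScaler()
estimator = ToyMeanEstimator()
pipeline = ToyPipeline([("scale", scaler), ("mean", estimator)])
pipeline.fit([2.0, 4.0, 8.0, 2.0])
# The estimator was fitted on the scaled values [0.25, 0.5, 1.0, 0.25],
# so estimator.mean_ is 0.5 rather than the raw mean of 4.0.
```

Each step is a (name, estimator) tuple, matching the shape of the steps argument in the class signature.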

fit(X: DataFrame | Series, y: DataFrame | Series | None = None) → Pipeline

Fit the model.

Fit each transformer in order, transforming the data after each fit. Finally, fit the final estimator on the transformed data.

Parameters:

X (bigframes.dataframe.DataFrame or bigframes.series.Series) – A DataFrame or Series representing training data. Must match the input requirements of the first step of the pipeline.
y (bigframes.dataframe.DataFrame or bigframes.series.Series) – A DataFrame or Series representing training targets, if applicable.

Returns:

Pipeline with fitted steps.

Return type:

Pipeline
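
A typical call to fit might look like the sketch below. It assumes that X_train and y_train are bigframes DataFrames or Series the caller has already loaded, and it picks StandardScaler and LinearRegression from bigframes.ml purely as example steps. The function name fit_scaled_regression and the step names "scaler" and "linreg" are hypothetical.

```python
def fit_scaled_regression(X_train, y_train):
    """Build a two-step pipeline and fit it on bigframes data.

    X_train and y_train are assumed to be bigframes DataFrames or Series
    prepared by the caller; running this needs a working BigQuery session.
    """
    # Imported here so that defining this sketch does not require bigframes
    # to be installed; in application code these would be top-level imports.
    from bigframes.ml.linear_model import LinearRegression
    from bigframes.ml.pipeline import Pipeline
    from bigframes.ml.preprocessing import StandardScaler

    pipeline = Pipeline(
        [
            ("scaler", StandardScaler()),    # intermediate step: a transform
            ("linreg", LinearRegression()),  # final step: the estimator
        ]
    )
    # fit returns the pipeline with fitted steps.
    return pipeline.fit(X_train, y_train)
```

X_train must satisfy the input requirements of the first step, here the scaler, rather than those of the final estimator.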

to_gbq(model_name: str, replace: bool = False) → Pipeline

Save the pipeline to BigQuery.

Parameters:

model_name (str) – The name of the model (pipeline).
replace (bool, default False) – Whether to replace the model (pipeline) if it already exists.

Returns:

The saved model (pipeline).

Return type:

Pipeline
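
Saving a fitted pipeline could be wrapped as in the following sketch. The helper name save_pipeline and its overwrite argument are hypothetical; only the to_gbq(model_name, replace) call comes from the signature above.

```python
def save_pipeline(fitted_pipeline, model_name: str, overwrite: bool = False):
    """Save a fitted pipeline to BigQuery and return the saved pipeline.

    fitted_pipeline is assumed to be a bigframes.ml.pipeline.Pipeline on which
    fit has already been called. With overwrite left at False, an existing
    model of the same name is not replaced.
    """
    return fitted_pipeline.to_gbq(model_name, replace=overwrite)
```

For example, save_pipeline(pipeline, "my_dataset.my_pipeline", overwrite=True) would replace an existing model, where "my_dataset.my_pipeline" is a placeholder name.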