bigframes.ml.model_selection.cross_validate

Evaluate metric(s) by cross-validation and also record fit/score times.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import cross_validate, KFold
>>> from bigframes.ml.linear_model import LinearRegression
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> model = LinearRegression()
>>> scores = cross_validate(model, X, y, cv=3)
>>> for score in scores["test_score"]:
...   print(score["mean_squared_error"][0])
...
5.218167286047954e-19
2.726229944928669e-18
1.6197635612324266e-17

Parameters:

estimator – bigframes.ml model that implements fit(). The object to use to fit the data.
X (bigframes.dataframe.DataFrame or bigframes.series.Series) – The data to fit.
y (bigframes.dataframe.DataFrame, bigframes.series.Series or None) – The target variable to try to predict in the case of supervised learning. Defaults to None.
cv (int, bigframes.ml.model_selection.KFold or None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 5-fold cross-validation,
- int, to specify the number of folds in a KFold,
- bigframes.ml.model_selection.KFold instance.
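When cv is an int, the rows are split into that many folds and each fold serves once as the test set. The following is a minimal sketch of that index bookkeeping, assuming contiguous, unshuffled folds sized like scikit-learn's KFold (the first n_samples % n_splits folds get one extra row). kfold_indices is a hypothetical helper, not part of bigframes, and bigframes may partition a DataFrame differently.

```python
def kfold_indices(n_samples: int, n_splits: int = 5) -> list[tuple[list[int], list[int]]]:
    """Return (train, test) row positions for each of n_splits contiguous folds.

    Hypothetical helper: mirrors scikit-learn's unshuffled KFold sizing,
    not necessarily how bigframes partitions a DataFrame.
    """
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # The first (n_samples % n_splits) folds receive one extra row.
    base, extra = divmod(n_samples, n_splits)
    splits = []
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Every row outside the current test fold is used for training.
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


# cv=3 on a three-row dataset: each row is held out exactly once.
for train, test in kfold_indices(3, n_splits=3):
    print(train, test)
```

With three rows and cv=3, every fold holds out a single row, which is why the example above produces exactly three test scores.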

Returns:

A dict of arrays with one entry per cv split for each score and timing. The keys of this dict are:

test_score
The score array for test scores on each cv split.

fit_time
The time for fitting the estimator on the train set for each cv split.

score_time
The time for scoring the estimator on the test set for each cv split.

Return type:

Dict[str, List]
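Because the result is a plain dict of per-split lists, summary statistics need only ordinary Python. A minimal sketch, assuming the fit_time and score_time lists hold plain numbers; summarize_times is a hypothetical helper, and test_score is skipped because its entries are metric tables rather than single numbers.

```python
from statistics import mean


def summarize_times(results: dict[str, list]) -> dict[str, float]:
    """Hypothetical helper: average the per-split timing lists.

    Only fit_time and score_time are averaged; test_score is left out
    because its entries are per-split metric tables, not plain numbers.
    """
    return {
        key: mean(results[key])
        for key in ("fit_time", "score_time")
        if key in results
    }
```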
