bigframes.ml.model_selection.cross_validate
- bigframes.ml.model_selection.cross_validate(estimator, X: bigframes.dataframe.DataFrame | bigframes.series.Series | pandas.DataFrame | pandas.Series, y: bigframes.dataframe.DataFrame | bigframes.series.Series | pandas.DataFrame | pandas.Series | None = None, *, cv: int | KFold | None = None) -> dict[str, list]
Evaluate metric(s) by cross-validation and also record fit/score times.
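To make "cross-validation plus fit/score times" concrete, here is a minimal pure-Python sketch of the loop, not the bigframes implementation. It assumes a hypothetical estimator exposing fit(X, y) and score(X, y) on plain Python lists, uses contiguous (unshuffled) folds, and returns a dict with the same three keys described under Returns. The names cross_validate_sketch and MeanRegressor are illustrative only. In bigframes each test_score entry holds the model's evaluation metrics; here it is a single number.

```python
import copy
import time


def cross_validate_sketch(estimator, X, y, n_splits=5):
    """Hypothetical sketch: k-fold cross-validation that also records timings.

    Assumes `estimator` has fit(X, y) and score(X, y) methods and that X and y
    are plain Python sequences of equal length. Not the bigframes implementation.
    """
    n = len(X)
    if not 2 <= n_splits <= n:
        raise ValueError("n_splits must be between 2 and the number of rows")
    # Contiguous folds; the first n % n_splits folds get one extra row.
    fold_sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    results = {"test_score": [], "fit_time": [], "score_time": []}
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = range(start, stop)
        train_idx = [i for i in range(n) if not start <= i < stop]
        # Fresh copy per fold so fitted state does not leak between splits.
        model = copy.deepcopy(estimator)

        t0 = time.perf_counter()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        t1 = time.perf_counter()
        score = model.score([X[i] for i in test_idx], [y[i] for i in test_idx])
        t2 = time.perf_counter()

        results["test_score"].append(score)
        results["fit_time"].append(t1 - t0)
        results["score_time"].append(t2 - t1)
        start = stop
    return results


class MeanRegressor:
    """Toy estimator: predicts the training-label mean; score() returns the MSE."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def score(self, X, y):
        return sum((v - self.mean_) ** 2 for v in y) / len(y)


results = cross_validate_sketch(MeanRegressor(), [[1], [2], [3], [4]], [1, 2, 3, 4], n_splits=2)
print(results["test_score"])  # [4.25, 4.25]
```

Deep-copying the estimator per fold is a design choice of this sketch that keeps each split's fitted state independent.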
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import cross_validate, KFold
>>> from bigframes.ml.linear_model import LinearRegression
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> model = LinearRegression()
>>> scores = cross_validate(model, X, y, cv=3)
>>> for score in scores["test_score"]:
...     print(score["mean_squared_error"][0])
...
5.218167286047954e-19
2.726229944928669e-18
1.6197635612324266e-17
- Parameters:
estimator – bigframes.ml model that implements fit(). The object to use to fit the data.
X (bigframes.dataframe.DataFrame or bigframes.series.Series) – The data to fit.
y (bigframes.dataframe.DataFrame, bigframes.series.Series or None) – The target variable to try to predict in the case of supervised learning. Defaults to None.
cv (int, bigframes.ml.model_selection.KFold or None) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
int, to specify the number of folds in a KFold,
bigframes.ml.model_selection.KFold instance.
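The three accepted cv inputs all reduce to one splitter. The sketch below shows that mapping under stated assumptions: KFoldStandIn is a hypothetical stand-in for bigframes.ml.model_selection.KFold that carries only n_splits, and resolve_cv is an illustrative helper, not a bigframes function.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class KFoldStandIn:
    """Hypothetical stand-in for bigframes.ml.model_selection.KFold (n_splits only)."""

    n_splits: int = 5


def resolve_cv(cv: Optional[Union[int, KFoldStandIn]] = None) -> KFoldStandIn:
    """Map the documented cv inputs onto a single splitter object (sketch)."""
    if cv is None:
        # None: the default 5-fold cross-validation.
        return KFoldStandIn(n_splits=5)
    if isinstance(cv, bool):
        # bool is a subclass of int in Python; reject it instead of treating it as 0/1 folds.
        raise TypeError("cv must be None, an int, or a KFold instance")
    if isinstance(cv, int):
        # int: the number of folds in a KFold.
        return KFoldStandIn(n_splits=cv)
    if isinstance(cv, KFoldStandIn):
        # A KFold instance is used as given.
        return cv
    raise TypeError("cv must be None, an int, or a KFold instance")


print(resolve_cv().n_splits)   # 5
print(resolve_cv(3).n_splits)  # 3
```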
- Returns:
A dict of arrays containing the score/time arrays for each scorer is returned. The keys for this dict are:
test_score: The score array for test scores on each cv split.
fit_time: The time for fitting the estimator on the train set for each cv split.
score_time: The time for scoring the estimator on the test set for each cv split.
- Return type:
Dict[str, List]
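Because each value in the returned dict is a list with one entry per cv split, per-split timings can be aggregated directly. The helper below, summarize_timings, is a hypothetical illustration that assumes the fit_time and score_time entries are plain numbers; the input dict uses placeholder values, not real measurements.

```python
from statistics import mean


def summarize_timings(cv_results: dict) -> dict:
    """Hypothetical helper: aggregate the per-split fit_time and score_time lists.

    Assumes each entry in cv_results["fit_time"] and cv_results["score_time"]
    is a plain number; test_score entries are left untouched.
    """
    summary = {}
    for key in ("fit_time", "score_time"):
        times = list(cv_results[key])
        summary[key] = {"total": sum(times), "mean": mean(times), "n_splits": len(times)}
    return summary


# Placeholder numbers standing in for a real result; they are not measurements.
example = {"test_score": [None, None], "fit_time": [2.0, 4.0], "score_time": [0.5, 1.5]}
print(summarize_timings(example)["fit_time"])  # {'total': 6.0, 'mean': 3.0, 'n_splits': 2}
```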