bigframes.ml.linear_model.LinearRegression
- class bigframes.ml.linear_model.LinearRegression(*, optimize_strategy: Literal['auto_strategy', 'batch_gradient_descent', 'normal_equation'] = 'auto_strategy', fit_intercept: bool = True, l1_reg: float | None = None, l2_reg: float = 0.0, max_iterations: int = 20, warm_start: bool = False, learning_rate: float | None = None, learning_rate_strategy: Literal['line_search', 'constant'] = 'line_search', tol: float = 0.01, ls_init_learning_rate: float | None = None, calculate_p_values: bool = False, enable_global_explain: bool = False)
Ordinary least squares Linear Regression.
LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
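To make the "residual sum of squares" objective concrete, here is a minimal pure-Python sketch for the single-feature case. It uses the textbook closed-form solution and is only an illustration of the objective; the helper names `fit_simple_ols` and `residual_sum_of_squares` are hypothetical and are not part of bigframes.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted targets."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
intercept, slope = fit_simple_ols(xs, ys)
rss = residual_sum_of_squares(xs, ys, intercept, slope)
```

Any other choice of intercept or slope gives a residual sum of squares at least as large as `rss` on the same data.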
Examples:
>>> from bigframes.ml.linear_model import LinearRegression
>>> import bigframes.pandas as bpd
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})
>>> # Create the linear model
>>> model = LinearRegression()
>>> model.fit(X, y)
LinearRegression()

>>> # Score the model
>>> score = model.score(X, y)
>>> print(score)
   mean_absolute_error  mean_squared_error  mean_squared_log_error  \
0             0.022812            0.000602                 0.00035

   median_absolute_error  r2_score  explained_variance
0               0.015077  0.997591            0.997591
- Parameters:
optimize_strategy (str, default "auto_strategy") – The strategy used to train linear regression models. Possible values are “auto_strategy”, “batch_gradient_descent”, and “normal_equation”. Defaults to “auto_strategy”.
fit_intercept (bool, default True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
l1_reg (float or None, default None) – The amount of L1 regularization applied. Can’t be set in “normal_equation” mode. If unset, value 0 is used. Defaults to None.
l2_reg (float, default 0.0) – The amount of L2 regularization applied. Defaults to 0.
max_iterations (int, default 20) – The maximum number of training iterations or steps. Defaults to 20.
warm_start (bool, default False) – Determines whether to train a model with new training data, new model options, or both. Unless you explicitly override them, the initial options used to train the model are used for the warm start run. Defaults to False.
learning_rate (float or None, default None) – The learning rate for gradient descent when learning_rate_strategy='constant'. If unset, value 0.1 is used. Setting it when learning_rate_strategy='line_search' returns an error.
learning_rate_strategy (str, default "line_search") – The strategy for specifying the learning rate during training. Possible values are “line_search” and “constant”. Defaults to “line_search”.
tol (float, default 0.01) – The minimum relative loss improvement that is necessary to continue training when EARLY_STOP is set to true. For example, a value of 0.01 specifies that each iteration must reduce the loss by 1% for training to continue. Defaults to 0.01.
ls_init_learning_rate (float or None, default None) – Sets the initial learning rate that learning_rate_strategy='line_search' uses. This option can only be used if line_search is specified. If unset, value 0.1 is used.
calculate_p_values (bool, default False) – Specifies whether to compute p-values and standard errors during training. Defaults to False.
enable_global_explain (bool, default False) – Whether to compute global explanations using explainable AI to evaluate global feature importance to the model. Defaults to False.
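The interaction between learning_rate, l2_reg, max_iterations, and tol described above can be sketched with a generic batch gradient descent loop. This is a pure-Python illustration under stated assumptions, not the optimizer that BigQuery ML actually runs: the function name `train_gradient_descent` is hypothetical, the loss is mean squared error plus an L2 penalty, the learning rate is constant, and L1 regularization and line search are left out.

```python
def train_gradient_descent(rows, targets, *, learning_rate=0.1, l2_reg=0.0,
                           max_iterations=20, tol=0.01, fit_intercept=True):
    """Fit weights by batch gradient descent; return (weights, intercept, losses)."""
    n_samples = len(rows)
    weights = [0.0] * len(rows[0])
    intercept = 0.0

    def predict(row):
        return intercept + sum(w * x for w, x in zip(weights, row))

    def loss():
        # Mean squared error plus an L2 penalty on the weights (an assumption).
        squared = sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / n_samples
        return squared + l2_reg * sum(w * w for w in weights)

    losses = [loss()]
    for _ in range(max_iterations):
        errors = [predict(r) - t for r, t in zip(rows, targets)]
        gradient = [
            2.0 / n_samples * sum(e * r[j] for e, r in zip(errors, rows))
            + 2.0 * l2_reg * weights[j]
            for j in range(len(weights))
        ]
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
        if fit_intercept:
            intercept -= learning_rate * 2.0 / n_samples * sum(errors)
        losses.append(loss())
        previous, current = losses[-2], losses[-1]
        # Early stop: relative loss improvement fell below tol.
        if previous > 0 and (previous - current) / previous < tol:
            break
    return weights, intercept, losses


rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
loose = train_gradient_descent(rows, targets, tol=0.5)
tight = train_gradient_descent(rows, targets, tol=1e-6)
```

On the same data, a larger tol can only stop training at the same iteration or earlier than a smaller one, and max_iterations caps the loop in either case.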
- global_explain() → DataFrame
Provide explanations for an entire linear regression model.
Note
Output matches that of the BigQuery ML.GLOBAL_EXPLAIN function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-global-explain
- Returns:
A DataFrame containing feature importance values and corresponding attributions, giving a global explanation of feature influence.
- Return type:
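A minimal usage sketch, assuming global explanations have to be requested at construction time through the enable_global_explain parameter documented above. The helper `fit_and_explain` is hypothetical; it takes the estimator class as an argument so that it relies only on the constructor option, fit, and global_explain.

```python
def fit_and_explain(estimator_cls, X, y):
    """Fit an estimator with global explanations enabled and return them."""
    # Assumption: explanations are only available when enabled up front.
    model = estimator_cls(enable_global_explain=True)
    model.fit(X, y)
    return model.global_explain()


# Intended use (requires a BigQuery session, so it is not executed here):
#   from bigframes.ml.linear_model import LinearRegression
#   importance = fit_and_explain(LinearRegression, X, y)
```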
- predict(X: DataFrame | Series | DataFrame | Series) → DataFrame
Predict using the linear model.
- Parameters:
X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – Series or DataFrame of shape (n_samples, n_features). Samples.
- Returns:
DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted values.
- Return type:
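The returned shape, (n_samples, n_input_columns + n_prediction_columns), means the prediction is appended to the input columns instead of being returned on its own. A pure-Python sketch of that layout follows; the helper `predict_rows`, the weights, and the column name `predicted_outcome` are assumptions made for illustration.

```python
def predict_rows(rows, weights, intercept, prediction_column="predicted_outcome"):
    """Return each input row with one extra column holding the linear prediction."""
    results = []
    for row in rows:
        value = intercept + sum(weights[name] * row[name] for name in weights)
        # Keep every input column and add the prediction next to it.
        results.append({**row, prediction_column: value})
    return results


rows = [{"feature0": 20, "feature1": 0}, {"feature0": 21, "feature1": 1}]
predictions = predict_rows(rows, {"feature0": 0.5, "feature1": -1.0}, intercept=2.0)
```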
- predict_explain(X: DataFrame | Series | DataFrame | Series, *, top_k_features: int = 5) → DataFrame
Explain predictions for a linear regression model.
Note
Output matches that of the BigQuery ML.EXPLAIN_PREDICT function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-explain-predict
- Parameters:
X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – Series or DataFrame whose predictions are to be explained.
top_k_features (int, default 5) – An INT64 value that specifies how many top feature attribution pairs are generated for each row of input data. The features are ranked by the absolute values of their attributions. By default, top_k_features is set to 5. If its value is greater than the number of features in the training data, the attributions of all features are returned.
- Returns:
A DataFrame of predictions with explanation columns.
- Return type:
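The ranking rule for top_k_features can be sketched in a few lines of pure Python. The helper `top_k_attributions` and the attribution values are hypothetical; only the rule itself (rank by absolute attribution, return everything when k exceeds the feature count) comes from the description above.

```python
def top_k_attributions(attributions, top_k_features=5):
    """Return the k (feature, attribution) pairs with the largest absolute value."""
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    # Slicing past the end is safe: fewer than k features returns all of them.
    return ranked[:top_k_features]


row_attributions = {"feature0": -0.8, "feature1": 0.1, "feature2": 0.5}
top_two = top_k_attributions(row_attributions, top_k_features=2)
all_pairs = top_k_attributions(row_attributions)
```

Note that a large negative attribution outranks a smaller positive one, because the sign is ignored when ranking.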
- score(X: DataFrame | Series | DataFrame | Series, y: DataFrame | Series | DataFrame | Series) → DataFrame
Calculate evaluation metrics of the model.
Note
Output matches that of the BigQuery ML.EVALUATE function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#regression_models for the outputs relevant to this model type.
- Parameters:
X (bigframes.dataframe.DataFrame or bigframes.series.Series) – Series or DataFrame of shape (n_samples, n_features). Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (bigframes.dataframe.DataFrame or bigframes.series.Series) – Series or DataFrame of shape (n_samples,) or (n_samples, n_outputs). True values for X.
- Returns:
A DataFrame of the evaluation result.
- Return type:
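The six columns shown in the class example (mean_absolute_error, mean_squared_error, mean_squared_log_error, median_absolute_error, r2_score, explained_variance) can be reproduced from their common textbook definitions. The sketch below assumes those definitions match what ML.EVALUATE reports; the helper `regression_metrics` is hypothetical and the inputs are made up.

```python
import math
from statistics import median


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    total_variance = sum((t - mean_true) ** 2 for t in y_true) / n
    mean_residual = sum(residuals) / n
    residual_variance = sum((r - mean_residual) ** 2 for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    return {
        "mean_absolute_error": sum(abs(r) for r in residuals) / n,
        "mean_squared_error": mse,
        # Log error uses log(1 + value), so inputs must be greater than -1.
        "mean_squared_log_error": sum(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ) / n,
        "median_absolute_error": median(abs(r) for r in residuals),
        "r2_score": 1.0 - mse / total_variance,
        "explained_variance": 1.0 - residual_variance / total_variance,
    }


# Predictions that are uniformly 0.1 too high.
metrics = regression_metrics([0.0, 0.0, 1.0, 1.0], [0.1, 0.1, 1.1, 1.1])
```

With a constant bias like this, explained_variance stays at 1 while r2_score drops, because explained variance ignores the mean of the residuals. When the residuals average to zero, as for an ordinary least squares fit with an intercept evaluated on its own training data, the two values coincide.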
- to_gbq(model_name: str, replace: bool = False) → LinearRegression
Save the model to BigQuery.
- Parameters:
- Returns:
Saved model.
- Return type: