bigframes.ml.forecasting.ARIMAPlus
- class bigframes.ml.forecasting.ARIMAPlus(*, horizon: int = 1000, auto_arima: bool = True, auto_arima_max_order: int | None = None, auto_arima_min_order: int | None = None, data_frequency: str = 'auto_frequency', include_drift: bool = False, holiday_region: str | None = None, clean_spikes_and_dips: bool = True, adjust_step_changes: bool = True, forecast_limit_lower_bound: float | None = None, forecast_limit_upper_bound: float | None = None, time_series_length_fraction: float | None = None, min_time_series_length: int | None = None, max_time_series_length: int | None = None, trend_smoothing_window_size: int | None = None, decompose_time_series: bool = True)
Time Series ARIMA Plus model.
- Parameters:
horizon (int, default 1,000) – The number of time points to forecast. Defaults to 1,000; the maximum value is 10,000.
auto_arima (bool, default True) – Determines whether the training process uses auto.ARIMA or not. If True, training automatically finds the best non-seasonal order (that is, the p, d, q tuple) and decides whether or not to include a linear drift term when d is 1.
auto_arima_max_order (int or None, default None) – The maximum value for the sum of non-seasonal p and q.
auto_arima_min_order (int or None, default None) – The minimum value for the sum of non-seasonal p and q.
data_frequency (str, default "auto_frequency") – The data frequency of the input time series. Possible values are “auto_frequency”, “per_minute”, “hourly”, “daily”, “weekly”, “monthly”, “quarterly”, and “yearly”.
include_drift (bool, default False) – Determines whether the model should include a linear drift term or not. The drift term is applicable when non-seasonal d is 1.
holiday_region (str or None, default None) – The geographical region whose holidays are used to model the holiday effect. By default, holiday effect modeling isn’t used. For possible values, see https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-time-series#holiday_region.
clean_spikes_and_dips (bool, default True) – Determines whether to perform automatic detection and cleanup of spikes and dips in the model training pipeline. Detected spikes and dips are replaced with locally linearly interpolated values.
adjust_step_changes (bool, default True) – Determines whether or not to perform automatic step change detection and adjustment in the model training pipeline.
forecast_limit_upper_bound (float or None, default None) – The upper bound of the forecast values. When you specify the forecast_limit_upper_bound option, all of the forecast values must be less than the specified value. For example, if you set forecast_limit_upper_bound to 100, then all of the forecast values are less than 100. Also, all values greater than or equal to the forecast_limit_upper_bound value are excluded from modeling.
forecast_limit_lower_bound (float or None, default None) – The lower bound of the forecast values; the minimum allowed value is 0. When you specify the forecast_limit_lower_bound option, all of the forecast values must be greater than the specified value. For example, if you set forecast_limit_lower_bound to 0, then all of the forecast values are greater than 0. Also, all values less than or equal to the forecast_limit_lower_bound value are excluded from modeling.
time_series_length_fraction (float or None, default None) – The fraction of the interpolated length of the time series that’s used to model the time series trend component. All of the time points of the time series are used to model the non-trend component.
min_time_series_length (int or None, default None) – The minimum number of time points that are used in modeling the trend component of the time series.
max_time_series_length (int or None, default None) – The maximum number of time points in a time series that can be used in modeling the trend component of the time series.
trend_smoothing_window_size (int or None, default None) – The smoothing window size for the trend component.
decompose_time_series (bool, default True) – Determines whether the separate components of both the history and forecast parts of the time series (such as holiday effect and seasonal components) are saved in the model.
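The two training-data rules described above, excluding values at or beyond forecast_limit_lower_bound / forecast_limit_upper_bound and replacing spikes and dips with locally interpolated values when clean_spikes_and_dips is True, can be sketched in plain Python. This is an illustration under stated assumptions, not the BigQuery ML implementation: the helpers drop_out_of_bounds and interpolate_flagged are hypothetical, and the spike/dip flags are taken as given because the detection method is not documented here.

```python
from typing import List, Optional, Sequence


def drop_out_of_bounds(
    values: Sequence[float],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[float]:
    """Keep only values strictly inside (lower, upper).

    Mirrors the documented rule that values <= the lower bound or
    >= the upper bound are excluded from modeling.
    """
    kept = []
    for v in values:
        if lower is not None and v <= lower:
            continue
        if upper is not None and v >= upper:
            continue
        kept.append(v)
    return kept


def interpolate_flagged(values: Sequence[float], flagged: Sequence[bool]) -> List[float]:
    """Replace flagged points (assumed spikes or dips) by linear interpolation.

    Each flagged point is interpolated between its nearest unflagged
    neighbors; a flagged point at either edge copies the nearest unflagged value.
    """
    out = list(values)
    good = [i for i, f in enumerate(flagged) if not f]
    if not good:
        raise ValueError("at least one unflagged point is required")
    for i, f in enumerate(flagged):
        if not f:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = values[right]  # leading edge: no left neighbor
        elif right is None:
            out[i] = values[left]  # trailing edge: no right neighbor
        else:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
    return out


series = [10.0, 11.0, 95.0, 13.0, 14.0]
print(drop_out_of_bounds(series, lower=0.0, upper=90.0))  # [10.0, 11.0, 13.0, 14.0]
print(interpolate_flagged(series, [False, False, True, False, False]))  # [10.0, 11.0, 12.0, 13.0, 14.0]
```

Linear interpolation between the nearest clean neighbors is the simplest reading of "local linear interpolated values"; the service may use a wider local window.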
- property coef_: DataFrame
Inspect the coefficients of the model.
Note
Output matches that of the ML.ARIMA_COEFFICIENTS function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-arima-coefficients for the outputs relevant to this model type.
- Returns:
A DataFrame with the coefficients for the model.
- Return type:
bigframes.dataframe.DataFrame
- detect_anomalies(X: DataFrame | Series | DataFrame | Series, *, anomaly_prob_threshold: float = 0.95) → DataFrame
Detect anomalous data points in the input.
- Parameters:
X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – The Series or DataFrame in which to detect anomalies.
anomaly_prob_threshold (float, default 0.95) – Identifies the custom threshold to use for anomaly detection. The value must be in the range [0, 1), with a default value of 0.95.
- Returns:
A DataFrame with the anomaly detection results.
- Return type:
bigframes.dataframe.DataFrame
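To make the role of anomaly_prob_threshold concrete, the following pure-Python sketch assumes Gaussian forecast errors and turns each residual into an anomaly probability of 2 * Phi(|z|) - 1, flagging a point when that probability exceeds the threshold. That scoring rule and the helpers anomaly_probability and detect are assumptions for illustration only; BigQuery computes its anomaly probabilities internally and may do so differently. Only the [0, 1) range check comes from the parameter description above.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def anomaly_probability(actual: float, expected: float, std_err: float) -> float:
    """Probability that a Gaussian error is smaller in magnitude than the
    observed residual, i.e. 2 * Phi(|z|) - 1 (an assumed scoring rule)."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    z = abs(actual - expected) / std_err
    return 2.0 * NormalDist().cdf(z) - 1.0


def detect(
    actual: Sequence[float],
    expected: Sequence[float],
    std_err: Sequence[float],
    anomaly_prob_threshold: float = 0.95,
) -> List[Tuple[float, bool]]:
    """Return (probability, is_anomaly) for each point."""
    # Same valid range as documented for detect_anomalies: [0, 1).
    if not 0.0 <= anomaly_prob_threshold < 1.0:
        raise ValueError("anomaly_prob_threshold must be in [0, 1)")
    results = []
    for a, e, s in zip(actual, expected, std_err):
        p = anomaly_probability(a, e, s)
        results.append((p, p > anomaly_prob_threshold))
    return results


flags = [is_anomaly for _, is_anomaly in detect([10.0, 13.0], [10.0, 10.0], [1.0, 1.0])]
print(flags)  # [False, True]
```

Raising the threshold toward 1 flags fewer points, because a residual must lie further in the tail before its probability exceeds the cutoff.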
- predict(X=None, *, horizon: int = 3, confidence_level: float = 0.95) → DataFrame
Forecast the time series over the future horizon.
Note
Output matches that of the BigQuery ML.FORECAST function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-forecast
- Parameters:
X (default None) – Ignored; accepted for compatibility with other APIs.
horizon (int, default 3) – The number of time points to forecast. The default value is 3, and the maximum value is 1000.
confidence_level (float, default 0.95) – The percentage of future values that fall within the prediction interval. The valid input range is [0.0, 1.0).
- Returns:
The predicted DataFrame, which contains the columns “forecast_timestamp” and “forecast_value”, plus an optional “id” column.
- Return type:
bigframes.dataframe.DataFrame
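As a hedged sketch of how confidence_level relates to the width of a prediction interval: assuming normally distributed forecast errors with a known standard error, the interval is forecast_value ± z * standard_error, where z is the two-sided standard normal critical value for the requested level. The function prediction_interval and its standard_error input are hypothetical; this illustrates the statistics, not how predict computes its output.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(
    forecast_value: float,
    standard_error: float,
    confidence_level: float = 0.95,
) -> Tuple[float, float]:
    """Symmetric Gaussian prediction interval around a point forecast."""
    # Same valid range as documented for predict: [0.0, 1.0).
    if not 0.0 <= confidence_level < 1.0:
        raise ValueError("confidence_level must be in [0.0, 1.0)")
    # Two-sided critical value: (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf((1.0 + confidence_level) / 2.0)
    return forecast_value - z * standard_error, forecast_value + z * standard_error


lower, upper = prediction_interval(100.0, 10.0)
print((round(lower, 2), round(upper, 2)))  # (80.4, 119.6)
```

A confidence_level of 0 collapses the interval to the point forecast, and values approaching 1 make it arbitrarily wide, which is why 1.0 itself is excluded from the valid range.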
- predict_explain(X=None, *, horizon: int = 3, confidence_level: float = 0.95) → DataFrame
Explain the forecast of the time series over the future horizon.
Note
Output matches that of the BigQuery ML.EXPLAIN_FORECAST function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-explain-forecast
- Parameters:
X (default None) – Ignored; accepted for compatibility with other APIs.
horizon (int, default 3) – The number of time points to forecast. The default value is 3, and the maximum value is 1000.
confidence_level (float, default 0.95) – The percentage of future values that fall within the prediction interval. The valid input range is [0.0, 1.0).
- Returns:
A DataFrame with the explained forecast.
- Return type:
bigframes.dataframe.DataFrame
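An explained forecast breaks the series into separate components, such as the trend, seasonal, and holiday-effect components that decompose_time_series refers to. The sketch below assumes those components combine additively, which is an assumption rather than a documented guarantee, and shows how the series is recovered by summing them per time point. The helper recompose and the component names and values are hypothetical and need not match the ML.EXPLAIN_FORECAST output columns.

```python
from typing import Dict, List, Sequence


def recompose(components: Dict[str, Sequence[float]]) -> List[float]:
    """Add up per-time-point components, assuming an additive decomposition."""
    lengths = {len(series) for series in components.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one component, all of equal length")
    n = lengths.pop()
    return [sum(series[i] for series in components.values()) for i in range(n)]


# Hypothetical component names and values, for illustration only.
parts = {
    "trend": [100.0, 101.0, 102.0],
    "seasonal": [5.0, -5.0, 0.0],
    "holiday_effect": [0.0, 0.0, 3.0],
    "residual": [0.5, -0.5, 0.0],
}
print(recompose(parts))  # [105.5, 95.5, 105.0]
```

Under this additive assumption, dropping one component from the sum (for example the holiday effect) shows what the series would look like without it.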
- score(X: DataFrame | Series | DataFrame | Series, y: DataFrame | Series | DataFrame | Series, id_col: DataFrame | Series | DataFrame | Series | None = None) → DataFrame
Calculate evaluation metrics of the model.
Note
Output matches that of the BigQuery ML.EVALUATE function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#time_series_models for the outputs relevant to this model type.
- Parameters:
X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – A DataFrame or Series containing exactly one column: the evaluation timestamps. The timestamps must fall within the horizon of the model, which by default is 1000 data points.
y (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series) – A DataFrame or Series containing exactly one column: the evaluation numeric values.
id_col (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series or None, default None) – An optional DataFrame or Series containing at least one column, used as the evaluation ID column.
- Returns:
A DataFrame with the evaluation results.
- Return type:
bigframes.dataframe.DataFrame
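For intuition about what an evaluation compares, here is a pure-Python sketch of common point-forecast error metrics computed from position-aligned actual and forecast values. The metric names are chosen to echo the time-series outputs of ML.EVALUATE, but the exact definitions BigQuery uses (in particular for the percentage metrics) are an assumption here; consult the linked page for the authoritative list. The helper forecast_metrics is hypothetical, and aligning by position rather than by timestamp and id is a simplification.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Common point-forecast error metrics over position-aligned sequences."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        # Percentage errors divide by the actual value.
        raise ValueError("percentage errors are undefined when an actual value is 0")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / n
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        # Expressed in percent, e.g. 7.5 means 7.5 %.
        "mean_absolute_percentage_error": 100.0
        * sum(abs(e / a) for e, a in zip(errors, actual))
        / n,
        # One common sMAPE definition: 2|a - f| / (|a| + |f|), in percent.
        "symmetric_mean_absolute_percentage_error": 100.0
        * sum(2.0 * abs(e) / (abs(a) + abs(f)) for e, a, f in zip(errors, actual, forecast))
        / n,
    }


metrics = forecast_metrics([100.0, 200.0], [110.0, 190.0])
print(metrics["root_mean_squared_error"])  # 10.0
```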
- summary(show_all_candidate_models: bool = False) → DataFrame
Summary of the evaluation metrics of the time series model.
Note
Output matches that of the BigQuery ML.ARIMA_EVALUATE function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-arima-evaluate for the outputs relevant to this model type.
- Parameters:
show_all_candidate_models (bool, default False) – If True, show the evaluation metrics, or an error message, for every candidate model; if False, show them only for the best model, which is the one with the lowest AIC.
- Returns:
A DataFrame with the evaluation results.
- Return type:
bigframes.dataframe.DataFrame
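The effect of show_all_candidate_models can be sketched as a ranking over candidate models: every candidate is listed when it is True, and otherwise only the successfully fitted candidate with the lowest AIC is kept. The Candidate record, its fields, and the summarize helper are hypothetical; the real summary output has its own columns, as described on the ML.ARIMA_EVALUATE page.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record: non-seasonal order (p, d, q) plus its AIC.
    p: int
    d: int
    q: int
    aic: Optional[float] = None  # None when fitting failed
    error_message: Optional[str] = None


def summarize(candidates: List[Candidate], show_all_candidate_models: bool = False) -> List[Candidate]:
    # Successful fits first in ascending AIC; failed fits sort to the end.
    ranked = sorted(candidates, key=lambda c: math.inf if c.aic is None else c.aic)
    if show_all_candidate_models:
        return ranked
    fitted = [c for c in ranked if c.aic is not None]
    return fitted[:1]  # the single best model, if any candidate fitted


cands = [
    Candidate(2, 1, 1, aic=310.2),
    Candidate(1, 1, 0, aic=298.7),
    Candidate(5, 2, 5, error_message="did not converge"),
]
print([(c.p, c.d, c.q) for c in summarize(cands)])  # [(1, 1, 0)]
```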