Time Series Forecasting with BigFrames

This notebook walks through time series forecasting with the BigFrames library. It uses two models, TimesFM and ARIMAPlus, to predict bikeshare trip demand from historical San Francisco data. The walkthrough covers data loading, preprocessing, model training, and visualization of the results.

import bigframes.pandas as bpd
from bigframes.ml import forecasting
bpd.options.display.render_mode = "anywidget"

1. Data Loading and Preprocessing

The first step is to load the San Francisco bikeshare trips table from BigQuery. We then keep only trips that started on or after 2018-01-01 and were made by ‘Subscriber’ users, so that the series covers one consistent rider population and period. Finally, we floor each start time to the hour and count trips per hour to create an hourly time series of trip counts.

df = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")
df = df[df["start_date"] >= "2018-01-01"]
df = df[df["subscriber_type"] == "Subscriber"]
df["trip_hour"] = df["start_date"].dt.floor("h")
df_grouped = df[["trip_hour", "trip_id"]].groupby("trip_hour").count().reset_index()
df_grouped = df_grouped.rename(columns={"trip_id": "num_trips"})
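
As a rough local illustration of the floor-and-count step, the sketch below applies the same chain of calls to a tiny in-memory pandas DataFrame. The three trips are made up; only the column names mirror the bikeshare table. It assumes plain pandas behaves like the BigFrames calls above for this small case.

```python
import pandas as pd

# Made-up trips; only the column names mirror the bikeshare table.
trips = pd.DataFrame(
    {
        "trip_id": [101, 102, 103],
        "start_date": pd.to_datetime(
            ["2018-01-01 08:05:00", "2018-01-01 08:40:00", "2018-01-01 10:15:00"]
        ),
    }
)

# Truncate each start time to the top of its hour.
trips["trip_hour"] = trips["start_date"].dt.floor("h")

# Count trips per hour and give the count column a descriptive name.
hourly = (
    trips[["trip_hour", "trip_id"]]
    .groupby("trip_hour")
    .count()
    .reset_index()
    .rename(columns={"trip_id": "num_trips"})
)

print(hourly)
```

Note that the 09:00 hour does not appear in the result: an hour with no trips produces no row rather than a row with a zero count, so an aggregated series can contain gaps.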

2. Forecasting with TimesFM

In this section, we use TimesFM (Time Series Foundation Model) to forecast bikeshare demand. Because TimesFM is a pretrained foundation model, the call below needs no separate training step. We hold out the last 168 hourly rows (7 days × 24 hours) of the series and ask the model to forecast that week from the earlier rows.

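# Forecast the last 168 hours from all earlier rows.
# 2842 is hard-coded; it is presumably the row count of df_grouped.
result = df_grouped.head(2842 - 168).ai.forecast(
    timestamp_column="trip_hour",
    data_column="num_trips",
    horizon=168,
)
result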

/usr/local/google/home/shuowei/src/python-bigquery-dataframes/bigframes/dataframe.py:5340: FutureWarning: The 'ai' property will be removed. Please use 'bigframes.bigquery.ai'
instead.
  warnings.warn(msg, category=FutureWarning)
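
The hold-out arithmetic behind `head(2842-168)` can be sketched in plain Python. This is a minimal illustration, assuming 2842 is the number of hourly rows in the series (the notebook does not state this). The helper name `split_holdout` is hypothetical and not part of BigFrames.

```python
def split_holdout(values, horizon):
    """Split an ordered series into (train, holdout); holdout is the last `horizon` points."""
    if not 0 < horizon < len(values):
        raise ValueError("horizon must be positive and shorter than the series")
    return values[:-horizon], values[-horizon:]


HOURS_PER_WEEK = 7 * 24  # 168 hourly steps, matching horizon=168

# Stand-in for the hourly series; 2842 is assumed to be its row count.
hourly_counts = list(range(2842))

train, holdout = split_holdout(hourly_counts, HOURS_PER_WEEK)
print(len(train), len(holdout))  # 2674 168
```

Deriving the training length from the actual series length, rather than from a literal, keeps the split correct if the underlying table changes.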

3. Forecasting with ARIMAPlus

Next, we use ARIMAPlus, a BigQuery ML model available through BigFrames. Unlike TimesFM, it must be fitted before it can forecast. We train it on the same historical rows and forecast the same 168-hour hold-out period, so the two forecasts can be compared directly.

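model = forecasting.ARIMAPlus(
    auto_arima_max_order=5,  # Upper bound on the auto-ARIMA order search; smaller values reduce runtime
    data_frequency="hourly",
    horizon=168,
)
# Train on the same rows as TimesFM: everything except the last 168 hours.
train = df_grouped.head(2842 - 168)
X = train[["trip_hour"]]
y = train[["num_trips"]]
model.fit(X, y)
# Forecast the 168 held-out hours at a 95% confidence level.
predictions = model.predict(horizon=168, confidence_level=0.95)
predictions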
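
The `confidence_level=0.95` argument controls how wide the interval around each forecast is. The source does not say how BigQuery ML computes its bounds, so the following is only a textbook-style sketch, assuming normally distributed forecast errors. The helper `normal_interval` and the numbers are made up for illustration.

```python
from statistics import NormalDist


def normal_interval(forecast_value, standard_error, confidence_level=0.95):
    """Two-sided interval around a forecast, assuming normally distributed errors."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Split the leftover probability evenly between the two tails.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    margin = z * standard_error
    return forecast_value - margin, forecast_value + margin


# Hypothetical forecast of 120 trips with a standard error of 10 trips.
lower, upper = normal_interval(120.0, 10.0, confidence_level=0.95)
print(round(lower, 1), round(upper, 1))  # 100.4 139.6
```

Raising the confidence level widens the interval, and lowering it narrows the interval.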

4. Compare and Visualize Forecasts

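Now we plot the forecasts from TimesFM and ARIMAPlus against the actual trip counts. The left merges keep every row of actual data, so each forecast column is null wherever that model produced no forecast. The plot shows the last 672 hourly rows (about four weeks), which include the held-out week, so the two forecasts can be compared visually with the observed values.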

timesfm_result = result.sort_values("forecast_timestamp")[["forecast_timestamp", "forecast_value"]]
timesfm_result = timesfm_result.rename(columns={
    "forecast_timestamp": "trip_hour",
    "forecast_value": "timesfm_forecast"
})
arimaplus_result = predictions.sort_values("forecast_timestamp")[["forecast_timestamp", "forecast_value"]]
arimaplus_result = arimaplus_result.rename(columns={
    "forecast_timestamp": "trip_hour",
    "forecast_value": "arimaplus_forecast"
})
df_all = df_grouped.merge(timesfm_result, on="trip_hour", how="left")
df_all = df_all.merge(arimaplus_result, on="trip_hour", how="left")
df_all.tail(672).plot.line(
    x="trip_hour",
    y=["num_trips", "timesfm_forecast", "arimaplus_forecast"],
    rot=45,
    title="Trip Forecasts Comparison"
)

[Figure: line plot titled "Trip Forecasts Comparison", with trip_hour on the x-axis and the series num_trips, timesfm_forecast, and arimaplus_forecast.]
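
A plot shows the shape of each forecast but gives no single number to compare. One common way to quantify the difference, which is not part of the original notebook, is to compute the mean absolute error (MAE) and root mean squared error (RMSE) over the hold-out window. The sketch below uses made-up values; `forecast_a` and `forecast_b` are hypothetical stand-ins for the two models' forecasts.

```python
import math


def _paired(actual, forecast):
    """Check that both sequences are non-empty and equally long, then pair them."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    return list(zip(actual, forecast))


def mae(actual, forecast):
    """Mean absolute error: average size of the errors, in trips."""
    pairs = _paired(actual, forecast)
    return sum(abs(a - f) for a, f in pairs) / len(pairs)


def rmse(actual, forecast):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    pairs = _paired(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in pairs) / len(pairs))


# Made-up hold-out values and two hypothetical forecasts.
actual = [10, 12, 8, 15]
forecast_a = [11, 12, 6, 14]
forecast_b = [13, 9, 8, 18]

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, mae(actual, forecast), round(rmse(actual, forecast), 3))
```

In the notebook's setting, the inputs would be the actual `num_trips` values and each forecast column, restricted to the rows where the forecast is not null.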

5. Multiple Time Series Forecasting

This section demonstrates another capability of ARIMAPlus: forecasting multiple time series in a single fit. This is useful when you have several series that you want to model in one call, such as trip counts from different bikeshare stations. Here we build daily trip counts for stations whose names contain Market, Powell, or Embarcadero. The id_col parameter is key, because it tells the model which series each row belongs to.

df_multi = bpd.read_gbq("bigquery-public-data.san_francisco_bikeshare.bikeshare_trips")
# Keep stations whose names match any of the three alternatives in the regex.
df_multi = df_multi[df_multi["start_station_name"].str.contains("Market|Powell|Embarcadero")]

# Create daily aggregation
features = bpd.DataFrame({
    "start_station_name": df_multi["start_station_name"],
    "date": df_multi["start_date"].dt.date,
})

# Group by station and date
num_trips = features.groupby(
    ["start_station_name", "date"], as_index=False
).size()
# Rename the size column to "num_trips"
num_trips = num_trips.rename(columns={num_trips.columns[-1]: "num_trips"})

# Check data quality
print(f"Number of stations: {num_trips['start_station_name'].nunique()}")
print(f"Date range: {num_trips['date'].min()} to {num_trips['date'].max()}")

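# Use daily frequency and a 30-day horizon for every station
model = forecasting.ARIMAPlus(
    data_frequency="daily",
    horizon=30,
    auto_arima_max_order=3,
    # Model the trend from the most recent 80% of each series, but never fewer than 10 points
    min_time_series_length=10,
    time_series_length_fraction=0.8,
)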

model.fit(
    num_trips[["date"]],
    num_trips[["num_trips"]],
    id_col=num_trips[["start_station_name"]]
)

# Request all 30 forecast steps explicitly, as the hourly example did with horizon=168.
predictions_multi = model.predict(horizon=30)
predictions_multi

/usr/local/google/home/shuowei/src/python-bigquery-dataframes/bigframes/core/log_adapter.py:182: TimeTravelCacheWarning: Reading cached table from 2025-12-12 23:04:48.874384+00:00 to avoid
incompatibilies with previous reads of this table. To read the latest
version, set `use_cache=False` or close the current session with
Session.close() or bigframes.pandas.close_session().
  return method(*args, **kwargs)

Number of stations: 41

Date range: 2013-08-29 to 2018-04-30
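
The role of an id column in long-format data can be sketched without BigFrames. The example below is conceptual only: it groups made-up rows into one time-ordered series per station, which is the layout idea behind `id_col`. How BigQuery ML handles the series internally is not described in the source, and the helper `split_by_id` and the station names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Made-up long-format rows: (station id, date, trip count).
rows = [
    ("Station A", date(2018, 1, 1), 40),
    ("Station B", date(2018, 1, 1), 12),
    ("Station A", date(2018, 1, 3), 42),
    ("Station B", date(2018, 1, 2), 15),
    ("Station A", date(2018, 1, 2), 35),
]


def split_by_id(rows):
    """Group (id, timestamp, value) rows into one time-ordered series per id."""
    series = defaultdict(list)
    for series_id, timestamp, value in rows:
        series[series_id].append((timestamp, value))
    # Sort each series by timestamp so its points are in time order.
    return {sid: sorted(points) for sid, points in series.items()}


per_station = split_by_id(rows)
for station, points in per_station.items():
    print(station, [value for _, value in points])
```

All stations share one table with the same timestamp and value columns. The id column is the only thing that separates one series from another.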
