BigQuery extension for pandas

BigQuery DataFrames provides a pandas extension to execute BigQuery SQL scalar functions directly on pandas DataFrames.

import pandas as pd
import bigframes  # This import registers the bigquery accessor.

By default, BigQuery DataFrames selects a processing location based on where the data is stored, but a pandas object carries no such information. If the processing location matters to you, configure it before using the accessor.

import bigframes.pandas as bpd

bpd.reset_session()
bpd.options.bigquery.location = "US"

Using sql_scalar

The bigquery.sql_scalar method applies a SQL scalar function to a pandas DataFrame: it converts the DataFrame to a BigQuery DataFrames (BigFrames) object, executes the SQL in BigQuery, and returns the result as a pandas Series.

df = pd.DataFrame({"a": [1.5, 2.5, 3.5]})
result = df.bigquery.sql_scalar("ROUND({0}, 0)")
result
Query processed 0 Bytes in a moment of slot time.
0    2.0
1    3.0
2    4.0
dtype: Float64
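
Note that 2.5 became 3.0 in the output above. BigQuery's ROUND breaks ties away from zero, whereas Python's built-in round (like pandas Series.round) breaks ties toward the nearest even value. The following is a minimal local sketch of that difference using only the standard library. The helper name round_half_away_from_zero is hypothetical, and the code only mimics the tie-breaking rule; it does not call BigQuery.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away_from_zero(x: float) -> float:
    """Hypothetical local stand-in for the tie-breaking rule of SQL ROUND(x, 0)."""
    # Decimal(str(x)) uses the short decimal repr of the float, so 2.5 stays 2.5.
    # ROUND_HALF_UP in the decimal module means "ties go away from zero".
    return float(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


values = [1.5, 2.5, 3.5]

# Python's built-in round() sends ties to the nearest even integer.
python_rounded = [float(round(v)) for v in values]

# The SQL-style rule sends ties away from zero.
sql_style_rounded = [round_half_away_from_zero(v) for v in values]

print(python_rounded)     # [2.0, 2.0, 4.0]
print(sql_style_rounded)  # [2.0, 3.0, 4.0]
```

If you compare results computed in BigQuery against results computed locally, this tie-breaking difference is one likely source of small mismatches.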

You can also reference multiple columns by name.

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
result = df.bigquery.sql_scalar("{a} + {b}")
result
Query processed 0 Bytes in a moment of slot time.
0    11
1    22
2    33
dtype: Int64
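
The two examples suggest a placeholder convention: {0} refers to a column by position and {a} refers to a column by name. As a rough mental model only, the substitution can be pictured with Python's str.format. The helper resolve_placeholders below is hypothetical and is not the library's implementation; quoting the column names with backticks is an assumption made for illustration.

```python
def resolve_placeholders(template: str, columns: list[str]) -> str:
    """Hypothetical illustration: fill positional and named column placeholders."""
    # Quote each column name as a SQL identifier (an assumption for this sketch).
    quoted = [f"`{name}`" for name in columns]
    # Allow lookup by position ({0}, {1}, ...) and by name ({a}, {b}, ...).
    by_name = dict(zip(columns, quoted))
    return template.format(*quoted, **by_name)


print(resolve_placeholders("ROUND({0}, 0)", ["a"]))   # ROUND(`a`, 0)
print(resolve_placeholders("{a} + {b}", ["a", "b"]))  # `a` + `b`
```

A sketch like this can help when debugging a template locally: a placeholder that names a missing column raises a KeyError here, before any query is sent.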