BigQuery extension for pandas
BigQuery DataFrames provides a pandas extension to execute BigQuery SQL scalar functions directly on pandas DataFrames.
import pandas as pd
import bigframes # This import registers the bigquery accessor.
By default, BigQuery DataFrames chooses a processing location based on where the data is stored, but a pandas object carries no such information. If the processing location matters to you, configure it before using the accessor.
import bigframes.pandas as bpd
bpd.reset_session()
bpd.options.bigquery.location = "US"
Using sql_scalar
The bigquery.sql_scalar method applies a SQL scalar expression to a pandas DataFrame: it converts the DataFrame to BigFrames, executes the SQL in BigQuery, and returns the result as a pandas Series.
df = pd.DataFrame({"a": [1.5, 2.5, 3.5]})
result = df.bigquery.sql_scalar("ROUND({0}, 0)")
result
Query processed 0 Bytes in a moment of slot time.
0 2.0
1 3.0
2 4.0
dtype: Float64
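Note that 2.5 became 3.0 in the output above: BigQuery's ROUND rounds halfway cases away from zero, whereas Python's built-in round() rounds them to the nearest even number. The following is a small local sketch of that difference using only the standard library; it does not call BigQuery, and the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

values = [1.5, 2.5, 3.5]

# Python's built-in round() uses round-half-to-even ("banker's rounding").
python_rounded = [float(round(v)) for v in values]

# decimal's ROUND_HALF_UP sends ties away from zero, which is the
# behavior seen in the ROUND output above.
half_away = [
    float(Decimal(str(v)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    for v in values
]

print(python_rounded)
print(half_away)
```

If you compare results computed in BigQuery with results computed locally, keep this difference in tie-breaking in mind.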
You can also reference multiple columns in the same expression.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
result = df.bigquery.sql_scalar("{a} + {b}")
result
Query processed 0 Bytes in a moment of slot time.
0 11
1 22
2 33
dtype: Int64
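The examples above use two placeholder styles: positional ({0}) and named ({a}, {b}). The sketch below illustrates how such placeholders could map to column references, assuming they are filled the way Python's str.format fills them. The render_sql helper is hypothetical and is not part of bigframes; the library's actual substitution logic may differ.

```python
def render_sql(template, columns):
    """Hypothetical helper: fill {0}-style and {name}-style placeholders
    with backtick-quoted column names. Not part of bigframes."""
    quoted = [f"`{col}`" for col in columns]
    by_name = dict(zip(columns, quoted))
    # str.format resolves {0}, {1}, ... from positional arguments
    # and {a}, {b}, ... from keyword arguments.
    return template.format(*quoted, **by_name)


columns = ["a", "b"]  # e.g. list(df.columns)

positional_sql = render_sql("ROUND({0}, 0)", columns)
named_sql = render_sql("{a} + {b}", columns)

print(positional_sql)
print(named_sql)
```

Under this assumption, {0} resolves to the first column and {a} to the column named a, so both styles can describe the same expression.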