# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Struct Data Types

In BigQuery, a STRUCT (also known as a record) is an ordered collection of fields, each with a required data type and an optional field name. BigQuery DataFrames maps BigQuery STRUCT types to the pandas equivalent, pandas.ArrowDtype(pa.struct()).

This notebook illustrates how to work with STRUCT columns in BigQuery DataFrames. First, let’s import the required packages and perform the necessary setup below.

import bigframes.pandas as bpd
import bigframes.bigquery as bbq
import pandas as pd
import pyarrow as pa
REGION = "US"  # @param {type: "string"}

bpd.options.display.progress_bar = None
bpd.options.bigquery.location = REGION

Create DataFrames with struct columns

Example 1: Creating from a list of objects

names = ["Alice", "Bob", "Charlie"]
addresses = [
    {'City': 'New York', 'State': 'NY'},
    {'City': 'San Francisco', 'State': 'CA'},
    {'City': 'Seattle', 'State': 'WA'}
]
df = bpd.DataFrame({'Name': names, 'Address': addresses})
df
      Name                                   Address
0    Alice       {'City': 'New York', 'State': 'NY'}
1      Bob  {'City': 'San Francisco', 'State': 'CA'}
2  Charlie        {'City': 'Seattle', 'State': 'WA'}

[3 rows x 2 columns in total]

df.dtypes
Name                                    string[pyarrow]
Address    struct<City: string, State: string>[pyarrow]
dtype: object

Example 2: Defining schema explicitly

bpd.Series(
    data=addresses, 
    dtype=bpd.ArrowDtype(pa.struct([('City', pa.string()), ('State', pa.string())]))
)
0         {'City': 'New York', 'State': 'NY'}
1    {'City': 'San Francisco', 'State': 'CA'}
2          {'City': 'Seattle', 'State': 'WA'}
dtype: struct<City: string, State: string>[pyarrow]

Example 3: Reading from a source

bpd.read_gbq("bigquery-public-data.ml_datasets.credit_card_default", max_results=5)["predicted_default_payment_next_month"]
0    [{'tables': {'score': 0.8667634129524231, 'val...
1    [{'tables': {'score': 0.9351968765258789, 'val...
2    [{'tables': {'score': 0.8572560548782349, 'val...
3    [{'tables': {'score': 0.9690881371498108, 'val...
4    [{'tables': {'score': 0.9349926710128784, 'val...
Name: predicted_default_payment_next_month, dtype: list<item: struct<tables: struct<score: double, value: string>>>[pyarrow]

Operate on STRUCT data

BigQuery DataFrames provides three main approaches for operating on STRUCT data:

  1. The Series.struct accessor: Provides pandas-like methods for manipulating a single STRUCT column.

  2. The DataFrame.struct accessor: Provides pandas-like methods for manipulating the STRUCT columns of a DataFrame.

  3. BigQuery built-in functions: Lets you use functions that mirror BigQuery SQL operations, such as struct, available through the bigframes.bigquery module (abbreviated as bbq in this notebook).

View Data Types of Struct Fields

df['Address'].struct.dtypes
City     string[pyarrow]
State    string[pyarrow]
dtype: object

Access a Struct Field by Name

df['Address'].struct.field("City")
0         New York
1    San Francisco
2          Seattle
Name: City, dtype: string

Extract Struct Fields into a DataFrame

Example 1: Using the Series.struct accessor

df['Address'].struct.explode()
            City  State
0       New York     NY
1  San Francisco     CA
2        Seattle     WA

[3 rows x 2 columns in total]

Example 2: Using the DataFrame.struct accessor, which keeps the other columns

df.struct.explode("Address")
      Name   Address.City  Address.State
0    Alice       New York             NY
1      Bob  San Francisco             CA
2  Charlie        Seattle             WA

[3 rows x 3 columns in total]