# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Analyzing movie posters with BigQuery DataFrames AI functions#


BigQuery DataFrames provides a Pythonic way to use AI functions directly on your dataframes. In this notebook, you will use these functions to analyze classic movie posters. The posters are images stored in a public Google Cloud Storage bucket: gs://cloud-samples-data/vertex-ai/dataset-management/datasets/classic-movie-posters

Set up#

Before you begin, you need to set up the required permissions.

Once you have the permissions set up, import the bigframes.pandas package, and set your cloud project ID.

import bigframes.pandas as bpd

MY_PROJECT_ID = "bigframes-dev" # @param {type:"string"}

bpd.options.bigquery.project = MY_PROJECT_ID
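Hard-coding the project ID is fine for a demo, but a shared notebook often reads it from the environment instead. The sketch below shows one way to do that. The helper name `resolve_project_id` and the choice of the `GOOGLE_CLOUD_PROJECT` environment variable are assumptions made for illustration and are not part of the original notebook.

```python
import os


def resolve_project_id(default=None, env_var="GOOGLE_CLOUD_PROJECT"):
    """Return the project ID from the environment, falling back to `default`.

    Hypothetical helper: the name and the environment variable are
    assumptions made for this sketch.
    """
    project_id = os.environ.get(env_var, "").strip() or default
    if not project_id:
        raise ValueError(
            f"Set {env_var} or pass a default project ID explicitly."
        )
    return project_id


# Usage in the notebook (requires bigframes and Google Cloud credentials):
#   import bigframes.pandas as bpd
#   bpd.options.bigquery.project = resolve_project_id(default="bigframes-dev")
```

Failing early with a clear error when no project ID can be found tends to be easier to debug than a later authentication or billing error.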

Load data#

First, load the data from the GCS bucket into a BigQuery DataFrame with the from_glob_path method:

# Replace with your own connection name.
MY_CONNECTION = 'bigframes-default-connection' # @param {type:"string"}

movies = bpd.from_glob_path(
    "gs://cloud-samples-data/vertex-ai/dataset-management/datasets/classic-movie-posters/*",
    connection=MY_CONNECTION,
    name='poster')
movies.head(1)
/usr/local/lib/python3.12/dist-packages/bigframes/core/global_session.py:113: DefaultLocationWarning: No explicit location is set, so using location US for the session.
  _global_session = bigframes.session.connect(
Query processed 0 Bytes in a moment of slot time. [Job bigframes-dev:US.48a27954-7a4a-4b9e-8176-ea227fd188ad details]
/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())
instead of using `db_dtypes` in the future when available in pandas
(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.
  warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)
/usr/local/lib/python3.12/dist-packages/bigframes/core/logging/log_adapter.py:229: ApiDeprecationWarning: The blob accessor is deprecated and will be removed in a future release. Use bigframes.bigquery.obj functions instead.
  return prop(*args, **kwargs)
Query processed 1.3 kB in a minute of slot time. [Job bigframes-dev:US.09c48ecb-e041-4c18-a390-ca5a36fd07c3 details]
Query processed 1.2 kB in a moment of slot time.
poster
0

[1 rows x 1 columns in total]

Extract titles from posters#

import bigframes.bigquery as bbq

movies['title'] = bbq.ai.generate(
    ("What is the movie title for this poster? Name only", movies['poster']),
    endpoint='gemini-2.5-pro'
).struct.field("result")
movies.head(1)
poster title
0 Der Student von Prag

[1 rows x 2 columns in total]

Notice that ai.generate() returns a struct, which holds not only the LLM response but also the status. If you do not provide a field name for your answer, "result" is used as the default name. You can access the LLM response content with the struct accessor (e.g. my_response.struct.field("result")).
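To make the shape of that return value concrete, here is a plain-Python sketch that models each row's struct as a dictionary. It does not call BigQuery. The sample rows are hypothetical, the field name `status` is assumed, and the convention that an empty status means success is an assumption to verify against the AI function documentation.

```python
# Each dict stands in for one row of the struct returned by ai.generate().
# The rows are made-up sample data, not real model output.
responses = [
    {"result": "Der Student von Prag", "status": ""},
    {"result": None, "status": "example error message"},
]


def field(rows, name):
    """Pull one named field out of every struct, like .struct.field(name)."""
    return [row[name] for row in rows]


def successful_results(rows):
    """Keep only results whose status is empty (assumed to mean success)."""
    return [row["result"] for row in rows if not row["status"]]


print(field(responses, "result"))
print(successful_results(responses))
```

In the notebook itself, the counterpart of `field(responses, "result")` is the struct accessor call on the returned series. Inspecting the status field the same way can help spot rows where the model call did not succeed.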

Get movie release year#

In the example below, you will use ai.generate_int() to find the release year for each movie poster:

movies['year'] = bbq.ai.generate_int(
    ("What is the release year for this movie?", movies['title']),
    endpoint='gemini-2.5-pro'
).struct.field("result")

movies.head(1)
poster title year
0 Der Student von Prag 1913

[1 rows x 3 columns in total]
movies.dtypes
poster    struct<uri: string, version: string, authorize...
title     string[pyarrow]
year      Int64
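Because the year comes from a model, a quick sanity check can catch implausible or missing values before further analysis. The following is a minimal sketch in plain Python. The helper name `is_plausible_year` and the lower bound of 1888 are assumptions chosen for illustration, not something the notebook prescribes.

```python
import datetime


def is_plausible_year(year, earliest=1888, latest=None):
    """Return True if `year` is an integer within [earliest, latest].

    Hypothetical helper. `None` models a missing (NULL) answer, which a
    nullable Int64 column can hold. `latest` defaults to the current year.
    """
    if year is None:
        return False
    if latest is None:
        latest = datetime.date.today().year
    return earliest <= int(year) <= latest


# Sample values: the first is the year shown in the output above,
# the others are made up to exercise the check.
for candidate in (1913, None, 3020):
    print(candidate, is_plausible_year(candidate))
```

A check like this does not prove an answer is correct. It only flags values that cannot be right, which is often enough to decide which rows deserve a second look.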

Filter movies by production country#

In the next example, you will use ai.if_() to find the movies that were produced in the USA.

us_movies = movies[bbq.ai.if_(
    ("The movie ", movies['title'], " was made in US")
)]
us_movies.head(1)
poster title year
8 Shoulder Arms 1918

[1 rows x 3 columns in total]
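The prompt passed to ai.if_() is a tuple that mixes literal strings with a column, so each row gets its own statement to judge. The sketch below imitates that idea in plain Python to show what the per-row prompts look like and how a boolean answer acts as a filter. It is an illustration only, not how bigframes is implemented: `render_prompts`, `filter_rows`, the `("col", name)` marker format, and the stand-in judge function are all hypothetical.

```python
def render_prompts(parts, rows):
    """Build one prompt per row from literal strings and column markers.

    `parts` mixes literals with ("col", name) markers. This marker format is
    invented for the sketch and is not bigframes syntax.
    """
    prompts = []
    for row in rows:
        pieces = [
            str(row[part[1]]) if isinstance(part, tuple) else part
            for part in parts
        ]
        prompts.append("".join(pieces))
    return prompts


def filter_rows(rows, prompts, judge):
    """Keep the rows whose prompt the judge function answers True for."""
    return [row for row, prompt in zip(rows, prompts) if judge(prompt)]


def fake_judge(prompt):
    """Stand-in for the model: a fixed lookup instead of an LLM call."""
    return "Shoulder Arms" in prompt


# Titles and years are taken from the outputs shown earlier in this notebook.
rows = [
    {"title": "Der Student von Prag", "year": 1913},
    {"title": "Shoulder Arms", "year": 1918},
]
parts = ("The movie ", ("col", "title"), " was made in US")
prompts = render_prompts(parts, rows)

print(prompts[1])
print(filter_rows(rows, prompts, fake_judge))
```

Seeing the rendered prompts makes it easier to notice wording problems, such as a missing space between a literal and a column value, before spending model calls on them.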