{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "XZpKUoHjXw3_"
},
"outputs": [],
"source": [
"# Copyright 2026 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SEKzWP6jW9Oj"
},
"source": [
"# Analyzing movie posters with BigQuery DataFrames AI functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c9CCKXG5XTb-"
},
"source": [
"BigQuery DataFrames provides a Pythonic way to use AI functions directly on your DataFrames. In this notebook, you use these functions to analyze classic\n",
"movie posters. The posters are images stored in a public Cloud Storage bucket: `gs://cloud-samples-data/vertex-ai/dataset-management/datasets/classic-movie-posters`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CUJDa_7MPbL9"
},
"source": [
"## Set up"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D3iYtBSkYpCK"
},
"source": [
"Before you begin, you need to:\n",
"\n",
"* Set up your permissions for generative AI functions by following [these instructions](https://docs.cloud.google.com/bigquery/docs/permissions-for-ai-functions)\n",
"* Set up your Cloud Resource connection by following [these instructions](https://docs.cloud.google.com/bigquery/docs/create-cloud-resource-connection)\n",
"\n",
"Once the permissions and the connection are set up, import the `bigframes.pandas` package and\n",
"set your Google Cloud project ID."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6nqoRHYbPAx3"
},
"outputs": [],
"source": [
"import bigframes.pandas as bpd\n",
"\n",
"# Replace with your own project ID.\n",
"MY_PROJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n",
"\n",
"bpd.options.bigquery.project = MY_PROJECT_ID"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2XHcNHtvPhNW"
},
"source": [
"## Load data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eS-9A7DijfoQ"
},
"source": [
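"First, load the poster images from the Cloud Storage bucket into a BigQuery DataFrames DataFrame with the `from_glob_path` function. The `name` argument sets the name of the resulting column, here `poster`:"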
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "ZNPzFjCyPap0",
"outputId": "346d20b2-d615-4094-d24e-2d40e5c90ee2"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/bigframes/core/global_session.py:113: DefaultLocationWarning: No explicit location is set, so using location US for the session.\n",
" _global_session = bigframes.session.connect(\n"
]
},
{
"data": {
"text/html": [
"\n",
" Query processed 0 Bytes in a moment of slot time. [Job bigframes-dev:US.48a27954-7a4a-4b9e-8176-ea227fd188ad details]\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
"instead of using `db_dtypes` in the future when available in pandas\n",
"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n",
"/usr/local/lib/python3.12/dist-packages/bigframes/core/logging/log_adapter.py:229: ApiDeprecationWarning: The blob accessor is deprecated and will be removed in a future release. Use bigframes.bigquery.obj functions instead.\n",
" return prop(*args, **kwargs)\n"
]
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.3 kB in a minute of slot time. [Job bigframes-dev:US.09c48ecb-e041-4c18-a390-ca5a36fd07c3 details]\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.2 kB in a moment of slot time.\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
" poster\n",
"0 {\"access_urls\":{\"expiry_time\":\"2026-03-27T02:0...\n",
"\n",
"[1 rows x 1 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace with your own connection name.\n",
"MY_CONNECTION = 'bigframes-default-connection' # @param {type:\"string\"}\n",
"\n",
"# Read every object that matches the wildcard path into a column named 'poster'.\n",
"movies = bpd.from_glob_path(\n",
"    \"gs://cloud-samples-data/vertex-ai/dataset-management/datasets/classic-movie-posters/*\",\n",
"    connection=MY_CONNECTION,\n",
"    name='poster',\n",
")\n",
"movies.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EfkdDH08QnYw"
},
"source": [
"## Extract titles from posters"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "6CoZZ5tSQm1r",
"outputId": "1b3915ce-eb83-4be9-b1c1-d9a326dc9408"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
"instead of using `db_dtypes` in the future when available in pandas\n",
"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n",
"/usr/local/lib/python3.12/dist-packages/bigframes/core/logging/log_adapter.py:229: ApiDeprecationWarning: The blob accessor is deprecated and will be removed in a future release. Use bigframes.bigquery.obj functions instead.\n",
" return prop(*args, **kwargs)\n",
"/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
"instead of using `db_dtypes` in the future when available in pandas\n",
"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n",
"/usr/local/lib/python3.12/dist-packages/bigframes/core/logging/log_adapter.py:229: ApiDeprecationWarning: The blob accessor is deprecated and will be removed in a future release. Use bigframes.bigquery.obj functions instead.\n",
" return prop(*args, **kwargs)\n"
]
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.3 kB in 2 minutes of slot time. [Job bigframes-dev:US.4a08a15f-5a2f-463b-bba8-734858ec992b details]\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.2 kB in a moment of slot time.\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
" poster title\n",
"0 {\"access_urls\":{\"expiry_time\":\"2026-03-27T02:0... Der Student von Prag\n",
"\n",
"[1 rows x 2 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
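"import bigframes.bigquery as bbq\n",
"\n",
"# Ask Gemini to read the title from each poster image, then keep only the\n",
"# text of the answer from the returned struct.\n",
"movies['title'] = bbq.ai.generate(\n",
"    (\"What is the movie title for this poster? Name only\", movies['poster']),\n",
"    endpoint='gemini-2.5-pro',\n",
").struct.field(\"result\")\n",
"movies.head(1)"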
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cFQHQ9S2lr6t"
},
"source": [
"Notice that `ai.generate()` returns a `struct`, which holds not only the LLM response but also the status of the request. If you do not provide a field name for your answer, the response is stored in the default field `\"result\"`. You can access the response content with the struct accessor, for example `my_response.struct.field(\"result\")`."
]
},
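{
"cell_type": "markdown",
"metadata": {},
"source": [
"The struct accessor can be explored locally without calling a model. The next cell is a hedged sketch that builds a small pandas Series shaped like an `ai.generate()` result column. It is plain pandas, not the BigQuery DataFrames API, and it assumes:\n",
"\n",
"* pandas 2.2 or later with pyarrow installed, which provides `Series.struct.field`;\n",
"* a struct with a string `\"result\"` field and a `\"status\"` field that is empty when a request succeeds (the real struct can hold more fields);\n",
"* placeholder values instead of real model output.\n",
"\n",
"Selecting `\"result\"` and checking `\"status\"` in this way should carry over to the `bbq.ai.generate()` output column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyarrow as pa\n",
"\n",
"# Hypothetical local stand-in for an ai.generate() result column: a struct with a\n",
"# \"result\" field (the model's text) and a \"status\" field, assumed empty on success.\n",
"# The values are placeholders, not real model output.\n",
"response_type = pa.struct([(\"result\", pa.string()), (\"status\", pa.string())])\n",
"responses = pd.Series(\n",
"    [\n",
"        {\"result\": \"Placeholder title\", \"status\": \"\"},\n",
"        {\"result\": None, \"status\": \"placeholder error message\"},\n",
"    ],\n",
"    dtype=pd.ArrowDtype(response_type),\n",
")\n",
"\n",
"# Pull individual fields out of the struct, as the notebook does with \"result\".\n",
"titles = responses.struct.field(\"result\")\n",
"statuses = responses.struct.field(\"status\")\n",
"\n",
"# Rows with a non-empty status are the ones whose request failed.\n",
"failed = responses[statuses != \"\"]\n",
"print(titles)\n",
"print(f\"{len(failed)} of {len(responses)} requests failed\")"
]
},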
{
"cell_type": "markdown",
"metadata": {
"id": "R8kkUhgoS5Xz"
},
"source": [
"## Get movie release year\n",
"\n",
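"In the example below, you use `ai.generate_int()` to look up each movie's release year from its extracted title. Like `ai.generate()`, it returns a struct, but its `\"result\"` field holds an integer instead of a string:"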
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 976
},
"id": "cKZdHq0XS1iW",
"outputId": "72cbad57-4518-4e1e-97bb-333d424dba73"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
"instead of using `db_dtypes` in the future when available in pandas\n",
"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n",
"/usr/local/lib/python3.12/dist-packages/bigframes/core/logging/log_adapter.py:229: ApiDeprecationWarning: The blob accessor is deprecated and will be removed in a future release. Use bigframes.bigquery.obj functions instead.\n",
" return prop(*args, **kwargs)\n"
]
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.3 kB in 4 minutes of slot time. [Job bigframes-dev:US.b60a151a-6cbc-405e-9c40-8a7461981a00 details]\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" Query processed 1.3 kB in a moment of slot time.\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
" poster title \\\n",
"0 {\"access_urls\":{\"expiry_time\":\"2026-03-27T02:0... Der Student von Prag \n",
"\n",
" year \n",
"0 1913 \n",
"\n",
"[1 rows x 3 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
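"# Look up each movie's release year from its extracted title and keep\n",
"# the integer \"result\" field of the returned struct.\n",
"movies['year'] = bbq.ai.generate_int(\n",
"    (\"What is the release year for this movie?\", movies['title']),\n",
"    endpoint='gemini-2.5-pro',\n",
").struct.field(\"result\")\n",
"\n",
"movies.head(1)"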
]
},
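{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the `\"result\"` field returned by `ai.generate_int()` is an integer, the `year` column can be filtered and used in arithmetic without parsing strings. The next cell sketches this locally in plain pandas rather than BigQuery DataFrames, assuming pandas 2.2 or later with pyarrow, a struct with an int64 `\"result\"` field and a `\"status\"` field, and placeholder years rather than model output. The decade and `before_1920` calculations are illustrative additions, not part of the original workflow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyarrow as pa\n",
"\n",
"# Hypothetical local stand-in for an ai.generate_int() result column, assuming an\n",
"# int64 \"result\" field and a \"status\" field. The years are placeholder values.\n",
"year_type = pa.struct([(\"result\", pa.int64()), (\"status\", pa.string())])\n",
"year_responses = pd.Series(\n",
"    [\n",
"        {\"result\": 1915, \"status\": \"\"},\n",
"        {\"result\": 1925, \"status\": \"\"},\n",
"        {\"result\": None, \"status\": \"placeholder error message\"},\n",
"    ],\n",
"    dtype=pd.ArrowDtype(year_type),\n",
")\n",
"\n",
"years = year_responses.struct.field(\"result\")\n",
"\n",
"# The field is numeric, so arithmetic works directly; a missing year (failed\n",
"# request) stays missing in the result instead of raising an error.\n",
"decades = (years // 10) * 10\n",
"\n",
"# Drop missing years before filtering so the boolean mask has no nulls.\n",
"known_years = years.dropna()\n",
"before_1920 = known_years[known_years < 1920]\n",
"print(decades)\n",
"print(before_1920)"
]
},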
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 250
},
"id": "yqRiNRY8_8fs",
"outputId": "efa60107-6883-4f5c-8e40-43c7287ea7fb"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/bigframes/dtypes.py:1010: JSONDtypeWarning: JSON columns will be represented as pandas.ArrowDtype(pyarrow.json_())\n",
"instead of using `db_dtypes` in the future when available in pandas\n",
"(https://github.com/pandas-dev/pandas/issues/60958) and pyarrow.\n",
" warnings.warn(msg, bigframes.exceptions.JSONDtypeWarning)\n"
]
},
{
"data": {
"text/html": [
"