{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2025 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Format LLM output using an output schema\n",
    "\n",
    "<table align=\"left\">\n",
    "\n",
    "  <td>\n",
    "    <a href=\"https://colab.research.google.com/github/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb\">\n",
    "      <img src=\"https://raw.githubusercontent.com/googleapis/python-bigquery-dataframes/refs/heads/main/third_party/logo/colab-logo.png\" alt=\"Colab logo\"> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://github.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb\">\n",
    "      <img src=\"https://raw.githubusercontent.com/googleapis/python-bigquery-dataframes/refs/heads/main/third_party/logo/github-logo.png\" width=\"32\" alt=\"GitHub logo\">\n",
    "      View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://console.cloud.google.com/bigquery/import?url=https://github.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb\">\n",
    "      <img src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTW1gvOovVlbZAIZylUtf5Iu8-693qS1w5NJw&s\" alt=\"BQ logo\" width=\"35\">\n",
    "      Open in BigQuery Studio\n",
    "    </a>\n",
    "  </td>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook shows you how to create structured LLM output by specifying an output schema when generating predictions with a Gemini model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Costs\n",
    "\n",
    "This tutorial uses billable components of Google Cloud:\n",
    "\n",
    "* BigQuery (compute)\n",
    "* BigQuery ML\n",
    "* Generative AI support on Vertex AI\n",
    "\n",
    "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models), [Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing),\n",
    "and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#section-11),\n",
    "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n",
    "to generate a cost estimate based on your projected usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before you begin\n",
    "\n",
    "Complete the tasks in this section to set up your environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up your Google Cloud project\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n",
    "\n",
    "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
    "\n",
    "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com) to enable the following APIs:\n",
    "\n",
    "  * BigQuery API\n",
    "  * BigQuery Connection API\n",
    "  * Vertex AI API\n",
    "\n",
    "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Authenticate your Google Cloud account\n",
    "\n",
    "Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below.\n",
    "\n",
    "**BigQuery Studio** or **Vertex AI Workbench**\n",
    "\n",
    "Do nothing, you are already authenticated.\n",
    "\n",
    "**Local JupyterLab instance**\n",
    "\n",
    "Uncomment and run the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ! gcloud auth login"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab**\n",
    "\n",
    "Uncomment and run the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from google.colab import auth\n",
    "# auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up your project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set your project and import necessary modules. If you don't know your project ID, see [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"\" # replace with your project\n",
    "import bigframes\n",
    "bigframes.options.bigquery.project = PROJECT\n",
    "bigframes.options.display.progress_bar = None\n",
    "\n",
    "import bigframes.pandas as bpd\n",
    "from bigframes.ml import llm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a DataFrame and a Gemini model\n",
    "Create a simple [DataFrame](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame) of several cities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/global_session.py:103: DefaultLocationWarning: No explicit location is set, so using location US for the session.\n",
      "  _global_session = bigframes.session.connect(\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 1 columns</p>\n",
       "</div>[3 rows x 1 columns in total]"
      ],
      "text/plain": [
       "       city\n",
       "0   Seattle\n",
       "1  New York\n",
       "2  Shanghai\n",
       "\n",
       "[3 rows x 1 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = bpd.DataFrame({\"city\": [\"Seattle\", \"New York\", \"Shanghai\"]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect to a Gemini model using the [`GeminiTextGenerator` class](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/log_adapter.py:175: FutureWarning: Since upgrading the default model can cause unintended breakages, the\n",
      "default model will be removed in BigFrames 3.0. Please supply an\n",
      "explicit model to avoid this message.\n",
      "  return method(*args, **kwargs)\n"
     ]
    }
   ],
   "source": [
    "gemini = llm.GeminiTextGenerator()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate structured output data\n",
    "Previously, LLMs could only generate text output. For example, you could generate output that identifies whether a given city is a US city:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>ml_generate_text_llm_result</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Yes, Seattle is a city in the United States. I...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>Yes, New York City is a city in the United Sta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>No, Shanghai is not a US city. It is a major c...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2 columns</p>\n",
       "</div>[3 rows x 2 columns in total]"
      ],
      "text/plain": [
       "       city                        ml_generate_text_llm_result\n",
       "0   Seattle  Yes, Seattle is a city in the United States. I...\n",
       "1  New York  Yes, New York City is a city in the United Sta...\n",
       "2  Shanghai  No, Shanghai is not a US city. It is a major c...\n",
       "\n",
       "[3 rows x 2 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[df[\"city\"], \"is a US city?\"])\n",
    "result[[\"city\", \"ml_generate_text_llm_result\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output is text that a human can read. However, if you want the output to be more useful for analysis, it is better to format the output as structured data. This is especially true when you want to have Boolean, integer, or float values to work with instead of string values. Previously, formatting the output in this way wasn't easy.\n",
    "\n",
    "Now, you can get structured output out-of-the-box by specifying the `output_schema` parameter when calling the Gemini model's [`predict` method](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator#bigframes_ml_llm_GeminiTextGenerator_predict). In the following example, the model output is formatted as Boolean values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>is_us_city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2 columns</p>\n",
       "</div>[3 rows x 2 columns in total]"
      ],
      "text/plain": [
       "       city  is_us_city\n",
       "0   Seattle        True\n",
       "1  New York        True\n",
       "2  Shanghai       False\n",
       "\n",
       "[3 rows x 2 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[df[\"city\"], \"is a US city?\"], output_schema={\"is_us_city\": \"bool\"})\n",
    "result[[\"city\", \"is_us_city\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also format model output as float or integer values. In the following example, the model output is formatted as float values to show the city's population in millions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>population_in_millions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>19.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>26.32</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2 columns</p>\n",
       "</div>[3 rows x 2 columns in total]"
      ],
      "text/plain": [
       "       city  population_in_millions\n",
       "0   Seattle                0.75\n",
       "1  New York               19.68\n",
       "2  Shanghai               26.32\n",
       "\n",
       "[3 rows x 2 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[\"what is the population in millions of\", df[\"city\"]], output_schema={\"population_in_millions\": \"float64\"})\n",
    "result[[\"city\", \"population_in_millions\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following example, the model output is formatted as integer values to show the count of the city's rainy days:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>rainy_days</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>123</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2 columns</p>\n",
       "</div>[3 rows x 2 columns in total]"
      ],
      "text/plain": [
       "       city  rainy_days\n",
       "0   Seattle         152\n",
       "1  New York         123\n",
       "2  Shanghai         123\n",
       "\n",
       "[3 rows x 2 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[\"how many rainy days per year in\", df[\"city\"]], output_schema={\"rainy_days\": \"int64\"})\n",
    "result[[\"city\", \"rainy_days\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Format output as multiple data types in one prediction\n",
    "Within a single prediction, you can generate multiple columns of output that use different data types. \n",
    "\n",
    "The input doesn't have to be dedicated prompts as long as the output column names are informative to the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>is_US_city</th>\n",
       "      <th>population_in_millions</th>\n",
       "      <th>rainy_days_per_year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>True</td>\n",
       "      <td>0.75</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>True</td>\n",
       "      <td>8.8</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>False</td>\n",
       "      <td>26.32</td>\n",
       "      <td>115</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 4 columns</p>\n",
       "</div>[3 rows x 4 columns in total]"
      ],
      "text/plain": [
       "       city  is_US_city  population_in_millions  rainy_days_per_year\n",
       "0   Seattle        True                    0.75                  152\n",
       "1  New York        True                     8.8                  121\n",
       "2  Shanghai       False                   26.32                  115\n",
       "\n",
       "[3 rows x 4 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[df[\"city\"]], output_schema={\"is_US_city\": \"bool\", \"population_in_millions\": \"float64\", \"rainy_days_per_year\": \"int64\"})\n",
    "result[[\"city\", \"is_US_city\", \"population_in_millions\", \"rainy_days_per_year\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Format output as a composite data type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can generate composite data types like arrays and structs. The following example generates a `places_to_visit` column as an array of strings and a `gps_coordinates` column as a struct of floats:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/array_value.py:109: PreviewWarning: JSON column interpretation as a custom PyArrow extention in\n",
      "`db_dtypes` is a preview feature and subject to change.\n",
      "  warnings.warn(msg, bfe.PreviewWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>is_US_city</th>\n",
       "      <th>population_in_millions</th>\n",
       "      <th>rainy_days_per_year</th>\n",
       "      <th>places_to_visit</th>\n",
       "      <th>gps_coordinates</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>True</td>\n",
       "      <td>0.74</td>\n",
       "      <td>150</td>\n",
       "      <td>['Space Needle' 'Pike Place Market' 'Museum of...</td>\n",
       "      <td>{'latitude': 47.6062, 'longitude': -122.3321}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>True</td>\n",
       "      <td>8.4</td>\n",
       "      <td>121</td>\n",
       "      <td>['Times Square' 'Central Park' 'Statue of Libe...</td>\n",
       "      <td>{'latitude': 40.7128, 'longitude': -74.006}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Shanghai</td>\n",
       "      <td>False</td>\n",
       "      <td>26.32</td>\n",
       "      <td>115</td>\n",
       "      <td>['The Bund' 'Yu Garden' 'Shanghai Museum' 'Ori...</td>\n",
       "      <td>{'latitude': 31.2304, 'longitude': 121.4737}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 6 columns</p>\n",
       "</div>[3 rows x 6 columns in total]"
      ],
      "text/plain": [
       "       city  is_US_city  population_in_millions  rainy_days_per_year  \\\n",
       "0   Seattle        True                    0.74                  150   \n",
       "1  New York        True                     8.4                  121   \n",
       "2  Shanghai       False                   26.32                  115   \n",
       "\n",
       "                                     places_to_visit  \\\n",
       "0  ['Space Needle' 'Pike Place Market' 'Museum of...   \n",
       "1  ['Times Square' 'Central Park' 'Statue of Libe...   \n",
       "2  ['The Bund' 'Yu Garden' 'Shanghai Museum' 'Ori...   \n",
       "\n",
       "                                 gps_coordinates  \n",
       "0  {'latitude': 47.6062, 'longitude': -122.3321}  \n",
       "1    {'latitude': 40.7128, 'longitude': -74.006}  \n",
       "2   {'latitude': 31.2304, 'longitude': 121.4737}  \n",
       "\n",
       "[3 rows x 6 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = gemini.predict(df, prompt=[df[\"city\"]], output_schema={\"is_US_city\": \"bool\", \"population_in_millions\": \"float64\", \"rainy_days_per_year\": \"int64\", \"places_to_visit\": \"array<string>\", \"gps_coordinates\": \"struct<latitude float64, longitude float64>\"})\n",
    "result[[\"city\", \"is_US_city\", \"population_in_millions\", \"rainy_days_per_year\", \"places_to_visit\", \"gps_coordinates\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up\n",
    "\n",
    "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
    "\n",
    "Otherwise, run the following cell to delete the temporary cloud artifacts created during the BigFrames session:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bpd.close_session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}