{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "id": "ur8xi4C7S06n" }, "outputs": [], "source": [ "# Copyright 2022 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "JAPoU8Sm5E6e" }, "source": [ "# Use BigQuery DataFrames with Generative AI for code generation", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \"Colab Run in Colab\n", " \n", " \n", " \n", " \"GitHub\n", " View on GitHub\n", " \n", " \n", " \n", " \"Vertex\n", " Open in Vertex AI Workbench\n", " \n", " \n", " \n", " \"BQ\n", " Open in BQ Studio\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "24743cf4a1e1" }, "source": [ "**_NOTE_**: This notebook has been tested in the following environment:\n", "\n", "* Python version = 3.10" ] }, { "cell_type": "markdown", "metadata": { "id": "tvgnzT1CKxrO" }, "source": [ "## Overview\n", "\n", "Use this notebook to walk through an example use case of generating sample code by using BigQuery DataFrames and its integration with Generative AI support on Vertex AI.\n", "\n", "Learn more about [BigQuery DataFrames](https://cloud.google.com/python/docs/reference/bigframes/latest)." ] }, { "cell_type": "markdown", "metadata": { "id": "d975e698c9a4" }, "source": [ "### Objective\n", "\n", "In this tutorial, you create a CSV file containing sample code for calling a given set of APIs.\n", "\n", "The steps include:\n", "\n", "- Defining an LLM model in BigQuery DataFrames, specifically the [Gemini Model](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models), using `bigframes.ml.llm`.\n", "- Creating a DataFrame by reading in data from Cloud Storage.\n", "- Manipulating data in the DataFrame to build LLM prompts.\n", "- Sending DataFrame prompts to the LLM model using the `predict` method.\n", "- Creating and using a custom function to transform the output provided by the LLM model response.\n", "- Exporting the resulting transformed DataFrame as a CSV file." ] }, { "cell_type": "markdown", "metadata": { "id": "08d289fa873f" }, "source": [ "### Dataset\n", "\n", "This tutorial uses a dataset listing the names of various pandas DataFrame and Series APIs." 
] }, { "cell_type": "markdown", "metadata": { "id": "aed92deeb4a0" }, "source": [ "### Costs\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* BigQuery\n", "* Generative AI support on Vertex AI\n", "* Cloud Functions\n", "\n", "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models),\n", "[Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing#generative_ai_models), and [Cloud Functions pricing](https://cloud.google.com/functions/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "i7EUnXsZhAGF" }, "source": [ "## Installation\n", "\n", "Install the following packages, which are required to run this notebook:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "2b4ef9b72d43" }, "outputs": [], "source": [ "!pip install bigframes --upgrade --quiet" ] }, { "cell_type": "markdown", "metadata": { "id": "BF1j6f9HApxa" }, "source": [ "## Before you begin\n", "\n", "Complete the tasks in this section to set up your environment." ] }, { "cell_type": "markdown", "metadata": { "id": "Wbr2aVtFQBcg" }, "source": [ "### Set up your Google Cloud project\n", "\n", "**The following steps are required, regardless of your notebook environment.**\n", "\n", "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n", "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", "3. 
[Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", "\n", " * BigQuery API\n", " * BigQuery Connection API\n", " * Cloud Functions API\n", " * Cloud Run API\n", " * Artifact Registry API\n", " * Cloud Build API\n", " * Cloud Resource Manager API\n", " * Vertex AI API\n", "\n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." ] }, { "cell_type": "markdown", "metadata": { "id": "WReHDGG5g0XY" }, "source": [ "#### Set your project ID\n", "\n", "If you don't know your project ID, try the following:\n", "* Run `gcloud config list`.\n", "* Run `gcloud projects list`.\n", "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "oM1iC_MfAts1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;31mERROR:\u001b[0m (gcloud.config.set) argument VALUE: Must be specified.\n", "Usage: gcloud config set SECTION/PROPERTY VALUE [optional flags]\n", " optional flags may be --help | --installation\n", "\n", "For detailed information on this command and its flags, run:\n", " gcloud config set --help\n" ] } ], "source": [ "PROJECT_ID = \"\" # @param {type:\"string\"}\n", "\n", "# Set the project id\n", "! gcloud config set project {PROJECT_ID}" ] }, { "cell_type": "markdown", "metadata": { "id": "region" }, "source": [ "#### Set the region\n", "\n", "You can also change the `REGION` variable used by BigQuery. Learn more about [BigQuery regions](https://cloud.google.com/bigquery/docs/locations#supported_locations)." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "eF-Twtc4XGem" }, "outputs": [], "source": [ "REGION = \"US\" # @param {type: \"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "sBCra4QMA2wR" }, "source": [ "### Authenticate your Google Cloud account\n", "\n", "Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below." ] }, { "cell_type": "markdown", "metadata": { "id": "74ccc9e52986" }, "source": [ "**Vertex AI Workbench**\n", "\n", "Do nothing, you are already authenticated." ] }, { "cell_type": "markdown", "metadata": { "id": "de775a3773ba" }, "source": [ "**Local JupyterLab instance**\n", "\n", "Uncomment and run the following cell:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "254614fa0c46" }, "outputs": [], "source": [ "# ! gcloud auth login" ] }, { "cell_type": "markdown", "metadata": { "id": "ef21552ccea8" }, "source": [ "**Colab**\n", "\n", "Uncomment and run the following cell:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "603adbbf0532" }, "outputs": [], "source": [ "# from google.colab import auth\n", "# auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "960505627ddf" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "PyQmSRbKA8r-" }, "outputs": [], "source": [ "import bigframes.pandas as bf\n", "from google.cloud import bigquery\n", "from google.cloud import bigquery_connection_v1 as bq_connection" ] }, { "cell_type": "markdown", "metadata": { "id": "init_aip:mbsdk,all" }, "source": [ "### Set BigQuery DataFrames options" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "NPPMuw2PXGeo" }, "outputs": [], "source": [ "# Note: The project option is not required in all environments.\n", "# On BigQuery Studio, the project ID is automatically detected.\n", "bf.options.bigquery.project = PROJECT_ID\n", "\n", 
"# Note: The location option is not required.\n", "# It defaults to the location of the first table or query\n", "# passed to read_gbq(). For APIs where a location can't be\n", "# auto-detected, the location defaults to the \"US\" location.\n", "bf.options.bigquery.location = REGION" ] }, { "cell_type": "markdown", "metadata": { "id": "DTVtFlqeFbrU" }, "source": [ "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." ] }, { "cell_type": "markdown", "metadata": { "id": "6eytf4xQHzcF" }, "source": [ "# Define the LLM model\n", "\n", "BigQuery DataFrames provides integration with [Gemini Models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models) via Vertex AI.\n", "\n", "This section walks through a few steps required in order to use the model in your notebook." ] }, { "cell_type": "markdown", "metadata": { "id": "qUjT8nw-jIXp" }, "source": [ "## Define the model\n", "\n", "Use `bigframes.ml.llm` to define the model:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "sdjeXFwcHfl7" }, "outputs": [ { "data": { "text/html": [ "Query job 0ee1a08e-788e-4fc7-b061-52c23ab25d5a is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from bigframes.ml.llm import GeminiTextGenerator\n", "\n", "model = GeminiTextGenerator(model_name=\"gemini-2.0-flash-001\")" ] }, { "cell_type": "markdown", "metadata": { "id": "GbW0oCnU1s1N" }, "source": [ "# Read data from Cloud Storage into BigQuery DataFrames\n", "\n", "You can create a BigQuery DataFrames DataFrame by reading data from any of the following locations:\n", "\n", "* A local data file\n", "* Data stored in a BigQuery table\n", "* A data file stored in Cloud Storage\n", "* An in-memory pandas DataFrame\n", "\n", "In this tutorial, you create BigQuery DataFrames DataFrames by reading two CSV files stored in Cloud Storage, one containing a list of DataFrame API names and one containing a list of Series API names." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "SchiTkQGIJog" }, "outputs": [], "source": [ "df_api = bf.read_csv(\"gs://cloud-samples-data/vertex-ai/bigframe/df.csv\")\n", "series_api = bf.read_csv(\"gs://cloud-samples-data/vertex-ai/bigframe/series.csv\")" ] }, { "cell_type": "markdown", "metadata": { "id": "7OBjw2nmQY3-" }, "source": [ "Take a peek at a few rows of data for each file:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "QCqgVCIsGGuv" }, "outputs": [ { "data": { "text/html": [ "Query job 48be241c-ee93-4dfa-a9e3-66b64c4b5150 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 6af9caa5-4f7a-48f0-a7df-d692ee063b7e is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
API
0values
1dtypes
\n", "

2 rows × 1 columns

\n", "
[2 rows x 1 columns in total]" ], "text/plain": [ " API\n", "0 values\n", "1 dtypes\n", "\n", "[2 rows x 1 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_api.head(2)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "BGJnZbgEGS5-" }, "outputs": [ { "data": { "text/html": [ "Query job 41e4f2e7-689a-45d9-bf92-4416f5560b81 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job aae0b164-f786-4734-8c79-2af9805af0cf is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
API
0shape
1size
\n", "

2 rows × 1 columns

\n", "
[2 rows x 1 columns in total]" ], "text/plain": [ " API\n", "0 shape\n", "1 size\n", "\n", "[2 rows x 1 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series_api.head(2)" ] }, { "cell_type": "markdown", "metadata": { "id": "m3ZJEsi7SUKV" }, "source": [ "# Generate code using the LLM model\n", "\n", "Prepare the prompts and send them to the LLM model for prediction." ] }, { "cell_type": "markdown", "metadata": { "id": "9EMAqR37AfLS" }, "source": [ "## Prompt design in BigQuery DataFrames\n", "\n", "Designing prompts for LLMs is a fast growing area and you can read more in [this documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/introduction-prompt-design).\n", "\n", "For this tutorial, you use a simple prompt to ask the LLM model for sample code for each of the API methods (or rows) from the last step's DataFrames. The output is the new DataFrames `df_prompt` and `series_prompt`, which contain the full prompt text." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "EDAaIwHpQCDZ" }, "outputs": [ { "data": { "text/html": [ "Query job 17f50c10-aa81-4023-b206-4ba59ddf2269 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d6d217aa-a623-4ea4-83fb-8f1b8bfb8e68 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job a275a107-752e-46f8-be9f-9cb35eb6b0b9 is DONE. 132 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0 Generate Pandas sample code for DataFrame.values\n", "1 Generate Pandas sample code for DataFrame.dtypes\n", "Name: API, dtype: string" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_prompt_prefix = \"Generate Pandas sample code for DataFrame.\"\n", "series_prompt_prefix = \"Generate Pandas sample code for Series.\"\n", "\n", "df_prompt = (df_prompt_prefix + df_api['API'])\n", "series_prompt = (series_prompt_prefix + series_api['API'])\n", "\n", "df_prompt.head(2)" ] }, { "cell_type": "markdown", "metadata": { "id": "rwPLjqW2Ajzh" }, "source": [ "## Make predictions using the LLM model\n", "\n", "Use the BigQuery DataFrames DataFrame containing the full prompt text as the input to the `predict` method. The `predict` method calls the LLM model and returns its generated text output back to two new BigQuery DataFrames DataFrames, `df_pred` and `series_pred`.\n", "\n", "Note: The predictions might take a few minutes to run." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "6i6HkFJZa8na" }, "outputs": [ { "data": { "text/html": [ "Query job 01f95d2d-901d-4edf-bd3a-245d17c31ef6 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 55927a6f-b023-479a-b9bf-826abde77111 is DONE. 584 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 445eb0af-f643-40c5-9c1e-25aa3db8374a is DONE. 146 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job ddee268c-773a-4dcc-b14c-ebdd90c2c347 is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d7f1eb26-28b2-44ba-8858-5cd4df8621bd is DONE. 904 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job f24d27a5-0e36-4fb5-953b-d09298f83af6 is DONE. 226 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_pred = model.predict(df_prompt.to_frame(), max_output_tokens=1024)\n", "series_pred = model.predict(series_prompt.to_frame(), max_output_tokens=1024)" ] }, { "cell_type": "markdown", "metadata": { "id": "89cB8MW4UIdV" }, "source": [ "Once the predictions are processed, take a look at the sample output from the LLM, which provides code samples for the API names listed in the DataFrames dataset." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "9A2gw6hP_2nX" }, "outputs": [ { "data": { "text/html": [ "Query job 65599c98-72ad-4088-8b09-f29bf05c164b is DONE. 21.8 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "```python\n", "import pandas as pd\n", "\n", "# Create a DataFrame\n", "df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", "\n", "# Get the values as a NumPy array\n", "values = df.values\n", "\n", "# Print the values\n", "print(values)\n", "```\n" ] } ], "source": [ "print(df_pred['ml_generate_text_llm_result'].iloc[0])" ] }, { "cell_type": "markdown", "metadata": { "id": "Fx4lsNqMorJ-" }, "source": [ "# Manipulate LLM output using a remote function\n", "\n", "The output that the LLM provides often contains additional text beyond the code sample itself. 
Using BigQuery DataFrames, you can deploy custom Python functions that process and transform this output.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "d8L7SN03VByG" }, "source": [ "Running the cell below creates a custom function that you can use to process the LLM output data in two ways:\n", "1. Strip the LLM text output to include only the code block.\n", "2. Substitute `import pandas as pd` with `import bigframes.pandas as bf` so that the resulting code block works with BigQuery DataFrames." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "GskyyUQPowBT" }, "outputs": [], "source": [ "@bf.remote_function(cloud_function_service_account=\"default\")\n", "def extract_code(text: str) -> str:\n", " try:\n", " # Keep only the text between the opening and closing Markdown code fences.\n", " res = text[text.find('\\n')+1:text.find('```', 3)]\n", " res = res.replace(\"import pandas as pd\", \"import bigframes.pandas as bf\")\n", " if \"import bigframes.pandas as bf\" not in res:\n", " res = \"import bigframes.pandas as bf\\n\" + res\n", " return res\n", " except Exception:\n", " # If the output doesn't match the expected format, return an empty string.\n", " return \"\"" ] }, { "cell_type": "markdown", "metadata": { "id": "hVQAoqBUOJQf" }, "source": [ "The custom function is deployed as a Cloud Function and then integrated with BigQuery as a [remote function](https://cloud.google.com/bigquery/docs/remote-functions). Save both of the function names so that you can clean them up at the end of this notebook."
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "id": "PBlp-C-DOHRO" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cloud Function Name projects/swast-scratch/locations/us-central1/functions/bigframes-6e7606963c3f06b8181b3cb9449a4363\n", "Remote Function Name swast-scratch._63cfa399614a54153cc386c27d6c0c6fdb249f9e.bigframes_6e7606963c3f06b8181b3cb9449a4363\n" ] } ], "source": [ "CLOUD_FUNCTION_NAME = format(extract_code.bigframes_cloud_function)\n", "print(\"Cloud Function Name \" + CLOUD_FUNCTION_NAME)\n", "REMOTE_FUNCTION_NAME = format(extract_code.bigframes_remote_function)\n", "print(\"Remote Function Name \" + REMOTE_FUNCTION_NAME)" ] }, { "cell_type": "markdown", "metadata": { "id": "4FEucaiqVs3H" }, "source": [ "Apply the custom function to each LLM output DataFrame to get the processed results:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "id": "bsQ9cmoWo0Ps" }, "outputs": [ { "data": { "text/html": [ "Query job 047903f8-ea67-430a-8281-8fb5a119b779 is DONE. 21.8 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 793df956-0b1a-46ba-bb5e-e428171f3bd0 is DONE. 26.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_code = df_pred.assign(code=df_pred['ml_generate_text_llm_result'].apply(extract_code))\n", "series_code = series_pred.assign(code=series_pred['ml_generate_text_llm_result'].apply(extract_code))" ] }, { "cell_type": "markdown", "metadata": { "id": "ujQVVuhfWA3y" }, "source": [ "You can see the differences by inspecting the first row of data:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "7yWzjhGy_zcy" }, "outputs": [ { "data": { "text/html": [ "Query job 6974c2b7-2ed9-4564-a80b-57aef6959e19 is DONE. 22.8 kB processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "import bigframes.pandas as bf\n", "\n", "# Create a DataFrame\n", "df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", "\n", "# Get the values as a NumPy array\n", "values = df.values\n", "\n", "# Print the values\n", "print(values)\n", "\n" ] } ], "source": [ "print(df_code['code'].iloc[0])" ] }, { "cell_type": "markdown", "metadata": { "id": "GTRdUw-Ro5R1" }, "source": [ "# Save the results to Cloud Storage\n", "\n", "BigQuery DataFrames lets you save a BigQuery DataFrames DataFrame as a CSV file in Cloud Storage for further use. Try that now with your processed LLM output data." ] }, { "cell_type": "markdown", "metadata": { "id": "9DQ7eiQxPTi3" }, "source": [ "Create a new Cloud Storage bucket with a unique name:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "-J5LHgS6LLZ0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating gs://code-samples-773ee0f2-e302-11ee-8298-4201c0a8181f/...\n" ] } ], "source": [ "import uuid\n", "BUCKET_ID = \"code-samples-\" + str(uuid.uuid1())\n", "\n", "!gcloud storage buckets create gs://{BUCKET_ID}" ] }, { "cell_type": "markdown", "metadata": { "id": "tyxZXj0UPYUv" }, "source": [ "Use `to_csv` to write each BigQuery DataFrames DataFrame as a CSV file in the Cloud Storage bucket:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "Zs_b5L-4IvER" }, "outputs": [ { "data": { "text/html": [ "Query job 81277037-032f-4557-a46e-1d39702f33d5 is DONE. 22.8 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 8dc5a38c-ac16-44e7-83dd-4187380f780f is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 9087a758-b1f9-4be7-889b-7761ef0ad966 is DONE. 
27.7 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 6126ea72-c6f7-43f0-8888-e1c2a464a8a4 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_code[[\"code\"]].to_csv(f\"gs://{BUCKET_ID}/df_code*.csv\")\n", "series_code[[\"code\"]].to_csv(f\"gs://{BUCKET_ID}/series_code*.csv\")" ] }, { "cell_type": "markdown", "metadata": { "id": "UDBtDlrTuuh8" }, "source": [ "You can navigate to the Cloud Storage bucket browser to download the two files and view them.\n", "\n", "Run the following cell, and then follow the link to your Cloud Storage bucket browser:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "PspCXu-qu_ND" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://console.developers.google.com/storage/browser/code-samples-773ee0f2-e302-11ee-8298-4201c0a8181f/\n" ] } ], "source": [ "print(f'https://console.developers.google.com/storage/browser/{BUCKET_ID}/')" ] }, { "cell_type": "markdown", "metadata": { "id": "RGSvUk48RK20" }, "source": [ "# Summary and next steps\n", "\n", "You've used BigQuery DataFrames' integration with LLM models (`bigframes.ml.llm`) to generate code samples, and have transformed LLM output by creating and using a custom function in BigQuery DataFrames.\n", "\n", "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)."
] }, { "cell_type": "markdown", "metadata": { "id": "TpV-iwP9qw9c" }, "source": [ "## Cleaning up\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can uncomment the remaining cells and run them to delete the individual resources you created in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bf.close_session()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "yw7A461XLjvW" }, "outputs": [], "source": [ "# # Delete the BigQuery Connection\n", "# from google.cloud import bigquery_connection_v1 as bq_connection\n", "# client = bq_connection.ConnectionServiceClient()\n", "# CONNECTION_ID = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", "# client.delete_connection(name=CONNECTION_ID)\n", "# print(f\"Deleted connection '{CONNECTION_ID}'.\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "sx_vKniMq9ZX" }, "outputs": [], "source": [ "# # Delete the Cloud Function\n", "# ! gcloud functions delete {CLOUD_FUNCTION_NAME} --quiet\n", "# # Delete the Remote Function\n", "# REMOTE_FUNCTION_NAME = REMOTE_FUNCTION_NAME.replace(PROJECT_ID + \".\", \"\")\n", "# ! bq rm --routine --force=true {REMOTE_FUNCTION_NAME}" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "id": "iQFo6OUBLmi3" }, "outputs": [], "source": [ "# # Delete the Google Cloud Storage bucket and files\n", "# ! 
gcloud storage rm gs://{BUCKET_ID} --recursive\n", "# print(f\"Deleted bucket '{BUCKET_ID}'.\")" ] } ], "metadata": { "colab": { "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "venv (3.10.14)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 0 }