{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use BigQuery DataFrames to run an Anthropic LLM at scale\n", "\n", "\n", "\n", " \n", " \n", " \n", "
\n", " \n", " \"Colab Run in Colab\n", " \n", " \n", " \n", " \"GitHub\n", " View on GitHub\n", " \n", " \n", " \n", " \"BQ\n", " Open in BQ Studio\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview\n", "\n", "Anthropic Claude models are available as APIs on Vertex AI ([docs](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude)).\n", "\n", "To run the Claude models on large-scale data we can utilize BigQuery\n", "DataFrames remote functions ([docs](https://cloud.google.com/bigquery/docs/use-bigquery-dataframes#remote-functions)).\n", "BigQuery DataFrames provides a simple, pythonic interface, `remote_function`, to\n", "deploy user code as a BigQuery remote function and then invoke it at scale\n", "by utilizing the parallel, distributed computing architecture of BigQuery and\n", "Google Cloud Functions.\n", "\n", "In this notebook we showcase one such example. For demonstration purposes we\n", "use a small amount of data, but the example generalizes to large data. Check out\n", "the various I/O APIs provided by BigQuery DataFrames [here](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas#bigframes_pandas_read_gbq)\n", "to see how you could create a DataFrame from your big data sitting in a BigQuery\n", "table or GCS bucket." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Set Up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up a Claude model in Vertex AI\n", "\n", "https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#before_you_begin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Anthropic with Vertex if needed\n", "\n", "Uncomment and run the following cell to install the `anthropic` Python\n", "package with the `vertex` extension if you don't already have it."
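, "\n", "As a minimal sketch (assuming a standard pip-based environment; the `ensure_installed` helper below is hypothetical, not part of any library), you could also guard the install programmatically so it only runs when the package is missing:\n", "\n", "```python\n", "import importlib.util\n", "import subprocess\n", "import sys\n", "\n", "def ensure_installed(module: str, pip_spec: str) -> bool:\n", "    # Return True if `module` is already importable; otherwise pip-install `pip_spec`.\n", "    if importlib.util.find_spec(module) is not None:\n", "        return True\n", "    subprocess.check_call([sys.executable, '-m', 'pip', 'install', pip_spec])\n", "    return False\n", "\n", "# ensure_installed('anthropic', 'anthropic[vertex]')\n", "```"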
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# !pip install anthropic[vertex] --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define project and location for GCP integration" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "PROJECT = \"bigframes-dev\" # replace with your project\n", "LOCATION = \"us-east5\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initialize a BigQuery DataFrames DataFrame\n", "\n", "BigQuery DataFrames is a set of open source Python libraries that let you take\n", "advantage of BigQuery data processing by using familiar Python APIs.\n", "See https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction for more details." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Import the BigQuery DataFrames pandas module and initialize it with your project\n", "# and location\n", "\n", "import bigframes.pandas as bpd\n", "bpd.options.bigquery.project = PROJECT\n", "bpd.options.bigquery.location = LOCATION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use a DataFrame with a small amount of inline data for demo purposes.\n", "You could create a DataFrame from your own data. See APIs like `read_gbq`,\n", "`read_csv`, `read_json`, etc. at https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
questions
0What is the capital of France?
1Explain the concept of photosynthesis in simpl...
2Write a haiku about artificial intelligence.
\n", "

3 rows × 1 columns

\n", "
[3 rows x 1 columns in total]" ], "text/plain": [ " questions\n", "0 What is the capital of France?\n", "1 Explain the concept of photosynthesis in simpl...\n", "2 Write a haiku about artificial intelligence.\n", "\n", "[3 rows x 1 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bpd.DataFrame({\"questions\": [\n", "    \"What is the capital of France?\",\n", "    \"Explain the concept of photosynthesis in simple terms.\",\n", "    \"Write a haiku about artificial intelligence.\"\n", "]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Use BigQuery DataFrames `remote_function`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a remote function from a custom Python function that takes a prompt\n", "and returns the output of the Claude LLM running on Vertex AI. We will use\n", "`max_batching_rows=1` to control parallelization. This ensures that a single\n", "prompt is processed per batch in the underlying cloud function so that batch\n", "processing does not time out. The ideal value for `max_batching_rows` depends on\n", "the complexity of the prompts in the real use case and should be discovered\n", "through offline experimentation. Check out the API at https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas#bigframes_pandas_remote_function\n", "for other ways to control parallelization." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " Query processed 0 Bytes in a moment of slot time. 
[Job bigframes-dev:us-east5.9bc70627-6891-44a4-b7d7-8a28e213cdec details]\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@bpd.remote_function(\n", "    packages=[\"anthropic[vertex]\", \"google-auth[requests]\"],\n", "    max_batching_rows=1,\n", "    bigquery_connection=\"bigframes-dev.us-east5.bigframes-rf-conn\", # replace with your connection\n", "    cloud_function_service_account=\"default\",\n", ")\n", "def anthropic_transformer(message: str) -> str:\n", "    from anthropic import AnthropicVertex\n", "    client = AnthropicVertex(region=LOCATION, project_id=PROJECT)\n", "\n", "    # Keep the API result in `response` to avoid shadowing the input parameter\n", "    response = client.messages.create(\n", "        max_tokens=1024,\n", "        messages=[\n", "            {\n", "                \"role\": \"user\",\n", "                \"content\": message,\n", "            }\n", "        ],\n", "        model=\"claude-3-haiku@20240307\",\n", "    )\n", "    content_text = response.content[0].text if response.content else \"\"\n", "    return content_text" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bigframes-dev._e9a5162ae4daa9f50fda3f95febaa9781131f3b8.bigframes_sessionc10c73_49262141176cbf70037559ae84e834d3'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print the BigQuery remote function created\n", "anthropic_transformer.bigframes_remote_function" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'projects/bigframes-dev/locations/us-east5/functions/bigframes-sessionc10c73-49262141176cbf70037559ae84e834d3'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print the cloud function created\n", "anthropic_transformer.bigframes_cloud_function" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " Query started with request ID bigframes-dev:us-east5.821579f4-63ea-4072-a3ce-318e43768432.
SQL
SELECT\n",
       "`bfuid_col_3` AS `bfuid_col_3`,\n",
       "`bfuid_col_4` AS `bfuid_col_4`,\n",
       "`bfuid_col_5` AS `bfuid_col_5`\n",
       "FROM\n",
       "(SELECT\n",
       "  `t1`.`bfuid_col_3`,\n",
       "  `t1`.`bfuid_col_4`,\n",
       "  `t1`.`bfuid_col_5`,\n",
       "  `t1`.`bfuid_col_6` AS `bfuid_col_7`\n",
       "FROM (\n",
       "  SELECT\n",
       "    `t0`.`level_0`,\n",
       "    `t0`.`column_0`,\n",
       "    `t0`.`bfuid_col_6`,\n",
       "    `t0`.`level_0` AS `bfuid_col_3`,\n",
       "    `t0`.`column_0` AS `bfuid_col_4`,\n",
       "    `bigframes-dev._e9a5162ae4daa9f50fda3f95febaa9781131f3b8.bigframes_sessionc10c73_49262141176cbf70037559ae84e834d3`(`t0`.`column_0`) AS `bfuid_col_5`\n",
       "  FROM (\n",
       "    SELECT\n",
       "      *\n",
       "    FROM UNNEST(ARRAY<STRUCT<`level_0` INT64, `column_0` STRING, `bfuid_col_6` INT64>>[STRUCT(0, 'What is the capital of France?', 0), STRUCT(1, 'Explain the concept of photosynthesis in simple terms.', 1), STRUCT(2, 'Write a haiku about artificial intelligence.', 2)]) AS `level_0`\n",
       "  ) AS `t0`\n",
       ") AS `t1`)\n",
       "ORDER BY `bfuid_col_7` ASC NULLS LAST\n",
       "LIMIT 10
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
questionsanswers
0What is the capital of France?The capital of France is Paris.
1Explain the concept of photosynthesis in simpl...Photosynthesis is the process by which plants ...
2Write a haiku about artificial intelligence.Here is a haiku about artificial intelligence:...
\n", "

3 rows × 2 columns

\n", "
[3 rows x 2 columns in total]" ], "text/plain": [ " questions \\\n", "0 What is the capital of France? \n", "1 Explain the concept of photosynthesis in simpl... \n", "2 Write a haiku about artificial intelligence. \n", "\n", " answers \n", "0 The capital of France is Paris. \n", "1 Photosynthesis is the process by which plants ... \n", "2 Here is a haiku about artificial intelligence:... \n", "\n", "[3 rows x 2 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Apply the remote function on the user data\n", "df[\"answers\"] = df[\"questions\"].apply(anthropic_transformer)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Clean Up" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Session sessionc10c73 closed." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bpd.close_session()" ] } ], "metadata": { "kernelspec": { "display_name": "venv (3.14.2)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.2" } }, "nbformat": 4, "nbformat_minor": 2 }