{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Copyright 2025 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Array Data Types\n", "\n", "In BigQuery, an [ARRAY](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type) (also called a `repeated` column) is an ordered list of zero or more elements of the same, non-`NULL` data type. A BigQuery `ARRAY` cannot contain nested `ARRAY`s. BigQuery DataFrames represents a BigQuery `ARRAY` column as `pandas.ArrowDtype(pa.list_(T))`, where `T` is the underlying Arrow type of the array elements.\n", "\n", "This notebook illustrates how to work with `ARRAY` columns in BigQuery DataFrames. First, let's import the required packages and perform the necessary setup below." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import bigframes.pandas as bpd\n", "import bigframes.bigquery as bbq\n", "import pandas as pd\n", "import pyarrow as pa" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "REGION = \"US\" # @param {type: \"string\"}\n", "\n", "bpd.options.display.progress_bar = None\n", "bpd.options.bigquery.location = REGION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create DataFrames with an array column\n", "\n", "**Example 1: Creating from a list of lists/tuples**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Name    | Scores          |
|---|---------|-----------------|
| 0 | Alice   | [95 88 92]      |
| 1 | Bob     | [78 81]         |
| 2 | Charlie | [ 82 89 94 100] |

3 rows × 2 columns
\n", "
|   | Name    | Scores          | NewScores           |
|---|---------|-----------------|---------------------|
| 0 | Alice   | [95 88 92]      | [100. 93. 97.]      |
| 1 | Bob     | [78 81]         | [83. 86.]           |
| 2 | Charlie | [ 82 89 94 100] | [ 87. 94. 99. 105.] |

3 rows × 3 columns
\n", "