{ "cells": [ { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Copyright 2023 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Set Up" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "Y6QAttCqqMM0" }, "outputs": [], "source": [ "import bigframes.pandas as bpd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 296 }, "id": "xraJ9RRzsvel", "outputId": "6e3308cf-8de0-4b89-9128-4c6ddf3598c0" }, "outputs": [ { "data": { "text/html": [ "Query job 1f6094e9-1942-477c-9ce3-87a614d71294 is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job ba19f29c-33d3-4f12-9605-ddeafb74918e is DONE. 582.8 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job dd1ff8be-700a-4ce5-91a0-31413f70cfad is DONE. 82.0 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>88</th><td>Royals</td><td>Athletics</td><td>176</td></tr>\n",
"<tr><th>106</th><td>Dodgers</td><td>Giants</td><td>216</td></tr>\n",
"<tr><th>166</th><td>Phillies</td><td>Royals</td><td>162</td></tr>\n",
"<tr><th>247</th><td>Rangers</td><td>Royals</td><td>161</td></tr>\n",
"<tr><th>374</th><td>Athletics</td><td>Astros</td><td>161</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes\n", "88 Royals Athletics 176\n", "106 Dodgers Giants 216\n", "166 Phillies Royals 162\n", "247 Rangers Royals 161\n", "374 Athletics Astros 161" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bpd.read_gbq(\"bigquery-public-data.baseball.schedules\")[[\"homeTeamName\", \"awayTeamName\", \"duration_minutes\"]]\n", "df.peek()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Notes\n", "\n", "* The API reference documentation for the `remote_function` can be found at\n", " https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.session.Session#bigframes_session_Session_remote_function\n", "\n", "* More code samples for `remote_function` can be found in the BigQuery\n", " DataFrames API reference documentation, e.g.\n", " * https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.series.Series#bigframes_series_Series_apply\n", " * https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame#bigframes_dataframe_DataFrame_map\n", " * https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame#bigframes_dataframe_DataFrame_apply\n", "\n", "* The following examples are only for the purpose of demonstrating\n", "`remote_function` usage. They are not necessarily the best way to achieve the\n", "end result.\n", "\n", "* In the examples in this notebook we are using `reuse=False` just as a caution\n", " to avoid concurrent runs of this notebook in the same google cloud project\n", " stepping over each other's remote function deployment. It may not be neccesary\n", " in a simple use case." ] }, { "cell_type": "markdown", "metadata": { "id": "Pt4mWYE1p5o8" }, "source": [ "# Self-contained function\n", "\n", "Let's consider a scenario where we want to categorize the matches as short,\n", "medium or long duration based on the `duration_minutes` column." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "id": "VoCPBJ-ZpyeG", "outputId": "19351206-116e-4da2-8ff0-f288b7745b27" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/google/home/arwas/src1/python-bigquery-dataframes/bigframes/functions/_function_session.py:335: UserWarning: You have not explicitly set a user-managed cloud_function_service_account. Using the default compute service account, {cloud_function_service_account}. To use Bigframes 2.0, please set an explicit user-managed cloud_function_service_account or set cloud_function_service_account explicitly to `default`.See, https://cloud.google.com/functions/docs/securing/function-identity.\n", " warnings.warn(msg, category=UserWarning)\n" ] }, { "data": { "text/html": [ "Query job 7c021760-59c4-4f3a-846c-9693a4d16eef is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Created cloud function 'projects/bigframes-dev/locations/us-central1/functions/bigframes-sessionca6012-ca541a90249f8b62951f38b7aba6a711-49to' and BQ remote function 'bigframes-dev._ed1e4d0f7d41174ba506d34d15dccf040d13f69e.bigframes_sessionca6012_ca541a90249f8b62951f38b7aba6a711_49to'.\n" ] } ], "source": [ "@bpd.remote_function(reuse=False, cloud_function_service_account=\"default\")\n", "def duration_category(duration_minutes: int) -> str:\n", " if duration_minutes < 90:\n", " return \"short\"\n", " elif duration_minutes < 180:\n", " return \"medium\"\n", " else:\n", " return \"long\"\n", "\n", "print(f\"Created cloud function '{duration_category.bigframes_cloud_function}' and BQ remote function '{duration_category.bigframes_remote_function}'.\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 258 }, "id": "oXgDB70Lp5cG", "outputId": 
"c08aade0-8b03-425b-fc26-deafd89275a4" }, "outputs": [ { "data": { "text/html": [ "Query job 4b116e3e-d4d3-4eb6-9764-0a29a7c5d036 is DONE. 58.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d62ac4f0-47c9-47ae-8611-c9ecf78f20c9 is DONE. 157.2 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 5f876ebb-2d95-4c68-9d84-947e02b37bad is DONE. 98.8 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>duration_cat</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1911</th><td>Dodgers</td><td>Angels</td><td>132</td><td>medium</td></tr>\n",
"<tr><th>2365</th><td>Athletics</td><td>Angels</td><td>134</td><td>medium</td></tr>\n",
"<tr><th>1977</th><td>Athletics</td><td>Angels</td><td>139</td><td>medium</td></tr>\n",
"<tr><th>554</th><td>Cubs</td><td>Angels</td><td>142</td><td>medium</td></tr>\n",
"<tr><th>654</th><td>Astros</td><td>Angels</td><td>143</td><td>medium</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes duration_cat\n", "1911 Dodgers Angels 132 medium\n", "2365 Athletics Angels 134 medium\n", "1977 Athletics Angels 139 medium\n", "554 Cubs Angels 142 medium\n", "654 Astros Angels 143 medium" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(duration_cat=df[\"duration_minutes\"].apply(duration_category))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": { "id": "zTaNSVmuzEkc" }, "source": [ "# Function referring to variables outside the function body\n", "\n", "Let's consider a slight variation of the earlier example where the labels for\n", "the short, medium and long duration matches are defined outside the function\n", "body. They would be captured at the time of `remote_function` deployment and\n", "any change in their values in the notebook after the deployment will not\n", "automatically propagate to the `remote_function`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "2UEmTbu4znyS" }, "outputs": [], "source": [ "DURATION_CATEGORY_SHORT = \"S\"\n", "DURATION_CATEGORY_MEDIUM = \"M\"\n", "DURATION_CATEGORY_LONG = \"L\"" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "id": "G-73kpmrznHn", "outputId": "b5923b7c-d412-43bf-9a20-3946154df81a" }, "outputs": [ { "data": { "text/html": [ "Query job 1909a652-5735-401b-8a77-674d8539ded0 is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Created cloud function 'projects/bigframes-dev/locations/us-central1/functions/bigframes-session54c8b0-4191f0fce98d46cc09359de47e203236-e009' and BQ remote function 'bigframes-dev._1b6c31ff1bcd5d2f6d86833cf8268317f1b12d57.bigframes_session54c8b0_4191f0fce98d46cc09359de47e203236_e009'.\n" ] } ], "source": [ "@bpd.remote_function(reuse=False, cloud_function_service_account=\"default\")\n", "def duration_category(duration_minutes: int) -> str:\n", " if duration_minutes < 90:\n", " return DURATION_CATEGORY_SHORT\n", " elif duration_minutes < 180:\n", " return DURATION_CATEGORY_MEDIUM\n", " else:\n", " return DURATION_CATEGORY_LONG\n", "\n", "print(f\"Created cloud function '{duration_category.bigframes_cloud_function}' and BQ remote function '{duration_category.bigframes_remote_function}'.\")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 258 }, "id": "DWHKsfF-z7rL", "outputId": "c736b57f-1fcb-464a-f725-eb203265ddc2" }, "outputs": [ { "data": { "text/html": [ "Query job a942bdc5-6a6d-4db8-b2aa-a556197377b3 is DONE. 58.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 175ae9d3-604f-495b-a167-8b06c0283bd2 is DONE. 147.7 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d331a785-e574-45c9-86c8-d29ddd79a4d1 is DONE. 89.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>duration_cat</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1911</th><td>Dodgers</td><td>Angels</td><td>132</td><td>M</td></tr>\n",
"<tr><th>2365</th><td>Athletics</td><td>Angels</td><td>134</td><td>M</td></tr>\n",
"<tr><th>1977</th><td>Athletics</td><td>Angels</td><td>139</td><td>M</td></tr>\n",
"<tr><th>554</th><td>Cubs</td><td>Angels</td><td>142</td><td>M</td></tr>\n",
"<tr><th>654</th><td>Astros</td><td>Angels</td><td>143</td><td>M</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes duration_cat\n", "1911 Dodgers Angels 132 M\n", "2365 Athletics Angels 134 M\n", "1977 Athletics Angels 139 M\n", "554 Cubs Angels 142 M\n", "654 Astros Angels 143 M" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(duration_cat=df[\"duration_minutes\"].apply(duration_category))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": { "id": "J-1BIasNzKil" }, "source": [ "# Function referring to imports (built-in) outside the function body\n", "\n", "Let's consider a scenario in which we want to categorize the matches in terms of\n", "hour buckets. E.g. a match finishing in 0-60 minutes would be in 1h category,\n", "61-120 minutes in 2h category and so on. The function itself makes use of the\n", "`math` module (a built-in module in a standard python installation) which\n", "happens to be imported outside the function body, let's say in one of the\n", "previous cells. For the demo purpose we have aliased the import to `mymath`, but\n", "it is not necessary.\n", "\n", "Later in the notebook we will see another example with a third-party module." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "id": "zlQfhcW41uzM" }, "outputs": [], "source": [ "import math as mymath" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "id": "ktADchck2mh4", "outputId": "9aed6aea-b361-4414-a0f6-8873e8291090" }, "outputs": [ { "data": { "text/html": [ "Query job bbc0b78f-bc04-4bd5-b711-399786a51519 is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Created cloud function 'projects/bigframes-dev/locations/us-central1/functions/bigframes-session54c8b0-cf31fc2d2c7fe111afa5526f5a9cdf06-gmmo' and BQ remote function 'bigframes-dev._1b6c31ff1bcd5d2f6d86833cf8268317f1b12d57.bigframes_session54c8b0_cf31fc2d2c7fe111afa5526f5a9cdf06_gmmo'.\n" ] } ], "source": [ "@bpd.remote_function(reuse=False, cloud_function_service_account=\"default\")\n", "def duration_category(duration_minutes: int) -> str:\n", " duration_hours = mymath.ceil(duration_minutes / 60)\n", " return f\"{duration_hours}h\"\n", "\n", "print(f\"Created cloud function '{duration_category.bigframes_cloud_function}' and BQ remote function '{duration_category.bigframes_remote_function}'.\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 258 }, "id": "ywAtZlJU3GoB", "outputId": "d3c93a31-3367-4ccf-bdf7-62d5bbff4461" }, "outputs": [ { "data": { "text/html": [ "Query job 991b54ed-9eaa-450f-9208-3e73404bb112 is DONE. 58.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 4e464a58-ac5b-42fd-91e3-92c115bdd273 is DONE. 150.1 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d340f55d-1511-431a-970d-a70ed4356935 is DONE. 91.7 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>duration_cat</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1911</th><td>Dodgers</td><td>Angels</td><td>132</td><td>3h</td></tr>\n",
"<tr><th>2365</th><td>Athletics</td><td>Angels</td><td>134</td><td>3h</td></tr>\n",
"<tr><th>1977</th><td>Athletics</td><td>Angels</td><td>139</td><td>3h</td></tr>\n",
"<tr><th>554</th><td>Cubs</td><td>Angels</td><td>142</td><td>3h</td></tr>\n",
"<tr><th>654</th><td>Astros</td><td>Angels</td><td>143</td><td>3h</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes duration_cat\n", "1911 Dodgers Angels 132 3h\n", "2365 Athletics Angels 134 3h\n", "1977 Athletics Angels 139 3h\n", "554 Cubs Angels 142 3h\n", "654 Astros Angels 143 3h" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(duration_cat=df[\"duration_minutes\"].apply(duration_category))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": { "id": "WO0FH7Bm3OxR" }, "source": [ "# Function referring to another function outside the function body\n", "\n", "In this example let's create a `remote_function` from a function\n", "`duration_category` which depends upon another function `get_hour_ceiling`,\n", "which further depends on another function `get_minutes_in_hour`. This dependency\n", "chain could be even longer in a real world example. The behaviors of the\n", "dependencies would be captured at the time of the remote function\n", "deployment.\n", "\n", "Please ntoe that any changes in those functions in the notebook after the\n", "deployment would not automatically propagate to the remote function." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "0G91fWiF3pKg" }, "outputs": [], "source": [ "import math\n", "\n", "def get_minutes_in_hour():\n", " return 60\n", "\n", "def get_hour_ceiling(minutes):\n", " return math.ceil(minutes / get_minutes_in_hour())" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "id": "lQrC8T2031EJ", "outputId": "420e7c3d-54cb-4814-f973-c7678be61caa" }, "outputs": [ { "data": { "text/html": [ "Query job 10d1afa3-349b-49a8-adbd-79a8309ce77c is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Created cloud function 'projects/bigframes-dev/locations/us-central1/functions/bigframes-session54c8b0-3c03836c2044bf625d02e25ccdbfe101-k1m4' and BQ remote function 'bigframes-dev._1b6c31ff1bcd5d2f6d86833cf8268317f1b12d57.bigframes_session54c8b0_3c03836c2044bf625d02e25ccdbfe101_k1m4'.\n" ] } ], "source": [ "@bpd.remote_function(reuse=False, cloud_function_service_account=\"default\")\n", "def duration_category(duration_minutes: int) -> str:\n", " duration_hours = get_hour_ceiling(duration_minutes)\n", " return f\"{duration_hours} hrs\"\n", "\n", "print(f\"Created cloud function '{duration_category.bigframes_cloud_function}' and BQ remote function '{duration_category.bigframes_remote_function}'.\")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 258 }, "id": "GVyrihii4EFG", "outputId": "e979b649-4ed4-4b82-e814-54180420e3fc" }, "outputs": [ { "data": { "text/html": [ "Query job 33aff336-48d6-4caa-8cae-f459d21b180e is DONE. 58.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 561e0aa7-3962-4ef3-b308-a117a0ac3a7d is DONE. 157.4 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 759dccf8-3d88-40e1-a38a-2a2064e1d269 is DONE. 99.0 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>duration_cat</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1911</th><td>Dodgers</td><td>Angels</td><td>132</td><td>3 hrs</td></tr>\n",
"<tr><th>2365</th><td>Athletics</td><td>Angels</td><td>134</td><td>3 hrs</td></tr>\n",
"<tr><th>1977</th><td>Athletics</td><td>Angels</td><td>139</td><td>3 hrs</td></tr>\n",
"<tr><th>554</th><td>Cubs</td><td>Angels</td><td>142</td><td>3 hrs</td></tr>\n",
"<tr><th>654</th><td>Astros</td><td>Angels</td><td>143</td><td>3 hrs</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes duration_cat\n", "1911 Dodgers Angels 132 3 hrs\n", "2365 Athletics Angels 134 3 hrs\n", "1977 Athletics Angels 139 3 hrs\n", "554 Cubs Angels 142 3 hrs\n", "654 Astros Angels 143 3 hrs" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(duration_cat=df[\"duration_minutes\"].apply(duration_category))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": { "id": "Uu7SOoT94vSP" }, "source": [ "# Function requiring external packages\n", "\n", "In this example let's say we want to redact the `homeTeamName` values, and we\n", "choose to use a third party library `cryptography`. Any third party dependencies\n", "can be specified in [pip format](https://pip.pypa.io/en/stable/reference/requirements-file-format/)\n", "(with or without version number) as a list via the `packages` parameter." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "3EUEyNcW41_l", "outputId": "2d09d60f-da1a-4eab-86d3-0e62390a360c" }, "outputs": [ { "data": { "text/html": [ "Query job e2a44878-2564-44a5-8dec-b7ea2f42afd4 is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@bpd.remote_function(reuse=False, packages=[\"cryptography\"], cloud_function_service_account=\"default\")\n", "def get_hash(input: str) -> str:\n", " from cryptography.fernet import Fernet\n", "\n", " # handle missing value\n", " if input is None:\n", " input = \"\"\n", "\n", " key = Fernet.generate_key()\n", " f = Fernet(key)\n", " return f.encrypt(input.encode()).decode()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 258 }, "id": "OX1Hl7bR5uyd", "outputId": "8ac3bf28-d16d-438b-b636-74ef2371715f" }, "outputs": [ { "data": { "text/html": [ "Query job bcfab000-ca19-4633-bf0e-45e7d053f3eb is DONE. 60.5 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 139a6449-c07e-41ff-9aed-c6fdd633740a is DONE. 388.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 035fa2fb-0a55-4358-bb50-3ef915f5bf54 is DONE. 330.0 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>homeTeamNameRedacted</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>641</th><td>American League</td><td>National League</td><td>185</td><td>gAAAAABmo0n2I391cbYwIYeg8lyJq1MSFZatrtpvuUD5v-...</td></tr>\n",
"<tr><th>349</th><td>Angels</td><td>Astros</td><td>187</td><td>gAAAAABmo0n2pX-siRwl2tIZA4m--swndC_b7vgGXrqSNM...</td></tr>\n",
"<tr><th>2349</th><td>Angels</td><td>Astros</td><td>160</td><td>gAAAAABmo0n28Q9RwH62HvYRhTDpQ9lo8c6G8F5bnn7wgF...</td></tr>\n",
"<tr><th>557</th><td>Angels</td><td>Astros</td><td>166</td><td>gAAAAABmo0n2YlwHlSGQ0_XvXd-QVBtB_Lq2zUifu7vKhg...</td></tr>\n",
"<tr><th>220</th><td>Angels</td><td>Astros</td><td>162</td><td>gAAAAABmo0n2l8HMSGKYizxfEmRvGQy96mrjwx734-Rl_Z...</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes \\\n", "641 American League National League 185 \n", "349 Angels Astros 187 \n", "2349 Angels Astros 160 \n", "557 Angels Astros 166 \n", "220 Angels Astros 162 \n", "\n", " homeTeamNameRedacted \n", "641 gAAAAABmo0n2I391cbYwIYeg8lyJq1MSFZatrtpvuUD5v-... \n", "349 gAAAAABmo0n2pX-siRwl2tIZA4m--swndC_b7vgGXrqSNM... \n", "2349 gAAAAABmo0n28Q9RwH62HvYRhTDpQ9lo8c6G8F5bnn7wgF... \n", "557 gAAAAABmo0n2YlwHlSGQ0_XvXd-QVBtB_Lq2zUifu7vKhg... \n", "220 gAAAAABmo0n2l8HMSGKYizxfEmRvGQy96mrjwx734-Rl_Z... " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(homeTeamNameRedacted=df[\"homeTeamName\"].apply(get_hash))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Function referring to imports (third-party) outside the function body\n", "\n", "In this scenario the function depends on a third party library and the module\n", "from the third party library used in the function is imported outside the\n", "function body in a previous cell. Below is such an example where the third-party\n", "dependency is `humanize` and its module of the same name is imported outside the\n", "function body." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "import datetime as dt\n", "import humanize" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job af73ab2d-8d88-4cbe-863f-d35e48af84e1 is DONE. 0 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Created cloud function 'projects/bigframes-dev/locations/us-central1/functions/bigframes-session54c8b0-a5e21a4ad488ce8b90de19c3c8cd33b6-0ab2' and BQ remote function 'bigframes-dev._1b6c31ff1bcd5d2f6d86833cf8268317f1b12d57.bigframes_session54c8b0_a5e21a4ad488ce8b90de19c3c8cd33b6_0ab2'.\n" ] } ], "source": [ "@bpd.remote_function(reuse=False, packages=[\"humanize\"], cloud_function_service_account=\"default\")\n", "def duration_category(duration_minutes: int) -> str:\n", " timedelta = dt.timedelta(minutes=duration_minutes)\n", " return humanize.naturaldelta(timedelta)\n", "\n", "print(f\"Created cloud function '{duration_category.bigframes_cloud_function}' and BQ remote function '{duration_category.bigframes_remote_function}'.\")" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job 0a9ac329-619d-4303-8dbd-176a576d4ce8 is DONE. 58.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 456bb9b4-0576-4c04-b707-4a04496aa538 is DONE. 162.2 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 37f59939-5d2c-4fb1-839b-282ae3702d3d is DONE. 103.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>homeTeamName</th><th>awayTeamName</th><th>duration_minutes</th><th>duration_cat</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1911</th><td>Dodgers</td><td>Angels</td><td>132</td><td>2 hours</td></tr>\n",
"<tr><th>2365</th><td>Athletics</td><td>Angels</td><td>134</td><td>2 hours</td></tr>\n",
"<tr><th>1977</th><td>Athletics</td><td>Angels</td><td>139</td><td>2 hours</td></tr>\n",
"<tr><th>554</th><td>Cubs</td><td>Angels</td><td>142</td><td>2 hours</td></tr>\n",
"<tr><th>654</th><td>Astros</td><td>Angels</td><td>143</td><td>2 hours</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " homeTeamName awayTeamName duration_minutes duration_cat\n", "1911 Dodgers Angels 132 2 hours\n", "2365 Athletics Angels 134 2 hours\n", "1977 Athletics Angels 139 2 hours\n", "554 Cubs Angels 142 2 hours\n", "654 Astros Angels 143 2 hours" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.assign(duration_cat=df[\"duration_minutes\"].apply(duration_category))\n", "df1.peek()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bpd.close_session()" ] } ], "metadata": { "colab": { "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 0 }