{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Using ML - SKLearn linear regression\n", "\n", "This demo shows how to train, evaluate, and save a linear regression model with BigQuery DataFrames ML, using an API modeled on scikit-learn's." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Init & load data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job f201b84b-5506-4038-92e6-b4a82318df8f is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 12e0f983-695e-4903-8ff1-2f353d7e8cba is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>species</th><th>island</th><th>culmen_length_mm</th><th>culmen_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>40.1</td><td>18.9</td><td>188.0</td><td>4300.0</td><td>MALE</td></tr>\n",
"<tr><th>1</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181.0</td><td>3750.0</td><td>MALE</td></tr>\n",
"<tr><th>2</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>47.4</td><td>14.6</td><td>212.0</td><td>4725.0</td><td>FEMALE</td></tr>\n",
"<tr><th>3</th><td>Chinstrap penguin (Pygoscelis antarctica)</td><td>Dream</td><td>42.5</td><td>16.7</td><td>187.0</td><td>3350.0</td><td>FEMALE</td></tr>\n",
"<tr><th>4</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>43.2</td><td>19.0</td><td>197.0</td><td>4775.0</td><td>MALE</td></tr>\n",
"<tr><th>5</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>46.7</td><td>15.3</td><td>219.0</td><td>5200.0</td><td>MALE</td></tr>\n",
"<tr><th>6</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>41.3</td><td>21.1</td><td>195.0</td><td>4400.0</td><td>MALE</td></tr>\n",
"<tr><th>7</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>45.2</td><td>13.8</td><td>215.0</td><td>4750.0</td><td>FEMALE</td></tr>\n",
"<tr><th>8</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>46.5</td><td>13.5</td><td>210.0</td><td>4550.0</td><td>FEMALE</td></tr>\n",
"<tr><th>9</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>50.5</td><td>15.2</td><td>216.0</td><td>5000.0</td><td>FEMALE</td></tr>\n",
"<tr><th>10</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>48.2</td><td>15.6</td><td>221.0</td><td>5100.0</td><td>MALE</td></tr>\n",
"<tr><th>11</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Dream</td><td>38.1</td><td>18.6</td><td>190.0</td><td>3700.0</td><td>FEMALE</td></tr>\n",
"<tr><th>12</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>50.7</td><td>15.0</td><td>223.0</td><td>5550.0</td><td>MALE</td></tr>\n",
"<tr><th>13</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>37.8</td><td>20.0</td><td>190.0</td><td>4250.0</td><td>MALE</td></tr>\n",
"<tr><th>14</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>35.0</td><td>17.9</td><td>190.0</td><td>3450.0</td><td>FEMALE</td></tr>\n",
"<tr><th>15</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>48.7</td><td>15.7</td><td>208.0</td><td>5350.0</td><td>MALE</td></tr>\n",
"<tr><th>16</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Torgersen</td><td>34.6</td><td>21.1</td><td>198.0</td><td>4400.0</td><td>MALE</td></tr>\n",
"<tr><th>17</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>46.8</td><td>15.4</td><td>215.0</td><td>5150.0</td><td>MALE</td></tr>\n",
"<tr><th>18</th><td>Chinstrap penguin (Pygoscelis antarctica)</td><td>Dream</td><td>50.3</td><td>20.0</td><td>197.0</td><td>3300.0</td><td>MALE</td></tr>\n",
"<tr><th>19</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Dream</td><td>37.2</td><td>18.1</td><td>178.0</td><td>3900.0</td><td>MALE</td></tr>\n",
"<tr><th>20</th><td>Chinstrap penguin (Pygoscelis antarctica)</td><td>Dream</td><td>51.0</td><td>18.8</td><td>203.0</td><td>4100.0</td><td>MALE</td></tr>\n",
"<tr><th>21</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Biscoe</td><td>40.5</td><td>17.9</td><td>187.0</td><td>3200.0</td><td>FEMALE</td></tr>\n",
"<tr><th>22</th><td>Gentoo penguin (Pygoscelis papua)</td><td>Biscoe</td><td>45.5</td><td>13.9</td><td>210.0</td><td>4200.0</td><td>FEMALE</td></tr>\n",
"<tr><th>23</th><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Dream</td><td>42.2</td><td>18.5</td><td>180.0</td><td>3550.0</td><td>FEMALE</td></tr>\n",
"<tr><th>24</th><td>Chinstrap penguin (Pygoscelis antarctica)</td><td>Dream</td><td>51.7</td><td>20.3</td><td>194.0</td><td>3775.0</td><td>MALE</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>25 rows × 7 columns</p>\n",
"
[344 rows x 7 columns in total]" ], "text/plain": [ " species island culmen_length_mm \\\n", "0 Adelie Penguin (Pygoscelis adeliae) Biscoe 40.1 \n", "1 Adelie Penguin (Pygoscelis adeliae) Torgersen 39.1 \n", "2 Gentoo penguin (Pygoscelis papua) Biscoe 47.4 \n", "3 Chinstrap penguin (Pygoscelis antarctica) Dream 42.5 \n", "4 Adelie Penguin (Pygoscelis adeliae) Biscoe 43.2 \n", "5 Gentoo penguin (Pygoscelis papua) Biscoe 46.7 \n", "6 Adelie Penguin (Pygoscelis adeliae) Biscoe 41.3 \n", "7 Gentoo penguin (Pygoscelis papua) Biscoe 45.2 \n", "8 Gentoo penguin (Pygoscelis papua) Biscoe 46.5 \n", "9 Gentoo penguin (Pygoscelis papua) Biscoe 50.5 \n", "10 Gentoo penguin (Pygoscelis papua) Biscoe 48.2 \n", "11 Adelie Penguin (Pygoscelis adeliae) Dream 38.1 \n", "12 Gentoo penguin (Pygoscelis papua) Biscoe 50.7 \n", "13 Adelie Penguin (Pygoscelis adeliae) Biscoe 37.8 \n", "14 Adelie Penguin (Pygoscelis adeliae) Biscoe 35.0 \n", "15 Gentoo penguin (Pygoscelis papua) Biscoe 48.7 \n", "16 Adelie Penguin (Pygoscelis adeliae) Torgersen 34.6 \n", "17 Gentoo penguin (Pygoscelis papua) Biscoe 46.8 \n", "18 Chinstrap penguin (Pygoscelis antarctica) Dream 50.3 \n", "19 Adelie Penguin (Pygoscelis adeliae) Dream 37.2 \n", "20 Chinstrap penguin (Pygoscelis antarctica) Dream 51.0 \n", "21 Adelie Penguin (Pygoscelis adeliae) Biscoe 40.5 \n", "22 Gentoo penguin (Pygoscelis papua) Biscoe 45.5 \n", "23 Adelie Penguin (Pygoscelis adeliae) Dream 42.2 \n", "24 Chinstrap penguin (Pygoscelis antarctica) Dream 51.7 \n", "\n", " culmen_depth_mm flipper_length_mm body_mass_g sex \n", "0 18.9 188.0 4300.0 MALE \n", "1 18.7 181.0 3750.0 MALE \n", "2 14.6 212.0 4725.0 FEMALE \n", "3 16.7 187.0 3350.0 FEMALE \n", "4 19.0 197.0 4775.0 MALE \n", "5 15.3 219.0 5200.0 MALE \n", "6 21.1 195.0 4400.0 MALE \n", "7 13.8 215.0 4750.0 FEMALE \n", "8 13.5 210.0 4550.0 FEMALE \n", "9 15.2 216.0 5000.0 FEMALE \n", "10 15.6 221.0 5100.0 MALE \n", "11 18.6 190.0 3700.0 FEMALE \n", "12 15.0 223.0 5550.0 MALE \n", "13 
20.0 190.0 4250.0 MALE \n", "14 17.9 190.0 3450.0 FEMALE \n", "15 15.7 208.0 5350.0 MALE \n", "16 21.1 198.0 4400.0 MALE \n", "17 15.4 215.0 5150.0 MALE \n", "18 20.0 197.0 3300.0 MALE \n", "19 18.1 178.0 3900.0 MALE \n", "20 18.8 203.0 4100.0 MALE \n", "21 17.9 187.0 3200.0 FEMALE \n", "22 13.9 210.0 4200.0 FEMALE \n", "23 18.5 180.0 3550.0 FEMALE \n", "24 20.3 194.0 3775.0 MALE \n", "...\n", "\n", "[344 rows x 7 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Initialize BigQuery DataFrame\n", "import bigframes.pandas\n", "\n", "# read a BigQuery table to a BigQuery DataFrame\n", "df = bigframes.pandas.read_gbq(\"bigframes-dev.bqml_tutorial.penguins\")\n", "\n", "# take a peek at the dataframe\n", "df" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Data cleaning / prep" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job 81305962-a96a-4c86-949c-471b2ae7c86d is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 2af0b0d6-c11b-499e-8d25-a2c628b2853b is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>island</th><th>culmen_length_mm</th><th>culmen_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Biscoe</td><td>40.1</td><td>18.9</td><td>188.0</td><td>4300.0</td><td>MALE</td></tr>\n",
"<tr><th>1</th><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181.0</td><td>3750.0</td><td>MALE</td></tr>\n",
"<tr><th>4</th><td>Biscoe</td><td>43.2</td><td>19.0</td><td>197.0</td><td>4775.0</td><td>MALE</td></tr>\n",
"<tr><th>6</th><td>Biscoe</td><td>41.3</td><td>21.1</td><td>195.0</td><td>4400.0</td><td>MALE</td></tr>\n",
"<tr><th>11</th><td>Dream</td><td>38.1</td><td>18.6</td><td>190.0</td><td>3700.0</td><td>FEMALE</td></tr>\n",
"<tr><th>13</th><td>Biscoe</td><td>37.8</td><td>20.0</td><td>190.0</td><td>4250.0</td><td>MALE</td></tr>\n",
"<tr><th>14</th><td>Biscoe</td><td>35.0</td><td>17.9</td><td>190.0</td><td>3450.0</td><td>FEMALE</td></tr>\n",
"<tr><th>16</th><td>Torgersen</td><td>34.6</td><td>21.1</td><td>198.0</td><td>4400.0</td><td>MALE</td></tr>\n",
"<tr><th>19</th><td>Dream</td><td>37.2</td><td>18.1</td><td>178.0</td><td>3900.0</td><td>MALE</td></tr>\n",
"<tr><th>21</th><td>Biscoe</td><td>40.5</td><td>17.9</td><td>187.0</td><td>3200.0</td><td>FEMALE</td></tr>\n",
"<tr><th>23</th><td>Dream</td><td>42.2</td><td>18.5</td><td>180.0</td><td>3550.0</td><td>FEMALE</td></tr>\n",
"<tr><th>30</th><td>Dream</td><td>39.2</td><td>21.1</td><td>196.0</td><td>4150.0</td><td>MALE</td></tr>\n",
"<tr><th>32</th><td>Torgersen</td><td>42.9</td><td>17.6</td><td>196.0</td><td>4700.0</td><td>MALE</td></tr>\n",
"<tr><th>38</th><td>Dream</td><td>41.1</td><td>17.5</td><td>190.0</td><td>3900.0</td><td>MALE</td></tr>\n",
"<tr><th>40</th><td>Torgersen</td><td>38.6</td><td>21.2</td><td>191.0</td><td>3800.0</td><td>MALE</td></tr>\n",
"<tr><th>42</th><td>Biscoe</td><td>35.5</td><td>16.2</td><td>195.0</td><td>3350.0</td><td>FEMALE</td></tr>\n",
"<tr><th>44</th><td>Dream</td><td>39.2</td><td>18.6</td><td>190.0</td><td>4250.0</td><td>MALE</td></tr>\n",
"<tr><th>45</th><td>Torgersen</td><td>35.2</td><td>15.9</td><td>186.0</td><td>3050.0</td><td>FEMALE</td></tr>\n",
"<tr><th>46</th><td>Dream</td><td>43.2</td><td>18.5</td><td>192.0</td><td>4100.0</td><td>MALE</td></tr>\n",
"<tr><th>49</th><td>Biscoe</td><td>39.6</td><td>17.7</td><td>186.0</td><td>3500.0</td><td>FEMALE</td></tr>\n",
"<tr><th>53</th><td>Biscoe</td><td>45.6</td><td>20.3</td><td>191.0</td><td>4600.0</td><td>MALE</td></tr>\n",
"<tr><th>58</th><td>Torgersen</td><td>40.9</td><td>16.8</td><td>191.0</td><td>3700.0</td><td>FEMALE</td></tr>\n",
"<tr><th>60</th><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195.0</td><td>3250.0</td><td>FEMALE</td></tr>\n",
"<tr><th>62</th><td>Dream</td><td>36.0</td><td>18.5</td><td>186.0</td><td>3100.0</td><td>FEMALE</td></tr>\n",
"<tr><th>63</th><td>Torgersen</td><td>39.3</td><td>20.6</td><td>190.0</td><td>3650.0</td><td>MALE</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>25 rows × 6 columns</p>\n",
"
[146 rows x 6 columns in total]" ], "text/plain": [ " island culmen_length_mm culmen_depth_mm flipper_length_mm \\\n", "0 Biscoe 40.1 18.9 188.0 \n", "1 Torgersen 39.1 18.7 181.0 \n", "4 Biscoe 43.2 19.0 197.0 \n", "6 Biscoe 41.3 21.1 195.0 \n", "11 Dream 38.1 18.6 190.0 \n", "13 Biscoe 37.8 20.0 190.0 \n", "14 Biscoe 35.0 17.9 190.0 \n", "16 Torgersen 34.6 21.1 198.0 \n", "19 Dream 37.2 18.1 178.0 \n", "21 Biscoe 40.5 17.9 187.0 \n", "23 Dream 42.2 18.5 180.0 \n", "30 Dream 39.2 21.1 196.0 \n", "32 Torgersen 42.9 17.6 196.0 \n", "38 Dream 41.1 17.5 190.0 \n", "40 Torgersen 38.6 21.2 191.0 \n", "42 Biscoe 35.5 16.2 195.0 \n", "44 Dream 39.2 18.6 190.0 \n", "45 Torgersen 35.2 15.9 186.0 \n", "46 Dream 43.2 18.5 192.0 \n", "49 Biscoe 39.6 17.7 186.0 \n", "53 Biscoe 45.6 20.3 191.0 \n", "58 Torgersen 40.9 16.8 191.0 \n", "60 Torgersen 40.3 18.0 195.0 \n", "62 Dream 36.0 18.5 186.0 \n", "63 Torgersen 39.3 20.6 190.0 \n", "\n", " body_mass_g sex \n", "0 4300.0 MALE \n", "1 3750.0 MALE \n", "4 4775.0 MALE \n", "6 4400.0 MALE \n", "11 3700.0 FEMALE \n", "13 4250.0 MALE \n", "14 3450.0 FEMALE \n", "16 4400.0 MALE \n", "19 3900.0 MALE \n", "21 3200.0 FEMALE \n", "23 3550.0 FEMALE \n", "30 4150.0 MALE \n", "32 4700.0 MALE \n", "38 3900.0 MALE \n", "40 3800.0 MALE \n", "42 3350.0 FEMALE \n", "44 4250.0 MALE \n", "45 3050.0 FEMALE \n", "46 4100.0 MALE \n", "49 3500.0 FEMALE \n", "53 4600.0 MALE \n", "58 3700.0 FEMALE \n", "60 3250.0 FEMALE \n", "62 3100.0 FEMALE \n", "63 3650.0 MALE \n", "...\n", "\n", "[146 rows x 6 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# filter down to the data we want to analyze\n", "adelie_data = df[df.species == \"Adelie Penguin (Pygoscelis adeliae)\"]\n", "\n", "# drop the columns we don't care about\n", "adelie_data = adelie_data.drop(columns=[\"species\"])\n", "\n", "# drop rows with nulls to get our training data\n", "training_data = adelie_data.dropna()\n", "\n", "# take a peek at the 
training data\n", "training_data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Use `model_selection.train_test_split` to split the data into training and test sets" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job 0808457b-a0df-4a37-b7a5-8885f4a4588c is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from bigframes.ml.model_selection import train_test_split\n", "\n", "# separate the input features from the label we want to predict\n", "feature_columns = training_data[['island', 'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'sex']]\n", "label_columns = training_data[['body_mass_g']]\n", "\n", "# hold out 20% of the rows for testing\n", "X_train, X_test, y_train, y_test = train_test_split(\n", "    feature_columns, label_columns, test_size=0.2)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Configure a linear regression pipeline with preprocessing" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Pipeline(steps=[('preproc',\n", " ColumnTransformer(transformers=[('onehot', OneHotEncoder(),\n", " ['island', 'species', 'sex']),\n", " ('scaler', StandardScaler(),\n", " ['culmen_depth_mm',\n", " 'culmen_length_mm',\n", " 'flipper_length_mm'])])),\n", " ('linreg', LinearRegression(fit_intercept=False))])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bigframes.ml.linear_model import LinearRegression\n", "from bigframes.ml.pipeline import Pipeline\n", "from bigframes.ml.compose import ColumnTransformer\n", "from bigframes.ml.preprocessing import StandardScaler, OneHotEncoder\n", "\n", "# one-hot encode the categorical columns and standardize the numeric ones\n", "preprocessing = ColumnTransformer([\n", "    (\"onehot\", OneHotEncoder(), [\"island\", \"sex\"]),\n", "    (\"scaler\", StandardScaler(), [\"culmen_depth_mm\", \"culmen_length_mm\", \"flipper_length_mm\"]),\n", "])\n", "\n", "model = LinearRegression(fit_intercept=False)\n", "\n", 
"pipeline = Pipeline([\n", " ('preproc', preprocessing),\n", " ('linreg', model)\n", "])\n", "\n", "# TODO(bmil): pretty printing for pipelines\n", "pipeline" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Fit the pipeline to the training data\n", "\n", "This will create a temporary BQML model in BigQuery" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job e9bfa6a5-a53f-4d8b-ae8c-cc8cd55d0947 is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job d8d553cf-3d36-49aa-b18b-9a05576a1fb0 is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 75ef0083-9a4f-4ffb-a6c6-d82974a1659f is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "Pipeline(steps=[('preproc',\n", " ColumnTransformer(transformers=[('onehot', OneHotEncoder(),\n", " ['island', 'species', 'sex']),\n", " ('scaler', StandardScaler(),\n", " ['culmen_depth_mm',\n", " 'culmen_length_mm',\n", " 'flipper_length_mm'])])),\n", " ('linreg', LinearRegression(fit_intercept=False))])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipeline.fit(X_train, y_train)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Score the pipeline on the test data with `metrics.r2_score`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job 55c5a9ce-8159-4a1a-99a4-af3a906640ba is DONE. 29.3 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 3e41c470-de70-4f13-89d9-c5564d0b2836 is DONE. 232 Bytes processed. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job ed2f9042-a737-4d13-bd21-8c3d29cd61a2 is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 815d16b5-0a5d-42be-a766-1cff5b8f22f2 is DONE. 28.9 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 37a38dc6-5073-4544-a1e3-da145a843922 is DONE. 29.4 kB processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0.2655729213572775" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bigframes.ml.metrics import r2_score\n", "\n", "# predict() returns a DataFrame; keep only the predicted label column\n", "y_pred = pipeline.predict(X_test)[\"predicted_body_mass_g\"]\n", "\n", "# compare the held-out labels with the predictions\n", "r2_score(y_test, y_pred)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Run inference on new data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Load job 7b46750c-70b4-468d-87ba-9f84f579f2a6 is DONE. 
Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas\n", "\n", "new_penguins = bigframes.pandas.read_pandas(\n", " pandas.DataFrame(\n", " {\n", " \"tag_number\": [1633, 1672, 1690],\n", " \"species\": [\n", " \"Adelie Penguin (Pygoscelis adeliae)\",\n", " \"Adelie Penguin (Pygoscelis adeliae)\",\n", " \"Adelie Penguin (Pygoscelis adeliae)\",\n", " ],\n", " \"island\": [\"Torgersen\", \"Torgersen\", \"Dream\"],\n", " \"culmen_length_mm\": [39.5, 38.5, 37.9],\n", " \"culmen_depth_mm\": [18.8, 17.2, 18.1],\n", " \"flipper_length_mm\": [196.0, 181.0, 188.0],\n", " \"sex\": [\"MALE\", \"FEMALE\", \"FEMALE\"],\n", " }\n", " ).set_index(\"tag_number\")\n", " )" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Query job d10dd37d-5e8e-4e15-9c83-a7e9a4c592a8 is DONE. 593 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 207cb787-cf8a-43ea-8e73-644d3f58b11a is DONE. 24 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job c5dc5075-cac0-4947-9e9f-06aa9cc5bd2a is DONE. 0 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Query job 2ca4a569-7186-48ed-b3e4-004dca704798 is DONE. 282 Bytes processed. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th>tag_number</th><th>predicted_body_mass_g</th><th>species</th><th>island</th><th>culmen_length_mm</th><th>culmen_depth_mm</th><th>flipper_length_mm</th><th>sex</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>1633</th><td>4017.203152</td><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Torgersen</td><td>39.5</td><td>18.8</td><td>196.0</td><td>MALE</td></tr>\n",
"<tr><th>1672</th><td>3127.601519</td><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Torgersen</td><td>38.5</td><td>17.2</td><td>181.0</td><td>FEMALE</td></tr>\n",
"<tr><th>1690</th><td>3386.101231</td><td>Adelie Penguin (Pygoscelis adeliae)</td><td>Dream</td><td>37.9</td><td>18.1</td><td>188.0</td><td>FEMALE</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>3 rows × 7 columns</p>\n",
"
[3 rows x 7 columns in total]" ], "text/plain": [ " predicted_body_mass_g species \\\n", "tag_number \n", "1633 4017.203152 Adelie Penguin (Pygoscelis adeliae) \n", "1672 3127.601519 Adelie Penguin (Pygoscelis adeliae) \n", "1690 3386.101231 Adelie Penguin (Pygoscelis adeliae) \n", "\n", " island culmen_length_mm culmen_depth_mm flipper_length_mm \\\n", "tag_number \n", "1633 Torgersen 39.5 18.8 196.0 \n", "1672 Torgersen 38.5 17.2 181.0 \n", "1690 Dream 37.9 18.1 188.0 \n", "\n", " sex \n", "tag_number \n", "1633 MALE \n", "1672 FEMALE \n", "1690 FEMALE \n", "\n", "[3 rows x 7 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# predict body mass for the new penguins\n", "pipeline.predict(new_penguins)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Save the model in BigQuery" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Copy job d1def4a4-1da1-43a9-8ae5-4459444d993d is DONE. Open Job" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "Pipeline(steps=[('transform',\n", " ColumnTransformer(transformers=[('ont_hot_encoder',\n", " OneHotEncoder(max_categories=1000001,\n", " min_frequency=0),\n", " 'island'),\n", " ('standard_scaler',\n", " StandardScaler(),\n", " 'culmen_length_mm'),\n", " ('standard_scaler',\n", " StandardScaler(),\n", " 'culmen_depth_mm'),\n", " ('standard_scaler',\n", " StandardScaler(),\n", " 'flipper_length_mm'),\n", " ('ont_hot_encoder',\n", " OneHotEncoder(max_categories=1000001,\n", " min_frequency=0),\n", " 'sex')])),\n", " ('estimator',\n", " LinearRegression(fit_intercept=False,\n", " optimize_strategy='NORMAL_EQUATION'))])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save the fitted pipeline under this model name, replacing any existing model\n", "pipeline.to_gbq(\"bigframes-dev.bigframes_demo_us.penguin_model\", replace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] 
} ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "a850322d07d9bdc9ec5f301d307e048bcab2390ae395e1cbce9335f4e081e5e2" } } }, "nbformat": 4, "nbformat_minor": 2 }