{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Copyright 2023 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Use BigQuery DataFrames to cluster and characterize complaints", "\n", "

|  | consumer_complaint_narrative |
|---|---|
| 2557016 | I've been disputing fraud accounts on my credi... |
| 2557686 | American Express Platinum totally messed up my... |
| 2558170 | I recently looked at my credit report and noti... |
| 2558545 | Select Portfolio Servicing contacted my insura... |
| 2558652 | I checked my credit report and I am upset on w... |

|  | ml_generate_embedding_result | ml_generate_embedding_statistics | ml_generate_embedding_status | content |
|---|---|---|---|---|
| 415 | [ 2.56774724e-02 -1.06168222e-02 3.06945704e-... | {"token_count":171,"truncated":false} |  | DEPT OF EDUCATION/XXXX is stating I was late ... |
| 596 | [ 5.90653270e-02 -9.31344274e-03 -7.12460047e-... | {"token_count":668,"truncated":false} |  | I alerted my credit card company XX/XX/2017 th... |
| 706 | [ 0.01298233 0.00130001 0.01800315 0.037078... | {"token_count":252,"truncated":false} |  | Sallie mae is corrupt. I have tried to talk t... |
| 804 | [-1.39777679e-02 1.68943349e-02 5.53999236e-... | {"token_count":412,"truncated":false} |  | In accordance with the Fair Credit Reporting a... |
| 861 | [ 2.33309343e-02 -2.36528926e-03 3.37129943e-... | {"token_count":160,"truncated":false} |  | Hello, My name is XXXX XXXX XXXX. I have a pro... |
| 1030 | [ 0.06060313 -0.06495965 -0.03605044 -0.028016... | {"token_count":298,"truncated":false} |  | Hello, I would like to complain about PayPal H... |
| 1582 | [ 0.01255985 -0.01652482 -0.02638046 0.036858... | {"token_count":814,"truncated":false} |  | Transunion is listing personal information ( n... |
| 1600 | [ 5.13355099e-02 4.01246967e-03 5.72342947e-... | {"token_count":653,"truncated":false} |  | On XX/XX/XXXX, I called Citizen Bank at XXXX t... |
| 2060 | [ 6.44792162e-04 4.95899878e-02 4.67925966e-... | {"token_count":136,"truncated":false} |  | Theses names are the known liars that I have s... |
| 2283 | [ 4.71848622e-02 -8.68239347e-03 5.80501892e-... | {"token_count":478,"truncated":false} |  | My house was hit by a tree XX/XX/2018. My insu... |
| 2421 | [-2.90394691e-03 -1.81679502e-02 -7.99657404e-... | {"token_count":389,"truncated":false} |  | I became aware of a credit inquiry on my XXXX... |
| 2422 | [-6.70500053e-03 1.51133696e-02 4.94448021e-... | {"token_count":124,"truncated":false} |  | I have sent numerous letters, police reports a... |
| 2658 | [ 6.70989677e-02 -3.53626162e-02 1.08648362e-... | {"token_count":762,"truncated":false} |  | This letter concerns two disputes ( chargeback... |
| 2883 | [-1.28255319e-02 -1.89735275e-02 5.68657108e-... | {"token_count":71,"truncated":false} |  | It is very frustrating that this has been goin... |
| 2951 | [ 3.23301251e-03 -2.61142217e-02 1.31891826e-... | {"token_count":95,"truncated":false} |  | I, the consumer, in fact, have a right to priv... |
| 2992 | [-2.22910382e-03 -1.07050659e-02 4.74211425e-... | {"token_count":407,"truncated":false} |  | XXXX XXXX XXXX should not be reporting to Expe... |
| 3969 | [ 1.58297736e-02 3.01055871e-02 5.60088176e-... | {"token_count":287,"truncated":false} |  | DEAR CFPB ; XXXX ; XXXX ; AND TRANSUNION ; SEE... |
| 4087 | [ 1.99207035e-03 -7.62321474e-03 7.92114343e-... | {"token_count":88,"truncated":false} |  | This debt was from my identity being stolen I ... |
| 4326 | [ 3.44273262e-02 -3.36350128e-02 1.91939529e-... | {"token_count":52,"truncated":false} |  | The items that are reflected on my credit repo... |
| 4682 | [ 2.47727744e-02 -1.77769139e-02 4.63737026e-... | {"token_count":284,"truncated":false} |  | I filed for chapter XXXX bankruptcy on XXXX... |
| 5005 | [ 2.51834448e-02 -4.92606424e-02 -1.37688573e-... | {"token_count":17,"truncated":false} |  | There are 2 Inquires on my credit report that ... |
| 5144 | [ 3.26358266e-02 -3.67171178e-03 3.65621522e-... | {"token_count":105,"truncated":false} |  | My mortgage was sold from XXXX XXXX to freed... |
| 6090 | [ 2.47520711e-02 1.09149124e-02 1.35175223e-... | {"token_count":545,"truncated":false} |  | On XX/XX/XXXX this company received certified... |
| 6449 | [ 1.86854266e-02 1.31238240e-03 -4.96791191e-... | {"token_count":104,"truncated":false} |  | After hours on the phone with multiple agents,... |
| 6486 | [ 1.56347770e-02 2.23377198e-02 -1.32683543e-... | {"token_count":211,"truncated":false} |  | On XX/XX/2019 two charges one for XXXX and one... |

25 rows × 4 columns

|  | CENTROID_ID | NEAREST_CENTROIDS_DISTANCE | ml_generate_embedding_result | ml_generate_embedding_statistics | ml_generate_embedding_status | content |
|---|---|---|---|---|---|---|
| 3172121 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.756634267893... | [ 3.18095312e-02 -3.54472063e-02 -7.13569671e-... | {"token_count":10,"truncated":false} |  | Company did not provide verification and detai... |
| 2137420 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.606628249825... | [ 1.91578846e-02 5.55988774e-02 8.88887007e-... | {"token_count":100,"truncated":false} |  | I have already filed a dispute with Consumer A... |
| 2350775 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.606676295233... | [ 2.25369893e-02 2.29400061e-02 -6.42273854e-... | {"token_count":100,"truncated":false} |  | I informed Central Financial Control & provide... |
| 2904146 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.596729348974... | [ 9.35115516e-02 4.27814946e-03 4.62085977e-... | {"token_count":100,"truncated":false} |  | I received a letter from a collections agency ... |
| 1075571 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.453806107968... | [-1.93953840e-03 -5.80236455e-03 8.49655271e-... | {"token_count":100,"truncated":false} |  | I have not done business with this company, i ... |