# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Use BigQuery DataFrames to cluster and characterize complaints#
Overview#
The goal of this notebook is to demonstrate a comment characterization algorithm for an online business. We will accomplish this using Google’s Embedding Models and KMeans clustering in three steps:
Use TextEmbeddingGenerator to generate text embeddings for each of 10000 complaints sent to an online bank. If you’re not familiar with what a text embedding is, it’s a list of numbers that are like coordinates in an imaginary “meaning space” for sentences. (It’s like word embeddings, but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.
Use KMeans clustering to group together complaints whose text embeddings are near each other. This will give us sets of similar complaints, but we don’t yet know why these complaints are similar.
Prompt GeminiTextGenerator in English asking what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to “understand the limits of your dataset and model.”
We will tie these pieces together in Python using BigQuery DataFrames. Click here to learn more about BigQuery DataFrames!
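To make the “meaning space” idea concrete, here is a minimal sketch in plain Python. The three-number vectors and their names are invented for illustration (real text embeddings have hundreds of dimensions), and cosine similarity is only one common way to measure closeness, not necessarily the measure used later in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three complaints (not model output).
late_fee = [0.9, 0.1, 0.0]
overdraft_fee = [0.8, 0.2, 0.1]
identity_theft = [0.0, 0.2, 0.9]

# The two fee complaints point in nearly the same direction, so they score
# higher than the pair about unrelated topics.
print(cosine_similarity(late_fee, overdraft_fee))
print(cosine_similarity(late_fee, identity_theft))
```

Under these made-up values, the two fee-related vectors are far more similar to each other than either is to the identity-theft vector, which is the property the clustering step relies on.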
Dataset#
This notebook uses the CFPB Consumer Complaint Database.
Costs#
This tutorial uses billable components of Google Cloud:
BigQuery (compute)
BigQuery ML
Generative AI support on Vertex AI
Learn about BigQuery compute pricing, Generative AI support on Vertex AI pricing, and BigQuery ML pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.
Before you begin#
Complete the tasks in this section to set up your environment.
Set up your Google Cloud project#
The following steps are required, regardless of your notebook environment.
Select or create a Google Cloud project. When you first create an account, you get a $300 credit towards your compute/storage costs.
Click here to enable the following APIs:
BigQuery API
BigQuery Connection API
Vertex AI API
If you are running this notebook locally, install the Cloud SDK.
Set your project ID#
If you don’t know your project ID, see the support page: Locate the project ID
# set your project ID below
PROJECT_ID = "" # @param {type:"string"}
# Set the project id in gcloud
#! gcloud config set project {PROJECT_ID}
Authenticate your Google Cloud account#
Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below.
Vertex AI Workbench
Do nothing; you are already authenticated.
Local JupyterLab instance
Uncomment and run the following cell:
# ! gcloud auth login
Colab
Uncomment and run the following cell:
# from google.colab import auth
# auth.authenticate_user()
Now we are ready to use BigQuery DataFrames!
Step 1: Text embedding#
BigQuery DataFrames setup
import bigframes.pandas as bf
# Note: The project option is not required in all environments.
# On BigQuery Studio, the project ID is automatically detected.
bf.options.bigquery.project = PROJECT_ID
If you want to reset the location of the created DataFrame or Series objects, reset the session by executing bf.close_session(). After that, you can reuse bf.options.bigquery.location to specify another location.
Data Input - read the data from a publicly available BigQuery dataset
input_df = bf.read_gbq("bigquery-public-data.cfpb_complaints.complaint_database")
issues_df = input_df[["consumer_complaint_narrative"]].dropna()
issues_df.peek(n=5) # View an arbitrary five complaints
| | consumer_complaint_narrative |
|---|---|
| 2557016 | I've been disputing fraud accounts on my credi... |
| 2557686 | American Express Platinum totally messed up my... |
| 2558170 | I recently looked at my credit report and noti... |
| 2558545 | Select Portfolio Servicing contacted my insura... |
| 2558652 | I checked my credit report and I am upset on w... |
Downsample DataFrame to 10,000 records for model training.
# Choose 10,000 complaints randomly and store them in a column in a DataFrame
downsampled_issues_df = issues_df.sample(n=10000)
Generate the text embeddings
from bigframes.ml.llm import TextEmbeddingGenerator
model = TextEmbeddingGenerator() # No connection id needed
# Will take ~3 minutes to compute the embeddings
predicted_embeddings = model.predict(downsampled_issues_df)
# Notice the lists of numbers that are our text embeddings for each complaint
predicted_embeddings
/usr/local/google/home/garrettwu/src/bigframes/bigframes/core/__init__.py:108: PreviewWarning: Interpreting JSON column(s) as StringDtype. This behavior may change in future versions.
warnings.warn(
| | ml_generate_embedding_result | ml_generate_embedding_statistics | ml_generate_embedding_status | content |
|---|---|---|---|---|
| 415 | [ 2.56774724e-02 -1.06168222e-02 3.06945704e-... | {"token_count":171,"truncated":false} | | DEPT OF EDUCATION/XXXX is stating I was late ... |
| 596 | [ 5.90653270e-02 -9.31344274e-03 -7.12460047e-... | {"token_count":668,"truncated":false} | | I alerted my credit card company XX/XX/2017 th... |
| 706 | [ 0.01298233 0.00130001 0.01800315 0.037078... | {"token_count":252,"truncated":false} | | Sallie mae is corrupt. I have tried to talk t... |
| 804 | [-1.39777679e-02 1.68943349e-02 5.53999236e-... | {"token_count":412,"truncated":false} | | In accordance with the Fair Credit Reporting a... |
| 861 | [ 2.33309343e-02 -2.36528926e-03 3.37129943e-... | {"token_count":160,"truncated":false} | | Hello, My name is XXXX XXXX XXXX. I have a pro... |
| 1030 | [ 0.06060313 -0.06495965 -0.03605044 -0.028016... | {"token_count":298,"truncated":false} | | Hello, I would like to complain about PayPal H... |
| 1582 | [ 0.01255985 -0.01652482 -0.02638046 0.036858... | {"token_count":814,"truncated":false} | | Transunion is listing personal information ( n... |
| 1600 | [ 5.13355099e-02 4.01246967e-03 5.72342947e-... | {"token_count":653,"truncated":false} | | On XX/XX/XXXX, I called Citizen Bank at XXXX t... |
| 2060 | [ 6.44792162e-04 4.95899878e-02 4.67925966e-... | {"token_count":136,"truncated":false} | | Theses names are the known liars that I have s... |
| 2283 | [ 4.71848622e-02 -8.68239347e-03 5.80501892e-... | {"token_count":478,"truncated":false} | | My house was hit by a tree XX/XX/2018. My insu... |
| 2421 | [-2.90394691e-03 -1.81679502e-02 -7.99657404e-... | {"token_count":389,"truncated":false} | | I became aware of a credit inquiry on my XXXX... |
| 2422 | [-6.70500053e-03 1.51133696e-02 4.94448021e-... | {"token_count":124,"truncated":false} | | I have sent numerous letters, police reports a... |
| 2658 | [ 6.70989677e-02 -3.53626162e-02 1.08648362e-... | {"token_count":762,"truncated":false} | | This letter concerns two disputes ( chargeback... |
| 2883 | [-1.28255319e-02 -1.89735275e-02 5.68657108e-... | {"token_count":71,"truncated":false} | | It is very frustrating that this has been goin... |
| 2951 | [ 3.23301251e-03 -2.61142217e-02 1.31891826e-... | {"token_count":95,"truncated":false} | | I, the consumer, in fact, have a right to priv... |
| 2992 | [-2.22910382e-03 -1.07050659e-02 4.74211425e-... | {"token_count":407,"truncated":false} | | XXXX XXXX XXXX should not be reporting to Expe... |
| 3969 | [ 1.58297736e-02 3.01055871e-02 5.60088176e-... | {"token_count":287,"truncated":false} | | DEAR CFPB ; XXXX ; XXXX ; AND TRANSUNION ; SEE... |
| 4087 | [ 1.99207035e-03 -7.62321474e-03 7.92114343e-... | {"token_count":88,"truncated":false} | | This debt was from my identity being stolen I ... |
| 4326 | [ 3.44273262e-02 -3.36350128e-02 1.91939529e-... | {"token_count":52,"truncated":false} | | The items that are reflected on my credit repo... |
| 4682 | [ 2.47727744e-02 -1.77769139e-02 4.63737026e-... | {"token_count":284,"truncated":false} | | I filed for chapter XXXX bankruptcy on XXXX... |
| 5005 | [ 2.51834448e-02 -4.92606424e-02 -1.37688573e-... | {"token_count":17,"truncated":false} | | There are 2 Inquires on my credit report that ... |
| 5144 | [ 3.26358266e-02 -3.67171178e-03 3.65621522e-... | {"token_count":105,"truncated":false} | | My mortgage was sold from XXXX XXXX to freed... |
| 6090 | [ 2.47520711e-02 1.09149124e-02 1.35175223e-... | {"token_count":545,"truncated":false} | | On XX/XX/XXXX this company received certified... |
| 6449 | [ 1.86854266e-02 1.31238240e-03 -4.96791191e-... | {"token_count":104,"truncated":false} | | After hours on the phone with multiple agents,... |
| 6486 | [ 1.56347770e-02 2.23377198e-02 -1.32683543e-... | {"token_count":211,"truncated":false} | | On XX/XX/2019 two charges one for XXXX and one... |
25 rows × 4 columns
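The ml_generate_embedding_statistics values in this output are displayed as JSON strings, so they need to be parsed before you can inspect them. The following is a minimal sketch in plain Python, assuming each value has the `token_count` and `truncated` keys seen above; `parse_statistics` is a hypothetical helper name, not part of BigQuery DataFrames.

```python
import json


def parse_statistics(statistics_json):
    """Parse one statistics string into (token_count, truncated).

    Assumes the two keys shown in the output above are present.
    """
    stats = json.loads(statistics_json)
    return stats["token_count"], stats["truncated"]


# Value copied from the first displayed row of the output.
token_count, truncated = parse_statistics('{"token_count":171,"truncated":false}')
print(token_count, truncated)
```

A check like this could be used to spot complaints whose text was cut off before embedding, since a truncated input may be represented less faithfully.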
The model may have encountered errors while calculating embeddings for some rows. Filter out the errored rows before training the model. Alternatively, select these rows and retry the embeddings.
successful_rows = (
    (predicted_embeddings["ml_generate_embedding_status"] == "")
    # Series.str.len() gives the length of an array.
    # See: https://stackoverflow.com/a/41340543/101923
    & (predicted_embeddings["ml_generate_embedding_result"].str.len() != 0)
)
predicted_embeddings = predicted_embeddings[successful_rows]
predicted_embeddings
| | ml_generate_embedding_result | ml_generate_embedding_statistics | ml_generate_embedding_status | content |
|---|---|---|---|---|
| 415 | [ 2.56774724e-02 -1.06168222e-02 3.06945704e-... | {"token_count":171,"truncated":false} | | DEPT OF EDUCATION/XXXX is stating I was late ... |
| 596 | [ 5.90653270e-02 -9.31344274e-03 -7.12460047e-... | {"token_count":668,"truncated":false} | | I alerted my credit card company XX/XX/2017 th... |
| 706 | [ 0.01298233 0.00130001 0.01800315 0.037078... | {"token_count":252,"truncated":false} | | Sallie mae is corrupt. I have tried to talk t... |
| 804 | [-1.39777679e-02 1.68943349e-02 5.53999236e-... | {"token_count":412,"truncated":false} | | In accordance with the Fair Credit Reporting a... |
| 861 | [ 2.33309343e-02 -2.36528926e-03 3.37129943e-... | {"token_count":160,"truncated":false} | | Hello, My name is XXXX XXXX XXXX. I have a pro... |
| 1030 | [ 0.06060313 -0.06495965 -0.03605044 -0.028016... | {"token_count":298,"truncated":false} | | Hello, I would like to complain about PayPal H... |
| 1582 | [ 0.01255985 -0.01652482 -0.02638046 0.036858... | {"token_count":814,"truncated":false} | | Transunion is listing personal information ( n... |
| 1600 | [ 5.13355099e-02 4.01246967e-03 5.72342947e-... | {"token_count":653,"truncated":false} | | On XX/XX/XXXX, I called Citizen Bank at XXXX t... |
| 2060 | [ 6.44792162e-04 4.95899878e-02 4.67925966e-... | {"token_count":136,"truncated":false} | | Theses names are the known liars that I have s... |
| 2283 | [ 4.71848622e-02 -8.68239347e-03 5.80501892e-... | {"token_count":478,"truncated":false} | | My house was hit by a tree XX/XX/2018. My insu... |
| 2421 | [-2.90394691e-03 -1.81679502e-02 -7.99657404e-... | {"token_count":389,"truncated":false} | | I became aware of a credit inquiry on my XXXX... |
| 2422 | [-6.70500053e-03 1.51133696e-02 4.94448021e-... | {"token_count":124,"truncated":false} | | I have sent numerous letters, police reports a... |
| 2658 | [ 6.70989677e-02 -3.53626162e-02 1.08648362e-... | {"token_count":762,"truncated":false} | | This letter concerns two disputes ( chargeback... |
| 2883 | [-1.28255319e-02 -1.89735275e-02 5.68657108e-... | {"token_count":71,"truncated":false} | | It is very frustrating that this has been goin... |
| 2951 | [ 3.23301251e-03 -2.61142217e-02 1.31891826e-... | {"token_count":95,"truncated":false} | | I, the consumer, in fact, have a right to priv... |
| 2992 | [-2.22910382e-03 -1.07050659e-02 4.74211425e-... | {"token_count":407,"truncated":false} | | XXXX XXXX XXXX should not be reporting to Expe... |
| 3969 | [ 1.58297736e-02 3.01055871e-02 5.60088176e-... | {"token_count":287,"truncated":false} | | DEAR CFPB ; XXXX ; XXXX ; AND TRANSUNION ; SEE... |
| 4087 | [ 1.99207035e-03 -7.62321474e-03 7.92114343e-... | {"token_count":88,"truncated":false} | | This debt was from my identity being stolen I ... |
| 4326 | [ 3.44273262e-02 -3.36350128e-02 1.91939529e-... | {"token_count":52,"truncated":false} | | The items that are reflected on my credit repo... |
| 4682 | [ 2.47727744e-02 -1.77769139e-02 4.63737026e-... | {"token_count":284,"truncated":false} | | I filed for chapter XXXX bankruptcy on XXXX... |
| 5005 | [ 2.51834448e-02 -4.92606424e-02 -1.37688573e-... | {"token_count":17,"truncated":false} | | There are 2 Inquires on my credit report that ... |
| 5144 | [ 3.26358266e-02 -3.67171178e-03 3.65621522e-... | {"token_count":105,"truncated":false} | | My mortgage was sold from XXXX XXXX to freed... |
| 6090 | [ 2.47520711e-02 1.09149124e-02 1.35175223e-... | {"token_count":545,"truncated":false} | | On XX/XX/XXXX this company received certified... |
| 6449 | [ 1.86854266e-02 1.31238240e-03 -4.96791191e-... | {"token_count":104,"truncated":false} | | After hours on the phone with multiple agents,... |
| 6486 | [ 1.56347770e-02 2.23377198e-02 -1.32683543e-... | {"token_count":211,"truncated":false} | | On XX/XX/2019 two charges one for XXXX and one... |
25 rows × 4 columns
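The notebook takes the first option and drops the errored rows. For the alternative of selecting failed rows for a retry, here is a minimal sketch of the same rule on plain Python dictionaries. The rows, the error message, and the `split_by_status` helper are hypothetical and stand in for the DataFrame; they are not part of BigQuery DataFrames.

```python
def split_by_status(rows):
    """Split rows into (successful, failed).

    A row counts as successful when its status is empty and its
    embedding is non-empty, mirroring the filter used in this notebook.
    """
    successful, failed = [], []
    for row in rows:
        ok = (
            row["ml_generate_embedding_status"] == ""
            and len(row["ml_generate_embedding_result"]) != 0
        )
        (successful if ok else failed).append(row)
    return successful, failed


# Hypothetical rows; the error text is made up for illustration.
rows = [
    {
        "content": "complaint A",
        "ml_generate_embedding_result": [0.1, 0.2],
        "ml_generate_embedding_status": "",
    },
    {
        "content": "complaint B",
        "ml_generate_embedding_result": [],
        "ml_generate_embedding_status": "hypothetical error message",
    },
]

successful, failed = split_by_status(rows)
# The text of the failed rows is what you would send through the model again.
retry_inputs = [row["content"] for row in failed]
print(len(successful), retry_inputs)
```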
We now have the complaints and their text embeddings as two columns in our predicted_embeddings DataFrame.
Step 2: Create k-means model and predict clusters#
from bigframes.ml.cluster import KMeans
cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups
Perform KMeans clustering
# Use KMeans clustering to calculate our groups. Will take ~3 minutes.
cluster_model.fit(predicted_embeddings[["ml_generate_embedding_result"]])
clustered_result = cluster_model.predict(predicted_embeddings)
# Notice the CENTROID_ID column, which is the ID number of the group that
# each complaint belongs to.
clustered_result.peek(n=5)
| | CENTROID_ID | NEAREST_CENTROIDS_DISTANCE | ml_generate_embedding_result | ml_generate_embedding_statistics | ml_generate_embedding_status | content |
|---|---|---|---|---|---|---|
| 3172121 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.756634267893... | [ 3.18095312e-02 -3.54472063e-02 -7.13569671e-... | {"token_count":10,"truncated":false} | | Company did not provide verification and detai... |
| 2137420 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.606628249825... | [ 1.91578846e-02 5.55988774e-02 8.88887007e-... | {"token_count":100,"truncated":false} | | I have already filed a dispute with Consumer A... |
| 2350775 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.606676295233... | [ 2.25369893e-02 2.29400061e-02 -6.42273854e-... | {"token_count":100,"truncated":false} | | I informed Central Financial Control & provide... |
| 2904146 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.596729348974... | [ 9.35115516e-02 4.27814946e-03 4.62085977e-... | {"token_count":100,"truncated":false} | | I received a letter from a collections agency ... |
| 1075571 | 1 | [{'CENTROID_ID': 1, 'DISTANCE': 0.453806107968... | [-1.93953840e-03 -5.80236455e-03 8.49655271e-... | {"token_count":100,"truncated":false} | | I have not done business with this company, i ... |
Our DataFrame clustered_result now includes a CENTROID_ID column containing an ID from 1 to 10 (inclusive) that indicates which group of semantically similar complaints each row belongs to.
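For intuition about what the cluster IDs mean, here is a toy sketch of the classic k-means loop (Lloyd's algorithm) in plain Python. It is an illustration under simplifying assumptions, not BigQuery ML's implementation: the 2-D points, the fixed starting centroids, and all function names are hypothetical, and real embeddings have many more dimensions.

```python
import math


def nearest_centroid_ids(points, centroids):
    """Return the 1-based ID of the closest centroid for every point."""
    return [
        1 + min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
        for point in points
    ]


def recompute_centroids(points, ids, centroids):
    """Move each centroid to the mean of its members (unchanged if it has none)."""
    updated = []
    for centroid_id, old in enumerate(centroids, start=1):
        members = [p for p, i in zip(points, ids) if i == centroid_id]
        if members:
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            updated.append(old)
    return updated


def toy_kmeans(points, initial_centroids, iterations=10):
    """Alternate assignment and centroid updates for a fixed number of rounds."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        ids = nearest_centroid_ids(points, centroids)
        centroids = recompute_centroids(points, ids, centroids)
    return nearest_centroid_ids(points, centroids), centroids


# Two visibly separate blobs of hypothetical 2-D "embeddings".
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroid_ids, centroids = toy_kmeans(points, initial_centroids=[(0.0, 0.0), (1.0, 1.0)])
print(centroid_ids)  # [1, 1, 1, 2, 2, 2]
```

Each point ends up labeled with the ID of its nearest centroid, which plays the same role as the CENTROID_ID column in clustered_result.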
Step 3: Use Gemini to summarize complaint clusters#
Build prompts: we will choose just two of our groups and prompt GeminiTextGenerator to identify their salient characteristics. The prompt is natural language in a Python string.
# Using bigframes, with syntax identical to pandas,
# select the complaints in the first and second groups
cluster_1_result = clustered_result[
    clustered_result["CENTROID_ID"] == 1
][["content"]]
cluster_1_result_pandas = cluster_1_result.head(5).to_pandas()
cluster_2_result = clustered_result[
    clustered_result["CENTROID_ID"] == 2
][["content"]]
cluster_2_result_pandas = cluster_2_result.head(5).to_pandas()
# Build plain-text prompts to send to Gemini. Use only 5 complaints from each group.
prompt1 = 'comment list 1:\n'
for i in range(5):
    prompt1 += str(i + 1) + '. ' + \
        cluster_1_result_pandas["content"].iloc[i] + '\n'
prompt2 = 'comment list 2:\n'
for i in range(5):
    prompt2 += str(i + 1) + '. ' + \
        cluster_2_result_pandas["content"].iloc[i] + '\n'
print(prompt1)
print(prompt2)
comment list 1:
1. This debt was from my identity being stolen I didnt open any account that resulted in this collection i have completed a police report which can be verified with the XXXX police @ XXXX report # XXXX and i have a notarized identity theft affidavit from ftc please remove this off of my credit and close my file ASAP
2. On XX/XX/XXXX this company received certified mail asking for validation of debt. On XX/XX/XXXX the company still did not validate debt owed and they did not mark the debt disputed by XX/XX/XXXX through the major credit reporting bureaus. This is a violation of the FDCPA and FCRA. I did send a second letter which the company received on XX/XX/XXXX . A lady from the company called and talked to me about the debt on XX/XX/XXXX but again did not have the credit bureaus mark the item as disputed. The company still violated the laws. Section [ 15 U.S.C. 1681s-2 ] ( 3 ) duty to provide notice of dispute. If the completeness or accuracy of any information furnished by any person to any consumer reporting agency is disputed to such person by a consumer, the person may not furnish the information to any consumer reporting agency without notice that such information is disputed. ( B ) ti me of notice! The notice required under sub paragraph ( A ) shall be provided to the customer prior to, or no later than 30 days after, furnishing the negative information to a consumer reporting agency described in section 603 ( p ). This company violated the state laws. I received no information until XX/XX/XXXX . Therefore by law the company should have the item removed from the credit agencies such as transunion and XXXX . I tried to call the company back about the laws that was broken and left my name no return call. The copy of my credit reports are below and as you can see the items was n't marked disputed. XXXX is marked disputed because on XX/XX/XXXX I myself disputed the information with the credit bureau. The lady stated they did n't receive my dispute letter until XX/XX/XXXX . Included is certified mail reciepts with date, time stamp, and signature of the person who signed for the certified mail on XX/XX/XXXX and XX/XX/XXXX . So again the company violated the laws and I have all the proof. 
If I have a contract with this company please send to me by mail a contract bearing my signature of the contract.
3. On XX/XX/2022, Pioneer Credit Recovery of XXXX, NY identified an alleged debt, which I do not owe.
On XX/XX/2022, I wrote a dispute letter to Pioneer, requesting that they stop communication with me, record my dispute, and provide verification of the debt if they believe otherwise.
Pioneer has not responded with verification, but has attempted to collect the debt since then by phone ( XX/XX/2022 ) and mail ( XX/XX/2022 ).
4. Disputed with the company on several occasions and they still havent provided proof in a timely manner. The FCRA gives the company 30 days to respond. I have not gotten a response.
5. I am not aware of this XXXX XXXX XXXX XXXX XXXX , XXXX balance. I have never seen anything dealing with this lender. Also, I have been threated that in 30 days they will seek to make a judgement on debt that does not belong to me. I understand that they are looking to offer me a settlement. However, I do not believe the validity of such debt accusation. Furthermore, I will not be limited to the action of court threats when I did not receive any notice of debt based on communication. The amount is {$880.00} from MBNA which was acquired by Bank of America in 2006. I do not claim debt.
comment list 2:
1. My name is XXXX XXXX XXXX. This issue with a Loan Till Payday account was previously reported to you for collection practices, etc. I had a pay day loan in 2013. At the time, I banked with XXXX XXXX, who advised me that pay day loans are not good, and in the end XXXX closed my bank account, it was involuntary. In the interim, I made payments to the agency. XXXX and XXXX were the primary contacts. On the last payment, due to the fact that I told him I was coming in to pay cash, and they withdrew the funds, electronically, my account was affected. XXXX advised me that the payment made was the last payment and the other ( which was primarily interest remaining ) would be charged off. XXXX later called me and advised that XXXX was not authorized to make that decision and demanded the payment. I do n't understand how one person can cancel the arrangements made by someone else.
In the end, they sold my account. It was reported to you, and that creditor then stated no further collection activity would occur.
Last week I began receiving calls from a collection agency, XXXX XXXX stating I would called for a civil deposition on this account. I do n't even know this agency. Later, I then received another call stating that I needed to hold, and after several clicks was connected to someone at a Mediaction service. I denied the owing the loan and stated it was paid.
Today, I received a call from an outsource service courier about a missed appointment or hearing??? What?? I have no idea who these people are. I called Loan Till Payday and was advised the loan was sold and I needed to settle with the new company. So, does this mean they are continuing to attempt to collect {$200.00}.
I attempted to call the numbers, and now no one picks up just a voicemail. I called the supposed service courier and advised that their number was showing up as a spam/fraud number and that if they were a legitimate company then they should leave their name, location, a number ( not a voicemail ), and the case they are calling me about. I have not been served with any collection documents - why am I being threatened with a deposition???
Telephone number recently calling me : ( XXXX ) XXXX.
Please help.
2. I receive 2 or 3 phone calls every day since early XXXX, my references receive calls. I will gladly satisfy this debt however even after 1st telling them the calls haven't stopped as though they are going to intimidate me. If the calls stopped for just 3 or 4 days I would satisfy my obligation but not because they keep calling me as well as my references.
3. Last month I received a phone call for my husband from XXXX XXXX XXXX saying he owed money and if I did not pay today it would be sent to litigation. The debt was Wachovia/wells Fargo, and account that we have never had. I had my husband call to get more information and they became very nasty with him. I called back asking for documentation on the debt because i did not think it was our debt and they became aggressive. They did email my husband something saying how much he owed, and I called back and asked to be emailed a copy, and the dollar amounts did not match. I called Wells Fargo and went over the above and verified that we have never had an account with them and I sent them the emails the XXXX sent to us and they started a fraud investigation. Yesterday I received another collections letter in the mail from the. Still trying to collect this debt. These people have my husbands full social security number ( we did not give it to them )
4. A company call XXXX XXXX XXXX came onto my private property on XX/XX/2018 and stole my automobile. I did receive any type of notice saying they collecting on a debt. If they take or threaten to take any nonjudicial action ( i.e, without a court order ) to repossess property when there is no present right to possession of the property they is in violation. l did not receive any type of notice asking if they can enter onto my private property and steal my private automobile.
5. Navient financial continues to send me erroneous debt collection emails. I have repeatedly asked them to remove my email address and to cease all communication with me.
I have no relationship with Navient and their continued threatening email is very unsettling.
I just want their erroneous threats to stop.
Below is the latest email I have received from them : Last Day to call this office XXXX by XXXX Regards, XXXX XXXX Team Lead Specialist Charge off Unit XXXX XXXX
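The two prompt-building loops differ only in the title and the source of the comments, so they could be factored into one helper. The following is a minimal sketch of that idea; `build_comment_list` is a hypothetical name, and the short strings stand in for real complaint text.

```python
def build_comment_list(title, comments):
    """Format comments as a titled, numbered list, one entry per line."""
    lines = [f"{title}:"]
    lines += [f"{number}. {comment}" for number, comment in enumerate(comments, start=1)]
    return "\n".join(lines) + "\n"


# Placeholder comments; in the notebook these would come from the
# "content" column of each cluster's pandas DataFrame.
example = build_comment_list("comment list 1", ["first complaint", "second complaint"])
print(example)
```

The helper produces the same layout as the printed prompts: a title line ending in a colon, followed by numbered entries.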
# The plain English request we will make of Gemini
prompt = (
    "Please highlight the most obvious difference between "
    "the two lists of comments:\n" + prompt1 + prompt2
)
print(prompt)
Please highlight the most obvious difference between the two lists of comments:
comment list 1:
1. This debt was from my identity being stolen I didnt open any account that resulted in this collection i have completed a police report which can be verified with the XXXX police @ XXXX report # XXXX and i have a notarized identity theft affidavit from ftc please remove this off of my credit and close my file ASAP
2. On XX/XX/XXXX this company received certified mail asking for validation of debt. On XX/XX/XXXX the company still did not validate debt owed and they did not mark the debt disputed by XX/XX/XXXX through the major credit reporting bureaus. This is a violation of the FDCPA and FCRA. I did send a second letter which the company received on XX/XX/XXXX . A lady from the company called and talked to me about the debt on XX/XX/XXXX but again did not have the credit bureaus mark the item as disputed. The company still violated the laws. Section [ 15 U.S.C. 1681s-2 ] ( 3 ) duty to provide notice of dispute. If the completeness or accuracy of any information furnished by any person to any consumer reporting agency is disputed to such person by a consumer, the person may not furnish the information to any consumer reporting agency without notice that such information is disputed. ( B ) ti me of notice! The notice required under sub paragraph ( A ) shall be provided to the customer prior to, or no later than 30 days after, furnishing the negative information to a consumer reporting agency described in section 603 ( p ). This company violated the state laws. I received no information until XX/XX/XXXX . Therefore by law the company should have the item removed from the credit agencies such as transunion and XXXX . I tried to call the company back about the laws that was broken and left my name no return call. The copy of my credit reports are below and as you can see the items was n't marked disputed. XXXX is marked disputed because on XX/XX/XXXX I myself disputed the information with the credit bureau. The lady stated they did n't receive my dispute letter until XX/XX/XXXX . Included is certified mail reciepts with date, time stamp, and signature of the person who signed for the certified mail on XX/XX/XXXX and XX/XX/XXXX . So again the company violated the laws and I have all the proof. 
If I have a contract with this company please send to me by mail a contract bearing my signature of the contract.
3. On XX/XX/2022, Pioneer Credit Recovery of XXXX, NY identified an alleged debt, which I do not owe.
On XX/XX/2022, I wrote a dispute letter to Pioneer, requesting that they stop communication with me, record my dispute, and provide verification of the debt if they believe otherwise.
Pioneer has not responded with verification, but has attempted to collect the debt since then by phone ( XX/XX/2022 ) and mail ( XX/XX/2022 ).
4. Disputed with the company on several occasions and they still havent provided proof in a timely manner. The FCRA gives the company 30 days to respond. I have not gotten a response.
5. I am not aware of this XXXX XXXX XXXX XXXX XXXX , XXXX balance. I have never seen anything dealing with this lender. Also, I have been threated that in 30 days they will seek to make a judgement on debt that does not belong to me. I understand that they are looking to offer me a settlement. However, I do not believe the validity of such debt accusation. Furthermore, I will not be limited to the action of court threats when I did not receive any notice of debt based on communication. The amount is {$880.00} from MBNA which was acquired by Bank of America in 2006. I do not claim debt.
comment list 2:
1. My name is XXXX XXXX XXXX. This issue with a Loan Till Payday account was previously reported to you for collection practices, etc. I had a pay day loan in 2013. At the time, I banked with XXXX XXXX, who advised me that pay day loans are not good, and in the end XXXX closed my bank account, it was involuntary. In the interim, I made payments to the agency. XXXX and XXXX were the primary contacts. On the last payment, due to the fact that I told him I was coming in to pay cash, and they withdrew the funds, electronically, my account was affected. XXXX advised me that the payment made was the last payment and the other ( which was primarily interest remaining ) would be charged off. XXXX later called me and advised that XXXX was not authorized to make that decision and demanded the payment. I do n't understand how one person can cancel the arrangements made by someone else.
In the end, they sold my account. It was reported to you, and that creditor then stated no further collection activity would occur.
Last week I began receiving calls from a collection agency, XXXX XXXX stating I would called for a civil deposition on this account. I do n't even know this agency. Later, I then received another call stating that I needed to hold, and after several clicks was connected to someone at a Mediaction service. I denied the owing the loan and stated it was paid.
Today, I received a call from an outsource service courier about a missed appointment or hearing??? What?? I have no idea who these people are. I called Loan Till Payday and was advised the loan was sold and I needed to settle with the new company. So, does this mean they are continuing to attempt to collect {$200.00}.
I attempted to call the numbers, and now no one picks up just a voicemail. I called the supposed service courier and advised that their number was showing up as a spam/fraud number and that if they were a legitimate company then they should leave their name, location, a number ( not a voicemail ), and the case they are calling me about. I have not been served with any collection documents - why am I being threatened with a deposition???
Telephone number recently calling me : ( XXXX ) XXXX.
Please help.
2. I receive 2 or 3 phone calls every day since early XXXX, my references receive calls. I will gladly satisfy this debt however even after 1st telling them the calls haven't stopped as though they are going to intimidate me. If the calls stopped for just 3 or 4 days I would satisfy my obligation but not because they keep calling me as well as my references.
3. Last month I received a phone call for my husband from XXXX XXXX XXXX saying he owed money and if I did not pay today it would be sent to litigation. The debt was Wachovia/wells Fargo, and account that we have never had. I had my husband call to get more information and they became very nasty with him. I called back asking for documentation on the debt because i did not think it was our debt and they became aggressive. They did email my husband something saying how much he owed, and I called back and asked to be emailed a copy, and the dollar amounts did not match. I called Wells Fargo and went over the above and verified that we have never had an account with them and I sent them the emails the XXXX sent to us and they started a fraud investigation. Yesterday I received another collections letter in the mail from the. Still trying to collect this debt. These people have my husbands full social security number ( we did not give it to them )
4. A company call XXXX XXXX XXXX came onto my private property on XX/XX/2018 and stole my automobile. I did receive any type of notice saying they collecting on a debt. If they take or threaten to take any nonjudicial action ( i.e, without a court order ) to repossess property when there is no present right to possession of the property they is in violation. l did not receive any type of notice asking if they can enter onto my private property and steal my private automobile.
5. Navient financial continues to send me erroneous debt collection emails. I have repeatedly asked them to remove my email address and to cease all communication with me.
I have no relationship with Navient and their continued threatening email is very unsettling.
I just want their erroneous threats to stop.
Below is the latest email I have received from them : Last Day to call this office XXXX by XXXX Regards, XXXX XXXX Team Lead Specialist Charge off Unit XXXX XXXX
Get a response from Gemini by making a call to Vertex AI using our connection.
from bigframes.ml.llm import GeminiTextGenerator
q_a_model = GeminiTextGenerator(model_name="gemini-2.0-flash-001")
# Make a DataFrame containing only a single row with our prompt for Gemini
df = bf.DataFrame({"prompt": [prompt]})
# Send the request for Gemini to generate a response to our prompt
major_difference = q_a_model.predict(df)
# Gemini's response is the only row in the dataframe result
major_difference["ml_generate_text_llm_result"].iloc[0]
"## Key Differences between Comment Lists 1 and 2:\n\n**Comment List 1:**\n\n* **Focuses on Legal Violations:** The comments in List 1 primarily focus on how the debt collectors violated specific laws, such as the FDCPA and FCRA, by not validating debt, not marking accounts as disputed, and using illegal collection tactics.\n* **Detailed Evidence:** Commenters provide detailed evidence of their claims, including dates, reference numbers, police reports, and copies of communications.\n* **Formal Tone:** The language in List 1 is more formal and uses legal terminology, suggesting the commenters may have a deeper understanding of their rights.\n* **Emphasis on Debt Accuracy:** Many comments explicitly deny owing the debt and question its validity, requesting proof and demanding removal from credit reports. \n\n**Comment List 2:**\n\n* **Focus on Harassment and Intimidation:** The comments in List 2 highlight the harassing and intimidating behavior of the debt collectors, such as making multiple calls, contacting references, and threatening legal action.\n* **Emotional Language:** Commenters express frustration, fear, and anger towards the debt collectors' behavior.\n* **Less Legal Detail:** While some commenters mention specific laws, they provide less detailed evidence than List 1.\n* **Uncertainty About Debt:** Several commenters are unsure whether they actually owe the debt, questioning its origin and validity. \n\n**Overall:**\n\n* List 1 focuses on legal arguments and violations, while List 2 emphasizes emotional distress and improper collection tactics.\n* List 1 provides more concrete evidence of wrongdoing, while List 2 relies more on personal experiences and descriptions.\n* Both lists highlight the negative impacts of debt collection practices on individuals.\n"
We now see GeminiTextGenerator’s characterization of the different comment groups. Thanks for using BigQuery DataFrames!
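The raw response above is a single Markdown string with escaped newlines, which is hard to scan in cell output. As a minimal sketch, assuming the response keeps the shape shown here (bold heading lines such as `**Comment List 1:**` followed by `* ` bullets), the text can be split into sections with plain Python. The helper name `split_markdown_sections` is hypothetical and not part of BigQuery DataFrames, and the sample string is a shortened excerpt of the response above. Gemini's output format is not guaranteed, so treat this as illustrative only.

```python
import re


def split_markdown_sections(response: str) -> dict[str, list[str]]:
    """Group bullet lines under the bold heading line that precedes them.

    Hypothetical helper: assumes headings look like '**Title:**' on their
    own line and bullets start with '* '.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        # A heading is a line that is entirely one bold span, e.g. **Overall:**
        heading = re.fullmatch(r"\*\*([^*]+?):?\*\*", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif line.startswith("* ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


# Shortened excerpt of the response shown above (not a new model output).
sample_response = (
    "## Key Differences between Comment Lists 1 and 2:\n\n"
    "**Comment List 1:**\n\n"
    "* **Emphasis on Debt Accuracy:** Many comments explicitly deny owing "
    "the debt and question its validity, requesting proof and demanding "
    "removal from credit reports. \n\n"
    "**Comment List 2:**\n\n"
    "* **Emotional Language:** Commenters express frustration, fear, and "
    "anger towards the debt collectors' behavior.\n"
    "* **Less Legal Detail:** While some commenters mention specific laws, "
    "they provide less detailed evidence than List 1.\n"
)

sections = split_markdown_sections(sample_response)
for name, bullets in sections.items():
    print(f"{name}: {len(bullets)} bullet(s)")
```

In the notebook, the same helper could be applied to the string returned by `major_difference["ml_generate_text_llm_result"].iloc[0]`. If you only want to read the response, passing that string to `print()` is enough to render the newlines.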
Summary and next steps#
You’ve used the ML and LLM capabilities of BigQuery DataFrames to help analyze and understand a large dataset of unstructured feedback.
Learn more about BigQuery DataFrames in the documentation and find more sample notebooks in the GitHub repo.