# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
BigQuery DataFrames ML: Drug Name Generation#
Overview#
The goal of this notebook is to demonstrate an enterprise generative AI use case. A marketing user can provide information about a new pharmaceutical drug and its generic name, and receive ideas on marketing-oriented brand names for that drug.
Learn more about BigQuery DataFrames.
Objective#
In this tutorial, you learn about generative AI concepts such as prompting and few-shot learning, and how to use BigFrames ML to perform these tasks through an intuitive DataFrame API.
The steps performed include:
Ask the user for the generic name and usage for the drug.
Use bigframes to query the FDA dataset of over 100,000 drugs, filtered on the brand name, generic name, and indications & usage columns.
Filter this dataset to find prototypical brand names that can be used as examples in prompt tuning.
Create a prompt with the user input, general instructions, examples and counter-examples for the desired brand name.
Use the bigframes.ml.llm.GeminiTextGenerator to generate choices of brand names.
Dataset#
This notebook uses the FDA dataset available at bigquery-public-data.fda_drug.
Costs#
This tutorial uses billable components of Google Cloud:
BigQuery (compute)
BigQuery ML
Learn about BigQuery compute pricing and BigQuery ML pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.
Installation#
Install the following packages required to execute this notebook.
# !pip install -U --quiet bigframes
Colab only: Uncomment the following cell to restart the kernel.#
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython
# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)
Import libraries#
import bigframes.pandas as bpd
from bigframes.ml.llm import GeminiTextGenerator
from IPython.display import Markdown
Authenticate your Google Cloud account#
Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.
1. Vertex AI Workbench
Do nothing as you are already authenticated.
2. Local JupyterLab instance, uncomment and run:
# ! gcloud auth login
3. Colab, uncomment and run:
# from google.colab import auth
# auth.authenticate_user()
Before you begin#
Set up your Google Cloud project#
The following steps are required, regardless of your notebook environment.
Select or create a Google Cloud project. When you first create an account, you get a $300 free credit towards your compute/storage costs.
If you are running this notebook locally, you need to install the Cloud SDK.
Set your project ID#
If you don’t know your project ID, try the following:
Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID
# Please fill in these values.
PROJECT_ID = "" # @param {type:"string"}
# Set the project id
! gcloud config set project {PROJECT_ID}
ERROR: (gcloud.config.set) argument VALUE: Must be specified.
Usage: gcloud config set SECTION/PROPERTY VALUE [optional flags]
optional flags may be --help | --installation
For detailed information on this command and its flags, run:
gcloud config set --help
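The error above appears because PROJECT_ID was left empty, so the gcloud command received no value. As a minimal sketch, a guard like the following could run before the gcloud call. `require_project_id` is a hypothetical helper written for illustration, not part of gcloud or BigFrames, and "my-project" is a placeholder value.

```python
def require_project_id(project_id: str) -> str:
    """Return the trimmed project ID, or raise if it was left blank."""
    cleaned = project_id.strip()
    if not cleaned:
        raise ValueError("PROJECT_ID is empty; set it before running gcloud.")
    return cleaned


# "my-project" is a placeholder used only for illustration.
print(require_project_id("  my-project "))
```

Failing early with a clear message is usually easier to diagnose than the usage text printed by the command line tool.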
BigFrames configuration#
Next, we will specify a BigQuery connection. If you already have a connection, you can simply provide the name and skip the following creation steps.
# Please fill in these values.
LOCATION = "us" # @param {type:"string"}
We will now try to use the provided connection, and if it doesn’t exist, create a new one. We will also print the service account used.
Initialize BigFrames client#
Here, we set the project configuration based on the provided parameters.
# Note: The project option is not required in all environments.
# On BigQuery Studio, the project ID is automatically detected.
bpd.options.bigquery.project = PROJECT_ID
# Note: The location option is not required.
# It defaults to the location of the first table or query
# passed to read_gbq(). For APIs where a location can't be
# auto-detected, the location defaults to the "US" location.
bpd.options.bigquery.location = LOCATION
Generate a name#
Let’s start with entering a generic name and description of the drug.
GENERIC_NAME = "Entropofloxacin" # @param {type:"string"}
USAGE = "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back." # @param {type:"string"}
NUM_NAMES = 10 # @param {type:"integer"}
TEMPERATURE = 0.5 # @param {type: "number"}
We can now create a prompt string, and populate it with the name and description.
zero_shot_prompt = f"""Provide {NUM_NAMES} unique and modern brand names in Markdown bullet point format. Do not provide any additional explanation.
Be creative with the brand names. Don't use English words directly; use variants or invented words.
The generic name is: {GENERIC_NAME}
The indications and usage are: {USAGE}."""
print(zero_shot_prompt)
Provide 10 unique and modern brand names in Markdown bullet point format. Do not provide any additional explanation.
Be creative with the brand names. Don't use English words directly; use variants or invented words.
The generic name is: Entropofloxacin
The indications and usage are: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back..
Next, let’s create a helper function to predict with our model. It will take a string input, and add it to a temporary BigFrames DataFrame. It will also return the string extracted from the response DataFrame.
def predict(prompt: str, temperature: float = TEMPERATURE) -> str:
    # Create dataframe
    input = bpd.DataFrame(
        {
            "prompt": [prompt],
        }
    )
    # Return response
    return model.predict(input, temperature=temperature).ml_generate_text_llm_result.iloc[0]
We can now initialize the model, and get a response to our prompt!
# Define the model
model = GeminiTextGenerator(model_name="gemini-2.0-flash-001")
# Invoke LLM with prompt
response = predict(zero_shot_prompt, temperature = TEMPERATURE)
# Print results as Markdown
Markdown(response)
Etherealox
Zenithrox
Aureox
Lucentrox
Aethrox
Luminex
Elysirox
Quasarox
Novaflux
Arcanox
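The response comes back as a single Markdown string. If the names are needed as a Python list (for example, to de-duplicate or screen them), a small parser can help. The following is a sketch under assumptions: `parse_brand_names` is a hypothetical helper, and it assumes one candidate per line, optionally prefixed by a bullet or number and optionally followed by an explanation after ":" or "(". The sample text is made up for illustration.

```python
import re
from typing import List


def parse_brand_names(markdown_text: str) -> List[str]:
    """Extract one candidate name per non-empty line of a Markdown list."""
    names = []
    for line in markdown_text.splitlines():
        # Drop a leading bullet marker ("-", "*", "+") or "1." style number.
        text = re.sub(r"^\s*(?:[-*+]\s+|\d+[.)]\s+)", "", line).strip()
        # Drop bold/italic markers, then keep only the text that precedes
        # an explanation introduced by ":" or "(".
        text = text.replace("*", "").replace("_", "")
        text = re.split(r"[:(]", text, maxsplit=1)[0].strip()
        if text:
            names.append(text)
    return names


# Illustrative input only; real model output will vary between runs.
sample = "- Etherealox\n- Zenithrox\n\n* **Aureox** (golden)\n"
print(parse_brand_names(sample))
```

Model output is not guaranteed to follow the requested format, so treat the parsed list as a best effort and inspect it before relying on it.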
We’re off to a great start! Let’s see if we can refine our response.
Few-shot learning#
Let’s try using few-shot learning. We will provide a few examples of what we’re looking for along with our prompt.
Our prompt will consist of 3 parts:
General instructions (e.g. generate $n$ brand names)
Multiple examples
Information about the drug we’d like to generate a name for
Let’s walk through how to construct this prompt.
Our first step will be to define how many examples we want to provide in the prompt.
# Specify number of examples to include
NUM_EXAMPLES = 3 # @param {type:"integer"}
Next, let’s define a prefix that will set the overall context.
prefix_prompt = f"""Provide {NUM_NAMES} unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.
Be creative with the brand names. Don't use English words directly; use variants or invented words.
First, we will provide {NUM_EXAMPLES} examples to help with your thought process.
Then, we will provide the generic name and usage for the drug we'd like you to generate brand names for.
"""
print(prefix_prompt)
Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.
Be creative with the brand names. Don't use English words directly; use variants or invented words.
First, we will provide 3 examples to help with your thought process.
Then, we will provide the generic name and usage for the drug we'd like you to generate brand names for.
Our next step will be to include examples into the prompt.
We will start out by retrieving the raw data for the examples, by querying the BigQuery public dataset.
# Query 3 columns of interest from drug label dataset
df = bpd.read_gbq(
    "bigquery-public-data.fda_drug.drug_label",
    columns=["openfda_generic_name", "openfda_brand_name", "indications_and_usage"],
)
# Exclude any rows with missing data
df = df.dropna()
# Drop duplicate rows
df = df.drop_duplicates()
# Print values
df.head()
| | openfda_generic_name | openfda_brand_name | indications_and_usage |
|---|---|---|---|
| 0 | BENZALKONIUM CHLORIDE | meijer kids | Use - hand washing to decrease bacteria on skin |
| 3 | OCTINOXATE, TITANIUM DIOXIDE | CD DIORSKIN STAR Studio Makeup Spectacular Bri... | Uses Helps prevent sunburn. If used as directe... |
| 4 | TRIAMCINOLONE ACETONIDE | Triamcinolone Acetonide | INDICATIONS AND USAGE Triamcinolone Acetonide ... |
| 5 | BACITRACIN ZINC, NEOMYCIN SULFATE, POLYMYXIN B... | Triple Antibiotic | First aid to help prevent infection in minor c... |
| 6 | RISPERIDONE | Risperidone | 1. INDICATIONS AND USAGE Risperidone is an aty... |
5 rows × 3 columns
Let’s now filter the results to remove atypical names.
# Remove names with spaces
df = df[df["openfda_brand_name"].str.find(" ") == -1]
# Remove names with 5 or fewer characters
df = df[df["openfda_brand_name"].str.len() > 5]
# Remove names where the generic and brand name match (case-insensitive)
df = df[df["openfda_generic_name"].str.lower() != df["openfda_brand_name"].str.lower()]
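To make the combined effect of the three filters concrete, here is a plain-Python sketch that applies the same rules to a single (brand, generic) pair. `is_prototypical` is a hypothetical helper for illustration only; the notebook itself applies the filters to the whole BigFrames DataFrame.

```python
def is_prototypical(brand_name: str, generic_name: str) -> bool:
    """Apply the three filters above to one (brand, generic) pair."""
    has_no_space = " " not in brand_name  # same test as str.find(" ") == -1
    is_long_enough = len(brand_name) > 5  # keeps names of 6 or more characters
    differs = brand_name.lower() != generic_name.lower()
    return has_no_space and is_long_enough and differs


# Pairs taken from the tables shown in this notebook.
pairs = [
    ("meijer kids", "BENZALKONIUM CHLORIDE"),
    ("Risperidone", "RISPERIDONE"),
    ("Cayston", "AZTREONAM"),
    ("Ampicillin", "AMPICILLIN SODIUM"),
]
for brand, generic in pairs:
    print(brand, is_prototypical(brand, generic))
```

Note that a brand name such as "Ampicillin" still passes, because the comparison is an exact match and the generic name includes the salt form ("AMPICILLIN SODIUM"). A stricter rule, such as rejecting brand names contained in the generic name, would be a design choice beyond what the notebook does.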
Let’s take NUM_EXAMPLES samples to include in the prompt.
# Take a sample and convert to a Pandas dataframe for local usage.
df_examples = df.sample(NUM_EXAMPLES, random_state=3).to_pandas()
df_examples
| | openfda_generic_name | openfda_brand_name | indications_and_usage |
|---|---|---|---|
| 81748 | AMPICILLIN SODIUM | Ampicillin | INDICATIONS AND USAGE Ampicillin for Injection... |
| 730 | AZTREONAM | Cayston | 1 INDICATIONS AND USAGE CAYSTON® is indicated ... |
| 71763 | TERAZOSIN HYDROCHLORIDE | Terazosin | INDICATIONS AND USAGE Terazosin capsules are i... |
Let’s now convert the data to a JSON-like structure (a list of dictionaries), to enable embedding into a prompt. For consistency, we’ll capitalize each example brand name.
examples = [
    {
        "brand_name": brand_name.capitalize(),
        "generic_name": generic_name,
        "usage": usage,
    }
    for brand_name, generic_name, usage in zip(
        df_examples["openfda_brand_name"],
        df_examples["openfda_generic_name"],
        df_examples["indications_and_usage"],
    )
]
print(examples)
[{'brand_name': 'Ampicillin', 'generic_name': 'AMPICILLIN SODIUM', 'usage': 'INDICATIONS AND USAGE Ampicillin for Injection, USP is indicated in the treatment of infections caused by susceptible strains of the designated organisms in the following conditions: Respiratory Tract Infections caused by Streptococcus pneumoniae. Staphylococcus aureus (penicillinase and nonpenicillinase-producing), H. influenzae, and Group A beta-hemolytic streptococci. Bacterial Meningitis caused by E. coli, Group B streptococci, and other Gram-negative bacteria (Listeria monocytogenes, N. meningitidis). The addition of an aminoglycoside with ampicillin may increase its effectiveness against Gram-negative bacteria. Septicemia and Endocarditis caused by susceptible Gram-positive organisms including Streptococcus spp., penicillin G-susceptible staphylococci, and enterococci. Gram-negative sepsis caused by E. coli, Proteus mirabilis and Salmonella spp. responds to ampicillin. Endocarditis due to enterococcal strains usually respond to intravenous therapy. The addition of an aminoglycoside may enhance the effectiveness of ampicillin when treating streptococcal endocarditis. Urinary Tract Infections caused by sensitive strains of E. coli and Proteus mirabilis. Gastrointestinal Infections caused by Salmonella typhi (typhoid fever), other Salmonella spp., and Shigella spp. (dysentery) usually respond to oral or intravenous therapy. Bacteriology studies to determine the causative organisms and their susceptibility to ampicillin should be performed. Therapy may be instituted prior to obtaining results of susceptibility testing. It is advisable to reserve the parenteral form of this drug for moderately severe and severe infections and for patients who are unable to take the oral forms. A change to oral ampicillin may be made as soon as appropriate. 
To reduce the development of drug-resistant bacteria and maintain the effectiveness of Ampicillin for Injection, USP and other antibacterial drugs, Ampicillin for Injection, USP should be used only to treat or prevent infections that are proven or strongly suspected to be caused by susceptible bacteria. When culture and susceptibility information are available, they should be considered in selecting or modifying antibacterial therapy. In the absence of such data, local epidemiology and susceptibility patterns may contribute to the empiric selection of therapy. Indicated surgical procedures should be performed.'}, {'brand_name': 'Cayston', 'generic_name': 'AZTREONAM', 'usage': '1 INDICATIONS AND USAGE CAYSTON® is indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia [see Clinical Studies (14) ]. To reduce the development of drug-resistant bacteria and maintain the effectiveness of CAYSTON and other antibacterial drugs, CAYSTON should be used only to treat patients with CF known to have Pseudomonas aeruginosa in the lungs. CAYSTON is a monobactam antibacterial indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia. (1)'}, {'brand_name': 'Terazosin', 'generic_name': 'TERAZOSIN HYDROCHLORIDE', 'usage': 'INDICATIONS AND USAGE Terazosin capsules are indicated for the treatment of symptomatic benign prostatic hyperplasia (BPH). There is a rapid response, with approximately 70% of patients experiencing an increase in urinary flow and improvement in symptoms of BPH when treated with terazosin capsules. 
The long-term effects of terazosin capsules on the incidence of surgery, acute urinary obstruction or other complications of BPH are yet to be determined. Terazosin capsules are also indicated for the treatment of hypertension. Terazosin capsules can be used alone or in combination with other antihypertensive agents such as diuretics or beta-adrenergic blocking agents.'}]
We’ll format each example with the same template, concatenate the results, and view the combined string.
example_prompt = ""
for example in examples:
    example_prompt += f"Generic name: {example['generic_name']}\nUsage: {example['usage']}\nBrand name: {example['brand_name']}\n\n"
example_prompt
'Generic name: AMPICILLIN SODIUM\nUsage: INDICATIONS AND USAGE Ampicillin for Injection, USP is indicated in the treatment of infections caused by susceptible strains of the designated organisms in the following conditions: Respiratory Tract Infections caused by Streptococcus pneumoniae. Staphylococcus aureus (penicillinase and nonpenicillinase-producing), H. influenzae, and Group A beta-hemolytic streptococci. Bacterial Meningitis caused by E. coli, Group B streptococci, and other Gram-negative bacteria (Listeria monocytogenes, N. meningitidis). The addition of an aminoglycoside with ampicillin may increase its effectiveness against Gram-negative bacteria. Septicemia and Endocarditis caused by susceptible Gram-positive organisms including Streptococcus spp., penicillin G-susceptible staphylococci, and enterococci. Gram-negative sepsis caused by E. coli, Proteus mirabilis and Salmonella spp. responds to ampicillin. Endocarditis due to enterococcal strains usually respond to intravenous therapy. The addition of an aminoglycoside may enhance the effectiveness of ampicillin when treating streptococcal endocarditis. Urinary Tract Infections caused by sensitive strains of E. coli and Proteus mirabilis. Gastrointestinal Infections caused by Salmonella typhi (typhoid fever), other Salmonella spp., and Shigella spp. (dysentery) usually respond to oral or intravenous therapy. Bacteriology studies to determine the causative organisms and their susceptibility to ampicillin should be performed. Therapy may be instituted prior to obtaining results of susceptibility testing. It is advisable to reserve the parenteral form of this drug for moderately severe and severe infections and for patients who are unable to take the oral forms. A change to oral ampicillin may be made as soon as appropriate. 
To reduce the development of drug-resistant bacteria and maintain the effectiveness of Ampicillin for Injection, USP and other antibacterial drugs, Ampicillin for Injection, USP should be used only to treat or prevent infections that are proven or strongly suspected to be caused by susceptible bacteria. When culture and susceptibility information are available, they should be considered in selecting or modifying antibacterial therapy. In the absence of such data, local epidemiology and susceptibility patterns may contribute to the empiric selection of therapy. Indicated surgical procedures should be performed.\nBrand name: Ampicillin\n\nGeneric name: AZTREONAM\nUsage: 1 INDICATIONS AND USAGE CAYSTON® is indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia [see Clinical Studies (14) ]. To reduce the development of drug-resistant bacteria and maintain the effectiveness of CAYSTON and other antibacterial drugs, CAYSTON should be used only to treat patients with CF known to have Pseudomonas aeruginosa in the lungs. CAYSTON is a monobactam antibacterial indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia. (1)\nBrand name: Cayston\n\nGeneric name: TERAZOSIN HYDROCHLORIDE\nUsage: INDICATIONS AND USAGE Terazosin capsules are indicated for the treatment of symptomatic benign prostatic hyperplasia (BPH). There is a rapid response, with approximately 70% of patients experiencing an increase in urinary flow and improvement in symptoms of BPH when treated with terazosin capsules. 
The long-term effects of terazosin capsules on the incidence of surgery, acute urinary obstruction or other complications of BPH are yet to be determined. Terazosin capsules are also indicated for the treatment of hypertension. Terazosin capsules can be used alone or in combination with other antihypertensive agents such as diuretics or beta-adrenergic blocking agents.\nBrand name: Terazosin\n\n'
Finally, we can create a suffix to our prompt. This will contain the generic name of the drug, its usage, ending with a request for brand names.
suffix_prompt = f"""Generic name: {GENERIC_NAME}
Usage: {USAGE}
Brand names:"""
print(suffix_prompt)
Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.
Brand names:
Let’s pull it all together into a few-shot prompt.
# Define the prompt
few_shot_prompt = prefix_prompt + example_prompt + suffix_prompt
# Print the prompt
print(few_shot_prompt)
Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.
Be creative with the brand names. Don't use English words directly; use variants or invented words.
First, we will provide 3 examples to help with your thought process.
Then, we will provide the generic name and usage for the drug we'd like you to generate brand names for.
Generic name: AMPICILLIN SODIUM
Usage: INDICATIONS AND USAGE Ampicillin for Injection, USP is indicated in the treatment of infections caused by susceptible strains of the designated organisms in the following conditions: Respiratory Tract Infections caused by Streptococcus pneumoniae. Staphylococcus aureus (penicillinase and nonpenicillinase-producing), H. influenzae, and Group A beta-hemolytic streptococci. Bacterial Meningitis caused by E. coli, Group B streptococci, and other Gram-negative bacteria (Listeria monocytogenes, N. meningitidis). The addition of an aminoglycoside with ampicillin may increase its effectiveness against Gram-negative bacteria. Septicemia and Endocarditis caused by susceptible Gram-positive organisms including Streptococcus spp., penicillin G-susceptible staphylococci, and enterococci. Gram-negative sepsis caused by E. coli, Proteus mirabilis and Salmonella spp. responds to ampicillin. Endocarditis due to enterococcal strains usually respond to intravenous therapy. The addition of an aminoglycoside may enhance the effectiveness of ampicillin when treating streptococcal endocarditis. Urinary Tract Infections caused by sensitive strains of E. coli and Proteus mirabilis. Gastrointestinal Infections caused by Salmonella typhi (typhoid fever), other Salmonella spp., and Shigella spp. (dysentery) usually respond to oral or intravenous therapy. Bacteriology studies to determine the causative organisms and their susceptibility to ampicillin should be performed. Therapy may be instituted prior to obtaining results of susceptibility testing. It is advisable to reserve the parenteral form of this drug for moderately severe and severe infections and for patients who are unable to take the oral forms. A change to oral ampicillin may be made as soon as appropriate. 
To reduce the development of drug-resistant bacteria and maintain the effectiveness of Ampicillin for Injection, USP and other antibacterial drugs, Ampicillin for Injection, USP should be used only to treat or prevent infections that are proven or strongly suspected to be caused by susceptible bacteria. When culture and susceptibility information are available, they should be considered in selecting or modifying antibacterial therapy. In the absence of such data, local epidemiology and susceptibility patterns may contribute to the empiric selection of therapy. Indicated surgical procedures should be performed.
Brand name: Ampicillin
Generic name: AZTREONAM
Usage: 1 INDICATIONS AND USAGE CAYSTON® is indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia [see Clinical Studies (14) ]. To reduce the development of drug-resistant bacteria and maintain the effectiveness of CAYSTON and other antibacterial drugs, CAYSTON should be used only to treat patients with CF known to have Pseudomonas aeruginosa in the lungs. CAYSTON is a monobactam antibacterial indicated to improve respiratory symptoms in cystic fibrosis (CF) patients with Pseudomonas aeruginosa. Safety and effectiveness have not been established in pediatric patients below the age of 7 years, patients with FEV1 <25% or >75% predicted, or patients colonized with Burkholderia cepacia. (1)
Brand name: Cayston
Generic name: TERAZOSIN HYDROCHLORIDE
Usage: INDICATIONS AND USAGE Terazosin capsules are indicated for the treatment of symptomatic benign prostatic hyperplasia (BPH). There is a rapid response, with approximately 70% of patients experiencing an increase in urinary flow and improvement in symptoms of BPH when treated with terazosin capsules. The long-term effects of terazosin capsules on the incidence of surgery, acute urinary obstruction or other complications of BPH are yet to be determined. Terazosin capsules are also indicated for the treatment of hypertension. Terazosin capsules can be used alone or in combination with other antihypertensive agents such as diuretics or beta-adrenergic blocking agents.
Brand name: Terazosin
Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.
Brand names:
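The three parts can also be assembled by one reusable function, which makes it easier to try different example sets. This is a sketch, not the notebook’s own code: `build_few_shot_prompt` is a hypothetical helper, and the example dictionary below holds placeholder values rather than rows from the FDA dataset.

```python
from typing import Dict, List


def build_few_shot_prompt(
    prefix: str,
    examples: List[Dict[str, str]],
    generic_name: str,
    usage: str,
) -> str:
    """Join the prefix, the worked examples, and the final query."""
    example_blocks = [
        f"Generic name: {ex['generic_name']}\n"
        f"Usage: {ex['usage']}\n"
        f"Brand name: {ex['brand_name']}\n\n"
        for ex in examples
    ]
    # The suffix ends with an open "Brand names:" line for the model to complete.
    suffix = f"Generic name: {generic_name}\nUsage: {usage}\nBrand names:"
    return prefix + "".join(example_blocks) + suffix


# Placeholder example, for illustration only.
demo_examples = [
    {
        "generic_name": "EXAMPLE GENERIC",
        "usage": "Example usage text.",
        "brand_name": "Examplex",
    }
]
print(build_few_shot_prompt("Instructions go here.\n", demo_examples,
                            "Entropofloxacin", "Treats bacterial infections."))
```

Each example ends with a singular "Brand name:" line while the final query ends with "Brand names:", which mirrors the prompt printed above.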
Now, let’s pass our prompt to the LLM, and get a response!
response = predict(few_shot_prompt)
Markdown(response)
Aerion: (Derived from “aer” meaning air)
Aquazone: (Combining “aqua” for water and “zone” for area)
Biosphere: (Inspired by the concept of a self-contained ecosystem)
Celestial: (Evoking the vastness and healing power of the universe)
Ethereal: (Conveying a sense of lightness and transcendence)
Luminary: (From “lumen” meaning light, symbolizing hope and healing)
Quasar: (Inspired by the powerful and distant cosmic objects)
Sanctuary: (Creating a sense of safety and refuge)
Zenith: (Reaching the highest point or peak)
Zephyr: (Named after the gentle west wind, representing a calming and soothing effect)
Bulk generation#
Let’s take these experiments to the next level by generating many names in bulk. We’ll see how to leverage BigFrames at scale!
We can start by finding drugs that are missing brand names, which we identify as rows where the brand name is identical to the generic name. There are approximately 4,000 drugs that meet this criterion. We’ll put a limit of 100 in this notebook.
# Query 3 columns of interest from drug label dataset
df_missing = bpd.read_gbq(
    "bigquery-public-data.fda_drug.drug_label",
    columns=["openfda_generic_name", "openfda_brand_name", "indications_and_usage"],
)
# Exclude any rows with missing data
df_missing = df_missing.dropna()
# Include rows in which openfda_brand_name equals openfda_generic_name
df_missing = df_missing[df_missing["openfda_generic_name"] == df_missing["openfda_brand_name"]]
# Limit the number of rows for demonstration purposes
df_missing = df_missing.head(100)
# Print values
df_missing.head()
| | openfda_generic_name | openfda_brand_name | indications_and_usage |
|---|---|---|---|
| 89 | MEPHITIS MEPHITICA | MEPHITIS MEPHITICA | INDICATIONS Condition listed above or as direc... |
| 105 | ONDANSETRON | ONDANSETRON | 1 INDICATIONS AND USAGE Ondansetron Injection,... |
| 124 | CLOFARABINE | CLOFARABINE | 1 INDICATIONS AND USAGE Clofarabine injection ... |
| 273 | ACETAMINOPHEN AND DIPHENHYDRAMINE HYDROCHLORIDE | ACETAMINOPHEN AND DIPHENHYDRAMINE HYDROCHLORIDE | Uses Temporary relief of occasional headaches ... |
| 284 | OFLOXACIN | OFLOXACIN | INDICATIONS AND USAGE To reduce the developmen... |
5 rows × 3 columns
We will create a column prompt with a customized prompt for each row.
df_missing["prompt"] = (
    # Note the trailing space so the two instruction sentences do not run together.
    "Provide a unique and modern brand name related to this pharmaceutical drug. "
    + "Don't use English words directly; use variants or invented words. The generic name is: "
    + df_missing["openfda_generic_name"]
    + ". The indications and usage are: "
    + df_missing["indications_and_usage"]
    + "."
)
We’ll create a new helper function, batch_predict(), and query the LLM. The job may take a couple of minutes to execute.
def batch_predict(
    input: bpd.Series, temperature: float = TEMPERATURE
) -> bpd.Series:
    # Takes the column of prompts and returns the column of generated text.
    return model.predict(input, temperature=temperature).ml_generate_text_llm_result
response = batch_predict(df_missing["prompt"])
Let’s check the results for one of our responses!
# Pick a sample
k = 0
# Gather the prompt and response details
prompt_generic = df_missing["openfda_generic_name"].iloc[k]
prompt_usage = df_missing["indications_and_usage"].iloc[k]
response_str = response.iloc[k]
# Print details
print(f"Generic name: {prompt_generic}")
print(f"Usage: {prompt_usage}")
print(f"Response: {response_str}")
Generic name: MEPHITIS MEPHITICA
Usage: INDICATIONS Condition listed above or as directed by the physician
Response: **Ephemeral** (Latin root: "ephemerus," meaning "lasting for a day")
**Aetheria** (Greek root: "aither," meaning "upper air, sky")
**Zenithar** (Combination of "zenith" and "pharma")
**Celestian** (Latin root: "celestial," meaning "heavenly")
**Astralux** (Combination of "astral" and "lux," meaning "light")
Congratulations! You have learned how to use generative AI to jumpstart the creative process.
You’ve also seen how BigFrames can manage each step of the process, including gathering data, data manipulation, and querying the LLM.