bigframes.bigquery.ai.classify
- bigframes.bigquery.ai.classify(input: str | Series | List[str | Series] | Tuple[str | Series, ...], categories: tuple[str, ...] | list[str], *, examples: list[tuple[str, str]] | list[tuple[str, list[str] | tuple[str, ...]]] | None = None, connection_id: str | None = None, endpoint: str | None = None, output_mode: Literal['single', 'multi'] | None = None, optimization_mode: Literal['minimize_cost', 'maximize_quality'] | None = None, max_error_ratio: float | None = None) -> Series
Classifies a given input into one of the specified categories. It always returns one of the provided categories, whichever best fits the prompt input.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> df = bpd.DataFrame({'creature': ['Cat', 'Salmon']})
>>> df['type'] = bbq.ai.classify(df['creature'], ['Mammal', 'Fish'])
>>> df
  creature    type
0      Cat  Mammal
1   Salmon    Fish

[2 rows x 2 columns]
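The signature also accepts a tuple or list that mixes string literals with Series. The following is a minimal sketch of that form, not taken from the library's own examples: the wrapper name classify_with_prompt and the literal prefix text are hypothetical, and the import is placed inside the function only so the sketch can be loaded without BigQuery credentials.

```python
def classify_with_prompt(df, column, categories):
    """Classify df[column], prefixing every value with a literal instruction.

    Hypothetical helper for illustration; it only forwards to
    bigframes.bigquery.ai.classify using the documented positional arguments.
    """
    # Imported lazily so that defining this function needs no BigQuery session.
    import bigframes.bigquery as bbq

    # A tuple mixing a string literal and a Series, as the `input` parameter allows.
    prompt = ("Classify this creature: ", df[column])
    return bbq.ai.classify(prompt, list(categories))
```

Calling classify_with_prompt(df, 'creature', ('Mammal', 'Fish')) on the DataFrame from the example would send both parts of the tuple to the model as input; the result depends on the model, so no output is shown here.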
- Parameters:
input (str | Series | List[str|Series] | Tuple[str|Series, ...]) – A mixture of Series and string literals that specifies the input to send to the model. The Series can be BigFrames Series or pandas Series.
categories (tuple[str, ...] | list[str]) – Categories to classify the input into.
examples (list[tuple[str, str]] | list[tuple[str, list[str] | tuple[str, ...]]], optional) – An array that contains representative examples of input strings and the output category that you expect. If output_mode is multi, each example output must be a list or tuple of strings. You can provide examples to help the model understand your intended threshold for a condition with nuanced or subjective logic. We recommend providing at most 5 examples.
connection_id (str, optional) – Specifies the connection to use to communicate with the model. For example, myproject.us.myconnection. If not provided, the query uses your end-user credential.
endpoint (str, optional) – A STRING value that specifies the Vertex AI endpoint to use for the model. You can specify any generally available or preview Gemini model. If you specify the model name, BigQuery ML automatically identifies and uses the full endpoint of the model.
output_mode (Literal["single", "multi"], optional) – A STRING value that indicates whether a single input can be classified into multiple categories. Supported values are single and multi.
optimization_mode (Literal["minimize_cost", "maximize_quality"], optional) – A STRING value that specifies the optimization strategy to use. Supported values are minimize_cost and maximize_quality.
max_error_ratio (float, optional) – A value between 0.0 and 1.0 that contains the maximum acceptable ratio of row-level inference failures to rows processed on this function. The default value is 1.0. This argument isn’t supported when optimization_mode is set to minimize_cost.
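Several of the keyword arguments constrain one another. As a rough illustration, the checks below restate the documented rules in plain Python. The helper check_classify_options is hypothetical and not part of bigframes; the library may validate these options differently, or leave validation to BigQuery.

```python
def check_classify_options(examples=None, output_mode=None,
                           optimization_mode=None, max_error_ratio=None):
    """Validate options against the documented rules; return the non-None ones.

    Hypothetical helper, not a bigframes API.
    """
    if output_mode not in (None, "single", "multi"):
        raise ValueError("output_mode must be 'single' or 'multi'")
    if optimization_mode not in (None, "minimize_cost", "maximize_quality"):
        raise ValueError(
            "optimization_mode must be 'minimize_cost' or 'maximize_quality'")

    # With output_mode='multi', each example output must be a list or tuple of strings.
    if examples is not None and output_mode == "multi":
        for _text, label in examples:
            if not isinstance(label, (list, tuple)):
                raise ValueError("multi-mode example outputs must be a list or tuple")
            if not all(isinstance(item, str) for item in label):
                raise ValueError("multi-mode example outputs must contain strings")

    if max_error_ratio is not None:
        # The ratio must lie between 0.0 and 1.0.
        if not 0.0 <= max_error_ratio <= 1.0:
            raise ValueError("max_error_ratio must be between 0.0 and 1.0")
        # max_error_ratio isn't supported together with minimize_cost.
        if optimization_mode == "minimize_cost":
            raise ValueError(
                "max_error_ratio isn't supported with optimization_mode='minimize_cost'")

    options = {
        "examples": examples,
        "output_mode": output_mode,
        "optimization_mode": optimization_mode,
        "max_error_ratio": max_error_ratio,
    }
    # Drop unset options so the function's own defaults apply.
    return {name: value for name, value in options.items() if value is not None}
```

The returned dictionary could then be unpacked into the call, for example bbq.ai.classify(series, categories, **options), so that malformed combinations fail locally before a query is issued.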
- Returns:
A new series of strings (or a series of arrays of strings if output_mode is specified).
- Return type: