bigframes.bigquery.ai.similarity

bigframes.bigquery.ai.similarity(content1: str | bigframes.series.Series | pandas.Series, content2: str | bigframes.series.Series | pandas.Series, *, endpoint: str | None = None, model: str | None = None, model_params: Mapping[Any, Any] | None = None, connection_id: str | None = None) -> bigframes.series.Series

Returns a FLOAT64 value that represents the cosine similarity between the two inputs.
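For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of cosine similarity on two small vectors. It assumes the function compares embedding vectors of the two inputs, which the embedding-model parameters suggest but this page does not spell out. The helper name cosine_similarity and the toy vectors are illustrative only and are not part of bigframes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score 0, so a higher value indicates more similar inputs.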

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> df = bpd.DataFrame({'word': ['happy', 'sad']})
>>> bbq.ai.similarity(df['word'], 'glad', endpoint='text-embedding-005')
0    0.916601
1    0.660579

Parameters:
  • content1 (str | Series) – A string or series that provides the first value to compare. Either a BigFrames Series or a pandas Series is allowed.

  • content2 (str | Series) – A string or series that provides the second value to compare. Either a BigFrames Series or a pandas Series is allowed.

  • endpoint (str, optional) – Specifies the Vertex AI endpoint to use for the text embedding model. If you specify the model name, such as 'text-embedding-005', rather than a URL, then BigQuery ML automatically identifies the model and uses the model’s full endpoint.

  • model (str, optional) – Specifies a built-in text embedding model. The only supported value is 'embeddinggemma-300m'. If you specify this parameter, you can’t specify the endpoint, model_params, or connection_id parameters.

  • model_params (Mapping[Any, Any], optional) – Provides additional parameters to the model. You can use any of the fields of the model’s parameters object. One of these fields, outputDimensionality, lets you specify the number of dimensions to use when generating embeddings.

  • connection_id (str, optional) – Specifies the connection to use to communicate with the model. For example, myproject.us.myconnection.
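The rule that model cannot be combined with endpoint, model_params, or connection_id can be sketched as a small pre-flight check. The helper find_conflicting_args below is hypothetical and not part of bigframes; how the library itself validates or reports such conflicts is not documented here. The value 256 for outputDimensionality is an arbitrary illustration.

```python
from typing import Any, List, Mapping, Optional


def find_conflicting_args(
    *,
    endpoint: Optional[str] = None,
    model: Optional[str] = None,
    model_params: Optional[Mapping[Any, Any]] = None,
    connection_id: Optional[str] = None,
) -> List[str]:
    """Return the names of arguments that may not be combined with `model`.

    Hypothetical helper mirroring the documented rule; an empty list means
    the combination is allowed under that rule.
    """
    if model is None:
        # Without `model`, the other three parameters can be mixed freely.
        return []
    others = {
        "endpoint": endpoint,
        "model_params": model_params,
        "connection_id": connection_id,
    }
    return sorted(name for name, value in others.items() if value is not None)


# An endpoint with extra model parameters and a connection: no conflict.
endpoint_call = find_conflicting_args(
    endpoint="text-embedding-005",
    model_params={"outputDimensionality": 256},  # assumed example value
    connection_id="myproject.us.myconnection",
)

# The built-in model on its own: no conflict.
model_only_call = find_conflicting_args(model="embeddinggemma-300m")

# The built-in model together with an endpoint: flagged as a conflict.
mixed_call = find_conflicting_args(
    model="embeddinggemma-300m",
    endpoint="text-embedding-005",
)
```

Running a check like this before calling bbq.ai.similarity keeps the keyword arguments consistent with the constraint described for the model parameter.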

Returns:

A new series of FLOAT64 values representing the cosine similarity.

Return type:

bigframes.series.Series