bigframes.bigquery.ai.generate_embedding

bigframes.bigquery.ai.generate_embedding(model_name: str, data: DataFrame | Series | DataFrame | Series, *, output_dimensionality: int | None = None, task_type: str | None = None, start_second: float | None = None, end_second: float | None = None, interval_seconds: float | None = None, trial_id: int | None = None) → DataFrame

Creates embeddings that describe an entity—for example, a piece of text or an image.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> df = bpd.DataFrame({"content": ["apple", "bear", "pear"]})
>>> bbq.ai.generate_embedding(
...     "project.dataset.model_name",
...     df
... )

Parameters:
  • model_name (str) – The name of a remote model from Vertex AI, such as the multimodalembedding@001 model.

  • data (bigframes.pandas.DataFrame or bigframes.pandas.Series) – The data to generate embeddings for. If a Series is provided, it is treated as the ‘content’ column. If a DataFrame is provided, it must contain a ‘content’ column, or you must rename the column you wish to embed to ‘content’.

  • output_dimensionality (int, optional) – The number of dimensions to use when generating embeddings. For example, if you specify output_dimensionality=256, then the embedding output column contains a 256-dimensional embedding for each input value. To find the supported range of output dimensions, read about the available Google text embedding models.

  • task_type (str, optional) – A string that specifies the intended downstream application, which helps the model produce better-quality embeddings. For a list of supported task types and how to choose which one to use, see Choose an embeddings task type.

  • start_second (float, optional) – The second in the video at which to start the embedding. The default value is 0.

  • end_second (float, optional) – The second in the video at which to end the embedding. The default value is 120.

  • interval_seconds (float, optional) – The interval, in seconds, to use when creating embeddings for video content. The default value is 16.

  • trial_id (int, optional) – An INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.
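
As noted for the data parameter, the column to embed must be named 'content'. The following is a minimal sketch of one way to handle that, not part of the library: embed_column is a hypothetical helper name, and it assumes the input is a bigframes.pandas.DataFrame whose rename method accepts a columns mapping.

```python
def embed_column(df, column, model_name, **options):
    """Hypothetical helper: embed one column of a BigQuery DataFrames DataFrame."""
    # Imported lazily so this sketch can be defined without bigframes installed.
    import bigframes.bigquery as bbq

    # generate_embedding reads the 'content' column, so rename the chosen
    # column first. All other columns keep their names.
    prepared = df.rename(columns={column: "content"})

    # Keyword-only options such as output_dimensionality or task_type
    # are forwarded unchanged.
    return bbq.ai.generate_embedding(model_name, prepared, **options)
```

A call might look like embed_column(df, "review", "project.dataset.model_name", output_dimensionality=256), where "review" is a made-up column name. If the DataFrame already has a 'content' column, drop or rename it first, because this sketch does not check for a name clash.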

Returns:

A new DataFrame with the generated embeddings. See the SQL reference for AI.GENERATE_EMBEDDING for details.

Return type:

bigframes.pandas.DataFrame