bigframes.bigquery.ai.embed
- bigframes.bigquery.ai.embed(content: str | Series | Series, *, endpoint: str | None = None, model: str | None = None, task_type: Literal['retrieval_query', 'retrieval_document', 'semantic_similarity', 'classification', 'clustering', 'question_answering', 'fact_verification', 'code_retrieval_query'] | None = None, title: str | None = None, model_params: Mapping[Any, Any] | None = None, connection_id: str | None = None) → Series
Creates embeddings from text or image data in BigQuery.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bbq.ai.embed("dog", endpoint="text-embedding-005")
0    {'result': array([ 1.78243860e-03, -1.10658340...

>>> s = bpd.Series(['dog'])
>>> bbq.ai.embed(s, endpoint='text-embedding-005')
0    {'result': array([ 1.78243860e-03, -1.10658340...
- Parameters:
content (str | Series) – A string literal or a Series (either a BigFrames Series or a pandas Series) that provides the text or image to embed.
endpoint (str, optional) – A string value that specifies a supported Vertex AI embedding model endpoint to use. The endpoint value that you specify must include the model version, for example, "text-embedding-005". If you specify this parameter, you can’t specify the model parameter.
model (str, optional) – A string value that specifies a built-in embedding model. The only supported value is "embeddinggemma-300m". If you specify this parameter, you can’t specify the endpoint, title, model_params, or connection_id parameters.
task_type (str, optional) – A string literal that specifies the intended downstream application to help the model produce better quality embeddings. Accepts "retrieval_query", "retrieval_document", "semantic_similarity", "classification", "clustering", "question_answering", "fact_verification", "code_retrieval_query".
title (str, optional) – A string value that specifies the document title, which the model uses to improve embedding quality. You can only use this parameter if you specify "retrieval_document" for the task_type value.
model_params (Mapping[Any, Any], optional) – A JSON literal that provides additional parameters to the model. For example, {"outputDimensionality": 768} lets you specify the number of dimensions to use when generating embeddings.
connection_id (str, optional) – A STRING value specifying the connection to use to communicate with the model, in the format PROJECT_ID.LOCATION.CONNECTION_ID. For example, myproject.us.myconnection. If not provided, the query uses your end-user credential.
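The constraints between these parameters can be checked locally before a query is issued. The following is a minimal sketch, not part of bigframes: `check_embed_args` is a hypothetical helper that encodes only the two rules stated above (`model` excludes `endpoint`, `title`, `model_params`, and `connection_id`; `title` requires `task_type="retrieval_document"`). The library may enforce these rules differently or raise different errors.

```python
from typing import Any, Mapping, Optional


def check_embed_args(
    *,
    endpoint: Optional[str] = None,
    model: Optional[str] = None,
    task_type: Optional[str] = None,
    title: Optional[str] = None,
    model_params: Optional[Mapping[Any, Any]] = None,
    connection_id: Optional[str] = None,
) -> None:
    """Hypothetical pre-flight check mirroring the documented constraints.

    Raises ValueError when an argument combination contradicts the
    parameter descriptions; returns None otherwise.
    """
    if model is not None:
        # Per the docs, `model` cannot be combined with these parameters.
        excluded = {
            "endpoint": endpoint,
            "title": title,
            "model_params": model_params,
            "connection_id": connection_id,
        }
        given = [name for name, value in excluded.items() if value is not None]
        if given:
            raise ValueError(
                "model cannot be combined with: " + ", ".join(given)
            )

    # Per the docs, `title` is only valid for the retrieval_document task.
    if title is not None and task_type != "retrieval_document":
        raise ValueError('title requires task_type="retrieval_document"')


# Valid: endpoint with a document title and the matching task type.
check_embed_args(
    endpoint="text-embedding-005",
    task_type="retrieval_document",
    title="Dog breeds",
)
```

Keeping the check keyword-only mirrors the signature of `embed`, so the same keyword arguments can be passed to both.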
- Returns:
A new struct Series with the result data. The struct contains these fields:
* “result”: an ARRAY<FLOAT64> value containing the generated embeddings.
* “status”: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
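Because each row carries its own “status”, callers will usually want to drop failed rows before using the vectors. The sketch below assumes the struct Series has already been materialized into plain Python dict-like rows (for example after `to_pandas()`); the helper names `successful_embeddings` and `cosine_similarity` are hypothetical, and the sample rows are made-up illustrative values, not real model output.

```python
import math
from typing import Any, List, Mapping, Sequence


def successful_embeddings(rows: Sequence[Mapping[str, Any]]) -> List[List[float]]:
    """Return the "result" vectors of rows whose "status" is empty (success)."""
    return [list(row["result"]) for row in rows if not row["status"]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Illustrative rows shaped like the documented struct (values are invented).
rows = [
    {"result": [1.0, 0.0], "status": ""},
    {"result": [], "status": "example error message"},
    {"result": [0.0, 1.0], "status": ""},
]

vectors = successful_embeddings(rows)
similarity = cosine_similarity(vectors[0], vectors[1])
```

Filtering on an empty “status” follows the field description above; what a non-empty status string looks like depends on the API response and is not specified here.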
- Return type: