bigframes.pandas.read_gbq_function

bigframes.pandas.read_gbq_function(function_name: str, is_row_processor: bool = False)

Loads a function from BigQuery so that it can be applied to a DataFrame or Series.

Note

The return type of the function must be explicitly specified in the function’s original definition, even if it is not otherwise required.

BigQuery Utils provides many public functions under the bqutil project on Google Cloud (see: GoogleCloudPlatform/bigquery-utils). You can check out Community UDFs to use community-contributed functions (see: GoogleCloudPlatform/bigquery-utils).

Examples:

Use the cw_lower_case_ascii_only function from Community UDFs (see: GoogleCloudPlatform/bigquery-utils).

>>> import bigframes.pandas as bpd
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")

You can run it on scalar input. You would usually do this to verify that it works as expected before applying it to all values in a Series.

>>> func('AURÉLIE')
'aurÉlie'
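The doctest outputs in this section suggest that cw_lower_case_ascii_only lowercases only the ASCII letters A to Z and leaves other characters, such as É, untouched. Having a local expectation makes the scalar check more meaningful. The sketch below assumes those semantics (inferred from the example outputs, not from the UDF's source); lower_ascii_only is a hypothetical helper, not part of bigframes.

```python
def lower_ascii_only(text: str) -> str:
    """Hypothetical local stand-in for bqutil.fn.cw_lower_case_ascii_only.

    Assumption: only ASCII 'A'..'Z' are lowercased; every other character,
    including accented letters such as 'É', is returned unchanged.
    """
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch  # 'A' + 32 == 'a'
        for ch in text
    )


# Expected values taken from the doctest outputs in this section.
for raw, expected in [("AURÉLIE", "aurÉlie"), ("CÉLESTINE", "cÉlestine"), ("DAPHNÉ", "daphnÉ")]:
    assert lower_ascii_only(raw) == expected
```

If the remote call and this local stand-in disagree on some input, that points to a difference in the assumed semantics, not necessarily a bug.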

You can apply it to a BigQuery DataFrames Series.

>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
   id       name
0   1    AURÉLIE
1   2  CÉLESTINE
2   3     DAPHNÉ

[3 rows x 2 columns]
>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
   id       name   new_name
0   1    AURÉLIE    aurÉlie
1   2  CÉLESTINE  cÉlestine
2   3     DAPHNÉ     daphnÉ

[3 rows x 3 columns]

You can also use a function with multiple inputs, for example cw_regexp_replace_5 from Community UDFs (see: GoogleCloudPlatform/bigquery-utils).

>>> func = bpd.read_gbq_function("bqutil.fn.cw_regexp_replace_5")
>>> func('TestStr123456', 'Str', 'Cad$', 1, 1)
'TestCad$123456'
>>> df = bpd.DataFrame({
...     "haystack" : ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
...     "regexp" : ["Str", "Str", "Str"],
...     "replacement" : ["Cad$", "Cad$", "Cad$"],
...     "offset" : [1, 1, 1],
...     "occurrence" : [1, 2, 1]
... })
>>> df
           haystack regexp replacement  offset  occurrence
0     TestStr123456    Str        Cad$       1           1
1  TestStr123456Str    Str        Cad$       1           2
2  TestStr123456Str    Str        Cad$       1           1

[3 rows x 5 columns]
>>> df.apply(func, axis=1)
0       TestCad$123456
1    TestStr123456Cad$
2    TestCad$123456Str
dtype: string
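In this example, the five columns of df line up, left to right, with the function's five parameters (haystack, regexp, replacement, offset, occurrence). The sketch below emulates that row-wise call locally by unpacking each row into positional arguments. It assumes positional column-to-parameter mapping, and it assumes the UDF replaces only the occurrence-th match (1-based) found at or after the 1-based offset. Both assumptions are inferred from the three example rows, not from the UDF source. regexp_replace_5 is a hypothetical name.

```python
import re


def regexp_replace_5(haystack: str, regexp: str, replacement: str,
                     offset: int, occurrence: int) -> str:
    """Hypothetical local stand-in for bqutil.fn.cw_regexp_replace_5.

    Assumed semantics: start searching at the 1-based `offset` and replace
    only the `occurrence`-th match (1-based). The replacement is spliced in
    literally, so the '$' in 'Cad$' is not treated as a group reference.
    """
    matches = list(re.compile(regexp).finditer(haystack, offset - 1))
    if occurrence < 1 or occurrence > len(matches):
        # Assumption: a missing occurrence leaves the input unchanged.
        return haystack
    m = matches[occurrence - 1]
    return haystack[: m.start()] + replacement + haystack[m.end():]


# Columns in the same left-to-right order as the function's parameters.
columns = {
    "haystack": ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
    "regexp": ["Str", "Str", "Str"],
    "replacement": ["Cad$", "Cad$", "Cad$"],
    "offset": [1, 1, 1],
    "occurrence": [1, 2, 1],
}

# Emulate df.apply(func, axis=1): unpack each row into positional arguments.
rows = zip(*columns.values())
result = [regexp_replace_5(*row) for row in rows]
print(result)  # ['TestCad$123456', 'TestStr123456Cad$', 'TestCad$123456Str']
```

Because the mapping is assumed to be positional, reordering the DataFrame's columns would change which value reaches which parameter.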

Another use case is to define your own remote function and use it later. For example, define the remote function:

>>> @bpd.remote_function(cloud_function_service_account="default")
... def tenfold(num: int) -> float:
...     return num * 10

Then, read back the deployed BigQuery remote function:

>>> tenfold_ref = bpd.read_gbq_function(
...     tenfold.bigframes_remote_function,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
    a   b   c
0   1   3   5
1   2   4   6

[2 rows x 3 columns]
>>> df['a'].apply(tenfold_ref)
0    10.0
1    20.0
Name: a, dtype: Float64

Row processing is also supported via is_row_processor=True. Note that a row processor takes exactly one input parameter, which receives the entire row.

>>> import pandas as pd
>>> @bpd.remote_function(cloud_function_service_account="default")
... def row_sum(s: pd.Series) -> float:
...     return s['a'] + s['b'] + s['c']
>>> row_sum_ref = bpd.read_gbq_function(
...     row_sum.bigframes_remote_function,
...     is_row_processor=True,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
    a   b   c
0   1   3   5
1   2   4   6

[2 rows x 3 columns]
>>> df.apply(row_sum_ref, axis=1)
0     9.0
1    12.0
dtype: Float64
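Because row_sum only reads values by column label, one way to sanity-check its arithmetic before deploying is to call an undecorated copy on plain dicts that stand in for the pandas Series rows. This is a local sketch only: a dict is not a pandas Series, and the stand-in works here only because the body uses nothing beyond label lookup. row_sum_local is a hypothetical name.

```python
def row_sum_local(s) -> float:
    # Same arithmetic as row_sum above, but undecorated so it runs locally.
    # `s` is any mapping indexed by column label; a dict stands in for the
    # pandas Series that a row processor actually receives.
    # float() mirrors the declared return type.
    return float(s["a"] + s["b"] + s["c"])


# Rows of the example DataFrame {'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}.
data = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
rows = [dict(zip(data, values)) for values in zip(*data.values())]

sums = [row_sum_local(row) for row in rows]
print(sums)  # [9.0, 12.0]
```

These values match the 9.0 and 12.0 returned by df.apply(row_sum_ref, axis=1).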

Parameters:
  • function_name (str) – The function’s name in BigQuery in the format project_id.dataset_id.function_name, or dataset_id.function_name to load from the default project, or function_name to load from the default project and the dataset associated with the current session.

  • is_row_processor (bool, default False) – Whether the function is a row processor. Set this to True for a function that receives an entire DataFrame row as a pandas Series.
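The three accepted forms of function_name differ only in how many leading components are filled in from defaults. The sketch below illustrates that expansion. resolve_function_name, default_project and session_dataset are hypothetical names chosen for illustration. They are not part of the bigframes API, and the library performs its own resolution internally.

```python
def resolve_function_name(function_name: str, default_project: str,
                          session_dataset: str) -> str:
    """Hypothetical helper: expand function_name to project.dataset.function."""
    parts = function_name.split(".")
    # Naive split: does not handle backtick-quoted names or domain-scoped
    # project IDs such as "example.com:my-project", which contain a dot.
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"unexpected function name: {function_name!r}")
    if len(parts) == 3:      # project_id.dataset_id.function_name
        project, dataset, name = parts
    elif len(parts) == 2:    # dataset_id.function_name: use the default project
        project = default_project
        dataset, name = parts
    else:                    # function_name: default project + session dataset
        project, dataset, name = default_project, session_dataset, parts[0]
    return f"{project}.{dataset}.{name}"


print(resolve_function_name("bqutil.fn.cw_lower_case_ascii_only", "my-project", "my_session_ds"))
# bqutil.fn.cw_lower_case_ascii_only
print(resolve_function_name("fn.tenfold", "my-project", "my_session_ds"))
# my-project.fn.tenfold
print(resolve_function_name("tenfold", "my-project", "my_session_ds"))
# my-project.my_session_ds.tenfold
```

The fully qualified form is the only one that does not depend on session defaults, which is why the Community UDF examples use it.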

Returns:

A callable object that references the function loaded from BigQuery.

The object is similar to the one created by the remote_function decorator: it has the bigframes_remote_function property but not the bigframes_cloud_function property.

Return type:

collections.abc.Callable