bigframes.pandas.read_gbq_function
- bigframes.pandas.read_gbq_function(function_name: str, is_row_processor: bool = False)
Loads a function that already exists in BigQuery so that it can be applied to a DataFrame or Series.
Note
The return type of the function must be explicitly specified in the function’s original definition (its RETURNS clause), even where BigQuery would not otherwise require it.
BigQuery Utils provides many public functions under the bqutil project on Google Cloud Platform (see: GoogleCloudPlatform/bigquery-utils). You can check out Community UDFs to use community-contributed functions (see: GoogleCloudPlatform/bigquery-utils).
Examples:
Use the [cw_lower_case_ascii_only](GoogleCloudPlatform/bigquery-utils) function from Community UDFs.
>>> import bigframes.pandas as bpd
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")
You can run it on scalar input. Usually you would do so to verify that it works as expected before applying it to all values in a Series.
>>> func('AURÉLIE')
'aurÉlie'
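To see why `'É'` survives unchanged, here is a minimal pure-Python sketch of the behavior the function's name suggests: only the ASCII letters `A` to `Z` are lowercased, and every other character passes through. The helper `lower_ascii_only` is hypothetical and is not the UDF's actual definition; it is only meant for reasoning about expected outputs locally.

```python
def lower_ascii_only(text: str) -> str:
    """Lowercase ASCII letters A-Z; leave every other character untouched."""
    # Hypothetical local stand-in for bqutil.fn.cw_lower_case_ascii_only.
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


if __name__ == "__main__":
    print(lower_ascii_only("AURÉLIE"))  # aurÉlie
    print("AURÉLIE".lower())            # aurélie (Unicode-aware, so É changes too)
```

Contrasting it with Python's built-in `str.lower()` makes the difference visible: the built-in is Unicode-aware and also lowercases `É`.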
You can apply it to a BigQuery DataFrames Series.
>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
   id       name
0   1    AURÉLIE
1   2  CÉLESTINE
2   3     DAPHNÉ
[3 rows x 2 columns]
>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
   id       name   new_name
0   1    AURÉLIE    aurÉlie
1   2  CÉLESTINE  cÉlestine
2   3     DAPHNÉ     daphnÉ
[3 rows x 3 columns]
You can even use a function with multiple inputs. For example, [cw_regexp_replace_5](GoogleCloudPlatform/bigquery-utils) from Community UDFs.
>>> func = bpd.read_gbq_function("bqutil.fn.cw_regexp_replace_5")
>>> func('TestStr123456', 'Str', 'Cad$', 1, 1)
'TestCad$123456'
>>> df = bpd.DataFrame({
...     "haystack" : ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
...     "regexp" : ["Str", "Str", "Str"],
...     "replacement" : ["Cad$", "Cad$", "Cad$"],
...     "offset" : [1, 1, 1],
...     "occurrence" : [1, 2, 1]
... })
>>> df
           haystack  regexp  replacement  offset  occurrence
0     TestStr123456     Str         Cad$       1           1
1  TestStr123456Str     Str         Cad$       1           2
2  TestStr123456Str     Str         Cad$       1           1
[3 rows x 5 columns]
>>> df.apply(func, axis=1)
0       TestCad$123456
1    TestStr123456Cad$
2    TestCad$123456Str
dtype: string
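The outputs above are consistent with the following reading of the five arguments: replace only the `occurrence`-th match of `regexp`, searching from the 1-based character position `offset`, with each DataFrame row's values passed to the function in column order. The sketch below emulates that reading with Python's `re` module. `regexp_replace_nth` and `apply_rows` are hypothetical helpers, and the real UDF may behave differently in edge cases such as an `occurrence` of 0 or a pattern with no match.

```python
import re
from typing import Callable, Dict, List, Sequence


def regexp_replace_nth(haystack: str, regexp: str, replacement: str,
                       offset: int, occurrence: int) -> str:
    """Replace the occurrence-th match of regexp, searching from 1-based offset."""
    start = offset - 1  # convert the 1-based position to a Python index
    matches = list(re.finditer(regexp, haystack[start:]))
    if occurrence < 1 or occurrence > len(matches):
        return haystack  # assumption: no such occurrence leaves the input as-is
    m = matches[occurrence - 1]
    # Splice with slicing so a "$" in the replacement stays literal.
    return haystack[:start + m.start()] + replacement + haystack[start + m.end():]


def apply_rows(columns: Dict[str, Sequence], func: Callable) -> List:
    """Call func once per row, passing that row's values in column order."""
    return [func(*row) for row in zip(*columns.values())]


columns = {
    "haystack": ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
    "regexp": ["Str", "Str", "Str"],
    "replacement": ["Cad$", "Cad$", "Cad$"],
    "offset": [1, 1, 1],
    "occurrence": [1, 2, 1],
}

print(apply_rows(columns, regexp_replace_nth))
# ['TestCad$123456', 'TestStr123456Cad$', 'TestCad$123456Str']
```

Because the column names match the parameter names one-to-one, the dictionary's insertion order is what lines each row's values up with the function's parameters in this sketch.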
Another use case is to define your own remote function and use it later. For example, define the remote function:
>>> @bpd.remote_function(cloud_function_service_account="default")
... def tenfold(num: int) -> float:
...     return num * 10
Then, read back the deployed BigQuery remote function:
>>> tenfold_ref = bpd.read_gbq_function(
...     tenfold.bigframes_remote_function,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
   a  b  c
0  1  3  5
1  2  4  6
[2 rows x 3 columns]
>>> df['a'].apply(tenfold_ref)
0    10.0
1    20.0
Name: a, dtype: Float64
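The remote function's input and output types are taken from its Python type hints (`num: int`, `-> float`), which is consistent with the integer column `a` coming back as `Float64`. As a rough local illustration, the sketch below reads those hints with `typing.get_type_hints` and renders them as a BigQuery-style signature. The `PY_TO_BQ` table and the `bq_signature` helper are simplified assumptions for illustration, not the mapping BigQuery DataFrames uses internally.

```python
from typing import get_type_hints

# Simplified, assumed mapping from Python scalar types to BigQuery SQL types.
PY_TO_BQ = {int: "INT64", float: "FLOAT64", str: "STRING", bool: "BOOL"}


def tenfold(num: int) -> float:
    # Plain local copy of the example function, without the decorator.
    return num * 10


def bq_signature(func) -> str:
    """Render a function's annotations as a BigQuery-style signature string."""
    hints = get_type_hints(func)
    ret = PY_TO_BQ[hints.pop("return")]
    params = ", ".join(f"{name} {PY_TO_BQ[tp]}" for name, tp in hints.items())
    return f"{func.__name__}({params}) RETURNS {ret}"


print(bq_signature(tenfold))                # tenfold(num INT64) RETURNS FLOAT64
# Casting to float mirrors the declared FLOAT64 return type.
print([float(tenfold(x)) for x in [1, 2]])  # [10.0, 20.0]
```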
It also supports row processing via is_row_processor=True. Note that a row processor must take exactly one input parameter, which receives the entire row.
>>> import pandas as pd
>>> @bpd.remote_function(cloud_function_service_account="default")
... def row_sum(s: pd.Series) -> float:
...     return s['a'] + s['b'] + s['c']
>>> row_sum_ref = bpd.read_gbq_function(
...     row_sum.bigframes_remote_function,
...     is_row_processor=True,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
   a  b  c
0  1  3  5
1  2  4  6
[2 rows x 3 columns]
>>> df.apply(row_sum_ref, axis=1)
0     9.0
1    12.0
dtype: Float64
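Conceptually, a row processor receives the whole row as its single argument and looks up columns by name, in contrast to the multi-input case above, where each column maps to its own parameter. The stdlib-only sketch below mimics that contract, with plain dicts standing in for the pandas Series each row arrives as; `apply_row_processor` is a hypothetical helper, not part of BigQuery DataFrames.

```python
from typing import Callable, Dict, List, Mapping, Sequence


def row_sum(s: Mapping[str, float]) -> float:
    # Same body as the example; a dict stands in for the pandas Series.
    return float(s["a"] + s["b"] + s["c"])


def apply_row_processor(columns: Dict[str, Sequence],
                        func: Callable[[Mapping], float]) -> List[float]:
    """Call func once per row, passing the entire row as one mapping."""
    names = list(columns)
    return [func(dict(zip(names, values))) for values in zip(*columns.values())]


columns = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
print(apply_row_processor(columns, row_sum))  # [9.0, 12.0]
```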
- Parameters:
function_name (str) – The function’s name in BigQuery in the format project_id.dataset_id.function_name, or dataset_id.function_name to load from the default project, or function_name to load from the default project and the dataset associated with the current session.
is_row_processor (bool, default False) – Whether the function is a row processor. Set this to True for a function that receives an entire row of a DataFrame as a pandas Series.
- Returns:
A function object pointing to the BigQuery function read from BigQuery.
The object is similar to the one created by the remote_function decorator, including the bigframes_remote_function property, but not including the bigframes_cloud_function property.
- Return type:
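The three accepted forms of function_name differ only in how many leading components are supplied; the missing ones are filled in from the default project and, for a bare name, the session's dataset. The sketch below illustrates that resolution rule. `resolve_function_name`, `default_project`, and `session_dataset` are hypothetical names used only for illustration; the real lookup happens inside BigQuery DataFrames, and this sketch ignores edge cases such as domain-scoped project IDs that themselves contain dots.

```python
from typing import NamedTuple


class FunctionRef(NamedTuple):
    project: str
    dataset: str
    name: str


def resolve_function_name(function_name: str, default_project: str,
                          session_dataset: str) -> FunctionRef:
    """Expand a 1-, 2- or 3-part function name to a fully qualified reference."""
    parts = function_name.split(".")
    if len(parts) == 3:  # project_id.dataset_id.function_name
        return FunctionRef(*parts)
    if len(parts) == 2:  # dataset_id.function_name in the default project
        return FunctionRef(default_project, *parts)
    if len(parts) == 1:  # function_name in the session's dataset
        return FunctionRef(default_project, session_dataset, parts[0])
    raise ValueError(f"unexpected function name: {function_name!r}")


print(resolve_function_name("bqutil.fn.cw_lower_case_ascii_only",
                            "my-project", "session_ds"))
# FunctionRef(project='bqutil', dataset='fn', name='cw_lower_case_ascii_only')
```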