bigframes.geopandas.GeoSeries.apply

GeoSeries.apply(func, by_row: Literal['compat'] | bool = 'compat', *, args: Tuple = ()) → Series

Invoke function on values of a Series.

func can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values. If it is an arbitrary Python function, converting it into a remote_function is recommended.

Examples:

Simple vectorized functions, lambdas or ufuncs can be applied directly with by_row=False.

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> nums = bpd.Series([1, 2, 3, 4])
>>> nums
0    1
1    2
2    3
3    4
dtype: Int64
>>> nums.apply(lambda x: x*x + 2*x + 1, by_row=False)
0     4
1     9
2    16
3    25
dtype: Int64
>>> def is_odd(num):
...     return num % 2 == 1
>>> nums.apply(is_odd, by_row=False)
0     True
1    False
2     True
3    False
dtype: boolean
>>> nums.apply(np.log, by_row=False)
0         0.0
1    0.693147
2    1.098612
3    1.386294
dtype: Float64
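The difference between the two by_row modes can be sketched in plain Python, without BigQuery DataFrames. This is only an illustration of the dispatch idea: apply_sketch is a hypothetical stand-in operating on a list, not the library's implementation, and it ignores the requirement that func be a remote function when by_row="compat".

```python
from typing import Any, Callable, List, Union


def apply_sketch(
    values: List[Any],
    func: Callable[..., Any],
    by_row: Union[str, bool] = "compat",
) -> Any:
    """Hypothetical stand-in for the two dispatch modes (illustration only)."""
    if by_row == "compat":
        # Element-wise: func is called once per value and sees a scalar.
        return [func(v) for v in values]
    if by_row is False:
        # Whole-collection: func is called exactly once with all values.
        return func(values)
    raise ValueError("by_row must be 'compat' or False")


nums = [1, 2, 3, 4]

# A scalar function, called four times.
per_element = apply_sketch(nums, lambda x: x * x + 2 * x + 1)

# A function written against the whole collection, called once.
whole = apply_sketch(
    nums, lambda xs: [x * x + 2 * x + 1 for x in xs], by_row=False
)

print(per_element)
print(whole)
```

Both calls produce the same values here. The practical difference is what func receives: with by_row=False it must be written in terms of operations that work on the whole Series, which is why vectorized expressions and ufuncs fit that mode.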

Use remote_function to apply an arbitrary Python function. Setting reuse=False ensures that a new remote_function is created every time the following code runs. Omit it to reuse a previously deployed remote_function for the same user-defined function, provided the hash of the function definition has not changed.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def minutes_to_hours(x: int) -> float:
...     return x/60
>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0      0
1     30
2     60
3     90
4    120
dtype: Int64
>>> hours = minutes.apply(minutes_to_hours)
>>> hours
0    0.0
1    0.5
2    1.0
3    1.5
4    2.0
dtype: Float64
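The reuse behavior described above can be pictured as a lookup keyed by a hash of the function definition. The sketch below is a conceptual illustration under that assumption only. deploy_sketch, the registry, and the naming scheme are hypothetical, and the way BigQuery DataFrames actually computes the hash or names deployments is not shown in this page.

```python
import hashlib
import itertools
from typing import Dict, Tuple

# Hypothetical registry: definition hash -> name of the latest deployment.
_deployed: Dict[str, str] = {}
_ids = itertools.count(1)


def deploy_sketch(source: str, reuse: bool = True) -> Tuple[str, bool]:
    """Return (deployment_name, created). Illustration only."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if reuse and key in _deployed:
        # Same definition hash and reuse allowed: hand back the existing one.
        return _deployed[key], False
    # Otherwise create a fresh deployment with a unique name.
    name = f"fn-{key[:8]}-{next(_ids)}"
    _deployed[key] = name
    return name, True


src = "def minutes_to_hours(x: int) -> float:\n    return x/60\n"

first, created_first = deploy_sketch(src)               # nothing deployed yet
second, created_second = deploy_sketch(src)             # same hash, reused
third, created_third = deploy_sketch(src, reuse=False)  # forced new deployment
changed, created_changed = deploy_sketch(src.replace("60", "60.0"))  # new hash
```

In this sketch, editing the function body changes the hash and therefore triggers a new deployment even when reuse is left enabled.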

To turn a user-defined function with external package dependencies into a remote_function, pass the package names via the packages parameter.

>>> @bpd.remote_function(
...     reuse=False,
...     packages=["cryptography"],
...     cloud_function_service_account="default"
... )
... def get_hash(input: str) -> str:
...     from cryptography.fernet import Fernet
...
...     # handle missing value
...     if input is None:
...         input = ""
...
...     key = Fernet.generate_key()
...     f = Fernet(key)
...     return f.encrypt(input.encode()).decode()
>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash)
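The missing-value guard in get_hash can be exercised locally before anything is deployed. The sketch below makes one substitution, stated here as an assumption: it uses hashlib.sha256 from the standard library instead of Fernet, so that it needs no third-party package and gives repeatable output. Fernet with a freshly generated key encrypts rather than hashes, so its output differs on every call. sha256_or_empty is a hypothetical helper name.

```python
import hashlib
from typing import Optional


def sha256_or_empty(value: Optional[str]) -> str:
    """Hypothetical local helper: hex digest of value, treating None as ""."""
    # Handle a missing value with the same guard used in get_hash.
    if value is None:
        value = ""
    return hashlib.sha256(value.encode()).hexdigest()


# None stands in for a missing value in the Series.
digests = [sha256_or_empty(v) for v in ["Alice", "Bob", None]]

for d in digests:
    print(d)
```

Because the digest is deterministic, the same input always maps to the same output, which makes the None branch easy to check in isolation.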

A remote function can also return an array.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def text_analyzer(text: str) -> list[int]:
...     words = text.count(" ") + 1
...     periods = text.count(".")
...     exclamations = text.count("!")
...     questions = text.count("?")
...     return [words, periods, exclamations, questions]
>>> texts = bpd.Series([
...     "The quick brown fox jumps over the lazy dog.",
...     "I love this product! It's amazing.",
...     "Hungry? Wanna eat? Lets go!"
... ])
>>> features = texts.apply(text_analyzer)
>>> features
0    [9 1 0 0]
1    [6 1 1 0]
2    [5 0 1 2]
dtype: list<item: int64>[pyarrow]
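Since a remote function body is ordinary Python, its logic can be checked locally before deployment. The sketch below repeats the text_analyzer body without the decorator and runs it on the same three strings. It uses a plain list rather than a Series, so it only validates the per-element logic, not the deployed function.

```python
from typing import List


def text_analyzer(text: str) -> List[int]:
    # Same body as the remote function above, minus the decorator.
    words = text.count(" ") + 1
    periods = text.count(".")
    exclamations = text.count("!")
    questions = text.count("?")
    return [words, periods, exclamations, questions]


texts = [
    "The quick brown fox jumps over the lazy dog.",
    "I love this product! It's amazing.",
    "Hungry? Wanna eat? Lets go!",
]

# Apply the function to each element, as Series.apply would per row.
features = [text_analyzer(t) for t in texts]
print(features)
```

The local results match the array values shown in the Series output above.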

Parameters:
  • func (function) – BigQuery DataFrames remote_function to apply. The function should take a scalar and return a scalar. It is applied to every element of the Series.

  • by_row (False or "compat", default "compat") – If "compat", func must be a remote function, which is passed each element of the Series, like Series.map. If False, func is passed the whole Series at once.

Returns:

A new Series with values representing the return value of the func applied to each element of the original Series.

Return type:

bigframes.pandas.Series