bigframes.pandas.Series.apply
- Series.apply(func, by_row: Literal['compat'] | bool = 'compat', *, args: Tuple = ()) -> Series
Invoke function on values of a Series.
func can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values. For an arbitrary Python function, converting it into a remote_function is recommended.
Examples:
Simple vectorized functions, lambdas or ufuncs can be applied directly with by_row=False.
>>> import bigframes.pandas as bpd
>>> import numpy as np

>>> nums = bpd.Series([1, 2, 3, 4])
>>> nums
0    1
1    2
2    3
3    4
dtype: Int64
>>> nums.apply(lambda x: x*x + 2*x + 1, by_row=False)
0     4
1     9
2    16
3    25
dtype: Int64

>>> def is_odd(num):
...     return num % 2 == 1
>>> nums.apply(is_odd, by_row=False)
0     True
1    False
2     True
3    False
dtype: boolean

>>> nums.apply(np.log, by_row=False)
0         0.0
1    0.693147
2    1.098612
3    1.386294
dtype: Float64
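The examples above work with by_row=False because each function uses only operations that are defined for a whole Series. The following is a minimal sketch of that idea in plain Python, not the bigframes implementation: `ToyColumn`, `apply_whole`, and `parity` are hypothetical names introduced only for illustration. It shows an arithmetic-only function succeeding when called once on the whole column, and a function that branches on its argument failing in that mode while still working element by element.

```python
from typing import Callable, List


class ToyColumn:
    """Hypothetical stand-in for a Series: arithmetic applies to every element."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)

    def _combine(self, other, op):
        # Pair up elements of another ToyColumn, or broadcast a scalar.
        if isinstance(other, ToyColumn):
            return ToyColumn([op(a, b) for a, b in zip(self.values, other.values)])
        return ToyColumn([op(a, other) for a in self.values])

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__  # multiplication is commutative

    def __mod__(self, other):
        return self._combine(other, lambda a, b: a % b)

    def __bool__(self):
        # A whole column has no single truth value (toy behaviour).
        raise TypeError("truth value of a whole column is ambiguous")


def apply_whole(func: Callable, column: ToyColumn) -> ToyColumn:
    """Toy model of by_row=False: func is called once with the entire column."""
    return func(column)


def parity(x):
    # Branches on its argument, so it needs a single value, not a column.
    return "odd" if x % 2 else "even"


nums = ToyColumn([1, 2, 3, 4])

# Arithmetic-only function: works on the whole column at once.
squared = apply_whole(lambda x: x * x + 2 * x + 1, nums)
print(squared.values)  # prints [4, 9, 16, 25]

# Branching function: fails on the whole column ...
try:
    apply_whole(parity, nums)
except TypeError as exc:
    print("whole-column call failed:", exc)

# ... but works when given one element at a time.
print([parity(v) for v in nums.values])
```

A function like `parity` is the kind of "Python function that only works on single values" for which the text recommends a remote_function.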
Use remote_function to apply an arbitrary Python function. Set the reuse=False flag to make sure a new remote_function is created every time you run the following code. Omit it to reuse a previously deployed remote_function from the same user-defined function if the hash of the function definition hasn't changed.

>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def minutes_to_hours(x: int) -> float:
...     return x/60
>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0      0
1     30
2     60
3     90
4    120
dtype: Int64

>>> hours = minutes.apply(minutes_to_hours)
>>> hours
0    0.0
1    0.5
2    1.0
3    1.5
4    2.0
dtype: Float64
To turn a user-defined function with external package dependencies into a remote_function, provide the package names via the packages parameter.
>>> @bpd.remote_function(
...     reuse=False,
...     packages=["cryptography"],
...     cloud_function_service_account="default"
... )
... def get_hash(input: str) -> str:
...     from cryptography.fernet import Fernet
...
...     # handle missing value
...     if input is None:
...         input = ""
...
...     key = Fernet.generate_key()
...     f = Fernet(key)
...     return f.encrypt(input.encode()).decode()

>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash)
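Note that get_hash above encrypts each value with a freshly generated Fernet key, so its output differs from call to call and is not a repeatable digest. If a repeatable digest is what you need, the following is a minimal local sketch using only the standard library. `get_digest` is a hypothetical helper, shown here in plain Python on a list rather than deployed as a remote_function.

```python
import hashlib
from typing import Optional


def get_digest(value: Optional[str]) -> str:
    """Hypothetical helper: return the SHA-256 hex digest of a string."""
    # Treat a missing value as the empty string, as get_hash does.
    if value is None:
        value = ""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Plain Python list standing in for a Series (assumption for local testing).
names = ["Alice", "Bob", None]
digests = [get_digest(name) for name in names]

for name, digest in zip(names, digests):
    print(name, digest[:16])  # show only a prefix for readability
```

Because the digest depends only on the input, equal inputs always map to equal outputs, which is usually the property wanted from a function named "hash".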
A remote function can also return an array.
>>> @bpd.remote_function(reuse=False, cloud_function_service_account="default")
... def text_analyzer(text: str) -> list[int]:
...     words = text.count(" ") + 1
...     periods = text.count(".")
...     exclamations = text.count("!")
...     questions = text.count("?")
...     return [words, periods, exclamations, questions]

>>> texts = bpd.Series([
...     "The quick brown fox jumps over the lazy dog.",
...     "I love this product! It's amazing.",
...     "Hungry? Wanna eat? Lets go!"
... ])
>>> features = texts.apply(text_analyzer)
>>> features
0    [9 1 0 0]
1    [6 1 1 0]
2    [5 0 1 2]
dtype: list<item: int64>[pyarrow]
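Since a remote function has to be deployed before it can run, it can be convenient to check the undecorated logic locally first. The sketch below assumes a plain Python list in place of a Series and omits the remote_function decorator. It reproduces the counts shown in the example above.

```python
from typing import List


def text_analyzer(text: str) -> List[int]:
    # Same body as the remote function above, without the decorator.
    words = text.count(" ") + 1
    periods = text.count(".")
    exclamations = text.count("!")
    questions = text.count("?")
    return [words, periods, exclamations, questions]


# Plain list standing in for the Series (assumption for local testing).
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "I love this product! It's amazing.",
    "Hungry? Wanna eat? Lets go!",
]

features = [text_analyzer(t) for t in texts]
for row in features:
    print(row)
# prints:
# [9, 1, 0, 0]
# [6, 1, 1, 0]
# [5, 0, 1, 2]
```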
- Parameters:
  - func (function) – BigFrames DataFrames remote_function to apply. The function should take a scalar and return a scalar. It will be applied to every element in the Series.
  - by_row (False or "compat", default "compat") – If "compat", func must be a remote function, which will be passed each element of the Series, like Series.map. If False, func will be passed the whole Series at once.
- Returns:
  A new Series with values representing the return value of func applied to each element of the original Series.
- Return type: