bigframes.pandas.DataFrame.where

DataFrame.where(cond, other=None)

Replace values where the condition is False.

Examples:

>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'a': [20, 10, 0], 'b': [0, 10, 20]})
>>> df
    a   b
0  20   0
1  10  10
2   0  20

[3 rows x 2 columns]

You can filter the values in the dataframe based on a condition. Values that match the condition are kept, and values that do not match are replaced. The default replacement value is NA. For example, when the condition is a dataframe:

>>> df.where(df > 0)
      a     b
0    20  <NA>
1    10    10
2  <NA>    20

[3 rows x 2 columns]

You can specify a custom replacement value for non-matching values.

>>> df.where(df > 0, -1)
      a     b
0    20    -1
1    10    10
2    -1    20

[3 rows x 2 columns]
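
The element-wise rule used in the two examples above can be modeled in plain Python. This is only a sketch of the semantics, not how bigframes evaluates the call: `where_model` is a hypothetical helper, columns are plain lists, and `None` stands in for `<NA>`.

```python
# Hypothetical helper that models DataFrame.where element by element.
# None stands in for <NA>; this is not part of bigframes.
def where_model(data, cond, other=None):
    """Keep data[col][i] where cond[col][i] is True, else use `other`."""
    return {
        col: [v if keep else other for v, keep in zip(values, cond[col])]
        for col, values in data.items()
    }

data = {"a": [20, 10, 0], "b": [0, 10, 20]}
# Same shape as the data, like the dataframe condition `df > 0`.
cond = {col: [v > 0 for v in values] for col, values in data.items()}

default_fill = where_model(data, cond)      # missing values become None
custom_fill = where_model(data, cond, -1)   # missing values become -1
print(default_fill)
print(custom_fill)
```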

The condition can also be a series, in which case each row is kept or replaced as a whole. For example:

>>> df.where(df['a'] > 10, -1)
      a     b
0    20     0
1    -1    -1
2    -1    -1

[3 rows x 2 columns]
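
With a series condition, the output above suggests that one boolean per row is applied to every column of that row. A plain-Python sketch of that reading follows. `where_rows` is a hypothetical helper, not a bigframes function.

```python
# Hypothetical helper: a series-like condition holds one boolean per row,
# and that boolean decides the fate of every column in the row.
def where_rows(data, row_mask, other=None):
    return {
        col: [v if keep else other for v, keep in zip(values, row_mask)]
        for col, values in data.items()
    }

data = {"a": [20, 10, 0], "b": [0, 10, 20]}
row_mask = [v > 10 for v in data["a"]]  # models df['a'] > 10
masked = where_rows(data, row_mask, -1)
print(masked)
```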

The replacement can also be a dataframe. For example:

>>> df.where(df > 10, -df)
      a     b
0    20     0
1   -10   -10
2     0    20

[3 rows x 2 columns]

A series condition can also be combined with a dataframe replacement:

>>> df.where(df['a'] > 10, -df)
      a     b
0    20     0
1   -10   -10
2     0   -20

[3 rows x 2 columns]

Note that the replacement cannot currently be a Series. In pandas, a Series replacement must be accompanied by an axis argument, and bigframes DataFrame.where does not support that argument.
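
To illustrate the pandas behavior described here, the sketch below uses plain pandas so that it runs locally. It shows a Series replacement with an explicit axis, then an equivalent DataFrame replacement that repeats the Series under every column name. The `fallback` values are made up for illustration. Whether the same DataFrame construction works unchanged in bigframes is an assumption that should be verified.

```python
import pandas as pd

df = pd.DataFrame({"a": [20, 10, 0], "b": [0, 10, 20]})
fallback = pd.Series([100, 200, 300])  # hypothetical per-row replacement values

# pandas only: a Series replacement needs an explicit axis to align on.
with_series = df.where(df > 0, fallback, axis=0)

# Equivalent DataFrame replacement: repeat the Series under every column name,
# so the replacement has the same shape and labels as df.
fallback_df = pd.DataFrame({col: fallback for col in df.columns})
with_frame = df.where(df > 0, fallback_df)

print(with_frame)
```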

Parameters:
  • cond (bool Series/DataFrame, array-like, or callable) – Where cond is True, the original value is kept. Where it is False, the value is replaced with the corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return a boolean Series/DataFrame or array. The callable must not change the input Series/DataFrame (though pandas doesn’t check it).

  • other (scalar, DataFrame, or callable) – Entries where cond is False are replaced with the corresponding value from other. If other is callable, it is computed on the DataFrame and should return a scalar or DataFrame. The callable must not change the input DataFrame (though pandas doesn’t check it). If not specified, entries are filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).
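
A short sketch of the callable forms of cond and other is given below. It uses plain pandas so that it runs locally. The parameter descriptions above suggest the same call shape applies to bigframes, but that is not verified here.

```python
import pandas as pd

df = pd.DataFrame({"a": [20, 10, 0], "b": [0, 10, 20]})

# Both callables receive the DataFrame and return a new object;
# neither modifies its input.
result = df.where(lambda d: d > 10, lambda d: -d)

# The same call written with precomputed arguments.
explicit = df.where(df > 10, -df)

print(result)
```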

Returns:

DataFrame after the replacement.

Return type:

DataFrame