bigframes.pandas.DataFrame.groupby

DataFrame.groupby(by: Hashable | Series | Sequence[Hashable | Series] | None = None, *, level: Hashable | Sequence[Hashable] | None = None, as_index: bool = True, dropna: bool = True) -> DataFrameGroupBy

Group DataFrame by columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
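As a rough illustration of the split, apply, and combine steps, here is a minimal sketch in plain Python. It does not use bigframes at all; the `rows` data and the choice of `mean` as the applied function are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Illustrative data only: (group key, value) pairs.
rows = [("Falcon", 380.0), ("Falcon", 370.0), ("Parrot", 24.0), ("Parrot", 26.0)]

# Split: collect the values that share a group key.
groups = defaultdict(list)
for animal, speed in rows:
    groups[animal].append(speed)

# Apply and combine: reduce each group, then gather the results by key.
result = {animal: mean(speeds) for animal, speeds in groups.items()}
print(result)
```

A groupby on a DataFrame performs the same three steps, with the grouping and aggregation handled by the library rather than by explicit loops.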

Examples:

>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                     'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0

[4 rows x 2 columns]
>>> df.groupby(['Animal'])['Max Speed'].mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64

We can also choose to include NA in group keys or not by setting dropna:

>>> df = bpd.DataFrame([[1, 2, 3],[1, None, 4], [2, 1, 3], [1, 2, 2]],
...                    columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
     a  c
b
1.0  2  3
2.0  2  5

[2 rows x 2 columns]
>>> df.groupby(by=["b"], dropna=False).sum()
      a  c
b
1.0   2  3
2.0   2  5
<NA>  1  4

[3 rows x 2 columns]

We can also choose whether the group labels are returned as the index of the result by setting as_index:

>>> df.groupby(by=["b"], as_index=False).sum()
     b  a  c
0  1.0  2  3
1  2.0  2  5

[2 rows x 3 columns]

Parameters:
  • by (str, Sequence[str]) – A label or list of labels used to group by the columns in self. Note that a tuple is interpreted as a (single) key.

  • level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

  • as_index (bool, default True) – If True, return the object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations such as head(), tail(), and nth(), or on transformations.

  • dropna (bool, default True) – If True, rows whose group keys contain NA values are dropped. If False, NA values are also treated as group keys.
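
The examples on this page group by column labels only. The sketch below shows grouping by a MultiIndex level instead. It uses plain pandas as a stand-in, on the assumption that bigframes follows the same semantics for level; the `Animal` and `Type` level names and the speed values are made up for illustration.

```python
import pandas as pd

# Plain pandas stand-in; bigframes is assumed (not verified here) to behave alike.
index = pd.MultiIndex.from_tuples(
    [("Falcon", "Captive"), ("Falcon", "Wild"),
     ("Parrot", "Captive"), ("Parrot", "Wild")],
    names=["Animal", "Type"],
)
df = pd.DataFrame({"Max Speed": [390.0, 350.0, 30.0, 20.0]}, index=index)

# Group by an index level (referenced by name) rather than by a column.
by_animal = df.groupby(level="Animal").mean()
by_type = df.groupby(level="Type").mean()
print(by_animal)
print(by_type)
```

Passing a level position such as `level=0` is the equivalent of `level="Animal"` in this sketch.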

Returns:

A groupby object that contains information about the groups.

Return type:

bigframes.core.groupby.DataFrameGroupBy

Raises:
  • ValueError – If both by and level are specified.

  • TypeError – If neither by nor level is specified.