bigframes.pandas.DataFrame.groupby
- DataFrame.groupby(by: Hashable | Series | Sequence[Hashable | Series] | None = None, *, level: Hashable | Sequence[Hashable] | None = None, as_index: bool = True, dropna: bool = True) -> DataFrameGroupBy
Group DataFrame by columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
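The split, apply, and combine steps described above can be illustrated without any DataFrame library. The following is a minimal plain-Python sketch of the idea, not how bigframes implements it; the `rows` data mirrors the Falcon/Parrot example used on this page, and the variable names are chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Illustrative data: (group key, value) pairs.
rows = [
    ("Falcon", 380.0),
    ("Falcon", 370.0),
    ("Parrot", 24.0),
    ("Parrot", 26.0),
]

# Split: bucket the values by their group key.
groups = defaultdict(list)
for animal, speed in rows:
    groups[animal].append(speed)

# Apply and combine: reduce each bucket, then collect the
# per-group results into one mapping keyed by group.
mean_speed = {animal: mean(speeds) for animal, speeds in groups.items()}

print(mean_speed)  # {'Falcon': 375.0, 'Parrot': 25.0}
```

The resulting mapping plays the role of the grouped result: one entry per distinct key, holding the aggregated value.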
Examples:
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                     'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0

[4 rows x 2 columns]

>>> df.groupby(['Animal'])['Max Speed'].mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64
We can also choose to include NA in group keys or not by setting dropna:
>>> df = bpd.DataFrame([[1, 2, 3],[1, None, 4], [2, 1, 3], [1, 2, 2]],
...                    columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
     a  c
b
1.0  2  3
2.0  2  5

[2 rows x 2 columns]

>>> df.groupby(by=["b"], dropna=False).sum()
      a  c
b
1.0   2  3
2.0   2  5
<NA>  1  4

[3 rows x 2 columns]
We can also choose whether the group labels are returned as the index by setting as_index:
>>> df.groupby(by=["b"], as_index=False).sum()
     b  a  c
0  1.0  2  3
1  2.0  2  5

[2 rows x 3 columns]
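To make the effect of dropna and as_index in the examples above more concrete, here is a rough plain-Python sketch of the same semantics over a list of dict rows. The helper `group_sum` is hypothetical and exists only for this illustration; it is not part of bigframes, it uses `None` to stand in for NA, and it assumes missing values appear only in the key column.

```python
def group_sum(rows, key, dropna=True, as_index=True):
    """Sum the non-key fields of dict rows per value of `key` (hypothetical helper)."""
    totals = {}
    for row in rows:
        k = row[key]
        if k is None and dropna:
            continue  # dropna=True: rows whose key is missing are skipped
        bucket = totals.setdefault(k, {})
        for col, val in row.items():
            if col != key:
                bucket[col] = bucket.get(col, 0) + val

    # Order the group keys, placing the missing key (None) last.
    ordered = sorted(totals, key=lambda k: (k is None, 0 if k is None else k))

    if as_index:
        # Group labels act as the "index": a mapping from key to sums.
        return {k: totals[k] for k in ordered}
    # "SQL-style" output: the key is an ordinary field of each record.
    return [{key: k, **totals[k]} for k in ordered]


rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": None, "c": 4},
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "b": 2, "c": 2},
]

print(group_sum(rows, "b"))
# {1: {'a': 2, 'c': 3}, 2: {'a': 2, 'c': 5}}
print(group_sum(rows, "b", dropna=False))
# {1: {'a': 2, 'c': 3}, 2: {'a': 2, 'c': 5}, None: {'a': 1, 'c': 4}}
print(group_sum(rows, "b", as_index=False))
# [{'b': 1, 'a': 2, 'c': 3}, {'b': 2, 'a': 2, 'c': 5}]
```

The sums match the doctest output above: with dropna=False the row whose key is missing forms its own group, and with as_index=False the key moves from the index into a regular column.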
- Parameters:
by (str, Sequence[str]) – A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index (bool, default True) – Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations such as head(), tail(), and nth(), or on transformations.
dropna (bool, default True) – If True, and if group keys contain NA values, the NA values together with their row/column will be dropped. If False, NA values will also be treated as a key in groups.
- Returns:
A groupby object that contains information about the groups.
- Return type:
bigframes.core.groupby.DataFrameGroupBy
- Raises:
ValueError – If both by and level are specified.
TypeError – If neither by nor level is specified.