bigframes.pandas.Series.groupby

Group Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
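The split, apply, and combine steps described above can be sketched by hand. This sketch uses plain pandas rather than bigframes so that it runs without a BigQuery session; it assumes bigframes.pandas follows the same grouping semantics, and the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(
    [380.0, 370.0, 24.0, 26.0],
    index=pd.Index(["Falcon", "Falcon", "Parrot", "Parrot"], name="Animal"),
    name="Max Speed",
)

# Split: collect the values that share an index label.
groups = {}
for label, value in s.items():
    groups.setdefault(label, []).append(value)

# Apply: compute the mean of each group.
means = {label: sum(values) / len(values) for label, values in groups.items()}

# Combine: assemble the per-group results into a new Series.
manual = pd.Series(means, name="Max Speed").rename_axis("Animal")

# groupby performs the same three steps in one call.
grouped = s.groupby("Animal").mean()
```

Both manual and grouped hold one mean per animal; the groupby call simply expresses the three steps declaratively.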

Examples:

You can group by a named index level.

>>> s = bpd.Series([380, 370., 24., 26.],
...                index=["Falcon", "Falcon", "Parrot", "Parrot"],
...                name="Max Speed")
>>> s.index.name="Animal"
>>> s
Animal
Falcon    380.0
Falcon    370.0
Parrot     24.0
Parrot     26.0
Name: Max Speed, dtype: Float64
>>> s.groupby("Animal").mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64

You can also group by more than one index level.

>>> s = bpd.Series([380, 370., 24., 26.],
...                index=pd.MultiIndex.from_tuples(
...                    [("Falcon", "Clear"),
...                     ("Falcon", "Cloudy"),
...                     ("Parrot", "Clear"),
...                     ("Parrot", "Clear")],
...                    names=["Animal", "Sky"]),
...                name="Max Speed")
>>> s
Animal  Sky
Falcon  Clear     380.0
        Cloudy    370.0
Parrot  Clear      24.0
        Clear      26.0
Name: Max Speed, dtype: Float64

>>> s.groupby("Animal").mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64

>>> s.groupby("Sky").mean()
Sky
Clear     143.333333
Cloudy         370.0
Name: Max Speed, dtype: Float64

>>> s.groupby(["Animal", "Sky"]).mean()
Animal  Sky
Falcon  Clear     380.0
        Cloudy    370.0
Parrot  Clear      25.0
Name: Max Speed, dtype: Float64

You can also group by the values of another Series, provided its index matches that of the original Series.

>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                     'Max Speed': [380., 370., 24., 26.],
...                     'Age': [10., 20., 4., 6.]})
>>> df
   Animal  Max Speed   Age
0  Falcon      380.0  10.0
1  Falcon      370.0  20.0
2  Parrot       24.0   4.0
3  Parrot       26.0   6.0

[4 rows x 3 columns]

>>> df['Max Speed'].groupby(df['Animal']).mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64

>>> df['Age'].groupby(df['Animal']).max()
Animal
Falcon    20.0
Parrot     6.0
Name: Age, dtype: Float64

Parameters:

by (mapping, function, label, pd.Grouper or list of such, default None) – Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.
level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index (bool, default True) – Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations such as head(), tail(), and nth() (see “filtrations” in the user guide: https://pandas.pydata.org/docs/dev/user_guide/groupby.html#filtration) or on transformations (see “transformations” in the user guide: https://pandas.pydata.org/docs/dev/user_guide/groupby.html#transformation).
dropna (bool, default True) – If True, and if group keys contain NA values, the NA values together with their row/column will be dropped. If False, NA values will also be treated as keys in groups.
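
The level and dropna parameters can be illustrated with a short sketch. It uses plain pandas so that it runs locally, on the assumption that bigframes.pandas behaves the same way for these arguments; this page does not confirm that, and the data and variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series(
    [380.0, 370.0, 24.0, 26.0],
    index=pd.MultiIndex.from_tuples(
        [("Falcon", "Clear"), ("Falcon", "Cloudy"),
         ("Parrot", "Clear"), ("Parrot", "Clear")],
        names=["Animal", "Sky"],
    ),
    name="Max Speed",
)

# level: select a MultiIndex level by position or by name.
by_position = s.groupby(level=0).mean()
by_name = s.groupby(level="Sky").max()

# dropna: decide whether rows with an NA group key form their own group.
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=["w", "x", "y", "z"])
keys = pd.Series(["a", None, "a", "b"], index=["w", "x", "y", "z"])

dropped = values.groupby(keys).sum()             # default dropna=True skips row "x"
kept = values.groupby(keys, dropna=False).sum()  # the NA key gets its own group
```

With the default dropna=True, the row whose key is missing does not contribute to any group; with dropna=False, the result gains one extra group keyed by NA.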

Returns:

Returns a groupby object that contains information about the groups.

Return type:

bigframes.core.groupby.SeriesGroupBy