bigframes.pandas.DataFrame.dropna

DataFrame.dropna(*, axis: int | str = 0, how: str = 'any', thresh: int | None = None, subset: None | Hashable | Sequence[Hashable] = None, inplace: bool = False, ignore_index=False) → DataFrame

Remove missing values.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> import pandas as pd
>>> df = bpd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                     "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                     "born": [pd.NA, "1940-04-25", pd.NA]})
>>> df
       name        toy        born
0    Alfred       <NA>        <NA>
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>

[3 rows x 3 columns]

Drop the rows where at least one element is missing:

>>> df.dropna()
     name        toy        born
1  Batman  Batmobile  1940-04-25

[1 rows x 3 columns]

Drop the columns where at least one element is missing:

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

[3 rows x 1 columns]

Drop the rows where all elements are missing:

>>> df.dropna(how='all')
       name        toy        born
0    Alfred       <NA>        <NA>
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>

[3 rows x 3 columns]

Keep rows with at least 2 non-null values:

>>> df.dropna(thresh=2)
       name        toy        born
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>

[2 rows x 3 columns]

Keep columns with at least 2 non-null values:

>>> df.dropna(axis='columns', thresh=2)
       name        toy
0    Alfred       <NA>
1    Batman  Batmobile
2  Catwoman   Bullwhip

[3 rows x 2 columns]

Define in which columns to look for missing values:

>>> df.dropna(subset=['name', 'toy'])
       name        toy        born
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>

[2 rows x 3 columns]

Parameters:
  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Determine if rows or columns which contain missing values are removed.

    • 0, or ‘index’ : Drop rows which contain missing values.

    • 1, or ‘columns’ : Drop columns which contain missing values.

  • how ({'any', 'all'}, default 'any') –

    Determine whether a row or column is removed when it contains at least one NA value or only NA values.

    • ‘any’ : If any NA values are present, drop that row or column.

    • ‘all’ : If all values are NA, drop that row or column.

  • thresh (int, optional) – Minimum number of non-NA values a row or column must contain to be kept. Cannot be combined with how.

  • subset (column label or sequence of labels, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Only supports axis=0.

  • inplace (bool, default False) – Not supported.

  • ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
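
The way how, thresh, subset and ignore_index interact when dropping rows can be modelled in a few lines of plain Python. This is only an illustrative sketch, not the bigframes implementation: the helper names is_missing, keep_row and dropna_rows are hypothetical, rows are represented as a dict mapping index label to a dict of column values, and only None and float NaN are treated as missing.

```python
from typing import Any, Dict, Hashable, Optional, Sequence

# Hypothetical helpers that model the documented row-dropping rule.
# This is NOT bigframes code; it only mirrors the parameter descriptions.


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (a simplification of NA handling)."""
    return value is None or (isinstance(value, float) and value != value)


def keep_row(
    row: Dict[Hashable, Any],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Optional[Sequence[Hashable]] = None,
) -> bool:
    """Return True if the row survives under the given settings."""
    # subset restricts which columns are inspected for missing values.
    labels = list(subset) if subset is not None else list(row)
    non_missing = sum(not is_missing(row[label]) for label in labels)
    if thresh is not None:
        # thresh: keep rows with at least this many non-missing values.
        return non_missing >= thresh
    if how == "any":
        # Drop the row if any inspected value is missing.
        return non_missing == len(labels)
    if how == "all":
        # Drop the row only if every inspected value is missing.
        return non_missing > 0
    raise ValueError("how must be 'any' or 'all'")


def dropna_rows(
    rows: Dict[Hashable, Dict[Hashable, Any]],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Optional[Sequence[Hashable]] = None,
    ignore_index: bool = False,
) -> Dict[Hashable, Dict[Hashable, Any]]:
    """Return a new mapping containing only the surviving rows."""
    kept = [(idx, row) for idx, row in rows.items()
            if keep_row(row, how, thresh, subset)]
    if ignore_index:
        # Relabel the surviving rows 0, 1, ..., n - 1.
        kept = [(i, row) for i, (_, row) in enumerate(kept)]
    return dict(kept)


# Same data as the examples above, with None standing in for <NA>.
rows = {
    0: {"name": "Alfred", "toy": None, "born": None},
    1: {"name": "Batman", "toy": "Batmobile", "born": "1940-04-25"},
    2: {"name": "Catwoman", "toy": "Bullwhip", "born": None},
}

print(list(dropna_rows(rows)))                             # [1]
print(list(dropna_rows(rows, thresh=2)))                   # [1, 2]
print(list(dropna_rows(rows, thresh=2, ignore_index=True)))  # [0, 1]
```

The surviving index labels match the row examples above. Because inplace is not supported, the sketch likewise returns a new object rather than modifying its input.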

Returns:

DataFrame with NA entries dropped from it.

Return type:

bigframes.pandas.DataFrame

Raises:
  • ValueError – If how is not one of 'any' or 'all'.

  • TypeError – If both how and thresh are specified.
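
The two error conditions can be sketched as a small argument check. This is a hypothetical helper, not the library's code: the name validate_dropna_args is invented for illustration, and using None as the "not specified" marker for how is an assumption made so that the sketch can tell whether the caller passed both arguments.

```python
from typing import Optional


def validate_dropna_args(how: Optional[str] = None,
                         thresh: Optional[int] = None) -> str:
    """Hypothetical check mirroring the documented Raises section.

    None means "argument not supplied" (an assumption of this sketch).
    Returns the effective value of how.
    """
    if how is not None and thresh is not None:
        # Both given: the documented behaviour is a TypeError.
        raise TypeError("cannot set both how and thresh")
    if how is None:
        how = "any"  # documented default
    if how not in ("any", "all"):
        # Unknown option: the documented behaviour is a ValueError.
        raise ValueError("how must be 'any' or 'all'")
    return how


for kwargs in ({"how": "all"}, {"how": "some"}, {"how": "any", "thresh": 2}):
    try:
        print(kwargs, "->", validate_dropna_args(**kwargs))
    except (TypeError, ValueError) as exc:
        print(kwargs, "->", type(exc).__name__)
```

In calling code, the practical consequence is to pass either how or thresh, never both.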