bigframes.pandas.DataFrame.drop_duplicates

DataFrame.drop_duplicates(subset: Hashable | Sequence[Hashable] = None, *, keep: str = 'first') → DataFrame

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes, are ignored.

Parameters:
  • subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates. By default, all columns are used.

  • keep ({‘first’, ‘last’, False}, default ‘first’) –

    Determines which duplicates (if any) to keep.

    • ‘first’ : Drop duplicates except for the first occurrence.

    • ‘last’ : Drop duplicates except for the last occurrence.

    • False : Drop all duplicates.

Returns:

DataFrame with duplicates removed

Return type:

bigframes.pandas.DataFrame
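
Examples:

The sketch below illustrates how subset and keep interact. It uses plain pandas as a local stand-in, on the assumption that bigframes.pandas.DataFrame.drop_duplicates mirrors the pandas method of the same name. The sample data and variable names are made up for illustration. Against BigQuery, the import would presumably be bigframes.pandas instead, and row ordering may depend on session settings.

```python
# Assumption: plain pandas is used as a stand-in so the example runs locally;
# bigframes.pandas is expected to expose the same subset/keep semantics.
import pandas as pd

# Hypothetical sample data with the default integer index 0..4.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4.0, 4.0, 3.5, 15.0, 5.0],
    }
)

# Default: compare all columns and keep the first occurrence.
# Row 1 repeats row 0 exactly, so only row 1 is dropped.
first = df.drop_duplicates()

# Compare only "brand" and keep the last occurrence of each brand.
by_brand_last = df.drop_duplicates(subset=["brand"], keep="last")

# keep=False drops every row whose (brand, style) pair occurs more than once.
no_dupes = df.drop_duplicates(subset=["brand", "style"], keep=False)

print(first)
print(by_brand_last)
print(no_dupes)
```

Note that the index labels play no part in the comparison: rows 0 and 1 carry different labels but are still treated as duplicates because their column values match.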