bigframes.pandas.DataFrame.merge#

DataFrame.merge(right: DataFrame, how: Literal['inner', 'left', 'outer', 'right', 'cross'] = 'inner', on: Hashable | Sequence[Hashable] | None = None, *, left_on: Hashable | Sequence[Hashable] | None = None, right_on: Hashable | Sequence[Hashable] | None = None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes: tuple[str, str] = ('_x', '_y')) → DataFrame[source]#

Merge DataFrame objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

Examples:

Merge DataFrames df1 and df2 by specifying type of merge:

>>> df1 = bpd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df1
     a  b
0  foo  1
1  bar  2

[2 rows x 2 columns]

>>> df2 = bpd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df2
     a  c
0  foo  3
1  baz  4

[2 rows x 2 columns]

>>> df1.merge(df2, how="inner", on="a")
     a  b  c
0  foo  1  3

[1 rows x 3 columns]

>>> df1.merge(df2, how='left', on='a')
     a  b     c
0  foo  1     3
1  bar  2  <NA>

[2 rows x 3 columns]

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1 = bpd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df1
  lkey  value
0  foo      1
1  bar      2
2  baz      3
3  foo      5

[4 rows x 2 columns]

>>> df2 = bpd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df2
  rkey  value
0  foo      5
1  bar      6
2  baz      7
3  foo      8

[4 rows x 2 columns]

>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8

[6 rows x 4 columns]

Parameters:

right – Object to merge with.
how – {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner' Type of merge to be performed. left: use only keys from left frame, similar to a SQL left outer join; preserve key order. right: use only keys from right frame, similar to a SQL right outer join; preserve key order. outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically. inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys. cross: creates the cartesian product from both frames, preserves the order of the left keys.
on (label or list of labels) – Columns to join on. It must be found in both DataFrames. Either on or left_on + right_on must be passed in.
left_on (label or list of labels) – Columns to join on in the left DataFrame. Either on or left_on + right_on must be passed in.
right_on (label or list of labels) – Columns to join on in the right DataFrame. Either on or left_on + right_on must be passed in.
left_index (bool, default False) – Use the index from the left DataFrame as the join key.
right_index (bool, default False) – Use the index from the right DataFrame as the join key.
sort – Default False. Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
suffixes – Default ("_x", "_y"). A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.

Returns:

A DataFrame of the two merged objects.

Return type:

bigframes.pandas.DataFrame

Raises:

ValueError – If value for on is specified for cross join.
ValueError – If on or left_on + right_on are not specified when on is None.
ValueError – If on and left_on + right_on are specified when on is not None.
ValueError – If no column with the provided label is found in self for left join.
ValueError – If no column with the provided label is found in self for right join.

bigframes.pandas.DataFrame.merge#

This Page