bigframes.pandas.DataFrame.merge#
- DataFrame.merge(right: DataFrame, how: Literal['inner', 'left', 'outer', 'right', 'cross'] = 'inner', on: Hashable | Sequence[Hashable] | None = None, *, left_on: Hashable | Sequence[Hashable] | None = None, right_on: Hashable | Sequence[Hashable] | None = None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes: tuple[str, str] = ('_x', '_y')) DataFrame[source]#
Merge DataFrame objects with a database-style join.
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Warning
If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
Examples:
Merge DataFrames df1 and df2 by specifying type of merge:
>>> df1 = bpd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]}) >>> df1 a b 0 foo 1 1 bar 2 [2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]}) >>> df2 a c 0 foo 3 1 baz 4 [2 rows x 2 columns]
>>> df1.merge(df2, how="inner", on="a") a b c 0 foo 1 3 [1 rows x 3 columns]
>>> df1.merge(df2, how='left', on='a') a b c 0 foo 1 3 1 bar 2 <NA> [2 rows x 3 columns]
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1 = bpd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], ... 'value': [1, 2, 3, 5]}) >>> df1 lkey value 0 foo 1 1 bar 2 2 baz 3 3 foo 5 [4 rows x 2 columns]
>>> df2 = bpd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'], ... 'value': [5, 6, 7, 8]}) >>> df2 rkey value 0 foo 5 1 bar 6 2 baz 7 3 foo 8 [4 rows x 2 columns]
>>> df1.merge(df2, left_on='lkey', right_on='rkey') lkey value_x rkey value_y 0 foo 1 foo 5 1 foo 1 foo 8 2 bar 2 bar 6 3 baz 3 baz 7 4 foo 5 foo 5 5 foo 5 foo 8 [6 rows x 4 columns]
- Parameters:
right – Object to merge with.
how –
{'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'Type of merge to be performed.left: use only keys from left frame, similar to a SQL left outer join; preserve key order.right: use only keys from right frame, similar to a SQL right outer join; preserve key order.outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.cross: creates the cartesian product from both frames, preserves the order of the left keys.on (label or list of labels) – Columns to join on. It must be found in both DataFrames. Either on or left_on + right_on must be passed in.
left_on (label or list of labels) – Columns to join on in the left DataFrame. Either on or left_on + right_on must be passed in.
right_on (label or list of labels) – Columns to join on in the right DataFrame. Either on or left_on + right_on must be passed in.
left_index (bool, default False) – Use the index from the left DataFrame as the join key.
right_index (bool, default False) – Use the index from the right DataFrame as the join key.
sort – Default False. Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
suffixes – Default
("_x", "_y"). A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.
- Returns:
A DataFrame of the two merged objects.
- Return type:
- Raises:
ValueError – If value for
onis specified for cross join.ValueError – If
onorleft_on+right_onare not specified whenonisNone.ValueError – If
onandleft_on+right_onare specified whenonis notNone.ValueError – If no column with the provided label is found in
selffor left join.ValueError – If no column with the provided label is found in
selffor right join.