bigframes.pandas.DataFrame.join#
- DataFrame.join(other: DataFrame | Series, on: str | None = None, how: str = 'left', lsuffix: str = '', rsuffix: str = '') DataFrame[source]#
Join columns of another DataFrame.
Join columns with other DataFrame on index
Examples:
Join two DataFrames by specifying how to handle the operation:
>>> df1 = bpd.DataFrame({'col1': ['foo', 'bar'], 'col2': [1, 2]}, index=[10, 11]) >>> df1 col1 col2 10 foo 1 11 bar 2 [2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'col3': ['foo', 'baz'], 'col4': [3, 4]}, index=[11, 22]) >>> df2 col3 col4 11 foo 3 22 baz 4 [2 rows x 2 columns]
>>> df1.join(df2) col1 col2 col3 col4 10 foo 1 <NA> <NA> 11 bar 2 foo 3 [2 rows x 4 columns]
>>> df1.join(df2, how="left") col1 col2 col3 col4 10 foo 1 <NA> <NA> 11 bar 2 foo 3 [2 rows x 4 columns]
>>> df1.join(df2, how="right") col1 col2 col3 col4 11 bar 2 foo 3 22 <NA> <NA> baz 4 [2 rows x 4 columns]
>>> df1.join(df2, how="outer") col1 col2 col3 col4 10 foo 1 <NA> <NA> 11 bar 2 foo 3 22 <NA> <NA> baz 4 [3 rows x 4 columns]
>>> df1.join(df2, how="inner") col1 col2 col3 col4 11 bar 2 foo 3 [1 rows x 4 columns]
Another option to join using the key columns is to use the on parameter:
>>> df1.join(df2, on="col2", how="right") col1 col2 col3 col4 <NA> <NA> 11 foo 3 <NA> <NA> 22 baz 4 [2 rows x 4 columns]
If there are overlapping columns, lsuffix and rsuffix can be used:
>>> df1 = bpd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']}) >>> df2 = bpd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['B0', 'B1', 'B2']}) >>> df1.set_index('key').join(df2.set_index('key'), lsuffix='_left', rsuffix='_right') A_left A_right key K0 A0 B0 K1 A1 B1 K2 A2 B2 [3 rows x 2 columns]
- Parameters:
other – DataFrame or Series with an Index similar to the Index of this one.
on – Column in the caller to join on the index in other, otherwise joins index-on-index. Like an Excel VLOOKUP operation.
how ({'left', 'right', 'outer', 'inner'}, default 'left') – How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)right: use other’s index.outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.cross: creates the cartesian product from both frames, preserves the order of the left keys.lsuffix (str, default '') – Suffix to use from left frame’s overlapping columns.
rsuffix (str, default '') – Suffix to use from right frame’s overlapping columns.
- Returns:
A dataframe containing columns from both the caller and other.
- Return type:
- Raises:
ValueError – If value for
onis specified for cross join.ValueError – If join on columns does not match the index level of the other DataFrame. Join on columns with multi-index is not supported.
ValueError – If left index to join on does not have the same number of levels as the right index.
ValueError – If columns overlap but no suffix is specified.
ValueError – If on column is not unique.