bigframes.pandas.DataFrame.join#

DataFrame.join(other: DataFrame | Series, on: str | None = None, how: str = 'left', lsuffix: str = '', rsuffix: str = '') DataFrame[source]#

Join columns of another DataFrame.

Join columns with other DataFrame on index

Examples:

Join two DataFrames by specifying how to handle the operation:

>>> df1 = bpd.DataFrame({'col1': ['foo', 'bar'], 'col2': [1, 2]}, index=[10, 11])
>>> df1
   col1  col2
10  foo     1
11  bar     2

[2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'col3': ['foo', 'baz'], 'col4': [3, 4]}, index=[11, 22])
>>> df2
   col3  col4
11  foo     3
22  baz     4

[2 rows x 2 columns]
>>> df1.join(df2)
   col1  col2  col3  col4
10  foo     1  <NA>  <NA>
11  bar     2   foo     3

[2 rows x 4 columns]
>>> df1.join(df2, how="left")
   col1  col2  col3  col4
10  foo     1  <NA>  <NA>
11  bar     2   foo     3

[2 rows x 4 columns]
>>> df1.join(df2, how="right")
    col1  col2 col3  col4
11  bar      2  foo     3
22  <NA>  <NA>  baz     4

[2 rows x 4 columns]
>>> df1.join(df2, how="outer")
    col1  col2  col3  col4
10   foo     1  <NA>  <NA>
11   bar     2   foo     3
22  <NA>  <NA>   baz     4

[3 rows x 4 columns]
>>> df1.join(df2, how="inner")
   col1  col2 col3  col4
11  bar     2  foo     3

[1 rows x 4 columns]

Another option to join using the key columns is to use the on parameter:

>>> df1.join(df2, on="col2", how="right")
      col1  col2 col3  col4
<NA>  <NA>    11  foo     3
<NA>  <NA>    22  baz     4

[2 rows x 4 columns]

If there are overlapping columns, lsuffix and rsuffix can be used:

>>> df1 = bpd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
>>> df2 = bpd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['B0', 'B1', 'B2']})
>>> df1.set_index('key').join(df2.set_index('key'), lsuffix='_left', rsuffix='_right')
     A_left A_right
key
K0       A0      B0
K1       A1      B1
K2       A2      B2

[3 rows x 2 columns]
Parameters:
  • other – DataFrame or Series with an Index similar to the Index of this one.

  • on – Column in the caller to join on the index in other, otherwise joins index-on-index. Like an Excel VLOOKUP operation.

  • how ({'left', 'right', 'outer', 'inner'}, default 'left') – How to handle the operation of the two objects. left: use calling frame’s index (or column if on is specified) right: use other’s index. outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically. inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one. cross: creates the cartesian product from both frames, preserves the order of the left keys.

  • lsuffix (str, default '') – Suffix to use from left frame’s overlapping columns.

  • rsuffix (str, default '') – Suffix to use from right frame’s overlapping columns.

Returns:

A dataframe containing columns from both the caller and other.

Return type:

bigframes.pandas.DataFrame

Raises:
  • ValueError – If value for on is specified for cross join.

  • ValueError – If join on columns does not match the index level of the other DataFrame. Join on columns with multi-index is not supported.

  • ValueError – If left index to join on does not have the same number of levels as the right index.

  • ValueError – If columns overlap but no suffix is specified.

  • ValueError – If on column is not unique.