bigframes.pandas.api.typing.StringMethods.extract#

StringMethods.extract(pat: str, flags: int = 0) DataFrame[source]#

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

Examples:

>>> import bigframes.pandas as bpd

A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.

>>> s = bpd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
      0     1
0     a     1
1     b     2
2  <NA>  <NA>

[3 rows x 2 columns]

Named groups will become column names in the result.

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2   <NA>  <NA>

[3 rows x 2 columns]

A pattern with one group will return a DataFrame with one column.

>>> s.str.extract(r'[ab](\d)')
      0
0     1
1     2
2  <NA>

[3 rows x 1 columns]
Parameters:
  • pat (str) – Regular expression pattern with capturing groups.

  • flags (int, default 0 (no flags)) – Flags from the re module, e.g. re.IGNORECASE, that modify regular expression matching for things like case, spaces, etc. For more details, see re.

Returns:

A DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used.

Return type:

bigframes.dataframe.DataFrame