bigframes.pandas.DataFrame.to_pandas_batches
- DataFrame.to_pandas_batches(page_size: int | None = None, max_results: int | None = None, *, allow_large_results: bool | None = None) → Iterable[DataFrame]
Stream DataFrame results as an iterable of pandas DataFrames.
page_size and max_results determine the size and number of batches; see https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJob#google_cloud_bigquery_job_QueryJob_result for details.
Examples:
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'col': [4, 3, 2, 2, 3]})
Iterate through the results in batches, limiting the total rows yielded across all batches via max_results:
>>> for df_batch in df.to_pandas_batches(max_results=3):
...     print(df_batch)
   col
0    4
1    3
2    2
Alternatively, control the approximate size of each batch using page_size and fetch batches manually using next():
>>> it = df.to_pandas_batches(page_size=2)
>>> next(it)
   col
0    4
1    3
>>> next(it)
   col
2    2
3    2
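Each batch is a regular pandas DataFrame, so results can also be aggregated incrementally without materializing the full result at once. A minimal sketch, reusing the df defined above (the exact batch boundaries depend on page_size, but the aggregate does not):

>>> total = 0
>>> for df_batch in df.to_pandas_batches(page_size=2):
...     total += int(df_batch['col'].sum())  # per-batch partial sum
>>> total
14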
- Parameters:
page_size (int, default None) – The maximum number of rows in each batch. Non-positive values are ignored.
max_results (int, default None) – The maximum total number of rows across all batches.
allow_large_results (bool, default None) – If not None, overrides the global setting to allow or disallow query results larger than the default size limit of 10 GB; see the sketch below the return type.
- Returns:
An iterable of smaller pandas DataFrames that combine to form the original DataFrame. Results are streamed from BigQuery; see https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.table.RowIterator#google_cloud_bigquery_table_RowIterator_to_arrow_iterable for details.
- Return type:
Iterable[pandas.DataFrame]
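A hedged sketch of the allow_large_results override mentioned in the parameters above. The name of the corresponding global option is an assumption (bpd.options.bigquery.allow_large_results) and is not documented on this page:

>>> # Per-call override takes precedence over the global setting
>>> # (assumed global option: bpd.options.bigquery.allow_large_results):
>>> for df_batch in df.to_pandas_batches(allow_large_results=True):  # doctest: +SKIP
...     print(df_batch.shape)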