bigframes.pandas.DataFrame.to_parquet#

DataFrame.to_parquet(path=None, *, compression: Literal['snappy', 'gzip'] | None = 'snappy', index: bool = True, allow_large_results: bool | None = None) bytes | None[source]#

Write a DataFrame to the binary Parquet format.

This function writes the DataFrame as one or more Parquet files to Cloud Storage; if path is None, the serialized data is returned as bytes instead.

Examples:

>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_parquet*.parquet"
>>> df.to_parquet(path=gcs_bucket)
Parameters:
  • path (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string or path, it is used as the root directory path when writing a partitioned dataset. Destination URI(s) of the Cloud Storage file(s) should be formatted gs://<bucket_name>/<object_name_or_glob>. If the data size exceeds 1 GB, you must use a wildcard so the data is exported into multiple files; the size of the individual files varies.

  • compression (str, default 'snappy') – Name of the compression to use. Use None for no compression. Supported options: 'gzip', 'snappy'.

  • index (bool, default True) – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file.

  • allow_large_results (bool, default None) – If not None, overrides the global setting to allow or disallow large query results over the default size limit of 10 GB. This parameter has no effect when results are saved to Google Cloud Storage (GCS).

Returns:

bytes if no path argument is provided, else None.

Return type:

None or bytes

Raises:

ValueError – If an invalid value is provided for compression, i.e. one that is not None, 'snappy', or 'gzip'.