bigframes.pandas.read_json

bigframes.pandas.read_json(path_or_buf: str | IO[bytes], *, orient: Literal['split', 'records', 'index', 'columns', 'values', 'table'] = 'columns', dtype: Dict | None = None, encoding: str | None = None, lines: bool = False, engine: Literal['ujson', 'pyarrow', 'bigquery'] = 'ujson', write_engine: Literal['default', 'bigquery_inline', 'bigquery_load', 'bigquery_streaming', 'bigquery_write', '_deferred'] = 'default', **kwargs) → DataFrame

Convert a JSON string to DataFrame object.

Note

Using engine="bigquery" does not guarantee that rows keep the same order as in the file. To get a stable order, include a serialized index column in the data, set it as the index, and sort the resulting DataFrame by it.

Note

For non-BigQuery engines, data is inlined into the query SQL if it is small enough (roughly 5 MB or less in memory); larger data is loaded into a BigQuery table instead.

Examples:

>>> import bigframes.pandas as bpd
>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> df = bpd.read_json(path_or_buf=gcs_path, lines=True, orient="records")
>>> df.head(2)
   id   name
0   1  Alice
1   2    Bob

[2 rows x 2 columns]
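With lines=True and orient="records" (the combination required by engine="bigquery"), the input is newline-delimited JSON: one object per line. A minimal stdlib-only sketch of that format, using illustrative data values matching the sample above:

```python
import json

# Newline-delimited JSON ("records" orient, one object per line).
# The literal string here is illustrative, not read from GCS.
ndjson = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}'

# Each non-empty line is an independent JSON object.
rows = [json.loads(line) for line in ndjson.splitlines() if line]
print(rows)  # [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

BigQuery's load API ingests this format line by line, which is why row order in the file is not preserved without an explicit index column.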
Parameters:
  • path_or_buf (a valid JSON str, path object or file-like object) – A local or Google Cloud Storage (gs://) path when engine="bigquery"; otherwise passed through to pandas.read_json.

  • orient (str, optional) –

    Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. With engine="bigquery", only orient="records" is supported. The set of possible orients is:

    • 'split' : dict like

      {index -> [index], columns -> [columns], data -> [values]}

    • 'records' : list like

      [{column -> value}, ... , {column -> value}]

    • 'index' : dict like {index -> {column -> value}}

    • 'columns' : dict like {column -> {index -> value}}

    • 'values' : just the values array

  • dtype (bool or dict, default None) –

    If True, infer dtypes; if a dict of column to dtype, then use those; if False, don't infer dtypes at all. Applies only to the data.

    For all orient values except 'table', default is True.

  • encoding (str, default is 'utf-8') – The encoding to use to decode bytes.

  • lines (bool, default False) – Read the file as one JSON object per line. With engine="bigquery", only lines=True is supported.

  • engine ({"ujson", "pyarrow", "bigquery"}, default "ujson") – Type of engine to use. If engine="bigquery" is specified, BigQuery's load API is used. Otherwise, the engine is passed through to pandas.read_json.

  • write_engine (str) – How data should be written to BigQuery (if at all). See bigframes.pandas.read_pandas() for a full description of supported values.

  • **kwargs – Keyword arguments passed to pandas.read_json when not using the BigQuery engine.
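To make the orient variants listed above concrete, here is the same two-row table hand-written in the 'split' form and converted to the 'records' form with the stdlib only (the data values are illustrative):

```python
# The same two-row table in 'split' orient (hand-written for illustration).
split_form = {
    "index": [0, 1],
    "columns": ["id", "name"],
    "data": [[1, "Alice"], [2, "Bob"]],
}

# Rebuild the 'records' orient: one {column -> value} dict per row.
records_form = [
    dict(zip(split_form["columns"], row)) for row in split_form["data"]
]
print(records_form)  # [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```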

Returns:

The DataFrame representing JSON contents.

Return type:

bigframes.pandas.DataFrame

Raises: