bigframes.pandas.read_gbq_query
- bigframes.pandas.read_gbq_query(query: str, *, index_col: Iterable[str] | str | bigframes.enums.DefaultIndexKind = (), columns: Iterable[str] = (), configuration: Dict | None = None, max_results: int | None = None, use_cache: bool | None = None, col_order: Iterable[str] = (), filters: vendored_pandas_gbq.FiltersType = (), dry_run: Literal[False] = False, allow_large_results: bool | None = None) -> bigframes.dataframe.DataFrame
- bigframes.pandas.read_gbq_query(query: str, *, index_col: Iterable[str] | str | bigframes.enums.DefaultIndexKind = (), columns: Iterable[str] = (), configuration: Dict | None = None, max_results: int | None = None, use_cache: bool | None = None, col_order: Iterable[str] = (), filters: vendored_pandas_gbq.FiltersType = (), dry_run: Literal[True] = False, allow_large_results: bool | None = None) -> Series
Turn a SQL query into a DataFrame.
Note: Because the results are written to a temporary table, ordering by ORDER BY is not preserved. A unique index_col is recommended. Use row_number() over () if there is no natural unique index or you want to preserve ordering.
Examples:
Simple query input:
>>> import bigframes.pandas as bpd
>>> df = bpd.read_gbq_query('''
...    SELECT
...       pitcherFirstName,
...       pitcherLastName,
...       pitchSpeed,
...    FROM `bigquery-public-data.baseball.games_wide`
... ''')
Preserve ordering in a query input:
>>> df = bpd.read_gbq_query('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...    FROM `bigquery-public-data.baseball.games_wide`
...    WHERE year = 2016
...    GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039

[2 rows x 3 columns]
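The ROW_NUMBER() pattern above can be applied to an arbitrary query by wrapping it as a subquery. The following is a minimal sketch, not part of bigframes: `with_row_index` and `read_ordered` are hypothetical helper names, and `order_by` is interpolated into the SQL verbatim, so it should only come from trusted code.

```python
def with_row_index(query: str, order_by: str, index_name: str = "rowindex") -> str:
    """Wrap `query` so every row carries an explicit, unique ordering column."""
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be embedded as a subquery.
    inner = query.strip().rstrip(";")
    return (
        f"SELECT ROW_NUMBER() OVER (ORDER BY {order_by}) AS {index_name}, t.*\n"
        f"FROM (\n{inner}\n) AS t"
    )


def read_ordered(query: str, order_by: str, index_name: str = "rowindex"):
    """Run the wrapped query and use the row number as the DataFrame index."""
    # Imported lazily: running this requires bigframes and BigQuery credentials.
    import bigframes.pandas as bpd

    return bpd.read_gbq_query(
        with_row_index(query, order_by, index_name),
        index_col=index_name,
    )
```

Calling `read_ordered("SELECT name, score FROM my_dataset.my_table", "score DESC")` (with a placeholder table name) would then return a DataFrame indexed by `rowindex`. Rows that tie on the ordering expression still receive distinct row numbers, but their relative order is not guaranteed.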
See also:
Session.read_gbq().
- Parameters:
query (str) – A SQL query to execute.
index_col (Iterable[str] or str, optional) – The column(s) to use as the index for the DataFrame. This can be a single column name or a list of column names. If not provided, a default index will be used.
columns (Iterable[str], optional) – The columns to read from the query result. If not specified, all columns will be read.
configuration (dict, optional) – A dictionary of query job configuration options. See the BigQuery REST API documentation for a list of available options: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query
max_results (int, optional) – The maximum number of rows to retrieve from the query result. If not specified, all rows will be loaded.
use_cache (bool, optional) – Whether to use cached results for the query. Defaults to True. Setting this to False will force a re-execution of the query.
col_order (Iterable[str], optional) – The desired order of columns in the resulting DataFrame. This parameter is deprecated and will be removed in a future version. Use columns instead.
filters (list[tuple], optional) – A list of filters to apply to the data. Filters are specified as a list of tuples, where each tuple contains a column name, an operator (e.g., ‘==’, ‘!=’), and a value.
dry_run (bool, optional) – If True, the function will not actually execute the query but will instead return statistics about the query. Defaults to False.
allow_large_results (bool, optional) – Whether to allow large query results. If True, the query results can be larger than the maximum response size. Defaults to bpd.options.compute.allow_large_results.
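One way the dry_run parameter might be used is as a cost guard: request the statistics first and execute the query only if the estimate fits a budget. The sketch below is illustrative rather than a bigframes API. `read_if_cheap` is a hypothetical helper, and it assumes the returned statistics expose a "totalBytesProcessed" entry; check the Series returned in your version before relying on that key.

```python
from typing import Any, Callable, Optional


def read_if_cheap(
    query: str,
    max_bytes: int,
    reader: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
):
    """Dry-run `query` first; execute it only if the estimate is within budget."""
    if reader is None:
        # Imported lazily: running this requires bigframes and BigQuery credentials.
        import bigframes.pandas as bpd

        reader = bpd.read_gbq_query

    # With dry_run=True the query is not executed; statistics are returned instead.
    stats = reader(query, dry_run=True, **kwargs)
    # Assumption: the statistics are keyed by "totalBytesProcessed".
    estimated = int(stats["totalBytesProcessed"])
    if estimated > max_bytes:
        raise RuntimeError(
            f"query would process {estimated} bytes, over the {max_bytes} byte budget"
        )
    return reader(query, dry_run=False, **kwargs)
```

The `reader` argument exists only so the guard can be exercised without a BigQuery connection. Other keyword arguments, such as `index_col` or `max_results`, are forwarded unchanged to both calls.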
- Returns:
A DataFrame representing the result of the query. If dry_run is True, a pandas.Series containing query statistics is returned.
- Return type:
bigframes.dataframe.DataFrame or pandas.Series
- Raises:
ValueError – When both columns and col_order are specified.
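Calling code that still accepts the deprecated col_order argument may want to normalize it before calling read_gbq_query. The sketch below mirrors the documented rule in a hypothetical helper, `resolve_columns`. It is not the library's internal implementation, and the error message is illustrative.

```python
from typing import Iterable, Tuple


def resolve_columns(
    columns: Iterable[str] = (),
    col_order: Iterable[str] = (),
) -> Tuple[str, ...]:
    """Return the single column selection to pass on as `columns`."""
    columns, col_order = tuple(columns), tuple(col_order)
    if columns and col_order:
        # Same condition the documentation lists under "Raises".
        raise ValueError(
            "Specify only one of 'columns' and 'col_order'; 'col_order' is deprecated."
        )
    # Prefer `columns`; fall back to the deprecated `col_order` if only it was given.
    return columns or col_order
```

A wrapper could then call `bpd.read_gbq_query(query, columns=resolve_columns(columns, col_order))`, so the deprecated name never reaches the library.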