13.4. BigQuery Connector
The Google BigQuery connector allows querying tables in the BigQuery service. This can be used to join data between different systems like BigQuery and Hive, or between different BigQuery instances.
This connector is bundled in Presto Enterprise and requires a license from Starburst. For more information about Presto Enterprise and the BigQuery connector, or to obtain a free trial, please contact firstname.lastname@example.org.
Installation and configuration
The Google BigQuery connector plugin is distributed as part of Starburst Presto, but still requires installing the external Google BigQuery JDBC driver.
Both the scripted and the manual installation and configuration methods require the following artifacts:
A Google service account to connect to BigQuery (can be created at Google IAM Service accounts page) which has the following roles:
BigQuery Data Editor
BigQuery Job User
A JSON private key file linked to the service account
BigQuery JDBC driver, which can be obtained from Google’s BigQuery drivers page
Please note that each Starburst Presto release, starting with 312-e.7, is certified to work with a single version of the BigQuery JDBC driver. Using a mismatched version may cause issues with JDBC driver dependencies.
Using a dedicated script
The Starburst Presto RPM includes a special install.sh script to help install the BigQuery JDBC driver and configure the connector. The script is installed in /usr/lib/presto/lib/plugin/bigquery and can be used and modified to fit a custom environment. Out of the box, it follows a typical Presto layout and eases the process of installing and configuring the connector on the Presto cluster.
The install script accepts a single optional parameter, the name of the default BigQuery dataset, which is used when a query doesn’t specify a dataset.
The install script assumes it finds the following files in the directory where it is run:
The BigQuery JDBC driver, downloaded from Google and extracted from the archive (GoogleBigQueryJDBC42.jar)
The BigQuery service account private key JSON file (service_account_private_key.json)
The script generates the connection details needed to connect to BigQuery, in the form of a standard Presto connector configuration file (e.g. bigquery.properties).
The script follows the standard directory layout of a Presto installation and makes the following assumptions:
The node from which the script is run has password-less ssh login to all other Presto nodes.
The user has
Presto is running and available at
The Presto plugin directory is /usr/lib/presto/lib/plugin, and the plugin is installed under the name bigquery.
Presto catalog directory is
There will be one BigQuery catalog (instance of the Presto BigQuery Connector), available under the name bigquery.
Presto was installed as a service (restarted via service presto restart).
presto-cli is in the user's PATH.
The script does its work in the following sequence, for each node (including the node that it's running on):
Send the files over ssh, placing them in the plugin directory.
Configure the connector.
Clean up the transferred files.
The script works out of the box for a standard Presto installation layout, but you are free to modify the script to fit your particular cluster environment and Presto setup.
Manual JDBC Driver Installation
To install the BigQuery connector manually, you will need an installation of the Starburst Distribution of Presto, the BigQuery JDBC driver (GoogleBigQueryJDBC42.jar), and a JSON private key (service_account_private_key.json) for connecting to BigQuery.
Create a new directory named bigquery in Presto's plugin directory (i.e. /usr/lib/presto/lib/plugin/bigquery).
Add the BigQuery JDBC Driver JAR and the JSON private key to the new directory.
Perform the above steps on every Presto cluster node.
Restart Presto on every node via service presto restart.
Manual Connector Configuration
Connection details for the BigQuery database are required, in the form of a standard Presto connector configuration file (e.g. bigquery.properties for a Presto catalog named bigquery). This file should contain the connection properties appropriate for your setup:
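As a minimal sketch, assuming a Google Cloud project named my_project and the service account key installed under the plugin directory (the project name, service account email, and key path are illustrative placeholders):

```properties
connector.name=bigquery
connection-url=jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my_project;OAuthType=0;OAuthServiceAcctEmail=presto-sa@my_project.iam.gserviceaccount.com;OAuthPvtKeyPath=/usr/lib/presto/lib/plugin/bigquery/service_account_private_key.json
```

In the Simba driver, OAuthType=0 selects service account authentication; see the Simba BigQuery JDBC documentation for the full list of connection parameters.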
A detailed description of all BigQuery JDBC connection parameters can be found in Simba BigQuery JDBC documentation
Copy the catalog properties file to Presto catalog directory on every node and restart all Presto instances.
Multiple BigQuery Databases
If you want to connect to multiple BigQuery Databases, configure another instance of the BigQuery plugin as a separate catalog.
To add another BigQuery catalog, simply add another properties file to ~/.prestoadmin/catalog with a different name (making sure it ends in .properties). For example, if you name the property file sales.properties, Presto will create a catalog named sales.
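For instance, a second catalog pointing at a different project could look like the following sketch (the project name and credentials are illustrative):

```properties
# ~/.prestoadmin/catalog/sales.properties -- becomes the "sales" catalog
connector.name=bigquery
connection-url=jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=sales_project;OAuthType=0;OAuthServiceAcctEmail=presto-sa@sales_project.iam.gserviceaccount.com;OAuthPvtKeyPath=/usr/lib/presto/lib/plugin/bigquery/service_account_private_key.json
```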
Querying BigQuery Databases
The BigQuery Database connector provides a schema for each BigQuery Database project and dataset.
Assuming the BigQuery catalog is named bigquery, you can see the available schemas by running:
SHOW SCHEMAS FROM bigquery;
If you have a BigQuery dataset named web, you can view the tables it contains by running:
SHOW TABLES FROM bigquery.web;
You can see a list of the columns in the clicks table in the web dataset using either of the following:
DESCRIBE bigquery.web.clicks;
SHOW COLUMNS FROM bigquery.web.clicks;
Finally, you can access the clicks table in the web dataset:
SELECT * FROM bigquery.web.clicks;
Your privileges in these schemas are those of the user configured in the connection properties file. If the user does not have access to these tables, you will not be able to access them.
Mapping Data Types Between Presto and BigQuery
Presto and BigQuery each have their own type system. While both try to adhere to the SQL standard, there are some differences that require a conversion, and sometimes a conversion is not possible.
BigQuery-to-Presto type mapping
Presto supports selecting BigQuery Database types by mapping them onto corresponding Presto types. STRING maps onto Presto VARCHAR (see Mapping string types); the datetime and numeric mappings are described in the following sections.
Mapping datetime types
Selecting a BigQuery temporal value with fractional second precision greater than 3 will truncate (not round) the fractional seconds to three digits.
TIMESTAMP values are stored in the UTC time zone. When Presto reads these values from BigQuery, all selected BigQuery TIMESTAMP values are mapped onto Presto TIMESTAMP WITH TIME ZONE in the UTC time zone.
Because the JDBC API returns DATETIME values in the JVM time zone, some values stored in BigQuery may not be representable by JDBC due to daylight saving time and other gaps in the JVM time zone. For this reason, we recommend setting the JVM time zone to UTC.
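Assuming the default Presto layout, where JVM options are listed one per line in etc/jvm.config, this can be done by adding the user.timezone property:

```properties
# etc/jvm.config -- force the JVM into the UTC time zone
-Duser.timezone=UTC
```

Restart Presto after changing the JVM configuration.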
Mapping numeric types
There is an issue in the BigQuery JDBC driver related to BigDecimal handling in prepared statements. Until it is resolved, you can get incorrect query results when querying BigQuery tables with a predicate on a NUMERIC column.
Mapping string types
Selecting a BigQuery STRING value returns a Presto VARCHAR. While STRING doesn't have an explicit storage limit, the BigQuery JDBC driver imposes a limit (65535 characters by default) while transferring data. This limit can be increased by adding a StringColumnLength parameter to the JDBC URL; for details, see the Simba BigQuery JDBC StringColumnLength documentation.
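For example, the parameter can be appended to the connection-url in the catalog properties file (the project name and the chosen limit are illustrative):

```properties
# bigquery.properties -- raise the string transfer limit to 1,000,000 characters
connection-url=jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my_project;OAuthType=0;StringColumnLength=1000000
```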
Unmapped BigQuery types
Some BigQuery types (e.g. STRUCT, also known as RECORD) don't have an equivalent in Presto and are only accessible as VARCHAR values. These can be parsed in Presto by the user.
Other BigQuery types (e.g. ARRAY, also known as repeated values) have limitations in the BigQuery JDBC driver, so Presto is unable to support them.
BigQuery imposes several quotas on all users.
Several of these apply to the BigQuery JDBC driver and by extension to the BigQuery connector:
Concurrent rate limit for on-demand, interactive queries — 50 concurrent queries.
Concurrent rate limit for on-demand, interactive queries against Cloud Bigtable external data sources — 4 concurrent queries.
Query execution time limit — 6 hours.
Maximum number of tables referenced per query — 1,000.
Maximum unresolved standard SQL query length — 1 MB.
Maximum row size — 100 MB.
Maximum columns in a table, query result, or view definition — 10,000.
API requests per second, per user — 100.
Maximum number of tabledata.list queries per project — 500/second.
Maximum bytes per second per project returned by calls to tabledata.list — 60 MB/second.
Maximum rows per second per project returned by calls to tabledata.list — 150,000/second.
Please note that the BigQuery connector needs to expand table names and literals in a Presto SQL query and may trigger the SQL query length limit for Presto queries shorter than 1 MB.
Adding and modifying data through this connector is not supported. This limitation is due to:
BigQuery DML quotas — 1,000 DML statements per day per table.
BigQuery DML semantics — the BigQuery JDBC driver doesn't support bulk INSERTs; each tuple counts as a separate operation.
The following SQL statements are not yet supported: