14.27. Snowflake Connector

The Snowflake connector allows querying and creating tables in an external Snowflake database. This can be used to join data between different systems like Snowflake and Hive, or between different Snowflake instances.

Note

This connector is bundled in Presto Enterprise and requires a license from Starburst. For more information about Presto Enterprise and the Snowflake connector, or to obtain a free trial, please contact hello@starburstdata.com.

Configuration

To configure the Snowflake connector, create a catalog properties file in etc/catalog named, for example, snowflake.properties, to mount the Snowflake connector as the snowflake catalog.

There are two flavors of the Snowflake connector: snowflake-jdbc and snowflake-distributed.

snowflake-jdbc uses JDBC for all reads and writes and is more efficient when the result set returned from Snowflake is small.

When larger result sets are extracted from Snowflake, the snowflake-distributed connector may be a better choice. Instead of requesting query results over a JDBC connection, the connector asks Snowflake to export them to S3 and Presto reads them from there. Since both the write and the read are parallelized, this approach scales better for large data sets, but has a higher latency.

Create the catalog properties file with the following contents, replacing the connection properties as appropriate for your setup (for example, replace <account_name> with the full name of your account, as provided by Snowflake).

connector.name=<snowflake-jdbc or snowflake-distributed>
connection-url=jdbc:snowflake://<account_name>.snowflakecomputing.com/
connection-user=<user_name>
connection-password=<password>
snowflake.warehouse=<warehouse_name>
snowflake.database=<database_name>

The role used by Snowflake to execute operations can be specified with snowflake.role=<role_name>. This configuration is optional and cannot be used together with User Impersonation.
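
For example, a complete catalog file for the JDBC flavor could look like the following. The account, user, password, warehouse, database, and role values are placeholders for illustration only:

connector.name=snowflake-jdbc
connection-url=jdbc:snowflake://example_account.snowflakecomputing.com/
connection-user=presto_user
connection-password=secret
snowflake.warehouse=PRESTO_WH
snowflake.database=ANALYTICS
snowflake.role=ANALYST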

Additionally, there are a number of configuration properties that apply only to the distributed connector.

Distributed Connector Configuration Properties

Property Name                            Description                                                      Default
snowflake.stage-schema                   Name of schema in which stages are created for exporting data.
snowflake.max-export-retries             Number of export retries.                                        3
snowflake.parquet.max-read-block-size    Maximum block size when reading from the export file.            16MB
snowflake.max-split-size                 Maximum split size for processing the export file.               64MB
snowflake.max-initial-split-size         Maximum initial split size.                                      Half of snowflake.max-split-size
snowflake.export-file-max-size           Maximum size of files to create when exporting data.             16MB
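
For example, a snowflake-distributed catalog can tune export and read behavior by adding some of these properties to the catalog file. The values below are illustrative, not recommendations:

snowflake.stage-schema=PRESTO_STAGE
snowflake.max-export-retries=5
snowflake.max-split-size=128MB
snowflake.export-file-max-size=32MB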

Querying Snowflake

The Snowflake connector provides a Presto schema for every schema in the configured Snowflake database. Assuming the catalog name is snowflake, you can see the available schemas by running SHOW SCHEMAS:

SHOW SCHEMAS FROM snowflake;

If you have a Snowflake schema named web, you can view the tables in this schema by running SHOW TABLES:

SHOW TABLES FROM snowflake.web;

You can see a list of the columns in the clicks table in the web schema using either of the following:

DESCRIBE snowflake.web.clicks;
SHOW COLUMNS FROM snowflake.web.clicks;

Finally, you can access the clicks table in the web schema:

SELECT * FROM snowflake.web.clicks;
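
Since the snowflake catalog behaves like any other Presto catalog, you can also join Snowflake data with data from other connectors, such as Hive. The hive.web.visits table and the column names below are hypothetical, used only to illustrate a cross-catalog join:

SELECT c.user_id, count(*) AS clicks
FROM snowflake.web.clicks c
JOIN hive.web.visits v ON c.visit_id = v.visit_id
GROUP BY c.user_id;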

Your privileges in these schemas are those of the user configured in the connection properties of the catalog file. If that user does not have access to these tables, you cannot access them through Presto.

User Impersonation

The Snowflake connector supports User Impersonation. It can be turned on with:

snowflake.impersonation.enabled=true

Table Statistics

The Snowflake connector supports table statistics only. They are based on Snowflake’s INFORMATION_SCHEMA.TABLES table and are updated automatically by Snowflake.
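
You can inspect the statistics Presto retrieves with SHOW STATS, for example for the clicks table used above:

SHOW STATS FOR snowflake.web.clicks;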

Table Statistics Configuration Properties

Property Name               Description                                                    Default
statistics.enabled          Enables table statistics.                                      true
statistics.cache-ttl        Duration for which table and column statistics are cached.    10m
statistics.cache-missing    Cache the fact that table statistics are not available.       false
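
For example, to cache statistics for an hour and also cache the fact that statistics are missing, you could add the following to the catalog properties file (the values shown are illustrative):

statistics.cache-ttl=1h
statistics.cache-missing=true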

Snowflake Connector Limitations

The following SQL statements are not yet supported: