Starburst JDBC connector base#
A number of SEP connectors use JDBC connections to the underlying data source and a common base implementation. They share user facing behavior and features, documented in the following sections.
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the property
dynamic-filtering.enabled in your catalog properties file to
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the
dynamic-filtering.wait-timeout property in your
catalog properties file:
You can use the
dynamic_filtering_wait_timeout catalog session
property in a specific session:
SET SESSION mycatalog.dynamic_filtering_wait_timeout = 1s;
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
dynamic-filtering.domain-compaction-threshold property in the catalog
You can use the
SET SESSION mycatalog.dynamic_filtering_domain_compaction_threshold = 10;
dynamic-filtering.domain-compaction-threshold is set to
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column
dt on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to
'2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased
table scan overhead due to a large
IN list getting pushed down to the data
Metrics about dynamic filtering are reported in a JMX table for each specific
mycatalog using any connector:
For the Snowflake connector, the JMX table also includes the name:
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of awaitable dynamic filters and the time spent waiting for dynamic filters.
JDBC connection pooling#
JDBC connection pooling is disabled by default. When enabled each worker creates and maintains a connection pool instead of opening and closing separate connections to the data source sequentially. Each connection is available to connect to the data source and retrieve data. After completion of the operation, the connection is returned to the pool and can be reused. This improves performance by a small amount, and reduces the load on any required authentication system used for establishing the connection.
You can enable JDBC connection pooling by setting
connection-pool.enabled=true in your catalog properties file.
Additional configuration properties to tune the connection pooling are listed below with their default value.
connection-pool.enabled=true connection-pool.max-size=10 connection-pool.max-connection-lifetime=30m connection-pool.pool-cache-ttl=30m connection-pool.pool-cache-max-size=1000
connection-pool.max-size controls the maximum number of idle and active
connections in the pool.
connection-pool.max-connection-lifetime controls the maximum lifetime of a
connection. When a connection reaches the timeout, even if recently used, it is
connection-pool.pool-cache-max-size controls the maximum size of the JDBC
data source cache.
connection-pool.pool-cache-ttl controls the expiration of a cached data
source when they are no longer accessed.