10.8. Cloudera Data Platform support#

The Starburst Hive connector can query data stored in Cloudera Data Platform (CDP) version 7.1.

Note

Cloudera Data Platform support requires a valid Starburst Enterprise Presto license.

Configuration#

  • Edit the catalog properties file that uses the Hive connector

  • Set hive.metastore to thrift-cdp7

  • Set hive.metastore.uri to the URI of your Hive metastore Thrift service

connector.name=hive-hadoop2
hive.metastore=thrift-cdp7
hive.metastore.uri=thrift://cdp-master:9083
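
Once the catalog is loaded, a quick query can confirm that the metastore connection works. The following is a minimal sketch. It assumes the catalog properties file is named cdp.properties, so that the catalog is called cdp, and it uses a hypothetical schema named sales.

```sql
-- Assumption: the catalog file is etc/catalog/cdp.properties, so the catalog is "cdp"
SHOW SCHEMAS FROM cdp;

-- "sales" is a hypothetical schema name used for illustration
SHOW TABLES FROM cdp.sales;
```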

Hive metastore and statistics#

CDP support includes the improved thrift-cdp7 Hive metastore implementation. It supports the metastore Thrift communication protocol that CDP implements for managing table statistics.

This allows Presto to handle the following kinds of statistics separately:

  • Column statistics

  • Partition statistics

  • Table statistics

These statistics are used by the cost-based optimizer in Presto and can improve query performance. The statistics must first be populated, preferably from Presto.
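
Populating statistics from Presto is typically done with the ANALYZE statement. The sketch below assumes a catalog named cdp and a hypothetical table sales.orders partitioned by a single varchar column. Adjust the names and partition values to your environment.

```sql
-- Collect table and column statistics for the whole table
-- (cdp, sales, and orders are hypothetical names)
ANALYZE cdp.sales.orders;

-- For a partitioned table, restrict collection to selected partitions
-- (assumes a single varchar partition column)
ANALYZE cdp.sales.orders WITH (partitions = ARRAY[ARRAY['2020-01-01']]);

-- Inspect the statistics available to the cost-based optimizer
SHOW STATS FOR cdp.sales.orders;
```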

Statistics can also be populated directly from Hive. This is potentially less efficient, but is required in cases where Presto cannot perform the analysis, such as full ORC ACID tables.
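
On the Hive side, statistics are usually computed with Hive's own ANALYZE TABLE statement, run from Beeline or another Hive client rather than from Presto. The following is a sketch only. The table name sales.orders_acid is hypothetical.

```sql
-- HiveQL: run in Hive (for example through Beeline), not in Presto
-- "sales.orders_acid" is a hypothetical full ORC ACID table

-- Table-level statistics
ANALYZE TABLE sales.orders_acid COMPUTE STATISTICS;

-- Column-level statistics
ANALYZE TABLE sales.orders_acid COMPUTE STATISTICS FOR COLUMNS;
```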

Reading data#

CDP support includes read operations on the following table types:

  • compacted tables

  • bucketed tables

  • partitioned tables

  • unpartitioned tables

The following file formats can be read:

  • Avro

  • ORC ACID

  • Parquet

  • RCFile

Writing data#

Write operations, such as CREATE TABLE AS and CREATE VIEW, are generally supported.

Write operations on ORC ACID tables are not supported.
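
As an illustration of the supported write operations, the sketch below creates a table and a view. The catalog, schema, table, and column names are all hypothetical. The format table property is set to Parquet here on the assumption that the target should not be an ORC ACID table.

```sql
-- CREATE TABLE AS: write query results to a new Parquet table
-- (all object and column names are hypothetical)
CREATE TABLE cdp.sales.orders_2020
WITH (format = 'PARQUET')
AS
SELECT *
FROM cdp.sales.orders
WHERE order_date >= DATE '2020-01-01';

-- CREATE VIEW: define a view over the new table
CREATE VIEW cdp.sales.recent_orders AS
SELECT order_id, order_date
FROM cdp.sales.orders_2020;
```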