4.6. Configuring Hive Metastore

There are several ways you configure Hive Metastore in Hive connector.

None

By choosing MetastoreType to None (which is default configuration), no Hive connector will be configured.

Standalone (ephemeral)

By choosing MetastoreType to Standalone (ephemeral) a separate EC2 instance will be created by CFT which will contain both Hive Metastore and its underlying RDBMS.

Notice that information stored in such Metastore lives as long as Presto Cluster. Because of that such configuration should be avoid on production system, while it is the best option to play with Presto and Hive connector.

AWS Glue Data Catalog

By choosing MetastoreType to AWS Glue Data Catalog Hive connector will use AWS Glue Data Catalog as its Metastore service.

External MySQL RDBMS

By choosing MetastoreType to External MySQL RDBMS a separate EC2 instance will be created by CFT which will run Hive Metastore service that will leverage external MySQL RDBMS as its underlying storage. This new instance will need network access to the external MySQL system. So make sure to set up your networking and security groups appropriately. You can use your own MySQL instance you manage. However, we recommend using AWS RDS.

This configuration requires the below properties to be set:
  • ExternalMetastoreHost with host address where MySQL service is running
  • ExternalMetastorePort with port number of MySQL service. If 0 is set then 3306 (default MySQL port) will be used.
  • ExternalRdbmsMetastoreUserName with MySQL user name
  • ExternalRdbmsMetastorePassword with MySQL user password
  • ExternalRdbmsMetastoreDatabaseName with MySQL database name that will be used for storing Hive Metastore data.

RDBMS does not require any schema initialization other than database creation. It is well suited for MySQL provisioned with AWS RDS service.

External PostgreSQL RDBMS

By choosing MetastoreType to External PostgreSQL RDBMS a separate EC2 instance will be created by CFT which will run Hive Metastore service that will leverage external PostgreSQL RDBMS as its underlying storage. This new instance will need network access to the external PostgreSQL system. So make sure to set up your networking and security groups appropriately. You can use your own PostgreSQL instance you manage. However, we recommend using AWS RDS.

This configuration requires the below properties to be set:
  • ExternalMetastoreHost with host address where PostgreSQL service is running
  • ExternalMetastorePort with port number of PostgreSQL service. If 0 is set then 5432 (default PostgreSQL port) will be used.
  • ExternalRdbmsMetastoreUserName with PostgreSQL user name
  • ExternalRdbmsMetastorePassword with PostgreSQL user password
  • ExternalRdbmsMetastoreDatabaseName with PostgreSQL database name that will be used for storing Hive Metastore data.

RDBMS does not require any schema initialization other than database creation. It is well suited for PostgreSQL provisioned with AWS RDS service.

External Hive Metastore Service

By choosing MetastoreType to External Hive Metastore Service Hive connector will use an existing Hive Metastore Service.

This configuration requires the below properties to be set:
  • ExternalMetastoreHost with host address where External Hive Metastore Service service is running
  • ExternalMetastorePort with port number of External Hive Metastore Service service. If 0 is set then 9083 (default Hive Metastore Service port) will be used.