4.10. AWS Glue Support#
AWS Glue is a supported metadata catalog for Presto. It is intended as an alternative to the Hive Metastore, used with the Presto Hive plugin to work with your S3 data.
AWS Glue with SEP AMI#
When you deploy a SEP AMI from the AWS Marketplace, you need to configure the Hive connector to use Glue. The minimal setup is to do the following on all Presto nodes:
1. Create a /etc/presto/catalogs/glue.properties file with at least the settings that configure the Hive connector to use Glue.
2. Restart Presto with:
sudo service presto restart
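As a rough illustration of the first step, the following sketch writes a minimal catalog file. The two property values (connector.name=hive-hadoop2 and hive.metastore=glue) are assumptions based on common Presto Hive connector settings, not values confirmed by this page, and CATALOG_DIR is a hypothetical variable that defaults to a scratch directory so the script can be tried without root access.

```shell
#!/bin/sh
# Sketch: write a minimal Glue catalog file for the Hive connector.
# CATALOG_DIR defaults to a scratch directory; on a real Presto node it
# would be /etc/presto/catalogs (and writing there typically needs sudo).
set -eu

CATALOG_DIR="${CATALOG_DIR:-$(mktemp -d)}"
CATALOG_FILE="$CATALOG_DIR/glue.properties"

# Assumed contents: the Hive connector with Glue selected as the metastore.
cat > "$CATALOG_FILE" <<'EOF'
connector.name=hive-hadoop2
hive.metastore=glue
EOF

echo "wrote $CATALOG_FILE"
# Presto still has to be restarted afterwards, as described in step 2.
```

Running it with CATALOG_DIR=/etc/presto/catalogs on each node would place the file where Presto looks for catalogs.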
AWS Glue with CloudFormation Template#
When using the CloudFormation template in AWS, you can use Glue by choosing AWS Glue Data Catalog in the MetastoreType field of the stack creation form (Presto Configuration section).
SEP with AWS Glue Usage#
When configured as above, the Glue catalog is available from within the Presto CLI or any other Presto connection. As usual, remember to specify the location of the data on S3, either for the entire schema or at the table level. For example, to create a schema foo in Glue, with the S3 base directory (the root folder for per-table subdirectories) pointing to the root of the my-bucket S3 bucket, you would write:
CREATE SCHEMA hive.foo WITH (location = 's3://my-bucket/')
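For the table-level case, a sketch along the following lines may help. It assumes the catalog is named hive as in the statement above, that a Presto CLI is installed under the name presto, and that the coordinator listens on localhost:8080. The table name, columns, and S3 prefix are hypothetical. external_location and format are table properties of the Presto Hive connector.

```shell
#!/bin/sh
# Sketch: set the S3 location per table instead of per schema.
# The table name, columns, S3 prefix and server address are hypothetical.
set -eu

SQL_FILE="$(mktemp)"

# external_location points the table at an existing S3 prefix.
cat > "$SQL_FILE" <<'EOF'
CREATE TABLE hive.foo.events (
    event_id bigint,
    payload varchar
)
WITH (
    external_location = 's3://my-bucket/events/',
    format = 'ORC'
);
EOF

# Run the statement only if a CLI named "presto" is on the PATH (an
# assumption); otherwise just print what would be executed.
if command -v presto >/dev/null 2>&1; then
    presto --server "${PRESTO_SERVER:-localhost:8080}" --file "$SQL_FILE"
else
    cat "$SQL_FILE"
fi
```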
You can also create and edit schemas and tables directly from AWS Glue. In AWS Glue terminology, a schema is called a “database”.
Both the AMI and CloudFormation approaches mentioned above require the Presto instances to have permissions to access the S3 and Glue AWS services.
When using SEP via our CloudFormation template, by default you do not need to provide anything: the template creates all necessary resources automatically.
If you need to provide your own IAM Instance Profile for the Presto instances (IamInstanceProfile field in the stack creation form), consult the IAM Role Permissions for Presto Cluster Nodes section. The same applies when launching the AMI manually: make sure you choose an IAM Role that satisfies the requirements.
Table and Column Statistics Support#
With the hive.metastore.glue.column-statistics-enabled property turned on,
table and column statistics are collected for Glue tables and partitions.
Such statistics are stored in JSON format as Glue table and partition
parameters. See also ANALYZE.
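A minimal sketch of turning statistics on and then collecting them could look like the following. The value true for the property, the CATALOG_FILE variable, and the table hive.foo.events are assumptions made for illustration. On a real node, CATALOG_FILE would point at the Glue catalog properties file and Presto would need a restart before running ANALYZE.

```shell
#!/bin/sh
# Sketch: enable Glue statistics in the catalog file and prepare an ANALYZE.
# CATALOG_FILE defaults to a scratch file; the value "true" and the table
# name are assumptions.
set -eu

CATALOG_FILE="${CATALOG_FILE:-$(mktemp)}"
PROPERTY="hive.metastore.glue.column-statistics-enabled"

# Make sure the file exists, then append the property only if it is absent.
touch "$CATALOG_FILE"
if ! grep -q "^${PROPERTY}=" "$CATALOG_FILE"; then
    echo "${PROPERTY}=true" >> "$CATALOG_FILE"
fi

# After Presto is restarted, statistics are gathered with ANALYZE,
# for example from the Presto CLI.
ANALYZE_SQL="ANALYZE hive.foo.events"
echo "$ANALYZE_SQL"
```

The append is guarded so that running the script repeatedly does not duplicate the property line.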
Known Limitations of AWS Glue Support#
The following Presto features are not yet supported with the Glue catalog:
When a column is renamed, its statistics are not preserved. Therefore the table needs to be re-analyzed.
Renaming tables from within AWS Glue is not supported.
Partition values containing quotes and apostrophes are not supported.
Using Hive authorization is not supported.