4.4. Deploying Presto#

SEP is available as an Amazon Machine Image (AMI), which in turn is used in the CloudFormation template (CFT). As a customer, you can get these resources directly from Starburst, or from the AWS Marketplace.

Launching the AMI provides a fully functional single-node Presto setup, suitable for a trial deployment of Presto in your development environment.

The CFT is ideal for production deployment. It configures multiple AMIs for Presto and the other necessary components to form a cluster. You need a sufficient set of permissions to create all required stack resources. You can use the EC2 console or the AWS CLI to configure, launch, and manage the cluster. Both use the same CFT Configuration.

Deploying a Single Server with the EC2 Console#

With the help of the AMI, you can deploy a single-node Presto setup for trial purposes.

1. Launch the AMI

After subscribing to the software, choose ‘Launch through EC2’ to launch through the AWS EC2 Console.

../_images/presto_ami_launch.png

This directs you to the ‘Choose an Instance Type’ step. Optionally, you can choose ‘Copy to Service Catalog’ if you plan to use the AMI with the AWS Service Catalog.

Another option is to get the URL to the AMI directly from Starburst.

2. Choose an Instance Type

Choose an instance type that best suits your workload. The r4.4xlarge instance type is recommended by default and works well for most workloads. See Recommended Instance Types for guidance on which instance types may be best for you. Note that a single-node Presto instance is typically used for trying Presto in a development environment.

3. Configure Instance Details

Configure your instance to fit your needs. Choose the existing VPC and subnet you want to deploy to, and optionally choose an IAM role. Refer to the Prerequisites for more information on these specification fields.

../_images/presto_ami_configure.png

4. Add Storage

Manage your instance’s storage and add supplementary EBS and instance store volumes as needed. The defaults are generally sufficient.

5. Add Tags

Create and add one or more tags as needed.

6. Configure Security Group

Create a security group, or select an existing one, to control traffic to your instance. Note that you can choose multiple security group IDs when selecting from the pool of existing groups. For additional information regarding security groups, refer to the Prerequisites section.

It is recommended that ports 8080 and 8088 be accessible so that you can reach the Presto Web UI, submit queries from outside the cluster, and access Apache Superset. Additionally, it is recommended that port 22 be accessible for SSH access.
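As a sketch, such a security group could also be created up front with the AWS CLI. The group name, VPC ID, and CIDR range below are placeholders to adjust, and the group ID used in the ingress rules is the one returned by the create call:

aws ec2 create-security-group --group-name presto-access --description "Presto access" --vpc-id vpc-4bd6ca11

aws ec2 authorize-security-group-ingress --group-id sg-12e34aeb --protocol tcp --port 8080 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-12e34aeb --protocol tcp --port 8088 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-12e34aeb --protocol tcp --port 22 --cidr 10.0.0.0/16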

7. Review

Review the details of your instance. When you are satisfied, press Launch, assign a key pair to the instance, and conclude the launch process.

Deploying a Cluster with the EC2 Console#

1. Select Template

After you subscribe to the Presto offering on AWS Marketplace, or receive the URL to the CFT, you can begin configuring and launching your cluster.

../_images/presto_cft_launch.png

This directs you to the ‘Select Template’ step for creating a CloudFormation stack. You should find a pre-populated field under ‘Amazon S3 template URL’. This is the location of Starburst’s Presto CloudFormation template. Click ‘Next’.

Optionally, you can choose ‘Copy to Service Catalog’ if you plan to use the template with the AWS Service Catalog.

../_images/presto_cft_select.png

2. Specify Details

Proceed by specifying the details of your cluster as defined by the CFT Configuration. This step includes network, EC2, Presto, and other configuration.

  • Hive Connector Options:

Specify the relevant Hive connector options, if you plan to query data in HDFS or S3.

  • Ranger Options and Ranger LDAP User Synchronization:

These options allow you to configure Apache Ranger for system-level security and the related synchronization with an LDAP backend for user and group information. More information is available in the corresponding table of parameters.

  • Advanced AWS S3 Configuration:

Set custom S3 credentials or a custom S3 endpoint for the provisioned Hive catalogs, if needed. See the corresponding parameter table for details.

3. Options

Enter any additional stack specifications as shown on the Options page. These options include adding tags to resources within your cluster, choosing IAM roles, and specifying monitoring time for rollback triggers, among other advanced specifications. More information can be found in the AWS CloudFormation documentation.

4. Review

Finally, review the details of your Presto cluster. When you are satisfied, press Create to conclude the stack creation.

Note

Just above the final “Create stack” button there is a blue box informing you that “The following resource(s) require capabilities: [AWS::IAM::Role]”. In order to create the stack, you need to mark the checkbox next to “I acknowledge that AWS CloudFormation might create IAM resources.”

Deploying a Cluster with the AWS CLI#

After subscribing to the software, you can optionally deploy a Presto cluster using the AWS CLI instead of the AWS web console. Ensure the CLI is installed and configured before you proceed.
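For example, you can verify the installation and the configured credentials with standard AWS CLI commands such as:

aws --version
aws configure list
aws sts get-caller-identity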

1. Open a Terminal Window

Open a terminal window and start editing a script file in which you assemble all the properties for the command that creates the stack. See the example below. The script assembles one long command line. To make it easier to read and edit, you can separate segments with a backslash.

2. Create Stack

Add a first line to the command that uses the AWS CLI to create a CloudFormation stack.

aws cloudformation create-stack \

3. Name Stack

Specify the name that is to be associated with the cluster. The name must be unique in the region in which you are creating the cluster.

--stack-name exampleclustername \

4. Specify Template

Add the CFT template URL you received from Starburst. The URL must point to a template that is located in an Amazon S3 bucket.

--template-url https://s3.amazonaws.com/example-context/PrestoCFT.template \

5. Specify Parameters

Define a list of parameter structures that specify input parameters for the cluster. Refer to the complete list of parameters in the configuration section. Parameter values must be provided on the command line in a special form; refer to the fragment and the full example below for guidance.
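For example, a fragment passing two parameters looks as follows; the IDs are the placeholder values used in the full example further below:

--parameters \
"ParameterKey=VPC,ParameterValue=vpc-4bd6ca11" \
"ParameterKey=Subnet,ParameterValue=subnet-123abc2b" \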

6. Options

Configure the rollback behavior as desired. To disable or keep rollback of the cluster if stack creation fails, use one of the first two options below; rollback triggers and their monitoring time can additionally be set with the --rollback-configuration option:

--disable-rollback \
--no-disable-rollback \
--rollback-configuration RollbackTriggers=[{Arn=string,Type=string},{Arn=string,Type=string}],MonitoringTimeInMinutes=integer

7. IAM Capabilities

Analogous to marking the “I acknowledge that AWS CloudFormation might create IAM resources.” checkbox when deploying Presto via the AWS console, you need to declare that you accept this capability when deploying with the CLI. This is done by adding the --capabilities parameter to the AWS CLI create-stack command.

--capabilities CAPABILITY_IAM \

8. Review

Finally, review the details of your cluster and your command. When ready, save the script file and run it to create your cluster.
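For example, assuming you saved the command in a script file named create-presto-stack.sh (a name chosen here for illustration), you can run it with:

bash create-presto-stack.sh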

CFT CLI Example#

The following example shows a full command to create a cluster with the AWS CLI.

aws cloudformation create-stack \
--stack-name "Presto-cluster" \
--template-url "https://s3.amazonaws.com/awsmp-fulfillment-cf-templates-prod/PrestoCFT.template" \
--parameters \
"ParameterKey=VPC,ParameterValue=vpc-4bd6ca11" \
"ParameterKey=Subnet,ParameterValue=subnet-123abc2b" \
"ParameterKey=SecurityGroups,ParameterValue=sg-12e34aeb" \
"ParameterKey=CoordinatorInstanceType,ParameterValue=r4.xlarge" \
"ParameterKey=WorkersInstanceType,ParameterValue=r4.xlarge" \
"ParameterKey=KeyName,ParameterValue=john.smith" \
"ParameterKey=IamInstanceProfile,ParameterValue=my-ec2-instance-profile" \
"ParameterKey=WorkersCount,ParameterValue=2" \
"ParameterKey=LaunchSuperset,ParameterValue=yes" \
"ParameterKey=MetastoreType,ParameterValue='External MySQL RDBS'" \
"ParameterKey=ExternalMetastoreHost,ParameterValue=172.31.6.18"  \
"ParameterKey=ExternalMetastorePort,ParameterValue=3306"  \
"ParameterKey=ExternalRdbmsMetastoreUserName,ParameterValue=hive"  \
"ParameterKey=ExternalRdbmsMetastorePassword,ParameterValue='q@55vv0r|>'"  \
"ParameterKey=ExternalRdbmsMetastoreDatabaseName,ParameterValue=hive_metastore"  \
"ParameterKey=AdditionalCoordinatorConfigurationURI,ParameterValue=s3://my_bucket/presto-additional-coordinator-configuration-1.0.zip" \
"ParameterKey=AdditionalWorkersConfigurationURI,ParameterValue=s3://my_bucket/presto-additional-workers-configuration-1.0.zip" \
"ParameterKey=BootstrapScriptURI,ParameterValue=s3://my_bucket/presto-bootstrap-1.0.sh" \
"ParameterKey=LicenseURI,ParameterValue=s3://my_bucket/starburstdata.license" \
"ParameterKey=S3Endpoint,ParameterValue=https://mybucket.s3-us-west-2.amazonaws.com" \
"ParameterKey=S3AccessKey,ParameterValue=AKIAIOSFODNN7EXAMPLE" \
"ParameterKey=S3SecretKey,ParameterValue=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
--capabilities CAPABILITY_IAM

The above command yields output like the following:

{
  "StackId":"arn:aws:cloudformation:us-east-1:123456789012:stack/myteststack/466df9e0-0dff-08e3-8e2f-5088487c4896"
}
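You can then follow the stack creation with standard CloudFormation CLI commands, using the stack name you chose, for example:

aws cloudformation describe-stacks --stack-name "Presto-cluster" --query "Stacks[0].StackStatus"
aws cloudformation wait stack-create-complete --stack-name "Presto-cluster"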

CFT Configuration#

The CFT includes numerous configuration parameters that are grouped into different sections. All include descriptions in the AWS console. The same parameters apply to both EC2 console and AWS CLI usage.

Network Configuration#

Network Configuration Parameters#

VPC
Virtual Private Cloud ID.
Example: vpc-4bd6ca11

Subnet
Subnet to use for Presto nodes (must belong to the selected VPC).
Example: subnet-123abc2b

SelectedSubnetAutoAssignsPublicIp
Set to no if the selected subnet does not provide public IPs. In this case VPC endpoints are created for the Presto stack.
Example: yes

SecurityGroups
Additional security groups for Presto nodes (e.g. allowing SSH access). Must select at least one.
Example: sg-12e34aeb

EC2 Configuration#

The EC2 configuration details the infrastructure used for your Presto cluster.

Choose a CoordinatorInstanceType and WorkerInstanceType suitable for your workload. The r4.4xlarge instance types are chosen by default and work well for most workloads. See Recommended Instance Types for guidance on which instance types may be best for you.

EC2 Configuration Parameters#

CoordinatorInstanceType
EC2 instance type of the coordinator. The default instance type works well for most workloads, and further tips on choosing the right size are available.
Default: r4.xlarge. Example: r5.12xlarge

WorkerInstanceType
EC2 instance type of the workers. The default instance type works well for most workloads, and further tips on choosing the right size are available.
Default: r4.xlarge. Example: m5.4xlarge

KeyName
Name of an EC2 KeyPair to enable SSH access to the instance. See SSH Keys for more detail.
Example: john.smith

WorkersCount
Number of dedicated worker nodes (apart from the coordinator) to instantiate. Worker nodes are added to an AWS AutoScaling Group. See Auto Scaling a Presto Cluster for more details.
Example: 10

HACoordinatorsCount
Number of coordinator nodes to instantiate. If more than one, the coordinator offers HA capabilities. This number represents one active coordinator plus the number of optional hot-standby coordinators. For example, if you specify 3, there is 1 active coordinator and 2 standby coordinators that can take over if the active one fails. See Coordinator High Availability for more details.
Default: 1. Example: 3

WorkerMountVolume
Mount an additional EBS volume on each worker at /data. This is required when using caching for distributed storage. Ensure that the directory /data is configured in your Hive catalog properties.
Default: no. Example: yes

WorkerVolumeType
Type of the additional EBS volume mounted on the workers.
Default: io1. Example: gp2

WorkerVolumeSize
Size of the additional EBS volume mounted on the workers, in GiB. Use at least 10 GiB with the io1 volume type. The valid range is 4 to 16384.
Default: 4. Example: 100

WorkerVolumeIOPS
The number of possible I/O operations per second for the additional volume. Used only with the io1 volume type. Each 5000 I/O operations require at least 100 GiB storage size on the volume. The valid range is 100 to 20000.
Default: 100. Example: 2000

KeepCoordinatorNode
(Debug only) Keep the coordinator node running after the coordinator service fails.
Default: no. Example: yes
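For example, enabling coordinator HA and an additional worker volume on the AWS CLI could look like the following create-stack fragment; the values are taken from the table above and serve only as a sketch:

"ParameterKey=HACoordinatorsCount,ParameterValue=3" \
"ParameterKey=WorkerMountVolume,ParameterValue=yes" \
"ParameterKey=WorkerVolumeType,ParameterValue=gp2" \
"ParameterKey=WorkerVolumeSize,ParameterValue=100" \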

Presto Configuration#

The Presto configuration parameters allow you to configure all Presto-specific aspects of your coordinators and workers in the cluster.

Presto Configuration Parameters#

AdditionalCoordinatorConfigurationURI
URI of an S3 zip file with additional configuration for the coordinator (optional).
Example: s3://my_bucket/presto-additional-coordinator-configuration-1.0.zip

AdditionalWorkersConfigurationURI
URI of an S3 zip file with additional configuration for the workers (optional).
Example: s3://my_bucket/presto-additional-workers-configuration-1.0.zip

BootstrapScriptURI
Optional URI of a bash script stored on S3 to execute on all nodes. The script runs after Presto is configured, but before it is started. For example, your bash script can be used to create directories, install additional software, deploy UDFs, or deploy other plugins. When the script is executed, a string argument value of coordinator or worker is passed in. You can check for this argument value in your script to perform certain actions based on the node type.
Example: s3://my_bucket/presto-bootstrap-1.0.sh

PrestoHttpPort
Port to use for the Presto coordinator, and therefore for the Web UI as well as JDBC and other client connections. Defaults to 8080.
Example: 8088

LicenseURI
URI of the SEP license in S3. This is only needed when deploying the CFT (using a privately shared SEP AMI) without subscribing to the AWS Marketplace.
Example: s3://my_bucket/starburstdata.license
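As an illustration of the BootstrapScriptURI hook, the following is a minimal sketch of such a script. It only relies on the coordinator/worker argument described above; the directory it creates is purely hypothetical:

#!/usr/bin/env bash
# The node type, coordinator or worker, is passed as the first argument.
NODE_TYPE="$1"

# Runs on every node: create a scratch directory (hypothetical example).
mkdir -p /tmp/presto-scratch

if [ "$NODE_TYPE" = "coordinator" ]; then
  echo "bootstrapping coordinator"
else
  echo "bootstrapping worker"
fi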

Hive Connector Options#

The Hive connector is required if you plan to access data in HDFS or S3. It requires a Hive Metastore so that Presto knows where the data lives. Refer to the dedicated Configuring Hive Metastore documentation to determine your configuration.

MetastoreType
Determines what metastore is used by the Hive connector. Defaults to None, which means that no Hive connector is provisioned.
Example: AWS Glue Data catalog

ExternalMetastoreHost
When an external metastore is used (see the MetastoreType parameter), this points to the host of the metastore.
Example: metastore.example.com

ExternalMetastorePort
When an external metastore is used (see the MetastoreType parameter), this points to the metastore service port number. When set to 0 (the default value), a default value per metastore type is used:

  • 3306 for External MySQL RDBMS

  • 5432 for External PostgreSQL RDBMS

  • 9083 for External Hive Metastore Service

Cannot be empty when MetastoreType is set to one of:

  • External MySQL RDBMS

  • External PostgreSQL RDBMS

  • External Hive Metastore Service

Example: 9083

ExternalRdbmsMetastoreUserName
When an external metastore is used (see the MetastoreType parameter), this determines the JDBC connection user name. Cannot be empty when MetastoreType is set to either of:

  • External MySQL RDBMS

  • External PostgreSQL RDBMS

Example: database_user_name

ExternalRdbmsMetastorePassword
When an external metastore is used (see the MetastoreType parameter), this determines the JDBC connection password. Cannot be empty when MetastoreType is set to either of:

  • External MySQL RDBMS

  • External PostgreSQL RDBMS

Example: jdbc_user_p@55vv0rd

ExternalRdbmsMetastoreDatabaseName
When an external metastore is used (see the MetastoreType parameter), this determines the database name for the JDBC connection. Cannot be empty when MetastoreType is set to either of:

  • External MySQL RDBMS

  • External PostgreSQL RDBMS

Example: hivemetastore
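For instance, selecting the Glue metastore on the AWS CLI could look like the following create-stack fragment; the quoting is needed because the value contains spaces, and the value string is the example shown above:

"ParameterKey=MetastoreType,ParameterValue='AWS Glue Data catalog'" \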

Ranger Options and Ranger LDAP User Synchronization#

The following parameters are related to system-level security with Apache Ranger and the synchronization of Ranger with an LDAP backend for user and group information.

Ranger-related Configuration Parameters#

EnableRanger
When enabled, Apache Ranger for system-level security is added. Defaults to no. All other settings in this section are ignored if Ranger is disabled.
Example: yes

RangerAdminPassword
Administrator password for Ranger. At least 8 characters, including lowercase, uppercase, and a digit, are required. When reusing an existing external database for Ranger in your CFT stack, you need to provide the same password as the initial one, to ensure access remains functional.

RangerBackendType
Type of database backend used for Apache Ranger. The default External PostgreSQL RDBMS is recommended for production usage. Built-in PostgreSQL RDBMS is ephemeral and only suitable for demo purposes.

ExternalRdbmsRangerHost
Hostname of the external PostgreSQL RDBMS server.

ExternalRdbmsRangerPort
Port of the external PostgreSQL RDBMS server. Defaults to 5432.

ExternalRdbmsRangerDatabaseName
Name of the database on the external PostgreSQL RDBMS server to use as the Ranger database backend. The database must already exist. Defaults to ranger.

ExternalRdbmsRangerUserName
Name of the database user that Ranger uses to manage the database on the external PostgreSQL RDBMS. The user must exist, have full permissions on the database, and must have CREATEROLE permissions granted. An additional user ‘ranger’ is created for non-admin database access. If you specify ‘ranger’, the single user is used for all operations. Defaults to rangeradmin.

ExternalRdbmsRangerPassword
Password for the database user.

RangerConfigFile
URL to an optional additional Ranger config file in an S3 bucket. A template is available at https://starburstdata-cft-public.s3.us-east-2.amazonaws.com/1.0.0/ranger.template.properties. Modify the template and upload it to an S3 bucket. The config file is required for using Solr Audit with Ranger and other customizations.
Example: s3://my-bucket/my-config_file.properties

RangerBootstrapScript
URL to an optional bootstrap script in an S3 bucket. The script is run before Ranger starts. It can, for example, be used to provide your truststore files.
Example: s3://my-bucket/ranger-bootstrap.sh

EnableRangerUserSync
When enabled, Apache Ranger synchronizes users from an external LDAP directory. Requires Ranger to be enabled; disabled by default. All other settings in this section are ignored if Ranger user sync is disabled.

RangerUserSyncConfigFile
URL to the Ranger user synchronization configuration file in an S3 bucket. A template is available at https://starburstdata-cft-public.s3.us-east-2.amazonaws.com/1.0.0/usersync.template.properties. Create a modified copy of the template and upload it to an S3 bucket. Required if Ranger user sync is enabled.
Example: s3://my-bucket/my-config_file.properties

Advanced AWS S3 Configuration#

With the advanced AWS S3 configuration it is possible to:

  • configure custom access credentials for AWS S3; in this case set only S3AccessKey and S3SecretKey

  • configure a custom AWS S3 endpoint; in this case set only S3Endpoint

  • access a third-party S3-compatible storage system; in this case set all three parameters

Note that these parameters only affect the configuration of the provisioned Hive catalogs. All of these properties are optional; when they are not given, the EC2 instance default values are used.

Advanced S3 Configuration Parameters#

S3Endpoint
URI of the AWS S3-compatible endpoint (optional).
Example: https://mybucket.s3-us-west-2.amazonaws.com

S3AccessKey
Access key for AWS S3-compatible storage (optional).
Example: AKIAIOSFODNN7EXAMPLE

S3SecretKey
Access secret for AWS S3-compatible storage (optional).
Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
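For example, when pointing the provisioned Hive catalogs at a third-party S3-compatible store, all three parameters are passed on the CLI; the endpoint and keys below are the placeholder values from the full example above:

"ParameterKey=S3Endpoint,ParameterValue=https://mybucket.s3-us-west-2.amazonaws.com" \
"ParameterKey=S3AccessKey,ParameterValue=AKIAIOSFODNN7EXAMPLE" \
"ParameterKey=S3SecretKey,ParameterValue=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \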

Monitoring#

Monitoring Parameters#

EnableCloudWatchMetrics
Enable integration with CloudWatch metrics. When enabled, OS and Presto metrics are reported for each cluster node and a CloudWatch Dashboard with cluster overview is created. Additional CloudWatch fees are charged. Refer to Integration with CloudWatch Metrics for more details.
Example: no

Advanced Configuration#

Advanced Configuration Parameters#

IamInstanceProfile
Optional name of an IAM instance profile to attach to Presto nodes. See Instance Profiles for more detail. If you do not specify the instance profile, the CloudFormation template creates the necessary IAM role privileges.
Example: my-ec2-instance-profile

Other Parameters#

LaunchSuperset
When enabled, Superset is deployed and started on an EC2 instance.
Example: yes