4.4. Deploying Presto

Presto on AWS Marketplace is available as both an Amazon Machine Image (AMI) and a CloudFormation template. Launching as an AMI provides a fully functional single node Presto setup – suitable for trial deployment of Presto in your development environment. The Starburst CloudFormation template is ideal for production deployment by configuring multiple Presto AMIs to form a Presto cluster consisting of a Presto Coordinator and various Presto Workers.

Deploying the Presto AMI Using AWS EC2 Console

1. Launching the AMI

After subscribing to the software, choose ‘Launch through EC2’ to launch through the AWS EC2 Console.

../_images/presto_ami_launch.png

This will direct you to the ‘Choose an Instance Type’ step. Optionally you can choose ‘Copy to Service Catalog’ for when using with the AWS Service Catalog.

2. Choose an Instance Type
Choose an instance type that best suits your workload. The r4.4xlarge instance type is recommended by default and works well for most workloads. See Recommended Instance Types to assist you with what instance types may be best for you. Note that a single node Presto instance is typically used for trying Presto in a development environment.
3. Configure Instance Details

Configure your instance to fit your needs. Choose the existing VPC and Subnet you want to deploy to. And optionally choose an IAM Role. Please refer to the above Prerequisites for more information on these various specification fields.

../_images/presto_ami_configure.png
4. Add Storage
Manage your instance’s storage and add supplementary EBS and instance store volumes as needed. The defaults are generally OK.
5. Add Tags
Add and create one or more tags. Refer to the link below for more information on tags
6. Configure Security Group

Create or select an existing security group to control traffic to your instance. Note that you are able to choose multiple security group IDs when selecting from the pool of existing security groups. For additional information regarding security groups, please refer to the Prerequisites section.

It’s recommended that ports 8080 and 8088 are accessible in order to access the Presto Web UI, submit queries from outside the cluster, and access Apache Superset. Additionally, it’s recommended that port 22 is accessible for SSH access.

7. Review
Review the details of your instance. When content, proceed by pressing launch to assign a key pair to the instance and conclude the launch process.

Deploying a Presto Cluster Using CloudFormation Template (Web Console)

1. Select Template

After you subscribe to the Presto offering on AWS Marketplace, you’ll be able to launch CloudFormation.

../_images/presto_cft_launch.png

This will direct you to the ‘Select Template’ step for creating a CloudFormation stack. You should find a pre-populated field under ‘Amazon S3 template URL’. This is the location of Starburst’s Presto CloudFormation template. Click ‘next’.

Optionally you can choose ‘Copy to Service Catalog’ for when using with the AWS Service Catalog.

../_images/presto_cft_select.png
2. Specify Details

Proceed by specifying the details of your Presto cluster (CloudFormation stack). This step includes network, EC2 and Presto configurations.

  • Preliminary Details: Name your Presto cluster .

    ../_images/presto_cft_stackname.png
  • Network Configuration: Specify your existing VPC, Subnet, and Security Groups. It is assumed these are preconfigured in your AWS account. See the below Prerequisites sections for more detail:

    Select no for SelectedSubnetAutoAssignsPublicIp if selected subnet does not provide public IPs. In such case VPC endpoints will be created for Presto stack.

    ../_images/presto_cft_network.png
  • EC2 Configuration:

    Choose a CoordinatorInstanceType and WorkerInstanceType suitable for your workload. The r4.4xlarge instance types are chosen by default and work well for most workloads. See Recommended Instance Types to assist you with what instance types may be best for you.

    Choose a KeyName which is the name of an EC2 KeyPair to enable SSH access to the instance. See SSH Keys for more detail.

    Specify the WorkersCount to specify the number of Presto worker nodes for the cluster. Presto worker nodes are added to an AWS AutoScaling Group. See Auto Scaling a Running Presto Cluster for more detail.

    Specify the HACoordinatorsCount for Presto coordinator high availability. This number represents 1 active coordinator plus the number of optional hot standby coordinators. For example, if you specify 3, then there is 1 active coordinator and 2 standby coordinators if the active one fails. See Presto Coordinator High Availability for more detail.

    ../_images/presto_cft_ec2.png
  • Presto Configuration:

    AllowExternalAccessOnHttp indicates whether you will be allowing external access on HTTP. Disabled by default, this is to disallow external access for security purposes. When disabled, clients (CLI/JDBC/etc) can connect to Presto only from machines in the AWS Subnet specified above. When enabled and instances are assigned public IP, Presto endpoints will be available over HTTP without authorization (unless eg. LDAP is configured via “Custom Configuration”).

    Note

    It’s important to make sure your AWS infrastructure is set up in such a way that the Presto is not publicly accessible.

    If you want to use the Hive connector to access data in HDFS or S3, you will need to configure Hive Metastore so Presto knows where data lives. See Configuring Hive Metastore for more detail.

    AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI are the locations of your custom Presto configuration if you wish to override the default configurations. Please see the section Custom Configuration for more detail.

    BootstrapScriptURI is the location of your optional bash script to run on the Presto Coordinator and Worker nodes after Presto is configured but before it is launched. For example, your bash script could be used to install additional software, deploy UDFs, or deploy other plugins. When the script is executed, a string argument value of coordinator or worker is passed in. You can check for this argument value in your script to perform certain actions based on the Presto node type.

    ../_images/presto_cft_configuration.png
PrestoHttpPort configures the Presto HTTP port number. By default it is set to 8080.
  • Hive Connector Options:

    These options determine the Hive Metastore to use. There are several ways you configure Hive Metastore in Hive connector. You should refer to the dedicated documentation Configuring Hive Metastore to determine your configuration.

    ../_images/presto_cft_metastore.png
  • Advanced AWS S3 configuration:

    With S3Endpoint, S3AccessKey and S3SecretKey it is possible to:

    • configure custom access credentials for AWS S3, in this case set only S3AccessKey and S3SecretKey
    • configure custom AWS S3 endpoint, in this cases set only S3Endpoint
    • access third-party S3-compatible storage system, in this case set all three parameters

    Notice that above parameters will affect configuration of only provisioned Hive catalog. All of these properties are optional, when not given EC2 instance default values are used.

    ../_images/presto_cft_s3_configuration.png
  • Monitoring:

    Indicate if you want to enable integration with CloudWatch metrics. When enabled, OS and Presto metrics will be reported for each cluster node. Additionally, a CloudWatch Dashboard with cluster overview will be created. Additional CloudWatch fees are charged. Please refer to Integration with CloudWatch metrics for more detail.

    ../_images/presto_cft_monitoring.png
  • Advanced:

    Optionally specificy the InstanceProfile to attach to Presto nodes. See Instance Profiles for more detail. If you do not specificy the InstanceProfile, the CloudFormation Template will create the necessary IAM Role privleges.

    ../_images/presto_cft_iam.png
  • Other:

    Indicate whether you will be launching Apache Superset.

    ../_images/presto_cft_superset.png
3. Options
Enter any additional stack specifications as shown on the Options page. These options include adding tags to resources within your cluster, choosing IAM roles, and specifying monitoring time for rollback triggers, among other advanced specifications. For further insight into said options follow the link to the AWS CloudFormation documentation.
4. Review
Finally, review the details of your Presto cluster. When content, proceed by pressing create to conclude the creation.

Note: Just above the final “Create stack” button there will be a blueish box informing you that The following resource(s) require capabilities: [AWS::IAM::Role]. In order to create the Stack you need to mark the checkbox next to I acknowledge that AWS CloudFormation might create IAM resources.

Deploying a Presto Cluster Using CloudFormation Template (AWS CLI)

After subscribing to the software you can optionally launch the Presto cluster using the AWS CLI instead of the AWS Web Console. This is often useful for those wanting to script control of the Presto cluster. Or simply for those more comfortable at the command line.

1. Open a Terminal Window
Open a terminal window to begin a command line.
2. Create Stack
Use the create stack command to initiate the creation of your Presto cluster (CloudFormation stack).
create-stack
3. Name Stack
Specify the name that is to be associated with the cluster. The name must be unique in the region in which you are creating the cluster.
--stack-name (string)
4. Specify Template

Indicate the template you will be using to create your cluster.

URL: Specify the location of the file containing the template body. The URL must point to a template that is located in an Amazon S3 bucket.

--template-url (string)

5. Specify Parameters

Define a list of parameter structures that specify input parameters for the cluster. Reference the following list of possible parameters for your cluster creation.
Parameter Key Description Example Parameter Values
VPC Virtual Private Cloud ID vpc-4bd6ca11
Subnet Subnet to use for Presto nodes (must belong to the selected VPC) subnet-123abc2b
SelectedSubnetAutoAssignsPublicIp Set to no if selected subnet does not provide public IPs. In such case VPC endpoints will be created for Presto stack. yes
SecurityGroups Additional Security Groups for Presto nodes (e.g: allowing SSH access). Must select at least one. sg-12e34aeb
CoordinatorInstanceType EC2 instance type of the coordinator r4.xlarge (For a full list see Available Instance Types under Supported Instances and Regions)
WorkerInstanceType EC2 instance type of the workers r4.xlarge (For a full list see Available Instance Types under Supported Instances and Regions)
KeyName Name of an EC2 KeyPair to enable SSH access to the instance. john.smith
IamInstanceProfile. The name of an instance profile to attach to Presto nodes (Optional) my-ec2-instance-profile.
WorkersCount Number of dedicated Presto worker nodes (apart from coordinator) to instantiate 10
HACoordinatorsCount Number of Presto coordinator nodes to instantiate. If more then one, then Presto Coordinator will offer HA capabilities. 1
LaunchSuperset When enabled, Superset will be started on a an EC2 instance yes
AllowExternalAccessOnHttp When enabled and instances are assigned public IP, Presto endpoints will be available over HTTP without authorization. no
MetastoreType Determines what metastore will be used for Hive connector. Defaults to None, which means that no Hive connector will be provisioned. AWS Glue Data Catalog
ExternalMetastoreHost When external Metastore is used (see MetastoreType parameter), this will point the host of the Metastore. metastore.yourcompany.com
ExternalMetastorePort

When external Metastore is used (see MetastoreType parameter), this will point the Metastore service port number.

When set to 0 (the default value), default value per each metastore type will be used:

  • 3306 for External MySQL RDBM
  • 5432 for External PostgreSQL RDBMS
  • 9083 for External Hive Metastore Service

Cannot be empty when MetastoreType is to either of: be used:

  • External MySQL RDBMS
  • External PostgreSQL RDBMS
  • External Hive Metastore Service
9083
ExternalRdbmsMetastoreUserName

When external Metastore is used (see MetastoreType parameter), this will determine the JDBC connection user name.

Cannot be empty when MetastoreType is to either of: be used:

  • External MySQL RDBMS
  • External PostgreSQL RDBMS
database_user_name
ExternalRdbmsMetastorePassword

When external Metastore is used (see MetastoreType parameter), this will determine the JDBC connection password

Cannot be empty when MetastoreType is to either of: be used:

  • External MySQL RDBMS
  • External PostgreSQL RDBMS
jdbc_user_p@55vv0rd
ExternalRdbmsMetastoreDatabaseName

When external Metastore is used (see MetastoreType parameter), this will determine the JDBC connection password

Cannot be empty when MetastoreType is to either of: be used:

  • External MySQL RDBMS
  • External PostgreSQL RDBMS
hivemetastore
AdditionalCoordinatorConfigurationURI URI of S3 zip file with additional Presto for the Presto Coordinator (Optional) s3://my_bucket/presto-additional- coordinator-configuration-1.0.zip
AdditionalWorkersConfigurationURI URI of S3 zip file with additional Presto for the Presto Workers (Optional) s3://my_bucket/presto-additional- workers-configuration-1.0.zip
BootstrapScriptURI URI of bash script on S3 to execute after. for the Presto Workers (Optional) Presto configuration. s3://my_bucket/presto-bootstrap-1.0.sh
S3Endpoint URI to AWS S3-compatible endpoint (Optional) https://mybucket.s3-us-west-2.amaz onaws.com
S3AccessKey Access key to AWS S3-compatible storage (Optional) AKIAIOSFODNN7EXAMPLE
S3SecretKey Access secret to AWS S3-compatible storage (Optional) wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAM PLEKEY
EnableCloudWatchMetrics Enable integration with CloudWatch metrics. When enabled, OS and Presto metrics will be reported for each cluster node. Additionally, a CloudWatch Dashboard with cluster overview will be created. Additional CloudWatch fees are charged. no

For a more detailed description on the above parameters, please refer to the Prerequisites section. Also, parameter values need to be provided on the command line in special form. Please refer to the Example below for guidance.

6. Options

Rollback: Set rollback ability to true to disable rollback of the cluster if stack creation failed.
--disable-rollback
--no-disable-rollback
RollbackTriggers=[{Arn=string,Type=string},{Arn=string,Type=string}],MonitoringTimeInMinutes=integer

7. IAM Capabilities

Analogous to how you mark the I acknowledge that AWS CloudFormation might create IAM resources. checkbox, when deploying Starburst Presto via the AWS Web Console, you need to denote that you accept this capability when deploying via CLI. This is done by adding a --capabilities parameter to the AWS CLI create-stack command.
--capabilities CAPABILITY_IAM
8. Review
Finally, review the details of your cluster and your commands. When ready, proceed by pressing enter to conclude the creation of your Presto cluster.

Example

See the following create-stacks command as a reference for your Presto cluster deployment:

      aws cloudformation create-stack \
      --stack-name "Presto-cluster" \
      --template-url "https://s3.amazonaws.com/awsmp-fulfillment-cf-templates-prod/PrestoCFT.template" \
      --parameters \
      "ParameterKey=VPC,ParameterValue=vpc-4bd6ca11" \
      "ParameterKey=Subnet,ParameterValue=subnet-123abc2b" \
      "ParameterKey=SecurityGroups,ParameterValue=sg-12e34aeb" \
      "ParameterKey=CoordinatorInstanceType,ParameterValue=r4.xlarge" \
      "ParameterKey=WorkersInstanceType,ParameterValue=r4.xlarge" \
      "ParameterKey=KeyName,ParameterValue=john.smith" \
      "ParameterKey=IamInstanceProfile,ParameterValue=my-ec2-instance-profile" \
      "ParameterKey=WorkersCount,ParameterValue=2" \
      "ParameterKey=LaunchSuperset,ParameterValue=yes" \
      "ParameterKey=AllowExternalAccessOnHttp,ParameterValue=no" \
      "ParameterKey=MetastoreType,ParameterValue='External MySQL RDBS'" \
      "ParameterKey=ExternalMetastoreHost,ParameterValue=172.31.6.18"  \
      "ParameterKey=ExternalMetastorePort,ParameterValue=3306"  \
      "ParameterKey=ExternalRdbmsMetastoreUserName,ParameterValue=hive"  \
      "ParameterKey=ExternalRdbmsMetastorePassword,ParameterValue='q@55vv0r|>'"  \
      "ParameterKey=ExternalRdbmsMetastoreDatabaseName,ParameterValue=hive_metastore"  \
      "ParameterKey=AdditionalCoordinatorConfigurationURI,ParameterValue=s3://my_bucket/presto-additional-coordinator-configuration-1.0.zip" \
      "ParameterKey=AdditionalWorkersConfigurationURI,ParameterValue=s3://my_bucket/presto-additional-workers-configuration-1.0.zip" \
      "ParameterKey=BootstrapScriptURI,ParameterValue=s3://my_bucket/presto-bootstrap-1.0.sh" \
      "ParameterKey=S3Endpoint,ParameterValue=https://mybucket.s3-us-west-2.amazonaws.com" \
      "ParameterKey=S3AccessKey,ParameterValue=AKIAIOSFODNN7EXAMPLE" \
      "ParameterKey=S3SecretKey,ParameterValue=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
--capabilities CAPABILITY_IAM

The above commands yield output like the following:

{
"StackId":"arn:aws:cloudformation:us-east-1:123456789012:stack/myteststack/466df9e0-0dff-08e3-8e2f-5088487c4896"
}

Permissions needed to deploy a Starburst Presto CloudFormation Template

A user launching a CloudFormation Stack, whether via the AWS Web Console or AWS CLI, needs to have a set of permissions that will allow the creation of all necessary Stack resources. Please see the IAM Role Permissions for CloudFormation section for details.