5.5. Deploying Presto with Caching
Presto on AWS Marketplace is available as both an Amazon Machine Image (AMI) and a CloudFormation template. Launching as an AMI provides a fully functional single node Presto setup – suitable for trial deployment of Presto in your development environment. The Starburst CloudFormation template is ideal for production deployment by configuring multiple Presto AMIs to form a Presto cluster consisting of a Presto Coordinator and various Presto Workers.
Presto’s caching is achieve using Alluxio. The Presto and Alluxio joint CloudFormation template is ideal for production deployment by configuring both Presto and Alluxio to form a joint cluster. Alluxio helps cache frequently used remote data locally to speed up Presto query performance. View https://docs.alluxio.io/os/user/stable/en/Overview.html for more information about Alluxio.
Presto Deployment Options
- 1. Select Template
After you subscribe to the Presto offering on AWS Marketplace, you’ll be able to launch CloudFormation.
This will direct you to the ‘Select Template’ step for creating a CloudFormation stack. You should find a pre-populated field under ‘Amazon S3 template URL’. This is the location of Starburst’s Presto CloudFormation template. Click ‘next’.
Optionally you can choose ‘Copy to Service Catalog’ for when using with the AWS Service Catalog.
- 2. Specify Details
Proceed by specifying the details of your Presto cluster (CloudFormation stack). This step includes network, EC2 and Presto configurations.
PrestoHttpPort configures the Presto HTTP port number. By default it is set to
Preliminary Details: Name your Presto cluster .
Network Configuration: Specify your existing VPC, Subnet, and Security Groups. It is assumed these are preconfigured in your AWS account. See the below Prerequisites sections for more detail:
nofor SelectedSubnetAutoAssignsPublicIp if selected subnet does not provide public IPs. In such case VPC endpoints will be created for Presto stack.
Choose a CoordinatorInstanceType and WorkerInstanceType suitable for your workload. The
r4.4xlargeinstance types are chosen by default and work well for most workloads. Since Alluxio Master is colocated with Presto Coordinator and Alluxio worker is colocated with Presto worker, it’s suggested to choose an instance type with enough memory.
Choose a KeyName which is the name of an EC2 KeyPair to enable SSH access to the instance. See SSH Keys for more detail.
Specify the WorkersCount to specify the number of nodes to instantiate Presto workers and Alluxio workers. Presto worker nodes are added to an AWS AutoScaling Group. See Auto Scaling a Running Presto Cluster for more detail.
Specify the HACoordinatorsCount for Presto coordinator and Alluxio master high availability. See Presto Coordinator High Availability for more detail about Presto high availability. Alluxio has a built-in mechanism to achieve high availability. See Running Alluxio on a HA cluster for more details.
AllowExternalAccessOnHttp indicates whether you will be allowing external access on HTTP. Disabled by default, this is to disallow external access for security purposes. When disabled, clients (CLI/JDBC/etc) can connect to Presto only from machines in the AWS Subnet specified above. When enabled and instances are assigned public IP, Presto endpoints will be available over HTTP without authorization (unless eg. LDAP is configured via “Custom Configuration”).
It’s important to make sure your AWS infrastructure is set up in such a way that the Presto is not publicly accessible.
If you want to use the Hive connector to access data in HDFS or S3, you will need to configure Hive Metastore so Presto knows where data lives. See Configuring Hive Metastore for more detail.
AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI are the locations of your custom Presto configuration if you wish to override the default configurations. Please see the section Custom Configuration for more detail.
BootstrapScriptURI is the location of your optional bash script to run on the Presto Coordinator and Worker nodes after Presto is configured but before it is launched. For example, your bash script could be used to install additional software, deploy UDFs, or deploy other plugins. When the script is executed, a string argument value of
workeris passed in. You can check for this argument value in your script to perform certain actions based on the Presto node type.
AlluxioS3RootMount is an existing S3 address (e.g.: s3://bucket/directory) to be mounted to Alluxio root. This parameter is required to start the Alluxio cluster. The current user account needs to have full read write permissions to this S3 address. Otherwise, S3 credentials of the root mount can be provided via the Advanced AWS S3 Configuration section.
AlluxioProperties is an optional field to provide additional properties. Alluxio will be preconfigured with some properties values to be able to launch. Properties specified here will take precedence, overriding any preconfigured entries. The format for this field is KEY1=VALUE1,KEY2=VALUE2.
Hive Connector Options:
These options determine the Hive Metastore to use. There are several ways you configure Hive Metastore in Hive connector. You should refer to the dedicated documentation Configuring Hive Metastore to determine your configuration.
Advanced AWS S3 configuration:
With S3Endpoint, S3AccessKey and S3SecretKey it is possible to:
- configure custom access credentials for AWS S3, in this case set only S3AccessKey and S3SecretKey
- configure custom AWS S3 endpoint, in this cases set only S3Endpoint
- access third-party S3-compatible storage system, in this case set all three parameters
Notice that above parameters will affect configuration of only provisioned Hive catalog. All of these properties are optional, when not given EC2 instance default values are used.
Indicate if you want to enable integration with CloudWatch metrics. When enabled, OS and Presto metrics will be reported for each cluster node. Additionally, a CloudWatch Dashboard with cluster overview will be created. Additional CloudWatch fees are charged. Please refer to Integration with CloudWatch metrics for more detail.
Optionally specificy the InstanceProfile to attach to Presto nodes. See Instance Profiles for more detail. If you do not specificy the InstanceProfile, the CloudFormation Template will create the necessary IAM Role privleges. Alluxio adds DescribeAutoScalingGroups permission for configuring internal high availability and S3 related permissions for accessing AlluxioS3RootMount.
Indicate whether you will be launching Apache Superset.
- 3. Options
- Enter any additional stack specifications as shown on the Options page. These options include adding tags to resources within your cluster, choosing IAM roles, and specifying monitoring time for rollback triggers, among other advanced specifications. For further insight into said options follow the link to the AWS CloudFormation documentation.
- 4. Review
- Finally, review the details of your Presto and Alluxio joint cluster. When content, proceed by pressing create to conclude the creation.
Note: Just above the final “Create stack” button there will be a blue box informing you that The following resource(s) require capabilities: [AWS::IAM::Role]. In order to create the Stack you need to mark the checkbox next to I acknowledge that AWS CloudFormation might create IAM resources.
A user launching a CloudFormation Stack, whether via the AWS Web Console or AWS CLI, needs to have a set of permissions that will allow the creation of all necessary Stack resources. Please see the IAM Role Permissions for CloudFormation section for details.