Starburst Enterprise Presto (SEP) is available on the AWS Marketplace. It combines the reliable, scalable, and cost-effective cloud computing services provided by Amazon with the power of the fastest-growing distributed query engine in the industry. Using Starburst’s CloudFormation template and Presto AMI, you can run analytic queries on Presto clusters across distinct data sources of varying sizes. A single query can access multiple data stores, so you can analyze data from across your entire organization. Within minutes, you can provision anything from a small to a large cluster of compute instances and take advantage of Presto’s parallel processing. At its core, Presto is designed to give your organization faster query processing, and with it greater efficiency and cost-effectiveness. Create your cluster and start querying to see how Presto can improve your organization’s big data query performance and bottom line.
SEP on AWS Marketplace is always based on our latest release.
Check out our Release Notes for more detailed and up-to-date information.
The business intelligence tool Apache Superset is packaged alongside Presto. Superset is a data exploration and visualization web application that lets users work with data in a variety of ways, including writing SQL queries, creating new tables, creating visualizations (slices), adding those visualizations to one or more dashboards, and downloading CSV files. Superset’s SQL Lab IDE lets you both query and visualize data: you can explore and preview tables in Presto and compose SQL queries to access data, then export the results as a CSV file or visualize them immediately in the Superset “Explore” view.
For more information on Apache Superset, refer to our dedicated section.
You can deploy either a single-node instance from the Presto AMI or a multi-node cluster using the CloudFormation template (CFT).
An Amazon Machine Image (AMI) provides the information required to launch an instance, which is a virtual server in the cloud. Starburst’s custom Presto AMI makes launching an instance easy: choose your preferred instance type, specify the instance configuration details, and launch. Deploying a single instance from the Presto AMI lets you try Presto without spending time on installation or configuration.
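To make the launch step concrete, the following minimal sketch assembles the keyword arguments for boto3’s EC2 `run_instances` call for a single-node deployment. The AMI ID, key pair name, and security group ID are hypothetical placeholders, not the actual Presto AMI values; look these up for your account and region.

```python
def single_node_launch_args(ami_id: str, instance_type: str,
                            key_name: str, security_group_id: str) -> dict:
    """Build run_instances keyword arguments for one Presto instance."""
    return {
        "ImageId": ami_id,                        # the Presto AMI to launch from
        "InstanceType": instance_type,            # e.g. an r4 or c5 size
        "MinCount": 1,                            # MinCount == MaxCount == 1
        "MaxCount": 1,                            # requests exactly one instance
        "KeyName": key_name,                      # SSH key pair for access
        "SecurityGroupIds": [security_group_id],  # firewall rules for the node
    }


# All identifiers below are hypothetical placeholders.
args = single_node_launch_args(
    ami_id="ami-0123456789abcdef0",
    instance_type="r4.8xlarge",
    key_name="my-key-pair",
    security_group_id="sg-0123456789abcdef0",
)
```

With boto3 installed and credentials configured, the dictionary could then be passed as `boto3.client("ec2", region_name="us-east-1").run_instances(**args)`.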
AWS CloudFormation is a service that provisions and configures AWS resources from a template that describes them, so you spend less time managing those resources and more time on the applications you run in AWS. Starburst’s CloudFormation template for Presto describes all the AWS resources a Presto cluster needs, such as Amazon EC2 compute instances, and lets you provision Presto clusters quickly with specifications configured to fit your application’s needs. The template provisions a Presto cluster by launching multiple instances of the Presto AMI. It configures Presto automatically, and it also lets you customize your Presto cluster configuration.
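CloudFormation template parameters are passed as a list of `ParameterKey`/`ParameterValue` pairs, with every value given as a string. The following sketch builds arguments in that shape for boto3’s `create_stack`. The parameter names and template URL are hypothetical; check the Starburst template itself for the real parameter keys.

```python
def to_cfn_parameters(values: dict) -> list:
    """Convert a plain dict into CloudFormation's parameter list format."""
    # CloudFormation expects string values, so numbers are converted here.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Hypothetical stack name, template URL, and parameter keys.
stack_args = {
    "StackName": "presto-cluster",
    "TemplateURL": "https://example.com/presto-template.json",
    "Parameters": to_cfn_parameters({
        "CoordinatorInstanceType": "r4.4xlarge",
        "WorkersInstanceType": "r4.8xlarge",
        "WorkersCount": 4,
    }),
}
```

These arguments could be passed to `boto3.client("cloudformation").create_stack(**stack_args)`. Keeping the values in a plain dict makes it easy to version cluster sizes alongside other deployment settings.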
Starburst offers the following regions for Presto on AWS:
| Region Code | Region Name |
|---|---|
| us-east-1 | US East (N. Virginia) |
| us-east-2 | US East (Ohio) |
| us-west-1 | US West (N. California) |
| us-west-2 | US West (Oregon) |
| ap-northeast-1 | Asia Pacific (Tokyo) |
| ap-northeast-2 | Asia Pacific (Seoul) |
| ap-southeast-1 | Asia Pacific (Singapore) |
| ap-southeast-2 | Asia Pacific (Sydney) |
| ap-south-1 | Asia Pacific (Mumbai) |
| sa-east-1 | South America (São Paulo) |
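If you script deployments, one option is to validate the target region against the region list before provisioning anything. The following is a small sketch of that approach. The `check_region` helper is hypothetical, and the list should be kept in sync with Starburst’s current offering.

```python
# Region codes and names copied from the region list for Presto on AWS.
SUPPORTED_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-1": "US West (N. California)",
    "us-west-2": "US West (Oregon)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-south-1": "Asia Pacific (Mumbai)",
    "sa-east-1": "South America (São Paulo)",
}


def check_region(region_code: str) -> str:
    """Return the region name, or raise ValueError if it is not offered."""
    try:
        return SUPPORTED_REGIONS[region_code]
    except KeyError:
        raise ValueError(f"Presto on AWS is not offered in {region_code!r}") from None
```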
Use the following information to choose the optimal instance type for your specific compute, memory, and storage needs.
1. Choose CPU/Memory Ratio
Generally, you should favor clusters with a higher CPU/memory ratio, as Presto is usually bound by CPU. However, machines with more memory (e.g., r4.8xlarge, r4.16xlarge) are preferable if one or more of the following cases apply:
- Queries fail because they exceed node memory limits.
- Query concurrency is high and queries are executing in the reserved memory pool.
- Maximum query memory is very high, so meeting it with lower-memory, higher-CPU nodes would require a large number of nodes.
- You observe query skew issues.
- Presto is bound not by CPU but by other factors, such as storage on S3.
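The checklist above reduces to a simple rule: if any listed case applies, lean toward memory-optimized nodes. Otherwise, prefer a higher CPU/memory ratio. The following sketch encodes that rule. The field names are hypothetical labels for the listed conditions, and the rule only mirrors this guidance rather than providing a sizing tool.

```python
from dataclasses import astuple, dataclass


@dataclass
class WorkloadSignals:
    # Each flag corresponds to one case in the list above.
    node_memory_limit_failures: bool = False
    high_concurrency_in_reserved_pool: bool = False
    very_high_max_query_memory: bool = False
    query_skew: bool = False
    not_cpu_bound: bool = False


def prefer_memory_optimized(signals: WorkloadSignals) -> bool:
    """True if any listed case applies (favor larger-memory nodes such as r4);
    False means the default preference for a higher CPU/memory ratio holds."""
    return any(astuple(signals))
```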
2. Align with Offered Instance Types
| Instance family | Category | CPU/memory ratio | Use case |
|---|---|---|---|
For cost-efficiency, use the smallest cluster that still lets your queries succeed (for example, one that meets their memory requirements). However, if your cluster is bound by a particular resource (e.g., CPU), choose nodes with the highest ratio of that resource to other resources. For CPU-bound queries, for example, choose nodes with the highest CPU/memory ratio.
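The following sketch applies both rules of thumb: pick the candidate with the highest vCPU/memory ratio for CPU-bound work, and estimate the smallest worker count that covers a memory requirement. The vCPU and memory figures are AWS’s published specifications for these instance types. The `usable_fraction` parameter is a simplifying assumption, because Presto cannot use all of a node’s RAM for query memory, and the estimate assumes query memory spreads evenly across workers.

```python
import math

# (vCPUs, memory in GiB) per instance type.
INSTANCE_SPECS = {
    "c5.9xlarge": (36, 72),
    "m5.8xlarge": (32, 128),
    "r4.8xlarge": (32, 244),
    "r4.16xlarge": (64, 488),
}


def cpu_memory_ratio(instance_type: str) -> float:
    vcpus, memory_gib = INSTANCE_SPECS[instance_type]
    return vcpus / memory_gib


def best_for_cpu_bound(candidates: list) -> str:
    # For CPU-bound workloads, choose the highest vCPU/memory ratio.
    return max(candidates, key=cpu_memory_ratio)


def workers_needed(instance_type: str, required_memory_gib: float,
                   usable_fraction: float = 1.0) -> int:
    # Smallest worker count whose combined usable memory covers the requirement.
    _, memory_gib = INSTANCE_SPECS[instance_type]
    return math.ceil(required_memory_gib / (memory_gib * usable_fraction))
```

For example, `workers_needed("r4.8xlarge", 1000)` estimates 5 workers when all memory is counted, and lowering `usable_fraction` raises the estimate accordingly.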
Presto is a distributed system that runs on one or more machines to form a cluster.
When you use Starburst’s CloudFormation template for Presto, a typical Presto cluster is deployed alongside various other components. The components involved are:
Command Line Interface Client
The command line interface is used to send an SQL query to Presto. This client is installed on the same machine as the coordinator by default. It can also be installed and used on a different machine that has access to the Presto coordinator via HTTP.
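Like other Presto clients, the CLI talks to the coordinator over HTTP. In Presto’s client protocol, the client POSTs the SQL text to `/v1/statement` with an `X-Presto-User` header. It then keeps following the `nextUri` in each JSON response, collecting any `data` rows, until a response no longer contains `nextUri`. The following sketch builds the initial request and collects rows from already-fetched pages without contacting a server. The coordinator hostname is hypothetical, and port 8080 is only a common default, so check your cluster’s configuration.

```python
import urllib.request

# Hypothetical coordinator address.
COORDINATOR = "http://presto-coordinator.example.internal:8080"


def build_statement_request(sql: str, user: str,
                            coordinator: str = COORDINATOR) -> urllib.request.Request:
    # Submit the query text as the POST body to the statement endpoint.
    return urllib.request.Request(
        f"{coordinator}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


def collect_rows(pages: list) -> list:
    # Each JSON page may carry a batch of rows under "data"; the last page
    # is the one without a "nextUri".
    rows = []
    for page in pages:
        rows.extend(page.get("data", []))
        if "nextUri" not in page:
            break
    return rows
```

A real client would send the request with `urllib.request.urlopen`, parse each response body as JSON, and issue a GET to each `nextUri` in turn.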
Presto Coordinator
The coordinator runs on an EC2 instance and is responsible for parsing SQL queries, as well as analyzing, planning, and scheduling their execution.
Presto Workers and Auto Scaling Group
The workers run on the EC2 instances that make up the rest of the cluster. They are responsible for executing the tasks that make up a SQL query, such as reading and aggregating data, and they return results to the coordinator, which delivers them to the client.
The worker nodes belong to an Auto Scaling group, which handles instance scaling and management. An Auto Scaling group starts by launching enough EC2 instances to meet its desired capacity. It then maintains that capacity, replacing unhealthy instances and scaling out or in as needed.
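The following toy model illustrates the capacity bookkeeping behind an Auto Scaling group. The desired capacity must lie between the group’s minimum and maximum size, and the group launches or terminates instances to close the gap between desired and healthy running instances. The class is hypothetical and only illustrates the idea; it is not the AWS API.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired_capacity: int

    def set_desired(self, desired: int) -> None:
        # Like AWS, reject a desired capacity outside [min_size, max_size].
        if not self.min_size <= desired <= self.max_size:
            raise ValueError("desired capacity must be between min and max size")
        self.desired_capacity = desired

    def reconcile(self, healthy_running: int) -> int:
        """Positive: instances to launch; negative: instances to terminate."""
        return self.desired_capacity - healthy_running
```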
When you create a Presto cluster, its EC2 instances are launched in a placement group, which determines how they are placed on the underlying hardware. This placement provides a low-latency network between the Presto nodes.
Your EC2 instances are contained within one or more security groups. A security group acts as a virtual firewall that controls the traffic for one or more instances. When you create your cluster, you associate one or more security groups with it, and each security group contains rules that allow traffic to or from its associated instances.
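Security group rules only allow traffic, so inbound traffic is permitted when at least one rule matches its protocol, port, and source address, and it is denied otherwise. The following sketch models that matching with Python’s `ipaddress` module. The example rule (TCP port 8080 from a `10.0.0.0/16` VPC range) is hypothetical. The sketch also ignores the fact that security groups are stateful and automatically allow response traffic.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    protocol: str   # e.g. "tcp"
    from_port: int  # inclusive port range start
    to_port: int    # inclusive port range end
    cidr: str       # allowed source range, e.g. "10.0.0.0/16"


def is_allowed(rules: list, protocol: str, port: int, source_ip: str) -> bool:
    # Allow-only semantics: any matching rule permits the traffic.
    src = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and src in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


# Hypothetical rule: Presto HTTP traffic from inside the VPC only.
rules = [InboundRule("tcp", 8080, 8080, "10.0.0.0/16")]
```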
Your cluster is launched within a subnet of your virtual private cloud (VPC). A subnet is a range of IP addresses within the VPC that resides entirely in a single Availability Zone, and each Availability Zone is isolated from failures in other zones.
The VPC is a virtual network dedicated to your AWS account and is logically isolated from other virtual networks in the AWS Cloud. A VPC spans all Availability Zones in its region and is divided into subnets, each of which belongs to one Availability Zone.
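A subnet’s CIDR block must fall inside its VPC’s CIDR block, and AWS reserves five addresses in every subnet (the first four and the last one). The following sketch checks both facts with the `ipaddress` module. The CIDR ranges are hypothetical examples, and the helper name is illustrative.

```python
import ipaddress


def check_subnet(vpc_cidr: str, subnet_cidr: str) -> int:
    """Verify the subnet lies inside the VPC; return its usable address count."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    if not subnet.subnet_of(vpc):
        raise ValueError(f"{subnet} is not inside VPC range {vpc}")
    # AWS reserves five addresses per subnet.
    return subnet.num_addresses - 5
```

For instance, a `/24` subnet inside a `/16` VPC leaves 251 addresses for instances, which bounds how many Presto nodes can be placed in that subnet.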