4.12. Auto Scaling a Running Presto Cluster

AWS Auto Scaling offers automatic control over the size of your Presto cluster (CloudFormation stack).

Manage Auto Scaling Groups

When you create a Presto cluster, an Auto Scaling Group (ASG) is automatically created for all the Presto worker nodes. To view and manage this ASG, log into your AWS account and open the AWS ASG page. There, you will see a list of ASGs for the Presto workers of every Presto cluster you have running, and you can control how Amazon Auto Scaling manages each cluster.
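The same list can also be read programmatically. The following is a minimal sketch, assuming the boto3 library and configured AWS credentials; the helper names and the "presto" name fragment are hypothetical and not part of the Starburst template.

```python
def summarize_groups(groups, name_fragment=""):
    """Return (name, min, max, desired, running) tuples for matching ASGs.

    `groups` is expected to have the shape of the "AutoScalingGroups" list
    returned by the DescribeAutoScalingGroups API.
    """
    rows = []
    for group in groups:
        name = group["AutoScalingGroupName"]
        if name_fragment in name:
            rows.append((
                name,
                group["MinSize"],
                group["MaxSize"],
                group["DesiredCapacity"],
                len(group.get("Instances", [])),
            ))
    return sorted(rows)


def fetch_groups():
    """Fetch every ASG in the current region; needs boto3 and credentials."""
    import boto3  # imported lazily so summarize_groups works without it

    client = boto3.client("autoscaling")
    groups = []
    paginator = client.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        groups.extend(page["AutoScalingGroups"])
    return groups


if __name__ == "__main__":
    # "presto" is an assumed naming convention; adjust it to your stack names.
    for row in summarize_groups(fetch_groups(), name_fragment="presto"):
        print(row)
```

The filtering is kept separate from the AWS call so that it can be checked against sample data without touching a live account.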

Auto Scaling Models

There are three types of auto scaling models you can employ to manage your Presto cluster:

  • Static / Manual
  • Static / Scheduled
  • Dynamic

Static/Manual Auto Scaling

The static or manual auto scaling model is managed from the “Details” tab. This model is configured by default. The tab has three main properties: “Desired Capacity”, “Min” and “Max”. Click the “Edit” button to change them to the values you want. When you click “Save”, the Auto Scaling mechanism starts working to satisfy the new requirements, either spinning up new Presto worker nodes or shutting down existing ones.

In the Starburst CloudFormation template, by default, all three properties are set to the same value, which keeps the number of running worker nodes equal to the number you chose when you spun up the Presto cluster from the template. This means that when a node gets terminated (or becomes unavailable for any reason), Auto Scaling will start a new one to satisfy the requirements.
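The fixed-size configuration can be sketched in code as well. The example below is an illustration only, assuming boto3 and its `update_auto_scaling_group` call; the function names and the group name are hypothetical.

```python
def fixed_size_request(group_name, workers):
    """Build the arguments that pin an ASG to exactly `workers` nodes.

    Setting Min, Max and Desired Capacity to the same value mirrors the
    default described above: a lost node is replaced, and the cluster
    never grows or shrinks on its own.
    """
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": workers,
        "MaxSize": workers,
        "DesiredCapacity": workers,
    }


def apply_fixed_size(group_name, workers):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("autoscaling")
    client.update_auto_scaling_group(**fixed_size_request(group_name, workers))


if __name__ == "__main__":
    # "presto-workers" is a placeholder; use the ASG name from your stack.
    print(fixed_size_request("presto-workers", 5))
```

Calling `apply_fixed_size` has the same effect as editing the three values in the “Details” tab and saving.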

Static/Scheduled Auto Scaling

The static or scheduled auto scaling model is controlled from the “Scheduled Actions” tab. There, you can create a number of scheduled actions that will allow you to change the size of the Presto cluster based on the time of day. For example, you could keep a small number of nodes during the night, and boost it during different parts of the day that see peak demand.

The configuration of this model is a simple list of scheduled actions, each of which changes the “Min”, “Max” and “Desired Capacity” properties to other static values of your choosing. Each action runs on its configured schedule, either once or repeatedly (as a cron expression). Continuing the previous example, a nightly cooldown would need two events: one to lower the values in the evening and another every morning to bring them back up.
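The nightly cooldown can be written down as two scheduled actions. This is a sketch under assumptions: it uses boto3's `put_scheduled_update_group_action`, the action names and the 20:00/06:00 times are made up for illustration, and the cron expressions are interpreted in UTC unless a time zone is supplied.

```python
def scheduled_action(group_name, action_name, cron, workers):
    """Build the arguments for one recurring scheduled action."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
        "Recurrence": cron,  # standard five-field cron expression
        "MinSize": workers,
        "MaxSize": workers,
        "DesiredCapacity": workers,
    }


def nightly_cooldown(group_name, night_workers, day_workers):
    """Return the evening scale-in and morning scale-out actions."""
    return [
        # Every day at 20:00: shrink to the night-time size.
        scheduled_action(group_name, "presto-evening-scale-in",
                         "0 20 * * *", night_workers),
        # Every day at 06:00: restore the day-time size.
        scheduled_action(group_name, "presto-morning-scale-out",
                         "0 6 * * *", day_workers),
    ]


def apply_actions(actions):
    """Register the actions with AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)


if __name__ == "__main__":
    for action in nightly_cooldown("presto-workers", 2, 10):
        print(action)
```

Each dictionary corresponds to one entry in the “Scheduled Actions” tab.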

Dynamic Auto Scaling

Dynamic auto scaling uses policies that you define in the “Scaling Policies” tab. Of the three types of policies, “scaling policy with steps” and “target tracking scaling policy” (the default policy) are the most useful. The third is a special case of the “with steps” policy that contains a single step. You can change the policy type by clicking a link at the bottom of the “Scaling Policies” tab.

  • Dynamic Target Tracking:
With the dynamic target tracking policy you: (1) choose a relevant metric (e.g., average CPU utilization) and state the target value; and (2) indicate how long to wait before reassessing the metric, so that new nodes have time to start up and begin contributing to the metric value. Additionally, you can disable scale-in so that the mechanism can only increase the Starburst Presto worker count and never shrinks the cluster.
  • Dynamic “With Steps”:
The dynamic “with steps” policy is more complex, as it consists of an alarm and a number of adjustments. To define an alarm, you must choose a metric and define its breach criteria (e.g., average CPU utilization over a chosen period of time higher than 70%). The alarm can optionally send an event to an SNS topic for other systems to observe. Once the alarm is breached, a set of adjustments to the number of nodes is executed. Those adjustments can be either exact (setting the number of nodes to a specific value) or incremental. An increment can be a fixed count (e.g., add 2 nodes, or remove 1 node) or a percentage of the current number of nodes (e.g., add 10%, or reduce by 20%).
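Both policy types can be expressed as arguments to boto3's `put_scaling_policy`. The sketch below is illustrative only: the policy names, the 60% target, the 300-second warm-up and the step sizes are assumptions, not values taken from the Starburst template.

```python
def target_tracking_policy(group_name, target_cpu_percent=60.0,
                           warmup_seconds=300, disable_scale_in=False):
    """Arguments for a target tracking policy on average CPU utilization."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "presto-cpu-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        # Time given to new nodes before they count towards the metric.
        "EstimatedInstanceWarmup": warmup_seconds,
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
            # True means the policy may only add workers, never remove them.
            "DisableScaleIn": disable_scale_in,
        },
    }


def step_policy(group_name, warmup_seconds=300):
    """Arguments for a "with steps" policy using fixed-count increments.

    Step bounds are offsets from the alarm threshold. With a 70% CPU alarm,
    the steps below mean: 70-85% adds 2 nodes, above 85% adds 4 nodes.
    """
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "presto-cpu-steps",  # hypothetical name
        "PolicyType": "StepScaling",
        # Alternatives: "ExactCapacity" or "PercentChangeInCapacity".
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "EstimatedInstanceWarmup": warmup_seconds,
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": 15.0,
             "ScalingAdjustment": 2},
            {"MetricIntervalLowerBound": 15.0,
             "ScalingAdjustment": 4},
        ],
    }


def apply_policy(policy):
    """Create or update the policy; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("autoscaling")
    return client.put_scaling_policy(**policy)["PolicyARN"]


if __name__ == "__main__":
    print(target_tracking_policy("presto-workers", disable_scale_in=True))
    print(step_policy("presto-workers"))
```

When created through the API, a step policy does nothing on its own: the CloudWatch alarm that defines the breach criteria is created separately and lists the returned policy ARN among its alarm actions. The console sets up this link when you pick the alarm in the “Scaling Policies” tab.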

Auto Scaling Activity

All events in the Auto Scaling mechanism can be observed in the “Activity History” tab, which is very useful for debugging. The instances that are currently part of the ASG are listed in the “Instances” tab. At present, when Auto Scaling decides to scale in the cluster, it terminates some number of workers, and all queries running at that moment will fail. The Presto cluster itself remains healthy, however, and all new queries submitted after the node termination will run successfully. This limitation will be addressed in future versions with the introduction of a graceful shutdown mechanism within the Starburst CloudFormation stack.
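For debugging outside the console, the activity history can be turned into a chronological log. This is a minimal sketch, assuming boto3's `describe_scaling_activities` call; the helper names are hypothetical, and only the first page of results is fetched.

```python
from datetime import datetime, timezone


def summarize_activities(activities):
    """Format scaling activities as one line each, oldest first.

    `activities` is expected to have the shape of the "Activities" list
    returned by the DescribeScalingActivities API.
    """
    lines = []
    for activity in sorted(activities, key=lambda a: a["StartTime"]):
        started = activity["StartTime"].strftime("%Y-%m-%d %H:%M:%S")
        lines.append(
            f"{started} {activity['StatusCode']}: {activity['Description']}"
        )
    return lines


def fetch_activities(group_name):
    """Fetch recent activities for one ASG; needs boto3 and credentials."""
    import boto3  # imported lazily so summarize_activities works without it

    client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(
        AutoScalingGroupName=group_name
    )
    return response["Activities"]


if __name__ == "__main__":
    # "presto-workers" is a placeholder; use the ASG name from your stack.
    for line in summarize_activities(fetch_activities("presto-workers")):
        print(line)
```

A terminating activity in this log marks a point where running queries may have failed because of a scale-in.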

Manual

Auto Scaling can also be used for Starburst Presto clusters built manually using the Starburst AMIs. In that case, the workers need to be put into a single Auto Scaling Group by hand and then configured as described above.