5.7. Horizontal Scaling

Presto is designed to scale horizontally: you can achieve higher performance and support larger workloads by adding more workers to the cluster. Kubernetes has built-in support for this kind of horizontal scaling, which brings significant benefits while at the same time reducing resource waste and therefore operating costs.

Horizontal Pod Scaling

Kubernetes allows scaling your Presto cluster using the Horizontal Pod Autoscaler (HPA). The HPA continuously monitors worker metrics and creates or terminates worker pods as needed to ensure there are enough workers to handle incoming queries.

The HPA can be enabled and configured in the worker section:

worker:
  count: 2
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
  cpuLimit: 15
  cpuRequest: 15

This configuration sets the initial worker count, bounds the number of worker pods (between minReplicas and maxReplicas), and defines the target CPU utilization the HPA tries to maintain.
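
Under the hood, these settings configure a standard Kubernetes HorizontalPodAutoscaler object targeting the worker deployment. As a rough sketch only (the resource and deployment names below are hypothetical, and the manifest generated for your cluster may use a different API version or layout), the equivalent object looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: presto-worker              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: presto-worker            # hypothetical worker deployment name
  minReplicas: 1                   # worker.autoscaling.minReplicas
  maxReplicas: 100                 # worker.autoscaling.maxReplicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # worker.autoscaling.targetCPUUtilizationPercentage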

Note that for the HPA to work, the worker CPU request and limit must also be set; otherwise Kubernetes cannot compute CPU utilization and the HPA has no metric to act on. When using the HPA, set the CPU request and CPU limit to the same value, preferably to the number of node cores minus one on nodes dedicated to Presto. Setting both to the same value is important because CPU utilization is calculated as cpuUsage/cpuRequest, while actual usage is only capped by cpuLimit. If cpuRequest is smaller than cpuLimit, utilization can exceed 100%, which can lead to huge spikes in the number of pods.
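
To make the arithmetic concrete, here is an illustrative calculation using the standard HPA scaling rule, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization); the usage numbers are hypothetical:

# cpuRequest = cpuLimit = 15 cores, targetCPUUtilizationPercentage = 80
# Suppose 4 workers are running and each uses about 13.5 cores:
#   utilization     = 13.5 / 15 = 90%
#   desiredReplicas = ceil(4 * 90 / 80) = ceil(4.5) = 5
# The HPA adds one worker pod (subject to maxReplicas).
#
# If cpuRequest were only 5 cores while cpuLimit stayed at 15,
# the same 13.5-core usage would report as 270% utilization:
#   desiredReplicas = ceil(4 * 270 / 80) = ceil(13.5) = 14
# a jump from 4 to 14 pods in a single step, which is the spike described above.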

Cluster Scaling

The HPA schedules new pods to be started, but those pods still need nodes to run on. Nodes can be provisioned on demand or taken from a large global pool.

An on-premises cluster typically pools all available machines, with resources shared between different services running in separate namespaces. While the total number of nodes is constant in this scenario, the amount of resources dedicated to Presto still matches demand, thanks to pod creation and termination by the HPA.

On GKE, you can use the cluster autoscaler to add and remove nodes automatically as the HPA scales the number of worker pods.