6.7. Horizontal Scaling
Presto is designed to scale horizontally. You can achieve higher performance and support larger workloads by adding more workers to the cluster. Kubernetes has built-in support for horizontal scaling, which lets the cluster grow and shrink with demand while reducing wasted resources and therefore operating costs.
Kubernetes lets you scale a Presto cluster with the Horizontal Pod Autoscaler (HPA). HPA continuously monitors worker metrics and creates or terminates worker pods as needed so that there are enough workers to handle incoming queries.
HPA can be enabled and configured in the worker section:
```yaml
worker:
  count: 2
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
  cpuLimit: 15
  cpuRequest: 15
```
This configuration keeps the number of worker pods between minReplicas and maxReplicas and sets the target CPU utilization (here 80%) that the HPA tries to maintain.
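The Kubernetes HPA documentation describes the replica count the controller aims for as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance (0.1 by default) and bounded by minReplicas and maxReplicas. The following is a simplified Python model of that calculation, not the controller itself: the function name `desired_replicas` is hypothetical, and it ignores stabilization windows, scaling-rate policies, and pod readiness, which the real HPA also applies.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a CPU utilization target.

    Hypothetical helper: ignores stabilization windows, scaling policies,
    and pod readiness, which the real HPA controller also considers.
    """
    ratio = current_utilization / target_utilization
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))


# Bounds and target mirror the worker configuration: 1..100 pods, 80% CPU.
print(desired_replicas(2, 120.0, 80.0, 1, 100))   # 3 (scale out)
print(desired_replicas(10, 20.0, 80.0, 1, 100))   # 3 (scale in)
print(desired_replicas(4, 84.0, 80.0, 1, 100))    # 4 (within tolerance)
```

The clamp is why maxReplicas acts as a hard ceiling on cluster size no matter how overloaded the workers report being.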
Note that HPA only works if the worker CPU resources are set: it measures CPU utilization relative to the pod's CPU request, so without them HPA lacks the metrics it needs. When using HPA, set the CPU limit and CPU request to the same value, preferably to the number of node cores minus 1 on nodes dedicated to Presto. Setting both to the same value is important because CPU utilization percentage is calculated as cpuUsage/cpuRequest, while actual usage is capped only by cpuLimit. If the limit is higher than the request, utilization can exceed 100%, which can lead to huge spikes in the number of pods.
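A short worked example makes the effect of a request/limit mismatch concrete. This Python sketch assumes the utilization definition given above (usage divided by request, with usage capped at the limit); the helper names are hypothetical, and the printed ratio is what the HPA replica formula would multiply the pod count by before any scaling-rate policies are applied.

```python
def cpu_utilization_percent(cpu_usage: float, cpu_request: float) -> float:
    """Utilization as the HPA sees it: usage relative to the CPU request."""
    return 100.0 * cpu_usage / cpu_request


def max_utilization_percent(cpu_request: float, cpu_limit: float) -> float:
    """Highest utilization a pod can report, since usage is capped by the limit."""
    return cpu_utilization_percent(cpu_limit, cpu_request)


TARGET = 80.0  # targetCPUUtilizationPercentage

# Request equal to limit (15 cores each): utilization tops out at 100%,
# so the replica ratio is at most 100 / 80 = 1.25.
print(max_utilization_percent(cpu_request=15, cpu_limit=15) / TARGET)  # 1.25

# Hypothetical mismatch: request 3 cores, limit 15 cores. A busy pod can
# report 500% utilization, so the ratio can reach 500 / 80 = 6.25.
print(max_utilization_percent(cpu_request=3, cpu_limit=15) / TARGET)   # 6.25
```

With equal request and limit, each scaling decision is bounded by a modest factor; with a low request, a single busy period can ask for several times as many workers.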
HPA only changes the number of worker pods; the new pods still need nodes to run on. Those nodes can be provisioned on demand or taken from a large global pool.
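To estimate how many nodes must be available for a given number of worker pods, a back-of-the-envelope calculation is usually enough. This Python sketch is an assumption-laden approximation, not how the Kubernetes scheduler decides placement: it assumes identical nodes, treats CPU as the only constraint, and reserves one core per node (mirroring the "node cores minus 1" guidance). The function `nodes_needed` is hypothetical.

```python
import math


def nodes_needed(worker_pods: int, cpu_request: float,
                 node_cores: int, reserved_cores: int = 1) -> int:
    """Estimate how many identical nodes are needed to place all worker pods.

    Hypothetical helper: assumes CPU is the only constraint and that
    `reserved_cores` per node is kept back for the OS and Kubernetes daemons.
    """
    allocatable = node_cores - reserved_cores
    pods_per_node = math.floor(allocatable / cpu_request)
    if pods_per_node < 1:
        raise ValueError("cpu_request does not fit on a single node")
    return math.ceil(worker_pods / pods_per_node)


# 16-core dedicated nodes with cpuRequest = 16 - 1 = 15: one pod per node.
print(nodes_needed(worker_pods=12, cpu_request=15, node_cores=16))  # 12
# Smaller pods (cpuRequest = 5) on the same nodes pack three per node.
print(nodes_needed(worker_pods=12, cpu_request=5, node_cores=16))   # 4
```

Real placement also depends on memory requests, DaemonSets, and node allocatable settings, so treat the result as a lower bound.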
An on-prem cluster typically pools all available machines, with resources shared among different services in separate namespaces. Although the total number of nodes is constant in this scenario, the resources dedicated to Presto still match demand because the HPA creates and terminates pods.
On GKE, you can use the cluster autoscaler, which adds nodes to a node pool when pods cannot be scheduled and removes underused nodes.