4.17. Integration with CloudWatch Metrics#
Starburst’s CloudFormation template provides optional integration with
CloudWatch metrics. Metrics integration can be enabled with the
EnableCloudWatchMetrics template property. When enabled, detailed OS and
Presto metrics are collected and uploaded to the CloudWatch metrics service.
Additionally, a CloudWatch dashboard with a cluster overview is created.
Metrics are stored within the Presto CloudWatch metrics namespace. The metrics
collection interval is 10 seconds. Metrics are split into Node and Presto
types.
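As a minimal sketch of how the property might be set programmatically, the helper below builds a parameter list in the `ParameterKey`/`ParameterValue` shape that CloudFormation stack operations accept. The helper name `metrics_parameters` is hypothetical, and the string values "true"/"false" are an assumption about what the template expects, not something stated above.

```python
def metrics_parameters(enabled=True):
    """Build a CloudFormation parameter list that toggles metrics integration."""
    # Assumption: the template accepts the strings "true" / "false" here.
    value = "true" if enabled else "false"
    return [{"ParameterKey": "EnableCloudWatchMetrics", "ParameterValue": value}]


if __name__ == "__main__":
    print(metrics_parameters())
```

A list like this could be passed as the `Parameters` argument of a stack create or update call, alongside whatever other parameters the template requires.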
Node Metrics#
All node metrics contain the following dimensions:
Category - one of the following: cpu, memory, net, diskio
InstanceId - an instance ID designated by AWS
PrestoNodeRole - either coordinator or worker
PrestoStackName - the name of a CloudFormation stack
Type - this dimension is always Node
host - the private, internal hostname of the instance
The following node metrics are available:
cpu_usage_idle, cpu_usage_iowait, cpu_usage_system, cpu_usage_softirq – percentage of node CPU usage. These metrics additionally contain a cpu dimension, which is always cpu-total.
mem_used, mem_cached, mem_free – node OS memory usage.
net_bytes_recv, net_bytes_sent – number of bytes received/sent within the last collection period (10 seconds). These metrics additionally contain an interface dimension, which is always eth0.
diskio_read_bytes, diskio_write_bytes – number of bytes read/written within the last collection period (10 seconds). These metrics additionally contain a name dimension, which is always xvda1.
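CloudWatch matches a metric only when a query supplies its full dimension set, so reading a node metric means passing every dimension listed above, including the metric-specific one. The sketch below builds request arguments in the shape used by the CloudWatch GetMetricStatistics API. The function name `node_cpu_query` and the sample stack name, instance ID, and hostname are hypothetical, and the 60-second period is an assumption chosen because it is valid whatever resolution the metrics are stored at.

```python
from datetime import datetime, timedelta, timezone


def node_cpu_query(stack_name, instance_id, role, host,
                   metric="cpu_usage_idle", minutes=15):
    """Build GetMetricStatistics arguments for one node CPU metric."""
    dimensions = {
        "Category": "cpu",
        "InstanceId": instance_id,
        "PrestoNodeRole": role,          # "coordinator" or "worker"
        "PrestoStackName": stack_name,
        "Type": "Node",
        "host": host,
        "cpu": "cpu-total",              # extra dimension carried by CPU metrics
    }
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "Presto",
        "MetricName": metric,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,                    # assumption: aggregate to one-minute buckets
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    # All identifiers below are made-up sample values.
    query = node_cpu_query("my-presto", "i-0123456789abcdef0",
                           "worker", "ip-10-0-0-12.ec2.internal")
    print(query["MetricName"], len(query["Dimensions"]))
```

With an AWS SDK client available, a dictionary like this could be unpacked into the corresponding call, for example `get_metric_statistics(**query)` on a boto3 CloudWatch client.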
Presto Metrics#
All Presto metrics contain the following dimensions:
Category - one of the following: executor, memory
InstanceId - an instance ID designated by AWS
PrestoNodeRole - either coordinator or worker
PrestoStackName - the name of a CloudFormation stack
Type - this dimension is always Presto
host - the private, internal hostname of the instance
metric_type - one of the following: counter, timing
The following Presto metrics are collected:
RunningQueries – the number of currently running queries.
QueuedQueries – the number of currently queued queries.
HeapMemoryUsage_used, HeapMemoryUsage_committed, HeapMemoryUsage_max – JVM heap memory usage metrics. For more information see MemoryMXBean.
NonHeapMemoryUsage_used, NonHeapMemoryUsage_committed – JVM non-heap memory usage metrics. For more information see MemoryMXBean.
gc_young_CollectionCount, gc_young_CollectionTime – counter and timer for G1 young and mixed collections (see also GarbageCollectorMXBean).
gc_old_CollectionCount, gc_old_CollectionTime – counter and timer for G1 full collections (see also GarbageCollectorMXBean). Ideally, these metrics should always be 0.
Aggregated Metrics#
Apart from per-node metrics, there are also aggregated metrics with data from
all workers. These metrics contain only the Category, PrestoNodeRole,
PrestoStackName, and Type dimensions.
Dashboard#
When metrics are enabled, Starburst’s CloudFormation template creates a cluster
overview dashboard named Starburst-Dashboard-STACK_NAME. It contains
various charts that visualize the collected metrics for the coordinator
and workers, as well as the HA alarm state and useful Presto links.

Troubleshooting#
Metrics and the dashboard can be used to troubleshoot various performance
issues. As an example, Presto should never trigger a full G1 garbage collection
during normal operation. Therefore, the gc_old_CollectionCount and
gc_old_CollectionTime metrics should always be 0.
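The full-GC check can be sketched as a small filter over datapoints retrieved for gc_old_CollectionCount. The function name `full_gc_datapoints` is hypothetical. The input is assumed to follow the datapoint shape returned by the CloudWatch GetMetricStatistics API (a `Timestamp` plus one key per requested statistic), and the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def full_gc_datapoints(datapoints, stat="Maximum"):
    """Return datapoints whose chosen statistic is above zero, oldest first."""
    flagged = [p for p in datapoints if p.get(stat, 0) > 0]
    return sorted(flagged, key=lambda p: p["Timestamp"])


if __name__ == "__main__":
    # Made-up sample data in the assumed datapoint shape.
    sample = [
        {"Timestamp": datetime(2020, 1, 1, 12, 2, tzinfo=timezone.utc), "Maximum": 1.0},
        {"Timestamp": datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc), "Maximum": 0.0},
        {"Timestamp": datetime(2020, 1, 1, 12, 1, tzinfo=timezone.utc), "Maximum": 2.0},
    ]
    for point in full_gc_datapoints(sample):
        print(point["Timestamp"].isoformat(), point["Maximum"])
```

An empty result means no full collection was reported in the queried window. Any returned datapoint marks a moment worth correlating with the memory metrics for the same node.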
Metrics can also be used to investigate cluster bottlenecks. For instance, it is possible to verify whether the cluster fully utilizes its network, CPU, or disk capacity.
Pricing#
AWS CloudWatch metrics, dashboards, and API requests incur a fee (see Amazon CloudWatch Pricing). While the number of aggregated metrics is constant (below 30), the number of per-node metrics scales with the number of nodes. There are 21 metrics per node in total.
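To make the scaling concrete, the sketch below computes an upper bound on the number of metrics a cluster publishes, using only the two figures given above: 21 metrics per node and fewer than 30 aggregated metrics. The function and constant names are hypothetical, and no prices are assumed; the result would need to be combined with current Amazon CloudWatch pricing.

```python
PER_NODE_METRICS = 21                 # from the text above
AGGREGATED_METRICS_UPPER_BOUND = 30   # "below 30", so this is a ceiling


def max_metric_count(node_count):
    """Upper bound on published metrics for a cluster with node_count nodes."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return AGGREGATED_METRICS_UPPER_BOUND + PER_NODE_METRICS * node_count


if __name__ == "__main__":
    # One coordinator plus ten workers: 30 + 21 * 11
    print(max_metric_count(11))  # prints 261
```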