4.18. Integration with CloudWatch metrics
Starburst’s CloudFormation template provides optional integration with CloudWatch metrics.
Metrics integration can be enabled via the EnableCloudWatchMetrics template property.
When enabled, detailed OS and Presto metrics are collected and uploaded to the CloudWatch metrics service.
Additionally, a CloudWatch dashboard with a cluster overview is created.
Metrics are stored in the Presto CloudWatch metrics namespace.
The metrics collection interval is 10 seconds. Metrics are split into Node and Presto types.
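As a minimal sketch of enabling the integration programmatically, the helper below builds a parameter list in the ParameterKey/ParameterValue shape that the CloudFormation API accepts. The helper name stack_parameters is hypothetical, and the string values "true" and "false" are an assumption about what the EnableCloudWatchMetrics property accepts; check the template's parameter definition before relying on them.

```python
def stack_parameters(enable_metrics=True, extra=None):
    """Build a CloudFormation-style parameter list.

    Assumption: EnableCloudWatchMetrics takes the strings "true"/"false".
    `extra` is an optional dict of other template parameters.
    """
    params = {"EnableCloudWatchMetrics": "true" if enable_metrics else "false"}
    params.update(extra or {})
    return [{"ParameterKey": key, "ParameterValue": value}
            for key, value in params.items()]


if __name__ == "__main__":
    # The resulting list could be passed as Parameters=... to
    # boto3.client("cloudformation").create_stack(...).
    print(stack_parameters())
```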
Node metrics
All node metrics contain the following dimensions:
Category - one of the following: cpu, memory, net, diskio
InstanceId - an instance ID designated by AWS
PrestoNodeRole - either coordinator or worker
PrestoStackName - the name of the CloudFormation stack
Type - this dimension is always Node
host - the private, internal hostname of the instance
The following node metrics are available:
cpu_usage_idle, cpu_usage_iowait, cpu_usage_system, cpu_usage_softirq – node CPU usage, as a percentage. These metrics additionally contain a cpu dimension, which is always cpu-total.
mem_used, mem_cached, mem_free – node OS memory usage.
net_bytes_recv, net_bytes_sent – number of bytes received/sent within the last collection period (10 seconds). These metrics additionally contain an interface dimension, which is always eth0.
diskio_read_bytes, diskio_write_bytes – number of bytes read/written within the last collection period (10 seconds). These metrics additionally contain a name dimension, which is always xvda1.
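Because the network and disk metrics report bytes per collection period rather than a rate, a small conversion is often useful when reading them. The sketch below is illustrative only: the helper names are hypothetical, and the lookup table simply restates the extra dimension listed above for each metric family.

```python
# Extra dimension carried by each node metric family, as listed above.
EXTRA_DIMENSIONS = {
    "cpu": {"cpu": "cpu-total"},
    "mem": {},
    "net": {"interface": "eth0"},
    "diskio": {"name": "xvda1"},
}


def extra_dimensions_for(metric_name):
    """Return the additional dimension for a node metric such as 'net_bytes_recv'."""
    family = metric_name.split("_", 1)[0]  # 'net_bytes_recv' -> 'net'
    return dict(EXTRA_DIMENSIONS[family])


def bytes_per_second(byte_count, period_seconds=10):
    """Convert a per-period byte count into a rate.

    The default of 10 seconds is the collection interval; pass the query
    period instead when the value is a sum over a longer window.
    """
    return byte_count / period_seconds


if __name__ == "__main__":
    print(extra_dimensions_for("diskio_read_bytes"))
    print(bytes_per_second(5_000_000))
```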
Presto metrics
All Presto metrics contain the following dimensions:
Category - one of the following: executor, memory
InstanceId - an instance ID designated by AWS
PrestoNodeRole - either coordinator or worker
PrestoStackName - the name of the CloudFormation stack
Type - this dimension is always Presto
host - the private, internal hostname of the instance
metric_type - one of the following: counter, timing
The following Presto metrics are available:
RunningQueries – the number of currently running Presto queries.
HeapMemoryUsage_used, HeapMemoryUsage_committed, HeapMemoryUsage_max – JVM heap memory usage metrics. For more information see MemoryMXBean.
NonHeapMemoryUsage_used, NonHeapMemoryUsage_committed – JVM non-heap memory usage metrics. For more information see MemoryMXBean.
gc_young_CollectionCount, gc_young_CollectionTime – counter and timer for G1 young and mixed collections (see also GarbageCollectorMXBean).
gc_old_CollectionCount, gc_old_CollectionTime – counter and timer for G1 full collections (see also GarbageCollectorMXBean). Ideally, these metrics should always be 0.
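To read one of these metrics outside the dashboard, a query must name the Presto namespace and supply the dimensions listed above. The following sketch builds keyword arguments in the shape expected by the CloudWatch GetMetricStatistics API (for example boto3's get_metric_statistics). The function name and the example values are hypothetical; in particular, which Category and metric_type values belong to a given metric is not stated here, so they are left to the caller.

```python
from datetime import datetime, timedelta, timezone


def presto_metric_query(metric_name, stack_name, instance_id, role, host,
                        category, metric_type, minutes=15, now=None):
    """Build GetMetricStatistics arguments for one per-node Presto metric."""
    end = now or datetime.now(timezone.utc)
    dimensions = {
        "Category": category,        # executor or memory
        "InstanceId": instance_id,
        "PrestoNodeRole": role,      # coordinator or worker
        "PrestoStackName": stack_name,
        "Type": "Presto",            # always Presto for these metrics
        "host": host,
        "metric_type": metric_type,  # counter or timing
    }
    return {
        "Namespace": "Presto",
        "MetricName": metric_name,
        "Dimensions": [{"Name": n, "Value": v} for n, v in dimensions.items()],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,                # one-minute buckets; an arbitrary choice
        "Statistics": ["Maximum"],
    }


if __name__ == "__main__":
    # All argument values below are placeholders, not real resources.
    query = presto_metric_query("RunningQueries", "my-stack", "i-0123456789abcdef0",
                                "coordinator", "ip-10-0-0-5", "executor", "counter")
    # With boto3 installed and credentials configured:
    #   boto3.client("cloudwatch").get_metric_statistics(**query)
    print(query["MetricName"], len(query["Dimensions"]))
```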
Aggregated metrics
Apart from per-node metrics, there are also aggregated metrics with data from all
workers. These metrics contain only the Category, PrestoNodeRole, PrestoStackName, and Type
dimensions.
Dashboard
When metrics are enabled, Starburst’s CloudFormation template also creates a cluster overview dashboard named Starburst-Dashboard-STACK_NAME.
The dashboard contains charts that visualize the collected metrics for the coordinator
and workers, as well as the HA alarm state (see Presto Coordinator High Availability) and useful Presto links.

Troubleshooting
Metrics and the dashboard can be used to diagnose various Presto performance issues. For example, Presto should never trigger
a full G1 garbage collection during normal operation. Therefore, the gc_old_CollectionCount and gc_old_CollectionTime
metrics should always be 0.
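The full-GC rule above can be checked mechanically once metric values have been fetched. The sketch below assumes the values are already available as plain lists keyed by metric name; the function name and input shape are hypothetical and not part of the template.

```python
FULL_GC_METRICS = ("gc_old_CollectionCount", "gc_old_CollectionTime")


def full_gc_violations(samples):
    """Return {metric: peak value} for full-GC metrics that were ever nonzero.

    `samples` maps a metric name to a list of observed values.
    An empty result means no full G1 collection was observed.
    """
    violations = {}
    for metric in FULL_GC_METRICS:
        peak = max(samples.get(metric, []), default=0)
        if peak > 0:
            violations[metric] = peak
    return violations


if __name__ == "__main__":
    observed = {
        "gc_old_CollectionCount": [0, 0, 2],
        "gc_old_CollectionTime": [0, 0, 350],
    }
    print(full_gc_violations(observed))
```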
Metrics can also be used to investigate cluster bottlenecks. For instance, it is easy to verify whether the cluster fully utilizes its network, CPU, or disk capacity.
Pricing
AWS CloudWatch metrics, dashboards, and API requests come with a fee (see Amazon CloudWatch Pricing). While the number of aggregated metrics is constant (below 30), the number of per-node metrics scales with the number of nodes. There are 21 metrics per node in total.
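To get a rough sense of how the metric count grows with cluster size, the figures above can be combined as follows. This is only an estimate: it uses 30 as a conservative upper bound for the aggregated metrics, the function name is hypothetical, and it deliberately leaves out prices, which should be taken from Amazon CloudWatch Pricing.

```python
PER_NODE_METRICS = 21                # stated above
AGGREGATED_METRICS_UPPER_BOUND = 30  # "below 30", used as a conservative bound


def estimate_metric_count(node_count):
    """Upper-bound estimate of CloudWatch metrics for a cluster of `node_count` nodes."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return AGGREGATED_METRICS_UPPER_BOUND + PER_NODE_METRICS * node_count


if __name__ == "__main__":
    # One coordinator plus ten workers: 30 + 21 * 11
    print(estimate_metric_count(11))
```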