7.12. Event Logger#

The event logger allows query completion events and all related information to be logged into a PostgreSQL database.

Note

Event logger requires a valid Starburst Enterprise Presto license.

Requirements#

  • PostgreSQL database, version 9.6 or higher

  • Network connectivity between the coordinator and the PostgreSQL server

  • PostgreSQL JDBC driver JAR copied into plugin/eventlistener on all nodes

Connectivity between the Presto and the PostgreSQL database can be verified on the coordinator or worker.

You can use ping to verify the hostname:

$ ping postgresql.example.com

You can use telnet with the hostname and port of the database to confirm PostgreSQL is listening on the configured port.

$ telent postgresql.example.com:5432

Installation and configuration#

The event logger is implemented as a event listener implementation and can be enabled by creating a configuration file called etc/event-listener.properties on the coordinator and workers. The name needs to be set to event-logger. The configuration includes the JDBC connection URL with hostname, port and database name as well as username and password for accessing the PostgreSQL database.

event-listener.name=event-logger
jdbc.url=jdbc:postgresql://postgresql.example.com:5432/public
jdbc.user=test_psql
jdbc.password=test12

The user needs to have sufficient rights to create tables and insert data in the configured schema.

The cluster must be restarted after creating configuration file. At first usage the table completed_queries is automatically created in the PostgreSQL database.

Once the table is created, logging is activated and running.

Logged information#

For each completed query, a row is created in the completed_queries table. It captures everything that Presto emits – query, user, metadata, stats, performance related attributes, resource consumption, start time, end time, and much more.

The following columns are populated with the relevant information:

  • query_id

  • catalog

  • schema

  • principal

  • user_agent

  • client_info

  • source

  • environment

  • remote_client_address

  • server_version

  • usr

  • query_state

  • query

  • query_plan

  • total_rows

  • total_bytes

  • output_rows

  • output_bytes

  • written_rows

  • written_bytes

  • cpu_time_ms

  • wall_time_ms

  • queued_time_ms

  • peak_user_memory_bytes

  • peak_total_non_revocable_memory_bytes

  • peak_task_user_memory

  • peak_task_total_memory

  • physical_input_bytes

  • physical_input_rows

  • internal_network_bytes

  • internal_network_rows

  • cumulative_memory

  • completed_splits

  • plan_node_stats_and_costs

  • stage_gc_statistics

  • cpu_time_distribution

  • operator_summaries

  • resource_waiting_time

  • analysis_time

  • execution_time

  • create_time

  • end_time

  • accessed_metadata

Analyzing the event log#

To analyze and query the event log, you can create a catalog with the PostgreSQL connector using the same JDBC connection parameters. This allows you to use the Presto CLI, or any other application connected to the catalog to create queries, dashboards, perform ad hoc analysis and more.

Example use cases:

  • measure query performance numbers and trends

  • understand impact of different cluster configurations

  • enable cluster workload management and resource consumption

Example#

Generate event logger data by running a query.

presto> SELECT nationkey, name, regionkey FROM tpch.sf1.nation LIMIT 5;
nationkey |   name    | regionkey
-----------+-----------+-----------
        0 | ALGERIA   |         0
        1 | ARGENTINA |         1
        2 | BRAZIL    |         1
        3 | CANADA    |         1
        4 | EGYPT     |         4
(5 rows)

Query 20200703_035704_00001_9n2pi, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
1.80 [25 rows, 0B] [13 rows/s, 0B/s]

Event logger data is written to the completed_queries table in the PostgreSQL database.

In the following example, the catalog file is named postgresql.properties and the public schema contains the completed_queries table. The query_id from the execution is used to locate the correct record. The output in the following example has been trimmed.

presto> SELECT *
        FROM postgresql.public.completed_queries
        WHERE query_id = '20200703_035704_00001_9n2pi';
-[ RECORD 1 ]-------------------------+---------------------------------------
query_id                              | 20200703_035704_00001_9n2pi
catalog                               | NULL
schema                                | NULL
principal                             | NULL
user_agent                            | StatementClientV1/334
client_info                           | NULL
source                                | presto-cli
environment                           | demo
remote_client_address                 | 127.0.0.1
server_version                        | 334-e
usr                                   | demo
query_state                           | FINISHED
query                                 | SELECT nationkey, name, regionkey FROM tpch.sf1.nation LIMIT 5
query_plan                            | Fragment 0 [SINGLE]
                                      |     CPU: 24.50ms, Scheduled: 117.15ms, Input: 5 rows (148B); per task: avg.: 5.00 std.dev.: 0.00, Output: 5 rows
                                      |     Output layout: [nationkey, name, regionkey]
                                      | ...
total_rows                            | 25
total_bytes                           | 0
output_rows                           | 5
output_bytes                          | 148
written_rows                          | 0
written_bytes                         | 0
cpu_time_ms                           | 1615
wall_time_ms                          | 2169
queued_time_ms                        | 10
peak_user_memory_bytes                | 0
peak_total_non_revocable_memory_bytes | 0
peak_task_user_memory                 | 0
peak_task_total_memory                | 0
physical_input_bytes                  | 0
physical_input_rows                   | 25
internal_network_bytes                | 202
internal_network_rows                 | 5
cumulative_memory                     | 0.0
completed_splits                      | 33
plan_node_stats_and_costs             | {"stats":{},"costs":{}}
stage_gc_statistics                   | [{"stageId":0,"tasks":1,"fullGcTasks":0,...
cpu_time_distribution                 | [{"stageId":0,"tasks":1,"p25":25,"p50":25,...
operator_summaries                    | [{"stageId":0,"pipelineId":0,"operatorId":0,..
resource_waiting_time                 | 165
analysis_time                         | 168
execution_time                        | 1993
create_time                           | 2020-07-03 03:59:54.543 UTC
end_time                              | 2020-07-03 03:59:56.712 UTC
accessed_metadata                     | [{"catalogName":"tpch","schema":"sf1.0","table":"nation","columns":["nationkey","regionkey","name"]}]