SEP configuration#

The starburst-presto Helm chart configures the SEP coordinator and worker nodes in the cluster with the values.yaml file detailed in the following sections.

A minimal values file adds the registry credentials, overrides any defaults with suitable values, and adds configuration for catalogs and other information as desired.
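
For example, a minimal values file might look like the following sketch. The credential values and the catalog definition are placeholders that you replace with your own settings, as detailed in the following sections:

registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net
  username: myusername
  password: mypassword

catalogs:
  tpch-testdata: |
    connector.name=tpch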

Docker image and registry#

The default configuration automatically contains the details for the relevant Docker image on the Starburst Harbor instance.

image:
  repository: "harbor.starburstdata.net/starburstdata/presto"
  tag: "346-e"
  pullPolicy: "IfNotPresent"

initImage:
  repository: "harbor.starburstdata.net/starburstdata/presto-init"
  tag: "346.0.0"
  pullPolicy: "IfNotPresent"

registryCredentials:
  enabled: false
  registry:
  username:
  password:

You need to add your credentials to access the registry as shown in the Docker requirements section.

image

The image section controls the Docker image to use, including the version. Typically, the Helm chart version reflects the SEP version. Specifically, the SEP version 344-e.x is reflected in the chart version 344.x.0, and patches for the chart are released as 344.x.1, 344.x.2, and similar.

initImage

The initImage section defines the container image and version to use for bootstrapping configuration files for SEP from the Helm chart and values files. This functionality is internal to the chart, and you should not change or override this configuration.

registryCredentials

The registryCredentials section defines authentication details for Docker registry access. Typically, you need to use your username and password for the Starburst Harbor instance.

You can add and override these configurations in your YAML file to tightly control the version of SEP in your cluster.

For example, you can choose to use a newer patch version 346-e.3 of SEP, which allows you to keep the rest of the chart configuration unchanged:

image:
  repository: "harbor.starburstdata.net/starburstdata/presto"
  tag: "346-e.3"
  pullPolicy: "IfNotPresent"

If you need to use an internal Docker registry, you can do the following:

  • pull the Docker image from the Starburst Harbor registry with your credentials

  • tag the image as desired for your internal registry

  • push the image to your registry

  • add the image section to your values YAML file:

image:
  repository: "docker.example.com/thirdparty"
  tag: "346-e"
  pullPolicy: "IfNotPresent"

registryCredentials:
  enabled: true
  registry: docker.example.com
  username: myusername
  password: mypassword

Internal communication#

Improve the security of internal communication between the coordinator and the workers by setting values for the following parameters:

environment:
sharedSecret:

environment

The environment name for the Presto cluster, set as node.environment in the node.properties file. Examples are short, meaningful strings such as production, staging, batch_cluster, or site_analytics.

sharedSecret

Set the shared secret value for secure communication between coordinator and workers in the cluster to a long, random string. If not set, the node.environment value from the node.properties file is used.

environment: test_cluster
sharedSecret: u70Qhhw9PsZmEgIo7Zqg3kIj3AJZ5/Mnyy5iyjcsgnceM+SSV+APSTis...

Exposing the cluster to an outside network#

You must expose the cluster to allow users to connect to the Presto coordinator with tools such as the CLI, applications using the JDBC/ODBC drivers, and any other client application. This service type configuration is defined in the expose section.

You can choose from four different mechanisms by setting the type value to one of the following common k8s configurations:

  • nodePort

  • clusterIp

  • loadBalancer

  • ingress

Depending on your choice, you only have to configure the identically named section.

The default is clusterIp. This only exposes the coordinator on a k8s cluster internal IP.

expose:
  type: "clusterIp"
  clusterIp:
    name: "presto"
    ports:
      http:
        port: 8080

Using nodePort exposes the internal port of the coordinator outside the cluster on the port number set in nodePort.

expose:
  type: "nodePort"
  nodePort:
    name: "presto"
    ports:
      http:
        port: 8080
        nodePort: 30080

You can use loadBalancer to configure a load balancer, which can be created and configured automatically by your k8s platform.

expose:
  type: "loadBalancer"
  loadBalancer:
    name: "presto"
    IP: ""
    ports:
      http:
        port: 8080
    annotations: {}
    sourceRanges: []

Ingress usage is very powerful and may provide load balancing, SSL termination, and name-based virtual hosting. This allows a load balancer to route to multiple apps in the cluster. For example, the SEP coordinator and the Ranger server can run in the same cluster and be exposed via the ingress configuration.

expose:
  type: "ingress"
  ingress:
    serviceName: "presto"
    servicePort: 8080
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}

Example for ingress with nginx and cert-manager#

nginx is a powerful HTTP and proxy server, commonly used as a load balancer. You can combine it with cert-manager backed by Let’s Encrypt.

As a first step you need to deploy an HTTPS ingress controller for your cluster. You can follow a tutorial from the cert-manager documentation.

With the setup done, and an A record in your DNS zone ready, you can expose the Presto Web UI:

expose:
  type: "ingress"
  ingress:
    serviceName: "presto"
    servicePort: 8080
    tls:
      enabled: true
      secretName: "tls-secret-presto"
    host: ""
    path: "/(.*)"
    annotations:
      kubernetes.io/ingress.class: "nginx"
      cert-manager.io/issuer: "letsencrypt-staging"

The secretName is used by cert-manager to store the generated certificate, and can be any value.

The annotations section uses the nginx default value for a single ingress controller installation. It assumes that a certificate issuer named letsencrypt-staging exists and is used.
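
The following is a minimal sketch of such an issuer, assuming the cert-manager.io/v1 API, the Let’s Encrypt staging ACME endpoint, and a placeholder email address:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Let's Encrypt staging endpoint; switch to the production endpoint when ready
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: nginx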

The Ranger user interface can be exposed in exactly the same way:

expose:
  type: "ingress"
  ingress:
    tls:
      enabled: true
      secretName: "tls-secret-ranger"
    host: ""
    path: "/(.*)"
    annotations:
      kubernetes.io/ingress.class: "nginx"
      cert-manager.io/issuer: "letsencrypt-staging"

Coordinator#

The coordinator section configures the pod of the cluster that runs the Presto coordinator. The default values are suitable to get started with reasonable defaults on a production-sized k8s cluster.

coordinator:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+UseGCOverheadLimit
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=512M
      -Djdk.nio.maxCachedBufferSize=2000000
      -Djdk.attach.allowAttachSelf=true
    properties:
      config.properties: |
        coordinator=true
        node-scheduler.include-coordinator=false
        http-server.http.port=8080
        discovery-server.enabled=true
        discovery.uri=http://localhost:8080
      node.properties: |
        node.environment={{ include "presto.environment" . }}
        node.data-dir=/data/presto
        plugin.dir=/usr/lib/presto/plugin
        node.server-log-file=/var/log/presto/server.log
        node.launcher-log-file=/var/log/presto/launcher.log
      log.properties: |
        # Enable verbose logging from Presto
        #io.prestosql=DEBUG
      password-authenticator.properties: |
        password-authenticator.name=file
        file.password-file=/usr/lib/presto/etc/auth/password.db
      access-control.properties:
    other: {}
  resources:
    requests:
      memory: "60Gi"
      cpu: 16
    limits:
      memory: "60Gi"
      cpu: 16
  nodeMemoryHeadroom: "2Gi"
  heapSizePercentage: 90
  heapHeadroomPercentage: 30
  additionalProperties: ""

  envFrom: []
  nodeSelector: {}
  affinity: {}
  tolerations: []
  priorityClassName:

coordinator.etcFiles.jvm.config

Defines the content of the JVM Config for the coordinator. You can change it by adding an updated configuration in your values YAML file.

For example, if you want to update memory settings for the G1 garbage collector, the reserved code cache size, and the max cache buffer size to adapt to lower memory settings, you add the full configuration with the updated values.

coordinator:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=16M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+UseGCOverheadLimit
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=256M
      -Djdk.nio.maxCachedBufferSize=3000000
      -Djdk.attach.allowAttachSelf=true

coordinator.etcFiles.properties

Defines configuration files located in the etc folder. Each nested section defines the filename. Defaults are provided for the main configuration files etc/jvm.config, etc/config.properties, etc/node.properties and etc/log.properties.

Modifications to default configuration can be performed in two steps. First, add the default configuration values to your values YAML file. Then, update and add to the configuration as desired.

Additional properties files can be added by adding a nested section with the desired content of the configuration file.
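
For example, the following sketch adds a resource-groups.properties file to the coordinator; the property values and the path of the referenced JSON file are assumptions for illustration only:

coordinator:
  etcFiles:
    properties:
      resource-groups.properties: |
        resource-groups.configuration-manager=file
        resource-groups.config-file=/usr/lib/presto/etc/resource-groups.json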

coordinator.etcFiles.properties.config.properties

Defines the content of the default configuration file for the coordinator. You can also use additionalProperties for adding configuration values.

coordinator.etcFiles.properties.node.properties

coordinator.etcFiles.properties.log.properties

coordinator.etcFiles.other

Other files that need to be placed in the etc directory.

coordinator:
  etcFiles:
    other:
      resource-groups.json: |
        {
          <<json_here>>
        }
      kafka/tpch.customer.json: |
        {
          <<json_here>>
        }

coordinator.resources

The CPU and memory resources to use for the coordinator pod. Request and limit values should be identical.

These settings can be adjusted to match your workload and available node sizes in the cluster:

coordinator:
  resources:
    requests:
      memory: "256Gi"
      cpu: 32
    limits:
      memory: "256Gi"
      cpu: 32
  nodeMemoryHeadroom: "4Gi"

coordinator.nodeMemoryHeadroom

The size of the container memory headroom. The value needs to be less than the memory resource limit defined in resources.

coordinator.heapSizePercentage

Percentage of the container memory, after subtracting the headroom, that is assigned to the Java heap. Must be less than 100.

coordinator.heapHeadroomPercentage

Percentage of the Java heap set aside as headroom for memory that is not tracked by Presto during query execution. Must be less than 100.

coordinator.additionalProperties

Additional properties to append to the default configuration file. Default values are overridden. An example usage is setting query timeout values:

additionalProperties: |
  query.client.timeout=5m
  query.min-expire-age=30m

coordinator.envFrom

Allows propagation of environment variables from different sources, complying with the k8s schema:

envFrom:
  - secretRef:
      name: <<secret_name>>

This can be used to deliver values to a Presto configuration properties file by creating a Kubernetes secret holding the variable values. For example, to use secrets for sensitive credential information in a catalog properties file:

1. Create a secret holding the variables. You can statically create secrets using base64-encoded values of your configuration. Make sure your secret key, which is used to define the environment variable name, follows the regex pattern [a-zA-Z][a-zA-Z0-9_]* - only alphanumeric characters and underscores are allowed. The convention is to use all caps and underscores, such as PSQL_USERNAME.

echo -n user | base64
echo -n pass | base64

apiVersion: v1
kind: Secret
metadata:
  name: variables-secret
type: Opaque
data:
  PSQL_USERNAME: <base64_encoded_user>
  PSQL_PASSWORD: <base64_encoded_pass>

2. Add the secret reference in envFrom for both the coordinator and worker to make it accessible on all nodes:

envFrom:
  - secretRef:
      name: variables-secret

3. Reference the variables in properties files using the built-in placeholder pattern supported by the secrets support of Presto.

catalogs:
  postgresql: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgresql:5432/postgres
    connection-password=${ENV:PSQL_PASSWORD}
    connection-user=${ENV:PSQL_USERNAME}

coordinator.nodeSelector, coordinator.affinity and coordinator.tolerations

Configuration to control on which nodes the coordinator pod is scheduled.
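
For example, the following sketch pins the coordinator to dedicated nodes. The label and taint names are hypothetical and follow the standard k8s schema:

coordinator:
  nodeSelector:
    starburst/role: coordinator
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "coordinator"
      effect: "NoSchedule"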

coordinator.priorityClassName

Priority class for coordinator pod for setting k8s pod priority and preemption.

Workers#

The worker section configures the pods of the cluster that run the Presto workers. The default values in the following snippet have to be adjusted to suit your workload and cluster:

worker:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+UseGCOverheadLimit
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=512M
      -Djdk.nio.maxCachedBufferSize=2000000
      -Djdk.attach.allowAttachSelf=true
    properties:
      config.properties: |
        coordinator=false
        http-server.http.port=8080
        discovery.uri=http://{{ include "presto.service.name" . }}:8080
      node.properties: |
        node.environment={{ include "presto.environment" . }}
        node.data-dir=/data/presto
        plugin.dir=/usr/lib/presto/plugin
        node.server-log-file=/var/log/presto/server.log
        node.launcher-log-file=/var/log/presto/launcher.log
      log.properties: |
        # Enable verbose logging from Presto
        #io.prestosql=DEBUG
    other: {}
  count: 2
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
  deploymentTerminationGracePeriodSeconds: 300 # 5 minutes
  prestoWorkerShutdownGracePeriodSeconds: 120 # 2 minutes
  resources:
    requests:
      memory: "100Gi"
      cpu: 16
    limits:
      memory: "100Gi"
      cpu: 16
  nodeMemoryHeadroom: "2Gi"
  heapSizePercentage: 90
  heapHeadroomPercentage: 30
  additionalProperties: ""
  envFrom: []
  nodeSelector: {}
  affinity: {}
  tolerations: []
  priorityClassName:

worker.count

The number of worker pods for a static cluster.

worker.autoscaling

Configuration for the minimum and maximum number of workers. Ensure the additional requirements for scaling are fulfilled on your k8s cluster. Set enabled to true to activate scaling. The targetCPUUtilizationPercentage sets the threshold value that triggers scaling up by adding more workers until maxReplicas is reached.

Scaling down proceeds until minReplicas is reached and is controlled by deploymentTerminationGracePeriodSeconds and prestoWorkerShutdownGracePeriodSeconds.
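
For example, the following sketch enables autoscaling with assumed replica bounds and CPU threshold:

worker:
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 75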

Warning

The autoscaling feature does not yet work with OpenShift clusters (as of the latest release, 4.6), due to a known limitation: HPA does not work with pods that have init containers in OpenShift.

Read more information in our scaling section.

worker.deploymentTerminationGracePeriodSeconds

Specifies the termination grace period for workers. Workers are not terminated until queries running on the pod are finished and the grace period passes.

worker.prestoWorkerShutdownGracePeriodSeconds

Sets shutdown.grace-period to configure the grace period for worker process shutdown.
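
For example, to give long-running queries more time to complete during scale-down, you might raise both values. The numbers below are arbitrary illustrations:

worker:
  deploymentTerminationGracePeriodSeconds: 600 # 10 minutes
  prestoWorkerShutdownGracePeriodSeconds: 300 # 5 minutes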

The following configuration properties for workers are identical to the coordinator properties, documented in the preceding section.

  • worker.etcFiles.*

  • worker.resources

  • worker.nodeMemoryHeadroom

  • worker.heapSizePercentage

  • worker.heapHeadroomPercentage

  • worker.additionalProperties

  • worker.envFrom

  • worker.nodeSelector

  • worker.affinity

  • worker.tolerations

  • worker.priorityClassName

Common settings for coordinator and worker nodes#

You can create a startup shell script to customize how Presto is started on the coordinator and workers, and pass additional arguments to it.

userDatabase:
  name: password.db
  users:
    - username: admin
      password: 46991b33f7a75ff79213c0dc0e610610
initFile:
extraArguments:
extraSecret:
  name:
  file:

initFile

A shell script to run before Presto is launched. The content of the file has to be an inline string in the YAML file. The script is started as /bin/bash <<init_file>>. When called, it is passed the single parameter value coordinator or worker depending on the type of pod. The script needs to invoke the Presto script /usr/lib/presto/bin/run-presto for a successful start of Presto.

Note

In the initFile, for chart versions released before the middle of October 2020, use the launcher script exec /usr/lib/presto/bin/launcher run. For newer releases, use exec /usr/lib/presto/bin/run-presto to enable graceful shutdown of workers.

extraArguments

List of extra arguments to be passed to the initFile script.

The following example shows how you can use initFile to run a custom init script on the coordinator and workers:

initFile: |
  #!/bin/bash
  echo "Custom init for $1 $2"
  exec /usr/lib/presto/bin/run-presto
extraArguments:
  - TEST_ARG

Output on the coordinator:

Custom init for coordinator TEST_ARG
<<presto_logs>>

Output on a worker:

Custom init for worker TEST_ARG
<<presto_logs>>

Security context#

You can configure a security context to define privilege and access control settings for the Presto pods.

securityContext:
  <<security settings for the pod>>
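
A minimal sketch, assuming you want the Presto containers to run with a specific non-root user and group; the fields follow the standard k8s pod security context schema and the IDs are placeholders:

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000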

External secret reference#

There are several locations where properties require pointing to files delivered outside of Presto, for example CA certificates. In such cases, you can use a special notation that allows you to point to a k8s secret.

For example, you can configure password authentication using LDAP. This requires the following configuration file etc/password-authenticator.properties, which points to the ca.crt certificate file.

ldap.ssl-trust-certificate=etc/ca.crt

Your first step is to create a k8s secret holding the file:

kubectl create secret generic ldap-ca --from-file=ca.crt

You can configure the secret reference usage for the above configuration as:

coordinator:
  etcFiles:
    properties:
      password-authenticator.properties: |
        ldap.url=ldaps://ldap-server:636
        ldap.user-bind-pattern=uid=${USER},OU=America,DC=corp,DC=example,DC=com
        ldap.ssl-trust-certificate=secretRef:ldap-ca:ca.crt

This mounts the secret named ldap-ca in the path /mnt/secretRef/ldap-ca and replaces secretRef:ldap-ca occurrences with the absolute path.

ldap.ssl-trust-certificate=/mnt/secretRef/ldap-ca/ca.crt

Note

Specific secret values, such as passwords, can be passed into properties files using the envFrom parameters available for coordinator and worker.

Defining external secrets#

You can automatically mount external secrets, for example from the AWS Secrets Manager, using the secretRef or secretEnv notation.

externalSecrets:
  enabled: true # disabled by default
  type: goDaddy
  secretPrefix: <<secret_name_prefix>>
  goDaddy:
    backendType: <<string>>

externalSecrets.type

Type of the external secret provider. Currently, only goDaddy is supported.

Note

The external secret provider needs to be configured with proper access to the specified backendType in order to successfully pull in the external secrets.

externalSecrets.secretPrefix

Prefix of all secrets that need to be mapped to external secrets.

externalSecrets.goDaddy.backendType

The type of the used GoDaddy backend, for example secretsManager or systemManager.

The Helm chart scans for all secretRef or secretEnv references in the values.yaml file which start with the configured secretPrefix string. For each secret found, it generates an ExternalSecret k8s manifest.

Note

The selected external secrets provider needs to be deployed and configured separately. The secret names in the external storage must match the names of the k8s secrets you reference. When using secretEnv, the external storage secret must contain only a single value. For each external secret, a single k8s secret is created, including one key with the external secret value.

An example of this configuration:

  1. Create an AWS Secrets Manager secret:

aws secretsmanager create-secret --name external0presto0http0server0port --secret-string 8888

  2. Reference it from your configuration section in config.properties:

coordinator:
  etcFiles:
    properties:
      config.properties: |
        http-server.http.port=secretEnv:external0presto0http0server0port

  3. Configure the external secrets:

externalSecrets:
  enabled: true
  type: goDaddy
  secretPrefix: external0
  goDaddy:
    backendType: secretsManager

This creates the following external secret manifest:

apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: external0presto0http0server0port
spec:
  backendType: secretsManager
  data:
    - key: external0presto0http0server0port
      name: external0presto0http0server0port

Additionally, the external secrets provider fetches secrets from AWS and creates a k8s secret:

apiVersion: v1
kind: Secret
metadata:
  name: external0presto0http0server0port
type: Opaque
data:
  external0presto0http0server0port: 8888

The k8s secret is now bound to the container as the external0presto0http0server0port environment variable. Presto config.properties is resolved to:

http-server.http.port=${ENV:external0presto0http0server0port}

If you have a secret with multiple values, such as a JSON-formatted secret, you can reference the secret values independently.

For example, you may have a secret named external0presto0creds0mysql that is structured like this in the AWS Secrets Manager:

{
  "MYSQL_USER": "user",
  "MYSQL_PASSWORD": "password"
}

The MYSQL_USER and MYSQL_PASSWORD keys can be referenced in the values.yaml file:

externalSecrets:
  enabled: true
  type: goDaddy
  secretPrefix: external0
  goDaddy:
    backendType: secretsManager

catalogs:
  mysqldb: |-
    connector.name=mysql
    connection-url=jdbc:mysql://<<dns>>:3306
    connection-user=secretEnv:external0presto0creds0mysql:MYSQL_USER
    connection-password=secretEnv:external0presto0creds0mysql:MYSQL_PASSWORD

File-based authentication#

The command htpasswd can generate a user database, which can be used to configure file-based user authentication. The configured user database is created as the file /usr/lib/presto/etc/auth/{{.Values.userDatabase.name}}. This allows you to statically deliver user credentials for use in etc/password-authenticator.properties:

userDatabase:
  name: password.db
  users:
    - username: admin
      password: 46991b33f7a75ff79213c0dc0e610610

Add the following setup to your values YAML file to configure file-based authentication:

coordinator:
  etcFiles:
    properties:
      password-authenticator.properties: |
        password-authenticator.name=file
        file.password-file=/usr/lib/presto/etc/auth/password.db

Query memory usage control#

The query section allows you to configure some query processing properties, which are inserted in the configuration properties file.

query:
  maxConcurrentQueries: 3

query.maxMemoryForCluster

Maps to the query.max-memory configuration property.

query.heapHeadroomPercentage

Heap memory percentage not tracked by Presto during query execution. Used for Java objects.

query.maxConcurrentQueries

Maximum number of queries executed in parallel on a single node.

Spilling#

You can configure disk spilling with the spilling section. It uses internal node storage, which is mounted within the container. Spilling is disabled by default, and we recommend leaving it disabled. Enabling spill allows rare, memory-intensive queries to succeed on a smaller cluster, at the expense of query performance and overall cluster performance.

spilling:
  enabled: false
  volume:
    emptyDir: {}
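
If you do enable spilling, you can point the volume at fast local storage. The hostPath below is a placeholder:

spilling:
  enabled: true
  volume:
    hostPath:
      path: /mnt/fast-local-disk/presto-spill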

Hive connector storage caching#

The cache section allows you to configure Hive connector storage caching.

cache:
  enabled: false
  diskUsagePercentage: 80
  ttl: "7d"
  volume:
    emptyDir: {}

cache.enabled

Enable or disable caching for all catalogs using the Hive connector. If you want to enable it only for a specific catalog, you have to configure it with the catalog configuration and additionalVolumes, as shown in the sketch below.
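
A sketch of such a per-catalog setup follows; the hive.cache.enabled and hive.cache.location property names, the catalog definition, and the cache path are assumptions to verify against the Hive connector documentation:

catalogs:
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive:9083
    hive.cache.enabled=true
    hive.cache.location=/opt/hive-cache

additionalVolumes:
  - path: /opt/hive-cache
    volume:
      emptyDir: {}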

cache.diskUsagePercentage

Set the value for the hive.cache.disk-usage-percentage property.

cache.ttl

Set the value for the hive.cache.ttl property.

cache.volume

Configure the volume in which to store the cached files.
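
For example, the following sketch enables caching for all Hive catalogs with an emptyDir volume; the percentage, TTL, and size limit are assumed values:

cache:
  enabled: true
  diskUsagePercentage: 75
  ttl: "7d"
  volume:
    emptyDir:
      sizeLimit: "200Gi"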

Catalogs#

The catalogs section allows you to configure catalog properties files that configure access to the data sources. Information for specific properties supported in each catalog can be found with the documentation for the connectors.

By default, a catalog tpch is configured using the TPCH connector. This allows for some simple testing with a default deployment:

catalogs:
  tpch: |-
    connector.name=tpch
    tpch.splits-per-node=4

Each catalog consists of a key and a value. The key defines the name of the catalog, and the value defines the content of the resulting properties file. The best approach is to use YAML multi-line syntax to configure the content, indented below the key.

For example, the following section adds the tpcds-testdata catalog. It uses the TPCDS Connector and only specifies the connector name.

catalogs:
  tpcds-testdata: |
    connector.name=tpcds

Multiple catalogs are configured one after the other:

catalogs:
  tpch-testdata: |
    connector.name=tpch
  tpcds-testdata: |
    connector.name=tpcds
  tmpmemory: |
    connector.name=memory
  metrics: |
    connector.name=jmx
  devnull: |
    connector.name=blackhole
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive:9083
  s3: |
    connector.name=hive-hadoop2
    hive.metastore=glue

Each catalog properties file can use the configuration options supported by the configured connector.

Additional volumes#

Additional volumes can be necessary for persisting files, for Hive object storage caching, and for a number of other use cases. These can be defined in the additionalVolumes section. None are defined by default:

additionalVolumes: []

You can add one or more volumes supported by k8s, to all nodes in the cluster.

If you specify only path, a directory named in path is created. When mounting a ConfigMap or Secret, files are created in this directory for each key.

This also supports an optional subPath parameter, which takes an optional key in the ConfigMap or Secret volume you create. If you specify subPath, the specific key named by subPath from the ConfigMap or Secret is mounted as a file with the name provided by path.

additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /var/lib/presto/cache1
    volume:
      hostPath:
        path: /media/nv1/presto-cache
  - path: /var/lib/presto/cache2
    volume:
      hostPath:
        path: /media/nv2/presto-cache

Usage Metrics#

The usageMetrics section configures usage metrics collection and uses the following default configuration:

usageMetrics:
  enabled: true
  usageClient:
    initialDelay: "1m"
    interval: "1m"

Metrics gathering is enabled by default, which results in an additional sidecar container named usage-client running alongside the coordinator. The container gathers and exposes the metrics over the lifetime of the coordinator.

You can view usage metrics data using kubectl:

$ kubectl logs -n <namespace> <pod_name> usage-client
{"startTime":1598353893065,"time":"2020-08-25T11:13:00.753Z","cumulativeCpuTime":"0.00s",...}
{"startTime":1598353893065,"time":"2020-08-25T11:14:00.707Z","cumulativeCpuTime":"9.61s",...}

Restarts of the coordinator pod reset the metrics. Metrics are not persisted by default, but can be written to a configured persistent volume. The path to the log file has to be configured as an additional config.properties property on the coordinator by appending it to additionalProperties:

coordinator:
  additionalProperties: |
    usage-metrics.log.path=/mnt/logging-pv

The persistent volume has to be configured as an additional volume.
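
For example, you might mount a persistent volume claim at the configured path; the claim name is a placeholder:

additionalVolumes:
  - path: /mnt/logging-pv
    volume:
      persistentVolumeClaim:
        claimName: usage-metrics-pvc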

Prometheus#

The prometheus section configures the export of cluster metrics to Prometheus, and uses the following default configuration:

prometheus:
  enabled: true
  agent:
    version: "0.13.0"
    port: 8081
    config: "/usr/lib/presto/etc/telemetry/prometheus.yaml"
  rules:
    - pattern: presto.execution<name=QueryManager><>(running_queries|queued_queries)
      name: $1
      attrNameSnakeCase: true
      type: GAUGE
    - pattern: 'presto.execution<name=QueryManager><>FailedQueries\.TotalCount'
      name: 'failed_queries'
      type: COUNTER

License#

Starburst provides customers a license file to unlock additional features of SEP. The license file needs to be provided to SEP in the cluster:

  1. Rename the file you received to starburstdata.license.

  2. Create a k8s secret that contains the license file, with a name of your choice, in the cluster.

    kubectl create secret generic mylicense --from-file=starburstdata.license
    
  3. Configure the secret name as the Starburst platform license.

    starburstPlatformLicense: mylicense
    

Adding files#

Various use cases around security and event listeners need additional config files as properties or XML files. You can add any file to a pod using config maps.

Examples:

  • LDAP authentication file

  • Hive site xml file

Alternatively, you can also use additionalVolumes to mount the files and copy them to the appropriate location using the path and subPath parameters.

For instance, if you want to copy a file to an existing location such as /usr/lib/presto/plugin, you can mount the file to a Kubernetes volume, such as a configMap, and add the file as a subPath to the path.

additionalVolumes:
  - path: /usr/lib/presto/plugin/x.jar
    subPath: x.jar
    volume:
      configMap:
        name: "configmap-in-volume"

In this case, the key named x.jar from the ConfigMap is mounted as that file in the location provided in path.
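
You can create such a ConfigMap from a local file. This sketch assumes the x.jar file is available in the current directory:

kubectl create configmap configmap-in-volume --from-file=x.jar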

Used directories#

The location of specific directories in the SEP container is important if you configure additional files or otherwise want to tweak the container.

SEP is configured to use the following directories on the container:

  • /usr/lib/presto: top level folder for the Presto binaries

  • /usr/lib/presto/plugin: all plugins for Presto including connectors

  • /usr/lib/presto/etc: etc folder for Presto configuration files such as config.properties and others

  • /usr/lib/presto/bin: location of the run-presto script, which is invoked with the container start

  • /data/presto/var/run: contains launcher.pid file used by the launcher script

  • /data/presto/var/log: contains secondary, rotated log files, main log is redirected to stdout as recommended on containers