Presto Kubernetes Resource#

A Presto Kubernetes resource represents a specific Presto cluster. The YAML file allows you to specify various properties related to the coordinator and workers, the catalogs and the cluster overall.

The following snippet shows all available properties with defaults. When defining your particular Presto cluster resource, you need to specify only the properties with non-default values.

apiVersion: starburstdata.com/v1
kind: Presto
metadata:
  name: presto-cluster-name
spec:
  nameOverride: ""
  clusterDomain: cluster.local
  environment: ""
  additionalJvmConfigProperties: ""
  additionalCatalogs: {}
  additionalEtcPrestoTextFiles: {}
  additionalEtcPrestoBinaryFiles: {}
  licenseSecretName: ""
  imageNamePrefix: ""
  additionalBootstrapScriptVolume: {}
  additionalBootstrapScriptVolumes: []
  additionalVolumes: []

  prometheus:
    enabled: false
    additionalRules: {}

  service:
    type: ClusterIP
    name: ""
    additionalSpecProperties: {}
    nodePort: 31234

  image:
    name: starburstdata/presto:340-e-k8s-0.34
    pullPolicy: Always

  memory:
    nodeMemoryHeadroom: 2Gi
    xmxToTotalMemoryRatio: 0.9
    heapHeadroomPerNodeRatio: 0.3
    queryMaxMemory: 1Pi
    queryMaxTotalMemoryPerNodePoolFraction: 0.333

  coordinator:
    cpuLimit: ""
    cpuRequest: 16
    memoryAllocation: 60Gi
    nodeSelector: {}
    affinity: {}
    additionalProperties: ""
    additionalAnnotations: {}

  worker:
    count: 2
    autoscaling:
      enabled: false
      minReplicas: 1
      maxReplicas: 100
      targetCPUUtilizationPercentage: 80
    deploymentTerminationGracePeriodSeconds: 7200 # 2 hours
    prestoWorkerShutdownGracePeriodSeconds: 120
    cpuLimit: ""
    cpuRequest: 16
    memoryAllocation: 100Gi
    nodeSelector: {}
    affinity: {}
    additionalProperties: ""
    additionalAnnotations: {}

  readinessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 15

  livenessProbe:
    initialDelaySeconds: 300
    periodSeconds: 300
    failureThreshold: 1
    timeoutSeconds: 15

  spilling:
    enabled: false
    volume:
      emptyDir: {}

  usageMetrics:
    enabled: true
    usageClient:
      initialDelay: 1m
      interval: 1m

  hive:
    metastoreUri: ""
    awsSecretName: ""
    googleServiceAccountKeySecretName: ""
    azureWasbSecretName: ""
    azureAbfsSecretName: ""
    azureAdlSecretName: ""
    additionalProperties: ""
    internalMetastore:
      mySql:
        jdbcUrl: ""
        username: ""
        password: ""
      postgreSql:
        jdbcUrl: ""
        username: ""
        password: ""
      internalPostgreSql:
        enabled: false
        image:
          name: postgres:9.6.10
          pullPolicy: IfNotPresent
        storage:
          className: ""
          size: 10Gi
          claimSelector: {}
        memory: 2Gi
        cpu: 2
        nodeSelector: {}
        affinity: {}
      s3Endpoint: ""
      image:
        name: starburstdata/hive-metastore:k8s-0.7
        pullPolicy: IfNotPresent
      memory: 6Gi
      cpu: 2
      nodeSelector: {}
      affinity: {}

General Properties#

General properties#

Property name

Example

Description

nameOverride

nameOverride: presto-cluster

Presto Operator by default assigns a unique name to various Kubernetes resources (e.g. Services). The name consists of the cluster name and a unique suffix. Use this property to assign a static name instead.

clusterDomain

clusterDomain: cluster-domain.example

Domain of the K8s cluster.

environment

environment: production

The name of the Presto cluster environment.

additionalJvmConfigProperties

additionalJvmConfigProperties: |
  -XX:NewRatio=4
  -XX:SurvivorRatio=8

Specify additional properties for the configuration of the JVM running Presto. Properties are appended to the default configuration.

additionalCatalogs

additionalCatalogs:
  tpcds: |
    connector.name=tpcds
  jmx: |
    connector.name=jmx
  cms: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://example.com:5432/cmsdb
    connection-user=myuser
    connection-password=mypassword

Add one or more catalogs with the relevant properties files. The element name determines the name of the catalog and the multi-line text sets the content of the properties file.

additionalEtcPrestoTextFiles

additionalEtcPrestoTextFiles:
  access-control.properties: |
    access-control.name=read-only
  event-listener.properties: |
    event-listener.name=event-logger
    jdbc.url=jdbc:postgresql://example.com:5432/eventlog
    jdbc.user=myuser
    jdbc.password=mypassword

Add one or more configuration text files to the etc folder used by Presto. The element name determines the filename and the multi-line text sets the content of the file.

additionalEtcPrestoBinaryFiles
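
additionalEtcPrestoBinaryFiles:
  keystore.jks: c2FtcGxlIGNvbnRlbnQ=   # illustrative filename and base64 placeholder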

Add one or more binary files to the etc folder used by Presto. The configuration is similar to the preceding text file configuration: the element name determines the filename, and the element value needs to contain the base64-encoded content of the binary file. The filename and content shown in the example are illustrative.

licenseSecretName

licenseSecretName: license_secret

Name of a Kubernetes Secret that contains a SEP license file. The license file within the secret should be named signed.license; see the example Secret after this table.

imageNamePrefix

imageNamePrefix: gcr.io/project-name/

Specifies the prefix of Docker image names used by the Presto cluster. This property enables the use of a private Docker registry.

image

image:
  name: org/name:tag
  pullPolicy: IfNotPresent

The image section allows you to specify a custom Docker image for the cluster, including the organization namespace, image name, and tag.

additionalBootstrapScriptVolume

additionalBootstrapScriptVolume:
  configMap:
    name: my-bootstrap-script

Property of the coordinator and worker pods. Allows adding a custom bootstrap script.

additionalBootstrapScriptVolumes

additionalBootstrapScriptVolumes:
  - configMap:
      name: my-bootstrap-script-1
  - configMap:
      name: my-bootstrap-script-2

Property of the coordinator and worker pods. Allows adding multiple custom bootstrap scripts.

additionalVolumes

additionalVolumes:
  - path: /var/lib/presto/cache1
    volume:
      hostPath:
        path: /media/nv1/presto-cache
  - path: /var/lib/presto/cache2
    volume:
      hostPath:
        path: /media/nv2/presto-cache

Add one or more volumes, of any type supported by k8s, to all nodes in the cluster. Each volume can optionally be mounted at a specific path in the container. You can use this feature to add one or more volumes and use them to cache distributed storage objects.
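
As a companion to the licenseSecretName property above, here is a sketch of a matching Secret; the name is illustrative and the data key must be signed.license:

apiVersion: v1
kind: Secret
metadata:
  name: license_secret
type: Opaque
data:
  signed.license: PHNpZ25lZC1saWNlbnNlPg==   # base64-encoded license file (placeholder)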

Service Properties#

Access to the coordinator is possible via the Kubernetes Service. By default, the service is only accessible within the Kubernetes cluster at http://presto-coordinator-CLUSTER_NAME_UUID.NAMESPACE.svc.cluster.local:8080, where NAMESPACE is the Kubernetes namespace where the given Presto cluster is deployed, and CLUSTER_NAME_UUID is the cluster name with a unique suffix appended.

Use the service.name Presto resource parameter to make the service name more predictable. For example, setting service.name=test-cluster causes the coordinator service address to be http://presto-coordinator-test-cluster.NAMESPACE.svc.cluster.local:8080.

Use the nameOverride parameter in order to set CLUSTER_NAME_UUID to a different value.
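
For example, a sketch that makes both names predictable (the value is illustrative):

nameOverride: my-presto
service:
  name: my-presto

With these settings, the coordinator service is available at http://presto-coordinator-my-presto.NAMESPACE.svc.cluster.local:8080.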

You can also change the type of the coordinator Service using the service.type parameter. For more information, refer to Kubernetes Service types.

You can add additional parameters to the spec section of the Service by using the service.additionalSpecProperties Presto resource parameter, e.g.

service:
  additionalSpecProperties:
    loadBalancerIP: 78.11.24.19
  type: LoadBalancer

Use the service.nodePort parameter to specify the port on which the Presto coordinator Service should be exposed when service.type is set to NodePort, e.g.

service:
  type: NodePort
  nodePort: 3001

General Memory Properties#

The memory section specifies general Presto memory configuration.

General memory properties#

Property name

Example

Description

memory.nodeMemoryHeadroom

memory:
  nodeMemoryHeadroom: 2Gi

Memory headroom that Presto leaves on a node when Presto pods are configured to use the entire node memory (that is, when the memoryAllocation configuration property is empty).

memory.xmxToTotalMemoryRatio

memory:
  xmxToTotalMemoryRatio: 0.9

Ratio between Presto JVM heap size and memory available for a Presto pod.

memory.heapHeadroomPerNodeRatio

memory:
  heapHeadroomPerNodeRatio: 0.3

Ratio between memory.heap-headroom-per-node Presto configuration property and Presto JVM heap size.

memory.queryMaxMemory

memory:
  queryMaxMemory: 1Pi

Value of the query.max-memory Presto configuration property.

memory.queryMaxTotalMemoryPerNodePoolFraction

memory:
  queryMaxTotalMemoryPerNodePoolFraction: 0.333

Value of the query.max-total-memory-per-node Presto configuration property, expressed as a fraction of the Presto JVM heap size.
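
To illustrate how these ratios combine, consider a hypothetical worker pod with memoryAllocation: 100Gi and the default ratios; the arithmetic below follows the property descriptions above:

memory:
  xmxToTotalMemoryRatio: 0.9                    # JVM heap = 0.9 * 100Gi = 90Gi
  heapHeadroomPerNodeRatio: 0.3                 # memory.heap-headroom-per-node = 0.3 * 90Gi = 27Gi
  queryMaxTotalMemoryPerNodePoolFraction: 0.333 # query.max-total-memory-per-node = 0.333 * 90Gi ≈ 30Gi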

Coordinator Properties#

All coordinator properties are nested within coordinator.

Coordinator properties#

Property name

Example

Description

cpuRequest and cpuLimit

coordinator:
  cpuRequest: 6
  cpuLimit: 32

Specifies the coordinator pod’s CPU limit and request.

memoryAllocation

coordinator:
  memoryAllocation: 60Gi

Specifies coordinator pod’s memory usage (both request and limit). If empty, the coordinator pod utilizes the entire memory available on the node.

nodeSelector and affinity

coordinator:
  nodeSelector:
    role: "presto"
  affinity:
    podAffinity:
      ...

Specifies the coordinator pod’s node selector and affinity.

additionalProperties

coordinator:
  additionalProperties:
    resource-groups.config-file=filename

Specify additional configuration properties.

additionalAnnotations

coordinator:
  additionalAnnotations:
    annotation-name: annotation-value

Specifies additional annotations for the coordinator pod.

Worker Properties#

All worker properties are nested within worker.

Worker properties#

Property name

Example

Description

count

worker:
  count: 3

Number of worker pods.

autoscaling

worker:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80

Configuration of worker autoscaling.

deploymentTerminationGracePeriodSeconds

worker:
  deploymentTerminationGracePeriodSeconds: 7200

Specifies the termination grace period for worker pods. Worker pods are not terminated until the queries running on them finish or the grace period passes.

prestoWorkerShutdownGracePeriodSeconds

worker:
  prestoWorkerShutdownGracePeriodSeconds: 120

Specifies the grace period for the graceful shutdown of the Presto worker process before its pod is terminated.

cpuRequest and cpuLimit

worker:
  cpuRequest: 6
  cpuLimit: 32

Specifies worker pod CPU limit and request.

memoryAllocation

worker:
  memoryAllocation: 100Gi

Specifies worker pod memory usage (both request and limit). If empty, the worker pod utilizes the entire memory available on the node.

nodeSelector and affinity

worker:
  nodeSelector:
    role: "presto"
  affinity:
    podAffinity:
      ...

Specifies the worker pod’s node selector and affinity.

additionalProperties

worker:
  additionalProperties:
    resource-groups.config-file=filename

Specify additional configuration properties.

additionalAnnotations

worker:
  additionalAnnotations:
    annotation-name: annotation-value

Specifies additional annotations for the worker pods.

Readiness and Liveness Probe#

You can configure Kubernetes probes with the readinessProbe and livenessProbe elements.

Readiness and liveness probe properties#

Property name

Example

Description

readinessProbe

readinessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 15

Properties of the coordinator and worker readiness probes. For more information on readiness probes, refer to Kubernetes probes.

livenessProbe

livenessProbe:
  initialDelaySeconds: 300
  periodSeconds: 300
  failureThreshold: 1
  timeoutSeconds: 15

Properties of coordinator and worker liveness probes. For more information on liveness probes, refer to Kubernetes probes.

Usage Metrics#

The defaults configure usage metrics correctly. No changes are necessary.
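
If you do need to tune them, the available properties mirror the defaults shown in the snippet at the top of this page, e.g. a sketch that lengthens the reporting interval (the value is illustrative):

usageMetrics:
  enabled: true
  usageClient:
    interval: 5m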

Spill to Disk Properties#

You can enable the optional disk spilling in the spilling section by configuring an additional volume for it. Use any type of volume supported by k8s, and define it inline in spilling.volume, as in the defaults shown at the top of this page.

For example, to spill to a hostPath volume:

spilling:
  enabled: true
  volume:
    hostPath:
      path: /opt/data
      type: Directory

The default spilling configuration uses an emptyDir volume. We recommend using the emptyDir or hostPath volume type for spilling. Ensure that each worker has exclusive access to its spilling volume, and that the configured path exists.

Hive Connector Properties#

SEP on Kubernetes provides automatic configuration of the Hive connector. The connector can either access an external Metastore or use a built-in Metastore internal to the Presto cluster.

External Metastore#

You can configure Presto to use an external Hive Metastore by setting the hive.metastoreUri property, e.g.

hive:
  metastoreUri: thrift://hive-metastore:9083

External metastore properties#

Property name

Example

Description

awsSecretName

Name of the AWS secret used to access the metastore; see the example after this table.

googleServiceAccountKeySecretName

Name of the secret key for the Google service account to access the metastore

azureWasbSecretName

Name of the secret for the Windows Azure Storage Blob (WASB) to access the metastore

azureAbfsSecretName

Name of the secret for the Azure Blob Filesystem (ABFS) to access the metastore

azureAdlSecretName

Name of the secret for the Azure Data Lake (ADL) to access the metastore

additionalProperties

hive:
  additionalProperties: |
    hive.allow-drop-table=true

Specify additional properties to configure the Hive connector behavior.
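
For example, a sketch that combines an external metastore with an AWS credentials secret (the secret name is illustrative):

hive:
  metastoreUri: thrift://hive-metastore:9083
  awsSecretName: my-aws-credentials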

Internal Metastore#

You can configure Presto to use an internal Hive Metastore by setting the hive.internalMetastore.mySql or hive.internalMetastore.postgreSql properties, e.g.

hive:
  internalMetastore:
    mySql:
      jdbcUrl: jdbc:mysql://mysql-server/metastore-database
      username: hive
      password: hivePassword

or

hive:
  internalMetastore:
    postgreSql:
      jdbcUrl: jdbc:postgresql://postgresql-server/metastore-database
      username: hive
      password: hivePassword

In such a case, an additional Hive Metastore pod is created. It stores the metadata in the provided PostgreSQL or MySQL database.

You can also make the internal Metastore use an internal ephemeral PostgreSQL database. Enable this by setting hive.internalMetastore.internalPostgreSql.enabled to true, e.g.

hive:
  internalMetastore:
    internalPostgreSql:
      enabled: true

In this case, an additional PostgreSQL pod is created along with the Hive Metastore pod. This Hive Metastore uses the internal ephemeral PostgreSQL as a relational database.

Note: Using the internal ephemeral PostgreSQL is not recommended for production. Data stored within the internal PostgreSQL is lost when the Presto cluster is terminated.

Note: You cannot use an external and an internal Metastore at the same time. You also cannot use an external and an internal relational database for the Metastore at the same time.

Customizing Internal Hive Metastore Pod#

It is possible to customize the internal Hive Metastore pod's Docker image via the hive.internalMetastore.image property, e.g.

hive:
  internalMetastore:
    image:
      name: customized-internal-hive-metastore:0.1
      pullPolicy: IfNotPresent

It is possible to change the resources required by the Hive Metastore pod via the following properties (see the sketch after this list):

  • hive.internalMetastore.cpu

  • hive.internalMetastore.memory

  • hive.internalMetastore.nodeSelector

  • hive.internalMetastore.affinity
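
For example, a sketch combining these settings (all values are illustrative):

hive:
  internalMetastore:
    cpu: 4
    memory: 8Gi
    nodeSelector:
      role: metastore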

Customizing Internal PostgreSQL Docker Pod#

It is possible to customize the internal PostgreSQL pod’s Docker image via the hive.internalMetastore.internalPostgreSql.image property, e.g.

hive:
  internalMetastore:
    internalPostgreSql:
      image:
        name: customized-internal-postgresql:0.1
        pullPolicy: IfNotPresent

It is possible to change the resources required by the PostgreSQL pod via the following properties:

  • hive.internalMetastore.internalPostgreSql.cpu

  • hive.internalMetastore.internalPostgreSql.memory

  • hive.internalMetastore.internalPostgreSql.nodeSelector

  • hive.internalMetastore.internalPostgreSql.affinity

It is possible to specify the storage used by the PostgreSQL pod via the hive.internalMetastore.internalPostgreSql.storage property, e.g.

hive:
  internalMetastore:
    internalPostgreSql:
      storage:
        className: db-class
        size: 20Gi
        claimSelector:
          matchLabels:
            release: "stable"
          matchExpressions:
            - {key: environment, operator: In, values: [dev]}

Prometheus Support#

Presto Kubernetes supports exposing Presto metrics to Prometheus. You can use the Prometheus Operator to set up Prometheus in your Kubernetes cluster and collect Presto metrics.

To enable the Presto Prometheus metric endpoints, set the prometheus.enabled property to true. This causes the following additional Kubernetes services to be created:

  • prometheus-coordinator-CLUSTER_NAME_UUID

  • prometheus-worker-CLUSTER_NAME_UUID

where CLUSTER_NAME_UUID is the cluster name with a unique suffix. Use the nameOverride property in order to set CLUSTER_NAME_UUID to a different value.

Those services expose Presto metrics in the Prometheus format. You can use ServiceMonitor resources from the Prometheus Operator to make Prometheus collect metrics from those endpoints. To match the Presto Prometheus endpoints, use labels, e.g.

matchLabels:
  instance: CLUSTER_NAME_UUID
  role: "prometheus-coordinator"

The following Presto and JVM metrics are exported by default:

  • running_queries - number of currently running queries

  • queued_queries - number of currently queued queries

  • failed_queries - total count of failed queries

  • jvm_gc_collection_seconds_sum - total GC (young and old) time in seconds

  • jvm_gc_collection_seconds_count - total GC (young and old) count

  • jvm_memory_bytes_committed, jvm_memory_bytes_init, jvm_memory_bytes_max, jvm_memory_bytes_used - JVM memory usage metrics (heap and non-heap). For more information see MemoryMXBean.
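
With these metrics exposed, you can alert on them through the Prometheus Operator. A sketch of an alerting rule (names and thresholds are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: presto-alerts
spec:
  groups:
    - name: presto
      rules:
        - alert: PrestoQueryBacklog
          expr: queued_queries > 20   # sustained query queue buildup
          for: 10m
          labels:
            severity: warning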