Overview#

Kubernetes (k8s) provides a powerful and flexible deployment and runtime system for Starburst Enterprise Presto.

Note

Kubernetes and the ecosystem of tools, patterns, and conventions around it are complex. This documentation assumes some familiarity with them. Use the many available resources, such as the tutorial, to learn more.

Ensure that you can fulfill the prerequisites before proceeding to deploy Presto on k8s.

Docker container images#

The Kubernetes offering uses a number of Docker containers, all available on DockerHub.

Presto:

docker pull starburstdata/presto:340-e-k8s-0.34

Presto operator:

docker pull starburstdata/presto-operator:340-e-k8s-0.34

Apache Hive Metastore:

docker pull starburstdata/hive-metastore:k8s-0.7
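
To fetch all three images in one step, for example to warm a node cache before deploying, the pulls can be wrapped in a small loop. The following is a minimal sketch: the function names sep_images and pull_sep_images are hypothetical helpers, not part of the Starburst tooling, and the image tags are the ones listed above.

```shell
#!/bin/sh
# Print the image references listed above, one per line.
sep_images() {
  cat <<'EOF'
starburstdata/presto:340-e-k8s-0.34
starburstdata/presto-operator:340-e-k8s-0.34
starburstdata/hive-metastore:k8s-0.7
EOF
}

# Pull each image in turn and stop at the first failure.
# Assumes a working docker CLI and network access to DockerHub.
pull_sep_images() {
  for image in $(sep_images); do
    docker pull "$image" || return 1
  done
}
```

Calling pull_sep_images then runs the three docker pull commands in sequence.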

Helm charts#

Helm charts are used for the Apache Ranger installation, which includes the SEP Ranger plugin, and are only available from the Starburst Helm chart repository.

Deploying on K8s#

With access to your cluster and the public internet, you can deploy using kubectl and the resources provided by Starburst. The relevant files are located on DockerHub and on Starburst storage systems.

Further platform-specific tips are available.

Customize#

The Kubernetes integration allows adding a user-defined bootstrap script to the pods. This additional bootstrap script is executed on the coordinator and worker nodes upon node startup.

If the additional bootstrap script mechanism is insufficient, you can further customize the Docker images. This allows you to extend Presto with additional features that are not supported by default.

Presto Cloud Architecture#

SEP on Kubernetes consists of various components and Kubernetes resources that form a Presto Kubernetes cluster. The following terms describe each component of the Presto Kubernetes architecture in more detail:

Presto Kubernetes Custom Resource Definition

The Presto Kubernetes Custom Resource Definition (CRD) defines resources of the Presto type within a Kubernetes namespace. More information on Kubernetes custom resources is available in the Kubernetes documentation.

Presto Kubernetes Resources

Presto Kubernetes resources are instances of the Presto Kubernetes custom resource. Each instance of a Presto Kubernetes resource represents a cluster and contains various configurable parameters that specify Presto, connector, and Kubernetes properties.

Presto Operator

The Presto operator is a Kubernetes operator pod that orchestrates Presto clusters. The operator continuously monitors the Presto Kubernetes resources. It creates or removes clusters when Presto Kubernetes resources are created or removed. Any change to an existing resource is picked up by the operator, which updates the corresponding cluster accordingly.

Presto Service Account

The account used by the Presto operator to make Kubernetes API calls.

Presto Service Role

The role that is bound to the Presto service account. The role must have enough privileges to manage clusters.

Presto Coordinator

A Kubernetes pod that runs the coordinator of the cluster. Each cluster uses at most a single coordinator. The pod is automatically recreated on coordinator failure or unresponsiveness, thus providing high availability.

Presto Worker

Kubernetes pods that run the workers of the cluster. The number of workers can be adjusted statically by specifying the worker.count property or automatically with the Kubernetes Horizontal Pod Autoscaler.

Presto Coordinator Service

A Kubernetes Service that delegates requests to the coordinator. The Presto Coordinator Service is the frontend of the cluster: it accepts queries and exposes the web UI.

Presto Network Policy

The Presto Network Policy only allows inbound traffic to the workers from the coordinator. Support for Kubernetes Network Policies needs to be enabled in your Kubernetes cluster.
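
For readers unfamiliar with Kubernetes Network Policies, a rule of this shape can be sketched as follows. This is an illustration only, not the policy the operator actually creates. It assumes the pods carry the instance and role labels described in the Labels section, and the function name worker_network_policy and the policy name are hypothetical.

```shell
#!/bin/sh
# Print an illustrative NetworkPolicy manifest for one cluster.
# $1: value of the cluster's "instance" label (CLUSTER_NAME_UUID).
worker_network_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: $1-workers-from-coordinator   # hypothetical name
spec:
  # The policy applies to the worker pods of this cluster.
  podSelector:
    matchLabels:
      instance: $1
      role: worker
  policyTypes:
    - Ingress
  ingress:
    # Only pods labelled as this cluster's coordinator may connect.
    - from:
        - podSelector:
            matchLabels:
              instance: $1
              role: coordinator
EOF
}
```

The manifest can be inspected by running worker_network_policy with a cluster name, or piped to kubectl apply -f - in a test namespace. As noted above, it only takes effect if the cluster's network plugin enforces Network Policies.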

Metastore

A Hive Metastore pod that is running when the cluster is configured to use an internal Metastore.

PostgreSQL

A PostgreSQL pod that is running when the Presto cluster is configured to use an internal Metastore with an internal ephemeral Metastore database.

Labels#

All Presto Kubernetes components are labelled so that it’s easy to write selectors in order to match them:

  • instance: CLUSTER_NAME_UUID, where CLUSTER_NAME_UUID is the cluster name with a unique suffix. CLUSTER_NAME_UUID can be overridden via the nameOverride Presto Kubernetes Resource property.

  • role: ROLE_NAME, where ROLE_NAME can be one of:

    • catalogs – specifies catalogs ConfigMap

    • configuration – specifies configuration ConfigMap

    • coordinator – specifies Presto Coordinator related components

    • worker – specifies Presto Worker related components

    • hive-metastore – specifies Hive Metastore related components

    • hive-postgresql – specifies PostgreSQL (for Hive Metastore) related components

    • prometheus-coordinator – specifies Prometheus metrics endpoint for the coordinator

    • prometheus-worker – specifies Prometheus metrics endpoint for the workers
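
These labels can be combined into a selector for kubectl. The following is a minimal sketch that assumes the label keys are literally instance and role, as listed above. The helper name presto_selector and the cluster name used in the usage note are hypothetical.

```shell
#!/bin/sh
# Build a label selector string for kubectl's -l / --selector option.
# $1: value of the "instance" label (CLUSTER_NAME_UUID)
# $2: value of the "role" label (optional)
presto_selector() {
  if [ -n "${2:-}" ]; then
    printf 'instance=%s,role=%s\n' "$1" "$2"
  else
    printf 'instance=%s\n' "$1"
  fi
}
```

For example, kubectl get pods -l "$(presto_selector my-cluster-1a2b worker)" would list the worker pods of one cluster, and kubectl get configmaps -l "$(presto_selector my-cluster-1a2b catalogs)" would list its catalogs ConfigMap. Omitting the role matches every component of the cluster.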

License#

The following features require a valid license:

  • Graceful worker scale-down

  • Autoscaling

  • Presto pods with more than 8 cores

  • Presto pods with more than 64 GB of RAM
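
The two size thresholds can be checked before deploying. The following is a minimal sketch; the function name pod_size_needs_license is hypothetical, and it covers only the pod size limits, since graceful worker scale-down and autoscaling require a license regardless of pod size.

```shell
#!/bin/sh
# Succeed (exit status 0) when the pod size exceeds either threshold above.
# $1: CPU cores per pod (integer)
# $2: memory per pod in GB (integer)
pod_size_needs_license() {
  [ "$1" -gt 8 ] || [ "$2" -gt 64 ]
}
```

For example, pod_size_needs_license 16 32 succeeds because of the core count, while pod_size_needs_license 8 64 fails because both values sit exactly at the limits.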