Kubernetes (k8s) provides a powerful and flexible deployment and runtime system for Starburst Enterprise Presto (SEP).
Kubernetes and the ecosystem of tools, patterns and conventions around k8s are complex. This documentation assumes some familiarity with them. Use the many available resources, such as the tutorial, to learn more.
Ensure that you can fulfill the prerequisites before proceeding to deploy Presto on k8s.
The Kubernetes offering uses a number of Docker containers, all available on DockerHub.
docker pull starburstdata/presto:338-e.5-k8s-0.50
docker pull starburstdata/presto-operator:338-e.5-k8s-0.50
docker pull starburstdata/hive-metastore:k8s-0.7
Helm charts are used for the Apache Ranger installation, which includes the SEP Ranger plugin, and are only available from the Starburst Helm chart repository.
With access to your cluster and the public internet, you can proceed to deploy using kubectl and the resources provided by Starburst. The relevant files are located in DockerHub and on Starburst storage systems.
Further platform-specific tips are available:
The Kubernetes integration allows adding a user-defined bootstrap script to the pods. This additional bootstrap script is executed on the coordinator and worker nodes upon node startup.
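As an illustration of what such a bootstrap script might contain, here is a minimal sketch. It is not taken from the Starburst resources: the function name `bootstrap`, the state directory `/tmp/presto-bootstrap`, and the marker file `.done` are all hypothetical, and how the script is attached to the pods is not shown here. The sketch only demonstrates a script that is safe to run again on every node start.

```shell
#!/bin/sh
# Hypothetical bootstrap script sketch; names and paths are assumptions.
set -eu

# bootstrap DIR
# Runs the node customization once and records completion in DIR/.done,
# so that a restart of the same container does not repeat the work.
bootstrap() {
  state_dir="$1"
  marker="${state_dir}/.done"
  mkdir -p "${state_dir}"
  if [ -f "${marker}" ]; then
    echo "bootstrap: already applied, skipping"
    return 0
  fi
  echo "bootstrap: running on ${HOSTNAME:-unknown}"
  # Place node-specific customization here, for example copying
  # additional files into place. Left empty in this sketch.
  date -u +%Y-%m-%dT%H:%M:%SZ > "${marker}"
  echo "bootstrap: done"
}

# BOOTSTRAP_DIR is an assumed, optional override for the state directory.
bootstrap "${BOOTSTRAP_DIR:-/tmp/presto-bootstrap}"
```

Because the script runs on both the coordinator and the workers, keeping it idempotent and free of role-specific assumptions avoids surprises when a pod is restarted.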
If the additional bootstrap script mechanism is insufficient, you can further customize the Docker images. This allows you to extend Presto with additional features that are not supported by default.
SEP on Kubernetes consists of various components and Kubernetes resources that form a Presto Kubernetes cluster. The following terms describe each component of the Presto Kubernetes architecture in more detail:
Presto Kubernetes Custom Resource Definition
The Presto Kubernetes Custom Resource Definition (CRD) defines resources of the Presto type within a Kubernetes namespace. More information on Kubernetes custom resources is available in the Kubernetes documentation.
Presto Kubernetes Resources
Presto Kubernetes resources are instances of the Presto Kubernetes custom resource. Each instance of a Presto Kubernetes resource represents a cluster and contains various configurable parameters that specify Presto, connector and Kubernetes properties.
Presto Operator
The Presto operator is a Kubernetes operator pod that orchestrates Presto clusters. The operator continuously monitors the Presto Kubernetes resources. It creates or removes clusters when Presto Kubernetes resources are created or removed. The operator picks up any change to an existing resource and updates the corresponding cluster accordingly.
Presto Service Account
The account used by the Presto operator to make Kubernetes API calls.
Presto Service Role
The role that is bound to the Presto service account. The role must have enough privileges to manage clusters.
Presto Coordinator Pod
A Kubernetes pod that runs the coordinator of the cluster. Each cluster uses at most a single coordinator. The pod is automatically recreated on coordinator failure or unresponsiveness, thus providing high availability.
Presto Coordinator Service
A Kubernetes Service that delegates requests to the coordinator. The Presto Coordinator Service is the frontend of the cluster: it accepts queries and exposes the web UI.
Presto Network Policy
The Presto Network Policy only allows inbound traffic to the workers from the coordinator. Support for Kubernetes Network Policies needs to be enabled in your Kubernetes cluster.
Hive Metastore Pod
A Hive Metastore pod that is running when the cluster is configured to use an internal Metastore.
PostgreSQL Pod
A PostgreSQL pod that is running when the Presto cluster is configured to use an internal Metastore with an internal ephemeral Metastore database.
All Presto Kubernetes components are labelled so that it’s easy to write selectors in order to match them:
CLUSTER_NAME_UUID is the cluster name with a unique suffix.
CLUSTER_NAME_UUID can be overridden via the nameOverride Presto Kubernetes Resource property.
ROLE_NAME can be one of:
catalogs – specifies catalogs ConfigMap
configuration – specifies configuration ConfigMap
coordinator – specifies Presto Coordinator related components
worker – specifies Presto Worker related components
hive-metastore – specifies Hive Metastore related components
hive-postgresql – specifies PostgreSQL (for Hive Metastore) related components
prometheus-coordinator – specifies Prometheus metrics endpoint for the coordinator
prometheus-worker – specifies Prometheus metrics endpoint for the workers
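The values above can be combined into a label selector for kubectl. The following is a minimal sketch, not part of the Starburst resources. The label keys are not reproduced in this text, so the keys `instance` and `role` used below are assumptions that must be replaced with the keys actually set on your components; the helper name `presto_selector` is also hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: the label keys below are placeholders. Check the labels on
# your own components, for example with: kubectl get pods --show-labels
INSTANCE_LABEL_KEY="${INSTANCE_LABEL_KEY:-instance}"
ROLE_LABEL_KEY="${ROLE_LABEL_KEY:-role}"

# presto_selector CLUSTER_NAME_UUID ROLE_NAME
# Prints an equality-based label selector and rejects role names that
# are not in the documented list.
presto_selector() {
  case "$2" in
    catalogs|configuration|coordinator|worker|hive-metastore|hive-postgresql|prometheus-coordinator|prometheus-worker) ;;
    *) echo "unknown role: $2" >&2; return 1 ;;
  esac
  printf '%s=%s,%s=%s\n' "$INSTANCE_LABEL_KEY" "$1" "$ROLE_LABEL_KEY" "$2"
}

# Example usage (not executed here; the cluster name is made up):
#   kubectl get pods -l "$(presto_selector my-cluster-1a2b worker)"
```

Validating the role name before calling kubectl turns a typo into an immediate error instead of a selector that silently matches nothing.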