Hive metastore configuration#

The starburst-hive Helm chart configures a Hive Metastore Service (HMS), and optionally the backing database, in the cluster with the values.yaml file detailed in the following sections.

A minimal values file adds the registry credentials and overrides any defaults to suitable values.
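
For example, a minimal sketch that only adds the registry credentials for the default registry; the username and password values are placeholders:

registryCredentials:
  enabled: true
  registry: "harbor.starburstdata.net"
  username: "<username>"
  password: "<password>"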

Using the HMS#

The expose section configures the DNS availability of the HMS in the cluster. By default the HMS is available at the hostname hive and port 9083. As a result the Thrift URL within the cluster is thrift://hive:9083.

You can use the URL for any catalog:

catalog:
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive:9083

Docker image and registry#

Same as the Docker image and registry section for the SEP Helm chart:

image:
  repository: "harbor.starburstdata.net/starburstdata/hive"
  tag: "346.0.0"
  pullPolicy: "IfNotPresent"

registryCredentials:
  enabled: false
  registry:
  username:
  password:

Exposing the pod to the outside network#

The expose section for the HMS works identically to the SEP server expose section. Differences are isolated to the configured default values. The default type is clusterIp. When changing this configuration, ensure that you adapt your configured catalogs to use the correct Thrift URL for the HMS.

expose:
  type: "clusterIp"
  clusterIp:
    name: "hive"
    ports:
      http:
        port: 9083

expose:
  type: "nodePort"
  nodePort:
    name: "hive"
    ports:
      http:
        port: 9083
        nodePort: 30083

expose:
  type: "ingress"
  ingress:
    tls:
      enabled: true
      secretName:
    host:
    path: /
    annotations: {}
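
For example, if you change clusterIp.name to a hypothetical value such as metastore, adapt the catalogs that use the HMS to the new hostname:

catalog:
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://metastore:9083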

Database backend for HMS#

The database backend for HMS is a PostgreSQL database internal to the cluster by default:

database:
  type: internal
  internal:
    image:
      repository: "library/postgres"
      tag: "10.6"
      pullPolicy: "IfNotPresent"
    volume:
      # use one of:
      # - existingVolumeClaim to specify existing PVC
      # - persistentVolumeClaim to specify spec for new PVC
      # - other volume type inline configuration, e.g. emptyDir
      # Examples:
      # existingVolumeClaim: "my_claim"
      # persistentVolumeClaim:
      #  storageClassName:
      #  accessModes:
      #    - ReadWriteOnce
      #  resources:
      #    requests:
      #      storage: "2Gi"
      emptyDir: {}
    resources:
      requests:
        memory: "1Gi"
        cpu: 2
      limits:
        memory: "1Gi"
        cpu: 2
    driver: "org.postgresql.Driver"
    port: 5432
    databaseName: "hive"
    databaseUser: "hive"
    databasePassword: "HivePass1234"
    env: []

database.internal.env: YAML sequence of mappings, each defining the name and value keys of an environment variable for the internal PostgreSQL container.

For example, OpenShift deployments often do not have access to pull from the default Docker registry library/postgres. You can replace it with an image from the Red Hat registry, which requires additional environment variables set with the parameter database.internal.env:

database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    env:
      - name: POSTGRESQL_DATABASE
        value: "hive"
      - name: POSTGRESQL_USER
        value: "hive"
      - name: POSTGRESQL_PASSWORD
        value: "HivePass1234"

Alternatively, you can use an external PostgreSQL or MySQL database by setting database.type to external and configuring the nested properties:

database:
  type: external
  external:
    jdbcUrl:
    driver:
    user:
    password:

database.external.jdbcUrl: JDBC URL for connecting to the external database, in the format required by the database and the configured driver, including hostname and port.

database.external.driver: Valid values are com.mysql.jdbc.Driver for an external MySQL or compatible database, or org.postgresql.Driver for a PostgreSQL database.

database.external.user: Database user name to access the external database using JDBC.

database.external.password: Password for the user configured to access the external database using JDBC.
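
For example, a sketch of a configuration for an external PostgreSQL database; the hostname, database name, and credentials are placeholders:

database:
  type: external
  external:
    jdbcUrl: "jdbc:postgresql://postgres.example.com:5432/hive"
    driver: "org.postgresql.Driver"
    user: "hive"
    password: "HivePass1234"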

Additional volumes#

Additional volumes can be necessary for persisting files. These can be defined in the additionalVolumes section. None are defined by default:

additionalVolumes: []

You can add one or more volumes supported by Kubernetes to all nodes in the cluster.

If you specify path only, a directory named after path is created. When mounting a ConfigMap or Secret, a file is created in this directory for each key.

The optional subPath parameter accepts the name of a specific key in the ConfigMap or Secret volume. If you specify subPath, only the key named by subPath is mounted as a file with the name provided by path.

additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /etc/hive/conf/test_config.txt
    subPath: test_config.txt
    volume:
      configMap:
        name: "configmap-in-volume"

Storage#

The chart allows you to configure the credentials to access HDFS or object storage. The credentials enable the HMS to access the storage for metadata information, including statistics gathering.

In addition, you have to configure the catalog with sufficient, corresponding credentials.

The default configuration includes no credentials:

hdfs:
  hadoopUserName:
objectStorage:
  awsS3:
    region:
    endpoint:
    accessKey:
    secretKey:
    pathStyleAccess: false
  gs:
    cloudKeyFileSecret:
  azure:
    abfs:
      authType: "accessKey"
      accessKey:
        storageAccount:
        accessKey:
      oauth:
        clientId:
        secret:
        endpoint:
    wasb:
      storageAccount:
      accessKey:
  adl:
    oauth2:
      clientId:
      credential:
      refreshUrl:

hdfs.hadoopUserName: User name for Hadoop HDFS access

objectStorage.awsS3.*: Configuration for AWS S3 access; see the example after this list

objectStorage.awsS3.region: AWS region name

objectStorage.awsS3.endpoint: AWS S3 endpoint

objectStorage.awsS3.accessKey: Name of the access key for AWS S3

objectStorage.awsS3.secretKey: Name of the secret key for AWS S3

objectStorage.awsS3.pathStyleAccess: Set to true to use path-style access for all requests to S3

objectStorage.gs.*: Configuration for Google Storage access

objectStorage.gs.cloudKeyFileSecret: Name of the secret with the file containing the access key to the cloud storage. The key of the secret must be named key.json

objectStorage.azure.*: Configuration for Microsoft Azure storage systems

objectStorage.azure.abfs.*: Configuration for Azure Blob Filesystem (ABFS)

objectStorage.azure.abfs.authType: Authentication type to access ABFS. Valid values are accessKey or oauth, with configuration in the following properties.

objectStorage.azure.abfs.accessKey.*: Configuration for access key authentication to ABFS

objectStorage.azure.abfs.accessKey.storageAccount: Name of the ABFS account to access

objectStorage.azure.abfs.accessKey.accessKey: Actual access key to use for ABFS access

objectStorage.azure.abfs.oauth.*: Configuration for OAuth authentication to ABFS

objectStorage.azure.abfs.oauth.clientId: Client identifier for OAuth authentication.

objectStorage.azure.abfs.oauth.secret: Secret for OAuth.

objectStorage.azure.abfs.oauth.endpoint: Endpoint URL for OAuth.

objectStorage.azure.wasb.*: Configuration for Windows Azure Storage Blob (WASB)

objectStorage.azure.wasb.storageAccount: Name of the storage account to use for WASB.

objectStorage.azure.wasb.accessKey: Key to access WASB.

objectStorage.adl.*: Configuration for Azure Data Lake (ADL)

objectStorage.adl.oauth2.*: Configuration for OAuth authentication to ADL

objectStorage.adl.oauth2.clientId: Client identifier for OAuth access to ADL

objectStorage.adl.oauth2.credential: Credential for OAuth access to ADL

objectStorage.adl.oauth2.refreshUrl: Refresh URL for the OAuth access to ADL.
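
For example, a sketch of an AWS S3 configuration; the region, endpoint, and keys are placeholders:

objectStorage:
  awsS3:
    region: "us-east-2"
    endpoint: "s3.us-east-2.amazonaws.com"
    accessKey: "<access key>"
    secretKey: "<secret key>"
    pathStyleAccess: false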

Server configuration#

The heapSizePercentage value determines the percentage of the available container memory assigned to the JVM heap of the HMS, and the resources section configures the container resources:

heapSizePercentage: 85

resources:
  requests:
    memory: "1Gi"
    cpu: 1
  limits:
    memory: "1Gi"
    cpu: 1

Node assignment#

You can add configuration to determine the nodes used for running the pod:

nodeSelector: {}
tolerations: []
affinity: {}
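
For example, a sketch with a hypothetical node label and taint:

nodeSelector:
  role: "hive"
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "hive"
    effect: "NoSchedule"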