11.5. Hive Security Configuration

Authorization

You can enable authorization checks for the Hive Connector by setting the hive.security property in the Hive catalog properties file. This property must be one of the following values:

Property Value Description
legacy (default value) Few authorization checks are enforced, so most operations are allowed. The config properties hive.allow-drop-table, hive.allow-rename-table, hive.allow-add-column, hive.allow-drop-column and hive.allow-rename-column determine whether the corresponding operations are permitted.
read-only Operations that read data or metadata, such as SELECT, are permitted, but none of the operations that write data or metadata, such as CREATE, INSERT or DELETE, are allowed.
file Authorization checks are enforced using a config file specified by the catalog configuration property security.config-file. See File Based Authorization for details.
sql-standard Users are permitted to perform operations as long as they have the required privileges as defined by the SQL standard. In this mode, Presto enforces authorization checks for queries based on the privileges defined in the Hive metastore. To alter these privileges, use the GRANT and REVOKE commands. See SQL Standard Based Authorization for details.
sentry Authorization checks are enforced using an external Apache Sentry service. See Apache Sentry Based Authorization for details.
ranger Authorization checks are enforced using an external Apache Ranger service. See Apache Ranger Based Authorization for details.

SQL Standard Based Authorization

When sql-standard security is enabled, Presto enforces the same SQL standard based authorization as Hive does.

Because Presto’s ROLE syntax follows the SQL standard while Hive does not follow the standard exactly, the following limitations and differences apply:

  • CREATE ROLE role WITH ADMIN is not supported.
  • The admin role must be enabled to execute CREATE ROLE or DROP ROLE.
  • GRANT role TO user GRANTED BY someone is not supported.
  • REVOKE role FROM user GRANTED BY someone is not supported.
  • By default, all of a user’s roles except admin are enabled in a new user session.
  • One particular role can be selected by executing SET ROLE role.
  • SET ROLE ALL enables all of a user’s roles except admin.
  • The admin role must be enabled explicitly by executing SET ROLE admin.
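The role rules above can be summarized in a small Python sketch. It is a simplified model, not Presto's implementation: the function name and inputs are hypothetical, role inheritance and SET ROLE NONE are not modeled, and treating a role that was never granted as an error is an assumption.

```python
from typing import Optional, Set

ADMIN_ROLE = "admin"


def enabled_roles(granted: Set[str], set_role: Optional[str] = None) -> Set[str]:
    """Return the roles active in a session under the rules listed above.

    granted  -- roles granted to the user (hypothetical input)
    set_role -- argument of the last SET ROLE statement, or None for a new session
    """
    if set_role is None or set_role.upper() == "ALL":
        # New session or SET ROLE ALL: every granted role except admin.
        return {role for role in granted if role != ADMIN_ROLE}
    if set_role not in granted:
        # Assumption: selecting a role that was never granted is an error.
        raise ValueError(f"role {set_role!r} is not granted to this user")
    # SET ROLE <role>: only the selected role; this is the only way to enable admin.
    return {set_role}
```

For example, a user granted admin and analyst starts with only analyst enabled, and gets admin only after SET ROLE admin.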

Authentication

The default security configuration of the Hive Connector does not use authentication when connecting to a Hadoop cluster. All queries are executed as the user who runs the Presto process, regardless of which user submits the query.

The Hive connector provides additional security options to support Hadoop clusters that have been configured to use Kerberos.

When accessing HDFS, Presto can impersonate the end user who is running the query. This can be used with HDFS permissions and ACLs to provide additional security for data.

Warning

Access to the Presto coordinator should be secured using Kerberos when using Kerberos authentication to Hadoop services. Failure to secure access to the Presto coordinator could result in unauthorized access to sensitive data on the Hadoop cluster.

See Coordinator Kerberos Authentication and CLI Kerberos Authentication for information on setting up Kerberos authentication.

Kerberos Support

To use the Hive connector with a Hadoop cluster that uses Kerberos authentication, you need to configure the connector to work with two services on the Hadoop cluster:

  • The Hive metastore Thrift service
  • The Hadoop Distributed File System (HDFS)

Access to these services by the Hive connector is configured in the properties file that contains the general Hive connector configuration.

Note

If your krb5.conf location is different from /etc/krb5.conf, you must set it explicitly using the java.security.krb5.conf JVM property in the jvm.config file.

For example: -Djava.security.krb5.conf=/example/path/krb5.conf

Hive Metastore Thrift Service Authentication

In a Kerberized Hadoop cluster, Presto connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos. Kerberos authentication for the metastore is configured in the connector’s properties file using the following properties:

Property Name Description
hive.metastore.authentication.type Hive metastore authentication type.
hive.metastore.service.principal The Kerberos principal of the Hive metastore service.
hive.metastore.client.principal The Kerberos principal that Presto will use when connecting to the Hive metastore service.
hive.metastore.client.keytab Hive metastore client keytab location.

hive.metastore.authentication.type

One of NONE or KERBEROS. When using the default value of NONE, Kerberos authentication is disabled and no other properties need to be configured.

When set to KERBEROS, the Hive connector connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos.

This property is optional; the default is NONE.

hive.metastore.service.principal

The Kerberos principal of the Hive metastore service. The Presto coordinator will use this to authenticate the Hive metastore.

The _HOST placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector will substitute in the hostname of the metastore server it is connecting to. This is useful if the metastore runs on multiple hosts.

Example: hive/hive-server-host@EXAMPLE.COM or hive/_HOST@EXAMPLE.COM.

This property is optional; no default value.
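The _HOST substitution described above can be illustrated with a short Python sketch. It is a simplified model of the expansion performed by Hadoop's SecurityUtil.getServerPrincipal (only an instance component that is exactly _HOST is replaced, and the hostname is lower-cased); the Hive connector's exact substitution logic may differ, and the hostname in the usage line is hypothetical.

```python
import re


def resolve_principal(principal: str, hostname: str) -> str:
    """Expand a service/_HOST@REALM principal for the given host.

    Only an instance component that is exactly _HOST is replaced, and the
    hostname is lower-cased. Other principals are returned unchanged.
    """
    parts = re.split(r"[/@]", principal)
    if len(parts) != 3 or parts[1] != "_HOST":
        return principal
    service, _, realm = parts
    return f"{service}/{hostname.lower()}@{realm}"


# Usage with the example principal above (the hostname is hypothetical):
print(resolve_principal("hive/_HOST@EXAMPLE.COM", "Metastore-1.example.com"))
# hive/metastore-1.example.com@EXAMPLE.COM
```

The same expansion applies to hive.metastore.client.principal, except that the hostname is that of the Presto node making the connection.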

hive.metastore.client.principal

The Kerberos principal that Presto will use when connecting to the Hive metastore.

The _HOST placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector will substitute in the hostname of the worker node Presto is running on. This is useful if each worker node has its own Kerberos principal.

Example: presto/presto-server-node@EXAMPLE.COM or presto/_HOST@EXAMPLE.COM.

This property is optional; no default value.

Warning

The principal specified by hive.metastore.client.principal must have sufficient privileges to remove files and directories within the hive/warehouse directory. If the principal does not, only the metadata will be removed, and the data will continue to consume disk space.

This occurs because the Hive metastore is responsible for deleting the internal table data. When the metastore is configured to use Kerberos authentication, all of the HDFS operations performed by the metastore are impersonated. Errors deleting data are silently ignored.

hive.metastore.client.keytab

The path to the keytab file that contains a key for the principal specified by hive.metastore.client.principal. This file must be readable by the operating system user running Presto.

This property is optional; no default value.

Example configuration with NONE authentication

hive.metastore.authentication.type=NONE

The default authentication type for the Hive metastore is NONE. When the authentication type is NONE, Presto connects to an unsecured Hive metastore. Kerberos is not used.

Example configuration with KERBEROS authentication

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=presto@EXAMPLE.COM
hive.metastore.client.keytab=/etc/presto/hive.keytab

When the authentication type for the Hive metastore Thrift service is KERBEROS, Presto will connect as the Kerberos principal specified by the property hive.metastore.client.principal. Presto will authenticate this principal using the keytab specified by the hive.metastore.client.keytab property, and will verify that the identity of the metastore matches hive.metastore.service.principal.

Keytab files must be distributed to every node in the cluster that runs Presto.

See Additional Information About Keytab Files.
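Before deploying, you might check that a catalog file is consistent with the rules described above. The following Python sketch is a hypothetical pre-flight check, not part of Presto: its properties parser is simplified (no line continuations or escapes), and it treats all three Kerberos properties as required whenever hive.metastore.authentication.type=KERBEROS, following the description above.

```python
from typing import Dict, List

KERBEROS_METASTORE_KEYS = (
    "hive.metastore.service.principal",
    "hive.metastore.client.principal",
    "hive.metastore.client.keytab",
)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def metastore_auth_problems(props: Dict[str, str]) -> List[str]:
    """List problems with the metastore authentication settings."""
    auth_type = props.get("hive.metastore.authentication.type", "NONE")
    if auth_type == "NONE":
        return []  # default: unsecured metastore, nothing else to check
    if auth_type != "KERBEROS":
        return [f"unsupported hive.metastore.authentication.type: {auth_type}"]
    return [f"missing {key}" for key in KERBEROS_METASTORE_KEYS if not props.get(key)]


example = """
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=presto@EXAMPLE.COM
hive.metastore.client.keytab=/etc/presto/hive.keytab
"""
print(metastore_auth_problems(parse_properties(example)))  # []
```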

HDFS Authentication

In a Kerberized Hadoop cluster, Presto authenticates to HDFS using Kerberos. Kerberos authentication for HDFS is configured in the connector’s properties file using the following properties:

Property Name Description
hive.hdfs.authentication.type HDFS authentication type. Possible values are NONE or KERBEROS.
hive.hdfs.impersonation.enabled Enable HDFS end-user impersonation.
hive.hdfs.presto.principal The Kerberos principal that Presto will use when connecting to HDFS.
hive.hdfs.presto.keytab HDFS client keytab location.

hive.hdfs.authentication.type

One of NONE or KERBEROS. When using the default value of NONE, Kerberos authentication is disabled and no other properties need to be configured.

When set to KERBEROS, the Hive connector authenticates to HDFS using Kerberos.

This property is optional; the default is NONE.

hive.hdfs.impersonation.enabled

Enable end-user HDFS impersonation.

The section End User Impersonation gives an in-depth explanation of HDFS impersonation.

This property is optional; the default is false.

hive.hdfs.presto.principal

The Kerberos principal that Presto will use when connecting to HDFS.

The _HOST placeholder can be used in this property value. When connecting to HDFS, the Hive connector will substitute in the hostname of the worker node Presto is running on. This is useful if each worker node has its own Kerberos principal.

Example: presto-hdfs-superuser/presto-server-node@EXAMPLE.COM or presto-hdfs-superuser/_HOST@EXAMPLE.COM.

This property is optional; no default value.

hive.hdfs.presto.keytab

The path to the keytab file that contains a key for the principal specified by hive.hdfs.presto.principal. This file must be readable by the operating system user running Presto.

This property is optional; no default value.

Example configuration with NONE authentication

hive.hdfs.authentication.type=NONE

The default authentication type for HDFS is NONE. When the authentication type is NONE, Presto connects to HDFS using Hadoop’s simple authentication mechanism. Kerberos is not used.

Example configuration with KERBEROS authentication

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=hdfs@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/presto/hdfs.keytab

When the authentication type is KERBEROS, Presto accesses HDFS as the principal specified by the hive.hdfs.presto.principal property. Presto authenticates this principal using the keytab specified by the hive.hdfs.presto.keytab property.

Keytab files must be distributed to every node in the cluster that runs Presto.

See Additional Information About Keytab Files.

End User Impersonation

Impersonation Accessing HDFS

Presto can impersonate the end user who is running a query. When a user runs a query from the command line interface, the end user is the username associated with the Presto CLI process, or the value of the optional --user option if it is given. Impersonating the end user can provide additional security when accessing HDFS if HDFS permissions or ACLs are used.

HDFS Permissions and ACLs are explained in the HDFS Permissions Guide.

NONE authentication with HDFS impersonation

hive.hdfs.authentication.type=NONE
hive.hdfs.impersonation.enabled=true

When using NONE authentication with impersonation, Presto impersonates the user who is running the query when accessing HDFS. The user Presto is running as must be allowed to impersonate this user, as discussed in the section Impersonation in Hadoop. Kerberos is not used.

KERBEROS Authentication With HDFS Impersonation

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=presto@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/presto/hdfs.keytab

When using KERBEROS authentication with impersonation, Presto impersonates the user who is running the query when accessing HDFS. The principal specified by the hive.hdfs.presto.principal property must be allowed to impersonate this user, as discussed in the section Impersonation in Hadoop. Presto authenticates hive.hdfs.presto.principal using the keytab specified by hive.hdfs.presto.keytab.

Keytab files must be distributed to every node in the cluster that runs Presto.

See Additional Information About Keytab Files.
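The combinations of hive.hdfs.authentication.type and hive.hdfs.impersonation.enabled described above can be summarized as a decision table. This Python sketch is only an illustration: the function, its parameters and the returned description format are hypothetical, and it does not perform any HDFS access.

```python
from typing import Optional


def hdfs_identity(auth_type: str, impersonation_enabled: bool, process_user: str,
                  presto_principal: Optional[str], query_user: str) -> str:
    """Describe whose identity is used for HDFS access in each mode above."""
    if auth_type == "NONE":
        login = process_user  # simple authentication as the OS user running Presto
    elif auth_type == "KERBEROS":
        if not presto_principal:
            raise ValueError("KERBEROS requires hive.hdfs.presto.principal")
        login = presto_principal  # authenticated with hive.hdfs.presto.keytab
    else:
        raise ValueError(f"unsupported hive.hdfs.authentication.type: {auth_type}")
    if impersonation_enabled:
        # The login identity acts as a proxy user for the end user; Hadoop
        # must allow this (see Impersonation in Hadoop).
        return f"{query_user} (proxied by {login})"
    return login
```

Without impersonation, every query reaches HDFS as the same identity; with it, HDFS permission checks apply to the end user instead.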

Impersonation Accessing the Hive Metastore

Presto does not currently support impersonating the end user when accessing the Hive metastore.

Impersonation in Hadoop

In order to use NONE authentication with HDFS impersonation or KERBEROS Authentication With HDFS Impersonation, the Hadoop cluster must be configured to allow the user or principal that Presto is running as to impersonate the users who log in to Presto. Impersonation in Hadoop is configured in the file core-site.xml. A complete description of the configuration options can be found in the Hadoop documentation.
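As an illustration, Hadoop's proxy-user rules are expressed with hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups properties in core-site.xml. The following Python sketch only generates such a fragment; the function name, host names and group name are hypothetical, and for Kerberos the user is assumed to be the short name the Presto principal maps to (typically presto for presto@EXAMPLE.COM). Check the Hadoop documentation for the full set of options.

```python
import xml.etree.ElementTree as ET


def proxyuser_config(superuser: str, hosts: str, groups: str) -> str:
    """Build a core-site.xml <configuration> fragment with proxy-user rules.

    superuser -- short user name that Presto runs as or authenticates as
    hosts     -- comma-separated hosts allowed to impersonate, or *
    groups    -- comma-separated groups whose members may be impersonated, or *
    """
    configuration = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{superuser}.{suffix}"
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical host and group names:
print(proxyuser_config("presto", "presto-coordinator.example.com,presto-worker-1.example.com", "analysts"))
```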

Additional Information About Keytab Files

Keytab files contain encryption keys that are used to authenticate principals to the Kerberos KDC. These encryption keys must be stored securely; take the same precautions to protect them that you would take to protect SSH private keys.

In particular, access to keytab files should be limited to the accounts that actually need to use them to authenticate. In practice, this is the user that the Presto process runs as. The ownership and permissions on keytab files should be set to prevent other users from reading or modifying the files.

Keytab files need to be distributed to every node running Presto. In common deployments, the Hive connector configuration is the same on all nodes, so the keytab needs to be in the same location on every node.

You should ensure that the keytab files have the correct permissions on every node after distributing them.
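One way to verify permissions after distributing keytabs is a small script run on each node. This Python sketch (POSIX only; the function name and the path in the commented usage line are hypothetical) reports a keytab that is not owned by the expected user or that is accessible by group or others.

```python
import os
import stat
from typing import List


def keytab_problems(path: str, expected_uid: int) -> List[str]:
    """Report ownership or permission problems for a keytab file (POSIX only)."""
    info = os.stat(path)
    problems = []
    if info.st_uid != expected_uid:
        problems.append(f"{path}: owned by uid {info.st_uid}, expected {expected_uid}")
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path}: accessible by group or others ({stat.filemode(info.st_mode)})")
    return problems


# Hypothetical path; run as the user that the Presto process runs as.
# print(keytab_problems("/etc/presto/hive.keytab", os.getuid()))
```

A mode of 0600 (or 0400) owned by the Presto user passes both checks.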

File Based Authorization

The config file is specified using JSON and is composed of three sections, each of which is a list of rules that are matched in the order specified in the config file. The user is granted the privileges from the first matching rule. All regexes default to .* if not specified.

Schema Rules

These rules govern who is considered an owner of a schema.

  • user (optional): regex to match against user name.
  • schema (optional): regex to match against schema name.
  • owner (required): boolean indicating ownership.

Table Rules

These rules govern the privileges granted on specific tables.

  • user (optional): regex to match against user name.
  • schema (optional): regex to match against schema name.
  • table (optional): regex to match against table name.
  • privileges (required): zero or more of SELECT, INSERT, DELETE, OWNERSHIP, GRANT_SELECT.

Session Property Rules

These rules govern who may set session properties.

  • user (optional): regex to match against user name.
  • property (optional): regex to match against session property name.
  • allow (required): boolean indicating whether this session property may be set.

See below for an example.

{
  "schemas": [
    {
      "user": "admin",
      "schema": ".*",
      "owner": true
    },
    {
      "user": "guest",
      "owner": false
    },
    {
      "schema": "default",
      "owner": true
    }
  ],
  "tables": [
    {
      "user": "admin",
      "privileges": ["SELECT", "INSERT", "DELETE", "OWNERSHIP"]
    },
    {
      "user": "banned_user",
      "privileges": []
    },
    {
      "schema": "default",
      "table": ".*",
      "privileges": ["SELECT"]
    }
  ],
  "sessionProperties": [
    {
      "property": "force_local_scheduling",
      "allow": true
    },
    {
      "user": "admin",
      "property": "max_split_size",
      "allow": true
    }
  ]
}
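The first-match semantics can be illustrated with a Python sketch that evaluates a config file like the one above. It is not Presto's implementation: the helper names are hypothetical, regexes are assumed to match the whole string (Python's re.fullmatch; Python and Java regex syntax differ in some details), and denying access when no rule matches is an assumption.

```python
import json
import re
from typing import Any, Dict, List, Optional

Rule = Dict[str, Any]


def _field_matches(rule: Rule, field: str, value: str) -> bool:
    # An unspecified regex defaults to .*; assumed to match the whole string.
    return re.fullmatch(rule.get(field, ".*"), value) is not None


def first_matching_rule(rules: List[Rule], **attributes: str) -> Optional[Rule]:
    """Return the first rule whose regexes match every given attribute."""
    for rule in rules:
        if all(_field_matches(rule, f, v) for f, v in attributes.items()):
            return rule
    return None  # assumption: no matching rule means access is denied


def load_config(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as config_file:
        return json.load(config_file)


def is_schema_owner(config: Dict[str, Any], user: str, schema: str) -> bool:
    rule = first_matching_rule(config.get("schemas", []), user=user, schema=schema)
    return bool(rule and rule["owner"])


def table_privileges(config: Dict[str, Any], user: str, schema: str, table: str) -> List[str]:
    rule = first_matching_rule(config.get("tables", []), user=user, schema=schema, table=table)
    return list(rule["privileges"]) if rule else []


def may_set_session_property(config: Dict[str, Any], user: str, prop: str) -> bool:
    rule = first_matching_rule(config.get("sessionProperties", []), user=user, property=prop)
    return bool(rule and rule["allow"])
```

With the example file above, table_privileges(config, "alice", "default", "orders") returns ["SELECT"], while banned_user gets [] for every table because the second table rule matches before the third.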

Apache Ranger Based Authorization

When ranger security is enabled, Presto enforces the same SQL standard based authorization as Hive does when Ranger is enabled for Hive.

See Presto with Apache Ranger for additional information.

Property Name Description
ranger.policy-rest-url URL of the Ranger service.
ranger.service-name Ranger Presto plugin service name.
ranger.authentication-type Authentication method used when connecting to the Ranger service. Possible values are BASIC (HTTP basic authentication) and KERBEROS (Kerberos authentication).
ranger.presto-plugin-username Ranger Presto plugin user name. This property is used when ranger.authentication-type=BASIC is set.
ranger.presto-plugin-password Ranger Presto plugin user password. This property is used when ranger.authentication-type=BASIC is set.
ranger.kerberos-principal Ranger client Kerberos principal. This property is used when ranger.authentication-type=KERBEROS is set.
ranger.kerberos-keytab Ranger client Kerberos keytab file location. This property is used when ranger.authentication-type=KERBEROS is set.
ranger.plugin-policy-ssl-config-file Path to Ranger SSL configuration file.
ranger.policy-refresh-interval Interval determining how often authorization policies are refreshed. This is the maximum delay before changes to Ranger authorization policies become visible in Presto. By default it is set to 30s.
ranger.policy-connection-timeout Ranger service connection timeout. By default it is set to 30s.
ranger.policy-read-timeout Ranger service read timeout. By default it is set to 30s.
ranger.cache-ttl Period for which group mapping information is cached in Presto. 0ms disables the cache. By default it is set to 30s.
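The "This property is used when ..." notes in the table imply which credentials each authentication type needs. The following Python sketch is a hypothetical pre-flight check built on that reading; treating ranger.policy-rest-url, ranger.service-name and an explicit ranger.authentication-type as required is an assumption, not documented Presto behavior.

```python
from typing import Dict, List

# Credentials each ranger.authentication-type needs, following the
# "This property is used when ..." notes in the table above.
RANGER_AUTH_KEYS = {
    "BASIC": ("ranger.presto-plugin-username", "ranger.presto-plugin-password"),
    "KERBEROS": ("ranger.kerberos-principal", "ranger.kerberos-keytab"),
}


def ranger_problems(props: Dict[str, str]) -> List[str]:
    """List missing or inconsistent Ranger settings (hypothetical pre-flight check)."""
    problems = [f"missing {key}"
                for key in ("ranger.policy-rest-url", "ranger.service-name")
                if not props.get(key)]
    auth_type = props.get("ranger.authentication-type")
    if auth_type not in RANGER_AUTH_KEYS:
        problems.append(f"unsupported ranger.authentication-type: {auth_type}")
        return problems
    problems.extend(f"missing {key}" for key in RANGER_AUTH_KEYS[auth_type]
                    if not props.get(key))
    return problems
```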

Sample Configuration

The following is a sample of a Hive Connector configuration file that is configured to use Apache Ranger for authorization. It utilizes Kerberos for authentication.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=ranger

ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

Apache Sentry Based Authorization

When sentry security is enabled, Presto enforces the same SQL standard based authorization as Hive does when Sentry is enabled for Hive.

See Presto with Apache Sentry for additional information.

Property Name Description
sentry.server The name of the server object in Sentry that Presto will use to find authorization rules. This should be set to the value of hive.sentry.server from Hive’s configuration XML files.
sentry.admin-user Admin user of Apache Sentry that has ALL access to the server object. This must be a user that belongs to one of the groups listed in the sentry.service.admin.group property in the sentry-site.xml Sentry service configuration file.
sentry.rpc-addresses Address on which the Sentry RPC service is available.
sentry.rpc-port Port at which Sentry is listening.
sentry.authentication-type Authentication method that will be used when connecting to Sentry service. Possible values are NONE or KERBEROS.
sentry.service-principal Sentry service Kerberos principal that will be used to authenticate the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.
sentry.client-principal Sentry client Kerberos principal that will be used to authenticate the client when connecting to Sentry service. The primary part of this principal (user) should be included in sentry.service.allow.connect property in sentry-site.xml Sentry service configuration file. This property is only used when sentry.authentication-type=KERBEROS.
sentry.client-key-tab Sentry client Kerberos keytab file location that will be used to authenticate the client when connecting to the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.
sentry.cache-ttl Period for which information returned by Sentry is cached in Presto. 0ms disables the cache. By default it is set to 1m.
sentry.group-mapping Defines how user groups are determined. Possible values are HADOOP_DEFAULT (user groups are retrieved from the Hadoop client library; you may want to use sentry.config.resources to customize this behaviour), SYSTEM (user groups are retrieved from the operating system that Presto is running on) and LDAP (user groups are retrieved from LDAP).
sentry.ldap.url Address of the LDAP service. Used when sentry.group-mapping=LDAP.
sentry.ldap.user LDAP user name. Used when sentry.group-mapping=LDAP.
sentry.ldap.password LDAP user password. Used when sentry.group-mapping=LDAP.
sentry.ldap.search-base Search base for the LDAP connection. Used when sentry.group-mapping=LDAP.
sentry.ldap.user-search-filter Additional filters to apply when searching for users. Used when sentry.group-mapping=LDAP.
sentry.ldap.group-search-filter Additional filters to apply when finding relevant groups. Used when sentry.group-mapping=LDAP.
sentry.ldap.group-member-attribute LDAP attribute used to determine group membership. Used when sentry.group-mapping=LDAP.
sentry.ldap.group-name-attribute LDAP attribute used to identify a group’s name. Used when sentry.group-mapping=LDAP.
sentry.group-mapping.cache-ttl Period for which group mapping information is cached in Presto. 0ms disables the cache. By default it is set to 1m.
sentry.group-mapping.negative-cache-ttl Period for which empty group mapping results (users with no groups) are cached in Presto. 0ms disables the cache.
sentry.config.resources Additional XML configuration files which will be read before applying Presto Sentry configuration. Useful for reusing existing sentry-site.xml configuration files.
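Several properties above take durations such as 30s, 1m or 0ms, where a zero value disables the corresponding cache. The following Python sketch parses such values; the accepted unit suffixes (ns, us, ms, s, m, h, d) are an assumption about the duration syntax used in Presto configuration, and the function names are hypothetical.

```python
import re

# Assumed unit suffixes for Presto duration values such as 30s, 1m or 0ms.
_SECONDS_PER_UNIT = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0,
}
_DURATION_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def duration_seconds(value: str) -> float:
    """Convert a duration string such as 30s or 1m to seconds."""
    match = _DURATION_PATTERN.match(value)
    if match is None or match.group(2) not in _SECONDS_PER_UNIT:
        raise ValueError(f"not a valid duration: {value!r}")
    return float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def cache_enabled(ttl: str) -> bool:
    """A zero TTL such as 0ms disables the cache."""
    return duration_seconds(ttl) > 0
```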

Sample Configuration

The following is a sample of a Hive Connector configuration file that is configured to use Apache Sentry for authorization. It utilizes Kerberos for authentication and LDAP for group mapping.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

sentry.authentication-type=KERBEROS

sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=presto-server/presto-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/presto/conf/presto-server.keytab

sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=presto,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=presto,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn

sentry.group-mapping.cache-ttl=10s
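The sample sentry.ldap.user-search-filter contains a {0} placeholder. Assuming it is replaced by the name of the user being looked up, the following Python sketch shows the substitution together with the value escaping required by RFC 4515 for LDAP search filters. The function names are hypothetical, and whether Presto escapes values this way is not stated here.

```python
# RFC 4515 requires these characters to be escaped in filter values.
_LDAP_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a value for safe use inside an LDAP search filter (RFC 4515)."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def user_search_filter(template: str, username: str) -> str:
    """Substitute the escaped user name for the {0} placeholder (assumed meaning)."""
    return template.replace("{0}", escape_filter_value(username))


template = "(&(objectClass=inetOrgPerson)(uid={0}))"
print(user_search_filter(template, "alice"))  # (&(objectClass=inetOrgPerson)(uid=alice))
```

Escaping prevents a user name such as a* from widening the search to every matching entry.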

HDFS wire encryption

In a Kerberized Hadoop cluster with HDFS wire encryption enabled, you can enable Presto to access HDFS by setting the following property:

Property Name Description
hive.hdfs.wire-encryption.enabled Enables HDFS wire encryption. Possible values are true or false.

Note

Depending on the Presto installation and configuration, using wire encryption may reduce query execution performance.