9.5. Starburst Hive Connector Security

The Hive connector can be configured to use Apache Sentry (sentry) or Apache Ranger (ranger) as the security backend for Hive object access. Select the backend with the hive.security property in the catalog properties file:

hive.security=ranger

This setting must be combined with the properties specific to the selected system, described in the sections below.

Apache Sentry Based Authorization

When sentry security is enabled, Presto enforces the same SQL standard based authorization as Hive does when Sentry is enabled for Hive.

Sentry authorization information can be accessed by querying the following tables:
  • information_schema.roles - returns information about all existing roles (equivalent to SHOW ROLES)
  • information_schema.applicable_roles - returns the roles that are granted to the current user
  • information_schema.enabled_roles - returns the roles that are currently enabled for the current user (equivalent to SHOW CURRENT ROLES)
  • information_schema.table_privileges - returns all table privileges granted to the user according to the currently enabled roles

Presto does not support any modification of authorization policies in Sentry.

Further information is available in the documentation about the Hive level security with Apache Sentry.

Property Name | Description
--- | ---
sentry.server | The name of the server object in Sentry that Presto uses to find authorization rules. Set this to the value of hive.sentry.server from Hive’s configuration XML files.
sentry.admin-user | Admin user of Apache Sentry that has ALL access to the server object. This user must belong to one of the groups listed in the sentry.service.admin.group property in the sentry-site.xml Sentry service configuration file.
sentry.rpc-addresses | Address at which the Sentry RPC service is available.
sentry.rpc-port | Port on which Sentry is listening.
sentry.authentication-type | Authentication method used when connecting to the Sentry service. Possible values are NONE or KERBEROS.
sentry.service-principal | Kerberos principal of the Sentry service, used to authenticate the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.
sentry.client-principal | Kerberos principal of the Sentry client, used to authenticate the client when connecting to the Sentry service. The primary part of this principal (the user) should be included in the sentry.service.allow.connect property in the sentry-site.xml Sentry service configuration file. This property is only used when sentry.authentication-type=KERBEROS.
sentry.client-key-tab | Location of the Sentry client Kerberos keytab file, used to authenticate the client when connecting to the Sentry service. This property is only used when sentry.authentication-type=KERBEROS.
sentry.cache-ttl | Period for which information returned by Sentry is cached in Presto. 0ms disables the cache. The default is 1m.
sentry.group-mapping | Defines how user groups are determined. Possible values are: HADOOP_DEFAULT (user groups are retrieved from the Hadoop client library; sentry.config.resources can be used to customize this behavior), SYSTEM (user groups are retrieved from the operating system that Presto is running on), LDAP (user groups are retrieved from LDAP).
sentry.ldap.url | Address of the LDAP service when sentry.group-mapping=LDAP.
sentry.ldap.user | LDAP user name when sentry.group-mapping=LDAP.
sentry.ldap.password | LDAP user password when sentry.group-mapping=LDAP.
sentry.ldap.search-base | Search base for the LDAP connection when sentry.group-mapping=LDAP.
sentry.ldap.user-search-filter | Additional filters to apply when searching for users when sentry.group-mapping=LDAP.
sentry.ldap.group-search-filter | Additional filters to apply when finding relevant groups when sentry.group-mapping=LDAP.
sentry.ldap.group-member-attribute | LDAP attribute used to determine group membership when sentry.group-mapping=LDAP.
sentry.ldap.group-name-attribute | LDAP attribute used to identify a group’s name when sentry.group-mapping=LDAP.
sentry.group-mapping.cache-ttl | Period for which group mapping information is cached in Presto. 0ms disables the cache. The default is 1m.
sentry.group-mapping.negative-cache-ttl | Period for which information about empty groups is cached in Presto. 0ms disables the cache.
sentry.config.resources | Additional XML configuration files that are read before the Presto Sentry configuration is applied. Useful for reusing existing sentry-site.xml configuration files.
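
As an illustration of sentry.config.resources, the fragment below is a minimal sketch of a catalog that reuses an existing Sentry service configuration file. The file path is an assumption (a hypothetical location) and the remaining values are placeholders. Because the XML file is read before the Presto Sentry configuration is applied, the sentry.* properties are still set explicitly in this sketch.

```properties
hive.security=sentry

# Hypothetical path to an existing Sentry service configuration file
sentry.config.resources=/etc/sentry/conf/sentry-site.xml

# Placeholder values
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

# User groups are retrieved from the Hadoop client library
sentry.group-mapping=HADOOP_DEFAULT
```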

Sample Configuration

The following is a sample Hive connector configuration file that is configured to use Apache Sentry for authorization. It uses Kerberos for authentication and LDAP for group mapping.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

sentry.authentication-type=KERBEROS

sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=presto-server/presto-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/presto/conf/presto-server.keytab

sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=presto,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=presto,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn

sentry.group-mapping.cache-ttl=10s
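
For a cluster without Kerberos or LDAP, the Sentry part of the catalog file can be considerably shorter. The fragment below is a minimal sketch, not a tested deployment: it uses only properties from the table above, the server and host names are placeholders carried over from the sample, and the TTL values are arbitrary illustrations.

```properties
hive.security=sentry

# Placeholder values, reused from the sample above
sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

# No Kerberos: the Sentry principal and keytab properties are not used
sentry.authentication-type=NONE

# Retrieve user groups from the operating system that Presto runs on
sentry.group-mapping=SYSTEM

# Illustrative TTLs; 0ms would disable the respective cache
sentry.cache-ttl=30s
sentry.group-mapping.cache-ttl=1m
sentry.group-mapping.negative-cache-ttl=10s
```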

Apache Ranger Based Authorization

When ranger security is enabled, Presto enforces the same SQL standard based authorization as Hive does when Ranger is enabled for Hive.

Further information is available in the documentation about the Hive level security with Apache Ranger.

Property Name | Description
--- | ---
ranger.policy-rest-url | URL of the Ranger service.
ranger.service-name | Ranger Presto plugin service name.
ranger.authentication-type | Authentication method used when connecting to the Ranger service. Possible values are: BASIC (HTTP basic authentication), KERBEROS (Kerberos authentication). Note that ranger.authentication-type=KERBEROS requires the https protocol to be used for ranger.policy-rest-url.
ranger.presto-plugin-username | Ranger Presto plugin user name. This property is used when ranger.authentication-type=BASIC is set.
ranger.presto-plugin-password | Ranger Presto plugin user password. This property is used when ranger.authentication-type=BASIC is set.
ranger.kerberos-principal | Ranger client Kerberos principal. This property is used when ranger.authentication-type=KERBEROS is set.
ranger.kerberos-keytab | Location of the Ranger client Kerberos keytab file. This property is used when ranger.authentication-type=KERBEROS is set.
ranger.plugin-policy-ssl-config-file | Path to the Ranger SSL configuration file. This file is required when the https protocol is used for ranger.policy-rest-url.
ranger.policy-refresh-interval | Interval at which authorization policies are refreshed. This is the maximum delay before changes in Ranger authorization policies become visible in Presto. The default is 30s.
ranger.policy-connection-timeout | Ranger service connection timeout. The default is 30s.
ranger.policy-read-timeout | Ranger service read timeout. The default is 30s.
ranger.policy-cache-dir | Path to the Ranger cache directory for policies. It allows policies to be loaded from the cache on startup, even if Ranger Policy Admin is not available at that moment.
ranger.cache-ttl | Period for which group mapping information is cached in Presto. 0ms disables the cache. The default is 30s.
ranger.cache-refresh-interval | Interval at which group mapping information is refreshed in Presto. Any value greater than ranger.cache-ttl disables refreshing. Refreshing is disabled by default.
ranger.enable-row-filtering | Set this flag to true to enable row filtering. Note that there are semantic differences between Presto SQL and Hive QL.
ranger.wild-card-resource-matching-for-row-filtering | Set this flag to true to enable resource wildcard matching for row filtering. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undefined which one is used.
ranger.wild-card-resource-matching-for-column-masking | Set this flag to true to enable resource wildcard matching for column masking. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undefined which one is used.
ranger.config-resources | Additional XML configuration files that are read before the Presto Ranger configuration is applied. Useful for reusing an existing Hive Ranger configuration, for example the Ranger Audit configuration.
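
The optional properties above might be combined as in the following sketch. All values are illustrative assumptions rather than recommendations, and the cache directory is a hypothetical path. The refresh interval is deliberately kept below ranger.cache-ttl, since a larger value would disable refreshing.

```properties
# Poll Ranger for policy changes every 10s (the documented default is 30s)
ranger.policy-refresh-interval=10s
ranger.policy-connection-timeout=30s
ranger.policy-read-timeout=30s

# Hypothetical directory; lets policies be loaded from the cache on startup
ranger.policy-cache-dir=/var/lib/presto/ranger-policy-cache

# Cache group mappings for 30s and refresh them every 10s.
# A refresh interval greater than ranger.cache-ttl would disable refreshing.
ranger.cache-ttl=30s
ranger.cache-refresh-interval=10s

# Opt in to row filtering and to wildcard resource matching
ranger.enable-row-filtering=true
ranger.wild-card-resource-matching-for-row-filtering=true
ranger.wild-card-resource-matching-for-column-masking=true
```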

Column masking

Presto Hive Ranger integration supports most of the column masking methods that are supported in Hive with Ranger. Presto does not distinguish between upper case letters, lower case letters, and digits when masking: the character x is used for all of these character types.

Sample Configuration

The following is a sample of a Hive Connector configuration file that is configured to use Apache Ranger for authorization. It utilizes Kerberos for authentication.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=ranger

ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

The following is a sample Ranger SSL configuration file.

<configuration>
    <!--  The following properties are used for 2-way SSL client server validation -->
    <property>
        <name>xasecure.policymgr.clientssl.keystore</name>
        <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore</name>
        <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
</configuration>
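
For comparison with the Kerberos sample above, the following is a minimal sketch of a catalog that connects to Ranger with HTTP basic authentication. The user name, password, and the plain http URL with port 6080 are assumptions for illustration. Since https is not used in this sketch, ranger.plugin-policy-ssl-config-file is left out.

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.security=ranger

# Assumed plain-http endpoint; only KERBEROS is documented as requiring https
ranger.policy-rest-url=http://ranger-host:6080
ranger.service-name=hive

ranger.authentication-type=BASIC
# Hypothetical credentials of the Ranger Presto plugin user
ranger.presto-plugin-username=presto-plugin
ranger.presto-plugin-password=secret1234
```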

Configuring Ranger Audit

Ranger Audit is configured in the Ranger-specific file /etc/hive/conf/ranger-hive-audit.xml. The location of the file has to be specified in your catalog properties file:

ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml

Controlling access to User Defined Functions with Ranger

Ranger System Access Control is used to enforce User Defined Function (UDF) policies set by Ranger. A UDF in Presto is deployed as a plugin (Functions) and stored in the Presto global namespace. This global namespace is managed at the System Access Control (see System Access Control) level. This is independent of the Ranger Connector Access Control used to enforce all other Ranger policies at the catalog level.

The Ranger resource hierarchy requires every UDF policy to have an associated database (or schema) namespace when the policy is created. Because the Presto global namespace is independent of any connector namespace, this makes it harder to control access to UDFs using Ranger. To overcome this, specify $presto as the database name in Ranger. This keeps all Presto functions under the $presto database in the Ranger resource hierarchy.

To configure Ranger System Access Control for User Defined Functions, add the following properties to the System Access Control property file. An example etc/access-control.properties file:

access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

All Ranger properties that are allowed in hive.properties are allowed here. However, Ranger properties related to row filtering or column masking are ignored. This additional configuration is needed because the Ranger System Access Control uses a Ranger client that is independent of the one used by the Ranger Connector Access Control.

Recall that in Presto only one System Access Control can be defined, while a Connector Access Control can be defined for each Presto catalog. In a scenario with multiple Hive connectors and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.
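
The following sketch illustrates that scenario with three separate files shown in one listing. The catalog file names and the Ranger service names hive_a and hive_b are hypothetical, and the connector and authentication properties are omitted for brevity.

```properties
# --- etc/catalog/hive_a.properties (hypothetical catalog) ---
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive_a

# --- etc/catalog/hive_b.properties (hypothetical catalog) ---
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive_b

# --- etc/access-control.properties (only one System Access Control can be defined) ---
access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
# UDF policies are read from the hive_a service only
ranger.service-name=hive_a
```

In this layout, table-level policies are still enforced per catalog by each Ranger service, while UDF policies defined in the hive_b service are not consulted.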

Limitations

Ranger authorization in Presto has the following limitations:
  • Authorization information cannot be accessed by querying tables such as information_schema.roles, information_schema.applicable_roles, information_schema.enabled_roles, or information_schema.table_privileges.
  • Presto does not support any modification of authorization policies in Ranger; this includes commands such as CREATE ROLE..., GRANT... and REVOKE.
  • Presto does not support SET ROLE... for Ranger authorization; by default, all roles applicable to the user are enabled.
  • When an unsupported column masking method is configured, MASK_NULL is used instead.

HDFS wire encryption

In a Kerberized Hadoop cluster with HDFS wire encryption enabled, you can allow Presto to access HDFS by setting the property below.

Property Name | Description
--- | ---
hive.hdfs.wire-encryption.enabled | Enables HDFS wire encryption. Possible values are true or false.

Note

Depending on the Presto installation configuration, using wire encryption may impact query execution performance.
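
As a minimal sketch, the HDFS part of a catalog properties file with wire encryption enabled might look as follows. The principal and keytab values are the placeholders used in the sample configurations earlier in this section.

```properties
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

# Required when the Hadoop cluster has HDFS wire encryption enabled
hive.hdfs.wire-encryption.enabled=true
```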