11.5. Starburst Hive Connector Security

The Hive connector can be configured to use Apache Sentry (sentry) or Apache Ranger (ranger) as the security backend for Hive object access. Select the backend with the hive.security property in the catalog properties file:

hive.security=ranger

Each backend also requires its own set of configuration properties, described in the following sections.

Apache Sentry Based Authorization

When sentry security is enabled, Presto enforces the same SQL standard-based authorization that Hive enforces when Sentry is enabled for Hive.

Sentry authorization information can be accessed by querying the following tables:
  • information_schema.roles - returns information about all existing roles (equivalent of SHOW ROLES)

  • information_schema.applicable_roles - returns the roles granted to the current user

  • information_schema.enabled_roles - returns the roles the current user is using at the moment (equivalent of SHOW CURRENT ROLES)

  • information_schema.table_privileges - returns all table privileges granted to the current user through the currently enabled roles

Presto does not support any modification of authorization policies in Sentry.

Further information is available in the documentation about Hive-level security with Apache Sentry.

The Sentry integration is configured with the following catalog properties:

  • sentry.server - Name of the server object in Sentry that Presto uses to find authorization rules. Set this to the value of hive.sentry.server from Hive’s XML configuration files.

  • sentry.admin-user - Apache Sentry admin user that has ALL access to the server object. This is a user that belongs to any of the groups listed in the sentry.service.admin.group property of the Sentry service configuration file, sentry-site.xml.

  • sentry.rpc-addresses - Address on which the Sentry RPC service is available.

  • sentry.rpc-port - Port on which Sentry is listening.

  • sentry.authentication-type - Authentication method used when connecting to the Sentry service. Possible values are NONE or KERBEROS.

  • sentry.service-principal - Kerberos principal of the Sentry service, used to authenticate the Sentry service. Only used when sentry.authentication-type=KERBEROS.

  • sentry.client-principal - Kerberos principal used to authenticate the client when connecting to the Sentry service. The primary part (user) of this principal should be included in the sentry.service.allow.connect property of sentry-site.xml. Only used when sentry.authentication-type=KERBEROS.

  • sentry.client-key-tab - Location of the Kerberos keytab file used to authenticate the client when connecting to the Sentry service. Only used when sentry.authentication-type=KERBEROS.

  • sentry.cache-ttl - Period for which information returned by Sentry is cached in Presto. 0ms disables the cache. Defaults to 1m.

  • sentry.group-mapping - Defines how user groups are determined. Possible values are HADOOP_DEFAULT (user groups are retrieved from the Hadoop client library; you may want to use sentry.config.resources to customize this behavior), SYSTEM (user groups are retrieved from the operating system that Presto is running on), and LDAP (user groups are retrieved from LDAP).

  • sentry.ldap.url - Address of the LDAP service. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.user - LDAP user name. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.password - LDAP user password. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.search-base - Search base for the LDAP connection. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.user-search-filter - Additional filter to apply when searching for users. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.group-search-filter - Additional filter to apply when searching for relevant groups. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.group-member-attribute - LDAP attribute used to determine group membership. Used when sentry.group-mapping=LDAP.

  • sentry.ldap.group-name-attribute - LDAP attribute used to identify a group’s name. Used when sentry.group-mapping=LDAP.

  • sentry.group-mapping.cache-ttl - Period for which group mapping information is cached in Presto. 0ms disables the cache. Defaults to 1m.

  • sentry.group-mapping.negative-cache-ttl - Period for which empty group mapping results are cached in Presto. 0ms disables the cache.

  • sentry.config.resources - Additional XML configuration files that are read before the Presto Sentry configuration is applied. Useful for reusing existing sentry-site.xml configuration files.

Sample Configuration

The following is a sample Hive connector configuration file that is configured to use Apache Sentry for authorization. It uses Kerberos for authentication and LDAP for group mapping.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

sentry.authentication-type=KERBEROS

sentry.service-principal=sentry/sentry-node@EXAMPLE.COM
sentry.client-principal=presto-server/presto-server-node@EXAMPLE.COM
sentry.client-key-tab=/etc/presto/conf/presto-server.keytab

sentry.group-mapping=LDAP
sentry.ldap.url=ldaps://ldapserver/
sentry.ldap.user=cn=admin,dc=presto,dc=example,dc=com
sentry.ldap.password=secret1234
sentry.ldap.search-base=dc=presto,dc=example,dc=com
sentry.ldap.user-search-filter=(&(objectClass=inetOrgPerson)(uid={0}))
sentry.ldap.group-search-filter=(objectClass=groupOfNames)
sentry.ldap.group-member-attribute=member
sentry.ldap.group-name-attribute=cn

sentry.group-mapping.cache-ttl=10s
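For comparison, a minimal sketch of a Sentry setup without Kerberos might look like the following. It assumes the Sentry service accepts unauthenticated connections (sentry.authentication-type=NONE) and that groups are resolved through the Hadoop client library while an existing sentry-site.xml is reused. The path /etc/sentry/conf/sentry-site.xml and all cache periods are hypothetical values, and only the security-related lines are shown; the connector, metastore, and HDFS properties would stay as in the sample above.

```properties
hive.security=sentry

sentry.server=sentryserver
sentry.admin-user=hive
sentry.rpc-addresses=sentry-host-address
sentry.rpc-port=8038

# No Kerberos: sentry.service-principal, sentry.client-principal
# and sentry.client-key-tab are not used with NONE
sentry.authentication-type=NONE

# Resolve groups with the Hadoop client library; the reused
# sentry-site.xml (hypothetical path) is read before this file
sentry.group-mapping=HADOOP_DEFAULT
sentry.config.resources=/etc/sentry/conf/sentry-site.xml

# Illustrative cache periods; 0ms would disable each cache
sentry.cache-ttl=5m
sentry.group-mapping.cache-ttl=5m
sentry.group-mapping.negative-cache-ttl=30s
```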

Apache Ranger Based Authorization

When ranger security is enabled, Presto enforces the same SQL standard-based authorization that Hive enforces when Ranger is enabled for Hive.

Further information is available in the documentation about Hive-level security with Apache Ranger.

Ranger properties

  • ranger.policy-rest-url - URL of the Ranger service.

  • ranger.service-name - Name of the Ranger service used by the Presto plugin.

  • ranger.authentication-type - Authentication method for connecting to Ranger. Possible values are BASIC (HTTP basic authentication) and KERBEROS (Kerberos authentication). Note that ranger.authentication-type=KERBEROS requires ranger.policy-rest-url to use the https protocol.

  • ranger.presto-plugin-username - User name of the Ranger Presto plugin. Used when ranger.authentication-type=BASIC is set.

  • ranger.presto-plugin-password - Password of the Ranger Presto plugin user. Used when ranger.authentication-type=BASIC is set.

  • ranger.kerberos-principal - Kerberos principal of the Ranger client. Used when ranger.authentication-type=KERBEROS is set.

  • ranger.kerberos-keytab - Location of the Kerberos keytab file of the Ranger client. Used when ranger.authentication-type=KERBEROS is set.

  • ranger.plugin-policy-ssl-config-file - Path to the Ranger SSL configuration file. Required when ranger.policy-rest-url uses the https protocol.

  • ranger.policy-refresh-interval (default: 30s) - How often authorization policies are refreshed. This is the maximum delay before changes to Ranger authorization policies become visible in Presto.

  • ranger.policy-connection-timeout (default: 30s) - Connection timeout for the Ranger service.

  • ranger.policy-read-timeout (default: 30s) - Read timeout for the Ranger service.

  • ranger.policy-cache-dir - Path to the Ranger policy cache directory. Allows policies to be loaded from the cache on startup, even if Ranger Policy Admin is unavailable at that time.

  • ranger.cache-ttl (default: 30s) - Period for which group mapping information is cached in Presto. 0ms disables the cache.

  • ranger.cache-refresh-interval (default: 0ms, disabled) - Interval at which cached group mapping information is refreshed in Presto. Any value greater than ranger.cache-ttl disables the refresh.

  • ranger.row-filtering.enabled (default: false) - Set to true to enable row filtering. Note that there are semantic differences between Presto SQL and HiveQL, which may affect how row filter expressions defined in Ranger are evaluated.

  • ranger.wild-card-resource-matching-for-row-filtering (default: false) - Set to true to enable wildcard resource matching for row filtering. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, which one is used is undefined.

  • ranger.wild-card-resource-matching-for-column-masking (default: false) - Set to true to enable wildcard resource matching for column masking. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, which one is used is undefined.

  • ranger.config-resources - Additional XML configuration files that are read before the Presto Ranger configuration is applied. Useful for reusing an existing Hive Ranger configuration, such as the Ranger Audit configuration.

  • ranger.sql.enabled (default: true) - Enables Ranger policy management with SQL.
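As an illustration of how the tuning properties combine, the following fragment could be added to a Ranger-enabled catalog properties file. The cache directory path and all durations are hypothetical values chosen for the example, not recommendations. The group mapping refresh interval is deliberately kept below ranger.cache-ttl, because a larger value disables refreshing.

```properties
# Pick up policy changes within one minute and keep a local policy
# copy so the catalog can start while Ranger Policy Admin is down
ranger.policy-refresh-interval=1m
ranger.policy-cache-dir=/var/lib/presto/ranger-policy-cache

# Cache group mapping for 10 minutes and refresh it every 2 minutes;
# a refresh interval greater than ranger.cache-ttl would disable refreshing
ranger.cache-ttl=10m
ranger.cache-refresh-interval=2m

# Apply Ranger row filter policies, including wildcard-matched ones
ranger.row-filtering.enabled=true
ranger.wild-card-resource-matching-for-row-filtering=true
```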

Column masking

The Presto Hive Ranger integration supports most of the column masking methods that Hive supports with Ranger. When masking, Presto does not distinguish between uppercase letters, lowercase letters, and digits: x is used as the replacement character for all of them.

Sample configuration

The following is a sample of a Hive Connector configuration file that is configured to use Apache Ranger for authorization. It utilizes Kerberos for authentication.

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/presto-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=ranger

ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml
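If the Ranger service is reachable over plain HTTP, a sketch using HTTP basic authentication instead of Kerberos might look as follows. The host name, port 6080 (assumed here to be the Ranger Admin HTTP port), user name, and password are placeholders. Because the URL uses http rather than https, no ranger.plugin-policy-ssl-config-file is set.

```properties
hive.security=ranger

ranger.policy-rest-url=http://ranger-host:6080
ranger.service-name=hive

# HTTP basic authentication as the Ranger Presto plugin user
# (placeholder credentials)
ranger.authentication-type=BASIC
ranger.presto-plugin-username=presto
ranger.presto-plugin-password=changeme
```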

The following is a sample Ranger SSL configuration file.

<configuration>
    <!--  The following properties are used for 2-way SSL client server validation -->
    <property>
        <name>xasecure.policymgr.clientssl.keystore</name>
        <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore</name>
        <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
</configuration>

Configuring Ranger Audit

Ranger Audit is configured in a Ranger-specific file, for example /etc/hive/conf/ranger-hive-audit.xml. Specify the location of this file in your catalog properties file with ranger.config-resources:

ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml

Auditing works both for Hive-level security with Apache Ranger and for system-level security with Apache Ranger.

Controlling access to User Defined Functions with Ranger

Ranger System Access Control is used to enforce the User Defined Function (UDF) policies defined in Ranger. In Presto, a UDF is deployed as a plugin (see Functions) and resides in the Presto global namespace. This global namespace is managed at the System Access Control level (see System Access Control), independently of the Ranger Connector Access Control that enforces all other Ranger policies at the catalog level.

In the Ranger resource hierarchy, every UDF policy must be associated with a database (or schema) namespace when the policy is created. Because the Presto global namespace is independent of any connector namespace, this makes controlling access to UDFs with Ranger slightly more complicated. To work around this, specify $presto as the database name in Ranger. This places all Presto functions under the $presto database in the Ranger resource hierarchy.

To configure Ranger System Access Control for UDFs, add the Ranger properties to the System Access Control properties file. The following is an example etc/access-control.properties file:

access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

All Ranger properties that are allowed in the Hive catalog properties file (for example, hive.properties) are allowed here, but properties related to row filtering or column masking are ignored. This separate configuration is needed because the Ranger System Access Control uses a Ranger client that is independent of the one used by the Ranger Connector Access Control.

Recall that Presto allows only one System Access Control, while a Connector Access Control can be defined for each catalog. If there are multiple Hive catalogs and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.
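To illustrate this restriction, consider a hypothetical cluster with two Hive catalogs, hive_sales and hive_finance, each protected by its own Ranger service. The sketch below shows only the essential lines of the three files; the catalog names, Ranger service names, and metastore URIs are assumptions, and the Kerberos, SSL, and HDFS properties from the samples above are omitted. In this setup, UDF policies would have to live in the hive_sales Ranger service, the one referenced by etc/access-control.properties.

```properties
# etc/catalog/hive_sales.properties (hypothetical catalog)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://sales-metastore:9083
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive_sales

# etc/catalog/hive_finance.properties (hypothetical catalog)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://finance-metastore:9083
hive.security=ranger
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive_finance

# etc/access-control.properties: the single System Access Control
# points at one Ranger service, which holds all UDF policies
access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive_sales
```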

Limitations

Ranger authorization in Presto has the following limitations:
  • authorization information cannot be accessed by querying tables such as information_schema.roles, information_schema.applicable_roles, information_schema.enabled_roles, and information_schema.table_privileges.

  • Presto does not support any modification of authorization policies in Ranger, including commands such as CREATE ROLE ..., GRANT ..., or REVOKE ....

  • Presto does not support SET ROLE ... for Ranger authorization; all roles applicable to the user are enabled by default.

  • If an unsupported column masking method is used, MASK_NULL is applied instead.

HDFS wire encryption

In a Kerberized Hadoop cluster with HDFS wire encryption enabled, you can allow Presto to access HDFS by setting the following property:

  • hive.hdfs.wire-encryption.enabled - Enables HDFS wire encryption. Possible values are true or false.
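For example, a Kerberized catalog could combine the HDFS Kerberos properties from the sample configurations above with wire encryption. The principal and keytab values below are the same placeholders used in those samples.

```properties
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=hdfs/presto-server-node@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/hadoop/conf/hdfs.keytab

# Lets Presto access HDFS when wire encryption is enabled on the cluster
hive.hdfs.wire-encryption.enabled=true
```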

Note

Depending on the Presto installation and configuration, using wire encryption may affect query execution performance.