9.9. Presto with Apache Sentry

Overview

Apache Sentry is a granular, role-based authorization module for Hadoop. Sentry provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications such as Presto.

Presto Enterprise is integrated with Apache Sentry enforcing the same and existing privileges granted on Hive objects. Presto will enforce privileges assigned to Hive Databases, Tables, Columns, and Views. If a user does not have a privilege to query an object, the query will fail and an error will be returned.

Presto Enterprise is only available from Staburst. For more information about Presto Enterprise and Apache Sentry integration or to obtain a free trial, please contact hello@starburstdata.com.

Before You Begin

Before you configure Presto with Apache Sentry, verify the following prerequisites:

  • Cloudera Enterprise 5.12+ with Apache Sentry and Hive installed.
  • Presto Coordinator and Presto Workers have the appropriate network access to communicate with the Apache Sentry Service. Typically this is port 8038.
  • If LDAP is used for user to groups mapping, Presto Coordinator and Presto Workers have the appropriate network access to communicate with the LDAP server. Typically this is port 636 or 389.

If you are new to Apache Sentry, Cloudera provides excellent documentation for installing and configuring Apache Sentry:

Architecture

When a query is submitted to Presto, Presto parses and analyzes the query to understand the privileges required by the user to access a particular object. Presto communicates with the Apache Sentry Service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid, an error is returned to the user.

Caching is also used to improve performance and reduce the number of requests to the Sentry service.

Configuring Presto with Apache Sentry

Apache Sentry Configuration

As with Hive, Impala, Spark, and Hue, you must create a Admin Group for Presto named presto. You can do this via the Cloudera Manager or manually by adding to the property, sentry.service.admin.group in the sentry-site.xml file. The user of the Presto process should belong to this group. Additionally you must add Presto user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site.xml.

For additional information refer to the Cloudera documentation:

Presto Configuration for Apache Sentry

Starburst Presto Enterprise must be configured to enable Presto to communicate with the Apache Sentry service. To enable set the following property in the Hive connector configuration.

hive.security = sentry

Once Apache Sentry is enabled for Presto, there are additional required and optional properties to further configure. You can also see the full list of configuration properties in Apache Sentry Based Authorization.

Group Mapping

Sentry manages role permissions and the roles to user groups associations. Sentry does not manage users to user groups associations. For this reason, any application using Sentry needs to be configured to be able to determine a user’s groups. In Presto, the sentry.group-mapping property specifies how the user groups are determined. By default it is set to HADOOP_DEFAULT. See Apache Sentry Based Authorization for other possible values.

For more information from Cloudera’s documentation, refer to:

Note

It may be desired to reuse your existing sentry-site.xml configuration instead of setting new configurations in the Hive connector configuration. To have Presto use an XML configuration file, set sentry.config.resources to the file location of a sentry-site.xml configuration file.

When using HADOOP_DEFAULT group mapping and sentry.config.resources is set, and the provided file(s) contain a value for hadoop.security.group.mapping, the configured user group mapping will be used. If you do not set sentry.config.resources Presto will use Hadoop’s default behavior, which is to retrieve user groups from the local operating system. Similarly, when using LDAP group mapping and you provide Hadoop configuration files using sentry.config.resources property, you can abstain from setting LDAP Group mapping properties in Hive connector configuration.

Authorization Information

Sentry authorization information can be accessed by querying the following tables:
  • information_schema.roles - return information about all existing roles (equivalent of SHOW ROLES)
  • information_schema.applicable_roles - return roles that are granted to current user
  • information_schema.enabled_roles - return a list of roles that currently user is using at the moment (equivalent of SHOW CURRENT USER)
  • information_schema.table_privileges - return all tables privileges granted to user according to currently enabled roles

Caching

There is some latency associated with making the remote procedure calls to Apache Sentry as well as syncing LDAP groups. Presto includes a caching mechanism so that subsequent calls can look at the cache before making the remote call. See Apache Sentry Based Authorization for the cache properties along with their default values. Depending on your use case, you may want to increase or decrease the default ttl values.

Troubleshooting

  • If you get the exception GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed) then you need to make sure you are using proper sentry.service-principal.
  • If you get an SentryAccessDeniedException exception then make sure the user that you set for sentry.admin-user belongs to any group listed by sentry.service.admin.group in sentry-site.xml.
  • If Presto is not capable to connect to Kerberized Sentry and you get an exception Peer indicated failure: Problem with callback handler make sure that you added the Presto user (from sentry.client-principal) to sentry.service.allow.connect in sentry-site. Additionally, make sure the letter casing matches.
  • Make sure that your sentry.server value is correct. It is not an IP or Hostname. It is server object name in Sentry.

Limitations

Presto only enforces the Apache Sentry policies. Presto does not support any modification of authorization policies in Sentry. This includes commands like CREATE ROLE, GRANT, or REVOKE. If you need to modify the roles and privileges, that must be done via another tool such as Apache Hive or Hue. Sentry Policy Files are also not supported.