System level security with Apache Ranger

The integration of Apache Ranger with Starburst Enterprise Presto enables a global role-based access control and authorization model at the Presto system level.

Note

System level security with Apache Ranger requires a valid Starburst Enterprise Presto license.

Policies in Ranger are created with the Ranger user interface and define access and authorization. Each policy combines user and group information with a resource and access rights to the resource. Ranger is connected to your organization’s LDAP system for user and group information. Using Ranger requires installing the Presto Ranger plugin, which creates the starburst-enterprise-presto service type, or service definition, for Starburst Enterprise Presto in Ranger. This service type encompasses a Presto-specific set of resources, including catalog, schema, table, column and more, so that access rules for these resources can be configured in Ranger.

The Ranger Presto plugin is responsible for connecting to Ranger from Presto and applying the policies defined for Presto resources. Any user action in Presto, such as submitting a query, is validated against the policies from Ranger and rejected if the policies do not permit it.

For example, a query is parsed and analyzed to determine all involved resources, such as schemas and tables. Once this list is complete, Presto evaluates the policies to determine whether the user initiating the query has all necessary access rights. Processing only continues if all rights are granted.
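
The all-or-nothing rule can be sketched in Python. This is an illustration, not the plugin's code: Resource, authorize_query and the is_allowed callback are hypothetical names, and is_allowed stands in for the policy lookup that the Ranger plugin performs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Resource:
    """A table referenced by a query (hypothetical, simplified model)."""
    catalog: str
    schema: str
    table: str


def authorize_query(
    user: str,
    resources: Iterable[Resource],
    is_allowed: Callable[[str, Resource], bool],
) -> bool:
    """Return True only if the user is allowed access to every resource.

    all() stops at the first denied resource, so a single missing right
    rejects the whole query.
    """
    return all(is_allowed(user, resource) for resource in resources)


# Toy lookup standing in for the Ranger policies: only alice may use hive.sales.
def toy_lookup(user: str, resource: Resource) -> bool:
    return user == "alice" and (resource.catalog, resource.schema) == ("hive", "sales")


query_resources = [Resource("hive", "sales", "orders"), Resource("hive", "sales", "customers")]
print(authorize_query("alice", query_resources, toy_lookup))  # True
print(authorize_query("bob", query_resources, toy_lookup))    # False
```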

Note

Unlike Hive-level security, the system-level Ranger integration is suitable for defining role-based access to catalogs using any connector, as well as to a number of other system resources.

Getting started with Ranger for role-based access control can be summarized in a few steps:

  • Install Ranger and add the Presto Ranger plugin

  • Configure Ranger for user and group information from LDAP

  • Configure Presto to use Ranger

  • Define policies with the Ranger user interface

  • Enjoy access control for catalogs, schemas, tables and more for all users

Details for all these steps and more are documented in the following sections.

Ranger installation and configuration

Your first important step is the installation and configuration of Ranger, which can be summarized in the following steps:

  • Install Ranger 2.0.0 or higher

  • Configure Ranger to access your LDAP system for user, group and role information

  • Add the Presto Ranger plugin to Ranger

Installation

AWS CloudFormation deployment

The Starburst support for AWS CloudFormation template (CFT)-based installation includes installation of Apache Ranger and all relevant configuration. Detailed information is available in the AWS documentation.

K8s deployment

The Starburst support for Kubernetes-based installation includes installation of Apache Ranger and all relevant configuration. Detailed information is available in the Kubernetes documentation.

Connect existing Ranger

Using an existing Ranger 2.0.0 or higher installation is supported. Make sure that the coordinator has network access to Ranger and that Ranger is configured for LDAP, so that all relevant users, groups and roles are available. Next, install the Presto Ranger plugin.

Manual Ranger installation

If you are running Presto on-premises or in another custom deployment, you need to install Ranger 2.0.0 or higher following the documentation from the Ranger project.

Connect Ranger to LDAP

Ranger needs to access the information about your users, groups and roles in your LDAP system. With the K8s and AWS installation methods, all details are already configured. For existing Ranger usage or manual installation, you need to ensure that Ranger is connected to your LDAP directory provider and that a synchronization process is in place. The process varies based on your LDAP system and is documented in the Ranger documentation.

Presto Ranger plugin

The Presto Ranger plugin is automatically installed with Ranger when you use the AWS CFT or Kubernetes installation. For existing Ranger instances or custom Ranger installations, follow these steps:

  • Locate the directory in your SEP distribution that contains the plugin JAR files

  • Copy the JAR files presto-ranger-plugin.jar and presto-jdbc.jar into the ${RANGER_HOME}/ews/webapp/WEB-INF/lib directory of your Ranger installation

  • Restart Ranger. The Presto Ranger plugin automatically creates the service type definition for Presto in Ranger.

  • Access the Ranger user interface and confirm that you can find the Starburst Enterprise Presto service type
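
The copy step can be scripted. The following Python sketch assumes you pass in the SEP directory that holds the two JAR files, since its exact location depends on your distribution; install_plugin_jars is a hypothetical helper, and the target is the ${RANGER_HOME}/ews/webapp/WEB-INF/lib directory named in the steps above.

```python
import shutil
from pathlib import Path

# JAR names from the installation steps above.
PLUGIN_JARS = ("presto-ranger-plugin.jar", "presto-jdbc.jar")


def install_plugin_jars(source_dir, ranger_home):
    """Copy the plugin JARs into ${RANGER_HOME}/ews/webapp/WEB-INF/lib.

    source_dir is the SEP directory holding the JARs; you supply it.
    Returns the destination paths of the copied files.
    """
    source = Path(source_dir)
    lib_dir = Path(ranger_home) / "ews" / "webapp" / "WEB-INF" / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"Ranger lib directory not found: {lib_dir}")
    # Check every JAR first so a missing file does not leave a partial install.
    missing = [name for name in PLUGIN_JARS if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {source}: {', '.join(missing)}")
    return [Path(shutil.copy2(source / name, lib_dir / name)) for name in PLUGIN_JARS]
```

Call install_plugin_jars with your own SEP directory and Ranger home, then restart Ranger so that the plugin can create the service type definition. Checking both files before copying avoids leaving Ranger with only one of the two JARs.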

With the Presto Ranger plugin installed in Ranger, you can create one or multiple services of the SEP service type. This allows you to have separate services for different Presto clusters.

Ranger plugin configuration

With Ranger installed and configured, you are now ready to activate Ranger as the access control system for Presto. Update etc/config.properties and set the path to your Ranger access control configuration file:

access-control.config-files=etc/access-control-ranger.properties

Then configure the details in that file:

access-control.name=ranger
ranger.policy-rest-url=http://ranger.example.com:6080
ranger.service-name=presto-production
ranger.presto-plugin-username=<username>
ranger.presto-plugin-password=<password>
ranger.policy-refresh-interval=30s

Ranger configuration properties

| Property | Description | Default value |
| --- | --- | --- |
| access-control.name | Name of the access control system. Set to ranger to activate the Ranger plugin. | |
| ranger.policy-rest-url | URL of the Ranger server | |
| ranger.service-name | Name of the service defined in Ranger for this Presto cluster | |
| ranger.authentication-type | Authentication type for Presto connecting to Ranger. Currently only BASIC is supported; KERBEROS is planned for a future release. | BASIC |
| ranger.presto-plugin-username | Username the Ranger Presto plugin uses to connect to Ranger with BASIC authentication | |
| ranger.presto-plugin-password | Password the Ranger Presto plugin uses to connect to Ranger with BASIC authentication | |
| ranger.plugin-policy-ssl-config-file | Path to the Ranger plugin SSL configuration | |
| ranger.policy-cache-dir | Ranger client's persistent cache for policies | |
| ranger.policy-refresh-interval | Interval to refresh policies from Ranger | 30s |
| ranger.policy-connection-timeout | Timeout to use when connecting to Ranger | 120s |
| ranger.policy-read-timeout | Timeout to use when reading policies from Ranger | 30s |
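
To catch typos before restarting Presto, you can sanity-check the file against the properties listed above. The following Python sketch is a hypothetical helper, not part of SEP: it handles only simple key=value lines rather than the full Java properties syntax, fills in the documented defaults, and treats the properties in REQUIRED, plus the username and password for BASIC authentication, as mandatory, which is an assumption.

```python
# Defaults as documented for the Ranger configuration properties.
RANGER_DEFAULTS = {
    "ranger.authentication-type": "BASIC",
    "ranger.policy-refresh-interval": "30s",
    "ranger.policy-connection-timeout": "120s",
    "ranger.policy-read-timeout": "30s",
}

# Assumption: a working setup needs at least these properties.
REQUIRED = ("access-control.name", "ranger.policy-rest-url", "ranger.service-name")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_ranger_config(text):
    """Return the effective settings after applying defaults and basic checks."""
    props = {**RANGER_DEFAULTS, **parse_properties(text)}
    missing = [key for key in REQUIRED if key not in props]
    if missing:
        raise ValueError(f"Missing required properties: {', '.join(missing)}")
    if props["access-control.name"] != "ranger":
        raise ValueError("access-control.name must be set to ranger")
    if props["ranger.authentication-type"] == "BASIC":
        for key in ("ranger.presto-plugin-username", "ranger.presto-plugin-password"):
            if key not in props:
                raise ValueError(f"{key} is needed for BASIC authentication")
    return props


# Placeholder credentials; the URL and service name follow the example above.
sample = """\
access-control.name=ranger
ranger.policy-rest-url=http://ranger.example.com:6080
ranger.service-name=presto-production
ranger.presto-plugin-username=presto
ranger.presto-plugin-password=secret
"""
settings = check_ranger_config(sample)
print(settings["ranger.policy-connection-timeout"])  # 120s
```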

Audit

System-level security can be configured to use Ranger audit. The integration automatically verifies whether a user can access queries from other users. Technically, this check is performed by accessing the internal table system.runtime.queries, and any access to the table is logged.

The property ranger.audit.system-runtime-queries.enabled is set to true by default and controls this logging behavior.

The Web UI makes heavy use of the queries table. As a result, using the web interface causes a flood of audit events. Set the property to false to disable this audit logging.

Users, groups and roles

Users, groups and roles are sourced from your connected LDAP directory and are used as the targets of each policy.

Policies

Policy creation and management is performed with the Ranger user interface, or optionally with the Ranger REST API.

A policy is a combination of a set of resources and the associated privileges. The user interface provides specific elements with drop-down lists and auto-completion for all resources.

Resource sets

A resource set includes one or more resources of different resource types. Wildcard characters are supported to select a number of resources based on a pattern. The following resource types are available:

  • catalog

  • catalog - schema

  • catalog - schema - table

  • catalog - schema - table - column

  • catalog - schema - procedure

  • catalog - session property

  • function

  • system session property

  • query

As the list above shows, some resources are organized hierarchically, starting with the catalog. This allows you, for example, to restrict access to a complete catalog, a specific schema or table, or even a single column or a procedure within a schema.

For example, the following resource set restricts access to the two tables credit-info and cards-info in all schemas of the hdfs catalog:

  • Catalog: hdfs

  • Schema: *

  • Table: credit-info, cards-info

A resource set works as a primary key for a policy and must be unique. However, multiple policies may cover a single resource because of wildcards.
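
Resource-set matching can be approximated with shell-style wildcards. In this Python sketch, matches is a hypothetical helper built on fnmatch.fnmatchcase, which also understands [...] character classes that Ranger patterns do not; matching is case-sensitive here, which may differ from your service definition. ALL_HDFS_POLICY is a hypothetical second policy that shows how two distinct resource sets can both cover one table.

```python
from fnmatch import fnmatchcase

# Resource set from the example above: two tables in all schemas of hdfs.
CARDS_POLICY = {
    "catalog": ["hdfs"],
    "schema": ["*"],
    "table": ["credit-info", "cards-info"],
}
# Hypothetical second policy with a different, broader resource set.
ALL_HDFS_POLICY = {"catalog": ["hdfs"], "schema": ["*"], "table": ["*"]}


def matches(resource_set, resource):
    """True if, at every level in resource_set, the resource matches one pattern.

    resource must name every level the resource set uses (table-level here).
    """
    return all(
        any(fnmatchcase(resource[level], pattern) for pattern in patterns)
        for level, patterns in resource_set.items()
    )


table = {"catalog": "hdfs", "schema": "finance", "table": "credit-info"}
other = {"catalog": "hdfs", "schema": "finance", "table": "orders"}
print(matches(CARDS_POLICY, table))  # True
print(matches(CARDS_POLICY, other))  # False

# Two unique resource sets, yet both cover hdfs.finance.credit-info.
policies = {"cards": CARDS_POLICY, "all-hdfs": ALL_HDFS_POLICY}
covering = [name for name, rs in policies.items() if matches(rs, table)]
print(covering)  # ['cards', 'all-hdfs']
```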

Privilege sets

A set of privileges consists of one or more user groups, roles and users, and a set of access types for the specified resource set. Privileges can allow or deny operations.

The catalog, schema, table and column resources, which control access to resources in queries, have the following access types:

  • SELECT to read data from the resource.

  • INSERT to add data to the resource.

  • UPDATE to change data in the resource.

  • DELETE to remove data from the resource.

  • OWNERSHIP to claim ownership of the resource, which provides complete access.

In addition, there are privileges of a more general nature that determine access to queries and their usage:

  • SELECT to list queries.

  • EXECUTE to initiate processing of any query. Without this privilege, user actions are extremely limited.

  • KILL to stop processing of any query.
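
Policies can also be created with the Ranger REST API instead of the user interface. The following Python sketch prepares, but does not send, a request for Ranger's public v2 policy endpoint, /service/public/v2/api/policy, combining the credit-info and cards-info resource set from the earlier example with one allow privilege set. The group analysts, the policy name, the credentials and the lowercase access type name select are assumptions; check the resource and access type names in the service definition of your Ranger instance before sending anything.

```python
import base64
import json
import urllib.request

RANGER_URL = "http://ranger.example.com:6080"  # from the configuration example


def build_policy(service, name, resources, groups, access_types):
    """Build a policy body: a resource set plus one allow privilege set."""
    return {
        "service": service,
        "name": name,
        "resources": {level: {"values": values} for level, values in resources.items()},
        "policyItems": [
            {
                "groups": groups,
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


def policy_request(policy, username, password):
    """Prepare (but do not send) a POST with BASIC authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{RANGER_URL}/service/public/v2/api/policy",
        data=json.dumps(policy).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


policy = build_policy(
    service="presto-production",
    name="cards-select",  # hypothetical policy name
    resources={"catalog": ["hdfs"], "schema": ["*"], "table": ["credit-info", "cards-info"]},
    groups=["analysts"],  # hypothetical LDAP group
    access_types=["select"],  # assumed access type name
)
request = policy_request(policy, "admin", "admin-password")  # placeholder credentials
```

Passing request to urllib.request.urlopen would submit it to a reachable Ranger server; keeping construction separate from sending makes it easy to review the JSON body first.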

Denying access to specific tables and columns

You may need to restrict access for users to tables or columns for security or other reasons.

The following example shows how to achieve that for any $path column and any $partitions table exposed by a catalog called hive:

  • Create a Ranger ALLOW policy with SELECT access type and assign it to the users, groups or roles you need.

  • Add a pattern to allow all resources in the hive catalog:

    • catalog: hive

    • schema: *

    • table: *

    • column: *

  • Create a similar policy, this time with the DENY type.

  • Add a pattern to deny access to the $path column:

    • catalog: *

    • schema: *

    • table: *

    • column: $path

  • Create another DENY policy that restricts access to $partitions tables:

    • catalog: *

    • schema: *

    • table: *$partitions

    • column: *

This method of restricting access works because Ranger evaluates DENY policies before ALLOW policies.
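
The evaluation order can be illustrated with a small Python model of the three policies above. It is a simplification, not Ranger's algorithm: it ignores users, groups and roles as well as policy exceptions, uses fnmatch wildcards, and denies any request that no policy allows. pattern_matches, is_permitted and column are hypothetical names.

```python
from fnmatch import fnmatchcase

LEVELS = ("catalog", "schema", "table", "column")

# The three policies from the example above, one pattern per level.
ALLOW = [{"catalog": "hive", "schema": "*", "table": "*", "column": "*", "access": "SELECT"}]
DENY = [
    {"catalog": "*", "schema": "*", "table": "*", "column": "$path", "access": "SELECT"},
    {"catalog": "*", "schema": "*", "table": "*$partitions", "column": "*", "access": "SELECT"},
]


def pattern_matches(policy, resource):
    """True if every level of the resource matches the policy's pattern."""
    return all(fnmatchcase(resource[level], policy[level]) for level in LEVELS)


def is_permitted(resource, access):
    """Check DENY policies first; only then look for a matching ALLOW policy."""
    if any(p["access"] == access and pattern_matches(p, resource) for p in DENY):
        return False
    # No match in ALLOW either means the request is denied.
    return any(p["access"] == access and pattern_matches(p, resource) for p in ALLOW)


def column(catalog, schema, table, col):
    """Build a column-level resource."""
    return dict(zip(LEVELS, (catalog, schema, table, col)))


print(is_permitted(column("hive", "web", "orders", "order_id"), "SELECT"))         # True
print(is_permitted(column("hive", "web", "orders", "$path"), "SELECT"))            # False
print(is_permitted(column("hive", "web", "orders$partitions", "year"), "SELECT"))  # False
print(is_permitted(column("iceberg", "web", "orders", "order_id"), "SELECT"))      # False
```

The $path column in the hive catalog matches the ALLOW policy too, so only checking DENY first makes the restriction effective.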

Note

The $path column is an artificial column in Hive that contains the location of the data file. The table$partitions table contains internal information about the partitioning of the table. Typically, this information should not be available to regular users. The examples above show how to implement this best practice for catalogs using the Hive connector.