Apache Ranger overview#

Apache Ranger is a tool to manage access control policies for Hadoop/Hive and related object storage systems. It provides a simple and intuitive web-based console for creating and managing policies controlling access to the data.

Presto can be integrated with Ranger as an access control system. When a query is submitted to Presto, Presto parses and analyzes the query to determine the privileges the user requires to access objects such as schemas and tables. Once a list of these objects is created, Presto communicates with the Ranger service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid, because the user does not have the necessary privileges to query an object, an error is returned. Ranger policies are cached in Presto to improve performance.

Authentication is handled outside of Ranger, for example using LDAP. Ranger matches the authenticated user and the user's groups against the policy definitions.

Note

SEP integration with Ranger requires a valid Starburst Enterprise Presto license.

Ranger usage options#

SEP offers different integrations with Ranger, including global access control and Hive access control with Apache Ranger or Privacera.

We highly recommend implementing Ranger for global access control. This allows you to use Ranger policies for all configured catalogs.

Key concepts#

The concepts and features described in the following section apply to all Ranger usage.

Policies#

A policy combines a set of resources with the associated privileges. Ranger provides a user interface, or optionally a REST API, to create and manage these access control policies.

Privilege sets#

A set of privileges consists of one or more user groups, roles and users, and a set of access types for the specified resource set. Privileges can allow or deny operations.

The catalog, schema, table and column resources, which grant access to resources for queries, have the following access types.

  • SELECT to read data from the resource.

  • INSERT to add data to the resource.

  • UPDATE to change data in the resource.

  • DELETE to remove data from the resource.

  • OWNERSHIP to claim ownership of the resource, which provides complete access.
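
As a rough illustration of how a privilege set could be evaluated, the following Python sketch models the access types listed above. The `PrivilegeSet` class and the `is_permitted` function are hypothetical names, not part of Ranger or Presto, and the rule that a matching deny entry overrides any allow entry is an assumption made for this example.

```python
from dataclasses import dataclass

# Access types taken from the list above.
ACCESS_TYPES = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE", "OWNERSHIP"})


@dataclass(frozen=True)
class PrivilegeSet:
    """Hypothetical model of one privilege set on a resource."""
    principals: frozenset      # names of users, groups, and roles
    access_types: frozenset    # subset of ACCESS_TYPES
    allow: bool = True         # False models a deny entry


def is_permitted(privilege_sets, principals, access_type):
    """Return True if any of the caller's principals may perform access_type."""
    if access_type not in ACCESS_TYPES:
        raise ValueError(f"unknown access type: {access_type}")
    allowed = False
    for entry in privilege_sets:
        if not entry.principals & principals:
            continue  # entry does not target this caller
        if not entry.allow and access_type in entry.access_types:
            return False  # assumption: a matching deny entry always wins
        if entry.allow and (access_type in entry.access_types
                            or "OWNERSHIP" in entry.access_types):
            allowed = True  # OWNERSHIP provides complete access
    return allowed
```

In this sketch, a user who belongs to a group with an allow entry for SELECT is permitted to read, unless another entry targeting the same user denies SELECT.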

Users, groups, and roles#

Users, groups, and roles are sourced from your configured authentication system, ideally a connected LDAP directory, and are used as the target users for each policy.

Column-level authorization#

Presto enforces column-level privileges granted to roles. For example, if a user is only granted access to a subset of table columns, they are only able to query from these columns. If they execute an SQL statement that refers to other columns, the query fails with an error.
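
The check described above can be sketched in a few lines of Python. The function name `check_column_access` and the use of `PermissionError` are illustrative choices, not the actual Presto implementation or error message.

```python
def check_column_access(granted_columns, referenced_columns):
    """Raise PermissionError if a query refers to a column that is not granted."""
    denied = sorted(set(referenced_columns) - set(granted_columns))
    if denied:
        raise PermissionError(f"access denied to columns: {', '.join(denied)}")


# A user granted only "id" and "name" can query those columns ...
check_column_access({"id", "name"}, ["id", "name"])

# ... but a query that also refers to "salary" fails.
try:
    check_column_access({"id", "name"}, ["id", "salary"])
except PermissionError as error:
    print(error)
```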

Column masking#

Presto’s Apache Ranger integration supports most of the column masking methods that are supported in Hive with Ranger. Unlike Hive, Presto does not distinguish between upper case letters, lower case letters, and digits when masking. The character x is used for all of these character types.

Note

If a policy uses an unsupported column masking method, MASK_NULL is used instead.
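
To make the masking behavior concrete, here is a minimal Python sketch. It assumes that characters other than letters and digits are left unchanged, and the `SUPPORTED_METHODS` set is a placeholder for illustration only, not the actual list of methods that Presto supports. The function names are hypothetical.

```python
# Placeholder set for illustration only; not the actual supported list.
SUPPORTED_METHODS = {"MASK", "MASK_NULL"}


def mask_value(value):
    """Replace every upper case letter, lower case letter, and digit with 'x'."""
    return "".join(
        "x" if ch.isupper() or ch.islower() or ch.isdigit() else ch
        for ch in value
    )


def apply_mask(value, method):
    """Apply a masking method, falling back to MASK_NULL when unsupported."""
    if method not in SUPPORTED_METHODS:
        method = "MASK_NULL"
    if method == "MASK_NULL":
        return None  # the value is replaced with NULL
    return mask_value(value)


print(mask_value("Ab-12"))  # prints "xx-xx"
```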

Service and catalog integrations#

In addition to enforcing the policies in Apache Ranger, Presto integrates with the Apache Ranger Key Management Service, and supports AWS Glue Data Catalog, row-level filtering, and tag-based policies.

Features and use cases#

The following features and use cases are applicable with all Ranger usage.

Hive and other catalog authorization setup#

The Ranger integrations replace any other authorization setup for the data source.

For example, you have to treat it as a replacement for authorization by the user configured for the connection to the data source, or any restrictions in the data source utilized by user impersonation or credential passthrough. It is important to avoid these other configurations, and let Ranger manage all access to keep the overall setup simple and manageable.

When catalogs use the Hive connector, disable the other Hive authorization checks in each catalog properties file. Edit the catalog properties file with the following configuration:

hive.security=allow-all

Controlling access to User Defined Functions with Ranger#

You can use the Ranger system access control to enforce User Defined Function (UDF) policies. A UDF in Presto is deployed as a plugin (Functions) and stored in the Presto global namespace. This global namespace is managed at the system access control level.

This is independent of the global and Hive access control with Ranger and Privacera.

The Ranger resource hierarchy for all UDF policies requires an associated database (or schema) namespace when creating the policy. Because the global namespace is independent of any connector namespace, this makes it slightly harder to control access to UDFs using Ranger. To overcome this, you must specify $presto as the database name in Ranger. This keeps all Presto functions under the $presto database in the Ranger resource hierarchy.

To configure Ranger system access control for UDFs, add the following to a system access control properties file, for example etc/access-control-ranger-udf.properties:

access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

All Ranger properties supported for Hive access control with Ranger are supported in the system access control file. However, Ranger properties related to row filtering or column masking are ignored. This additional configuration is needed because the Ranger system access control uses a Ranger client that is independent of the Hive access control. Only one Ranger system access control can be defined, while Hive access control can be configured separately for each Hive catalog. In a scenario with multiple Hive catalogs and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.

Audit#

When Ranger audit is implemented and auditing is enabled in a given resource policy, an audit event is logged whenever access is granted or denied through Ranger.

Ranger audit is configured in the Ranger-specific file /etc/hive/conf/ranger-hive-audit.xml. Configuring Ranger audit is complex, and outside the scope of Starburst documentation; please refer to your Ranger documentation to learn how to set up audit optimally for your environment.

For audit to work with Presto, the location of the file must be specified in your catalog properties file:

ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml

Caveat regarding performance

Ranger audits are performed by accessing the internal table system.runtime.queries. Any access to the table is logged.

The Web UI makes heavy use of the queries table, so using the web interface causes a flood of audit events. The property ranger.audit.system-runtime-queries.enabled controls this logging behavior and is set to true by default. Setting the property to false disables this audit logging.

Caching#

Caching is used to improve performance and reduce the number of requests to the Ranger service. Caching is enabled through configuration properties, which can be found in the Ranger installation and configuration page.
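
The general idea can be sketched as a small time-to-live cache in Python. This is not the Presto implementation. `TtlCache`, its `loader` callback, and the injectable `clock` are hypothetical, and the 30 second default simply mirrors the documented default of ranger.cache-ttl, where a value of 0 disables the cache.

```python
import time


class TtlCache:
    """Cache values from a loader function for a fixed time to live."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader        # called on a cache miss, e.g. a Ranger lookup
        self._ttl = ttl_seconds      # 0 disables caching
        self._clock = clock
        self._entries = {}           # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # still fresh, no request needed
        value = self._loader(key)
        if self._ttl > 0:
            self._entries[key] = (now + self._ttl, value)
        return value
```

A longer time to live reduces the number of requests further, at the cost of changes becoming visible later.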

Authorization limitations#

Authorization information cannot be accessed by querying tables such as information_schema.roles, information_schema.applicable_roles, information_schema.enabled_roles, and information_schema.table_privileges.

Configuration properties#

The properties listed in this table apply to all Ranger-related configurations in system access control properties files, as well as catalog files using the Hive connector for Hive access control with Apache Ranger or Privacera.

Ranger properties#

Property name

Default

Description

ranger.policy-rest-url

URL address of the Ranger REST service. HTTPS is required with Kerberos authentication.

ranger.service-name

Ranger Presto plugin service name

ranger.authentication-type

BASIC

Authentication type for Presto connecting to Ranger, BASIC or KERBEROS.

ranger.presto-plugin-username

Ranger Presto plugin user name. This property is used when ranger.authentication-type=BASIC is set.

ranger.presto-plugin-password

Ranger Presto plugin user password. This property is used when ranger.authentication-type=BASIC is set.

ranger.kerberos-principal

Ranger service Kerberos principal

ranger.kerberos-keytab

Path to the Ranger service Kerberos keytab file

ranger.plugin-policy-ssl-config-file

Path to Ranger plugin SSL configuration

ranger.policy-cache-dir

Ranger’s client persistent cache for policies

ranger.policy-refresh-interval

30s

Interval determining how often authorization policies are refreshed. This is the maximum delay before changes in Ranger authorization policies become visible in Presto.

ranger.policy-connection-timeout

120s

Ranger service connection timeout.

ranger.policy-read-timeout

30s

Ranger service read timeout.

ranger.policy-cache-dir

Path to the Ranger cache directory for policies. It allows policies to be loaded from the cache on startup, even if the Ranger Policy Admin is not available at that moment.

ranger.cache-ttl

30s

Duration for which group mapping information is cached in Presto. A value of 0ms disables the cache.

ranger.cache-refresh-interval

Disabled, 0ms

Interval at which group mapping information is refreshed in Presto. Any value greater than ranger.cache-ttl disables the refresh.

ranger.row-filtering.enabled

false

To enable row filtering set this flag to true. Note that there are semantic differences between Presto SQL and Hive QL.

ranger.wild-card-resource-matching-for-row-filtering

false

To enable resource wildcard matching for row filtering, set this flag to true. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used.

ranger.wild-card-resource-matching-for-column-masking

false

To enable resource wildcard matching for column masking, set this flag to true. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used.

ranger.config-resources

Additional XML configuration files which are read before applying the Presto Ranger configuration. Useful for reusing an existing Hive-level Ranger configuration, for example the Ranger audit configuration.

ranger.sql.enabled

true

Enable Ranger policy management with SQL as supported for Hive access control only.

Ensuring Ranger works with SSL#

If your organization implements SSL, you must ensure that Ranger is correctly configured for it, as connectors also use the configuration via the ranger.plugin-policy-ssl-config-file property. The following is a sample Ranger SSL configuration file:

<configuration>
    <!--  The following properties are used for 2-way SSL client server validation -->
    <property>
        <name>xasecure.policymgr.clientssl.keystore</name>
        <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore</name>
        <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
</configuration>

Ensuring Ranger works with your authorization service#

You need to configure Presto to work with the authentication service used by Ranger. While Starburst does offer Kerberos support, Starburst encourages the use of LDAP. The following configuration property is provided:

LDAP#

If your organization uses an LDAP system for user and group information, Ranger can use that information to define role-based access to catalogs using any connector, as well as a number of other system resources. Policies in Ranger define access and authorization, and are created with the Ranger user interface. Users, groups, and roles are sourced from your connected LDAP directory and are used to target users for a Ranger policy. Each policy combines user and group information with a resource and access rights to the resource.

Ranger needs to access the information about your users, groups and roles in your LDAP system. With the K8s and AWS installation methods, all details are already configured. For existing Ranger usage or manual installation, you must ensure that Ranger is connected to your LDAP directory provider, and that a synchronization process is in place.

The process of connecting your existing Ranger installation depends on your particular LDAP implementation as well as your Ranger configuration. Learn more about that in the Presto LDAP Authentication page.

Kerberos#

SEP can use Kerberos authentication, and the Ranger integration also supports Kerberos.

Warning

Most organizations that use Kerberos also use LDAP. We strongly encourage you to use LDAP instead of Kerberos, due to the relative unreliability of Kerberos servers, their lack of clear error messaging, and their rigid OS and JVM dependencies.

A Ranger user with the sys admin role (ROLE_SYS_ADMIN) must exist. With Kerberos authentication, this user must match the Presto Kerberos principal set in ranger.kerberos-principal. With BASIC authentication, it must match the Ranger Presto plugin user name and password set in ranger.presto-plugin-username and ranger.presto-plugin-password.

The Presto Kerberos principal is translated to a Ranger user name via the auth-to-local Hadoop rules from core-site.xml.

Starburst Ranger CLI#

You can use the Starburst Ranger CLI to manage the integration of SEP with Apache Ranger or Privacera Ranger, including Ranger user management and service definition management.

The command line application is an executable Java archive that requires Java 11 or higher to be available on the system path. You can download it from Starburst and install it with the following steps on Linux or macOS.

  • Ensure the computer is able to reach the Ranger server via HTTP, since the CLI interacts with the REST API. The computer can be the coordinator, a worker in the cluster, or any other computer.

  • Verify Java with java -version

  • Move the binary to a directory in your path, such as ~/bin and rename it.

    mv starburst-ranger-cli-*-executable.jar ~/bin/starburst-ranger-cli
    
  • Verify the folder is on the path.

    echo $PATH
    
  • If necessary, add the folder.

    export PATH=~/bin:$PATH
    
  • Now you can run the help command to verify the CLI works.

    starburst-ranger-cli help
    
  • The resulting output is similar to the following:

    Starburst Ranger command line interface
    USAGE:
    starburst-ranger-cli [--properties=<configFile>] [-p=<String=String>]... [COMMAND]
    ...
    

The help command can also provide details about the other commands and their specific options, if you append help to the desired command, with a few examples shown in the following block:

starburst-ranger-cli help
starburst-ranger-cli user help
starburst-ranger-cli service-definition help
starburst-ranger-cli user create help

Windows installation is supported as well and requires similar commands. You can also run the application directly with Java on Linux, macOS or Windows.

java -jar starburst-ranger-cli-*-executable.jar

You have to supply the connection details from Presto to Ranger in a properties file. Typically you can simply use the Ranger access control properties file by copying it to the computer running the CLI. Alternatively you can use individual properties as command line options.

  • Use the --properties option to specify the full path to a .properties file that contains one or more key=value pairs, each on its own line.

  • Use the -p option for each property separately with the format -p=key=value.
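
If you prefer passing individual options, a small helper can derive them from an existing properties file. The following Python sketch is an illustration under simple assumptions: it handles only plain `key=value` lines and `#` comments, not the full Java properties syntax, and the function name `properties_to_options` is hypothetical. The command is only printed, not executed.

```python
def properties_to_options(text):
    """Turn key=value lines into a list of -p=key=value command line options."""
    options = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"not a key=value line: {line!r}")
        options.append(f"-p={key.strip()}={value.strip()}")
    return options


example = """
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive
"""
# Options come before the command, following the usage line of the CLI help.
print(["starburst-ranger-cli", *properties_to_options(example),
       "user", "get", "--name=username"])
```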

Ranger user management#

You can manage users in Ranger with the CLI. Properties are used to provide the details for Ranger access.

The following operations are available:

  • create a user

  • get user details

  • delete a user

All three commands follow the same syntax, and use properties and the --name option:

starburst-ranger-cli user get
starburst-ranger-cli user create
starburst-ranger-cli user delete

A full example to get a user can look like this:

starburst-ranger-cli user get --name=username --properties=ranger-access-control.properties

Creating a user relies on a JSON file, such as alice.json, with the following syntax:

{
  "name": "alice",
  "firstName": "Alice",
  "lastName": "Wonderland",
  "emailAddress": "alice@example.com",
  "password": "not@trivialP225w0rd",
  "description": "She went down the rabbit hole.",
  "groupIdList": [],
  "groupNameList": [],
  "status": 1,
  "isVisible": 1,
  "userSource": 0,
  "userRoleList": []
}

The file is passed with the -f or --from-file option:

starburst-ranger-cli user create -f=alice.json
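
When creating several users, you could generate the JSON file from a script. The following Python sketch writes a file with the same fields as alice.json above and builds the matching command. The helper names `write_user_file` and `create_user_command` are hypothetical, the values passed in are placeholders, and the command is only returned as a list of arguments, not executed.

```python
import json
from pathlib import Path


def write_user_file(directory, name, first_name, last_name, email, password,
                    description=""):
    """Write <name>.json with the fields from the example and return its path."""
    user = {
        "name": name,
        "firstName": first_name,
        "lastName": last_name,
        "emailAddress": email,
        "password": password,
        "description": description,
        "groupIdList": [],
        "groupNameList": [],
        "status": 1,
        "isVisible": 1,
        "userSource": 0,
        "userRoleList": [],
    }
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps(user, indent=2), encoding="utf-8")
    return path


def create_user_command(path):
    """Build the CLI arguments for creating the user from the given file."""
    return ["starburst-ranger-cli", "user", "create", f"-f={path}"]
```

The returned argument list can be passed to a process runner of your choice once the connection properties are added.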

Service definition management#

You can find information about creating and overriding the service definition in the sections about installing and upgrading the Presto Ranger plugin.