Starburst Teradata connectors#

The Teradata connectors allow querying and creating tables in external Teradata databases. There are two connectors available:

The Starburst Teradata connector is a standard, feature-rich connector that is easy to install.

The Starburst Teradata Direct connector has the same feature set as the Starburst Teradata connector. It combines the connector with a table operator installed on the Teradata servers. This makes it more complex to install and update, but can produce higher query performance.

Note

The Starburst Teradata connectors require a valid Starburst Enterprise Presto license.

Starburst Teradata connector#

The Starburst Teradata connector is a feature-rich, easy-to-install connector.

Prerequisites#

  • Teradata 16

  • Teradata’s JDBC driver, which can be obtained from downloads.teradata.com

  • Connection details to connect to Teradata, in the form of a regular Presto catalog properties file (e.g. teradata.properties for a Presto catalog named teradata). The file should contain the following contents, with the connection properties replaced as appropriate for your setup:

connector.name=teradata
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD

Refer to the Teradata documentation for more information about format and parameters of the JDBC URL supported by the Teradata JDBC driver.
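For illustration, additional driver parameters can be appended to the connection URL. The sketch below assumes a hypothetical host and uses the DATABASE and CHARSET parameters; verify the exact parameter names and URL syntax against the Teradata JDBC driver documentation for your driver version:

connection-url=jdbc:teradata://teradata.example.com/DATABASE=web,CHARSET=UTF8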

Plugin configuration#

To install the Teradata connector, use the following steps. You need the Teradata JDBC driver files (terajdbc4.jar and tdgssconfig.jar). A command-line sketch of the procedure follows the list.

  1. Add the Teradata JDBC driver JAR files to the Presto plugin/teradata directory.

  2. Add the Presto Teradata catalog properties file (e.g. teradata.properties for a Presto catalog named teradata).

  3. Perform the above steps on every Presto cluster node.

  4. Restart Presto on every node.
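The following command-line sketch illustrates steps 1 and 2 on a single node. It assumes Presto is installed under /usr/lib/presto with its catalog directory at /usr/lib/presto/etc/catalog, that the driver JARs and catalog file are in the current directory, and that Presto runs as a systemd service; adjust the paths and the restart command to match your installation:

# Copy the Teradata JDBC driver JARs into the Teradata plugin directory
cp terajdbc4.jar tdgssconfig.jar /usr/lib/presto/plugin/teradata/

# Add the catalog properties file for a catalog named "teradata"
cp teradata.properties /usr/lib/presto/etc/catalog/

# Restart Presto so the plugin and catalog are loaded
sudo systemctl restart presto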

Kerberos authentication#

The connector supports Kerberos-based authentication with the following configuration:

teradata.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf

With this configuration, the user example@example.com, defined in the principal property, is used to connect to the database, its keys are read from the example.keytab file, and the Kerberos configuration specified with kerberos.config is used.

Kerberos credential passthrough#

The connector can be configured to pass through Kerberos credentials, received by Presto, to the Teradata database.

Configure Kerberos and Presto, following the instructions in Kerberos Credential Passthrough.

Then configure the connector to pass through the credentials from the server to the database in your catalog properties file and ensure the Kerberos client configuration properties are in place:

teradata.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File

Now any database access via Presto is subject to the data access restrictions and permissions of the user supplied via Kerberos.

Querying#

The connector provides a schema for each Teradata database. Assuming the catalog name is teradata, you can see the available schemas by running SHOW SCHEMAS:

SHOW SCHEMAS FROM teradata;

If you have a Teradata database named web, you can view its tables by running SHOW TABLES:

SHOW TABLES FROM teradata.web;

You can see a list of the columns in the clicks table in web’s schema using either of the following:

DESCRIBE teradata.web.clicks;
SHOW COLUMNS FROM teradata.web.clicks;

Finally, you can access the clicks table in web’s schema:

SELECT * FROM teradata.web.clicks;

Your privileges in these schemas are those of the user configured in the connection properties file. If the user does not have access to these tables, you are not able to access them in Presto.

Concurrency#

Presto reads from Teradata using multiple parallel JDBC connections. By default, the number of parallel reads is determined automatically based on the number of AMPs available in Teradata. Parallel reads can be configured using the following properties; an example follows the table:

Teradata concurrency configuration properties#

teradata.parallelism-type
  Determines the parallelism method. Possible values are:
    • AMP_BASED, single JDBC connection per Teradata AMP
    • MANUAL, user configures the number of parallel connections
  Default: AMP_BASED

teradata.connections-count
  Number of parallel connections to use when teradata.parallelism-type=MANUAL. Use 1 to disable parallel reads.
  Default: number of AMPs in the Teradata database
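For example, to switch to manual parallelism with a fixed number of connections, you could add the following to the catalog properties file (the value 8 is only illustrative):

teradata.parallelism-type=MANUAL
teradata.connections-count=8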

User impersonation#

The Teradata connector supports user impersonation.

User impersonation can be enabled in the catalog file:

teradata.impersonation-enabled=true

User impersonation in the Teradata connector is based on GRANT CONNECT THROUGH. For more details, search for GRANT CONNECT THROUGH in the Teradata documentation; an illustrative grant follows the note below.

Note

Option WITH TRUST_ONLY in GRANT CONNECT THROUGH is not yet supported.
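As an illustration, the following statement, run in Teradata, allows a hypothetical service user presto_service (the user from the catalog connection properties) to connect on behalf of the end user alice; verify the exact syntax against the Teradata documentation for your version:

GRANT CONNECT THROUGH presto_service
  TO PERMANENT alice
  WITHOUT ROLE;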

Table statistics#

The Teradata connector supports table and column statistics, as documented in Table Statistics. The statistics are collected by the Teradata database and retrieved by the connector. To collect statistics for a table, execute the following statements in Teradata. The second statement needs to be repeated for every column you want to collect statistics for.

COLLECT SUMMARY STATISTICS ON table_schema.table_name;
COLLECT STATISTICS COLUMN(column_name) ON table_schema.table_name;

The table and column statistics can be viewed in Presto using SHOW STATS and are used for cost-based optimizations.
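For example, using the clicks table from the earlier querying examples:

SHOW STATS FOR teradata.web.clicks;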

Table statistics configuration properties#

statistics.enabled
  Enables table and column statistics.
  Default: true

statistics.cache-ttl
  Duration for which table and column statistics are cached.
  Default: 10m

statistics.cache-missing
  Cache the fact that table statistics were not available.
  Default: false
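For example, to cache statistics for one hour and also cache the fact that statistics are missing, add the following to the catalog properties file (the TTL value is only illustrative):

statistics.enabled=true
statistics.cache-ttl=1h
statistics.cache-missing=true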

Mapping data types between Presto and Teradata#

Presto and Teradata each support different data types for table columns and use different names for some of them. Some data types are not supported equally by both systems. The following tables display the mapping used by Presto when working with existing columns, and when creating tables in Teradata.

Mappings for working with existing fields in Teradata:

Teradata data type          Presto data type
BYTEINT                     TINYINT
SMALLINT                    SMALLINT
INTEGER                     INTEGER
BIGINT                      BIGINT
REAL                        REAL
FLOAT                       REAL
DOUBLE                      DOUBLE
NUMERIC                     DECIMAL
DECIMAL                     DECIMAL
CHAR                        CHAR
NCHAR                       CHAR
VARCHAR                     VARCHAR
NVARCHAR                    VARCHAR
LONGVARCHAR                 VARCHAR
LONGNVARCHAR                VARCHAR
DATE                        DATE
TIMESTAMP                   TIMESTAMP
TIMESTAMP WITH TIME ZONE    not supported
JSON                        not supported
XML                         not supported

Teradata data types not listed above, and those shown as not supported (such as JSON or XML), are not supported by the connector. If a Teradata column uses such a type, any query accessing that column fails.

Mappings for creating tables with specific fields in Teradata:

Presto data type    Teradata data type
TINYINT             BYTEINT
SMALLINT            SMALLINT
INTEGER             INTEGER
BIGINT              BIGINT
REAL                REAL
DOUBLE              DOUBLE PRECISION
DATE                DATE
TIMESTAMP           TIMESTAMP
CHAR                CHAR
VARCHAR             VARCHAR

The mapping above applies to creating a table with CREATE TABLE and CREATE TABLE ... AS [SELECT] statements.
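For example, using the web schema from the earlier querying examples, a copy of the clicks table can be created in Teradata with a CREATE TABLE ... AS statement (the target table name is only illustrative):

CREATE TABLE teradata.web.clicks_backup AS
SELECT * FROM teradata.web.clicks;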

Any Presto data type not listed in the above table cannot be created as a column in a new Teradata table.

Mapping configuration#

Mapping configuration properties#

jdbc-types-mapped-to-varchar
  Comma-separated list of data types that are converted to unbounded VARCHAR.

unsupported-type.handling-strategy
  Session property: unsupported_type_handling_strategy
  Configures how unsupported column data types are handled:
    • IGNORE, column is not accessible.
    • CONVERT_TO_VARCHAR, column is converted to unbounded VARCHAR.
  An example follows the table.
  Default: IGNORE

teradata.number.default-scale
  Session property: number_default_scale
  Default Presto DECIMAL scale for the Teradata NUMBER data type (without precision and scale). When not set, such columns are treated as not supported.
  Default: 0

teradata.number.rounding-mode
  Session property: number_rounding_mode
  Rounding mode for the Teradata NUMBER data type. Possible values are:
    • UNNECESSARY - Assert that the requested operation has an exact result, hence no rounding is necessary.
    • CEILING - Round towards positive infinity.
    • FLOOR - Round towards negative infinity.
    • HALF_DOWN - Round towards the nearest neighbor unless both neighbors are equidistant, in which case round down.
    • HALF_EVEN - Round towards the nearest neighbor unless both neighbors are equidistant, in which case round towards the even neighbor.
    • HALF_UP - Round towards the nearest neighbor unless both neighbors are equidistant, in which case round up.
    • UP - Round away from zero.
    • DOWN - Round towards zero.
  Default: UNNECESSARY
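For example, to expose columns with unsupported types as unbounded VARCHAR instead of hiding them, set the strategy in the catalog properties file:

unsupported-type.handling-strategy=CONVERT_TO_VARCHAR

Alternatively, override it for a single session, assuming the catalog is named teradata:

SET SESSION teradata.unsupported_type_handling_strategy = 'CONVERT_TO_VARCHAR';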

Lower case of Teradata NOT CASESPECIFIC char types#

Presto does not support case-insensitive comparison of values with char-based types. However, it is possible to force the connector to convert values to lower case for these comparisons. You can activate this behavior with a catalog property:

teradata.type.not-case-specific.to-lower-case=true

Alternatively you can use a catalog session property:

SET SESSION teradata.not_casespecific_to_lower_case = true;

Pushdown#

The connector supports pushdown for optimized query processing.

Additionally, it supports pushdown of the following aggregate functions using the BIGINT type:

Limitations#

The following SQL statements are not yet supported:

Starburst Teradata Direct connector#

The Teradata Direct connector is an alternative connector for Teradata databases.

Compared to the Starburst Teradata connector, it requires more advanced configuration and is capable of achieving higher performance. It supports all use cases of the Teradata connector, and adds greatly improved performance for SELECT queries.

The connector is composed of components running on Presto, and others running on Teradata. All components participate in every transmission. A transmission includes all data transfer related to a single table scan.

Warning

The Teradata Direct connector cannot be used on Kubernetes clusters.

The connector uses JDBC as a control channel to schedule the query in Teradata. Any SELECT statement run with JDBC is passed to a Teradata table operator, which then uses the custom transmitters to pass the data over HTTP to the Presto controller and receivers:

Presto

  • Controller: The controller starts and manages all transmissions. It runs on the Presto coordinator, distributes work to the receivers, and provides status information to the transmitters.

  • Receivers: The receivers connect to the transmitters and receive the data from Teradata through the transmitters. They run on the Presto workers and are managed by the controller.

Teradata

  • Table operator and JDBC driver: The table operator and JDBC driver read the data from the Teradata storage.

  • Transmitters: The transmitters send the data for each table operator invocation on Teradata to the receivers on Presto.

The following features from the Teradata connector are also supported by the Teradata Direct connector:

Presto controller and receiver installation#

The Teradata Direct connector needs to be configured in a catalog properties file, e.g. myteradatadb.properties, using the connector name teradata-direct:

connector.name=teradata-direct
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD
teradata-direct.table-operator.name=some_schema.presto_table_operator
teradata-direct.http.port=9000
teradata-direct.http.host=hostname.accessible.from.teradata.instance

Besides the connector name, you need to provide the JDBC connection parameters and add the relevant JAR files to the teradata plugin directory, as discussed in Starburst Teradata connector.

Additionally, you need to install the native table operator on Teradata in the queried schema, and configure the connector parameters specifying the table operator name as well as the HTTP port and host of the Presto coordinator node.

Further tweaking can be done with the following parameters; a combined example follows the list:

teradata-direct.receiver.buffer.size in the catalog properties file, or receiver_buffer_size in catalog session properties:

Determines the buffer size for each Presto worker. This buffer is available per table scan, so a single query joining three tables uses three buffers. Default value is 32MB.

teradata-direct.parcel.size in the catalog properties file, or parcel_size in catalog session properties:

The size of the data packets sent to the Presto receivers. The last packet may be smaller. Defaults to 2MB. The parcel size should not be larger than the buffer size.

teradata-direct.compression.enabled in the catalog properties file, or compression_enabled in catalog session properties:

Determines if transmission data is compressed. Defaults to false.

teradata-direct.splits-per-worker in the catalog properties file, or splits_per_worker in catalog session properties:

Determines how many splits each Presto worker receives. This determines the maximum concurrency level on Presto. Defaults to 16, identical to the task.concurrency of the Presto server.
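For illustration, the following catalog properties sketch overrides the defaults described above; the values are arbitrary examples, not recommendations:

teradata-direct.receiver.buffer.size=64MB
teradata-direct.parcel.size=4MB
teradata-direct.compression.enabled=true
teradata-direct.splits-per-worker=32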

Aside from the Teradata-specific options, it is recommended to configure caching of Teradata metadata and statistics in the catalog properties file to improve query performance:

metadata.cache-ttl=10m
metadata.cache-missing=true
statistics.cache-ttl=10m
statistics.cache-missing=true

Encryption#

The connector can encrypt all data transmissions and use HTTPS. Add the details for the Java keystore file to use to the Presto server config:

teradata-direct.https.keystore.path=mykeystore.jks
teradata-direct.https.keystore.password=123456

Presto automatically distributes the keystore to the controller, the receivers, the transmitters and the table operator.

Native table operator#

The native table operator can be installed with the following steps:

  • Download the Teradata native table operator .so file

  • Copy the file onto the Teradata servers into a known location

  • Allow Teradata processes to access the file by either adding read permission with a command such as chmod +r <.so file>, or allowing access by the tdtrusted group with chown :tdtrusted <.so file>; chmod g+r <.so file>.

  • Create the table operator function in the desired schema. We recommend a dedicated database for the current operator version:

    CREATE FUNCTION some_database.presto_table_operator()
    RETURNS TABLE VARYING USING FUNCTION starburst_teradata_contract
    LANGUAGE C
    NO SQL
    PARAMETER STYLE SQLTABLE
    EXTERNAL NAME 'SP!/opt/presto/libstarburst_teradata.so!F!starburst_teradata';
    

No Teradata restart is needed for the changes to take effect.

The path and access rights for the .so file, for example /opt/presto, have to be adjusted to a suitable location on your Teradata servers.

Each user that connects to Teradata from Presto needs to be granted sufficient access:

GRANT EXECUTE ON SPECIFIC FUNCTION some_database.presto_table_operator TO your_user;

Alternatively, access to the table operator UDF can be granted via a role:

CREATE ROLE presto_table_operator_users;
GRANT EXECUTE ON SPECIFIC FUNCTION some_database.presto_table_operator
  TO presto_table_operator_users;
GRANT presto_table_operator_users TO your_user;

This also requires setting the role for any impersonated user, which can be achieved with the following property in the catalog file:

teradata.impersonation-role=presto_table_operator_users

Updates and removal:

It is important to remove the function before any updates:

DROP FUNCTION some_database.presto_table_operator;

Additional recommendations for successful upgrades:

  • drop the database used for the function

  • remove all old .so files from the Teradata servers

  • use a different filename for the .so file for the new version

  • use a new exclusive database for the function with the new version

  • restart the Teradata servers, if possible