9.16. Starburst Teradata Connector#

The Teradata Database connector allows querying and creating tables in an external Teradata database. This can be used to join data between different systems like Teradata and Hive, or between different Teradata Database instances.

Presto Teradata Connector supports Teradata 16.

Installation and Configuration#

Prerequisites#

Any of the below installation and configuration methods require the following artifacts:

  • Teradata’s JDBC driver, may be obtained from Teradata’s downloads.teradata.com
  • Connection details to connect to the Teradata DB, in form of a regular Presto catalog properties file (e.g. teradata.properties for a Presto catalog named teradata). File should contain the following contents, replacing the connection properties as appropriate for your setup:
connector.name=teradata
connection-url=jdbc:teradata://HOST
connection-user=USERNAME
connection-password=PASSWORD

Refer to the Teradata documentation for more information about format and parameters of the JDBC URL supported by the Teradata JDBC Driver.

Plugin Configuration#

To install the Teradata Database connector manually, use the following directions. You will need an installation of the Starburst Distribution of Presto and an Teradata JDBC Driver files (terajdbc4.jar and tdgssconfig.jar).

  1. Add the Teradata JDBC Driver JAR files to the Presto plugin/teradata directory.
  2. Add Presto Teradata catalog properties file (eg. teradata.properties for a Presto catalog named teradata).
  3. Perform the above steps on every Presto cluster node.
  4. Restart Presto on every node.

Kerberos Authentication#

The Teradata connector supports Kerberos-based authentication with the following configuration:

teradata.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf

With this configuration the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the example.keytab file. The Kerberos configuration specified with kafka.kerberos.config is used.

Kerberos Credential Passthrough#

The Teradata connector can be configured to pass through Kerberos credentials, received by Presto, to the Teradata database.

Configure Kerberos and Presto, following the instructions in Kerberos Credential Passthrough.

Then configure the connector to pass through the credentials from the server to the database in your catalog properties file and ensure the Kerberos client configuration properties are in place:

teradata.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File

Now any database access via Presto is subject to the data access restrictions and permissions of the user supplied via Kerberos.

Querying Teradata Databases#

The Teradata connector provides a schema for each Teradata database’s. Assuming the catalog name is teradata, you can see the available schemas by running SHOW SCHEMAS:

SHOW SCHEMAS FROM teradata;

If you have an Teradata database named web, you can view the tables they own by running SHOW TABLES:

SHOW TABLES FROM teradata.web;

You can see a list of the columns in the clicks table in web’s schema using either of the following:

DESCRIBE teradata.web.clicks;
SHOW COLUMNS FROM teradata.web.clicks;

Finally, you can access the clicks table in web’s schema:

SELECT * FROM teradata.web.clicks;

Your privileges in these schemas are those of the user configured in the connection properties file. If the user does not have access to these tables, you will not be able to access them.

Configuring Concurrency#

Presto is reading from Teradata Database using multiple parallel JDBC connections. By default, number of parallel reads is determined automatically based on number of AMPs available in Teradata. The parallel reads be configured by using below properties.

Teradata Concurrency Configuration Properties#

Property Name Description Default
teradata.parallelism-type
Determines the parallelism method. Possible values are:
  • AMP_BASED - single JDBC connection per Teradata AMP
  • MANUAL - user configures number of parallel connections
AMP_BASED
teradata.connections-count Number of parallel connections, to be used when teradata.parallelism-type=MANUAL is used. Use 1 to disable parallel read. Number of AMP in Teradata Database

User Impersonation#

Teradata connector supports user impersonation.

User impersonation can be enabled in the catalog file:

teradata.impersonation-enabled=true

User impersonation in Teradata connector is based on GRANT CONNECT THROUGH. For more details, search for GRANT CONNECT THROUGH on the Teradata documentation.

Note

Option WITH TRUST_ONLY in GRANT CONNECT THROUGH is not yet supported.

Table Statistics#

Teradata connector supports table and column statistics, as documented in Table Statistics. The statistics are collected by the Teradata database and retrieved by the connector. To collect statistics for a table, execute the following statements in Teradata Database. The second statement needs to be repeated for every column you want to collect statistics for.

COLLECT SUMMARY STATISTICS ON table_schema.table_name;
COLLECT STATISTICS COLUMN(column_name) ON table_schema.table_name;

The table and column statistics can be viewed in Presto using SHOW STATS and are used for Cost based optimizations.

Table Statistics Configuration Properties#

Property Name Description Default
statistics.enabled Enables table and column statistics. true
statistics.cache-ttl Duration for which table and column statistics are cached. 10m
statistics.cache-missing Cache the fact that table statistics were not available. false

Mapping Data Types Between Presto and Teradata#

Presto and Teradata each support different data types for table columns and use different names for some of them. Some data types are not supported equally by both systems. The following tables display the mapping used by Presto when working with existing columns, and when creating tables in Teradata.

Mappings for working with existing fields in Teradata:

Teradata data type Presto data type
BYTEINT TINYINT
SMALLINT SMALLINT
INTEGER INTEGER
BIGINT BIGINT
REAL REAL
FLOAT REAL
DOUBLE DOUBLE
NUMERIC DECIMAL
DECIMAL DECIMAL
CHAR CHAR
NCHAR CHAR
VARCHAR VARCHAR
NVARCHAR VARCHAR
LONGVARCHAR VARCHAR
LONGNVARCHAR VARCHAR
DATE DATE
TIMESTAMP TIMESTAMP
TIMESTAMP WITH TIME ZONE not supported
JSON not supported
XML not supported

Any Teradata data types not listed above, such as JSON or XML and others, are not supported.

If a Teradata column uses a type not listed in the above table, any query accessing such a field fails.

Mappings for creating tables with specific fields in Teradata:

Presto data type Teradata data type
TINYINT BYTEINT
SMALLINT SMALLINT
INTEGER INTEGER
BIGINT BIGINT
REAL REAL
DOUBLE DOUBLE PRECISION
DATE DATE
TIMESTAMP TIMESTAMP
CHAR CHAR
VARCHAR VARCHAR

The mapping above applies to creating a table with CREATE TABLE and CREATE TABLE ... AS [SELECT] statements.

Any Presto data type not listed in the above table can not be created as a column in a new Teradata table.

Configuration#

Configuration Property Name Session Property name Description Default
jdbc-types-mapped-to-varchar   Coma separated list of data types that will be converted to unbounded VARCHAR.  
unsupported-type.handling-strategy unsupported_type_handling_strategy
Configures how unsupported column data types are handled:
  • IGNORE - column is not accessible.
  • CONVERT_TO_VARCHAR - column is converted to unbounded VARCHAR.
IGNORE
teradata.number.default-scale number_default_scale Default Presto DECIMAL scale for Teradata NUMBER (without precision and scale) date type. When not set then such column is treated as not supported. 0
teradata.number.rounding-mode number_rounding_mode

Rounding mode for the Teradata NUMBER data type. Possible values are:

  • UNNECESSARY - Rounding mode to assert that the requested operation has an exact result, hence no rounding is necessary.
  • CEILING - Rounding mode to round towards positive infinity.
  • FLOOR - Rounding mode to round towards negative infinity.
  • HALF_DOWN - Rounding mode to round towards nearest neighbor unless both neighbors are equidistant, in which case rounding down is used.
  • HALF_EVEN - Rounding mode to round towards the nearest neighbor unless both neighbors are equidistant, in which case rounding towards the nearest neighbor is performed.
  • HALF_UP - Rounding mode to round towards nearest neighbor unless both neighbors are equidistant, in which case rounding up is used
  • UP - Rounding mode to round towards zero.
  • DOWN - Rounding mode to round towards zero.
UNNECESSARY