14.33. Teradata Parallel Direct Connector
The Teradata Parallel Direct Connector is an alternative connector for Teradata databases.
Compared to the Teradata Connector, it requires more advanced configuration and is
capable of achieving higher performance. It supports all use cases of the
Teradata connector, and adds greatly improved performance for
The connector is composed of components running on Presto, and others running on Teradata. All components participate in every transmission. A transmission includes all data transfer related to a single table scan.
The connector uses JDBC as a control channel to schedule the query in Teradata.
SELECT statement run with JDBC is passed to a Teradata Table Operator,
which then uses the custom transmitters to pass the data over HTTP to the Presto
controller and receivers:
- The controller starts and manages all transmissions. It runs on the Presto coordinator and distributes work to the receivers, and provides status information to the transmitters.
- The receivers connect to the transmitters and receive data from Teradata from the transmitters. They run on the Presto workers and are managed by the controller.
- Table Operator and JDBC Driver
- The table operator and JDBC driver read the data from the Teradata storage.
- The transmitters send the data for each table operator invocation on Teradata to the receivers on Presto.
Follow the documentation below to start using the connector:
The Teradata Parallel Direct connector needs to be configured in a catalog
properties file. e.g.
myteradatadb.properties, using the connector name
connector.name=teradata-parallel-direct connection-url=jdbc:teradata://HOST connection-user=USERNAME connection-password=PASSWORD teradata-parallel-direct.table-operator.name=some_schema.presto_table_operator teradata-parallel-direct.http.port=9000 teradata-parallel-direct.http.host=hostname.accessible.from.teradata.instance
Besides the connector name, you need to provide the JDBC connection parameters
and the relevant JAR files in the
teradata plugin directory, as discussed in
Teradata Connector. Additionally you need to configure the parameters for the
connector specifying the table operator name, and the HTTP port and host of the
Presto coordinator node.
Further tweaking can be done with the following parameters:
receiver_buffer_size in catalog session properties:
Determines the buffer size per each Presto worker. This buffer is available per table scan, so a single query joining three tables uses three buffers. Default value is 32MB.
transmitter_buffer_size in catalog session properties:
Determines the buffer size for each Teradata transmitter per table operator UDF invocation. This buffer is available per table operator, so a single query joining three tables uses three buffers. Default value is 32MB.
transmitter_threads_count in catalog session
Number of dedicated threads used for the data transfer. If it is set to 0, data transfer is handled by the UDF thread. Default value is 0.
Having more dedicated threads for transfer result in better performance, until network is saturated or data is not produced fast enough. The threads are used per single table scan, so setting the count too high may cause resource related issues, especially with concurrent queries and with multiple table scans.
parcel_size in catalog session properties:
The size of the data packets sent to the Presto receivers. The last packet may be smaller. Defaults to 2MB. Parcel size should not be higher than the buffer size. If you are using N number of dedicated threads for the transmitters, then you should use at least
2*N*stork.parcel.sizefor transmitter buffer.
compression_enabled in catalog session properties:
Determines if transmission data is compressed. Defaults to
splits_per_worker in catalog session properties:
Determines how many splits each Presto worker receives. This determines the maximum concurrency level on Presto. Defaults to 16, identical to the
task.concurrencyof the Presto server.
Aside from Teradata specific options, it is recommended to use JDBC connection configuration, configured in the catalog properties file, to cache the Teradata metadata on Presto and improve query performance.
metadata.cache-ttl=10m metadata.cache-missing=true statistics.cache-ttl=10m statistics.cache-missing=true
The connector can encrypt all data transmissions and use HTTPS. Add the details for the Java keystore file to use in the Presto server config.
Presto automatically distributes the keystore to the controller, the receivers, the transmitters and the table operator.
The connector supports Kerberos-based authentication with the following configuration:
teradata.authentication.type=KERBEROS teradata.kerberos.client.principal=test@TESTING-KRB.STARBURSTDATA.COM teradata.kerberos.client.keytab=krb/client/test.keytab
With this configuration the user
email@example.com, defined in the
principal property, is used to connect to the database, and the related Kerberos
service ticket is located in the
The connector can be configured to pass through Kerberos credentials, received by the Presto server, to the Teradata database.
Configure Kerberos and the Presto server, following the instructions in Credential Passthrough.
Configure the connector to pass through the credentials from the server to the database:
Now any database access via Presto is subject to the data access restrictions and permissions of the user supplied via Kerberos.
Contact Starburst Support to receive the required JAR file,
Teradata instance is up and running.
Teradata user with sufficient access rights to install JAR and create user defined functions UDF
GRANT CREATE EXTERNAL PROCEDURE ON syslib TO your_user WITH GRANT OPTION GRANT EXECUTE PROCEDURE ON sqlj.install_jar TO your_user WITH GRANT OPTION GRANT EXECUTE PROCEDURE ON sqlj.remove_jar TO your_user WITH GRANT OPTION GRANT EXECUTE PROCEDURE ON sqlj.replace_jar TO your_user WITH GRANT OPTION
Install the JAR located in
JARPATH on the Teradata server, and create the
table operator UDF using BTEQ or a JDBC-connected tool
CALL sqlj.install_jar('SJ!/presto/table/operator/jar/JARPATH',’presto',0) REPLACE FUNCTION presto_table_operator() RETURNS TABLE VARYING USING FUNCTION session_contract LANGUAGE JAVA NO SQL PARAMETER STYLE SQLTABLE EXTERNAL NAME 'presto:com.starburstdata.teradata.PrestoTableOperator.execute'
No Teradata restart is needed for the changes to take effect.
Each user that connects to Teradata from Presto needs to be granted sufficient access:
GRANT EXECUTE ON SPECIFIC FUNCTION test.presto_table_operator TO your_user
Alternatively access to table operator UDF can be granted via ROLE
CREATE ROLE presto_table_operator_users; GRANT EXECUTE ON SPECIFIC FUNCTION test.presto_table_operator TO presto_table_operator_users; GRANT presto_table_operator_users TO your_user;
This also requires to set the role for any impersonated user, this could be achieved with the below in teradata.properties file:
DROP FUNCTION presto_table_operator CALL sqlj.remove_jar(‘presto’',0)
It is recommended for upgrade to install the Presto table operator under a different name. The UDF function should also use a different name. That way it is possible to have different Presto table operator versions installed at the same time and an upgrade process does not affect the currently working system.