13.1. Migrating From Hive#
Presto uses ANSI SQL syntax and semantics, whereas Hive uses a SQL-like language called HiveQL which is loosely modeled after MySQL (which itself has many differences from ANSI SQL).
Use subscript for accessing a dynamic index of an array instead of a udf#
The subscript operator in SQL supports full expressions, unlike Hive (which only supports constants). Therefore you can write queries like:
SELECT my_array[CARDINALITY(my_array)] as last_element FROM ...
Avoid out of bounds access of arrays#
Accessing out of bounds elements of an array will result in an exception. You can avoid this with an
if as follows:
SELECT IF(CARDINALITY(my_array) >= 3, my_array, NULL) FROM ...
Use ANSI SQL syntax for arrays#
Arrays are indexed starting from 1, not from 0:
SELECT my_array AS first_element FROM ...
Construct arrays with ANSI syntax:
SELECT ARRAY[1, 2, 3] AS my_array
Use ANSI SQL syntax for identifiers and strings#
Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:
SELECT name AS "User Name" FROM "7day_active" WHERE name = 'foo'
Quote identifiers that start with numbers#
Identifiers that start with numbers are not legal in ANSI SQL and must be quoted using double quotes:
SELECT * FROM "7day_active"
Use the standard string concatenation operator#
Use the ANSI SQL string concatenation operator:
SELECT a || b || c FROM ...
Use standard types for CAST targets#
The following standard types are supported for
SELECT CAST(x AS varchar) , CAST(x AS bigint) , CAST(x AS double) , CAST(x AS boolean) FROM ...
In particular, use
VARCHAR instead of
Use CAST when dividing integers#
Presto follows the standard behavior of performing integer division when dividing two integers. For example, dividing
2 will result in
To perform floating point division on two integers, cast one of them to a double:
SELECT CAST(5 AS DOUBLE) / 2
Use WITH for complex expressions or queries#
When you want to re-use a complex output expression as a filter, use either an inline subquery or factor it out using the
WITH a AS ( SELECT substr(name, 1, 3) x FROM ... ) SELECT * FROM a WHERE x = 'foo'
Use UNNEST to expand arrays and maps#
Presto supports UNNEST for expanding arrays and maps.
UNNEST instead of
LATERAL VIEW explode().
SELECT student, score FROM tests LATERAL VIEW explode(scores) t AS score;
SELECT student, score FROM tests CROSS JOIN UNNEST(scores) AS t (score);
Caution with datediff#
datediff function returns the difference between the two dates in
days and is declared as:
datediff(string enddate, string startdate) -> integer
The equivalent Presto function date_diff uses a reverse order for the two date parameters and requires a unit. This has to be taken into account when migrating:
date_diff('day', startdate, enddate)
Overwriting data on insert#
INSERT queries are not allowed to overwrite existing data. You
can use the catalog session property
allow overwrites. Prepend the name of the catalog using the Hive connector, for
hdfs, and set the property in the session before you run the insert
SET SESSION hdfs.insert_existing_partitions_behavior = 'OVERWRITE'; INSERT INTO hdfs.schema.table ...
The resulting behavior is equivalent to using INSERT OVERWRITE in Hive.