12.1. Data Types#
Presto has a set of built-in data types, described below. Additional types can be provided by plugins.
Connectors are not required to support all types. See connector documentation for details on supported types.
A 8-bit signed two’s complement integer with a minimum value of
-2^7and a maximum value of
2^7 - 1.
A 16-bit signed two’s complement integer with a minimum value of
-2^15and a maximum value of
2^15 - 1.
A 32-bit signed two’s complement integer with a minimum value of
-2^31and a maximum value of
2^31 - 1. The name
INTis also available for this type.
A 64-bit signed two’s complement integer with a minimum value of
-2^63and a maximum value of
2^63 - 1.
A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.
A double is a 64-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.
A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.
The decimal type takes two literal parameters:
precision - total number of digits
scale - number of digits in fractional part. Scale is optional and defaults to 0.
Example type definitions:
Variable length character data with an optional maximum length.
Example type definitions:
SQL statements support simple literal, as well as Unicode usage:
literal string :
'Hello winter !'
Unicode string with default escape character:
U&'Hello winter \2603 !'
Unicode string with custom escape character:
U&'Hello winter #2603 !' UESCAPE '#'
A Unicode string is prefixed with
U&and requires an escape character before any Unicode character usage with 4 digits. In the examples above
#2603represent a snowman character. Long Unicode codes with 6 digits require usage of the plus symbol before the code. For example, you need to use
\+01F600for a grinning face emoji.
Fixed length character data. A
CHARtype without length specified has a default length of 1. A
CHAR(x)value always has
xcharacters. For instance, casting
CHAR(7)adds 4 implicit trailing spaces. Leading and trailing spaces are included in comparisons of
CHARvalues. As a result, two character values with different lengths (
x != y) will never be equal.
Example type definitions:
Variable length binary data.
SQL statements support usage of binary data with the prefix
X. The binary data has to use hexadecimal format. For example, the binary form of
Binary strings with length are not yet supported:
JSON value type, which can be a JSON object, a JSON array, a JSON number, a JSON string,
Date and Time#
Calendar date (year, month, day).
Time of day (hour, minute, second, millisecond) without a time zone. Values of this type are parsed and rendered in the session time zone.
TIME WITH TIME ZONE#
Time of day (hour, minute, second, millisecond) with a time zone. Values of this type are rendered using the time zone from the value.
TIME '01:02:03.456 America/Los_Angeles'
TIMESTAMPis an alias for
Instant in time that includes the date and time of day without a time zone with
Pdigits of precision for the fraction of seconds. A precision of up to 12 (picoseconds) is supported. Values of this type are parsed and rendered in the session time zone.
TIMESTAMP(P) WITHOUT TIME ZONEis an equivalent name.
Timestamp values can be constructed with the
TIMESTAMPliteral expression. Alternatively, language constructs such as
localtimestamp(p), or a number of date and time functions and operators can return timestamp values.
Casting to lower precision causes the value to be rounded, and not truncated. Casting to higher precision appends zeros for the additional digits.
The following examples illustrate the behavior:SELECT TIMESTAMP '2020-06-10 15:55:23'; -- 2020-06-10 15:55:23 SELECT TIMESTAMP '2020-06-10 15:55:23.383345'; -- 2020-06-10 15:55:23.383345 SELECT typeof(TIMESTAMP '2020-06-10 15:55:23.383345'); -- timestamp(6) SELECT cast(TIMESTAMP '2020-06-10 15:55:23.383345' as TIMESTAMP(1)); -- 2020-06-10 15:55:23.4 SELECT cast(TIMESTAMP '2020-06-10 15:55:23.383345' as TIMESTAMP(12)); -- 2020-06-10 15:55:23.383345000000
TIMESTAMP WITH TIME ZONE#
TIMESTAMP WITH TIME ZONEis an alias for
TIMESTAMP(3) WITH TIME ZONE(millisecond precision).
TIMESTAMP(P) WITH TIME ZONE#
Instant in time that includes the date and time of day with
Pdigits of precision for the fraction of seconds and with a time zone. Values of this type are rendered using the time zone from the value.
TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles'
INTERVAL YEAR TO MONTH#
Span of years and months.
INTERVAL '3' MONTH
INTERVAL DAY TO SECOND#
Span of days, hours, minutes, seconds and milliseconds.
INTERVAL '2' DAY
An array of the given component type.
ARRAY[1, 2, 3]
A map between the given component types.
MAP(ARRAY['foo', 'bar'], ARRAY[1, 2])
A structure made up of fields that allows mixed types. The fields may be of any SQL type.
By default, row fields are not named, but names can be assigned.
CAST(ROW(1, 2e0) AS ROW(x BIGINT, y DOUBLE))
Named row fields are accessed with field reference operator
CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE)).x
Named or unnamed row fields are accessed by position with the subscript operator
. The position starts at
1and must be a constant.
An IP address that can represent either an IPv4 or IPv6 address. Internally, the type is a pure IPv6 address. Support for IPv4 is handled using the IPv4-mapped IPv6 address range (RFC 4291#section-220.127.116.11). When creating an
IPADDRESS, IPv4 addresses will be mapped into that range. When formatting an
IPADDRESS, any address within the mapped range will be formatted as an IPv4 address. Other addresses will be formatted as IPv6 using the canonical format defined in RFC 5952.
A HyperLogLog sketch allows efficient computation of
approx_distinct(). It starts as a sparse representation, switching to a dense representation when it becomes more efficient.
A quantile digest (qdigest) is a summary structure which captures the approximate distribution of data for a given input set, and can be queried to retrieve approximate quantile values from the distribution. The level of accuracy for a qdigest is tunable, allowing for more precise results at the expense of space.
A qdigest can be used to give approximate answer to queries asking for what value belongs at a certain quantile. A useful property of qdigests is that they are additive, meaning they can be merged together without losing precision.
A qdigest may be helpful whenever the partial results of
approx_percentilecan be reused. For example, one may be interested in a daily reading of the 99th percentile values that are read over the course of a week. Instead of calculating the past week of data with
qdigests could be stored daily, and quickly merged to retrieve the 99th percentile value.