Concepts

Starburst Enterprise platform (SEP) and Starburst Galaxy both embed Trino and therefore share many concepts. Understanding these basics helps you use our products efficiently.

The following sections detail the core concepts behind the Starburst query engine and products.

Trino

Trino (formerly Presto® SQL) is the fastest open source, massively parallel processing SQL query engine designed for analytics of large datasets distributed over one or more data sources in object storage, databases and other systems.

Learn more about Trino.

Starburst Galaxy

Starburst Galaxy is an easy-to-use, fully managed, enterprise-ready SaaS offering of Trino. Configure your data sources and query your data wherever it lives. Starburst takes care of the rest, so you can concentrate on analytics.

Get started with Starburst Galaxy.

Starburst Enterprise

Starburst Enterprise is a fully supported, enterprise-grade distribution of Trino. SEP adds integrations, improves performance, adds security features, and makes it easy to deploy, configure, and manage your clusters.

Get started with Starburst Enterprise.

Open source software

Trino is distributed as open source software under the Apache License and is maintained by a community of contributors from around the globe. The founders and many core contributors of Trino work at Starburst, where they lead the project and help it grow.

Data sources

Starburst products build on Trino’s ability to efficiently query data distributed in different platforms throughout your organization. From small RDBMS-based data sources to large, multiple-petabyte object store data sources, both SEP and Starburst Galaxy allow you to query these disparate sources at the same time, in the same query, with response times fast enough to support real-time analysis.

In both products, data sources are defined as catalogs. The catalog name is the first part of an object's fully qualified name in queries, which takes the form catalog.schema.table.
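As an illustration of how catalogs appear in queries, the Python sketch below splits a three-part name into catalog, schema, and table, and then builds one SQL statement that joins tables from two catalogs. The catalog, schema, and table names (`postgresql`, `datalake`, `customers`, `orders`) and the `parse_qualified_name` helper are hypothetical and are not part of any Starburst API. The sketch also ignores quoted identifiers that contain dots.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    """The three parts of a fully qualified table name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str


def parse_qualified_name(name: str) -> QualifiedName:
    # Hypothetical helper: split "catalog.schema.table" on dots.
    # Quoted identifiers that contain dots are not handled.
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return QualifiedName(*parts)


# Hypothetical catalogs: one for a PostgreSQL database, one for object storage.
customers = parse_qualified_name("postgresql.public.customers")
orders = parse_qualified_name("datalake.sales.orders")

# A single SQL statement that reads from two different catalogs.
federated_query = f"""
SELECT c.name, sum(o.total) AS revenue
FROM {'.'.join(customers)} AS c
JOIN {'.'.join(orders)} AS o ON o.customer_id = c.id
GROUP BY c.name
"""

print(customers.catalog, orders.catalog)
print(federated_query)
```

Because the catalog is part of every table reference, the same query can mix sources without any extra configuration at query time.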

Analytics

Analytics is the process of systematically inspecting and manipulating data or statistics to better understand patterns and characteristics of the data and its origins. Trino is designed for analytics processing with SQL.

Massively parallel processing

Massively parallel processing (MPP) is an architecture for distributed workload processing. Multiple server nodes collaborate in a cluster.

The coordinator node receives a query written in SQL from a user. The coordinator analyzes and plans the query execution. It then adapts the plan to the number of worker nodes in the cluster and distributes the workload for parallel processing across all workers.

The workers load and process data at the same time, then return their results to the coordinator, which passes the final result to the user. Because the work is split across many nodes, results are returned much faster than a single-node architecture can achieve.
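The coordinator and worker flow can be modeled in a few lines of Python, with a thread pool standing in for worker nodes. This is a toy sketch under stated assumptions, not how Trino actually plans or schedules queries: the "query" computes an average, the coordinator splits the rows into one split per worker, each worker computes a partial sum and count, and the coordinator merges the partial results. The names `plan_splits`, `worker_task`, and `run_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows: list[int], num_workers: int) -> list[list[int]]:
    # Coordinator step (hypothetical): divide the input into one split per worker.
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_task(split: list[int]) -> tuple[int, int]:
    # Worker step: compute a partial aggregate (sum and count) over one split.
    return sum(split), len(split)


def run_query(rows: list[int], num_workers: int) -> float:
    splits = plan_splits(rows, num_workers)
    # Each thread stands in for a worker node processing its split in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    # Coordinator step: merge the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


rows = list(range(1, 101))  # the values 1..100
print(run_query(rows, num_workers=4))  # 50.5
```

Workers return a sum and a count rather than their own averages, so the merged result stays correct even when splits have different sizes.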

SQL

Structured Query Language (SQL) is a domain-specific language for data access and data manipulation. It is the industry standard, with a long history and wide adoption by users and tools alike.

Trino uses SQL as its query language for analytics of the data in any connected data source.

Query engine

A query engine is a system designed to receive queries, process them, and return results to the users.

Unlike a database system, a query engine does not include a storage engine for managing the actual data in files, objects, or memory. Instead, a query engine integrates with many storage engines and can therefore query multiple systems at the same time.

These systems can be relational databases, data warehouses, object storage systems implementing a data lake, or even very different systems that simply expose an API to retrieve data.

Trino is a query engine, and not a database system.
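The distinction between a query engine and a database system can be sketched with a toy Python model: the engine below stores no data itself and reads rows only through pluggable connectors, so one query can combine two different systems. The `Connector` protocol, the two in-memory connectors, and the catalog names are hypothetical simplifications; they do not reflect the actual Trino connector SPI, which is written in Java.

```python
from typing import Iterable, Protocol

Row = dict[str, object]


class Connector(Protocol):
    # Hypothetical interface: a connector knows how to read rows from one system.
    def scan(self, schema: str, table: str) -> Iterable[Row]: ...


class RelationalDbConnector:
    """Stand-in for a relational database; the data lives in that system."""

    def __init__(self, tables: dict[tuple[str, str], list[Row]]):
        self._tables = tables

    def scan(self, schema: str, table: str) -> Iterable[Row]:
        return iter(self._tables[(schema, table)])


class CsvObjectStoreConnector:
    """Stand-in for object storage holding CSV files keyed by path."""

    def __init__(self, files: dict[str, str]):
        self._files = files

    def scan(self, schema: str, table: str) -> Iterable[Row]:
        header, *lines = self._files[f"{schema}/{table}.csv"].strip().splitlines()
        columns = header.split(",")
        for line in lines:
            yield dict(zip(columns, line.split(",")))


class QueryEngine:
    """Holds no data; it only routes reads to the connector for each catalog."""

    def __init__(self, catalogs: dict[str, Connector]):
        self._catalogs = catalogs

    def scan(self, qualified_name: str) -> Iterable[Row]:
        catalog, schema, table = qualified_name.split(".")
        return self._catalogs[catalog].scan(schema, table)


engine = QueryEngine({
    "postgresql": RelationalDbConnector(
        {("public", "customers"): [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Lin"}]}
    ),
    "datalake": CsvObjectStoreConnector(
        {"sales/orders.csv": "customer_id,total\n1,30\n2,12\n1,8\n"}
    ),
})

# A join across two systems: total order value per customer name.
names = {row["id"]: row["name"] for row in engine.scan("postgresql.public.customers")}
revenue: dict[str, int] = {}
for order in engine.scan("datalake.sales.orders"):
    name = names[order["customer_id"]]
    revenue[name] = revenue.get(name, 0) + int(order["total"])
print(revenue)  # {'Ada': 38, 'Lin': 12}
```

Adding another data source in this model means registering one more connector; the engine code itself does not change.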

Object storage

An object storage system is a data source that stores data in files organized in a directory structure, rather than in a relational database. Common examples of object storage systems include Amazon S3 and Azure Data Lake Storage (ADLS).

Querying object storage systems requires a metastore, which is a repository of metadata about the stored data, such as the file format, the directory structure, and the location of the data within the files.
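To make the metastore's role more concrete, the Python sketch below models a minimal metastore as a dictionary of table entries, each recording a file format, a base location, and partition columns that map to directories. Looking up a table's location and format, and narrowing the read to one partition directory, is roughly the kind of lookup a query engine performs before scanning object storage. The `TableEntry` fields, the `s3://example-bucket/...` paths, and the `files_to_scan` helper are hypothetical and do not reproduce the Hive metastore or AWS Glue data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableEntry:
    # Hypothetical metastore record describing one table stored as files.
    file_format: str                     # for example "PARQUET" or "ORC"
    location: str                        # base directory of the table's files
    partition_columns: list[str] = field(default_factory=list)


metastore: dict[str, TableEntry] = {
    "sales.orders": TableEntry("PARQUET", "s3://example-bucket/sales/orders", ["order_date"]),
}

# Hypothetical object store contents: the files that exist under the table's location.
object_store = [
    "s3://example-bucket/sales/orders/order_date=2024-01-01/part-0.parquet",
    "s3://example-bucket/sales/orders/order_date=2024-01-02/part-0.parquet",
    "s3://example-bucket/sales/orders/order_date=2024-01-02/part-1.parquet",
]


def files_to_scan(table: str, partition_value: Optional[str] = None) -> list[str]:
    # Look up where the table is stored, then select the matching files.
    entry = metastore[table]
    prefix = entry.location + "/"
    if partition_value is not None and entry.partition_columns:
        # Partition pruning: only read the directory for the requested value.
        prefix += f"{entry.partition_columns[0]}={partition_value}/"
    return [path for path in object_store if path.startswith(prefix)]


print(metastore["sales.orders"].file_format)  # PARQUET
print(len(files_to_scan("sales.orders")))     # 3
print(files_to_scan("sales.orders", "2024-01-02"))
```

Without the metastore entry, the engine in this model would have no way to know that the files are Parquet or that the directory names encode the order_date column.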

For more information, see the object storage systems introduction.

Next steps

Want to learn even more? Try these sources: