4.1. Overview of Presto on Azure

General Description

Starburst Presto is available on the Azure HDInsight platform. Azure HDInsight is a fully managed cloud service for processing massive amounts of data quickly and cost-effectively. Presto and HDInsight were both designed around the separation of storage and compute, which lets you balance performance and cost flexibly. You pay only for what you use by creating clusters on demand and scaling them up and down. You can read more about Azure HDInsight from Microsoft.

Presto is deployed as an application on Azure HDInsight and can be configured to start querying data in Azure Storage immediately. You can install Presto on an existing HDInsight cluster or create a new cluster for it. Additionally, Presto can use the external metastore feature of HDInsight to share metadata with other clusters, such as Hive and Spark clusters.

Try Presto on Microsoft Azure Marketplace!

Releases & Software

Presto 0.203-e

Starburst Presto on Azure HDInsight is based on version 0.203-e (referred to here as 203e). Presto 203e adds many features and patches on top of prestodb/presto version 0.203. Notably, it includes the state-of-the-art cost-based optimizer for fast query results.

Apache Superset

Packaged alongside Presto is the business intelligence tool Apache Superset. Superset is a data exploration and visualization web application. With it, you can write SQL queries, create new tables, create a visualization (slice), add that visualization to one or more dashboards, and download results as a CSV file. Superset’s SQL Lab IDE lets you both query and visualize data: you can explore and preview tables in Presto and compose SQL queries to access data. From there, you can export a CSV file or immediately visualize your data in the Superset “Explore” view.

For more information on Apache Superset, please refer to our dedicated section: Using Apache Superset.

Architecture

Starburst Presto is installed as an application on the Azure HDInsight Hadoop cluster. The cluster contains two HDInsight head nodes and a variable number of HDInsight worker nodes. The Presto coordinator is installed on one of the two head nodes, and the Presto workers are installed on the HDInsight worker nodes. When the HDInsight cluster is scaled up or down, Presto scales with it.

The deployment also includes an HDInsight edge node. The Presto command line interface (CLI) and Apache Superset are installed on this node.
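The layout described above can be pictured with a small model. This is only an illustrative sketch, not part of the product: the ClusterLayout class and its attribute names are hypothetical, and it assumes one Presto worker per HDInsight worker node, which the text implies but does not state outright.

```python
from dataclasses import dataclass

# The architecture above fixes the number of HDInsight head nodes at two.
HEAD_NODE_COUNT = 2


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical model of where Presto roles land on an HDInsight cluster."""

    worker_nodes: int  # number of HDInsight worker nodes

    def __post_init__(self):
        if self.worker_nodes < 1:
            raise ValueError("a cluster needs at least one worker node")

    @property
    def presto_coordinators(self) -> int:
        # The coordinator runs on exactly one of the two head nodes.
        return 1

    @property
    def presto_workers(self) -> int:
        # Assumption: one Presto worker per HDInsight worker node.
        return self.worker_nodes

    def scaled_to(self, worker_nodes: int) -> "ClusterLayout":
        # Scaling HDInsight changes the worker node count; Presto follows.
        return ClusterLayout(worker_nodes)


layout = ClusterLayout(worker_nodes=4)
bigger = layout.scaled_to(8)
print(layout.presto_workers, bigger.presto_workers)
```

The coordinator count stays at one regardless of cluster size; only the worker count changes when the cluster is resized.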

Deployment Recommendations

We strongly recommend not using the Presto HDInsight cluster for any other compute-intensive work while Presto is in use. Presto is designed to assume full use of the system resources in order to achieve maximum performance. For example, do not run Hive jobs and Presto queries simultaneously on the same cluster. However, you can run them on the same cluster at different times.
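One way to apply this recommendation is to give Presto and other workloads separate time windows and check that they never overlap. The following is a minimal sketch under that assumption; the Window type and the overlaps function are hypothetical helpers, not part of Presto or HDInsight, and windows are assumed not to cross midnight.

```python
from datetime import time
from typing import NamedTuple


class Window(NamedTuple):
    """A daily time window; end is exclusive and must be after start."""

    start: time
    end: time


def overlaps(a: Window, b: Window) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


presto_hours = Window(time(8), time(18))   # example values only
hive_batch = Window(time(20), time(23))    # example values only

if overlaps(presto_hours, hive_batch):
    print("Hive jobs would compete with Presto queries; reschedule one of them.")
else:
    print("Windows are disjoint; the workloads can share the cluster.")
```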

Using Azure PowerShell and Azure CLI

This guide primarily refers to the Azure Portal when describing various actions. The same functionality is also available through Azure PowerShell and Azure CLI. Refer to the following documentation from Microsoft on how to manage Azure HDInsight clusters using these tools:

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-command-line

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-powershell
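As a rough illustration of the command-line route, the sketch below assembles an Azure CLI call that resizes an HDInsight cluster. The helper build_resize_command and the resource group and cluster names are hypothetical. The flag names reflect the az hdinsight resize command as we understand it; confirm them with az hdinsight resize --help for your installed CLI version before relying on them. The script only prints the command and does not run it.

```python
import shlex
from typing import List


def build_resize_command(resource_group: str, cluster_name: str,
                         worker_count: int) -> List[str]:
    """Build (but do not run) an Azure CLI command that resizes a cluster."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return [
        "az", "hdinsight", "resize",
        "--resource-group", resource_group,
        "--name", cluster_name,
        # Flag name assumed from the Azure CLI reference; verify with --help.
        "--workernode-count", str(worker_count),
    ]


# Placeholder names; substitute your own resource group and cluster.
command = build_resize_command("my-resource-group", "my-presto-cluster", 6)
print(shlex.join(command))
```

Building the command as a list of arguments rather than a single string avoids quoting problems if it is later passed to subprocess.run.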

Presto Support Options

At Starburst we pride ourselves on being the Presto experts. Need support for your organization’s production and development environments for Presto? We have got you covered with 24x7 support as part of our Enterprise Subscription.

Contact us for more information about our offerings and explore our other services on our website.