Synapse is Microsoft’s new unified cloud analytics platform. At a glance you might be forgiven for thinking it’s just a re-branding exercise of existing services, and Microsoft doesn’t help by describing Synapse as “Azure SQL Data Warehouse evolved” — it’s far more than that. Here are five reasons we think you should take a serious look at Synapse for your next data analytics project.
1. Unified data platform
Azure offers a broad array of data services. Individually they are compelling, but choosing the right services and integrating them has always been the difficult part. Azure Synapse reduces this friction by bringing together the best of Azure’s existing data services, along with some powerful new features, and making them play together nicely. Services that you know and love include Azure Data Factory, Mapping Data Flows, Power BI and of course SQL Pools (formerly SQL Data Warehouse). I’ll be covering some of the new services and capabilities shortly.
What this means is that it’s now possible to explore data, run experiments, develop pipelines and operationalise solutions from a single intuitive web-based UI. While it’s still early days, it is starting to feel like these services were finally designed to work together, and I’m sure this will become more apparent as the product matures.
2. SQL on your terms
We have become familiar with services that allow you to use SQL to query across a range of data sources and formats. Synapse goes one step further and gives you the option to run these workloads either on demand, via a serverless compute model, or via provisioned capacity known as SQL Pools. I’m particularly excited about the new SQL Serverless option, which allows you to query data stored in your storage account or data lake without the need to spin up any clusters or compute resources. You simply point your query at your data and you are charged based on the amount of data you read. What’s more, almost any existing tooling that supports SQL Server will just work. Currently supported formats are CSV, Parquet and JSON, and it is even possible to query Spark tables without a Spark cluster running. This is just one area where you see closer integration between these services. If you are looking for an alternative to Azure Data Lake Analytics then this is definitely one worth considering. We have been working closely with the SQL Serverless product group to provide feedback and help ensure it supports our customers’ needs. I will be sharing some of our detailed performance analysis in a separate post, so watch out for that.
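To give a flavour of how low the barrier is, here is a minimal sketch of a SQL Serverless query over Parquet files sitting in a data lake. The storage account, container and path are hypothetical placeholders — substitute your own:

```sql
-- Query Parquet files directly from the data lake; no cluster required.
-- Storage URL and path below are hypothetical placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/mycontainer/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
```

Because you are billed on the data the query reads, columnar formats such as Parquet and sensible folder partitioning help keep costs down.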
If you prefer provisioned capacity in the form of SQL Data Warehouse then go ahead and use SQL Pools. This is the same service that you know and love, with some added features and integrations. However, be aware that the two models do not yet have feature parity; this gap will close over time.
I’d love to see a notebook experience built into Synapse Studio for SQL Serverless and SQL Pools. In the meantime, Azure Data Studio does provide SQL notebook support.
3. Seamless Spark
You may be surprised to learn that Synapse offers its own managed Spark environment based on Apache Spark 2.4 with Python 3.6. There is none of the usual messing around with configuration before being productive. Security and sign-on are seamless, and connecting to storage accounts is very straightforward. Cluster start-up times feel snappier compared to the likes of Databricks, which will be welcome news if you are running Spark jobs as part of your data pipelines. If you are a fan of Delta Lake then you won’t be disappointed: the open-source Linux Foundation Delta Lake 0.6 is supported. Synapse Studio comes with an nteract-based notebook experience for documenting your experiments.
If you come from a .NET background like me, you’ll be delighted to learn that Synapse comes with .NET bindings for Spark, meaning you can now take advantage of Spark without needing to learn a new language. The notebook experience allows you to write all your analytics queries in .NET, or even to mix and match with more traditional options such as PySpark, Scala and Spark SQL. The .NET notebook experience comes courtesy of the excellent .NET Interactive. As of launch, .NET Core 3.1 is supported. It is also possible to create Spark programs in .NET using Visual Studio and publish these for execution as Spark jobs. This really is a game changer for organisations with existing .NET teams and skills.
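As a sketch of the notebook experience, the same Spark pool can be driven from a Spark SQL cell just as easily as from .NET or PySpark. The table name and lake path below are hypothetical placeholders:

```sql
-- Spark SQL cell in a Synapse notebook: register a Delta table over
-- files in the lake, then aggregate it. Table name and path are
-- hypothetical placeholders.
CREATE TABLE IF NOT EXISTS sales_delta
USING DELTA
LOCATION '/delta/sales';

SELECT region, SUM(amount) AS total_sales
FROM sales_delta
GROUP BY region;
```

Tables created this way also appear in the workspace metadata, which is what makes them queryable from SQL Serverless without a Spark cluster running.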
4. HTAP with Cosmos DB
You may be aware that Cosmos DB can now automatically write data to an optimized analytical store. It is now possible to query this store from SQL Serverless or Spark via Synapse Link. Imagine you have an application database backed by Cosmos DB and you want to access this data for exploration and analytical purposes without affecting your production system. Traditionally, you would need to build an ETL process, possibly via the change feed, to replicate your data into a data lake. With Synapse Link and the Cosmos DB analytical store you can now perform powerful analytical queries over your application data from Azure Synapse within minutes of it arriving in your database. Setting it up is a breeze, and once up and running you can run analytical queries using SQL Serverless or Spark; you can even join and combine this data with other data you have in your data lake. Pretty cool, eh!
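For illustration, querying the analytical store from SQL Serverless looks something like the following sketch. The account name, database, key and container are all hypothetical placeholders, and the exact connection options may vary as the feature matures:

```sql
-- Query the Cosmos DB analytical store directly from SQL Serverless.
-- Account, database, key and container names are hypothetical placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
    'CosmosDB',
    'Account=myCosmosAccount;Database=mystore;Key=<account-key>',
    Orders
) AS orders;
```

The result set is just another rowset, so joining it to Parquet or CSV data in your lake is a plain SQL join.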
5. Integrated Power BI
The integrations don’t stop there: Power BI workspaces also integrate directly with Synapse. Not only can you access reports and datasets within Synapse Studio, you can easily create new datasets and reports from the data you’ve curated in Azure Synapse. All the usual data connectors are available, and since SQL Serverless looks like any regular SQL database you can easily run powerful analytical queries during import. Power BI has previously been focused on business users, but this move puts it firmly in the hands of data professionals. A clever and very welcome move.