Big Data

C#, Span and async

by Ian Griffiths

C# 7.2 introduced support for ref struct types. These are types that are required to live on the stack. The most notable ref struct types are Span<T> and ReadOnlySpan<T>, which have made it possible to dramatically improve the efficiency of certain kinds of work by reducing or even eliminating object allocations. Endjin recently open sourced […]


Over the past few months I’ve worked on multiple projects involving reactive processing of large amounts of data. In a world where the volume of data is increasing at an almost inconceivable rate, being able to process this data efficiently and cheaply is vital. A large part of the work involved in these projects was […]


Import and export notebooks in Databricks

by Ed Freeman

Sometimes we need to import and export notebooks from a Databricks workspace. This might be because you have a bunch of generic notebooks that can be useful across numerous workspaces, or it could be that you’re having to delete your current workspace for some reason and therefore need to transfer content over to a new […]


A couple of weeks ago I had the opportunity to attend NDC in Oslo. It was an absolutely brilliant experience, and my head is still reeling a bit from everything I learnt! The focus of a lot of the talks was around neural networks and machine learning – something which we have explored quite a […]


Have you been trying to create a Databricks cluster using the CLI? Have you been getting infuriated by something seemingly so trivial? Well, join the club. Although, get ready to depart the club because I may have the solution you need. When creating a cluster using the CLI command databricks clusters create, you’re required to […]


I recently wrote a blog on using ADF Mapping Data Flow for data manipulation. As part of the same project, we also ported some of an existing ETL Jupyter notebook, written using the Python Pandas library, into a Databricks Notebook. This notebook could then be run as an activity in an ADF pipeline, and combined […]


Endjin is a Snowflake Partner

by Howard van Rooijen

I’m very pleased to announce that endjin has become a Snowflake partner. This fantastic “designed for the cloud” data platform redefines what a data warehouse can be in the age of cloud. With features such as data sharing, usage-based billing, and availability on Microsoft Azure, it has won our hearts. Over the last three years, we’ve […]


As part of a recent project we did a lot of experimentation with the new Azure Data Factory feature: Mapping Data Flows. The tool is still in preview, and more functionality is sure to be in the pipeline, but I think it opens up a lot of really exciting possibilities for visualising and building up […]


In the last post I explained how to create a set of Azure Functions that could load data into Snowflake as well as execute Snowflake queries and export the results into your favorite cloud storage solution. In this post I will show how we can use these functions in Azure Data Factory to plug Snowflake […]


If, like me, you are a fan of Azure Data Factory and love Snowflake then you are probably disappointed that there isn’t a native Data Factory connector for Snowflake. While we wait for an official connector from Microsoft we have no alternative but to roll our own. In this blog post I will walk you through […]


In September I joined endjin as a Technical Fellow (an entirely new branch in endjin’s career pathway to accommodate me – more on that later). I’ve been involved with endjin since 2011, as an Associate, helping to deliver some of our most technically challenging projects (and if you go even further back, I attended Cambridge University with endjin co-founder […]


We’re currently building a Data Governance Platform product that enables UK Financial Services organisations to discover and manage the life-cycle, usage, risk and compliance requirements of data assets across the organisation. Much of the core functionality is delivered using Cosmos DB’s Gremlin API to model data lineage and other relationships best represented by a graph […]


Overflowing with dataflow part 1: An overview

by Carmel Eve

In a recent project, I was asked to produce a tool for importing a fairly large amount of data at once; this data then needed to be processed and exported. After much refactoring, I arrived at a solution I was satisfied with, which used TPL Dataflow to execute the processing in parallel. Before I talk specifically […]


Using Python inside SQL Server

by Ed Freeman

Hello everyone. Before Christmas I played around with SQL Server 2017’s inline Python integration capability. This capability was announced early last year, with the corresponding integration with R already being possible for a number of months. The main benefits from this are the abilities to: Eliminate data movement (having to transfer data samples from a database to […]


How to plan your cloud transformation journey

by Howard van Rooijen

This week I received an email from someone who asked how they could use our free Thought Leadership content to help their organisation move to the cloud. I realised that although we’ve released a lot of content, we’d never talked publicly about the rationale behind them and how they are all interconnected. Our Thought Leadership […]


This post explains how to create a PowerBI report which sources data from two separate Azure SQL Databases. PowerBI offers two data access mechanisms: Import and DirectQuery. DirectQuery provides a range of benefits, the chief of which is that data is automatically refreshed. DirectQuery doesn’t import any data into PowerBI; instead it queries the data […]


Choosing the right cloud platform provider can be a daunting task. Take the big three, AWS, Azure and Google Cloud Platform: each offers a huge number of products and services, but understanding how they enable your specific needs is not easy. Since most organisations plan to migrate existing applications it is important to understand how […]


In this series, we’re comparing cloud services from AWS, Azure and Google Cloud Platform. A full breakdown and comparison of cloud providers and their services is available in this handy poster. We have assessed services across three typical migration strategies: Lift and shift – the cloud service can support running legacy systems with minimal change […]


In February 2016, I completed my second year of endjin’s three year custom apprenticeship scheme. This blog is a chance for me to reflect on what was learnt over the year – hopefully others will find it useful too. Year one had involved a very steep learning curve as I transitioned from student to graduate […]


We produced a booklet to coincide with our Future Decoded talk “The 100 Year Start-up: Embracing Disruption in Financial Services”, where we examine the challenges and opportunities in the Microsoft Cloud for the Financial Services Industry, covering the following topics: Security, Privacy & Data Sovereignty; Data Ingestion, Transformation & Enrichment; Big Compute; Big Data – […]