
Big Compute

In this blog from the Azure Advent Calendar 2019 we discuss building a secure data solution using Azure Data Lake. Data Lake has many features which enable fine-grained security and data separation. It is also built on Azure Storage, which lets us take advantage of all of the features of Azure Storage and means that ADLS is still a cost-effective storage option!

This post covers some of the great features of ADLS and walks through an example of how we build our solutions using this technology!


Very excited to be speaking at NDC in London in January! The talk is focused on “Combatting illegal fishing with Machine Learning and Azure” and will focus on the recent work we did with OceanMind. OceanMind are a not-for-profit who are working on cleaning up the world’s oceans with the help of Microsoft’s cloud technologies. […]


C#, Span and async

by Ian Griffiths

The addition of ref struct types, most notably Span<T>, opened C# to a range of high performance scenarios that were impractical to tackle with earlier versions of the language. However, they introduce some challenges. For example, they do not mix very well with async methods. This article shows some techniques for mitigating this.


We recently worked on a project which required us to build a highly performant system for processing vast quantities of messages in real time. We decided to run this processing using Azure Functions with C#. This post runs through some of the techniques we used for writing highly performant, low-allocation code, including data streaming, list preallocation and the relatively new C# feature: Span<T>.


Running Azure Functions in Docker on a Raspberry Pi 4

by Jonathan George

At our endjin team meet-up this week, we were all presented with Raspberry Pi 4Bs and told to go away and think of something good to do with them. I first bought a Raspberry Pi back in 2012 and have to admit, beyond installing XBMC and playing around with it, I haven’t done a […]


Import and export notebooks in Databricks

by Ed Freeman

Sometimes we need to import and export notebooks from a Databricks workspace. This might be because you have a bunch of generic notebooks that can be useful across numerous workspaces, or it could be that you’re having to delete your current workspace for some reason and therefore need to transfer content over to a new […]


Machine learning often seems like a black box. This post walks through what’s actually happening under the covers, in an attempt to de-mystify the process!

Neural networks are built up of neurons. In a shallow neural network we have an input layer, a “hidden” layer of neurons, and an output layer. For deep learning, there are simply more hidden layers, which allow the neurons’ inputs and outputs to be combined to build up a more detailed picture.

If you have an interest in Machine Learning and what is really happening, definitely give this a read (WARNING: Some algebra ahead…)!
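
To make the layer structure described above concrete, here is a minimal sketch of a forward pass through a shallow network, written in plain Python. It is an illustration under stated assumptions rather than the code from the post: the sigmoid activation, the function names and all of the weight and bias values are hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer.

    Each neuron takes a weighted sum of the inputs, adds its bias,
    and passes the result through the activation function.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def shallow_forward(inputs, hidden_weights, hidden_biases,
                    output_weights, output_biases):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_weights, hidden_biases)
    return layer_forward(hidden, output_weights, output_biases)


# Illustrative values only (assumptions, not taken from the post):
# two inputs, three hidden neurons, one output neuron.
example_output = shallow_forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]],
    hidden_biases=[0.0, 0.1, -0.1],
    output_weights=[[0.7, -0.8, 0.9]],
    output_biases=[0.05],
)
```

A deeper network would simply chain more calls to `layer_forward`, feeding each hidden layer's outputs into the next.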


Here at endjin we’ve done a lot of work around data analysis and ETL. As part of this we have done some work with Databricks Notebooks on Microsoft Azure. Notebooks can be used for complex and powerful data analysis using Spark. Spark is a “unified analytics engine for big data and machine learning”. It allows you to run data analysis workloads, and can be accessed via many APIs. This means that you can build up data processes and models using a language you feel comfortable with. Notebooks can also be run as an activity in an Azure Data Factory (ADF) pipeline, and combined with Mapping Data Flows to build up a complex ETL process which can be run via ADF.


Mapping Data Flows are a relatively new feature of ADF. They allow you to visually build up complex data transformation sequences. This can aid in the streamlining of data manipulation and ETL processes, without the need to write any code! This post gives a brief introduction to the technology, and what this could enable!


This post walks through the fix for DLL locking errors when trying to deploy an Azure Function. The solution was to switch over to the new “deploy from package” option when deploying the functions. This fixes the file locking problem because instead of deploying the DLLs, the function will run from a package file added to its directory.


When he joined endjin, Technical Fellow Ian sat down with founder Howard for a Q&A session. This was originally published on LinkedIn in 5 parts, but is republished here, in full. Ian talks about his path into computing, some highlights of his career, the evolution of the .NET ecosystem, AI, and the software engineering life.


Using Python inside SQL Server

by Ed Freeman

Hello everyone. Before Christmas I played around with SQL Server 2017’s inline Python integration capability. This capability was announced early last year, with the corresponding integration with R already being possible for a number of months. The main benefits from this are the abilities to: Eliminate data movement (having to transfer data samples from a database to […]


How to plan your cloud transformation journey

by Howard van Rooijen

This week I received an email from someone who asked how they could use our free Thought Leadership content to help their organisation move to the cloud. I realised that although we’ve released a lot of content, we’d never talked publicly about the rationale behind them and how they are all interconnected. Our Thought Leadership […]


Choosing the right cloud platform provider can be a daunting task. Take the big three: AWS, Azure, and Google Cloud Platform. Each offers a huge number of products and services, but understanding how they enable your specific needs is not easy. Since most organisations plan to migrate existing applications it is important to understand how […]


In this series, we’re comparing cloud services from AWS, Azure and Google Cloud Platform. A full breakdown and comparison of cloud providers and their services are available in this handy poster. We have assessed services across three typical migration strategies: Lift and shift – the cloud service can support running legacy systems with minimal change […]


We produced a booklet to coincide with our Future Decoded talk “The 100 Year Start-up: Embracing Disruption in Financial Services”, where we examine the challenges and opportunities in the Microsoft Cloud for the Financial Services Industry, covering the following topics: Security, Privacy & Data Sovereignty; Data Ingestion, Transformation & Enrichment; Big Compute; Big Data – […]


Azure Batch – Time is Money in Big Compute

by James Broome

Earlier in the year, endjin worked with the Azure Batch Product Team to run a series of experiments against the Azure Batch service using a framework we developed for performing scale, soak and performance tests. We’ve had conversations with a number of organisations over the last 5 years who have scaled their compute-intensive workloads (SAS, […]


A short while ago, I was trying to classify some data using Azure Machine Learning, but the training data was very imbalanced. In the attempt to build a useful model from this data, I came across the Synthetic Minority Oversampling Technique (SMOTE), an approach to dealing with imbalanced training data. This blog describes what I […]
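
As a rough illustration of the idea behind SMOTE, the sketch below creates synthetic minority-class points by interpolating between a randomly chosen minority sample and one of its nearest minority-class neighbours. It is a simplified, plain-Python sketch and not the Azure Machine Learning implementation used in the post: the function names, the default `k`, and the sample data are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=3, rng=None):
    """Generate n_synthetic new points from the minority-class samples.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if rng is None:
        rng = random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority_samples = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority_samples, n_synthetic=5, k=2, rng=random.Random(0))
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies instead of simply duplicating existing rows.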


Spinning up 16,000 A1 Virtual Machines on Azure Batch

by Howard van Rooijen

Big Compute, like Big Data, has a different meaning for every organisation; for Big Data this generally tends to be when data grows to a point where it can no longer be stored, queried, backed up, restored or processed easily on traditional database architectures. For Big Compute this tends to be when computation grows to […]