
Big Compute

Optimising C# for a serverless environment

by Carmel Eve

In our recent project with OceanMind we used #AzureFunctions to process marine vessel telemetry from around the world. This involved processing huge quantities of data in close to real time. We optimised our processing for a #serverless environment, with the result that the compute would cost less than £10 / month!

This post summarises some of the techniques we used, including some concrete examples of optimisations we made.

#bigdata #dataprocessing #dataanalysis #bigcompute

In this blog from the Azure Advent Calendar 2019 we discuss building a secure data solution using Azure Data Lake. Data Lake has many features which enable fine-grained security and data separation. It is also built on Azure Storage, which lets us take advantage of all of Azure Storage's features and means that ADLS is still a cost-effective storage option!

This post runs through some of the great features of ADLS and walks through an example of how we build our solutions using this technology!

Very excited to be speaking at NDC in London in January! The talk is focused on “Combatting illegal fishing with Machine Learning and Azure” and will focus on the recent work we did with OceanMind. OceanMind are a not-for-profit who are working on cleaning up the world’s oceans with the help of Microsoft’s cloud technologies. […]

C#, Span and async

by Ian Griffiths

The addition of ref struct types, most notably Span, opened C# to a range of high performance scenarios that were impractical to tackle with earlier versions of the language. However, they introduce some challenges. For example, they do not mix very well with async methods. This article shows some techniques for mitigating this.

We worked on a project recently which required us to build a highly performant system for processing vast quantities of messages in real time. We had made the decision to run this processing using Azure Functions with C#. This post runs through some of the techniques we used for writing highly performant, low allocation code, including data streaming, list preallocation and the relatively new C# feature: Span.

Running Azure functions in Docker on a Raspberry Pi 4

by Jonathan George

For one of my first experiments with the Raspberry Pi 4, I decided to get an Azure Function running in a Docker container. This post gives a step-by-step guide on how to do it, as well as providing code you can use as a starting point for your own experiments.

Import and export notebooks in Databricks

by Ed Freeman

Sometimes it’s necessary to import and export notebooks from a Databricks workspace. This might be because you have some generic notebooks that can be useful across numerous workspaces, or it could be that you’re having to delete your current workspace for some reason and therefore need to transfer content over to a new workspace. Importing and exporting can be done either manually or programmatically. In this blog, we outline a way to recursively export/import a directory and its files from/to a Databricks workspace.

Machine learning often seems like a black box. This post walks through what’s actually happening under the covers, in an attempt to de-mystify the process!

Neural networks are built up of neurons. In a shallow neural network we have an input layer, a “hidden” layer of neurons, and an output layer. For deep learning, there are simply more hidden layers, which allows for combining neurons’ inputs and outputs to build up a more detailed picture.

If you have an interest in Machine Learning and what is really happening, definitely give this a read (WARNING: Some algebra ahead…)!
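As a rough illustration of the shallow network described above (an input layer, one hidden layer and an output layer), here is a minimal forward pass in plain Python. The sigmoid activation, the function names and the weight values are all assumptions made for this sketch, not taken from the post itself.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of the
    inputs, adds its bias, and applies the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Shallow network: input layer -> one hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Illustrative (made-up) parameters: 2 inputs, 2 hidden neurons, 1 output.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

output = forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b)
print(output)
```

A deep network would simply chain more calls to `layer` between the input and the output, each one feeding on the previous layer's results.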

Here at endjin we’ve done a lot of work around data analysis and ETL. As part of this we have done some work with Databricks Notebooks on Microsoft Azure. Notebooks can be used for complex and powerful data analysis using Spark. Spark is a “unified analytics engine for big data and machine learning”. It allows you to run data analysis workloads, and can be accessed via many APIs. This means that you can build up data processes and models using a language you feel comfortable with. Notebooks can also be run as an activity in an ADF pipeline, and combined with Mapping Data Flows to build up a complex ETL process which can be run via ADF.

Mapping Data Flows are a relatively new feature of ADF. They allow you to visually build up complex data transformation sequences. This can aid in the streamlining of data manipulation and ETL processes, without the need to write any code! This post gives a brief introduction to the technology, and what this could enable!

This post walks through the fix for DLL locking errors when trying to deploy an Azure Function. The solution was to switch over to the new “deploy from package” option when deploying the functions. This fixes the file locking problem because instead of deploying the DLLs, the function will run from a package file added to its directory.

When he joined endjin, Technical Fellow Ian sat down with founder Howard for a Q&A session. This was originally published on LinkedIn in 5 parts, but is republished here, in full. Ian talks about his path into computing, some highlights of his career, the evolution of the .NET ecosystem, AI, and the software engineering life.

Using Python inside SQL Server

by Ed Freeman

Do you have a bunch of data in SQL Server that you pull out over ODBC/JDBC to work with in Python? Using SQL Server’s Python integration, you can connect to a SQL Server instance within your preferred IDE and perform the computations on the SQL Server machine. No more clunky data transferring. Operationalizing a Python model/script is as easy as calling a stored procedure. Any application that can speak to SQL Server can invoke the Python code and retrieve the results. Easy! This blog will provide a few simple examples which make use of this capability to carry out some simple Python commands, so you can get up and running as quickly as possible.

Learn what types of things an apprentice gets up to at endjin a few months after joining. You could be learning about Neural Networks: algorithms which mimic the way biological systems process information. You could be attending Microsoft’s Future Decoded conference, learning about Bots, CosmosDB, IoT and much more. Hopefully, you wouldn’t be in hospital after a ruptured appendix!

How to plan your cloud transformation journey

by Howard van Rooijen

This week I received an email from someone who asked how they could use our free Thought Leadership content to help their organisation move to the cloud. I realised that although we’ve released a lot of content, we’d never talked publicly about the rationale behind them and how they are all interconnected. Our Thought Leadership […]

Choosing the right cloud platform provider can be a daunting task. Take the big three, AWS, Azure, and Google Cloud Platform; each offer a huge number of products and services, but understanding how they enable your specific needs is not easy. Since most organisations plan to migrate existing applications it is important to understand how […]

In this series, we’re comparing cloud services from AWS, Azure and Google Cloud Platform. A full breakdown and comparison of cloud providers and their services are available in this handy poster. We have assessed services across three typical migration strategies: Lift and shift – the cloud service can support running legacy systems with minimal change […]

We produced a booklet to coincide with our Future Decoded talk “The 100 Year Start-up: Embracing Disruption in Financial Services”, where we examine the challenges and opportunities in the Microsoft Cloud for the Financial Services Industry, covering the following topics: Security, Privacy & Data Sovereignty; Data Ingestion, Transformation & Enrichment; Big Compute; Big Data – […]

Azure Batch – Time is Money in Big Compute

by James Broome

Earlier in the year, endjin worked with the Azure Batch Product Team to run a series of experiments against the Azure Batch service using a framework we developed for performing scale, soak and performance tests. We’ve had conversations with a number of organisations over the last 5 years who have scaled their compute intensive workloads (SAS, […]

A short while ago, I was trying to classify some data using Azure Machine Learning, but the training data was very imbalanced. In the attempt to build a useful model from this data, I came across the Synthetic Minority Oversampling Technique (SMOTE), an approach to dealing with imbalanced training data. This blog describes what I […]
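For readers new to the technique, the core idea of SMOTE can be sketched in a few lines of plain Python: pick a minority-class sample, pick one of its nearest minority-class neighbours, and create a synthetic point somewhere on the line between them. This is a simplified illustration, not the Azure Machine Learning implementation; the function names, the value of `k` and the sample data are all assumptions made for this sketch.

```python
import random


def nearest_neighbours(point, others, k):
    """Return the k points in `others` closest to `point`
    (squared Euclidean distance)."""
    def squared_distance(other):
        return sum((a - b) ** 2 for a, b in zip(point, other))
    return sorted(others, key=squared_distance)[:k]


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a random sample and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        point = rng.choice(minority)
        others = [m for m in minority if m is not point]
        neighbour = rng.choice(nearest_neighbours(point, others, k))
        gap = rng.random()  # position along the line segment, in [0, 1)
        synthetic.append(
            [p + gap * (n - p) for p, n in zip(point, neighbour)]
        )
    return synthetic


# Made-up minority-class samples with two features each.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies rather than being a straight duplicate of existing rows.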