Endjin - Home

Databricks

Import and export notebooks in Databricks

by Ed Freeman

Sometimes it’s necessary to import and export notebooks from a Databricks workspace. This might be because you have some generic notebooks that can be useful across numerous workspaces, or it could be that you’re having to delete your current workspace for some reason and therefore need to transfer content over to a new workspace. Importing and exporting can be doing either manually or programmatically. In this blog, we outline a way to recursively export/import a directory and its files from/to a Databricks workspace.


Quite often it’s beneficial to work with pre-built CLIs/SDKs to interact with your favourite tools, instead of making requests to the underlying REST API. Much of the complexity around constructing requests has been abstracted, and authentication is often easier. The Databricks CLI makes it easier to interact with your Databricks instance, but sometimes you can run into strange errors when constructing the values passed in as arguments. In this blog, we take a look at a JsonDecodeError that can occur when speaking to the Clusters CLI, and look at a way we can avoid this error.


Here at endjin we’ve done a lot of work around data analysis and ETL. As part of this we have done some work with Databricks Notebooks on Microsoft Azure. Notebooks can be used for complex and powerful data analysis using Spark. Spark is a “unified analytics engine for big data and machine learning”. It allows you to run data analysis workloads, and can be accessed via many APIs. This means that you can build up data processes and models using a language you feel comfortable with. They can also be run as an activity in a ADF pipeline, and combined with Mapping Data Flows to build up a complex ETL process which can be run via ADF.