Databricks Connector

Created by Alexandru Sirbu, Modified on Thu, 23 Apr at 2:54 PM by Alexandru Sirbu

This document describes how the Databricks Connector is configured inside WOODY.IO, what authentication options are available, what permissions the authenticating identity must hold on the Databricks side, and how the optional Temporary Location feature works.


Creating a Databricks Connection


Navigate to Management > Connections > Add Connection. Set Connection Type to Databricks, then fill in the fields below.


Cluster Type - Choose All Purpose Cluster (general workloads) or SQL Warehouse (SQL/BI workloads).
Host - HTTPS endpoint of your Databricks workspace, e.g. adb-<id>.azuredatabricks.net.
Cluster ID - Unique identifier of the target cluster or SQL Warehouse.
Schema - Optional. Format: <catalog>.<schema>; overrides the workspace default catalog. Entity-level schema takes precedence if also set. Speeds up entity loading when creating an Entity from the Connection.
Temporary Location Name - Optional. Name of a Databricks External Location used as a staging area for temp tables.
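If you are unsure which default catalog and schema the workspace resolves to (the defaults that the Schema field overrides), a quick way to check is to run the following from any notebook or SQL editor attached to the target compute:

```sql
-- Shows the catalog and schema the session resolves to by default;
-- the Connection's Schema field (<catalog>.<schema>) overrides these.
SELECT current_catalog(), current_schema();
```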


In the Details section, toggle the Can Import, Can Persist, and Can Live Edit switches to control how the Connection may be used.


Authentication


Two authentication methods are available. Select one in the Authentication Type dropdown.


Private Access Token - A Personal Access Token (PAT) generated in Databricks User Settings. Stored in WOODY.IO, or referenced via Azure Key Vault using @KeyVault(<Identifier>;<SecretName>) when a Key Vault is configured on the environment.
Application Service Principal - Uses the Service Principal configured at Application level (Tenant ID, Client ID, Client Secret). Authenticates via Azure AD OAuth 2.0. Recommended for production.


Required Permissions


The identity used to authenticate (the PAT owner or the Service Principal) must hold the following permissions in Databricks.


All Purpose Cluster - CAN ATTACH TO or CAN RESTART on the target cluster.
SQL Warehouse - CAN USE on the target SQL Warehouse.


Unity Catalog


For every catalog and schema that the Connection or the Entity Technical Names reference, the following grants are required:


USE CATALOG - Needed to access any object in the catalog.
USE SCHEMA - Needed to access any object in the schema.
SELECT - Needed to read source data during Import.
MODIFY - Needed to insert, update, merge, or delete rows during Persist.
CREATE TABLE - Needed to create temp tables in the schema. Required only when Temporary Location Name is not configured.
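An account or metastore admin can issue these grants in SQL. The catalog ("main"), schema ("sales"), and principal ("woody_sp") names below are placeholders, not values from WOODY.IO:

```sql
-- Placeholder names: catalog "main", schema "sales", principal "woody_sp"
GRANT USE CATALOG ON CATALOG main TO `woody_sp`;
GRANT USE SCHEMA  ON SCHEMA main.sales TO `woody_sp`;
GRANT SELECT      ON SCHEMA main.sales TO `woody_sp`;  -- read during Import
GRANT MODIFY      ON SCHEMA main.sales TO `woody_sp`;  -- DML during Persist

-- Only needed when Temporary Location Name is NOT configured:
GRANT CREATE TABLE ON SCHEMA main.sales TO `woody_sp`;
```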



Temporary Location Name


When configured, the Temporary Location Name field in the Connection form should point to a pre-configured Databricks External Location: a Unity Catalog object that grants Databricks access to a specific path in cloud storage (ADLS Gen2).


Whether or not this field is set changes how WOODY.IO handles the temp tables it needs during Merge, Update, and Delete operations.


Without Temporary Location Name


WOODY.IO creates the temp table as a managed table directly in Unity Catalog. Databricks writes the Delta files to the storage account backing that catalog (its own managed storage location or the metastore root).


If the Storage Credential on that storage account only has Storage Blob Data Reader, the CREATE TABLE call will fail - even though SELECT queries continue to work. The Reader role permits reads but not writes.


With Temporary Location Name configured


WOODY.IO creates the temp table as an unmanaged (external) table inside the External Location path, runs the required DML against the destination table, and finally issues DELETE and VACUUM to clean up the leftover Delta files.


Because the temp table is external (created with LOCATION), dropping it does not delete the underlying Delta files. WOODY.IO therefore runs an explicit DELETE + VACUUM. This increases total import time and requires retentionDurationCheck to be disabled on the cluster.
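The temp-table lifecycle described above roughly corresponds to the following SQL. All table, column, path, and view names here are illustrative placeholders, not WOODY.IO's actual statements:

```sql
-- 1. Stage incoming rows as an external (unmanaged) table in the External Location
CREATE TABLE main.sales.tmp_woody_stage
LOCATION 'abfss://temp@mystorageaccount.dfs.core.windows.net/woody/tmp_woody_stage'
AS SELECT * FROM source_view;

-- 2. Run the required DML against the destination table
MERGE INTO main.sales.customers AS t
USING main.sales.tmp_woody_stage AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- 3. Clean up: dropping an external table leaves its files behind,
--    so rows and Delta files are removed explicitly first
DELETE FROM main.sales.tmp_woody_stage;
VACUUM main.sales.tmp_woody_stage RETAIN 0 HOURS;  -- needs retentionDurationCheck disabled
DROP TABLE main.sales.tmp_woody_stage;
```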


Disable the check in cluster Spark config:

spark.databricks.delta.retentionDurationCheck.enabled false
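For ad-hoc testing in a notebook, the same setting can also be applied per session in SQL; for operations executed by WOODY.IO itself, the cluster-level Spark config above is the reliable option:

```sql
-- Session-level equivalent of the cluster Spark config above
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
```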


Which approach to use?


Configuring an External Location is the recommended approach. It isolates WOODY.IO's temp writes to a dedicated, controlled storage path and avoids needing to elevate permissions on the entire Unity Catalog storage account.


Without External Location - Temp tables go into the Unity Catalog managed storage. Requires Storage Blob Data Contributor on the catalog storage account. Simpler setup but broader permission scope.
With External Location - Temp tables go into a dedicated External Location path. The Storage Credential on that path needs write access only to that path. Recommended for production environments.
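Under the recommended approach, the privileges on the External Location itself can be granted as follows. The location name ("woody_temp_loc") and principal ("woody_sp") are placeholders:

```sql
-- Placeholder names: external location "woody_temp_loc", principal "woody_sp"
GRANT READ FILES  ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
```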



Configuration and Permission Checklist


Whenever an Import fails while loading or persisting data, or connection validation fails, go through the following checklist and confirm that your Connection in WOODY.IO meets all the requirements.


Auth identity is a member of the Databricks workspace - PAT owner or Service Principal.
Cluster / Warehouse access granted - CAN ATTACH TO (All Purpose) or CAN USE (SQL Warehouse).
USE CATALOG granted - On every catalog used or in scope.
USE SCHEMA granted - On every schema used or in scope.
SELECT granted - On all source tables in the schema.
MODIFY granted - On all destination tables in the schema.
CREATE TABLE granted - Required only when Temporary Location Name is not set.
READ FILES + WRITE FILES on the External Location - Required only when Temporary Location Name is set.
CREATE EXTERNAL TABLE on the External Location - Required only when Temporary Location Name is set.
retentionDurationCheck disabled on the cluster - Required only when Temporary Location Name is set.
Schema field set (if using a non-default catalog) - Format: <catalog>.<schema>. Speeds up loading entities from the catalog/schema.
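While walking through the checklist, the effective Unity Catalog grants can be inspected directly in SQL. Names below are placeholders:

```sql
-- What can this principal do on the catalog / schema?
SHOW GRANTS `woody_sp` ON CATALOG main;
SHOW GRANTS `woody_sp` ON SCHEMA main.sales;

-- Or list everything granted on the securable:
SHOW GRANTS ON SCHEMA main.sales;
SHOW GRANTS ON EXTERNAL LOCATION woody_temp_loc;
```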



If you have any further questions, please feel free to Contact Us.

You can also refer to the WOODY.IO End User Documentation.
