Databricks Connector

Created by Alexandru Sirbu, Modified on Thu, 23 Apr at 2:54 PM by Alexandru Sirbu

This document describes how the Databricks Connector is configured inside WOODY.IO, what authentication options are available, what permissions the authenticating identity must hold on the Databricks side, and how the optional Temporary Location feature works.


Creating a Databricks Connection


Navigate to Management > Connections > Add Connection. Set Connection Type to Databricks, then fill in the fields below.


Cluster Type - Choose All Purpose Cluster (general workloads) or SQL Warehouse (SQL/BI workloads).
Host - HTTPS endpoint of your Databricks workspace, e.g. adb-<id>.azuredatabricks.net.
Cluster ID - Unique identifier of the target cluster or SQL Warehouse.
Schema - Optional. Format: <catalog>.<schema>; overrides the workspace default catalog. Entity-level schema takes precedence if also set. Speeds up entity loading when creating an Entity from the Connection.
Temporary Location Name - Optional. Name of a Databricks External Location used as a staging area for temp tables.
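If you are unsure which default catalog and schema the workspace resolves to (the defaults that the Schema field overrides), a quick way to check is to run the following from any notebook or SQL editor attached to the target compute:

```sql
-- Shows the catalog and schema the session resolves to by default;
-- the Connection's Schema field (<catalog>.<schema>) overrides these.
SELECT current_catalog(), current_schema();
```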


In the Details section, toggle the Can Import, Can Persist, and Can Live Edit switches to control how the Connection may be used.


Authentication


Two authentication methods are available. Select one in the Authentication Type dropdown.


Private Access Token - A Personal Access Token (PAT) generated in Databricks User Settings. Stored in WOODY.IO, or referenced via Azure Key Vault using @KeyVault(<Identifier>;<SecretName>) when a Key Vault is configured on the environment.
Application Service Principal - Uses the Service Principal configured at Application level (Tenant ID, Client ID, Client Secret). Authenticates via Azure AD OAuth 2.0. Recommended for production.


Required Permissions


The identity used to authenticate (the PAT owner or the Service Principal) must hold the following permissions in Databricks.


All Purpose Cluster - CAN ATTACH TO or CAN RESTART on the target cluster.
SQL Warehouse - CAN USE on the target SQL Warehouse.


Unity Catalog


For every catalog and schema that the Connection or the Entity Technical Names reference, the following grants are required:


USE CATALOG - Needed to access any object in the catalog.
USE SCHEMA - Needed to access any object in the schema.
SELECT - Needed to read source data during Import.
MODIFY - Needed to insert, update, merge, or delete rows during Persist.
CREATE TABLE - Needed to create temp tables in the schema. Required only when Temporary Location Name is not configured.
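An account or metastore admin can issue these grants in SQL. The catalog ("main"), schema ("sales"), and principal ("woody_sp") names below are placeholders, not values from WOODY.IO:

```sql
-- Placeholder names: catalog "main", schema "sales", principal "woody_sp"
GRANT USE CATALOG ON CATALOG main TO `woody_sp`;
GRANT USE SCHEMA  ON SCHEMA main.sales TO `woody_sp`;
GRANT SELECT      ON SCHEMA main.sales TO `woody_sp`;  -- read during Import
GRANT MODIFY      ON SCHEMA main.sales TO `woody_sp`;  -- DML during Persist

-- Only needed when Temporary Location Name is NOT configured:
GRANT CREATE TABLE ON SCHEMA main.sales TO `woody_sp`;
```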



Temporary Location Name


When configured, the Temporary Location Name field in the Connection form should point to a pre-configured Databricks External Location: a Unity Catalog object that grants Databricks access to a specific path in cloud storage (ADLS Gen2).


Whether or not this field is set changes how WOODY.IO handles the temp tables it needs during Merge, Update, and Delete operations.


Without Temporary Location Name


WOODY.IO creates the temp table as a managed table directly in Unity Catalog. Databricks writes the Delta files to the storage account backing that catalog (its own managed storage location or the metastore root).


If the Storage Credential on that storage account only has Storage Blob Data Reader, the CREATE TABLE call will fail - even though SELECT queries continue to work. The Reader role permits reads but not writes.


With Temporary Location Name configured


WOODY.IO creates the temp table as an unmanaged (external) table inside the External Location path, runs the required DML against the destination table, and finally issues DELETE and VACUUM to clean up the leftover Delta files.


Because the temp table is external (created with LOCATION), dropping it does not delete the underlying Delta files. WOODY.IO therefore runs an explicit DELETE + VACUUM. This increases total import time and requires retentionDurationCheck to be disabled on the cluster.
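The temp-table lifecycle described above roughly corresponds to the following SQL. All table, column, path, and view names here are illustrative placeholders, not WOODY.IO's actual statements:

```sql
-- 1. Stage incoming rows as an external (unmanaged) table in the External Location
CREATE TABLE main.sales.tmp_woody_stage
LOCATION 'abfss://temp@mystorageaccount.dfs.core.windows.net/woody/tmp_woody_stage'
AS SELECT * FROM source_view;

-- 2. Run the required DML against the destination table
MERGE INTO main.sales.customers AS t
USING main.sales.tmp_woody_stage AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- 3. Clean up: dropping an external table leaves its files behind,
--    so rows and Delta files are removed explicitly first
DELETE FROM main.sales.tmp_woody_stage;
VACUUM main.sales.tmp_woody_stage RETAIN 0 HOURS;  -- needs retentionDurationCheck disabled
DROP TABLE main.sales.tmp_woody_stage;
```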


Disable the check in cluster Spark config:

spark.databricks.delta.retentionDurationCheck.enabled false
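For ad-hoc testing in a notebook, the same setting can also be applied per session in SQL; for operations executed by WOODY.IO itself, the cluster-level Spark config above is the reliable option:

```sql
-- Session-level equivalent of the cluster Spark config above
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
```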


Which approach to use?


Configuring an External Location is the recommended approach. It isolates WOODY.IO's temp writes to a dedicated, controlled storage path and avoids needing to elevate permissions on the entire Unity Catalog storage account.


Without External Location - Temp tables go into the Unity Catalog managed storage. Requires Storage Blob Data Contributor on the catalog storage account. Simpler setup but broader permission scope.
With External Location - Temp tables go into a dedicated External Location path. The Storage Credential on that path needs write access only to that path. Recommended for production environments.
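Under the recommended approach, the privileges on the External Location itself can be granted as follows. The location name ("woody_temp_loc") and principal ("woody_sp") are placeholders:

```sql
-- Placeholder names: external location "woody_temp_loc", principal "woody_sp"
GRANT READ FILES  ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION woody_temp_loc TO `woody_sp`;
```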



Configuration and Permission Checklist


Whenever an Import fails while loading or persisting data, or connection validation fails, go through the following checklist and confirm that your Connection in WOODY.IO meets all the requirements.


Auth identity is a member of the Databricks workspace - PAT owner or Service Principal.
Cluster / Warehouse access granted - CAN ATTACH TO (All Purpose) or CAN USE (SQL Warehouse).
USE CATALOG granted - On every catalog used or in scope.
USE SCHEMA granted - On every schema used or in scope.
SELECT granted - On all source tables in the schema.
MODIFY granted - On all destination tables in the schema.
CREATE TABLE granted - Required only when Temporary Location Name is not set.
READ FILES + WRITE FILES on the External Location - Required only when Temporary Location Name is set.
CREATE EXTERNAL TABLE on the External Location - Required only when Temporary Location Name is set.
retentionDurationCheck disabled on the cluster - Required only when Temporary Location Name is set.
Schema field set (if using a non-default catalog) - Format: <catalog>.<schema>. Speeds up loading entities from the catalog/schema.
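While walking through the checklist, the effective Unity Catalog grants can be inspected directly in SQL. Names below are placeholders:

```sql
-- What can this principal do on the catalog / schema?
SHOW GRANTS `woody_sp` ON CATALOG main;
SHOW GRANTS `woody_sp` ON SCHEMA main.sales;

-- Or list everything granted on the securable:
SHOW GRANTS ON SCHEMA main.sales;
SHOW GRANTS ON EXTERNAL LOCATION woody_temp_loc;
```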



If you have any further questions, please feel free to Contact Us.

You can also refer to the WOODY.IO End User Documentation.
