How do I set up Databricks for warehouse sync?

Last updated: September 12, 2025

Currently, Pylon only supports connecting your data to Databricks via AWS S3. To set this up, please navigate to How do I set up AWS S3 for warehouse sync? and follow the instructions there.

The S3 sync unloads data into an S3 bucket in your own AWS account, and you can then grant Databricks read access to that bucket.

Here is one access pattern you can follow:

AWS Setup

First, create an IAM policy that grants read access to the sync bucket (replace your-bucket-name with the name of your bucket):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

After you have created the policy, create an IAM role with the following trust policy and attach the policy above to it. The principal 414351767826 is Databricks' own AWS account, which assumes the role on your behalf; replace your-databricks-external-id with the external ID Databricks provides when you create the storage credential in the next step (typically your Databricks account ID):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "your-databricks-external-id"
        }
      }
    }
  ]
}

Databricks Setup

In Databricks, run the following commands to create a storage credential, set up an external location pointing at your S3 bucket, and grant read access to it.

-- Wrap the IAM role created above in a storage credential (use your role's ARN)
CREATE STORAGE CREDENTIAL `your-storage-credential`
WITH (AWS_IAM_ROLE 'arn:aws:iam::YOUR-ACCOUNT:role/databricks-s3-read-role');

-- Register the synced S3 path as an external location backed by that credential
CREATE EXTERNAL LOCATION `your-external-location`
URL 's3://your-bucket-name/path/'
WITH (STORAGE CREDENTIAL `your-storage-credential`);

-- Let the `users` group read files at this location
GRANT READ FILES ON EXTERNAL LOCATION `your-external-location` TO `users`;
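
Once the grant is in place, you can verify the setup from a notebook or SQL warehouse. This is a minimal sketch using the placeholder bucket path from above; read_files is available on recent Databricks runtimes (13.3 LTS and above) and infers the file format automatically, so adjust it if your sync output needs explicit options:

-- List the files Pylon has unloaded (requires READ FILES on the external location)
LIST 's3://your-bucket-name/path/';

-- Preview a few rows; read_files infers the file format from the files themselves
SELECT * FROM read_files('s3://your-bucket-name/path/') LIMIT 10;

If either command fails with a permissions error, re-check the IAM policy resources, the trust policy's external ID, and the grant above.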