I have my dbt project configured and running on dbt-postgres. Is there an easy way to port it to dbt-databricks without having to rewrite the models?
Context: I am building a dbt demo on Postgres, Databricks, and Azure SQL for my organization, to see how well dbt fits our environments.
What I've already tried: I have pipelines running successfully on dbt-postgres.
The official documentation can be found here.
2 Answers
dbt handles the DDL/DML boilerplate; for `select` statements, Postgres and Spark SQL/Databricks aren't that far off, but you will have to make some tweaks to the models to get things running, and you'll want to take advantage of some of the Databricks-specific configs, like partitioning.
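As a sketch of what that can look like in a model file (the model body, the `event_date` column, and the `stg_events` source are hypothetical; `file_format` and `partition_by` are dbt-databricks model configs):

```sql
{{ config(
    materialized = 'table',
    file_format = 'delta',        -- Delta is the usual table format on Databricks
    partition_by = 'event_date'   -- used by dbt-databricks; not meaningful to dbt-postgres
) }}

select
    event_id,
    event_date,
    payload
from {{ ref('stg_events') }}
```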
One way to ease this transition is to use dbt's cross-database macros, which standardize the syntax for functions like `datediff` that vary between dialects.
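For instance, a model can call the built-in `dbt.datediff` macro instead of writing dialect-specific date math, and dbt compiles it to the right SQL for whichever adapter runs it (the columns and the `stg_orders` model here are hypothetical):

```sql
select
    customer_id,
    -- compiles to Postgres date arithmetic or Spark SQL's datediff as appropriate
    {{ dbt.datediff("first_order_at", "last_order_at", "day") }} as days_active
from {{ ref('stg_orders') }}
```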
Another option, if your models don't contain much Jinja, would be to use `sqlglot` to transpile between dialects.
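A minimal sketch of that, assuming sqlglot is installed (the query is made up; `postgres` and `databricks` are both dialects sqlglot supports):

```python
import sqlglot

# A Postgres-flavored query (the ::date cast is Postgres syntax).
sql = "select date_trunc('month', created_at)::date as month from orders"

# transpile() returns a list with one string per input statement.
print(sqlglot.transpile(sql, read="postgres", write="databricks")[0])
```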
If you have the same data in Postgres and Databricks, you can use Datafold's `data-diff` to compare the modeled data in the two datastores and confirm you ported everything correctly.
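A rough sketch using data-diff's Python API (the connection strings, table name, and `order_id` key column are all placeholders; check data-diff's docs for the exact Databricks URI format):

```python
from data_diff import connect_to_table, diff_tables

# Point at the "same" table in each datastore, keyed on a primary key.
pg = connect_to_table("postgresql://user:pass@host/analytics", "orders", "order_id")
dbx = connect_to_table("databricks://token@host/http_path", "orders", "order_id")

# Yields ('-', row) / ('+', row) tuples for rows that differ between stores;
# no output means the tables match on the compared columns.
for sign, row in diff_tables(pg, dbx):
    print(sign, row)
```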
FYI, Databricks is coming out with lateral column aliases (Databricks Runtime 12.2 and above), which will ease a lot of these problems:
https://docs.databricks.com/sql/language-manual/sql-ref-name-resolution.html
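To illustrate what lateral column aliases buy you (the table and columns are hypothetical): a later expression in the same `select` list can reference an earlier alias, instead of repeating the expression or pushing it into a CTE:

```sql
select
    amount * 0.1 as tax,     -- define an alias...
    amount + tax as total    -- ...and reuse it in the same select list
from orders
```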