Unity Catalog
is the Azure Databricks data governance solution for the Lakehouse. Whereas, Microsoft Purview
provides a unified data governance solution to help manage and govern your on-premises, multicloud, and software as a service (SaaS) data.
Question: In our same
Azure Cloud project, can we use Unity Catalog
for the Azure Databricks Lakehouse, and use Microsoft Purview for the rest of our Azure project?
Update: In our current Azure subscription, we have divided workload as follows:
- SQL related workload: we are doing all our SQL database work using Databricks
only
(no Azure SQL databases are involved). That is, we are using Databricks Lakehouse, Delta Lake, Deatricks SQL etc. to performETL
and allData Analytics work
. - All Non-SQL workload: All other assets (Excel files, csv files, pdf, media files etc.) are stored in various Azure storage accounts.
MS Purview is doing a good job in scanning assets in scenario 2 above, and it easily creates a holistic, up-to-date map of our data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. It also enables our data consumers to access valuable, trustworthy data management.
However, our almost 50% of the work (SQL, ETL, Data Analytics etc.) is done in Azure Databricks where we have significant challenges with Purview. We were wondering if it’s possible to keep Purview and Unity Catalog separate as follows: Purview does its Data Governance work for scenario 1 only and Unity Catalog does its Data Governance work for scenario 2 only.
This recently released update may resolve our issue of making Purview work better with Azure Databricks but we have not tried it yet: Connect to and manage Azure Databricks in Microsoft Purview (Preview)
3
Answers
As of right now there is no official integration between Unity Catalog and Purview yet, but it may come in the future. You may join Azure Databricks roadmap webinar that will be tomorrow to get more information.
Regarding the actual question – imho, nothing prevents you from using UC & Purview in the same Azure project.
P.S. You can get metadata & lineage information into Purview by loading data from information schema tables and using Purview APIs to store it in Purview.
The integration between Purview and UC is in private preview.
Purview currently doesn’t support scanning catalogs with a metastore attached.
I tried to set this up too, but I only get tables from the standard hive_metastore directory.
There is an Azure Databricks to Purview Lineage Connector.
You can check it out here.