I have an Azure Data Factory pipeline that runs multiple Databricks Notebooks using job clusters.
I want to track down the cost of those job clusters in the Cost Management panel.
I’m interested not only in the Databricks cost but also in the underlying VM cost, restricted to that specific set of jobs.
I know that I can filter by Tag and then by jobid, but that needs a lot of manual work and of course, the jobid changes between pipeline runs.
Is there any way to tag or filter it in a more automated fashion? Maybe by service principal used to run those jobs?
2 Answers
I ended up dynamically tagging the job clusters with the Data Factory pipeline ID during the pipeline run. The ADF activity used to run a Databricks notebook has an option to add cluster tags. I can then filter on that tag in Cost Management to get the information I need, and because the tags are attached to the job cluster, the filter covers the underlying VM cost as well as the Databricks cost.
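For anyone who wants to see roughly what this looks like outside the UI, here is a minimal sketch in Python that builds the relevant JSON fragments. It rests on a few assumptions that you should check against your own factory: that the cluster tags end up in the Azure Databricks linked service under `newClusterCustomTags`, that the value is passed in through a linked service parameter, and that `@pipeline().Pipeline` (the pipeline name) is an acceptable identifier. The tag key `adf_pipeline`, the parameter name `pipelineTag`, and the linked service name are hypothetical.

```python
import json


def build_linked_service(name: str, tag_key: str = "adf_pipeline") -> dict:
    """Build a partial Azure Databricks linked service with a dynamic cluster tag."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            # Hypothetical parameter; the calling activity supplies its value.
            "parameters": {"pipelineTag": {"type": "String"}},
            "typeProperties": {
                # Workspace URL, authentication, and cluster sizing are omitted here.
                "newClusterCustomTags": {
                    tag_key: "@{linkedService().pipelineTag}"
                },
            },
        },
    }


def build_linked_service_reference(name: str) -> dict:
    """Build the reference a notebook activity would use to pass the tag value."""
    return {
        "referenceName": name,
        "type": "LinkedServiceReference",
        "parameters": {
            # Assumption: the pipeline name is used as the tag value.
            "pipelineTag": {"value": "@pipeline().Pipeline", "type": "Expression"}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_linked_service("DatabricksJobCluster"), indent=2))
    print(json.dumps(build_linked_service_reference("DatabricksJobCluster"), indent=2))
```

Using a value that stays the same between runs (such as the pipeline name) is what avoids the changing jobid problem from the question. A run-specific value would bring it back.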
Azure Cost Management is a tool integrated into the Azure Portal, designed to monitor and understand usage costs for Azure components, including Azure Databricks.
When an Azure Databricks Workspace resource is created, you will see that, in addition to the main workspace resource, a Managed Resource Group is also created and associated with the workspace.
This managed resource group includes the default storage account, virtual machines for cluster nodes, disks for the nodes, networking resources, and more. The following screenshots display the Azure Databricks Workspace resource along with its associated Managed Resource Group.
Screenshot: Azure Databricks Workspace, with the Managed Resource Group link displayed
Screenshot: Resources in the Azure Databricks Managed Resource Group
The official documentation page "Monitor usage using cluster, pool, and workspace tags" provides detailed information about tags and how they propagate to resources.
To reach the Cost Analysis section, search for "Cost Management + Billing" in the Azure Portal, then go to "Cost Management" followed by "Cost Analysis."
Total Databricks Costs grouped by Meter Category: apply a filter for the Vendor tag with the value "Databricks" and group by either Meter Category or Meter Subcategory. You can also group by Meter for more detailed service costs or select "None" for a single line item. There are various other options you can experiment with as well.
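If you would rather do the same filter and grouping outside the portal, the sketch below applies it to an exported cost file. It is only an illustration under stated assumptions: the column names `MeterCategory`, `CostInBillingCurrency`, and `Tags` and the JSON encoding of the `Tags` column depend on your export type, and the sample rows and the `adf_pipeline` tag are made up.

```python
import csv
import io
import json
from collections import defaultdict

# Made-up sample rows standing in for a Cost Management export.
SAMPLE_EXPORT = '''MeterCategory,CostInBillingCurrency,Tags
Azure Databricks,1.50,"{""Vendor"": ""Databricks"", ""adf_pipeline"": ""nightly_load""}"
Virtual Machines,2.25,"{""Vendor"": ""Databricks"", ""adf_pipeline"": ""nightly_load""}"
Virtual Machines,4.00,"{""Vendor"": ""Databricks"", ""adf_pipeline"": ""other""}"
Storage,0.75,"{}"
'''


def cost_by_meter_category(csv_text: str, tag_key: str, tag_value: str) -> dict:
    """Sum cost per meter category for rows whose tags contain tag_key == tag_value."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the Tags column holds a JSON object (empty means no tags).
        tags = json.loads(row["Tags"] or "{}")
        if tags.get(tag_key) == tag_value:
            totals[row["MeterCategory"]] += float(row["CostInBillingCurrency"])
    return dict(totals)


if __name__ == "__main__":
    # All Databricks-related cost, like the Vendor filter in the portal.
    print(cost_by_meter_category(SAMPLE_EXPORT, "Vendor", "Databricks"))
    # Only the clusters tagged for one pipeline.
    print(cost_by_meter_category(SAMPLE_EXPORT, "adf_pipeline", "nightly_load"))
```

Filtering on a pipeline-level tag instead of `Vendor` narrows the result to one set of jobs while still including the VM rows, which is what the question asks for.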
Know more about Costs using Azure Cost Management for Observability and Chargebacks — Effective Tag Usage.