
I have the following config:

# Configure the Azure provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.25.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "1.4.0"
    }
  }

}


provider "azurerm" {
  alias = "uat-sub"
  features {}  
  subscription_id = "sfsdf"
}

provider "databricks" {
  host  = "https://abd-1234.azuredatabricks.net"
  token = "sdflkjsdf"
  alias = "dev-dbx-provider"
}


resource "databricks_cluster" "dev_cluster" {
  cluster_name = "xyz"
  spark_version = "10.4.x-scala2.12"
}

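For context, the import itself was run roughly like this (using the cluster ID that later appears in the plan output below):

terraform import databricks_cluster.dev_cluster gyht
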
I am able to successfully import databricks_cluster.dev_cluster this way. Once imported, I update my config to output a value from the cluster in state. The updated config looks like this:

# Configure the Azure provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.25.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "1.4.0"
    }
  }

}


provider "azurerm" {
  alias = "uat-sub"
  features {}  
  subscription_id = "sfsdf"
}

provider "databricks" {
  host  = "https://abd-1234.azuredatabricks.net"
  token = "sdflkjsdf"
  alias = "dev-dbx-provider"
}


resource "databricks_cluster" "dev_cluster" {
  cluster_name = "xyz"
  spark_version = "10.4.x-scala2.12"
}

output "atm"{
   value = databricks_cluster.dev_cluster.autotermination_minutes
}

When I run terraform apply on the updated config, Terraform proceeds to refresh my imported cluster, detects changes, and does an 'update in-place' where some of the values on my cluster are set to null (autoscale, spark_env_vars, etc.). All this happens when no changes are actually being made on the cluster. Why is this happening? Why is Terraform resetting some values when no changes have been made?

EDIT – 'terraform plan' output:

C:\Users>terraform plan
databricks_cluster.dev_cluster: Refreshing state... [id=gyht]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # databricks_cluster.dev_cluster will be updated in-place
  ~ resource "databricks_cluster" "dev_cluster" {
      ~ autotermination_minutes      = 10 -> 60
      - data_security_mode           = "NONE" -> null
        id                           = "gyht"
      ~ spark_env_vars               = {
          - "PYSPARK_PYTHON" = "/databricks/python3/bin/python3" -> null
        }
        # (13 unchanged attributes hidden)

      - autoscale {
          - max_workers = 8 -> null
          - min_workers = 2 -> null
        }

      - cluster_log_conf {
          - dbfs {
              - destination = "dbfs:/cluster-logs" -> null
            }
        }

        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

EDIT – Work around with hard coded tags:

resource "databricks_cluster" "dev_cluster" {
  cluster_name = "xyz"
  spark_version = "10.4.x-scala2.12"
  autotermination_minutes = 10
  data_security_mode = "NONE"
  autoscale {
    max_workers = 8
    min_workers = 2
   }
   cluster_log_conf {
      dbfs {
        destination = "dbfs:/cluster-logs"
      }
    }
    spark_env_vars = {
          PYSPARK_PYTHON = "/databricks/python3/bin/python3"
    }
}

The workaround partially works: I no longer see Terraform trying to reset those attributes on every apply. But if I were to change any of them on the cluster itself, let's say I change max_workers to 5, Terraform will not update state to reflect 5 workers. TF will override the 5 with the hard-coded 8, which is an issue.

2 Answers


  1. To answer the first part of your question: Terraform has imported the actual values of your cluster into the state file, but it cannot write those values into your config file (.tf) for you, so you need to specify them manually (as you have done).

    By not setting the optional fields, you are effectively saying "set those fields to their default value", which in most cases is null (the exception here being autotermination_minutes, which defaults to 60). That is why Terraform detects drift between your state and your config: the actual values from the import versus the default values of the unspecified fields.

    For reference : https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/cluster
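    As a side note, you can inspect exactly which values the import wrote into state (so you can copy them into your config) with:

        terraform state show databricks_cluster.dev_cluster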

    For the second part of your question, you say

    let's say I change max_workers to 5, Terraform will not update state to reflect 5 workers.

    If you mean that you change max_workers from outside of Terraform, then yes: Terraform is designed to override that field when you run terraform apply. When working with Terraform, if you want to make a change to your infrastructure, you always make the change in your Terraform config and run terraform apply to apply it for you.

    So in your case, if you wanted to change max_workers to 5, you would set that value in the Terraform config and run terraform apply; you would not do it from within Databricks. If that behaviour is problematic, I would question whether you want to manage that resource with Terraform at all, as that is how Terraform will always work.
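    For example, to go from 8 to 5 max workers you would edit the autoscale block in your config (a sketch based on the workaround config from the question, other arguments unchanged) and then run terraform apply:

        resource "databricks_cluster" "dev_cluster" {
          # ... other arguments as in the workaround config ...
          autoscale {
            max_workers = 5  # changed here in the config, not in the Databricks UI
            min_workers = 2
          }
        }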

    Hope that helps!

  2. This is regarding the max_workers changes: declare the value as a variable, e.g. variable "max" { default = 8 } in var.tf, and reference it in the cluster resource.

    Then you can override that value explicitly when planning or applying, e.g. terraform plan -var="max=5", and check the override in the plan output (see the sketch below).
    🙂
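    A minimal sketch of what that could look like (the variable name and file layout are just examples):

        # var.tf
        variable "max" {
          type    = number
          default = 8
        }

        # main.tf - reference the variable inside the cluster resource
        autoscale {
          max_workers = var.max
          min_workers = 2
        }

    Then a one-off override can be supplied on the command line:

        terraform plan -var="max=5"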
