In Azure Databricks, I can add Git repositories manually under Repos through the console, but I cannot find a way to automatically pull new changes when I commit to the source repository. Is there an Azure service that can do this? Thanks

2 Answers


  1. You can update the repository via the Repos API, specifically its update endpoint: even if the repository is already on the target branch, calling it again pulls the latest changes. Alternatively, the Databricks CLI has a corresponding command to update a repository.

    You can set up a CI/CD pipeline in GitHub Actions or Azure DevOps to update your Databricks repo whenever a commit happens. Here is an example for Azure DevOps.

  2. I’m currently building a solution to do exactly this at my company. The goal is to stop using Databricks-managed notebooks and use GitHub-versioned notebooks instead. As previously answered, the key to solving this is the Databricks Repos API.

    To update a specific branch of a specific repository clone, I use the following Python function:

    import logging
    import requests

    log = logging.getLogger(__name__)

    def updateWorkspaceRepoBranch(databricksDomain, databricksToken, repo, branchName):
        # PATCH /api/2.0/repos/{id} checks out the branch and pulls its latest commit.
        headers = {'Authorization': f'Bearer {databricksToken}', 'Content-Type': 'application/json'}
        log.info(f'Updating {repo["id"]} branch {branchName} {repo["path"]}')
        url = f'https://{databricksDomain}/api/2.0/repos/{repo["id"]}'
        payload = {"branch": branchName}
        try:
            response = requests.patch(url=url, headers=headers, json=payload)
            response.raise_for_status()
        except requests.RequestException as e:
            # e.response is None when the request never completed (e.g. a connection error).
            detail = e.response.text if e.response is not None else e
            log.warning(f'Error updating branch {branchName} of repository id {repo["id"]} at {repo["path"]}: {detail}')
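
The function above expects a `repo` dictionary with at least `id` and `path`, but the answer does not show where it comes from. One possible way to obtain it is the Repos API list endpoint (`GET /api/2.0/repos` with a `path_prefix` filter). The sketch below is an assumption, not the answerer's code: the name `listWorkspaceRepos` and the optional `session` parameter are hypothetical, added so the helper can be exercised with a stub instead of a real workspace.

```python
def listWorkspaceRepos(databricksDomain, databricksToken, pathPrefix, session=None):
    """Return the repo dicts (id, path, branch, ...) found under pathPrefix.

    Hypothetical helper: `session` is anything with a requests-style .get().
    """
    if session is None:
        import requests  # imported lazily so a stub session can be injected
        session = requests.Session()

    headers = {'Authorization': f'Bearer {databricksToken}'}
    url = f'https://{databricksDomain}/api/2.0/repos'
    params = {'path_prefix': pathPrefix}
    repos = []

    while True:
        response = session.get(url, headers=headers, params=params)
        response.raise_for_status()
        body = response.json()
        repos.extend(body.get('repos', []))

        # The list endpoint is paginated; stop once no further token is returned.
        nextToken = body.get('next_page_token')
        if not nextToken:
            return repos
        params = {'path_prefix': pathPrefix, 'next_page_token': nextToken}
```

Each returned dictionary can then be passed as `repo` to `updateWorkspaceRepoBranch`, for example in a loop run by the CI/CD pipeline after every commit.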
    