I have three operators imported from airflow.providers.google.cloud.operators.dataproc:
DataprocCreateBatchOperator
DataprocDeleteBatchOperator
DataprocGetBatchOperator
I need the same kind of operators for Azure.
Can someone please look into this, or do I have to create a new operator?
2 Answers
@Mazlum Tosun
For GCP, my code uses DataprocCreateBatchOperator to submit batch workloads.
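Roughly, the invocation looks like this. This is an illustrative sketch, not the poster's actual code: the project, region, batch id, and GCS paths below are placeholders.

```python
# Sketch of a Dataproc serverless batch submission via the operator named
# above. All ids, paths, and the runtime version are placeholder values.

BATCH_ID = "example-batch-001"
BATCH_CONFIG = {
    "pyspark_batch": {
        "main_python_file_uri": "gs://my-bucket/jobs/job.py",  # placeholder URI
    },
    "runtime_config": {"version": "2.1"},  # placeholder runtime version
}

def build_create_batch_task():
    # Airflow import kept inside the builder so the module stays importable
    # without a full Airflow installation.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateBatchOperator,
    )

    return DataprocCreateBatchOperator(
        task_id="create_batch",
        project_id="my-gcp-project",  # placeholder project
        region="us-central1",         # placeholder region
        batch=BATCH_CONFIG,
        batch_id=BATCH_ID,
    )
```

In a real DAG file the operator would simply be instantiated inside a `with DAG(...):` block rather than behind a builder function.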
I believe the apache-airflow-providers-microsoft-azure provider package equivalent for the Dataproc operators would be the Azure Synapse Operators. Specifically, the AzureSynapseRunSparkBatchOperator allows users to "execute a spark application within Synapse Analytics". If you're running Spark jobs on Azure Databricks, there are also several Databricks Operators that might be able to help.
Here's an example PythonOperator (via the TaskFlow API) that uses the AzureSynapseHook. Note that I didn't test this; it's just a demonstration of what it might look like. The task waits for a Spark job to enter a status of "error", "dead", or "killed", or to time out; if the job enters one of those statuses, the task cancels it. Again, this is only meant to show how to use the AzureSynapseHook within a PythonOperator, and I'm not sure whether it would work or even make sense to implement it this way.
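A sketch of that monitoring task follows. The connection id, Spark pool name, and the hook's exact method signatures are assumptions from my reading of the provider, so verify them against your installed apache-airflow-providers-microsoft-azure version:

```python
# Untested sketch, as noted above. Connection id and pool name are
# placeholders; hook method names/signatures should be double-checked.
import time

FAILURE_STATUSES = {"error", "dead", "killed"}

def is_failure_status(status: str) -> bool:
    """Pure helper: does this Livy-style batch status mean the job failed?"""
    return (status or "").lower() in FAILURE_STATUSES

def monitor_spark_job(job_id: int,
                      timeout_seconds: int = 3600,
                      poll_seconds: int = 30) -> str:
    """Poll a Synapse Spark batch job; cancel it on failure or timeout."""
    # Airflow import kept inside the callable, per Airflow's guidance on
    # avoiding heavy top-level imports in DAG files.
    from airflow.providers.microsoft.azure.hooks.synapse import AzureSynapseHook

    hook = AzureSynapseHook(
        azure_synapse_conn_id="azure_synapse_default",  # assumed connection id
        spark_pool="my-spark-pool",                     # placeholder pool name
    )
    hook.job_id = job_id  # get_job_run_status() reads self.job_id (assumed)

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = hook.get_job_run_status()
        if is_failure_status(status):
            hook.cancel_job_run(job_id)  # job failed: cancel it
            return status
        time.sleep(poll_seconds)

    hook.cancel_job_run(job_id)  # timed out: cancel the job
    return "timeout"
```

In a DAG you could wrap `monitor_spark_job` with the `@task` decorator (TaskFlow) or pass it as `python_callable` to a PythonOperator, feeding it the batch job id from an upstream task.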