
I have a Spark notebook which I run through a pipeline. The notebook runs fine when executed manually, but in the pipeline it fails with a file-location error. In the code I load the file into a DataFrame. The file location in the code is abfss://storage_name/folder_name/*, but in the pipeline it resolves to abfss://storage_name/filename.parquet.
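
Roughly, the load in the notebook looks like the sketch below (names are placeholders; file_path stands in for however the path reaches the code, e.g. a parameters cell that the pipeline's notebook activity can override):

    # Default path used when the notebook is run manually
    file_path = 'abfss://storage_name/folder_name/*'

    # Load everything the wildcard matches into a DataFrame
    df = spark.read.load(file_path, format='parquet')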

This is the error
{
    "errorCode": "6002",
    "message": "org.apache.spark.sql.AnalysisException: Path does not exist: abfss://storage_name/filename.parquet
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:806)
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:803)
        at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)",
    "failureType": "UserError",
    "target": "notebook_name",
    "details": []
}

2 Answers


  1. Chosen as BEST ANSWER

    I granted my Synapse workspace the required access to the storage account, and that fixed it.


  2. The above error mainly happens because of a permissions issue: the Synapse workspace lacks the permissions needed to access the storage account, so you need to grant it the Storage Blob Data Contributor role.

    To assign the Storage Blob Data Contributor role to your workspace, refer to this Microsoft documentation:

    Ref1
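
    Once the role assignment is in place, a quick sanity check is to list the path from a notebook cell with mssparkutils (the path below is a placeholder, not the asker's real path):

    from notebookutils import mssparkutils

    # Listing the folder raises a permission error if the workspace identity
    # lacks access, which separates a permissions problem from a genuinely
    # missing path.
    files = mssparkutils.fs.ls('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>')
    for f in files:
        print(f.name, f.size)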

    Also, make sure you are following the proper ADLS Gen2 path syntax:

    abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
    

    Sample code

    df = spark.read.load('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/samplefile.parquet', format='parquet')
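
    Applied to the folder-plus-wildcard case in the question, the read would look like this (container, account, and folder names are placeholders; Spark's read paths accept glob patterns):

    # Read every parquet file under the folder into one DataFrame
    df = spark.read.load('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/folder_name/*', format='parquet')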
    

    For more detailed information, refer to this link.
