How do I specify an S3 key with a wildcard while searching in Scala Spark - Amazon Web Services

Mikasa
February 24, 2023
129 views
0 votes
2 Answers

I have Scala Spark code that writes a JSON file named part-*.json, for example part-00000-14732361-f017-468a-b948-22d3b6d460dc-c000.json.

I want to call
s3.doesObjectExist(bucket, key) where bucket = xyz and key = abc/def/part-*.json.

It looks like S3 doesn't support wildcard search. What is the best way for me to call
s3.doesObjectExist(bucket, key) when I don't know the exact file name in S3? There is always exactly one such JSON file, stored as part-*.json.

Please help, thanks!

Answers

Chosen as BEST ANSWER
- Mikasa
- March 3, 2023 at 6:23 pm
- 0 votes
I used a workaround:
```
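val bucket = "xyz"
val fileNamePrefix = "abc/def/part"
// List the objects under the prefix and take the key of the first result.
// Note: get(0) throws if no object matches the prefix.
val key = s3.listObjectsV2(bucket, fileNamePrefix).getObjectSummaries.get(0).getKey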
```
Since, as I mentioned, there is only one such file, the code above gives me the full key, including the complete file name, which I then use.
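
One caveat with the workaround is that `get(0)` fails when nothing matches the prefix, so it cannot answer "does the object exist?" with a plain true/false. A minimal sketch of a safer variant follows. `firstMatchingKey` is a hypothetical helper name, and the listing call is shown only in a comment (assuming the `s3` client is an `AmazonS3` from the AWS SDK for Java v1) so that the block stays self-contained:

```scala
// Hypothetical helper: return the first key that starts with the prefix and
// ends with the suffix, or None if no such key is present.
def firstMatchingKey(keys: Seq[String], prefix: String, suffix: String): Option[String] =
  keys.find(k => k.startsWith(prefix) && k.endsWith(suffix))

// Assumed wiring with the AWS SDK for Java v1 (not executed here):
//   import scala.collection.JavaConverters._
//   val keys = s3.listObjectsV2("xyz", "abc/def/part-")
//     .getObjectSummaries.asScala.map(_.getKey).toSeq
// Sample keys stand in for the listing result in this sketch.
val keys = Seq(
  "abc/def/_SUCCESS",
  "abc/def/part-00000-14732361-f017-468a-b948-22d3b6d460dc-c000.json"
)

val key: Option[String] = firstMatchingKey(keys, "abc/def/part-", ".json")
// Stands in for doesObjectExist when the exact file name is unknown.
val exists: Boolean = key.isDefined
```

Returning an `Option` keeps the "no file yet" case explicit instead of surfacing it as an exception.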


- Marcin
- February 24, 2023 at 12:54 am
- 0 votes
It's not possible to do this with the AWS API. You have to download the list of objects yourself and do the filtering on your own side. If you have lots of objects, you can request an S3 Inventory report to get the list and filter that.
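
The client-side filtering described here might look like the sketch below. `globToRegex` and `matchKeys` are hypothetical helpers, not part of any AWS SDK, and treating `*` as "any run of characters except `/`" is an assumption about the intended wildcard semantics. The key list is hard-coded for illustration; in practice it would come from a listing call or an S3 Inventory report.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical helper: translate a glob containing `*` into a regex.
// Literal segments are quoted so characters such as '.' match themselves.
def globToRegex(glob: String): Regex =
  glob.split("\\*", -1).map(s => Pattern.quote(s)).mkString("[^/]*").r

// Keep only the keys that match the whole glob.
def matchKeys(keys: Seq[String], glob: String): Seq[String] = {
  val pattern = globToRegex(glob)
  keys.filter(k => pattern.pattern.matcher(k).matches())
}

// Sample keys standing in for a downloaded object list.
val listed = Seq(
  "abc/def/_SUCCESS",
  "abc/def/part-00000-14732361-f017-468a-b948-22d3b6d460dc-c000.json",
  "abc/def/part-x/nested.json"
)

val matches = matchKeys(listed, "abc/def/part-*.json")
```

Note that a single list request returns at most 1,000 keys, so for larger prefixes the listing has to be paged through before filtering.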
