I have Parquet files in an S3 bucket that is not on AWS S3.
Is there a tool that connects to any S3-compatible service (like Wasabi, DigitalOcean, or MinIO) and lets me query the Parquet files?
2 Answers
With MongoDB, this can be done with our Atlas Data Federation product:
https://www.mongodb.com/docs/atlas/data-federation/overview/
It can query Parquet files stored in S3.
If you need a GUI tool, you can use DBeaver + DuckDB.
For programmatic use, DuckDB has client libraries for most languages.
Here is my other answer on the same topic.
The only difference in your case is that you are querying data on S3-compatible storage rather than AWS S3.
You simply need to run a few additional commands, mentioned here in the DuckDB docs.
If your Parquet files are hosted on S3 or served from any web server over HTTP, DuckDB has this covered as well.
Any S3-compatible object store (Wasabi, DigitalOcean, MinIO) should work similarly.
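As a rough sketch of those extra commands from Python: the snippet below builds the `httpfs` settings that point DuckDB at a custom endpoint and then runs a query. The helper names `s3_settings` and `query_parquet` are made up for illustration, and the endpoint, bucket, and credentials in the usage comment are placeholders. It uses the `SET s3_*` style of configuration; newer DuckDB versions also offer `CREATE SECRET`, so check the docs for the version you run. Path-style URLs are assumed because MinIO usually needs them; other providers may prefer `vhost`.

```python
def s3_settings(endpoint, access_key, secret_key,
                region="us-east-1", url_style="path", use_ssl=True):
    """Return the DuckDB statements that point httpfs at a custom S3 endpoint.

    Hypothetical helper for illustration; the values are placeholders.
    """
    def q(value):
        # Quote a value as a SQL string literal, doubling single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    return [
        "INSTALL httpfs",
        "LOAD httpfs",
        f"SET s3_endpoint = {q(endpoint)}",        # host[:port], no scheme
        f"SET s3_access_key_id = {q(access_key)}",
        f"SET s3_secret_access_key = {q(secret_key)}",
        f"SET s3_region = {q(region)}",
        f"SET s3_url_style = {q(url_style)}",      # 'path' or 'vhost'
        f"SET s3_use_ssl = {'true' if use_ssl else 'false'}",
    ]


def query_parquet(sql, **s3_kwargs):
    """Open an in-memory DuckDB connection, apply the settings, run the query."""
    import duckdb  # pip install duckdb

    con = duckdb.connect()
    for stmt in s3_settings(**s3_kwargs):
        con.execute(stmt)
    return con.execute(sql).fetchall()


# Example call (placeholder endpoint, bucket, and credentials):
# rows = query_parquet(
#     "SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet')",
#     endpoint="localhost:9000", access_key="minio", secret_key="minio123",
#     use_ssl=False,
# )
```

The same `read_parquet(...)` call also accepts a plain `https://` URL once `httpfs` is loaded, in which case the S3 settings are not needed.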
You can also write the data back as Parquet, after transformation, to any S3-compatible storage (AWS, MinIO, etc.).
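A minimal sketch of the write-back step, assuming you already have a DuckDB connection with `httpfs` loaded and the S3 endpoint and credentials configured. The helper names `copy_to_parquet_sql` and `write_back`, and the bucket paths in the usage comment, are hypothetical; the `COPY ... TO ... (FORMAT PARQUET)` statement itself is standard DuckDB SQL.

```python
def copy_to_parquet_sql(select_sql, dest_url):
    """Build a COPY statement that writes a query result to a Parquet file.

    Hypothetical helper; dest_url can be an s3:// URL or a local path.
    """
    dest = dest_url.replace("'", "''")  # escape quotes in the destination
    return f"COPY ({select_sql}) TO '{dest}' (FORMAT PARQUET)"


def write_back(con, select_sql, dest_url):
    """Run the COPY on an existing, already configured DuckDB connection."""
    con.execute(copy_to_parquet_sql(select_sql, dest_url))


# Example call (placeholder bucket and paths):
# write_back(
#     con,
#     "SELECT id, upper(name) AS name "
#     "FROM read_parquet('s3://my-bucket/in/*.parquet')",
#     "s3://my-bucket/out/result.parquet",
# )
```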
All of these can be done programmatically as well.