The business logic is as below:
- The user upload a csv file;
- The application convert the csv file to a database table;
- In the future, the user could run sql on the table to generate a BI report;
Currently, the solution is to save the table to MySQL. But as times goes on, the MySQL database contains thousands of tables.
I want find a file format, which represent a table and can be put to a object storage such as AWS S3, and then run an sql on the file.
For example:
Datasource ds = new Datasource("s3://xxx/bbb/t1.tbl");
ResultSet rs = ds.runSQL("select c1, c2 from t1 where c3=8");
What is your ideas or solutions?
2
Answers
Amazon S3 can run an SQL query against a single CSV file. It uses a capability called S3 Select.
From Filtering and retrieving data using Amazon S3 Select – Amazon Simple Storage Service:
You can make an API call to S3 to perform the SQL query and retrieve the results. No database required. Just pay for the storage used by the CSV files (which can be gzipped to save space), plus
$0.002 per GB scanned
and$0.0007 per GB returned
.You can store the file as CSV in S3 and use S3 Select as mentioned in the other answer. Or you can store it as CSV or Parquet (a much more performant format) and run queries against it using AWS Athena.