I have a MySQL table with around 4 million+ rows. Let us say the table is as follows:
Columns in table Person
:
Id
Name
Age
Marital Status
Education Level
- ‘Location Country’
- ‘Description’
When I run a query based on Age
, I also want to have a summary count of people with the same age in different marital status and also with different ‘Education Level’ and ‘Location Country’.
When I run a query based on Age and Education Level, I also want to have a summary count of people with the same age and Education Level in different marital status and also with different ‘Location Country’.
For example, the query issued would be SELECT * FROM Person WHERE Age = 27;
. I also want results that would be produced by SELECT Education Level, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Education Level;
and SELECT Location Country, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Location Country;
Also, this becomes more challenging for me when I have to do a search based on keywords on description and want a summary count on each of the other columns. The application I am developing is a sort of search engine. This can be seen in sites like Ebay,
I can possibly run these queries separately. But, with 4 million rows, the GROUP BY query will take substantial amount of time. This is an internet application and the query should complete within few seconds.
Any help would be much appreciated.
2
Answers
You can do both in one query
In such situation, I would suggest to save data in a
memcache
cache. You can expire cache if new data inserted to table or after some expiration time, to avoid long query execution. Another improvement would be using a LIMIT to reduce number of row returned by DB like this:From what you are describing, I would have a separate aggregate table to query directly from that has those “roll-up” stats you want. How frequent is the “Person” table getting added to / changed. If you are only storing a person’s “Age”, what is the basis of the age if no date, and you add the person again in future they would have multiple records… such that
At age X, so many people were married (or not) and had this level of education.
At age Y, so many people… etc..
I would create a summary table, something like
Then, add a trigger to the person table such that during insert (or inclusive of update/delete as needed), the new record just adds 1 to each respective count applicable.
Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.
However, if you distinctly wanted to know how many Married with Masters degree, you would have to roll back to master person list.
Alternatively, you could do a similar pre-aggregation but down a level of granularity something like
and likewise have a trigger that updates the count based on the two combination elements per age. Then, if you wanted the total “Married”, you can sum(peoplecount) for age = 27 and maritalstat=(enumerator for “married” value)
Good luck, and hope it helps alternative solution for you.