In MYSQL, how to summarise query results based on the parameters not specified in the query? - Ebay API

user1831240
November 17, 2012
215 views
0 votes
2 Answers

I have a MySQL table with around 4 million+ rows. Let us say the table is as follows:

Columns in table Person:

Id
Name
Age
Marital Status
Education Level
‘Location Country’
‘Description’

When I run a query based on Age, I also want to have a summary count of people with the same age in different marital status and also with different ‘Education Level’ and ‘Location Country’.

When I run a query based on Age and Education Level, I also want to have a summary count of people with the same age and Education Level in different marital status and also with different ‘Location Country’.

For example, the query issued would be SELECT * FROM Person WHERE Age = 27;. I also want results that would be produced by SELECT Education Level, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Education Level; and SELECT Location Country, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Location Country;

Also, this becomes more challenging for me when I have to do a search based on keywords on description and want a summary count on each of the other columns. The application I am developing is a sort of search engine. This can be seen in sites like Ebay,

I can possibly run these queries separately. But, with 4 million rows, the GROUP BY query will take substantial amount of time. This is an internet application and the query should complete within few seconds.

Any help would be much appreciated.

Tags: mysql summary

Answers

- RoozbehZabihollahi
- November 17, 2012 at 4:38 am
- 0 votes
0
You can do both in one query
```
SELECT p.*, count(p2.id)  
FROM Person p, Person p2 
WHERE p2.Age = p.age and p2.marital != p.marital and p1.education != p2.education 
GROUP BY p1.id
```
In such situation, I would suggest to save data in a memcache cache. You can expire cache if new data inserted to table or after some expiration time, to avoid long query execution. Another improvement would be using a LIMIT to reduce number of row returned by DB like this:
```
SELECT p.*, count(p2.id)  
FROM Person p, Person p2 
WHERE p2.Age = p.age and p2.marital != p.marital and p1.education != p2.education 
GROUP BY p1.id
LIMIT 10
```
Login or Signup to reply.

- DRapp
- November 17, 2012 at 9:52 am
- 0 votes
0
From what you are describing, I would have a separate aggregate table to query directly from that has those “roll-up” stats you want. How frequent is the “Person” table getting added to / changed. If you are only storing a person’s “Age”, what is the basis of the age if no date, and you add the person again in future they would have multiple records… such that

At age X, so many people were married (or not) and had this level of education.
At age Y, so many people… etc..

I would create a summary table, something like
```
create table AgeStat ( 
   age int, 
   married int, 
   single int, 
   divorced int, 
   HighSchool int, 
   Associates int,
   Bachelors int,
   Masters int,
   Doctorate int )
```
Then, add a trigger to the person table such that during insert (or inclusive of update/delete as needed), the new record just adds 1 to each respective count applicable.

Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.

However, if you distinctly wanted to know how many Married with Masters degree, you would have to roll back to master person list.

Alternatively, you could do a similar pre-aggregation but down a level of granularity something like
```
create table AgeStat ( 
   age int, 
   maritalstat int,    -- but I would actually use an enumerated value for marital status
   educationlevel int, -- and education level vs a hard description of each.
   peoplecount int )
```
and likewise have a trigger that updates the count based on the two combination elements per age. Then, if you wanted the total “Married”, you can sum(peoplecount) for age = 27 and maritalstat=(enumerator for “married” value)

Good luck, and hope it helps alternative solution for you.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

In MYSQL, how to summarise query results based on the parameters not specified in the query? – Ebay API

Answers