skip to Main Content

I have a MySQL table with around 4 million+ rows. Let us say the table is as follows:

Columns in table Person:

  • Id
  • Name
  • Age
  • Marital Status
  • Education Level
  • ‘Location Country’
  • ‘Description’

When I run a query based on Age, I also want to have a summary count of people with the same age in different marital status and also with different ‘Education Level’ and ‘Location Country’.

When I run a query based on Age and Education Level, I also want to have a summary count of people with the same age and Education Level in different marital status and also with different ‘Location Country’.

For example, the query issued would be SELECT * FROM Person WHERE Age = 27;. I also want results that would be produced by SELECT Education Level, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Education Level; and SELECT Location Country, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Location Country;

Also, this becomes more challenging for me when I have to do a search based on keywords on description and want a summary count on each of the other columns. The application I am developing is a sort of search engine. This can be seen in sites like Ebay,

I can possibly run these queries separately. But, with 4 million rows, the GROUP BY query will take substantial amount of time. This is an internet application and the query should complete within few seconds.

Any help would be much appreciated.

2

Answers


  1. You can do both in one query

    SELECT p.*, count(p2.id)  
    FROM Person p, Person p2 
    WHERE p2.Age = p.age and p2.marital != p.marital and p1.education != p2.education 
    GROUP BY p1.id
    

    In such situation, I would suggest to save data in a memcache cache. You can expire cache if new data inserted to table or after some expiration time, to avoid long query execution. Another improvement would be using a LIMIT to reduce number of row returned by DB like this:

    SELECT p.*, count(p2.id)  
    FROM Person p, Person p2 
    WHERE p2.Age = p.age and p2.marital != p.marital and p1.education != p2.education 
    GROUP BY p1.id
    LIMIT 10
    
    Login or Signup to reply.
  2. From what you are describing, I would have a separate aggregate table to query directly from that has those “roll-up” stats you want. How frequent is the “Person” table getting added to / changed. If you are only storing a person’s “Age”, what is the basis of the age if no date, and you add the person again in future they would have multiple records… such that

    At age X, so many people were married (or not) and had this level of education.
    At age Y, so many people… etc..

    I would create a summary table, something like

    create table AgeStat ( 
       age int, 
       married int, 
       single int, 
       divorced int, 
       HighSchool int, 
       Associates int,
       Bachelors int,
       Masters int,
       Doctorate int )
    

    Then, add a trigger to the person table such that during insert (or inclusive of update/delete as needed), the new record just adds 1 to each respective count applicable.

    Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.

    However, if you distinctly wanted to know how many Married with Masters degree, you would have to roll back to master person list.

    Alternatively, you could do a similar pre-aggregation but down a level of granularity something like

    create table AgeStat ( 
       age int, 
       maritalstat int,    -- but I would actually use an enumerated value for marital status
       educationlevel int, -- and education level vs a hard description of each.
       peoplecount int )
    

    and likewise have a trigger that updates the count based on the two combination elements per age. Then, if you wanted the total “Married”, you can sum(peoplecount) for age = 27 and maritalstat=(enumerator for “married” value)

    Good luck, and hope it helps alternative solution for you.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search