skip to Main Content

Background:
I’m writing an in house SEO crawler to check our positions in google.
The crawling is working brilliantly, the storage is fine, but Im now having performance issues with displaying the data (currently the storage table have over 11million records and is > 6.0GB in size.)

I’m trying to create an SQL query which will show me all the records from the input_keywords table, and then the last result from the rank_result table (for the given CompanyName) along with the previous result from the rank_result table (which will show us our movement, either up or down)

The tables are as follows

Table: input_keywords

-------------------------------------------------------------------------------------------------------
| Field           | Type             | Null | Key | Default             | Extra                       |
-------------------------------------------------------------------------------------------------------
| id              | int(11) unsigned | NO   | PRI | NULL                | auto_increment              |
-------------------------------------------------------------------------------------------------------
| keyword         | char(150)        | YES  | UNI | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| last_check      | timestamp        | YES  | MUL | 2000-01-01 00:00:00 |                             |
-------------------------------------------------------------------------------------------------------
| CREATION        | timestamp        | YES  |     | CURRENT_TIMESTAMP   |                             |
-------------------------------------------------------------------------------------------------------
| MODIFICATION    | timestamp        | YES  |     | NULL                | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep          | int(1)           | YES  |     | 5                   |                             |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11)          | YES  |     | 3                   |                             |
-------------------------------------------------------------------------------------------------------
| type            | char(50)         | YES  |     | NULL                |                             |
-------------------------------------------------------------------------------------------------------
| competitor      | char(100)        | YES  | MUL | CompanyName            |                          |
-------------------------------------------------------------------------------------------------------

Table: rank_result:

-----------------------------------------------------------------------------
| Field          | Type             | Null | Key | Default | Extra          |
-----------------------------------------------------------------------------
| id             | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
-----------------------------------------------------------------------------
| keyword        | char(150)        | YES  | MUL |         |                |
-----------------------------------------------------------------------------
| result_url     | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| position       | int(11)          | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_time     | timestamp        | YES  | MUL | NULL    |                |
-----------------------------------------------------------------------------
| useragent_used | char(255)        | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| proxy_log      | text             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| check_date     | date             | YES  |     | NULL    |                |
-----------------------------------------------------------------------------
| competitor     | tinytext         | YES  |     | NULL    |                |
-----------------------------------------------------------------------------

Some example data to show what I’m trying to achieve

Example contents: input_keywords

-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword               | last_check          | CREATION            | MODIFICATION        | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2  | guitar accessories    | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3  | guitar amps           | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4  | guitar strings        | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5  | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5      | 3               | NULL | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------

Example contents: rank_result (edited to show only relevant data)

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id    | keyword            | result_url                           | position | check_time          | useragent_used                       | proxy_log             | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723   | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL                  | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656  | guitar accessories | https://www.companyname.com/gui… | 33       | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181         | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201        | 2017-01-19 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147  | guitar accessories | https://www.companyname.com/gui… | 36       | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181         | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530  | guitar accessories | https://www.companyname.com/gui… | 31       | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252        | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277  | guitar accessories | https://www.companyname.com/gui… | 34       | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281   | 2017-01-20 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480  | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953  | guitar accessories | https://www.companyname.com/gui… | 37       | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40       | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281  | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38       | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281   | 2017-01-23 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32       | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281   | 2017-01-24 | CompanyName   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Example output:

---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword              | p_deep | check_freq_days | CREATION            | last_check          | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories          | 5      | 3               | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37               | 39                |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings     | 5      | 3               | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars            | 5      | 1               | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14               | 14                |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars                | 5      | 1               | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41               | 44                |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube     | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab           | 5      | 1               | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL             | NULL              |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar      | 5      | 3               | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5                | 4                 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar           | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24               | 24                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12               | 12                |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5      | 3               | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10               | 11                |
---------------------------------------------------------------------------------------------------------------------------------------------

Currently my query is as follows:

SELECT i.`keyword` AS 'search_keyword',  i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
              (SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position',
              (SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'
              FROM input_keywords i
              WHERE i.keyword LIKE "%s"
              order by i.keyword ASC
              LIMIT 0,100

So my questions are as follows:

  1. Is there a better way to write this query
  2. I have to limit this to 100 results, or the query is just too long, and times out, can this be resolved.
  3. If I do not have to have the ORDER BY rr.check_time DESC , the query is hundreds of times faster but will obviously not return the correct info as its not getting the last records but the first, so can I do this in a different way?
  4. I would dearly love to NOT have the WHERE KEYWORD LIKE and just return all my input_keywords along with their current rank and previous rank.

Additional info:

to return the Current rank for a keyword:

***input_keywords          rank_result***
    keyword           ==    keyword
    last_check        ==    check_time (this make sure that if we drop off the search results I don't keep returning an incorrect figure)
    competitor        ==    competitor (this allows us to monitor us and our competitors.)

To return the previous rank for a keywords

***input_keywords          rank_result***
    keyword           ==    keyword
    competitor        ==    competitor (this allows us to monitor us and our competitors.)
    ORDER BY check_time desc
    LIMIT 1,1 (to get the last but one result)

Please be kind – I’m self taught on all this stuff!

EDIT 1.

Explain Extended on my current query (i’ve included the create statement as well)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type        | table | type | possible_keys                | key     | key_len | ref                          | rows | filtered | Extra                       |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1  | PRIMARY            | i     | ALL  | NULL                         | NULL    | NULL    | NULL                         | 1682 | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3  | DEPENDENT SUBQUERY | rr    | ref  | keyword                      | keyword | 451     | func                         | 32   | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2  | DEPENDENT SUBQUERY | r     | ref  | keyword,idx_rank_result_che… | keyword | 609     | func,GoogleCrawler.i.last_c… | 2    | 100.00   | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CREATE TABLE `input_keywords` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
  `last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly',
  `CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  `p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
  `check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
  `type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
  `competitor` tinytext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNQ_Keyword` (`keyword`),
  KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;


CREATE TABLE `rank_result` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `keyword` char(150) DEFAULT '',
  `result_url` text,
  `position` int(11) DEFAULT NULL,
  `check_time` timestamp NULL DEFAULT NULL,
  `useragent_used` char(255) DEFAULT NULL,
  `proxy_log` text,
  `check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
  `competitor` tinytext,
  PRIMARY KEY (`id`),
  KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
  KEY `idx_rank_result_check_time` (`check_time`)
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;

EDIT 2:

From the two answers so far , I have adjusted my index on the rank_result and added in the limit by timescale.
I now get my results back in <1s which is an amazing result.

HOWEVER.

I still feel my query looks really ‘hacky’ and feel there must be a better, cleaner solution – is there one?

(current query which is now in production)

SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, 
i.CREATION, i.last_check,
        (SELECT r.position
         FROM rank_result r 
         WHERE r.`keyword` = search_keyword AND
               r.`competitor` = 'Absolute' AND
               i.`last_check` = r.`check_time`
         ORDER BY r.check_time DESC
         LIMIT 0,1
        ) AS 'current_position',
        (SELECT rr.`position`
         FROM rank_result rr
         WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
         ORDER BY rr.check_time DESC
         LIMIT 1, 1
        ) AS 'previous_position'
        FROM input_keywords i
        ORDER BY i.keyword ASC

2

Answers


  1. I would add to the subquery

    SELECT rr.`position` AS 'previous_position' 
    FROM rank_result rr 
    WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' 
    ORDER BY rr.check_time DESC LIMIT 1,1
    

    a limitation if possible e.g.

    AND rr.check_time>NOW - 1 WEEK 
    

    or something like this to limit amount of records to be processed

    Also consider moving the query to the FROM section and JOIN it with the main query

    Login or Signup to reply.
  2. For this query:

    SELECT i.`keyword` AS search_keyword,  i.p_deep, i.check_freq_days, i.CREATION, i.last_check,
            (SELECT r.position
             FROM rank_result r 
             WHERE r.`keyword` = search_keyword AND
                   r.`competitor` = 'CompanyName' AND
                   i.`last_check` = r.`check_time`
             ORDER BY r.check_time DESC
             LIMIT 0,1
            ) AS current_position,
            (SELECT rr.`position`
             FROM rank_result rr
             WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName'
             ORDER BY rr.check_time DESC
             LIMIT 1, 1
            ) AS previous_position
    FROM input_keywords i
    WHERE i.keyword LIKE "%s"
    ORDER BY i.keyword ASC
    LIMIT 0, 100;
    

    You want an index on rank_result(keyword, competitor, check_time, position).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search