Background:
I’m writing an in house SEO crawler to check our positions in google.
The crawling is working brilliantly, the storage is fine, but Im now having performance issues with displaying the data (currently the storage table have over 11million records and is > 6.0GB in size.)
I’m trying to create an SQL query which will show me all the records from the input_keywords
table, and then the last result from the rank_result
table (for the given CompanyName) along with the previous result from the rank_result
table (which will show us our movement, either up or down)
The tables are as follows
Table: input_keywords
-------------------------------------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-------------------------------------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-------------------------------------------------------------------------------------------------------
| keyword | char(150) | YES | UNI | NULL | |
-------------------------------------------------------------------------------------------------------
| last_check | timestamp | YES | MUL | 2000-01-01 00:00:00 | |
-------------------------------------------------------------------------------------------------------
| CREATION | timestamp | YES | | CURRENT_TIMESTAMP | |
-------------------------------------------------------------------------------------------------------
| MODIFICATION | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep | int(1) | YES | | 5 | |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11) | YES | | 3 | |
-------------------------------------------------------------------------------------------------------
| type | char(50) | YES | | NULL | |
-------------------------------------------------------------------------------------------------------
| competitor | char(100) | YES | MUL | CompanyName | |
-------------------------------------------------------------------------------------------------------
Table: rank_result:
-----------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-----------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-----------------------------------------------------------------------------
| keyword | char(150) | YES | MUL | | |
-----------------------------------------------------------------------------
| result_url | text | YES | | NULL | |
-----------------------------------------------------------------------------
| position | int(11) | YES | | NULL | |
-----------------------------------------------------------------------------
| check_time | timestamp | YES | MUL | NULL | |
-----------------------------------------------------------------------------
| useragent_used | char(255) | YES | | NULL | |
-----------------------------------------------------------------------------
| proxy_log | text | YES | | NULL | |
-----------------------------------------------------------------------------
| check_date | date | YES | | NULL | |
-----------------------------------------------------------------------------
| competitor | tinytext | YES | | NULL | |
-----------------------------------------------------------------------------
Some example data to show what I’m trying to achieve
Example contents: input_keywords
-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | last_check | CREATION | MODIFICATION | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2 | guitar accessories | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3 | guitar amps | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4 | guitar strings | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5 | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
Example contents: rank_result (edited to show only relevant data)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | result_url | position | check_time | useragent_used | proxy_log | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147 | guitar accessories | https://www.companyname.com/gui… | 36 | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277 | guitar accessories | https://www.companyname.com/gui… | 34 | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40 | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32 | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281 | 2017-01-24 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Example output:
---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword | p_deep | check_freq_days | CREATION | last_check | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories | 5 | 3 | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37 | 39 |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings | 5 | 3 | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars | 5 | 1 | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14 | 14 |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars | 5 | 1 | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41 | 44 |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar | 5 | 3 | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5 | 4 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24 | 24 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12 | 12 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10 | 11 |
---------------------------------------------------------------------------------------------------------------------------------------------
Currently my query is as follows:
SELECT i.`keyword` AS 'search_keyword', i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
(SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position',
(SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'
FROM input_keywords i
WHERE i.keyword LIKE "%s"
order by i.keyword ASC
LIMIT 0,100
So my questions are as follows:
- Is there a better way to write this query
- I have to limit this to 100 results, or the query is just too long, and times out, can this be resolved.
- If I do not have to have the ORDER BY rr.check_time DESC , the query is hundreds of times faster but will obviously not return the correct info as its not getting the last records but the first, so can I do this in a different way?
- I would dearly love to NOT have the
WHERE KEYWORD LIKE
and just return all myinput_keywords
along with their current rank and previous rank.
Additional info:
to return the Current rank for a keyword:
***input_keywords rank_result***
keyword == keyword
last_check == check_time (this make sure that if we drop off the search results I don't keep returning an incorrect figure)
competitor == competitor (this allows us to monitor us and our competitors.)
To return the previous rank for a keywords
***input_keywords rank_result***
keyword == keyword
competitor == competitor (this allows us to monitor us and our competitors.)
ORDER BY check_time desc
LIMIT 1,1 (to get the last but one result)
Please be kind – I’m self taught on all this stuff!
EDIT 1.
Explain Extended on my current query (i’ve included the create statement as well)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 1682 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3 | DEPENDENT SUBQUERY | rr | ref | keyword | keyword | 451 | func | 32 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2 | DEPENDENT SUBQUERY | r | ref | keyword,idx_rank_result_che… | keyword | 609 | func,GoogleCrawler.i.last_c… | 2 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE `input_keywords` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
`last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly',
`CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
`check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
`type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
`competitor` tinytext,
PRIMARY KEY (`id`),
UNIQUE KEY `UNQ_Keyword` (`keyword`),
KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;
CREATE TABLE `rank_result` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT '',
`result_url` text,
`position` int(11) DEFAULT NULL,
`check_time` timestamp NULL DEFAULT NULL,
`useragent_used` char(255) DEFAULT NULL,
`proxy_log` text,
`check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
`competitor` tinytext,
PRIMARY KEY (`id`),
KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
KEY `idx_rank_result_check_time` (`check_time`)
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;
EDIT 2:
From the two answers so far , I have adjusted my index on the rank_result
and added in the limit by timescale.
I now get my results back in <1s which is an amazing result.
HOWEVER.
I still feel my query looks really ‘hacky’ and feel there must be a better, cleaner solution – is there one?
(current query which is now in production)
SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days,
i.CREATION, i.last_check,
(SELECT r.position
FROM rank_result r
WHERE r.`keyword` = search_keyword AND
r.`competitor` = 'Absolute' AND
i.`last_check` = r.`check_time`
ORDER BY r.check_time DESC
LIMIT 0,1
) AS 'current_position',
(SELECT rr.`position`
FROM rank_result rr
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
ORDER BY rr.check_time DESC
LIMIT 1, 1
) AS 'previous_position'
FROM input_keywords i
ORDER BY i.keyword ASC
2
Answers
I would add to the subquery
a limitation if possible e.g.
or something like this to limit amount of records to be processed
Also consider moving the query to the FROM section and JOIN it with the main query
For this query:
You want an index on
rank_result(keyword, competitor, check_time, position)
.