I have a huge table to load into the cloud (BigQuery).
The table has a very simple structure:
CREATE TABLE `my_huge_table` (
`user_id` int(10) NOT NULL,
`value_id` int(11) NOT NULL,
`some_value` tinyint(2) unsigned NOT NULL DEFAULT '1',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
UNIQUE KEY `user_segment_UNIQUE` (`user_id`,`value_id`),
KEY `user_idx` (`user_id`),
KEY `value_idx` (`value_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The table has about 850 000 000 (850M) rows.
Here is the info from information_schema (values formatted for readability):
select TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
from information_schema.TABLES where TABLE_NAME = "my_huge_table"
"TABLE_ROWS" "849 536 609"
"DATA_LENGTH" "59 544 354 816"
"INDEX_LENGTH" "38 034 964 480"
BigQuery allows loading chunks of less than 4 GB, so I determine the row count for each chunk and end up with a batch of queries using LIMIT and OFFSET. For example:
....
SELECT * INTO OUTFILE 'my_huge_table_720000000_10000000.bulk' FROM `my_huge_table` t1 limit 10000000 offset 720000000;
SELECT * INTO OUTFILE 'my_huge_table_730000000_10000000.bulk' FROM `my_huge_table` t1 limit 10000000 offset 730000000;
...
And it's so slow! And that's not all.
Here is how the query time (in seconds) grows with each chunk, where M is the chunk's offset in millions of rows:
M    seconds
10 32
20 35
30 38
40 40
50 44
60 45
70 49
80 51
90 53
100 57
110 59
120 63
130 65
140 66
150 71
160 73
170 75
180 77
190 80
200 83
210 87
220 88
230 90
240 94
250 94
260 99
270 100
280 109
290 118
300 107
310 113
320 115
330 118
340 121
350 124
360 127
370 130
380 130
390 135
400 139
410 140
420 144
430 144
440 146
450 148
460 149
470 151
480 155
490 160
500 164
510 164
520 165
530 170
540 169
550 173
560 178
570 183
580 185
590 186
600 206
610 380
620 388
630 399
640 411
650 426
660 453
670 491
680 504
690 779
700 795
710 901
720 922
730 956
740 936
750 972
760 1010
770 1027
780 1044
790 1047
800 980
810 998
Question: How can I increase the speed of all these queries?
2 Answers
You have discovered an inconvenient truth: LIMIT / OFFSET is ridiculously inefficient for large offsets, because MySQL still has to read and discard every skipped row before it returns anything.
Are you running on Linux or another UNIX-like system? Do you have enough drive space to hold the whole text file twice, at least temporarily?
If so, you could dump the whole table once and use split to break the file into chunks. Something like the following will break it into pieces of roughly 2 GiB each.
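A minimal sketch, assuming GNU coreutils split and a placeholder dump file name; -C splits on line boundaries so no row is cut across two chunks:

# Dump the table once with SELECT ... INTO OUTFILE, then split the file into
# ~2 GiB pieces on line boundaries; -d gives numeric suffixes (_00, _01, ...).
split -C 2G -d my_huge_table.bulk my_huge_table_part_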
As has already been pointed out, OFFSET is very inefficient. It is, however, possible to iterate using a compound key; see https://mysql.rjweb.org/doc.php/deletebig#iterating_through_a_compound_key
As for chunk size, I doubt if there is much difference between 1000 rows at a time and 4GB at a time.
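For illustration, a minimal sketch of that compound-key iteration over the UNIQUE(user_id, value_id) index; the file names, the 10M slice size, and the resume values (12345, 678) are placeholders, not values from the question:

-- First slice: walk the UNIQUE(user_id, value_id) index from the start.
SELECT * INTO OUTFILE 'my_huge_table_slice_000.bulk'
FROM my_huge_table
ORDER BY user_id, value_id
LIMIT 10000000;

-- Note the (user_id, value_id) of the last row written (e.g. from the tail of
-- the previous file), then resume strictly after it. There is no OFFSET, so
-- every slice costs roughly the same.
SELECT * INTO OUTFILE 'my_huge_table_slice_001.bulk'
FROM my_huge_table
WHERE user_id > 12345
   OR (user_id = 12345 AND value_id > 678)
ORDER BY user_id, value_id
LIMIT 10000000;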
Alternatively, if no user_id has too many rows, consider doing one user at a time. Again, "remember where you left off"; do not use OFFSET.
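A sketch of that per-user loop, with a hypothetical resume value (12345) and a hypothetical file-naming scheme; each pass finds the next user and exports it instead of skipping rows with OFFSET:

-- Find the next user_id after the last one already exported.
SELECT MIN(user_id) AS next_user
FROM my_huge_table
WHERE user_id > 12345;

-- Export that user's rows to its own file, then repeat with the new user_id.
SELECT * INTO OUTFILE 'my_huge_table_user_12346.bulk'
FROM my_huge_table
WHERE user_id = 12346;

In practice you might export a contiguous range of user_id values per file instead, so you do not end up with one file per user.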
PS: There is no need to have KEY(user_id), since UNIQUE(user_id, …) provides such an index.
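Dropping the redundant index would be a one-liner, using the index name from the DDL above:

-- user_idx duplicates the leading column of UNIQUE(user_id, value_id).
ALTER TABLE my_huge_table DROP INDEX user_idx;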