I Have a mysql(v8.x) table with 4 Billion records and I need to partition on one of the varchar column.
Partition on date column or integer column doesn’t work in this case.
This is a unique usecase.
Partitioning will be on cust_id column
cust_id contains data like below
| 2184-204CECE8FPNE_01_0102_4 |
| 2184-08NFINO3NC0E_01_0102_4 |
| 2185-B28RNFPE3NCS_01_0102_4 |
| 2185-HP3DNFNF2DNF_01_0102_4 |
| 9732-0I02NCNN30DW_01_0102_4 |
| 9732-8JFN30FJMDP3_01_0102_4 |
| 9732-B02F30CN3D25_01_0102_4 |
| 3184-N33NF4FNPF33_01_0102_4 |
| 3184-0702NDNDJD3S_01_0102_4 |
since we have hyphen in the data, how would I partition the data?
Yes, partition will be on first set of digits(before the character hyphen – before the First hyphen only)
I was trying to create a table as below snippet, i failed with it 🙁
Appreciate all the help,
Thank you
Here is the create statement I tried with, it didn’t work
CREATE TABLE cust_part (
id varchar(127) NOT NULL,
cust_id varchar(127) NOT NULL,
PRIMARY KEY (id,cust_id),
KEY idx_cust_id (cust_id)
)
PARTITION BY RANGE COLUMNS (substring_index(cust_id,'-',1))
(
PARTITION p1000 VALUES LESS THAN ('1000'),
PARTITION p2000 VALUES LESS THAN ('2000'),
PARTITION p3000 VALUES LESS THAN ('3000'),
PARTITION p4000 VALUES LESS THAN ('4000'),
PARTITION p5000 VALUES LESS THAN ('5000'),
PARTITION p6000 VALUES LESS THAN ('6000'),
PARTITION p7000 VALUES LESS THAN ('7000'),
PARTITION p8000 VALUES LESS THAN ('8000'),
PARTITION p9000 VALUES LESS THAN ('9000'),
PARTITION p10000 VALUES LESS THAN ('10000'),
PARTITION pMax VALUES LESS THAN MAXVALUE
);
2
Answers
https://dev.mysql.com/doc/refman/8.4/en/partitioning-columns-range.html says:
So you can’t use
substring_index()
or any other function.You don’t have to. You can rely on strings being ordered less than or greater than the fixed values you use in the partition defs.
But you must keep in mind that because the column is a varchar, not an integer, you must order the ranges in alphabetical order, not numeric order.
The following works:
Alphabetically,
'10000-'
goes between'1000-'
and'2000-'
. You’ll have to order any further partitions in a similar way.Give this a try…
If there are many rows for the typical cust_id, this will help due to "locality of reference" ("clustering").