I have a column in postgresql that is a text data type. But it really holds always 4 kind of values, these are:
EOS,EOS in 6 months,EOS in 12 months, EOS in 24 months
I always perform the same kind of query which will depend on user input but it will contain one or more of those values above, as such:
select * from table
where text_column in
('EOS','EOS in 6 months','EOS in 12 months', 'EOS in 24 months')
I wonder if I transform the data type from text to enum whether I will get any performance benefit?
2
Answers
The only foolproof answer here would be "you’ll need to benchmark it".
Having said that, if you only text this column for equality (remember that the
in
operator is just a fancy way of writing multiple equality tests) and you a limited number of valid values for that column, it makes sense to use an enum – the values are shorter (a single integer value vs a longer string), so it would store more compactly, leading to more efficient disk reads.TL;DR – using an enum is probably a good idea here, but you should benchmark it.
ENUM is encoded as an int32 so you will save a few bytes.
Performance should be a tiny bit faster but nothing spectacular.
You will be able to specify the sorting order at ENUM creation time, this is useful because text sorts alphabetically so ‘6 months’ is after ’24 months’ because ‘6’>’2’… it’s more intuitive.
And the most important, if you didn’t define a constraint on the TEXT field, then there’s no guarantee the column only contains one of the allowed values. If there is a bug in the application it could insert anything. ENUM will ensure the column only contains one of the allowed values.