I have Table T
T(A , B , C , D)
With this query:
Select A
from T
Where B = C
Does having an index on D make a difference in the performance of the given query?
Does having an index on columns that are not used in a select query make a difference?
3
Answers
Probably not, but it could make the query slower, if RAM that might have been used for serving the query is instead be used towards the index.
Additionally, if your model of indexing is separate indices for columns A, B, and C, that’s not how it works. For example, an index on column A alone is worthless for this query, and if you have separate indexes on columns B and C individually only one of them might possibly be used, and probably neither is helpful.
Instead, this query is best served by a single index for columns
(B, C, include A)
all in the same index and in that specific order (though you may be able to reverse C and B, since theB=C
expression is equivalent toC=B
).The
include
means it doesn’t have to worry about sorting/order for column A, and instead just uses the indexedA
values so the database can fully serve the query from the index alone, without needing to go back to the original table.Yes it does. The index will be larger because of the extra column so it does consume more IO and more RAM. If you can measure the difference, that’s a different thing. You can measure this yourself.
Do you have queries that search for the D column?
An index main purpose is to help you find tables entries quickly. Your index starts with column A, so if we wanted to find rows for particular A values, the index might be helpful. We are not looking for particular A values, though. But let’s say that the column A is not selective, i.e., containing only few distinct values. In the extreme there would only be one A value, and we’d could consider the index actually starting with column B. Then we’d find rows with particular B values quickly. Alas, we are not looking for particular B values in the query.
Conclusion: In order to find all rows matching B = C we’d have to read the whole index, anyway, because if we imagine B and C to contain the letters a to z, then we could have matches at any position in the index starting with a = a to the very end with z = z.
So, when would using the index make sense at all? The index does contain all columns we are interested in, so we could read the index instead of the table. But muddling one’s way through a complete index tree is usually much more work than just reading a table sequentially. Reading the whole index instead of the whole table only makes sense hence, when a table row we read from disk takes much more space than the three or four columns in the index. Let’s say A, B, C and D are integers, but there is another column E in the table that is a string containing about 5 kilobytes in each row. Then reading the few bytes from the index instead of the much bigger table rows would probably be preferable. And the additional D (just an integer) would not make the index much bigger, so including it could make things only slightly slower. If D were the string column, though, then this would make the index much bigger and you’d lose the advantage of reading the index versus reading the table.
So, to make a long story short: Whether the index will be used at all, and if so whether it would make a difference including column D in the index or not, depends on the table.