Have you ever been in the situation where you were creating an index that required more than one column in its key and thought, “Hmm I wonder if the order here matters?”.
To put it simply: yes, it does. The order of the columns matters when it comes to improving performance of queries in your SQL Server.
In this post we’ll be going through a few short examples and observing how different column orders behave with the same query.
I used the sample Stack Overflow database (50 GB) for the demos.
Let’s look at the query below:
SELECT LastEditorDisplayName, Score, ViewCount FROM Posts WHERE LastEditorDisplayName = 'Roger Pate' AND ViewCount > 500
This is a simple query. Selecting a few columns from the Posts table and with two predicates in the WHERE clause. The query filters the Posts table for ‘Roger Pate’ in the LastEditorDisplayName column and for records with a ViewCount greater than 500.
Now let’s go ahead and create two indexes for this query to use.
-- Index 1 CREATE NONCLUSTERED INDEX IX_LastEditorDisplayName_ViewCount ON Posts (LastEditorDisplayName, ViewCount) INCLUDE (Score)
-- Index 2 CREATE NONCLUSTERED INDEX IX_ViewCount_LastEditorDisplayName ON Posts (ViewCount,LastEditorDisplayName) INCLUDE (Score)
These indexes are identical, except the order of the columns has been switched.
Now let’s run the query twice, forcing it to use the indexes separately. We can use index hints to force the query to use our preferred index.
-- Index 1 SELECT LastEditorDisplayName, Score, ViewCount FROM Posts WITH (INDEX = IX_LastEditorDisplayName_ViewCount) WHERE LastEditorDisplayName = 'Roger Pate' AND ViewCount > 500 -- Index 2 SELECT LastEditorDisplayName, Score, ViewCount FROM Posts WITH (INDEX = IX_ViewCount_LastEditorDisplayName) WHERE LastEditorDisplayName = 'Roger Pate' AND ViewCount > 500
Here are the execution plans for the above two queries:
Now we can see that SQL Server believes that the second query has worse performance compared to the first query, despite the only difference being the order of the columns in the associated index.
Next, let’s have a look at the logical reads of both queries.
As we can see, the second query uses far more memory compared to the first query.
Why does this happen? Why is the second query far less optimised when the only difference is the order of columns in the query?
Let’s take a deeper look into the Index Seek operations in the execution plans.
The two main things that we need to focus on are the ‘Number of Rows Read’ and the ‘Actual Number of Rows for All Executions’. ‘Number of Rows Read’ refers to the number of rows SQL Server sifted through when looking for the records which match the predicates in the query. ‘Actual Number of Rows for All Executions’ refers to the number of rows actually returned to the end user.
So, when the ‘Number of Rows Read’ equals the ‘Actual Number of Rows for All Executions’, that is ideal for us because that means SQL Server only looked at the rows that were needed.
Now in query 1, we can see that 1009 rows were read. Whereas in query 2, we can see 3,645,154 million rows were read by SQL Server. Both queries only returned 1009 rows to the end user.
This is where the order of the columns in the index becomes critical.
To understand this better, lets look at the selectivity of the predicates in the WHERE clause in the query.
So below, I have split the predicates into two separate queries and counted the total number of rows returned by each query.
SELECT COUNT(*) FROM Posts WHERE LastEditorDisplayName = 'Roger Pate' SELECT COUNT(*) FROM Posts WHERE ViewCount > 500
This is where it gets interesting.
When you filter the Posts table for ‘Roger Pate’ in the ‘LastEditorDisplayName’ column, 1784 rows match that condition.
On the other hand, when you filter the Posts table for rows where the ‘ViewCount’ is greater than 500, over 3 million rows match that condition.
So when you create an index that starts with ‘ViewCount’, SQL Server will have to check each row in that 3,645,154 million rows to see which of those rows have ‘Roger Pate’ for the ‘LastEditorDisplayName’ column.
Now, if we made the first column ‘LastEditorDisplayName’, SQL Server will start by finding the rows that have ‘Roger Pate’, which is 1785 rows. Then, with those 1785 rows, SQL Server will pick the rows that have a ‘ViewCount’ greater than 500. As you can imagine, this way is a lot quicker for SQL Server as it plays with less data.
This is why you need to be careful when considering the order of the columns in your indexes.