Disadvantages of Indexing in SQL Server

Optimal performance in a SQL Server OLTP (Online Transaction Processing) system comes from striking a balance among insert, update, delete, and select operations. Which is more important: adding data to the database, or retrieving it back in a report or application? The answer is yes! The very reason we save data is so that we can retrieve it. We also need to modify the data and delete data that is no longer relevant.

Read operations (select) often oppose write operations (insert, update, and delete).

In a previous blog post, I discussed the value of indexing. The real value is to help us retrieve data quickly. No one likes to wait for data from the database. We want instant answers and indexes are the best way to facilitate that.

Well then, why not create a lot of them? Why not create an index to match the way we search or sort in each query? The answer is the negative impact on the server when we modify data.

I am going to oversimplify this quite a bit to make the point. Let's look at some data.

| CustomerId | Lastname | Firstname | Addr1 | Addr2 | Zip |
|---|---|---|---|---|---|
| 1 | Jones | Bill | 111 East Main Street | | 78727 |
| 2 | Adams | Sue | 121 East Ave | Suite 500 | 04328 |
| 3 | Simmons | JoAnn | 44 SW Parkway | Suite 452 | 32111 |
| 4 | Hamil | Amy | 2408 N Harrison | | 50421 |

Let's say I have the following indexes and that they are all in ascending order:

- Clustered index on CustomerId
- Non-clustered index on Lastname
- Non-clustered index on Zip
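In T-SQL, that table and those three indexes might be created like this (the table and index names are illustrative, not from an actual system):

```sql
CREATE TABLE dbo.Customer
(
    CustomerId INT          NOT NULL,
    Lastname   VARCHAR(50)  NOT NULL,
    Firstname  VARCHAR(50)  NOT NULL,
    Addr1      VARCHAR(100) NOT NULL,
    Addr2      VARCHAR(100) NULL,
    Zip        CHAR(5)      NOT NULL
);

-- Clustered index: the table's rows are stored in CustomerId order
CREATE CLUSTERED INDEX CIX_Customer_CustomerId
    ON dbo.Customer (CustomerId ASC);

-- Non-clustered indexes: separate ordered structures that point back to the rows
CREATE NONCLUSTERED INDEX IX_Customer_Lastname
    ON dbo.Customer (Lastname ASC);

CREATE NONCLUSTERED INDEX IX_Customer_Zip
    ON dbo.Customer (Zip ASC);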

I've added each row's ordering position within each index to the table as extra columns:

| CustomerId | CI pos | LN pos | Lastname | Firstname | Addr1 | Addr2 | Zip pos | Zip |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | Jones | Bill | 111 East Main Street | | 4 | 78727 |
| 2 | 2 | 1 | Adams | Sue | 121 East Ave | Suite 500 | 1 | 04328 |
| 3 | 3 | 4 | Simmons | JoAnn | 44 SW Parkway | Suite 452 | 2 | 32111 |
| 4 | 4 | 2 | Hamil | Amy | 2408 N Harrison | | 3 | 50421 |

Now let's say we hire a new person. In other words, we insert a row. We have one of two options: update the indexes, or let them grow out of date. If we update them, we put everything back in the proper order; internally, SQL Server works out where the new row really belongs in each index. That takes work and adds overhead to the server. If we let an index grow out of date, we might look for a row where its ordering says it should be, when it really isn't there anymore.
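The insert itself is a single T-SQL statement, but SQL Server must maintain all three index structures when it runs (written against the illustrative table from this example):

```sql
-- One row inserted, three index structures maintained:
-- the clustered index plus both non-clustered indexes.
INSERT INTO dbo.Customer (CustomerId, Lastname, Firstname, Addr1, Addr2, Zip)
VALUES (5, 'Bruner', 'Cathy', '348 W 2nd Street', NULL, '42404');
```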

| CustomerId | CI pos | LN pos | Lastname | Firstname | Addr1 | Addr2 | Zip pos | Zip |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | Jones | Bill | 111 East Main Street | | 4 | 78727 |
| 2 | 2 | 1 | Adams | Sue | 121 East Ave | Suite 500 | 1 | 04328 |
| 3 | 3 | 4 | Simmons | JoAnn | 44 SW Parkway | Suite 452 | 2 | 32111 |
| 4 | 4 | 2 | Hamil | Amy | 2408 N Harrison | | 3 | 50421 |
| 5 | | | Bruner | Cathy | 348 W 2nd Street | | | 42404 |

We will update ours. Here’s what it will look like after the update:

| CustomerId | CI pos | LN pos | Lastname | Firstname | Addr1 | Addr2 | Zip pos | Zip |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 4 | Jones | Bill | 111 East Main Street | | 5 | 78727 |
| 2 | 2 | 1 | Adams | Sue | 121 East Ave | Suite 500 | 1 | 04328 |
| 3 | 3 | 5 | Simmons | JoAnn | 44 SW Parkway | Suite 452 | 2 | 32111 |
| 4 | 4 | 3 | Hamil | Amy | 2408 N Harrison | | 4 | 50421 |
| 5 | 5 | 2 | Bruner | Cathy | 348 W 2nd Street | | 3 | 42404 |

We basically strike through the original positions and reconsider where each row belongs. Now let's say that Sue Adams gets married, changes her last name to Zachary, and moves to the 32401 ZIP code. This is an update, and it forces us to redo the ordering in both non-clustered indexes:
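The update is again a single statement against the illustrative table, but because both modified columns are indexed, Sue's entries must be moved within both non-clustered indexes:

```sql
-- Lastname and Zip are both indexed, so this update must reposition
-- Sue's entries in IX_Customer_Lastname and IX_Customer_Zip.
UPDATE dbo.Customer
SET Lastname = 'Zachary',
    Zip      = '32401'
WHERE CustomerId = 2;
```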

| CustomerId | CI pos | LN pos | Lastname | Firstname | Addr1 | Addr2 | Zip pos | Zip |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | Jones | Bill | 111 East Main Street | | 5 | 78727 |
| 2 | 2 | 5 | Zachary | Sue | 121 East Ave | Suite 500 | 2 | 32401 |
| 3 | 3 | 4 | Simmons | JoAnn | 44 SW Parkway | Suite 452 | 1 | 32111 |
| 4 | 4 | 2 | Hamil | Amy | 2408 N Harrison | | 4 | 50421 |
| 5 | 5 | 1 | Bruner | Cathy | 348 W 2nd Street | | 3 | 42404 |

We would have a similar situation if we delete a row. For example, let's say that Amy Hamil asks to be removed from our customer list, or is inactive for so long that we delete her row. Once again, we need to update the indexes.
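Sketched against the same illustrative table, the delete removes Amy's entries from every index on the table:

```sql
-- Deleting the row also removes its entries from the clustered index
-- and from both non-clustered indexes.
DELETE FROM dbo.Customer
WHERE CustomerId = 4;
```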

In our examples, we considered a single row at a time. In reality, rows are constantly being inserted, updated, and deleted: not one row, but hundreds, thousands, or even millions, and all while we are reporting on the data. To optimize read performance, we keep the indexes exactly up to date; but then every data modification has to reposition entries in those same indexes at the same time the reads are using them. The two workloads compete. And if we don't keep the indexes up to date, we might look for a row in a position where it no longer lives.

It's also common for developers or DBAs to create too many indexes. A single query might be much improved by an index on a particular column, which is great for the read performance of that query. However, every modification must now maintain that index. We end up maintaining an index on a column that is part of every insert into the table but is searched by only that one query.
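One way to spot such indexes is SQL Server's index-usage statistics DMV. A sketch of such a query (run it in the target database; note that these counters reset when the instance restarts):

```sql
-- Indexes with many writes but few or no reads since the last restart
-- are candidates for review.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        i.name                   AS index_name,
        s.user_seeks + s.user_scans + s.user_lookups AS reads,
        s.user_updates           AS writes
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
ORDER BY s.user_updates DESC;
```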

In summary, indexes improve performance when we search the way we have indexed. At the same time, they can hurt us because of the overhead they add whenever data is modified.
