Last week we discussed about whether to choose variable or fixed length data type and when to use when to use a Unicode data type like nchar or nvarchar in Common Mistakes in SQL Server – Part 1. Let’s discuss about the 2nd common mistake I have observed – using Identity column as primary key. In this post I will discuss about the pros and cons of identity column. Identity is a property which can be defined on an integer, decimal, numeric, small integer, big integer or tiny integer data type, an identity column is a column which – generally is used as primary key. The Identity property creates an incremental value for the specified column automatically, which is why it is widely used by developers when they designed the table and a primary key – an identifier column. This is the Common Mistakes in SQL Server Part 2
How does it work?
The Identity column property has two components: a seed value and an incremental value. A seed is the initial value set for that particular column , where as an incremental is added to last inserted value when a new row is created. When we set the Identity properties, we have to define the value for first record aka seed value (for example, 0) and
the incremental value (for example 1). Once you have defined this property and you insert a record into the table for the first time, it will add 0 for the Identity column. upon the 2ndtime you insert a record it, will have a value 1 (seed of 0 plus the incremental of, 1), and at 3rd record insertion it would have a value of 2 (1+1). Download the demo script here Part2_1 which illustrates the table definition with Identity column)
Advantage of using an Identity column as a Primary Key:There are couple of advantages using an Identity column as a Primary Key, including:
- SQL Server itself manages the Identity values
- Developers don’t need to write a code for new primary key value because Identity column value is created automatically
- The size of index will be low as compared to index size on char based columns
Are there any catches?
While there are couple of advantages, there are some disadvantages which makes managing Identity columns a bit harder including:
1. You cannot control Identity values, because this column and its values are controlled and managed by SQL Server. Please note, however, we may use ‘set Identity_insert tblName on‘ to insert an Identity value explicitly.
2. For any reason, if insertion fails, a value for Identity will get created. Then the next time you insert a record it will create gap in numbering. Let us check this out with an example. Create a table called TestIdentity with three columns: one Identity column, one varchar(10) and one varchar(5) column. We’ll insert 10 records here, and then 11th record will fail because we’ll try to insert a value which is greater in size then what is defined in table definition. Once it has failed, we’ll again pass the correct values. Please notice the gap created in the Identity value. Download the demo script here Part2_2
3. You cannot add an Identity property to an existing column. There are only two options that you have if you need to add an Identity column to a table – drop and recreate the table or add a new column with Identity value (and then drop the existing column, if applicable.
4. There has been a lot of said about the disadvantage of Identity columns , especially when it comes to a replicated database., you have to alter your identity property and say “Not for Replication”.
Then, if you have to restore your replicated database for any reason you will have to be very careful, as you will have lost all your replication settings unless you specify that you wish to keep_replication. For further information on what all we have to aware of when we back up or restore replicated database please refer to this article. Whenever the Identity has reached its threshold value you will have to reseed the Identity values and adjust Identity range at publisher end. This happens when we restore backup at subscriber end; you will have to find the last inserted Identity value and adjust the Identity range at publisher, using these steps:
a. Execute select ident_current(‘tblName’) for every subscriber you have
b. Note down the highest value
c. Go to your publication server and execute dbcc checkident(‘tblName’,reseed,value*+1) where value = value found in step bd. Finally execute sp_adjuste. Now, go to your publication server and execute sp_adjustpublisherIdentityrange PublicationName, TblName
Note: above case is for Merge Replication. Here is a nice article on MSDN on Replicating Identity Columns, and you may also refer Hilary Cotter’s article Identity Crisis (generally known as Identity Crisis). I personally believe that one should avoid using the Identity property , especially as a primary key when you have replicated database to over come the issues you face with Identity columns. Rob Volk has written an interesting observation on Identity and Primary Key. Yes, doing this will require some more efforts on your part to create incremental values for your Primary key; but it really is just a matter of writing a few more line of code. Additionally, there are alternate ways of doing this – for more information you may refer an article here on how to generate auto incremental values in SQL Server.