Friday, August 5, 2011

SQL Server Collations Explained

Couple of weeks back, I was working on SQL Collations (Troubleshooting Chinese Characters in AX) and had collected some notes on the way. Thought, will post it over here.

Overview
The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared
Read more here -
A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages).
The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.

The code page used to store non-Unicode character data.http://msdn.microsoft.com/en-us/library/aa174903(v=sql.80).aspx
SQL Server collations can be specified at many levels.
  • Define Collation at Instance Level
  • Collation at Database Level (Defaulted to Instance)
  • At Column Levels (Defaulted to database)

List of SQL Server Collations
Storage and Performance Effects of Unicode
Note - It's very important to understand how Unicode/non-Unicode  storage affects the performance and size of the database, In the below para, I have highlighted especially languages e.g. Thai uses 2 times the space than Chinese chars.
SQL Server stores Unicode data by using the UCS-2 encoding scheme. Under this mechanism, all Unicode characters are stored by using 2 bytes
The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored by using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters in single bytes. Therefore, storing these languages as Unicode uses two times the space that is used specifying a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode
The following table shows the non-Unicode code pages that specify character data storage in double-byte character sets.
   
Choosing the correct collation while installing SQL Server 2008
Note - It's very important to choose a proper collation for your instance during the installation time, as later on going and changing it can prove costly
 
Change Collation at server level or database level


Check constraint Collation Dependency
Note - There are some action items for Microsoft to make it easy with respect to changing collations, Check Constraint Dependency with collations in their future releases of SQL
Make it easy to change Collation on Database

No comments:

Post a Comment