Dilip's blog on Dynamics 365 for Finance and Operations and Classic AX: SQL Server Collations Explained

Friday, August 5, 2011

SQL Server Collations Explained

Couple of weeks back, I was working on SQL Collations (Troubleshooting Chinese Characters in AX) and had collected some notes on the way. Thought, will post it over here.

Overview

The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared

Read more here -

http://msdn.microsoft.com/en-us/library/aa214408(v=sql.80).aspx

A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages).

The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.

The code page used to store non-Unicode character data.http://msdn.microsoft.com/en-us/library/aa174903(v=sql.80).aspx

SQL Server collations can be specified at many levels.

Define Collation at Instance Level
Collation at Database Level (Defaulted to Instance)
At Column Levels (Defaulted to database)

List of SQL Server Collations

http://msdn.microsoft.com/en-us/library/ms144250.aspx

Storage and Performance Effects of Unicode
Note - It's very important to understand how Unicode/non-Unicode storage affects the performance and size of the database, In the below para, I have highlighted especially languages e.g. Thai uses 2 times the space than Chinese chars.

http://msdn.microsoft.com/en-us/library/ms189617.aspx

SQL Server stores Unicode data by using the UCS-2 encoding scheme. Under this mechanism, all Unicode characters are stored by using 2 bytes

The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored by using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters in single bytes. Therefore, storing these languages as Unicode uses two times the space that is used specifying a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode

The following table shows the non-Unicode code pages that specify character data storage in double-byte character sets.

http://blogs.msdn.com/b/qingsongyao/archive/2009/04/10/sql-server-and-utf-8-encoding-1-true-or-false.aspx?PageIndex=4#comments

Choosing the correct collation while installing SQL Server 2008
Note - It's very important to choose a proper collation for your instance during the installation time, as later on going and changing it can prove costly

http://blogs.msdn.com/b/nickmac/archive/2008/07/21/choosing-the-correct-collation-on-sql-server-2008.aspx

Change Collation at server level or database level

http://www.db-staff.com/index.php/microsoft-sql-server/69-change-collation

http://sql-articles.com/articles/dba/how-to-change-server-collation-in-sql-server-2008/

Windows Collation sorting style
http://msdn.microsoft.com/en-us/library/ms143515.aspx

Check constraint Collation Dependency

Note - There are some action items for Microsoft to make it easy with respect to changing collations, Check Constraint Dependency with collations in their future releases of SQL

https://connect.microsoft.com/SQLServer/feedback/details/384157/check-constraint-collation-dependency

Make it easy to change Collation on Database

https://connect.microsoft.com/SQLServer/feedback/details/358320/make-it-easy-to-change-collation-on-a-database

Dilip's blog on Dynamics 365 for Finance and Operations and Classic AX

Total Pageviews

Search This Blog

Friday, August 5, 2011

SQL Server Collations Explained

No comments: