Sql Server Collation Cheat Sheet

This post is not written for an architect or DBA who has to choose a collation and wants an in-depth explanation, but for a developer (perhaps one with an error to fix) who needs a primer on the subject.

Put very simply, collations have to do with languages and with the rules used to compare characters. This is relevant in joins and sorting. You might think that a collation ought to be specified for the query; that in your sql statement you should determine the rules to use for comparing values. You can do that (although you’ll set it at column, not query level), but the columns in the tables have a collation property which is what the query uses by default. This is a convenience as it standardises comparisons and saves you having to specify the rules over and over again. It’s when the two conflict that you have an error which may make it necessary to override one or more default collations.

You can view collations on your server as follows:
SELECT * FROM sys.fn_helpcollations()
/*
this code is from Microsoft's tech reference; the preferred base collation has changed in recent versions which may be
the reason for the exclusion below
*/
WHERE name NOT LIKE 'SQL%'

Collation Names
A name like “Latin1_General_100_CS_AS_WS” is quite forbidding. There’s information in the “description” column of the query above, but once you understand the codes embedded in a collation name you’ll probably have enough information for most purposes:

CS/CI Case-Sensitive or Case-Insensitive
AS/AI Accent-Sensitive or Accent-Insensitive
WS Width-Sensitive (if missing it appears that the collation defaults to Width-insensitive)
KS (Kana-Sensitive) Relates to comparison of Japanese characters

I’ve written the query below to show the effects of collation changes:
--treats capital letters and small letters the same--
SELECT 'Latin1_General_CI_AS' As Collation
,'UC'
,'uc'
, CASE WHEN ('UC' COLLATE Latin1_General_CI_AS ) = ('uc' COLLATE Latin1_General_CI_AS)
THEN 'Same' ELSE 'Different' END As Result
UNION ALL
--treats capital letters and small letters as DIFFERENT--
SELECT 'Latin1_General_CS_AS' As Collation
,'UC'
,'uc'
,CASE WHEN ('UC' COLLATE Latin1_General_CS_AS ) = ('uc' COLLATE Latin1_General_CS_AS)
THEN 'Same' ELSE 'Different' END
UNION ALL
--treats accented and unaccented letters as identical--
SELECT 'Latin1_General_CI_AI' As Collation
,'e'
,'é'
,CASE WHEN ('e' COLLATE Latin1_General_CI_AI ) = ('é' COLLATE Latin1_General_CI_AI)
THEN 'Same' ELSE 'Different' END As Result
UNION ALL
--treats accented and unaccented letters as DIFFERENT--
SELECT 'Latin1_General_CI_AS' As Collation
,'e'
,'é'
,CASE WHEN ('e' COLLATE Latin1_General_CI_AS ) = ('é' COLLATE Latin1_General_CI_AS)
THEN 'Same' ELSE 'Different' END
UNION ALL
--defaults to treating "big" and "small" numbers as identical--
SELECT 'Latin1_General_CI_AS'
,'102'
,'10²'
,CASE WHEN ('102' COLLATE Latin1_General_CI_AS) = ('10²' COLLATE Latin1_General_CI_AS)
THEN 'Same' ELSE 'Different' END
UNION ALL
--treats "big" and "small" numbers as DIFFERENT--
SELECT 'Latin1_General_100_CS_AS_WS'
,'102'
,'10²'
,CASE WHEN ('102' COLLATE Latin1_General_100_CS_AS_WS) = ('10²' COLLATE Latin1_General_100_CS_AS_WS)
THEN 'Same' ELSE 'Different' END

Sorry for not demonstrating Kana-Sensitivity, but if you need to know about that you’ll want to look at a more detailed reference.

The typical collation error in a join is this: "Cannot resolve the collation conflict between {one collation} and {a different collation} in the equal to operation. Why did it happen? In most cases you’re comparing tables in two databases with differing collations. If there’s a temporary table in your query it’s the same problem: your temporary table lives in tempdb. You can check the collations on the server like this:
--server and database collations
SELECT *
FROM
(
SELECT 'SERVER DEFAULT' as name, SERVERPROPERTY(N'Collation') As collation_name
UNION ALL
SELECT name, collation_name
FROM sys.databases d
) D1
ORDER BY
CASE WHEN name = 'Server Default' THEN '-998'
WHEN name = 'tempdb' THEN '999'
ELSE name
END

The results you get may suggest that the database collations need looking at, but that will take time of course. What can you do to get the query working in the meantime? There are ways of getting round the problem and they aren’t kludges that will cause you problems later on. If the collation clash is caused by your temporary table you can redefine it as follows:
CREATE TABLE #T1(C1 varchar(10) COLLATE DATABASE_DEFAULT NOT NULL)

You’re not specifying an explicit collation, which will avoid problems if there’s a change. This fixed the problem in my testing, which was interesting since it must mean that DATABASE_DEFAULT refers to the local database, not tempdb.

Can you fix the join without touching any of the participating tables? I recreated the temp table issue and fixed it as follows:
SELECT *
FROM #T1
JOIN T1
ON T1.C1 = (#T1.C1 COLLATE DATABASE_DEFAULT)

I hope the above is helpful. Good luck!

One thought on “Sql Server Collation Cheat Sheet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s