Dealing with case and accents in SQL Server
Takeaway: Collation describes the code page, case sensitivity, accent sensitivity, and language or alphabet in use. This SQL Server tip illustrates how you can force the use of any specific collation.
Collation describes the code page, case sensitivity, accent sensitivity, and language or alphabet in use. I often see collation handled on a per-database scale. For instance, in my experience, most DBAs in North America go with the default collation, which is case insensitive. The collation has no effect on the data itself, but rather how it is compared and indexed.
This tip illustrates how you can force the use of any specific collation without regard to the collation specified when the database of interest is created. It also shows some of the complications involved when your database attempts to deal with multiple character sets.
To try out the experiments in this column, create a test database and populate it with a few rows, as in Listing A. (Note: Regular readers of my tips will notice that I create a schema in which to house the tables related to any given tip. This is because I always use a single database called SQLTips for this purpose. Using schemas nicely groups the tables relevant to any particular tip.)
Listing B contains a few test queries for you to run. In each case, the query returns all four rows. This is to be expected when a case-insensitive collation is chosen. But suppose your task is to locate one of these rows using an exact match?
This is where collation comes in. Although a DBA usually specifies the collation when the database is created, you can apply a different collation after the fact simply by adding a COLLATE predicate to the WHERE clause. See Listing C. Each of the first three queries returns exactly one row, while the fourth query returns no rows.
Suppose that you want to change all possible spellings of 'fuller' (including the nonsense ones such as 'fUlLEr') to 'Fuller'. You can do that quite easily with the code in Listing D.
I like to run such a transaction when dealing with data inherited from a mainframe database because I tend to think of upper-case words as yelling, and doing it once to the data enables me to avoid worrying about it thereafter.
If you want a list of all the collations available, along with terse explanations, run this SQL:
SELECT * FROM ::fn_helpcollations()
To determine the default collation of the database of interest, do this:
SELECT collation_name
FROM master.sys.databases
WHERE Name='SQLTips'
In Listing E, I add some rows containing special alphabetic characters. I use the name of my good friend and colleague, DejanSunderic, whose surname is properly written as Šunderic' (the trailing 'c' should actually have an acute accent above it -- the display is a result of the selected font).
As you see, interesting things occur without the COLLATE predicate:
SELECT *
FROM CaseCheck.Names
WHERE Name LIKE N'S%'
This produces the following result set:
NamePK Name
6 SUNDERIC
7 sUnDeRIc
8 Sunderic
Notice that the special case of 'Šunderic' is not included. Let's experiment further:
SELECT * FROM CaseCheck.Names
ORDER BY Name
This produces the following result set:
NamePK Name
5 Šunderic
1 Fuller
2 FuLLer
3 FULLER
4 fuller
6 SUNDERIC
7 sUnDeRIc
This is clearly incorrect. PK 5 should reside somewhere below all occurrences of 'Fuller.' I am no language expert, so I cannot claim to know where among the Ss this ought to reside, but I suspect that this decision depends on local rules. It's clear that 'Šunderic' ought to reside somewhere after all occurrences of 'Fuller' and not before -- but where in the list of names beginning with 'S'?
I did another experiment using the name of my friend Alexander Karmanov. To see his name in the Cyrillic alphabet, check out Figure A. I ran the statement in Figure B using the Cyrillic alphabet spelling of his name.
Using the default character set in SSMS, a query on the table returns his name as a series of question marks. However, if I run the following statement, SSMS returns the correct row but still displays his name as a series of question marks due to the default character set. The important thing is that the Unicode characters were entered and accepted. See Figure C.
If you want to determine the native collation in a given database, use this code:
SELECT collation_name
FROM master.sys.databases
WHERE Name = 'SQLTips'
Miss a tip?
Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller's column.
Print/View all Posts Comments on this article
|
|
|
|
|
|
|
|
White Papers, Webcasts, and Downloads
- The Impact of Virtualization Software on Operating Environments VMware Today's use of virtualization technology allows IT professionals to ... Download Now
- Five Steps to Determine When to Virtualize YourServers VMware Thinking of virtualizing the servers at your company? Use this step-by-step guide to determine when's the best time to make your big move. Download Now
- The True Costs of Virtual Server Solutions VMware Discover ways to streamline and simplify your assessment of the total acquisition costs of a server virtualization environment. Download Now
- Why Isn't Server Virtualization Saving Us More? A Few Small Changes May Dramatically Increase Your Efficiency VMware Ever wonder why your company isn't saving more from its server virtualization? Making a few small changes could dramatically increase your efficiency. Download Now
- Building the Virtualized Enterprise with VMware Infrastructure VMware This paper explains how adopting a virtual infrastructure -- comprised of server, storage, and networking virtualization technologies -- can help your organization build a sustainable competitive ... Download Now
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET

