Automate the process of trimming table data in SQL Server 2005
Takeaway: Arthur Fuller recently instructed a colleague on how to automate the process of trimming table data in SQL Server 2005. Find out how he is able to turn an onerous task into a breeze.
A friend and colleague recently faced an interesting problem. Each month he receives a huge amount of data from a client, which he then must massage before it's usable. The tables contain millions of rows, and the structure of the tables may vary from month to month. To make matters worse, much of the data contains leading or trailing blanks or both.
After my friend did the job by hand the first time around, he quickly discovered the tedium of a manual approach. He wondered whether there was a way to automate the process of trimming all the data—the problem is that he cannot know beforehand how many columns there are or the names of the columns. Fortunately, it is possible to automate this process.
In SQL Server 2005, there are a number of very useful system views that reside in the INFORMATION_SCHEMA schema. The view of interest in this situation is INFORMATION_SCHEMA.COLUMNS.
All of these system views are hidden from your scrutiny, so you can't simply right-click and Script To a query window. Microsoft did this on purpose so it is free to change the innards. If you know something about the system tables, you can guess how the view is built, but for our purposes here, the innards are not important.
You can, however, obtain a wealth of information about the columns in your tables. Here are some of the interesting ones:
- Table_Catalog: name of the database
- Table_Schema: Schema
- Table_Name: name of the table
- Column_Name: name of the column
- Ordinal_Position: column number
- Column_Default: default value if any
- Is_Nullable: contains Yes or No
- Data_Type: indicates the data type of the column
For a list of the columns on any given table, your query would look something like this:
USE AdventureWorks
SELECT*FROMINFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='ProductDescription'
Try this command on one of your databases, substituting its name and a table of interest. Run it to get an appreciation of its output.
Now I'll use SQL to generate the UPDATE statements I need to trim all the data in the table. I achieve this by assembling a literal for each column in the table of interest. I'll skip all the data types that are irrelevant to our needs and concentrate on the char and varchar types. See Listing A. The output of this command is a series of UPDATE statements similar to what appears in Listing B. All you have to do is collect the output and run the block of statements as a query.
In the examples above, I was interested in only one table, but if I want to process all the tables within a schema, I would simply modify the WHERE clause to interrogate the Schema_Name column instead. See Listing C. Check out Listing D to see the output of this command.
This reveals a small problem. As denoted by my naming scheme, a view got included, and it makes no sense to try to update a view when I am already updating its underlying tables. So I need a way to skip the views. Looking only at the INFORMATION_SCHEMA.COLUMNS, there is no way to avoid this because views are not distinguished from tables.
Another view, INFORMATION_SCHEMA.Tables, comes to the rescue. I just need to add an EXISTS predicate to the above query. See Listing E. This version strips out the views from the list to process.
My colleague is now a happy camper. The solution is simple to use and very flexible. He can use it on a single table, or all the tables in a schema, or even on all the tables in a database, just by modifying the WHERE clause.
As you can see, SQL Server 2005 makes it far easier to work with table and column information than was the case in SQL Server 2000. I have also demonstrated how to generate SQL code using SQL. Now you know how to turn an onerous task into a breeze.
Miss a tip?
Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller's column.
Print/View all Posts Comments on this article
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
White Papers, Webcasts, and Downloads
- VMware Infrastructure: A Guide to Bottom-Line Benefits VMware Frustrated by the high cost of maintaining or building ever-larger data centers? Get the facts you need to formulate your Virtualization Action Plan. Download Now
- Tom Davenport Study: Linking decisions and information for organizational performance IBM Tom Davenport's new client study looks at approaches to linking ... Download Now
- The Scalable Enterprise: VMware ESX Server on the Dell PowerEdge 6650 Dell This paper introduces the server virtualization software, VMware ESX ... Download Now
- Why Isn't Server Virtualization Saving Us More? A Few Small Changes May Dramatically Increase Your Efficiency VMware Ever wonder why your company isn't saving more from its server virtualization? Making a few small changes could dramatically increase your efficiency. Download Now
- Building the Virtualized Enterprise with VMware Iinfrastructure VMware VMware virtualization software has been adopted by over 120,000 enterprise ... Download Now
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET

