Understand the effects of NULLs on inner and outer JOINs
Takeaway: In this SQL Server tip, Arthur Fuller explains the effects of NULLs, and the different results obtained by inner and outer JOINs.
SQL Server developers must understand two things: the difference between inner and outer JOINs and the effect of NULLs upon these choices. This tip addresses both subjects.
Let's take it one step at a time -- first, the differences between inner and outer JOINs, and then the implications of NULLs on our queries. In order to examine the difference between inner and outer JOINs, we'll need some sample data.
Suppose you have two tables, T1 and T2, each of which contain a column called TestJoin. The column contains the values 1, 2, and 3 in table T1, while the column contains NULL, 2, and 3 in table T2. Inner JOINing these tables on the column TestJoin returns two rows, since you cannot join the value 1 to the NULL. An outer JOIN is the opposite. For instance, the following yields only those rows where Customers have placed Orders.
SELECT CustomerID, OrderID FROM Customers Inner Join Orders
On Customers.CustomerID = Orders.CustomerID
(In many businesses, this may be valid; other businesses may define a Customer as one who has placed at least one Order. That is a business rule that ought to be reflected in the database definition, not the middle tier, in my opinion, though some would differ.)
An outer join would read:
SELECT CustomerID, OrderID FROM Customers LEFT OUTER JOIN Orders
OnOrders.CustomerID = Customers.CustomerID
which would result in a list of all customers, irrespective of whether they placed orders.
Run the simple script in Listing A to create some test tables.
Here is the SQL to compare the inner and outer JOINs:
SELECT InnerOuter.T1.T1ID, InnerOuter.T1.NameAS Name1,
InnerOuter.T2.T2ID, InnerOuter.T2.NameAS Name2
FROM InnerOuter.T1 LEFT OUTER JOIN
InnerOuter.T2 ON InnerOuter.T1.T1ID = InnerOuter.T2.T1ID
When you run both queries, you will confirm that the inner JOIN returns two rows, while the outer JOIN returns three. This principle holds true even if you add a third table, as you can see from Listing B.
If you join the third table in the SELECT statements, you find the same results: two rows from the inner JOIN and three from the outer JOIN. See Listing C.
Be aware of the effect of NULLs on aggregate functions. To observe this effect, add a NULL money column to T2 called Amount. Add a few more rows to T2 something like this:
T2ID T1ID Name Amount
1 1 T2Text1 NULL
2 NULL T2Text2 NULL
3 3 T2Text3 120.0000
4 1 T2Text4 123.0000
5 1 T2Text5 234.0000
6 3 T2Text6 345.0000
NULL NULL NULL NULL
The T1ID value 1 has three rows, one of whose Amount is NULL. The T2ID value 3 has two rows, neither of which are NULL. (You can add as many rows as you wish, but ensure that there are different numbers of rows with NULL values in the Amount column.)
Now run this query:
SELECT InnerOuter.T1.T1ID,
Sum(InnerOuter.T2.Amount)AS TotalAmount,
Count(InnerOuter.T2.T1ID)AS NumberOfRows,
Count(InnerOuter.T2.Amount)AS NumberOfAmounts,
Avg(InnerOuter.T2.Amount)AS AverageAmount
FROM InnerOuter.T1 LEFT OUTER JOIN
InnerOuter.T2 ON InnerOuter.T1.T1ID = InnerOuter.T2.T1ID
GROUP BY
InnerOuter.T1.T1ID
The above data produces this result set:
1 357.00 3 2 178.50
2 NULL 0 0 NULL
3 465.00 2 2 232.50
The first returned row indicates that if you count the T1ID column, you get the correct number of rows, but if you count the Amounts, you get the number of non-NULL values. Note that the averages reflect the number of non-NULL values, not the number of rows.
You might think these examples are artificial, since given the relationships between the tables, it would seem unlikely that the foreign key columns allow NULLs. However, I see this sort of thing all the time in databases. HR hires a new employee but has yet to assign her to a department. Or, a new customer is added to a database before a sales rep is assigned to that customer.
There are various ways to work around the absence of such information. In my opinion, the worst approach is to add what is known colloquially as the 0th row, in which a "fake" row is added to every lookup table, and this zero value is used whenever a foreign key is unknown—thus, the foreign keys will never be NULL. This approach has two significant problems: (1) It conceptually falsifies reality by introducing a new type of data into the table, when relational theory insists that a table holds exactly one type of data; (2) It complicates queries because you will always have to eliminate such rows from standard queries.
Miss a tip?
Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller's column.
Print/View all Posts Comments on this article
SponsoredWhite Papers, Webcasts, and Downloads
- The Superstar CFO: Optimizing an increasingly complex role SAP
- Live Webcast: Mid-Year Report: Malware, Spam and Web Threats in 2008 Sophos
- Responding quickly to changing markets: Anticipate, Adapt, Excel SAP
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET


Harnessing the power of waves
Planting solar gardens
Fill your car for $1.10 a gallon?
