Jump to content


Check out our Community Blogs

Register and join over 40,000 other developers!


Recent Status Updates

View All Updates

Photo

Using Set Theory to Understand SQL Joins


  • Please log in to reply
21 replies to this topic

#1 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 22 February 2009 - 03:47 PM

One of the areas of SQL that causes some people issues is understanding how joins work in queries. To help make this clear, I'm going to start with the underlying concept of all of SQL: set theory.

A set is simply a collection of 0 or more objects. A set with 0 objects is called the empty set, and is represented as {}. The set of integers would be {..., -3, -2, -1, 0, 1, 2, 3, ...}. The set of furniture in my office is {desk, chair, bookshelf, file cabinet}. The objects listed in the set are called elements of the set (so bookshelf is an element of the set of furniture in my office).

In SQL, a table corresponds with a set, and a record corresponds with an element. So if table1 has three records, we can think of table1 as a set and each record as one of the three elements in it.

Sets have a lot of different properties and relations. Two important ones that relate to SQL are subset and cross product. A subset is a (possibly) smaller set whose elements can only come from the elements of the original. If A = {1, 2, 3, 4}, then the following are all subsets of A: {}, {1}, {2, 3}, {1, 2, 3, 4}. The empty set is a subset of ALL sets. Every set is a subset of itself.

The cross product of two sets is a new set that contains ordered pairs made of the elements of the originals. The ordered pairs' elements have to be in the same order as the order of the sets. For example: if A = {1, 2, 3} and B = {4, 5, 6}, then AxB = {(1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6)}. Note that while a set doesn't care about order ( {1,2,3} = {1,3,2} ), an ordered pair DOES, so (1,4) <> (4,1).

So what does this have to do with SQL? EVERYTHING! Let's look at two sample tables:
table1
t1_id, t1_subid, t1_data
------------------------
1      56        first
2      23        second
3      72        third

table2
t2_id, t2_subid, t2_data
------------------------
1      23        fourth
2      56        second
3      72        fifth
Sometimes we want data from both tables. In that case we have two options, 1: do a bunch of queries to get the necessary data, 2: use a join. In the raw, a join simply acts like a cross product on the records.

SELECT * FROM TABLE1 JOIN TABLE2 would return the following records:
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
1      56        first    1      23        fourth
1      56        first    2      56        second
1      56        first    3      72        fifth
2      23        second   1      23        fourth
2      23        second   2      56        second
2      23        second   3      72        fifth
3      72        third    1      23        fourth
3      72        third    2      56        second
3      72        third    3      72        fifth
Notice this is the same as doing a cross product. The problem is that this may be returning more than you wanted. There are two ways to restrict the results: with an on clause and a where clause. Let's say we wanted to only return the results where table1.t1_id = table2.t2_id. There are two ways we could do this:
SELECT * FROM TABLE1 JOIN TABLE2 ON TABLE1.T1_ID = TABLE2.T2_ID
or
SELECT * FROM TABLE1 JOIN TABLE2 WHERE TABLE1.T1_ID = TABLE2.T2_ID
In either case, the result would be:
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
1      56        first    1      23        fourth
2      23        second   2      56        second
3      72        third    3      72        fifth

If we wanted t1_data to match t2_data, however, things get interesting. A where clause will simply look at ALL the results and select those that match. So
SELECT * FROM TABLE1 JOIN TABLE2 WHERE TABLE1.T1_DATA = TABLE2.T2_DATA
will return:
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
2      23        second   2      56        second

A join, however, has four options. A plain join is sometimes called an "inner join". There is also a "left outer join", a "right outer join", and a "full outer join". The issue is that you may need some of the table from one of the tables, even if there isn't a match! For example, if table1 represents customers and table2 represents products on order, you may want a list of all your customers, regardless of whether there's a product on order or not. Here is what each join means, along with an example result set:
inner join: returns all the matches
left outer join: returns all the matches, left-hand records that don't have a match are matched with NULL
right outer join: returns all the matches, right-hand records that don't have a match are matched with NULL
full outer join: returns all the matches, any record from either side that doesn't have a match is matched with NULL
SELECT * FROM TABLE1 INNER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
2      23        second   2      56        second
SELECT * FROM TABLE1 LEFT OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
1      56        first    NULL   NULL      NULL
2      23        second   2      56        second
3      72        third    NULL   NULL      NULL
SELECT * FROM TABLE1 RIGHT OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
NULL   NULL      NULL     1      23        fourth
2      23        second   2      56        second
NULL   NULL      NULL     3      72        fifth
SELECT * FROM TABLE1 FULL OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
1      56        first    NULL   NULL      NULL
NULL   NULL      NULL     1      23        fourth
2      23        second   2      56        second
3      72        third    NULL   NULL      NULL
NULL   NULL      NULL     3      72        fifth

Finally, you can combine the results of a join clause and a where clause to do some interesting things. Let's say that every record in TABLE1 and TABLE2 is supposed to have at least one match, but our database got out of wack when a script crashed. To find the records that do NOT have a match, we can do this:
SELECT * FROM TABLE1 FULL OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA WHERE TABLE1.T1.ID IS NULL OR TABLE2.T2_ID IS NULL
This will create the matches above, and then look for where a match didn't really happen:
table1 join table2
t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data
--------------------------------------------------
1      56        first    NULL   NULL      NULL
NULL   NULL      NULL     1      23        fourth
3      72        third    NULL   NULL      NULL
NULL   NULL      NULL     3      72        fifth

  • 3

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#2 Guest_Jordan_*

Guest_Jordan_*
  • Guest

Posted 22 February 2009 - 05:19 PM

Wow, impressive and very comprehensive. I know a lot of people have trouble with joins when learning SQL. This should help them a lot!
  • 0

#3 Brandon W

Brandon W

    CC Mentor

  • Expert Member
  • PipPipPipPipPipPipPipPip
  • 2092 posts
  • Location:Ipswich, Australia
  • Programming Language:C, Java
  • Learning:Java, C++, JavaScript

Posted 23 February 2009 - 03:05 AM

Seems to be quite long, I will read it when I can. But from Jordan's comment and quick look over the tutorial. It looks really good.
  • 0
I've returned...

#4 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 23 February 2009 - 08:42 AM

I could probably add some Venn Diagrams to further explain the join process... but I forgot to when I was typing it up.
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#5 Brandon W

Brandon W

    CC Mentor

  • Expert Member
  • PipPipPipPipPipPipPipPip
  • 2092 posts
  • Location:Ipswich, Australia
  • Programming Language:C, Java
  • Learning:Java, C++, JavaScript

Posted 23 February 2009 - 01:38 PM

Why not create a part 2?
  • 0
I've returned...

#6 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 23 February 2009 - 02:12 PM

To be honest, I think what I've got here is probably clearer. Unfortunately, ON clauses and WHERE clauses, in particular, can be confusing when dealing with joins.
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#7 Brandon W

Brandon W

    CC Mentor

  • Expert Member
  • PipPipPipPipPipPipPipPip
  • 2092 posts
  • Location:Ipswich, Australia
  • Programming Language:C, Java
  • Learning:Java, C++, JavaScript

Posted 23 February 2009 - 09:56 PM

I just finished school so I am not doing nothing, so I read your tutorial.

Great tutorial mate, no matter where I went and tried to learn SQL Joins it just made no sense. This tutorial on the other hand, was the most clearest joins tutorial ever! All I can say is well done. Even though it doesn't matter, I will +rep you ;)
  • 0
I've returned...

#8 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 24 February 2009 - 08:45 AM

I wrote it because I've seen a lot of intelligent people get scrambled by joins.
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#9 Xav

Xav

    CC Mentor

  • VIP Member
  • PipPipPipPipPipPipPipPip
  • 8356 posts

Posted 24 February 2009 - 11:39 AM

You have probably seen a lot of stupid people get scrambled by joins too. :)
  • 0
If you enjoy reading this discussion and are thinking about commenting, why not click here to register and start participating in under a minute?

#10 Brandon W

Brandon W

    CC Mentor

  • Expert Member
  • PipPipPipPipPipPipPipPip
  • 2092 posts
  • Location:Ipswich, Australia
  • Programming Language:C, Java
  • Learning:Java, C++, JavaScript

Posted 24 February 2009 - 01:32 PM

Of course, joins are a breeze now.
  • 0
I've returned...

#11 John

John

    CC Mentor

  • Moderator
  • 4450 posts
  • Location:New York, NY

Posted 26 February 2009 - 09:11 AM

I must say, this has made JOINs a lot clearer. For some reason, I never though of SQL in terms of set theory.
  • 0

#12 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 26 February 2009 - 09:20 AM

The funny thing is, set theory was the framework for the design of SQL, but nobody seems to use it to help TEACH SQL.
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/