A set is simply a collection of 0 or more objects. A set with 0 objects is called the empty set, and is represented as {}. The set of integers would be {..., -3, -2, -1, 0, 1, 2, 3, ...}. The set of furniture in my office is {desk, chair, bookshelf, file cabinet}. The objects listed in the set are called elements of the set (so bookshelf is an element of the set of furniture in my office).
In SQL, a table corresponds with a set, and a record corresponds with an element. So if table1 has three records, we can think of table1 as a set and each record as one of the three elements in it.
Sets have a lot of different properties and relations. Two important ones that relate to SQL are subset and cross product. A subset is a (possibly) smaller set whose elements can only come from the elements of the original. If A = {1, 2, 3, 4}, then the following are all subsets of A: {}, {1}, {2, 3}, {1, 2, 3, 4}. The empty set is a subset of ALL sets. Every set is a subset of itself.
The cross product of two sets is a new set that contains ordered pairs made of the elements of the originals. The ordered pairs' elements have to be in the same order as the order of the sets. For example: if A = {1, 2, 3} and B = {4, 5, 6}, then AxB = {(1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6)}. Note that while a set doesn't care about order ( {1,2,3} = {1,3,2} ), an ordered pair DOES, so (1,4) <> (4,1).
So what does this have to do with SQL? EVERYTHING! Let's look at two sample tables:
table1 t1_id, t1_subid, t1_data ------------------------ 1 56 first 2 23 second 3 72 third table2 t2_id, t2_subid, t2_data ------------------------ 1 23 fourth 2 56 second 3 72 fifthSometimes we want data from both tables. In that case we have two options, 1: do a bunch of queries to get the necessary data, 2: use a join. In the raw, a join simply acts like a cross product on the records.
SELECT * FROM TABLE1 JOIN TABLE2 would return the following records:
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 1 56 first 1 23 fourth 1 56 first 2 56 second 1 56 first 3 72 fifth 2 23 second 1 23 fourth 2 23 second 2 56 second 2 23 second 3 72 fifth 3 72 third 1 23 fourth 3 72 third 2 56 second 3 72 third 3 72 fifthNotice this is the same as doing a cross product. The problem is that this may be returning more than you wanted. There are two ways to restrict the results: with an on clause and a where clause. Let's say we wanted to only return the results where table1.t1_id = table2.t2_id. There are two ways we could do this:
SELECT * FROM TABLE1 JOIN TABLE2 ON TABLE1.T1_ID = TABLE2.T2_ID
or
SELECT * FROM TABLE1 JOIN TABLE2 WHERE TABLE1.T1_ID = TABLE2.T2_ID
In either case, the result would be:
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 1 56 first 1 23 fourth 2 23 second 2 56 second 3 72 third 3 72 fifth
If we wanted t1_data to match t2_data, however, things get interesting. A where clause will simply look at ALL the results and select those that match. So
SELECT * FROM TABLE1 JOIN TABLE2 WHERE TABLE1.T1_DATA = TABLE2.T2_DATA
will return:
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 2 23 second 2 56 second
A join, however, has four options. A plain join is sometimes called an "inner join". There is also a "left outer join", a "right outer join", and a "full outer join". The issue is that you may need some of the table from one of the tables, even if there isn't a match! For example, if table1 represents customers and table2 represents products on order, you may want a list of all your customers, regardless of whether there's a product on order or not. Here is what each join means, along with an example result set:
inner join: returns all the matches
left outer join: returns all the matches, left-hand records that don't have a match are matched with NULL
right outer join: returns all the matches, right-hand records that don't have a match are matched with NULL
full outer join: returns all the matches, any record from either side that doesn't have a match is matched with NULL
SELECT * FROM TABLE1 INNER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 2 23 second 2 56 secondSELECT * FROM TABLE1 LEFT OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 1 56 first NULL NULL NULL 2 23 second 2 56 second 3 72 third NULL NULL NULLSELECT * FROM TABLE1 RIGHT OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- NULL NULL NULL 1 23 fourth 2 23 second 2 56 second NULL NULL NULL 3 72 fifthSELECT * FROM TABLE1 FULL OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 1 56 first NULL NULL NULL NULL NULL NULL 1 23 fourth 2 23 second 2 56 second 3 72 third NULL NULL NULL NULL NULL NULL 3 72 fifth
Finally, you can combine the results of a join clause and a where clause to do some interesting things. Let's say that every record in TABLE1 and TABLE2 is supposed to have at least one match, but our database got out of wack when a script crashed. To find the records that do NOT have a match, we can do this:
SELECT * FROM TABLE1 FULL OUTER JOIN TABLE2 ON TABLE1.T1_DATA = TABLE2.T2_DATA WHERE TABLE1.T1.ID IS NULL OR TABLE2.T2_ID IS NULL
This will create the matches above, and then look for where a match didn't really happen:
table1 join table2 t1_id, t1_subid, t1_data, t2_id, t2_subid, t2_data -------------------------------------------------- 1 56 first NULL NULL NULL NULL NULL NULL 1 23 fourth 3 72 third NULL NULL NULL NULL NULL NULL 3 72 fifth