Nice writeup. I've been self-studying SQL lately for kicks -- I can bumble my way through a SELECT, INSERT, CREATE TABLE, etc. but things like joins and subqueries are foreign to me. Always neat to see something I'm studying turn up on the HN front page so I can see how far I've come :)
The writeup makes me wonder what other Apple software uses SQLite under the hood. In particular if the iTunes library is backed by a SQLite file, I could actually use that for some future projects. Cursory inspection shows that it's hdf-compressed so I'll need to get past that first... anyway, nice task for after-work hours.
`join` isn't so bad -- you are connecting two tables based on a common value.
If you have `person` and `address`:
create table person (
id serial primary key,
first_name text not null,
last_name text not null,
date_of_birth date not null
);
create table address (
id serial primary key,
person_id integer not null references person (id),
street_number text not null,
apartment_number text,
city text not null,
state text,
postal_code text not null
);
Then you'd join using `person.id` to `address.person_id`. An inner join hides rows from the joining table if there is no corresponding row in the joined-to table, while an outer join includes rows from the joining table if there are no corresponding rows in the joined-to table.
If we've got:
insert into person (first_name, last_name, date_of_birth)
values ('Brian', 'Kernighan', '1950-01-01'), -- gets id 1
('Dennis', 'Ritchie', '1948-01-01'); -- gets id 2
insert into address (person_id, street_number, city, postal_code)
values (1, '123 Main St.', 'Berkeley', '94701');
Then these two queries will return one and two rows, respectively:
-- an inner join, only returns `person` rows with a corresponding
-- row in the `address` table
select person.first_name,
person.last_name,
address.street_number,
address.city,
address.postal_code
from person
inner join address on person.id = address.person_id
-- output:
-- first_name | last_name | street_number | city | postal_code
-- Brian | Kernighan | 123 Main St. | Berkeley | 94701
-- a left outer join, returns all `person` rows, regardless of
-- the existence of a corresponding row in the `address` table
select person.first_name,
person.last_name,
address.street_number,
address.city,
address.postal_code
from person
left outer join address on person.id = address.person_id
-- output:
-- first_name | last_name | street_number | city | postal_code
-- Brian | Kernighan | 123 Main St. | Berkeley | 94701
-- Dennis | Ritchie | | |
One important thing to know is if you are joining to multiple tables, and there are multiple corresponding rows (i.e., a `favorite_food` table with three rows per person, and a `favorite_color` table with three rows per person), you will get _multiplicative_ data -- `n_favorite_foods * n_favorite_colors` rows will be returned per person! This is where things like `group by` and `distinct` come into play.
There are other types of joins, but these two are the usual suspects in most queries.
The writeup makes me wonder what other Apple software uses SQLite under the hood. In particular if the iTunes library is backed by a SQLite file, I could actually use that for some future projects. Cursory inspection shows that it's hdf-compressed so I'll need to get past that first... anyway, nice task for after-work hours.