SQL Queries (Optional Part)
These aren’t required to form a valid query and are extensions to standard queries.
SQL queries have the following form:
SELECT [] FROM [] WHERE [] GROUP BY [] HAVING [] ORDER BY []
The first two parts are required to have a valid query.
WHERE
This is already mostly covered in SQL Data Manipulation Language (DML).
IN
In addition to providing a list like here you can also use sub-queries:
DELETE FROM Students WHERE name IN (SELECT name FROM Lecturers);
you can also do the same for a SELECT
query:
SELECT * FROM Students WHERE name IN (SELECT name FROM Lecturers);
EXISTS
This is a generalisation of IN
. With EXISTS
you can write a sub-query and if it returns something, the current row will be kept, otherwise thrown out. You can use attributes from the current row in the sub-query:
SELECT *
FROM Students
WHERE EXISTS ( SELECT 1
FROM Lecturers
WHERE Students.name=Lecturers.name AND Students.last_name=Lecturers.last_name);
It is common to use the constant 1
when we just want to have an output or not.
Semi-Join
Most types on joins are done in the FROM
clause. The only difference is semi-joins that are done from WHERE
clauses.
A left semi-join between two tables $A$ and $B$, finds the subset of row in $A$ that would be used to form some row of the natural join of $A$ and $B$.
A right semi-join would find the subset of $B$ instead.
These are done using EXISTS
(or IN
, if there is only one shared attribute).
Example
We will complete a left semi-join for the following tables (Employees to Transactions):
birthday | first_name | family_name | e_id |
---|---|---|---|
1990-11-10 | Anne | Smith | 1 |
200-02-05 | David | Jones | 2 |
1995-05-09 | William | Taylor | 3 |
t_id | c_id | e_id |
---|---|---|
1 | 3 | 1 |
2 | 6 | 1 |
3 | 19 | 3 |
We would use the following query:
SELECT *
FROM Employees E
WHERE EXISTS (SELECT 1
FROM Transaction T
WHERE E.e_id=T.e_id);
This gives the following output:
birthday | first_name | family_name | e_id |
---|---|---|---|
1990-11-10 | Anne | Smith | 1 |
1995-05-09 | William | Taylor | 3 |
Notice that only one Anne
row is output even though two transactions where done under here e_id
. This is as this method can only remove rows.
GROUP BY
GROUP BY
allows you to count transactions for each employee rather than just for one by using a WHERE
clause. For the following tables:
birthday | first_name | family_name | e_id |
---|---|---|---|
1990-11-10 | Anne | Smith | 1 |
200-02-05 | David | Jones | 2 |
1995-05-09 | William | Taylor | 3 |
t_id | c_id | e_id |
---|---|---|
1 | 3 | 1 |
2 | 6 | 1 |
3 | 19 | 3 |
You could count the number of transactions for all e_id
using the following query:
SELECT e_id, COUNT(t_id)
FROM Transactions
GROUP BY e_id;
This would return the following table:
e_id | COUNT(t_id) |
---|---|
1 | 2 |
3 | 1 |
Notice how the behaviour of COUNT
changes due to the iteration.
Complications
When using GROUP BY
you can only include in SELECT
attributes that we GROUP BY
or aggregates.
There are a couple of ways to overcome this:
-
Including additional attributes into the
GROUP BY
statement:SELECT first_name, e_id, COUNT(t_id) FROM Employees NATURAL JOIN Transactions GROUP BY first_name, e_id;
This seems like my preferred method.
-
Including the additional attribute in an aggregate:
SELECT MIN(first_name), e_id, COUNT(t_id) FROM Employees NATURAL JOIN Transactions GROUP BY e_id;
This works as each
e_id
will have on associatedfirst_name
, taking theMIN
will return the only entry.
These both give the following output where the first_name
included too:
first_name | e_id | COUNT(t_id) |
---|---|---|
Anne | 1 | 2 |
William | 3 | 1 |
Intuitive Method
Intuitively GROUP BY
works like this:
- Do the previous part of the query until
GROUP BY
(except ignoring the part inSELECT
) - Split the resulting table into sub-tables:
-
One sub-table for each value of the variables in
GROUP BY
. (If you hadfirst_name
ande_id
, yo would have a sub-tale for each pair offirst_name
ande_id
).This means that if you had two employees called Anne you would have two sub-tables, one for each Anne, identified by their
e_id
.
-
This means that for the following query you would get the following table (prior to SELECT
ing the required rows):
...
FROM Employees NATURAL JOIN Transactions
GROUP BY first_name, e_id;
first_name | e_id | sub-table(birthday) | sub-table(family_name) | sub-table(c_id) | sub-table(t_id) |
---|---|---|---|---|---|
Anne | 1 | 1990-11-10 | Smith | 3 | 1 |
” | ” | 1990-11-10 | Smith | 6 | 2 |
William | 3 | 1995-05-09 | Taylor | 19 | 3 |
As the table is grouped by the first_name
and e_id
these are pulled out of the table and redundant rows are removed. Aggregates are then calculated on the sub-tables of each group.
Having
This is just WHERE
but done after a GROUP BY
. This means that it doesn’t affect GROUP BY
and can be used on aggregates on the sub-tables in its expressions.
For the following query:
SELECT first_name,e_id,COUNT(t_id)
FROM Employees NATURAL JOIN Transactions
GROUP BY first_name, e_id
HAVING MIN(c_id)<5;
you would get the following output:
first_name | e_id | COUNT(t_id) |
---|---|---|
Anne | 1 | 2 |
This is for the same table as used on GROUP BY
.
ORDER BY
This defines how the output is sorted:
SELECT first_name
FROM Employees
ORDER BY family_name;
first_name |
---|
David |
Anne |
William |
You can also have more than one attribute in ORDER BY
. Then it would order those attributes in order.
It defaults to ascending order. You can also do descending by writing DESC
after the attribute you want to sort.