Index
Selection can be performed faster if we know where to find the rows that are being selected. One way of doing this is by using indexes.
Index
Given the values for one or more attributes of a relation $R$, provides quick access to the tuples with these values.
Consider that we have the following data that we want to index:
-
Students
id name programme 1234 Anna G401 2345 Ben G701 3456 Chloe G401 4567 Dave G401
We can use the following index table, sorted by value
:
value | pointers to rows |
---|---|
G401 | 1234, 3456, 4567 |
G701 | 2435 |
You should have an index for each attribute that you want to select.
There are two different types of index:
-
Secondary - Only points to the location of records on disk.
Always dense.
-
Primary - In addition, defines how data is sorted on disk.
Good when attributes involve primary keys.
Using indexes
To complete a selection using an index:
- Find the entry that you are selecting in the index (G401).
- Visit all rows in
Students
whose ids occur in the index entry for G401.
This has a running time of:
\[O(\log d+\text{size of output})\]Forms of Indexes
You can use the following datatypes as indexes:
- Hash-Index - Good for equality.
- B+- Tree - Good for ranges.
You can create an index in SQL using the following command:
CREATE INDEX
ON Students USING btree -- default, can use hash
(programme, year); -- the columns that you want to index