Nov. 19, 2023, 10:31 p.m.
Indexing can be thought of in the same way we think of the index of a book. If we want to look up a certain topic we can go to the index and see what pages of the book contain that topic. It’s the same with a relational database table: when we write a query on a table with an index we can use that index as a reference to find the locations of those rows of data.
The reason we use a table index is the same as the reason we use an index in a book: to save time! If we didn’t have the index the only way to get the information we are querying would be to performed a full scan of all the records of a table. This can be incredibly expensive in terms of time and resources especially on large data sets.
A clustered index determines the physical order of the data in a table. For this reason you can only have one clustered index. A primary key constraint automatically creates a clustered index on that column. Although a table can only have one clustered index you could have multiple columns in that cluster index. For example you could have a table where the employee id and name are both in the clustered index. The clustered index also orders the data within the table. So using our above example the table would have the data ordered by employee id and then by employee name.
In a non-clustered index the data is stored in one place and the data is stored in a separate location; it does not exist within the table as in a clustered index. The non-clustered index contains pointers to the storage location of the data (the row address). A table can have multiple non-clustered indexes.
Some things to note about clustered vs non-clustered indexes:
You can only have one clustered index on a table and it orders the records of the table
A clustered index is faster than a non-clustered index because there is no additional look up–because the index of a non-clustered index is stored separately to the physical table.
You can have multiple non-clustered indexes on a table.
non-clustered indexes take up additional storage space because they are
stored separately.
Uniqueness is a property of an index. Both clustered and non-clustered indexes can be unique. Primary key constraints actually use unique clustered indexes behind the scenes to enforce that constraint. Note that a unique constraint cannot be used if there is already a duplicate value in the column.
• Main advantage of indexes is that it helps the data base engine to find data quickly.
• Other functions like updates, deletes, and group bys can also take advan- tage of indexes.
• additional disk space (non-clustered indexes)
• if you have too many indexes on a table it can slow down delete, update,
and insert operations because all the indexes will need to be updated with each operation.
A covering query is a query in which the columns requested in the select state- ment are present in the index. In that case they are “covered” by the index and no additional work is required to bring in other columns. Like may happen when you use a SELECT * (or really any select statement that has columns not contained in the index). It should be noted that a clustered index will always produce a covered query because it is contained in the table itself.
Indexes are organized by nodes in a tree like structure. The nodes at the end of the tree are called leaves, the node that starts off the structure is called the root node, and the nodes in-between are called intermediary nodes.The root and the intermediary nodes contain index rows. Each index row contains a key value and a pointer to either another intermediary node or a data row. Through this process data can be found quickly and efficiently.
I should at this point give a great deal of thanks to Venkat over at Pargim Technologies. His YouTube video series was a great refresher on these topics and the basis of this blog post. Check out his channel here: Link to YouTube