Three ways to store hierarchical data in SQL
At work, we recently needed to store hierarchical data both MySQL and SQLite, a situation that's worth a short post.
Case study
Let's imagine that we content to tag, and our table of tags looks like this:
id | name |
---|---|
1 | environment |
2 | biodiversity |
3 | deforestation |
4 | nuclear energy |
Now we realise we want some tags to be subtags, so that deforestation and biodiversity are subtags of environment.
To do this, we want to turn the flat tags list into a hierarchy. Let's look at three ways of doing this.
Adjacency list model
In the adjacencly list model, we give every row a reference to its parent row, a parent_id
column. The mental model is a linked list.
id | name | parent_id |
---|---|---|
1 | environment | NULL |
2 | biodiversity | 1 |
3 | deforestation | 1 |
4 | nuclear energy | NULL |
The rows without a parent_id
are at the top of the hierarchy.
This makes inserting rows cheap but querying the whole hierarchy difficult, as you don’t know the number of joins needed. However, the situation improved as of MySQL 8.0 thanks to recursive common table expressions:1
WITH RECURSIVE all_children AS (
-- base case, the parent
SELECT *
FROM tags
WHERE name = 'environment'
UNION ALL
-- recursive case
SELECT t.*
FROM tags t
INNER JOIN all_children c ON (t.parent_id = c.id)
)
SELECT * FROM all_children
Recursion always needs a base case and a recursive case. Here, the first part of the query selecting the parent is the base case, and the second recursive part selects all children of existing rows.
Nested set model
In the nested set model, every row is a set containing everything between two numbers, left
and right
. The mental model is a set of cups of different sizes so that some fit into others.
id | name | left | right |
---|---|---|---|
1 | environment | 1 | 6 |
2 | biodiversity | 2 | 3 |
3 | deforestation | 4 | 5 |
4 | nuclear energy | 7 | 8 |
environment
has the range 1 to 6, a span which includes the left and right values biodiversity
and deforestation
.
The numbering is what you get if you load the whole hierarchy into memory and walk the tree depth-first, numbering the left node on entry and right node on exit.
To get the children of environment, you query like this:
WITH parent AS (
SELECT left, right
FROM tags
WHERE
name = 'environment'
)
SELECT *
FROM tags
WHERE
tags.left BETWEEN parent.left AND parent.right
In this model, you get nice reads, but the numbering system is fragile and needs to be recomputed for many or all rows when adding a new one.2 This makes writes slow and cumbersome, which is still fine if the table is usually read and only occasionally written to.
Fixed-depth hierarchy
Another model I've seen is to have a fixed number of levels in the hierarchy, and have a column for each level. The mental model for this is breadcrumbs.
For example, a table could look like this:
id | level_1 | level_2 | level_3 |
---|---|---|---|
1 | environment | NULL | NULL |
2 | environment | biodiversity | NULL |
3 | environment | deforestation | NULL |
4 | nuclear energy | NULL | NULL |
This is pretty easy to query:
SELECT * FROM tags
WHERE level_1 = 'environment'
Some downsides to this model. If you don’t know the tag level, querying becomes more complex and adds to your application logic. You have to decide decide the number of levels in advance, something you might not know. Updating a row means updating all of its children too. Adding an extra level requires a schema migration to add a new column for it. The same tag gets stored multiple times, which is a slight waste, but not a big deal if your table is small.
Conclusion
Overall, the adjacency list model with a parent_id
is likely the best default. It’s compact, easy to understand, query, and insert into. Recursive queries in MySQL 8.0+ and SQLite make it possible to query the whole hierarchy in one go and are quite performant. You are not forced to decide the number of levels in advance, which is helpful.
Footnotes
-
It seems obvious that a floating point scheme could avoid this, but I've never seen one in the wild. ↩