Your input and feedback on this article are welcome on github.com/DavidPratten/sql-fp
Hierarchical and graph data may be transformed with relational queries that are recursive, iterative, and fixed-point. These are available in SQL through the recursive Common Table Expression (CTE):
WITH RECURSIVE ...
This article argues that SQL’s CTE machinery is a distraction from the core idea of a recursive query and a hinderance to composition of queries. We propose a simpler SQL form which is equally declarative, expressive and composes nicely:
SELECT ... UNION SELECT ... FROM UNION
In addition, we explore how this form may be translated into Morel, a descendent of ML with relational queries.
Let’s begin by introducing a motivating example
Transitive Closure of a Tree
If we have a tree
This can be represented in a relation as:
id | parent_id |
1 | 1 |
2 | 1 |
3 | 1 |
4 | 2 |
5 | 2 |
6 | 5 |
7 | 5 |
The transitive closure of the tree is:
And this may be represented by the following relation.
id | parent_id | level |
1 | 1 | 0 |
2 | 2 | 0 |
3 | 3 | 0 |
4 | 4 | 0 |
5 | 5 | 0 |
6 | 6 | 0 |
7 | 7 | 0 |
2 | 1 | 1 |
3 | 1 | 1 |
4 | 2 | 1 |
5 | 2 | 1 |
6 | 5 | 1 |
7 | 5 | 1 |
4 | 1 | 2 |
5 | 1 | 2 |
6 | 2 | 2 |
7 | 2 | 2 |
6 | 1 | 3 |
7 | 1 | 3 |
Recursive CTE query
Using the Sqlite CTE form here is the query that computes the Transitive Closure given the Tree.
/* WITH RECURSIVE transitive closure of a tree (SQLITE) n.b. Tree's root node convention is id = parent_id */ with items (id, parent_id) as ( select 1 id, 1 parent_id union select 2, 1 union select 3, 1 union select 4, 2 union select 5, 2 union select 6, 5 union select 7, 5 ), transitive_closure (id, root_id, level) as ( select id, id, 0 from items union select items1.id, tcitems.root_id, tcitems.level + 1 from transitive_closure tcitems inner join items items1 on items1.parent_id = tcitems.id and items1.id <> items1.parent_id ) select * from transitive_closure;
Focusing in on the CTE let’s analyse its parts
- A language closure (WITH). (Note this is a different concept than transitive closure.)
- A name
- Core Queries and Union
- Final query.
As it turns out 3/4 of this structure may be dropped without loss.
SQL – SELECT … UNION SELECT … FROM UNION
We propose simplifying SQL by dropping the name, the closure and the final query and simplifying the composable recursive relational query down to:
select id, id root_id, 0 level from items union select items1.id, tcitems.root_id, tcitems.level + 1 from union tcitems inner join items items1 on items1.parent_id = tcitems.id and items1.id <> items1.parent_id
This form:
SELECT ... UNION SELECT ... FROM UNION
SELECTs the fixed-point result of the UNION without the overhead of a name (repeated three times), the closure and the final query.
This demonstrates that CTE machinery is not required to express recursive fixed point relational queries.
The simplified recursive relational query is orthogonal to all constructs in SQL and is composable:
- if it is desired to hide some of the columns included in the recursive query, then the simplified recursive query can be wrapped in an outer SELECT, and
- if the simplified recursive query needs to be consumed by more than one query then it can be
- wrapped in a CTE, or
- wrapped in a VIEW.
Multi-UNION Queries
There do need to be disambiguation rules if there are multiple UNIONs in scope. However, this is true whatever the linguistic form. See for example the multi-UNION rules for Sqlite. All non-recursive UNION sub-queries are constrained to be first, and all recursive sub-queries follow. And each recursive sub-query is presented with all generated rows from every other subquery in the UNION.
Composability
To be fair, the CTE variant of recursive queries is composable and SELECT FROM (WITH RECURSIVE …. SELECT …) works (at least in Sqlite). However, the ergonomics are poor, requiring the developer to build a mental model that includes a linguistic closure, a temporary local name, and a SELECT to pop back out of the closure. Here is the equivalent composable CTE-based query (Sqlite) with the required extra machinery in bold.
select * from (
with _unionfixedpoint (id, root_id, level) as ( select id, id root_id, 0 level from items union select items1.id, tcitems.root_id, tcitems.level + 1 from _unionfixedpoint tcitems inner join items items1 on items1.parent_id = tcitems.id and items1.id <> items1.parent_id)
select * from _unionfixedpoint
);
Application for other languages
The simplification of the recursive relational query to its essence is designed to improve developer experience and also points to a pattern for incorporating recursive queries into another language Morel.
Morel – from … union … from … in query
Having separated the idea of a recursive relational query from the CTE machinery, it becomes straight forward to propose the analog for a future release of Morel:
(from item in items yield {id = item.id, root_id=item.id, level=0}) union (from tcitem in query, item1 in items where item1.parent_id = tcitem.id and item1.id != item1.parent_id yield {id = item1.id, root_id=tcitem.root_id, level=tcitem.level+1});
The form
in query
is proposed for Morel, because ‘in union’ is too specific. Morel’s roadmap includes consideration for other UNION-like operators including unnamed lambdas.