Composable Recursive Relational Queries

Your input and feedback on this article are welcome on github.com/DavidPratten/sql-fp

Hierarchical and graph data may be transformed with relational queries that are recursive, iterative, and fixed-point. These are available in SQL through the recursive Common Table Expression (CTE):

WITH RECURSIVE ...

This article argues that SQL’s CTE machinery is a distraction from the core idea of a recursive query and a hinderance to composition of queries. We propose a simpler SQL form which is equally declarative, expressive and composes nicely:

SELECT ... UNION SELECT ... FROM UNION

In addition, we explore how this form may be translated into Morel, a descendent of ML with relational queries.

Let’s begin by introducing a motivating example

Transitive Closure of a Tree

If we have a tree

Example Tree

This can be represented in a relation as:

idparent_id
11
21
31
42
52
65
75

The transitive closure of the tree is:

Transitive Closure of the Tree with depth shown on the edges

And this may be represented by the following relation.

idparent_idlevel
110
220
330
440
550
660
770
211
311
421
521
651
751
412
512
622
722
613
713

Recursive CTE query

Using the Sqlite CTE form here is the query that computes the Transitive Closure given the Tree.

/* WITH RECURSIVE transitive closure of a tree (SQLITE)
   n.b. Tree's root node convention is id = parent_id
*/
with items (id, parent_id) as (
    select 1 id, 1 parent_id
    union
    select 2, 1
    union
    select 3, 1
    union
    select 4, 2
    union
    select 5, 2
    union
    select 6, 5
    union
    select 7, 5
),
     transitive_closure (id, root_id, level) as (
         select id, id, 0
            from items
         union
         select items1.id, tcitems.root_id, tcitems.level + 1
            from transitive_closure tcitems
            inner join items items1 on
                items1.parent_id = tcitems.id and
                items1.id <> items1.parent_id
     )
select * from transitive_closure;

Focusing in on the CTE let’s analyse its parts

  • A language closure (WITH). (Note this is a different concept than transitive closure.)
  • A name
  • Core Queries and Union
  • Final query.

As it turns out 3/4 of this structure may be dropped without loss.

SQL – SELECT … UNION SELECT … FROM UNION

We propose simplifying SQL by dropping the name, the closure and the final query and simplifying the composable recursive relational query down to:

select id, id root_id, 0 level 
    from items
union
select items1.id, tcitems.root_id, tcitems.level + 1
    from union tcitems
        inner join items items1 on items1.parent_id = tcitems.id 
        and items1.id <> items1.parent_id

This form:

SELECT ... UNION SELECT ... FROM UNION

SELECTs the fixed-point result of the UNION without the overhead of a name (repeated three times), the closure and the final query.

This demonstrates that CTE machinery is not required to express recursive fixed point relational queries.

The simplified recursive relational query is orthogonal to all constructs in SQL and is composable:

  • if it is desired to hide some of the columns included in the recursive query, then the simplified recursive query can be wrapped in an outer SELECT, and
  • if the simplified recursive query needs to be consumed by more than one query then it can be
    • wrapped in a CTE, or
    • wrapped in a VIEW.

Multi-UNION Queries

There do need to be disambiguation rules if there are multiple UNIONs in scope. However, this is true whatever the linguistic form. See for example the multi-UNION rules for Sqlite. All non-recursive UNION sub-queries are constrained to be first, and all recursive sub-queries follow. And each recursive sub-query is presented with all generated rows from every other subquery in the UNION.

Composability

To be fair, the CTE variant of recursive queries is composable and SELECT FROM (WITH RECURSIVE …. SELECT …) works (at least in Sqlite). However, the ergonomics are poor, requiring the developer to build a mental model that includes a linguistic closure, a temporary local name, and a SELECT to pop back out of the closure. Here is the equivalent composable CTE-based query (Sqlite) with the required extra machinery in bold.

select * from (
with _unionfixedpoint (id, root_id, level) as (
select id, id root_id, 0 level from items union select items1.id, tcitems.root_id, tcitems.level + 1 from _unionfixedpoint tcitems inner join items items1 on items1.parent_id = tcitems.id and items1.id <> items1.parent_id)
select * from _unionfixedpoint
);

Application for other languages

The simplification of the recursive relational query to its essence is designed to improve developer experience and also points to a pattern for incorporating recursive queries into another language Morel.

Morel – from … union … from … in query

Having separated the idea of a recursive relational query from the CTE machinery, it becomes straight forward to propose the analog for a future release of Morel:

(from item in items 
    yield {id = item.id, root_id=item.id, level=0})
union
(from tcitem in query,
    item1 in items
    where item1.parent_id = tcitem.id and item1.id != item1.parent_id
    yield {id = item1.id, root_id=tcitem.root_id, level=tcitem.level+1});

The form

in query

is proposed for Morel, because ‘in union’ is too specific. Morel’s roadmap includes consideration for other UNION-like operators including unnamed lambdas.