I propose seamless integration of solvers into SQL via Table Valued Constrained Functions (TVCF). By "seamless", I mean that the SQL programmer would have to ask the query planner if a constraint solver was invoked to complete the query.
A few years ago, I sketched the design of an ideal logic and constraint-based language "Parasat". In this article I describe a SQL language extension with similar goals which is also inspired by Picat.
Motivating Example
At university I was surprised, and disappointed, by the distance between the predicates of mathematical logic and those of Prolog. For example the Prolog interpreter was unable to give the answers
X=-4, Y= -3 and X=3 and Y=4
to the simultaneous equations x+1-y = 0 and x^2+y^2-25 = 0 expressed in Prolog as this predicate:
simultaneous(X,Y) :- plus(X, 1, Y),times(X,X,XSquared),times(Y,Y,YSquared),
plus(XSquared,YSquared, XSquaredPlusYSquared),equals(XSquaredPlusYSquared,25)
Here is another way forward by extending SQL’s Table Valued Functions (TVF) to include constraints passed into the function by the query planner.
Solving this simultaneous equation with SQL-92
This particular simultaneous equation can be solved using an ordinary garden variety SQL-92 SELECT query. Given two relations, one each for the simultaneous equations:
CREATE TABLE XPlusZEqualsY (X,Z,Y);
CREATE TABLE XSquaredPlusYSquaredEqualsZ(X,Y,Z);
populated with select rows as follows:
XPlusZEqualsY
X | Z | Y |
---|---|---|
-5 | 1 | -4 |
-4 | 1 | -3 |
-3 | 2 | -1 |
-3 | 1 | -2 |
2 | 1 | 3 |
3 | 1 | 4 |
4 | 3 | 7 |
XSquaredPlusYSquaredEqualsZ
X | Y | Z |
---|---|---|
-4 | -3 | 25 |
-1 | -1 | 1 |
0 | 0 | 0 |
3 | 4 | 25 |
The following SQL-92 SELECT query will solve the simultaneous equation:
SELECT
r1.X,
r1.Y
FROM
XPlusZEqualsY r1
INNER JOIN XSquaredPlusYSquaredEqualsZ r2 ON r1.X = r2.X AND r1.Y = r2.Y
WHERE r1.Z = 1 and r2.Z = 25;
However, this is not a useful technique. This SELECT query only works when, out of the infinite rows that could be in the relations, the specific rows that make the simultaneous equation true happen to be pre-loaded.
Can Table Valued Functions (TVF) be used to provide a generic solution?
No, is the short answer. What are TVFs and why can’t they provide a solution?
Table Valued Functions (TVF) are defined in the SQL standard and implemented in widely-used databases.
a table function returns a (relational) table comprising zero or more rows, each row with one or more columns. See wikipedia User Defined Functions
TVFs are implemented in a variety of databases for example: SQLite, PostgreSQL, and MS SQL.
TVFs can’t provide a general solution the simultaneous equation problem. The example above will illustrate this. If we replace the tables above with TVFs and pass the two TVFs some parameters:
Q: What parameters(?) would we pass them to ensure that the TVFs generated the correct rows to be selected against?
SELECT
r1.X,
r1.Y
FROM
XPlusZEqualsY(?) r1
INNER JOIN XSquaredPlusYSquaredEqualsZ(?) r2 ON r1.X = r2.X AND r1.Y = r2.Y
WHERE r1.Z = 5 and r2.Z = 25;
A: There aren’t any (non-magical) parameters that can achieve this.
Table Valued Constrained Functions (TVCF)
As defined in this article, TVCFs are like TVFs in that they are:
- Functions
- that return 0 or more rows of 1 or more columns
In addition, however, TVCFs:
- have a fixed and typed column-set, like an ordinary table or view
- are read-only (no UPDATE or DELETE FROM),
- may delegate back to the query planner the task of generating the next value by invoking a solver
- are possibly non-deterministic
- appear in SQL code as just their name like a table or a view (i.e. without parameters)
- are lazily evaluated
- are defined in a language which can be compiled into a form usable by solvers. We know this is feasible because the language Picat has this property.
- parameters to the TVCF are provided by context e.g. constraints provided by ON and WHERE clauses
When a TVCF is referenced in a query:
- if the contextually provided constraints provide all the information required to generate the next row the function returns the next row to the query engine.
- otherwise the TVCF invariant is compiled by the query planner and with the known constraints is passed as input into a solver. The query planner seamlessly consumes the solver output and continues with the query.
What is a solver?
A solver accepts constraints specified in some language and provides solutions using one or more techniques. A good introduction to solvers may be found:
- in the Wikipedia article on Constraint Programming
- in the introduction to Google’s OR-Tools
TVCFs Example
Here is the XPlusZEqualsY table from our example coded as a TVCF using a pythonic idiom:
CREATE FUNCTION XPlusZEqualsY
(X float|[constraint,..],
Z float|[constraint,..],
Y float|[constraint,..])
RETURNS TABLE
AS
INVARIANT: X+Z-Y=0
IF X, Z, and Y are float:
/* All parameters are provided */
IF INVARIANT(X,Z,Y):
RETURN (X,Z,Y)
ELSE:
RETURN(EMPTY_ROW())
/* Two out of three parameters */
IF X and Z are float:
RETURN (X, Z, X+Z)
ELIF X and Y are float:
RETURN (X, Y-X, Y)
ELIF Z and Y are float:
RETURN (Z-Y, Y, Z)
ELSE:
/* less than two parameters */
DELEGATE INVARIANT,X,Z,Y
;
This function has:
- named input parameters of X, Z, and Y which are either floats or a list of constraints on the value of the parameter.
- named result columns of X, Z and Y which are always floats.
The INVARIANT line captures the intent of the relation in a form:
- usable as a true/false predicate, and
- compilable to a form usable by a solver.
Let’s walk through the flow of the function
The behaviour of the function (copied at right) depends on what is provided by the query planner based on the context.
All parameters are provided. As would be the case in this example:
SELECT 1 FROM XPlusZEqualsY
WHERE X=1 AND Z=2 AND Z=3;
In this case, the provided row is returned if the invariant is true otherwise there is no matching value in the relation and the empty row is returned.
Two out of three parameters are provided. As would be the case in this example:
SELECT Z FROM XPlusZEqualsY
WHERE X=1 AND Y=2;
In this case the missing value is computed and the three values are returned as a row.
One or fewer parameters are provided. As in our example above (copied here for convenience):
SELECT
r1.X,
r1.Y
FROM
XPlusZEqualsY r1
INNER JOIN
XSquaredPlusYSquaredEqualsZ r2
ON r1.X = r2.X AND r1.Y = r2.Y
WHERE r1.Z = 1 and r2.Z = 25;
Here we have Z constrained to be 1 but X and Y are merely constrained not specified. In this case, the invariant with the parameters are delegated back to the query planner to compile and submit to a heuristically chosen solver.
CREATE FUNCTION XPlusZEqualsY
(X float|[constraint,..],
Z float|[constraint,..],
Y float|[constraint,..])
RETURNS TABLE
AS
INVARIANT: X+Z-Y=0
IF X, Z, and Y are float:
/* All parameters are provided */
IF INVARIANT(X,Z,Y):
RETURN (X,Z,Y)
ELSE:
RETURN(EMPTY_ROW())
/* Two out of three parameters */
IF X and Z are float:
RETURN (X, Z, X+Z)
ELIF X and Y are float:
RETURN (X, Y-X, Y)
ELIF Z and Y are float:
RETURN (Z-Y, Y, Z)
ELSE:
/* less than two parameters */
DELEGATE INVARIANT,X,Z,Y
;
Summary Pitch
SQL query writers are specifying constraints all the time, by using Table Valued Constrained Functions (TVCF) they gain access to the power of solvers with zero learning friction. Those responsible for writing SQL query planners are already using solver toolkits to solve other database challenges and are well placed to incorporate them into the heart of the relational model.