[Tarantool-discussions] RFC - distributed SQL, step #1 - AST

Fri Nov 13 13:06:40 MSK 2020

Distributed SQL: the first step - AST for parser
===============================================

Summary
-------

There is preliminary decision that we try to approach distributed SQL as
next big thing which come in 2021. This is a long project, which will be
approached gradually. Here we will try to observe all necessary steps to
be done both in long term and short term period of time. Longer terms goals
will be described briefly, but shorter term goal (extracting of AST from 
SQL parser) will be given in more details, we could do it because of current
set of Tarantool capabilities available and having some PoC already developed.

Bear in mind, that for longer term goals, later version of this RFC will try 
to collect wider relevant information, showing some (quiet random) industry
precedents (but that the list of industry examples will neither be scientific,
nor complete). 

Vocabulary:
-----------

We use standard MapReduce vocabulary when we talks about roles of
cluster nodes involved to the SQL queries processing.

+---------------+-----------------------------------------------------+
| Router(s)     | The node which processes queries and send to the    |
|               | corresponding storage nodes for local processing.   |
|               | It combines/reduces resultant data and sends it     |
|               | back to client                                      |
+===============+=====================================================+
| Combiner(s)   | Depending on the aggregation complexity needs there |
|               | may be several intermediate nodes which combine     |
|               | (aggregate) intermediate data and send it back      |
+---------------+-----------------------------------------------------+
| Storage nodes |                                                     |
+---------------+-----------------------------------------------------+

Distributed SQL scenario
------------------------

Once we move from single node case to multiple node case for SQL
execution all kinds of intra-node data exchange arise. Once we get
original SQL to the router node, it's expected that router would
preparse SQL query, (massage them appropriately) and then send some
command data to storage nodes for their local execution.

**The question is** - what format of data should we send to storage node?

We might try to send to the storage node the compiled binary VDBE
byte-code, but it looks to be a bad idea for several reasons:

1. Vdbe is not yet frozen and due to the technology used (lemon parser
   with on the fly generation of constants) it might differ very
   much between various versions even for builds from the same branch.
   Though, if we have to, we could take some extra measures to stabilize
   values of tokens generated;

2. But bigger problem is - different data distribution on different shard 
   nodes in the cluster. Which may require different query plans used 
   for the same SQL query. If we would generate blindly the single 
   byte code for received SQL then we may degrade performance comparing
   to the case when bytecode would be generated locally, for each modes 
   separately, taking local heuristics.

So at the moment simpler approach would be more preferable:

-  We simple transfer (modified) SQL query string to each of shard node 
   involved;

-  Or we could transfer AST serialized to some kind of binary form;

We suggest, that in the 1st stage, take the simplest approach - transfer 
simple SQL query in their textual form.

Distributed SQL in the InterSystems IRIS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(TBD)

Distributed SQL in MemSQL/SingleStore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
 (TBD)

Distributed SQL in MariaDB SkySQL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://mariadb.com/products/skysql/
(TBD)

Distributed SQL in Yugabyte
~~~~~~~~~~~~~~~~~~~~~~~~~~~

(TBD)

Mike Siomkin' distributed SQL PoC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is working and deployed proof-of-concept project which has been
implemented by Mike Siomkin, and which implements distributed SQL query
concept using currently available Tarantool facilities. 

.. note::

   There are some obvious limitations though, but it further proves 
   the point that with relatively small efforts, restricted distributed
   SQL processing might be implemented in current Tarantool within
   relatively short time frame.

For preliminary parsing of SQL queries Mike' code is using SQLParser
LuaRocks (https://github.com/tarantool/sqlparser) module which is
wrapping HyRise SQL parser implemented in C++
(https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
building abstract-syntax trees (AST).

The intermediate part between cluster controller at the Tarantool side
and SQL parser is gridql.lua module. This is gridql responsibility to
parse SQL, analyze resultant AST, and enrich it appropriately for
aggregate functions, and pagination support. *I.e. queries sent to
storage node will be different to the original SQL query*, and will be
different to the query executed by combiner/reducer node.

The used sql-parser module exports only 2 methods: parse(query), and
tostring(ast).

-  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
   parser mechanics and returns AST tree as ffi structure
   LuaSQLParseResult, which in turns, composed of series of
   LuaSQLStatement-based objects, which might be of various types
   (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
   kStmtDelete, etc.), each of them could be attributed different
   set of data, including LuaExpr lists of various kinds;

-  `sqlparser.tostring(ast)` stringifies the passed AST object;

Despite the fact that Hyrise SQL parser has *no knowledge about builtin
SQL functions* supported by Tarantool SQL, it's parsing facilities are
enough for AST tree traversal, because any known name is marked as 
identifier of function, which is a good enough to start of SQL processing
in gridql module.

Local SQL execution is being done using builtin Tarantool SQL engine,
thus such lack of functions knowledge is not a problem, iff we pass
transparently SQL query down to the node processor.

Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
*supports only `SELECT`s*, and not handles any other kinds of requests
(i.e. `UPDATE`).

Unfortunately, gridql, at their current form, could not be published due
to heavy usage of customer specific virtual tables, but there are claims
that it's possible to generalize and simplify code,
so it might be used elsewhere beyond current context.

Long-term goals
---------------

So, having many industry precendents we see that ideally, for distributed
SQL we have to have:

-  Some kind of router accepts SQL query, and then preparses it to some
   kind of intermediate representation (AST);

-  Topology aware query planner analyses parsed query and having knowledge
   of data distribution it sends parsed "AST" subqueries to only those
   nodes, which has relevant data. If there is no data locality known
   then all cluster involved via Map-Combine-Reduce operation;

-  Query might be split into inner subqueries for which stages would be
   planned and executed separately;

-  If transactions are not read only then cluster wide transaction
   manager / conflict manager to be involved for 2PC mechanics
   coordination;

-  And it would be easier if distributed SQL module should work even
   for single-node config (with or without vshard involved) for
   simpler debugging purposes;

Timings for these long-term plans are not yet known, but at the moment
we believe that the nearest subjects should be:

1. SQL parser refactoring to saving AST (with serialization and
   deserialization if necessary);

2. And transaction/conflict manager should be extended with cluster
   wide transaction support, to make possible next steps of queries
   beyond simple `SELECT`s;

2nd item is not SQL-specific, and will be handled elsewhere separately, 
this RFC we will continue to talk about SQL only plans;

Short-term distributed SQL plan
-------------------------------

At the moment parser, byte-code generation, and query execution is
tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
architecture largely inherited by Tarantool from SQLite parser. And such
close coupling might become a road-blocker for us in the longer term, when
we would have to go to different nodes.

If we properly split query parser and query execution logics we will 
simplify configuration, making it easier to approach distributed SQL.

-  So, for the 1st, obvious step - we would need to create a
   separate/builtin module tarantool-sqlparser which would wrap SQL
   parsing in `parse(query)` method in a fashion similar to Mike
   Siomkin' `sqlparser` module above;

-  The `sql.parse(query)` method would need to return AST data structures,
   exported via ffi.

   -  At the moment only SELECT and VIEW queries build AST during
      parsing, this is major limitation, which would require some extra
      refactoring later, but it's ok for the 1st stage.

   -  For the 2nd stage we would need to extend AST with more SQL
      statement types, e.g. UPDATE / DELETE.

   -  Worth to mention, that current AST structures as defined in the
      `sqlint.h` are quite similar to that used in Mike' sqlparser
      module - for more details see comparison of LuaDataTypes.h to
      sqlint.h in the Appendix A below;

-  As we build AST we may enrich returned AST nodes with information
   about builtin functions kinds and expression data types, specific
   for our SQL parser;

-  So in addition to currently available ways to run SQL queries via:

   -  direct `box.execute`,

   -  Or 2 step `box.prepare + box.execute`

   -  we would add `sql.parse` method, similar to `box.prepare`,
      which should be similarly executable via `box.prepare + box.execute`;

-  This refactoring with separation of parse step, should still maintain
   fully working SQL execution cycle, i.e. with minimum code
   modifications all relevant SQL queries should pass whole
   Tarantool SQL test suite. (This might apply only to SELECT
   queries, for stage 1);

Distributed testing scenario
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  Assumption is that decoupled sql.parse method, should be powerful
   enough to be able replace HyRise SQL parser in the gridql
   proof-of-concept code. Once being published it will be another
   indicator of intermediate success if it will be working with
   Tarantool SQL parser module. (Mike is planning to publish cleaned
   up code soon);

   -  There is no need though in gridql for anything beyond simple
      SELECTs, which makes possible to create 1st implementation
      without major refactorings in Tarantool SQL parser, having only
      current data structures and code;

   -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
      will be done at stage #2, probably the next quarter, and it's
      not a goal for current plan;

   -  i.e. we start with only read-only (SELECT and VIEW) queries, and
      not support RW operations yet;

-  INSERT/UPDATE/DELETE queries will be done afterward, once we have
   distributed conflict manager and transaction managers implemented.
   And it's subject to coordinated efforts with Alexander Lyapunov team;

Appendix A - AST data structures
--------------------------------

+-----------------------------------------------+-------------------------------+
| ``StatementType::kStmtSelect``                | ``ast_type::AST_TYPE_SELECT`` |
+===============================================+===============================+
|.. code:: c                                    |                               |
|                                               |                               |
| typedef struct {                              | **(there is none, yet)**      |
|    bool isValid;                              |                               |
|    char* errorMsg;                            |                               |
|    int errorLine;                             |                               |
|    int errorColumn;                           |                               |
|    size_t statementCount;                     |                               |
|    struct LuaSQLStatement** statements;       |                               |
| } LuaSQLParserResult;                         |                               |
|                                               |                               |
+-----------------------------------------------+-------------------------------+
|.. code:: c                                    |.. code:: c                    |
|                                               |                               |
| typedef struct LuaSelectStatement {           |  struct Select {              | 
|    struct LuaSQLStatement base;               |        ExprList *pEList;      |
|                                               |        u8 op;                 |
|    struct LuaTableRef* fromTable;             |        LogEst nSelectRow;     |
|    bool selectDistinct;                       |        u32 selFlags;          |
|                                               |        int iLimit, iOffset;   |
|    size_t selectListSize;                     |        char zSelName[12];     |
|    struct LuaExpr** selectList;               |        int addrOpenEphm[2];   |
|                                               |        SrcList *pSrc;         |
|    struct LuaExpr* whereClause;               |        Expr *pWhere;          |
|                                               |        ExprList *pGroupBy;    |
|    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;         |
|                                               |        ExprList *pOrderBy;    |
|    size_t setOperationCount;                  |        Select *pPrior;        |
|    struct LuaSetOperation** setOperations;    |        Select *pNext;         |
|                                               |        Expr *pLimit;          |
|    size_t orderCount;                         |        Expr *pOffset;         |
|    struct LuaOrderDescription** order;        |        With *pWith;           |
|                                               |  };                           |
|    size_t withDescriptionCount;               |                               |
|    struct LuaWithDescription**                |                               |
|           withDescriptions;                   |                               |
|    struct LuaLimitDescription* limit;         |                               |
| } LuaSelectStatement;                         |                               |
|                                               |                               |
+-----------------------------------------------+-------------------------------+

--
Timur
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-discussions/attachments/20201113/24c0ed13/attachment.html>