[Tarantool-discussions] RFC - distributed SQL, step #1 - AST

Timur Safin tsafin at tarantool.org
Wed Nov 25 00:52:32 MSK 2020


: From: m.semkin <m.semkin at corp.mail.ru>
: Subject: Re: RFC - distributed SQL, step #1 - AST
: 
: I wouldn’t separate combiner nodes from storage nodes. I think combining
: should always be done on storage nodes. The aim of combining is to perform
: the first level of aggregation in order to:
: 
: * reduce the network traffic
: * not to overfill router's OM
: * increase parallelism
: 
: So I think the most suitable place for combining is storage nodes.

I thought that there is theoretical possibility to introduce several
intermediate combine steps for aggregating intermediate results for
multi-steps request. And for that we would need to introduce combiner
nodes which are like front-end router, but serving only subrequests.

But on a second thought I'd rather agree with you:
- it's front-end router responsibility to plan cluster execution, and
  Split query to subqueries if there is need;
- and there is no need to transfer raw data to the different node if 
  combine step might be done using this storage SQL facilties. i.e.
  local aggregation (combine) is done at the storage node. Indeed.


Agreed, thanks for note!

Timur


: 
: > On 13 Nov 2020, at 13:06, Timur Safin <tsafin at tarantool.org> wrote:
: >
: > Distributed SQL: the first step - AST for parser
: > ===============================================
: >
: > Summary
: > -------
: >
: > There is preliminary decision that we try to approach distributed SQL as
: > next big thing which come in 2021. This is a long project, which will be
: > approached gradually. Here we will try to observe all necessary steps to
: > be done both in long term and short term period of time. Longer terms
: goals
: > will be described briefly, but shorter term goal (extracting of AST from
: > SQL parser) will be given in more details, we could do it because of
: current
: > set of Tarantool capabilities available and having some PoC already
: developed.
: >
: > Bear in mind, that for longer term goals, later version of this RFC will
: try
: > to collect wider relevant information, showing some (quiet random)
: industry
: > precedents (but that the list of industry examples will neither be
: scientific,
: > nor complete).
: >
: >
: > Vocabulary:
: > -----------
: >
: > We use standard MapReduce vocabulary when we talks about roles of
: > cluster nodes involved to the SQL queries processing.
: >
: > +---------------+-----------------------------------------------------+
: > | Router(s)     | The node which processes queries and send to the    |
: > |               | corresponding storage nodes for local processing.   |
: > |               | It combines/reduces resultant data and sends it     |
: > |               | back to client                                      |
: > +===============+=====================================================+
: > | Combiner(s)   | Depending on the aggregation complexity needs there |
: > |               | may be several intermediate nodes which combine     |
: > |               | (aggregate) intermediate data and send it back      |
: > +---------------+-----------------------------------------------------+
: > | Storage nodes |                                                     |
: > +---------------+-----------------------------------------------------+
: >
: >
: >
: > Distributed SQL scenario
: > ------------------------
: >
: > Once we move from single node case to multiple node case for SQL
: > execution all kinds of intra-node data exchange arise. Once we get
: > original SQL to the router node, it's expected that router would
: > preparse SQL query, (massage them appropriately) and then send some
: > command data to storage nodes for their local execution.
: >
: > **The question is** - what format of data should we send to storage node?
: >
: > We might try to send to the storage node the compiled binary VDBE
: > byte-code, but it looks to be a bad idea for several reasons:
: >
: > 1. Vdbe is not yet frozen and due to the technology used (lemon parser
: >   with on the fly generation of constants) it might differ very
: >   much between various versions even for builds from the same branch.
: >   Though, if we have to, we could take some extra measures to stabilize
: >   values of tokens generated;
: >
: > 2. But bigger problem is - different data distribution on different shard
: >   nodes in the cluster. Which may require different query plans used
: >   for the same SQL query. If we would generate blindly the single
: >   byte code for received SQL then we may degrade performance comparing
: >   to the case when bytecode would be generated locally, for each modes
: >   separately, taking local heuristics.
: >
: > So at the moment simpler approach would be more preferable:
: >
: > -  We simple transfer (modified) SQL query string to each of shard node
: >   involved;
: >
: > -  Or we could transfer AST serialized to some kind of binary form;
: >
: > We suggest, that in the 1st stage, take the simplest approach - transfer
: > simple SQL query in their textual form.
: >
: > Distributed SQL in the InterSystems IRIS
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > (TBD)
: >
: >
: > Distributed SQL in MemSQL/SingleStore
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
: > (TBD)
: >
: >
: > Distributed SQL in MariaDB SkySQL
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > https://mariadb.com/products/skysql/
: > (TBD)
: >
: >
: > Distributed SQL in Yugabyte
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > (TBD)
: >
: >
: > Mike Siomkin' distributed SQL PoC
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > There is working and deployed proof-of-concept project which has been
: > implemented by Mike Siomkin, and which implements distributed SQL query
: > concept using currently available Tarantool facilities.
: >
: > .. note::
: >
: >   There are some obvious limitations though, but it further proves
: >   the point that with relatively small efforts, restricted distributed
: >   SQL processing might be implemented in current Tarantool within
: >   relatively short time frame.
: >
: > For preliminary parsing of SQL queries Mike' code is using SQLParser
: > LuaRocks (https://github.com/tarantool/sqlparser) module which is
: > wrapping HyRise SQL parser implemented in C++
: > (https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
: > building abstract-syntax trees (AST).
: >
: > The intermediate part between cluster controller at the Tarantool side
: > and SQL parser is gridql.lua module. This is gridql responsibility to
: > parse SQL, analyze resultant AST, and enrich it appropriately for
: > aggregate functions, and pagination support. *I.e. queries sent to
: > storage node will be different to the original SQL query*, and will be
: > different to the query executed by combiner/reducer node.
: >
: >
: >
: > The used sql-parser module exports only 2 methods: parse(query), and
: > tostring(ast).
: >
: > -  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
: >   parser mechanics and returns AST tree as ffi structure
: >   LuaSQLParseResult, which in turns, composed of series of
: >   LuaSQLStatement-based objects, which might be of various types
: >   (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
: >   kStmtDelete, etc.), each of them could be attributed different
: >   set of data, including LuaExpr lists of various kinds;
: >
: > -  `sqlparser.tostring(ast)` stringifies the passed AST object;
: >
: > Despite the fact that Hyrise SQL parser has *no knowledge about builtin
: > SQL functions* supported by Tarantool SQL, it's parsing facilities are
: > enough for AST tree traversal, because any known name is marked as
: > identifier of function, which is a good enough to start of SQL processing
: > in gridql module.
: >
: > Local SQL execution is being done using builtin Tarantool SQL engine,
: > thus such lack of functions knowledge is not a problem, iff we pass
: > transparently SQL query down to the node processor.
: >
: > Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
: > *supports only `SELECT`s*, and not handles any other kinds of requests
: > (i.e. `UPDATE`).
: >
: > Unfortunately, gridql, at their current form, could not be published due
: > to heavy usage of customer specific virtual tables, but there are claims
: > that it's possible to generalize and simplify code,
: > so it might be used elsewhere beyond current context.
: >
: >
: > Long-term goals
: > ---------------
: >
: > So, having many industry precendents we see that ideally, for distributed
: > SQL we have to have:
: >
: > -  Some kind of router accepts SQL query, and then preparses it to some
: >   kind of intermediate representation (AST);
: >
: > -  Topology aware query planner analyses parsed query and having knowledge
: >   of data distribution it sends parsed "AST" subqueries to only those
: >   nodes, which has relevant data. If there is no data locality known
: >   then all cluster involved via Map-Combine-Reduce operation;
: >
: > -  Query might be split into inner subqueries for which stages would be
: >   planned and executed separately;
: >
: > -  If transactions are not read only then cluster wide transaction
: >   manager / conflict manager to be involved for 2PC mechanics
: >   coordination;
: >
: > -  And it would be easier if distributed SQL module should work even
: >   for single-node config (with or without vshard involved) for
: >   simpler debugging purposes;
: >
: > Timings for these long-term plans are not yet known, but at the moment
: > we believe that the nearest subjects should be:
: >
: > 1. SQL parser refactoring to saving AST (with serialization and
: >   deserialization if necessary);
: >
: > 2. And transaction/conflict manager should be extended with cluster
: >   wide transaction support, to make possible next steps of queries
: >   beyond simple `SELECT`s;
: >
: > 2nd item is not SQL-specific, and will be handled elsewhere separately,
: > this RFC we will continue to talk about SQL only plans;
: >
: >
: > Short-term distributed SQL plan
: > -------------------------------
: >
: > At the moment parser, byte-code generation, and query execution is
: > tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
: > architecture largely inherited by Tarantool from SQLite parser. And such
: > close coupling might become a road-blocker for us in the longer term, when
: > we would have to go to different nodes.
: >
: > If we properly split query parser and query execution logics we will
: > simplify configuration, making it easier to approach distributed SQL.
: >
: > -  So, for the 1st, obvious step - we would need to create a
: >   separate/builtin module tarantool-sqlparser which would wrap SQL
: >   parsing in `parse(query)` method in a fashion similar to Mike
: >   Siomkin' `sqlparser` module above;
: >
: > -  The `sql.parse(query)` method would need to return AST data structures,
: >   exported via ffi.
: >
: >   -  At the moment only SELECT and VIEW queries build AST during
: >      parsing, this is major limitation, which would require some extra
: >      refactoring later, but it's ok for the 1st stage.
: >
: >   -  For the 2nd stage we would need to extend AST with more SQL
: >      statement types, e.g. UPDATE / DELETE.
: >
: >   -  Worth to mention, that current AST structures as defined in the
: >      `sqlint.h` are quite similar to that used in Mike' sqlparser
: >      module - for more details see comparison of LuaDataTypes.h to
: >      sqlint.h in the Appendix A below;
: >
: > -  As we build AST we may enrich returned AST nodes with information
: >   about builtin functions kinds and expression data types, specific
: >   for our SQL parser;
: >
: > -  So in addition to currently available ways to run SQL queries via:
: >
: >   -  direct `box.execute`,
: >
: >   -  Or 2 step `box.prepare + box.execute`
: >
: >   -  we would add `sql.parse` method, similar to `box.prepare`,
: >      which should be similarly executable via `box.prepare + box.execute`;
: >
: > -  This refactoring with separation of parse step, should still maintain
: >   fully working SQL execution cycle, i.e. with minimum code
: >   modifications all relevant SQL queries should pass whole
: >   Tarantool SQL test suite. (This might apply only to SELECT
: >   queries, for stage 1);
: >
: >
: >
: > Distributed testing scenario
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > -  Assumption is that decoupled sql.parse method, should be powerful
: >   enough to be able replace HyRise SQL parser in the gridql
: >   proof-of-concept code. Once being published it will be another
: >   indicator of intermediate success if it will be working with
: >   Tarantool SQL parser module. (Mike is planning to publish cleaned
: >   up code soon);
: >
: >   -  There is no need though in gridql for anything beyond simple
: >      SELECTs, which makes possible to create 1st implementation
: >      without major refactorings in Tarantool SQL parser, having only
: >      current data structures and code;
: >
: >   -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
: >      will be done at stage #2, probably the next quarter, and it's
: >      not a goal for current plan;
: >
: >   -  i.e. we start with only read-only (SELECT and VIEW) queries, and
: >      not support RW operations yet;
: >
: > -  INSERT/UPDATE/DELETE queries will be done afterward, once we have
: >   distributed conflict manager and transaction managers implemented.
: >   And it's subject to coordinated efforts with Alexander Lyapunov team;
: >
: >
: >
: > Appendix A - AST data structures
: > --------------------------------
: >
: > +-----------------------------------------------+-------------------------
: ------+
: > | ``StatementType::kStmtSelect``                |
: ``ast_type::AST_TYPE_SELECT`` |
: >
: +===============================================+===========================
: ====+
: > |.. code:: c                                    |
: |
: > |                                               |
: |
: > | typedef struct {                              | **(there is none, yet)**
: |
: > |    bool isValid;                              |
: |
: > |    char* errorMsg;                            |
: |
: > |    int errorLine;                             |
: |
: > |    int errorColumn;                           |
: |
: > |    size_t statementCount;                     |
: |
: > |    struct LuaSQLStatement** statements;       |
: |
: > | } LuaSQLParserResult;                         |
: |
: > |                                               |
: |
: > +-----------------------------------------------+-------------------------
: ------+
: > |.. code:: c                                    |.. code:: c
: |
: > |                                               |
: |
: > | typedef struct LuaSelectStatement {           |  struct Select {
: |
: > |    struct LuaSQLStatement base;               |        ExprList *pEList;
: |
: > |                                               |        u8 op;
: |
: > |    struct LuaTableRef* fromTable;             |        LogEst
: nSelectRow;     |
: > |    bool selectDistinct;                       |        u32 selFlags;
: |
: > |                                               |        int iLimit,
: iOffset;   |
: > |    size_t selectListSize;                     |        char
: zSelName[12];     |
: > |    struct LuaExpr** selectList;               |        int
: addrOpenEphm[2];   |
: > |                                               |        SrcList *pSrc;
: |
: > |    struct LuaExpr* whereClause;               |        Expr *pWhere;
: |
: > |                                               |        ExprList
: *pGroupBy;    |
: > |    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;
: |
: > |                                               |        ExprList
: *pOrderBy;    |
: > |    size_t setOperationCount;                  |        Select *pPrior;
: |
: > |    struct LuaSetOperation** setOperations;    |        Select *pNext;
: |
: > |                                               |        Expr *pLimit;
: |
: > |    size_t orderCount;                         |        Expr *pOffset;
: |
: > |    struct LuaOrderDescription** order;        |        With *pWith;
: |
: > |                                               |  };
: |
: > |    size_t withDescriptionCount;               |
: |
: > |    struct LuaWithDescription**                |
: |
: > |           withDescriptions;                   |
: |
: > |    struct LuaLimitDescription* limit;         |
: |
: > | } LuaSelectStatement;                         |
: |
: > |                                               |
: |
: > +-----------------------------------------------+-------------------------
: ------+
: >
: > --
: > Timur
: > <distributed-sql-ast.html>




More information about the Tarantool-discussions mailing list