[Tarantool-discussions] RFC - distributed SQL, step #1

Tarantool discussions archive
 help / color / mirror / Atom feed

* [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
@ 2020-11-13 10:06 Timur Safin
  2020-11-13 18:59 ` m.semkin
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Timur Safin @ 2020-11-13 10:06 UTC (permalink / raw)
  To: mons, Kirill Yukhin, korablev, Vladislav Shpilevoy, Sergey Ostanevich
  Cc: m.semkin, tarantool-discussions

[-- Attachment #1: Type: text/plain, Size: 15394 bytes --]

Distributed SQL: the first step - AST for parser
===============================================

Summary
-------

There is preliminary decision that we try to approach distributed SQL as
next big thing which come in 2021. This is a long project, which will be
approached gradually. Here we will try to observe all necessary steps to
be done both in long term and short term period of time. Longer terms goals
will be described briefly, but shorter term goal (extracting of AST from 
SQL parser) will be given in more details, we could do it because of current
set of Tarantool capabilities available and having some PoC already developed.

Bear in mind, that for longer term goals, later version of this RFC will try 
to collect wider relevant information, showing some (quiet random) industry
precedents (but that the list of industry examples will neither be scientific,
nor complete). 

Vocabulary:
-----------

We use standard MapReduce vocabulary when we talks about roles of
cluster nodes involved to the SQL queries processing.

+---------------+-----------------------------------------------------+
| Router(s)     | The node which processes queries and send to the    |
|               | corresponding storage nodes for local processing.   |
|               | It combines/reduces resultant data and sends it     |
|               | back to client                                      |
+===============+=====================================================+
| Combiner(s)   | Depending on the aggregation complexity needs there |
|               | may be several intermediate nodes which combine     |
|               | (aggregate) intermediate data and send it back      |
+---------------+-----------------------------------------------------+
| Storage nodes |                                                     |
+---------------+-----------------------------------------------------+

Distributed SQL scenario
------------------------

Once we move from single node case to multiple node case for SQL
execution all kinds of intra-node data exchange arise. Once we get
original SQL to the router node, it's expected that router would
preparse SQL query, (massage them appropriately) and then send some
command data to storage nodes for their local execution.

**The question is** - what format of data should we send to storage node?

We might try to send to the storage node the compiled binary VDBE
byte-code, but it looks to be a bad idea for several reasons:

1. Vdbe is not yet frozen and due to the technology used (lemon parser
   with on the fly generation of constants) it might differ very
   much between various versions even for builds from the same branch.
   Though, if we have to, we could take some extra measures to stabilize
   values of tokens generated;

2. But bigger problem is - different data distribution on different shard 
   nodes in the cluster. Which may require different query plans used 
   for the same SQL query. If we would generate blindly the single 
   byte code for received SQL then we may degrade performance comparing
   to the case when bytecode would be generated locally, for each modes 
   separately, taking local heuristics.

So at the moment simpler approach would be more preferable:

-  We simple transfer (modified) SQL query string to each of shard node 
   involved;

-  Or we could transfer AST serialized to some kind of binary form;

We suggest, that in the 1st stage, take the simplest approach - transfer 
simple SQL query in their textual form.

Distributed SQL in the InterSystems IRIS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(TBD)

Distributed SQL in MemSQL/SingleStore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
 (TBD)

Distributed SQL in MariaDB SkySQL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

https://mariadb.com/products/skysql/
(TBD)

Distributed SQL in Yugabyte
~~~~~~~~~~~~~~~~~~~~~~~~~~~

(TBD)

Mike Siomkin' distributed SQL PoC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is working and deployed proof-of-concept project which has been
implemented by Mike Siomkin, and which implements distributed SQL query
concept using currently available Tarantool facilities. 

.. note::

   There are some obvious limitations though, but it further proves 
   the point that with relatively small efforts, restricted distributed
   SQL processing might be implemented in current Tarantool within
   relatively short time frame.

For preliminary parsing of SQL queries Mike' code is using SQLParser
LuaRocks (https://github.com/tarantool/sqlparser) module which is
wrapping HyRise SQL parser implemented in C++
(https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
building abstract-syntax trees (AST).

The intermediate part between cluster controller at the Tarantool side
and SQL parser is gridql.lua module. This is gridql responsibility to
parse SQL, analyze resultant AST, and enrich it appropriately for
aggregate functions, and pagination support. *I.e. queries sent to
storage node will be different to the original SQL query*, and will be
different to the query executed by combiner/reducer node.

The used sql-parser module exports only 2 methods: parse(query), and
tostring(ast).

-  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
   parser mechanics and returns AST tree as ffi structure
   LuaSQLParseResult, which in turns, composed of series of
   LuaSQLStatement-based objects, which might be of various types
   (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
   kStmtDelete, etc.), each of them could be attributed different
   set of data, including LuaExpr lists of various kinds;

-  `sqlparser.tostring(ast)` stringifies the passed AST object;

Despite the fact that Hyrise SQL parser has *no knowledge about builtin
SQL functions* supported by Tarantool SQL, it's parsing facilities are
enough for AST tree traversal, because any known name is marked as 
identifier of function, which is a good enough to start of SQL processing
in gridql module.

Local SQL execution is being done using builtin Tarantool SQL engine,
thus such lack of functions knowledge is not a problem, iff we pass
transparently SQL query down to the node processor.

Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
*supports only `SELECT`s*, and not handles any other kinds of requests
(i.e. `UPDATE`).

Unfortunately, gridql, at their current form, could not be published due
to heavy usage of customer specific virtual tables, but there are claims
that it's possible to generalize and simplify code,
so it might be used elsewhere beyond current context.

Long-term goals
---------------

So, having many industry precendents we see that ideally, for distributed
SQL we have to have:

-  Some kind of router accepts SQL query, and then preparses it to some
   kind of intermediate representation (AST);

-  Topology aware query planner analyses parsed query and having knowledge
   of data distribution it sends parsed "AST" subqueries to only those
   nodes, which has relevant data. If there is no data locality known
   then all cluster involved via Map-Combine-Reduce operation;

-  Query might be split into inner subqueries for which stages would be
   planned and executed separately;

-  If transactions are not read only then cluster wide transaction
   manager / conflict manager to be involved for 2PC mechanics
   coordination;

-  And it would be easier if distributed SQL module should work even
   for single-node config (with or without vshard involved) for
   simpler debugging purposes;

Timings for these long-term plans are not yet known, but at the moment
we believe that the nearest subjects should be:

1. SQL parser refactoring to saving AST (with serialization and
   deserialization if necessary);

2. And transaction/conflict manager should be extended with cluster
   wide transaction support, to make possible next steps of queries
   beyond simple `SELECT`s;

2nd item is not SQL-specific, and will be handled elsewhere separately, 
this RFC we will continue to talk about SQL only plans;

Short-term distributed SQL plan
-------------------------------

At the moment parser, byte-code generation, and query execution is
tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
architecture largely inherited by Tarantool from SQLite parser. And such
close coupling might become a road-blocker for us in the longer term, when
we would have to go to different nodes.

If we properly split query parser and query execution logics we will 
simplify configuration, making it easier to approach distributed SQL.

-  So, for the 1st, obvious step - we would need to create a
   separate/builtin module tarantool-sqlparser which would wrap SQL
   parsing in `parse(query)` method in a fashion similar to Mike
   Siomkin' `sqlparser` module above;

-  The `sql.parse(query)` method would need to return AST data structures,
   exported via ffi.

   -  At the moment only SELECT and VIEW queries build AST during
      parsing, this is major limitation, which would require some extra
      refactoring later, but it's ok for the 1st stage.

   -  For the 2nd stage we would need to extend AST with more SQL
      statement types, e.g. UPDATE / DELETE.

   -  Worth to mention, that current AST structures as defined in the
      `sqlint.h` are quite similar to that used in Mike' sqlparser
      module - for more details see comparison of LuaDataTypes.h to
      sqlint.h in the Appendix A below;

-  As we build AST we may enrich returned AST nodes with information
   about builtin functions kinds and expression data types, specific
   for our SQL parser;

-  So in addition to currently available ways to run SQL queries via:

   -  direct `box.execute`,

   -  Or 2 step `box.prepare + box.execute`

   -  we would add `sql.parse` method, similar to `box.prepare`,
      which should be similarly executable via `box.prepare + box.execute`;

-  This refactoring with separation of parse step, should still maintain
   fully working SQL execution cycle, i.e. with minimum code
   modifications all relevant SQL queries should pass whole
   Tarantool SQL test suite. (This might apply only to SELECT
   queries, for stage 1);

Distributed testing scenario
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  Assumption is that decoupled sql.parse method, should be powerful
   enough to be able replace HyRise SQL parser in the gridql
   proof-of-concept code. Once being published it will be another
   indicator of intermediate success if it will be working with
   Tarantool SQL parser module. (Mike is planning to publish cleaned
   up code soon);

   -  There is no need though in gridql for anything beyond simple
      SELECTs, which makes possible to create 1st implementation
      without major refactorings in Tarantool SQL parser, having only
      current data structures and code;

   -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
      will be done at stage #2, probably the next quarter, and it's
      not a goal for current plan;

   -  i.e. we start with only read-only (SELECT and VIEW) queries, and
      not support RW operations yet;

-  INSERT/UPDATE/DELETE queries will be done afterward, once we have
   distributed conflict manager and transaction managers implemented.
   And it's subject to coordinated efforts with Alexander Lyapunov team;

Appendix A - AST data structures
--------------------------------

+-----------------------------------------------+-------------------------------+
| ``StatementType::kStmtSelect``                | ``ast_type::AST_TYPE_SELECT`` |
+===============================================+===============================+
|.. code:: c                                    |                               |
|                                               |                               |
| typedef struct {                              | **(there is none, yet)**      |
|    bool isValid;                              |                               |
|    char* errorMsg;                            |                               |
|    int errorLine;                             |                               |
|    int errorColumn;                           |                               |
|    size_t statementCount;                     |                               |
|    struct LuaSQLStatement** statements;       |                               |
| } LuaSQLParserResult;                         |                               |
|                                               |                               |
+-----------------------------------------------+-------------------------------+
|.. code:: c                                    |.. code:: c                    |
|                                               |                               |
| typedef struct LuaSelectStatement {           |  struct Select {              | 
|    struct LuaSQLStatement base;               |        ExprList *pEList;      |
|                                               |        u8 op;                 |
|    struct LuaTableRef* fromTable;             |        LogEst nSelectRow;     |
|    bool selectDistinct;                       |        u32 selFlags;          |
|                                               |        int iLimit, iOffset;   |
|    size_t selectListSize;                     |        char zSelName[12];     |
|    struct LuaExpr** selectList;               |        int addrOpenEphm[2];   |
|                                               |        SrcList *pSrc;         |
|    struct LuaExpr* whereClause;               |        Expr *pWhere;          |
|                                               |        ExprList *pGroupBy;    |
|    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;         |
|                                               |        ExprList *pOrderBy;    |
|    size_t setOperationCount;                  |        Select *pPrior;        |
|    struct LuaSetOperation** setOperations;    |        Select *pNext;         |
|                                               |        Expr *pLimit;          |
|    size_t orderCount;                         |        Expr *pOffset;         |
|    struct LuaOrderDescription** order;        |        With *pWith;           |
|                                               |  };                           |
|    size_t withDescriptionCount;               |                               |
|    struct LuaWithDescription**                |                               |
|           withDescriptions;                   |                               |
|    struct LuaLimitDescription* limit;         |                               |
| } LuaSelectStatement;                         |                               |
|                                               |                               |
+-----------------------------------------------+-------------------------------+

--
Timur

[-- Attachment #2: distributed-sql-ast.html --]
[-- Type: text/html, Size: 20728 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-13 10:06 [Tarantool-discussions] RFC - distributed SQL, step #1 - AST Timur Safin
@ 2020-11-13 18:59 ` m.semkin
  2020-11-24 21:52   ` Timur Safin
  2020-11-20  0:05 ` Peter Gulutzan
  2020-11-24 22:06 ` Timur Safin
  2 siblings, 1 reply; 8+ messages in thread
From: m.semkin @ 2020-11-13 18:59 UTC (permalink / raw)
  To: Timur Safin; +Cc: mons, tarantool-discussions

I wouldn’t separate combiner nodes from storage nodes. I think combining
should always be done on storage nodes. The aim of combining is to perform
the first level of aggregation in order to:

* reduce the network traffic
* not to overfill router's OM
* increase parallelism

So I think the most suitable place for combining is storage nodes.

> On 13 Nov 2020, at 13:06, Timur Safin <tsafin@tarantool.org> wrote:
> 
> Distributed SQL: the first step - AST for parser
> ===============================================
> 
> Summary
> -------
> 
> There is preliminary decision that we try to approach distributed SQL as
> next big thing which come in 2021. This is a long project, which will be
> approached gradually. Here we will try to observe all necessary steps to
> be done both in long term and short term period of time. Longer terms goals
> will be described briefly, but shorter term goal (extracting of AST from 
> SQL parser) will be given in more details, we could do it because of current
> set of Tarantool capabilities available and having some PoC already developed.
> 
> Bear in mind, that for longer term goals, later version of this RFC will try 
> to collect wider relevant information, showing some (quiet random) industry
> precedents (but that the list of industry examples will neither be scientific,
> nor complete). 
> 
> 
> Vocabulary:
> -----------
> 
> We use standard MapReduce vocabulary when we talks about roles of
> cluster nodes involved to the SQL queries processing.
> 
> +---------------+-----------------------------------------------------+
> | Router(s)     | The node which processes queries and send to the    |
> |               | corresponding storage nodes for local processing.   |
> |               | It combines/reduces resultant data and sends it     |
> |               | back to client                                      |
> +===============+=====================================================+
> | Combiner(s)   | Depending on the aggregation complexity needs there |
> |               | may be several intermediate nodes which combine     |
> |               | (aggregate) intermediate data and send it back      |
> +---------------+-----------------------------------------------------+
> | Storage nodes |                                                     |
> +---------------+-----------------------------------------------------+
> 
> 
> 
> Distributed SQL scenario
> ------------------------
> 
> Once we move from single node case to multiple node case for SQL
> execution all kinds of intra-node data exchange arise. Once we get
> original SQL to the router node, it's expected that router would
> preparse SQL query, (massage them appropriately) and then send some
> command data to storage nodes for their local execution.
> 
> **The question is** - what format of data should we send to storage node?
> 
> We might try to send to the storage node the compiled binary VDBE
> byte-code, but it looks to be a bad idea for several reasons:
> 
> 1. Vdbe is not yet frozen and due to the technology used (lemon parser
>   with on the fly generation of constants) it might differ very
>   much between various versions even for builds from the same branch.
>   Though, if we have to, we could take some extra measures to stabilize
>   values of tokens generated;
> 
> 2. But bigger problem is - different data distribution on different shard 
>   nodes in the cluster. Which may require different query plans used 
>   for the same SQL query. If we would generate blindly the single 
>   byte code for received SQL then we may degrade performance comparing
>   to the case when bytecode would be generated locally, for each modes 
>   separately, taking local heuristics.
> 
> So at the moment simpler approach would be more preferable:
> 
> -  We simple transfer (modified) SQL query string to each of shard node 
>   involved;
> 
> -  Or we could transfer AST serialized to some kind of binary form;
> 
> We suggest, that in the 1st stage, take the simplest approach - transfer 
> simple SQL query in their textual form.
> 
> Distributed SQL in the InterSystems IRIS
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> (TBD)
> 
> 
> Distributed SQL in MemSQL/SingleStore
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
> (TBD)
> 
> 
> Distributed SQL in MariaDB SkySQL
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> https://mariadb.com/products/skysql/
> (TBD)
> 
> 
> Distributed SQL in Yugabyte
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> (TBD)
> 
> 
> Mike Siomkin' distributed SQL PoC
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> There is working and deployed proof-of-concept project which has been
> implemented by Mike Siomkin, and which implements distributed SQL query
> concept using currently available Tarantool facilities. 
> 
> .. note::
> 
>   There are some obvious limitations though, but it further proves 
>   the point that with relatively small efforts, restricted distributed
>   SQL processing might be implemented in current Tarantool within
>   relatively short time frame.
> 
> For preliminary parsing of SQL queries Mike' code is using SQLParser
> LuaRocks (https://github.com/tarantool/sqlparser) module which is
> wrapping HyRise SQL parser implemented in C++
> (https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
> building abstract-syntax trees (AST).
> 
> The intermediate part between cluster controller at the Tarantool side
> and SQL parser is gridql.lua module. This is gridql responsibility to
> parse SQL, analyze resultant AST, and enrich it appropriately for
> aggregate functions, and pagination support. *I.e. queries sent to
> storage node will be different to the original SQL query*, and will be
> different to the query executed by combiner/reducer node.
> 
> 
> 
> The used sql-parser module exports only 2 methods: parse(query), and
> tostring(ast).
> 
> -  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
>   parser mechanics and returns AST tree as ffi structure
>   LuaSQLParseResult, which in turns, composed of series of
>   LuaSQLStatement-based objects, which might be of various types
>   (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
>   kStmtDelete, etc.), each of them could be attributed different
>   set of data, including LuaExpr lists of various kinds;
> 
> -  `sqlparser.tostring(ast)` stringifies the passed AST object;
> 
> Despite the fact that Hyrise SQL parser has *no knowledge about builtin
> SQL functions* supported by Tarantool SQL, it's parsing facilities are
> enough for AST tree traversal, because any known name is marked as 
> identifier of function, which is a good enough to start of SQL processing
> in gridql module.
> 
> Local SQL execution is being done using builtin Tarantool SQL engine,
> thus such lack of functions knowledge is not a problem, iff we pass
> transparently SQL query down to the node processor.
> 
> Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
> *supports only `SELECT`s*, and not handles any other kinds of requests
> (i.e. `UPDATE`).
> 
> Unfortunately, gridql, at their current form, could not be published due
> to heavy usage of customer specific virtual tables, but there are claims
> that it's possible to generalize and simplify code,
> so it might be used elsewhere beyond current context.
> 
> 
> Long-term goals
> ---------------
> 
> So, having many industry precendents we see that ideally, for distributed
> SQL we have to have:
> 
> -  Some kind of router accepts SQL query, and then preparses it to some
>   kind of intermediate representation (AST);
> 
> -  Topology aware query planner analyses parsed query and having knowledge
>   of data distribution it sends parsed "AST" subqueries to only those
>   nodes, which has relevant data. If there is no data locality known
>   then all cluster involved via Map-Combine-Reduce operation;
> 
> -  Query might be split into inner subqueries for which stages would be
>   planned and executed separately;
> 
> -  If transactions are not read only then cluster wide transaction
>   manager / conflict manager to be involved for 2PC mechanics
>   coordination;
> 
> -  And it would be easier if distributed SQL module should work even
>   for single-node config (with or without vshard involved) for
>   simpler debugging purposes;
> 
> Timings for these long-term plans are not yet known, but at the moment
> we believe that the nearest subjects should be:
> 
> 1. SQL parser refactoring to saving AST (with serialization and
>   deserialization if necessary);
> 
> 2. And transaction/conflict manager should be extended with cluster
>   wide transaction support, to make possible next steps of queries
>   beyond simple `SELECT`s;
> 
> 2nd item is not SQL-specific, and will be handled elsewhere separately, 
> this RFC we will continue to talk about SQL only plans;
> 
> 
> Short-term distributed SQL plan
> -------------------------------
> 
> At the moment parser, byte-code generation, and query execution is
> tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
> architecture largely inherited by Tarantool from SQLite parser. And such
> close coupling might become a road-blocker for us in the longer term, when
> we would have to go to different nodes.
> 
> If we properly split query parser and query execution logics we will 
> simplify configuration, making it easier to approach distributed SQL.
> 
> -  So, for the 1st, obvious step - we would need to create a
>   separate/builtin module tarantool-sqlparser which would wrap SQL
>   parsing in `parse(query)` method in a fashion similar to Mike
>   Siomkin' `sqlparser` module above;
> 
> -  The `sql.parse(query)` method would need to return AST data structures,
>   exported via ffi.
> 
>   -  At the moment only SELECT and VIEW queries build AST during
>      parsing, this is major limitation, which would require some extra
>      refactoring later, but it's ok for the 1st stage.
> 
>   -  For the 2nd stage we would need to extend AST with more SQL
>      statement types, e.g. UPDATE / DELETE.
> 
>   -  Worth to mention, that current AST structures as defined in the
>      `sqlint.h` are quite similar to that used in Mike' sqlparser
>      module - for more details see comparison of LuaDataTypes.h to
>      sqlint.h in the Appendix A below;
> 
> -  As we build AST we may enrich returned AST nodes with information
>   about builtin functions kinds and expression data types, specific
>   for our SQL parser;
> 
> -  So in addition to currently available ways to run SQL queries via:
> 
>   -  direct `box.execute`,
> 
>   -  Or 2 step `box.prepare + box.execute`
> 
>   -  we would add `sql.parse` method, similar to `box.prepare`,
>      which should be similarly executable via `box.prepare + box.execute`;
> 
> -  This refactoring with separation of parse step, should still maintain
>   fully working SQL execution cycle, i.e. with minimum code
>   modifications all relevant SQL queries should pass whole
>   Tarantool SQL test suite. (This might apply only to SELECT
>   queries, for stage 1);
> 
> 
> 
> Distributed testing scenario
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> -  Assumption is that decoupled sql.parse method, should be powerful
>   enough to be able replace HyRise SQL parser in the gridql
>   proof-of-concept code. Once being published it will be another
>   indicator of intermediate success if it will be working with
>   Tarantool SQL parser module. (Mike is planning to publish cleaned
>   up code soon);
> 
>   -  There is no need though in gridql for anything beyond simple
>      SELECTs, which makes possible to create 1st implementation
>      without major refactorings in Tarantool SQL parser, having only
>      current data structures and code;
> 
>   -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
>      will be done at stage #2, probably the next quarter, and it's
>      not a goal for current plan;
> 
>   -  i.e. we start with only read-only (SELECT and VIEW) queries, and
>      not support RW operations yet;
> 
> -  INSERT/UPDATE/DELETE queries will be done afterward, once we have
>   distributed conflict manager and transaction managers implemented.
>   And it's subject to coordinated efforts with Alexander Lyapunov team;
> 
> 
> 
> Appendix A - AST data structures
> --------------------------------
> 
> +-----------------------------------------------+-------------------------------+
> | ``StatementType::kStmtSelect``                | ``ast_type::AST_TYPE_SELECT`` |
> +===============================================+===============================+
> |.. code:: c                                    |                               |
> |                                               |                               |
> | typedef struct {                              | **(there is none, yet)**      |
> |    bool isValid;                              |                               |
> |    char* errorMsg;                            |                               |
> |    int errorLine;                             |                               |
> |    int errorColumn;                           |                               |
> |    size_t statementCount;                     |                               |
> |    struct LuaSQLStatement** statements;       |                               |
> | } LuaSQLParserResult;                         |                               |
> |                                               |                               |
> +-----------------------------------------------+-------------------------------+
> |.. code:: c                                    |.. code:: c                    |
> |                                               |                               |
> | typedef struct LuaSelectStatement {           |  struct Select {              | 
> |    struct LuaSQLStatement base;               |        ExprList *pEList;      |
> |                                               |        u8 op;                 |
> |    struct LuaTableRef* fromTable;             |        LogEst nSelectRow;     |
> |    bool selectDistinct;                       |        u32 selFlags;          |
> |                                               |        int iLimit, iOffset;   |
> |    size_t selectListSize;                     |        char zSelName[12];     |
> |    struct LuaExpr** selectList;               |        int addrOpenEphm[2];   |
> |                                               |        SrcList *pSrc;         |
> |    struct LuaExpr* whereClause;               |        Expr *pWhere;          |
> |                                               |        ExprList *pGroupBy;    |
> |    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;         |
> |                                               |        ExprList *pOrderBy;    |
> |    size_t setOperationCount;                  |        Select *pPrior;        |
> |    struct LuaSetOperation** setOperations;    |        Select *pNext;         |
> |                                               |        Expr *pLimit;          |
> |    size_t orderCount;                         |        Expr *pOffset;         |
> |    struct LuaOrderDescription** order;        |        With *pWith;           |
> |                                               |  };                           |
> |    size_t withDescriptionCount;               |                               |
> |    struct LuaWithDescription**                |                               |
> |           withDescriptions;                   |                               |
> |    struct LuaLimitDescription* limit;         |                               |
> | } LuaSelectStatement;                         |                               |
> |                                               |                               |
> +-----------------------------------------------+-------------------------------+
> 
> --
> Timur
> <distributed-sql-ast.html>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-13 10:06 [Tarantool-discussions] RFC - distributed SQL, step #1 - AST Timur Safin
  2020-11-13 18:59 ` m.semkin
@ 2020-11-20  0:05 ` Peter Gulutzan
  2020-11-24 21:55   ` Timur Safin
  2020-11-24 22:06 ` Timur Safin
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Gulutzan @ 2020-11-20  0:05 UTC (permalink / raw)
  To: Timur Safin; +Cc: m.semkin, mons, tarantool-discussions

Hi,

I only want to address the possible SQL syntax adjustments.

You mentioned "vshard" so I take it that we must assume use of
https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/
so I mean: what SQL adjustments can take advantage of vshard.

If we have some appropriate things in SQL, we can:
() make it appear that migration from other SQL DBMSs is easier
() partly fulfill the "distributed SQL" by really doing it in SQL.
() allow users to design tables and queries with shard awareness.
However, if I guess something looks hard, I'll say: make it illegal.

BUCKET_ID

All tuples, and most queries, need a column which contains numbers
between 1 and bucket_count, and has a non-unique index.
More flexibility would be nice but assume that's true for a year.
Since a typical SQL database will contain column values that have
different definitions, we are looking for a way that an SQL user
can declare, for each table, "here is a column that can be mapped
to bucket_id". Thus, in SQL terms, bucket_id is a GENERATED column
which is derived from other columns in the table, or system values.
For example, if column_x is an unsigned primary key, perhaps
bucket_id = mod(primary_key_value, bucket_count).
Any deterministic function will do.
Changing bucket_count will be difficult.
Changing the source column will be illegal.

Of course the initial assumption is that a router will handle
distribution questions automatically but the feature might
look better if there is clear support in the syntax.

"Distributed SQL" should mean that you can specify something in SQL.
There are some common attributes that can be specified,
and the most obvious one is: there is a choice of distribution
methods for each table.
In other words: saying
"It is distributed SQL but it's not in SQL and the only thing
you can do is distribute by a hash"
will not look like what some people think is distributed SQL.
So I am describing some additional SQL syntax
which is non-standard but very common,
with slight extra functionality that I hope will not look hard.

CREATE TABLE statement

I am suggesting a new optional clause, and a change to an
existing clause, that would at least indicate whether and how
the table is sharded.

It will show up in information_schema
and in error messages in a way that SQL users will understand.
Also, a syntax check can determine whether the clause is illegal
because there are other CREATE TABLE clauses that won't work
in combination with the new or changed clause.
(I am thinking of foreign-key and check clauses, possibly.)

CREATE TABLE ... PARTITION BY ...

Should the clause be PARTITION BY?
Of course the clause could start with the keyword SHARD, or something else.
And of course we do not wish to appear to be saying that
shards are synonymous with partitions.
But we can decide, as
PostgreSQL did when they "implemented sharding on top of partitioning"
https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/
that a partition description will affect a shard definition,
so it is okay unless we plan to do partitioning without sharding later.
And PARTITION BY is the clause that is already used in
MySQL/MariaDB
https://dev.mysql.com/doc/refman/8.0/en/create-table.html
https://mariadb.com/kb/en/create-table/#partitions
PostgreSQL:
https://www.postgresql.org/docs/11/sql-createtable.html
Oracle (though they add "You cannot partition a table that is part of a 
cluster."):
https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
DB2:
https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_createtable.html
So, although this is only part of what those other vendors do, I suggest
PARTITION BY some-keyword, and we can add more options as time goes by.

The word PARTITION is already a reserved word.

REJECTING REDSHIFT

To be fair, I must acknowledge the existence of other syntaxes,
such as SQL Server's (which allows specifying functions) and most
interestingly Amazon's. For example their ALTER TABLE options include
https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html
| ALTER DISTKEY column_name
| ALTER DISTSTYLE ALL
| ALTER DISTSTYLE EVEN
| ALTER DISTSTYLE KEY DISTKEY column_name
| ALTER DISTSTYLE AUTO
where ALL means "all nodes should get a copy of the table",
DISTKEY i.e. Distribution Key means something like partition key,
AUTO means let the system figure it out.
I like ALL but it would require a significant change to vshard rules.
I like AUTO but it is nearly useless without ALL.
Therefore, given that these options are uncommon, I ignore them.

(By the way Redshift's "leader node" can itself have tables so it
doesn't need to pass all queries to other nodes, maybe our "router"
could do something similar, but that's not an issue here.)

CREATE TABLE ... PARTITION BY HASH (partition-key-column-names);

I am assuming that we don't need to change vshard hashing or
add more hash algorithms. And I don't refer here to the way
that vshard handles bucket_id, I only expect the hash here to
provide us with a bucket_id value when partition-key-columns are known.
So perhaps it is uncontroversial that HASH (primary-key-column-names)
would be legal. But is it default and is it compulsory?

I think the answers are 'no' and 'no' but am uncertain what does it
mean if this clause is absent. It could mean
"iff the whole database is sharded then assume it"
or it could mean
"even if the database is sharded this is local and probably temporary".

It would be simplest if partition-key-column-names happened to be
a primary-key-column, because updating a primary-key-column is
illegal already. Anyway, updating partition-key-column-names is illegal.

The list of columns in CREATE TABLE must include "bucket_id" (lower case).
Perhaps we could assume that it exists and not show it with SELECT *.
But I think it is useful to know its value and useful to be able to
specify it anywhere in the column list. So now an example:

CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 STRING)
PARTITION BY HASH (column_2);
INSERT INTO t VALUES (1, 'A');
SELECT * FROM t;
Result: 1, 41, 'A'.
Notice assumption: when inserting, I don't need to specify "bucket_id".
Notice assumption: bucket_id = code point of first letter of string.
Maybe those are bad assumptions, I just want the example to be easy.

I suppose that PARTITION BY clause comes after WITH clause,
as in MySQL where partition options come after table options.

CREATE TABLE ... PARTITION BY RANGE (expression)
PARTITION partition_name VALUES LESS THAN (value) [, ... ]);

This will be a bit more detailed than the first CREATE TABLE,
and will allow the user more control over what bucket a row goes to,
but will not be default and maybe will not be necessary.

An example could be
CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 STRING)
PARTITION BY RANGE (column_2)
PARTITION p1 VALUES LESS THAN ('J'),
PARTITION p2 VALUES LESS THAN ('Z');
The result would be that there are only two possible bucket_id values,
1 for 'p1', and 2 for 'p2'.

Notice that this definition means that column_2 will always be less
than 'Z', so an additional clause
CHECK (column_2 < 'Y') would be redundant
and an additional clause
CHECK (column_2 > 'Д') should be illegal.

It may be useful that this means there is a one-to-one association
between partition_name and bucket_id value, as we will see later.

It may not be useful that this means a column with monotonically
increasing values will tend to be cause re-use of the same partition.

FOREIGN KEYS

The good news about foreign keys is:
if the foreign key and the referenced key are also partition keys
and the partition clauses are the same, then rows with the same
values will be in the same buckets. In such cases we can guarantee
that INSERT/UPDATE/DELETE will work without needing multiple nodes,
and equijoins will work if they are ANDED with single-row
selections of either key.

The bad news is:
Otherwise there has to be at least one lookup which might
be for a value in a different bucket, and the
ON UPDATE | ON DELETE clauses cause changes of values in
different buckets. That is, they are "multi-row".

So the recommendation is: let foreign keys be illegal.

TABLE REFERENCES

Now that we know there is sharding, we can change
FROM table_name
to
FROM table_name [PARTITION (partition_name)]
which would override router decisions because p1 is associated
with bucket.

This is manual "pruning".
"Pruning" can mean "only go to the necessary partitions".
Usually we hope an optimizer will do it, but let's not assume so.

In some systems one can say PARTITION (expression) instead,
but that's unnecessary, I think.

Table references can have PARTITION clauses in SELECT,
INSERT, UPDATE, DELETE, REPLACE, and perhaps other statements.
But they should be unnecessary for single-row INSERTs,
because single-row INSERTs will always contain the partition-key
values and therefore bucket_id is known already and therefore
passing partition name would be redundant.

INSERT

As stated, INSERT should be easy because bucket_id is easy
to calculate from the rest of the statement.

On the other hand, INSERT ... SELECT is hard, so I suggest
that it should be illegal in the first release.

UPDATE, SELECT, DELETE

Either these statements should have a partition name or they should
have a WHERE clause that contains the exact phrase
partition_key_value = literal
so that even a very simple program can calculate bucket_id.
Otherwise they should be illegal in the  first release.

Updating partition-key columns is illegal too.

ALTER

When ALTER ... DROP is illegal, it should be illegal to drop the 
"_bucket" column
or any column used for partitioning.

DROP INDEX

It should be illegal to drop indexes on the _bucket column.
It probably should be legal to drop indexes on columns used for 
partitioning.

INFORMATION_SCHEMA

The _TABLES table should show the partition definitions.
The _COLUMNS table should show whether a column is in a partition key.
Minimally, a user or a program should be able to see enough to decide
whether manual pruning is possible for a simple query.

Actually I suppose this is my job, since the only implementation of
INFORMATION_SCHEMA is the series of functions that I wrote for the
user manual. Remember that one of those functions will crash until
a certain server bug is fixed.

If it's desirable, I can write other routines that put bucket
information in INFORMATION_SCHEMA. That's easy for anyone to do,
it merely involves reading system tables and calling Lua functions.
However, I believe the display should be as a table -- that's what
any SQL person would expect -- so for example suppose we had this.
(not a system table but similar to an example in the manual):
tarantool> cfg = {
          >   servers = {
          >     { uri = 'host1:33131', zone = '1' },
          >     { uri = 'host2:33131', zone = '2' }
          >   },
          >   login = 'test_user',
          >   password = 'pass',
          >   redundancy = '2'
          > }
I want this display if I select:
| --------- + -------- + ---------- + ------------ + ------------- + 
-------------+ --------------|
| login     | password | redundancy | server_1_uri | server_1_zone | 
server_2_uri | server_2_zone |
| --------- + -------- + ---------- + ------------ + ------------- + 
-------------+ --------------|
| test_user | pass     | 2          | host:33131   | 1             | 
host2:33131  | 2             |
| --------- + -------- + ---------- + ------------ + ------------- + 
-------------+ --------------|
This is not a server issue, it is a client feature.

EXPLAIN | ANALYZE

I expect that the simple phrase "this can be done on a single node"
is the most useful thing that can be said. More is not necessary.

ON CLAUSE AND WHERE CLAUSE

These are the places where we're most likely to see joins or subqueries.
Suppose they require a search of more than one partition/bucket/node/etc.
We could cause a warning if someday we support warnings.
We could cause an error if the user fails to add a keyword (like FORCE)
to indicate an understanding of the possible consequences.
We could require 'admin' privilege to do slow queries of any kind.
But maybe we don't need to do anything, it might not be a big problem.

I believe, though, that everyone will be happier if we say:
joins and subqueries are illegal in the first release.

I could add more details but am unsure whether any of this is
the sort of feedback that you expected.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-13 18:59 ` m.semkin
@ 2020-11-24 21:52   ` Timur Safin
  0 siblings, 0 replies; 8+ messages in thread
From: Timur Safin @ 2020-11-24 21:52 UTC (permalink / raw)
  To: 'm.semkin'; +Cc: mons, tarantool-discussions

: From: m.semkin <m.semkin@corp.mail.ru>
: Subject: Re: RFC - distributed SQL, step #1 - AST
: 
: I wouldn’t separate combiner nodes from storage nodes. I think combining
: should always be done on storage nodes. The aim of combining is to perform
: the first level of aggregation in order to:
: 
: * reduce the network traffic
: * not to overfill router's OM
: * increase parallelism
: 
: So I think the most suitable place for combining is storage nodes.

I thought that there is theoretical possibility to introduce several
intermediate combine steps for aggregating intermediate results for
multi-steps request. And for that we would need to introduce combiner
nodes which are like front-end router, but serving only subrequests.

But on a second thought I'd rather agree with you:
- it's front-end router responsibility to plan cluster execution, and
  Split query to subqueries if there is need;
- and there is no need to transfer raw data to the different node if 
  combine step might be done using this storage SQL facilties. i.e.
  local aggregation (combine) is done at the storage node. Indeed.


Agreed, thanks for note!

Timur


: 
: > On 13 Nov 2020, at 13:06, Timur Safin <tsafin@tarantool.org> wrote:
: >
: > Distributed SQL: the first step - AST for parser
: > ===============================================
: >
: > Summary
: > -------
: >
: > There is preliminary decision that we try to approach distributed SQL as
: > next big thing which come in 2021. This is a long project, which will be
: > approached gradually. Here we will try to observe all necessary steps to
: > be done both in long term and short term period of time. Longer terms
: goals
: > will be described briefly, but shorter term goal (extracting of AST from
: > SQL parser) will be given in more details, we could do it because of
: current
: > set of Tarantool capabilities available and having some PoC already
: developed.
: >
: > Bear in mind, that for longer term goals, later version of this RFC will
: try
: > to collect wider relevant information, showing some (quiet random)
: industry
: > precedents (but that the list of industry examples will neither be
: scientific,
: > nor complete).
: >
: >
: > Vocabulary:
: > -----------
: >
: > We use standard MapReduce vocabulary when we talks about roles of
: > cluster nodes involved to the SQL queries processing.
: >
: > +---------------+-----------------------------------------------------+
: > | Router(s)     | The node which processes queries and send to the    |
: > |               | corresponding storage nodes for local processing.   |
: > |               | It combines/reduces resultant data and sends it     |
: > |               | back to client                                      |
: > +===============+=====================================================+
: > | Combiner(s)   | Depending on the aggregation complexity needs there |
: > |               | may be several intermediate nodes which combine     |
: > |               | (aggregate) intermediate data and send it back      |
: > +---------------+-----------------------------------------------------+
: > | Storage nodes |                                                     |
: > +---------------+-----------------------------------------------------+
: >
: >
: >
: > Distributed SQL scenario
: > ------------------------
: >
: > Once we move from single node case to multiple node case for SQL
: > execution all kinds of intra-node data exchange arise. Once we get
: > original SQL to the router node, it's expected that router would
: > preparse SQL query, (massage them appropriately) and then send some
: > command data to storage nodes for their local execution.
: >
: > **The question is** - what format of data should we send to storage node?
: >
: > We might try to send to the storage node the compiled binary VDBE
: > byte-code, but it looks to be a bad idea for several reasons:
: >
: > 1. Vdbe is not yet frozen and due to the technology used (lemon parser
: >   with on the fly generation of constants) it might differ very
: >   much between various versions even for builds from the same branch.
: >   Though, if we have to, we could take some extra measures to stabilize
: >   values of tokens generated;
: >
: > 2. But bigger problem is - different data distribution on different shard
: >   nodes in the cluster. Which may require different query plans used
: >   for the same SQL query. If we would generate blindly the single
: >   byte code for received SQL then we may degrade performance comparing
: >   to the case when bytecode would be generated locally, for each modes
: >   separately, taking local heuristics.
: >
: > So at the moment simpler approach would be more preferable:
: >
: > -  We simple transfer (modified) SQL query string to each of shard node
: >   involved;
: >
: > -  Or we could transfer AST serialized to some kind of binary form;
: >
: > We suggest, that in the 1st stage, take the simplest approach - transfer
: > simple SQL query in their textual form.
: >
: > Distributed SQL in the InterSystems IRIS
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > (TBD)
: >
: >
: > Distributed SQL in MemSQL/SingleStore
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
: > (TBD)
: >
: >
: > Distributed SQL in MariaDB SkySQL
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > https://mariadb.com/products/skysql/
: > (TBD)
: >
: >
: > Distributed SQL in Yugabyte
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > (TBD)
: >
: >
: > Mike Siomkin' distributed SQL PoC
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > There is working and deployed proof-of-concept project which has been
: > implemented by Mike Siomkin, and which implements distributed SQL query
: > concept using currently available Tarantool facilities.
: >
: > .. note::
: >
: >   There are some obvious limitations though, but it further proves
: >   the point that with relatively small efforts, restricted distributed
: >   SQL processing might be implemented in current Tarantool within
: >   relatively short time frame.
: >
: > For preliminary parsing of SQL queries Mike' code is using SQLParser
: > LuaRocks (https://github.com/tarantool/sqlparser) module which is
: > wrapping HyRise SQL parser implemented in C++
: > (https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
: > building abstract-syntax trees (AST).
: >
: > The intermediate part between cluster controller at the Tarantool side
: > and SQL parser is gridql.lua module. This is gridql responsibility to
: > parse SQL, analyze resultant AST, and enrich it appropriately for
: > aggregate functions, and pagination support. *I.e. queries sent to
: > storage node will be different to the original SQL query*, and will be
: > different to the query executed by combiner/reducer node.
: >
: >
: >
: > The used sql-parser module exports only 2 methods: parse(query), and
: > tostring(ast).
: >
: > -  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
: >   parser mechanics and returns AST tree as ffi structure
: >   LuaSQLParseResult, which in turns, composed of series of
: >   LuaSQLStatement-based objects, which might be of various types
: >   (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
: >   kStmtDelete, etc.), each of them could be attributed different
: >   set of data, including LuaExpr lists of various kinds;
: >
: > -  `sqlparser.tostring(ast)` stringifies the passed AST object;
: >
: > Despite the fact that Hyrise SQL parser has *no knowledge about builtin
: > SQL functions* supported by Tarantool SQL, it's parsing facilities are
: > enough for AST tree traversal, because any known name is marked as
: > identifier of function, which is a good enough to start of SQL processing
: > in gridql module.
: >
: > Local SQL execution is being done using builtin Tarantool SQL engine,
: > thus such lack of functions knowledge is not a problem, iff we pass
: > transparently SQL query down to the node processor.
: >
: > Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
: > *supports only `SELECT`s*, and not handles any other kinds of requests
: > (i.e. `UPDATE`).
: >
: > Unfortunately, gridql, at their current form, could not be published due
: > to heavy usage of customer specific virtual tables, but there are claims
: > that it's possible to generalize and simplify code,
: > so it might be used elsewhere beyond current context.
: >
: >
: > Long-term goals
: > ---------------
: >
: > So, having many industry precendents we see that ideally, for distributed
: > SQL we have to have:
: >
: > -  Some kind of router accepts SQL query, and then preparses it to some
: >   kind of intermediate representation (AST);
: >
: > -  Topology aware query planner analyses parsed query and having knowledge
: >   of data distribution it sends parsed "AST" subqueries to only those
: >   nodes, which has relevant data. If there is no data locality known
: >   then all cluster involved via Map-Combine-Reduce operation;
: >
: > -  Query might be split into inner subqueries for which stages would be
: >   planned and executed separately;
: >
: > -  If transactions are not read only then cluster wide transaction
: >   manager / conflict manager to be involved for 2PC mechanics
: >   coordination;
: >
: > -  And it would be easier if distributed SQL module should work even
: >   for single-node config (with or without vshard involved) for
: >   simpler debugging purposes;
: >
: > Timings for these long-term plans are not yet known, but at the moment
: > we believe that the nearest subjects should be:
: >
: > 1. SQL parser refactoring to saving AST (with serialization and
: >   deserialization if necessary);
: >
: > 2. And transaction/conflict manager should be extended with cluster
: >   wide transaction support, to make possible next steps of queries
: >   beyond simple `SELECT`s;
: >
: > 2nd item is not SQL-specific, and will be handled elsewhere separately,
: > this RFC we will continue to talk about SQL only plans;
: >
: >
: > Short-term distributed SQL plan
: > -------------------------------
: >
: > At the moment parser, byte-code generation, and query execution is
: > tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
: > architecture largely inherited by Tarantool from SQLite parser. And such
: > close coupling might become a road-blocker for us in the longer term, when
: > we would have to go to different nodes.
: >
: > If we properly split query parser and query execution logics we will
: > simplify configuration, making it easier to approach distributed SQL.
: >
: > -  So, for the 1st, obvious step - we would need to create a
: >   separate/builtin module tarantool-sqlparser which would wrap SQL
: >   parsing in `parse(query)` method in a fashion similar to Mike
: >   Siomkin' `sqlparser` module above;
: >
: > -  The `sql.parse(query)` method would need to return AST data structures,
: >   exported via ffi.
: >
: >   -  At the moment only SELECT and VIEW queries build AST during
: >      parsing, this is major limitation, which would require some extra
: >      refactoring later, but it's ok for the 1st stage.
: >
: >   -  For the 2nd stage we would need to extend AST with more SQL
: >      statement types, e.g. UPDATE / DELETE.
: >
: >   -  Worth to mention, that current AST structures as defined in the
: >      `sqlint.h` are quite similar to that used in Mike' sqlparser
: >      module - for more details see comparison of LuaDataTypes.h to
: >      sqlint.h in the Appendix A below;
: >
: > -  As we build AST we may enrich returned AST nodes with information
: >   about builtin functions kinds and expression data types, specific
: >   for our SQL parser;
: >
: > -  So in addition to currently available ways to run SQL queries via:
: >
: >   -  direct `box.execute`,
: >
: >   -  Or 2 step `box.prepare + box.execute`
: >
: >   -  we would add `sql.parse` method, similar to `box.prepare`,
: >      which should be similarly executable via `box.prepare + box.execute`;
: >
: > -  This refactoring with separation of parse step, should still maintain
: >   fully working SQL execution cycle, i.e. with minimum code
: >   modifications all relevant SQL queries should pass whole
: >   Tarantool SQL test suite. (This might apply only to SELECT
: >   queries, for stage 1);
: >
: >
: >
: > Distributed testing scenario
: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: >
: > -  Assumption is that decoupled sql.parse method, should be powerful
: >   enough to be able replace HyRise SQL parser in the gridql
: >   proof-of-concept code. Once being published it will be another
: >   indicator of intermediate success if it will be working with
: >   Tarantool SQL parser module. (Mike is planning to publish cleaned
: >   up code soon);
: >
: >   -  There is no need though in gridql for anything beyond simple
: >      SELECTs, which makes possible to create 1st implementation
: >      without major refactorings in Tarantool SQL parser, having only
: >      current data structures and code;
: >
: >   -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
: >      will be done at stage #2, probably the next quarter, and it's
: >      not a goal for current plan;
: >
: >   -  i.e. we start with only read-only (SELECT and VIEW) queries, and
: >      not support RW operations yet;
: >
: > -  INSERT/UPDATE/DELETE queries will be done afterward, once we have
: >   distributed conflict manager and transaction managers implemented.
: >   And it's subject to coordinated efforts with Alexander Lyapunov team;
: >
: >
: >
: > Appendix A - AST data structures
: > --------------------------------
: >
: > +-----------------------------------------------+-------------------------
: ------+
: > | ``StatementType::kStmtSelect``                |
: ``ast_type::AST_TYPE_SELECT`` |
: >
: +===============================================+===========================
: ====+
: > |.. code:: c                                    |
: |
: > |                                               |
: |
: > | typedef struct {                              | **(there is none, yet)**
: |
: > |    bool isValid;                              |
: |
: > |    char* errorMsg;                            |
: |
: > |    int errorLine;                             |
: |
: > |    int errorColumn;                           |
: |
: > |    size_t statementCount;                     |
: |
: > |    struct LuaSQLStatement** statements;       |
: |
: > | } LuaSQLParserResult;                         |
: |
: > |                                               |
: |
: > +-----------------------------------------------+-------------------------
: ------+
: > |.. code:: c                                    |.. code:: c
: |
: > |                                               |
: |
: > | typedef struct LuaSelectStatement {           |  struct Select {
: |
: > |    struct LuaSQLStatement base;               |        ExprList *pEList;
: |
: > |                                               |        u8 op;
: |
: > |    struct LuaTableRef* fromTable;             |        LogEst
: nSelectRow;     |
: > |    bool selectDistinct;                       |        u32 selFlags;
: |
: > |                                               |        int iLimit,
: iOffset;   |
: > |    size_t selectListSize;                     |        char
: zSelName[12];     |
: > |    struct LuaExpr** selectList;               |        int
: addrOpenEphm[2];   |
: > |                                               |        SrcList *pSrc;
: |
: > |    struct LuaExpr* whereClause;               |        Expr *pWhere;
: |
: > |                                               |        ExprList
: *pGroupBy;    |
: > |    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;
: |
: > |                                               |        ExprList
: *pOrderBy;    |
: > |    size_t setOperationCount;                  |        Select *pPrior;
: |
: > |    struct LuaSetOperation** setOperations;    |        Select *pNext;
: |
: > |                                               |        Expr *pLimit;
: |
: > |    size_t orderCount;                         |        Expr *pOffset;
: |
: > |    struct LuaOrderDescription** order;        |        With *pWith;
: |
: > |                                               |  };
: |
: > |    size_t withDescriptionCount;               |
: |
: > |    struct LuaWithDescription**                |
: |
: > |           withDescriptions;                   |
: |
: > |    struct LuaLimitDescription* limit;         |
: |
: > | } LuaSelectStatement;                         |
: |
: > |                                               |
: |
: > +-----------------------------------------------+-------------------------
: ------+
: >
: > --
: > Timur
: > <distributed-sql-ast.html>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-20  0:05 ` Peter Gulutzan
@ 2020-11-24 21:55   ` Timur Safin
  0 siblings, 0 replies; 8+ messages in thread
From: Timur Safin @ 2020-11-24 21:55 UTC (permalink / raw)
  To: 'Peter Gulutzan'; +Cc: m.semkin, mons, tarantool-discussions

Thanks for your ideas, Peter!

We are not yet ready to discuss in all details distributed SQL 
modifications we would need to introduce for cluster execution,
because we are in much earlier stages of development, and need 
to proceed the 1st step yet - extracting AST from SQL parse tree.

But I promise, we will return back to this discussion, once 
we would go beyond the single node. I'll return back then
more prepared.

Thanks,
Timur

: From: Peter Gulutzan <pgulutzan@ocelot.ca>
: Subject: Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
: 
: Hi,
: 
: I only want to address the possible SQL syntax adjustments.
: 
: You mentioned "vshard" so I take it that we must assume use of
: https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/
: so I mean: what SQL adjustments can take advantage of vshard.
: 
: If we have some appropriate things in SQL, we can:
: () make it appear that migration from other SQL DBMSs is easier
: () partly fulfill the "distributed SQL" by really doing it in SQL.
: () allow users to design tables and queries with shard awareness.
: However, if I guess something looks hard, I'll say: make it illegal.
: 
: BUCKET_ID
: 
: All tuples, and most queries, need a column which contains numbers
: between 1 and bucket_count, and has a non-unique index.
: More flexibility would be nice but assume that's true for a year.
: Since a typical SQL database will contain column values that have
: different definitions, we are looking for a way that an SQL user
: can declare, for each table, "here is a column that can be mapped
: to bucket_id". Thus, in SQL terms, bucket_id is a GENERATED column
: which is derived from other columns in the table, or system values.
: For example, if column_x is an unsigned primary key, perhaps
: bucket_id = mod(primary_key_value, bucket_count).
: Any deterministic function will do.
: Changing bucket_count will be difficult.
: Changing the source column will be illegal.
: 
: Of course the initial assumption is that a router will handle
: distribution questions automatically but the feature might
: look better if there is clear support in the syntax.
: 
: "Distributed SQL" should mean that you can specify something in SQL.
: There are some common attributes that can be specified,
: and the most obvious one is: there is a choice of distribution
: methods for each table.
: In other words: saying
: "It is distributed SQL but it's not in SQL and the only thing
: you can do is distribute by a hash"
: will not look like what some people think is distributed SQL.
: So I am describing some additional SQL syntax
: which is non-standard but very common,
: with slight extra functionality that I hope will not look hard.
: 
: CREATE TABLE statement
: 
: I am suggesting a new optional clause, and a change to an
: existing clause, that would at least indicate whether and how
: the table is sharded.
: 
: It will show up in information_schema
: and in error messages in a way that SQL users will understand.
: Also, a syntax check can determine whether the clause is illegal
: because there are other CREATE TABLE clauses that won't work
: in combination with the new or changed clause.
: (I am thinking of foreign-key and check clauses, possibly.)
: 
: CREATE TABLE ... PARTITION BY ...
: 
: Should the clause be PARTITION BY?
: Of course the clause could start with the keyword SHARD, or something else.
: And of course we do not wish to appear to be saying that
: shards are synonymous with partitions.
: But we can decide, as
: PostgreSQL did when they "implemented sharding on top of partitioning"
: https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-
: postgresql-and-how-it-relates-to-mongodbs/
: that a partition description will affect a shard definition,
: so it is okay unless we plan to do partitioning without sharding later.
: And PARTITION BY is the clause that is already used in
: MySQL/MariaDB
: https://dev.mysql.com/doc/refman/8.0/en/create-table.html
: https://mariadb.com/kb/en/create-table/#partitions
: PostgreSQL:
: https://www.postgresql.org/docs/11/sql-createtable.html
: Oracle (though they add "You cannot partition a table that is part of a
: cluster."):
: https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
: https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
: DB2:
: https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2
: z_sql_createtable.html
: So, although this is only part of what those other vendors do, I suggest
: PARTITION BY some-keyword, and we can add more options as time goes by.
: 
: The word PARTITION is already a reserved word.
: 
: REJECTING REDSHIFT
: 
: To be fair, I must acknowledge the existence of other syntaxes,
: such as SQL Server's (which allows specifying functions) and most
: interestingly Amazon's. For example their ALTER TABLE options include
: https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html
: | ALTER DISTKEY column_name
: | ALTER DISTSTYLE ALL
: | ALTER DISTSTYLE EVEN
: | ALTER DISTSTYLE KEY DISTKEY column_name
: | ALTER DISTSTYLE AUTO
: where ALL means "all nodes should get a copy of the table",
: DISTKEY i.e. Distribution Key means something like partition key,
: AUTO means let the system figure it out.
: I like ALL but it would require a significant change to vshard rules.
: I like AUTO but it is nearly useless without ALL.
: Therefore, given that these options are uncommon, I ignore them.
: 
: (By the way Redshift's "leader node" can itself have tables so it
: doesn't need to pass all queries to other nodes, maybe our "router"
: could do something similar, but that's not an issue here.)
: 
: CREATE TABLE ... PARTITION BY HASH (partition-key-column-names);
: 
: I am assuming that we don't need to change vshard hashing or
: add more hash algorithms. And I don't refer here to the way
: that vshard handles bucket_id, I only expect the hash here to
: provide us with a bucket_id value when partition-key-columns are known.
: So perhaps it is uncontroversial that HASH (primary-key-column-names)
: would be legal. But is it default and is it compulsory?
: 
: I think the answers are 'no' and 'no' but am uncertain what does it
: mean if this clause is absent. It could mean
: "iff the whole database is sharded then assume it"
: or it could mean
: "even if the database is sharded this is local and probably temporary".
: 
: It would be simplest if partition-key-column-names happened to be
: a primary-key-column, because updating a primary-key-column is
: illegal already. Anyway, updating partition-key-column-names is illegal.
: 
: The list of columns in CREATE TABLE must include "bucket_id" (lower case).
: Perhaps we could assume that it exists and not show it with SELECT *.
: But I think it is useful to know its value and useful to be able to
: specify it anywhere in the column list. So now an example:
: 
: CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 STRING)
: PARTITION BY HASH (column_2);
: INSERT INTO t VALUES (1, 'A');
: SELECT * FROM t;
: Result: 1, 41, 'A'.
: Notice assumption: when inserting, I don't need to specify "bucket_id".
: Notice assumption: bucket_id = code point of first letter of string.
: Maybe those are bad assumptions, I just want the example to be easy.
: 
: I suppose that PARTITION BY clause comes after WITH clause,
: as in MySQL where partition options come after table options.
: 
: CREATE TABLE ... PARTITION BY RANGE (expression)
: PARTITION partition_name VALUES LESS THAN (value) [, ... ]);
: 
: This will be a bit more detailed than the first CREATE TABLE,
: and will allow the user more control over what bucket a row goes to,
: but will not be default and maybe will not be necessary.
: 
: An example could be
: CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 STRING)
: PARTITION BY RANGE (column_2)
: PARTITION p1 VALUES LESS THAN ('J'),
: PARTITION p2 VALUES LESS THAN ('Z');
: The result would be that there are only two possible bucket_id values,
: 1 for 'p1', and 2 for 'p2'.
: 
: Notice that this definition means that column_2 will always be less
: than 'Z', so an additional clause
: CHECK (column_2 < 'Y') would be redundant
: and an additional clause
: CHECK (column_2 > 'Д') should be illegal.
: 
: It may be useful that this means there is a one-to-one association
: between partition_name and bucket_id value, as we will see later.
: 
: It may not be useful that this means a column with monotonically
: increasing values will tend to be cause re-use of the same partition.
: 
: FOREIGN KEYS
: 
: The good news about foreign keys is:
: if the foreign key and the referenced key are also partition keys
: and the partition clauses are the same, then rows with the same
: values will be in the same buckets. In such cases we can guarantee
: that INSERT/UPDATE/DELETE will work without needing multiple nodes,
: and equijoins will work if they are ANDED with single-row
: selections of either key.
: 
: The bad news is:
: Otherwise there has to be at least one lookup which might
: be for a value in a different bucket, and the
: ON UPDATE | ON DELETE clauses cause changes of values in
: different buckets. That is, they are "multi-row".
: 
: So the recommendation is: let foreign keys be illegal.
: 
: TABLE REFERENCES
: 
: Now that we know there is sharding, we can change
: FROM table_name
: to
: FROM table_name [PARTITION (partition_name)]
: which would override router decisions because p1 is associated
: with bucket.
: 
: This is manual "pruning".
: "Pruning" can mean "only go to the necessary partitions".
: Usually we hope an optimizer will do it, but let's not assume so.
: 
: In some systems one can say PARTITION (expression) instead,
: but that's unnecessary, I think.
: 
: Table references can have PARTITION clauses in SELECT,
: INSERT, UPDATE, DELETE, REPLACE, and perhaps other statements.
: But they should be unnecessary for single-row INSERTs,
: because single-row INSERTs will always contain the partition-key
: values and therefore bucket_id is known already and therefore
: passing partition name would be redundant.
: 
: INSERT
: 
: As stated, INSERT should be easy because bucket_id is easy
: to calculate from the rest of the statement.
: 
: On the other hand, INSERT ... SELECT is hard, so I suggest
: that it should be illegal in the first release.
: 
: UPDATE, SELECT, DELETE
: 
: Either these statements should have a partition name or they should
: have a WHERE clause that contains the exact phrase
: partition_key_value = literal
: so that even a very simple program can calculate bucket_id.
: Otherwise they should be illegal in the  first release.
: 
: Updating partition-key columns is illegal too.
: 
: ALTER
: 
: When ALTER ... DROP is illegal, it should be illegal to drop the
: "_bucket" column
: or any column used for partitioning.
: 
: DROP INDEX
: 
: It should be illegal to drop indexes on the _bucket column.
: It probably should be legal to drop indexes on columns used for
: partitioning.
: 
: INFORMATION_SCHEMA
: 
: The _TABLES table should show the partition definitions.
: The _COLUMNS table should show whether a column is in a partition key.
: Minimally, a user or a program should be able to see enough to decide
: whether manual pruning is possible for a simple query.
: 
: Actually I suppose this is my job, since the only implementation of
: INFORMATION_SCHEMA is the series of functions that I wrote for the
: user manual. Remember that one of those functions will crash until
: a certain server bug is fixed.
: 
: If it's desirable, I can write other routines that put bucket
: information in INFORMATION_SCHEMA. That's easy for anyone to do,
: it merely involves reading system tables and calling Lua functions.
: However, I believe the display should be as a table -- that's what
: any SQL person would expect -- so for example suppose we had this.
: (not a system table but similar to an example in the manual):
: tarantool> cfg = {
:           >   servers = {
:           >     { uri = 'host1:33131', zone = '1' },
:           >     { uri = 'host2:33131', zone = '2' }
:           >   },
:           >   login = 'test_user',
:           >   password = 'pass',
:           >   redundancy = '2'
:           > }
: I want this display if I select:
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: | login     | password | redundancy | server_1_uri | server_1_zone |
: server_2_uri | server_2_zone |
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: | test_user | pass     | 2          | host:33131   | 1             |
: host2:33131  | 2             |
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: This is not a server issue, it is a client feature.
: 
: EXPLAIN | ANALYZE
: 
: I expect that the simple phrase "this can be done on a single node"
: is the most useful thing that can be said. More is not necessary.
: 
: ON CLAUSE AND WHERE CLAUSE
: 
: These are the places where we're most likely to see joins or subqueries.
: Suppose they require a search of more than one partition/bucket/node/etc.
: We could cause a warning if someday we support warnings.
: We could cause an error if the user fails to add a keyword (like FORCE)
: to indicate an understanding of the possible consequences.
: We could require 'admin' privilege to do slow queries of any kind.
: But maybe we don't need to do anything, it might not be a big problem.
: 
: I believe, though, that everyone will be happier if we say:
: joins and subqueries are illegal in the first release.
: 
: I could add more details but am unsure whether any of this is
: the sort of feedback that you expected.
: 
: Peter Gulutzan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-13 10:06 [Tarantool-discussions] RFC - distributed SQL, step #1 - AST Timur Safin
  2020-11-13 18:59 ` m.semkin
  2020-11-20  0:05 ` Peter Gulutzan
@ 2020-11-24 22:06 ` Timur Safin
  2020-11-25  9:44   ` Nikita Pettik
  2 siblings, 1 reply; 8+ messages in thread
From: Timur Safin @ 2020-11-24 22:06 UTC (permalink / raw)
  To: mons, 'Kirill Yukhin', 'korablev',
	'Vladislav Shpilevoy', 'Sergey Ostanevich',
	Alexander Turenko
  Cc: m.semkin, tarantool-discussions

First I need to update those who was not involved in our internal 
discussions - idea to serialize to SQL for sending commands to 
data nodes have been rejected, and we go full steam with normal
AST way:
- create AST for SQL query;
- serialize it/deserialize once we are about to transfer it elsewhere;
- There were doubts about complexities of implementing AST for 
  statements beyond currently implemented SELECT/VIEW and triggers.
  But more experienced colleagues believe that it will be easy to
  create AST for them also, so we will try to implement it also.

Also, Mons wanted to look into code examples how AST manipulations
will look like, so please see additional appendix created. 

Exporting AST as cdata is easy to do, but in this case it will be 
inconvenient to manipulate AST nodes, before sending to cluster nodes.
So there is 2nd, more idiomatic approach suggested - to convert AST
to nested Lua object/tables. Which should simplify some massaging
and provide natural way to serialization to msgpack.

Appendix B - programmatic interfaces
------------------------------------

SYNOPSIS
~~~~~~~~

.. code-block:: lua

   local sql = require `sqlparser`

   -- parse, return ast, pass it back unmodified for execution

   local ast = sql.parse [[ select * from "table" where id > 10 limit 10 ]]
   assert(type(ast) == 'cdata')
   local ok = sql.execute(ast)

   -- free allocated memory, like box.unprepare but for ast
   sql.unparse(ast)

   -- raw access to cdata structures
   local cdata = ast.as_cdata()
   if cdata.ast_type == ffi.C.AST_TYPE_SELECT
      handle_select_stmt(cdata.select)
   end

   -- Lua access to structurs as Lua tables
   local tree = ast.as_table()
   if tree.type == AST_TYPE_SELECT
      handle_columns_list(tree.select.columns)
      handle_where_clause(tree.select.where)
      limit = tree.select.limit
   end
   -- massaging with tree data

   -- serialization
   local msgpack = require 'msgpack'
   local to_wire = msgpack.encode(tree)

   -- networking magics ...
   -- ... deserialization
   local table = msgpack.decode(from_wire)
   ast.set_tree(tree)
   sql.execute(ast)

Regards,
Timur

P.S.

Sorry for top-post...

: From: Tarantool-discussions <tarantool-discussions-
: Subject: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
: 
: Distributed SQL: the first step - AST for parser
: ===============================================
: 
: Summary
: -------
: 
: There is preliminary decision that we try to approach distributed SQL as
: next big thing which come in 2021. This is a long project, which will be
: approached gradually. Here we will try to observe all necessary steps to
: be done both in long term and short term period of time. Longer terms goals
: will be described briefly, but shorter term goal (extracting of AST from
: SQL parser) will be given in more details, we could do it because of current
: set of Tarantool capabilities available and having some PoC already
: developed.
: 
: Bear in mind, that for longer term goals, later version of this RFC will try
: to collect wider relevant information, showing some (quiet random) industry
: precedents (but that the list of industry examples will neither be
: scientific,
: nor complete).
: 
: 
: Vocabulary:
: -----------
: 
: We use standard MapReduce vocabulary when we talks about roles of
: cluster nodes involved to the SQL queries processing.
: 
: +---------------+-----------------------------------------------------+
: | Router(s)     | The node which processes queries and send to the    |
: |               | corresponding storage nodes for local processing.   |
: |               | It combines/reduces resultant data and sends it     |
: |               | back to client                                      |
: +===============+=====================================================+
: | Combiner(s)   | Depending on the aggregation complexity needs there |
: |               | may be several intermediate nodes which combine     |
: |               | (aggregate) intermediate data and send it back      |
: +---------------+-----------------------------------------------------+
: | Storage nodes |                                                     |
: +---------------+-----------------------------------------------------+
: 
: 
: 
: Distributed SQL scenario
: ------------------------
: 
: Once we move from single node case to multiple node case for SQL
: execution all kinds of intra-node data exchange arise. Once we get
: original SQL to the router node, it's expected that router would
: preparse SQL query, (massage them appropriately) and then send some
: command data to storage nodes for their local execution.
: 
: **The question is** - what format of data should we send to storage node?
: 
: We might try to send to the storage node the compiled binary VDBE
: byte-code, but it looks to be a bad idea for several reasons:
: 
: 1. Vdbe is not yet frozen and due to the technology used (lemon parser
:    with on the fly generation of constants) it might differ very
:    much between various versions even for builds from the same branch.
:    Though, if we have to, we could take some extra measures to stabilize
:    values of tokens generated;
: 
: 2. But bigger problem is - different data distribution on different shard
:    nodes in the cluster. Which may require different query plans used
:    for the same SQL query. If we would generate blindly the single
:    byte code for received SQL then we may degrade performance comparing
:    to the case when bytecode would be generated locally, for each modes
:    separately, taking local heuristics.
: 
: So at the moment simpler approach would be more preferable:
: 
: -  We simple transfer (modified) SQL query string to each of shard node
:    involved;
: 
: -  Or we could transfer AST serialized to some kind of binary form;
: 
: We suggest, that in the 1st stage, take the simplest approach - transfer
: simple SQL query in their textual form.
: 
: Distributed SQL in the InterSystems IRIS
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: (TBD)
: 
: 
: Distributed SQL in MemSQL/SingleStore
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
:  (TBD)
: 
: 
: Distributed SQL in MariaDB SkySQL
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: https://mariadb.com/products/skysql/
: (TBD)
: 
: 
: Distributed SQL in Yugabyte
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: (TBD)
: 
: 
: Mike Siomkin' distributed SQL PoC
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: There is working and deployed proof-of-concept project which has been
: implemented by Mike Siomkin, and which implements distributed SQL query
: concept using currently available Tarantool facilities.
: 
: .. note::
: 
:    There are some obvious limitations though, but it further proves
:    the point that with relatively small efforts, restricted distributed
:    SQL processing might be implemented in current Tarantool within
:    relatively short time frame.
: 
: For preliminary parsing of SQL queries Mike' code is using SQLParser
: LuaRocks (https://github.com/tarantool/sqlparser) module which is
: wrapping HyRise SQL parser implemented in C++
: (https://github.com/hyrise/sql-parser) for parsing given SQL queries, and
: building abstract-syntax trees (AST).
: 
: The intermediate part between cluster controller at the Tarantool side
: and SQL parser is gridql.lua module. This is gridql responsibility to
: parse SQL, analyze resultant AST, and enrich it appropriately for
: aggregate functions, and pagination support. *I.e. queries sent to
: storage node will be different to the original SQL query*, and will be
: different to the query executed by combiner/reducer node.
: 
: 
: 
: The used sql-parser module exports only 2 methods: parse(query), and
: tostring(ast).
: 
: -  `sqlparser.parse(q)` uses ffi function parseSql, which wraps hyrise SQL
:    parser mechanics and returns AST tree as ffi structure
:    LuaSQLParseResult, which in turns, composed of series of
:    LuaSQLStatement-based objects, which might be of various types
:    (e.g. kStmtSelect,  kStmtImport,  kStmtInsert,  kStmtUpdate,
:    kStmtDelete, etc.), each of them could be attributed different
:    set of data, including LuaExpr lists of various kinds;
: 
: -  `sqlparser.tostring(ast)` stringifies the passed AST object;
: 
: Despite the fact that Hyrise SQL parser has *no knowledge about builtin
: SQL functions* supported by Tarantool SQL, it's parsing facilities are
: enough for AST tree traversal, because any known name is marked as
: identifier of function, which is a good enough to start of SQL processing
: in gridql module.
: 
: Local SQL execution is being done using builtin Tarantool SQL engine,
: thus such lack of functions knowledge is not a problem, iff we pass
: transparently SQL query down to the node processor.
: 
: Hyrise knowns all kinds of SQL queries, but at the moment gridql modules
: *supports only `SELECT`s*, and not handles any other kinds of requests
: (i.e. `UPDATE`).
: 
: Unfortunately, gridql, at their current form, could not be published due
: to heavy usage of customer specific virtual tables, but there are claims
: that it's possible to generalize and simplify code,
: so it might be used elsewhere beyond current context.
: 
: 
: Long-term goals
: ---------------
: 
: So, having many industry precendents we see that ideally, for distributed
: SQL we have to have:
: 
: -  Some kind of router accepts SQL query, and then preparses it to some
:    kind of intermediate representation (AST);
: 
: -  Topology aware query planner analyses parsed query and having knowledge
:    of data distribution it sends parsed "AST" subqueries to only those
:    nodes, which has relevant data. If there is no data locality known
:    then all cluster involved via Map-Combine-Reduce operation;
: 
: -  Query might be split into inner subqueries for which stages would be
:    planned and executed separately;
: 
: -  If transactions are not read only then cluster wide transaction
:    manager / conflict manager to be involved for 2PC mechanics
:    coordination;
: 
: -  And it would be easier if distributed SQL module should work even
:    for single-node config (with or without vshard involved) for
:    simpler debugging purposes;
: 
: Timings for these long-term plans are not yet known, but at the moment
: we believe that the nearest subjects should be:
: 
: 1. SQL parser refactoring to saving AST (with serialization and
:    deserialization if necessary);
: 
: 2. And transaction/conflict manager should be extended with cluster
:    wide transaction support, to make possible next steps of queries
:    beyond simple `SELECT`s;
: 
: 2nd item is not SQL-specific, and will be handled elsewhere separately,
: this RFC we will continue to talk about SQL only plans;
: 
: 
: Short-term distributed SQL plan
: -------------------------------
: 
: At the moment parser, byte-code generation, and query execution is
: tightly coupled in SQL code in Tarantool. This is side-effect of SQLite
: architecture largely inherited by Tarantool from SQLite parser. And such
: close coupling might become a road-blocker for us in the longer term, when
: we would have to go to different nodes.
: 
: If we properly split query parser and query execution logics we will
: simplify configuration, making it easier to approach distributed SQL.
: 
: -  So, for the 1st, obvious step - we would need to create a
:    separate/builtin module tarantool-sqlparser which would wrap SQL
:    parsing in `parse(query)` method in a fashion similar to Mike
:    Siomkin' `sqlparser` module above;
: 
: -  The `sql.parse(query)` method would need to return AST data structures,
:    exported via ffi.
: 
:    -  At the moment only SELECT and VIEW queries build AST during
:       parsing, this is major limitation, which would require some extra
:       refactoring later, but it's ok for the 1st stage.
: 
:    -  For the 2nd stage we would need to extend AST with more SQL
:       statement types, e.g. UPDATE / DELETE.
: 
:    -  Worth to mention, that current AST structures as defined in the
:       `sqlint.h` are quite similar to that used in Mike' sqlparser
:       module - for more details see comparison of LuaDataTypes.h to
:       sqlint.h in the Appendix A below;
: 
: -  As we build AST we may enrich returned AST nodes with information
:    about builtin functions kinds and expression data types, specific
:    for our SQL parser;
: 
: -  So in addition to currently available ways to run SQL queries via:
: 
:    -  direct `box.execute`,
: 
:    -  Or 2 step `box.prepare + box.execute`
: 
:    -  we would add `sql.parse` method, similar to `box.prepare`,
:       which should be similarly executable via `box.prepare + box.execute`;
: 
: -  This refactoring with separation of parse step, should still maintain
:    fully working SQL execution cycle, i.e. with minimum code
:    modifications all relevant SQL queries should pass whole
:    Tarantool SQL test suite. (This might apply only to SELECT
:    queries, for stage 1);
: 
: 
: 
: Distributed testing scenario
: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: 
: -  Assumption is that decoupled sql.parse method, should be powerful
:    enough to be able replace HyRise SQL parser in the gridql
:    proof-of-concept code. Once being published it will be another
:    indicator of intermediate success if it will be working with
:    Tarantool SQL parser module. (Mike is planning to publish cleaned
:    up code soon);
: 
:    -  There is no need though in gridql for anything beyond simple
:       SELECTs, which makes possible to create 1st implementation
:       without major refactorings in Tarantool SQL parser, having only
:       current data structures and code;
: 
:    -  Thus addition of AST support for DELETE, INSERT, UPDATE statements
:       will be done at stage #2, probably the next quarter, and it's
:       not a goal for current plan;
: 
:    -  i.e. we start with only read-only (SELECT and VIEW) queries, and
:       not support RW operations yet;
: 
: -  INSERT/UPDATE/DELETE queries will be done afterward, once we have
:    distributed conflict manager and transaction managers implemented.
:    And it's subject to coordinated efforts with Alexander Lyapunov team;
: 
: 
: 
: Appendix A - AST data structures
: --------------------------------
: 
: +-----------------------------------------------+---------------------------
: ----+
: | ``StatementType::kStmtSelect``                |
: ``ast_type::AST_TYPE_SELECT`` |
: +===============================================+===========================
: ====+
: |.. code:: c                                    |
: |
: |                                               |
: |
: | typedef struct {                              | **(there is none, yet)**
: |
: |    bool isValid;                              |
: |
: |    char* errorMsg;                            |
: |
: |    int errorLine;                             |
: |
: |    int errorColumn;                           |
: |
: |    size_t statementCount;                     |
: |
: |    struct LuaSQLStatement** statements;       |
: |
: | } LuaSQLParserResult;                         |
: |
: |                                               |
: |
: +-----------------------------------------------+---------------------------
: ----+
: |.. code:: c                                    |.. code:: c
: |
: |                                               |
: |
: | typedef struct LuaSelectStatement {           |  struct Select {
: |
: |    struct LuaSQLStatement base;               |        ExprList *pEList;
: |
: |                                               |        u8 op;
: |
: |    struct LuaTableRef* fromTable;             |        LogEst nSelectRow;
: |
: |    bool selectDistinct;                       |        u32 selFlags;
: |
: |                                               |        int iLimit,
: iOffset;   |
: |    size_t selectListSize;                     |        char zSelName[12];
: |
: |    struct LuaExpr** selectList;               |        int
: addrOpenEphm[2];   |
: |                                               |        SrcList *pSrc;
: |
: |    struct LuaExpr* whereClause;               |        Expr *pWhere;
: |
: |                                               |        ExprList *pGroupBy;
: |
: |    struct LuaGroupByDescription* groupBy;     |        Expr *pHaving;
: |
: |                                               |        ExprList *pOrderBy;
: |
: |    size_t setOperationCount;                  |        Select *pPrior;
: |
: |    struct LuaSetOperation** setOperations;    |        Select *pNext;
: |
: |                                               |        Expr *pLimit;
: |
: |    size_t orderCount;                         |        Expr *pOffset;
: |
: |    struct LuaOrderDescription** order;        |        With *pWith;
: |
: |                                               |  };
: |
: |    size_t withDescriptionCount;               |
: |
: |    struct LuaWithDescription**                |
: |
: |           withDescriptions;                   |
: |
: |    struct LuaLimitDescription* limit;         |
: |
: | } LuaSelectStatement;                         |
: |
: |                                               |
: |
: +-----------------------------------------------+---------------------------
: ----+
: 
: --
: Timur

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-24 22:06 ` Timur Safin
@ 2020-11-25  9:44   ` Nikita Pettik
  2020-11-26  7:45     ` Timur Safin
  0 siblings, 1 reply; 8+ messages in thread
From: Nikita Pettik @ 2020-11-25  9:44 UTC (permalink / raw)
  To: Timur Safin; +Cc: m.semkin, mons, tarantool-discussions, Alexander Turenko

On 25 Nov 01:06, Timur Safin wrote:
> 
> Exporting AST as cdata is easy to do, but in this case it will be 
> inconvenient to manipulate AST nodes, before sending to cluster nodes.
> So there is 2nd, more idiomatic approach suggested - to convert AST
> to nested Lua object/tables.

Hm, why do you need those tables? Serialization into msgpack can
be done inside SQL internals.

> Which should simplify some massaging
> and provide natural way to serialization to msgpack.
> 
> 
> SYNOPSIS
> ~~~~~~~~
> 
> .. code-block:: lua
> 
>    local sql = require `sqlparser`
> 
>    -- parse, return ast, pass it back unmodified for execution
> 
>    local ast = sql.parse [[ select * from "table" where id > 10 limit 10 ]]

Should this be public API? Alternatively, we can hide it in sql.internals.

>    assert(type(ast) == 'cdata')
>    local ok = sql.execute(ast)
> 
>    -- free allocated memory, like box.unprepare but for ast
>    sql.unparse(ast)

I don't like unparse name. In fact even unprepare is a bad name,
it should be called sort of deallocate. I suggest sql.release_ast()/
sql.free_ast() naming.
 
>    -- raw access to cdata structures
>    local cdata = ast.as_cdata()
>    if cdata.ast_type == ffi.C.AST_TYPE_SELECT
>       handle_select_stmt(cdata.select)
>    end
> 
>    -- Lua access to structurs as Lua tables
>    local tree = ast.as_table()
>    if tree.type == AST_TYPE_SELECT
>       handle_columns_list(tree.select.columns)
>       handle_where_clause(tree.select.where)
>       limit = tree.select.limit

What's the purpose of these handles?

>    end
>    -- massaging with tree data
> 
>    -- serialization
>    local msgpack = require 'msgpack'
>    local to_wire = msgpack.encode(tree)
> 
>    -- networking magics ...
>    -- ... deserialization
>    local table = msgpack.decode(from_wire)
>    ast.set_tree(tree)
>    sql.execute(ast)
> 
> 
> Regards,
> Timur
> 
> P.S.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
  2020-11-25  9:44   ` Nikita Pettik
@ 2020-11-26  7:45     ` Timur Safin
  0 siblings, 0 replies; 8+ messages in thread
From: Timur Safin @ 2020-11-26  7:45 UTC (permalink / raw)
  To: 'Nikita Pettik'
  Cc: m.semkin, mons, tarantool-discussions, 'Alexander Turenko'

: From: Nikita Pettik <korablev@tarantool.org>
: Subject: Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
: 
: On 25 Nov 01:06, Timur Safin wrote:
: >
: > Exporting AST as cdata is easy to do, but in this case it will be
: > inconvenient to manipulate AST nodes, before sending to cluster nodes.
: > So there is 2nd, more idiomatic approach suggested - to convert AST
: > to nested Lua object/tables.
: 
: Hm, why do you need those tables? Serialization into msgpack can
: be done inside SQL internals.

I do understand that direct conversion to msgpack from ast, and creation
of tables instead of msgpack, for manipulations in Lua would have the same
complexity and mostly would be done using the same walker code. The problem
I foresee - it would be hard to massage AST in Lua iff we would need to.
And we would have to do it eventually, once we approach distributed SQL router
task which might need to modify original AST before sending to data nodes
for their local execution. Do you remember how Mike had to process AST cdata
for special aggregation functions processing, and modification of column list?

That's why I assumed that exposing AST nodes as Lua tables would simplify such
modification task, I guessed it would be more idiomatic for Lua. But I might
be wrong here...

: 
: > Which should simplify some massaging
: > and provide natural way to serialization to msgpack.
: >
: >
: > SYNOPSIS
: > ~~~~~~~~
: >
: > .. code-block:: lua
: >
: >    local sql = require `sqlparser`
: >
: >    -- parse, return ast, pass it back unmodified for execution
: >
: >    local ast = sql.parse [[ select * from "table" where id > 10 limit 10
: ]]
: 
: Should this be public API? Alternatively, we can hide it in sql.internals.

I consider sqlparser the internal API already (i.e. box.sql instead of 
currently used temporary sqlparser). So having yet another internal api 
would not make much sense. But it's all up to our decision. 

: 
: >    assert(type(ast) == 'cdata')
: >    local ok = sql.execute(ast)
: >
: >    -- free allocated memory, like box.unprepare but for ast
: >    sql.unparse(ast)
: 
: I don't like unparse name. In fact even unprepare is a bad name,
: it should be called sort of deallocate. I suggest sql.release_ast()/
: sql.free_ast() naming.

I'm ok with unprepare :) [we used this name in the InterSystems for similar 
Contexts] and thus ok with unparse. But seimilarly I'm ok with any different 
name - the major point here, there should be anything which frees AST data
structures kept elsewhere.

: 
: >    -- raw access to cdata structures
: >    local cdata = ast.as_cdata()
: >    if cdata.ast_type == ffi.C.AST_TYPE_SELECT
: >       handle_select_stmt(cdata.select)
: >    end
: >
: >    -- Lua access to structurs as Lua tables
: >    local tree = ast.as_table()
: >    if tree.type == AST_TYPE_SELECT
: >       handle_columns_list(tree.select.columns)
: >       handle_where_clause(tree.select.where)
: >       limit = tree.select.limit
: 
: What's the purpose of these handles?

Sorry for the confusion created. Those **handle_anything** assumed 
to represent any user-defined function which manipulate with cdata 
exported. i.e. I should put it the way:

  user_handle_column_list(tree.select.columns)
  user_handle_where_clause(tree.select.where)

and here we define only tree subobjects exposed to cdata, and don't
care how and where user defined their functions traversing/manipulating 
AST.

: 
: >    end
: >    -- massaging with tree data
: >
: >    -- serialization
: >    local msgpack = require 'msgpack'
: >    local to_wire = msgpack.encode(tree)
: >
: >    -- networking magics ...
: >    -- ... deserialization
: >    local table = msgpack.decode(from_wire)
: >    ast.set_tree(tree)
: >    sql.execute(ast)
: >
: >
: > Regards,
: > Timur
: >
: > P.S.
: >

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-26  7:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13 10:06 [Tarantool-discussions] RFC - distributed SQL, step #1 - AST Timur Safin
2020-11-13 18:59 ` m.semkin
2020-11-24 21:52   ` Timur Safin
2020-11-20  0:05 ` Peter Gulutzan
2020-11-24 21:55   ` Timur Safin
2020-11-24 22:06 ` Timur Safin
2020-11-25  9:44   ` Nikita Pettik
2020-11-26  7:45     ` Timur Safin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox