[Tarantool-discussions] RFC - distributed SQL, step #1 - AST
Timur Safin
tsafin at tarantool.org
Fri Nov 13 13:06:40 MSK 2020
Distributed SQL: the first step - AST for parser
===============================================
Summary
-------
There is a preliminary decision that we will approach distributed SQL as
the next big thing coming in 2021. This is a long project, which will be
approached gradually. Here we try to lay out all the necessary steps,
both long-term and short-term. The long-term goals are described only
briefly, while the short-term goal (extracting an AST from the SQL
parser) is covered in more detail; we can do that because of the set of
Tarantool capabilities currently available, and because a PoC has
already been developed.
Bear in mind that, for the long-term goals, a later version of this RFC
will try to collect wider relevant information, showing some (quite
random) industry precedents (but the list of industry examples will be
neither scientific nor complete).
Vocabulary:
-----------
We use the standard MapReduce vocabulary when talking about the roles of
cluster nodes involved in SQL query processing.
+---------------+-----------------------------------------------------+
| Router(s)     | The node(s) which accept queries, send them to the  |
|               | corresponding storage nodes for local processing,   |
|               | then combine/reduce the resulting data and send it  |
|               | back to the client.                                 |
+---------------+-----------------------------------------------------+
| Combiner(s)   | Depending on the aggregation complexity, there may  |
|               | be several intermediate nodes which combine         |
|               | (aggregate) intermediate data and send it on.       |
+---------------+-----------------------------------------------------+
| Storage nodes | The nodes which store the data and execute          |
|               | (sub)queries locally.                               |
+---------------+-----------------------------------------------------+
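To make the roles concrete, here is a minimal, purely illustrative
sketch (plain Python, not Tarantool API; all names are hypothetical) of
how an aggregate query such as ``SELECT SUM(x) FROM t`` could flow
through storage, combiner, and router nodes:

```python
# Hypothetical illustration of the Router/Combiner/Storage roles
# for an aggregate query like: SELECT SUM(x) FROM t

def storage_map(shard_rows):
    """Each storage node computes a partial aggregate locally."""
    return sum(shard_rows)

def combiner(partials):
    """Intermediate nodes combine (aggregate) partial results."""
    return sum(partials)

def router(shards):
    """The router fans the query out and reduces the final result."""
    partials = [storage_map(rows) for rows in shards]   # map step
    # feed partials pairwise through combiners (one level shown here)
    combined = [combiner(partials[i:i + 2]) for i in range(0, len(partials), 2)]
    return combiner(combined)                           # final reduce

shards = [[1, 2, 3], [4, 5], [6], [7, 8]]  # rows of t spread over 4 storage nodes
print(router(shards))  # same result as SUM over all rows: 36
```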
Distributed SQL scenario
------------------------
Once we move from the single-node case to the multiple-node case for SQL
execution, all kinds of inter-node data exchange arise. Once the
original SQL reaches the router node, it is expected that the router
will pre-parse the SQL query, massage it appropriately, and then send
some command data to the storage nodes for local execution.
**The question is** - what format of data should we send to the storage
nodes? We might try to send the compiled binary VDBE byte-code to the
storage nodes, but this looks like a bad idea for several reasons:
1. VDBE is not yet frozen, and due to the technology used (a Lemon
   parser with on-the-fly generation of constants) it may differ
   significantly between versions, even for builds from the same branch.
   Though, if we had to, we could take some extra measures to stabilize
   the values of the generated tokens;
2. A bigger problem is different data distribution on different shard
   nodes in the cluster, which may require different query plans for the
   same SQL query. If we blindly generated a single byte-code for the
   received SQL, we might degrade performance compared to the case where
   byte-code is generated locally on each node, taking local heuristics
   into account.
So at the moment a simpler approach is preferable:
- We simply transfer the (modified) SQL query string to each shard node
  involved;
- Or we transfer the AST, serialized to some kind of binary form.
We suggest that at the 1st stage we take the simplest approach: transfer
the SQL query in its textual form.
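One concrete reason the router must modify the query text before sending
it to the shards is pagination. Below is a rough, hypothetical sketch
(plain Python string rewriting, not the actual router code): to page
correctly across shards, each shard must return its first
``LIMIT + OFFSET`` rows, and the router applies the final ``OFFSET``
after merging:

```python
# Illustrative (hypothetical) router-side rewrite of a paginated query.
# Each shard cannot apply OFFSET locally, because the skipped rows may
# be distributed unevenly; shards return LIMIT + OFFSET rows instead.
import re

def rewrite_for_shard(query):
    """Rewrite LIMIT/OFFSET for shard execution; return (query, offset)."""
    m = re.search(r"LIMIT\s+(\d+)\s+OFFSET\s+(\d+)", query, re.IGNORECASE)
    if not m:
        return query, 0
    limit, offset = int(m.group(1)), int(m.group(2))
    shard_query = query[:m.start()] + "LIMIT %d" % (limit + offset)
    return shard_query, offset  # router skips `offset` rows after the merge

q = "SELECT id FROM t ORDER BY id LIMIT 10 OFFSET 20"
shard_q, final_offset = rewrite_for_shard(q)
print(shard_q)       # SELECT id FROM t ORDER BY id LIMIT 30
print(final_offset)  # 20
```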
Distributed SQL in InterSystems IRIS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(TBD)
Distributed SQL in MemSQL/SingleStore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
https://docs.singlestore.com/v7.1/introduction/how-memsql-works/
(TBD)
Distributed SQL in MariaDB SkySQL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
https://mariadb.com/products/skysql/
(TBD)
Distributed SQL in Yugabyte
~~~~~~~~~~~~~~~~~~~~~~~~~~~
(TBD)
Mike Siomkin's distributed SQL PoC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is a working and deployed proof-of-concept project, implemented by
Mike Siomkin, which implements the distributed SQL query concept using
currently available Tarantool facilities.
.. note::

   There are some obvious limitations, but it further proves the point
   that, with relatively small effort, restricted distributed SQL
   processing could be implemented in current Tarantool within a
   relatively short time frame.
For preliminary parsing of SQL queries, Mike's code uses the SQLParser
LuaRocks module (https://github.com/tarantool/sqlparser), which wraps
the HyRise SQL parser implemented in C++
(https://github.com/hyrise/sql-parser) for parsing the given SQL queries
and building abstract syntax trees (ASTs).
The intermediate layer between the cluster controller on the Tarantool
side and the SQL parser is the gridql.lua module. It is gridql's
responsibility to parse SQL, analyze the resulting AST, and enrich it
appropriately for aggregate functions and pagination support. *I.e. the
queries sent to a storage node differ from the original SQL query*, and
differ from the query executed by the combiner/reducer node.
The sql-parser module exports only 2 methods: parse(query) and
tostring(ast).
- `sqlparser.parse(q)` uses the ffi function parseSql, which wraps the
  HyRise SQL parser mechanics and returns the AST as the ffi structure
  LuaSQLParserResult, which in turn is composed of a series of
  LuaSQLStatement-based objects. These may be of various types
  (e.g. kStmtSelect, kStmtImport, kStmtInsert, kStmtUpdate,
  kStmtDelete, etc.), and each of them may carry a different set of
  data, including LuaExpr lists of various kinds;
- `sqlparser.tostring(ast)` stringifies the passed AST object.
Despite the fact that the HyRise SQL parser has *no knowledge of the
builtin SQL functions* supported by Tarantool SQL, its parsing
facilities are sufficient for AST traversal, because any unknown name is
marked as a function identifier, which is good enough to start SQL
processing in the gridql module.
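The kind of traversal gridql relies on can be sketched roughly as
follows (a hypothetical dict-based AST is used here for brevity, not the
actual LuaSQLStatement/LuaExpr layout):

```python
# Rough sketch: walk a (hypothetical) dict-based expression AST and
# collect every name the parser marked as a function identifier --
# the way gridql must find aggregates that need special handling.
def collect_functions(node, found=None):
    found = found if found is not None else []
    if isinstance(node, dict):
        if node.get("type") == "function":
            found.append(node["name"].upper())
        for child in node.values():
            collect_functions(child, found)
    elif isinstance(node, list):
        for child in node:
            collect_functions(child, found)
    return found

# Hypothetical AST for: SELECT sum(x), lower(name) FROM t
ast = {"type": "select",
       "selectList": [
           {"type": "function", "name": "sum",
            "args": [{"type": "column", "name": "x"}]},
           {"type": "function", "name": "lower",
            "args": [{"type": "column", "name": "name"}]}]}
print(collect_functions(ast))  # ['SUM', 'LOWER']
```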
Local SQL execution is done using the builtin Tarantool SQL engine, so
this lack of function knowledge is not a problem, as long as we pass the
SQL query transparently down to the node processor.
HyRise knows all kinds of SQL queries, but at the moment the gridql
module *supports only `SELECT`s* and does not handle any other kind of
request (e.g. `UPDATE`).
Unfortunately, gridql in its current form cannot be published due to
heavy usage of customer-specific virtual tables, but there are claims
that it is possible to generalize and simplify the code so it might be
used elsewhere, beyond the current context.
Long-term goals
---------------
So, given the many industry precedents, we see that ideally, for
distributed SQL, we have to have:
- Some kind of router that accepts an SQL query and pre-parses it into
  some kind of intermediate representation (AST);
- A topology-aware query planner that analyses the parsed query and,
  with knowledge of the data distribution, sends parsed "AST" subqueries
  only to those nodes which hold relevant data. If no data locality is
  known, the whole cluster is involved via a Map-Combine-Reduce
  operation;
- The query might be split into inner subqueries, for which stages would
  be planned and executed separately;
- If transactions are not read-only, a cluster-wide transaction
  manager / conflict manager has to be involved to coordinate the 2PC
  mechanics;
- And it would be easier if the distributed SQL module worked even in a
  single-node configuration (with or without vshard involved), for
  simpler debugging purposes.
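The topology-aware routing decision above can be sketched in a few
lines (a hypothetical illustration; the function and node names are
invented, and real routing would go through vshard buckets):

```python
# Hypothetical sketch of topology-aware routing: if the query filters
# on the sharding key, only the owning node needs to be involved;
# otherwise the query is broadcast to every storage node and the
# results go through Map-Combine-Reduce.
def route(shard_key_value, topology):
    if shard_key_value is not None:
        # data locality known: hash the key to a single bucket owner
        bucket = hash(shard_key_value) % len(topology)
        return [topology[bucket]]
    return list(topology)  # no locality info: involve the whole cluster

topology = ["storage-1", "storage-2", "storage-3"]
print(route("user-42", topology))  # a single owning node
print(route(None, topology))       # all storage nodes
```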
The timing of these long-term plans is not yet known, but at the moment
we believe that the nearest subjects should be:
1. SQL parser refactoring to save the AST (with serialization and
   deserialization if necessary);
2. Extending the transaction/conflict manager with cluster-wide
   transaction support, to make possible the next steps for queries
   beyond simple `SELECT`s.
The 2nd item is not SQL-specific and will be handled separately
elsewhere; in this RFC we will continue to talk only about the SQL
plans.
Short-term distributed SQL plan
-------------------------------
At the moment the parser, byte-code generation, and query execution are
tightly coupled in the SQL code in Tarantool. This is a side-effect of
the SQLite architecture largely inherited by Tarantool from the SQLite
parser. Such close coupling might become a road-blocker for us in the
longer term, when we have to go to different nodes.
If we properly split the query parser and query execution logic, we will
simplify the configuration, making it easier to approach distributed
SQL.
- So, as the 1st, obvious step, we would need to create a
  separate/builtin module tarantool-sqlparser which would wrap SQL
  parsing in a `parse(query)` method, in a fashion similar to Mike
  Siomkin's `sqlparser` module above;
- The `sql.parse(query)` method would need to return AST data
  structures, exported via ffi.
- At the moment only SELECT and VIEW queries build an AST during
  parsing. This is a major limitation, which will require some extra
  refactoring later, but it is ok for the 1st stage.
- For the 2nd stage we would need to extend the AST with more SQL
  statement types, e.g. UPDATE / DELETE.
- It is worth mentioning that the current AST structures as defined in
  `sqlint.h` are quite similar to those used in Mike's sqlparser
  module; for more details see the comparison of LuaDataTypes.h to
  sqlint.h in Appendix A below;
- As we build the AST, we may enrich the returned AST nodes with
  information about builtin function kinds and expression data types
  specific to our SQL parser;
- So, in addition to the currently available ways to run SQL queries
  via:
  - direct `box.execute`,
  - or the 2-step `box.prepare + box.execute`,
  we would add an `sql.parse` method, similar to `box.prepare`, whose
  result should be similarly executable via `box.prepare +
  box.execute`;
- This refactoring, which separates out the parse step, should still
  maintain a fully working SQL execution cycle, i.e. with minimal code
  modifications all relevant SQL queries should pass the whole
  Tarantool SQL test suite. (This might apply only to SELECT queries,
  for stage 1.)
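The intended decoupling can be sketched as a three-stage pipeline (a
hypothetical mock in plain Python; `parse`, `prepare`, and `execute`
here are stand-ins for the proposed `sql.parse` and the existing
`box.prepare` / `box.execute`, not their real implementations):

```python
# Hypothetical sketch of the proposed decoupling: sql.parse produces an
# AST, which can then be prepared and executed, mirroring the existing
# box.prepare + box.execute two-step flow.
class AST:
    def __init__(self, kind, text):
        self.kind, self.text = kind, text

def parse(query):
    """Stand-in for the proposed sql.parse: query text -> AST."""
    kind = query.strip().split()[0].upper()
    if kind != "SELECT":
        # stage 1 covers only read-only queries
        raise NotImplementedError("stage 1 covers SELECT only")
    return AST(kind, query)

def prepare(ast):
    """Stand-in for box.prepare: AST -> executable statement."""
    return ("prepared", ast.text)

def execute(stmt):
    """Stand-in for box.execute: run the prepared statement."""
    return {"rows": [], "statement": stmt[1]}

result = execute(prepare(parse("SELECT 1")))
print(result["statement"])  # SELECT 1
```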
Distributed testing scenario
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The assumption is that the decoupled sql.parse method should be
  powerful enough to replace the HyRise SQL parser in the gridql
  proof-of-concept code. Once the cleaned-up gridql code is published
  (Mike is planning to publish it soon), having it work with the
  Tarantool SQL parser module will be another indicator of intermediate
  success;
- gridql needs nothing beyond simple SELECTs, which makes it possible
  to create the 1st implementation without major refactorings in the
  Tarantool SQL parser, using only the current data structures and
  code;
- Thus the addition of AST support for DELETE, INSERT, and UPDATE
  statements will be done at stage #2, probably next quarter; it is not
  a goal of the current plan;
- I.e. we start with only read-only (SELECT and VIEW) queries and do
  not support RW operations yet;
- INSERT/UPDATE/DELETE queries will be handled afterward, once we have
  a distributed conflict manager and transaction manager implemented.
  This is subject to coordinated efforts with Alexander Lyapunov's
  team.
Appendix A - AST data structures
--------------------------------
+-----------------------------------------------+-------------------------------+
| ``StatementType::kStmtSelect`` | ``ast_type::AST_TYPE_SELECT`` |
+===============================================+===============================+
|.. code:: c | |
| | |
| typedef struct { | **(there is none, yet)** |
| bool isValid; | |
| char* errorMsg; | |
| int errorLine; | |
| int errorColumn; | |
| size_t statementCount; | |
| struct LuaSQLStatement** statements; | |
| } LuaSQLParserResult; | |
| | |
+-----------------------------------------------+-------------------------------+
|.. code:: c |.. code:: c |
| | |
| typedef struct LuaSelectStatement { | struct Select { |
| struct LuaSQLStatement base; | ExprList *pEList; |
| | u8 op; |
| struct LuaTableRef* fromTable; | LogEst nSelectRow; |
| bool selectDistinct; | u32 selFlags; |
| | int iLimit, iOffset; |
| size_t selectListSize; | char zSelName[12]; |
| struct LuaExpr** selectList; | int addrOpenEphm[2]; |
| | SrcList *pSrc; |
| struct LuaExpr* whereClause; | Expr *pWhere; |
| | ExprList *pGroupBy; |
| struct LuaGroupByDescription* groupBy; | Expr *pHaving; |
| | ExprList *pOrderBy; |
| size_t setOperationCount; | Select *pPrior; |
| struct LuaSetOperation** setOperations; | Select *pNext; |
| | Expr *pLimit; |
| size_t orderCount; | Expr *pOffset; |
| struct LuaOrderDescription** order; | With *pWith; |
| | }; |
| size_t withDescriptionCount; | |
| struct LuaWithDescription** | |
| withDescriptions; | |
| struct LuaLimitDescription* limit; | |
| } LuaSelectStatement; | |
| | |
+-----------------------------------------------+-------------------------------+
--
Timur