[Tarantool-discussions] Reusable SQL parser

Konstantin Osipov kostja.osipov at gmail.com
Thu Dec 12 12:14:31 MSK 2019


* Alexander Turenko <alexander.turenko at tarantool.org> [19/12/12 04:24]:

> > This is only a fraction of the questions your proposal on this subject
> > should address.
> 
> Feel free to share your thoughts on this. I would appreciate it.

I don't have much wisdom here; it's new even to me. I see the
following broad directions:

- RPC style. Basically, create some kind of RPC that would allow
  serializing any C data structure and deserializing it on the
  receiving end. The RPC should be naturally versioned. The
  underlying protocol could be MsgPack, but the devil is in how
  MsgPack is used to marshal data. The rules must be:
  - general, so that any structure can be serialized/deserialized;
  - extensible, i.e. include versioning.

  Then invoking SQL code remotely would be a matter of serializing
  the function arguments, whatever their types, and passing them to
  the remote end. If the remote end runs a different, incompatible
  version, it would return an RPC call exception.

  This is a very broad solution that could be used in other areas
  of the product as well. Generally, Tarantool has avoided it. E.g.
  SWIM, a recent addition, uses hand-crafted MsgPack packets that
  are carefully designed for the purpose and still extensible,
  rather than general-purpose serialization.
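A minimal C sketch of where the version check sits in the RPC option, assuming a hypothetical envelope layout `[version, function name, [args]]` and hypothetical names (`RPC_VERSION`, `rpc_encode_call`, `rpc_check_version`). To stay self-contained it hand-encodes only three MsgPack types (fixarray, positive fixint, fixstr) instead of using a real MsgPack library, so it illustrates the versioned envelope, not a general-purpose marshaller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical protocol version; both ends compare it before decoding. */
enum { RPC_VERSION = 1 };

/* Minimal MsgPack writers for the few types this sketch needs. */
static uint8_t *mp_put_array(uint8_t *p, uint8_t n)
{
    assert(n < 16);                 /* fixarray: 0x90 | n */
    *p++ = (uint8_t)(0x90 | n);
    return p;
}

static uint8_t *mp_put_uint(uint8_t *p, uint8_t v)
{
    assert(v < 128);                /* positive fixint: the byte itself */
    *p++ = v;
    return p;
}

static uint8_t *mp_put_str(uint8_t *p, const char *s)
{
    size_t len = strlen(s);
    assert(len < 32);               /* fixstr: 0xa0 | len, then the bytes */
    *p++ = (uint8_t)(0xa0 | len);
    memcpy(p, s, len);
    return p + len;
}

/* Encodes a call as [version, name, [arg0, arg1, ...]]; returns its size. */
size_t rpc_encode_call(uint8_t *buf, uint8_t version, const char *name,
                       const uint8_t *args, uint8_t nargs)
{
    uint8_t *p = buf;
    p = mp_put_array(p, 3);
    p = mp_put_uint(p, version);
    p = mp_put_str(p, name);
    p = mp_put_array(p, nargs);
    for (uint8_t i = 0; i < nargs; i++)
        p = mp_put_uint(p, args[i]);
    return (size_t)(p - buf);
}

/* Receiving end: 0 if the envelope carries the expected version,
 * -1 (the "RPC call exception") for a malformed or incompatible one. */
int rpc_check_version(const uint8_t *buf, size_t size, uint8_t expected)
{
    if (size < 2 || buf[0] != 0x93)  /* must be a 3-element fixarray */
        return -1;
    if (buf[1] > 0x7f)               /* version must be a positive fixint */
        return -1;
    return buf[1] == expected ? 0 : -1;
}
```

The receiver rejects the whole envelope before touching the arguments; a real protocol would also need a rule for which version differences count as incompatible.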

- Tarantool style, hand-crafted MsgPack: identify core parser
  structures and map them to a MsgPack representation. This is
  extensible and safe, but has rather high maintenance overhead.
  I like it because I don't expect Tarantool to soon have a very
  rich SQL grammar that it would want to push down. The current
  grammar has maybe a hundred or so distinct nodes, which is
  entirely manageable. If the content of any node changes, it
  could be mapped to a different MsgPack key, or new members could
  be added to the map associated with the existing key.

  It is also easy to examine/debug, because MsgPack is essentially
  binary JSON and decodes into a readable JSON-like form.
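A minimal C sketch of the hand-crafted mapping, assuming a toy expression AST (hypothetical `struct expr` with integer constants and `+`) and hypothetical integer map keys (`KEY_KIND`, `KEY_VALUE`, `KEY_LHS`, `KEY_RHS`). Each node becomes a MsgPack map, and the reader skips keys it does not recognize, which is what lets a newer writer add members to an existing node without breaking an older reader. Only fixmap, fixstr and positive fixint are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node kinds and map keys for a toy expression AST. */
enum { EXPR_INT = 1, EXPR_PLUS = 2 };
enum { KEY_KIND = 0, KEY_VALUE = 1, KEY_LHS = 2, KEY_RHS = 3 };

struct expr {
    uint8_t kind;                  /* EXPR_INT or EXPR_PLUS */
    uint8_t value;                 /* EXPR_INT only, < 128 here */
    const struct expr *lhs, *rhs;  /* EXPR_PLUS only */
};

static uint8_t *put_uint(uint8_t *p, uint8_t v)
{
    assert(v < 128);               /* positive fixint */
    *p++ = v;
    return p;
}

static uint8_t *put_fixmap(uint8_t *p, uint8_t n)
{
    assert(n < 16);                /* fixmap: 0x80 | n */
    *p++ = (uint8_t)(0x80 | n);
    return p;
}

/* Writes a node as a map {KEY_KIND: kind, ...}; returns the end pointer. */
uint8_t *expr_encode(uint8_t *p, const struct expr *e)
{
    if (e->kind == EXPR_INT) {
        p = put_fixmap(p, 2);
        p = put_uint(p, KEY_KIND);
        p = put_uint(p, EXPR_INT);
        p = put_uint(p, KEY_VALUE);
        p = put_uint(p, e->value);
    } else {
        assert(e->kind == EXPR_PLUS);
        p = put_fixmap(p, 3);
        p = put_uint(p, KEY_KIND);
        p = put_uint(p, EXPR_PLUS);
        p = put_uint(p, KEY_LHS);
        p = expr_encode(p, e->lhs);
        p = put_uint(p, KEY_RHS);
        p = expr_encode(p, e->rhs);
    }
    return p;
}

/* Skips one value of the types used in this sketch. */
static const uint8_t *skip(const uint8_t *p)
{
    uint8_t c = *p++;
    if (c <= 0x7f)                  /* positive fixint */
        return p;
    if ((c & 0xe0) == 0xa0)         /* fixstr */
        return p + (c & 0x1f);
    assert((c & 0xf0) == 0x80);     /* fixmap: skip every key and value */
    for (int i = 0; i < (c & 0x0f) * 2; i++)
        p = skip(p);
    return p;
}

/* Evaluates an encoded expression. Unknown keys are skipped, so this
 * reader keeps working when a newer writer adds members to a node. */
int expr_eval(const uint8_t *p)
{
    uint8_t c = *p++;
    assert((c & 0xf0) == 0x80);
    int kind = -1, value = 0;
    const uint8_t *lhs = NULL, *rhs = NULL;
    for (int i = 0; i < (c & 0x0f); i++) {
        uint8_t key = *p++;         /* keys are positive fixints */
        if (key == KEY_KIND)
            kind = *p;
        else if (key == KEY_VALUE)
            value = *p;
        else if (key == KEY_LHS)
            lhs = p;
        else if (key == KEY_RHS)
            rhs = p;
        p = skip(p);                /* advance past the value, known or not */
    }
    if (kind == EXPR_PLUS)
        return expr_eval(lhs) + expr_eval(rhs);
    assert(kind == EXPR_INT);
    return value;
}
```

Encoding `1 + 2` yields a three-entry map whose `KEY_LHS` and `KEY_RHS` values are nested two-entry maps, and `expr_eval` folds it back to 3. In this scheme an incompatible change to a node would get a new key number rather than reuse an old one.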

- Text SQL representation. Basically, find a way to "restore" the
  original SQL from the AST, but in an unambiguous way. Then pass
  SQL or SQL fragments around, and use the SQL parser to restore
  the AST. This is what Tarantool already does when it loads CHECK
  expressions from the data dictionary. I like it too, but, similar
  to MsgPack, it will require going over every AST node and writing
  a function that represents it unambiguously as SQL text.
  The issue I have here is that it is actually difficult to freeze
  the semantics of an SQL expression. Imagine we stop uppercasing
  unquoted identifiers, as I keep suggesting: the same unquoted text
  would then resolve to different names. So the representation
  needs to quote every identifier, just in case. On the plus side,
  it is very readable. The other issue is that if we want to pass
  around not just the AST, but some node augmentations/annotations,
  like object versions, they may not fit nicely into the SQL
  representation. Imagine SELECT * FROM table becomes
  SELECT "a"{192411}, "b"{192412} FROM "table"{192412}
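A minimal C sketch of the always-quote rule for the text option, assuming a hypothetical `render_select` that prints only a `SELECT <columns> FROM <table>` node. Every identifier becomes a delimited identifier (wrapped in double quotes, with embedded double quotes doubled, as standard SQL requires), so the text parses back to the same names whatever case-folding rule the receiving parser applies; quoting also lets a reserved word such as `table` be used as a name. The `{...}` version annotations from the example above are not standard SQL and are deliberately not produced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends `name` as a delimited identifier: wrapped in double quotes,
 * with each embedded double quote doubled. Returns the new length. */
static size_t append_ident(char *out, size_t len, size_t cap, const char *name)
{
    /* Worst case: every char doubled, plus two quotes and the NUL. */
    assert(len + 2 * strlen(name) + 3 <= cap);
    out[len++] = '"';
    for (const char *s = name; *s != '\0'; s++) {
        if (*s == '"')
            out[len++] = '"';
        out[len++] = *s;
    }
    out[len++] = '"';
    out[len] = '\0';
    return len;
}

/* Appends literal SQL text (keywords, separators) unchanged. */
static size_t append_raw(char *out, size_t len, size_t cap, const char *text)
{
    size_t n = strlen(text);
    assert(len + n + 1 <= cap);
    memcpy(out + len, text, n + 1);  /* copies the terminating NUL too */
    return len + n;
}

/* Renders SELECT <cols> FROM <table> with every identifier quoted, so the
 * text parses back to the same names regardless of case folding. */
size_t render_select(char *out, size_t cap, const char *const *cols,
                     size_t ncols, const char *table)
{
    size_t len = append_raw(out, 0, cap, "SELECT ");
    for (size_t i = 0; i < ncols; i++) {
        if (i > 0)
            len = append_raw(out, len, cap, ", ");
        len = append_ident(out, len, cap, cols[i]);
    }
    len = append_raw(out, len, cap, " FROM ");
    return append_ident(out, len, cap, table);
}
```

Rendering columns `a`, `b` from table `table` produces `SELECT "a", "b" FROM "table"`; a real serializer would need one such function per AST node type, which is the maintenance cost mentioned above.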

-- 
Konstantin Osipov, Moscow, Russia

