Tarantool discussions archive
 help / color / mirror / Atom feed
* [Tarantool-discussions] Reusable SQL parser
@ 2019-11-24  4:07 Alexander Turenko
  2019-12-04 17:38 ` Konstantin Osipov
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Turenko @ 2019-11-24  4:07 UTC (permalink / raw)
  To: tarantool-discussions; +Cc: Nick Karlov, Konstantin Nazarov

Discussions about ways to reuse Tarantool SQL parser in applications and
connectors arise from time to time. I want to share a very high-level proposal
how we can reuse the parser in an application and provide some examples we
heard from users.

Proposal
--------

Proposed to extract or expose the SQL parser from Tarantool to provide ability
to generate AST accessible from Lua in an application and make transformations
on the AST.

The AST can be evaluated lazily: when a node is accessed from a Lua land, then
a Lua object (cdata or userdata) is created.

Proposed to provide an ability to generate an SQL request back from this AST
rather then allow to execute an AST. This way looks more debuggable, while
performance considerations may be eliminated using prepared statements.

NB: Implementation of a Lua API to retrieve AST looks as the good first step
to draw a line between the parser and the executor and so looks as a step to
extract the parser into its own library in the future (to use in connectors).

Let's consider a few examples.

Transparent versioning
----------------------

Example of transformation that a user may want to implement is transparently
select a last version of an object. Let's define a table and fill it with
versioned data.

 | CREATE TABLE objects(rowid INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, \
 |                      version INTEGER, value VARCHAR(100))
 | INSERT INTO objects VALUES (null, 1, 1, 'record 1 version 1')
 | INSERT INTO objects VALUES (null, 1, 2, 'record 1 version 2')
 | INSERT INTO objects VALUES (null, 1, 3, 'record 1 version 3')
 | INSERT INTO objects VALUES (null, 2, 1, 'record 2 version 1')
 | INSERT INTO objects VALUES (null, 2, 2, 'record 2 version 2')
 | INSERT INTO objects VALUES (null, 3, 2, 'record 3 version 1')

Then, when a user of an app want to select data from this table:

 | SELECT id, value FROM objects

we transparently apply a transformation to retrieve only last version of each
object:

 | SELECT o1.id, o1.value FROM objects o1 LEFT JOIN objects o2 \
 | on (o1.id = o2.id and o1.version < o2.version) WHERE o2.version IS NULL

Based on this SO answer: https://stackoverflow.com/a/1313293/1598057

Transparent sharding
--------------------

We may transparently split an SQL request on a sharding router to several
requests: a part will be executed on the router itself, other parts on certain
storages.

I have no certain example now, so consider this as a kind of second-tier goal
rather then motivating example.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Reusable SQL parser
  2019-11-24  4:07 [Tarantool-discussions] Reusable SQL parser Alexander Turenko
@ 2019-12-04 17:38 ` Konstantin Osipov
  2019-12-12  1:21   ` Alexander Turenko
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Osipov @ 2019-12-04 17:38 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-discussions, Konstantin Nazarov, Nick Karlov

* Alexander Turenko <alexander.turenko@tarantool.org> [19/11/24 10:13]:
> Discussions about ways to reuse Tarantool SQL parser in applications and
> connectors arise from time to time. I want to share a very high-level proposal
> how we can reuse the parser in an application and provide some examples we
> heard from users.

I think this proposal as formulated here is a no-brainer.

The question is how best to do it. Shall we try to be ABI
compatible? How to version the ABI? How to operate heterogenous
clusters and clusters during an upgrade?

This is only a fraction of questions your proposal on this subject
should address.


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Reusable SQL parser
  2019-12-04 17:38 ` Konstantin Osipov
@ 2019-12-12  1:21   ` Alexander Turenko
  2019-12-12  9:14     ` Konstantin Osipov
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Turenko @ 2019-12-12  1:21 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-discussions, Konstantin Nazarov, Nick Karlov

On Wed, Dec 04, 2019 at 08:38:55PM +0300, Konstantin Osipov wrote:
> * Alexander Turenko <alexander.turenko@tarantool.org> [19/11/24 10:13]:
> > Discussions about ways to reuse Tarantool SQL parser in applications and
> > connectors arise from time to time. I want to share a very high-level proposal
> > how we can reuse the parser in an application and provide some examples we
> > heard from users.
> 
> I think this proposal as formulated here is a no-brainer.

Sure.

I wrote this (and other) proposals here in order to share motivation
under features we intend to implement: what exactly we asked for or what
do we want to achieve. So it is more about 'why', then 'how'.

Usually it is squeezing from one or several meetings.

My intention here is to have a linkable (and so public) statements about
motivation under our work, especially when it is relatively big.

Maybe you may suggest better wording for such statements.

> The question is how best to do it. Shall we try to be ABI
> compatible? How to version the ABI? How to operate heterogenous
> clusters and clusters during an upgrade?

Yep.

> This is only a fraction of questions your proposal on this subject
> should address.

Don't mind to share your thoughts on this. I would appreciate it.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Reusable SQL parser
  2019-12-12  1:21   ` Alexander Turenko
@ 2019-12-12  9:14     ` Konstantin Osipov
  0 siblings, 0 replies; 4+ messages in thread
From: Konstantin Osipov @ 2019-12-12  9:14 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-discussions, Konstantin Nazarov, Nick Karlov

* Alexander Turenko <alexander.turenko@tarantool.org> [19/12/12 04:24]:

> > This is only a fraction of questions your proposal on this subject
> > should address.
> 
> Don't mind to share your thoughts on this. I would appreciate it.

I don't have much wisdom here, it's new even to me. I see the
following broad directions:

- RPC kind. Basically, create some kind of RPC which would allow
to serialize any C data structure, and deserialize it on receiving
end. The PRC should be naturally versioned. The underlying
protocol could be msgpack, but the devil is in how the msgpack is
used to marshal data. The rules must be
- general, so that any structure can be serialized/deserialized
- extensible, i.e include versioning

Then invoking an SQL code remotely would be a matter of
serializing function arguments, which can be any, and passing them
to the remote end. If the remote end has a different, incompatible
version it would return an RPC call exception.

This is a very broad solution which could be used in other areas
of the product as well. Generally tarantool avoided it. E.g. SWIM,
as a recent addition, uses hand-crafted msgpack packets, carefully
thought through for the purpose, but still extensible, not
general-purpose serialization.

- Tarantool style, hand-crafted MsgPack: identify core parser
  structures and map them to MsgPack representation. This is 
  extensible, safe, but has rather high maintenance overhead.
  I like it because I don't think Tarantool will have a very rich
  SQL grammar it will want to push down soon. The current grammar
  has maybe a hundred or so distinct nodes, it is entirely
  manageable. If content of any node changes, it could be mapped
  to a different MsgPack key or new members could be added to the
  map associated with the existing key.

  It is also easy to examine/debug because it's essentially JSON.

- text SQL representation. Basically, find a way to "restore" the 
original SQL from the AST, but in an unequivocal way. And then
pass SQL or SQL fragments around, and use SQL parser to restore
AST. This is what Tarantool already does when loads CHECK
expressions from data dictionary. I like it too, but similar to
msgpack, it will require going over every AST node and writing
a function which would represent it unequivocally as SQL text. 
The issue I have here is that it is actually difficult to freeze
the semantics of SQL expression. Imagine we stop uppercasing, as I
keep suggesting? The representation needs to quote every
identifier, just in case. It is also very readable. The other
issue is that if we want to pass around not just the AST, but some
node augmentation/annotations, like object versions, it may not
fit nicely into SQL representation. Imagine SELECT * FROM table
becomes SELECT "a"{192411}, b"{192412}" FROM "table"{192412}

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-12-12  9:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-24  4:07 [Tarantool-discussions] Reusable SQL parser Alexander Turenko
2019-12-04 17:38 ` Konstantin Osipov
2019-12-12  1:21   ` Alexander Turenko
2019-12-12  9:14     ` Konstantin Osipov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox