From: Konstantin Osipov
Date: Thu, 12 Dec 2019 12:14:31 +0300
Message-ID: <20191212091431.GD24448@atlas>
References: <20191124040710.d232movrzjefducx@tkn_work_nb> <20191204173855.GC19235@atlas> <20191212012132.iiy2ww4qxcjftpk2@tkn_work_nb>
In-Reply-To: <20191212012132.iiy2ww4qxcjftpk2@tkn_work_nb>
Subject: Re: [Tarantool-discussions] Reusable SQL parser
To: Alexander Turenko
Cc: tarantool-discussions@dev.tarantool.org, Konstantin Nazarov, Nick Karlov

* Alexander Turenko [19/12/12 04:24]:
> > This is only a fraction of questions your proposal on this subject
> > should address.
>
> Don't mind to share your thoughts on this. I would appreciate it.

I don't have much wisdom here, it's new even to me.

I see the following broad directions:

- RPC kind. Basically, create some kind of RPC which would allow
  serializing any C data structure and deserializing it on the
  receiving end. The RPC should be naturally versioned. The
  underlying protocol could be msgpack, but the devil is in how
  msgpack is used to marshal data. The rules must be
  - general, so that any structure can be serialized/deserialized
  - extensible, i.e. include versioning

  Then invoking SQL code remotely would be a matter of serializing
  the function arguments, which can be anything, and passing them to
  the remote end.
  If the remote end has a different, incompatible version, it would
  return an RPC call exception.

  This is a very broad solution which could be used in other areas of
  the product as well. Generally, Tarantool has avoided it. E.g. SWIM,
  as a recent addition, uses hand-crafted msgpack packets, carefully
  thought through for the purpose, but still extensible, rather than
  general-purpose serialization.

- Tarantool style, hand-crafted MsgPack: identify core parser
  structures and map them to a MsgPack representation. This is
  extensible and safe, but has rather high maintenance overhead. I
  like it because I don't think Tarantool will have a very rich SQL
  grammar it will want to push down soon. The current grammar has
  maybe a hundred or so distinct nodes, which is entirely manageable.
  If the content of any node changes, it could be mapped to a
  different MsgPack key, or new members could be added to the map
  associated with the existing key. It is also easy to examine/debug
  because it's essentially JSON.

- Text SQL representation. Basically, find a way to "restore" the
  original SQL from the AST, but in an unequivocal way. Then pass SQL
  or SQL fragments around, and use the SQL parser to restore the AST.
  This is what Tarantool already does when it loads CHECK expressions
  from the data dictionary. I like it too, but, similar to msgpack, it
  will require going over every AST node and writing a function which
  would represent it unequivocally as SQL text. The issue I have here
  is that it is actually difficult to freeze the semantics of an SQL
  expression. Imagine we stop uppercasing, as I keep suggesting? The
  representation needs to quote every identifier, just in case. It is
  also very readable.

  The other issue is that if we want to pass around not just the AST,
  but some node augmentation/annotations, like object versions, it
  may not fit nicely into the SQL representation. Imagine

      SELECT * FROM table

  becomes

      SELECT "a"{192411}, "b"{192412} FROM "table"{192412}

--
Konstantin Osipov, Moscow, Russia
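
To make the second and third directions above more concrete, here is a
minimal sketch in Python, not Tarantool code. It models AST nodes as
maps with small integer keys (the shape a hand-crafted MsgPack encoding
might take) and renders them back to SQL text with every identifier
quoted. All key numbers, node types, and function names are
hypothetical and chosen only for illustration; plain dicts stand in for
MsgPack maps so the example needs no third-party library.

```python
import json

# Hypothetical field keys and node types; nothing here is taken from
# Tarantool's actual parser structures.
KEY_TYPE, KEY_NAME, KEY_COLUMNS, KEY_TABLE, KEY_VERSION = 0, 1, 2, 3, 4
NODE_SELECT, NODE_COLUMN, NODE_TABLE = 1, 2, 3


def named(node_type, name, version=None):
    """Build a column or table node; the version annotation is optional."""
    node = {KEY_TYPE: node_type, KEY_NAME: name}
    if version is not None:
        node[KEY_VERSION] = version
    return node


def select(columns, table):
    """Build a SELECT node from column nodes and one table node."""
    return {KEY_TYPE: NODE_SELECT, KEY_COLUMNS: columns, KEY_TABLE: table}


def quote_ident(name):
    """Quote an identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_sql(node):
    """Render a node as SQL text; keys it does not know are ignored."""
    kind = node[KEY_TYPE]
    if kind == NODE_SELECT:
        cols = ", ".join(to_sql(c) for c in node[KEY_COLUMNS])
        return "SELECT " + cols + " FROM " + to_sql(node[KEY_TABLE])
    if kind in (NODE_COLUMN, NODE_TABLE):
        return quote_ident(node[KEY_NAME])
    raise ValueError("unknown node type: %r" % (kind,))


def debug_dump(node):
    """Dump the map as JSON text for inspection (keys become strings)."""
    return json.dumps(node)


ast = select(
    [named(NODE_COLUMN, "a", 192411), named(NODE_COLUMN, "b", 192412)],
    named(NODE_TABLE, "table", 192412),
)
print(to_sql(ast))
print(debug_dump(ast))
```

In this sketch the object versions live in the map under their own key,
so a reader that does not know the key simply skips it, while the plain
SQL rendering has no place to carry them. That is the trade-off
described above.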