From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 5C64E46971A for ; Sat, 7 Dec 2019 02:18:07 +0300 (MSK) References: <534c76888e033e2f37a93ee0f20a1b5ef806498d.1574277369.git.korablev@tarantool.org> From: Vladislav Shpilevoy Message-ID: Date: Sat, 7 Dec 2019 00:18:05 +0100 MIME-Version: 1.0 In-Reply-To: <534c76888e033e2f37a93ee0f20a1b5ef806498d.1574277369.git.korablev@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH v2 15/16] box: introduce prepared statements List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik , tarantool-patches@dev.tarantool.org Thanks for the patch! See 7 comments below. On 20/11/2019 22:28, Nikita Pettik wrote: > This patch introduces local prepared statements. Support of prepared > statements in IProto protocol and netbox is added in the next patch. > > Prepared statement is an opaque instance of SQL Virtual Machine. It can > be executed several times without necessity of query recompilation. To > achieve this one can use box.prepare(...) function. It takes string of > SQL query to be prepared; returns extended set of meta-information > including statement's string, parameter's types and names, types and > names of columns of the resulting set, count of parameters to be bound. > Lua object representing result of :prepare() invocation also features method > :execute(). It corresponds to box.execute(stmt.sql_str), i.e. automatically > substitutes string of prepared statement to be executed. Statements are > held in prepared statement cache - for details see previous commit. > After schema changes all prepared statement located in cache are > considered to be expired - they are re-prepared automatically on demand. > It is worth noting that box.execute() always attempts at finding > statement to be executed in prepared statement cache. Thus, once statement > is prepared, both box.execute() and :execute() methods will execute > already compiled statement. > > SQL cache memory limit is regulated by box{sql_cache_size} which can be > set dynamically. Setting it to 0 completely erases cache. 1. The problem with cache size is that you gave users a way to limit it, but didn't give a method to calculate how much memory do they need for their statements. For example, a user knows what statements he wants in the cache. How can he estimate the cache size for the these statements? Does he need 100 bytes per statement, 1KB, 100KB? We either need a theoretical description how to calculate it, even very rough, or maybe return a statement estimated size on box.prepare(). Then a user at least can do some experiments to estimate cache size. Or you can add statistics to box.info about how many SQL statements are prepared, and how much memory is occupied in the cache. That is even better I think. In box.info.sql. Maybe that is a good thing to do in a separate ticket. Don't know. > diff --git a/src/box/execute.c b/src/box/execute.c > index d2a999099..2b5a9ba90 100644 > --- a/src/box/execute.c > +++ b/src/box/execute.c> @@ -411,6 +412,59 @@ port_sql_dump_msgpack(struct port *port, struct obuf *out) > return 0; > } > > +static bool > +sql_stmt_check_schema_version(struct sql_stmt *stmt) 2. 'Const' struct sql_stmt, please. > +{ > + return sql_stmt_schema_version(stmt) == box_schema_version(); > +} > + > +static int > +sql_reprepare(struct stmt_cache_node **node) > +{ > + struct sql_stmt *stmt = (*node)->stmt; > + const char *sql_str = sql_stmt_query_str(stmt); > + struct sql_stmt *fresh_stmt; > + if (sql_compile(sql_str, strlen(sql_str), NULL, > + &fresh_stmt, NULL) != 0) > + return -1; > + sql_prepared_stmt_cache_delete(*node); > + if (sql_prepared_stmt_cache_insert(fresh_stmt) != 0) { > + sql_finalize(fresh_stmt); > + return -1; > + } > + sql_str = sql_stmt_query_str(fresh_stmt); > + *node = sql_prepared_stmt_cache_find(sql_str); > + return 0; > +} > + > +int > +sql_prepare(const char *sql, int len, struct port *port) > +{ > + struct stmt_cache_node *stmt_node = sql_prepared_stmt_cache_find(sql); > + struct sql_stmt *stmt; > + if (stmt_node == NULL) { > + if (sql_compile(sql, len, NULL, &stmt, NULL) != 0) > + return -1; > + if (sql_prepared_stmt_cache_insert(stmt) != 0) { > + sql_finalize(stmt); > + return -1; > + } > + } else { > + if (! sql_stmt_check_schema_version(stmt_node->stmt)) { > + if (sql_reprepare(&stmt_node) != 0) > + return -1; > + } else { > + sql_cache_stmt_refresh(stmt_node); > + } > + stmt = stmt_node->stmt; > + } > + enum sql_dump_format dump_format = sql_column_count(stmt) > 0 ? > + DQL_PREPARE : DML_PREPARE; > + port_sql_create(port, stmt, dump_format, PREPARE); > + 3. So, there is no any reference counting as I see? Do you plan to add it? Because as you fairly mentioned, in that implementation a statement, prepared in 100 different sessions, gets evicted the same as a statement prepared only once. However with reference counting there would be another problem: how to increment it, actually? Each time prepare() is called explicitly? And do nothing when only execute() is called? And what to do when something needs to be evicted? > + return 0; > +} > + > /** > * Execute prepared SQL statement. > * > @@ -448,11 +502,47 @@ sql_execute(struct sql_stmt *stmt, struct port *port, struct region *region) > int > sql_prepare_and_execute(const char *sql, int len, const struct sql_bind *bind, > uint32_t bind_count, struct port *port, > struct region *region) > { > + struct stmt_cache_node *stmt_node = sql_prepared_stmt_cache_find(sql); > + if (stmt_node != NULL) { 4. In case the node is NULL, and it was prepared sometime before that, and was not unprepared, then a user does not have a way how to understand, that a new PREPARE is needed. That makes impossible to rely on any long living 'prepare' handle on a client side. A client either needs to call prepare() before doing multiple execute() just in case, or we need to return something to signal whether the execute() has hit the cache. For example, a flag IPROTO_SQL_CACHE_HIT: boolean. Or something like that. Or send a flag which you lead to autoreprepare. I don't know. But certainly it should be left as is. Probably that is a subject for a separate ticket. > + if (! sql_stmt_check_schema_version(stmt_node->stmt)) { > + if (sql_reprepare(&stmt_node) != 0) > + return -1; > + } > + if (! sql_stmt_busy(stmt_node->stmt)) { > + return sql_execute_prepared(stmt_node->stmt, bind, > + bind_count, port, region); > + } > + } > diff --git a/src/box/lua/execute.c b/src/box/lua/execute.c > index 1b2f8d235..8d46399d4 100644 > --- a/src/box/lua/execute.c > +++ b/src/box/lua/execute.c > @@ -82,6 +146,55 @@ port_sql_dump_lua(struct port *port, struct lua_State *L, bool is_flat) > } > break; > } > + case DQL_PREPARE: { > + /* Format is following: 5. Lets better stick to the one approved code style: '/*' is on a separate line always. Here and in the next 'case:'. And now I see, why did you introduce dump format enum. But I don't know how to comment it yet. I think, on a next iteration after the other fixes I will say more. > diff --git a/test/sql/prepared.test.lua b/test/sql/prepared.test.lua > new file mode 100644 > index 000000000..24e49832f > --- /dev/null > +++ b/test/sql/prepared.test.lua > @@ -0,0 +1,196 @@ > +s = prepare("CREATE TRIGGER tr1 INSERT ON test1 FOR EACH ROW BEGIN DELETE FROM test1; END;") > +execute(s.sql_str) > +execute(s.sql_str) > + > +s = prepare("DROP TRIGGER tr1;") > +execute(s.sql_str) > +execute(s.sql_str) > + > +s = prepare("DROP TABLE test1;") > +execute(s.sql_str) > +execute(s.sql_str) 6. I saw Kostja asked about why do you increment schema version on some new system space updates, such as _trigger. There is why - we generate DROP TABLE as manual deletion of all space-related objects, and then of the space itself. Space related objects include triggers, ck, fk. And they are taken from struct space during compilation. That is, if you won't increment schema version on trigger/ck/fk update, a prepared DROP TABLE won't take the changes into account and won't be regenerated. This test fails in case you don't update schema version on _trigger update, right? Talking of all the 'generate DROP TABLE by struct space content' idea - I don't like it. But this is how it works now. I future we will need to open an iterator on the needed spaces at runtime. Because lookup in struct space won't be available on client side anyway. > +f1 = fiber.new(implicit_yield) > +f2 = fiber.new(implicit_yield) > +f1:set_joinable(true) > +f2:set_joinable(true) > + > +f1:join(); > +f2:join(); > + > +test_run:cmd("setopt delimiter ''"); > + > +box.space.TEST:drop() > +box.schema.func.drop('SLEEP') > 7. On the whole, that is probably the best thing happened to SQL in Tarantool for a very long time.