[Tarantool-patches] [PATCH v2 15/16] box: introduce prepared statements

Sat Dec 7 02:18:05 MSK 2019

Thanks for the patch!

See 7 comments below.

On 20/11/2019 22:28, Nikita Pettik wrote:
> This patch introduces local prepared statements. Support of prepared
> statements in IProto protocol and netbox is added in the next patch.
> 
> Prepared statement is an opaque instance of SQL Virtual Machine. It can
> be executed several times without necessity of query recompilation. To
> achieve this one can use box.prepare(...) function. It takes string of
> SQL query to be prepared; returns extended set of meta-information
> including statement's string, parameter's types and names, types and
> names of columns of the resulting set, count of parameters to be bound.
> Lua object representing result of :prepare() invocation also features method
> :execute(). It corresponds to box.execute(stmt.sql_str), i.e. automatically
> substitutes string of prepared statement to be executed. Statements are
> held in prepared statement cache - for details see previous commit.
> After schema changes all prepared statement located in cache are
> considered to be expired - they are re-prepared automatically on demand.
> It is worth noting that box.execute() always attempts at finding
> statement to be executed in prepared statement cache. Thus, once statement
> is prepared, both box.execute() and :execute() methods will execute
> already compiled statement.
> 
> SQL cache memory limit is regulated by box{sql_cache_size} which can be
> set dynamically. Setting it to 0 completely erases cache.

1. The problem with cache size is that you gave users a way to
limit it, but didn't give a method to calculate how much memory
do they need for their statements. For example, a user knows
what statements he wants in the cache. How can he estimate the
cache size for the these statements? Does he need 100 bytes per
statement, 1KB, 100KB?

We either need a theoretical description how to calculate it,
even very rough, or maybe return a statement estimated size
on box.prepare(). Then a user at least can do some experiments
to estimate cache size.

Or you can add statistics to box.info about how many SQL
statements are prepared, and how much memory is occupied in
the cache. That is even better I think. In box.info.sql.

Maybe that is a good thing to do in a separate ticket. Don't
know.

> diff --git a/src/box/execute.c b/src/box/execute.c
> index d2a999099..2b5a9ba90 100644
> --- a/src/box/execute.c
> +++ b/src/box/execute.c> @@ -411,6 +412,59 @@ port_sql_dump_msgpack(struct port *port, struct obuf *out)
>  	return 0;
>  }
>  
> +static bool
> +sql_stmt_check_schema_version(struct sql_stmt *stmt)

2. 'Const' struct sql_stmt, please.

> +{
> +	return sql_stmt_schema_version(stmt) == box_schema_version();
> +}
> +
> +static int
> +sql_reprepare(struct stmt_cache_node **node)
> +{
> +	struct sql_stmt *stmt = (*node)->stmt;
> +	const char *sql_str = sql_stmt_query_str(stmt);
> +	struct sql_stmt *fresh_stmt;
> +	if (sql_compile(sql_str, strlen(sql_str), NULL,
> +			&fresh_stmt, NULL) != 0)
> +		return -1;
> +	sql_prepared_stmt_cache_delete(*node);
> +	if (sql_prepared_stmt_cache_insert(fresh_stmt) != 0) {
> +		sql_finalize(fresh_stmt);
> +		return -1;
> +	}
> +	sql_str = sql_stmt_query_str(fresh_stmt);
> +	*node = sql_prepared_stmt_cache_find(sql_str);
> +	return 0;
> +}
> +
> +int
> +sql_prepare(const char *sql, int len, struct port *port)
> +{
> +	struct stmt_cache_node *stmt_node = sql_prepared_stmt_cache_find(sql);
> +	struct sql_stmt *stmt;
> +	if (stmt_node == NULL) {
> +		if (sql_compile(sql, len, NULL, &stmt, NULL) != 0)
> +			return -1;
> +		if (sql_prepared_stmt_cache_insert(stmt) != 0) {
> +			sql_finalize(stmt);
> +			return -1;
> +		}
> +	} else {
> +		if (! sql_stmt_check_schema_version(stmt_node->stmt)) {
> +			if (sql_reprepare(&stmt_node) != 0)
> +				return -1;
> +		} else {
> +			sql_cache_stmt_refresh(stmt_node);
> +		}
> +		stmt = stmt_node->stmt;
> +	}
> +	enum sql_dump_format dump_format = sql_column_count(stmt) > 0 ?
> +					   DQL_PREPARE : DML_PREPARE;
> +	port_sql_create(port, stmt, dump_format, PREPARE);
> +

3. So, there is no any reference counting as I see?
Do you plan to add it? Because as you fairly mentioned,
in that implementation a statement, prepared in 100
different sessions, gets evicted the same as a statement
prepared only once.

However with reference counting there would be another
problem: how to increment it, actually? Each time
prepare() is called explicitly? And do nothing when only
execute() is called? And what to do when something needs
to be evicted?

> +	return 0;
> +}
> +
>  /**
>   * Execute prepared SQL statement.
>   *
> @@ -448,11 +502,47 @@ sql_execute(struct sql_stmt *stmt, struct port *port, struct region *region)
>  int
>  sql_prepare_and_execute(const char *sql, int len, const struct sql_bind *bind,
>  			uint32_t bind_count, struct port *port,
>  			struct region *region)
>  {
> +	struct stmt_cache_node *stmt_node = sql_prepared_stmt_cache_find(sql);
> +	if (stmt_node != NULL) {

4. In case the node is NULL, and it was prepared sometime before
that, and was not unprepared, then a user does not have a way how
to understand, that a new PREPARE is needed. That makes impossible
to rely on any long living 'prepare' handle on a client side.

A client either needs to call prepare() before doing multiple
execute() just in case, or we need to return something to signal
whether the execute() has hit the cache. For example, a flag
IPROTO_SQL_CACHE_HIT: boolean. Or something like that.

Or send a flag which you lead to autoreprepare. I don't know. But
certainly it should be left as is. Probably that is a subject for
a separate ticket.

> +		if (! sql_stmt_check_schema_version(stmt_node->stmt)) {
> +			if (sql_reprepare(&stmt_node) != 0)
> +				return -1;
> +		}
> +		if (! sql_stmt_busy(stmt_node->stmt)) {
> +			return sql_execute_prepared(stmt_node->stmt, bind,
> +						    bind_count, port, region);
> +		}
> +	}
> diff --git a/src/box/lua/execute.c b/src/box/lua/execute.c
> index 1b2f8d235..8d46399d4 100644
> --- a/src/box/lua/execute.c
> +++ b/src/box/lua/execute.c
> @@ -82,6 +146,55 @@ port_sql_dump_lua(struct port *port, struct lua_State *L, bool is_flat)
>  		}
>  		break;
>  	}
> +	case DQL_PREPARE: {
> +		/* Format is following:

5. Lets better stick to the one approved code
style: '/*' is on a separate line always. Here
and in the next 'case:'.

And now I see, why did you introduce dump format
enum. But I don't know how to comment it yet.
I think, on a next iteration after the other
fixes I will say more.

> diff --git a/test/sql/prepared.test.lua b/test/sql/prepared.test.lua
> new file mode 100644
> index 000000000..24e49832f
> --- /dev/null
> +++ b/test/sql/prepared.test.lua
> @@ -0,0 +1,196 @@
> +s = prepare("CREATE TRIGGER tr1 INSERT ON test1 FOR EACH ROW BEGIN DELETE FROM test1; END;")
> +execute(s.sql_str)
> +execute(s.sql_str)
> +
> +s = prepare("DROP TRIGGER tr1;")
> +execute(s.sql_str)
> +execute(s.sql_str)
> +
> +s = prepare("DROP TABLE test1;")
> +execute(s.sql_str)
> +execute(s.sql_str)

6. I saw Kostja asked about why do you increment
schema version on some new system space updates,
such as _trigger.

There is why - we generate DROP TABLE as manual
deletion of all space-related objects, and then of
the space itself. Space related objects include
triggers, ck, fk. And they are taken from struct
space during compilation. That is, if you won't
increment schema version on trigger/ck/fk update,
a prepared DROP TABLE won't take the changes into
account and won't be regenerated.

This test fails in case you don't update schema
version on _trigger update, right?

Talking of all the 'generate DROP TABLE by struct
space content' idea - I don't like it. But this
is how it works now. I future we will need to
open an iterator on the needed spaces at runtime.
Because lookup in struct space won't be available
on client side anyway.

> +f1 = fiber.new(implicit_yield)
> +f2 = fiber.new(implicit_yield)
> +f1:set_joinable(true)
> +f2:set_joinable(true)
> +
> +f1:join();
> +f2:join();
> +
> +test_run:cmd("setopt delimiter ''");
> +
> +box.space.TEST:drop()
> +box.schema.func.drop('SLEEP')
> 

7. On the whole, that is probably the best thing happened to SQL
in Tarantool for a very long time.