From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D5BFB2949B for ; Thu, 22 Mar 2018 09:09:31 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6A_SY85q_AQG for ; Thu, 22 Mar 2018 09:09:31 -0400 (EDT) Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 1F01C29498 for ; Thu, 22 Mar 2018 09:09:30 -0400 (EDT) Received: by smtp42.i.mail.ru with esmtpa (envelope-from ) id 1eyzyC-0002IJ-VC for tarantool-patches@freelists.org; Thu, 22 Mar 2018 16:09:29 +0300 From: "v.shpilevoy@tarantool.org" Content-Type: multipart/alternative; boundary="Apple-Mail=_D1D7BF84-486B-46BD-A423-49E8F45AF407" Mime-Version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\)) Subject: [tarantool-patches] Re: [PATCH 2/4] sql: introduce opcodes to operate on system spaces Date: Thu, 22 Mar 2018 16:09:27 +0300 References: <6AAB3A70-E619-4232-AFC9-5EAB27D03961@tarantool.org> <430BF95D-84E2-4A63-915D-A079C5E97285@tarantool.org> In-Reply-To: <430BF95D-84E2-4A63-915D-A079C5E97285@tarantool.org> Message-Id: Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org --Apple-Mail=_D1D7BF84-486B-46BD-A423-49E8F45AF407 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > 22 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2018 =D0=B3., =D0=B2 15:23, n.pettik = =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0=B0= ): >=20 >=20 >> On 22 Mar 2018, at 14:42, v.shpilevoy@tarantool.org = wrote: >>=20 >> See below 6 comments. >>=20 >>> 21 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2018 =D0=B3., =D0=B2 2:48, Nikita = Pettik > = =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0=B0): >>>=20 >>> Since it is impossible to use space pointers during execution of DDL >>> (after any DDL schema may change and pointers fetched at compile = time >>> can become expired), special opcodes to operate on system spaces = have >>> been introduced: OP_SInsert and OP_SDelete. They take space id as >>> an argument and make space lookup each time before insertion or = deletion. >>> However, sometimes it is still required to iterate through space = (e.g. >>> to satisfy WHERE clause) during DDL. As far as now cursors rely on >>> pointers to space, it is required to convert space id to space = pointer >>> during VDBE execution. Hence, another opcode is added: OP_SIDtoPtr. >>> Finally, existing opcodes, which are invoked only during DDL, have = been >>> also refactored. >>>=20 >>> Part of #3252 >>> --- >>> src/box/sql.c | 51 ++++++++++++------- >>> src/box/sql/opcodes.c | 45 +++++++++-------- >>> src/box/sql/opcodes.h | 53 ++++++++++---------- >>> src/box/sql/tarantoolInt.h | 11 ++--- >>> src/box/sql/vdbe.c | 121 = ++++++++++++++++++++++++++++++++++----------- >>> 5 files changed, 182 insertions(+), 99 deletions(-) >>>=20 >>> diff --git a/src/box/sql.c b/src/box/sql.c >>> index e943131ae..db4c7e7ea 100644 >>> --- a/src/box/sql.c >>> +++ b/src/box/sql.c >>> +int >>> +sql_delete_by_key(struct space *space, char *key, uint32_t = key_size) >>> +{ >>> struct request request; >>> memset(&request, 0, sizeof(request)); >>> request.type =3D IPROTO_DELETE; >>> request.key =3D key; >>> request.key_end =3D key + key_size; >>> - request.space_id =3D pCur->space->def->id; >>> - rc =3D sql_execute_dml(&request, pCur->space); >>> + request.space_id =3D space->def->id; >>> + int rc =3D sql_execute_dml(&request, space); >>=20 >> 1. Why do you need sql_execute_dml? It does exactly the same as = process_rw, but process_rw can also return a tuple. And this process_rw = feature is needed for the patch on last inserted tuple returning. >=20 > I need it as an analogue of process_rw(), since process_rw() is = unavailable=20 > via public BOX interface (declared as static). If you approve to make = it public, > where should I place it and should I rename it? > Or, for instance, make sql_execute_dml be complete copy of = process_rw()? > It is unlikely to be useful somewhere except for SQL (and box = internals surely). > Initial indent was to reduce overhead while calling box functions from = SQL bindings (sql.c): > ordinary call stack looks like: box_insert() -> box_process1() -> = space lookup + process_rw(). > In such scheme we also process redundant space lookup since we already = have struct space * in cursor. > (Patch where I introduced this function isn=E2=80=99t in trunk, so I = can easily fix it). Yes, I very like the removal of box_process1. I just do not like code = duplicate. But also I can not see a place for declaration of process_rw = in any header. So lets make it extern "C" non-static in box.cc, and = declare it as extern in sql.c. >=20 >>>=20 >>> @@ -1088,20 +1107,16 @@ int tarantoolSqlite3IncrementMaxid(BtCursor = *pCur) >>> request.ops =3D ops; >>> request.ops_end =3D ops + sizeof(ops); >>> request.type =3D IPROTO_UPSERT; >>> - request.space_id =3D pCur->space->def->id; >>> - rc =3D sql_execute_dml(&request, pCur->space); >>> + request.space_id =3D BOX_SCHEMA_ID; >>> + rc =3D sql_execute_dml(&request, space_schema); >>=20 >> 2. If you use process_rw, then you do not need iterator to get the = previous max_id and increment it - you just do process_rw, get result = and do tuple_field_u64(result, 1, space_max_id); Suppose, this will be fixed, if you will use process_rw, as described = above. >>=20 >>> if (rc !=3D 0) { >>> iterator_delete(it); >>> return SQL_TARANTOOL_ERROR; >>> } >>> - if (pCur->last_tuple !=3D NULL) { >>> - box_tuple_unref(pCur->last_tuple); >>> - } >>> - box_tuple_ref(res); >>> - pCur->last_tuple =3D res; >>> - pCur->eState =3D CURSOR_VALID; >>> + rc =3D tuple_field_u64(res, 1, space_max_id); >>> + (*space_max_id)++; >>> iterator_delete(it); >>> - return SQLITE_OK; >>> + return rc =3D=3D 0 ? SQLITE_OK : SQL_TARANTOOL_ERROR; >>> } >>>=20 >>> /* >>> @@ -1687,14 +1702,12 @@ int = tarantoolSqlite3EphemeralGetMaxId(BtCursor *pCur, uint32_t fieldno, >>> * If index is empty - return 0 in max_id and success status >>> */ >>> int >>> -tarantoolSqlGetMaxId(BtCursor *cur, uint32_t fieldno, >>> - uint64_t *max_id) >>> +tarantoolSqlGetMaxId(uint32_t space_id, uint64_t *max_id) >>=20 >> 3. After changing a function signature, please, fix a comments too. = And why this function is sure, that the max id stored in the first = column? And why it can not be used for secondary indexes? >=20 > It might, but it is only applied for _sequence space during DDL = routine > (and this is also answer for the first question: format of system = space doesn=E2=80=99t change frequently), > so I decided to remove cursor/index overhead (as a part of overall DDL = refactoring). > Probably, I should also rename this opcode to make it more clear. Yes, lets rename it, and remove space_id argument, if it is always used = for _sequence space. >>>=20 >>> +/* Opcode: SInsert P1 P2 * * P5 >>> + * Synopsis: space id =3D P1, key =3D r[P2] >>> + * >>> + * This opcode is used only during DML routine. >>=20 >> 4. Do you mean DDL instead of DML? Hm? >>=20 >>> + * In contrast to ordinary insertion, insertion to system spaces >>> + * such as _space or _index will lead to schema changes. >>> + * Thus, usage of space pointers is going to be impossible, >>> + * as far as pointers can be expired since compilation time. >>> + * >>> + * If P5 is set to OPFLAG_NCHANGE, account overall changes >>> + * made to database. >>> + */ >>> +case OP_SInsert: { >>> + assert(pOp->p1 > 0); >>> + >>> +/* Opcode: SIDtoPtr P1 P2 * * * >>> + * Synopsis: space id =3D P1, space[out] =3D r[P2] >>> + * >>> + * This opcode makes look up by space id and save found space >>> + * into register, specified by the content of register P2. >>> + * Such trick is needed during DML routine, since schema may >>> + * change and pointers become expired. >>> + */ >>> +case OP_SIDtoPtr: { >>=20 >> 6. It seems to be unused in the patch. Can you use it for all these = things, introduced by the previous patch? >> struct space *space =3D = space_by_id(SQLITE_PAGENO_TO_SPACEID(pIdx->tnum)); >> assert(space !=3D NULL); >> sqlite3VdbeAddOp4Ptr(v, OP_LoadPtr, 0, space_ptr_reg, 0, (void *) = space); >> sqlite3VdbeAddOp3(v, OP_OpenRead, iIdxCur, pIdx->tnum, = space_ptr_reg); >>=20 >> Using OP_SIDtoPtr you already in this patch can defer space_by_id = lookup. >=20 > It is used in the next patch. I don=E2=80=99t want to make a lot of = diffs, since introducing of any opcodes > leads to auto-generating of opcodes.c/h and as a result to vast amount = of redundant diff. Ok, then lets leave it in this patch. > As for your example, during SQL DDL we need to execute queries such = as: > 'DELETE FROM BOX_SEQUENCE WHERE name =3D zName;=E2=80=99 > And =E2=80=98name=E2=80=99 is not a key. Hence, we have to open cursor = and iterate through the space. > But if we used pointers to spaces which were fetched at compile state, > they would become expired at VDBE runtime: > after first stage of DDL (e.g. deletion from _trigger), schema may = change. > Moreover, we are going to support syntax like: =E2=80=98CREATE TABLE = AS SELECT =E2=80=A6=E2=80=99, > where DDL and DML are mixing. Thus we need facility to make space = lookup at VDBE runtime. I agree, that we must have this opcode, but why it can not be used for = DML too? In my example you at compile time get space *, store it by = OP_LoadPtr, and then use to open iterator (but I can not see where - I = commented this in a previous patch review), but for DML you can replace = compile lookups on runtime lookups using OP_SIDtoPtr. And it also solves = the problem, that a prepared VDBE statement (which in a future we will = store for a long time) must store space * in opcodes - we can use = OP_SIDtoPtr and do not store it until execution. Sorry, can not explain = more clear. We can also discuss it face-to-face. >=20 --Apple-Mail=_D1D7BF84-486B-46BD-A423-49E8F45AF407 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

22 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2018 =D0=B3., =D0=B2 15:23, = n.pettik <korablev@tarantool.org> =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0= =B0=D0=BB(=D0=B0):


On 22 Mar 2018, at 14:42, v.shpilevoy@tarantool.org wrote:

See below 6 comments.
21 =D0=BC=D0=B0=D1=80=D1=82=D0= =B0 2018 =D0=B3., =D0=B2 2:48, Nikita Pettik <korablev@tarantool.org> =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0= =B0=D0=BB(=D0=B0):

Since it is impossible = to use space pointers during execution of DDL
(after any = DDL schema may change and pointers fetched at compile time
can become expired), special opcodes to operate on system = spaces have
been introduced: OP_SInsert and OP_SDelete. = They take space id as
an argument and make space lookup = each time before insertion or deletion.
However, sometimes = it is still required to iterate through space (e.g.
to = satisfy WHERE clause) during DDL. As far as now cursors rely on
pointers to space, it is required to convert space id to = space pointer
during VDBE execution. Hence, another opcode = is added: OP_SIDtoPtr.
Finally, existing opcodes, which = are invoked only during DDL, have been
also refactored.

Part of #3252
---
src/box/sql.c =             &n= bsp;|  51 ++++++++++++-------
src/box/sql/opcodes.c =      |  45 +++++++++--------
src/box/sql/opcodes.h      | =  53 ++++++++++----------
src/box/sql/tarantoolInt.h | =  11 ++---
src/box/sql/vdbe.c =         | 121 = ++++++++++++++++++++++++++++++++++-----------
5 files = changed, 182 insertions(+), 99 deletions(-)

diff --git a/src/box/sql.c b/src/box/sql.c
index = e943131ae..db4c7e7ea 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
+int
+sql_delete_by_key(struct space *space, char *key, uint32_t = key_size)
+{
struct request request;
= memset(&request, 0, sizeof(request));
= request.type =3D IPROTO_DELETE;
= request.key =3D key;
request.key_end =3D key + = key_size;
- request.space_id =3D = pCur->space->def->id;
- rc =3D = sql_execute_dml(&request, pCur->space);
+ = request.space_id =3D space->def->id;
+ int rc =3D = sql_execute_dml(&request, space);

1. Why do you need sql_execute_dml? It does = exactly the same as process_rw, but process_rw can also return a tuple. = And this process_rw feature is needed for the patch on last inserted = tuple returning.

I need it as an analogue = of process_rw(), since process_rw() is unavailable 
via public BOX interface (declared as static). If you approve = to make it public,
where should I place it and = should I rename it?
Or, for instance, make = sql_execute_dml be complete copy of process_rw()?
It = is unlikely to be useful somewhere except for SQL (and box internals = surely).
Initial indent was to reduce overhead = while calling box functions from SQL bindings (sql.c):
ordinary call stack looks like: box_insert() -> = box_process1() -> space lookup + process_rw().
In = such scheme we also process redundant space lookup since we already have = struct space * in cursor.
(Patch where I introduced = this function isn=E2=80=99t in trunk, so I can easily fix = it).

Yes, I very like the removal of box_process1. I = just do not like code duplicate. But also I can not see a place for = declaration of process_rw in any header. So lets make it extern "C" = non-static in box.cc, and = declare it as extern in sql.c.



@@ -1088,20 +1107,16 @@ int = tarantoolSqlite3IncrementMaxid(BtCursor *pCur)
= request.ops =3D ops;
request.ops_end =3D ops + = sizeof(ops);
request.type =3D = IPROTO_UPSERT;
- request.space_id =3D = pCur->space->def->id;
- rc =3D = sql_execute_dml(&request, pCur->space);
+ = request.space_id =3D BOX_SCHEMA_ID;
+ rc =3D = sql_execute_dml(&request, space_schema);

2. If you use process_rw, then you do not = need iterator to get the previous max_id and increment it - you just do = process_rw, get result and do tuple_field_u64(result, 1, = space_max_id);

Suppose, this will be fixed, if you will use = process_rw, as described above.


if (rc !=3D 0) {
= = iterator_delete(it);
return SQL_TARANTOOL_ERROR;
= }
- if (pCur->last_tuple !=3D = NULL) {
- = box_tuple_unref(pCur->last_tuple);
- }
- = box_tuple_ref(res);
- pCur->last_tuple =3D res;
- = pCur->eState =3D CURSOR_VALID;
+ rc =3D = tuple_field_u64(res, 1, space_max_id);
+ = (*space_max_id)++;
iterator_delete(it);
- = return SQLITE_OK;
+ return rc  =3D=3D 0 ? = SQLITE_OK : SQL_TARANTOOL_ERROR;
}

/*
@@ -1687,14 +1702,12 @@ int = tarantoolSqlite3EphemeralGetMaxId(BtCursor *pCur, uint32_t fieldno,
* If index is empty - return 0 in max_id and success = status
*/
int
-tarantoolSqlGetMaxId(BtCursor *cur, uint32_t fieldno,
- = =      uint6= 4_t *max_id)
+tarantoolSqlGetMaxId(uint32_t space_id, = uint64_t *max_id)

3. After changing a function = signature, please, fix a comments too. And why this function is sure, = that the max id stored in the first column? And why it can not be used = for secondary indexes?

It might, but it is only applied for _sequence space during = DDL routine
(and this is also answer for the first = question: format of system space doesn=E2=80=99t change = frequently),
so I decided to remove cursor/index = overhead (as a part of overall DDL refactoring).
Probably, I should also rename this opcode to make it more = clear.

Yes, lets rename it, and remove space_id argument, = if it is always used for _sequence space.

+/* Opcode: = SInsert P1 P2 * * P5
+ * Synopsis: space id =3D P1, key =3D = r[P2]
+ *
+ * This opcode is used only = during DML routine.

4. Do you mean DDL instead of = DML?

Hm?


+ * In contrast to ordinary insertion, insertion to system = spaces
+ * such as _space or _index will lead to schema = changes.
+ * Thus, usage of space pointers is going to be = impossible,
+ * as far as pointers can be expired since = compilation time.
+ *
+ * If P5 is set to = OPFLAG_NCHANGE, account overall changes
+ * made to = database.
+ */
+case OP_SInsert: {
+ = assert(pOp->p1 > 0);
+
+/* Opcode: SIDtoPtr P1 P2 * * *
+ * Synopsis: space id =3D P1, space[out] =3D r[P2]
+ *
+ * This opcode makes look up by space id = and save found space
+ * into register, specified by the = content of register P2.
+ * Such trick is needed during = DML routine, since schema may
+ * change and pointers = become expired.
+ */
+case OP_SIDtoPtr: {

6. It seems to be unused in the patch. = Can you use it for all these things, introduced by the previous = patch?
struct space *space =3D = space_by_id(SQLITE_PAGENO_TO_SPACEID(pIdx->tnum));
assert(space !=3D NULL);
sqlite3VdbeAddOp4Ptr(v, OP_LoadPtr, 0, = space_ptr_reg, 0, (void *) space);
sqlite3VdbeAddOp3(v, = OP_OpenRead, iIdxCur, pIdx->tnum, space_ptr_reg);
Using OP_SIDtoPtr you already in this patch can = defer space_by_id lookup.

It is used in the next patch. I don=E2=80=99t want to make a = lot of diffs, since introducing of any opcodes
leads = to auto-generating of opcodes.c/h and as a result to vast amount of = redundant diff.

Ok, then lets leave it in this patch.

As for your example, during SQL DDL we need to execute = queries such as:
'DELETE FROM BOX_SEQUENCE WHERE = name =3D zName;=E2=80=99
And =E2=80=98name=E2=80=99 = is not a key. Hence, we have to open cursor and iterate through the = space.
But if we used pointers to spaces which were = fetched at compile state,
they would become expired = at VDBE runtime:
after first stage of DDL (e.g. = deletion from _trigger), schema may change.
Moreover,= we are going to support syntax like: =E2=80=98CREATE TABLE AS SELECT = =E2=80=A6=E2=80=99,
where DDL and DML are mixing. = Thus we need facility to make space lookup at VDBE = runtime.

I agree, that we must have this opcode, but why it = can not be used for DML too? In my example you at compile time get space = *, store it by OP_LoadPtr, and then use to open iterator (but I can not see = where - I commented this in a previous patch review), but for DML you = can replace compile lookups on runtime lookups = using OP_SIDtoPtr. And it also solves the problem, that a = prepared VDBE statement (which in a future we will store for a long = time) must store space * in opcodes - we can use OP_SIDtoPtr and do not = store it until execution. Sorry, can not explain more clear. We can also = discuss it face-to-face.



= --Apple-Mail=_D1D7BF84-486B-46BD-A423-49E8F45AF407--