[tarantool-patches] Re: [PATCH 2/4] sql: introduce opcodes to operate on system spaces

v.shpilevoy at tarantool.org v.shpilevoy at tarantool.org
Thu Mar 22 16:09:27 MSK 2018



> 22 марта 2018 г., в 15:23, n.pettik <korablev at tarantool.org> написал(а):
> 
> 
>> On 22 Mar 2018, at 14:42, v.shpilevoy at tarantool.org <mailto:v.shpilevoy at tarantool.org> wrote:
>> 
>> See below 6 comments.
>> 
>>> 21 марта 2018 г., в 2:48, Nikita Pettik <korablev at tarantool.org <mailto:korablev at tarantool.org>> написал(а):
>>> 
>>> Since it is impossible to use space pointers during execution of DDL
>>> (after any DDL schema may change and pointers fetched at compile time
>>> can become expired), special opcodes to operate on system spaces have
>>> been introduced: OP_SInsert and OP_SDelete. They take space id as
>>> an argument and make space lookup each time before insertion or deletion.
>>> However, sometimes it is still required to iterate through space (e.g.
>>> to satisfy WHERE clause) during DDL. As far as now cursors rely on
>>> pointers to space, it is required to convert space id to space pointer
>>> during VDBE execution. Hence, another opcode is added: OP_SIDtoPtr.
>>> Finally, existing opcodes, which are invoked only during DDL, have been
>>> also refactored.
>>> 
>>> Part of #3252
>>> ---
>>> src/box/sql.c              |  51 ++++++++++++-------
>>> src/box/sql/opcodes.c      |  45 +++++++++--------
>>> src/box/sql/opcodes.h      |  53 ++++++++++----------
>>> src/box/sql/tarantoolInt.h |  11 ++---
>>> src/box/sql/vdbe.c         | 121 ++++++++++++++++++++++++++++++++++-----------
>>> 5 files changed, 182 insertions(+), 99 deletions(-)
>>> 
>>> diff --git a/src/box/sql.c b/src/box/sql.c
>>> index e943131ae..db4c7e7ea 100644
>>> --- a/src/box/sql.c
>>> +++ b/src/box/sql.c
>>> +int
>>> +sql_delete_by_key(struct space *space, char *key, uint32_t key_size)
>>> +{
>>> 	struct request request;
>>> 	memset(&request, 0, sizeof(request));
>>> 	request.type = IPROTO_DELETE;
>>> 	request.key = key;
>>> 	request.key_end = key + key_size;
>>> -	request.space_id = pCur->space->def->id;
>>> -	rc = sql_execute_dml(&request, pCur->space);
>>> +	request.space_id = space->def->id;
>>> +	int rc = sql_execute_dml(&request, space);
>> 
>> 1. Why do you need sql_execute_dml? It does exactly the same as process_rw, but process_rw can also return a tuple. And this process_rw feature is needed for the patch on last inserted tuple returning.
> 
> I need it as an analogue of process_rw(), since process_rw() is unavailable 
> via public BOX interface (declared as static). If you approve to make it public,
> where should I place it and should I rename it?
> Or, for instance, make sql_execute_dml be complete copy of process_rw()?
> It is unlikely to be useful somewhere except for SQL (and box internals surely).
> Initial indent was to reduce overhead while calling box functions from SQL bindings (sql.c):
> ordinary call stack looks like: box_insert() -> box_process1() -> space lookup + process_rw().
> In such scheme we also process redundant space lookup since we already have struct space * in cursor.
> (Patch where I introduced this function isn’t in trunk, so I can easily fix it).

Yes, I very like the removal of box_process1. I just do not like code duplicate. But also I can not see a place for declaration of process_rw in any header. So lets make it extern "C" non-static in box.cc, and declare it as extern in sql.c.

> 
>>> 
>>> @@ -1088,20 +1107,16 @@ int tarantoolSqlite3IncrementMaxid(BtCursor *pCur)
>>> 	request.ops = ops;
>>> 	request.ops_end = ops + sizeof(ops);
>>> 	request.type = IPROTO_UPSERT;
>>> -	request.space_id = pCur->space->def->id;
>>> -	rc = sql_execute_dml(&request, pCur->space);
>>> +	request.space_id = BOX_SCHEMA_ID;
>>> +	rc = sql_execute_dml(&request, space_schema);
>> 
>> 2. If you use process_rw, then you do not need iterator to get the previous max_id and increment it - you just do process_rw, get result and do tuple_field_u64(result, 1, space_max_id);

Suppose, this will be fixed, if you will use process_rw, as described above.

>> 
>>> 	if (rc != 0) {
>>> 		iterator_delete(it);
>>> 		return SQL_TARANTOOL_ERROR;
>>> 	}
>>> -	if (pCur->last_tuple != NULL) {
>>> -		box_tuple_unref(pCur->last_tuple);
>>> -	}
>>> -	box_tuple_ref(res);
>>> -	pCur->last_tuple = res;
>>> -	pCur->eState = CURSOR_VALID;
>>> +	rc = tuple_field_u64(res, 1, space_max_id);
>>> +	(*space_max_id)++;
>>> 	iterator_delete(it);
>>> -	return SQLITE_OK;
>>> +	return rc  == 0 ? SQLITE_OK : SQL_TARANTOOL_ERROR;
>>> }
>>> 
>>> /*
>>> @@ -1687,14 +1702,12 @@ int tarantoolSqlite3EphemeralGetMaxId(BtCursor *pCur, uint32_t fieldno,
>>> * If index is empty - return 0 in max_id and success status
>>> */
>>> int
>>> -tarantoolSqlGetMaxId(BtCursor *cur, uint32_t fieldno,
>>> -		     uint64_t *max_id)
>>> +tarantoolSqlGetMaxId(uint32_t space_id, uint64_t *max_id)
>> 
>> 3. After changing a function signature, please, fix a comments too. And why this function is sure, that the max id stored in the first column? And why it can not be used for secondary indexes?
> 
> It might, but it is only applied for _sequence space during DDL routine
> (and this is also answer for the first question: format of system space doesn’t change frequently),
> so I decided to remove cursor/index overhead (as a part of overall DDL refactoring).
> Probably, I should also rename this opcode to make it more clear.

Yes, lets rename it, and remove space_id argument, if it is always used for _sequence space.
>>> 
>>> +/* Opcode: SInsert P1 P2 * * P5
>>> + * Synopsis: space id = P1, key = r[P2]
>>> + *
>>> + * This opcode is used only during DML routine.
>> 
>> 4. Do you mean DDL instead of DML?

Hm?

>> 
>>> + * In contrast to ordinary insertion, insertion to system spaces
>>> + * such as _space or _index will lead to schema changes.
>>> + * Thus, usage of space pointers is going to be impossible,
>>> + * as far as pointers can be expired since compilation time.
>>> + *
>>> + * If P5 is set to OPFLAG_NCHANGE, account overall changes
>>> + * made to database.
>>> + */
>>> +case OP_SInsert: {
>>> +	assert(pOp->p1 > 0);
>>> +
>>> +/* Opcode: SIDtoPtr P1 P2 * * *
>>> + * Synopsis: space id = P1, space[out] = r[P2]
>>> + *
>>> + * This opcode makes look up by space id and save found space
>>> + * into register, specified by the content of register P2.
>>> + * Such trick is needed during DML routine, since schema may
>>> + * change and pointers become expired.
>>> + */
>>> +case OP_SIDtoPtr: {
>> 
>> 6. It seems to be unused in the patch. Can you use it for all these things, introduced by the previous patch?
>> struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(pIdx->tnum));
>> assert(space != NULL);
>> sqlite3VdbeAddOp4Ptr(v, OP_LoadPtr, 0, space_ptr_reg, 0, (void *) space);
>> sqlite3VdbeAddOp3(v, OP_OpenRead, iIdxCur, pIdx->tnum, space_ptr_reg);
>> 
>> Using OP_SIDtoPtr you already in this patch can defer space_by_id lookup.
> 
> It is used in the next patch. I don’t want to make a lot of diffs, since introducing of any opcodes
> leads to auto-generating of opcodes.c/h and as a result to vast amount of redundant diff.

Ok, then lets leave it in this patch.

> As for your example, during SQL DDL we need to execute queries such as:
> 'DELETE FROM BOX_SEQUENCE WHERE name = zName;’
> And ‘name’ is not a key. Hence, we have to open cursor and iterate through the space.
> But if we used pointers to spaces which were fetched at compile state,
> they would become expired at VDBE runtime:
> after first stage of DDL (e.g. deletion from _trigger), schema may change.
> Moreover, we are going to support syntax like: ‘CREATE TABLE AS SELECT …’,
> where DDL and DML are mixing. Thus we need facility to make space lookup at VDBE runtime.

I agree, that we must have this opcode, but why it can not be used for DML too? In my example you at compile time get space *, store it by OP_LoadPtr, and then use to open iterator (but I can not see where - I commented this in a previous patch review), but for DML you can replace compile lookups on runtime lookups using OP_SIDtoPtr. And it also solves the problem, that a prepared VDBE statement (which in a future we will store for a long time) must store space * in opcodes - we can use OP_SIDtoPtr and do not store it until execution. Sorry, can not explain more clear. We can also discuss it face-to-face.

> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20180322/35503dd9/attachment.html>


More information about the Tarantool-patches mailing list