[server 4/4] Introduce BEFORE trigger

Fri Jan 26 00:17:32 MSK 2018

* Vladimir Davydov <vdavydov.dev at gmail.com> [18/01/23 21:09]:
> 
> Trigger callbacks are executed from space_execute_dml(), right before
> passing down a request to the engine implementation, but after resolving
> the space sequence. Just like on_replace, a before_replace callback is
> passed old and new tuples, but it can also return a tuple or nil, which
> will affect the current statement as follows:
> 
>  - If a callback function returns the old tuple, the statement is
>    ignored and IPROTO_NOP is written to xlog to bump LSN.
> 
>  - If a callback function returns the new tuple or doesn't return
>    anything, the statement is executed as is.

I don't see a test for "doesn't return anything".
Please add tests for all cases in the table we developed today in 
gh-2993, as well as things mentioned in '*' and '**'.

>  - If a callback function returns nil, the statement is turned into
>    DELETE.
> 
>  - If a callback function returns a tuple, the statement is turned
>    into REPLACE for this tuple.
> 
> Other return values result in ER_BEFORE_REPLACE_RET error.
> 
> Note, the trigger must not change the primary key of the old tuple,
> because that would require splitting the resulting statement into two -
> DELETE and REPLACE.
> 
> The new trigger can be used to resolve asynchronous replication
> conflicts as illustrated by replication/before_replace test.
> 
> 
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 680a051f..fc341d38 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -531,6 +531,7 @@ space_def_new_from_tuple(struct tuple *tuple, uint32_t errcode,
>  static void
>  space_swap_triggers(struct space *new_space, struct space *old_space)
>  {
> +	rlist_swap(&new_space->before_replace, &old_space->before_replace);
>  	rlist_swap(&new_space->on_replace, &old_space->on_replace);
>  	rlist_swap(&new_space->on_stmt_begin, &old_space->on_stmt_begin);
>  }
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 20d11cd1..cdb286ff 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -105,7 +105,7 @@ struct errcode_record {
>  	/* 50 */_(ER_CREATE_FUNCTION,		"Failed to create function '%s': %s") \
>  	/* 51 */_(ER_NO_SUCH_FUNCTION,		"Function '%s' does not exist") \
>  	/* 52 */_(ER_FUNCTION_EXISTS,		"Function '%s' already exists") \
> -	/* 53 */_(ER_UNUSED3,			"") \
> +	/* 53 */_(ER_BEFORE_REPLACE_RET,	"Invalid return value of space:before_replace trigger: expected tuple or nil") \

This message can be misleading now that we allow return with no
arguments. What about box.NULL btw, is it invalid?

> +static int
> +lbox_pop_txn_stmt(struct lua_State *L, void *event)
> +{
> +	struct txn_stmt *stmt = txn_current_stmt((struct txn *) event);
> +
> +	if (lua_gettop(L) < 1) {
> +		/* No return value - nothing to do. */
> +		return 0;
> +	}
> +
> +	struct tuple *result = luaT_istuple(L, 1);
> +	if (result == NULL && !lua_isnil(L, 1)) {
> +		/* Invalid return value - ignore. */
> +		diag_set(ClientError, ER_BEFORE_REPLACE_RET);
> +		return -1;
> +	}

Please handle no return value, box.NULL. The error message could
be more clear if you add a placeholder for lua_type() to it.

> +	/* Update the new tuple. */
> +	if (result != NULL)
> +		tuple_ref(result);
> +	if (stmt->new_tuple != NULL)
> +		tuple_unref(stmt->new_tuple);
> +	stmt->new_tuple = result;
> +	return 0;
> +}

The semantics we settled on is very concise indeed.

> +int
> +request_before_replace(struct request *request, struct space *space,
> +		       struct txn *txn)
> +{

I thought we agreed to keep this piece in space.c and rename to
space_before_replace? Perhaps request_handle_sequence has confused
me and you. I thought you were going to split
request_handle_sequence to space part and request part. In other
words, space.c should depend on request.c, but not vice versa.

> +	if (space->index_count == 0) {
> +		/* Empty space, nothing to do. */
> +		return 0;
> +	}
> +
> +	struct region *gc = &fiber()->gc;
> +	enum iproto_type type = request->type;
> +	struct index *pk = space->index[0];
> +
> +	const char *key;
> +	uint32_t part_count;
> +	struct index *index;
> +
> +	/*
> +	 * Lookup the old tuple.
> +	 */
> +	if (type == IPROTO_UPDATE || type == IPROTO_DELETE) {
> +		index = index_find_unique(space, request->index_id);
> +		if (index == NULL)
> +			return -1;
> +		key = request->key;
> +		part_count = mp_decode_array(&key);
> +		if (exact_key_validate(index->def->key_def,
> +				       key, part_count) != 0)
> +			return -1;
> +	} else if (type == IPROTO_INSERT || type == IPROTO_REPLACE ||
> +		   type == IPROTO_UPSERT) {
> +		index = pk;
> +		key = tuple_extract_key_raw(request->tuple, request->tuple_end,
> +					    index->def->key_def, NULL);
> +		if (key == NULL)
> +			return -1;
> +		part_count = mp_decode_array(&key);
> +	} else {
> +		/* Unknown request type, nothing to do. */
> +		return 0;
> +	}
> +
> +	struct tuple *old_tuple;
> +	if (index_get(index, key, part_count, &old_tuple) != 0)
> +		return -1;
> +
> +	/*
> +	 * Create the new tuple.
> +	 */
> +	uint32_t new_size, old_size;
> +	const char *new_data, *new_data_end;
> +	const char *old_data, *old_data_end;
> +
> +	switch (request->type) {
> +	case IPROTO_INSERT:
> +	case IPROTO_REPLACE:
> +		new_data = request->tuple;
> +		new_data_end = request->tuple_end;
> +		break;
> +	case IPROTO_UPDATE:
> +		if (old_tuple == NULL) {
> +			/* Nothing to update. */
> +			return 0;
> +		}
> +		old_data = tuple_data_range(old_tuple, &old_size);
> +		old_data_end = old_data + old_size;
> +		new_data = tuple_update_execute(region_aligned_alloc_cb, gc,
> +					request->tuple, request->tuple_end,
> +					old_data, old_data_end, &new_size,
> +					request->index_base, NULL);
> +		if (new_data == NULL)
> +			return -1;
> +		new_data_end = new_data + new_size;
> +		break;
> +	case IPROTO_DELETE:
> +		if (old_tuple == NULL) {
> +			/* Nothing to delete. */
> +			return 0;
> +		}
> +		new_data = new_data_end = NULL;
> +		break;
> +	case IPROTO_UPSERT:
> +		if (old_tuple == NULL) {
> +			/*
> +			 * Turn UPSERT into INSERT, but still check
> +			 * provided operations.
> +			 */
> +			new_data = request->tuple;
> +			new_data_end = request->tuple_end;
> +			if (tuple_update_check_ops(region_aligned_alloc_cb, gc,
> +					request->ops, request->ops_end,
> +					request->index_base) != 0)
> +				return -1;
> +			break;
> +		}
> +		old_data = tuple_data_range(old_tuple, &old_size);
> +		old_data_end = old_data + old_size;
> +		new_data = tuple_upsert_execute(region_aligned_alloc_cb, gc,
> +					request->ops, request->ops_end,
> +					old_data, old_data_end, &new_size,
> +					request->index_base, false, NULL);
> +		new_data_end = new_data + new_size;

Can't we use tuple_update_execute here as well? The two are
supposed to be equivalent. 

> +		break;
> +	default:
> +		unreachable();
> +	}
> +
> +	struct tuple *new_tuple = NULL;
> +	if (new_data != NULL) {
> +		new_tuple = tuple_new(tuple_format_runtime,
> +				      new_data, new_data_end);
> +		if (new_tuple == NULL)
> +			return -1;
> +		tuple_ref(new_tuple);
> +	}
> +
> +	assert(old_tuple != NULL || new_tuple != NULL);
> +
> +	/*
> +	 * Execute all registered BEFORE triggers.
> +	 */
> +	struct txn_stmt *stmt = txn_current_stmt(txn);
> +	assert(stmt->old_tuple == NULL && stmt->new_tuple == NULL);

I don't understand this assert. Is this a sub-statement? Why are
both and new statements nil? 

> +	stmt->old_tuple = old_tuple;
> +	stmt->new_tuple = new_tuple;
> +
> +	int rc = trigger_run(&space->before_replace, txn);
> +
> +	/*
> +	 * The trigger can't change the old tuple,
> +	 * but it may replace the new tuple.
> +	 */

There can be many triggers. 

> +	bool request_changed = (stmt->new_tuple != new_tuple);
> +	new_tuple = stmt->new_tuple;
> +	assert(stmt->old_tuple == old_tuple);
> +	stmt->old_tuple = NULL;
> +	stmt->new_tuple = NULL;
> +
> +	if (rc != 0)
> +		goto out;
> +
> +	/*
> +	 * We don't allow to change the value of the primary key
> +	 * in the same statement.
> +	 */
> +	if (request_changed && old_tuple != NULL && new_tuple != NULL &&
> +	    tuple_compare(old_tuple, new_tuple, pk->def->key_def) != 0) {
> +		diag_set(ClientError, ER_CANT_UPDATE_PRIMARY_KEY,
> +			 pk->def->name, space->def->name);
> +		rc = -1;
> +		goto out;
> +	}

> +
> +	/*
> +	 * The trigger changed the resulting tuple.
> +	 * Fix the request to conform.
> +	 */
> +	if (request_changed) {
> +		rc = request_create_from_tuple(request, space,
> +					       old_tuple, new_tuple);
> +	}
> +out:
> +	if (new_tuple != NULL)
> +		tuple_unref(new_tuple);
> +	return rc;
> +}
> +
>  	switch (request->type) {
>  	case IPROTO_INSERT:
>  	case IPROTO_REPLACE:
> -		if (space->sequence != NULL &&
> -		    request_handle_sequence(request, space) != 0)
> -			return -1;
>  		if (space->vtab->execute_replace(space, txn,
>  						 request, result) != 0)
>  			return -1;
> diff --git a/src/box/space.h b/src/box/space.h
> index 69dea2a2..8918dbd8 100644

> diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result

Please have tests for before_replace chaining and multiple
before_replace triggers on the same space, a combination of
before_replace and on_replace, what happens with on_replace
trigger when before_replace turns the statement into a no-op.

Please also test exceptions thrown from before_replace triggers,
savepoint creation and rollback.

Have all this happen not only on master, but in slave applier
fiber. Have this happen during normal subscribe. Please test
that recovery from xlog containing noop statements works. 

No, I haven't read your tests yet: will study them tomorrow and
think some more.

Thank you for working on this.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.org - www.twitter.com/kostja_osipov