[PATCH v2 7/7] vinyl: eliminate disk read on REPLACE/DELETE

Konstantin Osipov kostja at tarantool.org
Tue Aug 21 19:13:50 MSK 2018


* Vladimir Davydov <vdavydov.dev at gmail.com> [18/08/21 15:19]:
>  		/*
> -		 * All indexes of a space must be consistent, i.e.
> -		 * if a tuple is present in one index, it must be
> -		 * present in all other indexes as well, so we can
> -		 * get here only if there's a bug somewhere in vinyl.
> -		 * Don't abort as core dump won't really help us in
> -		 * this case. Just warn the user and proceed to the
> -		 * next tuple.
> +		 * Invalidate the cache entry so that we won't read
> +		 * the overwritten tuple again from the cache.
>  		 */
> -		say_warn("%s: key %s missing in primary index",
> -			 vy_lsm_name(lsm), vy_stmt_str(tuple));
> +		vy_cache_on_write(&lsm->cache, tuple, NULL);

Please add a comment why you invalidate the cache.

> +	/*
> +	 * Extract space id, LSN of the deferred DELETE statement,
> +	 * and the deleted tuple from the system space row.
> +	 */
> +	uint32_t space_id;
> +	if (tuple_field_u32(stmt->new_tuple, 0, &space_id) != 0)
> +		diag_raise();
> +	int64_t lsn;
> +	if (tuple_field_i64(stmt->new_tuple, 1, &lsn) != 0)
> +		diag_raise();
> +	const char *delete_data = tuple_field(stmt->new_tuple, 2);
> +	if (delete_data == NULL) {
> +		diag_set(ClientError, ER_NO_SUCH_FIELD, 2);
> +		diag_raise();

Please use tuple iterator instead.

> +		diag_raise();
> +	if (space->index_count <= 1)
> +		return;

Please add a comment when this can be the case - e.g. the space
was altered after we created a deferred delete. I can't imagine
any other case.

> +	struct tuple *delete = vy_stmt_new_surrogate_delete_raw(pk->mem_format,
> +						delete_data, delete_data_end);
> +	if (delete == NULL)
> +		diag_raise();
> +	vy_stmt_set_lsn(delete, lsn);

Please say a few words why you reset the statement lsn and how this works
downstream (when processed by the write iterator).

> +	/* Insert the deferred DELETE into secondary indexes. */
> +	int rc = 0;
> +	size_t mem_used_before = lsregion_used(&env->mem_env.allocator);
> +	const struct tuple *region_stmt = NULL;
> +	for (uint32_t i = 1; i < space->index_count; i++) {
> +		struct vy_lsm *lsm = vy_lsm(space->index[i]);
> +		if (vy_is_committed_one(env, lsm))
> +			continue;
> +		/*
> +		 * As usual, rotate the active in-memory index if
> +		 * schema was changed or dump was triggered. Do it
> +		 * only if processing the first statement, because
> +		 * dump may be triggered by one of the statements
> +		 * of this transaction (see vy_quota_force_use()
> +		 * below), in which case we must not do rotation
> +		 * as we want all statements to land in the same
> +		 * in-memory index. This is safe, as long as we
> +		 * don't yield between statements.
> +		 */
> +		struct vy_mem *mem = lsm->mem;
> +		if (is_first_statement &&
> +		    (mem->space_cache_version != space_cache_version ||
> +		     mem->generation != *lsm->env->p_generation)) {
> +			rc = vy_lsm_rotate_mem(lsm);
> +			if (rc != 0)
> +				break;
> +			mem = lsm->mem;
> +		}
> +		rc = vy_lsm_set(lsm, mem, delete, &region_stmt);
> +		if (rc != 0)
> +			break;
> +		vy_lsm_commit_stmt(lsm, mem, region_stmt);

Can we share this code with vy_replace()?

> +			break;
> +		}
> +		vy_mem_pin(mem);
> +		trigger_create(on_commit, vy_deferred_delete_on_commit, mem, NULL);
> +		txn_on_commit(txn, on_commit);

What about on_rollback? If you missed it, please add a test case
:)
> +/**
> + * Try to generate a deferred DELETE statement on tx commit.
> + *
> + * This function is supposed to be called for a primary index
> + * statement which was executed without deletion of the overwritten
> + * tuple from secondary indexes. It looks up the overwritten tuple
> + * in memory and, if found, produces the deferred DELETEs and
> + * inserts them into the transaction log.
> + *

Please mention it's not only an optimization, explain 
why we logically need it (we can get out of memory error when
trying to insert during dump).


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov



More information about the Tarantool-patches mailing list