[PATCH v2 7/7] vinyl: eliminate disk read on REPLACE/DELETE
Konstantin Osipov
kostja at tarantool.org
Tue Aug 21 19:13:50 MSK 2018
* Vladimir Davydov <vdavydov.dev at gmail.com> [18/08/21 15:19]:
> /*
> - * All indexes of a space must be consistent, i.e.
> - * if a tuple is present in one index, it must be
> - * present in all other indexes as well, so we can
> - * get here only if there's a bug somewhere in vinyl.
> - * Don't abort as core dump won't really help us in
> - * this case. Just warn the user and proceed to the
> - * next tuple.
> + * Invalidate the cache entry so that we won't read
> + * the overwritten tuple again from the cache.
> */
> - say_warn("%s: key %s missing in primary index",
> - vy_lsm_name(lsm), vy_stmt_str(tuple));
> + vy_cache_on_write(&lsm->cache, tuple, NULL);
Please add a comment why you invalidate the cache.
> + /*
> + * Extract space id, LSN of the deferred DELETE statement,
> + * and the deleted tuple from the system space row.
> + */
> + uint32_t space_id;
> + if (tuple_field_u32(stmt->new_tuple, 0, &space_id) != 0)
> + diag_raise();
> + int64_t lsn;
> + if (tuple_field_i64(stmt->new_tuple, 1, &lsn) != 0)
> + diag_raise();
> + const char *delete_data = tuple_field(stmt->new_tuple, 2);
> + if (delete_data == NULL) {
> + diag_set(ClientError, ER_NO_SUCH_FIELD, 2);
> + diag_raise();
Please use tuple iterator instead.
> + diag_raise();
> + if (space->index_count <= 1)
> + return;
Please add a comment when this can be the case - e.g. the space
was altered after we created a deferred delete. I can't imagine
any other case.
> + struct tuple *delete = vy_stmt_new_surrogate_delete_raw(pk->mem_format,
> + delete_data, delete_data_end);
> + if (delete == NULL)
> + diag_raise();
> + vy_stmt_set_lsn(delete, lsn);
Please say a few words why you reset the statement lsn and how this works
downstream (when processed by the write iterator).
> + /* Insert the deferred DELETE into secondary indexes. */
> + int rc = 0;
> + size_t mem_used_before = lsregion_used(&env->mem_env.allocator);
> + const struct tuple *region_stmt = NULL;
> + for (uint32_t i = 1; i < space->index_count; i++) {
> + struct vy_lsm *lsm = vy_lsm(space->index[i]);
> + if (vy_is_committed_one(env, lsm))
> + continue;
> + /*
> + * As usual, rotate the active in-memory index if
> + * schema was changed or dump was triggered. Do it
> + * only if processing the first statement, because
> + * dump may be triggered by one of the statements
> + * of this transaction (see vy_quota_force_use()
> + * below), in which case we must not do rotation
> + * as we want all statements to land in the same
> + * in-memory index. This is safe, as long as we
> + * don't yield between statements.
> + */
> + struct vy_mem *mem = lsm->mem;
> + if (is_first_statement &&
> + (mem->space_cache_version != space_cache_version ||
> + mem->generation != *lsm->env->p_generation)) {
> + rc = vy_lsm_rotate_mem(lsm);
> + if (rc != 0)
> + break;
> + mem = lsm->mem;
> + }
> + rc = vy_lsm_set(lsm, mem, delete, ®ion_stmt);
> + if (rc != 0)
> + break;
> + vy_lsm_commit_stmt(lsm, mem, region_stmt);
Can we share this code with vy_replace()?
> + break;
> + }
> + vy_mem_pin(mem);
> + trigger_create(on_commit, vy_deferred_delete_on_commit, mem, NULL);
> + txn_on_commit(txn, on_commit);
What about on_rollback? If you missed it, please add a test case
:)
> +/**
> + * Try to generate a deferred DELETE statement on tx commit.
> + *
> + * This function is supposed to be called for a primary index
> + * statement which was executed without deletion of the overwritten
> + * tuple from secondary indexes. It looks up the overwritten tuple
> + * in memory and, if found, produces the deferred DELETEs and
> + * inserts them into the transaction log.
> + *
Please mention it's not only an optimization, explain
why we logically need it (we can get out of memory error when
trying to insert during dump).
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
More information about the Tarantool-patches
mailing list