From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 21 Aug 2018 19:13:50 +0300 From: Konstantin Osipov Subject: Re: [PATCH v2 7/7] vinyl: eliminate disk read on REPLACE/DELETE Message-ID: <20180821161350.GG28159@chai> References: <0ab3db91f86e4321fd835bc474d300b91b970517.1534847663.git.vdavydov.dev@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0ab3db91f86e4321fd835bc474d300b91b970517.1534847663.git.vdavydov.dev@gmail.com> To: Vladimir Davydov Cc: tarantool-patches@freelists.org List-ID: * Vladimir Davydov [18/08/21 15:19]: > /* > - * All indexes of a space must be consistent, i.e. > - * if a tuple is present in one index, it must be > - * present in all other indexes as well, so we can > - * get here only if there's a bug somewhere in vinyl. > - * Don't abort as core dump won't really help us in > - * this case. Just warn the user and proceed to the > - * next tuple. > + * Invalidate the cache entry so that we won't read > + * the overwritten tuple again from the cache. > */ > - say_warn("%s: key %s missing in primary index", > - vy_lsm_name(lsm), vy_stmt_str(tuple)); > + vy_cache_on_write(&lsm->cache, tuple, NULL); Please add a comment why you invalidate the cache. > + /* > + * Extract space id, LSN of the deferred DELETE statement, > + * and the deleted tuple from the system space row. > + */ > + uint32_t space_id; > + if (tuple_field_u32(stmt->new_tuple, 0, &space_id) != 0) > + diag_raise(); > + int64_t lsn; > + if (tuple_field_i64(stmt->new_tuple, 1, &lsn) != 0) > + diag_raise(); > + const char *delete_data = tuple_field(stmt->new_tuple, 2); > + if (delete_data == NULL) { > + diag_set(ClientError, ER_NO_SUCH_FIELD, 2); > + diag_raise(); Please use tuple iterator instead. > + diag_raise(); > + if (space->index_count <= 1) > + return; Please add a comment when this can be the case - e.g. the space was altered after we created a deferred delete. I can't imagine any other case. > + struct tuple *delete = vy_stmt_new_surrogate_delete_raw(pk->mem_format, > + delete_data, delete_data_end); > + if (delete == NULL) > + diag_raise(); > + vy_stmt_set_lsn(delete, lsn); Please say a few words why you reset the statement lsn and how this works downstream (when processed by the write iterator). > + /* Insert the deferred DELETE into secondary indexes. */ > + int rc = 0; > + size_t mem_used_before = lsregion_used(&env->mem_env.allocator); > + const struct tuple *region_stmt = NULL; > + for (uint32_t i = 1; i < space->index_count; i++) { > + struct vy_lsm *lsm = vy_lsm(space->index[i]); > + if (vy_is_committed_one(env, lsm)) > + continue; > + /* > + * As usual, rotate the active in-memory index if > + * schema was changed or dump was triggered. Do it > + * only if processing the first statement, because > + * dump may be triggered by one of the statements > + * of this transaction (see vy_quota_force_use() > + * below), in which case we must not do rotation > + * as we want all statements to land in the same > + * in-memory index. This is safe, as long as we > + * don't yield between statements. > + */ > + struct vy_mem *mem = lsm->mem; > + if (is_first_statement && > + (mem->space_cache_version != space_cache_version || > + mem->generation != *lsm->env->p_generation)) { > + rc = vy_lsm_rotate_mem(lsm); > + if (rc != 0) > + break; > + mem = lsm->mem; > + } > + rc = vy_lsm_set(lsm, mem, delete, ®ion_stmt); > + if (rc != 0) > + break; > + vy_lsm_commit_stmt(lsm, mem, region_stmt); Can we share this code with vy_replace()? > + break; > + } > + vy_mem_pin(mem); > + trigger_create(on_commit, vy_deferred_delete_on_commit, mem, NULL); > + txn_on_commit(txn, on_commit); What about on_rollback? If you missed it, please add a test case :) > +/** > + * Try to generate a deferred DELETE statement on tx commit. > + * > + * This function is supposed to be called for a primary index > + * statement which was executed without deletion of the overwritten > + * tuple from secondary indexes. It looks up the overwritten tuple > + * in memory and, if found, produces the deferred DELETEs and > + * inserts them into the transaction log. > + * Please mention it's not only an optimization, explain why we logically need it (we can get out of memory error when trying to insert during dump). -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov