From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Mon, 27 May 2019 19:45:42 +0300
From: Vladimir Davydov
Subject: Re: [PATCH v2] memtx: add yields during index build
Message-ID: <20190527164542.iuf55tij67mqlgak@esperanza>
References: <20190523141111.14885-1-sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190523141111.14885-1-sergepetrenko@tarantool.org>
To: Serge Petrenko
Cc: tarantool-patches@freelists.org
List-ID:

On Thu, May 23, 2019 at 05:11:11PM +0300, Serge Petrenko wrote:
> Memtx index build used to stall event loop for all the build period.
> Add occasional yields so that the loop is not blocked for too long.
> Also make index build set on_replace triggers so that concurrent
> replaces are also correctly handled during the build.
>
> Closes #3976

Worth mentioning this in the documentation?

> ---
> https://github.com/tarantool/tarantool/issues/3976
> https://github.com/tarantool/tarantool/tree/sp/gh-3976-background-index-build
>
> Changes in v2:
>   - add an on_replace trigger
>     to handle concurrent replaces
>     while index build yields
>   - modify test case slightly,
>     test concurrent replaces.
>
>  src/box/memtx_space.c                         | 84 +++++++++++++++++++
>  test/box/memtx_background_index_build.result  | 77 +++++++++++++++++
>  .../box/memtx_background_index_build.test.lua | 46 ++++++++++
>  test/box/suite.ini                            |  2 +-
>  4 files changed, 208 insertions(+), 1 deletion(-)
>  create mode 100644 test/box/memtx_background_index_build.result
>  create mode 100644 test/box/memtx_background_index_build.test.lua
>
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index 5ddb4f7ee..2c72fa8f2 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -870,10 +870,63 @@ memtx_init_ephemeral_space(struct space *space)
>  	memtx_space_add_primary_key(space);
>  }
>
> +struct memtx_build_state {
> +	struct index *index;
> +	struct tuple_format *format;
> +	struct tuple *tuple;
> +	struct diag diag;
> +	int rc;
> +};
> +
> +static void
> +memtx_build_on_replace(struct trigger *trigger, void *event)
> +{
> +	struct txn *txn = event;
> +	struct memtx_build_state *state = trigger->data;
> +	struct txn_stmt *stmt = txn_current_stmt(txn);
> +
> +	if (stmt->new_tuple != NULL &&
> +	    tuple_validate(state->format, stmt->new_tuple) != 0) {
> +		state->rc = -1;
> +		diag_move(diag_get(), &state->diag);
> +		return;
> +	}
> +
> +	struct tuple *cmp_tuple = stmt->new_tuple != NULL ? stmt->new_tuple :
> +							    stmt->old_tuple;
> +	struct key_def *cmp_def = state->index->def->cmp_def;
> +	hint_t state_hint = tuple_hint(state->tuple, cmp_def);
> +	hint_t cmp_hint = tuple_hint(cmp_tuple, cmp_def);
> +	/*
> +	 * Only update the already built part of an index. All the other
> +	 * tuples will be inserted when build continues.
> +	 */
> +	if (tuple_compare(state->tuple, state_hint, cmp_tuple, cmp_hint, cmp_def) < 0)
> +		return;

It's pointless to compute hints with tuple_hint() before calling
tuple_compare() - you could simply pass HINT_NONE to get the same
effect.

Anyway, this isn't going to work in the case of a multikey index,
because the latter uses hints to store multikey offsets. Take a look
at the implementation of memtx_tree_index_replace_multikey().

> +
> +	struct tuple *delete = NULL;
> +	state->rc = index_replace(state->index, stmt->old_tuple, stmt->new_tuple,
> +				  DUP_REPLACE, &delete);

You should abort index build if a duplicate record is inserted.
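
Something like this, perhaps (an untested sketch - it assumes the
index_opts::is_unique flag and the DUP_INSERT/DUP_REPLACE_OR_INSERT
modes of index_replace() can be used here the same way as in the
normal replace path):

	struct tuple *delete = NULL;
	/*
	 * Sketch: for a unique index request DUP_INSERT so that
	 * index_replace() fails on a duplicate key. The error is
	 * saved in state->diag and picked up by the build loop
	 * after the yield, aborting the build.
	 */
	enum dup_replace_mode mode = state->index->def->opts.is_unique ?
				     DUP_INSERT : DUP_REPLACE_OR_INSERT;
	state->rc = index_replace(state->index, stmt->old_tuple,
				  stmt->new_tuple, mode, &delete);
	if (state->rc != 0)
		diag_move(diag_get(), &state->diag);

(DUP_REPLACE_OR_INSERT would keep the current behaviour for
non-unique indexes.)
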
> +
> +	if (state->rc != 0) {
> +		diag_move(diag_get(), &state->diag);
> +	}
> +	return;
> +}
> +
>  static int
>  memtx_space_build_index(struct space *src_space, struct index *new_index,
>  			struct tuple_format *new_format)
>  {
> +	/*
> +	 * Yield every 1K tuples.
> +	 * In debug mode yield more often for testing purposes.
> +	 */
> +#ifdef NDEBUG
> +	enum { YIELD_LOOPS = 1000 };
> +#else
> +	enum { YIELD_LOOPS = 10 };
> +#endif
>  	/**
>  	 * If it's a secondary key, and we're not building them
>  	 * yet (i.e. it's snapshot recovery for memtx), do nothing.
> @@ -899,6 +952,16 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>  	if (it == NULL)
>  		return -1;
>
> +	struct memtx_build_state state;
> +	state.index = new_index;
> +	state.format = new_format;
> +	state.rc = 0;
> +	diag_create(&state.diag);
> +
> +	struct trigger on_replace;
> +	trigger_create(&on_replace, memtx_build_on_replace, &state, NULL);
> +	trigger_add(&src_space->on_replace, &on_replace);
> +
>  	/*
>  	 * The index has to be built tuple by tuple, since
>  	 * there is no guarantee that all tuples satisfy
> @@ -909,6 +972,7 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>  	/* Build the new index. */
>  	int rc;
>  	struct tuple *tuple;
> +	size_t count = 0;
>  	while ((rc = iterator_next(it, &tuple)) == 0 && tuple != NULL) {
>  		/*
>  		 * Check that the tuple is OK according to the
> @@ -933,8 +997,28 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>  		 */
>  		if (new_index->def->iid == 0)
>  			tuple_ref(tuple);
> +		if (++count % YIELD_LOOPS == 0) {
> +			/*
> +			 * Remember the latest inserted tuple to
> +			 * avoid re-inserting already processed
> +			 * tuples in on_replace trigger.
> +			 */
> +			state.tuple = tuple;
> +			fiber_sleep(0);
> +			/*
> +			 * The on_replace trigger may have failed
> +			 * during the yield.
> +			 */
> +			if (state.rc != 0) {
> +				rc = -1;
> +				diag_move(&state.diag, diag_get());
> +				break;
> +			}
> +		}
>  	}
>  	iterator_delete(it);
> +	diag_destroy(&state.diag);
> +	trigger_clear(&on_replace);
>  	return rc;
>  }
>
> diff --git a/test/box/memtx_background_index_build.test.lua b/test/box/memtx_background_index_build.test.lua
> new file mode 100644
> index 000000000..34cd3da54
> --- /dev/null
> +++ b/test/box/memtx_background_index_build.test.lua
> @@ -0,0 +1,46 @@
> +env = require('test_run')
> +test_run = env.new()
> +
> +fiber = require('fiber')
> +
> +
> +test_run:cmd('setopt delimiter ";"')
> +
> +-- the fiber executing this function will serve two purposes.
> +-- First of all it will count the number of context swtiches
> +-- that happened during the index build.
> +-- Secondly, it will issue concurrent replaces in
> +-- the space to check that such replaces are handled correctly
> +-- during the index build.
> +function f()
> +    local i = 0
> +    while true do
> +        i = i + 1
> +        box.space.test:replace{i, "new"}
> +        box.space.test:replace{1001-i, "new"}
> +        fiber.yield()
> +        fiber.testcancel()
> +    end
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +_ = box.schema.space.create('test')
> +_ = box.space.test:create_index('pk')
> +
> +for i = 1,1000 do box.space.test:insert{i} end
> +
> +csw = 0
> +fib = fiber.new(f) _ = box.space.test:create_index('sk') csw = fiber.info()[fib.id()].csw
> +fiber.cancel(fib)
> +
> +-- index build in debug mode should yield every 10 iterations.
> +-- This means we will have at least 100 event loop iterations
> +-- during index build.
> +csw >= 100 or csw

Why don't you simply introduce a new error injection that would stall
the index build, opening a time window for inserting new records? IMO
the test would be much easier to understand that way.
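
Roughly (a sketch only - ERRINJ_BUILD_INDEX_DELAY is a made-up name
for the new injection, and I assume the usual ERROR_INJECT_WHILE
helper that spins while the injection is set and compiles to a no-op
in release builds):

	if (++count % YIELD_LOOPS == 0) {
		state.tuple = tuple;
		/*
		 * While the test keeps the hypothetical
		 * ERRINJ_BUILD_INDEX_DELAY injection set, the build
		 * fiber keeps yielding here, so the test can issue
		 * concurrent replaces at a well-defined point.
		 */
		ERROR_INJECT_WHILE(ERRINJ_BUILD_INDEX_DELAY,
				   fiber_sleep(0.001));
		fiber_sleep(0);
		/* ... check state.rc as before ... */
	}

The test would then start index creation in a separate fiber, enable
the injection with box.error.injection.set('ERRINJ_BUILD_INDEX_DELAY',
true), do a few replaces and release it.
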
OTOH it might make sense to write a stress test that would insert
random records and check index consistency upon completion. E.g. take
a look at vinyl/ddl.test.lua.

> +-- check that replaces generated by concurrent transactions
> +-- also update the index correctly
> +box.space.test.index[1]:get(1)
> +box.space.test.index[1]:get(1000)
> +box.space.test.index[1]:get(500)
> +
> +box.space.test:drop()

The test is insufficient. You should also check the following cases:

 - Index build is aborted if a tuple that doesn't match the new index
   format is inserted into the space.
 - Index build is aborted if the unique constraint of the new index
   is violated.
 - Modifying the space while a multikey index is being built works
   fine.
 - Indexes are consistent after recovery.

I think it would be best if you adopted corresponding vinyl/ddl test
cases and moved them to the engine suite.
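
For instance, the first case could look something like this (an
untested Lua sketch, reusing the hypothetical ERRINJ_BUILD_INDEX_DELAY
injection suggested above):

fiber = require('fiber')
errinj = box.error.injection
s = box.schema.space.create('test_fmt')
_ = s:create_index('pk')
for i = 1, 100 do s:insert{i, i} end
-- pause the build so we can slip in a bad tuple during a yield
errinj.set('ERRINJ_BUILD_INDEX_DELAY', true)
fib = fiber.new(function() pcall(s.create_index, s, 'sk', {parts = {2, 'unsigned'}}) end)
fiber.sleep(0)
-- this tuple violates the new index format
s:replace{101, 'not a number'}
errinj.set('ERRINJ_BUILD_INDEX_DELAY', false)
while fib:status() ~= 'dead' do fiber.sleep(0.01) end
-- the build must have been aborted
s.index.sk == nil
s:drop()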