From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH v2] memtx: add yields during index build
Date: Tue, 28 May 2019 18:34:41 +0300
Message-ID: <26DAAB24-BDC5-41C1-BA3E-E5BEAC933D48@tarantool.org>
In-Reply-To: <20190527164542.iuf55tij67mqlgak@esperanza>

> On 27 May 2019, at 19:45, Vladimir Davydov <vdavydov.dev@gmail.com> wrote:
>
> On Thu, May 23, 2019 at 05:11:11PM +0300, Serge Petrenko wrote:
>> Memtx index build used to stall the event loop for the whole build
>> period. Add occasional yields so that the loop is not blocked for
>> too long. Also make index build set an on_replace trigger so that
>> concurrent replaces are also handled correctly during the build.
>>
>> Closes #3976
>
> Worth mentioning this in the documentation?

Hi! Thank you for the review. I've addressed your comments in v3. Please check it out.

>
>> ---
>> https://github.com/tarantool/tarantool/issues/3976
>> https://github.com/tarantool/tarantool/tree/sp/gh-3976-background-index-build
>>
>> Changes in v2:
>> - add an on_replace trigger to handle concurrent replaces
>>   while index build yields
>> - modify test case slightly, test concurrent replaces.
>>
>>  src/box/memtx_space.c                         | 84 +++++++++++++++++++
>>  test/box/memtx_background_index_build.result  | 77 +++++++++++++++++
>>  .../box/memtx_background_index_build.test.lua | 46 ++++++++++
>>  test/box/suite.ini                            |  2 +-
>>  4 files changed, 208 insertions(+), 1 deletion(-)
>>  create mode 100644 test/box/memtx_background_index_build.result
>>  create mode 100644 test/box/memtx_background_index_build.test.lua
>>
>> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
>> index 5ddb4f7ee..2c72fa8f2 100644
>> --- a/src/box/memtx_space.c
>> +++ b/src/box/memtx_space.c
>> @@ -870,10 +870,63 @@ memtx_init_ephemeral_space(struct space *space)
>>  	memtx_space_add_primary_key(space);
>>  }
>>
>> +struct memtx_build_state {
>> +	struct index *index;
>> +	struct tuple_format *format;
>> +	struct tuple *tuple;
>> +	struct diag diag;
>> +	int rc;
>> +};
>> +
>> +static void
>> +memtx_build_on_replace(struct trigger *trigger, void *event)
>> +{
>> +	struct txn *txn = event;
>> +	struct memtx_build_state *state = trigger->data;
>> +	struct txn_stmt *stmt = txn_current_stmt(txn);
>> +
>> +	if (stmt->new_tuple != NULL &&
>> +	    tuple_validate(state->format, stmt->new_tuple) != 0) {
>> +		state->rc = -1;
>> +		diag_move(diag_get(), &state->diag);
>> +		return;
>> +	}
>> +
>> +	struct tuple *cmp_tuple = stmt->new_tuple != NULL ? stmt->new_tuple :
>> +							    stmt->old_tuple;
>> +	struct key_def *cmp_def = state->index->def->cmp_def;
>> +	hint_t state_hint = tuple_hint(state->tuple, cmp_def);
>> +	hint_t cmp_hint = tuple_hint(cmp_tuple, cmp_def);
>> +	/*
>> +	 * Only update the already built part of an index. All the other
>> +	 * tuples will be inserted when build continues.
>> +	 */
>> +	if (tuple_compare(state->tuple, state_hint, cmp_tuple, cmp_hint,
>> +			  cmp_def) < 0)
>> +		return;
>
> It's pointless to compute hints with tuple_hint() before calling
> tuple_compare() - you could simply pass HINT_NONE to get the same
> effect.
>
> Anyway, this isn't going to work in case of a multikey index, because
> the latter uses hints to store multikey offsets. Take a look at the
> implementation of memtx_tree_index_replace_multikey().
>
>> +
>> +	struct tuple *delete = NULL;
>> +	state->rc = index_replace(state->index, stmt->old_tuple,
>> +				  stmt->new_tuple, DUP_REPLACE, &delete);
>
> You should abort index build if a duplicate record is inserted.
>
>> +
>> +	if (state->rc != 0) {
>> +		diag_move(diag_get(), &state->diag);
>> +	}
>> +	return;
>> +}
>> +
>>  static int
>>  memtx_space_build_index(struct space *src_space, struct index *new_index,
>>  			struct tuple_format *new_format)
>>  {
>> +	/*
>> +	 * Yield every 1K tuples.
>> +	 * In debug mode yield more often for testing purposes.
>> +	 */
>> +#ifdef NDEBUG
>> +	enum { YIELD_LOOPS = 1000 };
>> +#else
>> +	enum { YIELD_LOOPS = 10 };
>> +#endif
>>  	/**
>>  	 * If it's a secondary key, and we're not building them
>>  	 * yet (i.e. it's snapshot recovery for memtx), do nothing.
>> @@ -899,6 +952,16 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>>  	if (it == NULL)
>>  		return -1;
>>
>> +	struct memtx_build_state state;
>> +	state.index = new_index;
>> +	state.format = new_format;
>> +	state.rc = 0;
>> +	diag_create(&state.diag);
>> +
>> +	struct trigger on_replace;
>> +	trigger_create(&on_replace, memtx_build_on_replace, &state, NULL);
>> +	trigger_add(&src_space->on_replace, &on_replace);
>> +
>>  	/*
>>  	 * The index has to be built tuple by tuple, since
>>  	 * there is no guarantee that all tuples satisfy
>> @@ -909,6 +972,7 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>>  	/* Build the new index.
 */
>>  	int rc;
>>  	struct tuple *tuple;
>> +	size_t count = 0;
>>  	while ((rc = iterator_next(it, &tuple)) == 0 && tuple != NULL) {
>>  		/*
>>  		 * Check that the tuple is OK according to the
>> @@ -933,8 +997,28 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>>  		 */
>>  		if (new_index->def->iid == 0)
>>  			tuple_ref(tuple);
>> +		if (++count % YIELD_LOOPS == 0) {
>> +			/*
>> +			 * Remember the latest inserted tuple to
>> +			 * avoid re-inserting already processed
>> +			 * tuples in on_replace trigger.
>> +			 */
>> +			state.tuple = tuple;
>> +			fiber_sleep(0);
>> +			/*
>> +			 * The on_replace trigger may have failed
>> +			 * during the yield.
>> +			 */
>> +			if (state.rc != 0) {
>> +				rc = -1;
>> +				diag_move(&state.diag, diag_get());
>> +				break;
>> +			}
>> +		}
>>  	}
>>  	iterator_delete(it);
>> +	diag_destroy(&state.diag);
>> +	trigger_clear(&on_replace);
>>  	return rc;
>>  }
>>
>> diff --git a/test/box/memtx_background_index_build.test.lua b/test/box/memtx_background_index_build.test.lua
>> new file mode 100644
>> index 000000000..34cd3da54
>> --- /dev/null
>> +++ b/test/box/memtx_background_index_build.test.lua
>> @@ -0,0 +1,46 @@
>> +env = require('test_run')
>> +test_run = env.new()
>> +
>> +fiber = require('fiber')
>> +
>> +
>> +test_run:cmd('setopt delimiter ";"')
>> +
>> +-- The fiber executing this function will serve two purposes.
>> +-- First, it will count the number of context switches
>> +-- that happened during the index build.
>> +-- Second, it will issue concurrent replaces in
>> +-- the space to check that such replaces are handled correctly
>> +-- during the index build.
>> +function f()
>> +    local i = 0
>> +    while true do
>> +        i = i + 1
>> +        box.space.test:replace{i, "new"}
>> +        box.space.test:replace{1001 - i, "new"}
>> +        fiber.yield()
>> +        fiber.testcancel()
>> +    end
>> +end;
>> +test_run:cmd('setopt delimiter ""');
>> +
>> +_ = box.schema.space.create('test')
>> +_ = box.space.test:create_index('pk')
>> +
>> +for i = 1, 1000 do box.space.test:insert{i} end
>> +
>> +csw = 0
>> +fib = fiber.new(f) _ = box.space.test:create_index('sk') csw = fiber.info()[fib.id()].csw
>> +fiber.cancel(fib)
>> +
>> +-- Index build in debug mode should yield every 10 iterations.
>> +-- This means we will have at least 100 event loop iterations
>> +-- during index build.
>> +csw >= 100 or csw
>
> Why don't you simply introduce a new error injection that would stall
> index build, opening a time window for inserting a new record? IMO the
> test would be much easier to understand that way.
>
> OTOH it might make sense to write a stress test that would insert random
> records and check index consistency upon completion. E.g. take a look at
> vinyl/ddl.test.lua.
>
>> +-- Check that replaces generated by concurrent transactions
>> +-- also update the index correctly.
>> +box.space.test.index[1]:get(1)
>> +box.space.test.index[1]:get(1000)
>> +box.space.test.index[1]:get(500)
>> +
>> +box.space.test:drop()
>
> The test is insufficient. You should also check the following cases:
>
> - Index build is aborted if a tuple that doesn't match the new index
>   format is inserted into the space.
> - Index build is aborted if the unique constraint of the new index is
>   violated.
> - Modifying the space while a multikey index is being built works fine.
> - Indexes are consistent after recovery.
>
> I think it would be best if you adopted the corresponding vinyl/ddl test
> cases and moved them to the engine suite.
Thread overview (5+ messages):

* 2019-05-23 14:11 Serge Petrenko
* 2019-05-23 15:25 ` [tarantool-patches] " Konstantin Osipov
* 2019-05-28 15:36   ` Serge Petrenko
* 2019-05-27 16:45 ` Vladimir Davydov
* 2019-05-28 15:34   ` Serge Petrenko [this message]