From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Vladimir Davydov Subject: [PATCH 1/2] vinyl: fix crash during index build Date: Sun, 14 Apr 2019 18:54:52 +0300 Message-Id: To: tarantool-patches@freelists.org List-ID: To propagate changes applied to a space while a new index is being built, we install an on_replace trigger. In case the on_replace trigger callback fails, we abort the DDL operation. The problem is the trigger may yield, e.g. to check the unique constraint of the new index. This opens a time window for the DDL operation to complete and clear the trigger. If this happens, the trigger will try to access the outdated build context and crash: | #0 0x558f29cdfbc7 in print_backtrace+9 | #1 0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7 | #2 0x7fe24e4ab0e0 in __restore_rt+0 | #3 0x558f29bfe036 in error_unref+1a | #4 0x558f29bfe0d1 in diag_clear+27 | #5 0x558f29bfe133 in diag_move+1c | #6 0x558f29c0a4e2 in vy_build_on_replace+236 | #7 0x558f29cf3554 in trigger_run+7a | #8 0x558f29c7b494 in txn_commit_stmt+125 | #9 0x558f29c7e22c in box_process_rw+ec | #10 0x558f29c81743 in box_process1+8b | #11 0x558f29c81d5c in box_upsert+c4 | #12 0x558f29caf110 in lbox_upsert+131 | #13 0x558f29cfed97 in lj_BC_FUNCC+34 | #14 0x558f29d104a4 in lua_pcall+34 | #15 0x558f29cc7b09 in luaT_call+29 | #16 0x558f29cc1de5 in lua_fiber_run_f+74 | #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e | #18 0x558f29cdca33 in fiber_loop+41 | #19 0x558f29e4e8cd in coro_init+4c To fix this issue, let's recall that when a DDL operation completes, all pending transactions that affect the altered space are aborted by the space_invalidate callback. So to avoid the crash, we just need to bail out early from the on_replace trigger callback if we detect that the current transaction has been aborted. Closes #4152 --- https://github.com/tarantool/tarantool/issues/4152 https://github.com/tarantool/tarantool/commits/dv/vy-ddl-fixes src/box/vinyl.c | 9 +++++ src/box/vy_scheduler.c | 5 +++ src/lib/core/errinj.h | 1 + test/box/errinj.result | 10 +++-- test/vinyl/errinj_ddl.result | 90 ++++++++++++++++++++++++++++++++++++++++++ test/vinyl/errinj_ddl.test.lua | 42 ++++++++++++++++++++ 6 files changed, 153 insertions(+), 4 deletions(-) diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 776643a8..9428e934 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -4070,6 +4070,15 @@ vy_build_on_replace(struct trigger *trigger, void *event) } return; err: + /* + * No need to abort the DDL request if this transaction + * has been aborted as its changes won't be applied to + * the space anyway. Besides, it might have been aborted + * by the DDL request itself, in which case the build + * context isn't valid and so we must not modify it. + */ + if (tx->state == VINYL_TX_ABORT) + return; ctx->is_failed = true; diag_move(diag_get(), &ctx->diag); } diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c index d56a505a..fabb4bb4 100644 --- a/src/box/vy_scheduler.c +++ b/src/box/vy_scheduler.c @@ -1091,6 +1091,11 @@ fail: static int vy_task_dump_execute(struct vy_task *task) { + struct errinj *errinj = errinj(ERRINJ_VY_DUMP_DELAY, ERRINJ_BOOL); + if (errinj != NULL && errinj->bparam) { + while (errinj->bparam) + fiber_sleep(0.01); + } /* * Don't compress L1 runs as they are most frequently read * and smallest runs at the same time and so we would gain diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h index 5d9317d5..4bca81ca 100644 --- a/src/lib/core/errinj.h +++ b/src/lib/core/errinj.h @@ -123,6 +123,7 @@ struct errinj { _(ERRINJ_VY_INDEX_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_RELAY_BREAK_LSN, ERRINJ_INT, {.iparam = -1}) \ _(ERRINJ_WAL_BREAK_LSN, ERRINJ_INT, {.iparam = -1}) \ + _(ERRINJ_VY_DUMP_DELAY, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_VY_COMPACTION_DELAY, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_TUPLE_FORMAT_COUNT, ERRINJ_INT, {.iparam = -1}) \ _(ERRINJ_MEMTX_DELAY_GC, ERRINJ_BOOL, {.bparam = false}) \ diff --git a/test/box/errinj.result b/test/box/errinj.result index 68b137a1..5905f8ee 100644 --- a/test/box/errinj.result +++ b/test/box/errinj.result @@ -40,22 +40,24 @@ errinj.info() state: false ERRINJ_VY_INDEX_FILE_RENAME: state: false + ERRINJ_VY_POINT_ITER_WAIT: + state: false ERRINJ_VY_DELAY_PK_LOOKUP: state: false ERRINJ_VY_TASK_COMPLETE: state: false ERRINJ_PORT_DUMP: state: false - ERRINJ_VY_POINT_ITER_WAIT: - state: false + ERRINJ_WAL_BREAK_LSN: + state: -1 ERRINJ_WAL_IO: state: false ERRINJ_WAL_FALLOCATE: state: 0 - ERRINJ_WAL_BREAK_LSN: - state: -1 ERRINJ_LOG_ROTATE: state: false + ERRINJ_VY_DUMP_DELAY: + state: false ERRINJ_TUPLE_FORMAT_COUNT: state: -1 ERRINJ_TUPLE_ALLOC: diff --git a/test/vinyl/errinj_ddl.result b/test/vinyl/errinj_ddl.result index 45e116a1..7bee4efd 100644 --- a/test/vinyl/errinj_ddl.result +++ b/test/vinyl/errinj_ddl.result @@ -804,6 +804,96 @@ s:select() s:drop() --- ... +-- +-- gh-4152: yet another DML vs DDL race. +-- +s = box.schema.space.create('test', {engine = 'vinyl'}) +--- +... +_ = s:create_index('primary') +--- +... +s:replace{1, 1} +--- +- [1, 1] +... +box.snapshot() +--- +- ok +... +test_run:cmd("setopt delimiter ';'") +--- +- true +... +-- Create a secondary index. Delay dump. +box.error.injection.set('ERRINJ_VY_DUMP_DELAY', true); +--- +- ok +... +ch = fiber.channel(1); +--- +... +_ = fiber.create(function() + local status, err = pcall(s.create_index, s, 'secondary', + {unique = true, parts = {2, 'unsigned'}}) + ch:put(status or err) +end); +--- +... +-- Wait for dump to start. +test_run:wait_cond(function() + return box.stat.vinyl().scheduler.tasks_inprogress > 0 +end); +--- +- true +... +-- Issue a DML request that will yield to check the unique +-- constraint of the new index. Delay disk read. +box.error.injection.set('ERRINJ_VY_READ_PAGE_DELAY', true); +--- +- ok +... +_ = fiber.create(function() + local status, err = pcall(s.replace, s, {2, 1}) + ch:put(status or err) +end); +--- +... +-- Wait for index creation to complete. +-- It must complete successfully. +box.error.injection.set('ERRINJ_VY_DUMP_DELAY', false); +--- +- ok +... +ch:get(); +--- +- true +... +-- Wait for the DML request to complete. +-- It must be aborted. +box.error.injection.set('ERRINJ_VY_READ_PAGE_DELAY', false); +--- +- ok +... +ch:get(); +--- +- Transaction has been aborted by conflict +... +test_run:cmd("setopt delimiter ''"); +--- +- true +... +s.index.primary:select() +--- +- - [1, 1] +... +s.index.secondary:select() +--- +- - [1, 1] +... +s:drop() +--- +... box.cfg{vinyl_cache = default_vinyl_cache} --- ... diff --git a/test/vinyl/errinj_ddl.test.lua b/test/vinyl/errinj_ddl.test.lua index 5870a0b0..52d9de65 100644 --- a/test/vinyl/errinj_ddl.test.lua +++ b/test/vinyl/errinj_ddl.test.lua @@ -353,4 +353,46 @@ ch4:get() s:select() s:drop() +-- +-- gh-4152: yet another DML vs DDL race. +-- +s = box.schema.space.create('test', {engine = 'vinyl'}) +_ = s:create_index('primary') +s:replace{1, 1} +box.snapshot() + +test_run:cmd("setopt delimiter ';'") +-- Create a secondary index. Delay dump. +box.error.injection.set('ERRINJ_VY_DUMP_DELAY', true); +ch = fiber.channel(1); +_ = fiber.create(function() + local status, err = pcall(s.create_index, s, 'secondary', + {unique = true, parts = {2, 'unsigned'}}) + ch:put(status or err) +end); +-- Wait for dump to start. +test_run:wait_cond(function() + return box.stat.vinyl().scheduler.tasks_inprogress > 0 +end); +-- Issue a DML request that will yield to check the unique +-- constraint of the new index. Delay disk read. +box.error.injection.set('ERRINJ_VY_READ_PAGE_DELAY', true); +_ = fiber.create(function() + local status, err = pcall(s.replace, s, {2, 1}) + ch:put(status or err) +end); +-- Wait for index creation to complete. +-- It must complete successfully. +box.error.injection.set('ERRINJ_VY_DUMP_DELAY', false); +ch:get(); +-- Wait for the DML request to complete. +-- It must be aborted. +box.error.injection.set('ERRINJ_VY_READ_PAGE_DELAY', false); +ch:get(); +test_run:cmd("setopt delimiter ''"); + +s.index.primary:select() +s.index.secondary:select() +s:drop() + box.cfg{vinyl_cache = default_vinyl_cache} -- 2.11.0