[Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL

Nikita Pettik korablev at tarantool.org
Fri Apr 3 04:25:35 MSK 2020


On 03 Apr 02:21, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 3 comments below.
> 
> On 31/03/2020 02:47, Nikita Pettik wrote:
> > It may turn out that dump_generation does not catch up with current
> > generation and no other dump tasks are in progress. This may happen
> > dump process is throttled due to errors. In this case generation is
> > bumped but dump_generation is not (since dump is not completed). In
> > turn, throttling opens a window for DDL operations. For instance, index
> > dropping and creation of new one results in mentioned situation:
> > 
> > box.snapshot() -- fails for some reason; next attempt at dumping will be
> >                -- taken in one second.
> > s:drop() -- drop index to be dumped
> > s = box.schema.space.create('test', {engine = 'vinyl'})
> > -- create new one (its mem generation is greater than scheduler's one)
> > i = s:create_index('pk')
> > 
> > Closes #4821
> > ---
> 
> 1. Add a changelog please.

Will do. But firstly, I'm going to introduce more sophisticated
tests involving creating, filling and deleting several LSMs and
meanwhile random dump fails (to make sure that provided fix works
in all other cases as well).
 
> > Note that current fix is only workaround amd doesn't pretend to be
> > the most proper one. Alternatively, we can rollback scheduler
> > generation in case dump is failed and index to be dumped is dropped.
> 
> 2. Not sure this is possible. At least I don't see a way. The dump is
> failed before the index is dropped. And only on a next dump you see
> that there is a problem. You can try changing the generation in the
> index drop, if you see, that its generation matches the dump generation.
> 
> But it looks like encapsulation violation if you do it straight forward.
> vy_lsm works with the scheduler only via its API. If you go this way,
> we need a method like vy_scheduler_lsm_is_dropped(). It would check that
> if the LSM's generation = dump_generation, we should recalculate it
> not counting this tree.
> 
> However I am ok with both solutions. Up to you.
> 
> > Branch: https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> > Issue: https://github.com/tarantool/tarantool/issues/4821
> > > diff --git a/test/vinyl/gh-4821-ddl-during-throttled-dump.result b/test/vinyl/gh-4821-ddl-during-throttled-dump.result
> > new file mode 100644
> > index 000000000..eee031101
> 
> 3. I rolledback the patch except the test, and it passed anyway.
> The example in the ticket crashes on my machine, but the test
> does not.

Thanks for noting, I'll deal with it.
 
> Why not to take the test from the ticket as is? With a small
> change of throttling timeout using ERRINJ_VY_SCHED_TIMEOUT, to
> make it fail faster than 1 sec.

But I took test from the ticket except for removing VY_RUN_WRITE_DELAY
injection (it is not vital here).



More information about the Tarantool-patches mailing list