From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id F314241C5DA for ; Thu, 11 Jun 2020 04:40:58 +0300 (MSK) Date: Thu, 11 Jun 2020 01:40:57 +0000 From: Nikita Pettik Message-ID: <20200611014057.GA18382@tarantool.org> References: <43322a95-0fa3-fd8a-e77e-fc73a4bfa585@tarantool.org> <20200610214113.GA17509@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200610214113.GA17509@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Aleksandr Lyapunov Cc: tarantool-patches@dev.tarantool.org On 10 Jun 21:41, Nikita Pettik wrote: > On 10 Jun 18:31, Aleksandr Lyapunov wrote: > > Thanks for the patch, lgtm. > > Actually it's a bit complicated, as well vy_scheduler itself. > > Pushed to master, 2.4, 2.3 and 1.10. Changelogs are updated > correspondingly. Branch is dropped. I also promised Konstantin provide more tests involving DDL operations and failures of dump/compact processes. However, now there's no infrastructure to write stress-like test. On the other hand, the very similar bug is appeared on our customer's instance. So that I pushed fix as is. To provide decent testing I've opened this issue: https://github.com/tarantool/tarantool/issues/5076 > > > It may turn out that dump_generation does not catch up with current > > > generation and no other dump tasks are in progress. This may happen > > > dump process is throttled due to errors. In this case generation is > > > bumped but dump_generation is not (since dump is not completed). In > > > turn, throttling opens a window for DDL operations. For instance, index > > > dropping and creation of new one results in mentioned situation: > > > > > > box.snapshot() -- fails for some reason; next attempt at dumping will be > > > -- taken in one second. > > > s:drop() -- drop index to be dumped > > > s = box.schema.space.create('test', {engine = 'vinyl'}) > > > -- create new one (its mem generation is greater than scheduler's one) > > > i = s:create_index('pk') > > > > > > Closes #4821 > > > --- > > > Note that current fix is only workaround amd doesn't pretend to be > > > the most proper one. Alternatively, we can rollback scheduler > > > generation in case dump is failed and index to be dumped is dropped. > > > > > > Branch:https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert > > > Issue:https://github.com/tarantool/tarantool/issues/4821 > > > > > > src/box/vy_scheduler.c | 14 ++++- > > > .../gh-4821-ddl-during-throttled-dump.result | 57 +++++++++++++++++++ > > > ...gh-4821-ddl-during-throttled-dump.test.lua | 22 +++++++ > > > test/vinyl/suite.ini | 2 +- > > > 4 files changed, 92 insertions(+), 3 deletions(-) > > > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.result > > > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua > > > > > > diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c > > > index bf4c3fe58..880304b66 100644 > > > --- a/src/box/vy_scheduler.c > > > +++ b/src/box/vy_scheduler.c > > > @@ -1846,9 +1846,19 @@ retry: > > > /* > > > * Dump is in progress, but all eligible LSM trees are > > > * already being dumped. Wait until the current round > > > - * is complete. > > > + * is complete. But if there's no any other tasks > > > + * in progress, it may mean that dump_generation does > > > + * not catch up with current generation. This may happen > > > + * due to failed dump process (i.e. generation is bumped > > > + * but dump_generation is not). In turn, after dump is failed, > > > + * next dump will be throttled which opens a window for DDL > > > + * operations. For instance, index dropping and creation of > > > + * new one results in mentioned situation. > > > */ > > > - assert(scheduler->dump_task_count > 0); > > > + if (scheduler->dump_task_count == 0) { > > > + scheduler->dump_generation = vy_lsm_generation(lsm); > > > + goto retry; > > > + } > > > no_task: > > > if (worker != NULL) > > > vy_worker_pool_put(worker); > >