* Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
@ 2020-06-10 15:31 Aleksandr Lyapunov
2020-06-10 21:41 ` Nikita Pettik
0 siblings, 1 reply; 6+ messages in thread
From: Aleksandr Lyapunov @ 2020-06-10 15:31 UTC (permalink / raw)
To: Nikita Pettik, tarantool-patches
Thanks for the patch, lgtm.
Actually it's a bit complicated, as well vy_scheduler itself.
> It may turn out that dump_generation does not catch up with current
> generation and no other dump tasks are in progress. This may happen
> dump process is throttled due to errors. In this case generation is
> bumped but dump_generation is not (since dump is not completed). In
> turn, throttling opens a window for DDL operations. For instance, index
> dropping and creation of new one results in mentioned situation:
>
> box.snapshot() -- fails for some reason; next attempt at dumping will be
> -- taken in one second.
> s:drop() -- drop index to be dumped
> s = box.schema.space.create('test', {engine = 'vinyl'})
> -- create new one (its mem generation is greater than scheduler's one)
> i = s:create_index('pk')
>
> Closes #4821
> ---
> Note that current fix is only workaround amd doesn't pretend to be
> the most proper one. Alternatively, we can rollback scheduler
> generation in case dump is failed and index to be dumped is dropped.
>
> Branch:https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> Issue:https://github.com/tarantool/tarantool/issues/4821
>
> src/box/vy_scheduler.c | 14 ++++-
> .../gh-4821-ddl-during-throttled-dump.result | 57 +++++++++++++++++++
> ...gh-4821-ddl-during-throttled-dump.test.lua | 22 +++++++
> test/vinyl/suite.ini | 2 +-
> 4 files changed, 92 insertions(+), 3 deletions(-)
> create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.result
> create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
>
> diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
> index bf4c3fe58..880304b66 100644
> --- a/src/box/vy_scheduler.c
> +++ b/src/box/vy_scheduler.c
> @@ -1846,9 +1846,19 @@ retry:
> /*
> * Dump is in progress, but all eligible LSM trees are
> * already being dumped. Wait until the current round
> - * is complete.
> + * is complete. But if there's no any other tasks
> + * in progress, it may mean that dump_generation does
> + * not catch up with current generation. This may happen
> + * due to failed dump process (i.e. generation is bumped
> + * but dump_generation is not). In turn, after dump is failed,
> + * next dump will be throttled which opens a window for DDL
> + * operations. For instance, index dropping and creation of
> + * new one results in mentioned situation.
> */
> - assert(scheduler->dump_task_count > 0);
> + if (scheduler->dump_task_count == 0) {
> + scheduler->dump_generation = vy_lsm_generation(lsm);
> + goto retry;
> + }
> no_task:
> if (worker != NULL)
> vy_worker_pool_put(worker);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
2020-06-10 15:31 [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL Aleksandr Lyapunov
@ 2020-06-10 21:41 ` Nikita Pettik
2020-06-11 1:40 ` Nikita Pettik
0 siblings, 1 reply; 6+ messages in thread
From: Nikita Pettik @ 2020-06-10 21:41 UTC (permalink / raw)
To: Aleksandr Lyapunov; +Cc: tarantool-patches
On 10 Jun 18:31, Aleksandr Lyapunov wrote:
> Thanks for the patch, lgtm.
> Actually it's a bit complicated, as well vy_scheduler itself.
Pushed to master, 2.4, 2.3 and 1.10. Changelogs are updated
correspondingly. Branch is dropped.
> > It may turn out that dump_generation does not catch up with current
> > generation and no other dump tasks are in progress. This may happen
> > dump process is throttled due to errors. In this case generation is
> > bumped but dump_generation is not (since dump is not completed). In
> > turn, throttling opens a window for DDL operations. For instance, index
> > dropping and creation of new one results in mentioned situation:
> >
> > box.snapshot() -- fails for some reason; next attempt at dumping will be
> > -- taken in one second.
> > s:drop() -- drop index to be dumped
> > s = box.schema.space.create('test', {engine = 'vinyl'})
> > -- create new one (its mem generation is greater than scheduler's one)
> > i = s:create_index('pk')
> >
> > Closes #4821
> > ---
> > Note that current fix is only workaround amd doesn't pretend to be
> > the most proper one. Alternatively, we can rollback scheduler
> > generation in case dump is failed and index to be dumped is dropped.
> >
> > Branch:https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> > Issue:https://github.com/tarantool/tarantool/issues/4821
> >
> > src/box/vy_scheduler.c | 14 ++++-
> > .../gh-4821-ddl-during-throttled-dump.result | 57 +++++++++++++++++++
> > ...gh-4821-ddl-during-throttled-dump.test.lua | 22 +++++++
> > test/vinyl/suite.ini | 2 +-
> > 4 files changed, 92 insertions(+), 3 deletions(-)
> > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.result
> > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
> >
> > diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
> > index bf4c3fe58..880304b66 100644
> > --- a/src/box/vy_scheduler.c
> > +++ b/src/box/vy_scheduler.c
> > @@ -1846,9 +1846,19 @@ retry:
> > /*
> > * Dump is in progress, but all eligible LSM trees are
> > * already being dumped. Wait until the current round
> > - * is complete.
> > + * is complete. But if there's no any other tasks
> > + * in progress, it may mean that dump_generation does
> > + * not catch up with current generation. This may happen
> > + * due to failed dump process (i.e. generation is bumped
> > + * but dump_generation is not). In turn, after dump is failed,
> > + * next dump will be throttled which opens a window for DDL
> > + * operations. For instance, index dropping and creation of
> > + * new one results in mentioned situation.
> > */
> > - assert(scheduler->dump_task_count > 0);
> > + if (scheduler->dump_task_count == 0) {
> > + scheduler->dump_generation = vy_lsm_generation(lsm);
> > + goto retry;
> > + }
> > no_task:
> > if (worker != NULL)
> > vy_worker_pool_put(worker);
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
2020-06-10 21:41 ` Nikita Pettik
@ 2020-06-11 1:40 ` Nikita Pettik
0 siblings, 0 replies; 6+ messages in thread
From: Nikita Pettik @ 2020-06-11 1:40 UTC (permalink / raw)
To: Aleksandr Lyapunov; +Cc: tarantool-patches
On 10 Jun 21:41, Nikita Pettik wrote:
> On 10 Jun 18:31, Aleksandr Lyapunov wrote:
> > Thanks for the patch, lgtm.
> > Actually it's a bit complicated, as well vy_scheduler itself.
>
> Pushed to master, 2.4, 2.3 and 1.10. Changelogs are updated
> correspondingly. Branch is dropped.
I also promised Konstantin provide more tests involving DDL operations
and failures of dump/compact processes. However, now there's no
infrastructure to write stress-like test. On the other hand,
the very similar bug is appeared on our customer's instance.
So that I pushed fix as is. To provide decent testing I've
opened this issue: https://github.com/tarantool/tarantool/issues/5076
> > > It may turn out that dump_generation does not catch up with current
> > > generation and no other dump tasks are in progress. This may happen
> > > dump process is throttled due to errors. In this case generation is
> > > bumped but dump_generation is not (since dump is not completed). In
> > > turn, throttling opens a window for DDL operations. For instance, index
> > > dropping and creation of new one results in mentioned situation:
> > >
> > > box.snapshot() -- fails for some reason; next attempt at dumping will be
> > > -- taken in one second.
> > > s:drop() -- drop index to be dumped
> > > s = box.schema.space.create('test', {engine = 'vinyl'})
> > > -- create new one (its mem generation is greater than scheduler's one)
> > > i = s:create_index('pk')
> > >
> > > Closes #4821
> > > ---
> > > Note that current fix is only workaround amd doesn't pretend to be
> > > the most proper one. Alternatively, we can rollback scheduler
> > > generation in case dump is failed and index to be dumped is dropped.
> > >
> > > Branch:https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> > > Issue:https://github.com/tarantool/tarantool/issues/4821
> > >
> > > src/box/vy_scheduler.c | 14 ++++-
> > > .../gh-4821-ddl-during-throttled-dump.result | 57 +++++++++++++++++++
> > > ...gh-4821-ddl-during-throttled-dump.test.lua | 22 +++++++
> > > test/vinyl/suite.ini | 2 +-
> > > 4 files changed, 92 insertions(+), 3 deletions(-)
> > > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.result
> > > create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
> > >
> > > diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
> > > index bf4c3fe58..880304b66 100644
> > > --- a/src/box/vy_scheduler.c
> > > +++ b/src/box/vy_scheduler.c
> > > @@ -1846,9 +1846,19 @@ retry:
> > > /*
> > > * Dump is in progress, but all eligible LSM trees are
> > > * already being dumped. Wait until the current round
> > > - * is complete.
> > > + * is complete. But if there's no any other tasks
> > > + * in progress, it may mean that dump_generation does
> > > + * not catch up with current generation. This may happen
> > > + * due to failed dump process (i.e. generation is bumped
> > > + * but dump_generation is not). In turn, after dump is failed,
> > > + * next dump will be throttled which opens a window for DDL
> > > + * operations. For instance, index dropping and creation of
> > > + * new one results in mentioned situation.
> > > */
> > > - assert(scheduler->dump_task_count > 0);
> > > + if (scheduler->dump_task_count == 0) {
> > > + scheduler->dump_generation = vy_lsm_generation(lsm);
> > > + goto retry;
> > > + }
> > > no_task:
> > > if (worker != NULL)
> > > vy_worker_pool_put(worker);
> >
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
2020-04-03 0:21 ` Vladislav Shpilevoy
@ 2020-04-03 1:25 ` Nikita Pettik
0 siblings, 0 replies; 6+ messages in thread
From: Nikita Pettik @ 2020-04-03 1:25 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
On 03 Apr 02:21, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> See 3 comments below.
>
> On 31/03/2020 02:47, Nikita Pettik wrote:
> > It may turn out that dump_generation does not catch up with current
> > generation and no other dump tasks are in progress. This may happen
> > dump process is throttled due to errors. In this case generation is
> > bumped but dump_generation is not (since dump is not completed). In
> > turn, throttling opens a window for DDL operations. For instance, index
> > dropping and creation of new one results in mentioned situation:
> >
> > box.snapshot() -- fails for some reason; next attempt at dumping will be
> > -- taken in one second.
> > s:drop() -- drop index to be dumped
> > s = box.schema.space.create('test', {engine = 'vinyl'})
> > -- create new one (its mem generation is greater than scheduler's one)
> > i = s:create_index('pk')
> >
> > Closes #4821
> > ---
>
> 1. Add a changelog please.
Will do. But firstly, I'm going to introduce more sophisticated
tests involving creating, filling and deleting several LSMs and
meanwhile random dump fails (to make sure that provided fix works
in all other cases as well).
> > Note that current fix is only workaround amd doesn't pretend to be
> > the most proper one. Alternatively, we can rollback scheduler
> > generation in case dump is failed and index to be dumped is dropped.
>
> 2. Not sure this is possible. At least I don't see a way. The dump is
> failed before the index is dropped. And only on a next dump you see
> that there is a problem. You can try changing the generation in the
> index drop, if you see, that its generation matches the dump generation.
>
> But it looks like encapsulation violation if you do it straight forward.
> vy_lsm works with the scheduler only via its API. If you go this way,
> we need a method like vy_scheduler_lsm_is_dropped(). It would check that
> if the LSM's generation = dump_generation, we should recalculate it
> not counting this tree.
>
> However I am ok with both solutions. Up to you.
>
> > Branch: https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> > Issue: https://github.com/tarantool/tarantool/issues/4821
> > > diff --git a/test/vinyl/gh-4821-ddl-during-throttled-dump.result b/test/vinyl/gh-4821-ddl-during-throttled-dump.result
> > new file mode 100644
> > index 000000000..eee031101
>
> 3. I rolledback the patch except the test, and it passed anyway.
> The example in the ticket crashes on my machine, but the test
> does not.
Thanks for noting, I'll deal with it.
> Why not to take the test from the ticket as is? With a small
> change of throttling timeout using ERRINJ_VY_SCHED_TIMEOUT, to
> make it fail faster than 1 sec.
But I took test from the ticket except for removing VY_RUN_WRITE_DELAY
injection (it is not vital here).
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
2020-03-31 0:47 Nikita Pettik
@ 2020-04-03 0:21 ` Vladislav Shpilevoy
2020-04-03 1:25 ` Nikita Pettik
0 siblings, 1 reply; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-03 0:21 UTC (permalink / raw)
To: Nikita Pettik, tarantool-patches
Hi! Thanks for the patch!
See 3 comments below.
On 31/03/2020 02:47, Nikita Pettik wrote:
> It may turn out that dump_generation does not catch up with current
> generation and no other dump tasks are in progress. This may happen
> dump process is throttled due to errors. In this case generation is
> bumped but dump_generation is not (since dump is not completed). In
> turn, throttling opens a window for DDL operations. For instance, index
> dropping and creation of new one results in mentioned situation:
>
> box.snapshot() -- fails for some reason; next attempt at dumping will be
> -- taken in one second.
> s:drop() -- drop index to be dumped
> s = box.schema.space.create('test', {engine = 'vinyl'})
> -- create new one (its mem generation is greater than scheduler's one)
> i = s:create_index('pk')
>
> Closes #4821
> ---
1. Add a changelog please.
> Note that current fix is only workaround amd doesn't pretend to be
> the most proper one. Alternatively, we can rollback scheduler
> generation in case dump is failed and index to be dumped is dropped.
2. Not sure this is possible. At least I don't see a way. The dump is
failed before the index is dropped. And only on a next dump you see
that there is a problem. You can try changing the generation in the
index drop, if you see, that its generation matches the dump generation.
But it looks like encapsulation violation if you do it straight forward.
vy_lsm works with the scheduler only via its API. If you go this way,
we need a method like vy_scheduler_lsm_is_dropped(). It would check that
if the LSM's generation = dump_generation, we should recalculate it
not counting this tree.
However I am ok with both solutions. Up to you.
> Branch: https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
> Issue: https://github.com/tarantool/tarantool/issues/4821
> > diff --git a/test/vinyl/gh-4821-ddl-during-throttled-dump.result b/test/vinyl/gh-4821-ddl-during-throttled-dump.result
> new file mode 100644
> index 000000000..eee031101
3. I rolledback the patch except the test, and it passed anyway.
The example in the ticket crashes on my machine, but the test
does not.
Why not to take the test from the ticket as is? With a small
change of throttling timeout using ERRINJ_VY_SCHED_TIMEOUT, to
make it fail faster than 1 sec.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL
@ 2020-03-31 0:47 Nikita Pettik
2020-04-03 0:21 ` Vladislav Shpilevoy
0 siblings, 1 reply; 6+ messages in thread
From: Nikita Pettik @ 2020-03-31 0:47 UTC (permalink / raw)
To: tarantool-patches; +Cc: v.shpilevoy
It may turn out that dump_generation does not catch up with current
generation and no other dump tasks are in progress. This may happen
dump process is throttled due to errors. In this case generation is
bumped but dump_generation is not (since dump is not completed). In
turn, throttling opens a window for DDL operations. For instance, index
dropping and creation of new one results in mentioned situation:
box.snapshot() -- fails for some reason; next attempt at dumping will be
-- taken in one second.
s:drop() -- drop index to be dumped
s = box.schema.space.create('test', {engine = 'vinyl'})
-- create new one (its mem generation is greater than scheduler's one)
i = s:create_index('pk')
Closes #4821
---
Note that current fix is only workaround amd doesn't pretend to be
the most proper one. Alternatively, we can rollback scheduler
generation in case dump is failed and index to be dumped is dropped.
Branch: https://github.com/tarantool/tarantool/tree/np/gh-4821-scheduler-gen-assert
Issue: https://github.com/tarantool/tarantool/issues/4821
src/box/vy_scheduler.c | 14 ++++-
.../gh-4821-ddl-during-throttled-dump.result | 57 +++++++++++++++++++
...gh-4821-ddl-during-throttled-dump.test.lua | 22 +++++++
test/vinyl/suite.ini | 2 +-
4 files changed, 92 insertions(+), 3 deletions(-)
create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.result
create mode 100644 test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index bf4c3fe58..880304b66 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1846,9 +1846,19 @@ retry:
/*
* Dump is in progress, but all eligible LSM trees are
* already being dumped. Wait until the current round
- * is complete.
+ * is complete. But if there's no any other tasks
+ * in progress, it may mean that dump_generation does
+ * not catch up with current generation. This may happen
+ * due to failed dump process (i.e. generation is bumped
+ * but dump_generation is not). In turn, after dump is failed,
+ * next dump will be throttled which opens a window for DDL
+ * operations. For instance, index dropping and creation of
+ * new one results in mentioned situation.
*/
- assert(scheduler->dump_task_count > 0);
+ if (scheduler->dump_task_count == 0) {
+ scheduler->dump_generation = vy_lsm_generation(lsm);
+ goto retry;
+ }
no_task:
if (worker != NULL)
vy_worker_pool_put(worker);
diff --git a/test/vinyl/gh-4821-ddl-during-throttled-dump.result b/test/vinyl/gh-4821-ddl-during-throttled-dump.result
new file mode 100644
index 000000000..eee031101
--- /dev/null
+++ b/test/vinyl/gh-4821-ddl-during-throttled-dump.result
@@ -0,0 +1,57 @@
+-- test-run result file version 2
+errinj = box.error.injection
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+-- During this test we verify that if dump process is throttled
+-- due to error, DDL operations in window between dumps won't
+-- break anything.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+
+errinj.set('ERRINJ_VY_RUN_WRITE', true)
+ | ---
+ | - ok
+ | ...
+s:replace{2}
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - error: Error injection 'vinyl dump'
+ | ...
+errinj.set('ERRINJ_VY_RUN_WRITE', false)
+ | ---
+ | - ok
+ | ...
+s:drop()
+ | ---
+ | ...
+
+s = box.schema.space.create('test1', {engine = 'vinyl'})
+ | ---
+ | ...
+i = s:create_index('pk1')
+ | ---
+ | ...
+s:replace{1}
+ | ---
+ | - [1]
+ | ...
+-- Unthrottle scheduler.
+errinj.set('ERRINJ_VY_SCHED_TIMEOUT', 0)
+ | ---
+ | - ok
+ | ...
+s:drop()
+ | ---
+ | ...
diff --git a/test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua b/test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
new file mode 100644
index 000000000..750bed0db
--- /dev/null
+++ b/test/vinyl/gh-4821-ddl-during-throttled-dump.test.lua
@@ -0,0 +1,22 @@
+errinj = box.error.injection
+fiber = require('fiber')
+
+-- During this test we verify that if dump process is throttled
+-- due to error, DDL operations in window between dumps won't
+-- break anything.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+
+errinj.set('ERRINJ_VY_RUN_WRITE', true)
+s:replace{2}
+box.snapshot()
+errinj.set('ERRINJ_VY_RUN_WRITE', false)
+s:drop()
+
+s = box.schema.space.create('test1', {engine = 'vinyl'})
+i = s:create_index('pk1')
+s:replace{1}
+-- Unthrottle scheduler.
+errinj.set('ERRINJ_VY_SCHED_TIMEOUT', 0)
+s:drop()
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index c8bc270f3..d1a0b0e71 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -2,7 +2,7 @@
core = tarantool
description = vinyl integration tests
script = vinyl.lua
-release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua
+release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4821-ddl-during-throttled-dump.test.lua
config = suite.cfg
lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
use_unix_sockets = True
--
2.17.1
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-06-11 1:40 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10 15:31 [Tarantool-patches] [PATCH] vinyl: bump dump_generation in case scheduler doesn't catch up with DDL Aleksandr Lyapunov
2020-06-10 21:41 ` Nikita Pettik
2020-06-11 1:40 ` Nikita Pettik
-- strict thread matches above, loose matches on Subject: below --
2020-03-31 0:47 Nikita Pettik
2020-04-03 0:21 ` Vladislav Shpilevoy
2020-04-03 1:25 ` Nikita Pettik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox