[Tarantool-patches] [PATCH] vinyl: fix recovery after deletion .run files

Maxim Kulis bokuno at picodata.io
Tue Feb 4 03:22:53 MSK 2020


Hi!

> Is there any issue corresponding to the problem you are solving?

No, there is no issue corresponding to that problem.

> > ---
> > src/box/vy_lsm.c | 19 +-
> > test/vinyl/delete_run.result | 302 +++++++++++++++++++++++++++++
> > test/vinyl/delete_run.test.lua | 93 +++++++++
> > test/vinyl/force_recovery_true.lua | 11 ++
> > 4 files changed, 421 insertions(+), 4 deletions(-)
> > create mode 100644 test/vinyl/delete_run.result
> > create mode 100644 test/vinyl/delete_run.test.lua
> > create mode 100644 test/vinyl/force_recovery_true.lua
> >
> > diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
> > index aa4bce9eb..a8abaf031 100644
> > --- a/src/box/vy_lsm.c
> > +++ b/src/box/vy_lsm.c
> > @@ -374,6 +374,9 @@ vy_lsm_recover_run(struct vy_lsm *lsm, struct vy_run_recovery_info *run_info,
> > lsm->space_id, lsm->index_id,
> > lsm->cmp_def, lsm->key_def,
> > lsm->disk_format, &lsm->opts) != 0)) {
> > + vy_log_tx_begin();
> > + vy_log_drop_run(run_info->id, 0);
> > + vy_log_tx_try_commit();
> > vy_run_unref(run);
> 
> You can re-use vy_run_discard().

Yes, I agree.


> > return NULL;
> > }
> > @@ -426,8 +429,14 @@ vy_lsm_recover_slice(struct vy_lsm *lsm, struct vy_range *range,
> >
> > run = vy_lsm_recover_run(lsm, slice_info->run,
> > run_env, force_recovery);
> > - if (run == NULL)
> > + if (run == NULL) {
> > + if (force_recovery) {
> > + vy_log_tx_begin();
> > + vy_log_delete_slice(slice_info->id);
> > + vy_log_tx_try_commit();
> > + }
> > goto out;
> > + }
> >
> > slice = vy_slice_new(slice_info->id, run, begin, end, lsm->cmp_def);
> > if (slice == NULL)
> > @@ -486,9 +495,11 @@ vy_lsm_recover_range(struct vy_lsm *lsm,
> > rlist_foreach_entry_reverse(slice_info, &range_info->slices, in_range) {
> > if (vy_lsm_recover_slice(lsm, range, slice_info,
> > run_env, force_recovery) == NULL) {
> > - vy_range_delete(range);
> > - range = NULL;
> > - goto out;
> > + if (force_recovery == false) {
> 
> Nit: if (!force_recovery)
> 
> Why do we delete slice if force_recovery is enabled,
> meanwhile delete range/run when force_recovery is disabled?

Slice is a reference to a run containing a key range and it’s stored only in the metadata log.
So, if run file is deleted and we restart server with force_recovery=true we have to delete
the corresponding slice to ignore the file loss. 
Range is an LSM-subtree, so if we want to get the rest of the subtree  at any cost (force_recovery=true)
we don't have to delete the range.

> Could you comment these test cases explaining what's going on here?

> > +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
> > +test_run:cmd("start server test with args='1'")
> > +test_run:cmd("switch test")
> > +
> > +test = box.schema.space.create('test', {engine='vinyl'})
> > +_ = test:create_index('pk')
> > +for i = 1, 10 do test:insert{i, i + 100} box.snapshot() end
> > +test:select()
> > +
> > +fio = require ('fio')
> > +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*26*')) do fio.unlink(f) end
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="1"')
> > +test_run:cmd('switch test')
> > +box.cfg.force_recovery
> > +box.space.test:select()
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="0"')
> > +test_run:cmd('switch test')
> > +box.cfg.force_recovery
> > +box.space.test:select()
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('stop server test')
> > +test_run:cmd('cleanup server test')
> > +test_run:cmd('delete server test')
> > +
> > +-- Test 2
> > +test_run = require('test_run').new()
> > +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
> > +test_run:cmd("start server test with args='1'")
> > +test_run:cmd("switch test")
> > +
> > +test = box.schema.space.create('test', {engine='vinyl'})
> > +_ = test:create_index('pk')
> > +_ = test:insert{1, "123"}
> > +test:select()
> > +box.snapshot()
> > +
> > +fio = require ('fio')
> > +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*.index')) do fio.unlink(f) end
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="1"')
> > +test_run:cmd('switch test')
> > +box.space.test:select()
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="0"')
> > +test_run:cmd('switch test')
> > +box.space.test:select()
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('stop server test')
> > +test_run:cmd('cleanup server test')
> > +test_run:cmd('delete server test')
> > +
> > +-- Test3
> > +
> > +test_run = require('test_run').new()
> > +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
> > +test_run:cmd("start server test")
> > +test_run:cmd("switch test")
> > +
> > +test = box.schema.space.create('test', {engine='vinyl'})
> > +_ = test:create_index('pk')
> > +_ = test:insert{1, "123"}
> > +test:select(1)
> > +box.snapshot()
> > +
> > +fio = require ('fio')
> > +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*.run')) do fio.unlink(f) end
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="1"')
> > +test_run:cmd('switch test')
> > +box.space.test:select()
> > +
> > +test_run:cmd('switch default')
> > +test_run:cmd('restart server test with args="0"')
> > +test_run:cmd('switch test')
> > +box.cfg.force_recovery
> > +box.space.test:select()
> > +test_run:cmd('switch default')
> > +test_run:cmd('stop server test')
> > +test_run:cmd('cleanup server test')
> > +test_run:cmd('delete server test')
> > diff --git a/test/vinyl/force_recovery_true.lua b/test/vinyl/force_recovery_true.lua
> > new file mode 100644
> > index 000000000..53414f22c
> > --- /dev/null
> > +++ b/test/vinyl/force_recovery_true.lua
> > @@ -0,0 +1,11 @@
> > +#!/usr/bin/env tarantool
bash: !/usr/bin/env: event not found
> > +
> > +local fr = tonumber(arg[1])
> > +box.cfg ({
> > + listen = os.getenv("LISTEN"),
> > + replication = os.getenv("MASTER"),
> > + vinyl_memory = 128 * 1024 * 1024,
> > + force_recovery = (fr == 1),
> > +})
> > +
> > +require('console').listen(os.getenv('ADMIN'))
> > --
> > 2.17.1
> >



In these tests I make some snapshots and delete .run and .index files. 
Then I recover server with force_reovery = true,
and in the end I restart serverwith force_recovery = false to make sure that
the server will start up

At the first test I delete only one .run file one .index file.
The second test deletes all .index files and the third tests deletes all .run files.

branch:  https://github.com/maksimkulis/tarantool/tree/maksimkulis/recover-after-deletion-run-files  

>Среда, 15 января 2020, 23:38 +03:00 от Nikita Pettik < korablev at tarantool.org >:
>
>On 13 Jan 14:27, Maksim Kulis wrote:
>> While deletion .run files and recover with force_recovery=true we have to delete some logs which are responsible for deleted files so that we can recover later with force_recovery=false.
>
>Please, limit commit message with 80 characters per line.
>
>Is there any issue corresponding to the problem you are solving?
> 
>> ---
>>  src/box/vy_lsm.c                   |  19 +-
>>  test/vinyl/delete_run.result       | 302 +++++++++++++++++++++++++++++
>>  test/vinyl/delete_run.test.lua     |  93 +++++++++
>>  test/vinyl/force_recovery_true.lua |  11 ++
>>  4 files changed, 421 insertions(+), 4 deletions(-)
>>  create mode 100644 test/vinyl/delete_run.result
>>  create mode 100644 test/vinyl/delete_run.test.lua
>>  create mode 100644 test/vinyl/force_recovery_true.lua
>> 
>> diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
>> index aa4bce9eb..a8abaf031 100644
>> --- a/src/box/vy_lsm.c
>> +++ b/src/box/vy_lsm.c
>> @@ -374,6 +374,9 @@ vy_lsm_recover_run(struct vy_lsm *lsm, struct vy_run_recovery_info *run_info,
>>  				  lsm->space_id, lsm->index_id,
>>  				  lsm->cmp_def, lsm->key_def,
>>  				  lsm->disk_format, &lsm->opts) != 0)) {
>> +		vy_log_tx_begin();
>> +		vy_log_drop_run(run_info->id, 0);
>> +		vy_log_tx_try_commit();
>>  		vy_run_unref(run);
>
>You can re-use vy_run_discard().
>
>>  		return NULL;
>>  	}
>> @@ -426,8 +429,14 @@ vy_lsm_recover_slice(struct vy_lsm *lsm, struct vy_range *range,
>> 
>>  	run = vy_lsm_recover_run(lsm, slice_info->run,
>>  				 run_env, force_recovery);
>> -	if (run == NULL)
>> +	if (run == NULL) {
>> +		if (force_recovery) {
>> +			vy_log_tx_begin();
>> +			vy_log_delete_slice(slice_info->id);
>> +			vy_log_tx_try_commit();
>> +		}
>>  		goto out;
>> +	}
>> 
>>  	slice = vy_slice_new(slice_info->id, run, begin, end, lsm->cmp_def);
>>  	if (slice == NULL)
>> @@ -486,9 +495,11 @@ vy_lsm_recover_range(struct vy_lsm *lsm,
>>  	rlist_foreach_entry_reverse(slice_info, &range_info->slices, in_range) {
>>  		if (vy_lsm_recover_slice(lsm, range, slice_info,
>>  					 run_env, force_recovery) == NULL) {
>> -			vy_range_delete(range);
>> -			range = NULL;
>> -			goto out;
>> +			if (force_recovery == false) {
>
>Nit: if (!force_recovery)
>
>Why do we delete slice if force_recovery is enabled,
>meanwhile delete range/run when force_recovery is disabled?
>
>> +				vy_range_delete(range);
>> +				range = NULL;
>> +				goto out;
>> +			}
>>  		}
>>  	}
>>  	vy_lsm_add_range(lsm, range);
>> diff --git a/test/vinyl/delete_run.test.lua b/test/vinyl/delete_run.test.lua
>> new file mode 100644
>> index 000000000..1e8ed39ed
>> --- /dev/null
>> +++ b/test/vinyl/delete_run.test.lua
>> @@ -0,0 +1,93 @@
>> +test_run = require('test_run').new()
>> +fiber = require('fiber')
>> +
>> +-- Test 1
>
>Could you comment these test cases explaining what's going on here?
>
>> +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
>> +test_run:cmd("start server test with args='1'")
>> +test_run:cmd("switch test")
>> +
>> +test = box.schema.space.create('test', {engine='vinyl'})
>> +_ = test:create_index('pk')
>> +for i = 1, 10 do test:insert{i, i + 100} box.snapshot() end
>> +test:select()
>> +
>> +fio = require ('fio')
>> +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*26*')) do fio.unlink(f) end
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="1"')
>> +test_run:cmd('switch test')
>> +box.cfg.force_recovery
>> +box.space.test:select()
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="0"')
>> +test_run:cmd('switch test')
>> +box.cfg.force_recovery
>> +box.space.test:select()
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('stop server test')
>> +test_run:cmd('cleanup server test')
>> +test_run:cmd('delete server test')
>> +
>> +-- Test 2
>> +test_run = require('test_run').new()
>> +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
>> +test_run:cmd("start server test with args='1'")
>> +test_run:cmd("switch test")
>> +
>> +test = box.schema.space.create('test', {engine='vinyl'})
>> +_ = test:create_index('pk')
>> +_ = test:insert{1, "123"}
>> +test:select()
>> +box.snapshot()
>> +
>> +fio = require ('fio')
>> +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*.index')) do fio.unlink(f) end
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="1"')
>> +test_run:cmd('switch test')
>> +box.space.test:select()
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="0"')
>> +test_run:cmd('switch test')
>> +box.space.test:select()
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('stop server test')
>> +test_run:cmd('cleanup server test')
>> +test_run:cmd('delete server test')
>> +
>> +-- Test3
>> +
>> +test_run = require('test_run').new()
>> +test_run:cmd("create server test with script='vinyl/force_recovery_true.lua'")
>> +test_run:cmd("start server test")
>> +test_run:cmd("switch test")
>> +
>> +test = box.schema.space.create('test', {engine='vinyl'})
>> +_ = test:create_index('pk')
>> +_ = test:insert{1, "123"}
>> +test:select(1)
>> +box.snapshot()
>> +
>> +fio = require ('fio')
>> +for _, f in pairs(fio.glob(box.cfg.vinyl_dir .. '/' .. test.id .. '/0/*.run')) do fio.unlink(f) end
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="1"')
>> +test_run:cmd('switch test')
>> +box.space.test:select()
>> +
>> +test_run:cmd('switch default')
>> +test_run:cmd('restart server test with args="0"')
>> +test_run:cmd('switch test')
>> +box.cfg.force_recovery
>> +box.space.test:select()
>> +test_run:cmd('switch default')
>> +test_run:cmd('stop server test')
>> +test_run:cmd('cleanup server test')
>> +test_run:cmd('delete server test')
>> diff --git a/test/vinyl/force_recovery_true.lua b/test/vinyl/force_recovery_true.lua
>> new file mode 100644
>> index 000000000..53414f22c
>> --- /dev/null
>> +++ b/test/vinyl/force_recovery_true.lua
>> @@ -0,0 +1,11 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local fr = tonumber(arg[1])
>> +box.cfg ({
>> +    listen = os.getenv("LISTEN"),
>> +    replication = os.getenv("MASTER"),
>> +    vinyl_memory = 128 * 1024 * 1024,
>> +    force_recovery = (fr == 1),
>> +})
>> +
>> +require('console').listen(os.getenv('ADMIN'))
>> -- 
>> 2.17.1
>> 


-- 
Maxim Kulis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20200204/a88c66c3/attachment.html>


More information about the Tarantool-patches mailing list