From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp39.i.mail.ru (smtp39.i.mail.ru [94.100.177.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id D7710469710 for ; Thu, 7 May 2020 16:02:57 +0300 (MSK) Date: Thu, 7 May 2020 13:02:56 +0000 From: Nikita Pettik Message-ID: <20200507130256.GB11724@tarantool.org> References: <5012bd8eb07b5379eb70aa777402c7cd566a3b34.1588273848.git.korablev@tarantool.org> <959a3f72-6b95-30b1-c838-3b9e2bebc04c@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <959a3f72-6b95-30b1-c838-3b9e2bebc04c@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 2/2] vinyl: drop wasted runs in case range recovery fails List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org On 03 May 21:21, Vladislav Shpilevoy wrote: > Thanks for the patch! > > On 30/04/2020 21:27, Nikita Pettik wrote: > > If recovery process fails during range restoration, range itself is > > deleted and recovery is assumed to be finished as failed (in case of > > casual i.e. not forced recovery). During recovery of particular range, > > runs to be restored are reffed twice: once when they are created at > > vy_run_new() and once when they are attached to slice. This fact is > > taken into consideration and after all ranges are recovered all runs of > > lsm tree are unreffed so that slices own run resources. However, if > > Nit: 'unreffed' -> 'unrefed'. Since this is a shortcut for > 'unreferenced', where number of 'f' is 1. > > The same for 'reffed'. Fixed. > > range recovery fails, it is dropped alongside with slices which in turn > > results in unreffing runs - this is not accounted. In this case, once > > again unreffing such runs would lead to their destruction. On the other > > hand, iteration over runs may turn out to be unsafe, so we should use > > rlist_foreach_entry_safe(). Moreover, we should explicitly unaccount > > these runs calling vy_lsm_remove_run(). > > Sorry, I didn't understand almost everything from the message :D > But I did from the code. You may want to rephrase/restruct the text > if you think it may help. A bit reworked commit message and provided a brief schema in pseudocode. Hope this will help: vinyl: drop wasted runs in case range recovery fails If recovery process fails during range restoration, range itself is deleted and recovery is assumed to be finished as failed (in case of casual i.e. not forced recovery). During recovery of particular range, runs to be restored are refed twice: once when they are created at vy_run_new() and once when they are attached to slice. This fact is taken into consideration and after all ranges are recovered: all runs of lsm tree are unrefed so that slices own run resources (as a result, when slice is to be deleted its runs unrefed and deleted as well). However, if range recovery fails, range is dropped alongside with already recovered slices. This leads to unrefing runs - this is not accounted. To sum up recovery process below is a brief schema: foreach range in lsm.ranges { vy_lsm_recover_range(range) { foreach slice in range.slices { // inside recover_slice() each run is refed twice if vy_lsm_recover_slice() != 0 { // here all already restored slices are deleted and // corresponding runs are unrefed, so now they have 1 ref. range_delete() } } } } foreach run in lsm.runs { assert(run->refs > 1) vy_run_unref(run) } In this case, unrefing such runs one more time would lead to their destruction. On the other hand, iteration over runs may turn out to be unsafe, so we should use rlist_foreach_entry_safe(). Moreover, we should explicitly clean-up these runs calling vy_lsm_remove_run(). Closes #4805 > > diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c > > index 3d3f41b7a..81b011c69 100644 > > --- a/src/box/vy_lsm.c > > +++ b/src/box/vy_lsm.c > > @@ -604,9 +604,17 @@ vy_lsm_recover(struct vy_lsm *lsm, struct vy_recovery *recovery, > > * of each recovered run. We need to drop the extra > > * references once we are done. > > */ > > - struct vy_run *run; > > - rlist_foreach_entry(run, &lsm->runs, in_lsm) { > > - assert(run->refs > 1); > > + struct vy_run *run, *next_run; > > + rlist_foreach_entry_safe(run, &lsm->runs, in_lsm, next_run) { > > + /* > > + * In case vy_lsm_recover_range() failed, slices > > + * are already deleted and runs are unreffed. So > > + * we have nothing to do but finish run clean-up. > > + */ > > + if (run->refs == 1) { > > Reference counter looks like not a good information channel. > Could you use run->fd to check whether the run was really recovered? > vy_run_recover() leaves it -1, when fails. > > Otherwise this won't work the second when we will ref the run anywhere > else. Firstly, lsm at this point is not restored, ergo it is not functional and run can't be refed somewehere else - it's life span is clearly defined. Secondly, the problem is not in the last run (which failed to recover) but in those which are already recovered at the moment. Recovered runs feature valid fds. Finally, slice recover may fail not only in vy_run_recover(), but also due to oom, broken vylog etc. All these scenarios lead to the same situation.