From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <korablev@tarantool.org>
Received: from smtp39.i.mail.ru (smtp39.i.mail.ru [94.100.177.99])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id D7710469710
 for <tarantool-patches@dev.tarantool.org>;
 Thu,  7 May 2020 16:02:57 +0300 (MSK)
Date: Thu, 7 May 2020 13:02:56 +0000
From: Nikita Pettik <korablev@tarantool.org>
Message-ID: <20200507130256.GB11724@tarantool.org>
References: <cover.1588273848.git.korablev@tarantool.org>
 <5012bd8eb07b5379eb70aa777402c7cd566a3b34.1588273848.git.korablev@tarantool.org>
 <959a3f72-6b95-30b1-c838-3b9e2bebc04c@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <959a3f72-6b95-30b1-c838-3b9e2bebc04c@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 2/2] vinyl: drop wasted runs in case
	range recovery fails
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org

On 03 May 21:21, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> On 30/04/2020 21:27, Nikita Pettik wrote:
> > If recovery process fails during range restoration, range itself is
> > deleted and recovery is assumed to be finished as failed (in case of
> > casual i.e. not forced recovery). During recovery of particular range,
> > runs to be restored are reffed twice: once when they are created at
> > vy_run_new() and once when they are attached to slice. This fact is
> > taken into consideration and after all ranges are recovered all runs of
> > lsm tree are unreffed so that slices own run resources. However, if
> 
> Nit: 'unreffed' -> 'unrefed'. Since this is a shortcut for
> 'unreferenced', where number of 'f' is 1.
> 
> The same for 'reffed'.

Fixed.
 
> > range recovery fails, it is dropped alongside with slices which in turn
> > results in unreffing runs - this is not accounted. In this case, once
> > again unreffing such runs would lead to their destruction. On the other
> > hand, iteration over runs may turn out to be unsafe, so we should use
> > rlist_foreach_entry_safe(). Moreover, we should explicitly unaccount
> > these runs calling vy_lsm_remove_run().
> 
> Sorry, I didn't understand almost everything from the message :D
> But I did from the code. You may want to rephrase/restruct the text
> if you think it may help.

A bit reworked commit message and provided a brief schema in pseudocode.
Hope this will help:

    vinyl: drop wasted runs in case range recovery fails
    
    If recovery process fails during range restoration, range itself is
    deleted and recovery is assumed to be finished as failed (in case of
    casual i.e. not forced recovery). During recovery of particular range,
    runs to be restored are refed twice: once when they are created at
    vy_run_new() and once when they are attached to slice. This fact is
    taken into consideration and after all ranges are recovered: all runs of
    lsm tree are unrefed so that slices own run resources (as a result, when
    slice is to be deleted its runs unrefed and deleted as well). However, if
    range recovery fails, range is dropped alongside with already recovered
    slices. This leads to unrefing runs - this is not accounted. To sum up
    recovery process below is a brief schema:
    
    foreach range in lsm.ranges {
      vy_lsm_recover_range(range) {
        foreach slice in range.slices {
          // inside recover_slice() each run is refed twice
          if vy_lsm_recover_slice() != 0 {
            // here all already restored slices are deleted and
            // corresponding runs are unrefed, so now they have 1 ref.
            range_delete()
          }
        }
      }
    }
    foreach run in lsm.runs {
      assert(run->refs > 1)
      vy_run_unref(run)
    }
    
    In this case, unrefing such runs one more time would lead to their
    destruction. On the other hand, iteration over runs may turn out to
    be unsafe, so we should use rlist_foreach_entry_safe(). Moreover, we
    should explicitly clean-up these runs calling vy_lsm_remove_run().
    
    Closes #4805


> > diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
> > index 3d3f41b7a..81b011c69 100644
> > --- a/src/box/vy_lsm.c
> > +++ b/src/box/vy_lsm.c
> > @@ -604,9 +604,17 @@ vy_lsm_recover(struct vy_lsm *lsm, struct vy_recovery *recovery,
> >  	 * of each recovered run. We need to drop the extra
> >  	 * references once we are done.
> >  	 */
> > -	struct vy_run *run;
> > -	rlist_foreach_entry(run, &lsm->runs, in_lsm) {
> > -		assert(run->refs > 1);
> > +	struct vy_run *run, *next_run;
> > +	rlist_foreach_entry_safe(run, &lsm->runs, in_lsm, next_run) {
> > +		/*
> > +		 * In case vy_lsm_recover_range() failed, slices
> > +		 * are already deleted and runs are unreffed. So
> > +		 * we have nothing to do but finish run clean-up.
> > +		 */
> > +		if (run->refs == 1) {
> 
> Reference counter looks like not a good information channel.
> Could you use run->fd to check whether the run was really recovered?
> vy_run_recover() leaves it -1, when fails.
> 
> Otherwise this won't work the second when we will ref the run anywhere
> else.

Firstly, lsm at this point is not restored, ergo it is not functional
and run can't be refed somewehere else - it's life span is clearly
defined. Secondly, the problem is not in the last run (which failed to
recover) but in those which are already recovered at the moment.
Recovered runs feature valid fds. Finally, slice recover may fail
not only in vy_run_recover(), but also due to oom, broken vylog etc.
All these scenarios lead to the same situation.