From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp18.mail.ru (smtp18.mail.ru [94.100.176.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id D8953469710 for ; Fri, 8 May 2020 00:47:51 +0300 (MSK) References: <5012bd8eb07b5379eb70aa777402c7cd566a3b34.1588273848.git.korablev@tarantool.org> <959a3f72-6b95-30b1-c838-3b9e2bebc04c@tarantool.org> <20200507130256.GB11724@tarantool.org> From: Vladislav Shpilevoy Message-ID: <5a697151-e6fc-c567-ccf6-8d02a1df340e@tarantool.org> Date: Thu, 7 May 2020 23:47:49 +0200 MIME-Version: 1.0 In-Reply-To: <20200507130256.GB11724@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 2/2] vinyl: drop wasted runs in case range recovery fails List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik Cc: tarantool-patches@dev.tarantool.org Thanks for the explanation and the new commit message! >>> diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c >>> index 3d3f41b7a..81b011c69 100644 >>> --- a/src/box/vy_lsm.c >>> +++ b/src/box/vy_lsm.c >>> @@ -604,9 +604,17 @@ vy_lsm_recover(struct vy_lsm *lsm, struct vy_recovery *recovery, >>> * of each recovered run. We need to drop the extra >>> * references once we are done. >>> */ >>> - struct vy_run *run; >>> - rlist_foreach_entry(run, &lsm->runs, in_lsm) { >>> - assert(run->refs > 1); >>> + struct vy_run *run, *next_run; >>> + rlist_foreach_entry_safe(run, &lsm->runs, in_lsm, next_run) { >>> + /* >>> + * In case vy_lsm_recover_range() failed, slices >>> + * are already deleted and runs are unreffed. So >>> + * we have nothing to do but finish run clean-up. >>> + */ >>> + if (run->refs == 1) { >> >> Reference counter looks like not a good information channel. >> Could you use run->fd to check whether the run was really recovered? >> vy_run_recover() leaves it -1, when fails. >> >> Otherwise this won't work the second when we will ref the run anywhere >> else. > > Firstly, lsm at this point is not restored, ergo it is not functional > and run can't be refed somewehere else - it's life span is clearly > defined. Secondly, the problem is not in the last run (which failed to > recover) but in those which are already recovered at the moment. > Recovered runs feature valid fds. Finally, slice recover may fail > not only in vy_run_recover(), but also due to oom, broken vylog etc. > All these scenarios lead to the same situation. Yeah, fair. Then what about run->slice_count? If it is zero, then it is not kept by any slice. So we can look at slice_count == 0 and assert ref == 1. Or look at ref == 1, and assert slice_count == 0. Whatever. Will that work?