From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id B7C7946970E for ; Wed, 5 Feb 2020 10:33:39 +0300 (MSK) Received: by mail-lj1-f193.google.com with SMTP id v17so1244435ljg.4 for ; Tue, 04 Feb 2020 23:33:39 -0800 (PST) Date: Wed, 5 Feb 2020 10:33:36 +0300 From: Cyrill Gorcunov Message-ID: <20200205073336.GG12445@uranus> References: <20200128192249.10023-1-gorcunov@gmail.com> <20200128192249.10023-4-gorcunov@gmail.com> <20200204221902.GF20146@atlas> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200204221902.GF20146@atlas> Subject: Re: [Tarantool-patches] [PATCH v7 3/5] box/applier: fix nil dereference in applier rollback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Konstantin Osipov Cc: tml On Wed, Feb 05, 2020 at 01:19:02AM +0300, Konstantin Osipov wrote: > * Cyrill Gorcunov [20/01/28 22:58]: > > + /* > > + * Something really bad happened, we can't proceed > > + * thus stop the applier and throw FiberIsCancelled > > + * exception which will be catched by the caller > > + * and the fiber gracefully finish. > > + * > > + * FIXME: Need to make sure that this is a really > > + * final error where we can't longer proceed and should > > + * zap the applier, probably we could reconnect and > > + * retry instead? > > + */ > > fiber_cancel(applier->reader); > > + diag_set(FiberIsCancelled); > > Now that I have seen the entire series and the test case, I think > there are two different issues here and only one of them is > leading to a crash. > > One, is that we reset the original error with some vague > ER_WAL_IO. This shouldn't happen, but it's harmless. Still, your > fix for it is good. I don't think though it should check for !e. > > Two, is a failure to propagate the error from the cancelled > applier fiber. > > I think it's better to fix the call site to raise > replicaset.applier.diag than to add a yet another vague error > (fiberiscancelled) topped with prolific comments about how > the current code is broken. OK, thanks a huge, Kostya! I read all your mails about this series, and really appreciate the feedback! I'll rework the patchset.