From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 7CED1445320 for ; Fri, 17 Jul 2020 10:25:54 +0300 (MSK) Received: by mail-lj1-f196.google.com with SMTP id q4so11569972lji.2 for ; Fri, 17 Jul 2020 00:25:54 -0700 (PDT) Date: Fri, 17 Jul 2020 10:25:50 +0300 From: Cyrill Gorcunov Message-ID: <20200717072550.GA2613@grain> References: <20200714144440.551127-1-gorcunov@gmail.com> <20200714144440.551127-5-gorcunov@gmail.com> <20200714145358.GG296695@grain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tml On Fri, Jul 17, 2020 at 12:45:16AM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > On 14.07.2020 16:53, Cyrill Gorcunov wrote: > > Basically our timeout is calculated via (a - b), where > > @a is a constant positive value fetched once, in turn > > the @b is rather a dynamic value thus the result may > > be negative. libev uses assert() call to catch such > > values when passed to timers setup. Thus lets intercept > > potential assert() trigger and exit early if timeout > > is already expired. > > > > Signed-off-by: Cyrill Gorcunov > > --- > > There were a typo, so I force-updated the branch > > > > src/box/txn_limbo.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c > > index d5b887d36..0924952b7 100644 > > --- a/src/box/txn_limbo.c > > +++ b/src/box/txn_limbo.c > > @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry) > > double start_time = fiber_clock(); > > while (true) { > > double deadline = start_time + replication_synchro_timeout; > > - bool cancellable = fiber_set_cancellable(false); > > double timeout = deadline - fiber_clock(); > > + if (timeout < 0) > > + goto do_rollback; > > + bool cancellable = fiber_set_cancellable(false); > > I added timeout = -1; here and tried to commit a sync transaction. I > got timed out error, no assertions. Please, tell me, how to reproduce > the assertion you mention in the commit message. Otherwise I don't see > why would we need the < 0 check. If it is done somewhere inside > fiber_cond_wait_timeout anyway. Look, here is libev code --- noinline void ev_timer_start (EV_P_ ev_timer *w) EV_THROW { if (expect_false (ev_is_active (w))) return; ev_at (w) += mn_now; assert (("libev: ev_timer_start called with negative timer repeat value", w->repeat >= 0.)); --- To trigger this assert we have to enter idle cycle, then manually change replication_synchro_timeout via cfg (make it less than it was initially) which should lead to negative timeout. To be fair I don't know how to force it without error injection. Lets assume we have initial fiber clock 1 (the clocks are increasing everytime libev does a new polling cycle). Thus // replication_synchro_timeout = 2 double start_time = fiber_clock(); // 1 while (true) { double deadline = start_time + replication_synchro_timeout; // => 3 double timeout = deadline - fiber_clock(); // => 2 bool cancellable = fiber_set_cancellable(false); int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout); // // enter into resched cycle // manually change replication_synchro_timeout to 1 // next cycle starts double deadline = start_time + replication_synchro_timeout; // start_time didn't changed // => 1 + 1 = 2 double timeout = deadline - fiber_clock(); // assume several resched cycles passed, thus // fiber_clock returns 3, thus // => 2 - 3 => -1 ... Am I missing something obvious here?