From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 3B4B86F865; Thu, 29 Apr 2021 01:13:03 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3B4B86F865 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1619647983; bh=gnyhVSky6qGz2XLlLLvprKzNsFhCWLfrxgzAYaEntSQ=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=sX+IOz4g8tpUS0h7YDlkVvTGxdPbPa/i1RES4zWldEMAKl7FhbnzXg5B+vKIjnM7O aSmncSHmHsR60bjkGlo5XlWntbg3lWRIIu7xsOqX/NyLhRV6LQX0sqP9GPC+ApKpzW jambUENDm2dIGvAjqt5pQEJOYNjRJE+5JmNkxpkA= Received: from mail-lj1-f172.google.com (mail-lj1-f172.google.com [209.85.208.172]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 24D6F6F865 for ; Thu, 29 Apr 2021 01:13:02 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 24D6F6F865 Received: by mail-lj1-f172.google.com with SMTP id a5so37067390ljk.0 for ; Wed, 28 Apr 2021 15:13:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=uTF3qgSvuU5NUbpmH/brDqBKKol/Y3bYQHbgyDJlwNo=; b=LWVQGa7j/9bmWOIABKiImke1TfivF5OptKWDfNn1PnBWoxtDOkjGBXyZg+k3123hHx n83T5mjpzUxW6nuaQEor75FikhahchQ2cPvUTi5Aq/ejysmoWQ/p8rbfjBNlR4lOYJgN N+mgSQL0Yn5ZAVGPnM6AsXMVS6OXHojx485EboTW9ZYU7onMkd3TjU57hzBONoGUItxh CXFKQ4p9fMlBdPj4D4lCW2OJWGGUF21BGT6y8rwYqFlo43R6LCpVpWTuHJ+vt161uwL3 pJyJeZyqiOwEViiYwmipws6iWQLBWLXkvKHe5i1x+Kocod/Yfnz86VpSs4K1JT9cdC5D a+4g== X-Gm-Message-State: AOAM531ySlipnL4kDOwaDnYpffFYCvG9JAm50xSXT4x2j7Tu9zl89Ixg zK8U9e5OC5G9aKmCR2Y0ZlcmkuJGYG1/pg== X-Google-Smtp-Source: ABdhPJwxyvrK+1ZiLEFJGj4ePmTEeCdNZFDXXfSlbsMIpDckjg6kwdq6GgYLHih5SrQEKOak7PtCFw== X-Received: by 2002:a2e:3611:: with SMTP id d17mr21942186lja.392.1619647981015; Wed, 28 Apr 2021 15:13:01 -0700 (PDT) Received: from grain.localdomain ([5.18.199.94]) by smtp.gmail.com with ESMTPSA id j15sm251672lfu.269.2021.04.28.15.12.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Apr 2021 15:12:59 -0700 (PDT) Received: by grain.localdomain (Postfix, from userid 1000) id CEF88560116; Thu, 29 Apr 2021 01:12:58 +0300 (MSK) Date: Thu, 29 Apr 2021 01:12:58 +0300 To: Vladislav Shpilevoy Message-ID: References: <20210428102251.552976-1-gorcunov@gmail.com> <20210428102251.552976-3-gorcunov@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/2.0.5 (2021-01-21) Subject: Re: [Tarantool-patches] [PATCH v2 2/2] fiber: fiber_join -- don't crash on misuse X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Cyrill Gorcunov via Tarantool-patches Reply-To: Cyrill Gorcunov Cc: tml Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On Wed, Apr 28, 2021 at 11:16:20PM +0200, Vladislav Shpilevoy wrote: > > diff --git a/src/lua/fiber.c b/src/lua/fiber.c > > index 02ec3d158..0c8238cab 100644 > > --- a/src/lua/fiber.c > > +++ b/src/lua/fiber.c > > @@ -791,9 +793,11 @@ lbox_fiber_join(struct lua_State *L) > > int num_ret = 0; > > int coro_ref = 0; > > > > - if (!(fiber->flags & FIBER_IS_JOINABLE)) > > - luaL_error(L, "the fiber is not joinable"); > > - fiber_join(fiber); > > + if (fiber_join(fiber) != 0) { > > + e = diag_last_error(&fiber()->diag); > > + if (e->type == &type_IllegalParams) > > + luaL_error(L, e->errmsg); > > After looking at this hunk I realized that it might be wrong to > allow to call join on a non-joinable fiber. Firstly, you have no > way to check why did join return -1: because it wasn't joinable > or because this is what the fiber's function has returned. It is > simply impossible in the public API (module.h). Not really. The error message returned actually a part of api. Before the patch if we call join() on non joinable fiber we have an error thrown with "the fiber is not joinable", right? So we keep it here intact. So in Lua interface we keep everything as it was (if only I'm not missing something obvious, sorry brain is almost off already). Now to the C level interface. When we call it with non joinable fiber without the patch it leads to some unpredictable results (for now, since in future we might reuse fiber_id = 0 or even something more wider) and a crash in *best* case. In worst case it is completely undeterminable behaviour and exiting with error (or even crash some user application if it is not ready for such change) is a way better than silent potential data corruption in memory. As to reason for exit -- we have two error codes here and diag last error will show it. So that user's module can fetch and figure out the reason. Or you mean something else? I'll reread your email tomorrow, as I said I might be brain off already :( side note: i figured out what you mean at the end of email... > Secondly, fiber_join() is documented to always return the > fiber's function result. I see it in module.h and on the site. > Here the behaviour has kind of changed - it might return something > even if the fiber didn't really end. This is especially bad if the > fiber was using some resources which are freed right after the join. > And doubly-bad if the user's function never fails, so fiber_join() > result wasn't even checked in his code. I see what you mean and this is api change indeed, but because of this nasty error we've to break the api I think (simply because not breaking api is a way worse). Obviously the C level of fibers API either not much used or all users are sane and do the test for joinability otherwise we would had a number of reports I think. > Thirdly, this leads to inconsistent behaviour. In this example > fiber.join does not raise an error - it returns false + error: > > fiber = require('fiber') > do > f = fiber.new(function() box.error('other error') end) > f:set_joinable(true) > end > > tarantool> f:join() > --- > - false > - '[string "do..."]:2: box.error(): bad arguments' > ... > > But when I change the error type, it raises the error: > > fiber = require('fiber') > do > f = fiber.new(function() box.lib.load() end) > f:set_joinable(true) > end > > tarantool> f:join() > --- > - error: Expects box.lib.load('name') but no name passed > ... > > It didn't happen before your patch. The same problem exists for > fiber_join_timeout(), but at least it is documented to be able to > return before the fiber has joined. So this comes from error type collision where before the patch we didn't have. And that is really a show stopper, agreed. > With that said, I think we must call panic() on an attempt to join > a non-joinable fiber. Another option is to enter endless loop with error message printed out? Say if ((fiber->flags & FIBER_IS_JOINABLE) == 0) { say_crit("Attempt to join nonjoinable fiber"); say_crit("Enters endless loop"); while (!(fiber->flags & FIBER_IS_JOINABLE)) fiber_yield(); } or something similar. At least a user will be able to do something? Don't get me wrong but panic is definitely the last big hammer kick when tarantool simply cant continue working. > For easier usage we might need to introduce fiber_join_ex(), which > wouldn't mix its own fail and the fiber's function fail. But maybe > not now since nobody really asked for that. Cyrill