[Tarantool-patches] [PATCH v2 2/2] fiber: fiber_join -- don't crash on misuse

Cyrill Gorcunov gorcunov at gmail.com
Thu Apr 29 01:12:58 MSK 2021


On Wed, Apr 28, 2021 at 11:16:20PM +0200, Vladislav Shpilevoy wrote:
> > diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> > index 02ec3d158..0c8238cab 100644
> > --- a/src/lua/fiber.c
> > +++ b/src/lua/fiber.c
> > @@ -791,9 +793,11 @@ lbox_fiber_join(struct lua_State *L)
> >  	int num_ret = 0;
> >  	int coro_ref = 0;
> >  
> > -	if (!(fiber->flags & FIBER_IS_JOINABLE))
> > -		luaL_error(L, "the fiber is not joinable");
> > -	fiber_join(fiber);
> > +	if (fiber_join(fiber) != 0) {
> > +		e = diag_last_error(&fiber()->diag);
> > +		if (e->type == &type_IllegalParams)
> > +			luaL_error(L, e->errmsg);
> 
> After looking at this hunk I realized that it might be wrong to
> allow to call join on a non-joinable fiber. Firstly, you have no
> way to check why did join return -1: because it wasn't joinable
> or because this is what the fiber's function has returned. It is
> simply impossible in the public API (module.h).

Not really. The error message returned actually a part of api. Before
the patch if we call join() on non joinable fiber we have an error
thrown with "the fiber is not joinable", right? So we keep it here
intact. So in Lua interface we keep everything as it was (if only
I'm not missing something obvious, sorry brain is almost off
already).

Now to the C level interface. When we call it with non joinable fiber
without the patch it leads to some unpredictable results (for now,
since in future we might reuse fiber_id = 0 or even something more
wider) and a crash in *best* case. In worst case it is completely
undeterminable behaviour and exiting with error (or even crash
some user application if it is not ready for such change) is a
way better than silent potential data corruption in memory.

As to reason for exit -- we have two error codes here and diag
last error will show it. So that user's module can fetch and
figure out the reason.

Or you mean something else? I'll reread your email tomorrow,
as I said I might be brain off already :(

side note: i figured out what you mean at the end of email...

> Secondly, fiber_join() is documented to always return the
> fiber's function result. I see it in module.h and on the site.
> Here the behaviour has kind of changed - it might return something
> even if the fiber didn't really end. This is especially bad if the
> fiber was using some resources which are freed right after the join.
> And doubly-bad if the user's function never fails, so fiber_join()
> result wasn't even checked in his code.

I see what you mean and this is api change indeed, but because of
this nasty error we've to break the api I think (simply because not
breaking api is a way worse). Obviously the C level of fibers API
either not much used or all users are sane and do the test for
joinability otherwise we would had a number of reports I think.

> Thirdly, this leads to inconsistent behaviour. In this example
> fiber.join does not raise an error - it returns false + error:
> 
> 	fiber = require('fiber')
> 	do
> 	    f = fiber.new(function() box.error('other error') end)
> 	    f:set_joinable(true)
> 	end
> 
> 	tarantool> f:join()
> 	---
> 	- false
> 	- '[string "do..."]:2: box.error(): bad arguments'
> 	...
> 
> But when I change the error type, it raises the error:
> 
> 	fiber = require('fiber')
> 	do
> 	    f = fiber.new(function() box.lib.load() end)
> 	    f:set_joinable(true)
> 	end
> 
> 	tarantool> f:join()
> 	---
> 	- error: Expects box.lib.load('name') but no name passed
> 	...
> 
> It didn't happen before your patch. The same problem exists for
> fiber_join_timeout(), but at least it is documented to be able to
> return before the fiber has joined.

So this comes from error type collision where before the patch we
didn't have. And that is really a show stopper, agreed.

> With that said, I think we must call panic() on an attempt to join
> a non-joinable fiber.

Another option is to enter endless loop with error message printed out?
Say

	if ((fiber->flags & FIBER_IS_JOINABLE) == 0) {
		say_crit("Attempt to join nonjoinable fiber");
		say_crit("Enters endless loop");
		while (!(fiber->flags & FIBER_IS_JOINABLE))
			fiber_yield();
	}

or something similar. At least a user will be able to do something? Don't
get me wrong but panic is definitely the last big hammer kick when
tarantool simply cant continue working.

> For easier usage we might need to introduce fiber_join_ex(), which
> wouldn't mix its own fail and the fiber's function fail. But maybe
> not now since nobody really asked for that.

	Cyrill


More information about the Tarantool-patches mailing list