[Tarantool-patches] [PATCH] box: remove context from stack
Sergey Kaplun
skaplun at tarantool.org
Tue Sep 7 15:40:57 MSK 2021
Hi, Alexander!
Sorry for the long response!
On 01.07.21, Alexander Turenko wrote:
> On Tue, Jun 09, 2020 at 01:35:18PM +0300, Maria Khaydich wrote:
> >
> > I suppose no tests are needed since this one is pretty straightforward.
> > ----------------------------------------------------------------------
> >
> > Lua stack was broken because we forgot to clear the context
> > in case of an error when return value of called function was
> > not serializable.
> >
> > Closes #4617
> > ---
> > Branch:
> > https://github.com/tarantool/tarantool/compare/eljashm/gh-4617-broken-lua-stack
> > Issue:
> > https://github.com/tarantool/tarantool/issues/4617
> >
> > @ChangeLog
> > Remove context from lua stack in case of an error.
>
> Here we describe changes for a user (in the way a user may understand
> it). For complex bugs we can briefly describe a situation, when it
> occurs.
>
> It is now in the changelogs/ directory.
>
> >
> > src/box/lua/call.c | 1 +
> > 1 file changed, 1 insertion(+)
> > diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> > index 6588ec2fa..7ab49983d 100644
> > --- a/src/box/lua/call.c
> > +++ b/src/box/lua/call.c
> > @@ -436,6 +436,7 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> > int top = lua_gettop(L);
> > if (lua_cpcall(L, handler, &ctx) != 0) {
> > luaT_toerror(port->L);
> > + lua_pop(L, 1);
>
> There are several doubts and thoughts around it:
To answer the questions below we need to describe what is going on
here.
The old code flow was the following:
1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
arguments to encode to MessagePack.
2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
or `encode_lua_call_16`() via `lua_cpcall()`.
3) Objects on port coroutine are encoded via `luamp_encode()` or
`luamp_encode_call16()`.
4) This encoding may raise an error on unprotected `port->L` coroutine.
This coroutine has no protected frame on it and this call should fail
in pure Lua.
Calling anything on unprotected coroutine is not allowed in Lua [1]:
| If an error happens outside any protected environment, Lua calls a
| panic function
Lua 5.1 sets protection only for specific lua_State [2] and calls a
panic function if we raise an error on unprotected lua_State [3].
Nevertheless, no panic occurs now due to two facts:
* The first one is LuaJIT's support of C++ exception handling [4] that
allows to raise an error in Lua and catch it in C++ or vice versa. But
documentation still doesn't allow raising errors on unprotected
coroutines (at least we must use try-catch block).
* The second one is the patch made in LuaJIT to restore currently
executed coroutine, when C function or fast function raises an
error [5][6] (see the related issue here [7][8]) (*).
For these reasons, when an error occurs, the unwinder searches and finds
the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
coroutine and unwinds until that point (without aforementioned patches
LuaJIT just calls a panic function and exits).
If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
error from `port->L` coroutine is converted into a Tarantool error and a
diagnostic is set.
(*) P.S. But raise an error on non currently executed coroutine is
violation of Lua C API too (see [9][10]).
>
> 1. lua_cpcall() always removes arguments from the Lua stack (in case of
> an error too). How we can observe this ctx userdata on tarantool_L
> without yield in encode_lua_call()?
Within the violation of Lua C API the behaviour is implementation
depended. cpcall removes arguments from `port->L` Lua stack, but not
from `tarantool_L`.
> 2. luaT_toerror() leans on luaT_tolstring(), so it duplicates a string
> with the error message. Should not we pop two elements in the case? I
> would just call lua_settop().
`luaT_toerror()` works with `port->L`, while we "want" to clean
`tarantool_L` (`L`) coroutine.
> 3. The clash is due to tarantool_L usage (see 1.10.1-48-g18d87adf3). Is
> there any real reason why we can't use port->L here?
I don't see any reason to avoid it. So this issue was fixed
[11][12][13] via usage of `port->L` Lua stack here instead.
>
> CCed Igor and Sergey to think around together.
>
> > return -1;
> > }
> > lua_settop(L, top);
> > --
> > 2.24.0
>
[1]: https://www.lua.org/manual/5.2/manual.html#4.6
[2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
[3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
[4]: https://luajit.org/extensions.html#exceptions
[5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
[6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
[7]: https://github.com/tarantool/tarantool/issues/1516
[8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
[9]: https://github.com/LuaJIT/LuaJIT/pull/740#issuecomment-897173357
[10]: http://lua-users.org/lists/lua-l/2021-08/msg00084.html
[11]: https://github.com/tarantool/tarantool/commit/027775ff22993b03886f3bcc002e9c257ad09c02
[12]: https://lists.tarantool.org/pipermail/tarantool-patches/2021-August/025078.html
[13]: https://lists.tarantool.org/pipermail/tarantool-patches/2021-June/024351.html
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list