[Tarantool-patches] [PATCH luajit v2 5/5] Restore cur_L for specific Lua/C API use case.
Maxim Kokryashkin
m.kokryashkin at tarantool.org
Fri Oct 13 14:50:28 MSK 2023
Hi, Sergey!
Thanks for the review!
See my replies below.
Fixed your comments, the branch is force-pushed.
On Tue, Oct 10, 2023 at 11:57:22AM +0300, Sergey Kaplun via Tarantool-patches wrote:
> Hi, Maxim!
> Thanks for the patch!
> LGTM, after fixing comments below.
>
> I suppose that this patch should be the first in the patch series
> since, otherwise, the test <tarantool-tests/gh-6189-cur_L.test.lua> will
> fail on the corresponding architectures before this commit.
Moved
>
> AFAIR, there are still some API misuses in the Tarantool Lua C code.
> I suppose they should be fixed first before this patch is merged.
> Please correct me if I am wrong.
Yep, but there are two important aspects:
1. This patch actually makes this misuse legal now, so it is questionable
whether it should be fixed or not.
2. The issue in Tarantool is broader: Lua coroutines and C fibers
are switched completely unaware of one another, which is not a great
design choice. To provide 100% correct switching, LuaJIT API changes
are required, which should be discussed separately.
>
> On 29.09.23, Maxim Kokryashkin wrote:
> > From: Mike Pall <mike>
> >
> > Thanks to Peter Cawley.
> >
> > (cherry-picked from commit e86990f7f24a94b0897061f25a84547fe1108bed)
> >
> > Consider the following Lua C API function:
> >
> > ```
> > static int error_after_coroutine_return(lua_State *L)
> > {
> > lua_State *innerL = lua_newthread(L);
> > luaL_loadstring(innerL, "print('inner coro')");
> > lua_pcall(innerL, 0, 0, 0);
> > luaL_error(L, "my fancy error");
> > return 0;
> > }
> > ```
> >
> > And the following Lua script:
> > ```
> > local libcur_L = require('libcur_L')
> >
> > local function onesnap_f(var)
> > if var then
> > return 1
> > else
> > return 0
> > end
> > end
> >
> > -- Compile function to trace with snapshot.
> > if jit then jit.opt.start('hotloop=1') end
> > onesnap_f(true)
> > onesnap_f(true)
> >
> > local r, s = pcall(libcur_L.error_after_coroutine_return)
> > onesnap_f(false)
> > ```
>
> Minor: I prefer to describe the test case in words but not copy-paste it
> from tests. I suppose it will be enough to mention the test case for
> details.
Fixed!
>
> >
> > This is the only case when `cur_L` is not restored, according to
> > the analysis done in https://github.com/LuaJIT/LuaJIT/issues/1066.
> >
> > This patch changes the error-catching routine, so now the patch
> > sets the actual cur_L there.
> > Now it is possible to throw errors on non-executing coroutines,
> > which is a violation of the Lua C API. So, even though it is now
> > possible, that behavior should be avoided anyway.
>
> It's worth mentioning the analysis done in 1066: there is no
> need to patch case FRAME_CP because all work is done inside the VM when
> returning from the C function called via `lua_cpcall()`.
> You may also recap the analysis from 1066 here to simplify reading of
> the patch.
>
> >
> > Maxim Kokryashkin:
> > * added the description and the test for the problem
> >
> > Resolves tarantool/tarantool#6323
>
> Missed label:
> | Part of tarantool/tarantool#9145
Fixed!
Here is the new commit message:
===
Restore cur_L for specific Lua/C API use case.
Thanks to Peter Cawley.
(cherry-picked from commit e86990f7f24a94b0897061f25a84547fe1108bed)
There is a case of valid Lua C API usage in which non-throwing code
is executed on a coroutine other than the one the entry-point C
function was called with. This results in a segmentation fault
because cur_L is not restored accordingly. For a
comprehensive example, see the test case added within this patch.
According to the analysis done in lj-1066, to hit this issue,
we first need to call into a C function. That C function needs
to change cur_L by calling lua_resume/lua_p?call, and then the
only problematic path is leaving the C function by throwing an
error, which is caught by the fast-function x?pcall. When
x?pcall then returns to its caller, we can be executing
bytecode in the VM with an incorrect cur_L, and things will go
badly if we enter a trace before correcting cur_L.
The fix proposed in lj-740 was to set cur_L on error throw;
however, the analysis from lj-1066 suggests that it can be set
on the catching phase.
This patch changes the error-catching routine so that it sets
the actual cur_L there.
Now it is possible to throw errors on non-executing coroutines,
which is a violation of the Lua C API. So, even though it is now
possible, that behavior should be avoided anyway.
Maxim Kokryashkin:
* added the description and the test for the problem
Resolves tarantool/tarantool#6323
Part of tarantool/tarantool#9145
===
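For readers who want to follow the failure path from the commit message without LuaJIT internals, here is a toy C model of it. It is only a sketch under stated assumptions: `toy_state`, `toy_global`, and every `toy_*` function are hypothetical stand-ins, setjmp/longjmp plays the role of the error throw and the pcall frame, and none of this is the actual lj_err.c change.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for lua_State and global_State. */
typedef struct toy_state { int id; } toy_state;
typedef struct toy_global { toy_state *cur_L; } toy_global;

/* Plays the role of the protected (pcall) frame. */
static jmp_buf toy_catch;

/* Running code on a coroutine makes it the current one. */
static void toy_pcall_on(toy_global *g, toy_state *co)
{
  g->cur_L = co;
  /* ... a non-throwing body would run here ... */
}

/*
 * Models the C function from the test: run code on the inner
 * coroutine, then throw. The throw skips the normal return path,
 * which is where cur_L would otherwise be put back.
 */
static void toy_cfunc(toy_global *g, toy_state *inner)
{
  toy_pcall_on(g, inner);
  longjmp(toy_catch, 1);
}

/*
 * Models pcall on L. restore_on_catch toggles the modeled fix:
 * setting cur_L in the catching phase. Returns the cur_L that the
 * caller would observe after the error is caught.
 */
static toy_state *toy_protected_call(toy_global *g, toy_state *L,
                                     toy_state *inner,
                                     int restore_on_catch)
{
  assert(g != NULL && L != NULL && inner != NULL);
  g->cur_L = L;
  if (setjmp(toy_catch) == 0) {
    toy_cfunc(g, inner);
  } else if (restore_on_catch) {
    g->cur_L = L;
  }
  return g->cur_L;
}

/* Returns 1 if cur_L is left pointing at the inner coroutine. */
int toy_cur_L_is_stale(int restore_on_catch)
{
  toy_state outer = {1}, inner = {2};
  toy_global g = {NULL};
  return toy_protected_call(&g, &outer, &inner, restore_on_catch) != &outer;
}
```

In this model, `toy_cur_L_is_stale(0)` corresponds to the behavior before the patch (cur_L still points at the inner coroutine once the error is caught), and `toy_cur_L_is_stale(1)` corresponds to restoring it in the catching phase.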
>
> > ---
> > src/lj_err.c | 5 ++-
> > test/tarantool-tests/CMakeLists.txt | 1 +
> > ...-fix-cur_L-after-coroutine-resume.test.lua | 32 +++++++++++++++++++
> > .../CMakeLists.txt | 1 +
> > .../libcur_L_coroutine.c | 22 +++++++++++++
> > 5 files changed, 60 insertions(+), 1 deletion(-)
> > create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> > create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt
> > create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> >
> > diff --git a/src/lj_err.c b/src/lj_err.c
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> > index c15d6037..d84072e0 100644
> > --- a/test/tarantool-tests/CMakeLists.txt
> > +++ b/test/tarantool-tests/CMakeLists.txt
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> > new file mode 100644
> > index 00000000..3919ae23
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> > @@ -0,0 +1,32 @@
> > +local tap = require('tap')
> > +local test = tap.test('lj-1066-fix-cur_L-after-coroutine-resume'):skipcond({
> > + ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +local libcur_L_coroutine = require('libcur_L_coroutine')
> > +
> > +local function cbool(cond)
> > + if cond then
> > + return 1
> > + else
> > + return 0
> > + end
> > +end
> > +
> > +-- Compile function to trace with snapshot.
> > +jit.opt.start('hotloop=1')
> > +-- First call makes `cbool()` hot enough to be recorded next time.
> > +cbool(true)
> > +-- Second call records `cbool()` body (i.e. `if` branch). This is
> > +-- a root trace for `cbool()`.
> > +cbool(true)
> > +
> > +local res = pcall(libcur_L_coroutine.error_after_coroutine_return)
> > +assert(res == false, "return from error")
>
> Nit: Please, use single quotes here.
Fixed!
>
> > +-- Call with restoration from a snapshot with wrong cur_L.
> > +cbool(false)
> > +
> > +test:ok(true)
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> > new file mode 100644
> > index 00000000..7a71d0f0
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> > @@ -0,0 +1,22 @@
> > +#include "lua.h"
> > +#include "lauxlib.h"
> > +
> > +static int error_after_coroutine_return(lua_State *L)
> > +{
> > + lua_State *innerL = lua_newthread(L);
> > + luaL_loadstring(innerL, "print('inner coro')");
>
> There is no need for debug printing here, just "return" is enough, I
> suppose.
>
Fixed!
> > + lua_pcall(innerL, 0, 0, 0);
> > + luaL_error(L, "my fancy error");
> > + return 0;
> > +}
> > +
> > +static const struct luaL_Reg libcur_L_coroutine[] = {
> > + {"error_after_coroutine_return", error_after_coroutine_return},
> > + {NULL, NULL}
> > +};
> > +
> > +LUA_API int luaopen_libcur_L_coroutine(lua_State *L)
> > +{
> > + luaL_register(L, "libcur_L_coroutine", libcur_L_coroutine);
> > + return 1;
> > +}
> > --
> > 2.42.0
Here is the diff:
===
diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
index 3919ae23..7f3739bf 100644
--- a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
+++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
@@ -24,7 +24,7 @@ cbool(true)
cbool(true)
local res = pcall(libcur_L_coroutine.error_after_coroutine_return)
-assert(res == false, "return from error")
+assert(res == false, 'return from error')
-- Call with restoration from a snapshot with wrong cur_L.
cbool(false)
diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
index 7a71d0f0..39d90e18 100644
--- a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
+++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
@@ -4,7 +4,7 @@
static int error_after_coroutine_return(lua_State *L)
{
lua_State *innerL = lua_newthread(L);
- luaL_loadstring(innerL, "print('inner coro')");
+ luaL_loadstring(innerL, "return");
lua_pcall(innerL, 0, 0, 0);
luaL_error(L, "my fancy error");
return 0;
===
> >
>
> --
> Best regards,
> Sergey Kaplun