[Tarantool-patches] [PATCH luajit v2 5/5] Restore cur_L for specific Lua/C API use case.

Maxim Kokryashkin m.kokryashkin at tarantool.org
Fri Oct 13 14:50:28 MSK 2023


Hi, Sergey!
Thanks for the review!
See my replies below.
I have fixed your comments, and the branch is force-pushed.

On Tue, Oct 10, 2023 at 11:57:22AM +0300, Sergey Kaplun via Tarantool-patches wrote:
> Hi, Maxim!
> Thanks for the patch!
> LGTM, after fixing comments below.
> 
> I suppose that this patch should be the first in the patch series
> since, otherwise, the test <tarantool-tests/gh-6189-cur_L.test.lua> will
> fail on the corresponding architectures before this commit.
Moved
> 
> AFAIR, there are still some API misuses in the Tarantool Lua C code.
> I suppose they should be fixed first before this patch is merged.
> Please correct me if I am wrong.
Yep, but there are two important aspects:
1. This patch actually makes this misuse legal now, so it is
questionable whether it should be fixed or not.
2. The issue in Tarantool is broader: Lua coroutines and C fibers
are switched completely unaware of one another, which is not a great
design choice. To provide 100% correct switching, LuaJIT API changes
are required, which should be discussed separately.

> 
> On 29.09.23, Maxim Kokryashkin wrote:
> > From: Mike Pall <mike>
> > 
> > Thanks to Peter Cawley.
> > 
> > (cherry-picked from commit e86990f7f24a94b0897061f25a84547fe1108bed)
> > 
> > Consider the following Lua C API function:
> > 
> > ```
> > static int error_after_coroutine_return(lua_State *L)
> > {
> > 	lua_State *innerL = lua_newthread(L);
> > 	luaL_loadstring(innerL, "print('inner coro')");
> > 	lua_pcall(innerL, 0, 0, 0);
> > 	luaL_error(L, "my fancy error");
> > 	return 0;
> > }
> > ```
> > 
> > And the following Lua script:
> > ```
> > local libcur_L = require('libcur_L')
> > 
> > local function onesnap_f(var)
> >   if var then
> >     return 1
> >   else
> >     return 0
> >   end
> > end
> > 
> > -- Compile function to trace with snapshot.
> > if jit then jit.opt.start('hotloop=1') end
> > onesnap_f(true)
> > onesnap_f(true)
> > 
> > local r, s = pcall(libcur_L.error_after_coroutine_return)
> > onesnap_f(false)
> > ```
> 
> Minor: I prefer to describe the test case in words rather than
> copy-paste it from the tests. I suppose it will be enough to mention
> the test case for details.
Fixed!
> 
> > 
> > This is the only case when `cur_L` is not restored, according to
> > the analysis done in https://github.com/LuaJIT/LuaJIT/issues/1066.
> > 
> > This patch changes the error-catching routine, so now the patch
> > sets the actual cur_L there.
> > Now it is possible to throw errors on non-executing coroutines,
> > which is a violation of the Lua C API. So, even though it is now
> > possible, that behavior should be avoided anyway.
> 
> It's worth mentioning that, according to the analysis done in 1066,
> there is no need to patch the FRAME_CP case because all the work is
> done inside the VM when returning from the C function called via
> `lua_cpcall()`.
> You may also recap the analysis from 1066 here to simplify reading of
> the patch.
> 
> > 
> > Maxim Kokryashkin:
> > * added the description and the test for the problem
> > 
> > Resolves tarantool/tarantool#6323
> 
> Missed label:
> | Part of tarantool/tarantool#9145
Fixed!

Here is the new commit message:
===
Restore cur_L for specific Lua/C API use case.

Thanks to Peter Cawley.

(cherry-picked from commit e86990f7f24a94b0897061f25a84547fe1108bed)

There is a case of valid Lua C API usage in which non-throwing
code is executed on a coroutine other than the one that the
entry-point C function was called with. This results in a
segmentation fault because cur_L is not restored accordingly.
For a comprehensive example, see the test case added within
this patch.

According to the analysis done in lj-1066, to hit this issue,
we first need to call into a C function. That C function needs
to change cur_L by calling lua_resume/lua_p?call, and then the
only problematic path is leaving the C function by throwing an
error, which is caught by the fast-function x?pcall. When
x?pcall then returns to its caller, we can be executing
bytecode in the VM with an incorrect cur_L, and things will go
badly if we enter a trace before correcting cur_L.

The fix proposed in lj-740 was to set cur_L on error throw;
however, the analysis from lj-1066 suggests that it can be set
in the catching phase instead.

This patch changes the error-catching routine so that it sets
the actual cur_L there.
As a result, it is now possible to throw errors on non-executing
coroutines, which is a violation of the Lua C API. So, even
though it is now possible, that behavior should be avoided anyway.

Maxim Kokryashkin:
* added the description and the test for the problem

Resolves tarantool/tarantool#6323
Part of tarantool/tarantool#9145
===
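
For readers who want to see the sequence from the commit message in
isolation, here is a toy model in plain C. It is only a sketch: every
`toy_*` name is hypothetical, nothing here is LuaJIT code, and the model
assumes that a normal return from a protected call already re-syncs
cur_L, so only the error path differs. The `fix_on_catch` flag stands in
for "set cur_L in the catching phase".

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy stand-ins; none of these are LuaJIT's real types or functions. */
typedef struct toy_State { const char *name; } toy_State;

typedef struct toy_Global {
	toy_State *cur_L;   /* Coroutine the toy "VM" believes is running. */
	jmp_buf *catch_buf; /* Innermost active protected call. */
	int fix_on_catch;   /* 1: restore cur_L when an error is caught. */
} toy_Global;

typedef void (*toy_CFunction)(toy_Global *g, toy_State *L);

/* Unwind to the innermost active protected call. */
static void toy_throw(toy_Global *g)
{
	longjmp(*g->catch_buf, 1);
}

/* Protected call: returns 0 on success, 1 if `f` threw. */
static int toy_pcall(toy_Global *g, toy_State *L, toy_CFunction f)
{
	jmp_buf buf;
	jmp_buf *prev = g->catch_buf;
	int status = 0;
	g->catch_buf = &buf;
	g->cur_L = L; /* Entering: L is now the running coroutine. */
	if (setjmp(buf) == 0) {
		f(g, L);
		g->cur_L = L; /* Assumed: normal return re-syncs cur_L. */
	} else {
		status = 1;
		if (g->fix_on_catch)
			g->cur_L = L; /* Catching phase restores cur_L. */
	}
	g->catch_buf = prev;
	return status;
}

static toy_State inner_co = { "inner" };

static void toy_noop(toy_Global *g, toy_State *L)
{
	(void)g;
	(void)L;
}

/* Run code on another coroutine, then throw on the caller's one. */
static void toy_error_after_inner(toy_Global *g, toy_State *L)
{
	(void)L;
	toy_pcall(g, &inner_co, toy_noop); /* Leaves cur_L == &inner_co. */
	toy_throw(g);
}

/* Returns 1 if cur_L matches the caller's coroutine after the catch. */
static int toy_cur_L_is_correct(int fix_on_catch)
{
	toy_State main_co = { "main" };
	toy_Global g = { NULL, NULL, fix_on_catch };
	int status = toy_pcall(&g, &main_co, toy_error_after_inner);
	assert(status == 1); /* The error must have been caught. */
	return g.cur_L == &main_co;
}
```

In this model, `toy_cur_L_is_correct(0)` returns 0 (cur_L still points
at the inner coroutine after the catch), and `toy_cur_L_is_correct(1)`
returns 1.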
> 
> > ---
> >  src/lj_err.c                                  |  5 ++-
> >  test/tarantool-tests/CMakeLists.txt           |  1 +
> >  ...-fix-cur_L-after-coroutine-resume.test.lua | 32 +++++++++++++++++++
> >  .../CMakeLists.txt                            |  1 +
> >  .../libcur_L_coroutine.c                      | 22 +++++++++++++
> >  5 files changed, 60 insertions(+), 1 deletion(-)
> >  create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> >  create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt
> >  create mode 100644 test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> > 
> > diff --git a/src/lj_err.c b/src/lj_err.c
> 
> <snipped>
> 
> > diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> > index c15d6037..d84072e0 100644
> > --- a/test/tarantool-tests/CMakeLists.txt
> > +++ b/test/tarantool-tests/CMakeLists.txt
> 
> <snipped>
> 
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> > new file mode 100644
> > index 00000000..3919ae23
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
> > @@ -0,0 +1,32 @@
> > +local tap = require('tap')
> > +local test = tap.test('lj-1066-fix-cur_L-after-coroutine-resume'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +local libcur_L_coroutine = require('libcur_L_coroutine')
> > +
> > +local function cbool(cond)
> > +  if cond then
> > +    return 1
> > +  else
> > +    return 0
> > +  end
> > +end
> > +
> > +-- Compile function to trace with snapshot.
> > +jit.opt.start('hotloop=1')
> > +-- First call makes `cbool()` hot enough to be recorded next time.
> > +cbool(true)
> > +-- Second call records `cbool()` body (i.e. `if` branch). This is
> > +-- a root trace for `cbool()`.
> > +cbool(true)
> > +
> > +local res = pcall(libcur_L_coroutine.error_after_coroutine_return)
> > +assert(res == false, "return from error")
> 
> Nit: Please, use single quotes here.
Fixed!
> 
> > +-- Call with restoration from a snapshot with wrong cur_L.
> > +cbool(false)
> > +
> > +test:ok(true)
> > +test:done(true)
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/CMakeLists.txt
> 
> <snipped>
> 
> > diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> > new file mode 100644
> > index 00000000..7a71d0f0
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
> > @@ -0,0 +1,22 @@
> > +#include "lua.h"
> > +#include "lauxlib.h"
> > +
> > +static int error_after_coroutine_return(lua_State *L)
> > +{
> > +	lua_State *innerL = lua_newthread(L);
> > +	luaL_loadstring(innerL, "print('inner coro')");
> 
> There is no need for debug printing here, just "return" is enough, I
> suppose.
> 
Fixed!
> > +	lua_pcall(innerL, 0, 0, 0);
> > +	luaL_error(L, "my fancy error");
> > +	return 0;
> > +}
> > +
> > +static const struct luaL_Reg libcur_L_coroutine[] = {
> > +	{"error_after_coroutine_return", error_after_coroutine_return},
> > +	{NULL, NULL}
> > +};
> > +
> > +LUA_API int luaopen_libcur_L_coroutine(lua_State *L)
> > +{
> > +	luaL_register(L, "libcur_L_coroutine", libcur_L_coroutine);
> > +	return 1;
> > +}
> > -- 
> > 2.42.0
Here is the diff:
===
diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
index 3919ae23..7f3739bf 100644
--- a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
+++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume.test.lua
@@ -24,7 +24,7 @@ cbool(true)
 cbool(true)
 
 local res = pcall(libcur_L_coroutine.error_after_coroutine_return)
-assert(res == false, "return from error")
+assert(res == false, 'return from error')
 -- Call with restoration from a snapshot with wrong cur_L.
 cbool(false)
 
diff --git a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
index 7a71d0f0..39d90e18 100644
--- a/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
+++ b/test/tarantool-tests/lj-1066-fix-cur_L-after-coroutine-resume/libcur_L_coroutine.c
@@ -4,7 +4,7 @@
 static int error_after_coroutine_return(lua_State *L)
 {
 	lua_State *innerL = lua_newthread(L);
-	luaL_loadstring(innerL, "print('inner coro')");
+	luaL_loadstring(innerL, "return");
 	lua_pcall(innerL, 0, 0, 0);
 	luaL_error(L, "my fancy error");
 	return 0;
===
> > 
> 
> -- 
> Best regards,
> Sergey Kaplun

