[Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API

Igor Munkin imun at tarantool.org
Sat Apr 4 00:31:25 MSK 2020


Sergos,

Thanks for your review!

Squashed the fixes, added your Reviewed-by tag and updated the upstream.

On 04.04.20, Sergey Ostanevich wrote:
> Hi!
> 
> LGTM.
> 
> Thanks.
> 
> On 30 мар 17:25, Igor Munkin wrote:
> > Sergos,
> > 
> > On 30.03.20, Sergey Ostanevich wrote:
> > > On 29 мар 13:45, Igor Munkin wrote:
> > > > Sergos,
> > > > 
> > > > On 29.03.20, Sergey Ostanevich wrote:
> > > > > On 28 мар 23:30, Igor Munkin wrote:
> > > > > > Sergos,
> > > > > > 
> > > > > > Thanks for you review!
> > > > > > 
> > > > > > On 28.03.20, Sergey Ostanevich wrote:
> > > > > > > Hi!
> > > > > > > 
> > > > > > > Thanks for the patch, see below.
> > > > > > > On 27 мар 13:47, Igor Munkin wrote:
> > > > > > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > > > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > > > > > re-entering Lua world again.
> > > > > > > > 
> > > > > > > > However, there are ways to break this atomicity via FFI: one can
> > > > > > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > > > > > a result the following stack mix is created:
> > > > > > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > > > > > 
> > > > > > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > > > > > 
> > > > > > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > > > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > > > > > against eating it.
> > > > > > > > 
> > > > > > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > > > > > from C space. As a result of FFI function call with a trigger underneath
> > > > > > > > the "FFI sandwich" is created.
> > > > > > > > 
> > > > > > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > > > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > > > > > >                                                               when
> > > > > > 
> > > > > > Thanks, fixed.
> > > > > > 
> > > > > > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > > > > > leads to platform panic.
> > > > > > > Could you elaborate here in comment on this a little: it means this 
> > > > > > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > > > > > the whole Tarantool?
> > > > > > > 
> > > > > > 
> > > > > > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > > > > > calls tarantool_panic_handler routine prior to the exit. Should I add
> > > > > > this remark into commit message?
> > > > > I would like to see it in commit message.
> > > > 
> > > > Added. Here is a new commit message:
> > > > 
> > > > ================================================================================
> > > > 
> > > > jit: abort trace recording and execution for C API
> > > > 
> > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > re-entering Lua world again.
> > > > 
> > > > However, there are ways to break this atomicity via FFI: one can
> > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > a result the following stack mix is created:
> > > > | Lua-FFI -> C routine -> Lua-C API -> Lua VM
> > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > against eating it.
> > > > 
> > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > from C space. As a result of FFI function call with a trigger underneath
> > > > the "FFI sandwich" is created.
> > > > 
> > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > to the one existing for Lua-FFI: trace recording is aborted when the
> > > > execution re-enters LuaJIT VM. If re-entrancy is detected while running
> > > > mcode the platform finishes its execution with EXIT_FAILURE code and
> > > > calls panic routine prior to the exit.
> > > > 
> > > > Relates to tarantool/tarantool#4427
> > > > 
> > > > ================================================================================
> > > 
> > > LGTM.
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > Relates to tarantool/tarantool#4427
> > > > > > > > 
> > > > > > > > Co-authored-by: Vyacheslav Egorov <vegorov at google.com>
> > > > > > > > Co-authored-by: Sergey Ostanevich <sergos at tarantool.org>
> > > > > > > > Signed-off-by: Igor Munkin <imun at tarantool.org>
> > > > > > > > ---
> > > > > > > >  src/lj_api.c                             | 35 ++++++++++----
> > > > > > > >  src/lj_errmsg.h                          |  1 +
> > > > > > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > > > > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > > > > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > > > > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > > > > > 
> > > > > > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > > > > > index a5e2465..c1f53e0 100644
> > > > > > > > --- a/src/lj_api.c
> > > > > > > > +++ b/src/lj_api.c
> > > > > > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > > > > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > > > > > I would recommend to name the function in a more descriptive way, like
> > > > > > > sane_call. 
> > > > > > 
> > > > > > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > > > > > propose the naming you prefer more.
> > > > > If you want to mention that this call aborts the trace generation,
> > > > > then I would put it like 'call_w_trace_abort'
> > > > 
> > > > It not only aborts the trace recording, but prohibits further execution
> > > > for already compiled code.
> > > > 
> > > > > But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
> > > > 
> > > > The changes are not related to FFI at all. FFI helps to achieve the
> > > > platform failure, but the problem relates to JIT machinery. So this name
> > > > doesn't clarify function semantics.
> > > > 
> > > Can we sayn that this machinery tracks the machinery consistency? Then
> > > the first name comes to my mind is consistent_call with in the code
> > > comment on what consistency is - abort trace collection and inconsistent
> > > trace excution.
> > 
> > As a result of discussion, we stopped on the jit_secure_call name. The
> > diff is below:
> > 
> > ================================================================================
> > 
> > diff --git a/src/lj_api.c b/src/lj_api.c
> > index c1f53e0..4ff8ccf 100644
> > --- a/src/lj_api.c
> > +++ b/src/lj_api.c
> > @@ -76,8 +76,9 @@ static GCtab *getcurrenv(lua_State *L)
> >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> >  }
> >  
> > -static void call(lua_State *L, TValue *base, int nres) {
> > +static void jit_secure_call(lua_State *L, TValue *base, int nres) {
> >    global_State *g = G(L);
> > +  /* Forbid Lua world re-entrancy while running the trace */
> >    if (tvref(g->jit_base)) {
> >      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> >      if (g->panic) g->panic(L);
> > @@ -311,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
> >        return (int)(uintptr_t)base;
> >      } else {
> >        L->top = base+2;
> > -      call(L, base, 1+1);
> > +      jit_secure_call(L, base, 1+1);
> >        L->top -= 2+LJ_FR2;
> >        return tvistruecond(L->top+1+LJ_FR2);
> >      }
> > @@ -334,7 +334,7 @@ LUA_API int lua_lessthan(lua_State *L, int idx1, int idx2)
> >        return (int)(uintptr_t)base;
> >      } else {
> >        L->top = base+2;
> > -      call(L, base, 1+1);
> > +      jit_secure_call(L, base, 1+1);
> >        L->top -= 2+LJ_FR2;
> >        return tvistruecond(L->top+1+LJ_FR2);
> >      }
> > @@ -786,7 +786,7 @@ LUA_API void lua_concat(lua_State *L, int n)
> >        }
> >        n -= (int)(L->top - top);
> >        L->top = top+2;
> > -      call(L, top, 1+1);
> > +      jit_secure_call(L, top, 1+1);
> >        L->top -= 1+LJ_FR2;
> >        copyTV(L, L->top-1, L->top+LJ_FR2);
> >      } while (--n > 0);
> > @@ -806,7 +806,7 @@ LUA_API void lua_gettable(lua_State *L, int idx)
> >    v = lj_meta_tget(L, t, L->top-1);
> >    if (v == NULL) {
> >      L->top += 2;
> > -    call(L, L->top-2, 1+1);
> > +    jit_secure_call(L, L->top-2, 1+1);
> >      L->top -= 2+LJ_FR2;
> >      v = L->top+1+LJ_FR2;
> >    }
> > @@ -822,7 +822,7 @@ LUA_API void lua_getfield(lua_State *L, int idx, const char *k)
> >    v = lj_meta_tget(L, t, &key);
> >    if (v == NULL) {
> >      L->top += 2;
> > -    call(L, L->top-2, 1+1);
> > +    jit_secure_call(L, L->top-2, 1+1);
> >      L->top -= 2+LJ_FR2;
> >      v = L->top+1+LJ_FR2;
> >    }
> > @@ -977,7 +977,7 @@ LUA_API void lua_settable(lua_State *L, int idx)
> >      TValue *base = L->top;
> >      copyTV(L, base+2, base-3-2*LJ_FR2);
> >      L->top = base+3;
> > -    call(L, base, 0+1);
> > +    jit_secure_call(L, base, 0+1);
> >      L->top -= 3+LJ_FR2;
> >    }
> >  }
> > @@ -998,7 +998,7 @@ LUA_API void lua_setfield(lua_State *L, int idx, const char *k)
> >      TValue *base = L->top;
> >      copyTV(L, base+2, base-3-2*LJ_FR2);
> >      L->top = base+3;
> > -    call(L, base, 0+1);
> > +    jit_secure_call(L, base, 0+1);
> >      L->top -= 2+LJ_FR2;
> >    }
> >  }
> > @@ -1129,7 +1129,7 @@ LUA_API void lua_call(lua_State *L, int nargs, int nresults)
> >  {
> >    api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
> >    api_checknelems(L, nargs+1);
> > -  call(L, api_call_base(L, nargs), nresults+1);
> > +  jit_secure_call(L, api_call_base(L, nargs), nresults+1);
> >  }
> >  
> >  LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
> > @@ -1147,6 +1148,7 @@ LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
> >      api_checkvalidindex(L, o);
> >      ef = savestack(L, o);
> >    }
> > +  /* Forbid Lua world re-entrancy while running the trace */
> >    if (tvref(g->jit_base)) {
> >      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> >      if (g->panic) g->panic(L);
> > @@ -1189,7 +1189,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
> >      if (LJ_FR2) setnilV(top++);
> >      copyTV(L, top++, index2adr(L, idx));
> >      L->top = top;
> > -    call(L, top-1, 1+1);
> > +    jit_secure_call(L, top-1, 1+1);
> >      return 1;
> >    }
> >    return 0;
> > 
> > ================================================================================
> > 
> > Squashed these changes with the original patch.
> > 
> > > 
> > > > > Although, safety resulting in a panic sounds paradoxical. 
> > > > 
> > > > Totally agree.
> > > > 
> > > > > 
> > > > > BTW, should it be under 'LJ_HASFFI' and keep the original call
> > > > > otherwise?
> > > > 
> > > > I see no benefits for it. Again, the fix doesn't relates to FFI
> > > > itself. It's just a particular way leading to JIT inconsistency.
> > > > 
> > > > Feel free to correct me, if I'm wrong.
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > +  global_State *g = G(L);
> > > > > > > > +  if (tvref(g->jit_base)) {
> > > > > > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > > > > > +    if (g->panic) g->panic(L);
> > > > > > > > +    exit(EXIT_FAILURE);
> > > > > > > > +  }
> > > > > > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > > > > > +  lj_vm_call(L, base, nres);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > > > > > >  
> > > > > > > >  LUA_API int lua_status(lua_State *L)
> > > > > > > 
> > > > > > > Also I have a question on perf impact - have you measured any
> > > > > > > benchmarks - Lua only, or as part of Tarantool
> > > > > > 
> > > > > > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > > > > > critical difference.
> > > > > > 
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Sergos
> > > > > > 
> > > > > > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > > > > > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > > > > > 
> > > > > > -- 
> > > > > > Best regards,
> > > > > > IM
> > > > 
> > > > -- 
> > > > Best regards,
> > > > IM
> > 
> > -- 
> > Best regards,
> > IM

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list