Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change
@ 2020-03-27 10:47 Igor Munkin
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Igor Munkin @ 2020-03-27 10:47 UTC (permalink / raw)
  To: Sergey Ostanevich, Vladislav Shpilevoy, Kirill Yukhin; +Cc: tarantool-patches

This series closes two issues related to the JIT machinery behaviour:
* "FFI sandwich"(*) detection is introduced. If sandwich is detected
  while trace recording the recording is aborted. The sandwich detected
  while mcode execution leads to the platform panic.
* luaJIT_setmode call is prohibited while mcode execution and leads to
  the platform panic.

(*) The following stack mix is called FFI sandwich.
    | Lua-FFI -> С routine -> Lua-C API -> Lua VM
    This sort of re-entrancy is explicitly not supported by LuaJIT
    compiler. For more info see [1].

Branch: https://github.com/tarantool/luajit/tree/imun/ffi-sandwich

[1]: https://github.com/tarantool/tarantool/issues/4427

Igor Munkin (2):
  jit: abort trace recording and execution for C API
  jit: abort trace execution on JIT mode change

 src/lj_api.c                             | 35 ++++++++++----
 src/lj_dispatch.c                        |  5 ++
 src/lj_errmsg.h                          |  2 +
 test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
 test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
 test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
 test/lj-flush-on-trace/CMakeLists.txt    |  1 +
 test/lj-flush-on-trace/libflush.c        | 31 +++++++++++++
 test/lj-flush-on-trace/test.lua          | 25 ++++++++++
 9 files changed, 176 insertions(+), 9 deletions(-)
 create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
 create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
 create mode 100644 test/gh-4427-ffi-sandwich/test.lua
 create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
 create mode 100644 test/lj-flush-on-trace/libflush.c
 create mode 100644 test/lj-flush-on-trace/test.lua

-- 
2.25.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-27 10:47 [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change Igor Munkin
@ 2020-03-27 10:47 ` Igor Munkin
  2020-03-28 16:33   ` Sergey Ostanevich
  2020-04-02 23:41   ` Vladislav Shpilevoy
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change Igor Munkin
  2020-04-02 23:41 ` [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or " Vladislav Shpilevoy
  2 siblings, 2 replies; 20+ messages in thread
From: Igor Munkin @ 2020-03-27 10:47 UTC (permalink / raw)
  To: Sergey Ostanevich, Vladislav Shpilevoy, Kirill Yukhin; +Cc: tarantool-patches

JIT recording semantics assumes FFI calls are atomic regarding the
LuaJIT VM: if the execution exited Lua world through FFI it is not
re-entering Lua world again.

However, there are ways to break this atomicity via FFI: one can
re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
a result the following stack mix is created:
| Lua-FFI -> С routine -> Lua-C API -> Lua VM
This sort of re-entrancy is explicitly not supported by LuaJIT compiler
machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
against eating it.

E.g. there is a trigger machinery in Tarantool calling Lua functions
from C space. As a result of FFI function call with a trigger underneath
the "FFI sandwich" is created.

This changeset introduces the mechanism for Lua-C API callbacks similar
to the one existing for Lua-FFI: trace recording is aborted whe the
execution re-enters LuaJIT VM. Sandwich detection while mcode execution
leads to platform panic.

Relates to tarantool/tarantool#4427

Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
---
 src/lj_api.c                             | 35 ++++++++++----
 src/lj_errmsg.h                          |  1 +
 test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
 test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
 test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
 5 files changed, 113 insertions(+), 9 deletions(-)
 create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
 create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
 create mode 100644 test/gh-4427-ffi-sandwich/test.lua

diff --git a/src/lj_api.c b/src/lj_api.c
index a5e2465..c1f53e0 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
   return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
 }
 
+static void call(lua_State *L, TValue *base, int nres) {
+  global_State *g = G(L);
+  if (tvref(g->jit_base)) {
+    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+    if (g->panic) g->panic(L);
+    exit(EXIT_FAILURE);
+  }
+  lj_trace_abort(g);  /* Never record across Lua VM entrance */
+  lj_vm_call(L, base, nres);
+}
+
 /* -- Miscellaneous API functions ----------------------------------------- */
 
 LUA_API int lua_status(lua_State *L)
@@ -300,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
       return (int)(uintptr_t)base;
     } else {
       L->top = base+2;
-      lj_vm_call(L, base, 1+1);
+      call(L, base, 1+1);
       L->top -= 2+LJ_FR2;
       return tvistruecond(L->top+1+LJ_FR2);
     }
@@ -323,7 +334,7 @@ LUA_API int lua_lessthan(lua_State *L, int idx1, int idx2)
       return (int)(uintptr_t)base;
     } else {
       L->top = base+2;
-      lj_vm_call(L, base, 1+1);
+      call(L, base, 1+1);
       L->top -= 2+LJ_FR2;
       return tvistruecond(L->top+1+LJ_FR2);
     }
@@ -775,7 +786,7 @@ LUA_API void lua_concat(lua_State *L, int n)
       }
       n -= (int)(L->top - top);
       L->top = top+2;
-      lj_vm_call(L, top, 1+1);
+      call(L, top, 1+1);
       L->top -= 1+LJ_FR2;
       copyTV(L, L->top-1, L->top+LJ_FR2);
     } while (--n > 0);
@@ -795,7 +806,7 @@ LUA_API void lua_gettable(lua_State *L, int idx)
   v = lj_meta_tget(L, t, L->top-1);
   if (v == NULL) {
     L->top += 2;
-    lj_vm_call(L, L->top-2, 1+1);
+    call(L, L->top-2, 1+1);
     L->top -= 2+LJ_FR2;
     v = L->top+1+LJ_FR2;
   }
@@ -811,7 +822,7 @@ LUA_API void lua_getfield(lua_State *L, int idx, const char *k)
   v = lj_meta_tget(L, t, &key);
   if (v == NULL) {
     L->top += 2;
-    lj_vm_call(L, L->top-2, 1+1);
+    call(L, L->top-2, 1+1);
     L->top -= 2+LJ_FR2;
     v = L->top+1+LJ_FR2;
   }
@@ -966,7 +977,7 @@ LUA_API void lua_settable(lua_State *L, int idx)
     TValue *base = L->top;
     copyTV(L, base+2, base-3-2*LJ_FR2);
     L->top = base+3;
-    lj_vm_call(L, base, 0+1);
+    call(L, base, 0+1);
     L->top -= 3+LJ_FR2;
   }
 }
@@ -987,7 +998,7 @@ LUA_API void lua_setfield(lua_State *L, int idx, const char *k)
     TValue *base = L->top;
     copyTV(L, base+2, base-3-2*LJ_FR2);
     L->top = base+3;
-    lj_vm_call(L, base, 0+1);
+    call(L, base, 0+1);
     L->top -= 2+LJ_FR2;
   }
 }
@@ -1118,7 +1129,7 @@ LUA_API void lua_call(lua_State *L, int nargs, int nresults)
 {
   api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
   api_checknelems(L, nargs+1);
-  lj_vm_call(L, api_call_base(L, nargs), nresults+1);
+  call(L, api_call_base(L, nargs), nresults+1);
 }
 
 LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
@@ -1136,6 +1147,12 @@ LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
     api_checkvalidindex(L, o);
     ef = savestack(L, o);
   }
+  if (tvref(g->jit_base)) {
+    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+    if (g->panic) g->panic(L);
+    exit(EXIT_FAILURE);
+  }
+  lj_trace_abort(g);  /* Never record across Lua VM entrance */
   status = lj_vm_pcall(L, api_call_base(L, nargs), nresults+1, ef);
   if (status) hook_restore(g, oldh);
   return status;
@@ -1172,7 +1189,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
     if (LJ_FR2) setnilV(top++);
     copyTV(L, top++, index2adr(L, idx));
     L->top = top;
-    lj_vm_call(L, top-1, 1+1);
+    call(L, top-1, 1+1);
     return 1;
   }
   return 0;
diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
index 060a9f8..1580385 100644
--- a/src/lj_errmsg.h
+++ b/src/lj_errmsg.h
@@ -112,6 +112,7 @@ ERRDEF(NOJIT,	"no JIT compiler for this architecture (yet)")
 ERRDEF(NOJIT,	"JIT compiler permanently disabled by build option")
 #endif
 ERRDEF(JITOPT,	"unknown or malformed optimization flag " LUA_QS)
+ERRDEF(JITCALL,	"Lua VM re-entrancy is detected while executing the trace")
 
 /* Lexer/parser errors. */
 ERRDEF(XMODE,	"attempt to load chunk with wrong mode")
diff --git a/test/gh-4427-ffi-sandwich/CMakeLists.txt b/test/gh-4427-ffi-sandwich/CMakeLists.txt
new file mode 100644
index 0000000..995c6bb
--- /dev/null
+++ b/test/gh-4427-ffi-sandwich/CMakeLists.txt
@@ -0,0 +1 @@
+build_lualib(libsandwich libsandwich.c)
diff --git a/test/gh-4427-ffi-sandwich/libsandwich.c b/test/gh-4427-ffi-sandwich/libsandwich.c
new file mode 100644
index 0000000..2a24fb2
--- /dev/null
+++ b/test/gh-4427-ffi-sandwich/libsandwich.c
@@ -0,0 +1,59 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+struct sandwich {
+	lua_State *L; /* Coroutine saved for a Lua call */
+	int ref;      /* Anchor to the Lua function to be run */
+	int trigger;  /* Trigger for switching to Lua call */
+};
+
+int ipp(struct sandwich *state, int i)
+{
+	if (i < state->trigger)
+		return i + 1;
+
+	/* Sandwich is triggered and Lua ipp function is called */
+	lua_pushnumber(state->L, state->ref);
+	lua_gettable(state->L, LUA_REGISTRYINDEX);
+	lua_pushnumber(state->L, i);
+	lua_call(state->L, 1, 1);
+	return lua_tonumber(state->L, -1);
+}
+
+#define STRUCT_SANDWICH_MT "struct sandwich"
+
+static int init(lua_State *L)
+{
+	struct sandwich *state = lua_newuserdata(L, sizeof(struct sandwich));
+
+	luaL_getmetatable(L, STRUCT_SANDWICH_MT);
+	lua_setmetatable(L, -2);
+
+	/* Lua ipp function to be called when sandwich is triggered */
+	if (luaL_dostring(L, "return function(i) return i + 1 end"))
+		luaL_error(L, "failed to translate Lua ipp function");
+
+	state->ref = luaL_ref(L, LUA_REGISTRYINDEX);
+	state->L = L;
+	state->trigger = lua_tonumber(L, 1);
+	return 1;
+}
+
+static int fin(lua_State *L)
+{
+	struct sandwich *state = luaL_checkudata(L, 1, STRUCT_SANDWICH_MT);
+
+	/* Release the anchored ipp function */
+	luaL_unref(L, LUA_REGISTRYINDEX, state->ref);
+	return 0;
+}
+
+LUA_API int luaopen_libsandwich(lua_State *L)
+{
+	luaL_newmetatable(L, STRUCT_SANDWICH_MT);
+	lua_pushcfunction(L, fin);
+	lua_setfield(L, -2, "__gc");
+
+	lua_pushcfunction(L, init);
+	return 1;
+}
diff --git a/test/gh-4427-ffi-sandwich/test.lua b/test/gh-4427-ffi-sandwich/test.lua
new file mode 100644
index 0000000..3ccaee5
--- /dev/null
+++ b/test/gh-4427-ffi-sandwich/test.lua
@@ -0,0 +1,26 @@
+local cfg = {
+  hotloop = arg[1] or 1,
+  trigger = arg[2] or 1,
+}
+
+local ffi = require('ffi')
+local ffisandwich = ffi.load('libsandwich')
+ffi.cdef('int ipp(struct sandwich *state, int i)')
+
+-- Save the current coroutine and set the value to trigger ipp
+-- call the Lua routine instead of pure C implementation.
+local sandwich = require('libsandwich')(cfg.trigger)
+
+-- Depending on trigger and hotloop values the following contexts
+-- are possible:
+-- * if trigger <= hotloop -> trace recording is aborted
+-- * if trigger >  hotloop -> trace is recorded but execution
+--   leads to panic
+jit.opt.start("3", string.format("hotloop=%d", cfg.hotloop))
+
+local res
+for i = 0, cfg.trigger + cfg.hotloop do
+  res = ffisandwich.ipp(sandwich, i)
+end
+-- Check the resulting value if panic didn't occur earlier.
+print(res)
-- 
2.25.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change
  2020-03-27 10:47 [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change Igor Munkin
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
@ 2020-03-27 10:47 ` Igor Munkin
  2020-03-28 19:36   ` Sergey Ostanevich
  2020-04-02 23:41 ` [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or " Vladislav Shpilevoy
  2 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-03-27 10:47 UTC (permalink / raw)
  To: Sergey Ostanevich, Vladislav Shpilevoy, Kirill Yukhin; +Cc: tarantool-patches

Current luaJIT_setmode implementation aborts trace recording but nothing
prevents calling it on already compiled trace. E.g. if one conditionally
calls an FFI function having luaJIT_setmode with LUAJIT_MODE_FLUSH mode
underneath, the trace being executed can be purged and the return
address is invalidated as a result (since the mcode is released).

This changeset prohibits luaJIT_setmode call while mcode is being
executed leading to platform panic if the call occurs.

Signed-off-by: Igor Munkin <imun@tarantool.org>
---
 src/lj_dispatch.c                     |  5 +++++
 src/lj_errmsg.h                       |  1 +
 test/lj-flush-on-trace/CMakeLists.txt |  1 +
 test/lj-flush-on-trace/libflush.c     | 31 +++++++++++++++++++++++++++
 test/lj-flush-on-trace/test.lua       | 25 +++++++++++++++++++++
 5 files changed, 63 insertions(+)
 create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
 create mode 100644 test/lj-flush-on-trace/libflush.c
 create mode 100644 test/lj-flush-on-trace/test.lua

diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
index 5d6795f..b3448c8 100644
--- a/src/lj_dispatch.c
+++ b/src/lj_dispatch.c
@@ -240,6 +240,11 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
 {
   global_State *g = G(L);
   int mm = mode & LUAJIT_MODE_MASK;
+  if (tvref(g->jit_base)) {
+    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITMODE));
+    if (g->panic) g->panic(L);
+    exit(EXIT_FAILURE);
+  }
   lj_trace_abort(g);  /* Abort recording on any state change. */
   /* Avoid pulling the rug from under our own feet. */
   if ((g->hookmask & HOOK_GC))
diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
index 1580385..de7b867 100644
--- a/src/lj_errmsg.h
+++ b/src/lj_errmsg.h
@@ -113,6 +113,7 @@ ERRDEF(NOJIT,	"JIT compiler permanently disabled by build option")
 #endif
 ERRDEF(JITOPT,	"unknown or malformed optimization flag " LUA_QS)
 ERRDEF(JITCALL,	"Lua VM re-entrancy is detected while executing the trace")
+ERRDEF(JITMODE,	"JIT mode change is detected while executing the trace")
 
 /* Lexer/parser errors. */
 ERRDEF(XMODE,	"attempt to load chunk with wrong mode")
diff --git a/test/lj-flush-on-trace/CMakeLists.txt b/test/lj-flush-on-trace/CMakeLists.txt
new file mode 100644
index 0000000..a90452d
--- /dev/null
+++ b/test/lj-flush-on-trace/CMakeLists.txt
@@ -0,0 +1 @@
+build_lualib(libflush libflush.c)
diff --git a/test/lj-flush-on-trace/libflush.c b/test/lj-flush-on-trace/libflush.c
new file mode 100644
index 0000000..177409a
--- /dev/null
+++ b/test/lj-flush-on-trace/libflush.c
@@ -0,0 +1,31 @@
+#include <lua.h>
+#include <luajit.h>
+
+struct flush {
+	lua_State *L; /* Coroutine saved to change JIT mode */
+	int trigger;  /* Trigger for flushing all traces */
+};
+
+void flush(struct flush *state, int i)
+{
+	if (i < state->trigger)
+		return;
+
+	/* Trace flushing is triggered */
+	(void)luaJIT_setmode(state->L, 0, LUAJIT_MODE_ENGINE|LUAJIT_MODE_FLUSH);
+}
+
+static int init(lua_State *L)
+{
+	struct flush *state = lua_newuserdata(L, sizeof(struct flush));
+
+	state->L = L;
+	state->trigger = lua_tonumber(L, 1);
+	return 1;
+}
+
+LUA_API int luaopen_libflush(lua_State *L)
+{
+	lua_pushcfunction(L, init);
+	return 1;
+}
diff --git a/test/lj-flush-on-trace/test.lua b/test/lj-flush-on-trace/test.lua
new file mode 100644
index 0000000..ff6e0b6
--- /dev/null
+++ b/test/lj-flush-on-trace/test.lua
@@ -0,0 +1,25 @@
+local cfg = {
+  hotloop = arg[1] or 1,
+  trigger = arg[2] or 1,
+}
+
+local ffi = require('ffi')
+local ffiflush = ffi.load('libflush')
+ffi.cdef('void flush(struct flush *state, int i)')
+
+-- Save the current coroutine and set the value to trigger ipp
+-- call the Lua routine instead of pure C implementation.
+local flush = require('libflush')(cfg.trigger)
+
+-- Depending on trigger and hotloop values the following contexts
+-- are possible:
+-- * if trigger <= hotloop -> trace recording is aborted
+-- * if trigger >  hotloop -> trace is recorded but execution
+--   leads to panic
+jit.opt.start("3", string.format("hotloop=%d", cfg.hotloop))
+
+for i = 0, cfg.trigger + cfg.hotloop do
+  ffiflush.flush(flush, i)
+end
+-- Panic didn't occur earlier.
+print('OK')
-- 
2.25.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
@ 2020-03-28 16:33   ` Sergey Ostanevich
  2020-03-28 20:30     ` Igor Munkin
  2020-04-02 23:41   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 20+ messages in thread
From: Sergey Ostanevich @ 2020-03-28 16:33 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

Hi!

Thanks for the patch, see below.
On 27 мар 13:47, Igor Munkin wrote:
> JIT recording semantics assumes FFI calls are atomic regarding the
> LuaJIT VM: if the execution exited Lua world through FFI it is not
> re-entering Lua world again.
> 
> However, there are ways to break this atomicity via FFI: one can
> re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> a result the following stack mix is created:
> | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> against eating it.
> 
> E.g. there is a trigger machinery in Tarantool calling Lua functions
> from C space. As a result of FFI function call with a trigger underneath
> the "FFI sandwich" is created.
> 
> This changeset introduces the mechanism for Lua-C API callbacks similar
> to the one existing for Lua-FFI: trace recording is aborted whe the
                                                              when
> execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> leads to platform panic.
Could you elaborate here in comment on this a little: it means this 
particular Lua call to be aborted, fail for the fiber Lua state, or for 
the whole Tarantool?

> 
> Relates to tarantool/tarantool#4427
> 
> Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
>  src/lj_api.c                             | 35 ++++++++++----
>  src/lj_errmsg.h                          |  1 +
>  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
>  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
>  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
>  5 files changed, 113 insertions(+), 9 deletions(-)
>  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
>  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
>  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> 
> diff --git a/src/lj_api.c b/src/lj_api.c
> index a5e2465..c1f53e0 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
>    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
>  }
>  
> +static void call(lua_State *L, TValue *base, int nres) {
I would recommend to name the function in a more descriptive way, like
sane_call. 

> +  global_State *g = G(L);
> +  if (tvref(g->jit_base)) {
> +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> +    if (g->panic) g->panic(L);
> +    exit(EXIT_FAILURE);
> +  }
> +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> +  lj_vm_call(L, base, nres);
> +}
> +
>  /* -- Miscellaneous API functions ----------------------------------------- */
>  
>  LUA_API int lua_status(lua_State *L)

Also I have a question on perf impact - have you measured any
benchmarks - Lua only, or as part of Tarantool?

Regards,
Sergos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change Igor Munkin
@ 2020-03-28 19:36   ` Sergey Ostanevich
  2020-03-29 10:46     ` Igor Munkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Ostanevich @ 2020-03-28 19:36 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

Hi!

Thanks for the patch, LGTM.

Sergos.

On 27 мар 13:47, Igor Munkin wrote:
> Current luaJIT_setmode implementation aborts trace recording but nothing
> prevents calling it on already compiled trace. E.g. if one conditionally
> calls an FFI function having luaJIT_setmode with LUAJIT_MODE_FLUSH mode
> underneath, the trace being executed can be purged and the return
> address is invalidated as a result (since the mcode is released).
> 
> This changeset prohibits luaJIT_setmode call while mcode is being
> executed leading to platform panic if the call occurs.
> 
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
>  src/lj_dispatch.c                     |  5 +++++
>  src/lj_errmsg.h                       |  1 +
>  test/lj-flush-on-trace/CMakeLists.txt |  1 +
>  test/lj-flush-on-trace/libflush.c     | 31 +++++++++++++++++++++++++++
>  test/lj-flush-on-trace/test.lua       | 25 +++++++++++++++++++++
>  5 files changed, 63 insertions(+)
>  create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
>  create mode 100644 test/lj-flush-on-trace/libflush.c
>  create mode 100644 test/lj-flush-on-trace/test.lua
> 
> diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
> index 5d6795f..b3448c8 100644
> --- a/src/lj_dispatch.c
> +++ b/src/lj_dispatch.c
> @@ -240,6 +240,11 @@ int luaJIT_setmode(lua_State *L, int idx, int mode)
>  {
>    global_State *g = G(L);
>    int mm = mode & LUAJIT_MODE_MASK;
> +  if (tvref(g->jit_base)) {
> +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITMODE));
> +    if (g->panic) g->panic(L);
> +    exit(EXIT_FAILURE);
> +  }
>    lj_trace_abort(g);  /* Abort recording on any state change. */
>    /* Avoid pulling the rug from under our own feet. */
>    if ((g->hookmask & HOOK_GC))
> diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
> index 1580385..de7b867 100644
> --- a/src/lj_errmsg.h
> +++ b/src/lj_errmsg.h
> @@ -113,6 +113,7 @@ ERRDEF(NOJIT,	"JIT compiler permanently disabled by build option")
>  #endif
>  ERRDEF(JITOPT,	"unknown or malformed optimization flag " LUA_QS)
>  ERRDEF(JITCALL,	"Lua VM re-entrancy is detected while executing the trace")
> +ERRDEF(JITMODE,	"JIT mode change is detected while executing the trace")
>  
>  /* Lexer/parser errors. */
>  ERRDEF(XMODE,	"attempt to load chunk with wrong mode")
> diff --git a/test/lj-flush-on-trace/CMakeLists.txt b/test/lj-flush-on-trace/CMakeLists.txt
> new file mode 100644
> index 0000000..a90452d
> --- /dev/null
> +++ b/test/lj-flush-on-trace/CMakeLists.txt
> @@ -0,0 +1 @@
> +build_lualib(libflush libflush.c)
> diff --git a/test/lj-flush-on-trace/libflush.c b/test/lj-flush-on-trace/libflush.c
> new file mode 100644
> index 0000000..177409a
> --- /dev/null
> +++ b/test/lj-flush-on-trace/libflush.c
> @@ -0,0 +1,31 @@
> +#include <lua.h>
> +#include <luajit.h>
> +
> +struct flush {
> +	lua_State *L; /* Coroutine saved to change JIT mode */
> +	int trigger;  /* Trigger for flushing all traces */
> +};
> +
> +void flush(struct flush *state, int i)
> +{
> +	if (i < state->trigger)
> +		return;
> +
> +	/* Trace flushing is triggered */
> +	(void)luaJIT_setmode(state->L, 0, LUAJIT_MODE_ENGINE|LUAJIT_MODE_FLUSH);
> +}
> +
> +static int init(lua_State *L)
> +{
> +	struct flush *state = lua_newuserdata(L, sizeof(struct flush));
> +
> +	state->L = L;
> +	state->trigger = lua_tonumber(L, 1);
> +	return 1;
> +}
> +
> +LUA_API int luaopen_libflush(lua_State *L)
> +{
> +	lua_pushcfunction(L, init);
> +	return 1;
> +}
> diff --git a/test/lj-flush-on-trace/test.lua b/test/lj-flush-on-trace/test.lua
> new file mode 100644
> index 0000000..ff6e0b6
> --- /dev/null
> +++ b/test/lj-flush-on-trace/test.lua
> @@ -0,0 +1,25 @@
> +local cfg = {
> +  hotloop = arg[1] or 1,
> +  trigger = arg[2] or 1,
> +}
> +
> +local ffi = require('ffi')
> +local ffiflush = ffi.load('libflush')
> +ffi.cdef('void flush(struct flush *state, int i)')
> +
> +-- Save the current coroutine and set the value to trigger ipp
> +-- call the Lua routine instead of pure C implementation.
> +local flush = require('libflush')(cfg.trigger)
> +
> +-- Depending on trigger and hotloop values the following contexts
> +-- are possible:
> +-- * if trigger <= hotloop -> trace recording is aborted
> +-- * if trigger >  hotloop -> trace is recorded but execution
> +--   leads to panic
> +jit.opt.start("3", string.format("hotloop=%d", cfg.hotloop))
> +
> +for i = 0, cfg.trigger + cfg.hotloop do
> +  ffiflush.flush(flush, i)
> +end
> +-- Panic didn't occur earlier.
> +print('OK')
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-28 16:33   ` Sergey Ostanevich
@ 2020-03-28 20:30     ` Igor Munkin
  2020-03-29  9:21       ` Sergey Ostanevich
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-03-28 20:30 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergos,

Thanks for you review!

On 28.03.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch, see below.
> On 27 мар 13:47, Igor Munkin wrote:
> > JIT recording semantics assumes FFI calls are atomic regarding the
> > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > re-entering Lua world again.
> > 
> > However, there are ways to break this atomicity via FFI: one can
> > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > a result the following stack mix is created:
> > | Lua-FFI -> С routine -> Lua-C API -> Lua VM

Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.

> > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > against eating it.
> > 
> > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > from C space. As a result of FFI function call with a trigger underneath
> > the "FFI sandwich" is created.
> > 
> > This changeset introduces the mechanism for Lua-C API callbacks similar
> > to the one existing for Lua-FFI: trace recording is aborted whe the
>                                                               when

Thanks, fixed.

> > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > leads to platform panic.
> Could you elaborate here in comment on this a little: it means this 
> particular Lua call to be aborted, fail for the fiber Lua state, or for 
> the whole Tarantool?
> 

The whole Tarantool finishes its execution with EXIT_FAILURE code and
calls tarantool_panic_handler routine prior to the exit. Should I add
this remark into commit message?

> > 
> > Relates to tarantool/tarantool#4427
> > 
> > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > ---
> >  src/lj_api.c                             | 35 ++++++++++----
> >  src/lj_errmsg.h                          |  1 +
> >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> >  5 files changed, 113 insertions(+), 9 deletions(-)
> >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > 
> > diff --git a/src/lj_api.c b/src/lj_api.c
> > index a5e2465..c1f53e0 100644
> > --- a/src/lj_api.c
> > +++ b/src/lj_api.c
> > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> >  }
> >  
> > +static void call(lua_State *L, TValue *base, int nres) {
> I would recommend to name the function in a more descriptive way, like
> sane_call. 

IMHO no_trace_call / no_jit_call looks good to me, but feel free to
propose the naming you prefer more.

> 
> > +  global_State *g = G(L);
> > +  if (tvref(g->jit_base)) {
> > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > +    if (g->panic) g->panic(L);
> > +    exit(EXIT_FAILURE);
> > +  }
> > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > +  lj_vm_call(L, base, nres);
> > +}
> > +
> >  /* -- Miscellaneous API functions ----------------------------------------- */
> >  
> >  LUA_API int lua_status(lua_State *L)
> 
> Also I have a question on perf impact - have you measured any
> benchmarks - Lua only, or as part of Tarantool

I compared two sysbench logs ([1] and [2]) via vimdiff and see no
critical difference.

> 
> Regards,
> Sergos

[1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
[2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-28 20:30     ` Igor Munkin
@ 2020-03-29  9:21       ` Sergey Ostanevich
  2020-03-29 10:45         ` Igor Munkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Ostanevich @ 2020-03-29  9:21 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

On 28 мар 23:30, Igor Munkin wrote:
> Sergos,
> 
> Thanks for you review!
> 
> On 28.03.20, Sergey Ostanevich wrote:
> > Hi!
> > 
> > Thanks for the patch, see below.
> > On 27 мар 13:47, Igor Munkin wrote:
> > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > re-entering Lua world again.
> > > 
> > > However, there are ways to break this atomicity via FFI: one can
> > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > a result the following stack mix is created:
> > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> 
> Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> 
> > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > against eating it.
> > > 
> > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > from C space. As a result of FFI function call with a trigger underneath
> > > the "FFI sandwich" is created.
> > > 
> > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > to the one existing for Lua-FFI: trace recording is aborted whe the
> >                                                               when
> 
> Thanks, fixed.
> 
> > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > leads to platform panic.
> > Could you elaborate here in comment on this a little: it means this 
> > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > the whole Tarantool?
> > 
> 
> The whole Tarantool finishes its execution with EXIT_FAILURE code and
> calls tarantool_panic_handler routine prior to the exit. Should I add
> this remark into commit message?
I would like to see it in commit message.

> 
> > > 
> > > Relates to tarantool/tarantool#4427
> > > 
> > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > ---
> > >  src/lj_api.c                             | 35 ++++++++++----
> > >  src/lj_errmsg.h                          |  1 +
> > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > 
> > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > index a5e2465..c1f53e0 100644
> > > --- a/src/lj_api.c
> > > +++ b/src/lj_api.c
> > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > >  }
> > >  
> > > +static void call(lua_State *L, TValue *base, int nres) {
> > I would recommend to name the function in a more descriptive way, like
> > sane_call. 
> 
> IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> propose the naming you prefer more.
If you want to mention that this call aborts the trace generation,
then I would put it like 'call_w_trace_abort'
But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
Although, safety resulting in a panic sounds paradoxical. 

BTW, should it be under 'LJ_HASFFI' and keep the original call
otherwise?

> 
> > 
> > > +  global_State *g = G(L);
> > > +  if (tvref(g->jit_base)) {
> > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > +    if (g->panic) g->panic(L);
> > > +    exit(EXIT_FAILURE);
> > > +  }
> > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > +  lj_vm_call(L, base, nres);
> > > +}
> > > +
> > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > >  
> > >  LUA_API int lua_status(lua_State *L)
> > 
> > Also I have a question on perf impact - have you measured any
> > benchmarks - Lua only, or as part of Tarantool
> 
> I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> critical difference.
> 
> > 
> > Regards,
> > Sergos
> 
> [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> 
> -- 
> Best regards,
> IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-29  9:21       ` Sergey Ostanevich
@ 2020-03-29 10:45         ` Igor Munkin
  2020-03-30  8:58           ` Sergey Ostanevich
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-03-29 10:45 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergos,

On 29.03.20, Sergey Ostanevich wrote:
> On 28 мар 23:30, Igor Munkin wrote:
> > Sergos,
> > 
> > Thanks for you review!
> > 
> > On 28.03.20, Sergey Ostanevich wrote:
> > > Hi!
> > > 
> > > Thanks for the patch, see below.
> > > On 27 мар 13:47, Igor Munkin wrote:
> > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > re-entering Lua world again.
> > > > 
> > > > However, there are ways to break this atomicity via FFI: one can
> > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > a result the following stack mix is created:
> > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > 
> > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > 
> > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > against eating it.
> > > > 
> > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > from C space. As a result of FFI function call with a trigger underneath
> > > > the "FFI sandwich" is created.
> > > > 
> > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > >                                                               when
> > 
> > Thanks, fixed.
> > 
> > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > leads to platform panic.
> > > Could you elaborate here in comment on this a little: it means this 
> > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > the whole Tarantool?
> > > 
> > 
> > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > calls tarantool_panic_handler routine prior to the exit. Should I add
> > this remark into commit message?
> I would like to see it in commit message.

Added. Here is a new commit message:

================================================================================

jit: abort trace recording and execution for C API

JIT recording semantics assumes FFI calls are atomic regarding the
LuaJIT VM: if the execution exited Lua world through FFI it is not
re-entering Lua world again.

However, there are ways to break this atomicity via FFI: one can
re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
a result the following stack mix is created:
| Lua-FFI -> C routine -> Lua-C API -> Lua VM
This sort of re-entrancy is explicitly not supported by LuaJIT compiler
machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
against eating it.

E.g. there is a trigger machinery in Tarantool calling Lua functions
from C space. As a result of FFI function call with a trigger underneath
the "FFI sandwich" is created.

This changeset introduces the mechanism for Lua-C API callbacks similar
to the one existing for Lua-FFI: trace recording is aborted when the
execution re-enters LuaJIT VM. If re-entrancy is detected while running
mcode the platform finishes its execution with EXIT_FAILURE code and
calls panic routine prior to the exit.

Relates to tarantool/tarantool#4427

================================================================================

> 
> > 
> > > > 
> > > > Relates to tarantool/tarantool#4427
> > > > 
> > > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > > ---
> > > >  src/lj_api.c                             | 35 ++++++++++----
> > > >  src/lj_errmsg.h                          |  1 +
> > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > 
> > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > index a5e2465..c1f53e0 100644
> > > > --- a/src/lj_api.c
> > > > +++ b/src/lj_api.c
> > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > >  }
> > > >  
> > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > I would recommend to name the function in a more descriptive way, like
> > > sane_call. 
> > 
> > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > propose the naming you prefer more.
> If you want to mention that this call aborts the trace generation,
> then I would put it like 'call_w_trace_abort'

It not only aborts the trace recording, but prohibits further execution
for already compiled code.

> But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?

The changes are not related to FFI at all. FFI helps to achieve the
platform failure, but the problem relates to JIT machinery. So this name
doesn't clarify function semantics.

> Although, safety resulting in a panic sounds paradoxical. 

Totally agree.

> 
> BTW, should it be under 'LJ_HASFFI' and keep the original call
> otherwise?

I see no benefits for it. Again, the fix doesn't relates to FFI
itself. It's just a particular way leading to JIT inconsistency.

Feel free to correct me, if I'm wrong.

> 
> > 
> > > 
> > > > +  global_State *g = G(L);
> > > > +  if (tvref(g->jit_base)) {
> > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > +    if (g->panic) g->panic(L);
> > > > +    exit(EXIT_FAILURE);
> > > > +  }
> > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > +  lj_vm_call(L, base, nres);
> > > > +}
> > > > +
> > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > >  
> > > >  LUA_API int lua_status(lua_State *L)
> > > 
> > > Also I have a question on perf impact - have you measured any
> > > benchmarks - Lua only, or as part of Tarantool
> > 
> > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > critical difference.
> > 
> > > 
> > > Regards,
> > > Sergos
> > 
> > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > 
> > -- 
> > Best regards,
> > IM

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change
  2020-03-28 19:36   ` Sergey Ostanevich
@ 2020-03-29 10:46     ` Igor Munkin
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Munkin @ 2020-03-29 10:46 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergos,

Thanks for you review!

On 28.03.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch, LGTM.
> 
> Sergos.
> 
> On 27 мар 13:47, Igor Munkin wrote:
> > Current luaJIT_setmode implementation aborts trace recording but nothing
> > prevents calling it on already compiled trace. E.g. if one conditionally
> > calls an FFI function having luaJIT_setmode with LUAJIT_MODE_FLUSH mode
> > underneath, the trace being executed can be purged and the return
> > address is invalidated as a result (since the mcode is released).
> > 
> > This changeset prohibits luaJIT_setmode call while mcode is being
> > executed leading to platform panic if the call occurs.

I reworded this section considering your comment about platform panic:

================================================================================

This changeset prohibits luaJIT_setmode call while mcode is being
executed. If the call occurs the platform finishes its execution with
EXIT_FAILURE code and calls panic routine prior to the exit.

================================================================================

> > 

Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>

> > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > ---
> >  src/lj_dispatch.c                     |  5 +++++
> >  src/lj_errmsg.h                       |  1 +
> >  test/lj-flush-on-trace/CMakeLists.txt |  1 +
> >  test/lj-flush-on-trace/libflush.c     | 31 +++++++++++++++++++++++++++
> >  test/lj-flush-on-trace/test.lua       | 25 +++++++++++++++++++++
> >  5 files changed, 63 insertions(+)
> >  create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
> >  create mode 100644 test/lj-flush-on-trace/libflush.c
> >  create mode 100644 test/lj-flush-on-trace/test.lua
> > 

<snipped>

> > -- 
> > 2.25.0
> > 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-29 10:45         ` Igor Munkin
@ 2020-03-30  8:58           ` Sergey Ostanevich
  2020-03-30 14:25             ` Igor Munkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Ostanevich @ 2020-03-30  8:58 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

On 29 мар 13:45, Igor Munkin wrote:
> Sergos,
> 
> On 29.03.20, Sergey Ostanevich wrote:
> > On 28 мар 23:30, Igor Munkin wrote:
> > > Sergos,
> > > 
> > > Thanks for you review!
> > > 
> > > On 28.03.20, Sergey Ostanevich wrote:
> > > > Hi!
> > > > 
> > > > Thanks for the patch, see below.
> > > > On 27 мар 13:47, Igor Munkin wrote:
> > > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > > re-entering Lua world again.
> > > > > 
> > > > > However, there are ways to break this atomicity via FFI: one can
> > > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > > a result the following stack mix is created:
> > > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > > 
> > > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > > 
> > > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > > against eating it.
> > > > > 
> > > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > > from C space. As a result of FFI function call with a trigger underneath
> > > > > the "FFI sandwich" is created.
> > > > > 
> > > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > > >                                                               when
> > > 
> > > Thanks, fixed.
> > > 
> > > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > > leads to platform panic.
> > > > Could you elaborate here in comment on this a little: it means this 
> > > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > > the whole Tarantool?
> > > > 
> > > 
> > > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > > calls tarantool_panic_handler routine prior to the exit. Should I add
> > > this remark into commit message?
> > I would like to see it in commit message.
> 
> Added. Here is a new commit message:
> 
> ================================================================================
> 
> jit: abort trace recording and execution for C API
> 
> JIT recording semantics assumes FFI calls are atomic regarding the
> LuaJIT VM: if the execution exited Lua world through FFI it is not
> re-entering Lua world again.
> 
> However, there are ways to break this atomicity via FFI: one can
> re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> a result the following stack mix is created:
> | Lua-FFI -> C routine -> Lua-C API -> Lua VM
> This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> against eating it.
> 
> E.g. there is a trigger machinery in Tarantool calling Lua functions
> from C space. As a result of FFI function call with a trigger underneath
> the "FFI sandwich" is created.
> 
> This changeset introduces the mechanism for Lua-C API callbacks similar
> to the one existing for Lua-FFI: trace recording is aborted when the
> execution re-enters LuaJIT VM. If re-entrancy is detected while running
> mcode the platform finishes its execution with EXIT_FAILURE code and
> calls panic routine prior to the exit.
> 
> Relates to tarantool/tarantool#4427
> 
> ================================================================================

LGTM.

> 
> > 
> > > 
> > > > > 
> > > > > Relates to tarantool/tarantool#4427
> > > > > 
> > > > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > > > ---
> > > > >  src/lj_api.c                             | 35 ++++++++++----
> > > > >  src/lj_errmsg.h                          |  1 +
> > > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > > 
> > > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > > index a5e2465..c1f53e0 100644
> > > > > --- a/src/lj_api.c
> > > > > +++ b/src/lj_api.c
> > > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > > >  }
> > > > >  
> > > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > > I would recommend to name the function in a more descriptive way, like
> > > > sane_call. 
> > > 
> > > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > > propose the naming you prefer more.
> > If you want to mention that this call aborts the trace generation,
> > then I would put it like 'call_w_trace_abort'
> 
> It not only aborts the trace recording, but prohibits further execution
> for already compiled code.
> 
> > But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
> 
> The changes are not related to FFI at all. FFI helps to achieve the
> platform failure, but the problem relates to JIT machinery. So this name
> doesn't clarify function semantics.
> 
Can we sayn that this machinery tracks the machinery consistency? Then
the first name comes to my mind is consistent_call with in the code
comment on what consistency is - abort trace collection and inconsistent
trace excution.

> > Although, safety resulting in a panic sounds paradoxical. 
> 
> Totally agree.
> 
> > 
> > BTW, should it be under 'LJ_HASFFI' and keep the original call
> > otherwise?
> 
> I see no benefits for it. Again, the fix doesn't relates to FFI
> itself. It's just a particular way leading to JIT inconsistency.
> 
> Feel free to correct me, if I'm wrong.
> 
> > 
> > > 
> > > > 
> > > > > +  global_State *g = G(L);
> > > > > +  if (tvref(g->jit_base)) {
> > > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > > +    if (g->panic) g->panic(L);
> > > > > +    exit(EXIT_FAILURE);
> > > > > +  }
> > > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > > +  lj_vm_call(L, base, nres);
> > > > > +}
> > > > > +
> > > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > > >  
> > > > >  LUA_API int lua_status(lua_State *L)
> > > > 
> > > > Also I have a question on perf impact - have you measured any
> > > > benchmarks - Lua only, or as part of Tarantool
> > > 
> > > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > > critical difference.
> > > 
> > > > 
> > > > Regards,
> > > > Sergos
> > > 
> > > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > > 
> > > -- 
> > > Best regards,
> > > IM
> 
> -- 
> Best regards,
> IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-30  8:58           ` Sergey Ostanevich
@ 2020-03-30 14:25             ` Igor Munkin
  2020-04-03 21:06               ` Sergey Ostanevich
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-03-30 14:25 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergos,

On 30.03.20, Sergey Ostanevich wrote:
> On 29 мар 13:45, Igor Munkin wrote:
> > Sergos,
> > 
> > On 29.03.20, Sergey Ostanevich wrote:
> > > On 28 мар 23:30, Igor Munkin wrote:
> > > > Sergos,
> > > > 
> > > > Thanks for you review!
> > > > 
> > > > On 28.03.20, Sergey Ostanevich wrote:
> > > > > Hi!
> > > > > 
> > > > > Thanks for the patch, see below.
> > > > > On 27 мар 13:47, Igor Munkin wrote:
> > > > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > > > re-entering Lua world again.
> > > > > > 
> > > > > > However, there are ways to break this atomicity via FFI: one can
> > > > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > > > a result the following stack mix is created:
> > > > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > > > 
> > > > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > > > 
> > > > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > > > against eating it.
> > > > > > 
> > > > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > > > from C space. As a result of FFI function call with a trigger underneath
> > > > > > the "FFI sandwich" is created.
> > > > > > 
> > > > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > > > >                                                               when
> > > > 
> > > > Thanks, fixed.
> > > > 
> > > > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > > > leads to platform panic.
> > > > > Could you elaborate here in comment on this a little: it means this 
> > > > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > > > the whole Tarantool?
> > > > > 
> > > > 
> > > > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > > > calls tarantool_panic_handler routine prior to the exit. Should I add
> > > > this remark into commit message?
> > > I would like to see it in commit message.
> > 
> > Added. Here is a new commit message:
> > 
> > ================================================================================
> > 
> > jit: abort trace recording and execution for C API
> > 
> > JIT recording semantics assumes FFI calls are atomic regarding the
> > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > re-entering Lua world again.
> > 
> > However, there are ways to break this atomicity via FFI: one can
> > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > a result the following stack mix is created:
> > | Lua-FFI -> C routine -> Lua-C API -> Lua VM
> > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > against eating it.
> > 
> > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > from C space. As a result of FFI function call with a trigger underneath
> > the "FFI sandwich" is created.
> > 
> > This changeset introduces the mechanism for Lua-C API callbacks similar
> > to the one existing for Lua-FFI: trace recording is aborted when the
> > execution re-enters LuaJIT VM. If re-entrancy is detected while running
> > mcode the platform finishes its execution with EXIT_FAILURE code and
> > calls panic routine prior to the exit.
> > 
> > Relates to tarantool/tarantool#4427
> > 
> > ================================================================================
> 
> LGTM.
> 
> > 
> > > 
> > > > 
> > > > > > 
> > > > > > Relates to tarantool/tarantool#4427
> > > > > > 
> > > > > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > > > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > > > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > > > > ---
> > > > > >  src/lj_api.c                             | 35 ++++++++++----
> > > > > >  src/lj_errmsg.h                          |  1 +
> > > > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > > > 
> > > > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > > > index a5e2465..c1f53e0 100644
> > > > > > --- a/src/lj_api.c
> > > > > > +++ b/src/lj_api.c
> > > > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > > > >  }
> > > > > >  
> > > > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > > > I would recommend to name the function in a more descriptive way, like
> > > > > sane_call. 
> > > > 
> > > > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > > > propose the naming you prefer more.
> > > If you want to mention that this call aborts the trace generation,
> > > then I would put it like 'call_w_trace_abort'
> > 
> > It not only aborts the trace recording, but prohibits further execution
> > for already compiled code.
> > 
> > > But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
> > 
> > The changes are not related to FFI at all. FFI helps to achieve the
> > platform failure, but the problem relates to JIT machinery. So this name
> > doesn't clarify function semantics.
> > 
> Can we sayn that this machinery tracks the machinery consistency? Then
> the first name comes to my mind is consistent_call with in the code
> comment on what consistency is - abort trace collection and inconsistent
> trace excution.

As a result of discussion, we stopped on the jit_secure_call name. The
diff is below:

================================================================================

diff --git a/src/lj_api.c b/src/lj_api.c
index c1f53e0..4ff8ccf 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -76,8 +76,9 @@ static GCtab *getcurrenv(lua_State *L)
   return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
 }
 
-static void call(lua_State *L, TValue *base, int nres) {
+static void jit_secure_call(lua_State *L, TValue *base, int nres) {
   global_State *g = G(L);
+  /* Forbid Lua world re-entrancy while running the trace */
   if (tvref(g->jit_base)) {
     setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
     if (g->panic) g->panic(L);
@@ -311,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
       return (int)(uintptr_t)base;
     } else {
       L->top = base+2;
-      call(L, base, 1+1);
+      jit_secure_call(L, base, 1+1);
       L->top -= 2+LJ_FR2;
       return tvistruecond(L->top+1+LJ_FR2);
     }
@@ -334,7 +334,7 @@ LUA_API int lua_lessthan(lua_State *L, int idx1, int idx2)
       return (int)(uintptr_t)base;
     } else {
       L->top = base+2;
-      call(L, base, 1+1);
+      jit_secure_call(L, base, 1+1);
       L->top -= 2+LJ_FR2;
       return tvistruecond(L->top+1+LJ_FR2);
     }
@@ -786,7 +786,7 @@ LUA_API void lua_concat(lua_State *L, int n)
       }
       n -= (int)(L->top - top);
       L->top = top+2;
-      call(L, top, 1+1);
+      jit_secure_call(L, top, 1+1);
       L->top -= 1+LJ_FR2;
       copyTV(L, L->top-1, L->top+LJ_FR2);
     } while (--n > 0);
@@ -806,7 +806,7 @@ LUA_API void lua_gettable(lua_State *L, int idx)
   v = lj_meta_tget(L, t, L->top-1);
   if (v == NULL) {
     L->top += 2;
-    call(L, L->top-2, 1+1);
+    jit_secure_call(L, L->top-2, 1+1);
     L->top -= 2+LJ_FR2;
     v = L->top+1+LJ_FR2;
   }
@@ -822,7 +822,7 @@ LUA_API void lua_getfield(lua_State *L, int idx, const char *k)
   v = lj_meta_tget(L, t, &key);
   if (v == NULL) {
     L->top += 2;
-    call(L, L->top-2, 1+1);
+    jit_secure_call(L, L->top-2, 1+1);
     L->top -= 2+LJ_FR2;
     v = L->top+1+LJ_FR2;
   }
@@ -977,7 +977,7 @@ LUA_API void lua_settable(lua_State *L, int idx)
     TValue *base = L->top;
     copyTV(L, base+2, base-3-2*LJ_FR2);
     L->top = base+3;
-    call(L, base, 0+1);
+    jit_secure_call(L, base, 0+1);
     L->top -= 3+LJ_FR2;
   }
 }
@@ -998,7 +998,7 @@ LUA_API void lua_setfield(lua_State *L, int idx, const char *k)
     TValue *base = L->top;
     copyTV(L, base+2, base-3-2*LJ_FR2);
     L->top = base+3;
-    call(L, base, 0+1);
+    jit_secure_call(L, base, 0+1);
     L->top -= 2+LJ_FR2;
   }
 }
@@ -1129,7 +1129,7 @@ LUA_API void lua_call(lua_State *L, int nargs, int nresults)
 {
   api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
   api_checknelems(L, nargs+1);
-  call(L, api_call_base(L, nargs), nresults+1);
+  jit_secure_call(L, api_call_base(L, nargs), nresults+1);
 }
 
 LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
@@ -1147,6 +1148,7 @@ LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
     api_checkvalidindex(L, o);
     ef = savestack(L, o);
   }
+  /* Forbid Lua world re-entrancy while running the trace */
   if (tvref(g->jit_base)) {
     setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
     if (g->panic) g->panic(L);
@@ -1189,7 +1189,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
     if (LJ_FR2) setnilV(top++);
     copyTV(L, top++, index2adr(L, idx));
     L->top = top;
-    call(L, top-1, 1+1);
+    jit_secure_call(L, top-1, 1+1);
     return 1;
   }
   return 0;

================================================================================

Squashed these changes with the original patch.

> 
> > > Although, safety resulting in a panic sounds paradoxical. 
> > 
> > Totally agree.
> > 
> > > 
> > > BTW, should it be under 'LJ_HASFFI' and keep the original call
> > > otherwise?
> > 
> > I see no benefits for it. Again, the fix doesn't relates to FFI
> > itself. It's just a particular way leading to JIT inconsistency.
> > 
> > Feel free to correct me, if I'm wrong.
> > 
> > > 
> > > > 
> > > > > 
> > > > > > +  global_State *g = G(L);
> > > > > > +  if (tvref(g->jit_base)) {
> > > > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > > > +    if (g->panic) g->panic(L);
> > > > > > +    exit(EXIT_FAILURE);
> > > > > > +  }
> > > > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > > > +  lj_vm_call(L, base, nres);
> > > > > > +}
> > > > > > +
> > > > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > > > >  
> > > > > >  LUA_API int lua_status(lua_State *L)
> > > > > 
> > > > > Also I have a question on perf impact - have you measured any
> > > > > benchmarks - Lua only, or as part of Tarantool
> > > > 
> > > > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > > > critical difference.
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Sergos
> > > > 
> > > > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > > > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > > > 
> > > > -- 
> > > > Best regards,
> > > > IM
> > 
> > -- 
> > Best regards,
> > IM

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change
  2020-03-27 10:47 [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change Igor Munkin
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change Igor Munkin
@ 2020-04-02 23:41 ` Vladislav Shpilevoy
  2020-04-03 21:32   ` Igor Munkin
  2 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-02 23:41 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich, Kirill Yukhin; +Cc: tarantool-patches

Hi! Thanks for the patch!

I am getting build errors here:

Undefined symbols for architecture x86_64:
  "__Unwind_DeleteException", referenced from:
      _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
  "__Unwind_GetCFA", referenced from:
      _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
  "__Unwind_RaiseException", referenced from:
      _lj_err_throw in libluajit.a(lj_err.o)
  "__Unwind_SetGR", referenced from:
      _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
  "__Unwind_SetIP", referenced from:
      _lj_err_unwind_dwarf in libluajit.a(lj_err.o)

I think they are not related to your patch, but
probably you know how to fix them? Did I miss a build
option? I used 'make -j'.

I saw you fixed some things requested by Sergey, but I don't
see them on the branch. Did you push the latest version? For
example, in the first commit you had a typo in the commit
message - 'whe' instead of 'when'. I still see 'whe', on github
too.

On 27/03/2020 11:47, Igor Munkin wrote:
> This series closes two issues related to the JIT machinery behaviour:
> * "FFI sandwich"(*) detection is introduced. If sandwich is detected
>   while trace recording the recording is aborted. The sandwich detected
>   while mcode execution leads to the platform panic.
> * luaJIT_setmode call is prohibited while mcode execution and leads to
>   the platform panic.
> 
> (*) The following stack mix is called FFI sandwich.
>     | Lua-FFI -> С routine -> Lua-C API -> Lua VM
>     This sort of re-entrancy is explicitly not supported by LuaJIT
>     compiler. For more info see [1].
> 
> Branch: https://github.com/tarantool/luajit/tree/imun/ffi-sandwich
> 
> [1]: https://github.com/tarantool/tarantool/issues/4427
> 
> Igor Munkin (2):
>   jit: abort trace recording and execution for C API
>   jit: abort trace execution on JIT mode change
> 
>  src/lj_api.c                             | 35 ++++++++++----
>  src/lj_dispatch.c                        |  5 ++
>  src/lj_errmsg.h                          |  2 +
>  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
>  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
>  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
>  test/lj-flush-on-trace/CMakeLists.txt    |  1 +
>  test/lj-flush-on-trace/libflush.c        | 31 +++++++++++++
>  test/lj-flush-on-trace/test.lua          | 25 ++++++++++
>  9 files changed, 176 insertions(+), 9 deletions(-)
>  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
>  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
>  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
>  create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
>  create mode 100644 test/lj-flush-on-trace/libflush.c
>  create mode 100644 test/lj-flush-on-trace/test.lua
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
  2020-03-28 16:33   ` Sergey Ostanevich
@ 2020-04-02 23:41   ` Vladislav Shpilevoy
  2020-04-04 11:55     ` Igor Munkin
  1 sibling, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-02 23:41 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich, Kirill Yukhin; +Cc: tarantool-patches

Hi! Thanks for the patch!

> diff --git a/src/lj_api.c b/src/lj_api.c
> index a5e2465..c1f53e0 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -300,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
>        return (int)(uintptr_t)base;
>      } else {
>        L->top = base+2;
> -      lj_vm_call(L, base, 1+1);
> +      call(L, base, 1+1);

These changes look dangerous to perf. So if I had operators ==
and < overloaded for an FFI struct, they are not jitted anymore,
right? Why? Does someone really yield in them? Could you measure
how does it affect perf of these calls? From what I see, these
changes basically kill FFI making it not better than Lua C. It no
longer helps JITting anything, does it?

The same for concat, and other things. Looks like an overkill. We
would need sandwich protection only for a few places. Is it possible
to avoid such drastic changes?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-03-30 14:25             ` Igor Munkin
@ 2020-04-03 21:06               ` Sergey Ostanevich
  2020-04-03 21:31                 ` Igor Munkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Ostanevich @ 2020-04-03 21:06 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

Hi!

LGTM.

Thanks.

On 30 мар 17:25, Igor Munkin wrote:
> Sergos,
> 
> On 30.03.20, Sergey Ostanevich wrote:
> > On 29 мар 13:45, Igor Munkin wrote:
> > > Sergos,
> > > 
> > > On 29.03.20, Sergey Ostanevich wrote:
> > > > On 28 мар 23:30, Igor Munkin wrote:
> > > > > Sergos,
> > > > > 
> > > > > Thanks for you review!
> > > > > 
> > > > > On 28.03.20, Sergey Ostanevich wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > Thanks for the patch, see below.
> > > > > > On 27 мар 13:47, Igor Munkin wrote:
> > > > > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > > > > re-entering Lua world again.
> > > > > > > 
> > > > > > > However, there are ways to break this atomicity via FFI: one can
> > > > > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > > > > a result the following stack mix is created:
> > > > > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > > > > 
> > > > > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > > > > 
> > > > > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > > > > against eating it.
> > > > > > > 
> > > > > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > > > > from C space. As a result of FFI function call with a trigger underneath
> > > > > > > the "FFI sandwich" is created.
> > > > > > > 
> > > > > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > > > > >                                                               when
> > > > > 
> > > > > Thanks, fixed.
> > > > > 
> > > > > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > > > > leads to platform panic.
> > > > > > Could you elaborate here in comment on this a little: it means this 
> > > > > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > > > > the whole Tarantool?
> > > > > > 
> > > > > 
> > > > > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > > > > calls tarantool_panic_handler routine prior to the exit. Should I add
> > > > > this remark into commit message?
> > > > I would like to see it in commit message.
> > > 
> > > Added. Here is a new commit message:
> > > 
> > > ================================================================================
> > > 
> > > jit: abort trace recording and execution for C API
> > > 
> > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > re-entering Lua world again.
> > > 
> > > However, there are ways to break this atomicity via FFI: one can
> > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > a result the following stack mix is created:
> > > | Lua-FFI -> C routine -> Lua-C API -> Lua VM
> > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > against eating it.
> > > 
> > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > from C space. As a result of FFI function call with a trigger underneath
> > > the "FFI sandwich" is created.
> > > 
> > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > to the one existing for Lua-FFI: trace recording is aborted when the
> > > execution re-enters LuaJIT VM. If re-entrancy is detected while running
> > > mcode the platform finishes its execution with EXIT_FAILURE code and
> > > calls panic routine prior to the exit.
> > > 
> > > Relates to tarantool/tarantool#4427
> > > 
> > > ================================================================================
> > 
> > LGTM.
> > 
> > > 
> > > > 
> > > > > 
> > > > > > > 
> > > > > > > Relates to tarantool/tarantool#4427
> > > > > > > 
> > > > > > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > > > > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > > > > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > > > > > ---
> > > > > > >  src/lj_api.c                             | 35 ++++++++++----
> > > > > > >  src/lj_errmsg.h                          |  1 +
> > > > > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > > > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > > > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > > > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > > > > 
> > > > > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > > > > index a5e2465..c1f53e0 100644
> > > > > > > --- a/src/lj_api.c
> > > > > > > +++ b/src/lj_api.c
> > > > > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > > > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > > > > >  }
> > > > > > >  
> > > > > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > > > > I would recommend to name the function in a more descriptive way, like
> > > > > > sane_call. 
> > > > > 
> > > > > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > > > > propose the naming you prefer more.
> > > > If you want to mention that this call aborts the trace generation,
> > > > then I would put it like 'call_w_trace_abort'
> > > 
> > > It not only aborts the trace recording, but prohibits further execution
> > > for already compiled code.
> > > 
> > > > But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
> > > 
> > > The changes are not related to FFI at all. FFI helps to achieve the
> > > platform failure, but the problem relates to JIT machinery. So this name
> > > doesn't clarify function semantics.
> > > 
> > Can we sayn that this machinery tracks the machinery consistency? Then
> > the first name comes to my mind is consistent_call with in the code
> > comment on what consistency is - abort trace collection and inconsistent
> > trace excution.
> 
> As a result of discussion, we stopped on the jit_secure_call name. The
> diff is below:
> 
> ================================================================================
> 
> diff --git a/src/lj_api.c b/src/lj_api.c
> index c1f53e0..4ff8ccf 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -76,8 +76,9 @@ static GCtab *getcurrenv(lua_State *L)
>    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
>  }
>  
> -static void call(lua_State *L, TValue *base, int nres) {
> +static void jit_secure_call(lua_State *L, TValue *base, int nres) {
>    global_State *g = G(L);
> +  /* Forbid Lua world re-entrancy while running the trace */
>    if (tvref(g->jit_base)) {
>      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
>      if (g->panic) g->panic(L);
> @@ -311,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
>        return (int)(uintptr_t)base;
>      } else {
>        L->top = base+2;
> -      call(L, base, 1+1);
> +      jit_secure_call(L, base, 1+1);
>        L->top -= 2+LJ_FR2;
>        return tvistruecond(L->top+1+LJ_FR2);
>      }
> @@ -334,7 +334,7 @@ LUA_API int lua_lessthan(lua_State *L, int idx1, int idx2)
>        return (int)(uintptr_t)base;
>      } else {
>        L->top = base+2;
> -      call(L, base, 1+1);
> +      jit_secure_call(L, base, 1+1);
>        L->top -= 2+LJ_FR2;
>        return tvistruecond(L->top+1+LJ_FR2);
>      }
> @@ -786,7 +786,7 @@ LUA_API void lua_concat(lua_State *L, int n)
>        }
>        n -= (int)(L->top - top);
>        L->top = top+2;
> -      call(L, top, 1+1);
> +      jit_secure_call(L, top, 1+1);
>        L->top -= 1+LJ_FR2;
>        copyTV(L, L->top-1, L->top+LJ_FR2);
>      } while (--n > 0);
> @@ -806,7 +806,7 @@ LUA_API void lua_gettable(lua_State *L, int idx)
>    v = lj_meta_tget(L, t, L->top-1);
>    if (v == NULL) {
>      L->top += 2;
> -    call(L, L->top-2, 1+1);
> +    jit_secure_call(L, L->top-2, 1+1);
>      L->top -= 2+LJ_FR2;
>      v = L->top+1+LJ_FR2;
>    }
> @@ -822,7 +822,7 @@ LUA_API void lua_getfield(lua_State *L, int idx, const char *k)
>    v = lj_meta_tget(L, t, &key);
>    if (v == NULL) {
>      L->top += 2;
> -    call(L, L->top-2, 1+1);
> +    jit_secure_call(L, L->top-2, 1+1);
>      L->top -= 2+LJ_FR2;
>      v = L->top+1+LJ_FR2;
>    }
> @@ -977,7 +977,7 @@ LUA_API void lua_settable(lua_State *L, int idx)
>      TValue *base = L->top;
>      copyTV(L, base+2, base-3-2*LJ_FR2);
>      L->top = base+3;
> -    call(L, base, 0+1);
> +    jit_secure_call(L, base, 0+1);
>      L->top -= 3+LJ_FR2;
>    }
>  }
> @@ -998,7 +998,7 @@ LUA_API void lua_setfield(lua_State *L, int idx, const char *k)
>      TValue *base = L->top;
>      copyTV(L, base+2, base-3-2*LJ_FR2);
>      L->top = base+3;
> -    call(L, base, 0+1);
> +    jit_secure_call(L, base, 0+1);
>      L->top -= 2+LJ_FR2;
>    }
>  }
> @@ -1129,7 +1129,7 @@ LUA_API void lua_call(lua_State *L, int nargs, int nresults)
>  {
>    api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
>    api_checknelems(L, nargs+1);
> -  call(L, api_call_base(L, nargs), nresults+1);
> +  jit_secure_call(L, api_call_base(L, nargs), nresults+1);
>  }
>  
>  LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
> @@ -1147,6 +1148,7 @@ LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
>      api_checkvalidindex(L, o);
>      ef = savestack(L, o);
>    }
> +  /* Forbid Lua world re-entrancy while running the trace */
>    if (tvref(g->jit_base)) {
>      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
>      if (g->panic) g->panic(L);
> @@ -1189,7 +1189,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
>      if (LJ_FR2) setnilV(top++);
>      copyTV(L, top++, index2adr(L, idx));
>      L->top = top;
> -    call(L, top-1, 1+1);
> +    jit_secure_call(L, top-1, 1+1);
>      return 1;
>    }
>    return 0;
> 
> ================================================================================
> 
> Squashed these changes with the original patch.
> 
> > 
> > > > Although, safety resulting in a panic sounds paradoxical. 
> > > 
> > > Totally agree.
> > > 
> > > > 
> > > > BTW, should it be under 'LJ_HASFFI' and keep the original call
> > > > otherwise?
> > > 
> > > I see no benefits for it. Again, the fix doesn't relates to FFI
> > > itself. It's just a particular way leading to JIT inconsistency.
> > > 
> > > Feel free to correct me, if I'm wrong.
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > +  global_State *g = G(L);
> > > > > > > +  if (tvref(g->jit_base)) {
> > > > > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > > > > +    if (g->panic) g->panic(L);
> > > > > > > +    exit(EXIT_FAILURE);
> > > > > > > +  }
> > > > > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > > > > +  lj_vm_call(L, base, nres);
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > > > > >  
> > > > > > >  LUA_API int lua_status(lua_State *L)
> > > > > > 
> > > > > > Also I have a question on perf impact - have you measured any
> > > > > > benchmarks - Lua only, or as part of Tarantool
> > > > > 
> > > > > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > > > > critical difference.
> > > > > 
> > > > > > 
> > > > > > Regards,
> > > > > > Sergos
> > > > > 
> > > > > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > > > > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > > > > 
> > > > > -- 
> > > > > Best regards,
> > > > > IM
> > > 
> > > -- 
> > > Best regards,
> > > IM
> 
> -- 
> Best regards,
> IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-04-03 21:06               ` Sergey Ostanevich
@ 2020-04-03 21:31                 ` Igor Munkin
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Munkin @ 2020-04-03 21:31 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergos,

Thanks for your review!

Squashed the fixes, added your Reviewed-by tag and updated the upstream.

On 04.04.20, Sergey Ostanevich wrote:
> Hi!
> 
> LGTM.
> 
> Thanks.
> 
> On 30 мар 17:25, Igor Munkin wrote:
> > Sergos,
> > 
> > On 30.03.20, Sergey Ostanevich wrote:
> > > On 29 мар 13:45, Igor Munkin wrote:
> > > > Sergos,
> > > > 
> > > > On 29.03.20, Sergey Ostanevich wrote:
> > > > > On 28 мар 23:30, Igor Munkin wrote:
> > > > > > Sergos,
> > > > > > 
> > > > > > Thanks for you review!
> > > > > > 
> > > > > > On 28.03.20, Sergey Ostanevich wrote:
> > > > > > > Hi!
> > > > > > > 
> > > > > > > Thanks for the patch, see below.
> > > > > > > On 27 мар 13:47, Igor Munkin wrote:
> > > > > > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > > > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > > > > > re-entering Lua world again.
> > > > > > > > 
> > > > > > > > However, there are ways to break this atomicity via FFI: one can
> > > > > > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > > > > > a result the following stack mix is created:
> > > > > > > > | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> > > > > > 
> > > > > > Fixed typo <D0><A1> (cyrillic С letter) to an ASCII C.
> > > > > > 
> > > > > > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > > > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > > > > > against eating it.
> > > > > > > > 
> > > > > > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > > > > > from C space. As a result of FFI function call with a trigger underneath
> > > > > > > > the "FFI sandwich" is created.
> > > > > > > > 
> > > > > > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > > > > > to the one existing for Lua-FFI: trace recording is aborted whe the
> > > > > > >                                                               when
> > > > > > 
> > > > > > Thanks, fixed.
> > > > > > 
> > > > > > > > execution re-enters LuaJIT VM. Sandwich detection while mcode execution
> > > > > > > > leads to platform panic.
> > > > > > > Could you elaborate here in comment on this a little: it means this 
> > > > > > > particular Lua call to be aborted, fail for the fiber Lua state, or for 
> > > > > > > the whole Tarantool?
> > > > > > > 
> > > > > > 
> > > > > > The whole Tarantool finishes its execution with EXIT_FAILURE code and
> > > > > > calls tarantool_panic_handler routine prior to the exit. Should I add
> > > > > > this remark into commit message?
> > > > > I would like to see it in commit message.
> > > > 
> > > > Added. Here is a new commit message:
> > > > 
> > > > ================================================================================
> > > > 
> > > > jit: abort trace recording and execution for C API
> > > > 
> > > > JIT recording semantics assumes FFI calls are atomic regarding the
> > > > LuaJIT VM: if the execution exited Lua world through FFI it is not
> > > > re-entering Lua world again.
> > > > 
> > > > However, there are ways to break this atomicity via FFI: one can
> > > > re-enter LuaJIT VM via Lua C API inside the C routine called via FFI. As
> > > > a result the following stack mix is created:
> > > > | Lua-FFI -> C routine -> Lua-C API -> Lua VM
> > > > This sort of re-entrancy is explicitly not supported by LuaJIT compiler
> > > > machinery. @mraleph named such layout an "FFI sandwich" and I'm strictly
> > > > against eating it.
> > > > 
> > > > E.g. there is a trigger machinery in Tarantool calling Lua functions
> > > > from C space. As a result of FFI function call with a trigger underneath
> > > > the "FFI sandwich" is created.
> > > > 
> > > > This changeset introduces the mechanism for Lua-C API callbacks similar
> > > > to the one existing for Lua-FFI: trace recording is aborted when the
> > > > execution re-enters LuaJIT VM. If re-entrancy is detected while running
> > > > mcode the platform finishes its execution with EXIT_FAILURE code and
> > > > calls panic routine prior to the exit.
> > > > 
> > > > Relates to tarantool/tarantool#4427
> > > > 
> > > > ================================================================================
> > > 
> > > LGTM.
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > Relates to tarantool/tarantool#4427
> > > > > > > > 
> > > > > > > > Co-authored-by: Vyacheslav Egorov <vegorov@google.com>
> > > > > > > > Co-authored-by: Sergey Ostanevich <sergos@tarantool.org>
> > > > > > > > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > > > > > > > ---
> > > > > > > >  src/lj_api.c                             | 35 ++++++++++----
> > > > > > > >  src/lj_errmsg.h                          |  1 +
> > > > > > > >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> > > > > > > >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> > > > > > > >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> > > > > > > >  5 files changed, 113 insertions(+), 9 deletions(-)
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> > > > > > > >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> > > > > > > > 
> > > > > > > > diff --git a/src/lj_api.c b/src/lj_api.c
> > > > > > > > index a5e2465..c1f53e0 100644
> > > > > > > > --- a/src/lj_api.c
> > > > > > > > +++ b/src/lj_api.c
> > > > > > > > @@ -76,6 +76,17 @@ static GCtab *getcurrenv(lua_State *L)
> > > > > > > >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static void call(lua_State *L, TValue *base, int nres) {
> > > > > > > I would recommend to name the function in a more descriptive way, like
> > > > > > > sane_call. 
> > > > > > 
> > > > > > IMHO no_trace_call / no_jit_call looks good to me, but feel free to
> > > > > > propose the naming you prefer more.
> > > > > If you want to mention that this call aborts the trace generation,
> > > > > then I would put it like 'call_w_trace_abort'
> > > > 
> > > > It not only aborts the trace recording, but prohibits further execution
> > > > for already compiled code.
> > > > 
> > > > > But it goes beyond that AFAU. Can you name it like 'ffi_safe_call'?
> > > > 
> > > > The changes are not related to FFI at all. FFI helps to achieve the
> > > > platform failure, but the problem relates to JIT machinery. So this name
> > > > doesn't clarify function semantics.
> > > > 
> > > Can we sayn that this machinery tracks the machinery consistency? Then
> > > the first name comes to my mind is consistent_call with in the code
> > > comment on what consistency is - abort trace collection and inconsistent
> > > trace excution.
> > 
> > As a result of discussion, we stopped on the jit_secure_call name. The
> > diff is below:
> > 
> > ================================================================================
> > 
> > diff --git a/src/lj_api.c b/src/lj_api.c
> > index c1f53e0..4ff8ccf 100644
> > --- a/src/lj_api.c
> > +++ b/src/lj_api.c
> > @@ -76,8 +76,9 @@ static GCtab *getcurrenv(lua_State *L)
> >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> >  }
> >  
> > -static void call(lua_State *L, TValue *base, int nres) {
> > +static void jit_secure_call(lua_State *L, TValue *base, int nres) {
> >    global_State *g = G(L);
> > +  /* Forbid Lua world re-entrancy while running the trace */
> >    if (tvref(g->jit_base)) {
> >      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> >      if (g->panic) g->panic(L);
> > @@ -311,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
> >        return (int)(uintptr_t)base;
> >      } else {
> >        L->top = base+2;
> > -      call(L, base, 1+1);
> > +      jit_secure_call(L, base, 1+1);
> >        L->top -= 2+LJ_FR2;
> >        return tvistruecond(L->top+1+LJ_FR2);
> >      }
> > @@ -334,7 +334,7 @@ LUA_API int lua_lessthan(lua_State *L, int idx1, int idx2)
> >        return (int)(uintptr_t)base;
> >      } else {
> >        L->top = base+2;
> > -      call(L, base, 1+1);
> > +      jit_secure_call(L, base, 1+1);
> >        L->top -= 2+LJ_FR2;
> >        return tvistruecond(L->top+1+LJ_FR2);
> >      }
> > @@ -786,7 +786,7 @@ LUA_API void lua_concat(lua_State *L, int n)
> >        }
> >        n -= (int)(L->top - top);
> >        L->top = top+2;
> > -      call(L, top, 1+1);
> > +      jit_secure_call(L, top, 1+1);
> >        L->top -= 1+LJ_FR2;
> >        copyTV(L, L->top-1, L->top+LJ_FR2);
> >      } while (--n > 0);
> > @@ -806,7 +806,7 @@ LUA_API void lua_gettable(lua_State *L, int idx)
> >    v = lj_meta_tget(L, t, L->top-1);
> >    if (v == NULL) {
> >      L->top += 2;
> > -    call(L, L->top-2, 1+1);
> > +    jit_secure_call(L, L->top-2, 1+1);
> >      L->top -= 2+LJ_FR2;
> >      v = L->top+1+LJ_FR2;
> >    }
> > @@ -822,7 +822,7 @@ LUA_API void lua_getfield(lua_State *L, int idx, const char *k)
> >    v = lj_meta_tget(L, t, &key);
> >    if (v == NULL) {
> >      L->top += 2;
> > -    call(L, L->top-2, 1+1);
> > +    jit_secure_call(L, L->top-2, 1+1);
> >      L->top -= 2+LJ_FR2;
> >      v = L->top+1+LJ_FR2;
> >    }
> > @@ -977,7 +977,7 @@ LUA_API void lua_settable(lua_State *L, int idx)
> >      TValue *base = L->top;
> >      copyTV(L, base+2, base-3-2*LJ_FR2);
> >      L->top = base+3;
> > -    call(L, base, 0+1);
> > +    jit_secure_call(L, base, 0+1);
> >      L->top -= 3+LJ_FR2;
> >    }
> >  }
> > @@ -998,7 +998,7 @@ LUA_API void lua_setfield(lua_State *L, int idx, const char *k)
> >      TValue *base = L->top;
> >      copyTV(L, base+2, base-3-2*LJ_FR2);
> >      L->top = base+3;
> > -    call(L, base, 0+1);
> > +    jit_secure_call(L, base, 0+1);
> >      L->top -= 2+LJ_FR2;
> >    }
> >  }
> > @@ -1129,7 +1129,7 @@ LUA_API void lua_call(lua_State *L, int nargs, int nresults)
> >  {
> >    api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
> >    api_checknelems(L, nargs+1);
> > -  call(L, api_call_base(L, nargs), nresults+1);
> > +  jit_secure_call(L, api_call_base(L, nargs), nresults+1);
> >  }
> >  
> >  LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
> > @@ -1147,6 +1148,7 @@ LUA_API int lua_pcall(lua_State *L, int nargs, int nresults, int errfunc)
> >      api_checkvalidindex(L, o);
> >      ef = savestack(L, o);
> >    }
> > +  /* Forbid Lua world re-entrancy while running the trace */
> >    if (tvref(g->jit_base)) {
> >      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> >      if (g->panic) g->panic(L);
> > @@ -1189,7 +1189,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
> >      if (LJ_FR2) setnilV(top++);
> >      copyTV(L, top++, index2adr(L, idx));
> >      L->top = top;
> > -    call(L, top-1, 1+1);
> > +    jit_secure_call(L, top-1, 1+1);
> >      return 1;
> >    }
> >    return 0;
> > 
> > ================================================================================
> > 
> > Squashed these changes with the original patch.
> > 
> > > 
> > > > > Although, safety resulting in a panic sounds paradoxical. 
> > > > 
> > > > Totally agree.
> > > > 
> > > > > 
> > > > > BTW, should it be under 'LJ_HASFFI' and keep the original call
> > > > > otherwise?
> > > > 
> > > > I see no benefits for it. Again, the fix doesn't relates to FFI
> > > > itself. It's just a particular way leading to JIT inconsistency.
> > > > 
> > > > Feel free to correct me, if I'm wrong.
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > +  global_State *g = G(L);
> > > > > > > > +  if (tvref(g->jit_base)) {
> > > > > > > > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > > > > > > > +    if (g->panic) g->panic(L);
> > > > > > > > +    exit(EXIT_FAILURE);
> > > > > > > > +  }
> > > > > > > > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > > > > > > > +  lj_vm_call(L, base, nres);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* -- Miscellaneous API functions ----------------------------------------- */
> > > > > > > >  
> > > > > > > >  LUA_API int lua_status(lua_State *L)
> > > > > > > 
> > > > > > > Also I have a question on perf impact - have you measured any
> > > > > > > benchmarks - Lua only, or as part of Tarantool
> > > > > > 
> > > > > > I compared two sysbench logs ([1] and [2]) via vimdiff and see no
> > > > > > critical difference.
> > > > > > 
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Sergos
> > > > > > 
> > > > > > [1]: https://gitlab.com/tarantool/tarantool/-/jobs/489420257/raw
> > > > > > [2]: https://gitlab.com/tarantool/tarantool/-/jobs/486707332/raw
> > > > > > 
> > > > > > -- 
> > > > > > Best regards,
> > > > > > IM
> > > > 
> > > > -- 
> > > > Best regards,
> > > > IM
> > 
> > -- 
> > Best regards,
> > IM

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change
  2020-04-02 23:41 ` [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or " Vladislav Shpilevoy
@ 2020-04-03 21:32   ` Igor Munkin
  2020-04-04 21:36     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-04-03 21:32 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for the review!

On 03.04.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> I am getting build errors here:
> 
> Undefined symbols for architecture x86_64:
>   "__Unwind_DeleteException", referenced from:
>       _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
>   "__Unwind_GetCFA", referenced from:
>       _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
>   "__Unwind_RaiseException", referenced from:
>       _lj_err_throw in libluajit.a(lj_err.o)
>   "__Unwind_SetGR", referenced from:
>       _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
>   "__Unwind_SetIP", referenced from:
>       _lj_err_unwind_dwarf in libluajit.a(lj_err.o)
> 
> I think they are not related to your patch, but
> probably you know how to fix them? Did I miss a build
> option? I used 'make -j'.

Hm, it totally doesn't relate to the patch. AFAIR you use MacOS, so the
problems might be related to the external unwinder ([1], sec. 6.2) used
by luajit. Could you please share your build environment configuration?

> 
> I saw you fixed some things requested by Sergey, but I don't
> see them on the branch. Did you push the latest version? For
> example, in the first commit you had a typo in the commit
> message - 'whe' instead of 'when'. I still see 'whe', on github
> too.

Sorry, my fault. I had been waiting for Sergos comments related to my
post-review fixes, avoiding several force pushes. Our discussion had
been stalled for a while, and I've just received his LGTM and I've
pushed the fixed patches to the remote branch. Next time I'll update the
upstream ASAP.

Please, let me know, whether I need to send the actual series version.

> 
> On 27/03/2020 11:47, Igor Munkin wrote:
> > This series closes two issues related to the JIT machinery behaviour:
> > * "FFI sandwich"(*) detection is introduced. If sandwich is detected
> >   while trace recording the recording is aborted. The sandwich detected
> >   while mcode execution leads to the platform panic.
> > * luaJIT_setmode call is prohibited while mcode execution and leads to
> >   the platform panic.
> > 
> > (*) The following stack mix is called FFI sandwich.
> >     | Lua-FFI -> С routine -> Lua-C API -> Lua VM
> >     This sort of re-entrancy is explicitly not supported by LuaJIT
> >     compiler. For more info see [1].
> > 
> > Branch: https://github.com/tarantool/luajit/tree/imun/ffi-sandwich
> > 
> > [1]: https://github.com/tarantool/tarantool/issues/4427
> > 
> > Igor Munkin (2):
> >   jit: abort trace recording and execution for C API
> >   jit: abort trace execution on JIT mode change
> > 
> >  src/lj_api.c                             | 35 ++++++++++----
> >  src/lj_dispatch.c                        |  5 ++
> >  src/lj_errmsg.h                          |  2 +
> >  test/gh-4427-ffi-sandwich/CMakeLists.txt |  1 +
> >  test/gh-4427-ffi-sandwich/libsandwich.c  | 59 ++++++++++++++++++++++++
> >  test/gh-4427-ffi-sandwich/test.lua       | 26 +++++++++++
> >  test/lj-flush-on-trace/CMakeLists.txt    |  1 +
> >  test/lj-flush-on-trace/libflush.c        | 31 +++++++++++++
> >  test/lj-flush-on-trace/test.lua          | 25 ++++++++++
> >  9 files changed, 176 insertions(+), 9 deletions(-)
> >  create mode 100644 test/gh-4427-ffi-sandwich/CMakeLists.txt
> >  create mode 100644 test/gh-4427-ffi-sandwich/libsandwich.c
> >  create mode 100644 test/gh-4427-ffi-sandwich/test.lua
> >  create mode 100644 test/lj-flush-on-trace/CMakeLists.txt
> >  create mode 100644 test/lj-flush-on-trace/libflush.c
> >  create mode 100644 test/lj-flush-on-trace/test.lua
> > 

[1]: https://www.uclibc.org/docs/psABI-x86_64.pdf

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-04-02 23:41   ` Vladislav Shpilevoy
@ 2020-04-04 11:55     ` Igor Munkin
  2020-04-04 21:37       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 20+ messages in thread
From: Igor Munkin @ 2020-04-04 11:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for your review!

On 03.04.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> > diff --git a/src/lj_api.c b/src/lj_api.c
> > index a5e2465..c1f53e0 100644
> > --- a/src/lj_api.c
> > +++ b/src/lj_api.c
> > @@ -300,7 +311,7 @@ LUA_API int lua_equal(lua_State *L, int idx1, int idx2)
> >        return (int)(uintptr_t)base;
> >      } else {
> >        L->top = base+2;
> > -      lj_vm_call(L, base, 1+1);
> > +      call(L, base, 1+1);
> 
> These changes look dangerous to perf. So if I had operators ==
> and < overloaded for an FFI struct, they are not jitted anymore,
> right?

Nope. I guess I need to stop right here and clarify the patch a little.

Let's start with the patch itself: the introduced changes abort the
trace recording whether re-entrancy to Lua VM is detected or "panic" if
such re-entrancy is faced while running already compiled machine code. A
good rationale for such protection is the platform failure Nikita faced
in #4427: the C function called via FFI makes the trigger with a Lua-C
API call underneath run. However it also might be any Lua-C API function
respecting the one of available metamethods (e.g. lua_less, lua_equal,
etc). If the correponding metamethod is set, the platform also re-enters
to execute it. Such behaviour is also unnacceptable, so metamethod call
ought to be invoked via jit_secure_call too. It's not optional and we
can't avoid it, since there are no guarantees the situation never
occurs.

OK, what about FFI: it's a well isolated machinery, that handles
everything its own way: own trace recording semantics, own metamethod
handling (since only tables and userdata can have an individual
metatable FFI has the one <ffi> "first level" metatable), etc. Nothing
FFI-related is changed here.

If you run the following snippet, you'll see the artefacts for
successfully compiled trace in dump.out.
| require('jit.dump').start('+tbisrmXaT', 'dump.out')
| 
| local limit = -1ULL / 2
| local value = 1ULL
| local tests = { }
| 
| while value < limit do
|   value = value * 2
|   table.insert(tests, value < limit)
| end
| 
| jit.off() -- prevent spoiling dump.out with other traces.
| print(value, limit, #tests)

Since the patch relates only to Lua-C API, metamethods triggered while
running Lua code are also successfully compiled.
| require('jit.dump').start('+tbisrmXaT', 'dump.out')
| 
| local complex
| 
| local complex_mt = {
|   __add = function(a, b) return complex(a.re + b.re, a.im + b.im) end,
|   __eq  = function(a, b) return a.re == b.re and a.im == b.im end,
|   __tostring = function(o) return ('re: %d, im: %d'):format(o.re, o.im) end
| }
| 
| complex = function(re, im)
|   return setmetatable({ re = re, im = im }, complex_mt)
| end
| 
| local start = complex(0, 0)
| local finish = complex(64, 64)
| local step = complex(1, 1)
| 
| repeat
|   start = start + step
| until start == finish

Feel free to point the parts that looks good to be included into commit
message, comments, etc.

> Why? Does someone really yield in them?

The main idea is to protect ourselves in the future (we have already
discussed it with Kostja here[1]).

> Could you measure how does it affect perf of these calls?

Talking about perf: Sergos also asked to make benchmarks and I provided
the sysbench results[2] I obtained in Gitlab. I saw no dramatically
changes. I guess the reason the performance stays the same is that
all such places has been already handled manually in Tarantool sources
(like box_txn_rollback_to_savepoint after Leonid changes in #4427).

However, if you would like to see more benchmarks, please let me know.

> From what I see, these changes basically kill FFI making it not better
> than Lua C. It no longer helps JITting anything, does it?
> 
> The same for concat, and other things. Looks like an overkill. We
> would need sandwich protection only for a few places. Is it possible
> to avoid such drastic changes?

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012732.html
[2]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-March/015236.html

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change
  2020-04-03 21:32   ` Igor Munkin
@ 2020-04-04 21:36     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-04 21:36 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

This is a Mac problem. I found the solution here:
https://github.com/LuaJIT/LuaJIT/issues/449

Tarantool sets these variables in cmake, this is why
it compiles as a part of the main repo.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-04-04 11:55     ` Igor Munkin
@ 2020-04-04 21:37       ` Vladislav Shpilevoy
  2020-04-07 21:16         ` Igor Munkin
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-04 21:37 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

See 8 comments below.

> diff --git a/src/lj_api.c b/src/lj_api.c
> index a5e2465..b543231 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -76,6 +76,18 @@ static GCtab *getcurrenv(lua_State *L)
>    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
>  }
>  
> +static void jit_secure_call(lua_State *L, TValue *base, int nres) {

1. Many functions in luajit related to jitting have prefix 'lj_'.
Probably this should be applied here too. I didn't see any other
starting with 'jit_'.

> +  global_State *g = G(L);
> +  /* Forbid Lua world re-entrancy while running the trace */
> +  if (tvref(g->jit_base)) {
> +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> +    if (g->panic) g->panic(L);
> +    exit(EXIT_FAILURE);

2. Is it correct, that it is allowed to renter Lua, and the panic
will happen only if we see that during reentering the trace is
still being recorded (i.e. if there is a bug in lj)?

I don't see in the code above where is that check like 'the trace
is being recorded'. Could you please point at it?

> +  }
> +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> +  lj_vm_call(L, base, nres);
> +}
> @@ -1172,7 +1191,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
>      if (LJ_FR2) setnilV(top++);
>      copyTV(L, top++, index2adr(L, idx));
>      L->top = top;
> -    lj_vm_call(L, top-1, 1+1);
> +    jit_secure_call(L, top-1, 1+1);
>      return 1;
>    }

3. Why didn't you change lua_cpcall(), luaL_callmeta()?

> diff --git a/test/gh-4427-ffi-sandwich/CMakeLists.txt b/test/gh-4427-ffi-sandwich/CMakeLists.txt
> new file mode 100644
> index 0000000..995c6bb
> --- /dev/null
> +++ b/test/gh-4427-ffi-sandwich/CMakeLists.txt

4. So this is not built by luajit? Does not it have anything
to be more self sufficient here? This looks strange to add
a CMake file to a probject, which uses Makefiles.

Also this cmake does not work. I get error:

CMake Error at CMakeLists.txt:1 (build_lualib):
  Unknown CMake command "build_lualib".

This is a clear cyclic dependency on Tarantool. Can it be
built by luajit's own means?

This dependency becomes even stronger in the next commit,
which introduces another CMake file.

> @@ -0,0 +1 @@
> +build_lualib(libsandwich libsandwich.c)
> diff --git a/test/gh-4427-ffi-sandwich/libsandwich.c b/test/gh-4427-ffi-sandwich/libsandwich.c
> new file mode 100644
> index 0000000..2a24fb2
> --- /dev/null
> +++ b/test/gh-4427-ffi-sandwich/libsandwich.c
> @@ -0,0 +1,59 @@
> +#include <lua.h>
> +#include <lauxlib.h>
> +
> +struct sandwich {
> +	lua_State *L; /* Coroutine saved for a Lua call */
> +	int ref;      /* Anchor to the Lua function to be run */
> +	int trigger;  /* Trigger for switching to Lua call */
> +};
> +
> +int ipp(struct sandwich *state, int i)

5. What is 'ipp'?

> +{
> +	if (i < state->trigger)
> +		return i + 1;
> +
> +	/* Sandwich is triggered and Lua ipp function is called */
> +	lua_pushnumber(state->L, state->ref);
> +	lua_gettable(state->L, LUA_REGISTRYINDEX);
> +	lua_pushnumber(state->L, i);
> +	lua_call(state->L, 1, 1);
> +	return lua_tonumber(state->L, -1);
> 
>> These changes look dangerous to perf. So if I had operators ==
>> and < overloaded for an FFI struct, they are not jitted anymore,
>> right?
> 
> Nope. I guess I need to stop right here and clarify the patch a little.
> 
> Let's start with the patch itself: the introduced changes abort the
> trace recording whether re-entrancy to Lua VM is detected or "panic" if
> such re-entrancy is faced while running already compiled machine code. A
> good rationale for such protection is the platform failure Nikita faced
> in #4427: the C function called via FFI makes the trigger with a Lua-C
> API call underneath run. However it also might be any Lua-C API function
> respecting the one of available metamethods (e.g. lua_less, lua_equal,
> etc). If the correponding metamethod is set, the platform also re-enters
> to execute it. Such behaviour is also unnacceptable, so metamethod call
> ought to be invoked via jit_secure_call too. It's not optional and we
> can't avoid it, since there are no guarantees the situation never
> occurs.

6. Yeah, ok, now I see, the changes functions are from the
public C Lua API, and are called by external code only. Such
as from tarantool. I was afraid, that lj uses them for all
comparisons internally.

> OK, what about FFI: it's a well isolated machinery, that handles
> everything its own way: own trace recording semantics, own metamethod
> handling (since only tables and userdata can have an individual
> metatable FFI has the one <ffi> "first level" metatable), etc. Nothing
> FFI-related is changed here.

7. If this has nothing to do with ffi, then why it is used
in the test?

> If you run the following snippet, you'll see the artefacts for
> successfully compiled trace in dump.out.
> | require('jit.dump').start('+tbisrmXaT', 'dump.out')
> | 
> | local limit = -1ULL / 2
> | local value = 1ULL
> | local tests = { }
> | 
> | while value < limit do
> |   value = value * 2
> |   table.insert(tests, value < limit)
> | end
> | 
> | jit.off() -- prevent spoiling dump.out with other traces.
> | print(value, limit, #tests)
> 
> Since the patch relates only to Lua-C API, metamethods triggered while
> running Lua code are also successfully compiled.
> | require('jit.dump').start('+tbisrmXaT', 'dump.out')
> | 
> | local complex
> | 
> | local complex_mt = {
> |   __add = function(a, b) return complex(a.re + b.re, a.im + b.im) end,
> |   __eq  = function(a, b) return a.re == b.re and a.im == b.im end,
> |   __tostring = function(o) return ('re: %d, im: %d'):format(o.re, o.im) end
> | }
> | 
> | complex = function(re, im)
> |   return setmetatable({ re = re, im = im }, complex_mt)
> | end
> | 
> | local start = complex(0, 0)
> | local finish = complex(64, 64)
> | local step = complex(1, 1)
> | 
> | repeat
> |   start = start + step
> | until start == finish
> 
> Feel free to point the parts that looks good to be included into commit
> message, comments, etc.
> 
>> Why? Does someone really yield in them?
> 
> The main idea is to protect ourselves in the future (we have already
> discussed it with Kostja here[1]).
> 
>> Could you measure how does it affect perf of these calls?
> 
> Talking about perf: Sergos also asked to make benchmarks and I provided
> the sysbench results[2] I obtained in Gitlab. I saw no dramatically
> changes. I guess the reason the performance stays the same is that
> all such places has been already handled manually in Tarantool sources
> (like box_txn_rollback_to_savepoint after Leonid changes in #4427).
> 
> However, if you would like to see more benchmarks, please let me know.
> 
>> From what I see, these changes basically kill FFI making it not better
>> than Lua C. It no longer helps JITting anything, does it?
>>
>> The same for concat, and other things. Looks like an overkill. We
>> would need sandwich protection only for a few places. Is it possible
>> to avoid such drastic changes?
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012732.html
> [2]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-March/015236.html
8. So are you saying, that ffi part is still fast?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API
  2020-04-04 21:37       ` Vladislav Shpilevoy
@ 2020-04-07 21:16         ` Igor Munkin
  0 siblings, 0 replies; 20+ messages in thread
From: Igor Munkin @ 2020-04-07 21:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for your review!

On 04.04.20, Vladislav Shpilevoy wrote:
> See 8 comments below.
> 
> > diff --git a/src/lj_api.c b/src/lj_api.c
> > index a5e2465..b543231 100644
> > --- a/src/lj_api.c
> > +++ b/src/lj_api.c
> > @@ -76,6 +76,18 @@ static GCtab *getcurrenv(lua_State *L)
> >    return fn->c.gct == ~LJ_TFUNC ? tabref(fn->c.env) : tabref(L->env);
> >  }
> >  
> > +static void jit_secure_call(lua_State *L, TValue *base, int nres) {
> 
> 1. Many functions in luajit related to jitting have prefix 'lj_'.
> Probably this should be applied here too. I didn't see any other
> starting with 'jit_'.

'lj_' prefix is used for extern functions in various units of internal
API. For static (i.e. private) routines no prefixes are used (consider
<index2adr>, <stkindex2adr>, <getcurrenv> helpers nearby in lj_api.c).
Originally, I named this function just <call>, but Sergos asked for a
more describing name in the neighbour thread and finally we chose this
one. I used underscores for readability, not for a prefix (I'm OK with
jitsecurecall though it much harder to parse :)). I *literally* OK with
any name if it is not misguiding one. Feel free to propose your naming
if you have a better one.

> 
> > +  global_State *g = G(L);
> > +  /* Forbid Lua world re-entrancy while running the trace */
> > +  if (tvref(g->jit_base)) {
> > +    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
> > +    if (g->panic) g->panic(L);
> > +    exit(EXIT_FAILURE);
> 
> 2. Is it correct, that it is allowed to renter Lua, and the panic
> will happen only if we see that during reentering the trace is
> still being recorded (i.e. if there is a bug in lj)?

Not quite right. there are two cases:

1. Trace is being recorded when re-entrancy occurs. In this case
lj_trace_abort macro asynchroniously aborts (i.e. signals to stop and
purge the result) trace recording prior to re-entering Lua VM.

2. Trace is being run when re-entrancy occurs. In this case jit_base
field is not NULL and contains an address to the guest stack frame base
on the moment trace started its execution. Unfortunately in this case we
can do nothing for now except panic.

> 
> I don't see in the code above where is that check like 'the trace
> is being recorded'. Could you please point at it?

Trace recording is aborted unconditionally via lj_trace_abort macro.

> 
> > +  }
> > +  lj_trace_abort(g);  /* Never record across Lua VM entrance */
> > +  lj_vm_call(L, base, nres);
> > +}
> > @@ -1172,7 +1191,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
> >      if (LJ_FR2) setnilV(top++);
> >      copyTV(L, top++, index2adr(L, idx));
> >      L->top = top;
> > -    lj_vm_call(L, top-1, 1+1);
> > +    jit_secure_call(L, top-1, 1+1);
> >      return 1;
> >    }
> 
> 3. Why didn't you change lua_cpcall(), luaL_callmeta()?

OK, let's start with luaL_callmeta: I've checked twice and see it is
patched. Here is the corresponding diff chunk:

================================================================================

@@ -1172,7 +1191,7 @@ LUALIB_API int luaL_callmeta(lua_State *L, int idx, const char *field)
     if (LJ_FR2) setnilV(top++);
     copyTV(L, top++, index2adr(L, idx));
     L->top = top;
-    lj_vm_call(L, top-1, 1+1);
+    jit_secure_call(L, top-1, 1+1);
     return 1;
   }
   return 0;

================================================================================

Now I clarify the situation regarding lua_cpcall: Yes, you're *formally*
right it re-enters Lua VM.

However, it just prepares the safe frame, creates a GC object of Lua
function type using the given C function (strictly saying lua_CFunction)
and executes it via BC_FUNCC. Yes, BC_FUNCC is "recordable" for platform
builtins and "fast" functions (i.e. functions with success paths
implemented in DynASM, and fallbacks implemented in C, e.g. tonumber,
rawget). Thereby calling arbitrary "extern" C function aborts recording.

If the call occurs while running already compiled trace it doesn't break
the following:
* it respects caller-callee convention and all callee-safe registers are
  preserved
* it leaves the guest stack unchanged (since lua_cpcall allows no values
  to be returned)
* it doesn't change global fields required by trace (e.g. jit_base)

*BUT* BC_FUNCC semantics changes vmstate field, where currently running
trace number is stored and after C function yields the execution back to
the VM the vmstate is not restored to traceno. This value is used within
vm_exit_handler to obtain trace number, so further execution might
lead to the platform failure.

Long story short: thanks for your preciseness, I re-checked lua_cpcall
sematics while writing this section and here is a fix for lua_cpcall:

================================================================================

diff --git a/src/lj_api.c b/src/lj_api.c
index b543231..c9b5d22 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -1179,6 +1179,13 @@ LUA_API int lua_cpcall(lua_State *L, lua_CFunction func, void *ud)
   uint8_t oldh = hook_save(g);
   int status;
   api_check(L, L->status == LUA_OK || L->status == LUA_ERRERR);
+  /* Forbid Lua world re-entrancy while running the trace */
+  if (tvref(g->jit_base)) {
+    setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+    if (g->panic) g->panic(L);
+    exit(EXIT_FAILURE);
+  }
+  lj_trace_abort(g);  /* Never record across Lua VM entrance */
   status = lj_vm_cpcall(L, func, ud, cpcall);
   if (status) hook_restore(g, oldh);
   return status;

================================================================================

Squashed, force pushed to the upstream branch.

> 
> > diff --git a/test/gh-4427-ffi-sandwich/CMakeLists.txt b/test/gh-4427-ffi-sandwich/CMakeLists.txt
> > new file mode 100644
> > index 0000000..995c6bb
> > --- /dev/null
> > +++ b/test/gh-4427-ffi-sandwich/CMakeLists.txt
> 
> 4. So this is not built by luajit? Does not it have anything
> to be more self sufficient here? This looks strange to add
> a CMake file to a probject, which uses Makefiles.
> 
> Also this cmake does not work. I get error:
> 
> CMake Error at CMakeLists.txt:1 (build_lualib):
>   Unknown CMake command "build_lualib".
> 
> This is a clear cyclic dependency on Tarantool. Can it be
> built by luajit's own means?

No, it can't for now. LuaJIT tests are run by Tarantool testing
machinery and depends on it since[1].

> 
> This dependency becomes even stronger in the next commit,
> which introduces another CMake file.

I'm definitely going to rework LuaJIT testing environment to make it
more standalone and self-sufficient in the nearest future. Here is a
ticket for it[2].

> 
> > @@ -0,0 +1 @@
> > +build_lualib(libsandwich libsandwich.c)
> > diff --git a/test/gh-4427-ffi-sandwich/libsandwich.c b/test/gh-4427-ffi-sandwich/libsandwich.c
> > new file mode 100644
> > index 0000000..2a24fb2
> > --- /dev/null
> > +++ b/test/gh-4427-ffi-sandwich/libsandwich.c
> > @@ -0,0 +1,59 @@
> > +#include <lua.h>
> > +#include <lauxlib.h>
> > +
> > +struct sandwich {
> > +	lua_State *L; /* Coroutine saved for a Lua call */
> > +	int ref;      /* Anchor to the Lua function to be run */
> > +	int trigger;  /* Trigger for switching to Lua call */
> > +};
> > +
> > +int ipp(struct sandwich *state, int i)
> 
> 5. What is 'ipp'?

<ipp> stands for i-plus-plus. Does <increment> name look better?

> 
> > +{
> > +	if (i < state->trigger)
> > +		return i + 1;
> > +
> > +	/* Sandwich is triggered and Lua ipp function is called */
> > +	lua_pushnumber(state->L, state->ref);
> > +	lua_gettable(state->L, LUA_REGISTRYINDEX);
> > +	lua_pushnumber(state->L, i);
> > +	lua_call(state->L, 1, 1);
> > +	return lua_tonumber(state->L, -1);
> > 
> >> These changes look dangerous to perf. So if I had operators ==
> >> and < overloaded for an FFI struct, they are not jitted anymore,
> >> right?
> > 
> > Nope. I guess I need to stop right here and clarify the patch a little.
> > 
> > Let's start with the patch itself: the introduced changes abort the
> > trace recording whether re-entrancy to Lua VM is detected or "panic" if
> > such re-entrancy is faced while running already compiled machine code. A
> > good rationale for such protection is the platform failure Nikita faced
> > in #4427: the C function called via FFI makes the trigger with a Lua-C
> > API call underneath run. However it also might be any Lua-C API function
> > respecting the one of available metamethods (e.g. lua_less, lua_equal,
> > etc). If the correponding metamethod is set, the platform also re-enters
> > to execute it. Such behaviour is also unnacceptable, so metamethod call
> > ought to be invoked via jit_secure_call too. It's not optional and we
> > can't avoid it, since there are no guarantees the situation never
> > occurs.
> 
> 6. Yeah, ok, now I see, the changes functions are from the
> public C Lua API, and are called by external code only. Such
> as from tarantool. I was afraid, that lj uses them for all
> comparisons internally.
> 
> > OK, what about FFI: it's a well isolated machinery, that handles
> > everything its own way: own trace recording semantics, own metamethod
> > handling (since only tables and userdata can have an individual
> > metatable FFI has the one <ffi> "first level" metatable), etc. Nothing
> > FFI-related is changed here.
> 
> 7. If this has nothing to do with ffi, then why it is used
> in the test?

It is just a particular way to make compiler crazy (but I don't know
another one related to the case).

> 
> > If you run the following snippet, you'll see the artefacts for
> > successfully compiled trace in dump.out.
> > | require('jit.dump').start('+tbisrmXaT', 'dump.out')
> > | 
> > | local limit = -1ULL / 2
> > | local value = 1ULL
> > | local tests = { }
> > | 
> > | while value < limit do
> > |   value = value * 2
> > |   table.insert(tests, value < limit)
> > | end
> > | 
> > | jit.off() -- prevent spoiling dump.out with other traces.
> > | print(value, limit, #tests)
> > 
> > Since the patch relates only to Lua-C API, metamethods triggered while
> > running Lua code are also successfully compiled.
> > | require('jit.dump').start('+tbisrmXaT', 'dump.out')
> > | 
> > | local complex
> > | 
> > | local complex_mt = {
> > |   __add = function(a, b) return complex(a.re + b.re, a.im + b.im) end,
> > |   __eq  = function(a, b) return a.re == b.re and a.im == b.im end,
> > |   __tostring = function(o) return ('re: %d, im: %d'):format(o.re, o.im) end
> > | }
> > | 
> > | complex = function(re, im)
> > |   return setmetatable({ re = re, im = im }, complex_mt)
> > | end
> > | 
> > | local start = complex(0, 0)
> > | local finish = complex(64, 64)
> > | local step = complex(1, 1)
> > | 
> > | repeat
> > |   start = start + step
> > | until start == finish
> > 
> > Feel free to point the parts that looks good to be included into commit
> > message, comments, etc.
> > 
> >> Why? Does someone really yield in them?
> > 
> > The main idea is to protect ourselves in the future (we have already
> > discussed it with Kostja here[1]).
> > 
> >> Could you measure how does it affect perf of these calls?
> > 
> > Talking about perf: Sergos also asked to make benchmarks and I provided
> > the sysbench results[2] I obtained in Gitlab. I saw no dramatically
> > changes. I guess the reason the performance stays the same is that
> > all such places has been already handled manually in Tarantool sources
> > (like box_txn_rollback_to_savepoint after Leonid changes in #4427).
> > 
> > However, if you would like to see more benchmarks, please let me know.
> > 
> >> From what I see, these changes basically kill FFI making it not better
> >> than Lua C. It no longer helps JITting anything, does it?
> >>
> >> The same for concat, and other things. Looks like an overkill. We
> >> would need sandwich protection only for a few places. Is it possible
> >> to avoid such drastic changes?
> > 
> > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012732.html
> > [2]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-March/015236.html
> 8. So are you saying, that ffi part is still fast?

Yes. Nothing has changed, the patch just protects one from FFI misuse.

[1]: https://github.com/tarantool/tarantool/issues/4478
[2]: https://github.com/tarantool/tarantool/issues/4862

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-04-07 21:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 10:47 [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or mode change Igor Munkin
2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 1/2] jit: abort trace recording and execution for C API Igor Munkin
2020-03-28 16:33   ` Sergey Ostanevich
2020-03-28 20:30     ` Igor Munkin
2020-03-29  9:21       ` Sergey Ostanevich
2020-03-29 10:45         ` Igor Munkin
2020-03-30  8:58           ` Sergey Ostanevich
2020-03-30 14:25             ` Igor Munkin
2020-04-03 21:06               ` Sergey Ostanevich
2020-04-03 21:31                 ` Igor Munkin
2020-04-02 23:41   ` Vladislav Shpilevoy
2020-04-04 11:55     ` Igor Munkin
2020-04-04 21:37       ` Vladislav Shpilevoy
2020-04-07 21:16         ` Igor Munkin
2020-03-27 10:47 ` [Tarantool-patches] [PATCH luajit 2/2] jit: abort trace execution on JIT mode change Igor Munkin
2020-03-28 19:36   ` Sergey Ostanevich
2020-03-29 10:46     ` Igor Munkin
2020-04-02 23:41 ` [Tarantool-patches] [PATCH luajit 0/2] Trace abort on FFI sandwich or " Vladislav Shpilevoy
2020-04-03 21:32   ` Igor Munkin
2020-04-04 21:36     ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox