[Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
Igor Munkin
imun at tarantool.org
Wed Jul 8 01:24:36 MSK 2020
Vlad,
Thanks for your review!
On 01.04.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> See 7 comments below.
>
> > diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> > index 45bc03787..c79aa7c2b 100644
> > --- a/src/lua/fiber.c
> > +++ b/src/lua/fiber.c
> > @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
> > return result;
> > }
> >
> > +static int
> > +fiber_on_yield(struct trigger *trigger, void *event)
> > +{
> > + (void) trigger;
> > + (void) event;
> > +
> > + /*
> > + * XXX: According to LuaJIT API reference luaJIT_setmode
>
> 1. What is 'XXX'?
AFAIK this is just a legacy keyword coming from Sun code convention[1]:
| 10.5.4 Special Comments
|
| Use XXX in a comment to flag something that is bogus but works. Use
| FIXME to flag something that is bogus and broken.
I guess its meaning is kinda lost nowadays and often is considered close
to FIXME now. Some people suggest to avoid XXX since it's ambiguous but:
* I see it's already used in Tarantool sources:
| imun at tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l |
10
* This is the right case to use it: such <luaJIT_setmode> usage is bogus
but works for the issue purposes.
* This keyword is highlighted by the text editor (e.g. vim).
>
> > + * function returns 0 on failure and 1 on success. Since
> > + * this call is aimed to abort the trace recording on
> > + * fiber_yield call and prevent already compiled traces
> > + * execution, the mode parameter is an invalid value (-1)
> > + * so JIT mode change is expected to fail and 0 is
> > + * returned. Otherwise non-zero return value signals
> > + * trigger_run routine about this trigger failure.
>
> 2. So correct me if I am wrong - we deliberately fail this
> function assuming that its call stops current trace collection/execution,
> right? And we pass -1 because there is no other way to stop
Yes, with one nit: the call either aborts current trace collection or
leads to the platform panic (we have no option here) -- similar to the
way we handle FFI sandwiches.
> trace collection/execution except this function.
Fortunately <luaJIT_setmode> aborts trace compilation unconditionally
and then changes JIT behaviour considering the given mode value. We pass
-1 since there is no special value to abort the trace compilation. This
value changes nothing, so it's just a hack to abort trace compilation
via exported LuaJIT C API.
>
> Why can't we call lj_trace_abort() directly?
It's the internal API. Its usage complicates a switch between various
LuaJIT implementations (we faced several challenges when tried to build
Tarantool with uJIT). There is a public API to be used here (though in a
bit hacky way).
>
> And most importantly, how does it affect perf? New trigger
> is +1 virtual call on each yield of every Lua fiber and +1
> execution of non-trival function luaJIT_setmode(). I think
> it is better to write a micro bench checking how many yields
> can we do per time unit before and after this patch. From Lua
> fibers.
I made some benchmarks[2] and here are the numbers for the bleeding
master:
* Vanilla (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015582
| #1700 bench: fibers: 10; iterations: 1000 => 0.004238
| #1700 bench: fibers: 10; iterations: 10000 => 0.031612
| #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
| #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
| #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
| #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
| #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
| #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
| #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
* Patched [Lua fiber trigger] (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
| #1700 bench: fibers: 10; iterations: 1000 => 0.004395
| #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
| #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
| #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
| #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
| #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
| #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
| #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
| #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
| #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
| #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
* Relative measurements (vanilla -> patched):
| #1700 bench: fibers: 10; iterations: 100 => 1.31%
| #1700 bench: fibers: 10; iterations: 1000 => 3.7%
| #1700 bench: fibers: 10; iterations: 10000 => 3.82%
| #1700 bench: fibers: 10; iterations: 100000 => 5.24%
| #1700 bench: fibers: 100; iterations: 100 => 2.99%
| #1700 bench: fibers: 100; iterations: 1000 => 5.19%
| #1700 bench: fibers: 100; iterations: 10000 => 5%
| #1700 bench: fibers: 100; iterations: 100000 => 5.24%
| #1700 bench: fibers: 1000; iterations: 100 => 6.06%
| #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
| #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
| #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
| #1700 bench: fibers: 10000; iterations: 100 => 0.29%
| #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
| #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
| #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
>
> > + */
> > + return luaJIT_setmode(tarantool_L, 0, -1);
>
> 3. In case that function ever returns 1, that breaks our
> error return convention, that error means -1 (it will work
> in triggers, since they do '!= 0', but anyway). Also when
> there is an error, diag is expected to be set. Better do
> diag_set(LuajitError) here when this fails. Or even panic()
> since it is not supposed to return 1 when the mode is bad.
> Or assume it is always 0 (add assert()), and return const 0
> from the trigger.
I guess assert with return 0 is enough for this case.
>
> > +}
> > +
> > /**
> > * Utility function for fiber.create and fiber.new
> > */
> > @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L)
> >
> > struct fiber *f = fiber_new("lua", lua_fiber_run_f);
> > if (f == NULL) {
> > - luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
> > - luaT_error(L);
> > + /* diagnostics is set in fiber_new. */
> > + goto error;
> > }
> >
> > + struct trigger *t = malloc(sizeof(*t));
> > + if (t == NULL) {
> > + diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
> > + goto error;
> > + }
> > + trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
> > + trigger_add(&f->on_yield, t);
>
> 4. Yeah, now I understand why did you ask about when triggers are
> deleted. In this particular case there is a better option. You can
> allocate the trigger right on the stack, in lua_fiber_run_f(). This
> is the trampoline for all Lua fibers, regardless how they are
> created: via fiber.new(), or fiber.create(). You can add the trigger
> before
>
> result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
>
> And remove afterwards. No need to use the heap then, nor move some
> error handling code.
Nice, thanks! I adjusted the patch, considering this point:
================================================================================
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index c79aa7c2b..f0bcc1ab7 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -437,11 +437,15 @@ static int
lua_fiber_run_f(MAYBE_UNUSED va_list ap)
{
int result;
+ struct trigger t;
struct fiber *f = fiber();
struct lua_State *L = f->storage.lua.stack;
int coro_ref = lua_tointeger(L, -1);
lua_pop(L, 1);
+ trigger_create(&t, fiber_on_yield, NULL, NULL);
+ trigger_add(&f->on_yield, &t);
result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+ trigger_clear(&t);
/*
* If fiber is not joinable
* We can unref child stack here,
@@ -487,17 +491,9 @@ fiber_create(struct lua_State *L)
struct fiber *f = fiber_new("lua", lua_fiber_run_f);
if (f == NULL) {
- /* diagnostics is set in fiber_new. */
- goto error;
- }
-
- struct trigger *t = malloc(sizeof(*t));
- if (t == NULL) {
- diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
- goto error;
+ luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
+ luaT_error(L);
}
- trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
- trigger_add(&f->on_yield, t);
/* Move the arguments to the new coro */
lua_xmove(L, child_L, lua_gettop(L));
@@ -511,13 +507,6 @@ fiber_create(struct lua_State *L)
lua_pushinteger(child_L, coro_ref);
f->storage.lua.stack = child_L;
return f;
-
-error:
- /* Release the anchored coroutine. */
- luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
- luaT_error(L);
- unreachable();
- return NULL;
}
/**
================================================================================
>
> 5. Also there is a question what will happen, if Lua is accessed from
> a fiber, created not in Lua? That is basically all pooled fibers,
> which serve IProto requests. See struct fiber_pool and its single
> usage place. These fibers are started from C, and may end up executing
> Lua code, if, for example, an IPROTO_CALL arrived.
>
> The same question can be applied to fibers, created from C API:
>
> API_EXPORT struct fiber *
> fiber_new(const char *name, fiber_func f);
>
> These fibers won't have the on_yield trigger, but can access Lua.
> For example, they can run Lua on_replace triggers in a space by making
> an insertion into it. Also they can access Lua API directly. By
> calling things like luaT_call() (see module.h).
Well, strictly saying this means we need such trigger on each
created/pooled fiber, right?
>
> You may consider adding Jit collection/tracing termination to the
> scheduler fiber. It is just one fiber, and from what I know, every
> fiber after yield switches current thread to the scheduler fiber. This
> also maybe less expensive. We probably won't even notice.
I've tried to implement such approach but it doesn't work. As you
mentioned on_yield triggers are not executed for scheduler fiber.
Nevertheless, here is the patch (I hope I've not missed something):
================================================================================
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc775..33ddb894d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
*/
struct fiber *script_fiber;
bool start_loop = true;
+static struct trigger lua_on_yield_trigger;
/* contents of src/lua/ files */
extern char strict_lua[],
@@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
return 1;
}
+static int
+tarantool_lua_on_yield(struct trigger *trigger, void *event)
+{
+ (void) trigger;
+ (void) event;
+ int status = luaJIT_setmode(tarantool_L, 0, -1);
+ assert(status == 0);
+ return 0;
+}
+
void
tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
{
+ trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
+ trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
+
lua_State *L = luaL_newstate();
if (L == NULL) {
panic("failed to initialize Lua");
================================================================================
However, I've implemented your idea in another way: added an auxiliary
function field to cord structure to be called on <fiber_yield> or
<fiber_call>:
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 5389ce467..c2b339f00 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
void
fiber_call(struct fiber *callee)
{
+ struct cord *cord = cord();
struct fiber *caller = fiber();
assert(! (caller->flags & FIBER_IS_READY));
assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
slab_cache_set_thread(&cord()->slabc);
cord->id = pthread_self();
+ cord->lua_on_yield = NULL;
cord->on_exit = NULL;
slab_cache_create(&cord->slabc, &runtime);
mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index cd9346a55..97e2ebb02 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -534,6 +534,7 @@ struct cord_on_exit;
* model.
*/
struct cord {
+ void (*lua_on_yield)(void);
/** The fiber that is currently being executed. */
struct fiber *fiber;
struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc775..65e4b5dd0 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
lua_atpanic(L, tarantool_panic_handler);
/* clear possible left-overs of init */
lua_settop(L, 0);
+ cord()->lua_on_yield = tarantool_lua_on_yield;
tarantool_L = L;
}
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d7257..8645d49d9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -37,6 +37,8 @@
#include <diag.h>
#include <fiber.h>
+#include <luajit.h>
+
#include "serializer_opts.h"
int luaL_nil_ref = LUA_REFNIL;
@@ -1276,3 +1278,9 @@ tarantool_lua_utils_init(struct lua_State *L)
assert(CTID_UUID != 0);
return 0;
}
+
+void
+tarantool_lua_on_yield(void)
+{
+ (void)luaJIT_setmode(tarantool_L, 0, -1);
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e4a..71dc67fa0 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
int
tarantool_lua_utils_init(struct lua_State *L);
+void
+tarantool_lua_on_yield(void);
+
#if defined(__cplusplus)
} /* extern "C" */
#endif /* defined(__cplusplus) */
================================================================================
And here are the results:
* Vanilla (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015582
| #1700 bench: fibers: 10; iterations: 1000 => 0.004238
| #1700 bench: fibers: 10; iterations: 10000 => 0.031612
| #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
| #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
| #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
| #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
| #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
| #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
| #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
* Patched [cord callback] (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015236
| #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
| #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
| #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
| #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
| #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
| #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
| #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
| #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
| #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
| #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
| #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
| #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
* Relative measurements (vanilla -> patched):
| #1700 bench: fibers: 10; iterations: 100 => -2.22%
| #1700 bench: fibers: 10; iterations: 1000 => 6.36%
| #1700 bench: fibers: 10; iterations: 10000 => 8.15%
| #1700 bench: fibers: 10; iterations: 100000 => 9.23%
| #1700 bench: fibers: 100; iterations: 100 => 2.83%
| #1700 bench: fibers: 100; iterations: 1000 => 7.78%
| #1700 bench: fibers: 100; iterations: 10000 => 7.6%
| #1700 bench: fibers: 100; iterations: 100000 => 7.81%
| #1700 bench: fibers: 1000; iterations: 100 => 8%
| #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
| #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
| #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
| #1700 bench: fibers: 10000; iterations: 100 => 4.2%
| #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
| #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
| #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
Looks like this way is slower than the one implemented via triggers.
>
> 6. There is also a question about how does it affect collection of
> valid traces? I don't know how Jit does it, so ask just in case.
> Won't this make traces never collected? Because every fiber yields
> soon or late.
All traces already aborts recording (or finalize it preparing for
further stitching) on each <fiber.yield> / <fiber.sleep> call, since
they are implemented via Lua C API. The patch just handle the case when
fibers are switched underneath an FFI call (see #4491[3]).
>
> 7. Does this patch allow to yield in ffi?
Yep.
[1]: https://www.oracle.com/technetwork/java/javase/documentation/codeconventions-137265.html
[2]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
[3]: https://github.com/tarantool/tarantool/issues/4491
--
Best regards,
IM
More information about the Tarantool-patches
mailing list