[Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua

Tarantool development patches archive
 help / color / mirror / Atom feed

* [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
@ 2021-06-18 18:14 Sergey Kaplun via Tarantool-patches
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-18 18:14 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Igor Munkin; +Cc: tarantool-patches

The old code flow was the following:
1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
   arguments to encode to MessagePack
2) The main coroutine `tarantool_L` is used with `lua_cpcall()` to call
   `encode_lua_call_16()` or `encode_lua_call()`
3) Objects on minor coroutine are encoded via `luamp_encode_call16()` or
   `luamp_encode()`. This encoding may raise an error on unprotected
   `port->L` coroutine. This coroutine has no protected frame on it
   and this call should fail in pure Lua. It is forbidden to call
   anything on unprotected coroutine [1] (Lua 5.1 sets protection only
   for specific lua_State [2] and calls a panic function if we raise an
   error on unprotected lua_State [3]). Netherless, there is no panic
   at now due to two facts. The first one is LuaJIT's support of C++
   exception handling [4] that allows to raise an error in Lua and
   catch it in C++ or vice versa. But documentation still doesn't
   permit errors on unprotected coroutines (at least we must set
   try-catch block). The second one is double monkey-patching of LuaJIT
   to restore currently executed coroutine, when C function or fast
   function raises an error [5][6] (see related issue here [7][8]).
   For these reasons, when an error occurs, the unwinder searches and
   finds the C-protected stack frame from the `lua_cpcall()` for the
   `tarantool_L` coroutine and unwinds until that point (without
   aforementioned patches LuaJIT just calls a panic function and exit).
4) If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then
   the error from `port->L` coroutine is converted into a Tarantool error
   and a diagnostic is set.

The auxiliary usage of `tarantool_L` coroutine is redundant and doesn't
respect Lua idiomatic of usage. So this patch drops it and uses only
minor coroutine instead with `lua_pcall()`.

Functions to encode are saved as entrance in the `LUA_REGISTRY` table
to reduce GC pressure, like it is done for other handlers [9].

[1]: https://www.lua.org/manual/5.2/manual.html#4.6
> If an error happens outside any protected environment, Lua calls a
> panic function
[2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
[3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
[4]: https://luajit.org/extensions.html#exceptions
[5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
[6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
[7]: https://github.com/tarantool/tarantool/issues/1516
[8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
[9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
---

Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
See the benchmarks sources here [1].

Before patch:
| Encode map: 189851357 mcs, 15.8 K ps
| Encode seq: 187926351 mcs, 16.0 K ps
| Encode str: 185451675 mcs, 16.2 K ps
| Encode dig: 184833396 mcs, 16.2 K ps

After patch:
| Encode map: 187814261 mcs, 16.0 K ps
| Encode seq: 183755028 mcs, 16.3 K ps
| Encode str: 181571626 mcs, 16.5 K ps
| Encode dig: 181572998 mcs, 16.5 K ps

Looks like the perf doesn't degrade at least.

[1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3

 src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 56 insertions(+), 15 deletions(-)

diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index 0315e720c..3b2572096 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -61,6 +61,8 @@ enum handlers {
 	HANDLER_CALL,
 	HANDLER_CALL_BY_REF,
 	HANDLER_EVAL,
+	HANDLER_ENCODE_CALL,
+	HANDLER_ENCODE_CALL_16,
 	HANDLER_MAX,
 };
 
@@ -400,11 +402,26 @@ struct encode_lua_ctx {
 	struct mpstream *stream;
 };
 
+/**
+ * Encode call results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to proceed.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct encode_lua_ctx *ctx =
-		(struct encode_lua_ctx *) lua_topointer(L, 1);
+		(struct encode_lua_ctx *) lua_topointer(L, -1);
+	assert(ctx->port->L == L);
+	/* Delete ctx from the stack. */
+	lua_pop(L, 1);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
@@ -413,33 +430,48 @@ encode_lua_call(lua_State *L)
 	struct luaL_serializer *cfg = luaL_msgpack_default;
 	const struct serializer_opts *opts =
 		&current_session()->meta.serializer_opts;
-	int size = lua_gettop(ctx->port->L);
+	const int size = lua_gettop(L);
 	for (int i = 1; i <= size; ++i)
-		luamp_encode(ctx->port->L, cfg, opts, ctx->stream, i);
+		luamp_encode(L, cfg, opts, ctx->stream, i);
 	ctx->port->size = size;
 	mpstream_flush(ctx->stream);
 	return 0;
 }
 
+/**
+ * Encode call_16 results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to proceed.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call_16(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct encode_lua_ctx *ctx =
-		(struct encode_lua_ctx *) lua_topointer(L, 1);
+		(struct encode_lua_ctx *) lua_topointer(L, -1);
+	/* Delete ctx from the stack. */
+	lua_pop(L, 1);
+	assert(ctx->port->L == L);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
 	 * TODO: forbid explicit yield from __serialize or __index here
 	 */
 	struct luaL_serializer *cfg = luaL_msgpack_default;
-	ctx->port->size = luamp_encode_call_16(ctx->port->L, cfg, ctx->stream);
+	ctx->port->size = luamp_encode_call_16(L, cfg, ctx->stream);
 	mpstream_flush(ctx->stream);
 	return 0;
 }
 
 static inline int
 port_lua_do_dump(struct port *base, struct mpstream *stream,
-		 lua_CFunction handler)
+		 enum handlers handler)
 {
 	struct port_lua *port = (struct port_lua *) base;
 	assert(port->vtab == &port_lua_vtab);
@@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
 	struct encode_lua_ctx ctx;
 	ctx.port = port;
 	ctx.stream = stream;
-	struct lua_State *L = tarantool_L;
-	int top = lua_gettop(L);
-	if (lua_cpcall(L, handler, &ctx) != 0) {
-		luaT_toerror(port->L);
+	lua_State *L = port->L;
+	/*
+	 * At the moment Lua stack holds only values to encode.
+	 * Insert corresponding encoder to the bottom and push
+	 * encode context as lightuserdata to the top.
+	 */
+	const int size = lua_gettop(L);
+	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
+	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
+	lua_insert(L, 1);
+	lua_pushlightuserdata(L, &ctx);
+	/* nargs -- all arguments + lightuserdata. */
+	if (luaT_call(L, size + 1, 0) != 0)
 		return -1;
-	}
-	lua_settop(L, top);
 	return port->size;
 }
 
@@ -467,7 +506,7 @@ port_lua_dump(struct port *base, struct obuf *out)
 	struct mpstream stream;
 	mpstream_init(&stream, out, obuf_reserve_cb, obuf_alloc_cb,
 		      luamp_error, port->L);
-	return port_lua_do_dump(base, &stream, encode_lua_call);
+	return port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL);
 }
 
 static int
@@ -477,7 +516,7 @@ port_lua_dump_16(struct port *base, struct obuf *out)
 	struct mpstream stream;
 	mpstream_init(&stream, out, obuf_reserve_cb, obuf_alloc_cb,
 		      luamp_error, port->L);
-	return port_lua_do_dump(base, &stream, encode_lua_call_16);
+	return port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL_16);
 }
 
 static void
@@ -501,7 +540,7 @@ port_lua_get_msgpack(struct port *base, uint32_t *size)
 	mpstream_init(&stream, region, region_reserve_cb, region_alloc_cb,
 		      luamp_error, port->L);
 	mpstream_encode_array(&stream, lua_gettop(port->L));
-	int rc = port_lua_do_dump(base, &stream, encode_lua_call);
+	int rc = port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL);
 	if (rc < 0) {
 		region_truncate(region, region_svp);
 		return NULL;
@@ -1049,6 +1088,8 @@ box_lua_call_init(struct lua_State *L)
 		[HANDLER_CALL] = execute_lua_call,
 		[HANDLER_CALL_BY_REF] = execute_lua_call_by_ref,
 		[HANDLER_EVAL] = execute_lua_eval,
+		[HANDLER_ENCODE_CALL] = encode_lua_call,
+		[HANDLER_ENCODE_CALL_16] = encode_lua_call_16,
 	};
 
 	for (int i = 0; i < HANDLER_MAX; i++) {
-- 
2.31.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
@ 2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
  2021-08-01 12:34 ` Igor Munkin via Tarantool-patches
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-21 20:36 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the patch!

> Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> See the benchmarks sources here [1].
> 
> Before patch:
> | Encode map: 189851357 mcs, 15.8 K ps
> | Encode seq: 187926351 mcs, 16.0 K ps
> | Encode str: 185451675 mcs, 16.2 K ps
> | Encode dig: 184833396 mcs, 16.2 K ps
> 
> After patch:
> | Encode map: 187814261 mcs, 16.0 K ps
> | Encode seq: 183755028 mcs, 16.3 K ps
> | Encode str: 181571626 mcs, 16.5 K ps
> | Encode dig: 181572998 mcs, 16.5 K ps
> 
> Looks like the perf doesn't degrade at least.
> 
> [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> 
>  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 56 insertions(+), 15 deletions(-)
> 
> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> index 0315e720c..3b2572096 100644
> --- a/src/box/lua/call.c
> +++ b/src/box/lua/call.c
> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>  	struct encode_lua_ctx ctx;
>  	ctx.port = port;
>  	ctx.stream = stream;
> -	struct lua_State *L = tarantool_L;
> -	int top = lua_gettop(L);
> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> -		luaT_toerror(port->L);
> +	lua_State *L = port->L;
> +	/*
> +	 * At the moment Lua stack holds only values to encode.
> +	 * Insert corresponding encoder to the bottom and push
> +	 * encode context as lightuserdata to the top.
> +	 */
> +	const int size = lua_gettop(L);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> +	lua_insert(L, 1);

If I remember correctly, this leads to moving all the existing
stack elements forward. Which might be expensive. I know from
Vlad Grubov's words that they have code with hundreds of values in
multireturn from stored functions. Did you bench what happens when
the Lua coroutine contains a lot of values? In the bench by the
link above I see only 1-element array and map. Size of the array
and map does not matter though. Only multireturn is interesting
here. Like 'return 1, 2, 3, ...'.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
  2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-22 13:38 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi! Thanks for the review!

On 21.06.21, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> > See the benchmarks sources here [1].
> > 
> > Before patch:
> > | Encode map: 189851357 mcs, 15.8 K ps
> > | Encode seq: 187926351 mcs, 16.0 K ps
> > | Encode str: 185451675 mcs, 16.2 K ps
> > | Encode dig: 184833396 mcs, 16.2 K ps
> > 
> > After patch:
> > | Encode map: 187814261 mcs, 16.0 K ps
> > | Encode seq: 183755028 mcs, 16.3 K ps
> > | Encode str: 181571626 mcs, 16.5 K ps
> > | Encode dig: 181572998 mcs, 16.5 K ps
> > 
> > Looks like the perf doesn't degrade at least.
> > 
> > [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> > 
> >  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 56 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> > index 0315e720c..3b2572096 100644
> > --- a/src/box/lua/call.c
> > +++ b/src/box/lua/call.c
> > @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >  	struct encode_lua_ctx ctx;
> >  	ctx.port = port;
> >  	ctx.stream = stream;
> > -	struct lua_State *L = tarantool_L;
> > -	int top = lua_gettop(L);
> > -	if (lua_cpcall(L, handler, &ctx) != 0) {
> > -		luaT_toerror(port->L);
> > +	lua_State *L = port->L;
> > +	/*
> > +	 * At the moment Lua stack holds only values to encode.
> > +	 * Insert corresponding encoder to the bottom and push
> > +	 * encode context as lightuserdata to the top.
> > +	 */
> > +	const int size = lua_gettop(L);
> > +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> > +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> > +	lua_insert(L, 1);
> 
> If I remember correctly, this leads to moving all the existing
> stack elements forward. Which might be expensive. I know from
> Vlad Grubov's words that they have code with hundreds of values in
> multireturn from stored functions. Did you bench what happens when
> the Lua coroutine contains a lot of values? In the bench by the
> link above I see only 1-element array and map. Size of the array
> and map does not matter though. Only multireturn is interesting
> here. Like 'return 1, 2, 3, ...'.

I've added this benchmark (200 numbers to return) to the gist.
Local results for the bench:

Master:
| Encode mul: 237900953 mcs, 12.6 K ps

My branch:
| Encode mul: 235735350 mcs, 12.7 K ps

`luamp_encode()` has the biggest impact in `port_do_lua_dump()`
(`lua_insert()` costs ~0.1% of the whole runtime).

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
@ 2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-24 20:00 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Thanks for the investigation!

>>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
>>> index 0315e720c..3b2572096 100644
>>> --- a/src/box/lua/call.c
>>> +++ b/src/box/lua/call.c
>>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>>>  	struct encode_lua_ctx ctx;
>>>  	ctx.port = port;
>>>  	ctx.stream = stream;
>>> -	struct lua_State *L = tarantool_L;
>>> -	int top = lua_gettop(L);
>>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
>>> -		luaT_toerror(port->L);
>>> +	lua_State *L = port->L;
>>> +	/*
>>> +	 * At the moment Lua stack holds only values to encode.
>>> +	 * Insert corresponding encoder to the bottom and push
>>> +	 * encode context as lightuserdata to the top.
>>> +	 */
>>> +	const int size = lua_gettop(L);
>>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
>>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
>>> +	lua_insert(L, 1);
>>
>> If I remember correctly, this leads to moving all the existing
>> stack elements forward. Which might be expensive. I know from
>> Vlad Grubov's words that they have code with hundreds of values in
>> multireturn from stored functions. Did you bench what happens when
>> the Lua coroutine contains a lot of values? In the bench by the
>> link above I see only 1-element array and map. Size of the array
>> and map does not matter though. Only multireturn is interesting
>> here. Like 'return 1, 2, 3, ...'.
> 
> I've added this benchmark (200 numbers to return) to the gist.
> Local results for the bench:
> 
> Master:
> | Encode mul: 237900953 mcs, 12.6 K ps
> 
> My branch:
> | Encode mul: 235735350 mcs, 12.7 K ps
> 
> `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> (`lua_insert()` costs ~0.1% of the whole runtime).

That looks suspicious. Why does it have such a small impact? Did
you check that multireturn really made the stack bigger? Shouldn't
it have linear complexity depending on the stack size after the
target index?

The patch looks fine to me, if the benchmarks show there is even a
slight perf win. But I think Igor should take another look.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-29  7:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Thanks for the questions!

On 24.06.21, Vladislav Shpilevoy wrote:
> Thanks for the investigation!
> 
> >>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> >>> index 0315e720c..3b2572096 100644
> >>> --- a/src/box/lua/call.c
> >>> +++ b/src/box/lua/call.c
> >>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >>>  	struct encode_lua_ctx ctx;
> >>>  	ctx.port = port;
> >>>  	ctx.stream = stream;
> >>> -	struct lua_State *L = tarantool_L;
> >>> -	int top = lua_gettop(L);
> >>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> >>> -		luaT_toerror(port->L);
> >>> +	lua_State *L = port->L;
> >>> +	/*
> >>> +	 * At the moment Lua stack holds only values to encode.
> >>> +	 * Insert corresponding encoder to the bottom and push
> >>> +	 * encode context as lightuserdata to the top.
> >>> +	 */
> >>> +	const int size = lua_gettop(L);
> >>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> >>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> >>> +	lua_insert(L, 1);
> >>
> >> If I remember correctly, this leads to moving all the existing
> >> stack elements forward. Which might be expensive. I know from
> >> Vlad Grubov's words that they have code with hundreds of values in
> >> multireturn from stored functions. Did you bench what happens when
> >> the Lua coroutine contains a lot of values? In the bench by the
> >> link above I see only 1-element array and map. Size of the array
> >> and map does not matter though. Only multireturn is interesting
> >> here. Like 'return 1, 2, 3, ...'.
> > 
> > I've added this benchmark (200 numbers to return) to the gist.
> > Local results for the bench:
> > 
> > Master:
> > | Encode mul: 237900953 mcs, 12.6 K ps
> > 
> > My branch:
> > | Encode mul: 235735350 mcs, 12.7 K ps
> > 
> > `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> > (`lua_insert()` costs ~0.1% of the whole runtime).
> 
> That looks suspicious. Why does it have such a small impact? Did

I've run the benchmark for return case only with `perf stat` (and 5K
return values) and found three interesting metrics (others are pretty
the same, up to 0.07%):

Master:
| 10,255,748,650 cache-misses      # 13.686 % of all cache refs         (12.49%)
| 1,364,762,310  LLC-load-misses   # 15.92% of all LL-cache accesses    (12.50%)
| 401,136,504    iTLB-load-misses  # 39.68% of all iTLB cache accesses  (12.51%)

My branch:
| 9,964,939,197  cache-misses      # 13.413 % of all cache refs         (12.52%)
| 1,296,545,699  LLC-load-misses   # 15.08% of all LL-cache accesses    (12.50%)
| 285,852,865    iTLB-load-misses  # 29.13% of all iTLB cache accesses  (12.49%)

1) LLC (last-line cache, L2 in my case) misses is slightly decreasing.
I suppose that all insert movement operations use cache (L1 or L2)
instead memory access, that make it rather cheaply.

2) iTLB (instruction translation look-aside buffer) [1] needs to avoid
penalty of page walks to a page table. So looks like within this patch
code locality is improved (may be due to the fact of using
`luaT_call()` instead of 'lua_cpcall()` and 'luaT_toerror()`).

> you check that multireturn really made the stack bigger? Shouldn't

Yes, I did.

> it have linear complexity depending on the stack size after the
> target index?

Yes, with 5K items to return it becomes pretty equal or even a little
bit slower (up to ~1%) than the original code. But encoding is still
taking the biggest part of the runtime.

Side note: if user wants to return 5K values (or a table with 5K values)
from Tarantool we can't stop him, but only to offer him our empathy.

> 
> The patch looks fine to me, if the benchmarks show there is even a
> slight perf win. But I think Igor should take another look.

[1]: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/front-end-bound/itlb-overhead.html

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-01 12:34 ` Igor Munkin via Tarantool-patches
  2021-08-02 14:25   ` Sergey Kaplun via Tarantool-patches
  2021-08-04 22:29 ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-14 10:16 ` Igor Munkin via Tarantool-patches
  3 siblings, 1 reply; 13+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-01 12:34 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergey,

Thanks for the patch! The changes are fine, but please consider the
comments below.

On 18.06.21, Sergey Kaplun wrote:
> The old code flow was the following:
> 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
>    arguments to encode to MessagePack
> 2) The main coroutine `tarantool_L` is used with `lua_cpcall()` to call
>    `encode_lua_call_16()` or `encode_lua_call()`
> 3) Objects on minor coroutine are encoded via `luamp_encode_call16()` or

What is this minor coroutine? To make terms more consistent in scope of
this commit message, let's define this in the first bullet and use it
everywhere. For me the best option is "port coroutine".

>    `luamp_encode()`. This encoding may raise an error on unprotected

The next bullet is recommended to be started from this sentence.

>    `port->L` coroutine. This coroutine has no protected frame on it
>    and this call should fail in pure Lua. It is forbidden to call

Ditto.

>    anything on unprotected coroutine [1] (Lua 5.1 sets protection only
>    for specific lua_State [2] and calls a panic function if we raise an
>    error on unprotected lua_State [3]). Netherless, there is no panic

Ditto.

>    at now due to two facts. The first one is LuaJIT's support of C++

And sublist is strongly recommended for "the first" and "the second"
facts. It'll make this War and Peace part more readable.

>    exception handling [4] that allows to raise an error in Lua and
>    catch it in C++ or vice versa. But documentation still doesn't
>    permit errors on unprotected coroutines (at least we must set
>    try-catch block). The second one is double monkey-patching of LuaJIT

I doubt this is monkey-patching, but rather a monkey's patch.

>    to restore currently executed coroutine, when C function or fast
>    function raises an error [5][6] (see related issue here [7][8]).
>    For these reasons, when an error occurs, the unwinder searches and
>    finds the C-protected stack frame from the `lua_cpcall()` for the
>    `tarantool_L` coroutine and unwinds until that point (without
>    aforementioned patches LuaJIT just calls a panic function and exit).
> 4) If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then
>    the error from `port->L` coroutine is converted into a Tarantool error
>    and a diagnostic is set.
> 
> The auxiliary usage of `tarantool_L` coroutine is redundant and doesn't

Well, as for me, the main problem is violating Lua coroutine practice
(it's worth to mention, that internal unwinder used on M1 is not such
flexible, so this misuse still leads to panic call), and the minor
reason is tarantool_L usage redundancy.

> respect Lua idiomatic of usage. So this patch drops it and uses only
> minor coroutine instead with `lua_pcall()`.
> 
> Functions to encode are saved as entrance in the `LUA_REGISTRY` table
> to reduce GC pressure, like it is done for other handlers [9].
> 
> [1]: https://www.lua.org/manual/5.2/manual.html#4.6
> > If an error happens outside any protected environment, Lua calls a
> > panic function

Some copy-pasting artefacts.

> [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
> [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
> [4]: https://luajit.org/extensions.html#exceptions
> [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
> [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
> [7]: https://github.com/tarantool/tarantool/issues/1516
> [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
> [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216

This patch resolves #6248, BTW.

> ---
> 
> Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> See the benchmarks sources here [1].
> 
> Before patch:
> | Encode map: 189851357 mcs, 15.8 K ps
> | Encode seq: 187926351 mcs, 16.0 K ps
> | Encode str: 185451675 mcs, 16.2 K ps
> | Encode dig: 184833396 mcs, 16.2 K ps
> 
> After patch:
> | Encode map: 187814261 mcs, 16.0 K ps
> | Encode seq: 183755028 mcs, 16.3 K ps
> | Encode str: 181571626 mcs, 16.5 K ps
> | Encode dig: 181572998 mcs, 16.5 K ps
> 
> Looks like the perf doesn't degrade at least.

At first, I would like to emphasize that we have no option for merging
or not the fix for this issue.

Re benchmarks: It's worth to mention you're measuring two performance
critical changes: <lua_insert> effect and lower GC pressure. So, it's
interesting to see the following benchmarks:
* one with disabled GC and GC stats
* one with considerable amount of elements on Lua stack, but not
  triggering stack resize (AFAIU, 200 is too much)

Here are my points:
* There is no such huge increase as a result of reducing GC pressure
* Moving 1-5 8-byte elements is neglible for performance
* Moving 200(*) elements as a result of the guest stack resize affects
  both patched and vanilla versions
* <lua_insert> measurements are affected by resizing (considering your
  perf stats)

Anyway, though these are kinda independent changes, we see no
performance degradation using both of them in a single patch, so I guess
we have no reason to worry about.

(*) I'm not sure about the exact amount of the elements to be moved.

> 
> [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> 
>  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 56 insertions(+), 15 deletions(-)
> 
> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> index 0315e720c..3b2572096 100644
> --- a/src/box/lua/call.c
> +++ b/src/box/lua/call.c
> @@ -61,6 +61,8 @@ enum handlers {
>  	HANDLER_CALL,
>  	HANDLER_CALL_BY_REF,
>  	HANDLER_EVAL,
> +	HANDLER_ENCODE_CALL,
> +	HANDLER_ENCODE_CALL_16,

Minor: Entries are not sorted.

>  	HANDLER_MAX,
>  };
>  
> @@ -400,11 +402,26 @@ struct encode_lua_ctx {
>  	struct mpstream *stream;
>  };
>  
> +/**
> + * Encode call results to msgpack from Lua stack.
> + * Lua stack has the following structure -- the last element is
> + * lightuserdata pointer to encode_lua_ctx, all other values are
> + * arguments to proceed.

Typo: s/proceed/process/.

> + * The function encodes all given Lua objects to msgpack stream
> + * from context, sets port's size and returns no value on the Lua
> + * stack.
> + * XXX: This function *MUST* be called under lua_pcall(), because
> + * luamp_encode() may raise an error.
> + */

<snipped>

> +/**
> + * Encode call_16 results to msgpack from Lua stack.
> + * Lua stack has the following structure -- the last element is
> + * lightuserdata pointer to encode_lua_ctx, all other values are
> + * arguments to proceed.

Ditto.

> + * The function encodes all given Lua objects to msgpack stream
> + * from context, sets port's size and returns no value on the Lua
> + * stack.
> + * XXX: This function *MUST* be called under lua_pcall(), because
> + * luamp_encode() may raise an error.
> + */

<snipped>

> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>  	struct encode_lua_ctx ctx;
>  	ctx.port = port;
>  	ctx.stream = stream;
> -	struct lua_State *L = tarantool_L;
> -	int top = lua_gettop(L);
> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> -		luaT_toerror(port->L);
> +	lua_State *L = port->L;
> +	/*
> +	 * At the moment Lua stack holds only values to encode.
> +	 * Insert corresponding encoder to the bottom and push
> +	 * encode context as lightuserdata to the top.
> +	 */
> +	const int size = lua_gettop(L);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> +	lua_insert(L, 1);

Side note: I don't like this line, like all we do. The only better
approach in my head is to implement something similar to VARG frame, but
I doubt this approach have no pure Lua violations. Let's return to this
later a bit.

> +	lua_pushlightuserdata(L, &ctx);
> +	/* nargs -- all arguments + lightuserdata. */
> +	if (luaT_call(L, size + 1, 0) != 0)
>  		return -1;
> -	}
> -	lua_settop(L, top);
>  	return port->size;
>  }
>  

<snipped>

> @@ -1049,6 +1088,8 @@ box_lua_call_init(struct lua_State *L)
>  		[HANDLER_CALL] = execute_lua_call,
>  		[HANDLER_CALL_BY_REF] = execute_lua_call_by_ref,
>  		[HANDLER_EVAL] = execute_lua_eval,
> +		[HANDLER_ENCODE_CALL] = encode_lua_call,
> +		[HANDLER_ENCODE_CALL_16] = encode_lua_call_16,

Minor: Entries are also not sorted.

>  	};
>  
>  	for (int i = 0; i < HANDLER_MAX; i++) {
> -- 
> 2.31.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-08-01 12:34 ` Igor Munkin via Tarantool-patches
@ 2021-08-02 14:25   ` Sergey Kaplun via Tarantool-patches
  2021-08-12 17:35     ` Igor Munkin via Tarantool-patches
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-02 14:25 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

Hi, Igor!

Thanks for the review!

On 01.08.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! The changes are fine, but please consider the
> comments below.
> 
> On 18.06.21, Sergey Kaplun wrote:
> > The old code flow was the following:
> > 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
> >    arguments to encode to MessagePack
> > 2) The main coroutine `tarantool_L` is used with `lua_cpcall()` to call
> >    `encode_lua_call_16()` or `encode_lua_call()`
> > 3) Objects on minor coroutine are encoded via `luamp_encode_call16()` or
> 
> What is this minor coroutine? To make terms more consistent in scope of
> this commit message, let's define this in the first bullet and use it
> everywhere. For me the best option is "port coroutine".

Fixed.

> 
> >    `luamp_encode()`. This encoding may raise an error on unprotected
> 
> The next bullet is recommended to be started from this sentence.

Fixed.

> 
> >    `port->L` coroutine. This coroutine has no protected frame on it
> >    and this call should fail in pure Lua. It is forbidden to call
> 
> Ditto.

Fixed.

> 
> >    anything on unprotected coroutine [1] (Lua 5.1 sets protection only
> >    for specific lua_State [2] and calls a panic function if we raise an
> >    error on unprotected lua_State [3]). Netherless, there is no panic
> 
> Ditto.

Fixed.

> 
> >    at now due to two facts. The first one is LuaJIT's support of C++
> 
> And sublist is strongly recommended for "the first" and "the second"
> facts. It'll make this War and Peace part more readable.

Done.

> 
> >    exception handling [4] that allows to raise an error in Lua and
> >    catch it in C++ or vice versa. But documentation still doesn't
> >    permit errors on unprotected coroutines (at least we must set
> >    try-catch block). The second one is double monkey-patching of LuaJIT
> 
> I doubt this is monkey-patching, but rather a monkey's patch.

Changed to patching.

> 
> >    to restore currently executed coroutine, when C function or fast
> >    function raises an error [5][6] (see related issue here [7][8]).
> >    For these reasons, when an error occurs, the unwinder searches and
> >    finds the C-protected stack frame from the `lua_cpcall()` for the
> >    `tarantool_L` coroutine and unwinds until that point (without
> >    aforementioned patches LuaJIT just calls a panic function and exit).
> > 4) If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then
> >    the error from `port->L` coroutine is converted into a Tarantool error
> >    and a diagnostic is set.
> > 
> > The auxiliary usage of `tarantool_L` coroutine is redundant and doesn't
> 
> Well, as for me, the main problem is violating Lua coroutine practice
> (it's worth to mention, that internal unwinder used on M1 is not such
> flexible, so this misuse still leads to panic call), and the minor
> reason is tarantool_L usage redundancy.

Changed considering your comments.

> 
> > respect Lua idiomatic of usage. So this patch drops it and uses only
> > minor coroutine instead with `lua_pcall()`.
> > 
> > Functions to encode are saved as entrance in the `LUA_REGISTRY` table
> > to reduce GC pressure, like it is done for other handlers [9].
> > 
> > [1]: https://www.lua.org/manual/5.2/manual.html#4.6
> > > If an error happens outside any protected environment, Lua calls a
> > > panic function
> 
> Some copy-pasting artefacts.

Moved upwards.

> 
> > [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
> > [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
> > [4]: https://luajit.org/extensions.html#exceptions
> > [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
> > [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
> > [7]: https://github.com/tarantool/tarantool/issues/1516
> > [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
> > [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
> 
> This patch resolves #6248, BTW.

Added.

> 

The new commit message is the following:

===================================================================
lua: refactor port_lua_do_dump and encode_lua_call

The old code flow was the following:

1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
   arguments to encode to MessagePack.

2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
   or `encode_lua_call_16`() via `lua_cpcall()`.

3) Objects on port coroutine are encoded via `luamp_encode()` or
   `luamp_encode_call16()`.

4) This encoding may raise an error on unprotected `port->L` coroutine.
   This coroutine has no protected frame on it and this call should fail
   in pure Lua.

Calling anything on unprotected coroutine is not allowed in Lua [1]:

| If an error happens outside any protected environment, Lua calls a
| panic function

Lua 5.1 sets protection only for specific lua_State [2] and calls a
panic function if we raise an error on unprotected lua_State [3].

Nevertheless, no panic occurs now due to two facts:
* The first one is LuaJIT's support of C++ exception handling [4] that
  allows to raise an error in Lua and catch it in C++ or vice versa. But
  documentation still doesn't allow raising errors on unprotected
  coroutines (at least we must use try-catch block).
* The second one is the patch made in LuaJIT to restore currently
  executed coroutine, when C function or fast function raises an
  error [5][6] (see the related issue here [7][8]).

For these reasons, when an error occurs, the unwinder searches and finds
the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
coroutine and unwinds until that point (without aforementioned patches
LuaJIT just calls a panic function and exit).

If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
error from `port->L` coroutine is converted into a Tarantool error and a
diagnostic is set.

The auxiliary usage of `tarantool_L` coroutine doesn't respect Lua
idiomatic of usage. Internal unwinder used on M1 is not such flexible,
so such misuse leads to panic call. Also the `tarantool_L` usage is
redundant. So this patch drops it and uses only minor coroutine instead
with `lua_pcall()`.

Functions to encode are saved as entrance in the `LUA_REGISTRY` table to
reduce GC pressure, like it is done for other handlers [9].

[1]: https://www.lua.org/manual/5.2/manual.html#4.6
[2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
[3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
[4]: https://luajit.org/extensions.html#exceptions
[5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
[6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
[7]: https://github.com/tarantool/tarantool/issues/1516
[8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
[9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216

Closes #6248
Closes #4617
===================================================================

> > ---
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> > See the benchmarks sources here [1].
> > 
> > Before patch:
> > | Encode map: 189851357 mcs, 15.8 K ps
> > | Encode seq: 187926351 mcs, 16.0 K ps
> > | Encode str: 185451675 mcs, 16.2 K ps
> > | Encode dig: 184833396 mcs, 16.2 K ps
> > 
> > After patch:
> > | Encode map: 187814261 mcs, 16.0 K ps
> > | Encode seq: 183755028 mcs, 16.3 K ps
> > | Encode str: 181571626 mcs, 16.5 K ps
> > | Encode dig: 181572998 mcs, 16.5 K ps
> > 
> > Looks like the perf doesn't degrade at least.
> 
> At first, I would like to emphasize that we have no option for merging
> or not the fix for this issue.
> 
> Re benchmarks: It's worth to mention you're measuring two performance
> critical changes: <lua_insert> effect and lower GC pressure. So, it's
> interesting to see the following benchmarks:
> * one with disabled GC and GC stats

Here the results with disabled GC:
Before patch:

Encode map: 4679394 mcs, 21.4 K ps
Encode seq: 4559824 mcs, 21.9 K ps
Encode str: 4574213 mcs, 21.9 K ps
Encode dig: 4595043 mcs, 21.8 K ps
Encode mul: 5978444 mcs, 16.7 K ps

After:

Encode map: 4739110 mcs, 21.1 K ps
Encode seq: 4528261 mcs, 22.1 K ps
Encode str: 4576910 mcs, 21.8 K ps
Encode dig: 4506142 mcs, 22.2 K ps
Encode mul: 6016659 mcs, 16.6 K ps

I suppose, that values are almost the same, at least within the margin
of error.
Note: I reduced amount of iterations 30 times. So inaccuracy increased.

> * one with considerable amount of elements on Lua stack, but not
>   triggering stack resize (AFAIU, 200 is too much)

Tried with 40 items on the stack:

Without GC:

Master:
Encode mul: 4895280 mcs, 20.4 K ps

Branch:
Encode mul: 4896076 mcs, 20.4 K ps

With GC:

Master:
Encode mul: 5123580 mcs, 19.5 K ps

Branch:
Encode mul: 5050863 mcs, 19.8 K ps

Seems pretty equal too.

> 
> Here are my points:
> * There is no such huge increase as a result of reducing GC pressure
> * Moving 1-5 8-byte elements is neglible for performance
> * Moving 200(*) elements as a result of the guest stack resize affects
>   both patched and vanilla versions
> * <lua_insert> measurements are affected by resizing (considering your
>   perf stats)
> 
> Anyway, though these are kinda independent changes, we see no
> performance degradation using both of them in a single patch, so I guess
> we have no reason to worry about.
> 
> (*) I'm not sure about the exact amount of the elements to be moved.

Exactly 200.

> 
> > 
> > [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> > 
> >  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 56 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> > index 0315e720c..3b2572096 100644
> > --- a/src/box/lua/call.c
> > +++ b/src/box/lua/call.c
> > @@ -61,6 +61,8 @@ enum handlers {
> >  	HANDLER_CALL,
> >  	HANDLER_CALL_BY_REF,
> >  	HANDLER_EVAL,
> > +	HANDLER_ENCODE_CALL,
> > +	HANDLER_ENCODE_CALL_16,
> 
> Minor: Entries are not sorted.

Fixed here and below. See the iterative patch:
===================================================================
diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index 3b2572096..5db17359d 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -60,9 +60,9 @@
 enum handlers {
         HANDLER_CALL,
         HANDLER_CALL_BY_REF,
-        HANDLER_EVAL,
         HANDLER_ENCODE_CALL,
         HANDLER_ENCODE_CALL_16,
+        HANDLER_EVAL,
         HANDLER_MAX,
 };
 
@@ -1087,9 +1087,9 @@ box_lua_call_init(struct lua_State *L)
         lua_CFunction handles[] = {
                 [HANDLER_CALL] = execute_lua_call,
                 [HANDLER_CALL_BY_REF] = execute_lua_call_by_ref,
-                [HANDLER_EVAL] = execute_lua_eval,
                 [HANDLER_ENCODE_CALL] = encode_lua_call,
                 [HANDLER_ENCODE_CALL_16] = encode_lua_call_16,
+                [HANDLER_EVAL] = execute_lua_eval,
         };
 
         for (int i = 0; i < HANDLER_MAX; i++) {
===================================================================

> 
> >  	HANDLER_MAX,
> >  };
> >  
> > @@ -400,11 +402,26 @@ struct encode_lua_ctx {
> >  	struct mpstream *stream;
> >  };
> >  
> > +/**
> > + * Encode call results to msgpack from Lua stack.
> > + * Lua stack has the following structure -- the last element is
> > + * lightuserdata pointer to encode_lua_ctx, all other values are
> > + * arguments to proceed.
> 
> Typo: s/proceed/process/.

Fixes. See the iterative patch below.

===================================================================
diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index 5db17359d..857b57165 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -406,7 +406,7 @@ struct encode_lua_ctx {
  * Encode call results to msgpack from Lua stack.
  * Lua stack has the following structure -- the last element is
  * lightuserdata pointer to encode_lua_ctx, all other values are
- * arguments to proceed.
+ * arguments to process.
  * The function encodes all given Lua objects to msgpack stream
  * from context, sets port's size and returns no value on the Lua
  * stack.
@@ -442,7 +442,7 @@ encode_lua_call(lua_State *L)
  * Encode call_16 results to msgpack from Lua stack.
  * Lua stack has the following structure -- the last element is
  * lightuserdata pointer to encode_lua_ctx, all other values are
- * arguments to proceed.
+ * arguments to process.
  * The function encodes all given Lua objects to msgpack stream
  * from context, sets port's size and returns no value on the Lua
  * stack.
===================================================================

> 
> > + * The function encodes all given Lua objects to msgpack stream
> > + * from context, sets port's size and returns no value on the Lua
> > + * stack.
> > + * XXX: This function *MUST* be called under lua_pcall(), because
> > + * luamp_encode() may raise an error.
> > + */
> 
> <snipped>
> 
> > +/**
> > + * Encode call_16 results to msgpack from Lua stack.
> > + * Lua stack has the following structure -- the last element is
> > + * lightuserdata pointer to encode_lua_ctx, all other values are
> > + * arguments to proceed.
> 
> Ditto.
> 
> > + * The function encodes all given Lua objects to msgpack stream
> > + * from context, sets port's size and returns no value on the Lua
> > + * stack.
> > + * XXX: This function *MUST* be called under lua_pcall(), because
> > + * luamp_encode() may raise an error.
> > + */
> 
> <snipped>
> 
> > @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >  	struct encode_lua_ctx ctx;
> >  	ctx.port = port;
> >  	ctx.stream = stream;
> > -	struct lua_State *L = tarantool_L;
> > -	int top = lua_gettop(L);
> > -	if (lua_cpcall(L, handler, &ctx) != 0) {
> > -		luaT_toerror(port->L);
> > +	lua_State *L = port->L;
> > +	/*
> > +	 * At the moment Lua stack holds only values to encode.
> > +	 * Insert corresponding encoder to the bottom and push
> > +	 * encode context as lightuserdata to the top.
> > +	 */
> > +	const int size = lua_gettop(L);
> > +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> > +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> > +	lua_insert(L, 1);
> 
> Side note: I don't like this line, like all we do. The only better
> approach in my head is to implement something similar to VARG frame, but
> I doubt this approach have no pure Lua violations. Let's return to this
> later a bit.
> 
> > +	lua_pushlightuserdata(L, &ctx);
> > +	/* nargs -- all arguments + lightuserdata. */
> > +	if (luaT_call(L, size + 1, 0) != 0)
> >  		return -1;
> > -	}
> > -	lua_settop(L, top);
> >  	return port->size;
> >  }
> >  
> 
> <snipped>
> 
> > @@ -1049,6 +1088,8 @@ box_lua_call_init(struct lua_State *L)
> >  		[HANDLER_CALL] = execute_lua_call,
> >  		[HANDLER_CALL_BY_REF] = execute_lua_call_by_ref,
> >  		[HANDLER_EVAL] = execute_lua_eval,
> > +		[HANDLER_ENCODE_CALL] = encode_lua_call,
> > +		[HANDLER_ENCODE_CALL_16] = encode_lua_call_16,
> 
> Minor: Entries are also not sorted.
> 
> >  	};
> >  
> >  	for (int i = 0; i < HANDLER_MAX; i++) {
> > -- 
> > 2.31.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-01 12:34 ` Igor Munkin via Tarantool-patches
@ 2021-08-04 22:29 ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-14 10:16 ` Igor Munkin via Tarantool-patches
  3 siblings, 0 replies; 13+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-04 22:29 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the patch!

LGTM.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-08-02 14:25   ` Sergey Kaplun via Tarantool-patches
@ 2021-08-12 17:35     ` Igor Munkin via Tarantool-patches
  2021-08-13  7:30       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-12 17:35 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergey,

Thanks for the fixes! LGTM, with a several typos in the commit message
mentioned below. Moreover, please rebase your branch on the current
master to check nothing is broken.

On 02.08.21, Sergey Kaplun wrote:
> Hi, Igor!
> 
> Thanks for the review!
> 

<snipped>

> 
> The new commit message is the following:
> 
> ===================================================================
> lua: refactor port_lua_do_dump and encode_lua_call
> 
> The old code flow was the following:
> 
> 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
>    arguments to encode to MessagePack.
> 
> 2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
>    or `encode_lua_call_16`() via `lua_cpcall()`.
> 
> 3) Objects on port coroutine are encoded via `luamp_encode()` or
>    `luamp_encode_call16()`.
> 
> 4) This encoding may raise an error on unprotected `port->L` coroutine.
>    This coroutine has no protected frame on it and this call should fail
>    in pure Lua.
> 
> Calling anything on unprotected coroutine is not allowed in Lua [1]:
> 
> | If an error happens outside any protected environment, Lua calls a
> | panic function
> 
> Lua 5.1 sets protection only for specific lua_State [2] and calls a
> panic function if we raise an error on unprotected lua_State [3].
> 
> Nevertheless, no panic occurs now due to two facts:
> * The first one is LuaJIT's support of C++ exception handling [4] that
>   allows to raise an error in Lua and catch it in C++ or vice versa. But
>   documentation still doesn't allow raising errors on unprotected
>   coroutines (at least we must use try-catch block).
> * The second one is the patch made in LuaJIT to restore currently
>   executed coroutine, when C function or fast function raises an
>   error [5][6] (see the related issue here [7][8]).
> 
> For these reasons, when an error occurs, the unwinder searches and finds
> the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
> coroutine and unwinds until that point (without aforementioned patches
> LuaJIT just calls a panic function and exit).

Typo: s/exit/exits/.

> 
> If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
> error from `port->L` coroutine is converted into a Tarantool error and a
> diagnostic is set.
> 
> The auxiliary usage of `tarantool_L` coroutine doesn't respect Lua
> idiomatic of usage. Internal unwinder used on M1 is not such flexible,

Typo: Too much "usage", so I propose the following wording for the
previous sentence:
| Such auxiliary usage of `tarantool_L` is not idiomatic for Lua.

> so such misuse leads to panic call. Also the `tarantool_L` usage is
> redundant. So this patch drops it and uses only minor coroutine instead

Typo: Again, not minor coroutine, but port coroutine (as we agreed in
the previous review).

> with `lua_pcall()`.
> 
> Functions to encode are saved as entrance in the `LUA_REGISTRY` table to

Typo: s/saved as entrance in/saved to/.

> reduce GC pressure, like it is done for other handlers [9].
> 
> [1]: https://www.lua.org/manual/5.2/manual.html#4.6
> [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
> [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
> [4]: https://luajit.org/extensions.html#exceptions
> [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
> [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
> [7]: https://github.com/tarantool/tarantool/issues/1516
> [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
> [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
> 
> Closes #6248
> Closes #4617
> ===================================================================
> 
> > > ---
> > > 
> > > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> > > See the benchmarks sources here [1].
> > > 
> > > Before patch:
> > > | Encode map: 189851357 mcs, 15.8 K ps
> > > | Encode seq: 187926351 mcs, 16.0 K ps
> > > | Encode str: 185451675 mcs, 16.2 K ps
> > > | Encode dig: 184833396 mcs, 16.2 K ps
> > > 
> > > After patch:
> > > | Encode map: 187814261 mcs, 16.0 K ps
> > > | Encode seq: 183755028 mcs, 16.3 K ps
> > > | Encode str: 181571626 mcs, 16.5 K ps
> > > | Encode dig: 181572998 mcs, 16.5 K ps
> > > 
> > > Looks like the perf doesn't degrade at least.
> > 
> > At first, I would like to emphasize that we have no option for merging
> > or not the fix for this issue.
> > 
> > Re benchmarks: It's worth to mention you're measuring two performance
> > critical changes: <lua_insert> effect and lower GC pressure. So, it's
> > interesting to see the following benchmarks:
> > * one with disabled GC and GC stats
> 
> Here the results with disabled GC:
> Before patch:
> 
> Encode map: 4679394 mcs, 21.4 K ps
> Encode seq: 4559824 mcs, 21.9 K ps
> Encode str: 4574213 mcs, 21.9 K ps
> Encode dig: 4595043 mcs, 21.8 K ps
> Encode mul: 5978444 mcs, 16.7 K ps
> 
> After:
> 
> Encode map: 4739110 mcs, 21.1 K ps
> Encode seq: 4528261 mcs, 22.1 K ps
> Encode str: 4576910 mcs, 21.8 K ps
> Encode dig: 4506142 mcs, 22.2 K ps
> Encode mul: 6016659 mcs, 16.6 K ps
> 
> I suppose, that values are almost the same, at least within the margin
> of error.
> Note: I reduced amount of iterations 30 times. So inaccuracy increased.
> 
> > * one with considerable amount of elements on Lua stack, but not
> >   triggering stack resize (AFAIU, 200 is too much)
> 
> Tried with 40 items on the stack:
> 
> Without GC:
> 
> Master:
> Encode mul: 4895280 mcs, 20.4 K ps
> 
> Branch:
> Encode mul: 4896076 mcs, 20.4 K ps
> 
> With GC:
> 
> Master:
> Encode mul: 5123580 mcs, 19.5 K ps
> 
> Branch:
> Encode mul: 5050863 mcs, 19.8 K ps
> 
> Seems pretty equal too.

Mystery. Anyway, the current performance is not lost and this is great!

> 
> > 
> > Here are my points:
> > * There is no such huge increase as a result of reducing GC pressure
> > * Moving 1-5 8-byte elements is neglible for performance
> > * Moving 200(*) elements as a result of the guest stack resize affects
> >   both patched and vanilla versions
> > * <lua_insert> measurements are affected by resizing (considering your
> >   perf stats)
> > 
> > Anyway, though these are kinda independent changes, we see no
> > performance degradation using both of them in a single patch, so I guess
> > we have no reason to worry about.
> > 
> > (*) I'm not sure about the exact amount of the elements to be moved.
> 
> Exactly 200.
> 

<snipped>

> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-08-12 17:35     ` Igor Munkin via Tarantool-patches
@ 2021-08-13  7:30       ` Sergey Kaplun via Tarantool-patches
  2021-08-13  7:41         ` Vitaliia Ioffe via Tarantool-patches
  2021-08-13  9:27         ` Sergey Kaplun via Tarantool-patches
  0 siblings, 2 replies; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-13  7:30 UTC (permalink / raw)
  To: Igor Munkin; +Cc: Vladislav Shpilevoy, tarantool-patches

Igor,

On 12.08.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the fixes! LGTM, with a several typos in the commit message
> mentioned below. Moreover, please rebase your branch on the current
> master to check nothing is broken.

Branch is rebased, fixes are done, and branch is force-pushed :).

> 
> On 02.08.21, Sergey Kaplun wrote:
> > Hi, Igor!
> > 
> > Thanks for the review!
> > 
> 
> <snipped>
> 
> > 
> > The new commit message is the following:
> > 
> > ===================================================================
> > lua: refactor port_lua_do_dump and encode_lua_call
> > 
> > The old code flow was the following:
> > 
> > 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
> >    arguments to encode to MessagePack.
> > 
> > 2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
> >    or `encode_lua_call_16`() via `lua_cpcall()`.
> > 
> > 3) Objects on port coroutine are encoded via `luamp_encode()` or
> >    `luamp_encode_call16()`.
> > 
> > 4) This encoding may raise an error on unprotected `port->L` coroutine.
> >    This coroutine has no protected frame on it and this call should fail
> >    in pure Lua.
> > 
> > Calling anything on unprotected coroutine is not allowed in Lua [1]:
> > 
> > | If an error happens outside any protected environment, Lua calls a
> > | panic function
> > 
> > Lua 5.1 sets protection only for specific lua_State [2] and calls a
> > panic function if we raise an error on unprotected lua_State [3].
> > 
> > Nevertheless, no panic occurs now due to two facts:
> > * The first one is LuaJIT's support of C++ exception handling [4] that
> >   allows to raise an error in Lua and catch it in C++ or vice versa. But
> >   documentation still doesn't allow raising errors on unprotected
> >   coroutines (at least we must use try-catch block).
> > * The second one is the patch made in LuaJIT to restore currently
> >   executed coroutine, when C function or fast function raises an
> >   error [5][6] (see the related issue here [7][8]).
> > 
> > For these reasons, when an error occurs, the unwinder searches and finds
> > the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
> > coroutine and unwinds until that point (without aforementioned patches
> > LuaJIT just calls a panic function and exit).
> 
> Typo: s/exit/exits/.

Fixed.

> 
> > 
> > If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
> > error from `port->L` coroutine is converted into a Tarantool error and a
> > diagnostic is set.
> > 
> > The auxiliary usage of `tarantool_L` coroutine doesn't respect Lua
> > idiomatic of usage. Internal unwinder used on M1 is not such flexible,
> 
> Typo: Too much "usage", so I propose the following wording for the
> previous sentence:
> | Such auxiliary usage of `tarantool_L` is not idiomatic for Lua.

Done.

> 
> > so such misuse leads to panic call. Also the `tarantool_L` usage is
> > redundant. So this patch drops it and uses only minor coroutine instead
> 
> Typo: Again, not minor coroutine, but port coroutine (as we agreed in
> the previous review).

Fixed.

> 
> > with `lua_pcall()`.
> > 
> > Functions to encode are saved as entrance in the `LUA_REGISTRY` table to
> 
> Typo: s/saved as entrance in/saved to/.

Fixed.

> 
> > reduce GC pressure, like it is done for other handlers [9].
> > 
> > [1]: https://www.lua.org/manual/5.2/manual.html#4.6
> > [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
> > [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
> > [4]: https://luajit.org/extensions.html#exceptions
> > [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
> > [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
> > [7]: https://github.com/tarantool/tarantool/issues/1516
> > [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
> > [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
> > 
> > Closes #6248
> > Closes #4617
> > ===================================================================
> > 
> > > > ---
> > > > 
> > > > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call

<snipped>


-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches]  [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-08-13  7:30       ` Sergey Kaplun via Tarantool-patches
@ 2021-08-13  7:41         ` Vitaliia Ioffe via Tarantool-patches
  2021-08-13  9:27         ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 13+ messages in thread
From: Vitaliia Ioffe via Tarantool-patches @ 2021-08-13  7:41 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches, Vladislav Shpilevoy

[-- Attachment #1: Type: text/plain, Size: 4552 bytes --]


Hi team, 
 
QA LGTM!
 
 
--
Vitaliia Ioffe
 
  
>Пятница, 13 августа 2021, 10:32 +03:00 от Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>:
> 
>Igor,
>
>On 12.08.21, Igor Munkin wrote:
>> Sergey,
>>
>> Thanks for the fixes! LGTM, with a several typos in the commit message
>> mentioned below. Moreover, please rebase your branch on the current
>> master to check nothing is broken.
>
>Branch is rebased, fixes are done, and branch is force-pushed :).
>
>>
>> On 02.08.21, Sergey Kaplun wrote:
>> > Hi, Igor!
>> >
>> > Thanks for the review!
>> >
>>
>> <snipped>
>>
>> >
>> > The new commit message is the following:
>> >
>> > ===================================================================
>> > lua: refactor port_lua_do_dump and encode_lua_call
>> >
>> > The old code flow was the following:
>> >
>> > 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
>> > arguments to encode to MessagePack.
>> >
>> > 2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
>> > or `encode_lua_call_16`() via `lua_cpcall()`.
>> >
>> > 3) Objects on port coroutine are encoded via `luamp_encode()` or
>> > `luamp_encode_call16()`.
>> >
>> > 4) This encoding may raise an error on unprotected `port->L` coroutine.
>> > This coroutine has no protected frame on it and this call should fail
>> > in pure Lua.
>> >
>> > Calling anything on unprotected coroutine is not allowed in Lua [1]:
>> >
>> > | If an error happens outside any protected environment, Lua calls a
>> > | panic function
>> >
>> > Lua 5.1 sets protection only for specific lua_State [2] and calls a
>> > panic function if we raise an error on unprotected lua_State [3].
>> >
>> > Nevertheless, no panic occurs now due to two facts:
>> > * The first one is LuaJIT's support of C++ exception handling [4] that
>> > allows to raise an error in Lua and catch it in C++ or vice versa. But
>> > documentation still doesn't allow raising errors on unprotected
>> > coroutines (at least we must use try-catch block).
>> > * The second one is the patch made in LuaJIT to restore currently
>> > executed coroutine, when C function or fast function raises an
>> > error [5][6] (see the related issue here [7][8]).
>> >
>> > For these reasons, when an error occurs, the unwinder searches and finds
>> > the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
>> > coroutine and unwinds until that point (without aforementioned patches
>> > LuaJIT just calls a panic function and exit).
>>
>> Typo: s/exit/exits/.
>
>Fixed.
>
>>
>> >
>> > If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
>> > error from `port->L` coroutine is converted into a Tarantool error and a
>> > diagnostic is set.
>> >
>> > The auxiliary usage of `tarantool_L` coroutine doesn't respect Lua
>> > idiomatic of usage. Internal unwinder used on M1 is not such flexible,
>>
>> Typo: Too much "usage", so I propose the following wording for the
>> previous sentence:
>> | Such auxiliary usage of `tarantool_L` is not idiomatic for Lua.
>
>Done.
>
>>
>> > so such misuse leads to panic call. Also the `tarantool_L` usage is
>> > redundant. So this patch drops it and uses only minor coroutine instead
>>
>> Typo: Again, not minor coroutine, but port coroutine (as we agreed in
>> the previous review).
>
>Fixed.
>
>>
>> > with `lua_pcall()`.
>> >
>> > Functions to encode are saved as entrance in the `LUA_REGISTRY` table to
>>
>> Typo: s/saved as entrance in/saved to/.
>
>Fixed.
>
>>
>> > reduce GC pressure, like it is done for other handlers [9].
>> >
>> > [1]:  https://www.lua.org/manual/5.2/manual.html#4.6
>> > [2]:  https://www.lua.org/source/5.1/lstate.h.html#lua_State
>> > [3]:  https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
>> > [4]:  https://luajit.org/extensions.html#exceptions
>> > [5]:  https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
>> > [6]:  https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
>> > [7]:  https://github.com/tarantool/tarantool/issues/1516
>> > [8]:  https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
>> > [9]:  https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
>> >
>> > Closes #6248
>> > Closes #4617
>> > ===================================================================
>> >
>> > > > ---
>> > > >
>> > > > Branch:  https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
>
><snipped>
>
>
>--
>Best regards,
>Sergey Kaplun
 

[-- Attachment #2: Type: text/html, Size: 6778 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-08-13  7:30       ` Sergey Kaplun via Tarantool-patches
  2021-08-13  7:41         ` Vitaliia Ioffe via Tarantool-patches
@ 2021-08-13  9:27         ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 13+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-13  9:27 UTC (permalink / raw)
  To: Igor Munkin, Vladislav Shpilevoy, Vitaliia Ioffe, tarantool-patches

Also, add the branch for 1.10:
https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call-1.10

The difference is that the port structure itself is pushed instead ctx
structure on the stack.

The patch is the following:
===================================================================
commit 59eafb8e6d8f41de06bdb28b930c9ff1d3fbe63e
Author: Sergey Kaplun <skaplun@tarantool.org>
Date:   Fri Jun 18 18:28:30 2021 +0300

    lua: refactor port_lua_do_dump and encode_lua_call

    (cherry picked from commit 434d31bdb52d80384af55acd2c3a637e5154e257)

    The old code flow was the following:

    1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
       arguments to encode to MessagePack.

    2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
       or `encode_lua_call_16`() via `lua_cpcall()`.

    3) Objects on port coroutine are encoded via `luamp_encode()` or
       `luamp_encode_call16()`.

    4) This encoding may raise an error on unprotected `port->L` coroutine.
       This coroutine has no protected frame on it and this call should fail
       in pure Lua.

    Calling anything on unprotected coroutine is not allowed in Lua [1]:

    | If an error happens outside any protected environment, Lua calls a
    | panic function

    Lua 5.1 sets protection only for specific lua_State [2] and calls a
    panic function if we raise an error on unprotected lua_State [3].

    Nevertheless, no panic occurs now due to two facts:
    * The first one is LuaJIT's support of C++ exception handling [4] that
      allows to raise an error in Lua and catch it in C++ or vice versa. But
      documentation still doesn't allow raising errors on unprotected
      coroutines (at least we must use try-catch block).
    * The second one is the patch made in LuaJIT to restore currently
      executed coroutine, when C function or fast function raises an
      error [5][6] (see the related issue here [7][8]).

    For these reasons, when an error occurs, the unwinder searches and finds
    the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
    coroutine and unwinds until that point (without aforementioned patches
    LuaJIT just calls a panic function and exits).

    If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
    error from `port->L` coroutine is converted into a Tarantool error and a
    diagnostic is set.

    Such auxiliary usage of `tarantool_L` is not idiomatic for Lua.
    Internal unwinder used on M1 is not such flexible, so such misuse leads
    to panic call. Also the `tarantool_L` usage is redundant. So this patch
    drops it and uses only port coroutine instead with `lua_pcall()`.

    Functions to encode are saved to the `LUA_REGISTRY` table to
    reduce GC pressure, like it is done for other handlers [9].

    [1]: https://www.lua.org/manual/5.2/manual.html#4.6
    [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
    [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
    [4]: https://luajit.org/extensions.html#exceptions
    [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
    [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
    [7]: https://github.com/tarantool/tarantool/issues/1516
    [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
    [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216

    Closes #6248
    Closes #4617

diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index 593317265..f173d97b7 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -53,6 +53,8 @@
  */
 enum handlers {
 	HANDLER_CALL,
+	HANDLER_ENCODE_CALL,
+	HANDLER_ENCODE_CALL_16,
 	HANDLER_EVAL,
 	HANDLER_MAX,
 };
@@ -347,10 +349,26 @@ execute_lua_eval(lua_State *L)
 	return lua_gettop(L);
 }
 
+
+/**
+ * Encode call results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to process.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct port_lua *port = (struct port_lua *) lua_topointer(L, -1);
+	assert(port->L == L);
+	/* Delete port from the stack. */
+	lua_pop(L, 1);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
@@ -361,18 +379,33 @@ encode_lua_call(lua_State *L)
 		      luamp_error, port->L);
 
 	struct luaL_serializer *cfg = luaL_msgpack_default;
-	int size = lua_gettop(port->L);
+	const int size = lua_gettop(L);
 	for (int i = 1; i <= size; ++i)
-		luamp_encode(port->L, cfg, &stream, i);
+		luamp_encode(L, cfg, &stream, i);
 	port->size = size;
 	mpstream_flush(&stream);
 	return 0;
 }
 
+/**
+ * Encode call_16 results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to process.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call_16(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct port_lua *port = (struct port_lua *) lua_topointer(L, -1);
+	assert(port->L == L);
+	/* Delete port from the stack. */
+	lua_pop(L, 1);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
@@ -383,42 +416,45 @@ encode_lua_call_16(lua_State *L)
 		      luamp_error, port->L);
 
 	struct luaL_serializer *cfg = luaL_msgpack_default;
-	port->size = luamp_encode_call_16(port->L, cfg, &stream);
+	port->size = luamp_encode_call_16(L, cfg, &stream);
 	mpstream_flush(&stream);
 	return 0;
 }
 
 static inline int
-port_lua_do_dump(struct port *base, struct obuf *out, lua_CFunction handler)
+port_lua_do_dump(struct port *base, struct obuf *out, enum handlers handler)
 {
 	struct port_lua *port = (struct port_lua *)base;
 	assert(port->vtab == &port_lua_vtab);
 	/* Use port to pass arguments to encoder quickly. */
 	port->out = out;
+	lua_State *L = port->L;
 	/*
-	 * Use the same global state, assuming the encoder doesn't
-	 * yield.
+	 * At the moment Lua stack holds only values to encode.
+	 * Insert corresponding encoder to the bottom and push
+	 * encode context as lightuserdata to the top.
 	 */
-	struct lua_State *L = tarantool_L;
-	int top = lua_gettop(L);
-	if (lua_cpcall(L, handler, port) != 0) {
-		luaT_toerror(port->L);
+	const int size = lua_gettop(L);
+	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
+	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
+	lua_insert(L, 1);
+	lua_pushlightuserdata(L, port);
+	/* nargs -- all arguments + lightuserdata. */
+	if (luaT_call(L, size + 1, 0) != 0)
 		return -1;
-	}
-	lua_settop(L, top);
 	return port->size;
 }
 
 static int
 port_lua_dump(struct port *base, struct obuf *out)
 {
-	return port_lua_do_dump(base, out, encode_lua_call);
+	return port_lua_do_dump(base, out, HANDLER_ENCODE_CALL);
 }
 
 static int
 port_lua_dump_16(struct port *base, struct obuf *out)
 {
-	return port_lua_do_dump(base, out, encode_lua_call_16);
+	return port_lua_do_dump(base, out, HANDLER_ENCODE_CALL_16);
 }
 
 static void
@@ -499,6 +535,8 @@ box_lua_call_init(struct lua_State *L)
 
 	lua_CFunction handles[] = {
 		[HANDLER_CALL] = execute_lua_call,
+		[HANDLER_ENCODE_CALL] = encode_lua_call,
+		[HANDLER_ENCODE_CALL_16] = encode_lua_call_16,
 		[HANDLER_EVAL] = execute_lua_eval,
 	};
 
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-08-04 22:29 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-14 10:16 ` Igor Munkin via Tarantool-patches
  3 siblings, 0 replies; 13+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-14 10:16 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Vladislav Shpilevoy, tarantool-patches

Sergey,

I've checked the patch into 1.10, 2.7, 2.8 and master.

On 18.06.21, Sergey Kaplun wrote:
> The old code flow was the following:
> 1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
>    arguments to encode to MessagePack
> 2) The main coroutine `tarantool_L` is used with `lua_cpcall()` to call
>    `encode_lua_call_16()` or `encode_lua_call()`
> 3) Objects on minor coroutine are encoded via `luamp_encode_call16()` or
>    `luamp_encode()`. This encoding may raise an error on unprotected
>    `port->L` coroutine. This coroutine has no protected frame on it
>    and this call should fail in pure Lua. It is forbidden to call
>    anything on unprotected coroutine [1] (Lua 5.1 sets protection only
>    for specific lua_State [2] and calls a panic function if we raise an
>    error on unprotected lua_State [3]). Netherless, there is no panic
>    at now due to two facts. The first one is LuaJIT's support of C++
>    exception handling [4] that allows to raise an error in Lua and
>    catch it in C++ or vice versa. But documentation still doesn't
>    permit errors on unprotected coroutines (at least we must set
>    try-catch block). The second one is double monkey-patching of LuaJIT
>    to restore currently executed coroutine, when C function or fast
>    function raises an error [5][6] (see related issue here [7][8]).
>    For these reasons, when an error occurs, the unwinder searches and
>    finds the C-protected stack frame from the `lua_cpcall()` for the
>    `tarantool_L` coroutine and unwinds until that point (without
>    aforementioned patches LuaJIT just calls a panic function and exit).
> 4) If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then
>    the error from `port->L` coroutine is converted into a Tarantool error
>    and a diagnostic is set.
> 
> The auxiliary usage of `tarantool_L` coroutine is redundant and doesn't
> respect Lua idiomatic of usage. So this patch drops it and uses only
> minor coroutine instead with `lua_pcall()`.
> 
> Functions to encode are saved as entrance in the `LUA_REGISTRY` table
> to reduce GC pressure, like it is done for other handlers [9].
> 
> [1]: https://www.lua.org/manual/5.2/manual.html#4.6
> > If an error happens outside any protected environment, Lua calls a
> > panic function
> [2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
> [3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
> [4]: https://luajit.org/extensions.html#exceptions
> [5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
> [6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
> [7]: https://github.com/tarantool/tarantool/issues/1516
> [8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
> [9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
> ---
> 
> Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> See the benchmarks sources here [1].
> 
> Before patch:
> | Encode map: 189851357 mcs, 15.8 K ps
> | Encode seq: 187926351 mcs, 16.0 K ps
> | Encode str: 185451675 mcs, 16.2 K ps
> | Encode dig: 184833396 mcs, 16.2 K ps
> 
> After patch:
> | Encode map: 187814261 mcs, 16.0 K ps
> | Encode seq: 183755028 mcs, 16.3 K ps
> | Encode str: 181571626 mcs, 16.5 K ps
> | Encode dig: 181572998 mcs, 16.5 K ps
> 
> Looks like the perf doesn't degrade at least.
> 
> [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> 
>  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 56 insertions(+), 15 deletions(-)
> 

<snipped>

> -- 
> 2.31.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-08-14 10:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches
2021-08-01 12:34 ` Igor Munkin via Tarantool-patches
2021-08-02 14:25   ` Sergey Kaplun via Tarantool-patches
2021-08-12 17:35     ` Igor Munkin via Tarantool-patches
2021-08-13  7:30       ` Sergey Kaplun via Tarantool-patches
2021-08-13  7:41         ` Vitaliia Ioffe via Tarantool-patches
2021-08-13  9:27         ` Sergey Kaplun via Tarantool-patches
2021-08-04 22:29 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-14 10:16 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox