Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
@ 2021-06-18 18:14 Sergey Kaplun via Tarantool-patches
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-18 18:14 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Igor Munkin; +Cc: tarantool-patches

The old code flow was the following:
1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
   arguments to encode to MessagePack
2) The main coroutine `tarantool_L` is used with `lua_cpcall()` to call
   `encode_lua_call_16()` or `encode_lua_call()`
3) Objects on minor coroutine are encoded via `luamp_encode_call16()` or
   `luamp_encode()`. This encoding may raise an error on unprotected
   `port->L` coroutine. This coroutine has no protected frame on it
   and this call should fail in pure Lua. It is forbidden to call
   anything on unprotected coroutine [1] (Lua 5.1 sets protection only
   for specific lua_State [2] and calls a panic function if we raise an
   error on unprotected lua_State [3]). Netherless, there is no panic
   at now due to two facts. The first one is LuaJIT's support of C++
   exception handling [4] that allows to raise an error in Lua and
   catch it in C++ or vice versa. But documentation still doesn't
   permit errors on unprotected coroutines (at least we must set
   try-catch block). The second one is double monkey-patching of LuaJIT
   to restore currently executed coroutine, when C function or fast
   function raises an error [5][6] (see related issue here [7][8]).
   For these reasons, when an error occurs, the unwinder searches and
   finds the C-protected stack frame from the `lua_cpcall()` for the
   `tarantool_L` coroutine and unwinds until that point (without
   aforementioned patches LuaJIT just calls a panic function and exit).
4) If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then
   the error from `port->L` coroutine is converted into a Tarantool error
   and a diagnostic is set.

The auxiliary usage of `tarantool_L` coroutine is redundant and doesn't
respect Lua idiomatic of usage. So this patch drops it and uses only
minor coroutine instead with `lua_pcall()`.

Functions to encode are saved as entrance in the `LUA_REGISTRY` table
to reduce GC pressure, like it is done for other handlers [9].

[1]: https://www.lua.org/manual/5.2/manual.html#4.6
> If an error happens outside any protected environment, Lua calls a
> panic function
[2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
[3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
[4]: https://luajit.org/extensions.html#exceptions
[5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
[6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
[7]: https://github.com/tarantool/tarantool/issues/1516
[8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
[9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216
---

Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
See the benchmarks sources here [1].

Before patch:
| Encode map: 189851357 mcs, 15.8 K ps
| Encode seq: 187926351 mcs, 16.0 K ps
| Encode str: 185451675 mcs, 16.2 K ps
| Encode dig: 184833396 mcs, 16.2 K ps

After patch:
| Encode map: 187814261 mcs, 16.0 K ps
| Encode seq: 183755028 mcs, 16.3 K ps
| Encode str: 181571626 mcs, 16.5 K ps
| Encode dig: 181572998 mcs, 16.5 K ps

Looks like the perf doesn't degrade at least.

[1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3

 src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 56 insertions(+), 15 deletions(-)

diff --git a/src/box/lua/call.c b/src/box/lua/call.c
index 0315e720c..3b2572096 100644
--- a/src/box/lua/call.c
+++ b/src/box/lua/call.c
@@ -61,6 +61,8 @@ enum handlers {
 	HANDLER_CALL,
 	HANDLER_CALL_BY_REF,
 	HANDLER_EVAL,
+	HANDLER_ENCODE_CALL,
+	HANDLER_ENCODE_CALL_16,
 	HANDLER_MAX,
 };
 
@@ -400,11 +402,26 @@ struct encode_lua_ctx {
 	struct mpstream *stream;
 };
 
+/**
+ * Encode call results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to proceed.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct encode_lua_ctx *ctx =
-		(struct encode_lua_ctx *) lua_topointer(L, 1);
+		(struct encode_lua_ctx *) lua_topointer(L, -1);
+	assert(ctx->port->L == L);
+	/* Delete ctx from the stack. */
+	lua_pop(L, 1);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
@@ -413,33 +430,48 @@ encode_lua_call(lua_State *L)
 	struct luaL_serializer *cfg = luaL_msgpack_default;
 	const struct serializer_opts *opts =
 		&current_session()->meta.serializer_opts;
-	int size = lua_gettop(ctx->port->L);
+	const int size = lua_gettop(L);
 	for (int i = 1; i <= size; ++i)
-		luamp_encode(ctx->port->L, cfg, opts, ctx->stream, i);
+		luamp_encode(L, cfg, opts, ctx->stream, i);
 	ctx->port->size = size;
 	mpstream_flush(ctx->stream);
 	return 0;
 }
 
+/**
+ * Encode call_16 results to msgpack from Lua stack.
+ * Lua stack has the following structure -- the last element is
+ * lightuserdata pointer to encode_lua_ctx, all other values are
+ * arguments to proceed.
+ * The function encodes all given Lua objects to msgpack stream
+ * from context, sets port's size and returns no value on the Lua
+ * stack.
+ * XXX: This function *MUST* be called under lua_pcall(), because
+ * luamp_encode() may raise an error.
+ */
 static int
 encode_lua_call_16(lua_State *L)
 {
+	assert(lua_islightuserdata(L, -1));
 	struct encode_lua_ctx *ctx =
-		(struct encode_lua_ctx *) lua_topointer(L, 1);
+		(struct encode_lua_ctx *) lua_topointer(L, -1);
+	/* Delete ctx from the stack. */
+	lua_pop(L, 1);
+	assert(ctx->port->L == L);
 	/*
 	 * Add all elements from Lua stack to the buffer.
 	 *
 	 * TODO: forbid explicit yield from __serialize or __index here
 	 */
 	struct luaL_serializer *cfg = luaL_msgpack_default;
-	ctx->port->size = luamp_encode_call_16(ctx->port->L, cfg, ctx->stream);
+	ctx->port->size = luamp_encode_call_16(L, cfg, ctx->stream);
 	mpstream_flush(ctx->stream);
 	return 0;
 }
 
 static inline int
 port_lua_do_dump(struct port *base, struct mpstream *stream,
-		 lua_CFunction handler)
+		 enum handlers handler)
 {
 	struct port_lua *port = (struct port_lua *) base;
 	assert(port->vtab == &port_lua_vtab);
@@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
 	struct encode_lua_ctx ctx;
 	ctx.port = port;
 	ctx.stream = stream;
-	struct lua_State *L = tarantool_L;
-	int top = lua_gettop(L);
-	if (lua_cpcall(L, handler, &ctx) != 0) {
-		luaT_toerror(port->L);
+	lua_State *L = port->L;
+	/*
+	 * At the moment Lua stack holds only values to encode.
+	 * Insert corresponding encoder to the bottom and push
+	 * encode context as lightuserdata to the top.
+	 */
+	const int size = lua_gettop(L);
+	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
+	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
+	lua_insert(L, 1);
+	lua_pushlightuserdata(L, &ctx);
+	/* nargs -- all arguments + lightuserdata. */
+	if (luaT_call(L, size + 1, 0) != 0)
 		return -1;
-	}
-	lua_settop(L, top);
 	return port->size;
 }
 
@@ -467,7 +506,7 @@ port_lua_dump(struct port *base, struct obuf *out)
 	struct mpstream stream;
 	mpstream_init(&stream, out, obuf_reserve_cb, obuf_alloc_cb,
 		      luamp_error, port->L);
-	return port_lua_do_dump(base, &stream, encode_lua_call);
+	return port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL);
 }
 
 static int
@@ -477,7 +516,7 @@ port_lua_dump_16(struct port *base, struct obuf *out)
 	struct mpstream stream;
 	mpstream_init(&stream, out, obuf_reserve_cb, obuf_alloc_cb,
 		      luamp_error, port->L);
-	return port_lua_do_dump(base, &stream, encode_lua_call_16);
+	return port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL_16);
 }
 
 static void
@@ -501,7 +540,7 @@ port_lua_get_msgpack(struct port *base, uint32_t *size)
 	mpstream_init(&stream, region, region_reserve_cb, region_alloc_cb,
 		      luamp_error, port->L);
 	mpstream_encode_array(&stream, lua_gettop(port->L));
-	int rc = port_lua_do_dump(base, &stream, encode_lua_call);
+	int rc = port_lua_do_dump(base, &stream, HANDLER_ENCODE_CALL);
 	if (rc < 0) {
 		region_truncate(region, region_svp);
 		return NULL;
@@ -1049,6 +1088,8 @@ box_lua_call_init(struct lua_State *L)
 		[HANDLER_CALL] = execute_lua_call,
 		[HANDLER_CALL_BY_REF] = execute_lua_call_by_ref,
 		[HANDLER_EVAL] = execute_lua_eval,
+		[HANDLER_ENCODE_CALL] = encode_lua_call,
+		[HANDLER_ENCODE_CALL_16] = encode_lua_call_16,
 	};
 
 	for (int i = 0; i < HANDLER_MAX; i++) {
-- 
2.31.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
@ 2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-21 20:36 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the patch!

> Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> See the benchmarks sources here [1].
> 
> Before patch:
> | Encode map: 189851357 mcs, 15.8 K ps
> | Encode seq: 187926351 mcs, 16.0 K ps
> | Encode str: 185451675 mcs, 16.2 K ps
> | Encode dig: 184833396 mcs, 16.2 K ps
> 
> After patch:
> | Encode map: 187814261 mcs, 16.0 K ps
> | Encode seq: 183755028 mcs, 16.3 K ps
> | Encode str: 181571626 mcs, 16.5 K ps
> | Encode dig: 181572998 mcs, 16.5 K ps
> 
> Looks like the perf doesn't degrade at least.
> 
> [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> 
>  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 56 insertions(+), 15 deletions(-)
> 
> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> index 0315e720c..3b2572096 100644
> --- a/src/box/lua/call.c
> +++ b/src/box/lua/call.c
> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>  	struct encode_lua_ctx ctx;
>  	ctx.port = port;
>  	ctx.stream = stream;
> -	struct lua_State *L = tarantool_L;
> -	int top = lua_gettop(L);
> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> -		luaT_toerror(port->L);
> +	lua_State *L = port->L;
> +	/*
> +	 * At the moment Lua stack holds only values to encode.
> +	 * Insert corresponding encoder to the bottom and push
> +	 * encode context as lightuserdata to the top.
> +	 */
> +	const int size = lua_gettop(L);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> +	lua_insert(L, 1);

If I remember correctly, this leads to moving all the existing
stack elements forward. Which might be expensive. I know from
Vlad Grubov's words that they have code with hundreds of values in
multireturn from stored functions. Did you bench what happens when
the Lua coroutine contains a lot of values? In the bench by the
link above I see only 1-element array and map. Size of the array
and map does not matter though. Only multireturn is interesting
here. Like 'return 1, 2, 3, ...'.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
  2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-22 13:38 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi! Thanks for the review!

On 21.06.21, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-refactor-lua-call
> > See the benchmarks sources here [1].
> > 
> > Before patch:
> > | Encode map: 189851357 mcs, 15.8 K ps
> > | Encode seq: 187926351 mcs, 16.0 K ps
> > | Encode str: 185451675 mcs, 16.2 K ps
> > | Encode dig: 184833396 mcs, 16.2 K ps
> > 
> > After patch:
> > | Encode map: 187814261 mcs, 16.0 K ps
> > | Encode seq: 183755028 mcs, 16.3 K ps
> > | Encode str: 181571626 mcs, 16.5 K ps
> > | Encode dig: 181572998 mcs, 16.5 K ps
> > 
> > Looks like the perf doesn't degrade at least.
> > 
> > [1]: https://gist.github.com/Buristan/3e6d6bf2c722874bec55a8c5a44b98f3
> > 
> >  src/box/lua/call.c | 71 ++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 56 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> > index 0315e720c..3b2572096 100644
> > --- a/src/box/lua/call.c
> > +++ b/src/box/lua/call.c
> > @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >  	struct encode_lua_ctx ctx;
> >  	ctx.port = port;
> >  	ctx.stream = stream;
> > -	struct lua_State *L = tarantool_L;
> > -	int top = lua_gettop(L);
> > -	if (lua_cpcall(L, handler, &ctx) != 0) {
> > -		luaT_toerror(port->L);
> > +	lua_State *L = port->L;
> > +	/*
> > +	 * At the moment Lua stack holds only values to encode.
> > +	 * Insert corresponding encoder to the bottom and push
> > +	 * encode context as lightuserdata to the top.
> > +	 */
> > +	const int size = lua_gettop(L);
> > +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> > +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> > +	lua_insert(L, 1);
> 
> If I remember correctly, this leads to moving all the existing
> stack elements forward. Which might be expensive. I know from
> Vlad Grubov's words that they have code with hundreds of values in
> multireturn from stored functions. Did you bench what happens when
> the Lua coroutine contains a lot of values? In the bench by the
> link above I see only 1-element array and map. Size of the array
> and map does not matter though. Only multireturn is interesting
> here. Like 'return 1, 2, 3, ...'.

I've added this benchmark (200 numbers to return) to the gist.
Local results for the bench:

Master:
| Encode mul: 237900953 mcs, 12.6 K ps

My branch:
| Encode mul: 235735350 mcs, 12.7 K ps

`luamp_encode()` has the biggest impact in `port_do_lua_dump()`
(`lua_insert()` costs ~0.1% of the whole runtime).

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
@ 2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-24 20:00 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Thanks for the investigation!

>>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
>>> index 0315e720c..3b2572096 100644
>>> --- a/src/box/lua/call.c
>>> +++ b/src/box/lua/call.c
>>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>>>  	struct encode_lua_ctx ctx;
>>>  	ctx.port = port;
>>>  	ctx.stream = stream;
>>> -	struct lua_State *L = tarantool_L;
>>> -	int top = lua_gettop(L);
>>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
>>> -		luaT_toerror(port->L);
>>> +	lua_State *L = port->L;
>>> +	/*
>>> +	 * At the moment Lua stack holds only values to encode.
>>> +	 * Insert corresponding encoder to the bottom and push
>>> +	 * encode context as lightuserdata to the top.
>>> +	 */
>>> +	const int size = lua_gettop(L);
>>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
>>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
>>> +	lua_insert(L, 1);
>>
>> If I remember correctly, this leads to moving all the existing
>> stack elements forward. Which might be expensive. I know from
>> Vlad Grubov's words that they have code with hundreds of values in
>> multireturn from stored functions. Did you bench what happens when
>> the Lua coroutine contains a lot of values? In the bench by the
>> link above I see only 1-element array and map. Size of the array
>> and map does not matter though. Only multireturn is interesting
>> here. Like 'return 1, 2, 3, ...'.
> 
> I've added this benchmark (200 numbers to return) to the gist.
> Local results for the bench:
> 
> Master:
> | Encode mul: 237900953 mcs, 12.6 K ps
> 
> My branch:
> | Encode mul: 235735350 mcs, 12.7 K ps
> 
> `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> (`lua_insert()` costs ~0.1% of the whole runtime).

That looks suspicious. Why does it have such a small impact? Did
you check that multireturn really made the stack bigger? Shouldn't
it have linear complexity depending on the stack size after the
target index?

The patch looks fine to me, if the benchmarks show there is even a
slight perf win. But I think Igor should take another look.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
  2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-06-29  7:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Thanks for the questions!

On 24.06.21, Vladislav Shpilevoy wrote:
> Thanks for the investigation!
> 
> >>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> >>> index 0315e720c..3b2572096 100644
> >>> --- a/src/box/lua/call.c
> >>> +++ b/src/box/lua/call.c
> >>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >>>  	struct encode_lua_ctx ctx;
> >>>  	ctx.port = port;
> >>>  	ctx.stream = stream;
> >>> -	struct lua_State *L = tarantool_L;
> >>> -	int top = lua_gettop(L);
> >>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> >>> -		luaT_toerror(port->L);
> >>> +	lua_State *L = port->L;
> >>> +	/*
> >>> +	 * At the moment Lua stack holds only values to encode.
> >>> +	 * Insert corresponding encoder to the bottom and push
> >>> +	 * encode context as lightuserdata to the top.
> >>> +	 */
> >>> +	const int size = lua_gettop(L);
> >>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> >>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> >>> +	lua_insert(L, 1);
> >>
> >> If I remember correctly, this leads to moving all the existing
> >> stack elements forward. Which might be expensive. I know from
> >> Vlad Grubov's words that they have code with hundreds of values in
> >> multireturn from stored functions. Did you bench what happens when
> >> the Lua coroutine contains a lot of values? In the bench by the
> >> link above I see only 1-element array and map. Size of the array
> >> and map does not matter though. Only multireturn is interesting
> >> here. Like 'return 1, 2, 3, ...'.
> > 
> > I've added this benchmark (200 numbers to return) to the gist.
> > Local results for the bench:
> > 
> > Master:
> > | Encode mul: 237900953 mcs, 12.6 K ps
> > 
> > My branch:
> > | Encode mul: 235735350 mcs, 12.7 K ps
> > 
> > `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> > (`lua_insert()` costs ~0.1% of the whole runtime).
> 
> That looks suspicious. Why does it have such a small impact? Did

I've run the benchmark for return case only with `perf stat` (and 5K
return values) and found three interesting metrics (others are pretty
the same, up to 0.07%):

Master:
| 10,255,748,650 cache-misses      # 13.686 % of all cache refs         (12.49%)
| 1,364,762,310  LLC-load-misses   # 15.92% of all LL-cache accesses    (12.50%)
| 401,136,504    iTLB-load-misses  # 39.68% of all iTLB cache accesses  (12.51%)

My branch:
| 9,964,939,197  cache-misses      # 13.413 % of all cache refs         (12.52%)
| 1,296,545,699  LLC-load-misses   # 15.08% of all LL-cache accesses    (12.50%)
| 285,852,865    iTLB-load-misses  # 29.13% of all iTLB cache accesses  (12.49%)

1) LLC (last-line cache, L2 in my case) misses is slightly decreasing.
I suppose that all insert movement operations use cache (L1 or L2)
instead memory access, that make it rather cheaply.

2) iTLB (instruction translation look-aside buffer) [1] needs to avoid
penalty of page walks to a page table. So looks like within this patch
code locality is improved (may be due to the fact of using
`luaT_call()` instead of 'lua_cpcall()` and 'luaT_toerror()`).

> you check that multireturn really made the stack bigger? Shouldn't

Yes, I did.

> it have linear complexity depending on the stack size after the
> target index?

Yes, with 5K items to return it becomes pretty equal or even a little
bit slower (up to ~1%) than the original code. But encoding is still
taking the biggest part of the runtime.

Side note: if user wants to return 5K values (or a table with 5K values)
from Tarantool we can't stop him, but only to offer him our empathy.

> 
> The patch looks fine to me, if the benchmarks show there is even a
> slight perf win. But I think Igor should take another look.

[1]: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/front-end-bound/itlb-overhead.html

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-06-29  7:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18 18:14 [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call Sergey Kaplun via Tarantool-patches
2021-06-21 20:36 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-22 13:38   ` Sergey Kaplun via Tarantool-patches
2021-06-24 20:00     ` Vladislav Shpilevoy via Tarantool-patches
2021-06-29  7:07       ` Sergey Kaplun via Tarantool-patches

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git