Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
@ 2020-07-20 11:28 Igor Munkin
  2020-07-20 12:10 ` Timur Safin
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Igor Munkin @ 2020-07-20 11:28 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Sergey Ostanevich; +Cc: tarantool-patches

<luaT_newthread> created a new GCfunc object for the helper invoked in a
protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
call. The change introduces a static reference to a GCfunc object for
<luaT_newthread_wrapper> to be initialized on Tarantool startup to
reduce Lua GC memory usage.

Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
created Lua coroutine need to be pushed back to prevent its sweep. So
to reduce guest stack manipulations <lua_cpcall> is replaced with
<lua_pcall> and the resulting Lua thread is obtained via guest stack.

Signed-off-by: Igor Munkin <imun@tarantool.org>
---
Recently we discussed with Timur his struggling with linking his binary
Lua module against Tarantool. The reason is LuaJIT internals usage for
manipulations with the guest stack that are not provided by the binary.
I glanced the current <luaT_newthread> implementation and found out two
another problems related to the platform overall performance (as it is
proved with the corresponding benchmarks).

The first problem is the similar one <box_process_lua> had prior to the
patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
function to be called in protected frame. However this function is the
same throughout the platform uptime. It can be created on Taranool
startup and I see no reason to clobber GC that way.

Another problem I found are excess manipulations with the guest stack:
one need the newly born coroutine on it to prevent it being collected by
GC, but <lua_cpcall> purges everything left on the stack in scope of the
invoked function. As a result the obtained Lua coroutine is "pushed"
back to the guest stack via LuaJIT internal interfaces. It's a bit
ridiculous, since one can just use public API to get the same results:
Lua coroutine on the guest stack and the corresponding pointer to it.

I tested the platform performance with the same benchmark[2] I made for
the <box_process_lua> patch and here are the numbers I obtained after
the 15 runs:
* Vanilla bleeding master (mean):
| ===== 2.5.0-267-gbf047ad44 =====
| call by ref GC: 921877 Kb
| call by ref time: 1.340172 sec
| call GC: 476563.991667 Kb
| eval GC: 655274.935547 Kb
* Patched bleeding master (mean):
| ===== 2.5.0-268-gec0eb12f4 =====
| call by ref GC: 859377 Kb
| call by ref time: 1.215410 sec
| call GC: 445313 Kb
| eval GC: 624024 Kb
* Relative measurements (before -> after):
| call by ref GC: -6% (-62500 Kb)
| call by ref time: -9% (-0.124762 sec)
| call GC: -6% (-31250 Kb)
| eval GC: -4% (-31250 Kb)

There is one hot path I left unverified -- Lua-born fibers creation, but
I guess the relative numbers are quite similar to the ones I mentioned
above. However, if one wonders these results, feel free to ask me.

Branch: https://github.com/tarantool/tarantool/tree/imun/experimental-luaT-newthread

@ChangeLog:
* Improved safe Lua coroutine creation for the case of fiber
  initialization. Prepared GCfunc object is used instead of temporary
  one, resulting in 3-6% garbage collection reduction. Also excess guest
  stack manipulations are removed.

 src/lua/utils.c | 32 ++++++++++++++++++++++++++++++++
 src/lua/utils.h | 38 ++++++++++----------------------------
 2 files changed, 42 insertions(+), 28 deletions(-)

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017834.html
[2]: https://gist.github.com/igormunkin/c941074fa9fdf0f7a4f068498fb5e24c

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d7257..23ccbc3c9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -43,6 +43,8 @@ int luaL_nil_ref = LUA_REFNIL;
 int luaL_map_metatable_ref = LUA_REFNIL;
 int luaL_array_metatable_ref = LUA_REFNIL;
 
+static int luaT_newthread_ref = LUA_NOREF;
+
 static uint32_t CTID_STRUCT_IBUF;
 static uint32_t CTID_STRUCT_IBUF_PTR;
 static uint32_t CTID_CHAR_PTR;
@@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
 
 /* }}} */
 
+/**
+ * @brief A wrapper for <lua_newthread> to be called via luaT_call
+ * in luaT_newthread. Whether new Lua coroutine is created it is
+ * returned on the top of the guest stack.
+ * @param L is a Lua state
+ * @sa <lua_newthread>
+ */
+static int
+luaT_newthread_wrapper(lua_State *L)
+{
+	(void)lua_newthread(L);
+	return 1;
+}
+
+lua_State *
+luaT_newthread(lua_State *L)
+{
+	assert(luaT_newthread_ref != LUA_NOREF);
+	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
+	assert(lua_isfunction(L, -1));
+	if (luaT_call(L, 0, 1) != 0)
+		return NULL;
+	lua_State *L1 = lua_tothread(L, -1);
+	assert(L1 != NULL);
+	return L1;
+}
+
 int
 tarantool_lua_utils_init(struct lua_State *L)
 {
@@ -1274,5 +1303,8 @@ tarantool_lua_utils_init(struct lua_State *L)
 	(void) rc;
 	CTID_UUID = luaL_ctypeid(L, "struct tt_uuid");
 	assert(CTID_UUID != 0);
+
+	lua_pushcfunction(L, luaT_newthread_wrapper);
+	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
 	return 0;
 }
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e4a..404eafe9b 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -597,34 +597,16 @@ luaL_checkfinite(struct lua_State *L, struct luaL_serializer *cfg,
 }
 
 /**
- * @brief A wrapper for lua_newthread() to pass it into luaT_cpcall
- * @param L is a Lua State
- * @sa lua_newthread()
- */
-static inline int
-luaT_newthread_wrapper(lua_State *L)
-{
-	*(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
-	return 0;
-}
-
-/**
- * @brief Safe wrapper for lua_newthread()
- * @param L is a Lua State
- * @sa lua_newthread()
- */
-static inline lua_State *
-luaT_newthread(lua_State *L)
-{
-	lua_State *L1 = NULL;
-	if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
-		return NULL;
-	}
-	assert(L1 != NULL);
-	setthreadV(L, L->top, L1);
-	incr_top(L);
-	return L1;
-}
+ * @brief Creates a new Lua coroutine in a protected frame. If
+ * <lua_newthread> call underneath succeeds, the created Lua state
+ * is on the top of the guest stack and a pointer to this state is
+ * returned. Otherwise LUA_ERRMEM error is handled and the result
+ * is NULL.
+ * @param L is a Lua state
+ * @sa <lua_newthread>
+ */
+lua_State *
+luaT_newthread(lua_State *L);
 
 /**
  * Check if a value on @a L stack by index @a idx is an ibuf
-- 
2.25.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
@ 2020-07-20 12:10 ` Timur Safin
  2020-07-22 11:30 ` Sergey Ostanevich
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Timur Safin @ 2020-07-20 12:10 UTC (permalink / raw)
  To: 'Igor Munkin', 'Vladislav Shpilevoy',
	'Sergey Ostanevich'
  Cc: tarantool-patches



: -----Original Message-----
: From: Igor Munkin <imun@tarantool.org>
: Subject: [PATCH] lua/utils: improve luaT_newthread performance
: 
: index b10754e4a..404eafe9b 100644

...
: --- a/src/lua/utils.h
: +++ b/src/lua/utils.h
: @@ -597,34 +597,16 @@ luaL_checkfinite(struct lua_State *L, struct
: luaL_serializer *cfg,
:  }
: 
:  /**
: - * @brief A wrapper for lua_newthread() to pass it into luaT_cpcall
: - * @param L is a Lua State
: - * @sa lua_newthread()
: - */
: -static inline int
: -luaT_newthread_wrapper(lua_State *L)
: -{
: -	*(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
: -	return 0;
: -}
: -
: -/**
: - * @brief Safe wrapper for lua_newthread()
: - * @param L is a Lua State
: - * @sa lua_newthread()
: - */
: -static inline lua_State *
: -luaT_newthread(lua_State *L)
: -{
: -	lua_State *L1 = NULL;
: -	if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
: -		return NULL;
: -	}
: -	assert(L1 != NULL);
: -	setthreadV(L, L->top, L1);
: -	incr_top(L);
: -	return L1;
: -}
: + * @brief Creates a new Lua coroutine in a protected frame. If
: + * <lua_newthread> call underneath succeeds, the created Lua state
: + * is on the top of the guest stack and a pointer to this state is
: + * returned. Otherwise LUA_ERRMEM error is handled and the result
: + * is NULL.
: + * @param L is a Lua state
: + * @sa <lua_newthread>
: + */
: +lua_State *
: +luaT_newthread(lua_State *L);
: 
:  /**
:   * Check if a value on @a L stack by index @a idx is an ibuf
: --
: 2.25.0


Woohoo! I'm all for this patch - but just to remind all that I need 
similar patch for 1.10 (because our merger moduels supposed to work 
there as well)

Timur

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
  2020-07-20 12:10 ` Timur Safin
@ 2020-07-22 11:30 ` Sergey Ostanevich
  2020-07-22 15:12   ` Igor Munkin
  2020-07-23 21:23 ` Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Sergey Ostanevich @ 2020-07-22 11:30 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

Hi!

Thanks for the patch, LGTM.

I wonder if it can be done in a more safe way in luaT_newthread() so
that if the refernece is not initialized then initialize it, rather 
assert in debug and - I suppose - fail with creation in release? I see
an overhead for condition here and making even bigger indirect 
machinery for the wrappers itself won't bring a lot either.

Regards,
Sergos


On 20 Jul 14:28, Igor Munkin wrote:
> <luaT_newthread> created a new GCfunc object for the helper invoked in a
> protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> call. The change introduces a static reference to a GCfunc object for
> <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
> 
> Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> created Lua coroutine need to be pushed back to prevent its sweep. So
> to reduce guest stack manipulations <lua_cpcall> is replaced with
> <lua_pcall> and the resulting Lua thread is obtained via guest stack.
> 
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
> Recently we discussed with Timur his struggling with linking his binary
> Lua module against Tarantool. The reason is LuaJIT internals usage for
> manipulations with the guest stack that are not provided by the binary.
> I glanced the current <luaT_newthread> implementation and found out two
> another problems related to the platform overall performance (as it is
> proved with the corresponding benchmarks).
> 
> The first problem is the similar one <box_process_lua> had prior to the
> patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> function to be called in protected frame. However this function is the
> same throughout the platform uptime. It can be created on Taranool
> startup and I see no reason to clobber GC that way.
> 
> Another problem I found are excess manipulations with the guest stack:
> one need the newly born coroutine on it to prevent it being collected by
> GC, but <lua_cpcall> purges everything left on the stack in scope of the
> invoked function. As a result the obtained Lua coroutine is "pushed"
> back to the guest stack via LuaJIT internal interfaces. It's a bit
> ridiculous, since one can just use public API to get the same results:
> Lua coroutine on the guest stack and the corresponding pointer to it.
> 
> I tested the platform performance with the same benchmark[2] I made for
> the <box_process_lua> patch and here are the numbers I obtained after
> the 15 runs:
> * Vanilla bleeding master (mean):
> | ===== 2.5.0-267-gbf047ad44 =====
> | call by ref GC: 921877 Kb
> | call by ref time: 1.340172 sec
> | call GC: 476563.991667 Kb
> | eval GC: 655274.935547 Kb
> * Patched bleeding master (mean):
> | ===== 2.5.0-268-gec0eb12f4 =====
> | call by ref GC: 859377 Kb
> | call by ref time: 1.215410 sec
> | call GC: 445313 Kb
> | eval GC: 624024 Kb
> * Relative measurements (before -> after):
> | call by ref GC: -6% (-62500 Kb)
> | call by ref time: -9% (-0.124762 sec)
> | call GC: -6% (-31250 Kb)
> | eval GC: -4% (-31250 Kb)
> 
> There is one hot path I left unverified -- Lua-born fibers creation, but
> I guess the relative numbers are quite similar to the ones I mentioned
> above. However, if one wonders these results, feel free to ask me.
> 
> Branch: https://github.com/tarantool/tarantool/tree/imun/experimental-luaT-newthread
> 
> @ChangeLog:
> * Improved safe Lua coroutine creation for the case of fiber
>   initialization. Prepared GCfunc object is used instead of temporary
>   one, resulting in 3-6% garbage collection reduction. Also excess guest
>   stack manipulations are removed.
> 
>  src/lua/utils.c | 32 ++++++++++++++++++++++++++++++++
>  src/lua/utils.h | 38 ++++++++++----------------------------
>  2 files changed, 42 insertions(+), 28 deletions(-)
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017834.html
> [2]: https://gist.github.com/igormunkin/c941074fa9fdf0f7a4f068498fb5e24c
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d7257..23ccbc3c9 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -43,6 +43,8 @@ int luaL_nil_ref = LUA_REFNIL;
>  int luaL_map_metatable_ref = LUA_REFNIL;
>  int luaL_array_metatable_ref = LUA_REFNIL;
>  
> +static int luaT_newthread_ref = LUA_NOREF;
> +
>  static uint32_t CTID_STRUCT_IBUF;
>  static uint32_t CTID_STRUCT_IBUF_PTR;
>  static uint32_t CTID_CHAR_PTR;
> @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
>  
>  /* }}} */
>  
> +/**
> + * @brief A wrapper for <lua_newthread> to be called via luaT_call
> + * in luaT_newthread. Whether new Lua coroutine is created it is
> + * returned on the top of the guest stack.
> + * @param L is a Lua state
> + * @sa <lua_newthread>
> + */
> +static int
> +luaT_newthread_wrapper(lua_State *L)
> +{
> +	(void)lua_newthread(L);
> +	return 1;
> +}
> +
> +lua_State *
> +luaT_newthread(lua_State *L)
> +{
> +	assert(luaT_newthread_ref != LUA_NOREF);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> +	assert(lua_isfunction(L, -1));
> +	if (luaT_call(L, 0, 1) != 0)
> +		return NULL;
> +	lua_State *L1 = lua_tothread(L, -1);
> +	assert(L1 != NULL);
> +	return L1;
> +}
> +
>  int
>  tarantool_lua_utils_init(struct lua_State *L)
>  {
> @@ -1274,5 +1303,8 @@ tarantool_lua_utils_init(struct lua_State *L)
>  	(void) rc;
>  	CTID_UUID = luaL_ctypeid(L, "struct tt_uuid");
>  	assert(CTID_UUID != 0);
> +
> +	lua_pushcfunction(L, luaT_newthread_wrapper);
> +	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
>  	return 0;
>  }
> diff --git a/src/lua/utils.h b/src/lua/utils.h
> index b10754e4a..404eafe9b 100644
> --- a/src/lua/utils.h
> +++ b/src/lua/utils.h
> @@ -597,34 +597,16 @@ luaL_checkfinite(struct lua_State *L, struct luaL_serializer *cfg,
>  }
>  
>  /**
> - * @brief A wrapper for lua_newthread() to pass it into luaT_cpcall
> - * @param L is a Lua State
> - * @sa lua_newthread()
> - */
> -static inline int
> -luaT_newthread_wrapper(lua_State *L)
> -{
> -	*(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
> -	return 0;
> -}
> -
> -/**
> - * @brief Safe wrapper for lua_newthread()
> - * @param L is a Lua State
> - * @sa lua_newthread()
> - */
> -static inline lua_State *
> -luaT_newthread(lua_State *L)
> -{
> -	lua_State *L1 = NULL;
> -	if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
> -		return NULL;
> -	}
> -	assert(L1 != NULL);
> -	setthreadV(L, L->top, L1);
> -	incr_top(L);
> -	return L1;
> -}
> + * @brief Creates a new Lua coroutine in a protected frame. If
> + * <lua_newthread> call underneath succeeds, the created Lua state
> + * is on the top of the guest stack and a pointer to this state is
> + * returned. Otherwise LUA_ERRMEM error is handled and the result
> + * is NULL.
> + * @param L is a Lua state
> + * @sa <lua_newthread>
> + */
> +lua_State *
> +luaT_newthread(lua_State *L);
>  
>  /**
>   * Check if a value on @a L stack by index @a idx is an ibuf
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-22 11:30 ` Sergey Ostanevich
@ 2020-07-22 15:12   ` Igor Munkin
  2020-07-24 16:15     ` Sergey Ostanevich
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Munkin @ 2020-07-22 15:12 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches, Vladislav Shpilevoy

Sergos,

Thanks for your review!

On 22.07.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch, LGTM.

Added your tag:
| Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>

> 
> I wonder if it can be done in a more safe way in luaT_newthread() so
> that if the refernece is not initialized then initialize it, rather 
> assert in debug and - I suppose - fail with creation in release? I see
> an overhead for condition here and making even bigger indirect 
> machinery for the wrappers itself won't bring a lot either.

Frankly speaking, it looks an overkill to me. What are your exact
concerns? I added those asserts mostly for self-check. This is a hot
path in call/eval request handling, fiber creation and stored procedures
calls, so Debug mode testing fails if the problem occurs. This approach
is the similar to <box_process_lua> one[1] but there are no such checks.
This static reference can't be unintentionally changed outside this
translation unit, so its usage is well-localized. My arguments are not
about performance but sanity. Do you see any flaws in this design?

> 
> Regards,
> Sergos
> 
> 
> On 20 Jul 14:28, Igor Munkin wrote:
> > <luaT_newthread> created a new GCfunc object for the helper invoked in a
> > protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> > call. The change introduces a static reference to a GCfunc object for
> > <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> > reduce Lua GC memory usage.
> > 
> > Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> > created Lua coroutine need to be pushed back to prevent its sweep. So
> > to reduce guest stack manipulations <lua_cpcall> is replaced with
> > <lua_pcall> and the resulting Lua thread is obtained via guest stack.
> > 
> > Signed-off-by: Igor Munkin <imun@tarantool.org>
> > ---
> > Recently we discussed with Timur his struggling with linking his binary
> > Lua module against Tarantool. The reason is LuaJIT internals usage for
> > manipulations with the guest stack that are not provided by the binary.
> > I glanced the current <luaT_newthread> implementation and found out two
> > another problems related to the platform overall performance (as it is
> > proved with the corresponding benchmarks).
> > 
> > The first problem is the similar one <box_process_lua> had prior to the
> > patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> > function to be called in protected frame. However this function is the
> > same throughout the platform uptime. It can be created on Taranool
> > startup and I see no reason to clobber GC that way.
> > 
> > Another problem I found are excess manipulations with the guest stack:
> > one need the newly born coroutine on it to prevent it being collected by
> > GC, but <lua_cpcall> purges everything left on the stack in scope of the
> > invoked function. As a result the obtained Lua coroutine is "pushed"
> > back to the guest stack via LuaJIT internal interfaces. It's a bit
> > ridiculous, since one can just use public API to get the same results:
> > Lua coroutine on the guest stack and the corresponding pointer to it.
> > 
> > I tested the platform performance with the same benchmark[2] I made for
> > the <box_process_lua> patch and here are the numbers I obtained after
> > the 15 runs:
> > * Vanilla bleeding master (mean):
> > | ===== 2.5.0-267-gbf047ad44 =====
> > | call by ref GC: 921877 Kb
> > | call by ref time: 1.340172 sec
> > | call GC: 476563.991667 Kb
> > | eval GC: 655274.935547 Kb
> > * Patched bleeding master (mean):
> > | ===== 2.5.0-268-gec0eb12f4 =====
> > | call by ref GC: 859377 Kb
> > | call by ref time: 1.215410 sec
> > | call GC: 445313 Kb
> > | eval GC: 624024 Kb
> > * Relative measurements (before -> after):
> > | call by ref GC: -6% (-62500 Kb)
> > | call by ref time: -9% (-0.124762 sec)
> > | call GC: -6% (-31250 Kb)
> > | eval GC: -4% (-31250 Kb)
> > 
> > There is one hot path I left unverified -- Lua-born fibers creation, but
> > I guess the relative numbers are quite similar to the ones I mentioned
> > above. However, if one wonders these results, feel free to ask me.
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/imun/experimental-luaT-newthread
> > 
> > @ChangeLog:
> > * Improved safe Lua coroutine creation for the case of fiber
> >   initialization. Prepared GCfunc object is used instead of temporary
> >   one, resulting in 3-6% garbage collection reduction. Also excess guest
> >   stack manipulations are removed.
> > 
> >  src/lua/utils.c | 32 ++++++++++++++++++++++++++++++++
> >  src/lua/utils.h | 38 ++++++++++----------------------------
> >  2 files changed, 42 insertions(+), 28 deletions(-)
> > 
> > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017834.html
> > [2]: https://gist.github.com/igormunkin/c941074fa9fdf0f7a4f068498fb5e24c
> > 
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index 0b05d7257..23ccbc3c9 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -43,6 +43,8 @@ int luaL_nil_ref = LUA_REFNIL;
> >  int luaL_map_metatable_ref = LUA_REFNIL;
> >  int luaL_array_metatable_ref = LUA_REFNIL;
> >  
> > +static int luaT_newthread_ref = LUA_NOREF;
> > +
> >  static uint32_t CTID_STRUCT_IBUF;
> >  static uint32_t CTID_STRUCT_IBUF_PTR;
> >  static uint32_t CTID_CHAR_PTR;
> > @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
> >  
> >  /* }}} */
> >  
> > +/**
> > + * @brief A wrapper for <lua_newthread> to be called via luaT_call
> > + * in luaT_newthread. Whether new Lua coroutine is created it is
> > + * returned on the top of the guest stack.
> > + * @param L is a Lua state
> > + * @sa <lua_newthread>
> > + */
> > +static int
> > +luaT_newthread_wrapper(lua_State *L)
> > +{
> > +	(void)lua_newthread(L);
> > +	return 1;
> > +}
> > +
> > +lua_State *
> > +luaT_newthread(lua_State *L)
> > +{
> > +	assert(luaT_newthread_ref != LUA_NOREF);
> > +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> > +	assert(lua_isfunction(L, -1));
> > +	if (luaT_call(L, 0, 1) != 0)
> > +		return NULL;
> > +	lua_State *L1 = lua_tothread(L, -1);
> > +	assert(L1 != NULL);
> > +	return L1;
> > +}
> > +
> >  int
> >  tarantool_lua_utils_init(struct lua_State *L)
> >  {
> > @@ -1274,5 +1303,8 @@ tarantool_lua_utils_init(struct lua_State *L)
> >  	(void) rc;
> >  	CTID_UUID = luaL_ctypeid(L, "struct tt_uuid");
> >  	assert(CTID_UUID != 0);
> > +
> > +	lua_pushcfunction(L, luaT_newthread_wrapper);
> > +	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> >  	return 0;
> >  }
> > diff --git a/src/lua/utils.h b/src/lua/utils.h
> > index b10754e4a..404eafe9b 100644
> > --- a/src/lua/utils.h
> > +++ b/src/lua/utils.h
> > @@ -597,34 +597,16 @@ luaL_checkfinite(struct lua_State *L, struct luaL_serializer *cfg,
> >  }
> >  
> >  /**
> > - * @brief A wrapper for lua_newthread() to pass it into luaT_cpcall
> > - * @param L is a Lua State
> > - * @sa lua_newthread()
> > - */
> > -static inline int
> > -luaT_newthread_wrapper(lua_State *L)
> > -{
> > -	*(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
> > -	return 0;
> > -}
> > -
> > -/**
> > - * @brief Safe wrapper for lua_newthread()
> > - * @param L is a Lua State
> > - * @sa lua_newthread()
> > - */
> > -static inline lua_State *
> > -luaT_newthread(lua_State *L)
> > -{
> > -	lua_State *L1 = NULL;
> > -	if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
> > -		return NULL;
> > -	}
> > -	assert(L1 != NULL);
> > -	setthreadV(L, L->top, L1);
> > -	incr_top(L);
> > -	return L1;
> > -}
> > + * @brief Creates a new Lua coroutine in a protected frame. If
> > + * <lua_newthread> call underneath succeeds, the created Lua state
> > + * is on the top of the guest stack and a pointer to this state is
> > + * returned. Otherwise LUA_ERRMEM error is handled and the result
> > + * is NULL.
> > + * @param L is a Lua state
> > + * @sa <lua_newthread>
> > + */
> > +lua_State *
> > +luaT_newthread(lua_State *L);
> >  
> >  /**
> >   * Check if a value on @a L stack by index @a idx is an ibuf
> > -- 
> > 2.25.0
> > 

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017836.html

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
  2020-07-20 12:10 ` Timur Safin
  2020-07-22 11:30 ` Sergey Ostanevich
@ 2020-07-23 21:23 ` Vladislav Shpilevoy
  2020-07-24 14:14   ` Igor Munkin
  2020-07-24 21:45 ` Igor Munkin
  2020-07-29 13:41 ` Kirill Yukhin
  4 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 21:23 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

Hi! Thanks for the patch!

On 20.07.2020 13:28, Igor Munkin wrote:
> <luaT_newthread> created a new GCfunc object for the helper invoked in a
> protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> call. The change introduces a static reference to a GCfunc object for
> <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
> 
> Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> created Lua coroutine need to be pushed back to prevent its sweep. So
> to reduce guest stack manipulations <lua_cpcall> is replaced with
> <lua_pcall> and the resulting Lua thread is obtained via guest stack.

I don't think I understand. Before we had one store into an 8 byte location
with lua_cpcall(). Now we have push and pop on the stack with lua_pcall().
What is the point? It seems to be worse.

> Signed-off-by: Igor Munkin <imun@tarantool.org>

This is already the second patch on removal of excessive func pushes.
I suggest you take a look at all the other lua_pushcfunction() calls.
For example, luaT_pushtuple() does it, and it is called on each tuple
push. Can be tens of thousands per seconds.

The same for luaT_pusherror, lua_field_inspect_ucdata. These should be
called often too.

> ---
> Recently we discussed with Timur his struggling with linking his binary
> Lua module against Tarantool. The reason is LuaJIT internals usage for
> manipulations with the guest stack that are not provided by the binary.
> I glanced the current <luaT_newthread> implementation and found out two
> another problems related to the platform overall performance (as it is
> proved with the corresponding benchmarks).
> 
> The first problem is the similar one <box_process_lua> had prior to the
> patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> function to be called in protected frame. However this function is the
> same throughout the platform uptime. It can be created on Taranool
> startup and I see no reason to clobber GC that way.
> 
> Another problem I found are excess manipulations with the guest stack:
> one need the newly born coroutine on it to prevent it being collected by
> GC, but <lua_cpcall> purges everything left on the stack in scope of the
> invoked function. As a result the obtained Lua coroutine is "pushed"
> back to the guest stack via LuaJIT internal interfaces. It's a bit
> ridiculous, since one can just use public API to get the same results:
> Lua coroutine on the guest stack and the corresponding pointer to it.

I understand the GC object point. But I don't understand the pcall vs cpcall.
What is the problem with lua_cpcall() removing all from the stack? We don't
need anything on the stack here. We just need to return a pointer, and this
was done fine before.

> I tested the platform performance with the same benchmark[2] I made for
> the <box_process_lua> patch and here are the numbers I obtained after
> the 15 runs:
> * Vanilla bleeding master (mean):
> | ===== 2.5.0-267-gbf047ad44 =====
> | call by ref GC: 921877 Kb
> | call by ref time: 1.340172 sec
> | call GC: 476563.991667 Kb
> | eval GC: 655274.935547 Kb
> * Patched bleeding master (mean):
> | ===== 2.5.0-268-gec0eb12f4 =====
> | call by ref GC: 859377 Kb
> | call by ref time: 1.215410 sec
> | call GC: 445313 Kb
> | eval GC: 624024 Kb
> * Relative measurements (before -> after):
> | call by ref GC: -6% (-62500 Kb)
> | call by ref time: -9% (-0.124762 sec)
> | call GC: -6% (-31250 Kb)
> | eval GC: -4% (-31250 Kb)
> 
> There is one hot path I left unverified -- Lua-born fibers creation, but
> I guess the relative numbers are quite similar to the ones I mentioned
> above. However, if one wonders these results, feel free to ask me.
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d7257..23ccbc3c9 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
>  
>  /* }}} */
>  
> +/**
> + * @brief A wrapper for <lua_newthread> to be called via luaT_call
> + * in luaT_newthread. Whether new Lua coroutine is created it is
> + * returned on the top of the guest stack.
> + * @param L is a Lua state
> + * @sa <lua_newthread>
> + */
> +static int
> +luaT_newthread_wrapper(lua_State *L)

All the other code uses 'struct lua_State' instead of 'lua_State'.
Except old code. Why isn't it so here?

> +{
> +	(void)lua_newthread(L);
> +	return 1;
> +}
> +
> +lua_State *
> +luaT_newthread(lua_State *L)
> +{
> +	assert(luaT_newthread_ref != LUA_NOREF);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> +	assert(lua_isfunction(L, -1));
> +	if (luaT_call(L, 0, 1) != 0)
> +		return NULL;
> +	lua_State *L1 = lua_tothread(L, -1);
> +	assert(L1 != NULL);
> +	return L1;
> +}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-23 21:23 ` Vladislav Shpilevoy
@ 2020-07-24 14:14   ` Igor Munkin
  2020-07-24 21:47     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Munkin @ 2020-07-24 14:14 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for your review!

On 23.07.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> On 20.07.2020 13:28, Igor Munkin wrote:
> > <luaT_newthread> created a new GCfunc object for the helper invoked in a
> > protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> > call. The change introduces a static reference to a GCfunc object for
> > <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> > reduce Lua GC memory usage.
> > 
> > Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> > created Lua coroutine need to be pushed back to prevent its sweep. So
> > to reduce guest stack manipulations <lua_cpcall> is replaced with
> > <lua_pcall> and the resulting Lua thread is obtained via guest stack.
> 
> I don't think I understand. Before we had one store into an 8 byte location
> with lua_cpcall(). Now we have push and pop on the stack with lua_pcall().
> What is the point? It seems to be worse.

No, it doesn't. It seems I need to explain this in a more precise way.

Let's start with the old implementation:
| (gdb) b luaT_newthread
| Breakpoint 1 at 0x156504: luaT_newthread. (5 locations)
| (gdb) b luaT_newthread_wrapper
| Breakpoint 2 at 0x1564c9: luaT_newthread_wrapper. (5 locations)
| (gdb) r
| <snipped>
| Breakpoint 1, luaT_newthread (L=0x40000378) at /tarantool/src/lua/utils.h:619
| 619             lua_State *L1 = NULL;
| (gdb) s
| 620             if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
| (gdb) lj-stack L
| 0x40023eb0:0x40023ed0 [    ] 5 slots: Red zone
| 0x40023ea8            [   M]
| 0x40023ac0:0x40023ea0 [    ] 125 slots: Free stack slots
| 0x40023ab8            [  T ]
| 0x40023ab0            [ B  ] VALUE: Lua function @ 0x4003da38, 4 upvalues, "@builtin/fiber.lua":67
| 0x40023aa8            [    ] FRAME: [L] delta=14, C function @ 0x55555577a9a2
| <snipped>

This is the Lua stack state, when <lbox_fiber_new> is called, but a new
Lua coroutine is not created yet.

Let's continue:
| (gdb) c
| Continuing.
|
| Breakpoint 2, luaT_newthread_wrapper (L=0x40000378) at /tarantool/src/lua/utils.h:607
| 607             *(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
| (gdb) lj-stack L
| 0x40023eb0:0x40023ed0 [    ] 5 slots: Red zone
| 0x40023ea8            [   M]
| 0x40023ad0:0x40023ea0 [    ] 123 slots: Free stack slots
| 0x40023ac8            [  T ]
| 0x40023ac0            [ B  ] VALUE: light userdata @ 0xffffd648
| 0x40023ab8            [    ] FRAME: [CP] delta=2, C function @ 0x555555779291
| 0x40023ab0            [    ] VALUE: Lua function @ 0x4003da38, 4 upvalues, "@builtin/fiber.lua":67
| 0x40023aa8            [    ] FRAME: [L] delta=14, C function @ 0x55555577a9a2
| <snipped>

This is the Lua stack state, when <luaT_newthread_wrapper> is called,
but a new Lua coroutine is still not created yet.

NB: you can see a strange address for light userdata payload
(0xffffd648), but it's a bug in the gdb extension. The valid address is
0x7fffffffd648 as you can see below.

Let's proceed further:
| (gdb) s
| lua_touserdata (L=0x40000378, idx=1) at lj_api.c:616
| <snipped>
| Run till exit from #0  lua_touserdata (L=0x40000378, idx=1) at lj_api.c:616
| 0x00005555557792af in luaT_newthread_wrapper (L=0x40000378) at /tarantool/src/lua/utils.h:607
| 607             *(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
| Value returned is $1 = (void *) 0x7fffffffd648
| (gdb) s
| lua_newthread (L=0x40000378) at lj_api.c:758
| <snipped>
| Run till exit from #0  lua_newthread (L=0x40000378) at lj_api.c:758
| luaT_newthread_wrapper (L=0x40000378) at /tarantool/src/lua/utils.h:607
| 607             *(lua_State **)lua_touserdata(L, 1) = lua_newthread(L);
| Value returned is $2 = (lua_State *) 0x4003ee28
| (gdb) s
| 608             return 0;
| (gdb) lj-stack L
| 0x40023eb0:0x40023ed0 [    ] 5 slots: Red zone
| 0x40023ea8            [   M]
| 0x40023ad8:0x40023ea0 [    ] 122 slots: Free stack slots
| 0x40023ad0            [  T ]
| 0x40023ac8            [    ] VALUE: thread @ 0x4003ee28
| 0x40023ac0            [ B  ] VALUE: light userdata @ 0xffffd648
| 0x40023ab8            [    ] FRAME: [CP] delta=2, C function @ 0x555555779291
| 0x40023ab0            [    ] VALUE: Lua function @ 0x4003da38, 4 upvalues, "@builtin/fiber.lua":67
| 0x40023aa8            [    ] FRAME: [L] delta=14, C function @ 0x55555577a9a2
| <snipped>
| (gdb) p/x *(uintptr_t *)0x7fffffffd648
| $3 = 0x4003ee28

As a result there is the light userdata payload initialized with the
created Lua coroutine address and the same coroutine in Lua stack slot
at 0x40023ac8.

Now, let's see the stack after the execution leaves the protected frame:
| luaT_newthread (L=0x40000378) at /tarantool/src/lua/utils.h:620
| 620             if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) {
| (gdb) lj-stack L
| 0x40023eb0:0x40023ed0 [    ] 5 slots: Red zone
| 0x40023ea8            [   M]
| 0x40023ac0:0x40023ea0 [    ] 125 slots: Free stack slots
| 0x40023ab8            [  T ]
| 0x40023ab0            [ B  ] VALUE: Lua function @ 0x4003da38, 4 upvalues, "@builtin/fiber.lua":67
| 0x40023aa8            [    ] FRAME: [L] delta=14, C function @ 0x55555577a9a2
| <snipped>

So, as I mentioned in the patch nothing is returned on the Lua stack.
As a result the created Lua coroutine is unreachable for GC machinery.
That means it will not be marked and can be released in the nearest GC
cycle. Yes, we anchor it to Lua registry later (in the <luaT_newthread>
caller) but considering <luaL_ref> API[1] the object to be anchored also
ought to be on the top of the stack.

Thereby the Lua coroutine obtained via L1 is put back to Lua stack:
| (gdb) n
| 624             setthreadV(L, L->top, L1);
| (gdb)
| 625             incr_top(L);
| (gdb) lj-stack L
| 0x40023eb0:0x40023ed0 [    ] 5 slots: Red zone
| 0x40023ea8            [   M]
| 0x40023ac8:0x40023ea0 [    ] 124 slots: Free stack slots
| 0x40023ac0            [  T ]
| 0x40023ab8            [    ] VALUE: thread @ 0x4003ee28
| 0x40023ab0            [ B  ] VALUE: Lua function @ 0x4003da38, 4 upvalues, "@builtin/fiber.lua":67
| 0x40023aa8            [    ] FRAME: [L] delta=14, C function @ 0x55555577a9a2
| <snipped>

My main point is regarding these guest stack manipulations: since we
need the created Lua coroutine on the Lua stack, just leave it there as
a return value of <luaT_call> (i.e. <lua_pcall>).

The differences are below:
* Lua stack state prior to <luaT_call>:
| 1249            if (luaT_call(L, 0, 1) != 0)
| (gdb) lj-stack L
| 0x40023dc8:0x40023de8 [    ] 5 slots: Red zone
| 0x40023dc0            [   M]
| 0x400239e0:0x40023db8 [    ] 124 slots: Free stack slots
| 0x400239d8            [  T ]
| 0x400239d0            [    ] VALUE: C function @ 0x555555781ff6
| 0x400239c8            [ B  ] VALUE: Lua function @ 0x4003da80, 4 upvalues, "@builtin/fiber.lua":67
| 0x400239c0            [    ] FRAME: [L] delta=14, C function @ 0x55555577a486
| <snipped>
* Lua stack state as a result of <lua_newthread> call:
| (gdb) c
| Continuing.
|
| Breakpoint 2, luaT_newthread_wrapper (L=0x40000378) at /tarantool/src/lua/utils.c:1239
| 1239            (void)lua_newthread(L);
| (gdb) s
| <snipped>
| Run till exit from #0  lua_newthread (L=0x40000378) at lj_api.c:758
| luaT_newthread_wrapper (L=0x40000378) at /tarantool/src/lua/utils.c:1240
| 1240            return 1;
| Value returned is $2 = (lua_State *) 0x4003ee48
| (gdb) lj-stack L
| 0x40023dc8:0x40023de8 [    ] 5 slots: Red zone
| 0x40023dc0            [   M]
| 0x400239e8:0x40023db8 [    ] 123 slots: Free stack slots
| 0x400239e0            [  T ]
| 0x400239d8            [ B  ] VALUE: thread @ 0x4003ee48
| 0x400239d0            [    ] FRAME: [CP] delta=2, C function @ 0x555555781ff6
| 0x400239c8            [    ] VALUE: Lua function @ 0x4003da80, 4 upvalues, "@builtin/fiber.lua":67
| 0x400239c0            [    ] FRAME: [L] delta=14, C function @ 0x55555577a486
| <snipped>
* Lua stack state when <luaT_call> returned:
| luaT_newthread (L=0x40000378) at /tarantool/src/lua/utils.c:1249
| 1249            if (luaT_call(L, 0, 1) != 0)
| (gdb) lj-stack L
| 0x40023dc8:0x40023de8 [    ] 5 slots: Red zone
| 0x40023dc0            [   M]
| 0x400239e0:0x40023db8 [    ] 124 slots: Free stack slots
| 0x400239d8            [  T ]
| 0x400239d0            [    ] VALUE: thread @ 0x4003ee48
| 0x400239c8            [ B  ] VALUE: Lua function @ 0x4003da80, 4 upvalues, "@builtin/fiber.lua":67
| 0x400239c0            [    ] FRAME: [L] delta=14, C function @ 0x55555577a486
| <snipped>

As a result, the created Lua coroutine is returned on the guest stack
and L1 is initialized via <lua_tothread> call.

If you find something relevant or useful to be included right to the
commit message, feel free to point it out.

> 
> > Signed-off-by: Igor Munkin <imun@tarantool.org>
> 
> This is already the second patch on removal of excessive func pushes.
> I suggest you take a look at all the other lua_pushcfunction() calls.
> For example, luaT_pushtuple() does it, and it is called on each tuple
> push. Can be tens of thousands per seconds.
> 
> The same for luaT_pusherror, lua_field_inspect_ucdata. These should be
> called often too.

I also thought about such activity while testing this change. I filed an
issue[2] for it and referred it with this patch:
| Part of #5201

> 
> > ---
> > Recently we discussed with Timur his struggling with linking his binary
> > Lua module against Tarantool. The reason is LuaJIT internals usage for
> > manipulations with the guest stack that are not provided by the binary.
> > I glanced the current <luaT_newthread> implementation and found out two
> > another problems related to the platform overall performance (as it is
> > proved with the corresponding benchmarks).
> > 
> > The first problem is the similar one <box_process_lua> had prior to the
> > patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> > function to be called in protected frame. However this function is the
> > same throughout the platform uptime. It can be created on Taranool
> > startup and I see no reason to clobber GC that way.
> > 
> > Another problem I found are excess manipulations with the guest stack:
> > one need the newly born coroutine on it to prevent it being collected by
> > GC, but <lua_cpcall> purges everything left on the stack in scope of the
> > invoked function. As a result the obtained Lua coroutine is "pushed"
> > back to the guest stack via LuaJIT internal interfaces. It's a bit
> > ridiculous, since one can just use public API to get the same results:
> > Lua coroutine on the guest stack and the corresponding pointer to it.
> 
> I understand the GC object point. But I don't understand the pcall vs cpcall.
> What is the problem with lua_cpcall() removing all from the stack? We don't
> need anything on the stack here. We just need to return a pointer, and this
> was done fine before.

As I mentioned above, we also need to return the new object on the guest
stack to anchor it to Lua registry table in the caller.

Besides, if we go further we can enclose all registry-related actions in
scope of the <luaT_newthread>, but it seems to lead to a huge
refactoring:
* the reference obtained via <luaL_ref> need to be saved the complement
* function for release need to be implemented
* all callers need to be adjusted to the new interface

Thoughts?

> 
> > I tested the platform performance with the same benchmark[2] I made for
> > the <box_process_lua> patch and here are the numbers I obtained after
> > the 15 runs:
> > * Vanilla bleeding master (mean):
> > | ===== 2.5.0-267-gbf047ad44 =====
> > | call by ref GC: 921877 Kb
> > | call by ref time: 1.340172 sec
> > | call GC: 476563.991667 Kb
> > | eval GC: 655274.935547 Kb
> > * Patched bleeding master (mean):
> > | ===== 2.5.0-268-gec0eb12f4 =====
> > | call by ref GC: 859377 Kb
> > | call by ref time: 1.215410 sec
> > | call GC: 445313 Kb
> > | eval GC: 624024 Kb
> > * Relative measurements (before -> after):
> > | call by ref GC: -6% (-62500 Kb)
> > | call by ref time: -9% (-0.124762 sec)
> > | call GC: -6% (-31250 Kb)
> > | eval GC: -4% (-31250 Kb)
> > 
> > There is one hot path I left unverified -- Lua-born fibers creation, but
> > I guess the relative numbers are quite similar to the ones I mentioned
> > above. However, if one wonders these results, feel free to ask me.
> > 
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index 0b05d7257..23ccbc3c9 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
> >  
> >  /* }}} */
> >  
> > +/**
> > + * @brief A wrapper for <lua_newthread> to be called via luaT_call
> > + * in luaT_newthread. Whether new Lua coroutine is created it is
> > + * returned on the top of the guest stack.
> > + * @param L is a Lua state
> > + * @sa <lua_newthread>
> > + */
> > +static int
> > +luaT_newthread_wrapper(lua_State *L)
> 
> All the other code uses 'struct lua_State' instead of 'lua_State'.
> Except old code. Why isn't it so here?

At first, it was used this way in the original implementation (as you
can see I moved most of the code from lua/utils.h to lua/utils.c).
Furthermore, a lot of Lua C API code (at least that I've seen outside
the Tarantool) uses <lua_State *> provided via lua.h header. Besides,
there are many function definitions nearby in lua/utils.c where <struct>
keyword is omitted.

However, I checked our guidelines[3] and fixed <lua_State *> usage the
way you proposed:

================================================================================

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 23ccbc3c9..af114b0a2 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1234,21 +1234,21 @@ void luaL_iterator_delete(struct luaL_iterator *it)
  * @sa <lua_newthread>
  */
 static int
-luaT_newthread_wrapper(lua_State *L)
+luaT_newthread_wrapper(struct lua_State *L)
 {
 	(void)lua_newthread(L);
 	return 1;
 }
 
-lua_State *
-luaT_newthread(lua_State *L)
+struct lua_State *
+luaT_newthread(struct lua_State *L)
 {
 	assert(luaT_newthread_ref != LUA_NOREF);
 	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
 	assert(lua_isfunction(L, -1));
 	if (luaT_call(L, 0, 1) != 0)
 		return NULL;
-	lua_State *L1 = lua_tothread(L, -1);
+	struct lua_State *L1 = lua_tothread(L, -1);
 	assert(L1 != NULL);
 	return L1;
 }
diff --git a/src/lua/utils.h b/src/lua/utils.h
index 404eafe9b..7e02a05f2 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -605,8 +605,8 @@ luaL_checkfinite(struct lua_State *L, struct luaL_serializer *cfg,
  * @param L is a Lua state
  * @sa <lua_newthread>
  */
-lua_State *
-luaT_newthread(lua_State *L);
+struct lua_State *
+luaT_newthread(struct lua_State *L);
 
 /**
  * Check if a value on @a L stack by index @a idx is an ibuf

================================================================================

> 
> > +{
> > +	(void)lua_newthread(L);
> > +	return 1;
> > +}
> > +
> > +lua_State *
> > +luaT_newthread(lua_State *L)
> > +{
> > +	assert(luaT_newthread_ref != LUA_NOREF);
> > +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> > +	assert(lua_isfunction(L, -1));
> > +	if (luaT_call(L, 0, 1) != 0)
> > +		return NULL;
> > +	lua_State *L1 = lua_tothread(L, -1);
> > +	assert(L1 != NULL);
> > +	return L1;
> > +}

[1]: https://www.lua.org/manual/5.1/manual.html#luaL_ref
[2]: https://github.com/tarantool/tarantool/issues/5201
[3]: https://www.tarantool.io/en/doc/2.3/dev_guide/c_style_guide/#chapter-5-typedefs

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-22 15:12   ` Igor Munkin
@ 2020-07-24 16:15     ` Sergey Ostanevich
  2020-07-24 19:18       ` Igor Munkin
  0 siblings, 1 reply; 12+ messages in thread
From: Sergey Ostanevich @ 2020-07-24 16:15 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

Hi!

On 22 Jul 18:12, Igor Munkin wrote:
> Sergos,
> 
> Thanks for your review!
> 
> On 22.07.20, Sergey Ostanevich wrote:
> > Hi!
> > 
> > Thanks for the patch, LGTM.
> 
> Added your tag:
> | Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
> 
> > 
> > I wonder if it can be done in a more safe way in luaT_newthread() so
> > that if the refernece is not initialized then initialize it, rather 
> > assert in debug and - I suppose - fail with creation in release? I see
> > an overhead for condition here and making even bigger indirect 
> > machinery for the wrappers itself won't bring a lot either.
> 
> Frankly speaking, it looks an overkill to me. What are your exact
> concerns? I added those asserts mostly for self-check. This is a hot
> path in call/eval request handling, fiber creation and stored procedures
> calls, so Debug mode testing fails if the problem occurs. This approach
> is the similar to <box_process_lua> one[1] but there are no such checks.
> This static reference can't be unintentionally changed outside this
> translation unit, so its usage is well-localized. My arguments are not
> about performance but sanity. Do you see any flaws in this design?
> 

The change is not the issue, the missing initialization is. Since
luaT_newthread() is external - there _could_ be some circumstances that
it will be called w/o init done. I believe the lua_rawgeti() should
fail, so my question if it will report a comprehensible error? If it
will - it should be enough, so I enforce my LGTM once more.

Regards,
Sergos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-24 16:15     ` Sergey Ostanevich
@ 2020-07-24 19:18       ` Igor Munkin
  0 siblings, 0 replies; 12+ messages in thread
From: Igor Munkin @ 2020-07-24 19:18 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches, Vladislav Shpilevoy

Sergos,

On 24.07.20, Sergey Ostanevich wrote:
> Hi!
> 
> On 22 Jul 18:12, Igor Munkin wrote:

<snipped>

> > > 
> > > I wonder if it can be done in a more safe way in luaT_newthread() so
> > > that if the refernece is not initialized then initialize it, rather 
> > > assert in debug and - I suppose - fail with creation in release? I see
> > > an overhead for condition here and making even bigger indirect 
> > > machinery for the wrappers itself won't bring a lot either.
> > 
> > Frankly speaking, it looks an overkill to me. What are your exact
> > concerns? I added those asserts mostly for self-check. This is a hot
> > path in call/eval request handling, fiber creation and stored procedures
> > calls, so Debug mode testing fails if the problem occurs. This approach
> > is the similar to <box_process_lua> one[1] but there are no such checks.
> > This static reference can't be unintentionally changed outside this
> > translation unit, so its usage is well-localized. My arguments are not
> > about performance but sanity. Do you see any flaws in this design?
> > 
> 
> The change is not the issue, the missing initialization is. Since

The reference is initialized in scope of <tarantool_lua_utils_init>
function. If it was not called other vital Tarantool internals also
would be missing (e.g. box.NULL, ibuf).

> luaT_newthread() is external - there _could_ be some circumstances that
> it will be called w/o init done. I believe the lua_rawgeti() should

luaT_newthread can't be called prior to utils initialization, since
there is no lua_State to be passed to it (AFAICS all Lua-dependent
subsystems are initialized in scope of <tarantool_lua_init>).

> fail, so my question if it will report a comprehensible error? If it

Everything is fine with Debug mode (considering several asserts I
added), so here are several examples for Release mode:

1. Reference is not initialized on Tarantool startup:
================================================================================

diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..6d422eef3 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1305,6 +1305,5 @@ tarantool_lua_utils_init(struct lua_State *L)
 	assert(CTID_UUID != 0);
 
 	lua_pushcfunction(L, luaT_newthread_wrapper);
-	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
 	return 0;
 }

================================================================================
| $ ./src/tarantool
| PANIC: unprotected error in call to Lua API (?)

2. Reference is corrupted while Tarantool is running:
================================================================================

diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..be56ba12f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -998,6 +998,7 @@ luaL_toint64(struct lua_State *L, int idx)
 int
 luaT_toerror(lua_State *L)
 {
+	luaT_newthread_ref = LUA_NOREF;
 	struct error *e = luaL_iserror(L, -1);
 	if (e != NULL) {
 		/* Re-throw original error */

================================================================================
| $ cat ~/ref.lua
| local fiber = require('fiber')
|
| fiber.create(function() a() end)
| fiber.create(function() a() end)
| $ ./src/tarantool ~/ref.lua
| LuajitError: /home/imun/ref.lua:3: attempt to call global 'a' (a nil value)
| LuajitError: attempt to call a nil value
| fatal error, exiting the event loop

> will - it should be enough, so I enforce my LGTM once more.

We're testing lots of various builds and I believe our testing system
covers most of the cases occurring in real life.

> 
> Regards,
> Sergos
> 
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-24 21:47     ` Vladislav Shpilevoy
@ 2020-07-24 21:41       ` Igor Munkin
  0 siblings, 0 replies; 12+ messages in thread
From: Igor Munkin @ 2020-07-24 21:41 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

On 24.07.20, Vladislav Shpilevoy wrote:
> Thanks for the explanation!
> 
> Discussed 1-to-1 again, now I understand everything.
> 
> LGTM.

Thanks, added your tag:
| Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
                   ` (2 preceding siblings ...)
  2020-07-23 21:23 ` Vladislav Shpilevoy
@ 2020-07-24 21:45 ` Igor Munkin
  2020-07-29 13:41 ` Kirill Yukhin
  4 siblings, 0 replies; 12+ messages in thread
From: Igor Munkin @ 2020-07-24 21:45 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches, Vladislav Shpilevoy

Kirill,

Please proceed with the patch.

On 20.07.20, Igor Munkin wrote:
> <luaT_newthread> created a new GCfunc object for the helper invoked in a
> protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> call. The change introduces a static reference to a GCfunc object for
> <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
> 
> Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> created Lua coroutine need to be pushed back to prevent its sweep. So
> to reduce guest stack manipulations <lua_cpcall> is replaced with
> <lua_pcall> and the resulting Lua thread is obtained via guest stack.
> 
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
> Recently we discussed with Timur his struggling with linking his binary
> Lua module against Tarantool. The reason is LuaJIT internals usage for
> manipulations with the guest stack that are not provided by the binary.
> I glanced the current <luaT_newthread> implementation and found out two
> another problems related to the platform overall performance (as it is
> proved with the corresponding benchmarks).
> 
> The first problem is the similar one <box_process_lua> had prior to the
> patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> function to be called in protected frame. However this function is the
> same throughout the platform uptime. It can be created on Taranool
> startup and I see no reason to clobber GC that way.
> 
> Another problem I found are excess manipulations with the guest stack:
> one need the newly born coroutine on it to prevent it being collected by
> GC, but <lua_cpcall> purges everything left on the stack in scope of the
> invoked function. As a result the obtained Lua coroutine is "pushed"
> back to the guest stack via LuaJIT internal interfaces. It's a bit
> ridiculous, since one can just use public API to get the same results:
> Lua coroutine on the guest stack and the corresponding pointer to it.
> 
> I tested the platform performance with the same benchmark[2] I made for
> the <box_process_lua> patch and here are the numbers I obtained after
> the 15 runs:
> * Vanilla bleeding master (mean):
> | ===== 2.5.0-267-gbf047ad44 =====
> | call by ref GC: 921877 Kb
> | call by ref time: 1.340172 sec
> | call GC: 476563.991667 Kb
> | eval GC: 655274.935547 Kb
> * Patched bleeding master (mean):
> | ===== 2.5.0-268-gec0eb12f4 =====
> | call by ref GC: 859377 Kb
> | call by ref time: 1.215410 sec
> | call GC: 445313 Kb
> | eval GC: 624024 Kb
> * Relative measurements (before -> after):
> | call by ref GC: -6% (-62500 Kb)
> | call by ref time: -9% (-0.124762 sec)
> | call GC: -6% (-31250 Kb)
> | eval GC: -4% (-31250 Kb)
> 
> There is one hot path I left unverified -- Lua-born fibers creation, but
> I guess the relative numbers are quite similar to the ones I mentioned
> above. However, if one wonders these results, feel free to ask me.
> 
> Branch: https://github.com/tarantool/tarantool/tree/imun/experimental-luaT-newthread
> 
> @ChangeLog:
> * Improved safe Lua coroutine creation for the case of fiber
>   initialization. Prepared GCfunc object is used instead of temporary
>   one, resulting in 3-6% garbage collection reduction. Also excess guest
>   stack manipulations are removed.
> 
>  src/lua/utils.c | 32 ++++++++++++++++++++++++++++++++
>  src/lua/utils.h | 38 ++++++++++----------------------------
>  2 files changed, 42 insertions(+), 28 deletions(-)
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017834.html
> [2]: https://gist.github.com/igormunkin/c941074fa9fdf0f7a4f068498fb5e24c
> 

<snipped>

> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-24 14:14   ` Igor Munkin
@ 2020-07-24 21:47     ` Vladislav Shpilevoy
  2020-07-24 21:41       ` Igor Munkin
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-24 21:47 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Thanks for the explanation!

Discussed 1-to-1 again, now I understand everything.

LGTM.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
  2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
                   ` (3 preceding siblings ...)
  2020-07-24 21:45 ` Igor Munkin
@ 2020-07-29 13:41 ` Kirill Yukhin
  4 siblings, 0 replies; 12+ messages in thread
From: Kirill Yukhin @ 2020-07-29 13:41 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

Hello,

On 20 июл 14:28, Igor Munkin wrote:
> <luaT_newthread> created a new GCfunc object for the helper invoked in a
> protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> call. The change introduces a static reference to a GCfunc object for
> <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
> 
> Furthermore, since <lua_cpcall> yields nothing on guest stack, the newly
> created Lua coroutine need to be pushed back to prevent its sweep. So
> to reduce guest stack manipulations <lua_cpcall> is replaced with
> <lua_pcall> and the resulting Lua thread is obtained via guest stack.
> 
> Signed-off-by: Igor Munkin <imun@tarantool.org>

I've checked your patch into 1.10, 2.4, 2.5 and master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-29 13:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-20 11:28 [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance Igor Munkin
2020-07-20 12:10 ` Timur Safin
2020-07-22 11:30 ` Sergey Ostanevich
2020-07-22 15:12   ` Igor Munkin
2020-07-24 16:15     ` Sergey Ostanevich
2020-07-24 19:18       ` Igor Munkin
2020-07-23 21:23 ` Vladislav Shpilevoy
2020-07-24 14:14   ` Igor Munkin
2020-07-24 21:47     ` Vladislav Shpilevoy
2020-07-24 21:41       ` Igor Munkin
2020-07-24 21:45 ` Igor Munkin
2020-07-29 13:41 ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox