From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp15.mail.ru (smtp15.mail.ru [94.100.176.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 35A94445320 for ; Wed, 22 Jul 2020 14:31:00 +0300 (MSK) Date: Wed, 22 Jul 2020 14:30:55 +0300 From: Sergey Ostanevich Message-ID: <20200722113055.GB894@tarantool.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy Hi! Thanks for the patch, LGTM. I wonder if it can be done in a more safe way in luaT_newthread() so that if the refernece is not initialized then initialize it, rather assert in debug and - I suppose - fail with creation in release? I see an overhead for condition here and making even bigger indirect machinery for the wrappers itself won't bring a lot either. Regards, Sergos On 20 Jul 14:28, Igor Munkin wrote: > created a new GCfunc object for the helper invoked in a > protected frame (i.e. ) on each > call. The change introduces a static reference to a GCfunc object for > to be initialized on Tarantool startup to > reduce Lua GC memory usage. > > Furthermore, since yields nothing on guest stack, the newly > created Lua coroutine need to be pushed back to prevent its sweep. So > to reduce guest stack manipulations is replaced with > and the resulting Lua thread is obtained via guest stack. > > Signed-off-by: Igor Munkin > --- > Recently we discussed with Timur his struggling with linking his binary > Lua module against Tarantool. The reason is LuaJIT internals usage for > manipulations with the guest stack that are not provided by the binary. > I glanced the current implementation and found out two > another problems related to the platform overall performance (as it is > proved with the corresponding benchmarks). > > The first problem is the similar one had prior to the > patch[1]: creates an auxiliary GCfunc object for the > function to be called in protected frame. However this function is the > same throughout the platform uptime. It can be created on Taranool > startup and I see no reason to clobber GC that way. > > Another problem I found are excess manipulations with the guest stack: > one need the newly born coroutine on it to prevent it being collected by > GC, but purges everything left on the stack in scope of the > invoked function. As a result the obtained Lua coroutine is "pushed" > back to the guest stack via LuaJIT internal interfaces. It's a bit > ridiculous, since one can just use public API to get the same results: > Lua coroutine on the guest stack and the corresponding pointer to it. > > I tested the platform performance with the same benchmark[2] I made for > the patch and here are the numbers I obtained after > the 15 runs: > * Vanilla bleeding master (mean): > | ===== 2.5.0-267-gbf047ad44 ===== > | call by ref GC: 921877 Kb > | call by ref time: 1.340172 sec > | call GC: 476563.991667 Kb > | eval GC: 655274.935547 Kb > * Patched bleeding master (mean): > | ===== 2.5.0-268-gec0eb12f4 ===== > | call by ref GC: 859377 Kb > | call by ref time: 1.215410 sec > | call GC: 445313 Kb > | eval GC: 624024 Kb > * Relative measurements (before -> after): > | call by ref GC: -6% (-62500 Kb) > | call by ref time: -9% (-0.124762 sec) > | call GC: -6% (-31250 Kb) > | eval GC: -4% (-31250 Kb) > > There is one hot path I left unverified -- Lua-born fibers creation, but > I guess the relative numbers are quite similar to the ones I mentioned > above. However, if one wonders these results, feel free to ask me. > > Branch: https://github.com/tarantool/tarantool/tree/imun/experimental-luaT-newthread > > @ChangeLog: > * Improved safe Lua coroutine creation for the case of fiber > initialization. Prepared GCfunc object is used instead of temporary > one, resulting in 3-6% garbage collection reduction. Also excess guest > stack manipulations are removed. > > src/lua/utils.c | 32 ++++++++++++++++++++++++++++++++ > src/lua/utils.h | 38 ++++++++++---------------------------- > 2 files changed, 42 insertions(+), 28 deletions(-) > > [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-June/017834.html > [2]: https://gist.github.com/igormunkin/c941074fa9fdf0f7a4f068498fb5e24c > > diff --git a/src/lua/utils.c b/src/lua/utils.c > index 0b05d7257..23ccbc3c9 100644 > --- a/src/lua/utils.c > +++ b/src/lua/utils.c > @@ -43,6 +43,8 @@ int luaL_nil_ref = LUA_REFNIL; > int luaL_map_metatable_ref = LUA_REFNIL; > int luaL_array_metatable_ref = LUA_REFNIL; > > +static int luaT_newthread_ref = LUA_NOREF; > + > static uint32_t CTID_STRUCT_IBUF; > static uint32_t CTID_STRUCT_IBUF_PTR; > static uint32_t CTID_CHAR_PTR; > @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it) > > /* }}} */ > > +/** > + * @brief A wrapper for to be called via luaT_call > + * in luaT_newthread. Whether new Lua coroutine is created it is > + * returned on the top of the guest stack. > + * @param L is a Lua state > + * @sa > + */ > +static int > +luaT_newthread_wrapper(lua_State *L) > +{ > + (void)lua_newthread(L); > + return 1; > +} > + > +lua_State * > +luaT_newthread(lua_State *L) > +{ > + assert(luaT_newthread_ref != LUA_NOREF); > + lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref); > + assert(lua_isfunction(L, -1)); > + if (luaT_call(L, 0, 1) != 0) > + return NULL; > + lua_State *L1 = lua_tothread(L, -1); > + assert(L1 != NULL); > + return L1; > +} > + > int > tarantool_lua_utils_init(struct lua_State *L) > { > @@ -1274,5 +1303,8 @@ tarantool_lua_utils_init(struct lua_State *L) > (void) rc; > CTID_UUID = luaL_ctypeid(L, "struct tt_uuid"); > assert(CTID_UUID != 0); > + > + lua_pushcfunction(L, luaT_newthread_wrapper); > + luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); > return 0; > } > diff --git a/src/lua/utils.h b/src/lua/utils.h > index b10754e4a..404eafe9b 100644 > --- a/src/lua/utils.h > +++ b/src/lua/utils.h > @@ -597,34 +597,16 @@ luaL_checkfinite(struct lua_State *L, struct luaL_serializer *cfg, > } > > /** > - * @brief A wrapper for lua_newthread() to pass it into luaT_cpcall > - * @param L is a Lua State > - * @sa lua_newthread() > - */ > -static inline int > -luaT_newthread_wrapper(lua_State *L) > -{ > - *(lua_State **)lua_touserdata(L, 1) = lua_newthread(L); > - return 0; > -} > - > -/** > - * @brief Safe wrapper for lua_newthread() > - * @param L is a Lua State > - * @sa lua_newthread() > - */ > -static inline lua_State * > -luaT_newthread(lua_State *L) > -{ > - lua_State *L1 = NULL; > - if (luaT_cpcall(L, luaT_newthread_wrapper, &L1) != 0) { > - return NULL; > - } > - assert(L1 != NULL); > - setthreadV(L, L->top, L1); > - incr_top(L); > - return L1; > -} > + * @brief Creates a new Lua coroutine in a protected frame. If > + * call underneath succeeds, the created Lua state > + * is on the top of the guest stack and a pointer to this state is > + * returned. Otherwise LUA_ERRMEM error is handled and the result > + * is NULL. > + * @param L is a Lua state > + * @sa > + */ > +lua_State * > +luaT_newthread(lua_State *L); > > /** > * Check if a value on @a L stack by index @a idx is an ibuf > -- > 2.25.0 >