From: Vladislav Shpilevoy
To: Igor Munkin, Sergey Ostanevich
Cc: tarantool-patches@dev.tarantool.org
Date: Thu, 23 Jul 2020 23:23:05 +0200
Subject: Re: [Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance
Message-ID: <5754acde-5591-791e-c836-c8685d9df208@tarantool.org>
List-Id: Tarantool development patches

Hi! Thanks for the patch!

On 20.07.2020 13:28, Igor Munkin wrote:
> created a new GCfunc object for the helper invoked in a
> protected frame (i.e. ) on each
> call. The change introduces a static reference to a GCfunc object for
> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
>
> Furthermore, since yields nothing on the guest stack, the newly
> created Lua coroutine needs to be pushed back to prevent it from being swept. So
> to reduce guest stack manipulations is replaced with
> and the resulting Lua thread is obtained via the guest stack.

I don't think I understand. Before, we had a single store into an 8-byte
location with lua_cpcall(). Now we have a push and a pop on the stack
with lua_pcall(). What is the point? It seems to be worse.

> Signed-off-by: Igor Munkin

This is already the second patch removing excessive function pushes. I
suggest you take a look at all the other lua_pushcfunction() calls. For
example, luaT_pushtuple() does it, and it is called on each tuple push,
which can be tens of thousands of times per second. The same goes for
luaT_pusherror() and lua_field_inspect_ucdata(). These should be called
often too.
> ---
> Recently Timur and I discussed his struggle to link his binary
> Lua module against Tarantool. The reason is that the module uses LuaJIT
> internals to manipulate the guest stack, and these are not provided by
> the binary. I glanced at the current implementation and found two
> other problems related to overall platform performance (as the
> corresponding benchmarks prove).
>
> The first problem is similar to the one had prior to the
> patch[1]: creates an auxiliary GCfunc object for the
> function to be called in a protected frame. However, this function is the
> same throughout the platform uptime. It can be created on Tarantool
> startup, and I see no reason to clobber the GC that way.
>
> Another problem I found is excessive manipulation of the guest stack:
> one needs the newly born coroutine on it to prevent it from being
> collected by the GC, but purges everything left on the stack in
> the scope of the invoked function. As a result, the obtained Lua
> coroutine is "pushed" back to the guest stack via LuaJIT internal
> interfaces. It's a bit ridiculous, since one can just use the public
> API to get the same results: the Lua coroutine on the guest stack and
> the corresponding pointer to it.

I understand the GC object point. But I don't understand the pcall vs
cpcall point. What is the problem with lua_cpcall() removing everything
from the stack? We don't need anything on the stack here. We just need
to return a pointer, and that worked fine before.
> I tested the platform performance with the same benchmark[2] I made for
> the patch, and here are the numbers I obtained after
> 15 runs:
> * Vanilla bleeding master (mean):
> | ===== 2.5.0-267-gbf047ad44 =====
> | call by ref GC: 921877 Kb
> | call by ref time: 1.340172 sec
> | call GC: 476563.991667 Kb
> | eval GC: 655274.935547 Kb
> * Patched bleeding master (mean):
> | ===== 2.5.0-268-gec0eb12f4 =====
> | call by ref GC: 859377 Kb
> | call by ref time: 1.215410 sec
> | call GC: 445313 Kb
> | eval GC: 624024 Kb
> * Relative measurements (before -> after):
> | call by ref GC: -6% (-62500 Kb)
> | call by ref time: -9% (-0.124762 sec)
> | call GC: -6% (-31250 Kb)
> | eval GC: -4% (-31250 Kb)
>
> There is one hot path I left unverified: Lua-born fiber creation, but
> I guess the relative numbers are quite similar to the ones I mentioned
> above. However, if anyone is curious about these results, feel free to
> ask me.
>
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d7257..23ccbc3c9 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
>
>  /* }}} */
>
> +/**
> + * @brief A wrapper for to be called via luaT_call
> + * in luaT_newthread. Whether new Lua coroutine is created it is
> + * returned on the top of the guest stack.
> + * @param L is a Lua state
> + * @sa
> + */
> +static int
> +luaT_newthread_wrapper(lua_State *L)

All the other code uses 'struct lua_State' instead of 'lua_State',
except old code. Why is it different here?

> +{
> +	(void)lua_newthread(L);
> +	return 1;
> +}
> +
> +lua_State *
> +luaT_newthread(lua_State *L)
> +{
> +	assert(luaT_newthread_ref != LUA_NOREF);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> +	assert(lua_isfunction(L, -1));
> +	if (luaT_call(L, 0, 1) != 0)
> +		return NULL;
> +	lua_State *L1 = lua_tothread(L, -1);
> +	assert(L1 != NULL);
> +	return L1;
> +}
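The relative figures in the quoted benchmark summary can be recomputed from the reported means. A minimal sketch in C: the values are copied from the quoted results, while struct bench_metric, metric_delta() and metric_percent() are illustrative names only.

```c
#include <assert.h>

/* One metric from the quoted benchmark: mean before and after the patch. */
struct bench_metric {
	const char *name;
	double before;
	double after;
};

/* Means copied from the quoted results (Kb for GC, seconds for time). */
static const struct bench_metric metrics[] = {
	{"call by ref GC", 921877.0, 859377.0},
	{"call by ref time", 1.340172, 1.215410},
	{"call GC", 476563.991667, 445313.0},
	{"eval GC", 655274.935547, 624024.0},
};

/* Absolute change: after minus before (negative means a reduction). */
static double
metric_delta(const struct bench_metric *m)
{
	return m->after - m->before;
}

/* Relative change in percent of the "before" value. */
static double
metric_percent(const struct bench_metric *m)
{
	assert(m->before != 0.0);
	return 100.0 * metric_delta(m) / m->before;
}
```

With these inputs the relative changes come out at about -6.8%, -9.3%, -6.6% and -4.8%, so the summary's -6%, -9%, -6% and -4% look truncated toward zero rather than rounded.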