[Tarantool-patches] [PATCH] lua/utils: improve luaT_newthread performance

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jul 24 00:23:05 MSK 2020


Hi! Thanks for the patch!

On 20.07.2020 13:28, Igor Munkin wrote:
> <luaT_newthread> created a new GCfunc object for the helper invoked in a
> protected <lua_cpcall> frame (i.e. <luaT_newthread_wrapper>) on each
> call. The change introduces a static reference to a GCfunc object for
> <luaT_newthread_wrapper> to be initialized on Tarantool startup to
> reduce Lua GC memory usage.
> 
> Furthermore, since <lua_cpcall> yields nothing on the guest stack, the
> newly created Lua coroutine needs to be pushed back to prevent its
> sweep. So to reduce guest stack manipulations <lua_cpcall> is replaced
> with <lua_pcall> and the resulting Lua thread is obtained via the guest
> stack.

I don't think I understand. Before we had one store into an 8 byte location
with lua_cpcall(). Now we have push and pop on the stack with lua_pcall().
What is the point? It seems to be worse.

> Signed-off-by: Igor Munkin <imun at tarantool.org>

This is already the second patch on removal of excessive func pushes.
I suggest you take a look at all the other lua_pushcfunction() calls.
For example, luaT_pushtuple() does it, and it is called on each tuple
push. That can be tens of thousands of calls per second.

The same goes for luaT_pusherror() and lua_field_inspect_ucdata(). These
are probably called often too.

> ---
> Recently we discussed with Timur his struggle with linking his binary
> Lua module against Tarantool. The reason is LuaJIT internals usage for
> manipulations with the guest stack that are not provided by the binary.
> I glanced at the current <luaT_newthread> implementation and found two
> other problems related to the overall platform performance (as shown
> by the corresponding benchmarks).
> 
> The first problem is similar to the one <box_process_lua> had prior to
> the patch[1]: <lua_cpcall> creates an auxiliary GCfunc object for the
> function to be called in a protected frame. However, this function is
> the same throughout the platform uptime. It can be created on Tarantool
> startup and I see no reason to clobber GC that way.
> 
> Another problem I found is the excess manipulation of the guest stack:
> one needs the newly born coroutine on it to prevent it from being
> collected by GC, but <lua_cpcall> purges everything left on the stack
> in scope of the invoked function. As a result the obtained Lua
> coroutine is "pushed" back to the guest stack via LuaJIT internal
> interfaces. It's a bit ridiculous, since one can just use the public
> API to get the same results: a Lua coroutine on the guest stack and the
> corresponding pointer to it.

I understand the GC object point. But I don't understand the pcall vs
cpcall part. What is the problem with lua_cpcall() removing everything
from the stack? We don't
need anything on the stack here. We just need to return a pointer, and this
was done fine before.

> I tested the platform performance with the same benchmark[2] I made for
> the <box_process_lua> patch and here are the numbers I obtained after
> 15 runs:
> * Vanilla bleeding master (mean):
> | ===== 2.5.0-267-gbf047ad44 =====
> | call by ref GC: 921877 Kb
> | call by ref time: 1.340172 sec
> | call GC: 476563.991667 Kb
> | eval GC: 655274.935547 Kb
> * Patched bleeding master (mean):
> | ===== 2.5.0-268-gec0eb12f4 =====
> | call by ref GC: 859377 Kb
> | call by ref time: 1.215410 sec
> | call GC: 445313 Kb
> | eval GC: 624024 Kb
> * Relative measurements (before -> after):
> | call by ref GC: -6% (-62500 Kb)
> | call by ref time: -9% (-0.124762 sec)
> | call GC: -6% (-31250 Kb)
> | eval GC: -4% (-31250 Kb)
> 
> There is one hot path I left unverified -- creation of Lua-born fibers,
> but I guess the relative numbers are quite similar to the ones I
> mentioned above. However, if you have doubts about these results, feel
> free to ask me.
> 
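
For reference, the relative figures quoted above can be recomputed from
the two sets of means. This is a minimal sketch and not part of the
benchmark itself; the helper names relative_change_percent and
print_relative are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after` in percent (negative = reduction). */
static double
relative_change_percent(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Hypothetical helper: print one metric with its relative and absolute change. */
static void
print_relative(const char *name, double before, double after,
	       const char *unit)
{
	printf("%s: %.2f%% (%+f %s)\n", name,
	       relative_change_percent(before, after), after - before, unit);
}
```

With the quoted means this comes to roughly -6.78% (call by ref GC),
-9.31% (call by ref time), -6.56% (call GC) and -4.77% (eval GC), so the
percentages in the table look truncated rather than rounded.
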
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d7257..23ccbc3c9 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -1224,6 +1226,33 @@ void luaL_iterator_delete(struct luaL_iterator *it)
>  
>  /* }}} */
>  
> +/**
> + * @brief A wrapper for <lua_newthread> to be called via luaT_call
> + * in luaT_newthread. If a new Lua coroutine is created, it is
> + * returned on the top of the guest stack.
> + * @param L is a Lua state
> + * @sa <lua_newthread>
> + */
> +static int
> +luaT_newthread_wrapper(lua_State *L)

All the other code, except old code, uses 'struct lua_State' instead of
'lua_State'. Why isn't it so here?

> +{
> +	(void)lua_newthread(L);
> +	return 1;
> +}
> +
> +lua_State *
> +luaT_newthread(lua_State *L)
> +{
> +	assert(luaT_newthread_ref != LUA_NOREF);
> +	lua_rawgeti(L, LUA_REGISTRYINDEX, luaT_newthread_ref);
> +	assert(lua_isfunction(L, -1));
> +	if (luaT_call(L, 0, 1) != 0)
> +		return NULL;
> +	lua_State *L1 = lua_tothread(L, -1);
> +	assert(L1 != NULL);
> +	return L1;
> +}

