[Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call

Thu Jun 24 23:00:28 MSK 2021

Thanks for the investigation!

>>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
>>> index 0315e720c..3b2572096 100644
>>> --- a/src/box/lua/call.c
>>> +++ b/src/box/lua/call.c
>>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
>>>  	struct encode_lua_ctx ctx;
>>>  	ctx.port = port;
>>>  	ctx.stream = stream;
>>> -	struct lua_State *L = tarantool_L;
>>> -	int top = lua_gettop(L);
>>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
>>> -		luaT_toerror(port->L);
>>> +	lua_State *L = port->L;
>>> +	/*
>>> +	 * At the moment Lua stack holds only values to encode.
>>> +	 * Insert corresponding encoder to the bottom and push
>>> +	 * encode context as lightuserdata to the top.
>>> +	 */
>>> +	const int size = lua_gettop(L);
>>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
>>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
>>> +	lua_insert(L, 1);
>>
>> If I remember correctly, this leads to moving all the existing
>> stack elements forward. Which might be expensive. I know from
>> Vlad Grubov's words that they have code with hundreds of values in
>> multireturn from stored functions. Did you bench what happens when
>> the Lua coroutine contains a lot of values? In the bench by the
>> link above I see only 1-element array and map. Size of the array
>> and map does not matter though. Only multireturn is interesting
>> here. Like 'return 1, 2, 3, ...'.
> 
> I've added this benchmark (200 numbers to return) to the gist.
> Local results for the bench:
> 
> Master:
> | Encode mul: 237900953 mcs, 12.6 K ps
> 
> My branch:
> | Encode mul: 235735350 mcs, 12.7 K ps
> 
> `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> (`lua_insert()` costs ~0.1% of the whole runtime).

That looks suspicious. Why does it have such a small impact? Did
you check that multireturn really made the stack bigger? Shouldn't
it have linear complexity depending on the stack size after the
target index?

The patch looks fine to me, if the benchmarks show there is even a
slight perf win. But I think Igor should take another look.