[Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jul 30 01:38:52 MSK 2021


>>>> Asynchronous calls don't show as much of an improvement as synchronous,
>>>> because per each asynchronous call we still have to create a 'future'
>>>> object in Lua. Still, the improvement is quite noticeable - 30% for
>>>> REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.
>>>
>>> I didn't reach the end of the patchset yet, but did you try to create
>>> the futures as cdata objects? They could be allocated on mempool, their
>>> GC pressure might be optimized by doing similar to luaT_tuple_encode_table_ref
>>> optimization (there was found a way to make __gc and other C functions
>>> cheaper when it comes to amount of GC objects in Lua).
>>>
>>> The API would stay the same, they just would become C structs with
>>> methods instead of Lua tables.
>>
>> Good call. Going to try that. Thanks.
> 
> Quickly whipped up a patch that converts the future from userdata to
> cdata. Applied on top of the series. Surprisingly, it only made things worse:
> 
> With the patch (future is cdata):
> 
> ==== FUTURE ====
>   REPLACE: WALL 221.182 PROC 343.368 KRPS
>    UPDATE: WALL 178.918 PROC 291.504 KRPS
>    SELECT: WALL 220.815 PROC 248.843 KRPS
>      CALL: WALL 218.313 PROC 315.670 KRPS
> 
> 
> Without the patch (future is userdata):
> 
> ==== FUTURE ====
>   REPLACE: WALL 262.454 PROC 450.425 KRPS
>    UPDATE: WALL 191.538 PROC 322.888 KRPS
>    SELECT: WALL 288.498 PROC 333.393 KRPS
>      CALL: WALL 247.463 PROC 375.180 KRPS
> 
> The patch is below. Note, it isn't entirely correct - future:pairs
> doesn't work, because luaL_checkcdata doesn't seem to handle upvalues,
> but it shouldn't affect the test.

Sorry, for some reason the patch can't be applied on top of the branch.

But I see now that on the branch you already had the futures as C
objects even before my proposal, so perhaps it is to be expected that
not much improved.

However, there is an idea which might make the perf gap smaller.

> @@ -1642,10 +1649,14 @@ netbox_perform_async_request_impl(struct lua_State *L, int idx,
>  static int
>  netbox_perform_async_request(struct lua_State *L)
>  {
> -	struct netbox_request *request = lua_newuserdata(L, sizeof(*request));
> +	struct netbox_request *request = mempool_alloc(&netbox_request_pool);
> +	if (request == NULL)
> +		return luaL_error(L, "out of memory");
>  	netbox_request_create(request);
> -	luaL_getmetatable(L, netbox_request_typename);
> -	lua_setmetatable(L, -2);
> +	*(struct netbox_request **)
> +		luaL_pushcdata(L, CTID_STRUCT_NETBOX_REQUEST_PTR) = request;
> +	lua_pushcfunction(L, luaT_netbox_request_gc);

Here is the problem I mentioned in my email.

lua_pushcfunction() is expensive because it creates a new GC object and
pushes it on the stack every time. This used to happen in a few hot places,
and the solution was to push such a cfunction only once, remember its
registry ref the way luaT_tuple_encode_table_ref does, and then re-push
the same finalizer for each cdata object.

See ec9a7fa7ebbb8fd6e15b9516875c3fd1a1f6dfee and
e88c0d21ab765d4c53bed2437c49d77b3ffe4216.

You need to call

	lua_pushcfunction(L, luaT_netbox_request_gc);

only once, somewhere in netbox_init() or similar, and then save the ref:

	netbox_request_gc_ref = luaL_ref(L, LUA_REGISTRYINDEX);

Then in netbox_perform_async_request() you do

	lua_rawgeti(L, LUA_REGISTRYINDEX, netbox_request_gc_ref);
	luaL_setcdatagc(L, -2);

I think this should help, though I doubt it will help much.
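
To make the difference in GC pressure concrete, here is a toy model in
plain C. It is only a sketch: it does not use the real Lua API, and every
toy_* name is hypothetical. toy_push_cfunction(), toy_ref() and
toy_rawgeti() merely mimic the allocation behavior of lua_pushcfunction(),
luaL_ref() and lua_rawgeti() described above, under the assumption that
pushing a cfunction creates one new object per call while a registry
lookup creates none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a GC-managed function object; not a LuaJIT type. */
struct toy_gc_object {
	void (*fn)(void *);
};

enum { TOY_REGISTRY_SIZE = 16 };

/* Toy stand-in for the Lua registry: integer ref -> anchored object. */
static struct toy_gc_object *toy_registry[TOY_REGISTRY_SIZE];
static int toy_registry_used;
/* Number of function objects created so far. */
static size_t toy_gc_object_count;

/* Models lua_pushcfunction(): every call creates a fresh object. */
static struct toy_gc_object *
toy_push_cfunction(void (*fn)(void *))
{
	struct toy_gc_object *obj = malloc(sizeof(*obj));
	assert(obj != NULL);
	obj->fn = fn;
	toy_gc_object_count++;
	return obj;
}

/* Models luaL_ref(): anchor the object and return an integer handle. */
static int
toy_ref(struct toy_gc_object *obj)
{
	assert(toy_registry_used < TOY_REGISTRY_SIZE);
	toy_registry[toy_registry_used] = obj;
	return toy_registry_used++;
}

/* Models lua_rawgeti(): fetch the anchored object without allocating. */
static struct toy_gc_object *
toy_rawgeti(int ref)
{
	assert(ref >= 0 && ref < toy_registry_used);
	return toy_registry[ref];
}

/* Hypothetical finalizer: releases the request memory. */
static void
toy_request_gc(void *request)
{
	free(request);
}

/* Push a new finalizer per request; returns how many objects it created. */
size_t
toy_run_naive(size_t request_count)
{
	size_t before = toy_gc_object_count;
	for (size_t i = 0; i < request_count; i++) {
		struct toy_gc_object *gc = toy_push_cfunction(toy_request_gc);
		gc->fn(malloc(1));
		free(gc); /* in real Lua the GC has to collect this object */
	}
	return toy_gc_object_count - before;
}

/* Create the finalizer once, then only look it up by ref per request. */
size_t
toy_run_cached(size_t request_count)
{
	size_t before = toy_gc_object_count;
	int gc_ref = toy_ref(toy_push_cfunction(toy_request_gc));
	for (size_t i = 0; i < request_count; i++) {
		struct toy_gc_object *gc = toy_rawgeti(gc_ref);
		gc->fn(malloc(1));
	}
	return toy_gc_object_count - before;
}
```

In this model toy_run_naive(n) creates n function objects while
toy_run_cached(n) creates exactly one, which is the whole point of keeping
the ref. How much of that shows up in KRPS on the real branch is a separate
question that only the benchmark can answer.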

