[Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C
Vladimir Davydov
vdavydov at tarantool.org
Fri Jul 30 13:04:12 MSK 2021
On Fri, Jul 30, 2021 at 12:38:52AM +0200, Vladislav Shpilevoy wrote:
> >>>> Asynchronous calls don't show as much of an improvement as synchronous,
> >>>> because per each asynchronous call we still have to create a 'future'
> >>>> object in Lua. Still, the improvement is quite noticeable - 30% for
> >>>> REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.
> >>>
> >>> I didn't reach the end of the patchset yet, but did you try to create
> >>> the futures as cdata objects? They could be allocated on mempool, their
> >>> GC pressure might be optimized by doing similar to luaT_tuple_encode_table_ref
> >>> optimization (there was found a way to make __gc and other C functions
> >>> cheaper when it comes to amount of GC objects in Lua).
> >>>
> >>> The API would stay the same, they just would become C structs with
> >>> methods instead of Lua tables.
> >>
> >> Good call. Going to try that. Thanks.
> >
> > Quickly whipped up a patch that converts userdata to cdata. Applied on
> > top of the series. Surprisingly, it only made things worse:
> >
> > With the patch (future is cdata):
> >
> > ==== FUTURE ====
> > REPLACE: WALL 221.182 PROC 343.368 KRPS
> > UPDATE: WALL 178.918 PROC 291.504 KRPS
> > SELECT: WALL 220.815 PROC 248.843 KRPS
> > CALL: WALL 218.313 PROC 315.670 KRPS
> >
> >
> > Without the patch (future is userdata):
> >
> > ==== FUTURE ====
> > REPLACE: WALL 262.454 PROC 450.425 KRPS
> > UPDATE: WALL 191.538 PROC 322.888 KRPS
> > SELECT: WALL 288.498 PROC 333.393 KRPS
> > CALL: WALL 247.463 PROC 375.180 KRPS
> >
> > The patch is below. Note, it isn't entirely correct - future:pairs
> > doesn't work, because luaL_checkcdata doesn't seem to handle upvalues,
> > but it shouldn't affect the test.
>
> Sorry, the patch can't be applied on top of the branch for some reason.
>
> But I see now that you already had the futures as C objects even
> before my proposal on top of the branch, so perhaps it is to be
> expected that not much improved.
>
> However there is an idea which might make the perf gap smaller.
>
> > @@ -1642,10 +1649,14 @@ netbox_perform_async_request_impl(struct lua_State *L, int idx,
> > static int
> > netbox_perform_async_request(struct lua_State *L)
> > {
> > - struct netbox_request *request = lua_newuserdata(L, sizeof(*request));
> > + struct netbox_request *request = mempool_alloc(&netbox_request_pool);
> > + if (request == NULL)
> > + return luaL_error(L, "out of memory");
> > netbox_request_create(request);
> > - luaL_getmetatable(L, netbox_request_typename);
> > - lua_setmetatable(L, -2);
> > + *(struct netbox_request **)
> > + luaL_pushcdata(L, CTID_STRUCT_NETBOX_REQUEST_PTR) = request;
> > + lua_pushcfunction(L, luaT_netbox_request_gc);
>
> Here is the problem, which I mentioned in my email.
>
> lua_pushcfunction() is an expensive thing because it pushes a new GC
> object on the stack every time. It was happening in a few hot places before,
> and there is now a solution that for such cfunctions we push them only once,
> remember their ref like luaT_tuple_encode_table_ref does, and then re-push
> the same finalizer for each cdata object.
>
> See ec9a7fa7ebbb8fd6e15b9516875c3fd1a1f6dfee and
> e88c0d21ab765d4c53bed2437c49d77b3ffe4216.
>
> You need to do
>
> lua_pushcfunction(L, luaT_netbox_request_gc);
>
> only once somewhere in netbox_init() or something. Then
>
> netbox_request_gc_ref = luaL_ref(L, LUA_REGISTRYINDEX);
>
> Then in netbox_perform_async_request() you make
>
> lua_rawgeti(L, LUA_REGISTRYINDEX, netbox_request_gc_ref);
> luaL_setcdatagc(L, -2);
>
> I think this should help, though I doubt it will help much.
It does help, but the userdata implementation of the future object still
performs better:

 * Userdata (original implementation)

   REPLACE: WALL 254.037 PROC 424.737 KRPS
   UPDATE: WALL 193.136 PROC 323.086 KRPS
   SELECT: WALL 354.040 PROC 428.211 KRPS
   CALL: WALL 275.215 PROC 449.326 KRPS

 * Cdata; set gc function using lua_pushcfunction

   REPLACE: WALL 216.516 PROC 334.610 KRPS
   UPDATE: WALL 180.824 PROC 290.090 KRPS
   SELECT: WALL 261.858 PROC 301.944 KRPS
   CALL: WALL 244.465 PROC 372.902 KRPS

 * Cdata; set gc function using lua_rawgeti

   REPLACE: WALL 237.665 PROC 381.624 KRPS
   UPDATE: WALL 184.152 PROC 298.147 KRPS
   SELECT: WALL 299.181 PROC 350.282 KRPS
   CALL: WALL 263.745 PROC 419.355 KRPS

The patches turning the future object into cdata are available on
the following branch:

https://github.com/tarantool/tarantool/tree/vdavydov/net-box-optimization-cdata-request
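
To put the WALL numbers above in relative terms, here is a rough sketch
that computes how much throughput each cdata variant loses against the
userdata baseline. The figures are copied from the results in this
message; the names slowdown_pct and print_wall_slowdowns are made up for
illustration and are not part of the patch set.

```c
#include <stdio.h>

/* WALL throughput in KRPS, copied from the results in this message. */
#define N_OPS 4

static const char *const op_names[N_OPS] = {
	"REPLACE", "UPDATE", "SELECT", "CALL"
};
static const double userdata_wall[N_OPS] = {
	254.037, 193.136, 354.040, 275.215
};
static const double cdata_pushcfunction_wall[N_OPS] = {
	216.516, 180.824, 261.858, 244.465
};
static const double cdata_rawgeti_wall[N_OPS] = {
	237.665, 184.152, 299.181, 263.745
};

/* Throughput lost relative to the baseline, in percent (hypothetical helper). */
double
slowdown_pct(double baseline, double value)
{
	return (baseline - value) / baseline * 100.0;
}

/* Print one line per operation with the loss of both cdata variants. */
void
print_wall_slowdowns(FILE *out)
{
	for (int i = 0; i < N_OPS; i++) {
		fprintf(out, "%-8s pushcfunction: %5.1f%%  rawgeti: %5.1f%%\n",
			op_names[i],
			slowdown_pct(userdata_wall[i],
				     cdata_pushcfunction_wall[i]),
			slowdown_pct(userdata_wall[i],
				     cdata_rawgeti_wall[i]));
	}
}
```

Calling print_wall_slowdowns(stdout) from main() prints the per-operation
loss. It covers only the WALL column of a single run, so treat the
percentages as indicative rather than precise.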