[Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C

Vladimir Davydov vdavydov at tarantool.org
Fri Jul 30 13:04:12 MSK 2021


On Fri, Jul 30, 2021 at 12:38:52AM +0200, Vladislav Shpilevoy wrote:
> >>>> Asynchronous calls don't show as much of an improvement as synchronous,
> >>>> because per each asynchronous call we still have to create a 'future'
> >>>> object in Lua. Still, the improvement is quite noticeable - 30% for
> >>>> REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.
> >>>
> >>> I didn't reach the end of the patchset yet, but did you try to create
> >>> the futures as cdata objects? They could be allocated on mempool, their
> >>> GC pressure might be optimized by doing similar to luaT_tuple_encode_table_ref
> >>> optimization (there was found a way to make __gc and other C functions
> >>> cheaper when it comes to amount of GC objects in Lua).
> >>>
> >>> The API would stay the same, they just would become C structs with
> >>> methods instead of Lua tables.
> >>
> >> Good call. Going to try that. Thanks.
> > 
> > Quickly whipped up a patch that converts userdata to cdata. Applied on
> > top of the series. Surprisingly, it only made things worse:
> > 
> > With the patch (future is cdata):
> > 
> > ==== FUTURE ====
> >   REPLACE: WALL 221.182 PROC 343.368 KRPS
> >    UPDATE: WALL 178.918 PROC 291.504 KRPS
> >    SELECT: WALL 220.815 PROC 248.843 KRPS
> >      CALL: WALL 218.313 PROC 315.670 KRPS
> > 
> > 
> > Without the patch (future is userdata):
> > 
> > ==== FUTURE ====
> >   REPLACE: WALL 262.454 PROC 450.425 KRPS
> >    UPDATE: WALL 191.538 PROC 322.888 KRPS
> >    SELECT: WALL 288.498 PROC 333.393 KRPS
> >      CALL: WALL 247.463 PROC 375.180 KRPS
> > 
> > The patch is below. Note, it isn't entirely correct - future:pairs
> > doesn't work, because luaL_checkcdata doesn't seem to handle upvalues,
> > but it shouldn't affect the test.
> 
> Sorry, the patch can't be applied on top of the branch for some reason.
> 
> But I see now that you already had the futures as C objects even
> before my proposal on top of the branch, so perhaps it is to be
> expected that not much improved.
> 
> However there is an idea which might make the perf gap smaller.
> 
> > @@ -1642,10 +1649,14 @@ netbox_perform_async_request_impl(struct lua_State *L, int idx,
> >  static int
> >  netbox_perform_async_request(struct lua_State *L)
> >  {
> > -	struct netbox_request *request = lua_newuserdata(L, sizeof(*request));
> > +	struct netbox_request *request = mempool_alloc(&netbox_request_pool);
> > +	if (request == NULL)
> > +		return luaL_error(L, "out of memory");
> >  	netbox_request_create(request);
> > -	luaL_getmetatable(L, netbox_request_typename);
> > -	lua_setmetatable(L, -2);
> > +	*(struct netbox_request **)
> > +		luaL_pushcdata(L, CTID_STRUCT_NETBOX_REQUEST_PTR) = request;
> > +	lua_pushcfunction(L, luaT_netbox_request_gc);
> 
> Here is the problem, which I mentioned in my email.
> 
> lua_pushcfunction() is an expensive thing because it pushes a new GC
> object on the stack every time. It was happening in a few hot places before,
> and there is now a solution: push such a cfunction only once, remember
> its ref like luaT_tuple_encode_table_ref does, and then re-push the same
> finalizer for each cdata object.
> 
> See ec9a7fa7ebbb8fd6e15b9516875c3fd1a1f6dfee and
> e88c0d21ab765d4c53bed2437c49d77b3ffe4216.
> 
> You need to do
> 
> 	lua_pushcfunction(L, luaT_netbox_request_gc);
> 
> only once somewhere in netbox_init() or something. Then 
> 
> 	netbox_request_gc_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> 
> Then in netbox_perform_async_request() you make
> 
> 	lua_rawgeti(L, LUA_REGISTRYINDEX, netbox_request_gc_ref);
> 	luaL_setcdatagc(L, -2);
> 
> I think this should help, though I doubt it will help
> much.

It does help, but the userdata implementation of the future object still
performs better:

* Userdata (original implementation)

  REPLACE: WALL 254.037 PROC 424.737 KRPS
   UPDATE: WALL 193.136 PROC 323.086 KRPS
   SELECT: WALL 354.040 PROC 428.211 KRPS
     CALL: WALL 275.215 PROC 449.326 KRPS

* Cdata; set gc function using lua_pushcfunction

  REPLACE: WALL 216.516 PROC 334.610 KRPS
   UPDATE: WALL 180.824 PROC 290.090 KRPS
   SELECT: WALL 261.858 PROC 301.944 KRPS
     CALL: WALL 244.465 PROC 372.902 KRPS

* Cdata; set gc function using lua_rawgeti

  REPLACE: WALL 237.665 PROC 381.624 KRPS
   UPDATE: WALL 184.152 PROC 298.147 KRPS
   SELECT: WALL 299.181 PROC 350.282 KRPS
     CALL: WALL 263.745 PROC 419.355 KRPS
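
To put the gap in numbers, here is a minimal sketch (not part of the
patch set) that computes the relative change in WALL throughput against
the userdata baseline. The figures are copied from the three runs above;
the enum, the array names and the helper relative_change_pct() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Request types, in the order the benchmark prints them. */
enum { OP_REPLACE, OP_UPDATE, OP_SELECT, OP_CALL, OP_MAX };

/* WALL throughput in KRPS, copied from the three runs above. */
static const double wall_userdata[OP_MAX] = {
	254.037, 193.136, 354.040, 275.215
};
static const double wall_cdata_pushcfunction[OP_MAX] = {
	216.516, 180.824, 261.858, 244.465
};
static const double wall_cdata_rawgeti[OP_MAX] = {
	237.665, 184.152, 299.181, 263.745
};

/*
 * Hypothetical helper: relative change of 'value' against 'baseline',
 * in percent. A negative result means 'value' is slower than the
 * baseline.
 */
static double
relative_change_pct(double baseline, double value)
{
	assert(baseline > 0.0);
	return (value - baseline) / baseline * 100.0;
}
```

For example, relative_change_pct(wall_userdata[OP_SELECT],
wall_cdata_rawgeti[OP_SELECT]) gives the SELECT gap for the lua_rawgeti
variant. By this measure the cdata future is roughly 6% to 26% slower
than userdata with lua_pushcfunction, and roughly 4% to 15% slower with
lua_rawgeti.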

The patches turning the future object into cdata are available on
the following branch:

https://github.com/tarantool/tarantool/tree/vdavydov/net-box-optimization-cdata-request

