From: Vladimir Davydov via Tarantool-patches
Reply-To: Vladimir Davydov
Date: Fri, 30 Jul 2021 13:04:12 +0300
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 00/20] Rewrite performance critical parts of net.box in C
Message-ID: <20210730100412.r3m2hdqnhrzjmjpn@esperanza>
In-Reply-To: <6baa639d-8c94-000d-cd56-9189b3cc7dc9@tarantool.org>

On Fri, Jul 30, 2021 at 12:38:52AM +0200, Vladislav Shpilevoy wrote:
> >>>> Asynchronous calls don't show as much of an improvement as synchronous,
> >>>> because per each asynchronous call we still have to create a 'future'
> >>>> object in Lua. Still, the improvement is quite noticeable - 30% for
> >>>> REPLACE, 10% for UPDATE, 20% for SELECT, 25% for CALL.
> >>>
> >>> I didn't reach the end of the patchset yet, but did you try to create
> >>> the futures as cdata objects? They could be allocated on mempool, their
> >>> GC pressure might be optimized by doing similar to luaT_tuple_encode_table_ref
> >>> optimization (there was found a way to make __gc and other C functions
> >>> cheaper when it comes to amount of GC objects in Lua).
> >>>
> >>> The API would stay the same, they just would become C structs with
> >>> methods instead of Lua tables.
> >>
> >> Good call. Going to try that. Thanks.
> >
> > Quickly whipped up a patch that converts userdata to cdata. Applied on
> > top of the series. Surprisingly, it only made things worse:
> >
> > With the patch (future is cdata):
> >
> > ==== FUTURE ====
> > REPLACE: WALL 221.182 PROC 343.368 KRPS
> > UPDATE:  WALL 178.918 PROC 291.504 KRPS
> > SELECT:  WALL 220.815 PROC 248.843 KRPS
> > CALL:    WALL 218.313 PROC 315.670 KRPS
> >
> >
> > Without the patch (future is userdata):
> >
> > ==== FUTURE ====
> > REPLACE: WALL 262.454 PROC 450.425 KRPS
> > UPDATE:  WALL 191.538 PROC 322.888 KRPS
> > SELECT:  WALL 288.498 PROC 333.393 KRPS
> > CALL:    WALL 247.463 PROC 375.180 KRPS
> >
> > The patch is below. Note, it isn't entirely correct - future:pairs
> > doesn't work, because luaL_checkcdata doesn't seem to handle upvalues,
> > but it shouldn't affect the test.
>
> Sorry, the patch can't be applied on top of the branch for some reason.
>
> But I see now that you already had the futures as C objects even
> before my proposal on top of the branch, so perhaps it is expected
> that not much improved.
>
> However there is an idea which might make the perf gap smaller.
>
> > @@ -1642,10 +1649,14 @@ netbox_perform_async_request_impl(struct lua_State *L, int idx,
> >  static int
> >  netbox_perform_async_request(struct lua_State *L)
> >  {
> > -	struct netbox_request *request = lua_newuserdata(L, sizeof(*request));
> > +	struct netbox_request *request = mempool_alloc(&netbox_request_pool);
> > +	if (request == NULL)
> > +		return luaL_error(L, "out of memory");
> >  	netbox_request_create(request);
> > -	luaL_getmetatable(L, netbox_request_typename);
> > -	lua_setmetatable(L, -2);
> > +	*(struct netbox_request **)
> > +		luaL_pushcdata(L, CTID_STRUCT_NETBOX_REQUEST_PTR) = request;
> > +	lua_pushcfunction(L, luaT_netbox_request_gc);
>
> Here is the problem, which I mentioned in my email.
>
> lua_pushcfunction() is an expensive thing because it pushes a new GC
> object on the stack every time. It was happening in a few hot places before,
> and there is now a solution that for such cfunctions we push them only once,
> remember their ref like luaT_tuple_encode_table_ref does, and then re-push
> the same finalizer for each cdata object.
>
> See ec9a7fa7ebbb8fd6e15b9516875c3fd1a1f6dfee and
> e88c0d21ab765d4c53bed2437c49d77b3ffe4216.
>
> You need to do
>
> 	lua_pushcfunction(L, luaT_netbox_request_gc);
>
> only once somewhere in netbox_init() or something. Then
>
> 	netbox_request_gc_ref = luaL_ref(L, LUA_REGISTRYINDEX);
>
> Then in netbox_perform_async_request() you make
>
> 	lua_rawgeti(L, LUA_REGISTRYINDEX, netbox_request_gc_ref);
> 	luaL_setcdatagc(L, -2);
>
> I think this should help. But I doubt it will help too much
> though.
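
To make the suggestion quoted above concrete, here is a toy model in
plain C of the "push once, remember the ref, re-push the same finalizer"
idea. It does not use the Lua C API or any Tarantool code: every toy_*
name is hypothetical and exists only for illustration, and the counter
merely stands in for the number of GC objects that lua_pushcfunction()
would create.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GC-managed function object. */
struct toy_gc_func {
	void (*fn)(void *);
};

/* Total number of function objects allocated so far. */
static size_t toy_gc_func_count;

/* Models lua_pushcfunction(): allocates a new object on every call. */
struct toy_gc_func *
toy_push_cfunction(void (*fn)(void *))
{
	struct toy_gc_func *f = malloc(sizeof(*f));
	if (f == NULL)
		abort();
	f->fn = fn;
	toy_gc_func_count++;
	return f;
}

/* Hypothetical finalizer, standing in for luaT_netbox_request_gc. */
void
toy_request_gc(void *request)
{
	(void)request;
}

/* Models the reference that luaL_ref() would store at init time. */
static struct toy_gc_func *toy_request_gc_ref;

/* Variant 1: create a new finalizer object for every request. */
size_t
toy_run_push_each_time(int request_count)
{
	assert(request_count >= 0);
	size_t before = toy_gc_func_count;
	for (int i = 0; i < request_count; i++) {
		struct toy_gc_func *f = toy_push_cfunction(toy_request_gc);
		f->fn(NULL);
		free(f); /* a real GC would collect it some time later */
	}
	return toy_gc_func_count - before;
}

/* Variant 2: create the finalizer object once, then reuse it. */
size_t
toy_run_reuse_ref(int request_count)
{
	assert(request_count >= 0);
	size_t before = toy_gc_func_count;
	/* Kept for the lifetime of the program, like a registry ref. */
	if (toy_request_gc_ref == NULL)
		toy_request_gc_ref = toy_push_cfunction(toy_request_gc);
	for (int i = 0; i < request_count; i++)
		toy_request_gc_ref->fn(NULL); /* models the lua_rawgeti() re-push */
	return toy_gc_func_count - before;
}
```

Calling toy_run_push_each_time(n) reports n new objects, while
toy_run_reuse_ref(n) reports at most one, which is the saving the
suggestion is after. The model says nothing about how cdata and userdata
compare otherwise.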

It does help, but the userdata implementation of the future object still
performs better:

 * Userdata (original implementation)

   REPLACE: WALL 254.037 PROC 424.737 KRPS
   UPDATE:  WALL 193.136 PROC 323.086 KRPS
   SELECT:  WALL 354.040 PROC 428.211 KRPS
   CALL:    WALL 275.215 PROC 449.326 KRPS

 * Cdata; set gc function using lua_pushcfunction

   REPLACE: WALL 216.516 PROC 334.610 KRPS
   UPDATE:  WALL 180.824 PROC 290.090 KRPS
   SELECT:  WALL 261.858 PROC 301.944 KRPS
   CALL:    WALL 244.465 PROC 372.902 KRPS

 * Cdata; set gc function using lua_rawgeti

   REPLACE: WALL 237.665 PROC 381.624 KRPS
   UPDATE:  WALL 184.152 PROC 298.147 KRPS
   SELECT:  WALL 299.181 PROC 350.282 KRPS
   CALL:    WALL 263.745 PROC 419.355 KRPS

The patches turning the future object into cdata are available on the
following branch:

https://github.com/tarantool/tarantool/tree/vdavydov/net-box-optimization-cdata-request
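
For a rough sense of the remaining gap, the WALL figures above can be
turned into relative differences against the userdata baseline. The
sketch below is only an illustration: the helper names are hypothetical
and not part of the branch, and the numbers are copied from the userdata
and "Cdata; set gc function using lua_rawgeti" results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark row: WALL KRPS for the two implementations. */
struct bench_row {
	const char *op;
	double userdata_wall;
	double cdata_wall;
};

/* Relative change of `value` against `base`, in percent. */
double
relative_change_pct(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}

/* Print how cdata (lua_rawgeti variant) compares to userdata. */
void
print_wall_deltas(void)
{
	static const struct bench_row rows[] = {
		{"REPLACE", 254.037, 237.665},
		{"UPDATE", 193.136, 184.152},
		{"SELECT", 354.040, 299.181},
		{"CALL", 275.215, 263.745},
	};
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		printf("%-8s %+.1f%%\n", rows[i].op,
		       relative_change_pct(rows[i].userdata_wall,
					   rows[i].cdata_wall));
	}
}
```

A negative value means the cdata variant handled fewer requests per
second than userdata for that operation.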