From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
Date: Tue, 29 Jun 2021 10:07:59 +0300
Subject: Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
References: <20210618181416.25454-1-skaplun@tarantool.org> <67687e87-fb0e-1477-ac76-5fa9b04b9e3a@tarantool.org>
List-Id: Tarantool development patches

Thanks for the questions!

On 24.06.21, Vladislav Shpilevoy wrote:
> Thanks for the investigation!
>
> >>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> >>> index 0315e720c..3b2572096 100644
> >>> --- a/src/box/lua/call.c
> >>> +++ b/src/box/lua/call.c
> >>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >>>  	struct encode_lua_ctx ctx;
> >>>  	ctx.port = port;
> >>>  	ctx.stream = stream;
> >>> -	struct lua_State *L = tarantool_L;
> >>> -	int top = lua_gettop(L);
> >>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> >>> -		luaT_toerror(port->L);
> >>> +	lua_State *L = port->L;
> >>> +	/*
> >>> +	 * At the moment Lua stack holds only values to encode.
> >>> +	 * Insert corresponding encoder to the bottom and push
> >>> +	 * encode context as lightuserdata to the top.
> >>> +	 */
> >>> +	const int size = lua_gettop(L);
> >>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> >>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> >>> +	lua_insert(L, 1);
> >>
> >> If I remember correctly, this leads to moving all the existing
> >> stack elements forward. Which might be expensive. I know from
> >> Vlad Grubov's words that they have code with hundreds of values in
> >> multireturn from stored functions. Did you bench what happens when
> >> the Lua coroutine contains a lot of values? In the bench by the
> >> link above I see only 1-element array and map. Size of the array
> >> and map does not matter though. Only multireturn is interesting
> >> here. Like 'return 1, 2, 3, ...'.
> >
> > I've added this benchmark (200 numbers to return) to the gist.
> > Local results for the bench:
> >
> > Master:
> > | Encode mul: 237900953 mcs, 12.6 K ps
> >
> > My branch:
> > | Encode mul: 235735350 mcs, 12.7 K ps
> >
> > `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> > (`lua_insert()` costs ~0.1% of the whole runtime).
>
> That looks suspicious. Why does it have such a small impact? Did

I've run the benchmark for the return case only under `perf stat` (with
5K return values) and found three interesting metrics (the others are
nearly the same, differing by at most 0.07%):

Master:
| 10,255,748,650  cache-misses      # 13.686 % of all cache refs         (12.49%)
|  1,364,762,310  LLC-load-misses   # 15.92% of all LL-cache accesses    (12.50%)
|    401,136,504  iTLB-load-misses  # 39.68% of all iTLB cache accesses  (12.51%)

My branch:
|  9,964,939,197  cache-misses      # 13.413 % of all cache refs         (12.52%)
|  1,296,545,699  LLC-load-misses   # 15.08% of all LL-cache accesses    (12.50%)
|    285,852,865  iTLB-load-misses  # 29.13% of all iTLB cache accesses  (12.49%)

1) LLC (last-level cache, L2 in my case) misses decrease slightly.
I suppose that all the element moves done by the insert hit the cache
(L1 or L2) instead of going to main memory, which makes them rather
cheap.

2) The iTLB (instruction translation lookaside buffer) [1] exists to
avoid the penalty of page walks through the page table. So it looks like
code locality is improved by this patch (maybe because `luaT_call()` is
used instead of `lua_cpcall()` and `luaT_toerror()`).

> you check that multireturn really made the stack bigger? Shouldn't

Yes, I did.

> it have linear complexity depending on the stack size after the
> target index?

Yes, with 5K items to return it becomes about equal to, or even a little
slower than (up to ~1%), the original code. But encoding still takes the
biggest part of the runtime.

Side note: if a user wants to return 5K values (or a table with 5K
values) from Tarantool, we can't stop them; we can only offer our
sympathy.

>
> The patch looks fine to me, if the benchmarks show there is even a
> slight perf win. But I think Igor should take another look.

[1]: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/front-end-bound/itlb-overhead.html

-- 
Best regards,
Sergey Kaplun
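
For reference, the linear cost discussed above can be illustrated with a toy model. This is only a sketch and not code from the patch, Tarantool, or LuaJIT: `struct toy_stack`, `toy_push()` and `toy_insert()` are hypothetical names, and the stack holds plain ints instead of Lua values. It assumes only that inserting at index 1 shifts every existing slot up by one, which is the behaviour the review question is about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a value stack; not Tarantool or LuaJIT code. */
enum { TOY_STACK_MAX = 8192 };

struct toy_stack {
	int slots[TOY_STACK_MAX];
	size_t top; /* Number of occupied slots. */
};

/* Push a value onto the top of the stack. */
static void
toy_push(struct toy_stack *s, int value)
{
	assert(s->top < TOY_STACK_MAX);
	s->slots[s->top++] = value;
}

/*
 * Move the top element to the 1-based position idx, shifting every
 * element at or above idx up by one slot. Returns the number of
 * shifted slots, which is (top - idx): linear in the stack size
 * after the target index.
 */
static size_t
toy_insert(struct toy_stack *s, size_t idx)
{
	assert(idx >= 1 && idx <= s->top);
	int value = s->slots[s->top - 1];
	size_t moved = 0;
	for (size_t i = s->top - 1; i >= idx; i--) {
		s->slots[i] = s->slots[i - 1];
		moved++;
	}
	s->slots[idx - 1] = value;
	return moved;
}
```

With 5000 values pushed followed by one extra element standing in for the encoder, `toy_insert(&s, 1)` shifts 5000 slots. Each shift is a sequential copy of one small slot, which fits the observation that the moves stay cheap relative to encoding.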