From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] lua: refactor port_lua_do_dump and encode_lua_call
Date: Tue, 29 Jun 2021 10:07:59 +0300
Message-ID: <YNrGzydr84utZH0x@root>
In-Reply-To: <f7828c1c-f878-0c57-d496-e02006b44086@tarantool.org>

Thanks for the questions!

On 24.06.21, Vladislav Shpilevoy wrote:
> Thanks for the investigation!
>
> >>> diff --git a/src/box/lua/call.c b/src/box/lua/call.c
> >>> index 0315e720c..3b2572096 100644
> >>> --- a/src/box/lua/call.c
> >>> +++ b/src/box/lua/call.c
> >>> @@ -450,13 +482,20 @@ port_lua_do_dump(struct port *base, struct mpstream *stream,
> >>>  	struct encode_lua_ctx ctx;
> >>>  	ctx.port = port;
> >>>  	ctx.stream = stream;
> >>> -	struct lua_State *L = tarantool_L;
> >>> -	int top = lua_gettop(L);
> >>> -	if (lua_cpcall(L, handler, &ctx) != 0) {
> >>> -		luaT_toerror(port->L);
> >>> +	lua_State *L = port->L;
> >>> +	/*
> >>> +	 * At the moment Lua stack holds only values to encode.
> >>> +	 * Insert corresponding encoder to the bottom and push
> >>> +	 * encode context as lightuserdata to the top.
> >>> +	 */
> >>> +	const int size = lua_gettop(L);
> >>> +	lua_rawgeti(L, LUA_REGISTRYINDEX, execute_lua_refs[handler]);
> >>> +	assert(lua_isfunction(L, -1) && lua_iscfunction(L, -1));
> >>> +	lua_insert(L, 1);
> >>
> >> If I remember correctly, this leads to moving all the existing
> >> stack elements forward. Which might be expensive. I know from
> >> Vlad Grubov's words that they have code with hundreds of values in
> >> multireturn from stored functions. Did you bench what happens when
> >> the Lua coroutine contains a lot of values? In the bench by the
> >> link above I see only 1-element array and map. Size of the array
> >> and map does not matter though. Only multireturn is interesting
> >> here. Like 'return 1, 2, 3, ...'.
> >
> > I've added this benchmark (200 numbers to return) to the gist.
> > Local results for the bench:
> >
> > Master:
> > | Encode mul: 237900953 mcs, 12.6 K ps
> >
> > My branch:
> > | Encode mul: 235735350 mcs, 12.7 K ps
> >
> > `luamp_encode()` has the biggest impact in `port_do_lua_dump()`
> > (`lua_insert()` costs ~0.1% of the whole runtime).
>
> That looks suspicious. Why does it have such a small impact?

I've run the benchmark for the return case only with `perf stat` (and
5K return values) and found three interesting metrics (the others are
pretty much the same, within 0.07%):

Master:
| 10,255,748,650  cache-misses      # 13.686 % of all cache refs         (12.49%)
|  1,364,762,310  LLC-load-misses   # 15.92% of all LL-cache accesses    (12.50%)
|    401,136,504  iTLB-load-misses  # 39.68% of all iTLB cache accesses  (12.51%)

My branch:
|  9,964,939,197  cache-misses      # 13.413 % of all cache refs         (12.52%)
|  1,296,545,699  LLC-load-misses   # 15.08% of all LL-cache accesses    (12.50%)
|    285,852,865  iTLB-load-misses  # 29.13% of all iTLB cache accesses  (12.49%)

1) LLC (last-level cache, L2 in my case) misses decrease slightly. I
   suppose the element moves done by the insert hit the cache (L1 or
   L2) instead of going to memory, which makes them rather cheap.
2) iTLB (instruction translation lookaside buffer) [1] misses drop
   noticeably, so fewer instruction fetches pay the penalty of a page
   walk through the page table.

So it looks like code locality is improved with this patch (possibly
due to using `luaT_call()` instead of `lua_cpcall()` and
`luaT_toerror()`).
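For reference, here is a minimal standalone sketch of the stack shuffle
being discussed, using only the plain Lua C API: `lua_pushcfunction()`
and `lua_pcall()` stand in for the registry-cached reference and
`luaT_call()` used in the patch, and `struct encode_ctx` is a
hypothetical placeholder for `struct encode_lua_ctx`.

/*
 * Sketch under the assumptions above: the coroutine stack holds only
 * the values to encode; the handler is rotated to the stack bottom and
 * the encode context is pushed on top as a lightuserdata, so the
 * handler receives everything in between as its arguments.
 */
#include <lua.h>

struct encode_ctx;	/* Hypothetical stand-in for struct encode_lua_ctx. */

static int
dump_with_handler(struct lua_State *L, lua_CFunction handler,
		  struct encode_ctx *ctx)
{
	/* The stack currently holds only the values to encode. */
	const int size = lua_gettop(L);
	/* Push the encoder and move it to the bottom of the stack. */
	lua_pushcfunction(L, handler);
	lua_insert(L, 1);	/* Shifts all `size` values up by one slot. */
	/* The context travels on top as a lightuserdata argument. */
	lua_pushlightuserdata(L, (void *)ctx);
	/* handler(values..., ctx) under a protected call; 0 on success. */
	return lua_pcall(L, size + 1, 0, 0);
}

The `lua_insert(L, 1)` call here is exactly the linear-cost element
move discussed below.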
> Did you check that multireturn really made the stack bigger?

Yes, I did.

> Shouldn't it have linear complexity depending on the stack size after
> the target index?

Yes, with 5K items to return it becomes about equal to, or even a
little bit slower (up to ~1%) than, the original code. But encoding
still takes the biggest part of the runtime.

Side note: if a user wants to return 5K values (or a table with 5K
values) from Tarantool, we can't stop them; we can only offer our
empathy.

> The patch looks fine to me, if the benchmarks show there is even a
> slight perf win. But I think Igor should take another look.

[1]: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/front-end-bound/itlb-overhead.html

--
Best regards,
Sergey Kaplun