From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Igor Munkin <imun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable Date: Wed, 27 Oct 2021 16:16:43 +0300 [thread overview] Message-ID: <YXlRO8vu3+fjyJEO@root> (raw) In-Reply-To: <20211027110640.GA8831@tarantool.org> Igor, Thanks for the review! On 27.10.21, Igor Munkin wrote: > Sergey, > > Thanks for the patch! LGTM with some nits below. > > On 22.10.21, Sergey Kaplun wrote: > > tuple_bless() uses a tail call to ffi.gc() with return to the caller. > > This tail call replaces the current (tuple_bless) frame with the frame > > of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()` > > to the frame below it aborts the trace recording with the error "NYI: > > return to lower frame". > > Side note: for the root traces the issue is the same, but the error is > different. Yep. > > > > > This patch replaces the tail call with using additional local variable > > Minor: You do not replace tail call, but rather don't give an option for > LuaJIT to emit CALLT. Anyway, just being pedantic, feel free to ignore. So, the CALLT is replaced with a regular call :). Ignoring. > > > returned to the caller right after. > > --- > > > > Actually, this patch become possible thanks to Michael Filonenko and his > > benchmarks of TDG runs with jit.dump() enabled. After analysis of this > > dump we realize that tuple_bless is not compiled. This uncompiled chunk > > of code leads to the JIT cancer for all possible workflows that use > > tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is > > really trivial, but adds almost x2 improvement of performance for > > tuple:update()/upsert() scenario. Hope, that this patch will be a > > stimulus for including benchmarks of our forward products like TDG to > > routine performance running with the corresponding profilers dumps. > > Kekw, one-liner boosting update/upsert in two times -- nice catch! > Anyway, please check that your change doesn't affect overall perfomance > in interpreter mode too. The new one (without tailcall) is 1% slower: 21.2 sec vs 21.0 sec with jit.off(). This looks like a good trade to me. > > The bad thing in this, that we have no regular Lua benchmarks at all > (even those you provided below), so we can't watch the effect of such > changes regularly. It's true. > > > > > Benchmarks: > > > > Before patch: > > > > Update: > > | Tarantool 2.10.0-beta1-90-g31594b427 > > | type 'help' for interactive help > > | tarantool> local t = {} > > | for i = 1, 1e6 do > > | table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'}) > > | end > > | local clock = require"clock" > > | local S = clock.proc() > > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end > > | return clock.proc() - S; > > | --- > > | - 4.208298872 > > > > Upsert: 4.158661731 > > > > After patch: > > > > Update: > > | Tarantool 2.10.0-beta1-90-g31594b427 > > | type 'help' for interactive help > > | tarantool> local t = {} > > | for i = 1, 1e6 do > > | table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'}) > > | end > > | local clock = require"clock" > > | local S = clock.proc() > > | for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end > > | return clock.proc() - S; > > | --- > > | - 2.357670738 > > > > Upsert: 2.334134195 > > > > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile > > > > src/box/lua/tuple.lua | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > > index fa76f4f7f..73446ab22 100644 > > --- a/src/box/lua/tuple.lua > > +++ b/src/box/lua/tuple.lua > > @@ -98,7 +98,14 @@ local tuple_bless = function(tuple) > > -- overflow checked by tuple_bless() in C > > builtin.box_tuple_ref(tuple) > > -- must never fail: > > - return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc) > > + -- XXX: If we use tail call (instead creating a new frame for > > Typo: s/instead/instead of/. > > > + -- a call just replace the top one) here, then JIT tries > > Minor: I see "replace" for the second time, but LuaJIT just "use" the > caller frame for callee. I propose to s/replace/use/g, but this is > neglible, so feel free to ignore. > > > + -- to compile return from `ffi.gc()` to the frame below. This > > + -- abort the trace recording with the error "NYI: return to > > Typo: s/abort/aborts/. Fixed your comments. See the iterative patch below. Branch is force-pushed. =================================================================== diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua index 1201c7c34..f47b5926d 100644 --- a/src/box/lua/tuple.lua +++ b/src/box/lua/tuple.lua @@ -98,10 +98,10 @@ local tuple_bless = function(tuple) -- overflow checked by tuple_bless() in C builtin.box_tuple_ref(tuple) -- must never fail: - -- XXX: If we use tail call (instead creating a new frame for - -- a call just replace the top one) here, then JIT tries - -- to compile return from `ffi.gc()` to the frame below. This - -- abort the trace recording with the error "NYI: return to + -- XXX: If we use tail call (instead of creating a new frame + -- for a call just use the top one) here, then JIT tries to + -- compile return from `ffi.gc()` to the frame below. This + -- aborts the trace recording with the error "NYI: return to -- lower frame". So avoid tail call and use additional stack -- slots (for the local variable and the frame). local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc) =================================================================== And the new commit message (remove "replace" usage): =================================================================== tuple: make tuple_bless() compilable tuple_bless() uses a tail call to ffi.gc() with return to the caller. This tail call uses the current (tuple_bless) frame instead of creating the frame for the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()` to the frame below it aborts the trace recording with the error "NYI: return to lower frame". This patch replaces the tail call with using additional local variable returned to the caller right after. =================================================================== > > > + -- lower frame". So avoid tail call and use additional stack > > + -- slots (for the local variable and the frame). > > + local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc) > > + return tuple_ref > > Side note: Ugh... I'm sad we're doing things like this one. Complicating > the code, leaving huge comments with the rationale of such complicating > to reach the desirable (and what is important, local) performance. I > propose to spend your innovative time to try solving the problem in the > JIT engine: it will be more fun and allow us to avoid writing the > cookbook "How to write super-duper-jittable code in LuaJIT". Yes, it is ugly workaround. The true way is to resolve problem with compiling of return to frame below one where trace was started. > > Here is the valid question: what about other hot places with CALLT in > Tarantool? Should they be considered/fixed? I guess a ticket will help > to not forget about this problem. I suppose it should be created within the activity of regular testing our most valuable products and "boxes". > > Anyway, for now the fix provides the considerable boost, so feel free to > proceed with the patch. Thanks!:) > > > end > > > > local tuple_check = function(tuple, usage) > > -- > > 2.31.0 > > > > -- > Best regards, > IM -- Best regards, Sergey Kaplun
next prev parent reply other threads:[~2021-10-27 13:18 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-10-22 13:02 Sergey Kaplun via Tarantool-patches 2021-10-27 11:06 ` Igor Munkin via Tarantool-patches 2021-10-27 13:16 ` Sergey Kaplun via Tarantool-patches [this message] 2021-11-08 12:35 ` Nikita Pettik via Tarantool-patches 2021-11-10 12:31 ` Igor Munkin via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=YXlRO8vu3+fjyJEO@root \ --to=tarantool-patches@dev.tarantool.org \ --cc=imun@tarantool.org \ --cc=skaplun@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox