Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable
Date: Wed, 27 Oct 2021 16:16:43 +0300	[thread overview]
Message-ID: <YXlRO8vu3+fjyJEO@root> (raw)
In-Reply-To: <20211027110640.GA8831@tarantool.org>

Igor,

Thanks for the review!

On 27.10.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM with some nits below.
> 
> On 22.10.21, Sergey Kaplun wrote:
> > tuple_bless() uses a tail call to ffi.gc() with return to the caller.
> > This tail call replaces the current (tuple_bless) frame with the frame
> > of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()`
> > to the frame below it aborts the trace recording with the error "NYI:
> > return to lower frame".
> 
> Side note: for the root traces the issue is the same, but the error is
> different.

Yep.

> 
> > 
> > This patch replaces the tail call with using additional local variable
> 
> Minor: You do not replace tail call, but rather don't give an option for
> LuaJIT to emit CALLT. Anyway, just being pedantic, feel free to ignore.

So, the CALLT is replaced with a regular call :). Ignoring.

> 
> > returned to the caller right after.
> > ---
> > 
> > Actually, this patch become possible thanks to Michael Filonenko and his
> > benchmarks of TDG runs with jit.dump() enabled. After analysis of this
> > dump we realize that tuple_bless is not compiled. This uncompiled chunk
> > of code leads to the JIT cancer for all possible workflows that use
> > tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is
> > really trivial, but adds almost x2 improvement of performance for
> > tuple:update()/upsert() scenario. Hope, that this patch will be a
> > stimulus for including benchmarks of our forward products like TDG to
> > routine performance running with the corresponding profilers dumps.
> 
> Kekw, one-liner boosting update/upsert in two times -- nice catch!
> Anyway, please check that your change doesn't affect overall perfomance
> in interpreter mode too.

The new one (without tailcall) is 1% slower:
21.2 sec vs 21.0 sec with jit.off().
This looks like a good trade to me.

> 
> The bad thing in this, that we have no regular Lua benchmarks at all
> (even those you provided below), so we can't watch the effect of such
> changes regularly.

It's true.

> 
> > 
> > Benchmarks:
> > 
> > Before patch:
> > 
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > |            for i = 1, 1e6 do
> > |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > |            end
> > |            local clock = require"clock"
> > |            local S = clock.proc()
> > |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > |            return clock.proc() - S;
> > | ---
> > | - 4.208298872
> > 
> > Upsert: 4.158661731
> > 
> > After patch:
> > 
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > |            for i = 1, 1e6 do
> > |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > |            end
> > |            local clock = require"clock"
> > |            local S = clock.proc()
> > |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > |            return clock.proc() - S;
> > | ---
> > | - 2.357670738
> > 
> > Upsert: 2.334134195
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile
> > 
> >  src/box/lua/tuple.lua | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
> > index fa76f4f7f..73446ab22 100644
> > --- a/src/box/lua/tuple.lua
> > +++ b/src/box/lua/tuple.lua
> > @@ -98,7 +98,14 @@ local tuple_bless = function(tuple)
> >      -- overflow checked by tuple_bless() in C
> >      builtin.box_tuple_ref(tuple)
> >      -- must never fail:
> > -    return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    -- XXX: If we use tail call (instead creating a new frame for
> 
> Typo: s/instead/instead of/.
> 
> > +    -- a call just replace the top one) here, then JIT tries
> 
> Minor: I see "replace" for the second time, but LuaJIT just "use" the
> caller frame for callee. I propose to s/replace/use/g, but this is
> neglible, so feel free to ignore.
> 
> > +    -- to compile return from `ffi.gc()` to the frame below. This
> > +    -- abort the trace recording with the error "NYI: return to
> 
> Typo: s/abort/aborts/.

Fixed your comments. See the iterative patch below.
Branch is force-pushed.

===================================================================
diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
index 1201c7c34..f47b5926d 100644
--- a/src/box/lua/tuple.lua
+++ b/src/box/lua/tuple.lua
@@ -98,10 +98,10 @@ local tuple_bless = function(tuple)
     -- overflow checked by tuple_bless() in C
     builtin.box_tuple_ref(tuple)
     -- must never fail:
-    -- XXX: If we use tail call (instead creating a new frame for
-    -- a call just replace the top one) here, then JIT tries
-    -- to compile return from `ffi.gc()` to the frame below. This
-    -- abort the trace recording with the error "NYI: return to
+    -- XXX: If we use tail call (instead of creating a new frame
+    -- for a call just use the top one) here, then JIT tries to
+    -- compile return from `ffi.gc()` to the frame below. This
+    -- aborts the trace recording with the error "NYI: return to
     -- lower frame". So avoid tail call and use additional stack
     -- slots (for the local variable and the frame).
     local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
===================================================================

And the new commit message (remove "replace" usage):

===================================================================
tuple: make tuple_bless() compilable

tuple_bless() uses a tail call to ffi.gc() with return to the caller.
This tail call uses the current (tuple_bless) frame instead of creating
the frame for the callee (ffi.gc). When JIT tries to compile return from
`ffi.gc()` to the frame below it aborts the trace recording with the
error "NYI: return to lower frame".

This patch replaces the tail call with using additional local variable
returned to the caller right after.
===================================================================

> 
> > +    -- lower frame". So avoid tail call and use additional stack
> > +    -- slots (for the local variable and the frame).
> > +    local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    return tuple_ref
> 
> Side note: Ugh... I'm sad we're doing things like this one. Complicating
> the code, leaving huge comments with the rationale of such complicating
> to reach the desirable (and what is important, local) performance. I
> propose to spend your innovative time to try solving the problem in the
> JIT engine: it will be more fun and allow us to avoid writing the
> cookbook "How to write super-duper-jittable code in LuaJIT".

Yes, it is ugly workaround. The true way is to resolve problem with
compiling of return to frame below one where trace was started.

> 
> Here is the valid question: what about other hot places with CALLT in
> Tarantool? Should they be considered/fixed? I guess a ticket will help
> to not forget about this problem.

I suppose it should be created within the activity of regular testing our
most valuable products and "boxes".

> 
> Anyway, for now the fix provides the considerable boost, so feel free to
> proceed with the patch.

Thanks!:)

> 
> >  end
> >  
> >  local tuple_check = function(tuple, usage)
> > -- 
> > 2.31.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

  reply	other threads:[~2021-10-27 13:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-22 13:02 Sergey Kaplun via Tarantool-patches
2021-10-27 11:06 ` Igor Munkin via Tarantool-patches
2021-10-27 13:16   ` Sergey Kaplun via Tarantool-patches [this message]
2021-11-08 12:35   ` Nikita Pettik via Tarantool-patches
2021-11-10 12:31 ` Igor Munkin via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXlRO8vu3+fjyJEO@root \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imun@tarantool.org \
    --cc=skaplun@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox