[Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable
    Sergey Kaplun 
    skaplun at tarantool.org
       
    Wed Oct 27 16:16:43 MSK 2021
    
    
  
Igor,
Thanks for the review!
On 27.10.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM with some nits below.
> 
> On 22.10.21, Sergey Kaplun wrote:
> > tuple_bless() uses a tail call to ffi.gc() with return to the caller.
> > This tail call replaces the current (tuple_bless) frame with the frame
> > of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()`
> > to the frame below it aborts the trace recording with the error "NYI:
> > return to lower frame".
> 
> Side note: for the root traces the issue is the same, but the error is
> different.
Yep.
> 
> > 
> > This patch replaces the tail call with using additional local variable
> 
> Minor: You do not replace tail call, but rather don't give an option for
> LuaJIT to emit CALLT. Anyway, just being pedantic, feel free to ignore.
So, the CALLT is replaced with a regular call :). Ignoring.
> 
> > returned to the caller right after.
> > ---
> > 
> > Actually, this patch become possible thanks to Michael Filonenko and his
> > benchmarks of TDG runs with jit.dump() enabled. After analysis of this
> > dump we realize that tuple_bless is not compiled. This uncompiled chunk
> > of code leads to the JIT cancer for all possible workflows that use
> > tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is
> > really trivial, but adds almost x2 improvement of performance for
> > tuple:update()/upsert() scenario. Hope, that this patch will be a
> > stimulus for including benchmarks of our forward products like TDG to
> > routine performance running with the corresponding profilers dumps.
> 
> Kekw, one-liner boosting update/upsert in two times -- nice catch!
> Anyway, please check that your change doesn't affect overall perfomance
> in interpreter mode too.
The new one (without tailcall) is 1% slower:
21.2 sec vs 21.0 sec with jit.off().
This looks like a good trade to me.
> 
> The bad thing in this, that we have no regular Lua benchmarks at all
> (even those you provided below), so we can't watch the effect of such
> changes regularly.
It's true.
> 
> > 
> > Benchmarks:
> > 
> > Before patch:
> > 
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > |            for i = 1, 1e6 do
> > |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > |            end
> > |            local clock = require"clock"
> > |            local S = clock.proc()
> > |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > |            return clock.proc() - S;
> > | ---
> > | - 4.208298872
> > 
> > Upsert: 4.158661731
> > 
> > After patch:
> > 
> > Update:
> > | Tarantool 2.10.0-beta1-90-g31594b427
> > | type 'help' for interactive help
> > | tarantool> local t = {}
> > |            for i = 1, 1e6 do
> > |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> > |            end
> > |            local clock = require"clock"
> > |            local S = clock.proc()
> > |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> > |            return clock.proc() - S;
> > | ---
> > | - 2.357670738
> > 
> > Upsert: 2.334134195
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile
> > 
> >  src/box/lua/tuple.lua | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
> > index fa76f4f7f..73446ab22 100644
> > --- a/src/box/lua/tuple.lua
> > +++ b/src/box/lua/tuple.lua
> > @@ -98,7 +98,14 @@ local tuple_bless = function(tuple)
> >      -- overflow checked by tuple_bless() in C
> >      builtin.box_tuple_ref(tuple)
> >      -- must never fail:
> > -    return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    -- XXX: If we use tail call (instead creating a new frame for
> 
> Typo: s/instead/instead of/.
> 
> > +    -- a call just replace the top one) here, then JIT tries
> 
> Minor: I see "replace" for the second time, but LuaJIT just "use" the
> caller frame for callee. I propose to s/replace/use/g, but this is
> neglible, so feel free to ignore.
> 
> > +    -- to compile return from `ffi.gc()` to the frame below. This
> > +    -- abort the trace recording with the error "NYI: return to
> 
> Typo: s/abort/aborts/.
Fixed your comments. See the iterative patch below.
Branch is force-pushed.
===================================================================
diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
index 1201c7c34..f47b5926d 100644
--- a/src/box/lua/tuple.lua
+++ b/src/box/lua/tuple.lua
@@ -98,10 +98,10 @@ local tuple_bless = function(tuple)
     -- overflow checked by tuple_bless() in C
     builtin.box_tuple_ref(tuple)
     -- must never fail:
-    -- XXX: If we use tail call (instead creating a new frame for
-    -- a call just replace the top one) here, then JIT tries
-    -- to compile return from `ffi.gc()` to the frame below. This
-    -- abort the trace recording with the error "NYI: return to
+    -- XXX: If we use tail call (instead of creating a new frame
+    -- for a call just use the top one) here, then JIT tries to
+    -- compile return from `ffi.gc()` to the frame below. This
+    -- aborts the trace recording with the error "NYI: return to
     -- lower frame". So avoid tail call and use additional stack
     -- slots (for the local variable and the frame).
     local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
===================================================================
And the new commit message (remove "replace" usage):
===================================================================
tuple: make tuple_bless() compilable
tuple_bless() uses a tail call to ffi.gc() with return to the caller.
This tail call uses the current (tuple_bless) frame instead of creating
the frame for the callee (ffi.gc). When JIT tries to compile return from
`ffi.gc()` to the frame below it aborts the trace recording with the
error "NYI: return to lower frame".
This patch replaces the tail call with using additional local variable
returned to the caller right after.
===================================================================
> 
> > +    -- lower frame". So avoid tail call and use additional stack
> > +    -- slots (for the local variable and the frame).
> > +    local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> > +    return tuple_ref
> 
> Side note: Ugh... I'm sad we're doing things like this one. Complicating
> the code, leaving huge comments with the rationale of such complicating
> to reach the desirable (and what is important, local) performance. I
> propose to spend your innovative time to try solving the problem in the
> JIT engine: it will be more fun and allow us to avoid writing the
> cookbook "How to write super-duper-jittable code in LuaJIT".
Yes, it is ugly workaround. The true way is to resolve problem with
compiling of return to frame below one where trace was started.
> 
> Here is the valid question: what about other hot places with CALLT in
> Tarantool? Should they be considered/fixed? I guess a ticket will help
> to not forget about this problem.
I suppose it should be created within the activity of regular testing our
most valuable products and "boxes".
> 
> Anyway, for now the fix provides the considerable boost, so feel free to
> proceed with the patch.
Thanks!:)
> 
> >  end
> >  
> >  local tuple_check = function(tuple, usage)
> > -- 
> > 2.31.0
> > 
> 
> -- 
> Best regards,
> IM
-- 
Best regards,
Sergey Kaplun
    
    
More information about the Tarantool-patches
mailing list