[Tarantool-patches] [PATCH] tuple: make tuple_bless() compilable

Igor Munkin imun at tarantool.org
Wed Oct 27 14:06:40 MSK 2021


Sergey,

Thanks for the patch! LGTM with some nits below.

On 22.10.21, Sergey Kaplun wrote:
> tuple_bless() uses a tail call to ffi.gc() with return to the caller.
> This tail call replaces the current (tuple_bless) frame with the frame
> of the callee (ffi.gc). When JIT tries to compile return from `ffi.gc()`
> to the frame below it aborts the trace recording with the error "NYI:
> return to lower frame".

Side note: for root traces the issue is the same, but the error is
different.

> 
> This patch replaces the tail call with using additional local variable

Minor: You do not replace the tail call; rather, you give LuaJIT no
chance to emit CALLT. Anyway, just being pedantic, feel free to ignore.
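
To illustrate the distinction, here is a minimal sketch in plain Lua. The
names callee, with_tail_call and without_tail_call are hypothetical and
are not part of tuple.lua; debug.getinfo() is used only to show which
frame sits below the callee in each case.

```lua
-- Hypothetical names for illustration; not taken from tuple.lua.
local frame_below

local function callee(x)
    -- Level 1 is callee itself; level 2 is the frame below it.
    local info = debug.getinfo(2, "f")
    frame_below = info and info.func
    return x + 1
end

-- `return callee(x)` is a tail call: the wrapper's frame is reused,
-- so callee returns straight to the wrapper's caller.
local function with_tail_call(x)
    return callee(x)
end

-- Storing the result in a local first forces an ordinary call,
-- so callee returns to the wrapper's own frame.
local function without_tail_call(x)
    local result = callee(x)
    return result
end

local a = with_tail_call(1)
local tail_hides_wrapper = frame_below ~= with_tail_call
local b = without_tail_call(1)
local plain_keeps_wrapper = frame_below == without_tail_call
print(a, b, tail_hides_wrapper, plain_keeps_wrapper)
```

Both wrappers return the same value; only the second keeps its own frame
alive while the callee runs, which is what the patch relies on.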

> returned to the caller right after.
> ---
> 
> Actually, this patch become possible thanks to Michael Filonenko and his
> benchmarks of TDG runs with jit.dump() enabled. After analysis of this
> dump we realize that tuple_bless is not compiled. This uncompiled chunk
> of code leads to the JIT cancer for all possible workflows that use
> tuple_bless() (i.e. tuple:update() and tuple:upsert()). This change is
> really trivial, but adds almost x2 improvement of performance for
> tuple:update()/upsert() scenario. Hope, that this patch will be a
> stimulus for including benchmarks of our forward products like TDG to
> routine performance running with the corresponding profilers dumps.

Kekw, a one-liner that speeds up update/upsert by a factor of two -- nice
catch! Anyway, please check that your change doesn't affect overall
performance in interpreter mode too.

The bad thing here is that we have no regular Lua benchmarks at all
(not even the ones you provided below), so we can't track the effect of
such changes on a regular basis.
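
A rough sketch of how the interpreter-mode check could be done, assuming
the JIT is enabled by default. The bench() helper and the stand-in
workload are hypothetical; jit.on(), jit.off() and jit.flush() are the
standard LuaJIT controls, and os.clock() is used here instead of the
Tarantool clock module so the snippet runs anywhere.

```lua
-- Hypothetical helper: time `workload` with the JIT on or off.
local function bench(workload, use_jit)
    local has_jit = type(jit) == "table"
    if has_jit then
        if use_jit then jit.on() else jit.off() end
        jit.flush()  -- drop traces compiled by earlier runs
    end
    local start = os.clock()
    workload()
    local elapsed = os.clock() - start
    if has_jit then jit.on() end  -- assumes the JIT was on by default
    return elapsed
end

-- Stand-in workload; in Tarantool this would be the update/upsert loop.
local function workload()
    local sum = 0
    for i = 1, 1e5 do sum = sum + i end
    return sum
end

local interpreted = bench(workload, false)
local compiled = bench(workload, true)
print(("interpreter: %.6f s, jit: %.6f s"):format(interpreted, compiled))
```

In the Tarantool console the workload would be the loop from the
benchmark quoted below, run against both the patched and the unpatched
build.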

> 
> Benchmarks:
> 
> Before patch:
> 
> Update:
> | Tarantool 2.10.0-beta1-90-g31594b427
> | type 'help' for interactive help
> | tarantool> local t = {}
> |            for i = 1, 1e6 do
> |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> |            end
> |            local clock = require"clock"
> |            local S = clock.proc()
> |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> |            return clock.proc() - S;
> | ---
> | - 4.208298872
> 
> Upsert: 4.158661731
> 
> After patch:
> 
> Update:
> | Tarantool 2.10.0-beta1-90-g31594b427
> | type 'help' for interactive help
> | tarantool> local t = {}
> |            for i = 1, 1e6 do
> |                table.insert(t, box.tuple.new{'abc', 'def', 'ghi', 'abc'})
> |            end
> |            local clock = require"clock"
> |            local S = clock.proc()
> |            for i = 1, 1e6 do t[i]:update{{"=", 3, "xxx"}} end
> |            return clock.proc() - S;
> | ---
> | - 2.357670738
> 
> Upsert: 2.334134195
> 
> Branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-tuple-bless-compile
> 
>  src/box/lua/tuple.lua | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
> index fa76f4f7f..73446ab22 100644
> --- a/src/box/lua/tuple.lua
> +++ b/src/box/lua/tuple.lua
> @@ -98,7 +98,14 @@ local tuple_bless = function(tuple)
>      -- overflow checked by tuple_bless() in C
>      builtin.box_tuple_ref(tuple)
>      -- must never fail:
> -    return ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> +    -- XXX: If we use tail call (instead creating a new frame for

Typo: s/instead/instead of/.

> +    -- a call just replace the top one) here, then JIT tries

Minor: I see "replace" for the second time, but LuaJIT just "uses" the
caller's frame for the callee. I propose s/replace/use/g, but this is
negligible, so feel free to ignore.

> +    -- to compile return from `ffi.gc()` to the frame below. This
> +    -- abort the trace recording with the error "NYI: return to

Typo: s/abort/aborts/.

> +    -- lower frame". So avoid tail call and use additional stack
> +    -- slots (for the local variable and the frame).
> +    local tuple_ref = ffi.gc(ffi.cast(const_tuple_ref_t, tuple), tuple_gc)
> +    return tuple_ref

Side note: Ugh... I'm sad we're doing things like this one: complicating
the code and leaving huge comments with the rationale for the
complication, just to reach the desired (and, importantly, local)
performance. I propose spending your innovative time on solving the
problem in the JIT engine instead: it will be more fun and will let us
avoid writing the cookbook "How to write super-duper-jittable code in
LuaJIT".

Here is a valid question: what about other hot places with CALLT in
Tarantool? Should they be considered/fixed? I guess a ticket will help
us not to forget about this problem.
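
One possible way to look for such places, as a sketch only: dump a
function's bytecode with the jit.bc module bundled with LuaJIT and search
it for CALLT/CALLMT. The emits_callt() helper and the sample functions
are hypothetical, and the snippet assumes jit.bc is reachable via
require(), which may depend on the build.

```lua
-- Hypothetical helper: true if the function's bytecode has a tail call.
local has_bc, bc = pcall(require, "jit.bc")

local function emits_callt(func)
    local lines = {}
    -- jit.bc.dump() only needs write() and flush() on its output object.
    local sink = {
        write = function(_, s) lines[#lines + 1] = s end,
        flush = function() end,
    }
    bc.dump(func, sink)
    -- CALLT is a tail call; CALLMT is its multi-result-argument variant.
    return table.concat(lines):find("%sCALLM?T%s") ~= nil
end

local function callee(x) return x end
local function tail(x) return callee(x) end
local function plain(x) local r = callee(x); return r end

if has_bc then
    print(emits_callt(tail), emits_callt(plain))
else
    print("jit.bc is not available; run this under LuaJIT or Tarantool")
end
```

This only tells whether a tail call is emitted, not whether the function
is hot, so it would have to be combined with profiler or jit.dump()
output to pick the places worth fixing.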

Anyway, for now the fix provides a considerable boost, so feel free to
proceed with the patch.

>  end
>  
>  local tuple_check = function(tuple, usage)
> -- 
> 2.31.0
> 

-- 
Best regards,
IM

