Hi, Sergey! Thanks for the patch! LGTM   -- Best regards, Maxim Kokryashkin     >  >>Hi, Sergey! >>Thanks for the review! >>I've fixed your comment regarding test comment and force-pushed the >>branch. >> >>On 29.11.23, Sergey Bronnikov wrote: >>> Hello, Sergey >>> >>> thanks for the patch! >>> >>> LGTM with a three minor comments below >>> >>> On 11/28/23 15:21, Sergey Kaplun wrote: >>> > From: Mike Pall >>> > >>> > Reported by XmiliaH. >>> > >>> > (cherry-picked from commit e73916d811710ab02a4dfe447d621c99f4e7186c) >>> > >>> > The RETF IR has a side effect: it shifts base when returning to a lower >>> > frame, i.e., it affects `REF_BASE` IR (0000) (thus, we can say that this >>> > IR is violating SSA form). So any optimization of IRs with `REF_BASE` as >>> > an operand across RETF IR may lead to incorrect optimizations (see >>> > details in the test file). >>> > >>> > This patch adds rules to the folding engine to prevent CSE across `IR_RETF` >>> > for all possible IRs containing REF_BASE. >>> > >>> > Sergey Kaplun: >>> > * added the description and the test for the problem >>> > >>> > Part of tarantool/tarantool#9145 >>> > --- >>> > >>> > Branch:https://github.com/tarantool/luajit/tree/skaplun/lj-784-cse-ref-base-over-retf >>> > Tarantool PR:https://github.com/tarantool/tarantool/pull/9421 >>> > Related issues: >>> > * https://github.com/LuaJIT/LuaJIT/issues/784 >>> > * https://github.com/tarantool/tarantool/issues/9145 >>> > >>> > Interested reviewers can mention that only the `SUB any BASE` case is >>> > tested. >>> > The reason is that other cases are impossible to record in LuaJIT: >>> > * EQ any BASE: EQ pgc REF_BASE IR for upvalues is emitted when >>> > the open upvalue aliases a SSA slot, i.e., it belongs to the frame of >>> > the currently executed function. In that case, if we want to emit RETF >>> > IR, we need to leave this function. So we need to record the UCLO >>> > bytecode, which is NIY in JIT. So, such a type of trace is impossible. >>> > * SUB BASE any: SUB BASE fr is emitted for the recording of VARG >>> >>> Nit: fr -> frame >>> >>> or put in backticks if you refer to a variable in source code >>> >>> > bytecode, in case varargs are undefined on trace. We need a vararg >>> > function to call to create an additional frame. But returning to lower >>> > frames from a vararg function isn't implemented in LuaJIT -- either >>> > the trace recording is stopped or the error is rased and the trace >>> > isn't compiled. Also, IINM, fr operands will always be different for >>> >>> Nit: fr -> frame >>> >>> or put in backticks if you refer to a variable in source code >> >>Since this is only a remider for review in the ML, I've not changed it:). >>I suppose that to mention this information in the commit message is >>excess and important only for clarification on review. >> >>> >>> > different frames, so there is no possible CSE here. >>> > >>> > So, these cases are needed to prevent any regressions in the future. >>> > >>> > Please correct me if I've missed something. >>> > >>> > src/lj_opt_fold.c | 11 +++ >>> > .../lj-784-cse-ref-base-over-retf.test.lua | 86 +++++++++++++++++++ >>> > 2 files changed, 97 insertions(+) >>> > create mode 100644 test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>> > >>> > diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c >>> > index c5f2232e..750f1c7e 100644 >>> > --- a/src/lj_opt_fold.c >>> > +++ b/src/lj_opt_fold.c >>> > @@ -2313,6 +2313,17 @@ LJFOLDF(xload_kptr) >>> > LJFOLD(XLOAD any any) >>> > LJFOLDX(lj_opt_fwd_xload) >>> > >>> > +/* -- Frame handling ------------------------------------------------------ */ >>> > + >>> > +/* Prevent CSE of a REF_BASE operand across IR_RETF. */ >>> > +LJFOLD(SUB any BASE) >>> > +LJFOLD(SUB BASE any) >>> > +LJFOLD(EQ any BASE) >>> > +LJFOLDF(fold_base) >>> > +{ >>> > + return lj_opt_cselim(J, J->chain[IR_RETF]); >>> > +} >>> > + >>> > /* -- Write barriers ------------------------------------------------------ */ >>> > >>> > /* Write barriers are amenable to CSE, but not across any incremental >>> > diff --git a/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua b/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>> > new file mode 100644 >>> > index 00000000..095376fc >>> > --- /dev/null >>> > +++ b/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>> > @@ -0,0 +1,86 @@ >>> > +local tap = require('tap') >>> > + >>> > +-- Test file to demonstrate incorrect FOLD optimization for IR >>> > +-- with REF_BASE operand across IR RETF. >>> > +-- See also,https://github.com/LuaJIT/LuaJIT/issues/784. >>> > + >>> > +local test = tap.test('lj-784-cse-ref-base-over-retf'):skipcond({ >>> > + ['Test requires JIT enabled'] = not jit.status(), >>> > +}) >>> > + >>> > +test:plan(1) >>> > + >>> > +-- The RETF IR has a side effect: it shifts base when returning to >>> > +-- a lower frame, i.e., it affects `REF_BASE` IR (0000) (thus, we >>> > +-- can say that this IR is violating SSA form). >>> > +-- So any optimization of IRs with `REF_BASE` as an operand across >>> > +-- RETF IR may lead to incorrect optimizations. >>> > +-- In this test, SUB uref REF_BASE IR was eliminated, so instead >>> > +-- the following trace: >>> > +-- >>> > +-- 0004 p32 SUB 0003 0000 >>> > +-- 0005 > p32 UGT 0004 +32 >>> > +-- ... >>> > +-- 0009 > p32 RETF proto: 0x407dc118 [0x407dc194] >>> > +-- ... >>> > +-- 0012 p32 SUB 0003 0000 >>> > +-- 0013 > p32 UGT 0012 +72 >>> > +-- >>> > +-- We got the following: >>> > +-- >>> > +-- 0004 p32 SUB 0003 0000 >>> > +-- 0005 > p32 UGT 0004 +32 >>> > +-- ... >>> > +-- 0009 > p32 RETF proto: 0x41ffe0c0 [0x41ffe13c] >>> > +-- ... >>> > +-- 0012 > p32 UGT 0004 +72 >>> > +-- >>> > +-- As you can see, the 0012 SUB IR is eliminated because it is the >>> > +-- same as the 0004 IR. This leads to incorrect assertion guards >>> > +-- in the IR below. >>> >>> I would rephrase it to "assertion guards in the resulted IR" >>> >>> because there is no IR below the comment. >> >>Fixed, branch is force-pushed. >> >>=================================================================== >>diff --git a/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua b/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>index 095376fc..d6442cbb 100644 >>--- a/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>+++ b/test/tarantool-tests/lj-784-cse-ref-base-over-retf.test.lua >>@@ -37,7 +37,7 @@ test:plan(1) >> -- >> -- As you can see, the 0012 SUB IR is eliminated because it is the >> -- same as the 0004 IR. This leads to incorrect assertion guards >>--- in the IR below. >>+-- in the resulted IR 0012 below. >>  >> local MAGIC = 42 >> -- XXX: simplify `jit.dump()` output. >>=================================================================== >> >>> >>> > + >>> > +local MAGIC = 42 >>> > +-- XXX: simplify `jit.dump()` output. >>> > +local fmod = math.fmod >>> > + >>> > +local function exit_with_retf(closure) >>> > + -- Forcify stitch. Any NYI is OK here. >>> > + fmod(1, 1) >>> > + -- Call the closure so that we have emitted `uref - REF_BASE`. >>> > + closure(0) >>> > + -- Exit with `IR_RETF`. This will change `REF_BASE`. >>> > +end >>> > + >>> > +local function sub_uref_base(closure) >>> > + local open_upvalue >>> > + if closure == nil then >>> > + closure = function(val) >>> > + local old = open_upvalue >>> > + open_upvalue = val >>> > + return old >>> > + end >>> > + -- First, create an additional frame, so we got the trace, >>> > + -- where the open upvalue reference is always < `REF_BASE`. >>> > + sub_uref_base(closure) >>> > + end >>> > + for _ = 1, 4 do >>> > + -- `closure` function is inherited from the previous frame. >>> > + exit_with_retf(closure) >>> > + open_upvalue = MAGIC >>> > + -- The open upvalue guard will use CSE over `IR_RETF` for >>> > + -- `uref - REF_BASE`. `IR_RETF` changed the value of >>> > + -- `REF_BASE`. >>> > + -- Thus, the guards afterwards take the wrong IR as the first >>> > + -- operand, so they are not failed, and the wrong value is >>> > + -- returned from the trace. >>> > + open_upvalue = closure(0) >>> > + end >>> > + return open_upvalue >>> > +end >>> > + >>> > +jit.opt.start('hotloop=1') >>> > + >>> > +local res = sub_uref_base() >>> > +test:is(res, MAGIC, 'no SUB uref REF_BASE CSE across RETF') >>> > + >>> > +test:done(true) >> >>-- >>Best regards, >>Sergey Kaplun >