Hi, Sergey! thanks for the patch and a good explanation in the commit message! LGTM Sergey On 7/24/25 12:04, Sergey Kaplun wrote: > From: Mike Pall > > Thanks to Peter Cawley. > > (cherry picked from commit 7cc53f0b85f834dfba1516ea79d59db463e856fa) > > Assume we have a trace for the several `setmetatable()` calls to the > same table. This trace contains the following IR: > | 0011 p64 FREF 0003 tab.meta > | ... > | 0018 x0 > tab TNEW 0 0 > | 0019 tab TBAR 0003 > | 0020 tab FSTORE 0011 0018 > > The expected mcode to be emitted for the last two IRs is the following: > | 55626cffb0 ldrb w30, [x19, 8] ; tab->marked > | 55626cffb4 tst w30, 0x4 ; Is black? > | 55626cffb8 beq 0x626cffd0 ; Skip marking. > | 55626cffbc ldr x27, [x20, 128] > | 55626cffc0 and w30, w30, 0xfffffffb > | 55626cffc4 str x19, [x20, 128] > | 55626cffcc strb w30, [x19, 8] ; tab->marked > | 55626cffc8 str x27, [x19, 24] ; tab->gclist > | 55626cffd0 str x0, [x19, 32] ; tab->metatable > > But the last 2 instructions are fused into the following `stp`: > | 55581dffd0 stp x27, x0, [x19, 48] > Hence, the GC propagation frontier back is done partially, since > `str x27, [x19, 24]` is not skipped despite TBAR semantics. This leads > to the incorrect value in the `gclist` and the segmentation fault during > its traversal on GC step. > > This patch prevents this fusion via switching instruction for > `tab->gclist` and `tab->marked` storing. > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#11691 > --- > src/lj_asm_arm64.h | 3 +- > ...1057-arm64-stp-fusing-across-tbar.test.lua | 79 +++++++++++++++++++ > 2 files changed, 81 insertions(+), 1 deletion(-) > create mode 100644 test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua > > diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h > index 5a6c60b7..9b3c0467 100644 > --- a/src/lj_asm_arm64.h > +++ b/src/lj_asm_arm64.h > @@ -1271,8 +1271,9 @@ static void asm_tbar(ASMState *as, IRIns *ir) > Reg link = ra_scratch(as, rset_exclude(RSET_GPR, tab)); > Reg mark = RID_TMP; > MCLabel l_end = emit_label(as); > - emit_lso(as, A64I_STRx, link, tab, (int32_t)offsetof(GCtab, gclist)); > emit_lso(as, A64I_STRB, mark, tab, (int32_t)offsetof(GCtab, marked)); > + /* Keep STRx in the middle to avoid LDP/STP fusion with surrounding code. */ > + emit_lso(as, A64I_STRx, link, tab, (int32_t)offsetof(GCtab, gclist)); > emit_setgl(as, tab, gc.grayagain); > emit_dn(as, A64I_ANDw^emit_isk13(~LJ_GC_BLACK, 0), mark, mark); > emit_getgl(as, link, gc.grayagain); > diff --git a/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua b/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua > new file mode 100644 > index 00000000..27d18916 > --- /dev/null > +++ b/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua > @@ -0,0 +1,79 @@ > +local tap = require('tap') > + > +-- This test demonstrates LuaJIT's incorrect fusing of store > +-- instructions separated by the conditional branch on arm64. > +-- See alsohttps://github.com/LuaJIT/LuaJIT/issues/1057. > +local test = tap.test('lj-1057-arm64-stp-fusing-across-tbar'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(2) > + > +-- XXX: Simplify the `jit.dump()` output. > +local setmetatable = setmetatable > + > +-- The function below generates the following IR: > +-- | 0011 p64 FREF 0003 tab.meta > +-- | ... > +-- | 0018 x0 > tab TNEW #0 #0 > +-- | 0019 tab TBAR 0003 > +-- | 0020 tab FSTORE 0011 0018 > +-- The expected mcode to be emitted for the last two IRs is the > +-- following: > +-- | 55626cffb0 ldrb w30, [x19, #8] ; tab->marked > +-- | 55626cffb4 tst w30, #0x4 ; Is black? > +-- | 55626cffb8 beq 0x626cffd0 ; Skip marking. > +-- | 55626cffbc ldr x27, [x20, #128] > +-- | 55626cffc0 and w30, w30, #0xfffffffb > +-- | 55626cffc4 str x19, [x20, #128] > +-- | 55626cffcc strb w30, [x19, #8] ; tab->marked > +-- | 55626cffc8 str x27, [x19, #24] ; tab->gclist > +-- | 55626cffd0 str x0, [x19, #32] ; tab->metatable > +-- > +-- But the last 2 instructions are fused into the following `stp`: > +-- | 55581dffd0 stp x27, x0, [x19, #48] > +-- Hence, the GC propagation frontier back is done partially, > +-- since `str x27, [x19, #24]` is not skipped despite TBAR > +-- semantics. This leads to the incorrect value in the `gclist` > +-- and the segmentation fault during its traversal on GC step. > +local function trace(target_t) > + -- Precreate a table for the FLOAD to avoid TNEW in between. > + local stack_t = {} > + -- Generate FSTORE TBAR pair. The FSTORE will be dropped due to > + -- the FSTORE below by DSE. > + setmetatable(target_t, {}) > + -- Generate FSTORE. TBAR will be dropped by CSE. > + setmetatable(target_t, stack_t) > +end > + > +jit.opt.start('hotloop=1') > + > +-- XXX: Need to trigger the GC on trace to introspect that the > +-- GC chain is broken. Use empirical 10000 iterations. > +local tab = {} > +for _ = 1, 1e4 do > + trace(tab) > +end > + > +test:ok(true, 'no assertion failure in the simple loop') > + > +-- The similar test, but be sure that we finish the whole GC > +-- cycle, plus using upvalue instead of stack slot for the target > +-- table. > + > +local target_t = {} > +local function trace2() > + local stack_t = {} > + setmetatable(target_t, {}) > + setmetatable(target_t, stack_t) > +end > + > +collectgarbage('collect') > +collectgarbage('setstepmul', 1) > +while not collectgarbage('step') do > + trace2() > +end > + > +test:ok(true, 'no assertion failure in the whole GC cycle') > + > +test:done(true)