From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <89fc7745-5d9b-48d1-9834-df58964df888@tarantool.org>
Date: Mon, 25 Aug 2025 18:12:25 +0300
MIME-Version: 1.0
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
References: <6a0d3fabe8caca468e02a319a45db6f2556dd2fe.1753344905.git.skaplun@tarantool.org>
In-Reply-To: <6a0d3fabe8caca468e02a319a45db6f2556dd2fe.1753344905.git.skaplun@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH luajit 3/3] ARM64: Prevent STP fusion for conditional code emitted by TBAR.
List-Id: Tarantool development patches
From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
Content-Type: multipart/alternative; boundary="------------3NQBaCjN9hkJXoWrRrmsO6RH"

This is a multi-part message in MIME format.
--------------3NQBaCjN9hkJXoWrRrmsO6RH
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hi, Sergey!

Thanks for the patch and the good explanation in the commit message!

LGTM

Sergey

On 7/24/25 12:04, Sergey Kaplun wrote:
From: Mike Pall <mike>

Thanks to Peter Cawley.

(cherry picked from commit 7cc53f0b85f834dfba1516ea79d59db463e856fa)

Assume we have a trace for several `setmetatable()` calls on the
same table. This trace contains the following IR:
| 0011          p64 FREF   0003  tab.meta
| ...
| 0018 x0    >  tab TNEW   0    0
| 0019          tab TBAR   0003
| 0020          tab FSTORE 0011  0018

The expected mcode to be emitted for the last two IRs is the following:
| 55626cffb0  ldrb  w30, [x19, 8] ; tab->marked
| 55626cffb4  tst   w30, 0x4      ; Is black?
| 55626cffb8  beq   0x626cffd0     ; Skip marking.
| 55626cffbc  ldr   x27, [x20, 128]
| 55626cffc0  and   w30, w30, 0xfffffffb
| 55626cffc4  str   x19, [x20, 128]
| 55626cffc8  strb  w30, [x19, 8]  ; tab->marked
| 55626cffcc  str   x27, [x19, 24] ; tab->gclist
| 55626cffd0  str   x0,  [x19, 32] ; tab->metatable

However, the last two instructions are fused into the following `stp`:
| 55581dffd0  stp   x27, x0, [x19, 48]
Hence, the GC propagation frontier is moved back only partially, since
`str x27, [x19, 24]` is not skipped despite the TBAR semantics. This
leaves an incorrect value in `gclist` and causes a segmentation fault
during its traversal on a GC step.
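
To make the failure mode concrete, here is a minimal C model of the barrier followed by the metatable store. It is a sketch, not LuaJIT's code: the `Tab` and `GCState` structs, the function names, and the `stale_link` parameter (standing in for whatever the link register happens to hold when the branch is taken) are all hypothetical. Only the `0x4` black bit comes from the listing above.

```c
#include <assert.h>
#include <stddef.h>

#define GC_BLACK 0x04 /* Same bit as tested by `tst w30, 0x4` above. */

/* Hypothetical, stripped-down stand-ins for the table and the GC state. */
typedef struct Tab {
  unsigned char marked;
  struct Tab *gclist;
  struct Tab *metatable;
} Tab;

typedef struct GCState {
  Tab *grayagain;
} GCState;

/* Intended semantics: the whole barrier body is skipped for non-black tables. */
static void tbar_then_store(GCState *g, Tab *t, Tab *mt)
{
  if (t->marked & GC_BLACK) {
    Tab *link = g->grayagain;
    t->marked &= (unsigned char)~GC_BLACK;
    g->grayagain = t;
    t->gclist = link;
  }
  t->metatable = mt;
}

/* Model of the fused code: the gclist store has moved past the branch target. */
static void tbar_then_store_fused(GCState *g, Tab *t, Tab *mt, Tab *stale_link)
{
  Tab *link = stale_link; /* The register is not loaded when the branch is taken. */
  if (t->marked & GC_BLACK) {
    link = g->grayagain;
    t->marked &= (unsigned char)~GC_BLACK;
    g->grayagain = t;
  }
  t->gclist = link;   /* Runs unconditionally: this is the bug. */
  t->metatable = mt;
}

/* Returns 1 if the barrier on a non-black table overwrites its gclist link. */
static int gclist_clobbered(int fused)
{
  Tab next, mt, stale, t;
  GCState g;
  next.marked = 0; next.gclist = NULL; next.metatable = NULL;
  mt = next;
  stale = next;
  t.marked = 0;       /* Not black: the barrier body must be skipped. */
  t.gclist = &next;
  t.metatable = NULL;
  g.grayagain = NULL;
  if (fused)
    tbar_then_store_fused(&g, &t, &mt, &stale);
  else
    tbar_then_store(&g, &t, &mt);
  return t.gclist != &next;
}
```

In this model `gclist_clobbered(0)` leaves the link intact, while `gclist_clobbered(1)` overwrites it with the stale value, which corresponds to the broken GC chain described above.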

This patch prevents the fusion by swapping the order of the
`tab->gclist` and `tab->marked` stores.
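
A toy backward emitter with a store-pairing peephole can illustrate why reordering the two stores is enough. This is only a sketch under assumptions, not LuaJIT's real `emit_lso()`: the `Emitter` and `Ins` types and all function names are hypothetical, the peephole handles just one pairing direction, and the register numbers and offsets are copied from the listing above.

```c
#include <assert.h>

/* Toy instruction kinds: only what the illustration needs. */
typedef enum { OP_STRX, OP_STRB, OP_STP, OP_OTHER } Op;

typedef struct {
  Op op;
  int rt, rt2;   /* Source registers (rt2 is used by OP_STP only). */
  int base, ofs; /* Addressing mode: [base, ofs]. */
} Ins;

#define MCODE_MAX 16

/* Machine code grows downwards: buf[mcp] is the lowest-address instruction. */
typedef struct {
  Ins buf[MCODE_MAX];
  int mcp;
} Emitter;

static void emit_ins(Emitter *e, Op op, int rt, int base, int ofs)
{
  Ins ins;
  ins.op = op; ins.rt = rt; ins.rt2 = -1; ins.base = base; ins.ofs = ofs;
  assert(e->mcp > 0);
  e->buf[--e->mcp] = ins;
}

/* A label is only a position. It emits nothing, so the peephole cannot see it. */
static int emit_label(const Emitter *e) { return e->mcp; }

/* 64-bit store with a naive peephole that pairs it with the next store. */
static void emit_strx(Emitter *e, int rt, int base, int ofs)
{
  if (e->mcp < MCODE_MAX) {
    Ins *next = &e->buf[e->mcp];
    if (next->op == OP_STRX && next->base == base && next->ofs == ofs + 8) {
      next->op = OP_STP;     /* Becomes: stp rt, next->rt, [base, ofs]. */
      next->rt2 = next->rt;
      next->rt = rt;
      next->ofs = ofs;
      return;
    }
  }
  emit_ins(e, OP_STRX, rt, base, ofs);
}

enum { OFS_MARKED = 8, OFS_GCLIST = 24, OFS_META = 32 };
enum { REG_META = 0, REG_TAB = 19, REG_LINK = 27, REG_MARK = 30 };

/* Emits FSTORE, then the tail of TBAR, and counts the resulting STPs. */
static int count_stp_after_tbar(int gclist_store_last)
{
  Emitter e;
  int i, n = 0;
  e.mcp = MCODE_MAX;
  emit_strx(&e, REG_META, REG_TAB, OFS_META); /* FSTORE: always executed. */
  (void)emit_label(&e);                       /* l_end: target of the skip. */
  if (gclist_store_last) {                    /* Order before the patch. */
    emit_strx(&e, REG_LINK, REG_TAB, OFS_GCLIST);
    emit_ins(&e, OP_STRB, REG_MARK, REG_TAB, OFS_MARKED);
  } else {                                    /* Order after the patch. */
    emit_ins(&e, OP_STRB, REG_MARK, REG_TAB, OFS_MARKED);
    emit_strx(&e, REG_LINK, REG_TAB, OFS_GCLIST);
  }
  emit_ins(&e, OP_OTHER, 0, 0, 0);            /* Rest of the barrier. */
  for (i = e.mcp; i < MCODE_MAX; i++)
    if (e.buf[i].op == OP_STP) n++;
  return n;
}
```

With the old order the `gclist` store is emitted right next to the `metatable` store and the model fuses them across the label. With the new order the byte store sits between them, so no pair is formed.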

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---
 src/lj_asm_arm64.h                            |  3 +-
 ...1057-arm64-stp-fusing-across-tbar.test.lua | 79 +++++++++++++++++++
 2 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua

diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
index 5a6c60b7..9b3c0467 100644
--- a/src/lj_asm_arm64.h
+++ b/src/lj_asm_arm64.h
@@ -1271,8 +1271,9 @@ static void asm_tbar(ASMState *as, IRIns *ir)
   Reg link = ra_scratch(as, rset_exclude(RSET_GPR, tab));
   Reg mark = RID_TMP;
   MCLabel l_end = emit_label(as);
-  emit_lso(as, A64I_STRx, link, tab, (int32_t)offsetof(GCtab, gclist));
   emit_lso(as, A64I_STRB, mark, tab, (int32_t)offsetof(GCtab, marked));
+  /* Keep STRx in the middle to avoid LDP/STP fusion with surrounding code. */
+  emit_lso(as, A64I_STRx, link, tab, (int32_t)offsetof(GCtab, gclist));
   emit_setgl(as, tab, gc.grayagain);
   emit_dn(as, A64I_ANDw^emit_isk13(~LJ_GC_BLACK, 0), mark, mark);
   emit_getgl(as, link, gc.grayagain);
diff --git a/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua b/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua
new file mode 100644
index 00000000..27d18916
--- /dev/null
+++ b/test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua
@@ -0,0 +1,79 @@
+local tap = require('tap')
+
+-- This test demonstrates LuaJIT's incorrect fusing of store
+-- instructions separated by a conditional branch on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/issues/1057.
+local test = tap.test('lj-1057-arm64-stp-fusing-across-tbar'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(2)
+
+-- XXX: Simplify the `jit.dump()` output.
+local setmetatable = setmetatable
+
+-- The function below generates the following IR:
+-- | 0011          p64 FREF   0003  tab.meta
+-- | ...
+-- | 0018 x0    >  tab TNEW   #0    #0
+-- | 0019          tab TBAR   0003
+-- | 0020          tab FSTORE 0011  0018
+-- The expected mcode to be emitted for the last two IRs is the
+-- following:
+-- | 55626cffb0  ldrb  w30, [x19, #8] ; tab->marked
+-- | 55626cffb4  tst   w30, #0x4      ; Is black?
+-- | 55626cffb8  beq   0x626cffd0     ; Skip marking.
+-- | 55626cffbc  ldr   x27, [x20, #128]
+-- | 55626cffc0  and   w30, w30, #0xfffffffb
+-- | 55626cffc4  str   x19, [x20, #128]
+-- | 55626cffc8  strb  w30, [x19, #8]  ; tab->marked
+-- | 55626cffcc  str   x27, [x19, #24] ; tab->gclist
+-- | 55626cffd0  str   x0,  [x19, #32] ; tab->metatable
+--
+-- However, the last two instructions are fused into this `stp`:
+-- | 55581dffd0  stp   x27, x0, [x19, #48]
+-- Hence, the GC propagation frontier is moved back partially,
+-- since `str x27, [x19, #24]` is not skipped despite the TBAR
+-- semantics. This leaves an incorrect value in `gclist` and
+-- causes a segmentation fault during its traversal on a GC step.
+local function trace(target_t)
+  -- Precreate a table for the FLOAD to avoid TNEW in between.
+  local stack_t = {}
+  -- Generate the FSTORE/TBAR pair. DSE will drop this FSTORE
+  -- because of the FSTORE below.
+  setmetatable(target_t, {})
+  -- Generate FSTORE. TBAR will be dropped by CSE.
+  setmetatable(target_t, stack_t)
+end
+
+jit.opt.start('hotloop=1')
+
+-- XXX: Need to trigger the GC on trace to observe that the
+-- GC chain is broken. Use an empirical 10000 iterations.
+local tab = {}
+for _ = 1, 1e4 do
+  trace(tab)
+end
+
+test:ok(true, 'no assertion failure in the simple loop')
+
+-- A similar test, but make sure that we finish the whole GC
+-- cycle, and use an upvalue instead of a stack slot for the
+-- target table.
+
+local target_t = {}
+local function trace2()
+  local stack_t = {}
+  setmetatable(target_t, {})
+  setmetatable(target_t, stack_t)
+end
+
+collectgarbage('collect')
+collectgarbage('setstepmul', 1)
+while not collectgarbage('step') do
+  trace2()
+end
+
+test:ok(true, 'no assertion failure in the whole GC cycle')
+
+test:done(true)
--------------3NQBaCjN9hkJXoWrRrmsO6RH--