From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 90597E42CF3; Tue, 14 Jan 2025 17:15:22 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 90597E42CF3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1736864122; bh=nsWx7+YtLMzAH8jdBr1pU55XMsKIfdVIrLngZDqUV5k=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=dMSToyH5dcFTUJWwNZAAFp1NguL4Y09rWldm6FjIi6E48a8CAZRFP8gr6AtUJ8hrE cLW6jlkQ857C5vJMVR8d2yBq5pkriaryZwHheMjCHwOzNTRww3rqMM+bkej/sUoI1j uUvaa5V1bo/NkCEC77PnSBKgMrT+cIypj3Qn41sQ= Received: from send196.i.mail.ru (send196.i.mail.ru [95.163.59.35]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 3790AE42CF2 for ; Tue, 14 Jan 2025 17:15:21 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3790AE42CF2 Received: by exim-smtp-6758d5575c-6jnmj with esmtpa (envelope-from ) id 1tXhhQ-00000000EyT-0fa4; Tue, 14 Jan 2025 17:15:20 +0300 Content-Type: multipart/alternative; boundary="------------WfT7i0riXfCbYRPkd2qOp3kF" Message-ID: <4b6699be-f43b-44a0-a75d-1b5ef9cf734b@tarantool.org> Date: Tue, 14 Jan 2025 17:15:19 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org References: In-Reply-To: X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD97BF177E4F5EB85B858D7A5F0330C956A09C553812ED24DC200894C459B0CD1B9CCFCBB5E5A4B666003ED270C30F246C5D1C861F296B30F7AAEA645EAF6F4634BF8D9167951636F18 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE79145AB6E9E75F07EEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637FD169B9D7A3022168638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D884E617DF078BAA55642CBBCD056DCDAFD3033C3EA8F535A3CC7F00164DA146DAFE8445B8C89999728AA50765F790063741F7343E26298569389733CBF5DBD5E9C8A9BA7A39EFB766F5D81C698A659EA7CC7F00164DA146DA9985D098DBDEAEC8A9FF340AA05FB58CF6B57BC7E6449061A352F6E88A58FB86F5D81C698A659EA73AA81AA40904B5D9A18204E546F3947C4E7D9683544204AF6E0066C2D8992A164AD6D5ED66289B523666184CF4C3C14F6136E347CC761E07725E5C173C3A84C3B74263D4D5690889BA3038C0950A5D36B5C8C57E37DE458B330BD67F2E7D9AF16D1867E19FE14079C09775C1D3CA48CF3D321E7403792E342EB15956EA79C166A417C69337E82CC275ECD9A6C639B01B78DA827A17800CE7CD707F342D9BDC98731C566533BA786AA5CC5B56E945C8DA X-C1DE0DAB: 0D63561A33F958A5F48D005C206FFBA15002B1117B3ED696ABD7CD0FA1B2FCF4E41E333F9D1358D5823CB91A9FED034534781492E4B8EEADB1D70E2111C441FFBDAD6C7F3747799A X-C8649E89: 1C3962B70DF3F0ADBF74143AD284FC7177DD89D51EBB7742424CF958EAFF5D571004E42C50DC4CA955A7F0CF078B5EC49A30900B95165D344FDECC3B9E4D57D4215161091589EB403740C0B35EA6F8BF518E98E86F0A9203BB32AAB92CE873FC1D7E09C32AA3244C83203E8E1034F6C577DD89D51EBB774276CF88A97A09AB4AEA455F16B58544A2E30DDF7C44BCB90DA5AE236DF995FB59978A700BF655EAEEED6A17656DB59BCAD427812AF56FC65B X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj7YQzuqfQVvxZ0DkJLGWbGQ== X-Mailru-Sender: 520A125C2F17F0B1E52FEF5D219D6140CCFCBB5E5A4B666003ED270C30F246C53290C8A4C08016B50152A3D17938EB451EB5A0BCEC6A560B3DDE9B364B0DF289BE2DA36745F2EEB5CEBA01FB949A1F1EEAB4BC95F72C04283CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Bronnikov via Tarantool-patches Reply-To: Sergey Bronnikov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" This is a multi-part message in MIME format. --------------WfT7i0riXfCbYRPkd2qOp3kF Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Hi, Sergey! thanks for the patch! LGTM On 10.01.2025 16:07, Sergey Kaplun wrote: > From: Mike Pall > > Reported by Peter Cawley. > > (cherry picked from commit 644723649ea04cb23b72c814b88b72a29e4afed4) > > Load fusing optimization doesn't take into account the presence of the > `IR_NEWREF` which may cause rehashing and deallocation of the array part > of the table. This may lead to the incorrect stores if the fusing > optimization occurs across this IR, leading to inconsistent behaviour > between the JIT and the VM. > > This patch adds the corresponding check by the refactoring of the > `noconflict()` function -- it now accepts the mask of the `check` as the > last argument. The first bit stands for the `IR_NEWREF` check, the > second for the multiple reference of the given instruction. > Unfortunately, this commit misses the check for the `table.clear()` > introduced for the preprevious patch. Thus, the corresponding test fails > again. This will be fixed in the next commit. > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#10709 > --- > src/lj_asm_x86.h | 17 +++--- > .../lj-1117-fuse-across-newref.test.lua | 52 +++++++++++++++++++ > 2 files changed, 61 insertions(+), 8 deletions(-) > create mode 100644 test/tarantool-tests/lj-1117-fuse-across-newref.test.lua > > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h > index cba7ba80..d77087d6 100644 > --- a/src/lj_asm_x86.h > +++ b/src/lj_asm_x86.h > @@ -109,7 +109,7 @@ static int asm_isk32(ASMState *as, IRRef ref, int32_t *k) > /* Check if there's no conflicting instruction between curins and ref. > ** Also avoid fusing loads if there are multiple references. > */ > -static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload) > +static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check) > { > IRIns *ir = as->ir; > IRRef i = as->curins; > @@ -118,7 +118,9 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload) > while (--i > ref) { > if (ir[i].o == conflict) > return 0; /* Conflict found. */ > - else if (!noload && (ir[i].op1 == ref || ir[i].op2 == ref)) > + else if ((check & 1) && ir[i].o == IR_NEWREF) > + return 0; > + else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref)) > return 0; > } > return 1; /* Ok, no conflict. */ > @@ -134,7 +136,7 @@ static IRRef asm_fuseabase(ASMState *as, IRRef ref) > lj_assertA(irb->op2 == IRFL_TAB_ARRAY, "expected FLOAD TAB_ARRAY"); > /* We can avoid the FLOAD of t->array for colocated arrays. */ > if (ira->o == IR_TNEW && ira->op1 <= LJ_MAX_COLOSIZE && > - !neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 1)) { > + !neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 0)) { > as->mrm.ofs = (int32_t)sizeof(GCtab); /* Ofs to colocated array. */ > return irb->op1; /* Table obj. */ > } > @@ -448,7 +450,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow) > RegSet xallow = (allow & RSET_GPR) ? allow : RSET_GPR; > if (ir->o == IR_SLOAD) { > if (!(ir->op2 & (IRSLOAD_PARENT|IRSLOAD_CONVERT)) && > - noconflict(as, ref, IR_RETF, 0) && > + noconflict(as, ref, IR_RETF, 2) && > !(LJ_GC64 && irt_isaddr(ir->t))) { > as->mrm.base = (uint8_t)ra_alloc1(as, REF_BASE, xallow); > as->mrm.ofs = 8*((int32_t)ir->op1-1-LJ_FR2) + > @@ -459,13 +461,12 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow) > } else if (ir->o == IR_FLOAD) { > /* Generic fusion is only ok for 32 bit operand (but see asm_comp). */ > if ((irt_isint(ir->t) || irt_isu32(ir->t) || irt_isaddr(ir->t)) && > - noconflict(as, ref, IR_FSTORE, 0)) { > + noconflict(as, ref, IR_FSTORE, 2)) { > asm_fusefref(as, ir, xallow); > return RID_MRM; > } > } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) { > - if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) && > - noconflict(as, ref, IR_CALLS, 1) && /* Don't cross table.clear. */ > + if (noconflict(as, ref, ir->o + IRDELTA_L2S, 2+(ir->o != IR_ULOAD)) && > !(LJ_GC64 && irt_isaddr(ir->t))) { > asm_fuseahuref(as, ir->op1, xallow); > return RID_MRM; > @@ -475,7 +476,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow) > ** Fusing unaligned memory operands is ok on x86 (except for SIMD types). > */ > if ((!irt_typerange(ir->t, IRT_I8, IRT_U16)) && > - noconflict(as, ref, IR_XSTORE, 0)) { > + noconflict(as, ref, IR_XSTORE, 2)) { > asm_fusexref(as, ir->op1, xallow); > return RID_MRM; > } > diff --git a/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua > new file mode 100644 > index 00000000..4b8772bf > --- /dev/null > +++ b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua > @@ -0,0 +1,52 @@ > +local tap = require('tap') > +-- Test file to demonstrate LuaJIT's incorrect fusion across > +-- `IR_NEWREF`. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1117. > +local test = tap.test('lj-1117-fuse-across-newref'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +local ffi = require('ffi') > + > +test:plan(1) > + > +-- Table with content. > +local tab = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 42} > +-- Use the alias to trick the code flow analysis. > +local tab_alias = tab > +local result_tab = {} > + > +-- Need to start recording trace at the 16th iteration to avoid > +-- rehashing of the `t` and `result_tab` before the `if` > +-- condition below on the 32nd iteration. Also, the inner loop > +-- isn't recorded this way. After rehashing in the NEWREF, the > +-- fusion will use the wrong address, which leads to the dirty > +-- reads visible (always, not flaky) under Valgrind with the > +-- `--free-fill` option set. > +jit.opt.start('hotloop=16') > + > +-- The amount of iterations required for the rehashing of the > +-- table. > +for i = 1, 33 do > + -- ALOAD to be fused. > + local value = tab[16] > + -- NEWREF instruction. > + tab_alias[{}] = 100 > + -- Need this CONV cast to trigger load fusion. See `asm_comp()` > + -- for the details. Before the patch, this fusion takes the > + -- incorrect address of the already deallocated array part of > + -- the table, which leads to the incorrect result. > + result_tab[i] = ffi.cast('int64_t', value) > + if i == 32 then > + -- Clear the array part. > + for j = 1, 15 do > + tab[j] = nil > + end > + -- Next rehash of the `tab`/`tab_alias` will dealloc the array > + -- part. > + end > +end > + > +test:samevalues(result_tab, 'no fusion across NEWREF') > + > +test:done(true) --------------WfT7i0riXfCbYRPkd2qOp3kF Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit

Hi, Sergey!

thanks for the patch! LGTM

On 10.01.2025 16:07, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported by Peter Cawley.

(cherry picked from commit 644723649ea04cb23b72c814b88b72a29e4afed4)

Load fusing optimization doesn't take into account the presence of the
`IR_NEWREF` which may cause rehashing and deallocation of the array part
of the table. This may lead to the incorrect stores if the fusing
optimization occurs across this IR, leading to inconsistent behaviour
between the JIT and the VM.

This patch adds the corresponding check by the refactoring of the
`noconflict()` function -- it now accepts the mask of the `check` as the
last argument. The first bit stands for the `IR_NEWREF` check, the
second for the multiple reference of the given instruction.
Unfortunately, this commit misses the check for the `table.clear()`
introduced for the preprevious patch. Thus, the corresponding test fails
again. This will be fixed in the next commit.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709
---
 src/lj_asm_x86.h                              | 17 +++---
 .../lj-1117-fuse-across-newref.test.lua       | 52 +++++++++++++++++++
 2 files changed, 61 insertions(+), 8 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1117-fuse-across-newref.test.lua

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index cba7ba80..d77087d6 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -109,7 +109,7 @@ static int asm_isk32(ASMState *as, IRRef ref, int32_t *k)
 /* Check if there's no conflicting instruction between curins and ref.
 ** Also avoid fusing loads if there are multiple references.
 */
-static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
+static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check)
 {
   IRIns *ir = as->ir;
   IRRef i = as->curins;
@@ -118,7 +118,9 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
   while (--i > ref) {
     if (ir[i].o == conflict)
       return 0;  /* Conflict found. */
-    else if (!noload && (ir[i].op1 == ref || ir[i].op2 == ref))
+    else if ((check & 1) && ir[i].o == IR_NEWREF)
+      return 0;
+    else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref))
       return 0;
   }
   return 1;  /* Ok, no conflict. */
@@ -134,7 +136,7 @@ static IRRef asm_fuseabase(ASMState *as, IRRef ref)
     lj_assertA(irb->op2 == IRFL_TAB_ARRAY, "expected FLOAD TAB_ARRAY");
     /* We can avoid the FLOAD of t->array for colocated arrays. */
     if (ira->o == IR_TNEW && ira->op1 <= LJ_MAX_COLOSIZE &&
-	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 1)) {
+	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 0)) {
       as->mrm.ofs = (int32_t)sizeof(GCtab);  /* Ofs to colocated array. */
       return irb->op1;  /* Table obj. */
     }
@@ -448,7 +450,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
     RegSet xallow = (allow & RSET_GPR) ? allow : RSET_GPR;
     if (ir->o == IR_SLOAD) {
       if (!(ir->op2 & (IRSLOAD_PARENT|IRSLOAD_CONVERT)) &&
-	  noconflict(as, ref, IR_RETF, 0) &&
+	  noconflict(as, ref, IR_RETF, 2) &&
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	as->mrm.base = (uint8_t)ra_alloc1(as, REF_BASE, xallow);
 	as->mrm.ofs = 8*((int32_t)ir->op1-1-LJ_FR2) +
@@ -459,13 +461,12 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
     } else if (ir->o == IR_FLOAD) {
       /* Generic fusion is only ok for 32 bit operand (but see asm_comp). */
       if ((irt_isint(ir->t) || irt_isu32(ir->t) || irt_isaddr(ir->t)) &&
-	  noconflict(as, ref, IR_FSTORE, 0)) {
+	  noconflict(as, ref, IR_FSTORE, 2)) {
 	asm_fusefref(as, ir, xallow);
 	return RID_MRM;
       }
     } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
-      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
-	  noconflict(as, ref, IR_CALLS, 1) &&  /* Don't cross table.clear. */
+      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 2+(ir->o != IR_ULOAD)) &&
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	asm_fuseahuref(as, ir->op1, xallow);
 	return RID_MRM;
@@ -475,7 +476,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
       ** Fusing unaligned memory operands is ok on x86 (except for SIMD types).
       */
       if ((!irt_typerange(ir->t, IRT_I8, IRT_U16)) &&
-	  noconflict(as, ref, IR_XSTORE, 0)) {
+	  noconflict(as, ref, IR_XSTORE, 2)) {
 	asm_fusexref(as, ir->op1, xallow);
 	return RID_MRM;
       }
diff --git a/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
new file mode 100644
index 00000000..4b8772bf
--- /dev/null
+++ b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
@@ -0,0 +1,52 @@
+local tap = require('tap')
+-- Test file to demonstrate LuaJIT's incorrect fusion across
+-- `IR_NEWREF`.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1117.
+local test = tap.test('lj-1117-fuse-across-newref'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local ffi = require('ffi')
+
+test:plan(1)
+
+-- Table with content.
+local tab = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 42}
+-- Use the alias to trick the code flow analysis.
+local tab_alias = tab
+local result_tab = {}
+
+-- Need to start recording trace at the 16th iteration to avoid
+-- rehashing of the `t` and `result_tab` before the `if`
+-- condition below on the 32nd iteration. Also, the inner loop
+-- isn't recorded this way. After rehashing in the NEWREF, the
+-- fusion will use the wrong address, which leads to the dirty
+-- reads visible (always, not flaky) under Valgrind with the
+-- `--free-fill` option set.
+jit.opt.start('hotloop=16')
+
+-- The amount of iterations required for the rehashing of the
+-- table.
+for i = 1, 33 do
+  -- ALOAD to be fused.
+  local value = tab[16]
+  -- NEWREF instruction.
+  tab_alias[{}] = 100
+  -- Need this CONV cast to trigger load fusion. See `asm_comp()`
+  -- for the details. Before the patch, this fusion takes the
+  -- incorrect address of the already deallocated array part of
+  -- the table, which leads to the incorrect result.
+  result_tab[i] = ffi.cast('int64_t', value)
+  if i == 32 then
+    -- Clear the array part.
+    for j = 1, 15 do
+      tab[j] = nil
+    end
+    -- Next rehash of the `tab`/`tab_alias` will dealloc the array
+    -- part.
+  end
+end
+
+test:samevalues(result_tab, 'no fusion across NEWREF')
+
+test:done(true)
--------------WfT7i0riXfCbYRPkd2qOp3kF--