Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization
@ 2025-01-10 13:07 Sergey Kaplun via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear Sergey Kaplun via Tarantool-patches
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-10 13:07 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

The patch set fixes load fusing optimization by preventing it across:
* `table.clear()` (1st, 2nd, 4th commits)
* `IR_NEWREF` (3rd commit)

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1117-loads-fusion
Related issues:
* https://github.com/LuaJIT/LuaJIT/issues/1117
* https://github.com/tarantool/tarantool/issues/10709

Mike Pall (4):
  x86/x64: Don't fuse loads across table.clear.
  Improve last commit.
  x86/x64: Don't fuse loads across IR_NEWREF.
  Fix last commit.

 src/lj_asm_x86.h                              | 16 +++---
 .../lj-1117-fuse-across-newref.test.lua       | 52 +++++++++++++++++++
 .../lj-1117-fuse-across-table-clear.test.lua  | 36 +++++++++++++
 3 files changed, 97 insertions(+), 7 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
 create mode 100644 test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua

-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear.
  2025-01-10 13:07 [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization Sergey Kaplun via Tarantool-patches
@ 2025-01-10 13:07 ` Sergey Kaplun via Tarantool-patches
  2025-01-14 14:10   ` Sergey Bronnikov via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 2/4] Improve last commit Sergey Kaplun via Tarantool-patches
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-10 13:07 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Reported by Peter Cawley.

(cherry picked from commit 45c88b7963de2969a9a656c03ba06ad995d7fd5f)

Load fusing optimization takes into account only the presence of the
corresponding stores, but not any calls that may affect the table
content. This may lead to the incorrect stores if the fusing
optimization occurs across the `table.clear()` call, leading to
inconsistent behaviour between the JIT and the VM.

This patch adds the corresponding check.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709
---
 src/lj_asm_x86.h                              |  1 +
 .../lj-1117-fuse-across-table-clear.test.lua  | 36 +++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index 86ce3937..f47c460a 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -465,6 +465,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
       }
     } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
       if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
+	  noconflict(as, ref, IR_CALLS, 0) &&  /* Don't cross table.clear. */
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	asm_fuseahuref(as, ir->op1, xallow);
 	return RID_MRM;
diff --git a/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua b/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua
new file mode 100644
index 00000000..2f7c91d1
--- /dev/null
+++ b/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua
@@ -0,0 +1,36 @@
+local tap = require('tap')
+-- Test file to demonstrate LuaJIT's incorrect fusion across
+-- `table.clear()`.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1117.
+local test = tap.test('lj-1117-fuse-across-table-clear'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local ffi = require('ffi')
+local table_clear = require('table.clear')
+
+test:plan(1)
+
+local tab = {0}
+local alias_tab = tab
+local result_tab = {}
+
+jit.opt.start('hotloop=1')
+
+for i = 1, 4 do
+  -- Load to be fused.
+  local value = tab[1]
+  -- Clear the alias table to trick the code flow analysis.
+  table_clear(alias_tab)
+  -- Need this cast to trigger load fusion. See `asm_comp()` for
+  -- the details. Before the patch, this fusion takes the
+  -- incorrect address of the already cleared table, which leads
+  -- to the failure of the check below.
+  result_tab[i] = ffi.cast('int64_t', value)
+  -- Revive the value.
+  tab[1] = 0
+end
+
+test:samevalues(result_tab, 'no fusion across table.clear')
+
+test:done(true)
-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH luajit 2/4] Improve last commit.
  2025-01-10 13:07 [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization Sergey Kaplun via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear Sergey Kaplun via Tarantool-patches
@ 2025-01-10 13:07 ` Sergey Kaplun via Tarantool-patches
  2025-01-14 14:11   ` Sergey Bronnikov via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF Sergey Kaplun via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 4/4] Fix last commit Sergey Kaplun via Tarantool-patches
  3 siblings, 1 reply; 9+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-10 13:07 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit 113a168b792cd367822ec04cdc2ef32facd28efa)

The `noconflict()` function checks if there's no conflicting instruction
between the current instruction and the given `ref` instruction. Also,
it avoids fusing loads if there are multiple references of the given
`ref`. The last check is performed in the presence of the `noload`
parameter. Since the `noconflict()`, which is added in the previous
patch, checks conflicts for the same `ref` as the call before, there is
no need to perform these checks again, so the corresponding parameter is
adjusted.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#10709
---
 src/lj_asm_x86.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index f47c460a..cba7ba80 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -465,7 +465,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
       }
     } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
       if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
-	  noconflict(as, ref, IR_CALLS, 0) &&  /* Don't cross table.clear. */
+	  noconflict(as, ref, IR_CALLS, 1) &&  /* Don't cross table.clear. */
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	asm_fuseahuref(as, ir->op1, xallow);
 	return RID_MRM;
-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF.
  2025-01-10 13:07 [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization Sergey Kaplun via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear Sergey Kaplun via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 2/4] Improve last commit Sergey Kaplun via Tarantool-patches
@ 2025-01-10 13:07 ` Sergey Kaplun via Tarantool-patches
  2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 4/4] Fix last commit Sergey Kaplun via Tarantool-patches
  3 siblings, 1 reply; 9+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-10 13:07 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Reported by Peter Cawley.

(cherry picked from commit 644723649ea04cb23b72c814b88b72a29e4afed4)

Load fusing optimization doesn't take into account the presence of the
`IR_NEWREF` which may cause rehashing and deallocation of the array part
of the table. This may lead to the incorrect stores if the fusing
optimization occurs across this IR, leading to inconsistent behaviour
between the JIT and the VM.

This patch adds the corresponding check by the refactoring of the
`noconflict()` function -- it now accepts the mask of the `check` as the
last argument. The first bit stands for the `IR_NEWREF` check, the
second for the multiple reference of the given instruction.
Unfortunately, this commit misses the check for the `table.clear()`
introduced for the preprevious patch. Thus, the corresponding test fails
again. This will be fixed in the next commit.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709
---
 src/lj_asm_x86.h                              | 17 +++---
 .../lj-1117-fuse-across-newref.test.lua       | 52 +++++++++++++++++++
 2 files changed, 61 insertions(+), 8 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1117-fuse-across-newref.test.lua

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index cba7ba80..d77087d6 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -109,7 +109,7 @@ static int asm_isk32(ASMState *as, IRRef ref, int32_t *k)
 /* Check if there's no conflicting instruction between curins and ref.
 ** Also avoid fusing loads if there are multiple references.
 */
-static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
+static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check)
 {
   IRIns *ir = as->ir;
   IRRef i = as->curins;
@@ -118,7 +118,9 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
   while (--i > ref) {
     if (ir[i].o == conflict)
       return 0;  /* Conflict found. */
-    else if (!noload && (ir[i].op1 == ref || ir[i].op2 == ref))
+    else if ((check & 1) && ir[i].o == IR_NEWREF)
+      return 0;
+    else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref))
       return 0;
   }
   return 1;  /* Ok, no conflict. */
@@ -134,7 +136,7 @@ static IRRef asm_fuseabase(ASMState *as, IRRef ref)
     lj_assertA(irb->op2 == IRFL_TAB_ARRAY, "expected FLOAD TAB_ARRAY");
     /* We can avoid the FLOAD of t->array for colocated arrays. */
     if (ira->o == IR_TNEW && ira->op1 <= LJ_MAX_COLOSIZE &&
-	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 1)) {
+	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 0)) {
       as->mrm.ofs = (int32_t)sizeof(GCtab);  /* Ofs to colocated array. */
       return irb->op1;  /* Table obj. */
     }
@@ -448,7 +450,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
     RegSet xallow = (allow & RSET_GPR) ? allow : RSET_GPR;
     if (ir->o == IR_SLOAD) {
       if (!(ir->op2 & (IRSLOAD_PARENT|IRSLOAD_CONVERT)) &&
-	  noconflict(as, ref, IR_RETF, 0) &&
+	  noconflict(as, ref, IR_RETF, 2) &&
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	as->mrm.base = (uint8_t)ra_alloc1(as, REF_BASE, xallow);
 	as->mrm.ofs = 8*((int32_t)ir->op1-1-LJ_FR2) +
@@ -459,13 +461,12 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
     } else if (ir->o == IR_FLOAD) {
       /* Generic fusion is only ok for 32 bit operand (but see asm_comp). */
       if ((irt_isint(ir->t) || irt_isu32(ir->t) || irt_isaddr(ir->t)) &&
-	  noconflict(as, ref, IR_FSTORE, 0)) {
+	  noconflict(as, ref, IR_FSTORE, 2)) {
 	asm_fusefref(as, ir, xallow);
 	return RID_MRM;
       }
     } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
-      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
-	  noconflict(as, ref, IR_CALLS, 1) &&  /* Don't cross table.clear. */
+      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 2+(ir->o != IR_ULOAD)) &&
 	  !(LJ_GC64 && irt_isaddr(ir->t))) {
 	asm_fuseahuref(as, ir->op1, xallow);
 	return RID_MRM;
@@ -475,7 +476,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
       ** Fusing unaligned memory operands is ok on x86 (except for SIMD types).
       */
       if ((!irt_typerange(ir->t, IRT_I8, IRT_U16)) &&
-	  noconflict(as, ref, IR_XSTORE, 0)) {
+	  noconflict(as, ref, IR_XSTORE, 2)) {
 	asm_fusexref(as, ir->op1, xallow);
 	return RID_MRM;
       }
diff --git a/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
new file mode 100644
index 00000000..4b8772bf
--- /dev/null
+++ b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
@@ -0,0 +1,52 @@
+local tap = require('tap')
+-- Test file to demonstrate LuaJIT's incorrect fusion across
+-- `IR_NEWREF`.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1117.
+local test = tap.test('lj-1117-fuse-across-newref'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+local ffi = require('ffi')
+
+test:plan(1)
+
+-- Table with content.
+local tab = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 42}
+-- Use the alias to trick the code flow analysis.
+local tab_alias = tab
+local result_tab = {}
+
+-- Need to start recording trace at the 16th iteration to avoid
+-- rehashing of the `t` and `result_tab` before the `if`
+-- condition below on the 32nd iteration. Also, the inner loop
+-- isn't recorded this way. After rehashing in the NEWREF, the
+-- fusion will use the wrong address, which leads to the dirty
+-- reads visible (always, not flaky) under Valgrind with the
+-- `--free-fill` option set.
+jit.opt.start('hotloop=16')
+
+-- The amount of iterations required for the rehashing of the
+-- table.
+for i = 1, 33 do
+  -- ALOAD to be fused.
+  local value = tab[16]
+  -- NEWREF instruction.
+  tab_alias[{}] = 100
+  -- Need this CONV cast to trigger load fusion. See `asm_comp()`
+  -- for the details. Before the patch, this fusion takes the
+  -- incorrect address of the already deallocated array part of
+  -- the table, which leads to the incorrect result.
+  result_tab[i] = ffi.cast('int64_t', value)
+  if i == 32 then
+    -- Clear the array part.
+    for j = 1, 15 do
+      tab[j] = nil
+    end
+    -- Next rehash of the `tab`/`tab_alias` will dealloc the array
+    -- part.
+  end
+end
+
+test:samevalues(result_tab, 'no fusion across NEWREF')
+
+test:done(true)
-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH luajit 4/4] Fix last commit.
  2025-01-10 13:07 [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization Sergey Kaplun via Tarantool-patches
                   ` (2 preceding siblings ...)
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF Sergey Kaplun via Tarantool-patches
@ 2025-01-10 13:07 ` Sergey Kaplun via Tarantool-patches
  2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches
  3 siblings, 1 reply; 9+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-10 13:07 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit 43d0a19158ceabaa51b0462c1ebc97612b420a2e)

The previous commit accidentally removes the check that fusing
optimization isn't performing across the call to the `table.clear()`.
This commit fixes the behaviour by adding the corresponding check that
depends on the first bit of `check` masks in the `noconflict()` routine.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#10709
---
 src/lj_asm_x86.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index d77087d6..a96bc2e7 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -118,7 +118,7 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check)
   while (--i > ref) {
     if (ir[i].o == conflict)
       return 0;  /* Conflict found. */
-    else if ((check & 1) && ir[i].o == IR_NEWREF)
+    else if ((check & 1) && (ir[i].o == IR_NEWREF || ir[i].o == IR_CALLS))
       return 0;
     else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref))
       return 0;
-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear.
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear Sergey Kaplun via Tarantool-patches
@ 2025-01-14 14:10   ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 9+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-01-14 14:10 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 2913 bytes --]

Hi, Sergey!

thanks for the patch! LGTM

On 10.01.2025 16:07, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by Peter Cawley.
>
> (cherry picked from commit 45c88b7963de2969a9a656c03ba06ad995d7fd5f)
>
> Load fusing optimization takes into account only the presence of the
> corresponding stores, but not any calls that may affect the table
> content. This may lead to the incorrect stores if the fusing
> optimization occurs across the `table.clear()` call, leading to
> inconsistent behaviour between the JIT and the VM.
>
> This patch adds the corresponding check.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#10709
> ---
>   src/lj_asm_x86.h                              |  1 +
>   .../lj-1117-fuse-across-table-clear.test.lua  | 36 +++++++++++++++++++
>   2 files changed, 37 insertions(+)
>   create mode 100644 test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua
>
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index 86ce3937..f47c460a 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -465,6 +465,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
>         }
>       } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
>         if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
> +	  noconflict(as, ref, IR_CALLS, 0) &&  /* Don't cross table.clear. */
>   	  !(LJ_GC64 && irt_isaddr(ir->t))) {
>   	asm_fuseahuref(as, ir->op1, xallow);
>   	return RID_MRM;
> diff --git a/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua b/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua
> new file mode 100644
> index 00000000..2f7c91d1
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1117-fuse-across-table-clear.test.lua
> @@ -0,0 +1,36 @@
> +local tap = require('tap')
> +-- Test file to demonstrate LuaJIT's incorrect fusion across
> +-- `table.clear()`.
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1117.
> +local test = tap.test('lj-1117-fuse-across-table-clear'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +local ffi = require('ffi')
> +local table_clear = require('table.clear')
> +
> +test:plan(1)
> +
> +local tab = {0}
> +local alias_tab = tab
> +local result_tab = {}
> +
> +jit.opt.start('hotloop=1')
> +
> +for i = 1, 4 do
> +  -- Load to be fused.
> +  local value = tab[1]
> +  -- Clear the alias table to trick the code flow analysis.
> +  table_clear(alias_tab)
> +  -- Need this cast to trigger load fusion. See `asm_comp()` for
> +  -- the details. Before the patch, this fusion takes the
> +  -- incorrect address of the already cleared table, which leads
> +  -- to the failure of the check below.
> +  result_tab[i] = ffi.cast('int64_t', value)
> +  -- Revive the value.
> +  tab[1] = 0
> +end
> +
> +test:samevalues(result_tab, 'no fusion across table.clear')
> +
> +test:done(true)

[-- Attachment #2: Type: text/html, Size: 3387 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 2/4] Improve last commit.
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 2/4] Improve last commit Sergey Kaplun via Tarantool-patches
@ 2025-01-14 14:11   ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 9+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-01-14 14:11 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

Hi, Sergey!

thanks for the patch! LGTM

On 10.01.2025 16:07, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 113a168b792cd367822ec04cdc2ef32facd28efa)
>
> The `noconflict()` function checks if there's no conflicting instruction
> between the current instruction and the given `ref` instruction. Also,
> it avoids fusing loads if there are multiple references of the given
> `ref`. The last check is performed in the presence of the `noload`
> parameter. Since the `noconflict()`, which is added in the previous
> patch, checks conflicts for the same `ref` as the call before, there is
> no need to perform these checks again, so the corresponding parameter is
> adjusted.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#10709
> ---
>   src/lj_asm_x86.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index f47c460a..cba7ba80 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -465,7 +465,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
>         }
>       } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
>         if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
> -	  noconflict(as, ref, IR_CALLS, 0) &&  /* Don't cross table.clear. */
> +	  noconflict(as, ref, IR_CALLS, 1) &&  /* Don't cross table.clear. */
>   	  !(LJ_GC64 && irt_isaddr(ir->t))) {
>   	asm_fuseahuref(as, ir->op1, xallow);
>   	return RID_MRM;

[-- Attachment #2: Type: text/html, Size: 1994 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF.
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF Sergey Kaplun via Tarantool-patches
@ 2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 9+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-01-14 14:15 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 6815 bytes --]

Hi, Sergey!

thanks for the patch! LGTM

On 10.01.2025 16:07, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by Peter Cawley.
>
> (cherry picked from commit 644723649ea04cb23b72c814b88b72a29e4afed4)
>
> Load fusing optimization doesn't take into account the presence of the
> `IR_NEWREF` which may cause rehashing and deallocation of the array part
> of the table. This may lead to the incorrect stores if the fusing
> optimization occurs across this IR, leading to inconsistent behaviour
> between the JIT and the VM.
>
> This patch adds the corresponding check by the refactoring of the
> `noconflict()` function -- it now accepts the mask of the `check` as the
> last argument. The first bit stands for the `IR_NEWREF` check, the
> second for the multiple reference of the given instruction.
> Unfortunately, this commit misses the check for the `table.clear()`
> introduced for the preprevious patch. Thus, the corresponding test fails
> again. This will be fixed in the next commit.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#10709
> ---
>   src/lj_asm_x86.h                              | 17 +++---
>   .../lj-1117-fuse-across-newref.test.lua       | 52 +++++++++++++++++++
>   2 files changed, 61 insertions(+), 8 deletions(-)
>   create mode 100644 test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
>
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index cba7ba80..d77087d6 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -109,7 +109,7 @@ static int asm_isk32(ASMState *as, IRRef ref, int32_t *k)
>   /* Check if there's no conflicting instruction between curins and ref.
>   ** Also avoid fusing loads if there are multiple references.
>   */
> -static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
> +static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check)
>   {
>     IRIns *ir = as->ir;
>     IRRef i = as->curins;
> @@ -118,7 +118,9 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int noload)
>     while (--i > ref) {
>       if (ir[i].o == conflict)
>         return 0;  /* Conflict found. */
> -    else if (!noload && (ir[i].op1 == ref || ir[i].op2 == ref))
> +    else if ((check & 1) && ir[i].o == IR_NEWREF)
> +      return 0;
> +    else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref))
>         return 0;
>     }
>     return 1;  /* Ok, no conflict. */
> @@ -134,7 +136,7 @@ static IRRef asm_fuseabase(ASMState *as, IRRef ref)
>       lj_assertA(irb->op2 == IRFL_TAB_ARRAY, "expected FLOAD TAB_ARRAY");
>       /* We can avoid the FLOAD of t->array for colocated arrays. */
>       if (ira->o == IR_TNEW && ira->op1 <= LJ_MAX_COLOSIZE &&
> -	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 1)) {
> +	!neverfuse(as) && noconflict(as, irb->op1, IR_NEWREF, 0)) {
>         as->mrm.ofs = (int32_t)sizeof(GCtab);  /* Ofs to colocated array. */
>         return irb->op1;  /* Table obj. */
>       }
> @@ -448,7 +450,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
>       RegSet xallow = (allow & RSET_GPR) ? allow : RSET_GPR;
>       if (ir->o == IR_SLOAD) {
>         if (!(ir->op2 & (IRSLOAD_PARENT|IRSLOAD_CONVERT)) &&
> -	  noconflict(as, ref, IR_RETF, 0) &&
> +	  noconflict(as, ref, IR_RETF, 2) &&
>   	  !(LJ_GC64 && irt_isaddr(ir->t))) {
>   	as->mrm.base = (uint8_t)ra_alloc1(as, REF_BASE, xallow);
>   	as->mrm.ofs = 8*((int32_t)ir->op1-1-LJ_FR2) +
> @@ -459,13 +461,12 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
>       } else if (ir->o == IR_FLOAD) {
>         /* Generic fusion is only ok for 32 bit operand (but see asm_comp). */
>         if ((irt_isint(ir->t) || irt_isu32(ir->t) || irt_isaddr(ir->t)) &&
> -	  noconflict(as, ref, IR_FSTORE, 0)) {
> +	  noconflict(as, ref, IR_FSTORE, 2)) {
>   	asm_fusefref(as, ir, xallow);
>   	return RID_MRM;
>         }
>       } else if (ir->o == IR_ALOAD || ir->o == IR_HLOAD || ir->o == IR_ULOAD) {
> -      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 0) &&
> -	  noconflict(as, ref, IR_CALLS, 1) &&  /* Don't cross table.clear. */
> +      if (noconflict(as, ref, ir->o + IRDELTA_L2S, 2+(ir->o != IR_ULOAD)) &&
>   	  !(LJ_GC64 && irt_isaddr(ir->t))) {
>   	asm_fuseahuref(as, ir->op1, xallow);
>   	return RID_MRM;
> @@ -475,7 +476,7 @@ static Reg asm_fuseload(ASMState *as, IRRef ref, RegSet allow)
>         ** Fusing unaligned memory operands is ok on x86 (except for SIMD types).
>         */
>         if ((!irt_typerange(ir->t, IRT_I8, IRT_U16)) &&
> -	  noconflict(as, ref, IR_XSTORE, 0)) {
> +	  noconflict(as, ref, IR_XSTORE, 2)) {
>   	asm_fusexref(as, ir->op1, xallow);
>   	return RID_MRM;
>         }
> diff --git a/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
> new file mode 100644
> index 00000000..4b8772bf
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1117-fuse-across-newref.test.lua
> @@ -0,0 +1,52 @@
> +local tap = require('tap')
> +-- Test file to demonstrate LuaJIT's incorrect fusion across
> +-- `IR_NEWREF`.
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1117.
> +local test = tap.test('lj-1117-fuse-across-newref'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +local ffi = require('ffi')
> +
> +test:plan(1)
> +
> +-- Table with content.
> +local tab = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 42}
> +-- Use the alias to trick the code flow analysis.
> +local tab_alias = tab
> +local result_tab = {}
> +
> +-- Need to start recording trace at the 16th iteration to avoid
> +-- rehashing of the `t` and `result_tab` before the `if`
> +-- condition below on the 32nd iteration. Also, the inner loop
> +-- isn't recorded this way. After rehashing in the NEWREF, the
> +-- fusion will use the wrong address, which leads to the dirty
> +-- reads visible (always, not flaky) under Valgrind with the
> +-- `--free-fill` option set.
> +jit.opt.start('hotloop=16')
> +
> +-- The amount of iterations required for the rehashing of the
> +-- table.
> +for i = 1, 33 do
> +  -- ALOAD to be fused.
> +  local value = tab[16]
> +  -- NEWREF instruction.
> +  tab_alias[{}] = 100
> +  -- Need this CONV cast to trigger load fusion. See `asm_comp()`
> +  -- for the details. Before the patch, this fusion takes the
> +  -- incorrect address of the already deallocated array part of
> +  -- the table, which leads to the incorrect result.
> +  result_tab[i] = ffi.cast('int64_t', value)
> +  if i == 32 then
> +    -- Clear the array part.
> +    for j = 1, 15 do
> +      tab[j] = nil
> +    end
> +    -- Next rehash of the `tab`/`tab_alias` will dealloc the array
> +    -- part.
> +  end
> +end
> +
> +test:samevalues(result_tab, 'no fusion across NEWREF')
> +
> +test:done(true)

[-- Attachment #2: Type: text/html, Size: 7307 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit 4/4] Fix last commit.
  2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 4/4] Fix last commit Sergey Kaplun via Tarantool-patches
@ 2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 9+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-01-14 14:15 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

Hi, Sergey!

thanks for the patch! LGTM

On 10.01.2025 16:07, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 43d0a19158ceabaa51b0462c1ebc97612b420a2e)
>
> The previous commit accidentally removes the check that fusing
> optimization isn't performing across the call to the `table.clear()`.
> This commit fixes the behaviour by adding the corresponding check that
> depends on the first bit of `check` masks in the `noconflict()` routine.
>
> Sergey Kaplun:
> * added the description for the problem
>
> Part of tarantool/tarantool#10709
> ---
>   src/lj_asm_x86.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index d77087d6..a96bc2e7 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -118,7 +118,7 @@ static int noconflict(ASMState *as, IRRef ref, IROp conflict, int check)
>     while (--i > ref) {
>       if (ir[i].o == conflict)
>         return 0;  /* Conflict found. */
> -    else if ((check & 1) && ir[i].o == IR_NEWREF)
> +    else if ((check & 1) && (ir[i].o == IR_NEWREF || ir[i].o == IR_CALLS))
>         return 0;
>       else if ((check & 2) && (ir[i].op1 == ref || ir[i].op2 == ref))
>         return 0;

[-- Attachment #2: Type: text/html, Size: 1704 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-01-14 14:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-10 13:07 [Tarantool-patches] [PATCH luajit 0/4] Fixes for load fusing optimization Sergey Kaplun via Tarantool-patches
2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 1/4] x86/x64: Don't fuse loads across table.clear Sergey Kaplun via Tarantool-patches
2025-01-14 14:10   ` Sergey Bronnikov via Tarantool-patches
2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 2/4] Improve last commit Sergey Kaplun via Tarantool-patches
2025-01-14 14:11   ` Sergey Bronnikov via Tarantool-patches
2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 3/4] x86/x64: Don't fuse loads across IR_NEWREF Sergey Kaplun via Tarantool-patches
2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches
2025-01-10 13:07 ` [Tarantool-patches] [PATCH luajit 4/4] Fix last commit Sergey Kaplun via Tarantool-patches
2025-01-14 14:15   ` Sergey Bronnikov via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox