Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
@ 2024-09-25 10:36 Sergey Kaplun via Tarantool-patches
  2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-09-25 10:36 UTC (permalink / raw)
  To: Maxim Kokryashkin, Sergey Bronnikov; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Peter Cawley.

(cherry picked from commit 3bdc6498c4c012a8fbf9cfa2756a5b07f56f1540)

`IR_CALLXS` for the vararg function contains `IR_CARG(fptr, ctid)` as
the second operand. The `loop_emit_phi()` scans only the first operand
of the IR, so the second is not marked as PHI. In this case, when the IR
appears in both the invariant and variant parts of the loop, CSE may
remove it and thus lead to incorrect emitting results.

This patch tweaks the CSE rules to avoid CSE across the `IR_LOOP`.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10199
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1244-missing-phi-carg
Related issues:
* https://github.com/tarantool/tarantool/issues/10199
* https://github.com/LuaJIT/LuaJIT/issues/1244

 src/lj_opt_fold.c                             | 11 ++++
 .../lj-1244-missing-phi-carg.test.lua         | 53 +++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 test/tarantool-tests/lj-1244-missing-phi-carg.test.lua

diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
index e2171e1b..33e5f9dd 100644
--- a/src/lj_opt_fold.c
+++ b/src/lj_opt_fold.c
@@ -2406,6 +2406,17 @@ LJFOLD(XSNEW any any)
 LJFOLD(BUFHDR any any)
 LJFOLDX(lj_ir_emit)
 
+/* -- Miscellaneous ------------------------------------------------------- */
+
+LJFOLD(CARG any any)
+LJFOLDF(cse_carg)
+{
+  TRef tr = lj_opt_cse(J);
+  if (tref_ref(tr) < J->chain[IR_LOOP])  /* CSE across loop? */
+    return EMITFOLD;  /* Raw emit. Assumes fins is left intact by CSE. */
+  return tr;
+}
+
 /* ------------------------------------------------------------------------ */
 
 /* Every entry in the generated hash table is a 32 bit pattern:
diff --git a/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
new file mode 100644
index 00000000..865cdd26
--- /dev/null
+++ b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
@@ -0,0 +1,53 @@
+local ffi = require('ffi')
+local table_new = require('table.new')
+
+-- Test file to demonstrate LuaJIT incorrect behaviour for
+-- recording the FFI call to the vararg function. See also:
+-- https://github.com/LuaJIT/LuaJIT/issues/1244.
+local tap = require('tap')
+local test = tap.test('lj-1244-missing-phi-carg'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+-- Loop unrolls into 2 iterations. Thus means that the loop is
+-- executed on trace on the 5th iteration (instead of the usual
+-- 4th). Run it even number of iterations to test both, so last is
+-- 6th.
+local NTESTS = 6
+
+test:plan(NTESTS)
+
+ffi.cdef[[
+  double sin(double, ...);
+  double cos(double, ...);
+]]
+
+local EXPECTED = {[0] = ffi.C.sin(0), ffi.C.cos(0)}
+
+-- Array of 2 functions.
+local fns = ffi.new('double (*[2])(double, ...)')
+fns[0] = ffi.C.cos
+fns[1] = ffi.C.sin
+
+-- Avoid reallocating the table on the trace.
+local result = table_new(8, 0)
+
+jit.opt.start('hotloop=1')
+
+local fn = fns[0]
+-- The first result is `cos()`.
+for i = 1, NTESTS do
+  result[i] = fn(0)
+  fn = fns[i % 2]
+  -- The call persists in the invariant part of the loop as well.
+  -- Hence, XLOAD (part of the IR_CARG -- function to be called)
+  -- should be marked as PHI, but it isn't due to CSE.
+  fn(0)
+end
+
+for i = 1, NTESTS do
+  test:is(result[i], EXPECTED[i % 2],
+          ('correct result on iteration %d'):format(i))
+end
+
+test:done(true)
-- 
2.46.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-09-25 10:36 [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations Sergey Kaplun via Tarantool-patches
@ 2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
  2024-10-08 13:25   ` Sergey Kaplun via Tarantool-patches
  2024-10-11 18:58 ` Maxim Kokryashkin via Tarantool-patches
  2024-10-18 15:19 ` Sergey Kaplun via Tarantool-patches
  2 siblings, 1 reply; 7+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-10-08 12:39 UTC (permalink / raw)
  To: Sergey Kaplun, Maxim Kokryashkin; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 3905 bytes --]

Hi, Sergey!

LGTM with a minor question below.

On 25.09.2024 13:36, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Peter Cawley.
>
> (cherry picked from commit 3bdc6498c4c012a8fbf9cfa2756a5b07f56f1540)
>
> `IR_CALLXS` for the vararg function contains `IR_CARG(fptr, ctid)` as
> the second operand. The `loop_emit_phi()` scans only the first operand
> of the IR, so the second is not marked as PHI. In this case, when the IR
> appears in both the invariant and variant parts of the loop, CSE may
> remove it and thus lead to incorrect emitting results.
>
> This patch tweaks the CSE rules to avoid CSE across the `IR_LOOP`.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#10199
> ---
>
> Branch:https://github.com/tarantool/luajit/tree/skaplun/lj-1244-missing-phi-carg
> Related issues:
> *https://github.com/tarantool/tarantool/issues/10199
> *https://github.com/LuaJIT/LuaJIT/issues/1244
>
>   src/lj_opt_fold.c                             | 11 ++++
>   .../lj-1244-missing-phi-carg.test.lua         | 53 +++++++++++++++++++
>   2 files changed, 64 insertions(+)
>   create mode 100644 test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
>
> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
> index e2171e1b..33e5f9dd 100644
> --- a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -2406,6 +2406,17 @@ LJFOLD(XSNEW any any)
>   LJFOLD(BUFHDR any any)
>   LJFOLDX(lj_ir_emit)
>   
> +/* -- Miscellaneous ------------------------------------------------------- */
> +
> +LJFOLD(CARG any any)
> +LJFOLDF(cse_carg)
> +{
> +  TRef tr = lj_opt_cse(J);
> +  if (tref_ref(tr) < J->chain[IR_LOOP])  /* CSE across loop? */
> +    return EMITFOLD;  /* Raw emit. Assumes fins is left intact by CSE. */
> +  return tr;
> +}
> +
>   /* ------------------------------------------------------------------------ */
>   
>   /* Every entry in the generated hash table is a 32 bit pattern:
> diff --git a/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
> new file mode 100644
> index 00000000..865cdd26
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
> @@ -0,0 +1,53 @@
> +local ffi = require('ffi')
> +local table_new = require('table.new')
> +
> +-- Test file to demonstrate LuaJIT incorrect behaviour for
> +-- recording the FFI call to the vararg function. See also:
> +--https://github.com/LuaJIT/LuaJIT/issues/1244.
> +local tap = require('tap')
> +local test = tap.test('lj-1244-missing-phi-carg'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +-- Loop unrolls into 2 iterations. Thus means that the loop is
> +-- executed on trace on the 5th iteration (instead of the usual
> +-- 4th). Run it even number of iterations to test both, so last is
> +-- 6th.
> +local NTESTS = 6
> +
> +test:plan(NTESTS)
> +
> +ffi.cdef[[
> +  double sin(double, ...);
> +  double cos(double, ...);

Why do you use sin/cos with wrong function prototypes if you can take

a function with varargs. (printf for example)?

> +]]
> +
> +local EXPECTED = {[0] = ffi.C.sin(0), ffi.C.cos(0)}
> +
> +-- Array of 2 functions.
> +local fns = ffi.new('double (*[2])(double, ...)')
> +fns[0] = ffi.C.cos
> +fns[1] = ffi.C.sin
> +
> +-- Avoid reallocating the table on the trace.
> +local result = table_new(8, 0)
> +
> +jit.opt.start('hotloop=1')
> +
> +local fn = fns[0]
> +-- The first result is `cos()`.
> +for i = 1, NTESTS do
> +  result[i] = fn(0)
> +  fn = fns[i % 2]
> +  -- The call persists in the invariant part of the loop as well.
> +  -- Hence, XLOAD (part of the IR_CARG -- function to be called)
> +  -- should be marked as PHI, but it isn't due to CSE.
> +  fn(0)
> +end
> +
> +for i = 1, NTESTS do
> +  test:is(result[i], EXPECTED[i % 2],
> +          ('correct result on iteration %d'):format(i))
> +end
> +
> +test:done(true)

[-- Attachment #2: Type: text/html, Size: 4749 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
@ 2024-10-08 13:25   ` Sergey Kaplun via Tarantool-patches
  2024-10-11 14:02     ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-10-08 13:25 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Hi, Sergey!
Thanks for the review!
Please consider my answer below.

On 08.10.24, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> LGTM with a minor question below.
> 


<snipped>

> > +ffi.cdef[[
> > +  double sin(double, ...);
> > +  double cos(double, ...);
> 
> Why do you use sin/cos with wrong function prototypes if you can take
> 
> a function with varargs. (printf for example)?

I suppose this hack is well suited for our testing purposes:
1) We don't need to create custom vararg functions and the corresponding
   C libary.
2) Functions usage is obvious. Also, the vararg part will be silently
   ignored.

Be aware that we need the result of the function's call.

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-10-08 13:25   ` Sergey Kaplun via Tarantool-patches
@ 2024-10-11 14:02     ` Sergey Bronnikov via Tarantool-patches
  2024-10-17  7:16       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-10-11 14:02 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

Hi, Sergey!

On 08.10.2024 16:25, Sergey Kaplun wrote:

<snipped>

>>> +ffi.cdef[[
>>> +  double sin(double, ...);
>>> +  double cos(double, ...);
>> Why do you use sin/cos with wrong function prototypes if you can take
>>
>> a function with varargs. (printf for example)?
> I suppose this hack is well suited for our testing purposes:
> 1) We don't need to create custom vararg functions and the corresponding
>     C libary.
> 2) Functions usage is obvious. Also, the vararg part will be silently
>     ignored.
>
> Be aware that we need the result of the function's call.
>
Got it. I would add this as a comment. Feel free to ignore.

LGTM

[-- Attachment #2: Type: text/html, Size: 1349 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-09-25 10:36 [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations Sergey Kaplun via Tarantool-patches
  2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
@ 2024-10-11 18:58 ` Maxim Kokryashkin via Tarantool-patches
  2024-10-18 15:19 ` Sergey Kaplun via Tarantool-patches
  2 siblings, 0 replies; 7+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2024-10-11 18:58 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi, Sergey!
LGTM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-10-11 14:02     ` Sergey Bronnikov via Tarantool-patches
@ 2024-10-17  7:16       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-10-17  7:16 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

Sergey,

On 11.10.24, Sergey Bronnikov wrote:
> Hi, Sergey!
> 
> On 08.10.2024 16:25, Sergey Kaplun wrote:
> 
> <snipped>
> 
> >>> +ffi.cdef[[
> >>> +  double sin(double, ...);
> >>> +  double cos(double, ...);
> >> Why do you use sin/cos with wrong function prototypes if you can take
> >>
> >> a function with varargs. (printf for example)?
> > I suppose this hack is well suited for our testing purposes:
> > 1) We don't need to create custom vararg functions and the corresponding
> >     C libary.
> > 2) Functions usage is obvious. Also, the vararg part will be silently
> >     ignored.
> >
> > Be aware that we need the result of the function's call.
> >
> Got it. I would add this as a comment. Feel free to ignore.

I've added the following comment:

===================================================================
diff --git a/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
index 865cdd26..d4981299 100644
--- a/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
+++ b/test/tarantool-tests/lj-1244-missing-phi-carg.test.lua
@@ -17,6 +17,9 @@ local NTESTS = 6
 
 test:plan(NTESTS)
 
+-- XXX: Hack with function's prototypes to avoid creation of
+-- custom functions to be loaded via FFI (vararg part will be just
+-- ignored).
 ffi.cdef[[
   double sin(double, ...);
   double cos(double, ...);
===================================================================

Branch is rebased on the current master and force-pushed.

> 
> LGTM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations.
  2024-09-25 10:36 [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations Sergey Kaplun via Tarantool-patches
  2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
  2024-10-11 18:58 ` Maxim Kokryashkin via Tarantool-patches
@ 2024-10-18 15:19 ` Sergey Kaplun via Tarantool-patches
  2 siblings, 0 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-10-18 15:19 UTC (permalink / raw)
  To: Maxim Kokryashkin, Sergey Bronnikov; +Cc: tarantool-patches

I've applied the patch into all long-term branches in tarantool/luajit
and bumped a new version in master [1], release/3.2 [2] and
release/2.11 [3].

[1]: https://github.com/tarantool/tarantool/pull/10712
[2]: https://github.com/tarantool/tarantool/pull/10713
[3]: https://github.com/tarantool/tarantool/pull/10714

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-18 15:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-25 10:36 [Tarantool-patches] [PATCH luajit] Limit CSE for IR_CARG to fix loop optimizations Sergey Kaplun via Tarantool-patches
2024-10-08 12:39 ` Sergey Bronnikov via Tarantool-patches
2024-10-08 13:25   ` Sergey Kaplun via Tarantool-patches
2024-10-11 14:02     ` Sergey Bronnikov via Tarantool-patches
2024-10-17  7:16       ` Sergey Kaplun via Tarantool-patches
2024-10-11 18:58 ` Maxim Kokryashkin via Tarantool-patches
2024-10-18 15:19 ` Sergey Kaplun via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox