Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
@ 2021-07-19  7:36 Sergey Kaplun via Tarantool-patches
  2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-07-19  7:36 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)

When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
two slots -- the first takes the TValue with the function to call, the
second takes the additional frame information. The recording JIT
machinery works pretty the same -- the function IR_KGC is loaded in the
first slot, and the second is set to TREF_FRAME value. This value
should be rewritten after return from a callee. It is done either by the
return values either this slot is cleared (set to zero) manually with
the next bytecode with RA dst mode with the assumption, that the dst RA
takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
the smallest possible destination register (see `lj_record_ins()` for
the details).

Bytecode allocator swaps operands for ISGT and ISGE comparisons.
When it happens, the aforementioned rule for registers allocations
may be violated. When it happens, and this chunk is recording, the slot
with TREF_FRAME is not rewritten (but the next empty slot after
TREF_FRAME is) during bytecode recording. This leads to JIT slots
inconsistency and assertion failure in `rec_check_slots()` during
recording the next bytecode instruction.

This patch fixes bytecode register allocation by changing the register
allocation order in case of ISGT and ISGE bytecodes.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6227
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
Issue: https://github.com/tarantool/tarantool/issues/6227

 src/lj_parse.c                                |  7 +++-
 ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua

diff --git a/src/lj_parse.c b/src/lj_parse.c
index 08f7cfa6..a6325a76 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -853,9 +853,12 @@ static void bcemit_comp(FuncState *fs, BinOpr opr, ExpDesc *e1, ExpDesc *e2)
       e1 = e2; e2 = eret;  /* Swap operands. */
       op = ((op-BC_ISLT)^3)+BC_ISLT;
       expr_toval(fs, e1);
+      ra = expr_toanyreg(fs, e1);
+      rd = expr_toanyreg(fs, e2);
+    } else {
+      rd = expr_toanyreg(fs, e2);
+      ra = expr_toanyreg(fs, e1);
     }
-    rd = expr_toanyreg(fs, e2);
-    ra = expr_toanyreg(fs, e1);
     ins = BCINS_AD(op, ra, rd);
   }
   /* Using expr_free might cause asserts if the order is wrong. */
diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
new file mode 100644
index 00000000..66f6885e
--- /dev/null
+++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
@@ -0,0 +1,41 @@
+local tap = require('tap')
+local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
+test:plan(1)
+
+-- Test file to demonstrate assertion failure during recording
+-- wrong allocated bytecode for comparisons.
+-- See also https://github.com/tarantool/tarantool/issues/6227.
+
+-- Need function with RET0 bytecode to avoid reset of
+-- the first JIT slot with frame info. Also need no assignments
+-- by the caller.
+local function empty() end
+
+local uv = 0
+
+-- This function needs to reset register enumerating.
+-- Also set `J->maxslot` to zero.
+-- The upvalue function to call is loaded to 0 slot.
+local function bump_frame()
+  -- First call function with RET0 to set TREF_FRAME in the
+  -- last slot.
+  empty()
+  -- Test ISGE or ISGT bytecode. These bytecodes swap their
+  -- operands. Also, a constant is always loaded into the slot
+  -- smaller than upvalue. So, if upvalue loads before KSHORT,
+  -- then the difference between registers is more than 2 (2 is
+  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
+  -- the bytecode after call and return as expected. That leads
+  -- to recording slots inconsistency and assertion failure at
+  -- `rec_check_slots()`.
+  empty(1>uv)
+end
+
+jit.opt.start('hotloop=1')
+
+for _ = 1,3 do
+  bump_frame()
+end
+
+test:ok(true)
+os.exit(test:check() and 0 or 1)
-- 
2.31.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-07-19  7:36 [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons Sergey Kaplun via Tarantool-patches
@ 2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
  2021-08-01 17:10   ` Sergey Kaplun via Tarantool-patches
  2021-08-10 17:03 ` Sergey Ostanevich via Tarantool-patches
  2021-08-17  9:24 ` Igor Munkin via Tarantool-patches
  2 siblings, 1 reply; 10+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-01 10:43 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider the comments below. I didn't check
the test yet, since I don't get the JIT peculiarities from your commit
message and comments. Please provide a clearer description and I'll
proceed with the review of the test case then.

On 19.07.21, Sergey Kaplun wrote:
> From: Mike Pall <mike>
> 
> (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> 
> When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
> two slots -- the first takes the TValue with the function to call, the
> second takes the additional frame information. The recording JIT

Minor: The second slot is the framelink in LuaJIT terms.

> machinery works pretty the same -- the function IR_KGC is loaded in the
> first slot, and the second is set to TREF_FRAME value. This value
> should be rewritten after return from a callee. It is done either by the
> return values either this slot is cleared (set to zero) manually with
> the next bytecode with RA dst mode with the assumption, that the dst RA
> takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> the smallest possible destination register (see `lj_record_ins()` for
> the details).

The main point lies in the monstrous 5-line sentence. I've read several
times, but still don't get it. Could you please reword it in a not such
complex sentence?

> 
> Bytecode allocator swaps operands for ISGT and ISGE comparisons.
> When it happens, the aforementioned rule for registers allocations
> may be violated. When it happens, and this chunk is recording, the slot
> with TREF_FRAME is not rewritten (but the next empty slot after
> TREF_FRAME is) during bytecode recording. This leads to JIT slots
> inconsistency and assertion failure in `rec_check_slots()` during
> recording the next bytecode instruction.
> 
> This patch fixes bytecode register allocation by changing the register
> allocation order in case of ISGT and ISGE bytecodes.

It's better to use "virtual register" or even "VM register" to avoid
ambiguous plain "register" usage.

> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Resolves tarantool/tarantool#6227

Minor: Why #5629 is not mentioned?

> ---
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Issue: https://github.com/tarantool/tarantool/issues/6227
> 
>  src/lj_parse.c                                |  7 +++-
>  ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
>  2 files changed, 46 insertions(+), 2 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> 
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 08f7cfa6..a6325a76 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -853,9 +853,12 @@ static void bcemit_comp(FuncState *fs, BinOpr opr, ExpDesc *e1, ExpDesc *e2)
>        e1 = e2; e2 = eret;  /* Swap operands. */
>        op = ((op-BC_ISLT)^3)+BC_ISLT;
>        expr_toval(fs, e1);
> +      ra = expr_toanyreg(fs, e1);
> +      rd = expr_toanyreg(fs, e2);
> +    } else {
> +      rd = expr_toanyreg(fs, e2);
> +      ra = expr_toanyreg(fs, e1);
>      }
> -    rd = expr_toanyreg(fs, e2);
> -    ra = expr_toanyreg(fs, e1);
>      ins = BCINS_AD(op, ra, rd);
>    }
>    /* Using expr_free might cause asserts if the order is wrong. */
> diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> new file mode 100644
> index 00000000..66f6885e
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> @@ -0,0 +1,41 @@
> +local tap = require('tap')
> +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> +test:plan(1)
> +
> +-- Test file to demonstrate assertion failure during recording
> +-- wrong allocated bytecode for comparisons.
> +-- See also https://github.com/tarantool/tarantool/issues/6227.
> +
> +-- Need function with RET0 bytecode to avoid reset of
> +-- the first JIT slot with frame info. Also need no assignments
> +-- by the caller.
> +local function empty() end
> +
> +local uv = 0
> +
> +-- This function needs to reset register enumerating.
> +-- Also set `J->maxslot` to zero.
> +-- The upvalue function to call is loaded to 0 slot.
> +local function bump_frame()
> +  -- First call function with RET0 to set TREF_FRAME in the
> +  -- last slot.
> +  empty()
> +  -- Test ISGE or ISGT bytecode. These bytecodes swap their
> +  -- operands. Also, a constant is always loaded into the slot
> +  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> +  -- then the difference between registers is more than 2 (2 is
> +  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> +  -- the bytecode after call and return as expected. That leads
> +  -- to recording slots inconsistency and assertion failure at
> +  -- `rec_check_slots()`.
> +  empty(1>uv)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +for _ = 1,3 do
> +  bump_frame()
> +end
> +
> +test:ok(true)
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.31.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
@ 2021-08-01 17:10   ` Sergey Kaplun via Tarantool-patches
  2021-08-16  7:20     ` Igor Munkin via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-01 17:10 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 01.08.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below. I didn't check
> the test yet, since I don't get the JIT peculiarities from your commit
> message and comments. Please provide a clearer description and I'll
> proceed with the review of the test case then.
> 
> On 19.07.21, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> > 
> > (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> > 
> > When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
> > two slots -- the first takes the TValue with the function to call, the
> > second takes the additional frame information. The recording JIT
> 
> Minor: The second slot is the framelink in LuaJIT terms.

Yes, because it takes the additional frame information. How do you want
to modify this line?

> 
> > machinery works pretty the same -- the function IR_KGC is loaded in the
> > first slot, and the second is set to TREF_FRAME value. This value
> > should be rewritten after return from a callee. It is done either by the
> > return values either this slot is cleared (set to zero) manually with
> > the next bytecode with RA dst mode with the assumption, that the dst RA
> > takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> > the smallest possible destination register (see `lj_record_ins()` for
> > the details).
> 
> The main point lies in the monstrous 5-line sentence. I've read several
> times, but still don't get it. Could you please reword it in a not such
> complex sentence?

The first option is rewrite this slot by return values from the
function.

The second option is clearing slot (i.e. set to zero) manually, when
there is no values to return. It is done by the next bytecode having RA
dst mode. This obliges that the destination of RA takes the next slot
after TREF_FRAME. For this an earlier instruction must use the smallest
possible destination register (see `lj_record_ins()` for the details).

> 
> > 
> > Bytecode allocator swaps operands for ISGT and ISGE comparisons.
> > When it happens, the aforementioned rule for registers allocations
> > may be violated. When it happens, and this chunk is recording, the slot
> > with TREF_FRAME is not rewritten (but the next empty slot after
> > TREF_FRAME is) during bytecode recording. This leads to JIT slots
> > inconsistency and assertion failure in `rec_check_slots()` during
> > recording the next bytecode instruction.
> > 
> > This patch fixes bytecode register allocation by changing the register
> > allocation order in case of ISGT and ISGE bytecodes.
> 
> It's better to use "virtual register" or even "VM register" to avoid
> ambiguous plain "register" usage.

Changed to VM register.

> 
> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > Resolves tarantool/tarantool#6227
> 
> Minor: Why #5629 is not mentioned?

Added.
Branch is updated and force-pushed.

> 
> > ---
> > 
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > Issue: https://github.com/tarantool/tarantool/issues/6227
> > 
> >  src/lj_parse.c                                |  7 +++-
> >  ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
> >  2 files changed, 46 insertions(+), 2 deletions(-)
> >  create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > 
> > diff --git a/src/lj_parse.c b/src/lj_parse.c
> > index 08f7cfa6..a6325a76 100644
> > --- a/src/lj_parse.c
> > +++ b/src/lj_parse.c
> > @@ -853,9 +853,12 @@ static void bcemit_comp(FuncState *fs, BinOpr opr, ExpDesc *e1, ExpDesc *e2)
> >        e1 = e2; e2 = eret;  /* Swap operands. */
> >        op = ((op-BC_ISLT)^3)+BC_ISLT;
> >        expr_toval(fs, e1);
> > +      ra = expr_toanyreg(fs, e1);
> > +      rd = expr_toanyreg(fs, e2);
> > +    } else {
> > +      rd = expr_toanyreg(fs, e2);
> > +      ra = expr_toanyreg(fs, e1);
> >      }
> > -    rd = expr_toanyreg(fs, e2);
> > -    ra = expr_toanyreg(fs, e1);
> >      ins = BCINS_AD(op, ra, rd);
> >    }
> >    /* Using expr_free might cause asserts if the order is wrong. */
> > diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > new file mode 100644
> > index 00000000..66f6885e
> > --- /dev/null
> > +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > @@ -0,0 +1,41 @@
> > +local tap = require('tap')
> > +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> > +test:plan(1)
> > +
> > +-- Test file to demonstrate assertion failure during recording
> > +-- wrong allocated bytecode for comparisons.
> > +-- See also https://github.com/tarantool/tarantool/issues/6227.
> > +
> > +-- Need function with RET0 bytecode to avoid reset of
> > +-- the first JIT slot with frame info. Also need no assignments
> > +-- by the caller.
> > +local function empty() end
> > +
> > +local uv = 0
> > +
> > +-- This function needs to reset register enumerating.
> > +-- Also set `J->maxslot` to zero.
> > +-- The upvalue function to call is loaded to 0 slot.
> > +local function bump_frame()
> > +  -- First call function with RET0 to set TREF_FRAME in the
> > +  -- last slot.
> > +  empty()
> > +  -- Test ISGE or ISGT bytecode. These bytecodes swap their
> > +  -- operands. Also, a constant is always loaded into the slot
> > +  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> > +  -- then the difference between registers is more than 2 (2 is
> > +  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> > +  -- the bytecode after call and return as expected. That leads
> > +  -- to recording slots inconsistency and assertion failure at
> > +  -- `rec_check_slots()`.
> > +  empty(1>uv)
> > +end
> > +
> > +jit.opt.start('hotloop=1')
> > +
> > +for _ = 1,3 do
> > +  bump_frame()
> > +end
> > +
> > +test:ok(true)
> > +os.exit(test:check() and 0 or 1)
> > -- 
> > 2.31.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-07-19  7:36 [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons Sergey Kaplun via Tarantool-patches
  2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
@ 2021-08-10 17:03 ` Sergey Ostanevich via Tarantool-patches
  2021-08-16 16:44   ` Sergey Kaplun via Tarantool-patches
  2021-08-17  9:24 ` Igor Munkin via Tarantool-patches
  2 siblings, 1 reply; 10+ messages in thread
From: Sergey Ostanevich via Tarantool-patches @ 2021-08-10 17:03 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch! Just minor grammar, LGTM.

Sergos



> On 19 Jul 2021, at 10:36, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> From: Mike Pall <mike>
> 
> (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> 
> When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
		 built (v3 in passive voice)

[skipped, new version]

> The first option is rewrite this slot by return values from the
		    to
> function.
> 
> The second option is clearing slot (i.e. set to zero) manually, when
		      to clear the
> there is no values to return. It is done by the next bytecode having RA
> dst mode. This obliges that the destination of RA takes the next slot
> after TREF_FRAME. For this an earlier instruction must use the smallest
> possible destination register (see `lj_record_ins()` for the details).
> 
> Bytecode allocator swaps operands for ISGT and ISGE comparisons.
		    can swap (since it is followed by ‘When’)
> When it happens, the aforementioned rule for registers allocations
							 allocation
> may be violated. When it happens, and this chunk is recording, the slot
> with TREF_FRAME is not rewritten (but the next empty slot after
> TREF_FRAME is) during bytecode recording. This leads to JIT slots
> inconsistency and assertion failure in `rec_check_slots()` during
> recording the next bytecode instruction.
	   of 
> 
> This patch fixes bytecode register allocation by changing the register
> allocation order in case of ISGT and ISGE bytecodes.
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Resolves tarantool/tarantool#6227
> ---
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Issue: https://github.com/tarantool/tarantool/issues/6227
> 
> src/lj_parse.c                                |  7 +++-
> ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
> 2 files changed, 46 insertions(+), 2 deletions(-)
> create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> 
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index 08f7cfa6..a6325a76 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -853,9 +853,12 @@ static void bcemit_comp(FuncState *fs, BinOpr opr, ExpDesc *e1, ExpDesc *e2)
>       e1 = e2; e2 = eret;  /* Swap operands. */
>       op = ((op-BC_ISLT)^3)+BC_ISLT;
>       expr_toval(fs, e1);
> +      ra = expr_toanyreg(fs, e1);
> +      rd = expr_toanyreg(fs, e2);
> +    } else {
> +      rd = expr_toanyreg(fs, e2);
> +      ra = expr_toanyreg(fs, e1);
>     }
> -    rd = expr_toanyreg(fs, e2);
> -    ra = expr_toanyreg(fs, e1);
>     ins = BCINS_AD(op, ra, rd);
>   }
>   /* Using expr_free might cause asserts if the order is wrong. */
> diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> new file mode 100644
> index 00000000..66f6885e
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> @@ -0,0 +1,41 @@
> +local tap = require('tap')
> +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> +test:plan(1)
> +
> +-- Test file to demonstrate assertion failure during recording
> +-- wrong allocated bytecode for comparisons.
> +-- See also https://github.com/tarantool/tarantool/issues/6227.
> +
> +-- Need function with RET0 bytecode to avoid reset of
> +-- the first JIT slot with frame info. Also need no assignments
> +-- by the caller.
> +local function empty() end
> +
> +local uv = 0
> +
> +-- This function needs to reset register enumerating.
> +-- Also set `J->maxslot` to zero.
> +-- The upvalue function to call is loaded to 0 slot.
> +local function bump_frame()
> +  -- First call function with RET0 to set TREF_FRAME in the
> +  -- last slot.
> +  empty()
> +  -- Test ISGE or ISGT bytecode. These bytecodes swap their
> +  -- operands. Also, a constant is always loaded into the slot
> +  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> +  -- then the difference between registers is more than 2 (2 is
> +  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> +  -- the bytecode after call and return as expected. That leads
> +  -- to recording slots inconsistency and assertion failure at
> +  -- `rec_check_slots()`.
> +  empty(1>uv)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +for _ = 1,3 do
> +  bump_frame()
> +end
> +
> +test:ok(true)
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.31.0
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-01 17:10   ` Sergey Kaplun via Tarantool-patches
@ 2021-08-16  7:20     ` Igor Munkin via Tarantool-patches
  2021-08-16 16:40       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-16  7:20 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the explanation! Please consider the new comments below.

On 01.08.21, Sergey Kaplun wrote:
> Hi, Igor!
> 
> Thanks for the review!
> 
> On 01.08.21, Igor Munkin wrote:
> > Sergey,
> > 
> > Thanks for the patch! Please consider the comments below. I didn't check
> > the test yet, since I don't get the JIT peculiarities from your commit
> > message and comments. Please provide a clearer description and I'll
> > proceed with the review of the test case then.
> > 
> > On 19.07.21, Sergey Kaplun wrote:
> > > From: Mike Pall <mike>
> > > 
> > > (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> > > 
> > > When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
> > > two slots -- the first takes the TValue with the function to call, the
> > > second takes the additional frame information. The recording JIT
> > 
> > Minor: The second slot is the framelink in LuaJIT terms.
> 
> Yes, because it takes the additional frame information. How do you want
> to modify this line?

Just say that the second slot takes the framelink: this is lapidary.

> 
> > 
> > > machinery works pretty the same -- the function IR_KGC is loaded in the
> > > first slot, and the second is set to TREF_FRAME value. This value
> > > should be rewritten after return from a callee. It is done either by the
> > > return values either this slot is cleared (set to zero) manually with
> > > the next bytecode with RA dst mode with the assumption, that the dst RA
> > > takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> > > the smallest possible destination register (see `lj_record_ins()` for
> > > the details).
> > 
> > The main point lies in the monstrous 5-line sentence. I've read several
> > times, but still don't get it. Could you please reword it in a not such
> > complex sentence?
> 
> The first option is rewrite this slot by return values from the
> function.

And this is not the case, right? I mean, this approach works fine even
without the patch, doesn't it?

> 
> The second option is clearing slot (i.e. set to zero) manually, when
> there is no values to return. It is done by the next bytecode having RA
> dst mode. This obliges that the destination of RA takes the next slot
> after TREF_FRAME. For this an earlier instruction must use the smallest
> possible destination register (see `lj_record_ins()` for the details).

Here is the case, got it, thanks! So, I guess it's enough to adjust the
commit message to be similar to the section above.

> 
> > 
> > > 
> > > Bytecode allocator swaps operands for ISGT and ISGE comparisons.

I believe this should be called "bytecode emitter" or just "frontend".

> > > When it happens, the aforementioned rule for registers allocations
> > > may be violated. When it happens, and this chunk is recording, the slot
> > > with TREF_FRAME is not rewritten (but the next empty slot after
> > > TREF_FRAME is) during bytecode recording. This leads to JIT slots
> > > inconsistency and assertion failure in `rec_check_slots()` during
> > > recording the next bytecode instruction.
> > > 
> > > This patch fixes bytecode register allocation by changing the register
> > > allocation order in case of ISGT and ISGE bytecodes.
> > 
> > It's better to use "virtual register" or even "VM register" to avoid
> > ambiguous plain "register" usage.
> 
> Changed to VM register.
> 
> > 
> > > 
> > > Sergey Kaplun:
> > > * added the description and the test for the problem
> > > 
> > > Resolves tarantool/tarantool#6227
> > 
> > Minor: Why #5629 is not mentioned?
> 
> Added.
> Branch is updated and force-pushed.
> 
> > 
> > > ---
> > > 
> > > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > Issue: https://github.com/tarantool/tarantool/issues/6227
> > > 
> > >  src/lj_parse.c                                |  7 +++-
> > >  ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 2 deletions(-)
> > >  create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > 

<snipped>

> > > diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > new file mode 100644
> > > index 00000000..66f6885e
> > > --- /dev/null
> > > +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > @@ -0,0 +1,41 @@
> > > +local tap = require('tap')
> > > +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> > > +test:plan(1)
> > > +
> > > +-- Test file to demonstrate assertion failure during recording
> > > +-- wrong allocated bytecode for comparisons.
> > > +-- See also https://github.com/tarantool/tarantool/issues/6227.
> > > +
> > > +-- Need function with RET0 bytecode to avoid reset of
> > > +-- the first JIT slot with frame info. Also need no assignments
> > > +-- by the caller.
> > > +local function empty() end
> > > +
> > > +local uv = 0
> > > +
> > > +-- This function needs to reset register enumerating.
> > > +-- Also set `J->maxslot` to zero.

Please add the reason, why J->maxslot is zero (it is initialized with
nargs in <rec_call_setup>).

> > > +-- The upvalue function to call is loaded to 0 slot.
> > > +local function bump_frame()
> > > +  -- First call function with RET0 to set TREF_FRAME in the
> > > +  -- last slot.
> > > +  empty()
> > > +  -- Test ISGE or ISGT bytecode. These bytecodes swap their
> > > +  -- operands. Also, a constant is always loaded into the slot
> > > +  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> > > +  -- then the difference between registers is more than 2 (2 is
> > > +  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> > > +  -- the bytecode after call and return as expected. That leads

If the constant is loaded into a slot prior to the one with an upvalue,
then how upvalue can be loaded *before* KSHORT? How the difference
becomes more than 2? I don't get this math.

Furthermore, what does stop you from using local variables?

> > > +  -- to recording slots inconsistency and assertion failure at
> > > +  -- `rec_check_slots()`.
> > > +  empty(1>uv)
> > > +end
> > > +
> > > +jit.opt.start('hotloop=1')

It's worth to mention, that such JIT engine tuning allows to compile
<empty> function at first, and only later compile the loop below. As a
result <empty> function is not inlined into the loop body, so the fix
can be checked.

> > > +
> > > +for _ = 1,3 do

Minor: Space is missing after the comma.

> > > +  bump_frame()
> > > +end
> > > +
> > > +test:ok(true)
> > > +os.exit(test:check() and 0 or 1)
> > > -- 
> > > 2.31.0
> > > 
> > 
> > -- 
> > Best regards,
> > IM
> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-16 16:40       ` Sergey Kaplun via Tarantool-patches
@ 2021-08-16 16:27         ` Igor Munkin via Tarantool-patches
  2021-08-17  7:36           ` Vitaliia Ioffe via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-16 16:27 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the fixes! LGTM now.

On 16.08.21, Sergey Kaplun wrote:
> Igor,
> 
> Thanks for the review!
> 
> See the new comment message above:
> ===================================================================
> Fix bytecode register allocation for comparisons.
> 
> (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> 
> When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
> information about frame takes two slots -- the first takes the TValue
> with the function to be called, the second takes the framelink. The JIT
> recording machinery does pretty the same -- the function IR_KGC is
> loaded in the first slot, and the second is set to TREF_FRAME value.
> This value should be rewritten after return from a callee. This slot is
> cleared either by return values or manually (set to zero), when there
> are no values to return. The latter case is done by the next bytecode
> with RA dst mode. This obliges that the destination of RA takes the next
> slot after TREF_FRAME. Hence, this an earlier instruction must use the
> smallest possible destination register (see `lj_record_ins()` for the
> details).
> 
> Bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
> result, the aforementioned rule for registers allocations may be
> violated. When it happens for a chunk being recorded, the slot with
> TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
> is). This leads to JIT slots inconsistency and assertion failure in
> `rec_check_slots()` during recording of the next bytecode instruction.
> 
> This patch fixes bytecode register allocation by changing the VM
> register allocation order in case of ISGT and ISGE bytecodes.
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Resolves tarantool/tarantool#6227
> Part of tarantool/tarantool#5629
> ===================================================================
> 
> On 16.08.21, Igor Munkin wrote:

<snipped>

> 
> > 
> > Furthermore, what does stop you from using local variables?
> 
> They occupy new slots and make it harder to maintain, see the new
> comment below.

Meh, OK anyway :)

> 
> > 

<snipped>

> 
> ===================================================================
> diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> index 66f6885e..9788923a 100644
> --- a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> @@ -14,26 +14,39 @@ local function empty() end
>  local uv = 0
>  
>  -- This function needs to reset register enumerating.
> --- Also set `J->maxslot` to zero.
> --- The upvalue function to call is loaded to 0 slot.
> +-- `J->maxslot` is initialized with `nargs` (i.e. zero in this
> +-- case) in `rec_call_setup()`.
>  local function bump_frame()
>    -- First call function with RET0 to set TREF_FRAME in the
>    -- last slot.
>    empty()
> +  -- The old bytecode to be recorded looks like the following:
> +  -- 0000  . FUNCF    4
> +  -- 0001  . UGET     0   0      ; empty
> +  -- 0002  . CALL     0   1   1
> +  -- 0000  . . JFUNCF   1   1
> +  -- 0001  . . RET0     0   1
> +  -- 0002  . CALL     0   1   1
> +  -- 0003  . UGET     0   0      ; empty
> +  -- 0004  . UGET     3   1      ; uv
> +  -- 0005  . KSHORT   2   1
> +  -- 0006  . ISLT     3   2
>    -- Test ISGE or ISGT bytecode. These bytecodes swap their
> -  -- operands. Also, a constant is always loaded into the slot
> -  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> -  -- then the difference between registers is more than 2 (2 is
> -  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> -  -- the bytecode after call and return as expected. That leads
> -  -- to recording slots inconsistency and assertion failure at
> -  -- `rec_check_slots()`.
> +  -- operands (consider ISLT above).
> +  -- Two calls of `empty()` function in a row is necessary for 2
> +  -- slot gap in LJ_FR2 mode.
> +  -- Upvalue loads before KSHORT, so the difference between slot
> +  -- for upvalue `empty` (function to be called) and slot for
> +  -- upvalue `uv` is more than 2. Hence, TREF_FRAME slot is not
> +  -- rewritten by the bytecode after return from `empty()`
> +  -- function as expected. That leads to recording slots
> +  -- inconsistency and assertion failure at `rec_check_slots()`.
>    empty(1>uv)
>  end
>  
>  jit.opt.start('hotloop=1')
>  
> -for _ = 1,3 do
> +for _ = 1, 3 do
>    bump_frame()
>  end
> ===================================================================
> 

<snipped>

> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-16  7:20     ` Igor Munkin via Tarantool-patches
@ 2021-08-16 16:40       ` Sergey Kaplun via Tarantool-patches
  2021-08-16 16:27         ` Igor Munkin via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-16 16:40 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

See the new comment message above:
===================================================================
Fix bytecode register allocation for comparisons.

(cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)

When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
information about frame takes two slots -- the first takes the TValue
with the function to be called, the second takes the framelink. The JIT
recording machinery does pretty the same -- the function IR_KGC is
loaded in the first slot, and the second is set to TREF_FRAME value.
This value should be rewritten after return from a callee. This slot is
cleared either by return values or manually (set to zero), when there
are no values to return. The latter case is done by the next bytecode
with RA dst mode. This obliges that the destination of RA takes the next
slot after TREF_FRAME. Hence, this an earlier instruction must use the
smallest possible destination register (see `lj_record_ins()` for the
details).

Bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
result, the aforementioned rule for registers allocations may be
violated. When it happens for a chunk being recorded, the slot with
TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
is). This leads to JIT slots inconsistency and assertion failure in
`rec_check_slots()` during recording of the next bytecode instruction.

This patch fixes bytecode register allocation by changing the VM
register allocation order in case of ISGT and ISGE bytecodes.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6227
Part of tarantool/tarantool#5629
===================================================================

On 16.08.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the explanation! Please consider the new comments below.
> 
> On 01.08.21, Sergey Kaplun wrote:
> > Hi, Igor!
> > 
> > Thanks for the review!
> > 
> > On 01.08.21, Igor Munkin wrote:
> > > Sergey,
> > > 
> > > Thanks for the patch! Please consider the comments below. I didn't check
> > > the test yet, since I don't get the JIT peculiarities from your commit
> > > message and comments. Please provide a clearer description and I'll
> > > proceed with the review of the test case then.
> > > 
> > > On 19.07.21, Sergey Kaplun wrote:
> > > > From: Mike Pall <mike>
> > > > 
> > > > (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> > > > 
> > > > When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
> > > > two slots -- the first takes the TValue with the function to call, the
> > > > second takes the additional frame information. The recording JIT
> > > 
> > > Minor: The second slot is the framelink in LuaJIT terms.
> > 
> > Yes, because it takes the additional frame information. How do you want
> > to modify this line?
> 
> Just say that the second slot takes the framelink: this is lapidary.

Fixed.

> 
> > 
> > > 
> > > > machinery works pretty the same -- the function IR_KGC is loaded in the
> > > > first slot, and the second is set to TREF_FRAME value. This value
> > > > should be rewritten after return from a callee. It is done either by the
> > > > return values either this slot is cleared (set to zero) manually with
> > > > the next bytecode with RA dst mode with the assumption, that the dst RA
> > > > takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> > > > the smallest possible destination register (see `lj_record_ins()` for
> > > > the details).
> > > 
> > > The main point lies in the monstrous 5-line sentence. I've read several
> > > times, but still don't get it. Could you please reword it in a not such
> > > complex sentence?
> > 
> > The first option is rewrite this slot by return values from the
> > function.
> 
> And this is not the case, right? I mean, this approach works fine even
> without the patch, doesn't it?

Yep, exactly.

> 
> > 
> > The second option is clearing slot (i.e. set to zero) manually, when
> > there is no values to return. It is done by the next bytecode having RA
> > dst mode. This obliges that the destination of RA takes the next slot
> > after TREF_FRAME. For this an earlier instruction must use the smallest
> > possible destination register (see `lj_record_ins()` for the details).
> 
> Here is the case, got it, thanks! So, I guess it's enough to adjust the
> commit message to be similar to the section above.

Updated.

> 
> > 
> > > 
> > > > 
> > > > Bytecode allocator swaps operands for ISGT and ISGE comparisons.
> 
> I believe this should be called "bytecode emitter" or just "frontend".

Fixed.

> 
> > > > When it happens, the aforementioned rule for registers allocations
> > > > may be violated. When it happens, and this chunk is recording, the slot
> > > > with TREF_FRAME is not rewritten (but the next empty slot after
> > > > TREF_FRAME is) during bytecode recording. This leads to JIT slots
> > > > inconsistency and assertion failure in `rec_check_slots()` during
> > > > recording the next bytecode instruction.
> > > > 
> > > > This patch fixes bytecode register allocation by changing the register
> > > > allocation order in case of ISGT and ISGE bytecodes.
> > > 
> > > It's better to use "virtual register" or even "VM register" to avoid
> > > ambiguous plain "register" usage.
> > 
> > Changed to VM register.
> > 
> > > 
> > > > 
> > > > Sergey Kaplun:
> > > > * added the description and the test for the problem
> > > > 
> > > > Resolves tarantool/tarantool#6227
> > > 
> > > Minor: Why #5629 is not mentioned?
> > 
> > Added.
> > Branch is updated and force-pushed.
> > 
> > > 
> > > > ---
> > > > 
> > > > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > > Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > > Issue: https://github.com/tarantool/tarantool/issues/6227
> > > > 
> > > >  src/lj_parse.c                                |  7 +++-
> > > >  ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
> > > >  2 files changed, 46 insertions(+), 2 deletions(-)
> > > >  create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > > 
> 
> <snipped>
> 
> > > > diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > > new file mode 100644
> > > > index 00000000..66f6885e
> > > > --- /dev/null
> > > > +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > > @@ -0,0 +1,41 @@
> > > > +local tap = require('tap')
> > > > +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> > > > +test:plan(1)
> > > > +
> > > > +-- Test file to demonstrate assertion failure during recording
> > > > +-- wrong allocated bytecode for comparisons.
> > > > +-- See also https://github.com/tarantool/tarantool/issues/6227.
> > > > +
> > > > +-- Need function with RET0 bytecode to avoid reset of
> > > > +-- the first JIT slot with frame info. Also need no assignments
> > > > +-- by the caller.
> > > > +local function empty() end
> > > > +
> > > > +local uv = 0
> > > > +
> > > > +-- This function needs to reset register enumerating.
> > > > +-- Also set `J->maxslot` to zero.
> 
> Please add the reason, why J->maxslot is zero (it is initialized with
> nargs in <rec_call_setup>).

Fixed.

> 
> > > > +-- The upvalue function to call is loaded to 0 slot.
> > > > +local function bump_frame()
> > > > +  -- First call function with RET0 to set TREF_FRAME in the
> > > > +  -- last slot.
> > > > +  empty()
> > > > +  -- Test ISGE or ISGT bytecode. These bytecodes swap their
> > > > +  -- operands. Also, a constant is always loaded into the slot
> > > > +  -- smaller than upvalue. So, if upvalue loads before KSHORT,
> > > > +  -- then the difference between registers is more than 2 (2 is
> > > > +  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> > > > +  -- the bytecode after call and return as expected. That leads
> 
> If the constant is loaded into a slot prior to the one with an upvalue,
> then how upvalue can be loaded *before* KSHORT? How the difference
> becomes more than 2? I don't get this math.

See updated comment below.

> 
> Furthermore, what does stop you from using local variables?

They occupy new slots and make it harder to maintain, see the new
comment below.

> 
> > > > +  -- to recording slots inconsistency and assertion failure at
> > > > +  -- `rec_check_slots()`.
> > > > +  empty(1>uv)
> > > > +end
> > > > +
> > > > +jit.opt.start('hotloop=1')
> 
> It's worth to mention, that such JIT engine tuning allows to compile
> <empty> function at first, and only later compile the loop below. As a
> result <empty> function is not inlined into the loop body, so the fix
> can be checked.

Two call to `empty()` function is necessary for 2 slot gap after call,
see the new comment below.

> 
> > > > +
> > > > +for _ = 1,3 do
> 
> Minor: Space is missing after the comma.

Fixed.

===================================================================
diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
index 66f6885e..9788923a 100644
--- a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
+++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
@@ -14,26 +14,39 @@ local function empty() end
 local uv = 0
 
 -- This function needs to reset register enumerating.
--- Also set `J->maxslot` to zero.
--- The upvalue function to call is loaded to 0 slot.
+-- `J->maxslot` is initialized with `nargs` (i.e. zero in this
+-- case) in `rec_call_setup()`.
 local function bump_frame()
   -- First call function with RET0 to set TREF_FRAME in the
   -- last slot.
   empty()
+  -- The old bytecode to be recorded looks like the following:
+  -- 0000  . FUNCF    4
+  -- 0001  . UGET     0   0      ; empty
+  -- 0002  . CALL     0   1   1
+  -- 0000  . . JFUNCF   1   1
+  -- 0001  . . RET0     0   1
+  -- 0002  . CALL     0   1   1
+  -- 0003  . UGET     0   0      ; empty
+  -- 0004  . UGET     3   1      ; uv
+  -- 0005  . KSHORT   2   1
+  -- 0006  . ISLT     3   2
   -- Test ISGE or ISGT bytecode. These bytecodes swap their
-  -- operands. Also, a constant is always loaded into the slot
-  -- smaller than upvalue. So, if upvalue loads before KSHORT,
-  -- then the difference between registers is more than 2 (2 is
-  -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
-  -- the bytecode after call and return as expected. That leads
-  -- to recording slots inconsistency and assertion failure at
-  -- `rec_check_slots()`.
+  -- operands (consider ISLT above).
+  -- Two calls of `empty()` function in a row is necessary for 2
+  -- slot gap in LJ_FR2 mode.
+  -- Upvalue loads before KSHORT, so the difference between slot
+  -- for upvalue `empty` (function to be called) and slot for
+  -- upvalue `uv` is more than 2. Hence, TREF_FRAME slot is not
+  -- rewritten by the bytecode after return from `empty()`
+  -- function as expected. That leads to recording slots
+  -- inconsistency and assertion failure at `rec_check_slots()`.
   empty(1>uv)
 end
 
 jit.opt.start('hotloop=1')
 
-for _ = 1,3 do
+for _ = 1, 3 do
   bump_frame()
 end
===================================================================

 
> 
> > > > +  bump_frame()
> > > > +end
> > > > +
> > > > +test:ok(true)
> > > > +os.exit(test:check() and 0 or 1)
> > > > -- 
> > > > 2.31.0
> > > > 
> > > 
> > > -- 
> > > Best regards,
> > > IM
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-10 17:03 ` Sergey Ostanevich via Tarantool-patches
@ 2021-08-16 16:44   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-16 16:44 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi, Sergos!

Thanks for the review!

On 10.08.21, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch! Just minor grammar, LGTM.

The new commit message is the following:

===================================================================
Fix bytecode register allocation for comparisons.

(cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)

When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
information about frame takes two slots -- the first takes the TValue
with the function to be called, the second takes the framelink. The JIT
recording machinery does pretty the same -- the function IR_KGC is
loaded in the first slot, and the second is set to TREF_FRAME value.
This value should be rewritten after return from a callee. This slot is
cleared either by return values or manually (set to zero), when there
are no values to return. The latter case is done by the next bytecode
with RA dst mode. This obliges that the destination of RA takes the next
slot after TREF_FRAME. Hence, this an earlier instruction must use the
smallest possible destination register (see `lj_record_ins()` for the
details).

Bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
result, the aforementioned rule for registers allocations may be
violated. When it happens for a chunk being recorded, the slot with
TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
is). This leads to JIT slots inconsistency and assertion failure in
`rec_check_slots()` during recording of the next bytecode instruction.

This patch fixes bytecode register allocation by changing the VM
register allocation order in case of ISGT and ISGE bytecodes.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6227
Part of tarantool/tarantool#5629
===================================================================

> 
> Sergos
> 
> 
> 

<snipped>

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches]  [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-08-16 16:27         ` Igor Munkin via Tarantool-patches
@ 2021-08-17  7:36           ` Vitaliia Ioffe via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Vitaliia Ioffe via Tarantool-patches @ 2021-08-17  7:36 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 4931 bytes --]


QA LGTM
 
 
--
Vitaliia Ioffe
 
  
>Понедельник, 16 августа 2021, 19:52 +03:00 от Igor Munkin via Tarantool-patches <tarantool-patches@dev.tarantool.org>:
> 
>Sergey,
>
>Thanks for the fixes! LGTM now.
>
>On 16.08.21, Sergey Kaplun wrote:
>> Igor,
>>
>> Thanks for the review!
>>
>> See the new comment message above:
>> ===================================================================
>> Fix bytecode register allocation for comparisons.
>>
>> (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
>>
>> When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
>> information about frame takes two slots -- the first takes the TValue
>> with the function to be called, the second takes the framelink. The JIT
>> recording machinery does pretty the same -- the function IR_KGC is
>> loaded in the first slot, and the second is set to TREF_FRAME value.
>> This value should be rewritten after return from a callee. This slot is
>> cleared either by return values or manually (set to zero), when there
>> are no values to return. The latter case is done by the next bytecode
>> with RA dst mode. This obliges that the destination of RA takes the next
>> slot after TREF_FRAME. Hence, this an earlier instruction must use the
>> smallest possible destination register (see `lj_record_ins()` for the
>> details).
>>
>> Bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
>> result, the aforementioned rule for registers allocations may be
>> violated. When it happens for a chunk being recorded, the slot with
>> TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
>> is). This leads to JIT slots inconsistency and assertion failure in
>> `rec_check_slots()` during recording of the next bytecode instruction.
>>
>> This patch fixes bytecode register allocation by changing the VM
>> register allocation order in case of ISGT and ISGE bytecodes.
>>
>> Sergey Kaplun:
>> * added the description and the test for the problem
>>
>> Resolves tarantool/tarantool#6227
>> Part of tarantool/tarantool#5629
>> ===================================================================
>>
>> On 16.08.21, Igor Munkin wrote:
>
><snipped>
>
>>
>> >
>> > Furthermore, what does stop you from using local variables?
>>
>> They occupy new slots and make it harder to maintain, see the new
>> comment below.
>
>Meh, OK anyway :)
>
>>
>> >
>
><snipped>
>
>>
>> ===================================================================
>> diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
>> index 66f6885e..9788923a 100644
>> --- a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
>> +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
>> @@ -14,26 +14,39 @@ local function empty() end
>> local uv = 0
>>
>> -- This function needs to reset register enumerating.
>> --- Also set `J->maxslot` to zero.
>> --- The upvalue function to call is loaded to 0 slot.
>> +-- `J->maxslot` is initialized with `nargs` (i.e. zero in this
>> +-- case) in `rec_call_setup()`.
>> local function bump_frame()
>> -- First call function with RET0 to set TREF_FRAME in the
>> -- last slot.
>> empty()
>> + -- The old bytecode to be recorded looks like the following:
>> + -- 0000 . FUNCF 4
>> + -- 0001 . UGET 0 0 ; empty
>> + -- 0002 . CALL 0 1 1
>> + -- 0000 . . JFUNCF 1 1
>> + -- 0001 . . RET0 0 1
>> + -- 0002 . CALL 0 1 1
>> + -- 0003 . UGET 0 0 ; empty
>> + -- 0004 . UGET 3 1 ; uv
>> + -- 0005 . KSHORT 2 1
>> + -- 0006 . ISLT 3 2
>> -- Test ISGE or ISGT bytecode. These bytecodes swap their
>> - -- operands. Also, a constant is always loaded into the slot
>> - -- smaller than upvalue. So, if upvalue loads before KSHORT,
>> - -- then the difference between registers is more than 2 (2 is
>> - -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
>> - -- the bytecode after call and return as expected. That leads
>> - -- to recording slots inconsistency and assertion failure at
>> - -- `rec_check_slots()`.
>> + -- operands (consider ISLT above).
>> + -- Two calls of `empty()` function in a row is necessary for 2
>> + -- slot gap in LJ_FR2 mode.
>> + -- Upvalue loads before KSHORT, so the difference between slot
>> + -- for upvalue `empty` (function to be called) and slot for
>> + -- upvalue `uv` is more than 2. Hence, TREF_FRAME slot is not
>> + -- rewritten by the bytecode after return from `empty()`
>> + -- function as expected. That leads to recording slots
>> + -- inconsistency and assertion failure at `rec_check_slots()`.
>> empty(1>uv)
>> end
>>
>> jit.opt.start('hotloop=1')
>>
>> -for _ = 1,3 do
>> +for _ = 1, 3 do
>> bump_frame()
>> end
>> ===================================================================
>>
>
><snipped>
>
>>
>> --
>> Best regards,
>> Sergey Kaplun
>
>--
>Best regards,
>IM
 

[-- Attachment #2: Type: text/html, Size: 5990 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
  2021-07-19  7:36 [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons Sergey Kaplun via Tarantool-patches
  2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
  2021-08-10 17:03 ` Sergey Ostanevich via Tarantool-patches
@ 2021-08-17  9:24 ` Igor Munkin via Tarantool-patches
  2 siblings, 0 replies; 10+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-08-17  9:24 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

I've checked the patch into all long-term branches in tarantool/luajit
and bumped a new version in 1.10, 2.7, 2.8 and master.

On 19.07.21, Sergey Kaplun wrote:
> From: Mike Pall <mike>
> 
> (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> 
> When LuaJIT is build with LJ_FR2 (GC64), information about frame takes
> two slots -- the first takes the TValue with the function to call, the
> second takes the additional frame information. The recording JIT
> machinery works pretty the same -- the function IR_KGC is loaded in the
> first slot, and the second is set to TREF_FRAME value. This value
> should be rewritten after return from a callee. It is done either by the
> return values either this slot is cleared (set to zero) manually with
> the next bytecode with RA dst mode with the assumption, that the dst RA
> takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> the smallest possible destination register (see `lj_record_ins()` for
> the details).
> 
> Bytecode allocator swaps operands for ISGT and ISGE comparisons.
> When it happens, the aforementioned rule for registers allocations
> may be violated. When it happens, and this chunk is recording, the slot
> with TREF_FRAME is not rewritten (but the next empty slot after
> TREF_FRAME is) during bytecode recording. This leads to JIT slots
> inconsistency and assertion failure in `rec_check_slots()` during
> recording the next bytecode instruction.
> 
> This patch fixes bytecode register allocation by changing the register
> allocation order in case of ISGT and ISGE bytecodes.
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Resolves tarantool/tarantool#6227
> ---
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> Issue: https://github.com/tarantool/tarantool/issues/6227
> 
>  src/lj_parse.c                                |  7 +++-
>  ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
>  2 files changed, 46 insertions(+), 2 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> 

<snipped>

> -- 
> 2.31.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-08-17  9:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-19  7:36 [Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons Sergey Kaplun via Tarantool-patches
2021-08-01 10:43 ` Igor Munkin via Tarantool-patches
2021-08-01 17:10   ` Sergey Kaplun via Tarantool-patches
2021-08-16  7:20     ` Igor Munkin via Tarantool-patches
2021-08-16 16:40       ` Sergey Kaplun via Tarantool-patches
2021-08-16 16:27         ` Igor Munkin via Tarantool-patches
2021-08-17  7:36           ` Vitaliia Ioffe via Tarantool-patches
2021-08-10 17:03 ` Sergey Ostanevich via Tarantool-patches
2021-08-16 16:44   ` Sergey Kaplun via Tarantool-patches
2021-08-17  9:24 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox