Hi, Sergos!

Thanks for the review!

On 14.12.22, sergos wrote:
> Hi!
>
> Thanks for the patch!
>
> Some addition to Max’s comments. And a question on the test.
>
> Sergos
>
> > On 8 Dec 2022, at 08:46, Sergey Kaplun <skaplun@tarantool.org> wrote:
> >
> > From: Mike Pall <mike>
> >
> > Contributed by Peter Cawley.
> >
> > (cherry picked from commit b4ed3219a1a98dd9fe7d1e3eeea3b82f5a780948)
> >
> > When emitting `IR_HREF` for constant value to lookup the `ir_khash()`
> an ^^^
> perhaps just ‘for a constant value lokup’?
>
> > function is used to calculate hash for the corresponding object.
> > This calculation must be the same as in the corresponding `hashkey()`
> > function from <lj_tab.c>.
> >
> > Hash calculating via passing two arguments `lo`, and `hi` to `hashrot()`
> the
>
> > routine. For non-string GC objects the first `lo` argument is the same
> > for GC64 and not GC64 mode -- lower 32 bits of the object address. For
> > GC64 mode `hi` argument is upper 32 bits of the object address,
> > including specific type NaN-tag. This `hi` argument in `ir_khash()`
> a
>
> > function is miscalculated in GC64 using non-GC64 value (`lo` +
> mode a
>
> > `HASH_BIAS`). As a result, the hash for the GC object is miscalculated
> > on trace and we exit from trace due to assertion guard on the type or
> the an
> > value check.
> >
> > This patch fixes calculation of hash value on trace for GC64 mode by
> > making it consistent with `hashkey()`.
> the
> >

Fixed your comments.
The new commit message is the following:

| LJ_GC64: Fix ir_khash for non-string GCobj.
|
| Contributed by Peter Cawley.
|
| (cherry picked from commit b4ed3219a1a98dd9fe7d1e3eeea3b82f5a780948)
|
| When emitting the `IR_HREF` for a constant value lookup the `ir_khash()`
| function is used to calculate the hash for the corresponding object.
| This calculation must be the same as in the corresponding `hashkey()`
| function from <lj_tab.c>.
|
| Hash is calculated by passing two arguments `lo`, and `hi` to the
| `hashrot()` routine. For non-string GC objects the first `lo` argument
| is the same for GC64 and not GC64 mode -- lower 32 bits of the object
| address. For GC64 mode `hi` argument is upper 32 bits of the object
| address, including a specific type NaN-tag. This `hi` argument in
| `ir_khash()` function is miscalculated in GC64 mode using a non-GC64
| value (`lo` + `HASH_BIAS`). As a result, the hash for the GC object is
| miscalculated on trace and we exit from the trace due to an assertion
| guard on the type or value check.
|
| This patch fixes calculation of the hash value on trace for GC64 mode by
| making it consistent with the `hashkey()`.
|
| Sergey Kaplun:
| * added the description and the test for the problem
|
| Part of tarantool/tarantool#7230

> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#7230
> > ---
> >
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-356-ir-khash-non-string-obj-full-ci
> > Issue/PR:
> > * https://github.com/tarantool/tarantool/issues/7230
> > * https://github.com/LuaJIT/LuaJIT/pull/356
> > Tarantool PR: https://github.com/tarantool/tarantool/pull/8020
> >
> > Side note: Problems with red fuzzer jobs look irrelevant to the patch.

<snipped>

> > diff --git a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
> > new file mode 100644
> > index 00000000..fff0b1a5
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
> > @@ -0,0 +1,90 @@
> > +local tap = require('tap')
> > +local traceinfo = require('jit.util').traceinfo
> > +local table_new = require('table.new')
> > +
> > +-- Test file to demonstrate the incorrect GC64 JIT behaviour
> > +-- for `IR_HREF` for on-trace-constant key lookup.
> of an an
> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/356.
> > +local test = tap.test('lj-356-ir-khash-non-string-obj')
> > +local N_ITERATIONS = 4
> > +
> > +-- Amount of iteration for trace compilation and execution and
> > +-- additional check, that there is no new trace compiled.
> > +test:plan(N_ITERATIONS + 1)
> > +
> > +-- To reproduce the issue we need to compile a trace with
> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
> > +-- prevent `IR_HREFK` to be emitted instead, we need a table with
> an `IR_HREFK` emission

Side note: I'm not sure about "emission" corectness here, so ignoring
this part.

I've fixed the rest of your comments, see the iterative patch below.

===================================================================
diff --git a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
index fff0b1a5..7f304183 100644
--- a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
+++ b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
@@ -3,7 +3,7 @@ local traceinfo = require('jit.util').traceinfo
local table_new = require('table.new')

-- Test file to demonstrate the incorrect GC64 JIT behaviour
--- for `IR_HREF` for on-trace-constant key lookup.
+-- of an `IR_HREF` for the on-trace-constant key lookup.
-- See also https://github.com/LuaJIT/LuaJIT/pull/356.
local test = tap.test('lj-356-ir-khash-non-string-obj')
local N_ITERATIONS = 4
@@ -14,10 +14,10 @@ test:plan(N_ITERATIONS + 1)

-- To reproduce the issue we need to compile a trace with
-- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent `IR_HREFK` to be emitted instead, we need a table with
--- a huge hash part. Delta of address between the start of the
--- hash part of the table and the current node to lookup must be
--- more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- prevent an `IR_HREFK` to be emitted instead, we need a table
+-- with a huge hash part. Delta of address between the start of
+-- the hash part of the table and the current node to lookup must
+-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
-- See <src/lj_record.c>, for details.
-- XXX: This constant is well suited to prevent test to be flaky,
-- because the aforementioned delta is always large enough.
@@ -36,8 +36,8 @@ end
-- exiting the main test cycle.
jit.opt.start('hotloop=1')

--- Prevent `get_const_cdata()` become hot and be compiled before
--- the main test cycle.
+-- Prevent `get_const_cdata()` from becoming hot and being
+-- compiled before the main test cycle.
jit.off()

filled_tab[get_const_cdata()] = MAGIC
@@ -46,10 +46,10 @@ filled_tab[get_const_cdata()] = MAGIC
jit.on()

-- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increases delta between the start of the
+-- hash collisions and increase delta between the start of the
-- hash part of the table and currently stored node.
-for i = 1, N_HASH_FIELDS do
- filled_tab[1LL] = i
+for _ = 1, N_HASH_FIELDS do
+ filled_tab[1LL] = 1
end

-- Prevent JIT misbehaviour before the main test chunk.
===================================================================

>
> > +-- a huge hash part. Delta of address between the start of the
> > +-- hash part of the table and the current node to lookup must be
> > +-- more than `(1024 * 64 - 1) * sizeof(Node)`.
> > +-- See <src/lj_record.c>, for details.
> > +-- XXX: This constant is well suited to prevent test to be flaky,
> > +-- because the aforementioned delta is always large enough.
> > +local N_HASH_FIELDS = 1024 * 1024 * 8
> > +local MAGIC = 42

<snipped>

> > +
> > +test:ok(not traceinfo(2), 'the second trace should not be compiled')
>
> That’s not quite clear to me: a second trace generation is a side-effect
> of the incorrect hash calculation. Is it always leads to the trace
> generation?

How I see this for now. There are two possibilities, when the
aforementioned hash is miscalculated:

1) We got `nil` value on a trace to lookup and we exit from the trace by
assertion guard on the field type (the most possible one, AFAIKS).
2) We got a value for some existing cdata after hash lookup, so we don't
exit from a trace, but got an incorrect value by the given key. NB: I've
updated the generation of the table content to avoid clashing with
`MAGIC` value on the 42nd iteration :).

So this test should cover both cases.

>
> > +
> > +-- No more need to prevent trace compilation.
> > +jit.on()
> > +
> > +for i = 1, N_ITERATIONS do
> > + -- Check that that all lookups are correct and there is no
> > + -- value from other cdata stored in the table.
> > + test:ok(result_tab[i] == MAGIC, 'correct hash lookup from the table')
>
> And this one checks what then? The hash is calculated correctly, but the value
> read from the `filled_tab` is incorrect - what can lead to this?
>
> > +end
> > +
> > +os.exit(test:check() and 0 or 1)
> > --
> > 2.34.1
> >
>

--
Best regards,
Sergey Kaplun