Sergos

On 15 Dec 2022, at 13:13, Sergey Kaplun <skaplun@tarantool.org> wrote:

Hi, Sergos!

Thanks for the review!

On 14.12.22, sergos wrote:
Hi!

Thanks for the patch!

Some addition to Max’s comments. And a question on the test.

Sergos

On 8 Dec 2022, at 08:46, Sergey Kaplun <skaplun@tarantool.org> wrote:

From: Mike Pall <mike>

Contributed by Peter Cawley.

(cherry picked from commit b4ed3219a1a98dd9fe7d1e3eeea3b82f5a780948)

When emitting `IR_HREF` for constant value to lookup the `ir_khash()`
              an                           ^^^
                        perhaps just ‘for a constant value lokup’?

function is used to calculate hash for the corresponding object.
This calculation must be the same as in the corresponding `hashkey()`
function from <lj_tab.c>.

Hash calculating via passing two arguments `lo`, and `hi` to `hashrot()`
                                                            the

routine. For non-string GC objects the first `lo` argument is the same
for GC64 and not GC64 mode -- lower 32 bits of the object address. For
GC64 mode `hi` argument is upper 32 bits of the object address,
including specific type NaN-tag. This `hi` argument in `ir_khash()`
          a

function is miscalculated in GC64 using non-GC64 value (`lo` +
                                mode    a

`HASH_BIAS`). As a result, the hash for the GC object is miscalculated
on trace and we exit from trace due to assertion guard on the type or
                         the          an
value check.

This patch fixes calculation of hash value on trace for GC64 mode by
making it consistent with `hashkey()`.
                         the

Fixed your comments.
The new commit message is the following:

| LJ_GC64: Fix ir_khash for non-string GCobj.
|
| Contributed by Peter Cawley.
|
| (cherry picked from commit b4ed3219a1a98dd9fe7d1e3eeea3b82f5a780948)
|
| When emitting the `IR_HREF` for a constant value lookup the `ir_khash()`
| function is used to calculate the hash for the corresponding object.
| This calculation must be the same as in the corresponding `hashkey()`
| function from <lj_tab.c>.
|
| Hash is calculated by passing two arguments `lo`, and `hi` to the
| `hashrot()` routine. For non-string GC objects the first `lo` argument
| is the same for GC64 and not GC64 mode -- lower 32 bits of the object
| address. For GC64 mode `hi` argument is upper 32 bits of the object
| address, including a specific type NaN-tag. This `hi` argument in
| `ir_khash()` function is miscalculated in GC64 mode using a non-GC64
| value (`lo` + `HASH_BIAS`). As a result, the hash for the GC object is
| miscalculated on trace and we exit from the trace due to an assertion
| guard on the type or value check.
|
| This patch fixes calculation of the hash value on trace for GC64 mode by
| making it consistent with the `hashkey()`.
|
| Sergey Kaplun:
| * added the description and the test for the problem
|
| Part of tarantool/tarantool#7230

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#7230
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-356-ir-khash-non-string-obj-full-ci
Issue/PR:
* https://github.com/tarantool/tarantool/issues/7230
* https://github.com/LuaJIT/LuaJIT/pull/356
Tarantool PR: https://github.com/tarantool/tarantool/pull/8020

Side note: Problems with red fuzzer jobs look irrelevant to the patch.

<snipped>

diff --git a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
new file mode 100644
index 00000000..fff0b1a5
--- /dev/null
+++ b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
@@ -0,0 +1,90 @@
+local tap = require('tap')
+local traceinfo = require('jit.util').traceinfo
+local table_new = require('table.new')
+
+-- Test file to demonstrate the incorrect GC64 JIT behaviour
+-- for `IR_HREF` for on-trace-constant key lookup.
     of an           an
+-- See also https://github.com/LuaJIT/LuaJIT/pull/356.
+local test = tap.test('lj-356-ir-khash-non-string-obj')
+local N_ITERATIONS = 4
+
+-- Amount of iteration for trace compilation and execution and
+-- additional check, that there is no new trace compiled.
+test:plan(N_ITERATIONS + 1)
+
+-- To reproduce the issue we need to compile a trace with
+-- `IR_HREF`, with a lookup of constant hash key GC value. To
+-- prevent `IR_HREFK` to be emitted instead, we need a table with
            an `IR_HREFK` emission

Side note: I'm not sure about "emission" corectness here, so ignoring
this part.

I've fixed the rest of your comments, see the iterative patch below.

===================================================================
diff --git a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
index fff0b1a5..7f304183 100644
--- a/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
+++ b/test/tarantool-tests/lj-356-ir-khash-non-string-obj.test.lua
@@ -3,7 +3,7 @@ local traceinfo = require('jit.util').traceinfo
local table_new = require('table.new')

-- Test file to demonstrate the incorrect GC64 JIT behaviour
--- for `IR_HREF` for on-trace-constant key lookup.
+-- of an `IR_HREF` for the on-trace-constant key lookup.
-- See also https://github.com/LuaJIT/LuaJIT/pull/356.
local test = tap.test('lj-356-ir-khash-non-string-obj')
local N_ITERATIONS = 4
@@ -14,10 +14,10 @@ test:plan(N_ITERATIONS + 1)

-- To reproduce the issue we need to compile a trace with
-- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent `IR_HREFK` to be emitted instead, we need a table with
--- a huge hash part. Delta of address between the start of the
--- hash part of the table and the current node to lookup must be
--- more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- prevent an `IR_HREFK` to be emitted instead, we need a table
+-- with a huge hash part. Delta of address between the start of
+-- the hash part of the table and the current node to lookup must
+-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
-- See <src/lj_record.c>, for details.
-- XXX: This constant is well suited to prevent test to be flaky,
-- because the aforementioned delta is always large enough.
@@ -36,8 +36,8 @@ end
-- exiting the main test cycle.
jit.opt.start('hotloop=1')

--- Prevent `get_const_cdata()` become hot and be compiled before
--- the main test cycle.
+-- Prevent `get_const_cdata()` from becoming hot and being
+-- compiled before the main test cycle.
jit.off()

filled_tab[get_const_cdata()] = MAGIC
@@ -46,10 +46,10 @@ filled_tab[get_const_cdata()] = MAGIC
jit.on()

-- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increases delta between the start of the
+-- hash collisions and increase delta between the start of the
-- hash part of the table and currently stored node.
-for i = 1, N_HASH_FIELDS do
- filled_tab[1LL] = i
+for _ = 1, N_HASH_FIELDS do
+ filled_tab[1LL] = 1
end

-- Prevent JIT misbehaviour before the main test chunk.
===================================================================

+-- a huge hash part. Delta of address between the start of the
+-- hash part of the table and the current node to lookup must be
+-- more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- See <src/lj_record.c>, for details.
+-- XXX: This constant is well suited to prevent test to be flaky,
+-- because the aforementioned delta is always large enough.
+local N_HASH_FIELDS = 1024 * 1024 * 8
+local MAGIC = 42

<snipped>

+
+test:ok(not traceinfo(2), 'the second trace should not be compiled')

That’s not quite clear to me: a second trace generation is a side-effect
of the incorrect hash calculation. Is it always leads to the trace
generation?

How I see this for now. There are two possibilities, when the
aforementioned hash is miscalculated:

1) We got `nil` value on a trace to lookup and we exit from the trace by
assertion guard on the field type (the most possible one, AFAIKS).
2) We got a value for some existing cdata after hash lookup, so we don't
exit from a trace, but got an incorrect value by the given key. NB: I've
updated the generation of the table content to avoid clashing with
`MAGIC` value on the 42nd iteration :).

So this test should cover both cases.

+
+-- No more need to prevent trace compilation.
+jit.on()
+
+for i = 1, N_ITERATIONS do
+ -- Check that that all lookups are correct and there is no
+ -- value from other cdata stored in the table.
+ test:ok(result_tab[i] == MAGIC, 'correct hash lookup from the table')

And this one checks what then? The hash is calculated correctly, but the value
read from the `filled_tab` is incorrect - what can lead to this?

+end
+
+os.exit(test:check() and 0 or 1)
--
2.34.1

--
Best regards,
Sergey Kaplun