Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
@ 2024-01-12 13:26 Maxim Kokryashkin via Tarantool-patches
  2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2024-01-12 13:26 UTC (permalink / raw)
  To: tarantool-patches, skaplun, sergeyb

From: Mike Pall <mike>

Contributed by XmiliaH.

(cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)

In `lj_record_idx` when `ix->oldv` is the global nilnode and the
required key is not present in the table, it is possible to pass
the constant key lookup optimization condition because of the
`uint32_t` overflow. Because of that, further recording
incorrectly removes the check for the nilnode, which produces
wrong results when trace is called for a different table.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9145
---
Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
PR: https://github.com/tarantool/tarantool/pull/9591
Issues: https://github.com/LuaJIT/LuaJIT/issues/840
https://github.com/tarantool/tarantool/issues/9145

 src/lj_record.c                               |  8 +--
 .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
 2 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua

diff --git a/src/lj_record.c b/src/lj_record.c
index a929b8aa..919e7169 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -1374,16 +1374,16 @@ static TRef rec_idx_key(jit_State *J, RecordIndex *ix, IRRef *rbref,
     key = emitir(IRTN(IR_CONV), key, IRCONV_NUM_INT);
   if (tref_isk(key)) {
     /* Optimize lookup of constant hash keys. */
-    MSize hslot = (MSize)((char *)ix->oldv - (char *)&noderef(t->node)[0].val);
-    if (t->hmask > 0 && hslot <= t->hmask*(MSize)sizeof(Node) &&
-	hslot <= 65535*(MSize)sizeof(Node)) {
+    GCSize hslot = (GCSize)((char *)ix->oldv-(char *)&noderef(t->node)[0].val);
+    if (hslot <= t->hmask*(GCSize)sizeof(Node) &&
+	hslot <= 65535*(GCSize)sizeof(Node)) {
       TRef node, kslot, hm;
       *rbref = J->cur.nins;  /* Mark possible rollback point. */
       *rbguard = J->guardemit;
       hm = emitir(IRTI(IR_FLOAD), ix->tab, IRFL_TAB_HMASK);
       emitir(IRTGI(IR_EQ), hm, lj_ir_kint(J, (int32_t)t->hmask));
       node = emitir(IRT(IR_FLOAD, IRT_PGC), ix->tab, IRFL_TAB_NODE);
-      kslot = lj_ir_kslot(J, key, hslot / sizeof(Node));
+      kslot = lj_ir_kslot(J, key, (IRRef)(hslot / sizeof(Node)));
       return emitir(IRTG(IR_HREFK, IRT_PGC), node, kslot);
     }
   }
diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
new file mode 100644
index 00000000..a11b91e3
--- /dev/null
+++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
@@ -0,0 +1,58 @@
+local tap = require('tap')
+
+-- Test file to demonstrate incorrect HREFK optimization
+-- in LuaJIT.
+
+local ffi = require('ffi')
+local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
+  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+test:plan(1)
+
+local table_new = require('table.new')
+
+-- Size of single hash node in bytes.
+local NODE_SIZE = 24
+-- Number of hash nodes to allocate on each iteration
+-- based on the condition from `rec_idx_key`
+local HASH_NODES = 65535
+-- The vector of hash nodes should have a raw size of
+-- `HASH_NODES * NODE_SIZE`, which is allocated in
+-- `lj_alloc_malloc` directly with `mmap`. However,
+-- the LuaJIT allocator adds a bunch of small paddings
+-- and aligns the required size to LJ_PAGESIZE, which is
+-- 4096, so the actual allocated size includes alignment.
+local ALIGNMENT = 4096
+-- The vector for hash nodes in the table is allocated based on
+-- `hbits`, so it's actually got a size of 65536 nodes.
+local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
+-- We need to overflow the 32-bit distance to the global nilnode,
+-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
+-- bunch of non-table.new allocations already performed, so one
+-- iteration is subtracted to account for them.
+local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
+-- Prevent anchor table from interfering with target table allocations.
+local anchor = table.new(N_ITERATIONS, 0)
+
+-- Construct table.
+for _ = 1, N_ITERATIONS do
+  table.insert(anchor, table_new(0, HASH_NODES))
+end
+
+jit.opt.start('hotloop=1')
+local function get_n(tab)
+  local x
+  for _ = 1, 4 do
+    x = tab.n
+  end
+  return x
+end
+
+-- Record the trace for the constructed table.
+get_n(anchor[#anchor])
+
+-- Check the result for the table that has the required key.
+local result = get_n({n=1})
+test:is(result, 1, 'correct value retrieved')
+test:done(true)
--
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-12 13:26 [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization Maxim Kokryashkin via Tarantool-patches
@ 2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
  2024-02-02 12:21   ` Maxim Kokryashkin via Tarantool-patches
  2024-01-16  8:46 ` Sergey Bronnikov via Tarantool-patches
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-01-15 15:22 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the patch!
LGTM with a few comments below.

On 12.01.24, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
> 
> Contributed by XmiliaH.
> 
> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
> 
> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> required key is not present in the table, it is possible to pass
> the constant key lookup optimization condition because of the
> `uint32_t` overflow. Because of that, further recording

I suggest clarifying like the following:
| `uint32_t` (`MSize`)
Feel free to ignore.

> incorrectly removes the check for the nilnode, which produces
> wrong results when trace is called for a different table.

Nit: Please mention also how the problem is fixed.

> 
> Maxim Kokryashkin:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#9145
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> PR: https://github.com/tarantool/tarantool/pull/9591
> Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> https://github.com/tarantool/tarantool/issues/9145
> 
>  src/lj_record.c                               |  8 +--
>  .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
>  2 files changed, 62 insertions(+), 4 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> 
> diff --git a/src/lj_record.c b/src/lj_record.c
> index a929b8aa..919e7169 100644
> --- a/src/lj_record.c
> +++ b/src/lj_record.c

<snipped>

> diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua

Nit: We can also add gc64 prefix for this test like:
<lj-840-gc64-fix-hrefk-optimization.test.lua>
Feel free to ignore.

> new file mode 100644
> index 00000000..a11b91e3
> --- /dev/null
> +++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> @@ -0,0 +1,58 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate incorrect HREFK optimization
> +-- in LuaJIT.
> +
> +local ffi = require('ffi')
> +local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
> +  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +test:plan(1)
> +
> +local table_new = require('table.new')
> +
> +-- Size of single hash node in bytes.
> +local NODE_SIZE = 24
> +-- Number of hash nodes to allocate on each iteration
> +-- based on the condition from `rec_idx_key`

Nit: Missed dot at the end of the sentence.

It is more correct to say that we use this is restricted by the IR
format:
`op2` field in the HREFK IR is a slot number and it is 16-bit wide.
65535 == 2^16 - 1; i.e., it is the maximum value that can be stored in a
16-bit field.

> +local HASH_NODES = 65535
> +-- The vector of hash nodes should have a raw size of
> +-- `HASH_NODES * NODE_SIZE`, which is allocated in
> +-- `lj_alloc_malloc` directly with `mmap`. However,
> +-- the LuaJIT allocator adds a bunch of small paddings
> +-- and aligns the required size to LJ_PAGESIZE, which is
> +-- 4096, so the actual allocated size includes alignment.
> +local ALIGNMENT = 4096

Minor: So, maybe name it `LJ_PAGESIZE`?
Feel free to ignore.

> +-- The vector for hash nodes in the table is allocated based on
> +-- `hbits`, so it's actually got a size of 65536 nodes.
> +local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72

What is the magic number 72?

> +-- We need to overflow the 32-bit distance to the global nilnode,
> +-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
> +-- bunch of non-table.new allocations already performed, so one
> +-- iteration is subtracted to account for them.

Why is it crucial to subtract it? What happens without it?
I suppose that the new table will still be huge enough, won't it?

> +local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1

Minor: We can use `(2 ^ 32)` instead of 0x100000000 (it is easier to
read).
Feel free to ignore.

> +-- Prevent anchor table from interfering with target table allocations.

Nit: Comment length is more than 66 symbols.

> +local anchor = table.new(N_ITERATIONS, 0)
> +
> +-- Construct table.
> +for _ = 1, N_ITERATIONS do
> +  table.insert(anchor, table_new(0, HASH_NODES))
> +end
> +
> +jit.opt.start('hotloop=1')
> +local function get_n(tab)
> +  local x
> +  for _ = 1, 4 do
> +    x = tab.n
> +  end
> +  return x
> +end
> +
> +-- Record the trace for the constructed table.
> +get_n(anchor[#anchor])
> +
> +-- Check the result for the table that has the required key.
> +local result = get_n({n=1})
> +test:is(result, 1, 'correct value retrieved')
> +test:done(true)
> --
> 2.43.0
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-12 13:26 [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization Maxim Kokryashkin via Tarantool-patches
  2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
@ 2024-01-16  8:46 ` Sergey Bronnikov via Tarantool-patches
  2024-01-16 12:09   ` Sergey Kaplun via Tarantool-patches
  2024-01-16 12:54 ` Sergey Bronnikov via Tarantool-patches
  2024-02-15 13:42 ` Igor Munkin via Tarantool-patches
  3 siblings, 1 reply; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-01-16  8:46 UTC (permalink / raw)
  To: Maxim Kokryashkin, tarantool-patches, skaplun

Hi, Max


thanks for the patch!

test is passed after reverting the patch.

On the same build repro from the issue [1] has failed.


1. https://github.com/LuaJIT/LuaJIT/issues/840

Sergey

On 1/12/24 16:26, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
>
> Contributed by XmiliaH.
>
> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
>
> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> required key is not present in the table, it is possible to pass
> the constant key lookup optimization condition because of the
> `uint32_t` overflow. Because of that, further recording
> incorrectly removes the check for the nilnode, which produces
> wrong results when trace is called for a different table.
>
> Maxim Kokryashkin:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#9145
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> PR: https://github.com/tarantool/tarantool/pull/9591
> Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> https://github.com/tarantool/tarantool/issues/9145
>
>   src/lj_record.c                               |  8 +--
>   .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
>   2 files changed, 62 insertions(+), 4 deletions(-)
>   create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
>
> diff --git a/src/lj_record.c b/src/lj_record.c
> index a929b8aa..919e7169 100644
> --- a/src/lj_record.c
> +++ b/src/lj_record.c
> @@ -1374,16 +1374,16 @@ static TRef rec_idx_key(jit_State *J, RecordIndex *ix, IRRef *rbref,
>       key = emitir(IRTN(IR_CONV), key, IRCONV_NUM_INT);
>     if (tref_isk(key)) {
>       /* Optimize lookup of constant hash keys. */
> -    MSize hslot = (MSize)((char *)ix->oldv - (char *)&noderef(t->node)[0].val);
> -    if (t->hmask > 0 && hslot <= t->hmask*(MSize)sizeof(Node) &&
> -	hslot <= 65535*(MSize)sizeof(Node)) {
> +    GCSize hslot = (GCSize)((char *)ix->oldv-(char *)&noderef(t->node)[0].val);
> +    if (hslot <= t->hmask*(GCSize)sizeof(Node) &&
> +	hslot <= 65535*(GCSize)sizeof(Node)) {
>         TRef node, kslot, hm;
>         *rbref = J->cur.nins;  /* Mark possible rollback point. */
>         *rbguard = J->guardemit;
>         hm = emitir(IRTI(IR_FLOAD), ix->tab, IRFL_TAB_HMASK);
>         emitir(IRTGI(IR_EQ), hm, lj_ir_kint(J, (int32_t)t->hmask));
>         node = emitir(IRT(IR_FLOAD, IRT_PGC), ix->tab, IRFL_TAB_NODE);
> -      kslot = lj_ir_kslot(J, key, hslot / sizeof(Node));
> +      kslot = lj_ir_kslot(J, key, (IRRef)(hslot / sizeof(Node)));
>         return emitir(IRTG(IR_HREFK, IRT_PGC), node, kslot);
>       }
>     }
> diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> new file mode 100644
> index 00000000..a11b91e3
> --- /dev/null
> +++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> @@ -0,0 +1,58 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate incorrect HREFK optimization
> +-- in LuaJIT.
> +
> +local ffi = require('ffi')
> +local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
> +  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +test:plan(1)
> +
> +local table_new = require('table.new')
> +
> +-- Size of single hash node in bytes.
> +local NODE_SIZE = 24
> +-- Number of hash nodes to allocate on each iteration
> +-- based on the condition from `rec_idx_key`
> +local HASH_NODES = 65535
> +-- The vector of hash nodes should have a raw size of
> +-- `HASH_NODES * NODE_SIZE`, which is allocated in
> +-- `lj_alloc_malloc` directly with `mmap`. However,
> +-- the LuaJIT allocator adds a bunch of small paddings
> +-- and aligns the required size to LJ_PAGESIZE, which is
> +-- 4096, so the actual allocated size includes alignment.
> +local ALIGNMENT = 4096
> +-- The vector for hash nodes in the table is allocated based on
> +-- `hbits`, so it's actually got a size of 65536 nodes.
> +local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
> +-- We need to overflow the 32-bit distance to the global nilnode,
> +-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
> +-- bunch of non-table.new allocations already performed, so one
> +-- iteration is subtracted to account for them.
> +local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
> +-- Prevent anchor table from interfering with target table allocations.
> +local anchor = table.new(N_ITERATIONS, 0)
> +
> +-- Construct table.
> +for _ = 1, N_ITERATIONS do
> +  table.insert(anchor, table_new(0, HASH_NODES))
> +end
> +
> +jit.opt.start('hotloop=1')
> +local function get_n(tab)
> +  local x
> +  for _ = 1, 4 do
> +    x = tab.n
> +  end
> +  return x
> +end
> +
> +-- Record the trace for the constructed table.
> +get_n(anchor[#anchor])
> +
> +-- Check the result for the table that has the required key.
> +local result = get_n({n=1})
> +test:is(result, 1, 'correct value retrieved')
> +test:done(true)
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-16  8:46 ` Sergey Bronnikov via Tarantool-patches
@ 2024-01-16 12:09   ` Sergey Kaplun via Tarantool-patches
  2024-01-16 12:53     ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-01-16 12:09 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: Maxim Kokryashkin, tarantool-patches

Hi, Sergey!

On 16.01.24, Sergey Bronnikov wrote:
> Hi, Max
> 
> 
> thanks for the patch!
> 
> test is passed after reverting the patch.

Do you check the GC64 build?
*NB*: You should use GC64 *without* SYSMALLOC (and hence without ASAN),
since we need to use more predictible LuaJIT's dlmalloc.

> 
> On the same build repro from the issue [1] has failed.
> 
> 
> 1. https://github.com/LuaJIT/LuaJIT/issues/840
> 
> Sergey

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-16 12:09   ` Sergey Kaplun via Tarantool-patches
@ 2024-01-16 12:53     ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-01-16 12:53 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Maxim Kokryashkin, tarantool-patches


On 1/16/24 15:09, Sergey Kaplun wrote:
> Hi, Sergey!
>
> On 16.01.24, Sergey Bronnikov wrote:
>> Hi, Max
>>
>>
>> thanks for the patch!
>>
>> test is passed after reverting the patch.
> Do you check the GC64 build?
> *NB*: You should use GC64 *without* SYSMALLOC (and hence without ASAN),
> since we need to use more predictible LuaJIT's dlmalloc.

Reproduced with:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DLUA_USE_ASSERT=ON 
-DLUA_USE_APICHECK=ON -DLUAJIT_ENABLE_GC64=ON

Sergey, thanks!

>> On the same build repro from the issue [1] has failed.
>>
>>
>> 1. https://github.com/LuaJIT/LuaJIT/issues/840
>>
>> Sergey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-12 13:26 [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization Maxim Kokryashkin via Tarantool-patches
  2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
  2024-01-16  8:46 ` Sergey Bronnikov via Tarantool-patches
@ 2024-01-16 12:54 ` Sergey Bronnikov via Tarantool-patches
  2024-02-15 13:42 ` Igor Munkin via Tarantool-patches
  3 siblings, 0 replies; 10+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-01-16 12:54 UTC (permalink / raw)
  To: Maxim Kokryashkin, tarantool-patches, skaplun

Max, thanks for the patch! LGTM

On 1/12/24 16:26, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
>
> Contributed by XmiliaH.
>
> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
>
> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> required key is not present in the table, it is possible to pass
> the constant key lookup optimization condition because of the
> `uint32_t` overflow. Because of that, further recording
> incorrectly removes the check for the nilnode, which produces
> wrong results when trace is called for a different table.
>
> Maxim Kokryashkin:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#9145
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> PR: https://github.com/tarantool/tarantool/pull/9591
> Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> https://github.com/tarantool/tarantool/issues/9145
>
>   src/lj_record.c                               |  8 +--
>   .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
>   2 files changed, 62 insertions(+), 4 deletions(-)
>   create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
>
> diff --git a/src/lj_record.c b/src/lj_record.c
> index a929b8aa..919e7169 100644
> --- a/src/lj_record.c
> +++ b/src/lj_record.c
> @@ -1374,16 +1374,16 @@ static TRef rec_idx_key(jit_State *J, RecordIndex *ix, IRRef *rbref,
>       key = emitir(IRTN(IR_CONV), key, IRCONV_NUM_INT);
>     if (tref_isk(key)) {
>       /* Optimize lookup of constant hash keys. */
> -    MSize hslot = (MSize)((char *)ix->oldv - (char *)&noderef(t->node)[0].val);
> -    if (t->hmask > 0 && hslot <= t->hmask*(MSize)sizeof(Node) &&
> -	hslot <= 65535*(MSize)sizeof(Node)) {
> +    GCSize hslot = (GCSize)((char *)ix->oldv-(char *)&noderef(t->node)[0].val);
> +    if (hslot <= t->hmask*(GCSize)sizeof(Node) &&
> +	hslot <= 65535*(GCSize)sizeof(Node)) {
>         TRef node, kslot, hm;
>         *rbref = J->cur.nins;  /* Mark possible rollback point. */
>         *rbguard = J->guardemit;
>         hm = emitir(IRTI(IR_FLOAD), ix->tab, IRFL_TAB_HMASK);
>         emitir(IRTGI(IR_EQ), hm, lj_ir_kint(J, (int32_t)t->hmask));
>         node = emitir(IRT(IR_FLOAD, IRT_PGC), ix->tab, IRFL_TAB_NODE);
> -      kslot = lj_ir_kslot(J, key, hslot / sizeof(Node));
> +      kslot = lj_ir_kslot(J, key, (IRRef)(hslot / sizeof(Node)));
>         return emitir(IRTG(IR_HREFK, IRT_PGC), node, kslot);
>       }
>     }
> diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> new file mode 100644
> index 00000000..a11b91e3
> --- /dev/null
> +++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> @@ -0,0 +1,58 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate incorrect HREFK optimization
> +-- in LuaJIT.
> +
> +local ffi = require('ffi')
> +local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
> +  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +test:plan(1)
> +
> +local table_new = require('table.new')
> +
> +-- Size of single hash node in bytes.
> +local NODE_SIZE = 24
> +-- Number of hash nodes to allocate on each iteration
> +-- based on the condition from `rec_idx_key`
> +local HASH_NODES = 65535
> +-- The vector of hash nodes should have a raw size of
> +-- `HASH_NODES * NODE_SIZE`, which is allocated in
> +-- `lj_alloc_malloc` directly with `mmap`. However,
> +-- the LuaJIT allocator adds a bunch of small paddings
> +-- and aligns the required size to LJ_PAGESIZE, which is
> +-- 4096, so the actual allocated size includes alignment.
> +local ALIGNMENT = 4096
> +-- The vector for hash nodes in the table is allocated based on
> +-- `hbits`, so it's actually got a size of 65536 nodes.
> +local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
> +-- We need to overflow the 32-bit distance to the global nilnode,
> +-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
> +-- bunch of non-table.new allocations already performed, so one
> +-- iteration is subtracted to account for them.
> +local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
> +-- Prevent anchor table from interfering with target table allocations.
> +local anchor = table.new(N_ITERATIONS, 0)
> +
> +-- Construct table.
> +for _ = 1, N_ITERATIONS do
> +  table.insert(anchor, table_new(0, HASH_NODES))
> +end
> +
> +jit.opt.start('hotloop=1')
> +local function get_n(tab)
> +  local x
> +  for _ = 1, 4 do
> +    x = tab.n
> +  end
> +  return x
> +end
> +
> +-- Record the trace for the constructed table.
> +get_n(anchor[#anchor])
> +
> +-- Check the result for the table that has the required key.
> +local result = get_n({n=1})
> +test:is(result, 1, 'correct value retrieved')
> +test:done(true)
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
@ 2024-02-02 12:21   ` Maxim Kokryashkin via Tarantool-patches
  2024-02-05  9:53     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2024-02-02 12:21 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Maxim Kokryashkin, tarantool-patches

Hi, Sergey!
Thanks for the review!
Fixed your comments, branch is force-pushed.
Here is the diff with changes:
===
diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
similarity index 67%
rename from test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
rename to test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
index a11b91e3..49168bc4 100644
--- a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
+++ b/test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
@@ -4,7 +4,7 @@ local tap = require('tap')
 -- in LuaJIT.

 local ffi = require('ffi')
-local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
+local test = tap.test('lj-840-gc64-fix-hrefk-optimization'):skipcond({
   ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
   ['Test requires JIT enabled'] = not jit.status(),
 })
@@ -12,10 +12,14 @@ test:plan(1)

 local table_new = require('table.new')

+local SIZEOF_GCTAB = 64
+-- See `chunk2mem` in lj_alloc.c for details.
+local ALLOC_CHUNK_HDR = 16
+local GCTAB_FOOTPRINT = SIZEOF_GCTAB + ALLOC_CHUNK_HDR
 -- Size of single hash node in bytes.
 local NODE_SIZE = 24
--- Number of hash nodes to allocate on each iteration
--- based on the condition from `rec_idx_key`
+-- The maximum value that can be stored in a 16-bit `op2`
+-- field in HREFK IR.
 local HASH_NODES = 65535
 -- The vector of hash nodes should have a raw size of
 -- `HASH_NODES * NODE_SIZE`, which is allocated in
@@ -23,16 +27,17 @@ local HASH_NODES = 65535
 -- the LuaJIT allocator adds a bunch of small paddings
 -- and aligns the required size to LJ_PAGESIZE, which is
 -- 4096, so the actual allocated size includes alignment.
-local ALIGNMENT = 4096
+local LJ_PAGESIZE = 4096
 -- The vector for hash nodes in the table is allocated based on
 -- `hbits`, so it's actually got a size of 65536 nodes.
-local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
+local NODE_VECTOR_SIZE = (HASH_NODES + 1) * NODE_SIZE + LJ_PAGESIZE
+local SINGLE_ITERATION_ALLOC = NODE_VECTOR_SIZE + GCTAB_FOOTPRINT
 -- We need to overflow the 32-bit distance to the global nilnode,
--- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
--- bunch of non-table.new allocations already performed, so one
--- iteration is subtracted to account for them.
-local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
--- Prevent anchor table from interfering with target table allocations.
+-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC and ceil
+-- the result.
+local N_ITERATIONS = math.ceil((2 ^ 32) / SINGLE_ITERATION_ALLOC)
+-- Prevent anchor table from interfering with target
+-- table allocations.
 local anchor = table.new(N_ITERATIONS, 0)

 -- Construct table.
===

New commit message:
===
LJ_GC64: Fix HREFK optimization.

Contributed by XmiliaH.

(cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)

In `lj_record_idx` when `ix->oldv` is the global nilnode and the
required key is not present in the table, it is possible to pass
the constant key lookup optimization condition because of the
`uint32_t` (`MSize`) overflow. Because of that, further recording
incorrectly removes the check for the nilnode, which produces
wrong results when trace is called for a different table.

The issue is solved by using `GCSize`, which has a size of
64 bits, instead of `MSize`.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9145
===

On Mon, Jan 15, 2024 at 06:22:29PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> Hi, Maxim!
> Thanks for the patch!
> LGTM with a few comments below.
>
> On 12.01.24, Maxim Kokryashkin wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by XmiliaH.
> >
> > (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
> >
> > In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> > required key is not present in the table, it is possible to pass
> > the constant key lookup optimization condition because of the
> > `uint32_t` overflow. Because of that, further recording
>
> I suggest clarifying like the following:
> | `uint32_t` (`MSize`)
> Feel free to ignore.
Fixed.
>
> > incorrectly removes the check for the nilnode, which produces
> > wrong results when trace is called for a different table.
>
> Nit: Please mention also how the problem is fixed.
Fixed.
>
> >
> > Maxim Kokryashkin:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#9145
> > ---
> > Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> > PR: https://github.com/tarantool/tarantool/pull/9591
> > Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> > https://github.com/tarantool/tarantool/issues/9145
> >
> >  src/lj_record.c                               |  8 +--
> >  .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
> >  2 files changed, 62 insertions(+), 4 deletions(-)
> >  create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> >
> > diff --git a/src/lj_record.c b/src/lj_record.c
> > index a929b8aa..919e7169 100644
> > --- a/src/lj_record.c
> > +++ b/src/lj_record.c
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
>
> Nit: We can also add gc64 prefix for this test like:
> <lj-840-gc64-fix-hrefk-optimization.test.lua>
> Feel free to ignore.
Fixed.
>
> > new file mode 100644
> > index 00000000..a11b91e3
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> > @@ -0,0 +1,58 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate incorrect HREFK optimization
> > +-- in LuaJIT.
> > +
> > +local ffi = require('ffi')
> > +local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
> > +  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +test:plan(1)
> > +
> > +local table_new = require('table.new')
> > +
> > +-- Size of single hash node in bytes.
> > +local NODE_SIZE = 24
> > +-- Number of hash nodes to allocate on each iteration
> > +-- based on the condition from `rec_idx_key`
>
> Nit: Missed dot at the end of the sentence.
>
> It is more correct to say that we use this is restricted by the IR
> format:
> `op2` field in the HREFK IR is a slot number and it is 16-bit wide.
> 65535 == 2^16 - 1; i.e., it is the maximum value that can be stored in a
> 16-bit field.
Fixed.
>
> > +local HASH_NODES = 65535
> > +-- The vector of hash nodes should have a raw size of
> > +-- `HASH_NODES * NODE_SIZE`, which is allocated in
> > +-- `lj_alloc_malloc` directly with `mmap`. However,
> > +-- the LuaJIT allocator adds a bunch of small paddings
> > +-- and aligns the required size to LJ_PAGESIZE, which is
> > +-- 4096, so the actual allocated size includes alignment.
> > +local ALIGNMENT = 4096
>
> Minor: So, maybe name it `LJ_PAGESIZE`?
> Feel free to ignore.
>
> > +-- The vector for hash nodes in the table is allocated based on
> > +-- `hbits`, so it's actually got a size of 65536 nodes.
> > +local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
>
> What is the magic number 72?
It's GCtab memory footprint, but I got it wrong and it's actually 80.
Fixed.
>
> > +-- We need to overflow the 32-bit distance to the global nilnode,
> > +-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
> > +-- bunch of non-table.new allocations already performed, so one
> > +-- iteration is subtracted to account for them.
>
> Why is it crucial to subtract it? What happens without it?
> I suppose that the new table will still be huge enough, won't it?
>
> > +local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
I've got a mistake here too. There must be no subtraction, moreover, it
must be a ceil of the result. Fixed.
>
> Minor: We can use `(2 ^ 32)` instead of 0x100000000 (it is easier to
> read).
> Feel free to ignore.
Fixed.
>
> > +-- Prevent anchor table from interfering with target table allocations.
>
> Nit: Comment length is more than 66 symbols.
Fixed.
>
> > +local anchor = table.new(N_ITERATIONS, 0)
> > +
> > +-- Construct table.
> > +for _ = 1, N_ITERATIONS do
> > +  table.insert(anchor, table_new(0, HASH_NODES))
> > +end
> > +
> > +jit.opt.start('hotloop=1')
> > +local function get_n(tab)
> > +  local x
> > +  for _ = 1, 4 do
> > +    x = tab.n
> > +  end
> > +  return x
> > +end
> > +
> > +-- Record the trace for the constructed table.
> > +get_n(anchor[#anchor])
> > +
> > +-- Check the result for the table that has the required key.
> > +local result = get_n({n=1})
> > +test:is(result, 1, 'correct value retrieved')
> > +test:done(true)
> > --
> > 2.43.0
> >
>
> --
> Best regards,
> Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-02-02 12:21   ` Maxim Kokryashkin via Tarantool-patches
@ 2024-02-05  9:53     ` Sergey Kaplun via Tarantool-patches
  2024-02-06 11:09       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-02-05  9:53 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: Maxim Kokryashkin, tarantool-patches

Hi, Maxim!
Thanks for the fixes!
LGTM, except a single nit below.

On 02.02.24, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the review!
> Fixed your comments, branch is force-pushed.
> Here is the diff with changes:

<snipped>

> New commit message:
> ===
> LJ_GC64: Fix HREFK optimization.
> 
> Contributed by XmiliaH.
> 
> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
> 
> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> required key is not present in the table, it is possible to pass
> the constant key lookup optimization condition because of the
> `uint32_t` (`MSize`) overflow. Because of that, further recording
> incorrectly removes the check for the nilnode, which produces
> wrong results when trace is called for a different table.
> 
> The issue is solved by using `GCSize`, which has a size of

Minor: I'd say 'which size is defines by the target system' or something
like that (to mention that it is 32 bit for non-GC64 build).

> 64 bits, instead of `MSize`.
> 
> Maxim Kokryashkin:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#9145
> ===

<snipped>

> >
> > --
> > Best regards,
> > Sergey Kaplun

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches]  [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-02-05  9:53     ` Sergey Kaplun via Tarantool-patches
@ 2024-02-06 11:09       ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2024-02-06 11:09 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: Maxim Kokryashkin, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]


Hi, Sergey!
Thanks for the review!
Now this line looks like the following:
| The issue is solved by using `GCSize`, which has a size of
| 64 bits on GC64 builds, instead of `MSize`.
 
--
Best regards,
Maxim Kokryashkin
 
  
>Понедельник, 5 февраля 2024, 12:57 +03:00 от Sergey Kaplun <skaplun@tarantool.org>:
> 
>Hi, Maxim!
>Thanks for the fixes!
>LGTM, except a single nit below.
>
>On 02.02.24, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the review!
>> Fixed your comments, branch is force-pushed.
>> Here is the diff with changes:
>
><snipped>
>
>> New commit message:
>> ===
>> LJ_GC64: Fix HREFK optimization.
>>
>> Contributed by XmiliaH.
>>
>> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
>>
>> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
>> required key is not present in the table, it is possible to pass
>> the constant key lookup optimization condition because of the
>> `uint32_t` (`MSize`) overflow. Because of that, further recording
>> incorrectly removes the check for the nilnode, which produces
>> wrong results when trace is called for a different table.
>>
>> The issue is solved by using `GCSize`, which has a size of
>
>Minor: I'd say 'which size is defines by the target system' or something
>like that (to mention that it is 32 bit for non-GC64 build).
>
>> 64 bits, instead of `MSize`.
>>
>> Maxim Kokryashkin:
>> * added the description and the test for the problem
>>
>> Part of tarantool/tarantool#9145
>> ===
>
><snipped>
>
>> >
>> > --
>> > Best regards,
>> > Sergey Kaplun
>
>--
>Best regards,
>Sergey Kaplun
 

[-- Attachment #2: Type: text/html, Size: 2342 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.
  2024-01-12 13:26 [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization Maxim Kokryashkin via Tarantool-patches
                   ` (2 preceding siblings ...)
  2024-01-16 12:54 ` Sergey Bronnikov via Tarantool-patches
@ 2024-02-15 13:42 ` Igor Munkin via Tarantool-patches
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2024-02-15 13:42 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Max,

I've checked the patchset into all long-term branches in
tarantool/luajit and bumped a new version in master, release/3.0 and
release/2.11.

On 12.01.24, Maxim Kokryashkin via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> Contributed by XmiliaH.
> 
> (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
> 
> In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> required key is not present in the table, it is possible to pass
> the constant key lookup optimization condition because of the
> `uint32_t` overflow. Because of that, further recording
> incorrectly removes the check for the nilnode, which produces
> wrong results when trace is called for a different table.
> 
> Maxim Kokryashkin:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#9145
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> PR: https://github.com/tarantool/tarantool/pull/9591
> Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> https://github.com/tarantool/tarantool/issues/9145
> 
>  src/lj_record.c                               |  8 +--
>  .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
>  2 files changed, 62 insertions(+), 4 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> 

<snipped>

> --
> 2.43.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-02-15 13:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-12 13:26 [Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization Maxim Kokryashkin via Tarantool-patches
2024-01-15 15:22 ` Sergey Kaplun via Tarantool-patches
2024-02-02 12:21   ` Maxim Kokryashkin via Tarantool-patches
2024-02-05  9:53     ` Sergey Kaplun via Tarantool-patches
2024-02-06 11:09       ` Maxim Kokryashkin via Tarantool-patches
2024-01-16  8:46 ` Sergey Bronnikov via Tarantool-patches
2024-01-16 12:09   ` Sergey Kaplun via Tarantool-patches
2024-01-16 12:53     ` Sergey Bronnikov via Tarantool-patches
2024-01-16 12:54 ` Sergey Bronnikov via Tarantool-patches
2024-02-15 13:42 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox