[Tarantool-patches] [PATCH luajit] LJ_GC64: Fix HREFK optimization.

Maxim Kokryashkin m.kokryashkin at tarantool.org
Fri Feb 2 15:21:39 MSK 2024


Hi, Sergey!
Thanks for the review!
Fixed your comments, branch is force-pushed.
Here is the diff with changes:
===
diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
similarity index 67%
rename from test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
rename to test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
index a11b91e3..49168bc4 100644
--- a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
+++ b/test/tarantool-tests/lj-840-gc64-fix-hrefk-optimization.test.lua
@@ -4,7 +4,7 @@ local tap = require('tap')
 -- in LuaJIT.

 local ffi = require('ffi')
-local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
+local test = tap.test('lj-840-gc64-fix-hrefk-optimization'):skipcond({
   ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
   ['Test requires JIT enabled'] = not jit.status(),
 })
@@ -12,10 +12,14 @@ test:plan(1)

 local table_new = require('table.new')

+local SIZEOF_GCTAB = 64
+-- See `chunk2mem` in lj_alloc.c for details.
+local ALLOC_CHUNK_HDR = 16
+local GCTAB_FOOTPRINT = SIZEOF_GCTAB + ALLOC_CHUNK_HDR
 -- Size of single hash node in bytes.
 local NODE_SIZE = 24
--- Number of hash nodes to allocate on each iteration
--- based on the condition from `rec_idx_key`
+-- The maximum value that can be stored in a 16-bit `op2`
+-- field in HREFK IR.
 local HASH_NODES = 65535
 -- The vector of hash nodes should have a raw size of
 -- `HASH_NODES * NODE_SIZE`, which is allocated in
@@ -23,16 +27,17 @@ local HASH_NODES = 65535
 -- the LuaJIT allocator adds a bunch of small paddings
 -- and aligns the required size to LJ_PAGESIZE, which is
 -- 4096, so the actual allocated size includes alignment.
-local ALIGNMENT = 4096
+local LJ_PAGESIZE = 4096
 -- The vector for hash nodes in the table is allocated based on
 -- `hbits`, so it's actually got a size of 65536 nodes.
-local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
+local NODE_VECTOR_SIZE = (HASH_NODES + 1) * NODE_SIZE + LJ_PAGESIZE
+local SINGLE_ITERATION_ALLOC = NODE_VECTOR_SIZE + GCTAB_FOOTPRINT
 -- We need to overflow the 32-bit distance to the global nilnode,
--- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
--- bunch of non-table.new allocations already performed, so one
--- iteration is subtracted to account for them.
-local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
--- Prevent anchor table from interfering with target table allocations.
+-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC and ceil
+-- the result.
+local N_ITERATIONS = math.ceil((2 ^ 32) / SINGLE_ITERATION_ALLOC)
+-- Prevent anchor table from interfering with target
+-- table allocations.
 local anchor = table.new(N_ITERATIONS, 0)

 -- Construct table.
===

New commit message:
===
LJ_GC64: Fix HREFK optimization.

Contributed by XmiliaH.

(cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)

In `lj_record_idx` when `ix->oldv` is the global nilnode and the
required key is not present in the table, it is possible to pass
the constant key lookup optimization condition because of the
`uint32_t` (`MSize`) overflow. Because of that, further recording
incorrectly removes the check for the nilnode, which produces
wrong results when trace is called for a different table.

The issue is solved by using `GCSize`, which has a size of
64 bits, instead of `MSize`.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9145
===

On Mon, Jan 15, 2024 at 06:22:29PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> Hi, Maxim!
> Thanks for the patch!
> LGTM with a few comments below.
>
> On 12.01.24, Maxim Kokryashkin wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by XmiliaH.
> >
> > (cherry-picked from commit 91bc6b8ad1f373c1ce9003dc024b2e21fad0e444)
> >
> > In `lj_record_idx` when `ix->oldv` is the global nilnode and the
> > required key is not present in the table, it is possible to pass
> > the constant key lookup optimization condition because of the
> > `uint32_t` overflow. Because of that, further recording
>
> I suggest clarifying like the following:
> | `uint32_t` (`MSize`)
> Feel free to ignore.
Fixed.
>
> > incorrectly removes the check for the nilnode, which produces
> > wrong results when trace is called for a different table.
>
> Nit: Please mention also how the problem is fixed.
Fixed.
>
> >
> > Maxim Kokryashkin:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#9145
> > ---
> > Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-840-fix-hrefk-optimization
> > PR: https://github.com/tarantool/tarantool/pull/9591
> > Issues: https://github.com/LuaJIT/LuaJIT/issues/840
> > https://github.com/tarantool/tarantool/issues/9145
> >
> >  src/lj_record.c                               |  8 +--
> >  .../lj-840-fix-hrefk-optimization.test.lua    | 58 +++++++++++++++++++
> >  2 files changed, 62 insertions(+), 4 deletions(-)
> >  create mode 100644 test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> >
> > diff --git a/src/lj_record.c b/src/lj_record.c
> > index a929b8aa..919e7169 100644
> > --- a/src/lj_record.c
> > +++ b/src/lj_record.c
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
>
> Nit: We can also add gc64 prefix for this test like:
> <lj-840-gc64-fix-hrefk-optimization.test.lua>
> Feel free to ignore.
Fixed.
>
> > new file mode 100644
> > index 00000000..a11b91e3
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-840-fix-hrefk-optimization.test.lua
> > @@ -0,0 +1,58 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate incorrect HREFK optimization
> > +-- in LuaJIT.
> > +
> > +local ffi = require('ffi')
> > +local test = tap.test('lj-840-fix-hrefk-optimization'):skipcond({
> > +  ['Test requires GC64 mode enabled'] = not ffi.abi('gc64'),
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +test:plan(1)
> > +
> > +local table_new = require('table.new')
> > +
> > +-- Size of single hash node in bytes.
> > +local NODE_SIZE = 24
> > +-- Number of hash nodes to allocate on each iteration
> > +-- based on the condition from `rec_idx_key`
>
> Nit: Missed dot at the end of the sentence.
>
> It is more correct to say that we use this is restricted by the IR
> format:
> `op2` field in the HREFK IR is a slot number and it is 16-bit wide.
> 65535 == 2^16 - 1; i.e., it is the maximum value that can be stored in a
> 16-bit field.
Fixed.
>
> > +local HASH_NODES = 65535
> > +-- The vector of hash nodes should have a raw size of
> > +-- `HASH_NODES * NODE_SIZE`, which is allocated in
> > +-- `lj_alloc_malloc` directly with `mmap`. However,
> > +-- the LuaJIT allocator adds a bunch of small paddings
> > +-- and aligns the required size to LJ_PAGESIZE, which is
> > +-- 4096, so the actual allocated size includes alignment.
> > +local ALIGNMENT = 4096
>
> Minor: So, maybe name it `LJ_PAGESIZE`?
> Feel free to ignore.
>
> > +-- The vector for hash nodes in the table is allocated based on
> > +-- `hbits`, so it's actually got a size of 65536 nodes.
> > +local SINGLE_ITERATION_ALLOC = (HASH_NODES + 1) * NODE_SIZE + ALIGNMENT + 72
>
> What is the magic number 72?
It's GCtab memory footprint, but I got it wrong and it's actually 80.
Fixed.
>
> > +-- We need to overflow the 32-bit distance to the global nilnode,
> > +-- so we divide 2^32 by the SINGLE_ITERATION_ALLOC. There are a
> > +-- bunch of non-table.new allocations already performed, so one
> > +-- iteration is subtracted to account for them.
>
> Why is it crucial to subtract it? What happens without it?
> I suppose that the new table will still be huge enough, won't it?
>
> > +local N_ITERATIONS = 0x100000000 / SINGLE_ITERATION_ALLOC - 1
I've got a mistake here too. There must be no subtraction, moreover, it
must be a ceil of the result. Fixed.
>
> Minor: We can use `(2 ^ 32)` instead of 0x100000000 (it is easier to
> read).
> Feel free to ignore.
Fixed.
>
> > +-- Prevent anchor table from interfering with target table allocations.
>
> Nit: Comment length is more than 66 symbols.
Fixed.
>
> > +local anchor = table.new(N_ITERATIONS, 0)
> > +
> > +-- Construct table.
> > +for _ = 1, N_ITERATIONS do
> > +  table.insert(anchor, table_new(0, HASH_NODES))
> > +end
> > +
> > +jit.opt.start('hotloop=1')
> > +local function get_n(tab)
> > +  local x
> > +  for _ = 1, 4 do
> > +    x = tab.n
> > +  end
> > +  return x
> > +end
> > +
> > +-- Record the trace for the constructed table.
> > +get_n(anchor[#anchor])
> > +
> > +-- Check the result for the table that has the required key.
> > +local result = get_n({n=1})
> > +test:is(result, 1, 'correct value retrieved')
> > +test:done(true)
> > --
> > 2.43.0
> >
>
> --
> Best regards,
> Sergey Kaplun


More information about the Tarantool-patches mailing list