[Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion

Tarantool development patches archive
 help / color / mirror / Atom feed

* [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
@ 2021-05-28 12:06 Sergey Ostanevich via Tarantool-patches
  2021-07-04 21:06 ` Igor Munkin via Tarantool-patches
  2022-06-30 12:10 ` Igor Munkin via Tarantool-patches
  0 siblings, 2 replies; 7+ messages in thread
From: Sergey Ostanevich via Tarantool-patches @ 2021-05-28 12:06 UTC (permalink / raw)
  To: Igor Munkin, Sergey Kaplun, tarantool-patches

Author: Mike Pall <mike>
Date:   Mon Aug 28 10:43:37 2017 +0200

    x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().

    Contributed by Peter Cawley.

    (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)

    Code generation under LJ_GC64 missed an update to the mcode area after
    a 64bit constant encoding. This lead to a corruption to the constant
    later on.
    The problem is rather rare, since there should be big enough (4GiB)
    distance from the currently allocated mcode to the dispatch pointer.
    This lead to a number of flaky tests, trackers are addressed.

    Sergey Ostanevich:
    * added the description and the test for the problem

    Closes: #4095, #4199, #4614

    Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index 767bf6f3..2850aea9 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
       ir->i = (int32_t)(as->mctop - as->mcbot);
       as->mcbot += 8;
       as->mclim = as->mcbot + MCLIM_REDZONE;
+      lj_mcode_commitbot(as->J, as->mcbot);
     }
     as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
     as->mrm.base = RID_RIP;
diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
new file mode 100644
index 00000000..3ac30427
--- /dev/null
+++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
@@ -0,0 +1,63 @@
+-- the test is GC64 only
+local ffi=require('ffi')
+require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
+
+local tap = require("tap")
+local test = tap.test("gh-4199-gc64-flaky")
+test:plan(1)
+
+-- first - we have to make a gap from current JIT infra to next
+-- available mappable memory
+-- most efficient is to grab it per-page
+
+
+ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);')
+ffi.cdef('long getpagesize();')
+
+local pagesize = tonumber(ffi.C.getpagesize())
+local blob = {}
+for i=1, 4e9/pagesize do
+        blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)
+        assert(blob[i] ~= 0)
+end
+
+-- try to chomp all memory in currently allocated gc space
+collectgarbage('stop')
+local dummy={'a'}
+for i=2,30 do
+        dummy[i] = dummy[i - 1] .. dummy[i - 1]
+end
+
+-- generate a bunch of functions and keep them stored to trigger wrong constant placement
+
+local s={}
+local pass = true
+
+jit.opt.start('hotloop=1’)
+for n=1,100 do
+        local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d)
+                local a={}
+                for i=1,1e6 do
+                        a[i] = x + y + z + f + g + h + j + k + r + c + d
+                        if (x > 0) then a[i] = a[i] + 1.1 end
+                        if (c > 0) then a[i] = a[i] + 2.2 end
+                        if (z > 0) then a[i] = a[i] + 3.3 end
+                        if (f > 0) then a[i] = a[i] + 4.4 end
+                        x=x+r
+                        y=y-c
+                        z=z+d
+                end
+                return a[1]
+        end
+        return f]] .. n ..'(...)'
+
+        s[n] = assert(load(src))
+        local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11)
+        local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11)
+        if (res1 ~= res2) then
+                pass = false
+                break
+        end
+end
+
+test:ok(pass, 'wrong IR constant fuse')

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2021-05-28 12:06 [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion Sergey Ostanevich via Tarantool-patches
@ 2021-07-04 21:06 ` Igor Munkin via Tarantool-patches
  2022-02-16 15:44   ` Sergey Kaplun via Tarantool-patches
  2022-06-30 12:10 ` Igor Munkin via Tarantool-patches
  1 sibling, 1 reply; 7+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-07-04 21:06 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Sergos,

Thanks for the patch! Something went wrong with formatting the patch I
guess, so I'll share you the recipe how I send the backported patches:
  1. cherry-pick the original patch from the upstream
  2. format-patch the backported commit
  3. Add "From" header with the original author of the changes at the
     very beginning of the formatted patch
  4. Adjust the commit message according to our backporting procedure

BTW, I've found neither the branch with the patch in tarantool/luajit
nor the branch with the green CI in tarantool/tarantool...

On 28.05.21, Sergey Ostanevich wrote:
> Author: Mike Pall <mike>
> Date:   Mon Aug 28 10:43:37 2017 +0200
> 
>     x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> 
>     Contributed by Peter Cawley.
> 
>     (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> 
>     Code generation under LJ_GC64 missed an update to the mcode area after

Minor: s/mcode area/mcode area boundaries/. Feel free to ignore.

>     a 64bit constant encoding. This lead to a corruption to the constant

Typo: s/corruption to/corruption of/.

>     later on.
>     The problem is rather rare, since there should be big enough (4GiB)
>     distance from the currently allocated mcode to the dispatch pointer.

Minor: Sorry for being pedantic, but this is not such a trivial bug, so
I've described semantics around this a bit. As for me it's worth to
mention four possible encodings of 64-bit constant on the trace.
  0. If the address of the constant fits into 32-bit one, then encode it
     as a 32-bit displacement (the only option for non-GC64 mode).
  1. If the offset of the constant slot from the dispatch table (pinned
     to r14 that is not changed while trace execution) fits into 32-bit,
     then encode this as a 32-bit displacement relative to r14.
  2. If the offset of the constant slot from the mcode (i.e. rip) fits
     into 32-bit, then encode this as a 32-bit displacement relative to
     rip (considering long mode specifics and RID_RIP hack).
  3. If none of the conditions above are valid, compiler materializes
     this 64-bit constant right at the trace bottom and encodes this the
     same way it does for the previous case.

And now goes your part regarding 4Gb and rarity of this case and the
problem is much clearer to the reader. Please adjust this part using my
description above.

>     This lead to a number of flaky tests, trackers are addressed.
> 
>     Sergey Ostanevich:
>     * added the description and the test for the problem
> 
>     Closes: #4095, #4199, #4614

At first, please move split this line into the separate ones with a
single issue per line. Also don't use ':' in GitHub tags, please: it
just doesn't respect our commit message style. Unfortunately, this
commit tags no issue, since it is pushed to tarantool/luajit repo, but
the issues relate to tarantool/tarantool. Hence, you need to explicitly
mention the latter repo.  And last (but not least), we agreed with
Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo
and use "Closes" in the corresponding patches bumping LuaJIT submodule.
As for these patches, I'd rather use "Fixes", since you have checked
that the failures related to the mentioned issues are gone. Considering
everything above, the line above transforms into the following:

| Fixes tarantool/tarantool#4095
| Fixes tarantool/tarantool#4199
| Fixes tarantool/tarantool#4614

> 
>     Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>
> 
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index 767bf6f3..2850aea9 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
>        ir->i = (int32_t)(as->mctop - as->mcbot);
>        as->mcbot += 8;
>        as->mclim = as->mcbot + MCLIM_REDZONE;
> +      lj_mcode_commitbot(as->J, as->mcbot);
>      }
>      as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
>      as->mrm.base = RID_RIP;
> diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
> new file mode 100644
> index 00000000..3ac30427
> --- /dev/null
> +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
> @@ -0,0 +1,63 @@

Regarding test code style: I don't want to point each place you have
violated it, so I just refer the document[1] that we try to follow except
the one rule: we use 2 spaces for indentation to get tests closer to
LuaJIT sources. Hence, please consider that I've "implicitly" commented
every spot below, where you don't follow our code style.

> +-- the test is GC64 only
> +local ffi=require('ffi')
> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
> +
> +local tap = require("tap")
> +local test = tap.test("gh-4199-gc64-flaky")
> +test:plan(1)
> +
> +-- first - we have to make a gap from current JIT infra to next
> +-- available mappable memory
> +-- most efficient is to grab it per-page
> +
> +
> +ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);')
> +ffi.cdef('long getpagesize();')

Minor: It's better use <ffi.cdef> once but with multiline declarations
instead of several <ffi.cdef> calls.

> +
> +local pagesize = tonumber(ffi.C.getpagesize())
> +local blob = {}
> +for i=1, 4e9/pagesize do
> +        blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)

0x22, 4e9 -- these are magic numbers for me. Please, create a variable
with the descriptive name (and even comments, I guess) for each of them,
to ease the further maintenance.

> +        assert(blob[i] ~= 0)
> +end
> +
> +-- try to chomp all memory in currently allocated gc space
> +collectgarbage('stop')

Since you decided to stop GC, you can do it right at the very beginning,
so you don't need to anchor all mmaped memory above.

> +local dummy={'a'}
> +for i=2,30 do
> +        dummy[i] = dummy[i - 1] .. dummy[i - 1]
> +end

Why do you need this loop?

> +
> +-- generate a bunch of functions and keep them stored to trigger wrong constant placement
> +
> +local s={}

Again, since GC is stopped, there is no need to anchor all the functions
generated below.

> +local pass = true
> +
> +jit.opt.start('hotloop=1’)
> +for n=1,100 do
> +        local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d)
> +                local a={}
> +                for i=1,1e6 do
> +                        a[i] = x + y + z + f + g + h + j + k + r + c + d
> +                        if (x > 0) then a[i] = a[i] + 1.1 end
> +                        if (c > 0) then a[i] = a[i] + 2.2 end
> +                        if (z > 0) then a[i] = a[i] + 3.3 end
> +                        if (f > 0) then a[i] = a[i] + 4.4 end
> +                        x=x+r
> +                        y=y-c
> +                        z=z+d
> +                end
> +                return a[1]
> +        end
> +        return f]] .. n ..'(...)'
> +
> +        s[n] = assert(load(src))
> +        local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11)
> +        local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11)

This was not obvious to me, but I've finally got it: you compare the
result yielded by interpreter with the one yielded by the trace. At
first, you don't need to run all 1e6 iterations: you have set 'hotloop'
to 1, so you need only 3 (!) iterations to successfully compile the
loop. The second call will use the compiled trace for the first loop
iteration, but also will try to compile the function itself. AFAIU, the
constant being materialized on the trace is the <a> table address,
right? At least I see no other option, so please mention in the comment
which constant is fused and leads to the failure if the patch is
missing. Finally, I don't get, why do you need 100 iterations for
getting the misbehaviour.

Please, add everything I dumped below as the corresponding comments for
this part.

> +        if (res1 ~= res2) then
> +                pass = false

Why don't you add the assertion about constant fusion right here?

> +                break
> +        end
> +end
> +
> +test:ok(pass, 'wrong IR constant fuse')

[1]: https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2021-07-04 21:06 ` Igor Munkin via Tarantool-patches
@ 2022-02-16 15:44   ` Sergey Kaplun via Tarantool-patches
  2022-06-21 12:11     ` sergos via Tarantool-patches
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2022-02-16 15:44 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Sergos!
Thanks for the patch!

Hi , Igor!
Thanks for the review!

On 05.07.21, Igor Munkin wrote:
> Sergos,
> 
> Thanks for the patch! Something went wrong with formatting the patch I
> guess, so I'll share you the recipe how I send the backported patches:
>   1. cherry-pick the original patch from the upstream
>   2. format-patch the backported commit
>   3. Add "From" header with the original author of the changes at the
>      very beginning of the formatted patch
>   4. Adjust the commit message according to our backporting procedure
> 
> BTW, I've found neither the branch with the patch in tarantool/luajit
> nor the branch with the green CI in tarantool/tarantool...

I've updated a little the old Sergos's branch [1] and save results on my
own branch [2].
Also I've added GC64 workflow for branch checkouted from the current
Tarantool's master branch [3].

The new commit message and patch is the following:

===================================================================
commit 0759d1783ef96c4bb47869c6bc1181b38d95f654
Author: Mike Pall <mike>
Date:   Mon Aug 28 10:43:37 2017 +0200

    From: Mike Pall <mike>
    
    (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
    
    x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
    
    Contributed by Peter Cawley.
    
    Code generation under LJ_GC64 missed an update to the mcode area
    boundaries after a 64-bit constant encoding. It can lead to a
    corruption of the constant value after next trace code generation.
    The constant can be encoded in one of the following ways:
    
     a. If the address of the constant fits into 32-bit one, then encode it
        as a 32-bit displacement (the only option for non-GC64 mode).
     b. If the offset of the constant slot from the dispatch table (pinned
        to r14 that is not changed while trace execution) fits into 32-bit,
        then encode this as a 32-bit displacement relative to r14.
     c. If the offset of the constant slot from the mcode (i.e. rip) fits
        into 32-bit, then encode this as a 32-bit displacement relative to
        rip (considering long mode specifics and RID_RIP hack).
     d. If none of the conditions above are valid, compiler materializes
        this 64-bit constant right at the trace bottom and encodes this the
        same way it does for the previous case.
    
    The mentioned problem appears only with 2Gb distance from the currently
    allocated mcode to the dispatch pointer, which may happen in real life
    in case of long running instance.
    
    Sergey Ostanevich:
    * added the description and the test for the problem
    
    Fixes tarantool/tarantool#4095
    Fixes tarantool/tarantool#4199
    Fixes tarantool/tarantool#4614

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index 767bf6f3..2850aea9 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
       ir->i = (int32_t)(as->mctop - as->mcbot);
       as->mcbot += 8;
       as->mclim = as->mcbot + MCLIM_REDZONE;
+      lj_mcode_commitbot(as->J, as->mcbot);
     }
     as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
     as->mrm.base = RID_RIP;
diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
new file mode 100644
index 00000000..9eabbdba
--- /dev/null
+++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
@@ -0,0 +1,79 @@
+-- The test is GC64 only.
+local ffi = require('ffi')
+require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
+
+local tap = require('tap')
+local test = tap.test('gh-4199-gc64-fuse')
+test:plan(1)
+
+collectgarbage()
+-- Chomp memory in currently allocated GC space.
+collectgarbage('stop')
+
+for _ = 1, 8 do
+  ffi.new('char[?]', 256 * 1024 * 1024)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Generate a bunch of traces to trigger constant placement at the
+-- end of the trace. Since global describing the mcode area in the
+-- jit structure is not updated, the next trace generated will
+-- invalidate the constant of the previous trace. Then
+-- execution of the _previous_ trace will use wrong value.
+
+-- Keep last two functions generated to compare results.
+local s = {}
+local test_res = true
+
+-- XXX: The number of iteration is fragile, depending on the trace
+-- length and the amount of currently pre-allocated mcode area.
+-- Usually works under 100, which doesn't take too long in case of
+-- success, so I gave up to locate a better way to chomp the mcode
+-- area.
+
+for n = 1, 100 do
+  local src = string.format([[
+    function f%d(x, y)
+      local a = {}
+      for i = 1, 5 do
+        a[i] = 0
+        -- This summ is not necessarry but decreases the amount of
+        -- iterations.
+        a[i] = a[i] + x + y
+        -- Use all FPR registers and one value from the memory
+        -- (xmm0 is for result, xmm15 states for work with table).
+        a[i] = a[i] + 1.1
+        a[i] = a[i] + 2.2
+        a[i] = a[i] + 3.3
+        a[i] = a[i] + 4.4
+        a[i] = a[i] + 5.5
+        a[i] = a[i] + 6.6
+        a[i] = a[i] + 7.7
+        a[i] = a[i] + 8.8
+        a[i] = a[i] + 9.9
+        a[i] = a[i] + 10.10
+        a[i] = a[i] + 11.11
+        a[i] = a[i] + 12.12
+        a[i] = a[i] + 13.13
+        a[i] = a[i] + 14.14
+        a[i] = a[i] + 15.15
+      end
+      return a[1]
+    end
+    return f%d(...)
+  ]], n, n)
+  s[2] = load(src)
+  if s[1] ~= nil then
+    local res1 = s[1](1, 2)
+    local res2 = s[2](1, 2)
+    if res1 ~= res2 then
+      test_res = false
+      break
+    end
+  end
+  s[1] = s[2]
+end
+
+test:ok(test_res, 'IR constant fusion')
+os.exit(test:check() and 0 or 1)
===================================================================

> 
> On 28.05.21, Sergey Ostanevich wrote:
> > Author: Mike Pall <mike>
> > Date:   Mon Aug 28 10:43:37 2017 +0200
> > 
> >     x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> > 
> >     Contributed by Peter Cawley.
> > 
> >     (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> > 
> >     Code generation under LJ_GC64 missed an update to the mcode area after
> 
> Minor: s/mcode area/mcode area boundaries/. Feel free to ignore.
> 
> >     a 64bit constant encoding. This lead to a corruption to the constant
> 
> Typo: s/corruption to/corruption of/.
> 
> >     later on.
> >     The problem is rather rare, since there should be big enough (4GiB)
> >     distance from the currently allocated mcode to the dispatch pointer.
> 
> Minor: Sorry for being pedantic, but this is not such a trivial bug, so
> I've described semantics around this a bit. As for me it's worth to
> mention four possible encodings of 64-bit constant on the trace.
>   0. If the address of the constant fits into 32-bit one, then encode it
>      as a 32-bit displacement (the only option for non-GC64 mode).
>   1. If the offset of the constant slot from the dispatch table (pinned
>      to r14 that is not changed while trace execution) fits into 32-bit,
>      then encode this as a 32-bit displacement relative to r14.
>   2. If the offset of the constant slot from the mcode (i.e. rip) fits
>      into 32-bit, then encode this as a 32-bit displacement relative to
>      rip (considering long mode specifics and RID_RIP hack).
>   3. If none of the conditions above are valid, compiler materializes
>      this 64-bit constant right at the trace bottom and encodes this the
>      same way it does for the previous case.
> 
> And now goes your part regarding 4Gb and rarity of this case and the
> problem is much clearer to the reader. Please adjust this part using my
> description above.
> 
> >     This lead to a number of flaky tests, trackers are addressed.
> > 
> >     Sergey Ostanevich:
> >     * added the description and the test for the problem
> > 
> >     Closes: #4095, #4199, #4614
> 
> At first, please move split this line into the separate ones with a
> single issue per line. Also don't use ':' in GitHub tags, please: it
> just doesn't respect our commit message style. Unfortunately, this
> commit tags no issue, since it is pushed to tarantool/luajit repo, but
> the issues relate to tarantool/tarantool. Hence, you need to explicitly
> mention the latter repo.  And last (but not least), we agreed with
> Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo
> and use "Closes" in the corresponding patches bumping LuaJIT submodule.
> As for these patches, I'd rather use "Fixes", since you have checked
> that the failures related to the mentioned issues are gone. Considering
> everything above, the line above transforms into the following:
> 
> | Fixes tarantool/tarantool#4095
> | Fixes tarantool/tarantool#4199
> | Fixes tarantool/tarantool#4614
> 
> > 
> >     Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>
> > 
> > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> > index 767bf6f3..2850aea9 100644
> > --- a/src/lj_asm_x86.h
> > +++ b/src/lj_asm_x86.h
> > @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
> >        ir->i = (int32_t)(as->mctop - as->mcbot);
> >        as->mcbot += 8;
> >        as->mclim = as->mcbot + MCLIM_REDZONE;
> > +      lj_mcode_commitbot(as->J, as->mcbot);
> >      }
> >      as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
> >      as->mrm.base = RID_RIP;
> > diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
> > new file mode 100644
> > index 00000000..3ac30427
> > --- /dev/null
> > +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
> > @@ -0,0 +1,63 @@
> 
> Regarding test code style: I don't want to point each place you have
> violated it, so I just refer the document[1] that we try to follow except
> the one rule: we use 2 spaces for indentation to get tests closer to
> LuaJIT sources. Hence, please consider that I've "implicitly" commented
> every spot below, where you don't follow our code style.
> 
> > +-- the test is GC64 only
> > +local ffi=require('ffi')
> > +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
> > +
> > +local tap = require("tap")
> > +local test = tap.test("gh-4199-gc64-flaky")
> > +test:plan(1)
> > +
> > +-- first - we have to make a gap from current JIT infra to next
> > +-- available mappable memory
> > +-- most efficient is to grab it per-page
> > +
> > +
> > +ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);')
> > +ffi.cdef('long getpagesize();')
> 
> Minor: It's better use <ffi.cdef> once but with multiline declarations
> instead of several <ffi.cdef> calls.
> 
> > +
> > +local pagesize = tonumber(ffi.C.getpagesize())
> > +local blob = {}
> > +for i=1, 4e9/pagesize do
> > +        blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)
> 
> 0x22, 4e9 -- these are magic numbers for me. Please, create a variable
> with the descriptive name (and even comments, I guess) for each of them,
> to ease the further maintenance.
> 
> > +        assert(blob[i] ~= 0)
> > +end
> > +
> > +-- try to chomp all memory in currently allocated gc space
> > +collectgarbage('stop')
> 
> Since you decided to stop GC, you can do it right at the very beginning,
> so you don't need to anchor all mmaped memory above.
> 
> > +local dummy={'a'}
> > +for i=2,30 do
> > +        dummy[i] = dummy[i - 1] .. dummy[i - 1]
> > +end
> 
> Why do you need this loop?
> 
> > +
> > +-- generate a bunch of functions and keep them stored to trigger wrong constant placement
> > +
> > +local s={}
> 
> Again, since GC is stopped, there is no need to anchor all the functions
> generated below.
> 
> > +local pass = true
> > +
> > +jit.opt.start('hotloop=1’)
> > +for n=1,100 do
> > +        local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d)
> > +                local a={}
> > +                for i=1,1e6 do
> > +                        a[i] = x + y + z + f + g + h + j + k + r + c + d
> > +                        if (x > 0) then a[i] = a[i] + 1.1 end
> > +                        if (c > 0) then a[i] = a[i] + 2.2 end
> > +                        if (z > 0) then a[i] = a[i] + 3.3 end
> > +                        if (f > 0) then a[i] = a[i] + 4.4 end
> > +                        x=x+r
> > +                        y=y-c
> > +                        z=z+d
> > +                end
> > +                return a[1]
> > +        end
> > +        return f]] .. n ..'(...)'
> > +
> > +        s[n] = assert(load(src))
> > +        local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11)
> > +        local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11)
> 
> This was not obvious to me, but I've finally got it: you compare the
> result yielded by interpreter with the one yielded by the trace. At
> first, you don't need to run all 1e6 iterations: you have set 'hotloop'
> to 1, so you need only 3 (!) iterations to successfully compile the
> loop. The second call will use the compiled trace for the first loop
> iteration, but also will try to compile the function itself. AFAIU, the
> constant being materialized on the trace is the <a> table address,
> right? At least I see no other option, so please mention in the comment
> which constant is fused and leads to the failure if the patch is
> missing. Finally, I don't get, why do you need 100 iterations for
> getting the misbehaviour.
> 
> Please, add everything I dumped below as the corresponding comments for
> this part.
> 
> > +        if (res1 ~= res2) then
> > +                pass = false
> 
> Why don't you add the assertion about constant fusion right here?
> 
> > +                break
> > +        end
> > +end
> > +
> > +test:ok(pass, 'wrong IR constant fuse')
> 
> [1]: https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/
> 
> -- 
> Best regards,
> IM

[1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion
[2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci
[3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2022-02-16 15:44   ` Sergey Kaplun via Tarantool-patches
@ 2022-06-21 12:11     ` sergos via Tarantool-patches
  2022-06-22 13:32       ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 7+ messages in thread
From: sergos via Tarantool-patches @ 2022-06-21 12:11 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 14635 bytes --]

Hi!

Thanks for working on it!
Just minor nits, 
LGTM

Sergos

> On 16 Feb 2022, at 18:44, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> Hi, Sergos!
> Thanks for the patch!
> 
> Hi , Igor!
> Thanks for the review!
> 
> On 05.07.21, Igor Munkin wrote:
>> Sergos,
>> 
>> Thanks for the patch! Something went wrong with formatting the patch I
>> guess, so I'll share you the recipe how I send the backported patches:
>> 1. cherry-pick the original patch from the upstream
>> 2. format-patch the backported commit
>> 3. Add "From" header with the original author of the changes at the
>> very beginning of the formatted patch
>> 4. Adjust the commit message according to our backporting procedure
>> 
>> BTW, I've found neither the branch with the patch in tarantool/luajit
>> nor the branch with the green CI in tarantool/tarantool...
> 
> I've updated a little the old Sergos's branch [1] and save results on my
> own branch [2].
> Also I've added GC64 workflow for branch checkouted from the current
> Tarantool's master branch [3].
> 
> The new commit message and patch is the following:
> 
> ===================================================================
> commit 0759d1783ef96c4bb47869c6bc1181b38d95f654
> Author: Mike Pall <mike>
> Date: Mon Aug 28 10:43:37 2017 +0200
> 
> From: Mike Pall <mike>
> 
> (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> 
> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> 
> Contributed by Peter Cawley.
> 
> Code generation under LJ_GC64 missed an update to the mcode area
> boundaries after a 64-bit constant encoding. It can lead to a
> corruption of the constant value after next trace code generation.
> The constant can be encoded in one of the following ways:
> 
> a. If the address of the constant fits into 32-bit one, then encode it
> as a 32-bit displacement (the only option for non-GC64 mode).
> b. If the offset of the constant slot from the dispatch table (pinned
> to r14 that is not changed while trace execution) fits into 32-bit,
> then encode this as a 32-bit displacement relative to r14.
> c. If the offset of the constant slot from the mcode (i.e. rip) fits
> into 32-bit, then encode this as a 32-bit displacement relative to
> rip (considering long mode specifics and RID_RIP hack).
> d. If none of the conditions above are valid, compiler materializes
> this 64-bit constant right at the trace bottom and encodes this the
> same way it does for the previous case.
> 
> The mentioned problem appears only with 2Gb distance from the currently
> allocated mcode to the dispatch pointer, which may happen in real life
> in case of long running instance.
> 
> Sergey Ostanevich:
> * added the description and the test for the problem
> 
> Fixes tarantool/tarantool#4095
> Fixes tarantool/tarantool#4199
> Fixes tarantool/tarantool#4614
> 
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index 767bf6f3..2850aea9 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
> ir->i = (int32_t)(as->mctop - as->mcbot);
> as->mcbot += 8;
> as->mclim = as->mcbot + MCLIM_REDZONE;
> + lj_mcode_commitbot(as->J, as->mcbot);
> }
> as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
> as->mrm.base = RID_RIP;
> diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> new file mode 100644
> index 00000000..9eabbdba
> --- /dev/null
> +++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> @@ -0,0 +1,79 @@
> +-- The test is GC64 only.
> +local ffi = require('ffi')
> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
> +
> +local tap = require('tap')
> +local test = tap.test('gh-4199-gc64-fuse')
> +test:plan(1)
> +
> +collectgarbage()
> +-- Chomp memory in currently allocated GC space.
> +collectgarbage('stop')
> +
> +for _ = 1, 8 do
> + ffi.new('char[?]', 256 * 1024 * 1024)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +-- Generate a bunch of traces to trigger constant placement at the
> +-- end of the trace. Since global describing the mcode area in the
                             ^     ^
                             a     variable
> +-- jit structure is not updated, the next trace generated will
> +-- invalidate the constant of the previous trace. Then
> +-- execution of the _previous_ trace will use wrong value.
                                                ^
                                               the
> +
> +-- Keep last two functions generated to compare results.
          ^
         the
> +local s = {}
> +local test_res = true
> +
> +-- XXX: The number of iteration is fragile, depending on the trace
> +-- length and the amount of currently pre-allocated mcode area.
> +-- Usually works under 100, which doesn't take too long in case of
> +-- success, so I gave up to locate a better way to chomp the mcode
> +-- area.
> +
> +for n = 1, 100 do
> + local src = string.format([[
> + function f%d(x, y)

Not sure if indentation is preserved in the patch and should it be here?

> + local a = {}
> + for i = 1, 5 do
> + a[i] = 0
> + -- This summ is not necessarry but decreases the amount of
> + -- iterations.
> + a[i] = a[i] + x + y
> + -- Use all FPR registers and one value from the memory
> + -- (xmm0 is for result, xmm15 states for work with table).
> + a[i] = a[i] + 1.1
> + a[i] = a[i] + 2.2
> + a[i] = a[i] + 3.3
> + a[i] = a[i] + 4.4
> + a[i] = a[i] + 5.5
> + a[i] = a[i] + 6.6
> + a[i] = a[i] + 7.7
> + a[i] = a[i] + 8.8
> + a[i] = a[i] + 9.9
> + a[i] = a[i] + 10.10
> + a[i] = a[i] + 11.11
> + a[i] = a[i] + 12.12
> + a[i] = a[i] + 13.13
> + a[i] = a[i] + 14.14
> + a[i] = a[i] + 15.15
> + end
> + return a[1]
> + end
> + return f%d(...)
> + ]], n, n)
> + s[2] = load(src)
> + if s[1] ~= nil then
> + local res1 = s[1](1, 2)
> + local res2 = s[2](1, 2)
> + if res1 ~= res2 then
> + test_res = false
> + break
> + end
> + end
> + s[1] = s[2]
> +end
> +
> +test:ok(test_res, 'IR constant fusion')
> +os.exit(test:check() and 0 or 1)
> ===================================================================
> 
>> 
>> On 28.05.21, Sergey Ostanevich wrote:
>>> Author: Mike Pall <mike>
>>> Date: Mon Aug 28 10:43:37 2017 +0200
>>> 
>>> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
>>> 
>>> Contributed by Peter Cawley.
>>> 
>>> (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
>>> 
>>> Code generation under LJ_GC64 missed an update to the mcode area after
>> 
>> Minor: s/mcode area/mcode area boundaries/. Feel free to ignore.
>> 
>>> a 64bit constant encoding. This lead to a corruption to the constant
>> 
>> Typo: s/corruption to/corruption of/.
>> 
>>> later on.
>>> The problem is rather rare, since there should be big enough (4GiB)
>>> distance from the currently allocated mcode to the dispatch pointer.
>> 
>> Minor: Sorry for being pedantic, but this is not such a trivial bug, so
>> I've described semantics around this a bit. As for me it's worth to
>> mention four possible encodings of 64-bit constant on the trace.
>> 0. If the address of the constant fits into 32-bit one, then encode it
>> as a 32-bit displacement (the only option for non-GC64 mode).
>> 1. If the offset of the constant slot from the dispatch table (pinned
>> to r14 that is not changed while trace execution) fits into 32-bit,
>> then encode this as a 32-bit displacement relative to r14.
>> 2. If the offset of the constant slot from the mcode (i.e. rip) fits
>> into 32-bit, then encode this as a 32-bit displacement relative to
>> rip (considering long mode specifics and RID_RIP hack).
>> 3. If none of the conditions above are valid, compiler materializes
>> this 64-bit constant right at the trace bottom and encodes this the
>> same way it does for the previous case.
>> 
>> And now goes your part regarding 4Gb and rarity of this case and the
>> problem is much clearer to the reader. Please adjust this part using my
>> description above.
>> 
>>> This lead to a number of flaky tests, trackers are addressed.
>>> 
>>> Sergey Ostanevich:
>>> * added the description and the test for the problem
>>> 
>>> Closes: #4095, #4199, #4614
>> 
>> At first, please move split this line into the separate ones with a
>> single issue per line. Also don't use ':' in GitHub tags, please: it
>> just doesn't respect our commit message style. Unfortunately, this
>> commit tags no issue, since it is pushed to tarantool/luajit repo, but
>> the issues relate to tarantool/tarantool. Hence, you need to explicitly
>> mention the latter repo. And last (but not least), we agreed with
>> Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo
>> and use "Closes" in the corresponding patches bumping LuaJIT submodule.
>> As for these patches, I'd rather use "Fixes", since you have checked
>> that the failures related to the mentioned issues are gone. Considering
>> everything above, the line above transforms into the following:
>> 
>> | Fixes tarantool/tarantool#4095
>> | Fixes tarantool/tarantool#4199
>> | Fixes tarantool/tarantool#4614
>> 
>>> 
>>> Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>
>>> 
>>> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
>>> index 767bf6f3..2850aea9 100644
>>> --- a/src/lj_asm_x86.h
>>> +++ b/src/lj_asm_x86.h
>>> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
>>> ir->i = (int32_t)(as->mctop - as->mcbot);
>>> as->mcbot += 8;
>>> as->mclim = as->mcbot + MCLIM_REDZONE;
>>> + lj_mcode_commitbot(as->J, as->mcbot);
>>> }
>>> as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
>>> as->mrm.base = RID_RIP;
>>> diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
>>> new file mode 100644
>>> index 00000000..3ac30427
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
>>> @@ -0,0 +1,63 @@
>> 
>> Regarding test code style: I don't want to point each place you have
>> violated it, so I just refer the document[1] that we try to follow except
>> the one rule: we use 2 spaces for indentation to get tests closer to
>> LuaJIT sources. Hence, please consider that I've "implicitly" commented
>> every spot below, where you don't follow our code style.
>> 
>>> +-- the test is GC64 only
>>> +local ffi=require('ffi')
>>> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
>>> +
>>> +local tap = require("tap")
>>> +local test = tap.test("gh-4199-gc64-flaky")
>>> +test:plan(1)
>>> +
>>> +-- first - we have to make a gap from current JIT infra to next
>>> +-- available mappable memory
>>> +-- most efficient is to grab it per-page
>>> +
>>> +
>>> +ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);')
>>> +ffi.cdef('long getpagesize();')
>> 
>> Minor: It's better use <ffi.cdef> once but with multiline declarations
>> instead of several <ffi.cdef> calls.
>> 
>>> +
>>> +local pagesize = tonumber(ffi.C.getpagesize())
>>> +local blob = {}
>>> +for i=1, 4e9/pagesize do
>>> + blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)
>> 
>> 0x22, 4e9 -- these are magic numbers for me. Please, create a variable
>> with the descriptive name (and even comments, I guess) for each of them,
>> to ease the further maintenance.
>> 
>>> + assert(blob[i] ~= 0)
>>> +end
>>> +
>>> +-- try to chomp all memory in currently allocated gc space
>>> +collectgarbage('stop')
>> 
>> Since you decided to stop GC, you can do it right at the very beginning,
>> so you don't need to anchor all mmaped memory above.
>> 
>>> +local dummy={'a'}
>>> +for i=2,30 do
>>> + dummy[i] = dummy[i - 1] .. dummy[i - 1]
>>> +end
>> 
>> Why do you need this loop?
>> 
>>> +
>>> +-- generate a bunch of functions and keep them stored to trigger wrong constant placement
>>> +
>>> +local s={}
>> 
>> Again, since GC is stopped, there is no need to anchor all the functions
>> generated below.
>> 
>>> +local pass = true
>>> +
>>> +jit.opt.start('hotloop=1’)
>>> +for n=1,100 do
>>> + local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d)
>>> + local a={}
>>> + for i=1,1e6 do
>>> + a[i] = x + y + z + f + g + h + j + k + r + c + d
>>> + if (x > 0) then a[i] = a[i] + 1.1 end
>>> + if (c > 0) then a[i] = a[i] + 2.2 end
>>> + if (z > 0) then a[i] = a[i] + 3.3 end
>>> + if (f > 0) then a[i] = a[i] + 4.4 end
>>> + x=x+r
>>> + y=y-c
>>> + z=z+d
>>> + end
>>> + return a[1]
>>> + end
>>> + return f]] .. n ..'(...)'
>>> +
>>> + s[n] = assert(load(src))
>>> + local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11)
>>> + local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11)
>> 
>> This was not obvious to me, but I've finally got it: you compare the
>> result yielded by interpreter with the one yielded by the trace. At
>> first, you don't need to run all 1e6 iterations: you have set 'hotloop'
>> to 1, so you need only 3 (!) iterations to successfully compile the
>> loop. The second call will use the compiled trace for the first loop
>> iteration, but also will try to compile the function itself. AFAIU, the
>> constant being materialized on the trace is the <a> table address,
>> right? At least I see no other option, so please mention in the comment
>> which constant is fused and leads to the failure if the patch is
>> missing. Finally, I don't get, why do you need 100 iterations for
>> getting the misbehaviour.
>> 
>> Please, add everything I dumped below as the corresponding comments for
>> this part.
>> 
>>> + if (res1 ~= res2) then
>>> + pass = false
>> 
>> Why don't you add the assertion about constant fusion right here?
>> 
>>> + break
>>> + end
>>> +end
>>> +
>>> +test:ok(pass, 'wrong IR constant fuse')
>> 
>> [1]: https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/
>> 
>> -- 
>> Best regards,
>> IM
> 
> [1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion <https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion>
> [2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci <https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci>
> [3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci <https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci>
> 
> -- 
> Best regards,
> Sergey Kaplun


[-- Attachment #2: Type: text/html, Size: 130357 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2022-06-21 12:11     ` sergos via Tarantool-patches
@ 2022-06-22 13:32       ` Sergey Kaplun via Tarantool-patches
  2022-06-29  8:04         ` Igor Munkin via Tarantool-patches
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2022-06-22 13:32 UTC (permalink / raw)
  To: sergos; +Cc: tarantool-patches

Hi, Sergos!

Thanks for the review!

On 21.06.22, sergos wrote:
> Hi!
> 
> Thanks for working on it!
> Just minor nits, 
> LGTM

Fixed. See the iterative diff below.
Branch is force pushed.

> 
> Sergos
> 
> > On 16 Feb 2022, at 18:44, Sergey Kaplun <skaplun@tarantool.org> wrote:
> > 
> > Hi, Sergos!
> > Thanks for the patch!
> > 
> > Hi , Igor!
> > Thanks for the review!
> > 
> > On 05.07.21, Igor Munkin wrote:
> >> Sergos,
> >> 
> >> Thanks for the patch! Something went wrong with formatting the patch I
> >> guess, so I'll share you the recipe how I send the backported patches:
> >> 1. cherry-pick the original patch from the upstream
> >> 2. format-patch the backported commit
> >> 3. Add "From" header with the original author of the changes at the
> >> very beginning of the formatted patch
> >> 4. Adjust the commit message according to our backporting procedure
> >> 
> >> BTW, I've found neither the branch with the patch in tarantool/luajit
> >> nor the branch with the green CI in tarantool/tarantool...
> > 
> > I've updated a little the old Sergos's branch [1] and save results on my
> > own branch [2].
> > Also I've added GC64 workflow for branch checkouted from the current
> > Tarantool's master branch [3].
> > 
> > The new commit message and patch is the following:
> > 
> > ===================================================================
> > commit 0759d1783ef96c4bb47869c6bc1181b38d95f654
> > Author: Mike Pall <mike>
> > Date: Mon Aug 28 10:43:37 2017 +0200
> > 
> > From: Mike Pall <mike>

Fixed the commit message :)

| x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
|
| Contributed by Peter Cawley.
|
| (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
|
| Code generation under LJ_GC64 missed an update to the mcode area
...


> > 
> > (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> > 
> > x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> > 
> > Contributed by Peter Cawley.
> > 
> > Code generation under LJ_GC64 missed an update to the mcode area
> > boundaries after a 64-bit constant encoding. It can lead to a
> > corruption of the constant value after next trace code generation.
> > The constant can be encoded in one of the following ways:
> > 
> > a. If the address of the constant fits into 32-bit one, then encode it
> > as a 32-bit displacement (the only option for non-GC64 mode).
> > b. If the offset of the constant slot from the dispatch table (pinned
> > to r14 that is not changed while trace execution) fits into 32-bit,
> > then encode this as a 32-bit displacement relative to r14.
> > c. If the offset of the constant slot from the mcode (i.e. rip) fits
> > into 32-bit, then encode this as a 32-bit displacement relative to
> > rip (considering long mode specifics and RID_RIP hack).
> > d. If none of the conditions above are valid, compiler materializes
> > this 64-bit constant right at the trace bottom and encodes this the
> > same way it does for the previous case.
> > 
> > The mentioned problem appears only with 2Gb distance from the currently
> > allocated mcode to the dispatch pointer, which may happen in real life
> > in case of long running instance.
> > 
> > Sergey Ostanevich:
> > * added the description and the test for the problem
> > 
> > Fixes tarantool/tarantool#4095
> > Fixes tarantool/tarantool#4199
> > Fixes tarantool/tarantool#4614
> > 
> > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> > index 767bf6f3..2850aea9 100644
> > --- a/src/lj_asm_x86.h
> > +++ b/src/lj_asm_x86.h
> > @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
> > ir->i = (int32_t)(as->mctop - as->mcbot);
> > as->mcbot += 8;
> > as->mclim = as->mcbot + MCLIM_REDZONE;
> > + lj_mcode_commitbot(as->J, as->mcbot);
> > }
> > as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
> > as->mrm.base = RID_RIP;
> > diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> > new file mode 100644
> > index 00000000..9eabbdba
> > --- /dev/null
> > +++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> > @@ -0,0 +1,79 @@
> > +-- The test is GC64 only.
> > +local ffi = require('ffi')
> > +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
> > +
> > +local tap = require('tap')
> > +local test = tap.test('gh-4199-gc64-fuse')
> > +test:plan(1)
> > +
> > +collectgarbage()
> > +-- Chomp memory in currently allocated GC space.
> > +collectgarbage('stop')
> > +
> > +for _ = 1, 8 do
> > + ffi.new('char[?]', 256 * 1024 * 1024)
> > +end
> > +
> > +jit.opt.start('hotloop=1')
> > +
> > +-- Generate a bunch of traces to trigger constant placement at the
> > +-- end of the trace. Since global describing the mcode area in the
>                              ^     ^
>                              a     variable
> > +-- jit structure is not updated, the next trace generated will
> > +-- invalidate the constant of the previous trace. Then
> > +-- execution of the _previous_ trace will use wrong value.
>                                                 ^
>                                                the
> > +
> > +-- Keep last two functions generated to compare results.
>           ^
>          the

I've updated comments as the following:

===================================================================
diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
index 9eabbdba..b34b4a4f 100644
--- a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
+++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
@@ -17,12 +17,13 @@ end
 jit.opt.start('hotloop=1')
 
 -- Generate a bunch of traces to trigger constant placement at the
--- end of the trace. Since global describing the mcode area in the
--- jit structure is not updated, the next trace generated will
--- invalidate the constant of the previous trace. Then
--- execution of the _previous_ trace will use wrong value.
+-- end of the trace. Since a global variable describing the mcode
+-- area in the jit structure is not updated, the next trace
+-- generated will invalidate the constant of the previous trace.
+-- Then execution of the _previous_ trace will use the wrong
+-- value.
 
--- Keep last two functions generated to compare results.
+-- Keep the last two functions generated to compare results.
 local s = {}
 local test_res = true
 
===================================================================

The branch is force-pushed.

> > +local s = {}
> > +local test_res = true
> > +
> > +-- XXX: The number of iteration is fragile, depending on the trace
> > +-- length and the amount of currently pre-allocated mcode area.
> > +-- Usually works under 100, which doesn't take too long in case of
> > +-- success, so I gave up to locate a better way to chomp the mcode
> > +-- area.
> > +
> > +for n = 1, 100 do
> > + local src = string.format([[
> > + function f%d(x, y)
> 
> Not sure if indentation is preserved in the patch and should it be here?

Yes, this is something strange with offset here. I've used 2 spaces as
usual.

> 
> > + local a = {}

<snipped>

> >> IM
> > 
> > [1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion <https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion>
> > [2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci <https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci>
> > [3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci <https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci>
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2022-06-22 13:32       ` Sergey Kaplun via Tarantool-patches
@ 2022-06-29  8:04         ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 7+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2022-06-29  8:04 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for your fixes! Glad to see this patch moving closer to the
trunk! LGTM.

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion
  2021-05-28 12:06 [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion Sergey Ostanevich via Tarantool-patches
  2021-07-04 21:06 ` Igor Munkin via Tarantool-patches
@ 2022-06-30 12:10 ` Igor Munkin via Tarantool-patches
  1 sibling, 0 replies; 7+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2022-06-30 12:10 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Sergos,

I've checked the patch into all long-term branches in tarantool/luajit
and bumped a new version in master, 2.10 and 1.10.

On 28.05.21, Sergey Ostanevich wrote:
> Author: Mike Pall <mike>
> Date:   Mon Aug 28 10:43:37 2017 +0200
> 
>     x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> 
>     Contributed by Peter Cawley.
> 
>     (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> 
>     Code generation under LJ_GC64 missed an update to the mcode area after
>     a 64bit constant encoding. This lead to a corruption to the constant
>     later on.
>     The problem is rather rare, since there should be big enough (4GiB)
>     distance from the currently allocated mcode to the dispatch pointer.
>     This lead to a number of flaky tests, trackers are addressed.
> 
>     Sergey Ostanevich:
>     * added the description and the test for the problem
> 
>     Closes: #4095, #4199, #4614
> 
>     Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>
> 

<snipped>

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-06-30 12:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-28 12:06 [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion Sergey Ostanevich via Tarantool-patches
2021-07-04 21:06 ` Igor Munkin via Tarantool-patches
2022-02-16 15:44   ` Sergey Kaplun via Tarantool-patches
2022-06-21 12:11     ` sergos via Tarantool-patches
2022-06-22 13:32       ` Sergey Kaplun via Tarantool-patches
2022-06-29  8:04         ` Igor Munkin via Tarantool-patches
2022-06-30 12:10 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox