Hi!

Thanks for working on it!
Just minor nits, 
LGTM

Sergos

> On 16 Feb 2022, at 18:44, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> Hi, Sergos!
> Thanks for the patch!
> 
> Hi , Igor!
> Thanks for the review!
> 
> On 05.07.21, Igor Munkin wrote:
>> Sergos,
>> 
>> Thanks for the patch! Something went wrong with formatting the patch I
>> guess, so I'll share you the recipe how I send the backported patches:
>> 1. cherry-pick the original patch from the upstream
>> 2. format-patch the backported commit
>> 3. Add "From" header with the original author of the changes at the
>> very beginning of the formatted patch
>> 4. Adjust the commit message according to our backporting procedure
>> 
>> BTW, I've found neither the branch with the patch in tarantool/luajit
>> nor the branch with the green CI in tarantool/tarantool...
> 
> I've updated a little the old Sergos's branch [1] and save results on my
> own branch [2].
> Also I've added GC64 workflow for branch checkouted from the current
> Tarantool's master branch [3].
> 
> The new commit message and patch is the following:
> 
> ===================================================================
> commit 0759d1783ef96c4bb47869c6bc1181b38d95f654
> Author: Mike Pall <mike>
> Date: Mon Aug 28 10:43:37 2017 +0200
> 
> From: Mike Pall <mike>
> 
> (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
> 
> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
> 
> Contributed by Peter Cawley.
> 
> Code generation under LJ_GC64 missed an update to the mcode area
> boundaries after a 64-bit constant encoding. It can lead to a
> corruption of the constant value after next trace code generation.
> The constant can be encoded in one of the following ways:
> 
> a. If the address of the constant fits into 32-bit one, then encode it
> as a 32-bit displacement (the only option for non-GC64 mode).
> b. If the offset of the constant slot from the dispatch table (pinned
> to r14 that is not changed while trace execution) fits into 32-bit,
> then encode this as a 32-bit displacement relative to r14.
> c. If the offset of the constant slot from the mcode (i.e. rip) fits
> into 32-bit, then encode this as a 32-bit displacement relative to
> rip (considering long mode specifics and RID_RIP hack).
> d. If none of the conditions above are valid, compiler materializes
> this 64-bit constant right at the trace bottom and encodes this the
> same way it does for the previous case.
> 
> The mentioned problem appears only with 2Gb distance from the currently
> allocated mcode to the dispatch pointer, which may happen in real life
> in case of long running instance.
> 
> Sergey Ostanevich:
> * added the description and the test for the problem
> 
> Fixes tarantool/tarantool#4095
> Fixes tarantool/tarantool#4199
> Fixes tarantool/tarantool#4614
> 
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index 767bf6f3..2850aea9 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
> ir->i = (int32_t)(as->mctop - as->mcbot);
> as->mcbot += 8;
> as->mclim = as->mcbot + MCLIM_REDZONE;
> + lj_mcode_commitbot(as->J, as->mcbot);
> }
> as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
> as->mrm.base = RID_RIP;
> diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> new file mode 100644
> index 00000000..9eabbdba
> --- /dev/null
> +++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
> @@ -0,0 +1,79 @@
> +-- The test is GC64 only.
> +local ffi = require('ffi')
> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
> +
> +local tap = require('tap')
> +local test = tap.test('gh-4199-gc64-fuse')
> +test:plan(1)
> +
> +collectgarbage()
> +-- Chomp memory in currently allocated GC space.
> +collectgarbage('stop')
> +
> +for _ = 1, 8 do
> + ffi.new('char[?]', 256 * 1024 * 1024)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +-- Generate a bunch of traces to trigger constant placement at the
> +-- end of the trace. Since global describing the mcode area in the
                             ^     ^
                             a     variable
> +-- jit structure is not updated, the next trace generated will
> +-- invalidate the constant of the previous trace. Then
> +-- execution of the _previous_ trace will use wrong value.
                                                ^
                                               the
> +
> +-- Keep last two functions generated to compare results.
          ^
         the
> +local s = {}
> +local test_res = true
> +
> +-- XXX: The number of iteration is fragile, depending on the trace
> +-- length and the amount of currently pre-allocated mcode area.
> +-- Usually works under 100, which doesn't take too long in case of
> +-- success, so I gave up to locate a better way to chomp the mcode
> +-- area.
> +
> +for n = 1, 100 do
> + local src = string.format([[
> + function f%d(x, y)

Not sure if indentation is preserved in the patch and should it be here?

> + local a = {}
> + for i = 1, 5 do
> + a[i] = 0
> + -- This summ is not necessarry but decreases the amount of
> + -- iterations.
> + a[i] = a[i] + x + y
> + -- Use all FPR registers and one value from the memory
> + -- (xmm0 is for result, xmm15 states for work with table).
> + a[i] = a[i] + 1.1
> + a[i] = a[i] + 2.2
> + a[i] = a[i] + 3.3
> + a[i] = a[i] + 4.4
> + a[i] = a[i] + 5.5
> + a[i] = a[i] + 6.6
> + a[i] = a[i] + 7.7
> + a[i] = a[i] + 8.8
> + a[i] = a[i] + 9.9
> + a[i] = a[i] + 10.10
> + a[i] = a[i] + 11.11
> + a[i] = a[i] + 12.12
> + a[i] = a[i] + 13.13
> + a[i] = a[i] + 14.14
> + a[i] = a[i] + 15.15
> + end
> + return a[1]
> + end
> + return f%d(...)
> + ]], n, n)
> + s[2] = load(src)
> + if s[1] ~= nil then
> + local res1 = s[1](1, 2)
> + local res2 = s[2](1, 2)
> + if res1 ~= res2 then
> + test_res = false
> + break
> + end
> + end
> + s[1] = s[2]
> +end
> +
> +test:ok(test_res, 'IR constant fusion')
> +os.exit(test:check() and 0 or 1)
> ===================================================================
> 
>> 
>> On 28.05.21, Sergey Ostanevich wrote:
>>> Author: Mike Pall <mike>
>>> Date: Mon Aug 28 10:43:37 2017 +0200
>>> 
>>> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().
>>> 
>>> Contributed by Peter Cawley.
>>> 
>>> (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14)
>>> 
>>> Code generation under LJ_GC64 missed an update to the mcode area after
>> 
>> Minor: s/mcode area/mcode area boundaries/. Feel free to ignore.
>> 
>>> a 64bit constant encoding. This lead to a corruption to the constant
>> 
>> Typo: s/corruption to/corruption of/.
>> 
>>> later on.
>>> The problem is rather rare, since there should be big enough (4GiB)
>>> distance from the currently allocated mcode to the dispatch pointer.
>> 
>> Minor: Sorry for being pedantic, but this is not such a trivial bug, so
>> I've described semantics around this a bit. As for me it's worth to
>> mention four possible encodings of 64-bit constant on the trace.
>> 0. If the address of the constant fits into 32-bit one, then encode it
>> as a 32-bit displacement (the only option for non-GC64 mode).
>> 1. If the offset of the constant slot from the dispatch table (pinned
>> to r14 that is not changed while trace execution) fits into 32-bit,
>> then encode this as a 32-bit displacement relative to r14.
>> 2. If the offset of the constant slot from the mcode (i.e. rip) fits
>> into 32-bit, then encode this as a 32-bit displacement relative to
>> rip (considering long mode specifics and RID_RIP hack).
>> 3. If none of the conditions above are valid, compiler materializes
>> this 64-bit constant right at the trace bottom and encodes this the
>> same way it does for the previous case.
>> 
>> And now goes your part regarding 4Gb and rarity of this case and the
>> problem is much clearer to the reader. Please adjust this part using my
>> description above.
>> 
>>> This lead to a number of flaky tests, trackers are addressed.
>>> 
>>> Sergey Ostanevich:
>>> * added the description and the test for the problem
>>> 
>>> Closes: #4095, #4199, #4614
>> 
>> At first, please move split this line into the separate ones with a
>> single issue per line. Also don't use ':' in GitHub tags, please: it
>> just doesn't respect our commit message style. Unfortunately, this
>> commit tags no issue, since it is pushed to tarantool/luajit repo, but
>> the issues relate to tarantool/tarantool. Hence, you need to explicitly
>> mention the latter repo. And last (but not least), we agreed with
>> Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo
>> and use "Closes" in the corresponding patches bumping LuaJIT submodule.
>> As for these patches, I'd rather use "Fixes", since you have checked
>> that the failures related to the mentioned issues are gone. Considering
>> everything above, the line above transforms into the following:
>> 
>> | Fixes tarantool/tarantool#4095
>> | Fixes tarantool/tarantool#4199
>> | Fixes tarantool/tarantool#4614
>> 
>>> 
>>> Signed-off-by: Sergey Ostanevich <sergos@tarantool.org>
>>> 
>>> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
>>> index 767bf6f3..2850aea9 100644
>>> --- a/src/lj_asm_x86.h
>>> +++ b/src/lj_asm_x86.h
>>> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
>>> ir->i = (int32_t)(as->mctop - as->mcbot);
>>> as->mcbot += 8;
>>> as->mclim = as->mcbot + MCLIM_REDZONE;
>>> + lj_mcode_commitbot(as->J, as->mcbot);
>>> }
>>> as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i);
>>> as->mrm.base = RID_RIP;
>>> diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
>>> new file mode 100644
>>> index 00000000..3ac30427
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
>>> @@ -0,0 +1,63 @@
>> 
>> Regarding test code style: I don't want to point each place you have
>> violated it, so I just refer the document[1] that we try to follow except
>> the one rule: we use 2 spaces for indentation to get tests closer to
>> LuaJIT sources. Hence, please consider that I've "implicitly" commented
>> every spot below, where you don't follow our code style.
>> 
>>> +-- the test is GC64 only
>>> +local ffi=require('ffi')
>>> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only')
>>> +
>>> +local tap = require("tap")
>>> +local test = tap.test("gh-4199-gc64-flaky")
>>> +test:plan(1)
>>> +
>>> +-- first - we have to make a gap from current JIT infra to next
>>> +-- available mappable memory
>>> +-- most efficient is to grab it per-page
>>> +
>>> +
>>> +ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);')
>>> +ffi.cdef('long getpagesize();')
>> 
>> Minor: It's better use <ffi.cdef> once but with multiline declarations
>> instead of several <ffi.cdef> calls.
>> 
>>> +
>>> +local pagesize = tonumber(ffi.C.getpagesize())
>>> +local blob = {}
>>> +for i=1, 4e9/pagesize do
>>> + blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)
>> 
>> 0x22, 4e9 -- these are magic numbers for me. Please, create a variable
>> with the descriptive name (and even comments, I guess) for each of them,
>> to ease the further maintenance.
>> 
>>> + assert(blob[i] ~= 0)
>>> +end
>>> +
>>> +-- try to chomp all memory in currently allocated gc space
>>> +collectgarbage('stop')
>> 
>> Since you decided to stop GC, you can do it right at the very beginning,
>> so you don't need to anchor all mmaped memory above.
>> 
>>> +local dummy={'a'}
>>> +for i=2,30 do
>>> + dummy[i] = dummy[i - 1] .. dummy[i - 1]
>>> +end
>> 
>> Why do you need this loop?
>> 
>>> +
>>> +-- generate a bunch of functions and keep them stored to trigger wrong constant placement
>>> +
>>> +local s={}
>> 
>> Again, since GC is stopped, there is no need to anchor all the functions
>> generated below.
>> 
>>> +local pass = true
>>> +
>>> +jit.opt.start('hotloop=1’)
>>> +for n=1,100 do
>>> + local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d)
>>> + local a={}
>>> + for i=1,1e6 do
>>> + a[i] = x + y + z + f + g + h + j + k + r + c + d
>>> + if (x > 0) then a[i] = a[i] + 1.1 end
>>> + if (c > 0) then a[i] = a[i] + 2.2 end
>>> + if (z > 0) then a[i] = a[i] + 3.3 end
>>> + if (f > 0) then a[i] = a[i] + 4.4 end
>>> + x=x+r
>>> + y=y-c
>>> + z=z+d
>>> + end
>>> + return a[1]
>>> + end
>>> + return f]] .. n ..'(...)'
>>> +
>>> + s[n] = assert(load(src))
>>> + local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11)
>>> + local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11)
>> 
>> This was not obvious to me, but I've finally got it: you compare the
>> result yielded by interpreter with the one yielded by the trace. At
>> first, you don't need to run all 1e6 iterations: you have set 'hotloop'
>> to 1, so you need only 3 (!) iterations to successfully compile the
>> loop. The second call will use the compiled trace for the first loop
>> iteration, but also will try to compile the function itself. AFAIU, the
>> constant being materialized on the trace is the <a> table address,
>> right? At least I see no other option, so please mention in the comment
>> which constant is fused and leads to the failure if the patch is
>> missing. Finally, I don't get, why do you need 100 iterations for
>> getting the misbehaviour.
>> 
>> Please, add everything I dumped below as the corresponding comments for
>> this part.
>> 
>>> + if (res1 ~= res2) then
>>> + pass = false
>> 
>> Why don't you add the assertion about constant fusion right here?
>> 
>>> + break
>>> + end
>>> +end
>>> +
>>> +test:ok(pass, 'wrong IR constant fuse')
>> 
>> [1]: https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/
>> 
>> -- 
>> Best regards,
>> IM
> 
> [1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion <https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion>
> [2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci <https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci>
> [3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci <https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci>
> 
> -- 
> Best regards,
> Sergey Kaplun