Hi! Thanks for working on it! Just minor nits, LGTM Sergos > On 16 Feb 2022, at 18:44, Sergey Kaplun wrote: > > Hi, Sergos! > Thanks for the patch! > > Hi , Igor! > Thanks for the review! > > On 05.07.21, Igor Munkin wrote: >> Sergos, >> >> Thanks for the patch! Something went wrong with formatting the patch I >> guess, so I'll share you the recipe how I send the backported patches: >> 1. cherry-pick the original patch from the upstream >> 2. format-patch the backported commit >> 3. Add "From" header with the original author of the changes at the >> very beginning of the formatted patch >> 4. Adjust the commit message according to our backporting procedure >> >> BTW, I've found neither the branch with the patch in tarantool/luajit >> nor the branch with the green CI in tarantool/tarantool... > > I've updated a little the old Sergos's branch [1] and save results on my > own branch [2]. > Also I've added GC64 workflow for branch checkouted from the current > Tarantool's master branch [3]. > > The new commit message and patch is the following: > > =================================================================== > commit 0759d1783ef96c4bb47869c6bc1181b38d95f654 > Author: Mike Pall > Date: Mon Aug 28 10:43:37 2017 +0200 > > From: Mike Pall > > (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14) > > x64/LJ_GC64: Fix fallback case of asm_fuseloadk64(). > > Contributed by Peter Cawley. > > Code generation under LJ_GC64 missed an update to the mcode area > boundaries after a 64-bit constant encoding. It can lead to a > corruption of the constant value after next trace code generation. > The constant can be encoded in one of the following ways: > > a. If the address of the constant fits into 32-bit one, then encode it > as a 32-bit displacement (the only option for non-GC64 mode). > b. If the offset of the constant slot from the dispatch table (pinned > to r14 that is not changed while trace execution) fits into 32-bit, > then encode this as a 32-bit displacement relative to r14. > c. If the offset of the constant slot from the mcode (i.e. rip) fits > into 32-bit, then encode this as a 32-bit displacement relative to > rip (considering long mode specifics and RID_RIP hack). > d. If none of the conditions above are valid, compiler materializes > this 64-bit constant right at the trace bottom and encodes this the > same way it does for the previous case. > > The mentioned problem appears only with 2Gb distance from the currently > allocated mcode to the dispatch pointer, which may happen in real life > in case of long running instance. > > Sergey Ostanevich: > * added the description and the test for the problem > > Fixes tarantool/tarantool#4095 > Fixes tarantool/tarantool#4199 > Fixes tarantool/tarantool#4614 > > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h > index 767bf6f3..2850aea9 100644 > --- a/src/lj_asm_x86.h > +++ b/src/lj_asm_x86.h > @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir) > ir->i = (int32_t)(as->mctop - as->mcbot); > as->mcbot += 8; > as->mclim = as->mcbot + MCLIM_REDZONE; > + lj_mcode_commitbot(as->J, as->mcbot); > } > as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i); > as->mrm.base = RID_RIP; > diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua > new file mode 100644 > index 00000000..9eabbdba > --- /dev/null > +++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua > @@ -0,0 +1,79 @@ > +-- The test is GC64 only. > +local ffi = require('ffi') > +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only') > + > +local tap = require('tap') > +local test = tap.test('gh-4199-gc64-fuse') > +test:plan(1) > + > +collectgarbage() > +-- Chomp memory in currently allocated GC space. > +collectgarbage('stop') > + > +for _ = 1, 8 do > + ffi.new('char[?]', 256 * 1024 * 1024) > +end > + > +jit.opt.start('hotloop=1') > + > +-- Generate a bunch of traces to trigger constant placement at the > +-- end of the trace. Since global describing the mcode area in the ^ ^ a variable > +-- jit structure is not updated, the next trace generated will > +-- invalidate the constant of the previous trace. Then > +-- execution of the _previous_ trace will use wrong value. ^ the > + > +-- Keep last two functions generated to compare results. ^ the > +local s = {} > +local test_res = true > + > +-- XXX: The number of iteration is fragile, depending on the trace > +-- length and the amount of currently pre-allocated mcode area. > +-- Usually works under 100, which doesn't take too long in case of > +-- success, so I gave up to locate a better way to chomp the mcode > +-- area. > + > +for n = 1, 100 do > + local src = string.format([[ > + function f%d(x, y) Not sure if indentation is preserved in the patch and should it be here? > + local a = {} > + for i = 1, 5 do > + a[i] = 0 > + -- This summ is not necessarry but decreases the amount of > + -- iterations. > + a[i] = a[i] + x + y > + -- Use all FPR registers and one value from the memory > + -- (xmm0 is for result, xmm15 states for work with table). > + a[i] = a[i] + 1.1 > + a[i] = a[i] + 2.2 > + a[i] = a[i] + 3.3 > + a[i] = a[i] + 4.4 > + a[i] = a[i] + 5.5 > + a[i] = a[i] + 6.6 > + a[i] = a[i] + 7.7 > + a[i] = a[i] + 8.8 > + a[i] = a[i] + 9.9 > + a[i] = a[i] + 10.10 > + a[i] = a[i] + 11.11 > + a[i] = a[i] + 12.12 > + a[i] = a[i] + 13.13 > + a[i] = a[i] + 14.14 > + a[i] = a[i] + 15.15 > + end > + return a[1] > + end > + return f%d(...) > + ]], n, n) > + s[2] = load(src) > + if s[1] ~= nil then > + local res1 = s[1](1, 2) > + local res2 = s[2](1, 2) > + if res1 ~= res2 then > + test_res = false > + break > + end > + end > + s[1] = s[2] > +end > + > +test:ok(test_res, 'IR constant fusion') > +os.exit(test:check() and 0 or 1) > =================================================================== > >> >> On 28.05.21, Sergey Ostanevich wrote: >>> Author: Mike Pall >>> Date: Mon Aug 28 10:43:37 2017 +0200 >>> >>> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64(). >>> >>> Contributed by Peter Cawley. >>> >>> (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14) >>> >>> Code generation under LJ_GC64 missed an update to the mcode area after >> >> Minor: s/mcode area/mcode area boundaries/. Feel free to ignore. >> >>> a 64bit constant encoding. This lead to a corruption to the constant >> >> Typo: s/corruption to/corruption of/. >> >>> later on. >>> The problem is rather rare, since there should be big enough (4GiB) >>> distance from the currently allocated mcode to the dispatch pointer. >> >> Minor: Sorry for being pedantic, but this is not such a trivial bug, so >> I've described semantics around this a bit. As for me it's worth to >> mention four possible encodings of 64-bit constant on the trace. >> 0. If the address of the constant fits into 32-bit one, then encode it >> as a 32-bit displacement (the only option for non-GC64 mode). >> 1. If the offset of the constant slot from the dispatch table (pinned >> to r14 that is not changed while trace execution) fits into 32-bit, >> then encode this as a 32-bit displacement relative to r14. >> 2. If the offset of the constant slot from the mcode (i.e. rip) fits >> into 32-bit, then encode this as a 32-bit displacement relative to >> rip (considering long mode specifics and RID_RIP hack). >> 3. If none of the conditions above are valid, compiler materializes >> this 64-bit constant right at the trace bottom and encodes this the >> same way it does for the previous case. >> >> And now goes your part regarding 4Gb and rarity of this case and the >> problem is much clearer to the reader. Please adjust this part using my >> description above. >> >>> This lead to a number of flaky tests, trackers are addressed. >>> >>> Sergey Ostanevich: >>> * added the description and the test for the problem >>> >>> Closes: #4095, #4199, #4614 >> >> At first, please move split this line into the separate ones with a >> single issue per line. Also don't use ':' in GitHub tags, please: it >> just doesn't respect our commit message style. Unfortunately, this >> commit tags no issue, since it is pushed to tarantool/luajit repo, but >> the issues relate to tarantool/tarantool. Hence, you need to explicitly >> mention the latter repo. And last (but not least), we agreed with >> Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo >> and use "Closes" in the corresponding patches bumping LuaJIT submodule. >> As for these patches, I'd rather use "Fixes", since you have checked >> that the failures related to the mentioned issues are gone. Considering >> everything above, the line above transforms into the following: >> >> | Fixes tarantool/tarantool#4095 >> | Fixes tarantool/tarantool#4199 >> | Fixes tarantool/tarantool#4614 >> >>> >>> Signed-off-by: Sergey Ostanevich >>> >>> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h >>> index 767bf6f3..2850aea9 100644 >>> --- a/src/lj_asm_x86.h >>> +++ b/src/lj_asm_x86.h >>> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns *ir) >>> ir->i = (int32_t)(as->mctop - as->mcbot); >>> as->mcbot += 8; >>> as->mclim = as->mcbot + MCLIM_REDZONE; >>> + lj_mcode_commitbot(as->J, as->mcbot); >>> } >>> as->mrm.ofs = (int32_t)mcpofs(as, as->mctop - ir->i); >>> as->mrm.base = RID_RIP; >>> diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua >>> new file mode 100644 >>> index 00000000..3ac30427 >>> --- /dev/null >>> +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua >>> @@ -0,0 +1,63 @@ >> >> Regarding test code style: I don't want to point each place you have >> violated it, so I just refer the document[1] that we try to follow except >> the one rule: we use 2 spaces for indentation to get tests closer to >> LuaJIT sources. Hence, please consider that I've "implicitly" commented >> every spot below, where you don't follow our code style. >> >>> +-- the test is GC64 only >>> +local ffi=require('ffi') >>> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only') >>> + >>> +local tap = require("tap") >>> +local test = tap.test("gh-4199-gc64-flaky") >>> +test:plan(1) >>> + >>> +-- first - we have to make a gap from current JIT infra to next >>> +-- available mappable memory >>> +-- most efficient is to grab it per-page >>> + >>> + >>> +ffi.cdef('void * mmap(void *start, size_t length, int prot , int flags, int fd, long offset);') >>> +ffi.cdef('long getpagesize();') >> >> Minor: It's better use once but with multiline declarations >> instead of several calls. >> >>> + >>> +local pagesize = tonumber(ffi.C.getpagesize()) >>> +local blob = {} >>> +for i=1, 4e9/pagesize do >>> + blob[i] = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0) >> >> 0x22, 4e9 -- these are magic numbers for me. Please, create a variable >> with the descriptive name (and even comments, I guess) for each of them, >> to ease the further maintenance. >> >>> + assert(blob[i] ~= 0) >>> +end >>> + >>> +-- try to chomp all memory in currently allocated gc space >>> +collectgarbage('stop') >> >> Since you decided to stop GC, you can do it right at the very beginning, >> so you don't need to anchor all mmaped memory above. >> >>> +local dummy={'a'} >>> +for i=2,30 do >>> + dummy[i] = dummy[i - 1] .. dummy[i - 1] >>> +end >> >> Why do you need this loop? >> >>> + >>> +-- generate a bunch of functions and keep them stored to trigger wrong constant placement >>> + >>> +local s={} >> >> Again, since GC is stopped, there is no need to anchor all the functions >> generated below. >> >>> +local pass = true >>> + >>> +jit.opt.start('hotloop=1’) >>> +for n=1,100 do >>> + local src='function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d) >>> + local a={} >>> + for i=1,1e6 do >>> + a[i] = x + y + z + f + g + h + j + k + r + c + d >>> + if (x > 0) then a[i] = a[i] + 1.1 end >>> + if (c > 0) then a[i] = a[i] + 2.2 end >>> + if (z > 0) then a[i] = a[i] + 3.3 end >>> + if (f > 0) then a[i] = a[i] + 4.4 end >>> + x=x+r >>> + y=y-c >>> + z=z+d >>> + end >>> + return a[1] >>> + end >>> + return f]] .. n ..'(...)' >>> + >>> + s[n] = assert(load(src)) >>> + local res1 = s[n](1,2,3,4,5,6,7,8,9,10,11) >>> + local res2 = s[n](1,2,3,4,5,6,7,8,9,10,11) >> >> This was not obvious to me, but I've finally got it: you compare the >> result yielded by interpreter with the one yielded by the trace. At >> first, you don't need to run all 1e6 iterations: you have set 'hotloop' >> to 1, so you need only 3 (!) iterations to successfully compile the >> loop. The second call will use the compiled trace for the first loop >> iteration, but also will try to compile the function itself. AFAIU, the >> constant being materialized on the trace is the table address, >> right? At least I see no other option, so please mention in the comment >> which constant is fused and leads to the failure if the patch is >> missing. Finally, I don't get, why do you need 100 iterations for >> getting the misbehaviour. >> >> Please, add everything I dumped below as the corresponding comments for >> this part. >> >>> + if (res1 ~= res2) then >>> + pass = false >> >> Why don't you add the assertion about constant fusion right here? >> >>> + break >>> + end >>> +end >>> + >>> +test:ok(pass, 'wrong IR constant fuse') >> >> [1]: https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/ >> >> -- >> Best regards, >> IM > > [1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion > [2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-ci > [3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full-ci > > -- > Best regards, > Sergey Kaplun