From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 13B1B6ECE3; Tue, 21 Jun 2022 15:11:38 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 13B1B6ECE3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1655813498; bh=OEgPS4GhQcgMir7V31ZAb0QLuoyGptU4Rg7eR0qls2c=; h=Date:In-Reply-To:To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=zaaeBvh6NmSIqcJP9a25om+xCkEq9cIjqP0ShJ1EV9xMzmpViiTNkakkDUQ3CutAA TiGDR1KxD59F1t0BkG1K3G5JV0JXaWv+lVfhxsz67mCvz9nfTKIyAEt6hOEILiHTok H3+eOo3kiGWr63fTkc7SzR3epbTLd004Om3j7tAM= Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id BF3C16ECE3 for ; Tue, 21 Jun 2022 15:11:34 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org BF3C16ECE3 Received: by smtp51.i.mail.ru with esmtpa (envelope-from ) id 1o3cjF-0007mm-TH; Tue, 21 Jun 2022 15:11:34 +0300 Message-Id: <56F81FD2-FDD5-492E-9A16-6829EA710561@tarantool.org> Content-Type: multipart/alternative; boundary="Apple-Mail=_2F0C549A-A24C-480E-B8BC-188922B0BF74" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\)) Date: Tue, 21 Jun 2022 15:11:33 +0300 In-Reply-To: To: Sergey Kaplun References: <804A99A3-6D0C-4DA9-A939-26FFED0EC823@tarantool.org> <20210704210647.GA6106@tarantool.org> X-Mailer: Apple Mail (2.3696.100.31) X-7564579A: 78E4E2B564C1792B X-77F55803: 4F1203BC0FB41BD9B458E015651D1EC05F221EC801A654673609BD401559F1DF182A05F5380850408DD5FCA23F7CA9B69885A8E76F66BFC0FE7BC8A7D04C3DE069AD9EE7BA902D7A X-8FC586DF: 6EFBBC1D9D64D975 X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142187CD12E744282CF407866D6147AF826D806C414C04D156E9BC8FFEED801FD8A28F972CCD2F8FE1EF1CFC4036BBF6A4EA9B11811A4A51E3B0915E2725BA614EAEA1EF972C1F679AE1C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34AB80C45F81B80D63CE77E4C8906217EFC0F2A134DEE154041581C190BEFDB7F711F88044506B42061D7E09C32AA3244CAD44A214C8754403BB895375228234173A92A9747B6CC886FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj+ZQCdndr4HmX4XLXaM0BKw== X-Mailru-Sender: 11C2EC085EDE56FA38FD4C59F7EFE4077DE422BE66BDFBAF2092ECA27C5065CED718CF4B4C3445CD19381EE24192DF5555834048F03EF5D4C9A814A92B2E3B1BA4250FC3964EA4964198E0F3ECE9B5443453F38A29522196 X-Mras: OK Subject: Re: [Tarantool-patches] [PATH luajit] GC64: fix 64-bit constant fusion X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" --Apple-Mail=_2F0C549A-A24C-480E-B8BC-188922B0BF74 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi! Thanks for working on it! Just minor nits,=20 LGTM Sergos > On 16 Feb 2022, at 18:44, Sergey Kaplun wrote: >=20 > Hi, Sergos! > Thanks for the patch! >=20 > Hi , Igor! > Thanks for the review! >=20 > On 05.07.21, Igor Munkin wrote: >> Sergos, >>=20 >> Thanks for the patch! Something went wrong with formatting the patch = I >> guess, so I'll share you the recipe how I send the backported = patches: >> 1. cherry-pick the original patch from the upstream >> 2. format-patch the backported commit >> 3. Add "From" header with the original author of the changes at the >> very beginning of the formatted patch >> 4. Adjust the commit message according to our backporting procedure >>=20 >> BTW, I've found neither the branch with the patch in tarantool/luajit >> nor the branch with the green CI in tarantool/tarantool... >=20 > I've updated a little the old Sergos's branch [1] and save results on = my > own branch [2]. > Also I've added GC64 workflow for branch checkouted from the current > Tarantool's master branch [3]. >=20 > The new commit message and patch is the following: >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > commit 0759d1783ef96c4bb47869c6bc1181b38d95f654 > Author: Mike Pall > Date: Mon Aug 28 10:43:37 2017 +0200 >=20 > From: Mike Pall >=20 > (chekky picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14) >=20 > x64/LJ_GC64: Fix fallback case of asm_fuseloadk64(). >=20 > Contributed by Peter Cawley. >=20 > Code generation under LJ_GC64 missed an update to the mcode area > boundaries after a 64-bit constant encoding. It can lead to a > corruption of the constant value after next trace code generation. > The constant can be encoded in one of the following ways: >=20 > a. If the address of the constant fits into 32-bit one, then encode it > as a 32-bit displacement (the only option for non-GC64 mode). > b. If the offset of the constant slot from the dispatch table (pinned > to r14 that is not changed while trace execution) fits into 32-bit, > then encode this as a 32-bit displacement relative to r14. > c. If the offset of the constant slot from the mcode (i.e. rip) fits > into 32-bit, then encode this as a 32-bit displacement relative to > rip (considering long mode specifics and RID_RIP hack). > d. If none of the conditions above are valid, compiler materializes > this 64-bit constant right at the trace bottom and encodes this the > same way it does for the previous case. >=20 > The mentioned problem appears only with 2Gb distance from the = currently > allocated mcode to the dispatch pointer, which may happen in real life > in case of long running instance. >=20 > Sergey Ostanevich: > * added the description and the test for the problem >=20 > Fixes tarantool/tarantool#4095 > Fixes tarantool/tarantool#4199 > Fixes tarantool/tarantool#4614 >=20 > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h > index 767bf6f3..2850aea9 100644 > --- a/src/lj_asm_x86.h > +++ b/src/lj_asm_x86.h > @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns = *ir) > ir->i =3D (int32_t)(as->mctop - as->mcbot); > as->mcbot +=3D 8; > as->mclim =3D as->mcbot + MCLIM_REDZONE; > + lj_mcode_commitbot(as->J, as->mcbot); > } > as->mrm.ofs =3D (int32_t)mcpofs(as, as->mctop - ir->i); > as->mrm.base =3D RID_RIP; > diff --git a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua = b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua > new file mode 100644 > index 00000000..9eabbdba > --- /dev/null > +++ b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua > @@ -0,0 +1,79 @@ > +-- The test is GC64 only. > +local ffi =3D require('ffi') > +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only') > + > +local tap =3D require('tap') > +local test =3D tap.test('gh-4199-gc64-fuse') > +test:plan(1) > + > +collectgarbage() > +-- Chomp memory in currently allocated GC space. > +collectgarbage('stop') > + > +for _ =3D 1, 8 do > + ffi.new('char[?]', 256 * 1024 * 1024) > +end > + > +jit.opt.start('hotloop=3D1') > + > +-- Generate a bunch of traces to trigger constant placement at the > +-- end of the trace. Since global describing the mcode area in the ^ ^ a variable > +-- jit structure is not updated, the next trace generated will > +-- invalidate the constant of the previous trace. Then > +-- execution of the _previous_ trace will use wrong value. ^ the > + > +-- Keep last two functions generated to compare results. ^ the > +local s =3D {} > +local test_res =3D true > + > +-- XXX: The number of iteration is fragile, depending on the trace > +-- length and the amount of currently pre-allocated mcode area. > +-- Usually works under 100, which doesn't take too long in case of > +-- success, so I gave up to locate a better way to chomp the mcode > +-- area. > + > +for n =3D 1, 100 do > + local src =3D string.format([[ > + function f%d(x, y) Not sure if indentation is preserved in the patch and should it be here? > + local a =3D {} > + for i =3D 1, 5 do > + a[i] =3D 0 > + -- This summ is not necessarry but decreases the amount of > + -- iterations. > + a[i] =3D a[i] + x + y > + -- Use all FPR registers and one value from the memory > + -- (xmm0 is for result, xmm15 states for work with table). > + a[i] =3D a[i] + 1.1 > + a[i] =3D a[i] + 2.2 > + a[i] =3D a[i] + 3.3 > + a[i] =3D a[i] + 4.4 > + a[i] =3D a[i] + 5.5 > + a[i] =3D a[i] + 6.6 > + a[i] =3D a[i] + 7.7 > + a[i] =3D a[i] + 8.8 > + a[i] =3D a[i] + 9.9 > + a[i] =3D a[i] + 10.10 > + a[i] =3D a[i] + 11.11 > + a[i] =3D a[i] + 12.12 > + a[i] =3D a[i] + 13.13 > + a[i] =3D a[i] + 14.14 > + a[i] =3D a[i] + 15.15 > + end > + return a[1] > + end > + return f%d(...) > + ]], n, n) > + s[2] =3D load(src) > + if s[1] ~=3D nil then > + local res1 =3D s[1](1, 2) > + local res2 =3D s[2](1, 2) > + if res1 ~=3D res2 then > + test_res =3D false > + break > + end > + end > + s[1] =3D s[2] > +end > + > +test:ok(test_res, 'IR constant fusion') > +os.exit(test:check() and 0 or 1) > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >=20 >>=20 >> On 28.05.21, Sergey Ostanevich wrote: >>> Author: Mike Pall >>> Date: Mon Aug 28 10:43:37 2017 +0200 >>>=20 >>> x64/LJ_GC64: Fix fallback case of asm_fuseloadk64(). >>>=20 >>> Contributed by Peter Cawley. >>>=20 >>> (cherry picked from commit 6b0824852677cc12570c20a3211fbfe0e4f0ce14) >>>=20 >>> Code generation under LJ_GC64 missed an update to the mcode area = after >>=20 >> Minor: s/mcode area/mcode area boundaries/. Feel free to ignore. >>=20 >>> a 64bit constant encoding. This lead to a corruption to the constant >>=20 >> Typo: s/corruption to/corruption of/. >>=20 >>> later on. >>> The problem is rather rare, since there should be big enough (4GiB) >>> distance from the currently allocated mcode to the dispatch pointer. >>=20 >> Minor: Sorry for being pedantic, but this is not such a trivial bug, = so >> I've described semantics around this a bit. As for me it's worth to >> mention four possible encodings of 64-bit constant on the trace. >> 0. If the address of the constant fits into 32-bit one, then encode = it >> as a 32-bit displacement (the only option for non-GC64 mode). >> 1. If the offset of the constant slot from the dispatch table (pinned >> to r14 that is not changed while trace execution) fits into 32-bit, >> then encode this as a 32-bit displacement relative to r14. >> 2. If the offset of the constant slot from the mcode (i.e. rip) fits >> into 32-bit, then encode this as a 32-bit displacement relative to >> rip (considering long mode specifics and RID_RIP hack). >> 3. If none of the conditions above are valid, compiler materializes >> this 64-bit constant right at the trace bottom and encodes this the >> same way it does for the previous case. >>=20 >> And now goes your part regarding 4Gb and rarity of this case and the >> problem is much clearer to the reader. Please adjust this part using = my >> description above. >>=20 >>> This lead to a number of flaky tests, trackers are addressed. >>>=20 >>> Sergey Ostanevich: >>> * added the description and the test for the problem >>>=20 >>> Closes: #4095, #4199, #4614 >>=20 >> At first, please move split this line into the separate ones with a >> single issue per line. Also don't use ':' in GitHub tags, please: it >> just doesn't respect our commit message style. Unfortunately, this >> commit tags no issue, since it is pushed to tarantool/luajit repo, = but >> the issues relate to tarantool/tarantool. Hence, you need to = explicitly >> mention the latter repo. And last (but not least), we agreed with >> Sergey to use "Resolves" instead of "Closes" in tarantool/luajit repo >> and use "Closes" in the corresponding patches bumping LuaJIT = submodule. >> As for these patches, I'd rather use "Fixes", since you have checked >> that the failures related to the mentioned issues are gone. = Considering >> everything above, the line above transforms into the following: >>=20 >> | Fixes tarantool/tarantool#4095 >> | Fixes tarantool/tarantool#4199 >> | Fixes tarantool/tarantool#4614 >>=20 >>>=20 >>> Signed-off-by: Sergey Ostanevich >>>=20 >>> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h >>> index 767bf6f3..2850aea9 100644 >>> --- a/src/lj_asm_x86.h >>> +++ b/src/lj_asm_x86.h >>> @@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, IRIns = *ir) >>> ir->i =3D (int32_t)(as->mctop - as->mcbot); >>> as->mcbot +=3D 8; >>> as->mclim =3D as->mcbot + MCLIM_REDZONE; >>> + lj_mcode_commitbot(as->J, as->mcbot); >>> } >>> as->mrm.ofs =3D (int32_t)mcpofs(as, as->mctop - ir->i); >>> as->mrm.base =3D RID_RIP; >>> diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua = b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua >>> new file mode 100644 >>> index 00000000..3ac30427 >>> --- /dev/null >>> +++ b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua >>> @@ -0,0 +1,63 @@ >>=20 >> Regarding test code style: I don't want to point each place you have >> violated it, so I just refer the document[1] that we try to follow = except >> the one rule: we use 2 spaces for indentation to get tests closer to >> LuaJIT sources. Hence, please consider that I've "implicitly" = commented >> every spot below, where you don't follow our code style. >>=20 >>> +-- the test is GC64 only >>> +local ffi=3Drequire('ffi') >>> +require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 only') >>> + >>> +local tap =3D require("tap") >>> +local test =3D tap.test("gh-4199-gc64-flaky") >>> +test:plan(1) >>> + >>> +-- first - we have to make a gap from current JIT infra to next >>> +-- available mappable memory >>> +-- most efficient is to grab it per-page >>> + >>> + >>> +ffi.cdef('void * mmap(void *start, size_t length, int prot , int = flags, int fd, long offset);') >>> +ffi.cdef('long getpagesize();') >>=20 >> Minor: It's better use once but with multiline = declarations >> instead of several calls. >>=20 >>> + >>> +local pagesize =3D tonumber(ffi.C.getpagesize()) >>> +local blob =3D {} >>> +for i=3D1, 4e9/pagesize do >>> + blob[i] =3D ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, = 0) >>=20 >> 0x22, 4e9 -- these are magic numbers for me. Please, create a = variable >> with the descriptive name (and even comments, I guess) for each of = them, >> to ease the further maintenance. >>=20 >>> + assert(blob[i] ~=3D 0) >>> +end >>> + >>> +-- try to chomp all memory in currently allocated gc space >>> +collectgarbage('stop') >>=20 >> Since you decided to stop GC, you can do it right at the very = beginning, >> so you don't need to anchor all mmaped memory above. >>=20 >>> +local dummy=3D{'a'} >>> +for i=3D2,30 do >>> + dummy[i] =3D dummy[i - 1] .. dummy[i - 1] >>> +end >>=20 >> Why do you need this loop? >>=20 >>> + >>> +-- generate a bunch of functions and keep them stored to trigger = wrong constant placement >>> + >>> +local s=3D{} >>=20 >> Again, since GC is stopped, there is no need to anchor all the = functions >> generated below. >>=20 >>> +local pass =3D true >>> + >>> +jit.opt.start('hotloop=3D1=E2=80=99) >>> +for n=3D1,100 do >>> + local src=3D'function f'.. n .. [[(x,y,z,f,g,h,j,k,r,c,d) >>> + local a=3D{} >>> + for i=3D1,1e6 do >>> + a[i] =3D x + y + z + f + g + h + j + k + r + c + d >>> + if (x > 0) then a[i] =3D a[i] + 1.1 end >>> + if (c > 0) then a[i] =3D a[i] + 2.2 end >>> + if (z > 0) then a[i] =3D a[i] + 3.3 end >>> + if (f > 0) then a[i] =3D a[i] + 4.4 end >>> + x=3Dx+r >>> + y=3Dy-c >>> + z=3Dz+d >>> + end >>> + return a[1] >>> + end >>> + return f]] .. n ..'(...)' >>> + >>> + s[n] =3D assert(load(src)) >>> + local res1 =3D s[n](1,2,3,4,5,6,7,8,9,10,11) >>> + local res2 =3D s[n](1,2,3,4,5,6,7,8,9,10,11) >>=20 >> This was not obvious to me, but I've finally got it: you compare the >> result yielded by interpreter with the one yielded by the trace. At >> first, you don't need to run all 1e6 iterations: you have set = 'hotloop' >> to 1, so you need only 3 (!) iterations to successfully compile the >> loop. The second call will use the compiled trace for the first loop >> iteration, but also will try to compile the function itself. AFAIU, = the >> constant being materialized on the trace is the table address, >> right? At least I see no other option, so please mention in the = comment >> which constant is fused and leads to the failure if the patch is >> missing. Finally, I don't get, why do you need 100 iterations for >> getting the misbehaviour. >>=20 >> Please, add everything I dumped below as the corresponding comments = for >> this part. >>=20 >>> + if (res1 ~=3D res2) then >>> + pass =3D false >>=20 >> Why don't you add the assertion about constant fusion right here? >>=20 >>> + break >>> + end >>> +end >>> + >>> +test:ok(pass, 'wrong IR constant fuse') >>=20 >> [1]: = https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guide/ >>=20 >> --=20 >> Best regards, >> IM >=20 > [1]: = https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-const-fusion = > [2]: = https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-fuse-full-c= i = > [3]: = https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc64-fuse-full= -ci = >=20 > --=20 > Best regards, > Sergey Kaplun --Apple-Mail=_2F0C549A-A24C-480E-B8BC-188922B0BF74 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 Hi!

Thanks = for working on it!
Just minor nits, 
LGTM

Sergos


Hi, Sergos!
Thanks for the patch!

Hi , Igor!
Thanks for the review!

On 05.07.21, Igor Munkin wrote:
Sergos,

Thanks for the patch! Something went wrong with formatting = the patch I
guess, so I'll share you the recipe how I send = the backported patches:
1. cherry-pick the original patch = from the upstream
2. format-patch the backported commit
3. Add "From" header with the original author of the changes = at the
very beginning of the formatted patch
4. Adjust the commit message according to our backporting = procedure

BTW, I've found neither the = branch with the patch in tarantool/luajit
nor the branch = with the green CI in tarantool/tarantool...
I've updated = a little the old Sergos's branch [1] and save results on my
own branch = [2].
Also I've = added GC64 workflow for branch checkouted from the current
Tarantool's = master branch [3].

The new commit message and patch is the following:

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
commit = 0759d1783ef96c4bb47869c6bc1181b38d95f654
Author: Mike Pall <mike>
Date: Mon Aug 28 10:43:37 2017 +0200

From: Mike = Pall <mike>

(chekky picked from commit = 6b0824852677cc12570c20a3211fbfe0e4f0ce14)

x64/LJ_GC64: Fix fallback case of = asm_fuseloadk64().

Contributed by Peter Cawley.

Code generation under LJ_GC64 missed an update to the mcode = area
boundaries = after a 64-bit constant encoding. It can lead to a
corruption of = the constant value after next trace code generation.
The constant = can be encoded in one of the following ways:

a. If the = address of the constant fits into 32-bit one, then encode it
as a 32-bit = displacement (the only option for non-GC64 mode).
b. If the = offset of the constant slot from the dispatch table (pinned
to r14 that = is not changed while trace execution) fits into 32-bit,
then encode = this as a 32-bit displacement relative to r14.
c. If the = offset of the constant slot from the mcode (i.e. rip) fits
into 32-bit, = then encode this as a 32-bit displacement relative to
rip = (considering long mode specifics and RID_RIP hack).
d. If none of = the conditions above are valid, compiler materializes
this 64-bit = constant right at the trace bottom and encodes this the
same way it = does for the previous case.

The mentioned problem appears only with 2Gb distance from the = currently
allocated = mcode to the dispatch pointer, which may happen in real life
in case of = long running instance.

Sergey Ostanevich:
* added the description and the test for the = problem

Fixes = tarantool/tarantool#4095
Fixes tarantool/tarantool#4199
Fixes tarantool/tarantool#4614

diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index = 767bf6f3..2850aea9 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -387,6 +387,7 @@ static Reg asm_fuseloadk64(ASMState *as, = IRIns *ir)
ir->i =3D = (int32_t)(as->mctop - as->mcbot);
as->mcbot +=3D 8;
as->mclim =3D as->mcbot + MCLIM_REDZONE;
+ = lj_mcode_commitbot(as->J, as->mcbot);
}
as->mrm.ofs = =3D (int32_t)mcpofs(as, as->mctop - ir->i);
as->mrm.base= =3D RID_RIP;
diff --git = a/test/tarantool-tests/gh-4199-gc64-fuse.test.lua = b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
new file mode = 100644
index = 00000000..9eabbdba
--- /dev/null
+++ = b/test/tarantool-tests/gh-4199-gc64-fuse.test.lua
@@ -0,0 +1,79 = @@
+-- The test = is GC64 only.
+local ffi =3D = require('ffi')
+require('utils').skipcond(not ffi.abi('gc64'), 'test is GC64 = only')
+
+local tap =3D = require('tap')
+local test =3D tap.test('gh-4199-gc64-fuse')
+test:plan(1)
+
+collectgarbage()
+-- Chomp memory in currently allocated GC space.
+collectgarbage('stop')
+
+for _ =3D 1, = 8 do
+ = ffi.new('char[?]', 256 * 1024 * 1024)
+end
+
+jit.opt.start('hotloop=3D1')
+
+-- Generate = a bunch of traces to trigger constant placement at the
+-- end of = the trace. Since global describing the mcode area in the
          =                    ^ =     ^
            =                  a   =   variable
+-- jit = structure is not updated, the next trace generated will
+-- = invalidate the constant of the previous trace. Then
+-- execution = of the _previous_ trace will use wrong value.
        =                     =                     = ^
                  =                     =          the
+
+-- Keep last = two functions generated to compare results.
          = ^
         the
+local s =3D = {}
+local = test_res =3D true
+
+-- XXX: The = number of iteration is fragile, depending on the trace
+-- length = and the amount of currently pre-allocated mcode area.
+-- Usually = works under 100, which doesn't take too long in case of
+-- success, = so I gave up to locate a better way to chomp the mcode
+-- = area.
+
+for n =3D 1, = 100 do
+ local src =3D= string.format([[
+ function f%d(x, y)

Not sure if indentation is preserved in the patch and = should it be here?

+ local a =3D {}
+ for i =3D 1, 5 do
+ a[i] =3D 0
+ -- This summ is not necessarry but decreases the amount = of
+ -- = iterations.
+ a[i] =3D = a[i] + x + y
+ -- Use all = FPR registers and one value from the memory
+ -- (xmm0 is = for result, xmm15 states for work with table).
+ a[i] =3D = a[i] + 1.1
+ a[i] =3D = a[i] + 2.2
+ a[i] =3D = a[i] + 3.3
+ a[i] =3D = a[i] + 4.4
+ a[i] =3D = a[i] + 5.5
+ a[i] =3D = a[i] + 6.6
+ a[i] =3D = a[i] + 7.7
+ a[i] =3D = a[i] + 8.8
+ a[i] =3D = a[i] + 9.9
+ a[i] =3D = a[i] + 10.10
+ a[i] =3D = a[i] + 11.11
+ a[i] =3D = a[i] + 12.12
+ a[i] =3D = a[i] + 13.13
+ a[i] =3D = a[i] + 14.14
+ a[i] =3D = a[i] + 15.15
+ = end
+ return = a[1]
+ = end
+ return = f%d(...)
+ ]], n, = n)
+ s[2] =3D = load(src)
+ if s[1] ~=3D = nil then
+ local res1 = =3D s[1](1, 2)
+ local res2 =3D s[2](1, 2)
+ if res1 ~=3D res2 then
+ test_res =3D false
+ break
+ end
+ end
+ s[1] =3D s[2]
+end
+
+test:ok(test_res, 'IR constant fusion')
+os.exit(test:check() and 0 or 1)
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D


On 28.05.21, Sergey Ostanevich wrote:
Author: Mike Pall <mike>
Date: Mon Aug 28 10:43:37 2017 +0200

x64/LJ_GC64: Fix fallback case of asm_fuseloadk64().

Contributed by Peter Cawley.

(cherry picked from commit = 6b0824852677cc12570c20a3211fbfe0e4f0ce14)

Code generation under LJ_GC64 missed an update to the mcode = area after

Minor: s/mcode = area/mcode area boundaries/. Feel free to ignore.

a 64bit constant = encoding. This lead to a corruption to the constant

Typo: s/corruption to/corruption = of/.

later on.
The problem is rather rare, since = there should be big enough (4GiB)
distance from the = currently allocated mcode to the dispatch pointer.

Minor: Sorry for being pedantic, = but this is not such a trivial bug, so
I've described = semantics around this a bit. As for me it's worth to
mention= four possible encodings of 64-bit constant on the trace.
0.= If the address of the constant fits into 32-bit one, then encode it
as a 32-bit displacement (the only option for non-GC64 = mode).
1. If the offset of the constant slot from the = dispatch table (pinned
to r14 that is not changed while = trace execution) fits into 32-bit,
then encode this as a = 32-bit displacement relative to r14.
2. If the offset of = the constant slot from the mcode (i.e. rip) fits
into = 32-bit, then encode this as a 32-bit displacement relative to
rip (considering long mode specifics and RID_RIP hack).
3. If none of the conditions above are valid, compiler = materializes
this 64-bit constant right at the trace = bottom and encodes this the
same way it does for the = previous case.

And now goes your part = regarding 4Gb and rarity of this case and the
problem is = much clearer to the reader. Please adjust this part using my
description above.

This lead to a number of flaky tests, trackers = are addressed.

Sergey Ostanevich:
* added the description and the test for the problem

Closes: #4095, #4199, #4614

At first, please move split this = line into the separate ones with a
single issue per line. = Also don't use ':' in GitHub tags, please: it
just doesn't = respect our commit message style. Unfortunately, this
commit= tags no issue, since it is pushed to tarantool/luajit repo, but
the issues relate to tarantool/tarantool. Hence, you need to = explicitly
mention the latter repo. And last (but not = least), we agreed with
Sergey to use "Resolves" instead of = "Closes" in tarantool/luajit repo
and use "Closes" in the = corresponding patches bumping LuaJIT submodule.
As for = these patches, I'd rather use "Fixes", since you have checked
that the failures related to the mentioned issues are gone. = Considering
everything above, the line above transforms = into the following:

| Fixes = tarantool/tarantool#4095
| Fixes = tarantool/tarantool#4199
| Fixes = tarantool/tarantool#4614


Signed-off-by: Sergey Ostanevich = <sergos@tarantool.org>

diff = --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index = 767bf6f3..2850aea9 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -387,6 +387,7 @@ = static Reg asm_fuseloadk64(ASMState *as, IRIns *ir)
ir->i= =3D (int32_t)(as->mctop - as->mcbot);
as->mcbot = +=3D 8;
as->mclim =3D as->mcbot + MCLIM_REDZONE;
+ lj_mcode_commitbot(as->J, as->mcbot);
}
as->mrm.ofs =3D (int32_t)mcpofs(as, = as->mctop - ir->i);
as->mrm.base =3D RID_RIP;
diff --git a/test/tarantool-tests/gh-4199-gc64-flaky.test.lua = b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
new = file mode 100644
index 00000000..3ac30427
--- = /dev/null
+++ = b/test/tarantool-tests/gh-4199-gc64-flaky.test.lua
@@ -0,0 = +1,63 @@

Regarding test code = style: I don't want to point each place you have
violated = it, so I just refer the document[1] that we try to follow except
the one rule: we use 2 spaces for indentation to get tests = closer to
LuaJIT sources. Hence, please consider that I've = "implicitly" commented
every spot below, where you don't = follow our code style.

+-- the test is GC64 only
+local = ffi=3Drequire('ffi')
+require('utils').skipcond(not = ffi.abi('gc64'), 'test is GC64 only')
+
+local= tap =3D require("tap")
+local test =3D = tap.test("gh-4199-gc64-flaky")
+test:plan(1)
+
+-- first - we have to make a gap from = current JIT infra to next
+-- available mappable memory
+-- most efficient is to grab it per-page
+
+
+ffi.cdef('void * mmap(void *start, size_t = length, int prot , int flags, int fd, long offset);')
+ffi.cdef('long getpagesize();')
Minor: It's better use <ffi.cdef> once but with = multiline declarations
instead of several <ffi.cdef> = calls.

+
+local pagesize =3D = tonumber(ffi.C.getpagesize())
+local blob =3D {}
+for i=3D1, 4e9/pagesize do
+ blob[i] =3D = ffi.C.mmap(ffi.cast('void*',0), pagesize, 0, 0x22, 0, 0)

0x22, 4e9 -- these are magic = numbers for me. Please, create a variable
with the = descriptive name (and even comments, I guess) for each of them,
to ease the further maintenance.

+ assert(blob[i] ~=3D = 0)
+end
+
+-- try to chomp all = memory in currently allocated gc space
+collectgarbage('stop')

Since you decided to stop GC, you can do it right at the very = beginning,
so you don't need to anchor all mmaped memory = above.

+local dummy=3D{'a'}
+for i=3D2,30 do
+ dummy[i] =3D dummy[i - 1] .. dummy[i - 1]
+end

Why do you = need this loop?

+
+-- generate a bunch of functions and keep = them stored to trigger wrong constant placement
+
+local s=3D{}

Again, = since GC is stopped, there is no need to anchor all the functions
generated below.

+local pass =3D true
+
+jit.opt.start('hotloop=3D1=E2=80=99)
+for = n=3D1,100 do
+ local src=3D'function f'.. n .. = [[(x,y,z,f,g,h,j,k,r,c,d)
+ local a=3D{}
+ = for i=3D1,1e6 do
+ a[i] =3D x + y + z + f + g + h + j + k = + r + c + d
+ if (x > 0) then a[i] =3D a[i] + 1.1 = end
+ if (c > 0) then a[i] =3D a[i] + 2.2 end
+ if (z > 0) then a[i] =3D a[i] + 3.3 end
+ = if (f > 0) then a[i] =3D a[i] + 4.4 end
+ x=3Dx+r
+ y=3Dy-c
+ z=3Dz+d
+ end
+ return a[1]
+ end
+ return f]] = .. n ..'(...)'
+
+ s[n] =3D = assert(load(src))
+ local res1 =3D = s[n](1,2,3,4,5,6,7,8,9,10,11)
+ local res2 =3D = s[n](1,2,3,4,5,6,7,8,9,10,11)

This was not obvious to me, but I've finally got it: you = compare the
result yielded by interpreter with the one = yielded by the trace. At
first, you don't need to run all = 1e6 iterations: you have set 'hotloop'
to 1, so you need = only 3 (!) iterations to successfully compile the
loop. = The second call will use the compiled trace for the first loop
iteration, but also will try to compile the function itself. = AFAIU, the
constant being materialized on the trace is the = <a> table address,
right? At least I see no other = option, so please mention in the comment
which constant is = fused and leads to the failure if the patch is
missing. = Finally, I don't get, why do you need 100 iterations for
getting the misbehaviour.

Please, = add everything I dumped below as the corresponding comments for
this part.

+ if (res1 ~=3D res2) then
+ pass = =3D false

Why don't you add = the assertion about constant fusion right here?

+ break
+ = end
+end
+
+test:ok(pass, = 'wrong IR constant fuse')

[1]: = https://www.tarantool.io/en/doc/latest/dev_guide/lua_style_guid= e/

-- 
Best = regards,
IM

[1]: https://github.com/tarantool/luajit/tree/sergos/gh-4095-gc64-co= nst-fusion
[2]: https://github.com/tarantool/luajit/tree/skaplun/gc64-constant-= fuse-full-ci
[3]: https://github.com/tarantool/tarantool/tree/skaplun/gh-4199-gc6= 4-fuse-full-ci

-- 
Best = regards,
Sergey = Kaplun

= --Apple-Mail=_2F0C549A-A24C-480E-B8BC-188922B0BF74--