Date: Fri, 24 Mar 2023 16:05:43 +0300
To: Maxim Kokryashkin
References: <20230322082739.25391-1-skaplun@tarantool.org> <1679564209.269141204@f401.i.mail.ru>
In-Reply-To: <1679564209.269141204@f401.i.mail.ru>
Subject: Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org

Hi, Maxim!

Thanks for the review!
I've updated the commit message and comments, and force-pushed the
branch.

On 23.03.23, Maxim Kokryashkin wrote:
>
> Hi!
> Thanks for the patch!
>
> >
> >>From: Mike Pall
> >>
> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> >>
> >>The accessing of memory address for some operation `emit_rma()` may be
> >>encoded in one of the following ways:
> >> a. If the offset of the accessing address from the dispatch table
> >I suggest paraphrasing it the following way:
> >| If the offset of the address being accessed from the dispatch table
> >
> >Same for the similar sentences below.
> >>    (pinned to r14 that is not changed while trace execution) fits into
> >Typo: s/while/during
> >>    32-bit, then encode this as an access to 32-bit displacement
> >>    relative to r14.
> >> b. If the offset of the accessing address from the mcode (i.e. rip)
> >>    fits into 32-bit, then encode this as an access to 32-bit
> >>    displacement relative to rip (considering long mode specifics and
> >>    `RID_RIP` hack).
> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
> >>    `movsd`, then encode 64-bit load from this address.
> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> >>    into 32-bit one) displacement (the only option for non-GC64 mode).
> >>
> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
> >Typo: s/differs/that differs/
> >>be encoded via the last option. But if we got a 64-bit address with a
> >Typo: s/got/get/
> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
> >Typo: s/offset/offset,/
> >Typo: s/encoded/encoded,
> >>will fail.
> >Typo: s/will fail/fails/
> >>
> >>There are several cases, when `emit_rma()` is used with non `mov`
> >Typo: s/with non-`mov`/with a non-`mov`/
> >>instruction:
> >>* `IR_LDEXP` with `fld` instruction for loading constant
> >>  number `TValue` by address.
> >>* `IR_OBAR` with the corresponding `test` instruction on
> >>  `marked` field of `GCobj`.
> >>All these instructions require an additional register to store value by
> >>address. We can't truly allocate a register here due to possibility to
> >>break IR assembling which depends on specific register usage. So, we use
> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
> >>and restore r14 here for emitting.
> >>
> >>Also, this patch removes `movsd` from condition from the `x86Op` type
> >>check, as far as it never uses for the `emit_rma()` routine (see also
> >Typo: s/uses/used
> >>`emit_loadk64()` for details).
> >>
> >>Sergey Kaplun:
> >>* added the description and the test for the problem
> >>
> >>Part of tarantool/tarantool#8069
> >>---

Fixed the commit message; the new commit message is the following:

x64/LJ_GC64: Fix emit_rma().

(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)

```
The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
 a. If the offset of the address being accessed from the dispatch table
    (pinned to r14 that is not changed during trace execution) fits into
    32-bit, then encode this as an access to 32-bit displacement
    relative to r14.
 b. If the offset of the address being accessed from the mcode (i.e.
    rip) fits into 32-bit, then encode this as an access to 32-bit
    displacement relative to rip (considering long mode specifics and
    `RID_RIP` hack).
 c. If the address doesn't fit into 32-bit one and we use `mov` or
    `movsd`, then encode 64-bit load from this address.
 d. Elsewhere, encode it as an access to 32-bit (the address should fit
    into 32-bit one) displacement (the only option for non-GC64 mode).

So, each instruction in GC64 mode that differs from `mov` or `movsd`
should be encoded via the last option. But if we get a 64-bit address
with a big enough offset, it can't be encoded, so the assertion in
`ptr2addr()` fails.

There are several cases, when `emit_rma()` is used with a non-`mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
  number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
  `marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to the possibility
to break IR assembling which depends on specific register usage. So, we
use and restore r14 here for emitting.

Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it is never used for the `emit_rma()` routine (see also
`emit_loadk64()` for details).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8069
```

> >>
> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> >>PR: https://github.com/tarantool/tarantool/pull/8477
> >>Related issue: https://github.com/tarantool/tarantool/issues/8069
> >>
> >>AFAICS, other places with `emit_rma()` usage are not related to the
> >>patch as far as they take an offset for the address of JIT constants
> >>stored in `jit_State`, so it always be near enough to dispatch.
> >>
> >>Side note: you may check test-correctness of the last check with GC by
> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
> >>that instructions are emitted from bottom to top!
> >>
> >> src/lj_emit_x86.h | 24 ++++-
> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> >> 2 files changed, 123 insertions(+), 3 deletions(-)
> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> >>
> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
> >>index 6b58306b..b3dc4ea5 100644
> >>--- a/src/lj_emit_x86.h
> >>+++ b/src/lj_emit_x86.h
> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
> >>-  } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
> >>-    emit_rmro(as, xo, rr, rr, 0);
> >>-    emit_loadu64(as, rr, (uintptr_t)addr);
> >>+  } else if (!checki32((intptr_t)addr)) {
> >>+    Reg ra = (rr & 15);
> >>+    if (xo != XO_MOV) {
> >>+      /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
> >>+      uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
> >>+      uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
> >>+      ra = RID_DISPATCH;
> >>+      if (checku32(dispaddr)) {
> >>+        emit_loadi(as, ra, (int32_t)dispaddr);
> >>+      } else { /* Full-size 64 bit load. */
> >>+        MCode *p = as->mcp;
> >>+        *(uint64_t *)(p-8) = dispaddr;
> >>+        p[-9] = (MCode)(XI_MOVri+(ra&7));
> >>+        p[-10] = 0x48 + ((ra>>3)&1);
> >Why is it `0x48`?

IINM, this is the REX prefix for the instruction [1], Sergos should
correct me if I'm wrong.

| perl -E 'say sprintf "%08b", 0x48'
| 01001000
  ****WRXB

****: Fixed bit pattern.
W bit is set to mark that a 64-bit operand size is used.

This is the same approach as for `emit_loadu64()`, so I didn't leave any
comment.
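
For reference, here is a minimal C sketch of how that prefix byte and the opcode byte after it are composed. The helper names `rex_prefix()` and `mov_ri64_opcode()` are hypothetical and not part of LuaJIT. 0xB8 is the standard x86 opcode base for `mov r, imm`, and I assume that is the value behind `XI_MOVri`.

```c
#include <stdint.h>

/* REX prefix layout is 0100WRXB: the high nibble is fixed,
** W selects a 64-bit operand size, R/X/B extend register fields.
** Hypothetical helper, for illustration only.
*/
static uint8_t rex_prefix(int w, int r, int x, int b)
{
  return (uint8_t)(0x40 | ((w & 1) << 3) | ((r & 1) << 2) |
                   ((x & 1) << 1) | (b & 1));
}

/* First two bytes of `mov r64, imm64` for register number reg (0..15).
** The 8-byte immediate follows these two bytes in the instruction.
*/
static void mov_ri64_opcode(unsigned reg, uint8_t out[2])
{
  /* Equivalent to 0x48 + ((reg>>3)&1) from the patch. */
  out[0] = rex_prefix(1, 0, 0, (int)((reg >> 3) & 1));
  /* Opcode base 0xB8 (assumed value of XI_MOVri) plus the low 3 bits. */
  out[1] = (uint8_t)(0xB8 + (reg & 7));
}
```

For r14 (register number 14, the one pinned to the dispatch table) this gives the bytes 0x49 0xBE, followed by the 8-byte immediate, which matches the `p[-10]`, `p[-9]`, `p-8` layout in the hunk above.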
> >>+        p -= 10;
> >>+        as->mcp = p;
> >>+      }
> >>+      if (xo == XO_GROUP3b) emit_i8(as, i8);
> >>+    }
> >>+    emit_rmro(as, xo, rr, ra, 0);
> >>+    emit_loadu64(as, ra, (uintptr_t)addr);
> >>   } else
> >> #endif
> >>   {
> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
> >>new file mode 100644
> >>index 00000000..faddfe83
> >>--- /dev/null
> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
> >>+-- Test `IR_LDEXP`.
> >>+
> >>+-- Reproducer here is a little tricky.
> >>+-- We need to generate a bunch of traces as far we reference an
> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
> >>+-- so let's use 100 just to be sure.
> >Is there any way to make it more deterministic?

I suppose not, due to nondeterministic memory allocation. So, just use a
big enough number.

> >>+local test_marker
> >>+for _ = 1, 100 do
> >>+  test_marker = loadstring([[
> >>+    local test_marker
> >>+    for i = 1, 4 do
> >>+      -- Avoid fold optimization, use `i` as the second argument.
> >>+      -- Need some constant differs from 1 or 0 as the first
> >>+      -- argument.
> >>+      test_marker = math.ldexp(1.2, i)
> >>+    end
> >>+    return test_marker
> >>+  ]])()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we/we are/

Fixed, thanks!

> >>+test:ok(true, 'IR_LDEXP emit_rma')
> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
> >>+
> >>+-- Test `IR_OBAR`.
> >>+
> >>+-- First, create a closed upvalue.
> >>+do
> >>+  local uv -- luacheck: no unused
> >>+  -- `IR_OBAR` is used for object write barrier on upvalues.
> >>+  _G.change_uv = function(newv)
> >>+    uv = newv
> >>+  end
> >>+end
> >>+
> >>+-- We need a constant value on trace to be referenced far enough
> >>+-- from dispatch table. So we need to create a new function
> >>+-- prototype with a constant string.
> >>+-- This string should be long enough to be allocated with direct
> >>+-- alloc far away from dispatch.
> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
> >Why is that amount sufficient? Link to the source file would be enough.

Added the following comment.

```
@@ -64,7 +64,8 @@ end
 -- from dispatch table. So we need to create a new function
 -- prototype with a constant string.
 -- This string should be long enough to be allocated with direct
--- alloc far away from dispatch.
+-- alloc (not fitting in free chunks) far away from dispatch.
+-- See for details.
 local DEFAULT_MMAP_THRESHOLD = 128 * 1024
 local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
 local func_with_trace = loadstring([[
@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
 ]])
```

> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
> >>+local func_with_trace = loadstring([[
> >>+  for _ = 1, 4 do
> >>+    change_uv(']] .. str .. [[')
> >>+  end
> >>+]])
> >>+func_with_trace()
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+test:ok(true, 'IR_OBAR emit_rma')
> >>+
> >>+-- Now check the correctness.
> >>+
> >>+-- Set GC state to GCpause.
> >>+collectgarbage()
> >>+
> >>+-- We want to wait for the situation, when upvalue is black,
> >>+-- the string is gray. Both conditions are satisfied, when the
> >>+-- corresponding `change_uv()` function is marked, for example.
> >>+-- We don't know on what exactly step our upvalue is marked as
> >Typo: s/exactly/exact/

Fixed, thanks!

> >>+-- black and execution of trace become dangerous, so just check it
> >>+-- at each step.
> >>+-- Don't need to do the full GC cycle step by step.
> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
> >>+  collectgarbage('step')
> >>+  func_with_trace()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during `gc_mark()`,
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
> >>+
> >>+os.exit(test:check() and 0 or 1)
> >>--
> >>2.34.1
> >--
> >Best regards,
> >Maxim Kokryashkin
> >

[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix

--
Best regards,
Sergey Kaplun