From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 88D3A2A0379; Wed, 25 Jan 2023 22:01:27 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 88D3A2A0379 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1674673287; bh=c/Lu/SP01WNzVPGEkWrosCoJs1SLsarL6MRlq/G16E0=; h=To:Date:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=bJBk8y0F4ODbWP1x1IFQ/TxjvBDfpLmynyHWnEl9SVmnCT5gyygM13Pp3Rm6SJZki bgoV/ciunvnnEXYA5TuNMYc1H/cGamcgN9lIzQfsgEoNpHzNbQgT3ymqXM5VQVLifU 6fUkelV5O0YMc49h0B9n8YhGuvZVKlX+voRt0/zg= Received: from smtp35.i.mail.ru (smtp35.i.mail.ru [95.163.41.76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id DF5F388D68 for ; Wed, 25 Jan 2023 22:01:25 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org DF5F388D68 Received: by smtp35.i.mail.ru with esmtpa (envelope-from ) id 1pKl1Q-00BxiE-Qw; Wed, 25 Jan 2023 22:01:25 +0300 To: Sergey Ostanevich , Maxim Kokryashkin Date: Wed, 25 Jan 2023 21:57:58 +0300 Message-Id: <20230125185758.5760-1-skaplun@tarantool.org> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Mailru-Src: smtp X-7564579A: EEAE043A70213CC8 X-77F55803: 4F1203BC0FB41BD9AD83B49AC1DDA08967EC36FA2B240E0E9DA1CCEB7D38F385182A05F5380850404C228DA9ACA6FE271F0137D150D3ED028DF4F6715DF489F045E80DFF143FA05D410AEB8FF27FD916 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE70A10A23A3B64B805EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637B1AF1CB5CD5D34E98638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8F4FC2EBFC5BBBD9F14DEACCA8CBFF9D0117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCE2CCD8F0CAA010FB389733CBF5DBD5E9C8A9BA7A39EFB766F5D81C698A659EA7CC7F00164DA146DA9985D098DBDEAEC8A2475C18617057EEF6B57BC7E6449061A352F6E88A58FB86F5D81C698A659EA73AA81AA40904B5D9A18204E546F3947CCF7CD7A0D5AA5F25C0837EA9F3D197644AD6D5ED66289B52698AB9A7B718F8C46E0066C2D8992A16725E5C173C3A84C3D1CB9C1829AC0833BA3038C0950A5D36B5C8C57E37DE458B330BD67F2E7D9AF16D1867E19FE14079C09775C1D3CA48CFED8438A78DFE0A9E1DD303D21008E298D5E8D9A59859A8B6B372FE9A2E580EFC725E5C173C3A84C369456C5265B6C55C35872C767BF85DA2F004C90652538430E4A6367B16DE6309 X-C1DE0DAB: 0D63561A33F958A541DC25C19762951155C10AFA539AF5FB7DC1905FCC6329554EAF44D9B582CE87C8A4C02DF684249CC203C45FEA855C8F X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34F8A2DF183797A6A8046D5FDA930DE17A29D8C1D8E92F3818810C0FB06F5D0383E78BAF319A436AB21D7E09C32AA3244C1799AEC350A4239841A94DFD107F710DE8FBBEFAE1C4874C927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj8qpwqg954X8jXHFpqUc5og== X-Mailru-Sender: F16D9CAFEEA6770E7B6EAD4ADB3BCAF0A64F60B3E234CCEC4E8EF1F29A2E854BC0CD8652978296A9F2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A84198E0F3ECE9B5443453F38A29522196 X-Mras: Ok Subject: [Tarantool-patches] [PATCH luajit] ARM64: Fix {AHUV}LOAD specialized to nil/false/true. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" From: Mike Pall Reported by caohongqing. (cherry picked from commit 5bf0da3d7c02f9959fa3a9fb721e0565137b70c8) If there is high register pressure, and there are almost all registers in use during the aforementioned assembling, the same register is chosen as the one holding the given stack slot and the one holding the constant value for the type comparison. As the result we get the following assertion guard check in assembly: | cmp x0, x0, lsr #32 | bne ->0 Which is always false. This happens because the `tmp` register (used for loading constant type to compare in assertion guard) is scratched from `gpr` register set, but not from `allow` set, which is used during picking the register for slot loading (at the begging `allow` and `gpr` sets are the same). This patch changes `allow` set to `gpr` to fix the issue. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#8069 --- Branch: https://github.com/tarantool/luajit/tree/skaplun/or-94-fix-bool-ahuvload PR: https://github.com/tarantool/tarantool/pull/8214 Issues: * https://github.com/tarantool/tarantool/issues/8069 * https://github.com/openresty/luajit2/pull/94 src/lj_asm_arm64.h | 2 +- .../or-94-arm64-ir-ahuvload-bool.test.lua | 145 ++++++++++++++++++ 2 files changed, 146 insertions(+), 1 deletion(-) create mode 100644 test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h index 4aeb51f3..30d34ca1 100644 --- a/src/lj_asm_arm64.h +++ b/src/lj_asm_arm64.h @@ -1055,7 +1055,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir) emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(1), tmp); } else { emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32), - ra_allock(as, (irt_toitype(ir->t) << 15) | 0x7fff, allow), tmp); + ra_allock(as, (irt_toitype(ir->t) << 15) | 0x7fff, gpr), tmp); } if (ofs & FUSE_REG) emit_dnm(as, (A64I_LDRx^A64I_LS_R)|A64I_LS_UXTWx|A64I_LS_SH, tmp, idx, (ofs & 31)); diff --git a/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua b/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua new file mode 100644 index 00000000..a9b0071c --- /dev/null +++ b/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua @@ -0,0 +1,145 @@ +local tap = require('tap') +-- Test file to demonstrate the incorrect JIT assembling +-- for IR_{AHUV}LOAD specialized to boolean. +-- See also: https://github.com/openresty/luajit2/pull/94. + +-- If there is high register pressure, and there are almost +-- all registers in use during the aforementioned assembling, +-- the same register is chosen as the one holding the given +-- stack slot and the one holding the constant value for the type +-- comparison. As the result we get the following assertion +-- guard check in assembly: +-- | cmp x0, x0, lsr #32 +-- | bne ->0 +-- Which is always false. +local ffi = require('ffi') +-- Need for code generation. +_G.ffi = ffi + +local traceinfo = require('jit.util').traceinfo + +-- Each payload will be recording with the corresponding IR. +local TESTS = { + {irname = 'ALOAD', payload = 'aload()'}, + {irname = 'HLOAD', payload = 'hload()'}, + {irname = 'ULOAD', payload = 'uload()'}, + {irname = 'VLOAD', payload = '...'}, +} +local N_TESTS = #TESTS + +local test = tap.test('or-94-arm64-ir-ahuvload-bool') +test:plan(N_TESTS) + +-- Functions to be inlined on trace to generate different +-- types of IRs (ALOAD, HLOAD, ULOAD). +-- Declare as global for code generation. +local arr = {true} +local function aload() + return arr[1] +end +_G.aload = aload + +local h = {data = true} +local function hload() + local boolvalue = h.data + return boolvalue +end +_G.hload = hload + +do + local upvalue = true + local function uload() + return upvalue + end + -- Make upvalue muttable. Not really need to return this + -- function. + local function _() + upvalue = not upvalue + end + _G.uload = uload +end + +-- This function generate code like the following: +-- | local test_f(...) +-- | local r +-- | local rup1 +-- | --[[...]] +-- | for _ = 1, 4 do +-- | r1 = ffi.cast("int", 1) +-- | --[[...]] +-- | r = main_payload() +-- | rup1 = r1 +-- | --[[...]] +-- | end +-- | end +-- | return test_f +-- Those `rn` variables before and after `main_payload` are +-- required to generate enough register pressure (for GPR). Amount +-- of repeats is empirical. +-- Additional `test_f(...)` wrapper is needed for IR_VLOAD usage, +-- when `main_payload` is just `...`. +local function generate_payload(n_fillers, main_payload) + local code_chunk = 'local function test_f(...)\n' + code_chunk = code_chunk .. 'local r\n' + for i = 1, n_fillers do + code_chunk = code_chunk .. ('local rup%d\n'):format(i) + end + code_chunk = code_chunk .. 'for _ = 1, 4 do\n' + for i = 1, n_fillers do + code_chunk = code_chunk .. + ('local r%d = ffi.cast("int", %d)\n'):format(i, i) + end + code_chunk = code_chunk .. 'r = ' .. main_payload .. '\n' + for i = 1, n_fillers do + code_chunk = code_chunk .. ('rup%d = r%d\n'):format(i, i) + end + code_chunk = code_chunk .. 'end\nend\n' + code_chunk = code_chunk .. 'return test_f' + local f, err = loadstring(code_chunk, 'test_function') + assert(type(f) == 'function', err) + f = f() + assert(type(f) == 'function', 'returned generated value is not a function') + return f +end + +-- Disable sink optimization to allocate more registers in a +-- "convenient" way. 'hotexit' option is required to be sure that +-- we will start a new trace on false-positive guard assertion. +-- The new trace contains the same IR and so the same assertion +-- guard. This trace will be executed, assertion guard failed +-- again and the new third trace will be recorded. This trace will +-- be the last one to record as far as iterations over cycle are +-- finished and we returning from the function. The report of +-- `jit.dump` before the patch is the following: +-- | TRACE 1 start "test_function":29 +-- | TRACE 1 stop -> loop +-- | TRACE 1 exit 0 +-- | TRACE 2 start 1/0 "test_function":30 +-- | TRACE 2 stop -> loop +-- | TRACE 2 exit 0 +-- | TRACE 3 start 2/0 "test_function":30 +-- | TRACE 3 stop -> return +-- On the other hand, after the patch we will have only 2 traces: +-- the main one, and the second is recorded on exit from loop. +-- Hence, the report of `jit.dump` will be the following: +-- | TRACE 1 start "test_function":29 +-- | TRACE 1 stop -> loop +-- | TRACE 1 exit 3 +-- | TRACE 2 start 1/3 "test_function":84 +-- | TRACE 2 stop -> return +jit.opt.start('hotloop=1', 'hotexit=1', '-sink') +-- Prevent random hotcount to be sure that the cycle is compiled +-- first. +jit.off() +jit.flush() +for i = 1, N_TESTS do + local f = generate_payload(26, TESTS[i].payload) + jit.on() + -- Argument is needed only for IR_VLOAD. + f(true) + jit.off() + test:ok(not traceinfo(3), 'not recorded sidetrace for IR_' .. TESTS[i].irname) + jit.flush() +end + +os.exit(test:check() and 0 or 1) -- 2.34.1