From mboxrd@z Thu Jan 1 00:00:00 1970
To: Sergey Bronnikov
Date: Wed, 27 Aug 2025 12:17:11 +0300
Message-ID: <20250827091711.13681-1-skaplun@tarantool.org>
X-Mailer: git-send-email 2.51.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion (again).
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

From: Mike Pall

Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.

(cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)

Assume we have stores/loads from the pointer with offset +488 and -16.
The lower bits of the offset are the same as for the offset (488 + 8).
This leads to the incorrect fusion of these instructions:
| str x20, [x21, 488]
| stur x20, [x21, -16]
to the following instruction:
| stp x20, x20, [x21, 488]

This patch prevents this fusion by more accurate offset comparison.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
Related issues:
* https://github.com/tarantool/tarantool/issues/11691
* https://github.com/LuaJIT/LuaJIT/issues/1075

 src/lj_emit_arm64.h                           |  17 ++-
 ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 5c1bc372..9dd92c40 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -121,6 +121,17 @@ static int emit_checkofs(A64Ins ai, int64_t ofs)
   }
 }
 
+static LJ_AINLINE uint32_t emit_lso_pair_candidate(A64Ins ai, int ofs, int sc)
+{
+  if (ofs >= 0) {
+    return ai | A64F_U12(ofs>>sc);  /* Subsequent lj_ror checks ofs. */
+  } else if (ofs >= -256) {
+    return (ai^A64I_LS_U) | A64F_S9(ofs & 0x1ff);
+  } else {
+    return A64F_D(31);  /* Will mismatch prev. */
+  }
+}
+
 static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
 {
   int ot = emit_checkofs(ai, ofs), sc = (ai >> 30) & 3;
@@ -132,11 +143,9 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
     uint32_t prev = *as->mcp & ~A64F_D(31);
     int ofsm = ofs - (1<<sc), ofsp = ofs + (1<<sc);
     A64Ins aip;
-    if (prev == (ai | A64F_N(rn) | A64F_U12(ofsm>>sc)) ||
-        prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsm&0x1ff))) {
+    if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsm, sc)) {
       aip = (A64F_A(rd) | A64F_D(*as->mcp & 31));
-    } else if (prev == (ai | A64F_N(rn) | A64F_U12(ofsp>>sc)) ||
-               prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsp&0x1ff))) {
+    } else if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsp, sc)) {
       aip = (A64F_D(rd) | A64F_A(*as->mcp & 31));
       ofsm = ofs;
     } else {
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
new file mode 100644
index 00000000..c84c3b23
--- /dev/null
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -0,0 +1,129 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
+-- instruction fused from LDR/STR with negative offset and
+-- positive offset with the same lower bits on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/1075.
+local test = tap.test('lj-1075-arm64-incorrect-ldp-stp-fusion'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(6)
+
+-- Amount of iterations to compile and run the invariant part of
+-- the trace.
+local N_ITERATIONS = 4
+
+local EXPECTED = 42
+
+-- 4 slots of redzone for int64_t load/store.
+local REDZONE = 4
+local MASK_IMM7 = 0x7f
+local BUFLEN = (MASK_IMM7 + REDZONE) * 4
+local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
+
+local function clear_buf()
+  ffi.fill(buf, ffi.sizeof(buf), 0)
+end
+
+-- Initialize the buffer with simple values.
+local function init_buf()
+  -- Limit to fill the buffer. 0 in the top part helps
+  -- to detect the issue.
+  local LIMIT = BUFLEN - 12
+  for i = 0, LIMIT - 1 do
+    buf[i] = i
+  end
+  for i = LIMIT, BUFLEN - 1 do
+    buf[i] = 0
+  end
+end
+
+jit.opt.start('hotloop=2')
+
+-- Assume we have stores/loads from the pointer with offset
+-- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
+-- 1111100. These bits are the same as for the offset (488 + 8).
+-- Thus, before the patch, these two instructions:
+-- | str x20, [x21, #488]
+-- | stur x20, [x21, #-16]
+-- are incorrectly fused to the:
+-- | stp x20, x20, [x21, #488]
+
+-- Test stores.
+
+local start = ffi.cast('unsigned char *', buf)
+-- Use constants to allow optimization to take place.
+local base_ptr = start + 16
+for _ = 1, N_ITERATIONS do
+  -- Save the result only for the last iteration.
+  clear_buf()
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  ffi.cast('uint64_t *', base_ptr + 488)[0] = EXPECTED
+  ffi.cast('uint64_t *', base_ptr - 16)[0] = EXPECTED
+end
+
+test:is(buf[488 + 16], EXPECTED, 'correct store top value')
+test:is(buf[0], EXPECTED, 'correct store bottom value')
+
+-- Test loads.
+
+init_buf()
+
+local top, bottom
+for _ = 1, N_ITERATIONS do
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  top = ffi.cast('uint64_t *', base_ptr + 488)[0]
+  bottom = ffi.cast('uint64_t *', base_ptr - 16)[0]
+end
+
+test:is(top, 0xfffefdfcfbfaf9f8ULL, 'correct load top value')
+test:is(bottom, 0x706050403020100ULL, 'correct load bottom value')
+
+-- Another reproducer that is based on the snapshot restoring.
+-- Its advantage is avoiding FFI usage.
+
+-- Snapshot slots are restored in the reversed order.
+-- The recording order is the following (from the bottom of the
+-- trace to the top):
+-- - 0th (ofs == -16) -- `f64()` replaced the `tail64()` on the
+--   stack,
+-- - 63rd (ofs == 488) -- 1,
+-- - 64th (ofs == 496) -- 2.
+-- At recording, the instructions for the 0th and 63rd slots are
+-- merged like the following:
+-- | str x3, [x19, #496]
+-- | stp x2, x1, [x19, #488]
+-- The first store is dominated by the stp, so the restored value
+-- is incorrect.
+
+-- Function with 63 slots on the stack.
+local function f63()
+  -- 61 unused slots to avoid extra stores in between.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _
+  return 1, 2
+end
+
+local function tail63()
+  return f63()
+end
+
+-- Record the trace.
+tail63()
+tail63()
+-- Run the trace.
+local one, two = tail63()
+test:is(one, 1, 'correct 1st value on stack')
+test:is(two, 2, 'correct 2nd value on stack')
+
+test:done(true)
--
2.51.0
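
The offset collision described in the commit message can be checked with a small standalone C sketch. This is not LuaJIT code: imm9_field(), old_unscaled_match(), new_unscaled_match() and check_example() are hypothetical helpers that model only the comparison of the 9-bit signed offset field, under the assumption that the previously emitted instruction is the unscaled (STUR-style) form. Register fields, opcode bits and the scaled U12 form are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 9-bit immediate field of an unscaled access. */
uint32_t imm9_field(int ofs)
{
  return (uint32_t)ofs & 0x1ff;
}

/* Model of the old check: only the masked 9-bit fields are compared. */
int old_unscaled_match(int prev_ofs, int want_ofs)
{
  return imm9_field(prev_ofs) == imm9_field(want_ofs);
}

/*
** Model of the patched check: the unscaled form is a pairing candidate
** only when the wanted offset is in the range -256 <= ofs < 0.
*/
int new_unscaled_match(int prev_ofs, int want_ofs)
{
  if (want_ofs >= 0 || want_ofs < -256)
    return 0;
  return imm9_field(prev_ofs) == imm9_field(want_ofs);
}

/* Walks through the offsets used in the commit message. */
void check_example(void)
{
  int ofs = 488;       /* Offset of the store being emitted. */
  int ofsp = ofs + 8;  /* Neighbour slot: ofs + (1<<sc) with sc == 3. */
  int prev_ofs = -16;  /* Offset of the previously emitted STUR. */

  assert(imm9_field(prev_ofs) == 0x1f0);  /* -16 masked to 9 bits. */
  assert(imm9_field(ofsp) == 0x1f0);      /* 496 has the same low bits. */
  assert(old_unscaled_match(prev_ofs, ofsp));   /* False positive. */
  assert(!new_unscaled_match(prev_ofs, ofsp));  /* Rejected after the fix. */
  assert(new_unscaled_match(-16, -16));  /* A real negative neighbour. */
}
```

Calling check_example() passes all assertions: -16 and 496 share the low nine bits 0x1f0, so a comparison on the masked field alone cannot tell them apart, while a range check on the wanted offset can.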