From mboxrd@z Thu Jan  1 00:00:00 1970
To: Sergey Bronnikov
Date: Thu, 26 Jun 2025 18:12:24 +0300
Message-ID: <20250626151224.27925-1-skaplun@tarantool.org>
X-Mailer: git-send-email 2.49.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusing for unaligned accesses.
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

From: Mike Pall

Thanks to Peter Cawley.

(cherry picked from commit 0fa2f1cbcf023ad0549f1428809e506fa2c78552)

When emitting load/store instructions on arm64, the fusing of two
loads/stores into LDP/STP works incorrectly if at least one of the
offsets is misaligned. The misalignment is ignored and the
instructions are fused anyway, so at least one of the accesses goes
to an incorrect address.

For example, the following instructions:
| stur w0, [x1, #17]
| stur w0, [x1, #21]

may be fused into the following:
| stp w0, w0, [x1, #16]

This patch prevents the fusion in this case by testing the alignment
with a bitwise ROR by the alignment value. For a misaligned offset,
the rotated value overflows the 7-bit length mask used in the check.
Negative immediates (the field is 7 bits wide, including the sign
bit) are brought into the checked range by adding `64 << sc` before
the rotation; the addition is harmless for positive values.
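The difference between a plain range check and the rotate-based check described above can be reproduced outside LuaJIT. The following is a minimal sketch in C, not the LuaJIT source: `ror32`, `can_pair_old`, `can_pair_new`, and `encoded_offset` are hypothetical names used only for illustration, and `sc` is assumed to be the log2 of the access size (2 for 32-bit, 3 for 64-bit accesses). It also assumes the usual two's-complement behavior with arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 <= n < 32). */
uint32_t ror32(uint32_t x, unsigned n)
{
  return (x >> n) | (x << ((32u - n) & 31u));
}

/* Range-only check (illustrative model of the pre-patch behavior):
** it ignores the low `sc` bits of the offset. */
int can_pair_old(int32_t ofsm, unsigned sc)
{
  assert(sc == 2 || sc == 3);
  return ofsm >= (int32_t)((uint32_t)-64 << sc) && ofsm <= (63 << sc);
}

/* Rotate-based check: bias the offset by 64 << sc so that valid
** offsets become (0..127) << sc, then rotate right by sc. Any set
** low bit (misalignment) lands in the top bits and fails the test. */
int can_pair_new(int32_t ofsm, unsigned sc)
{
  assert(sc == 2 || sc == 3);
  return ror32((uint32_t)ofsm + (64u << sc), sc) <= 127u;
}

/* Byte offset that a 7-bit scaled immediate would actually encode. */
int32_t encoded_offset(int32_t ofsm, unsigned sc)
{
  int32_t imm7 = (ofsm >> sc) & 0x7f;     /* Truncate to the field. */
  int32_t simm = (imm7 ^ 0x40) - 0x40;    /* Sign-extend 7 bits. */
  return simm * (int32_t)(1u << sc);
}
```

With `sc = 2`, `can_pair_old(17, 2)` accepts the offset even though `encoded_offset(17, 2)` is 16, which is the mismatch from the `stur`/`stp` example; `can_pair_new(17, 2)` rejects it while still accepting the aligned range from -256 to 252.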
Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11278
---

Related issues:
* https://github.com/LuaJIT/LuaJIT/issues/1056
* https://github.com/tarantool/tarantool/issues/11278

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1056-arm64-ldp-sdp-misaligned-fusing

 src/lj_emit_arm64.h                           |  2 +-
 ...6-arm64-ldp-sdp-misaligned-fusing.test.lua | 98 +++++++++++++++++++
 2 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 30cd3505..5c1bc372 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -142,7 +142,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
     } else {
       goto nopair;
     }
-    if (ofsm >= (int)((unsigned int)-64<<sc) && ofsm <= (63<<sc)) {
+    if (lj_ror((unsigned int)ofsm + (64u<<sc), sc) <= 127u) {
       *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) |
         (ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000));
       return;
diff --git a/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
new file mode 100644
index 00000000..5d03097e
--- /dev/null
+++ b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
@@ -0,0 +1,98 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
+-- instructions from LDUR/STUR instructions with misaligned offset
+-- on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/issues/1056.
+local test = tap.test('lj-1056-arm64-ldp-sdp-misaligned-fusing'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(6)
+
+-- Number of iterations to compile and run the invariant part of
+-- the trace.
+local N_ITERATIONS = 4
+
+local EXPECTED = 42
+
+-- 9 bytes to make unaligned 4-byte access like buf + 5.
+local BUFLEN = 9
+local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
+
+local function clear_buf()
+  ffi.fill(buf, ffi.sizeof(buf), 0)
+end
+
+-- Initialize the buffer with simple values.
+local function init_buf()
+  for i = 0, BUFLEN - 1 do
+    buf[i] = i
+  end
+end
+
+local function test_buf_content(expected_bytes, msg)
+  local got_bytes = {}
+  assert(#expected_bytes == BUFLEN, 'mismatched size of buffer and table')
+  for i = 0, BUFLEN - 1 do
+    got_bytes[i + 1] = buf[i]
+  end
+  test:is_deeply(got_bytes, expected_bytes, msg)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Test stores.
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
+  -- before the patch.
+  ffi.cast('int32_t *', ptr + 1)[0] = EXPECTED
+  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
+end
+test_buf_content({0, EXPECTED, 0, 0, 0, EXPECTED, 0, 0, 0},
+                 'pair of misaligned stores')
+
+clear_buf()
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- The next access becomes ptr + 4 on the trace before the
+  -- patch.
+  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
+  ffi.cast('int32_t *', ptr)[0] = EXPECTED
+end
+test_buf_content({EXPECTED, 0, 0, 0, 0, EXPECTED, 0, 0, 0},
+                 'aligned / misaligned stores')
+
+-- Test loads.
+
+local resl, resr = 0, 0
+
+init_buf()
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
+  -- before the patch.
+  resl = ffi.cast('int32_t *', ptr + 1)[0]
+  resr = ffi.cast('int32_t *', ptr + 5)[0]
+end
+
+test:is(resl, 0x4030201, 'pair of misaligned loads, left')
+test:is(resr, 0x8070605, 'pair of misaligned loads, right')
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- The next access becomes ptr + 4 on the trace before the
+  -- patch.
+  resr = ffi.cast('int32_t *', ptr + 5)[0]
+  resl = ffi.cast('int32_t *', ptr)[0]
+end
+
+test:is(resl, 0x3020100, 'aligned / misaligned load, aligned')
+test:is(resr, 0x8070605, 'aligned / misaligned load, misaligned')
+
+test:done(true)
--
2.49.0
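The expected values in the test (the byte patterns after the stores and the constants 0x4030201, 0x8070605, and 0x3020100 for the loads) follow from doing the 4-byte accesses at the exact byte offsets. A minimal sketch in C that reproduces them is given here; `store_i32` and `load_i32` are hypothetical helpers, not part of the patch, and the constants assume a little-endian host, which is the usual arm64 configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUFLEN = 9 };  /* Same size as the buffer in the test. */

/* Store a 32-bit value at an arbitrary byte offset. memcpy is used
** because dereferencing a misaligned int32_t pointer is undefined
** behavior in C. */
void store_i32(unsigned char *buf, size_t ofs, int32_t v)
{
  assert(ofs + sizeof(v) <= BUFLEN);
  memcpy(buf + ofs, &v, sizeof(v));
}

/* Load a 32-bit value from an arbitrary byte offset. */
int32_t load_i32(const unsigned char *buf, size_t ofs)
{
  int32_t v;
  assert(ofs + sizeof(v) <= BUFLEN);
  memcpy(&v, buf + ofs, sizeof(v));
  return v;
}
```

Storing 42 at offsets 1 and 5 of a zeroed buffer sets only bytes 1 and 5, and loading from offset 1 of a buffer filled with 0..8 combines bytes 1 to 4. If the accesses were silently moved to offsets 0 and 4, as happens on the trace before the patch, both the byte pattern and the loaded constants would differ.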