From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 8 Sep 2025 12:48:09 +0300
To: Sergey Bronnikov
Cc: tarantool-patches@dev.tarantool.org
References: <20250827091711.13681-1-skaplun@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion (again).
List-Id: Tarantool development patches
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun

Hi, Sergey!
Thanks for the review!
Fixed your comment and force-pushed the branch.

On 08.09.25, Sergey Bronnikov wrote:
> Hi, Sergey,
>
> thanks for the patch! LGTM with two minor comments
>
> Sergey
>
>
> On 8/27/25 12:17, Sergey Kaplun wrote:
> >> From: Mike Pall
> >>
> >> Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.
> >>
> >> (cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)
> >>
> >> Assume we have stores/loads from the pointer with offset +488 and -16.
> >> The lower bits of the offset are the same as for the offset (488 + 8).
> >> This leads to the incorrect fusion of these instructions:
> >> | str x20, [x21, 488]
> >> | stur x20, [x21, -16]
> >> to the following instruction:
> >> | stp x20, x20, [x21, 488]
> >>
> >> This patch prevents this fusion by more accurate offset comparison.
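
To make the collision above concrete, here is a minimal C sketch. It is
not the code from lj_emit_arm64.h. It assumes that the "lower bits" in
question are the 9 bits of a signed immediate field (STUR encodes its
offset as a 9-bit signed immediate), and the names `low9`, `same_low9`,
`is_adjacent` and `check_example` are hypothetical, introduced only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the low 9 bits of a byte offset, i.e. what a
** 9-bit signed immediate field can hold after truncation. */
static uint32_t low9(int32_t ofs)
{
  return (uint32_t)ofs & 0x1ffu;
}

/* Naive check (assumption): compare only the truncated bits. */
static int same_low9(int32_t a, int32_t b)
{
  return low9(a) == low9(b);
}

/* Stricter check: the previous access must really sit at ofs + size. */
static int is_adjacent(int32_t prev_ofs, int32_t ofs, int32_t size)
{
  return prev_ofs == ofs + size;
}

/* Walk through the offsets from the commit message. */
static void check_example(void)
{
  assert(low9(-16) == 496u);          /* -16 truncates to 0x1f0 == 496. */
  assert(same_low9(-16, 488 + 8));    /* So -16 looks like 488 + 8. */
  assert(!is_adjacent(-16, 488, 8));  /* Full offsets do not match. */
  assert(is_adjacent(496, 488, 8));   /* A real neighbour still matches. */
}
```

Calling `check_example()` passes all four assertions: the truncated
comparison treats -16 as a neighbour of 488, while the comparison of the
full offsets does not.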
> >>
> >> Sergey Kaplun:
> >> * added the description and the test for the problem
> >>
> >> Part of tarantool/tarantool#11691
> >> ---
> >>
> >> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
> >> Related issues:
> >> * https://github.com/tarantool/tarantool/issues/11691
> >> * https://github.com/LuaJIT/LuaJIT/issues/1075
> >>
> >> src/lj_emit_arm64.h | 17 ++-
> >> ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
> >> 2 files changed, 142 insertions(+), 4 deletions(-)
> >> create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> >>
> >> diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
> >> index 5c1bc372..9dd92c40 100644
> >> --- a/src/lj_emit_arm64.h
> >> +++ b/src/lj_emit_arm64.h
> >> diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> >> new file mode 100644
> >> index 00000000..c84c3b23
> >> --- /dev/null
> >> +++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> >> @@ -0,0 +1,129 @@
> >> +
> >> +jit.opt.start('hotloop=2')
> > Why 2? It deserves a comment, because usually we use 1 hotloop.

It's a copy-paste mistake carried over from the aarch64 machine. Fixed to
`hotloop=1`, thanks:

===================================================================
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
index c84c3b23..393a1aa7 100644
--- a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -40,7 +40,7 @@ local function init_buf()
   end
 end

-jit.opt.start('hotloop=2')
+jit.opt.start('hotloop=1')

 -- Assume we have stores/loads from the pointer with offset
 -- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
===================================================================

> >
> >> +
> >> +-- Another reproducer that is based on the snapshot restoring.
> >> +-- Its advantage is avoiding FFI usage.
> >> +
> >> +-- Snapshot slots are restored in the reversed order.
> >> +-- The recording order is the following (from the bottom of the
> >> +-- trace to the top):
> >> +-- - 0th (ofs == -16) -- `f64()` replaced the `tail64()` on the
> >> +--   stack,
> >> +-- - 63rd (ofs == 488) -- 1,
> >> +-- - 64th (ofs == 496) -- 2.
> >> +-- At recording, the instructions for the 0th and 63rd slots are
> >> +-- merged like the following:
> >> +-- | str x3, [x19, #496]
> >> +-- | stp x2, x1, [x19, #488]
> >> +-- The first store is dominated by the stp, so the restored value
> >> +-- is incorrect.
> >> +
> >> +-- Function with 63 slots on the stack.
> >> +local function f63()
> > Minor: Hardcode a number of slots to the function name looks odd.

The comment above explains why exactly this number of slots is required,
so the name shouldn't be changed.

> > The same for tail63. Bumping a number of slots will
> > require renaming of two functions.
> > Feel free to ignore.

Ignoring.

> >
> >> +  -- 61 unused slots to avoid extra stores in between.
> >> +  -- luacheck: no unused
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _, _, _, _, _, _, _, _, _, _
> >> +  local _
> >> +  return 1, 2
> >> +end
> >> +
> >> +test:done(true)

--
Best regards,
Sergey Kaplun