LGTM
Thanks!

Hi, Sergey!

Thanks for the review!
Fixed your comment and force-pushed the branch.

On 08.09.25, Sergey Bronnikov wrote:
Hi, Sergey,

thanks for the patch! LGTM with two minor comments.

Sergey

<snipped>

On 8/27/25 12:17, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported and analyzed by Zhongwei Yao.
Fix by Peter Cawley.

(cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)

Assume we have stores/loads from the pointer with offsets +488 and -16.
The lower 9 bits of the offset -16 are the same as those of the offset
496 (488 + 8). This leads to the incorrect fusion of these instructions:
| str x20, [x21, 488]
| stur x20, [x21, -16]
into the following instruction:
| stp x20, x20, [x21, 488]

This patch prevents this fusion by a more accurate offset comparison.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
Related issues:
* https://github.com/tarantool/tarantool/issues/11691
* https://github.com/LuaJIT/LuaJIT/issues/1075

 src/lj_emit_arm64.h                           |  17 ++-
 ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 5c1bc372..9dd92c40 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
<snipped>
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
new file mode 100644
index 00000000..c84c3b23
--- /dev/null
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -0,0 +1,129 @@
<snipped>
+
+jit.opt.start('hotloop=2')

Why 2? It deserves a comment, because we usually use `hotloop=1`.

It's a copy-paste mistake from the aarch64 machine; fixed to `hotloop=1`, thanks:
===================================================================
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
index c84c3b23..393a1aa7 100644
--- a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -40,7 +40,7 @@ local function init_buf()
   end
 end
 
-jit.opt.start('hotloop=2')
+jit.opt.start('hotloop=1')
 
 -- Assume we have stores/loads from the pointer with offset
 -- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
===================================================================

<snipped>
+
+-- Another reproducer that is based on the snapshot restoring.
+-- Its advantage is avoiding FFI usage.
+
+-- Snapshot slots are restored in the reversed order.
+-- The recording order is the following (from the bottom of the
+-- trace to the top):
+-- - 0th (ofs == -16) -- `f64()` replaced the `tail64()` on the
+--   stack,
+-- - 63rd (ofs == 488) -- 1,
+-- - 64th (ofs == 496) -- 2.
+-- At recording, the instructions for the 0th and 63rd slots are
+-- merged like the following:
+-- | str x3, [x19, #496]
+-- | stp x2, x1, [x19, #488]
+-- The first store is dominated by the stp, so the restored value
+-- is incorrect.
+
+-- Function with 63 slots on the stack.
+local function f63()

Minor: hard-coding the number of slots in the function name looks odd.

It is mentioned above why exactly this number of slots is required. It shouldn't be touched.
The question was about hard-coding a number in a function name, not about
using exactly this number of slots. OK, I won't insist, as I
said in the question.
The same for tail63. Bumping the number of slots will require renaming
two functions. Feel free to ignore.

Ignoring.

+  -- 61 unused slots to avoid extra stores in between.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _
+  return 1, 2
+end
+
<snipped>
+test:done(true)