Hi, Sergey!
Thanks for the patch!
LGTM, except for a few nits below.
 
> 
>>From: Mike Pall <mike>
>>
>>Reported by caohongqing.
>>
>>(cherry picked from commit 5bf0da3d7c02f9959fa3a9fb721e0565137b70c8)
>>
>>If there is high register pressure, and there are almost all registers
>>in use during the aforementioned assembling, the same register is chosen
>>as the one holding the given stack slot and the one holding the constant
>>value for the type comparison. As the result we get the following
>>assertion guard check in assembly:
>>| cmp x0, x0, lsr #32
>>| bne ->0
>>Which is always false.
>>
>>This happens because the `tmp` register (used for loading constant type
>>to compare in assertion guard) is scratched from `gpr` register set, but
>>not from `allow` set, which is used during picking the register for slot
>>loading (at the begging `allow` and `gpr` sets are the same).
>>
>>This patch changes `allow` set to `gpr` to fix the issue.
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>---
>>
>>Branch:  https://github.com/tarantool/luajit/tree/skaplun/or-94-fix-bool-ahuvload
>>PR:  https://github.com/tarantool/tarantool/pull/8214
>>Issues:
>>*  https://github.com/tarantool/tarantool/issues/8069
>>*  https://github.com/openresty/luajit2/pull/94
>>
>> src/lj_asm_arm64.h | 2 +-
>> .../or-94-arm64-ir-ahuvload-bool.test.lua | 145 ++++++++++++++++++
>> 2 files changed, 146 insertions(+), 1 deletion(-)
>> create mode 100644 test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua
>>
>>diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
>>index 4aeb51f3..30d34ca1 100644
>>--- a/src/lj_asm_arm64.h
>>+++ b/src/lj_asm_arm64.h
>>@@ -1055,7 +1055,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
>>     emit_n(as, (A64I_CMNx^A64I_K12) | A64F_U12(1), tmp);
>>   } else {
>>     emit_nm(as, A64I_CMPx | A64F_SH(A64SH_LSR, 32),
>>- ra_allock(as, (irt_toitype(ir->t) << 15) | 0x7fff, allow), tmp);
>>+ ra_allock(as, (irt_toitype(ir->t) << 15) | 0x7fff, gpr), tmp);
>>   }
>>   if (ofs & FUSE_REG)
>>     emit_dnm(as, (A64I_LDRx^A64I_LS_R)|A64I_LS_UXTWx|A64I_LS_SH, tmp, idx, (ofs & 31));
>>diff --git a/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua b/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua
>>new file mode 100644
>>index 00000000..a9b0071c
>>--- /dev/null
>>+++ b/test/tarantool-tests/or-94-arm64-ir-ahuvload-bool.test.lua
>>@@ -0,0 +1,145 @@
>>+local tap = require('tap')
>>+-- Test file to demonstrate the incorrect JIT assembling
>>+-- for IR_{AHUV}LOAD specialized to boolean.
>>+-- See also:  https://github.com/openresty/luajit2/pull/94 .
>>+
>>+-- If there is high register pressure, and there are almost
>>+-- all registers in use during the aforementioned assembling,
>>+-- the same register is chosen as the one holding the given
>>+-- stack slot and the one holding the constant value for the type
>>+-- comparison. As the result we get the following assertion
>>+-- guard check in assembly:
>>+-- | cmp x0, x0, lsr #32
>>+-- | bne ->0
>>+-- Which is always false.
>>+local ffi = require('ffi')
>>+-- Need for code generation.
>>+_G.ffi = ffi
>>+
>>+local traceinfo = require('jit.util').traceinfo
>>+
>>+-- Each payload will be recording with the corresponding IR.
>>+local TESTS = {
>>+ {irname = 'ALOAD', payload = 'aload()'},
>>+ {irname = 'HLOAD', payload = 'hload()'},
>>+ {irname = 'ULOAD', payload = 'uload()'},
>>+ {irname = 'VLOAD', payload = '...'},
>>+}
>>+local N_TESTS = #TESTS
>>+
>>+local test = tap.test('or-94-arm64-ir-ahuvload-bool')
>>+test:plan(N_TESTS)
>>+
>>+-- Functions to be inlined on trace to generate different
>>+-- types of IRs (ALOAD, HLOAD, ULOAD).
>>+-- Declare as global for code generation.
>>+local arr = {true}
>>+local function aload()
>>+ return arr[1]
>>+end
>>+_G.aload = aload
>>+
>>+local h = {data = true}
>>+local function hload()
>>+ local boolvalue = h.data
>>+ return boolvalue
>>+end
>>+_G.hload = hload
>>+
>>+do
>>+ local upvalue = true
>>+ local function uload()
>>+ return upvalue
>>+ end
>>+ -- Make upvalue muttable. Not really need to return this
>Typo: s/muttable/mutable/
>>+ -- function.
>>+ local function _()
>>+ upvalue = not upvalue
>>+ end
>>+ _G.uload = uload
>>+end
>>+
>>+-- This function generate code like the following:
>Typo: s/generate/generates/
>>+-- | local test_f(...)
>>+-- | local r
>>+-- | local rup1
>>+-- | --[[...]]
>>+-- | for _ = 1, 4 do
>>+-- | r1 = ffi.cast("int", 1)
>>+-- | --[[...]]
>>+-- | r = main_payload()
>>+-- | rup1 = r1
>>+-- | --[[...]]
>>+-- | end
>>+-- | end
>>+-- | return test_f
>>+-- Those `rn` variables before and after `main_payload` are
>>+-- required to generate enough register pressure (for GPR). Amount
>>+-- of repeats is empirical.
>>+-- Additional `test_f(...)` wrapper is needed for IR_VLOAD usage,
>>+-- when `main_payload` is just `...`.
>>+local function generate_payload(n_fillers, main_payload)
>>+ local code_chunk = 'local function test_f(...)\n'
>>+ code_chunk = code_chunk .. 'local r\n'
>>+ for i = 1, n_fillers do
>>+ code_chunk = code_chunk .. ('local rup%d\n'):format(i)
>>+ end
>>+ code_chunk = code_chunk .. 'for _ = 1, 4 do\n'
>>+ for i = 1, n_fillers do
>>+ code_chunk = code_chunk ..
>>+ ('local r%d = ffi.cast("int", %d)\n'):format(i, i)
>>+ end
>>+ code_chunk = code_chunk .. 'r = ' .. main_payload .. '\n'
>>+ for i = 1, n_fillers do
>>+ code_chunk = code_chunk .. ('rup%d = r%d\n'):format(i, i)
>>+ end
>>+ code_chunk = code_chunk .. 'end\nend\n'
>>+ code_chunk = code_chunk .. 'return test_f'
>>+ local f, err = loadstring(code_chunk, 'test_function')
>>+ assert(type(f) == 'function', err)
>>+ f = f()
>>+ assert(type(f) == 'function', 'returned generated value is not a function')
>>+ return f
>>+end
>That section is really hard to read, is there any way to make it more
>readable? I believe even slight reformatting might help.
>>+
>>+-- Disable sink optimization to allocate more registers in a
>>+-- "convenient" way. 'hotexit' option is required to be sure that
>>+-- we will start a new trace on false-positive guard assertion.
>Typo: s/on false-positive/on a false-positive/
>>+-- The new trace contains the same IR and so the same assertion
>>+-- guard. This trace will be executed, assertion guard failed
>>+-- again and the new third trace will be recorded. This trace will
>Typo: s/assertion guard failed again/the assertion guard will fail again/
>>+-- be the last one to record as far as iterations over cycle are
>>+-- finished and we returning from the function. The report of
>Typo: s/we returning/we are returning/
>>+-- `jit.dump` before the patch is the following:
>>+-- | TRACE 1 start "test_function":29
>>+-- | TRACE 1 stop -> loop
>>+-- | TRACE 1 exit 0
>>+-- | TRACE 2 start 1/0 "test_function":30
>>+-- | TRACE 2 stop -> loop
>>+-- | TRACE 2 exit 0
>>+-- | TRACE 3 start 2/0 "test_function":30
>>+-- | TRACE 3 stop -> return
>>+-- On the other hand, after the patch we will have only 2 traces:
>>+-- the main one, and the second is recorded on exit from loop.
>>+-- Hence, the report of `jit.dump` will be the following:
>>+-- | TRACE 1 start "test_function":29
>>+-- | TRACE 1 stop -> loop
>>+-- | TRACE 1 exit 3
>>+-- | TRACE 2 start 1/3 "test_function":84
>>+-- | TRACE 2 stop -> return
>>+jit.opt.start('hotloop=1', 'hotexit=1', '-sink')
>>+-- Prevent random hotcount to be sure that the cycle is compiled
>>+-- first.
>>+jit.off()
>>+jit.flush()
>>+for i = 1, N_TESTS do
>>+ local f = generate_payload(26, TESTS[i].payload)
>>+ jit.on()
>>+ -- Argument is needed only for IR_VLOAD.
>>+ f(true)
>>+ jit.off()
>>+ test:ok(not traceinfo(3), 'not recorded sidetrace for IR_' .. TESTS[i].irname)
>>+ jit.flush()
>>+end
>>+
>>+os.exit(test:check() and 0 or 1)
>>--
>>2.34.1
>--
>Best regards,
>Maxim Kokryashkin