From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 8 Sep 2025 11:54:58 +0300
From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion (again).
References: <20250827091711.13681-1-skaplun@tarantool.org>
In-Reply-To: <20250827091711.13681-1-skaplun@tarantool.org>
List-Id: Tarantool development patches
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------5eDmUa5Sg1Ci0j6NDhQOzj80"

This is a multi-part message in MIME format.
--------------5eDmUa5Sg1Ci0j6NDhQOzj80
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hi, Sergey,

The test added with the initial fix
(test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua)
segfaults with the proposed patch.

CMake configuration: cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON

Arch: ARM64.

Sergey

On 8/27/25 12:17, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.

(cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)

Assume we have stores/loads from the pointer with offsets +488 and -16.
The lower 9 bits of the offset -16 (0x1f0) are the same as for the
offset (488 + 8).
This leads to the incorrect fusion of these instructions:
| str   x20, [x21, 488]
| stur  x20, [x21, -16]
to the following instruction:
| stp   x20, x20, [x21, 488]

This patch prevents this fusion by more accurate offset comparison.
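
The collision described above can be illustrated with a small C sketch.
It is a toy model under stated assumptions, not LuaJIT code: the `Enc`
struct and the helpers `enc_unscaled()`, `enc_scaled()`, `old_is_pair()`,
`new_is_pair()` and `check_fusion_model()` are hypothetical stand-ins for
the instruction words and the `A64F_S9`/`A64F_U12` fields used in the
diff. It only shows that truncating -16 to 9 bits gives the same field
value as 496, and that a range check per addressing form avoids the
false match.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a load/store encoding: only the addressing form and the
** immediate field. These names are illustrative, not LuaJIT's. */
typedef struct { int unscaled; uint32_t imm; } Enc;

static int enc_eq(Enc a, Enc b)
{
  return a.unscaled == b.unscaled && a.imm == b.imm;
}

/* Unscaled form (stur/ldur): signed 9-bit immediate, kept as low 9 bits. */
static Enc enc_unscaled(int ofs)
{
  Enc e = { 1, (uint32_t)ofs & 0x1ffu };
  return e;
}

/* Scaled form (str/ldr): unsigned 12-bit immediate of ofs >> sc. */
static Enc enc_scaled(int ofs, int sc)
{
  Enc e = { 0, ((uint32_t)ofs >> sc) & 0xfffu };
  return e;
}

/* Old check: try both forms for the neighbour offset, whatever its sign. */
static int old_is_pair(Enc prev, int ofs, int sc)
{
  return enc_eq(prev, enc_scaled(ofs, sc)) || enc_eq(prev, enc_unscaled(ofs));
}

/* New check, modelled on the range test in the patch: one form per range. */
static int new_is_pair(Enc prev, int ofs, int sc)
{
  if (ofs >= 0) return enc_eq(prev, enc_scaled(ofs, sc));
  if (ofs >= -256) return enc_eq(prev, enc_unscaled(ofs));
  return 0;  /* Out of range for the unscaled form: never a pair. */
}

/* prev is `stur x20, [x21, -16]`; the current store is at +488 (sc == 3). */
static void check_fusion_model(void)
{
  Enc prev = enc_unscaled(-16);
  assert(old_is_pair(prev, 488 + 8, 3));   /* False match: -16 & 0x1ff == 496. */
  assert(!new_is_pair(prev, 488 + 8, 3));  /* 496 is only tried as scaled. */
  assert(new_is_pair(prev, -16, 3));       /* A real neighbour still matches. */
}
```

Calling `check_fusion_model()` from any test harness exercises the three
cases; the offsets are the ones from the description above.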

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
Related issues:
* https://github.com/tarantool/tarantool/issues/11691
* https://github.com/LuaJIT/LuaJIT/issues/1075

 src/lj_emit_arm64.h                           |  17 ++-
 ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 5c1bc372..9dd92c40 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -121,6 +121,17 @@ static int emit_checkofs(A64Ins ai, int64_t ofs)
   }
 }
 
+static LJ_AINLINE uint32_t emit_lso_pair_candidate(A64Ins ai, int ofs, int sc)
+{
+  if (ofs >= 0) {
+    return ai | A64F_U12(ofs>>sc);  /* Subsequent lj_ror checks ofs. */
+  } else if (ofs >= -256) {
+    return (ai^A64I_LS_U) | A64F_S9(ofs & 0x1ff);
+  } else {
+    return A64F_D(31);  /* Will mismatch prev. */
+  }
+}
+
 static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
 {
   int ot = emit_checkofs(ai, ofs), sc = (ai >> 30) & 3;
@@ -132,11 +143,9 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
     uint32_t prev = *as->mcp & ~A64F_D(31);
     int ofsm = ofs - (1<<sc), ofsp = ofs + (1<<sc);
     A64Ins aip;
-    if (prev == (ai | A64F_N(rn) | A64F_U12(ofsm>>sc)) ||
-	prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsm&0x1ff))) {
+    if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsm, sc)) {
       aip = (A64F_A(rd) | A64F_D(*as->mcp & 31));
-    } else if (prev == (ai | A64F_N(rn) | A64F_U12(ofsp>>sc)) ||
-	       prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsp&0x1ff))) {
+    } else if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsp, sc)) {
       aip = (A64F_D(rd) | A64F_A(*as->mcp & 31));
       ofsm = ofs;
     } else {
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
new file mode 100644
index 00000000..c84c3b23
--- /dev/null
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -0,0 +1,129 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emission of an LDP/STP
+-- instruction fused from LDR/STR with a negative offset and a
+-- positive offset that share the same lower bits on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/1075.
+local test = tap.test('lj-1075-arm64-incorrect-ldp-stp-fusion'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(6)
+
+-- Number of iterations to compile and run the invariant part of
+-- the trace.
+local N_ITERATIONS = 4
+
+local EXPECTED = 42
+
+-- 4 slots of redzone for int64_t load/store.
+local REDZONE = 4
+local MASK_IMM7 = 0x7f
+local BUFLEN = (MASK_IMM7 + REDZONE) * 4
+local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
+
+local function clear_buf()
+  ffi.fill(buf, ffi.sizeof(buf), 0)
+end
+
+-- Initialize the buffer with simple values.
+local function init_buf()
+  -- Limit to fill the buffer. 0 in the top part helps
+  -- to detect the issue.
+  local LIMIT = BUFLEN - 12
+  for i = 0, LIMIT - 1 do
+    buf[i] = i
+  end
+  for i = LIMIT, BUFLEN - 1 do
+    buf[i] = 0
+  end
+end
+
+jit.opt.start('hotloop=2')
+
+-- Assume we have stores/loads from the pointer with offset
+-- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
+-- 1111100. These bits are the same as for the offset (488 + 8).
+-- Thus, before the patch, these two instructions:
+-- | str   x20, [x21, #488]
+-- | stur  x20, [x21, #-16]
+-- are incorrectly fused into:
+-- | stp   x20, x20, [x21, #488]
+
+-- Test stores.
+
+local start = ffi.cast('unsigned char *', buf)
+-- Use constants to allow optimization to take place.
+local base_ptr = start + 16
+for _ = 1, N_ITERATIONS do
+  -- Save the result only for the last iteration.
+  clear_buf()
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  ffi.cast('uint64_t *', base_ptr + 488)[0] = EXPECTED
+  ffi.cast('uint64_t *', base_ptr - 16)[0] = EXPECTED
+end
+
+test:is(buf[488 + 16], EXPECTED, 'correct store top value')
+test:is(buf[0], EXPECTED, 'correct store bottom value')
+
+-- Test loads.
+
+init_buf()
+
+local top, bottom
+for _ = 1, N_ITERATIONS do
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  top = ffi.cast('uint64_t *', base_ptr + 488)[0]
+  bottom = ffi.cast('uint64_t *', base_ptr - 16)[0]
+end
+
+test:is(top, 0xfffefdfcfbfaf9f8ULL, 'correct load top value')
+test:is(bottom, 0x706050403020100ULL, 'correct load bottom value')
+
+-- Another reproducer, based on snapshot restoration.
+-- Its advantage is that it avoids FFI usage.
+
+-- Snapshot slots are restored in reverse order.
+-- The recording order is the following (from the bottom of the
+-- trace to the top):
+-- - 0th  (ofs == -16) -- `f63()` replaced the `tail63()` on the
+--                         stack,
+-- - 63rd (ofs == 488) -- 1,
+-- - 64th (ofs == 496) -- 2.
+-- At recording, the instructions for the 0th and 63rd slots are
+-- merged as follows:
+-- | str   x3, [x19, #496]
+-- | stp   x2, x1, [x19, #488]
+-- The first store is overwritten by the stp, so the restored
+-- value is incorrect.
+
+-- Function with 63 slots on the stack.
+local function f63()
+  -- 61 unused slots to avoid extra stores in between.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _
+  return 1, 2
+end
+
+local function tail63()
+  return f63()
+end
+
+-- Record the trace.
+tail63()
+tail63()
+-- Run the trace.
+local one, two = tail63()
+test:is(one, 1, 'correct 1st value on stack')
+test:is(two, 2, 'correct 2nd value on stack')
+
+test:done(true)
--------------5eDmUa5Sg1Ci0j6NDhQOzj80--