From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 022A913D3ECD; Tue, 10 Jun 2025 18:29:54 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 022A913D3ECD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1749569394; bh=rGkoqLhA/89brPhSXwHSJPeMVZpd/e84199kje27//w=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=jhWrp1pF1a45eCV960aNnWB6Acr8KqZ5yDbzTZe9PcnFh9WHlZ/NFCOw7ZTMv1hIZ XWIFqjY3xGkdt9eqz8sDWp/OiE4BRi8CkkvWT+QwNXLtDU5YIoC1va4aQWTs2/fyXU byH/y7NPnHjdMYo0S4xQohTOb+CjpQ1qg0ds6PpU= Received: from send106.i.mail.ru (send106.i.mail.ru [89.221.237.201]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 5497A13D3ECC for ; Tue, 10 Jun 2025 18:29:53 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 5497A13D3ECC Received: by exim-smtp-85b97957d7-k2ggr with esmtpa (envelope-from ) id 1uP0vA-00000000GAe-1Ysl; Tue, 10 Jun 2025 18:29:52 +0300 Content-Type: multipart/alternative; boundary="------------JzxHBju0ZrSUw0tjySqYWuGB" Message-ID: <17a7c30a-246b-40f1-a63c-a6e09beffdc6@tarantool.org> Date: Tue, 10 Jun 2025 18:29:52 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org References: <7644f7c143f38426718039d1fefb6626335bf10b.1749550966.git.skaplun@tarantool.org> Content-Language: en-US In-Reply-To: <7644f7c143f38426718039d1fefb6626335bf10b.1749550966.git.skaplun@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD909B82214CC98891796DBD297D1E2ABAC76D46466E08ED11800894C459B0CD1B9AD0C06FB3E4EDECF479CDAE959BF6424EDCEF379AD6CD4F49153532B97545997D64C754CAC7C657B X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7AC4684DF4EC4B256EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637AC83A81C8FD4AD23D82A6BABE6F325AC2E85FA5F3EDFCBAA7353EFBB553375668906014A252870FC1B00AD6496ECBB36DB849BA886C290899E7D9732AEDFFA7A389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C0A29E2F051442AF778941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B6E5E764EB5D94DBD4CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C22490F250D17497FEF6176E601842F6C81A12EF20D2F80756B5FB606B96278B59C4276E601842F6C81A127C277FBC8AE2E8B8DBB596EC94336063AA81AA40904B5D99C9F4D5AE37F343AD1F44FA8B9022EA23BBE47FD9DD3FB595F5C1EE8F4F765FC72CEEB2601E22B093A03B725D353964B0B7D0EA88DDEDAC722CA9DD8327EE4930A3850AC1BE2E735A3CCBC2573AEBDE1C4224003CC83647689D4C264860C145E X-C1DE0DAB: 0D63561A33F958A55CEC134D626CFFF55002B1117B3ED696B5F8FF8300385F9DD57BAD45EC4C5DE1823CB91A9FED034534781492E4B8EEADD0953842B444AAC3BDAD6C7F3747799A X-C8649E89: 1C3962B70DF3F0ADBF74143AD284FC7177DD89D51EBB7742424CF958EAFF5D571004E42C50DC4CA955A7F0CF078B5EC49A30900B95165D3490011A262ADAEDFE2C64FAB6FA86C551EEF379F8A5285EA778DD5644035B699496BA6CD87F5B86851D7E09C32AA3244CA0B5158FE771DA6777DD89D51EBB7742F1507E8CC3CF8F84EA455F16B58544A2E30DDF7C44BCB90DA5AE236DF995FB59978A700BF655EAEEED6A17656DB59BCAD427812AF56FC65B X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu53w8ahmwBjZKM/YPHZyZHvz5uv+WouB9+ObcCpyrx6l7KImUglyhkEat/+ysWwi0gdhEs0JGjl6ggRWTy1haxBpVdbIX1nthFXMZebaIdHP2ghjoIc/363UZI6Kf1ptIMVSykAyseJQ6/lhxvlc8I83U= X-Mailru-Sender: 520A125C2F17F0B1E52FEF5D219D6140AD0C06FB3E4EDECF479CDAE959BF642455438DF290242CF40152A3D17938EB451EB5A0BCEC6A560B3DDE9B364B0DF289BE2DA36745F2EEB5CEBA01FB949A1F1EEAB4BC95F72C04283CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to stack overflow. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Bronnikov via Tarantool-patches Reply-To: Sergey Bronnikov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" This is a multi-part message in MIME format. --------------JzxHBju0ZrSUw0tjySqYWuGB Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Hello, Sergey! Thanks for the patch! LGTM with a minor below. Sergey On 6/10/25 13:28, Sergey Kaplun wrote: > From: Mike Pall > > Reported by pwnhacker0x18. Fixed by Peter Cawley. > > (cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360) > > `lj_snap_restore()` restores the PC for the inner cframe, but not the > outer (before the protected call to the `trace_exit_cp()`). If the stack > overflow is observed during the further snapshot restoration, it doesn't > fix up the outer cframe's PC. After that, in the following error > rethrowing from the right C frame, in case of error handler set, the > stack overflow error may be raised again, and with an incorrect value of > the PC for that frame, it leads to the crash in the `debug_framepc()`. > > This patch prevents it by inserting the special pseudo-valid value `L`. > Unfortunately, this leads to the uninitialized reads by the > `debug_framepc()` (by the address `L - 4`), if the error handler > observes the resulted PC. This will be fixed in the next patch. > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#11278 > --- > src/lj_trace.c | 4 +- > .../lj-1196-partial-snap-restore.test.lua | 51 +++++++++++++++++++ > 2 files changed, 54 insertions(+), 1 deletion(-) > create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua > > diff --git a/src/lj_trace.c b/src/lj_trace.c > index 0d1d233a..8a18d3cf 100644 > --- a/src/lj_trace.c > +++ b/src/lj_trace.c > @@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr) > exd.J = J; > exd.exptr = exptr; > errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp); > - if (errcode) > + if (errcode) { > + setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */ > return -errcode; /* Return negated error code. */ > + } > > if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */ > > diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua > new file mode 100644 > index 00000000..8ee8f673 > --- /dev/null > +++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua > @@ -0,0 +1,51 @@ > +local tap = require('tap') > + > +-- Test file to demonstrate LuaJIT crash during snapshot restore > +-- in case of the stack overflow. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1196. > + > +local test = tap.test('lj-1196-partial-snap-restore') > + > +test:plan(1) > + > +-- XXX: The reproducer below uses several stack slot offsets to > +-- make sure that stack overflow happens during the snapshot > +-- restoration and not the call to the stitched function and > +-- return its result. The actual stack size should be less than > +-- `LJ_STACK_MAXEX`, but with requested space it should be greater > +-- than `LJ_STACK_MAX`, see for the details. > +-- Before that, the `lj_snap_restore()` restores the `pc` for the > +-- inner cframe, but not the outer (before the protected call to > +-- the `trace_exit_cp()`). Thus, the further error rethrowing from > +-- the right C frame leads to the crash before the patch. > + > +-- XXX: Simplify the `jit.dump()` output. > +local tonumber = tonumber > + > +-- This function starts the first trace. > +local function recursive_f() > + -- Function with the single result to cause the trace stitching. > + tonumber('') > + -- Prereserved stack space before the call. > + -- luacheck: no unused > + local _, _, _, _, _, _, _, _, _, _, _ > + -- Link from the stitched trace to the parent one. > + recursive_f() > + -- Additional stack required for the snapshot restoration. /stack/stack space/? > + -- luacheck: no unused > + local _, _, _ > +end > + > +-- Use coroutine wrap for the fixed stack size at the start. > +coroutine.wrap(function() > + -- XXX: Special stack slot offset. > + -- luacheck: no unused > + local _, _, _, _, _, _, _, _, _, _ > + -- The error is observed only if we have the error handler set, > + -- since we try to resize stack for its call. > + xpcall(recursive_f, function() end) > +end)() > + > +test:ok(true, 'no crash during snapshot restoring') > + > +test:done(true) --------------JzxHBju0ZrSUw0tjySqYWuGB Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit

Hello, Sergey!

Thanks for the patch! LGTM with a minor below.

Sergey

On 6/10/25 13:28, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported by pwnhacker0x18. Fixed by Peter Cawley.

(cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360)

`lj_snap_restore()` restores the PC for the inner cframe, but not the
outer (before the protected call to the `trace_exit_cp()`). If the stack
overflow is observed during the further snapshot restoration, it doesn't
fix up the outer cframe's PC. After that, in the following error
rethrowing from the right C frame, in case of error handler set, the
stack overflow error may be raised again, and with an incorrect value of
the PC for that frame, it leads to the crash in the `debug_framepc()`.

This patch prevents it by inserting the special pseudo-valid value `L`.
Unfortunately, this leads to the uninitialized reads by the
`debug_framepc()` (by the address `L - 4`), if the error handler
observes the resulted PC. This will be fixed in the next patch.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11278
---
 src/lj_trace.c                                |  4 +-
 .../lj-1196-partial-snap-restore.test.lua     | 51 +++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua

diff --git a/src/lj_trace.c b/src/lj_trace.c
index 0d1d233a..8a18d3cf 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
   exd.J = J;
   exd.exptr = exptr;
   errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
-  if (errcode)
+  if (errcode) {
+    setcframe_pc(cframe_raw(L->cframe), L);  /* Point to any valid memory. */
     return -errcode;  /* Return negated error code. */
+  }
 
   if (exitcode) copyTV(L, L->top++, &exiterr);  /* Anchor the error object. */
 
diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
new file mode 100644
index 00000000..8ee8f673
--- /dev/null
+++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
@@ -0,0 +1,51 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT crash during snapshot restore
+-- in case of the stack overflow.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196.
+
+local test = tap.test('lj-1196-partial-snap-restore')
+
+test:plan(1)
+
+-- XXX: The reproducer below uses several stack slot offsets to
+-- make sure that stack overflow happens during the snapshot
+-- restoration and not the call to the stitched function and
+-- return its result. The actual stack size should be less than
+-- `LJ_STACK_MAXEX`, but with requested space it should be greater
+-- than `LJ_STACK_MAX`, see <src/lj_state.c> for the details.
+-- Before that, the `lj_snap_restore()` restores the `pc` for the
+-- inner cframe, but not the outer (before the protected call to
+-- the `trace_exit_cp()`). Thus, the further error rethrowing from
+-- the right C frame leads to the crash before the patch.
+
+-- XXX: Simplify the `jit.dump()` output.
+local tonumber = tonumber
+
+-- This function starts the first trace.
+local function recursive_f()
+  -- Function with the single result to cause the trace stitching.
+  tonumber('')
+  -- Prereserved stack space before the call.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _, _
+  -- Link from the stitched trace to the parent one.
+  recursive_f()
+  -- Additional stack required for the snapshot restoration.
/stack/stack space/?
+  -- luacheck: no unused
+  local _, _, _
+end
+
+-- Use coroutine wrap for the fixed stack size at the start.
+coroutine.wrap(function()
+  -- XXX: Special stack slot offset.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  -- The error is observed only if we have the error handler set,
+  -- since we try to resize stack for its call.
+  xpcall(recursive_f, function() end)
+end)()
+
+test:ok(true, 'no crash during snapshot restoring')
+
+test:done(true)
--------------JzxHBju0ZrSUw0tjySqYWuGB--