From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 55050EAD359; Thu, 6 Jun 2024 16:03:14 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 55050EAD359 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1717678994; bh=yfOOwoAAyESXzex7QR4aqO3o7e9UAFDfom4LRLI8gwc=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=xBOSREoKLwHc4Lj0rjWsjPp0SuQEyvg8mzVk29WUYtnzseTcXbQ1QXkcJXRlIkN7g kqpKHCBFjtgQCvJ3NPfxIO4+TJj+HWD474e/X1DOuhFIU8RQR6flW4cz7tIwsbd/K2 vzfFx8M4SEuoYoMRucOLHavvIVAOsW0rd1b2JSX4= Received: from smtp41.i.mail.ru (smtp41.i.mail.ru [95.163.41.64]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id C4F91EAD347 for ; Thu, 6 Jun 2024 16:03:12 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org C4F91EAD347 Received: by smtp41.i.mail.ru with esmtpa (envelope-from ) id 1sFClr-00000008lk7-2g8E; Thu, 06 Jun 2024 16:03:12 +0300 Content-Type: multipart/alternative; boundary="------------dIs3UotUFcxF3D0mjW1rE00c" Message-ID: Date: Thu, 6 Jun 2024 16:03:11 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: Sergey Kaplun , Maxim Kokryashkin Cc: tarantool-patches@dev.tarantool.org References: <362ec5c556a9e9832726ec89978ef25ba865af41.1713773432.git.skaplun@tarantool.org> In-Reply-To: <362ec5c556a9e9832726ec89978ef25ba865af41.1713773432.git.skaplun@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD95EAEBA1571A635001E90AB4BED9F1D60251E963CC1517F27182A05F538085040BA4B2203C2D77393C7A4B249DE6549FDAA6F3410C682760B8EB32B6104382CE673A11E6679D5798F X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7DC6E0FF9A9D6BE87EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006373F5C94DB68DC8FE9EA1F7E6F0F101C6723150C8DA25C47586E58E00D9D99D84E1BDDB23E98D2D38B043BF0FB74779F3642EE9C7CE83C11A3EA341E86EAA08BC6DDE172C9881A3473A471835C12D1D9774AD6D5ED66289B5278DA827A17800CE74A95F4E53E8DCE969FA2833FD35BB23D2EF20D2F80756B5F868A13BD56FB6657A471835C12D1D977725E5C173C3A84C3E478A468B35FE767117882F4460429728AD0CFFFB425014E868A13BD56FB6657D81D268191BDAD3DC09775C1D3CA48CF42F9E9C69CF8A4C3BA3038C0950A5D36C8A9BA7A39EFB766D91E3A1F190DE8FDBA3038C0950A5D36D5E8D9A59859A8B6FFBBB8024B6E06FE76E601842F6C81A1F004C906525384303E02D724532EE2C3F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB33AC447995A7AD1857739F23D657EF2BD5E8D9A59859A8B60425ED1B70270490089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: 0D63561A33F958A51F4CBF80F623EA6D5002B1117B3ED696142132E798CC8FEBE772F934B9BCD185823CB91A9FED034534781492E4B8EEAD64C12243F54E5AB4 X-C8649E89: 1C3962B70DF3F0ADE00A9FD3E00BEEDF3FED46C3ACD6F73ED3581295AF09D3DF87807E0823442EA2ED31085941D9CD0AF7F820E7B07EA4CFA9749A7A0740BC3395BFFB336E7D61B9B7574D1C4FD3C0A4E054EB9F6A063B8F24DE828755D94EE900094467C1BF539C39DC0F5E819001CD03D5A3B59F1934F681E2A20C6388942C5F4332CA8FE04980913E6812662D5F2AB9AF64DB4688768036DF5FE9C0001AF333F2C28C22F508233FCF178C6DD14203 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj3qkQfioj9XQcyoT9UjXOEw== X-Mailru-Sender: 520A125C2F17F0B1E52FEF5D219D61401916AEF4193A90AC0B44F62A7352308B6038D8CA0C83B4A10152A3D17938EB451EB5A0BCEC6A560B3DDE9B364B0DF289BE2DA36745F2EEB5CEBA01FB949A1F1EEAB4BC95F72C04283CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit v1 4/5] Handle all types of errors during trace stitching. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Bronnikov via Tarantool-patches Reply-To: Sergey Bronnikov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" This is a multi-part message in MIME format. --------------dIs3UotUFcxF3D0mjW1rE00c Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi, Sergey, thanks for the patch! Please see my comments. On 22.04.2024 11:49, Sergey Kaplun wrote: > From: Mike Pall > > Thanks to Sergey Kaplun and Peter Cawley. > > (cherry picked from commit d06beb0480c5d1eb53b3343e78063950275aa281) > > This commit is a follow-up for the commit > 1b8216023d5a79814389f1c1affef27c15d9de27 ("Throw any errors before stack > changes in trace stitching."). The patch prepends failures for the > specific error to be thrown. Nevertheless, the error may be thrown due > to retrying trace recording in the case when table bump optimization > is enabled or when OOM is observed during reallocation of the snapshot > or IR buffers. > > This patch adds the corresponding protected frame and rethrows the error > after a fixup of the stack. > > This patch also tests the correctness of copying the error message to > the top of the stack to get a valid "abort" reason in the `jit.dump` > utility. > > Also, this patch fixes a non-ASCII space character in the comment for > . > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#9924 > --- > src/lj_ffrecord.c | 21 ++++++-- > test/tarantool-tests/CMakeLists.txt | 1 + > .../lj-1166-error-stitch-oom-ir-buff.test.lua | 46 ++++++++++++++++ > ...j-1166-error-stitch-oom-snap-buff.test.lua | 54 +++++++++++++++++++ > .../lj-1166-error-stitch-table-bump.test.lua | 38 +++++++++++++ > .../lj-1166-error-stitch/CMakeLists.txt | 1 + > .../lj-1166-error-stitch/mockalloc.c | 51 ++++++++++++++++++ > .../lj-720-errors-before-stitch.test.lua | 40 +++++++++++++- > 8 files changed, 245 insertions(+), 7 deletions(-) > create mode 100644 test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua > create mode 100644 test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua > create mode 100644 test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua > create mode 100644 test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt > create mode 100644 test/tarantool-tests/lj-1166-error-stitch/mockalloc.c > > diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c > index e3ed80fb..ff14e9e4 100644 > --- a/src/lj_ffrecord.c > +++ b/src/lj_ffrecord.c > @@ -96,6 +96,14 @@ static ptrdiff_t results_wanted(jit_State *J) > return -1; > } > > +static TValue *rec_stop_stitch_cp(lua_State *L, lua_CFunction dummy, void *ud) > +{ > + jit_State *J = (jit_State *)ud; > + lj_record_stop(J, LJ_TRLINK_STITCH, 0); > + UNUSED(L); UNUSED(dummy); > + return NULL; > +} > + > /* Trace stitching: add continuation below frame to start a new trace. */ > static void recff_stitch(jit_State *J) > { > @@ -106,10 +114,7 @@ static void recff_stitch(jit_State *J) > TValue *nframe = base + 1 + LJ_FR2; > const BCIns *pc = frame_pc(base-1); > TValue *pframe = frame_prevl(base-1); > - > - /* Check for this now. Throwing in lj_record_stop messes up the stack. */ > - if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap]) > - lj_trace_err(J, LJ_TRERR_SNAPOV); > + int errcode; > > /* Move func + args up in Lua stack and insert continuation. */ > memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot); > @@ -134,13 +139,19 @@ static void recff_stitch(jit_State *J) > J->baseslot += 2 + LJ_FR2; > J->framedepth++; > > - lj_record_stop(J, LJ_TRLINK_STITCH, 0); > + errcode = lj_vm_cpcall(L, NULL, J, rec_stop_stitch_cp); > > /* Undo Lua stack changes. */ > memmove(&base[-1-LJ_FR2], &base[1], sizeof(TValue)*nslot); > setframe_pc(base-1, pc); > L->base -= 2 + LJ_FR2; > L->top -= 2 + LJ_FR2; > + > + if (errcode) { > + if (errcode == LUA_ERRRUN) > + copyTV(L, L->top-1, L->top + (1 + LJ_FR2)); > + lj_err_throw(L, errcode); /* Propagate errors. */ > + } > } > > /* Fallback handler for fast functions that are not recorded (yet). */ > diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt > index 56660932..d7c96078 100644 > --- a/test/tarantool-tests/CMakeLists.txt > +++ b/test/tarantool-tests/CMakeLists.txt > @@ -39,6 +39,7 @@ add_subdirectory(lj-802-panic-at-mcode-protfail) > add_subdirectory(lj-flush-on-trace) > add_subdirectory(lj-1004-oom-error-frame) > add_subdirectory(lj-1066-fix-cur_L-after-coroutine-resume) > +add_subdirectory(lj-1166-error-stitch) > > # The part of the memory profiler toolchain is located in tools > # directory, jit, profiler, and bytecode toolchains are located > diff --git a/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua > new file mode 100644 > index 00000000..e3a5397d > --- /dev/null > +++ b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua > @@ -0,0 +1,46 @@ > +local tap = require('tap') > + > +-- Test file to demonstrate unbalanced Lua stack after instruction > +-- recording due to throwing an error at recording of a stitched > +-- function. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1166. > + > +local test = tap.test('lj-1166-error-stitch-oom-snap-buff'):skipcond({ should a name in tap.test match to test file name? now it is not. > + ['Test requires JIT enabled'] = not jit.status(), > + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD', > +}) > + > +test:plan(1) > + > +local mockalloc = require('mockalloc') > + > +local function create_chunk(n_slots) I would add a comment like this: --- a/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua +++ b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua @@ -14,6 +14,18 @@ test:plan(1)  local mockalloc = require('mockalloc') +-- Generate a Lua chunk like below: +-- local s1 +-- local s2 +-- ... +-- local sN +-- for i = 1, 2 do +--   s1 = i + 1 +--   s2 = i + 2 +--   ... +--   sN = i + N +--   math.modf(1) +-- end  local function create_chunk(n_slots)    local chunk = ''    for i = 1, n_slots do > + local chunk = '' > + for i = 1, n_slots do > + chunk = chunk .. ('local s%d\n'):format(i) > + end > + chunk = chunk .. 'for i = 1, 2 do\n' > + -- Generate additional IR instructions. > + for i = 1, n_slots do > + chunk = chunk .. (' s%d = i + %d\n'):format(i, i) > + end > + -- `math.modf()` recording is NYI. > + chunk = chunk .. ' math.modf(1)\n' > + chunk = chunk .. 'end\n' > + return chunk > +end > + > +-- XXX: amount of slots is empirical. > +local tracef = assert(loadstring(create_chunk(175))) > + > +jit.opt.start('hotloop=1', '-loop', '-fold') > + > +mockalloc.mock() > + > +tracef() > + > +mockalloc.unmock() > + > +test:ok(true, 'stack is balanced') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua b/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua > new file mode 100644 > index 00000000..8d671f8d > --- /dev/null > +++ b/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua > @@ -0,0 +1,54 @@ > +local tap = require('tap') > + > +-- Test file to demonstrate unbalanced Lua stack after instruction > +-- recording due to throwing an error at recording of a stitched > +-- function. > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1166. > + > +local test = tap.test('lj-1166-error-stitch-oom-snap-buff'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD', > +}) > + > +test:plan(1) > + > +local mockalloc = require('mockalloc') > + > +local function create_chunk(n_conds) the same as above: please add a comment with an example of generated Lua chunk > + local chunk = '' > + chunk = chunk .. 'for i = 1, 2 do\n' > + -- Each condition adds additional snapshot. > + for i = 1, n_conds do > + chunk = chunk .. (' if i < %d then end\n'):format(i + n_conds) > + end > + -- `math.modf()` recording is NYI. > + chunk = chunk .. ' math.modf(1)\n' > + chunk = chunk .. 'end\n' > + return chunk > +end > + > +-- XXX: Need to compile the cycle in the `create_chunk()` to > +-- preallocate the snapshot buffer. > +jit.opt.start('hotloop=1', '-loop', '-fold') > + > +-- XXX: Amount of slots is empirical. > +local tracef = assert(loadstring(create_chunk(6))) > + > +-- XXX: Remove previous trace. > +jit.off() > +jit.flush() > + > +-- XXX: Update hotcounts to avoid hash collisions. > +jit.opt.start('hotloop=1') > + > +jit.on() > + > +mockalloc.mock() > + > +tracef() > + > +mockalloc.unmock() > + > +test:ok(true, 'stack is balanced') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua b/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua > new file mode 100644 > index 00000000..f2453bbe > --- /dev/null > +++ b/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua this test is not failed after reverting patch > @@ -0,0 +1,38 @@ > +local tap = require('tap') > + > +-- Test file to demonstrate unbalanced Lua stack after instruction > +-- recording due to throwing an error at recording of a stitched > +-- function. The test fails with LUAJIT_ENABLE_TABLE_BUMP enabled. > +-- See also: > +-- *https://github.com/LuaJIT/LuaJIT/issues/606, > +-- *https://github.com/LuaJIT/LuaJIT/issues/1166. > + > +local test = tap.test('lj-1166-error-stitch-table-bump'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(1) > + > +-- `math.modf` recording is NYI. > +-- Local `modf` simplifies `jit.dump()` output. > +local modf = math.modf > + > +jit.opt.start('hotloop=1') > + > +-- luacheck: no unused > +local t > +-- There is no need to run the trace itself. Just check the > +-- correctness of a recording. > +for i = 1, 2 do > + t = {} > + -- Cause table rehashing to trigger table bump optimization. > + t[i] = i > + -- Forcify stitch. This will throw an error at the end of > + -- recording, since trace recording should be retried after > + -- bytecode updating. > + modf(1) > +end > + > +test:ok(true, 'stack is balanced') > + > +test:done(true) > diff --git a/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt b/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt > new file mode 100644 > index 00000000..1ebf253b > --- /dev/null > +++ b/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt > @@ -0,0 +1 @@ > +BuildTestCLib(mockalloc mockalloc.c) > diff --git a/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c b/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c > new file mode 100644 > index 00000000..d6d3492e > --- /dev/null > +++ b/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c > @@ -0,0 +1,51 @@ > +#include "lua.h" > +#include "lauxlib.h" > + > +#undef NDEBUG > +#include > + > +static lua_Alloc old_allocf = NULL; > +static void *old_alloc_state = NULL; > + > +/* Function to be used instead of the default allocator. */ > +static void *mock_allocf(void *ud, void *ptr, size_t osize, size_t nsize) > +{ > + assert(old_allocf != NULL); > + /* > + * Check the specific reallocation related to the IR > + * buffer or the snapshot buffer. > + */ > + if (osize * 2 == nsize) > + return NULL; > + return old_allocf(ud, ptr, osize, nsize); > +} > + > +static int mock(lua_State *L) It is actually not a test mock. According to definition [1] test mock imitate a behavior of a real object. Your memory allocator behaves as a real allocator, but in some cases it will return a NULL instead of memory address. What if we rename "mock" to "allocator with fault injection"? 1. https://www.martinfowler.com/articles/mocksArentStubs.html > +{ > + assert(old_allocf == NULL); > + old_allocf = lua_getallocf(L, &old_alloc_state); > + lua_setallocf(L, mock_allocf, old_alloc_state); > + return 0; > +} > + > +static int unmock(lua_State *L) > +{ > + assert(old_allocf != NULL); > + assert(old_allocf != mock_allocf); > + lua_setallocf(L, old_allocf, old_alloc_state); > + old_allocf = NULL; > + old_alloc_state = NULL; > + return 0; > +} > + > +static const struct luaL_Reg mockalloc[] = { > + {"mock", mock}, > + {"unmock", unmock}, > + {NULL, NULL} > +}; > + > +LUA_API int luaopen_mockalloc(lua_State *L) > +{ > + luaL_register(L, "mockalloc", mockalloc); > + return 1; > +} > diff --git a/test/tarantool-tests/lj-720-errors-before-stitch.test.lua b/test/tarantool-tests/lj-720-errors-before-stitch.test.lua > index d750b721..6e8f70c2 100644 > --- a/test/tarantool-tests/lj-720-errors-before-stitch.test.lua > +++ b/test/tarantool-tests/lj-720-errors-before-stitch.test.lua > @@ -1,13 +1,27 @@ > local tap = require('tap') > local test = tap.test('lj-720-errors-before-stitch'):skipcond({ > ['Test requires JIT enabled'] = not jit.status(), > + ['Disabled on *BSD due to #4819'] = jit.os == 'BSD', > }) > -test:plan(1) > > --- `math.modf` recording is NYI. > +local jparse = require('utils').jit.parse > + > +-- `math.modf` recording is NYI. > -- Local `modf` simplifies `jit.dump()` output. > local modf = math.modf > + > +-- XXX: Avoid other traces compilation due to hotcount collisions > +-- for predictable results. > +jit.off() > +jit.flush() > + > +test:plan(2) > + > +-- We only need the abort reason in the test. > +jparse.start('t') > + > jit.opt.start('hotloop=1', 'maxsnap=1') > +jit.on() > > -- The loop has only two iterations: the first to detect its > -- hotness and the second to record it. The snapshot limit is > @@ -17,5 +31,27 @@ for _ = 1, 2 do > modf(1.2) > end > > +local _, aborted_traces = jparse.finish() > + > +jit.off() > + > test:ok(true, 'stack is balanced') > + > +-- Tarantool may compile traces on the startup. These traces > +-- already exceed the maximum snapshot amount we set after they > +-- are compiled. Hence, there is no need to reallocate the > +-- snapshot buffer, so the check for the snap size is not > +-- triggered. > +test:skipcond({ > + -- luacheck: no global > + ['Impossible to predict the number of snapshots for Tarantool'] = _TARANTOOL, > +}) > + > +assert(aborted_traces and aborted_traces[1], 'aborted trace is persisted') > + > +-- We tried to compile only one trace. > +local reason = aborted_traces[1][1].abort_reason > + > +test:like(reason, 'too many snapshots', 'abort reason is correct') > + > test:done(true) --------------dIs3UotUFcxF3D0mjW1rE00c Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Hi, Sergey,


thanks for the patch! Please see my comments.

On 22.04.2024 11:49, Sergey Kaplun wrote:
From: Mike Pall <mike>

Thanks to Sergey Kaplun and Peter Cawley.

(cherry picked from commit d06beb0480c5d1eb53b3343e78063950275aa281)

This commit is a follow-up for the commit
1b8216023d5a79814389f1c1affef27c15d9de27 ("Throw any errors before stack
changes in trace stitching."). The patch prepends failures for the
specific error to be thrown. Nevertheless, the error may be thrown due
to retrying trace recording in the case when table bump optimization
is enabled or when OOM is observed during reallocation of the snapshot
or IR buffers.

This patch adds the corresponding protected frame and rethrows the error
after a fixup of the stack.

This patch also tests the correctness of copying the error message to
the top of the stack to get a valid "abort" reason in the `jit.dump`
utility.

Also, this patch fixes a non-ASCII space character in the comment for
<lj-720-errors-before-stitch.test.lua>.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9924
---
 src/lj_ffrecord.c                             | 21 ++++++--
 test/tarantool-tests/CMakeLists.txt           |  1 +
 .../lj-1166-error-stitch-oom-ir-buff.test.lua | 46 ++++++++++++++++
 ...j-1166-error-stitch-oom-snap-buff.test.lua | 54 +++++++++++++++++++
 .../lj-1166-error-stitch-table-bump.test.lua  | 38 +++++++++++++
 .../lj-1166-error-stitch/CMakeLists.txt       |  1 +
 .../lj-1166-error-stitch/mockalloc.c          | 51 ++++++++++++++++++
 .../lj-720-errors-before-stitch.test.lua      | 40 +++++++++++++-
 8 files changed, 245 insertions(+), 7 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua
 create mode 100644 test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua
 create mode 100644 test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua
 create mode 100644 test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt
 create mode 100644 test/tarantool-tests/lj-1166-error-stitch/mockalloc.c

diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
index e3ed80fb..ff14e9e4 100644
--- a/src/lj_ffrecord.c
+++ b/src/lj_ffrecord.c
@@ -96,6 +96,14 @@ static ptrdiff_t results_wanted(jit_State *J)
     return -1;
 }
 
+static TValue *rec_stop_stitch_cp(lua_State *L, lua_CFunction dummy, void *ud)
+{
+  jit_State *J = (jit_State *)ud;
+  lj_record_stop(J, LJ_TRLINK_STITCH, 0);
+  UNUSED(L); UNUSED(dummy);
+  return NULL;
+}
+
 /* Trace stitching: add continuation below frame to start a new trace. */
 static void recff_stitch(jit_State *J)
 {
@@ -106,10 +114,7 @@ static void recff_stitch(jit_State *J)
   TValue *nframe = base + 1 + LJ_FR2;
   const BCIns *pc = frame_pc(base-1);
   TValue *pframe = frame_prevl(base-1);
-
-  /* Check for this now. Throwing in lj_record_stop messes up the stack. */
-  if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap])
-    lj_trace_err(J, LJ_TRERR_SNAPOV);
+  int errcode;
 
   /* Move func + args up in Lua stack and insert continuation. */
   memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot);
@@ -134,13 +139,19 @@ static void recff_stitch(jit_State *J)
   J->baseslot += 2 + LJ_FR2;
   J->framedepth++;
 
-  lj_record_stop(J, LJ_TRLINK_STITCH, 0);
+  errcode = lj_vm_cpcall(L, NULL, J, rec_stop_stitch_cp);
 
   /* Undo Lua stack changes. */
   memmove(&base[-1-LJ_FR2], &base[1], sizeof(TValue)*nslot);
   setframe_pc(base-1, pc);
   L->base -= 2 + LJ_FR2;
   L->top -= 2 + LJ_FR2;
+
+  if (errcode) {
+    if (errcode == LUA_ERRRUN)
+      copyTV(L, L->top-1, L->top + (1 + LJ_FR2));
+    lj_err_throw(L, errcode);  /* Propagate errors. */
+  }
 }
 
 /* Fallback handler for fast functions that are not recorded (yet). */
diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
index 56660932..d7c96078 100644
--- a/test/tarantool-tests/CMakeLists.txt
+++ b/test/tarantool-tests/CMakeLists.txt
@@ -39,6 +39,7 @@ add_subdirectory(lj-802-panic-at-mcode-protfail)
 add_subdirectory(lj-flush-on-trace)
 add_subdirectory(lj-1004-oom-error-frame)
 add_subdirectory(lj-1066-fix-cur_L-after-coroutine-resume)
+add_subdirectory(lj-1166-error-stitch)
 
 # The part of the memory profiler toolchain is located in tools
 # directory, jit, profiler, and bytecode toolchains are located
diff --git a/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua
new file mode 100644
index 00000000..e3a5397d
--- /dev/null
+++ b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua
@@ -0,0 +1,46 @@
+local tap = require('tap')
+
+-- Test file to demonstrate unbalanced Lua stack after instruction
+-- recording due to throwing an error at recording of a stitched
+-- function.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1166.
+
+local test = tap.test('lj-1166-error-stitch-oom-snap-buff'):skipcond({

should a name in tap.test match to test file name?

now it is not.

+  ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
+})
+
+test:plan(1)
+
+local mockalloc = require('mockalloc')
+
+local function create_chunk(n_slots)

I would add a comment like this:


--- a/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua
+++ b/test/tarantool-tests/lj-1166-error-stitch-oom-ir-buff.test.lua
@@ -14,6 +14,18 @@ test:plan(1)
 
 local mockalloc = require('mockalloc')
 
+-- Generate a Lua chunk like below:
+-- local s1
+-- local s2
+-- ...
+-- local sN
+-- for i = 1, 2 do
+--   s1 = i + 1
+--   s2 = i + 2
+--   ...
+--   sN = i + N
+--   math.modf(1)
+-- end
 local function create_chunk(n_slots)
   local chunk = ''
   for i = 1, n_slots do


+  local chunk = ''
+  for i = 1, n_slots do
+    chunk = chunk .. ('local s%d\n'):format(i)
+  end
+  chunk = chunk .. 'for i = 1, 2 do\n'
+  -- Generate additional IR instructions.
+  for i = 1, n_slots do
+    chunk = chunk .. ('  s%d = i + %d\n'):format(i, i)
+  end
+  -- `math.modf()` recording is NYI.
+  chunk = chunk .. '  math.modf(1)\n'
+  chunk = chunk .. 'end\n'
+  return chunk
+end
+
+-- XXX: amount of slots is empirical.
+local tracef = assert(loadstring(create_chunk(175)))
+
+jit.opt.start('hotloop=1', '-loop', '-fold')
+
+mockalloc.mock()
+
+tracef()
+
+mockalloc.unmock()
+
+test:ok(true, 'stack is balanced')
+
+test:done(true)
diff --git a/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua b/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua
new file mode 100644
index 00000000..8d671f8d
--- /dev/null
+++ b/test/tarantool-tests/lj-1166-error-stitch-oom-snap-buff.test.lua
@@ -0,0 +1,54 @@
+local tap = require('tap')
+
+-- Test file to demonstrate unbalanced Lua stack after instruction
+-- recording due to throwing an error at recording of a stitched
+-- function.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1166.
+
+local test = tap.test('lj-1166-error-stitch-oom-snap-buff'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
+})
+
+test:plan(1)
+
+local mockalloc = require('mockalloc')
+
+local function create_chunk(n_conds)
the same as above: please add a comment with an example of generated Lua chunk
+  local chunk = ''
+  chunk = chunk .. 'for i = 1, 2 do\n'
+  -- Each condition adds additional snapshot.
+  for i = 1, n_conds do
+    chunk = chunk .. ('  if i < %d then end\n'):format(i + n_conds)
+  end
+  -- `math.modf()` recording is NYI.
+  chunk = chunk .. '  math.modf(1)\n'
+  chunk = chunk .. 'end\n'
+  return chunk
+end
+
+-- XXX: Need to compile the cycle in the `create_chunk()` to
+-- preallocate the snapshot buffer.
+jit.opt.start('hotloop=1', '-loop', '-fold')
+
+-- XXX: Amount of slots is empirical.
+local tracef = assert(loadstring(create_chunk(6)))
+
+-- XXX: Remove previous trace.
+jit.off()
+jit.flush()
+
+-- XXX: Update hotcounts to avoid hash collisions.
+jit.opt.start('hotloop=1')
+
+jit.on()
+
+mockalloc.mock()
+
+tracef()
+
+mockalloc.unmock()
+
+test:ok(true, 'stack is balanced')
+
+test:done(true)
diff --git a/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua b/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua
new file mode 100644
index 00000000..f2453bbe
--- /dev/null
+++ b/test/tarantool-tests/lj-1166-error-stitch-table-bump.test.lua

this test is not failed after reverting patch


@@ -0,0 +1,38 @@
+local tap = require('tap')
+
+-- Test file to demonstrate unbalanced Lua stack after instruction
+-- recording due to throwing an error at recording of a stitched
+-- function. The test fails with LUAJIT_ENABLE_TABLE_BUMP enabled.
+-- See also:
+-- * https://github.com/LuaJIT/LuaJIT/issues/606,
+-- * https://github.com/LuaJIT/LuaJIT/issues/1166.
+
+local test = tap.test('lj-1166-error-stitch-table-bump'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+-- `math.modf` recording is NYI.
+-- Local `modf` simplifies `jit.dump()` output.
+local modf = math.modf
+
+jit.opt.start('hotloop=1')
+
+-- luacheck: no unused
+local t
+-- There is no need to run the trace itself. Just check the
+-- correctness of a recording.
+for i = 1, 2 do
+  t = {}
+  -- Cause table rehashing to trigger table bump optimization.
+  t[i] = i
+  -- Forcify stitch. This will throw an error at the end of
+  -- recording, since trace recording should be retried after
+  -- bytecode updating.
+  modf(1)
+end
+
+test:ok(true, 'stack is balanced')
+
+test:done(true)
diff --git a/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt b/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt
new file mode 100644
index 00000000..1ebf253b
--- /dev/null
+++ b/test/tarantool-tests/lj-1166-error-stitch/CMakeLists.txt
@@ -0,0 +1 @@
+BuildTestCLib(mockalloc mockalloc.c)
diff --git a/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c b/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c
new file mode 100644
index 00000000..d6d3492e
--- /dev/null
+++ b/test/tarantool-tests/lj-1166-error-stitch/mockalloc.c
@@ -0,0 +1,51 @@
+#include "lua.h"
+#include "lauxlib.h"
+
+#undef NDEBUG
+#include <assert.h>
+
+static lua_Alloc old_allocf = NULL;
+static void *old_alloc_state = NULL;
+
+/* Function to be used instead of the default allocator. */
+static void *mock_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
+{
+	assert(old_allocf != NULL);
+	/*
+	 * Check the specific reallocation related to the IR
+	 * buffer or the snapshot buffer.
+	 */
+	if (osize * 2 == nsize)
+		return NULL;
+	return old_allocf(ud, ptr, osize, nsize);
+}
+
+static int mock(lua_State *L)


It is actually not a test mock.

According to definition [1] test mock imitate a behavior of a real object.

Your memory allocator behaves as a real allocator, but in some cases it will return

a NULL instead of memory address. What if we rename "mock" to "allocator with fault injection"?


1. https://www.martinfowler.com/articles/mocksArentStubs.html

+{
+	assert(old_allocf == NULL);
+	old_allocf = lua_getallocf(L, &old_alloc_state);
+	lua_setallocf(L, mock_allocf, old_alloc_state);
+	return 0;
+}
+
+static int unmock(lua_State *L)
+{
+	assert(old_allocf != NULL);
+	assert(old_allocf != mock_allocf);
+	lua_setallocf(L, old_allocf, old_alloc_state);
+	old_allocf = NULL;
+	old_alloc_state = NULL;
+	return 0;
+}
+
+static const struct luaL_Reg mockalloc[] = {
+	{"mock", mock},
+	{"unmock", unmock},
+	{NULL, NULL}
+};
+
+LUA_API int luaopen_mockalloc(lua_State *L)
+{
+	luaL_register(L, "mockalloc", mockalloc);
+	return 1;
+}
diff --git a/test/tarantool-tests/lj-720-errors-before-stitch.test.lua b/test/tarantool-tests/lj-720-errors-before-stitch.test.lua
index d750b721..6e8f70c2 100644
--- a/test/tarantool-tests/lj-720-errors-before-stitch.test.lua
+++ b/test/tarantool-tests/lj-720-errors-before-stitch.test.lua
@@ -1,13 +1,27 @@
 local tap = require('tap')
 local test = tap.test('lj-720-errors-before-stitch'):skipcond({
   ['Test requires JIT enabled'] = not jit.status(),
+  ['Disabled on *BSD due to #4819'] = jit.os == 'BSD',
 })
-test:plan(1)
 
--- `math.modf` recording is NYI.
+local jparse = require('utils').jit.parse
+
+-- `math.modf` recording is NYI.
 -- Local `modf` simplifies `jit.dump()` output.
 local modf = math.modf
+
+-- XXX: Avoid other traces compilation due to hotcount collisions
+-- for predictable results.
+jit.off()
+jit.flush()
+
+test:plan(2)
+
+-- We only need the abort reason in the test.
+jparse.start('t')
+
 jit.opt.start('hotloop=1', 'maxsnap=1')
+jit.on()
 
 -- The loop has only two iterations: the first to detect its
 -- hotness and the second to record it. The snapshot limit is
@@ -17,5 +31,27 @@ for _ = 1, 2 do
   modf(1.2)
 end
 
+local _, aborted_traces = jparse.finish()
+
+jit.off()
+
 test:ok(true, 'stack is balanced')
+
+-- Tarantool may compile traces on the startup. These traces
+-- already exceed the maximum snapshot amount we set after they
+-- are compiled. Hence, there is no need to reallocate the
+-- snapshot buffer, so the check for the snap size is not
+-- triggered.
+test:skipcond({
+  -- luacheck: no global
+  ['Impossible to predict the number of snapshots for Tarantool'] = _TARANTOOL,
+})
+
+assert(aborted_traces and aborted_traces[1], 'aborted trace is persisted')
+
+-- We tried to compile only one trace.
+local reason = aborted_traces[1][1].abort_reason
+
+test:like(reason, 'too many snapshots', 'abort reason is correct')
+
 test:done(true)
--------------dIs3UotUFcxF3D0mjW1rE00c--