[Tarantool-patches] [PATCH luajit] Ensure correct stack top for OOM error message.

Sergey Kaplun skaplun at tarantool.org
Fri Nov 11 15:18:40 MSK 2022


Hi, Igor!

Thanks for the review!
Welcome back to the ML! :)

Updated remote branch considering your comments.

On 11.11.22, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch!
> 
> On 09.11.22, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> > 
> > Reported by Sergey Kaplun.
> > 
> > (cherry picked from commit ca8d3257bb44e42100c7910c47dcdcf01f494187)
> > 
> > `lj_err_mem()` doesn't set up `L->top` for Lua frames, but uses it for
> 
> I believe it "doesn't set `L->top` for the current Lua frame, but...",
> doesn't it?

Fixed, thanks.

> 
> > pushing error message on the stack. So, when we call some routine that
> 
> Typo: s/some routine/a routine/ or s/some routine/arbitrary routine/.

Fixed.

> 
> > does some allocations, it can raise the OOM error (like `lj_tab_dup()`
> 
> Strictly saying it's LUA_ERRMEM, not OOM.

If you say about naming. Replaced.

> 
> > in `BC_TDUP`) and this error may corrupt stack for unwind in situations
> > when `L->top` < `L->base`.
> > 
> > This patch restores `L->top` for Lua frames when raise the error via
> 
> Typo: s/raise the error/the error is raised/.

Fixed.

> 
> > `lj_err_mem()`.
> > 
> > Sergey Kaplun:
> > * added the description and the test for the problem
> > 
> > Resolves tarantool/tarantool#3840
> > Part of tarantool/tarantool#7230
> > ---
> > 
> > Issues:
> > * https://github.com/LuaJIT/LuaJIT/issues/906
> > * https://github.com/tarantool/tarantool/issues/7230
> > * https://github.com/tarantool/tarantool/issues/3840
> > PR: https://github.com/tarantool/tarantool/pull/7915
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-906-fix-err-mem
> > 
> > Red LuaJIT CI for MacOS Release builds is a known issue with self-hosted
> > runners, as Igor has said before.
> 
> Everything is fixed at the moment, please try to rebase.

Rebased. It works now! Thanks!

> 
> > 
> >  src/lj_err.c                                  |  1 +
> >  .../lj-906-fix-err-mem.test.lua               | 90 +++++++++++++++++++
> >  2 files changed, 91 insertions(+)
> >  create mode 100644 test/tarantool-tests/lj-906-fix-err-mem.test.lua
> > 
> 
> <snipped>
> 
> > diff --git a/test/tarantool-tests/lj-906-fix-err-mem.test.lua b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
> > new file mode 100644
> > index 00000000..a139e1c9
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
> > @@ -0,0 +1,90 @@
> > +local tap = require('tap')
> > +local ffi = require('ffi')
> > +local table_new = require('table.new')
> > +
> > +-- Avoid test to be killed.
> > +require('utils').skipcond(ffi.abi('gc64'), 'test is not GC64 only')
> > +
> > +local test = tap.test('lj-906-fix-err-mem')
> > +test:plan(1)
> > +
> > +local KB = 1024
> > +local MB = 1024 * KB
> > +
> > +-- The maximum available table size, taking into account created
> > +-- constants for one function.
> > +local TNEW_SIZE = 511
> > +
> > +local gc_anchor = {}
> > +
> > +-- This function works until raises the error.
> > +local function eat_chunks(size)
> > +  -- Need raise the OOM error inside TDUP, not TNEW, so reserve
> > +  -- memory for it.
> > +  -- luacheck: no unused
> > +  local tnew_anchor = table_new(TNEW_SIZE, 0)
> > +  while true do
> > +    table.insert(gc_anchor, ffi.new('char [?]', size))
> > +  end
> > +end
> > +
> > +-- Function to format inner tab leading to TDUP emitting.
> > +local function format_inner_tab()
> 
> Minor: Maybe <make_deep_table(depth)> instead of <format_inner_tab()>?
> Just asking, feel free to ignore.

Fixed.

> 
> > +  local inner_tab = ''
> > +  local inner_depth = 128
> 
> Why 128? Please, drop a few words.

Added.

> 
> > +  -- Repeate table template for TDUP.
> > +  for _ = 1, inner_depth do
> > +    inner_tab = inner_tab .. '{a ='
> > +  end
> > +  inner_tab = inner_tab .. '{}'
> > +  for _ = 1, inner_depth do
> > +    inner_tab = inner_tab .. '},'
> > +  end
> > +  return inner_tab
> > +end
> > +
> 
> Minor: Same (also think about moving these helpers to utils.*).
> Maybe <make_flat_table(size)> instead of <format_TDUP_chunk()>?

IMHO, it's too specific to use this function test-wide, so I just
keep it as is. If we need similar functionality somewhere else, we
can move this function to <utils.lua> later.

So ignore here.

>                                                                 And
> mention TDUP specifics in the comment.

Fixed.

> 
> Just asking too, feel free to ignore.
> 
> > +local function format_TDUP_chunk()
> > +  local big_tab = 'local _ = {\n'
> > +  local inner_tab = format_inner_tab()
> > +  for _ = 1, TNEW_SIZE do
> > +    big_tab = big_tab .. inner_tab .. '\n'
> > +  end
> > +  big_tab = big_tab .. '}'
> > +  return big_tab
> > +end
> > +
> > +local TDUP, err = loadstring(format_TDUP_chunk())
> > +assert(TDUP, err)
> > +
> > +local function frame_before_TDUP()
> > +  -- Stack slots are needed for coredump in case of misbehaviour.
> 
> Why are they needed? Please drop few more words regarding this.

Hmm, I obscure coredump without them too (because of rewriting the
other stack slot). But the frame before TDUP is really needed to rewrite
pcall frame. Fixed, and added the comment.

All minor patches about comments above are done via the following
iterative patch:

===================================================================
diff --git a/test/tarantool-tests/lj-906-fix-err-mem.test.lua b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
index f512e802..0ec1b30e 100644
--- a/test/tarantool-tests/lj-906-fix-err-mem.test.lua
+++ b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
@@ -12,7 +12,7 @@ local KB = 1024
 local MB = 1024 * KB
 
 -- The maximum available table size, taking into account created
--- constants for one function.
+-- constants for one function and nested level.
 local TNEW_SIZE = 511
 
 local gc_anchor = {}
@@ -29,9 +29,8 @@ local function eat_chunks(size)
 end
 
 -- Function to format inner tab leading to TDUP emitting.
-local function format_inner_tab()
+local function make_flat_table(inner_depth)
   local inner_tab = ''
-  local inner_depth = 128
   -- Repeate table template for TDUP.
   for _ = 1, inner_depth do
     inner_tab = inner_tab .. '{a ='
@@ -43,9 +42,14 @@ local function format_inner_tab()
   return inner_tab
 end
 
+-- The `lj_err_mem()` doesn't fix `L->top`, when called from
+-- helper function (like `lj_tab_dup()`) before the patch.
+-- This function creates a chunk with many BC_TDUP inside.
 local function format_TDUP_chunk()
   local big_tab = 'local _ = {\n'
-  local inner_tab = format_inner_tab()
+  -- The maximum available table size, taking into account created
+  -- constants for one function and nested level.
+  local inner_tab = make_flat_table(128)
   for _ = 1, TNEW_SIZE do
     big_tab = big_tab .. inner_tab .. '\n'
   end
@@ -56,12 +60,10 @@ end
 local TDUP, err = loadstring(format_TDUP_chunk())
 assert(TDUP, err)
 
+-- Function to create the additional frame after to be rewrited
+-- one in case of `lj_err_mem()` misbehaviour.
 local function frame_before_TDUP()
-  -- Stack slots are needed for coredump in case of misbehaviour.
-  -- luacheck: no unused
-  local frame_slot1, frame_slot2
   TDUP()
-  return frame_slot1, frame_slot2
 end
 
 collectgarbage()
===================================================================


> 
> > +  -- luacheck: no unused
> > +  local frame_slot1, frame_slot2
> > +  TDUP()
> > +  return frame_slot1, frame_slot2
> > +end
> > +
> 
> Minor: I believe you can pack these two lines into something one can
> call <janitor>:
> | local function janitor()
> |   collectgarbage('collect')
> |   collectgarbage('stop')
> | end
> 
> Anyway, you can leave this as is.

Added, thanks!
See the iterative patch below.
===================================================================
diff --git a/test/tarantool-tests/lj-906-fix-err-mem.test.lua b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
index 519e6bf6..7dbc1614 100644
--- a/test/tarantool-tests/lj-906-fix-err-mem.test.lua
+++ b/test/tarantool-tests/lj-906-fix-err-mem.test.lua
@@ -66,8 +66,12 @@ local function frame_before_TDUP()
   TDUP()
 end
 
-collectgarbage()
-collectgarbage('stop')
+local function janitor()
+  collectgarbage('collect')
+  collectgarbage('stop')
+end
+
+janitor()
 
 -- Avoid OOM on traces.
 jit.off()
@@ -75,14 +79,11 @@ jit.off()
 -- Stack slots are needed for coredump in case of misbehaviour.
 -- luacheck: no unused
 local r, e = pcall(eat_chunks, 8 * MB)
-collectgarbage()
-collectgarbage('stop')
+janitor()
 pcall(eat_chunks, 8 * KB)
-collectgarbage()
-collectgarbage('stop')
+janitor()
 pcall(eat_chunks, 8)
-collectgarbage()
-collectgarbage('stop')
+janitor()
 
 pcall(frame_before_TDUP)
===================================================================

 
> 
> > +collectgarbage()
> > +collectgarbage('stop')
> > +
> > +-- Avoid OOM on traces.
> > +jit.off()
> > +
> > +-- Stack slots are needed for coredump in case of misbehaviour.
> > +-- luacheck: no unused
> > +local r, e = pcall(eat_chunks, 8 * MB)
> > +collectgarbage()
> > +pcall(eat_chunks, 8 * KB)
> > +collectgarbage()
> > +pcall(eat_chunks, 8)
> > +collectgarbage()
> > +
> > +pcall(frame_before_TDUP)
> 
> Do we need to check the status of the <pcall> above?

I suppose no. As we disscussed offline the test may be flaky (false
positive, when we can alloc all tables) due to memory allocations.
Also, this part is very fragile, and we don't want to fix test later by
removing this check the status of the `pcall()`.

> 
> > +
> > +-- Release memory for `tap` functions.
> > +gc_anchor = nil
> > +collectgarbage()
> 
> Is this step required? I doubt.

Yes it is. Without it we may get the OOM error inside `test:ok()` or
`test:check()` functions (I faced it on my local machine, when rewrite
this test).

> 
> > +
> > +test:ok(true, 'correctly throw memory error')
> > +
> > +os.exit(test:check() and 0 or 1)
> > -- 
> > 2.34.1
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list