[Tarantool-patches] [PATCH luajit v1] tools: fix luajit-gdb stack dump

Sergey Kaplun skaplun at tarantool.org
Wed Jul 21 11:49:50 MSK 2021


Igor,

On 20.07.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Nice catch, actually! I wonder, how much impact
> this bug made while digging the crash reports. Anyway, LGTM, but
> consider several nits below.

Actually, not too much. It is the unique situation, when we have naked
guest Lua stack without any function call.

Branch is force-pushed with the changes below.

> 
> On 18.06.21, Sergey Kaplun wrote:
> > When there is only one frame (so-called dummy frame) with a dummy thread
> > on stack bottom at L->base - 1 - LJ_FR2, stack dump does not show stack
> > slots from bottom to top, because the frame link is already pointed to
> > stack bottom (dummy thread mentioned above). Its output looks like the
> 
> Too much information in a single sentence. I consider the following info
> vital to understand the patch.
> * Dummy frame is the particular and kinda "initial"  coroutine state,
>   when the framelink slot (i.e. L->base - (1 + LJ_FR2)) is the most
>   bottom slot of the guest stack (i.e. L->stack).
> * Since coroutine stack unwinding is implemented via precondition loop,
>   no slots are dumped from L->top to L->base in case when only dummy
>   frame is on the stack.
> * Python is a crap language, and seems like postcondition loop violates
>   The Zen (this is too hard for users, isn't it?), so Guido decided not
>   to introduce such construction in the language at all.
> 
>   #ifdef offtop
>     I dare to guess, it was just too hard to distinguish precondition
>     from postcondition in "colon&spaces" paradigm; glad Larry Wall
>     decided to use braces for blocks, and Roberto just left Pascalish
>     syntax for this. I'm upset with the GDB guys choice every time we're
>     back to developing this extension.

I suppose that it just was rather simple and one of the most popular
languages at the moment of making decision.

>   #endif
> 
> comment.yield()
> 
> > following:
> > 
> > | 0x7fb512ac40:0x7fb512ac70 [    ] 7 slots: Red zone
> > | 0x7fb512ac38            [   M]
> > | 0x7fb512ab28:0x7fb512ac30 [    ] 34 slots: Free stack slots
> > | 0x7fb512ab20            [  T ]
> > | 0x7fb512ab08:0x7fb512ab10 [S   ] FRAME: dummy L
> 
> Minor: I guess, you can align the output. At least I see no reason not
> to do this.
> 
> > 
> 
> comment.resume()
> 
> Hence, there is no other option except unroll the first iteration and
> reorder stack slots dump with moving framelinks downwards to the
> coroutine stack bottom.
> 
> > This patch unrolled first loop iteration. Stack dump first inspect all
> > fields for the top frame and then continue if there are other frames.
> > After patch the output looks like the following:
> > 
> > | 0x7fb512ac40:0x7fb512ac70 [    ] 7 slots: Red zone
> > | 0x7fb512ac38            [   M]
> > | 0x7fb512ab28:0x7fb512ac30 [    ] 34 slots: Free stack slots
> > | 0x7fb512ab20            [  T ]
> > | 0x7fb512ab18            [    ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
> > | 0x7fb512ab10            [ B  ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
> > | 0x7fb512ab00:0x7fb512ab08 [S   ] FRAME: dummy L
> 
> Feel free to align the output here too.

Thanks for your comments!
The new commit message version:

| Dummy frame is the "initial" coroutine state, when the framelink slot (i.e.
| L->base - (1 + LJ_FR2)) is the bottom slot of the guest stack
| (i.e. L->stack).  Since coroutine stack unwinding is implemented via
| precondition loop, lj-stack doesn't dump the slots for the dummy frame, since
| the framelink points to the stack bottom.
|
| The output looks like the following:
|
| | 0x7fb512ac40:0x7fb512ac70 [    ] 7 slots: Red zone
| | 0x7fb512ac38              [   M]
| | 0x7fb512ab28:0x7fb512ac30 [    ] 34 slots: Free stack slots
| | 0x7fb512ab20              [  T ]
| | 0x7fb512ab08:0x7fb512ab10 [S   ] FRAME: dummy L
|
| Python doesn't provide post-condition (do-while) syntax construction,
| that fits better for this case, so the unwinding of the topmost frame is just manually
| unrolled.
|
| As a result of the patch the output looks like the following:
|
| | 0x7fb512ac40:0x7fb512ac70 [    ] 7 slots: Red zone
| | 0x7fb512ac38              [   M]
| | 0x7fb512ab28:0x7fb512ac30 [    ] 34 slots: Free stack slots
| | 0x7fb512ab20              [  T ]
| | 0x7fb512ab18              [    ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
| | 0x7fb512ab10              [ B  ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
| | 0x7fb512ab00:0x7fb512ab08 [S   ] FRAME: dummy L

> 
> > ---
> > 
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/fix-luajit-gdb
> > 
> > Steps to test behaviour:
> > 0) Clone branch
> > 1) Build tarantool with GC64 or not (I've tested both cases)
> > 2) Run the following script </tmp/net_box.lua> by Tarantool via gdb.
> > 
> > | $ cat /tmp/netbox.lua
> > | box.cfg{log_level = 50, listen = "127.0.0.1:3802"}
> > | box.schema.user.grant('guest','read,write,execute','universe', nil, {if_not_exists=true})
> > | local net_box = require"net.box"
> > | local cn = net_box:connect("127.0.0.1:3802")
> > | local function f()
> > |   return setmetatable({}, {__serialize = function() error[[err in ser]] end})
> > | end
> > | _G.f = f
> > | local r, st = cn:call("f")
> > | print(r,st)
> > 
> > | $ gdb --args ./tarantool /tmp/netbox.lua
> > | ...
> > | (gdb) source ~/builds_workspace/luajit/fix-luajit-gdb/src/luajit-gdb.py
> > | ...
> > 
> > 3) Set breakpoint to ./src/lua/msgpack.c:234, and run
> > 
> > | (gdb) b ./src/lua/msgpack.c:234
> > 
> > I'm not sure about your sources version, so you can just set breakpoin to
> > mp_encode(), run and continue (need two entries). We are interested in
> > the line #234:
> > 
> > | 233|.......if (luaL_tofield(L, cfg, opts, lua_gettop(L), &field) < 0)
> > | 234|.......|.......return luaT_error(L);
> > 
> > And dump the stack:
> > 
> > | (gdb) lj-stack L
> > | 0x7fb512ac40:0x7fb512ac70 [    ] 7 slots: Red zone
> > | 0x7fb512ac38            [   M]
> > | 0x7fb512ab28:0x7fb512ac30 [    ] 34 slots: Free stack slots
> > | 0x7fb512ab20            [  T ]
> > | 0x7fb512ab18            [    ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
> > | 0x7fb512ab10            [ B  ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
> > | 0x7fb512ab00:0x7fb512ab08 [S   ] FRAME: dummy L
> > 
> >  src/luajit-gdb.py | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
> > index f1fd6230..a3dad585 100644
> > --- a/src/luajit-gdb.py
> > +++ b/src/luajit-gdb.py
> > @@ -424,16 +424,26 @@ def dump_stack(L, base=None, top=None):
> >          )
> >      ]) + '\n'
> >  
> > +    # Set framelink for the top frame. Dump frame slots even for dummy frame.
> 
> It would be also great to drop a few words why the iteration is unrolled
> (the lack of postcondition loops in Python) via XXX comment.
> 
> >      slot = top - 1
> >      framelink = base - (1 + LJ_FR2)
> 
> I suggest to split this part into two blocks via blank line:
> * variable initialization
> * unrolled iteration
> 
> > +    while slot > framelink + LJ_FR2:
> > +        dump += dump_stack_slot(L, slot, base, top)
> > +        slot -= 1
> >  
> > +    # If there are other frames -- continue.
> 
> Again, this is not the particular reason. As for me the main reason you
> can't implement stack unwinding via do-while construction.

Adjusted this chunk considering your suggestions.

===================================================================
diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
index a3dad585..fab276ed 100644
--- a/src/luajit-gdb.py
+++ b/src/luajit-gdb.py
@@ -424,14 +424,22 @@ def dump_stack(L, base=None, top=None):
         )
     ]) + '\n'
 
-    # Set framelink for the top frame. Dump frame slots even for dummy frame.
     slot = top - 1
     framelink = base - (1 + LJ_FR2)
+
+    # XXX: Lua stack unwinding algorithm consists of the following steps:
+    # 1. dump all data slots in the (framelink, top) interval
+    # 2. check whether there are remaining frames
+    # 3. if there are no slots further, stop the unwinding loop
+    # 4. otherwise, resolve the next framelink and top and go to (1)
+    #
+    # Postcondition (i.e. do-while) loops is the most fitting idiom for such
+    # case, but Python doesn't provide such lexical construction. Hence step (1)
+    # is unrolled for the topmost stack frame.
     while slot > framelink + LJ_FR2:
         dump += dump_stack_slot(L, slot, base, top)
         slot -= 1
 
-    # If there are other frames -- continue.
     while framelink > mref('TValue *', L['stack']):
         assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
         dump += dump_framelink(L, framelink)
@@ -442,7 +450,7 @@ def dump_stack(L, base=None, top=None):
             slot -= 1
 
     assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
-    # Skip nil slot for the last frame for 2-slot frames.
+    # Skip a nil slot for the last frame for 2-slot frames.
     slot -= LJ_FR2
 
     dump += '{fr}{padding} [S   ] FRAME: dummy L'.format(
===================================================================

> 
> >      while framelink > mref('TValue *', L['stack']):
> > -        while slot > framelink + LJ_FR2:
> > -            dump += dump_stack_slot(L, slot, base, top)
> > -            slot -= 1
> > +        assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
> 
> Please, use parentheses for assert call.

It's done by design to avoid always true assertions [1]:

| $ python3
| Python 3.7.10 (default, Mar 18 2021, 04:43:06)
| [GCC 9.3.0] on linux
| Type "help", "copyright", "credits" or "license" for more information.
| >>> assert(1!=1, "stupid assert")
| <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?

> 
> >          dump += dump_framelink(L, framelink)
> >          framelink = frame_prev(framelink + LJ_FR2) - LJ_FR2
> >          slot -= 1 + LJ_FR2
> > +        while slot > framelink + LJ_FR2:
> > +            dump += dump_stack_slot(L, slot, base, top)
> > +            slot -= 1
> > +
> > +    assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
> 
> Ditto.

Ditto.

> 
> > +    # Skip nil slot for the last frame for 2-slot frames.
> > +    slot -= LJ_FR2
> >  
> >      dump += '{fr}{padding} [S   ] FRAME: dummy L'.format(
> >          fr = slot,
> > -- 
> > 2.31.0
> > 
> 
> -- 
> Best regards,
> IM

[1]: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-assert-stmt

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list