[Tarantool-patches] [PATCH luajit v1] tools: fix luajit-gdb stack dump
Sergey Kaplun
skaplun at tarantool.org
Wed Jul 21 11:49:50 MSK 2021
Igor,
On 20.07.21, Igor Munkin wrote:
> Sergey,
>
> Thanks for the patch! Nice catch, actually! I wonder, how much impact
> this bug made while digging the crash reports. Anyway, LGTM, but
> consider several nits below.
Actually, not too much. This is a rare situation: a bare guest Lua stack
without any function call.
Branch is force-pushed with the changes below.
>
> On 18.06.21, Sergey Kaplun wrote:
> > When there is only one frame (so-called dummy frame) with a dummy thread
> > on stack bottom at L->base - 1 - LJ_FR2, stack dump does not show stack
> > slots from bottom to top, because the frame link is already pointed to
> > stack bottom (dummy thread mentioned above). Its output looks like the
>
> Too much information in a single sentence. I consider the following info
> vital to understand the patch.
> * Dummy frame is the particular and kinda "initial" coroutine state,
> when the framelink slot (i.e. L->base - (1 + LJ_FR2)) is the most
> bottom slot of the guest stack (i.e. L->stack).
> * Since coroutine stack unwinding is implemented via precondition loop,
> no slots are dumped from L->top to L->base in case when only dummy
> frame is on the stack.
> * Python is a crap language, and seems like postcondition loop violates
> The Zen (this is too hard for users, isn't it?), so Guido decided not
> to introduce such construction in the language at all.
>
> #ifdef offtop
> I dare to guess, it was just too hard to distinguish precondition
> from postcondition in "colon&spaces" paradigm; glad Larry Wall
> decided to use braces for blocks, and Roberto just left Pascalish
> syntax for this. I'm upset with the GDB guys choice every time we're
> back to developing this extension.
I suppose it was simply a rather simple language and one of the most
popular ones at the time the decision was made.
> #endif
>
> comment.yield()
>
> > following:
> >
> > | 0x7fb512ac40:0x7fb512ac70 [ ] 7 slots: Red zone
> > | 0x7fb512ac38 [ M]
> > | 0x7fb512ab28:0x7fb512ac30 [ ] 34 slots: Free stack slots
> > | 0x7fb512ab20 [ T ]
> > | 0x7fb512ab08:0x7fb512ab10 [S ] FRAME: dummy L
>
> Minor: I guess, you can align the output. At least I see no reason not
> to do this.
>
> >
>
> comment.resume()
>
> Hence, there is no other option except unroll the first iteration and
> reorder stack slots dump with moving framelinks downwards to the
> coroutine stack bottom.
>
> > This patch unrolled first loop iteration. Stack dump first inspect all
> > fields for the top frame and then continue if there are other frames.
> > After patch the output looks like the following:
> >
> > | 0x7fb512ac40:0x7fb512ac70 [ ] 7 slots: Red zone
> > | 0x7fb512ac38 [ M]
> > | 0x7fb512ab28:0x7fb512ac30 [ ] 34 slots: Free stack slots
> > | 0x7fb512ab20 [ T ]
> > | 0x7fb512ab18 [ ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
> > | 0x7fb512ab10 [ B ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
> > | 0x7fb512ab00:0x7fb512ab08 [S ] FRAME: dummy L
>
> Feel free to align the output here too.
Thanks for your comments!
The new commit message version:
| The dummy frame is the "initial" coroutine state, when the framelink slot (i.e.
| L->base - (1 + LJ_FR2)) is the bottom slot of the guest stack
| (i.e. L->stack). Since coroutine stack unwinding is implemented via a
| precondition loop, lj-stack doesn't dump the slots for the dummy frame,
| because the framelink already points to the stack bottom.
|
| The output looks like the following:
|
| | 0x7fb512ac40:0x7fb512ac70 [ ] 7 slots: Red zone
| | 0x7fb512ac38 [ M]
| | 0x7fb512ab28:0x7fb512ac30 [ ] 34 slots: Free stack slots
| | 0x7fb512ab20 [ T ]
| | 0x7fb512ab08:0x7fb512ab10 [S ] FRAME: dummy L
|
| Python doesn't provide a post-condition (do-while) loop construct, which
| would fit this case better, so the unwinding of the topmost frame is
| manually unrolled.
|
| As a result of the patch the output looks like the following:
|
| | 0x7fb512ac40:0x7fb512ac70 [ ] 7 slots: Red zone
| | 0x7fb512ac38 [ M]
| | 0x7fb512ab28:0x7fb512ac30 [ ] 34 slots: Free stack slots
| | 0x7fb512ab20 [ T ]
| | 0x7fb512ab18 [ ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
| | 0x7fb512ab10 [ B ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
| | 0x7fb512ab00:0x7fb512ab08 [S ] FRAME: dummy L
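The difference between the two loop shapes can be sketched on a toy model. This is not the real luajit-gdb.py code: the stack is a plain Python list, index 0 plays the role of L->stack (the dummy framelink), LJ_FR2 is assumed to be 0, every other framelink just stores the index of the previous one, and all function names are hypothetical.

```python
# Toy model of the guest stack, only for illustration.
# stack[0] is the dummy framelink (the stack bottom); any other framelink
# slot stores the index of the previous framelink. LJ_FR2 is assumed to be 0.

def dump_precondition(stack, base, top):
    """Pre-patch shape: data slots are dumped only inside the frame loop."""
    out = []
    slot = top - 1
    framelink = base - 1
    while framelink > 0:  # never entered when only the dummy frame exists
        while slot > framelink:
            out.append('VALUE: {}'.format(stack[slot]))
            slot -= 1
        out.append('FRAME')
        framelink = stack[framelink]
        slot -= 1
    out.append('FRAME: dummy L')
    return out


def dump_unrolled(stack, base, top):
    """Post-patch shape: the topmost frame's slots are dumped before the loop."""
    out = []
    slot = top - 1
    framelink = base - 1

    # Unrolled first iteration: dump the data slots of the topmost frame.
    while slot > framelink:
        out.append('VALUE: {}'.format(stack[slot]))
        slot -= 1

    while framelink > 0:
        assert slot == framelink, "Invalid slot during frame unwind"
        out.append('FRAME')
        framelink = stack[framelink]
        slot -= 1
        while slot > framelink:
            out.append('VALUE: {}'.format(stack[slot]))
            slot -= 1

    assert slot == framelink, "Invalid slot after frame unwind"
    out.append('FRAME: dummy L')
    return out


# Only the dummy frame is on the stack: base = 1, top = 3.
only_dummy = [None, 'table', 'string']
print(dump_precondition(only_dummy, 1, 3))
print(dump_unrolled(only_dummy, 1, 3))
```

With only the dummy frame on the stack, the first function reports just the dummy frame, while the second one also lists the two value slots above it.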
>
> > ---
> >
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/fix-luajit-gdb
> >
> > Steps to test behaviour:
> > 0) Clone branch
> > 1) Build tarantool with GC64 or not (I've tested both cases)
> > 2) Run the following script </tmp/net_box.lua> by Tarantool via gdb.
> >
> > | $ cat /tmp/netbox.lua
> > | box.cfg{log_level = 50, listen = "127.0.0.1:3802"}
> > | box.schema.user.grant('guest','read,write,execute','universe', nil, {if_not_exists=true})
> > | local net_box = require"net.box"
> > | local cn = net_box:connect("127.0.0.1:3802")
> > | local function f()
> > | return setmetatable({}, {__serialize = function() error[[err in ser]] end})
> > | end
> > | _G.f = f
> > | local r, st = cn:call("f")
> > | print(r,st)
> >
> > | $ gdb --args ./tarantool /tmp/netbox.lua
> > | ...
> > | (gdb) source ~/builds_workspace/luajit/fix-luajit-gdb/src/luajit-gdb.py
> > | ...
> >
> > 3) Set breakpoint to ./src/lua/msgpack.c:234, and run
> >
> > | (gdb) b ./src/lua/msgpack.c:234
> >
> > I'm not sure about your sources version, so you can just set breakpoint to
> > mp_encode(), run and continue (need two entries). We are interested in
> > the line #234:
> >
> > | 233|.......if (luaL_tofield(L, cfg, opts, lua_gettop(L), &field) < 0)
> > | 234|.......|.......return luaT_error(L);
> >
> > And dump the stack:
> >
> > | (gdb) lj-stack L
> > | 0x7fb512ac40:0x7fb512ac70 [ ] 7 slots: Red zone
> > | 0x7fb512ac38 [ M]
> > | 0x7fb512ab28:0x7fb512ac30 [ ] 34 slots: Free stack slots
> > | 0x7fb512ab20 [ T ]
> > | 0x7fb512ab18 [ ] VALUE: string 0 "/tmp/net_box.lua:6: err in ser" @ 0x7fb512ade8
> > | 0x7fb512ab10 [ B ] VALUE: table @ 0x7fb512ac80 (asize: 0, hmask: 0x0)
> > | 0x7fb512ab00:0x7fb512ab08 [S ] FRAME: dummy L
> >
> > src/luajit-gdb.py | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
> > index f1fd6230..a3dad585 100644
> > --- a/src/luajit-gdb.py
> > +++ b/src/luajit-gdb.py
> > @@ -424,16 +424,26 @@ def dump_stack(L, base=None, top=None):
> > )
> > ]) + '\n'
> >
> > + # Set framelink for the top frame. Dump frame slots even for dummy frame.
>
> It would be also great to drop a few words why the iteration is unrolled
> (the lack of postcondition loops in Python) via XXX comment.
>
> > slot = top - 1
> > framelink = base - (1 + LJ_FR2)
>
> I suggest to split this part into two blocks via blank line:
> * variable initialization
> * unrolled iteration
>
> > + while slot > framelink + LJ_FR2:
> > + dump += dump_stack_slot(L, slot, base, top)
> > + slot -= 1
> >
> > + # If there are other frames -- continue.
>
> Again, this is not the particular reason. As for me the main reason you
> can't implement stack unwinding via do-while construction.
Adjusted this chunk considering your suggestions.
===================================================================
diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
index a3dad585..fab276ed 100644
--- a/src/luajit-gdb.py
+++ b/src/luajit-gdb.py
@@ -424,14 +424,22 @@ def dump_stack(L, base=None, top=None):
)
]) + '\n'
- # Set framelink for the top frame. Dump frame slots even for dummy frame.
slot = top - 1
framelink = base - (1 + LJ_FR2)
+
+ # XXX: Lua stack unwinding algorithm consists of the following steps:
+ # 1. dump all data slots in the (framelink, top) interval
+ # 2. check whether there are remaining frames
+ # 3. if there are no slots further, stop the unwinding loop
+ # 4. otherwise, resolve the next framelink and top and go to (1)
+ #
+ # Postcondition (i.e. do-while) loops is the most fitting idiom for such
+ # case, but Python doesn't provide such lexical construction. Hence step (1)
+ # is unrolled for the topmost stack frame.
while slot > framelink + LJ_FR2:
dump += dump_stack_slot(L, slot, base, top)
slot -= 1
- # If there are other frames -- continue.
while framelink > mref('TValue *', L['stack']):
assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
dump += dump_framelink(L, framelink)
@@ -442,7 +450,7 @@ def dump_stack(L, base=None, top=None):
slot -= 1
assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
- # Skip nil slot for the last frame for 2-slot frames.
+ # Skip a nil slot for the last frame for 2-slot frames.
slot -= LJ_FR2
dump += '{fr}{padding} [S ] FRAME: dummy L'.format(
===================================================================
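For comparison, steps (1)-(4) from the comment can also be written with the usual Python emulation of a do-while loop, `while True` plus `break`. This is an alternative the patch does not use, sketched on a toy model: the stack is a plain list, index 0 stands for L->stack, LJ_FR2 is assumed to be 0, each framelink stores the index of the previous one, and the function name is hypothetical.

```python
# Toy model, only for illustration: stack[0] is the dummy framelink,
# any other framelink slot stores the index of the previous framelink.

def dump_do_while(stack, base, top):
    """Emulate a postcondition loop with `while True` and `break`."""
    out = []
    slot = top - 1
    framelink = base - 1

    while True:
        # Step 1: dump all data slots in the (framelink, top) interval.
        while slot > framelink:
            out.append('VALUE: {}'.format(stack[slot]))
            slot -= 1
        # Steps 2-3: stop when the stack bottom is reached.
        if framelink <= 0:
            break
        # Step 4: dump the framelink, resolve the next one, go to step 1.
        out.append('FRAME')
        framelink = stack[framelink]
        slot -= 1

    out.append('FRAME: dummy L')
    return out


# Two frames: dummy at 0, value 'a', framelink at 2 (previous is 0), value 'b'.
print(dump_do_while([None, 'a', 0, 'b'], 3, 4))
```

The trade-off is a matter of taste: this form avoids duplicating the slot loop, while the unrolled form keeps the frame loop condition visible in the `while` header.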
>
> > while framelink > mref('TValue *', L['stack']):
> > - while slot > framelink + LJ_FR2:
> > - dump += dump_stack_slot(L, slot, base, top)
> > - slot -= 1
> > + assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
>
> Please, use parentheses for assert call.
This is done on purpose: parenthesizing the condition and the message
turns them into a tuple, so the assertion is always true [1]:
| $ python3
| Python 3.7.10 (default, Mar 18 2021, 04:43:06)
| [GCC 9.3.0] on linux
| Type "help", "copyright", "credits" or "license" for more information.
| >>> assert(1!=1, "stupid assert")
| <stdin>:1: SyntaxWarning: assertion is always true, perhaps remove parentheses?
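A minimal sketch of why the parenthesized form never fires: the parentheses build a two-element tuple, and a non-empty tuple is always truthy. The variable and function names here are made up for the illustration.

```python
condition, message = (1 != 1), "stupid assert"

# This is what `assert(condition, message)` actually tests: a non-empty
# tuple, which is truthy regardless of the condition inside it.
parenthesized = (condition, message)
print(bool(parenthesized))


def statement_form_fails():
    """Return the assertion message raised by the statement form."""
    try:
        # Statement form: the condition is tested, the message is attached.
        assert condition, message
    except AssertionError as exc:
        return str(exc)
    return None


print(statement_form_fails())
```

So the statement form used in the patch is the one that actually checks the slot invariant.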
>
> > dump += dump_framelink(L, framelink)
> > framelink = frame_prev(framelink + LJ_FR2) - LJ_FR2
> > slot -= 1 + LJ_FR2
> > + while slot > framelink + LJ_FR2:
> > + dump += dump_stack_slot(L, slot, base, top)
> > + slot -= 1
> > +
> > + assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
>
> Ditto.
Ditto.
>
> > + # Skip nil slot for the last frame for 2-slot frames.
> > + slot -= LJ_FR2
> >
> > dump += '{fr}{padding} [S ] FRAME: dummy L'.format(
> > fr = slot,
> > --
> > 2.31.0
> >
>
> --
> Best regards,
> IM
[1]: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-assert-stmt
--
Best regards,
Sergey Kaplun