[Tarantool-patches] [PATCH luajit v2] lldb: introduce luajit-lldb
Igor Munkin
imun at tarantool.org
Mon Apr 24 17:26:58 MSK 2023
Max,
Thanks for your patch! Nicely done with isolating debugger specific
helpers, I'm looking forward to merging it with luajit-gdb.py
On 27.10.22, Maksim Kokryashkin wrote:
> From: "m.kokryashkin" <m.kokryashkin at m-kokryashkin.local>
Oops, looks like you've committed the change on your local macOS.
>
> It is impossible to run gdb on M1 devices, the only available
> debugger is lldb. The luajit-gdb extension doesn't work with
> lldb, so this patch introduces the luajit-lldb extension,
> which re-implements exactly the same functionality.
>
> It is worth noting that the extension is called
> `luajit_lldb.py` instead of `luajit-lldb.py` because lldb
> imports it as a Python module, and it is prohibited to use
> dashes in a python module name.
>
> Part of tarantool/tarantool#4808
> ---
> Branch: https://github.com/tarantool/luajit/tree/gh-fckxorg/luajit-lldb
>
> Changes in v2:
> - Fixed comments as per review by Sergey
> - Reorganized the `c_structs` spec.
>
> src/luajit_lldb.py | 1006 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 1006 insertions(+)
> create mode 100644 src/luajit_lldb.py
>
> diff --git a/src/luajit_lldb.py b/src/luajit_lldb.py
> new file mode 100644
> index 00000000..edd2884e
> --- /dev/null
> +++ b/src/luajit_lldb.py
> @@ -0,0 +1,1006 @@
<snipped>
> + def __add__(self, other):
> + assert isinstance(other, int)
> + return self.__class__(cast(self.normal_type.__name__ + ' *', cast('uintptr_t', self.value.unsigned + other * self.value.deref.size)))
> +
Minor: I see no reason in such assymetry in type variety supported by
__add__ and __sub__. Anyway, it doesn't affect extension usage, so feel
free to ignore.
> + def __sub__(self, other):
> + assert isinstance(other, int) or isinstance(other, Ptr)
> + if isinstance(other, int):
> + return self.__add__(-other)
> + else:
> + return self.value.unsigned - other.value.unsigned
> +
> +
<snipped>
> +
Could you please add some comments to <cast> function? AFAIU, there are
type detection and value cast: some spots are quite cryptic...
> +def cast(typename, value):
> + pointer_type = False
> + name = None
> + if isinstance(value, Struct) or isinstance(value, Ptr):
> + value = value.value
> + if isinstance(typename, type):
> + name = typename.__name__
> + if name.endswith('Ptr'):
> + pointer_type = True
> + name = name[:-3]
> + else:
> + name = typename
> + if name[-1] == '*':
> + name = name[:-1].strip()
> + pointer_type = True
> + t = target.FindFirstType(name)
> + if pointer_type:
> + t = t.GetPointerType()
> +
> + if isinstance(value, int):
> + if pointer_type:
> + casted = target.CreateValueFromAddress('value', lldb.SBAddress(value, target), t.GetPointeeType()).address_of
> + else:
> + casted = target.CreateValueFromData(name = 'value', data = lldb.SBData.CreateDataFromInt(value, size=8), type=t)
> + else:
> + casted = value.Cast(t)
> +
> + if isinstance(typename, type):
> + return typename(casted)
> + else:
> + return casted
> +
> +def lookup_global(name):
> + global target
Minor: Do we need explicitly mention <target> as global here?
> + return target.FindFirstGlobalVariable(name)
> +
> +def type_member(type_obj, name):
> + return next((x for x in type_obj.members if x.name == name), None)
> +
> +def find_type(typename):
> + return target.FindFirstType(typename)
I believe, you can "inline" this function to their callers, if you use
it only in "Debugger helpers" scope.
> +
> +def offsetof(typename, membername):
> + type_obj = find_type(typename)
> + member = type_member(type_obj, membername)
> + return member.GetOffsetInBytes()
AFAIU, <type_member> can yield None, so the following "dereference" can
fail in this case. Furthermore, what about inlining <type_member> here?
> +
> +def sizeof(type_obj):
> + return type_obj.GetByteSize()
I'd rather add <find_type> right to the function, like it's done in
<offsetof> helper.
> +
<snipped>
> +def J(g):
> + return cast(jit_StatePtr, vtou64(cast('char *', g)) - offsetof('GG_State', 'g') + offsetof('GG_State', 'J'))
Please, split this into several lines.
> +
<snipped>
> +def strdata(obj):
> + try:
> + ptr = cast('char *', obj + 1)
> + return ptr.summary
> + except UnicodeEncodeError:
> + return "<luajit-lldb: error occured while rendering non-ascii slot>"
<snipped>
> +def frame_prevl(framelink):
> + return framelink - (1 + LJ_FR2 + bc_a(vtou64(dbg_eval('((BCIns *)' + str(frame_pc(framelink)) + ')[-1]'))))
Ouch... Could you please clarify this black voodoo magic with the
comment? I don't know, whether it's possible to do it in another more
clear way, so I ask only for clarification.
> +
<snipped>
> +def dump_framelink(L, fr):
> + fr2 = fr + LJ_FR2
> +
> + return '{fr}{padding} [ ] FRAME: [{pp}] delta={d}, {f}\n'.format(
> + fr = strx64(fr),
> + padding = ':{fr2: <{width}}'.format(fr2 = strx64(fr2), width=len(PADDING) - 1) if LJ_FR2 else PADDING,
> + pp = 'PP' if frame_ispcall(fr2) else '{frname}{p}'.format(
> + frname = frametypes(frame_type(fr2)),
> + p = 'P' if frame_typep(fr2) & FRAME_P else ''
> + ),
> + d = fr2 - frame_prev(fr2),
> + f = dump_lj_tfunc(fr),
> + )
> +
> +def dump_stack_slot(L, slot, base=None, top=None, eol='\n'):
> + base = base or L.base
> + top = top or L.top
> +
> + return '{addr}{padding} [ {B}{T}{M}] VALUE: {value}{eol}'.format(
> + addr = strx64(slot),
> + padding = PADDING,
> + B = 'B' if slot == base else ' ',
> + T = 'T' if slot == top else ' ',
> + M = 'M' if slot == mref(TValuePtr, L.maxstack) else ' ',
> + value = dump_tvalue(slot),
> + eol = eol,
> + )
> +
I'd rather implement <dump_stack> via Python iterators, like Misha has
done in scope of 2692f36[1]. It will definitely make maintenance easier.
> +def dump_stack(L, base=None, top=None):
> + base = base or L.base
> + top = top or L.top
> + stack = mref(TValuePtr, L.stack)
> + maxstack = mref(TValuePtr, L.maxstack)
> + red = 5 + 2 * LJ_FR2
> +
> + dump = '\n'.join([
> + '{padding} Red zone: {nredslots: >2} slots {padding}'.format(
> + padding = '-' * len(PADDING),
> + nredslots = red,
> + ),
> + *(
> + dump_stack_slot(L, maxstack + offset, base, top, '')
> + for offset in range(red, 0, -1)
> + ),
> + '{padding} Stack: {nstackslots: >5} slots {padding}'.format(
> + padding = '-' * len(PADDING),
> + nstackslots = int((maxstack - stack) >> 3),
> + ),
> + dump_stack_slot(L, maxstack, base, top, ''),
> + '{start}:{end: <{width}} [ ] {nfreeslots} slots: Free stack slots'.format(
> + start = strx64(top + 1),
> + end = strx64(maxstack - 1),
> + width = len(PADDING) - 1,
> + nfreeslots = int((maxstack - top - 8) >> 3),
> + ),
> + ]) + '\n'
> +
> + slot = top
> + framelink = base - (1 + LJ_FR2)
> +
> + # XXX: Lua stack unwinding algorithm consists of the following steps:
> + # 1. dump all data slots in the (framelink, top) interval
> + # 2. check whether there are remaining frames
> + # 3. if there are no slots further, stop the unwinding loop
> + # 4. otherwise, resolve the next framelink and top and go to (1)
> + #
> + # Postcondition (i.e. do-while) loops is the most fitting idiom for such
> + # case, but Python doesn't provide such lexical construction. Hence step (1)
> + # is unrolled for the topmost stack frame.
> + while slot > framelink + LJ_FR2:
> + dump += dump_stack_slot(L, slot, base, top)
> + slot -= 1
> +
> + while framelink > stack:
> + assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
> + dump += dump_framelink(L, framelink)
> + framelink = frame_prev(framelink + LJ_FR2) - LJ_FR2
> + slot -= 1 + LJ_FR2
> + while slot > framelink + LJ_FR2:
> + dump += dump_stack_slot(L, slot, base, top)
> + slot -= 1
> +
> + assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
> + # Skip a nil slot for the last frame for 2-slot frames.
> + slot -= LJ_FR2
> +
> + dump += '{fr}{padding} [S ] FRAME: dummy L'.format(
> + fr = strx64(slot),
> + padding = ':{nilslot: <{offset}}'.format(nilslot = strx64(slot + 1), offset=len(PADDING) - 1) if LJ_FR2 else PADDING
> + )
> +
> + return dump
<snipped>
> +class LJDumpStack(Command):
<snipped>
> + command = 'lj-stack'
> +
> + def execute(self, debugger, args, result):
> + global target
> + target = debugger.GetSelectedTarget()
As we discussed I suggest to move target and <command> field
initialization to the parent constructor, like it's done in GDB script.
> + l = self.parse(args)
> + print('{}\n'.format(dump_stack(L(l))))
> +
> +
> +
> +
> +
> +def register_commands(debugger):
> + for cls in Command.__subclasses__():
> + debugger.HandleCommand(
> + 'command script add --overwrite --class luajit_lldb.{cls} {command}'.format(
> + cls=cls.__name__,
> + command=cls.command,
> + )
> + )
It would be also nice to report that the extension is loaded. Consider
the output produced by GDB:
| (gdb) source luajit-gdb.py
| lj-arch command initialized
| lj-tv command initialized
| lj-str command initialized
| lj-tab command initialized
| lj-stack command initialized
| lj-state command initialized
| lj-gc command initialized
| luajit-gdb.py is successfully loaded
> +
> +def configure(debugger):
> + global LJ_64, LJ_GC64, LJ_FR2, LJ_DUALNUM, PADDING, LJ_TISNUM
> + target = debugger.GetSelectedTarget()
> + module = target.modules[0]
> + LJ_DUALNUM = module.FindSymbol('lj_lib_checknumber') != None
> +
> + try:
> + irtype_enum = target.FindFirstType('IRType').enum_members
> + for member in irtype_enum:
> + if member.name == 'IRT_PTR':
> + LJ_64 = member.unsigned & 0x1f == IRT_P64
> + if member.name == 'IRT_PGC':
> + LJ_FR2 = LJ_GC64 = member.unsigned & 0x1f == IRT_P64
> + except:
> + print('luajit-lldb.py failed to load: '
> + 'no debugging symbols found for libluajit\n')
Initialization is failed, hence return is missing in the except branch.
> +
> + PADDING = ' ' * len(':' + hex((1 << (47 if LJ_GC64 else 32)) - 1))
> + LJ_TISNUM = 0xfffeffff if LJ_64 and not LJ_GC64 else LJ_T['NUMX']
> +
> +
> +
> +def __lldb_init_module(debugger, internal_dict):
> + configure(debugger)
> + register_commands(debugger)
> --
> 2.37.0 (Apple Git-136)
>
[1]: https://github.com/tarantool/luajit/commit/2692f36
--
Best regards,
IM
More information about the Tarantool-patches
mailing list