Re: [Tarantool-patches] [PATCH luajit v2] lldb: introduce luajit-lldb

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Igor Munkin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Maksim Kokryashkin <max.kokryashkin@gmail.com>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit v2] lldb: introduce luajit-lldb
Date: Mon, 24 Apr 2023 14:26:58 +0000	[thread overview]
Message-ID: <ZEaROhG2en/vXHCl@tarantool.org> (raw)
In-Reply-To: <20221027131807.31287-1-max.kokryashkin@gmail.com>

Max,

Thanks for your patch! Nicely done with isolating debugger specific
helpers, I'm looking forward to merging it with luajit-gdb.py

On 27.10.22, Maksim Kokryashkin wrote:
> From: "m.kokryashkin" <m.kokryashkin@m-kokryashkin.local>

Oops, looks like you've committed the change on your local macOS.

> 
> It is impossible to run gdb on M1 devices, the only available
> debugger is lldb. The luajit-gdb extension doesn't work with
> lldb, so this patch introduces the luajit-lldb extension,
> which re-implements exactly the same functionality.
> 
> It is worth noting that the extension is called
> `luajit_lldb.py` instead of `luajit-lldb.py` because lldb
> imports it as a Python module, and it is prohibited to use
> dashes in a python module name.
> 
> Part of tarantool/tarantool#4808
> ---
> Branch: https://github.com/tarantool/luajit/tree/gh-fckxorg/luajit-lldb
> 
> Changes in v2:
> - Fixed comments as per review by Sergey
> - Reorganized the `c_structs` spec.
> 
>  src/luajit_lldb.py | 1006 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 1006 insertions(+)
>  create mode 100644 src/luajit_lldb.py
> 
> diff --git a/src/luajit_lldb.py b/src/luajit_lldb.py
> new file mode 100644
> index 00000000..edd2884e
> --- /dev/null
> +++ b/src/luajit_lldb.py
> @@ -0,0 +1,1006 @@

<snipped>

> +    def __add__(self, other):
> +        assert isinstance(other, int)
> +        return self.__class__(cast(self.normal_type.__name__ + ' *', cast('uintptr_t', self.value.unsigned + other * self.value.deref.size)))
> +

Minor: I see no reason in such assymetry in type variety supported by
__add__ and __sub__. Anyway, it doesn't affect extension usage, so feel
free to ignore.

> +    def __sub__(self, other):
> +        assert isinstance(other, int) or isinstance(other, Ptr)
> +        if isinstance(other, int):
> +            return self.__add__(-other)
> +        else:
> +            return self.value.unsigned - other.value.unsigned
> +
> +

<snipped>

> +

Could you please add some comments to <cast> function? AFAIU, there are
type detection and value cast: some spots are quite cryptic...

> +def cast(typename, value):
> +    pointer_type = False
> +    name = None
> +    if isinstance(value, Struct) or isinstance(value, Ptr):
> +        value = value.value
> +    if isinstance(typename, type):
> +        name = typename.__name__
> +        if name.endswith('Ptr'):
> +            pointer_type = True
> +            name = name[:-3]
> +    else:
> +        name = typename
> +        if name[-1] == '*':
> +            name = name[:-1].strip()
> +            pointer_type = True
> +    t = target.FindFirstType(name)
> +    if pointer_type:
> +        t = t.GetPointerType()
> +
> +    if isinstance(value, int):
> +        if pointer_type:
> +            casted = target.CreateValueFromAddress('value', lldb.SBAddress(value, target), t.GetPointeeType()).address_of
> +        else:
> +            casted = target.CreateValueFromData(name = 'value', data = lldb.SBData.CreateDataFromInt(value, size=8), type=t)
> +    else:
> +        casted = value.Cast(t)
> +
> +    if isinstance(typename, type):
> +        return typename(casted)
> +    else:
> +        return casted
> +
> +def lookup_global(name):
> +    global target

Minor: Do we need explicitly mention <target> as global here?

> +    return target.FindFirstGlobalVariable(name)
> +
> +def type_member(type_obj, name):
> +    return next((x for x in type_obj.members if x.name == name), None)
> +
> +def find_type(typename):
> +    return target.FindFirstType(typename)

I believe, you can "inline" this function to their callers, if you use
it only in "Debugger helpers" scope.

> +
> +def offsetof(typename, membername):
> +    type_obj = find_type(typename)
> +    member = type_member(type_obj, membername)
> +    return member.GetOffsetInBytes()

AFAIU, <type_member> can yield None, so the following "dereference" can
fail in this case. Furthermore, what about inlining <type_member> here?

> +
> +def sizeof(type_obj):
> +    return type_obj.GetByteSize()

I'd rather add <find_type> right to the function, like it's done in
<offsetof> helper.

> +

<snipped>

> +def J(g):
> +    return cast(jit_StatePtr, vtou64(cast('char *', g)) - offsetof('GG_State', 'g') + offsetof('GG_State', 'J'))

Please, split this into several lines.

> +

<snipped>

> +def strdata(obj):
> +    try:
> +        ptr = cast('char *', obj + 1)
> +        return ptr.summary
> +    except UnicodeEncodeError:
> +        return "<luajit-lldb: error occured while rendering non-ascii slot>"

<snipped>

> +def frame_prevl(framelink):
> +    return framelink - (1 + LJ_FR2 + bc_a(vtou64(dbg_eval('((BCIns *)' + str(frame_pc(framelink)) + ')[-1]'))))

Ouch... Could you please clarify this black voodoo magic with the
comment? I don't know, whether it's possible to do it in another more
clear way, so I ask only for clarification.

> +

<snipped>

> +def dump_framelink(L, fr):
> +    fr2 = fr + LJ_FR2
> +
> +    return '{fr}{padding} [    ] FRAME: [{pp}] delta={d}, {f}\n'.format(
> +        fr = strx64(fr),
> +        padding = ':{fr2: <{width}}'.format(fr2 = strx64(fr2), width=len(PADDING) - 1) if LJ_FR2 else PADDING,
> +        pp = 'PP' if frame_ispcall(fr2) else '{frname}{p}'.format(
> +            frname = frametypes(frame_type(fr2)),
> +            p = 'P' if frame_typep(fr2) & FRAME_P else ''
> +        ),
> +        d = fr2 - frame_prev(fr2),
> +        f = dump_lj_tfunc(fr),
> +    )
> +
> +def dump_stack_slot(L, slot, base=None, top=None, eol='\n'):
> +    base = base or L.base
> +    top = top or L.top
> +
> +    return '{addr}{padding} [ {B}{T}{M}] VALUE: {value}{eol}'.format(
> +        addr = strx64(slot),
> +        padding = PADDING,
> +        B = 'B' if slot == base else ' ',
> +        T = 'T' if slot == top else ' ',
> +        M = 'M' if slot == mref(TValuePtr, L.maxstack) else ' ',
> +        value = dump_tvalue(slot),
> +        eol = eol,
> +    )
> +

I'd rather implement <dump_stack> via Python iterators, like Misha has
done in scope of 2692f36[1]. It will definitely make maintenance easier.

> +def dump_stack(L, base=None, top=None):
> +    base = base or L.base
> +    top = top or L.top
> +    stack = mref(TValuePtr, L.stack)
> +    maxstack = mref(TValuePtr, L.maxstack)
> +    red = 5 + 2 * LJ_FR2
> +
> +    dump = '\n'.join([
> +        '{padding} Red zone: {nredslots: >2} slots {padding}'.format(
> +            padding = '-' * len(PADDING),
> +            nredslots = red,
> +        ),
> +        *(
> +            dump_stack_slot(L, maxstack + offset, base, top, '')
> +                for offset in range(red, 0, -1)
> +        ),
> +        '{padding} Stack: {nstackslots: >5} slots {padding}'.format(
> +            padding = '-' * len(PADDING),
> +            nstackslots = int((maxstack - stack) >> 3),
> +        ),
> +        dump_stack_slot(L, maxstack, base, top, ''),
> +        '{start}:{end: <{width}} [    ] {nfreeslots} slots: Free stack slots'.format(
> +            start = strx64(top + 1),
> +            end = strx64(maxstack - 1),
> +            width = len(PADDING) - 1,
> +            nfreeslots = int((maxstack - top - 8) >> 3),
> +        ),
> +    ]) + '\n'
> +
> +    slot = top
> +    framelink = base - (1 + LJ_FR2)
> +
> +    # XXX: Lua stack unwinding algorithm consists of the following steps:
> +    # 1. dump all data slots in the (framelink, top) interval
> +    # 2. check whether there are remaining frames
> +    # 3. if there are no slots further, stop the unwinding loop
> +    # 4. otherwise, resolve the next framelink and top and go to (1)
> +    #
> +    # Postcondition (i.e. do-while) loops is the most fitting idiom for such
> +    # case, but Python doesn't provide such lexical construction. Hence step (1)
> +    # is unrolled for the topmost stack frame.
> +    while slot > framelink + LJ_FR2:
> +        dump += dump_stack_slot(L, slot, base, top)
> +        slot -= 1
> +
> +    while framelink > stack:
> +        assert slot == framelink + LJ_FR2, "Invalid slot during frame unwind"
> +        dump += dump_framelink(L, framelink)
> +        framelink = frame_prev(framelink + LJ_FR2) - LJ_FR2
> +        slot -= 1 + LJ_FR2
> +        while slot > framelink + LJ_FR2:
> +            dump += dump_stack_slot(L, slot, base, top)
> +            slot -= 1
> +
> +    assert slot == framelink + LJ_FR2, "Invalid slot after frame unwind"
> +    # Skip a nil slot for the last frame for 2-slot frames.
> +    slot -= LJ_FR2
> +
> +    dump += '{fr}{padding} [S   ] FRAME: dummy L'.format(
> +        fr = strx64(slot),
> +        padding = ':{nilslot: <{offset}}'.format(nilslot = strx64(slot + 1), offset=len(PADDING) - 1) if LJ_FR2 else PADDING
> +    )
> +
> +    return dump

<snipped>

> +class LJDumpStack(Command):

<snipped>

> +    command = 'lj-stack'
> +
> +    def execute(self, debugger, args, result):
> +        global target
> +        target = debugger.GetSelectedTarget()

As we discussed I suggest to move target and <command> field
initialization to the parent constructor, like it's done in GDB script.

> +        l = self.parse(args)
> +        print('{}\n'.format(dump_stack(L(l))))
> +
> +
> +
> +
> +
> +def register_commands(debugger):
> +    for cls in Command.__subclasses__():
> +        debugger.HandleCommand(
> +            'command script add --overwrite --class luajit_lldb.{cls} {command}'.format(
> +                cls=cls.__name__,
> +                command=cls.command,
> +            )
> +        )

It would be also nice to report that the extension is loaded. Consider
the output produced by GDB:
| (gdb) source luajit-gdb.py
| lj-arch command initialized
| lj-tv command initialized
| lj-str command initialized
| lj-tab command initialized
| lj-stack command initialized
| lj-state command initialized
| lj-gc command initialized
| luajit-gdb.py is successfully loaded

> +
> +def configure(debugger):
> +    global LJ_64, LJ_GC64, LJ_FR2, LJ_DUALNUM, PADDING, LJ_TISNUM
> +    target = debugger.GetSelectedTarget()
> +    module = target.modules[0]
> +    LJ_DUALNUM = module.FindSymbol('lj_lib_checknumber') != None
> +
> +    try:
> +        irtype_enum = target.FindFirstType('IRType').enum_members
> +        for member in irtype_enum:
> +            if member.name == 'IRT_PTR':
> +                LJ_64 = member.unsigned & 0x1f == IRT_P64
> +            if member.name == 'IRT_PGC':
> +                LJ_FR2 = LJ_GC64 = member.unsigned & 0x1f == IRT_P64
> +    except:
> +        print('luajit-lldb.py failed to load: '
> +              'no debugging symbols found for libluajit\n')

Initialization is failed, hence return is missing in the except branch.

> +
> +    PADDING = ' ' * len(':' + hex((1 << (47 if LJ_GC64 else 32)) - 1))
> +    LJ_TISNUM = 0xfffeffff if LJ_64 and not LJ_GC64 else LJ_T['NUMX']
> +
> +
> +
> +def __lldb_init_module(debugger, internal_dict):
> +    configure(debugger)
> +    register_commands(debugger)
> -- 
> 2.37.0 (Apple Git-136)
> 

[1]: https://github.com/tarantool/luajit/commit/2692f36

-- 
Best regards,
IM

          parent reply	other threads:[~2023-04-24 14:37 UTC|newest]

Thread overview: expand[flat|nested]  mbox.gz  Atom feed
 [parent not found: <20221027131807.31287-1-max.kokryashkin@gmail.com>]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZEaROhG2en/vXHCl@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imun@tarantool.org \
    --cc=max.kokryashkin@gmail.com \
    --subject='Re: [Tarantool-patches] [PATCH luajit v2] lldb: introduce luajit-lldb' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox