Tarantool development patches archive
 help / color / mirror / Atom feed
From: Mikhail Shishatskiy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab
Date: Mon, 22 Nov 2021 18:11:19 +0300	[thread overview]
Message-ID: <20211122151119.ilbfcxcux4kpzrqy@surf.localdomain> (raw)
In-Reply-To: <YY5tXRQAf+rkzHkp@tarantool.org>

Igor,
Thank you for the review! Hope, I got your comments right :)

New commit message:
============================================================
memprof: add info about trace start to symtab

Allocation events occured on traces are recorded by the memory
profiler the following way now

| TRACE [<trace-no>] <trace-addr>

This approach is not descriptive enough to understand, where
exactly allocation took place, as we do not know the code
chunk, associated with the trace.

This patch fixes the problem described above by extending the
symbol table with <sym-trace> entries, consisting of a trace's
mcode starting address, trace number, address of function proto,
and line, where trace recording started:

| sym-trace  := sym-header trace-no trace-addr sym-addr sym-line
| trace-no   := <ULEB128>
| trace-addr := <ULEB128>

The memory profiler parser is adjusted to recognize the entries
mentioned above. On top of that, the API of <utils/symtab.lua> changed:
now table with symbols contains two tables: `lfunc` for Lua functions
symbols and `trace` for trace entries.

+ If the trace is in the symtab, it will be associated with a proto
+ by the <symtab.demangle>, and one will get an output in the format
+ described below:

| TRACE [<trace-no>] <trace-addr> started at @<sym-chunk>:<sym-line>

Follows up tarantool/tarantool#5814
============================================================

The diff:
============================================================
diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
index caab8b3a..dca4826c 100644
--- a/tools/memprof/humanize.lua
+++ b/tools/memprof/humanize.lua
@@ -7,23 +7,6 @@ local symtab = require "utils.symtab"

  local M = {}

-local function describe_location(symbols, loc)
-  if loc.traceno == 0 then
-    return symtab.demangle(symbols, loc)
-  end
-
-  local trace = symbols.trace[loc.traceno]
-
-  -- If trace, which was remembered in the symtab, has not
-  -- been flushed, associate it with a proto, where trace
-  -- recording started.
-  if trace and trace.addr == loc.addr then
-    return symtab.demangle(symbols, loc).." started at "..
-           symtab.demangle(symbols, trace.start)
-  end
-  return symtab.demangle(symbols, loc)
-end
-
  function M.render(events, symbols)
    local ids = {}

@@ -38,7 +21,7 @@ function M.render(events, symbols)
    for i = 1, #ids do
      local event = events[ids[i]]
      print(string.format("%s: %d events\t+%d bytes\t-%d bytes",
-      describe_location(symbols, event.loc),
+      symtab.demangle(symbols, event.loc),
        event.num,
        event.alloc,
        event.free
@@ -46,7 +29,7 @@ function M.render(events, symbols)

      local prim_loc = {}
      for _, heap_chunk in pairs(event.primary) do
-      table.insert(prim_loc, describe_location(symbols, heap_chunk.loc))
+      table.insert(prim_loc, symtab.demangle(symbols, heap_chunk.loc))
      end
      if #prim_loc ~= 0 then
        table.sort(prim_loc)
@@ -80,7 +63,7 @@ function M.leak_info(dheap, symbols)
      -- with enabled jit.
      if info.dbytes > 0 then
        table.insert(leaks, {
-        line = describe_location(symbols, info.loc),
+        line = symtab.demangle(symbols, info.loc),
          dbytes = info.dbytes
        })
      end
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 879979f8..c7fcf77c 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -40,7 +40,7 @@ local function parse_sym_trace(reader, symtab)
    symtab.trace[traceno] = {
      addr = trace_addr,
      -- The structure <start> is the same as the one
-    -- yielded from the <parse_location> fucntion
+    -- yielded from the <parse_location> function
      -- in the <memprof/parse.lua> module.
      start = {
        addr = sym_addr,
@@ -101,9 +101,28 @@ function M.id(loc)
    return string_format("f%#xl%dt%d", loc.addr, loc.line, loc.traceno)
  end

+local function demangle_trace(symtab, loc)
+  local traceno = loc.traceno
+  local addr = loc.addr
+
+  assert(traceno ~= 0, "Location is a trace")
+
+  local trace_str = string_format("TRACE [%d] %#x", traceno, addr)
+  local trace = symtab.trace[traceno]
+
+  -- If trace, which was remembered in the symtab, has not
+  -- been flushed, associate it with a proto, where trace
+  -- recording started.
+  if trace and trace.addr == addr then
+    assert(trace.start.traceno == 0, "Trace start is not a trace")
+    return trace_str.." started at "..M.demangle(symtab, trace.start)
+  end
+  return trace_str
+end
+
  function M.demangle(symtab, loc)
    if loc.traceno ~= 0 then
-    return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
+    return demangle_trace(symtab, loc)
    end

    local addr = loc.addr
============================================================

On 12.11.2021 16:34, Igor Munkin wrote:
>Misha,
>
>Thanks for your fixes! Please consider my answer regarding trace event
>rendering and a typo in the comment below.
>
>On 04.11.21, Mikhail Shishatskiy wrote:
>> Hi, Igor!
>> Thank you for the review.
>>
>> New commit message:
>> ============================================================
>> memprof: add info about trace start to symtab
>>
>> Allocation events occured on traces are recorded by the memory
>> profiler the following way now
>>
>> | TRACE [<trace-no>] <trace-addr>
>>
>> This approach is not descriptive enough to understand, where
>> exactly allocation took place, as we do not know the code
>> chunk, associated with the trace.
>>
>> This patch fixes the problem described above by extending the
>> symbol table with <sym-trace> entries, consisting of a trace's
>> mcode starting address, trace number, address of function proto,
>> and line, where trace recording started:
>>
>> | sym-trace  := sym-header trace-no trace-addr sym-addr sym-line
>> | trace-no   := <ULEB128>
>> | trace-addr := <ULEB128>
>>
>> The memory profiler parser is adjusted to recognize the entries
>> mentioned above. On top of that, the API of <utils/symtab.lua> changed:
>> now table with symbols contains two tables: `lfunc` for Lua functions
>> symbols and `trace` for trace entries.
>>
>> The demangler module has not changed, but the function
>> `describe_location` is added to the <memprof/humanize.lua> module,
>> which allows one to get a description of the trace location in the
>> format described below:
>>
>> | TRACE [<trace-no>] <trace-addr> started at @<sym-chunk>:<sym-line>
>>
>> Follows up tarantool/tarantool#5814
>> ============================================================
>>
>> Diff:
>> ============================================================
>> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
>> index e8b2ebbc..05542052 100644
>> --- a/src/lj_memprof.c
>> +++ b/src/lj_memprof.c
>> @@ -40,7 +40,6 @@ static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
>>                startpc < proto_bc(pt) + pt->sizebc);
>>
>>     lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
>> -  lua_assert(lineno >= 0);
>>
>>     lj_wbuf_addbyte(out, SYMTAB_TRACE);
>>     lj_wbuf_addu64(out, (uint64_t)trace->traceno);
>> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> index 3003b9f5..ce667afc 100644
>> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> @@ -92,7 +92,7 @@ local function fill_ev_type(events, symbols, event_type)
>>         addr = trace_loc.addr
>>         ev_type.trace[traceno] = {
>>           name = string.format("TRACE [%d] %s:%d",
>> -          traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
>> +          traceno, symbols.lfunc[addr].source, trace_loc.line
>>           ),
>>           num = event.num,
>>         }
>> @@ -122,7 +122,7 @@ local function check_alloc_report(alloc, location, nevents)
>>     local traceno = location.traceno
>>     if traceno then
>>       expected_name = string.format("TRACE [%d] ", traceno)..
>> -                    form_source_line(location.linedefined)
>> +                    form_source_line(location.line)
>>       event = alloc.trace[traceno]
>>     else
>>       expected_name = form_source_line(location.linedefined)
>> @@ -244,9 +244,7 @@ test:test("jit-output", function(subtest)
>>     local free = fill_ev_type(events, symbols, "free")
>>
>>     -- We expect, that loop will be compiled into a trace.
>> -  subtest:ok(check_alloc_report(
>> -    alloc, { traceno = 1, line = 37, linedefined = 32 }, 20
>> -  ))

<snipped>

>> +  subtest:ok(check_alloc_report(alloc, { traceno = 1, line = 37 }, 20))
>>     -- See same checks with jit.off().
>>     subtest:ok(check_alloc_report(alloc, { line = 34, linedefined = 32 }, 2))
>>     subtest:ok(free.INTERNAL.num == 22)
>> diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
>> index 7d30f976..caab8b3a 100644
>> --- a/tools/memprof/humanize.lua
>> +++ b/tools/memprof/humanize.lua
>> @@ -7,7 +7,7 @@ local symtab = require "utils.symtab"
>>
>>   local M = {}
>>
>> -function M.describe_location(symbols, loc)
>> +local function describe_location(symbols, loc)
>>     if loc.traceno == 0 then
>>       return symtab.demangle(symbols, loc)
>>     end
>> @@ -15,7 +15,7 @@ function M.describe_location(symbols, loc)
>>     local trace = symbols.trace[loc.traceno]
>>
>>     -- If trace, which was remembered in the symtab, has not
>> -  -- been flushed, assotiate it with a proto, where trace
>> +  -- been flushed, associate it with a proto, where trace
>>     -- recording started.
>>     if trace and trace.addr == loc.addr then
>>       return symtab.demangle(symbols, loc).." started at "..
>> @@ -38,7 +38,7 @@ function M.render(events, symbols)
>>     for i = 1, #ids do
>>       local event = events[ids[i]]
>>       print(string.format("%s: %d events\t+%d bytes\t-%d bytes",
>> -      M.describe_location(symbols, event.loc),
>> +      describe_location(symbols, event.loc),
>>         event.num,
>>         event.alloc,
>>         event.free
>> @@ -46,7 +46,7 @@ function M.render(events, symbols)
>>
>>       local prim_loc = {}
>>       for _, heap_chunk in pairs(event.primary) do
>> -      table.insert(prim_loc, M.describe_location(symbols, heap_chunk.loc))
>> +      table.insert(prim_loc, describe_location(symbols, heap_chunk.loc))
>>       end
>>       if #prim_loc ~= 0 then
>>         table.sort(prim_loc)
>> @@ -80,7 +80,7 @@ function M.leak_info(dheap, symbols)
>>       -- with enabled jit.
>>       if info.dbytes > 0 then
>>         table.insert(leaks, {
>> -        line = M.describe_location(symbols, info.loc),
>> +        line = describe_location(symbols, info.loc),
>>           dbytes = info.dbytes
>>         })
>>       end
>> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
>> index 49ebb36f..879979f8 100644
>> --- a/tools/utils/symtab.lua
>> +++ b/tools/utils/symtab.lua
>> @@ -39,6 +39,9 @@ local function parse_sym_trace(reader, symtab)
>>
>>     symtab.trace[traceno] = {
>>       addr = trace_addr,
>> +    -- The structure <start> is the same as the one
>> +    -- yielded from the <parse_location> fucntion
>
>Typo: s/fucntion/function/.
>
>> +    -- in the <memprof/parse.lua> module.
>>       start = {
>>         addr = sym_addr,
>>         line = sym_line,
>> ============================================================
>>
>> Issue: https://github.com/tarantool/tarantool/issues/5814
>> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>>
>> On 01.11.2021 19:31, Igor Munkin wrote:
>> >Misha,
>> >
>> >Thanks for the patch! Please consider my comments below.
>> >
>> >On 29.09.21, Mikhail Shishatskiy wrote:
>> >> Trace allocation sources, recorded by the memory profiler,
>> >> were reported as
>>
>> <snipped>
>>
>> >>
>> >> | TRACE [<trace-no>] <trace-addr>
>> >>
>> >> This approach is not descriptive enough to understand, where
>> >> exactly allocation took place, as we do not know the code
>> >> chunk, associated with the trace.
>> >>
>> >> This patch fixes the problem described above by extending the
>> >> symbol table with <sym-trace> entries, consisting of a trace's
>> >> mcode starting address, trace number, address of function proto,
>> >> and line, where trace recording started:
>> >>
>> >> | sym-trace  := sym-header trace-no trace-addr sym-addr sym-line
>> >> | trace-no   := <ULEB128>
>> >> | trace-addr := <ULEB128>
>> >>
>> >> The memory profiler parser is adjusted to recognize the entries
>> >> mentioned above. On top of that, the API of <utils/symtab.lua> changed:
>> >> now table with symbols contains two tables: `lfunc` for Lua functions
>> >> symbols and `trace` for trace entries.
>> >>
>> >> The demangler module has not changed, but the function
>> >> `describe_location` is added to the <memprof/humanize.lua> module,
>> >> which allows one to get a description of the trace location in the
>> >> format described below:
>> >>
>> >> | TRACE [<trace-no>] <trace-addr> started at @<sym-chunk>:<sym-line>
>> >>
>> >> Follows up tarantool/tarantool#5814
>> >> ---
>> >>
>> >> Issue: https://github.com/tarantool/tarantool/issues/5814
>> >> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>> >> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>> >>
>> >>  src/lj_memprof.c                              | 43 +++++++++++++++++++
>> >>  src/lj_memprof.h                              |  8 +++-
>> >>  .../misclib-memprof-lapi.test.lua             | 15 ++++---
>> >>  tools/memprof.lua                             |  4 +-
>> >>  tools/memprof/humanize.lua                    | 30 ++++++++++---
>> >>  tools/memprof/process.lua                     |  9 ++--
>> >>  tools/utils/symtab.lua                        | 31 ++++++++++---
>> >>  7 files changed, 118 insertions(+), 22 deletions(-)
>> >>
>
><snipped>
>
>> >> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> >> index 3f4ffea0..b9edb80d 100644
>> >> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> >> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>
><snipped>
>
>> >> @@ -116,7 +120,8 @@ end
>> >>  local function check_alloc_report(alloc, traceno, line, function_line, nevents)
>> >>    local expected_name, event
>> >>    if traceno ~= 0 then
>> >> -    expected_name = string.format("TRACE [%d]", traceno)
>> >> +    expected_name = string.format("TRACE [%d] ", traceno)..
>> >> +                    form_source_line(function_line)
>> >
>> >The output format differs from the one produced by memprof parser,
>> >doesn't it?
>>
>> Yes, because we demangle names in <fill_ev_type> function. So, we can
>> omit "started at" part to check that everything else is correct.
>
>Oh, I see... This is odd a bit but now I got it, thanks!
>
>>
>
><snipped>
>
>> >> diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
>> >> index 7771005d..7d30f976 100644
>> >> --- a/tools/memprof/humanize.lua
>> >> +++ b/tools/memprof/humanize.lua
>> >> @@ -7,6 +7,23 @@ local symtab = require "utils.symtab"
>
><snipped>
>
>> >> +  -- recording started.
>> >> +  if trace and trace.addr == loc.addr then
>> >> +    return symtab.demangle(symbols, loc).." started at "..
>> >> +           symtab.demangle(symbols, trace.start)
>> >
>> >Finally, I got the thing that bothers me the most. Why do you make
>> ><describe_location> so complex? It looks that you can move all these
>> >if-else branching to <symtab.demangle> and concatenation to
>> ><demangle_trace> function, doesn't it? AFAICS, you can remove
>> ><describe_location> as a result and trace demangling will be
>> >encapsulated in scope of <demangle_trace> function. Feel free to correct
>> >me if I'm wrong.
>>
>> Initially it was implemented, as you suggest now. But Sergey in his
>> review led me to believe, that "started at" part should ideologically
>> relate to the humanizer module. And I agree with that point, but maybe
>> I decomposed things not in a very good way.
>
>Em... In that way all other types (such as "INTERNAL" and "CFUNC %#x")
>should also be in the humanizer module, since this representation is
>specific for a particular output format. All in all nobody stops you
>from moving <symtab.demangle> to the humanize module, since it's used
>only there (and need to be used only there).
>
>BTW, Sergey is also in Cc, so he can also drop a few words regarding it.
>
>>
>> Another way to implement this is to demangle without "started at" and
>> then insert it to the demangled name. What do you think?
>
>My point is to have the whole "stringification" mess encapsulated in a
>single function (like it's almost done within <symtab.demangle>). And
>the only thing remaining outside of this function is "started at" tail.
>I hope this fits your vision regarding decomposition :)

So, I decided to decompose things, as you suggested initially. Now
traces are demangled separately inside the <demangle_trace> function.
The "started at" string also moved to that function in order not to
spout some convoluted logic of injecting this string inside the
demangled string outside the demangler module. The only nuance is
a possibility of recursion calls:

| symtab.demangle -> demangle_trace -> symtab.demangle -> demangle_trace -> ...

But I inserted asserts to prevent such cases.

>
>>
>> >
>> >> +  end
>> >> +  return symtab.demangle(symbols, loc)
>> >> +end
>> >> +
>
><snipped>
>
>> >> --
>> >> 2.33.0
>> >>
>> >
>> >--
>> >Best regards,
>> >IM
>>
>> --
>> Best regards,
>> Mikhail Shishatskiy
>
>-- 
>Best regards,
>IM

--
Best regards,
Mikhail Shishatskiy

  parent reply	other threads:[~2021-11-22 15:11 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
2021-10-27 15:07     ` Sergey Kaplun via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
     [not found]       ` <20211104130010.mcvnra6e4yl5moo2@surf.localdomain>
2021-11-10 15:38         ` Igor Munkin via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
     [not found]       ` <20211104130156.f2botlihlfhwd3yh@surf.localdomain>
2021-11-11 15:34         ` Igor Munkin via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab Mikhail Shishatskiy via Tarantool-patches
2021-11-01 16:31     ` Igor Munkin via Tarantool-patches
     [not found]       ` <20211104130228.x6qcne5xeh544hm7@surf.localdomain>
2021-11-12 13:34         ` Igor Munkin via Tarantool-patches
2021-11-17  8:17           ` Sergey Kaplun via Tarantool-patches
2021-11-22 15:11           ` Mikhail Shishatskiy via Tarantool-patches [this message]
2021-11-24 12:42             ` Mikhail Shishatskiy via Tarantool-patches
2021-11-24 16:44             ` Igor Munkin via Tarantool-patches
2022-01-27 23:29   ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Igor Munkin via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211122151119.ilbfcxcux4kpzrqy@surf.localdomain \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imun@tarantool.org \
    --cc=m.shishatskiy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox