[Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno

Wed Oct 27 16:56:55 MSK 2021

Misha,

Thanks for the patch! Please consider the comments below.

On 29.09.21, Mikhail Shishatskiy wrote:
> When LuaJIT executes a trace, the trace number is stored in

Typo: s/in/as/.

> the virtual machine state. So, we can treat this number as
> an allocation event source in memprof and report allocation events
> from traces as well.
> 
> Previously, all the allocations from traces were marked as INTERNAL.
> 
> This patch introduces the functionality described above by adding
> a new allocation source type named ASOURCE_TRACE. If at the moment
> when allocation event occurs VM state indicates that trace executed,

Minor: To make the wording a bit clearer, I propose the following:
| If allocation event occurs when trace is executed, trace number...

> trace number and trace's mcode starting address streamed to a binary

Typo: s/streamed/is streamed/.

> file:
> 
> | loc-trace  := trace-no trace-addr
> | trace-no   := <ULEB128>
> | trace-addr := <ULEB128>
> 
> Also, the memory profiler parser is adjusted to recognize entries
> mentioned above. The <loc> structure is extended with field <traceno>,
> representing trace number. Trace locations are demangled as
> 
> | TRACE [<trace-no>] <trace-addr>
> 
> Resolves tarantool/tarantool#5814

Does this patch "resolve" the issue, or is it only a "part of" it?

> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  src/Makefile.dep.original                     |  3 +-
>  src/lj_memprof.c                              | 36 ++++++-
>  src/lj_memprof.h                              | 14 ++-
>  .../misclib-memprof-lapi.test.lua             | 97 +++++++++++++++----
>  tools/memprof/parse.lua                       | 13 ++-
>  tools/utils/symtab.lua                        | 20 ++--
>  6 files changed, 148 insertions(+), 35 deletions(-)
> 

<snipped>

> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index 9de4bd98..3f4ffea0 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua

<snipped>

> @@ -96,10 +113,21 @@ local function form_source_line(line)
>    return string.format("@%s:%d", arg[0], line)
>  end
>  
> -local function check_alloc_report(alloc, line, function_line, nevents)
> -  assert(form_source_line(function_line) == alloc[line].name)
> -  assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
> -    alloc[line].num,
> +local function check_alloc_report(alloc, traceno, line, function_line, nevents)

OK, here we have a function with 5 parameters. Almost half of them are
used only in one particular case. Hence, I propose to change the
interface to the following:
| local function check_alloc_report(alloc, location, nevents)

In this case <location> is a table with either a "traceno" key or
"line" and "linedefined" keys. Such a function looks better, doesn't it?
Consider the following:
| check_alloc_report(alloc, { traceno = 1 }, 20)
| check_alloc_report(alloc, { line = 34, linedefined = 32 }, 2)
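A minimal sketch of what I mean, assuming the same <alloc> layout as in
the patch (alloc.trace[traceno] and alloc.line[line], each with <name>
and <num> fields). The fallback script name in <form_source_line> is my
assumption to keep the snippet self-contained; the test itself can keep
using arg[0] directly:

```lua
-- Stand-in for the helper from the test file. The fallback name is
-- hypothetical and only matters when <arg> is not set.
local function form_source_line(line)
  local script = arg and arg[0] or "test.lua"
  return string.format("@%s:%d", script, line)
end

-- <location> is either { traceno = N } or
-- { line = N, linedefined = M }.
local function check_alloc_report(alloc, location, nevents)
  local expected_name, event
  if location.traceno then
    expected_name = string.format("TRACE [%d]", location.traceno)
    event = alloc.trace[location.traceno]
  else
    expected_name = form_source_line(location.linedefined)
    event = alloc.line[location.line]
  end
  assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
    event.name,
    expected_name
  ))
  assert(event.num == nevents, ("got=%d, expected=%d"):format(
    event.num,
    nevents
  ))
  return true
end
```

With this shape the trace case has no unused line arguments, and the
line case has no dummy traceno.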

> +  local expected_name, event
> +  if traceno ~= 0 then
> +    expected_name = string.format("TRACE [%d]", traceno)
> +    event = alloc.trace[traceno]
> +  else
> +    expected_name = form_source_line(function_line)
> +    event = alloc.line[line]
> +  end
> +  assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
> +    event.name,
> +    expected_name
> +  ))
> +  assert(event.num == nevents, ("got=%d, expected=%d"):format(
> +    event.num,
>      nevents
>    ))
>    return true
> @@ -145,18 +173,18 @@ test:test("output", function(subtest)
>    -- one is the number of allocations. 1 event - alocation of
>    -- table by itself + 1 allocation of array part as far it is
>    -- bigger than LJ_MAX_COLOSIZE (16).
> -  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
> -  -- 100 strings allocations.
> -  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
> +  -- 20 strings allocations.
> +  subtest:ok(check_alloc_report(alloc, 0, 39, 32, 20))

With the change proposed above, you won't need these ugly fixes at the
existing call sites in the future.

>  
>    -- Collect all previous allocated objects.
> -  subtest:ok(free.INTERNAL.num == 102)
> +  subtest:ok(free.INTERNAL.num == 22)
>  
>    -- Tests for leak-only option.
>    -- See also https://github.com/tarantool/tarantool/issues/5812.
>    local heap_delta = process.form_heap_delta(events, symbols)
> -  local tab_alloc_stats = heap_delta[form_source_line(27)]
> -  local str_alloc_stats = heap_delta[form_source_line(32)]
> +  local tab_alloc_stats = heap_delta[form_source_line(34)]
> +  local str_alloc_stats = heap_delta[form_source_line(39)]
>    subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
>    subtest:ok(tab_alloc_stats.dbytes == 0)
>    subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
> @@ -185,5 +213,38 @@ test:test("stack-resize", function(subtest)
>    misc.memprof.stop()
>  end)
>  
> +-- Test profiler with enabled JIT.
>  jit.on()
> +
> +test:test("jit-output", function(subtest)
> +  -- Disabled on *BSD due to #4819.
> +  if jit.os == 'BSD' then
> +    subtest:plan(1)
> +    subtest:skip('Disabled due to #4819')
> +    return
> +  end
> +
> +  subtest:plan(3)
> +
> +  jit.opt.start(3, "hotloop=10")
> +  jit.flush()
> +
> +  -- Pregenerate traces to fill symtab entries in the next run.
> +  default_payload()
> +
> +  local symbols, events = generate_parsed_output(default_payload)
> +
> +  local alloc = fill_ev_type(events, symbols, "alloc")
> +  local free = fill_ev_type(events, symbols, "free")
> +
> +  -- We expect, that loop will be compiled into a trace.
> +  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))

What are 37 and 32 in this case? I can pass 0 for both parameters
and the test still passes, can't I? This is another argument for using
a table with a different set of keys.

> +  -- See same checks with jit.off().
> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
> +  subtest:ok(free.INTERNAL.num == 22)
> +
> +  -- Restore default JIT settings.
> +  jit.opt.start(unpack(jit_opt_default))
> +end)
> +
>  os.exit(test:check() and 0 or 1)

<snipped>

> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> index e01daa62..85945fb2 100644
> --- a/tools/utils/symtab.lua
> +++ b/tools/utils/symtab.lua
> @@ -74,21 +74,29 @@ function M.parse(reader)
>  end
>  
>  function M.id(loc)
> -  return string_format("f%#xl%d", loc.addr, loc.line)
> +  return string_format("f%#xl%dt%d", loc.addr, loc.line, loc.traceno)
>  end
>  
> -function M.demangle(symtab, loc)
> +local function demangle_lfunc(symtab, loc)
>    local addr = loc.addr
>  
>    if addr == 0 then
>      return "INTERNAL"
> -  end
> -
> -  if symtab[addr] then
> +  elseif symtab[addr] then
>      return string_format("%s:%d", symtab[addr].source, loc.line)
>    end
> -

Looks like all these changes are excessive, aren't they?

>    return string_format("CFUNC %#x", addr)
>  end
>  
> +local function demangle_trace(loc)
> +  return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
> +end
> +
> +function M.demangle(symtab, loc)
> +  if loc.traceno ~= 0 then
> +    return demangle_trace(loc)
> +  end
> +  return demangle_lfunc(symtab, loc)

Why are 3 of the 4 ASOURCE types demangled within one separate function
(with a slightly strange name), while traces get a function of their own?
Looks a bit irrational, IMHO. So either introduce a new function per
ASOURCE (too crazy for my taste) or just move this <if> block for traces
into the original <M.demangle> function (the only logical way I see here).
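Something like the following, as a rough sketch of the second option.
The format strings are taken from the patch as is; the local <M> and
<string_format> definitions are here only to make the snippet
self-contained:

```lua
local string_format = string.format
local M = {}

function M.demangle(symtab, loc)
  -- Trace locations are reported by trace number and mcode address.
  if loc.traceno ~= 0 then
    return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
  end

  local addr = loc.addr

  if addr == 0 then
    return "INTERNAL"
  end

  if symtab[addr] then
    return string_format("%s:%d", symtab[addr].source, loc.line)
  end

  return string_format("CFUNC %#x", addr)
end
```

This way the rest of the function body stays untouched, and the diff for
symtab.lua shrinks to the new <if> block plus the <M.id> change.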

> +end
> +
>  return M
> -- 
> 2.33.0
> 

-- 
Best regards,
IM