Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number
@ 2021-08-20  7:05 Mikhail Shishatskiy via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

New patch series with Sergey's comments in mind.

Changes in v3:
  - Changed comments as per review;
  - Added trace address dump to avoid collisions;
  - Added humanize.describe_location function (see commit
    message);
  - Separated memprof test into subtests to be able to skip
    JIT-related tests on FreeBSD;
  - Adjusted memprof test payload to lock in the behavior 
    (see patch message).

Benchmark update:

| ---------------------------------------------------------- |
|                     JIT-off, memprof-on                    |
| ---------------------------------------------------------- |
|       BEFORE       |     AFTER, v2     |     AFTER, v3     |
| ------------------ | ----------------- | ----------------- |
| 4.8196 (0.0194)    | +0.0170 (0.0291)  | -0.0801 (0.0350)  |
| ---------------------------------------------------------- |
|                     JIT-on, memprof-on                     |
| ---------------------------------------------------------- |
|       BEFORE       |     AFTER, v2     |     AFTER, v3     |
| ------------------ | ----------------- | ----------------- |
| 3.1243 (0.0770)    | +0.1470 (0.0320)  | +0.1976 (0.0341)  |
| -----------------------------------------------------------|

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/Makefile.dep.original                     |   2 +-
 src/lj_debug.c                                |   2 +-
 src/lj_debug.h                                |   2 +-
 src/lj_memprof.c                              |  75 +++++-
 src/lj_memprof.h                              |  22 +-
 .../misclib-memprof-lapi.test.lua             | 252 ++++++++++++------
 tools/memprof/humanize.lua                    |  21 +-
 tools/memprof/parse.lua                       |  22 +-
 tools/memprof/process.lua                     |   6 +-
 tools/utils/symtab.lua                        |  45 +++-
 10 files changed, 331 insertions(+), 118 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-20  7:05 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

As `lj_debug_line` function does not change data under `pt`
pointer argument neither by designation nor in fact, it is
more correct to mark this pointer const.

Part of tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/lj_debug.c | 2 +-
 src/lj_debug.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/lj_debug.c b/src/lj_debug.c
index bb9ab288..58b75dae 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -110,7 +110,7 @@ static BCPos debug_framepc(lua_State *L, GCfunc *fn, cTValue *nextframe)
 /* -- Line numbers -------------------------------------------------------- */
 
 /* Get line number for a bytecode position. */
-BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc)
+BCLine LJ_FASTCALL lj_debug_line(const GCproto *pt, BCPos pc)
 {
   const void *lineinfo = proto_lineinfo(pt);
   if (pc <= pt->sizebc && lineinfo) {
diff --git a/src/lj_debug.h b/src/lj_debug.h
index a157d284..7814c588 100644
--- a/src/lj_debug.h
+++ b/src/lj_debug.h
@@ -27,7 +27,7 @@ typedef struct lj_Debug {
 } lj_Debug;
 
 LJ_FUNC cTValue *lj_debug_frame(lua_State *L, int level, int *size);
-LJ_FUNC BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc);
+LJ_FUNC BCLine LJ_FASTCALL lj_debug_line(const GCproto *pt, BCPos pc);
 LJ_FUNC const char *lj_debug_uvname(GCproto *pt, uint32_t idx);
 LJ_FUNC const char *lj_debug_uvnamev(cTValue *o, uint32_t idx, TValue **tvp);
 LJ_FUNC const char *lj_debug_slotname(GCproto *pt, const BCIns *pc,
-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-20  7:05 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

As the number of memprof test cases is expected to grow,
memprof tests are separated into subtests to encapsulate
test cases and be able to skip some of the subtests.

Part of tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 .../misclib-memprof-lapi.test.lua             | 153 ++++++++++--------
 1 file changed, 83 insertions(+), 70 deletions(-)

diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 06d96b3b..dbf384ed 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -7,7 +7,7 @@ require("utils").skipcond(
 local tap = require("tap")
 
 local test = tap.test("misc-memprof-lapi")
-test:plan(13)
+test:plan(3)
 
 jit.off()
 jit.flush()
@@ -86,85 +86,98 @@ local function check_alloc_report(alloc, line, function_line, nevents)
   return true
 end
 
--- Not a directory.
-local res, err, errno = misc.memprof.start(BAD_PATH)
-test:ok(res == nil and err:match("No such file or directory"))
-test:ok(type(errno) == "number")
+-- Test profiler API.
+test:test("base", function(subtest)
+  subtest:plan(6)
 
--- Profiler is running.
-res, err = misc.memprof.start(TMP_BINFILE)
-assert(res, err)
-res, err, errno = misc.memprof.start(TMP_BINFILE)
-test:ok(res == nil and err:match("profiler is running already"))
-test:ok(type(errno) == "number")
+  -- Not a directory.
+  local res, err, errno = misc.memprof.start(BAD_PATH)
+  subtest:ok(res == nil and err:match("No such file or directory"))
+  subtest:ok(type(errno) == "number")
 
-res, err = misc.memprof.stop()
-assert(res, err)
+  -- Profiler is running.
+  res, err = misc.memprof.start(TMP_BINFILE)
+  assert(res, err)
+  res, err, errno = misc.memprof.start(TMP_BINFILE)
+  subtest:ok(res == nil and err:match("profiler is running already"))
+  subtest:ok(type(errno) == "number")
+
+  res, err = misc.memprof.stop()
+  assert(res, err)
 
--- Profiler is not running.
-res, err, errno = misc.memprof.stop()
-test:ok(res == nil and err:match("profiler is not running"))
-test:ok(type(errno) == "number")
+  -- Profiler is not running.
+  res, err, errno = misc.memprof.stop()
+  subtest:ok(res == nil and err:match("profiler is not running"))
+  subtest:ok(type(errno) == "number")
+end)
 
 -- Test profiler output and parse.
-res, err = pcall(generate_output, TMP_BINFILE)
+test:test("output", function(subtest)
+  subtest:plan(7)
+
+  local res, err = pcall(generate_output, TMP_BINFILE)
 
--- Want to cleanup carefully if something went wrong.
-if not res then
+  -- Want to cleanup carefully if something went wrong.
+  if not res then
+    os.remove(TMP_BINFILE)
+    error(err)
+  end
+
+  local reader = bufread.new(TMP_BINFILE)
+  local symbols = symtab.parse(reader)
+  local events = memprof.parse(reader, symbols)
+
+  -- We don't need it any more.
   os.remove(TMP_BINFILE)
-  error(err)
-end
 
-local reader = bufread.new(TMP_BINFILE)
-local symbols = symtab.parse(reader)
-local events = memprof.parse(reader, symbols)
-
--- We don't need it any more.
-os.remove(TMP_BINFILE)
-
-local alloc = fill_ev_type(events, symbols, "alloc")
-local free = fill_ev_type(events, symbols, "free")
-
--- Check allocation reports. The second argument is a line number
--- of the allocation event itself. The third is a line number of
--- the corresponding function definition. The last one is
--- the number of allocations.
--- 1 event - alocation of table by itself + 1 allocation
--- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
-test:ok(check_alloc_report(alloc, 27, 25, 2))
--- 100 strings allocations.
-test:ok(check_alloc_report(alloc, 32, 25, 100))
-
--- Collect all previous allocated objects.
-test:ok(free.INTERNAL.num == 102)
-
--- Tests for leak-only option.
--- See also https://github.com/tarantool/tarantool/issues/5812.
-local heap_delta = process.form_heap_delta(events, symbols)
-local tab_alloc_stats = heap_delta[form_source_line(27)]
-local str_alloc_stats = heap_delta[form_source_line(32)]
-test:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
-test:ok(tab_alloc_stats.dbytes == 0)
-test:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
-test:ok(str_alloc_stats.dbytes == 0)
+  local alloc = fill_ev_type(events, symbols, "alloc")
+  local free = fill_ev_type(events, symbols, "free")
+
+  -- Check allocation reports. The second argument is a line number
+  -- of the allocation event itself. The third is a line number of
+  -- the corresponding function definition. The last one is
+  -- the number of allocations.
+  -- 1 event - alocation of table by itself + 1 allocation
+  -- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
+  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
+  -- 100 strings allocations.
+  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
+
+  -- Collect all previous allocated objects.
+  subtest:ok(free.INTERNAL.num == 102)
+
+  -- Tests for leak-only option.
+  -- See also https://github.com/tarantool/tarantool/issues/5812.
+  local heap_delta = process.form_heap_delta(events, symbols)
+  local tab_alloc_stats = heap_delta[form_source_line(27)]
+  local str_alloc_stats = heap_delta[form_source_line(32)]
+  subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
+  subtest:ok(tab_alloc_stats.dbytes == 0)
+  subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
+  subtest:ok(str_alloc_stats.dbytes == 0)
+end)
 
 -- Test for https://github.com/tarantool/tarantool/issues/5842.
--- We are not interested in this report.
-misc.memprof.start("/dev/null")
--- We need to cause stack resize for local variables at function
--- call. Let's create a new coroutine (all slots are free).
--- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
--- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local variables
--- is enough.
-local payload_str = ""
-for i = 1, 50 do
-  payload_str = payload_str..("local v%d = %d\n"):format(i, i)
-end
-local f, errmsg = loadstring(payload_str)
-assert(f, errmsg)
-local co = coroutine.create(f)
-coroutine.resume(co)
-misc.memprof.stop()
+test:test("stack-resize", function(subtest)
+  subtest:plan(0)
+
+  -- We are not interested in this report.
+  misc.memprof.start("/dev/null")
+  -- We need to cause stack resize for local variables at function
+  -- call. Let's create a new coroutine (all slots are free).
+  -- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
+  -- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local variables
+  -- is enough.
+  local payload_str = ""
+  for i = 1, 50 do
+    payload_str = payload_str..("local v%d = %d\n"):format(i, i)
+  end
+  local f, errmsg = loadstring(payload_str)
+  assert(f, errmsg)
+  local co = coroutine.create(f)
+  coroutine.resume(co)
+  misc.memprof.stop()
+end)
 
 jit.on()
 os.exit(test:check() and 0 or 1)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-20  7:05 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

When LuaJIT executes a trace, the trace number is stored in
the virtual machine state. So, we can treat this number as
an allocation event source in memprof and report allocation events
from traces as well.

Previously, all the allocations from traces were marked as INTERNAL.

This patch introduces the functionality described above by adding
a new allocation source type named ASOURCE_TRACE. If at the moment
when allocation event occurs VM state indicates that trace executed,
trace number streamed to a binary file:

| loc-trace  := trace-addr trace-no
| trace-addr := <ULEB128>
| trace-no   := <ULEB128>

Also, the memory profiler parser is adjusted to recognize this
source type by extending <loc> structure: field <traceno>,
representing trace number, is added.

Part of tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/Makefile.dep.original |  2 +-
 src/lj_memprof.c          | 35 +++++++++++++++++++++++++++++++++--
 src/lj_memprof.h          | 15 ++++++++++-----
 tools/memprof/parse.lua   | 22 ++++++++++++++--------
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
index f3672413..ee6bafb2 100644
--- a/src/Makefile.dep.original
+++ b/src/Makefile.dep.original
@@ -146,7 +146,7 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
  lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
 lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
- lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h
+ lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h
 lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
  lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
  lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..fb99829d 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -19,6 +19,10 @@
 #include "lj_frame.h"
 #include "lj_debug.h"
 
+#if LJ_HASJIT
+#include "lj_dispatch.h"
+#endif
+
 /* --------------------------------- Symtab --------------------------------- */
 
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
@@ -146,6 +150,31 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
     lua_assert(0);
 }
 
+#if LJ_HASJIT
+
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+  struct lj_wbuf *out = &mp->out;
+  const global_State *g = mp->g;
+  const jit_State *J = G2J(g);
+  const TraceNo traceno = g->vmstate;
+  const GCtrace *trace = traceref(J, traceno);
+  lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
+  lj_wbuf_addu64(out, (uintptr_t)trace->mcode);
+  lj_wbuf_addu64(out, (uint64_t)traceno);
+}
+
+#else
+
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+  UNUSED(mp);
+  UNUSED(aevent);
+  lua_assert(0);
+}
+
+#endif
+
 static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
 {
   lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
@@ -168,9 +197,11 @@ static const memprof_writer memprof_writers[] = {
   ** But since traces must follow the semantics of the original code,
   ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
   ** which makes using memprof with enabled JIT virtually redundant.
-  ** Hence use the stub below.
+  ** But if one wants to investigate allocations with JIT enabled,
+  ** memprof_write_trace() dumps trace number to the binary output.
+  ** It can be useful to compare with with jit.v or jit.dump outputs.
   */
-  memprof_write_hvmstate /* LJ_VMST_TRACE */
+  memprof_write_trace /* LJ_VMST_TRACE */
 };
 
 static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..6a35385d 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -51,9 +51,10 @@
 */
 
 #define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_TRACE ((uint8_t)1)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
-#define LJM_CURRENT_FORMAT_VERSION 0x01
+#define LJM_CURRENT_FORMAT_VERSION 0x02
 
 /*
 ** Event stream format:
@@ -69,11 +70,14 @@
 ** event-realloc  := event-header loc? oaddr osize naddr nsize
 ** event-free     := event-header loc? oaddr osize
 ** event-header   := <BYTE>
-** loc            := loc-lua | loc-c
+** loc            := loc-lua | loc-c | loc-trace
 ** loc-lua        := sym-addr line-no
 ** loc-c          := sym-addr
+** loc-trace      := trace-addr trace-no
 ** sym-addr       := <ULEB128>
 ** line-no        := <ULEB128>
+** trace-addr     := <ULEB128>
+** trace-no       := <ULEB128>
 ** oaddr          := <ULEB128>
 ** naddr          := <ULEB128>
 ** osize          := <ULEB128>
@@ -88,10 +92,10 @@
 ** version: [VVVVVVVV]
 **  * VVVVVVVV: Byte interpreted as a plain integer version number
 **
-** event-header: [FUUUSSEE]
+** event-header: [FUUSSSEE]
 **  * EE   : 2 bits for representing allocation event type (AEVENT_*)
-**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
-**  * UUU  : 3 unused bits
+**  * SSS  : 3 bits for representing allocation source type (ASOURCE_*)
+**  * UU   : 2 unused bits
 **  * F    : 0 for regular events, 1 for epilogue's *F*inal header
 **           (if F is set to 1, all other bits are currently ignored)
 */
@@ -105,6 +109,7 @@
 #define ASOURCE_INT   ((uint8_t)(1 << 2))
 #define ASOURCE_LFUNC ((uint8_t)(2 << 2))
 #define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+#define ASOURCE_TRACE ((uint8_t)(4 << 2))
 
 #define LJM_EPILOGUE_HEADER 0x80
 
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 12e2758f..adc7c072 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -11,7 +11,7 @@ local lshift = bit.lshift
 local string_format = string.format
 
 local LJM_MAGIC = "ljm"
-local LJM_CURRENT_VERSION = 1
+local LJM_CURRENT_VERSION = 0x02
 
 local LJM_EPILOGUE_HEADER = 0x80
 
@@ -24,8 +24,11 @@ local AEVENT_MASK = 0x3
 local ASOURCE_INT = lshift(1, 2)
 local ASOURCE_LFUNC = lshift(2, 2)
 local ASOURCE_CFUNC = lshift(3, 2)
+local ASOURCE_TRACE = lshift(4, 2)
 
-local ASOURCE_MASK = lshift(0x3, 2)
+local ASOURCE_MASK = lshift(0x7, 2)
+
+local EV_HEADER_MAX = ASOURCE_TRACE + AEVENT_REALLOC
 
 local M = {}
 
@@ -59,20 +62,23 @@ local function link_to_previous(heap_chunk, e, nsize)
   end
 end
 
-local function id_location(addr, line)
-  return string_format("f%#xl%d", addr, line), {
+local function id_location(addr, line, traceno)
+  return string_format("f%#xl%dt%d", addr, line, traceno), {
     addr = addr,
     line = line,
+    traceno = traceno,
   }
 end
 
 local function parse_location(reader, asource)
   if asource == ASOURCE_INT then
-    return id_location(0, 0)
+    return id_location(0, 0, 0)
   elseif asource == ASOURCE_CFUNC then
-    return id_location(reader:read_uleb128(), 0)
+    return id_location(reader:read_uleb128(), 0, 0)
   elseif asource == ASOURCE_LFUNC then
-    return id_location(reader:read_uleb128(), reader:read_uleb128())
+    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
+  elseif asource == ASOURCE_TRACE then
+    return id_location(reader:read_uleb128(), 0, reader:read_uleb128())
   end
   error("Unknown asource "..asource)
 end
@@ -140,7 +146,7 @@ local parsers = {
 }
 
 local function ev_header_is_valid(evh)
-  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+  return evh <= EV_HEADER_MAX or evh == LJM_EPILOGUE_HEADER
 end
 
 -- Splits event header into event type (aka aevent = allocation
-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-20  7:05 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
  5 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This patch extends the memprof symtab by adding information
about traces, which are present at the start of profiling.

The symtab format changed with adding <sym-trace> entry,
which consists of a trace address, trace number, address
of function proto, and line, where trace recording started:

| sym-trace  := sym-header trace-addr trace-no sym-addr sym-line
| trace-addr := <ULEB128>
| trace-no   := <ULEB128>

Also, the memprof parser is adjusted to recognize new
symtab entries and associate them with allocation events
from traces.

With adding traces to the symbol table the API of <utils/symtab.lua>
changed: now table with symbols contains two tables: `lfunc` for
Lua functions symbols and `trace` for trace entries.

As trace can be associated with a function proto, <utils/humanize.lua>
module extended with a function `describe_location` intended
to be used instead of symtab.demangle. The reason is that the
location name can be more than demangled `loc`: for example,
when getting trace location name, we want to assotiate the
trace with a function proto, where trace recording started.
On the one hand, we want to get a verbose trace description
with demangled proto location:

| TRACE [<trace-no>] <trace-addr> started at @<proto-name>:<proto-line>

On the other hand, idiomatically, this should be done not by
the demangler module but by the humanizer module.

Resolves tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/lj_memprof.c                              |  40 ++++++
 src/lj_memprof.h                              |   7 +-
 .../misclib-memprof-lapi.test.lua             | 135 +++++++++++++-----
 tools/memprof/humanize.lua                    |  21 ++-
 tools/memprof/process.lua                     |   6 +-
 tools/utils/symtab.lua                        |  45 ++++--
 6 files changed, 206 insertions(+), 48 deletions(-)

diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index fb99829d..fc5bc301 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -28,6 +28,42 @@
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
 					   0x0, 0x0, 0x0};
 
+#if LJ_HASJIT
+
+static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
+{
+  const GCproto *pt = &gcref(trace->startpt)->pt;
+  BCLine lineno = -1;
+
+  const BCIns *startpc = mref(trace->startpc, const BCIns);
+  lua_assert(startpc >= proto_bc(pt) &&
+             startpc < proto_bc(pt) + pt->sizebc);
+
+  lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
+  lua_assert(lineno >= 0);
+
+  lj_wbuf_addbyte(out, SYMTAB_TRACE);
+  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
+  lj_wbuf_addu64(out, (uint64_t)trace->traceno);
+  /*
+  ** All the existing prototypes have already been dumped, so we do not
+  ** need to repeat their dump for trace locations.
+  */
+  lj_wbuf_addu64(out, (uintptr_t)pt);
+  lj_wbuf_addu64(out, (uint64_t)lineno);
+}
+
+#else
+
+static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
+{
+  UNUSED(out);
+  UNUSED(trace);
+  lua_assert(0);
+}
+
+#endif
+
 static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 {
   const GCRef *iter = &g->gc.root;
@@ -47,6 +83,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
       lj_wbuf_addu64(out, (uint64_t)pt->firstline);
       break;
     }
+    case (~LJ_TTRACE): {
+      dump_symtab_trace(out, gco2trace(o));
+      break;
+    }
     default:
       break;
     }
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 6a35385d..e72dadf7 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
 #include "lj_def.h"
 #include "lj_wbuf.h"
 
-#define LJS_CURRENT_VERSION 0x1
+#define LJS_CURRENT_VERSION 0x2
 
 /*
 ** symtab format:
@@ -25,13 +25,16 @@
 ** prologue       := 'l' 'j' 's' version reserved
 ** version        := <BYTE>
 ** reserved       := <BYTE> <BYTE> <BYTE>
-** sym            := sym-lua | sym-final
+** sym            := sym-lua | sym-trace | sym-final
 ** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-trace      := sym-header trace-addr trace-no sym-addr sym-line
 ** sym-header     := <BYTE>
 ** sym-addr       := <ULEB128>
 ** sym-chunk      := string
 ** sym-line       := <ULEB128>
 ** sym-final      := sym-header
+** trace-addr     := <ULEB128>
+** trace-no       := <ULEB128>
 ** string         := string-len string-payload
 ** string-len     := <ULEB128>
 ** string-payload := <BYTE> {string-len}
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index dbf384ed..f84b6df0 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -7,7 +7,14 @@ require("utils").skipcond(
 local tap = require("tap")
 
 local test = tap.test("misc-memprof-lapi")
-test:plan(3)
+test:plan(4)
+
+local jit_opt_default = {
+    3, -- level
+    "hotloop=56",
+    "hotexit=10",
+    "minstitch=0",
+}
 
 jit.off()
 jit.flush()
@@ -22,12 +29,12 @@ local symtab = require "utils.symtab"
 local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
 local BAD_PATH = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%1/memprofdata.tmp.bin")
 
-local function payload()
+local function default_payload()
   -- Preallocate table to avoid table array part reallocations.
-  local _ = table_new(100, 0)
+  local _ = table_new(20, 0)
 
-  -- Want too see 100 objects here.
-  for i = 1, 100 do
+  -- Want too see 20 objects here.
+  for i = 1, 20 do
     -- Try to avoid crossing with "test" module objects.
     _[i] = "memprof-str-"..i
   end
@@ -37,7 +44,7 @@ local function payload()
   collectgarbage()
 end
 
-local function generate_output(filename)
+local function generate_output(filename, payload)
   -- Clean up all garbage to avoid pollution of free.
   collectgarbage()
 
@@ -52,19 +59,49 @@ local function generate_output(filename)
   assert(res, err)
 end
 
+local function generate_parsed_output(payload)
+  local res, err = pcall(generate_output, TMP_BINFILE, payload)
+
+  -- Want to cleanup carefully if something went wrong.
+  if not res then
+    os.remove(TMP_BINFILE)
+    error(err)
+  end
+
+  local reader = bufread.new(TMP_BINFILE)
+  local symbols = symtab.parse(reader)
+  local events = memprof.parse(reader)
+
+  -- We don't need it any more.
+  os.remove(TMP_BINFILE)
+
+  return symbols, events
+end
+
 local function fill_ev_type(events, symbols, event_type)
   local ev_type = {}
   for _, event in pairs(events[event_type]) do
     local addr = event.loc.addr
-    if addr == 0 then
+    local traceno = event.loc.traceno
+
+    if traceno ~= 0 and symbols.trace[traceno] then
+      local trace_loc = symbols.trace[traceno].sym_loc
+      addr = trace_loc.addr
+      ev_type[trace_loc.line] = {
+        name = string.format("TRACE [%d] %s:%d",
+          traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
+        ),
+        num = event.num,
+      }
+    elseif addr == 0 then
       ev_type.INTERNAL = {
         name = "INTERNAL",
         num = event.num,
     }
-    elseif symbols[addr] then
+    elseif symbols.lfunc[addr] then
       ev_type[event.loc.line] = {
         name = string.format(
-          "%s:%d", symbols[addr].source, symbols[addr].linedefined
+          "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
         ),
         num = event.num,
       }
@@ -77,10 +114,22 @@ local function form_source_line(line)
   return string.format("@%s:%d", arg[0], line)
 end
 
-local function check_alloc_report(alloc, line, function_line, nevents)
-  assert(form_source_line(function_line) == alloc[line].name)
-  assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
-    alloc[line].num,
+local function check_alloc_report(alloc, traceno, line, function_line, nevents)
+  assert(alloc[line], ("no event on line %d"):format(line))
+  local event = alloc[line]
+  local expected_name
+  if traceno ~= 0 then
+    expected_name = string.format("TRACE [%d] ", traceno)..
+                    form_source_line(function_line)
+  else
+    expected_name = form_source_line(function_line)
+  end
+  assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
+    event.name,
+    expected_name
+  ))
+  assert(event.num == nevents, ("got=%d, expected=%d"):format(
+    event.num,
     nevents
   ))
   return true
@@ -115,20 +164,7 @@ end)
 test:test("output", function(subtest)
   subtest:plan(7)
 
-  local res, err = pcall(generate_output, TMP_BINFILE)
-
-  -- Want to cleanup carefully if something went wrong.
-  if not res then
-    os.remove(TMP_BINFILE)
-    error(err)
-  end
-
-  local reader = bufread.new(TMP_BINFILE)
-  local symbols = symtab.parse(reader)
-  local events = memprof.parse(reader, symbols)
-
-  -- We don't need it any more.
-  os.remove(TMP_BINFILE)
+  local symbols, events = generate_parsed_output(default_payload)
 
   local alloc = fill_ev_type(events, symbols, "alloc")
   local free = fill_ev_type(events, symbols, "free")
@@ -139,18 +175,18 @@ test:test("output", function(subtest)
   -- the number of allocations.
   -- 1 event - alocation of table by itself + 1 allocation
   -- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
-  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
-  -- 100 strings allocations.
-  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
+  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
+  -- 20 strings allocations.
+  subtest:ok(check_alloc_report(alloc, 0, 39, 32, 20))
 
   -- Collect all previous allocated objects.
-  subtest:ok(free.INTERNAL.num == 102)
+  subtest:ok(free.INTERNAL.num == 22)
 
   -- Tests for leak-only option.
   -- See also https://github.com/tarantool/tarantool/issues/5812.
   local heap_delta = process.form_heap_delta(events, symbols)
-  local tab_alloc_stats = heap_delta[form_source_line(27)]
-  local str_alloc_stats = heap_delta[form_source_line(32)]
+  local tab_alloc_stats = heap_delta[form_source_line(34)]
+  local str_alloc_stats = heap_delta[form_source_line(39)]
   subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
   subtest:ok(tab_alloc_stats.dbytes == 0)
   subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
@@ -179,5 +215,38 @@ test:test("stack-resize", function(subtest)
   misc.memprof.stop()
 end)
 
+-- Test profiler with enabled JIT.
 jit.on()
+
+test:test("jit-output", function(subtest)
+  -- Disabled on *BSD due to #4819.
+  if jit.os == 'BSD' then
+    subtest:plan(1)
+    subtest:skip('Disabled due to #4819')
+    return
+  end
+
+  subtest:plan(3)
+
+  jit.opt.start(3, "hotloop=10")
+  jit.flush()
+
+  -- Pregenerate traces to fill symtab entries in the next run.
+  default_payload()
+
+  local symbols, events = generate_parsed_output(default_payload)
+
+  local alloc = fill_ev_type(events, symbols, "alloc")
+  local free = fill_ev_type(events, symbols, "free")
+
+  -- We expect, that loop will be compiled into a trace.
+  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))
+  -- See same checks with jit.off().
+  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
+  subtest:ok(free.INTERNAL.num == 22)
+
+  -- Restore default JIT settings.
+  jit.opt.start(unpack(jit_opt_default))
+end)
+
 os.exit(test:check() and 0 or 1)
diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
index 7771005d..800a465e 100644
--- a/tools/memprof/humanize.lua
+++ b/tools/memprof/humanize.lua
@@ -21,7 +21,7 @@ function M.render(events, symbols)
   for i = 1, #ids do
     local event = events[ids[i]]
     print(string.format("%s: %d events\t+%d bytes\t-%d bytes",
-      symtab.demangle(symbols, event.loc),
+      M.describe_location(symbols, event.loc),
       event.num,
       event.alloc,
       event.free
@@ -29,7 +29,7 @@ function M.render(events, symbols)
 
     local prim_loc = {}
     for _, heap_chunk in pairs(event.primary) do
-      table.insert(prim_loc, symtab.demangle(symbols, heap_chunk.loc))
+      table.insert(prim_loc, M.describe_location(symbols, heap_chunk.loc))
     end
     if #prim_loc ~= 0 then
       table.sort(prim_loc)
@@ -81,4 +81,21 @@ function M.leak_info(dheap)
   print("")
 end
 
+function M.describe_location(symbols, loc)
+  if loc.traceno == 0 then
+    return symtab.demangle(symbols, loc)
+  end
+
+  local trace = symbols.trace[loc.traceno]
+
+  -- If trace, which was remembered in the symtab, has not
+  -- been flushed, assotiate it with a proto, where trace
+  -- recording started.
+  if trace and trace.addr == loc.addr then
+    return symtab.demangle(symbols, loc).." started at "..
+           symtab.demangle(symbols, trace.sym_loc)
+  end
+  return symtab.demangle(symbols, loc)
+end
+
 return M
diff --git a/tools/memprof/process.lua b/tools/memprof/process.lua
index 0bcb965b..f277ed84 100644
--- a/tools/memprof/process.lua
+++ b/tools/memprof/process.lua
@@ -2,7 +2,7 @@
 
 local M = {}
 
-local symtab = require "utils.symtab"
+local humanize = require "memprof.humanize"
 
 function M.form_heap_delta(events, symbols)
   -- Auto resurrects source event lines for counting/reporting.
@@ -17,7 +17,7 @@ function M.form_heap_delta(events, symbols)
 
   for _, event in pairs(events.alloc) do
     if event.loc then
-      local ev_line = symtab.demangle(symbols, event.loc)
+      local ev_line = humanize.describe_location(symbols, event.loc)
 
       if (event.alloc > 0) then
         dheap[ev_line].dbytes = dheap[ev_line].dbytes + event.alloc
@@ -37,7 +37,7 @@ function M.form_heap_delta(events, symbols)
       -- that references the table with memory changed
       -- (may be empty).
       for _, heap_chunk in pairs(event.primary) do
-        local ev_line = symtab.demangle(symbols, heap_chunk.loc)
+        local ev_line = humanize.describe_location(symbols, heap_chunk.loc)
 
         if (heap_chunk.alloced > 0) then
           dheap[ev_line].dbytes = dheap[ev_line].dbytes + heap_chunk.alloced
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..0e742ee1 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -10,11 +10,12 @@ local band = bit.band
 local string_format = string.format
 
 local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 1
+local LJS_CURRENT_VERSION = 0x2
 local LJS_EPILOGUE_HEADER = 0x80
 local LJS_SYMTYPE_MASK = 0x03
 
 local SYMTAB_LFUNC = 0
+local SYMTAB_TRACE = 1
 
 local M = {}
 
@@ -24,18 +25,38 @@ local function parse_sym_lfunc(reader, symtab)
   local sym_chunk = reader:read_string()
   local sym_line = reader:read_uleb128()
 
-  symtab[sym_addr] = {
+  symtab.lfunc[sym_addr] = {
     source = sym_chunk,
     linedefined = sym_line,
   }
 end
 
+local function parse_sym_trace(reader, symtab)
+  local trace_addr = reader:read_uleb128()
+  local traceno = reader:read_uleb128()
+  local sym_addr = reader:read_uleb128()
+  local sym_line = reader:read_uleb128()
+
+  symtab.trace[traceno] = {
+    addr = trace_addr,
+    sym_loc = {
+      addr = sym_addr,
+      line = sym_line,
+      traceno = 0,
+    },
+  }
+end
+
 local parsers = {
   [SYMTAB_LFUNC] = parse_sym_lfunc,
+  [SYMTAB_TRACE] = parse_sym_trace,
 }
 
 function M.parse(reader)
-  local symtab = {}
+  local symtab = {
+    lfunc = {},
+    trace = {},
+  }
   local magic = reader:read_octets(3)
   local version = reader:read_octets(1)
 
@@ -73,18 +94,26 @@ function M.parse(reader)
   return symtab
 end
 
-function M.demangle(symtab, loc)
+local function demangle_lfunc(symtab, loc)
   local addr = loc.addr
 
   if addr == 0 then
     return "INTERNAL"
+  elseif symtab.lfunc[addr] then
+    return string_format("%s:%d", symtab.lfunc[loc.addr].source, loc.line)
   end
+  return string_format("CFUNC %#x", addr)
+end
 
-  if symtab[addr] then
-    return string_format("%s:%d", symtab[addr].source, loc.line)
-  end
+local function demangle_trace(loc)
+  return string_format("TRACE [%d] 0x%x", loc.traceno, loc.addr)
+end
 
-  return string_format("CFUNC %#x", addr)
+function M.demangle(symtab, loc)
+  if loc.traceno ~= 0 then
+    return demangle_trace(loc)
+  end
+  return demangle_lfunc(symtab, loc)
 end
 
 return M
-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-20  7:05 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
  5 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-08-20  7:05 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Since commit [1] luajit module <memprof/process.lua> depends on
<memprof/humanize.lua>, so we need to change order of these modules
in <src/lua/init.c> to avoid unprotected API calls.

[1]: https://github.com/tarantool/luajit/commit/90fe59d286cddcc4b76c7123a68776fe53d9e47e

Follows up tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/lua/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/lua/init.c b/src/lua/init.c
index f9738025d..8ac2f3592 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -127,8 +127,8 @@ extern char strict_lua[],
 	bufread_lua[],
 	symtab_lua[],
 	parse_lua[],
-	process_lua[],
 	humanize_lua[],
+	process_lua[],
 	memprof_lua[]
 ;
 
@@ -181,8 +181,8 @@ static const char *lua_modules[] = {
 	"utils.bufread", bufread_lua,
 	"utils.symtab", symtab_lua,
 	"memprof.parse", parse_lua,
-	"memprof.process", process_lua,
 	"memprof.humanize", humanize_lua,
+	"memprof.process", process_lua,
 	"memprof", memprof_lua,
 	NULL
 };
-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-09-16 15:29 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch, but I propose to drop this one, since our source
tree diverges from the upstream as a result of these changes. I guess
you're misguided by Sergey in his review[1].

[1]: https://lists.tarantool.org/tarantool-patches/YRei70Qf4ZutEQqG@root/

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-09-16 15:29 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! LGTM, with few ignorable nits below.

On 20.08.21, Mikhail Shishatskiy wrote:
> As the number of memprof test cases is expected to grow,
> memprof tests are separated into subtests to encapsulate
> test cases and be able to skip some of the subtests.
> 
> Part of tarantool/tarantool#5814
> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  .../misclib-memprof-lapi.test.lua             | 153 ++++++++++--------
>  1 file changed, 83 insertions(+), 70 deletions(-)
> 
> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index 06d96b3b..dbf384ed 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua

<snipped>

> +-- Test profiler API.
> +test:test("base", function(subtest)

Minor: The popular name for this is not "base", but "smoke". Basic test
should also validate the output, but the smoke one can just check that
everything looks or seems to be OK. Anyway, feel free to ignore.

> +  subtest:plan(6)
>  

<snipped>

> +  -- Check allocation reports. The second argument is a line number

Typo: Comment exceeds 66 symbols (however, I'm not sure this is still
required in our guidelines; if it's not, then feel free to ignore).

> +  -- of the allocation event itself. The third is a line number of
> +  -- the corresponding function definition. The last one is
> +  -- the number of allocations.

<snipped>

> +  -- We need to cause stack resize for local variables at function
> +  -- call. Let's create a new coroutine (all slots are free).
> +  -- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
> +  -- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local variables

Typo: Comment exceeds 66 symbols (however, I'm not sure this is still
required in our guidelines; if it's not, then feel free to ignore).

> +  -- is enough.

<snipped>

> -- 
> 2.32.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
  0 siblings, 1 reply; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-09-16 15:32 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! Please consider my comments below.

On 20.08.21, Mikhail Shishatskiy wrote:
> When LuaJIT executes a trace, the trace number is stored in
> the virtual machine state. So, we can treat this number as
> an allocation event source in memprof and report allocation events
> from traces as well.
> 
> Previously, all the allocations from traces were marked as INTERNAL.
> 
> This patch introduces the functionality described above by adding
> a new allocation source type named ASOURCE_TRACE. If at the moment
> when allocation event occurs VM state indicates that trace executed,
> trace number streamed to a binary file:
> 
> | loc-trace  := trace-addr trace-no
> | trace-addr := <ULEB128>
> | trace-no   := <ULEB128>
> 
> Also, the memory profiler parser is adjusted to recognize this
> source type by extending <loc> structure: field <traceno>,
> representing trace number, is added.

I understand, why you've chosen this order, but I don't like it. IMHO,
the binary format should not rely or depend on the particular parser
implementation a lot. Please, consider more comments below.

> 
> Part of tarantool/tarantool#5814
> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  src/Makefile.dep.original |  2 +-
>  src/lj_memprof.c          | 35 +++++++++++++++++++++++++++++++++--
>  src/lj_memprof.h          | 15 ++++++++++-----
>  tools/memprof/parse.lua   | 22 ++++++++++++++--------
>  4 files changed, 58 insertions(+), 16 deletions(-)
> 
> diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
> index f3672413..ee6bafb2 100644
> --- a/src/Makefile.dep.original
> +++ b/src/Makefile.dep.original
> @@ -146,7 +146,7 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
>   lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
>  lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
> - lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h
> + lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h

It looks some headers are missing (it's better use <make depend> from
Makefile.original to check yourself).

>  lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
>   lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
>   lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 2c1ef3b8..fb99829d 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c

<snipped>

> @@ -168,9 +197,11 @@ static const memprof_writer memprof_writers[] = {
>    ** But since traces must follow the semantics of the original code,
>    ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
>    ** which makes using memprof with enabled JIT virtually redundant.
> -  ** Hence use the stub below.
> +  ** But if one wants to investigate allocations with JIT enabled,
> +  ** memprof_write_trace() dumps trace number to the binary output.

Typo: number and mcode starting address, right?

> +  ** It can be useful to compare with with jit.v or jit.dump outputs.
>    */
> -  memprof_write_hvmstate /* LJ_VMST_TRACE */
> +  memprof_write_trace /* LJ_VMST_TRACE */
>  };
>  
>  static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index 3417475d..6a35385d 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
> @@ -51,9 +51,10 @@
>  */
>  
>  #define SYMTAB_LFUNC ((uint8_t)0)
> +#define SYMTAB_TRACE ((uint8_t)1)

This looks like related to the next patch, doesn't it?

>  #define SYMTAB_FINAL ((uint8_t)0x80)
>  
> -#define LJM_CURRENT_FORMAT_VERSION 0x01
> +#define LJM_CURRENT_FORMAT_VERSION 0x02
>  
>  /*
>  ** Event stream format:

<snipped>

> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> index 12e2758f..adc7c072 100644
> --- a/tools/memprof/parse.lua
> +++ b/tools/memprof/parse.lua

<snipped>

> @@ -24,8 +24,11 @@ local AEVENT_MASK = 0x3
>  local ASOURCE_INT = lshift(1, 2)
>  local ASOURCE_LFUNC = lshift(2, 2)
>  local ASOURCE_CFUNC = lshift(3, 2)
> +local ASOURCE_TRACE = lshift(4, 2)
>  
> -local ASOURCE_MASK = lshift(0x3, 2)
> +local ASOURCE_MASK = lshift(0x7, 2)
> +
> +local EV_HEADER_MAX = ASOURCE_TRACE + AEVENT_REALLOC

Why so complex? I believe lshift(5, 2) is more clear and covers (i.e. is
greater than) all cases of AEVENT_* and ASOURCE_*.

>  
>  local M = {}
>  
> @@ -59,20 +62,23 @@ local function link_to_previous(heap_chunk, e, nsize)
>    end
>  end
>  
> -local function id_location(addr, line)
> -  return string_format("f%#xl%d", addr, line), {
> +local function id_location(addr, line, traceno)
> +  return string_format("f%#xl%dt%d", addr, line, traceno), {
>      addr = addr,
>      line = line,
> +    traceno = traceno,
>    }
>  end
>  
>  local function parse_location(reader, asource)
>    if asource == ASOURCE_INT then
> -    return id_location(0, 0)
> +    return id_location(0, 0, 0)
>    elseif asource == ASOURCE_CFUNC then
> -    return id_location(reader:read_uleb128(), 0)
> +    return id_location(reader:read_uleb128(), 0, 0)
>    elseif asource == ASOURCE_LFUNC then
> -    return id_location(reader:read_uleb128(), reader:read_uleb128())
> +    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
> +  elseif asource == ASOURCE_TRACE then
> +    return id_location(reader:read_uleb128(), 0, reader:read_uleb128())

As a result of your changes this function becomes too "cryptic". It's
better to refactor this function (maybe even in a separate commit), so
we have something like the function below at the final.

| local function id(params)
|   return string_format("f%#xl%ds%d", params.addr, params.line, params.state)
| end
|
| local function parse_location(reader, asource)
|   local location = { addr = 0, line = 0, traceno = 0 }
|   if asource == ASOURCE_INT then
|     -- Do nothing
|   elseif asource == ASOURCE_CFUNC then
|     location.addr = reader:read_uleb128()
|   elseif asource == ASOURCE_LFUNC then
|     location.addr = reader:read_uleb128()
|     location.line = reader:read_uleb128()
|   elseif asource == ASOURCE_TRACE then
|     location.trace = reader:read_uleb128()
|     location.addr = reader:read_uleb128()
|   else
|     error("Unknown asource "..asource)
|   end
|   return id(location), location
| end

You can also make this function public and move it to utils.lua module.

BTW, these entries are "loaded" but not "rendered" in the final output
now, aren't they? In other words, why don't you make everything in a
single patch?

>    end
>    error("Unknown asource "..asource) >  end

<snipped>

> -- 
> 2.32.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
  0 siblings, 1 reply; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-09-16 15:32 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! Why don't you squash this one patch with the
previous one? What is the rationale to split these changes?

On 20.08.21, Mikhail Shishatskiy wrote:
> This patch extends the memprof symtab by adding information
> about traces, which are present at the start of profiling.
> 
> The symtab format changed with adding <sym-trace> entry,
> which consists of a trace address, trace number, address
> of function proto, and line, where trace recording started:

Again, traceno should be the first, IMHO.

> 
> | sym-trace  := sym-header trace-addr trace-no sym-addr sym-line
> | trace-addr := <ULEB128> > | trace-no   := <ULEB128>
> 
> Also, the memprof parser is adjusted to recognize new
> symtab entries and associate them with allocation events
> from traces.
> 
> With adding traces to the symbol table the API of <utils/symtab.lua>
> changed: now table with symbols contains two tables: `lfunc` for
> Lua functions symbols and `trace` for trace entries.
> 
> As trace can be associated with a function proto, <utils/humanize.lua>
> module extended with a function `describe_location` intended
> to be used instead of symtab.demangle. The reason is that the
> location name can be more than demangled `loc`: for example,
> when getting trace location name, we want to assotiate the
> trace with a function proto, where trace recording started.

Ouch, too tricky description. I guess you mean that ev_line can
represent both allocations in VM and on trace, right?

> On the one hand, we want to get a verbose trace description
> with demangled proto location:
> 
> | TRACE [<trace-no>] <trace-addr> started at @<proto-name>:<proto-line>
> 
> On the other hand, idiomatically, this should be done not by
> the demangler module but by the humanizer module.

As we discussed offline, you violate so-called MVVM or MVC patterns (I
might be wrong in terms, so excuse me, if I am): none of the parser
modules can depend on humanize.lua -- this is the top-most part of the
parser, that can produce plain text, JSON, CSV or anything else. See
more comments below.

> 
> Resolves tarantool/tarantool#5814
> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  src/lj_memprof.c                              |  40 ++++++
>  src/lj_memprof.h                              |   7 +-
>  .../misclib-memprof-lapi.test.lua             | 135 +++++++++++++-----
>  tools/memprof/humanize.lua                    |  21 ++-
>  tools/memprof/process.lua                     |   6 +-
>  tools/utils/symtab.lua                        |  45 ++++--
>  6 files changed, 206 insertions(+), 48 deletions(-)
> 
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index fb99829d..fc5bc301 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -28,6 +28,42 @@
>  static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
>  					   0x0, 0x0, 0x0};
>  
> +#if LJ_HASJIT
> +
> +static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
> +{
> +  const GCproto *pt = &gcref(trace->startpt)->pt;
> +  BCLine lineno = -1;

Why do you use -1 here?

> +
> +  const BCIns *startpc = mref(trace->startpc, const BCIns);
> +  lua_assert(startpc >= proto_bc(pt) &&
> +             startpc < proto_bc(pt) + pt->sizebc);
> +
> +  lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
> +  lua_assert(lineno >= 0);
> +
> +  lj_wbuf_addbyte(out, SYMTAB_TRACE);
> +  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
> +  lj_wbuf_addu64(out, (uint64_t)trace->traceno);
> +  /*
> +  ** All the existing prototypes have already been dumped, so we do not
> +  ** need to repeat their dump for trace locations.

Do you mean, that we don't need to dump chunk name here again? It's also
worth to mention, that GCproto is anchored via GCtrace, so it's not
collected until the trace is alive.

> +  */
> +  lj_wbuf_addu64(out, (uintptr_t)pt);
> +  lj_wbuf_addu64(out, (uint64_t)lineno);
> +}

<snipped>

> @@ -47,6 +83,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>        lj_wbuf_addu64(out, (uint64_t)pt->firstline);
>        break;
>      }
> +    case (~LJ_TTRACE): {
> +      dump_symtab_trace(out, gco2trace(o));

Minor: IMHO, it's better to put the function body right here (like it's
done for LJ_TPROTO), but feel free to ignore.

> +      break;
> +    }
>      default:
>        break;
>      }

<snipped>

> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index dbf384ed..f84b6df0 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua

<snipped>

> @@ -52,19 +59,49 @@ local function generate_output(filename)
>    assert(res, err)
>  end
>  
> +local function generate_parsed_output(payload)

Minor: It's better to introduce this in the second patch, IMHO. Feel
free to ignore.

> +  local res, err = pcall(generate_output, TMP_BINFILE, payload)
> +
> +  -- Want to cleanup carefully if something went wrong.
> +  if not res then
> +    os.remove(TMP_BINFILE)
> +    error(err)
> +  end
> +
> +  local reader = bufread.new(TMP_BINFILE)
> +  local symbols = symtab.parse(reader)
> +  local events = memprof.parse(reader)
> +
> +  -- We don't need it any more.
> +  os.remove(TMP_BINFILE)
> +
> +  return symbols, events
> +end
> +
>  local function fill_ev_type(events, symbols, event_type)
>    local ev_type = {}
>    for _, event in pairs(events[event_type]) do
>      local addr = event.loc.addr
> -    if addr == 0 then
> +    local traceno = event.loc.traceno
> +
> +    if traceno ~= 0 and symbols.trace[traceno] then
> +      local trace_loc = symbols.trace[traceno].sym_loc
> +      addr = trace_loc.addr
> +      ev_type[trace_loc.line] = {
> +        name = string.format("TRACE [%d] %s:%d",
> +          traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
> +        ),
> +        num = event.num,
> +      }
> +    elseif addr == 0 then
>        ev_type.INTERNAL = {
>          name = "INTERNAL",
>          num = event.num,
>      }

Hm. Looks like a misindent in the old code.

> -    elseif symbols[addr] then
> +    elseif symbols.lfunc[addr] then
>        ev_type[event.loc.line] = {
>          name = string.format(
> -          "%s:%d", symbols[addr].source, symbols[addr].linedefined
> +          "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
>          ),
>          num = event.num,
>        }

<snipped>

> @@ -179,5 +215,38 @@ test:test("stack-resize", function(subtest)
>    misc.memprof.stop()
>  end)
>  
> +-- Test profiler with enabled JIT.
>  jit.on()
> +
> +test:test("jit-output", function(subtest)
> +  -- Disabled on *BSD due to #4819.
> +  if jit.os == 'BSD' then
> +    subtest:plan(1)

This line is excess, since you use skip below.

> +    subtest:skip('Disabled due to #4819')
> +    return
> +  end
> +
> +  subtest:plan(3)
> +
> +  jit.opt.start(3, "hotloop=10")
> +  jit.flush()
> +
> +  -- Pregenerate traces to fill symtab entries in the next run.
> +  default_payload()
> +
> +  local symbols, events = generate_parsed_output(default_payload)
> +
> +  local alloc = fill_ev_type(events, symbols, "alloc")
> +  local free = fill_ev_type(events, symbols, "free")
> +
> +  -- We expect, that loop will be compiled into a trace.
> +  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))
> +  -- See same checks with jit.off().
> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
> +  subtest:ok(free.INTERNAL.num == 22)
> +
> +  -- Restore default JIT settings.
> +  jit.opt.start(unpack(jit_opt_default))
> +end)
> +
>  os.exit(test:check() and 0 or 1)

<snipped>

> diff --git a/tools/memprof/process.lua b/tools/memprof/process.lua
> index 0bcb965b..f277ed84 100644
> --- a/tools/memprof/process.lua
> +++ b/tools/memprof/process.lua
> @@ -2,7 +2,7 @@
>  
>  local M = {}
>  
> -local symtab = require "utils.symtab"
> +local humanize = require "memprof.humanize"
>  
>  function M.form_heap_delta(events, symbols)
>    -- Auto resurrects source event lines for counting/reporting.
> @@ -17,7 +17,7 @@ function M.form_heap_delta(events, symbols)
>  
>    for _, event in pairs(events.alloc) do
>      if event.loc then
> -      local ev_line = symtab.demangle(symbols, event.loc)
> +      local ev_line = humanize.describe_location(symbols, event.loc)

You can use here <id_location> function from the previous patch, as we
discussed offline. As a result, you can just adjust <symtab.demangle>
and use it in humanize.lua.

>  
>        if (event.alloc > 0) then
>          dheap[ev_line].dbytes = dheap[ev_line].dbytes + event.alloc
> @@ -37,7 +37,7 @@ function M.form_heap_delta(events, symbols)
>        -- that references the table with memory changed
>        -- (may be empty).
>        for _, heap_chunk in pairs(event.primary) do
> -        local ev_line = symtab.demangle(symbols, heap_chunk.loc)
> +        local ev_line = humanize.describe_location(symbols, heap_chunk.loc)
>  
>          if (heap_chunk.alloced > 0) then
>            dheap[ev_line].dbytes = dheap[ev_line].dbytes + heap_chunk.alloced
> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> index 3ed1dd13..0e742ee1 100644
> --- a/tools/utils/symtab.lua
> +++ b/tools/utils/symtab.lua

<snipped>

> @@ -24,18 +25,38 @@ local function parse_sym_lfunc(reader, symtab)
>    local sym_chunk = reader:read_string()
>    local sym_line = reader:read_uleb128()
>  
> -  symtab[sym_addr] = {
> +  symtab.lfunc[sym_addr] = {
>      source = sym_chunk,
>      linedefined = sym_line,
>    }
>  end
>  
> +local function parse_sym_trace(reader, symtab)
> +  local trace_addr = reader:read_uleb128()
> +  local traceno = reader:read_uleb128()
> +  local sym_addr = reader:read_uleb128()
> +  local sym_line = reader:read_uleb128()
> +
> +  symtab.trace[traceno] = {
> +    addr = trace_addr,
> +    sym_loc = {

I propose to rename <sym_loc> to <start>.

> +      addr = sym_addr,
> +      line = sym_line,

It's better to use <linedefined> instead of <line> here.

> +      traceno = 0,

Why do you need this?

> +    },
> +  }
> +end
> +
>  local parsers = {
>    [SYMTAB_LFUNC] = parse_sym_lfunc,
> +  [SYMTAB_TRACE] = parse_sym_trace,
>  }
>  
>  function M.parse(reader)
> -  local symtab = {}
> +  local symtab = {
> +    lfunc = {},
> +    trace = {},
> +  }
>    local magic = reader:read_octets(3)
>    local version = reader:read_octets(1)
>  
> @@ -73,18 +94,26 @@ function M.parse(reader)
>    return symtab
>  end
>  
> -function M.demangle(symtab, loc)
> +local function demangle_lfunc(symtab, loc)
>    local addr = loc.addr
>  
>    if addr == 0 then
>      return "INTERNAL"
> +  elseif symtab.lfunc[addr] then
> +    return string_format("%s:%d", symtab.lfunc[loc.addr].source, loc.line)

Why do you use here <loc.addr> instead of <addr>?

>    end
> +  return string_format("CFUNC %#x", addr)
> +end
>  
> -  if symtab[addr] then
> -    return string_format("%s:%d", symtab[addr].source, loc.line)
> -  end
> +local function demangle_trace(loc)
> +  return string_format("TRACE [%d] 0x%x", loc.traceno, loc.addr)
> +end
>  
> -  return string_format("CFUNC %#x", addr)
> +function M.demangle(symtab, loc)
> +  if loc.traceno ~= 0 then
> +    return demangle_trace(loc)
> +  end
> +  return demangle_lfunc(symtab, loc)
>  end
>  
>  return M
> -- 
> 2.32.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-09-16 15:32 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! Considering the comments for the previous patch,
this one is excess. Hence, drop it please.

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace
  2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
@ 2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 19:21 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!
Thank you for the review!

On 16.09.2021 18:32, Igor Munkin wrote:
>Misha,
>
>Thanks for the patch! Please consider my comments below.
>
>On 20.08.21, Mikhail Shishatskiy wrote:
>> When LuaJIT executes a trace, the trace number is stored in
>> the virtual machine state. So, we can treat this number as
>> an allocation event source in memprof and report allocation events
>> from traces as well.
>>
>> Previously, all the allocations from traces were marked as INTERNAL.
>>
>> This patch introduces the functionality described above by adding
>> a new allocation source type named ASOURCE_TRACE. If at the moment
>> when allocation event occurs VM state indicates that trace executed,
>> trace number streamed to a binary file:
>>
>> | loc-trace  := trace-addr trace-no
>> | trace-addr := <ULEB128>
>> | trace-no   := <ULEB128>
>>
>> Also, the memory profiler parser is adjusted to recognize this
>> source type by extending <loc> structure: field <traceno>,
>> representing trace number, is added.
>
>I understand, why you've chosen this order, but I don't like it. IMHO,
>the binary format should not rely or depend on the particular parser
>implementation a lot. Please, consider more comments below.

Fixed in the upcoming patch series v4.

>
>>
>> Part of tarantool/tarantool#5814
>> ---
>>
>> Issue: https://github.com/tarantool/tarantool/issues/5814
>> Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>> tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>>
>>  src/Makefile.dep.original |  2 +-
>>  src/lj_memprof.c          | 35 +++++++++++++++++++++++++++++++++--
>>  src/lj_memprof.h          | 15 ++++++++++-----
>>  tools/memprof/parse.lua   | 22 ++++++++++++++--------
>>  4 files changed, 58 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
>> index f3672413..ee6bafb2 100644
>> --- a/src/Makefile.dep.original
>> +++ b/src/Makefile.dep.original
>> @@ -146,7 +146,7 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>>   lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
>>   lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
>>  lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
>> - lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h
>> + lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h
>
>It looks some headers are missing (it's better use <make depend> from
>Makefile.original to check yourself).

Fixed in the upcoming patch series v4.

>
>>  lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
>>   lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
>>   lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
>> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
>> index 2c1ef3b8..fb99829d 100644
>> --- a/src/lj_memprof.c
>> +++ b/src/lj_memprof.c
>
><snipped>
>
>> @@ -168,9 +197,11 @@ static const memprof_writer memprof_writers[] = {
>>    ** But since traces must follow the semantics of the original code,
>>    ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
>>    ** which makes using memprof with enabled JIT virtually redundant.
>> -  ** Hence use the stub below.
>> +  ** But if one wants to investigate allocations with JIT enabled,
>> +  ** memprof_write_trace() dumps trace number to the binary output.
>
>Typo: number and mcode starting address, right?

Fixed in the upcoming patch series v4.

>
>> +  ** It can be useful to compare with with jit.v or jit.dump outputs.
>>    */
>> -  memprof_write_hvmstate /* LJ_VMST_TRACE */
>> +  memprof_write_trace /* LJ_VMST_TRACE */
>>  };
>>
>>  static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
>> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
>> index 3417475d..6a35385d 100644
>> --- a/src/lj_memprof.h
>> +++ b/src/lj_memprof.h
>> @@ -51,9 +51,10 @@
>>  */
>>
>>  #define SYMTAB_LFUNC ((uint8_t)0)
>> +#define SYMTAB_TRACE ((uint8_t)1)
>
>This looks like related to the next patch, doesn't it?

Fixed in the upcoming patch series v4.

>
>>  #define SYMTAB_FINAL ((uint8_t)0x80)
>>
>> -#define LJM_CURRENT_FORMAT_VERSION 0x01
>> +#define LJM_CURRENT_FORMAT_VERSION 0x02
>>
>>  /*
>>  ** Event stream format:
>
><snipped>
>
>> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
>> index 12e2758f..adc7c072 100644
>> --- a/tools/memprof/parse.lua
>> +++ b/tools/memprof/parse.lua
>
><snipped>
>
>> @@ -24,8 +24,11 @@ local AEVENT_MASK = 0x3
>>  local ASOURCE_INT = lshift(1, 2)
>>  local ASOURCE_LFUNC = lshift(2, 2)
>>  local ASOURCE_CFUNC = lshift(3, 2)
>> +local ASOURCE_TRACE = lshift(4, 2)
>>
>> -local ASOURCE_MASK = lshift(0x3, 2)
>> +local ASOURCE_MASK = lshift(0x7, 2)
>> +
>> +local EV_HEADER_MAX = ASOURCE_TRACE + AEVENT_REALLOC
>
>Why so complex? I believe lshift(5, 2) is more clear and covers (i.e. is
>greater than) all cases of AEVENT_* and ASOURCE_*.

As for me, lshift(5, 2) is less descriptive. ASOURCE_TRACE +
AEVENT_REALLOC shows the layout of flags in the header [FUUSSSEE]
                                                            ^^^^^
and gives an idea, why EV_HEADER_MAX is EV_HEADER_MAX :)

>
>>
>>  local M = {}
>>
>> @@ -59,20 +62,23 @@ local function link_to_previous(heap_chunk, e, nsize)
>>    end
>>  end
>>
>> -local function id_location(addr, line)
>> -  return string_format("f%#xl%d", addr, line), {
>> +local function id_location(addr, line, traceno)
>> +  return string_format("f%#xl%dt%d", addr, line, traceno), {
>>      addr = addr,
>>      line = line,
>> +    traceno = traceno,
>>    }
>>  end
>>
>>  local function parse_location(reader, asource)
>>    if asource == ASOURCE_INT then
>> -    return id_location(0, 0)
>> +    return id_location(0, 0, 0)
>>    elseif asource == ASOURCE_CFUNC then
>> -    return id_location(reader:read_uleb128(), 0)
>> +    return id_location(reader:read_uleb128(), 0, 0)
>>    elseif asource == ASOURCE_LFUNC then
>> -    return id_location(reader:read_uleb128(), reader:read_uleb128())
>> +    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
>> +  elseif asource == ASOURCE_TRACE then
>> +    return id_location(reader:read_uleb128(), 0, reader:read_uleb128())
>
>As a result of your changes this function becomes too "cryptic". It's
>better to refactor this function (maybe even in a separate commit), so
>we have something like the function below at the final.

Refactored in the upcoming patch series v4.

>
>| local function id(params)
>|   return string_format("f%#xl%ds%d", params.addr, params.line, params.state)
>| end
>|
>| local function parse_location(reader, asource)
>|   local location = { addr = 0, line = 0, traceno = 0 }
>|   if asource == ASOURCE_INT then
>|     -- Do nothing
>|   elseif asource == ASOURCE_CFUNC then
>|     location.addr = reader:read_uleb128()
>|   elseif asource == ASOURCE_LFUNC then
>|     location.addr = reader:read_uleb128()
>|     location.line = reader:read_uleb128()
>|   elseif asource == ASOURCE_TRACE then
>|     location.trace = reader:read_uleb128()
>|     location.addr = reader:read_uleb128()
>|   else
>|     error("Unknown asource "..asource)
>|   end
>|   return id(location), location
>| end
>
>You can also make this function public and move it to utils.lua module.
>
>BTW, these entries are "loaded" but not "rendered" in the final output
>now, aren't they? In other words, why don't you make everything in a
>single patch?

My bad, I split changes in quite a strange way. New patch series makes
it more "natural": simple rendering is moved to this patch. More complex
"started at ..." is added in another patch.

>
>>    end
>>    error("Unknown asource "..asource) >  end
>
><snipped>
>
>> --
>> 2.32.0
>>
>
>-- 
>Best regards,
>IM

Best regards,
Mikhail Shishatskiy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces
  2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
@ 2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 19:21 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!
Thank you for the review!

On 16.09.2021 18:32, Igor Munkin wrote:
>Misha,
>
>Thanks for the patch! Why don't you squash this one patch with the
>previous one? What is the rationale to split these changes?

The upcoming patch series v4 fixes this. Now patches are split not by
loading / rendering but by loading only basic info, simple rendering /
extending symtab, descriptive rendering.

>
>On 20.08.21, Mikhail Shishatskiy wrote:
>> This patch extends the memprof symtab by adding information
>> about traces, which are present at the start of profiling.
>>
>> The symtab format changed with adding <sym-trace> entry,
>> which consists of a trace address, trace number, address
>> of function proto, and line, where trace recording started:
>
>Again, traceno should be the first, IMHO.
>
>>
>> | sym-trace  := sym-header trace-addr trace-no sym-addr sym-line
>> | trace-addr := <ULEB128> > | trace-no   := <ULEB128>
>>
>> Also, the memprof parser is adjusted to recognize new
>> symtab entries and associate them with allocation events
>> from traces.
>>
>> With adding traces to the symbol table the API of <utils/symtab.lua>
>> changed: now table with symbols contains two tables: `lfunc` for
>> Lua functions symbols and `trace` for trace entries.
>>
>> As trace can be associated with a function proto, <utils/humanize.lua>
>> module extended with a function `describe_location` intended
>> to be used instead of symtab.demangle. The reason is that the
>> location name can be more than demangled `loc`: for example,
>> when getting trace location name, we want to assotiate the
>> trace with a function proto, where trace recording started.
>
>Ouch, too tricky description. I guess you mean that ev_line can
>represent both allocations in VM and on trace, right?

Fixed in the upcoming patch series v4.

>
>> On the one hand, we want to get a verbose trace description
>> with demangled proto location:
>>
>> | TRACE [<trace-no>] <trace-addr> started at @<proto-name>:<proto-line>
>>
>> On the other hand, idiomatically, this should be done not by
>> the demangler module but by the humanizer module.
>
>As we discussed offline, you violate so-called MVVM or MVC patterns (I
>might be wrong in terms, so excuse me, if I am): none of the parser
>modules can depend on humanize.lua -- this is the top-most part of the
>parser, that can produce plain text, JSON, CSV or anything else. See
>more comments below.

Fixed in the upcoming patch series v4.

>
>>
>> Resolves tarantool/tarantool#5814
>> ---
>>
>> Issue: https://github.com/tarantool/tarantool/issues/5814
>> Luajit branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>> tarantool branch: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>>
>>  src/lj_memprof.c                              |  40 ++++++
>>  src/lj_memprof.h                              |   7 +-
>>  .../misclib-memprof-lapi.test.lua             | 135 +++++++++++++-----
>>  tools/memprof/humanize.lua                    |  21 ++-
>>  tools/memprof/process.lua                     |   6 +-
>>  tools/utils/symtab.lua                        |  45 ++++--
>>  6 files changed, 206 insertions(+), 48 deletions(-)
>>
>> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
>> index fb99829d..fc5bc301 100644
>> --- a/src/lj_memprof.c
>> +++ b/src/lj_memprof.c
>> @@ -28,6 +28,42 @@
>>  static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
>>  					   0x0, 0x0, 0x0};
>>
>> +#if LJ_HASJIT
>> +
>> +static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
>> +{
>> +  const GCproto *pt = &gcref(trace->startpt)->pt;
>> +  BCLine lineno = -1;
>
>Why do you use -1 here?

Changed to 0 in order not to confuse. I believe that variable
initialization is important.

>
>> +
>> +  const BCIns *startpc = mref(trace->startpc, const BCIns);
>> +  lua_assert(startpc >= proto_bc(pt) &&
>> +             startpc < proto_bc(pt) + pt->sizebc);
>> +
>> +  lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
>> +  lua_assert(lineno >= 0);
>> +
>> +  lj_wbuf_addbyte(out, SYMTAB_TRACE);
>> +  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
>> +  lj_wbuf_addu64(out, (uint64_t)trace->traceno);
>> +  /*
>> +  ** All the existing prototypes have already been dumped, so we do not
>> +  ** need to repeat their dump for trace locations.
>
>Do you mean, that we don't need to dump chunk name here again? It's also
>worth to mention, that GCproto is anchored via GCtrace, so it's not
>collected until the trace is alive.

Yes, fixed the comment.

>
>> +  */
>> +  lj_wbuf_addu64(out, (uintptr_t)pt);
>> +  lj_wbuf_addu64(out, (uint64_t)lineno);
>> +}
>
><snipped>
>
>> @@ -47,6 +83,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>>        lj_wbuf_addu64(out, (uint64_t)pt->firstline);
>>        break;
>>      }
>> +    case (~LJ_TTRACE): {
>> +      dump_symtab_trace(out, gco2trace(o));
>
>Minor: IMHO, it's better to put the function body right here (like it's
>done for LJ_TPROTO), but feel free to ignore.

But we need to leave the stub, when the JIT is disabled. Personally I
do not like the defines right inside the function code.

>
>> +      break;
>> +    }
>>      default:
>>        break;
>>      }
>
><snipped>
>
>> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> index dbf384ed..f84b6df0 100644
>> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
>> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
>
><snipped>
>
>> @@ -52,19 +59,49 @@ local function generate_output(filename)
>>    assert(res, err)
>>  end
>>
>> +local function generate_parsed_output(payload)
>
>Minor: It's better to introduce this in the second patch, IMHO. Feel
>free to ignore.

Moved to the patch with test refactored.

>
>> +  local res, err = pcall(generate_output, TMP_BINFILE, payload)
>> +
>> +  -- Want to cleanup carefully if something went wrong.
>> +  if not res then
>> +    os.remove(TMP_BINFILE)
>> +    error(err)
>> +  end
>> +
>> +  local reader = bufread.new(TMP_BINFILE)
>> +  local symbols = symtab.parse(reader)
>> +  local events = memprof.parse(reader)
>> +
>> +  -- We don't need it any more.
>> +  os.remove(TMP_BINFILE)
>> +
>> +  return symbols, events
>> +end
>> +
>>  local function fill_ev_type(events, symbols, event_type)
>>    local ev_type = {}
>>    for _, event in pairs(events[event_type]) do
>>      local addr = event.loc.addr
>> -    if addr == 0 then
>> +    local traceno = event.loc.traceno
>> +
>> +    if traceno ~= 0 and symbols.trace[traceno] then
>> +      local trace_loc = symbols.trace[traceno].sym_loc
>> +      addr = trace_loc.addr
>> +      ev_type[trace_loc.line] = {
>> +        name = string.format("TRACE [%d] %s:%d",
>> +          traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
>> +        ),
>> +        num = event.num,
>> +      }
>> +    elseif addr == 0 then
>>        ev_type.INTERNAL = {
>>          name = "INTERNAL",
>>          num = event.num,
>>      }
>
>Hm. Looks like a misindent in the old code.
>
>> -    elseif symbols[addr] then
>> +    elseif symbols.lfunc[addr] then
>>        ev_type[event.loc.line] = {
>>          name = string.format(
>> -          "%s:%d", symbols[addr].source, symbols[addr].linedefined
>> +          "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
>>          ),
>>          num = event.num,
>>        }
>
><snipped>
>
>> @@ -179,5 +215,38 @@ test:test("stack-resize", function(subtest)
>>    misc.memprof.stop()
>>  end)
>>
>> +-- Test profiler with enabled JIT.
>>  jit.on()
>> +
>> +test:test("jit-output", function(subtest)
>> +  -- Disabled on *BSD due to #4819.
>> +  if jit.os == 'BSD' then
>> +    subtest:plan(1)
>
>This line is excess, since you use skip below.

The utils module uses this semantics, and if I omit the plan call,
I will fail at the end of the test, as there is "no plan".

>
>> +    subtest:skip('Disabled due to #4819')
>> +    return
>> +  end
>> +
>> +  subtest:plan(3)
>> +
>> +  jit.opt.start(3, "hotloop=10")
>> +  jit.flush()
>> +
>> +  -- Pregenerate traces to fill symtab entries in the next run.
>> +  default_payload()
>> +
>> +  local symbols, events = generate_parsed_output(default_payload)
>> +
>> +  local alloc = fill_ev_type(events, symbols, "alloc")
>> +  local free = fill_ev_type(events, symbols, "free")
>> +
>> +  -- We expect, that loop will be compiled into a trace.
>> +  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))
>> +  -- See same checks with jit.off().
>> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
>> +  subtest:ok(free.INTERNAL.num == 22)
>> +
>> +  -- Restore default JIT settings.
>> +  jit.opt.start(unpack(jit_opt_default))
>> +end)
>> +
>>  os.exit(test:check() and 0 or 1)
>
><snipped>
>
>> diff --git a/tools/memprof/process.lua b/tools/memprof/process.lua
>> index 0bcb965b..f277ed84 100644
>> --- a/tools/memprof/process.lua
>> +++ b/tools/memprof/process.lua
>> @@ -2,7 +2,7 @@
>>
>>  local M = {}
>>
>> -local symtab = require "utils.symtab"
>> +local humanize = require "memprof.humanize"
>>
>>  function M.form_heap_delta(events, symbols)
>>    -- Auto resurrects source event lines for counting/reporting.
>> @@ -17,7 +17,7 @@ function M.form_heap_delta(events, symbols)
>>
>>    for _, event in pairs(events.alloc) do
>>      if event.loc then
>> -      local ev_line = symtab.demangle(symbols, event.loc)
>> +      local ev_line = humanize.describe_location(symbols, event.loc)
>
<snipped>
>
>>
>>        if (event.alloc > 0) then
>>          dheap[ev_line].dbytes = dheap[ev_line].dbytes + event.alloc
>> @@ -37,7 +37,7 @@ function M.form_heap_delta(events, symbols)
>>        -- that references the table with memory changed
>>        -- (may be empty).
>>        for _, heap_chunk in pairs(event.primary) do
>> -        local ev_line = symtab.demangle(symbols, heap_chunk.loc)
>> +        local ev_line = humanize.describe_location(symbols, heap_chunk.loc)
>>
>>          if (heap_chunk.alloced > 0) then
>>            dheap[ev_line].dbytes = dheap[ev_line].dbytes + heap_chunk.alloced
>> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
>> index 3ed1dd13..0e742ee1 100644
>> --- a/tools/utils/symtab.lua
>> +++ b/tools/utils/symtab.lua
>
><snipped>
>
>> @@ -24,18 +25,38 @@ local function parse_sym_lfunc(reader, symtab)
>>    local sym_chunk = reader:read_string()
>>    local sym_line = reader:read_uleb128()
>>
>> -  symtab[sym_addr] = {
>> +  symtab.lfunc[sym_addr] = {
>>      source = sym_chunk,
>>      linedefined = sym_line,
>>    }
>>  end
>>
>> +local function parse_sym_trace(reader, symtab)
>> +  local trace_addr = reader:read_uleb128()
>> +  local traceno = reader:read_uleb128()
>> +  local sym_addr = reader:read_uleb128()
>> +  local sym_line = reader:read_uleb128()
>> +
>> +  symtab.trace[traceno] = {
>> +    addr = trace_addr,
>> +    sym_loc = {
>
>I propose to rename <sym_loc> to <start>.

Fixed in the upcoming patch series v4.

>
>> +      addr = sym_addr,
>> +      line = sym_line,
>
>It's better to use <linedefined> instead of <line> here.

Here I want to follow the semantics of the <loc> structure
in order to pass it to the symtab.demangle.

>
>> +      traceno = 0,
>
>Why do you need this?

See explanation above (about <loc>).

>
>> +    },
>> +  }
>> +end
>> +
>>  local parsers = {
>>    [SYMTAB_LFUNC] = parse_sym_lfunc,
>> +  [SYMTAB_TRACE] = parse_sym_trace,
>>  }
>>
>>  function M.parse(reader)
>> -  local symtab = {}
>> +  local symtab = {
>> +    lfunc = {},
>> +    trace = {},
>> +  }
>>    local magic = reader:read_octets(3)
>>    local version = reader:read_octets(1)
>>
>> @@ -73,18 +94,26 @@ function M.parse(reader)
>>    return symtab
>>  end
>>
>> -function M.demangle(symtab, loc)
>> +local function demangle_lfunc(symtab, loc)
>>    local addr = loc.addr
>>
>>    if addr == 0 then
>>      return "INTERNAL"
>> +  elseif symtab.lfunc[addr] then
>> +    return string_format("%s:%d", symtab.lfunc[loc.addr].source, loc.line)
>
>Why do you use here <loc.addr> instead of <addr>?

My bad, fixed.

>
>>    end
>> +  return string_format("CFUNC %#x", addr)
>> +end
>>
>> -  if symtab[addr] then
>> -    return string_format("%s:%d", symtab[addr].source, loc.line)
>> -  end
>> +local function demangle_trace(loc)
>> +  return string_format("TRACE [%d] 0x%x", loc.traceno, loc.addr)
>> +end
>>
>> -  return string_format("CFUNC %#x", addr)
>> +function M.demangle(symtab, loc)
>> +  if loc.traceno ~= 0 then
>> +    return demangle_trace(loc)
>> +  end
>> +  return demangle_lfunc(symtab, loc)
>>  end
>>
>>  return M
>> --
>> 2.32.0
>>
>
>-- 
>Best regards,
>IM

Best regards,
Mikhail Shishatskiy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno
  2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
                   ` (4 preceding siblings ...)
  2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-29 20:07 ` Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
                     ` (3 more replies)
  5 siblings, 4 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 20:07 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

New patch series v4 with fixes as per review by Igor Munkin:

Changes in v4:
  * refactored the modules in order not to violate MVC: now
    process module uses the symtab.id(loc) as a key (but not the
    fully demangled name)
  * reorganized patches: now the first important patch has the
    location dumping and simple parsing without the trace start
    info; the second patch adds the information about the trace's
    start and extends the symtab.

@ChangeLog
======================================================================
##feature/luajit

* Now memory profiler records allocations from traces. The memory
  profiler parser can display new type of allocation sources in
  the following format:
  ```
  | TRACE [<trace-no>] <trace-addr> started at @<sym-chunk>:<sym-line>
  ```
======================================================================

Mikhail Shishatskiy (4):
  test: separate memprof Lua API tests into subtests
  memprof: refactor location parsing
  memprof: group allocations on traces by traceno
  memprof: add info about trace start to symtab

 src/Makefile.dep.original                     |   3 +-
 src/lj_memprof.c                              |  79 +++++-
 src/lj_memprof.h                              |  22 +-
 .../misclib-memprof-lapi.test.lua             | 261 ++++++++++++------
 tools/memprof.lua                             |   4 +-
 tools/memprof/humanize.lua                    |  30 +-
 tools/memprof/parse.lua                       |  39 +--
 tools/memprof/process.lua                     |   9 +-
 tools/utils/symtab.lua                        |  49 +++-
 9 files changed, 364 insertions(+), 132 deletions(-)

diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
index ee6bafb2..faa44a0b 100644
--- a/src/Makefile.dep.original
+++ b/src/Makefile.dep.original
@@ -146,7 +146,8 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
  lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
 lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
- lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h
+ lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h \
+ lj_jit.h lj_ir.h
 lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
  lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
  lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
diff --git a/src/lj_debug.c b/src/lj_debug.c
index 58b75dae..bb9ab288 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -110,7 +110,7 @@ static BCPos debug_framepc(lua_State *L, GCfunc *fn, cTValue *nextframe)
 /* -- Line numbers -------------------------------------------------------- */
 
 /* Get line number for a bytecode position. */
-BCLine LJ_FASTCALL lj_debug_line(const GCproto *pt, BCPos pc)
+BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc)
 {
   const void *lineinfo = proto_lineinfo(pt);
   if (pc <= pt->sizebc && lineinfo) {
diff --git a/src/lj_debug.h b/src/lj_debug.h
index 7814c588..a157d284 100644
--- a/src/lj_debug.h
+++ b/src/lj_debug.h
@@ -27,7 +27,7 @@ typedef struct lj_Debug {
 } lj_Debug;
 
 LJ_FUNC cTValue *lj_debug_frame(lua_State *L, int level, int *size);
-LJ_FUNC BCLine LJ_FASTCALL lj_debug_line(const GCproto *pt, BCPos pc);
+LJ_FUNC BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc);
 LJ_FUNC const char *lj_debug_uvname(GCproto *pt, uint32_t idx);
 LJ_FUNC const char *lj_debug_uvnamev(cTValue *o, uint32_t idx, TValue **tvp);
 LJ_FUNC const char *lj_debug_slotname(GCproto *pt, const BCIns *pc,
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index fc5bc301..e8b2ebbc 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -32,8 +32,8 @@ static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
 
 static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
 {
-  const GCproto *pt = &gcref(trace->startpt)->pt;
-  BCLine lineno = -1;
+  GCproto *pt = &gcref(trace->startpt)->pt;
+  BCLine lineno = 0;
 
   const BCIns *startpc = mref(trace->startpc, const BCIns);
   lua_assert(startpc >= proto_bc(pt) &&
@@ -43,11 +43,14 @@ static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
   lua_assert(lineno >= 0);
 
   lj_wbuf_addbyte(out, SYMTAB_TRACE);
-  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
   lj_wbuf_addu64(out, (uint64_t)trace->traceno);
+  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
   /*
-  ** All the existing prototypes have already been dumped, so we do not
-  ** need to repeat their dump for trace locations.
+  ** The information about the prototype, associated with the
+  ** trace's start has already been dumped, as it is anchored
+  ** via the trace and is not collected while the trace is alive.
+  ** For this reason, we do not need to repeat dumping the chunk
+  ** name for the prototype.
   */
   lj_wbuf_addu64(out, (uintptr_t)pt);
   lj_wbuf_addu64(out, (uint64_t)lineno);
@@ -200,8 +203,8 @@ static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
   const TraceNo traceno = g->vmstate;
   const GCtrace *trace = traceref(J, traceno);
   lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
-  lj_wbuf_addu64(out, (uintptr_t)trace->mcode);
   lj_wbuf_addu64(out, (uint64_t)traceno);
+  lj_wbuf_addu64(out, (uintptr_t)trace->mcode);
 }
 
 #else
@@ -238,8 +241,9 @@ static const memprof_writer memprof_writers[] = {
   ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
   ** which makes using memprof with enabled JIT virtually redundant.
   ** But if one wants to investigate allocations with JIT enabled,
-  ** memprof_write_trace() dumps trace number to the binary output.
-  ** It can be useful to compare with with jit.v or jit.dump outputs.
+  ** memprof_write_trace() dumps trace number and mcode starting address
+  ** to the binary output. It can be useful to compare with with jit.v or
+  ** jit.dump outputs.
   */
   memprof_write_trace /* LJ_VMST_TRACE */
 };
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index e72dadf7..395fb429 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -27,14 +27,14 @@
 ** reserved       := <BYTE> <BYTE> <BYTE>
 ** sym            := sym-lua | sym-trace | sym-final
 ** sym-lua        := sym-header sym-addr sym-chunk sym-line
-** sym-trace      := sym-header trace-addr trace-no sym-addr sym-line
+** sym-trace      := sym-header trace-no trace-addr sym-addr sym-line
 ** sym-header     := <BYTE>
 ** sym-addr       := <ULEB128>
 ** sym-chunk      := string
 ** sym-line       := <ULEB128>
 ** sym-final      := sym-header
-** trace-addr     := <ULEB128>
 ** trace-no       := <ULEB128>
+** trace-addr     := <ULEB128>
 ** string         := string-len string-payload
 ** string-len     := <ULEB128>
 ** string-payload := <BYTE> {string-len}
@@ -76,11 +76,11 @@
 ** loc            := loc-lua | loc-c | loc-trace
 ** loc-lua        := sym-addr line-no
 ** loc-c          := sym-addr
-** loc-trace      := trace-addr trace-no
+** loc-trace      := trace-no trace-addr
 ** sym-addr       := <ULEB128>
 ** line-no        := <ULEB128>
-** trace-addr     := <ULEB128>
 ** trace-no       := <ULEB128>
+** trace-addr     := <ULEB128>
 ** oaddr          := <ULEB128>
 ** naddr          := <ULEB128>
 ** osize          := <ULEB128>
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index f84b6df0..b9edb80d 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -79,15 +79,18 @@ local function generate_parsed_output(payload)
 end
 
 local function fill_ev_type(events, symbols, event_type)
-  local ev_type = {}
+  local ev_type = {
+    line = {},
+    trace = {},
+  }
   for _, event in pairs(events[event_type]) do
     local addr = event.loc.addr
     local traceno = event.loc.traceno
 
     if traceno ~= 0 and symbols.trace[traceno] then
-      local trace_loc = symbols.trace[traceno].sym_loc
+      local trace_loc = symbols.trace[traceno].start
       addr = trace_loc.addr
-      ev_type[trace_loc.line] = {
+      ev_type.trace[traceno] = {
         name = string.format("TRACE [%d] %s:%d",
           traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
         ),
@@ -97,9 +100,9 @@ local function fill_ev_type(events, symbols, event_type)
       ev_type.INTERNAL = {
         name = "INTERNAL",
         num = event.num,
-    }
+      }
     elseif symbols.lfunc[addr] then
-      ev_type[event.loc.line] = {
+      ev_type.line[event.loc.line] = {
         name = string.format(
           "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
         ),
@@ -115,14 +118,14 @@ local function form_source_line(line)
 end
 
 local function check_alloc_report(alloc, traceno, line, function_line, nevents)
-  assert(alloc[line], ("no event on line %d"):format(line))
-  local event = alloc[line]
-  local expected_name
+  local expected_name, event
   if traceno ~= 0 then
     expected_name = string.format("TRACE [%d] ", traceno)..
                     form_source_line(function_line)
+    event = alloc.trace[traceno]
   else
     expected_name = form_source_line(function_line)
+    event = alloc.line[line]
   end
   assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
     event.name,
@@ -136,7 +139,7 @@ local function check_alloc_report(alloc, traceno, line, function_line, nevents)
 end
 
 -- Test profiler API.
-test:test("base", function(subtest)
+test:test("smoke", function(subtest)
   subtest:plan(6)
 
   -- Not a directory.
@@ -169,12 +172,12 @@ test:test("output", function(subtest)
   local alloc = fill_ev_type(events, symbols, "alloc")
   local free = fill_ev_type(events, symbols, "free")
 
-  -- Check allocation reports. The second argument is a line number
-  -- of the allocation event itself. The third is a line number of
-  -- the corresponding function definition. The last one is
-  -- the number of allocations.
-  -- 1 event - alocation of table by itself + 1 allocation
-  -- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
+  -- Check allocation reports. The second argument is a line
+  -- number of the allocation event itself. The third is a line
+  -- number of the corresponding function definition. The last
+  -- one is the number of allocations. 1 event - alocation of
+  -- table by itself + 1 allocation of array part as far it is
+  -- bigger than LJ_MAX_COLOSIZE (16).
   subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
   -- 20 strings allocations.
   subtest:ok(check_alloc_report(alloc, 0, 39, 32, 20))
@@ -202,8 +205,8 @@ test:test("stack-resize", function(subtest)
   -- We need to cause stack resize for local variables at function
   -- call. Let's create a new coroutine (all slots are free).
   -- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
-  -- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local variables
-  -- is enough.
+  -- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local
+  -- variables is enough.
   local payload_str = ""
   for i = 1, 50 do
     payload_str = payload_str..("local v%d = %d\n"):format(i, i)
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 18b44fdd..760122fc 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -104,8 +104,8 @@ local function dump(inputfile)
   if not leak_only then
     view.profile_info(events, symbols)
   end
-  local dheap = process.form_heap_delta(events, symbols)
-  view.leak_info(dheap)
+  local dheap = process.form_heap_delta(events)
+  view.leak_info(dheap, symbols)
   os.exit(0)
 end
 
diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
index 800a465e..7d30f976 100644
--- a/tools/memprof/humanize.lua
+++ b/tools/memprof/humanize.lua
@@ -7,6 +7,23 @@ local symtab = require "utils.symtab"
 
 local M = {}
 
+function M.describe_location(symbols, loc)
+  if loc.traceno == 0 then
+    return symtab.demangle(symbols, loc)
+  end
+
+  local trace = symbols.trace[loc.traceno]
+
+  -- If trace, which was remembered in the symtab, has not
+  -- been flushed, assotiate it with a proto, where trace
+  -- recording started.
+  if trace and trace.addr == loc.addr then
+    return symtab.demangle(symbols, loc).." started at "..
+           symtab.demangle(symbols, trace.start)
+  end
+  return symtab.demangle(symbols, loc)
+end
+
 function M.render(events, symbols)
   local ids = {}
 
@@ -56,13 +73,16 @@ function M.profile_info(events, symbols)
   print("")
 end
 
-function M.leak_info(dheap)
+function M.leak_info(dheap, symbols)
   local leaks = {}
-  for line, info in pairs(dheap) do
+  for _, info in pairs(dheap) do
     -- Report "INTERNAL" events inconsistencies for profiling
     -- with enabled jit.
     if info.dbytes > 0 then
-      table.insert(leaks, {line = line, dbytes = info.dbytes})
+      table.insert(leaks, {
+        line = M.describe_location(symbols, info.loc),
+        dbytes = info.dbytes
+      })
     end
   end
 
@@ -81,21 +101,4 @@ function M.leak_info(dheap)
   print("")
 end
 
-function M.describe_location(symbols, loc)
-  if loc.traceno == 0 then
-    return symtab.demangle(symbols, loc)
-  end
-
-  local trace = symbols.trace[loc.traceno]
-
-  -- If trace, which was remembered in the symtab, has not
-  -- been flushed, assotiate it with a proto, where trace
-  -- recording started.
-  if trace and trace.addr == loc.addr then
-    return symtab.demangle(symbols, loc).." started at "..
-           symtab.demangle(symbols, trace.sym_loc)
-  end
-  return symtab.demangle(symbols, loc)
-end
-
 return M
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index adc7c072..968fd90e 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -8,6 +8,8 @@ local bit = require "bit"
 local band = bit.band
 local lshift = bit.lshift
 
+local symtab = require "utils.symtab"
+
 local string_format = string.format
 
 local LJM_MAGIC = "ljm"
@@ -62,25 +64,24 @@ local function link_to_previous(heap_chunk, e, nsize)
   end
 end
 
-local function id_location(addr, line, traceno)
-  return string_format("f%#xl%dt%d", addr, line, traceno), {
-    addr = addr,
-    line = line,
-    traceno = traceno,
-  }
-end
-
 local function parse_location(reader, asource)
-  if asource == ASOURCE_INT then
-    return id_location(0, 0, 0)
-  elseif asource == ASOURCE_CFUNC then
-    return id_location(reader:read_uleb128(), 0, 0)
+  local loc = {
+    addr = 0,
+    line = 0,
+    traceno = 0,
+  }
+  if asource == ASOURCE_CFUNC then
+    loc.addr = reader:read_uleb128()
   elseif asource == ASOURCE_LFUNC then
-    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
+    loc.addr = reader:read_uleb128()
+    loc.line = reader:read_uleb128()
   elseif asource == ASOURCE_TRACE then
-    return id_location(reader:read_uleb128(), 0, reader:read_uleb128())
+    loc.traceno = reader:read_uleb128()
+    loc.addr = reader:read_uleb128()
+  elseif asource ~= ASOURCE_INT then
+    error("Unknown asource "..asource)
   end
-  error("Unknown asource "..asource)
+  return symtab.id(loc), loc
 end
 
 local function parse_alloc(reader, asource, events, heap)
diff --git a/tools/memprof/process.lua b/tools/memprof/process.lua
index f277ed84..360f6cc4 100644
--- a/tools/memprof/process.lua
+++ b/tools/memprof/process.lua
@@ -2,9 +2,9 @@
 
 local M = {}
 
-local humanize = require "memprof.humanize"
+local symtab = require "utils.symtab"
 
-function M.form_heap_delta(events, symbols)
+function M.form_heap_delta(events)
   -- Auto resurrects source event lines for counting/reporting.
   local dheap = setmetatable({}, {__index = function(t, line)
     rawset(t, line, {
@@ -17,11 +17,12 @@ function M.form_heap_delta(events, symbols)
 
   for _, event in pairs(events.alloc) do
     if event.loc then
-      local ev_line = humanize.describe_location(symbols, event.loc)
+      local ev_line = symtab.id(event.loc)
 
       if (event.alloc > 0) then
         dheap[ev_line].dbytes = dheap[ev_line].dbytes + event.alloc
         dheap[ev_line].nalloc = dheap[ev_line].nalloc + event.num
+        dheap[ev_line].loc = event.loc
       end
     end
   end
@@ -37,16 +38,18 @@ function M.form_heap_delta(events, symbols)
       -- that references the table with memory changed
       -- (may be empty).
       for _, heap_chunk in pairs(event.primary) do
-        local ev_line = humanize.describe_location(symbols, heap_chunk.loc)
+        local ev_line = symtab.id(heap_chunk.loc)
 
         if (heap_chunk.alloced > 0) then
           dheap[ev_line].dbytes = dheap[ev_line].dbytes + heap_chunk.alloced
           dheap[ev_line].nalloc = dheap[ev_line].nalloc + heap_chunk.count
+          dheap[ev_line].loc = heap_chunk.loc
         end
 
         if (heap_chunk.freed > 0) then
           dheap[ev_line].dbytes = dheap[ev_line].dbytes - heap_chunk.freed
           dheap[ev_line].nfree = dheap[ev_line].nfree + heap_chunk.count
+          dheap[ev_line].loc = heap_chunk.loc
         end
       end
     end
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 0e742ee1..496d8480 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -32,14 +32,14 @@ local function parse_sym_lfunc(reader, symtab)
 end
 
 local function parse_sym_trace(reader, symtab)
-  local trace_addr = reader:read_uleb128()
   local traceno = reader:read_uleb128()
+  local trace_addr = reader:read_uleb128()
   local sym_addr = reader:read_uleb128()
   local sym_line = reader:read_uleb128()
 
   symtab.trace[traceno] = {
     addr = trace_addr,
-    sym_loc = {
+    start = {
       addr = sym_addr,
       line = sym_line,
       traceno = 0,
@@ -94,19 +94,23 @@ function M.parse(reader)
   return symtab
 end
 
+function M.id(loc)
+  return string_format("f%#xl%dt%d", loc.addr, loc.line, loc.traceno)
+end
+
 local function demangle_lfunc(symtab, loc)
   local addr = loc.addr
 
   if addr == 0 then
     return "INTERNAL"
   elseif symtab.lfunc[addr] then
-    return string_format("%s:%d", symtab.lfunc[loc.addr].source, loc.line)
+    return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
   end
   return string_format("CFUNC %#x", addr)
 end
 
 local function demangle_trace(loc)
-  return string_format("TRACE [%d] 0x%x", loc.traceno, loc.addr)
+  return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
 end
 
 function M.demangle(symtab, loc)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-29 20:07   ` Mikhail Shishatskiy via Tarantool-patches
  2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  2021-10-27 15:07     ` Sergey Kaplun via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing Mikhail Shishatskiy via Tarantool-patches
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 20:07 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

As the number of memprof test cases is expected to grow,
memprof tests are separated into subtests to encapsulate
test cases and be able to skip some of the subtests.

Also, output generation and parsing separated into a
dedicated function `generate_parsed_output`, which allows
one to run memprof on different payloads, passing payload
funtion as an argument.

Part of tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 .../misclib-memprof-lapi.test.lua             | 169 ++++++++++--------
 1 file changed, 94 insertions(+), 75 deletions(-)

diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 06d96b3b..9de4bd98 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -7,7 +7,7 @@ require("utils").skipcond(
 local tap = require("tap")
 
 local test = tap.test("misc-memprof-lapi")
-test:plan(13)
+test:plan(3)
 
 jit.off()
 jit.flush()
@@ -22,7 +22,7 @@ local symtab = require "utils.symtab"
 local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
 local BAD_PATH = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%1/memprofdata.tmp.bin")
 
-local function payload()
+local function default_payload()
   -- Preallocate table to avoid table array part reallocations.
   local _ = table_new(100, 0)
 
@@ -37,7 +37,7 @@ local function payload()
   collectgarbage()
 end
 
-local function generate_output(filename)
+local function generate_output(filename, payload)
   -- Clean up all garbage to avoid pollution of free.
   collectgarbage()
 
@@ -52,6 +52,25 @@ local function generate_output(filename)
   assert(res, err)
 end
 
+local function generate_parsed_output(payload)
+  local res, err = pcall(generate_output, TMP_BINFILE, payload)
+
+  -- Want to cleanup carefully if something went wrong.
+  if not res then
+    os.remove(TMP_BINFILE)
+    error(err)
+  end
+
+  local reader = bufread.new(TMP_BINFILE)
+  local symbols = symtab.parse(reader)
+  local events = memprof.parse(reader)
+
+  -- We don't need it any more.
+  os.remove(TMP_BINFILE)
+
+  return symbols, events
+end
+
 local function fill_ev_type(events, symbols, event_type)
   local ev_type = {}
   for _, event in pairs(events[event_type]) do
@@ -86,85 +105,85 @@ local function check_alloc_report(alloc, line, function_line, nevents)
   return true
 end
 
--- Not a directory.
-local res, err, errno = misc.memprof.start(BAD_PATH)
-test:ok(res == nil and err:match("No such file or directory"))
-test:ok(type(errno) == "number")
-
--- Profiler is running.
-res, err = misc.memprof.start(TMP_BINFILE)
-assert(res, err)
-res, err, errno = misc.memprof.start(TMP_BINFILE)
-test:ok(res == nil and err:match("profiler is running already"))
-test:ok(type(errno) == "number")
+-- Test profiler API.
+test:test("smoke", function(subtest)
+  subtest:plan(6)
 
-res, err = misc.memprof.stop()
-assert(res, err)
+  -- Not a directory.
+  local res, err, errno = misc.memprof.start(BAD_PATH)
+  subtest:ok(res == nil and err:match("No such file or directory"))
+  subtest:ok(type(errno) == "number")
 
--- Profiler is not running.
-res, err, errno = misc.memprof.stop()
-test:ok(res == nil and err:match("profiler is not running"))
-test:ok(type(errno) == "number")
+  -- Profiler is running.
+  res, err = misc.memprof.start(TMP_BINFILE)
+  assert(res, err)
+  res, err, errno = misc.memprof.start(TMP_BINFILE)
+  subtest:ok(res == nil and err:match("profiler is running already"))
+  subtest:ok(type(errno) == "number")
 
--- Test profiler output and parse.
-res, err = pcall(generate_output, TMP_BINFILE)
+  res, err = misc.memprof.stop()
+  assert(res, err)
 
--- Want to cleanup carefully if something went wrong.
-if not res then
-  os.remove(TMP_BINFILE)
-  error(err)
-end
+  -- Profiler is not running.
+  res, err, errno = misc.memprof.stop()
+  subtest:ok(res == nil and err:match("profiler is not running"))
+  subtest:ok(type(errno) == "number")
+end)
 
-local reader = bufread.new(TMP_BINFILE)
-local symbols = symtab.parse(reader)
-local events = memprof.parse(reader, symbols)
-
--- We don't need it any more.
-os.remove(TMP_BINFILE)
-
-local alloc = fill_ev_type(events, symbols, "alloc")
-local free = fill_ev_type(events, symbols, "free")
-
--- Check allocation reports. The second argument is a line number
--- of the allocation event itself. The third is a line number of
--- the corresponding function definition. The last one is
--- the number of allocations.
--- 1 event - alocation of table by itself + 1 allocation
--- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
-test:ok(check_alloc_report(alloc, 27, 25, 2))
--- 100 strings allocations.
-test:ok(check_alloc_report(alloc, 32, 25, 100))
-
--- Collect all previous allocated objects.
-test:ok(free.INTERNAL.num == 102)
-
--- Tests for leak-only option.
--- See also https://github.com/tarantool/tarantool/issues/5812.
-local heap_delta = process.form_heap_delta(events, symbols)
-local tab_alloc_stats = heap_delta[form_source_line(27)]
-local str_alloc_stats = heap_delta[form_source_line(32)]
-test:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
-test:ok(tab_alloc_stats.dbytes == 0)
-test:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
-test:ok(str_alloc_stats.dbytes == 0)
+-- Test profiler output and parse.
+test:test("output", function(subtest)
+  subtest:plan(7)
+
+  local symbols, events = generate_parsed_output(default_payload)
+
+  local alloc = fill_ev_type(events, symbols, "alloc")
+  local free = fill_ev_type(events, symbols, "free")
+
+  -- Check allocation reports. The second argument is a line
+  -- number of the allocation event itself. The third is a line
+  -- number of the corresponding function definition. The last
+  -- one is the number of allocations. 1 event - alocation of
+  -- table by itself + 1 allocation of array part as far it is
+  -- bigger than LJ_MAX_COLOSIZE (16).
+  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
+  -- 100 strings allocations.
+  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
+
+  -- Collect all previous allocated objects.
+  subtest:ok(free.INTERNAL.num == 102)
+
+  -- Tests for leak-only option.
+  -- See also https://github.com/tarantool/tarantool/issues/5812.
+  local heap_delta = process.form_heap_delta(events, symbols)
+  local tab_alloc_stats = heap_delta[form_source_line(27)]
+  local str_alloc_stats = heap_delta[form_source_line(32)]
+  subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
+  subtest:ok(tab_alloc_stats.dbytes == 0)
+  subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
+  subtest:ok(str_alloc_stats.dbytes == 0)
+end)
 
 -- Test for https://github.com/tarantool/tarantool/issues/5842.
--- We are not interested in this report.
-misc.memprof.start("/dev/null")
--- We need to cause stack resize for local variables at function
--- call. Let's create a new coroutine (all slots are free).
--- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
--- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local variables
--- is enough.
-local payload_str = ""
-for i = 1, 50 do
-  payload_str = payload_str..("local v%d = %d\n"):format(i, i)
-end
-local f, errmsg = loadstring(payload_str)
-assert(f, errmsg)
-local co = coroutine.create(f)
-coroutine.resume(co)
-misc.memprof.stop()
+test:test("stack-resize", function(subtest)
+  subtest:plan(0)
+
+  -- We are not interested in this report.
+  misc.memprof.start("/dev/null")
+  -- We need to cause stack resize for local variables at function
+  -- call. Let's create a new coroutine (all slots are free).
+  -- It has 1 slot for dummy frame + 39 free slots + 5 extra slots
+  -- (so-called red zone) + 2 * LJ_FR2 slots. So 50 local
+  -- variables is enough.
+  local payload_str = ""
+  for i = 1, 50 do
+    payload_str = payload_str..("local v%d = %d\n"):format(i, i)
+  end
+  local f, errmsg = loadstring(payload_str)
+  assert(f, errmsg)
+  local co = coroutine.create(f)
+  coroutine.resume(co)
+  misc.memprof.stop()
+end)
 
 jit.on()
 os.exit(test:check() and 0 or 1)
-- 
2.33.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-29 20:07   ` Mikhail Shishatskiy via Tarantool-patches
  2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab Mikhail Shishatskiy via Tarantool-patches
  3 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 20:07 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

As it is hard to keep function `id_location` easily usable in the
function `parse_location` with a growing number of fields in the
<loc> table, the code is refactored to make it more understandable.

The API of <utils/symtab.lua> module changed with adding a function
`id(loc)` returning the same id-string, as `id_location` did. This
function is useful to "stringify" the location and use it as a key
in the key/value storage.

Part of tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 tools/memprof/parse.lua | 26 +++++++++++++-------------
 tools/utils/symtab.lua  |  4 ++++
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 12e2758f..34ff8aef 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -8,6 +8,8 @@ local bit = require "bit"
 local band = bit.band
 local lshift = bit.lshift
 
+local symtab = require "utils.symtab"
+
 local string_format = string.format
 
 local LJM_MAGIC = "ljm"
@@ -59,22 +61,20 @@ local function link_to_previous(heap_chunk, e, nsize)
   end
 end
 
-local function id_location(addr, line)
-  return string_format("f%#xl%d", addr, line), {
-    addr = addr,
-    line = line,
-  }
-end
-
 local function parse_location(reader, asource)
-  if asource == ASOURCE_INT then
-    return id_location(0, 0)
-  elseif asource == ASOURCE_CFUNC then
-    return id_location(reader:read_uleb128(), 0)
+  local loc = {
+    addr = 0,
+    line = 0,
+  }
+  if asource == ASOURCE_CFUNC then
+    loc.addr = reader:read_uleb128()
   elseif asource == ASOURCE_LFUNC then
-    return id_location(reader:read_uleb128(), reader:read_uleb128())
+    loc.addr = reader:read_uleb128()
+    loc.line = reader:read_uleb128()
+  elseif asource ~= ASOURCE_INT then
+    error("Unknown asource "..asource)
   end
-  error("Unknown asource "..asource)
+  return symtab.id(loc), loc
 end
 
 local function parse_alloc(reader, asource, events, heap)
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..e01daa62 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -73,6 +73,10 @@ function M.parse(reader)
   return symtab
 end
 
+function M.id(loc)
+  return string_format("f%#xl%d", loc.addr, loc.line)
+end
+
 function M.demangle(symtab, loc)
   local addr = loc.addr
 
-- 
2.33.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-29 20:07   ` Mikhail Shishatskiy via Tarantool-patches
  2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab Mikhail Shishatskiy via Tarantool-patches
  3 siblings, 1 reply; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 20:07 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

When LuaJIT executes a trace, the trace number is stored in
the virtual machine state. So, we can treat this number as
an allocation event source in memprof and report allocation events
from traces as well.

Previously, all the allocations from traces were marked as INTERNAL.

This patch introduces the functionality described above by adding
a new allocation source type named ASOURCE_TRACE. If at the moment
when allocation event occurs VM state indicates that trace executed,
trace number and trace's mcode starting address streamed to a binary
file:

| loc-trace  := trace-no trace-addr
| trace-no   := <ULEB128>
| trace-addr := <ULEB128>

Also, the memory profiler parser is adjusted to recognize entries
mentioned above. The <loc> structure is extended with field <traceno>,
representing trace number. Trace locations are demangled as

| TRACE [<trace-no>] <trace-addr>

Resolves tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/Makefile.dep.original                     |  3 +-
 src/lj_memprof.c                              | 36 ++++++-
 src/lj_memprof.h                              | 14 ++-
 .../misclib-memprof-lapi.test.lua             | 97 +++++++++++++++----
 tools/memprof/parse.lua                       | 13 ++-
 tools/utils/symtab.lua                        | 20 ++--
 6 files changed, 148 insertions(+), 35 deletions(-)

diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
index f3672413..faa44a0b 100644
--- a/src/Makefile.dep.original
+++ b/src/Makefile.dep.original
@@ -146,7 +146,8 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
  lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
 lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
- lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h
+ lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h lj_dispatch.h \
+ lj_jit.h lj_ir.h
 lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
  lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
  lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..8702557f 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -19,6 +19,10 @@
 #include "lj_frame.h"
 #include "lj_debug.h"
 
+#if LJ_HASJIT
+#include "lj_dispatch.h"
+#endif
+
 /* --------------------------------- Symtab --------------------------------- */
 
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
@@ -146,6 +150,31 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
     lua_assert(0);
 }
 
+#if LJ_HASJIT
+
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+  struct lj_wbuf *out = &mp->out;
+  const global_State *g = mp->g;
+  const jit_State *J = G2J(g);
+  const TraceNo traceno = g->vmstate;
+  const GCtrace *trace = traceref(J, traceno);
+  lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
+  lj_wbuf_addu64(out, (uint64_t)traceno);
+  lj_wbuf_addu64(out, (uintptr_t)trace->mcode);
+}
+
+#else
+
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+  UNUSED(mp);
+  UNUSED(aevent);
+  lua_assert(0);
+}
+
+#endif
+
 static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
 {
   lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
@@ -168,9 +197,12 @@ static const memprof_writer memprof_writers[] = {
   ** But since traces must follow the semantics of the original code,
   ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
   ** which makes using memprof with enabled JIT virtually redundant.
-  ** Hence use the stub below.
+  ** But if one wants to investigate allocations with JIT enabled,
+  ** memprof_write_trace() dumps trace number and mcode starting address
+  ** to the binary output. It can be useful to compare with with jit.v or
+  ** jit.dump outputs.
   */
-  memprof_write_hvmstate /* LJ_VMST_TRACE */
+  memprof_write_trace /* LJ_VMST_TRACE */
 };
 
 static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..47474a51 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -53,7 +53,7 @@
 #define SYMTAB_LFUNC ((uint8_t)0)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
-#define LJM_CURRENT_FORMAT_VERSION 0x01
+#define LJM_CURRENT_FORMAT_VERSION 0x02
 
 /*
 ** Event stream format:
@@ -69,11 +69,14 @@
 ** event-realloc  := event-header loc? oaddr osize naddr nsize
 ** event-free     := event-header loc? oaddr osize
 ** event-header   := <BYTE>
-** loc            := loc-lua | loc-c
+** loc            := loc-lua | loc-c | loc-trace
 ** loc-lua        := sym-addr line-no
 ** loc-c          := sym-addr
+** loc-trace      := trace-no trace-addr
 ** sym-addr       := <ULEB128>
 ** line-no        := <ULEB128>
+** trace-no       := <ULEB128>
+** trace-addr     := <ULEB128>
 ** oaddr          := <ULEB128>
 ** naddr          := <ULEB128>
 ** osize          := <ULEB128>
@@ -88,10 +91,10 @@
 ** version: [VVVVVVVV]
 **  * VVVVVVVV: Byte interpreted as a plain integer version number
 **
-** event-header: [FUUUSSEE]
+** event-header: [FUUSSSEE]
 **  * EE   : 2 bits for representing allocation event type (AEVENT_*)
-**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
-**  * UUU  : 3 unused bits
+**  * SSS  : 3 bits for representing allocation source type (ASOURCE_*)
+**  * UU   : 2 unused bits
 **  * F    : 0 for regular events, 1 for epilogue's *F*inal header
 **           (if F is set to 1, all other bits are currently ignored)
 */
@@ -105,6 +108,7 @@
 #define ASOURCE_INT   ((uint8_t)(1 << 2))
 #define ASOURCE_LFUNC ((uint8_t)(2 << 2))
 #define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+#define ASOURCE_TRACE ((uint8_t)(4 << 2))
 
 #define LJM_EPILOGUE_HEADER 0x80
 
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 9de4bd98..3f4ffea0 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -7,7 +7,14 @@ require("utils").skipcond(
 local tap = require("tap")
 
 local test = tap.test("misc-memprof-lapi")
-test:plan(3)
+test:plan(4)
+
+local jit_opt_default = {
+    3, -- level
+    "hotloop=56",
+    "hotexit=10",
+    "minstitch=0",
+}
 
 jit.off()
 jit.flush()
@@ -24,10 +31,10 @@ local BAD_PATH = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%1/memprofdata.tmp.bin")
 
 local function default_payload()
   -- Preallocate table to avoid table array part reallocations.
-  local _ = table_new(100, 0)
+  local _ = table_new(20, 0)
 
-  -- Want too see 100 objects here.
-  for i = 1, 100 do
+  -- Want too see 20 objects here.
+  for i = 1, 20 do
     -- Try to avoid crossing with "test" module objects.
     _[i] = "memprof-str-"..i
   end
@@ -72,16 +79,26 @@ local function generate_parsed_output(payload)
 end
 
 local function fill_ev_type(events, symbols, event_type)
-  local ev_type = {}
+  local ev_type = {
+    line = {},
+    trace = {},
+  }
   for _, event in pairs(events[event_type]) do
     local addr = event.loc.addr
-    if addr == 0 then
+    local traceno = event.loc.traceno
+
+    if traceno ~= 0 then
+      ev_type.trace[traceno] = {
+        name = string.format("TRACE [%d]", traceno),
+        num = event.num,
+      }
+    elseif addr == 0 then
       ev_type.INTERNAL = {
         name = "INTERNAL",
         num = event.num,
-    }
+      }
     elseif symbols[addr] then
-      ev_type[event.loc.line] = {
+      ev_type.line[event.loc.line] = {
         name = string.format(
           "%s:%d", symbols[addr].source, symbols[addr].linedefined
         ),
@@ -96,10 +113,21 @@ local function form_source_line(line)
   return string.format("@%s:%d", arg[0], line)
 end
 
-local function check_alloc_report(alloc, line, function_line, nevents)
-  assert(form_source_line(function_line) == alloc[line].name)
-  assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
-    alloc[line].num,
+local function check_alloc_report(alloc, traceno, line, function_line, nevents)
+  local expected_name, event
+  if traceno ~= 0 then
+    expected_name = string.format("TRACE [%d]", traceno)
+    event = alloc.trace[traceno]
+  else
+    expected_name = form_source_line(function_line)
+    event = alloc.line[line]
+  end
+  assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
+    event.name,
+    expected_name
+  ))
+  assert(event.num == nevents, ("got=%d, expected=%d"):format(
+    event.num,
     nevents
   ))
   return true
@@ -145,18 +173,18 @@ test:test("output", function(subtest)
   -- one is the number of allocations. 1 event - alocation of
   -- table by itself + 1 allocation of array part as far it is
   -- bigger than LJ_MAX_COLOSIZE (16).
-  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
-  -- 100 strings allocations.
-  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
+  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
+  -- 20 strings allocations.
+  subtest:ok(check_alloc_report(alloc, 0, 39, 32, 20))
 
   -- Collect all previous allocated objects.
-  subtest:ok(free.INTERNAL.num == 102)
+  subtest:ok(free.INTERNAL.num == 22)
 
   -- Tests for leak-only option.
   -- See also https://github.com/tarantool/tarantool/issues/5812.
   local heap_delta = process.form_heap_delta(events, symbols)
-  local tab_alloc_stats = heap_delta[form_source_line(27)]
-  local str_alloc_stats = heap_delta[form_source_line(32)]
+  local tab_alloc_stats = heap_delta[form_source_line(34)]
+  local str_alloc_stats = heap_delta[form_source_line(39)]
   subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
   subtest:ok(tab_alloc_stats.dbytes == 0)
   subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
@@ -185,5 +213,38 @@ test:test("stack-resize", function(subtest)
   misc.memprof.stop()
 end)
 
+-- Test profiler with enabled JIT.
 jit.on()
+
+test:test("jit-output", function(subtest)
+  -- Disabled on *BSD due to #4819.
+  if jit.os == 'BSD' then
+    subtest:plan(1)
+    subtest:skip('Disabled due to #4819')
+    return
+  end
+
+  subtest:plan(3)
+
+  jit.opt.start(3, "hotloop=10")
+  jit.flush()
+
+  -- Pregenerate traces to fill symtab entries in the next run.
+  default_payload()
+
+  local symbols, events = generate_parsed_output(default_payload)
+
+  local alloc = fill_ev_type(events, symbols, "alloc")
+  local free = fill_ev_type(events, symbols, "free")
+
+  -- We expect, that loop will be compiled into a trace.
+  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))
+  -- See same checks with jit.off().
+  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
+  subtest:ok(free.INTERNAL.num == 22)
+
+  -- Restore default JIT settings.
+  jit.opt.start(unpack(jit_opt_default))
+end)
+
 os.exit(test:check() and 0 or 1)
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 34ff8aef..968fd90e 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -13,7 +13,7 @@ local symtab = require "utils.symtab"
 local string_format = string.format
 
 local LJM_MAGIC = "ljm"
-local LJM_CURRENT_VERSION = 1
+local LJM_CURRENT_VERSION = 0x02
 
 local LJM_EPILOGUE_HEADER = 0x80
 
@@ -26,8 +26,11 @@ local AEVENT_MASK = 0x3
 local ASOURCE_INT = lshift(1, 2)
 local ASOURCE_LFUNC = lshift(2, 2)
 local ASOURCE_CFUNC = lshift(3, 2)
+local ASOURCE_TRACE = lshift(4, 2)
 
-local ASOURCE_MASK = lshift(0x3, 2)
+local ASOURCE_MASK = lshift(0x7, 2)
+
+local EV_HEADER_MAX = ASOURCE_TRACE + AEVENT_REALLOC
 
 local M = {}
 
@@ -65,12 +68,16 @@ local function parse_location(reader, asource)
   local loc = {
     addr = 0,
     line = 0,
+    traceno = 0,
   }
   if asource == ASOURCE_CFUNC then
     loc.addr = reader:read_uleb128()
   elseif asource == ASOURCE_LFUNC then
     loc.addr = reader:read_uleb128()
     loc.line = reader:read_uleb128()
+  elseif asource == ASOURCE_TRACE then
+    loc.traceno = reader:read_uleb128()
+    loc.addr = reader:read_uleb128()
   elseif asource ~= ASOURCE_INT then
     error("Unknown asource "..asource)
   end
@@ -140,7 +147,7 @@ local parsers = {
 }
 
 local function ev_header_is_valid(evh)
-  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+  return evh <= EV_HEADER_MAX or evh == LJM_EPILOGUE_HEADER
 end
 
 -- Splits event header into event type (aka aevent = allocation
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index e01daa62..85945fb2 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -74,21 +74,29 @@ function M.parse(reader)
 end
 
 function M.id(loc)
-  return string_format("f%#xl%d", loc.addr, loc.line)
+  return string_format("f%#xl%dt%d", loc.addr, loc.line, loc.traceno)
 end
 
-function M.demangle(symtab, loc)
+local function demangle_lfunc(symtab, loc)
   local addr = loc.addr
 
   if addr == 0 then
     return "INTERNAL"
-  end
-
-  if symtab[addr] then
+  elseif symtab[addr] then
     return string_format("%s:%d", symtab[addr].source, loc.line)
   end
-
   return string_format("CFUNC %#x", addr)
 end
 
+local function demangle_trace(loc)
+  return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
+end
+
+function M.demangle(symtab, loc)
+  if loc.traceno ~= 0 then
+    return demangle_trace(loc)
+  end
+  return demangle_lfunc(symtab, loc)
+end
+
 return M
-- 
2.33.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab
  2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
                     ` (2 preceding siblings ...)
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
@ 2021-09-29 20:07   ` Mikhail Shishatskiy via Tarantool-patches
  3 siblings, 0 replies; 22+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-09-29 20:07 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Trace allocation sources, recorded by the memory profiler,
were reported as

| TRACE [<trace-no>] <trace-addr>

This approach is not descriptive enough to understand, where
exactly allocation took place, as we do not know the code
chunk, associated with the trace.

This patch fixes the problem described above by extending the
symbol table with <sym-trace> entries, consisting of a trace's
mcode starting address, trace number, address of function proto,
and line, where trace recording started:

| sym-trace  := sym-header trace-no trace-addr sym-addr sym-line
| trace-no   := <ULEB128>
| trace-addr := <ULEB128>

The memory profiler parser is adjusted to recognize the entries
mentioned above. On top of that, the API of <utils/symtab.lua> changed:
now table with symbols contains two tables: `lfunc` for Lua functions
symbols and `trace` for trace entries.

The demangler module has not changed, but the function
`describe_location` is added to the <memprof/humanize.lua> module,
which allows one to get a description of the trace location in the
format described below:

| TRACE [<trace-no>] <trace-addr> started at @<sym-chunk>:<sym-line>

Follows up tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/lj_memprof.c                              | 43 +++++++++++++++++++
 src/lj_memprof.h                              |  8 +++-
 .../misclib-memprof-lapi.test.lua             | 15 ++++---
 tools/memprof.lua                             |  4 +-
 tools/memprof/humanize.lua                    | 30 ++++++++++---
 tools/memprof/process.lua                     |  9 ++--
 tools/utils/symtab.lua                        | 31 ++++++++++---
 7 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 8702557f..e8b2ebbc 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -28,6 +28,45 @@
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
 					   0x0, 0x0, 0x0};
 
+#if LJ_HASJIT
+
+static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
+{
+  GCproto *pt = &gcref(trace->startpt)->pt;
+  BCLine lineno = 0;
+
+  const BCIns *startpc = mref(trace->startpc, const BCIns);
+  lua_assert(startpc >= proto_bc(pt) &&
+             startpc < proto_bc(pt) + pt->sizebc);
+
+  lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
+  lua_assert(lineno >= 0);
+
+  lj_wbuf_addbyte(out, SYMTAB_TRACE);
+  lj_wbuf_addu64(out, (uint64_t)trace->traceno);
+  lj_wbuf_addu64(out, (uint64_t)trace->mcode);
+  /*
+  ** The information about the prototype, associated with the
+  ** trace's start has already been dumped, as it is anchored
+  ** via the trace and is not collected while the trace is alive.
+  ** For this reason, we do not need to repeat dumping the chunk
+  ** name for the prototype.
+  */
+  lj_wbuf_addu64(out, (uintptr_t)pt);
+  lj_wbuf_addu64(out, (uint64_t)lineno);
+}
+
+#else
+
+static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
+{
+  UNUSED(out);
+  UNUSED(trace);
+  lua_assert(0);
+}
+
+#endif
+
 static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 {
   const GCRef *iter = &g->gc.root;
@@ -47,6 +86,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
       lj_wbuf_addu64(out, (uint64_t)pt->firstline);
       break;
     }
+    case (~LJ_TTRACE): {
+      dump_symtab_trace(out, gco2trace(o));
+      break;
+    }
     default:
       break;
     }
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 47474a51..395fb429 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
 #include "lj_def.h"
 #include "lj_wbuf.h"
 
-#define LJS_CURRENT_VERSION 0x1
+#define LJS_CURRENT_VERSION 0x2
 
 /*
 ** symtab format:
@@ -25,13 +25,16 @@
 ** prologue       := 'l' 'j' 's' version reserved
 ** version        := <BYTE>
 ** reserved       := <BYTE> <BYTE> <BYTE>
-** sym            := sym-lua | sym-final
+** sym            := sym-lua | sym-trace | sym-final
 ** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-trace      := sym-header trace-no trace-addr sym-addr sym-line
 ** sym-header     := <BYTE>
 ** sym-addr       := <ULEB128>
 ** sym-chunk      := string
 ** sym-line       := <ULEB128>
 ** sym-final      := sym-header
+** trace-no       := <ULEB128>
+** trace-addr     := <ULEB128>
 ** string         := string-len string-payload
 ** string-len     := <ULEB128>
 ** string-payload := <BYTE> {string-len}
@@ -51,6 +54,7 @@
 */
 
 #define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_TRACE ((uint8_t)1)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
 #define LJM_CURRENT_FORMAT_VERSION 0x02
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 3f4ffea0..b9edb80d 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -87,9 +87,13 @@ local function fill_ev_type(events, symbols, event_type)
     local addr = event.loc.addr
     local traceno = event.loc.traceno
 
-    if traceno ~= 0 then
+    if traceno ~= 0 and symbols.trace[traceno] then
+      local trace_loc = symbols.trace[traceno].start
+      addr = trace_loc.addr
       ev_type.trace[traceno] = {
-        name = string.format("TRACE [%d]", traceno),
+        name = string.format("TRACE [%d] %s:%d",
+          traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
+        ),
         num = event.num,
       }
     elseif addr == 0 then
@@ -97,10 +101,10 @@ local function fill_ev_type(events, symbols, event_type)
         name = "INTERNAL",
         num = event.num,
       }
-    elseif symbols[addr] then
+    elseif symbols.lfunc[addr] then
       ev_type.line[event.loc.line] = {
         name = string.format(
-          "%s:%d", symbols[addr].source, symbols[addr].linedefined
+          "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
         ),
         num = event.num,
       }
@@ -116,7 +120,8 @@ end
 local function check_alloc_report(alloc, traceno, line, function_line, nevents)
   local expected_name, event
   if traceno ~= 0 then
-    expected_name = string.format("TRACE [%d]", traceno)
+    expected_name = string.format("TRACE [%d] ", traceno)..
+                    form_source_line(function_line)
     event = alloc.trace[traceno]
   else
     expected_name = form_source_line(function_line)
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 18b44fdd..760122fc 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -104,8 +104,8 @@ local function dump(inputfile)
   if not leak_only then
     view.profile_info(events, symbols)
   end
-  local dheap = process.form_heap_delta(events, symbols)
-  view.leak_info(dheap)
+  local dheap = process.form_heap_delta(events)
+  view.leak_info(dheap, symbols)
   os.exit(0)
 end
 
diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
index 7771005d..7d30f976 100644
--- a/tools/memprof/humanize.lua
+++ b/tools/memprof/humanize.lua
@@ -7,6 +7,23 @@ local symtab = require "utils.symtab"
 
 local M = {}
 
+function M.describe_location(symbols, loc)
+  if loc.traceno == 0 then
+    return symtab.demangle(symbols, loc)
+  end
+
+  local trace = symbols.trace[loc.traceno]
+
+  -- If trace, which was remembered in the symtab, has not
+  -- been flushed, assotiate it with a proto, where trace
+  -- recording started.
+  if trace and trace.addr == loc.addr then
+    return symtab.demangle(symbols, loc).." started at "..
+           symtab.demangle(symbols, trace.start)
+  end
+  return symtab.demangle(symbols, loc)
+end
+
 function M.render(events, symbols)
   local ids = {}
 
@@ -21,7 +38,7 @@ function M.render(events, symbols)
   for i = 1, #ids do
     local event = events[ids[i]]
     print(string.format("%s: %d events\t+%d bytes\t-%d bytes",
-      symtab.demangle(symbols, event.loc),
+      M.describe_location(symbols, event.loc),
       event.num,
       event.alloc,
       event.free
@@ -29,7 +46,7 @@ function M.render(events, symbols)
 
     local prim_loc = {}
     for _, heap_chunk in pairs(event.primary) do
-      table.insert(prim_loc, symtab.demangle(symbols, heap_chunk.loc))
+      table.insert(prim_loc, M.describe_location(symbols, heap_chunk.loc))
     end
     if #prim_loc ~= 0 then
       table.sort(prim_loc)
@@ -56,13 +73,16 @@ function M.profile_info(events, symbols)
   print("")
 end
 
-function M.leak_info(dheap)
+function M.leak_info(dheap, symbols)
   local leaks = {}
-  for line, info in pairs(dheap) do
+  for _, info in pairs(dheap) do
     -- Report "INTERNAL" events inconsistencies for profiling
     -- with enabled jit.
     if info.dbytes > 0 then
-      table.insert(leaks, {line = line, dbytes = info.dbytes})
+      table.insert(leaks, {
+        line = M.describe_location(symbols, info.loc),
+        dbytes = info.dbytes
+      })
     end
   end
 
diff --git a/tools/memprof/process.lua b/tools/memprof/process.lua
index 0bcb965b..360f6cc4 100644
--- a/tools/memprof/process.lua
+++ b/tools/memprof/process.lua
@@ -4,7 +4,7 @@ local M = {}
 
 local symtab = require "utils.symtab"
 
-function M.form_heap_delta(events, symbols)
+function M.form_heap_delta(events)
   -- Auto resurrects source event lines for counting/reporting.
   local dheap = setmetatable({}, {__index = function(t, line)
     rawset(t, line, {
@@ -17,11 +17,12 @@ function M.form_heap_delta(events, symbols)
 
   for _, event in pairs(events.alloc) do
     if event.loc then
-      local ev_line = symtab.demangle(symbols, event.loc)
+      local ev_line = symtab.id(event.loc)
 
       if (event.alloc > 0) then
         dheap[ev_line].dbytes = dheap[ev_line].dbytes + event.alloc
         dheap[ev_line].nalloc = dheap[ev_line].nalloc + event.num
+        dheap[ev_line].loc = event.loc
       end
     end
   end
@@ -37,16 +38,18 @@ function M.form_heap_delta(events, symbols)
       -- that references the table with memory changed
       -- (may be empty).
       for _, heap_chunk in pairs(event.primary) do
-        local ev_line = symtab.demangle(symbols, heap_chunk.loc)
+        local ev_line = symtab.id(heap_chunk.loc)
 
         if (heap_chunk.alloced > 0) then
           dheap[ev_line].dbytes = dheap[ev_line].dbytes + heap_chunk.alloced
           dheap[ev_line].nalloc = dheap[ev_line].nalloc + heap_chunk.count
+          dheap[ev_line].loc = heap_chunk.loc
         end
 
         if (heap_chunk.freed > 0) then
           dheap[ev_line].dbytes = dheap[ev_line].dbytes - heap_chunk.freed
           dheap[ev_line].nfree = dheap[ev_line].nfree + heap_chunk.count
+          dheap[ev_line].loc = heap_chunk.loc
         end
       end
     end
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 85945fb2..496d8480 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -10,11 +10,12 @@ local band = bit.band
 local string_format = string.format
 
 local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 1
+local LJS_CURRENT_VERSION = 0x2
 local LJS_EPILOGUE_HEADER = 0x80
 local LJS_SYMTYPE_MASK = 0x03
 
 local SYMTAB_LFUNC = 0
+local SYMTAB_TRACE = 1
 
 local M = {}
 
@@ -24,18 +25,38 @@ local function parse_sym_lfunc(reader, symtab)
   local sym_chunk = reader:read_string()
   local sym_line = reader:read_uleb128()
 
-  symtab[sym_addr] = {
+  symtab.lfunc[sym_addr] = {
     source = sym_chunk,
     linedefined = sym_line,
   }
 end
 
+local function parse_sym_trace(reader, symtab)
+  local traceno = reader:read_uleb128()
+  local trace_addr = reader:read_uleb128()
+  local sym_addr = reader:read_uleb128()
+  local sym_line = reader:read_uleb128()
+
+  symtab.trace[traceno] = {
+    addr = trace_addr,
+    start = {
+      addr = sym_addr,
+      line = sym_line,
+      traceno = 0,
+    },
+  }
+end
+
 local parsers = {
   [SYMTAB_LFUNC] = parse_sym_lfunc,
+  [SYMTAB_TRACE] = parse_sym_trace,
 }
 
 function M.parse(reader)
-  local symtab = {}
+  local symtab = {
+    lfunc = {},
+    trace = {},
+  }
   local magic = reader:read_octets(3)
   local version = reader:read_octets(1)
 
@@ -82,8 +103,8 @@ local function demangle_lfunc(symtab, loc)
 
   if addr == 0 then
     return "INTERNAL"
-  elseif symtab[addr] then
-    return string_format("%s:%d", symtab[addr].source, loc.line)
+  elseif symtab.lfunc[addr] then
+    return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
   end
   return string_format("CFUNC %#x", addr)
 end
-- 
2.33.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
@ 2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  2021-10-27 15:07     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-10-27 13:56 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! LGTM.

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing Mikhail Shishatskiy via Tarantool-patches
@ 2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-10-27 13:56 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! Everything is fine in general, but consider a few
nits below, please.

On 29.09.21, Mikhail Shishatskiy wrote:
> As it is hard to keep function `id_location` easily usable in the
> function `parse_location` with a growing number of fields in the
> <loc> table, the code is refactored to make it more understandable.
> 
> The API of <utils/symtab.lua> module changed with adding a function
> `id(loc)` returning the same id-string, as `id_location` did. This
> function is useful to "stringify" the location and use it as a key
> in the key/value storage.

It's worth to also mention that the function need to be used in other
modules, hence you've made it public in symtab.lua.

> 
> Part of tarantool/tarantool#5814
> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  tools/memprof/parse.lua | 26 +++++++++++++-------------
>  tools/utils/symtab.lua  |  4 ++++
>  2 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> index 12e2758f..34ff8aef 100644
> --- a/tools/memprof/parse.lua
> +++ b/tools/memprof/parse.lua
> @@ -8,6 +8,8 @@ local bit = require "bit"
>  local band = bit.band
>  local lshift = bit.lshift
>  
> +local symtab = require "utils.symtab"

Minor: Why here? It's better either to group all <requires> in the one
place (worse, IMHO) or just add a new <require> just prior to the part
with constants i.e. below <string.format> caching (better, IMHO).

> +
>  local string_format = string.format
>  
>  local LJM_MAGIC = "ljm"
> @@ -59,22 +61,20 @@ local function link_to_previous(heap_chunk, e, nsize)
>    end
>  end
>  
> -local function id_location(addr, line)
> -  return string_format("f%#xl%d", addr, line), {
> -    addr = addr,
> -    line = line,
> -  }
> -end
> -
>  local function parse_location(reader, asource)
> -  if asource == ASOURCE_INT then
> -    return id_location(0, 0)
> -  elseif asource == ASOURCE_CFUNC then
> -    return id_location(reader:read_uleb128(), 0)
> +  local loc = {
> +    addr = 0,
> +    line = 0,
> +  }
> +  if asource == ASOURCE_CFUNC then
> +    loc.addr = reader:read_uleb128()
>    elseif asource == ASOURCE_LFUNC then
> -    return id_location(reader:read_uleb128(), reader:read_uleb128())
> +    loc.addr = reader:read_uleb128()
> +    loc.line = reader:read_uleb128()
> +  elseif asource ~= ASOURCE_INT then

Minor: It's better to write a separate condition for <ASOURCE_INT> and a
separate block for so-called "default" (i.e. plain "else" block) with an
error. So it looks kinda the following:

| if asource == ASOURCE_INT then
|   -- Do nothing. Add a suppression for luacheck here.
| elseif asource == ASOURCE_CFUNC then
|   loc.addr = reader:read_uleb128()
| elseif asource == ASOURCE_LFUNC then
|   loc.addr = reader:read_uleb128()
|   loc.line = reader:read_uleb128()
| else
|   error("Unknown asource "..asource)
| end

If you were confused by luacheck, don't worry: just add a suppression
for this block.

> +    error("Unknown asource "..asource)
>    end
> -  error("Unknown asource "..asource)
> +  return symtab.id(loc), loc
>  end
>  
>  local function parse_alloc(reader, asource, events, heap)

<snipped>

>  
> -- 
> 2.33.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
@ 2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
  0 siblings, 0 replies; 22+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2021-10-27 13:56 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Misha,

Thanks for the patch! Please, consider the comments below.

On 29.09.21, Mikhail Shishatskiy wrote:
> When LuaJIT executes a trace, the trace number is stored in

Typo: s/in/as/.

> the virtual machine state. So, we can treat this number as
> an allocation event source in memprof and report allocation events
> from traces as well.
> 
> Previously, all the allocations from traces were marked as INTERNAL.
> 
> This patch introduces the functionality described above by adding
> a new allocation source type named ASOURCE_TRACE. If at the moment
> when allocation event occurs VM state indicates that trace executed,

Minor: To make the wording a bit clearer, I propose the following:
| If allocation event occurs when trace is executed, trace number...

> trace number and trace's mcode starting address streamed to a binary

Typo: s/streamed/is streamed/.

> file:
> 
> | loc-trace  := trace-no trace-addr
> | trace-no   := <ULEB128>
> | trace-addr := <ULEB128>
> 
> Also, the memory profiler parser is adjusted to recognize entries
> mentioned above. The <loc> structure is extended with field <traceno>,
> representing trace number. Trace locations are demangled as
> 
> | TRACE [<trace-no>] <trace-addr>
> 
> Resolves tarantool/tarantool#5814

Does this patch "resolve" the issue or only being a "part of" it?

> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  src/Makefile.dep.original                     |  3 +-
>  src/lj_memprof.c                              | 36 ++++++-
>  src/lj_memprof.h                              | 14 ++-
>  .../misclib-memprof-lapi.test.lua             | 97 +++++++++++++++----
>  tools/memprof/parse.lua                       | 13 ++-
>  tools/utils/symtab.lua                        | 20 ++--
>  6 files changed, 148 insertions(+), 35 deletions(-)
> 

<snipped>

> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index 9de4bd98..3f4ffea0 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua

<snipped>

> @@ -96,10 +113,21 @@ local function form_source_line(line)
>    return string.format("@%s:%d", arg[0], line)
>  end
>  
> -local function check_alloc_report(alloc, line, function_line, nevents)
> -  assert(form_source_line(function_line) == alloc[line].name)
> -  assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
> -    alloc[line].num,
> +local function check_alloc_report(alloc, traceno, line, function_line, nevents)

OK, here we have a function with 5 parameters. Almost half of them is
used in a single particular case. Hence, I propose to change the
interface to the following:
| local function check_alloc_report(alloc, location, nevents)

In this case location is a table with either "traceno" key or with
"line" and "linedefined" keys. Such function looks better, doesn't it?
Consider the following:
| check_alloc_report(alloc, { traceno = 1 }, 20)
| check_alloc_report(alloc, { line = 34, linedefined = 32 }, 2)

> +  local expected_name, event
> +  if traceno ~= 0 then
> +    expected_name = string.format("TRACE [%d]", traceno)
> +    event = alloc.trace[traceno]
> +  else
> +    expected_name = form_source_line(function_line)
> +    event = alloc.line[line]
> +  end
> +  assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
> +    event.name,
> +    expected_name
> +  ))
> +  assert(event.num == nevents, ("got=%d, expected=%d"):format(
> +    event.num,
>      nevents
>    ))
>    return true
> @@ -145,18 +173,18 @@ test:test("output", function(subtest)
>    -- one is the number of allocations. 1 event - alocation of
>    -- table by itself + 1 allocation of array part as far it is
>    -- bigger than LJ_MAX_COLOSIZE (16).
> -  subtest:ok(check_alloc_report(alloc, 27, 25, 2))
> -  -- 100 strings allocations.
> -  subtest:ok(check_alloc_report(alloc, 32, 25, 100))
> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
> +  -- 20 strings allocations.
> +  subtest:ok(check_alloc_report(alloc, 0, 39, 32, 20))

As a result of the change proposed above, you need no these ugly fixes
for the existing spots in the future.

>  
>    -- Collect all previous allocated objects.
> -  subtest:ok(free.INTERNAL.num == 102)
> +  subtest:ok(free.INTERNAL.num == 22)
>  
>    -- Tests for leak-only option.
>    -- See also https://github.com/tarantool/tarantool/issues/5812.
>    local heap_delta = process.form_heap_delta(events, symbols)
> -  local tab_alloc_stats = heap_delta[form_source_line(27)]
> -  local str_alloc_stats = heap_delta[form_source_line(32)]
> +  local tab_alloc_stats = heap_delta[form_source_line(34)]
> +  local str_alloc_stats = heap_delta[form_source_line(39)]
>    subtest:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
>    subtest:ok(tab_alloc_stats.dbytes == 0)
>    subtest:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
> @@ -185,5 +213,38 @@ test:test("stack-resize", function(subtest)
>    misc.memprof.stop()
>  end)
>  
> +-- Test profiler with enabled JIT.
>  jit.on()
> +
> +test:test("jit-output", function(subtest)
> +  -- Disabled on *BSD due to #4819.
> +  if jit.os == 'BSD' then
> +    subtest:plan(1)
> +    subtest:skip('Disabled due to #4819')
> +    return
> +  end
> +
> +  subtest:plan(3)
> +
> +  jit.opt.start(3, "hotloop=10")
> +  jit.flush()
> +
> +  -- Pregenerate traces to fill symtab entries in the next run.
> +  default_payload()
> +
> +  local symbols, events = generate_parsed_output(default_payload)
> +
> +  local alloc = fill_ev_type(events, symbols, "alloc")
> +  local free = fill_ev_type(events, symbols, "free")
> +
> +  -- We expect, that loop will be compiled into a trace.
> +  subtest:ok(check_alloc_report(alloc, 1, 37, 32, 20))

What are 37 and 32 in this case? I can use 0 and 0 for both parameters
and the test passes, doesn't it? This is another argument for using
table with different number of keys.

> +  -- See same checks with jit.off().
> +  subtest:ok(check_alloc_report(alloc, 0, 34, 32, 2))
> +  subtest:ok(free.INTERNAL.num == 22)
> +
> +  -- Restore default JIT settings.
> +  jit.opt.start(unpack(jit_opt_default))
> +end)
> +
>  os.exit(test:check() and 0 or 1)

<snipped>

> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> index e01daa62..85945fb2 100644
> --- a/tools/utils/symtab.lua
> +++ b/tools/utils/symtab.lua
> @@ -74,21 +74,29 @@ function M.parse(reader)
>  end
>  
>  function M.id(loc)
> -  return string_format("f%#xl%d", loc.addr, loc.line)
> +  return string_format("f%#xl%dt%d", loc.addr, loc.line, loc.traceno)
>  end
>  
> -function M.demangle(symtab, loc)
> +local function demangle_lfunc(symtab, loc)
>    local addr = loc.addr
>  
>    if addr == 0 then
>      return "INTERNAL"
> -  end
> -
> -  if symtab[addr] then
> +  elseif symtab[addr] then
>      return string_format("%s:%d", symtab[addr].source, loc.line)
>    end
> -

Looks like all these changes are excess, aren't they?

>    return string_format("CFUNC %#x", addr)
>  end
>  
> +local function demangle_trace(loc)
> +  return string_format("TRACE [%d] %#x", loc.traceno, loc.addr)
> +end
> +
> +function M.demangle(symtab, loc)
> +  if loc.traceno ~= 0 then
> +    return demangle_trace(loc)
> +  end
> +  return demangle_lfunc(symtab, loc)

Why 3 of 4 types of ASOURCE are demangled within a separate function
(with a bit strange name), but the traces have a separate one function?
Looks irrationale a bit, IMHO. Then either introduce a new function per
ASOURCE (too crazy as for me) or just move this <if> block for trace
into the original <M.demangle> function (the only logic way I see here).

> +end
> +
>  return M
> -- 
> 2.33.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests
  2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
  2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
@ 2021-10-27 15:07     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-10-27 15:07 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Hi, Mikhail!

Thanks for the patch!

LGTM, except a few nits above.

On 29.09.21, Mikhail Shishatskiy wrote:
> As the number of memprof test cases is expected to grow,
> memprof tests are separated into subtests to encapsulate
> test cases and be able to skip some of the subtests.
> 
> Also, output generation and parsing separated into a

Typo: s/separated/are separated/

> dedicated function `generate_parsed_output`, which allows
> one to run memprof on different payloads, passing payload
> funtion as an argument.

Typo: s/payload funtion/a payload function/

> 
> Part of tarantool/tarantool#5814
> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  .../misclib-memprof-lapi.test.lua             | 169 ++++++++++--------
>  1 file changed, 94 insertions(+), 75 deletions(-)
> 
> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index 06d96b3b..9de4bd98 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> @@ -7,7 +7,7 @@ require("utils").skipcond(

<snipped>

> +-- Test profiler API.
> +test:test("smoke", function(subtest)
> +  subtest:plan(6)
>  
> -res, err = misc.memprof.stop()
> -assert(res, err)
> +  -- Not a directory.
> +  local res, err, errno = misc.memprof.start(BAD_PATH)
> +  subtest:ok(res == nil and err:match("No such file or directory"))
> +  subtest:ok(type(errno) == "number")
>  
> --- Profiler is not running.
> -res, err, errno = misc.memprof.stop()
> -test:ok(res == nil and err:match("profiler is not running"))
> -test:ok(type(errno) == "number")
> +  -- Profiler is running.
> +  res, err = misc.memprof.start(TMP_BINFILE)

Nit: This tmp file will be rewritten and removed later by the "output"
test. Please leave the corresponding comment.

> +  assert(res, err)
> +  res, err, errno = misc.memprof.start(TMP_BINFILE)
> +  subtest:ok(res == nil and err:match("profiler is running already"))
> +  subtest:ok(type(errno) == "number")
>  

<snipped>

> -- 
> 2.33.0
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-10-27 15:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20  7:05 [Tarantool-patches] [PATCH luajit v3 0/4] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 1/5] core: add const to lj_debug_line proto parameter Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 2/5] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:29   ` Igor Munkin via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 3/5] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 4/5] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 19:21     ` Mikhail Shishatskiy via Tarantool-patches
2021-08-20  7:05 ` [Tarantool-patches] [PATCH luajit v3 5/5] luajit: change order of modules Mikhail Shishatskiy via Tarantool-patches
2021-09-16 15:32   ` Igor Munkin via Tarantool-patches
2021-09-29 20:07 ` [Tarantool-patches] [PATCH luajit v4 0/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 1/4] test: separate memprof Lua API tests into subtests Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
2021-10-27 15:07     ` Sergey Kaplun via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 2/4] memprof: refactor location parsing Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 3/4] memprof: group allocations on traces by traceno Mikhail Shishatskiy via Tarantool-patches
2021-10-27 13:56     ` Igor Munkin via Tarantool-patches
2021-09-29 20:07   ` [Tarantool-patches] [PATCH luajit v4 4/4] memprof: add info about trace start to symtab Mikhail Shishatskiy via Tarantool-patches

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git