Tarantool development patches archive
* [Tarantool-patches] [PATCH luajit v1] memprof: group allocations on traces by trace number
@ 2021-07-21  9:44 Mikhail Shishatskiy via Tarantool-patches
  2021-07-21 11:47 ` Sergey Kaplun via Tarantool-patches
  0 siblings, 1 reply; 5+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-07-21  9:44 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

When luajit executes a trace, it's id is stored in virtual machine
state. So, we can treat this trace number as allocation event source
in memprof and report allocation events from traces as well.

Previously all the allocations from traces were marked as INTERNAL.

This patch introduces functionality described above by adding new
allocation source type named ASOURCE_TRACE. If at the moment when
allocation event occur vmstate indicates that trace is executed,
trace id is streamed to binary file:

| loc-trace := trace-no
| trace-no  := <ULEB128>

Also, memory profiler parser was tuned to recognize this source type by
extending <loc> structure: field trace, representing trace id, was
added.

Now, for example, all the allocation events happened in trace with id 42,
will be reported at source 'TRACE [42]' and not at 'INTERNAL' source.

Closes tarantool/tarantool#5814
---

Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number

 src/lj_memprof.c        | 16 ++++++++--------
 src/lj_memprof.h        |  9 ++++++---
 tools/memprof/parse.lua | 18 +++++++++++-------
 tools/utils/symtab.lua  |  5 +++++
 4 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..0cbc0ed7 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -146,6 +146,13 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
     lua_assert(0);
 }
 
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+  struct lj_wbuf *out = &mp->out;
+  lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
+  lj_wbuf_addu64(out, (uint64_t)mp->g->vmstate); /* write traceno. */
+}
+
 static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
 {
   lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
@@ -163,14 +170,7 @@ static const memprof_writer memprof_writers[] = {
   memprof_write_hvmstate, /* LJ_VMST_RECORD */
   memprof_write_hvmstate, /* LJ_VMST_OPT */
   memprof_write_hvmstate, /* LJ_VMST_ASM */
-  /*
-  ** XXX: In ideal world, we should report allocations from traces as well.
-  ** But since traces must follow the semantics of the original code,
-  ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
-  ** which makes using memprof with enabled JIT virtually redundant.
-  ** Hence use the stub below.
-  */
-  memprof_write_hvmstate /* LJ_VMST_TRACE */
+  memprof_write_trace /* LJ_VMST_TRACE */
 };
 
 static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..13125536 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -69,11 +69,13 @@
 ** event-realloc  := event-header loc? oaddr osize naddr nsize
 ** event-free     := event-header loc? oaddr osize
 ** event-header   := <BYTE>
-** loc            := loc-lua | loc-c
+** loc            := loc-lua | loc-c | loc-trace
 ** loc-lua        := sym-addr line-no
 ** loc-c          := sym-addr
+** loc-trace      := trace-no
 ** sym-addr       := <ULEB128>
 ** line-no        := <ULEB128>
+** trace-no       := <ULEB128>
 ** oaddr          := <ULEB128>
 ** naddr          := <ULEB128>
 ** osize          := <ULEB128>
@@ -90,8 +92,8 @@
 **
 ** event-header: [FUUUSSEE]
 **  * EE   : 2 bits for representing allocation event type (AEVENT_*)
-**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
-**  * UUU  : 3 unused bits
+**  * SSS  : 3 bits for representing allocation source type (ASOURCE_*)
+**  * UU   : 2 unused bits
 **  * F    : 0 for regular events, 1 for epilogue's *F*inal header
 **           (if F is set to 1, all other bits are currently ignored)
 */
@@ -105,6 +107,7 @@
 #define ASOURCE_INT   ((uint8_t)(1 << 2))
 #define ASOURCE_LFUNC ((uint8_t)(2 << 2))
 #define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+#define ASOURCE_TRACE ((uint8_t)(4 << 2))
 
 #define LJM_EPILOGUE_HEADER 0x80
 
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 12e2758f..2bd491c8 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -24,8 +24,9 @@ local AEVENT_MASK = 0x3
 local ASOURCE_INT = lshift(1, 2)
 local ASOURCE_LFUNC = lshift(2, 2)
 local ASOURCE_CFUNC = lshift(3, 2)
+local ASOURCE_TRACE = lshift(4, 2)
 
-local ASOURCE_MASK = lshift(0x3, 2)
+local ASOURCE_MASK = lshift(0x7, 2)
 
 local M = {}
 
@@ -59,20 +60,23 @@ local function link_to_previous(heap_chunk, e, nsize)
   end
 end
 
-local function id_location(addr, line)
-  return string_format("f%#xl%d", addr, line), {
+local function id_location(addr, line, trace)
+  return string_format("f%#xl%dxt%d", addr, line, trace), {
     addr = addr,
     line = line,
+    trace = trace,
   }
 end
 
 local function parse_location(reader, asource)
   if asource == ASOURCE_INT then
-    return id_location(0, 0)
+    return id_location(0, 0, 0)
   elseif asource == ASOURCE_CFUNC then
-    return id_location(reader:read_uleb128(), 0)
+    return id_location(reader:read_uleb128(), 0, 0)
   elseif asource == ASOURCE_LFUNC then
-    return id_location(reader:read_uleb128(), reader:read_uleb128())
+    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
+  elseif asource == ASOURCE_TRACE then
+    return id_location(0, 0, reader:read_uleb128())
   end
   error("Unknown asource "..asource)
 end
@@ -140,7 +144,7 @@ local parsers = {
 }
 
 local function ev_header_is_valid(evh)
-  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+  return evh <= 0x1f or evh == LJM_EPILOGUE_HEADER
 end
 
 -- Splits event header into event type (aka aevent = allocation
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..6121177f 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -75,6 +75,11 @@ end
 
 function M.demangle(symtab, loc)
   local addr = loc.addr
+  local trace = loc.trace
+
+  if trace ~= 0 then
+    return string_format("TRACE [%d]", trace)
+  end
 
   if addr == 0 then
     return "INTERNAL"
-- 
2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1] memprof: group allocations on traces by trace number
  2021-07-21  9:44 [Tarantool-patches] [PATCH luajit v1] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
@ 2021-07-21 11:47 ` Sergey Kaplun via Tarantool-patches
  2021-07-21 11:52   ` Sergey Kaplun via Tarantool-patches
  2021-07-23  7:54   ` Mikhail Shishatskiy via Tarantool-patches
  0 siblings, 2 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-07-21 11:47 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: tarantool-patches

Hi! Thanks for the patch!
Please consider my comments below.

On 21.07.21, Mikhail Shishatskiy wrote:
> When luajit executes a trace, it's id is stored in virtual machine

Typo: s/luajit/LuaJIT/

Nit: suggest to change: s/it's id/trace's number/
Feel free to ignore.

Typo: s/in virtual machine state/in the virtual machine state/

> state. So, we can treat this trace number as allocation event source

Typo: /allocation event source/an allocation event source/

> in memprof and report allocation events from traces as well.
> 
> Previously all the allocations from traces were marked as INTERNAL.

Typo: s/Previously all/Previously, all/

> 
> This patch introduces functionality described above by adding new
> allocation source type named ASOURCE_TRACE. If at the moment when
> allocation event occur vmstate indicates that trace is executed,
> trace id is streamed to binary file:

Typo: s/to binary file/to a binary file/

> 
> | loc-trace := trace-no
> | trace-no  := <ULEB128>
> 
> Also, memory profiler parser was tuned to recognize this source type by

Nit: s/tuned/adjusted/
Feel free to ignore.

> extending <loc> structure: field trace, representing trace id, was
> added.
> 
> Now, for example, all the allocation events happened in trace with id 42,

Nit: s/id/number/
Feel free to ignore.

> will be reported at source 'TRACE [42]' and not at 'INTERNAL' source.

Nit: In my opinion "reported as <smth> source" fills better here.
Feel free to ignore.

I suppose we can do a little bit more than just report the trace
number. We can also provide information about its start pc (i.e.
its start line). It is already implemented for perftools (see the
`perftools_addtrace()` function in <src/lj_trace.c>).

With it, users can get a rough estimate of which trace causes an
allocation without running two tools at once (`jit.dump` and `memprof`).
It can also bring some clarity about the trace if `jit.flush()` is
called while memprof is running.

> 
> Closes tarantool/tarantool#5814

Nit: it is better to use "Resolves" instead of "Closes".
Rationale: the issue isn't closed by this commit. It will be closed after
the LuaJIT submodule is bumped in Tarantool. But, indeed, the issue
is resolved by this commit.

> ---
> 
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> 
>  src/lj_memprof.c        | 16 ++++++++--------
>  src/lj_memprof.h        |  9 ++++++---
>  tools/memprof/parse.lua | 18 +++++++++++-------
>  tools/utils/symtab.lua  |  5 +++++

I suppose it would be nice to add some tests for the patch to check
its behaviour. Also, performance measurements are welcome! :)

You can use the benchmarks from here [1], as it was done for memprof.

>  4 files changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 2c1ef3b8..0cbc0ed7 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -146,6 +146,13 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
>      lua_assert(0);
>  }
>  
> +static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
> +{
> +  struct lj_wbuf *out = &mp->out;
> +  lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
> +  lj_wbuf_addu64(out, (uint64_t)mp->g->vmstate); /* write traceno. */

Nit: I suppose the comment is redundant if we add a comment to the
corresponding `memprof_writers[]` entry (see below).
Also, we prefer to avoid inline comments (structure or
array definitions are exceptions). Just use the previous line: see
`dump_symtab()` for an example.

> +}
> +
>  static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
>  {
>    lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
> @@ -163,14 +170,7 @@ static const memprof_writer memprof_writers[] = {
>    memprof_write_hvmstate, /* LJ_VMST_RECORD */
>    memprof_write_hvmstate, /* LJ_VMST_OPT */
>    memprof_write_hvmstate, /* LJ_VMST_ASM */
> -  /*
> -  ** XXX: In ideal world, we should report allocations from traces as well.
> -  ** But since traces must follow the semantics of the original code,
> -  ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
> -  ** which makes using memprof with enabled JIT virtually redundant.
> -  ** Hence use the stub below.
> -  */

I suppose that this comment is still valid -- we *should* report
allocations from traces, like it is done for bytecode execution via the
VM. So I suggest adjusting this comment by replacing the last sentence
with a description of what `memprof_write_trace()` does instead.

> -  memprof_write_hvmstate /* LJ_VMST_TRACE */
> +  memprof_write_trace /* LJ_VMST_TRACE */
>  };
>  
>  static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index 3417475d..13125536 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
> @@ -69,11 +69,13 @@
>  ** event-realloc  := event-header loc? oaddr osize naddr nsize
>  ** event-free     := event-header loc? oaddr osize
>  ** event-header   := <BYTE>
> -** loc            := loc-lua | loc-c
> +** loc            := loc-lua | loc-c | loc-trace
>  ** loc-lua        := sym-addr line-no
>  ** loc-c          := sym-addr
> +** loc-trace      := trace-no
>  ** sym-addr       := <ULEB128>
>  ** line-no        := <ULEB128>
> +** trace-no       := <ULEB128>
>  ** oaddr          := <ULEB128>
>  ** naddr          := <ULEB128>
>  ** osize          := <ULEB128>
> @@ -90,8 +92,8 @@
>  **
>  ** event-header: [FUUUSSEE]

Please, update this header description too.

>  **  * EE   : 2 bits for representing allocation event type (AEVENT_*)

<snipped>

> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> index 12e2758f..2bd491c8 100644
> --- a/tools/memprof/parse.lua
> +++ b/tools/memprof/parse.lua

<snipped>

> @@ -59,20 +60,23 @@ local function link_to_previous(heap_chunk, e, nsize)
>    end
>  end
>  
> -local function id_location(addr, line)
> -  return string_format("f%#xl%d", addr, line), {
> +local function id_location(addr, line, trace)
> +  return string_format("f%#xl%dxt%d", addr, line, trace), {

Typo: s/"f%#xl%dxt%d"/"f%#xl%dt%d"/ - the x symbol looks redundant here.

Nit: Maybe "traceno" fits better than "trace".
Feel free to ignore.

>      addr = addr,
>      line = line,
> +    trace = trace,
>    }
>  end
>  
>  local function parse_location(reader, asource)
>    if asource == ASOURCE_INT then
> -    return id_location(0, 0)
> +    return id_location(0, 0, 0)
>    elseif asource == ASOURCE_CFUNC then
> -    return id_location(reader:read_uleb128(), 0)
> +    return id_location(reader:read_uleb128(), 0, 0)
>    elseif asource == ASOURCE_LFUNC then
> -    return id_location(reader:read_uleb128(), reader:read_uleb128())
> +    return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
> +  elseif asource == ASOURCE_TRACE then
> +    return id_location(0, 0, reader:read_uleb128())

Side note: The first two arguments can be used if we later dump
information about the start of a trace.

>    end
>    error("Unknown asource "..asource)
>  end
> @@ -140,7 +144,7 @@ local parsers = {
>  }
>  
>  local function ev_header_is_valid(evh)
> -  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
> +  return evh <= 0x1f or evh == LJM_EPILOGUE_HEADER

0x1f is too much: the maximum event header value is the following:

| $ perl -E 'say sprintf("0x%x", (4<<2)+3)'
| 0x13

4<<2 is the biggest possible allocation source type (trace), and + 3
is the reallocation event type.

I suggest creating a corresponding constant for the check. It will
simplify maintenance in the future.

>  end
>  
>  -- Splits event header into event type (aka aevent = allocation
> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> index 3ed1dd13..6121177f 100644
> --- a/tools/utils/symtab.lua
> +++ b/tools/utils/symtab.lua
> @@ -75,6 +75,11 @@ end
>  
>  function M.demangle(symtab, loc)
>    local addr = loc.addr
> +  local trace = loc.trace
> +
> +  if trace ~= 0 then

Ditto, the nit about s/trace/traceno/.
Feel free to ignore.

> +    return string_format("TRACE [%d]", trace)
> +  end
>  
>    if addr == 0 then
>      return "INTERNAL"
> -- 
> 2.32.0
> 

[1]: https://github.com/LuaJIT/LuaJIT-test-cleanup/tree/master/bench

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1] memprof: group allocations on traces by trace number
  2021-07-21 11:47 ` Sergey Kaplun via Tarantool-patches
@ 2021-07-21 11:52   ` Sergey Kaplun via Tarantool-patches
  2021-07-23  7:54   ` Mikhail Shishatskiy via Tarantool-patches
  1 sibling, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-07-21 11:52 UTC (permalink / raw)
  To: Mikhail Shishatskiy, tarantool-patches

Sorry, one more thing:
We should increase `LJM_CURRENT_FORMAT_VERSION` since we change the
binary protocol.

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches]  [PATCH luajit v1] memprof: group allocations on traces by trace number
  2021-07-21 11:47 ` Sergey Kaplun via Tarantool-patches
  2021-07-21 11:52   ` Sergey Kaplun via Tarantool-patches
@ 2021-07-23  7:54   ` Mikhail Shishatskiy via Tarantool-patches
  2021-07-27  6:07     ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 5+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-07-23  7:54 UTC (permalink / raw)
  To: Sergey Kaplun, Igor Munkin, Mikhail Shishatskiy via Tarantool-patches



Hi! I have one more question:

How should we properly test the behavior of the profiler recording allocations from traces?
For now, I can write a test which roughly estimates the number of allocations in a loop:

| local function payload()
|   -- Preallocate table to avoid table array part reallocations.
|   local _ = table_new(100, 0)
|   -- Want to see 100 objects here.
|   for i = 1, 100 do
|     -- Try to avoid crossing with "test" module objects.
|     _[i] = "memprof-str-"..i
|   end
|   _ = nil
|   -- VMSTATE == GC, reported as INTERNAL.
|   collectgarbage()
| end
| jit.on()
| symbols, events = `run_payload_under_memprof_and_parse`()
| alloc = `get_all_alloc_events`(symbols, events)
| test:ok(alloc[`line_where_loop_starts`].num > `some_guaranteed_number`)

But I think it would be great if we could replace the > sign with ==. The problem is that we cannot
guarantee a constant number of allocations in the loop: on most platforms, with
`jit.opt.start('hotloop=1', '-sink')` I get 97 allocations in a 100-iteration loop, as we spend
some iterations compiling the trace. But on FreeBSD, for example, I get 24 allocations.
 
--
Best regards,
Mikhail Shishatskiy
 
  
>Wednesday, 21 July 2021, 14:48 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
> 
>Hi! Thanks for the patch!
>Please consider my comments below.
 
<snipped>
 
>--
>Best regards,
>Sergey Kaplun
 



* Re: [Tarantool-patches] [PATCH luajit v1] memprof: group allocations on traces by trace number
  2021-07-23  7:54   ` Mikhail Shishatskiy via Tarantool-patches
@ 2021-07-27  6:07     ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-07-27  6:07 UTC (permalink / raw)
  To: Mikhail Shishatskiy; +Cc: Mikhail Shishatskiy via Tarantool-patches

Hi, Mikhail!

On 23.07.21, Mikhail Shishatskiy wrote:
> 
> Hi! I have one more question:
> 
> How should we properly test the behavior of the profiler recording allocations from traces?
> For now, I can write a test which roughly estimates the number of allocations in a loop:
> 
> | local function payload()
> |   -- Preallocate table to avoid table array part reallocations.
> |   local _ = table_new(100, 0)
> |   -- Want to see 100 objects here.
> |   for i = 1, 100 do
> |     -- Try to avoid crossing with "test" module objects.
> |     _[i] = "memprof-str-"..i
> |   end
> |   _ = nil
> |   -- VMSTATE == GC, reported as INTERNAL.
> |   collectgarbage()
> | end
> | jit.on()
> | symbols, events = `run_payload_under_memprof_and_parse`()
> | alloc = `get_all_alloc_events`(symbols, events)
> | test:ok(alloc[`line_where_loop_starts`].num > `some_guaranteed_number`)
> 
> But I think it would be great if we could replace the > sign with ==. The problem is that we cannot
> guarantee a constant number of allocations in the loop: on most platforms, with
> `jit.opt.start('hotloop=1', '-sink')` I get 97 allocations in a 100-iteration loop, as we spend
> some iterations compiling the trace. But on FreeBSD, for example, I get 24 allocations.

It's OK due to [1].

You can add the following skip condition for the test on FreeBSD:

| -- Disabled on *BSD due to #4819.
| utils.skipcond(jit.os == 'BSD', 'Disabled due to #4819')

>  
> --
> Best regards,
> Mikhail Shishatskiy
>  
>   
> >Wednesday, 21 July 2021, 14:48 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:
> > 
> >Hi! Thanks for the patch!
> >Please consider my comments below.
>  
> <snipped>
>  
> >--
> >Best regards,
> >Sergey Kaplun
>  

[1]: https://github.com/tarantool/tarantool/issues/4819

-- 
Best regards,
Sergey Kaplun

