* [Tarantool-patches] [PATCH luajit v2 0/2] memprof: group allocations on traces by trace number
@ 2021-07-28 13:42 Mikhail Shishatskiy via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
0 siblings, 2 replies; 5+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-07-28 13:42 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
New patch series with Sergey's comments in mind.
Changes in v2:
- Fixed comments as per review by Sergey Kaplun;
- Trace dump supplemented with info about trace start pc,
which is stored in the symtab;
- Added test, checking if allocations in trace reported;
- Ran an allocation-heavy benchmark on new patches to test the
behavior.
The benchmark payload code can be found at [1]. It was summoned from
under the wrapper code [2]. The average time of 15 runs in seconds
is presented below. The second field is the standard deviation for
the found difference:
| -------------------------------------- |
| JIT-off, memprof-on |
| -------------------------------------- |
| BEFORE | AFTER |
| ------------------ | ----------------- |
| 9.6155 (0.0656) | +0.2951 (0.0742) |
| -------------------------------------- |
| JIT-on, memprof-on |
| -------------------------------------- |
| BEFORE | AFTER |
| ------------------ | ----------------- |
| 7.0037 (0.2044) | +0.2842 (0.0662) |
| -------------------------------------- |
[1]: https://github.com/LuaJIT/LuaJIT-test-cleanup/blob/master/bench/binary-trees.lua
[2]: https://gist.github.com/Shishqa/d8e29f501d72647878b8993a206cea19
Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
Mikhail Shishatskiy (2):
memprof: dump traceno if allocate from trace
memprof: extend symtab with info about traces
src/lj_memprof.c | 33 ++++-
src/lj_memprof.h | 20 ++-
.../misclib-memprof-lapi.test.lua | 121 +++++++++++++-----
tools/memprof/parse.lua | 22 ++--
tools/utils/symtab.lua | 41 +++++-
5 files changed, 184 insertions(+), 53 deletions(-)
--
2.32.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace
2021-07-28 13:42 [Tarantool-patches] [PATCH luajit v2 0/2] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
@ 2021-07-28 13:42 ` Mikhail Shishatskiy via Tarantool-patches
2021-08-14 8:00 ` Sergey Kaplun via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
1 sibling, 1 reply; 5+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-07-28 13:42 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
When LuaJIT executes a trace, the trace number is stored in
the virtual machine state. So, we can treat this number as
an allocation event source in memprof and report allocation events
from traces as well.
Previously, all the allocations from traces were marked as INTERNAL.
This patch introduces the functionality described above by adding
a new allocation source type named ASOURCE_TRACE. If at the moment
when allocation event occurs VM state indicates that trace executed,
trace number streamed to a binary file:
| loc-trace := trace-no
| trace-no := <ULEB128>
Also, the memory profiler parser was adjusted to recognize this
source type by extending <loc> structure: field <traceno>,
representing trace number, was added.
Part of tarantool/tarantool#5814
---
Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
src/lj_memprof.c | 14 ++++++++++++--
src/lj_memprof.h | 13 ++++++++-----
tools/memprof/parse.lua | 22 ++++++++++++++--------
3 files changed, 34 insertions(+), 15 deletions(-)
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..0c7e9e89 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -146,6 +146,14 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
lua_assert(0);
}
+static void memprof_write_trace(struct memprof *mp, uint8_t aevent)
+{
+ struct lj_wbuf *out = &mp->out;
+ const global_State *g = mp->g;
+ lj_wbuf_addbyte(out, aevent | ASOURCE_TRACE);
+ lj_wbuf_addu64(out, (uint64_t)g->vmstate);
+}
+
static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
{
lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
@@ -168,9 +176,11 @@ static const memprof_writer memprof_writers[] = {
** But since traces must follow the semantics of the original code,
** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
** which makes using memprof with enabled JIT virtually redundant.
- ** Hence use the stub below.
+ ** But if one wants to investigate allocations with JIT enabled,
+ ** memprof_write_trace() dumps trace number to the binary output.
+ ** It can be useful to compare with with jit.v or jit.dump outputs.
*/
- memprof_write_hvmstate /* LJ_VMST_TRACE */
+ memprof_write_trace /* LJ_VMST_TRACE */
};
static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..e3f55433 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -53,7 +53,7 @@
#define SYMTAB_LFUNC ((uint8_t)0)
#define SYMTAB_FINAL ((uint8_t)0x80)
-#define LJM_CURRENT_FORMAT_VERSION 0x01
+#define LJM_CURRENT_FORMAT_VERSION 0x02
/*
** Event stream format:
@@ -69,11 +69,13 @@
** event-realloc := event-header loc? oaddr osize naddr nsize
** event-free := event-header loc? oaddr osize
** event-header := <BYTE>
-** loc := loc-lua | loc-c
+** loc := loc-lua | loc-c | loc-trace
** loc-lua := sym-addr line-no
** loc-c := sym-addr
+** loc-trace := trace-no
** sym-addr := <ULEB128>
** line-no := <ULEB128>
+** trace-no := <ULEB128>
** oaddr := <ULEB128>
** naddr := <ULEB128>
** osize := <ULEB128>
@@ -88,10 +90,10 @@
** version: [VVVVVVVV]
** * VVVVVVVV: Byte interpreted as a plain integer version number
**
-** event-header: [FUUUSSEE]
+** event-header: [FUUSSSEE]
** * EE : 2 bits for representing allocation event type (AEVENT_*)
-** * SS : 2 bits for representing allocation source type (ASOURCE_*)
-** * UUU : 3 unused bits
+** * SSS : 3 bits for representing allocation source type (ASOURCE_*)
+** * UU : 2 unused bits
** * F : 0 for regular events, 1 for epilogue's *F*inal header
** (if F is set to 1, all other bits are currently ignored)
*/
@@ -105,6 +107,7 @@
#define ASOURCE_INT ((uint8_t)(1 << 2))
#define ASOURCE_LFUNC ((uint8_t)(2 << 2))
#define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+#define ASOURCE_TRACE ((uint8_t)(4 << 2))
#define LJM_EPILOGUE_HEADER 0x80
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 12e2758f..2d3c71f0 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -11,7 +11,7 @@ local lshift = bit.lshift
local string_format = string.format
local LJM_MAGIC = "ljm"
-local LJM_CURRENT_VERSION = 1
+local LJM_CURRENT_VERSION = 0x02
local LJM_EPILOGUE_HEADER = 0x80
@@ -24,8 +24,11 @@ local AEVENT_MASK = 0x3
local ASOURCE_INT = lshift(1, 2)
local ASOURCE_LFUNC = lshift(2, 2)
local ASOURCE_CFUNC = lshift(3, 2)
+local ASOURCE_TRACE = lshift(4, 2)
-local ASOURCE_MASK = lshift(0x3, 2)
+local ASOURCE_MASK = lshift(0x7, 2)
+
+local EV_HEADER_MAX = ASOURCE_TRACE + AEVENT_REALLOC
local M = {}
@@ -59,20 +62,23 @@ local function link_to_previous(heap_chunk, e, nsize)
end
end
-local function id_location(addr, line)
- return string_format("f%#xl%d", addr, line), {
+local function id_location(addr, line, traceno)
+ return string_format("f%#xl%dt%d", addr, line, traceno), {
addr = addr,
line = line,
+ traceno = traceno,
}
end
local function parse_location(reader, asource)
if asource == ASOURCE_INT then
- return id_location(0, 0)
+ return id_location(0, 0, 0)
elseif asource == ASOURCE_CFUNC then
- return id_location(reader:read_uleb128(), 0)
+ return id_location(reader:read_uleb128(), 0, 0)
elseif asource == ASOURCE_LFUNC then
- return id_location(reader:read_uleb128(), reader:read_uleb128())
+ return id_location(reader:read_uleb128(), reader:read_uleb128(), 0)
+ elseif asource == ASOURCE_TRACE then
+ return id_location(0, 0, reader:read_uleb128())
end
error("Unknown asource "..asource)
end
@@ -140,7 +146,7 @@ local parsers = {
}
local function ev_header_is_valid(evh)
- return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+ return evh <= EV_HEADER_MAX or evh == LJM_EPILOGUE_HEADER
end
-- Splits event header into event type (aka aevent = allocation
--
2.32.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces
2021-07-28 13:42 [Tarantool-patches] [PATCH luajit v2 0/2] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
@ 2021-07-28 13:42 ` Mikhail Shishatskiy via Tarantool-patches
2021-08-14 11:03 ` Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 5+ messages in thread
From: Mikhail Shishatskiy via Tarantool-patches @ 2021-07-28 13:42 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
This patch extends the memprof symtab by adding information
about traces, which are present at the start of profiling.
The symtab format changed with adding <sym-trace> entry,
which consists of a trace number, address of function proto,
and line, where trace recording started.
Also, the memprof parser was adjusted to recognize new
symtab entries and associate them with allocation events
from traces.
This patch also provides tests to test the behavior of
the memory profiler with JIT turned on.
Part of tarantool/tarantool#5814
---
Issue: https://github.com/tarantool/tarantool/issues/5814
Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
src/lj_memprof.c | 19 +++
src/lj_memprof.h | 7 +-
.../misclib-memprof-lapi.test.lua | 121 +++++++++++++-----
tools/utils/symtab.lua | 41 +++++-
4 files changed, 150 insertions(+), 38 deletions(-)
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 0c7e9e89..87512c3a 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -18,6 +18,7 @@
#include "lj_obj.h"
#include "lj_frame.h"
#include "lj_debug.h"
+#include "lj_trace.h"
/* --------------------------------- Symtab --------------------------------- */
@@ -43,6 +44,24 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
lj_wbuf_addu64(out, (uint64_t)pt->firstline);
break;
}
+ case (~LJ_TTRACE): {
+ GCtrace *trace = gco2trace(o);
+ GCproto *pt = &gcref(trace->startpt)->pt;
+
+ const BCIns *startpc = mref(trace->startpc, const BCIns);
+ lua_assert(startpc >= proto_bc(pt) &&
+ startpc < proto_bc(pt) + pt->sizebc);
+
+ BCLine lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
+ lua_assert(lineno >= 0);
+
+ lj_wbuf_addbyte(out, SYMTAB_TRACE);
+ lj_wbuf_addu64(out, (uint64_t)trace->traceno);
+ lj_wbuf_addu64(out, (uintptr_t)pt);
+ lj_wbuf_addu64(out, (uint64_t)lineno);
+
+ break;
+ }
default:
break;
}
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index e3f55433..9e950b94 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
#include "lj_def.h"
#include "lj_wbuf.h"
-#define LJS_CURRENT_VERSION 0x1
+#define LJS_CURRENT_VERSION 0x2
/*
** symtab format:
@@ -25,13 +25,15 @@
** prologue := 'l' 'j' 's' version reserved
** version := <BYTE>
** reserved := <BYTE> <BYTE> <BYTE>
-** sym := sym-lua | sym-final
+** sym := sym-lua | sym-trace | sym-final
** sym-lua := sym-header sym-addr sym-chunk sym-line
+** sym-trace := sym-header trace-no sym-addr sym-line
** sym-header := <BYTE>
** sym-addr := <ULEB128>
** sym-chunk := string
** sym-line := <ULEB128>
** sym-final := sym-header
+** trace-no := <ULEB128>
** string := string-len string-payload
** string-len := <ULEB128>
** string-payload := <BYTE> {string-len}
@@ -51,6 +53,7 @@
*/
#define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_TRACE ((uint8_t)1)
#define SYMTAB_FINAL ((uint8_t)0x80)
#define LJM_CURRENT_FORMAT_VERSION 0x02
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 06d96b3b..b7e456e1 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -1,13 +1,18 @@
+local utils = require "utils"
+
-- Memprof is implemented for x86 and x64 architectures only.
-require("utils").skipcond(
+utils.skipcond(
jit.arch ~= "x86" and jit.arch ~= "x64",
jit.arch.." architecture is NIY for memprof"
)
+-- Disabled on *BSD due to #4819.
+utils.skipcond(jit.os == 'BSD', 'Disabled due to #4819')
+
local tap = require("tap")
local test = tap.test("misc-memprof-lapi")
-test:plan(13)
+test:plan(16)
jit.off()
jit.flush()
@@ -22,7 +27,7 @@ local symtab = require "utils.symtab"
local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
local BAD_PATH = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%1/memprofdata.tmp.bin")
-local function payload()
+local function default_payload()
-- Preallocate table to avoid table array part reallocations.
local _ = table_new(100, 0)
@@ -37,7 +42,7 @@ local function payload()
collectgarbage()
end
-local function generate_output(filename)
+local function generate_output(filename, payload)
-- Clean up all garbage to avoid pollution of free.
collectgarbage()
@@ -52,19 +57,48 @@ local function generate_output(filename)
assert(res, err)
end
+local function generate_parsed_output(payload)
+ local res, err = pcall(generate_output, TMP_BINFILE, payload)
+
+ -- Want to cleanup carefully if something went wrong.
+ if not res then
+ os.remove(TMP_BINFILE)
+ error(err)
+ end
+
+ local reader = bufread.new(TMP_BINFILE)
+ local symbols = symtab.parse(reader)
+ local events = memprof.parse(reader)
+
+ -- We don't need it any more.
+ os.remove(TMP_BINFILE)
+
+ return symbols, events
+end
+
local function fill_ev_type(events, symbols, event_type)
local ev_type = {}
for _, event in pairs(events[event_type]) do
local addr = event.loc.addr
- if addr == 0 then
+ local traceno = event.loc.traceno
+ if traceno ~= 0 then
+ local trace_loc = symbols.trace[traceno]
+ addr = trace_loc.addr
+ ev_type[trace_loc.line] = {
+ name = string.format("TRACE [%d] %s:%d",
+ traceno, symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
+ ),
+ num = event.num,
+ }
+ elseif addr == 0 then
ev_type.INTERNAL = {
name = "INTERNAL",
num = event.num,
}
- elseif symbols[addr] then
+ elseif symbols.lfunc[addr] then
ev_type[event.loc.line] = {
name = string.format(
- "%s:%d", symbols[addr].source, symbols[addr].linedefined
+ "%s:%d", symbols.lfunc[addr].source, symbols.lfunc[addr].linedefined
),
num = event.num,
}
@@ -77,12 +111,36 @@ local function form_source_line(line)
return string.format("@%s:%d", arg[0], line)
end
-local function check_alloc_report(alloc, line, function_line, nevents)
- assert(form_source_line(function_line) == alloc[line].name)
- assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
- alloc[line].num,
+local function nevents_eq(event, nevents)
+ assert(event.num == nevents, ("got=%d, expected=%d"):format(
+ event.num,
+ nevents
+ ))
+end
+
+local function nevents_gr(event, nevents)
+ assert(event.num > nevents, ("got=%d, expected>%d"):format(
+ event.num,
nevents
))
+end
+
+local function check_alloc_report(alloc, traceno, line, function_line,
+ nevents, cmp_events)
+ assert(alloc[line], ("no event on line %d"):format(line))
+ local event = alloc[line]
+ local expected_name
+ if traceno ~= 0 then
+ expected_name = string.format("TRACE [%d] ", traceno)..
+ form_source_line(function_line)
+ else
+ expected_name = form_source_line(function_line)
+ end
+ assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
+ event.name,
+ expected_name
+ ))
+ cmp_events(event, nevents)
return true
end
@@ -107,20 +165,7 @@ test:ok(res == nil and err:match("profiler is not running"))
test:ok(type(errno) == "number")
-- Test profiler output and parse.
-res, err = pcall(generate_output, TMP_BINFILE)
-
--- Want to cleanup carefully if something went wrong.
-if not res then
- os.remove(TMP_BINFILE)
- error(err)
-end
-
-local reader = bufread.new(TMP_BINFILE)
-local symbols = symtab.parse(reader)
-local events = memprof.parse(reader, symbols)
-
--- We don't need it any more.
-os.remove(TMP_BINFILE)
+local symbols, events = generate_parsed_output(default_payload)
local alloc = fill_ev_type(events, symbols, "alloc")
local free = fill_ev_type(events, symbols, "free")
@@ -131,9 +176,9 @@ local free = fill_ev_type(events, symbols, "free")
-- the number of allocations.
-- 1 event - alocation of table by itself + 1 allocation
-- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
-test:ok(check_alloc_report(alloc, 27, 25, 2))
+test:ok(check_alloc_report(alloc, 0, 32, 30, 2, nevents_eq))
-- 100 strings allocations.
-test:ok(check_alloc_report(alloc, 32, 25, 100))
+test:ok(check_alloc_report(alloc, 0, 37, 30, 100, nevents_eq))
-- Collect all previous allocated objects.
test:ok(free.INTERNAL.num == 102)
@@ -141,8 +186,8 @@ test:ok(free.INTERNAL.num == 102)
-- Tests for leak-only option.
-- See also https://github.com/tarantool/tarantool/issues/5812.
local heap_delta = process.form_heap_delta(events, symbols)
-local tab_alloc_stats = heap_delta[form_source_line(27)]
-local str_alloc_stats = heap_delta[form_source_line(32)]
+local tab_alloc_stats = heap_delta[form_source_line(32)]
+local str_alloc_stats = heap_delta[form_source_line(37)]
test:ok(tab_alloc_stats.nalloc == tab_alloc_stats.nfree)
test:ok(tab_alloc_stats.dbytes == 0)
test:ok(str_alloc_stats.nalloc == str_alloc_stats.nfree)
@@ -166,5 +211,23 @@ local co = coroutine.create(f)
coroutine.resume(co)
misc.memprof.stop()
+-- Test profiler with enabled JIT.
jit.on()
+
+-- Pregenerate traces to get expected output:
+default_payload()
+
+symbols, events = generate_parsed_output(default_payload)
+
+alloc = fill_ev_type(events, symbols, "alloc")
+
+-- We expect, that loop will be compiled into a trace.
+-- Depends on system, there will be 99 or 100 allocations
+-- from trace, so expect greater than 98.
+test:ok(check_alloc_report(alloc, 1, 35, 30, 98, nevents_gr))
+-- See same checks with jit.off().
+test:ok(check_alloc_report(alloc, 0, 32, 30, 2, nevents_eq))
+-- Collect all previous allocated objects.
+test:ok(free.INTERNAL.num == 102)
+
os.exit(test:check() and 0 or 1)
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..8151ecf8 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -10,11 +10,12 @@ local band = bit.band
local string_format = string.format
local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 1
+local LJS_CURRENT_VERSION = 0x2
local LJS_EPILOGUE_HEADER = 0x80
local LJS_SYMTYPE_MASK = 0x03
local SYMTAB_LFUNC = 0
+local SYMTAB_TRACE = 1
local M = {}
@@ -24,18 +25,33 @@ local function parse_sym_lfunc(reader, symtab)
local sym_chunk = reader:read_string()
local sym_line = reader:read_uleb128()
- symtab[sym_addr] = {
+ symtab.lfunc[sym_addr] = {
source = sym_chunk,
linedefined = sym_line,
}
end
+local function parse_sym_trace(reader, symtab)
+ local traceno = reader:read_uleb128()
+ local sym_addr = reader:read_uleb128()
+ local sym_line = reader:read_uleb128()
+
+ symtab.trace[traceno] = {
+ addr = sym_addr,
+ line = sym_line,
+ }
+end
+
local parsers = {
[SYMTAB_LFUNC] = parse_sym_lfunc,
+ [SYMTAB_TRACE] = parse_sym_trace,
}
function M.parse(reader)
- local symtab = {}
+ local symtab = {
+ lfunc = {},
+ trace = {},
+ }
local magic = reader:read_octets(3)
local version = reader:read_octets(1)
@@ -73,18 +89,29 @@ function M.parse(reader)
return symtab
end
-function M.demangle(symtab, loc)
+local function demangle_lfunc(symtab, loc)
local addr = loc.addr
if addr == 0 then
return "INTERNAL"
+ elseif symtab.lfunc[addr] then
+ return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
end
+ return string_format("CFUNC %#x", addr)
+end
- if symtab[addr] then
- return string_format("%s:%d", symtab[addr].source, loc.line)
+function M.demangle(symtab, loc)
+ local traceno = loc.traceno
+
+ if traceno ~= 0 then
+ if symtab.trace[traceno] then
+ local sym = demangle_lfunc(symtab, symtab.trace[traceno])
+ return string_format("TRACE [%d] ", traceno)..sym
+ end
+ return string_format("TRACE [%d]", traceno)
end
- return string_format("CFUNC %#x", addr)
+ return demangle_lfunc(symtab, loc)
end
return M
--
2.32.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-14 8:00 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-14 8:00 UTC (permalink / raw)
To: Mikhail Shishatskiy; +Cc: tarantool-patches
Hi!
Thanks for the fixes!
LGTM, except a few nits regarding the commit message and comment in
the code.
On 28.07.21, Mikhail Shishatskiy wrote:
> When LuaJIT executes a trace, the trace number is stored in
> the virtual machine state. So, we can treat this number as
> an allocation event source in memprof and report allocation events
> from traces as well.
>
> Previously, all the allocations from traces were marked as INTERNAL.
>
> This patch introduces the functionality described above by adding
> a new allocation source type named ASOURCE_TRACE. If at the moment
> when allocation event occurs VM state indicates that trace executed,
> trace number streamed to a binary file:
>
> | loc-trace := trace-no
> | trace-no := <ULEB128>
>
> Also, the memory profiler parser was adjusted to recognize this
Typo: s/was/is/
> source type by extending <loc> structure: field <traceno>,
> representing trace number, was added.
Typo: s/trace number/a trace number/
Typo: s/was/is/
>
> Part of tarantool/tarantool#5814
> ---
>
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>
> src/lj_memprof.c | 14 ++++++++++++--
> src/lj_memprof.h | 13 ++++++++-----
> tools/memprof/parse.lua | 22 ++++++++++++++--------
> 3 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 2c1ef3b8..0c7e9e89 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
<snipped>
> @@ -168,9 +176,11 @@ static const memprof_writer memprof_writers[] = {
> ** But since traces must follow the semantics of the original code,
> ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
> ** which makes using memprof with enabled JIT virtually redundant.
> - ** Hence use the stub below.
> + ** But if one wants to investigate allocations with JIT enabled,
> + ** memprof_write_trace() dumps trace number to the binary output.
Typo: trailing whitespace here ---------------------------------------^
> + ** It can be useful to compare with with jit.v or jit.dump outputs.
> */
> - memprof_write_hvmstate /* LJ_VMST_TRACE */
> + memprof_write_trace /* LJ_VMST_TRACE */
> };
>
> static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index 3417475d..e3f55433 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
<snipped>
> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> index 12e2758f..2d3c71f0 100644
> --- a/tools/memprof/parse.lua
> +++ b/tools/memprof/parse.lua
<snipped>
> --
> 2.32.0
>
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
@ 2021-08-14 11:03 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-14 11:03 UTC (permalink / raw)
To: Mikhail Shishatskiy; +Cc: tarantool-patches
Hi, Mikhail!
Thanks for the patch!
Please consider my comments below.
On 28.07.21, Mikhail Shishatskiy wrote:
> This patch extends the memprof symtab by adding information
> about traces, which are present at the start of profiling.
>
> The symtab format changed with adding <sym-trace> entry,
> which consists of a trace number, address of function proto,
> and line, where trace recording started.
>
> Also, the memprof parser was adjusted to recognize new
> symtab entries and associate them with allocation events
> from traces.
>
> This patch also provides tests to test the behavior of
> the memory profiler with JIT turned on.
Nit: This part is obvious and can be dropped, I suppose.
Feel free to ignore.
Also, please mention that API of <utils/symtab.lua> is changed (I'm
about trace\lfunc entries).
>
> Part of tarantool/tarantool#5814
Should it be Resolves here?
> ---
>
> Issue: https://github.com/tarantool/tarantool/issues/5814
> Branch: https://github.com/tarantool/luajit/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
> CI: https://github.com/tarantool/tarantool/tree/shishqa/gh-5814-group-allocations-on-trace-by-trace-number
>
> src/lj_memprof.c | 19 +++
> src/lj_memprof.h | 7 +-
> .../misclib-memprof-lapi.test.lua | 121 +++++++++++++-----
> tools/utils/symtab.lua | 41 +++++-
> 4 files changed, 150 insertions(+), 38 deletions(-)
>
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 0c7e9e89..87512c3a 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -18,6 +18,7 @@
> #include "lj_obj.h"
> #include "lj_frame.h"
> #include "lj_debug.h"
> +#include "lj_trace.h"
I suppose you need "lj_jit.h" here, not "lj_trace.h".
Also I suppose, that this should be enabled only, if LuaJIT compiled
with enabled JIT.
This regarding to the previous patch too -- we should add lua_assert(0);
if LuaJIT compiled without JIT to all JIT-related functions (maybe in the
separate commit).
Also, please update Makefile.dep.original correspondingly with added
header.
>
> /* --------------------------------- Symtab --------------------------------- */
>
> @@ -43,6 +44,24 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
> lj_wbuf_addu64(out, (uint64_t)pt->firstline);
> break;
> }
> + case (~LJ_TTRACE): {
> + GCtrace *trace = gco2trace(o);
Nit: We don't want to change trace, so we can add `const` here
> + GCproto *pt = &gcref(trace->startpt)->pt;
Ditto.
> +
^^^^^
Typo: Trailing whitespaces
> + const BCIns *startpc = mref(trace->startpc, const BCIns);
> + lua_assert(startpc >= proto_bc(pt) &&
Typo: Trailing whitespace --------------------^
> + startpc < proto_bc(pt) + pt->sizebc);
> +
> + BCLine lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
Nit: Please declare all variables at the beggining of the statement.
Else you can observe the following warnings:
| /home/burii/reviews/luajit/memprof-tracealloc/src/lj_memprof.c: In function 'dump_symtab':
| /home/burii/reviews/luajit/memprof-tracealloc/src/lj_memprof.c:55:7: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
| 55 | BCLine lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
| | ^~~~~~
| /home/burii/reviews/luajit/memprof-tracealloc/src/lj_memprof.c: In function 'dump_symtab':
| /home/burii/reviews/luajit/memprof-tracealloc/src/lj_memprof.c:55:7: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
| 55 | BCLine lineno = lj_debug_line(pt, proto_bcpos(pt, startpc));
if configure build like this:
| cmake -DCMAKE_C_FLAGS="-Wextra -Wdeclaration-after-statement -Wredundant-decls -Wshadow -Wpointer-arith"\
| -DCMAKE_BUILD_TYPE=Debug -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON . && make -j
> + lua_assert(lineno >= 0);
> +
> + lj_wbuf_addbyte(out, SYMTAB_TRACE);
> + lj_wbuf_addu64(out, (uint64_t)trace->traceno);
> + lj_wbuf_addu64(out, (uintptr_t)pt);
Nit: Please, add comment that all existing prototypes have already been
dumped and we have no need to repeat their dump for trace locations.
> + lj_wbuf_addu64(out, (uint64_t)lineno);
> +
> + break;
^
Typo: Trailling whitespace
Side note: Please, tune your favorite editor to highlight trailing
whitespaces.
> + }
> default:
> break;
> }
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index e3f55433..9e950b94 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
<snipped>
> diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> index 06d96b3b..b7e456e1 100644
> --- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
> +++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
> @@ -1,13 +1,18 @@
> +local utils = require "utils"
> +
> -- Memprof is implemented for x86 and x64 architectures only.
> -require("utils").skipcond(
> +utils.skipcond(
> jit.arch ~= "x86" and jit.arch ~= "x64",
> jit.arch.." architecture is NIY for memprof"
> )
>
> +-- Disabled on *BSD due to #4819.
> +utils.skipcond(jit.os == 'BSD', 'Disabled due to #4819')
> +
We should skip only JIT-related tests here, but not all memprof tests.
> local tap = require("tap")
>
<snipped>
> -local function check_alloc_report(alloc, line, function_line, nevents)
> - assert(form_source_line(function_line) == alloc[line].name)
> - assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
> - alloc[line].num,
> +local function nevents_eq(event, nevents)
> + assert(event.num == nevents, ("got=%d, expected=%d"):format(
> + event.num,
> + nevents
> + ))
> +end
> +
> +local function nevents_gr(event, nevents)
> + assert(event.num > nevents, ("got=%d, expected>%d"):format(
> + event.num,
> nevents
> ))
> +end
> +
> +local function check_alloc_report(alloc, traceno, line, function_line,
Nit: I suggest to add new fields (traceno, for example) not to the middle
of arguments list, but to the end.
Feel free to ignore.
> + nevents, cmp_events)
> + assert(alloc[line], ("no event on line %d"):format(line))
> + local event = alloc[line]
> + local expected_name
> + if traceno ~= 0 then
> + expected_name = string.format("TRACE [%d] ", traceno)..
> + form_source_line(function_line)
> + else
> + expected_name = form_source_line(function_line)
> + end
> + assert(expected_name == event.name, ("got='%s', expected='%s'"):format(
> + event.name,
> + expected_name
> + ))
> + cmp_events(event, nevents)
> return true
> end
>
> @@ -107,20 +165,7 @@ test:ok(res == nil and err:match("profiler is not running"))
<snipped>
> @@ -166,5 +211,23 @@ local co = coroutine.create(f)
> coroutine.resume(co)
> misc.memprof.stop()
>
> +-- Test profiler with enabled JIT.
> jit.on()
> +
> +-- Pregenerate traces to get expected output:
Nit: "to fill symtab entry" is more verbose to me.
Feel free to ignore.
> +default_payload()
> +
> +symbols, events = generate_parsed_output(default_payload)
> +
> +alloc = fill_ev_type(events, symbols, "alloc")
> +
> +-- We expect, that loop will be compiled into a trace.
> +-- Depends on system, there will be 99 or 100 allocations
Why? Provide an example, please.
Maybe we should to change our payload to lock in the behaviour.
> +-- from trace, so expect greater than 98.
> +test:ok(check_alloc_report(alloc, 1, 35, 30, 98, nevents_gr))
> +-- See same checks with jit.off().
> +test:ok(check_alloc_report(alloc, 0, 32, 30, 2, nevents_eq))
> +-- Collect all previous allocated objects.
> +test:ok(free.INTERNAL.num == 102)
> +
> os.exit(test:check() and 0 or 1)
> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> index 3ed1dd13..8151ecf8 100644
> --- a/tools/utils/symtab.lua
> +++ b/tools/utils/symtab.lua
> @@ -10,11 +10,12 @@ local band = bit.band
<snipped>
> +local function parse_sym_trace(reader, symtab)
> + local traceno = reader:read_uleb128()
> + local sym_addr = reader:read_uleb128()
> + local sym_line = reader:read_uleb128()
> +
> + symtab.trace[traceno] = {
Assume the following:
1) We get output from 100 traces.
2) Make jit.flush().
3) We get output from other 100 traces.
So all output (from new and old traces) will be reported as output for
old traces. So we need at least report trace address as well (yes,
collisions may happen).
Also, we should report addresses of those functions, to avoid collisions
after several jit.flush() calls. It should be removed after symtab
enriching [1] will be implemented.
Thoughts?
> + addr = sym_addr,
> + line = sym_line,
> + }
> +end
> +
> local parsers = {
> [SYMTAB_LFUNC] = parse_sym_lfunc,
> + [SYMTAB_TRACE] = parse_sym_trace,
> }
>
> function M.parse(reader)
> - local symtab = {}
> + local symtab = {
> + lfunc = {},
> + trace = {},
> + }
> local magic = reader:read_octets(3)
> local version = reader:read_octets(1)
>
> @@ -73,18 +89,29 @@ function M.parse(reader)
> return symtab
> end
>
> -function M.demangle(symtab, loc)
> +local function demangle_lfunc(symtab, loc)
> local addr = loc.addr
>
> if addr == 0 then
> return "INTERNAL"
> + elseif symtab.lfunc[addr] then
> + return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
> end
> + return string_format("CFUNC %#x", addr)
> +end
>
> - if symtab[addr] then
> - return string_format("%s:%d", symtab[addr].source, loc.line)
> +function M.demangle(symtab, loc)
> + local traceno = loc.traceno
> +
> + if traceno ~= 0 then
Nit: we can move this logic (trace related) to the separate function.
> + if symtab.trace[traceno] then
> + local sym = demangle_lfunc(symtab, symtab.trace[traceno])
> + return string_format("TRACE [%d] ", traceno)..sym
Nit: May be it is more userfriendly to add "started at" to avoid
misunderstanding that all allocation of this trace are on `sym` line.
OTOH, idiomatically this should be done not by demangler, but by
humanizer module.
Thoughts?
Nit: Also, IINM there is no way, when sym is something except the value
returned from elseif branch, is it? I suppose, that we can move this
string_format to the separate function (`format_lua_sorce_line()`) and
call it from here and the elseif branch.
> + end
> + return string_format("TRACE [%d]", traceno)
> end
>
> - return string_format("CFUNC %#x", addr)
> + return demangle_lfunc(symtab, loc)
> end
>
> return M
> --
> 2.32.0
>
[1]: https://github.com/tarantool/tarantool/issues/5815
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-08-14 11:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 13:42 [Tarantool-patches] [PATCH luajit v2 0/2] memprof: group allocations on traces by trace number Mikhail Shishatskiy via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 1/2] memprof: dump traceno if allocate from trace Mikhail Shishatskiy via Tarantool-patches
2021-08-14 8:00 ` Sergey Kaplun via Tarantool-patches
2021-07-28 13:42 ` [Tarantool-patches] [PATCH luajit v2 2/2] memprof: extend symtab with info about traces Mikhail Shishatskiy via Tarantool-patches
2021-08-14 11:03 ` Sergey Kaplun via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox