* [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving
@ 2022-04-06 13:16 Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches
0 siblings, 2 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
Changes in v7:
- Fixed comments as per review by Sergey
- Added tests for different types of hash sections.
Maxim Kokryashkin (2):
memprof: extend symtab with C-symbols
memprof: enrich symtab with newly loaded symbols
Makefile.original | 2 +-
src/lj_memprof.c | 405 +++++++++++++++++-
src/lj_memprof.h | 17 +-
test/tarantool-tests/CMakeLists.txt | 4 +
.../gh-5813-resolving-of-c-symbols.test.lua | 91 ++++
.../both/CMakeLists.txt | 4 +
.../both/resboth.c | 17 +
.../gnuhash/CMakeLists.txt | 4 +
.../gnuhash/resgnuhash.c | 17 +
.../hash/CMakeLists.txt | 4 +
.../hash/reshash.c | 17 +
.../stripped/CMakeLists.txt | 4 +
.../stripped/resstripped.c | 17 +
test/tarantool-tests/tools-utils-avl.test.lua | 52 +++
tools/CMakeLists.txt | 2 +
tools/memprof.lua | 5 +
tools/memprof/parse.lua | 15 +
tools/utils/avl.lua | 113 +++++
tools/utils/symtab.lua | 29 +-
19 files changed, 805 insertions(+), 14 deletions(-)
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
create mode 100644 test/tarantool-tests/tools-utils-avl.test.lua
create mode 100644 tools/utils/avl.lua
--
2.35.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols
2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
@ 2022-04-06 13:16 ` Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches
1 sibling, 0 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
This commit enriches memprof's symbol table with information
about C-symbols. The parser is updated correspondingly.
If there is .symtab section or at least .dynsym segment in
a shared library, then the following data is stored in symtab
for each symbol:
| SYMTAB_CFUNC | symbol address | symbol name |
1 byte 8 bytes
magic
number
If none of those are present, then instead of a symbol name
and its address there will be name and address of a
shared library containing that symbol.
Part of tarantool/tarantool#5813
---
>> +#include "lj_gc.h"
>
> Nit: This include looks excess.
No, it's not. It is needed for the `lj_mem_new`.
> We can pass struct memprof instead custom one -- it already has global state to
> determine lua_State and struct lj_wbuf.
> | lua_State *L = gco2th(gcref(mp->g->mem_L));
Well, we can, but I don't think that it's good in terms of incapsulation.
Makefile.original | 2 +-
src/lj_memprof.c | 323 ++++++++++++++++++
src/lj_memprof.h | 9 +-
test/tarantool-tests/CMakeLists.txt | 4 +
.../gh-5813-resolving-of-c-symbols.test.lua | 87 +++++
.../both/CMakeLists.txt | 4 +
.../both/resboth.c | 17 +
.../gnuhash/CMakeLists.txt | 4 +
.../gnuhash/resgnuhash.c | 17 +
.../hash/CMakeLists.txt | 4 +
.../hash/reshash.c | 17 +
.../stripped/CMakeLists.txt | 4 +
.../stripped/resstripped.c | 17 +
test/tarantool-tests/tools-utils-avl.test.lua | 52 +++
tools/CMakeLists.txt | 2 +
tools/utils/avl.lua | 113 ++++++
tools/utils/symtab.lua | 25 +-
17 files changed, 695 insertions(+), 6 deletions(-)
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
create mode 100644 test/tarantool-tests/tools-utils-avl.test.lua
create mode 100644 tools/utils/avl.lua
diff --git a/Makefile.original b/Makefile.original
index 33dc2ed5..0c92df9e 100644
--- a/Makefile.original
+++ b/Makefile.original
@@ -100,7 +100,7 @@ FILES_JITLIB= bc.lua bcsave.lua dump.lua p.lua v.lua zone.lua \
dis_x86.lua dis_x64.lua dis_arm.lua dis_arm64.lua \
dis_arm64be.lua dis_ppc.lua dis_mips.lua dis_mipsel.lua \
dis_mips64.lua dis_mips64el.lua vmdef.lua
-FILES_UTILSLIB= bufread.lua symtab.lua
+FILES_UTILSLIB= avl.lua bufread.lua symtab.lua
FILES_MEMPROFLIB= parse.lua humanize.lua
FILES_TOOLSLIB= memprof.lua
FILE_TMEMPROF= luajit-parse-memprof
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2d779983..74f9b810 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -8,16 +8,25 @@
#define lj_memprof_c
#define LUA_CORE
+#define _GNU_SOURCE
+
#include <errno.h>
+#include <stdio.h>
#include "lj_arch.h"
#include "lj_memprof.h"
+#if LUAJIT_OS != LUAJIT_OS_OSX
+#include <elf.h>
+#include <link.h>
+#include <sys/auxv.h>
+#endif
#if LJ_HASMEMPROF
#include "lj_obj.h"
#include "lj_frame.h"
#include "lj_debug.h"
+#include "lj_gc.h"
#if LJ_HASJIT
#include "lj_dispatch.h"
@@ -66,12 +75,322 @@ static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
#endif
+#if LUAJIT_OS != LUAJIT_OS_OSX
+
+struct ghashtab_header {
+ uint32_t nbuckets;
+ uint32_t symoffset;
+ uint32_t bloom_size;
+ uint32_t bloom_shift;
+};
+
+static uint32_t ghashtab_size(ElfW(Addr) ghashtab)
+{
+ /*
+ ** There is no easy way to get count of symbols in GNU hashtable, so the
+ ** only way to do this is to take highest possible non-empty bucket and
+ ** iterate through its symbols until the last chain is over.
+ */
+ uint32_t last_entry = 0;
+
+ const uint32_t *chain = NULL;
+ struct ghashtab_header *header = (struct ghashtab_header *)ghashtab;
+ /*
+ ** sizeof(size_t) returns 8, if compiled with 64-bit compiler, and 4 if
+ ** compiled with 32-bit compiler. It is the best option to determine which
+ ** kind of CPU we are running on.
+ */
+ const char *buckets = (char *)ghashtab + sizeof(struct ghashtab_header) +
+ sizeof(size_t) * header->bloom_size;
+
+ uint32_t *cur_bucket = (uint32_t *)buckets;
+ for (uint32_t i = 0; i < header->nbuckets; ++i) {
+ if (last_entry < *cur_bucket)
+ last_entry = *cur_bucket;
+ cur_bucket++;
+ }
+
+ if (last_entry < header->symoffset)
+ return header->symoffset;
+
+ chain = (uint32_t *)(buckets + sizeof(uint32_t) * header->nbuckets);
+ /* The chain ends with the lowest bit set to 1. */
+ while (!(chain[last_entry - header->symoffset] & 1))
+ last_entry++;
+
+ return ++last_entry;
+}
+
+static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
+ size_t sym_cnt, struct lj_wbuf *buf)
+{
+ /*
+ ** Index 0 in ELF symtab is used to represent undefined symbols. Hence, we
+ ** can just start with index 1.
+ **
+ ** For more information, see:
+ ** https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-79797.html
+ */
+
+ for (ElfW(Word) sym_index = 1; sym_index < sym_cnt; sym_index++) {
+ /*
+ ** ELF32_ST_TYPE and ELF64_ST_TYPE are the same, so we can use
+ ** ELF32_ST_TYPE for both 64-bit and 32-bit ELFs.
+ **
+ ** For more, see https://github.com/torvalds/linux/blob/9137eda53752ef73148e42b0d7640a00f1bc96b1/include/uapi/linux/elf.h#L135
+ */
+ if (ELF32_ST_TYPE(sym[sym_index].st_info) == STT_FUNC &&
+ sym[sym_index].st_name != 0) {
+ char *sym_name = &strtab[sym[sym_index].st_name];
+ lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+ lj_wbuf_addu64(buf, sym[sym_index].st_value + so_addr);
+ lj_wbuf_addstring(buf, sym_name);
+ }
+ }
+}
+
+static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
+ lua_State *L, const ElfW(Addr) so_addr)
+{
+ int status = 0;
+
+ char *strtab = NULL;
+ ElfW(Shdr *) section_headers = NULL;
+ ElfW(Sym *) sym = NULL;
+ ElfW(Ehdr) elf_header = {};
+
+ ElfW(Off) sym_off = 0;
+ ElfW(Off) strtab_off = 0;
+
+ size_t sym_cnt = 0;
+ size_t strtab_size = 0;
+
+ size_t shoff = 0; /* Section headers offset. */
+ size_t shnum = 0; /* Section headers number. */
+ size_t shentsize = 0; /* Section header entry size. */
+
+ FILE *elf_file = fopen(elf_name, "rb");
+
+ if (elf_file == NULL)
+ return -1;
+
+ if(fread(&elf_header, sizeof(elf_header), 1, elf_file) != sizeof(elf_header) &&
+ ferror(elf_file) != 0)
+ goto error;
+ if (memcmp(elf_header.e_ident, ELFMAG, SELFMAG) != 0)
+ /* Not a valid ELF file. */
+ goto error;
+
+ shoff = elf_header.e_shoff;
+ shnum = elf_header.e_shnum;
+ shentsize = elf_header.e_shentsize;
+
+ if (shoff == 0 || shnum == 0 || shentsize == 0)
+ /* No sections in ELF. */
+ goto error;
+
+ /*
+ ** Memory occupied by section headers is unlikely to be more than 160B, but
+ ** 32-bit and 64-bit ELF files may have sections of different sizes and some
+ ** of the sections may duiplicate, so we need to take that into account.
+ */
+ section_headers = lj_mem_new(L, shnum * shentsize);
+ if (section_headers == NULL)
+ goto error;
+
+ if (fseek(elf_file, shoff, SEEK_SET) != 0)
+ goto error;
+
+ if(fread(section_headers, shentsize, shnum, elf_file) != shentsize * shnum &&
+ ferror(elf_file) != 0)
+ goto error;
+
+ for (size_t header_index = 0; header_index < shnum; ++header_index) {
+ if(section_headers[header_index].sh_type == SHT_SYMTAB) {
+ ElfW(Shdr) sym_hdr = section_headers[header_index];
+ ElfW(Shdr) strtab_hdr = section_headers[sym_hdr.sh_link];
+ size_t symtab_size = sym_hdr.sh_size;
+
+ sym_off = sym_hdr.sh_offset;
+ sym_cnt = symtab_size / sym_hdr.sh_entsize;
+
+ strtab_off = strtab_hdr.sh_offset;
+ strtab_size = strtab_hdr.sh_size;
+ break;
+ }
+ }
+
+ if (sym_off == 0 || strtab_off == 0 || sym_cnt == 0)
+ goto error;
+
+ /* Load symtab into memory. */
+ sym = lj_mem_new(L, sym_cnt * sizeof(ElfW(Sym)));
+ if (sym == NULL)
+ goto error;
+ if (fseek(elf_file, sym_off, SEEK_SET) != 0)
+ goto error;
+ if (fread(sym, sizeof(ElfW(Sym)), sym_cnt, elf_file) !=
+ sizeof(ElfW(Sym)) * sym_cnt && ferror(elf_file) != 0)
+ goto error;
+
+
+ /* Load strtab into memory. */
+ strtab = lj_mem_new(L, strtab_size * sizeof(char));
+ if (strtab == NULL)
+ goto error;
+ if (fseek(elf_file, strtab_off, SEEK_SET) != 0)
+ goto error;
+ if (fread(strtab, sizeof(char), strtab_size, elf_file) !=
+ sizeof(char) * strtab_size && ferror(elf_file) != 0)
+ goto error;
+
+ write_c_symtab(sym, strtab, so_addr, sym_cnt, buf);
+
+ goto end;
+
+ error:
+ status = -1;
+
+ end:
+ if (sym != NULL)
+ lj_mem_free(G(L), sym, sym_cnt * sizeof(ElfW(Sym)));
+ if(strtab != NULL)
+ lj_mem_free(G(L), strtab, strtab_size * sizeof(char));
+ if(section_headers != NULL)
+ lj_mem_free(G(L), section_headers, shnum * shentsize);
+
+ fclose(elf_file);
+
+ return status;
+}
+
+static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
+{
+ for (size_t header_index = 0; header_index < info->dlpi_phnum; ++header_index) {
+ if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC) {
+ ElfW(Dyn *) dyn = (ElfW(Dyn) *)(info->dlpi_addr +
+ info->dlpi_phdr[header_index].p_vaddr);
+ ElfW(Sym *) sym = NULL;
+ ElfW(Word *) hashtab = NULL;
+ ElfW(Addr) ghashtab = 0;
+ ElfW(Word) sym_cnt = 0;
+
+ char *strtab = 0;
+
+ for(; dyn->d_tag != DT_NULL; dyn++) {
+ switch(dyn->d_tag) {
+ case DT_HASH:
+ hashtab = (ElfW(Word *))dyn->d_un.d_ptr;
+ break;
+ case DT_GNU_HASH:
+ ghashtab = dyn->d_un.d_ptr;
+ break;
+ case DT_STRTAB:
+ strtab = (char *)dyn->d_un.d_ptr;
+ break;
+ case DT_SYMTAB:
+ sym = (ElfW(Sym *))dyn->d_un.d_ptr;
+ break;
+ default:
+ break;
+ }
+ }
+
+ if ((hashtab == NULL && ghashtab == 0) || strtab == NULL || sym == NULL)
+ /* Not enough data to resolve symbols. */
+ return 1;
+
+ /*
+ ** A hash table consists of Elf32_Word or Elf64_Word objects that provide for
+ ** symbol table access. Hash table has the following organization:
+ ** +-------------------+
+ ** | nbucket |
+ ** +-------------------+
+ ** | nchain |
+ ** +-------------------+
+ ** | bucket[0] |
+ ** | ... |
+ ** | bucket[nbucket-1] |
+ ** +-------------------+
+ ** | chain[0] |
+ ** | ... |
+ ** | chain[nchain-1] |
+ ** +-------------------+
+ ** Chain table entries parallel the symbol table. The number of symbol
+ ** table entries should equal nchain, so symbol table indexes also select
+ ** chain table entries. Since the chain array values are indexes for not only
+ ** the chain array itself, but also for the symbol table, the chain array must
+ ** be the same size as the symbol table. This makes nchain equal to the length
+ ** of the symbol table.
+ **
+ ** For more, see https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-48031.html
+ */
+ sym_cnt = ghashtab == 0 ? hashtab[1] : ghashtab_size(ghashtab);
+ write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, buf);
+ return 0;
+ }
+ }
+
+ return 1;
+}
+
+struct symbol_resolver_conf {
+ struct lj_wbuf *buf;
+ lua_State *L;
+};
+
+static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
+ void *data)
+{
+ struct symbol_resolver_conf *conf = data;
+ struct lj_wbuf *buf = conf->buf;
+ lua_State *L = conf->L;
+
+ UNUSED(info_size);
+
+ /* Skip vDSO library. */
+ if (info->dlpi_addr == getauxval(AT_SYSINFO_EHDR))
+ return 0;
+
+ /*
+ ** Main way: try to open ELF and read SHT_SYMTAB, SHT_STRTAB and SHT_HASH
+ ** sections from it.
+ */
+ if (dump_sht_symtab(info->dlpi_name, buf, L, info->dlpi_addr) == 0) {
+ /* Empty body. */
+ }
+ /* First fallback: dump functions only from PT_DYNAMIC segment. */
+ else if(dump_dyn_symtab(info, buf) == 0) {
+ /* Empty body. */
+ }
+ /*
+ ** Last resort: dump ELF size and address to show .so name for its functions
+ ** in memprof output.
+ */
+ else {
+ lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+ lj_wbuf_addu64(buf, info->dlpi_addr);
+ lj_wbuf_addstring(buf, info->dlpi_name);
+ }
+
+ return 0;
+}
+
+#endif /* LUAJIT_OS != LUAJIT_OS_OSX */
+
static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
{
const GCRef *iter = &g->gc.root;
const GCobj *o;
const size_t ljs_header_len = sizeof(ljs_header) / sizeof(ljs_header[0]);
+#if LUAJIT_OS != LUAJIT_OS_OSX
+ struct symbol_resolver_conf conf = {
+ out,
+ gco2th(gcref(g->cur_L))
+ };
+#endif
+
/* Write prologue. */
lj_wbuf_addn(out, ljs_header, ljs_header_len);
@@ -95,6 +414,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
iter = &o->gch.nextgc;
}
+#if LUAJIT_OS != LUAJIT_OS_OSX
+ /* Write C symbols. */
+ dl_iterate_phdr(resolve_symbolnames, &conf);
+#endif
lj_wbuf_addbyte(out, SYMTAB_FINAL);
}
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 395fb429..e7eae4c6 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
#include "lj_def.h"
#include "lj_wbuf.h"
-#define LJS_CURRENT_VERSION 0x2
+#define LJS_CURRENT_VERSION 0x3
/*
** symtab format:
@@ -25,13 +25,15 @@
** prologue := 'l' 'j' 's' version reserved
** version := <BYTE>
** reserved := <BYTE> <BYTE> <BYTE>
-** sym := sym-lua | sym-trace | sym-final
+** sym := sym-lua | sym-cfunc | sym-trace | sym-final
** sym-lua := sym-header sym-addr sym-chunk sym-line
** sym-trace := sym-header trace-no trace-addr sym-addr sym-line
** sym-header := <BYTE>
** sym-addr := <ULEB128>
** sym-chunk := string
** sym-line := <ULEB128>
+** sym-cfunc := sym-header sym-addr sym-name
+** sym-name := string
** sym-final := sym-header
** trace-no := <ULEB128>
** trace-addr := <ULEB128>
@@ -54,7 +56,8 @@
*/
#define SYMTAB_LFUNC ((uint8_t)0)
-#define SYMTAB_TRACE ((uint8_t)1)
+#define SYMTAB_CFUNC ((uint8_t)1)
+#define SYMTAB_TRACE ((uint8_t)2)
#define SYMTAB_FINAL ((uint8_t)0x80)
#define LJM_CURRENT_FORMAT_VERSION 0x02
diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
index b21500a0..f0feabfe 100644
--- a/test/tarantool-tests/CMakeLists.txt
+++ b/test/tarantool-tests/CMakeLists.txt
@@ -57,6 +57,10 @@ macro(BuildTestCLib lib sources)
endmacro()
add_subdirectory(gh-4427-ffi-sandwich)
+add_subdirectory(gh-5813-resolving-of-c-symbols/both)
+add_subdirectory(gh-5813-resolving-of-c-symbols/hash)
+add_subdirectory(gh-5813-resolving-of-c-symbols/gnuhash)
+add_subdirectory(gh-5813-resolving-of-c-symbols/stripped)
add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
add_subdirectory(gh-6189-cur_L)
add_subdirectory(lj-49-bad-lightuserdata)
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
new file mode 100644
index 00000000..b28a3948
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
@@ -0,0 +1,87 @@
+-- Memprof is implemented for x86 and x64 architectures only.
+require("utils").skipcond(
+ (jit.arch ~= "x86" and jit.arch ~= "x64")
+ or jit.os == "OSX",
+ jit.arch.." architecture or "..jit.os..
+ " OS is NIY for memprof c symbols resolving"
+)
+
+local tap = require("tap")
+local test = tap.test("gh-5813-resolving-of-c-symbols")
+test:plan(5)
+
+jit.off()
+jit.flush()
+
+local bufread = require "utils.bufread"
+local symtab = require "utils.symtab"
+local testboth = require "resboth"
+local testhash = require "reshash"
+local testgnuhash = require "resgnuhash"
+
+local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
+
+local function tree_contains(node, name)
+ if node == nil then
+ return false
+ elseif node.value.name == name then
+ return true
+ else
+ return tree_contains(node.left, name) or tree_contains(node.right, name)
+ end
+end
+
+local function generate_output(allocator)
+ local res, err = misc.memprof.start(TMP_BINFILE)
+ assert(res, err)
+ for _ = 1, 1e2 do allocator() end
+ misc.memprof.stop()
+end
+
+-- Static symbols resolution.
+local res, err = misc.memprof.start(TMP_BINFILE)
+assert(res, err)
+-- That Lua module is required here to trigger the `luaopen_os`,
+-- which is not stripped.
+local teststripped = require "resstripped"
+
+for _ = 1, 1e2 do teststripped.allocate_string() end
+
+misc.memprof.stop()
+
+local reader = bufread.new(TMP_BINFILE)
+local symbols = symtab.parse(reader)
+
+test:ok(tree_contains(symbols.cfunc, "luaopen_os"))
+
+-- Dynamic symbols resolution.
+generate_output(teststripped.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- .hash style symbol table.
+generate_output(testhash.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- .gnu.hash style symbol table.
+generate_output(testgnuhash.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- Both symbol tables.
+generate_output(testboth.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- FIXME: There is one case that is not tested -- shared objects, which
+-- have neither .symtab section nor .dynsym segment. It is unclear how to
+-- perform a test in that case, since it is impossible to load Lua module
+-- written in C if it doesn't have a .dynsym segment.
+
+os.remove(TMP_BINFILE)
+os.exit(test:check() and 0 or 1)
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
new file mode 100644
index 00000000..e303cd1f
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+ BuildTestCLib(resboth resboth.c)
+ target_link_options(resboth PRIVATE "-Wl,--hash-style=both")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
new file mode 100644
index 00000000..67fd683e
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+ lua_pushstring(L, "test string");
+ return 1;
+}
+
+static const struct luaL_Reg resboth [] = {
+ {"allocate_string", allocate_string},
+ {NULL, NULL}
+};
+
+int luaopen_resboth(lua_State *L) {
+ luaL_register(L, "resboth", resboth);
+ return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
new file mode 100644
index 00000000..e3a6100f
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+ BuildTestCLib(resgnuhash resgnuhash.c)
+ target_link_options(resgnuhash PRIVATE "-Wl,--hash-style=gnu")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
new file mode 100644
index 00000000..91d2b6e1
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+ lua_pushstring(L, "test string");
+ return 1;
+}
+
+static const struct luaL_Reg resgnuhash [] = {
+ {"allocate_string", allocate_string},
+ {NULL, NULL}
+};
+
+int luaopen_resgnuhash(lua_State *L) {
+ luaL_register(L, "resgnuhash", resgnuhash);
+ return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
new file mode 100644
index 00000000..f2669502
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+ BuildTestCLib(reshash reshash.c)
+ target_link_options(reshash PRIVATE "-Wl,--hash-style=sysv")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
new file mode 100644
index 00000000..bbb58818
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+ lua_pushstring(L, "test string");
+ return 1;
+}
+
+static const struct luaL_Reg reshash [] = {
+ {"allocate_string", allocate_string},
+ {NULL, NULL}
+};
+
+int luaopen_reshash(lua_State *L) {
+ luaL_register(L, "reshash", reshash);
+ return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
new file mode 100644
index 00000000..f96b5688
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+ BuildTestCLib(resstripped resstripped.c)
+ target_link_options(resstripped PRIVATE "-s")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
new file mode 100644
index 00000000..82bfcb75
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+ lua_pushstring(L, "test string");
+ return 1;
+}
+
+static const struct luaL_Reg resstripped [] = {
+ {"allocate_string", allocate_string},
+ {NULL, NULL}
+};
+
+int luaopen_resstripped(lua_State *L) {
+ luaL_register(L, "resstripped", resstripped);
+ return 1;
+}
diff --git a/test/tarantool-tests/tools-utils-avl.test.lua b/test/tarantool-tests/tools-utils-avl.test.lua
new file mode 100644
index 00000000..66c14765
--- /dev/null
+++ b/test/tarantool-tests/tools-utils-avl.test.lua
@@ -0,0 +1,52 @@
+local avl = require('utils.avl')
+local tap = require('tap')
+
+local test = tap.test('tools-utils-avl')
+test:plan(7)
+
+local function traverse(node, result)
+ if node ~= nil then
+ table.insert(result, node.key)
+ traverse(node.left, result)
+ traverse(node.right, result)
+ end
+ return result
+end
+
+local function batch_insert(root, values)
+ for i = 1, #values do
+ root = avl.insert(root, values[i])
+ end
+
+ return root
+end
+
+local function compare(arr1, arr2)
+ for i, v in pairs(arr1) do
+ if v ~= arr2[i] then
+ return false
+ end
+ end
+ return true
+end
+
+-- 1L rotation test.
+local root = batch_insert(nil, {1, 2, 3})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 1R rotation test.
+root = batch_insert(nil, {3, 2, 1})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 2L rotation test.
+root = batch_insert(nil, {1, 3, 2})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 2R rotation test.
+root = batch_insert(nil, {3, 1, 2})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- Exact upper bound.
+test:ok(avl.upper_bound(root, 1) == 1)
+-- No upper bound.
+test:ok(avl.upper_bound(root, -10) == nil)
+-- Not exact upper bound.
+test:ok(avl.upper_bound(root, 2.75) == 2)
+
+os.exit(test:check() and 0 or 1)
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index 61830e44..c6803d00 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -30,6 +30,7 @@ else()
memprof/humanize.lua
memprof/parse.lua
memprof.lua
+ utils/avl.lua
utils/bufread.lua
utils/symtab.lua
)
@@ -46,6 +47,7 @@ else()
COMPONENT tools-parse-memprof
)
install(FILES
+ ${CMAKE_CURRENT_SOURCE_DIR}/utils/avl.lua
${CMAKE_CURRENT_SOURCE_DIR}/utils/bufread.lua
${CMAKE_CURRENT_SOURCE_DIR}/utils/symtab.lua
DESTINATION ${LUAJIT_DATAROOTDIR}/utils
diff --git a/tools/utils/avl.lua b/tools/utils/avl.lua
new file mode 100644
index 00000000..ce13c260
--- /dev/null
+++ b/tools/utils/avl.lua
@@ -0,0 +1,113 @@
+local M = {}
+local max = math.max
+
+local function create_node(key, value)
+ return {
+ key = key,
+ value = value,
+ left = nil,
+ right = nil,
+ height = 1,
+ }
+end
+
+local function height(node)
+ if node == nil then
+ return 0
+ end
+ return node.height
+end
+
+local function update_height(node)
+ node.height = 1 + max(height(node.left), height(node.right))
+end
+
+local function get_balance(node)
+ if node == nil then
+ return 0
+ end
+ return height(node.left) - height(node.right)
+end
+
+local function rotate_left(node)
+ local r_subtree = node.right;
+ local rl_subtree = r_subtree.left;
+
+ r_subtree.left = node;
+ node.right = rl_subtree;
+
+ update_height(node)
+ update_height(r_subtree)
+
+ return r_subtree;
+end
+
+local function rotate_right(node)
+ local l_subtree = node.left
+ local lr_subtree = l_subtree.right;
+
+ l_subtree.right = node;
+ node.left = lr_subtree;
+
+ update_height(node)
+ update_height(l_subtree)
+
+ return l_subtree;
+end
+
+local function rebalance(node, key)
+ local balance = get_balance(node)
+
+ if -1 <= balance and balance <=1 then
+ return node
+ end
+
+ if balance > 1 and key < node.left.key then
+ return rotate_right(node)
+ elseif balance < -1 and key > node.right.key then
+ return rotate_left(node)
+ elseif balance > 1 and key > node.left.key then
+ node.left = rotate_left(node.left)
+ return rotate_right(node)
+ elseif balance < -1 and key < node.right.key then
+ node.right = rotate_right(node.right)
+ return rotate_left(node)
+ end
+end
+
+function M.insert(node, key, value)
+ if node == nil then
+ return create_node(key, value)
+ end
+
+ if key < node.key then
+ node.left = M.insert(node.left, key, value)
+ elseif key > node.key then
+ node.right = M.insert(node.right, key, value)
+ else
+ node.key = key
+ node.value = value
+ end
+
+ update_height(node)
+ return rebalance(node, key)
+end
+
+function M.upper_bound(node, key)
+ if node == nil then
+ return nil, nil
+ end
+ -- Explicit match.
+ if key == node.key then
+ return node.key, node.value
+ elseif key < node.key then
+ return M.upper_bound(node.left, key)
+ elseif key > node.key then
+ local right_key, value = M.upper_bound(node.right, key)
+ right_key = right_key or node.key
+ value = value or node.value
+ return right_key, value
+ end
+end
+
+return M
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index c7fcf77c..aedb8da0 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -6,16 +6,19 @@
local bit = require "bit"
+local avl = require "utils.avl"
+
local band = bit.band
local string_format = string.format
local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 0x2
+local LJS_CURRENT_VERSION = 0x3
local LJS_EPILOGUE_HEADER = 0x80
local LJS_SYMTYPE_MASK = 0x03
local SYMTAB_LFUNC = 0
-local SYMTAB_TRACE = 1
+local SYMTAB_CFUNC = 1
+local SYMTAB_TRACE = 2
local M = {}
@@ -50,15 +53,27 @@ local function parse_sym_trace(reader, symtab)
}
end
+-- Parse a single entry in a symtab: .so library
+local function parse_sym_cfunc(reader, symtab)
+ local addr = reader:read_uleb128()
+ local name = reader:read_string()
+
+ symtab.cfunc = avl.insert(symtab.cfunc, addr, {
+ name = name
+ })
+end
+
local parsers = {
[SYMTAB_LFUNC] = parse_sym_lfunc,
[SYMTAB_TRACE] = parse_sym_trace,
+ [SYMTAB_CFUNC] = parse_sym_cfunc
}
function M.parse(reader)
local symtab = {
lfunc = {},
trace = {},
+ cfunc = nil,
}
local magic = reader:read_octets(3)
local version = reader:read_octets(1)
@@ -135,6 +150,12 @@ function M.demangle(symtab, loc)
return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
end
+ local key, value = avl.upper_bound(symtab.cfunc, addr)
+
+ if key then
+ return string_format("%s:%#x", value.name, key)
+ end
+
return string_format("CFUNC %#x", addr)
end
--
2.35.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols
2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
@ 2022-04-06 13:16 ` Maxim Kokryashkin via Tarantool-patches
1 sibling, 0 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
To: tarantool-patches, imun, skaplun
This commit lets memprof extend its symtab when new
C-symbols appear after dlopen. The following data is
stored in event stream for each newly loaded symbol:
| (AEVENT_SYMTAB | ASOURCE_CFUNC) | symbol address | symbol name |
1 byte 8 bytes
magic
number
Every time an allocation occurs inside a C-function, memprof calls
the `dl_iterate_phdr` to check if the `dlpi_adds` has increased since
the last C-function call. If it has, memprof dumps symbols for last
`dlpi_adds` - `prev_dlpi_adds` libraries. Sometimes it may dump a
library that was already dumped, however, that's a rare case and
the parser handles it well.
Resolves tarantool/tarantool#5813
---
>> static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size, void *data)
>> @@ -350,23 +358,46 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size, void
>> struct symbol_resolver_conf *conf = data;
>> struct lj_wbuf *buf = conf->buf;
>> lua_State *L = conf->L;
>> + const uint8_t header = conf->header;
>> +
>> + const uint32_t lib_cnt_prev = *conf->lib_cnt;
>> + uint32_t lib_cnt = 0;
> Nit: I think that we can just declare `lib_cnt`, wo initialization.
I always initialize variables when declaring them. It's a good habit,
that sometimes really helps during the process of debugging. Ignoring
> Also, I noticed that this test is __really__ long now. May be we should
> use smaller amount of allocations?
Well, it takes a long time not because of the number of allocations, but because it
takes a long time to insert all of the C symbols into the AVL tree.
Anyway, I made the number of allocations less.
> Side note:
> Did you rebase you branch on Mikhail's one with enriched symtab for Lua
> functions and traces? It helps to avoid double review of the same things
> (and possible conflicts) in memprof.h, for example.
> Also, those commits should resolve address clashing when a different
> function allocates add the same address as the old deleted one.
No, I didn't because it is still not merged into master.
Also, it will be nice to add the following test:
> 0) Load 1 library, during memprof running, use functions from it with
> allocations
> 1) Unload it, load the second one, use functions from it with
> allocations
> 2) Load the 1st lib again, use functions again
> 3) Check that profile is correct
As Mike says in this[1] letter, it's impossible to unload a library
from Lua side. The only possible way for this to happen is when a
C-library for Lua loads and unloads another library. That case seems to
be really rare, so I don't think we need to test it.
[1]: https://www.freelists.org/post/luajit/BUG-Assertion-failures-when-unloading-and-reloading-the-ffi-package,1
src/lj_memprof.c | 110 ++++++++++++++----
src/lj_memprof.h | 8 +-
.../gh-5813-resolving-of-c-symbols.test.lua | 12 +-
tools/memprof.lua | 5 +
tools/memprof/parse.lua | 15 +++
tools/utils/symtab.lua | 4 +
6 files changed, 128 insertions(+), 26 deletions(-)
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 74f9b810..2c6f67e7 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -122,7 +122,7 @@ static uint32_t ghashtab_size(ElfW(Addr) ghashtab)
}
static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
- size_t sym_cnt, struct lj_wbuf *buf)
+ size_t sym_cnt, const uint8_t header, struct lj_wbuf *buf)
{
/*
** Index 0 in ELF symtab is used to represent undefined symbols. Hence, we
@@ -142,7 +142,7 @@ static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
if (ELF32_ST_TYPE(sym[sym_index].st_info) == STT_FUNC &&
sym[sym_index].st_name != 0) {
char *sym_name = &strtab[sym[sym_index].st_name];
- lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+ lj_wbuf_addbyte(buf, header);
lj_wbuf_addu64(buf, sym[sym_index].st_value + so_addr);
lj_wbuf_addstring(buf, sym_name);
}
@@ -150,7 +150,7 @@ static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
}
static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
- lua_State *L, const ElfW(Addr) so_addr)
+ lua_State *L, const uint8_t header, const ElfW(Addr) so_addr)
{
int status = 0;
@@ -244,7 +244,7 @@ static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
sizeof(char) * strtab_size && ferror(elf_file) != 0)
goto error;
- write_c_symtab(sym, strtab, so_addr, sym_cnt, buf);
+ write_c_symtab(sym, strtab, so_addr, sym_cnt, header, buf);
goto end;
@@ -264,7 +264,8 @@ static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
return status;
}
-static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
+static int dump_dyn_symtab
+(struct dl_phdr_info *info, const uint8_t header, struct lj_wbuf *buf)
{
for (size_t header_index = 0; header_index < info->dlpi_phnum; ++header_index) {
if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC) {
@@ -326,7 +327,7 @@ static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
** For more, see https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-48031.html
*/
sym_cnt = ghashtab == 0 ? hashtab[1] : ghashtab_size(ghashtab);
- write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, buf);
+ write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, header, buf);
return 0;
}
}
@@ -335,8 +336,13 @@ static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
}
struct symbol_resolver_conf {
- struct lj_wbuf *buf;
- lua_State *L;
+ struct lj_wbuf *buf; /* Output buffer. */
+ lua_State *L; /* Current Lua state. */
+ const uint8_t header; /* Header for symbol entries to write. */
+
+ uint32_t cur_lib; /* Index of the lib currently dumped. */
+ uint32_t to_dump_cnt; /* Amount of libs to dump. */
+ uint32_t *lib_adds; /* Memprof's counter of libs. */
};
static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
@@ -345,23 +351,48 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
struct symbol_resolver_conf *conf = data;
struct lj_wbuf *buf = conf->buf;
lua_State *L = conf->L;
+ const uint8_t header = conf->header;
+
+ uint32_t lib_cnt = 0;
+
+ /*
+ ** Check that dlpi_adds and dlpi_subs fields are available.
+ ** Assertion was taken from the GLIBC tests:
+ ** https://code.woboq.org/userspace/glibc/elf/tst-dlmodcount.c.html#37
+ */
+ lua_assert(info_size > offsetof(struct dl_phdr_info, dlpi_subs)
+ + sizeof(info->dlpi_subs));
- UNUSED(info_size);
+ lib_cnt = info->dlpi_adds - *conf->lib_adds;
/* Skip vDSO library. */
if (info->dlpi_addr == getauxval(AT_SYSINFO_EHDR))
return 0;
+ if ((conf->to_dump_cnt = info->dlpi_adds - *conf->lib_adds) == 0)
+ /* No new libraries, stop resolver. */
+ return 1;
+
+ if (conf->cur_lib < lib_cnt - conf->to_dump_cnt) {
+ /* That lib is already dumped, skip it. */
+ ++conf->cur_lib;
+ return 0;
+ }
+
+ if (conf->cur_lib == lib_cnt - conf->to_dump_cnt - 1)
+ /* Last library, update memrpof's lib counter. */
+ *conf->lib_adds = info->dlpi_adds;
+
/*
** Main way: try to open ELF and read SHT_SYMTAB, SHT_STRTAB and SHT_HASH
** sections from it.
*/
- if (dump_sht_symtab(info->dlpi_name, buf, L, info->dlpi_addr) == 0) {
- /* Empty body. */
+ if (dump_sht_symtab(info->dlpi_name, buf, L, header, info->dlpi_addr) == 0) {
+ ++conf->cur_lib;
}
/* First fallback: dump functions only from PT_DYNAMIC segment. */
- else if(dump_dyn_symtab(info, buf) == 0) {
- /* Empty body. */
+ else if(dump_dyn_symtab(info, header, buf) == 0) {
+ ++conf->cur_lib;
}
/*
** Last resort: dump ELF size and address to show .so name for its functions
@@ -371,6 +402,7 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
lj_wbuf_addu64(buf, info->dlpi_addr);
lj_wbuf_addstring(buf, info->dlpi_name);
+ ++conf->cur_lib;
}
return 0;
@@ -378,7 +410,8 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
#endif /* LUAJIT_OS != LUAJIT_OS_OSX */
-static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
+static void dump_symtab(struct lj_wbuf *out, const struct global_State *g,
+ uint32_t *lib_adds)
{
const GCRef *iter = &g->gc.root;
const GCobj *o;
@@ -387,8 +420,14 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
#if LUAJIT_OS != LUAJIT_OS_OSX
struct symbol_resolver_conf conf = {
out,
- gco2th(gcref(g->cur_L))
+ gco2th(gcref(g->cur_L)),
+ SYMTAB_CFUNC,
+ 0,
+ 0,
+ lib_adds
};
+#else
+ UNUSED(lib_adds);
#endif
/* Write prologue. */
@@ -447,6 +486,7 @@ struct memprof {
struct alloc orig_alloc; /* Original allocator. */
struct lj_memprof_options opt; /* Profiling options. */
int saved_errno; /* Saved errno when profiler deinstrumented. */
+ uint32_t lib_adds; /* Number of libs loaded. Monotonic. */
};
static struct memprof memprof = {0};
@@ -482,15 +522,43 @@ static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t aevent,
}
static void memprof_write_cfunc(struct lj_wbuf *out, uint8_t aevent,
- const GCfunc *fn)
+ const GCfunc *fn, lua_State *L, uint32_t *lib_adds)
{
+#if LUAJIT_OS != LUAJIT_OS_OSX
+ /* Check if there are any new libs. */
+ struct symbol_resolver_conf conf = {
+ out,
+ L,
+ AEVENT_SYMTAB | ASOURCE_CFUNC,
+ 0,
+ 0,
+ lib_adds
+ };
+
+ /* XXX: Leaving the `vmstate` unchanged leads to an infinite
+ ** recursion, because allocations inside ELF parser are treated
+ ** as C-side allocations by memrpof. Setting the `vmstate` to
+ ** LJ_VMST_INTERP solves the issue.
+ */
+ global_State *g = G(L);
+ const uint32_t ostate = g->vmstate;
+ g->vmstate = ~LJ_VMST_INTERP;
+
+ dl_iterate_phdr(resolve_symbolnames, &conf);
+
+ /* Restore vmstate. */
+ g->vmstate = ostate;
+#else
+ UNUSED(lib_adds);
+#endif
+
lj_wbuf_addbyte(out, aevent | ASOURCE_CFUNC);
lj_wbuf_addu64(out, (uintptr_t)fn->c.f);
}
static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t aevent,
GCfunc *fn, struct lua_State *L,
- cTValue *frame)
+ cTValue *frame, uint32_t *lib_adds)
{
cTValue *pframe = frame_prev(frame);
GCfunc *pfn = frame_func(pframe);
@@ -503,7 +571,7 @@ static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t aevent,
if (pfn != NULL && isluafunc(pfn))
memprof_write_lfunc(out, aevent, pfn, L, frame);
else
- memprof_write_cfunc(out, aevent, fn);
+ memprof_write_cfunc(out, aevent, fn, L, lib_adds);
}
static void memprof_write_func(struct memprof *mp, uint8_t aevent)
@@ -516,9 +584,9 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
if (isluafunc(fn))
memprof_write_lfunc(out, aevent, fn, L, NULL);
else if (isffunc(fn))
- memprof_write_ffunc(out, aevent, fn, L, frame);
+ memprof_write_ffunc(out, aevent, fn, L, frame, &mp->lib_adds);
else if (iscfunc(fn))
- memprof_write_cfunc(out, aevent, fn);
+ memprof_write_cfunc(out, aevent, fn, L, &mp->lib_adds);
else
lua_assert(0);
}
@@ -654,7 +722,7 @@ int lj_memprof_start(struct lua_State *L, const struct lj_memprof_options *opt)
/* Init output. */
lj_wbuf_init(&mp->out, mp_opt->writer, mp_opt->ctx, mp_opt->buf, mp_opt->len);
- dump_symtab(&mp->out, mp->g);
+ dump_symtab(&mp->out, mp->g, &mp->lib_adds);
/* Write prologue. */
lj_wbuf_addn(&mp->out, ljm_header, ljm_header_len);
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index e7eae4c6..75d9f438 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -71,10 +71,11 @@
** prologue := 'l' 'j' 'm' version reserved
** version := <BYTE>
** reserved := <BYTE> <BYTE> <BYTE>
-** event := event-alloc | event-realloc | event-free
+** event := event-alloc | event-realloc | event-free | event-symtab
** event-alloc := event-header loc? naddr nsize
** event-realloc := event-header loc? oaddr osize naddr nsize
** event-free := event-header loc? oaddr osize
+** event-symtab := event-header sym-addr sym-name
** event-header := <BYTE>
** loc := loc-lua | loc-c | loc-trace
** loc-lua := sym-addr line-no
@@ -88,7 +89,11 @@
** naddr := <ULEB128>
** osize := <ULEB128>
** nsize := <ULEB128>
+** sym-name := string
** epilogue := event-header
+** string := string-len string-payload
+** string-len := <ULEB128>
+** string-payload := <BYTE> {string-len}
**
** <BYTE> : A single byte (no surprises here)
** <ULEB128>: Unsigned integer represented in ULEB128 encoding
@@ -107,6 +112,7 @@
*/
/* Allocation events. */
+#define AEVENT_SYMTAB ((uint8_t)0)
#define AEVENT_ALLOC ((uint8_t)1)
#define AEVENT_FREE ((uint8_t)2)
#define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
index b28a3948..e071d405 100644
--- a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
@@ -18,6 +18,7 @@ local symtab = require "utils.symtab"
local testboth = require "resboth"
local testhash = require "reshash"
local testgnuhash = require "resgnuhash"
+local memprof = require "memprof.parse"
local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
@@ -51,13 +52,16 @@ misc.memprof.stop()
local reader = bufread.new(TMP_BINFILE)
local symbols = symtab.parse(reader)
+local events = memprof.parse(reader)
+for addr, event in pairs(events.symtab) do
+ symtab.add_cfunc(symbols, addr, event.name)
+end
+
+-- Static symbol resolution.
test:ok(tree_contains(symbols.cfunc, "luaopen_os"))
--- Dynamic symbols resolution.
-generate_output(teststripped.allocate_string)
-reader = bufread.new(TMP_BINFILE)
-symbols = symtab.parse(reader)
+-- Dynamic symbol resolution. Newly loaded symbol resolution.
test:ok(tree_contains(symbols.cfunc, "allocate_string"))
-- .hash style symbol table.
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 18b44fdd..805b7e74 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -101,6 +101,11 @@ local function dump(inputfile)
local reader = bufread.new(inputfile)
local symbols = symtab.parse(reader)
local events = memprof.parse(reader, symbols)
+
+ for addr, event in pairs(events.symtab) do
+ symtab.add_cfunc(symbols, addr, event.name)
+ end
+
if not leak_only then
view.profile_info(events, symbols)
end
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 47dbaee4..181a7e92 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -17,6 +17,7 @@ local LJM_CURRENT_VERSION = 0x02
local LJM_EPILOGUE_HEADER = 0x80
+local AEVENT_SYMTAB = 0
local AEVENT_ALLOC = 1
local AEVENT_FREE = 2
local AEVENT_REALLOC = 3
@@ -41,6 +42,7 @@ local function new_event(loc)
free = 0,
alloc = 0,
primary = {},
+ name = nil
}
end
@@ -85,6 +87,17 @@ local function parse_location(reader, asource)
return symtab.id(loc), loc
end
+local function parse_symtab(reader, _, events, _)
+ local id = reader:read_uleb128()
+ local name = reader:read_string()
+
+ if not events[id] then
+ events[id] = new_event(0)
+ end
+
+ events[id].name = name
+end
+
local function parse_alloc(reader, asource, events, heap)
local id, loc = parse_location(reader, asource)
@@ -142,6 +155,7 @@ local function parse_free(reader, asource, events, heap)
end
local parsers = {
+ [AEVENT_SYMTAB] = {evname = "symtab", parse = parse_symtab},
[AEVENT_ALLOC] = {evname = "alloc", parse = parse_alloc},
[AEVENT_FREE] = {evname = "free", parse = parse_free},
[AEVENT_REALLOC] = {evname = "realloc", parse = parse_realloc},
@@ -182,6 +196,7 @@ function M.parse(reader)
realloc = {},
free = {},
heap = {},
+ symtab = {}
}
local magic = reader:read_octets(3)
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index aedb8da0..5197d956 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -159,4 +159,8 @@ function M.demangle(symtab, loc)
return string_format("CFUNC %#x", addr)
end
+function M.add_cfunc(symtab, addr, name)
+ symtab.cfunc = avl.insert(symtab.cfunc, addr, {name = name})
+end
+
return M
--
2.35.1
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-04-06 13:18 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox