Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving
@ 2022-04-06 13:16 Maxim Kokryashkin via Tarantool-patches
  2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
  2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches
  0 siblings, 2 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Changes in v7:
- Fixed comments as per review by Sergey
- Added tests for different types of hash sections.

Maxim Kokryashkin (2):
  memprof: extend symtab with C-symbols
  memprof: enrich symtab with newly loaded symbols

 Makefile.original                             |   2 +-
 src/lj_memprof.c                              | 405 +++++++++++++++++-
 src/lj_memprof.h                              |  17 +-
 test/tarantool-tests/CMakeLists.txt           |   4 +
 .../gh-5813-resolving-of-c-symbols.test.lua   |  91 ++++
 .../both/CMakeLists.txt                       |   4 +
 .../both/resboth.c                            |  17 +
 .../gnuhash/CMakeLists.txt                    |   4 +
 .../gnuhash/resgnuhash.c                      |  17 +
 .../hash/CMakeLists.txt                       |   4 +
 .../hash/reshash.c                            |  17 +
 .../stripped/CMakeLists.txt                   |   4 +
 .../stripped/resstripped.c                    |  17 +
 test/tarantool-tests/tools-utils-avl.test.lua |  52 +++
 tools/CMakeLists.txt                          |   2 +
 tools/memprof.lua                             |   5 +
 tools/memprof/parse.lua                       |  15 +
 tools/utils/avl.lua                           | 113 +++++
 tools/utils/symtab.lua                        |  29 +-
 19 files changed, 805 insertions(+), 14 deletions(-)
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
 create mode 100644 test/tarantool-tests/tools-utils-avl.test.lua
 create mode 100644 tools/utils/avl.lua

--
2.35.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols
  2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
@ 2022-04-06 13:16 ` Maxim Kokryashkin via Tarantool-patches
  2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches
  1 sibling, 0 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit enriches memprof's symbol table with information
about C-symbols. The parser is updated correspondingly.

If there is .symtab section or at least .dynsym segment in
a shared library, then the following data is stored in symtab
for each symbol:

| SYMTAB_CFUNC | symbol address | symbol name |
  1 byte            8 bytes
  magic
  number

If none of those are present, then instead of a symbol name
and its address there will be name and address of a
shared library containing that symbol.

Part of tarantool/tarantool#5813
---
>> +#include "lj_gc.h"
>
> Nit: This include looks excess.
No, it's not. It is needed for the `lj_mem_new`.

> We can pass struct memprof instead custom one -- it already has global state to
> determine lua_State and struct lj_wbuf.
> | lua_State *L = gco2th(gcref(mp->g->mem_L));
Well, we can, but I don't think that it's good in terms of incapsulation.

 Makefile.original                             |   2 +-
 src/lj_memprof.c                              | 323 ++++++++++++++++++
 src/lj_memprof.h                              |   9 +-
 test/tarantool-tests/CMakeLists.txt           |   4 +
 .../gh-5813-resolving-of-c-symbols.test.lua   |  87 +++++
 .../both/CMakeLists.txt                       |   4 +
 .../both/resboth.c                            |  17 +
 .../gnuhash/CMakeLists.txt                    |   4 +
 .../gnuhash/resgnuhash.c                      |  17 +
 .../hash/CMakeLists.txt                       |   4 +
 .../hash/reshash.c                            |  17 +
 .../stripped/CMakeLists.txt                   |   4 +
 .../stripped/resstripped.c                    |  17 +
 test/tarantool-tests/tools-utils-avl.test.lua |  52 +++
 tools/CMakeLists.txt                          |   2 +
 tools/utils/avl.lua                           | 113 ++++++
 tools/utils/symtab.lua                        |  25 +-
 17 files changed, 695 insertions(+), 6 deletions(-)
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
 create mode 100644 test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
 create mode 100644 test/tarantool-tests/tools-utils-avl.test.lua
 create mode 100644 tools/utils/avl.lua

diff --git a/Makefile.original b/Makefile.original
index 33dc2ed5..0c92df9e 100644
--- a/Makefile.original
+++ b/Makefile.original
@@ -100,7 +100,7 @@ FILES_JITLIB= bc.lua bcsave.lua dump.lua p.lua v.lua zone.lua \
 	      dis_x86.lua dis_x64.lua dis_arm.lua dis_arm64.lua \
 	      dis_arm64be.lua dis_ppc.lua dis_mips.lua dis_mipsel.lua \
 	      dis_mips64.lua dis_mips64el.lua vmdef.lua
-FILES_UTILSLIB= bufread.lua symtab.lua
+FILES_UTILSLIB= avl.lua bufread.lua symtab.lua
 FILES_MEMPROFLIB= parse.lua humanize.lua
 FILES_TOOLSLIB= memprof.lua
 FILE_TMEMPROF= luajit-parse-memprof
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2d779983..74f9b810 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -8,16 +8,25 @@
 #define lj_memprof_c
 #define LUA_CORE
 
+#define _GNU_SOURCE
+
 #include <errno.h>
+#include <stdio.h>
 
 #include "lj_arch.h"
 #include "lj_memprof.h"
 
+#if LUAJIT_OS != LUAJIT_OS_OSX
+#include <elf.h>
+#include <link.h>
+#include <sys/auxv.h>
+#endif
 #if LJ_HASMEMPROF
 
 #include "lj_obj.h"
 #include "lj_frame.h"
 #include "lj_debug.h"
+#include "lj_gc.h"
 
 #if LJ_HASJIT
 #include "lj_dispatch.h"
@@ -66,12 +75,322 @@ static void dump_symtab_trace(struct lj_wbuf *out, const GCtrace *trace)
 
 #endif
 
+#if LUAJIT_OS != LUAJIT_OS_OSX
+
+struct ghashtab_header {
+  uint32_t nbuckets;
+  uint32_t symoffset;
+  uint32_t bloom_size;
+  uint32_t bloom_shift;
+};
+
+static uint32_t ghashtab_size(ElfW(Addr) ghashtab)
+{
+  /*
+  ** There is no easy way to get count of symbols in GNU hashtable, so the
+  ** only way to do this is to take highest possible non-empty bucket and
+  ** iterate through its symbols until the last chain is over.
+  */
+  uint32_t last_entry = 0;
+
+  const uint32_t *chain = NULL;
+  struct ghashtab_header *header = (struct ghashtab_header *)ghashtab;
+  /*
+  ** sizeof(size_t) returns 8, if compiled with 64-bit compiler, and 4 if
+  ** compiled with 32-bit compiler. It is the best option to determine which
+  ** kind of CPU we are running on.
+  */
+  const char *buckets = (char *)ghashtab + sizeof(struct ghashtab_header) +
+                        sizeof(size_t) * header->bloom_size;
+
+  uint32_t *cur_bucket = (uint32_t *)buckets;
+  for (uint32_t i = 0; i < header->nbuckets; ++i) {
+    if (last_entry < *cur_bucket)
+      last_entry = *cur_bucket;
+    cur_bucket++;
+  }
+
+  if (last_entry < header->symoffset)
+    return header->symoffset;
+
+  chain = (uint32_t *)(buckets + sizeof(uint32_t) * header->nbuckets);
+  /* The chain ends with the lowest bit set to 1. */
+  while (!(chain[last_entry - header->symoffset] & 1))
+    last_entry++;
+
+  return ++last_entry;
+}
+
+static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
+    size_t sym_cnt, struct lj_wbuf *buf)
+{
+  /*
+  ** Index 0 in ELF symtab is used to represent undefined symbols. Hence, we
+  ** can just start with index 1.
+  **
+  ** For more information, see:
+  ** https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-79797.html
+  */
+
+  for (ElfW(Word) sym_index = 1; sym_index < sym_cnt; sym_index++) {
+    /*
+    ** ELF32_ST_TYPE and ELF64_ST_TYPE are the same, so we can use
+    ** ELF32_ST_TYPE for both 64-bit and 32-bit ELFs.
+    **
+    ** For more, see https://github.com/torvalds/linux/blob/9137eda53752ef73148e42b0d7640a00f1bc96b1/include/uapi/linux/elf.h#L135
+    */
+    if (ELF32_ST_TYPE(sym[sym_index].st_info) == STT_FUNC &&
+        sym[sym_index].st_name != 0) {
+      char *sym_name = &strtab[sym[sym_index].st_name];
+      lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+      lj_wbuf_addu64(buf, sym[sym_index].st_value + so_addr);
+      lj_wbuf_addstring(buf, sym_name);
+    }
+  }
+}
+
+static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
+    lua_State *L, const ElfW(Addr) so_addr)
+{
+  int status = 0;
+
+  char *strtab = NULL;
+  ElfW(Shdr *) section_headers = NULL;
+  ElfW(Sym *) sym = NULL;
+  ElfW(Ehdr) elf_header = {};
+
+  ElfW(Off) sym_off = 0;
+  ElfW(Off) strtab_off = 0;
+
+  size_t sym_cnt = 0;
+  size_t strtab_size = 0;
+
+  size_t shoff = 0; /* Section headers offset. */
+  size_t shnum = 0; /* Section headers number. */
+  size_t shentsize = 0; /* Section header entry size. */
+
+  FILE *elf_file = fopen(elf_name, "rb");
+
+  if (elf_file == NULL)
+    return -1;
+
+  if(fread(&elf_header, sizeof(elf_header), 1, elf_file) != sizeof(elf_header) &&
+      ferror(elf_file) != 0)
+    goto error;
+  if (memcmp(elf_header.e_ident, ELFMAG, SELFMAG) != 0)
+    /* Not a valid ELF file. */
+    goto error;
+
+  shoff = elf_header.e_shoff;
+  shnum = elf_header.e_shnum;
+  shentsize = elf_header.e_shentsize;
+
+  if (shoff == 0 || shnum == 0 || shentsize == 0)
+    /* No sections in ELF. */
+    goto error;
+
+  /*
+  ** Memory occupied by section headers is unlikely to be more than 160B, but
+  ** 32-bit and 64-bit ELF files may have sections of different sizes and some
+  ** of the sections may duiplicate, so we need to take that into account.
+  */
+  section_headers = lj_mem_new(L, shnum * shentsize);
+  if (section_headers == NULL)
+    goto error;
+
+  if (fseek(elf_file, shoff, SEEK_SET) != 0)
+    goto error;
+
+  if(fread(section_headers, shentsize, shnum, elf_file) != shentsize * shnum &&
+      ferror(elf_file) != 0)
+    goto error;
+
+  for (size_t header_index = 0; header_index < shnum; ++header_index) {
+    if(section_headers[header_index].sh_type == SHT_SYMTAB) {
+      ElfW(Shdr) sym_hdr = section_headers[header_index];
+      ElfW(Shdr) strtab_hdr = section_headers[sym_hdr.sh_link];
+      size_t symtab_size = sym_hdr.sh_size;
+
+      sym_off = sym_hdr.sh_offset;
+      sym_cnt = symtab_size / sym_hdr.sh_entsize;
+
+      strtab_off = strtab_hdr.sh_offset;
+      strtab_size = strtab_hdr.sh_size;
+      break;
+    }
+  }
+
+  if (sym_off == 0 || strtab_off == 0 || sym_cnt == 0)
+    goto error;
+
+  /* Load symtab into memory. */
+  sym = lj_mem_new(L, sym_cnt * sizeof(ElfW(Sym)));
+  if (sym == NULL)
+    goto error;
+  if (fseek(elf_file, sym_off, SEEK_SET) != 0)
+    goto error;
+  if (fread(sym, sizeof(ElfW(Sym)), sym_cnt, elf_file) !=
+      sizeof(ElfW(Sym)) * sym_cnt && ferror(elf_file) != 0)
+    goto error;
+
+
+  /* Load strtab into memory. */
+  strtab = lj_mem_new(L, strtab_size * sizeof(char));
+  if (strtab == NULL)
+    goto error;
+  if (fseek(elf_file, strtab_off, SEEK_SET) != 0)
+    goto error;
+  if (fread(strtab, sizeof(char), strtab_size, elf_file) !=
+      sizeof(char) * strtab_size && ferror(elf_file) != 0)
+    goto error;
+
+  write_c_symtab(sym, strtab, so_addr, sym_cnt, buf);
+
+  goto end;
+
+  error:
+  status = -1;
+
+  end:
+  if (sym != NULL)
+    lj_mem_free(G(L), sym, sym_cnt * sizeof(ElfW(Sym)));
+  if(strtab != NULL)
+    lj_mem_free(G(L), strtab, strtab_size * sizeof(char));
+  if(section_headers != NULL)
+    lj_mem_free(G(L), section_headers, shnum * shentsize);
+
+  fclose(elf_file);
+
+  return status;
+}
+
+static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
+{
+  for (size_t header_index = 0; header_index < info->dlpi_phnum; ++header_index) {
+    if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC) {
+      ElfW(Dyn *) dyn = (ElfW(Dyn) *)(info->dlpi_addr +
+          info->dlpi_phdr[header_index].p_vaddr);
+      ElfW(Sym *) sym = NULL;
+      ElfW(Word *) hashtab = NULL;
+      ElfW(Addr) ghashtab = 0;
+      ElfW(Word) sym_cnt = 0;
+
+      char *strtab = 0;
+
+      for(; dyn->d_tag != DT_NULL; dyn++) {
+        switch(dyn->d_tag) {
+        case DT_HASH:
+          hashtab = (ElfW(Word *))dyn->d_un.d_ptr;
+          break;
+        case DT_GNU_HASH:
+          ghashtab = dyn->d_un.d_ptr;
+          break;
+        case DT_STRTAB:
+          strtab = (char *)dyn->d_un.d_ptr;
+          break;
+        case DT_SYMTAB:
+          sym = (ElfW(Sym *))dyn->d_un.d_ptr;
+          break;
+        default:
+          break;
+        }
+      }
+
+      if ((hashtab == NULL && ghashtab == 0) || strtab == NULL || sym == NULL)
+        /* Not enough data to resolve symbols. */
+        return 1;
+
+      /*
+      ** A hash table consists of Elf32_Word or Elf64_Word objects that provide for
+      ** symbol table access. Hash table has the following organization:
+      ** +-------------------+
+      ** |      nbucket      |
+      ** +-------------------+
+      ** |      nchain       |
+      ** +-------------------+
+      ** |     bucket[0]     |
+      ** |       ...         |
+      ** | bucket[nbucket-1] |
+      ** +-------------------+
+      ** |     chain[0]      |
+      ** |       ...         |
+      ** |  chain[nchain-1]  |
+      ** +-------------------+
+      ** Chain table entries parallel the symbol table. The number of symbol
+      ** table entries should equal nchain, so symbol table indexes also select
+      ** chain table entries. Since the chain array values are indexes for not only
+      ** the chain array itself, but also for the symbol table, the chain array must
+      ** be the same size as the symbol table. This makes nchain equal to the length
+      ** of the symbol table.
+      **
+      ** For more, see https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-48031.html
+      */
+      sym_cnt = ghashtab == 0 ? hashtab[1] : ghashtab_size(ghashtab);
+      write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, buf);
+      return 0;
+    }
+  }
+
+  return 1;
+}
+
+struct symbol_resolver_conf {
+  struct lj_wbuf *buf;
+  lua_State *L;
+};
+
+static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
+    void *data)
+{
+  struct symbol_resolver_conf *conf = data;
+  struct lj_wbuf *buf = conf->buf;
+  lua_State *L = conf->L;
+
+  UNUSED(info_size);
+
+  /* Skip vDSO library. */
+  if (info->dlpi_addr == getauxval(AT_SYSINFO_EHDR))
+    return 0;
+
+  /*
+  ** Main way: try to open ELF and read SHT_SYMTAB, SHT_STRTAB and SHT_HASH
+  ** sections from it.
+  */
+  if (dump_sht_symtab(info->dlpi_name, buf, L, info->dlpi_addr) == 0) {
+    /* Empty body. */
+  }
+  /* First fallback: dump functions only from PT_DYNAMIC segment. */
+  else if(dump_dyn_symtab(info, buf) == 0) {
+    /* Empty body. */
+  }
+  /*
+  ** Last resort: dump ELF size and address to show .so name for its functions
+  ** in memprof output.
+  */
+  else {
+    lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+    lj_wbuf_addu64(buf, info->dlpi_addr);
+    lj_wbuf_addstring(buf, info->dlpi_name);
+  }
+
+  return 0;
+}
+
+#endif /* LUAJIT_OS != LUAJIT_OS_OSX */
+
 static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 {
   const GCRef *iter = &g->gc.root;
   const GCobj *o;
   const size_t ljs_header_len = sizeof(ljs_header) / sizeof(ljs_header[0]);
 
+#if LUAJIT_OS != LUAJIT_OS_OSX
+  struct symbol_resolver_conf conf = {
+    out,
+    gco2th(gcref(g->cur_L))
+  };
+#endif
+
   /* Write prologue. */
   lj_wbuf_addn(out, ljs_header, ljs_header_len);
 
@@ -95,6 +414,10 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
     iter = &o->gch.nextgc;
   }
 
+#if LUAJIT_OS != LUAJIT_OS_OSX
+  /* Write C symbols. */
+  dl_iterate_phdr(resolve_symbolnames, &conf);
+#endif
   lj_wbuf_addbyte(out, SYMTAB_FINAL);
 }
 
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 395fb429..e7eae4c6 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
 #include "lj_def.h"
 #include "lj_wbuf.h"
 
-#define LJS_CURRENT_VERSION 0x2
+#define LJS_CURRENT_VERSION 0x3
 
 /*
 ** symtab format:
@@ -25,13 +25,15 @@
 ** prologue       := 'l' 'j' 's' version reserved
 ** version        := <BYTE>
 ** reserved       := <BYTE> <BYTE> <BYTE>
-** sym            := sym-lua | sym-trace | sym-final
+** sym            := sym-lua | sym-cfunc | sym-trace | sym-final
 ** sym-lua        := sym-header sym-addr sym-chunk sym-line
 ** sym-trace      := sym-header trace-no trace-addr sym-addr sym-line
 ** sym-header     := <BYTE>
 ** sym-addr       := <ULEB128>
 ** sym-chunk      := string
 ** sym-line       := <ULEB128>
+** sym-cfunc      := sym-header sym-addr sym-name
+** sym-name       := string
 ** sym-final      := sym-header
 ** trace-no       := <ULEB128>
 ** trace-addr     := <ULEB128>
@@ -54,7 +56,8 @@
 */
 
 #define SYMTAB_LFUNC ((uint8_t)0)
-#define SYMTAB_TRACE ((uint8_t)1)
+#define SYMTAB_CFUNC ((uint8_t)1)
+#define SYMTAB_TRACE ((uint8_t)2)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
 #define LJM_CURRENT_FORMAT_VERSION 0x02
diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
index b21500a0..f0feabfe 100644
--- a/test/tarantool-tests/CMakeLists.txt
+++ b/test/tarantool-tests/CMakeLists.txt
@@ -57,6 +57,10 @@ macro(BuildTestCLib lib sources)
 endmacro()
 
 add_subdirectory(gh-4427-ffi-sandwich)
+add_subdirectory(gh-5813-resolving-of-c-symbols/both)
+add_subdirectory(gh-5813-resolving-of-c-symbols/hash)
+add_subdirectory(gh-5813-resolving-of-c-symbols/gnuhash)
+add_subdirectory(gh-5813-resolving-of-c-symbols/stripped)
 add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
 add_subdirectory(gh-6189-cur_L)
 add_subdirectory(lj-49-bad-lightuserdata)
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
new file mode 100644
index 00000000..b28a3948
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
@@ -0,0 +1,87 @@
+-- Memprof is implemented for x86 and x64 architectures only.
+require("utils").skipcond(
+  (jit.arch ~= "x86" and jit.arch ~= "x64")
+  or jit.os == "OSX",
+  jit.arch.." architecture or "..jit.os..
+  " OS is NIY for memprof c symbols resolving"
+)
+
+local tap = require("tap")
+local test = tap.test("gh-5813-resolving-of-c-symbols")
+test:plan(5)
+
+jit.off()
+jit.flush()
+
+local bufread = require "utils.bufread"
+local symtab = require "utils.symtab"
+local testboth = require "resboth"
+local testhash = require "reshash"
+local testgnuhash = require "resgnuhash"
+
+local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
+
+local function tree_contains(node, name)
+  if node == nil then
+    return false
+  elseif node.value.name == name then
+    return true
+  else
+    return tree_contains(node.left, name) or tree_contains(node.right, name)
+  end
+end
+
+local function generate_output(allocator)
+  local res, err = misc.memprof.start(TMP_BINFILE)
+  assert(res, err)
+  for _ = 1, 1e2 do allocator() end
+  misc.memprof.stop()
+end
+
+-- Static symbols resolution.
+local res, err = misc.memprof.start(TMP_BINFILE)
+assert(res, err)
+-- That Lua module is required here to trigger the `luaopen_os`,
+-- which is not stripped.
+local teststripped = require "resstripped"
+
+for _ = 1, 1e2 do teststripped.allocate_string() end
+
+misc.memprof.stop()
+
+local reader = bufread.new(TMP_BINFILE)
+local symbols = symtab.parse(reader)
+
+test:ok(tree_contains(symbols.cfunc, "luaopen_os"))
+
+-- Dynamic symbols resolution.
+generate_output(teststripped.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- .hash style symbol table.
+generate_output(testhash.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- .gnu.hash style symbol table.
+generate_output(testgnuhash.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- Both symbol tables.
+generate_output(testboth.allocate_string)
+reader = bufread.new(TMP_BINFILE)
+symbols = symtab.parse(reader)
+test:ok(tree_contains(symbols.cfunc, "allocate_string"))
+
+-- FIXME: There is one case that is not tested -- shared objects, which
+-- have neither .symtab section nor .dynsym segment. It is unclear how to
+-- perform a test in that case, since it is impossible to load Lua module
+-- written in C if it doesn't have a .dynsym segment.
+
+os.remove(TMP_BINFILE)
+os.exit(test:check() and 0 or 1)
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
new file mode 100644
index 00000000..e303cd1f
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+  BuildTestCLib(resboth resboth.c)
+  target_link_options(resboth PRIVATE "-Wl,--hash-style=both")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
new file mode 100644
index 00000000..67fd683e
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/both/resboth.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+    lua_pushstring(L, "test string");
+    return 1;
+}
+
+static const struct luaL_Reg resboth [] = {
+    {"allocate_string", allocate_string},
+    {NULL, NULL}
+};
+
+int luaopen_resboth(lua_State *L) {
+    luaL_register(L, "resboth", resboth);
+    return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
new file mode 100644
index 00000000..e3a6100f
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+  BuildTestCLib(resgnuhash resgnuhash.c)
+  target_link_options(resgnuhash PRIVATE "-Wl,--hash-style=gnu")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
new file mode 100644
index 00000000..91d2b6e1
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/gnuhash/resgnuhash.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+    lua_pushstring(L, "test string");
+    return 1;
+}
+
+static const struct luaL_Reg resgnuhash [] = {
+    {"allocate_string", allocate_string},
+    {NULL, NULL}
+};
+
+int luaopen_resgnuhash(lua_State *L) {
+    luaL_register(L, "resgnuhash", resgnuhash);
+    return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
new file mode 100644
index 00000000..f2669502
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+  BuildTestCLib(reshash reshash.c)
+  target_link_options(reshash PRIVATE "-Wl,--hash-style=sysv")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
new file mode 100644
index 00000000..bbb58818
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/hash/reshash.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+    lua_pushstring(L, "test string");
+    return 1;
+}
+
+static const struct luaL_Reg reshash [] = {
+    {"allocate_string", allocate_string},
+    {NULL, NULL}
+};
+
+int luaopen_reshash(lua_State *L) {
+    luaL_register(L, "reshash", reshash);
+    return 1;
+}
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
new file mode 100644
index 00000000..f96b5688
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/CMakeLists.txt
@@ -0,0 +1,4 @@
+if (NOT(CMAKE_SYSTEM_NAME STREQUAL "Darwin"))
+  BuildTestCLib(resstripped resstripped.c)
+  target_link_options(resstripped PRIVATE "-s")
+endif()
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
new file mode 100644
index 00000000..82bfcb75
--- /dev/null
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols/stripped/resstripped.c
@@ -0,0 +1,17 @@
+#include <lua.h>
+#include <lauxlib.h>
+
+int allocate_string(lua_State *L) {
+    lua_pushstring(L, "test string");
+    return 1;
+}
+
+static const struct luaL_Reg resstripped [] = {
+    {"allocate_string", allocate_string},
+    {NULL, NULL}
+};
+
+int luaopen_resstripped(lua_State *L) {
+    luaL_register(L, "resstripped", resstripped);
+    return 1;
+}
diff --git a/test/tarantool-tests/tools-utils-avl.test.lua b/test/tarantool-tests/tools-utils-avl.test.lua
new file mode 100644
index 00000000..66c14765
--- /dev/null
+++ b/test/tarantool-tests/tools-utils-avl.test.lua
@@ -0,0 +1,52 @@
+local avl = require('utils.avl')
+local tap = require('tap')
+
+local test = tap.test('tools-utils-avl')
+test:plan(7)
+
+local function traverse(node, result)
+  if node ~= nil then
+    table.insert(result, node.key)
+    traverse(node.left, result)
+    traverse(node.right, result)
+  end
+  return result
+end
+
+local function batch_insert(root, values)
+  for i = 1, #values do
+    root = avl.insert(root, values[i])
+  end
+
+  return root
+end
+
+local function compare(arr1, arr2)
+  for i, v in pairs(arr1) do
+    if v ~= arr2[i] then
+      return false
+    end
+  end
+  return true
+end
+
+-- 1L rotation test.
+local root = batch_insert(nil, {1, 2, 3})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 1R rotation test.
+root = batch_insert(nil, {3, 2, 1})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 2L rotation test.
+root = batch_insert(nil, {1, 3, 2})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- 2R rotation test.
+root = batch_insert(nil, {3, 1, 2})
+test:ok(compare(traverse(root, {}), {2, 1, 3}))
+-- Exact upper bound.
+test:ok(avl.upper_bound(root, 1) == 1)
+-- No upper bound.
+test:ok(avl.upper_bound(root, -10) == nil)
+-- Not exact upper bound.
+test:ok(avl.upper_bound(root, 2.75) == 2)
+
+os.exit(test:check() and 0 or 1)
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index 61830e44..c6803d00 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -30,6 +30,7 @@ else()
     memprof/humanize.lua
     memprof/parse.lua
     memprof.lua
+    utils/avl.lua
     utils/bufread.lua
     utils/symtab.lua
   )
@@ -46,6 +47,7 @@ else()
     COMPONENT tools-parse-memprof
   )
   install(FILES
+      ${CMAKE_CURRENT_SOURCE_DIR}/utils/avl.lua
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/bufread.lua
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/symtab.lua
     DESTINATION ${LUAJIT_DATAROOTDIR}/utils
diff --git a/tools/utils/avl.lua b/tools/utils/avl.lua
new file mode 100644
index 00000000..ce13c260
--- /dev/null
+++ b/tools/utils/avl.lua
@@ -0,0 +1,113 @@
+local M = {}
+local max = math.max
+
+local function create_node(key, value)
+  return {
+    key = key,
+    value = value,
+    left = nil,
+    right = nil,
+    height = 1,
+  }
+end
+
+local function height(node)
+  if node == nil then
+    return 0
+  end
+  return node.height
+end
+
+local function update_height(node)
+  node.height = 1 + max(height(node.left), height(node.right))
+end
+
+local function get_balance(node)
+  if node == nil then
+    return 0
+  end
+  return height(node.left) - height(node.right)
+end
+
+local function rotate_left(node)
+    local r_subtree = node.right;
+    local rl_subtree = r_subtree.left;
+
+    r_subtree.left = node;
+    node.right = rl_subtree;
+
+    update_height(node)
+    update_height(r_subtree)
+
+    return r_subtree;
+end
+
+local function rotate_right(node)
+    local l_subtree = node.left
+    local lr_subtree = l_subtree.right;
+
+    l_subtree.right = node;
+    node.left = lr_subtree;
+
+    update_height(node)
+    update_height(l_subtree)
+
+    return l_subtree;
+end
+
+local function rebalance(node, key)
+  local balance = get_balance(node)
+
+  if -1 <= balance and balance <=1 then
+    return node
+  end
+
+  if balance > 1 and key < node.left.key then
+    return rotate_right(node)
+  elseif balance < -1 and key > node.right.key then
+    return rotate_left(node)
+  elseif balance > 1 and key > node.left.key then
+    node.left = rotate_left(node.left)
+    return rotate_right(node)
+  elseif balance < -1 and key < node.right.key then
+    node.right = rotate_right(node.right)
+    return rotate_left(node)
+  end
+end
+
+function M.insert(node, key, value)
+  if node == nil then
+    return create_node(key, value)
+  end
+
+  if key < node.key then
+    node.left = M.insert(node.left, key, value)
+  elseif key > node.key then
+    node.right = M.insert(node.right, key, value)
+  else
+    node.key = key
+    node.value = value
+  end
+
+  update_height(node)
+  return rebalance(node, key)
+end
+
+function M.upper_bound(node, key)
+  if node == nil then
+    return nil, nil
+  end
+  -- Explicit match.
+  if key == node.key then
+    return node.key, node.value
+  elseif key < node.key then
+    return M.upper_bound(node.left, key)
+  elseif key > node.key then
+    local right_key, value = M.upper_bound(node.right, key)
+    right_key = right_key or node.key
+    value = value or node.value
+    return right_key, value
+  end
+end
+
+return M
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index c7fcf77c..aedb8da0 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -6,16 +6,19 @@
 
 local bit = require "bit"
 
+local avl = require "utils.avl"
+
 local band = bit.band
 local string_format = string.format
 
 local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 0x2
+local LJS_CURRENT_VERSION = 0x3
 local LJS_EPILOGUE_HEADER = 0x80
 local LJS_SYMTYPE_MASK = 0x03
 
 local SYMTAB_LFUNC = 0
-local SYMTAB_TRACE = 1
+local SYMTAB_CFUNC = 1
+local SYMTAB_TRACE = 2
 
 local M = {}
 
@@ -50,15 +53,27 @@ local function parse_sym_trace(reader, symtab)
   }
 end
 
+-- Parse a single entry in a symtab: .so library
+local function parse_sym_cfunc(reader, symtab)
+  local addr = reader:read_uleb128()
+  local name = reader:read_string()
+
+  symtab.cfunc = avl.insert(symtab.cfunc, addr, {
+    name = name
+  })
+end
+
 local parsers = {
   [SYMTAB_LFUNC] = parse_sym_lfunc,
   [SYMTAB_TRACE] = parse_sym_trace,
+  [SYMTAB_CFUNC] = parse_sym_cfunc
 }
 
 function M.parse(reader)
   local symtab = {
     lfunc = {},
     trace = {},
+    cfunc = nil,
   }
   local magic = reader:read_octets(3)
   local version = reader:read_octets(1)
@@ -135,6 +150,12 @@ function M.demangle(symtab, loc)
     return string_format("%s:%d", symtab.lfunc[addr].source, loc.line)
   end
 
+  local key, value = avl.upper_bound(symtab.cfunc, addr)
+
+  if key then
+    return string_format("%s:%#x", value.name, key)
+  end
+
   return string_format("CFUNC %#x", addr)
 end
 
-- 
2.35.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols
  2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
  2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
@ 2022-04-06 13:16 ` Maxim Kokryashkin via Tarantool-patches
  1 sibling, 0 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2022-04-06 13:16 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit lets memprof extend its symtab when new
C-symbols appear after dlopen. The following data is
stored in event stream for each newly loaded symbol:

| (AEVENT_SYMTAB | ASOURCE_CFUNC) | symbol address | symbol name |
              1 byte                   8 bytes
              magic
              number

Every time an allocation occurs inside a C-function, memprof calls
the `dl_iterate_phdr` to check if the `dlpi_adds` has increased since
the last C-function call. If it has, memprof dumps symbols for last
`dlpi_adds` - `prev_dlpi_adds` libraries. Sometimes it may dump a
library that was already dumped, however, that's a rare case and
the parser handles it well.

Resolves tarantool/tarantool#5813
---
>> static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size, void *data)
>> @@ -350,23 +358,46 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size, void
>> struct symbol_resolver_conf *conf = data;
>> struct lj_wbuf *buf = conf->buf;
>> lua_State *L = conf->L;
>> + const uint8_t header = conf->header;
>> +
>> + const uint32_t lib_cnt_prev = *conf->lib_cnt;
>> + uint32_t lib_cnt = 0;

> Nit: I think that we can just declare `lib_cnt`, wo initialization.
I always initialize variables when declaring them. It's a good habit,
that sometimes really helps during the process of debugging. Ignoring


> Also, I noticed that this test is __really__ long now. May be we should
> use smaller amount of allocations?
Well, it takes a long time not because of the number of allocations, but because it
takes a long time to insert all of the C symbols into the AVL tree.
Anyway, I made the number of allocations less.

> Side note:
> Did you rebase you branch on Mikhail's one with enriched symtab for Lua
> functions and traces? It helps to avoid double review of the same things
> (and possible conflicts) in memprof.h, for example.
> Also, those commits should resolve address clashing when a different
> function allocates add the same address as the old deleted one.
No, I didn't because it is still not merged into master.
Also, it will be nice to add the following test:

> 0) Load 1 library, during memprof running, use functions from it with
>    allocations
> 1) Unload it, load the second one, use functions from it with
>    allocations
> 2) Load the 1st lib again, use functions again
> 3) Check that profile is correct
As Mike says in this[1] letter, it's impossible to unload a library
from Lua side. The only possible way for this to happen is when a
C-library for Lua loads and unloads another library. That case seems to
be really rare, so I don't think we need to test it.

[1]: https://www.freelists.org/post/luajit/BUG-Assertion-failures-when-unloading-and-reloading-the-ffi-package,1

 src/lj_memprof.c                              | 110 ++++++++++++++----
 src/lj_memprof.h                              |   8 +-
 .../gh-5813-resolving-of-c-symbols.test.lua   |  12 +-
 tools/memprof.lua                             |   5 +
 tools/memprof/parse.lua                       |  15 +++
 tools/utils/symtab.lua                        |   4 +
 6 files changed, 128 insertions(+), 26 deletions(-)

diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 74f9b810..2c6f67e7 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -122,7 +122,7 @@ static uint32_t ghashtab_size(ElfW(Addr) ghashtab)
 }
 
 static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
-    size_t sym_cnt, struct lj_wbuf *buf)
+    size_t sym_cnt, const uint8_t header, struct lj_wbuf *buf)
 {
   /*
   ** Index 0 in ELF symtab is used to represent undefined symbols. Hence, we
@@ -142,7 +142,7 @@ static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
     if (ELF32_ST_TYPE(sym[sym_index].st_info) == STT_FUNC &&
         sym[sym_index].st_name != 0) {
       char *sym_name = &strtab[sym[sym_index].st_name];
-      lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
+      lj_wbuf_addbyte(buf, header);
       lj_wbuf_addu64(buf, sym[sym_index].st_value + so_addr);
       lj_wbuf_addstring(buf, sym_name);
     }
@@ -150,7 +150,7 @@ static void write_c_symtab (ElfW(Sym *) sym, char *strtab, ElfW(Addr) so_addr,
 }
 
 static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
-    lua_State *L, const ElfW(Addr) so_addr)
+    lua_State *L, const uint8_t header, const ElfW(Addr) so_addr)
 {
   int status = 0;
 
@@ -244,7 +244,7 @@ static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
       sizeof(char) * strtab_size && ferror(elf_file) != 0)
     goto error;
 
-  write_c_symtab(sym, strtab, so_addr, sym_cnt, buf);
+  write_c_symtab(sym, strtab, so_addr, sym_cnt, header, buf);
 
   goto end;
 
@@ -264,7 +264,8 @@ static int dump_sht_symtab(const char *elf_name, struct lj_wbuf *buf,
   return status;
 }
 
-static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
+static int dump_dyn_symtab
+(struct dl_phdr_info *info, const uint8_t header, struct lj_wbuf *buf)
 {
   for (size_t header_index = 0; header_index < info->dlpi_phnum; ++header_index) {
     if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC) {
@@ -326,7 +327,7 @@ static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
       ** For more, see https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-48031.html
       */
       sym_cnt = ghashtab == 0 ? hashtab[1] : ghashtab_size(ghashtab);
-      write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, buf);
+      write_c_symtab(sym, strtab, info->dlpi_addr, sym_cnt, header, buf);
       return 0;
     }
   }
@@ -335,8 +336,13 @@ static int dump_dyn_symtab(struct dl_phdr_info *info, struct lj_wbuf *buf)
 }
 
 struct symbol_resolver_conf {
-  struct lj_wbuf *buf;
-  lua_State *L;
+  struct lj_wbuf *buf; /* Output buffer. */
+  lua_State *L; /* Current Lua state. */
+  const uint8_t header; /* Header for symbol entries to write. */
+
+  uint32_t cur_lib; /* Index of the lib currently dumped. */
+  uint32_t to_dump_cnt; /* Amount of libs to dump. */
+  uint32_t *lib_adds; /* Memprof's counter of libs. */
 };
 
 static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
@@ -345,23 +351,48 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
   struct symbol_resolver_conf *conf = data;
   struct lj_wbuf *buf = conf->buf;
   lua_State *L = conf->L;
+  const uint8_t header = conf->header;
+
+  uint32_t lib_cnt = 0;
+
+  /*
+  ** Check that dlpi_adds and dlpi_subs fields are available.
+  ** Assertion was taken from the GLIBC tests:
+  ** https://code.woboq.org/userspace/glibc/elf/tst-dlmodcount.c.html#37
+  */
+  lua_assert(info_size > offsetof(struct dl_phdr_info, dlpi_subs)
+      + sizeof(info->dlpi_subs));
 
-  UNUSED(info_size);
+  lib_cnt = info->dlpi_adds - *conf->lib_adds;
 
   /* Skip vDSO library. */
   if (info->dlpi_addr == getauxval(AT_SYSINFO_EHDR))
     return 0;
 
+  if ((conf->to_dump_cnt = info->dlpi_adds - *conf->lib_adds) == 0)
+    /* No new libraries, stop resolver. */
+    return 1;
+
+  if (conf->cur_lib < lib_cnt - conf->to_dump_cnt) {
+    /* That lib is already dumped, skip it. */
+    ++conf->cur_lib;
+    return 0;
+  }
+
+  if (conf->cur_lib == lib_cnt - conf->to_dump_cnt - 1)
+    /* Last library, update memrpof's lib counter. */
+    *conf->lib_adds = info->dlpi_adds;
+
   /*
   ** Main way: try to open ELF and read SHT_SYMTAB, SHT_STRTAB and SHT_HASH
   ** sections from it.
   */
-  if (dump_sht_symtab(info->dlpi_name, buf, L, info->dlpi_addr) == 0) {
-    /* Empty body. */
+  if (dump_sht_symtab(info->dlpi_name, buf, L, header, info->dlpi_addr) == 0) {
+    ++conf->cur_lib;
   }
   /* First fallback: dump functions only from PT_DYNAMIC segment. */
-  else if(dump_dyn_symtab(info, buf) == 0) {
-    /* Empty body. */
+  else if(dump_dyn_symtab(info, header, buf) == 0) {
+    ++conf->cur_lib;
   }
   /*
   ** Last resort: dump ELF size and address to show .so name for its functions
@@ -371,6 +402,7 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
     lj_wbuf_addbyte(buf, SYMTAB_CFUNC);
     lj_wbuf_addu64(buf, info->dlpi_addr);
     lj_wbuf_addstring(buf, info->dlpi_name);
+    ++conf->cur_lib;
   }
 
   return 0;
@@ -378,7 +410,8 @@ static int resolve_symbolnames(struct dl_phdr_info *info, size_t info_size,
 
 #endif /* LUAJIT_OS != LUAJIT_OS_OSX */
 
-static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
+static void dump_symtab(struct lj_wbuf *out, const struct global_State *g,
+    uint32_t *lib_adds)
 {
   const GCRef *iter = &g->gc.root;
   const GCobj *o;
@@ -387,8 +420,14 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 #if LUAJIT_OS != LUAJIT_OS_OSX
   struct symbol_resolver_conf conf = {
     out,
-    gco2th(gcref(g->cur_L))
+    gco2th(gcref(g->cur_L)),
+    SYMTAB_CFUNC,
+    0,
+    0,
+    lib_adds
   };
+#else
+  UNUSED(lib_adds);
 #endif
 
   /* Write prologue. */
@@ -447,6 +486,7 @@ struct memprof {
   struct alloc orig_alloc; /* Original allocator. */
   struct lj_memprof_options opt; /* Profiling options. */
   int saved_errno; /* Saved errno when profiler deinstrumented. */
+  uint32_t lib_adds; /* Number of libs loaded. Monotonic. */
 };
 
 static struct memprof memprof = {0};
@@ -482,15 +522,43 @@ static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t aevent,
 }
 
 static void memprof_write_cfunc(struct lj_wbuf *out, uint8_t aevent,
-				const GCfunc *fn)
+				const GCfunc *fn, lua_State *L, uint32_t *lib_adds)
 {
+#if LUAJIT_OS != LUAJIT_OS_OSX
+  /* Check if there are any new libs. */
+  struct symbol_resolver_conf conf = {
+    out,
+    L,
+    AEVENT_SYMTAB | ASOURCE_CFUNC,
+    0,
+    0,
+    lib_adds
+  };
+
+  /* XXX: Leaving the `vmstate` unchanged leads to an infinite
+  ** recursion, because allocations inside ELF parser are treated
+  ** as C-side allocations by memrpof. Setting the `vmstate` to
+  ** LJ_VMST_INTERP solves the issue.
+  */
+  global_State *g = G(L);
+  const uint32_t ostate = g->vmstate;
+  g->vmstate = ~LJ_VMST_INTERP;
+
+  dl_iterate_phdr(resolve_symbolnames, &conf);
+
+  /* Restore vmstate. */
+  g->vmstate = ostate;
+#else
+  UNUSED(lib_adds);
+#endif
+
   lj_wbuf_addbyte(out, aevent | ASOURCE_CFUNC);
   lj_wbuf_addu64(out, (uintptr_t)fn->c.f);
 }
 
 static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t aevent,
 				GCfunc *fn, struct lua_State *L,
-				cTValue *frame)
+				cTValue *frame, uint32_t *lib_adds)
 {
   cTValue *pframe = frame_prev(frame);
   GCfunc *pfn = frame_func(pframe);
@@ -503,7 +571,7 @@ static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t aevent,
   if (pfn != NULL && isluafunc(pfn))
     memprof_write_lfunc(out, aevent, pfn, L, frame);
   else
-    memprof_write_cfunc(out, aevent, fn);
+    memprof_write_cfunc(out, aevent, fn, L, lib_adds);
 }
 
 static void memprof_write_func(struct memprof *mp, uint8_t aevent)
@@ -516,9 +584,9 @@ static void memprof_write_func(struct memprof *mp, uint8_t aevent)
   if (isluafunc(fn))
     memprof_write_lfunc(out, aevent, fn, L, NULL);
   else if (isffunc(fn))
-    memprof_write_ffunc(out, aevent, fn, L, frame);
+    memprof_write_ffunc(out, aevent, fn, L, frame, &mp->lib_adds);
   else if (iscfunc(fn))
-    memprof_write_cfunc(out, aevent, fn);
+    memprof_write_cfunc(out, aevent, fn, L, &mp->lib_adds);
   else
     lua_assert(0);
 }
@@ -654,7 +722,7 @@ int lj_memprof_start(struct lua_State *L, const struct lj_memprof_options *opt)
 
   /* Init output. */
   lj_wbuf_init(&mp->out, mp_opt->writer, mp_opt->ctx, mp_opt->buf, mp_opt->len);
-  dump_symtab(&mp->out, mp->g);
+  dump_symtab(&mp->out, mp->g, &mp->lib_adds);
 
   /* Write prologue. */
   lj_wbuf_addn(&mp->out, ljm_header, ljm_header_len);
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index e7eae4c6..75d9f438 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -71,10 +71,11 @@
 ** prologue       := 'l' 'j' 'm' version reserved
 ** version        := <BYTE>
 ** reserved       := <BYTE> <BYTE> <BYTE>
-** event          := event-alloc | event-realloc | event-free
+** event          := event-alloc | event-realloc | event-free | event-symtab
 ** event-alloc    := event-header loc? naddr nsize
 ** event-realloc  := event-header loc? oaddr osize naddr nsize
 ** event-free     := event-header loc? oaddr osize
+** event-symtab   := event-header sym-addr sym-name
 ** event-header   := <BYTE>
 ** loc            := loc-lua | loc-c | loc-trace
 ** loc-lua        := sym-addr line-no
@@ -88,7 +89,11 @@
 ** naddr          := <ULEB128>
 ** osize          := <ULEB128>
 ** nsize          := <ULEB128>
+** sym-name       := string
 ** epilogue       := event-header
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
 **
 ** <BYTE>   :  A single byte (no surprises here)
 ** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
@@ -107,6 +112,7 @@
 */
 
 /* Allocation events. */
+#define AEVENT_SYMTAB  ((uint8_t)0)
 #define AEVENT_ALLOC   ((uint8_t)1)
 #define AEVENT_FREE    ((uint8_t)2)
 #define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))
diff --git a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
index b28a3948..e071d405 100644
--- a/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
+++ b/test/tarantool-tests/gh-5813-resolving-of-c-symbols.test.lua
@@ -18,6 +18,7 @@ local symtab = require "utils.symtab"
 local testboth = require "resboth"
 local testhash = require "reshash"
 local testgnuhash = require "resgnuhash"
+local memprof = require "memprof.parse"
 
 local TMP_BINFILE = arg[0]:gsub(".+/([^/]+)%.test%.lua$", "%.%1.memprofdata.tmp.bin")
 
@@ -51,13 +52,16 @@ misc.memprof.stop()
 
 local reader = bufread.new(TMP_BINFILE)
 local symbols = symtab.parse(reader)
+local events = memprof.parse(reader)
 
+for addr, event in pairs(events.symtab) do
+  symtab.add_cfunc(symbols, addr, event.name)
+end
+
+-- Static symbol resolution.
 test:ok(tree_contains(symbols.cfunc, "luaopen_os"))
 
--- Dynamic symbols resolution.
-generate_output(teststripped.allocate_string)
-reader = bufread.new(TMP_BINFILE)
-symbols = symtab.parse(reader)
+-- Dynamic symbol resolution. Newly loaded symbol resolution.
 test:ok(tree_contains(symbols.cfunc, "allocate_string"))
 
 -- .hash style symbol table.
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 18b44fdd..805b7e74 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -101,6 +101,11 @@ local function dump(inputfile)
   local reader = bufread.new(inputfile)
   local symbols = symtab.parse(reader)
   local events = memprof.parse(reader, symbols)
+
+  for addr, event in pairs(events.symtab) do
+    symtab.add_cfunc(symbols, addr, event.name)
+  end
+
   if not leak_only then
     view.profile_info(events, symbols)
   end
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index 47dbaee4..181a7e92 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -17,6 +17,7 @@ local LJM_CURRENT_VERSION = 0x02
 
 local LJM_EPILOGUE_HEADER = 0x80
 
+local AEVENT_SYMTAB = 0
 local AEVENT_ALLOC = 1
 local AEVENT_FREE = 2
 local AEVENT_REALLOC = 3
@@ -41,6 +42,7 @@ local function new_event(loc)
     free = 0,
     alloc = 0,
     primary = {},
+    name = nil
   }
 end
 
@@ -85,6 +87,17 @@ local function parse_location(reader, asource)
   return symtab.id(loc), loc
 end
 
+local function parse_symtab(reader, _, events, _)
+  local id = reader:read_uleb128()
+  local name = reader:read_string()
+
+  if not events[id] then
+    events[id] = new_event(0)
+  end
+
+  events[id].name = name
+end
+
 local function parse_alloc(reader, asource, events, heap)
   local id, loc = parse_location(reader, asource)
 
@@ -142,6 +155,7 @@ local function parse_free(reader, asource, events, heap)
 end
 
 local parsers = {
+  [AEVENT_SYMTAB] = {evname = "symtab", parse = parse_symtab},
   [AEVENT_ALLOC] = {evname = "alloc", parse = parse_alloc},
   [AEVENT_FREE] = {evname = "free", parse = parse_free},
   [AEVENT_REALLOC] = {evname = "realloc", parse = parse_realloc},
@@ -182,6 +196,7 @@ function M.parse(reader)
     realloc = {},
     free = {},
     heap = {},
+    symtab = {}
   }
 
   local magic = reader:read_octets(3)
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index aedb8da0..5197d956 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -159,4 +159,8 @@ function M.demangle(symtab, loc)
   return string_format("CFUNC %#x", addr)
 end
 
+function M.add_cfunc(symtab, addr, name)
+  symtab.cfunc = avl.insert(symtab.cfunc, addr, {name = name})
+end
+
 return M
-- 
2.35.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-04-06 13:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 13:16 [Tarantool-patches] [PATCH luajit v7 0/2] memprof: C-symbols resolving Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 1/2] memprof: extend symtab with C-symbols Maxim Kokryashkin via Tarantool-patches
2022-04-06 13:16 ` [Tarantool-patches] [PATCH luajit v7 2/2] memprof: enrich symtab with newly loaded symbols Maxim Kokryashkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox