Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/3] memprof: add demangling capabilities for C functions
@ 2021-07-23 12:39 Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table Maxim Kokryashkin via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-07-23 12:39 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Once applied, this patch series will make memprof able to demangle 
C fucntions. It means that memprof report will show their names
instead of meaningless address.

That's like memprof report for lua script with C modules looks like now:
```
ALLOCATIONS
@test.lua:0: 4 events   +640 bytes      -0 bytes
INTERNAL: 3 events      +1164 bytes     -0 bytes
CFUNC 0x7f939831a119: 1 events  +29 bytes       -0
bytes

REALLOCATIONS
INTERNAL: 1 events      +128 bytes      -256 bytes

DEALLOCATIONS
INTERNAL: 254 events    +0 bytes        -61020 bytes
        Overrides:
                @test.lua:0

HEAP SUMMARY:
INTERNAL holds 1164 bytes: 3 allocs, 0 frees
@test.lua:0 holds 384 bytes: 4 allocs, 1 frees
CFUNC 0x7f939831a119 holds 29 bytes: 1 allocs, 0 frees
```

Memprof report after applying the patch for the same script:

```
ALLOCATIONS
@test.lua:0: 4 events   +640 bytes      -0 bytes
INTERNAL: 3 events      +1164 bytes     -0 bytes
/home/maxim/Programming/memprof-demangling/hello.so:say_hello 0x7f939831a119: 1 events  +29 bytes       -0
bytes

REALLOCATIONS
INTERNAL: 1 events      +128 bytes      -256 bytes

DEALLOCATIONS
INTERNAL: 254 events    +0 bytes        -61020 bytes
        Overrides:
                @test.lua:0


HEAP SUMMARY:
INTERNAL holds 1164 bytes: 3 allocs, 0 frees
@test.lua:0 holds 384 bytes: 4 allocs, 1 frees
/home/maxim/Programming/memprof-demangling/hello.so:say_hello 0x7f939831a119 holds 29 bytes: 1 allocs, 0 frees
```

Part of tarantool/tarantool#5813

Maxim Kokryashkin (3):
  memprof: extend memprof symbol table
  memprof: update memprof parser
  test: adapt memprof tests to new symbol table format

 CMakeLists.txt                                |   2 +
 src/CMakeLists.txt                            |   1 +
 src/lj_hash.c                                 | 136 ++++++++++++++++++
 src/lj_hash.h                                 |  28 ++++
 src/lj_memprof.c                              |  27 +++-
 src/lj_memprof.h                              |   1 +
 .../misclib-memprof-lapi.test.lua             |   5 +-
 tools/CMakeLists.txt                          |   2 +
 tools/utils/hash.lua                          |  99 +++++++++++++
 tools/utils/symtab.lua                        | 112 ++++++++++++++-
 10 files changed, 405 insertions(+), 8 deletions(-)
 create mode 100644 src/lj_hash.c
 create mode 100644 src/lj_hash.h
 create mode 100644 tools/utils/hash.lua

---
Github branch in tarantool/tarantool:
https://github.com/tarantool/tarantool/tree/fckxorg/gh-5813-memprof-demangling

Github branch in tarantool/luajit:
https://github.com/tarantool/luajit/tree/fckxorg/gh-5813-demangling-of-c-symbols

2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table
  2021-07-23 12:39 [Tarantool-patches] [PATCH 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
@ 2021-07-23 12:39 ` Maxim Kokryashkin via Tarantool-patches
  2021-07-28  8:16   ` Sergey Kaplun via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 2/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 3/3] test: adapt memprof tests to new symbol table format Maxim Kokryashkin via Tarantool-patches
  2 siblings, 1 reply; 5+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-07-23 12:39 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit enriches memprof's symbol table with information about loaded
".so" files. That information will provide demangling capabilities to
the parser.

The following data is stored for each ".so" library:
| SYMTAB_SO | path to .so | it's address |  CRC32 Hash  |
  1 byte                       8 bytes       8 bytes
  magic
  number

C symbol address from profile events with an address of the ".so" file will
give us an offset, and it can be used to get the symbol in that file.

CRC32 hash will be used by parser to determine whether the shared
object that was used during the process profiing and during the process
of parsing is the same.

The CRC32 hash implementation was taken from gcc libberty, which is
distributed under GNU GPL license

Part of tarantool/tarantool#5813
---
 src/CMakeLists.txt |   1 +
 src/lj_hash.c      | 136 +++++++++++++++++++++++++++++++++++++++++++++
 src/lj_hash.h      |  28 ++++++++++
 src/lj_memprof.c   |  27 ++++++++-
 src/lj_memprof.h   |   1 +
 5 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 src/lj_hash.c
 create mode 100644 src/lj_hash.h

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 809aac68..feefb412 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -63,6 +63,7 @@ make_source_list(SOURCES_UTILS
     lj_utils_leb128.c
     lj_vmmath.c
     lj_wbuf.c
+    lj_hash.c
 )
 
 make_source_list(SOURCES_PROFILER
diff --git a/src/lj_hash.c b/src/lj_hash.c
new file mode 100644
index 00000000..88c41a4b
--- /dev/null
+++ b/src/lj_hash.c
@@ -0,0 +1,136 @@
+/*
+   Copyright (C) 2009-2021 Free Software Foundation, Inc.
+   This file is part of the libiberty library.
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+   In addition to the permissions in the GNU General Public License, the
+   Free Software Foundation gives you unlimited permission to link the
+   compiled version of this file into combinations with other programs,
+   and to distribute those combinations without any restriction coming
+   from the use of this file.  (The General Public License restrictions
+   do apply in other respects; for example, they cover modification of
+   the file, and distribution when not linked into a combined
+   executable.)
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+*/
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "lj_hash.h"
+
+static const uint32_t crc32_table[] =
+{
+  0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
+  0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
+  0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
+  0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
+  0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
+  0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
+  0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011,
+  0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd,
+  0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
+  0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
+  0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81,
+  0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
+  0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49,
+  0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
+  0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
+  0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d,
+  0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae,
+  0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
+  0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
+  0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
+  0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
+  0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02,
+  0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066,
+  0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
+  0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
+  0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692,
+  0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
+  0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a,
+  0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
+  0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
+  0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686,
+  0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a,
+  0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
+  0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
+  0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
+  0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
+  0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47,
+  0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b,
+  0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
+  0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
+  0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7,
+  0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
+  0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f,
+  0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
+  0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
+  0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b,
+  0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f,
+  0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
+  0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
+  0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
+  0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
+  0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24,
+  0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30,
+  0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
+  0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
+  0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654,
+  0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
+  0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c,
+  0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
+  0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
+  0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0,
+  0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c,
+  0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
+  0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
+};
+
+uint32_t lj_crc32 (const char *buf, size_t len, uint32_t init)
+{
+  uint32_t crc = init;
+  while (len--)
+    {
+      crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
+      buf++;
+    }
+  return crc;
+}
+
+uint32_t lj_get_file_crc32(const char* name) {
+  FILE* so = fopen(name, "r");
+  if(!so) {
+    return 0;
+  }
+
+  fseek(so, 0, SEEK_END);
+  size_t size = ftell(so);
+  rewind(so);
+
+  char* buf = calloc(size, sizeof(char));
+  if(!buf) {
+    fclose(so);
+    return 0;
+  }
+
+  if(size != fread(buf, sizeof(char), size, so)) {
+    free(buf);
+    fclose(so);
+    return 0;
+  }
+
+  uint32_t hash = lj_crc32(buf, size, 0);
+  free(buf);
+  fclose(so);
+  return hash;
+}
+
diff --git a/src/lj_hash.h b/src/lj_hash.h
new file mode 100644
index 00000000..aca24e97
--- /dev/null
+++ b/src/lj_hash.h
@@ -0,0 +1,28 @@
+#include <stdint.h>
+#include <sys/types.h>
+
+/*
+@deftypefn Extension {unsigned int} crc32 (const unsigned char *@var{buf}, @
+  int @var{len}, unsigned int @var{init})
+Compute the 32-bit CRC of @var{buf} which has length @var{len}.  The
+starting value is @var{init}; this may be used to compute the CRC of
+data split across multiple buffers by passing the return value of each
+call as the @var{init} parameter of the next.
+This is used by the @command{gdb} remote protocol for the @samp{qCRC}
+command.  In order to get the same results as gdb for a block of data,
+you must pass the first CRC parameter as @code{0xffffffff}.
+This CRC can be specified as:
+  Width  : 32
+  Poly   : 0x04c11db7
+  Init   : parameter, typically 0xffffffff
+  RefIn  : false
+  RefOut : false
+  XorOut : 0
+This differs from the "standard" CRC-32 algorithm in that the values
+are not reflected, and there is no final XOR value.  These differences
+make it easy to compose the values of multiple blocks.
+@end deftypefn
+*/
+
+uint32_t lj_crc32(const char *buf, size_t len, uint32_t init);
+uint32_t lj_get_file_crc32(const char* name);
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..4e8183aa 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -5,13 +5,19 @@
 ** Copyright (C) 2015-2019 IPONWEB Ltd.
 */
 
+#define _GNU_SOURCE
+#include <link.h>
+
+#include "lj_wbuf.h"
 #define lj_memprof_c
 #define LUA_CORE
-
 #include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
 
 #include "lj_arch.h"
 #include "lj_memprof.h"
+#include "lj_hash.h"
 
 #if LJ_HASMEMPROF
 
@@ -24,6 +30,22 @@
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
 					   0x0, 0x0, 0x0};
 
+int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data)
+{
+  struct lj_wbuf *buf = data;
+
+  if(info->dlpi_name && strlen(info->dlpi_name)) {
+    lj_wbuf_addbyte(buf, SYMTAB_SO);
+    lj_wbuf_addstring(buf, info->dlpi_name);
+    lj_wbuf_addu64(buf, info->dlpi_addr);
+
+    uint64_t so_hash = lj_get_file_crc32(info->dlpi_name);
+    lj_wbuf_addu64(buf, so_hash);
+  }
+
+  return 0;
+}
+
 static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 {
   const GCRef *iter = &g->gc.root;
@@ -49,6 +71,9 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
     iter = &o->gch.nextgc;
   }
 
+  /* Write shared libraries */
+  dl_iterate_phdr(write_shared_obj, out);
+
   lj_wbuf_addbyte(out, SYMTAB_FINAL);
 }
 
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..d53f77bc 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -51,6 +51,7 @@
 */
 
 #define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_SO ((uint8_t)1)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
 #define LJM_CURRENT_FORMAT_VERSION 0x01
-- 
2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Tarantool-patches] [PATCH 2/3] memprof: update memprof parser
  2021-07-23 12:39 [Tarantool-patches] [PATCH 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table Maxim Kokryashkin via Tarantool-patches
@ 2021-07-23 12:39 ` Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 3/3] test: adapt memprof tests to new symbol table format Maxim Kokryashkin via Tarantool-patches
  2 siblings, 0 replies; 5+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-07-23 12:39 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit introduces demangling of C symbols to memprof parser.

As symbol table format has changed, parser needed to be updated too. Now
the parser supports new symbol table entries, containing data about shared
objects, that were loaded at the moment of data collection.

Secondly, we needed a way to verify, that .so libraries that we encounter
during the process of parsing are the same libraires we have encountered during
the process of a data collection. A full CRC32 hash of .so file contents is used
for that purpose.

C symbols resolution is implemented with dladdr, which resolves symbols
well only if they are loaded into the dynamic symbol table. Hence, LuaJIT now has to be
compiled with -rdynamic option, which forces the dynamic linker to add
all the symbols from loaded .so libraries to the dynamic symbol table.

Part of tarantool/tarantool#5813
---
 CMakeLists.txt         |   2 +
 tools/CMakeLists.txt   |   2 +
 tools/utils/hash.lua   |  99 ++++++++++++++++++++++++++++++++++++
 tools/utils/symtab.lua | 112 +++++++++++++++++++++++++++++++++++++++--
 4 files changed, 210 insertions(+), 5 deletions(-)
 create mode 100644 tools/utils/hash.lua

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 5348e043..08243f11 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -81,6 +81,8 @@ endif()
 
 # --- Compilation flags setup --------------------------------------------------
 
+AppendFlags(TARGET_C_FLAGS -rdynamic)
+
 if(NOT CMAKE_INSTALL_PREFIX STREQUAL "/usr/local")
   AppendFlags(TARGET_C_FLAGS -DLUA_ROOT='"${CMAKE_INSTALL_PREFIX}"')
 endif()
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index 61830e44..a70be4ea 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -32,6 +32,7 @@ else()
     memprof.lua
     utils/bufread.lua
     utils/symtab.lua
+    utils/hash.lua
   )
   list(APPEND LUAJIT_TOOLS_DEPS tools-parse-memprof)
 
@@ -48,6 +49,7 @@ else()
   install(FILES
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/bufread.lua
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/symtab.lua
+      ${CMAKE_CURRENT_SOURCE_DIR}/utils/hash.lua
     DESTINATION ${LUAJIT_DATAROOTDIR}/utils
     PERMISSIONS
       OWNER_READ OWNER_WRITE
diff --git a/tools/utils/hash.lua b/tools/utils/hash.lua
new file mode 100644
index 00000000..b6abc89c
--- /dev/null
+++ b/tools/utils/hash.lua
@@ -0,0 +1,99 @@
+local bit = require "bit"
+
+local M = {}
+
+local CRC32 = {
+  0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
+  0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
+  0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
+  0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
+  0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
+  0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
+  0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011,
+  0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd,
+  0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
+  0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
+  0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81,
+  0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
+  0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49,
+  0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
+  0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
+  0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d,
+  0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae,
+  0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
+  0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
+  0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
+  0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
+  0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02,
+  0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066,
+  0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
+  0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
+  0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692,
+  0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
+  0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a,
+  0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
+  0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
+  0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686,
+  0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a,
+  0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
+  0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
+  0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
+  0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
+  0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47,
+  0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b,
+  0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
+  0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
+  0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7,
+  0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
+  0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f,
+  0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
+  0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
+  0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b,
+  0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f,
+  0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
+  0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
+  0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
+  0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
+  0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24,
+  0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30,
+  0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
+  0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
+  0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654,
+  0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
+  0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c,
+  0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
+  0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
+  0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0,
+  0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c,
+  0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
+  0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
+}
+
+local xor = bit.bxor
+local lshift = bit.lshift
+local rshift = bit.rshift
+local band = bit.band
+
+function M.crc32(str)
+    str = tostring(str)
+    local len = string.len(str)
+    local crc = 0
+    local i = 1
+
+    while len > 0 do
+        local byte = string.byte(str, i)
+        i = i + 1
+        len = len - 1
+
+        local tab_idx = band(xor(rshift(crc, 24), byte), 0xFF) + 1
+        crc = xor(band(lshift(crc, 8), 2^32 - 1), CRC32[tab_idx])
+    end
+
+    -- dirty hack for bitop return number < 0
+    if crc < 0 then crc = crc + 2 ^ 32 end
+
+    return crc
+end
+
+
+return M
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..c5724d80 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -5,16 +5,57 @@
 -- Copyright (C) 2015-2019 IPONWEB Ltd.
 
 local bit = require "bit"
+local io = require "io"
+local hash = require "utils.hash"
 
 local band = bit.band
 local string_format = string.format
 
+local ffi = require "ffi"
+local dl = ffi.load "dl"
+
+ffi.cdef[[
+  typedef struct {
+    const char *dli_fname;
+    void *dli_fbase;
+    const char *dli_sname;
+    void *dli_saddr;
+  } Dl_info;
+
+  typedef struct
+  {
+    uint32_t        p_type;
+    uint32_t        p_flags;
+    uint64_t        p_offset;
+    uint64_t        p_vaddr;
+    uint64_t        p_paddr;
+    uint64_t        p_filesz;
+    uint64_t        p_memsz;
+    uint64_t        p_align;
+  } Elf64_Phdr;
+
+  struct dl_phdr_info {
+    uint64_t dlpi_addr;
+    const char *dlpi_name;
+    const Elf64_Phdr *dlpi_phdr;
+    uint16_t dlpi_phnum;
+  };
+
+  void *dlopen(const char *filename, int flags);
+  int dlclose(void *handle);
+  int dladdr(const void *addr, Dl_info *info);
+  int dl_iterate_phdr(int (*callback) (struct dl_phdr_info *info, size_t size, void *data), void *data);
+]]
+
 local LJS_MAGIC = "ljs"
 local LJS_CURRENT_VERSION = 1
 local LJS_EPILOGUE_HEADER = 0x80
 local LJS_SYMTYPE_MASK = 0x03
 
 local SYMTAB_LFUNC = 0
+local SYMTAB_SO = 1
+
+local RTLD_NOW = 0x00002
 
 local M = {}
 
@@ -24,18 +65,59 @@ local function parse_sym_lfunc(reader, symtab)
   local sym_chunk = reader:read_string()
   local sym_line = reader:read_uleb128()
 
-  symtab[sym_addr] = {
+  symtab[SYMTAB_LFUNC][sym_addr] = {
     source = sym_chunk,
     linedefined = sym_line,
   }
 end
 
+-- Parse a single entry in a symtab: .so library
+local function parse_sym_so(reader, symtab)
+  local path = reader:read_string()
+  local addr = reader:read_uleb128()
+  local so_hash = reader:read_uleb128()
+
+  local handle = dl.dlopen(path, RTLD_NOW);
+
+  if handle == nil then
+    return
+  end
+
+  local file = io.open(path, "rb")
+
+  if file == nil then
+    dl.dlclose(handle);
+    return
+  end
+
+  local content = file:read("*a")
+  local size = string.len(content)
+  file:close()
+
+  if hash.crc32(content) ~= so_hash then
+    dl.dlclose(handle)
+    return
+  end
+
+  symtab[SYMTAB_SO][path] = {
+    handle = handle,
+    size = size,
+    old_addr = addr,
+    new_addr = ffi.cast("void*", ffi.cast("uint64_t*", handle)[0])
+  }
+end
+
 local parsers = {
   [SYMTAB_LFUNC] = parse_sym_lfunc,
+  [SYMTAB_SO] = parse_sym_so
 }
 
 function M.parse(reader)
-  local symtab = {}
+  local symtab = {
+    [SYMTAB_LFUNC] = {},
+    [SYMTAB_SO] = {}
+  }
+
   local magic = reader:read_octets(3)
   local version = reader:read_octets(1)
 
@@ -69,7 +151,6 @@ function M.parse(reader)
       parsers[sym_type](reader, symtab)
     end
   end
-
   return symtab
 end
 
@@ -80,8 +161,29 @@ function M.demangle(symtab, loc)
     return "INTERNAL"
   end
 
-  if symtab[addr] then
-    return string_format("%s:%d", symtab[addr].source, loc.line)
+  if symtab[SYMTAB_LFUNC][addr] then
+    return string_format("%s:%d", symtab[SYMTAB_LFUNC][addr].source, loc.line)
+  end
+
+  local pDl_info = ffi.new'Dl_info[1]'
+
+  for _, info in pairs(symtab[SYMTAB_SO]) do
+    local offset = addr - info.old_addr
+
+    if offset <= info.size and offset > 0 then
+      local ptr = ffi.cast("void*", offset + ffi.cast('uintptr_t', info.new_addr))
+
+      if 0 ~= dl.dladdr(ptr, pDl_info) then
+        local file = ffi.string(pDl_info[0].dli_fname)
+
+        local symbol = 'unresolved symbol'
+        if pDl_info[0].dli_sname ~= nil then
+          symbol = ffi.string(pDl_info[0].dli_sname)
+        end
+
+        return string_format("%s:%s %#x", file, symbol, addr)
+      end
+    end
   end
 
   return string_format("CFUNC %#x", addr)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Tarantool-patches] [PATCH 3/3] test: adapt memprof tests to new symbol table format
  2021-07-23 12:39 [Tarantool-patches] [PATCH 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table Maxim Kokryashkin via Tarantool-patches
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 2/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
@ 2021-07-23 12:39 ` Maxim Kokryashkin via Tarantool-patches
  2 siblings, 0 replies; 5+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-07-23 12:39 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Since memprof's symbol table format has changed to support demangling, it is
necessary to adapt tests to it.

Closes tarantool/tarantool#5813
---
 test/tarantool-tests/misclib-memprof-lapi.test.lua | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 06d96b3b..65b6afa6 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -53,6 +53,7 @@ local function generate_output(filename)
 end
 
 local function fill_ev_type(events, symbols, event_type)
+  local SYMTAB_LFUNC = 0
   local ev_type = {}
   for _, event in pairs(events[event_type]) do
     local addr = event.loc.addr
@@ -61,10 +62,10 @@ local function fill_ev_type(events, symbols, event_type)
         name = "INTERNAL",
         num = event.num,
     }
-    elseif symbols[addr] then
+    elseif symbols[SYMTAB_LFUNC][addr] then
       ev_type[event.loc.line] = {
         name = string.format(
-          "%s:%d", symbols[addr].source, symbols[addr].linedefined
+          "%s:%d", symbols[SYMTAB_LFUNC][addr].source, symbols[SYMTAB_LFUNC][addr].linedefined
         ),
         num = event.num,
       }
-- 
2.32.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table
  2021-07-23 12:39 ` [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table Maxim Kokryashkin via Tarantool-patches
@ 2021-07-28  8:16   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-07-28  8:16 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

On 23.07.21, Maxim Kokryashkin wrote:

| Subject: Re: [PATCH 1/3] memprof: extend memprof symbol table

Please use "PATCH luajit" prefix, for the LuaJIT-related patches.

| memprof: extend memprof symbol table

Nit: I suppose we can reformulate the header like the following, to
clarify it:

| memprof: extend symtab with information about .so libs

Feel free to change it on your own.

> This commit enriches memprof's symbol table with information about loaded
> ".so" files. That information will provide demangling capabilities to
> the parser.
> 
> The following data is stored for each ".so" library:
> | SYMTAB_SO | path to .so | it's address |  CRC32 Hash  |
>   1 byte                       8 bytes       8 bytes
>   magic
>   number
> 
> C symbol address from profile events with an address of the ".so" file will
> give us an offset, and it can be used to get the symbol in that file.
> 
> CRC32 hash will be used by parser to determine whether the shared

Typo: s/by parser/by the parser/

> object that was used during the process profiing and during the process

Typo: s/profiing/profiling/

> of parsing is the same.
> 
> The CRC32 hash implementation was taken from gcc libberty, which is

Typo? s/was/is/ IINM we prefer to use Present Simple in comments and
commit messages.

> distributed under GNU GPL license

Typo: missed dot at the end of the sentence

> 
> Part of tarantool/tarantool#5813
> ---

IINM we should compare this approach with a full dump of all .so
functions. May you provide some benches of your approach against the
aforementioned?

>  src/CMakeLists.txt |   1 +
>  src/lj_hash.c      | 136 +++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_hash.h      |  28 ++++++++++

I suppose, that header file should be moved to the <lj_utils.h>, like it
is done for uleb. The hash interfaces should start with `lj_utils_`.

I suppose that the introducion of this module with description about its
functionality should be done in the separate commit.

Also, it will be nice to have some tests to verify its behaviour.

Also, I'd like to see some benchmarks to compare it with other crc32
implementations.

>  src/lj_memprof.c   |  27 ++++++++-
>  src/lj_memprof.h   |   1 +
>  5 files changed, 192 insertions(+), 1 deletion(-)
>  create mode 100644 src/lj_hash.c
>  create mode 100644 src/lj_hash.h

I suppose that we can join commit provides the new symtab behaviour and
the commit for parser with tests.

> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index 809aac68..feefb412 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -63,6 +63,7 @@ make_source_list(SOURCES_UTILS
>      lj_utils_leb128.c
>      lj_vmmath.c
>      lj_wbuf.c
> +    lj_hash.c

Nit: All entries should be alphabetically sorted.

>  )

Please, update build with Makefile.original as well.
Also, new files should be included in the <lj_amalg.c> and the
corresponding build.

>  
>  make_source_list(SOURCES_PROFILER
> diff --git a/src/lj_hash.c b/src/lj_hash.c
> new file mode 100644
> index 00000000..88c41a4b
> --- /dev/null
> +++ b/src/lj_hash.c
> @@ -0,0 +1,136 @@
> +/*

Please, use the following multiline comment:
| /*
| **
| */

> +   Copyright (C) 2009-2021 Free Software Foundation, Inc.
> +   This file is part of the libiberty library.
> +   This file is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 2 of the License, or
> +   (at your option) any later version.
> +   In addition to the permissions in the GNU General Public License, the
> +   Free Software Foundation gives you unlimited permission to link the
> +   compiled version of this file into combinations with other programs,
> +   and to distribute those combinations without any restriction coming
> +   from the use of this file.  (The General Public License restrictions
> +   do apply in other respects; for example, they cover modification of
> +   the file, and distribution when not linked into a combined
> +   executable.)
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +   You should have received a copy of the GNU General Public License
> +   along with this program; if not, write to the Free Software
> +   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
> +*/
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +#include "lj_hash.h"
> +
> +static const uint32_t crc32_table[] =
> +{
> +  0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
> +  0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
> +  0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
> +  0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
> +  0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
> +  0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
> +  0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011,
> +  0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd,
> +  0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
> +  0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
> +  0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81,
> +  0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
> +  0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49,
> +  0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
> +  0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
> +  0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d,
> +  0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae,
> +  0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
> +  0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
> +  0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
> +  0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
> +  0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02,
> +  0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066,
> +  0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
> +  0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
> +  0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692,
> +  0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
> +  0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a,
> +  0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
> +  0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
> +  0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686,
> +  0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a,
> +  0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
> +  0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
> +  0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
> +  0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
> +  0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47,
> +  0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b,
> +  0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
> +  0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
> +  0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7,
> +  0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
> +  0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f,
> +  0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
> +  0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
> +  0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b,
> +  0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f,
> +  0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
> +  0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
> +  0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
> +  0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
> +  0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24,
> +  0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30,
> +  0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
> +  0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
> +  0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654,
> +  0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
> +  0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c,
> +  0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
> +  0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
> +  0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0,
> +  0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c,
> +  0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
> +  0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
> +};
> +
> +uint32_t lj_crc32 (const char *buf, size_t len, uint32_t init)
> +{
> +  uint32_t crc = init;
> +  while (len--)
> +    {

This is not considering our code style -- just use the same line
with `while`.

> +      crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
> +      buf++;
> +    }
> +  return crc;
> +}
> +
> +uint32_t lj_get_file_crc32(const char* name) {

I suppose, that it should be done in the other module -- maybe memprof
itself.

Also, this function return 0 on failure and hashsum otherwise. But it is
possibly that the hashsum equals zero. What is the behaviour in this
case?

> +  FILE* so = fopen(name, "r");
> +  if(!so) {

Typo: s/if(!so)/if (!so)
Here and similar typos below.

Nit: Brackets are excess.

> +    return 0;
> +  }
> +
> +  fseek(so, 0, SEEK_END);
> +  size_t size = ftell(so);

Nit: please declare all necessary variable at the beggining of the
scope. See also warnings, when build with:
-Wdeclaration-after-statement

Nit: See no changes related to this variable, so it can be declared as
`const`.

Why do you prefer to read the whole file at once instead read it
piece by piece and calculate the hash from the reading stream?
It has less memory usage -- we even can use memory on the stack.
Also, with this we can avoid OOM errors.

> +  rewind(so);
> +
> +  char* buf = calloc(size, sizeof(char));

It is better to use LuaJIT allocator here, as it is done for all
internal structures.

> +  if(!buf) {
> +    fclose(so);
> +    return 0;
> +  }
> +
> +  if(size != fread(buf, sizeof(char), size, so)) {
> +    free(buf);
> +    fclose(so);
> +    return 0;
> +  }
> +
> +  uint32_t hash = lj_crc32(buf, size, 0);

Why do you use 0 instead of 0xffffffff mentioned in the <lj_hash.h>?

Nit: See no changes related to this variable, so it can be declared as
`const`.

> +  free(buf);
> +  fclose(so);
> +  return hash;
> +}
> +
> diff --git a/src/lj_hash.h b/src/lj_hash.h
> new file mode 100644
> index 00000000..aca24e97
> --- /dev/null
> +++ b/src/lj_hash.h
> @@ -0,0 +1,28 @@

Should it contain copyright header too?

> +#include <stdint.h>
> +#include <sys/types.h>
> +
> +/*
> +@deftypefn Extension {unsigned int} crc32 (const unsigned char *@var{buf}, @

We don't use the autogenerated documenation tools for now, so this
mementiones can be dropped.
Here and below.

> +  int @var{len}, unsigned int @var{init})
> +Compute the 32-bit CRC of @var{buf} which has length @var{len}.  The

Typo: double whitespace

> +starting value is @var{init}; this may be used to compute the CRC of
> +data split across multiple buffers by passing the return value of each
> +call as the @var{init} parameter of the next.
> +This is used by the @command{gdb} remote protocol for the @samp{qCRC}
> +command.  In order to get the same results as gdb for a block of data,

Typo: double whitespace

> +you must pass the first CRC parameter as @code{0xffffffff}.
> +This CRC can be specified as:
> +  Width  : 32
> +  Poly   : 0x04c11db7
> +  Init   : parameter, typically 0xffffffff
> +  RefIn  : false
> +  RefOut : false
> +  XorOut : 0
> +This differs from the "standard" CRC-32 algorithm in that the values
> +are not reflected, and there is no final XOR value.  These differences

Typo: double whitespace

> +make it easy to compose the values of multiple blocks.
> +@end deftypefn
> +*/
> +
> +uint32_t lj_crc32(const char *buf, size_t len, uint32_t init);
> +uint32_t lj_get_file_crc32(const char* name);
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 2c1ef3b8..4e8183aa 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -5,13 +5,19 @@
>  ** Copyright (C) 2015-2019 IPONWEB Ltd.
>  */
>  
> +#define _GNU_SOURCE

Locally, I need to define not _GNU_SOURCE, but __USE_GNU. May you please
check what exactly define is required and why?

> +#include <link.h>
> +
> +#include "lj_wbuf.h"

Nit: "lj_memprof.h" already includes "lj_wbuf.h", so this include is excess.
Also, please move all requires below (including <link.h>)

>  #define lj_memprof_c
>  #define LUA_CORE
> -

Nit: This change is redundant. Also, it separates defines related to
build system with the other code. Please, revert it.

>  #include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
>  
>  #include "lj_arch.h"
>  #include "lj_memprof.h"
> +#include "lj_hash.h"

Nit: "lj_hash.h" is not required for build without memprof, so it can be
moved under the following #if directive.

>  
>  #if LJ_HASMEMPROF
>  
> @@ -24,6 +30,22 @@
>  static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
>  					   0x0, 0x0, 0x0};

The symtab version must be updated as far as we update the protocol.

>  
> +int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data)
> +{
> +  struct lj_wbuf *buf = data;
> +
> +  if(info->dlpi_name && strlen(info->dlpi_name)) {
> +    lj_wbuf_addbyte(buf, SYMTAB_SO);
> +    lj_wbuf_addstring(buf, info->dlpi_name);
> +    lj_wbuf_addu64(buf, info->dlpi_addr);
> +
> +    uint64_t so_hash = lj_get_file_crc32(info->dlpi_name);

Sometimes this call may return the error value -- we should check it
to avoid groupping all .so files into one by the same wrong hash.

Nit: See no changes related to this variable, so it can be declared as
`const`.

> +    lj_wbuf_addu64(buf, so_hash);
> +  }
> +
> +  return 0;
> +}
> +
>  static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>  {
>    const GCRef *iter = &g->gc.root;
> @@ -49,6 +71,9 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>      iter = &o->gch.nextgc;
>    }
>  
> +  /* Write shared libraries */

Typo: s/libraries/libraries./

> +  dl_iterate_phdr(write_shared_obj, out);
> +
>    lj_wbuf_addbyte(out, SYMTAB_FINAL);
>  }
>  
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index 3417475d..d53f77bc 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
> @@ -51,6 +51,7 @@
>  */
>  
>  #define SYMTAB_LFUNC ((uint8_t)0)
> +#define SYMTAB_SO ((uint8_t)1)

Side note: I don't remember how we align similar definitions, but
IMHO we can add some whitespaces to improve readability :).

>  #define SYMTAB_FINAL ((uint8_t)0x80)
>  
>  #define LJM_CURRENT_FORMAT_VERSION 0x01
> -- 
> 2.32.0
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-28  8:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-23 12:39 [Tarantool-patches] [PATCH 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
2021-07-23 12:39 ` [Tarantool-patches] [PATCH 1/3] memprof: extend memprof symbol table Maxim Kokryashkin via Tarantool-patches
2021-07-28  8:16   ` Sergey Kaplun via Tarantool-patches
2021-07-23 12:39 ` [Tarantool-patches] [PATCH 2/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
2021-07-23 12:39 ` [Tarantool-patches] [PATCH 3/3] test: adapt memprof tests to new symbol table format Maxim Kokryashkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox