Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions
@ 2021-08-20 11:10 Maxim Kokryashkin via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation Maxim Kokryashkin via Tarantool-patches
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-08-20 11:10 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

Changes in v2:
  - Fixed comments as per review by Sergey


Maxim Kokryashkin (3):
  utils: add CRC32 hash implementation
  memprof: extend symtab with information about .so libs
  memprof: update memprof parser

 CMakeLists.txt                                |   2 +
 src/CMakeLists.txt                            |   1 +
 src/Makefile.dep.original                     |   3 +-
 src/Makefile.original                         |   2 +-
 src/lj_memprof.c                              |  60 +++++++++
 src/lj_memprof.h                              |   3 +-
 src/lj_utils.h                                |  29 +++++
 src/lj_utils_hash.c                           | 110 +++++++++++++++++
 src/ljamalg.c                                 |   1 +
 .../misclib-memprof-lapi.test.lua             |   5 +-
 tools/CMakeLists.txt                          |   2 +
 tools/utils/hash.lua                          |  99 +++++++++++++++
 tools/utils/symtab.lua                        | 114 +++++++++++++++++-
 13 files changed, 420 insertions(+), 11 deletions(-)
 create mode 100644 src/lj_utils_hash.c
 create mode 100644 tools/utils/hash.lua

--- 
Github branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-5813-demangling-of-c-symbols
Issue: https://github.com/tarantool/tarantool/issues/5813

2.32.0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation
  2021-08-20 11:10 [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
@ 2021-08-20 11:10 ` Maxim Kokryashkin via Tarantool-patches
  2021-08-30 14:08   ` Sergey Kaplun via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs Maxim Kokryashkin via Tarantool-patches
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-08-20 11:10 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit adds an implementation of the CRC32 hash. The
implementation is taken from GNU libberty so it differs from the
classic CRC32 implementation. The values are not reflected, and there
is no final XOR value. These differences make it easy to compose the
values of multiple blocks.

That hash implementation will be used in the memprof's symtab later on.

Part of tarantool/tarantool#5813
---
On 28.07.21 Sergey Kaplun wrote:

<snipped>

> IINM we should compare this approach with a full dump of all .so
> functions. May you provide some benches of your approach against the
> aforementioned?
As for benchmarks, I'll provide them later in the separate letter.

<snipped>

>> +#define _GNU_SOURCE
> Locally, I need to define not _GNU_SOURCE, but __USE_GNU. May you please
> check what exactly define is required and why?
Defining __USE_GNU by yourself is a terrible practice as it is an 
internal definition of the glibc. Consodering this, you should stick
to _GNU_SOURCE

<snipped>

 src/CMakeLists.txt        |   1 +
 src/Makefile.dep.original |   3 +-
 src/Makefile.original     |   2 +-
 src/lj_utils.h            |  29 ++++++++++
 src/lj_utils_hash.c       | 110 ++++++++++++++++++++++++++++++++++++++
 src/ljamalg.c             |   1 +
 6 files changed, 144 insertions(+), 2 deletions(-)
 create mode 100644 src/lj_utils_hash.c

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 809aac68..74c7a205 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -60,6 +60,7 @@ make_source_list(SOURCES_UTILS
   SOURCES
     lj_alloc.c
     lj_char.c
+    lj_utils_hash.c
     lj_utils_leb128.c
     lj_vmmath.c
     lj_wbuf.c
diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
index f3672413..3dc6e9b4 100644
--- a/src/Makefile.dep.original
+++ b/src/Makefile.dep.original
@@ -208,6 +208,7 @@ lj_trace.o: lj_trace.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_vm.h lj_vmevent.h lj_target.h lj_target_*.h
 lj_udata.o: lj_udata.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_udata.h
+lj_utils_hash.o: lj_utils_hash.c lj_utils.h
 lj_utils_leb128.o: lj_utils_leb128.c lj_utils.h lj_def.h lua.h luaconf.h
 lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_str.h lj_tab.h lj_state.h lj_dispatch.h lj_bc.h lj_jit.h lj_ir.h \
@@ -234,7 +235,7 @@ ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
  lj_snap.h lj_opt_split.c lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c \
  lj_record.h lj_ffrecord.h lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h \
  lj_asm.c lj_asm.h lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c \
- lj_alloc.c lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c \
+ lj_alloc.c lj_utils_hash.c lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c \
  lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
  lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
 luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
diff --git a/src/Makefile.original b/src/Makefile.original
index 031f0778..2a57c022 100644
--- a/src/Makefile.original
+++ b/src/Makefile.original
@@ -498,7 +498,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o lj_wbuf.o \
 	  lj_asm.o lj_trace.o lj_gdbjit.o \
 	  lj_ctype.o lj_cdata.o lj_cconv.o lj_ccall.o lj_ccallback.o \
 	  lj_carith.o lj_clib.o lj_cparse.o \
-	  lj_lib.o lj_alloc.o lj_utils_leb128.o lib_aux.o \
+	  lj_lib.o lj_alloc.o lj_utils_hash.o lj_utils_leb128.o lib_aux.o \
 	  $(LJLIB_O) lib_init.o
 
 LJVMCORE_O= $(LJVM_O) $(LJCORE_O)
diff --git a/src/lj_utils.h b/src/lj_utils.h
index f5c15797..fe179753 100644
--- a/src/lj_utils.h
+++ b/src/lj_utils.h
@@ -5,6 +5,30 @@
 ** Copyright (C) 2015-2019 IPONWEB Ltd.
 */
 
+/*
+** Copyright (C) 2009-2021 Free Software Foundation, Inc.
+** This file is part of the libiberty library.
+** This file is free software; you can redistribute it and/or modify
+** it under the terms of the GNU General Public License as published by
+** the Free Software Foundation; either version 2 of the License, or
+** (at your option) any later version.
+** In addition to the permissions in the GNU General Public License, the
+** Free Software Foundation gives you unlimited permission to link the
+** compiled version of this file into combinations with other programs,
+** and to distribute those combinations without any restriction coming
+** from the use of this file.  (The General Public License restrictions
+** do apply in other respects; for example, they cover modification of
+** the file, and distribution when not linked into a combined
+** executable.)
+** This program is distributed in the hope that it will be useful,
+** but WITHOUT ANY WARRANTY; without even the implied warranty of
+** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+** GNU General Public License for more details.
+** You should have received a copy of the GNU General Public License
+** along with this program; if not, write to the Free Software
+** Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+*/
+
 #ifndef _LJ_UTILS_H
 #define _LJ_UTILS_H
 
@@ -55,4 +79,9 @@ size_t LJ_FASTCALL lj_utils_write_leb128(uint8_t *buffer, int64_t value);
 */
 size_t LJ_FASTCALL lj_utils_write_uleb128(uint8_t *buffer, uint64_t value);
 
+/*
+** Calculates CRC32 hash of a data in a buffer of a known length.
+** Checks if the buffer is NULL, corresponding assertion fails in that case.
+*/
+uint32_t lj_utils_crc32(const char *buf, size_t len, uint32_t init);
 #endif
diff --git a/src/lj_utils_hash.c b/src/lj_utils_hash.c
new file mode 100644
index 00000000..8a7f0869
--- /dev/null
+++ b/src/lj_utils_hash.c
@@ -0,0 +1,110 @@
+/*
+** Copyright (C) 2009-2021 Free Software Foundation, Inc.
+** This file is part of the libiberty library.
+** This file is free software; you can redistribute it and/or modify
+** it under the terms of the GNU General Public License as published by
+** the Free Software Foundation; either version 2 of the License, or
+** (at your option) any later version.
+** In addition to the permissions in the GNU General Public License, the
+** Free Software Foundation gives you unlimited permission to link the
+** compiled version of this file into combinations with other programs,
+** and to distribute those combinations without any restriction coming
+** from the use of this file.  (The General Public License restrictions
+** do apply in other respects; for example, they cover modification of
+** the file, and distribution when not linked into a combined
+** executable.)
+** This program is distributed in the hope that it will be useful,
+** but WITHOUT ANY WARRANTY; without even the implied warranty of
+** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+** GNU General Public License for more details.
+** You should have received a copy of the GNU General Public License
+** along with this program; if not, write to the Free Software
+** Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+*/
+
+#define lj_utils_hash
+
+#include <stdio.h>
+#include <assert.h>
+
+#include "lj_utils.h"
+
+static const uint32_t crc32_table[] =
+{
+  0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
+  0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
+  0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
+  0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
+  0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
+  0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
+  0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011,
+  0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd,
+  0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
+  0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
+  0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81,
+  0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
+  0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49,
+  0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
+  0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
+  0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d,
+  0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae,
+  0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
+  0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
+  0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
+  0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
+  0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02,
+  0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066,
+  0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
+  0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
+  0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692,
+  0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
+  0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a,
+  0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
+  0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
+  0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686,
+  0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a,
+  0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
+  0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
+  0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
+  0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
+  0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47,
+  0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b,
+  0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
+  0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
+  0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7,
+  0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
+  0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f,
+  0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
+  0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
+  0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b,
+  0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f,
+  0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
+  0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
+  0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
+  0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
+  0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24,
+  0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30,
+  0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
+  0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
+  0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654,
+  0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
+  0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c,
+  0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
+  0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
+  0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0,
+  0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c,
+  0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
+  0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
+};
+
+uint32_t lj_utils_crc32 (const char *buf, size_t len, uint32_t init)
+{
+  assert(buf != NULL);
+
+  uint32_t crc = init;
+  while (len--) {
+    crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
+    buf++;
+  }
+  return crc;
+}
diff --git a/src/ljamalg.c b/src/ljamalg.c
index 3f7e6860..3a667a68 100644
--- a/src/ljamalg.c
+++ b/src/ljamalg.c
@@ -83,6 +83,7 @@
 #include "lj_trace.c"
 #include "lj_gdbjit.c"
 #include "lj_alloc.c"
+#include "lj_utils_hash.c"
 #include "lj_utils_leb128.c"
 
 #include "lib_aux.c"
-- 
2.32.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs
  2021-08-20 11:10 [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation Maxim Kokryashkin via Tarantool-patches
@ 2021-08-20 11:10 ` Maxim Kokryashkin via Tarantool-patches
  2021-08-30 14:10   ` Sergey Kaplun via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 3/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
  2021-08-25 10:01 ` [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Максим Корякшин via Tarantool-patches
  3 siblings, 1 reply; 7+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-08-20 11:10 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit enriches memprof's symbol table with information about loaded
".so" files. That information will provide demangling capabilities to
the parser.

The following data is stored for each ".so" library:
| SYMTAB_SO | path to .so | it's address |  CRC32 Hash  |
  1 byte                       8 bytes       8 bytes
  magic
  number

C symbol address from profile events with an address of the ".so" file will
give us an offset, and it can be used to get the symbol in that file.

CRC32 hash will be used by the parser to determine whether the shared
object that was used during the process profiling and during the process
of parsing is the same.

Part of tarantool/tarantool#5813
---
 src/lj_memprof.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_memprof.h |  3 ++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index 2c1ef3b8..c8367c95 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -5,10 +5,15 @@
 ** Copyright (C) 2015-2019 IPONWEB Ltd.
 */
 
+#define _GNU_SOURCE
+
 #define lj_memprof_c
 #define LUA_CORE
 
 #include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <link.h>
 
 #include "lj_arch.h"
 #include "lj_memprof.h"
@@ -18,12 +23,64 @@
 #include "lj_obj.h"
 #include "lj_frame.h"
 #include "lj_debug.h"
+#include "lj_utils.h"
 
 /* --------------------------------- Symtab --------------------------------- */
 
 static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
 					   0x0, 0x0, 0x0};
 
+int lj_file_crc32(const char* name, uint32_t* result) 
+{
+#define CRC_BUF_SIZE 1024 * 1024
+
+  char buf[CRC_BUF_SIZE] = "";
+  size_t read_bytes = 0;
+  FILE* so = NULL;
+
+  if (result == NULL) {
+    return 1;
+  }
+
+  *result = 0xffffffff;
+
+  so = fopen(name, "rb");
+  if (!so)
+    return 1;
+
+  while (CRC_BUF_SIZE == (read_bytes = fread(buf, sizeof(*buf), CRC_BUF_SIZE, so))) {
+    *result = lj_utils_crc32(buf, CRC_BUF_SIZE, *result); 
+  }
+  
+  if(ferror(so) || !feof(so)) {
+    fclose(so);
+    return 1;
+  }
+  
+  *result = lj_utils_crc32(buf, read_bytes, *result);
+  fclose(so);
+  return 0;
+}
+
+int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data)
+{
+  uint32_t so_hash = 0;
+  struct lj_wbuf *buf = data;
+
+  if (info->dlpi_name && strlen(info->dlpi_name)) {
+    if (0 != lj_file_crc32(info->dlpi_name, &so_hash))
+      // XXX: Maybe it is reasonable to print warning here
+      return 0;
+
+    lj_wbuf_addbyte(buf, SYMTAB_SO);
+    lj_wbuf_addstring(buf, info->dlpi_name);
+    lj_wbuf_addu64(buf, info->dlpi_addr);
+    lj_wbuf_addu64(buf, so_hash);
+  }
+
+  return 0;
+}
+
 static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
 {
   const GCRef *iter = &g->gc.root;
@@ -49,6 +106,9 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
     iter = &o->gch.nextgc;
   }
 
+  /* Write shared libraries. */
+  dl_iterate_phdr(write_shared_obj, out);
+
   lj_wbuf_addbyte(out, SYMTAB_FINAL);
 }
 
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
index 3417475d..0cefc403 100644
--- a/src/lj_memprof.h
+++ b/src/lj_memprof.h
@@ -16,7 +16,7 @@
 #include "lj_def.h"
 #include "lj_wbuf.h"
 
-#define LJS_CURRENT_VERSION 0x1
+#define LJS_CURRENT_VERSION 0x2
 
 /*
 ** symtab format:
@@ -51,6 +51,7 @@
 */
 
 #define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_SO    ((uint8_t)1)
 #define SYMTAB_FINAL ((uint8_t)0x80)
 
 #define LJM_CURRENT_FORMAT_VERSION 0x01
-- 
2.32.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 3/3] memprof: update memprof parser
  2021-08-20 11:10 [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation Maxim Kokryashkin via Tarantool-patches
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs Maxim Kokryashkin via Tarantool-patches
@ 2021-08-20 11:10 ` Maxim Kokryashkin via Tarantool-patches
  2021-08-25 10:01 ` [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Максим Корякшин via Tarantool-patches
  3 siblings, 0 replies; 7+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2021-08-20 11:10 UTC (permalink / raw)
  To: tarantool-patches, imun, skaplun

This commit introduces demangling of C symbols to memprof parser.

As symbol table format has changed, parser needed to be updated too. Now
the parser supports new symbol table entries, containing data about shared
objects, that were loaded at the moment of data collection.

Secondly, we needed a way to verify, that .so libraries that we encounter
during the process of parsing are the same libraires we have encountered during
the process of a data collection. A full CRC32 hash of .so file contents is used
for that purpose.

C symbols resolution is implemented with dladdr, which resolves symbols
well only if they are loaded into the dynamic symbol table. Hence, LuaJIT now has to be
compiled with -rdynamic option, which forces the dynamic linker to add
all the symbols from loaded .so libraries to the dynamic symbol table.

Closes tarantool/tarantool#5813
---
 CMakeLists.txt                                |   2 +
 .../misclib-memprof-lapi.test.lua             |   5 +-
 tools/CMakeLists.txt                          |   2 +
 tools/utils/hash.lua                          |  99 +++++++++++++++
 tools/utils/symtab.lua                        | 114 +++++++++++++++++-
 5 files changed, 214 insertions(+), 8 deletions(-)
 create mode 100644 tools/utils/hash.lua

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 5348e043..08243f11 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -81,6 +81,8 @@ endif()
 
 # --- Compilation flags setup --------------------------------------------------
 
+AppendFlags(TARGET_C_FLAGS -rdynamic)
+
 if(NOT CMAKE_INSTALL_PREFIX STREQUAL "/usr/local")
   AppendFlags(TARGET_C_FLAGS -DLUA_ROOT='"${CMAKE_INSTALL_PREFIX}"')
 endif()
diff --git a/test/tarantool-tests/misclib-memprof-lapi.test.lua b/test/tarantool-tests/misclib-memprof-lapi.test.lua
index 06d96b3b..65b6afa6 100644
--- a/test/tarantool-tests/misclib-memprof-lapi.test.lua
+++ b/test/tarantool-tests/misclib-memprof-lapi.test.lua
@@ -53,6 +53,7 @@ local function generate_output(filename)
 end
 
 local function fill_ev_type(events, symbols, event_type)
+  local SYMTAB_LFUNC = 0
   local ev_type = {}
   for _, event in pairs(events[event_type]) do
     local addr = event.loc.addr
@@ -61,10 +62,10 @@ local function fill_ev_type(events, symbols, event_type)
         name = "INTERNAL",
         num = event.num,
     }
-    elseif symbols[addr] then
+    elseif symbols[SYMTAB_LFUNC][addr] then
       ev_type[event.loc.line] = {
         name = string.format(
-          "%s:%d", symbols[addr].source, symbols[addr].linedefined
+          "%s:%d", symbols[SYMTAB_LFUNC][addr].source, symbols[SYMTAB_LFUNC][addr].linedefined
         ),
         num = event.num,
       }
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index 61830e44..a70be4ea 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -32,6 +32,7 @@ else()
     memprof.lua
     utils/bufread.lua
     utils/symtab.lua
+    utils/hash.lua
   )
   list(APPEND LUAJIT_TOOLS_DEPS tools-parse-memprof)
 
@@ -48,6 +49,7 @@ else()
   install(FILES
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/bufread.lua
       ${CMAKE_CURRENT_SOURCE_DIR}/utils/symtab.lua
+      ${CMAKE_CURRENT_SOURCE_DIR}/utils/hash.lua
     DESTINATION ${LUAJIT_DATAROOTDIR}/utils
     PERMISSIONS
       OWNER_READ OWNER_WRITE
diff --git a/tools/utils/hash.lua b/tools/utils/hash.lua
new file mode 100644
index 00000000..2724e2bb
--- /dev/null
+++ b/tools/utils/hash.lua
@@ -0,0 +1,99 @@
+local bit = require "bit"
+
+local M = {}
+
+local CRC32 = {
+  0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9,
+  0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005,
+  0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61,
+  0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
+  0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
+  0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75,
+  0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011,
+  0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd,
+  0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
+  0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
+  0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81,
+  0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d,
+  0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49,
+  0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
+  0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
+  0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d,
+  0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae,
+  0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072,
+  0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
+  0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
+  0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde,
+  0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02,
+  0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066,
+  0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
+  0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
+  0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692,
+  0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6,
+  0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a,
+  0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
+  0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
+  0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686,
+  0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a,
+  0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637,
+  0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
+  0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
+  0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53,
+  0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47,
+  0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b,
+  0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
+  0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
+  0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7,
+  0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b,
+  0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f,
+  0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
+  0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
+  0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b,
+  0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f,
+  0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3,
+  0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
+  0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
+  0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8,
+  0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24,
+  0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30,
+  0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
+  0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
+  0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654,
+  0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0,
+  0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c,
+  0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
+  0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
+  0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0,
+  0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c,
+  0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668,
+  0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
+}
+
+local xor = bit.bxor
+local lshift = bit.lshift
+local rshift = bit.rshift
+local band = bit.band
+
+function M.crc32(str)
+    str = tostring(str)
+    local len = string.len(str)
+    local crc = 0xffffffff
+    local i = 1
+
+    while len > 0 do
+        local byte = string.byte(str, i)
+        i = i + 1
+        len = len - 1
+
+        local tab_idx = band(xor(rshift(crc, 24), byte), 0xFF) + 1
+        crc = xor(band(lshift(crc, 8), 2^32 - 1), CRC32[tab_idx])
+    end
+
+    -- dirty hack for bitop return number < 0
+    if crc < 0 then crc = crc + 2 ^ 32 end
+
+    return crc
+end
+
+
+return M
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index 3ed1dd13..cb14b516 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -5,16 +5,57 @@
 -- Copyright (C) 2015-2019 IPONWEB Ltd.
 
 local bit = require "bit"
+local io = require "io"
+local hash = require "utils.hash"
 
 local band = bit.band
 local string_format = string.format
 
+local ffi = require "ffi"
+local dl = ffi.load "dl"
+
+ffi.cdef[[
+  typedef struct {
+    const char *dli_fname;
+    void *dli_fbase;
+    const char *dli_sname;
+    void *dli_saddr;
+  } Dl_info;
+
+  typedef struct
+  {
+    uint32_t        p_type;
+    uint32_t        p_flags;
+    uint64_t        p_offset;
+    uint64_t        p_vaddr;
+    uint64_t        p_paddr;
+    uint64_t        p_filesz;
+    uint64_t        p_memsz;
+    uint64_t        p_align;
+  } Elf64_Phdr;
+
+  struct dl_phdr_info {
+    uint64_t dlpi_addr;
+    const char *dlpi_name;
+    const Elf64_Phdr *dlpi_phdr;
+    uint16_t dlpi_phnum;
+  };
+
+  void *dlopen(const char *filename, int flags);
+  int dlclose(void *handle);
+  int dladdr(const void *addr, Dl_info *info);
+  int dl_iterate_phdr(int (*callback) (struct dl_phdr_info *info, size_t size, void *data), void *data);
+]]
+
 local LJS_MAGIC = "ljs"
-local LJS_CURRENT_VERSION = 1
+local LJS_CURRENT_VERSION = 2
 local LJS_EPILOGUE_HEADER = 0x80
 local LJS_SYMTYPE_MASK = 0x03
 
 local SYMTAB_LFUNC = 0
+local SYMTAB_SO = 1
+
+local RTLD_NOW = 0x00002
 
 local M = {}
 
@@ -24,18 +65,59 @@ local function parse_sym_lfunc(reader, symtab)
   local sym_chunk = reader:read_string()
   local sym_line = reader:read_uleb128()
 
-  symtab[sym_addr] = {
+  symtab[SYMTAB_LFUNC][sym_addr] = {
     source = sym_chunk,
     linedefined = sym_line,
   }
 end
 
+-- Parse a single entry in a symtab: .so library
+local function parse_sym_so(reader, symtab)
+  local path = reader:read_string()
+  local addr = reader:read_uleb128()
+  local so_hash = reader:read_uleb128()
+
+  local handle = dl.dlopen(path, RTLD_NOW);
+
+  if handle == nil then
+    return
+  end
+
+  local file = io.open(path, "rb")
+
+  if file == nil then
+    dl.dlclose(handle);
+    return
+  end
+
+  local content = file:read("*a")
+  local size = string.len(content)
+  file:close()
+
+  if hash.crc32(content) ~= so_hash then
+    dl.dlclose(handle)
+    return
+  end
+
+  symtab[SYMTAB_SO][path] = {
+    handle = handle,
+    size = size,
+    old_addr = addr,
+    new_addr = ffi.cast("void*", ffi.cast("uint64_t*", handle)[0])
+  }
+end
+
 local parsers = {
   [SYMTAB_LFUNC] = parse_sym_lfunc,
+  [SYMTAB_SO] = parse_sym_so
 }
 
 function M.parse(reader)
-  local symtab = {}
+  local symtab = {
+    [SYMTAB_LFUNC] = {},
+    [SYMTAB_SO] = {}
+  }
+
   local magic = reader:read_octets(3)
   local version = reader:read_octets(1)
 
@@ -69,7 +151,6 @@ function M.parse(reader)
       parsers[sym_type](reader, symtab)
     end
   end
-
   return symtab
 end
 
@@ -80,8 +161,29 @@ function M.demangle(symtab, loc)
     return "INTERNAL"
   end
 
-  if symtab[addr] then
-    return string_format("%s:%d", symtab[addr].source, loc.line)
+  if symtab[SYMTAB_LFUNC][addr] then
+    return string_format("%s:%d", symtab[SYMTAB_LFUNC][addr].source, loc.line)
+  end
+
+  local pDl_info = ffi.new'Dl_info[1]'
+
+  for _, info in pairs(symtab[SYMTAB_SO]) do
+    local offset = addr - info.old_addr
+
+    if offset <= info.size and offset > 0 then
+      local ptr = ffi.cast("void*", offset + ffi.cast('uintptr_t', info.new_addr))
+
+      if 0 ~= dl.dladdr(ptr, pDl_info) then
+        local file = ffi.string(pDl_info[0].dli_fname)
+
+        local symbol = 'unresolved symbol'
+        if pDl_info[0].dli_sname ~= nil then
+          symbol = ffi.string(pDl_info[0].dli_sname)
+        end
+
+        return string_format("%s:%s %#x", file, symbol, addr)
+      end
+    end
   end
 
   return string_format("CFUNC %#x", addr)
-- 
2.32.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches]  [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions
  2021-08-20 11:10 [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 3/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
@ 2021-08-25 10:01 ` Максим Корякшин via Tarantool-patches
  3 siblings, 0 replies; 7+ messages in thread
From: Максим Корякшин via Tarantool-patches @ 2021-08-25 10:01 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]


So here are the benchmark results, and they are surprising.
It turns out that dumping all symbols from all shared objects  is 
actually much faster than dumping only the name and address  of
each shared object.
 
==========================================================
tested script:
  package.cpath = "./?.so"
  require "hello"
  for j=1,1e3 do
    misc.memprof.start('./memprof.test')
    misc.memprof.stop()
  end
----------------------------------------------------------
vanilla:
  0,03s user 0,05s system 50% cpu 0,158 total
  0,01s user 0,04s system 88% cpu 0,051 total
  0,01s user 0,05s system 79% cpu 0,071 total
  avg total: 0.093
  avg total per memprof.start/stop: 9.3e-5
  performance relative to vanilla: x1
----------------------------------------------------------
dump only shared objects addresses and names:
  18,83s user 0,36s system 99% cpu 19,285 total
  19,15s user 0,35s system 99% cpu 19,586 total
  18,75s user 0,39s system 99% cpu 19,244 total
  avg total: 19.371
  avg total per memprof.start/stop: 0.019
  performance relative to vanilla: x208.29
----------------------------------------------------------
dump all symbols:
  0,30s user 0,10s system 75% cpu 0,519 total
  0,41s user 0,20s system 52% cpu 1,173 total
  0,31s user 0,08s system 73% cpu 0,531 total
  avg total: 0.741
  avg total per memprof.start/stop: 7.4e-4
  performance relative to vanilla: x7.96
==========================================================
 
Best regards,
Maxim Kokryashkin
  
>Friday, 20.08.21Maxim Kokryashkin <max.kokryashkin@gmail.com>:
> 
>Changes in v2:
>  - Fixed comments as per review by Sergey
>
>
>Maxim Kokryashkin (3):
>  utils: add CRC32 hash implementation
>  memprof: extend symtab with information about .so libs
>  memprof: update memprof parser
>
> CMakeLists.txt | 2 +
> src/CMakeLists.txt | 1 +
> src/Makefile.dep.original | 3 +-
> src/Makefile.original | 2 +-
> src/lj_memprof.c | 60 +++++++++
> src/lj_memprof.h | 3 +-
> src/lj_utils.h | 29 +++++
> src/lj_utils_hash.c | 110 +++++++++++++++++
> src/ljamalg.c | 1 +
> .../misclib-memprof-lapi.test.lua | 5 +-
> tools/CMakeLists.txt | 2 +
> tools/utils/hash.lua | 99 +++++++++++++++
> tools/utils/symtab.lua | 114 +++++++++++++++++-
> 13 files changed, 420 insertions(+), 11 deletions(-)
> create mode 100644 src/lj_utils_hash.c
> create mode 100644 tools/utils/hash.lua
>
>---
>Github branch:  https://github.com/tarantool/luajit/tree/fckxorg/gh-5813-demangling-of-c-symbols
>Issue:  https://github.com/tarantool/tarantool/issues/5813
>
>2.32.0
> 
 

[-- Attachment #2: Type: text/html, Size: 4441 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation Maxim Kokryashkin via Tarantool-patches
@ 2021-08-30 14:08   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-30 14:08 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!

Thanks for the fixes!
Please, consider my comments below.

On 20.08.21, Maxim Kokryashkin wrote:
> This commit adds an implementation of the CRC32 hash. The
> implementation is taken from GNU libberty so it differs from the
> classic CRC32 implementation. The values are not reflected, and there
> is no final XOR value. These differences make it easy to compose the
> values of multiple blocks.
> 
> That hash implementation will be used in the memprof's symtab later on.
> 
> Part of tarantool/tarantool#5813
> ---
> On 28.07.21 Sergey Kaplun wrote:
> 
> <snipped>
> 
> > IINM we should compare this approach with a full dump of all .so
> > functions. May you provide some benches of your approach against the
> > aforementioned?
> As for benchmarks, I'll provide them later in the separate letter.

Yep, seen it here[1].
Do you check the reason of such huge difference (more than x200!!!)?
I suspect the hash calculation from the whole library, but don't check it.
Also, I didn't get the following:
Are these measurements done for LuaJIT or for Tarantool. Tarantool has
more libraries, so it would be nice to see benchmark for it too.

Also, as far as this way is such heavy, do we need this patch?

> 
> <snipped>
> 
> >> +#define _GNU_SOURCE
> > Locally, I need to define not _GNU_SOURCE, but __USE_GNU. May you please
> > check what exactly define is required and why?
> Defining __USE_GNU by yourself is a terrible practice as it is an 
> internal definition of the glibc. Consodering this, you should stick
> to _GNU_SOURCE

OK, thanks for clarification!

> 
> <snipped>
> 
>  src/CMakeLists.txt        |   1 +
>  src/Makefile.dep.original |   3 +-
>  src/Makefile.original     |   2 +-
>  src/lj_utils.h            |  29 ++++++++++
>  src/lj_utils_hash.c       | 110 ++++++++++++++++++++++++++++++++++++++

Nit: may be lj_utils_crc32.c is better here (at least we know the hash
type).
Feel free to ignore.

>  src/ljamalg.c             |   1 +
>  6 files changed, 144 insertions(+), 2 deletions(-)
>  create mode 100644 src/lj_utils_hash.c
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
> index 809aac68..74c7a205 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt

<snipped>

> diff --git a/src/Makefile.dep.original b/src/Makefile.dep.original
> index f3672413..3dc6e9b4 100644
> --- a/src/Makefile.dep.original
> +++ b/src/Makefile.dep.original
> @@ -208,6 +208,7 @@ lj_trace.o: lj_trace.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_vm.h lj_vmevent.h lj_target.h lj_target_*.h
>  lj_udata.o: lj_udata.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_gc.h lj_udata.h
> +lj_utils_hash.o: lj_utils_hash.c lj_utils.h

Dependencies are defined as the following:
1) The file itself.
2) Each header and all included LuaJIT headers in it.

For example: lj_utils_leb128.o depends on the .c file itself. It
includes <lj_utils.h>, so add it in dependencies too.
<lj_utils.h> includes inside <lj_def.h>. <lj_def.h> includes <lua.h>,
which includes <luaconf.h>.

>  lj_utils_leb128.o: lj_utils_leb128.c lj_utils.h lj_def.h lua.h luaconf.h
>  lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_str.h lj_tab.h lj_state.h lj_dispatch.h lj_bc.h lj_jit.h lj_ir.h \
> @@ -234,7 +235,7 @@ ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \

Also, I see tons of errors like the following:

| /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: libluajit.a(ljamalg.o): in function `lj_err_throw':
| ljamalg.c:(.text+0xaf50): multiple definition of `lj_err_throw'; libluajit.a(lj_err.o):lj_err.c:(.text+0x270): first defined here

This problem is known, IINM.

How do you check this patch?
If you have unrelated patch, which fixes the mentioned build type,
please send it to the mailing list.

<snipped>

> diff --git a/src/Makefile.original b/src/Makefile.original
> index 031f0778..2a57c022 100644
> --- a/src/Makefile.original
> +++ b/src/Makefile.original

<snipped>

> diff --git a/src/lj_utils.h b/src/lj_utils.h
> index f5c15797..fe179753 100644
> --- a/src/lj_utils.h
> +++ b/src/lj_utils.h

<snipped>

> diff --git a/src/lj_utils_hash.c b/src/lj_utils_hash.c
> new file mode 100644
> index 00000000..8a7f0869
> --- /dev/null
> +++ b/src/lj_utils_hash.c
> @@ -0,0 +1,110 @@
> +/*

<snipped>

> +*/
> +
> +#define lj_utils_hash

Typo: s/lj_utils_hash/lj_utils_hash_c/

Minor: Please, add LUA_CORE define for consistency with other c modules.

> +
> +#include <stdio.h>

Typo: Looks like the extra one. :)

> +#include <assert.h>

This is excess, see the comment near usage below.

> +
> +#include "lj_utils.h"
> +
> +static const uint32_t crc32_table[] =
> +{

<snipped>

> +};
> +
> +uint32_t lj_utils_crc32 (const char *buf, size_t len, uint32_t init)
> +{
> +  assert(buf != NULL);

Please, use `lua_assert()` instead -- it is common practise for all
LuaJIT code.

> +
> +  uint32_t crc = init;

When build like the following (the `CCWARN` comment line in the
<src/Makefile.original>):

| cmake -DCMAKE_C_FLAGS="-Wextra -Wdeclaration-after-statement \
| -Wredundant-decls -Wshadow -Wpointer-arith" -DCMAKE_BUILD_TYPE=Debug \
| -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON . && make -j

I get the following compile warnings:

| /home/burii/reviews/luajit/csymbols/src/lj_utils_hash.c: In function 'lj_utils_crc32':
| /home/burii/reviews/luajit/csymbols/src/lj_utils_hash.c:104:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
|   104 |   uint32_t crc = init;

> +  while (len--) {
> +    crc = (crc << 8) ^ crc32_table[((crc >> 24) ^ *buf) & 255];
> +    buf++;
> +  }
> +  return crc;
> +}
> diff --git a/src/ljamalg.c b/src/ljamalg.c
> index 3f7e6860..3a667a68 100644
> --- a/src/ljamalg.c
> +++ b/src/ljamalg.c

<snipped>

> -- 
> 2.32.0
> 

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2021-August/025707.html

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs
  2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs Maxim Kokryashkin via Tarantool-patches
@ 2021-08-30 14:10   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2021-08-30 14:10 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

| memprof: extend symtab with information about .so libs

Width limit for commit title is 50 symbols including subsystem name [1].
Commit message should fit 72 symbols line width [1].

Also, you noticed [2], that just dump all symbols from so libraries shown
better performance than the current solution. So looks like we want to
reimplement this patch in this way, don't we?

So, for now I stop reviewing (this patch is the last one) until we don't
make the decision about implementation.

On 20.08.21, Maxim Kokryashkin wrote:
> This commit enriches memprof's symbol table with information about loaded
> ".so" files. That information will provide demangling capabilities to
> the parser.
> 
> The following data is stored for each ".so" library:
> | SYMTAB_SO | path to .so | it's address |  CRC32 Hash  |
>   1 byte                       8 bytes       8 bytes
>   magic
>   number
> 
> C symbol address from profile events with an address of the ".so" file will
> give us an offset, and it can be used to get the symbol in that file.
> 
> CRC32 hash will be used by the parser to determine whether the shared
> object that was used during the process profiling and during the process
> of parsing is the same.
> 
> Part of tarantool/tarantool#5813
> ---
>  src/lj_memprof.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_memprof.h |  3 ++-
>  2 files changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index 2c1ef3b8..c8367c95 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -5,10 +5,15 @@
>  ** Copyright (C) 2015-2019 IPONWEB Ltd.
>  */
>  
> +#define _GNU_SOURCE

Should it be defind exactly behind <link.h>?
And even undefined after?

> +
>  #define lj_memprof_c
>  #define LUA_CORE
>  
>  #include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>

Typo: this include looks redundant.

> +#include <link.h>
>  
>  #include "lj_arch.h"
>  #include "lj_memprof.h"
> @@ -18,12 +23,64 @@
>  #include "lj_obj.h"
>  #include "lj_frame.h"
>  #include "lj_debug.h"
> +#include "lj_utils.h"
>  
>  /* --------------------------------- Symtab --------------------------------- */
>  
>  static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
>  					   0x0, 0x0, 0x0};
>  
> +int lj_file_crc32(const char* name, uint32_t* result) 
                                                        ^
Trailing whitespace here -------------------------------^

> +{
> +#define CRC_BUF_SIZE 1024 * 1024

Minor: It is better to use parentheses, when you define not one-token
marco.
Rationale: ~CRC_BUF_SIZE and ~(CRC_BUF_SIZE) return different values.
(Of course, this example is a little bit crazy, but it shows the point).

Also, why do we need 1 MB on the stack?
I suppose that 512 bytes will be enough.

> +
> +  char buf[CRC_BUF_SIZE] = "";

Don't get this line. Why do we need this initialization?

> +  size_t read_bytes = 0;

This too.

> +  FILE* so = NULL;

Typo: please use FILE *so instead FILE* so, considering our code
style.

> +
> +  if (result == NULL) {
> +    return 1;
> +  }
> +
> +  *result = 0xffffffff;

Please avoid the magic values in code.
Also, I don't see any check for this value latter, so it looks
excess.

> +
> +  so = fopen(name, "rb");
> +  if (!so)
> +    return 1;
> +
> +  while (CRC_BUF_SIZE == (read_bytes = fread(buf, sizeof(*buf), CRC_BUF_SIZE, so))) {

The brackets looks excess.

Side note: why do you use "reversed order" i.e. constant == some_call()?

> +    *result = lj_utils_crc32(buf, CRC_BUF_SIZE, *result); 
                                                            ^
Trailing whitespace here -----------------------------------^

> +  }
> +  
   ^^
Trailing whitespaces here.

> +  if(ferror(so) || !feof(so)) {
> +    fclose(so);
> +    return 1;
> +  }
> +  
   ^^
Trailing white spaces here.

Side note: Please adjust your favorite editor to highlight trailing
whytespaces.

> +  *result = lj_utils_crc32(buf, read_bytes, *result);
> +  fclose(so);
> +  return 0;
> +}
> +
> +int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data)

I got the following warning during compilation:

| /home/burii/reviews/luajit/csymbols/src/lj_memprof.c: In function 'write_shared_obj':
| /home/burii/reviews/luajit/csymbols/src/lj_memprof.c:64:56: warning: unused parameter 'size' [-Wunused-parameter]
|    64 | int write_shared_obj(struct dl_phdr_info *info, size_t size, void *data)

Please, used the corresponding UNUSED() macro.

> +{
> +  uint32_t so_hash = 0;
> +  struct lj_wbuf *buf = data;
> +
> +  if (info->dlpi_name && strlen(info->dlpi_name)) {
> +    if (0 != lj_file_crc32(info->dlpi_name, &so_hash))
> +      // XXX: Maybe it is reasonable to print warning here

I don't think so. This warning should be printed by the symtab parser.
So we need to write this event, but with special hash value means "failed
to calculate hash". OTOH, I don't know what we can do with hash collision.

> +      return 0;
> +
> +    lj_wbuf_addbyte(buf, SYMTAB_SO);
> +    lj_wbuf_addstring(buf, info->dlpi_name);
> +    lj_wbuf_addu64(buf, info->dlpi_addr);
> +    lj_wbuf_addu64(buf, so_hash);
> +  }
> +
> +  return 0;
> +}
> +
>  static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>  {
>    const GCRef *iter = &g->gc.root;
> @@ -49,6 +106,9 @@ static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
>      iter = &o->gch.nextgc;
>    }
>  
> +  /* Write shared libraries. */
> +  dl_iterate_phdr(write_shared_obj, out);
> +
>    lj_wbuf_addbyte(out, SYMTAB_FINAL);
>  }
>  
> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> index 3417475d..0cefc403 100644
> --- a/src/lj_memprof.h
> +++ b/src/lj_memprof.h
> @@ -16,7 +16,7 @@
>  #include "lj_def.h"
>  #include "lj_wbuf.h"
>  
> -#define LJS_CURRENT_VERSION 0x1
> +#define LJS_CURRENT_VERSION 0x2

It means that we need to upgrade parser with the same commit.
Rationale: without this test will fail -- it complicates debugging with
bisecting later (for now, tests fail on this commit).

>  
>  /*
>  ** symtab format:
> @@ -51,6 +51,7 @@
>  */
>  
>  #define SYMTAB_LFUNC ((uint8_t)0)
> +#define SYMTAB_SO    ((uint8_t)1)

Please, add entry for the corresponging events in symtab description.

>  #define SYMTAB_FINAL ((uint8_t)0x80)
>  
>  #define LJM_CURRENT_FORMAT_VERSION 0x01
> -- 
> 2.32.0
> 

Also, please add some tests to check the behaviour of your patch.

[1]: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message
[2]: https://lists.tarantool.org/pipermail/tarantool-patches/2021-August/025707.html

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-30 14:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20 11:10 [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Maxim Kokryashkin via Tarantool-patches
2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 1/3] utils: add CRC32 hash implementation Maxim Kokryashkin via Tarantool-patches
2021-08-30 14:08   ` Sergey Kaplun via Tarantool-patches
2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 2/3] memprof: extend symtab with information about .so libs Maxim Kokryashkin via Tarantool-patches
2021-08-30 14:10   ` Sergey Kaplun via Tarantool-patches
2021-08-20 11:10 ` [Tarantool-patches] [PATCH luajit v2 3/3] memprof: update memprof parser Maxim Kokryashkin via Tarantool-patches
2021-08-25 10:01 ` [Tarantool-patches] [PATCH luajit v2 0/3] memprof: add demangling capabilities for C functions Максим Корякшин via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox