Tarantool development patches archive
* [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler
@ 2020-12-16 19:13 Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch set provides a Lua interface for the LuaJIT memory profiler
and the corresponding parser for the profiled data.

Issues: https://github.com/tarantool/tarantool/issues/5442
        https://github.com/tarantool/tarantool/issues/5490

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler

CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/230917973

RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000144.html

@ChangeLog:
 - Introduce LuaJIT memory profiler (gh-5442).

Sergey Kaplun (11):
  build: add src dir in building
  utils: introduce leb128 reader and writer
  profile: introduce profiler writing module
  profile: introduce symtab write module
  vm: introduce LFUNC and FFUNC vmstates
  core: introduce new mem_L field
  debug: move debug_frameline to public module API
  profile: introduce memory profiler
  misc: add Lua API for memory profiler
  tools: introduce tools directory
  profile: introduce profile parser

 src/Makefile                          |  16 +-
 src/Makefile.dep                      |  14 +-
 src/lib_misc.c                        | 165 ++++++++++
 src/lj_arch.h                         |  22 ++
 src/lj_debug.c                        |   8 +-
 src/lj_debug.h                        |   1 +
 src/lj_errmsg.h                       |   6 +
 src/lj_frame.h                        |  18 +-
 src/lj_gc.c                           |   2 +
 src/lj_obj.h                          |  13 +-
 src/lj_profile.c                      |   5 +-
 src/lj_state.c                        |   8 +
 src/lmisclib.h                        |  29 ++
 src/profile/ljp_memprof.c             | 413 ++++++++++++++++++++++++++
 src/profile/ljp_memprof.h             |  86 ++++++
 src/profile/ljp_symtab.c              |  55 ++++
 src/profile/ljp_symtab.h              |  57 ++++
 src/profile/ljp_write.c               | 195 ++++++++++++
 src/profile/ljp_write.h               |  84 ++++++
 src/utils/leb128.c                    | 124 ++++++++
 src/utils/leb128.h                    |  55 ++++
 src/vm_arm.dasc                       |   6 +-
 src/vm_arm64.dasc                     |   6 +-
 src/vm_mips.dasc                      |   6 +-
 src/vm_mips64.dasc                    |   6 +-
 src/vm_ppc.dasc                       |   6 +-
 src/vm_x64.dasc                       |  99 ++++--
 src/vm_x86.dasc                       | 137 ++++++---
 test/misclib-memprof-lapi.test.lua    | 125 ++++++++
 {src => tools}/luajit-gdb.py          |  14 +-
 tools/luajit-parse-memprof            |  20 ++
 tools/parse_memprof/bufread.lua       | 143 +++++++++
 tools/parse_memprof/main.lua          | 104 +++++++
 tools/parse_memprof/parse_memprof.lua | 195 ++++++++++++
 tools/parse_memprof/parse_symtab.lua  |  88 ++++++
 tools/parse_memprof/view_plain.lua    |  45 +++
 36 files changed, 2260 insertions(+), 116 deletions(-)
 create mode 100644 src/profile/ljp_memprof.c
 create mode 100644 src/profile/ljp_memprof.h
 create mode 100644 src/profile/ljp_symtab.c
 create mode 100644 src/profile/ljp_symtab.h
 create mode 100644 src/profile/ljp_write.c
 create mode 100644 src/profile/ljp_write.h
 create mode 100644 src/utils/leb128.c
 create mode 100644 src/utils/leb128.h
 create mode 100755 test/misclib-memprof-lapi.test.lua
 rename {src => tools}/luajit-gdb.py (98%)
 create mode 100755 tools/luajit-parse-memprof
 create mode 100644 tools/parse_memprof/bufread.lua
 create mode 100644 tools/parse_memprof/main.lua
 create mode 100644 tools/parse_memprof/parse_memprof.lua
 create mode 100644 tools/parse_memprof/parse_symtab.lua
 create mode 100644 tools/parse_memprof/view_plain.lua

-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-20 21:27   ` Igor Munkin
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch allows including headers by their path relative to the source
directory, e.g. "utils/leb128.h".

Part of tarantool/tarantool#5442
---
 src/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/Makefile b/src/Makefile
index 2786348..caa49f9 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -156,6 +156,8 @@ XCFLAGS+= -DLUAJIT_SMART_STRINGS=1
 # You probably don't need to change anything below this line!
 ##############################################################################
 
+SRC_DIR=.
+
 ##############################################################################
 # Host system detection.
 ##############################################################################
@@ -225,7 +227,7 @@ TARGET_XSHLDFLAGS= -shared -fPIC -Wl,-soname,$(TARGET_SONAME)
 TARGET_DYNXLDOPTS=
 
 TARGET_LFSFLAGS= -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
-TARGET_XCFLAGS= $(TARGET_LFSFLAGS) -U_FORTIFY_SOURCE
+TARGET_XCFLAGS= $(TARGET_LFSFLAGS) -U_FORTIFY_SOURCE -I$(SRC_DIR)
 TARGET_XLDFLAGS=
 TARGET_XLIBS= -lm
 TARGET_TCFLAGS= $(CCOPTIONS) $(TARGET_XCFLAGS) $(TARGET_FLAGS) $(TARGET_CFLAGS)
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-20 22:44   ` Igor Munkin
  2020-12-23 16:50   ` Sergey Ostanevich
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces a module for reading and writing LEB128-encoded
values. It will be used to write the stream of profiling events, which
is added in the subsequent patches.

Part of tarantool/tarantool#5442
---
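
Editorial illustration, not part of the patch: a minimal sketch of the
ULEB128 scheme this module implements. Each output byte carries 7 payload
bits, and the high bit marks that more bytes follow. The sketch_* names
are hypothetical and the functions are an independent re-implementation,
not the code from utils/leb128.c. Unlike the patch's reader, this one
also rejects encodings longer than 64 bits, which is an added assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128; buffer must have room for up to 10 bytes. */
size_t sketch_write_uleb128(uint8_t *buffer, uint64_t value)
{
  size_t i = 0;

  /* Emit 7 payload bits per byte; bit 7 means "more bytes follow". */
  while (value >= 0x80) {
    buffer[i++] = (uint8_t)((value & 0x7f) | 0x80);
    value >>= 7;
  }
  buffer[i++] = (uint8_t)value;
  return i;
}

/* Decode at most n bytes; returns bytes consumed or 0 on bad input. */
size_t sketch_read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n)
{
  uint64_t value = 0;
  unsigned shift = 0;
  size_t i = 0;

  for (;;) {
    uint8_t octet;
    if (i >= n || shift >= 64)
      return 0; /* Truncated or overlong input. */
    octet = buffer[i++];
    value |= (uint64_t)(octet & 0x7f) << shift;
    shift += 7;
    if (!(octet & 0x80))
      break;
  }
  *out = value;
  return i;
}

/* Round trip for the classic example value 624485 (0x98765). */
int sketch_leb128_demo(void)
{
  uint8_t buf[10];
  uint64_t decoded = 0;
  size_t len = sketch_write_uleb128(buf, 624485);

  assert(len == 3 && buf[0] == 0xe5 && buf[1] == 0x8e && buf[2] == 0x26);
  assert(sketch_read_uleb128_n(&decoded, buf, len) == len);
  assert(decoded == 624485);
  /* A guarded read that runs out of input reports failure. */
  assert(sketch_read_uleb128_n(&decoded, buf, 2) == 0);
  /* The largest 64-bit value needs ceil(64 / 7) = 10 bytes. */
  assert(sketch_write_uleb128(buf, UINT64_MAX) == 10);
  return 0;
}
```

The 10-byte worst case is where LEB128_U64_MAXSIZE in the header below
comes from.
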
 src/Makefile       |   5 +-
 src/Makefile.dep   |   1 +
 src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
 src/utils/leb128.h |  55 ++++++++++++++++++++
 4 files changed, 183 insertions(+), 2 deletions(-)
 create mode 100644 src/utils/leb128.c
 create mode 100644 src/utils/leb128.h

diff --git a/src/Makefile b/src/Makefile
index caa49f9..be7ed95 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -468,6 +468,7 @@ endif
 DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
+UTILS_O= utils/leb128.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
@@ -498,7 +499,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o \
 	  lj_ctype.o lj_cdata.o lj_cconv.o lj_ccall.o lj_ccallback.o \
 	  lj_carith.o lj_clib.o lj_cparse.o \
 	  lj_lib.o lj_alloc.o lib_aux.o \
-	  $(LJLIB_O) lib_init.o
+	  $(LJLIB_O) lib_init.o $(UTILS_O)
 
 LJVMCORE_O= $(LJVM_O) $(LJCORE_O)
 LJVMCORE_DYNO= $(LJVMCORE_O:.o=_dyn.o)
@@ -516,7 +517,7 @@ ALL_HDRGEN= lj_bcdef.h lj_ffdef.h lj_libdef.h lj_recdef.h lj_folddef.h \
 	    host/buildvm_arch.h
 ALL_GEN= $(LJVM_S) $(ALL_HDRGEN) $(LIB_VMDEFP)
 WIN_RM= *.obj *.lib *.exp *.dll *.exe *.manifest *.pdb *.ilk
-ALL_RM= $(ALL_T) $(ALL_GEN) *.o host/*.o $(WIN_RM)
+ALL_RM= $(ALL_T) $(ALL_GEN) *.o host/*.o utils/*.o $(WIN_RM)
 
 ##############################################################################
 # Build mode handling.
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 556314e..cc75d03 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -248,3 +248,4 @@ host/buildvm_lib.o: host/buildvm_lib.c host/buildvm.h lj_def.h lua.h luaconf.h \
 host/buildvm_peobj.o: host/buildvm_peobj.c host/buildvm.h lj_def.h lua.h \
  luaconf.h lj_arch.h lj_bc.h lj_def.h lj_arch.h
 host/minilua.o: host/minilua.c
+utils/leb128.o: utils/leb128.c
diff --git a/src/utils/leb128.c b/src/utils/leb128.c
new file mode 100644
index 0000000..921e5bc
--- /dev/null
+++ b/src/utils/leb128.c
@@ -0,0 +1,124 @@
+/*
+** Working with LEB128/ULEB128 encoding.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#include <stdint.h>
+#include <stddef.h>
+
+#define LINK_BIT          (0x80)
+#define MIN_TWOBYTE_VALUE (0x80)
+#define PAYLOAD_MASK      (0x7f)
+#define SHIFT_STEP        (7)
+#define LEB_SIGN_BIT      (0x40)
+
+/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
+
+size_t write_uleb128(uint8_t *buffer, uint64_t value)
+{
+  size_t i = 0;
+
+  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
+    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
+  }
+  buffer[i++] = (uint8_t)value;
+
+  return i;
+}
+
+size_t write_leb128(uint8_t *buffer, int64_t value)
+{
+  size_t i = 0;
+
+  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
+    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
+  }
+  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
+
+  return i;
+}
+
+/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
+
+/*
+** NB! For each LEB128 type (signed/unsigned) we have two versions of read
+** functions: The one consuming unlimited number of input octets and the one
+** consuming not more than given number of input octets. Currently reading
+** is not used in performance critical places, so these two functions are
+** implemented via single low-level function + run-time mode check. Feel free
+** to change if this becomes a bottleneck.
+*/
+
+static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
+			    size_t n)
+{
+  size_t i = 0;
+  uint64_t value = 0;
+  uint64_t shift = 0;
+  uint8_t octet;
+
+  for(;;) {
+    if (guarded && i + 1 > n) {
+      return 0;
+    }
+    octet = buffer[i++];
+    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
+    shift += SHIFT_STEP;
+    if (!(octet & LINK_BIT)) {
+      break;
+    }
+  }
+
+  *out = value;
+  return i;
+}
+
+size_t read_uleb128(uint64_t *out, const uint8_t *buffer)
+{
+  return _read_uleb128(out, buffer, 0, 0);
+}
+
+size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n)
+{
+  return _read_uleb128(out, buffer, 1, n);
+}
+
+static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
+			   size_t n)
+{
+  size_t i = 0;
+  int64_t  value = 0;
+  uint64_t shift = 0;
+  uint8_t  octet;
+
+  for(;;) {
+    if (guarded && i + 1 > n) {
+      return 0;
+    }
+    octet  = buffer[i++];
+    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
+    shift += SHIFT_STEP;
+    if (!(octet & LINK_BIT)) {
+      break;
+    }
+  }
+
+  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
+    value |= (int64_t)(~(uint64_t)0 << shift);
+  }
+
+  *out = value;
+  return i;
+}
+
+size_t read_leb128(int64_t *out, const uint8_t *buffer)
+{
+  return _read_leb128(out, buffer, 0, 0);
+}
+
+size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n)
+{
+  return _read_leb128(out, buffer, 1, n);
+}
diff --git a/src/utils/leb128.h b/src/utils/leb128.h
new file mode 100644
index 0000000..46d90bc
--- /dev/null
+++ b/src/utils/leb128.h
@@ -0,0 +1,55 @@
+/*
+** Interfaces for working with LEB128/ULEB128 encoding.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJ_UTILS_LEB128_H
+#define _LJ_UTILS_LEB128_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
+#define LEB128_U64_MAXSIZE 10
+
+/*
+** Writes a value from an unsigned 64-bit input to a buffer of bytes.
+** Buffer overflow is not checked. Returns number of bytes written.
+*/
+size_t write_uleb128(uint8_t *buffer, uint64_t value);
+
+/*
+** Writes a value from a signed 64-bit input to a buffer of bytes.
+** Buffer overflow is not checked. Returns number of bytes written.
+*/
+size_t write_leb128(uint8_t *buffer, int64_t value);
+
+/*
+** Reads a value from a buffer of bytes to a uint64_t output.
+** Buffer overflow is not checked. Returns number of bytes read.
+*/
+size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
+
+/*
+** Reads a value from a buffer of bytes to an int64_t output.
+** Buffer overflow is not checked. Returns number of bytes read.
+*/
+size_t read_leb128(int64_t *out, const uint8_t *buffer);
+
+/*
+** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
+** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
+** If more than n bytes are about to be consumed, returns 0 without touching out.
+*/
+size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
+
+/*
+** Reads a value from a buffer of bytes to an int64_t output. Consumes no more
+** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
+** If more than n bytes are about to be consumed, returns 0 without touching out.
+*/
+size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
+
+#endif
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-21  9:24   ` Igor Munkin
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces a module for writing profile data.
Its usage will be added in the subsequent patches.

It can be used by the memory profiler or by a signal-based
CPU profiler.

Part of tarantool/tarantool#5442
---

The custom memcpy function (see below) makes sense if this module is
used by a CPU/sampling profiler driven by a signal-based timer. Otherwise
it can easily be redefined to the system memcpy.
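
Editorial illustration, not part of the patch: a minimal sketch of a
callback matching the ljp_writer contract described in ljp_write.h below.
It returns the number of bytes written, leaves *data pointing at a buffer
the caller may keep filling, and sets *data to NULL to ask the stream to
stop. The sketch_* names and the fixed-size in-memory sink are
hypothetical. The typedef is redeclared so the example is self-contained.
It calls the system memcpy for brevity, so it is not meant for use inside
a signal handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as ljp_writer in the patch. */
typedef size_t (*sketch_writer)(const void **data, size_t len, void *opt);

/* Hypothetical sink: collects flushed chunks in a fixed-size array. */
struct sketch_sink {
  uint8_t bytes[64];
  size_t used;
};

size_t sketch_sink_writer(const void **data, size_t len, void *opt)
{
  struct sketch_sink *sink = (struct sketch_sink *)opt;

  if (len > sizeof(sink->bytes) - sink->used) {
    *data = NULL; /* No room left: request that the stream stops. */
    return 0;     /* Fewer bytes than len signals an I/O error. */
  }
  memcpy(sink->bytes + sink->used, *data, len);
  sink->used += len;
  /* *data is left unchanged, so the caller reuses the same buffer. */
  return len;
}

int sketch_sink_demo(void)
{
  struct sketch_sink sink = { {0}, 0 };
  uint8_t chunk[48];
  const void *data = chunk;
  sketch_writer writer = sketch_sink_writer;

  memset(chunk, 0xab, sizeof(chunk));
  /* The first flush fits into the sink. */
  assert(writer(&data, sizeof(chunk), &sink) == sizeof(chunk));
  assert(data == chunk && sink.used == 48);
  /* The second flush would overflow it: error plus stop request. */
  assert(writer(&data, sizeof(chunk), &sink) == 0);
  assert(data == NULL && sink.used == 48);
  return 0;
}
```

With the module below, such a callback and its context would presumably
be passed as the writer and ctx arguments of ljp_write_init().
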

 src/Makefile            |   5 +-
 src/Makefile.dep        |   2 +
 src/profile/ljp_write.c | 195 ++++++++++++++++++++++++++++++++++++++++
 src/profile/ljp_write.h |  84 +++++++++++++++++
 4 files changed, 284 insertions(+), 2 deletions(-)
 create mode 100644 src/profile/ljp_write.c
 create mode 100644 src/profile/ljp_write.h

diff --git a/src/Makefile b/src/Makefile
index be7ed95..4b1d937 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -469,6 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
 UTILS_O= utils/leb128.o
+PROFILE_O= profile/ljp_write.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
@@ -499,7 +500,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o \
 	  lj_ctype.o lj_cdata.o lj_cconv.o lj_ccall.o lj_ccallback.o \
 	  lj_carith.o lj_clib.o lj_cparse.o \
 	  lj_lib.o lj_alloc.o lib_aux.o \
-	  $(LJLIB_O) lib_init.o $(UTILS_O)
+	  $(LJLIB_O) lib_init.o $(UTILS_O) $(PROFILE_O)
 
 LJVMCORE_O= $(LJVM_O) $(LJCORE_O)
 LJVMCORE_DYNO= $(LJVMCORE_O:.o=_dyn.o)
@@ -517,7 +518,7 @@ ALL_HDRGEN= lj_bcdef.h lj_ffdef.h lj_libdef.h lj_recdef.h lj_folddef.h \
 	    host/buildvm_arch.h
 ALL_GEN= $(LJVM_S) $(ALL_HDRGEN) $(LIB_VMDEFP)
 WIN_RM= *.obj *.lib *.exp *.dll *.exe *.manifest *.pdb *.ilk
-ALL_RM= $(ALL_T) $(ALL_GEN) *.o host/*.o utils/*.o $(WIN_RM)
+ALL_RM= $(ALL_T) $(ALL_GEN) *.o host/*.o utils/*.o profile/*.o $(WIN_RM)
 
 ##############################################################################
 # Build mode handling.
diff --git a/src/Makefile.dep b/src/Makefile.dep
index cc75d03..7fdbfbe 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -248,4 +248,6 @@ host/buildvm_lib.o: host/buildvm_lib.c host/buildvm.h lj_def.h lua.h luaconf.h \
 host/buildvm_peobj.o: host/buildvm_peobj.c host/buildvm.h lj_def.h lua.h \
  luaconf.h lj_arch.h lj_bc.h lj_def.h lj_arch.h
 host/minilua.o: host/minilua.c
+profile/ljp_write.o: profile/ljp_write.c profile/ljp_write.h utils/leb128.h \
+ lj_def.h lua.h luaconf.h
 utils/leb128.o: utils/leb128.c
diff --git a/src/profile/ljp_write.c b/src/profile/ljp_write.c
new file mode 100644
index 0000000..de7202d
--- /dev/null
+++ b/src/profile/ljp_write.c
@@ -0,0 +1,195 @@
+/*
+** Low-level writer for LuaJIT Profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#include <unistd.h>
+#include <errno.h>
+
+#include "profile/ljp_write.h"
+#include "utils/leb128.h"
+#include "lj_def.h"
+
+/*
+** memcpy from the standard C library is not guaranteed to be async-signal-safe.
+** Common sense might tell us that it should be async-signal-safe (at least
+** because it is unlikely that memcpy will allocate anything dynamically),
+** but the standard is always the standard. So LuaJIT offers its own
+** implementation of memcpy. Of course it will never be as fast as the system
+** memcpy, but it will be guaranteed to always stay async-signal-safe. Feel
+** free to remove the define below to fall-back to the system memcpy if the
+** custom implementation becomes a bottleneck, but this is at your own risk.
+** You are warned:)
+*/
+#define USE_CUSTOM_LJ_MEMCPY
+
+#ifdef USE_CUSTOM_LJ_MEMCPY
+/*
+** Behaves exactly as memcpy from the standard C library with following caveats:
+** - Guaranteed to be async-signal-safe.
+** - Does *not* handle unaligned byte stores.
+** - src cannot be declared as (const void *restrict) unless we start
+**   supporting C99 explicitly. Possible overlapping of dst in src is ignored.
+*/
+#define COPY_BUFFER_TYPE uint64_t
+#define COPY_BUFFER_SIZE sizeof(COPY_BUFFER_TYPE)
+
+static void *write_memcpy(void *dst, const void *src, size_t n)
+{
+  size_t loops;
+  size_t i;
+  uint8_t *dst_pos = (uint8_t *)dst;
+  const uint8_t *src_pos = (const uint8_t *)src;
+
+  loops = n / COPY_BUFFER_SIZE;
+  for (i = 0; i < loops; i++) {
+    *(COPY_BUFFER_TYPE *)dst_pos = *(const COPY_BUFFER_TYPE *)src_pos;
+    dst_pos += COPY_BUFFER_SIZE;
+    src_pos += COPY_BUFFER_SIZE;
+  }
+
+  loops = n % COPY_BUFFER_SIZE;
+  for (i = 0; i < loops; i++) {
+    *dst_pos = *src_pos;
+    dst_pos++;
+    src_pos++;
+  }
+
+  return dst;
+}
+#else /* !USE_CUSTOM_LJ_MEMCPY */
+#define write_memcpy memcpy
+#endif /* USE_CUSTOM_LJ_MEMCPY */
+
+static LJ_AINLINE void write_set_flag(struct ljp_buffer *buf, uint8_t flag)
+{
+  buf->flags |= flag;
+}
+
+static LJ_AINLINE void write_save_errno(struct ljp_buffer *buf)
+{
+  buf->saved_errno = errno;
+}
+
+/* Wraps a write syscall ensuring all data have been written. */
+static void write_buffer_sys(struct ljp_buffer *buffer, const void **data,
+			     size_t len)
+{
+  void *ctx = buffer->ctx;
+  size_t written;
+
+  lua_assert(!ljp_write_test_flag(buffer, STREAM_STOP));
+
+  written = buffer->writer(data, len, ctx);
+
+  if (LJ_UNLIKELY(written < len)) {
+    write_set_flag(buffer, STREAM_ERR_IO);
+    write_save_errno(buffer);
+  }
+  if (LJ_UNLIKELY(*data == NULL)) {
+    write_set_flag(buffer, STREAM_STOP);
+    write_save_errno(buffer);
+  }
+}
+
+static LJ_AINLINE size_t write_bytes_buffered(const struct ljp_buffer *buf)
+{
+  return (size_t)(buf->pos - buf->buf);
+}
+
+static LJ_AINLINE int write_buffer_has(const struct ljp_buffer *buf, size_t n)
+{
+  return (buf->size - write_bytes_buffered(buf)) >= n;
+}
+
+void ljp_write_flush_buffer(struct ljp_buffer *buf)
+{
+  lua_assert(!ljp_write_test_flag(buf, STREAM_STOP));
+
+  write_buffer_sys(buf, (const void **)&buf->buf, write_bytes_buffered(buf));
+  buf->pos = buf->buf;
+}
+
+void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
+		    uint8_t *mem, size_t size)
+{
+  buf->ctx = ctx;
+  buf->writer = writer;
+  buf->buf = mem;
+  buf->pos = mem;
+  buf->size = size;
+  buf->flags = 0;
+  buf->saved_errno = 0;
+}
+
+void ljp_write_terminate(struct ljp_buffer *buf)
+{
+  ljp_write_init(buf, NULL, NULL, NULL, 0);
+}
+
+static LJ_AINLINE void write_reserve(struct ljp_buffer *buf, size_t n)
+{
+  if (LJ_UNLIKELY(!write_buffer_has(buf, n)))
+    ljp_write_flush_buffer(buf);
+}
+
+/* Writes a byte to the output buffer. */
+void ljp_write_byte(struct ljp_buffer *buf, uint8_t b)
+{
+  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
+    return;
+  write_reserve(buf, sizeof(b));
+  *buf->pos++ = b;
+}
+
+/* Writes an unsigned integer which is at most 64 bits long to the output. */
+void ljp_write_u64(struct ljp_buffer *buf, uint64_t n)
+{
+  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
+    return;
+  write_reserve(buf, LEB128_U64_MAXSIZE);
+  buf->pos += (ptrdiff_t)write_uleb128(buf->pos, n);
+}
+
+/* Writes n bytes from an arbitrary buffer src to the output. */
+static void write_buffer(struct ljp_buffer *buf, const void *src, size_t n)
+{
+  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
+    return;
+  /*
+  ** Very unlikely: We are told to write a large buffer at once.
+  ** The source buffer does not belong to us, so we must pump the data
+  ** through our own buffer.
+  */
+  while (LJ_UNLIKELY(n > buf->size)) {
+    ljp_write_flush_buffer(buf);
+    write_memcpy(buf->pos, src, buf->size);
+    buf->pos += (ptrdiff_t)buf->size;
+    n -= buf->size;
+  }
+
+  write_reserve(buf, n);
+  write_memcpy(buf->pos, src, n);
+  buf->pos += (ptrdiff_t)n;
+}
+
+/* Writes a \0-terminated C string to the output buffer. */
+void ljp_write_string(struct ljp_buffer *buf, const char *s)
+{
+  const size_t l = strlen(s);
+
+  ljp_write_u64(buf, (uint64_t)l);
+  write_buffer(buf, s, l);
+}
+
+int ljp_write_test_flag(const struct ljp_buffer *buf, uint8_t flag)
+{
+  return buf->flags & flag;
+}
+
+int ljp_write_errno(const struct ljp_buffer *buf)
+{
+  return buf->saved_errno;
+}
diff --git a/src/profile/ljp_write.h b/src/profile/ljp_write.h
new file mode 100644
index 0000000..29c1669
--- /dev/null
+++ b/src/profile/ljp_write.h
@@ -0,0 +1,84 @@
+/*
+** Low-level event streaming for LuaJIT Profiler.
+** NB! Please note that all events may be streamed inside a signal handler.
+** This means effectively that only async-signal-safe library functions and
+** syscalls MUST be used for streaming. Check with `man 7 signal` when in
+** doubt.
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJP_WRITE_H
+#define _LJP_WRITE_H
+
+#include <stdint.h>
+
+/*
+** Data format for strings:
+**
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** Note.
+** For strings shorter than 128 bytes (most likely scenario in our case)
+** we write the same amount of data (1-byte ULEB128 + actual payload) as we
+** would have written with straightforward serialization (actual payload + \0),
+** but make parsing easier.
+*/
+
+/* Stream errors. */
+#define STREAM_ERR_IO 0x1
+#define STREAM_STOP   0x2
+
+typedef size_t (*ljp_writer)(const void **data, size_t len, void *opt);
+
+/* Write buffer for profilers. */
+struct ljp_buffer {
+  /*
+  ** Buffer writer which is called when the buffer is flushed.
+  ** Should return amount of written bytes on success or zero in case of error.
+  ** *data should contain new buffer of size greater or equal to len.
+  ** If *data == NULL stream stops.
+  */
+  ljp_writer writer;
+  /* Context to writer function. */
+  void *ctx;
+  /* Buffer size. */
+  size_t size;
+  /* Saved errno in case of error. */
+  int saved_errno;
+  /* Start of buffer. */
+  uint8_t *buf;
+  /* Current position in buffer. */
+  uint8_t *pos;
+  /* Internal flags. */
+  volatile uint8_t flags;
+};
+
+/* Write string. */
+void ljp_write_string(struct ljp_buffer *buf, const char *s);
+
+/* Write single byte. */
+void ljp_write_byte(struct ljp_buffer *buf, uint8_t b);
+
+/* Write uint64_t in uleb128 format. */
+void ljp_write_u64(struct ljp_buffer *buf, uint64_t n);
+
+/* Immediately flush buffer. */
+void ljp_write_flush_buffer(struct ljp_buffer *buf);
+
+/* Init buffer. */
+void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
+		    uint8_t *mem, size_t size);
+
+/* Check flags. */
+int ljp_write_test_flag(const struct ljp_buffer *buf, uint8_t flag);
+
+/* Return saved errno. */
+int ljp_write_errno(const struct ljp_buffer *buf);
+
+/* Set pointers to NULL and reset flags. */
+void ljp_write_terminate(struct ljp_buffer *buf);
+
+#endif
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (2 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-21 10:30   ` Igor Munkin
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds a symbol table writer that dumps the information needed
about each Lua function prototype: the GCproto address, the name of the
chunk the function was defined in and the number of its first line.
See <ljp_symtab.h> for details.

Usage of this module will be added in the subsequent patches.

Part of tarantool/tarantool#5442
---
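
Editorial illustration, not part of the patch: a minimal sketch of the
symtab stream described in ljp_symtab.h below, for a single Lua function
entry. It emits the 7-byte prologue, one sym-lua record (header, address,
chunk name as a length-prefixed string, first line) and the final marker,
then walks the stream back. All sketch_* names, the address 0x1234 and
the chunk name "@test.lua" are hypothetical. The helpers write into a
plain byte array instead of going through struct ljp_buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror SYMTAB_LFUNC, SYMTAB_FINAL and LJS_CURRENT_VERSION. */
#define SKETCH_SYMTAB_LFUNC 0x00
#define SKETCH_SYMTAB_FINAL 0x80
#define SKETCH_LJS_VERSION  2

/* Append one ULEB128 value; returns the number of bytes written. */
size_t sketch_put_uleb128(uint8_t *p, uint64_t v)
{
  size_t i = 0;
  for (; v >= 0x80; v >>= 7)
    p[i++] = (uint8_t)((v & 0x7f) | 0x80);
  p[i++] = (uint8_t)v;
  return i;
}

/* Decode one ULEB128 value; returns the number of bytes read. */
size_t sketch_get_uleb128(const uint8_t *p, uint64_t *out)
{
  size_t i = 0;
  unsigned shift = 0;
  uint64_t v = 0;
  uint8_t octet;
  do {
    octet = p[i++];
    v |= (uint64_t)(octet & 0x7f) << shift;
    shift += 7;
  } while ((octet & 0x80) && shift < 64);
  *out = v;
  return i;
}

/* Emits prologue, one sym-lua record and sym-final; returns the length. */
size_t sketch_symtab_build(uint8_t *p, uint64_t addr, const char *chunk,
                           uint64_t line)
{
  static const uint8_t prologue[] = {'l', 'j', 's', SKETCH_LJS_VERSION,
                                     0, 0, 0};
  size_t n = 0, len = strlen(chunk);

  memcpy(p, prologue, sizeof(prologue));
  n += sizeof(prologue);
  p[n++] = SKETCH_SYMTAB_LFUNC;          /* sym-header */
  n += sketch_put_uleb128(p + n, addr);  /* sym-addr */
  n += sketch_put_uleb128(p + n, len);   /* string-len */
  memcpy(p + n, chunk, len);             /* string-payload, no trailing \0 */
  n += len;
  n += sketch_put_uleb128(p + n, line);  /* sym-line */
  p[n++] = SKETCH_SYMTAB_FINAL;          /* sym-final */
  return n;
}

int sketch_symtab_demo(void)
{
  uint8_t stream[64];
  uint64_t addr = 0, len = 0, line = 0;
  size_t n = sketch_symtab_build(stream, 0x1234, "@test.lua", 42);
  size_t pos = 7; /* Skip the 7-byte prologue. */

  assert(memcmp(stream, "ljs", 3) == 0 && stream[3] == SKETCH_LJS_VERSION);
  assert(stream[pos] == SKETCH_SYMTAB_LFUNC);
  pos += 1;
  pos += sketch_get_uleb128(stream + pos, &addr);
  pos += sketch_get_uleb128(stream + pos, &len);
  assert(len == 9 && memcmp(stream + pos, "@test.lua", 9) == 0);
  pos += (size_t)len;
  pos += sketch_get_uleb128(stream + pos, &line);
  assert(addr == 0x1234 && line == 42);
  /* The final marker is the last byte of the stream. */
  assert(stream[pos] == SKETCH_SYMTAB_FINAL && pos + 1 == n);
  return 0;
}
```

Because the string carries its length up front, a parser can skip a chunk
name without scanning for a terminator.
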
 src/Makefile             |  2 +-
 src/Makefile.dep         |  2 ++
 src/profile/ljp_symtab.c | 55 ++++++++++++++++++++++++++++++++++++++
 src/profile/ljp_symtab.h | 57 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 src/profile/ljp_symtab.c
 create mode 100644 src/profile/ljp_symtab.h

diff --git a/src/Makefile b/src/Makefile
index 4b1d937..e00265c 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -469,7 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
 UTILS_O= utils/leb128.o
-PROFILE_O= profile/ljp_write.o
+PROFILE_O= profile/ljp_write.o profile/ljp_symtab.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 7fdbfbe..831a5ce 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -248,6 +248,8 @@ host/buildvm_lib.o: host/buildvm_lib.c host/buildvm.h lj_def.h lua.h luaconf.h \
 host/buildvm_peobj.o: host/buildvm_peobj.c host/buildvm.h lj_def.h lua.h \
  luaconf.h lj_arch.h lj_bc.h lj_def.h lj_arch.h
 host/minilua.o: host/minilua.c
+profile/ljp_symtab.o: profile/ljp_symtab.c lj_obj.h lua.h luaconf.h lj_def.h \
+ lj_arch.h profile/ljp_write.h profile/ljp_symtab.h
 profile/ljp_write.o: profile/ljp_write.c profile/ljp_write.h utils/leb128.h \
  lj_def.h lua.h luaconf.h
 utils/leb128.o: utils/leb128.c
diff --git a/src/profile/ljp_symtab.c b/src/profile/ljp_symtab.c
new file mode 100644
index 0000000..5a17c97
--- /dev/null
+++ b/src/profile/ljp_symtab.c
@@ -0,0 +1,55 @@
+/*
+** Implementation of the Lua symbol table dumper.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#include "lj_obj.h"
+#include "profile/ljp_write.h"
+#include "profile/ljp_symtab.h"
+
+#define LJS_CURRENT_VERSION 2
+
+static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
+					   0x0, 0x0, 0x0};
+
+static void symtab_write_prologue(struct ljp_buffer *out)
+{
+  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
+  size_t i = 0;
+
+  for (; i < len; i++)
+    ljp_write_byte(out, ljs_header[i]);
+}
+
+void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g)
+{
+  const GCobj *o;
+  const GCRef *iter = &g->gc.root;
+
+  symtab_write_prologue(out);
+
+  while (NULL != (o = gcref(*iter))) {
+    switch (o->gch.gct) {
+    case (~LJ_TPROTO): {
+      const GCproto *pt = gco2pt(o);
+      ljp_write_byte(out, SYMTAB_LFUNC);
+      ljp_write_u64(out, (uintptr_t)pt);
+      ljp_write_string(out, proto_chunknamestr(pt));
+      ljp_write_u64(out, (uint64_t)pt->firstline);
+      break;
+    }
+    case (~LJ_TTRACE): {
+      /* TODO: Implement dumping a trace info */
+      break;
+    }
+    default: {
+      break;
+    }
+    }
+    iter = &o->gch.nextgc;
+  }
+
+  ljp_write_byte(out, SYMTAB_FINAL);
+}
diff --git a/src/profile/ljp_symtab.h b/src/profile/ljp_symtab.h
new file mode 100644
index 0000000..3a40d98
--- /dev/null
+++ b/src/profile/ljp_symtab.h
@@ -0,0 +1,57 @@
+/*
+** Lua symbol table dumper.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJ_SYMTAB_H
+#define _LJ_SYMTAB_H
+
+#include <stdint.h>
+
+struct global_State;
+struct ljp_buffer;
+
+/*
+** symtab format:
+**
+** symtab         := prologue sym*
+** prologue       := 'l' 'j' 's' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** sym            := sym-lua | sym-final
+** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-header     := <BYTE>
+** sym-addr       := <ULEB128>
+** sym-chunk      := string
+** sym-line       := <ULEB128>
+** sym-final      := sym-header
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain numeric version number
+**
+** sym-header: [FUUUUUTT]
+**  * TT    : 2 bits for representing symbol type
+**  * UUUUU : 5 unused bits
+**  * F     : 1 bit marking the end of the symtab (final symbol)
+*/
+
+#define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_CFUNC ((uint8_t)1)
+#define SYMTAB_FFUNC ((uint8_t)2)
+#define SYMTAB_TRACE ((uint8_t)3)
+#define SYMTAB_FINAL ((uint8_t)0x80)
+
+/* Writes the symbol table of the VM g to out. */
+void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g);
+
+#endif
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (3 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-25 11:07   ` Sergey Ostanevich
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field Sergey Kaplun
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch splits LJ_VMST_LFUNC and LJ_VMST_FFUNC out of LJ_VMST_INTERP
so that the context of VM execution can be determined on x86/x64
arches. Also, LJ_VMST_C is renamed to LJ_VMST_CFUNC for naming
consistency with the newer vmstates.

Also, this patch adjusts the stack layout for x86/x64 arches to save
the vmstate in the C frame, so that the state stays consistent when the
stack is unwound after an error is raised.
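
The idea of saving the vmstate on VM entry and restoring it on leave
can be modelled in plain C roughly as below. This is a conceptual
sketch only: the real change is done in the DynASM sources via the
save_vmstate_through/restore_vmstate_through macros, and every name
here (sketch_vm, sketch_pcall, the VMST_* constants) is hypothetical.
setjmp/longjmp stands in for error unwinding.

```c
#include <setjmp.h>

/* Stand-ins for the first few LJ_VMST_* values (assumed order). */
enum { VMST_INTERP, VMST_LFUNC, VMST_FFUNC, VMST_CFUNC };

struct sketch_vm {
  int vmstate;      /* Stored as ~state, like g->vmstate. */
  jmp_buf *errjmp;  /* Innermost error handler. */
};

/* Raise an error: control returns to the innermost protected call. */
static void sketch_throw(struct sketch_vm *vm)
{
  longjmp(*vm->errjmp, 1);
}

/* A callee that changes vmstate and may fail before resetting it. */
static void sketch_callee(struct sketch_vm *vm, int fail)
{
  vm->vmstate = ~VMST_LFUNC;
  if (fail)
    sketch_throw(vm);
}

/* Protected call: vmstate is saved on entry, restored on any exit. */
static int sketch_pcall(struct sketch_vm *vm, int fail)
{
  int saved = vm->vmstate;      /* Cf. save_vmstate_through. */
  jmp_buf *prev = vm->errjmp;
  jmp_buf jb;
  int status = 0;
  vm->errjmp = &jb;
  if (setjmp(jb) == 0)
    sketch_callee(vm, fail);
  else
    status = 1;                 /* Error path: stack was unwound. */
  vm->errjmp = prev;
  vm->vmstate = saved;          /* Cf. restore_vmstate_through. */
  return status;
}
```

Without the restore step the caller would observe the vmstate left
behind by the failed callee, which is the inconsistency the stack
layout change is meant to avoid.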

Part of tarantool/tarantool#5442
---
 src/lj_frame.h     |  18 +++---
 src/lj_obj.h       |   4 +-
 src/lj_profile.c   |   5 +-
 src/luajit-gdb.py  |  14 +++--
 src/vm_arm.dasc    |   6 +-
 src/vm_arm64.dasc  |   6 +-
 src/vm_mips.dasc   |   6 +-
 src/vm_mips64.dasc |   6 +-
 src/vm_ppc.dasc    |   6 +-
 src/vm_x64.dasc    |  99 ++++++++++++++++++++++----------
 src/vm_x86.dasc    | 137 ++++++++++++++++++++++++++++++---------------
 11 files changed, 200 insertions(+), 107 deletions(-)

diff --git a/src/lj_frame.h b/src/lj_frame.h
index 19c49a4..2e693f9 100644
--- a/src/lj_frame.h
+++ b/src/lj_frame.h
@@ -127,13 +127,13 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_ERRF		(15*4)
-#define CFRAME_OFS_NRES		(14*4)
-#define CFRAME_OFS_PREV		(13*4)
-#define CFRAME_OFS_L		(12*4)
+#define CFRAME_OFS_ERRF		(19*4)
+#define CFRAME_OFS_NRES		(18*4)
+#define CFRAME_OFS_PREV		(17*4)
+#define CFRAME_OFS_L		(16*4)
 #define CFRAME_OFS_PC		(6*4)
 #define CFRAME_OFS_MULTRES	(5*4)
-#define CFRAME_SIZE		(12*4)
+#define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #endif
 #elif LJ_TARGET_X64
@@ -152,11 +152,11 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_NRES		(22*4)
 #define CFRAME_OFS_MULTRES	(21*4)
 #endif
-#define CFRAME_SIZE		(10*8)
+#define CFRAME_SIZE		(12*8)
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_PREV		(4*8)
+#define CFRAME_OFS_PREV		(6*8)
 #if LJ_GC64
 #define CFRAME_OFS_PC		(3*8)
 #define CFRAME_OFS_L		(2*8)
@@ -171,9 +171,9 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_MULTRES	(1*4)
 #endif
 #if LJ_NO_UNWIND
-#define CFRAME_SIZE		(12*8)
+#define CFRAME_SIZE		(14*8)
 #else
-#define CFRAME_SIZE		(10*8)
+#define CFRAME_SIZE		(12*8)
 #endif
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 16)
 #define CFRAME_SHIFT_MULTRES	0
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 927b347..7fb715e 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -512,7 +512,9 @@ typedef struct GCtab {
 /* VM states. */
 enum {
   LJ_VMST_INTERP,	/* Interpreter. */
-  LJ_VMST_C,		/* C function. */
+  LJ_VMST_LFUNC,	/* Lua function. */
+  LJ_VMST_FFUNC,	/* Fast function. */
+  LJ_VMST_CFUNC,	/* C function. */
   LJ_VMST_GC,		/* Garbage collector. */
   LJ_VMST_EXIT,		/* Trace exit handler. */
   LJ_VMST_RECORD,	/* Trace recorder. */
diff --git a/src/lj_profile.c b/src/lj_profile.c
index 116998e..637e03c 100644
--- a/src/lj_profile.c
+++ b/src/lj_profile.c
@@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
     int st = g->vmstate;
     ps->vmstate = st >= 0 ? 'N' :
 		  st == ~LJ_VMST_INTERP ? 'I' :
-		  st == ~LJ_VMST_C ? 'C' :
+		  st == ~LJ_VMST_CFUNC ? 'C' :
+		  /* Stubs for profiler hooks. */
+		  st == ~LJ_VMST_FFUNC ? 'I' :
+		  st == ~LJ_VMST_LFUNC ? 'I' :
 		  st == ~LJ_VMST_GC ? 'G' : 'J';
     g->hookmask = (mask | HOOK_PROFILE);
     lj_dispatch_update(g);
diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
index 652c560..f1fd623 100644
--- a/src/luajit-gdb.py
+++ b/src/luajit-gdb.py
@@ -206,12 +206,14 @@ def J(g):
 def vm_state(g):
     return {
         i2notu32(0): 'INTERP',
-        i2notu32(1): 'C',
-        i2notu32(2): 'GC',
-        i2notu32(3): 'EXIT',
-        i2notu32(4): 'RECORD',
-        i2notu32(5): 'OPT',
-        i2notu32(6): 'ASM',
+        i2notu32(1): 'LFUNC',
+        i2notu32(2): 'FFUNC',
+        i2notu32(3): 'CFUNC',
+        i2notu32(4): 'GC',
+        i2notu32(5): 'EXIT',
+        i2notu32(6): 'RECORD',
+        i2notu32(7): 'OPT',
+        i2notu32(8): 'ASM',
     }.get(int(tou32(g['vmstate'])), 'TRACE')
 
 def gc_state(g):
diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
index d4cdaf5..ae2efdf 100644
--- a/src/vm_arm.dasc
+++ b/src/vm_arm.dasc
@@ -287,7 +287,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldr KBASE, SAVE_NRES
-  |    mv_vmstate CARG4, C
+  |    mv_vmstate CARG4, CFUNC
   |   sub BASE, BASE, #8
   |  subs CARG3, RC, #8
   |   lsl KBASE, KBASE, #3		// KBASE = (nresults_wanted+1)*8
@@ -348,7 +348,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate CARG4, C
+  |   mv_vmstate CARG4, CFUNC
   |  ldr GL:CARG3, L->glref
   |   str CARG4, GL:CARG3->vmstate
   |   str L, GL:CARG3->cur_L
@@ -4487,7 +4487,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate CARG3, C
+    |    mv_vmstate CARG3, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate CARG3
diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
index 3eaf376..f783428 100644
--- a/src/vm_arm64.dasc
+++ b/src/vm_arm64.dasc
@@ -332,7 +332,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldrsw CARG2, SAVE_NRES		// CARG2 = nresults+1.
-  |    mv_vmstate TMP0w, C
+  |    mv_vmstate TMP0w, CFUNC
   |   sub BASE, BASE, #16
   |  subs TMP2, RC, #8
   |    st_vmstate TMP0w
@@ -391,7 +391,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate TMP0w, C
+  |   mv_vmstate TMP0w, CFUNC
   |  ldr GL, L->glref
   |   st_vmstate TMP0w
   |  b ->vm_leave_unw
@@ -3816,7 +3816,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate TMP0w, C
+    |    mv_vmstate TMP0w, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate TMP0w
diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc
index 1afd611..ec57d78 100644
--- a/src/vm_mips.dasc
+++ b/src/vm_mips.dasc
@@ -403,7 +403,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sw TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   addiu BASE, BASE, -8
   |    st_vmstate
@@ -473,7 +473,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lw L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lw GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -5085,7 +5085,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sw BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sw RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  lw CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index c06270a..9a749f9 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -449,7 +449,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sd TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   daddiu BASE, BASE, -16
   |    st_vmstate
@@ -517,7 +517,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ld L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  ld GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -4952,7 +4952,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sd BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sd RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  ld CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
index b4260eb..62e9b68 100644
--- a/src/vm_ppc.dasc
+++ b/src/vm_ppc.dasc
@@ -520,7 +520,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  // TMP0 = PC & FRAME_TYPE
   |  cmpwi TMP0, FRAME_C
   |   rlwinm TMP2, PC, 0, 0, 28
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   sub TMP2, BASE, TMP2		// TMP2 = previous base.
   |  bney ->vm_returnp
   |
@@ -596,7 +596,7 @@ static void build_subroutines(BuildCtx *ctx)
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lwz L, SAVE_L
   |  .toc ld TOCREG, SAVE_TOC
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lwz GL:TMP1, L->glref
   |   stw TMP0, GL:TMP1->vmstate
   |  b ->vm_leave_unw
@@ -5060,7 +5060,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   stp BASE, L->base
     |   cmplw TMP1, TMP2
     |    stp RC, L->top
-    |     li_vmstate C
+    |     li_vmstate CFUNC
     |.if TOC
     |  mtctr TMP3
     |.else
diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
index 80753e0..d4d3a1d 100644
--- a/src/vm_x64.dasc
+++ b/src/vm_x64.dasc
@@ -140,7 +140,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -161,26 +161,29 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
-|.define SAVE_PC,	aword [rsp+aword*3]
-|.define SAVE_L,	aword [rsp+aword*2]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define SAVE_UNUSED2,	qword [rsp+qword*5]
+|.define SAVE_UNUSED1,	dword [rsp+dword*8]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
+|.define SAVE_PC,	qword [rsp+qword*3]
+|.define SAVE_L,	qword [rsp+qword*2]
 |.define SAVE_ERRF,	dword [rsp+dword*3]
 |.define SAVE_NRES,	dword [rsp+dword*2]
-|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
+|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |.define TMP1d,		dword [rsp]
@@ -342,6 +345,20 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|.if not WIN
+|// Save vmstate through register.
+|.macro save_vmstate_through, reg
+|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, reg
+|.endmacro
+|
+|// Restore vmstate through register.
+|.macro restore_vmstate_through, reg
+|  mov reg, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
+|.endmacro
+|.endif // WIN
+|
 |.macro fpop1; fstp st1; .endmacro
 |
 |// Synthesize SSE FP constants.
@@ -416,7 +433,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -448,6 +465,10 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |.if not WIN
  |  // DISPATCH must be set up properly here.
+  |  restore_vmstate_through RAd
+  |.endif
   |  restoreregs
   |  ret
   |
@@ -493,7 +514,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -521,7 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov [BASE-16], RA			// Prepend false to error message.
   |  mov [BASE-8], RB
   |  mov RA, -16			// Results start at BASE+RA = BASE-16.
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
   |//-----------------------------------------------------------------------
@@ -575,6 +598,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  lea KBASE, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through TMPRd
+  |.endif
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RD
   |  mov SAVE_NRES, RDd
@@ -585,7 +611,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
   |  mov RD, L:RB->top
@@ -622,11 +648,14 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASE
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through RDd
+  |.endif
   |  mov L:RB->cframe, rsp
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -658,6 +687,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through KBASEd
+  |.endif
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.
@@ -697,6 +729,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  cleartp LFUNC:KBASE
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
   |  // BASE = base, RC = result, RB = meta base
   |  jmp RA				// Jump to continuation.
   |
@@ -1137,15 +1170,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1578,7 +1612,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:PC, TMP1
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |
   |  cmp eax, LUA_YIELD
   |  ja >8
@@ -1717,6 +1751,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RAd, PC_RA
   |  neg RA
   |  lea BASE, [BASE+RA*8-16]		// base = base - (RA+2)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2481,7 +2516,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RCd, [PC]
   |  movzx RAd, RCH
@@ -2697,8 +2732,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CARG1, CTSTATE
   |  call extern lj_ccallback_enter	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-16]
@@ -3974,6 +4009,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     if (op == BC_CALLM) {
       |  add NARGS:RDd, MULTRES
     }
@@ -3995,6 +4031,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:RB, [RA-16]
     |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
     |->BC_CALLT_Z:
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     |  mov PC, [BASE-8]
     |  test PCd, FRAME_TYPE
     |  jnz >7
@@ -4219,6 +4256,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RAd, 3
     }
     |1:
+    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
     |  mov PC, [BASE-8]
     |  mov MULTRES, RDd			// Save nresults+1.
     |  test PCd, FRAME_TYPE		// Check frame type marker.
@@ -4260,6 +4298,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cleartp LFUNC:KBASE
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -4551,6 +4590,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -4588,6 +4628,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-8], RB			// Store delta + FRAME_VARG.
     |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -4643,7 +4684,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  mov CARG1, L:RB		// Caveat: CARG1 may be RA.
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function
     if (op == BC_FUNCC) {
       |  call KBASE			// (lua_State *L)
     } else {
@@ -4653,7 +4694,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-    |  set_vmstate INTERP
+    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
     |  lea RA, [BASE+RD*8]
     |  neg RA
     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
index d76fbe3..939c43f 100644
--- a/src/vm_x86.dasc
+++ b/src/vm_x86.dasc
@@ -140,7 +140,7 @@
 |
 |.else
 |
-|.define CFRAME_SPACE,	aword*7			// Delta for esp (see <--).
+|.define CFRAME_SPACE,	dword*11			// Delta for esp (see <--).
 |.macro saveregs_
 |  push edi; push esi; push ebx
 |  sub esp, CFRAME_SPACE
@@ -183,25 +183,30 @@
 |.define ARG1,		aword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.else
-|.define SAVE_ERRF,	aword [esp+aword*15]	// vm_pcall/vm_cpcall only.
-|.define SAVE_NRES,	aword [esp+aword*14]
-|.define SAVE_CFRAME,	aword [esp+aword*13]
-|.define SAVE_L,	aword [esp+aword*12]
+|.define SAVE_ERRF,	dword [esp+dword*19]	// vm_pcall/vm_cpcall only.
+|.define SAVE_NRES,	dword [esp+dword*18]
+|.define SAVE_CFRAME,	dword [esp+dword*17]
+|.define SAVE_L,	dword [esp+dword*16]
 |//----- 16 byte aligned, ^^^ arguments from C caller
-|.define SAVE_RET,	aword [esp+aword*11]	//<-- esp entering interpreter.
-|.define SAVE_R4,	aword [esp+aword*10]
-|.define SAVE_R3,	aword [esp+aword*9]
-|.define SAVE_R2,	aword [esp+aword*8]
+|.define SAVE_RET,	dword [esp+dword*15]	//<-- esp entering interpreter.
+|.define SAVE_R4,	dword [esp+dword*14]
+|.define SAVE_R3,	dword [esp+dword*13]
+|.define SAVE_R2,	dword [esp+dword*12]
 |//----- 16 byte aligned
-|.define SAVE_R1,	aword [esp+aword*7]	//<-- esp after register saves.
-|.define SAVE_PC,	aword [esp+aword*6]
-|.define TMP2,		aword [esp+aword*5]
-|.define TMP1,		aword [esp+aword*4]
+|.define SAVE_UNUSED3,	dword [esp+dword*11]
+|.define SAVE_UNUSED2,	dword [esp+dword*10]
+|.define SAVE_UNUSED1,	dword [esp+dword*9]
+|.define SAVE_VMSTATE,	dword [esp+dword*8]
 |//----- 16 byte aligned
-|.define ARG4,		aword [esp+aword*3]
-|.define ARG3,		aword [esp+aword*2]
-|.define ARG2,		aword [esp+aword*1]
-|.define ARG1,		aword [esp]		//<-- esp while in interpreter.
+|.define SAVE_R1,	dword [esp+dword*7]	//<-- esp after register saves.
+|.define SAVE_PC,	dword [esp+dword*6]
+|.define TMP2,		dword [esp+dword*5]
+|.define TMP1,		dword [esp+dword*4]
+|//----- 16 byte aligned
+|.define ARG4,		dword [esp+dword*3]
+|.define ARG3,		dword [esp+dword*2]
+|.define ARG2,		dword [esp+dword*1]
+|.define ARG1,		dword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.endif
 |
@@ -269,7 +274,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -290,33 +295,35 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define SAVE_UNUSED1,	qword [rsp+qword*5]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
 |.define SAVE_PC,	dword [rsp+dword*7]
 |.define SAVE_L,	dword [rsp+dword*6]
 |.define SAVE_ERRF,	dword [rsp+dword*5]
 |.define SAVE_NRES,	dword [rsp+dword*4]
-|.define TMPa,		aword [rsp+aword*1]
+|.define TMPa,		qword [rsp+qword*1]
 |.define TMP2,		dword [rsp+dword*1]
 |.define TMP1,		dword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |// TMPQ overlaps TMP1/TMP2. MULTRES overlaps TMP2 (and TMPQ).
 |.define TMPQ,		qword [rsp]
-|.define TMP3,		dword [rsp+aword*1]
+|.define TMP3,		dword [rsp+qword*1]
 |.define MULTRES,	TMP2
 |
 |.endif
@@ -433,6 +440,20 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|.if not WIN
+|// Save vmstate through register.
+|.macro save_vmstate_through, reg
+|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, reg
+|.endmacro
+|
+|// Restore vmstate through register.
+|.macro restore_vmstate_through, reg
+|  mov reg, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
+|.endmacro
+|.endif // WIN
+|
 |// x87 compares.
 |.macro fcomparepp			// Compare and pop st0 >< st1.
 |  fucomip st1
@@ -520,7 +541,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -559,6 +580,10 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |.if not WIN
  |  // DISPATCH must be set up properly here.
+  |  restore_vmstate_through RA
+  |.endif
   |  restoreregs
   |  ret
   |
@@ -613,7 +638,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov dword GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -647,7 +674,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov PC, [BASE-4]			// Fetch PC of previous frame.
   |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
   |.if WIN and not X64
@@ -714,10 +741,13 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov RA, INARG_BASE			// Caveat: overlaps SAVE_CFRAME!
   |.endif
   |  mov PC, FRAME_CP
-  |  xor RD, RD
   |  lea KBASEa, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through RD
+  |.endif
+  |  xor RD, RD
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RDa
   |.if X64
@@ -730,7 +760,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
   |  mov RD, L:RB->top
@@ -774,6 +804,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASEa
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through RD
+  |.endif
   |.if X64
   |  mov L:RB->cframe, rsp
   |.else
@@ -782,7 +815,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -823,6 +856,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASE		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |.if not WIN
+  |  save_vmstate_through KBASE
+  |.endif
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |.if X64
@@ -885,6 +921,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  // BASE = base, RC = result, RB = meta base
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
   |  jmp RAa				// Jump to continuation.
   |
   |.if FFI
@@ -1409,15 +1446,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1924,7 +1962,7 @@ static void build_subroutines(BuildCtx *ctx)
   |.endif
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |
   |  cmp eax, LUA_YIELD
   |  ja >8
@@ -2089,6 +2127,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RA, PC_RA
   |  not RAa				// Note: ~RA = -(RA+1)
   |  lea BASE, [BASE+RA*8]		// base = base - (RA+1)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2933,7 +2972,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RC, [PC]
   |  movzx RA, RCH
@@ -3203,8 +3242,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov FCARG1, CTSTATE
   |  call extern lj_ccallback_enter@8	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-8]
@@ -4683,6 +4722,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     if (op == BC_CALLM) {
       |  add NARGS:RD, MULTRES
     }
@@ -4706,6 +4746,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cmp dword [RA-4], LJ_TFUNC
     |  jne ->vmeta_call
     |->BC_CALLT_Z:
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     |  mov PC, [BASE-4]
     |  test PC, FRAME_TYPE
     |  jnz >7
@@ -4989,6 +5030,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RA, 3
     }
     |1:
+    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
     |  mov PC, [BASE-4]
     |  mov MULTRES, RD			// Save nresults+1.
     |  test PC, FRAME_TYPE		// Check frame type marker.
@@ -5043,6 +5085,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:KBASE, [BASE-8]
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE are restored
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -5330,6 +5373,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -5367,6 +5411,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-4], RB			// Store delta + FRAME_VARG.
     |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -5431,7 +5476,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |.endif
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function
     if (op == BC_FUNCC) {
       |  call KBASEa			// (lua_State *L)
     } else {
@@ -5441,7 +5486,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-    |  set_vmstate INTERP
+    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
     |  lea RA, [BASE+RD*8]
     |  neg RA
     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (4 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds a new field named mem_L to the global_State structure.
The memory profiler will use it in the next patches to determine which
coroutine triggers a memory allocation.
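
For illustration only, here is a minimal standalone sketch of the idea.
The toy_* names are hypothetical stand-ins, not LuaJIT types. The only
point is that the requester is recorded before the allocator is called,
so an allocator hook that has no coroutine argument can still attribute
the allocation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_global;

/* Hypothetical stand-in for lua_State: a coroutine that knows its VM. */
struct toy_state {
  struct toy_global *g;
};

typedef void *(*toy_alloc)(struct toy_global *g, void *p, size_t nsz);

/* Hypothetical stand-in for global_State. */
struct toy_global {
  toy_alloc allocf;            /* Allocator, possibly instrumented. */
  struct toy_state *mem_L;     /* Coroutine that requested the allocation. */
  struct toy_state *last_seen; /* What the hook observed (for the demo). */
};

/* Instrumented allocator: it has no coroutine argument, so it reads mem_L. */
static void *toy_hook(struct toy_global *g, void *p, size_t nsz)
{
  g->last_seen = g->mem_L;
  if (nsz == 0) {
    free(p);
    return NULL;
  }
  return realloc(p, nsz);
}

/* Counterpart of lj_mem_realloc(): record the requester, then delegate. */
static void *toy_mem_realloc(struct toy_state *L, void *p, size_t nsz)
{
  struct toy_global *g;
  assert(L != NULL && L->g != NULL);
  g = L->g;
  g->mem_L = L;
  return g->allocf(g, p, nsz);
}
```

In the patch itself this role is played by the
setgcref(g->mem_L, obj2gco(L)) call in lj_mem_realloc().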

Part of tarantool/tarantool#5442
---
 src/lj_gc.c  | 2 ++
 src/lj_obj.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/src/lj_gc.c b/src/lj_gc.c
index 44c8aa1..800fb2c 100644
--- a/src/lj_gc.c
+++ b/src/lj_gc.c
@@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
 {
   global_State *g = G(L);
   lua_assert((osz == 0) == (p == NULL));
+
+  setgcref(g->mem_L, obj2gco(L));
   p = g->allocf(g->allocd, p, osz, nsz);
   if (p == NULL && nsz > 0)
     lj_err_mem(L);
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 7fb715e..c94617d 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -649,6 +649,7 @@ typedef struct global_State {
   BCIns bc_cfunc_int;	/* Bytecode for internal C function calls. */
   BCIns bc_cfunc_ext;	/* Bytecode for external C function calls. */
   GCRef cur_L;		/* Currently executing lua_State. */
+  GCRef mem_L;		/* Currently allocating lua_State. */
   MRef jit_base;	/* Current JIT code L->base or NULL. */
   MRef ctype_state;	/* Pointer to C type state. */
   GCRef gcroot[GCROOT_MAX];  /* GC roots. */
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (5 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-20 22:46   ` Igor Munkin
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler Sergey Kaplun
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch renames debug_frameline to lj_debug_frameline and exposes it
through the <lj_debug.h> module API (declared with LJ_FUNC, not exported
via LUA_API). The memory profiler will use it in the next patches.

Part of tarantool/tarantool#5442
---
 src/lj_debug.c | 8 ++++----
 src/lj_debug.h | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/lj_debug.c b/src/lj_debug.c
index 73bd196..bb9ab28 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -128,7 +128,7 @@ BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc)
 }
 
 /* Get line number for function/frame. */
-static BCLine debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
+BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
 {
   BCPos pc = debug_framepc(L, fn, nextframe);
   if (pc != NO_BCPOS) {
@@ -353,7 +353,7 @@ void lj_debug_addloc(lua_State *L, const char *msg,
   if (frame) {
     GCfunc *fn = frame_func(frame);
     if (isluafunc(fn)) {
-      BCLine line = debug_frameline(L, fn, nextframe);
+      BCLine line = lj_debug_frameline(L, fn, nextframe);
       if (line >= 0) {
 	GCproto *pt = funcproto(fn);
 	char buf[LUA_IDSIZE];
@@ -470,7 +470,7 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
 	ar->what = "C";
       }
     } else if (*what == 'l') {
-      ar->currentline = frame ? debug_frameline(L, fn, nextframe) : -1;
+      ar->currentline = frame ? lj_debug_frameline(L, fn, nextframe) : -1;
     } else if (*what == 'u') {
       ar->nups = fn->c.nupvalues;
       if (ext) {
@@ -616,7 +616,7 @@ void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt, int depth)
 	    GCproto *pt = funcproto(fn);
 	    if (debug_putchunkname(sb, pt, pathstrip)) {
 	      /* Regular Lua function. */
-	      BCLine line = c == 'l' ? debug_frameline(L, fn, nextframe) :
+	      BCLine line = c == 'l' ? lj_debug_frameline(L, fn, nextframe) :
 				       pt->firstline;
 	      lj_buf_putb(sb, ':');
 	      lj_strfmt_putint(sb, line >= 0 ? line : pt->firstline);
diff --git a/src/lj_debug.h b/src/lj_debug.h
index 5917c00..1b5ef29 100644
--- a/src/lj_debug.h
+++ b/src/lj_debug.h
@@ -40,6 +40,7 @@ LJ_FUNC void lj_debug_addloc(lua_State *L, const char *msg,
 LJ_FUNC void lj_debug_pushloc(lua_State *L, GCproto *pt, BCPos pc);
 LJ_FUNC int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar,
 			     int ext);
+LJ_FUNC BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe);
 #if LJ_HASPROFILE
 LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
 				int depth);
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (6 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds the memory profiler module. The Lua and C APIs for it
will be added in the next patches.
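
Each event the module writes starts with a one-byte header laid out as
[FTUUSSEE] (see the format comment in <ljp_memprof.h> in this patch).
Below is a minimal sketch of packing and unpacking that byte. The
AEVENT_*, ASOURCE_* and LJM_EPILOGUE_HEADER values are copied from
ljp_memprof.c; the hdr_* helpers and HDR_* masks are hypothetical names
used only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from ljp_memprof.c in this patch. */
#define AEVENT_ALLOC   ((uint8_t)1)
#define AEVENT_FREE    ((uint8_t)2)
#define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))

#define ASOURCE_INT   ((uint8_t)(1 << 2))
#define ASOURCE_LFUNC ((uint8_t)(2 << 2))
#define ASOURCE_CFUNC ((uint8_t)(3 << 2))

#define LJM_EPILOGUE_HEADER 0x80

/* Hypothetical masks for the EE and SS fields of [FTUUSSEE]. */
#define HDR_EVENT_MASK  0x03
#define HDR_SOURCE_MASK 0x0c

/* Combine an event type and an allocation source into one header byte. */
static uint8_t hdr_make(uint8_t aevent, uint8_t asource)
{
  assert((aevent & ~HDR_EVENT_MASK) == 0);
  assert((asource & ~HDR_SOURCE_MASK) == 0);
  return (uint8_t)(aevent | asource);
}

/* Extract the EE bits (AEVENT_*). */
static uint8_t hdr_event(uint8_t h)
{
  return (uint8_t)(h & HDR_EVENT_MASK);
}

/* Extract the SS bits (ASOURCE_*), still shifted into place. */
static uint8_t hdr_source(uint8_t h)
{
  return (uint8_t)(h & HDR_SOURCE_MASK);
}

/* The F bit marks the epilogue; other bits are ignored when it is set. */
static int hdr_is_epilogue(uint8_t h)
{
  return (h & LJM_EPILOGUE_HEADER) != 0;
}
```

A parser would first test hdr_is_epilogue() and only then inspect the
event and source fields to decide which ULEB128 values follow.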

When the VM executes a trace, vmstate is set to the trace number.
This patch also defines a special macro, LJ_VMST_TRACE, equal to
LJ_VMST__MAX, so that the profiler aggregates all traces into a single
vmstate.
See <ljp_memprof.h> for details.
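
The aggregation can be sketched as follows. This is a standalone
illustration, assuming vmstate holds ~state for the regular states (as
the setvmstate macro does) and the positive trace number while on a
trace. The VMST_* and vmst_* names are local stand-ins, not LuaJIT
identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the state order used by the memprof_writers table. */
enum {
  VMST_INTERP, VMST_LFUNC, VMST_FFUNC, VMST_CFUNC, VMST_GC,
  VMST_EXIT, VMST_RECORD, VMST_OPT, VMST_ASM, VMST__MAX
};

/* Pseudo-state: one slot past the last real state. */
#define VMST_TRACE VMST__MAX

static const char *const vmst_names[] = {
  "INTERP", "LFUNC", "FFUNC", "CFUNC", "GC",
  "EXIT", "RECORD", "OPT", "ASM", "TRACE"
};

/*
** Regular states are stored inverted, so ~vmstate gives a small index.
** A trace number is positive, so ~vmstate becomes a huge unsigned value
** and is clamped to the single TRACE slot.
*/
static uint32_t vmst_profiled(int32_t vmstate)
{
  const uint32_t st = (uint32_t)~vmstate;
  return st < (uint32_t)VMST_TRACE ? st : (uint32_t)VMST_TRACE;
}

/* Name lookup; the clamped value is always a valid index. */
static const char *vmst_name(uint32_t st)
{
  assert(st <= (uint32_t)VMST_TRACE);
  return vmst_names[st];
}
```

The clamped value can then index a fixed-size dispatch table, which is
what memprof_write_caller() does with memprof_writers[].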

Part of tarantool/tarantool#5442
---
 src/Makefile              |   8 +-
 src/Makefile.dep          |   4 +
 src/lj_arch.h             |  22 ++
 src/lj_obj.h              |   8 +
 src/lj_state.c            |   8 +
 src/lmisclib.h            |  29 +++
 src/profile/ljp_memprof.c | 413 ++++++++++++++++++++++++++++++++++++++
 src/profile/ljp_memprof.h |  86 ++++++++
 8 files changed, 577 insertions(+), 1 deletion(-)
 create mode 100644 src/profile/ljp_memprof.c
 create mode 100644 src/profile/ljp_memprof.h

diff --git a/src/Makefile b/src/Makefile
index e00265c..1ade2ec 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -113,6 +113,12 @@ XCFLAGS=
 # Enable GC64 mode for x64.
 #XCFLAGS+= -DLUAJIT_ENABLE_GC64
 #
+# Disable the memory profiler.
+#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
+#
+# Disable the thread safe profiler.
+#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE
+#
 ##############################################################################
 
 ##############################################################################
@@ -469,7 +475,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
 UTILS_O= utils/leb128.o
-PROFILE_O= profile/ljp_write.o profile/ljp_symtab.o
+PROFILE_O= profile/ljp_write.o profile/ljp_symtab.o profile/ljp_memprof.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 831a5ce..c41fdcf 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -248,6 +248,10 @@ host/buildvm_lib.o: host/buildvm_lib.c host/buildvm.h lj_def.h lua.h luaconf.h \
 host/buildvm_peobj.o: host/buildvm_peobj.c host/buildvm.h lj_def.h lua.h \
  luaconf.h lj_arch.h lj_bc.h lj_def.h lj_arch.h
 host/minilua.o: host/minilua.c
+profile/ljp_memprof.o: profile/ljp_memprof.c profile/ljp_memprof.h lmisclib.h \
+ lua.h luaconf.h lj_def.h lj_obj.h lj_arch.h lj_frame.h lj_bc.h lj_jit.h \
+ lj_ir.h lj_gc.h profile/ljp_symtab.h lj_debug.h profile/ljp_symtab.h \
+ profile/ljp_write.h
 profile/ljp_symtab.o: profile/ljp_symtab.c lj_obj.h lua.h luaconf.h lj_def.h \
  lj_arch.h profile/ljp_write.h profile/ljp_symtab.h
 profile/ljp_write.o: profile/ljp_write.c profile/ljp_write.h utils/leb128.h \
diff --git a/src/lj_arch.h b/src/lj_arch.h
index c8d7138..5967849 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -213,6 +213,8 @@
 #define LJ_ARCH_VERSION		50
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_ARM64
 
 #define LJ_ARCH_BITS		64
@@ -234,6 +236,8 @@
 
 #define LJ_ARCH_VERSION		80
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_PPC
 
 #ifndef LJ_ARCH_ENDIAN
@@ -299,6 +303,8 @@
 #define LJ_ARCH_XENON		1
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
 
 #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
@@ -358,6 +364,8 @@
 #define LJ_ARCH_VERSION		10
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #else
 #error "No target architecture defined"
 #endif
@@ -564,4 +572,18 @@
 #define LJ_52			0
 #endif
 
+/* Disable or enable the memory profiler. */
+#if defined(LUAJIT_DISABLE_MEMPROF) || defined(LJ_ARCH_NOMEMPROF) || LJ_TARGET_WINDOWS || LJ_TARGET_CYGWIN || LJ_TARGET_PS3 || LJ_TARGET_PS4 || LJ_TARGET_XBOX360
+#define LJ_HASMEMPROF		0
+#else
+#define LJ_HASMEMPROF		1
+#endif
+
+/* Disable or enable the memory profiler's thread safety. */
+#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
+#define LJ_IS_THREAD_SAFE	0
+#else
+#define LJ_IS_THREAD_SAFE	1
+#endif
+
 #endif
diff --git a/src/lj_obj.h b/src/lj_obj.h
index c94617d..c94b0bb 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -523,6 +523,14 @@ enum {
   LJ_VMST__MAX
 };
 
+/*
+** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
+** In fact, when VM executes a trace, vmstate is set to the trace number, but
+** we aggregate all such cases into one VM state during per-VM state profiling.
+*/
+
+#define LJ_VMST_TRACE		(LJ_VMST__MAX)
+
 #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
 
 /* Metamethods. ORDER MM */
diff --git a/src/lj_state.c b/src/lj_state.c
index 1d9c628..09ac1b4 100644
--- a/src/lj_state.c
+++ b/src/lj_state.c
@@ -29,6 +29,10 @@
 #include "lj_alloc.h"
 #include "luajit.h"
 
+#if LJ_HASMEMPROF
+#include "profile/ljp_memprof.h"
+#endif
+
 /* -- Stack handling ------------------------------------------------------ */
 
 /* Stack sizes. */
@@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
   global_State *g = G(L);
   int i;
   L = mainthread(g);  /* Only the main thread can be closed. */
+#if LJ_HASMEMPROF
+  if (ljp_memprof_is_running())
+    ljp_memprof_stop();
+#endif
 #if LJ_HASPROFILE
   luaJIT_profile_stop(L);
 #endif
diff --git a/src/lmisclib.h b/src/lmisclib.h
index 0c07707..b4c41da 100644
--- a/src/lmisclib.h
+++ b/src/lmisclib.h
@@ -60,6 +60,35 @@ struct luam_Metrics {
 
 LUAMISC_API void luaM_metrics(lua_State *L, struct luam_Metrics *metrics);
 
+/* Profiler public API. */
+#define LUAM_PROFILE_SUCCESS 0
+#define LUAM_PROFILE_ERR     1
+#define LUAM_PROFILE_ERRRUN  2
+#define LUAM_PROFILE_ERRMEM  3
+#define LUAM_PROFILE_ERRIO   4
+
+/* Profiler options. */
+struct luam_Prof_options {
+  /* Context for the profile writer and final callback. */
+  void *ctx;
+  /* Custom buffer to write data. */
+  uint8_t *buf;
+  /* The buffer's size. */
+  size_t len;
+  /*
+  ** Writer function for profile events.
+  ** Should return the number of bytes written on success or zero on error.
+  ** Setting *data to NULL means end of profiling.
+  */
+  size_t (*writer)(const void **data, size_t len, void *ctx);
+  /*
+  ** Callback invoked when the profiler stops. Required for correct cleanup
+  ** at VM shutdown while the profiler is still running.
+  ** Returns zero on success.
+  */
+  int (*on_stop)(void *ctx, uint8_t *buf);
+};
+
 #define LUAM_MISCLIBNAME "misc"
 LUALIB_API int luaopen_misc(lua_State *L);
 
diff --git a/src/profile/ljp_memprof.c b/src/profile/ljp_memprof.c
new file mode 100644
index 0000000..2137d5a
--- /dev/null
+++ b/src/profile/ljp_memprof.c
@@ -0,0 +1,413 @@
+/*
+** Implementation of memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#include <errno.h>
+
+#include "profile/ljp_memprof.h"
+#include "lmisclib.h"
+#include "lj_def.h"
+#include "lj_arch.h"
+
+#if LJ_HASMEMPROF
+
+#if LJ_IS_THREAD_SAFE
+#include <pthread.h>
+#endif
+
+#include "lua.h"
+
+#include "lj_obj.h"
+#include "lj_frame.h"
+#include "lj_debug.h"
+#include "lj_gc.h"
+#include "profile/ljp_symtab.h"
+#include "profile/ljp_write.h"
+
+/* Allocation events: */
+#define AEVENT_ALLOC   ((uint8_t)1)
+#define AEVENT_FREE    ((uint8_t)2)
+#define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))
+
+/* Allocation sources: */
+#define ASOURCE_INT   ((uint8_t)(1 << 2))
+#define ASOURCE_LFUNC ((uint8_t)(2 << 2))
+#define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+
+/* Aux bits: */
+
+/*
+** Reserved. There is ~1 second between each two events marked with this flag.
+** This will possibly be used later to implement dumps of the evolving heap.
+*/
+#define LJM_TIMESTAMP ((uint8_t)(0x40))
+
+#define LJM_EPILOGUE_HEADER 0x80
+
+enum memprof_state {
+  /* memprof is not running. */
+  MPS_IDLE,
+  /* memprof is running. */
+  MPS_PROFILE,
+  /*
+  ** Stopped in case of stopped stream.
+  ** Saved errno is returned to user at memprof_stop.
+  */
+  MPS_HALT
+};
+
+struct alloc {
+  lua_Alloc allocf; /* Allocating function. */
+  void *state; /* Opaque allocator's state. */
+};
+
+struct memprof {
+  global_State *g; /* Profiled VM. */
+  enum memprof_state state; /* Internal state. */
+  struct ljp_buffer out; /* Output accumulator. */
+  struct alloc orig_alloc; /* Original allocator. */
+  struct luam_Prof_options opt; /* Profiling options. */
+  int saved_errno; /* Saved errno when profiler deinstrumented. */
+};
+
+#if LJ_IS_THREAD_SAFE
+
+pthread_mutex_t memprof_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static LJ_AINLINE int memprof_lock(void)
+{
+  return pthread_mutex_lock(&memprof_mutex);
+}
+
+static LJ_AINLINE int memprof_unlock(void)
+{
+  return pthread_mutex_unlock(&memprof_mutex);
+}
+
+#else /* LJ_IS_THREAD_SAFE */
+
+#define memprof_lock()
+#define memprof_unlock()
+
+#endif /* LJ_IS_THREAD_SAFE */
+
+static struct memprof memprof = {0};
+
+const unsigned char ljm_header[] = {'l', 'j', 'm', LJM_CURRENT_FORMAT_VERSION,
+				    0x0, 0x0, 0x0};
+
+static void memprof_write_lfunc(struct ljp_buffer *out, uint8_t header,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *nextframe)
+{
+  const BCLine line = lj_debug_frameline(L, fn, nextframe);
+  ljp_write_byte(out, header | ASOURCE_LFUNC);
+  ljp_write_u64(out, (uintptr_t)funcproto(fn));
+  ljp_write_u64(out, line >= 0 ? (uintptr_t)line : 0);
+}
+
+static void memprof_write_cfunc(struct ljp_buffer *out, uint8_t header,
+				const GCfunc *fn)
+{
+  ljp_write_byte(out, header | ASOURCE_CFUNC);
+  ljp_write_u64(out, (uintptr_t)fn->c.f);
+}
+
+static void memprof_write_ffunc(struct ljp_buffer *out, uint8_t header,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *frame)
+{
+  cTValue *pframe = frame_prev(frame);
+  GCfunc *pfn = frame_func(pframe);
+
+  /*
+  ** NB! If a fast function is called by a Lua function, report the
+  ** Lua function for more meaningful output. Otherwise report the fast
+  ** function as a C function.
+  */
+  if (pfn != NULL && isluafunc(pfn))
+    memprof_write_lfunc(out, header, pfn, L, frame);
+  else
+    memprof_write_cfunc(out, header, fn);
+}
+
+static void memprof_write_func(struct memprof *mp, uint8_t header)
+{
+  struct ljp_buffer *out = &mp->out;
+  lua_State *L = gco2th(gcref(mp->g->mem_L));
+  cTValue *frame = L->base - 1;
+  GCfunc *fn;
+
+  fn = frame_func(frame);
+
+  if (isluafunc(fn))
+    memprof_write_lfunc(out, header, fn, L, NULL);
+  else if (isffunc(fn))
+    memprof_write_ffunc(out, header, fn, L, frame);
+  else if (iscfunc(fn))
+    memprof_write_cfunc(out, header, fn);
+  else
+    lua_assert(0);
+}
+
+static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
+{
+  ljp_write_byte(&mp->out, header | ASOURCE_INT);
+}
+
+/*
+** NB! In an ideal world, we should report allocations from traces as well.
+** But since traces must follow the semantics of the original code, behaviour of
+** Lua and JITted code must match 1:1 in terms of allocations, which makes
+** using memprof with enabled JIT virtually redundant. Hence the stub below.
+*/
+static void memprof_write_trace(struct memprof *mp, uint8_t header)
+{
+  ljp_write_byte(&mp->out, header | ASOURCE_INT);
+}
+
+typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);
+
+static const memprof_writer memprof_writers[] = {
+  memprof_write_hvmstate, /* LJ_VMST_INTERP */
+  memprof_write_func, /* LJ_VMST_LFUNC */
+  memprof_write_func, /* LJ_VMST_FFUNC */
+  memprof_write_func, /* LJ_VMST_CFUNC */
+  memprof_write_hvmstate, /* LJ_VMST_GC */
+  memprof_write_hvmstate, /* LJ_VMST_EXIT */
+  memprof_write_hvmstate, /* LJ_VMST_RECORD */
+  memprof_write_hvmstate, /* LJ_VMST_OPT */
+  memprof_write_hvmstate, /* LJ_VMST_ASM */
+  memprof_write_trace /* LJ_VMST_TRACE */
+};
+
+static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
+{
+  const global_State *g = mp->g;
+  const uint32_t _vmstate = (uint32_t)~g->vmstate;
+  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
+  const uint8_t header = aevent;
+
+  memprof_writers[vmstate](mp, header);
+}
+
+static int memprof_stop(const struct lua_State *L);
+
+static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+  struct ljp_buffer *out = &mp->out;
+  void *nptr;
+
+  lua_assert(MPS_PROFILE == mp->state);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(ud == oalloc->state);
+
+  nptr = oalloc->allocf(ud, ptr, osize, nsize);
+
+  if (nsize == 0) {
+    memprof_write_caller(mp, AEVENT_FREE);
+    ljp_write_u64(out, (uintptr_t)ptr);
+    ljp_write_u64(out, (uint64_t)osize);
+  } else if (ptr == NULL) {
+    memprof_write_caller(mp, AEVENT_ALLOC);
+    ljp_write_u64(out, (uintptr_t)nptr);
+    ljp_write_u64(out, (uint64_t)nsize);
+  } else {
+    memprof_write_caller(mp, AEVENT_REALLOC);
+    ljp_write_u64(out, (uintptr_t)ptr);
+    ljp_write_u64(out, (uint64_t)osize);
+    ljp_write_u64(out, (uintptr_t)nptr);
+    ljp_write_u64(out, (uint64_t)nsize);
+  }
+
+  /* Deinstrument memprof if required. */
+  if (LJ_UNLIKELY(ljp_write_test_flag(out, STREAM_STOP)))
+    memprof_stop(NULL);
+
+  return nptr;
+}
+
+static void memprof_write_prologue(struct ljp_buffer *out)
+{
+  size_t i = 0;
+  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
+
+  for (; i < len; i++)
+    ljp_write_byte(out, ljm_header[i]);
+}
+
+int ljp_memprof_start(struct lua_State *L, const struct luam_Prof_options *opt)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+
+  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
+  lua_assert(opt->buf != NULL && opt->len != 0);
+
+  memprof_lock();
+
+  if (mp->state != MPS_IDLE) {
+    memprof_unlock();
+    return LUAM_PROFILE_ERRRUN;
+  }
+
+  /* Discard possible old errno. */
+  mp->saved_errno = 0;
+
+  /* Init options: */
+  memcpy(&mp->opt, opt, sizeof(*opt));
+
+  /* Init general fields: */
+  mp->g = G(L);
+  mp->state = MPS_PROFILE;
+
+  /* Init output: */
+  ljp_write_init(&mp->out, mp->opt.writer, mp->opt.ctx, mp->opt.buf,
+		 mp->opt.len);
+  ljp_symtab_write(&mp->out, mp->g);
+  memprof_write_prologue(&mp->out);
+
+  if (LJ_UNLIKELY(ljp_write_test_flag(&mp->out, STREAM_ERR_IO) ||
+		  ljp_write_test_flag(&mp->out, STREAM_STOP))) {
+    /* on_stop call may change errno value. */
+    int saved_errno = ljp_write_errno(&mp->out);
+    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+    ljp_write_terminate(&mp->out);
+    mp->state = MPS_IDLE;
+    memprof_unlock();
+    errno = saved_errno;
+    return LUAM_PROFILE_ERRIO;
+  }
+
+  /* Override allocating function: */
+  oalloc->allocf = lua_getallocf(L, &oalloc->state);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(L, memprof_allocf, oalloc->state);
+
+  memprof_unlock();
+  return LUAM_PROFILE_SUCCESS;
+}
+
+static int memprof_stop(const struct lua_State *L)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+  struct ljp_buffer *out = &mp->out;
+  int return_status = LUAM_PROFILE_SUCCESS;
+  int saved_errno = 0;
+  struct lua_State *main_L;
+  int cb_status;
+
+  memprof_lock();
+
+  if (mp->state == MPS_HALT) {
+    errno = mp->saved_errno;
+    mp->state = MPS_IDLE;
+    memprof_unlock();
+    return LUAM_PROFILE_ERRIO;
+  }
+
+  if (mp->state != MPS_PROFILE) {
+    memprof_unlock();
+    return LUAM_PROFILE_ERRRUN;
+  }
+
+  if (L != NULL && mp->g != G(L)) {
+    memprof_unlock();
+    return LUAM_PROFILE_ERR;
+  }
+
+  mp->state = MPS_IDLE;
+
+  lua_assert(mp->g != NULL);
+  main_L = mainthread(mp->g);
+
+  lua_assert(memprof_allocf == lua_getallocf(main_L, NULL));
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(main_L, oalloc->allocf, oalloc->state);
+
+  if (LJ_UNLIKELY(ljp_write_test_flag(out, STREAM_STOP))) {
+    lua_assert(ljp_write_test_flag(out, STREAM_ERR_IO));
+    mp->state = MPS_HALT;
+    /* on_stop call may change errno value. */
+    mp->saved_errno = ljp_write_errno(out);
+    /* Ignore possible errors. mp->opt.buf == NULL here. */
+    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+    ljp_write_terminate(out);
+    memprof_unlock();
+    return LUAM_PROFILE_ERRIO;
+  }
+  ljp_write_byte(out, LJM_EPILOGUE_HEADER);
+
+  ljp_write_flush_buffer(out);
+
+  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+  if (LJ_UNLIKELY(ljp_write_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
+    saved_errno = ljp_write_errno(out);
+    return_status = LUAM_PROFILE_ERRIO;
+  }
+
+  ljp_write_terminate(out);
+
+  memprof_unlock();
+  errno = saved_errno;
+  return return_status;
+}
+
+int ljp_memprof_stop(void)
+{
+  return memprof_stop(NULL);
+}
+
+int ljp_memprof_stop_vm(const struct lua_State *L)
+{
+  return memprof_stop(L);
+}
+
+int ljp_memprof_is_running(void)
+{
+  struct memprof *mp = &memprof;
+  int running;
+
+  memprof_lock();
+  running = mp->state == MPS_PROFILE;
+  memprof_unlock();
+
+  return running;
+}
+
+#else /* LJ_HASMEMPROF */
+
+int ljp_memprof_start(struct lua_State *L, const struct luam_Prof_options *opt)
+{
+  UNUSED(L);
+  UNUSED(opt);
+  return LUAM_PROFILE_ERR;
+}
+
+int ljp_memprof_stop(void)
+{
+  return LUAM_PROFILE_ERR;
+}
+
+int ljp_memprof_stop_vm(const struct lua_State *L)
+{
+  UNUSED(L);
+  return LUAM_PROFILE_ERR;
+}
+
+int ljp_memprof_is_running(void)
+{
+  return 0;
+}
+
+#endif /* LJ_HASMEMPROF */
diff --git a/src/profile/ljp_memprof.h b/src/profile/ljp_memprof.h
new file mode 100644
index 0000000..90c1990
--- /dev/null
+++ b/src/profile/ljp_memprof.h
@@ -0,0 +1,86 @@
+/*
+** Memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJP_MEMPROF_H
+#define _LJP_MEMPROF_H
+
+/*
+** Event stream format:
+**
+** stream         := symtab memprof
+** symtab         := see <ljp_symtab.h>
+** memprof        := prologue event* epilogue
+** prologue       := 'l' 'j' 'm' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** event          := event-alloc | event-realloc | event-free
+** event-alloc    := event-header loc? naddr nsize
+** event-realloc  := event-header loc? oaddr osize naddr nsize
+** event-free     := event-header loc? oaddr osize
+** event-header   := <BYTE>
+** loc            := loc-lua | loc-c
+** loc-lua        := sym-addr line-no
+** loc-c          := sym-addr
+** sym-addr       := <ULEB128>
+** line-no        := <ULEB128>
+** oaddr          := <ULEB128>
+** naddr          := <ULEB128>
+** osize          := <ULEB128>
+** nsize          := <ULEB128>
+** epilogue       := event-header
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain integer version number
+**
+** event-header: [FTUUSSEE]
+**  * EE   : 2 bits for representing allocation event type (AEVENT_*)
+**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
+**  * UU   : 2 unused bits
+**  * T    : Reserved. 0 for regular events, 1 for the events marked with
+**           the timestamp mark. It is assumed that the time distance between
+**           two marked events is approximately the same and is equal
+**           to 1 second. Always zero for now.
+**  * F    : 0 for regular events, 1 for epilogue's *F*inal header
+**           (if F is set to 1, all other bits are currently ignored)
+*/
+
+struct lua_State;
+
+#define LJM_CURRENT_FORMAT_VERSION 0x02
+
+struct luam_Prof_options;
+
+/*
+** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
+** LUAM_PROFILE_ERR* codes otherwise. The on_stop() callback is called in
+** case of LUAM_PROFILE_ERRIO.
+*/
+int ljp_memprof_start(struct lua_State *L, const struct luam_Prof_options *opt);
+
+/*
+** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
+** LUAM_PROFILE_ERR* codes otherwise. Returns LUAM_PROFILE_ERRIO if the
+** writer() function returns zero at buffer flush, the profiled stream has
+** stopped, or the on_stop() callback returns a non-zero value.
+*/
+int ljp_memprof_stop(void);
+
+/*
+** If the VM owning L is currently being profiled, behaves exactly as
+** ljp_memprof_stop(). Otherwise does nothing and returns LUAM_PROFILE_ERR.
+*/
+int ljp_memprof_stop_vm(const struct lua_State *L);
+
+/* Check that profiler is running. */
+int ljp_memprof_is_running(void);
+
+#endif
-- 
2.28.0

* [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for memory profiler
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (7 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-24 16:32   ` Sergey Ostanevich
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces the Lua API for the LuaJIT memory profiler
implemented in the previous patch.

The start function returns `true` on success. On failure it returns nil
plus an error message as a second result and a system-dependent error
code as a third result.
If LuaJIT is built without the memory profiler, both functions return
`false`.

<lj_errmsg.h> is extended with two new errors, PROF_ISRUNNING and
PROF_NOTRUNNING, returned when the profiler has already been started or
stopped, respectively.

Part of tarantool/tarantool#5442
---
 src/Makefile.dep |   5 +-
 src/lib_misc.c   | 165 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_errmsg.h  |   6 ++
 3 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.dep b/src/Makefile.dep
index c41fdcf..510b5e5 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
 lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
-lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
- lj_str.h lj_tab.h lj_lib.h lj_libdef.h
+lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
+ lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
+ lj_errmsg.h profile/ljp_memprof.h lj_libdef.h
 lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
  lj_libdef.h
diff --git a/src/lib_misc.c b/src/lib_misc.c
index 6f7b9a9..d3b5ab4 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -8,13 +8,21 @@
 #define lib_misc_c
 #define LUA_LIB
 
+#include <stdio.h>
+#include <errno.h>
+
 #include "lua.h"
 #include "lmisclib.h"
+#include "lauxlib.h"
 
 #include "lj_obj.h"
 #include "lj_str.h"
 #include "lj_tab.h"
 #include "lj_lib.h"
+#include "lj_gc.h"
+#include "lj_err.h"
+
+#include "profile/ljp_memprof.h"
 
 /* ------------------------------------------------------------------------ */
 
@@ -67,8 +75,165 @@ LJLIB_CF(misc_getmetrics)
 
 #include "lj_libdef.h"
 
+/* ----- misc.memprof module ---------------------------------------------- */
+
+#define LJLIB_MODULE_misc_memprof
+
+/*
+** Yep, 8 MB. Tuned to avoid bothering the platform with too frequent flushes.
+*/
+#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
+
+/* Structure given as ctx to memprof writer and on_stop callback. */
+struct memprof_ctx {
+  /* Output file stream for data. */
+  FILE *stream;
+  /* Profiled global_State for lj_mem_free at on_stop callback. */
+  global_State *g;
+};
+
+static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
+{
+  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
+  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+}
+
+/* Default buffer writer function. Just call fwrite to corresponding FILE. */
+static size_t buffer_writer_default(const void **buf_addr, size_t len,
+				    void *opt)
+{
+  FILE *stream = ((struct memprof_ctx *)opt)->stream;
+  const void * const buf_start = *buf_addr;
+  const void *data = *buf_addr;
+  size_t write_total = 0;
+
+  lua_assert(len <= STREAM_BUFFER_SIZE);
+
+  for (;;) {
+    const size_t written = fwrite(data, 1, len - write_total, stream);
+
+    if (LJ_UNLIKELY(written == 0)) {
+      /* Retry the write in case of EINTR. */
+      if (errno == EINTR) {
+	errno = 0;
+	continue;
+      }
+      break;
+    }
+
+    write_total += written;
+
+    if (write_total == len)
+      break;
+
+    data = (uint8_t *)data + (ptrdiff_t)written;
+  }
+  lua_assert(write_total <= len);
+
+  *buf_addr = buf_start;
+  return write_total;
+}
+
+/* Default on stop callback. Just close corresponding stream. */
+static int on_stop_cb_default(void *opt, uint8_t *buf)
+{
+  struct memprof_ctx *ctx = opt;
+  FILE *stream = ctx->stream;
+  memprof_ctx_free(ctx, buf);
+  return fclose(stream);
+}
+
+/* local started, err, errno = misc.memprof.start(fname) */
+LJLIB_CF(misc_memprof_start)
+{
+  struct luam_Prof_options opt = {0};
+  struct memprof_ctx *ctx;
+  const char *fname;
+  int memprof_status;
+  int started;
+
+  fname = strdata(lj_lib_checkstr(L, 1));
+
+  ctx = lj_mem_new(L, sizeof(*ctx));
+  if (ctx == NULL)
+    goto errmem;
+
+  opt.ctx = ctx;
+  opt.writer = buffer_writer_default;
+  opt.on_stop = on_stop_cb_default;
+  opt.len = STREAM_BUFFER_SIZE;
+  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
+  if (NULL == opt.buf) {
+    lj_mem_free(G(L), ctx, sizeof(*ctx));
+    goto errmem;
+  }
+
+  ctx->g = G(L);
+  ctx->stream = fopen(fname, "wb");
+
+  if (ctx->stream == NULL) {
+    memprof_ctx_free(ctx, opt.buf);
+    return luaL_fileresult(L, 0, fname);
+  }
+
+  memprof_status = ljp_memprof_start(L, &opt);
+  started = memprof_status == LUAM_PROFILE_SUCCESS;
+
+  if (LJ_UNLIKELY(!started)) {
+    fclose(ctx->stream);
+    remove(fname);
+    memprof_ctx_free(ctx, opt.buf);
+    switch (memprof_status) {
+    case LUAM_PROFILE_ERRRUN:
+      lua_pushnil(L);
+      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_ISRUNNING));
+      return 2;
+    case LUAM_PROFILE_ERRMEM:
+      /* Unreachable for now. */
+      goto errmem;
+    case LUAM_PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, fname);
+    default:
+      break;
+    }
+  }
+  lua_pushboolean(L, started);
+
+  return 1;
+errmem:
+  lua_pushnil(L);
+  setstrV(L, L->top++, lj_err_str(L, LJ_ERR_ERRMEM));
+  return 2;
+}
+
+/* local stopped, err, errno = misc.memprof.stop() */
+LJLIB_CF(misc_memprof_stop)
+{
+  int status = ljp_memprof_stop();
+  int stopped_successfully = status == LUAM_PROFILE_SUCCESS;
+  if (!stopped_successfully) {
+    switch (status) {
+    case LUAM_PROFILE_ERRRUN:
+      lua_pushnil(L);
+      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_NOTRUNNING));
+      return 2;
+    case LUAM_PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, NULL);
+    default:
+      break;
+    }
+  }
+  lua_pushboolean(L, stopped_successfully);
+  return 1;
+}
+
+#include "lj_libdef.h"
+
+/* ------------------------------------------------------------------------ */
+
 LUALIB_API int luaopen_misc(struct lua_State *L)
 {
   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
+  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
   return 1;
 }
diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
index de7b867..6816da2 100644
--- a/src/lj_errmsg.h
+++ b/src/lj_errmsg.h
@@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
 ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
 #endif
 
+#if LJ_HASPROFILE || LJ_HASMEMPROF
+/* Profiler errors. */
+ERRDEF(PROF_ISRUNNING,	"profiler is running already")
+ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
+#endif
+
 #undef ERRDEF
 
 /* Detecting unused error messages:
-- 
2.28.0


* [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (8 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-20 22:46   ` Igor Munkin
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun
  2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds a new directory containing all tooling for LuaJIT.

Part of tarantool/tarantool#5442
Part of tarantool/tarantool#5490
---
 {src => tools}/luajit-gdb.py | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {src => tools}/luajit-gdb.py (100%)

diff --git a/src/luajit-gdb.py b/tools/luajit-gdb.py
similarity index 100%
rename from src/luajit-gdb.py
rename to tools/luajit-gdb.py
-- 
2.28.0


* [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (9 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun
@ 2020-12-16 19:13 ` Sergey Kaplun
  2020-12-24 23:09   ` Igor Munkin
  2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin
  11 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-16 19:13 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds a parser for the binary data dumped by the profiler.
It provides a script that parses the given binary file: it first parses
the symtab using ffi, then maps memory events onto this symtab, and
finally renders the data in a human-readable format.

Part of tarantool/tarantool#5442
Part of tarantool/tarantool#5490
---
 test/misclib-memprof-lapi.test.lua    | 125 +++++++++++++++++
 tools/luajit-parse-memprof            |  20 +++
 tools/parse_memprof/bufread.lua       | 143 +++++++++++++++++++
 tools/parse_memprof/main.lua          | 104 ++++++++++++++
 tools/parse_memprof/parse_memprof.lua | 195 ++++++++++++++++++++++++++
 tools/parse_memprof/parse_symtab.lua  |  88 ++++++++++++
 tools/parse_memprof/view_plain.lua    |  45 ++++++
 7 files changed, 720 insertions(+)
 create mode 100755 test/misclib-memprof-lapi.test.lua
 create mode 100755 tools/luajit-parse-memprof
 create mode 100644 tools/parse_memprof/bufread.lua
 create mode 100644 tools/parse_memprof/main.lua
 create mode 100644 tools/parse_memprof/parse_memprof.lua
 create mode 100644 tools/parse_memprof/parse_symtab.lua
 create mode 100644 tools/parse_memprof/view_plain.lua

diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
new file mode 100755
index 0000000..e3663bb
--- /dev/null
+++ b/test/misclib-memprof-lapi.test.lua
@@ -0,0 +1,125 @@
+#!/usr/bin/env tarantool
+
+jit.off()
+jit.flush()
+
+local path = arg[0]:gsub('/[^/]+%.test%.lua', '')
+local parse_suffix = '../tools/parse_memprof/?.lua;'
+package.path = ('%s/%s;'):format(path, parse_suffix)..package.path
+
+local table_new = require "table.new"
+
+local bufread = require "bufread"
+local memprof = require "parse_memprof"
+local symtab  = require "parse_symtab"
+
+local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
+local BAD_PATH    = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
+
+local function payload()
+  -- Preallocate table to avoid array part reallocations.
+  local _ = table_new(100, 0)
+
+  -- Want to see 100 objects here.
+  for i = 1, 100 do
+    -- Shift to avoid crossing with "test" module objects.
+    _[i] = tostring(i + 100)
+  end
+
+  _ = nil
+  -- VMSTATE == GC, reported as INTERNAL.
+  collectgarbage()
+end
+
+local tap = require('tap')
+
+local test = tap.test("misc-memprof-lapi")
+test:plan(6)
+
+local function generate_output(filename)
+  -- Clean up all garbage to avoid pollution of free.
+  collectgarbage()
+
+  local res, err = misc.memprof.start(filename)
+  -- Should start successfully.
+  assert(res, err)
+
+  payload()
+
+  res, err = misc.memprof.stop()
+  -- Should stop successfully.
+  assert(res, err)
+end
+
+local function fill_ev_type(events, symbols, event_type)
+  local ev_type = {}
+  for _, event in pairs(events[event_type]) do
+    local addr = event.loc.addr
+    if addr == 0 then
+      ev_type.INTERNAL = {
+        name = "INTERNAL",
+        num = event.num,
+      }
+    elseif symbols[addr] then
+      ev_type[event.loc.line] = {
+        name = symbols[addr].name,
+        num = event.num,
+      }
+    end
+  end
+  return ev_type
+end
+
+local function check_alloc_report(alloc, line, function_line, nevents)
+  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
+  print(nevents, alloc[line].num)
+  assert(alloc[line].num == nevents)
+  return true
+end
+
+-- Not a directory.
+local res, err = misc.memprof.start(BAD_PATH)
+test:ok(res == nil and err:match("Not a directory"))
+
+-- Profiler is running.
+res, err = misc.memprof.start(TMP_BINFILE)
+assert(res, err)
+res, err = misc.memprof.start(TMP_BINFILE)
+test:ok(res == nil and err:match("profiler is running already"))
+
+res, err = misc.memprof.stop()
+assert(res, err)
+
+-- Profiler is not running.
+res, err = misc.memprof.stop()
+test:ok(res == nil and err:match("profiler is not running"))
+
+-- Test profiler output and parse.
+res, err = pcall(generate_output, TMP_BINFILE)
+
+-- Want to cleanup carefully if something went wrong.
+if not res then
+  os.remove(TMP_BINFILE)
+  error(err)
+end
+
+local reader  = bufread.new(TMP_BINFILE)
+local symbols = symtab.parse(reader)
+local events  = memprof.parse(reader, symbols)
+
+-- We don't need it any more.
+os.remove(TMP_BINFILE)
+
+local alloc = fill_ev_type(events, symbols, "alloc")
+local free = fill_ev_type(events, symbols, "free")
+
+-- 1 event for the allocation of the table itself + 1 allocation
+-- of the array part, as it is bigger than LJ_MAX_COLOSIZE (16).
+test:ok(check_alloc_report(alloc, 21, 19, 2))
+-- 100 strings allocations.
+test:ok(check_alloc_report(alloc, 26, 19, 100))
+
+-- Collect all previously allocated objects.
+test:ok(free.INTERNAL.num == 102)
+
+os.exit(test:check() and 0 or 1)
diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
new file mode 100755
index 0000000..b9b16d7
--- /dev/null
+++ b/tools/luajit-parse-memprof
@@ -0,0 +1,20 @@
+#!/bin/bash
+#
+# Launcher for memprof parser.
+# Copyright (C) 2015-2019 IPONWEB Ltd. See Copyright Notice in COPYRIGHT
+
+
+LAUNCHER_DIR=$(dirname "$(readlink -f "$0")")
+# Assume that we are launched from the source tree for now.
+LUAJIT_BIN=$LAUNCHER_DIR/../src/luajit
+LUAJIT_TOOLS_PREFIX=$LAUNCHER_DIR
+
+if [[ ! -x "$LUAJIT_BIN" ]]; then
+    echo "FATAL: Unable to find LuaJIT at $LUAJIT_BIN. Is it built?"
+    exit 1
+fi
+
+TOOL_DIR=$LUAJIT_TOOLS_PREFIX/parse_memprof
+
+LUA_PATH="$TOOL_DIR/?.lua;;" "$LUAJIT_BIN" "$TOOL_DIR/main.lua" "$@"
+
diff --git a/tools/parse_memprof/bufread.lua b/tools/parse_memprof/bufread.lua
new file mode 100644
index 0000000..d48d6e8
--- /dev/null
+++ b/tools/parse_memprof/bufread.lua
@@ -0,0 +1,143 @@
+-- An implementation of buffered reading data from
+-- an arbitrary binary file.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local assert = assert
+
+local ffi = require 'ffi'
+local bit = require 'bit'
+
+local ffi_C  = ffi.C
+local band   = bit.band
+
+local BUFFER_SIZE = 10 * 1024 * 1024 -- 10 megabytes
+
+local M = {}
+
+ffi.cdef[[
+  void *memcpy(void *, const void *, size_t);
+
+  typedef struct FILE_ FILE;
+  FILE *fopen(const char *, const char *);
+  size_t fread(void *, size_t, size_t, FILE *);
+  int feof(FILE *);
+  int fclose(FILE *);
+]]
+
+local function _read_stream(reader, n)
+  local free_size
+  local tail_size = reader._end - reader._pos
+
+  if tail_size >= n then
+    -- Enough data to satisfy the request of n bytes...
+    return true
+  end
+
+  -- ...otherwise carry tail_size bytes from the end of the buffer
+  -- to the start and fill up free_size bytes with fresh data.
+  -- tail_size < n <= free_size (see assert below) ensures that
+  -- we don't copy overlapping memory regions.
+  -- reader._pos == 0 means filling buffer for the first time.
+
+  free_size = reader._pos > 0 and reader._pos or n
+
+  assert(n <= free_size, 'Internal buffer is large enough')
+
+  if tail_size ~= 0 then
+    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
+  end
+
+  local bytes_read = ffi_C.fread(
+    reader._buf + tail_size, 1, free_size, reader._file
+  )
+
+  reader._pos = 0
+  reader._end = tail_size + bytes_read
+
+  return reader._end - reader._pos >= n
+end
+
+function M.read_octet(reader)
+  if not _read_stream(reader, 1) then
+    return nil
+  end
+
+  local oct = reader._buf[reader._pos]
+  reader._pos = reader._pos + 1
+  return oct
+end
+
+function M.read_octets(reader, n)
+  if not _read_stream(reader, n) then
+    return nil
+  end
+
+  local octets = ffi.string(reader._buf + reader._pos, n)
+  reader._pos = reader._pos + n
+  return octets
+end
+
+function M.read_uleb128(reader)
+  local value = ffi.new('uint64_t', 0)
+  local shift = 0
+
+  repeat
+    local oct = M.read_octet(reader)
+
+    if oct == nil then
+      break
+    end
+
+    -- Alas, bit library works only with 32-bit arguments:
+    local oct_u64 = ffi.new('uint64_t', band(oct, 0x7f))
+    value = value + oct_u64 * (2 ^ shift)
+    shift = shift + 7
+
+  until band(oct, 0x80) == 0
+
+  return tonumber(value)
+end
+
+function M.read_string(reader)
+  local len = M.read_uleb128(reader)
+  return M.read_octets(reader, len)
+end
+
+function M.eof(reader)
+  local sys_feof = ffi_C.feof(reader._file)
+  if sys_feof == 0 then
+    return false
+  end
+  -- ...otherwise return true only if we have reached
+  -- the end of the buffer:
+  return reader._pos == reader._end
+end
+
+function M.new(fname)
+  local file = ffi_C.fopen(fname, 'rb')
+  if file == nil then
+    error(string.format('fopen, errno: %d', ffi.errno()))
+  end
+
+  local finalizer = function(f)
+    if ffi_C.fclose(f) ~= 0 then
+      error(string.format('fclose, errno: %d', ffi.errno()))
+    end
+    ffi.gc(f, nil)
+  end
+
+  local reader = setmetatable({
+    _file = ffi.gc(file, finalizer),
+    _buf  = ffi.new('uint8_t[?]', BUFFER_SIZE),
+    _pos  = 0,
+    _end  = 0,
+  }, {__index = M})
+
+  _read_stream(reader, BUFFER_SIZE)
+
+  return reader
+end
+
+return M
diff --git a/tools/parse_memprof/main.lua b/tools/parse_memprof/main.lua
new file mode 100644
index 0000000..9a161b1
--- /dev/null
+++ b/tools/parse_memprof/main.lua
@@ -0,0 +1,104 @@
+-- A tool for parsing and visualisation of LuaJIT's memory
+-- profiler output.
+--
+-- TODO:
+-- * Think about callgraph memory profiling for complex
+--   table reallocations
+-- * Nicer output, probably an HTML view
+-- * Demangling of C symbols
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bufread = require "bufread"
+local memprof = require "parse_memprof"
+local symtab  = require "parse_symtab"
+local view    = require "view_plain"
+
+local stdout, stderr = io.stdout, io.stderr
+local _s = string
+local match, gmatch = _s.match, _s.gmatch
+
+-- Program options.
+local opt_map = {}
+
+function opt_map.help()
+  stdout:write [[
+luajit-parse-memprof - parser of the memory usage profile collected
+                       with LuaJIT's memprof.
+
+SYNOPSIS
+
+luajit-parse-memprof [options] memprof.bin
+
+Supported options are:
+
+  --help                            Show this help and exit
+]]
+  os.exit(0)
+end
+
+-- Print error and exit with error status.
+local function opterror(...)
+  stderr:write("luajit-parse-memprof.lua: ERROR: ", ...)
+  stderr:write("\n")
+  os.exit(1)
+end
+
+-- Parse single option.
+local function parseopt(opt, args)
+  local opt_current = #opt == 1 and "-"..opt or "--"..opt
+  local f = opt_map[opt]
+  if not f then
+    opterror("unrecognized option `", opt_current, "'. Try `--help'.\n")
+  end
+  f(args)
+end
+
+-- Parse arguments.
+local function parseargs(args)
+  -- Process all option arguments.
+  args.argn = 1
+  repeat
+    local a = args[args.argn]
+    if not a then break end
+    local lopt, opt = match(a, "^%-(%-?)(.+)")
+    if not opt then break end
+    args.argn = args.argn + 1
+    if lopt == "" then
+      -- Loop through short options.
+      for o in gmatch(opt, ".") do parseopt(o, args) end
+    else
+      -- Long option.
+      parseopt(opt, args)
+    end
+  until false
+
+  -- Check for proper number of arguments.
+  local nargs = #args - args.argn + 1
+  if nargs ~= 1 then
+    opt_map.help()
+  end
+
+  -- Translate a single input file.
+  -- TODO: Handle multiple files?
+  return args[args.argn]
+end
+
+local inputfile = parseargs{...}
+
+local reader  = bufread.new(inputfile)
+local symbols = symtab.parse(reader)
+local events  = memprof.parse(reader, symbols)
+
+stdout:write("ALLOCATIONS", "\n")
+view.render(events.alloc, symbols)
+stdout:write("\n")
+
+stdout:write("REALLOCATIONS", "\n")
+view.render(events.realloc, symbols)
+stdout:write("\n")
+
+stdout:write("DEALLOCATIONS", "\n")
+view.render(events.free, symbols)
+stdout:write("\n")
diff --git a/tools/parse_memprof/parse_memprof.lua b/tools/parse_memprof/parse_memprof.lua
new file mode 100644
index 0000000..dc56fed
--- /dev/null
+++ b/tools/parse_memprof/parse_memprof.lua
@@ -0,0 +1,195 @@
+-- Parser of LuaJIT's memprof binary stream.
+-- The format spec can be found in src/profile/ljp_memprof.h.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bit    = require 'bit'
+local band   = bit.band
+local lshift = bit.lshift
+
+local string_format = string.format
+
+local LJM_MAGIC           = 'ljm'
+local LJM_CURRENT_VERSION = 2
+
+local LJM_EPILOGUE_HEADER = 0x80
+
+local AEVENT_ALLOC   = 1
+local AEVENT_FREE    = 2
+local AEVENT_REALLOC = 3
+
+local ASOURCE_INT   = lshift(1, 2)
+local ASOURCE_LFUNC = lshift(2, 2)
+local ASOURCE_CFUNC = lshift(3, 2)
+
+local M = {}
+
+local function new_event(loc)
+  return {
+    loc     = loc,
+    num     = 0,
+    free    = 0,
+    alloc   = 0,
+    primary = {},
+  }
+end
+
+local function link_to_previous(heap, e, oaddr)
+  -- Link only if memory at oaddr was allocated after we started tracking:
+  if heap[oaddr] then
+    e.primary[heap[oaddr][2]] = heap[oaddr][3]
+  end
+end
+
+local function parse_location(reader, asource)
+  if asource == ASOURCE_INT then
+    return 'f0l0', {
+      addr = 0, -- INTERNAL
+      line = 0,
+    }
+  elseif asource == ASOURCE_CFUNC then
+    local addr = reader:read_uleb128()
+    return string_format('f%#xl%d', addr, 0), {
+      addr = addr,
+      line = 0,
+    }
+  elseif asource == ASOURCE_LFUNC then
+    local addr = reader:read_uleb128()
+    local line = reader:read_uleb128()
+    return string_format('f%#xl%d', addr, line), {
+      addr = addr,
+      line = line,
+    }
+  end
+  error('Unknown asource '..asource)
+end
+
+local function parse_alloc(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local naddr = reader:read_uleb128()
+  local nsize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num   = e.num + 1
+  e.alloc = e.alloc + nsize
+
+  heap[naddr] = {nsize, id, loc}
+end
+
+local function parse_realloc(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local oaddr = reader:read_uleb128()
+  local osize = reader:read_uleb128()
+  local naddr = reader:read_uleb128()
+  local nsize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num   = e.num + 1
+  e.free  = e.free + osize
+  e.alloc = e.alloc + nsize
+
+  link_to_previous(heap, e, oaddr)
+
+  heap[oaddr] = nil
+  heap[naddr] = {nsize, id, loc}
+end
+
+local function parse_free(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local oaddr = reader:read_uleb128()
+  local osize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num   = e.num + 1
+  e.free  = e.free + osize
+
+  link_to_previous(heap, e, oaddr)
+
+  heap[oaddr] = nil
+end
+
+local parsers = {
+  [AEVENT_ALLOC]   = {evname =   'alloc', parse = parse_alloc},
+  [AEVENT_REALLOC] = {evname = 'realloc', parse = parse_realloc},
+  [AEVENT_FREE]    = {evname =    'free', parse = parse_free},
+}
+
+local function ev_header_is_valid(evh)
+  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+end
+
+local function ev_header_is_epilogue(evh)
+  return evh == LJM_EPILOGUE_HEADER
+end
+
+-- Splits event header into event type (aka aevent = allocation
+-- event) and event source (aka asource = allocation source).
+local function ev_header_split(evh)
+  return band(evh, 0x3), band(evh, lshift(0x3, 2))
+end
+
+local function parse_event(reader, events)
+  local ev_header = reader:read_octet()
+
+  assert(ev_header_is_valid(ev_header), 'Bad ev_header '..ev_header)
+
+  if ev_header_is_epilogue(ev_header) then
+    return false
+  end
+
+  local aevent, asource = ev_header_split(ev_header)
+  local parser = parsers[aevent]
+
+  assert(parser, 'Bad aevent '..aevent)
+
+  parser.parse(reader, asource, events[parser.evname], events.heap)
+
+  return true
+end
+
+function M.parse(reader)
+  local events = {
+    alloc   = {},
+    realloc = {},
+    free    = {},
+    heap    = {},
+  }
+
+  local magic   = reader:read_octets(3)
+  local version = reader:read_octets(1)
+  -- dummy-consume reserved bytes
+  local _       = reader:read_octets(3)
+
+  if magic ~= LJM_MAGIC then
+    error('Bad LJM format prologue: '..magic)
+  end
+
+  if string.byte(version) ~= LJM_CURRENT_VERSION then
+    error(string_format(
+      'LJM format version mismatch: the tool expects %d, but your data is %d',
+      LJM_CURRENT_VERSION,
+      string.byte(version)
+    ))
+  end
+
+  while parse_event(reader, events) do
+    -- empty body
+  end
+
+  return events
+end
+
+return M
diff --git a/tools/parse_memprof/parse_symtab.lua b/tools/parse_memprof/parse_symtab.lua
new file mode 100644
index 0000000..54e9337
--- /dev/null
+++ b/tools/parse_memprof/parse_symtab.lua
@@ -0,0 +1,88 @@
+-- Parser of LuaJIT's symtab binary stream.
+-- The format spec can be found in src/profile/ljp_symtab.h.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bit = require 'bit'
+
+local band          = bit.band
+local string_format = string.format
+
+local LJS_MAGIC           = 'ljs'
+local LJS_CURRENT_VERSION = 2
+local LJS_EPILOGUE_HEADER = 0x80
+local LJS_SYMTYPE_MASK    = 0x03
+
+local SYMTAB_LFUNC = 0
+
+local M = {}
+
+-- Parse a single entry in a symtab: lfunc symbol
+local function parse_sym_lfunc(reader, symtab)
+  local sym_addr  = reader:read_uleb128()
+  local sym_chunk = reader:read_string()
+  local sym_line  = reader:read_uleb128()
+
+  symtab[sym_addr] = {
+    name = string_format('%s:%d', sym_chunk, sym_line),
+  }
+end
+
+local parsers = {
+  [SYMTAB_LFUNC] = parse_sym_lfunc,
+}
+
+function M.parse(reader)
+  local symtab   = {}
+  local magic    = reader:read_octets(3)
+  local version  = reader:read_octets(1)
+
+  local _ = reader:read_octets(3) -- dummy-consume reserved bytes
+
+  if magic ~= LJS_MAGIC then
+    error("Bad LJS format prologue: "..magic)
+  end
+
+  if string.byte(version) ~= LJS_CURRENT_VERSION then
+    error(string_format(
+         "LJS format version mismatch: "..
+         "the tool expects %d, but your data is %d",
+         LJS_CURRENT_VERSION,
+         string.byte(version)
+    ))
+
+  end
+
+  while not reader:eof() do
+    local header   = reader:read_octet()
+    local is_final = band(header, LJS_EPILOGUE_HEADER) ~= 0
+
+    if is_final then
+      break
+    end
+
+    local sym_type = band(header, LJS_SYMTYPE_MASK)
+    if parsers[sym_type] then
+      parsers[sym_type](reader, symtab)
+    end
+  end
+
+  return symtab
+end
+
+function M.demangle(symtab, loc)
+  local addr = loc.addr
+
+  if addr == 0 then
+    return 'INTERNAL'
+  end
+
+  if symtab[addr] then
+    return string_format('%s, line %d', symtab[addr].name, loc.line)
+  end
+
+  return string_format('CFUNC %#x', addr)
+end
+
+return M
diff --git a/tools/parse_memprof/view_plain.lua b/tools/parse_memprof/view_plain.lua
new file mode 100644
index 0000000..089bc73
--- /dev/null
+++ b/tools/parse_memprof/view_plain.lua
@@ -0,0 +1,45 @@
+-- Simple human-readable renderer of LuaJIT's memprof profile.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local symtab = require 'parse_symtab'
+
+local M = {}
+
+function M.render(events, symbols)
+  local ids = {}
+
+  for id, _ in pairs(events) do
+    table.insert(ids, id)
+  end
+
+  table.sort(ids, function(id1, id2)
+    return events[id1].num > events[id2].num
+  end)
+
+  for i = 1, #ids do
+    local event = events[ids[i]]
+    print(string.format('%s: %d\t%d\t%d',
+      symtab.demangle(symbols, event.loc),
+      event.num,
+      event.alloc,
+      event.free
+    ))
+
+    local prim_loc = {}
+    for _, loc in pairs(event.primary) do
+      table.insert(prim_loc, symtab.demangle(symbols, loc))
+    end
+    if #prim_loc ~= 0 then
+      table.sort(prim_loc)
+      print('\tOverrides:')
+      for j = 1, #prim_loc do
+        print(string.format('\t\t%s', prim_loc[j]))
+      end
+      print('')
+    end
+  end
+end
+
+return M
-- 
2.28.0


* Re: [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
@ 2020-12-20 21:27   ` Igor Munkin
  2020-12-23 18:20     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-20 21:27 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch, but I guess we need to restructure the sources in
a separate series, if we want to do it at all (which I personally
strongly doubt).

I don't want to mix the new layout you want to introduce with the
feature. E.g. we already introduced <lib_misc.c>, even though I also
prefer the uJIT layout (i.e. lib/misc.c). LuaJIT already provides a
profiler that is implemented in scope of lj_profile.[hc]. Let's leave
everything "flat" in the <src> directory to keep the sources consistent.
Please, drop this patch from the series.

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
@ 2020-12-20 22:44   ` Igor Munkin
  2020-12-23 22:34     ` Sergey Kaplun
  2020-12-23 16:50   ` Sergey Ostanevich
  1 sibling, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-20 22:44 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please, consider the comments below.

On 16.12.20, Sergey Kaplun wrote:
> This patch introduces module for reading and writing leb128 compression.
> It will be used for streaming profiling events writing, that will be
> added at the next patches.
> 
> Part of tarantool/tarantool#5442
> ---
>  src/Makefile       |   5 +-
>  src/Makefile.dep   |   1 +
>  src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
>  src/utils/leb128.h |  55 ++++++++++++++++++++
>  4 files changed, 183 insertions(+), 2 deletions(-)
>  create mode 100644 src/utils/leb128.c
>  create mode 100644 src/utils/leb128.h
> 
> diff --git a/src/Makefile b/src/Makefile
> index caa49f9..be7ed95 100644
> --- a/src/Makefile
> +++ b/src/Makefile

Please, adjust these changes considering the comments to the first
patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
the name.

> @@ -468,6 +468,7 @@ endif

<snipped>

> diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> new file mode 100644
> index 0000000..921e5bc
> --- /dev/null
> +++ b/src/utils/leb128.c
> @@ -0,0 +1,124 @@
> +/*
> +** Working with LEB128/ULEB128 encoding.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#include <stdint.h>
> +#include <stddef.h>

Why do you include this again instead of using leb128.h?

> +
> +#define LINK_BIT          (0x80)
> +#define MIN_TWOBYTE_VALUE (0x80)
> +#define PAYLOAD_MASK      (0x7f)
> +#define SHIFT_STEP        (7)
> +#define LEB_SIGN_BIT      (0x40)
> +
> +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> +
> +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> +{
> +  size_t i = 0;
> +
> +  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> +  }

The braces are excess.

> +  buffer[i++] = (uint8_t)value;
> +
> +  return i;
> +}
> +
> +size_t write_leb128(uint8_t *buffer, int64_t value)
> +{
> +  size_t i = 0;
> +
> +  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {

What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
here. Otherwise create a new one with the comment. Please, do not use
magic numbers.
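
For illustration, the loop condition with a named constant might read as
in the sketch below. It assumes that 0x40 is indeed <LEB_SIGN_BIT>. The
`_sketch` suffix, the assert and the unsigned addition (used here only to
sidestep signed overflow near INT64_MAX) are my assumptions, not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_BIT          (0x80)
#define MIN_TWOBYTE_VALUE (0x80)
#define PAYLOAD_MASK      (0x7f)
#define SHIFT_STEP        (7)
#define LEB_SIGN_BIT      (0x40)

/*
** Hypothetical variant of write_leb128. Values in [-0x40, 0x3f] fit into
** a single octet: adding LEB_SIGN_BIT maps them onto [0, 0x7f], so the
** loop stops as soon as the remaining value is in that range.
** Like the patch, it relies on arithmetic right shift of negative values.
*/
size_t write_leb128_sketch(uint8_t *buffer, int64_t value)
{
  size_t i = 0;

  assert(buffer != NULL);

  for (; (uint64_t)value + LEB_SIGN_BIT >= MIN_TWOBYTE_VALUE;
       value >>= SHIFT_STEP)
    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);

  return i;
}
```

With the constant in place the intent of the comparison (one octet is
enough once the value fits into 7 bits including the sign) can be stated
right in the comment.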

> +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> +  }

The braces are excess.

> +  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> +
> +  return i;
> +}
> +
> +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> +
> +/*
> +** NB! For each LEB128 type (signed/unsigned) we have two versions of read

Minor: It's better to use XXX for these cases in comments. We already
discussed this with Vlad here[1] (search for "What is 'XXX'?").

> +** functions: The one consuming unlimited number of input octets and the one
> +** consuming not more than given number of input octets. Currently reading
> +** is not used in performance critical places, so these two functions are
> +** implemented via single low-level function + run-time mode check. Feel free
> +** to change if this becomes a bottleneck.

Well, you can also add LJ_AINLINE for a low-level function, or simply
add a similar hint by yourself (I personally prefer the first one).
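
A rough sketch of the first option, shown for the signed reader. The
SKETCH_AINLINE macro is only a stand-in for LJ_AINLINE so that the
snippet builds outside the LuaJIT tree, and the `_impl`/`_sketch` names
are hypothetical. Since <guarded> is a compile-time constant in both
wrappers, a forced inline gives the compiler a chance to fold the mode
check away. The sign extension is done in unsigned arithmetic here,
which is my own choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_BIT     (0x80)
#define PAYLOAD_MASK (0x7f)
#define SHIFT_STEP   (7)
#define LEB_SIGN_BIT (0x40)

/* Stand-in for LJ_AINLINE (an assumption made for this sketch). */
#if defined(__GNUC__)
#define SKETCH_AINLINE inline __attribute__((always_inline))
#else
#define SKETCH_AINLINE inline
#endif

static SKETCH_AINLINE size_t read_leb128_impl(int64_t *out,
                                              const uint8_t *buffer,
                                              int guarded, size_t n)
{
  size_t i = 0;
  uint64_t value = 0;
  uint64_t shift = 0;
  uint8_t octet;

  assert(out != NULL && buffer != NULL);

  for (;;) {
    if (guarded && i + 1 > n)
      return 0;
    octet = buffer[i++];
    /* Ignore payload bits that do not fit into 64 bits. */
    if (shift < 64)
      value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
    shift += SHIFT_STEP;
    if (!(octet & LINK_BIT))
      break;
  }

  /* Sign-extend using unsigned arithmetic only. */
  if ((octet & LEB_SIGN_BIT) && shift < 64)
    value |= ~(uint64_t)0 << shift;

  *out = (int64_t)value;
  return i;
}

/* Unlimited read: the guard is a constant 0. */
size_t read_leb128_sketch(int64_t *out, const uint8_t *buffer)
{
  return read_leb128_impl(out, buffer, 0, 0);
}

/* Read at most n octets: the guard is a constant 1. */
size_t read_leb128_n_sketch(int64_t *out, const uint8_t *buffer, size_t n)
{
  return read_leb128_impl(out, buffer, 1, n);
}
```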

> +*/
> +
> +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> +			    size_t n)

AFAICS, the <n> argument is used only when <guarded> is set to 1.
Moreover, <n> can't be 0 when <guarded> is set, otherwise the function
does nothing (it always returns 0). So it seems you can drop the
<guarded> parameter in favour of the following contract for <n>:
* n == 0 stands for the former guarded == 0 (unlimited read)
* n > 0  stands for the former guarded == 1 with the limit <n>

This also relates to <_read_leb128>.
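
To make the proposal concrete, here is a minimal sketch of the unsigned
reader under this contract. The function name is hypothetical, and the
check against shifting by 64 or more bits is an extra I added for the
sketch, it is not in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_BIT     (0x80)
#define PAYLOAD_MASK (0x7f)
#define SHIFT_STEP   (7)

/*
** Hypothetical reader without the <guarded> flag:
** n == 0 -- consume as many octets as the encoding requires;
** n > 0  -- consume at most n octets, return 0 if that is not enough.
** Returns the number of octets consumed.
*/
size_t read_uleb128_sketch(uint64_t *out, const uint8_t *buffer, size_t n)
{
  size_t i = 0;
  uint64_t value = 0;
  uint64_t shift = 0;
  uint8_t octet;

  assert(out != NULL && buffer != NULL);

  for (;;) {
    if (n != 0 && i + 1 > n)
      return 0;
    octet = buffer[i++];
    /* Ignore payload bits that do not fit into 64 bits. */
    if (shift < 64)
      value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
    shift += SHIFT_STEP;
    if (!(octet & LINK_BIT))
      break;
  }

  *out = value;
  return i;
}
```

The public unlimited and limited variants then become plain calls with
n == 0 and n > 0 respectively.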

> +{
> +  size_t i = 0;
> +  uint64_t value = 0;
> +  uint64_t shift = 0;
> +  uint8_t octet;
> +
> +  for(;;) {
> +    if (guarded && i + 1 > n) {
> +      return 0;
> +    }

The braces are excess.

> +    octet = buffer[i++];
> +    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> +    shift += SHIFT_STEP;
> +    if (!(octet & LINK_BIT)) {
> +      break;
> +    }

The braces are excess.

> +  }
> +
> +  *out = value;
> +  return i;
> +}
> +

<snipped>

> +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> +			   size_t n)
> +{
> +  size_t i = 0;
> +  int64_t  value = 0;
> +  uint64_t shift = 0;
> +  uint8_t  octet;

A mess with whitespace above.

> +
> +  for(;;) {
> +    if (guarded && i + 1 > n) {
> +      return 0;
> +    }

The braces are excess.

> +    octet  = buffer[i++];
> +    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> +    shift += SHIFT_STEP;
> +    if (!(octet & LINK_BIT)) {
> +      break;
> +    }

The braces are excess.

> +  }
> +
> +  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> +    value |= -(1 << shift);
> +  }

The braces are excess.

> +
> +  *out = value;
> +  return i;
> +}
> +

<snipped>

> diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> new file mode 100644
> index 0000000..46d90bc
> --- /dev/null
> +++ b/src/utils/leb128.h
> @@ -0,0 +1,55 @@
> +/*
> +** Interfaces for working with LEB128/ULEB128 encoding.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#ifndef _LJ_UTILS_LEB128_H
> +#define _LJ_UTILS_LEB128_H
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> +#define LEB128_U64_MAXSIZE 10

The naming looks odd to me. Considering my comment for the first patch,
I propose to use something matching "lj_u?leb128_(read|write)(_n)?".

By the way, the order of the interfaces is also odd.

> +
> +/*
> +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> +
> +/*
> +** Writes a value from an signed 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_leb128(uint8_t *buffer, int64_t value);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.

If "buffer overflow" stands for "reading out of bounds", please reword
this. Otherwise, I don't get it.

> +*/
> +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.

Ditto.

> +*/
> +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.

Ditto.

> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.

Ditto.

> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> +
> +#endif
> -- 
> 2.28.0
> 

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun
@ 2020-12-20 22:46   ` Igor Munkin
  2020-12-24  6:50     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-20 22:46 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! I doubt this change needs to be made as a separate
patch. Anyway, I left a couple of nits below.

Furthermore, it's better to use the 'core' prefix here; otherwise the
prefix is pointless, since every translation unit would get its own.

On 16.12.20, Sergey Kaplun wrote:
> This patch renames debug_frameline to lj_debug_frameline and moves it to
> public <lj_debug.h> module API (does not provide it with LUA_API). It
> will be used for memory profiler in the next patches.

Minor: I propose a simpler wording:
| core: make debug_frameline visible for other sources
|
| This change makes debug_frameline function LuaJIT-wide visible to be
| used in other subsystems (e.g. memory profiler).

> 
> Part of tarantool/tarantool#5442
> ---
>  src/lj_debug.c | 8 ++++----
>  src/lj_debug.h | 1 +
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 

<snipped>

> diff --git a/src/lj_debug.h b/src/lj_debug.h
> index 5917c00..1b5ef29 100644
> --- a/src/lj_debug.h
> +++ b/src/lj_debug.h
> @@ -40,6 +40,7 @@ LJ_FUNC void lj_debug_addloc(lua_State *L, const char *msg,
>  LJ_FUNC void lj_debug_pushloc(lua_State *L, GCproto *pt, BCPos pc);
>  LJ_FUNC int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar,
>  			     int ext);
> +LJ_FUNC BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe);

Minor: As you can see below, <lj_debug_dumpstack> is provided only if
the profiler is enabled. Since <lj_debug_frameline> is necessary only
for memprof, there is no need to make it LuaJIT-wide visible by default;
guarding it the same way looks more natural to me. This is not required
but seems like a nice approach, so feel free to ignore.

>  #if LJ_HASPROFILE
>  LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
>  				int depth);
> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun
@ 2020-12-20 22:46   ` Igor Munkin
  2020-12-24  6:47     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-20 22:46 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch, but I see no rationale for it. Now luajit-gdb.py
is located alongside the LuaJIT binary, so when one runs gdb with it,
the debugger tries to autoload the extension. After your changes it
doesn't. Please, drop this patch out of the series.

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun
@ 2020-12-21  9:24   ` Igor Munkin
  2020-12-24  6:46     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-21  9:24 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider the comments below. I expected
that we agreed we need an MVP for now, so the module API is fine except
the <write_buffer_sys> part: it is overcomplicated for the current uses.

On 16.12.20, Sergey Kaplun wrote:
> This patch introduces module for writing profile data.
> Its usage will be added at the next patches.
> 
> It can be used for memory profiler or for signal-based
> cpu profiler.

I see nothing strongly related to the profiler, so it's simply an
internal write buffer. Then let's settle the naming first. I guess
<lj_wbuf*> is a good prefix for this subsystem.

So, I propose the following wording for the commit message:
| core: introduce write buffer module
|
| This patch introduces the standalone module for writing data to the
| file via the special buffer. The module provides the API for buffer
| initial setup and its convenient usage.

Feel free to adjust the description on your own and even describe the
introduced API in a more verbose way.

> 
> Part of tarantool/tarantool#5442
> ---
> 
> Custom memcpy function (see below) makes sense if this module will be
> used for cpu/sample profiler based on a signal-based timer. Else it can
> be easily redefined.

I suggest not overcomplicating all this machinery for the sake of a
single profiler. Let's drop all memcpy-related hacks. We can simply
bring them back when they become necessary.

> 
>  src/Makefile            |   5 +-
>  src/Makefile.dep        |   2 +
>  src/profile/ljp_write.c | 195 ++++++++++++++++++++++++++++++++++++++++
>  src/profile/ljp_write.h |  84 +++++++++++++++++
>  4 files changed, 284 insertions(+), 2 deletions(-)
>  create mode 100644 src/profile/ljp_write.c
>  create mode 100644 src/profile/ljp_write.h
> 
> diff --git a/src/Makefile b/src/Makefile
> index be7ed95..4b1d937 100644
> --- a/src/Makefile
> +++ b/src/Makefile

Please, adjust these changes considering the comments to the first
patch. You can find the proposed naming above.

> @@ -469,6 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)

<snipped>

> diff --git a/src/profile/ljp_write.c b/src/profile/ljp_write.c
> new file mode 100644
> index 0000000..de7202d
> --- /dev/null
> +++ b/src/profile/ljp_write.c
> @@ -0,0 +1,195 @@
> +/*
> +** Low-level writer for LuaJIT Profiler.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#include <unistd.h>
> +#include <errno.h>
> +
> +#include "profile/ljp_write.h"
> +#include "utils/leb128.h"
> +#include "lj_def.h"

<snipped>

> +/* Wraps a write syscall ensuring all data have been written. */

I see no syscall wrapped here. Anyway, IIRC we discussed that we don't
need such complex interfaces in scope of MVP. So you can just use
<write> or <fwrite> right here for now and redesign the wbuf API later
if necessary (it's internal, so I see no problem).
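
A minimal sketch of what using <write> directly could look like,
assuming a POSIX environment and a plain file descriptor as the sink.
The helper name write_all_sketch is hypothetical; it only shows the
usual loop over partial writes and EINTR:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
** Hypothetical helper. Writes exactly len bytes to fd.
** Returns 0 on success, -1 on error (errno is set by write).
*/
static int write_all_sketch(int fd, const uint8_t *data, size_t len)
{
  assert(data != NULL || len == 0);
  while (len > 0) {
    ssize_t written = write(fd, data, len);
    if (written < 0) {
      if (errno == EINTR)
	continue;  /* Interrupted before any byte was written: retry. */
      return -1;
    }
    /* A short write is not an error: advance and go on. */
    data += written;
    len -= (size_t)written;
  }
  return 0;
}
```

The flush routine could call such a helper with the buffer start and the
number of buffered bytes, then rewind the position.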

> +static void write_buffer_sys(struct ljp_buffer *buffer, const void **data,
> +			     size_t len)

Why do you need a separate function for this? I guess this should be
moved right to the <ljp_write_flush_buffer> (hell these names).

> +{
> +  void *ctx = buffer->ctx;
> +  size_t written;
> +
> +  lua_assert(!ljp_write_test_flag(buffer, STREAM_STOP));
> +
> +  written = buffer->writer(data, len, ctx);

Well, I believe you can use buffer->ctx instead of the additional
variable here. Trust me, you can!

> +
> +  if (LJ_UNLIKELY(written < len)) {
> +    write_set_flag(buffer, STREAM_ERR_IO);
> +    write_save_errno(buffer);
> +  }
> +  if (LJ_UNLIKELY(*data == NULL)) {
> +    write_set_flag(buffer, STREAM_STOP);
> +    write_save_errno(buffer);
> +  }
> +}
> +
> +static LJ_AINLINE size_t write_bytes_buffered(const struct ljp_buffer *buf)

I propose s/write_bytes_buffered/lj_wbuf_len/ (consider sbuflen macro).

<snipped>

> +static LJ_AINLINE int write_buffer_has(const struct ljp_buffer *buf, size_t n)

I propose s/write_buffer_has/lj_wbuf_left/ (consider sbufleft macro).
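
To make the two proposed helpers concrete, here is a sketch modelled on
the sbuflen/sbufleft idea. The struct lj_wbuf_sketch is a hypothetical
stand-in that keeps only the three fields the helpers need:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the write buffer structure. */
struct lj_wbuf_sketch {
  uint8_t *buf;  /* Start of buffer. */
  uint8_t *pos;  /* Current position in buffer. */
  size_t size;   /* Buffer size. */
};

/* Number of bytes currently stored in the buffer. */
static size_t lj_wbuf_len(const struct lj_wbuf_sketch *buf)
{
  assert(buf->pos >= buf->buf);
  return (size_t)(buf->pos - buf->buf);
}

/* Number of bytes that still fit before a flush is needed. */
static size_t lj_wbuf_left(const struct lj_wbuf_sketch *buf)
{
  return buf->size - lj_wbuf_len(buf);
}
```

A check like "the buffer has room for n bytes" then becomes
lj_wbuf_left(buf) >= n at the call site.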

<snipped>

> +void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
> +		    uint8_t *mem, size_t size)
> +{
> +  buf->ctx = ctx;
> +  buf->writer = writer;
> +  buf->buf = mem;
> +  buf->pos = mem;
> +  buf->size = size;
> +  buf->flags = 0;
> +  buf->saved_errno = 0;
> +}
> +
> +void ljp_write_terminate(struct ljp_buffer *buf)
> +{
> +  ljp_write_init(buf, NULL, NULL, NULL, 0);
> +}

<snipped>

> +/* Writes n bytes from an arbitrary buffer src to the output. */
> +static void write_buffer(struct ljp_buffer *buf, const void *src, size_t n)
> +{
> +  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
> +    return;
> +  /*
> +  ** Very unlikely: We are told to write a large buffer at once.
> +  ** Buffer not belong to us so we must to pump data
> +  ** through buffer.
> +  */
> +  while (LJ_UNLIKELY(n > buf->size)) {
> +    ljp_write_flush_buffer(buf);

Why do you need to flush the buffer on start? I guess you can fill the
buffer till it becomes full and only then flush.
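
A sketch of the fill-then-flush order I mean. All names here
(wbuf_sketch, wbuf_addn_sketch, sink_sketch, sink_flush) are
hypothetical, and the in-memory sink exists only to observe what gets
flushed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct wbuf_sketch {
  uint8_t *buf;  /* Start of buffer. */
  uint8_t *pos;  /* Current position in buffer. */
  size_t size;   /* Buffer size. */
  void (*flush)(const uint8_t *data, size_t len, void *ctx);
  void *ctx;
};

/* Hypothetical in-memory sink collecting everything that is flushed. */
struct sink_sketch {
  uint8_t data[64];
  size_t len;
  size_t flushes;
};

static void sink_flush(const uint8_t *data, size_t len, void *ctx)
{
  struct sink_sketch *s = (struct sink_sketch *)ctx;
  assert(s->len + len <= sizeof(s->data));
  memcpy(s->data + s->len, data, len);
  s->len += len;
  s->flushes++;
}

/* Appends n bytes: fill the free space first, flush only when full. */
static void wbuf_addn_sketch(struct wbuf_sketch *b, const void *src, size_t n)
{
  const uint8_t *p = (const uint8_t *)src;

  assert(b->size > 0);
  while (n > 0) {
    size_t left = b->size - (size_t)(b->pos - b->buf);
    size_t chunk = n < left ? n : left;

    memcpy(b->pos, p, chunk);
    b->pos += chunk;
    p += chunk;
    n -= chunk;
    if (b->pos == b->buf + b->size) {  /* Full: flush and rewind. */
      b->flush(b->buf, b->size, b->ctx);
      b->pos = b->buf;
    }
  }
}
```

This way a large write never flushes a half-empty buffer, and the tail
that does not fill the buffer simply stays there until the next flush.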

> +    write_memcpy(buf->pos, src, buf->size);
> +    buf->pos += (ptrdiff_t)buf->size;
> +    n -= buf->size;
> +  }
> +
> +  write_reserve(buf, n);
> +  write_memcpy(buf->pos, src, n);
> +  buf->pos += (ptrdiff_t)n;
> +}
> +
> +/* Writes a \0-terminated C string to the output buffer. */
> +void ljp_write_string(struct ljp_buffer *buf, const char *s)
> +{
> +  const size_t l = strlen(s);
> +
> +  ljp_write_u64(buf, (uint64_t)l);

It is unclear from this code that the check whether profiling is still
active is made inside the callee.

> +  write_buffer(buf, s, l);
> +}
> +

<snipped>

> diff --git a/src/profile/ljp_write.h b/src/profile/ljp_write.h
> new file mode 100644
> index 0000000..29c1669
> --- /dev/null
> +++ b/src/profile/ljp_write.h
> @@ -0,0 +1,84 @@
> +/*
> +** Low-level event streaming for LuaJIT Profiler.
> +** NB! Please note that all events may be streamed inside a signal handler.
> +** This means effectively that only async-signal-safe library functions and
> +** syscalls MUST be used for streaming. Check with `man 7 signal` when in
> +** doubt.
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#ifndef _LJP_WRITE_H
> +#define _LJP_WRITE_H
> +
> +#include <stdint.h>
> +
> +/*
> +** Data format for strings:
> +**
> +** string         := string-len string-payload
> +** string-len     := <ULEB128>
> +** string-payload := <BYTE> {string-len}
> +**
> +** Note.
> +** For strings shorter than 128 bytes (most likely scenario in our case)
> +** we write the same amount of data (1-byte ULEB128 + actual payload) as we
> +** would have written with straightforward serialization (actual payload + \0),
> +** but make parsing easier.
> +*/
> +
> +/* Stream errors. */
> +#define STREAM_ERR_IO 0x1
> +#define STREAM_STOP   0x2
> +
> +typedef size_t (*ljp_writer)(const void **data, size_t len, void *opt);
> +
> +/* Write buffer for profilers. */
> +struct ljp_buffer {
> +  /*
> +  ** Buffer writer which will called at buffer write.
> +  ** Should return amount of written bytes on success or zero in case of error.
> +  ** *data should contain new buffer of size greater or equal to len.
> +  ** If *data == NULL stream stops.
> +  */
> +  ljp_writer writer;
> +  /* Context to writer function. */

Typo: s/Context to/Context for/.

> +  void *ctx;
> +  /* Buffer size. */
> +  size_t size;
> +  /* Saved errno in case of error. */
> +  int saved_errno;
> +  /* Start of buffer. */
> +  uint8_t *buf;
> +  /* Current position in buffer. */
> +  uint8_t *pos;
> +  /* Internal flags. */
> +  volatile uint8_t flags;
> +};

Well, I don't get why the functions are called <ljp_write_*>, but the
first parameter (i.e. "self") is ljp_buffer. As a result such names as
<ljp_write_errno> look confusing, since errno is actually written
nowhere. I suggest naming everything with the <lj_wbuf_*> prefix, so the
names <lj_wbuf_errno> and <lj_wbuf_test_flag> fit the resulting value.
Furthermore, the routines appending the data to the buffer can be
renamed the following way: ljp_write_<type> -> lj_wbuf_add<type>
(consider the Lua standard buffer API). Thoughts?

> +
> +/* Write string. */
> +void ljp_write_string(struct ljp_buffer *buf, const char *s);
> +
> +/* Write single byte. */
> +void ljp_write_byte(struct ljp_buffer *buf, uint8_t b);
> +
> +/* Write uint64_t in uleb128 format. */
> +void ljp_write_u64(struct ljp_buffer *buf, uint64_t n);
> +
> +/* Immediatly flush buffer. */
> +void ljp_write_flush_buffer(struct ljp_buffer *buf);
> +
> +/* Init buffer. */
> +void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
> +		    uint8_t *mem, size_t size);
> +
> +/* Check flags. */
> +int ljp_write_test_flag(const struct ljp_buffer *buf, uint8_t flag);
> +
> +/* Return saved errno. */
> +int ljp_write_errno(const struct ljp_buffer *buf);
> +
> +/* Set pointers to NULL and reset flags. */
> +void ljp_write_terminate(struct ljp_buffer *buf);
> +
> +#endif
> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun
@ 2020-12-21 10:30   ` Igor Munkin
  2020-12-24  7:00     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-21 10:30 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider minor comments below.

On 16.12.20, Sergey Kaplun wrote:
> This patch adds profile writer that writes all necessary Lua
> functions prototypes info like GCproto address, name of the chunk this
> function was defined in and number of the first line of it.
> See <ljp_symtab.h> for details.
> 
> Usage of this module will be added at the next patches.

I propose the following wording to make it a bit clearer:
| core: introduce symtab dumping module
|
| This patch introduces the routine dumping the definitions of all
| loaded Lua functions via the write buffer introduced in the previous
| patch. The following information is recorded for each function:
| * GCproto address
| * The name of the Lua chunk where this function is defined
| * The line number where this function is defined (i.e. the line where
|   its signature locates)

> 
> Part of tarantool/tarantool#5442
> ---
>  src/Makefile             |  2 +-
>  src/Makefile.dep         |  2 ++
>  src/profile/ljp_symtab.c | 55 ++++++++++++++++++++++++++++++++++++++
>  src/profile/ljp_symtab.h | 57 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 115 insertions(+), 1 deletion(-)
>  create mode 100644 src/profile/ljp_symtab.c
>  create mode 100644 src/profile/ljp_symtab.h
> 
> diff --git a/src/Makefile b/src/Makefile
> index 4b1d937..e00265c 100644
> --- a/src/Makefile
> +++ b/src/Makefile

Please, adjust these changes considering the comments to the first
patch. I propose to name this <lj_wbuf_symtab.*> considering the naming
in the third patch.

> @@ -469,7 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)

<snipped>

> diff --git a/src/profile/ljp_symtab.c b/src/profile/ljp_symtab.c
> new file mode 100644
> index 0000000..5a17c97
> --- /dev/null
> +++ b/src/profile/ljp_symtab.c
> @@ -0,0 +1,55 @@

<snipped>

> +#define LJS_CURRENT_VERSION 2

Why is it already the second version?

> +
> +static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
> +					   0x0, 0x0, 0x0};
> +
> +static void symtab_write_prologue(struct ljp_buffer *out)

Why do you need a separate routine instead of making <write_buffer>
function in <ljp_write.c> public?

> +{

<snipped>

> +void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g)
> +{
> +  const GCobj *o;
> +  const GCRef *iter = &g->gc.root;
> +
> +  symtab_write_prologue(out);
> +
> +  while (NULL != (o = gcref(*iter))) {
> +    switch (o->gch.gct) {

<snipped>

> +    case (~LJ_TTRACE): {
> +      /* TODO: Implement dumping a trace info */
> +      break;
> +    }

Minor: So can this case be dropped for now?

> +    default: {

<snipped>

> diff --git a/src/profile/ljp_symtab.h b/src/profile/ljp_symtab.h
> new file mode 100644
> index 0000000..3a40d98
> --- /dev/null
> +++ b/src/profile/ljp_symtab.h
> @@ -0,0 +1,57 @@

<snipped>

> +struct global_State;
> +struct ljp_buffer;

Why not simply include <lj_obj.h> and <profile/ljp_write.h>?
Otherwise, just move these declarations close to the function signature
and mention the reason why you omit including these headers.

> +

<snipped>

> +
> +/* Writes the symbol table of the VM g to out. */

Strictly speaking, this routine dumps the symbol table to *ljp_buffer*,
and I guess this should be reflected in its name, e.g. <lj_wbuf_addsymtab>.

> +void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g);
> +
> +#endif
> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler
  2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
                   ` (10 preceding siblings ...)
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun
@ 2020-12-21 10:43 ` Igor Munkin
  2020-12-24  7:02   ` Sergey Kaplun
  11 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-21 10:43 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the series! I reviewed the patches (1..4, 7, 10) and expect
considerable changes in both the code and the sources layout, so I guess
we can proceed with v2. Anyway, patches 5 and 6 look independent of
these changes and I'll review them in a while. Patches 8, 9 and 11 may
change in the next version, so I'll glance over them, but the full
review will be made for the next version.

On 16.12.20, Sergey Kaplun wrote:
> This patch provides a Lua interface for memory profiler in LuaJIT
> and the corresponding parser of profiled data.
> 
> Issues: https://github.com/tarantool/tarantool/issues/5442
>         https://github.com/tarantool/tarantool/issues/5490
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler
> 
> CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/230917973
> 
> RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000144.html
> 
> @ChangeLog:
>  - Introduce LuaJIT memory profiler (gh-5442).

IIRC, there is a separate issue for the memprof parser, so it's worth
adding it to the ChangeLog too.

> 
> Sergey Kaplun (11):
>   build: add src dir in building
>   utils: introduce leb128 reader and writer
>   profile: introduce profiler writing module
>   profile: introduce symtab write module
>   vm: introduce LFUNC and FFUNC vmstates
>   core: introduce new mem_L field
>   debug: move debug_frameline to public module API
>   profile: introduce memory profiler
>   misc: add Lua API for memory profiler
>   tools: introduce tools directory
>   profile: introduce profile parser
> 

<snipped>

> 
> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
  2020-12-20 22:44   ` Igor Munkin
@ 2020-12-23 16:50   ` Sergey Ostanevich
  2020-12-23 22:36     ` Sergey Kaplun
  1 sibling, 1 reply; 42+ messages in thread
From: Sergey Ostanevich @ 2020-12-23 16:50 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch! I have only one comment after Igor’s review:
reword ‘buffer overflow’ as ‘No bounds checks for the buffer’.

Regards,
Sergos

> +/*
> +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> +
> +/*
> +** Writes a value from an signed 64-bit input to a buffer of bytes.
> +** Buffer overflow is not checked. Returns number of bytes written.
> +*/
> +size_t write_leb128(uint8_t *buffer, int64_t value);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.
> +*/
> +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output.
> +** Buffer overflow is not checked. Returns number of bytes read.
> +*/
> +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> +
> +/*
> +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> +
> +/*
> +** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
> +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> +** If more than n bytes is about to be consumed, returns 0 without touching out.
> +*/
> +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> +
> +#endif
> -- 
> 2.28.0
> 


* Re: [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building
  2020-12-20 21:27   ` Igor Munkin
@ 2020-12-23 18:20     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-23 18:20 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch, but I guess we need to restructure the sources in
> a separate series if we want to (but I personally strongly doubt).
> 
> I don't want to mix the new layout you want to introduce with the
> feature. E.g. we already introduced <lib_misc.c>, however, I also prefer
> the uJIT layout (i.e. lib/misc.c). LuaJIT already provides a profiler
> that is implemented in scope of lj_profile.[hc]. Let's leave everything
> "flat" in <src> directory to save the sources consistency. Please, drop
> this patch out of the series.

OK, I'll drop this commit.

Side note: If we want to change the layout of all sources, we should do
it only after we sync up with upstream. That makes the sync less
painful. As for me, a new layout introduced _after_ the sync will not
get in the way of applying single patches.

> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-20 22:44   ` Igor Munkin
@ 2020-12-23 22:34     ` Sergey Kaplun
  2020-12-24  9:11       ` Igor Munkin
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-23 22:34 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please, consider the comments below.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch introduces module for reading and writing leb128 compression.
> > It will be used for streaming profiling events writing, that will be
> > added at the next patches.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/Makefile       |   5 +-
> >  src/Makefile.dep   |   1 +
> >  src/utils/leb128.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
> >  src/utils/leb128.h |  55 ++++++++++++++++++++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 src/utils/leb128.c
> >  create mode 100644 src/utils/leb128.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index caa49f9..be7ed95 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> 
> Please, adjust these changes considering the comments to the first
> patch. I propose to use either <lj_utils.*> or <lj_utils_leb128.*> for
> the name.

OK, no problem.

> 
> > @@ -468,6 +468,7 @@ endif
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > new file mode 100644
> > index 0000000..921e5bc
> > --- /dev/null
> > +++ b/src/utils/leb128.c
> > @@ -0,0 +1,124 @@
> > +/*
> > +** Working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#include <stdint.h>
> > +#include <stddef.h>
> 
> Why do you include this again instead of using leb128.h?

These two are enough. The definitions from <leb128.h> are redundant here.

> 
> > +
> > +#define LINK_BIT          (0x80)
> > +#define MIN_TWOBYTE_VALUE (0x80)
> > +#define PAYLOAD_MASK      (0x7f)
> > +#define SHIFT_STEP        (7)
> > +#define LEB_SIGN_BIT      (0x40)
> > +
> > +/* ------------------------- Writing ULEB128/LEB128 ------------------------- */
> > +
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)value;
> > +
> > +  return i;
> > +}
> > +
> > +size_t write_leb128(uint8_t *buffer, int64_t value)
> > +{
> > +  size_t i = 0;
> > +
> > +  for (; (uint64_t)(value + 0x40) >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP) {
> 
> What is 0x40? If this is <LEB_SIGN_BIT>, then just use the constant
> here. Otherwise create a new one with the comment. Please, do not use
> magic numbers.

This is the bit propagation necessary for correct encoding. I'll add a
comment about it in the next version.

> 
> > +    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
> > +  }
> 
> The braces are excess.

Fixed.

> 
> > +  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
> > +
> > +  return i;
> > +}
> > +
> > +/* ------------------------- Reading ULEB128/LEB128 ------------------------- */
> > +
> > +/*
> > +** NB! For each LEB128 type (signed/unsigned) we have two versions of read
> 
> Minor: It's better to use XXX for these cases in comments. We already
> discussed this with Vlad here[1] (search for "What is 'XXX'?").

OK, IINM I've already seen "NB:" comment somewhere in LuaJIT sources.
But "XXX" is good to me.

> 
> > +** functions: The one consuming unlimited number of input octets and the one
> > +** consuming not more than given number of input octets. Currently reading
> > +** is not used in performance critical places, so these two functions are
> > +** implemented via single low-level function + run-time mode check. Feel free
> > +** to change if this becomes a bottleneck.
> 
> Well, you can also add LJ_AINLINE for a low-level function, or simply
> add a similar hint by yourself (I personally prefer the first one).

OK.

> 
> > +*/
> > +
> > +static size_t _read_uleb128(uint64_t *out, const uint8_t *buffer, int guarded,
> > +			    size_t n)
> 
> AFAICS, <n> argument is used only in case <guarded> is set to 1.
> Moreover, <n> can't be 0 when <guarded> is set, otherwise this is a
> nilpotent function. So it seems you can drop the <guarded> parameter in
> favour of the following contract for <n>:
> * n == 0 is for guarded == 0 && n == 0
> * n > 0  is for guarded == 1 && n > 0
> 
> This also relates to <_read_leb128>.

Yes, good point, thanks!

> 
> > +{
> > +  size_t i = 0;
> > +  uint64_t value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t octet;
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet = buffer[i++];
> > +    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > +static size_t _read_leb128(int64_t *out, const uint8_t *buffer, int guarded,
> > +			   size_t n)
> > +{
> > +  size_t i = 0;
> > +  int64_t  value = 0;
> > +  uint64_t shift = 0;
> > +  uint8_t  octet;
> 
> A mess with whitespace above.

Fixed.

> 
> > +
> > +  for(;;) {
> > +    if (guarded && i + 1 > n) {
> > +      return 0;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +    octet  = buffer[i++];
> > +    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
> > +    shift += SHIFT_STEP;
> > +    if (!(octet & LINK_BIT)) {
> > +      break;
> > +    }
> 
> The braces are excess.

Fixed.

> 
> > +  }
> > +
> > +  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8) {
> > +    value |= -(1 << shift);
> > +  }
> 
> The braces are excess.

Fixed.
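
A side note on the sign extension itself: in `1 << shift` the left
operand is an int, so the shift is done in int width and is undefined
for shift values of 32 and more (e.g. 35) where int is 32 bits wide. A
hedged sketch of doing the same in unsigned 64-bit arithmetic; the
function name is a placeholder and 0x40 is assumed to be the value of
LEB_SIGN_BIT:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unguarded signed LEB128 read; accumulates in uint64_t to avoid UB. */
static size_t leb128_read_sketch(int64_t *out, const uint8_t *buffer)
{
  size_t i = 0;
  uint64_t value = 0;
  unsigned shift = 0;
  uint8_t octet;

  assert(out != NULL && buffer != NULL);

  do {
    octet = buffer[i++];
    if (shift < 64) /* Ignore bits that cannot fit into 64 bits. */
      value |= (uint64_t)(octet & 0x7f) << shift;
    shift += 7;
  } while (octet & 0x80);

  /* Sign-extend: set all bits above the payload if the sign bit is set. */
  if ((octet & 0x40) && shift < 64)
    value |= ~(uint64_t)0 << shift;

  /* Relies on the two's complement conversion common compilers provide. */
  *out = (int64_t)value;
  return i;
}
```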

> 
> > +
> > +  *out = value;
> > +  return i;
> > +}
> > +
> 
> <snipped>
> 
> > diff --git a/src/utils/leb128.h b/src/utils/leb128.h
> > new file mode 100644
> > index 0000000..46d90bc
> > --- /dev/null
> > +++ b/src/utils/leb128.h
> > @@ -0,0 +1,55 @@
> > +/*
> > +** Interfaces for working with LEB128/ULEB128 encoding.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#ifndef _LJ_UTILS_LEB128_H
> > +#define _LJ_UTILS_LEB128_H
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +
> > +/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
> > +#define LEB128_U64_MAXSIZE 10
> 
> The naming looks odd to me. Considering my comment for the first patch,
> I propose to use something matching "lj_u?leb128_(read|write)(_n)?".
> 
> By the way, the order of the interfaces is also odd.

I'll rename these to match the new name of this translation unit.

> 
> > +
> > +/*
> > +** Writes a value from an unsigned 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_uleb128(uint8_t *buffer, uint64_t value);
> > +
> > +/*
> > +** Writes a value from a signed 64-bit input to a buffer of bytes.
> > +** Buffer overflow is not checked. Returns number of bytes written.
> > +*/
> > +size_t write_leb128(uint8_t *buffer, int64_t value);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> If "buffer overflow" stands for "reading out of bounds", please reword
> this. Otherwise, I don't get it.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_uleb128(uint64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to an int64_t output.
> > +** Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +*/
> > +size_t read_leb128(int64_t *out, const uint8_t *buffer);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes are about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_uleb128_n(uint64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +/*
> > +** Reads a value from a buffer of bytes to an int64_t output. Consumes no more
> > +** than n bytes. Buffer overflow is not checked. Returns number of bytes read.
> 
> Ditto.

Yep, fixed, thanks!

> 
> > +** If more than n bytes are about to be consumed, returns 0 without touching out.
> > +*/
> > +size_t read_leb128_n(int64_t *out, const uint8_t *buffer, size_t n);
> > +
> > +#endif
> > -- 
> > 2.28.0
> > 
> 
> [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018314.html
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-23 16:50   ` Sergey Ostanevich
@ 2020-12-23 22:36     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-23 22:36 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi, Sergos!

Thanks for the review!
On 23.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch! I have only one comment after Igor’s review:
> reword the ‘buffer overflow’ remark as ‘No bounds checks for the buffer’.

Thanks! I'll rewrite it in the next version.

> 
> Regards,
> Sergos
> 

<snipped>

> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-21  9:24   ` Igor Munkin
@ 2020-12-24  6:46     ` Sergey Kaplun
  2020-12-24 15:45       ` Sergey Ostanevich
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24  6:46 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below. I expected
> that we agreed we need an MVP for now, so the module API is fine except
> the <write_buffer_sys> part: it is overcomplicated for the current uses.

We can try this API for now. This may lead to thoughts about improving C
API.

> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch introduces module for writing profile data.
> > Its usage will be added at the next patches.
> > 
> > It can be used for memory profiler or for signal-based
> > cpu profiler.
> 
> I see nothing strongly related to the profiler, so it's simply an
> internal write buffer. Then let's settle for the naming at first. I
> guess <lj_wbuf*> is a good prefix for this subsystem.

OK, it looks better.

> 
> So, I propose the following wording for the commit message:
> | core: introduce write buffer module
> |
> | This patch introduces the standalone module for writing data to the
> | file via the special buffer. The module provides the API for buffer
> | initial setup and its convenient usage.
> 
> Feel free to adjust the description on your own and even describe the
> introduced API in a more verbose way.

Thanks, sounds good!

> 
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Custom memcpy function (see below) makes sense if this module will be
> > used for cpu/sample profiler based on a signal-based timer. Else it can
> > be easily redefined.
> 
> I suggest not overcomplicating all this machinery added for a single
> profiler. Let's drop all memcpy-related hacks. By the way we can simply
> incorporate it when it is necessary.

Sure.

> 
> > 
> >  src/Makefile            |   5 +-
> >  src/Makefile.dep        |   2 +
> >  src/profile/ljp_write.c | 195 ++++++++++++++++++++++++++++++++++++++++
> >  src/profile/ljp_write.h |  84 +++++++++++++++++
> >  4 files changed, 284 insertions(+), 2 deletions(-)
> >  create mode 100644 src/profile/ljp_write.c
> >  create mode 100644 src/profile/ljp_write.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index be7ed95..4b1d937 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> 
> Please, adjust these changes considering the comments to the first
> patch. You can find the proposed naming above.

Done.

> 
> > @@ -469,6 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
> 
> <snipped>
> 
> > diff --git a/src/profile/ljp_write.c b/src/profile/ljp_write.c
> > new file mode 100644
> > index 0000000..de7202d
> > --- /dev/null
> > +++ b/src/profile/ljp_write.c
> > @@ -0,0 +1,195 @@
> > +/*
> > +** Low-level writer for LuaJIT Profiler.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#include <unistd.h>
> > +#include <errno.h>
> > +
> > +#include "profile/ljp_write.h"
> > +#include "utils/leb128.h"
> > +#include "lj_def.h"
> 
> <snipped>
> 
> > +/* Wraps a write syscall ensuring all data have been written. */
> 
> I see no syscall wrapped here. Anyway, IIRC we discussed that we don't
> need such complex interfaces in scope of MVP. So you can just use
> <write> or <fwrite> right here for now and redesign the wbuf API later
> if necessary (it's internal, so I see no problem).

It *may* be wrapped here (encapsulated in the writer function). As I said
above, it can lead us to some thoughts about the C API. It is an internal
interface and we can change it whenever we want.

Thoughts?
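
To make the idea concrete, a hedged sketch of a writer callback that
wraps write(2) and retries until everything is written. Only the
callback signature is taken from the <ljp_writer> typedef quoted below;
the context struct and the function name are hypothetical. On a hard
error it sets *data to NULL (the documented "stop the stream" signal)
and returns the number of bytes that did get out, so the caller still
sees written < len.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical writer context: just a file descriptor. */
struct fd_writer_ctx {
  int fd;
};

/* Matches: size_t (*)(const void **data, size_t len, void *opt). */
static size_t fd_writer(const void **data, size_t len, void *opt)
{
  struct fd_writer_ctx *ctx = opt;
  const uint8_t *p = *data;
  size_t done = 0;

  assert(ctx != NULL && p != NULL);

  while (done < len) {
    ssize_t rc = write(ctx->fd, p + done, len - done);
    if (rc < 0) {
      if (errno == EINTR)
        continue; /* Interrupted before writing anything: retry. */
      *data = NULL; /* Hard error: ask the caller to stop the stream. */
      return done;
    }
    done += (size_t)rc;
  }

  return done; /* *data is left as is, so the same buffer is reused. */
}
```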

> 
> > +static void write_buffer_sys(struct ljp_buffer *buffer, const void **data,
> > +			     size_t len)
> 
> Why do you need a separate function for this? I guess this should be
> moved right to the <ljp_write_flush_buffer> (hell these names).

I suppose it's not necessary. I'll merge it with flush.

> 
> > +{
> > +  void *ctx = buffer->ctx;
> > +  size_t written;
> > +
> > +  lua_assert(!ljp_write_test_flag(buffer, STREAM_STOP));
> > +
> > +  written = buffer->writer(data, len, ctx);
> 
> Well, I believe you can use buffer->ctx instead of the additional
> variable here. Trust me, you can!

Yep :)

> 
> > +
> > +  if (LJ_UNLIKELY(written < len)) {
> > +    write_set_flag(buffer, STREAM_ERR_IO);
> > +    write_save_errno(buffer);
> > +  }
> > +  if (LJ_UNLIKELY(*data == NULL)) {
> > +    write_set_flag(buffer, STREAM_STOP);
> > +    write_save_errno(buffer);
> > +  }
> > +}
> > +
> > +static LJ_AINLINE size_t write_bytes_buffered(const struct ljp_buffer *buf)
> 
> I propose s/write_bytes_buffered/lj_wbuf_len/ (consider sbuflen macro).

Fixed.

> 
> <snipped>
> 
> > +static LJ_AINLINE int write_buffer_has(const struct ljp_buffer *buf, size_t n)
> 
> I propose s/write_buffer_has/lj_wbuf_left/ (consider sbufleft macro).

Fixed.

> 

<snipped>

> > +/* Writes n bytes from an arbitrary buffer src to the output. */
> > +static void write_buffer(struct ljp_buffer *buf, const void *src, size_t n)
> > +{
> > +  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
> > +    return;
> > +  /*
> > +  ** Very unlikely: We are told to write a large buffer at once.
> > +  ** The source buffer does not belong to us, so we must pump the data
> > +  ** through our own buffer.
> > +  */
> > +  while (LJ_UNLIKELY(n > buf->size)) {
> > +    ljp_write_flush_buffer(buf);
> 
> Why do you need to flush the buffer on start? I guess you can fill the
> buffer till it becomes full and only then flush.

Thanks! Good point!

> 
> > +    write_memcpy(buf->pos, src, buf->size);
> > +    buf->pos += (ptrdiff_t)buf->size;
> > +    n -= buf->size;
> > +  }
> > +
> > +  write_reserve(buf, n);
> > +  write_memcpy(buf->pos, src, n);
> > +  buf->pos += (ptrdiff_t)n;
> > +}
> > +
> > +/* Writes a \0-terminated C string to the output buffer. */
> > +void ljp_write_string(struct ljp_buffer *buf, const char *s)
> > +{
> > +  const size_t l = strlen(s);
> > +
> > +  ljp_write_u64(buf, (uint64_t)l);
> 
> It is unclear here that the check that profiling is still active is made
> in the callee.

I'll add the comment here.
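
For reference, a self-contained sketch of the string layout this
function produces (ULEB128 length followed by the raw payload, no
trailing \0). It writes into a flat array instead of the write buffer,
and both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes value as ULEB128; buffer must have room for up to 10 bytes. */
static size_t uleb128_write_sketch(uint8_t *buffer, uint64_t value)
{
  size_t i = 0;

  assert(buffer != NULL);

  for (; value >= 0x80; value >>= 7)
    buffer[i++] = (uint8_t)((value & 0x7f) | 0x80); /* More bytes follow. */
  buffer[i++] = (uint8_t)value; /* Last byte: link bit is clear. */
  return i;
}

/*
** Serializes s as <ULEB128 length><payload> into dst of capacity cap.
** Returns the number of bytes written or 0 if dst is too small.
*/
static size_t string_serialize_sketch(uint8_t *dst, size_t cap, const char *s)
{
  uint8_t len[10];
  size_t l, hdr;

  assert(dst != NULL && s != NULL);

  l = strlen(s);
  hdr = uleb128_write_sketch(len, (uint64_t)l);
  if (hdr + l > cap)
    return 0;
  memcpy(dst, len, hdr);
  memcpy(dst + hdr, s, l);
  return hdr + l;
}
```

A string shorter than 128 bytes gets a 1-byte length prefix, so it costs
the same as the payload plus \0 would, as the note in the header says.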

> 
> > +  write_buffer(buf, s, l);
> > +}
> > +
> 
> <snipped>
> 
> > diff --git a/src/profile/ljp_write.h b/src/profile/ljp_write.h
> > new file mode 100644
> > index 0000000..29c1669
> > --- /dev/null
> > +++ b/src/profile/ljp_write.h
> > @@ -0,0 +1,84 @@
> > +/*
> > +** Low-level event streaming for LuaJIT Profiler.
> > +** NB! Please note that all events may be streamed inside a signal handler.
> > +** This means effectively that only async-signal-safe library functions and
> > +** syscalls MUST be used for streaming. Check with `man 7 signal` when in
> > +** doubt.
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#ifndef _LJP_WRITE_H
> > +#define _LJP_WRITE_H
> > +
> > +#include <stdint.h>
> > +
> > +/*
> > +** Data format for strings:
> > +**
> > +** string         := string-len string-payload
> > +** string-len     := <ULEB128>
> > +** string-payload := <BYTE> {string-len}
> > +**
> > +** Note.
> > +** For strings shorter than 128 bytes (most likely scenario in our case)
> > +** we write the same amount of data (1-byte ULEB128 + actual payload) as we
> > +** would have written with straightforward serialization (actual payload + \0),
> > +** but make parsing easier.
> > +*/
> > +
> > +/* Stream errors. */
> > +#define STREAM_ERR_IO 0x1
> > +#define STREAM_STOP   0x2
> > +
> > +typedef size_t (*ljp_writer)(const void **data, size_t len, void *opt);
> > +
> > +/* Write buffer for profilers. */
> > +struct ljp_buffer {
> > +  /*
> > +  ** Buffer writer which will be called at buffer write.
> > +  ** Should return amount of written bytes on success or zero in case of error.
> > +  ** *data should contain new buffer of size greater or equal to len.
> > +  ** If *data == NULL stream stops.
> > +  */
> > +  ljp_writer writer;
> > +  /* Context to writer function. */
> 
> Typo: s/Context to/Context for/.

Thanks.

> 
> > +  void *ctx;
> > +  /* Buffer size. */
> > +  size_t size;
> > +  /* Saved errno in case of error. */
> > +  int saved_errno;
> > +  /* Start of buffer. */
> > +  uint8_t *buf;
> > +  /* Current position in buffer. */
> > +  uint8_t *pos;
> > +  /* Internal flags. */
> > +  volatile uint8_t flags;
> > +};
> 
> Well, I don't get why the functions are called <ljp_write_*>, but the
> first parameter (i.e. "self") is ljp_buffer. As a result such names as
> <ljp_write_errno> looks confusing, since errno is actually written
> nowhere. I suggest to name everything with <lj_wbuf_*> prefix, so the
> names <lj_wbuf_errno> and <lj_wbuf_test_flag> fit the resulting value.
> Furthermore, the routines appending the data to the buffer can be
> renamed the following way: ljp_write_<type> -> lj_wbuf_add<type>
> (consider Lua standard buffer API). Thoughts?

Yes, it's more consistent with LuaJIT internals too.

> 
> > +
> > +/* Write string. */
> > +void ljp_write_string(struct ljp_buffer *buf, const char *s);
> > +
> > +/* Write single byte. */
> > +void ljp_write_byte(struct ljp_buffer *buf, uint8_t b);
> > +
> > +/* Write uint64_t in uleb128 format. */
> > +void ljp_write_u64(struct ljp_buffer *buf, uint64_t n);
> > +
> > +/* Immediately flush buffer. */
> > +void ljp_write_flush_buffer(struct ljp_buffer *buf);
> > +
> > +/* Init buffer. */
> > +void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
> > +		    uint8_t *mem, size_t size);
> > +
> > +/* Check flags. */
> > +int ljp_write_test_flag(const struct ljp_buffer *buf, uint8_t flag);
> > +
> > +/* Return saved errno. */
> > +int ljp_write_errno(const struct ljp_buffer *buf);
> > +
> > +/* Set pointers to NULL and reset flags. */
> > +void ljp_write_terminate(struct ljp_buffer *buf);
> > +
> > +#endif
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory
  2020-12-20 22:46   ` Igor Munkin
@ 2020-12-24  6:47     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24  6:47 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch, but I see no rationale for it. Now luajit-gdb.py
> is located alongside with LuaJIT binary, so when one runs gdb with it,
> the debugger tries to autoload the extension. After your changes it
> doesn't. Please, drop this patch out of the series.

OK.

> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API
  2020-12-20 22:46   ` Igor Munkin
@ 2020-12-24  6:50     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24  6:50 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! I doubt this change needs to be made as a separate
> patch. Anyway, I left a couple of nits below.
> 
> Furthermore, it's better to use the 'core' prefix here; otherwise a prefix
> is pointless, since every translation unit would then use its own.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch renames debug_frameline to lj_debug_frameline and moves it to
> > public <lj_debug.h> module API (does not provide it with LUA_API). It
> > will be used for memory profiler in the next patches.
> 
> Minor: I propose more simple wording:
> | core: make debug_frameline visible for other sources
> |
> | This change makes debug_frameline function LuaJIT-wide visible to be
> | used in other subsystems (e.g. memory profiler).

I'll squash it with the memprof commit, taking your comment below into
account, and add this note as a separate paragraph.

> 
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/lj_debug.c | 8 ++++----
> >  src/lj_debug.h | 1 +
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> 
> <snipped>
> 
> > diff --git a/src/lj_debug.h b/src/lj_debug.h
> > index 5917c00..1b5ef29 100644
> > --- a/src/lj_debug.h
> > +++ b/src/lj_debug.h
> > @@ -40,6 +40,7 @@ LJ_FUNC void lj_debug_addloc(lua_State *L, const char *msg,
> >  LJ_FUNC void lj_debug_pushloc(lua_State *L, GCproto *pt, BCPos pc);
> >  LJ_FUNC int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar,
> >  			     int ext);
> > +LJ_FUNC BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe);
> 
> Minor: As you can see below, <lj_debug_dumpstack> is provided only if
> profiler is enabled. Since this function is necessary only for memprof,
> there is no need to make it LuaJIT-wide visible by default. This looks
> more natural to me. This is not required but seems like a nice approach,
> so feel free to ignore.

Looks really nice, I'll fix this in the next series.

> 
> >  #if LJ_HASPROFILE
> >  LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
> >  				int depth);
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module
  2020-12-21 10:30   ` Igor Munkin
@ 2020-12-24  7:00     ` Sergey Kaplun
  2020-12-24  9:36       ` Igor Munkin
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24  7:00 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider minor comments below.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch adds profile writer that writes all necessary Lua
> > functions prototypes info like GCproto address, name of the chunk this
> > function was defined in and number of the first line of it.
> > See <ljp_symtab.h> for details.
> > 
> > Usage of this module will be added at the next patches.
> 
> I propose the following wording to make it a bit clearer:
> | core: introduce symtab dumping module
> |
> | This patch introduces the routine dumping the definitions of all
> | loaded Lua functions via the write buffer introduced in the previous
> | patch. The following information is recorded for each function:
> | * GCproto address
> | * The name of the Lua chunk where this function is defined
> | * The line number where this function is defined (i.e. the line where
> |   its signature locates)

Thanks, I'll merge this commit into memprof (see my comments below) and
add corresponding paragraph to commit message.

> 
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/Makefile             |  2 +-
> >  src/Makefile.dep         |  2 ++
> >  src/profile/ljp_symtab.c | 55 ++++++++++++++++++++++++++++++++++++++
> >  src/profile/ljp_symtab.h | 57 ++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 115 insertions(+), 1 deletion(-)
> >  create mode 100644 src/profile/ljp_symtab.c
> >  create mode 100644 src/profile/ljp_symtab.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index 4b1d937..e00265c 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> 
> Please, adjust these changes considering the comments to the first
> patch. I propose to name this <lj_wbuf_symtab.*> considering the naming
> in the third patch.
> 
> > @@ -469,7 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
> 
> <snipped>
> 
> > diff --git a/src/profile/ljp_symtab.c b/src/profile/ljp_symtab.c
> > new file mode 100644
> > index 0000000..5a17c97
> > --- /dev/null
> > +++ b/src/profile/ljp_symtab.c
> > @@ -0,0 +1,55 @@
> 
> <snipped>
> 
> > +#define LJS_CURRENT_VERSION 2
> 
> Why is it already the second version?

It's inherited from LuaVela. I didn't know whether it is right to reset the
version to zero. But AFAICS from your comment it is :)

> 
> > +
> > +static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
> > +					   0x0, 0x0, 0x0};
> > +
> > +static void symtab_write_prologue(struct ljp_buffer *out)
> 
> Why do you need a separate routine instead of making <write_buffer>
> function in <ljp_write.c> public?

Yes, I'll make it public in the next series.

> 
> > +{
> 
> <snipped>
> 
> > +void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g)
> > +{
> > +  const GCobj *o;
> > +  const GCRef *iter = &g->gc.root;
> > +
> > +  symtab_write_prologue(out);
> > +
> > +  while (NULL != (o = gcref(*iter))) {
> > +    switch (o->gch.gct) {
> 
> <snipped>
> 
> > +    case (~LJ_TTRACE): {
> > +      /* TODO: Implement dumping a trace info */
> > +      break;
> > +    }
> 
> Minor: So can this case be dropped for now?

Yes, I left it as a reminder. I'll drop it in the next series.

> 
> > +    default: {
> 
> <snipped>
> 
> > diff --git a/src/profile/ljp_symtab.h b/src/profile/ljp_symtab.h
> > new file mode 100644
> > index 0000000..3a40d98
> > --- /dev/null
> > +++ b/src/profile/ljp_symtab.h
> > @@ -0,0 +1,57 @@
> 
> <snipped>
> 
> > +struct global_State;
> > +struct ljp_buffer;
> 
> Why not simply include <lj_obj.h> and <profile/ljp_write.h>?
> Otherwise, just move it close to the function signature and mention the
> reason why you omit these headers inclusion.

I need neither the <lj_obj.h> definitions nor the <lj_wbuf.h> ones here.
They are not related to this header, only used in the translation unit.

This will be dropped as soon as I merge this commit with the
memprof-related one.

> 
> > +
> 
> <snipped>
> 
> > +
> > +/* Writes the symbol table of the VM g to out. */
> 
> Strictly speaking this routine dumps the symbol table to *ljp_buffer* and
> I guess it should be mentioned in its name e.g. <lj_wbuf_addsymtab>.

At least you can see it from function definition.
I'll make this function internal for memprof only for now.

> 
> > +void ljp_symtab_write(struct ljp_buffer *out, const struct global_State *g);
> > +
> > +#endif
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler
  2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin
@ 2020-12-24  7:02   ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24  7:02 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!
I just have to add a couple of comments.
Expect the next version soon :).

On 21.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the series! I reviewed the patches (1..4, 7, 10) and expect
> considerable changes in both code and sources layout, so I guess we can
> proceed in the v2. Anyway, the patches 5 and 6 look independent of
> these changes and I'll review them in a while. The patches 8, 9 and 11 may
> change in the next version, so I'll glance at them but the full review will
> be made for the next version.
> 
> On 16.12.20, Sergey Kaplun wrote:
> > This patch provides a Lua interface for memory profiler in LuaJIT
> > and the corresponding parser of profiled data.
> > 
> > Issues: https://github.com/tarantool/tarantool/issues/5442
> >         https://github.com/tarantool/tarantool/issues/5490
> > 
> > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler
> > 
> > CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/230917973
> > 
> > RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000144.html
> > 
> > @ChangeLog:
> >  - Introduce LuaJIT memory profiler (gh-5442).
> 
> IIRC, there is a separate issue for the memprof parser, so it's worth
> adding it to the ChangeLog as well.

Yes, thanks!

> 
> > 
> > Sergey Kaplun (11):
> >   build: add src dir in building
> >   utils: introduce leb128 reader and writer
> >   profile: introduce profiler writing module
> >   profile: introduce symtab write module
> >   vm: introduce LFUNC and FFUNC vmstates
> >   core: introduce new mem_L field
> >   debug: move debug_frameline to public module API
> >   profile: introduce memory profiler
> >   misc: add Lua API for memory profiler
> >   tools: introduce tools directory
> >   profile: introduce profile parser
> > 
> 
> <snipped>
> 
> > 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-23 22:34     ` Sergey Kaplun
@ 2020-12-24  9:11       ` Igor Munkin
  2020-12-25  8:46         ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-24  9:11 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

On 24.12.20, Sergey Kaplun wrote:
> Igor,
> 
> Thanks for the review!
> 
> On 21.12.20, Igor Munkin wrote:
> > Sergey,
> > 
> > Thanks for the patch! Please, consider the comments below.
> > 
> > On 16.12.20, Sergey Kaplun wrote:

<snipped>

> > > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > > new file mode 100644
> > > index 0000000..921e5bc
> > > --- /dev/null
> > > +++ b/src/utils/leb128.c
> > > @@ -0,0 +1,124 @@
> > > +/*
> > > +** Working with LEB128/ULEB128 encoding.
> > > +**
> > > +** Major portions taken verbatim or adapted from the LuaVela.
> > > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > > +*/
> > > +
> > > +#include <stdint.h>
> > > +#include <stddef.h>
> > 
> > Why do you include this again instead of using leb128.h?
> 
> It's enough. Definitions from <leb128.h> are redundant here.

Sorry, what definitions are *redundant* here?

> 
> > 
> > > +

<snipped>

> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module
  2020-12-24  7:00     ` Sergey Kaplun
@ 2020-12-24  9:36       ` Igor Munkin
  2020-12-25  8:45         ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-24  9:36 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

On 24.12.20, Sergey Kaplun wrote:
> Igor,
> 
> Thanks for the review!
> 
> On 21.12.20, Igor Munkin wrote:
> > Sergey,
> > 
> > Thanks for the patch! Please consider minor comments below.
> > 
> > On 16.12.20, Sergey Kaplun wrote:

<snipped>

> > > diff --git a/src/profile/ljp_symtab.c b/src/profile/ljp_symtab.c
> > > new file mode 100644
> > > index 0000000..5a17c97
> > > --- /dev/null
> > > +++ b/src/profile/ljp_symtab.c
> > > @@ -0,0 +1,55 @@
> > 
> > <snipped>
> > 
> > > +#define LJS_CURRENT_VERSION 2
> > 
> > Why is it already the second version?
> 
> It's inherited from LuaVela. I didn't know whether it is right to reset the
> version to zero. But AFAICS from your comment it is :)

Why zero? I believe the most natural way to number versions starts with
the *first* version (zero versions are often considered as something not
ready to be released or unstable). Anyway, this value breaks nothing and
should be just incremented for the future symtab changes, so feel free
to use zero, if you want.

> 
> > 
> > > +

<snipped>

> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-24  6:46     ` Sergey Kaplun
@ 2020-12-24 15:45       ` Sergey Ostanevich
  2020-12-24 21:20         ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Ostanevich @ 2020-12-24 15:45 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches


Hi!

Thanks for the changes. Although I see no new patch, I looked at the
origin/skaplun/gh-5442-luajit-memory-profiler branch under third_party/luajit.

+void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
+                 uint8_t *mem, size_t size)
+{
+  lua_assert(size >= LEB128_U64_MAXSIZE);

Is it meaningful to allocate just 10 bytes?

+/* Writes n bytes from an arbitrary buffer src to the buffer. */
+void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n)
+{
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
+    return;
+  /*
+  ** Very unlikely: We are told to write a large buffer at once.
+  ** The source buffer does not belong to us, so we must pump the data
+  ** through our own buffer.
+  */
+  while (LJ_UNLIKELY(n > buf->size)) {
+    const size_t left = wbuf_left(buf);
+    memcpy(buf->pos, src, left);

I’m afraid the src pointer does not advance through the while() loop.


+    buf->pos += (ptrdiff_t)left;
+    lj_wbuf_flush(buf);
+    n -= left;
+  }
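
A minimal, self-contained sketch of a loop that does advance the source
pointer. This is not the branch code: the struct and the in-memory sink
standing in for the real writer are made up for the example, and the
loop condition is "n exceeds the room left" so that no separate reserve
step is needed afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the write buffer; flushes into a sink array. */
struct wbuf_sketch {
  uint8_t *buf;    /* Start of buffer. */
  size_t size;     /* Buffer size. */
  uint8_t *pos;    /* Current position in buffer. */
  uint8_t *sink;   /* Where flushed data goes (instead of a writer call). */
  size_t sink_cap; /* Capacity of the sink. */
  size_t sink_len; /* Bytes flushed so far. */
};

static size_t wbuf_sketch_left(const struct wbuf_sketch *b)
{
  return b->size - (size_t)(b->pos - b->buf);
}

static void wbuf_sketch_flush(struct wbuf_sketch *b)
{
  const size_t len = (size_t)(b->pos - b->buf);

  assert(b->sink_len + len <= b->sink_cap);
  memcpy(b->sink + b->sink_len, b->buf, len);
  b->sink_len += len;
  b->pos = b->buf;
}

/* Appends n bytes from src, flushing whenever the buffer becomes full. */
static void wbuf_sketch_addn(struct wbuf_sketch *b, const void *src, size_t n)
{
  const uint8_t *p = src;

  assert(b->size > 0);

  while (n > wbuf_sketch_left(b)) {
    const size_t left = wbuf_sketch_left(b);
    memcpy(b->pos, p, left);
    b->pos += left;
    p += left; /* Advance the source as well as the destination. */
    n -= left;
    wbuf_sketch_flush(b);
  }

  memcpy(b->pos, p, n);
  b->pos += n;
}
```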


Sergos.


> On 24 Dec 2020, at 09:46, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> Hi, Igor!
> 
> Thanks for the review!
> 
> On 21.12.20, Igor Munkin wrote:
>> Sergey,
>> 
>> Thanks for the patch! Please consider the comments below. I expected
>> that we agreed we need an MVP for now, so the module API is fine except
>> the <write_buffer_sys> part: it is overcomplicated for the current uses.
> 
> We can try this API for now. This may lead to thoughts about improving C
> API.
> 
>> 
>> On 16.12.20, Sergey Kaplun wrote:
>>> This patch introduces module for writing profile data.
>>> Its usage will be added at the next patches.
>>> 
>>> It can be used for memory profiler or for signal-based
>>> cpu profiler.
>> 
>> I see nothing strongly related to the profiler, so it's simply an
>> internal write buffer. Then let's settle for the naming at first. I
>> guess <lj_wbuf*> is a good prefix for this subsystem.
> 
> OK, it looks better.
> 
>> 
>> So, I propose the following wording for the commit message:
>> | core: introduce write buffer module
>> |
>> | This patch introduces the standalone module for writing data to the
>> | file via the special buffer. The module provides the API for buffer
>> | initial setup and its convenient usage.
>> 
>> Feel free to adjust the description on your own and even describe the
>> introduced API in a more verbose way.
> 
> Thanks, sounds good!
> 
>> 
>>> 
>>> Part of tarantool/tarantool#5442
>>> ---
>>> 
>>> Custom memcpy function (see below) makes sense if this module will be
>>> used for cpu/sample profiler based on a signal-based timer. Else it can
>>> be easily redefined.
>> 
>> I suggest not overcomplicating all this machinery added for a single
>> profiler. Let's drop all memcpy-related hacks. By the way we can simply
>> incorporate it when it is necessary.
> 
> Sure.
> 
>> 
>>> 
>>> src/Makefile            |   5 +-
>>> src/Makefile.dep        |   2 +
>>> src/profile/ljp_write.c | 195 ++++++++++++++++++++++++++++++++++++++++
>>> src/profile/ljp_write.h |  84 +++++++++++++++++
>>> 4 files changed, 284 insertions(+), 2 deletions(-)
>>> create mode 100644 src/profile/ljp_write.c
>>> create mode 100644 src/profile/ljp_write.h
>>> 
>>> diff --git a/src/Makefile b/src/Makefile
>>> index be7ed95..4b1d937 100644
>>> --- a/src/Makefile
>>> +++ b/src/Makefile
>> 
>> Please, adjust these changes considering the comments to the first
>> patch. You can find the proposed naming above.
> 
> Done.
> 
>> 
>>> @@ -469,6 +469,7 @@ DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
>> 
>> <snipped>
>> 
>>> diff --git a/src/profile/ljp_write.c b/src/profile/ljp_write.c
>>> new file mode 100644
>>> index 0000000..de7202d
>>> --- /dev/null
>>> +++ b/src/profile/ljp_write.c
>>> @@ -0,0 +1,195 @@
>>> +/*
>>> +** Low-level writer for LuaJIT Profiler.
>>> +**
>>> +** Major portions taken verbatim or adapted from the LuaVela.
>>> +** Copyright (C) 2015-2019 IPONWEB Ltd.
>>> +*/
>>> +
>>> +#include <unistd.h>
>>> +#include <errno.h>
>>> +
>>> +#include "profile/ljp_write.h"
>>> +#include "utils/leb128.h"
>>> +#include "lj_def.h"
>> 
>> <snipped>
>> 
>>> +/* Wraps a write syscall ensuring all data have been written. */
>> 
>> I see no syscall wrapped here. Anyway, IIRC we discussed that we don't
>> need such complex interfaces in scope of MVP. So you can just use
>> <write> or <fwrite> right here for now and redesign the wbuf API later
>> if necessary (it's internal, so I see no problem).
> 
> It *may* be wrapped here (encapsulated in the writer function). As I said
> above, it can lead us to some thoughts about the C API. It is an internal
> interface and we can change it whenever we want.
> 
> Thoughts?
> 
>> 
>>> +static void write_buffer_sys(struct ljp_buffer *buffer, const void **data,
>>> +			     size_t len)
>> 
>> Why do you need a separate function for this? I guess this should be
>> moved right to the <ljp_write_flush_buffer> (hell these names).
> 
> I suppose it is not necessary. I'll merge it with flush.
> 
>> 
>>> +{
>>> +  void *ctx = buffer->ctx;
>>> +  size_t written;
>>> +
>>> +  lua_assert(!ljp_write_test_flag(buffer, STREAM_STOP));
>>> +
>>> +  written = buffer->writer(data, len, ctx);
>> 
>> Well, I believe you can use buffer->ctx instead of the additional
>> variable here. Trust me, you can!
> 
> Yep :)
> 
>> 
>>> +
>>> +  if (LJ_UNLIKELY(written < len)) {
>>> +    write_set_flag(buffer, STREAM_ERR_IO);
>>> +    write_save_errno(buffer);
>>> +  }
>>> +  if (LJ_UNLIKELY(*data == NULL)) {
>>> +    write_set_flag(buffer, STREAM_STOP);
>>> +    write_save_errno(buffer);
>>> +  }
>>> +}
>>> +
>>> +static LJ_AINLINE size_t write_bytes_buffered(const struct ljp_buffer *buf)
>> 
>> I propose s/write_bytes_buffered/lj_wbuf_len/ (consider sbuflen macro).
> 
> Fixed.
> 
>> 
>> <snipped>
>> 
>>> +static LJ_AINLINE int write_buffer_has(const struct ljp_buffer *buf, size_t n)
>> 
>> I propose s/write_buffer_has/lj_wbuf_left/ (consider sbufleft macro).
> 
> Fixed.
> 
>> 
> 
> <snipped>
> 
>>> +/* Writes n bytes from an arbitrary buffer src to the output. */
>>> +static void write_buffer(struct ljp_buffer *buf, const void *src, size_t n)
>>> +{
>>> +  if (LJ_UNLIKELY(ljp_write_test_flag(buf, STREAM_STOP)))
>>> +    return;
>>> +  /*
>>> +  ** Very unlikely: We are told to write a large buffer at once.
>>> +  ** Buffer not belong to us so we must to pump data
>>> +  ** through buffer.
>>> +  */
>>> +  while (LJ_UNLIKELY(n > buf->size)) {
>>> +    ljp_write_flush_buffer(buf);
>> 
>> Why do you need to flush the buffer on start? I guess you can fill the
>> buffer till it becomes full and only then flush.
> 
> Thanks! Good point!
> 
>> 
>>> +    write_memcpy(buf->pos, src, buf->size);
>>> +    buf->pos += (ptrdiff_t)buf->size;
>>> +    n -= buf->size;
>>> +  }
>>> +
>>> +  write_reserve(buf, n);
>>> +  write_memcpy(buf->pos, src, n);
>>> +  buf->pos += (ptrdiff_t)n;
>>> +}
>>> +
>>> +/* Writes a \0-terminated C string to the output buffer. */
>>> +void ljp_write_string(struct ljp_buffer *buf, const char *s)
>>> +{
>>> +  const size_t l = strlen(s);
>>> +
>>> +  ljp_write_u64(buf, (uint64_t)l);
>> 
>> It is unclear that the check that profiling is still active is made in
>> the scope of the callee.
> 
> I'll add a comment here.
> 
>> 
>>> +  write_buffer(buf, s, l);
>>> +}
>>> +
>> 
>> <snipped>
>> 
>>> diff --git a/src/profile/ljp_write.h b/src/profile/ljp_write.h
>>> new file mode 100644
>>> index 0000000..29c1669
>>> --- /dev/null
>>> +++ b/src/profile/ljp_write.h
>>> @@ -0,0 +1,84 @@
>>> +/*
>>> +** Low-level event streaming for LuaJIT Profiler.
>>> +** NB! Please note that all events may be streamed inside a signal handler.
>>> +** This means effectively that only async-signal-safe library functions and
>>> +** syscalls MUST be used for streaming. Check with `man 7 signal` when in
>>> +** doubt.
>>> +** Major portions taken verbatim or adapted from the LuaVela.
>>> +** Copyright (C) 2015-2019 IPONWEB Ltd.
>>> +*/
>>> +
>>> +#ifndef _LJP_WRITE_H
>>> +#define _LJP_WRITE_H
>>> +
>>> +#include <stdint.h>
>>> +
>>> +/*
>>> +** Data format for strings:
>>> +**
>>> +** string         := string-len string-payload
>>> +** string-len     := <ULEB128>
>>> +** string-payload := <BYTE> {string-len}
>>> +**
>>> +** Note.
>>> +** For strings shorter than 128 bytes (most likely scenario in our case)
>>> +** we write the same amount of data (1-byte ULEB128 + actual payload) as we
>>> +** would have written with straightforward serialization (actual payload + \0),
>>> +** but make parsing easier.
>>> +*/
>>> +
>>> +/* Stream errors. */
>>> +#define STREAM_ERR_IO 0x1
>>> +#define STREAM_STOP   0x2
>>> +
>>> +typedef size_t (*ljp_writer)(const void **data, size_t len, void *opt);
>>> +
>>> +/* Write buffer for profilers. */
>>> +struct ljp_buffer {
>>> +  /*
>>> +  ** Buffer writer which will called at buffer write.
>>> +  ** Should return amount of written bytes on success or zero in case of error.
>>> +  ** *data should contain new buffer of size greater or equal to len.
>>> +  ** If *data == NULL stream stops.
>>> +  */
>>> +  ljp_writer writer;
>>> +  /* Context to writer function. */
>> 
>> Typo: s/Context to/Context for/.
> 
> Thanks.
> 
>> 
>>> +  void *ctx;
>>> +  /* Buffer size. */
>>> +  size_t size;
>>> +  /* Saved errno in case of error. */
>>> +  int saved_errno;
>>> +  /* Start of buffer. */
>>> +  uint8_t *buf;
>>> +  /* Current position in buffer. */
>>> +  uint8_t *pos;
>>> +  /* Internal flags. */
>>> +  volatile uint8_t flags;
>>> +};
>> 
>> Well, I don't get why the functions are called <ljp_write_*>, but the
>> first parameter (i.e. "self") is ljp_buffer. As a result, such names as
>> <ljp_write_errno> look confusing, since errno is actually written
>> nowhere. I suggest naming everything with the <lj_wbuf_*> prefix, so the
>> names <lj_wbuf_errno> and <lj_wbuf_test_flag> fit the resulting value.
>> Furthermore, the routines appending the data to the buffer can be
>> renamed the following way: ljp_write_<type> -> lj_wbuf_add<type>
>> (consider the Lua standard buffer API). Thoughts?
> 
> Yes, it's more consistent with LuaJIT internals too.
> 
>> 
>>> +
>>> +/* Write string. */
>>> +void ljp_write_string(struct ljp_buffer *buf, const char *s);
>>> +
>>> +/* Write single byte. */
>>> +void ljp_write_byte(struct ljp_buffer *buf, uint8_t b);
>>> +
>>> +/* Write uint64_t in uleb128 format. */
>>> +void ljp_write_u64(struct ljp_buffer *buf, uint64_t n);
>>> +
>>> +/* Immediatly flush buffer. */
>>> +void ljp_write_flush_buffer(struct ljp_buffer *buf);
>>> +
>>> +/* Init buffer. */
>>> +void ljp_write_init(struct ljp_buffer *buf, ljp_writer writer, void *ctx,
>>> +		    uint8_t *mem, size_t size);
>>> +
>>> +/* Check flags. */
>>> +int ljp_write_test_flag(const struct ljp_buffer *buf, uint8_t flag);
>>> +
>>> +/* Return saved errno. */
>>> +int ljp_write_errno(const struct ljp_buffer *buf);
>>> +
>>> +/* Set pointers to NULL and reset flags. */
>>> +void ljp_write_terminate(struct ljp_buffer *buf);
>>> +
>>> +#endif
>>> -- 
>>> 2.28.0
>>> 
>> 
>> -- 
>> Best regards,
>> IM
> 
> -- 
> Best regards,
> Sergey Kaplun




* Re: [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for memory profiler
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun
@ 2020-12-24 16:32   ` Sergey Ostanevich
  2020-12-24 21:25     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Ostanevich @ 2020-12-24 16:32 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch!
Although the patch might be outdated, I have some comments below
that are still relevant to the branch.

> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index 6f7b9a9..d3b5ab4 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
<snipped>
> +
> +/* local started, err, errno = misc.memprof.start(fname) */
> +LJLIB_CF(misc_memprof_start)
> +{
> +  struct luam_Prof_options opt = {0};
> +  struct memprof_ctx *ctx;
> +  const char *fname;
> +  int memprof_status;
> +  int started;
> +
> +  fname = strdata(lj_lib_checkstr(L, 1));
> +
> +  ctx = lj_mem_new(L, sizeof(*ctx));
> +  if (ctx == NULL)
> +    goto errmem;
> +
> +  opt.ctx = ctx;
> +  opt.writer = buffer_writer_default;
> +  opt.on_stop = on_stop_cb_default;
> +  opt.len = STREAM_BUFFER_SIZE;
> +  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
> +  if (NULL == opt.buf) {
> +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> +    goto errmem;
> +  }
> +
> +  ctx->g = G(L);
> +  ctx->stream = fopen(fname, "wb");
> +
> +  if (ctx->stream == NULL) {
> +    memprof_ctx_free(ctx, opt.buf);
> +    return luaL_fileresult(L, 0, fname);
> +  }
> +
> +  memprof_status = ljp_memprof_start(L, &opt);
> +  started = memprof_status == LUAM_PROFILE_SUCCESS;
> +
> +  if (LJ_UNLIKELY(!started)) {
> +    fclose(ctx->stream);
> +    remove(fname);
> +    memprof_ctx_free(ctx, opt.buf);
> +    switch (memprof_status) {
> +    case LUAM_PROFILE_ERRRUN:
> +      lua_pushnil(L);
> +      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_ISRUNNING));
> +      return 2;
> +    case LUAM_PROFILE_ERRMEM:
> +      /* Unreachable for now. */
> +      goto errmem;
> +    case LUAM_PROFILE_ERRIO:
> +      return luaL_fileresult(L, 0, fname);
> +    default:
> +      break;

This means that you can return ‘false’ without any err/errno set.

> +    }
> +  }
> +  lua_pushboolean(L, started);
> +
> +  return 1;
> +errmem:
> +  lua_pushnil(L);
> +  setstrV(L, L->top++, lj_err_str(L, LJ_ERR_ERRMEM));
> +  return 2;
> +}
> +
> +/* local stopped, err, errno = misc.memprof.stop() */
> +LJLIB_CF(misc_memprof_stop)
> +{
> +  int status = ljp_memprof_stop();
> +  int stopped_successfully = status == LUAM_PROFILE_SUCCESS;
> +  if (!stopped_successfully) {
> +    switch (status) {
> +    case LUAM_PROFILE_ERRRUN:
> +      lua_pushnil(L);
> +      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_NOTRUNNING));
> +      return 2;
> +    case LUAM_PROFILE_ERRIO:
> +      return luaL_fileresult(L, 0, NULL);
> +    default:
> +      break;

Ditto

> +    }
> +  }
> +  lua_pushboolean(L, stopped_successfully);
> +  return 1;
> +}
> +


* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-24 15:45       ` Sergey Ostanevich
@ 2020-12-24 21:20         ` Sergey Kaplun
  2020-12-25  9:37           ` Igor Munkin
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24 21:20 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

On 24.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the changes. Although I see no new patch, I looked into
> origin/skaplun/gh-5442-luajit-memory-profiler under third_party/luajit

Yes, I'm adding some updates related to Igor's review of the RFC, naming
and commit messages.

> 
> +void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> +                 uint8_t *mem, size_t size)
> +{
> +  lua_assert(size >= LEB128_U64_MAXSIZE);
> 
> Is it meaningful to allocate just 10 bytes?

No, but at least we should check it. We need a buffer of at least
10 bytes to write large LEB128-encoded addresses.
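
To illustrate where the 10-byte bound comes from, here is a minimal
sketch of a ULEB128 encoder for 64-bit values. It is not the code from
the branch: the names uleb128_encode_u64 and ULEB128_U64_MAXSIZE are
hypothetical and only mirror the idea behind LEB128_U64_MAXSIZE. Each
output byte carries 7 payload bits, so 64 bits need ceil(64 / 7) == 10
bytes at most.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: ceil(64 / 7) == 10 bytes for any uint64_t. */
#define ULEB128_U64_MAXSIZE 10

/*
** Encodes value into out, which must hold at least
** ULEB128_U64_MAXSIZE bytes. Returns the number of bytes written.
*/
static size_t uleb128_encode_u64(uint8_t *out, uint64_t value)
{
  size_t n = 0;
  assert(out != NULL);
  do {
    uint8_t byte = (uint8_t)(value & 0x7f); /* Low 7 payload bits. */
    value >>= 7;
    if (value != 0)
      byte |= 0x80; /* Continuation bit: more groups follow. */
    out[n++] = byte;
  } while (value != 0);
  return n;
}
```

Under these assumptions small values take a single byte, and only
values with the top bit set reach the full 10 bytes.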

> 
> +/* Writes n bytes from an arbitrary buffer src to the buffer. */
> +void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n)
> +{
> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
> +    return;
> +  /*
> +  ** Very unlikely: We are told to write a large buffer at once.
> +  ** Buffer not belong to us so we must to pump data
> +  ** through buffer.
> +  */
> +  while (LJ_UNLIKELY(n > buf->size)) {
> +    const size_t left = wbuf_left(buf);
> +    memcpy(buf->pos, src, left);
> 
> I’m afraid the src pointer does not advance through the while() loop

Thank you very much! Nice catch! Fixed.
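
For reference, a simplified sketch of the idea behind the fix, not the
exact code from the branch. The struct wbuf, its sink callback and the
collector helper are hypothetical stand-ins for struct lj_wbuf and its
writer. The sketch also loops while the data does not fit into the
free space, which is a simplification of the original condition. The
key point is that the source pointer moves forward together with the
buffer position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct lj_wbuf. */
struct wbuf {
  uint8_t *buf; /* Start of buffer. */
  uint8_t *pos; /* Current position in buffer. */
  size_t size;  /* Buffer size, must be non-zero. */
  /* Hypothetical sink called on flush. */
  void (*sink)(const uint8_t *data, size_t len, void *ctx);
  void *ctx;    /* Context for the sink. */
};

static size_t wbuf_len(const struct wbuf *b)
{
  return (size_t)(b->pos - b->buf);
}

static size_t wbuf_left(const struct wbuf *b)
{
  return b->size - wbuf_len(b);
}

static void wbuf_flush(struct wbuf *b)
{
  b->sink(b->buf, wbuf_len(b), b->ctx);
  b->pos = b->buf;
}

/* Writes n bytes from src, pumping them through the buffer. */
static void wbuf_addn(struct wbuf *b, const void *src, size_t n)
{
  const uint8_t *p = (const uint8_t *)src;
  assert(b->size > 0);
  while (n > wbuf_left(b)) {
    const size_t left = wbuf_left(b);
    memcpy(b->pos, p, left);
    b->pos += left;
    p += left; /* Advance the source, not only the destination. */
    n -= left;
    wbuf_flush(b);
  }
  memcpy(b->pos, p, n);
  b->pos += n;
}

/* Example sink: appends flushed bytes to a flat array. */
struct collector {
  uint8_t data[64];
  size_t len;
};

static void collect(const uint8_t *data, size_t len, void *ctx)
{
  struct collector *c = (struct collector *)ctx;
  assert(c->len + len <= sizeof(c->data));
  memcpy(c->data + c->len, data, len);
  c->len += len;
}
```

With a 4-byte buffer and an 11-byte payload the sink should receive
the payload in order and without repeated chunks, which is exactly
what breaks when src is not advanced.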

> 
> 
> +    buf->pos += (ptrdiff_t)left;
> +    lj_wbuf_flush(buf);
> +    n -= left;
> +  }
> 
> 
> Sergos.
> 
> 

<snipped>

> 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for memory profiler
  2020-12-24 16:32   ` Sergey Ostanevich
@ 2020-12-24 21:25     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-24 21:25 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi!

Thanks for the review!

On 24.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch!
> Although the patch might be outdated, I have some comments below
> that are still relevant to the branch.
> 
> > diff --git a/src/lib_misc.c b/src/lib_misc.c
> > index 6f7b9a9..d3b5ab4 100644
> > --- a/src/lib_misc.c
> > +++ b/src/lib_misc.c
> <snipped>
> > +
> > +/* local started, err, errno = misc.memprof.start(fname) */
> > +LJLIB_CF(misc_memprof_start)
> > +{
> > +  struct luam_Prof_options opt = {0};
> > +  struct memprof_ctx *ctx;
> > +  const char *fname;
> > +  int memprof_status;
> > +  int started;
> > +
> > +  fname = strdata(lj_lib_checkstr(L, 1));
> > +
> > +  ctx = lj_mem_new(L, sizeof(*ctx));
> > +  if (ctx == NULL)
> > +    goto errmem;
> > +
> > +  opt.ctx = ctx;
> > +  opt.writer = buffer_writer_default;
> > +  opt.on_stop = on_stop_cb_default;
> > +  opt.len = STREAM_BUFFER_SIZE;
> > +  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
> > +  if (NULL == opt.buf) {
> > +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> > +    goto errmem;
> > +  }
> > +
> > +  ctx->g = G(L);
> > +  ctx->stream = fopen(fname, "wb");
> > +
> > +  if (ctx->stream == NULL) {
> > +    memprof_ctx_free(ctx, opt.buf);
> > +    return luaL_fileresult(L, 0, fname);
> > +  }
> > +
> > +  memprof_status = ljp_memprof_start(L, &opt);
> > +  started = memprof_status == LUAM_PROFILE_SUCCESS;
> > +
> > +  if (LJ_UNLIKELY(!started)) {
> > +    fclose(ctx->stream);
> > +    remove(fname);
> > +    memprof_ctx_free(ctx, opt.buf);
> > +    switch (memprof_status) {
> > +    case LUAM_PROFILE_ERRRUN:
> > +      lua_pushnil(L);
> > +      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_ISRUNNING));
> > +      return 2;
> > +    case LUAM_PROFILE_ERRMEM:
> > +      /* Unreachable for now. */
> > +      goto errmem;
> > +    case LUAM_PROFILE_ERRIO:
> > +      return luaL_fileresult(L, 0, fname);
> > +    default:
> > +      break;
> 
> This means that you can return ‘false’ without any err/errno set.

Thanks! I forgot to add `lua_assert(0)`. This case is
unreachable, but in a release build we should show the user that
something went wrong.

> 
> > +    }
> > +  }
> > +  lua_pushboolean(L, started);
> > +
> > +  return 1;
> > +errmem:
> > +  lua_pushnil(L);
> > +  setstrV(L, L->top++, lj_err_str(L, LJ_ERR_ERRMEM));
> > +  return 2;
> > +}
> > +
> > +/* local stopped, err, errno = misc.memprof.stop() */
> > +LJLIB_CF(misc_memprof_stop)
> > +{
> > +  int status = ljp_memprof_stop();
> > +  int stopped_successfully = status == LUAM_PROFILE_SUCCESS;
> > +  if (!stopped_successfully) {
> > +    switch (status) {
> > +    case LUAM_PROFILE_ERRRUN:
> > +      lua_pushnil(L);
> > +      setstrV(L, L->top++, lj_err_str(L, LJ_ERR_PROF_NOTRUNNING));
> > +      return 2;
> > +    case LUAM_PROFILE_ERRIO:
> > +      return luaL_fileresult(L, 0, NULL);
> > +    default:
> > +      break;
> 
> Ditto
> 
> > +    }
> > +  }
> > +  lua_pushboolean(L, stopped_successfully);
> > +  return 1;
> > +}
> > +
> 

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun
@ 2020-12-24 23:09   ` Igor Munkin
  2020-12-25  8:41     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-24 23:09 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider my comments below.

On 16.12.20, Sergey Kaplun wrote:
> This patch adds a parser for binary data dumped by the profiler.
> It provides a script that parses a given binary file. It first parses the
> symtab using ffi and then maps memory events to this symtab. Finally, it
> renders the data in a human-readable format.

Side note: I saw the new version of the commit message, so I left no
comments for this one.

> 
> Part of tarantool/tarantool#5442
> Part of tarantool/tarantool#5490
> ---
>  test/misclib-memprof-lapi.test.lua    | 125 +++++++++++++++++
>  tools/luajit-parse-memprof            |  20 +++
>  tools/parse_memprof/bufread.lua       | 143 +++++++++++++++++++
>  tools/parse_memprof/main.lua          | 104 ++++++++++++++
>  tools/parse_memprof/parse_memprof.lua | 195 ++++++++++++++++++++++++++
>  tools/parse_memprof/parse_symtab.lua  |  88 ++++++++++++
>  tools/parse_memprof/view_plain.lua    |  45 ++++++
>  7 files changed, 720 insertions(+)
>  create mode 100755 test/misclib-memprof-lapi.test.lua
>  create mode 100755 tools/luajit-parse-memprof
>  create mode 100644 tools/parse_memprof/bufread.lua
>  create mode 100644 tools/parse_memprof/main.lua
>  create mode 100644 tools/parse_memprof/parse_memprof.lua
>  create mode 100644 tools/parse_memprof/parse_symtab.lua
>  create mode 100644 tools/parse_memprof/view_plain.lua

General notes for all comments in the sources below:
* Write it above the code to be commented
* Start it with a capital letter
* End it with a dot

> 
> diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> new file mode 100755
> index 0000000..e3663bb
> --- /dev/null
> +++ b/test/misclib-memprof-lapi.test.lua
> @@ -0,0 +1,125 @@

<snipped>

> +local path = arg[0]:gsub('/[^/]+%.test%.lua', '')
> +local parse_suffix = '../tools/parse_memprof/?.lua;'
> +package.path = ('%s/%s;'):format(path, parse_suffix)..package.path

This is why I hate patching package.path in sources. There is a patchset
where I move this suite to another place. Now I see that I need to patch
this test. OK, we can do nothing with tests for now, so please add a
FIXME note for these lines to fix the issue via an environment variable.

> +
> +local table_new = require "table.new"
> +
> +local bufread = require "bufread"
> +local memprof = require "parse_memprof"
> +local symtab  = require "parse_symtab"

Typo: Something is wrong with the whitespace above.

> +
> +local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
> +local BAD_PATH    = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +local tap = require('tap')
> +
> +local test = tap.test("misc-memprof-lapi")
> +test:plan(6)

Why is the test initialized *here*?

> +

<snipped>

> +local function check_alloc_report(alloc, line, function_line, nevents)
> +  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
> +  print(nevents, alloc[line].num)

Do you need this print?

> +  assert(alloc[line].num == nevents)
> +  return true
> +end
> +
> +-- Not a directory.
> +local res, err = misc.memprof.start(BAD_PATH)
> +test:ok(res == nil and err:match("Not a directory"))

This case can also check so-called "system-dependent error code", right?

> +

<snipped>

> +local reader  = bufread.new(TMP_BINFILE)
> +local symbols = symtab.parse(reader)
> +local events  = memprof.parse(reader, symbols)

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +-- 1 event -- alocation of table by itself + 1 allocation
> +-- of array part as far it bigger then LJ_MAX_COLOSIZE (16).
> +test:ok(check_alloc_report(alloc, 21, 19, 2))
> +-- 100 strings allocations.
> +test:ok(check_alloc_report(alloc, 26, 19, 100))
> +
> +-- Collect all previous allocated objects.
> +test:ok(free.INTERNAL.num == 102)

I guess there is no need to name these magic numbers, but it's
totally worth leaving a comment describing them.

> +
> +os.exit(test:check() and 0 or 1)
> diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
> new file mode 100755
> index 0000000..b9b16d7
> --- /dev/null
> +++ b/tools/luajit-parse-memprof

I believe it's better to implement a runner to be installed on the system
instead of one to be used from the sources. By the way, take a look at the
way luajit.pc is installed.

> @@ -0,0 +1,20 @@

<snipped>

> diff --git a/tools/parse_memprof/bufread.lua b/tools/parse_memprof/bufread.lua
> new file mode 100644
> index 0000000..d48d6e8
> --- /dev/null
> +++ b/tools/parse_memprof/bufread.lua
> @@ -0,0 +1,143 @@

<snipped>

> +local ffi_C  = ffi.C
> +local band   = bit.band

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +
> +local function _read_stream(reader, n)
> +  local free_size

Minor: It's better to declare this variable where it is used.

> +  local tail_size = reader._end - reader._pos

<snipped>

> +function M.read_uleb128(reader)
> +  local value = ffi.new('uint64_t', 0)
> +  local shift = 0
> +
> +  repeat
> +    local oct = M.read_octet(reader)
> +
> +    if oct == nil then
> +      break
> +    end

I guess this is an exceptional case, isn't it? If we hit this condition
the ULEB128 value is misencoded, so it's worth raising an error.
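
To make the expected behaviour concrete, here is a minimal sketch of a
ULEB128 decoder that treats truncated input as an error instead of
silently returning a partial value. It is written in C for
illustration only and is not the Lua code from the patch; the function
name, the constants and the "0 means error" convention are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical named constants instead of bare 0x7f and 0x80. */
#define ULEB128_PAYLOAD_MASK 0x7f
#define ULEB128_CONTINUE_BIT 0x80

/*
** Decodes a ULEB128 value from data[0..len). Returns the number of
** bytes consumed, or 0 if the input ends before the final byte or
** is longer than 10 bytes. Excess high bits in the 10th byte are
** not checked in this sketch.
*/
static size_t uleb128_decode_u64(const uint8_t *data, size_t len,
                                 uint64_t *value)
{
  uint64_t result = 0;
  unsigned shift = 0;
  size_t i;
  assert(data != NULL && value != NULL);
  for (i = 0; i < len && shift < 64; i++, shift += 7) {
    const uint8_t byte = data[i];
    result |= (uint64_t)(byte & ULEB128_PAYLOAD_MASK) << shift;
    if ((byte & ULEB128_CONTINUE_BIT) == 0) {
      *value = result;
      return i + 1;
    }
  }
  /* Ran out of input (or bits) while the continuation bit was set. */
  return 0;
}
```

A caller can then raise an error whenever 0 is returned, so a
misencoded stream is reported at the point where it is detected.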

> +
> +    -- Alas, bit library works only with 32-bit arguments:
> +    local oct_u64 = ffi.new('uint64_t', band(oct, 0x7f))

Please do not use magic numbers in code.

> +    value = value + oct_u64 * (2 ^ shift)
> +    shift = shift + 7
> +
> +  until band(oct, 0x80) == 0

Please do not use magic numbers in code.

> +
> +  return tonumber(value)
> +end

<snipped>

> diff --git a/tools/parse_memprof/main.lua b/tools/parse_memprof/main.lua
> new file mode 100644
> index 0000000..9a161b1
> --- /dev/null
> +++ b/tools/parse_memprof/main.lua
> @@ -0,0 +1,104 @@

<snipped>

> +local bufread = require "bufread"
> +local memprof = require "parse_memprof"
> +local symtab  = require "parse_symtab"
> +local view    = require "view_plain"

Typo: Something is wrong with the whitespace above.

> +
> +local stdout, stderr = io.stdout, io.stderr
> +local _s = string

This variable looks redundant: it is used once for the lookup below.

> +local match, gmatch = _s.match, _s.gmatch

<snipped>

> +-- Parse arguments.
> +local function parseargs(args)
> +  -- Process all option arguments.
> +  args.argn = 1
> +  repeat
> +    local a = args[args.argn]
> +    if not a then break end
> +    local lopt, opt = match(a, "^%-(%-?)(.+)")
> +    if not opt then break end
> +    args.argn = args.argn + 1
> +    if lopt == "" then
> +      -- Loop through short options.
> +      for o in gmatch(opt, ".") do parseopt(o, args) end

Please make this loop multiline.

> +    else
> +      -- Long option.
> +      parseopt(opt, args)
> +    end
> +  until false
> +
> +  -- Check for proper number of arguments.
> +  local nargs = #args - args.argn + 1
> +  if nargs ~= 1 then
> +    opt_map.help()
> +  end
> +
> +  -- Translate a single input file.
> +  -- TODO: Handle multiple files?
> +  return args[args.argn]
> +end
> +
> +local inputfile = parseargs{...}
> +
> +local reader  = bufread.new(inputfile)
> +local symbols = symtab.parse(reader)
> +local events  = memprof.parse(reader, symbols)

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> diff --git a/tools/parse_memprof/parse_memprof.lua b/tools/parse_memprof/parse_memprof.lua
> new file mode 100644
> index 0000000..dc56fed
> --- /dev/null
> +++ b/tools/parse_memprof/parse_memprof.lua
> @@ -0,0 +1,195 @@

<snipped>

> +local bit    = require 'bit'
> +local band   = bit.band
> +local lshift = bit.lshift
> +
> +local string_format = string.format
> +
> +local LJM_MAGIC           = 'ljm'
> +local LJM_CURRENT_VERSION = 2
> +
> +local LJM_EPILOGUE_HEADER = 0x80
> +
> +local AEVENT_ALLOC   = 1
> +local AEVENT_FREE    = 2
> +local AEVENT_REALLOC = 3
> +
> +local ASOURCE_INT   = lshift(1, 2)
> +local ASOURCE_LFUNC = lshift(2, 2)
> +local ASOURCE_CFUNC = lshift(3, 2)

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +local function link_to_previous(heap, e, oaddr)
> +  -- memory at oaddr was allocated before we started tracking:
> +  if heap[oaddr] then
> +    e.primary[heap[oaddr][2]] = heap[oaddr][3]

Please leave a comment that the key is the id and the value is the location.

> +  end
> +end
> +
> +local function parse_location(reader, asource)

What about introducing another function for it?
| local function id_location(addr, line)
|   return string_format('f%#xl%d', addr, line), {
|     addr = addr,
|     line = line,
|   }
| end
|
| local function parse_location(reader, asource)
|   if asource == ASOURCE_INT then
|     return id_location(0, 0)
|   elseif asource == ASOURCE_CFUNC then
|     return id_location(reader:read_uleb128(), 0)
|   elseif asource == ASOURCE_LFUNC then
|     return id_location(reader:read_uleb128(), reader:read_uleb128())
|   end
|   error('Unknown asource '..asource)
| end

<snipped>

> +  local e = events[id]
> +  e.num   = e.num + 1
> +  e.alloc = e.alloc + nsize

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +  local e = events[id]
> +  e.num   = e.num + 1
> +  e.free  = e.free + osize
> +  e.alloc = e.alloc + nsize

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +  local e = events[id]
> +  e.num   = e.num + 1
> +  e.free  = e.free + osize

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +local parsers = {
> +  [AEVENT_ALLOC]   = {evname =   'alloc', parse = parse_alloc},
> +  [AEVENT_REALLOC] = {evname = 'realloc', parse = parse_realloc},
> +  [AEVENT_FREE]    = {evname =    'free', parse = parse_free},

Typo: Something is wrong with the whitespace above.

> +}

<snipped>

> +local function ev_header_is_epilogue(evh)
> +  return evh == LJM_EPILOGUE_HEADER
> +end

This function is redundant. Just use this comparison in the condition.

> +

<snipped>

> +  local magic   = reader:read_octets(3)
> +  local version = reader:read_octets(1)
> +  -- dummy-consume reserved bytes
> +  local _       = reader:read_octets(3)

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> diff --git a/tools/parse_memprof/parse_symtab.lua b/tools/parse_memprof/parse_symtab.lua
> new file mode 100644
> index 0000000..54e9337
> --- /dev/null
> +++ b/tools/parse_memprof/parse_symtab.lua
> @@ -0,0 +1,88 @@

<snipped>

> +local band          = bit.band
> +local string_format = string.format
> +
> +local LJS_MAGIC           = 'ljs'
> +local LJS_CURRENT_VERSION = 2
> +local LJS_EPILOGUE_HEADER = 0x80
> +local LJS_SYMTYPE_MASK    = 0x03

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +-- Parse a single entry in a symtab: lfunc symbol
> +local function parse_sym_lfunc(reader, symtab)
> +  local sym_addr  = reader:read_uleb128()
> +  local sym_chunk = reader:read_string()
> +  local sym_line  = reader:read_uleb128()

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +function M.parse(reader)
> +  local symtab   = {}
> +  local magic    = reader:read_octets(3)
> +  local version  = reader:read_octets(1)

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> +
> +  while not reader:eof() do
> +    local header   = reader:read_octet()
> +    local is_final = band(header, LJS_EPILOGUE_HEADER) ~= 0

Typo: Something is wrong with the whitespace above.

> +

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser
  2020-12-24 23:09   ` Igor Munkin
@ 2020-12-25  8:41     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-25  8:41 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 25.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider my comments below.
> 
> On 16.12.20, Sergey Kaplun wrote:

<snipped>

> 
> General notes for all comments in the sources below:
> * Write it above the code to be commented
> * Start it with a capital letter
> * End it with a dot

Fixed.
Side note: It seemed to me that we decided offline to transfer
the code from LuaVela with minimal changes in order to avoid conflicts in
case of patching. I am sorry for my misunderstanding.

> 
> > 
> > diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> > new file mode 100755
> > index 0000000..e3663bb
> > --- /dev/null
> > +++ b/test/misclib-memprof-lapi.test.lua
> > @@ -0,0 +1,125 @@
> 
> <snipped>
> 
> > +local path = arg[0]:gsub('/[^/]+%.test%.lua', '')
> > +local parse_suffix = '../tools/parse_memprof/?.lua;'
> > +package.path = ('%s/%s;'):format(path, parse_suffix)..package.path
> 
> This is why I hate patching package.path in sources. There is a patchset
> where I move this suite to another place. Now I see that I need to patch
> this test. OK, we can do nothing with tests for now, so please add a
> FIXME note for these lines to fix the issue via environment variable.

Added. I suppose it should be done in the scope of this issue [1].

> 
> > +
> > +local table_new = require "table.new"
> > +
> > +local bufread = require "bufread"
> > +local memprof = require "parse_memprof"
> > +local symtab  = require "parse_symtab"
> 
> Typo: Something is wrong with the whitespace above.

Fixed.
Side note: As I see from the Lua developer style guide [2], "also, you
may use alignment". So it's not forbidden. But I took a look inside the
LuaJIT <jit.*.lua> modules, for example, and there is no alignment
there. So I suppose we should use the LuaJIT internal code style here for
consistency.

P.S.: I think it's a good idea to add a luacheck-like linter and a
clang-format checker (if it is possible to avoid a huge diff) to save
the reviewer's and reviewee's time and formalize the LuaJIT coding style
guide. Thoughts?

> 
> > +
> > +local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
> > +local BAD_PATH    = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +local tap = require('tap')
> > +
> > +local test = tap.test("misc-memprof-lapi")
> > +test:plan(6)
> 
> Why is the test initialized *here*?

Good question. Moved to the top.

> 
> > +
> 
> <snipped>
> 
> > +local function check_alloc_report(alloc, line, function_line, nevents)
> > +  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
> > +  print(nevents, alloc[line].num)
> 
> Do you need this print?

No, thank you, removed.

> 
> > +  assert(alloc[line].num == nevents)
> > +  return true
> > +end
> > +
> > +-- Not a directory.
> > +local res, err = misc.memprof.start(BAD_PATH)
> > +test:ok(res == nil and err:match("Not a directory"))
> 
> This case can also check so-called "system-dependent error code", right?

Yes, I've already added the corresponding test to the branch.

> 
> > +
> 
> <snipped>
> 
> > +local reader  = bufread.new(TMP_BINFILE)
> > +local symbols = symtab.parse(reader)
> > +local events  = memprof.parse(reader, symbols)
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +-- 1 event -- alocation of table by itself + 1 allocation
> > +-- of array part as far it bigger then LJ_MAX_COLOSIZE (16).
> > +test:ok(check_alloc_report(alloc, 21, 19, 2))
> > +-- 100 strings allocations.
> > +test:ok(check_alloc_report(alloc, 26, 19, 100))
> > +
> > +-- Collect all previous allocated objects.
> > +test:ok(free.INTERNAL.num == 102)
> 
> I guess there is no need to name these magic numbers, but it's
> totally worth leaving a comment describing them.

Fixed.

> 
> > +
> > +os.exit(test:check() and 0 or 1)
> > diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
> > new file mode 100755
> > index 0000000..b9b16d7
> > --- /dev/null
> > +++ b/tools/luajit-parse-memprof
> 
> I believe it's better to implement runner to be installed to the system
> instead of the one to be used in sources. By the way, take a look on the
> way luajit.pc is installed.

Yes, as we discussed offline.

> 
> > @@ -0,0 +1,20 @@
> 
> <snipped>
> 
> > diff --git a/tools/parse_memprof/bufread.lua b/tools/parse_memprof/bufread.lua
> > new file mode 100644
> > index 0000000..d48d6e8
> > --- /dev/null
> > +++ b/tools/parse_memprof/bufread.lua
> > @@ -0,0 +1,143 @@
> 
> <snipped>
> 
> > +local ffi_C  = ffi.C
> > +local band   = bit.band
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +
> > +local function _read_stream(reader, n)
> > +  local free_size
> 
> Minor: It's better to declare this variable where it is used.

Fixed. Thanks!

> 
> > +  local tail_size = reader._end - reader._pos
> 
> <snipped>
> 
> > +function M.read_uleb128(reader)
> > +  local value = ffi.new('uint64_t', 0)
> > +  local shift = 0
> > +
> > +  repeat
> > +    local oct = M.read_octet(reader)
> > +
> > +    if oct == nil then
> > +      break
> > +    end
> 
> I guess this is an exceptional case, isn't it? If we hit this condition
> the ULEB128 value is misencoded, so it's worth raising an error.

Done.

> 
> > +
> > +    -- Alas, bit library works only with 32-bit arguments:
> > +    local oct_u64 = ffi.new('uint64_t', band(oct, 0x7f))
> 
> Please do not use magic number in code.

Fixed. Thanks!

> 
> > +    value = value + oct_u64 * (2 ^ shift)
> > +    shift = shift + 7
> > +
> > +  until band(oct, 0x80) == 0
> 
> Please do not use magic number in code.

Fixed. Thanks!
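
For reference, here is a minimal C sketch of the decoding loop discussed
above, with the 0x7f/0x80 literals named and a truncated value reported
as an error instead of being silently accepted. The names uleb128_decode
and ULEB128_* are hypothetical and are not taken from the branch; the
10-octet limit assumes 64-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ULEB128_PAYLOAD_MASK 0x7f /* Low seven bits carry data. */
#define ULEB128_CONT_BIT     0x80 /* High bit set: more octets follow. */
#define ULEB128_U64_MAXSIZE  10   /* ceil(64 / 7) octets (assumed limit). */

/*
** Decode one ULEB128 value from buf[0..len). Returns the number of
** octets consumed, or 0 if the input ends before the final octet or
** runs past the longest encoding of a 64-bit value.
*/
static size_t uleb128_decode(const uint8_t *buf, size_t len, uint64_t *out)
{
  uint64_t value = 0;
  unsigned shift = 0;
  size_t i;

  assert(buf != NULL && out != NULL);

  for (i = 0; i < len && i < ULEB128_U64_MAXSIZE; i++) {
    uint8_t oct = buf[i];
    value |= (uint64_t)(oct & ULEB128_PAYLOAD_MASK) << shift;
    shift += 7;
    if ((oct & ULEB128_CONT_BIT) == 0) {
      *out = value;
      return i + 1;
    }
  }
  return 0; /* Misencoded value: the caller should raise an error. */
}
```

Returning 0 leaves the choice of error reporting to the caller, which
roughly mirrors raising an error from read_uleb128 on the Lua side.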

> 
> > +
> > +  return tonumber(value)
> > +end
> 
> <snipped>
> 
> > diff --git a/tools/parse_memprof/main.lua b/tools/parse_memprof/main.lua
> > new file mode 100644
> > index 0000000..9a161b1
> > --- /dev/null
> > +++ b/tools/parse_memprof/main.lua
> > @@ -0,0 +1,104 @@
> 
> <snipped>
> 
> > +local bufread = require "bufread"
> > +local memprof = require "parse_memprof"
> > +local symtab  = require "parse_symtab"
> > +local view    = require "view_plain"
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> > +local stdout, stderr = io.stdout, io.stderr
> > +local _s = string
> 
> This variable looks excess: it is used once for the lookup below.

Fixed.

> 
> > +local match, gmatch = _s.match, _s.gmatch
> 
> <snipped>
> 
> > +-- Parse arguments.
> > +local function parseargs(args)
> > +  -- Process all option arguments.
> > +  args.argn = 1
> > +  repeat
> > +    local a = args[args.argn]
> > +    if not a then break end
> > +    local lopt, opt = match(a, "^%-(%-?)(.+)")
> > +    if not opt then break end
> > +    args.argn = args.argn + 1
> > +    if lopt == "" then
> > +      -- Loop through short options.
> > +      for o in gmatch(opt, ".") do parseopt(o, args) end
> 
> Please make this loop multiline.

Fixed (and if-then-end blocks too).

> 
> > +    else
> > +      -- Long option.
> > +      parseopt(opt, args)
> > +    end
> > +  until false
> > +
> > +  -- Check for proper number of arguments.
> > +  local nargs = #args - args.argn + 1
> > +  if nargs ~= 1 then
> > +    opt_map.help()
> > +  end
> > +
> > +  -- Translate a single input file.
> > +  -- TODO: Handle multiple files?
> > +  return args[args.argn]
> > +end
> > +
> > +local inputfile = parseargs{...}
> > +
> > +local reader  = bufread.new(inputfile)
> > +local symbols = symtab.parse(reader)
> > +local events  = memprof.parse(reader, symbols)
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > diff --git a/tools/parse_memprof/parse_memprof.lua b/tools/parse_memprof/parse_memprof.lua
> > new file mode 100644
> > index 0000000..dc56fed
> > --- /dev/null
> > +++ b/tools/parse_memprof/parse_memprof.lua
> > @@ -0,0 +1,195 @@
> 
> <snipped>
> 
> > +local bit    = require 'bit'
> > +local band   = bit.band
> > +local lshift = bit.lshift
> > +
> > +local string_format = string.format
> > +
> > +local LJM_MAGIC           = 'ljm'
> > +local LJM_CURRENT_VERSION = 2
> > +
> > +local LJM_EPILOGUE_HEADER = 0x80
> > +
> > +local AEVENT_ALLOC   = 1
> > +local AEVENT_FREE    = 2
> > +local AEVENT_REALLOC = 3
> > +
> > +local ASOURCE_INT   = lshift(1, 2)
> > +local ASOURCE_LFUNC = lshift(2, 2)
> > +local ASOURCE_CFUNC = lshift(3, 2)
> 
> Typo: Something is wrong with the whitespace above.

Fixed.
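
As an illustration of how the constants quoted above fit into a single
event header octet, here is a small C sketch. The AEVENT_*, ASOURCE_* and
LJM_EPILOGUE_HEADER values come from the quoted hunk; the two masks and
the ev_header_* helpers are assumptions made for this example only (the
masks are not visible in the quoted code).

```c
#include <assert.h>
#include <stdint.h>

/* Values as in the quoted parse_memprof.lua constants. */
enum { AEVENT_ALLOC = 1, AEVENT_FREE = 2, AEVENT_REALLOC = 3 };
enum { ASOURCE_INT = 1 << 2, ASOURCE_LFUNC = 2 << 2, ASOURCE_CFUNC = 3 << 2 };
#define LJM_EPILOGUE_HEADER 0x80

/* Assumed masks: event type in bits 0-1, source in bits 2-3. */
#define AEVENT_MASK  0x03u
#define ASOURCE_MASK 0x0cu

struct ev_header {
  int is_epilogue;  /* Non-zero if this octet ends the event stream. */
  unsigned aevent;  /* One of AEVENT_*. */
  unsigned asource; /* One of ASOURCE_*. */
};

/* Pack an event type and a source into one header octet. */
static uint8_t ev_header_encode(unsigned aevent, unsigned asource)
{
  assert((aevent & ~AEVENT_MASK) == 0);
  assert((asource & ~ASOURCE_MASK) == 0);
  return (uint8_t)(aevent | asource);
}

/* Split a header octet back into its fields. */
static struct ev_header ev_header_decode(uint8_t evh)
{
  struct ev_header h;
  h.is_epilogue = (evh == LJM_EPILOGUE_HEADER);
  h.aevent = evh & AEVENT_MASK;
  h.asource = evh & ASOURCE_MASK;
  return h;
}
```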

> 
> > +
> 
> <snipped>
> 
> > +local function link_to_previous(heap, e, oaddr)
> > +  -- memory at oaddr was allocated before we started tracking:
> > +  if heap[oaddr] then
> > +    e.primary[heap[oaddr][2]] = heap[oaddr][3]
> 
> Please leave the comment that the key is id and the value is location.

Done.

> 
> > +  end
> > +end
> > +
> > +local function parse_location(reader, asource)
> 
> What about introducing another function for it?
> | local function id_location(addr, line)
> |   return string_format('f%#xl%d', addr, line), {
> |     addr = addr,
> |     line = line,
> |   }
> | end
> |
> | local function parse_location(reader, asource)
> |   if asource == ASOURCE_INT then
> |     return id_location(0, 0)
> |   elseif asource == ASOURCE_CFUNC then
> |     return id_location(reader:read_uleb128(), 0)
> |   elseif asource == ASOURCE_LFUNC then
> |     return id_location(reader:read_uleb128(), reader:read_uleb128())
> |   end
> |   error('Unknown asource '..asource)
> | end

Thanks, applied.

> 
> <snipped>
> 
> > +  local e = events[id]
> > +  e.num   = e.num + 1
> > +  e.alloc = e.alloc + nsize
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +  local e = events[id]
> > +  e.num   = e.num + 1
> > +  e.free  = e.free + osize
> > +  e.alloc = e.alloc + nsize
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +  local e = events[id]
> > +  e.num   = e.num + 1
> > +  e.free  = e.free + osize
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +local parsers = {
> > +  [AEVENT_ALLOC]   = {evname =   'alloc', parse = parse_alloc},
> > +  [AEVENT_REALLOC] = {evname = 'realloc', parse = parse_realloc},
> > +  [AEVENT_FREE]    = {evname =    'free', parse = parse_free},
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +}
> 
> <snipped>
> 
> > +local function ev_header_is_epilogue(evh)
> > +  return evh == LJM_EPILOGUE_HEADER
> > +end
> 
> This function is excess. Just use this comparison in the condition.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +  local magic   = reader:read_octets(3)
> > +  local version = reader:read_octets(1)
> > +  -- dummy-consume reserved bytes
> > +  local _       = reader:read_octets(3)
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > diff --git a/tools/parse_memprof/parse_symtab.lua b/tools/parse_memprof/parse_symtab.lua
> > new file mode 100644
> > index 0000000..54e9337
> > --- /dev/null
> > +++ b/tools/parse_memprof/parse_symtab.lua
> > @@ -0,0 +1,88 @@
> 
> <snipped>
> 
> > +local band          = bit.band
> > +local string_format = string.format
> > +
> > +local LJS_MAGIC           = 'ljs'
> > +local LJS_CURRENT_VERSION = 2
> > +local LJS_EPILOGUE_HEADER = 0x80
> > +local LJS_SYMTYPE_MASK    = 0x03
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +-- Parse a single entry in a symtab: lfunc symbol
> > +local function parse_sym_lfunc(reader, symtab)
> > +  local sym_addr  = reader:read_uleb128()
> > +  local sym_chunk = reader:read_string()
> > +  local sym_line  = reader:read_uleb128()
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +function M.parse(reader)
> > +  local symtab   = {}
> > +  local magic    = reader:read_octets(3)
> > +  local version  = reader:read_octets(1)
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +
> > +  while not reader:eof() do
> > +    local header   = reader:read_octet()
> > +    local is_final = band(header, LJS_EPILOGUE_HEADER) ~= 0
> 
> Typo: Something is wrong with the whitespace above.

Fixed.

> 
> > +
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

[1]: https://github.com/tarantool/tarantool/issues/4862
[2]: https://www.tarantool.io/en/doc/2.5/dev_guide/lua_style_guide/

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module
  2020-12-24  9:36       ` Igor Munkin
@ 2020-12-25  8:45         ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-25  8:45 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

On 24.12.20, Igor Munkin wrote:
> Sergey,
> 
> On 24.12.20, Sergey Kaplun wrote:
> > Igor,
> > 
> > Thanks for the review!
> > 
> > On 21.12.20, Igor Munkin wrote:
> > > Sergey,
> > > 
> > > Thanks for the patch! Please consider minor comments below.
> > > 
> > > On 16.12.20, Sergey Kaplun wrote:
> 
> <snipped>
> 
> > > > diff --git a/src/profile/ljp_symtab.c b/src/profile/ljp_symtab.c
> > > > new file mode 100644
> > > > index 0000000..5a17c97
> > > > --- /dev/null
> > > > +++ b/src/profile/ljp_symtab.c
> > > > @@ -0,0 +1,55 @@
> > > 
> > > <snipped>
> > > 
> > > > +#define LJS_CURRENT_VERSION 2
> > > 
> > > Why is it already the second version?
> > 
> > It's inherited from LuaVela. I didn't know is it right to drop version
> > to zero. But AFAICS from your comment it is :)
> 
> Why zero? I believe the most natural way to number versions starts with
> the *first* version (zero versions are often considered as something not
> ready to be released or unstable). Anyway, this value breaks nothing and
> should be just incremented for the future symtab changes, so feel free
> to use zero, if you want.

Hmm, my bad. It's an MVP, but a stable MVP (as we suppose). So let's use the
first version.

> 
> > 
> > > 
> > > > +
> 
> <snipped>
> 
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer
  2020-12-24  9:11       ` Igor Munkin
@ 2020-12-25  8:46         ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-25  8:46 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

On 24.12.20, Igor Munkin wrote:
> Sergey,
> 
> On 24.12.20, Sergey Kaplun wrote:
> > Igor,
> > 
> > Thanks for the review!
> > 
> > On 21.12.20, Igor Munkin wrote:
> > > Sergey,
> > > 
> > > Thanks for the patch! Please, consider the comments below.
> > > 
> > > On 16.12.20, Sergey Kaplun wrote:
> 
> <snipped>
> 
> > > > diff --git a/src/utils/leb128.c b/src/utils/leb128.c
> > > > new file mode 100644
> > > > index 0000000..921e5bc
> > > > --- /dev/null
> > > > +++ b/src/utils/leb128.c
> > > > @@ -0,0 +1,124 @@
> > > > +/*
> > > > +** Working with LEB128/ULEB128 encoding.
> > > > +**
> > > > +** Major portions taken verbatim or adapted from the LuaVela.
> > > > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > > > +*/
> > > > +
> > > > +#include <stdint.h>
> > > > +#include <stddef.h>
> > > 
> > > Why do you include this again instead of using leb128.h?
> > 
> > It's enough. Definitions from <leb128.h> is redundant here.
> 
> Sorry, what definitions are *redundant* here?

You're right. Sorry. Fixed.

> 
> > 
> > > 
> > > > +
> 
> <snipped>
> 
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-24 21:20         ` Sergey Kaplun
@ 2020-12-25  9:37           ` Igor Munkin
  2020-12-25 10:13             ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Munkin @ 2020-12-25  9:37 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

On 25.12.20, Sergey Kaplun wrote:
> On 24.12.20, Sergey Ostanevich wrote:

<snipped>

> > 
> > +void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> > +                 uint8_t *mem, size_t size)
> > +{
> > +  lua_assert(size >= LEB128_U64_MAXSIZE);
> > 
> > Is it meaningful to allocate just 10bytes?
> 
> No, but at least we should check it. We need at least 10bytes buffer to
> write huge addresses leb128-encoded.

Again, this is a *general purpose write buffer*, so I agree with Sergos,
that this assert looks odd here.

> 

<snipped>

> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM


* Re: [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module
  2020-12-25  9:37           ` Igor Munkin
@ 2020-12-25 10:13             ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-25 10:13 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

On 25.12.20, Igor Munkin wrote:
> Sergey,
> 
> On 25.12.20, Sergey Kaplun wrote:
> > On 24.12.20, Sergey Ostanevich wrote:
> 
> <snipped>
> 
> > > 
> > > +void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> > > +                 uint8_t *mem, size_t size)
> > > +{
> > > +  lua_assert(size >= LEB128_U64_MAXSIZE);
> > > 
> > > Is it meaningful to allocate just 10bytes?
> > 
> > No, but at least we should check it. We need at least 10bytes buffer to
> > write huge addresses leb128-encoded.
> 
> Again, this is a *general purpose write buffer*, so I agree with Sergos,
> that this assert looks odd here.

Hmm, valid. It can be used for writing bytes only.
OK, I'll remove this assert.
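
For the record, the 10-byte figure comes from ULEB128 itself: each octet
carries 7 payload bits, so a 64-bit value needs at most ceil(64 / 7) = 10
octets. A minimal C sketch that shows this is below. LEB128_U64_MAXSIZE
is the name used in the patch; uleb128_encode is a hypothetical helper
written for this example and is not the writer from leb128.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEB128_U64_MAXSIZE 10 /* ceil(64 / 7) octets. */

/*
** Encode value as ULEB128 into buf, which must have room for
** LEB128_U64_MAXSIZE octets. Returns the number of octets written.
*/
static size_t uleb128_encode(uint8_t *buf, uint64_t value)
{
  size_t n = 0;

  assert(buf != NULL);

  do {
    uint8_t oct = (uint8_t)(value & 0x7f); /* Next 7 payload bits. */
    value >>= 7;
    if (value != 0)
      oct |= 0x80; /* Continuation bit: more octets follow. */
    buf[n++] = oct;
  } while (value != 0);

  return n;
}
```

Only a caller that writes LEB128-encoded values has to guarantee this
much room, which fits the point that the check does not belong in a
general-purpose buffer initializer.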

> 
> > 
> 
> <snipped>
> 
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun


* Re: [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates
  2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun
@ 2020-12-25 11:07   ` Sergey Ostanevich
  2020-12-25 11:23     ` Sergey Kaplun
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Ostanevich @ 2020-12-25 11:07 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch! 
Some comments below.

<...>
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 80753e0..d4d3a1d 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> @@ -140,7 +140,7 @@
> |//-----------------------------------------------------------------------
> |.else			// x64/POSIX stack layout
> |
> -|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
> +|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
> |.macro saveregs_
> |  push rbx; push r15; push r14
> |.if NO_UNWIND
> @@ -161,26 +161,29 @@
> |
> |//----- 16 byte aligned,
> |.if NO_UNWIND
> -|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*10]
> -|.define SAVE_R3,	aword [rsp+aword*9]
> -|.define SAVE_R2,	aword [rsp+aword*8]
> -|.define SAVE_R1,	aword [rsp+aword*7]
> -|.define SAVE_RU2,	aword [rsp+aword*6]
> -|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.

Why did you change every `aword` (which, AFAIU, represents an address) into a `qword`?

> +|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*12]
> +|.define SAVE_R3,	qword [rsp+qword*11]
> +|.define SAVE_R2,	qword [rsp+qword*10]
> +|.define SAVE_R1,	qword [rsp+qword*9]
> +|.define SAVE_RU2,	qword [rsp+qword*8]
> +|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.else
> -|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*8]
> -|.define SAVE_R3,	aword [rsp+aword*7]
> -|.define SAVE_R2,	aword [rsp+aword*6]
> -|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> +|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*10]
> +|.define SAVE_R3,	qword [rsp+qword*9]
> +|.define SAVE_R2,	qword [rsp+qword*8]
> +|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.endif
> -|.define SAVE_CFRAME,	aword [rsp+aword*4]
> -|.define SAVE_PC,	aword [rsp+aword*3]
> -|.define SAVE_L,	aword [rsp+aword*2]
> +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> +|.define SAVE_UNUSED2,	qword [rsp+qword*5]

The naming is quite boggling: to save something unused? Why?

> +|.define SAVE_UNUSED1,	dword [rsp+dword*8]
> +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> +|.define SAVE_PC,	qword [rsp+qword*3]
> +|.define SAVE_L,	qword [rsp+qword*2]
> |.define SAVE_ERRF,	dword [rsp+dword*3]
> |.define SAVE_NRES,	dword [rsp+dword*2]
> -|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
> +|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
> |//----- 16 byte aligned
> |
> |.define TMP1d,		dword [rsp]
> @@ -342,6 +345,20 @@
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> |.endmacro
> |

Can you define empty versions of the macros for WIN, so that
later uses do not need to be wrapped in .if/.endif?

> +|.if not WIN
> +|// Save vmstate through register.
> +|.macro save_vmstate_through, reg
> +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> +|  mov SAVE_VMSTATE, reg
> +|.endmacro
> +|
> +|// Restore vmstate through register.
> +|.macro restore_vmstate_through, reg
> +|  mov reg, SAVE_VMSTATE
> +|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
> +|.endmacro
> +|.endif // WIN
> +|
> |.macro fpop1; fstp st1; .endmacro


* Re: [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates
  2020-12-25 11:07   ` Sergey Ostanevich
@ 2020-12-25 11:23     ` Sergey Kaplun
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Kaplun @ 2020-12-25 11:23 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi, Sergos!

Thanks for the review!

On 25.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch! 
> Some comments below.
> 
> <...>
> > diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> > index 80753e0..d4d3a1d 100644
> > --- a/src/vm_x64.dasc
> > +++ b/src/vm_x64.dasc
> > @@ -140,7 +140,7 @@
> > |//-----------------------------------------------------------------------
> > |.else			// x64/POSIX stack layout
> > |
> > -|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
> > +|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
> > |.macro saveregs_
> > |  push rbx; push r15; push r14
> > |.if NO_UNWIND
> > @@ -161,26 +161,29 @@
> > |
> > |//----- 16 byte aligned,
> > |.if NO_UNWIND
> > -|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
> > -|.define SAVE_R4,	aword [rsp+aword*10]
> > -|.define SAVE_R3,	aword [rsp+aword*9]
> > -|.define SAVE_R2,	aword [rsp+aword*8]
> > -|.define SAVE_R1,	aword [rsp+aword*7]
> > -|.define SAVE_RU2,	aword [rsp+aword*6]
> > -|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> 
> Why did you change every `aword` (which, AFAIU, represents an address) into a `qword`?

According to the unofficial DynASM documentation [1], `aword` is 4 bytes
on x86 and 8 bytes on x64. Each of these `aword` usages is already inside
an arch-specific (x86 or x64) section, so `aword` is misleading here: it
can only have one size in each place. I was already changing this code
chunk, so I thought there was nothing wrong with rewriting it in a
clearer way. I can drop these changes if you and Igor insist.

> 
> > +|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
> > +|.define SAVE_R4,	qword [rsp+qword*12]
> > +|.define SAVE_R3,	qword [rsp+qword*11]
> > +|.define SAVE_R2,	qword [rsp+qword*10]
> > +|.define SAVE_R1,	qword [rsp+qword*9]
> > +|.define SAVE_RU2,	qword [rsp+qword*8]
> > +|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> > |.else
> > -|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
> > -|.define SAVE_R4,	aword [rsp+aword*8]
> > -|.define SAVE_R3,	aword [rsp+aword*7]
> > -|.define SAVE_R2,	aword [rsp+aword*6]
> > -|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> > +|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
> > +|.define SAVE_R4,	qword [rsp+qword*10]
> > +|.define SAVE_R3,	qword [rsp+qword*9]
> > +|.define SAVE_R2,	qword [rsp+qword*8]
> > +|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> > |.endif
> > -|.define SAVE_CFRAME,	aword [rsp+aword*4]
> > -|.define SAVE_PC,	aword [rsp+aword*3]
> > -|.define SAVE_L,	aword [rsp+aword*2]
> > +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> > +|.define SAVE_UNUSED2,	qword [rsp+qword*5]
> 
> The naming is quite boggling: to save something unused? Why?

Will rewrite it as UNUSED only, thanks.

> 
> > +|.define SAVE_UNUSED1,	dword [rsp+dword*8]
> > +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> > +|.define SAVE_PC,	qword [rsp+qword*3]
> > +|.define SAVE_L,	qword [rsp+qword*2]
> > |.define SAVE_ERRF,	dword [rsp+dword*3]
> > |.define SAVE_NRES,	dword [rsp+dword*2]
> > -|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
> > +|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
> > |//----- 16 byte aligned
> > |
> > |.define TMP1d,		dword [rsp]
> > @@ -342,6 +345,20 @@
> > |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> > |.endmacro
> > |
> 
> Can you define empty versions of the macros for WIN, so that
> later uses do not need to be wrapped in .if/.endif?

Good idea. Thank you!

> 
> > +|.if not WIN
> > +|// Save vmstate through register.
> > +|.macro save_vmstate_through, reg
> > +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> > +|  mov SAVE_VMSTATE, reg
> > +|.endmacro
> > +|
> > +|// Restore vmstate through register.
> > +|.macro restore_vmstate_through, reg
> > +|  mov reg, SAVE_VMSTATE
> > +|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
> > +|.endmacro
> > +|.endif // WIN
> > +|
> > |.macro fpop1; fstp st1; .endmacro
> 

[1]: https://corsix.github.io/dynasm-doc/instructions.html#memory

-- 
Best regards,
Sergey Kaplun


end of thread, other threads:[~2020-12-25 11:24 UTC | newest]

Thread overview: 42+ messages
2020-12-16 19:13 [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 01/11] build: add src dir in building Sergey Kaplun
2020-12-20 21:27   ` Igor Munkin
2020-12-23 18:20     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 02/11] utils: introduce leb128 reader and writer Sergey Kaplun
2020-12-20 22:44   ` Igor Munkin
2020-12-23 22:34     ` Sergey Kaplun
2020-12-24  9:11       ` Igor Munkin
2020-12-25  8:46         ` Sergey Kaplun
2020-12-23 16:50   ` Sergey Ostanevich
2020-12-23 22:36     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 03/11] profile: introduce profiler writing module Sergey Kaplun
2020-12-21  9:24   ` Igor Munkin
2020-12-24  6:46     ` Sergey Kaplun
2020-12-24 15:45       ` Sergey Ostanevich
2020-12-24 21:20         ` Sergey Kaplun
2020-12-25  9:37           ` Igor Munkin
2020-12-25 10:13             ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 04/11] profile: introduce symtab write module Sergey Kaplun
2020-12-21 10:30   ` Igor Munkin
2020-12-24  7:00     ` Sergey Kaplun
2020-12-24  9:36       ` Igor Munkin
2020-12-25  8:45         ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 05/11] vm: introduce LFUNC and FFUNC vmstates Sergey Kaplun
2020-12-25 11:07   ` Sergey Ostanevich
2020-12-25 11:23     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 06/11] core: introduce new mem_L field Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 07/11] debug: move debug_frameline to public module API Sergey Kaplun
2020-12-20 22:46   ` Igor Munkin
2020-12-24  6:50     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 08/11] profile: introduce memory profiler Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 09/11] misc: add Lua API for " Sergey Kaplun
2020-12-24 16:32   ` Sergey Ostanevich
2020-12-24 21:25     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 10/11] tools: introduce tools directory Sergey Kaplun
2020-12-20 22:46   ` Igor Munkin
2020-12-24  6:47     ` Sergey Kaplun
2020-12-16 19:13 ` [Tarantool-patches] [PATCH luajit v1 11/11] profile: introduce profile parser Sergey Kaplun
2020-12-24 23:09   ` Igor Munkin
2020-12-25  8:41     ` Sergey Kaplun
2020-12-21 10:43 ` [Tarantool-patches] [PATCH luajit v1 00/11] LuaJIT memory profiler Igor Munkin
2020-12-24  7:02   ` Sergey Kaplun
