Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler
@ 2020-12-25 15:26 Sergey Kaplun
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer Sergey Kaplun
                   ` (11 more replies)
  0 siblings, 12 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches


This patch provides a Lua interface for memory profiler in LuaJIT
and the corresponding parser of profiled data.

Global changes in v2:
  - Moved symtab to memprof module.
  - Added LUA_CORE and `module_name`_c defines
  - Added LJ_FASTCALL in wbuf and leb128 modules.
  - Added translation units to amalg build.
  - Code style fixes and commit message fixes.
  - Added (gh-5490) to ChangeLog.

Issues: https://github.com/tarantool/tarantool/issues/5442
        https://github.com/tarantool/tarantool/issues/5490

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler

CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/234430645

RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000147.html

@ChangeLog:
 - Introduce LuaJIT memory profiler (gh-5442).
 - Introduce LuaJIT memory profiler parser (gh-5490).

Sergey Kaplun (7):
  utils: introduce leb128 reader and writer
  core: introduce write buffer module
  vm: introduce VM states for Lua and fast functions
  core: introduce new mem_L field
  core: introduce memory profiler
  misc: add Lua API for memory profiler
  tools: introduce a memory profile parser

 Makefile                           |  39 ++-
 src/Makefile                       |  13 +-
 src/Makefile.dep                   |  44 +--
 src/lib_misc.c                     | 167 +++++++++++
 src/lj_arch.h                      |  22 ++
 src/lj_debug.c                     |   8 +-
 src/lj_debug.h                     |   3 +
 src/lj_errmsg.h                    |   6 +
 src/lj_frame.h                     |  18 +-
 src/lj_gc.c                        |   2 +
 src/lj_memprof.c                   | 430 +++++++++++++++++++++++++++++
 src/lj_memprof.h                   | 165 +++++++++++
 src/lj_obj.h                       |  13 +-
 src/lj_profile.c                   |   5 +-
 src/lj_state.c                     |   8 +
 src/lj_utils.h                     |  58 ++++
 src/lj_utils_leb128.c              | 132 +++++++++
 src/lj_wbuf.c                      | 141 ++++++++++
 src/lj_wbuf.h                      |  87 ++++++
 src/ljamalg.c                      |   3 +
 src/luajit-gdb.py                  |  14 +-
 src/vm_arm.dasc                    |   6 +-
 src/vm_arm64.dasc                  |   6 +-
 src/vm_mips.dasc                   |   6 +-
 src/vm_mips64.dasc                 |   6 +-
 src/vm_ppc.dasc                    |   6 +-
 src/vm_x64.dasc                    |  93 +++++--
 src/vm_x86.dasc                    | 131 ++++++---
 test/misclib-memprof-lapi.test.lua | 135 +++++++++
 tools/luajit-parse-memprof         |   9 +
 tools/memprof.lua                  | 109 ++++++++
 tools/memprof/humanize.lua         |  45 +++
 tools/memprof/parse.lua            | 188 +++++++++++++
 tools/utils/bufread.lua            | 147 ++++++++++
 tools/utils/symtab.lua             |  89 ++++++
 35 files changed, 2217 insertions(+), 137 deletions(-)
 create mode 100644 src/lj_memprof.c
 create mode 100644 src/lj_memprof.h
 create mode 100644 src/lj_utils.h
 create mode 100644 src/lj_utils_leb128.c
 create mode 100644 src/lj_wbuf.c
 create mode 100644 src/lj_wbuf.h
 create mode 100755 test/misclib-memprof-lapi.test.lua
 create mode 100755 tools/luajit-parse-memprof
 create mode 100644 tools/memprof.lua
 create mode 100644 tools/memprof/humanize.lua
 create mode 100644 tools/memprof/parse.lua
 create mode 100644 tools/utils/bufread.lua
 create mode 100644 tools/utils/symtab.lua

-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-25 21:42   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module Sergey Kaplun
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

Most of the numeric data written by the memory profiler is encoded
via LEB128 compression. This patch introduces the module for encoding
and decoding 64bit number to LEB128 form.

Part of tarantool/tarantool#5442
---

Changes in v2:
  - Removed reader funciton's parameter named guard.
  - Code style fixes.

 src/Makefile          |   3 +-
 src/Makefile.dep      |   7 ++-
 src/lj_utils.h        |  58 +++++++++++++++++++
 src/lj_utils_leb128.c | 132 ++++++++++++++++++++++++++++++++++++++++++
 src/ljamalg.c         |   1 +
 5 files changed, 197 insertions(+), 4 deletions(-)
 create mode 100644 src/lj_utils.h
 create mode 100644 src/lj_utils_leb128.c

diff --git a/src/Makefile b/src/Makefile
index 2786348..dc2ddb6 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -466,6 +466,7 @@ endif
 DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
+UTILS_O= lj_utils_leb128.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
@@ -495,7 +496,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o \
 	  lj_asm.o lj_trace.o lj_gdbjit.o \
 	  lj_ctype.o lj_cdata.o lj_cconv.o lj_ccall.o lj_ccallback.o \
 	  lj_carith.o lj_clib.o lj_cparse.o \
-	  lj_lib.o lj_alloc.o lib_aux.o \
+	  lj_lib.o lj_alloc.o $(UTILS_O) lib_aux.o \
 	  $(LJLIB_O) lib_init.o
 
 LJVMCORE_O= $(LJVM_O) $(LJCORE_O)
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 556314e..75409bf 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -205,6 +205,7 @@ lj_trace.o: lj_trace.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_vm.h lj_vmevent.h lj_target.h lj_target_*.h
 lj_udata.o: lj_udata.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_udata.h
+lj_utils_leb128.o: lj_utils_leb128.c lj_utils.h lj_def.h lua.h luaconf.h
 lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_str.h lj_tab.h lj_state.h lj_dispatch.h lj_bc.h lj_jit.h lj_ir.h \
  lj_vm.h lj_vmevent.h
@@ -229,9 +230,9 @@ ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
  lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c lj_record.h lj_ffrecord.h \
  lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h lj_asm.c lj_asm.h \
  lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c lj_alloc.c \
- lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c lib_table.c \
- lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
- lib_ffi.c lib_misc.c lib_init.c
+ lj_utils_leb128.c lj_utils.h lib_aux.c lib_base.c lj_libdef.h lib_math.c \
+ lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
+ lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
 luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
 host/buildvm.o: host/buildvm.c host/buildvm.h lj_def.h lua.h luaconf.h \
  lj_arch.h lj_obj.h lj_def.h lj_arch.h lj_gc.h lj_obj.h lj_bc.h lj_ir.h \
diff --git a/src/lj_utils.h b/src/lj_utils.h
new file mode 100644
index 0000000..1671e8e
--- /dev/null
+++ b/src/lj_utils.h
@@ -0,0 +1,58 @@
+/*
+** Interfaces for working with LEB128/ULEB128 encoding.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJ_UTILS_H
+#define _LJ_UTILS_H
+
+#include "lj_def.h"
+
+/* Maximum number of bytes needed for LEB128 encoding of any 64-bit value. */
+#define LEB128_U64_MAXSIZE 10
+
+/*
+** Reads a value from a buffer of bytes to a int64_t output.
+** No bounds checks for the buffer. Returns number of bytes read.
+*/
+size_t LJ_FASTCALL lj_utils_read_leb128(int64_t *out, const uint8_t *buffer);
+
+/*
+** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
+** than n bytes. No bounds checks for the buffer. Returns number of bytes
+** read. If more than n bytes is about to be consumed, returns 0 without
+** touching out.
+*/
+size_t LJ_FASTCALL lj_utils_read_leb128_n(int64_t *out, const uint8_t *buffer,
+					  size_t n);
+
+/*
+** Reads a value from a buffer of bytes to a uint64_t output.
+** No bounds checks for the buffer. Returns number of bytes read.
+*/
+size_t LJ_FASTCALL lj_utils_read_uleb128(uint64_t *out, const uint8_t *buffer);
+
+/*
+** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
+** than n bytes. No bounds checks for the buffer. Returns number of bytes
+** read. If more than n bytes is about to be consumed, returns 0 without
+** touching out.
+*/
+size_t LJ_FASTCALL lj_utils_read_uleb128_n(uint64_t *out, const uint8_t *buffer,
+					   size_t n);
+
+/*
+** Writes a value from an signed 64-bit input to a buffer of bytes.
+** No bounds checks for the buffer. Returns number of bytes written.
+*/
+size_t LJ_FASTCALL lj_utils_write_leb128(uint8_t *buffer, int64_t value);
+
+/*
+** Writes a value from an unsigned 64-bit input to a buffer of bytes.
+** No bounds checks for the buffer. Returns number of bytes written.
+*/
+size_t LJ_FASTCALL lj_utils_write_uleb128(uint8_t *buffer, uint64_t value);
+
+#endif
diff --git a/src/lj_utils_leb128.c b/src/lj_utils_leb128.c
new file mode 100644
index 0000000..ce8081b
--- /dev/null
+++ b/src/lj_utils_leb128.c
@@ -0,0 +1,132 @@
+/*
+** Working with LEB128/ULEB128 encoding.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#define lj_utils_leb128_c
+#define LUA_CORE
+
+#include "lj_utils.h"
+
+#define LINK_BIT               (0x80)
+#define MIN_TWOBYTE_VALUE      (0x80)
+#define PAYLOAD_MASK           (0x7f)
+#define SHIFT_STEP             (7)
+#define LEB_SIGN_BIT           (0x40)
+
+/* ------------------------- Reading LEB128/ULEB128 ------------------------- */
+
+/*
+** XXX: For each LEB128 type (signed/unsigned) we have two versions of read
+** functions: The one consuming unlimited number of input octets and the one
+** consuming not more than given number of input octets. Currently reading
+** is not used in performance critical places, so these two functions are
+** implemented via single low-level function + run-time mode check. Feel free
+** to change if this becomes a bottleneck.
+*/
+
+static LJ_AINLINE size_t _read_leb128(int64_t *out, const uint8_t *buffer,
+				      size_t n)
+{
+  size_t i = 0;
+  uint64_t shift = 0;
+  int64_t value = 0;
+  uint8_t octet;
+
+  for(;;) {
+    if (n != 0 && i + 1 > n)
+      return 0;
+    octet = buffer[i++];
+    value |= ((int64_t)(octet & PAYLOAD_MASK)) << shift;
+    shift += SHIFT_STEP;
+    if (!(octet & LINK_BIT))
+      break;
+  }
+
+  if (octet & LEB_SIGN_BIT && shift < sizeof(int64_t) * 8)
+    value |= -(1 << shift);
+
+  *out = value;
+  return i;
+}
+
+size_t LJ_FASTCALL lj_utils_read_leb128(int64_t *out, const uint8_t *buffer)
+{
+  return _read_leb128(out, buffer, 0);
+}
+
+size_t LJ_FASTCALL lj_utils_read_leb128_n(int64_t *out, const uint8_t *buffer,
+					  size_t n)
+{
+  return _read_leb128(out, buffer, n);
+}
+
+
+static LJ_AINLINE size_t _read_uleb128(uint64_t *out, const uint8_t *buffer,
+				       size_t n)
+{
+  size_t i = 0;
+  uint64_t value = 0;
+  uint64_t shift = 0;
+  uint8_t octet;
+
+  for(;;) {
+    if (n != 0 && i + 1 > n)
+      return 0;
+    octet = buffer[i++];
+    value |= ((uint64_t)(octet & PAYLOAD_MASK)) << shift;
+    shift += SHIFT_STEP;
+    if (!(octet & LINK_BIT))
+      break;
+  }
+
+  *out = value;
+  return i;
+}
+
+size_t LJ_FASTCALL lj_utils_read_uleb128(uint64_t *out, const uint8_t *buffer)
+{
+  return _read_uleb128(out, buffer, 0);
+}
+
+size_t LJ_FASTCALL lj_utils_read_uleb128_n(uint64_t *out, const uint8_t *buffer,
+					   size_t n)
+{
+  return _read_uleb128(out, buffer, n);
+}
+
+/* ------------------------- Writing LEB128/ULEB128 ------------------------- */
+
+size_t LJ_FASTCALL lj_utils_write_leb128(uint8_t *buffer, int64_t value)
+{
+  size_t i = 0;
+
+  /* LEB_SIGN_BIT propagation to check the remaining value. */
+  while ((uint64_t)(value + LEB_SIGN_BIT) >= MIN_TWOBYTE_VALUE) {
+    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
+    value >>= SHIFT_STEP;
+  }
+
+  /* Omit LINK_BIT in case of overflow. */
+  buffer[i++] = (uint8_t)(value & PAYLOAD_MASK);
+
+  lua_assert(i <= LEB128_U64_MAXSIZE);
+
+  return i;
+}
+
+size_t LJ_FASTCALL lj_utils_write_uleb128(uint8_t *buffer, uint64_t value)
+{
+  size_t i = 0;
+
+  for (; value >= MIN_TWOBYTE_VALUE; value >>= SHIFT_STEP)
+    buffer[i++] = (uint8_t)((value & PAYLOAD_MASK) | LINK_BIT);
+
+  buffer[i++] = (uint8_t)value;
+
+  lua_assert(i <= LEB128_U64_MAXSIZE);
+
+  return i;
+}
diff --git a/src/ljamalg.c b/src/ljamalg.c
index 371bbb6..268b321 100644
--- a/src/ljamalg.c
+++ b/src/ljamalg.c
@@ -81,6 +81,7 @@
 #include "lj_trace.c"
 #include "lj_gdbjit.c"
 #include "lj_alloc.c"
+#include "lj_utils_leb128.c"
 
 #include "lib_aux.c"
 #include "lib_base.c"
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-26 14:22   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces the standalone module for writing data to the
file, socket or memory (and so on) via the special buffer.
The module provides the API for buffer initial setup
and its convenient usage.

Part of tarantool/tarantool#5442
---

Changes in v2:
  - Removed custom memcpy.
  - lj_wbuf_addn() fills buffer to the end first and then flushes.
  - Changed assert in lj_wbuf_flush() to early return.

 src/Makefile     |   2 +-
 src/Makefile.dep |  23 ++++----
 src/lj_wbuf.c    | 141 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_wbuf.h    |  87 +++++++++++++++++++++++++++++
 src/ljamalg.c    |   1 +
 5 files changed, 242 insertions(+), 12 deletions(-)
 create mode 100644 src/lj_wbuf.c
 create mode 100644 src/lj_wbuf.h

diff --git a/src/Makefile b/src/Makefile
index dc2ddb6..384b590 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -485,7 +485,7 @@ LJLIB_O= lib_base.o lib_math.o lib_bit.o lib_string.o lib_table.o \
 	 lib_misc.o
 LJLIB_C= $(LJLIB_O:.o=.c)
 
-LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o \
+LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o lj_wbuf.o \
 	  lj_str.o lj_tab.o lj_func.o lj_udata.o lj_meta.o lj_debug.o \
 	  lj_state.o lj_dispatch.o lj_vmevent.o lj_vmmath.o lj_strscan.o \
 	  lj_strfmt.o lj_strfmt_num.o lj_api.o lj_mapi.o lj_profile.o \
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 75409bf..59ed450 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -211,28 +211,29 @@ lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_vm.h lj_vmevent.h
 lj_vmmath.o: lj_vmmath.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_ir.h lj_vm.h
+lj_wbuf.o: lj_wbuf.c lj_wbuf.h lj_def.h lua.h luaconf.h lj_utils.h
 ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h \
  lj_func.h lj_udata.h lj_meta.h lj_state.h lj_frame.h lj_bc.h lj_ctype.h \
  lj_cdata.h lj_trace.h lj_jit.h lj_ir.h lj_dispatch.h lj_traceerr.h \
  lj_vm.h lj_err.c lj_debug.h lj_ff.h lj_ffdef.h lj_strfmt.h lj_char.c \
- lj_char.h lj_bc.c lj_bcdef.h lj_obj.c lj_buf.c lj_str.c lj_tab.c \
- lj_func.c lj_udata.c lj_meta.c lj_strscan.h lj_lib.h lj_debug.c \
- lj_state.c lj_lex.h lj_alloc.h luajit.h lj_dispatch.c lj_ccallback.h \
- lj_profile.h lj_vmevent.c lj_vmevent.h lj_vmmath.c lj_strscan.c \
- lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c lmisclib.h lj_profile.c \
- lj_lex.c lualib.h lj_parse.h lj_parse.c lj_bcread.c lj_bcdump.h lj_bcwrite.c \
- lj_load.c lj_ctype.c lj_cdata.c lj_cconv.h lj_cconv.c lj_ccall.c lj_ccall.h \
- lj_ccallback.c lj_target.h lj_target_*.h lj_mcode.h lj_carith.c \
+ lj_char.h lj_bc.c lj_bcdef.h lj_obj.c lj_buf.c lj_wbuf.c lj_wbuf.h lj_utils.h \
+ lj_str.c lj_tab.c lj_func.c lj_udata.c lj_meta.c lj_strscan.h lj_lib.h \
+ lj_debug.c lj_state.c lj_lex.h lj_alloc.h luajit.h lj_dispatch.c \
+ lj_ccallback.h lj_profile.h lj_vmevent.c lj_vmevent.h lj_vmmath.c \
+ lj_strscan.c lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c lmisclib.h \
+ lj_profile.c lj_lex.c lualib.h lj_parse.h lj_parse.c lj_bcread.c lj_bcdump.h \
+ lj_bcwrite.c lj_load.c lj_ctype.c lj_cdata.c lj_cconv.h lj_cconv.c lj_ccall.c \
+ lj_ccall.h lj_ccallback.c lj_target.h lj_target_*.h lj_mcode.h lj_carith.c \
  lj_carith.h lj_clib.c lj_clib.h lj_cparse.c lj_cparse.h lj_lib.c lj_ir.c \
  lj_ircall.h lj_iropt.h lj_opt_mem.c lj_opt_fold.c lj_folddef.h \
  lj_opt_narrow.c lj_opt_dce.c lj_opt_loop.c lj_snap.h lj_opt_split.c \
  lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c lj_record.h lj_ffrecord.h \
  lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h lj_asm.c lj_asm.h \
  lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c lj_alloc.c \
- lj_utils_leb128.c lj_utils.h lib_aux.c lib_base.c lj_libdef.h lib_math.c \
- lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
- lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
+ lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c \
+ lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
+ lib_ffi.c lib_misc.c lib_init.c
 luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
 host/buildvm.o: host/buildvm.c host/buildvm.h lj_def.h lua.h luaconf.h \
  lj_arch.h lj_obj.h lj_def.h lj_arch.h lj_gc.h lj_obj.h lj_bc.h lj_ir.h \
diff --git a/src/lj_wbuf.c b/src/lj_wbuf.c
new file mode 100644
index 0000000..8f090eb
--- /dev/null
+++ b/src/lj_wbuf.c
@@ -0,0 +1,141 @@
+/*
+** Low-level writer for LuaJIT.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#define lj_wbuf_c
+#define LUA_CORE
+
+#include <errno.h>
+
+#include "lj_wbuf.h"
+#include "lj_utils.h"
+
+static LJ_AINLINE void wbuf_set_flag(struct lj_wbuf *buf, uint8_t flag)
+{
+  buf->flags |= flag;
+}
+
+static LJ_AINLINE void wbuf_save_errno(struct lj_wbuf *buf)
+{
+  buf->saved_errno = errno;
+}
+
+static LJ_AINLINE size_t wbuf_len(const struct lj_wbuf *buf)
+{
+  return (size_t)(buf->pos - buf->buf);
+}
+
+static LJ_AINLINE size_t wbuf_left(const struct lj_wbuf *buf)
+{
+  return buf->size - wbuf_len(buf);
+}
+
+void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer,
+		  void *ctx, uint8_t *mem, size_t size)
+{
+  buf->ctx = ctx;
+  buf->writer = writer;
+  buf->buf = mem;
+  buf->pos = mem;
+  buf->size = size;
+  buf->flags = 0;
+  buf->saved_errno = 0;
+}
+
+void LJ_FASTCALL lj_wbuf_terminate(struct lj_wbuf *buf)
+{
+  lj_wbuf_init(buf, NULL, NULL, NULL, 0);
+}
+
+static LJ_AINLINE void wbuf_reserve(struct lj_wbuf *buf, size_t n)
+{
+  lua_assert(n <= buf->size);
+  if (LJ_UNLIKELY(wbuf_left(buf) < n))
+    lj_wbuf_flush(buf);
+}
+
+/* Writes a byte to the output buffer. */
+void LJ_FASTCALL lj_wbuf_addbyte(struct lj_wbuf *buf, uint8_t b)
+{
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
+    return;
+  wbuf_reserve(buf, sizeof(b));
+  *buf->pos++ = b;
+}
+
+/* Writes an unsigned integer which is at most 64 bits long to the output. */
+void LJ_FASTCALL lj_wbuf_addu64(struct lj_wbuf *buf, uint64_t n)
+{
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
+    return;
+  wbuf_reserve(buf, LEB128_U64_MAXSIZE);
+  buf->pos += (ptrdiff_t)lj_utils_write_uleb128(buf->pos, n);
+}
+
+/* Writes n bytes from an arbitrary buffer src to the buffer. */
+void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n)
+{
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
+    return;
+  /*
+  ** Very unlikely: We are told to write a large buffer at once.
+  ** Buffer not belong to us so we must to pump data
+  ** through buffer.
+  */
+  while (LJ_UNLIKELY(n > buf->size)) {
+    const size_t left = wbuf_left(buf);
+    memcpy(buf->pos, src, left);
+    buf->pos += (ptrdiff_t)left;
+    lj_wbuf_flush(buf);
+    src += (ptrdiff_t)left;
+    n -= left;
+  }
+
+  wbuf_reserve(buf, n);
+  memcpy(buf->pos, src, n);
+  buf->pos += (ptrdiff_t)n;
+}
+
+/* Writes a \0-terminated C string to the output buffer. */
+void LJ_FASTCALL lj_wbuf_addstring(struct lj_wbuf *buf, const char *s)
+{
+  const size_t l = strlen(s);
+
+  /* Check that profiling is still active is made in the callee's scope. */
+  lj_wbuf_addu64(buf, (uint64_t)l);
+  lj_wbuf_addn(buf, s, l);
+}
+
+void LJ_FASTCALL lj_wbuf_flush(struct lj_wbuf *buf)
+{
+  const size_t len = wbuf_len(buf);
+  size_t written;
+
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(buf, STREAM_STOP)))
+    return;
+
+  written = buf->writer((const void **)&buf->buf, len, buf->ctx);
+
+  if (LJ_UNLIKELY(written < len)) {
+    wbuf_set_flag(buf, STREAM_ERR_IO);
+    wbuf_save_errno(buf);
+  }
+  if (LJ_UNLIKELY(buf->buf == NULL)) {
+    wbuf_set_flag(buf, STREAM_STOP);
+    wbuf_save_errno(buf);
+  }
+  buf->pos = buf->buf;
+}
+
+int LJ_FASTCALL lj_wbuf_test_flag(const struct lj_wbuf *buf, uint8_t flag)
+{
+  return buf->flags & flag;
+}
+
+int LJ_FASTCALL lj_wbuf_errno(const struct lj_wbuf *buf)
+{
+  return buf->saved_errno;
+}
diff --git a/src/lj_wbuf.h b/src/lj_wbuf.h
new file mode 100644
index 0000000..58a109e
--- /dev/null
+++ b/src/lj_wbuf.h
@@ -0,0 +1,87 @@
+/*
+** Low-level event streaming for LuaJIT Profiler.
+**
+** XXX: Please note that all events may not be streamed inside a signal handler
+** due to using default memcpy from glibc as not async-signal-safe function.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJ_WBUF_H
+#define _LJ_WBUF_H
+
+#include "lj_def.h"
+
+/*
+** Data format for strings:
+**
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** Note.
+** For strings shorter than 128 bytes (most likely scenario in our case)
+** we write the same amount of data (1-byte ULEB128 + actual payload) as we
+** would have written with straightforward serialization (actual payload + \0),
+** but make parsing easier.
+*/
+
+/* Stream errors. */
+#define STREAM_ERR_IO 0x1
+#define STREAM_STOP   0x2
+
+typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
+
+/* Write buffer. */
+struct lj_wbuf {
+  /*
+  ** Buffer writer which will called at buffer write.
+  ** Should return amount of written bytes on success or zero in case of error.
+  ** *data should contain new buffer of size greater or equal to len.
+  ** If *data == NULL stream stops.
+  */
+  lj_wbuf_writer writer;
+  /* Context for writer function. */
+  void *ctx;
+  /* Buffer size. */
+  size_t size;
+  /* Saved errno in case of error. */
+  int saved_errno;
+  /* Start of buffer. */
+  uint8_t *buf;
+  /* Current position in buffer. */
+  uint8_t *pos;
+  /* Internal flags. */
+  volatile uint8_t flags;
+};
+
+/* Init buffer. */
+void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
+		  uint8_t *mem, size_t size);
+
+/* Set pointers to NULL and reset flags and errno. */
+void LJ_FASTCALL lj_wbuf_terminate(struct lj_wbuf *buf);
+
+/* Write single byte to the buffer. */
+void LJ_FASTCALL lj_wbuf_addbyte(struct lj_wbuf *buf, uint8_t b);
+
+/* Write uint64_t in uleb128 format to the buffer. */
+void LJ_FASTCALL lj_wbuf_addu64(struct lj_wbuf *buf, uint64_t n);
+
+/* Writes n bytes from an arbitrary buffer src to the buffer. */
+void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n);
+
+/* Write string to the buffer. */
+void LJ_FASTCALL lj_wbuf_addstring(struct lj_wbuf *buf, const char *s);
+
+/* Immediatly flush the buffer. */
+void LJ_FASTCALL lj_wbuf_flush(struct lj_wbuf *buf);
+
+/* Check flags. */
+int LJ_FASTCALL lj_wbuf_test_flag(const struct lj_wbuf *buf, uint8_t flag);
+
+/* Return saved errno. */
+int LJ_FASTCALL lj_wbuf_errno(const struct lj_wbuf *buf);
+
+#endif
diff --git a/src/ljamalg.c b/src/ljamalg.c
index 268b321..705e296 100644
--- a/src/ljamalg.c
+++ b/src/ljamalg.c
@@ -34,6 +34,7 @@
 #include "lj_bc.c"
 #include "lj_obj.c"
 #include "lj_buf.c"
+#include "lj_wbuf.c"
 #include "lj_str.c"
 #include "lj_tab.c"
 #include "lj_func.c"
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer Sergey Kaplun
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-26 19:07   ` Sergey Ostanevich
  2020-12-27 23:48   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field Sergey Kaplun
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
separated from LJ_VMST_INERP. New VM states allow to determine the
context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
renamed to LJ_VMST_CFUNC for naming consistence with new VM states.

Also, this patch adjusts stack layout for x86 and x64 arches to save VM
state for its consistency while stack unwinding when error is raised.

Part of tarantool/tarantool#5442
---

Changes in v2:
 - Moved `.if not WIN` macro check inside (save|restore)_vmstate_through
 - Fixed naming: SAVE_UNUSED\d -> UNUSED\d

 src/lj_frame.h     |  18 +++----
 src/lj_obj.h       |   4 +-
 src/lj_profile.c   |   5 +-
 src/luajit-gdb.py  |  14 ++---
 src/vm_arm.dasc    |   6 +--
 src/vm_arm64.dasc  |   6 +--
 src/vm_mips.dasc   |   6 +--
 src/vm_mips64.dasc |   6 +--
 src/vm_ppc.dasc    |   6 +--
 src/vm_x64.dasc    |  93 ++++++++++++++++++++++----------
 src/vm_x86.dasc    | 131 +++++++++++++++++++++++++++++----------------
 11 files changed, 188 insertions(+), 107 deletions(-)

diff --git a/src/lj_frame.h b/src/lj_frame.h
index 19c49a4..2e693f9 100644
--- a/src/lj_frame.h
+++ b/src/lj_frame.h
@@ -127,13 +127,13 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_ERRF		(15*4)
-#define CFRAME_OFS_NRES		(14*4)
-#define CFRAME_OFS_PREV		(13*4)
-#define CFRAME_OFS_L		(12*4)
+#define CFRAME_OFS_ERRF		(19*4)
+#define CFRAME_OFS_NRES		(18*4)
+#define CFRAME_OFS_PREV		(17*4)
+#define CFRAME_OFS_L		(16*4)
 #define CFRAME_OFS_PC		(6*4)
 #define CFRAME_OFS_MULTRES	(5*4)
-#define CFRAME_SIZE		(12*4)
+#define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #endif
 #elif LJ_TARGET_X64
@@ -152,11 +152,11 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_NRES		(22*4)
 #define CFRAME_OFS_MULTRES	(21*4)
 #endif
-#define CFRAME_SIZE		(10*8)
+#define CFRAME_SIZE		(12*8)
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_PREV		(4*8)
+#define CFRAME_OFS_PREV		(6*8)
 #if LJ_GC64
 #define CFRAME_OFS_PC		(3*8)
 #define CFRAME_OFS_L		(2*8)
@@ -171,9 +171,9 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_MULTRES	(1*4)
 #endif
 #if LJ_NO_UNWIND
-#define CFRAME_SIZE		(12*8)
+#define CFRAME_SIZE		(14*8)
 #else
-#define CFRAME_SIZE		(10*8)
+#define CFRAME_SIZE		(12*8)
 #endif
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 16)
 #define CFRAME_SHIFT_MULTRES	0
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 927b347..7fb715e 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -512,7 +512,9 @@ typedef struct GCtab {
 /* VM states. */
 enum {
   LJ_VMST_INTERP,	/* Interpreter. */
-  LJ_VMST_C,		/* C function. */
+  LJ_VMST_LFUNC,	/* Lua function. */
+  LJ_VMST_FFUNC,	/* Fast function. */
+  LJ_VMST_CFUNC,	/* C function. */
   LJ_VMST_GC,		/* Garbage collector. */
   LJ_VMST_EXIT,		/* Trace exit handler. */
   LJ_VMST_RECORD,	/* Trace recorder. */
diff --git a/src/lj_profile.c b/src/lj_profile.c
index 116998e..637e03c 100644
--- a/src/lj_profile.c
+++ b/src/lj_profile.c
@@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
     int st = g->vmstate;
     ps->vmstate = st >= 0 ? 'N' :
 		  st == ~LJ_VMST_INTERP ? 'I' :
-		  st == ~LJ_VMST_C ? 'C' :
+		  st == ~LJ_VMST_CFUNC ? 'C' :
+		  /* Stubs for profiler hooks. */
+		  st == ~LJ_VMST_FFUNC ? 'I' :
+		  st == ~LJ_VMST_LFUNC ? 'I' :
 		  st == ~LJ_VMST_GC ? 'G' : 'J';
     g->hookmask = (mask | HOOK_PROFILE);
     lj_dispatch_update(g);
diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
index 652c560..f1fd623 100644
--- a/src/luajit-gdb.py
+++ b/src/luajit-gdb.py
@@ -206,12 +206,14 @@ def J(g):
 def vm_state(g):
     return {
         i2notu32(0): 'INTERP',
-        i2notu32(1): 'C',
-        i2notu32(2): 'GC',
-        i2notu32(3): 'EXIT',
-        i2notu32(4): 'RECORD',
-        i2notu32(5): 'OPT',
-        i2notu32(6): 'ASM',
+        i2notu32(1): 'LFUNC',
+        i2notu32(2): 'FFUNC',
+        i2notu32(3): 'CFUNC',
+        i2notu32(4): 'GC',
+        i2notu32(5): 'EXIT',
+        i2notu32(6): 'RECORD',
+        i2notu32(7): 'OPT',
+        i2notu32(8): 'ASM',
     }.get(int(tou32(g['vmstate'])), 'TRACE')
 
 def gc_state(g):
diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
index d4cdaf5..ae2efdf 100644
--- a/src/vm_arm.dasc
+++ b/src/vm_arm.dasc
@@ -287,7 +287,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldr KBASE, SAVE_NRES
-  |    mv_vmstate CARG4, C
+  |    mv_vmstate CARG4, CFUNC
   |   sub BASE, BASE, #8
   |  subs CARG3, RC, #8
   |   lsl KBASE, KBASE, #3		// KBASE = (nresults_wanted+1)*8
@@ -348,7 +348,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate CARG4, C
+  |   mv_vmstate CARG4, CFUNC
   |  ldr GL:CARG3, L->glref
   |   str CARG4, GL:CARG3->vmstate
   |   str L, GL:CARG3->cur_L
@@ -4487,7 +4487,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate CARG3, C
+    |    mv_vmstate CARG3, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate CARG3
diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
index 3eaf376..f783428 100644
--- a/src/vm_arm64.dasc
+++ b/src/vm_arm64.dasc
@@ -332,7 +332,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldrsw CARG2, SAVE_NRES		// CARG2 = nresults+1.
-  |    mv_vmstate TMP0w, C
+  |    mv_vmstate TMP0w, CFUNC
   |   sub BASE, BASE, #16
   |  subs TMP2, RC, #8
   |    st_vmstate TMP0w
@@ -391,7 +391,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate TMP0w, C
+  |   mv_vmstate TMP0w, CFUNC
   |  ldr GL, L->glref
   |   st_vmstate TMP0w
   |  b ->vm_leave_unw
@@ -3816,7 +3816,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate TMP0w, C
+    |    mv_vmstate TMP0w, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate TMP0w
diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc
index 1afd611..ec57d78 100644
--- a/src/vm_mips.dasc
+++ b/src/vm_mips.dasc
@@ -403,7 +403,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sw TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   addiu BASE, BASE, -8
   |    st_vmstate
@@ -473,7 +473,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lw L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lw GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -5085,7 +5085,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sw BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sw RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  lw CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index c06270a..9a749f9 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -449,7 +449,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sd TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   daddiu BASE, BASE, -16
   |    st_vmstate
@@ -517,7 +517,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ld L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  ld GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -4952,7 +4952,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sd BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sd RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  ld CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
index b4260eb..62e9b68 100644
--- a/src/vm_ppc.dasc
+++ b/src/vm_ppc.dasc
@@ -520,7 +520,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  // TMP0 = PC & FRAME_TYPE
   |  cmpwi TMP0, FRAME_C
   |   rlwinm TMP2, PC, 0, 0, 28
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   sub TMP2, BASE, TMP2		// TMP2 = previous base.
   |  bney ->vm_returnp
   |
@@ -596,7 +596,7 @@ static void build_subroutines(BuildCtx *ctx)
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lwz L, SAVE_L
   |  .toc ld TOCREG, SAVE_TOC
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lwz GL:TMP1, L->glref
   |   stw TMP0, GL:TMP1->vmstate
   |  b ->vm_leave_unw
@@ -5060,7 +5060,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   stp BASE, L->base
     |   cmplw TMP1, TMP2
     |    stp RC, L->top
-    |     li_vmstate C
+    |     li_vmstate CFUNC
     |.if TOC
     |  mtctr TMP3
     |.else
diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
index 80753e0..83cc3e1 100644
--- a/src/vm_x64.dasc
+++ b/src/vm_x64.dasc
@@ -140,7 +140,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -161,26 +161,29 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
-|.define SAVE_PC,	aword [rsp+aword*3]
-|.define SAVE_L,	aword [rsp+aword*2]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define UNUSED2,	qword [rsp+qword*5]
+|.define UNUSED1,	dword [rsp+dword*8]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
+|.define SAVE_PC,	qword [rsp+qword*3]
+|.define SAVE_L,	qword [rsp+qword*2]
 |.define SAVE_ERRF,	dword [rsp+dword*3]
 |.define SAVE_NRES,	dword [rsp+dword*2]
-|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
+|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |.define TMP1d,		dword [rsp]
@@ -342,6 +345,22 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|// Save vmstate through register.
+|.macro save_vmstate_through, reg
+|.if not WIN
+|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, reg
+|.endif // WIN
+|.endmacro
+|
+|// Restore vmstate through register.
+|.macro restore_vmstate_through, reg
+|.if not WIN
+|  mov reg, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
+|.endif // WIN
+|.endmacro
+|
 |.macro fpop1; fstp st1; .endmacro
 |
 |// Synthesize SSE FP constants.
@@ -416,7 +435,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -448,6 +467,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |  // DISPATCH required to set properly.
+  |  restore_vmstate_through RAd
   |  restoreregs
   |  ret
   |
@@ -493,7 +514,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -521,7 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov [BASE-16], RA			// Prepend false to error message.
   |  mov [BASE-8], RB
   |  mov RA, -16			// Results start at BASE+RA = BASE-16.
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
   |//-----------------------------------------------------------------------
@@ -575,6 +598,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  lea KBASE, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate_through TMPRd
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RD
   |  mov SAVE_NRES, RDd
@@ -585,7 +609,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
   |  mov RD, L:RB->top
@@ -622,11 +646,12 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASE
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate_through RDd
   |  mov L:RB->cframe, rsp
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -658,6 +683,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |  save_vmstate_through KBASEd
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.
@@ -697,6 +723,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  cleartp LFUNC:KBASE
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
   |  // BASE = base, RC = result, RB = meta base
   |  jmp RA				// Jump to continuation.
   |
@@ -1137,15 +1164,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1578,7 +1606,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:PC, TMP1
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |
   |  cmp eax, LUA_YIELD
   |  ja >8
@@ -1717,6 +1745,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RAd, PC_RA
   |  neg RA
   |  lea BASE, [BASE+RA*8-16]		// base = base - (RA+2)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2481,7 +2510,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RCd, [PC]
   |  movzx RAd, RCH
@@ -2697,8 +2726,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CARG1, CTSTATE
   |  call extern lj_ccallback_enter	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-16]
@@ -3974,6 +4003,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     if (op == BC_CALLM) {
       |  add NARGS:RDd, MULTRES
     }
@@ -3995,6 +4025,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:RB, [RA-16]
     |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
     |->BC_CALLT_Z:
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     |  mov PC, [BASE-8]
     |  test PCd, FRAME_TYPE
     |  jnz >7
@@ -4219,6 +4250,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RAd, 3
     }
     |1:
+    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
     |  mov PC, [BASE-8]
     |  mov MULTRES, RDd			// Save nresults+1.
     |  test PCd, FRAME_TYPE		// Check frame type marker.
@@ -4260,6 +4292,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cleartp LFUNC:KBASE
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -4551,6 +4584,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -4588,6 +4622,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-8], RB			// Store delta + FRAME_VARG.
     |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -4643,7 +4678,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  mov CARG1, L:RB		// Caveat: CARG1 may be RA.
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function
     if (op == BC_FUNCC) {
       |  call KBASE			// (lua_State *L)
     } else {
@@ -4653,7 +4688,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-    |  set_vmstate INTERP
+    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
     |  lea RA, [BASE+RD*8]
     |  neg RA
     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
index d76fbe3..b9dffa9 100644
--- a/src/vm_x86.dasc
+++ b/src/vm_x86.dasc
@@ -140,7 +140,7 @@
 |
 |.else
 |
-|.define CFRAME_SPACE,	aword*7			// Delta for esp (see <--).
+|.define CFRAME_SPACE,	dword*11			// Delta for esp (see <--).
 |.macro saveregs_
 |  push edi; push esi; push ebx
 |  sub esp, CFRAME_SPACE
@@ -183,25 +183,30 @@
 |.define ARG1,		aword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.else
-|.define SAVE_ERRF,	aword [esp+aword*15]	// vm_pcall/vm_cpcall only.
-|.define SAVE_NRES,	aword [esp+aword*14]
-|.define SAVE_CFRAME,	aword [esp+aword*13]
-|.define SAVE_L,	aword [esp+aword*12]
+|.define SAVE_ERRF,	dword [esp+dword*19]	// vm_pcall/vm_cpcall only.
+|.define SAVE_NRES,	dword [esp+dword*18]
+|.define SAVE_CFRAME,	dword [esp+dword*17]
+|.define SAVE_L,	dword [esp+dword*16]
 |//----- 16 byte aligned, ^^^ arguments from C caller
-|.define SAVE_RET,	aword [esp+aword*11]	//<-- esp entering interpreter.
-|.define SAVE_R4,	aword [esp+aword*10]
-|.define SAVE_R3,	aword [esp+aword*9]
-|.define SAVE_R2,	aword [esp+aword*8]
+|.define SAVE_RET,	dword [esp+dword*15]	//<-- esp entering interpreter.
+|.define SAVE_R4,	dword [esp+dword*14]
+|.define SAVE_R3,	dword [esp+dword*13]
+|.define SAVE_R2,	dword [esp+dword*12]
 |//----- 16 byte aligned
-|.define SAVE_R1,	aword [esp+aword*7]	//<-- esp after register saves.
-|.define SAVE_PC,	aword [esp+aword*6]
-|.define TMP2,		aword [esp+aword*5]
-|.define TMP1,		aword [esp+aword*4]
+|.define UNUSED3,	dword [esp+dword*11]
+|.define UNUSED2,	dword [esp+dword*10]
+|.define UNUSED1,	dword [esp+dword*9]
+|.define SAVE_VMSTATE,	dword [esp+dword*8]
 |//----- 16 byte aligned
-|.define ARG4,		aword [esp+aword*3]
-|.define ARG3,		aword [esp+aword*2]
-|.define ARG2,		aword [esp+aword*1]
-|.define ARG1,		aword [esp]		//<-- esp while in interpreter.
+|.define SAVE_R1,	dword [esp+dword*7]	//<-- esp after register saves.
+|.define SAVE_PC,	dword [esp+dword*6]
+|.define TMP2,		dword [esp+dword*5]
+|.define TMP1,		dword [esp+dword*4]
+|//----- 16 byte aligned
+|.define ARG4,		dword [esp+dword*3]
+|.define ARG3,		dword [esp+dword*2]
+|.define ARG2,		dword [esp+dword*1]
+|.define ARG1,		dword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.endif
 |
@@ -269,7 +274,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -290,33 +295,35 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define UNUSED1,	qword [rsp+qword*5]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
 |.define SAVE_PC,	dword [rsp+dword*7]
 |.define SAVE_L,	dword [rsp+dword*6]
 |.define SAVE_ERRF,	dword [rsp+dword*5]
 |.define SAVE_NRES,	dword [rsp+dword*4]
-|.define TMPa,		aword [rsp+aword*1]
+|.define TMPa,		qword [rsp+qword*1]
 |.define TMP2,		dword [rsp+dword*1]
 |.define TMP1,		dword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |// TMPQ overlaps TMP1/TMP2. MULTRES overlaps TMP2 (and TMPQ).
 |.define TMPQ,		qword [rsp]
-|.define TMP3,		dword [rsp+aword*1]
+|.define TMP3,		dword [rsp+qword*1]
 |.define MULTRES,	TMP2
 |
 |.endif
@@ -433,6 +440,22 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|// Save vmstate through register.
+|.macro save_vmstate_through, reg
+|.if not WIN
+|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, reg
+|.endif // WIN
+|.endmacro
+|
+|// Restore vmstate through register.
+|.macro restore_vmstate_through, reg
+|.if not WIN
+|  mov reg, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
+|.endif // WIN
+|.endmacro
+|
 |// x87 compares.
 |.macro fcomparepp			// Compare and pop st0 >< st1.
 |  fucomip st1
@@ -520,7 +543,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -559,6 +582,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |  // DISPATCH required to set properly.
+  |  restore_vmstate_through RA
   |  restoreregs
   |  ret
   |
@@ -613,7 +638,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov dword GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -647,7 +674,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov PC, [BASE-4]			// Fetch PC of previous frame.
   |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
   |.if WIN and not X64
@@ -714,10 +741,11 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov RA, INARG_BASE			// Caveat: overlaps SAVE_CFRAME!
   |.endif
   |  mov PC, FRAME_CP
-  |  xor RD, RD
   |  lea KBASEa, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate_through RD
+  |  xor RD, RD
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RDa
   |.if X64
@@ -730,7 +758,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
   |  mov RD, L:RB->top
@@ -774,6 +802,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASEa
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate_through RD
   |.if X64
   |  mov L:RB->cframe, rsp
   |.else
@@ -782,7 +811,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -823,6 +852,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASE		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |  save_vmstate_through KBASE
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |.if X64
@@ -885,6 +915,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  // BASE = base, RC = result, RB = meta base
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
   |  jmp RAa				// Jump to continuation.
   |
   |.if FFI
@@ -1409,15 +1440,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1924,7 +1956,7 @@ static void build_subroutines(BuildCtx *ctx)
   |.endif
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
   |
   |  cmp eax, LUA_YIELD
   |  ja >8
@@ -2089,6 +2121,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RA, PC_RA
   |  not RAa				// Note: ~RA = -(RA+1)
   |  lea BASE, [BASE+RA*8]		// base = base - (RA+1)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2933,7 +2966,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RC, [PC]
   |  movzx RA, RCH
@@ -3203,8 +3236,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov FCARG1, CTSTATE
   |  call extern lj_ccallback_enter@8	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-8]
@@ -4683,6 +4716,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     if (op == BC_CALLM) {
       |  add NARGS:RD, MULTRES
     }
@@ -4706,6 +4740,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cmp dword [RA-4], LJ_TFUNC
     |  jne ->vmeta_call
     |->BC_CALLT_Z:
+    |  set_vmstate INTERP		// INTERP until a new BASE is setup
     |  mov PC, [BASE-4]
     |  test PC, FRAME_TYPE
     |  jnz >7
@@ -4989,6 +5024,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RA, 3
     }
     |1:
+    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
     |  mov PC, [BASE-4]
     |  mov MULTRES, RD			// Save nresults+1.
     |  test PC, FRAME_TYPE		// Check frame type marker.
@@ -5043,6 +5079,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:KBASE, [BASE-8]
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -5330,6 +5367,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -5367,6 +5405,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-4], RB			// Store delta + FRAME_VARG.
     |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -5431,7 +5470,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |.endif
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function
     if (op == BC_FUNCC) {
       |  call KBASEa			// (lua_State *L)
     } else {
@@ -5441,7 +5480,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-    |  set_vmstate INTERP
+    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
     |  lea RA, [BASE+RD*8]
     |  neg RA
     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (2 preceding siblings ...)
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-26 19:12   ` Sergey Ostanevich
  2020-12-27 13:09   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler Sergey Kaplun
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

To determine currently allocating coroutine (that may not be equal to
currently executed one) a new field called mem_L is added to
global_State structure. This field is set on each allocation event and
stores the coroutine address that is used for allocation.

Part of tarantool/tarantool#5442
---
 src/lj_gc.c  | 2 ++
 src/lj_obj.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/src/lj_gc.c b/src/lj_gc.c
index 44c8aa1..800fb2c 100644
--- a/src/lj_gc.c
+++ b/src/lj_gc.c
@@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
 {
   global_State *g = G(L);
   lua_assert((osz == 0) == (p == NULL));
+
+  setgcref(g->mem_L, obj2gco(L));
   p = g->allocf(g->allocd, p, osz, nsz);
   if (p == NULL && nsz > 0)
     lj_err_mem(L);
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 7fb715e..c94617d 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -649,6 +649,7 @@ typedef struct global_State {
   BCIns bc_cfunc_int;	/* Bytecode for internal C function calls. */
   BCIns bc_cfunc_ext;	/* Bytecode for external C function calls. */
   GCRef cur_L;		/* Currently executing lua_State. */
+  GCRef mem_L;		/* Currently allocating lua_State. */
   MRef jit_base;	/* Current JIT code L->base or NULL. */
   MRef ctype_state;	/* Pointer to C type state. */
   GCRef gcroot[GCROOT_MAX];  /* GC roots. */
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (3 preceding siblings ...)
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-27 10:58   ` Sergey Ostanevich
  2020-12-27 16:44   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for " Sergey Kaplun
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces memory profiler for Lua machine.

First of all profiler dumps the definitions of all loaded Lua functions
(symtab) via the write buffer introduced in one of the previous patches.

Profiler replaces the old allocation function with the instrumented one
after symtab is dumped. This new function reports all allocations,
reallocations or deallocations events via the write buffer during
profiling. Subsequent content depends on the function's type (LFUNC,
FFUNC or CFUNC).

To divide all traces into the one vmstate when being profiled, a special
macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.

When profiling is over, a special epilogue event header is written and
the old allocation function is restored back.

This change also makes debug_frameline function LuaJIT-wide visible to
be used in the memory profiler.

For more information, see <lj_memprof.h>.

Part of tarantool/tarantool#5442
---

Changes in v2:
  - Merged with debug-to-public commit and symtab.
  - Drop [T]imer bit description.

 src/Makefile     |   8 +-
 src/Makefile.dep |  31 ++--
 src/lj_arch.h    |  22 +++
 src/lj_debug.c   |   8 +-
 src/lj_debug.h   |   3 +
 src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_memprof.h | 165 ++++++++++++++++++
 src/lj_obj.h     |   8 +
 src/lj_state.c   |   8 +
 src/ljamalg.c    |   1 +
 10 files changed, 665 insertions(+), 19 deletions(-)
 create mode 100644 src/lj_memprof.c
 create mode 100644 src/lj_memprof.h

diff --git a/src/Makefile b/src/Makefile
index 384b590..3218dfd 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -113,6 +113,12 @@ XCFLAGS=
 # Enable GC64 mode for x64.
 #XCFLAGS+= -DLUAJIT_ENABLE_GC64
 #
+# Disable the memory profiler.
+#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
+#
+# Disable the thread safe profiler.
+#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE
+#
 ##############################################################################
 
 ##############################################################################
@@ -489,7 +495,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o lj_wbuf.o \
 	  lj_str.o lj_tab.o lj_func.o lj_udata.o lj_meta.o lj_debug.o \
 	  lj_state.o lj_dispatch.o lj_vmevent.o lj_vmmath.o lj_strscan.o \
 	  lj_strfmt.o lj_strfmt_num.o lj_api.o lj_mapi.o lj_profile.o \
-	  lj_lex.o lj_parse.o lj_bcread.o lj_bcwrite.o lj_load.o \
+	  lj_memprof.o lj_lex.o lj_parse.o lj_bcread.o lj_bcwrite.o lj_load.o \
 	  lj_ir.o lj_opt_mem.o lj_opt_fold.o lj_opt_narrow.o \
 	  lj_opt_dce.o lj_opt_loop.o lj_opt_split.o lj_opt_sink.o \
 	  lj_mcode.o lj_snap.o lj_record.o lj_crecord.o lj_ffrecord.o \
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 59ed450..8ae14a5 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -147,6 +147,9 @@ lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
 lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
  lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
  lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
+lj_memprof.o: lj_memprof.c lj_memprof.h lj_def.h lua.h luaconf.h \
+ lj_obj.h lj_arch.h lj_frame.h lj_bc.h lj_jit.h lj_ir.h lj_gc.h lj_debug.h \
+ lj_wbuf.h
 lj_obj.o: lj_obj.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h
 lj_opt_dce.o: lj_opt_dce.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_ir.h lj_jit.h lj_iropt.h
@@ -220,20 +223,20 @@ ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
  lj_char.h lj_bc.c lj_bcdef.h lj_obj.c lj_buf.c lj_wbuf.c lj_wbuf.h lj_utils.h \
  lj_str.c lj_tab.c lj_func.c lj_udata.c lj_meta.c lj_strscan.h lj_lib.h \
  lj_debug.c lj_state.c lj_lex.h lj_alloc.h luajit.h lj_dispatch.c \
- lj_ccallback.h lj_profile.h lj_vmevent.c lj_vmevent.h lj_vmmath.c \
- lj_strscan.c lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c lmisclib.h \
- lj_profile.c lj_lex.c lualib.h lj_parse.h lj_parse.c lj_bcread.c lj_bcdump.h \
- lj_bcwrite.c lj_load.c lj_ctype.c lj_cdata.c lj_cconv.h lj_cconv.c lj_ccall.c \
- lj_ccall.h lj_ccallback.c lj_target.h lj_target_*.h lj_mcode.h lj_carith.c \
- lj_carith.h lj_clib.c lj_clib.h lj_cparse.c lj_cparse.h lj_lib.c lj_ir.c \
- lj_ircall.h lj_iropt.h lj_opt_mem.c lj_opt_fold.c lj_folddef.h \
- lj_opt_narrow.c lj_opt_dce.c lj_opt_loop.c lj_snap.h lj_opt_split.c \
- lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c lj_record.h lj_ffrecord.h \
- lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h lj_asm.c lj_asm.h \
- lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c lj_alloc.c \
- lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c \
- lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
- lib_ffi.c lib_misc.c lib_init.c
+ lj_ccallback.h lj_profile.h lj_memprof.h lj_vmevent.c lj_vmevent.h \
+ lj_vmmath.c lj_strscan.c lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c \
+ lmisclib.h lj_profile.c lj_memprof.c lj_lex.c lualib.h lj_parse.h lj_parse.c \
+ lj_bcread.c lj_bcdump.h lj_bcwrite.c lj_load.c lj_ctype.c lj_cdata.c \
+ lj_cconv.h lj_cconv.c lj_ccall.c lj_ccall.h lj_ccallback.c lj_target.h \
+ lj_target_*.h lj_mcode.h lj_carith.c lj_carith.h lj_clib.c lj_clib.h \
+ lj_cparse.c lj_cparse.h lj_lib.c lj_ir.c lj_ircall.h lj_iropt.h lj_opt_mem.c \
+ lj_opt_fold.c lj_folddef.h lj_opt_narrow.c lj_opt_dce.c lj_opt_loop.c \
+ lj_snap.h lj_opt_split.c lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c \
+ lj_record.h lj_ffrecord.h lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h \
+ lj_asm.c lj_asm.h lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c \
+ lj_alloc.c lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c \
+ lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
+ lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
 luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
 host/buildvm.o: host/buildvm.c host/buildvm.h lj_def.h lua.h luaconf.h \
  lj_arch.h lj_obj.h lj_def.h lj_arch.h lj_gc.h lj_obj.h lj_bc.h lj_ir.h \
diff --git a/src/lj_arch.h b/src/lj_arch.h
index c8d7138..5967849 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -213,6 +213,8 @@
 #define LJ_ARCH_VERSION		50
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_ARM64
 
 #define LJ_ARCH_BITS		64
@@ -234,6 +236,8 @@
 
 #define LJ_ARCH_VERSION		80
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_PPC
 
 #ifndef LJ_ARCH_ENDIAN
@@ -299,6 +303,8 @@
 #define LJ_ARCH_XENON		1
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
 
 #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
@@ -358,6 +364,8 @@
 #define LJ_ARCH_VERSION		10
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #else
 #error "No target architecture defined"
 #endif
@@ -564,4 +572,18 @@
 #define LJ_52			0
 #endif
 
+/* Disable or enable the memory profiler. */
+#if defined(LUAJIT_DISABLE_MEMPROF) || defined(LJ_ARCH_NOMEMPROF) || LJ_TARGET_WINDOWS || LJ_TARGET_CYGWIN || LJ_TARGET_PS3 || LJ_TARGET_PS4 || LJ_TARGET_XBOX360
+#define LJ_HASMEMPROF		0
+#else
+#define LJ_HASMEMPROF		1
+#endif
+
+/* Disable or enable the memory profiler's thread safety. */
+#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
+#define LJ_THREAD_SAFE		0
+#else
+#define LJ_THREAD_SAFE		1
+#endif
+
 #endif
diff --git a/src/lj_debug.c b/src/lj_debug.c
index 73bd196..bb9ab28 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -128,7 +128,7 @@ BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc)
 }
 
 /* Get line number for function/frame. */
-static BCLine debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
+BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
 {
   BCPos pc = debug_framepc(L, fn, nextframe);
   if (pc != NO_BCPOS) {
@@ -353,7 +353,7 @@ void lj_debug_addloc(lua_State *L, const char *msg,
   if (frame) {
     GCfunc *fn = frame_func(frame);
     if (isluafunc(fn)) {
-      BCLine line = debug_frameline(L, fn, nextframe);
+      BCLine line = lj_debug_frameline(L, fn, nextframe);
       if (line >= 0) {
 	GCproto *pt = funcproto(fn);
 	char buf[LUA_IDSIZE];
@@ -470,7 +470,7 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
 	ar->what = "C";
       }
     } else if (*what == 'l') {
-      ar->currentline = frame ? debug_frameline(L, fn, nextframe) : -1;
+      ar->currentline = frame ? lj_debug_frameline(L, fn, nextframe) : -1;
     } else if (*what == 'u') {
       ar->nups = fn->c.nupvalues;
       if (ext) {
@@ -616,7 +616,7 @@ void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt, int depth)
 	    GCproto *pt = funcproto(fn);
 	    if (debug_putchunkname(sb, pt, pathstrip)) {
 	      /* Regular Lua function. */
-	      BCLine line = c == 'l' ? debug_frameline(L, fn, nextframe) :
+	      BCLine line = c == 'l' ? lj_debug_frameline(L, fn, nextframe) :
 				       pt->firstline;
 	      lj_buf_putb(sb, ':');
 	      lj_strfmt_putint(sb, line >= 0 ? line : pt->firstline);
diff --git a/src/lj_debug.h b/src/lj_debug.h
index 5917c00..a157d28 100644
--- a/src/lj_debug.h
+++ b/src/lj_debug.h
@@ -40,6 +40,9 @@ LJ_FUNC void lj_debug_addloc(lua_State *L, const char *msg,
 LJ_FUNC void lj_debug_pushloc(lua_State *L, GCproto *pt, BCPos pc);
 LJ_FUNC int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar,
 			     int ext);
+#if LJ_HASMEMPROF
+LJ_FUNC BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe);
+#endif
 #if LJ_HASPROFILE
 LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
 				int depth);
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
new file mode 100644
index 0000000..e0df057
--- /dev/null
+++ b/src/lj_memprof.c
@@ -0,0 +1,430 @@
+/*
+** Implementation of memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#define lj_memprof_c
+#define LUA_CORE
+
+#include <errno.h>
+
+#include "lj_memprof.h"
+#include "lj_def.h"
+#include "lj_arch.h"
+
+#if LJ_HASMEMPROF
+
+#if LJ_IS_THREAD_SAFE
+#include <pthread.h>
+#endif
+
+#include "lua.h"
+
+#include "lj_obj.h"
+#include "lj_frame.h"
+#include "lj_debug.h"
+#include "lj_gc.h"
+#include "lj_wbuf.h"
+
+/* --------------------------------- Symtab --------------------------------- */
+
+static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
+					   0x0, 0x0, 0x0};
+
+static void symtab_write_prologue(struct lj_wbuf *out)
+{
+  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
+  lj_wbuf_addn(out, ljs_header, len);
+}
+
+static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
+{
+  const GCRef *iter = &g->gc.root;
+  const GCobj *o;
+
+  symtab_write_prologue(out);
+
+  while (NULL != (o = gcref(*iter))) {
+    switch (o->gch.gct) {
+    case (~LJ_TPROTO): {
+      const GCproto *pt = gco2pt(o);
+      lj_wbuf_addbyte(out, SYMTAB_LFUNC);
+      lj_wbuf_addu64(out, (uintptr_t)pt);
+      lj_wbuf_addstring(out, proto_chunknamestr(pt));
+      lj_wbuf_addu64(out, (uint64_t)pt->firstline);
+      break;
+    }
+    default:
+      break;
+    }
+    iter = &o->gch.nextgc;
+  }
+
+  lj_wbuf_addbyte(out, SYMTAB_FINAL);
+}
+
+/* ---------------------------- Memory profiler ----------------------------- */
+
+enum memprof_state {
+  /* memprof is not running. */
+  MPS_IDLE,
+  /* memprof is running. */
+  MPS_PROFILE,
+  /*
+  ** Stopped in case of stopped stream.
+  ** Saved errno is returned to user at memprof_stop.
+  */
+  MPS_HALT
+};
+
+struct alloc {
+  lua_Alloc allocf; /* Allocating function. */
+  void *state; /* Opaque allocator's state. */
+};
+
+struct memprof {
+  global_State *g; /* Profiled VM. */
+  enum memprof_state state; /* Internal state. */
+  struct lj_wbuf out; /* Output accumulator. */
+  struct alloc orig_alloc; /* Original allocator. */
+  struct lua_Prof_options opt; /* Profiling options. */
+  int saved_errno; /* Saved errno when profiler deinstrumented. */
+};
+
+#if LJ_IS_THREAD_SAFE
+
+pthread_mutex_t memprof_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static LJ_AINLINE int memprof_lock(void)
+{
+  return pthread_mutex_lock(&memprof_mutex);
+}
+
+static LJ_AINLINE int memprof_unlock(void)
+{
+  return pthread_mutex_unlock(&memprof_mutex);
+}
+
+#else /* LJ_IS_THREAD_SAFE */
+
+#define memprof_lock()
+#define memprof_unlock()
+
+#endif /* LJ_IS_THREAD_SAFE */
+
+static struct memprof memprof = {0};
+
+const unsigned char ljm_header[] = {'l', 'j', 'm', LJM_CURRENT_FORMAT_VERSION,
+				    0x0, 0x0, 0x0};
+
+static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *nextframe)
+{
+  const BCLine line = lj_debug_frameline(L, fn, nextframe);
+  lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
+  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
+  lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);
+}
+
+static void memprof_write_cfunc(struct lj_wbuf *out, uint8_t header,
+				const GCfunc *fn)
+{
+  lj_wbuf_addbyte(out, header | ASOURCE_CFUNC);
+  lj_wbuf_addu64(out, (uintptr_t)fn->c.f);
+}
+
+static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t header,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *frame)
+{
+  cTValue *pframe = frame_prev(frame);
+  GCfunc *pfn = frame_func(pframe);
+
+  /*
+  ** XXX: If a fast function is called by a Lua function, report the
+  ** Lua function for more meaningful output. Otherwise report the fast
+  ** function as a C function.
+  */
+  if (pfn != NULL && isluafunc(pfn))
+    memprof_write_lfunc(out, header, pfn, L, frame);
+  else
+    memprof_write_cfunc(out, header, fn);
+}
+
+static void memprof_write_func(struct memprof *mp, uint8_t header)
+{
+  struct lj_wbuf *out = &mp->out;
+  lua_State *L = gco2th(gcref(mp->g->mem_L));
+  cTValue *frame = L->base - 1;
+  GCfunc *fn;
+
+  fn = frame_func(frame);
+
+  if (isluafunc(fn))
+    memprof_write_lfunc(out, header, fn, L, NULL);
+  else if (isffunc(fn))
+    memprof_write_ffunc(out, header, fn, L, frame);
+  else if (iscfunc(fn))
+    memprof_write_cfunc(out, header, fn);
+  else
+    lua_assert(0);
+}
+
+static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
+{
+  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
+}
+
+/*
+** XXX: In ideal world, we should report allocations from traces as well.
+** But since traces must follow the semantics of the original code, behaviour of
+** Lua and JITted code must match 1:1 in terms of allocations, which makes
+** using memprof with enabled JIT virtually redundant. Hence the stub below.
+*/
+static void memprof_write_trace(struct memprof *mp, uint8_t header)
+{
+  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
+}
+
+typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);
+
+static const memprof_writer memprof_writers[] = {
+  memprof_write_hvmstate, /* LJ_VMST_INTERP */
+  memprof_write_func, /* LJ_VMST_LFUNC */
+  memprof_write_func, /* LJ_VMST_FFUNC */
+  memprof_write_func, /* LJ_VMST_CFUNC */
+  memprof_write_hvmstate, /* LJ_VMST_GC */
+  memprof_write_hvmstate, /* LJ_VMST_EXIT */
+  memprof_write_hvmstate, /* LJ_VMST_RECORD */
+  memprof_write_hvmstate, /* LJ_VMST_OPT */
+  memprof_write_hvmstate, /* LJ_VMST_ASM */
+  memprof_write_trace /* LJ_VMST_TRACE */
+};
+
+static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
+{
+  const global_State *g = mp->g;
+  const uint32_t _vmstate = (uint32_t)~g->vmstate;
+  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
+  const uint8_t header = aevent;
+
+  memprof_writers[vmstate](mp, header);
+}
+
+static int memprof_stop(const struct lua_State *L);
+
+static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+  struct lj_wbuf *out = &mp->out;
+  void *nptr;
+
+  lua_assert(MPS_PROFILE == mp->state);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(ud == oalloc->state);
+
+  nptr = oalloc->allocf(ud, ptr, osize, nsize);
+
+  if (nsize == 0) {
+    memprof_write_caller(mp, AEVENT_FREE);
+    lj_wbuf_addu64(out, (uintptr_t)ptr);
+    lj_wbuf_addu64(out, (uint64_t)osize);
+  } else if (ptr == NULL) {
+    memprof_write_caller(mp, AEVENT_ALLOC);
+    lj_wbuf_addu64(out, (uintptr_t)nptr);
+    lj_wbuf_addu64(out, (uint64_t)nsize);
+  } else {
+    memprof_write_caller(mp, AEVENT_REALLOC);
+    lj_wbuf_addu64(out, (uintptr_t)ptr);
+    lj_wbuf_addu64(out, (uint64_t)osize);
+    lj_wbuf_addu64(out, (uintptr_t)nptr);
+    lj_wbuf_addu64(out, (uint64_t)nsize);
+  }
+
+  /* Deinstrument memprof if required. */
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP)))
+    memprof_stop(NULL);
+
+  return nptr;
+}
+
+static void memprof_write_prologue(struct lj_wbuf *out)
+{
+  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
+  lj_wbuf_addn(out, ljm_header, len);
+}
+
+int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+
+  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
+  lua_assert(opt->buf != NULL && opt->len != 0);
+
+  memprof_lock();
+
+  if (mp->state != MPS_IDLE) {
+    memprof_unlock();
+    return PROFILE_ERRRUN;
+  }
+
+  /* Discard possible old errno. */
+  mp->saved_errno = 0;
+
+  /* Init options: */
+  memcpy(&mp->opt, opt, sizeof(*opt));
+
+  /* Init general fields: */
+  mp->g = G(L);
+  mp->state = MPS_PROFILE;
+
+  /* Init output: */
+  lj_wbuf_init(&mp->out, mp->opt.writer, mp->opt.ctx, mp->opt.buf,
+	       mp->opt.len);
+  dump_symtab(&mp->out, mp->g);
+  memprof_write_prologue(&mp->out);
+
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
+		  lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {
+    /* on_stop call may change errno value. */
+    int saved_errno = lj_wbuf_errno(&mp->out);
+    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+    lj_wbuf_terminate(&mp->out);
+    mp->state = MPS_IDLE;
+    memprof_unlock();
+    errno = saved_errno;
+    return PROFILE_ERRIO;
+  }
+
+  /* Override allocating function: */
+  oalloc->allocf = lua_getallocf(L, &oalloc->state);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(L, memprof_allocf, oalloc->state);
+
+  memprof_unlock();
+  return PROFILE_SUCCESS;
+}
+
+static int memprof_stop(const struct lua_State *L)
+{
+  struct memprof *mp = &memprof;
+  struct alloc *oalloc = &mp->orig_alloc;
+  struct lj_wbuf *out = &mp->out;
+  int return_status = PROFILE_SUCCESS;
+  int saved_errno = 0;
+  struct lua_State *main_L;
+  int cb_status;
+
+  memprof_lock();
+
+  if (mp->state == MPS_HALT) {
+    errno = mp->saved_errno;
+    mp->state = MPS_IDLE
+    memprof_unlock();
+    return PROFILE_ERRIO;
+  }
+
+  if (mp->state != MPS_PROFILE) {
+    memprof_unlock();
+    return PROFILE_ERRRUN;
+  }
+
+  if (L != NULL && mp->g != G(L)) {
+    memprof_unlock();
+    return PROFILE_ERR;
+  }
+
+  mp->state = MPS_IDLE;
+
+  lua_assert(mp->g != NULL);
+  main_L = mainthread(mp->g);
+
+  lua_assert(memprof_allocf == lua_getallocf(main_L, NULL));
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(main_L, oalloc->allocf, oalloc->state);
+
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
+    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
+    mp->state = MPS_HALT;
+    /* on_stop call may change errno value. */
+    mp->saved_errno = lj_wbuf_errno(out);
+    /* Ignore possible errors. mp->opt.buf == NULL here. */
+    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+    lj_wbuf_terminate(out);
+    memprof_unlock();
+    return PROFILE_ERRIO;
+  }
+  lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
+
+  lj_wbuf_flush(out);
+
+  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
+    saved_errno = lj_wbuf_errno(out);
+    return_status = PROFILE_ERRIO;
+  }
+
+  lj_wbuf_terminate(out);
+
+  memprof_unlock();
+  errno = saved_errno;
+  return return_status;
+}
+
+int lj_memprof_stop(void)
+{
+  return memprof_stop(NULL);
+}
+
+int lj_memprof_stop_vm(const struct lua_State *L)
+{
+  return memprof_stop(L);
+}
+
+int lj_memprof_is_running(void)
+{
+  struct memprof *mp = &memprof;
+  int running;
+
+  memprof_lock();
+  running = mp->state == MPS_PROFILE;
+  memprof_unlock();
+
+  return running;
+}
+
+#else /* LJ_HASMEMPROF */
+
+int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)
+{
+  UNUSED(L);
+  UNUSED(opt);
+  return PROFILE_ERR;
+}
+
+int lj_memprof_stop(void)
+{
+  return PROFILE_ERR;
+}
+
+int lj_memprof_stop_vm(const struct lua_State *L)
+{
+  UNUSED(L);
+  return PROFILE_ERR;
+}
+
+int lj_memprof_is_running(void)
+{
+  return 0;
+}
+
+#endif /* LJ_HASMEMPROF */
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
new file mode 100644
index 0000000..a96b72f
--- /dev/null
+++ b/src/lj_memprof.h
@@ -0,0 +1,165 @@
+/*
+** Memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#ifndef _LJ_MEMPROF_H
+#define _LJ_MEMPROF_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+#define LJS_CURRENT_VERSION 0x1
+
+/*
+** symtab format:
+**
+** symtab         := prologue sym*
+** prologue       := 'l' 'j' 's' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** sym            := sym-lua | sym-final
+** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-header     := <BYTE>
+** sym-addr       := <ULEB128>
+** sym-chunk      := string
+** sym-line       := <ULEB128>
+** sym-final      := sym-header
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain numeric version number
+**
+** sym-header: [FUUUUUTT]
+**  * TT    : 2 bits for representing symbol type
+**  * UUUUU : 5 unused bits
+**  * F     : 1 bit marking the end of the symtab (final symbol)
+*/
+
+#define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_FFUNC ((uint8_t)1)
+#define SYMTAB_CFUNC ((uint8_t)2)
+#define SYMTAB_TRACE ((uint8_t)3)
+#define SYMTAB_FINAL ((uint8_t)0x80)
+
+#define LJM_CURRENT_FORMAT_VERSION 0x01
+
+/*
+** Event stream format:
+**
+** stream         := symtab memprof
+** symtab         := see symtab description
+** memprof        := prologue event* epilogue
+** prologue       := 'l' 'j' 'm' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** event          := event-alloc | event-realloc | event-free
+** event-alloc    := event-header loc? naddr nsize
+** event-realloc  := event-header loc? oaddr osize naddr nsize
+** event-free     := event-header loc? oaddr osize
+** event-header   := <BYTE>
+** loc            := loc-lua | loc-c
+** loc-lua        := sym-addr line-no
+** loc-c          := sym-addr
+** sym-addr       := <ULEB128>
+** line-no        := <ULEB128>
+** oaddr          := <ULEB128>
+** naddr          := <ULEB128>
+** osize          := <ULEB128>
+** nsize          := <ULEB128>
+** epilogue       := event-header
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain integer version number
+**
+** event-header: [FUUUSSEE]
+**  * EE   : 2 bits for representing allocation event type (AEVENT_*)
+**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
+**  * UUU  : 3 unused bits
+**  * F    : 0 for regular events, 1 for epilogue's *F*inal header
+**           (if F is set to 1, all other bits are currently ignored)
+*/
+
+/* Allocation events: */
+#define AEVENT_ALLOC   ((uint8_t)1)
+#define AEVENT_FREE    ((uint8_t)2)
+#define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))
+
+/* Allocation sources: */
+#define ASOURCE_INT   ((uint8_t)(1 << 2))
+#define ASOURCE_LFUNC ((uint8_t)(2 << 2))
+#define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+
+#define LJM_EPILOGUE_HEADER 0x80
+
+/* Profiler public API. */
+#define PROFILE_SUCCESS 0
+#define PROFILE_ERR     1
+#define PROFILE_ERRRUN  2
+#define PROFILE_ERRMEM  3
+#define PROFILE_ERRIO   4
+
+/* Profiler options. */
+struct lua_Prof_options {
+  /* Context for the profile writer and final callback. */
+  void *ctx;
+  /* Custom buffer to write data. */
+  uint8_t *buf;
+  /* The buffer's size. */
+  size_t len;
+  /*
+  ** Writer function for profile events.
+  ** Should return amount of written bytes on success or zero in case of error.
+  ** Setting *data to NULL means end of profiling.
+  */
+  size_t (*writer)(const void **data, size_t len, void *ctx);
+  /*
+  ** Callback on profiler stopping. Required for correctly cleaning
+  ** at vm shoutdown when profiler still running.
+  ** Returns zero on success.
+  */
+  int (*on_stop)(void *ctx, uint8_t *buf);
+};
+
+/* Avoid to provide additional interfaces described in other headers. */
+struct lua_State;
+
+/*
+** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
+** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
+** LUAM_PROFILE_ERRIO.
+*/
+int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
+
+/*
+** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
+** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
+** on call at buffer flush, profiled stream stops, or on_stop() callback
+** returns non-zero value, returns LUAM_PROFILE_ERRIO.
+*/
+int lj_memprof_stop(void);
+
+/*
+** VM g is currently being profiled, behaves exactly as lj_memprof_stop().
+** Otherwise does nothing and returns LUAM_PROFILE_ERR.
+*/
+int lj_memprof_stop_vm(const struct lua_State *L);
+
+/* Check that profiler is running. */
+int lj_memprof_is_running(void);
+
+#endif
diff --git a/src/lj_obj.h b/src/lj_obj.h
index c94617d..c94b0bb 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -523,6 +523,14 @@ enum {
   LJ_VMST__MAX
 };
 
+/*
+** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
+** In fact, when VM executes a trace, vmstate is set to the trace number, but
+** we aggregate all such cases into one VM state during per-VM state profiling.
+*/
+
+#define LJ_VMST_TRACE		(LJ_VMST__MAX)
+
 #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
 
 /* Metamethods. ORDER MM */
diff --git a/src/lj_state.c b/src/lj_state.c
index 1d9c628..6c46e3d 100644
--- a/src/lj_state.c
+++ b/src/lj_state.c
@@ -29,6 +29,10 @@
 #include "lj_alloc.h"
 #include "luajit.h"
 
+#if LJ_HASMEMPROF
+#include "lj_memprof.h"
+#endif
+
 /* -- Stack handling ------------------------------------------------------ */
 
 /* Stack sizes. */
@@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
   global_State *g = G(L);
   int i;
   L = mainthread(g);  /* Only the main thread can be closed. */
+#if LJ_HASMEMPROF
+  if (lj_memprof_is_running())
+    lj_memprof_stop();
+#endif
 #if LJ_HASPROFILE
   luaJIT_profile_stop(L);
 #endif
diff --git a/src/ljamalg.c b/src/ljamalg.c
index 705e296..3f7e686 100644
--- a/src/ljamalg.c
+++ b/src/ljamalg.c
@@ -51,6 +51,7 @@
 #include "lj_api.c"
 #include "lj_mapi.c"
 #include "lj_profile.c"
+#include "lj_memprof.c"
 #include "lj_lex.c"
 #include "lj_parse.c"
 #include "lj_bcread.c"
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (4 preceding siblings ...)
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-27 11:54   ` Sergey Ostanevich
  2020-12-27 18:58   ` Igor Munkin
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser Sergey Kaplun
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces Lua API for LuaJIT memory profiler implemented in
the scope of the previous patch.

Profiler returns some true value if started/stopped successfully,
returns nil on failure (plus an error message as a second result and a
system-dependent error code as a third result).
If LuaJIT build without memory profiler both return `false`.

<lj_errmsg.h> have adjusted with two new errors
PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
started/stopped already correspondingly.

Part of tarantool/tarantool#5442
---

Changes in v2:
  - Added pushing of errno for ERR_PROF* and ERRMEM
  - Added forgotten assert.

 src/Makefile.dep |   5 +-
 src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_errmsg.h  |   6 ++
 3 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.dep b/src/Makefile.dep
index 8ae14a5..c3d0977 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
 lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
-lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
- lj_str.h lj_tab.h lj_lib.h lj_libdef.h
+lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
+ lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
+ lj_errmsg.h lj_memprof.h lj_libdef.h
 lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
  lj_libdef.h
diff --git a/src/lib_misc.c b/src/lib_misc.c
index 6f7b9a9..36fe29f 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -8,13 +8,21 @@
 #define lib_misc_c
 #define LUA_LIB
 
+#include <stdio.h>
+#include <errno.h>
+
 #include "lua.h"
 #include "lmisclib.h"
+#include "lauxlib.h"
 
 #include "lj_obj.h"
 #include "lj_str.h"
 #include "lj_tab.h"
 #include "lj_lib.h"
+#include "lj_gc.h"
+#include "lj_err.h"
+
+#include "lj_memprof.h"
 
 /* ------------------------------------------------------------------------ */
 
@@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)
 
 #include "lj_libdef.h"
 
+/* ----- misc.memprof module ---------------------------------------------- */
+
+#define LJLIB_MODULE_misc_memprof
+
+/*
+** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
+*/
+#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
+
+/* Structure given as ctx to memprof writer and on_stop callback. */
+struct memprof_ctx {
+  /* Output file stream for data. */
+  FILE *stream;
+  /* Profiled global_State for lj_mem_free at on_stop callback. */
+  global_State *g;
+};
+
+static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
+{
+  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
+  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+}
+
+/* Default buffer writer function. Just call fwrite to corresponding FILE. */
+static size_t buffer_writer_default(const void **buf_addr, size_t len,
+				    void *opt)
+{
+  FILE *stream = ((struct memprof_ctx *)opt)->stream;
+  const void * const buf_start = *buf_addr;
+  const void *data = *buf_addr;
+  size_t write_total = 0;
+
+  lua_assert(len <= STREAM_BUFFER_SIZE);
+
+  for (;;) {
+    const size_t written = fwrite(data, 1, len, stream);
+
+    if (LJ_UNLIKELY(written == 0)) {
+      /* Re-tries write in case of EINTR. */
+      if (errno == EINTR) {
+	errno = 0;
+	continue;
+      }
+      break;
+    }
+
+    write_total += written;
+
+    if (write_total == len)
+      break;
+
+    data = (uint8_t *)data + (ptrdiff_t)written;
+  }
+  lua_assert(write_total <= len);
+
+  *buf_addr = buf_start;
+  return write_total;
+}
+
+/* Default on stop callback. Just close corresponding stream. */
+static int on_stop_cb_default(void *opt, uint8_t *buf)
+{
+  struct memprof_ctx *ctx = opt;
+  FILE *stream = ctx->stream;
+  memprof_ctx_free(ctx, buf);
+  return fclose(stream);
+}
+
+/* local started, err, errno = misc.memprof.start(fname) */
+LJLIB_CF(misc_memprof_start)
+{
+  struct lua_Prof_options opt = {0};
+  struct memprof_ctx *ctx;
+  const char *fname;
+  int memprof_status;
+  int started;
+
+  fname = strdata(lj_lib_checkstr(L, 1));
+
+  ctx = lj_mem_new(L, sizeof(*ctx));
+  if (ctx == NULL)
+    goto errmem;
+
+  opt.ctx = ctx;
+  opt.writer = buffer_writer_default;
+  opt.on_stop = on_stop_cb_default;
+  opt.len = STREAM_BUFFER_SIZE;
+  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
+  if (NULL == opt.buf) {
+    lj_mem_free(G(L), ctx, sizeof(*ctx));
+    goto errmem;
+  }
+
+  ctx->g = G(L);
+  ctx->stream = fopen(fname, "wb");
+
+  if (ctx->stream == NULL) {
+    memprof_ctx_free(ctx, opt.buf);
+    return luaL_fileresult(L, 0, fname);
+  }
+
+  memprof_status = lj_memprof_start(L, &opt);
+  started = memprof_status == PROFILE_SUCCESS;
+
+  if (LJ_UNLIKELY(!started)) {
+    fclose(ctx->stream);
+    remove(fname);
+    memprof_ctx_free(ctx, opt.buf);
+    switch (memprof_status) {
+    case PROFILE_ERRRUN:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_ISRUNNING));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, fname);
+    default:
+      lua_assert(0);
+      break;
+    }
+  }
+  lua_pushboolean(L, started);
+
+  return 1;
+errmem:
+  lua_pushnil(L);
+  lua_pushstring(L, err2msg(LJ_ERR_ERRMEM));
+  lua_pushinteger(L, ENOMEM);
+  return 3;
+}
+
+/* local stopped, err, errno = misc.memprof.stop() */
+LJLIB_CF(misc_memprof_stop)
+{
+  int status = lj_memprof_stop();
+  int stopped_successfully = status == PROFILE_SUCCESS;
+  if (!stopped_successfully) {
+    switch (status) {
+    case PROFILE_ERRRUN:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_NOTRUNNING));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, NULL);
+    default:
+      lua_assert(0);
+      break;
+    }
+  }
+  lua_pushboolean(L, stopped_successfully);
+  return 1;
+}
+
+#include "lj_libdef.h"
+
+/* ------------------------------------------------------------------------ */
+
 LUALIB_API int luaopen_misc(struct lua_State *L)
 {
   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
+  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
   return 1;
 }
diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
index de7b867..6816da2 100644
--- a/src/lj_errmsg.h
+++ b/src/lj_errmsg.h
@@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
 ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
 #endif
 
+#if LJ_HASPROFILE || LJ_HASMEMPROF
+/* Profiler errors. */
+ERRDEF(PROF_ISRUNNING,	"profiler is running already")
+ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
+#endif
+
 #undef ERRDEF
 
 /* Detecting unused error messages:
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (5 preceding siblings ...)
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for " Sergey Kaplun
@ 2020-12-25 15:26 ` Sergey Kaplun
  2020-12-26 22:56   ` Igor Munkin
  2020-12-27 13:24   ` Sergey Ostanevich
  2020-12-28  2:05 ` [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler Sergey Kaplun
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-25 15:26 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch adds a parser for binary data dumped via the memory profiler. It is
a set of the following Lua modules:
* utils/bufread.lua: read binary data from a file.
* utils/symtab.lua: symbol table decode functions
* memprof/parse.lua: decode the memory profiler event stream
* memprof/humanize.lua: display decoded data in human readable format
* memprof.lua: Lua script to display data

There is also a stand-alone bash script <luajit-parse-memprof> that displays
human readable parsed data to a stdout. It calls <memprof.lua> with a
corresponding LUA_PATH.

Part of tarantool/tarantool#5442
Part of tarantool/tarantool#5490
---

Changes in v2:
  - Add (un)?install sections in Makefile
  - Modify bash script correspondingly.
  - Change Lua modules layout.
  - Adjusted test. Check that errno returns in case of error is added.
  - Code clean up.

 Makefile                           |  39 +++++-
 test/misclib-memprof-lapi.test.lua | 135 +++++++++++++++++++++
 tools/luajit-parse-memprof         |   9 ++
 tools/memprof.lua                  | 109 +++++++++++++++++
 tools/memprof/humanize.lua         |  45 +++++++
 tools/memprof/parse.lua            | 188 +++++++++++++++++++++++++++++
 tools/utils/bufread.lua            | 147 ++++++++++++++++++++++
 tools/utils/symtab.lua             |  89 ++++++++++++++
 8 files changed, 757 insertions(+), 4 deletions(-)
 create mode 100755 test/misclib-memprof-lapi.test.lua
 create mode 100755 tools/luajit-parse-memprof
 create mode 100644 tools/memprof.lua
 create mode 100644 tools/memprof/humanize.lua
 create mode 100644 tools/memprof/parse.lua
 create mode 100644 tools/utils/bufread.lua
 create mode 100644 tools/utils/symtab.lua

diff --git a/Makefile b/Makefile
index 4a56917..ba4aa2f 100644
--- a/Makefile
+++ b/Makefile
@@ -37,6 +37,9 @@ INSTALL_INC=   $(DPREFIX)/include/luajit-$(MAJVER).$(MINVER)
 
 INSTALL_LJLIBD= $(INSTALL_SHARE)/luajit-$(VERSION)
 INSTALL_JITLIB= $(INSTALL_LJLIBD)/jit
+INSTALL_UTILSLIB= $(INSTALL_LJLIBD)/utils
+INSTALL_MEMPROFLIB= $(INSTALL_LJLIBD)/memprof
+INSTALL_TOOLSLIB= $(INSTALL_LJLIBD)
 INSTALL_LMODD= $(INSTALL_SHARE)/lua
 INSTALL_LMOD= $(INSTALL_LMODD)/$(ABIVER)
 INSTALL_CMODD= $(INSTALL_LIB)/lua
@@ -54,6 +57,8 @@ INSTALL_DYLIBSHORT1= libluajit-$(ABIVER).dylib
 INSTALL_DYLIBSHORT2= libluajit-$(ABIVER).$(MAJVER).dylib
 INSTALL_DYLIBNAME= libluajit-$(ABIVER).$(MAJVER).$(MINVER).$(RELVER).dylib
 INSTALL_PCNAME= luajit.pc
+INSTALL_TMEMPROFNAME= luajit-$(VERSION)-parse-memprof
+INSTALL_TMEMPROFSYMNAME= luajit-parse-memprof
 
 INSTALL_STATIC= $(INSTALL_LIB)/$(INSTALL_ANAME)
 INSTALL_DYN= $(INSTALL_LIB)/$(INSTALL_SONAME)
@@ -62,11 +67,15 @@ INSTALL_SHORT2= $(INSTALL_LIB)/$(INSTALL_SOSHORT2)
 INSTALL_T= $(INSTALL_BIN)/$(INSTALL_TNAME)
 INSTALL_TSYM= $(INSTALL_BIN)/$(INSTALL_TSYMNAME)
 INSTALL_PC= $(INSTALL_PKGCONFIG)/$(INSTALL_PCNAME)
+INSTALL_TMEMPROF= $(INSTALL_BIN)/$(INSTALL_TMEMPROFNAME)
+INSTALL_TMEMPROFSYM= $(INSTALL_BIN)/$(INSTALL_TMEMPROFSYMNAME)
 
 INSTALL_DIRS= $(INSTALL_BIN) $(INSTALL_LIB) $(INSTALL_INC) $(INSTALL_MAN) \
-  $(INSTALL_PKGCONFIG) $(INSTALL_JITLIB) $(INSTALL_LMOD) $(INSTALL_CMOD)
+  $(INSTALL_PKGCONFIG) $(INSTALL_JITLIB) $(INSTALL_LMOD) $(INSTALL_CMOD) \
+  $(INSTALL_UTILSLIB) $(INSTALL_MEMPROFLIB) $(INSTALL_TOOLSLIB)
 UNINSTALL_DIRS= $(INSTALL_JITLIB) $(INSTALL_LJLIBD) $(INSTALL_INC) \
-  $(INSTALL_LMOD) $(INSTALL_LMODD) $(INSTALL_CMOD) $(INSTALL_CMODD)
+  $(INSTALL_LMOD) $(INSTALL_LMODD) $(INSTALL_CMOD) $(INSTALL_CMODD) \
+  $(INSTALL_UTILSLIB) $(INSTALL_MEMPROFLIB) $(INSTALL_TOOLSLIB)
 
 RM= rm -f
 MKDIR= mkdir -p
@@ -78,6 +87,8 @@ UNINSTALL= $(RM)
 LDCONFIG= ldconfig -n
 SED_PC= sed -e "s|^prefix=.*|prefix=$(PREFIX)|" \
             -e "s|^multilib=.*|multilib=$(MULTILIB)|"
+SED_TMEMPROF= sed -e "s|^TOOL_DIR=.*|TOOL_DIR=$(INSTALL_TOOLSLIB)|" \
+                  -e "s|^LUAJIT_BIN=.*|LUAJIT_BIN=$(INSTALL_T)|"
 
 FILE_T= luajit
 FILE_A= libluajit.a
@@ -89,6 +100,10 @@ FILES_JITLIB= bc.lua bcsave.lua dump.lua p.lua v.lua zone.lua \
 	      dis_x86.lua dis_x64.lua dis_arm.lua dis_arm64.lua \
 	      dis_arm64be.lua dis_ppc.lua dis_mips.lua dis_mipsel.lua \
 	      dis_mips64.lua dis_mips64el.lua vmdef.lua
+FILES_UTILSLIB= bufread.lua symtab.lua
+FILES_MEMPROFLIB= parse.lua humanize.lua
+FILES_TOOLSLIB= memprof.lua
+FILE_TMEMPROF= luajit-parse-memprof
 
 ifeq (,$(findstring Windows,$(OS)))
   HOST_SYS:= $(shell uname -s)
@@ -130,21 +145,37 @@ install: $(INSTALL_DEP)
 	  $(RM) $(FILE_PC).tmp
 	cd src && $(INSTALL_F) $(FILES_INC) $(INSTALL_INC)
 	cd src/jit && $(INSTALL_F) $(FILES_JITLIB) $(INSTALL_JITLIB)
+	cd tools/utils && $(INSTALL_F) $(FILES_UTILSLIB) $(INSTALL_UTILSLIB)
+	cd tools/memprof && $(INSTALL_F) $(FILES_MEMPROFLIB) $(INSTALL_MEMPROFLIB)
+	cd tools && $(INSTALL_F) $(FILES_TOOLSLIB) $(INSTALL_TOOLSLIB)
+	cd tools && $(SED_TMEMPROF) $(FILE_TMEMPROF) > $(FILE_TMEMPROF).tmp && \
+	  $(INSTALL_X) $(FILE_TMEMPROF).tmp $(INSTALL_TMEMPROF) && \
+	  $(RM) $(FILE_TMEMPROF).tmp
 	@echo "==== Successfully installed LuaJIT $(VERSION) to $(PREFIX) ===="
 	@echo ""
 	@echo "Note: the development releases deliberately do NOT install a symlink for luajit"
-	@echo "You can do this now by running this command (with sudo):"
+	@echo "You can do this now by running these commands (with sudo):"
 	@echo ""
 	@echo "  $(SYMLINK) $(INSTALL_TNAME) $(INSTALL_TSYM)"
+	@echo "  $(SYMLINK) $(INSTALL_TMEMPROFNAME) $(INSTALL_TMEMPROFSYM)"
 	@echo ""
 
 
 uninstall:
 	@echo "==== Uninstalling LuaJIT $(VERSION) from $(PREFIX) ===="
-	$(UNINSTALL) $(INSTALL_T) $(INSTALL_STATIC) $(INSTALL_DYN) $(INSTALL_SHORT1) $(INSTALL_SHORT2) $(INSTALL_MAN)/$(FILE_MAN) $(INSTALL_PC)
+	$(UNINSTALL) $(INSTALL_T) $(INSTALL_STATIC) $(INSTALL_DYN) $(INSTALL_SHORT1) $(INSTALL_SHORT2) $(INSTALL_MAN)/$(FILE_MAN) $(INSTALL_PC) $(INSTALL_TMEMPROF)
 	for file in $(FILES_JITLIB); do \
 	  $(UNINSTALL) $(INSTALL_JITLIB)/$$file; \
 	  done
+	for file in $(FILES_UTILSLIB); do \
+	  $(UNINSTALL) $(INSTALL_UTILSLIB)/$$file; \
+	  done
+	for file in $(FILES_MEMPROFLIB); do \
+	  $(UNINSTALL) $(INSTALL_MEMPROFLIB)/$$file; \
+	  done
+	for file in $(FILES_TOOLSLIB); do \
+	  $(UNINSTALL) $(INSTALL_TOOLSLIB)/$$file; \
+	  done
 	for file in $(FILES_INC); do \
 	  $(UNINSTALL) $(INSTALL_INC)/$$file; \
 	  done
diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
new file mode 100755
index 0000000..e02c6fa
--- /dev/null
+++ b/test/misclib-memprof-lapi.test.lua
@@ -0,0 +1,135 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+
+local test = tap.test("misc-memprof-lapi")
+test:plan(9)
+
+jit.off()
+jit.flush()
+
+-- FIXME: Launch tests with LUA_PATH enviroment variable.
+local path = arg[0]:gsub('/[^/]+%.test%.lua', '')
+local path_suffix = '../tools/?.lua;'
+package.path = ('%s/%s;'):format(path, path_suffix)..package.path
+
+local table_new = require "table.new"
+
+local bufread = require "utils.bufread"
+local memprof = require "memprof.parse"
+local symtab = require "utils.symtab"
+
+local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
+local BAD_PATH = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
+
+local function payload()
+  -- Preallocate table to avoid array part reallocations.
+  local _ = table_new(100, 0)
+
+  -- Want too see 100 objects here.
+  for i = 1, 100 do
+    -- Try to avoid crossing with "test" module objects.
+    _[i] = "memprof-str-"..i
+  end
+
+  _ = nil
+  -- VMSTATE == GC, reported as INTERNAL.
+  collectgarbage()
+end
+
+local function generate_output(filename)
+  -- Clean up all garbage to avoid polution of free.
+  collectgarbage()
+
+  local res, err = misc.memprof.start(filename)
+  -- Should start succesfully.
+  assert(res, err)
+
+  payload()
+
+  res, err = misc.memprof.stop()
+  -- Should stop succesfully.
+  assert(res, err)
+end
+
+local function fill_ev_type(events, symbols, event_type)
+  local ev_type = {}
+  for _, event in pairs(events[event_type]) do
+    local addr = event.loc.addr
+    if addr == 0 then
+      ev_type.INTERNAL = {
+        name = "INTERNAL",
+        num = event.num,
+    }
+    elseif symbols[addr] then
+      ev_type[event.loc.line] = {
+        name = symbols[addr].name,
+        num = event.num,
+      }
+    end
+  end
+  return ev_type
+end
+
+local function check_alloc_report(alloc, line, function_line, nevents)
+  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
+  assert(alloc[line].num == nevents, ("got=%d, ecpected=%d"):format(
+    alloc[line].num,
+    nevents
+  ))
+  return true
+end
+
+-- Not a directory.
+local res, err, errno = misc.memprof.start(BAD_PATH)
+test:ok(res == nil and err:match("Not a directory"))
+test:ok(type(errno) == "number")
+
+-- Profiler is running.
+res, err = misc.memprof.start(TMP_BINFILE)
+assert(res, err)
+res, err, errno = misc.memprof.start(TMP_BINFILE)
+test:ok(res == nil and err:match("profiler is running already"))
+test:ok(type(errno) == "number")
+
+res, err = misc.memprof.stop()
+assert(res, err)
+
+-- Profiler is not running.
+res, err, errno = misc.memprof.stop()
+test:ok(res == nil and err:match("profiler is not running"))
+test:ok(type(errno) == "number")
+
+-- Test profiler output and parse.
+res, err = pcall(generate_output, TMP_BINFILE)
+
+-- Want to cleanup carefully if something went wrong.
+if not res then
+  os.remove(TMP_BINFILE)
+  error(err)
+end
+
+local reader = bufread.new(TMP_BINFILE)
+local symbols = symtab.parse(reader)
+local events = memprof.parse(reader, symbols)
+
+-- We don't need it any more.
+os.remove(TMP_BINFILE)
+
+local alloc = fill_ev_type(events, symbols, "alloc")
+local free = fill_ev_type(events, symbols, "free")
+
+-- Check allocation reports. The second argument is a line number
+-- of the allocation event itself. The third is a line number of
+-- the corresponding function definition. The last one is
+-- the number of allocations.
+-- 1 event - alocation of table by itself + 1 allocation
+-- of array part as far it is bigger than LJ_MAX_COLOSIZE (16).
+test:ok(check_alloc_report(alloc, 27, 25, 2))
+-- 100 strings allocations.
+test:ok(check_alloc_report(alloc, 32, 25, 100))
+
+-- Collect all previous allocated objects.
+test:ok(free.INTERNAL.num == 102)
+
+os.exit(test:check() and 0 or 1)
diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
new file mode 100755
index 0000000..c814301
--- /dev/null
+++ b/tools/luajit-parse-memprof
@@ -0,0 +1,9 @@
+#!/bin/bash
+#
+# Launcher for memprof parser.
+
+# This two variables are replaced on installing.
+TOOL_DIR=$(dirname `readlink -f $0`)
+LUAJIT_BIN=$TOOL_DIR/../src/luajit
+
+LUA_PATH="$TOOL_DIR/?.lua;;" $LUAJIT_BIN $TOOL_DIR/memprof.lua $@
diff --git a/tools/memprof.lua b/tools/memprof.lua
new file mode 100644
index 0000000..7476757
--- /dev/null
+++ b/tools/memprof.lua
@@ -0,0 +1,109 @@
+-- A tool for parsing and visualisation of LuaJIT's memory
+-- profiler output.
+--
+-- TODO:
+-- * Think about callgraph memory profiling for complex
+--   table reallocations
+-- * Nicer output, probably an HTML view
+-- * Demangling of C symbols
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bufread = require "utils.bufread"
+local memprof = require "memprof.parse"
+local symtab  = require "utils.symtab"
+local view    = require "memprof.humanize"
+
+local stdout, stderr = io.stdout, io.stderr
+local match, gmatch = string.match, string.gmatch
+
+-- Program options.
+local opt_map = {}
+
+function opt_map.help()
+  stdout:write [[
+luajit-parse-memprof - parser of the memory usage profile collected
+                       with LuaJIT's memprof.
+
+SYNOPSIS
+
+luajit-parse-memprof [options] memprof.bin
+
+Supported options are:
+
+  --help                            Show this help and exit
+]]
+  os.exit(0)
+end
+
+-- Print error and exit with error status.
+local function opterror(...)
+  stderr:write("luajit-parse-memprof.lua: ERROR: ", ...)
+  stderr:write("\n")
+  os.exit(1)
+end
+
+-- Parse single option.
+local function parseopt(opt, args)
+  local opt_current = #opt == 1 and "-"..opt or "--"..opt
+  local f = opt_map[opt]
+  if not f then
+    opterror("unrecognized option `", opt_current, "'. Try `--help'.\n")
+  end
+  f(args)
+end
+
+-- Parse arguments.
+local function parseargs(args)
+  -- Process all option arguments.
+  args.argn = 1
+  repeat
+    local a = args[args.argn]
+    if not a then
+      break
+    end
+    local lopt, opt = match(a, "^%-(%-?)(.+)")
+    if not opt then
+      break
+    end
+    args.argn = args.argn + 1
+    if lopt == "" then
+      -- Loop through short options.
+      for o in gmatch(opt, ".") do
+        parseopt(o, args)
+      end
+    else
+      -- Long option.
+      parseopt(opt, args)
+    end
+  until false
+
+  -- Check for proper number of arguments.
+  local nargs = #args - args.argn + 1
+  if nargs ~= 1 then
+    opt_map.help()
+  end
+
+  -- Translate a single input file.
+  -- TODO: Handle multiple files?
+  return args[args.argn]
+end
+
+local inputfile = parseargs{...}
+
+local reader  = bufread.new(inputfile)
+local symbols = symtab.parse(reader)
+local events  = memprof.parse(reader, symbols)
+
+stdout:write("ALLOCATIONS", "\n")
+view.render(events.alloc, symbols)
+stdout:write("\n")
+
+stdout:write("REALLOCATIONS", "\n")
+view.render(events.realloc, symbols)
+stdout:write("\n")
+
+stdout:write("DEALLOCATIONS", "\n")
+view.render(events.free, symbols)
+stdout:write("\n")
diff --git a/tools/memprof/humanize.lua b/tools/memprof/humanize.lua
new file mode 100644
index 0000000..109a39d
--- /dev/null
+++ b/tools/memprof/humanize.lua
@@ -0,0 +1,45 @@
+-- Simple human-readable renderer of LuaJIT's memprof profile.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local symtab = require "utils.symtab"
+
+local M = {}
+
+function M.render(events, symbols)
+  local ids = {}
+
+  for id, _ in pairs(events) do
+    table.insert(ids, id)
+  end
+
+  table.sort(ids, function(id1, id2)
+    return events[id1].num > events[id2].num
+  end)
+
+  for i = 1, #ids do
+    local event = events[ids[i]]
+    print(string.format("%s: %d\t%d\t%d",
+      symtab.demangle(symbols, event.loc),
+      event.num,
+      event.alloc,
+      event.free
+    ))
+
+    local prim_loc = {}
+    for _, loc in pairs(event.primary) do
+      table.insert(prim_loc, symtab.demangle(symbols, loc))
+    end
+    if #prim_loc ~= 0 then
+      table.sort(prim_loc)
+      print("\tOverrides:")
+      for j = 1, #prim_loc do
+        print(string.format("\t\t%s", prim_loc[j]))
+      end
+      print("")
+    end
+  end
+end
+
+return M
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
new file mode 100644
index 0000000..f4996f4
--- /dev/null
+++ b/tools/memprof/parse.lua
@@ -0,0 +1,188 @@
+-- Parser of LuaJIT's memprof binary stream.
+-- The format spec can be found in <src/lj_memprof.h>.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bit = require "bit"
+local band = bit.band
+local lshift = bit.lshift
+
+local string_format = string.format
+
+local LJM_MAGIC = "ljm"
+local LJM_CURRENT_VERSION = 1
+
+local LJM_EPILOGUE_HEADER = 0x80
+
+local AEVENT_ALLOC = 1
+local AEVENT_FREE = 2
+local AEVENT_REALLOC = 3
+
+local ASOURCE_INT = lshift(1, 2)
+local ASOURCE_LFUNC = lshift(2, 2)
+local ASOURCE_CFUNC = lshift(3, 2)
+
+local M = {}
+
+local function new_event(loc)
+  return {
+    loc = loc,
+    num = 0,
+    free = 0,
+    alloc = 0,
+    primary = {},
+  }
+end
+
+local function link_to_previous(heap, e, oaddr)
+  -- Memory at oaddr was allocated before we started tracking.
+  local heap_chunk = heap[oaddr]
+  if heap_chunk then
+    -- Save Lua code location (line) by address (id).
+    e.primary[heap_chunk[2]] = heap_chunk[3]
+  end
+end
+
+local function id_location(addr, line)
+  return string_format("f%#xl%d", addr, line), {
+    addr = addr,
+    line = line,
+  }
+end
+
+local function parse_location(reader, asource)
+  if asource == ASOURCE_INT then
+    return id_location(0, 0)
+  elseif asource == ASOURCE_CFUNC then
+    return id_location(reader:read_uleb128(), 0)
+  elseif asource == ASOURCE_LFUNC then
+    return id_location(reader:read_uleb128(), reader:read_uleb128())
+  end
+  error("Unknown asource "..asource)
+end
+
+local function parse_alloc(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local naddr = reader:read_uleb128()
+  local nsize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num = e.num + 1
+  e.alloc = e.alloc + nsize
+
+  heap[naddr] = {nsize, id, loc}
+end
+
+local function parse_realloc(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local oaddr = reader:read_uleb128()
+  local osize = reader:read_uleb128()
+  local naddr = reader:read_uleb128()
+  local nsize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num = e.num + 1
+  e.free = e.free + osize
+  e.alloc = e.alloc + nsize
+
+  link_to_previous(heap, e, oaddr)
+
+  heap[oaddr] = nil
+  heap[naddr] = {nsize, id, loc}
+end
+
+local function parse_free(reader, asource, events, heap)
+  local id, loc = parse_location(reader, asource)
+
+  local oaddr = reader:read_uleb128()
+  local osize = reader:read_uleb128()
+
+  if not events[id] then
+    events[id] = new_event(loc)
+  end
+  local e = events[id]
+  e.num = e.num + 1
+  e.free = e.free + osize
+
+  link_to_previous(heap, e, oaddr)
+
+  heap[oaddr] = nil
+end
+
+local parsers = {
+  [AEVENT_ALLOC] = {evname = "alloc", parse = parse_alloc},
+  [AEVENT_FREE] = {evname = "free", parse = parse_free},
+  [AEVENT_REALLOC] = {evname = "realloc", parse = parse_realloc},
+}
+
+local function ev_header_is_valid(evh)
+  return evh <= 0x0f or evh == LJM_EPILOGUE_HEADER
+end
+
+-- Splits event header into event type (aka aevent = allocation
+-- event) and event source (aka asource = allocation source).
+local function ev_header_split(evh)
+  return band(evh, 0x3), band(evh, lshift(0x3, 2))
+end
+
+local function parse_event(reader, events)
+  local ev_header = reader:read_octet()
+
+  assert(ev_header_is_valid(ev_header), "Bad ev_header "..ev_header)
+
+  if ev_header == LJM_EPILOGUE_HEADER then
+    return false
+  end
+
+  local aevent, asource = ev_header_split(ev_header)
+  local parser = parsers[aevent]
+
+  assert(parser, "Bad aevent "..aevent)
+
+  parser.parse(reader, asource, events[parser.evname], events.heap)
+
+  return true
+end
+
+function M.parse(reader)
+  local events = {
+    alloc = {},
+    realloc = {},
+    free = {},
+    heap = {},
+  }
+
+  local magic = reader:read_octets(3)
+  local version = reader:read_octets(1)
+  -- Dummy-consume reserved bytes.
+  local _ = reader:read_octets(3)
+
+  if magic ~= LJM_MAGIC then
+    error("Bad LJM format prologue: "..magic)
+  end
+
+  if string.byte(version) ~= LJM_CURRENT_VERSION then
+    error(string_format(
+      "LJM format version mismatch: the tool expects %d, but your data is %d",
+      LJM_CURRENT_VERSION,
+      string.byte(version)
+    ))
+  end
+
+  while parse_event(reader, events) do
+    -- Empty body.
+  end
+
+  return events
+end
+
+return M
diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua
new file mode 100644
index 0000000..873e06a
--- /dev/null
+++ b/tools/utils/bufread.lua
@@ -0,0 +1,147 @@
+-- An implementation of buffered reading data from
+-- an arbitrary binary file.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local assert = assert
+
+local ffi = require "ffi"
+local bit = require "bit"
+
+local ffi_C = ffi.C
+local band = bit.band
+
+local LINK_BIT = 0x80
+local PAYLOAD_MASK = 0x7f
+local SHIFT_STEP = 7
+
+-- 10 Mb.
+local BUFFER_SIZE = 10 * 1024 * 1024
+
+local M = {}
+
+ffi.cdef[[
+  void *memcpy(void *, const void *, size_t);
+
+  typedef struct FILE_ FILE;
+  FILE *fopen(const char *, const char *);
+  size_t fread(void *, size_t, size_t, FILE *);
+  int feof(FILE *);
+  int fclose(FILE *);
+]]
+
+local function _read_stream(reader, n)
+  local tail_size = reader._end - reader._pos
+
+  if tail_size >= n then
+    -- Enough data to satisfy the request of n bytes.
+    return true
+  end
+
+  -- Otherwise carry tail_size bytes from the end of the buffer
+  -- to the start and fill up free_size bytes with fresh data.
+  -- tail_size < n <= free_size (see assert below) ensures that
+  -- we don't copy overlapping memory regions.
+  -- reader._pos == 0 means filling buffer for the first time.
+
+  local free_size = reader._pos > 0 and reader._pos or n
+
+  assert(n <= free_size, "Internal buffer is large enough")
+
+  if tail_size ~= 0 then
+    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
+  end
+
+  local bytes_read = ffi_C.fread(
+    reader._buf + tail_size, 1, free_size, reader._file
+  )
+
+  reader._pos = 0
+  reader._end = tail_size + bytes_read
+
+  return reader._end - reader._pos >= n
+end
+
+function M.read_octet(reader)
+  if not _read_stream(reader, 1) then
+    return nil
+  end
+
+  local oct = reader._buf[reader._pos]
+  reader._pos = reader._pos + 1
+  return oct
+end
+
+function M.read_octets(reader, n)
+  if not _read_stream(reader, n) then
+    return nil
+  end
+
+  local octets = ffi.string(reader._buf + reader._pos, n)
+  reader._pos = reader._pos + n
+  return octets
+end
+
+function M.read_uleb128(reader)
+  local value = ffi.new("uint64_t", 0)
+  local shift = 0
+
+  repeat
+    local oct = M.read_octet(reader)
+
+    if oct == nil then
+      error(string.format("fread, errno: %d", ffi.errno()))
+    end
+
+    -- Alas, bit library works only with 32-bit arguments.
+    local oct_u64 = ffi.new("uint64_t", band(oct, PAYLOAD_MASK))
+    value = value + oct_u64 * (2 ^ shift)
+    shift = shift + SHIFT_STEP
+
+  until band(oct, LINK_BIT) == 0
+
+  return tonumber(value)
+end
+
+function M.read_string(reader)
+  local len = M.read_uleb128(reader)
+  return M.read_octets(reader, len)
+end
+
+function M.eof(reader)
+  local sys_feof = ffi_C.feof(reader._file)
+  if sys_feof == 0 then
+    return false
+  end
+  -- Otherwise return true only we have reached
+  -- the end of the buffer.
+  return reader._pos == reader._end
+end
+
+function M.new(fname)
+  local file = ffi_C.fopen(fname, "rb")
+  if file == nil then
+    error(string.format("fopen, errno: %d", ffi.errno()))
+  end
+
+  local finalizer = function(f)
+    if ffi_C.fclose(f) ~= 0 then
+      error(string.format("fclose, errno: %d", ffi.errno()))
+    end
+    ffi.gc(f, nil)
+  end
+
+  local reader = setmetatable({
+    _file = ffi.gc(file, finalizer),
+    _buf = ffi.new("uint8_t[?]", BUFFER_SIZE),
+    _pos = 0,
+    _end = 0,
+  }, {__index = M})
+
+  _read_stream(reader, BUFFER_SIZE)
+
+  return reader
+end
+
+return M
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
new file mode 100644
index 0000000..f3e5e31
--- /dev/null
+++ b/tools/utils/symtab.lua
@@ -0,0 +1,89 @@
+-- Parser of LuaJIT's symtab binary stream.
+-- The format spec can be found in <src/lj_symtab.h>.
+--
+-- Major portions taken verbatim or adapted from the LuaVela.
+-- Copyright (C) 2015-2019 IPONWEB Ltd.
+
+local bit = require "bit"
+
+local band = bit.band
+local string_format = string.format
+
+local LJS_MAGIC = "ljs"
+local LJS_CURRENT_VERSION = 1
+local LJS_EPILOGUE_HEADER = 0x80
+local LJS_SYMTYPE_MASK = 0x03
+
+local SYMTAB_LFUNC = 0
+
+local M = {}
+
+-- Parse a single entry in a symtab: lfunc symbol.
+local function parse_sym_lfunc(reader, symtab)
+  local sym_addr = reader:read_uleb128()
+  local sym_chunk = reader:read_string()
+  local sym_line = reader:read_uleb128()
+
+  symtab[sym_addr] = {
+    name = string_format("%s:%d", sym_chunk, sym_line),
+  }
+end
+
+local parsers = {
+  [SYMTAB_LFUNC] = parse_sym_lfunc,
+}
+
+function M.parse(reader)
+  local symtab = {}
+  local magic = reader:read_octets(3)
+  local version = reader:read_octets(1)
+
+  -- Dummy-consume reserved bytes.
+  local _ = reader:read_octets(3)
+
+  if magic ~= LJS_MAGIC then
+    error("Bad LJS format prologue: "..magic)
+  end
+
+  if string.byte(version) ~= LJS_CURRENT_VERSION then
+    error(string_format(
+         "LJS format version mismatch:"..
+         "the tool expects %d, but your data is %d",
+         LJS_CURRENT_VERSION,
+         string.byte(version)
+    ))
+
+  end
+
+  while not reader:eof() do
+    local header = reader:read_octet()
+    local is_final = band(header, LJS_EPILOGUE_HEADER) ~= 0
+
+    if is_final then
+      break
+    end
+
+    local sym_type = band(header, LJS_SYMTYPE_MASK)
+    if parsers[sym_type] then
+      parsers[sym_type](reader, symtab)
+    end
+  end
+
+  return symtab
+end
+
+function M.demangle(symtab, loc)
+  local addr = loc.addr
+
+  if addr == 0 then
+    return "INTERNAL"
+  end
+
+  if symtab[addr] then
+    return string_format("%s, line %d", symtab[addr].name, loc.line)
+  end
+
+  return string_format("CFUNC %#x", addr)
+end
+
+return M
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer Sergey Kaplun
@ 2020-12-25 21:42   ` Igor Munkin
  2020-12-26  9:32     ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-25 21:42 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! LGTM, except the several nits below.

On 25.12.20, Sergey Kaplun wrote:
> Most of the numeric data written by the memory profiler is encoded
> via LEB128 compression. This patch introduces the module for encoding
> and decoding 64bit number to LEB128 form.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>   - Removed reader funciton's parameter named guard.
>   - Code style fixes.
> 
>  src/Makefile          |   3 +-
>  src/Makefile.dep      |   7 ++-
>  src/lj_utils.h        |  58 +++++++++++++++++++
>  src/lj_utils_leb128.c | 132 ++++++++++++++++++++++++++++++++++++++++++
>  src/ljamalg.c         |   1 +
>  5 files changed, 197 insertions(+), 4 deletions(-)
>  create mode 100644 src/lj_utils.h
>  create mode 100644 src/lj_utils_leb128.c
> 
> diff --git a/src/Makefile b/src/Makefile
> index 2786348..dc2ddb6 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -466,6 +466,7 @@ endif
>  DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
>  DASM_DASC= vm_$(DASM_ARCH).dasc
>  
> +UTILS_O= lj_utils_leb128.o

Minor: I personally believe this is excess and you can simply move this
object file to LJCORE_O list. BUILDVM_O is built with another toolchain;
LJLIB_O is used for generating auxiliary headers with buildvm.
Everything else is mentioned in LJCORE_O.

>  BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
>  	   host/buildvm_lib.o host/buildvm_fold.o
>  BUILDVM_T= host/buildvm

<snipped>

> diff --git a/src/lj_utils.h b/src/lj_utils.h
> new file mode 100644
> index 0000000..1671e8e
> --- /dev/null
> +++ b/src/lj_utils.h
> @@ -0,0 +1,58 @@

<snipped>

> +/*
> +** Reads a value from a buffer of bytes to a int64_t output.

Typo: s/a int64_t/an int64_t/g.

<g> flag means the note relates to all comments below.

> +** No bounds checks for the buffer. Returns number of bytes read.
> +*/

<snipped>

> +/*
> +** Writes a value from an signed 64-bit input to a buffer of bytes.

Typo: s/an signed/a signed/.

> +** No bounds checks for the buffer. Returns number of bytes written.
> +*/

<snipped>

> diff --git a/src/lj_utils_leb128.c b/src/lj_utils_leb128.c
> new file mode 100644
> index 0000000..ce8081b
> --- /dev/null
> +++ b/src/lj_utils_leb128.c
> @@ -0,0 +1,132 @@

<snipped>

> +#define LINK_BIT               (0x80)
> +#define MIN_TWOBYTE_VALUE      (0x80)
> +#define PAYLOAD_MASK           (0x7f)
> +#define SHIFT_STEP             (7)
> +#define LEB_SIGN_BIT           (0x40)

Typo: Why did you change the whitespace here? Everything was OK with it
in the previous version.

> +

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer
  2020-12-25 21:42   ` Igor Munkin
@ 2020-12-26  9:32     ` Sergey Kaplun
  2020-12-26 13:57       ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-26  9:32 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review.

On 26.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM, except the several nits below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > Most of the numeric data written by the memory profiler is encoded
> > via LEB128 compression. This patch introduces the module for encoding
> > and decoding 64bit number to LEB128 form.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >   - Removed reader funciton's parameter named guard.
> >   - Code style fixes.
> > 
> >  src/Makefile          |   3 +-
> >  src/Makefile.dep      |   7 ++-
> >  src/lj_utils.h        |  58 +++++++++++++++++++
> >  src/lj_utils_leb128.c | 132 ++++++++++++++++++++++++++++++++++++++++++
> >  src/ljamalg.c         |   1 +
> >  5 files changed, 197 insertions(+), 4 deletions(-)
> >  create mode 100644 src/lj_utils.h
> >  create mode 100644 src/lj_utils_leb128.c
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index 2786348..dc2ddb6 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -466,6 +466,7 @@ endif
> >  DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
> >  DASM_DASC= vm_$(DASM_ARCH).dasc
> >  
> > +UTILS_O= lj_utils_leb128.o
> 
> Minor: I personally believe this is excess and you can simply move this
> object file to LJCORE_O list. BUILDVM_O is built with another toolchain;
> LJLIB_O is used for generating auxiliary headers with buildvm.
> Everything else is mentioned in LJCORE_O.

Reasonable. Dropped.

> 
> >  BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
> >  	   host/buildvm_lib.o host/buildvm_fold.o
> >  BUILDVM_T= host/buildvm
> 
> <snipped>
> 
> > diff --git a/src/lj_utils.h b/src/lj_utils.h
> > new file mode 100644
> > index 0000000..1671e8e
> > --- /dev/null
> > +++ b/src/lj_utils.h
> > @@ -0,0 +1,58 @@
> 
> <snipped>
> 
> > +/*
> > +** Reads a value from a buffer of bytes to a int64_t output.
> 
> Typo: s/a int64_t/an int64_t/g.

Fixed. Thanks.

> 
> <g> flag means the note relates to all comments below.

Copy that.

> 
> > +** No bounds checks for the buffer. Returns number of bytes read.
> > +*/
> 
> <snipped>
> 
> > +/*
> > +** Writes a value from an signed 64-bit input to a buffer of bytes.
> 
> Typo: s/an signed/a signed/.

Fixed. Thanks.

> 
> > +** No bounds checks for the buffer. Returns number of bytes written.
> > +*/
> 
> <snipped>
> 
> > diff --git a/src/lj_utils_leb128.c b/src/lj_utils_leb128.c
> > new file mode 100644
> > index 0000000..ce8081b
> > --- /dev/null
> > +++ b/src/lj_utils_leb128.c
> > @@ -0,0 +1,132 @@
> 
> <snipped>
> 
> > +#define LINK_BIT               (0x80)
> > +#define MIN_TWOBYTE_VALUE      (0x80)
> > +#define PAYLOAD_MASK           (0x7f)
> > +#define SHIFT_STEP             (7)
> > +#define LEB_SIGN_BIT           (0x40)
> 
> Typo: Why did you change the whitespace here? Everything was OK with it
> in the previous version.

My bad. Fixed.

> 
> > +
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

See the iterative patch below. Branch is force-pushed.

===================================================================
diff --git a/src/Makefile b/src/Makefile
index 3218dfd..ae4489d 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -472,7 +472,6 @@ endif
 DASM_FLAGS= $(DASM_XFLAGS) $(DASM_AFLAGS)
 DASM_DASC= vm_$(DASM_ARCH).dasc
 
-UTILS_O= lj_utils_leb128.o
 BUILDVM_O= host/buildvm.o host/buildvm_asm.o host/buildvm_peobj.o \
 	   host/buildvm_lib.o host/buildvm_fold.o
 BUILDVM_T= host/buildvm
@@ -502,7 +501,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o lj_wbuf.o \
 	  lj_asm.o lj_trace.o lj_gdbjit.o \
 	  lj_ctype.o lj_cdata.o lj_cconv.o lj_ccall.o lj_ccallback.o \
 	  lj_carith.o lj_clib.o lj_cparse.o \
-	  lj_lib.o lj_alloc.o $(UTILS_O) lib_aux.o \
+	  lj_lib.o lj_alloc.o lj_utils_leb128.o lib_aux.o \
 	  $(LJLIB_O) lib_init.o
 
 LJVMCORE_O= $(LJVM_O) $(LJCORE_O)
diff --git a/src/lj_utils.h b/src/lj_utils.h
index 1671e8e..63d6c84 100644
--- a/src/lj_utils.h
+++ b/src/lj_utils.h
@@ -14,13 +14,13 @@
 #define LEB128_U64_MAXSIZE 10
 
 /*
-** Reads a value from a buffer of bytes to a int64_t output.
+** Reads a value from a buffer of bytes to an int64_t output.
 ** No bounds checks for the buffer. Returns number of bytes read.
 */
 size_t LJ_FASTCALL lj_utils_read_leb128(int64_t *out, const uint8_t *buffer);
 
 /*
-** Reads a value from a buffer of bytes to a int64_t output. Consumes no more
+** Reads a value from a buffer of bytes to an int64_t output. Consumes no more
 ** than n bytes. No bounds checks for the buffer. Returns number of bytes
 ** read. If more than n bytes is about to be consumed, returns 0 without
 ** touching out.
@@ -29,13 +29,13 @@ size_t LJ_FASTCALL lj_utils_read_leb128_n(int64_t *out, const uint8_t *buffer,
 					  size_t n);
 
 /*
-** Reads a value from a buffer of bytes to a uint64_t output.
+** Reads a value from a buffer of bytes to an uint64_t output.
 ** No bounds checks for the buffer. Returns number of bytes read.
 */
 size_t LJ_FASTCALL lj_utils_read_uleb128(uint64_t *out, const uint8_t *buffer);
 
 /*
-** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
+** Reads a value from a buffer of bytes to an uint64_t output. Consumes no more
 ** than n bytes. No bounds checks for the buffer. Returns number of bytes
 ** read. If more than n bytes is about to be consumed, returns 0 without
 ** touching out.
@@ -44,7 +44,7 @@ size_t LJ_FASTCALL lj_utils_read_uleb128_n(uint64_t *out, const uint8_t *buffer,
 					   size_t n);
 
 /*
-** Writes a value from an signed 64-bit input to a buffer of bytes.
+** Writes a value from a signed 64-bit input to a buffer of bytes.
 ** No bounds checks for the buffer. Returns number of bytes written.
 */
 size_t LJ_FASTCALL lj_utils_write_leb128(uint8_t *buffer, int64_t value);
diff --git a/src/lj_utils_leb128.c b/src/lj_utils_leb128.c
index ce8081b..0d50b83 100644
--- a/src/lj_utils_leb128.c
+++ b/src/lj_utils_leb128.c
@@ -10,11 +10,11 @@
 
 #include "lj_utils.h"
 
-#define LINK_BIT               (0x80)
-#define MIN_TWOBYTE_VALUE      (0x80)
-#define PAYLOAD_MASK           (0x7f)
-#define SHIFT_STEP             (7)
-#define LEB_SIGN_BIT           (0x40)
+#define LINK_BIT          (0x80)
+#define MIN_TWOBYTE_VALUE (0x80)
+#define PAYLOAD_MASK      (0x7f)
+#define SHIFT_STEP        (7)
+#define LEB_SIGN_BIT      (0x40)
 
 /* ------------------------- Reading LEB128/ULEB128 ------------------------- */
 
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer
  2020-12-26  9:32     ` Sergey Kaplun
@ 2020-12-26 13:57       ` Sergey Kaplun
  2020-12-26 18:47         ` Sergey Ostanevich
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-26 13:57 UTC (permalink / raw)
  To: Igor Munkin, tarantool-patches

Sorry for my Runglish again.
Change back articles "an uint" -> "a uint".

See the iterative patch below. Branch is force-pushed.
===================================================================
diff --git a/src/lj_utils.h b/src/lj_utils.h
index 63d6c84..f5c1579 100644
--- a/src/lj_utils.h
+++ b/src/lj_utils.h
@@ -29,13 +29,13 @@ size_t LJ_FASTCALL lj_utils_read_leb128_n(int64_t *out, const uint8_t *buffer,
 					  size_t n);
 
 /*
-** Reads a value from a buffer of bytes to an uint64_t output.
+** Reads a value from a buffer of bytes to a uint64_t output.
 ** No bounds checks for the buffer. Returns number of bytes read.
 */
 size_t LJ_FASTCALL lj_utils_read_uleb128(uint64_t *out, const uint8_t *buffer);
 
 /*
-** Reads a value from a buffer of bytes to an uint64_t output. Consumes no more
+** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
 ** than n bytes. No bounds checks for the buffer. Returns number of bytes
 ** read. If more than n bytes is about to be consumed, returns 0 without
 ** touching out.
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module Sergey Kaplun
@ 2020-12-26 14:22   ` Igor Munkin
  2020-12-26 15:26     ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-26 14:22 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! LGTM, except the nits below.

On 25.12.20, Sergey Kaplun wrote:
> This patch introduces the standalone module for writing data to the
> file, socket or memory (and so on) via the special buffer.
> The module provides the API for buffer initial setup
> and its convenient usage.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>   - Removed custom memcpy.
>   - lj_wbuf_addn() fills buffer to the end first and then flushes.
>   - Changed assert in lj_wbuf_flush() to early return.
> 
>  src/Makefile     |   2 +-
>  src/Makefile.dep |  23 ++++----
>  src/lj_wbuf.c    | 141 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_wbuf.h    |  87 +++++++++++++++++++++++++++++
>  src/ljamalg.c    |   1 +
>  5 files changed, 242 insertions(+), 12 deletions(-)
>  create mode 100644 src/lj_wbuf.c
>  create mode 100644 src/lj_wbuf.h
> 

<snipped>

> diff --git a/src/Makefile.dep b/src/Makefile.dep
> index 75409bf..59ed450 100644
> --- a/src/Makefile.dep
> +++ b/src/Makefile.dep
> @@ -211,28 +211,29 @@ lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_vm.h lj_vmevent.h
>  lj_vmmath.o: lj_vmmath.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
>   lj_ir.h lj_vm.h
> +lj_wbuf.o: lj_wbuf.c lj_wbuf.h lj_def.h lua.h luaconf.h lj_utils.h
>  ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \

<snipped>

> - lj_utils_leb128.c lj_utils.h lib_aux.c lib_base.c lj_libdef.h lib_math.c \
> - lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
> - lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
> + lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c \
> + lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
> + lib_ffi.c lib_misc.c lib_init.c

Minor: Why this part is changed? If this is done on purpose, please move
this change to the previous patch. Otherwise, <make depends> looks buggy
and I propose to make these changes manually. Feel free to ignore.

>  luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h

<snipped>

> diff --git a/src/lj_wbuf.c b/src/lj_wbuf.c
> new file mode 100644
> index 0000000..8f090eb
> --- /dev/null
> +++ b/src/lj_wbuf.c
> @@ -0,0 +1,141 @@

<snipped>

> +  /*
> +  ** Very unlikely: We are told to write a large buffer at once.
> +  ** Buffer not belong to us so we must to pump data

Typo: s/Buffer not belong/Buffer doesn't belong/.

> +  ** through buffer.

Typo: s/through buffer/through the buffer/.

> +  */

<snipped>

> diff --git a/src/lj_wbuf.h b/src/lj_wbuf.h
> new file mode 100644
> index 0000000..58a109e
> --- /dev/null
> +++ b/src/lj_wbuf.h
> @@ -0,0 +1,87 @@

<snipped>

> +typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
> +
> +/* Write buffer. */
> +struct lj_wbuf {
> +  /*
> +  ** Buffer writer which will called at buffer write.

Typo: s/will called at buffer write/is called on the buffer flush/.

> +  ** Should return amount of written bytes on success or zero in case of error.
> +  ** *data should contain new buffer of size greater or equal to len.

If you consider the <len> as a size of the new buffer, then you need to
pass its pointer but not its value to the writer. Otherwise this
requirement has no sense IMHO.

> +  ** If *data == NULL stream stops.

Minor: It would be nice to describe the contract of this function near
its typedef above.

> +  */
> +  lj_wbuf_writer writer;
> +  /* Context for writer function. */
> +  void *ctx;
> +  /* Buffer size. */
> +  size_t size;
> +  /* Saved errno in case of error. */
> +  int saved_errno;

There is an empty padding to fit the alignment here. It's better to move
this field right prior to the <flags>.

> +  /* Start of buffer. */
> +  uint8_t *buf;
> +  /* Current position in buffer. */
> +  uint8_t *pos;
> +  /* Internal flags. */
> +  volatile uint8_t flags;
> +};
> +
> +/* Init buffer. */

Typo: s/Init buffer/Initialize the buffer/.

> +void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> +		  uint8_t *mem, size_t size);

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-26 14:22   ` Igor Munkin
@ 2020-12-26 15:26     ` Sergey Kaplun
  2020-12-26 19:03       ` Sergey Ostanevich
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-26 15:26 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 26.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM, except the nits below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces the standalone module for writing data to the
> > file, socket or memory (and so on) via the special buffer.
> > The module provides the API for buffer initial setup
> > and its convenient usage.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >   - Removed custom memcpy.
> >   - lj_wbuf_addn() fills buffer to the end first and then flushes.
> >   - Changed assert in lj_wbuf_flush() to early return.
> > 
> >  src/Makefile     |   2 +-
> >  src/Makefile.dep |  23 ++++----
> >  src/lj_wbuf.c    | 141 +++++++++++++++++++++++++++++++++++++++++++++++
> >  src/lj_wbuf.h    |  87 +++++++++++++++++++++++++++++
> >  src/ljamalg.c    |   1 +
> >  5 files changed, 242 insertions(+), 12 deletions(-)
> >  create mode 100644 src/lj_wbuf.c
> >  create mode 100644 src/lj_wbuf.h
> > 
> 
> <snipped>
> 
> > diff --git a/src/Makefile.dep b/src/Makefile.dep
> > index 75409bf..59ed450 100644
> > --- a/src/Makefile.dep
> > +++ b/src/Makefile.dep
> > @@ -211,28 +211,29 @@ lj_vmevent.o: lj_vmevent.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
> >   lj_vm.h lj_vmevent.h
> >  lj_vmmath.o: lj_vmmath.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
> >   lj_ir.h lj_vm.h
> > +lj_wbuf.o: lj_wbuf.c lj_wbuf.h lj_def.h lua.h luaconf.h lj_utils.h
> >  ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
> 
> <snipped>
> 
> > - lj_utils_leb128.c lj_utils.h lib_aux.c lib_base.c lj_libdef.h lib_math.c \
> > - lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
> > - lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
> > + lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c \
> > + lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
> > + lib_ffi.c lib_misc.c lib_init.c
> 
> Minor: Why this part is changed? If this is done on purpose, please move
> this change to the previous patch. Otherwise, <make depends> looks buggy
> and I propose to make these changes manually. Feel free to ignore.

<lj_utils.h> is included before in <lj_wbuf.c>, so it should be added
right after it, without duplicates later.

> 
> >  luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
> 
> <snipped>
> 
> > diff --git a/src/lj_wbuf.c b/src/lj_wbuf.c
> > new file mode 100644
> > index 0000000..8f090eb
> > --- /dev/null
> > +++ b/src/lj_wbuf.c
> > @@ -0,0 +1,141 @@
> 
> <snipped>
> 
> > +  /*
> > +  ** Very unlikely: We are told to write a large buffer at once.
> > +  ** Buffer not belong to us so we must to pump data
> 
> Typo: s/Buffer not belong/Buffer doesn't belong/.

Fixed. Thanks!

> 
> > +  ** through buffer.
> 
> Typo: s/through buffer/through the buffer/.

Fixed.

> 
> > +  */
> 
> <snipped>
> 
> > diff --git a/src/lj_wbuf.h b/src/lj_wbuf.h
> > new file mode 100644
> > index 0000000..58a109e
> > --- /dev/null
> > +++ b/src/lj_wbuf.h
> > @@ -0,0 +1,87 @@
> 
> <snipped>
> 
> > +typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
> > +
> > +/* Write buffer. */
> > +struct lj_wbuf {
> > +  /*
> > +  ** Buffer writer which will called at buffer write.
> 
> Typo: s/will called at buffer write/is called on the buffer flush/.

Fixed.

> 
> > +  ** Should return amount of written bytes on success or zero in case of error.
> > +  ** *data should contain new buffer of size greater or equal to len.
> 
> If you consider the <len> as a size of the new buffer, then you need to
> pass its pointer but not its value to the writer. Otherwise this
> requirement has no sense IMHO.

Yes, this is misleading comment. I rewrote it the following way:
| *data should contain a buffer of at least the initial size.

> 
> > +  ** If *data == NULL stream stops.
> 
> Minor: It would be nice to describe the contract of this function near
> its typedef above.

OK, moved it above.

> 
> > +  */
> > +  lj_wbuf_writer writer;
> > +  /* Context for writer function. */
> > +  void *ctx;
> > +  /* Buffer size. */
> > +  size_t size;
> > +  /* Saved errno in case of error. */
> > +  int saved_errno;
> 
> There is an empty padding to fit the alignment here. It's better to move
> this field right prior to the <flags>.

My bad! Thanks!

> 
> > +  /* Start of buffer. */
> > +  uint8_t *buf;
> > +  /* Current position in buffer. */
> > +  uint8_t *pos;
> > +  /* Internal flags. */
> > +  volatile uint8_t flags;
> > +};
> > +
> > +/* Init buffer. */
> 
> Typo: s/Init buffer/Initialize the buffer/.

Fixed!

> 
> > +void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> > +		  uint8_t *mem, size_t size);
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

See the iterative patch below. Branch is force-pushed.

===================================================================
diff --git a/src/lj_wbuf.c b/src/lj_wbuf.c
index 8f090eb..ef46545 100644
--- a/src/lj_wbuf.c
+++ b/src/lj_wbuf.c
@@ -82,8 +82,8 @@ void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n)
     return;
   /*
   ** Very unlikely: We are told to write a large buffer at once.
-  ** Buffer not belong to us so we must to pump data
-  ** through buffer.
+  ** Buffer doesn't belong to us so we must to pump data
+  ** through the buffer.
   */
   while (LJ_UNLIKELY(n > buf->size)) {
     const size_t left = wbuf_left(buf);
diff --git a/src/lj_wbuf.h b/src/lj_wbuf.h
index 58a109e..77a7cf4 100644
--- a/src/lj_wbuf.h
+++ b/src/lj_wbuf.h
@@ -31,32 +31,32 @@
 #define STREAM_ERR_IO 0x1
 #define STREAM_STOP   0x2
 
+/*
+** Buffer writer which is called on the buffer flush.
+** Should return amount of written bytes on success or zero in case of error.
+** *data should contain a buffer of at least the initial size.
+** If *data == NULL stream stops.
+*/
 typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
 
 /* Write buffer. */
 struct lj_wbuf {
-  /*
-  ** Buffer writer which will called at buffer write.
-  ** Should return amount of written bytes on success or zero in case of error.
-  ** *data should contain new buffer of size greater or equal to len.
-  ** If *data == NULL stream stops.
-  */
   lj_wbuf_writer writer;
   /* Context for writer function. */
   void *ctx;
   /* Buffer size. */
   size_t size;
-  /* Saved errno in case of error. */
-  int saved_errno;
   /* Start of buffer. */
   uint8_t *buf;
   /* Current position in buffer. */
   uint8_t *pos;
+  /* Saved errno in case of error. */
+  int saved_errno;
   /* Internal flags. */
   volatile uint8_t flags;
 };
 
-/* Init buffer. */
+/* Initialize the buffer. */
 void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
 		  uint8_t *mem, size_t size);
 
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer
  2020-12-26 13:57       ` Sergey Kaplun
@ 2020-12-26 18:47         ` Sergey Ostanevich
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-26 18:47 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

LGTM.

Sergos

> On 26 Dec 2020, at 16:57, Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org> wrote:
> 
> Sorry for my Runglish again.
> Change back articles "an uint" -> "a uint".
> 
> See the iterative patch below. Branch is force-pushed.
> ===================================================================
> diff --git a/src/lj_utils.h b/src/lj_utils.h
> index 63d6c84..f5c1579 100644
> --- a/src/lj_utils.h
> +++ b/src/lj_utils.h
> @@ -29,13 +29,13 @@ size_t LJ_FASTCALL lj_utils_read_leb128_n(int64_t *out, const uint8_t *buffer,
> 					  size_t n);
> 
> /*
> -** Reads a value from a buffer of bytes to an uint64_t output.
> +** Reads a value from a buffer of bytes to a uint64_t output.
> ** No bounds checks for the buffer. Returns number of bytes read.
> */
> size_t LJ_FASTCALL lj_utils_read_uleb128(uint64_t *out, const uint8_t *buffer);
> 
> /*
> -** Reads a value from a buffer of bytes to an uint64_t output. Consumes no more
> +** Reads a value from a buffer of bytes to a uint64_t output. Consumes no more
> ** than n bytes. No bounds checks for the buffer. Returns number of bytes
> ** read. If more than n bytes is about to be consumed, returns 0 without
> ** touching out.
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-26 15:26     ` Sergey Kaplun
@ 2020-12-26 19:03       ` Sergey Ostanevich
  2020-12-26 19:37         ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-26 19:03 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

Hi! 

Thanks for the patch!

Just one problem and LGTM otherwise.

Sergos

> --- a/src/lj_wbuf.h
> +++ b/src/lj_wbuf.h
> @@ -31,32 +31,32 @@
> #define STREAM_ERR_IO 0x1
> #define STREAM_STOP   0x2
> 
> +/*
> +** Buffer writer which is called on the buffer flush.
> +** Should return amount of written bytes on success or zero in case of error.
> +** *data should contain a buffer of at least the initial size.

Does it mean the writer’s provider should preserve the ‘initial size’ is some way?
Should it create two or more buffers of the same size during initialization?

> +** If *data == NULL stream stops.
> +*/
> typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
> 
> /* Write buffer. */
> struct lj_wbuf {
> -  /*
> -  ** Buffer writer which will called at buffer write.
> -  ** Should return amount of written bytes on success or zero in case of error.
> -  ** *data should contain new buffer of size greater or equal to len.
> -  ** If *data == NULL stream stops.
> -  */
>   lj_wbuf_writer writer;
>   /* Context for writer function. */
>   void *ctx;
>   /* Buffer size. */
>   size_t size;
> -  /* Saved errno in case of error. */
> -  int saved_errno;
>   /* Start of buffer. */
>   uint8_t *buf;
>   /* Current position in buffer. */
>   uint8_t *pos;
> +  /* Saved errno in case of error. */
> +  int saved_errno;
>   /* Internal flags. */
>   volatile uint8_t flags;
> };
> 
> -/* Init buffer. */
> +/* Initialize the buffer. */
> void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> 		  uint8_t *mem, size_t size);
> 
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun


[-- Attachment #2: Type: text/html, Size: 36531 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
@ 2020-12-26 19:07   ` Sergey Ostanevich
  2020-12-27 23:48   ` Igor Munkin
  1 sibling, 0 replies; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-26 19:07 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

LGTM.

Sergos

> On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
> separated from LJ_VMST_INERP. New VM states allow to determine the
> context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
> renamed to LJ_VMST_CFUNC for naming consistence with new VM states.
> 
> Also, this patch adjusts stack layout for x86 and x64 arches to save VM
> state for its consistency while stack unwinding when error is raised.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
> - Moved `.if not WIN` macro check inside (save|restore)_vmstate_through
> - Fixed naming: SAVE_UNUSED\d -> UNUSED\d
> 
> src/lj_frame.h     |  18 +++----
> src/lj_obj.h       |   4 +-
> src/lj_profile.c   |   5 +-
> src/luajit-gdb.py  |  14 ++---
> src/vm_arm.dasc    |   6 +--
> src/vm_arm64.dasc  |   6 +--
> src/vm_mips.dasc   |   6 +--
> src/vm_mips64.dasc |   6 +--
> src/vm_ppc.dasc    |   6 +--
> src/vm_x64.dasc    |  93 ++++++++++++++++++++++----------
> src/vm_x86.dasc    | 131 +++++++++++++++++++++++++++++----------------
> 11 files changed, 188 insertions(+), 107 deletions(-)
> 
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 19c49a4..2e693f9 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h
> @@ -127,13 +127,13 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
> #define CFRAME_SIZE		(16*4)
> #define CFRAME_SHIFT_MULTRES	0
> #else
> -#define CFRAME_OFS_ERRF		(15*4)
> -#define CFRAME_OFS_NRES		(14*4)
> -#define CFRAME_OFS_PREV		(13*4)
> -#define CFRAME_OFS_L		(12*4)
> +#define CFRAME_OFS_ERRF		(19*4)
> +#define CFRAME_OFS_NRES		(18*4)
> +#define CFRAME_OFS_PREV		(17*4)
> +#define CFRAME_OFS_L		(16*4)
> #define CFRAME_OFS_PC		(6*4)
> #define CFRAME_OFS_MULTRES	(5*4)
> -#define CFRAME_SIZE		(12*4)
> +#define CFRAME_SIZE		(16*4)
> #define CFRAME_SHIFT_MULTRES	0
> #endif
> #elif LJ_TARGET_X64
> @@ -152,11 +152,11 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
> #define CFRAME_OFS_NRES		(22*4)
> #define CFRAME_OFS_MULTRES	(21*4)
> #endif
> -#define CFRAME_SIZE		(10*8)
> +#define CFRAME_SIZE		(12*8)
> #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)
> #define CFRAME_SHIFT_MULTRES	0
> #else
> -#define CFRAME_OFS_PREV		(4*8)
> +#define CFRAME_OFS_PREV		(6*8)
> #if LJ_GC64
> #define CFRAME_OFS_PC		(3*8)
> #define CFRAME_OFS_L		(2*8)
> @@ -171,9 +171,9 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
> #define CFRAME_OFS_MULTRES	(1*4)
> #endif
> #if LJ_NO_UNWIND
> -#define CFRAME_SIZE		(12*8)
> +#define CFRAME_SIZE		(14*8)
> #else
> -#define CFRAME_SIZE		(10*8)
> +#define CFRAME_SIZE		(12*8)
> #endif
> #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 16)
> #define CFRAME_SHIFT_MULTRES	0
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 927b347..7fb715e 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -512,7 +512,9 @@ typedef struct GCtab {
> /* VM states. */
> enum {
>   LJ_VMST_INTERP,	/* Interpreter. */
> -  LJ_VMST_C,		/* C function. */
> +  LJ_VMST_LFUNC,	/* Lua function. */
> +  LJ_VMST_FFUNC,	/* Fast function. */
> +  LJ_VMST_CFUNC,	/* C function. */
>   LJ_VMST_GC,		/* Garbage collector. */
>   LJ_VMST_EXIT,		/* Trace exit handler. */
>   LJ_VMST_RECORD,	/* Trace recorder. */
> diff --git a/src/lj_profile.c b/src/lj_profile.c
> index 116998e..637e03c 100644
> --- a/src/lj_profile.c
> +++ b/src/lj_profile.c
> @@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
>     int st = g->vmstate;
>     ps->vmstate = st >= 0 ? 'N' :
> 		  st == ~LJ_VMST_INTERP ? 'I' :
> -		  st == ~LJ_VMST_C ? 'C' :
> +		  st == ~LJ_VMST_CFUNC ? 'C' :
> +		  /* Stubs for profiler hooks. */
> +		  st == ~LJ_VMST_FFUNC ? 'I' :
> +		  st == ~LJ_VMST_LFUNC ? 'I' :
> 		  st == ~LJ_VMST_GC ? 'G' : 'J';
>     g->hookmask = (mask | HOOK_PROFILE);
>     lj_dispatch_update(g);
> diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
> index 652c560..f1fd623 100644
> --- a/src/luajit-gdb.py
> +++ b/src/luajit-gdb.py
> @@ -206,12 +206,14 @@ def J(g):
> def vm_state(g):
>     return {
>         i2notu32(0): 'INTERP',
> -        i2notu32(1): 'C',
> -        i2notu32(2): 'GC',
> -        i2notu32(3): 'EXIT',
> -        i2notu32(4): 'RECORD',
> -        i2notu32(5): 'OPT',
> -        i2notu32(6): 'ASM',
> +        i2notu32(1): 'LFUNC',
> +        i2notu32(2): 'FFUNC',
> +        i2notu32(3): 'CFUNC',
> +        i2notu32(4): 'GC',
> +        i2notu32(5): 'EXIT',
> +        i2notu32(6): 'RECORD',
> +        i2notu32(7): 'OPT',
> +        i2notu32(8): 'ASM',
>     }.get(int(tou32(g['vmstate'])), 'TRACE')
> 
> def gc_state(g):
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index d4cdaf5..ae2efdf 100644
> --- a/src/vm_arm.dasc
> +++ b/src/vm_arm.dasc
> @@ -287,7 +287,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  str RB, L->base
>   |   ldr KBASE, SAVE_NRES
> -  |    mv_vmstate CARG4, C
> +  |    mv_vmstate CARG4, CFUNC
>   |   sub BASE, BASE, #8
>   |  subs CARG3, RC, #8
>   |   lsl KBASE, KBASE, #3		// KBASE = (nresults_wanted+1)*8
> @@ -348,7 +348,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov CRET1, CARG2
>   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
>   |  ldr L, SAVE_L
> -  |   mv_vmstate CARG4, C
> +  |   mv_vmstate CARG4, CFUNC
>   |  ldr GL:CARG3, L->glref
>   |   str CARG4, GL:CARG3->vmstate
>   |   str L, GL:CARG3->cur_L
> @@ -4487,7 +4487,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     if (op == BC_FUNCCW) {
>       |  ldr CARG2, CFUNC:CARG3->f
>     }
> -    |    mv_vmstate CARG3, C
> +    |    mv_vmstate CARG3, CFUNC
>     |  mov CARG1, L
>     |   bhi ->vm_growstack_c		// Need to grow stack.
>     |    st_vmstate CARG3
> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index 3eaf376..f783428 100644
> --- a/src/vm_arm64.dasc
> +++ b/src/vm_arm64.dasc
> @@ -332,7 +332,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  str RB, L->base
>   |   ldrsw CARG2, SAVE_NRES		// CARG2 = nresults+1.
> -  |    mv_vmstate TMP0w, C
> +  |    mv_vmstate TMP0w, CFUNC
>   |   sub BASE, BASE, #16
>   |  subs TMP2, RC, #8
>   |    st_vmstate TMP0w
> @@ -391,7 +391,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov CRET1, CARG2
>   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
>   |  ldr L, SAVE_L
> -  |   mv_vmstate TMP0w, C
> +  |   mv_vmstate TMP0w, CFUNC
>   |  ldr GL, L->glref
>   |   st_vmstate TMP0w
>   |  b ->vm_leave_unw
> @@ -3816,7 +3816,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     if (op == BC_FUNCCW) {
>       |  ldr CARG2, CFUNC:CARG3->f
>     }
> -    |    mv_vmstate TMP0w, C
> +    |    mv_vmstate TMP0w, CFUNC
>     |  mov CARG1, L
>     |   bhi ->vm_growstack_c		// Need to grow stack.
>     |    st_vmstate TMP0w
> diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc
> index 1afd611..ec57d78 100644
> --- a/src/vm_mips.dasc
> +++ b/src/vm_mips.dasc
> @@ -403,7 +403,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  addiu TMP1, RD, -8
>   |   sw TMP2, L->base
> -  |    li_vmstate C
> +  |    li_vmstate CFUNC
>   |   lw TMP2, SAVE_NRES
>   |   addiu BASE, BASE, -8
>   |    st_vmstate
> @@ -473,7 +473,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  move CRET1, CARG2
>   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
>   |  lw L, SAVE_L
> -  |   li TMP0, ~LJ_VMST_C
> +  |   li TMP0, ~LJ_VMST_CFUNC
>   |  lw GL:TMP1, L->glref
>   |  b ->vm_leave_unw
>   |.  sw TMP0, GL:TMP1->vmstate
> @@ -5085,7 +5085,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  sw BASE, L->base
>     |  sltu AT, TMP2, TMP1
>     |   sw RC, L->top
> -    |    li_vmstate C
> +    |    li_vmstate CFUNC
>     if (op == BC_FUNCCW) {
>       |  lw CARG2, CFUNC:RB->f
>     }
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index c06270a..9a749f9 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -449,7 +449,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  addiu TMP1, RD, -8
>   |   sd TMP2, L->base
> -  |    li_vmstate C
> +  |    li_vmstate CFUNC
>   |   lw TMP2, SAVE_NRES
>   |   daddiu BASE, BASE, -16
>   |    st_vmstate
> @@ -517,7 +517,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  move CRET1, CARG2
>   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
>   |  ld L, SAVE_L
> -  |   li TMP0, ~LJ_VMST_C
> +  |   li TMP0, ~LJ_VMST_CFUNC
>   |  ld GL:TMP1, L->glref
>   |  b ->vm_leave_unw
>   |.  sw TMP0, GL:TMP1->vmstate
> @@ -4952,7 +4952,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  sd BASE, L->base
>     |  sltu AT, TMP2, TMP1
>     |   sd RC, L->top
> -    |    li_vmstate C
> +    |    li_vmstate CFUNC
>     if (op == BC_FUNCCW) {
>       |  ld CARG2, CFUNC:RB->f
>     }
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index b4260eb..62e9b68 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -520,7 +520,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  // TMP0 = PC & FRAME_TYPE
>   |  cmpwi TMP0, FRAME_C
>   |   rlwinm TMP2, PC, 0, 0, 28
> -  |    li_vmstate C
> +  |    li_vmstate CFUNC
>   |   sub TMP2, BASE, TMP2		// TMP2 = previous base.
>   |  bney ->vm_returnp
>   |
> @@ -596,7 +596,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
>   |  lwz L, SAVE_L
>   |  .toc ld TOCREG, SAVE_TOC
> -  |   li TMP0, ~LJ_VMST_C
> +  |   li TMP0, ~LJ_VMST_CFUNC
>   |  lwz GL:TMP1, L->glref
>   |   stw TMP0, GL:TMP1->vmstate
>   |  b ->vm_leave_unw
> @@ -5060,7 +5060,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |   stp BASE, L->base
>     |   cmplw TMP1, TMP2
>     |    stp RC, L->top
> -    |     li_vmstate C
> +    |     li_vmstate CFUNC
>     |.if TOC
>     |  mtctr TMP3
>     |.else
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 80753e0..83cc3e1 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> @@ -140,7 +140,7 @@
> |//-----------------------------------------------------------------------
> |.else			// x64/POSIX stack layout
> |
> -|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
> +|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
> |.macro saveregs_
> |  push rbx; push r15; push r14
> |.if NO_UNWIND
> @@ -161,26 +161,29 @@
> |
> |//----- 16 byte aligned,
> |.if NO_UNWIND
> -|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*10]
> -|.define SAVE_R3,	aword [rsp+aword*9]
> -|.define SAVE_R2,	aword [rsp+aword*8]
> -|.define SAVE_R1,	aword [rsp+aword*7]
> -|.define SAVE_RU2,	aword [rsp+aword*6]
> -|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> +|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*12]
> +|.define SAVE_R3,	qword [rsp+qword*11]
> +|.define SAVE_R2,	qword [rsp+qword*10]
> +|.define SAVE_R1,	qword [rsp+qword*9]
> +|.define SAVE_RU2,	qword [rsp+qword*8]
> +|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.else
> -|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*8]
> -|.define SAVE_R3,	aword [rsp+aword*7]
> -|.define SAVE_R2,	aword [rsp+aword*6]
> -|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> +|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*10]
> +|.define SAVE_R3,	qword [rsp+qword*9]
> +|.define SAVE_R2,	qword [rsp+qword*8]
> +|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.endif
> -|.define SAVE_CFRAME,	aword [rsp+aword*4]
> -|.define SAVE_PC,	aword [rsp+aword*3]
> -|.define SAVE_L,	aword [rsp+aword*2]
> +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> +|.define UNUSED2,	qword [rsp+qword*5]
> +|.define UNUSED1,	dword [rsp+dword*8]
> +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> +|.define SAVE_PC,	qword [rsp+qword*3]
> +|.define SAVE_L,	qword [rsp+qword*2]
> |.define SAVE_ERRF,	dword [rsp+dword*3]
> |.define SAVE_NRES,	dword [rsp+dword*2]
> -|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
> +|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
> |//----- 16 byte aligned
> |
> |.define TMP1d,		dword [rsp]
> @@ -342,6 +345,22 @@
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> |.endmacro
> |
> +|// Save vmstate through register.
> +|.macro save_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> +|  mov SAVE_VMSTATE, reg
> +|.endif // WIN
> +|.endmacro
> +|
> +|// Restore vmstate through register.
> +|.macro restore_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, SAVE_VMSTATE
> +|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
> +|.endif // WIN
> +|.endmacro
> +|
> |.macro fpop1; fstp st1; .endmacro
> |
> |// Synthesize SSE FP constants.
> @@ -416,7 +435,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  jnz ->vm_returnp
>   |
>   |  // Return to C.
> -  |  set_vmstate C
> +  |  set_vmstate CFUNC
>   |  and PC, -8
>   |  sub PC, BASE
>   |  neg PC				// Previous base = BASE - delta.
> @@ -448,6 +467,8 @@ static void build_subroutines(BuildCtx *ctx)
>   |  xor eax, eax			// Ok return status for vm_pcall.
>   |
>   |->vm_leave_unw:
> +  |  // DISPATCH required to set properly.
> +  |  restore_vmstate_through RAd
>   |  restoreregs
>   |  ret
>   |
> @@ -493,7 +514,9 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov L:DISPATCH, SAVE_L
>   |  mov GL:RB, L:DISPATCH->glref
>   |  mov GL:RB->cur_L, L:DISPATCH
> -  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
> +  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
> +  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
> +  |  add DISPATCH, GG_G2DISP
>   |  jmp ->vm_leave_unw
>   |
>   |->vm_unwind_rethrow:
> @@ -521,7 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov [BASE-16], RA			// Prepend false to error message.
>   |  mov [BASE-8], RB
>   |  mov RA, -16			// Results start at BASE+RA = BASE-16.
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
>   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
>   |
>   |//-----------------------------------------------------------------------
> @@ -575,6 +598,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  lea KBASE, [esp+CFRAME_RESUME]
>   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
>   |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through TMPRd
>   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
>   |  mov SAVE_CFRAME, RD
>   |  mov SAVE_NRES, RDd
> @@ -585,7 +609,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  // Resume after yield (like a return).
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>   |  mov byte L:RB->status, RDL
>   |  mov BASE, L:RB->base
>   |  mov RD, L:RB->top
> @@ -622,11 +646,12 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov SAVE_CFRAME, KBASE
>   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
>   |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through RDd
>   |  mov L:RB->cframe, rsp
>   |
>   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
>   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
>   |  add PC, RA
>   |  sub PC, BASE			// PC = frame delta + frame type
> @@ -658,6 +683,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov SAVE_ERRF, 0			// No error function.
>   |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
>   |   add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through KBASEd
>   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
>   |
>   |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.
> @@ -697,6 +723,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  cleartp LFUNC:KBASE
>   |  mov KBASE, LFUNC:KBASE->pc
>   |  mov KBASE, [KBASE+PC2PROTO(k)]
> +  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
>   |  // BASE = base, RC = result, RB = meta base
>   |  jmp RA				// Jump to continuation.
>   |
> @@ -1137,15 +1164,16 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |.macro .ffunc, name
>   |->ff_ .. name:
> +  |  set_vmstate FFUNC
>   |.endmacro
>   |
>   |.macro .ffunc_1, name
> -  |->ff_ .. name:
> +  |  .ffunc name
>   |  cmp NARGS:RDd, 1+1;  jb ->fff_fallback
>   |.endmacro
>   |
>   |.macro .ffunc_2, name
> -  |->ff_ .. name:
> +  |  .ffunc name
>   |  cmp NARGS:RDd, 2+1;  jb ->fff_fallback
>   |.endmacro
>   |
> @@ -1578,7 +1606,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov L:PC, TMP1
>   |  mov BASE, L:RB->base
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>   |
>   |  cmp eax, LUA_YIELD
>   |  ja >8
> @@ -1717,6 +1745,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  movzx RAd, PC_RA
>   |  neg RA
>   |  lea BASE, [BASE+RA*8-16]		// base = base - (RA+2)*8
> +  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
>   |  ins_next
>   |
>   |6:  // Fill up results with nil.
> @@ -2481,7 +2510,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov KBASE, [KBASE+PC2PROTO(k)]
>   |  mov L:RB->base, BASE
>   |  mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0
> -  |  set_vmstate INTERP
> +  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
>   |  // Modified copy of ins_next which handles function header dispatch, too.
>   |  mov RCd, [PC]
>   |  movzx RAd, RCH
> @@ -2697,8 +2726,8 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov CARG1, CTSTATE
>   |  call extern lj_ccallback_enter	// (CTState *cts, void *cf)
>   |  // lua_State * returned in eax (RD).
> -  |  set_vmstate INTERP
>   |  mov BASE, L:RD->base
> +  |  set_vmstate LFUNC			// LFUNC after BASE restoration
>   |  mov RD, L:RD->top
>   |  sub RD, BASE
>   |  mov LFUNC:RB, [BASE-16]
> @@ -3974,6 +4003,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> 
>   case BC_CALL: case BC_CALLM:
>     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
>     if (op == BC_CALLM) {
>       |  add NARGS:RDd, MULTRES
>     }
> @@ -3995,6 +4025,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  mov LFUNC:RB, [RA-16]
>     |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
>     |->BC_CALLT_Z:
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
>     |  mov PC, [BASE-8]
>     |  test PCd, FRAME_TYPE
>     |  jnz >7
> @@ -4219,6 +4250,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  shl RAd, 3
>     }
>     |1:
> +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
>     |  mov PC, [BASE-8]
>     |  mov MULTRES, RDd			// Save nresults+1.
>     |  test PCd, FRAME_TYPE		// Check frame type marker.
> @@ -4260,6 +4292,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  cleartp LFUNC:KBASE
>     |  mov KBASE, LFUNC:KBASE->pc
>     |  mov KBASE, [KBASE+PC2PROTO(k)]
> +    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
>     |  ins_next
>     |
>     |6:  // Fill up results with nil.
> @@ -4551,6 +4584,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
>     |  mov KBASE, [PC-4+PC2PROTO(k)]
>     |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
>     |  lea RA, [BASE+RA*8]		// Top of frame.
>     |  cmp RA, L:RB->maxstack
>     |  ja ->vm_growstack_f
> @@ -4588,6 +4622,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  mov [RD-8], RB			// Store delta + FRAME_VARG.
>     |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
>     |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
>     |  lea RA, [RD+RA*8]
>     |  cmp RA, L:RB->maxstack
>     |  ja ->vm_growstack_v		// Need to grow stack.
> @@ -4643,7 +4678,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  mov CARG1, L:RB		// Caveat: CARG1 may be RA.
>     }
>     |  ja ->vm_growstack_c		// Need to grow stack.
> -    |  set_vmstate C
> +    |  set_vmstate CFUNC		// CFUNC before entering C function
>     if (op == BC_FUNCC) {
>       |  call KBASE			// (lua_State *L)
>     } else {
> @@ -4653,7 +4688,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  // nresults returned in eax (RD).
>     |  mov BASE, L:RB->base
>     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -    |  set_vmstate INTERP
> +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>     |  lea RA, [BASE+RD*8]
>     |  neg RA
>     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index d76fbe3..b9dffa9 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> @@ -140,7 +140,7 @@
> |
> |.else
> |
> -|.define CFRAME_SPACE,	aword*7			// Delta for esp (see <--).
> +|.define CFRAME_SPACE,	dword*11			// Delta for esp (see <--).
> |.macro saveregs_
> |  push edi; push esi; push ebx
> |  sub esp, CFRAME_SPACE
> @@ -183,25 +183,30 @@
> |.define ARG1,		aword [esp]		//<-- esp while in interpreter.
> |//----- 16 byte aligned, ^^^ arguments for C callee
> |.else
> -|.define SAVE_ERRF,	aword [esp+aword*15]	// vm_pcall/vm_cpcall only.
> -|.define SAVE_NRES,	aword [esp+aword*14]
> -|.define SAVE_CFRAME,	aword [esp+aword*13]
> -|.define SAVE_L,	aword [esp+aword*12]
> +|.define SAVE_ERRF,	dword [esp+dword*19]	// vm_pcall/vm_cpcall only.
> +|.define SAVE_NRES,	dword [esp+dword*18]
> +|.define SAVE_CFRAME,	dword [esp+dword*17]
> +|.define SAVE_L,	dword [esp+dword*16]
> |//----- 16 byte aligned, ^^^ arguments from C caller
> -|.define SAVE_RET,	aword [esp+aword*11]	//<-- esp entering interpreter.
> -|.define SAVE_R4,	aword [esp+aword*10]
> -|.define SAVE_R3,	aword [esp+aword*9]
> -|.define SAVE_R2,	aword [esp+aword*8]
> +|.define SAVE_RET,	dword [esp+dword*15]	//<-- esp entering interpreter.
> +|.define SAVE_R4,	dword [esp+dword*14]
> +|.define SAVE_R3,	dword [esp+dword*13]
> +|.define SAVE_R2,	dword [esp+dword*12]
> |//----- 16 byte aligned
> -|.define SAVE_R1,	aword [esp+aword*7]	//<-- esp after register saves.
> -|.define SAVE_PC,	aword [esp+aword*6]
> -|.define TMP2,		aword [esp+aword*5]
> -|.define TMP1,		aword [esp+aword*4]
> +|.define UNUSED3,	dword [esp+dword*11]
> +|.define UNUSED2,	dword [esp+dword*10]
> +|.define UNUSED1,	dword [esp+dword*9]
> +|.define SAVE_VMSTATE,	dword [esp+dword*8]
> |//----- 16 byte aligned
> -|.define ARG4,		aword [esp+aword*3]
> -|.define ARG3,		aword [esp+aword*2]
> -|.define ARG2,		aword [esp+aword*1]
> -|.define ARG1,		aword [esp]		//<-- esp while in interpreter.
> +|.define SAVE_R1,	dword [esp+dword*7]	//<-- esp after register saves.
> +|.define SAVE_PC,	dword [esp+dword*6]
> +|.define TMP2,		dword [esp+dword*5]
> +|.define TMP1,		dword [esp+dword*4]
> +|//----- 16 byte aligned
> +|.define ARG4,		dword [esp+dword*3]
> +|.define ARG3,		dword [esp+dword*2]
> +|.define ARG2,		dword [esp+dword*1]
> +|.define ARG1,		dword [esp]		//<-- esp while in interpreter.
> |//----- 16 byte aligned, ^^^ arguments for C callee
> |.endif
> |
> @@ -269,7 +274,7 @@
> |//-----------------------------------------------------------------------
> |.else			// x64/POSIX stack layout
> |
> -|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
> +|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
> |.macro saveregs_
> |  push rbx; push r15; push r14
> |.if NO_UNWIND
> @@ -290,33 +295,35 @@
> |
> |//----- 16 byte aligned,
> |.if NO_UNWIND
> -|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*10]
> -|.define SAVE_R3,	aword [rsp+aword*9]
> -|.define SAVE_R2,	aword [rsp+aword*8]
> -|.define SAVE_R1,	aword [rsp+aword*7]
> -|.define SAVE_RU2,	aword [rsp+aword*6]
> -|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> +|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*12]
> +|.define SAVE_R3,	qword [rsp+qword*11]
> +|.define SAVE_R2,	qword [rsp+qword*10]
> +|.define SAVE_R1,	qword [rsp+qword*9]
> +|.define SAVE_RU2,	qword [rsp+qword*8]
> +|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.else
> -|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
> -|.define SAVE_R4,	aword [rsp+aword*8]
> -|.define SAVE_R3,	aword [rsp+aword*7]
> -|.define SAVE_R2,	aword [rsp+aword*6]
> -|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
> +|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
> +|.define SAVE_R4,	qword [rsp+qword*10]
> +|.define SAVE_R3,	qword [rsp+qword*9]
> +|.define SAVE_R2,	qword [rsp+qword*8]
> +|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
> |.endif
> -|.define SAVE_CFRAME,	aword [rsp+aword*4]
> +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> +|.define UNUSED1,	qword [rsp+qword*5]
> +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> |.define SAVE_PC,	dword [rsp+dword*7]
> |.define SAVE_L,	dword [rsp+dword*6]
> |.define SAVE_ERRF,	dword [rsp+dword*5]
> |.define SAVE_NRES,	dword [rsp+dword*4]
> -|.define TMPa,		aword [rsp+aword*1]
> +|.define TMPa,		qword [rsp+qword*1]
> |.define TMP2,		dword [rsp+dword*1]
> |.define TMP1,		dword [rsp]		//<-- rsp while in interpreter.
> |//----- 16 byte aligned
> |
> |// TMPQ overlaps TMP1/TMP2. MULTRES overlaps TMP2 (and TMPQ).
> |.define TMPQ,		qword [rsp]
> -|.define TMP3,		dword [rsp+aword*1]
> +|.define TMP3,		dword [rsp+qword*1]
> |.define MULTRES,	TMP2
> |
> |.endif
> @@ -433,6 +440,22 @@
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> |.endmacro
> |
> +|// Save vmstate through register.
> +|.macro save_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> +|  mov SAVE_VMSTATE, reg
> +|.endif // WIN
> +|.endmacro
> +|
> +|// Restore vmstate through register.
> +|.macro restore_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, SAVE_VMSTATE
> +|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], reg
> +|.endif // WIN
> +|.endmacro
> +|
> |// x87 compares.
> |.macro fcomparepp			// Compare and pop st0 >< st1.
> |  fucomip st1
> @@ -520,7 +543,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  jnz ->vm_returnp
>   |
>   |  // Return to C.
> -  |  set_vmstate C
> +  |  set_vmstate CFUNC
>   |  and PC, -8
>   |  sub PC, BASE
>   |  neg PC				// Previous base = BASE - delta.
> @@ -559,6 +582,8 @@ static void build_subroutines(BuildCtx *ctx)
>   |  xor eax, eax			// Ok return status for vm_pcall.
>   |
>   |->vm_leave_unw:
> +  |  // DISPATCH required to set properly.
> +  |  restore_vmstate_through RA
>   |  restoreregs
>   |  ret
>   |
> @@ -613,7 +638,9 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov L:DISPATCH, SAVE_L
>   |  mov GL:RB, L:DISPATCH->glref
>   |  mov dword GL:RB->cur_L, L:DISPATCH
> -  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
> +  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
> +  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
> +  |  add DISPATCH, GG_G2DISP
>   |  jmp ->vm_leave_unw
>   |
>   |->vm_unwind_rethrow:
> @@ -647,7 +674,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov PC, [BASE-4]			// Fetch PC of previous frame.
>   |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
>   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
>   |
>   |.if WIN and not X64
> @@ -714,10 +741,11 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov RA, INARG_BASE			// Caveat: overlaps SAVE_CFRAME!
>   |.endif
>   |  mov PC, FRAME_CP
> -  |  xor RD, RD
>   |  lea KBASEa, [esp+CFRAME_RESUME]
>   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
>   |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through RD
> +  |  xor RD, RD
>   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
>   |  mov SAVE_CFRAME, RDa
>   |.if X64
> @@ -730,7 +758,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |  // Resume after yield (like a return).
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>   |  mov byte L:RB->status, RDL
>   |  mov BASE, L:RB->base
>   |  mov RD, L:RB->top
> @@ -774,6 +802,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov SAVE_CFRAME, KBASEa
>   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
>   |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through RD
>   |.if X64
>   |  mov L:RB->cframe, rsp
>   |.else
> @@ -782,7 +811,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
>   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
>   |  add PC, RA
>   |  sub PC, BASE			// PC = frame delta + frame type
> @@ -823,6 +852,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov SAVE_ERRF, 0			// No error function.
>   |  mov SAVE_NRES, KBASE		// Neg. delta means cframe w/o frame.
>   |   add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through KBASE
>   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
>   |
>   |.if X64
> @@ -885,6 +915,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov KBASE, LFUNC:KBASE->pc
>   |  mov KBASE, [KBASE+PC2PROTO(k)]
>   |  // BASE = base, RC = result, RB = meta base
> +  |  set_vmstate LFUNC			// LFUNC after KBASE restoration
>   |  jmp RAa				// Jump to continuation.
>   |
>   |.if FFI
> @@ -1409,15 +1440,16 @@ static void build_subroutines(BuildCtx *ctx)
>   |
>   |.macro .ffunc, name
>   |->ff_ .. name:
> +  |  set_vmstate FFUNC
>   |.endmacro
>   |
>   |.macro .ffunc_1, name
> -  |->ff_ .. name:
> +  |  .ffunc name
>   |  cmp NARGS:RD, 1+1;  jb ->fff_fallback
>   |.endmacro
>   |
>   |.macro .ffunc_2, name
> -  |->ff_ .. name:
> +  |  .ffunc name
>   |  cmp NARGS:RD, 2+1;  jb ->fff_fallback
>   |.endmacro
>   |
> @@ -1924,7 +1956,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |.endif
>   |  mov BASE, L:RB->base
>   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>   |
>   |  cmp eax, LUA_YIELD
>   |  ja >8
> @@ -2089,6 +2121,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  movzx RA, PC_RA
>   |  not RAa				// Note: ~RA = -(RA+1)
>   |  lea BASE, [BASE+RA*8]		// base = base - (RA+1)*8
> +  |  set_vmstate LFUNC			// LFUNC state after BASE restoration
>   |  ins_next
>   |
>   |6:  // Fill up results with nil.
> @@ -2933,7 +2966,7 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov KBASE, [KBASE+PC2PROTO(k)]
>   |  mov L:RB->base, BASE
>   |  mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
> -  |  set_vmstate INTERP
> +  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration
>   |  // Modified copy of ins_next which handles function header dispatch, too.
>   |  mov RC, [PC]
>   |  movzx RA, RCH
> @@ -3203,8 +3236,8 @@ static void build_subroutines(BuildCtx *ctx)
>   |  mov FCARG1, CTSTATE
>   |  call extern lj_ccallback_enter@8	// (CTState *cts, void *cf)
>   |  // lua_State * returned in eax (RD).
> -  |  set_vmstate INTERP
>   |  mov BASE, L:RD->base
> +  |  set_vmstate LFUNC			// LFUNC after BASE restoration
>   |  mov RD, L:RD->top
>   |  sub RD, BASE
>   |  mov LFUNC:RB, [BASE-8]
> @@ -4683,6 +4716,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> 
>   case BC_CALL: case BC_CALLM:
>     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
>     if (op == BC_CALLM) {
>       |  add NARGS:RD, MULTRES
>     }
> @@ -4706,6 +4740,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  cmp dword [RA-4], LJ_TFUNC
>     |  jne ->vmeta_call
>     |->BC_CALLT_Z:
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
>     |  mov PC, [BASE-4]
>     |  test PC, FRAME_TYPE
>     |  jnz >7
> @@ -4989,6 +5024,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |  shl RA, 3
>     }
>     |1:
> +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
>     |  mov PC, [BASE-4]
>     |  mov MULTRES, RD			// Save nresults+1.
>     |  test PC, FRAME_TYPE		// Check frame type marker.
> @@ -5043,6 +5079,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  mov LFUNC:KBASE, [BASE-8]
>     |  mov KBASE, LFUNC:KBASE->pc
>     |  mov KBASE, [KBASE+PC2PROTO(k)]
> +    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
>     |  ins_next
>     |
>     |6:  // Fill up results with nil.
> @@ -5330,6 +5367,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
>     |  mov KBASE, [PC-4+PC2PROTO(k)]
>     |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
>     |  lea RA, [BASE+RA*8]		// Top of frame.
>     |  cmp RA, L:RB->maxstack
>     |  ja ->vm_growstack_f
> @@ -5367,6 +5405,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  mov [RD-4], RB			// Store delta + FRAME_VARG.
>     |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
>     |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
>     |  lea RA, [RD+RA*8]
>     |  cmp RA, L:RB->maxstack
>     |  ja ->vm_growstack_v		// Need to grow stack.
> @@ -5431,7 +5470,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>       |.endif
>     }
>     |  ja ->vm_growstack_c		// Need to grow stack.
> -    |  set_vmstate C
> +    |  set_vmstate CFUNC		// CFUNC before entering C function
>     if (op == BC_FUNCC) {
>       |  call KBASEa			// (lua_State *L)
>     } else {
> @@ -5441,7 +5480,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>     |  // nresults returned in eax (RD).
>     |  mov BASE, L:RB->base
>     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -    |  set_vmstate INTERP
> +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
>     |  lea RA, [BASE+RD*8]
>     |  neg RA
>     |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field Sergey Kaplun
@ 2020-12-26 19:12   ` Sergey Ostanevich
  2020-12-26 19:42     ` Sergey Kaplun
  2020-12-27 13:09   ` Igor Munkin
  1 sibling, 1 reply; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-26 19:12 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Just some nits in the message, LGTM.

Sergos.

> On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> To determine currently allocating coroutine (that may not be equal to
> currently executed one) a new field called mem_L is added to
                               remove^^^^^^^                  ^ the
> global_State structure. This field is set on each allocation event and
> stores the coroutine address that is used for allocation.
> 
> Part of tarantool/tarantool#5442
> ---
> src/lj_gc.c  | 2 ++
> src/lj_obj.h | 1 +
> 2 files changed, 3 insertions(+)
> 
> diff --git a/src/lj_gc.c b/src/lj_gc.c
> index 44c8aa1..800fb2c 100644
> --- a/src/lj_gc.c
> +++ b/src/lj_gc.c
> @@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
> {
>   global_State *g = G(L);
>   lua_assert((osz == 0) == (p == NULL));
> +
> +  setgcref(g->mem_L, obj2gco(L));
>   p = g->allocf(g->allocd, p, osz, nsz);
>   if (p == NULL && nsz > 0)
>     lj_err_mem(L);
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 7fb715e..c94617d 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -649,6 +649,7 @@ typedef struct global_State {
>   BCIns bc_cfunc_int;	/* Bytecode for internal C function calls. */
>   BCIns bc_cfunc_ext;	/* Bytecode for external C function calls. */
>   GCRef cur_L;		/* Currently executing lua_State. */
> +  GCRef mem_L;		/* Currently allocating lua_State. */
>   MRef jit_base;	/* Current JIT code L->base or NULL. */
>   MRef ctype_state;	/* Pointer to C type state. */
>   GCRef gcroot[GCROOT_MAX];  /* GC roots. */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-26 19:03       ` Sergey Ostanevich
@ 2020-12-26 19:37         ` Sergey Kaplun
  2020-12-28  1:43           ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-26 19:37 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi!

Thanks for the review!

On 26.12.20, Sergey Ostanevich wrote:
> Hi! 
> 
> Thanks for the patch!
> 
> Just one problem and LGTM otherwise.
> 
> Sergos
> 
> > --- a/src/lj_wbuf.h
> > +++ b/src/lj_wbuf.h
> > @@ -31,32 +31,32 @@
> > #define STREAM_ERR_IO 0x1
> > #define STREAM_STOP   0x2
> > 
> > +/*
> > +** Buffer writer which is called on the buffer flush.
> > +** Should return amount of written bytes on success or zero in case of error.
> > +** *data should contain a buffer of at least the initial size.
> 
> Does it mean the writer’s provider should preserve the ‘initial size’ is some way?

Yes, inside `ctx`, for example.

> Should it create two or more buffers of the same size during initialization?

If necessary, but it is not required. This is the user's responsibility.

> 
> > +** If *data == NULL stream stops.
> > +*/
> > typedef size_t (*lj_wbuf_writer)(const void **data, size_t len, void *opt);
> > 
> > /* Write buffer. */
> > struct lj_wbuf {
> > -  /*
> > -  ** Buffer writer which will called at buffer write.
> > -  ** Should return amount of written bytes on success or zero in case of error.
> > -  ** *data should contain new buffer of size greater or equal to len.
> > -  ** If *data == NULL stream stops.
> > -  */
> >   lj_wbuf_writer writer;
> >   /* Context for writer function. */
> >   void *ctx;
> >   /* Buffer size. */
> >   size_t size;
> > -  /* Saved errno in case of error. */
> > -  int saved_errno;
> >   /* Start of buffer. */
> >   uint8_t *buf;
> >   /* Current position in buffer. */
> >   uint8_t *pos;
> > +  /* Saved errno in case of error. */
> > +  int saved_errno;
> >   /* Internal flags. */
> >   volatile uint8_t flags;
> > };
> > 
> > -/* Init buffer. */
> > +/* Initialize the buffer. */
> > void lj_wbuf_init(struct lj_wbuf *buf, lj_wbuf_writer writer, void *ctx,
> > 		  uint8_t *mem, size_t size);
> > 
> > ===================================================================
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field
  2020-12-26 19:12   ` Sergey Ostanevich
@ 2020-12-26 19:42     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-26 19:42 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi!

Thanks for the review!

On 26.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Just some nits in the message, LGTM.
> 
> Sergos.
> 
> > On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> > 
> > To determine currently allocating coroutine (that may not be equal to
> > currently executed one) a new field called mem_L is added to
>                                remove^^^^^^^                  ^ the

Reworded commit message.

> > global_State structure. This field is set on each allocation event and
> > stores the coroutine address that is used for allocation.
> > 
> > Part of tarantool/tarantool#5442
> > ---

<snipped>

> > 2.28.0
> > 
> 

Reworded commit message to the following. Branch is force-pushed.

===================================================================
core: introduce new mem_L field

To determine currently allocating coroutine (that may not be equal to
currently executed one) a new field mem_L is added to the
global_State structure. This field is set on each allocation event and
stores the coroutine address that is used for allocation.

Part of tarantool/tarantool#5442
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser Sergey Kaplun
@ 2020-12-26 22:56   ` Igor Munkin
  2020-12-27  7:16     ` Sergey Kaplun
  2020-12-27 13:24   ` Sergey Ostanevich
  1 sibling, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-26 22:56 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! LGTM, except the several nits below.

On 25.12.20, Sergey Kaplun wrote:
> This patch adds a parser for binary data dumped via the memory profiler. It is
> a set of the following Lua modules:
> * utils/bufread.lua: read binary data from a file.
> * utils/symtab.lua: symbol table decode functions
> * memprof/parse.lua: decode the memory profiler event stream
> * memprof/humanize.lua: display decoded data in human readable format
> * memprof.lua: Lua script to display data
> 
> There is also a stand-alone bash script <luajit-parse-memprof> that displays
> human readable parsed data to a stdout. It calls <memprof.lua> with a
> corresponding LUA_PATH.
> 
> Part of tarantool/tarantool#5442
> Part of tarantool/tarantool#5490
> ---
> 
> Changes in v2:
>   - Add (un)?install sections in Makefile
>   - Modify bash script correspondingly.
>   - Change Lua modules layout.
>   - Adjusted test. Check that errno returns in case of error is added.
>   - Code clean up.
> 
>  Makefile                           |  39 +++++-
>  test/misclib-memprof-lapi.test.lua | 135 +++++++++++++++++++++
>  tools/luajit-parse-memprof         |   9 ++
>  tools/memprof.lua                  | 109 +++++++++++++++++
>  tools/memprof/humanize.lua         |  45 +++++++
>  tools/memprof/parse.lua            | 188 +++++++++++++++++++++++++++++
>  tools/utils/bufread.lua            | 147 ++++++++++++++++++++++
>  tools/utils/symtab.lua             |  89 ++++++++++++++
>  8 files changed, 757 insertions(+), 4 deletions(-)
>  create mode 100755 test/misclib-memprof-lapi.test.lua
>  create mode 100755 tools/luajit-parse-memprof
>  create mode 100644 tools/memprof.lua
>  create mode 100644 tools/memprof/humanize.lua
>  create mode 100644 tools/memprof/parse.lua
>  create mode 100644 tools/utils/bufread.lua
>  create mode 100644 tools/utils/symtab.lua
> 
> diff --git a/Makefile b/Makefile
> index 4a56917..ba4aa2f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -37,6 +37,9 @@ INSTALL_INC=   $(DPREFIX)/include/luajit-$(MAJVER).$(MINVER)
>  
>  INSTALL_LJLIBD= $(INSTALL_SHARE)/luajit-$(VERSION)
>  INSTALL_JITLIB= $(INSTALL_LJLIBD)/jit
> +INSTALL_UTILSLIB= $(INSTALL_LJLIBD)/utils
> +INSTALL_MEMPROFLIB= $(INSTALL_LJLIBD)/memprof

Minor: It's better to use $(INSALL_TOOLSLIB) here, isn't it?

> +INSTALL_TOOLSLIB= $(INSTALL_LJLIBD)
>  INSTALL_LMODD= $(INSTALL_SHARE)/lua
>  INSTALL_LMOD= $(INSTALL_LMODD)/$(ABIVER)
>  INSTALL_CMODD= $(INSTALL_LIB)/lua

<snipped>

> diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> new file mode 100755
> index 0000000..e02c6fa
> --- /dev/null
> +++ b/test/misclib-memprof-lapi.test.lua
> @@ -0,0 +1,135 @@

<snipped>

> +local function check_alloc_report(alloc, line, function_line, nevents)
> +  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
> +  assert(alloc[line].num == nevents, ("got=%d, ecpected=%d"):format(

Typo: s/ecpected/expected/.

> +    alloc[line].num,
> +    nevents
> +  ))
> +  return true
> +end

<snipped>

> diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
> new file mode 100755
> index 0000000..c814301
> --- /dev/null
> +++ b/tools/luajit-parse-memprof
> @@ -0,0 +1,9 @@
> +#!/bin/bash
> +#
> +# Launcher for memprof parser.
> +
> +# This two variables are replaced on installing.
> +TOOL_DIR=$(dirname `readlink -f $0`)
> +LUAJIT_BIN=$TOOL_DIR/../src/luajit
> +
> +LUA_PATH="$TOOL_DIR/?.lua;;" $LUAJIT_BIN $TOOL_DIR/memprof.lua $@
> diff --git a/tools/memprof.lua b/tools/memprof.lua
> new file mode 100644
> index 0000000..7476757
> --- /dev/null
> +++ b/tools/memprof.lua
> @@ -0,0 +1,109 @@

<snipped>

> +local bufread = require "utils.bufread"
> +local memprof = require "memprof.parse"
> +local symtab  = require "utils.symtab"
> +local view    = require "memprof.humanize"

Typo: Still mess with whitespace.

> +

<snipped>

> +local reader  = bufread.new(inputfile)
> +local symbols = symtab.parse(reader)
> +local events  = memprof.parse(reader, symbols)

Typo: Still mess with whitespace.

> +

<snipped>

> diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> new file mode 100644
> index 0000000..f3e5e31
> --- /dev/null
> +++ b/tools/utils/symtab.lua
> @@ -0,0 +1,89 @@
> +-- Parser of LuaJIT's symtab binary stream.
> +-- The format spec can be found in <src/lj_symtab.h>.

I see no such file. I guess you mean <src/lj_memprof.h>.

> +--

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-26 22:56   ` Igor Munkin
@ 2020-12-27  7:16     ` Sergey Kaplun
  2020-12-28  5:30       ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27  7:16 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 27.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM, except the several nits below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch adds a parser for binary data dumped via the memory profiler. It is
> > a set of the following Lua modules:
> > * utils/bufread.lua: read binary data from a file.
> > * utils/symtab.lua: symbol table decode functions
> > * memprof/parse.lua: decode the memory profiler event stream
> > * memprof/humanize.lua: display decoded data in human readable format
> > * memprof.lua: Lua script to display data
> > 
> > There is also a stand-alone bash script <luajit-parse-memprof> that displays
> > human readable parsed data to a stdout. It calls <memprof.lua> with a
> > corresponding LUA_PATH.
> > 
> > Part of tarantool/tarantool#5442
> > Part of tarantool/tarantool#5490
> > ---
> > 
> > Changes in v2:
> >   - Add (un)?install sections in Makefile
> >   - Modify bash script correspondingly.
> >   - Change Lua modules layout.
> >   - Adjusted test. Check that errno returns in case of error is added.
> >   - Code clean up.
> > 
> >  Makefile                           |  39 +++++-
> >  test/misclib-memprof-lapi.test.lua | 135 +++++++++++++++++++++
> >  tools/luajit-parse-memprof         |   9 ++
> >  tools/memprof.lua                  | 109 +++++++++++++++++
> >  tools/memprof/humanize.lua         |  45 +++++++
> >  tools/memprof/parse.lua            | 188 +++++++++++++++++++++++++++++
> >  tools/utils/bufread.lua            | 147 ++++++++++++++++++++++
> >  tools/utils/symtab.lua             |  89 ++++++++++++++
> >  8 files changed, 757 insertions(+), 4 deletions(-)
> >  create mode 100755 test/misclib-memprof-lapi.test.lua
> >  create mode 100755 tools/luajit-parse-memprof
> >  create mode 100644 tools/memprof.lua
> >  create mode 100644 tools/memprof/humanize.lua
> >  create mode 100644 tools/memprof/parse.lua
> >  create mode 100644 tools/utils/bufread.lua
> >  create mode 100644 tools/utils/symtab.lua
> > 
> > diff --git a/Makefile b/Makefile
> > index 4a56917..ba4aa2f 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -37,6 +37,9 @@ INSTALL_INC=   $(DPREFIX)/include/luajit-$(MAJVER).$(MINVER)
> >  
> >  INSTALL_LJLIBD= $(INSTALL_SHARE)/luajit-$(VERSION)
> >  INSTALL_JITLIB= $(INSTALL_LJLIBD)/jit
> > +INSTALL_UTILSLIB= $(INSTALL_LJLIBD)/utils
> > +INSTALL_MEMPROFLIB= $(INSTALL_LJLIBD)/memprof
> 
> Minor: It's better to use $(INSALL_TOOLSLIB) here, isn't it?

Yes, fixed.

> 
> > +INSTALL_TOOLSLIB= $(INSTALL_LJLIBD)
> >  INSTALL_LMODD= $(INSTALL_SHARE)/lua
> >  INSTALL_LMOD= $(INSTALL_LMODD)/$(ABIVER)
> >  INSTALL_CMODD= $(INSTALL_LIB)/lua
> 
> <snipped>
> 
> > diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> > new file mode 100755
> > index 0000000..e02c6fa
> > --- /dev/null
> > +++ b/test/misclib-memprof-lapi.test.lua
> > @@ -0,0 +1,135 @@
> 
> <snipped>
> 
> > +local function check_alloc_report(alloc, line, function_line, nevents)
> > +  assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
> > +  assert(alloc[line].num == nevents, ("got=%d, ecpected=%d"):format(
> 
> Typo: s/ecpected/expected/.

Thanks! Fixed!

> 
> > +    alloc[line].num,
> > +    nevents
> > +  ))
> > +  return true
> > +end
> 
> <snipped>
> 
> > diff --git a/tools/luajit-parse-memprof b/tools/luajit-parse-memprof
> > new file mode 100755
> > index 0000000..c814301
> > --- /dev/null
> > +++ b/tools/luajit-parse-memprof
> > @@ -0,0 +1,9 @@
> > +#!/bin/bash
> > +#
> > +# Launcher for memprof parser.
> > +
> > +# This two variables are replaced on installing.
> > +TOOL_DIR=$(dirname `readlink -f $0`)
> > +LUAJIT_BIN=$TOOL_DIR/../src/luajit
> > +
> > +LUA_PATH="$TOOL_DIR/?.lua;;" $LUAJIT_BIN $TOOL_DIR/memprof.lua $@
> > diff --git a/tools/memprof.lua b/tools/memprof.lua
> > new file mode 100644
> > index 0000000..7476757
> > --- /dev/null
> > +++ b/tools/memprof.lua
> > @@ -0,0 +1,109 @@
> 
> <snipped>
> 
> > +local bufread = require "utils.bufread"
> > +local memprof = require "memprof.parse"
> > +local symtab  = require "utils.symtab"
> > +local view    = require "memprof.humanize"
> 
> Typo: Still mess with whitespace.

Sorry, fixed!

> 
> > +
> 
> <snipped>
> 
> > +local reader  = bufread.new(inputfile)
> > +local symbols = symtab.parse(reader)
> > +local events  = memprof.parse(reader, symbols)
> 
> Typo: Still mess with whitespace.

My bad, fixed!

> 
> > +
> 
> <snipped>
> 
> > diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
> > new file mode 100644
> > index 0000000..f3e5e31
> > --- /dev/null
> > +++ b/tools/utils/symtab.lua
> > @@ -0,0 +1,89 @@
> > +-- Parser of LuaJIT's symtab binary stream.
> > +-- The format spec can be found in <src/lj_symtab.h>.
> 
> I see no such file. I guess you mean <src/lj_memprof.h>.

Yes, thanks, fixed!

> 
> > +--
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

Also fixed mess with quotes for strings.
See the iterative patch below. Branch is force-pushed.

===================================================================
diff --git a/Makefile b/Makefile
index ba4aa2f..61967df 100644
--- a/Makefile
+++ b/Makefile
@@ -37,9 +37,9 @@ INSTALL_INC=   $(DPREFIX)/include/luajit-$(MAJVER).$(MINVER)
 
 INSTALL_LJLIBD= $(INSTALL_SHARE)/luajit-$(VERSION)
 INSTALL_JITLIB= $(INSTALL_LJLIBD)/jit
-INSTALL_UTILSLIB= $(INSTALL_LJLIBD)/utils
-INSTALL_MEMPROFLIB= $(INSTALL_LJLIBD)/memprof
 INSTALL_TOOLSLIB= $(INSTALL_LJLIBD)
+INSTALL_UTILSLIB= $(INSTALL_TOOLSLIB)/utils
+INSTALL_MEMPROFLIB= $(INSTALL_TOOLSLIB)/memprof
 INSTALL_LMODD= $(INSTALL_SHARE)/lua
 INSTALL_LMOD= $(INSTALL_LMODD)/$(ABIVER)
 INSTALL_CMODD= $(INSTALL_LIB)/lua
diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
index e02c6fa..2366c00 100755
--- a/test/misclib-memprof-lapi.test.lua
+++ b/test/misclib-memprof-lapi.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 
-local tap = require('tap')
+local tap = require("tap")
 
 local test = tap.test("misc-memprof-lapi")
 test:plan(9)
@@ -9,9 +9,9 @@ jit.off()
 jit.flush()
 
 -- FIXME: Launch tests with LUA_PATH enviroment variable.
-local path = arg[0]:gsub('/[^/]+%.test%.lua', '')
-local path_suffix = '../tools/?.lua;'
-package.path = ('%s/%s;'):format(path, path_suffix)..package.path
+local path = arg[0]:gsub("/[^/]+%.test%.lua", "")
+local path_suffix = "../tools/?.lua;"
+package.path = ("%s/%s;"):format(path, path_suffix)..package.path
 
 local table_new = require "table.new"
 
@@ -19,8 +19,8 @@ local bufread = require "utils.bufread"
 local memprof = require "memprof.parse"
 local symtab = require "utils.symtab"
 
-local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
-local BAD_PATH = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
+local TMP_BINFILE = arg[0]:gsub("[^/]+%.test%.lua", "%.%1.memprofdata.tmp.bin")
+local BAD_PATH = arg[0]:gsub("[^/]+%.test%.lua", "%1/memprofdata.tmp.bin")
 
 local function payload()
   -- Preallocate table to avoid array part reallocations.
@@ -73,7 +73,7 @@ end
 
 local function check_alloc_report(alloc, line, function_line, nevents)
   assert(string.format("@%s:%d", arg[0], function_line) == alloc[line].name)
-  assert(alloc[line].num == nevents, ("got=%d, ecpected=%d"):format(
+  assert(alloc[line].num == nevents, ("got=%d, expected=%d"):format(
     alloc[line].num,
     nevents
   ))
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 7476757..92d192e 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -12,8 +12,8 @@
 
 local bufread = require "utils.bufread"
 local memprof = require "memprof.parse"
-local symtab  = require "utils.symtab"
-local view    = require "memprof.humanize"
+local symtab = require "utils.symtab"
+local view = require "memprof.humanize"
 
 local stdout, stderr = io.stdout, io.stderr
 local match, gmatch = string.match, string.gmatch
@@ -92,9 +92,9 @@ end
 
 local inputfile = parseargs{...}
 
-local reader  = bufread.new(inputfile)
+local reader = bufread.new(inputfile)
 local symbols = symtab.parse(reader)
-local events  = memprof.parse(reader, symbols)
+local events = memprof.parse(reader, symbols)
 
 stdout:write("ALLOCATIONS", "\n")
 view.render(events.alloc, symbols)
diff --git a/tools/utils/symtab.lua b/tools/utils/symtab.lua
index f3e5e31..03aadbd 100644
--- a/tools/utils/symtab.lua
+++ b/tools/utils/symtab.lua
@@ -1,5 +1,5 @@
 -- Parser of LuaJIT's symtab binary stream.
--- The format spec can be found in <src/lj_symtab.h>.
+-- The format spec can be found in <src/lj_memprof.h>.
 --
 -- Major portions taken verbatim or adapted from the LuaVela.
 -- Copyright (C) 2015-2019 IPONWEB Ltd.
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler Sergey Kaplun
@ 2020-12-27 10:58   ` Sergey Ostanevich
  2020-12-27 11:54     ` Sergey Kaplun
  2020-12-27 16:44   ` Igor Munkin
  1 sibling, 1 reply; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 10:58 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch! Just one question below, looks good otherwise.

Sergos

> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> new file mode 100644
> index 0000000..e0df057
> --- /dev/null
> +++ b/src/lj_memprof.c
<snipped>
> +static int memprof_stop(const struct lua_State *L)
> +{
> +  struct memprof *mp = &memprof;
> +  struct alloc *oalloc = &mp->orig_alloc;
> +  struct lj_wbuf *out = &mp->out;
> +  int return_status = PROFILE_SUCCESS;
> +  int saved_errno = 0;
> +  struct lua_State *main_L;
> +  int cb_status;
> +
> +  memprof_lock();
> +
> +  if (mp->state == MPS_HALT) {
> +    errno = mp->saved_errno;
> +    mp->state = MPS_IDLE
> +    memprof_unlock();
> +    return PROFILE_ERRIO;
> +  }
> +
> +  if (mp->state != MPS_PROFILE) {
> +    memprof_unlock();
> +    return PROFILE_ERRRUN;
> +  }
> +

> +  if (L != NULL && mp->g != G(L)) {
> +    memprof_unlock();
> +    return PROFILE_ERR;
> +  }
> +
> +  mp->state = MPS_IDLE;
> +
> +  lua_assert(mp->g != NULL);
> +  main_L = mainthread(mp->g);
> +
> +  lua_assert(memprof_allocf == lua_getallocf(main_L, NULL));
> +  lua_assert(oalloc->allocf != NULL);
> +  lua_assert(oalloc->state != NULL);
> +  lua_setallocf(main_L, oalloc->allocf, oalloc->state);
> +
> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> +    mp->state = MPS_HALT;
> +    /* on_stop call may change errno value. */
> +    mp->saved_errno = lj_wbuf_errno(out);
> +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +    lj_wbuf_terminate(out);
> +    memprof_unlock();
> +    return PROFILE_ERRIO;
> +  }
> +  lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
> +
> +  lj_wbuf_flush(out);
> +
> +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> +    saved_errno = lj_wbuf_errno(out);

Previous return of PROFILE_ERRIO causes MPS_HALT. Should it be the same?

> +    return_status = PROFILE_ERRIO;
> +  }
> +
> +  lj_wbuf_terminate(out);
> +
> +  memprof_unlock();
> +  errno = saved_errno;
> +  return return_status;
> +}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-27 10:58   ` Sergey Ostanevich
@ 2020-12-27 11:54     ` Sergey Kaplun
  2020-12-27 13:27       ` Sergey Ostanevich
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27 11:54 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi, Sergos!

Thanks for the review!

On 27.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch! Just one question below, looks good otherwise.
> 
> Sergos
> 
> > diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> > new file mode 100644
> > index 0000000..e0df057
> > --- /dev/null
> > +++ b/src/lj_memprof.c
> <snipped>
> > +static int memprof_stop(const struct lua_State *L)
> > +{
> > +  struct memprof *mp = &memprof;
> > +  struct alloc *oalloc = &mp->orig_alloc;
> > +  struct lj_wbuf *out = &mp->out;
> > +  int return_status = PROFILE_SUCCESS;
> > +  int saved_errno = 0;
> > +  struct lua_State *main_L;
> > +  int cb_status;
> > +
> > +  memprof_lock();
> > +
> > +  if (mp->state == MPS_HALT) {
> > +    errno = mp->saved_errno;
> > +    mp->state = MPS_IDLE
> > +    memprof_unlock();
> > +    return PROFILE_ERRIO;
> > +  }
> > +
> > +  if (mp->state != MPS_PROFILE) {
> > +    memprof_unlock();
> > +    return PROFILE_ERRRUN;
> > +  }
> > +
> 
> > +  if (L != NULL && mp->g != G(L)) {
> > +    memprof_unlock();
> > +    return PROFILE_ERR;
> > +  }
> > +
> > +  mp->state = MPS_IDLE;
> > +
> > +  lua_assert(mp->g != NULL);
> > +  main_L = mainthread(mp->g);
> > +
> > +  lua_assert(memprof_allocf == lua_getallocf(main_L, NULL));
> > +  lua_assert(oalloc->allocf != NULL);
> > +  lua_assert(oalloc->state != NULL);
> > +  lua_setallocf(main_L, oalloc->allocf, oalloc->state);
> > +
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> > +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> > +    mp->state = MPS_HALT;
> > +    /* on_stop call may change errno value. */
> > +    mp->saved_errno = lj_wbuf_errno(out);
> > +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> > +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +    lj_wbuf_terminate(out);
> > +    memprof_unlock();
> > +    return PROFILE_ERRIO;
> > +  }
> > +  lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
> > +
> > +  lj_wbuf_flush(out);
> > +
> > +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> > +    saved_errno = lj_wbuf_errno(out);
> 
> Previous return of PROFILE_ERRIO causes MPS_HALT. Should it be the same?

Yes, you're right! Previous MPS_HALT is redundant -- profiler anyway
should stop (IDLE) immediately, there is no reason to set HALT there.
Thanks! Fixed!

See the iterative diff below. Branch is force-pushed.

> 
> > +    return_status = PROFILE_ERRIO;
> > +  }
> > +
> > +  lj_wbuf_terminate(out);
> > +
> > +  memprof_unlock();
> > +  errno = saved_errno;
> > +  return return_status;
> > +}
> 

===================================================================
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
index e0df057..998cbea 100644
--- a/src/lj_memprof.c
+++ b/src/lj_memprof.c
@@ -354,13 +354,13 @@ static int memprof_stop(const struct lua_State *L)
 
   if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
     lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
-    mp->state = MPS_HALT;
     /* on_stop call may change errno value. */
-    mp->saved_errno = lj_wbuf_errno(out);
+    saved_errno = lj_wbuf_errno(out);
     /* Ignore possible errors. mp->opt.buf == NULL here. */
     mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
     lj_wbuf_terminate(out);
     memprof_unlock();
+    errno = saved_errno;
     return PROFILE_ERRIO;
   }
   lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for " Sergey Kaplun
@ 2020-12-27 11:54   ` Sergey Ostanevich
  2020-12-27 13:42     ` Sergey Kaplun
  2020-12-27 18:58   ` Igor Munkin
  1 sibling, 1 reply; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 11:54 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch!

2 nits below, LGTM after the fix.

Sergos


> On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> This patch introduces Lua API for LuaJIT memory profiler implemented in
> the scope of the previous patch.
> 
> Profiler returns some true value if started/stopped successfully,
> returns nil on failure (plus an error message as a second result and a
> system-dependent error code as a third result).
> If LuaJIT build without memory profiler both return `false`.
           ^^^^^^ was built
> 
> <lj_errmsg.h> have adjusted with two new errors
> PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
> started/stopped already correspondingly.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>  - Added pushing of errno for ERR_PROF* and ERRMEM
>  - Added forgotten assert.
> 
> src/Makefile.dep |   5 +-
> src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
> src/lj_errmsg.h  |   6 ++
> 3 files changed, 176 insertions(+), 2 deletions(-)
> 
> diff --git a/src/Makefile.dep b/src/Makefile.dep
> index 8ae14a5..c3d0977 100644
> --- a/src/Makefile.dep
> +++ b/src/Makefile.dep
> @@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
>  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
> lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
>  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
> -lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
> - lj_str.h lj_tab.h lj_lib.h lj_libdef.h
> +lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
> + lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
> + lj_errmsg.h lj_memprof.h lj_libdef.h
> lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
>  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
>  lj_libdef.h
> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index 6f7b9a9..36fe29f 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c
> @@ -8,13 +8,21 @@
> #define lib_misc_c
> #define LUA_LIB
> 
> +#include <stdio.h>
> +#include <errno.h>
> +
> #include "lua.h"
> #include "lmisclib.h"
> +#include "lauxlib.h"
> 
> #include "lj_obj.h"
> #include "lj_str.h"
> #include "lj_tab.h"
> #include "lj_lib.h"
> +#include "lj_gc.h"
> +#include "lj_err.h"
> +
> +#include "lj_memprof.h"
> 
> /* ------------------------------------------------------------------------ */
> 
> @@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)
> 
> #include "lj_libdef.h"
> 
> +/* ----- misc.memprof module ---------------------------------------------- */
> +
> +#define LJLIB_MODULE_misc_memprof
> +
> +/*
> +** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
> +*/
> +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
> +
> +/* Structure given as ctx to memprof writer and on_stop callback. */
> +struct memprof_ctx {
> +  /* Output file stream for data. */
> +  FILE *stream;
> +  /* Profiled global_State for lj_mem_free at on_stop callback. */
> +  global_State *g;
> +};
> +
> +static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
> +{
> +  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
> +  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
> +}
> +
> +/* Default buffer writer function. Just call fwrite to corresponding FILE. */
> +static size_t buffer_writer_default(const void **buf_addr, size_t len,
> +				    void *opt)
> +{
> +  FILE *stream = ((struct memprof_ctx *)opt)->stream;
> +  const void * const buf_start = *buf_addr;
> +  const void *data = *buf_addr;
> +  size_t write_total = 0;
> +
> +  lua_assert(len <= STREAM_BUFFER_SIZE);
> +
> +  for (;;) {
> +    const size_t written = fwrite(data, 1, len, stream);
> +
> +    if (LJ_UNLIKELY(written == 0)) {
> +      /* Re-tries write in case of EINTR. */
> +      if (errno == EINTR) {
> +	errno = 0;
> +	continue;
> +      }
> +      break;
> +    }
> +
> +    write_total += written;
> +
> +    if (write_total == len)
> +      break;
> +
> +    data = (uint8_t *)data + (ptrdiff_t)written;

After incomplete write you’ll return to the fwrite() call with
data pointer moved, but with len untouched -> you’ll read beyond
the buffer.

> +  }
> +  lua_assert(write_total <= len);
> +
> +  *buf_addr = buf_start;
> +  return write_total;
> +}
> +
> +/* Default on stop callback. Just close corresponding stream. */
> +static int on_stop_cb_default(void *opt, uint8_t *buf)
> +{
> +  struct memprof_ctx *ctx = opt;
> +  FILE *stream = ctx->stream;
> +  memprof_ctx_free(ctx, buf);
> +  return fclose(stream);
> +}
> +
> +/* local started, err, errno = misc.memprof.start(fname) */
> +LJLIB_CF(misc_memprof_start)
> +{
> +  struct lua_Prof_options opt = {0};
> +  struct memprof_ctx *ctx;
> +  const char *fname;
> +  int memprof_status;
> +  int started;
> +
> +  fname = strdata(lj_lib_checkstr(L, 1));
> +
> +  ctx = lj_mem_new(L, sizeof(*ctx));
> +  if (ctx == NULL)
> +    goto errmem;
> +
> +  opt.ctx = ctx;
> +  opt.writer = buffer_writer_default;
> +  opt.on_stop = on_stop_cb_default;
> +  opt.len = STREAM_BUFFER_SIZE;
> +  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
> +  if (NULL == opt.buf) {
> +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> +    goto errmem;
> +  }
> +
> +  ctx->g = G(L);
> +  ctx->stream = fopen(fname, "wb");
> +
> +  if (ctx->stream == NULL) {
> +    memprof_ctx_free(ctx, opt.buf);
> +    return luaL_fileresult(L, 0, fname);
> +  }
> +
> +  memprof_status = lj_memprof_start(L, &opt);
> +  started = memprof_status == PROFILE_SUCCESS;
> +
> +  if (LJ_UNLIKELY(!started)) {
> +    fclose(ctx->stream);
> +    remove(fname);
> +    memprof_ctx_free(ctx, opt.buf);
> +    switch (memprof_status) {
> +    case PROFILE_ERRRUN:
> +      lua_pushnil(L);
> +      lua_pushstring(L, err2msg(LJ_ERR_PROF_ISRUNNING));
> +      lua_pushinteger(L, EINVAL);
> +      return 3;
> +    case PROFILE_ERRIO:
> +      return luaL_fileresult(L, 0, fname);
> +    default:
> +      lua_assert(0);
> +      break;
> +    }
> +  }
> +  lua_pushboolean(L, started);
> +
> +  return 1;
> +errmem:
> +  lua_pushnil(L);
> +  lua_pushstring(L, err2msg(LJ_ERR_ERRMEM));
> +  lua_pushinteger(L, ENOMEM);
> +  return 3;
> +}
> +
> +/* local stopped, err, errno = misc.memprof.stop() */
> +LJLIB_CF(misc_memprof_stop)
> +{
> +  int status = lj_memprof_stop();
> +  int stopped_successfully = status == PROFILE_SUCCESS;
> +  if (!stopped_successfully) {
> +    switch (status) {
> +    case PROFILE_ERRRUN:
> +      lua_pushnil(L);
> +      lua_pushstring(L, err2msg(LJ_ERR_PROF_NOTRUNNING));
> +      lua_pushinteger(L, EINVAL);
> +      return 3;
> +    case PROFILE_ERRIO:
> +      return luaL_fileresult(L, 0, NULL);
> +    default:
> +      lua_assert(0);
> +      break;
> +    }
> +  }
> +  lua_pushboolean(L, stopped_successfully);
> +  return 1;
> +}
> +
> +#include "lj_libdef.h"
> +
> +/* ------------------------------------------------------------------------ */
> +
> LUALIB_API int luaopen_misc(struct lua_State *L)
> {
>   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> +  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
>   return 1;
> }
> diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
> index de7b867..6816da2 100644
> --- a/src/lj_errmsg.h
> +++ b/src/lj_errmsg.h
> @@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
> ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
> #endif
> 
> +#if LJ_HASPROFILE || LJ_HASMEMPROF
> +/* Profiler errors. */
> +ERRDEF(PROF_ISRUNNING,	"profiler is running already")
> +ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
> +#endif
> +
> #undef ERRDEF
> 
> /* Detecting unused error messages:
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field Sergey Kaplun
  2020-12-26 19:12   ` Sergey Ostanevich
@ 2020-12-27 13:09   ` Igor Munkin
  2020-12-27 17:44     ` Sergey Kaplun
  1 sibling, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-27 13:09 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! I guess this commit can be squashed with the
following (introducing memprof engine), since <mem_L> field is added
only to make it work. Also consider my comments below.

On 25.12.20, Sergey Kaplun wrote:
> To determine currently allocating coroutine (that may not be equal to
> currently executed one) a new field called mem_L is added to
> global_State structure. This field is set on each allocation event and
> stores the coroutine address that is used for allocation.
> 
> Part of tarantool/tarantool#5442
> ---
>  src/lj_gc.c  | 2 ++
>  src/lj_obj.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/src/lj_gc.c b/src/lj_gc.c
> index 44c8aa1..800fb2c 100644
> --- a/src/lj_gc.c
> +++ b/src/lj_gc.c
> @@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
>  {
>    global_State *g = G(L);
>    lua_assert((osz == 0) == (p == NULL));
> +
> +  setgcref(g->mem_L, obj2gco(L));

This field is initialized only here. What about <lj_mem_newgco> and
<lj_mem_free>? As for the latter the initialization is not necessary,
since all deallocations are reported as internals (but the comment
strongly required), but this assignment is definitely lost in the first
routine.

>    p = g->allocf(g->allocd, p, osz, nsz);
>    if (p == NULL && nsz > 0)
>      lj_err_mem(L);

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser Sergey Kaplun
  2020-12-26 22:56   ` Igor Munkin
@ 2020-12-27 13:24   ` Sergey Ostanevich
  2020-12-27 16:02     ` Sergey Kaplun
  1 sibling, 1 reply; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 13:24 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi!

Thanks for the patch!
See my 7 comments below.

Sergos


> On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
<snipped>
> diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> new file mode 100755
> index 0000000..e02c6fa
> --- /dev/null
> +++ b/test/misclib-memprof-lapi.test.lua
> @@ -0,0 +1,135 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +
> +local test = tap.test("misc-memprof-lapi")
> +test:plan(9)
> +
> +jit.off()
> +jit.flush()
> +
> +-- FIXME: Launch tests with LUA_PATH enviroment variable.
> +local path = arg[0]:gsub('/[^/]+%.test%.lua', ‘’)

I believe it won’t work well for some cases, such as

tarantool> arg[0]
---
- void.test.lua
...

tarantool> arg[0]:gsub('/[^/]+%.test%.lua', '')
---
- void.test.lua
- 0
...

Alternative is:

tarantool> os.execute('dirname '..arg[0])
.
---
- 0
...


> +local path_suffix = '../tools/?.lua;'
> +package.path = ('%s/%s;'):format(path, path_suffix)..package.path
> +
> +local table_new = require "table.new"
> +
> +local bufread = require "utils.bufread"
> +local memprof = require "memprof.parse"
> +local symtab = require "utils.symtab"
> +
> +local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
> +local BAD_PATH = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
> +
> +local function payload()
> +  -- Preallocate table to avoid array part reallocations.
                                         ^parts?
> +  local _ = table_new(100, 0)
> +
> +  -- Want too see 100 objects here.
> +  for i = 1, 100 do
> +    -- Try to avoid crossing with "test" module objects.
> +    _[i] = "memprof-str-"..i
> +  end
> +
> +  _ = nil
> +  -- VMSTATE == GC, reported as INTERNAL.
> +  collectgarbage()
> +end
> +
> +local function generate_output(filename)
> +  -- Clean up all garbage to avoid polution of free.
                                     pollution
> +  collectgarbage()
> +
> +  local res, err = misc.memprof.start(filename)
> +  -- Should start succesfully.
> +  assert(res, err)
> +
> +  payload()
> +
> +  res, err = misc.memprof.stop()
> +  -- Should stop succesfully.
> +  assert(res, err)
> +end

<snipped>

> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> new file mode 100644
> index 0000000..f4996f4
> --- /dev/null
> +++ b/tools/memprof/parse.lua

<snipped>

> +local function link_to_previous(heap, e, oaddr)
> +  -- Memory at oaddr was allocated before we started tracking.
> +  local heap_chunk = heap[oaddr]

Do you need two args for this? Can you just pass the heap[oaddr] instead?

> +  if heap_chunk then
> +    -- Save Lua code location (line) by address (id).
> +    e.primary[heap_chunk[2]] = heap_chunk[3]
> +  end
> +end
> +

<snipped>

> +local function ev_header_split(evh)
> +  return band(evh, 0x3), band(evh, lshift(0x3, 2))

Should you intorduce masks along with AEVENT/ASOURCE to avoid these
magic numbers?

> +end
> +

<snipped>

> diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua

<snipped>

> +
> +local function _read_stream(reader, n)
> +  local tail_size = reader._end - reader._pos
> +
> +  if tail_size >= n then
> +    -- Enough data to satisfy the request of n bytes.
> +    return true
> +  end
> +
> +  -- Otherwise carry tail_size bytes from the end of the buffer
> +  -- to the start and fill up free_size bytes with fresh data.
> +  -- tail_size < n <= free_size (see assert below) ensures that
> +  -- we don't copy overlapping memory regions.
> +  -- reader._pos == 0 means filling buffer for the first time.
> +
> +  local free_size = reader._pos > 0 and reader._pos or n
> +
> +  assert(n <= free_size, "Internal buffer is large enough")

Does it mean I will have a fail in case _pos is less that half of the
buffer and n is more than the tail_size? 
Which means I can use only half of the buffer?

> +
> +  if tail_size ~= 0 then
> +    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
> +  end
> +
> +  local bytes_read = ffi_C.fread(
> +    reader._buf + tail_size, 1, free_size, reader._file
> +  )
> +
> +  reader._pos = 0
> +  reader._end = tail_size + bytes_read
> +
> +  return reader._end - reader._pos >= n
> +end
> +

<snipped>

> +function M.eof(reader)
> +  local sys_feof = ffi_C.feof(reader._file)
> +  if sys_feof == 0 then
> +    return false
> +  end
> +  -- Otherwise return true only we have reached
                                  ^^ if we

> +  -- the end of the buffer.
> +  return reader._pos == reader._end
> +end
<snipped>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-27 11:54     ` Sergey Kaplun
@ 2020-12-27 13:27       ` Sergey Ostanevich
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 13:27 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 3839 bytes --]

LGTM

Sergos

> On 27 Dec 2020, at 14:54, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> Hi, Sergos!
> 
> Thanks for the review!
> 
> On 27.12.20, Sergey Ostanevich wrote:
>> Hi!
>> 
>> Thanks for the patch! Just one question below, looks good otherwise.
>> 
>> Sergos
>> 
>>> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
>>> new file mode 100644
>>> index 0000000..e0df057
>>> --- /dev/null
>>> +++ b/src/lj_memprof.c
>> <snipped>
>>> +static int memprof_stop(const struct lua_State *L)
>>> +{
>>> +  struct memprof *mp = &memprof;
>>> +  struct alloc *oalloc = &mp->orig_alloc;
>>> +  struct lj_wbuf *out = &mp->out;
>>> +  int return_status = PROFILE_SUCCESS;
>>> +  int saved_errno = 0;
>>> +  struct lua_State *main_L;
>>> +  int cb_status;
>>> +
>>> +  memprof_lock();
>>> +
>>> +  if (mp->state == MPS_HALT) {
>>> +    errno = mp->saved_errno;
>>> +    mp->state = MPS_IDLE
>>> +    memprof_unlock();
>>> +    return PROFILE_ERRIO;
>>> +  }
>>> +
>>> +  if (mp->state != MPS_PROFILE) {
>>> +    memprof_unlock();
>>> +    return PROFILE_ERRRUN;
>>> +  }
>>> +
>> 
>>> +  if (L != NULL && mp->g != G(L)) {
>>> +    memprof_unlock();
>>> +    return PROFILE_ERR;
>>> +  }
>>> +
>>> +  mp->state = MPS_IDLE;
>>> +
>>> +  lua_assert(mp->g != NULL);
>>> +  main_L = mainthread(mp->g);
>>> +
>>> +  lua_assert(memprof_allocf == lua_getallocf(main_L, NULL));
>>> +  lua_assert(oalloc->allocf != NULL);
>>> +  lua_assert(oalloc->state != NULL);
>>> +  lua_setallocf(main_L, oalloc->allocf, oalloc->state);
>>> +
>>> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
>>> +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
>>> +    mp->state = MPS_HALT;
>>> +    /* on_stop call may change errno value. */
>>> +    mp->saved_errno = lj_wbuf_errno(out);
>>> +    /* Ignore possible errors. mp->opt.buf == NULL here. */
>>> +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
>>> +    lj_wbuf_terminate(out);
>>> +    memprof_unlock();
>>> +    return PROFILE_ERRIO;
>>> +  }
>>> +  lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
>>> +
>>> +  lj_wbuf_flush(out);
>>> +
>>> +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
>>> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
>>> +    saved_errno = lj_wbuf_errno(out);
>> 
>> Previous return of PROFILE_ERRIO causes MPS_HALT. Should it be the same?
> 
> Yes, you're right! Previous MPS_HALT is redundant -- profiler anyway
> should stop (IDLE) immediately, there is no reason to set HALT there.
> Thanks! Fixed!
> 
> See the iterative diff below. Branch is force-pushed.
> 
>> 
>>> +    return_status = PROFILE_ERRIO;
>>> +  }
>>> +
>>> +  lj_wbuf_terminate(out);
>>> +
>>> +  memprof_unlock();
>>> +  errno = saved_errno;
>>> +  return return_status;
>>> +}
>> 
> 
> ===================================================================
> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> index e0df057..998cbea 100644
> --- a/src/lj_memprof.c
> +++ b/src/lj_memprof.c
> @@ -354,13 +354,13 @@ static int memprof_stop(const struct lua_State *L)
> 
>   if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
>     lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> -    mp->state = MPS_HALT;
>     /* on_stop call may change errno value. */
> -    mp->saved_errno = lj_wbuf_errno(out);
> +    saved_errno = lj_wbuf_errno(out);
>     /* Ignore possible errors. mp->opt.buf == NULL here. */
>     mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
>     lj_wbuf_terminate(out);
>     memprof_unlock();
> +    errno = saved_errno;
>     return PROFILE_ERRIO;
>   }
>   lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun


[-- Attachment #2: Type: text/html, Size: 31522 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-27 11:54   ` Sergey Ostanevich
@ 2020-12-27 13:42     ` Sergey Kaplun
  2020-12-27 15:37       ` Sergey Ostanevich
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27 13:42 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches


Hi!

Thanks for the review!

On 27.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch!
> 
> 2 nits below, LGTM after the fix.
> 
> Sergos
> 
> 
> > On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> > 
> > This patch introduces Lua API for LuaJIT memory profiler implemented in
> > the scope of the previous patch.
> > 
> > Profiler returns some true value if started/stopped successfully,
> > returns nil on failure (plus an error message as a second result and a
> > system-dependent error code as a third result).
> > If LuaJIT build without memory profiler both return `false`.
>            ^^^^^^ was built

Changed to `is build`.

> > 
> > <lj_errmsg.h> have adjusted with two new errors
> > PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
> > started/stopped already correspondingly.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >  - Added pushing of errno for ERR_PROF* and ERRMEM
> >  - Added forgotten assert.
> > 
> > src/Makefile.dep |   5 +-
> > src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
> > src/lj_errmsg.h  |   6 ++
> > 3 files changed, 176 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/Makefile.dep b/src/Makefile.dep
> > index 8ae14a5..c3d0977 100644
> > --- a/src/Makefile.dep
> > +++ b/src/Makefile.dep
> > @@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
> >  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
> > lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
> >  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
> > -lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
> > - lj_str.h lj_tab.h lj_lib.h lj_libdef.h
> > +lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
> > + lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
> > + lj_errmsg.h lj_memprof.h lj_libdef.h
> > lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
> >  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
> >  lj_libdef.h
> > diff --git a/src/lib_misc.c b/src/lib_misc.c
> > index 6f7b9a9..36fe29f 100644
> > --- a/src/lib_misc.c
> > +++ b/src/lib_misc.c
> > @@ -8,13 +8,21 @@
> > #define lib_misc_c
> > #define LUA_LIB
> > 
> > +#include <stdio.h>
> > +#include <errno.h>
> > +
> > #include "lua.h"
> > #include "lmisclib.h"
> > +#include "lauxlib.h"
> > 
> > #include "lj_obj.h"
> > #include "lj_str.h"
> > #include "lj_tab.h"
> > #include "lj_lib.h"
> > +#include "lj_gc.h"
> > +#include "lj_err.h"
> > +
> > +#include "lj_memprof.h"
> > 
> > /* ------------------------------------------------------------------------ */
> > 
> > @@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)
> > 
> > #include "lj_libdef.h"
> > 
> > +/* ----- misc.memprof module ---------------------------------------------- */
> > +
> > +#define LJLIB_MODULE_misc_memprof
> > +
> > +/*
> > +** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
> > +*/
> > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
> > +
> > +/* Structure given as ctx to memprof writer and on_stop callback. */
> > +struct memprof_ctx {
> > +  /* Output file stream for data. */
> > +  FILE *stream;
> > +  /* Profiled global_State for lj_mem_free at on_stop callback. */
> > +  global_State *g;
> > +};
> > +
> > +static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
> > +{
> > +  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
> > +  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
> > +}
> > +
> > +/* Default buffer writer function. Just call fwrite to corresponding FILE. */
> > +static size_t buffer_writer_default(const void **buf_addr, size_t len,
> > +				    void *opt)
> > +{
> > +  FILE *stream = ((struct memprof_ctx *)opt)->stream;
> > +  const void * const buf_start = *buf_addr;
> > +  const void *data = *buf_addr;
> > +  size_t write_total = 0;
> > +
> > +  lua_assert(len <= STREAM_BUFFER_SIZE);
> > +
> > +  for (;;) {
> > +    const size_t written = fwrite(data, 1, len, stream);
> > +
> > +    if (LJ_UNLIKELY(written == 0)) {
> > +      /* Re-tries write in case of EINTR. */
> > +      if (errno == EINTR) {
> > +	errno = 0;
> > +	continue;
> > +      }
> > +      break;
> > +    }
> > +
> > +    write_total += written;
> > +
> > +    if (write_total == len)
> > +      break;
> > +
> > +    data = (uint8_t *)data + (ptrdiff_t)written;
> 
> After incomplete write you’ll return to the fwrite() call with
> data pointer moved, but with len untouched -> you’ll read beyond
> the buffer.

Oh! I keep stepping on the same rake...
Thank you very much! Fixed!

See the iterative patch below. Branch is force pushed.

> 
> > +  }
> > +  lua_assert(write_total <= len);
> > +
> > +  *buf_addr = buf_start;
> > +  return write_total;
> > +}
> > +
> > +/* Default on stop callback. Just close corresponding stream. */
> > +static int on_stop_cb_default(void *opt, uint8_t *buf)
> > +{
> > +  struct memprof_ctx *ctx = opt;
> > +  FILE *stream = ctx->stream;
> > +  memprof_ctx_free(ctx, buf);
> > +  return fclose(stream);
> > +}
> > +
> > +/* local started, err, errno = misc.memprof.start(fname) */
> > +LJLIB_CF(misc_memprof_start)
> > +{
> > +  struct lua_Prof_options opt = {0};
> > +  struct memprof_ctx *ctx;
> > +  const char *fname;
> > +  int memprof_status;
> > +  int started;
> > +
> > +  fname = strdata(lj_lib_checkstr(L, 1));
> > +
> > +  ctx = lj_mem_new(L, sizeof(*ctx));
> > +  if (ctx == NULL)
> > +    goto errmem;
> > +
> > +  opt.ctx = ctx;
> > +  opt.writer = buffer_writer_default;
> > +  opt.on_stop = on_stop_cb_default;
> > +  opt.len = STREAM_BUFFER_SIZE;
> > +  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
> > +  if (NULL == opt.buf) {
> > +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> > +    goto errmem;
> > +  }
> > +
> > +  ctx->g = G(L);
> > +  ctx->stream = fopen(fname, "wb");
> > +
> > +  if (ctx->stream == NULL) {
> > +    memprof_ctx_free(ctx, opt.buf);
> > +    return luaL_fileresult(L, 0, fname);
> > +  }
> > +
> > +  memprof_status = lj_memprof_start(L, &opt);
> > +  started = memprof_status == PROFILE_SUCCESS;
> > +
> > +  if (LJ_UNLIKELY(!started)) {
> > +    fclose(ctx->stream);
> > +    remove(fname);
> > +    memprof_ctx_free(ctx, opt.buf);
> > +    switch (memprof_status) {
> > +    case PROFILE_ERRRUN:
> > +      lua_pushnil(L);
> > +      lua_pushstring(L, err2msg(LJ_ERR_PROF_ISRUNNING));
> > +      lua_pushinteger(L, EINVAL);
> > +      return 3;
> > +    case PROFILE_ERRIO:
> > +      return luaL_fileresult(L, 0, fname);
> > +    default:
> > +      lua_assert(0);
> > +      break;
> > +    }
> > +  }
> > +  lua_pushboolean(L, started);
> > +
> > +  return 1;
> > +errmem:
> > +  lua_pushnil(L);
> > +  lua_pushstring(L, err2msg(LJ_ERR_ERRMEM));
> > +  lua_pushinteger(L, ENOMEM);
> > +  return 3;
> > +}
> > +
> > +/* local stopped, err, errno = misc.memprof.stop() */
> > +LJLIB_CF(misc_memprof_stop)
> > +{
> > +  int status = lj_memprof_stop();
> > +  int stopped_successfully = status == PROFILE_SUCCESS;
> > +  if (!stopped_successfully) {
> > +    switch (status) {
> > +    case PROFILE_ERRRUN:
> > +      lua_pushnil(L);
> > +      lua_pushstring(L, err2msg(LJ_ERR_PROF_NOTRUNNING));
> > +      lua_pushinteger(L, EINVAL);
> > +      return 3;
> > +    case PROFILE_ERRIO:
> > +      return luaL_fileresult(L, 0, NULL);
> > +    default:
> > +      lua_assert(0);
> > +      break;
> > +    }
> > +  }
> > +  lua_pushboolean(L, stopped_successfully);
> > +  return 1;
> > +}
> > +
> > +#include "lj_libdef.h"
> > +
> > +/* ------------------------------------------------------------------------ */
> > +
> > LUALIB_API int luaopen_misc(struct lua_State *L)
> > {
> >   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
> > +  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
> >   return 1;
> > }
> > diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
> > index de7b867..6816da2 100644
> > --- a/src/lj_errmsg.h
> > +++ b/src/lj_errmsg.h
> > @@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
> > ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
> > #endif
> > 
> > +#if LJ_HASPROFILE || LJ_HASMEMPROF
> > +/* Profiler errors. */
> > +ERRDEF(PROF_ISRUNNING,	"profiler is running already")
> > +ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
> > +#endif
> > +
> > #undef ERRDEF
> > 
> > /* Detecting unused error messages:
> > -- 
> > 2.28.0
> > 
> 

===================================================================
diff --git a/src/lib_misc.c b/src/lib_misc.c
index 36fe29f..f69f933 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -110,7 +110,7 @@ static size_t buffer_writer_default(const void **buf_addr, size_t len,
   lua_assert(len <= STREAM_BUFFER_SIZE);
 
   for (;;) {
-    const size_t written = fwrite(data, 1, len, stream);
+    const size_t written = fwrite(data, 1, len - write_total, stream);
 
     if (LJ_UNLIKELY(written == 0)) {
       /* Re-tries write in case of EINTR. */
@@ -122,13 +122,13 @@ static size_t buffer_writer_default(const void **buf_addr, size_t len,
     }
 
     write_total += written;
+    lua_assert(write_total <= len);
 
     if (write_total == len)
       break;
 
     data = (uint8_t *)data + (ptrdiff_t)written;
   }
-  lua_assert(write_total <= len);
 
   *buf_addr = buf_start;
   return write_total;
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-27 13:42     ` Sergey Kaplun
@ 2020-12-27 15:37       ` Sergey Ostanevich
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 15:37 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 9982 bytes --]


LGTM

Best regards,
Sergos 

Sunday, 27 December 2020, 16:43 +0300 from skaplun@tarantool.org  <skaplun@tarantool.org>:
>
>Hi!
>
>Thanks for the review!
>
>On 27.12.20, Sergey Ostanevich wrote:
>> Hi!
>> 
>> Thanks for the patch!
>> 
>> 2 nits below, LGTM after the fix.
>> 
>> Sergos
>> 
>> 
>> > On 25 Dec 2020, at 18:26, Sergey Kaplun < skaplun@tarantool.org > wrote:
>> > 
>> > This patch introduces Lua API for LuaJIT memory profiler implemented in
>> > the scope of the previous patch.
>> > 
>> > Profiler returns some true value if started/stopped successfully,
>> > returns nil on failure (plus an error message as a second result and a
>> > system-dependent error code as a third result).
>> > If LuaJIT build without memory profiler both return `false`.
>>            ^^^^^^ was built
>
>Changed to `is build`.
>
>> > 
>> > <lj_errmsg.h> have adjusted with two new errors
>> > PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
>> > started/stopped already correspondingly.
>> > 
>> > Part of tarantool/tarantool#5442
>> > ---
>> > 
>> > Changes in v2:
>> >  - Added pushing of errno for ERR_PROF* and ERRMEM
>> >  - Added forgotten assert.
>> > 
>> > src/Makefile.dep |   5 +-
>> > src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
>> > src/lj_errmsg.h  |   6 ++
>> > 3 files changed, 176 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/Makefile.dep b/src/Makefile.dep
>> > index 8ae14a5..c3d0977 100644
>> > --- a/src/Makefile.dep
>> > +++ b/src/Makefile.dep
>> > @@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
>> >  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
>> > lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
>> >  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
>> > -lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
>> > - lj_str.h lj_tab.h lj_lib.h lj_libdef.h
>> > +lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
>> > + lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
>> > + lj_errmsg.h lj_memprof.h lj_libdef.h
>> > lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
>> >  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
>> >  lj_libdef.h
>> > diff --git a/src/lib_misc.c b/src/lib_misc.c
>> > index 6f7b9a9..36fe29f 100644
>> > --- a/src/lib_misc.c
>> > +++ b/src/lib_misc.c
>> > @@ -8,13 +8,21 @@
>> > #define lib_misc_c
>> > #define LUA_LIB
>> > 
>> > +#include <stdio.h>
>> > +#include <errno.h>
>> > +
>> > #include "lua.h"
>> > #include "lmisclib.h"
>> > +#include "lauxlib.h"
>> > 
>> > #include "lj_obj.h"
>> > #include "lj_str.h"
>> > #include "lj_tab.h"
>> > #include "lj_lib.h"
>> > +#include "lj_gc.h"
>> > +#include "lj_err.h"
>> > +
>> > +#include "lj_memprof.h"
>> > 
>> > /* ------------------------------------------------------------------------ */
>> > 
>> > @@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)
>> > 
>> > #include "lj_libdef.h"
>> > 
>> > +/* ----- misc.memprof module ---------------------------------------------- */
>> > +
>> > +#define LJLIB_MODULE_misc_memprof
>> > +
>> > +/*
>> > +** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
>> > +*/
>> > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
>> > +
>> > +/* Structure given as ctx to memprof writer and on_stop callback. */
>> > +struct memprof_ctx {
>> > +  /* Output file stream for data. */
>> > +  FILE *stream;
>> > +  /* Profiled global_State for lj_mem_free at on_stop callback. */
>> > +  global_State *g;
>> > +};
>> > +
>> > +static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
>> > +{
>> > +  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
>> > +  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
>> > +}
>> > +
>> > +/* Default buffer writer function. Just call fwrite to corresponding FILE. */
>> > +static size_t buffer_writer_default(const void **buf_addr, size_t len,
>> > +				    void *opt)
>> > +{
>> > +  FILE *stream = ((struct memprof_ctx *)opt)->stream;
>> > +  const void * const buf_start = *buf_addr;
>> > +  const void *data = *buf_addr;
>> > +  size_t write_total = 0;
>> > +
>> > +  lua_assert(len <= STREAM_BUFFER_SIZE);
>> > +
>> > +  for (;;) {
>> > +    const size_t written = fwrite(data, 1, len, stream);
>> > +
>> > +    if (LJ_UNLIKELY(written == 0)) {
>> > +      /* Re-tries write in case of EINTR. */
>> > +      if (errno == EINTR) {
>> > +	errno = 0;
>> > +	continue;
>> > +      }
>> > +      break;
>> > +    }
>> > +
>> > +    write_total += written;
>> > +
>> > +    if (write_total == len)
>> > +      break;
>> > +
>> > +    data = (uint8_t *)data + (ptrdiff_t)written;
>> 
>> After incomplete write you’ll return to the fwrite() call with
>> data pointer moved, but with len untouched -> you’ll read beyond
>> the buffer.
>
>Oh! I keep stepping on the same rake...
>Thank you very much! Fixed!
>
>See the iterative patch below. Branch is force pushed.
>
>> 
>> > +  }
>> > +  lua_assert(write_total <= len);
>> > +
>> > +  *buf_addr = buf_start;
>> > +  return write_total;
>> > +}
>> > +
>> > +/* Default on stop callback. Just close corresponding stream. */
>> > +static int on_stop_cb_default(void *opt, uint8_t *buf)
>> > +{
>> > +  struct memprof_ctx *ctx = opt;
>> > +  FILE *stream = ctx->stream;
>> > +  memprof_ctx_free(ctx, buf);
>> > +  return fclose(stream);
>> > +}
>> > +
>> > +/* local started, err, errno = misc.memprof.start(fname) */
>> > +LJLIB_CF(misc_memprof_start)
>> > +{
>> > +  struct lua_Prof_options opt = {0};
>> > +  struct memprof_ctx *ctx;
>> > +  const char *fname;
>> > +  int memprof_status;
>> > +  int started;
>> > +
>> > +  fname = strdata(lj_lib_checkstr(L, 1));
>> > +
>> > +  ctx = lj_mem_new(L, sizeof(*ctx));
>> > +  if (ctx == NULL)
>> > +    goto errmem;
>> > +
>> > +  opt.ctx = ctx;
>> > +  opt.writer = buffer_writer_default;
>> > +  opt.on_stop = on_stop_cb_default;
>> > +  opt.len = STREAM_BUFFER_SIZE;
>> > +  opt.buf = (uint8_t *)lj_mem_new(L, STREAM_BUFFER_SIZE);
>> > +  if (NULL == opt.buf) {
>> > +    lj_mem_free(G(L), ctx, sizeof(*ctx));
>> > +    goto errmem;
>> > +  }
>> > +
>> > +  ctx->g = G(L);
>> > +  ctx->stream = fopen(fname, "wb");
>> > +
>> > +  if (ctx->stream == NULL) {
>> > +    memprof_ctx_free(ctx, opt.buf);
>> > +    return luaL_fileresult(L, 0, fname);
>> > +  }
>> > +
>> > +  memprof_status = lj_memprof_start(L, &opt);
>> > +  started = memprof_status == PROFILE_SUCCESS;
>> > +
>> > +  if (LJ_UNLIKELY(!started)) {
>> > +    fclose(ctx->stream);
>> > +    remove(fname);
>> > +    memprof_ctx_free(ctx, opt.buf);
>> > +    switch (memprof_status) {
>> > +    case PROFILE_ERRRUN:
>> > +      lua_pushnil(L);
>> > +      lua_pushstring(L, err2msg(LJ_ERR_PROF_ISRUNNING));
>> > +      lua_pushinteger(L, EINVAL);
>> > +      return 3;
>> > +    case PROFILE_ERRIO:
>> > +      return luaL_fileresult(L, 0, fname);
>> > +    default:
>> > +      lua_assert(0);
>> > +      break;
>> > +    }
>> > +  }
>> > +  lua_pushboolean(L, started);
>> > +
>> > +  return 1;
>> > +errmem:
>> > +  lua_pushnil(L);
>> > +  lua_pushstring(L, err2msg(LJ_ERR_ERRMEM));
>> > +  lua_pushinteger(L, ENOMEM);
>> > +  return 3;
>> > +}
>> > +
>> > +/* local stopped, err, errno = misc.memprof.stop() */
>> > +LJLIB_CF(misc_memprof_stop)
>> > +{
>> > +  int status = lj_memprof_stop();
>> > +  int stopped_successfully = status == PROFILE_SUCCESS;
>> > +  if (!stopped_successfully) {
>> > +    switch (status) {
>> > +    case PROFILE_ERRRUN:
>> > +      lua_pushnil(L);
>> > +      lua_pushstring(L, err2msg(LJ_ERR_PROF_NOTRUNNING));
>> > +      lua_pushinteger(L, EINVAL);
>> > +      return 3;
>> > +    case PROFILE_ERRIO:
>> > +      return luaL_fileresult(L, 0, NULL);
>> > +    default:
>> > +      lua_assert(0);
>> > +      break;
>> > +    }
>> > +  }
>> > +  lua_pushboolean(L, stopped_successfully);
>> > +  return 1;
>> > +}
>> > +
>> > +#include "lj_libdef.h"
>> > +
>> > +/* ------------------------------------------------------------------------ */
>> > +
>> > LUALIB_API int luaopen_misc(struct lua_State *L)
>> > {
>> >   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
>> > +  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
>> >   return 1;
>> > }
>> > diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
>> > index de7b867..6816da2 100644
>> > --- a/src/lj_errmsg.h
>> > +++ b/src/lj_errmsg.h
>> > @@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
>> > ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
>> > #endif
>> > 
>> > +#if LJ_HASPROFILE || LJ_HASMEMPROF
>> > +/* Profiler errors. */
>> > +ERRDEF(PROF_ISRUNNING,	"profiler is running already")
>> > +ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
>> > +#endif
>> > +
>> > #undef ERRDEF
>> > 
>> > /* Detecting unused error messages:
>> > -- 
>> > 2.28.0
>> > 
>> 
>
>===================================================================
>diff --git a/src/lib_misc.c b/src/lib_misc.c
>index 36fe29f..f69f933 100644
>--- a/src/lib_misc.c
>+++ b/src/lib_misc.c
>@@ -110,7 +110,7 @@ static size_t buffer_writer_default(const void **buf_addr, size_t len,
>   lua_assert(len <= STREAM_BUFFER_SIZE);
> 
>   for (;;) {
>-    const size_t written = fwrite(data, 1, len, stream);
>+    const size_t written = fwrite(data, 1, len - write_total, stream);
> 
>     if (LJ_UNLIKELY(written == 0)) {
>       /* Re-tries write in case of EINTR. */
>@@ -122,13 +122,13 @@ static size_t buffer_writer_default(const void **buf_addr, size_t len,
>     }
> 
>     write_total += written;
>+    lua_assert(write_total <= len);
> 
>     if (write_total == len)
>       break;
> 
>     data = (uint8_t *)data + (ptrdiff_t)written;
>   }
>-  lua_assert(write_total <= len);
> 
>   *buf_addr = buf_start;
>   return write_total;
>===================================================================
>
>-- 
>Best regards,
>Sergey Kaplun

[-- Attachment #2: Type: text/html, Size: 24859 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-27 13:24   ` Sergey Ostanevich
@ 2020-12-27 16:02     ` Sergey Kaplun
  2020-12-27 21:55       ` Sergey Ostanevich
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27 16:02 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches


Hi!

Thanks for the review.

On 27.12.20, Sergey Ostanevich wrote:
> Hi!
> 
> Thanks for the patch!
> See my 7 comments below.
> 
> Sergos
> 
> 
> > On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
> <snipped>
> > diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> > new file mode 100755
> > index 0000000..e02c6fa
> > --- /dev/null
> > +++ b/test/misclib-memprof-lapi.test.lua
> > @@ -0,0 +1,135 @@
> > +#!/usr/bin/env tarantool
> > +
> > +local tap = require('tap')
> > +
> > +local test = tap.test("misc-memprof-lapi")
> > +test:plan(9)
> > +
> > +jit.off()
> > +jit.flush()
> > +
> > +-- FIXME: Launch tests with LUA_PATH enviroment variable.
> > +local path = arg[0]:gsub('/[^/]+%.test%.lua', ‘’)
> 
> I believe it won’t work well for some cases, such as
> 
> tarantool> arg[0]
> ---
> - void.test.lua
> ...
> 
> tarantool> arg[0]:gsub('/[^/]+%.test%.lua', '')
> ---
> - void.test.lua
> - 0
> ...
> 
> Alternative is:
> 
> tarantool> os.execute('dirname '..arg[0])
> .
> ---
> - 0
> ...
> 
> 

I suppose you want to use `popen` here to catch output?
General practice is using `gsub` for now (see <test/utils.lua>).
I fixed regexp to the following:

| local path = arg[0]:gsub("[^/]+%.test%.lua", "")
| local path_suffix = "../tools/?.lua;"
| package.path = ("%s%s;"):format(path, path_suffix)..package.path

> > +local path_suffix = '../tools/?.lua;'
> > +package.path = ('%s/%s;'):format(path, path_suffix)..package.path
> > +
> > +local table_new = require "table.new"
> > +
> > +local bufread = require "utils.bufread"
> > +local memprof = require "memprof.parse"
> > +local symtab = require "utils.symtab"
> > +
> > +local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
> > +local BAD_PATH = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
> > +
> > +local function payload()
> > +  -- Preallocate table to avoid array part reallocations.
>                                          ^parts?

No, I meant array part of table (not a hash part).
I rewrote it to the following:
| -- Preallocate table to avoid table array part reallocations.

> > +  local _ = table_new(100, 0)
> > +
> > +  -- Want too see 100 objects here.
> > +  for i = 1, 100 do
> > +    -- Try to avoid crossing with "test" module objects.
> > +    _[i] = "memprof-str-"..i
> > +  end
> > +
> > +  _ = nil
> > +  -- VMSTATE == GC, reported as INTERNAL.
> > +  collectgarbage()
> > +end
> > +
> > +local function generate_output(filename)
> > +  -- Clean up all garbage to avoid polution of free.
>                                      pollution

Fixed.

> > +  collectgarbage()
> > +
> > +  local res, err = misc.memprof.start(filename)
> > +  -- Should start succesfully.
> > +  assert(res, err)
> > +
> > +  payload()
> > +
> > +  res, err = misc.memprof.stop()
> > +  -- Should stop succesfully.
> > +  assert(res, err)
> > +end
> 
> <snipped>
> 

I also foggot to return `jit.on()` back in the end of test.
Added.

> > diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> > new file mode 100644
> > index 0000000..f4996f4
> > --- /dev/null
> > +++ b/tools/memprof/parse.lua
> 
> <snipped>
> 
> > +local function link_to_previous(heap, e, oaddr)
> > +  -- Memory at oaddr was allocated before we started tracking.
> > +  local heap_chunk = heap[oaddr]
> 
> Do you need two args for this? Can you just pass the heap[oaddr] instead?

Yes. It's looks better. Applied. Thank you!

> 
> > +  if heap_chunk then
> > +    -- Save Lua code location (line) by address (id).
> > +    e.primary[heap_chunk[2]] = heap_chunk[3]
> > +  end
> > +end
> > +
> 
> <snipped>
> 
> > +local function ev_header_split(evh)
> > +  return band(evh, 0x3), band(evh, lshift(0x3, 2))
> 
> Should you intorduce masks along with AEVENT/ASOURCE to avoid these
> magic numbers?

My bad. Done.

> 
> > +end
> > +
> 
> <snipped>
> 
> > diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua
> 
> <snipped>
> 
> > +
> > +local function _read_stream(reader, n)
> > +  local tail_size = reader._end - reader._pos
> > +
> > +  if tail_size >= n then
> > +    -- Enough data to satisfy the request of n bytes.
> > +    return true
> > +  end
> > +
> > +  -- Otherwise carry tail_size bytes from the end of the buffer
> > +  -- to the start and fill up free_size bytes with fresh data.
> > +  -- tail_size < n <= free_size (see assert below) ensures that
> > +  -- we don't copy overlapping memory regions.
> > +  -- reader._pos == 0 means filling buffer for the first time.
> > +
> > +  local free_size = reader._pos > 0 and reader._pos or n
> > +
> > +  assert(n <= free_size, "Internal buffer is large enough")
> 
> Does it mean I will have a fail in case _pos is less that half of the
> buffer and n is more than the tail_size? 
> Which means I can use only half of the buffer?

Hat-trick of buffer's misuse.
Thanks you very much again! Fixed (see the iterative patch below).

> 
> > +
> > +  if tail_size ~= 0 then
> > +    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
> > +  end
> > +
> > +  local bytes_read = ffi_C.fread(
> > +    reader._buf + tail_size, 1, free_size, reader._file
> > +  )
> > +
> > +  reader._pos = 0
> > +  reader._end = tail_size + bytes_read
> > +
> > +  return reader._end - reader._pos >= n
> > +end
> > +
> 
> <snipped>
> 
> > +function M.eof(reader)
> > +  local sys_feof = ffi_C.feof(reader._file)
> > +  if sys_feof == 0 then
> > +    return false
> > +  end
> > +  -- Otherwise return true only we have reached
>                                   ^^ if we

Fixed.

> 
> > +  -- the end of the buffer.
> > +  return reader._pos == reader._end
> > +end
> <snipped>
> 

The iterative patch. Branch is force-pushed.
===================================================================
diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
index 2366c00..dd484f4 100755
--- a/test/misclib-memprof-lapi.test.lua
+++ b/test/misclib-memprof-lapi.test.lua
@@ -9,9 +9,9 @@ jit.off()
 jit.flush()
 
 -- FIXME: Launch tests with LUA_PATH enviroment variable.
-local path = arg[0]:gsub("/[^/]+%.test%.lua", "")
+local path = arg[0]:gsub("[^/]+%.test%.lua", "")
 local path_suffix = "../tools/?.lua;"
-package.path = ("%s/%s;"):format(path, path_suffix)..package.path
+package.path = ("%s%s;"):format(path, path_suffix)..package.path
 
 local table_new = require "table.new"
 
@@ -23,7 +23,7 @@ local TMP_BINFILE = arg[0]:gsub("[^/]+%.test%.lua", "%.%1.memprofdata.tmp.bin")
 local BAD_PATH = arg[0]:gsub("[^/]+%.test%.lua", "%1/memprofdata.tmp.bin")
 
 local function payload()
-  -- Preallocate table to avoid array part reallocations.
+  -- Preallocate table to avoid table array part reallocations.
   local _ = table_new(100, 0)
 
   -- Want too see 100 objects here.
@@ -38,7 +38,7 @@ local function payload()
 end
 
 local function generate_output(filename)
-  -- Clean up all garbage to avoid polution of free.
+  -- Clean up all garbage to avoid pollution of free.
   collectgarbage()
 
   local res, err = misc.memprof.start(filename)
@@ -132,4 +132,5 @@ test:ok(check_alloc_report(alloc, 32, 25, 100))
 -- Collect all previous allocated objects.
 test:ok(free.INTERNAL.num == 102)
 
+jit.on()
 os.exit(test:check() and 0 or 1)
diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
index f4996f4..6dae22d 100644
--- a/tools/memprof/parse.lua
+++ b/tools/memprof/parse.lua
@@ -19,10 +19,14 @@ local AEVENT_ALLOC = 1
 local AEVENT_FREE = 2
 local AEVENT_REALLOC = 3
 
+local AEVENT_MASK = 0x3
+
 local ASOURCE_INT = lshift(1, 2)
 local ASOURCE_LFUNC = lshift(2, 2)
 local ASOURCE_CFUNC = lshift(3, 2)
 
+local ASOURCE_MASK = lshift(0x3, 2)
+
 local M = {}
 
 local function new_event(loc)
@@ -35,9 +39,8 @@ local function new_event(loc)
   }
 end
 
-local function link_to_previous(heap, e, oaddr)
-  -- Memory at oaddr was allocated before we started tracking.
-  local heap_chunk = heap[oaddr]
+local function link_to_previous(heap_chunk, e)
+  -- Memory at this chunk was allocated before we start tracking.
   if heap_chunk then
     -- Save Lua code location (line) by address (id).
     e.primary[heap_chunk[2]] = heap_chunk[3]
@@ -94,7 +97,7 @@ local function parse_realloc(reader, asource, events, heap)
   e.free = e.free + osize
   e.alloc = e.alloc + nsize
 
-  link_to_previous(heap, e, oaddr)
+  link_to_previous(heap[oaddr], e)
 
   heap[oaddr] = nil
   heap[naddr] = {nsize, id, loc}
@@ -113,7 +116,7 @@ local function parse_free(reader, asource, events, heap)
   e.num = e.num + 1
   e.free = e.free + osize
 
-  link_to_previous(heap, e, oaddr)
+  link_to_previous(heap[oaddr], e)
 
   heap[oaddr] = nil
 end
@@ -131,7 +134,7 @@ end
 -- Splits event header into event type (aka aevent = allocation
 -- event) and event source (aka asource = allocation source).
 local function ev_header_split(evh)
-  return band(evh, 0x3), band(evh, lshift(0x3, 2))
+  return band(evh, AEVENT_MASK), band(evh, ASOURCE_MASK)
 end
 
 local function parse_event(reader, events)
diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua
index 873e06a..34bae9a 100644
--- a/tools/utils/bufread.lua
+++ b/tools/utils/bufread.lua
@@ -22,7 +22,7 @@ local BUFFER_SIZE = 10 * 1024 * 1024
 local M = {}
 
 ffi.cdef[[
-  void *memcpy(void *, const void *, size_t);
+  void *memmove(void *, const void *, size_t);
 
   typedef struct FILE_ FILE;
   FILE *fopen(const char *, const char *);
@@ -34,6 +34,8 @@ ffi.cdef[[
 local function _read_stream(reader, n)
   local tail_size = reader._end - reader._pos
 
+  assert(n <= BUFFER_SIZE, "Internal buffer is large enough")
+
   if tail_size >= n then
     -- Enough data to satisfy the request of n bytes.
     return true
@@ -41,16 +43,11 @@ local function _read_stream(reader, n)
 
   -- Otherwise carry tail_size bytes from the end of the buffer
   -- to the start and fill up free_size bytes with fresh data.
-  -- tail_size < n <= free_size (see assert below) ensures that
-  -- we don't copy overlapping memory regions.
-  -- reader._pos == 0 means filling buffer for the first time.
-
-  local free_size = reader._pos > 0 and reader._pos or n
 
-  assert(n <= free_size, "Internal buffer is large enough")
+  local free_size = BUFFER_SIZE - tail_size
 
   if tail_size ~= 0 then
-    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
+    ffi_C.memmove(reader._buf, reader._buf + reader._pos, tail_size)
   end
 
   local bytes_read = ffi_C.fread(
@@ -114,7 +111,7 @@ function M.eof(reader)
   if sys_feof == 0 then
     return false
   end
-  -- Otherwise return true only we have reached
+  -- Otherwise return true only if we have reached
   -- the end of the buffer.
   return reader._pos == reader._end
 end
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler Sergey Kaplun
  2020-12-27 10:58   ` Sergey Ostanevich
@ 2020-12-27 16:44   ` Igor Munkin
  2020-12-27 21:47     ` Sergey Kaplun
  1 sibling, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-27 16:44 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider the comments below.

On 25.12.20, Sergey Kaplun wrote:
> This patch introduces memory profiler for Lua machine.
> 
> First of all profiler dumps the definitions of all loaded Lua functions
> (symtab) via the write buffer introduced in one of the previous patches.
> 
> Profiler replaces the old allocation function with the instrumented one
> after symtab is dumped. This new function reports all allocations,
> reallocations or deallocations events via the write buffer during
> profiling. Subsequent content depends on the function's type (LFUNC,
> FFUNC or CFUNC).
> 
> To divide all traces into the one vmstate when being profiled, a special
> macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> 
> When profiling is over, a special epilogue event header is written and
> the old allocation function is restored back.
> 
> This change also makes debug_frameline function LuaJIT-wide visible to
> be used in the memory profiler.
> 
> For more information, see <lj_memprof.h>.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>   - Merged with debug-to-public commit and symtab.
>   - Drop [T]imer bit description.
> 
>  src/Makefile     |   8 +-
>  src/Makefile.dep |  31 ++--
>  src/lj_arch.h    |  22 +++
>  src/lj_debug.c   |   8 +-
>  src/lj_debug.h   |   3 +
>  src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_memprof.h | 165 ++++++++++++++++++
>  src/lj_obj.h     |   8 +
>  src/lj_state.c   |   8 +
>  src/ljamalg.c    |   1 +
>  10 files changed, 665 insertions(+), 19 deletions(-)
>  create mode 100644 src/lj_memprof.c
>  create mode 100644 src/lj_memprof.h
> 
> diff --git a/src/Makefile b/src/Makefile
> index 384b590..3218dfd 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -113,6 +113,12 @@ XCFLAGS=
>  # Enable GC64 mode for x64.
>  #XCFLAGS+= -DLUAJIT_ENABLE_GC64
>  #
> +# Disable the memory profiler.
> +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
> +#
> +# Disable the thread safe profiler.
> +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE

Well, I personally see little sense in this flag *now*. AFAICS it can
only occur in multithread host application running a separate VM per
thread. But this not quite well implemented even in <jit.p> profiler
(see Mike's comment here[1]), so you can either disable it for now
(since this is only MVP) or make it the same way Mike did: lock a mutex
if <memprof->g> is not NULL; otherwise simply initialize the profiler.
By the way, there is no mutex used for POSIX targets in <jit.p>, but I
don't know why.

> +#
>  ##############################################################################
>  
>  ##############################################################################

<snipped>

> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c8d7138..5967849 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h

<snipped>

> +/* Disable or enable the memory profiler's thread safety. */
> +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
> +#define LJ_THREAD_SAFE		0
> +#else
> +#define LJ_THREAD_SAFE		1

Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/.

Otherwise thread safety doesn't work at all.

> +#endif
> +
>  #endif

<snipped>

> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> new file mode 100644
> index 0000000..e0df057
> --- /dev/null
> +++ b/src/lj_memprof.c
> @@ -0,0 +1,430 @@
> +/*
> +** Implementation of memory profiler.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#define lj_memprof_c
> +#define LUA_CORE
> +
> +#include <errno.h>
> +
> +#include "lj_memprof.h"
> +#include "lj_def.h"

<snipped>

> +#include "lua.h"

This include is excess.

> +
> +#include "lj_obj.h"
> +#include "lj_frame.h"
> +#include "lj_debug.h"
> +#include "lj_gc.h"

This include is excess.

> +#include "lj_wbuf.h"
> +
> +/* --------------------------------- Symtab --------------------------------- */

<snipped>

> +static void symtab_write_prologue(struct lj_wbuf *out)
> +{
> +  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
> +  lj_wbuf_addn(out, ljs_header, len);
> +}

Minor: Again, I guess this function semantics can be moved right to its
caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte).

> +

<snipped>

> +/* ---------------------------- Memory profiler ----------------------------- */

<snipped>

> +struct alloc {
> +  lua_Alloc allocf; /* Allocating function. */
> +  void *state; /* Opaque allocator's state. */
> +};

Minor: This structure can be used in <lj_obj.h> to store the default
allocator. Feel free to ignore.

> +

<snipped>

> +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
> +				GCfunc *fn, struct lua_State *L,
> +				cTValue *nextframe)
> +{
> +  const BCLine line = lj_debug_frameline(L, fn, nextframe);
> +  lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
> +  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
> +  lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);

As we discussed offline, I have two notes for this line:
* When the <line> value can be negative?
* When the <line> value can be zero?

Furthermore, I have no idea, why <line> is casted to <uintptr_t>.

> +}

<snipped>

> +static void memprof_write_func(struct memprof *mp, uint8_t header)
> +{
> +  struct lj_wbuf *out = &mp->out;
> +  lua_State *L = gco2th(gcref(mp->g->mem_L));
> +  cTValue *frame = L->base - 1;
> +  GCfunc *fn;
> +
> +  fn = frame_func(frame);

Minor: Why does this line differ from those above?

> +

<snipped>

> +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
> +{
> +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> +}
> +
> +/*
> +** XXX: In ideal world, we should report allocations from traces as well.
> +** But since traces must follow the semantics of the original code, behaviour of
> +** Lua and JITted code must match 1:1 in terms of allocations, which makes
> +** using memprof with enabled JIT virtually redundant. Hence the stub below.
> +*/

I guess you can drop the function below (since it simply duplicates the
INTERNAL allocation semantics), *but* the comment above is nice, so you
can move it to the default function or to the corresponding item in
<memprof_writers> set.

> +static void memprof_write_trace(struct memprof *mp, uint8_t header)
> +{
> +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> +}
> +
> +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);

Minor: I guess this is not <header> but <aevent>.

> +

<snipped>

> +static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> +{
> +  const global_State *g = mp->g;
> +  const uint32_t _vmstate = (uint32_t)~g->vmstate;
> +  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
> +  const uint8_t header = aevent;

General Genius :)

> +
> +  memprof_writers[vmstate](mp, header);
> +}

<snipped>

> +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
> +{
> +  struct memprof *mp = &memprof;

This should be "const", IMHO.

> +  struct alloc *oalloc = &mp->orig_alloc;

<snipped>

> +}
> +
> +static void memprof_write_prologue(struct lj_wbuf *out)
> +{
> +  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
> +  lj_wbuf_addn(out, ljm_header, len);
> +}

See comments for <symtab_write_prologue>.

> +
> +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)

Side note: it's better to wrap this function to move all mutex-related
work in this wrapper.

> +{

<snipped>

> +  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
> +  lua_assert(opt->buf != NULL && opt->len != 0);

Do these asserts depend on each other? If no, please split them into the
separate ones.

> +

<snipped>

> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
> +		  lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {

You can test (STREAM_ERR_IO|STREAM_STOP) here.

> +    /* on_stop call may change errno value. */

<snipped>

> +}
> +
> +static int memprof_stop(const struct lua_State *L)
> +{

<snipped>

> +  if (L != NULL && mp->g != G(L)) {

This is a nice check (but looks redundant for the current version). Why
did you make it optional (if L is given)?

> +    memprof_unlock();
> +    return PROFILE_ERR;
> +  }

<snipped>

> +  main_L = mainthread(mp->g);

Why do you use main coroutine here instead of the given one?

> +

<snipped>

> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> +    mp->state = MPS_HALT;
> +    /* on_stop call may change errno value. */
> +    mp->saved_errno = lj_wbuf_errno(out);
> +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +    lj_wbuf_terminate(out);
> +    memprof_unlock();
> +    return PROFILE_ERRIO;
> +  }

<snipped>

> +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> +    saved_errno = lj_wbuf_errno(out);
> +    return_status = PROFILE_ERRIO;

Well, you introduce a separate variable for the status. For what? I
think this branch is the same as the "stop" one above. So either use
goto to the error branch or duplicate the logic from the "stop" branch.
In general, this looks like a mess: I almost lost the mutex unlock and
the buffer termination.

> +  }
> +
> +  lj_wbuf_terminate(out);
> +
> +  memprof_unlock();
> +  errno = saved_errno;
> +  return return_status;
> +}
> +
> +int lj_memprof_stop(void)

Why do you omit the coroutine argument for this function?

> +{

<snipped>

> +int lj_memprof_stop_vm(const struct lua_State *L)

This function is not used anywhere. Please drop it.

> +{

<snipped>

> +int lj_memprof_is_running(void)

This function is used only on global state finalization. However, it is
excess (see the comment below), so I believe it can be dropped.

> +{

<snipped>

> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> new file mode 100644
> index 0000000..a96b72f
> --- /dev/null
> +++ b/src/lj_memprof.h
> @@ -0,0 +1,165 @@

<snipped>

> +#include <stdint.h>
> +#include <stddef.h>

lj_def.h is definitely enough here.

> +

<snipped>

> +#define SYMTAB_FFUNC ((uint8_t)1)
> +#define SYMTAB_CFUNC ((uint8_t)2)
> +#define SYMTAB_TRACE ((uint8_t)3)

These defines are unused.

> +#define SYMTAB_FINAL ((uint8_t)0x80)

<snipped>

> +/* Profiler public API. */
> +#define PROFILE_SUCCESS 0
> +#define PROFILE_ERR     1

Minor: Considering the usage <PROFILE_ERRUSE> looks better to me. Feel
free to ignore.

> +#define PROFILE_ERRRUN  2
> +#define PROFILE_ERRMEM  3
> +#define PROFILE_ERRIO   4
> +
> +/* Profiler options. */
> +struct lua_Prof_options {

Typo: s/lua_Prof/lj_memprof/ since <lua_*> is used for exported members.

> +  /* Context for the profile writer and final callback. */
> +  void *ctx;
> +  /* Custom buffer to write data. */
> +  uint8_t *buf;
> +  /* The buffer's size. */
> +  size_t len;
> +  /*
> +  ** Writer function for profile events.
> +  ** Should return amount of written bytes on success or zero in case of error.
> +  ** Setting *data to NULL means end of profiling.
> +  */

Why don't you use <lj_wbuf_writer> typedef below? So you can reference
<lj_wbuf.h> for the writer contract.

> +  size_t (*writer)(const void **data, size_t len, void *ctx);
> +  /*
> +  ** Callback on profiler stopping. Required for correctly cleaning
> +  ** at vm shoutdown when profiler still running.

Typo: s/VM shoutdown/VM finalization/.
Typo: s/profiled still running/profiler is still running/.

> +  ** Returns zero on success.
> +  */
> +  int (*on_stop)(void *ctx, uint8_t *buf);
> +};
> +
> +/* Avoid to provide additional interfaces described in other headers. */

It looks like cargo cult, IMHO. What is the reason?

> +struct lua_State;
> +
> +/*
> +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
> +** LUAM_PROFILE_ERRIO.

Typo: s/LUAM_PROFILE_*/PROFILE_*/g.

> +*/
> +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
> +
> +/*
> +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
> +** on call at buffer flush, profiled stream stops, or on_stop() callback
> +** returns non-zero value, returns LUAM_PROFILE_ERRIO.
> +*/

<snipped>

> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index c94617d..c94b0bb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -523,6 +523,14 @@ enum {
>    LJ_VMST__MAX
>  };
>  
> +/*
> +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
> +** In fact, when VM executes a trace, vmstate is set to the trace number, but
> +** we aggregate all such cases into one VM state during per-VM state profiling.
> +*/

Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide
one. If <vmstate> is less than LJ_VMST__MAX it is considered as a trace
number and all LuaJIT universe works with assumtions the trace is being
run. It looks natural to me to move this change to the previous patch
related to VM states slivering into *FUNC set to close this story there.

> +
> +#define LJ_VMST_TRACE		(LJ_VMST__MAX)
> +
>  #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
>  
>  /* Metamethods. ORDER MM */
> diff --git a/src/lj_state.c b/src/lj_state.c
> index 1d9c628..6c46e3d 100644
> --- a/src/lj_state.c
> +++ b/src/lj_state.c

<snipped>

> @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
>    global_State *g = G(L);
>    int i;
>    L = mainthread(g);  /* Only the main thread can be closed. */
> +#if LJ_HASMEMPROF
> +  if (lj_memprof_is_running())

This check is excess, since you don't check the return value below. If
<lj_memprof_stop> is called when profiler doesn't work, PROFILE_ERRRUN
or PROFILE_ERRIO is yield.

> +    lj_memprof_stop();
> +#endif
>  #if LJ_HASPROFILE
>    luaJIT_profile_stop(L);
>  #endif

<snipped>

> -- 
> 2.28.0
> 

[1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field
  2020-12-27 13:09   ` Igor Munkin
@ 2020-12-27 17:44     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27 17:44 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 27.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! I guess this commit can be squashed with the
> following (introducing memprof engine), since <mem_L> field is added
> only to make it work. Also consider my comments below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > To determine currently allocating coroutine (that may not be equal to
> > currently executed one) a new field called mem_L is added to
> > global_State structure. This field is set on each allocation event and
> > stores the coroutine address that is used for allocation.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> >  src/lj_gc.c  | 2 ++
> >  src/lj_obj.h | 1 +
> >  2 files changed, 3 insertions(+)
> > 
> > diff --git a/src/lj_gc.c b/src/lj_gc.c
> > index 44c8aa1..800fb2c 100644
> > --- a/src/lj_gc.c
> > +++ b/src/lj_gc.c
> > @@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
> >  {
> >    global_State *g = G(L);
> >    lua_assert((osz == 0) == (p == NULL));
> > +
> > +  setgcref(g->mem_L, obj2gco(L));
> 
> This field is initialized only here. What about <lj_mem_newgco> and
> <lj_mem_free>? As for the latter the initialization is not necessary,
> since all deallocations are reported as internals (but the comment
> strongly required), but this assignment is definitely lost in the first
> routine.

I'll squash this patch with the next as you recommended and send
v3 for that patch (5/7) in the reply to the cover.

> 
> >    p = g->allocf(g->allocd, p, osz, nsz);
> >    if (p == NULL && nsz > 0)
> >      lj_err_mem(L);
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for " Sergey Kaplun
  2020-12-27 11:54   ` Sergey Ostanevich
@ 2020-12-27 18:58   ` Igor Munkin
  2020-12-28  0:14     ` Sergey Kaplun
  1 sibling, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-27 18:58 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider the comments below.

On 25.12.20, Sergey Kaplun wrote:
> This patch introduces Lua API for LuaJIT memory profiler implemented in
> the scope of the previous patch.
> 
> Profiler returns some true value if started/stopped successfully,

There is no "some" true value there is a sole one. Please fix it in both
the patch and the RFC.

> returns nil on failure (plus an error message as a second result and a
> system-dependent error code as a third result).
> If LuaJIT build without memory profiler both return `false`.
> 
> <lj_errmsg.h> have adjusted with two new errors
> PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
> started/stopped already correspondingly.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>   - Added pushing of errno for ERR_PROF* and ERRMEM
>   - Added forgotten assert.
> 
>  src/Makefile.dep |   5 +-
>  src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_errmsg.h  |   6 ++
>  3 files changed, 176 insertions(+), 2 deletions(-)
> 

<snipped>

> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index 6f7b9a9..36fe29f 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c

<snipped>

> @@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)

<snipped>

> +static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
> +{
> +  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);

Side note: This is odd that you free the buffer here, but the buffer
itself is not a part of the memprof context. Let's return to this later.

> +  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
> +}
> +
> +/* Default buffer writer function. Just call fwrite to corresponding FILE. */
> +static size_t buffer_writer_default(const void **buf_addr, size_t len,
> +				    void *opt)
> +{

<snipped>

> +    if (LJ_UNLIKELY(written == 0)) {
> +      /* Re-tries write in case of EINTR. */
> +      if (errno == EINTR) {

Minor: It's better to use early return here. Feel free to ignore.

> +	errno = 0;
> +	continue;
> +      }
> +      break;

If other error occurs, you need to pass the NULL to buf_addr, right?
Otherwise, there is no guarantee everything is written to the file and
profiling proceeds.

> +    }

<snipped>

> +
> +/* Default on stop callback. Just close corresponding stream. */

Typo: s/close corresponding/close the corresponding/.

> +static int on_stop_cb_default(void *opt, uint8_t *buf)

<snipped>

> +/* local started, err, errno = misc.memprof.start(fname) */
> +LJLIB_CF(misc_memprof_start)
> +{

<snipped>

> +  fname = strdata(lj_lib_checkstr(L, 1));

Minor: You can make this assignment alongside with the declaration.

> +
> +  ctx = lj_mem_new(L, sizeof(*ctx));
> +  if (ctx == NULL)

This is a dead code: <lj_mem_new> raises a LUA_ERRMEM.

> +    goto errmem;
> +

<snipped>

> +  if (NULL == opt.buf) {

This is a dead code: <lj_mem_new> raises a LUA_ERRMEM.

> +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> +    goto errmem;
> +  }

<snipped>

> +  memprof_status = lj_memprof_start(L, &opt);
> +  started = memprof_status == PROFILE_SUCCESS;

Trust me, you don't need this variable.
*/me making Jedi mind tricks here*

> +
> +  if (LJ_UNLIKELY(!started)) {
> +    fclose(ctx->stream);
> +    remove(fname);

Minor: I doubt we need to remove a file even if LuaJIT failed to start
profiling. Leave the comment if this makes sense. Feel free to ignore.

> +    memprof_ctx_free(ctx, opt.buf);
> +    switch (memprof_status) {

<snipped>

> +    }
> +  }
> +  lua_pushboolean(L, started);

Please, s/started/1/ since there is no another value here.

> +
> +  return 1;

<snipped>

> +}
> +
> +/* local stopped, err, errno = misc.memprof.stop() */
> +LJLIB_CF(misc_memprof_stop)
> +{
> +  int status = lj_memprof_stop();
> +  int stopped_successfully = status == PROFILE_SUCCESS;

Trust me, you don't need this variable.
*/me making Jedi mind tricks here*

> +  if (!stopped_successfully) {

<snipped>

> +  lua_pushboolean(L, stopped_successfully);

Please, s/stopped_succesfully/1/ since there is no another value here.

> +  return 1;
> +}
> +

<snipped>

> diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
> index de7b867..6816da2 100644
> --- a/src/lj_errmsg.h
> +++ b/src/lj_errmsg.h
> @@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")

<snipped>

> +#if LJ_HASPROFILE || LJ_HASMEMPROF

Why did you also initialize them for <jit.p> profiler?

> +/* Profiler errors. */

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
  2020-12-27 16:44   ` Igor Munkin
@ 2020-12-27 21:47     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-27 21:47 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 27.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces memory profiler for Lua machine.
> > 
> > First of all profiler dumps the definitions of all loaded Lua functions
> > (symtab) via the write buffer introduced in one of the previous patches.
> > 
> > Profiler replaces the old allocation function with the instrumented one
> > after symtab is dumped. This new function reports all allocations,
> > reallocations or deallocations events via the write buffer during
> > profiling. Subsequent content depends on the function's type (LFUNC,
> > FFUNC or CFUNC).
> > 
> > To divide all traces into the one vmstate when being profiled, a special
> > macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> > 
> > When profiling is over, a special epilogue event header is written and
> > the old allocation function is restored back.
> > 
> > This change also makes debug_frameline function LuaJIT-wide visible to
> > be used in the memory profiler.
> > 
> > For more information, see <lj_memprof.h>.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >   - Merged with debug-to-public commit and symtab.
> >   - Drop [T]imer bit description.
> > 
> >  src/Makefile     |   8 +-
> >  src/Makefile.dep |  31 ++--
> >  src/lj_arch.h    |  22 +++
> >  src/lj_debug.c   |   8 +-
> >  src/lj_debug.h   |   3 +
> >  src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
> >  src/lj_memprof.h | 165 ++++++++++++++++++
> >  src/lj_obj.h     |   8 +
> >  src/lj_state.c   |   8 +
> >  src/ljamalg.c    |   1 +
> >  10 files changed, 665 insertions(+), 19 deletions(-)
> >  create mode 100644 src/lj_memprof.c
> >  create mode 100644 src/lj_memprof.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index 384b590..3218dfd 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -113,6 +113,12 @@ XCFLAGS=
> >  # Enable GC64 mode for x64.
> >  #XCFLAGS+= -DLUAJIT_ENABLE_GC64
> >  #
> > +# Disable the memory profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
> > +#
> > +# Disable the thread safe profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE
> 
> Well, I personally see little sense in this flag *now*. AFAICS it can
> only occur in multithread host application running a separate VM per
> thread. But this not quite well implemented even in <jit.p> profiler
> (see Mike's comment here[1]), so you can either disable it for now
> (since this is only MVP) or make it the same way Mike did: lock a mutex
> if <memprof->g> is not NULL; otherwise simply initialize the profiler.
> By the way, there is no mutex used for POSIX targets in <jit.p>, but I
> don't know why.

OK, I suppose the best solution for now is to drop all related changes
about it. I'll add corresponding notice in the RFC and header file.

> 
> > +#
> >  ##############################################################################
> >  
> >  ##############################################################################
> 
> <snipped>
> 
> > diff --git a/src/lj_arch.h b/src/lj_arch.h
> > index c8d7138..5967849 100644
> > --- a/src/lj_arch.h
> > +++ b/src/lj_arch.h
> 
> <snipped>
> 
> > +/* Disable or enable the memory profiler's thread safety. */
> > +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
> > +#define LJ_THREAD_SAFE		0
> > +#else
> > +#define LJ_THREAD_SAFE		1
> 
> Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/.
> 
> Otherwise thread safety doesn't work at all.

Dropped for now.

> 
> > +#endif
> > +
> >  #endif
> 
> <snipped>
> 
> > diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> > new file mode 100644
> > index 0000000..e0df057
> > --- /dev/null
> > +++ b/src/lj_memprof.c
> > @@ -0,0 +1,430 @@
> > +/*
> > +** Implementation of memory profiler.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#define lj_memprof_c
> > +#define LUA_CORE
> > +
> > +#include <errno.h>
> > +
> > +#include "lj_memprof.h"
> > +#include "lj_def.h"
> 
> <snipped>
> 
> > +#include "lua.h"
> 
> This include is excess.

Removed.

> 
> > +
> > +#include "lj_obj.h"
> > +#include "lj_frame.h"
> > +#include "lj_debug.h"
> > +#include "lj_gc.h"
> 
> This include is excess.

Removed. And from Makefile.dep too.

> 
> > +#include "lj_wbuf.h"
> > +
> > +/* --------------------------------- Symtab --------------------------------- */
> 
> <snipped>
> 
> > +static void symtab_write_prologue(struct lj_wbuf *out)
> > +{
> > +  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
> > +  lj_wbuf_addn(out, ljs_header, len);
> > +}
> 
> Minor: Again, I guess this function semantics can be moved right to its
> caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte).

Moved.

> 
> > +
> 
> <snipped>
> 
> > +/* ---------------------------- Memory profiler ----------------------------- */
> 
> <snipped>
> 
> > +struct alloc {
> > +  lua_Alloc allocf; /* Allocating function. */
> > +  void *state; /* Opaque allocator's state. */
> > +};
> 
> Minor: This structure can be used in <lj_obj.h> to store the default
> allocator. Feel free to ignore.

Agree. But I suggest to do it with refactoring later. It'll cause
a lot of unnecessary code changes now.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
> > +				GCfunc *fn, struct lua_State *L,
> > +				cTValue *nextframe)
> > +{
> > +  const BCLine line = lj_debug_frameline(L, fn, nextframe);
> > +  lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
> > +  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
> > +  lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);
> 
> As we discussed offline, I have two notes for this line:
> * When the <line> value can be negative?

Not for Lua function, AFAICS. I rewrote this part to the following:

|  const BCLine line = lj_debug_frameline(L, fn, nextframe);
|  /*
|  ** Line is always >= 0 if we are inside a Lua function.
|  ** Equals to zero when LuaJIT is built with the
|  ** -DLUAJIT_DISABLE_DEBUGINFO flag.
|  */
|  lua_assert(line >= 0);
|  lj_wbuf_addbyte(out, aevent | ASOURCE_LFUNC);
|  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
|  lj_wbuf_addu64(out, (uint64_t)line);

> * When the <line> value can be zero?

When LuaJIT is built with `-DLUAJIT_DISABLE_DEBUGINFO` - exist only
in <src/lj_parse.c>. Anyway, it's insignificant after the changes
above.

> 
> Furthermore, I have no idea, why <line> is casted to <uintptr_t>.

Typo. Fixed.

> 
> > +}
> 
> <snipped>
> 
> > +static void memprof_write_func(struct memprof *mp, uint8_t header)
> > +{
> > +  struct lj_wbuf *out = &mp->out;
> > +  lua_State *L = gco2th(gcref(mp->g->mem_L));
> > +  cTValue *frame = L->base - 1;
> > +  GCfunc *fn;
> > +
> > +  fn = frame_func(frame);
> 
> Minor: Why does this line differ from those above?

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
> > +{
> > +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +/*
> > +** XXX: In ideal world, we should report allocations from traces as well.
> > +** But since traces must follow the semantics of the original code, behaviour of
> > +** Lua and JITted code must match 1:1 in terms of allocations, which makes
> > +** using memprof with enabled JIT virtually redundant. Hence the stub below.
> > +*/
> 
> I guess you can drop the function below (since it simply duplicates the
> INTERNAL allocation semantics), *but* the comment above is nice, so you
> can move it to the default function or to the corresponding item in
> <memprof_writers> set.

Done.

> 
> > +static void memprof_write_trace(struct memprof *mp, uint8_t header)
> > +{
> > +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);
> 
> Minor: I guess this is not <header> but <aevent>.

Fixed here and above.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> > +{
> > +  const global_State *g = mp->g;
> > +  const uint32_t _vmstate = (uint32_t)~g->vmstate;
> > +  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
> > +  const uint8_t header = aevent;
> 
> General Genius :)

Fixed.

> 
> > +
> > +  memprof_writers[vmstate](mp, header);
> > +}
> 
> <snipped>
> 
> > +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
> > +{
> > +  struct memprof *mp = &memprof;
> 
> This should be "const", IMHO.

Not here (we write to wbuf and it is nested into memprof structure) but
for oalloc below, fixed.

> 
> > +  struct alloc *oalloc = &mp->orig_alloc;
> 
> <snipped>
> 
> > +}
> > +
> > +static void memprof_write_prologue(struct lj_wbuf *out)
> > +{
> > +  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
> > +  lj_wbuf_addn(out, ljm_header, len);
> > +}
> 
> See comments for <symtab_write_prologue>.

Fixed.

> 
> > +
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)
> 
> Side note: it's better to wrap this function to move all mutex-related
> work in this wrapper.

I'll drop mutex related work for now.

> 
> > +{
> 
> <snipped>
> 
> > +  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
> > +  lua_assert(opt->buf != NULL && opt->len != 0);
> 
> Do these asserts depend on each other? If no, please split them into the
> separate ones.

Done.

> 
> > +
> 
> <snipped>
> 
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
> > +		  lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {
> 
> You can test (STREAM_ERR_IO|STREAM_STOP) here.

Yes, it's pretier! Thank you!

> 
> > +    /* on_stop call may change errno value. */
> 
> <snipped>
> 
> > +}
> > +
> > +static int memprof_stop(const struct lua_State *L)
> > +{
> 
> <snipped>
> 
> > +  if (L != NULL && mp->g != G(L)) {
> 
> This is a nice check (but looks redundant for the current version). Why
> did you make it optional (if L is given)?

Dropped L-part (considering your comments below).

> 
> > +    memprof_unlock();
> > +    return PROFILE_ERR;
> > +  }
> 
> <snipped>
> 
> > +  main_L = mainthread(mp->g);
> 
> Why do you use main coroutine here instead of the given one?

See no reason for that now... Dropped.

> 
> > +
> 
> <snipped>
> 
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> > +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> > +    mp->state = MPS_HALT;
> > +    /* on_stop call may change errno value. */
> > +    mp->saved_errno = lj_wbuf_errno(out);
> > +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> > +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +    lj_wbuf_terminate(out);
> > +    memprof_unlock();
> > +    return PROFILE_ERRIO;
> > +  }
> 
> <snipped>
> 
> > +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> > +    saved_errno = lj_wbuf_errno(out);
> > +    return_status = PROFILE_ERRIO;
> 
> Well, you introduce a separate variable for the status. For what? I
> think this branch is the same as the "stop" one above. So either use
> goto to the error branch or duplicate the logic from the "stop" branch.
> In general, this looks like a mess: I almost lost the mutex unlock and
> the buffer termination.

Yes, looks unclear. I'll rewrite it in the new version.

> 
> > +  }
> > +
> > +  lj_wbuf_terminate(out);
> > +
> > +  memprof_unlock();
> > +  errno = saved_errno;
> > +  return return_status;
> > +}
> > +
> > +int lj_memprof_stop(void)
> 
> Why do you omit the coroutine argument for this function?
> 
> > +{
> 
> <snipped>
> 
> > +int lj_memprof_stop_vm(const struct lua_State *L)
> 
> This function is not used anywhere. Please drop it.

About this and function above.
I'll provide 1 interface with the coroutine argument.

> 
> > +{
> 
> <snipped>
> 
> > +int lj_memprof_is_running(void)
> 
> This function is used only on global state finalization. However, it is
> excess (see the comment below), so I believe it can be dropped.

Done.

> 
> > +{
> 
> <snipped>
> 
> > diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> > new file mode 100644
> > index 0000000..a96b72f
> > --- /dev/null
> > +++ b/src/lj_memprof.h
> > @@ -0,0 +1,165 @@
> 
> <snipped>
> 
> > +#include <stdint.h>
> > +#include <stddef.h>
> 
> lj_def.h is definitely enough here.

Yep.

> 
> > +
> 
> <snipped>
> 
> > +#define SYMTAB_FFUNC ((uint8_t)1)
> > +#define SYMTAB_CFUNC ((uint8_t)2)
> > +#define SYMTAB_TRACE ((uint8_t)3)
> 
> These defines are unused.

Removed.

> 
> > +#define SYMTAB_FINAL ((uint8_t)0x80)
> 
> <snipped>
> 
> > +/* Profiler public API. */
> > +#define PROFILE_SUCCESS 0
> > +#define PROFILE_ERR     1
> 
> Minor: Considering the usage <PROFILE_ERRUSE> looks better to me. Feel
> free to ignore.

Hmm, indead. Applied.

> 
> > +#define PROFILE_ERRRUN  2
> > +#define PROFILE_ERRMEM  3
> > +#define PROFILE_ERRIO   4
> > +
> > +/* Profiler options. */
> > +struct lua_Prof_options {
> 
> Typo: s/lua_Prof/lj_memprof/ since <lua_*> is used for exported members.

Fixed.

> 
> > +  /* Context for the profile writer and final callback. */
> > +  void *ctx;
> > +  /* Custom buffer to write data. */
> > +  uint8_t *buf;
> > +  /* The buffer's size. */
> > +  size_t len;
> > +  /*
> > +  ** Writer function for profile events.
> > +  ** Should return amount of written bytes on success or zero in case of error.
> > +  ** Setting *data to NULL means end of profiling.
> > +  */
> 
> Why don't you use <lj_wbuf_writer> typedef below? So you can reference
> <lj_wbuf.h> for the writer contract.

Updated.

> 
> > +  size_t (*writer)(const void **data, size_t len, void *ctx);
> > +  /*
> > +  ** Callback on profiler stopping. Required for correctly cleaning
> > +  ** at vm shoutdown when profiler still running.
> 
> Typo: s/VM shoutdown/VM finalization/.
> Typo: s/profiled still running/profiler is still running/.

Thanks! Fixed.

> 
> > +  ** Returns zero on success.
> > +  */
> > +  int (*on_stop)(void *ctx, uint8_t *buf);
> > +};
> > +
> > +/* Avoid to provide additional interfaces described in other headers. */
> 
> It looks like cargo cult, IMHO. What is the reason?

You don't add extra information in object file and it's become
smaller, for example :). Avoiding namespace polution is a small
benefit.

> 
> > +struct lua_State;
> > +
> > +/*
> > +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
> > +** LUAM_PROFILE_ERRIO.
> 
> Typo: s/LUAM_PROFILE_*/PROFILE_*/g.

Fixed! Thanks!

> 
> > +*/
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
> > +
> > +/*
> > +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
> > +** on call at buffer flush, profiled stream stops, or on_stop() callback
> > +** returns non-zero value, returns LUAM_PROFILE_ERRIO.
> > +*/
> 
> <snipped>
> 
> > diff --git a/src/lj_obj.h b/src/lj_obj.h
> > index c94617d..c94b0bb 100644
> > --- a/src/lj_obj.h
> > +++ b/src/lj_obj.h
> > @@ -523,6 +523,14 @@ enum {
> >    LJ_VMST__MAX
> >  };
> >  
> > +/*
> > +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
> > +** In fact, when VM executes a trace, vmstate is set to the trace number, but
> > +** we aggregate all such cases into one VM state during per-VM state profiling.
> > +*/
> 
> Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide
> one. If <vmstate> is less than LJ_VMST__MAX it is considered as a trace
> number and all LuaJIT universe works with assumtions the trace is being
> run. It looks natural to me to move this change to the previous patch
> related to VM states slivering into *FUNC set to close this story there.

Done.

> 
> > +
> > +#define LJ_VMST_TRACE		(LJ_VMST__MAX)
> > +
> >  #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
> >  
> >  /* Metamethods. ORDER MM */
> > diff --git a/src/lj_state.c b/src/lj_state.c
> > index 1d9c628..6c46e3d 100644
> > --- a/src/lj_state.c
> > +++ b/src/lj_state.c
> 
> <snipped>
> 
> > @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
> >    global_State *g = G(L);
> >    int i;
> >    L = mainthread(g);  /* Only the main thread can be closed. */
> > +#if LJ_HASMEMPROF
> > +  if (lj_memprof_is_running())
> 
> This check is excess, since you don't check the return value below. If
> <lj_memprof_stop> is called when profiler doesn't work, PROFILE_ERRRUN
> or PROFILE_ERRIO is yield.

Dropped check.

> 
> > +    lj_memprof_stop();
> > +#endif
> >  #if LJ_HASPROFILE
> >    luaJIT_profile_stop(L);
> >  #endif
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> [1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89
> 
> -- 
> Best regards,
> IM

Also I'll rename STREAM_ERR_* -> STREAM_ERR* for consistency.
I'll send the iterative patch for wbuf module soon.

See you soon in v3 :)

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-27 16:02     ` Sergey Kaplun
@ 2020-12-27 21:55       ` Sergey Ostanevich
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Ostanevich @ 2020-12-27 21:55 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 11684 bytes --]

LGTM.

Sergos

> On 27 Dec 2020, at 19:02, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> 
> Hi!
> 
> Thanks for the review.
> 
> On 27.12.20, Sergey Ostanevich wrote:
>> Hi!
>> 
>> Thanks for the patch!
>> See my 7 comments below.
>> 
>> Sergos
>> 
>> 
>>> On 25 Dec 2020, at 18:26, Sergey Kaplun <skaplun@tarantool.org> wrote:
>> <snipped>
>>> diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
>>> new file mode 100755
>>> index 0000000..e02c6fa
>>> --- /dev/null
>>> +++ b/test/misclib-memprof-lapi.test.lua
>>> @@ -0,0 +1,135 @@
>>> +#!/usr/bin/env tarantool
>>> +
>>> +local tap = require('tap')
>>> +
>>> +local test = tap.test("misc-memprof-lapi")
>>> +test:plan(9)
>>> +
>>> +jit.off()
>>> +jit.flush()
>>> +
>>> +-- FIXME: Launch tests with LUA_PATH enviroment variable.
>>> +local path = arg[0]:gsub('/[^/]+%.test%.lua', ‘’)
>> 
>> I believe it won’t work well for some cases, such as
>> 
>> tarantool> arg[0]
>> ---
>> - void.test.lua
>> ...
>> 
>> tarantool> arg[0]:gsub('/[^/]+%.test%.lua', '')
>> ---
>> - void.test.lua
>> - 0
>> ...
>> 
>> Alternative is:
>> 
>> tarantool> os.execute('dirname '..arg[0])
>> .
>> ---
>> - 0
>> ...
>> 
>> 
> 
> I suppose you want to use `popen` here to catch output?
> General practice is using `gsub` for now (see <test/utils.lua>).
> I fixed regexp to the following:
> 
> | local path = arg[0]:gsub("[^/]+%.test%.lua", "")
> | local path_suffix = "../tools/?.lua;"
> | package.path = ("%s%s;"):format(path, path_suffix)..package.path
> 
>>> +local path_suffix = '../tools/?.lua;'
>>> +package.path = ('%s/%s;'):format(path, path_suffix)..package.path
>>> +
>>> +local table_new = require "table.new"
>>> +
>>> +local bufread = require "utils.bufread"
>>> +local memprof = require "memprof.parse"
>>> +local symtab = require "utils.symtab"
>>> +
>>> +local TMP_BINFILE = arg[0]:gsub('[^/]+%.test%.lua', '%.%1.memprofdata.tmp.bin')
>>> +local BAD_PATH = arg[0]:gsub('[^/]+%.test%.lua', '%1/memprofdata.tmp.bin')
>>> +
>>> +local function payload()
>>> +  -- Preallocate table to avoid array part reallocations.
>>                                         ^parts?
> 
> No, I meant array part of table (not a hash part).
> I rewrote it to the following:
> | -- Preallocate table to avoid table array part reallocations.
> 
>>> +  local _ = table_new(100, 0)
>>> +
>>> +  -- Want too see 100 objects here.
>>> +  for i = 1, 100 do
>>> +    -- Try to avoid crossing with "test" module objects.
>>> +    _[i] = "memprof-str-"..i
>>> +  end
>>> +
>>> +  _ = nil
>>> +  -- VMSTATE == GC, reported as INTERNAL.
>>> +  collectgarbage()
>>> +end
>>> +
>>> +local function generate_output(filename)
>>> +  -- Clean up all garbage to avoid polution of free.
>>                                     pollution
> 
> Fixed.
> 
>>> +  collectgarbage()
>>> +
>>> +  local res, err = misc.memprof.start(filename)
>>> +  -- Should start succesfully.
>>> +  assert(res, err)
>>> +
>>> +  payload()
>>> +
>>> +  res, err = misc.memprof.stop()
>>> +  -- Should stop succesfully.
>>> +  assert(res, err)
>>> +end
>> 
>> <snipped>
>> 
> 
> I also foggot to return `jit.on()` back in the end of test.
> Added.
> 
>>> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
>>> new file mode 100644
>>> index 0000000..f4996f4
>>> --- /dev/null
>>> +++ b/tools/memprof/parse.lua
>> 
>> <snipped>
>> 
>>> +local function link_to_previous(heap, e, oaddr)
>>> +  -- Memory at oaddr was allocated before we started tracking.
>>> +  local heap_chunk = heap[oaddr]
>> 
>> Do you need two args for this? Can you just pass the heap[oaddr] instead?
> 
> Yes. It's looks better. Applied. Thank you!
> 
>> 
>>> +  if heap_chunk then
>>> +    -- Save Lua code location (line) by address (id).
>>> +    e.primary[heap_chunk[2]] = heap_chunk[3]
>>> +  end
>>> +end
>>> +
>> 
>> <snipped>
>> 
>>> +local function ev_header_split(evh)
>>> +  return band(evh, 0x3), band(evh, lshift(0x3, 2))
>> 
>> Should you intorduce masks along with AEVENT/ASOURCE to avoid these
>> magic numbers?
> 
> My bad. Done.
> 
>> 
>>> +end
>>> +
>> 
>> <snipped>
>> 
>>> diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua
>> 
>> <snipped>
>> 
>>> +
>>> +local function _read_stream(reader, n)
>>> +  local tail_size = reader._end - reader._pos
>>> +
>>> +  if tail_size >= n then
>>> +    -- Enough data to satisfy the request of n bytes.
>>> +    return true
>>> +  end
>>> +
>>> +  -- Otherwise carry tail_size bytes from the end of the buffer
>>> +  -- to the start and fill up free_size bytes with fresh data.
>>> +  -- tail_size < n <= free_size (see assert below) ensures that
>>> +  -- we don't copy overlapping memory regions.
>>> +  -- reader._pos == 0 means filling buffer for the first time.
>>> +
>>> +  local free_size = reader._pos > 0 and reader._pos or n
>>> +
>>> +  assert(n <= free_size, "Internal buffer is large enough")
>> 
>> Does it mean I will have a fail in case _pos is less that half of the
>> buffer and n is more than the tail_size? 
>> Which means I can use only half of the buffer?
> 
> Hat-trick of buffer's misuse.
> Thanks you very much again! Fixed (see the iterative patch below).
> 
>> 
>>> +
>>> +  if tail_size ~= 0 then
>>> +    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
>>> +  end
>>> +
>>> +  local bytes_read = ffi_C.fread(
>>> +    reader._buf + tail_size, 1, free_size, reader._file
>>> +  )
>>> +
>>> +  reader._pos = 0
>>> +  reader._end = tail_size + bytes_read
>>> +
>>> +  return reader._end - reader._pos >= n
>>> +end
>>> +
>> 
>> <snipped>
>> 
>>> +function M.eof(reader)
>>> +  local sys_feof = ffi_C.feof(reader._file)
>>> +  if sys_feof == 0 then
>>> +    return false
>>> +  end
>>> +  -- Otherwise return true only we have reached
>>                                  ^^ if we
> 
> Fixed.
> 
>> 
>>> +  -- the end of the buffer.
>>> +  return reader._pos == reader._end
>>> +end
>> <snipped>
>> 
> 
> The iterative patch. Branch is force-pushed.
> ===================================================================
> diff --git a/test/misclib-memprof-lapi.test.lua b/test/misclib-memprof-lapi.test.lua
> index 2366c00..dd484f4 100755
> --- a/test/misclib-memprof-lapi.test.lua
> +++ b/test/misclib-memprof-lapi.test.lua
> @@ -9,9 +9,9 @@ jit.off()
> jit.flush()
> 
> -- FIXME: Launch tests with LUA_PATH enviroment variable.
> -local path = arg[0]:gsub("/[^/]+%.test%.lua", "")
> +local path = arg[0]:gsub("[^/]+%.test%.lua", "")
> local path_suffix = "../tools/?.lua;"
> -package.path = ("%s/%s;"):format(path, path_suffix)..package.path
> +package.path = ("%s%s;"):format(path, path_suffix)..package.path
> 
> local table_new = require "table.new"
> 
> @@ -23,7 +23,7 @@ local TMP_BINFILE = arg[0]:gsub("[^/]+%.test%.lua", "%.%1.memprofdata.tmp.bin")
> local BAD_PATH = arg[0]:gsub("[^/]+%.test%.lua", "%1/memprofdata.tmp.bin")
> 
> local function payload()
> -  -- Preallocate table to avoid array part reallocations.
> +  -- Preallocate table to avoid table array part reallocations.
>   local _ = table_new(100, 0)
> 
>   -- Want too see 100 objects here.
> @@ -38,7 +38,7 @@ local function payload()
> end
> 
> local function generate_output(filename)
> -  -- Clean up all garbage to avoid polution of free.
> +  -- Clean up all garbage to avoid pollution of free.
>   collectgarbage()
> 
>   local res, err = misc.memprof.start(filename)
> @@ -132,4 +132,5 @@ test:ok(check_alloc_report(alloc, 32, 25, 100))
> -- Collect all previous allocated objects.
> test:ok(free.INTERNAL.num == 102)
> 
> +jit.on()
> os.exit(test:check() and 0 or 1)
> diff --git a/tools/memprof/parse.lua b/tools/memprof/parse.lua
> index f4996f4..6dae22d 100644
> --- a/tools/memprof/parse.lua
> +++ b/tools/memprof/parse.lua
> @@ -19,10 +19,14 @@ local AEVENT_ALLOC = 1
> local AEVENT_FREE = 2
> local AEVENT_REALLOC = 3
> 
> +local AEVENT_MASK = 0x3
> +
> local ASOURCE_INT = lshift(1, 2)
> local ASOURCE_LFUNC = lshift(2, 2)
> local ASOURCE_CFUNC = lshift(3, 2)
> 
> +local ASOURCE_MASK = lshift(0x3, 2)
> +
> local M = {}
> 
> local function new_event(loc)
> @@ -35,9 +39,8 @@ local function new_event(loc)
>   }
> end
> 
> -local function link_to_previous(heap, e, oaddr)
> -  -- Memory at oaddr was allocated before we started tracking.
> -  local heap_chunk = heap[oaddr]
> +local function link_to_previous(heap_chunk, e)
> +  -- Memory at this chunk was allocated before we start tracking.
>   if heap_chunk then
>     -- Save Lua code location (line) by address (id).
>     e.primary[heap_chunk[2]] = heap_chunk[3]
> @@ -94,7 +97,7 @@ local function parse_realloc(reader, asource, events, heap)
>   e.free = e.free + osize
>   e.alloc = e.alloc + nsize
> 
> -  link_to_previous(heap, e, oaddr)
> +  link_to_previous(heap[oaddr], e)
> 
>   heap[oaddr] = nil
>   heap[naddr] = {nsize, id, loc}
> @@ -113,7 +116,7 @@ local function parse_free(reader, asource, events, heap)
>   e.num = e.num + 1
>   e.free = e.free + osize
> 
> -  link_to_previous(heap, e, oaddr)
> +  link_to_previous(heap[oaddr], e)
> 
>   heap[oaddr] = nil
> end
> @@ -131,7 +134,7 @@ end
> -- Splits event header into event type (aka aevent = allocation
> -- event) and event source (aka asource = allocation source).
> local function ev_header_split(evh)
> -  return band(evh, 0x3), band(evh, lshift(0x3, 2))
> +  return band(evh, AEVENT_MASK), band(evh, ASOURCE_MASK)
> end
> 
> local function parse_event(reader, events)
> diff --git a/tools/utils/bufread.lua b/tools/utils/bufread.lua
> index 873e06a..34bae9a 100644
> --- a/tools/utils/bufread.lua
> +++ b/tools/utils/bufread.lua
> @@ -22,7 +22,7 @@ local BUFFER_SIZE = 10 * 1024 * 1024
> local M = {}
> 
> ffi.cdef[[
> -  void *memcpy(void *, const void *, size_t);
> +  void *memmove(void *, const void *, size_t);
> 
>   typedef struct FILE_ FILE;
>   FILE *fopen(const char *, const char *);
> @@ -34,6 +34,8 @@ ffi.cdef[[
> local function _read_stream(reader, n)
>   local tail_size = reader._end - reader._pos
> 
> +  assert(n <= BUFFER_SIZE, "Internal buffer is large enough")
> +
>   if tail_size >= n then
>     -- Enough data to satisfy the request of n bytes.
>     return true
> @@ -41,16 +43,11 @@ local function _read_stream(reader, n)
> 
>   -- Otherwise carry tail_size bytes from the end of the buffer
>   -- to the start and fill up free_size bytes with fresh data.
> -  -- tail_size < n <= free_size (see assert below) ensures that
> -  -- we don't copy overlapping memory regions.
> -  -- reader._pos == 0 means filling buffer for the first time.
> -
> -  local free_size = reader._pos > 0 and reader._pos or n
> 
> -  assert(n <= free_size, "Internal buffer is large enough")
> +  local free_size = BUFFER_SIZE - tail_size
> 
>   if tail_size ~= 0 then
> -    ffi_C.memcpy(reader._buf, reader._buf + reader._pos, tail_size)
> +    ffi_C.memmove(reader._buf, reader._buf + reader._pos, tail_size)
>   end
> 
>   local bytes_read = ffi_C.fread(
> @@ -114,7 +111,7 @@ function M.eof(reader)
>   if sys_feof == 0 then
>     return false
>   end
> -  -- Otherwise return true only we have reached
> +  -- Otherwise return true only if we have reached
>   -- the end of the buffer.
>   return reader._pos == reader._end
> end
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun


[-- Attachment #2: Type: text/html, Size: 143228 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
  2020-12-26 19:07   ` Sergey Ostanevich
@ 2020-12-27 23:48   ` Igor Munkin
  2020-12-28  3:54     ` Sergey Kaplun
  1 sibling, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-27 23:48 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! Please consider the comments below.

On 25.12.20, Sergey Kaplun wrote:
> This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
> separated from LJ_VMST_INERP. New VM states allow to determine the
> context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
> renamed to LJ_VMST_CFUNC for naming consistence with new VM states.
> 
> Also, this patch adjusts stack layout for x86 and x64 arches to save VM
> state for its consistency while stack unwinding when error is raised.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>  - Moved `.if not WIN` macro check inside (save|restore)_vmstate_through
>  - Fixed naming: SAVE_UNUSED\d -> UNUSED\d
> 
>  src/lj_frame.h     |  18 +++----
>  src/lj_obj.h       |   4 +-
>  src/lj_profile.c   |   5 +-
>  src/luajit-gdb.py  |  14 ++---
>  src/vm_arm.dasc    |   6 +--
>  src/vm_arm64.dasc  |   6 +--
>  src/vm_mips.dasc   |   6 +--
>  src/vm_mips64.dasc |   6 +--
>  src/vm_ppc.dasc    |   6 +--
>  src/vm_x64.dasc    |  93 ++++++++++++++++++++++----------
>  src/vm_x86.dasc    | 131 +++++++++++++++++++++++++++++----------------
>  11 files changed, 188 insertions(+), 107 deletions(-)
> 
> diff --git a/src/lj_frame.h b/src/lj_frame.h
> index 19c49a4..2e693f9 100644
> --- a/src/lj_frame.h
> +++ b/src/lj_frame.h

<snipped>

> @@ -152,11 +152,11 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
>  #define CFRAME_OFS_NRES		(22*4)
>  #define CFRAME_OFS_MULTRES	(21*4)
>  #endif
> -#define CFRAME_SIZE		(10*8)
> +#define CFRAME_SIZE		(12*8)

This change looks Windows-related, so is excess.

>  #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)

<snipped>

> diff --git a/src/lj_profile.c b/src/lj_profile.c
> index 116998e..637e03c 100644
> --- a/src/lj_profile.c
> +++ b/src/lj_profile.c
> @@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
>      int st = g->vmstate;
>      ps->vmstate = st >= 0 ? 'N' :
>  		  st == ~LJ_VMST_INTERP ? 'I' :
> -		  st == ~LJ_VMST_C ? 'C' :
> +		  st == ~LJ_VMST_CFUNC ? 'C' :
> +		  /* Stubs for profiler hooks. */
> +		  st == ~LJ_VMST_FFUNC ? 'I' :
> +		  st == ~LJ_VMST_LFUNC ? 'I' :

Minor: Move this comparisons above to save the ordering from <lj_obj.h>.

>  		  st == ~LJ_VMST_GC ? 'G' : 'J';
>      g->hookmask = (mask | HOOK_PROFILE);
>      lj_dispatch_update(g);

<snipped>

> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 80753e0..83cc3e1 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc

<snipped>

> @@ -161,26 +161,29 @@

<snipped>

> +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> +|.define UNUSED2,	qword [rsp+qword*5]
> +|.define UNUSED1,	dword [rsp+dword*8]

UNUSED1 should alias <dword [rsp + dword*9]>

Minor: What about UNUSEDd and UNUSEDq instead? Feel free to ignore.

> +|.define SAVE_VMSTATE,	dword [rsp+dword*8]

<snipped>

> @@ -342,6 +345,22 @@
>  |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
>  |.endmacro
>  |
> +|// Save vmstate through register.
> +|.macro save_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> +|  mov SAVE_VMSTATE, reg

This is too complicated. VM is implemented as a file with 6kloc of
pseudo-assembly in it and I don't want to make its support harder. We've
recently found a bug with miscomparison that we coudn't catch for a
year. This is not a fun. This is not a kinda sport. This is damn 6k kloc
of assembly we ought to support. So I propose to make this macro work
with no args at all. Consider the following:
|.macro save_vmstate
|.if not WIN
|  mov TMPRd, dword [DISPATCH+DISPATCH_GL(vmstate)]	// TMPRd is r10d
|  mov SAVE_VMSTATE, TMPRd
|.endif // WIN
|
|.macro restore_vmstate
|.if not WIN
|  mov TMPRd, SAVE_VMSTATE	// XCHGd is r11d
|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], TMPRd
|.endif // WIN

I agree that TMPRd is used in other places within vm_x64.dasc but this
is temporary register and I failed to find any conflict. I left the
comments below to ease the check.

> +|.endif // WIN
> +|.endmacro

<snipped>

> @@ -448,6 +467,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |  xor eax, eax			// Ok return status for vm_pcall.
>    |
>    |->vm_leave_unw:
> +  |  // DISPATCH required to set properly.
> +  |  restore_vmstate_through RAd

Re TMPR: The TMP register is not used, so there is no need to restore
vmstate via RAd. Everything is OK.

>    |  restoreregs
>    |  ret
>    |

<snipped>

> @@ -521,7 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov [BASE-16], RA			// Prepend false to error message.
>    |  mov [BASE-8], RB
>    |  mov RA, -16			// Results start at BASE+RA = BASE-16.
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C

Typo: Please, adjust the comments considering the ones nearby.

>    |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
>    |
>    |//-----------------------------------------------------------------------
> @@ -575,6 +598,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  lea KBASE, [esp+CFRAME_RESUME]
>    |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
>    |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through TMPRd

Re TMPR: It is manually used here. Everything is OK.

>    |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
>    |  mov SAVE_CFRAME, RD
>    |  mov SAVE_NRES, RDd
> @@ -585,7 +609,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |  // Resume after yield (like a return).
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>    |  mov byte L:RB->status, RDL
>    |  mov BASE, L:RB->base
>    |  mov RD, L:RB->top
> @@ -622,11 +646,12 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov SAVE_CFRAME, KBASE
>    |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
>    |  add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through RDd

Re TMPR: The temporary register is not used prior to this line.
Everything is OK.

>    |  mov L:RB->cframe, rsp
>    |
>    |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*

This branch is also taken for lj_vm_pcall/lj_vm_call/lj_vm_cpcall.

Typo: Please, adjust the comments considering the ones nearby.

>    |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
>    |  add PC, RA
>    |  sub PC, BASE			// PC = frame delta + frame type
> @@ -658,6 +683,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov SAVE_ERRF, 0			// No error function.
>    |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
>    |   add DISPATCH, GG_G2DISP
> +  |  save_vmstate_through KBASEd

Re TMPR: The temporary register is not used prior to this line.
Everything is OK.

>    |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
>    |
>    |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.

<snipped>

> @@ -1578,7 +1606,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov L:PC, TMP1
>    |  mov BASE, L:RB->base
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>    |
>    |  cmp eax, LUA_YIELD
>    |  ja >8

<snipped>

> @@ -3974,6 +4003,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>  
>    case BC_CALL: case BC_CALLM:
>      |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup

It looks like VM state is INTERP until the call enters *FUNC* bytecode.

>      if (op == BC_CALLM) {
>        |  add NARGS:RDd, MULTRES
>      }
> @@ -3995,6 +4025,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  mov LFUNC:RB, [RA-16]
>      |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
>      |->BC_CALLT_Z:
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup

Ditto.

>      |  mov PC, [BASE-8]
>      |  test PCd, FRAME_TYPE
>      |  jnz >7
> @@ -4219,6 +4250,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  shl RAd, 3
>      }
>      |1:
> +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored

Typo: Please, adjust the comments considering the ones nearby.

>      |  mov PC, [BASE-8]
>      |  mov MULTRES, RDd			// Save nresults+1.
>      |  test PCd, FRAME_TYPE		// Check frame type marker.

<snipped>

> @@ -4551,6 +4584,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
>      |  mov KBASE, [PC-4+PC2PROTO(k)]
>      |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration

This is OK, but this path is also taken for the JFUNC. Do we need to set
another VM state for this case (in scope of JLOOP)?

>      |  lea RA, [BASE+RA*8]		// Top of frame.
>      |  cmp RA, L:RB->maxstack
>      |  ja ->vm_growstack_f
> @@ -4588,6 +4622,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  mov [RD-8], RB			// Store delta + FRAME_VARG.
>      |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
>      |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration

Ditto.

>      |  lea RA, [RD+RA*8]
>      |  cmp RA, L:RB->maxstack
>      |  ja ->vm_growstack_v		// Need to grow stack.

<snipped>

> @@ -4653,7 +4688,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  // nresults returned in eax (RD).
>      |  mov BASE, L:RB->base
>      |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -    |  set_vmstate INTERP
> +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>      |  lea RA, [BASE+RD*8]
>      |  neg RA
>      |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index d76fbe3..b9dffa9 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc

<snipped>

> @@ -290,33 +295,35 @@

<snipped>

> +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> +|.define UNUSED1,	qword [rsp+qword*5]

There is lost unused dword right here.

> +|.define SAVE_VMSTATE,	dword [rsp+dword*8]

<snipped>

> @@ -433,6 +440,22 @@
>  |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
>  |.endmacro
>  |
> +|// Save vmstate through register.
> +|.macro save_vmstate_through, reg
> +|.if not WIN
> +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> +|  mov SAVE_VMSTATE, reg

See the rationale in the note to vm_x64.dasc. Unfortunately, for
vm_x86.dasc the macro has two branches, *but* I think it is much better
than managing free registers in *whole* VM. Consider the following:
|.macro save_vmstate
|.if not WIN
|.if not X64
|  mov UNUSED1, ecx	// Please rename this field (e.g SAVE_XCHG).
|  mov ecx, dword [DISPATCH+DISPATCH_GL(vmstate)]
|  mov SAVE_VMSTATE, ecx
|  mov ecx, UNUSED1
|.else // X64
|  mov XCHGd, dword [DISPATCH+DISPATCH_GL(vmstate)]	// XCHGd is r11d
|  mov SAVE_VMSTATE, XCHGd
|.endif // X64
|.endif // WIN
|
|.macro restore_vmstate
|.if not WIN
|.if not X64
|  mov UNUSED1, ecx	// Please rename this field (e.g SPILLECX).
|  mov ecx, SAVE_VMSTATE
|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ecx
|  mov ecx, UNUSED1
|.else // X64
|  mov XCHGd, SAVE_VMSTATE	// XCHGd is r11d
|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], XCHGd
|.endif // X64
|.endif // WIN

> +|.endif // WIN
> +|.endmacro

<snipped>

>    |->vm_unwind_rethrow:
> @@ -647,7 +674,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov PC, [BASE-4]			// Fetch PC of previous frame.
>    |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C

Typo: Please, adjust the comments considering the ones nearby.

>    |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
>    |
>    |.if WIN and not X64

<snipped>

> @@ -730,7 +758,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |  // Resume after yield (like a return).
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>    |  mov byte L:RB->status, RDL
>    |  mov BASE, L:RB->base
>    |  mov RD, L:RB->top

<snipped>

> @@ -782,7 +811,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |
>    |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*

Typo: Please, adjust the comments considering the ones nearby.

>    |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
>    |  add PC, RA
>    |  sub PC, BASE			// PC = frame delta + frame type

<snipped>

> @@ -1924,7 +1956,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endif
>    |  mov BASE, L:RB->base
>    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -  |  set_vmstate INTERP
> +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>    |
>    |  cmp eax, LUA_YIELD
>    |  ja >8

<snipped>

> @@ -4683,6 +4716,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>  
>    case BC_CALL: case BC_CALLM:
>      |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup

It looks like VM state is INTERP until the call enters *FUNC* bytecode.

>      if (op == BC_CALLM) {
>        |  add NARGS:RD, MULTRES
>      }
> @@ -4706,6 +4740,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  cmp dword [RA-4], LJ_TFUNC
>      |  jne ->vmeta_call
>      |->BC_CALLT_Z:
> +    |  set_vmstate INTERP		// INTERP until a new BASE is setup

Ditto.

>      |  mov PC, [BASE-4]
>      |  test PC, FRAME_TYPE
>      |  jnz >7
> @@ -4989,6 +5024,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>        |  shl RA, 3
>      }
>      |1:
> +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored

Typo: Please, adjust the comments considering the ones nearby.

>      |  mov PC, [BASE-4]
>      |  mov MULTRES, RD			// Save nresults+1.
>      |  test PC, FRAME_TYPE		// Check frame type marker.
> @@ -5043,6 +5079,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  mov LFUNC:KBASE, [BASE-8]
>      |  mov KBASE, LFUNC:KBASE->pc
>      |  mov KBASE, [KBASE+PC2PROTO(k)]
> +    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored

Typo: Please, adjust the comments considering the ones nearby.

>      |  ins_next
>      |
>      |6:  // Fill up results with nil.
> @@ -5330,6 +5367,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
>      |  mov KBASE, [PC-4+PC2PROTO(k)]
>      |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration

This is OK, but this path is also taken for the JFUNC. Do we need to set
another VM state for this case (in scope of JLOOP)?

>      |  lea RA, [BASE+RA*8]		// Top of frame.
>      |  cmp RA, L:RB->maxstack
>      |  ja ->vm_growstack_f
> @@ -5367,6 +5405,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  mov [RD-4], RB			// Store delta + FRAME_VARG.
>      |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
>      |  mov L:RB, SAVE_L
> +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration

Ditto.

>      |  lea RA, [RD+RA*8]
>      |  cmp RA, L:RB->maxstack
>      |  ja ->vm_growstack_v		// Need to grow stack.

<snipped>

> @@ -5441,7 +5480,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  // nresults returned in eax (RD).
>      |  mov BASE, L:RB->base
>      |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> -    |  set_vmstate INTERP
> +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return

Typo: Please, adjust the comments considering the ones nearby.

>      |  lea RA, [BASE+RD*8]
>      |  neg RA
>      |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for memory profiler
  2020-12-27 18:58   ` Igor Munkin
@ 2020-12-28  0:14     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  0:14 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 27.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces Lua API for LuaJIT memory profiler implemented in
> > the scope of the previous patch.
> > 
> > Profiler returns some true value if started/stopped successfully,
> 
> There is no "some" true value there is a sole one. Please fix it in both
> the patch and the RFC.

Fixed.

> 
> > returns nil on failure (plus an error message as a second result and a
> > system-dependent error code as a third result).
> > If LuaJIT build without memory profiler both return `false`.
> > 
> > <lj_errmsg.h> have adjusted with two new errors
> > PROF_ISRUNNING/PROF_NOTRUNNING returned in case when profiler has
> > started/stopped already correspondingly.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >   - Added pushing of errno for ERR_PROF* and ERRMEM
> >   - Added forgotten assert.
> > 
> >  src/Makefile.dep |   5 +-
> >  src/lib_misc.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++
> >  src/lj_errmsg.h  |   6 ++
> >  3 files changed, 176 insertions(+), 2 deletions(-)
> > 
> 
> <snipped>
> 
> > diff --git a/src/lib_misc.c b/src/lib_misc.c
> > index 6f7b9a9..36fe29f 100644
> > --- a/src/lib_misc.c
> > +++ b/src/lib_misc.c
> 
> <snipped>
> 
> > @@ -67,8 +75,167 @@ LJLIB_CF(misc_getmetrics)
> 
> <snipped>
> 
> > +static LJ_AINLINE void memprof_ctx_free(struct memprof_ctx *ctx, uint8_t *buf)
> > +{
> > +  lj_mem_free(ctx->g, buf, STREAM_BUFFER_SIZE);
> 
> Side note: This is odd that you free the buffer here, but the buffer
> itself is not a part of the memprof context. Let's return to this later.

I'll rewrite this part considering the fact that lj_mem_new raises an
error in case of OOM.

> 
> > +  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
> > +}
> > +
> > +/* Default buffer writer function. Just call fwrite to corresponding FILE. */
> > +static size_t buffer_writer_default(const void **buf_addr, size_t len,
> > +				    void *opt)
> > +{
> 
> <snipped>
> 
> > +    if (LJ_UNLIKELY(written == 0)) {
> > +      /* Re-tries write in case of EINTR. */
> > +      if (errno == EINTR) {
> 
> Minor: It's better to use early return here. Feel free to ignore.

Applied.

> 
> > +	errno = 0;
> > +	continue;
> > +      }
> > +      break;
> 
> If other error occurs, you need to pass the NULL to buf_addr, right?
> Otherwise, there is no guarantee everything is written to the file and
> profiling proceeds.

Yes, lets stopped profiling immedeately.

> 
> > +    }
> 
> <snipped>
> 
> > +
> > +/* Default on stop callback. Just close corresponding stream. */
> 
> Typo: s/close corresponding/close the corresponding/.

Thanks fixed.

> 
> > +static int on_stop_cb_default(void *opt, uint8_t *buf)
> 
> <snipped>
> 
> > +/* local started, err, errno = misc.memprof.start(fname) */
> > +LJLIB_CF(misc_memprof_start)
> > +{
> 
> <snipped>
> 
> > +  fname = strdata(lj_lib_checkstr(L, 1));
> 
> Minor: You can make this assignment alongside with the declaration.

Fixed.

> 
> > +
> > +  ctx = lj_mem_new(L, sizeof(*ctx));
> > +  if (ctx == NULL)
> 
> This is a dead code: <lj_mem_new> raises a LUA_ERRMEM.

Fixed in the new version.

> 
> > +    goto errmem;
> > +
> 
> <snipped>
> 
> > +  if (NULL == opt.buf) {
> 
> This is a dead code: <lj_mem_new> raises a LUA_ERRMEM.

Fixed in the new version.

> 
> > +    lj_mem_free(G(L), ctx, sizeof(*ctx));
> > +    goto errmem;
> > +  }
> 
> <snipped>
> 
> > +  memprof_status = lj_memprof_start(L, &opt);
> > +  started = memprof_status == PROFILE_SUCCESS;
> 
> Trust me, you don't need this variable.
> */me making Jedi mind tricks here*

I don't need this variable.

> 
> > +
> > +  if (LJ_UNLIKELY(!started)) {
> > +    fclose(ctx->stream);
> > +    remove(fname);
> 
> Minor: I doubt we need to remove a file even if LuaJIT failed to start
> profiling. Leave the comment if this makes sense. Feel free to ignore.

Dropped the corresponding comment.

> 
> > +    memprof_ctx_free(ctx, opt.buf);
> > +    switch (memprof_status) {
> 
> <snipped>
> 
> > +    }
> > +  }
> > +  lua_pushboolean(L, started);
> 
> Please, s/started/1/ since there is no another value here.

Fixed. Added return 0 after lua_assert(0).

> 
> > +
> > +  return 1;
> 
> <snipped>
> 
> > +}
> > +
> > +/* local stopped, err, errno = misc.memprof.stop() */
> > +LJLIB_CF(misc_memprof_stop)
> > +{
> > +  int status = lj_memprof_stop();
> > +  int stopped_successfully = status == PROFILE_SUCCESS;
> 
> Trust me, you don't need this variable.
> */me making Jedi mind tricks here*

I don't need this variable.

> 
> > +  if (!stopped_successfully) {
> 
> <snipped>
> 
> > +  lua_pushboolean(L, stopped_successfully);
> 
> Please, s/stopped_succesfully/1/ since there is no another value here.

Fixed. Added return 0 after lua_assert(0).

> 
> > +  return 1;
> > +}
> > +
> 
> <snipped>
> 
> > diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
> > index de7b867..6816da2 100644
> > --- a/src/lj_errmsg.h
> > +++ b/src/lj_errmsg.h
> > @@ -185,6 +185,12 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
> 
> <snipped>
> 
> > +#if LJ_HASPROFILE || LJ_HASMEMPROF
> 
> Why did you also initialize them for <jit.p> profiler?

I've supposed that it can be used there too. Let's drop it until the
refactoring.

> 
> > +/* Profiler errors. */
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module
  2020-12-26 19:37         ` Sergey Kaplun
@ 2020-12-28  1:43           ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  1:43 UTC (permalink / raw)
  To: Sergey Ostanevich, tarantool-patches

Fixed warning "pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]"
Changed naming.

See iterative patch below. Branch will be force-pushed later with
v3 series.
===================================================================
diff --git a/src/lj_wbuf.c b/src/lj_wbuf.c
index ef46545..897ef08 100644
--- a/src/lj_wbuf.c
+++ b/src/lj_wbuf.c
@@ -90,7 +90,7 @@ void lj_wbuf_addn(struct lj_wbuf *buf, const void *src, size_t n)
     memcpy(buf->pos, src, left);
     buf->pos += (ptrdiff_t)left;
     lj_wbuf_flush(buf);
-    src += (ptrdiff_t)left;
+    src = (uint8_t *)src + (ptrdiff_t)left;
     n -= left;
   }
 
@@ -120,7 +120,7 @@ void LJ_FASTCALL lj_wbuf_flush(struct lj_wbuf *buf)
   written = buf->writer((const void **)&buf->buf, len, buf->ctx);
 
   if (LJ_UNLIKELY(written < len)) {
-    wbuf_set_flag(buf, STREAM_ERR_IO);
+    wbuf_set_flag(buf, STREAM_ERRIO);
     wbuf_save_errno(buf);
   }
   if (LJ_UNLIKELY(buf->buf == NULL)) {
diff --git a/src/lj_wbuf.h b/src/lj_wbuf.h
index 77a7cf4..9eaa5e5 100644
--- a/src/lj_wbuf.h
+++ b/src/lj_wbuf.h
@@ -28,7 +28,7 @@
 */
 
 /* Stream errors. */
-#define STREAM_ERR_IO 0x1
+#define STREAM_ERRIO 0x1
 #define STREAM_STOP   0x2
 
 /*
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (6 preceding siblings ...)
  2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser Sergey Kaplun
@ 2020-12-28  2:05 ` Sergey Kaplun
  2020-12-28  2:49   ` Igor Munkin
  2020-12-28  2:06 ` [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce " Sergey Kaplun
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  2:05 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces Lua API for LuaJIT memory profiler implemented in
the scope of the previous patch.

Profiler returns true value if started/stopped successfully,
returns nil on failure (plus an error message as a second result and a
system-dependent error code as a third result).
If LuaJIT is build without memory profiler both return `false`.

<lj_errmsg.h> have adjusted with three new errors
PROF_MISUSE/PROF_ISRUNNING/PROF_NOTRUNNING returned in case when
profiler has used incorrectly/started/stopped already correspondingly.

Part of tarantool/tarantool#5442
---

Changes in v3:
  * Fixed lj_mem_new misuse.
  * Moved buffer inside ctx.
  * Codestyle fixes.

 src/Makefile.dep |   5 +-
 src/lib_misc.c   | 166 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_errmsg.h  |   7 ++
 3 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.dep b/src/Makefile.dep
index 6813bc8..f367241 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -29,8 +29,9 @@ lib_jit.o: lib_jit.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_vm.h lj_vmevent.h lj_lib.h luajit.h lj_libdef.h
 lib_math.o: lib_math.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h \
  lj_def.h lj_arch.h lj_lib.h lj_vm.h lj_libdef.h
-lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
- lj_str.h lj_tab.h lj_lib.h lj_libdef.h
+lib_misc.o: lib_misc.c lua.h luaconf.h lmisclib.h lauxlib.h lj_obj.h \
+ lj_def.h lj_arch.h lj_str.h lj_tab.h lj_lib.h lj_gc.h lj_err.h \
+ lj_errmsg.h lj_memprof.h lj_wbuf.h lj_libdef.h
 lib_os.o: lib_os.c lua.h luaconf.h lauxlib.h lualib.h lj_obj.h lj_def.h \
  lj_arch.h lj_gc.h lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_lib.h \
  lj_libdef.h
diff --git a/src/lib_misc.c b/src/lib_misc.c
index 6f7b9a9..619cfb7 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -8,13 +8,21 @@
 #define lib_misc_c
 #define LUA_LIB
 
+#include <stdio.h>
+#include <errno.h>
+
 #include "lua.h"
 #include "lmisclib.h"
+#include "lauxlib.h"
 
 #include "lj_obj.h"
 #include "lj_str.h"
 #include "lj_tab.h"
 #include "lj_lib.h"
+#include "lj_gc.h"
+#include "lj_err.h"
+
+#include "lj_memprof.h"
 
 /* ------------------------------------------------------------------------ */
 
@@ -67,8 +75,166 @@ LJLIB_CF(misc_getmetrics)
 
 #include "lj_libdef.h"
 
+/* ----- misc.memprof module ---------------------------------------------- */
+
+#define LJLIB_MODULE_misc_memprof
+
+/*
+** Yep, 8Mb. Tuned in order not to bother the platform with too often flushes.
+*/
+#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
+
+/* Structure given as ctx to memprof writer and on_stop callback. */
+struct memprof_ctx {
+  /* Output file stream for data. */
+  FILE *stream;
+  /* Profiled global_State for lj_mem_free at on_stop callback. */
+  global_State *g;
+  /* Buffer for data. */
+  uint8_t buf[STREAM_BUFFER_SIZE];
+};
+
+/*
+** Default buffer writer function.
+** Just call fwrite to the corresponding FILE.
+*/
+static size_t buffer_writer_default(const void **buf_addr, size_t len,
+				    void *opt)
+{
+  struct memprof_ctx *ctx = opt;
+  FILE *stream = ctx->stream;
+  const void * const buf_start = *buf_addr;
+  const void *data = *buf_addr;
+  size_t write_total = 0;
+
+  lua_assert(len <= STREAM_BUFFER_SIZE);
+
+  for (;;) {
+    const size_t written = fwrite(data, 1, len - write_total, stream);
+
+    if (LJ_UNLIKELY(written == 0)) {
+      /* Re-tries write in case of EINTR. */
+      if (errno != EINTR) {
+	/* Will be freed as whole chunk later. */
+	*buf_addr = NULL;
+	return write_total;
+      }
+
+      errno = 0;
+      continue;
+    }
+
+    write_total += written;
+    lua_assert(write_total <= len);
+
+    if (write_total == len)
+      break;
+
+    data = (uint8_t *)data + (ptrdiff_t)written;
+  }
+
+  *buf_addr = buf_start;
+  return write_total;
+}
+
+/* Default on stop callback. Just close the corresponding stream. */
+static int on_stop_cb_default(void *opt, uint8_t *buf)
+{
+  struct memprof_ctx *ctx = opt;
+  FILE *stream = ctx->stream;
+  UNUSED(buf);
+  lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+  return fclose(stream);
+}
+
+/* local started, err, errno = misc.memprof.start(fname) */
+LJLIB_CF(misc_memprof_start)
+{
+  struct lj_memprof_options opt = {0};
+  const char *fname = strdata(lj_lib_checkstr(L, 1));
+  struct memprof_ctx *ctx;
+  int memprof_status;
+
+  /*
+  ** FIXME: more elegant solution with ctx.
+  ** Throws in case of OOM.
+  */
+  ctx = lj_mem_new(L, sizeof(*ctx));
+  opt.ctx = ctx;
+  opt.buf = ctx->buf;
+  opt.writer = buffer_writer_default;
+  opt.on_stop = on_stop_cb_default;
+  opt.len = STREAM_BUFFER_SIZE;
+
+  ctx->g = G(L);
+  ctx->stream = fopen(fname, "wb");
+
+  if (ctx->stream == NULL) {
+    lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+    return luaL_fileresult(L, 0, fname);
+  }
+
+  memprof_status = lj_memprof_start(L, &opt);
+
+  if (LJ_UNLIKELY(memprof_status != PROFILE_SUCCESS)) {
+    lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+    switch (memprof_status) {
+    case PROFILE_ERRUSE:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_MISUSE));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRRUN:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_ISRUNNING));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, fname);
+    default:
+      lua_assert(0);
+      return 0;
+    }
+  }
+  lua_pushboolean(L, 1);
+
+  return 1;
+}
+
+/* local stopped, err, errno = misc.memprof.stop() */
+LJLIB_CF(misc_memprof_stop)
+{
+  int status = lj_memprof_stop(L);
+  if (status != PROFILE_SUCCESS) {
+    switch (status) {
+    case PROFILE_ERRUSE:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_MISUSE));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRRUN:
+      lua_pushnil(L);
+      lua_pushstring(L, err2msg(LJ_ERR_PROF_NOTRUNNING));
+      lua_pushinteger(L, EINVAL);
+      return 3;
+    case PROFILE_ERRIO:
+      return luaL_fileresult(L, 0, NULL);
+    default:
+      lua_assert(0);
+      return 0;
+    }
+  }
+  lua_pushboolean(L, 1);
+  return 1;
+}
+
+#include "lj_libdef.h"
+
+/* ------------------------------------------------------------------------ */
+
 LUALIB_API int luaopen_misc(struct lua_State *L)
 {
   LJ_LIB_REG(L, LUAM_MISCLIBNAME, misc);
+  LJ_LIB_REG(L, LUAM_MISCLIBNAME ".memprof", misc_memprof);
   return 1;
 }
diff --git a/src/lj_errmsg.h b/src/lj_errmsg.h
index de7b867..bebe804 100644
--- a/src/lj_errmsg.h
+++ b/src/lj_errmsg.h
@@ -185,6 +185,13 @@ ERRDEF(FFI_NYIPACKBIT,	"NYI: packed bit fields")
 ERRDEF(FFI_NYICALL,	"NYI: cannot call this C function (yet)")
 #endif
 
+#if LJ_HASMEMPROF
+/* Profiler errors. */
+ERRDEF(PROF_MISUSE,	"profiler misuse")
+ERRDEF(PROF_ISRUNNING,	"profiler is running already")
+ERRDEF(PROF_NOTRUNNING,	"profiler is not running")
+#endif
+
 #undef ERRDEF
 
 /* Detecting unused error messages:
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (7 preceding siblings ...)
  2020-12-28  2:05 ` [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler Sergey Kaplun
@ 2020-12-28  2:06 ` Sergey Kaplun
  2020-12-28  3:59   ` Igor Munkin
  2020-12-28  4:05 ` [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  2:06 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces memory profiler for Lua machine.

To determine currently allocating coroutine (that may not be equal to
currently executed one) a new field mem_L is added to the
global_State structure. This field is set on each allocation event and
stores the coroutine address that is used for allocation.

First of all profiler dumps the definitions of all loaded Lua functions
(symtab) via the write buffer introduced in one of the previous patches.

Profiler replaces the old allocation function with the instrumented one
after symtab is dumped. This new function reports all allocations,
reallocations or deallocations events via the write buffer during
profiling. Subsequent content depends on the function's type (LFUNC,
FFUNC or CFUNC).

When profiling is over, a special epilogue event header is written and
the old allocation function is restored back.

This change also makes debug_frameline function LuaJIT-wide visible to
be used in the memory profiler.

For more information, see <lj_memprof.h>.

Part of tarantool/tarantool#5442
---

Changes in v3:
  * Fixed invalid pointer usage at on_stop cb.
  * Dropped thread safe logic.
  * Dropped unused functions.
  * Added assertion to memprof_write_lfunc.
  * Codestyle fixes.

 src/Makefile     |   5 +-
 src/Makefile.dep |  30 +++--
 src/lj_arch.h    |  15 +++
 src/lj_debug.c   |   8 +-
 src/lj_debug.h   |   3 +
 src/lj_gc.c      |   7 +-
 src/lj_gc.h      |   1 +
 src/lj_memprof.c | 344 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lj_memprof.h | 159 ++++++++++++++++++++++
 src/lj_obj.h     |   1 +
 src/lj_state.c   |   7 +
 src/ljamalg.c    |   1 +
 12 files changed, 561 insertions(+), 20 deletions(-)
 create mode 100644 src/lj_memprof.c
 create mode 100644 src/lj_memprof.h

diff --git a/src/Makefile b/src/Makefile
index 936dcbb..825b01c 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -113,6 +113,9 @@ XCFLAGS=
 # Enable GC64 mode for x64.
 #XCFLAGS+= -DLUAJIT_ENABLE_GC64
 #
+# Disable the memory profiler.
+#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
+#
 ##############################################################################
 
 ##############################################################################
@@ -488,7 +491,7 @@ LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o lj_wbuf.o \
 	  lj_str.o lj_tab.o lj_func.o lj_udata.o lj_meta.o lj_debug.o \
 	  lj_state.o lj_dispatch.o lj_vmevent.o lj_vmmath.o lj_strscan.o \
 	  lj_strfmt.o lj_strfmt_num.o lj_api.o lj_mapi.o lj_profile.o \
-	  lj_lex.o lj_parse.o lj_bcread.o lj_bcwrite.o lj_load.o \
+	  lj_memprof.o lj_lex.o lj_parse.o lj_bcread.o lj_bcwrite.o lj_load.o \
 	  lj_ir.o lj_opt_mem.o lj_opt_fold.o lj_opt_narrow.o \
 	  lj_opt_dce.o lj_opt_loop.o lj_opt_split.o lj_opt_sink.o \
 	  lj_mcode.o lj_snap.o lj_record.o lj_crecord.o lj_ffrecord.o \
diff --git a/src/Makefile.dep b/src/Makefile.dep
index 59ed450..6813bc8 100644
--- a/src/Makefile.dep
+++ b/src/Makefile.dep
@@ -144,6 +144,8 @@ lj_mapi.o: lj_mapi.c lua.h luaconf.h lmisclib.h lj_obj.h lj_def.h lj_arch.h \
 lj_mcode.o: lj_mcode.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h \
  lj_gc.h lj_err.h lj_errmsg.h lj_jit.h lj_ir.h lj_mcode.h lj_trace.h \
  lj_dispatch.h lj_bc.h lj_traceerr.h lj_vm.h
+lj_memprof.o: lj_memprof.c lj_arch.h lua.h luaconf.h lj_memprof.h \
+ lj_def.h lj_wbuf.h lj_obj.h lj_frame.h lj_bc.h lj_debug.h
 lj_meta.o: lj_meta.c lj_obj.h lua.h luaconf.h lj_def.h lj_arch.h lj_gc.h \
  lj_err.h lj_errmsg.h lj_buf.h lj_str.h lj_tab.h lj_meta.h lj_frame.h \
  lj_bc.h lj_vm.h lj_strscan.h lj_strfmt.h lj_lib.h
@@ -220,20 +222,20 @@ ljamalg.o: ljamalg.c lua.h luaconf.h lauxlib.h lj_gc.c lj_obj.h lj_def.h \
  lj_char.h lj_bc.c lj_bcdef.h lj_obj.c lj_buf.c lj_wbuf.c lj_wbuf.h lj_utils.h \
  lj_str.c lj_tab.c lj_func.c lj_udata.c lj_meta.c lj_strscan.h lj_lib.h \
  lj_debug.c lj_state.c lj_lex.h lj_alloc.h luajit.h lj_dispatch.c \
- lj_ccallback.h lj_profile.h lj_vmevent.c lj_vmevent.h lj_vmmath.c \
- lj_strscan.c lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c lmisclib.h \
- lj_profile.c lj_lex.c lualib.h lj_parse.h lj_parse.c lj_bcread.c lj_bcdump.h \
- lj_bcwrite.c lj_load.c lj_ctype.c lj_cdata.c lj_cconv.h lj_cconv.c lj_ccall.c \
- lj_ccall.h lj_ccallback.c lj_target.h lj_target_*.h lj_mcode.h lj_carith.c \
- lj_carith.h lj_clib.c lj_clib.h lj_cparse.c lj_cparse.h lj_lib.c lj_ir.c \
- lj_ircall.h lj_iropt.h lj_opt_mem.c lj_opt_fold.c lj_folddef.h \
- lj_opt_narrow.c lj_opt_dce.c lj_opt_loop.c lj_snap.h lj_opt_split.c \
- lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c lj_record.h lj_ffrecord.h \
- lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h lj_asm.c lj_asm.h \
- lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c lj_alloc.c \
- lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c lib_string.c \
- lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c lib_bit.c lib_jit.c \
- lib_ffi.c lib_misc.c lib_init.c
+ lj_ccallback.h lj_profile.h lj_memprof.h lj_vmevent.c lj_vmevent.h \
+ lj_vmmath.c lj_strscan.c lj_strfmt.c lj_strfmt_num.c lj_api.c lj_mapi.c \
+ lmisclib.h lj_profile.c lj_memprof.c lj_lex.c lualib.h lj_parse.h lj_parse.c \
+ lj_bcread.c lj_bcdump.h lj_bcwrite.c lj_load.c lj_ctype.c lj_cdata.c \
+ lj_cconv.h lj_cconv.c lj_ccall.c lj_ccall.h lj_ccallback.c lj_target.h \
+ lj_target_*.h lj_mcode.h lj_carith.c lj_carith.h lj_clib.c lj_clib.h \
+ lj_cparse.c lj_cparse.h lj_lib.c lj_ir.c lj_ircall.h lj_iropt.h lj_opt_mem.c \
+ lj_opt_fold.c lj_folddef.h lj_opt_narrow.c lj_opt_dce.c lj_opt_loop.c \
+ lj_snap.h lj_opt_split.c lj_opt_sink.c lj_mcode.c lj_snap.c lj_record.c \
+ lj_record.h lj_ffrecord.h lj_crecord.c lj_crecord.h lj_ffrecord.c lj_recdef.h \
+ lj_asm.c lj_asm.h lj_emit_*.h lj_asm_*.h lj_trace.c lj_gdbjit.h lj_gdbjit.c \
+ lj_alloc.c lj_utils_leb128.c lib_aux.c lib_base.c lj_libdef.h lib_math.c \
+ lib_string.c lib_table.c lib_io.c lib_os.c lib_package.c lib_debug.c \
+ lib_bit.c lib_jit.c lib_ffi.c lib_misc.c lib_init.c
 luajit.o: luajit.c lua.h luaconf.h lauxlib.h lualib.h luajit.h lj_arch.h
 host/buildvm.o: host/buildvm.c host/buildvm.h lj_def.h lua.h luaconf.h \
  lj_arch.h lj_obj.h lj_def.h lj_arch.h lj_gc.h lj_obj.h lj_bc.h lj_ir.h \
diff --git a/src/lj_arch.h b/src/lj_arch.h
index c8d7138..d8676e9 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -213,6 +213,8 @@
 #define LJ_ARCH_VERSION		50
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_ARM64
 
 #define LJ_ARCH_BITS		64
@@ -234,6 +236,8 @@
 
 #define LJ_ARCH_VERSION		80
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_PPC
 
 #ifndef LJ_ARCH_ENDIAN
@@ -299,6 +303,8 @@
 #define LJ_ARCH_XENON		1
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
 
 #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
@@ -358,6 +364,8 @@
 #define LJ_ARCH_VERSION		10
 #endif
 
+#define LJ_ARCH_NOMEMPROF	1
+
 #else
 #error "No target architecture defined"
 #endif
@@ -564,4 +572,11 @@
 #define LJ_52			0
 #endif
 
+/* Disable or enable the memory profiler. */
+#if defined(LUAJIT_DISABLE_MEMPROF) || defined(LJ_ARCH_NOMEMPROF) || LJ_TARGET_WINDOWS || LJ_TARGET_CYGWIN || LJ_TARGET_PS3 || LJ_TARGET_PS4 || LJ_TARGET_XBOX360
+#define LJ_HASMEMPROF		0
+#else
+#define LJ_HASMEMPROF		1
+#endif
+
 #endif
diff --git a/src/lj_debug.c b/src/lj_debug.c
index 73bd196..bb9ab28 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -128,7 +128,7 @@ BCLine LJ_FASTCALL lj_debug_line(GCproto *pt, BCPos pc)
 }
 
 /* Get line number for function/frame. */
-static BCLine debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
+BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe)
 {
   BCPos pc = debug_framepc(L, fn, nextframe);
   if (pc != NO_BCPOS) {
@@ -353,7 +353,7 @@ void lj_debug_addloc(lua_State *L, const char *msg,
   if (frame) {
     GCfunc *fn = frame_func(frame);
     if (isluafunc(fn)) {
-      BCLine line = debug_frameline(L, fn, nextframe);
+      BCLine line = lj_debug_frameline(L, fn, nextframe);
       if (line >= 0) {
 	GCproto *pt = funcproto(fn);
 	char buf[LUA_IDSIZE];
@@ -470,7 +470,7 @@ int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar, int ext)
 	ar->what = "C";
       }
     } else if (*what == 'l') {
-      ar->currentline = frame ? debug_frameline(L, fn, nextframe) : -1;
+      ar->currentline = frame ? lj_debug_frameline(L, fn, nextframe) : -1;
     } else if (*what == 'u') {
       ar->nups = fn->c.nupvalues;
       if (ext) {
@@ -616,7 +616,7 @@ void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt, int depth)
 	    GCproto *pt = funcproto(fn);
 	    if (debug_putchunkname(sb, pt, pathstrip)) {
 	      /* Regular Lua function. */
-	      BCLine line = c == 'l' ? debug_frameline(L, fn, nextframe) :
+	      BCLine line = c == 'l' ? lj_debug_frameline(L, fn, nextframe) :
 				       pt->firstline;
 	      lj_buf_putb(sb, ':');
 	      lj_strfmt_putint(sb, line >= 0 ? line : pt->firstline);
diff --git a/src/lj_debug.h b/src/lj_debug.h
index 5917c00..a157d28 100644
--- a/src/lj_debug.h
+++ b/src/lj_debug.h
@@ -40,6 +40,9 @@ LJ_FUNC void lj_debug_addloc(lua_State *L, const char *msg,
 LJ_FUNC void lj_debug_pushloc(lua_State *L, GCproto *pt, BCPos pc);
 LJ_FUNC int lj_debug_getinfo(lua_State *L, const char *what, lj_Debug *ar,
 			     int ext);
+#if LJ_HASMEMPROF
+LJ_FUNC BCLine lj_debug_frameline(lua_State *L, GCfunc *fn, cTValue *nextframe);
+#endif
 #if LJ_HASPROFILE
 LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
 				int depth);
diff --git a/src/lj_gc.c b/src/lj_gc.c
index 44c8aa1..7f0ec89 100644
--- a/src/lj_gc.c
+++ b/src/lj_gc.c
@@ -852,6 +852,8 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
 {
   global_State *g = G(L);
   lua_assert((osz == 0) == (p == NULL));
+
+  setgcref(g->mem_L, obj2gco(L));
   p = g->allocf(g->allocd, p, osz, nsz);
   if (p == NULL && nsz > 0)
     lj_err_mem(L);
@@ -867,7 +869,10 @@ void *lj_mem_realloc(lua_State *L, void *p, GCSize osz, GCSize nsz)
 void * LJ_FASTCALL lj_mem_newgco(lua_State *L, GCSize size)
 {
   global_State *g = G(L);
-  GCobj *o = (GCobj *)g->allocf(g->allocd, NULL, 0, size);
+  GCobj *o;
+
+  setgcref(g->mem_L, obj2gco(L));
+  o = (GCobj *)g->allocf(g->allocd, NULL, 0, size);
   if (o == NULL)
     lj_err_mem(L);
   lua_assert(checkptrGC(o));
diff --git a/src/lj_gc.h b/src/lj_gc.h
index 2051220..40b02cb 100644
--- a/src/lj_gc.h
+++ b/src/lj_gc.h
@@ -113,6 +113,7 @@ static LJ_AINLINE void lj_mem_free(global_State *g, void *p, size_t osize)
 {
   g->gc.total -= (GCSize)osize;
   g->gc.freed += osize;
+  /* All deallocations are reported as internal. Not necessary to set mem_L. */
   g->allocf(g->allocd, p, osize, 0);
 }
 
diff --git a/src/lj_memprof.c b/src/lj_memprof.c
new file mode 100644
index 0000000..4994de5
--- /dev/null
+++ b/src/lj_memprof.c
@@ -0,0 +1,344 @@
+/*
+** Implementation of memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+#define lj_memprof_c
+#define LUA_CORE
+
+#include <errno.h>
+
+#include "lj_arch.h"
+#include "lj_memprof.h"
+
+#if LJ_HASMEMPROF
+
+#include "lj_obj.h"
+#include "lj_frame.h"
+#include "lj_debug.h"
+
+/* --------------------------------- Symtab --------------------------------- */
+
+static const unsigned char ljs_header[] = {'l', 'j', 's', LJS_CURRENT_VERSION,
+					   0x0, 0x0, 0x0};
+
+static void dump_symtab(struct lj_wbuf *out, const struct global_State *g)
+{
+  const GCRef *iter = &g->gc.root;
+  const GCobj *o;
+  const size_t ljs_header_len = sizeof(ljs_header) / sizeof(ljs_header[0]);
+
+  /* Write prologue. */
+  lj_wbuf_addn(out, ljs_header, ljs_header_len);
+
+  while ((o = gcref(*iter)) != NULL) {
+    switch (o->gch.gct) {
+    case (~LJ_TPROTO): {
+      const GCproto *pt = gco2pt(o);
+      lj_wbuf_addbyte(out, SYMTAB_LFUNC);
+      lj_wbuf_addu64(out, (uintptr_t)pt);
+      lj_wbuf_addstring(out, proto_chunknamestr(pt));
+      lj_wbuf_addu64(out, (uint64_t)pt->firstline);
+      break;
+    }
+    default:
+      break;
+    }
+    iter = &o->gch.nextgc;
+  }
+
+  lj_wbuf_addbyte(out, SYMTAB_FINAL);
+}
+
+/* ---------------------------- Memory profiler ----------------------------- */
+
+enum memprof_state {
+  /* Memory profiler is not running. */
+  MPS_IDLE,
+  /* Memory profiler is running. */
+  MPS_PROFILE,
+  /*
+  ** Stopped in case of stopped stream.
+  ** Saved errno is returned to user at lj_memprof_stop.
+  */
+  MPS_HALT
+};
+
+struct alloc {
+  lua_Alloc allocf; /* Allocating function. */
+  void *state; /* Opaque allocator's state. */
+};
+
+struct memprof {
+  global_State *g; /* Profiled VM. */
+  enum memprof_state state; /* Internal state. */
+  struct lj_wbuf out; /* Output accumulator. */
+  struct alloc orig_alloc; /* Original allocator. */
+  struct lj_memprof_options opt; /* Profiling options. */
+  int saved_errno; /* Saved errno when profiler deinstrumented. */
+};
+
+static struct memprof memprof = {0};
+
+const unsigned char ljm_header[] = {'l', 'j', 'm', LJM_CURRENT_FORMAT_VERSION,
+				    0x0, 0x0, 0x0};
+
+static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t aevent,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *nextframe)
+{
+  const BCLine line = lj_debug_frameline(L, fn, nextframe);
+  /*
+  ** Line is always >= 0 if we are inside a Lua function.
+  ** Equals to zero when LuaJIT is built with the
+  ** -DLUAJIT_DISABLE_DEBUGINFO flag.
+  */
+  lua_assert(line >= 0);
+  lj_wbuf_addbyte(out, aevent | ASOURCE_LFUNC);
+  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
+  lj_wbuf_addu64(out, (uint64_t)line);
+}
+
+static void memprof_write_cfunc(struct lj_wbuf *out, uint8_t aevent,
+				const GCfunc *fn)
+{
+  lj_wbuf_addbyte(out, aevent | ASOURCE_CFUNC);
+  lj_wbuf_addu64(out, (uintptr_t)fn->c.f);
+}
+
+static void memprof_write_ffunc(struct lj_wbuf *out, uint8_t aevent,
+				GCfunc *fn, struct lua_State *L,
+				cTValue *frame)
+{
+  cTValue *pframe = frame_prev(frame);
+  GCfunc *pfn = frame_func(pframe);
+
+  /*
+  ** XXX: If a fast function is called by a Lua function, report the
+  ** Lua function for more meaningful output. Otherwise report the fast
+  ** function as a C function.
+  */
+  if (pfn != NULL && isluafunc(pfn))
+    memprof_write_lfunc(out, aevent, pfn, L, frame);
+  else
+    memprof_write_cfunc(out, aevent, fn);
+}
+
+static void memprof_write_func(struct memprof *mp, uint8_t aevent)
+{
+  struct lj_wbuf *out = &mp->out;
+  lua_State *L = gco2th(gcref(mp->g->mem_L));
+  cTValue *frame = L->base - 1;
+  GCfunc *fn = frame_func(frame);
+
+  if (isluafunc(fn))
+    memprof_write_lfunc(out, aevent, fn, L, NULL);
+  else if (isffunc(fn))
+    memprof_write_ffunc(out, aevent, fn, L, frame);
+  else if (iscfunc(fn))
+    memprof_write_cfunc(out, aevent, fn);
+  else
+    lua_assert(0);
+}
+
+static void memprof_write_hvmstate(struct memprof *mp, uint8_t aevent)
+{
+  lj_wbuf_addbyte(&mp->out, aevent | ASOURCE_INT);
+}
+
+typedef void (*memprof_writer)(struct memprof *mp, uint8_t aevent);
+
+static const memprof_writer memprof_writers[] = {
+  memprof_write_hvmstate, /* LJ_VMST_INTERP */
+  memprof_write_func, /* LJ_VMST_LFUNC */
+  memprof_write_func, /* LJ_VMST_FFUNC */
+  memprof_write_func, /* LJ_VMST_CFUNC */
+  memprof_write_hvmstate, /* LJ_VMST_GC */
+  memprof_write_hvmstate, /* LJ_VMST_EXIT */
+  memprof_write_hvmstate, /* LJ_VMST_RECORD */
+  memprof_write_hvmstate, /* LJ_VMST_OPT */
+  memprof_write_hvmstate, /* LJ_VMST_ASM */
+  /*
+  ** XXX: In ideal world, we should report allocations from traces as well.
+  ** But since traces must follow the semantics of the original code,
+  ** behaviour of Lua and JITted code must match 1:1 in terms of allocations,
+  ** which makes using memprof with enabled JIT virtually redundant.
+  ** Hence use the stub below.
+  */
+  memprof_write_hvmstate /* LJ_VMST_TRACE */
+};
+
+static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
+{
+  const global_State *g = mp->g;
+  const uint32_t _vmstate = (uint32_t)~g->vmstate;
+  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
+
+  memprof_writers[vmstate](mp, aevent);
+}
+
+static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
+{
+  struct memprof *mp = &memprof;
+  const struct alloc *oalloc = &mp->orig_alloc;
+  struct lj_wbuf *out = &mp->out;
+  void *nptr;
+
+  lua_assert(MPS_PROFILE == mp->state);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(ud == oalloc->state);
+
+  nptr = oalloc->allocf(ud, ptr, osize, nsize);
+
+  if (nsize == 0) {
+    memprof_write_caller(mp, AEVENT_FREE);
+    lj_wbuf_addu64(out, (uintptr_t)ptr);
+    lj_wbuf_addu64(out, (uint64_t)osize);
+  } else if (ptr == NULL) {
+    memprof_write_caller(mp, AEVENT_ALLOC);
+    lj_wbuf_addu64(out, (uintptr_t)nptr);
+    lj_wbuf_addu64(out, (uint64_t)nsize);
+  } else {
+    memprof_write_caller(mp, AEVENT_REALLOC);
+    lj_wbuf_addu64(out, (uintptr_t)ptr);
+    lj_wbuf_addu64(out, (uint64_t)osize);
+    lj_wbuf_addu64(out, (uintptr_t)nptr);
+    lj_wbuf_addu64(out, (uint64_t)nsize);
+  }
+
+  /* Deinstrument memprof if required. */
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP)))
+    lj_memprof_stop(mainthread(mp->g));
+
+  return nptr;
+}
+
+int lj_memprof_start(struct lua_State *L, const struct lj_memprof_options *opt)
+{
+  struct memprof *mp = &memprof;
+  struct lj_memprof_options *mp_opt = &mp->opt;
+  struct alloc *oalloc = &mp->orig_alloc;
+  const size_t ljm_header_len = sizeof(ljm_header) / sizeof(ljm_header[0]);
+
+  lua_assert(opt->writer != NULL);
+  lua_assert(opt->on_stop != NULL);
+  lua_assert(opt->buf != NULL);
+  lua_assert(opt->len != 0);
+
+  if (mp->state != MPS_IDLE)
+    return PROFILE_ERRRUN;
+
+  /* Discard possible old errno. */
+  mp->saved_errno = 0;
+
+  /* Init options. */
+  memcpy(mp_opt, opt, sizeof(*opt));
+
+  /* Init general fields. */
+  mp->g = G(L);
+  mp->state = MPS_PROFILE;
+
+  /* Init output. */
+  lj_wbuf_init(&mp->out, mp_opt->writer, mp_opt->ctx, mp_opt->buf, mp_opt->len);
+  dump_symtab(&mp->out, mp->g);
+
+  /* Write prologue. */
+  lj_wbuf_addn(&mp->out, ljm_header, ljm_header_len);
+
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERRIO|STREAM_STOP))) {
+    /* on_stop call may change errno value. */
+    int saved_errno = lj_wbuf_errno(&mp->out);
+    /* Ignore possible errors. mp->out.buf may be NULL here. */
+    mp_opt->on_stop(mp_opt->ctx, mp->out.buf);
+    lj_wbuf_terminate(&mp->out);
+    mp->state = MPS_IDLE;
+    errno = saved_errno;
+    return PROFILE_ERRIO;
+  }
+
+  /* Override allocating function. */
+  oalloc->allocf = lua_getallocf(L, &oalloc->state);
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->allocf != memprof_allocf);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(L, memprof_allocf, oalloc->state);
+
+  return PROFILE_SUCCESS;
+}
+
+int lj_memprof_stop(struct lua_State *L)
+{
+  struct memprof *mp = &memprof;
+  struct lj_memprof_options *mp_opt = &mp->opt;
+  struct alloc *oalloc = &mp->orig_alloc;
+  struct lj_wbuf *out = &mp->out;
+  int cb_status;
+
+  if (mp->state == MPS_HALT) {
+    errno = mp->saved_errno;
+    mp->state = MPS_IDLE;
+    /* wbuf was terminated before. */
+    return PROFILE_ERRIO;
+  }
+
+  if (mp->state != MPS_PROFILE)
+    return PROFILE_ERRRUN;
+
+  if (mp->g != G(L))
+    return PROFILE_ERRUSE;
+
+  mp->state = MPS_IDLE;
+
+  lua_assert(mp->g != NULL);
+
+  lua_assert(memprof_allocf == lua_getallocf(L, NULL));
+  lua_assert(oalloc->allocf != NULL);
+  lua_assert(oalloc->state != NULL);
+  lua_setallocf(L, oalloc->allocf, oalloc->state);
+
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
+    /* on_stop call may change errno value. */
+    int saved_errno = lj_wbuf_errno(out);
+    /* Ignore possible errors. out->buf may be NULL here. */
+    mp_opt->on_stop(mp_opt->ctx, out->buf);
+    errno = saved_errno;
+    goto errio;
+  }
+
+  lj_wbuf_addbyte(out, LJM_EPILOGUE_HEADER);
+
+  lj_wbuf_flush(out);
+
+  cb_status = mp_opt->on_stop(mp_opt->ctx, out->buf);
+  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERRIO|STREAM_STOP) ||
+		  cb_status != 0)) {
+    errno = lj_wbuf_errno(out);
+    goto errio;
+  }
+
+  lj_wbuf_terminate(out);
+  return PROFILE_SUCCESS;
+errio:
+  lj_wbuf_terminate(out);
+  return PROFILE_ERRIO;
+}
+
+#else /* LJ_HASMEMPROF */
+
+int lj_memprof_start(struct lua_State *L, const struct lj_memprof_options *opt)
+{
+  UNUSED(L);
+  UNUSED(opt);
+  return PROFILE_ERRUSE;
+}
+
+int lj_memprof_stop(struct lua_State *L)
+{
+  UNUSED(L);
+  return PROFILE_ERRUSE;
+}
+
+#endif /* LJ_HASMEMPROF */
diff --git a/src/lj_memprof.h b/src/lj_memprof.h
new file mode 100644
index 0000000..3417475
--- /dev/null
+++ b/src/lj_memprof.h
@@ -0,0 +1,159 @@
+/*
+** Memory profiler.
+**
+** Major portions taken verbatim or adapted from the LuaVela.
+** Copyright (C) 2015-2019 IPONWEB Ltd.
+*/
+
+/*
+** XXX: Memory profiler is not thread safe. Please, don't try to
+** use it inside several VM, you can profile only one at a time.
+*/
+
+#ifndef _LJ_MEMPROF_H
+#define _LJ_MEMPROF_H
+
+#include "lj_def.h"
+#include "lj_wbuf.h"
+
+#define LJS_CURRENT_VERSION 0x1
+
+/*
+** symtab format:
+**
+** symtab         := prologue sym*
+** prologue       := 'l' 'j' 's' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** sym            := sym-lua | sym-final
+** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-header     := <BYTE>
+** sym-addr       := <ULEB128>
+** sym-chunk      := string
+** sym-line       := <ULEB128>
+** sym-final      := sym-header
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain numeric version number
+**
+** sym-header: [FUUUUUTT]
+**  * TT    : 2 bits for representing symbol type
+**  * UUUUU : 5 unused bits
+**  * F     : 1 bit marking the end of the symtab (final symbol)
+*/
+
+#define SYMTAB_LFUNC ((uint8_t)0)
+#define SYMTAB_FINAL ((uint8_t)0x80)
+
+#define LJM_CURRENT_FORMAT_VERSION 0x01
+
+/*
+** Event stream format:
+**
+** stream         := symtab memprof
+** symtab         := see symtab description
+** memprof        := prologue event* epilogue
+** prologue       := 'l' 'j' 'm' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** event          := event-alloc | event-realloc | event-free
+** event-alloc    := event-header loc? naddr nsize
+** event-realloc  := event-header loc? oaddr osize naddr nsize
+** event-free     := event-header loc? oaddr osize
+** event-header   := <BYTE>
+** loc            := loc-lua | loc-c
+** loc-lua        := sym-addr line-no
+** loc-c          := sym-addr
+** sym-addr       := <ULEB128>
+** line-no        := <ULEB128>
+** oaddr          := <ULEB128>
+** naddr          := <ULEB128>
+** osize          := <ULEB128>
+** nsize          := <ULEB128>
+** epilogue       := event-header
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain integer version number
+**
+** event-header: [FUUUSSEE]
+**  * EE   : 2 bits for representing allocation event type (AEVENT_*)
+**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
+**  * UUU  : 3 unused bits
+**  * F    : 0 for regular events, 1 for epilogue's *F*inal header
+**           (if F is set to 1, all other bits are currently ignored)
+*/
+
+/* Allocation events. */
+#define AEVENT_ALLOC   ((uint8_t)1)
+#define AEVENT_FREE    ((uint8_t)2)
+#define AEVENT_REALLOC ((uint8_t)(AEVENT_ALLOC | AEVENT_FREE))
+
+/* Allocation sources. */
+#define ASOURCE_INT   ((uint8_t)(1 << 2))
+#define ASOURCE_LFUNC ((uint8_t)(2 << 2))
+#define ASOURCE_CFUNC ((uint8_t)(3 << 2))
+
+#define LJM_EPILOGUE_HEADER 0x80
+
+/* Profiler public API. */
+#define PROFILE_SUCCESS 0
+#define PROFILE_ERRUSE  1
+#define PROFILE_ERRRUN  2
+#define PROFILE_ERRMEM  3
+#define PROFILE_ERRIO   4
+
+/* Profiler options. */
+struct lj_memprof_options {
+  /* Context for the profile writer and final callback. */
+  void *ctx;
+  /* Custom buffer to write data. */
+  uint8_t *buf;
+  /* The buffer's size. */
+  size_t len;
+  /*
+  ** Writer function for profile events.
+  ** Should return amount of written bytes on success or zero in case of error.
+  ** Setting *data to NULL means end of profiling.
+  ** For details see <lj_wbuf.h>.
+  */
+  lj_wbuf_writer writer;
+  /*
+  ** Callback on profiler stopping. Required for correctly cleaning
+  ** at VM finalization when profiler is still running.
+  ** Returns zero on success.
+  */
+  int (*on_stop)(void *ctx, uint8_t *buf);
+};
+
+/* Avoid to provide additional interfaces described in other headers. */
+struct lua_State;
+
+/*
+** Starts profiling. Returns PROFILE_SUCCESS on success and one of
+** PROFILE_ERR* codes otherwise. Destructor is called in case of
+** PROFILE_ERRIO.
+*/
+int lj_memprof_start(struct lua_State *L, const struct lj_memprof_options *opt);
+
+/*
+** Stops profiling. Returns PROFILE_SUCCESS on success and one of
+** PROFILE_ERR* codes otherwise. If writer() function returns zero
+** on call at buffer flush, profiled stream stops, or on_stop() callback
+** returns non-zero value, returns PROFILE_ERRIO.
+*/
+int lj_memprof_stop(struct lua_State *L);
+
+#endif
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 1a0b1f6..4a4d77f 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -656,6 +656,7 @@ typedef struct global_State {
   BCIns bc_cfunc_int;	/* Bytecode for internal C function calls. */
   BCIns bc_cfunc_ext;	/* Bytecode for external C function calls. */
   GCRef cur_L;		/* Currently executing lua_State. */
+  GCRef mem_L;		/* Currently allocating lua_State. */
   MRef jit_base;	/* Current JIT code L->base or NULL. */
   MRef ctype_state;	/* Pointer to C type state. */
   GCRef gcroot[GCROOT_MAX];  /* GC roots. */
diff --git a/src/lj_state.c b/src/lj_state.c
index 1d9c628..1ed79a5 100644
--- a/src/lj_state.c
+++ b/src/lj_state.c
@@ -29,6 +29,10 @@
 #include "lj_alloc.h"
 #include "luajit.h"
 
+#if LJ_HASMEMPROF
+#include "lj_memprof.h"
+#endif
+
 /* -- Stack handling ------------------------------------------------------ */
 
 /* Stack sizes. */
@@ -243,6 +247,9 @@ LUA_API void lua_close(lua_State *L)
   global_State *g = G(L);
   int i;
   L = mainthread(g);  /* Only the main thread can be closed. */
+#if LJ_HASMEMPROF
+  lj_memprof_stop(L);
+#endif
 #if LJ_HASPROFILE
   luaJIT_profile_stop(L);
 #endif
diff --git a/src/ljamalg.c b/src/ljamalg.c
index 705e296..3f7e686 100644
--- a/src/ljamalg.c
+++ b/src/ljamalg.c
@@ -51,6 +51,7 @@
 #include "lj_api.c"
 #include "lj_mapi.c"
 #include "lj_profile.c"
+#include "lj_memprof.c"
 #include "lj_lex.c"
 #include "lj_parse.c"
 #include "lj_bcread.c"
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler
  2020-12-28  2:05 ` [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler Sergey Kaplun
@ 2020-12-28  2:49   ` Igor Munkin
  2020-12-28  5:19     ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  2:49 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! LGTM with the several comments below.

On 28.12.20, Sergey Kaplun wrote:
> This patch introduces Lua API for LuaJIT memory profiler implemented in
> the scope of the previous patch.
> 
> Profiler returns true value if started/stopped successfully,
> returns nil on failure (plus an error message as a second result and a
> system-dependent error code as a third result).
> If LuaJIT is build without memory profiler both return `false`.

Typo: s/`false`/false/ considering true and nil above.

> 
> <lj_errmsg.h> have adjusted with three new errors

Typo: s/have adjusted/has been adjusted/.

> PROF_MISUSE/PROF_ISRUNNING/PROF_NOTRUNNING returned in case when
> profiler has used incorrectly/started/stopped already correspondingly.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v3:
>   * Fixed lj_mem_new misuse.
>   * Moved buffer inside ctx.
>   * Codestyle fixes.
> 
>  src/Makefile.dep |   5 +-
>  src/lib_misc.c   | 166 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_errmsg.h  |   7 ++
>  3 files changed, 176 insertions(+), 2 deletions(-)
> 

<snipped>

> diff --git a/src/lib_misc.c b/src/lib_misc.c
> index 6f7b9a9..619cfb7 100644
> --- a/src/lib_misc.c
> +++ b/src/lib_misc.c

<snipped>

> @@ -67,8 +75,166 @@ LJLIB_CF(misc_getmetrics)

<snipped>

> +/* local started, err, errno = misc.memprof.start(fname) */
> +LJLIB_CF(misc_memprof_start)
> +{

<snipped>

> +  if (LJ_UNLIKELY(memprof_status != PROFILE_SUCCESS)) {
> +    lj_mem_free(ctx->g, ctx, sizeof(*ctx));

This deallocation causes double free if PROFILE_ERRIO occurs, since the
ctx is released within on_stop callback.

ctx->stream should be closed if PROFILE_ERR(USE|RUN) occurs.

> +    switch (memprof_status) {

<snipped>

> +    }
> +  }
> +  lua_pushboolean(L, 1);
> +

Typo: Excess newline.

> +  return 1;
> +}

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-27 23:48   ` Igor Munkin
@ 2020-12-28  3:54     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  3:54 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi, Igor!

Thanks for the review!

On 28.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
> > separated from LJ_VMST_INERP. New VM states allow to determine the
> > context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
> > renamed to LJ_VMST_CFUNC for naming consistence with new VM states.
> > 
> > Also, this patch adjusts stack layout for x86 and x64 arches to save VM
> > state for its consistency while stack unwinding when error is raised.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >  - Moved `.if not WIN` macro check inside (save|restore)_vmstate_through
> >  - Fixed naming: SAVE_UNUSED\d -> UNUSED\d
> > 
> >  src/lj_frame.h     |  18 +++----
> >  src/lj_obj.h       |   4 +-
> >  src/lj_profile.c   |   5 +-
> >  src/luajit-gdb.py  |  14 ++---
> >  src/vm_arm.dasc    |   6 +--
> >  src/vm_arm64.dasc  |   6 +--
> >  src/vm_mips.dasc   |   6 +--
> >  src/vm_mips64.dasc |   6 +--
> >  src/vm_ppc.dasc    |   6 +--
> >  src/vm_x64.dasc    |  93 ++++++++++++++++++++++----------
> >  src/vm_x86.dasc    | 131 +++++++++++++++++++++++++++++----------------
> >  11 files changed, 188 insertions(+), 107 deletions(-)
> > 
> > diff --git a/src/lj_frame.h b/src/lj_frame.h
> > index 19c49a4..2e693f9 100644
> > --- a/src/lj_frame.h
> > +++ b/src/lj_frame.h
> 
> <snipped>
> 
> > @@ -152,11 +152,11 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
> >  #define CFRAME_OFS_NRES		(22*4)
> >  #define CFRAME_OFS_MULTRES	(21*4)
> >  #endif
> > -#define CFRAME_SIZE		(10*8)
> > +#define CFRAME_SIZE		(12*8)
> 
> This change looks Windows-related, so is excess.

Reverted.

> 
> >  #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)
> 
> <snipped>
> 
> > diff --git a/src/lj_profile.c b/src/lj_profile.c
> > index 116998e..637e03c 100644
> > --- a/src/lj_profile.c
> > +++ b/src/lj_profile.c
> > @@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
> >      int st = g->vmstate;
> >      ps->vmstate = st >= 0 ? 'N' :
> >  		  st == ~LJ_VMST_INTERP ? 'I' :
> > -		  st == ~LJ_VMST_C ? 'C' :
> > +		  st == ~LJ_VMST_CFUNC ? 'C' :
> > +		  /* Stubs for profiler hooks. */
> > +		  st == ~LJ_VMST_FFUNC ? 'I' :
> > +		  st == ~LJ_VMST_LFUNC ? 'I' :
> 
> Minor: Move this comparisons above to save the ordering from <lj_obj.h>.

Fixed.

> 
> >  		  st == ~LJ_VMST_GC ? 'G' : 'J';
> >      g->hookmask = (mask | HOOK_PROFILE);
> >      lj_dispatch_update(g);
> 
> <snipped>
> 
> > diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> > index 80753e0..83cc3e1 100644
> > --- a/src/vm_x64.dasc
> > +++ b/src/vm_x64.dasc
> 
> <snipped>
> 
> > @@ -161,26 +161,29 @@
> 
> <snipped>
> 
> > +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> > +|.define UNUSED2,	qword [rsp+qword*5]
> > +|.define UNUSED1,	dword [rsp+dword*8]
> 
> UNUSED1 should alias <dword [rsp + dword*9]>

Fixed this typo.

> 
> Minor: What about UNUSEDd and UNUSEDq instead? Feel free to ignore.

Looks inconsistent with respect to original code style.

> 
> > +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> 
> <snipped>
> 
> > @@ -342,6 +345,22 @@
> >  |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> >  |.endmacro
> >  |
> > +|// Save vmstate through register.
> > +|.macro save_vmstate_through, reg
> > +|.if not WIN
> > +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> > +|  mov SAVE_VMSTATE, reg
> 
> This is too complicated. VM is implemented as a file with 6kloc of
> pseudo-assembly in it and I don't want to make its support harder. We've
> recently found a bug with miscomparison that we coudn't catch for a
> year. This is not a fun. This is not a kinda sport. This is damn 6k kloc
> of assembly we ought to support. So I propose to make this macro work
> with no args at all. Consider the following:
> |.macro save_vmstate
> |.if not WIN
> |  mov TMPRd, dword [DISPATCH+DISPATCH_GL(vmstate)]	// TMPRd is r10d
> |  mov SAVE_VMSTATE, TMPRd
> |.endif // WIN
> |
> |.macro restore_vmstate
> |.if not WIN
> |  mov TMPRd, SAVE_VMSTATE	// XCHGd is r11d
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], TMPRd
> |.endif // WIN
> 
> I agree that TMPRd is used in other places within vm_x64.dasc but this
> is temporary register and I failed to find any conflict. I left the
> comments below to ease the check.

Agree, update in the next version.

> 
> > +|.endif // WIN
> > +|.endmacro
> 
> <snipped>
> 
> > @@ -448,6 +467,8 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  xor eax, eax			// Ok return status for vm_pcall.
> >    |
> >    |->vm_leave_unw:
> > +  |  // DISPATCH required to set properly.
> > +  |  restore_vmstate_through RAd
> 
> Re TMPR: The TMP register is not used, so there is no need to restore
> vmstate via RAd. Everything is OK.

Thanks!

> 
> >    |  restoreregs
> >    |  ret
> >    |
> 
> <snipped>
> 
> > @@ -521,7 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  mov [BASE-16], RA			// Prepend false to error message.
> >    |  mov [BASE-8], RB
> >    |  mov RA, -16			// Results start at BASE+RA = BASE-16.
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
> 
> Typo: Please, adjust the comments considering the ones nearby.

Fixed.

> 
> >    |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
> >    |
> >    |//-----------------------------------------------------------------------
> > @@ -575,6 +598,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  lea KBASE, [esp+CFRAME_RESUME]
> >    |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
> >    |  add DISPATCH, GG_G2DISP
> > +  |  save_vmstate_through TMPRd
> 
> Re TMPR: It is manually used here. Everything is OK.

Yep.

> 
> >    |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
> >    |  mov SAVE_CFRAME, RD
> >    |  mov SAVE_NRES, RDd
> > @@ -585,7 +609,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |
> >    |  // Resume after yield (like a return).
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.

Fixed.

> 
> >    |  mov byte L:RB->status, RDL
> >    |  mov BASE, L:RB->base
> >    |  mov RD, L:RB->top
> > @@ -622,11 +646,12 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  mov SAVE_CFRAME, KBASE
> >    |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
> >    |  add DISPATCH, GG_G2DISP
> > +  |  save_vmstate_through RDd
> 
> Re TMPR: The temporary register is not used prior to this line.
> Everything is OK.

Agree.

> 
> >    |  mov L:RB->cframe, rsp
> >    |
> >    |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
> 
> This branch is also taken for lj_vm_pcall/lj_vm_call/lj_vm_cpcall.

Adjusted comment.

> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >    |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
> >    |  add PC, RA
> >    |  sub PC, BASE			// PC = frame delta + frame type
> > @@ -658,6 +683,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  mov SAVE_ERRF, 0			// No error function.
> >    |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
> >    |   add DISPATCH, GG_G2DISP
> > +  |  save_vmstate_through KBASEd
> 
> Re TMPR: The temporary register is not used prior to this line.
> Everything is OK.

Agree.

> 
> >    |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
> >    |
> >    |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.
> 
> <snipped>
> 
> > @@ -1578,7 +1606,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  mov L:PC, TMP1
> >    |  mov BASE, L:RB->base
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >    |
> >    |  cmp eax, LUA_YIELD
> >    |  ja >8
> 
> <snipped>
> 
> > @@ -3974,6 +4003,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >  
> >    case BC_CALL: case BC_CALLM:
> >      |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> > +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
> 
> It looks like VM state is INTERP until the call enters *FUNC* bytecode.

Adjusted.

> 
> >      if (op == BC_CALLM) {
> >        |  add NARGS:RDd, MULTRES
> >      }
> > @@ -3995,6 +4025,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  mov LFUNC:RB, [RA-16]
> >      |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
> >      |->BC_CALLT_Z:
> > +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
> 
> Ditto.

Adjusted.

> 
> >      |  mov PC, [BASE-8]
> >      |  test PCd, FRAME_TYPE
> >      |  jnz >7
> > @@ -4219,6 +4250,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >        |  shl RAd, 3
> >      }
> >      |1:
> > +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
> 
> Typo: Please, adjust the comments considering the ones nearby.

Fixed.

> 
> >      |  mov PC, [BASE-8]
> >      |  mov MULTRES, RDd			// Save nresults+1.
> >      |  test PCd, FRAME_TYPE		// Check frame type marker.
> 
> <snipped>
> 
> > @@ -4551,6 +4584,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
> >      |  mov KBASE, [PC-4+PC2PROTO(k)]
> >      |  mov L:RB, SAVE_L
> > +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
> 
> This is OK, but this path is also taken for the JFUNC. Do we need to set
> another VM state for this case (in scope of JLOOP)?

AFAIK, it doesn't purpose. LFUNC is consistent since JIT follows
Lua semantics.

> 
> >      |  lea RA, [BASE+RA*8]		// Top of frame.
> >      |  cmp RA, L:RB->maxstack
> >      |  ja ->vm_growstack_f
> > @@ -4588,6 +4622,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  mov [RD-8], RB			// Store delta + FRAME_VARG.
> >      |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
> >      |  mov L:RB, SAVE_L
> > +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
> 
> Ditto.

Ditto to ditto :).

> 
> >      |  lea RA, [RD+RA*8]
> >      |  cmp RA, L:RB->maxstack
> >      |  ja ->vm_growstack_v		// Need to grow stack.
> 
> <snipped>
> 
> > @@ -4653,7 +4688,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  // nresults returned in eax (RD).
> >      |  mov BASE, L:RB->base
> >      |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -    |  set_vmstate INTERP
> > +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >      |  lea RA, [BASE+RD*8]
> >      |  neg RA
> >      |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> > diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> > index d76fbe3..b9dffa9 100644
> > --- a/src/vm_x86.dasc
> > +++ b/src/vm_x86.dasc
> 
> <snipped>
> 
> > @@ -290,33 +295,35 @@
> 
> <snipped>
> 
> > +|.define SAVE_CFRAME,	qword [rsp+qword*6]
> > +|.define UNUSED1,	qword [rsp+qword*5]
> 
> There is lost unused dword right here.

Thanks, fixed.

> 
> > +|.define SAVE_VMSTATE,	dword [rsp+dword*8]
> 
> <snipped>
> 
> > @@ -433,6 +440,22 @@
> >  |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
> >  |.endmacro
> >  |
> > +|// Save vmstate through register.
> > +|.macro save_vmstate_through, reg
> > +|.if not WIN
> > +|  mov reg, dword [DISPATCH+DISPATCH_GL(vmstate)]
> > +|  mov SAVE_VMSTATE, reg
> 
> See the rationale in the note to vm_x64.dasc. Unfortunately, for
> vm_x86.dasc the macro has two branches, *but* I think it is much better
> than managing free registers in *whole* VM. Consider the following:
> |.macro save_vmstate
> |.if not WIN
> |.if not X64
> |  mov UNUSED1, ecx	// Please rename this field (e.g SAVE_XCHG).
> |  mov ecx, dword [DISPATCH+DISPATCH_GL(vmstate)]
> |  mov SAVE_VMSTATE, ecx
> |  mov ecx, UNUSED1
> |.else // X64
> |  mov XCHGd, dword [DISPATCH+DISPATCH_GL(vmstate)]	// XCHGd is r11d
> |  mov SAVE_VMSTATE, XCHGd
> |.endif // X64
> |.endif // WIN
> |
> |.macro restore_vmstate
> |.if not WIN
> |.if not X64
> |  mov UNUSED1, ecx	// Please rename this field (e.g SPILLECX).
> |  mov ecx, SAVE_VMSTATE
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ecx
> |  mov ecx, UNUSED1
> |.else // X64
> |  mov XCHGd, SAVE_VMSTATE	// XCHGd is r11d
> |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], XCHGd
> |.endif // X64
> |.endif // WIN

Applied.

> 
> > +|.endif // WIN
> > +|.endmacro
> 
> <snipped>
> 
> >    |->vm_unwind_rethrow:
> > @@ -647,7 +674,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |  mov PC, [BASE-4]			// Fetch PC of previous frame.
> >    |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or return to C
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >    |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
> >    |
> >    |.if WIN and not X64
> 
> <snipped>
> 
> > @@ -730,7 +758,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |
> >    |  // Resume after yield (like a return).
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >    |  mov byte L:RB->status, RDL
> >    |  mov BASE, L:RB->base
> >    |  mov RD, L:RB->top
> 
> <snipped>
> 
> > @@ -782,7 +811,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |
> >    |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // vm_resume: INTERP until executing BC_IFUNC*
> 
> Typo: Please, adjust the comments considering the ones nearby.
> 
> >    |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
> >    |  add PC, RA
> >    |  sub PC, BASE			// PC = frame delta + frame type
> 

Done.

> <snipped>
> 
> > @@ -1924,7 +1956,7 @@ static void build_subroutines(BuildCtx *ctx)
> >    |.endif
> >    |  mov BASE, L:RB->base
> >    |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -  |  set_vmstate INTERP
> > +  |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.
> 

Done.

> >    |
> >    |  cmp eax, LUA_YIELD
> >    |  ja >8
> 
> <snipped>
> 
> > @@ -4683,6 +4716,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >  
> >    case BC_CALL: case BC_CALLM:
> >      |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
> > +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
> 
> It looks like VM state is INTERP until the call enters *FUNC* bytecode.

Adjusted comment.

> 
> >      if (op == BC_CALLM) {
> >        |  add NARGS:RD, MULTRES
> >      }
> > @@ -4706,6 +4740,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  cmp dword [RA-4], LJ_TFUNC
> >      |  jne ->vmeta_call
> >      |->BC_CALLT_Z:
> > +    |  set_vmstate INTERP		// INTERP until a new BASE is setup
> 
> Ditto.

Adjusted comment.

> 
> >      |  mov PC, [BASE-4]
> >      |  test PC, FRAME_TYPE
> >      |  jnz >7
> > @@ -4989,6 +5024,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >        |  shl RA, 3
> >      }
> >      |1:
> > +    |  set_vmstate INTERP // INTERP until the old BASE & KBASE is restored
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >      |  mov PC, [BASE-4]
> >      |  mov MULTRES, RD			// Save nresults+1.
> >      |  test PC, FRAME_TYPE		// Check frame type marker.
> > @@ -5043,6 +5079,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  mov LFUNC:KBASE, [BASE-8]
> >      |  mov KBASE, LFUNC:KBASE->pc
> >      |  mov KBASE, [KBASE+PC2PROTO(k)]
> > +    |  set_vmstate LFUNC // LFUNC after the old BASE & KBASE is restored
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >      |  ins_next
> >      |
> >      |6:  // Fill up results with nil.
> > @@ -5330,6 +5367,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
> >      |  mov KBASE, [PC-4+PC2PROTO(k)]
> >      |  mov L:RB, SAVE_L
> > +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
> 
> This is OK, but this path is also taken for the JFUNC. Do we need to set
> another VM state for this case (in scope of JLOOP)?

See the answer above.

> 
> >      |  lea RA, [BASE+RA*8]		// Top of frame.
> >      |  cmp RA, L:RB->maxstack
> >      |  ja ->vm_growstack_f
> > @@ -5367,6 +5405,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  mov [RD-4], RB			// Store delta + FRAME_VARG.
> >      |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
> >      |  mov L:RB, SAVE_L
> > +    |  set_vmstate LFUNC		// LFUNC after KBASE restoration
> 
> Ditto.

See the answer above.

> 
> >      |  lea RA, [RD+RA*8]
> >      |  cmp RA, L:RB->maxstack
> >      |  ja ->vm_growstack_v		// Need to grow stack.
> 
> <snipped>
> 
> > @@ -5441,7 +5480,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> >      |  // nresults returned in eax (RD).
> >      |  mov BASE, L:RB->base
> >      |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
> > -    |  set_vmstate INTERP
> > +    |  set_vmstate INTERP // INTERP until jump to BC_RET* or vm_return
> 
> Typo: Please, adjust the comments considering the ones nearby.

Done.

> 
> >      |  lea RA, [BASE+RD*8]
> >      |  neg RA
> >      |  add RA, L:RB->top		// RA = (L->top-(L->base+nresults))*8
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

I'll send v3 version soon.

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce memory profiler
  2020-12-28  2:06 ` [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce " Sergey Kaplun
@ 2020-12-28  3:59   ` Igor Munkin
  0 siblings, 0 replies; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  3:59 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the fixes! LGTM.

On 28.12.20, Sergey Kaplun wrote:
> This patch introduces memory profiler for Lua machine.
> 
> To determine currently allocating coroutine (that may not be equal to
> currently executed one) a new field mem_L is added to the
> global_State structure. This field is set on each allocation event and
> stores the coroutine address that is used for allocation.
> 
> First of all profiler dumps the definitions of all loaded Lua functions
> (symtab) via the write buffer introduced in one of the previous patches.
> 
> Profiler replaces the old allocation function with the instrumented one
> after symtab is dumped. This new function reports all allocations,
> reallocations or deallocations events via the write buffer during
> profiling. Subsequent content depends on the function's type (LFUNC,
> FFUNC or CFUNC).
> 
> When profiling is over, a special epilogue event header is written and
> the old allocation function is restored back.
> 
> This change also makes debug_frameline function LuaJIT-wide visible to
> be used in the memory profiler.
> 
> For more information, see <lj_memprof.h>.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v3:
>   * Fixed invalid pointer usage at on_stop cb.
>   * Dropped thread safe logic.
>   * Dropped unused functions.
>   * Added assertion to memprof_write_lfunc.
>   * Codestyle fixes.
> 
>  src/Makefile     |   5 +-
>  src/Makefile.dep |  30 +++--
>  src/lj_arch.h    |  15 +++
>  src/lj_debug.c   |   8 +-
>  src/lj_debug.h   |   3 +
>  src/lj_gc.c      |   7 +-
>  src/lj_gc.h      |   1 +
>  src/lj_memprof.c | 344 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_memprof.h | 159 ++++++++++++++++++++++
>  src/lj_obj.h     |   1 +
>  src/lj_state.c   |   7 +
>  src/ljamalg.c    |   1 +
>  12 files changed, 561 insertions(+), 20 deletions(-)
>  create mode 100644 src/lj_memprof.c
>  create mode 100644 src/lj_memprof.h
> 

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (8 preceding siblings ...)
  2020-12-28  2:06 ` [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce " Sergey Kaplun
@ 2020-12-28  4:05 ` Sergey Kaplun
  2020-12-28  5:14   ` Igor Munkin
  2020-12-28  6:01 ` [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Alexander V. Tikhonov
  2020-12-28  8:15 ` Igor Munkin
  11 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  4:05 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
separated from LJ_VMST_INERP. New VM states allow to determine the
context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
renamed to LJ_VMST_CFUNC for naming consistence with new VM states.

Also, this patch adjusts stack layout for x86 and x64 arches to save VM
state for its consistency while stack unwinding when error is raised.

To group all traces into the one vmstate, a special
macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.

Part of tarantool/tarantool#5442
---

Changes in v3:
 * Adjusted vmstate saving.
 * Fix Win X64.
 * Fix typos in frame layout.
 * Codestyle fixes.

 src/lj_frame.h     |  16 ++---
 src/lj_obj.h       |  11 +++-
 src/lj_profile.c   |   5 +-
 src/luajit-gdb.py  |  14 +++--
 src/vm_arm.dasc    |   6 +-
 src/vm_arm64.dasc  |   6 +-
 src/vm_mips.dasc   |   6 +-
 src/vm_mips64.dasc |   6 +-
 src/vm_ppc.dasc    |   6 +-
 src/vm_x64.dasc    |  93 +++++++++++++++++++++--------
 src/vm_x86.dasc    | 146 ++++++++++++++++++++++++++++++++-------------
 11 files changed, 218 insertions(+), 97 deletions(-)

diff --git a/src/lj_frame.h b/src/lj_frame.h
index 19c49a4..9fd63fa 100644
--- a/src/lj_frame.h
+++ b/src/lj_frame.h
@@ -127,13 +127,13 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_ERRF		(15*4)
-#define CFRAME_OFS_NRES		(14*4)
-#define CFRAME_OFS_PREV		(13*4)
-#define CFRAME_OFS_L		(12*4)
+#define CFRAME_OFS_ERRF		(19*4)
+#define CFRAME_OFS_NRES		(18*4)
+#define CFRAME_OFS_PREV		(17*4)
+#define CFRAME_OFS_L		(16*4)
 #define CFRAME_OFS_PC		(6*4)
 #define CFRAME_OFS_MULTRES	(5*4)
-#define CFRAME_SIZE		(12*4)
+#define CFRAME_SIZE		(16*4)
 #define CFRAME_SHIFT_MULTRES	0
 #endif
 #elif LJ_TARGET_X64
@@ -156,7 +156,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 9*16 + 4*8)
 #define CFRAME_SHIFT_MULTRES	0
 #else
-#define CFRAME_OFS_PREV		(4*8)
+#define CFRAME_OFS_PREV		(6*8)
 #if LJ_GC64
 #define CFRAME_OFS_PC		(3*8)
 #define CFRAME_OFS_L		(2*8)
@@ -171,9 +171,9 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK };  /* Special continuations. */
 #define CFRAME_OFS_MULTRES	(1*4)
 #endif
 #if LJ_NO_UNWIND
-#define CFRAME_SIZE		(12*8)
+#define CFRAME_SIZE		(14*8)
 #else
-#define CFRAME_SIZE		(10*8)
+#define CFRAME_SIZE		(12*8)
 #endif
 #define CFRAME_SIZE_JIT		(CFRAME_SIZE + 16)
 #define CFRAME_SHIFT_MULTRES	0
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 927b347..1a0b1f6 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -512,7 +512,9 @@ typedef struct GCtab {
 /* VM states. */
 enum {
   LJ_VMST_INTERP,	/* Interpreter. */
-  LJ_VMST_C,		/* C function. */
+  LJ_VMST_LFUNC,	/* Lua function. */
+  LJ_VMST_FFUNC,	/* Fast function. */
+  LJ_VMST_CFUNC,	/* C function. */
   LJ_VMST_GC,		/* Garbage collector. */
   LJ_VMST_EXIT,		/* Trace exit handler. */
   LJ_VMST_RECORD,	/* Trace recorder. */
@@ -521,6 +523,13 @@ enum {
   LJ_VMST__MAX
 };
 
+/*
+** In fact, when VM executes a trace, vmstate is set to the trace number,
+** but we set the boundary to group all traces in a single pseudo-vmstate.
+*/
+
+#define LJ_VMST_TRACE		(LJ_VMST__MAX)
+
 #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
 
 /* Metamethods. ORDER MM */
diff --git a/src/lj_profile.c b/src/lj_profile.c
index 116998e..7b09a63 100644
--- a/src/lj_profile.c
+++ b/src/lj_profile.c
@@ -157,7 +157,10 @@ static void profile_trigger(ProfileState *ps)
     int st = g->vmstate;
     ps->vmstate = st >= 0 ? 'N' :
 		  st == ~LJ_VMST_INTERP ? 'I' :
-		  st == ~LJ_VMST_C ? 'C' :
+		  /* Stubs for profiler hooks. */
+		  st == ~LJ_VMST_LFUNC ? 'I' :
+		  st == ~LJ_VMST_FFUNC ? 'I' :
+		  st == ~LJ_VMST_CFUNC ? 'C' :
 		  st == ~LJ_VMST_GC ? 'G' : 'J';
     g->hookmask = (mask | HOOK_PROFILE);
     lj_dispatch_update(g);
diff --git a/src/luajit-gdb.py b/src/luajit-gdb.py
index 652c560..f1fd623 100644
--- a/src/luajit-gdb.py
+++ b/src/luajit-gdb.py
@@ -206,12 +206,14 @@ def J(g):
 def vm_state(g):
     return {
         i2notu32(0): 'INTERP',
-        i2notu32(1): 'C',
-        i2notu32(2): 'GC',
-        i2notu32(3): 'EXIT',
-        i2notu32(4): 'RECORD',
-        i2notu32(5): 'OPT',
-        i2notu32(6): 'ASM',
+        i2notu32(1): 'LFUNC',
+        i2notu32(2): 'FFUNC',
+        i2notu32(3): 'CFUNC',
+        i2notu32(4): 'GC',
+        i2notu32(5): 'EXIT',
+        i2notu32(6): 'RECORD',
+        i2notu32(7): 'OPT',
+        i2notu32(8): 'ASM',
     }.get(int(tou32(g['vmstate'])), 'TRACE')
 
 def gc_state(g):
diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
index d4cdaf5..ae2efdf 100644
--- a/src/vm_arm.dasc
+++ b/src/vm_arm.dasc
@@ -287,7 +287,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldr KBASE, SAVE_NRES
-  |    mv_vmstate CARG4, C
+  |    mv_vmstate CARG4, CFUNC
   |   sub BASE, BASE, #8
   |  subs CARG3, RC, #8
   |   lsl KBASE, KBASE, #3		// KBASE = (nresults_wanted+1)*8
@@ -348,7 +348,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate CARG4, C
+  |   mv_vmstate CARG4, CFUNC
   |  ldr GL:CARG3, L->glref
   |   str CARG4, GL:CARG3->vmstate
   |   str L, GL:CARG3->cur_L
@@ -4487,7 +4487,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate CARG3, C
+    |    mv_vmstate CARG3, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate CARG3
diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
index 3eaf376..f783428 100644
--- a/src/vm_arm64.dasc
+++ b/src/vm_arm64.dasc
@@ -332,7 +332,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  str RB, L->base
   |   ldrsw CARG2, SAVE_NRES		// CARG2 = nresults+1.
-  |    mv_vmstate TMP0w, C
+  |    mv_vmstate TMP0w, CFUNC
   |   sub BASE, BASE, #16
   |  subs TMP2, RC, #8
   |    st_vmstate TMP0w
@@ -391,7 +391,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ldr L, SAVE_L
-  |   mv_vmstate TMP0w, C
+  |   mv_vmstate TMP0w, CFUNC
   |  ldr GL, L->glref
   |   st_vmstate TMP0w
   |  b ->vm_leave_unw
@@ -3816,7 +3816,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     if (op == BC_FUNCCW) {
       |  ldr CARG2, CFUNC:CARG3->f
     }
-    |    mv_vmstate TMP0w, C
+    |    mv_vmstate TMP0w, CFUNC
     |  mov CARG1, L
     |   bhi ->vm_growstack_c		// Need to grow stack.
     |    st_vmstate TMP0w
diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc
index 1afd611..ec57d78 100644
--- a/src/vm_mips.dasc
+++ b/src/vm_mips.dasc
@@ -403,7 +403,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sw TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   addiu BASE, BASE, -8
   |    st_vmstate
@@ -473,7 +473,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lw L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lw GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -5085,7 +5085,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sw BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sw RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  lw CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index c06270a..9a749f9 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
@@ -449,7 +449,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  addiu TMP1, RD, -8
   |   sd TMP2, L->base
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   lw TMP2, SAVE_NRES
   |   daddiu BASE, BASE, -16
   |    st_vmstate
@@ -517,7 +517,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  move CRET1, CARG2
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  ld L, SAVE_L
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  ld GL:TMP1, L->glref
   |  b ->vm_leave_unw
   |.  sw TMP0, GL:TMP1->vmstate
@@ -4952,7 +4952,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  sd BASE, L->base
     |  sltu AT, TMP2, TMP1
     |   sd RC, L->top
-    |    li_vmstate C
+    |    li_vmstate CFUNC
     if (op == BC_FUNCCW) {
       |  ld CARG2, CFUNC:RB->f
     }
diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
index b4260eb..62e9b68 100644
--- a/src/vm_ppc.dasc
+++ b/src/vm_ppc.dasc
@@ -520,7 +520,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  // TMP0 = PC & FRAME_TYPE
   |  cmpwi TMP0, FRAME_C
   |   rlwinm TMP2, PC, 0, 0, 28
-  |    li_vmstate C
+  |    li_vmstate CFUNC
   |   sub TMP2, BASE, TMP2		// TMP2 = previous base.
   |  bney ->vm_returnp
   |
@@ -596,7 +596,7 @@ static void build_subroutines(BuildCtx *ctx)
   |->vm_unwind_c_eh:			// Landing pad for external unwinder.
   |  lwz L, SAVE_L
   |  .toc ld TOCREG, SAVE_TOC
-  |   li TMP0, ~LJ_VMST_C
+  |   li TMP0, ~LJ_VMST_CFUNC
   |  lwz GL:TMP1, L->glref
   |   stw TMP0, GL:TMP1->vmstate
   |  b ->vm_leave_unw
@@ -5060,7 +5060,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   stp BASE, L->base
     |   cmplw TMP1, TMP2
     |    stp RC, L->top
-    |     li_vmstate C
+    |     li_vmstate CFUNC
     |.if TOC
     |  mtctr TMP3
     |.else
diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
index 80753e0..974047d 100644
--- a/src/vm_x64.dasc
+++ b/src/vm_x64.dasc
@@ -140,7 +140,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -161,26 +161,29 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
-|.define SAVE_PC,	aword [rsp+aword*3]
-|.define SAVE_L,	aword [rsp+aword*2]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define UNUSED2,	qword [rsp+qword*5]
+|.define UNUSED1,	dword [rsp+dword*9]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
+|.define SAVE_PC,	qword [rsp+qword*3]
+|.define SAVE_L,	qword [rsp+qword*2]
 |.define SAVE_ERRF,	dword [rsp+dword*3]
 |.define SAVE_NRES,	dword [rsp+dword*2]
-|.define TMP1,		aword [rsp]		//<-- rsp while in interpreter.
+|.define TMP1,		qword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |.define TMP1d,		dword [rsp]
@@ -342,6 +345,22 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|// Uses TMPRd (r10d).
+|.macro save_vmstate
+|.if not WIN
+|  mov TMPRd, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, TMPRd
+|.endif // WIN
+|.endmacro
+|
+|// Uses r10d.
+|.macro restore_vmstate
+|.if not WIN
+|  mov TMPRd, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], TMPRd
+|.endif // WIN
+|.endmacro
+|
 |.macro fpop1; fstp st1; .endmacro
 |
 |// Synthesize SSE FP constants.
@@ -416,7 +435,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -448,6 +467,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |  // DISPATCH required to set properly.
+  |  restore_vmstate			// Caveat: uses TMPRd (r10d).
   |  restoreregs
   |  ret
   |
@@ -493,7 +514,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -521,6 +544,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov [BASE-16], RA			// Prepend false to error message.
   |  mov [BASE-8], RB
   |  mov RA, -16			// Results start at BASE+RA = BASE-16.
+  |  // INTERP until jump to BC_RET* or return to C.
   |  set_vmstate INTERP
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
@@ -575,6 +599,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  lea KBASE, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: uses TMPRd (r10d).
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RD
   |  mov SAVE_NRES, RDd
@@ -585,6 +610,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+  |  // INTERP until jump to BC_RET* or vm_return.
   |  set_vmstate INTERP
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
@@ -622,11 +648,12 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASE
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: uses TMPRd (r10d).
   |  mov L:RB->cframe, rsp
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP			// INTERP until executing BC_IFUNC*.
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -658,6 +685,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASEd		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: uses TMPRd (r10d).
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |  mov KBASE, L:RB->cframe		// Add our C frame to cframe chain.
@@ -697,6 +725,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  cleartp LFUNC:KBASE
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration.
   |  // BASE = base, RC = result, RB = meta base
   |  jmp RA				// Jump to continuation.
   |
@@ -1137,15 +1166,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RDd, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1578,6 +1608,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:PC, TMP1
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+  |  // INTERP until jump to BC_RET* or vm_return.
   |  set_vmstate INTERP
   |
   |  cmp eax, LUA_YIELD
@@ -1717,6 +1748,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RAd, PC_RA
   |  neg RA
   |  lea BASE, [BASE+RA*8-16]		// base = base - (RA+2)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration.
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2481,7 +2513,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov qword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration.
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RCd, [PC]
   |  movzx RAd, RCH
@@ -2697,8 +2729,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov CARG1, CTSTATE
   |  call extern lj_ccallback_enter	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration.
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-16]
@@ -3974,6 +4006,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  // INTERP until enters *FUNC* bytecode and a new BASE is setup.
+    |  set_vmstate INTERP
     if (op == BC_CALLM) {
       |  add NARGS:RDd, MULTRES
     }
@@ -3995,6 +4029,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:RB, [RA-16]
     |  checktp_nc LFUNC:RB, LJ_TFUNC, ->vmeta_call
     |->BC_CALLT_Z:
+    |  // INTERP until enters *FUNC* bytecode and a new BASE is setup.
+    |  set_vmstate INTERP
     |  mov PC, [BASE-8]
     |  test PCd, FRAME_TYPE
     |  jnz >7
@@ -4219,6 +4255,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RAd, 3
     }
     |1:
+    |  // INTERP until the old BASE & KBASE is restored.
+    |  set_vmstate INTERP
     |  mov PC, [BASE-8]
     |  mov MULTRES, RDd			// Save nresults+1.
     |  test PCd, FRAME_TYPE		// Check frame type marker.
@@ -4260,6 +4298,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cleartp LFUNC:KBASE
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  // LFUNC after the old BASE & KBASE is restored.
+    |  set_vmstate LFUNC
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -4551,6 +4591,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration.
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -4588,6 +4629,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-8], RB			// Store delta + FRAME_VARG.
     |  mov [RD-16], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration.
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -4643,7 +4685,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  mov CARG1, L:RB		// Caveat: CARG1 may be RA.
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function.
     if (op == BC_FUNCC) {
       |  call KBASE			// (lua_State *L)
     } else {
@@ -4653,6 +4695,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+    |  // INTERP until jump to BC_RET* or vm_return.
     |  set_vmstate INTERP
     |  lea RA, [BASE+RD*8]
     |  neg RA
diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
index d76fbe3..ab8e6f2 100644
--- a/src/vm_x86.dasc
+++ b/src/vm_x86.dasc
@@ -99,6 +99,8 @@
 |.define CARG6d,	r9d
 |.define FCARG1,	CARG1d		// Simulate x86 fastcall.
 |.define FCARG2,	CARG2d
+|
+|.define XCHGd,		r11d		// TMP on x64, used for exchange.
 |.endif
 |
 |// Type definitions. Some of these are only used for documentation.
@@ -140,7 +142,7 @@
 |
 |.else
 |
-|.define CFRAME_SPACE,	aword*7			// Delta for esp (see <--).
+|.define CFRAME_SPACE,	dword*11			// Delta for esp (see <--).
 |.macro saveregs_
 |  push edi; push esi; push ebx
 |  sub esp, CFRAME_SPACE
@@ -183,25 +185,30 @@
 |.define ARG1,		aword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.else
-|.define SAVE_ERRF,	aword [esp+aword*15]	// vm_pcall/vm_cpcall only.
-|.define SAVE_NRES,	aword [esp+aword*14]
-|.define SAVE_CFRAME,	aword [esp+aword*13]
-|.define SAVE_L,	aword [esp+aword*12]
+|.define SAVE_ERRF,	dword [esp+dword*19]	// vm_pcall/vm_cpcall only.
+|.define SAVE_NRES,	dword [esp+dword*18]
+|.define SAVE_CFRAME,	dword [esp+dword*17]
+|.define SAVE_L,	dword [esp+dword*16]
 |//----- 16 byte aligned, ^^^ arguments from C caller
-|.define SAVE_RET,	aword [esp+aword*11]	//<-- esp entering interpreter.
-|.define SAVE_R4,	aword [esp+aword*10]
-|.define SAVE_R3,	aword [esp+aword*9]
-|.define SAVE_R2,	aword [esp+aword*8]
+|.define SAVE_RET,	dword [esp+dword*15]	//<-- esp entering interpreter.
+|.define SAVE_R4,	dword [esp+dword*14]
+|.define SAVE_R3,	dword [esp+dword*13]
+|.define SAVE_R2,	dword [esp+dword*12]
 |//----- 16 byte aligned
-|.define SAVE_R1,	aword [esp+aword*7]	//<-- esp after register saves.
-|.define SAVE_PC,	aword [esp+aword*6]
-|.define TMP2,		aword [esp+aword*5]
-|.define TMP1,		aword [esp+aword*4]
+|.define UNUSED2,	dword [esp+dword*11]
+|.define UNUSED1,	dword [esp+dword*10]
+|.define SPILLECX,	dword [esp+dword*9]
+|.define SAVE_VMSTATE,	dword [esp+dword*8]
 |//----- 16 byte aligned
-|.define ARG4,		aword [esp+aword*3]
-|.define ARG3,		aword [esp+aword*2]
-|.define ARG2,		aword [esp+aword*1]
-|.define ARG1,		aword [esp]		//<-- esp while in interpreter.
+|.define SAVE_R1,	dword [esp+dword*7]	//<-- esp after register saves.
+|.define SAVE_PC,	dword [esp+dword*6]
+|.define TMP2,		dword [esp+dword*5]
+|.define TMP1,		dword [esp+dword*4]
+|//----- 16 byte aligned
+|.define ARG4,		dword [esp+dword*3]
+|.define ARG3,		dword [esp+dword*2]
+|.define ARG2,		dword [esp+dword*1]
+|.define ARG1,		dword [esp]		//<-- esp while in interpreter.
 |//----- 16 byte aligned, ^^^ arguments for C callee
 |.endif
 |
@@ -269,7 +276,7 @@
 |//-----------------------------------------------------------------------
 |.else			// x64/POSIX stack layout
 |
-|.define CFRAME_SPACE,	aword*5			// Delta for rsp (see <--).
+|.define CFRAME_SPACE,	qword*7			// Delta for rsp (see <--).
 |.macro saveregs_
 |  push rbx; push r15; push r14
 |.if NO_UNWIND
@@ -290,33 +297,36 @@
 |
 |//----- 16 byte aligned,
 |.if NO_UNWIND
-|.define SAVE_RET,	aword [rsp+aword*11]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*10]
-|.define SAVE_R3,	aword [rsp+aword*9]
-|.define SAVE_R2,	aword [rsp+aword*8]
-|.define SAVE_R1,	aword [rsp+aword*7]
-|.define SAVE_RU2,	aword [rsp+aword*6]
-|.define SAVE_RU1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*13]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*12]
+|.define SAVE_R3,	qword [rsp+qword*11]
+|.define SAVE_R2,	qword [rsp+qword*10]
+|.define SAVE_R1,	qword [rsp+qword*9]
+|.define SAVE_RU2,	qword [rsp+qword*8]
+|.define SAVE_RU1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.else
-|.define SAVE_RET,	aword [rsp+aword*9]	//<-- rsp entering interpreter.
-|.define SAVE_R4,	aword [rsp+aword*8]
-|.define SAVE_R3,	aword [rsp+aword*7]
-|.define SAVE_R2,	aword [rsp+aword*6]
-|.define SAVE_R1,	aword [rsp+aword*5]	//<-- rsp after register saves.
+|.define SAVE_RET,	qword [rsp+qword*11]	//<-- rsp entering interpreter.
+|.define SAVE_R4,	qword [rsp+qword*10]
+|.define SAVE_R3,	qword [rsp+qword*9]
+|.define SAVE_R2,	qword [rsp+qword*8]
+|.define SAVE_R1,	qword [rsp+qword*7]	//<-- rsp after register saves.
 |.endif
-|.define SAVE_CFRAME,	aword [rsp+aword*4]
+|.define SAVE_CFRAME,	qword [rsp+qword*6]
+|.define UNUSED2,	qword [rsp+qword*5]
+|.define UNUSED1,	dword [rsp+dword*9]
+|.define SAVE_VMSTATE,	dword [rsp+dword*8]
 |.define SAVE_PC,	dword [rsp+dword*7]
 |.define SAVE_L,	dword [rsp+dword*6]
 |.define SAVE_ERRF,	dword [rsp+dword*5]
 |.define SAVE_NRES,	dword [rsp+dword*4]
-|.define TMPa,		aword [rsp+aword*1]
+|.define TMPa,		qword [rsp+qword*1]
 |.define TMP2,		dword [rsp+dword*1]
 |.define TMP1,		dword [rsp]		//<-- rsp while in interpreter.
 |//----- 16 byte aligned
 |
 |// TMPQ overlaps TMP1/TMP2. MULTRES overlaps TMP2 (and TMPQ).
 |.define TMPQ,		qword [rsp]
-|.define TMP3,		dword [rsp+aword*1]
+|.define TMP3,		dword [rsp+qword*1]
 |.define MULTRES,	TMP2
 |
 |.endif
@@ -433,6 +443,36 @@
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ~LJ_VMST_..st
 |.endmacro
 |
+|// Uses spilled ecx on x86 or XCHGd (r11d) on x64.
+|.macro save_vmstate
+|.if not WIN
+|.if not X64
+|  mov SPILLECX, ecx
+|  mov ecx, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, ecx
+|  mov ecx, SPILLECX
+|.else // X64
+|  mov XCHGd, dword [DISPATCH+DISPATCH_GL(vmstate)]
+|  mov SAVE_VMSTATE, XCHGd
+|.endif // X64
+|.endif // WIN
+|.endmacro
+|
+|// Uses spilled ecx on x86 or XCHGd (r11d) on x64.
+|.macro restore_vmstate
+|.if not WIN
+|.if not X64
+|  mov SPILLECX, ecx
+|  mov ecx, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ecx
+|  mov ecx, SPILLECX
+|.else // X64
+|  mov XCHGd, SAVE_VMSTATE
+|  mov dword [DISPATCH+DISPATCH_GL(vmstate)], XCHGd
+|.endif // X64
+|.endif // WIN
+|.endmacro
+|
 |// x87 compares.
 |.macro fcomparepp			// Compare and pop st0 >< st1.
 |  fucomip st1
@@ -520,7 +560,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  jnz ->vm_returnp
   |
   |  // Return to C.
-  |  set_vmstate C
+  |  set_vmstate CFUNC
   |  and PC, -8
   |  sub PC, BASE
   |  neg PC				// Previous base = BASE - delta.
@@ -559,6 +599,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  xor eax, eax			// Ok return status for vm_pcall.
   |
   |->vm_leave_unw:
+  |  // DISPATCH required to set properly.
+  |  restore_vmstate			// Caveat: on x64 uses XCHGd (r11d).
   |  restoreregs
   |  ret
   |
@@ -613,7 +655,9 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov L:DISPATCH, SAVE_L
   |  mov GL:RB, L:DISPATCH->glref
   |  mov dword GL:RB->cur_L, L:DISPATCH
-  |  mov dword GL:RB->vmstate, ~LJ_VMST_C
+  |  mov dword GL:RB->vmstate, ~LJ_VMST_CFUNC
+  |  mov DISPATCH, L:DISPATCH->glref	// Setup pointer to dispatch table.
+  |  add DISPATCH, GG_G2DISP
   |  jmp ->vm_leave_unw
   |
   |->vm_unwind_rethrow:
@@ -647,6 +691,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov PC, [BASE-4]			// Fetch PC of previous frame.
   |  mov dword [BASE-4], LJ_TFALSE	// Prepend false to error message.
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+  |  // INTERP until jump to BC_RET* or return to C.
   |  set_vmstate INTERP
   |  jmp ->vm_returnc			// Increments RD/MULTRES and returns.
   |
@@ -718,6 +763,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  lea KBASEa, [esp+CFRAME_RESUME]
   |  mov DISPATCH, L:RB->glref		// Setup pointer to dispatch table.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: on x64 uses XCHGd (r11d).
   |  mov SAVE_PC, RD			// Any value outside of bytecode is ok.
   |  mov SAVE_CFRAME, RDa
   |.if X64
@@ -730,6 +776,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |  // Resume after yield (like a return).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+  |  // INTERP until jump to BC_RET* or vm_return.
   |  set_vmstate INTERP
   |  mov byte L:RB->status, RDL
   |  mov BASE, L:RB->base
@@ -774,6 +821,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_CFRAME, KBASEa
   |  mov SAVE_PC, L:RB			// Any value outside of bytecode is ok.
   |  add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: on x64 uses XCHGd (r11d).
   |.if X64
   |  mov L:RB->cframe, rsp
   |.else
@@ -782,7 +830,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |2:  // Entry point for vm_resume/vm_cpcall (RA = base, RB = L, PC = ftype).
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
-  |  set_vmstate INTERP
+  |  set_vmstate INTERP // INTERP until executing BC_IFUNC*.
   |  mov BASE, L:RB->base		// BASE = old base (used in vmeta_call).
   |  add PC, RA
   |  sub PC, BASE			// PC = frame delta + frame type
@@ -823,6 +871,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov SAVE_ERRF, 0			// No error function.
   |  mov SAVE_NRES, KBASE		// Neg. delta means cframe w/o frame.
   |   add DISPATCH, GG_G2DISP
+  |  save_vmstate			// Caveat: on x64 uses XCHGd (r11d).
   |  // Handler may change cframe_nres(L->cframe) or cframe_errfunc(L->cframe).
   |
   |.if X64
@@ -885,6 +934,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, LFUNC:KBASE->pc
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  // BASE = base, RC = result, RB = meta base
+  |  set_vmstate LFUNC			// LFUNC after KBASE restoration.
   |  jmp RAa				// Jump to continuation.
   |
   |.if FFI
@@ -1409,15 +1459,16 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
+  |  set_vmstate FFUNC
   |.endmacro
   |
   |.macro .ffunc_1, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 1+1;  jb ->fff_fallback
   |.endmacro
   |
   |.macro .ffunc_2, name
-  |->ff_ .. name:
+  |  .ffunc name
   |  cmp NARGS:RD, 2+1;  jb ->fff_fallback
   |.endmacro
   |
@@ -1924,6 +1975,7 @@ static void build_subroutines(BuildCtx *ctx)
   |.endif
   |  mov BASE, L:RB->base
   |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+  |  // INTERP until jump to BC_RET* or vm_return.
   |  set_vmstate INTERP
   |
   |  cmp eax, LUA_YIELD
@@ -2089,6 +2141,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  movzx RA, PC_RA
   |  not RAa				// Note: ~RA = -(RA+1)
   |  lea BASE, [BASE+RA*8]		// base = base - (RA+1)*8
+  |  set_vmstate LFUNC			// LFUNC state after BASE restoration.
   |  ins_next
   |
   |6:  // Fill up results with nil.
@@ -2933,7 +2986,7 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov KBASE, [KBASE+PC2PROTO(k)]
   |  mov L:RB->base, BASE
   |  mov dword [DISPATCH+DISPATCH_GL(jit_base)], 0
-  |  set_vmstate INTERP
+  |  set_vmstate LFUNC			// LFUNC after BASE & KBASE restoration.
   |  // Modified copy of ins_next which handles function header dispatch, too.
   |  mov RC, [PC]
   |  movzx RA, RCH
@@ -3203,8 +3256,8 @@ static void build_subroutines(BuildCtx *ctx)
   |  mov FCARG1, CTSTATE
   |  call extern lj_ccallback_enter@8	// (CTState *cts, void *cf)
   |  // lua_State * returned in eax (RD).
-  |  set_vmstate INTERP
   |  mov BASE, L:RD->base
+  |  set_vmstate LFUNC			// LFUNC after BASE restoration.
   |  mov RD, L:RD->top
   |  sub RD, BASE
   |  mov LFUNC:RB, [BASE-8]
@@ -4683,6 +4736,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
 
   case BC_CALL: case BC_CALLM:
     |  ins_A_C	// RA = base, (RB = nresults+1,) RC = nargs+1 | extra_nargs
+    |  // INTERP until enters *FUNC* bytecode and a new BASE is setup.
+    |  set_vmstate INTERP
     if (op == BC_CALLM) {
       |  add NARGS:RD, MULTRES
     }
@@ -4706,6 +4761,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  cmp dword [RA-4], LJ_TFUNC
     |  jne ->vmeta_call
     |->BC_CALLT_Z:
+    |  // INTERP until enters *FUNC* bytecode and a new BASE is setup.
+    |  set_vmstate INTERP
     |  mov PC, [BASE-4]
     |  test PC, FRAME_TYPE
     |  jnz >7
@@ -4989,6 +5046,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |  shl RA, 3
     }
     |1:
+    |  // INTERP until the old BASE & KBASE is restored.
+    |  set_vmstate INTERP
     |  mov PC, [BASE-4]
     |  mov MULTRES, RD			// Save nresults+1.
     |  test PC, FRAME_TYPE		// Check frame type marker.
@@ -5043,6 +5102,8 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov LFUNC:KBASE, [BASE-8]
     |  mov KBASE, LFUNC:KBASE->pc
     |  mov KBASE, [KBASE+PC2PROTO(k)]
+    |  // LFUNC after the old BASE & KBASE is restored.
+    |  set_vmstate LFUNC
     |  ins_next
     |
     |6:  // Fill up results with nil.
@@ -5330,6 +5391,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  ins_AD  // BASE = new base, RA = framesize, RD = nargs+1
     |  mov KBASE, [PC-4+PC2PROTO(k)]
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration.
     |  lea RA, [BASE+RA*8]		// Top of frame.
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_f
@@ -5367,6 +5429,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  mov [RD-4], RB			// Store delta + FRAME_VARG.
     |  mov [RD-8], LFUNC:KBASE		// Store copy of LFUNC.
     |  mov L:RB, SAVE_L
+    |  set_vmstate LFUNC		// LFUNC after KBASE restoration.
     |  lea RA, [RD+RA*8]
     |  cmp RA, L:RB->maxstack
     |  ja ->vm_growstack_v		// Need to grow stack.
@@ -5431,7 +5494,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
       |.endif
     }
     |  ja ->vm_growstack_c		// Need to grow stack.
-    |  set_vmstate C
+    |  set_vmstate CFUNC		// CFUNC before entering C function.
     if (op == BC_FUNCC) {
       |  call KBASEa			// (lua_State *L)
     } else {
@@ -5441,6 +5504,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |  // nresults returned in eax (RD).
     |  mov BASE, L:RB->base
     |  mov [DISPATCH+DISPATCH_GL(cur_L)], L:RB
+    |  // INTERP until jump to BC_RET* or vm_return.
     |  set_vmstate INTERP
     |  lea RA, [BASE+RD*8]
     |  neg RA
-- 
2.28.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions
  2020-12-28  4:05 ` [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
@ 2020-12-28  5:14   ` Igor Munkin
  0 siblings, 0 replies; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  5:14 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the fixes! LGTM.

On 28.12.20, Sergey Kaplun wrote:
> This patch introduces LJ_VMST_LFUNC and LJ_VMST_FFUNC VM states
> separated from LJ_VMST_INERP. New VM states allow to determine the
> context of Lua VM execution for x86 and x64 arches. Also, LJ_VMST_C is
> renamed to LJ_VMST_CFUNC for naming consistence with new VM states.
> 
> Also, this patch adjusts stack layout for x86 and x64 arches to save VM
> state for its consistency while stack unwinding when error is raised.
> 
> To group all traces into the one vmstate, a special
> macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v3:
>  * Adjusted vmstate saving.
>  * Fix Win X64.
>  * Fix typos in frame layout.
>  * Codestyle fixes.
> 
>  src/lj_frame.h     |  16 ++---
>  src/lj_obj.h       |  11 +++-
>  src/lj_profile.c   |   5 +-
>  src/luajit-gdb.py  |  14 +++--
>  src/vm_arm.dasc    |   6 +-
>  src/vm_arm64.dasc  |   6 +-
>  src/vm_mips.dasc   |   6 +-
>  src/vm_mips64.dasc |   6 +-
>  src/vm_ppc.dasc    |   6 +-
>  src/vm_x64.dasc    |  93 +++++++++++++++++++++--------
>  src/vm_x86.dasc    | 146 ++++++++++++++++++++++++++++++++-------------
>  11 files changed, 218 insertions(+), 97 deletions(-)
> 

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler
  2020-12-28  2:49   ` Igor Munkin
@ 2020-12-28  5:19     ` Sergey Kaplun
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  5:19 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Igor,

Thanks for the review!

On 28.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! LGTM with the several comments below.
> 
> On 28.12.20, Sergey Kaplun wrote:
> > This patch introduces Lua API for LuaJIT memory profiler implemented in
> > the scope of the previous patch.
> > 
> > Profiler returns true value if started/stopped successfully,
> > returns nil on failure (plus an error message as a second result and a
> > system-dependent error code as a third result).
> > If LuaJIT is build without memory profiler both return `false`.
> 
> Typo: s/`false`/false/ considering true and nil above.

Done.

> 
> > 
> > <lj_errmsg.h> have adjusted with three new errors
> 
> Typo: s/have adjusted/has been adjusted/.

Fixed.

> 
> > PROF_MISUSE/PROF_ISRUNNING/PROF_NOTRUNNING returned in case when
> > profiler has used incorrectly/started/stopped already correspondingly.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v3:
> >   * Fixed lj_mem_new misuse.
> >   * Moved buffer inside ctx.
> >   * Codestyle fixes.
> > 
> >  src/Makefile.dep |   5 +-
> >  src/lib_misc.c   | 166 +++++++++++++++++++++++++++++++++++++++++++++++
> >  src/lj_errmsg.h  |   7 ++
> >  3 files changed, 176 insertions(+), 2 deletions(-)
> > 
> 
> <snipped>
> 
> > diff --git a/src/lib_misc.c b/src/lib_misc.c
> > index 6f7b9a9..619cfb7 100644
> > --- a/src/lib_misc.c
> > +++ b/src/lib_misc.c
> 
> <snipped>
> 
> > @@ -67,8 +75,166 @@ LJLIB_CF(misc_getmetrics)
> 
> <snipped>
> 
> > +/* local started, err, errno = misc.memprof.start(fname) */
> > +LJLIB_CF(misc_memprof_start)
> > +{
> 
> <snipped>
> 
> > +  if (LJ_UNLIKELY(memprof_status != PROFILE_SUCCESS)) {
> > +    lj_mem_free(ctx->g, ctx, sizeof(*ctx));
> 
> This deallocation causes double free if PROFILE_ERRIO occurs, since the
> ctx is released within on_stop callback.
> 
> ctx->stream should be closed if PROFILE_ERR(USE|RUN) occurs.

This is the reason why I suggested not to call on_stop callback on
finish -- this "corner case" looks weird.
Nice catch! See the iterative patch below.

> 
> > +    switch (memprof_status) {
> 
> <snipped>
> 
> > +    }
> > +  }
> > +  lua_pushboolean(L, 1);
> > +
> 
> Typo: Excess newline.

Fixed.

> 
> > +  return 1;
> > +}
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

===================================================================
diff --git a/src/lib_misc.c b/src/lib_misc.c
index 619cfb7..958c189 100644
--- a/src/lib_misc.c
+++ b/src/lib_misc.c
@@ -177,7 +177,10 @@ LJLIB_CF(misc_memprof_start)
   memprof_status = lj_memprof_start(L, &opt);
 
   if (LJ_UNLIKELY(memprof_status != PROFILE_SUCCESS)) {
-    lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+    if (memprof_status == PROFILE_ERRIO) {
+      fclose(ctx->stream);
+      lj_mem_free(ctx->g, ctx, sizeof(*ctx));
+    }
     switch (memprof_status) {
     case PROFILE_ERRUSE:
       lua_pushnil(L);
@@ -197,7 +200,6 @@ LJLIB_CF(misc_memprof_start)
     }
   }
   lua_pushboolean(L, 1);
-
   return 1;
 }
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-27  7:16     ` Sergey Kaplun
@ 2020-12-28  5:30       ` Sergey Kaplun
  2020-12-28  5:33         ` Igor Munkin
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  5:30 UTC (permalink / raw)
  To: Igor Munkin, tarantool-patches

Adjusted <memprof.lua> to more painless integration with Tarantool.
See the iterative patch below. Branch is force-pushed.

===================================================================
diff --git a/tools/memprof.lua b/tools/memprof.lua
index 92d192e..a98e192 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -90,7 +90,7 @@ local function parseargs(args)
   return args[args.argn]
 end
 
-local inputfile = parseargs{...}
+local inputfile = parseargs(arg)
 
 local reader = bufread.new(inputfile)
 local symbols = symtab.parse(reader)
@@ -107,3 +107,5 @@ stdout:write("\n")
 stdout:write("DEALLOCATIONS", "\n")
 view.render(events.free, symbols)
 stdout:write("\n")
+
+os.exit(0)
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-28  5:30       ` Sergey Kaplun
@ 2020-12-28  5:33         ` Igor Munkin
  2020-12-28  6:28           ` Sergey Kaplun
  0 siblings, 1 reply; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  5:33 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Neato, LGTM.

On 28.12.20, Sergey Kaplun wrote:
> Adjusted <memprof.lua> to more painless integration with Tarantool.
> See the iterative patch below. Branch is force-pushed.
> 
> ===================================================================
> diff --git a/tools/memprof.lua b/tools/memprof.lua
> index 92d192e..a98e192 100644
> --- a/tools/memprof.lua
> +++ b/tools/memprof.lua
> @@ -90,7 +90,7 @@ local function parseargs(args)
>    return args[args.argn]
>  end
>  
> -local inputfile = parseargs{...}
> +local inputfile = parseargs(arg)
>  
>  local reader = bufread.new(inputfile)
>  local symbols = symtab.parse(reader)
> @@ -107,3 +107,5 @@ stdout:write("\n")
>  stdout:write("DEALLOCATIONS", "\n")
>  view.render(events.free, symbols)
>  stdout:write("\n")
> +
> +os.exit(0)
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (9 preceding siblings ...)
  2020-12-28  4:05 ` [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
@ 2020-12-28  6:01 ` Alexander V. Tikhonov
  2020-12-28  8:15 ` Igor Munkin
  11 siblings, 0 replies; 52+ messages in thread
From: Alexander V. Tikhonov @ 2020-12-28  6:01 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Hi Sergey, thanks for the patchset, as I see no new degradation found
in gitlab-ci testing commit criteria pipeline [1], patchset LGTM.

[1] - https://gitlab.com/tarantool/tarantool/-/pipelines/234321640

p.s. failed 'ubuntu_19_10' job already removed from testing.

On Fri, Dec 25, 2020 at 06:26:02PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> 
> This patch provides a Lua interface for memory profiler in LuaJIT
> and the corresponding parser of profiled data.
> 
> Global changes in v2:
>   - Moved symtab to memprof module.
>   - Added LUA_CORE and `module_name`_c defines
>   - Added LJ_FASTCALL in wbuf and leb128 modules.
>   - Added translation units to amalg build.
>   - Code style fixes and commit message fixes.
>   - Added (gh-5490) to ChangeLog.
> 
> Issues: https://github.com/tarantool/tarantool/issues/5442
>         https://github.com/tarantool/tarantool/issues/5490
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler
> 
> CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/234430645
> 
> RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000147.html
> 
> @ChangeLog:
>  - Introduce LuaJIT memory profiler (gh-5442).
>  - Introduce LuaJIT memory profiler parser (gh-5490).
> 
> Sergey Kaplun (7):
>   utils: introduce leb128 reader and writer
>   core: introduce write buffer module
>   vm: introduce VM states for Lua and fast functions
>   core: introduce new mem_L field
>   core: introduce memory profiler
>   misc: add Lua API for memory profiler
>   tools: introduce a memory profile parser
> 
>  Makefile                           |  39 ++-
>  src/Makefile                       |  13 +-
>  src/Makefile.dep                   |  44 +--
>  src/lib_misc.c                     | 167 +++++++++++
>  src/lj_arch.h                      |  22 ++
>  src/lj_debug.c                     |   8 +-
>  src/lj_debug.h                     |   3 +
>  src/lj_errmsg.h                    |   6 +
>  src/lj_frame.h                     |  18 +-
>  src/lj_gc.c                        |   2 +
>  src/lj_memprof.c                   | 430 +++++++++++++++++++++++++++++
>  src/lj_memprof.h                   | 165 +++++++++++
>  src/lj_obj.h                       |  13 +-
>  src/lj_profile.c                   |   5 +-
>  src/lj_state.c                     |   8 +
>  src/lj_utils.h                     |  58 ++++
>  src/lj_utils_leb128.c              | 132 +++++++++
>  src/lj_wbuf.c                      | 141 ++++++++++
>  src/lj_wbuf.h                      |  87 ++++++
>  src/ljamalg.c                      |   3 +
>  src/luajit-gdb.py                  |  14 +-
>  src/vm_arm.dasc                    |   6 +-
>  src/vm_arm64.dasc                  |   6 +-
>  src/vm_mips.dasc                   |   6 +-
>  src/vm_mips64.dasc                 |   6 +-
>  src/vm_ppc.dasc                    |   6 +-
>  src/vm_x64.dasc                    |  93 +++++--
>  src/vm_x86.dasc                    | 131 ++++++---
>  test/misclib-memprof-lapi.test.lua | 135 +++++++++
>  tools/luajit-parse-memprof         |   9 +
>  tools/memprof.lua                  | 109 ++++++++
>  tools/memprof/humanize.lua         |  45 +++
>  tools/memprof/parse.lua            | 188 +++++++++++++
>  tools/utils/bufread.lua            | 147 ++++++++++
>  tools/utils/symtab.lua             |  89 ++++++
>  35 files changed, 2217 insertions(+), 137 deletions(-)
>  create mode 100644 src/lj_memprof.c
>  create mode 100644 src/lj_memprof.h
>  create mode 100644 src/lj_utils.h
>  create mode 100644 src/lj_utils_leb128.c
>  create mode 100644 src/lj_wbuf.c
>  create mode 100644 src/lj_wbuf.h
>  create mode 100755 test/misclib-memprof-lapi.test.lua
>  create mode 100755 tools/luajit-parse-memprof
>  create mode 100644 tools/memprof.lua
>  create mode 100644 tools/memprof/humanize.lua
>  create mode 100644 tools/memprof/parse.lua
>  create mode 100644 tools/utils/bufread.lua
>  create mode 100644 tools/utils/symtab.lua
> 
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-28  5:33         ` Igor Munkin
@ 2020-12-28  6:28           ` Sergey Kaplun
  2020-12-28  6:31             ` Igor Munkin
  0 siblings, 1 reply; 52+ messages in thread
From: Sergey Kaplun @ 2020-12-28  6:28 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

This is a patch for workaround of Tarantool's work with builtins.
It is necessary to be able launch profiler parser something like
this (with Tarantool only):

| src/tarantool -e 'require"memprof"(arg[1])' - ~/tmp/uj_memprof/memprof_new.bin

Branch is force-pushed.

===================================================================
diff --git a/tools/memprof.lua b/tools/memprof.lua
index a98e192..9f96208 100644
--- a/tools/memprof.lua
+++ b/tools/memprof.lua
@@ -90,22 +90,30 @@ local function parseargs(args)
   return args[args.argn]
 end
 
-local inputfile = parseargs(arg)
+local function dump(inputfile)
+  local reader = bufread.new(inputfile)
+  local symbols = symtab.parse(reader)
+  local events = memprof.parse(reader, symbols)
 
-local reader = bufread.new(inputfile)
-local symbols = symtab.parse(reader)
-local events = memprof.parse(reader, symbols)
+  stdout:write("ALLOCATIONS", "\n")
+  view.render(events.alloc, symbols)
+  stdout:write("\n")
 
-stdout:write("ALLOCATIONS", "\n")
-view.render(events.alloc, symbols)
-stdout:write("\n")
+  stdout:write("REALLOCATIONS", "\n")
+  view.render(events.realloc, symbols)
+  stdout:write("\n")
 
-stdout:write("REALLOCATIONS", "\n")
-view.render(events.realloc, symbols)
-stdout:write("\n")
+  stdout:write("DEALLOCATIONS", "\n")
+  view.render(events.free, symbols)
+  stdout:write("\n")
 
-stdout:write("DEALLOCATIONS", "\n")
-view.render(events.free, symbols)
-stdout:write("\n")
+  os.exit(0)
+end
 
-os.exit(0)
+-- FIXME: this script should be application-independent.
+local args = {...}
+if #args == 1 and args[1] == "memprof" then
+  return dump
+else
+  dump(parseargs(args))
+end
===================================================================

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser
  2020-12-28  6:28           ` Sergey Kaplun
@ 2020-12-28  6:31             ` Igor Munkin
  0 siblings, 0 replies; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  6:31 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

On 28.12.20, Sergey Kaplun wrote:
> This is a patch for workaround of Tarantool's work with builtins.
> It is necessary to be able launch profiler parser something like
> this (with Tarantool only):
> 
> | src/tarantool -e 'require"memprof"(arg[1])' - ~/tmp/uj_memprof/memprof_new.bin

Heck this. Still LGTM :)

> 
> Branch is force-pushed.
> 
> ===================================================================
> diff --git a/tools/memprof.lua b/tools/memprof.lua
> index a98e192..9f96208 100644
> --- a/tools/memprof.lua
> +++ b/tools/memprof.lua
> @@ -90,22 +90,30 @@ local function parseargs(args)
>    return args[args.argn]
>  end
>  
> -local inputfile = parseargs(arg)
> +local function dump(inputfile)
> +  local reader = bufread.new(inputfile)
> +  local symbols = symtab.parse(reader)
> +  local events = memprof.parse(reader, symbols)
>  
> -local reader = bufread.new(inputfile)
> -local symbols = symtab.parse(reader)
> -local events = memprof.parse(reader, symbols)
> +  stdout:write("ALLOCATIONS", "\n")
> +  view.render(events.alloc, symbols)
> +  stdout:write("\n")
>  
> -stdout:write("ALLOCATIONS", "\n")
> -view.render(events.alloc, symbols)
> -stdout:write("\n")
> +  stdout:write("REALLOCATIONS", "\n")
> +  view.render(events.realloc, symbols)
> +  stdout:write("\n")
>  
> -stdout:write("REALLOCATIONS", "\n")
> -view.render(events.realloc, symbols)
> -stdout:write("\n")
> +  stdout:write("DEALLOCATIONS", "\n")
> +  view.render(events.free, symbols)
> +  stdout:write("\n")
>  
> -stdout:write("DEALLOCATIONS", "\n")
> -view.render(events.free, symbols)
> -stdout:write("\n")
> +  os.exit(0)
> +end
>  
> -os.exit(0)
> +-- FIXME: this script should be application-independent.
> +local args = {...}
> +if #args == 1 and args[1] == "memprof" then
> +  return dump
> +else
> +  dump(parseargs(args))
> +end
> ===================================================================
> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler
  2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
                   ` (10 preceding siblings ...)
  2020-12-28  6:01 ` [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Alexander V. Tikhonov
@ 2020-12-28  8:15 ` Igor Munkin
  11 siblings, 0 replies; 52+ messages in thread
From: Igor Munkin @ 2020-12-28  8:15 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

On 25.12.20, Sergey Kaplun wrote:
> 
> This patch provides a Lua interface for memory profiler in LuaJIT
> and the corresponding parser of profiled data.
> 
> Global changes in v2:
>   - Moved symtab to memprof module.
>   - Added LUA_CORE and `module_name`_c defines
>   - Added LJ_FASTCALL in wbuf and leb128 modules.
>   - Added translation units to amalg build.
>   - Code style fixes and commit message fixes.
>   - Added (gh-5490) to ChangeLog.
> 
> Issues: https://github.com/tarantool/tarantool/issues/5442
>         https://github.com/tarantool/tarantool/issues/5490
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-5442-luajit-memory-profiler
> 
> CI:     https://gitlab.com/tarantool/tarantool/-/pipelines/234430645
> 
> RFC: https://lists.tarantool.org/pipermail/tarantool-discussions/2020-December/000147.html
> 
> @ChangeLog:
>  - Introduce LuaJIT memory profiler (gh-5442).
>  - Introduce LuaJIT memory profiler parser (gh-5490).
> 
> Sergey Kaplun (7):
>   utils: introduce leb128 reader and writer
>   core: introduce write buffer module
>   vm: introduce VM states for Lua and fast functions
>   core: introduce new mem_L field
>   core: introduce memory profiler
>   misc: add Lua API for memory profiler
>   tools: introduce a memory profile parser

I've checked your patchset into tarantool/luajit bleeding branch and
bumped a new version in Tarantool master.

> 

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2020-12-28  8:15 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-25 15:26 [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 1/7] utils: introduce leb128 reader and writer Sergey Kaplun
2020-12-25 21:42   ` Igor Munkin
2020-12-26  9:32     ` Sergey Kaplun
2020-12-26 13:57       ` Sergey Kaplun
2020-12-26 18:47         ` Sergey Ostanevich
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 2/7] core: introduce write buffer module Sergey Kaplun
2020-12-26 14:22   ` Igor Munkin
2020-12-26 15:26     ` Sergey Kaplun
2020-12-26 19:03       ` Sergey Ostanevich
2020-12-26 19:37         ` Sergey Kaplun
2020-12-28  1:43           ` Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
2020-12-26 19:07   ` Sergey Ostanevich
2020-12-27 23:48   ` Igor Munkin
2020-12-28  3:54     ` Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 4/7] core: introduce new mem_L field Sergey Kaplun
2020-12-26 19:12   ` Sergey Ostanevich
2020-12-26 19:42     ` Sergey Kaplun
2020-12-27 13:09   ` Igor Munkin
2020-12-27 17:44     ` Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler Sergey Kaplun
2020-12-27 10:58   ` Sergey Ostanevich
2020-12-27 11:54     ` Sergey Kaplun
2020-12-27 13:27       ` Sergey Ostanevich
2020-12-27 16:44   ` Igor Munkin
2020-12-27 21:47     ` Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 6/7] misc: add Lua API for " Sergey Kaplun
2020-12-27 11:54   ` Sergey Ostanevich
2020-12-27 13:42     ` Sergey Kaplun
2020-12-27 15:37       ` Sergey Ostanevich
2020-12-27 18:58   ` Igor Munkin
2020-12-28  0:14     ` Sergey Kaplun
2020-12-25 15:26 ` [Tarantool-patches] [PATCH luajit v2 7/7] tools: introduce a memory profile parser Sergey Kaplun
2020-12-26 22:56   ` Igor Munkin
2020-12-27  7:16     ` Sergey Kaplun
2020-12-28  5:30       ` Sergey Kaplun
2020-12-28  5:33         ` Igor Munkin
2020-12-28  6:28           ` Sergey Kaplun
2020-12-28  6:31             ` Igor Munkin
2020-12-27 13:24   ` Sergey Ostanevich
2020-12-27 16:02     ` Sergey Kaplun
2020-12-27 21:55       ` Sergey Ostanevich
2020-12-28  2:05 ` [Tarantool-patches] [PATCH luajit v3 2/2] misc: add Lua API for memory profiler Sergey Kaplun
2020-12-28  2:49   ` Igor Munkin
2020-12-28  5:19     ` Sergey Kaplun
2020-12-28  2:06 ` [Tarantool-patches] [PATCH luajit v3 1/2] core: introduce " Sergey Kaplun
2020-12-28  3:59   ` Igor Munkin
2020-12-28  4:05 ` [Tarantool-patches] [PATCH luajit v3 3/7] vm: introduce VM states for Lua and fast functions Sergey Kaplun
2020-12-28  5:14   ` Igor Munkin
2020-12-28  6:01 ` [Tarantool-patches] [PATCH luajit v2 0/7] LuaJIT memory profiler Alexander V. Tikhonov
2020-12-28  8:15 ` Igor Munkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox