[Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler

Igor Munkin imun at tarantool.org
Sun Dec 27 19:44:38 MSK 2020


Sergey,

Thanks for the patch! Please consider the comments below.

On 25.12.20, Sergey Kaplun wrote:
> This patch introduces memory profiler for Lua machine.
> 
> First of all profiler dumps the definitions of all loaded Lua functions
> (symtab) via the write buffer introduced in one of the previous patches.
> 
> Profiler replaces the old allocation function with the instrumented one
> after symtab is dumped. This new function reports all allocations,
> reallocations or deallocations events via the write buffer during
> profiling. Subsequent content depends on the function's type (LFUNC,
> FFUNC or CFUNC).
> 
> To divide all traces into the one vmstate when being profiled, a special
> macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> 
> When profiling is over, a special epilogue event header is written and
> the old allocation function is restored back.
> 
> This change also makes debug_frameline function LuaJIT-wide visible to
> be used in the memory profiler.
> 
> For more information, see <lj_memprof.h>.
> 
> Part of tarantool/tarantool#5442
> ---
> 
> Changes in v2:
>   - Merged with debug-to-public commit and symtab.
>   - Drop [T]imer bit description.
> 
>  src/Makefile     |   8 +-
>  src/Makefile.dep |  31 ++--
>  src/lj_arch.h    |  22 +++
>  src/lj_debug.c   |   8 +-
>  src/lj_debug.h   |   3 +
>  src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
>  src/lj_memprof.h | 165 ++++++++++++++++++
>  src/lj_obj.h     |   8 +
>  src/lj_state.c   |   8 +
>  src/ljamalg.c    |   1 +
>  10 files changed, 665 insertions(+), 19 deletions(-)
>  create mode 100644 src/lj_memprof.c
>  create mode 100644 src/lj_memprof.h
> 
> diff --git a/src/Makefile b/src/Makefile
> index 384b590..3218dfd 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -113,6 +113,12 @@ XCFLAGS=
>  # Enable GC64 mode for x64.
>  #XCFLAGS+= -DLUAJIT_ENABLE_GC64
>  #
> +# Disable the memory profiler.
> +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
> +#
> +# Disable the thread safe profiler.
> +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE

Well, I personally see little sense in this flag *now*. AFAICS it can
only occur in multithread host application running a separate VM per
thread. But this not quite well implemented even in <jit.p> profiler
(see Mike's comment here[1]), so you can either disable it for now
(since this is only MVP) or make it the same way Mike did: lock a mutex
if <memprof->g> is not NULL; otherwise simply initialize the profiler.
By the way, there is no mutex used for POSIX targets in <jit.p>, but I
don't know why.

> +#
>  ##############################################################################
>  
>  ##############################################################################

<snipped>

> diff --git a/src/lj_arch.h b/src/lj_arch.h
> index c8d7138..5967849 100644
> --- a/src/lj_arch.h
> +++ b/src/lj_arch.h

<snipped>

> +/* Disable or enable the memory profiler's thread safety. */
> +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
> +#define LJ_THREAD_SAFE		0
> +#else
> +#define LJ_THREAD_SAFE		1

Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/.

Otherwise thread safety doesn't work at all.

> +#endif
> +
>  #endif

<snipped>

> diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> new file mode 100644
> index 0000000..e0df057
> --- /dev/null
> +++ b/src/lj_memprof.c
> @@ -0,0 +1,430 @@
> +/*
> +** Implementation of memory profiler.
> +**
> +** Major portions taken verbatim or adapted from the LuaVela.
> +** Copyright (C) 2015-2019 IPONWEB Ltd.
> +*/
> +
> +#define lj_memprof_c
> +#define LUA_CORE
> +
> +#include <errno.h>
> +
> +#include "lj_memprof.h"
> +#include "lj_def.h"

<snipped>

> +#include "lua.h"

This include is excess.

> +
> +#include "lj_obj.h"
> +#include "lj_frame.h"
> +#include "lj_debug.h"
> +#include "lj_gc.h"

This include is excess.

> +#include "lj_wbuf.h"
> +
> +/* --------------------------------- Symtab --------------------------------- */

<snipped>

> +static void symtab_write_prologue(struct lj_wbuf *out)
> +{
> +  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
> +  lj_wbuf_addn(out, ljs_header, len);
> +}

Minor: Again, I guess this function semantics can be moved right to its
caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte).

> +

<snipped>

> +/* ---------------------------- Memory profiler ----------------------------- */

<snipped>

> +struct alloc {
> +  lua_Alloc allocf; /* Allocating function. */
> +  void *state; /* Opaque allocator's state. */
> +};

Minor: This structure can be used in <lj_obj.h> to store the default
allocator. Feel free to ignore.

> +

<snipped>

> +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
> +				GCfunc *fn, struct lua_State *L,
> +				cTValue *nextframe)
> +{
> +  const BCLine line = lj_debug_frameline(L, fn, nextframe);
> +  lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
> +  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
> +  lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);

As we discussed offline, I have two notes for this line:
* When the <line> value can be negative?
* When the <line> value can be zero?

Furthermore, I have no idea, why <line> is casted to <uintptr_t>.

> +}

<snipped>

> +static void memprof_write_func(struct memprof *mp, uint8_t header)
> +{
> +  struct lj_wbuf *out = &mp->out;
> +  lua_State *L = gco2th(gcref(mp->g->mem_L));
> +  cTValue *frame = L->base - 1;
> +  GCfunc *fn;
> +
> +  fn = frame_func(frame);

Minor: Why does this line differ from those above?

> +

<snipped>

> +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
> +{
> +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> +}
> +
> +/*
> +** XXX: In ideal world, we should report allocations from traces as well.
> +** But since traces must follow the semantics of the original code, behaviour of
> +** Lua and JITted code must match 1:1 in terms of allocations, which makes
> +** using memprof with enabled JIT virtually redundant. Hence the stub below.
> +*/

I guess you can drop the function below (since it simply duplicates the
INTERNAL allocation semantics), *but* the comment above is nice, so you
can move it to the default function or to the corresponding item in
<memprof_writers> set.

> +static void memprof_write_trace(struct memprof *mp, uint8_t header)
> +{
> +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> +}
> +
> +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);

Minor: I guess this is not <header> but <aevent>.

> +

<snipped>

> +static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> +{
> +  const global_State *g = mp->g;
> +  const uint32_t _vmstate = (uint32_t)~g->vmstate;
> +  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
> +  const uint8_t header = aevent;

General Genius :)

> +
> +  memprof_writers[vmstate](mp, header);
> +}

<snipped>

> +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
> +{
> +  struct memprof *mp = &memprof;

This should be "const", IMHO.

> +  struct alloc *oalloc = &mp->orig_alloc;

<snipped>

> +}
> +
> +static void memprof_write_prologue(struct lj_wbuf *out)
> +{
> +  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
> +  lj_wbuf_addn(out, ljm_header, len);
> +}

See comments for <symtab_write_prologue>.

> +
> +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)

Side note: it's better to wrap this function to move all mutex-related
work in this wrapper.

> +{

<snipped>

> +  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
> +  lua_assert(opt->buf != NULL && opt->len != 0);

Do these asserts depend on each other? If no, please split them into the
separate ones.

> +

<snipped>

> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
> +		  lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {

You can test (STREAM_ERR_IO|STREAM_STOP) here.

> +    /* on_stop call may change errno value. */

<snipped>

> +}
> +
> +static int memprof_stop(const struct lua_State *L)
> +{

<snipped>

> +  if (L != NULL && mp->g != G(L)) {

This is a nice check (but looks redundant for the current version). Why
did you make it optional (if L is given)?

> +    memprof_unlock();
> +    return PROFILE_ERR;
> +  }

<snipped>

> +  main_L = mainthread(mp->g);

Why do you use main coroutine here instead of the given one?

> +

<snipped>

> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> +    mp->state = MPS_HALT;
> +    /* on_stop call may change errno value. */
> +    mp->saved_errno = lj_wbuf_errno(out);
> +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +    lj_wbuf_terminate(out);
> +    memprof_unlock();
> +    return PROFILE_ERRIO;
> +  }

<snipped>

> +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> +    saved_errno = lj_wbuf_errno(out);
> +    return_status = PROFILE_ERRIO;

Well, you introduce a separate variable for the status. For what? I
think this branch is the same as the "stop" one above. So either use
goto to the error branch or duplicate the logic from the "stop" branch.
In general, this looks like a mess: I almost lost the mutex unlock and
the buffer termination.

> +  }
> +
> +  lj_wbuf_terminate(out);
> +
> +  memprof_unlock();
> +  errno = saved_errno;
> +  return return_status;
> +}
> +
> +int lj_memprof_stop(void)

Why do you omit the coroutine argument for this function?

> +{

<snipped>

> +int lj_memprof_stop_vm(const struct lua_State *L)

This function is not used anywhere. Please drop it.

> +{

<snipped>

> +int lj_memprof_is_running(void)

This function is used only on global state finalization. However, it is
excess (see the comment below), so I believe it can be dropped.

> +{

<snipped>

> diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> new file mode 100644
> index 0000000..a96b72f
> --- /dev/null
> +++ b/src/lj_memprof.h
> @@ -0,0 +1,165 @@

<snipped>

> +#include <stdint.h>
> +#include <stddef.h>

lj_def.h is definitely enough here.

> +

<snipped>

> +#define SYMTAB_FFUNC ((uint8_t)1)
> +#define SYMTAB_CFUNC ((uint8_t)2)
> +#define SYMTAB_TRACE ((uint8_t)3)

These defines are unused.

> +#define SYMTAB_FINAL ((uint8_t)0x80)

<snipped>

> +/* Profiler public API. */
> +#define PROFILE_SUCCESS 0
> +#define PROFILE_ERR     1

Minor: Considering the usage <PROFILE_ERRUSE> looks better to me. Feel
free to ignore.

> +#define PROFILE_ERRRUN  2
> +#define PROFILE_ERRMEM  3
> +#define PROFILE_ERRIO   4
> +
> +/* Profiler options. */
> +struct lua_Prof_options {

Typo: s/lua_Prof/lj_memprof/ since <lua_*> is used for exported members.

> +  /* Context for the profile writer and final callback. */
> +  void *ctx;
> +  /* Custom buffer to write data. */
> +  uint8_t *buf;
> +  /* The buffer's size. */
> +  size_t len;
> +  /*
> +  ** Writer function for profile events.
> +  ** Should return amount of written bytes on success or zero in case of error.
> +  ** Setting *data to NULL means end of profiling.
> +  */

Why don't you use <lj_wbuf_writer> typedef below? So you can reference
<lj_wbuf.h> for the writer contract.

> +  size_t (*writer)(const void **data, size_t len, void *ctx);
> +  /*
> +  ** Callback on profiler stopping. Required for correctly cleaning
> +  ** at vm shoutdown when profiler still running.

Typo: s/VM shoutdown/VM finalization/.
Typo: s/profiled still running/profiler is still running/.

> +  ** Returns zero on success.
> +  */
> +  int (*on_stop)(void *ctx, uint8_t *buf);
> +};
> +
> +/* Avoid to provide additional interfaces described in other headers. */

It looks like cargo cult, IMHO. What is the reason?

> +struct lua_State;
> +
> +/*
> +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
> +** LUAM_PROFILE_ERRIO.

Typo: s/LUAM_PROFILE_*/PROFILE_*/g.

> +*/
> +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
> +
> +/*
> +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
> +** on call at buffer flush, profiled stream stops, or on_stop() callback
> +** returns non-zero value, returns LUAM_PROFILE_ERRIO.
> +*/

<snipped>

> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index c94617d..c94b0bb 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -523,6 +523,14 @@ enum {
>    LJ_VMST__MAX
>  };
>  
> +/*
> +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
> +** In fact, when VM executes a trace, vmstate is set to the trace number, but
> +** we aggregate all such cases into one VM state during per-VM state profiling.
> +*/

Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide
one. If <vmstate> is less than LJ_VMST__MAX it is considered as a trace
number and all LuaJIT universe works with assumtions the trace is being
run. It looks natural to me to move this change to the previous patch
related to VM states slivering into *FUNC set to close this story there.

> +
> +#define LJ_VMST_TRACE		(LJ_VMST__MAX)
> +
>  #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
>  
>  /* Metamethods. ORDER MM */
> diff --git a/src/lj_state.c b/src/lj_state.c
> index 1d9c628..6c46e3d 100644
> --- a/src/lj_state.c
> +++ b/src/lj_state.c

<snipped>

> @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
>    global_State *g = G(L);
>    int i;
>    L = mainthread(g);  /* Only the main thread can be closed. */
> +#if LJ_HASMEMPROF
> +  if (lj_memprof_is_running())

This check is excess, since you don't check the return value below. If
<lj_memprof_stop> is called when profiler doesn't work, PROFILE_ERRRUN
or PROFILE_ERRIO is yield.

> +    lj_memprof_stop();
> +#endif
>  #if LJ_HASPROFILE
>    luaJIT_profile_stop(L);
>  #endif

<snipped>

> -- 
> 2.28.0
> 

[1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list