[Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler

Sergey Kaplun skaplun at tarantool.org
Mon Dec 28 00:47:15 MSK 2020


Hi, Igor!

Thanks for the review!

On 27.12.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider the comments below.
> 
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces memory profiler for Lua machine.
> > 
> > First of all profiler dumps the definitions of all loaded Lua functions
> > (symtab) via the write buffer introduced in one of the previous patches.
> > 
> > Profiler replaces the old allocation function with the instrumented one
> > after symtab is dumped. This new function reports all allocations,
> > reallocations or deallocations events via the write buffer during
> > profiling. Subsequent content depends on the function's type (LFUNC,
> > FFUNC or CFUNC).
> > 
> > To divide all traces into the one vmstate when being profiled, a special
> > macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> > 
> > When profiling is over, a special epilogue event header is written and
> > the old allocation function is restored back.
> > 
> > This change also makes debug_frameline function LuaJIT-wide visible to
> > be used in the memory profiler.
> > 
> > For more information, see <lj_memprof.h>.
> > 
> > Part of tarantool/tarantool#5442
> > ---
> > 
> > Changes in v2:
> >   - Merged with debug-to-public commit and symtab.
> >   - Drop [T]imer bit description.
> > 
> >  src/Makefile     |   8 +-
> >  src/Makefile.dep |  31 ++--
> >  src/lj_arch.h    |  22 +++
> >  src/lj_debug.c   |   8 +-
> >  src/lj_debug.h   |   3 +
> >  src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
> >  src/lj_memprof.h | 165 ++++++++++++++++++
> >  src/lj_obj.h     |   8 +
> >  src/lj_state.c   |   8 +
> >  src/ljamalg.c    |   1 +
> >  10 files changed, 665 insertions(+), 19 deletions(-)
> >  create mode 100644 src/lj_memprof.c
> >  create mode 100644 src/lj_memprof.h
> > 
> > diff --git a/src/Makefile b/src/Makefile
> > index 384b590..3218dfd 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -113,6 +113,12 @@ XCFLAGS=
> >  # Enable GC64 mode for x64.
> >  #XCFLAGS+= -DLUAJIT_ENABLE_GC64
> >  #
> > +# Disable the memory profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
> > +#
> > +# Disable the thread safe profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE
> 
> Well, I personally see little sense in this flag *now*. AFAICS it can
> only occur in multithread host application running a separate VM per
> thread. But this not quite well implemented even in <jit.p> profiler
> (see Mike's comment here[1]), so you can either disable it for now
> (since this is only MVP) or make it the same way Mike did: lock a mutex
> if <memprof->g> is not NULL; otherwise simply initialize the profiler.
> By the way, there is no mutex used for POSIX targets in <jit.p>, but I
> don't know why.

OK, I suppose the best solution for now is to drop all related changes
about it. I'll add corresponding notice in the RFC and header file.

> 
> > +#
> >  ##############################################################################
> >  
> >  ##############################################################################
> 
> <snipped>
> 
> > diff --git a/src/lj_arch.h b/src/lj_arch.h
> > index c8d7138..5967849 100644
> > --- a/src/lj_arch.h
> > +++ b/src/lj_arch.h
> 
> <snipped>
> 
> > +/* Disable or enable the memory profiler's thread safety. */
> > +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
> > +#define LJ_THREAD_SAFE		0
> > +#else
> > +#define LJ_THREAD_SAFE		1
> 
> Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/.
> 
> Otherwise thread safety doesn't work at all.

Dropped for now.

> 
> > +#endif
> > +
> >  #endif
> 
> <snipped>
> 
> > diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> > new file mode 100644
> > index 0000000..e0df057
> > --- /dev/null
> > +++ b/src/lj_memprof.c
> > @@ -0,0 +1,430 @@
> > +/*
> > +** Implementation of memory profiler.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#define lj_memprof_c
> > +#define LUA_CORE
> > +
> > +#include <errno.h>
> > +
> > +#include "lj_memprof.h"
> > +#include "lj_def.h"
> 
> <snipped>
> 
> > +#include "lua.h"
> 
> This include is excess.

Removed.

> 
> > +
> > +#include "lj_obj.h"
> > +#include "lj_frame.h"
> > +#include "lj_debug.h"
> > +#include "lj_gc.h"
> 
> This include is excess.

Removed. And from Makefile.dep too.

> 
> > +#include "lj_wbuf.h"
> > +
> > +/* --------------------------------- Symtab --------------------------------- */
> 
> <snipped>
> 
> > +static void symtab_write_prologue(struct lj_wbuf *out)
> > +{
> > +  const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
> > +  lj_wbuf_addn(out, ljs_header, len);
> > +}
> 
> Minor: Again, I guess this function semantics can be moved right to its
> caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte).

Moved.

> 
> > +
> 
> <snipped>
> 
> > +/* ---------------------------- Memory profiler ----------------------------- */
> 
> <snipped>
> 
> > +struct alloc {
> > +  lua_Alloc allocf; /* Allocating function. */
> > +  void *state; /* Opaque allocator's state. */
> > +};
> 
> Minor: This structure can be used in <lj_obj.h> to store the default
> allocator. Feel free to ignore.

Agree. But I suggest to do it with refactoring later. It'll cause
a lot of unnecessary code changes now.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
> > +				GCfunc *fn, struct lua_State *L,
> > +				cTValue *nextframe)
> > +{
> > +  const BCLine line = lj_debug_frameline(L, fn, nextframe);
> > +  lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
> > +  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
> > +  lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);
> 
> As we discussed offline, I have two notes for this line:
> * When the <line> value can be negative?

Not for Lua function, AFAICS. I rewrote this part to the following:

|  const BCLine line = lj_debug_frameline(L, fn, nextframe);
|  /*
|  ** Line is always >= 0 if we are inside a Lua function.
|  ** Equals to zero when LuaJIT is built with the
|  ** -DLUAJIT_DISABLE_DEBUGINFO flag.
|  */
|  lua_assert(line >= 0);
|  lj_wbuf_addbyte(out, aevent | ASOURCE_LFUNC);
|  lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
|  lj_wbuf_addu64(out, (uint64_t)line);

> * When the <line> value can be zero?

When LuaJIT is built with `-DLUAJIT_DISABLE_DEBUGINFO` - exist only
in <src/lj_parse.c>. Anyway, it's insignificant after the changes
above.

> 
> Furthermore, I have no idea, why <line> is casted to <uintptr_t>.

Typo. Fixed.

> 
> > +}
> 
> <snipped>
> 
> > +static void memprof_write_func(struct memprof *mp, uint8_t header)
> > +{
> > +  struct lj_wbuf *out = &mp->out;
> > +  lua_State *L = gco2th(gcref(mp->g->mem_L));
> > +  cTValue *frame = L->base - 1;
> > +  GCfunc *fn;
> > +
> > +  fn = frame_func(frame);
> 
> Minor: Why does this line differ from those above?

Fixed.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
> > +{
> > +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +/*
> > +** XXX: In ideal world, we should report allocations from traces as well.
> > +** But since traces must follow the semantics of the original code, behaviour of
> > +** Lua and JITted code must match 1:1 in terms of allocations, which makes
> > +** using memprof with enabled JIT virtually redundant. Hence the stub below.
> > +*/
> 
> I guess you can drop the function below (since it simply duplicates the
> INTERNAL allocation semantics), *but* the comment above is nice, so you
> can move it to the default function or to the corresponding item in
> <memprof_writers> set.

Done.

> 
> > +static void memprof_write_trace(struct memprof *mp, uint8_t header)
> > +{
> > +  lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);
> 
> Minor: I guess this is not <header> but <aevent>.

Fixed here and above.

> 
> > +
> 
> <snipped>
> 
> > +static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> > +{
> > +  const global_State *g = mp->g;
> > +  const uint32_t _vmstate = (uint32_t)~g->vmstate;
> > +  const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
> > +  const uint8_t header = aevent;
> 
> General Genius :)

Fixed.

> 
> > +
> > +  memprof_writers[vmstate](mp, header);
> > +}
> 
> <snipped>
> 
> > +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
> > +{
> > +  struct memprof *mp = &memprof;
> 
> This should be "const", IMHO.

Not here (we write to wbuf and it is nested into memprof structure) but
for oalloc below, fixed.

> 
> > +  struct alloc *oalloc = &mp->orig_alloc;
> 
> <snipped>
> 
> > +}
> > +
> > +static void memprof_write_prologue(struct lj_wbuf *out)
> > +{
> > +  const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
> > +  lj_wbuf_addn(out, ljm_header, len);
> > +}
> 
> See comments for <symtab_write_prologue>.

Fixed.

> 
> > +
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)
> 
> Side note: it's better to wrap this function to move all mutex-related
> work in this wrapper.

I'll drop mutex related work for now.

> 
> > +{
> 
> <snipped>
> 
> > +  lua_assert(opt->writer != NULL && opt->on_stop != NULL);
> > +  lua_assert(opt->buf != NULL && opt->len != 0);
> 
> Do these asserts depend on each other? If no, please split them into the
> separate ones.

Done.

> 
> > +
> 
> <snipped>
> 
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
> > +		  lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {
> 
> You can test (STREAM_ERR_IO|STREAM_STOP) here.

Yes, it's pretier! Thank you!

> 
> > +    /* on_stop call may change errno value. */
> 
> <snipped>
> 
> > +}
> > +
> > +static int memprof_stop(const struct lua_State *L)
> > +{
> 
> <snipped>
> 
> > +  if (L != NULL && mp->g != G(L)) {
> 
> This is a nice check (but looks redundant for the current version). Why
> did you make it optional (if L is given)?

Dropped L-part (considering your comments below).

> 
> > +    memprof_unlock();
> > +    return PROFILE_ERR;
> > +  }
> 
> <snipped>
> 
> > +  main_L = mainthread(mp->g);
> 
> Why do you use main coroutine here instead of the given one?

See no reason for that now... Dropped.

> 
> > +
> 
> <snipped>
> 
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> > +    lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> > +    mp->state = MPS_HALT;
> > +    /* on_stop call may change errno value. */
> > +    mp->saved_errno = lj_wbuf_errno(out);
> > +    /* Ignore possible errors. mp->opt.buf == NULL here. */
> > +    mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +    lj_wbuf_terminate(out);
> > +    memprof_unlock();
> > +    return PROFILE_ERRIO;
> > +  }
> 
> <snipped>
> 
> > +  cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > +  if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> > +    saved_errno = lj_wbuf_errno(out);
> > +    return_status = PROFILE_ERRIO;
> 
> Well, you introduce a separate variable for the status. For what? I
> think this branch is the same as the "stop" one above. So either use
> goto to the error branch or duplicate the logic from the "stop" branch.
> In general, this looks like a mess: I almost lost the mutex unlock and
> the buffer termination.

Yes, looks unclear. I'll rewrite it in the new version.

> 
> > +  }
> > +
> > +  lj_wbuf_terminate(out);
> > +
> > +  memprof_unlock();
> > +  errno = saved_errno;
> > +  return return_status;
> > +}
> > +
> > +int lj_memprof_stop(void)
> 
> Why do you omit the coroutine argument for this function?
> 
> > +{
> 
> <snipped>
> 
> > +int lj_memprof_stop_vm(const struct lua_State *L)
> 
> This function is not used anywhere. Please drop it.

About this and function above.
I'll provide 1 interface with the coroutine argument.

> 
> > +{
> 
> <snipped>
> 
> > +int lj_memprof_is_running(void)
> 
> This function is used only on global state finalization. However, it is
> excess (see the comment below), so I believe it can be dropped.

Done.

> 
> > +{
> 
> <snipped>
> 
> > diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> > new file mode 100644
> > index 0000000..a96b72f
> > --- /dev/null
> > +++ b/src/lj_memprof.h
> > @@ -0,0 +1,165 @@
> 
> <snipped>
> 
> > +#include <stdint.h>
> > +#include <stddef.h>
> 
> lj_def.h is definitely enough here.

Yep.

> 
> > +
> 
> <snipped>
> 
> > +#define SYMTAB_FFUNC ((uint8_t)1)
> > +#define SYMTAB_CFUNC ((uint8_t)2)
> > +#define SYMTAB_TRACE ((uint8_t)3)
> 
> These defines are unused.

Removed.

> 
> > +#define SYMTAB_FINAL ((uint8_t)0x80)
> 
> <snipped>
> 
> > +/* Profiler public API. */
> > +#define PROFILE_SUCCESS 0
> > +#define PROFILE_ERR     1
> 
> Minor: Considering the usage <PROFILE_ERRUSE> looks better to me. Feel
> free to ignore.

Hmm, indead. Applied.

> 
> > +#define PROFILE_ERRRUN  2
> > +#define PROFILE_ERRMEM  3
> > +#define PROFILE_ERRIO   4
> > +
> > +/* Profiler options. */
> > +struct lua_Prof_options {
> 
> Typo: s/lua_Prof/lj_memprof/ since <lua_*> is used for exported members.

Fixed.

> 
> > +  /* Context for the profile writer and final callback. */
> > +  void *ctx;
> > +  /* Custom buffer to write data. */
> > +  uint8_t *buf;
> > +  /* The buffer's size. */
> > +  size_t len;
> > +  /*
> > +  ** Writer function for profile events.
> > +  ** Should return amount of written bytes on success or zero in case of error.
> > +  ** Setting *data to NULL means end of profiling.
> > +  */
> 
> Why don't you use <lj_wbuf_writer> typedef below? So you can reference
> <lj_wbuf.h> for the writer contract.

Updated.

> 
> > +  size_t (*writer)(const void **data, size_t len, void *ctx);
> > +  /*
> > +  ** Callback on profiler stopping. Required for correctly cleaning
> > +  ** at vm shoutdown when profiler still running.
> 
> Typo: s/VM shoutdown/VM finalization/.
> Typo: s/profiled still running/profiler is still running/.

Thanks! Fixed.

> 
> > +  ** Returns zero on success.
> > +  */
> > +  int (*on_stop)(void *ctx, uint8_t *buf);
> > +};
> > +
> > +/* Avoid to provide additional interfaces described in other headers. */
> 
> It looks like cargo cult, IMHO. What is the reason?

You don't add extra information in object file and it's become
smaller, for example :). Avoiding namespace polution is a small
benefit.

> 
> > +struct lua_State;
> > +
> > +/*
> > +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
> > +** LUAM_PROFILE_ERRIO.
> 
> Typo: s/LUAM_PROFILE_*/PROFILE_*/g.

Fixed! Thanks!

> 
> > +*/
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
> > +
> > +/*
> > +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
> > +** on call at buffer flush, profiled stream stops, or on_stop() callback
> > +** returns non-zero value, returns LUAM_PROFILE_ERRIO.
> > +*/
> 
> <snipped>
> 
> > diff --git a/src/lj_obj.h b/src/lj_obj.h
> > index c94617d..c94b0bb 100644
> > --- a/src/lj_obj.h
> > +++ b/src/lj_obj.h
> > @@ -523,6 +523,14 @@ enum {
> >    LJ_VMST__MAX
> >  };
> >  
> > +/*
> > +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
> > +** In fact, when VM executes a trace, vmstate is set to the trace number, but
> > +** we aggregate all such cases into one VM state during per-VM state profiling.
> > +*/
> 
> Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide
> one. If <vmstate> is less than LJ_VMST__MAX it is considered as a trace
> number and all LuaJIT universe works with assumtions the trace is being
> run. It looks natural to me to move this change to the previous patch
> related to VM states slivering into *FUNC set to close this story there.

Done.

> 
> > +
> > +#define LJ_VMST_TRACE		(LJ_VMST__MAX)
> > +
> >  #define setvmstate(g, st)	((g)->vmstate = ~LJ_VMST_##st)
> >  
> >  /* Metamethods. ORDER MM */
> > diff --git a/src/lj_state.c b/src/lj_state.c
> > index 1d9c628..6c46e3d 100644
> > --- a/src/lj_state.c
> > +++ b/src/lj_state.c
> 
> <snipped>
> 
> > @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
> >    global_State *g = G(L);
> >    int i;
> >    L = mainthread(g);  /* Only the main thread can be closed. */
> > +#if LJ_HASMEMPROF
> > +  if (lj_memprof_is_running())
> 
> This check is excess, since you don't check the return value below. If
> <lj_memprof_stop> is called when profiler doesn't work, PROFILE_ERRRUN
> or PROFILE_ERRIO is yield.

Dropped check.

> 
> > +    lj_memprof_stop();
> > +#endif
> >  #if LJ_HASPROFILE
> >    luaJIT_profile_stop(L);
> >  #endif
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> [1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89
> 
> -- 
> Best regards,
> IM

Also I'll rename STREAM_ERR_* -> STREAM_ERR* for consistency.
I'll send the iterative patch for wbuf module soon.

See you soon in v3 :)

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list