[Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler
Sergey Kaplun
skaplun at tarantool.org
Mon Dec 28 00:47:15 MSK 2020
Hi, Igor!
Thanks for the review!
On 27.12.20, Igor Munkin wrote:
> Sergey,
>
> Thanks for the patch! Please consider the comments below.
>
> On 25.12.20, Sergey Kaplun wrote:
> > This patch introduces memory profiler for Lua machine.
> >
> > First of all profiler dumps the definitions of all loaded Lua functions
> > (symtab) via the write buffer introduced in one of the previous patches.
> >
> > Profiler replaces the old allocation function with the instrumented one
> > after symtab is dumped. This new function reports all allocations,
> > reallocations or deallocations events via the write buffer during
> > profiling. Subsequent content depends on the function's type (LFUNC,
> > FFUNC or CFUNC).
> >
> > To divide all traces into the one vmstate when being profiled, a special
> > macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced.
> >
> > When profiling is over, a special epilogue event header is written and
> > the old allocation function is restored back.
> >
> > This change also makes debug_frameline function LuaJIT-wide visible to
> > be used in the memory profiler.
> >
> > For more information, see <lj_memprof.h>.
> >
> > Part of tarantool/tarantool#5442
> > ---
> >
> > Changes in v2:
> > - Merged with debug-to-public commit and symtab.
> > - Drop [T]imer bit description.
> >
> > src/Makefile | 8 +-
> > src/Makefile.dep | 31 ++--
> > src/lj_arch.h | 22 +++
> > src/lj_debug.c | 8 +-
> > src/lj_debug.h | 3 +
> > src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++
> > src/lj_memprof.h | 165 ++++++++++++++++++
> > src/lj_obj.h | 8 +
> > src/lj_state.c | 8 +
> > src/ljamalg.c | 1 +
> > 10 files changed, 665 insertions(+), 19 deletions(-)
> > create mode 100644 src/lj_memprof.c
> > create mode 100644 src/lj_memprof.h
> >
> > diff --git a/src/Makefile b/src/Makefile
> > index 384b590..3218dfd 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -113,6 +113,12 @@ XCFLAGS=
> > # Enable GC64 mode for x64.
> > #XCFLAGS+= -DLUAJIT_ENABLE_GC64
> > #
> > +# Disable the memory profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF
> > +#
> > +# Disable the thread safe profiler.
> > +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE
>
> Well, I personally see little sense in this flag *now*. AFAICS it can
> only occur in multithread host application running a separate VM per
> thread. But this not quite well implemented even in <jit.p> profiler
> (see Mike's comment here[1]), so you can either disable it for now
> (since this is only MVP) or make it the same way Mike did: lock a mutex
> if <memprof->g> is not NULL; otherwise simply initialize the profiler.
> By the way, there is no mutex used for POSIX targets in <jit.p>, but I
> don't know why.
OK, I suppose the best solution for now is to drop all related changes
about it. I'll add corresponding notice in the RFC and header file.
>
> > +#
> > ##############################################################################
> >
> > ##############################################################################
>
> <snipped>
>
> > diff --git a/src/lj_arch.h b/src/lj_arch.h
> > index c8d7138..5967849 100644
> > --- a/src/lj_arch.h
> > +++ b/src/lj_arch.h
>
> <snipped>
>
> > +/* Disable or enable the memory profiler's thread safety. */
> > +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360
> > +#define LJ_THREAD_SAFE 0
> > +#else
> > +#define LJ_THREAD_SAFE 1
>
> Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/.
>
> Otherwise thread safety doesn't work at all.
Dropped for now.
>
> > +#endif
> > +
> > #endif
>
> <snipped>
>
> > diff --git a/src/lj_memprof.c b/src/lj_memprof.c
> > new file mode 100644
> > index 0000000..e0df057
> > --- /dev/null
> > +++ b/src/lj_memprof.c
> > @@ -0,0 +1,430 @@
> > +/*
> > +** Implementation of memory profiler.
> > +**
> > +** Major portions taken verbatim or adapted from the LuaVela.
> > +** Copyright (C) 2015-2019 IPONWEB Ltd.
> > +*/
> > +
> > +#define lj_memprof_c
> > +#define LUA_CORE
> > +
> > +#include <errno.h>
> > +
> > +#include "lj_memprof.h"
> > +#include "lj_def.h"
>
> <snipped>
>
> > +#include "lua.h"
>
> This include is excess.
Removed.
>
> > +
> > +#include "lj_obj.h"
> > +#include "lj_frame.h"
> > +#include "lj_debug.h"
> > +#include "lj_gc.h"
>
> This include is excess.
Removed. And from Makefile.dep too.
>
> > +#include "lj_wbuf.h"
> > +
> > +/* --------------------------------- Symtab --------------------------------- */
>
> <snipped>
>
> > +static void symtab_write_prologue(struct lj_wbuf *out)
> > +{
> > + const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]);
> > + lj_wbuf_addn(out, ljs_header, len);
> > +}
>
> Minor: Again, I guess this function semantics can be moved right to its
> caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte).
Moved.
>
> > +
>
> <snipped>
>
> > +/* ---------------------------- Memory profiler ----------------------------- */
>
> <snipped>
>
> > +struct alloc {
> > + lua_Alloc allocf; /* Allocating function. */
> > + void *state; /* Opaque allocator's state. */
> > +};
>
> Minor: This structure can be used in <lj_obj.h> to store the default
> allocator. Feel free to ignore.
Agree. But I suggest to do it with refactoring later. It'll cause
a lot of unnecessary code changes now.
>
> > +
>
> <snipped>
>
> > +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header,
> > + GCfunc *fn, struct lua_State *L,
> > + cTValue *nextframe)
> > +{
> > + const BCLine line = lj_debug_frameline(L, fn, nextframe);
> > + lj_wbuf_addbyte(out, header | ASOURCE_LFUNC);
> > + lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
> > + lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0);
>
> As we discussed offline, I have two notes for this line:
> * When the <line> value can be negative?
Not for Lua function, AFAICS. I rewrote this part to the following:
| const BCLine line = lj_debug_frameline(L, fn, nextframe);
| /*
| ** Line is always >= 0 if we are inside a Lua function.
| ** Equals to zero when LuaJIT is built with the
| ** -DLUAJIT_DISABLE_DEBUGINFO flag.
| */
| lua_assert(line >= 0);
| lj_wbuf_addbyte(out, aevent | ASOURCE_LFUNC);
| lj_wbuf_addu64(out, (uintptr_t)funcproto(fn));
| lj_wbuf_addu64(out, (uint64_t)line);
> * When the <line> value can be zero?
When LuaJIT is built with `-DLUAJIT_DISABLE_DEBUGINFO` - exist only
in <src/lj_parse.c>. Anyway, it's insignificant after the changes
above.
>
> Furthermore, I have no idea, why <line> is casted to <uintptr_t>.
Typo. Fixed.
>
> > +}
>
> <snipped>
>
> > +static void memprof_write_func(struct memprof *mp, uint8_t header)
> > +{
> > + struct lj_wbuf *out = &mp->out;
> > + lua_State *L = gco2th(gcref(mp->g->mem_L));
> > + cTValue *frame = L->base - 1;
> > + GCfunc *fn;
> > +
> > + fn = frame_func(frame);
>
> Minor: Why does this line differ from those above?
Fixed.
>
> > +
>
> <snipped>
>
> > +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header)
> > +{
> > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +/*
> > +** XXX: In ideal world, we should report allocations from traces as well.
> > +** But since traces must follow the semantics of the original code, behaviour of
> > +** Lua and JITted code must match 1:1 in terms of allocations, which makes
> > +** using memprof with enabled JIT virtually redundant. Hence the stub below.
> > +*/
>
> I guess you can drop the function below (since it simply duplicates the
> INTERNAL allocation semantics), *but* the comment above is nice, so you
> can move it to the default function or to the corresponding item in
> <memprof_writers> set.
Done.
>
> > +static void memprof_write_trace(struct memprof *mp, uint8_t header)
> > +{
> > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT);
> > +}
> > +
> > +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header);
>
> Minor: I guess this is not <header> but <aevent>.
Fixed here and above.
>
> > +
>
> <snipped>
>
> > +static void memprof_write_caller(struct memprof *mp, uint8_t aevent)
> > +{
> > + const global_State *g = mp->g;
> > + const uint32_t _vmstate = (uint32_t)~g->vmstate;
> > + const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE;
> > + const uint8_t header = aevent;
>
> General Genius :)
Fixed.
>
> > +
> > + memprof_writers[vmstate](mp, header);
> > +}
>
> <snipped>
>
> > +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize)
> > +{
> > + struct memprof *mp = &memprof;
>
> This should be "const", IMHO.
Not here (we write to wbuf and it is nested into memprof structure) but
for oalloc below, fixed.
>
> > + struct alloc *oalloc = &mp->orig_alloc;
>
> <snipped>
>
> > +}
> > +
> > +static void memprof_write_prologue(struct lj_wbuf *out)
> > +{
> > + const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]);
> > + lj_wbuf_addn(out, ljm_header, len);
> > +}
>
> See comments for <symtab_write_prologue>.
Fixed.
>
> > +
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt)
>
> Side note: it's better to wrap this function to move all mutex-related
> work in this wrapper.
I'll drop mutex related work for now.
>
> > +{
>
> <snipped>
>
> > + lua_assert(opt->writer != NULL && opt->on_stop != NULL);
> > + lua_assert(opt->buf != NULL && opt->len != 0);
>
> Do these asserts depend on each other? If no, please split them into the
> separate ones.
Done.
>
> > +
>
> <snipped>
>
> > + if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) ||
> > + lj_wbuf_test_flag(&mp->out, STREAM_STOP))) {
>
> You can test (STREAM_ERR_IO|STREAM_STOP) here.
Yes, it's pretier! Thank you!
>
> > + /* on_stop call may change errno value. */
>
> <snipped>
>
> > +}
> > +
> > +static int memprof_stop(const struct lua_State *L)
> > +{
>
> <snipped>
>
> > + if (L != NULL && mp->g != G(L)) {
>
> This is a nice check (but looks redundant for the current version). Why
> did you make it optional (if L is given)?
Dropped L-part (considering your comments below).
>
> > + memprof_unlock();
> > + return PROFILE_ERR;
> > + }
>
> <snipped>
>
> > + main_L = mainthread(mp->g);
>
> Why do you use main coroutine here instead of the given one?
See no reason for that now... Dropped.
>
> > +
>
> <snipped>
>
> > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) {
> > + lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO));
> > + mp->state = MPS_HALT;
> > + /* on_stop call may change errno value. */
> > + mp->saved_errno = lj_wbuf_errno(out);
> > + /* Ignore possible errors. mp->opt.buf == NULL here. */
> > + mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > + lj_wbuf_terminate(out);
> > + memprof_unlock();
> > + return PROFILE_ERRIO;
> > + }
>
> <snipped>
>
> > + cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf);
> > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) {
> > + saved_errno = lj_wbuf_errno(out);
> > + return_status = PROFILE_ERRIO;
>
> Well, you introduce a separate variable for the status. For what? I
> think this branch is the same as the "stop" one above. So either use
> goto to the error branch or duplicate the logic from the "stop" branch.
> In general, this looks like a mess: I almost lost the mutex unlock and
> the buffer termination.
Yes, looks unclear. I'll rewrite it in the new version.
>
> > + }
> > +
> > + lj_wbuf_terminate(out);
> > +
> > + memprof_unlock();
> > + errno = saved_errno;
> > + return return_status;
> > +}
> > +
> > +int lj_memprof_stop(void)
>
> Why do you omit the coroutine argument for this function?
>
> > +{
>
> <snipped>
>
> > +int lj_memprof_stop_vm(const struct lua_State *L)
>
> This function is not used anywhere. Please drop it.
About this and function above.
I'll provide 1 interface with the coroutine argument.
>
> > +{
>
> <snipped>
>
> > +int lj_memprof_is_running(void)
>
> This function is used only on global state finalization. However, it is
> excess (see the comment below), so I believe it can be dropped.
Done.
>
> > +{
>
> <snipped>
>
> > diff --git a/src/lj_memprof.h b/src/lj_memprof.h
> > new file mode 100644
> > index 0000000..a96b72f
> > --- /dev/null
> > +++ b/src/lj_memprof.h
> > @@ -0,0 +1,165 @@
>
> <snipped>
>
> > +#include <stdint.h>
> > +#include <stddef.h>
>
> lj_def.h is definitely enough here.
Yep.
>
> > +
>
> <snipped>
>
> > +#define SYMTAB_FFUNC ((uint8_t)1)
> > +#define SYMTAB_CFUNC ((uint8_t)2)
> > +#define SYMTAB_TRACE ((uint8_t)3)
>
> These defines are unused.
Removed.
>
> > +#define SYMTAB_FINAL ((uint8_t)0x80)
>
> <snipped>
>
> > +/* Profiler public API. */
> > +#define PROFILE_SUCCESS 0
> > +#define PROFILE_ERR 1
>
> Minor: Considering the usage <PROFILE_ERRUSE> looks better to me. Feel
> free to ignore.
Hmm, indead. Applied.
>
> > +#define PROFILE_ERRRUN 2
> > +#define PROFILE_ERRMEM 3
> > +#define PROFILE_ERRIO 4
> > +
> > +/* Profiler options. */
> > +struct lua_Prof_options {
>
> Typo: s/lua_Prof/lj_memprof/ since <lua_*> is used for exported members.
Fixed.
>
> > + /* Context for the profile writer and final callback. */
> > + void *ctx;
> > + /* Custom buffer to write data. */
> > + uint8_t *buf;
> > + /* The buffer's size. */
> > + size_t len;
> > + /*
> > + ** Writer function for profile events.
> > + ** Should return amount of written bytes on success or zero in case of error.
> > + ** Setting *data to NULL means end of profiling.
> > + */
>
> Why don't you use <lj_wbuf_writer> typedef below? So you can reference
> <lj_wbuf.h> for the writer contract.
Updated.
>
> > + size_t (*writer)(const void **data, size_t len, void *ctx);
> > + /*
> > + ** Callback on profiler stopping. Required for correctly cleaning
> > + ** at vm shoutdown when profiler still running.
>
> Typo: s/VM shoutdown/VM finalization/.
> Typo: s/profiled still running/profiler is still running/.
Thanks! Fixed.
>
> > + ** Returns zero on success.
> > + */
> > + int (*on_stop)(void *ctx, uint8_t *buf);
> > +};
> > +
> > +/* Avoid to provide additional interfaces described in other headers. */
>
> It looks like cargo cult, IMHO. What is the reason?
You don't add extra information in object file and it's become
smaller, for example :). Avoiding namespace polution is a small
benefit.
>
> > +struct lua_State;
> > +
> > +/*
> > +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of
> > +** LUAM_PROFILE_ERRIO.
>
> Typo: s/LUAM_PROFILE_*/PROFILE_*/g.
Fixed! Thanks!
>
> > +*/
> > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt);
> > +
> > +/*
> > +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of
> > +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero
> > +** on call at buffer flush, profiled stream stops, or on_stop() callback
> > +** returns non-zero value, returns LUAM_PROFILE_ERRIO.
> > +*/
>
> <snipped>
>
> > diff --git a/src/lj_obj.h b/src/lj_obj.h
> > index c94617d..c94b0bb 100644
> > --- a/src/lj_obj.h
> > +++ b/src/lj_obj.h
> > @@ -523,6 +523,14 @@ enum {
> > LJ_VMST__MAX
> > };
> >
> > +/*
> > +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler.
> > +** In fact, when VM executes a trace, vmstate is set to the trace number, but
> > +** we aggregate all such cases into one VM state during per-VM state profiling.
> > +*/
>
> Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide
> one. If <vmstate> is less than LJ_VMST__MAX it is considered as a trace
> number and all LuaJIT universe works with assumtions the trace is being
> run. It looks natural to me to move this change to the previous patch
> related to VM states slivering into *FUNC set to close this story there.
Done.
>
> > +
> > +#define LJ_VMST_TRACE (LJ_VMST__MAX)
> > +
> > #define setvmstate(g, st) ((g)->vmstate = ~LJ_VMST_##st)
> >
> > /* Metamethods. ORDER MM */
> > diff --git a/src/lj_state.c b/src/lj_state.c
> > index 1d9c628..6c46e3d 100644
> > --- a/src/lj_state.c
> > +++ b/src/lj_state.c
>
> <snipped>
>
> > @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L)
> > global_State *g = G(L);
> > int i;
> > L = mainthread(g); /* Only the main thread can be closed. */
> > +#if LJ_HASMEMPROF
> > + if (lj_memprof_is_running())
>
> This check is excess, since you don't check the return value below. If
> <lj_memprof_stop> is called when profiler doesn't work, PROFILE_ERRRUN
> or PROFILE_ERRIO is yield.
Dropped check.
>
> > + lj_memprof_stop();
> > +#endif
> > #if LJ_HASPROFILE
> > luaJIT_profile_stop(L);
> > #endif
>
> <snipped>
>
> > --
> > 2.28.0
> >
>
> [1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89
>
> --
> Best regards,
> IM
Also I'll rename STREAM_ERR_* -> STREAM_ERR* for consistency.
I'll send the iterative patch for wbuf module soon.
See you soon in v3 :)
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list