From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 87B8A4765E3 for ; Mon, 28 Dec 2020 00:48:01 +0300 (MSK) Date: Mon, 28 Dec 2020 00:47:15 +0300 From: Sergey Kaplun Message-ID: <20201227214715.GE14702@root> References: <036ac4034b714d9c3338246fd1bb3b373ef94b64.1608907726.git.skaplun@tarantool.org> <20201227164437.GK5396@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201227164437.GK5396@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org Hi, Igor! Thanks for the review! On 27.12.20, Igor Munkin wrote: > Sergey, > > Thanks for the patch! Please consider the comments below. > > On 25.12.20, Sergey Kaplun wrote: > > This patch introduces memory profiler for Lua machine. > > > > First of all profiler dumps the definitions of all loaded Lua functions > > (symtab) via the write buffer introduced in one of the previous patches. > > > > Profiler replaces the old allocation function with the instrumented one > > after symtab is dumped. This new function reports all allocations, > > reallocations or deallocations events via the write buffer during > > profiling. Subsequent content depends on the function's type (LFUNC, > > FFUNC or CFUNC). > > > > To divide all traces into the one vmstate when being profiled, a special > > macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced. > > > > When profiling is over, a special epilogue event header is written and > > the old allocation function is restored back. > > > > This change also makes debug_frameline function LuaJIT-wide visible to > > be used in the memory profiler. > > > > For more information, see . > > > > Part of tarantool/tarantool#5442 > > --- > > > > Changes in v2: > > - Merged with debug-to-public commit and symtab. > > - Drop [T]imer bit description. > > > > src/Makefile | 8 +- > > src/Makefile.dep | 31 ++-- > > src/lj_arch.h | 22 +++ > > src/lj_debug.c | 8 +- > > src/lj_debug.h | 3 + > > src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++ > > src/lj_memprof.h | 165 ++++++++++++++++++ > > src/lj_obj.h | 8 + > > src/lj_state.c | 8 + > > src/ljamalg.c | 1 + > > 10 files changed, 665 insertions(+), 19 deletions(-) > > create mode 100644 src/lj_memprof.c > > create mode 100644 src/lj_memprof.h > > > > diff --git a/src/Makefile b/src/Makefile > > index 384b590..3218dfd 100644 > > --- a/src/Makefile > > +++ b/src/Makefile > > @@ -113,6 +113,12 @@ XCFLAGS= > > # Enable GC64 mode for x64. > > #XCFLAGS+= -DLUAJIT_ENABLE_GC64 > > # > > +# Disable the memory profiler. > > +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF > > +# > > +# Disable the thread safe profiler. > > +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE > > Well, I personally see little sense in this flag *now*. AFAICS it can > only occur in multithread host application running a separate VM per > thread. But this not quite well implemented even in profiler > (see Mike's comment here[1]), so you can either disable it for now > (since this is only MVP) or make it the same way Mike did: lock a mutex > if g> is not NULL; otherwise simply initialize the profiler. > By the way, there is no mutex used for POSIX targets in , but I > don't know why. OK, I suppose the best solution for now is to drop all related changes about it. I'll add corresponding notice in the RFC and header file. > > > +# > > ############################################################################## > > > > ############################################################################## > > > > > diff --git a/src/lj_arch.h b/src/lj_arch.h > > index c8d7138..5967849 100644 > > --- a/src/lj_arch.h > > +++ b/src/lj_arch.h > > > > > +/* Disable or enable the memory profiler's thread safety. */ > > +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360 > > +#define LJ_THREAD_SAFE 0 > > +#else > > +#define LJ_THREAD_SAFE 1 > > Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/. > > Otherwise thread safety doesn't work at all. Dropped for now. > > > +#endif > > + > > #endif > > > > > diff --git a/src/lj_memprof.c b/src/lj_memprof.c > > new file mode 100644 > > index 0000000..e0df057 > > --- /dev/null > > +++ b/src/lj_memprof.c > > @@ -0,0 +1,430 @@ > > +/* > > +** Implementation of memory profiler. > > +** > > +** Major portions taken verbatim or adapted from the LuaVela. > > +** Copyright (C) 2015-2019 IPONWEB Ltd. > > +*/ > > + > > +#define lj_memprof_c > > +#define LUA_CORE > > + > > +#include > > + > > +#include "lj_memprof.h" > > +#include "lj_def.h" > > > > > +#include "lua.h" > > This include is excess. Removed. > > > + > > +#include "lj_obj.h" > > +#include "lj_frame.h" > > +#include "lj_debug.h" > > +#include "lj_gc.h" > > This include is excess. Removed. And from Makefile.dep too. > > > +#include "lj_wbuf.h" > > + > > +/* --------------------------------- Symtab --------------------------------- */ > > > > > +static void symtab_write_prologue(struct lj_wbuf *out) > > +{ > > + const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]); > > + lj_wbuf_addn(out, ljs_header, len); > > +} > > Minor: Again, I guess this function semantics can be moved right to its > caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte). Moved. > > > + > > > > > +/* ---------------------------- Memory profiler ----------------------------- */ > > > > > +struct alloc { > > + lua_Alloc allocf; /* Allocating function. */ > > + void *state; /* Opaque allocator's state. */ > > +}; > > Minor: This structure can be used in to store the default > allocator. Feel free to ignore. Agree. But I suggest to do it with refactoring later. It'll cause a lot of unnecessary code changes now. > > > + > > > > > +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header, > > + GCfunc *fn, struct lua_State *L, > > + cTValue *nextframe) > > +{ > > + const BCLine line = lj_debug_frameline(L, fn, nextframe); > > + lj_wbuf_addbyte(out, header | ASOURCE_LFUNC); > > + lj_wbuf_addu64(out, (uintptr_t)funcproto(fn)); > > + lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0); > > As we discussed offline, I have two notes for this line: > * When the value can be negative? Not for Lua function, AFAICS. I rewrote this part to the following: | const BCLine line = lj_debug_frameline(L, fn, nextframe); | /* | ** Line is always >= 0 if we are inside a Lua function. | ** Equals to zero when LuaJIT is built with the | ** -DLUAJIT_DISABLE_DEBUGINFO flag. | */ | lua_assert(line >= 0); | lj_wbuf_addbyte(out, aevent | ASOURCE_LFUNC); | lj_wbuf_addu64(out, (uintptr_t)funcproto(fn)); | lj_wbuf_addu64(out, (uint64_t)line); > * When the value can be zero? When LuaJIT is built with `-DLUAJIT_DISABLE_DEBUGINFO` - exist only in . Anyway, it's insignificant after the changes above. > > Furthermore, I have no idea, why is casted to . Typo. Fixed. > > > +} > > > > > +static void memprof_write_func(struct memprof *mp, uint8_t header) > > +{ > > + struct lj_wbuf *out = &mp->out; > > + lua_State *L = gco2th(gcref(mp->g->mem_L)); > > + cTValue *frame = L->base - 1; > > + GCfunc *fn; > > + > > + fn = frame_func(frame); > > Minor: Why does this line differ from those above? Fixed. > > > + > > > > > +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header) > > +{ > > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT); > > +} > > + > > +/* > > +** XXX: In ideal world, we should report allocations from traces as well. > > +** But since traces must follow the semantics of the original code, behaviour of > > +** Lua and JITted code must match 1:1 in terms of allocations, which makes > > +** using memprof with enabled JIT virtually redundant. Hence the stub below. > > +*/ > > I guess you can drop the function below (since it simply duplicates the > INTERNAL allocation semantics), *but* the comment above is nice, so you > can move it to the default function or to the corresponding item in > set. Done. > > > +static void memprof_write_trace(struct memprof *mp, uint8_t header) > > +{ > > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT); > > +} > > + > > +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header); > > Minor: I guess this is not
but . Fixed here and above. > > > + > > > > > +static void memprof_write_caller(struct memprof *mp, uint8_t aevent) > > +{ > > + const global_State *g = mp->g; > > + const uint32_t _vmstate = (uint32_t)~g->vmstate; > > + const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE; > > + const uint8_t header = aevent; > > General Genius :) Fixed. > > > + > > + memprof_writers[vmstate](mp, header); > > +} > > > > > +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize) > > +{ > > + struct memprof *mp = &memprof; > > This should be "const", IMHO. Not here (we write to wbuf and it is nested into memprof structure) but for oalloc below, fixed. > > > + struct alloc *oalloc = &mp->orig_alloc; > > > > > +} > > + > > +static void memprof_write_prologue(struct lj_wbuf *out) > > +{ > > + const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]); > > + lj_wbuf_addn(out, ljm_header, len); > > +} > > See comments for . Fixed. > > > + > > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt) > > Side note: it's better to wrap this function to move all mutex-related > work in this wrapper. I'll drop mutex related work for now. > > > +{ > > > > > + lua_assert(opt->writer != NULL && opt->on_stop != NULL); > > + lua_assert(opt->buf != NULL && opt->len != 0); > > Do these asserts depend on each other? If no, please split them into the > separate ones. Done. > > > + > > > > > + if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) || > > + lj_wbuf_test_flag(&mp->out, STREAM_STOP))) { > > You can test (STREAM_ERR_IO|STREAM_STOP) here. Yes, it's pretier! Thank you! > > > + /* on_stop call may change errno value. */ > > > > > +} > > + > > +static int memprof_stop(const struct lua_State *L) > > +{ > > > > > + if (L != NULL && mp->g != G(L)) { > > This is a nice check (but looks redundant for the current version). Why > did you make it optional (if L is given)? Dropped L-part (considering your comments below). > > > + memprof_unlock(); > > + return PROFILE_ERR; > > + } > > > > > + main_L = mainthread(mp->g); > > Why do you use main coroutine here instead of the given one? See no reason for that now... Dropped. > > > + > > > > > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) { > > + lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO)); > > + mp->state = MPS_HALT; > > + /* on_stop call may change errno value. */ > > + mp->saved_errno = lj_wbuf_errno(out); > > + /* Ignore possible errors. mp->opt.buf == NULL here. */ > > + mp->opt.on_stop(mp->opt.ctx, mp->opt.buf); > > + lj_wbuf_terminate(out); > > + memprof_unlock(); > > + return PROFILE_ERRIO; > > + } > > > > > + cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf); > > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) { > > + saved_errno = lj_wbuf_errno(out); > > + return_status = PROFILE_ERRIO; > > Well, you introduce a separate variable for the status. For what? I > think this branch is the same as the "stop" one above. So either use > goto to the error branch or duplicate the logic from the "stop" branch. > In general, this looks like a mess: I almost lost the mutex unlock and > the buffer termination. Yes, looks unclear. I'll rewrite it in the new version. > > > + } > > + > > + lj_wbuf_terminate(out); > > + > > + memprof_unlock(); > > + errno = saved_errno; > > + return return_status; > > +} > > + > > +int lj_memprof_stop(void) > > Why do you omit the coroutine argument for this function? > > > +{ > > > > > +int lj_memprof_stop_vm(const struct lua_State *L) > > This function is not used anywhere. Please drop it. About this and function above. I'll provide 1 interface with the coroutine argument. > > > +{ > > > > > +int lj_memprof_is_running(void) > > This function is used only on global state finalization. However, it is > excess (see the comment below), so I believe it can be dropped. Done. > > > +{ > > > > > diff --git a/src/lj_memprof.h b/src/lj_memprof.h > > new file mode 100644 > > index 0000000..a96b72f > > --- /dev/null > > +++ b/src/lj_memprof.h > > @@ -0,0 +1,165 @@ > > > > > +#include > > +#include > > lj_def.h is definitely enough here. Yep. > > > + > > > > > +#define SYMTAB_FFUNC ((uint8_t)1) > > +#define SYMTAB_CFUNC ((uint8_t)2) > > +#define SYMTAB_TRACE ((uint8_t)3) > > These defines are unused. Removed. > > > +#define SYMTAB_FINAL ((uint8_t)0x80) > > > > > +/* Profiler public API. */ > > +#define PROFILE_SUCCESS 0 > > +#define PROFILE_ERR 1 > > Minor: Considering the usage looks better to me. Feel > free to ignore. Hmm, indead. Applied. > > > +#define PROFILE_ERRRUN 2 > > +#define PROFILE_ERRMEM 3 > > +#define PROFILE_ERRIO 4 > > + > > +/* Profiler options. */ > > +struct lua_Prof_options { > > Typo: s/lua_Prof/lj_memprof/ since is used for exported members. Fixed. > > > + /* Context for the profile writer and final callback. */ > > + void *ctx; > > + /* Custom buffer to write data. */ > > + uint8_t *buf; > > + /* The buffer's size. */ > > + size_t len; > > + /* > > + ** Writer function for profile events. > > + ** Should return amount of written bytes on success or zero in case of error. > > + ** Setting *data to NULL means end of profiling. > > + */ > > Why don't you use typedef below? So you can reference > for the writer contract. Updated. > > > + size_t (*writer)(const void **data, size_t len, void *ctx); > > + /* > > + ** Callback on profiler stopping. Required for correctly cleaning > > + ** at vm shoutdown when profiler still running. > > Typo: s/VM shoutdown/VM finalization/. > Typo: s/profiled still running/profiler is still running/. Thanks! Fixed. > > > + ** Returns zero on success. > > + */ > > + int (*on_stop)(void *ctx, uint8_t *buf); > > +}; > > + > > +/* Avoid to provide additional interfaces described in other headers. */ > > It looks like cargo cult, IMHO. What is the reason? You don't add extra information in object file and it's become smaller, for example :). Avoiding namespace polution is a small benefit. > > > +struct lua_State; > > + > > +/* > > +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of > > +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of > > +** LUAM_PROFILE_ERRIO. > > Typo: s/LUAM_PROFILE_*/PROFILE_*/g. Fixed! Thanks! > > > +*/ > > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt); > > + > > +/* > > +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of > > +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero > > +** on call at buffer flush, profiled stream stops, or on_stop() callback > > +** returns non-zero value, returns LUAM_PROFILE_ERRIO. > > +*/ > > > > > diff --git a/src/lj_obj.h b/src/lj_obj.h > > index c94617d..c94b0bb 100644 > > --- a/src/lj_obj.h > > +++ b/src/lj_obj.h > > @@ -523,6 +523,14 @@ enum { > > LJ_VMST__MAX > > }; > > > > +/* > > +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler. > > +** In fact, when VM executes a trace, vmstate is set to the trace number, but > > +** we aggregate all such cases into one VM state during per-VM state profiling. > > +*/ > > Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide > one. If is less than LJ_VMST__MAX it is considered as a trace > number and all LuaJIT universe works with assumtions the trace is being > run. It looks natural to me to move this change to the previous patch > related to VM states slivering into *FUNC set to close this story there. Done. > > > + > > +#define LJ_VMST_TRACE (LJ_VMST__MAX) > > + > > #define setvmstate(g, st) ((g)->vmstate = ~LJ_VMST_##st) > > > > /* Metamethods. ORDER MM */ > > diff --git a/src/lj_state.c b/src/lj_state.c > > index 1d9c628..6c46e3d 100644 > > --- a/src/lj_state.c > > +++ b/src/lj_state.c > > > > > @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L) > > global_State *g = G(L); > > int i; > > L = mainthread(g); /* Only the main thread can be closed. */ > > +#if LJ_HASMEMPROF > > + if (lj_memprof_is_running()) > > This check is excess, since you don't check the return value below. If > is called when profiler doesn't work, PROFILE_ERRRUN > or PROFILE_ERRIO is yield. Dropped check. > > > + lj_memprof_stop(); > > +#endif > > #if LJ_HASPROFILE > > luaJIT_profile_stop(L); > > #endif > > > > > -- > > 2.28.0 > > > > [1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89 > > -- > Best regards, > IM Also I'll rename STREAM_ERR_* -> STREAM_ERR* for consistency. I'll send the iterative patch for wbuf module soon. See you soon in v3 :) -- Best regards, Sergey Kaplun