From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 59A8F4765E3 for ; Sun, 27 Dec 2020 19:44:44 +0300 (MSK) Date: Sun, 27 Dec 2020 19:44:38 +0300 From: Igor Munkin Message-ID: <20201227164437.GK5396@tarantool.org> References: <036ac4034b714d9c3338246fd1bb3b373ef94b64.1608907726.git.skaplun@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <036ac4034b714d9c3338246fd1bb3b373ef94b64.1608907726.git.skaplun@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH luajit v2 5/7] core: introduce memory profiler List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Sergey, Thanks for the patch! Please consider the comments below. On 25.12.20, Sergey Kaplun wrote: > This patch introduces memory profiler for Lua machine. > > First of all profiler dumps the definitions of all loaded Lua functions > (symtab) via the write buffer introduced in one of the previous patches. > > Profiler replaces the old allocation function with the instrumented one > after symtab is dumped. This new function reports all allocations, > reallocations or deallocations events via the write buffer during > profiling. Subsequent content depends on the function's type (LFUNC, > FFUNC or CFUNC). > > To divide all traces into the one vmstate when being profiled, a special > macro LJ_VMST_TRACE equal to LJ_VMST__MAX is introduced. > > When profiling is over, a special epilogue event header is written and > the old allocation function is restored back. > > This change also makes debug_frameline function LuaJIT-wide visible to > be used in the memory profiler. > > For more information, see . > > Part of tarantool/tarantool#5442 > --- > > Changes in v2: > - Merged with debug-to-public commit and symtab. > - Drop [T]imer bit description. > > src/Makefile | 8 +- > src/Makefile.dep | 31 ++-- > src/lj_arch.h | 22 +++ > src/lj_debug.c | 8 +- > src/lj_debug.h | 3 + > src/lj_memprof.c | 430 +++++++++++++++++++++++++++++++++++++++++++++++ > src/lj_memprof.h | 165 ++++++++++++++++++ > src/lj_obj.h | 8 + > src/lj_state.c | 8 + > src/ljamalg.c | 1 + > 10 files changed, 665 insertions(+), 19 deletions(-) > create mode 100644 src/lj_memprof.c > create mode 100644 src/lj_memprof.h > > diff --git a/src/Makefile b/src/Makefile > index 384b590..3218dfd 100644 > --- a/src/Makefile > +++ b/src/Makefile > @@ -113,6 +113,12 @@ XCFLAGS= > # Enable GC64 mode for x64. > #XCFLAGS+= -DLUAJIT_ENABLE_GC64 > # > +# Disable the memory profiler. > +#XCFLAGS+= -DLUAJIT_DISABLE_MEMPROF > +# > +# Disable the thread safe profiler. > +#XCFLAGS+= -DLUAJIT_DISABLE_THREAD_SAFE Well, I personally see little sense in this flag *now*. AFAICS it can only occur in multithread host application running a separate VM per thread. But this not quite well implemented even in profiler (see Mike's comment here[1]), so you can either disable it for now (since this is only MVP) or make it the same way Mike did: lock a mutex if g> is not NULL; otherwise simply initialize the profiler. By the way, there is no mutex used for POSIX targets in , but I don't know why. > +# > ############################################################################## > > ############################################################################## > diff --git a/src/lj_arch.h b/src/lj_arch.h > index c8d7138..5967849 100644 > --- a/src/lj_arch.h > +++ b/src/lj_arch.h > +/* Disable or enable the memory profiler's thread safety. */ > +#if defined(LUAJIT_DISABLE_THREAD_SAFE) || LJ_TARGET_WINDOWS || LJ_TARGET_XBOX360 > +#define LJ_THREAD_SAFE 0 > +#else > +#define LJ_THREAD_SAFE 1 Typo: s/LJ_THREAD_SAFE/LJ_IS_THREAD_SAFE/. Otherwise thread safety doesn't work at all. > +#endif > + > #endif > diff --git a/src/lj_memprof.c b/src/lj_memprof.c > new file mode 100644 > index 0000000..e0df057 > --- /dev/null > +++ b/src/lj_memprof.c > @@ -0,0 +1,430 @@ > +/* > +** Implementation of memory profiler. > +** > +** Major portions taken verbatim or adapted from the LuaVela. > +** Copyright (C) 2015-2019 IPONWEB Ltd. > +*/ > + > +#define lj_memprof_c > +#define LUA_CORE > + > +#include > + > +#include "lj_memprof.h" > +#include "lj_def.h" > +#include "lua.h" This include is excess. > + > +#include "lj_obj.h" > +#include "lj_frame.h" > +#include "lj_debug.h" > +#include "lj_gc.h" This include is excess. > +#include "lj_wbuf.h" > + > +/* --------------------------------- Symtab --------------------------------- */ > +static void symtab_write_prologue(struct lj_wbuf *out) > +{ > + const size_t len = sizeof(ljs_header) / sizeof(ljs_header[0]); > + lj_wbuf_addn(out, ljs_header, len); > +} Minor: Again, I guess this function semantics can be moved right to its caller, similar like you emit the "epilogue" (i.e. SYMTAB_FINAL byte). > + > +/* ---------------------------- Memory profiler ----------------------------- */ > +struct alloc { > + lua_Alloc allocf; /* Allocating function. */ > + void *state; /* Opaque allocator's state. */ > +}; Minor: This structure can be used in to store the default allocator. Feel free to ignore. > + > +static void memprof_write_lfunc(struct lj_wbuf *out, uint8_t header, > + GCfunc *fn, struct lua_State *L, > + cTValue *nextframe) > +{ > + const BCLine line = lj_debug_frameline(L, fn, nextframe); > + lj_wbuf_addbyte(out, header | ASOURCE_LFUNC); > + lj_wbuf_addu64(out, (uintptr_t)funcproto(fn)); > + lj_wbuf_addu64(out, line >= 0 ? (uintptr_t)line : 0); As we discussed offline, I have two notes for this line: * When the value can be negative? * When the value can be zero? Furthermore, I have no idea, why is casted to . > +} > +static void memprof_write_func(struct memprof *mp, uint8_t header) > +{ > + struct lj_wbuf *out = &mp->out; > + lua_State *L = gco2th(gcref(mp->g->mem_L)); > + cTValue *frame = L->base - 1; > + GCfunc *fn; > + > + fn = frame_func(frame); Minor: Why does this line differ from those above? > + > +static void memprof_write_hvmstate(struct memprof *mp, uint8_t header) > +{ > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT); > +} > + > +/* > +** XXX: In ideal world, we should report allocations from traces as well. > +** But since traces must follow the semantics of the original code, behaviour of > +** Lua and JITted code must match 1:1 in terms of allocations, which makes > +** using memprof with enabled JIT virtually redundant. Hence the stub below. > +*/ I guess you can drop the function below (since it simply duplicates the INTERNAL allocation semantics), *but* the comment above is nice, so you can move it to the default function or to the corresponding item in set. > +static void memprof_write_trace(struct memprof *mp, uint8_t header) > +{ > + lj_wbuf_addbyte(&mp->out, header | ASOURCE_INT); > +} > + > +typedef void (*memprof_writer)(struct memprof *mp, uint8_t header); Minor: I guess this is not
but . > + > +static void memprof_write_caller(struct memprof *mp, uint8_t aevent) > +{ > + const global_State *g = mp->g; > + const uint32_t _vmstate = (uint32_t)~g->vmstate; > + const uint32_t vmstate = _vmstate < LJ_VMST_TRACE ? _vmstate : LJ_VMST_TRACE; > + const uint8_t header = aevent; General Genius :) > + > + memprof_writers[vmstate](mp, header); > +} > +static void *memprof_allocf(void *ud, void *ptr, size_t osize, size_t nsize) > +{ > + struct memprof *mp = &memprof; This should be "const", IMHO. > + struct alloc *oalloc = &mp->orig_alloc; > +} > + > +static void memprof_write_prologue(struct lj_wbuf *out) > +{ > + const size_t len = sizeof(ljm_header) / sizeof(ljm_header[0]); > + lj_wbuf_addn(out, ljm_header, len); > +} See comments for . > + > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt) Side note: it's better to wrap this function to move all mutex-related work in this wrapper. > +{ > + lua_assert(opt->writer != NULL && opt->on_stop != NULL); > + lua_assert(opt->buf != NULL && opt->len != 0); Do these asserts depend on each other? If no, please split them into the separate ones. > + > + if (LJ_UNLIKELY(lj_wbuf_test_flag(&mp->out, STREAM_ERR_IO) || > + lj_wbuf_test_flag(&mp->out, STREAM_STOP))) { You can test (STREAM_ERR_IO|STREAM_STOP) here. > + /* on_stop call may change errno value. */ > +} > + > +static int memprof_stop(const struct lua_State *L) > +{ > + if (L != NULL && mp->g != G(L)) { This is a nice check (but looks redundant for the current version). Why did you make it optional (if L is given)? > + memprof_unlock(); > + return PROFILE_ERR; > + } > + main_L = mainthread(mp->g); Why do you use main coroutine here instead of the given one? > + > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_STOP))) { > + lua_assert(lj_wbuf_test_flag(out, STREAM_ERR_IO)); > + mp->state = MPS_HALT; > + /* on_stop call may change errno value. */ > + mp->saved_errno = lj_wbuf_errno(out); > + /* Ignore possible errors. mp->opt.buf == NULL here. */ > + mp->opt.on_stop(mp->opt.ctx, mp->opt.buf); > + lj_wbuf_terminate(out); > + memprof_unlock(); > + return PROFILE_ERRIO; > + } > + cb_status = mp->opt.on_stop(mp->opt.ctx, mp->opt.buf); > + if (LJ_UNLIKELY(lj_wbuf_test_flag(out, STREAM_ERR_IO) || cb_status != 0)) { > + saved_errno = lj_wbuf_errno(out); > + return_status = PROFILE_ERRIO; Well, you introduce a separate variable for the status. For what? I think this branch is the same as the "stop" one above. So either use goto to the error branch or duplicate the logic from the "stop" branch. In general, this looks like a mess: I almost lost the mutex unlock and the buffer termination. > + } > + > + lj_wbuf_terminate(out); > + > + memprof_unlock(); > + errno = saved_errno; > + return return_status; > +} > + > +int lj_memprof_stop(void) Why do you omit the coroutine argument for this function? > +{ > +int lj_memprof_stop_vm(const struct lua_State *L) This function is not used anywhere. Please drop it. > +{ > +int lj_memprof_is_running(void) This function is used only on global state finalization. However, it is excess (see the comment below), so I believe it can be dropped. > +{ > diff --git a/src/lj_memprof.h b/src/lj_memprof.h > new file mode 100644 > index 0000000..a96b72f > --- /dev/null > +++ b/src/lj_memprof.h > @@ -0,0 +1,165 @@ > +#include > +#include lj_def.h is definitely enough here. > + > +#define SYMTAB_FFUNC ((uint8_t)1) > +#define SYMTAB_CFUNC ((uint8_t)2) > +#define SYMTAB_TRACE ((uint8_t)3) These defines are unused. > +#define SYMTAB_FINAL ((uint8_t)0x80) > +/* Profiler public API. */ > +#define PROFILE_SUCCESS 0 > +#define PROFILE_ERR 1 Minor: Considering the usage looks better to me. Feel free to ignore. > +#define PROFILE_ERRRUN 2 > +#define PROFILE_ERRMEM 3 > +#define PROFILE_ERRIO 4 > + > +/* Profiler options. */ > +struct lua_Prof_options { Typo: s/lua_Prof/lj_memprof/ since is used for exported members. > + /* Context for the profile writer and final callback. */ > + void *ctx; > + /* Custom buffer to write data. */ > + uint8_t *buf; > + /* The buffer's size. */ > + size_t len; > + /* > + ** Writer function for profile events. > + ** Should return amount of written bytes on success or zero in case of error. > + ** Setting *data to NULL means end of profiling. > + */ Why don't you use typedef below? So you can reference for the writer contract. > + size_t (*writer)(const void **data, size_t len, void *ctx); > + /* > + ** Callback on profiler stopping. Required for correctly cleaning > + ** at vm shoutdown when profiler still running. Typo: s/VM shoutdown/VM finalization/. Typo: s/profiled still running/profiler is still running/. > + ** Returns zero on success. > + */ > + int (*on_stop)(void *ctx, uint8_t *buf); > +}; > + > +/* Avoid to provide additional interfaces described in other headers. */ It looks like cargo cult, IMHO. What is the reason? > +struct lua_State; > + > +/* > +** Starts profiling. Returns LUAM_PROFILE_SUCCESS on success and one of > +** LUAM_PROFILE_ERR* codes otherwise. Destructor is called in case of > +** LUAM_PROFILE_ERRIO. Typo: s/LUAM_PROFILE_*/PROFILE_*/g. > +*/ > +int lj_memprof_start(struct lua_State *L, const struct lua_Prof_options *opt); > + > +/* > +** Stops profiling. Returns LUAM_PROFILE_SUCCESS on success and one of > +** LUAM_PROFILE_ERR* codes otherwise. If writer() function returns zero > +** on call at buffer flush, profiled stream stops, or on_stop() callback > +** returns non-zero value, returns LUAM_PROFILE_ERRIO. > +*/ > diff --git a/src/lj_obj.h b/src/lj_obj.h > index c94617d..c94b0bb 100644 > --- a/src/lj_obj.h > +++ b/src/lj_obj.h > @@ -523,6 +523,14 @@ enum { > LJ_VMST__MAX > }; > > +/* > +** PROFILER HACK: VM is inside a trace. This is a pseudo-state used by profiler. > +** In fact, when VM executes a trace, vmstate is set to the trace number, but > +** we aggregate all such cases into one VM state during per-VM state profiling. > +*/ Strictly saying, this is not a "profiler hack", but rather LuaJIT-wide one. If is less than LJ_VMST__MAX it is considered as a trace number and all LuaJIT universe works with assumtions the trace is being run. It looks natural to me to move this change to the previous patch related to VM states slivering into *FUNC set to close this story there. > + > +#define LJ_VMST_TRACE (LJ_VMST__MAX) > + > #define setvmstate(g, st) ((g)->vmstate = ~LJ_VMST_##st) > > /* Metamethods. ORDER MM */ > diff --git a/src/lj_state.c b/src/lj_state.c > index 1d9c628..6c46e3d 100644 > --- a/src/lj_state.c > +++ b/src/lj_state.c > @@ -243,6 +247,10 @@ LUA_API void lua_close(lua_State *L) > global_State *g = G(L); > int i; > L = mainthread(g); /* Only the main thread can be closed. */ > +#if LJ_HASMEMPROF > + if (lj_memprof_is_running()) This check is excess, since you don't check the return value below. If is called when profiler doesn't work, PROFILE_ERRRUN or PROFILE_ERRIO is yield. > + lj_memprof_stop(); > +#endif > #if LJ_HASPROFILE > luaJIT_profile_stop(L); > #endif > -- > 2.28.0 > [1]: https://github.com/tarantool/luajit/blob/tarantool/src/lj_profile.c#L84L89 -- Best regards, IM