* [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests @ 2023-11-28 14:53 Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches ` (3 more replies) 0 siblings, 4 replies; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-11-28 14:53 UTC (permalink / raw) To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches Hello there, The latter patch of the patchset provides the new approach for deterministic testing for our sampling profiler. See more info in the commit message. The first patch fixes the issue occurred while reimplementing the sysprof test via fork(3) + ptrace(2): the output for prove was buffered and hence duplicated when fork is done. I decided to turn of buffering at all, since there is little sense in it for tests and nobody wants to debug many (still unrevealed) related problems. I hope the approach will be moved to our utils for tests written in C, but I do not see the whole picture at the moment, so the approach is implemented for the only test being affected by the patch for #8594. Branch: https://github.com/tarantool/luajit/commits/imun/sysprof-ptrace-ffunc-test CI: https://github.com/tarantool/luajit/commit/b48b905 Tarantool CI: https://github.com/tarantool/tarantool/pull/9424 Related issues: * https://github.com/tarantool/tarantool/issues/9387 * https://github.com/tarantool/tarantool/issues/8594 * https://github.com/tarantool/tarantool/issues/7900 Igor Munkin (2): test: disable buffering for the C test engine test: rewrite sysprof test using managed execution .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ test/tarantool-c-tests/test.c | 6 + .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- 3 files changed, 275 insertions(+), 55 deletions(-) create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua -- 2.39.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine 2023-11-28 14:53 [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Igor Munkin via Tarantool-patches @ 2023-11-28 14:53 ` Igor Munkin via Tarantool-patches 2023-12-03 12:25 ` Maxim Kokryashkin via Tarantool-patches 2023-12-04 8:46 ` Sergey Kaplun via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution Igor Munkin via Tarantool-patches ` (2 subsequent siblings) 3 siblings, 2 replies; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-11-28 14:53 UTC (permalink / raw) To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches Our testing engine for the tests implemented in C introduced in scope of the commit a0483bd214f2bbc9d7e5fc95ebc0ae13e8d22bcc ("test: introduce module for C tests") lazily flushes the TAP-formatted report to the <stdout>. This might lead to a mess in a report e.g. in case when the particular case forks a child (that inherits its <stdout>) prior to TAP header is yield to <stdout>. Hence the patch disables buffering for <stdout> before any part of the TAP-formatted report is printed. Follows up tarantool/tarantool#7900 Signed-off-by: Igor Munkin <imun@tarantool.org> --- test/tarantool-c-tests/test.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/test/tarantool-c-tests/test.c b/test/tarantool-c-tests/test.c index 7907c12a..0a370cdd 100644 --- a/test/tarantool-c-tests/test.c +++ b/test/tarantool-c-tests/test.c @@ -228,6 +228,12 @@ static int test_run(const struct test_unit *test, size_t test_number, int _test_run_group(const char *group_name, const struct test_unit tests[], size_t n_tests, void *test_state) { + /* + * XXX: Disable buffering for stdout to not mess with the + * output in case there are forking tests in the group. + */ + setvbuf(stdout, NULL, _IONBF, 0); + test_start_comment(group_name); level++; -- 2.39.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches @ 2023-12-03 12:25 ` Maxim Kokryashkin via Tarantool-patches 2023-12-04 9:48 ` Igor Munkin via Tarantool-patches 2023-12-04 8:46 ` Sergey Kaplun via Tarantool-patches 1 sibling, 1 reply; 17+ messages in thread From: Maxim Kokryashkin via Tarantool-patches @ 2023-12-03 12:25 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi, Igor! Thanks for the patch! LGTM, except for a few typos in the commit message. On Tue, Nov 28, 2023 at 02:53:16PM +0000, Igor Munkin wrote: > Our testing engine for the tests implemented in C introduced in scope of > the commit a0483bd214f2bbc9d7e5fc95ebc0ae13e8d22bcc ("test: introduce > module for C tests") lazily flushes the TAP-formatted report to the > <stdout>. This might lead to a mess in a report e.g. in case when the Typo: s/e.g. in case/, e.g.,/ > particular case forks a child (that inherits its <stdout>) prior to TAP Typo: s/TAP/the TAP/ > header is yield to <stdout>. Hence the patch disables buffering for Typo: s/is yield/being yielded/ Typo: s/Hence/Hence,/ > <stdout> before any part of the TAP-formatted report is printed. > > Follows up tarantool/tarantool#7900 > > Signed-off-by: Igor Munkin <imun@tarantool.org> > --- > test/tarantool-c-tests/test.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/test/tarantool-c-tests/test.c b/test/tarantool-c-tests/test.c > index 7907c12a..0a370cdd 100644 > --- a/test/tarantool-c-tests/test.c > +++ b/test/tarantool-c-tests/test.c > @@ -228,6 +228,12 @@ static int test_run(const struct test_unit *test, size_t test_number, > int _test_run_group(const char *group_name, const struct test_unit tests[], > size_t n_tests, void *test_state) > { > + /* > + * XXX: Disable buffering for stdout to not mess with the > + * output in case there are forking tests in the group. > + */ > + setvbuf(stdout, NULL, _IONBF, 0); > + > test_start_comment(group_name); > > level++; > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine 2023-12-03 12:25 ` Maxim Kokryashkin via Tarantool-patches @ 2023-12-04 9:48 ` Igor Munkin via Tarantool-patches 0 siblings, 0 replies; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-12-04 9:48 UTC (permalink / raw) To: Maxim Kokryashkin; +Cc: tarantool-patches Max, Thanks for your review! On 03.12.23, Maxim Kokryashkin wrote: > Hi, Igor! > Thanks for the patch! > LGTM, except for a few typos in the commit message. > On Tue, Nov 28, 2023 at 02:53:16PM +0000, Igor Munkin wrote: > > Our testing engine for the tests implemented in C introduced in scope of > > the commit a0483bd214f2bbc9d7e5fc95ebc0ae13e8d22bcc ("test: introduce > > module for C tests") lazily flushes the TAP-formatted report to the > > <stdout>. This might lead to a mess in a report e.g. in case when the > Typo: s/e.g. in case/, e.g.,/ > > particular case forks a child (that inherits its <stdout>) prior to TAP > Typo: s/TAP/the TAP/ > > header is yield to <stdout>. Hence the patch disables buffering for > Typo: s/is yield/being yielded/ > Typo: s/Hence/Hence,/ > > <stdout> before any part of the TAP-formatted report is printed. > > > > Follows up tarantool/tarantool#7900 The new commit message is below. ================================================================================ test: disable buffering for the C test engine Our testing engine for the tests implemented in C introduced in scope of the commit a0483bd214f2bbc9d7e5fc95ebc0ae13e8d22bcc ("test: introduce module for C tests") lazily flushes the TAP-formatted report to the <stdout>. This might lead to a mess in a report e.g., in case when the particular case forks a child (that inherits its <stdout>) prior to the TAP header being yield to <stdout>. Hence, the patch disables buffering for <stdout> before any part of the TAP-formatted report is printed. Follows up tarantool/tarantool#7900 ================================================================================ > > > > Signed-off-by: Igor Munkin <imun@tarantool.org> > > --- <snipped> > > -- > > 2.39.2 > > -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches 2023-12-03 12:25 ` Maxim Kokryashkin via Tarantool-patches @ 2023-12-04 8:46 ` Sergey Kaplun via Tarantool-patches 2023-12-04 9:50 ` Igor Munkin via Tarantool-patches 1 sibling, 1 reply; 17+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-12-04 8:46 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi, Igor! Thanks for the patch! LGTM, after fixing Max's comments and my nit below. On 28.11.23, Igor Munkin wrote: <snipped> > --- > test/tarantool-c-tests/test.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/test/tarantool-c-tests/test.c b/test/tarantool-c-tests/test.c > index 7907c12a..0a370cdd 100644 > --- a/test/tarantool-c-tests/test.c > +++ b/test/tarantool-c-tests/test.c > @@ -228,6 +228,12 @@ static int test_run(const struct test_unit *test, size_t test_number, > int _test_run_group(const char *group_name, const struct test_unit tests[], > size_t n_tests, void *test_state) > { > + /* > + * XXX: Disable buffering for stdout to not mess with the > + * output in case there are forking tests in the group. > + */ > + setvbuf(stdout, NULL, _IONBF, 0); So, we can remove `fflush(stdout);` in the `test_finish()`. > + > test_start_comment(group_name); > > level++; > -- > 2.39.2 > -- Best regards, Sergey Kaplun ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine 2023-12-04 8:46 ` Sergey Kaplun via Tarantool-patches @ 2023-12-04 9:50 ` Igor Munkin via Tarantool-patches 0 siblings, 0 replies; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-12-04 9:50 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches Sergey, Thanks for your review! On 04.12.23, Sergey Kaplun wrote: > Hi, Igor! > Thanks for the patch! > LGTM, after fixing Max's comments and my nit below. > > On 28.11.23, Igor Munkin wrote: > > <snipped> > > > --- > > test/tarantool-c-tests/test.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/test/tarantool-c-tests/test.c b/test/tarantool-c-tests/test.c > > index 7907c12a..0a370cdd 100644 > > --- a/test/tarantool-c-tests/test.c > > +++ b/test/tarantool-c-tests/test.c > > @@ -228,6 +228,12 @@ static int test_run(const struct test_unit *test, size_t test_number, > > int _test_run_group(const char *group_name, const struct test_unit tests[], > > size_t n_tests, void *test_state) > > { > > + /* > > + * XXX: Disable buffering for stdout to not mess with the > > + * output in case there are forking tests in the group. > > + */ > > + setvbuf(stdout, NULL, _IONBF, 0); > > So, we can remove `fflush(stdout);` in the `test_finish()`. Thanks for noticing. Removed. ================================================================================ diff --git a/test/tarantool-c-tests/test.c b/test/tarantool-c-tests/test.c index 0a370cdd..c3387e9a 100644 --- a/test/tarantool-c-tests/test.c +++ b/test/tarantool-c-tests/test.c @@ -113,7 +113,6 @@ static void test_finish(size_t planned, size_t failed) if (failed > 0) test_comment("Failed %lu %s out of %lu", failed, t_type, planned); - fflush(stdout); } void test_set_skip_reason(const char *reason) ================================================================================ > > > + > > test_start_comment(group_name); > > > > level++; > > -- > > 2.39.2 > > > > -- > Best regards, > Sergey Kaplun -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-11-28 14:53 [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches @ 2023-11-28 14:53 ` Igor Munkin via Tarantool-patches 2023-12-03 14:17 ` Maxim Kokryashkin via Tarantool-patches 2023-11-28 16:14 ` [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Sergey Bronnikov via Tarantool-patches 2024-01-10 8:50 ` Igor Munkin via Tarantool-patches 3 siblings, 1 reply; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-11-28 14:53 UTC (permalink / raw) To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches Often tests for sampling profiler require long running loops to be executed, so a certain situation is likely to occur. Thus the test added in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix crash during FFUNC stream") expects FFUNC VM state (and even the particular instruction to be executed) at the moment when stacktrace is being collected. Unfortunately, it leads to test routine hang for several environments and if it does not we cannot guarantee that the desired scenario is tested (only rely on statistics). As a result the test for the aforementioned patch was disabled for Tarantool CI in the commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging Tarantool CI by sysprof test") until the issue is not resolved. This patch introduces the new approach for testing our sampling profiler via <ptrace> implementing precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running around <tostring> gazillion times we accurately step to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below. As it was mentioned, the test makes sysprof to collect the particular event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence it's enough to call <tostring> fast function in the profiled instance (i.e. "tracee"). To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin, the manager (i.e. "tracer") is implemented. Here are the main steps (see comments and `man 2 ptrace' for more info): 1. Poison <int 3> instruction as the first instruction at <lj_ff_tostring> to stop at the beginning of the fast function; 2. Resume the "tracee" from the "tracer"; 3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; 4. Do the hack 1-3 for <lj_fff_res1>; 5. Emit SIGPROF while resumimg the "tracee"; As a result sysprof collects the full backtrace with <tostring> fast function as the topmost frame. Resolves tarantool/tarantool#9387 Follows up tarantool/tarantool#8594 Signed-off-by: Igor Munkin <imun@tarantool.org> --- .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- 2 files changed, 269 insertions(+), 55 deletions(-) create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c new file mode 100644 index 00000000..4caed8cb --- /dev/null +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -0,0 +1,269 @@ +#include "lauxlib.h" +#include "lmisclib.h" +#include "lua.h" + +#include "test.h" +#include "utils.h" + +#include <signal.h> +#include <sys/ptrace.h> +#include <sys/user.h> +#include <sys/wait.h> +#include <unistd.h> + +/* XXX: Still need normal assert inside <tracee> and helpers. */ +#undef NDEBUG +#include <assert.h> + +/* + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS + * and LUAJIT_TARGET. + */ +#include "lj_arch.h" + +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 + +/* + * XXX: The test makes sysprof to collect the particular event + * (FFUNC) at the particular instruction (<lj_fff_res1>) to + * reproduce the issue #8594. Hence it's enough to call <tostring> + * fast function (this is done in <tracee> function). To emit + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast + * function, the managed execution is implemented in the function + * <tracer>: <int 3> instruction is poisoned as the first + * instruction at <lj_ff_tostring> to stop <tracee> at the + * beginning of the fast function; <tracer> resumes the <tracee>; + * the same hack is done for <lj_fff_res1>. When the <tracee> hits + * the interruption at <lj_fff_res1>, SIGPROF is emitted while + * resuming the <tracee>. As a result sysprof collects the full + * backtrace with <tostring> fast function as the topmost frame. + * + * See more info here: + * * https://man7.org/linux/man-pages/man2/ptrace.2.html + * * https://github.com/tarantool/tarantool/issues/8594 + * * https://github.com/tarantool/tarantool/issues/9387 + */ + +#define MESSAGE "Canary is alive" +#define LUACALL "local a = tostring('" MESSAGE "') return a" +/* XXX: Resolve the necessary addresses from VM engine. */ +extern void *lj_ff_tostring(void); +extern void *lj_fff_res1(void); + +/* Sysprof "/dev/null" stream helpers. {{{ */ + +/* + * Yep, 8Mb. Tuned in order not to bother the platform with too + * often flushes. + */ +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) +#define DEVNULL -1 + +struct devnull_ctx { + /* + * XXX: Dummy file descriptor to be used as "/dev/null" + * context indicator in writer and on_stop callback. + */ + int fd; + /* Buffer for data recorded by sysprof. */ + uint8_t buf[STREAM_BUFFER_SIZE]; +}; + +static int stream_new(struct luam_Sysprof_Options *options) { + struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); + if (ctx == NULL) + return PROFILE_ERRIO; + + /* Set "/dev/null" context indicator. */ + ctx->fd = DEVNULL; + options->ctx = ctx; + options->buf = ctx->buf; + options->len = STREAM_BUFFER_SIZE; + + return PROFILE_SUCCESS; +} + +static int stream_delete(void *rawctx, uint8_t *buf) { + struct devnull_ctx *ctx = rawctx; + assert(ctx->fd == DEVNULL); + free(ctx); + return PROFILE_SUCCESS; +} + +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { + struct devnull_ctx *ctx = rawctx; + assert(ctx->fd == DEVNULL); + /* Do nothing, just return back to the profiler. */ + return STREAM_BUFFER_SIZE; +} + +/* }}} Sysprof "/dev/null" stream helpers. */ + +static int tracee(const char *luacode) { + struct luam_Sysprof_Counters counters = {}; + struct luam_Sysprof_Options opt = { + /* Collect full backtraces per event. */ + .mode = LUAM_SYSPROF_CALLGRAPH, + /* + * XXX: Setting the "endless timer". The test + * requires the single event to be streamed at + * <lj_fff_res1> instruction, so to avoid spoiling + * the stream with other unwanted events, the + * timer is set to some unreachable point, so the + * profiler will be guaranteed to stop before any + * event is emitted. + */ + .interval = -1ULL, + }; + + /* Allow tracing for this process */ + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { + perror("Failed to turn the calling thread into a tracee"); + return EXIT_FAILURE; + } + + /* + * XXX: Allow parent (which is our tracer now) to observe + * our signal-delivery-stop (i.e. the tracee is ready). + * For more info see ptrace(2), "Attaching and detaching". + */ + raise(SIGSTOP); + + lua_State *L = utils_lua_init(); + + /* Customize and start profiler. */ + assert(stream_new(&opt) == PROFILE_SUCCESS); + assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS); + assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); + assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); + + /* FIXME: Make this part test agnostic. */ + assert(luaL_dostring(L, luacode) == LUA_OK); + assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); + + /* Terminate profiler and Lua universe. */ + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); + utils_lua_close(L); + + /* + * XXX: The only event to be streamed must be FFUNC at + * <lj_fff_res1> instruction. + * FIXME: Make this part test agnostic. + */ + assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); + assert(counters.samples == 1); + assert(counters.vmst_ffunc == 1); + + return EXIT_SUCCESS; +} + +const uint8_t INT3 = 0xCC; +static inline unsigned long int3poison(unsigned long instruction) { + const size_t int3bits = sizeof(INT3) * 8; + const unsigned long int3mask = -1UL >> int3bits << int3bits; + return (instruction & int3mask) | INT3; +} + +static int continue_until(pid_t chpid, void *addr) { + int wstatus; + struct user_regs_struct regs; + + /* Obtain the instructions at the <addr>. */ + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); + /* + * Emit the <int 3> instruction to the <addr>. + * XXX: <int 3> is poisoned as the LSB to the <data> + * obtained from the <addr> above. + */ + ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data)); + + /* Resume <chpid> tracee until SIGTRAP occurs. */ + ptrace(PTRACE_CONT, chpid, 0, 0); + /* Wait <chpid> tracee signal-delivery-stop. */ + waitpid(chpid, &wstatus, 0); + + /* Obtain GPR set to tweak RIP for further execution. */ + ptrace(PTRACE_GETREGS, chpid, 0, ®s); + /* + * Make sure we indeed are stopped at <addr>. + * XXX: RIP points right after <int 3> instruction. + */ + assert(regs.rip == (long)addr + sizeof(INT3)); + + /* + * XXX: Restore the original instruction at <addr> and + * "rewind" RIP by <int 3> size to "replay" the poisoned + * instruction at the <addr>. + */ + regs.rip -= sizeof(INT3); + ptrace(PTRACE_SETREGS, chpid, 0, ®s); + ptrace(PTRACE_POKETEXT, chpid, addr, data); + + /* Return wait status to the caller for test checks. */ + return wstatus; +} + +static int tracer(pid_t chpid) { + int wstatus; + + /* Wait until <chpid> tracee is ready. */ + waitpid(chpid, &wstatus, 0); + + /* Resume <chpid> tracee until <lj_ff_tostring>. */ + wstatus = continue_until(chpid, lj_ff_tostring); + + /* The tracee has to be alive and stopped by SIGTRAP. */ + assert_false(WIFEXITED(wstatus)); + assert_true(WIFSTOPPED(wstatus)); + + /* Resume <chpid> tracee until <lj_fff_res1>. */ + wstatus = continue_until(chpid, lj_fff_res1); + + /* The tracee has to be alive and stopped by SIGTRAP. */ + assert_false(WIFEXITED(wstatus)); + assert_true(WIFSTOPPED(wstatus)); + + /* Send SIGPROF to make sysprof collect the event. */ + ptrace(PTRACE_CONT, chpid, 0, SIGPROF); + + /* Wait until <chpid> tracee successfully exits. */ + waitpid(chpid, &wstatus, 0); + assert_true(WIFEXITED(wstatus)); + + return TEST_EXIT_SUCCESS; +} + +static int test_tostring_call(void *ctx) { + pid_t chpid = fork(); + switch(chpid) { + case -1: + bail_out("Tracee fork failed"); + case 0: + /* + * XXX: Tracee has to <exit> instead of <return> + * to avoid duplicate reports in <test_run_group>. + * Test assertions are used only in the <tracer>, + * so the <tracer> ought to report whether the + * test succeeded or not. + */ + exit(tracee(LUACALL)); + default: + return tracer(chpid); + } +} + +#else /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ + +static int test_tostring_call(void *ctx) { + return skip_all("sysprof is implemented for Linux/x86_64 only"); +} + +#endif /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ + +int main(void) { + const struct test_unit tgroup[] = { + test_unit_def(test_tostring_call), + }; + return test_run_group(tgroup, NULL); +} diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua deleted file mode 100644 index f8b409ae..00000000 --- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua +++ /dev/null @@ -1,55 +0,0 @@ -local tap = require('tap') -local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({ - ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and - jit.arch ~= 'x64', - ['Sysprof is implemented for Linux only'] = jit.os ~= 'Linux', - -- luacheck: no global - ['Prevent hanging Tarantool CI due to #9387'] = _TARANTOOL, -}) - -test:plan(1) - -jit.off() --- XXX: Run JIT tuning functions in a safe frame to avoid errors --- thrown when LuaJIT is compiled with JIT engine disabled. -pcall(jit.flush) - -local TMP_BINFILE = '/dev/null' - --- XXX: The best way to test the issue is to set the profile --- interval to be as short as possible. However, our CI is --- not capable of handling such intense testing, so it was a --- forced decision to reduce the sampling frequency for it. As a --- result, it is now less likely to reproduce the issue --- statistically, but the test case is still valid. - --- GitHub always sets[1] the `CI` environment variable to `true` --- for every step in a workflow. --- [1]: https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables -local CI = os.getenv('CI') == 'true' - --- Profile interval and number of iterations for CI are --- empirical. Non-CI profile interval is set to be as short --- as possible, so the issue is more likely to reproduce. --- Non-CI number of iterations is greater for the same reason. -local PROFILE_INTERVAL = CI and 3 or 1 -local N_ITERATIONS = CI and 1e5 or 1e6 - -local res, err = misc.sysprof.start{ - mode = 'C', - interval = PROFILE_INTERVAL, - path = TMP_BINFILE, -} -assert(res, err) - -for i = 1, N_ITERATIONS do - -- XXX: `tostring` is FFUNC. - tostring(i) -end - -res, err = misc.sysprof.stop() -assert(res, err) - -test:ok(true, 'FFUNC frames were streamed correctly') - -os.exit(test:check() and 0 or 1) -- 2.39.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution Igor Munkin via Tarantool-patches @ 2023-12-03 14:17 ` Maxim Kokryashkin via Tarantool-patches 2023-12-05 8:37 ` Sergey Kaplun via Tarantool-patches 2023-12-05 11:34 ` Igor Munkin via Tarantool-patches 0 siblings, 2 replies; 17+ messages in thread From: Maxim Kokryashkin via Tarantool-patches @ 2023-12-03 14:17 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi, Igor! Thanks for the patch! Please consider my comments below. Side note: This new testing approach is amazing, thanks a lot! On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: > Often tests for sampling profiler require long running loops to be Typo: s/Often/Often,/ > executed, so a certain situation is likely to occur. Thus the test added Typo: s/Thus/Thus,/ > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix Typo: s/in the/in/ > crash during FFUNC stream") expects FFUNC VM state (and even the Typo: s/expects FFUNC/expects the FFUNC/ > particular instruction to be executed) at the moment when stacktrace is > being collected. Unfortunately, it leads to test routine hang for Typo: s/to test routine hang/to the test routine hanging/ > several environments and if it does not we cannot guarantee that the Typo: s/environments/environments,/ Typo: s/and if it does not/and even if it does not,/ > desired scenario is tested (only rely on statistics). As a result the Typo: s/As a result/As a result,/ > test for the aforementioned patch was disabled for Tarantool CI in the > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging > Tarantool CI by sysprof test") until the issue is not resolved. > > This patch introduces the new approach for testing our sampling profiler > via <ptrace> implementing precise managed execution of the profiled Typo: s/<ptrace>/<ptrace>,/ > instance mentioned in tarantool/tarantool#9387. > > Instead of running around <tostring> gazillion times we accurately step Typo: s/times/times,/ > to the exact place where the issue reproduces and manually emit SIGPROF > to the Lua VM being profiled. The particular approach implemented in > this patch is described below. > > As it was mentioned, the test makes sysprof to collect the particular Typo: s/to collect/collect/ > event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to Typo: s/Hence/Hence,/ > call <tostring> fast function in the profiled instance (i.e. "tracee"). Typo: s/i.e./i.e.,/ > To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin, Typo: s/in scope/in the scope/ > the manager (i.e. "tracer") is implemented. Typo: s/i.e./i.e.,/ > > Here are the main steps (see comments and `man 2 ptrace' for more info): > 1. Poison <int 3> instruction as the first instruction at > <lj_ff_tostring> to stop at the beginning of the fast function; > 2. Resume the "tracee" from the "tracer"; > 3. Hit the emitted interruption, restore the original instruction and > "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; > 4. Do the hack 1-3 for <lj_fff_res1>; > 5. Emit SIGPROF while resumimg the "tracee"; Typo: s/resumimg/resuming/ > > As a result sysprof collects the full backtrace with <tostring> fast Typo: s/a result/a result,/ > function as the topmost frame. > > Resolves tarantool/tarantool#9387 > Follows up tarantool/tarantool#8594 > > Signed-off-by: Igor Munkin <imun@tarantool.org> > --- > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > 2 files changed, 269 insertions(+), 55 deletions(-) > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > new file mode 100644 > index 00000000..4caed8cb > --- /dev/null > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > @@ -0,0 +1,269 @@ > +#include "lauxlib.h" > +#include "lmisclib.h" > +#include "lua.h" > + > +#include "test.h" > +#include "utils.h" > + > +#include <signal.h> > +#include <sys/ptrace.h> > +#include <sys/user.h> > +#include <sys/wait.h> > +#include <unistd.h> > + > +/* XXX: Still need normal assert inside <tracee> and helpers. */ > +#undef NDEBUG > +#include <assert.h> > + > +/* > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so > + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS > + * and LUAJIT_TARGET. > + */ > +#include "lj_arch.h" > + > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 Why not use the `LJ_HASSYSPROF` flag? > + > +/* > + * XXX: The test makes sysprof to collect the particular event Typo: s/sysprof to/sysprof/ > + * (FFUNC) at the particular instruction (<lj_fff_res1>) to > + * reproduce the issue #8594. Hence it's enough to call <tostring> Typo: s/Hence/Hence,/ > + * fast function (this is done in <tracee> function). To emit > + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast > + * function, the managed execution is implemented in the function > + * <tracer>: <int 3> instruction is poisoned as the first > + * instruction at <lj_ff_tostring> to stop <tracee> at the > + * beginning of the fast function; <tracer> resumes the <tracee>; > + * the same hack is done for <lj_fff_res1>. When the <tracee> hits > + * the interruption at <lj_fff_res1>, SIGPROF is emitted while > + * resuming the <tracee>. As a result sysprof collects the full Typo: s/a result/a result,/ > + * backtrace with <tostring> fast function as the topmost frame. > + * > + * See more info here: > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html > + * * https://github.com/tarantool/tarantool/issues/8594 > + * * https://github.com/tarantool/tarantool/issues/9387 > + */ > + > +#define MESSAGE "Canary is alive" > +#define LUACALL "local a = tostring('" MESSAGE "') return a" > +/* XXX: Resolve the necessary addresses from VM engine. */ > +extern void *lj_ff_tostring(void); > +extern void *lj_fff_res1(void); > + > +/* Sysprof "/dev/null" stream helpers. {{{ */ > + > +/* > + * Yep, 8Mb. Tuned in order not to bother the platform with too > + * often flushes. > + */ > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > +#define DEVNULL -1 > + > +struct devnull_ctx { > + /* > + * XXX: Dummy file descriptor to be used as "/dev/null" > + * context indicator in writer and on_stop callback. > + */ > + int fd; > + /* Buffer for data recorded by sysprof. */ > + uint8_t buf[STREAM_BUFFER_SIZE]; > +}; > + > +static int stream_new(struct luam_Sysprof_Options *options) { > + struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); > + if (ctx == NULL) > + return PROFILE_ERRIO; > + > + /* Set "/dev/null" context indicator. */ > + ctx->fd = DEVNULL; > + options->ctx = ctx; > + options->buf = ctx->buf; > + options->len = STREAM_BUFFER_SIZE; > + > + return PROFILE_SUCCESS; > +} > + > +static int stream_delete(void *rawctx, uint8_t *buf) { > + struct devnull_ctx *ctx = rawctx; > + assert(ctx->fd == DEVNULL); > + free(ctx); > + return PROFILE_SUCCESS; > +} > + > +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { > + struct devnull_ctx *ctx = rawctx; > + assert(ctx->fd == DEVNULL); > + /* Do nothing, just return back to the profiler. */ > + return STREAM_BUFFER_SIZE; > +} > + > +/* }}} Sysprof "/dev/null" stream helpers. */ > + > +static int tracee(const char *luacode) { > + struct luam_Sysprof_Counters counters = {}; > + struct luam_Sysprof_Options opt = { > + /* Collect full backtraces per event. */ > + .mode = LUAM_SYSPROF_CALLGRAPH, > + /* > + * XXX: Setting the "endless timer". The test > + * requires the single event to be streamed at > + * <lj_fff_res1> instruction, so to avoid spoiling > + * the stream with other unwanted events, the > + * timer is set to some unreachable point, so the > + * profiler will be guaranteed to stop before any > + * event is emitted. > + */ > + .interval = -1ULL, > + }; > + > + /* Allow tracing for this process */ > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { > + perror("Failed to turn the calling thread into a tracee"); Typo?: s/thread/process/ > + return EXIT_FAILURE; > + } > + > + /* > + * XXX: Allow parent (which is our tracer now) to observe > + * our signal-delivery-stop (i.e. the tracee is ready). > + * For more info see ptrace(2), "Attaching and detaching". > + */ > + raise(SIGSTOP); > + > + lua_State *L = utils_lua_init(); > + > + /* Customize and start profiler. */ > + assert(stream_new(&opt) == PROFILE_SUCCESS); > + assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS); > + assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); > + assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); > + > + /* FIXME: Make this part test agnostic. */ > + assert(luaL_dostring(L, luacode) == LUA_OK); > + assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); > + > + /* Terminate profiler and Lua universe. */ > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > + utils_lua_close(L); It doesn't seem to be semantically correct to terminate Lua universe before obtaining the counters, even though there is no obstacles to do so in the API implementation itself. > + > + /* > + * XXX: The only event to be streamed must be FFUNC at > + * <lj_fff_res1> instruction. > + * FIXME: Make this part test agnostic. > + */ > + assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); > + assert(counters.samples == 1); > + assert(counters.vmst_ffunc == 1); > + > + return EXIT_SUCCESS; > +} > + > +const uint8_t INT3 = 0xCC; > +static inline unsigned long int3poison(unsigned long instruction) { > + const size_t int3bits = sizeof(INT3) * 8; > + const unsigned long int3mask = -1UL >> int3bits << int3bits; > + return (instruction & int3mask) | INT3; > +} > + > +static int continue_until(pid_t chpid, void *addr) { > + int wstatus; > + struct user_regs_struct regs; > + > + /* Obtain the instructions at the <addr>. */ > + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); > + /* > + * Emit the <int 3> instruction to the <addr>. > + * XXX: <int 3> is poisoned as the LSB to the <data> > + * obtained from the <addr> above. > + */ > + ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data)); > + > + /* Resume <chpid> tracee until SIGTRAP occurs. */ > + ptrace(PTRACE_CONT, chpid, 0, 0); > + /* Wait <chpid> tracee signal-delivery-stop. */ > + waitpid(chpid, &wstatus, 0); Should we check the `wstatus` here too? > + > + /* Obtain GPR set to tweak RIP for further execution. */ > + ptrace(PTRACE_GETREGS, chpid, 0, ®s); > + /* > + * Make sure we indeed are stopped at <addr>. > + * XXX: RIP points right after <int 3> instruction. > + */ > + assert(regs.rip == (long)addr + sizeof(INT3)); > + > + /* > + * XXX: Restore the original instruction at <addr> and > + * "rewind" RIP by <int 3> size to "replay" the poisoned > + * instruction at the <addr>. > + */ > + regs.rip -= sizeof(INT3); > + ptrace(PTRACE_SETREGS, chpid, 0, ®s); > + ptrace(PTRACE_POKETEXT, chpid, addr, data); > + > + /* Return wait status to the caller for test checks. */ > + return wstatus; > +} > + > +static int tracer(pid_t chpid) { > + int wstatus; > + > + /* Wait until <chpid> tracee is ready. */ > + waitpid(chpid, &wstatus, 0); Should we check the process status here too? > + > + /* Resume <chpid> tracee until <lj_ff_tostring>. */ > + wstatus = continue_until(chpid, lj_ff_tostring); > + > + /* The tracee has to be alive and stopped by SIGTRAP. */ > + assert_false(WIFEXITED(wstatus)); > + assert_true(WIFSTOPPED(wstatus)); > + > + /* Resume <chpid> tracee until <lj_fff_res1>. */ > + wstatus = continue_until(chpid, lj_fff_res1); > + > + /* The tracee has to be alive and stopped by SIGTRAP. */ > + assert_false(WIFEXITED(wstatus)); > + assert_true(WIFSTOPPED(wstatus)); This part is duplicated, let's move it to a separate `healthcheck` function, or something like that. > + > + /* Send SIGPROF to make sysprof collect the event. */ > + ptrace(PTRACE_CONT, chpid, 0, SIGPROF); > + > + /* Wait until <chpid> tracee successfully exits. */ > + waitpid(chpid, &wstatus, 0); > + assert_true(WIFEXITED(wstatus)); > + > + return TEST_EXIT_SUCCESS; > +} > + > +static int test_tostring_call(void *ctx) { > + pid_t chpid = fork(); > + switch(chpid) { > + case -1: > + bail_out("Tracee fork failed"); > + case 0: > + /* > + * XXX: Tracee has to <exit> instead of <return> > + * to avoid duplicate reports in <test_run_group>. > + * Test assertions are used only in the <tracer>, > + * so the <tracer> ought to report whether the > + * test succeeded or not. > + */ > + exit(tracee(LUACALL)); > + default: > + return tracer(chpid); > + } > +} > + > +#else /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ > + > +static int test_tostring_call(void *ctx) { > + return skip_all("sysprof is implemented for Linux/x86_64 only"); > +} > + > +#endif /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ > + > +int main(void) { > + const struct test_unit tgroup[] = { > + test_unit_def(test_tostring_call), > + }; > + return test_run_group(tgroup, NULL); > +} > diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > deleted file mode 100644 > index f8b409ae..00000000 > --- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > +++ /dev/null <snipped> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-03 14:17 ` Maxim Kokryashkin via Tarantool-patches @ 2023-12-05 8:37 ` Sergey Kaplun via Tarantool-patches 2023-12-05 12:04 ` Igor Munkin via Tarantool-patches 2023-12-05 11:34 ` Igor Munkin via Tarantool-patches 1 sibling, 1 reply; 17+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-12-05 8:37 UTC (permalink / raw) To: Maxim Kokryashkin; +Cc: tarantool-patches Hi, Igor! Thanks for the patch! Really awesome improvement for our sysprof testing! Please consider my comments below. On 03.12.23, Maxim Kokryashkin wrote: > Hi, Igor! > Thanks for the patch! > Please consider my comments below. > > Side note: This new testing approach is amazing, thanks a lot! > > On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: > > Often tests for sampling profiler require long running loops to be > Typo: s/Often/Often,/ > > executed, so a certain situation is likely to occur. Thus the test added > Typo: s/Thus/Thus,/ > > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix > Typo: s/in the/in/ > > crash during FFUNC stream") expects FFUNC VM state (and even the > Typo: s/expects FFUNC/expects the FFUNC/ > > particular instruction to be executed) at the moment when stacktrace is > > being collected. Unfortunately, it leads to test routine hang for > Typo: s/to test routine hang/to the test routine hanging/ > > several environments and if it does not we cannot guarantee that the > Typo: s/environments/environments,/ > Typo: s/and if it does not/and even if it does not,/ > > desired scenario is tested (only rely on statistics). As a result the > Typo: s/As a result/As a result,/ > > test for the aforementioned patch was disabled for Tarantool CI in the > > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging > > Tarantool CI by sysprof test") until the issue is not resolved. > > > > This patch introduces the new approach for testing our sampling profiler > > via <ptrace> implementing precise managed execution of the profiled > Typo: s/<ptrace>/<ptrace>,/ > > instance mentioned in tarantool/tarantool#9387. > > > > Instead of running around <tostring> gazillion times we accurately step > Typo: s/times/times,/ > > to the exact place where the issue reproduces and manually emit SIGPROF > > to the Lua VM being profiled. The particular approach implemented in > > this patch is described below. > > > > As it was mentioned, the test makes sysprof to collect the particular > Typo: s/to collect/collect/ > > event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to > > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to > Typo: s/Hence/Hence,/ > > call <tostring> fast function in the profiled instance (i.e. "tracee"). > Typo: s/i.e./i.e.,/ > > To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin, > Typo: s/in scope/in the scope/ > > the manager (i.e. "tracer") is implemented. > Typo: s/i.e./i.e.,/ > > > > Here are the main steps (see comments and `man 2 ptrace' for more info): > > 1. Poison <int 3> instruction as the first instruction at > > <lj_ff_tostring> to stop at the beginning of the fast function; > > 2. Resume the "tracee" from the "tracer"; > > 3. Hit the emitted interruption, restore the original instruction and > > "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; > > 4. Do the hack 1-3 for <lj_fff_res1>; > > 5. Emit SIGPROF while resumimg the "tracee"; > Typo: s/resumimg/resuming/ > > > > As a result sysprof collects the full backtrace with <tostring> fast > Typo: s/a result/a result,/ > > function as the topmost frame. > > > > Resolves tarantool/tarantool#9387 > > Follows up tarantool/tarantool#8594 > > > > Signed-off-by: Igor Munkin <imun@tarantool.org> > > --- > > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > > 2 files changed, 269 insertions(+), 55 deletions(-) > > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > new file mode 100644 > > index 00000000..4caed8cb > > --- /dev/null > > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > @@ -0,0 +1,269 @@ > > +#include "lauxlib.h" > > +#include "lmisclib.h" > > +#include "lua.h" > > + > > +#include "test.h" > > +#include "utils.h" > > + > > +#include <signal.h> > > +#include <sys/ptrace.h> > > +#include <sys/user.h> > > +#include <sys/wait.h> > > +#include <unistd.h> > > + > > +/* XXX: Still need normal assert inside <tracee> and helpers. */ > > +#undef NDEBUG > > +#include <assert.h> > > + > > +/* > > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so > > + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS > > + * and LUAJIT_TARGET. > > + */ > > +#include "lj_arch.h" > > + > > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 > Why not use the `LJ_HASSYSPROF` flag? I agree here. `LJ_HASSYSPROF` is more foolproof -- if we will port sysprof for arm64 (for example) we will not forget to update this test. > > + > > +/* > > + * XXX: The test makes sysprof to collect the particular event > Typo: s/sysprof to/sysprof/ > > + * (FFUNC) at the particular instruction (<lj_fff_res1>) to > > + * reproduce the issue #8594. Hence it's enough to call <tostring> > Typo: s/Hence/Hence,/ > > + * fast function (this is done in <tracee> function). To emit > > + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast > > + * function, the managed execution is implemented in the function > > + * <tracer>: <int 3> instruction is poisoned as the first > > + * instruction at <lj_ff_tostring> to stop <tracee> at the > > + * beginning of the fast function; <tracer> resumes the <tracee>; > > + * the same hack is done for <lj_fff_res1>. When the <tracee> hits > > + * the interruption at <lj_fff_res1>, SIGPROF is emitted while > > + * resuming the <tracee>. As a result sysprof collects the full > Typo: s/a result/a result,/ > > + * backtrace with <tostring> fast function as the topmost frame. > > + * > > + * See more info here: > > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html > > + * * https://github.com/tarantool/tarantool/issues/8594 > > + * * https://github.com/tarantool/tarantool/issues/9387 Minor: This link may be useful too: https://c9x.me/x86/html/file_module_x86_id_142.html > > + */ > > + > > +#define MESSAGE "Canary is alive" > > +#define LUACALL "local a = tostring('" MESSAGE "') return a" Nit: I suggest adding an excess empty line here to separate extern asm functions from Lua code defines. > > +/* XXX: Resolve the necessary addresses from VM engine. */ > > +extern void *lj_ff_tostring(void); > > +extern void *lj_fff_res1(void); > > + > > +/* Sysprof "/dev/null" stream helpers. {{{ */ > > + > > +/* > > + * Yep, 8Mb. Tuned in order not to bother the platform with too > > + * often flushes. > > + */ This comment is misleading since there is no any flushes in the writer function. Also, do we need such big buffer? Maybe `PATH_MAX` * 2 (or something like that) should be enough? > > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > > +#define DEVNULL -1 `DEVNULL` name is misleading (like we've actually opened `/dev/null`). Maybe `DUMMY_FD` is better? Also, I suppose we can drop the `fd` field at all. We can just use static structure in this translation unit and compare pointer against it, so there is no need for an additional field fd in the structure itself. > > + > > +struct devnull_ctx { > > + /* > > + * XXX: Dummy file descriptor to be used as "/dev/null" > > + * context indicator in writer and on_stop callback. > > + */ > > + int fd; See my comment above. > > + /* Buffer for data recorded by sysprof. */ > > + uint8_t buf[STREAM_BUFFER_SIZE]; > > +}; > > + > > +static int stream_new(struct luam_Sysprof_Options *options) { Please, use separate line for block start `{`. Here and below. > > + struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); > > + if (ctx == NULL) > > + return PROFILE_ERRIO; > > + > > + /* Set "/dev/null" context indicator. */ > > + ctx->fd = DEVNULL; > > + options->ctx = ctx; > > + options->buf = ctx->buf; > > + options->len = STREAM_BUFFER_SIZE; > > + > > + return PROFILE_SUCCESS; > > +} > > + > > +static int stream_delete(void *rawctx, uint8_t *buf) { > > + struct devnull_ctx *ctx = rawctx; > > + assert(ctx->fd == DEVNULL); > > + free(ctx); > > + return PROFILE_SUCCESS; > > +} > > + > > +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { > > + struct devnull_ctx *ctx = rawctx; > > + assert(ctx->fd == DEVNULL); > > + /* Do nothing, just return back to the profiler. */ > > + return STREAM_BUFFER_SIZE; > > +} > > + > > +/* }}} Sysprof "/dev/null" stream helpers. */ > > + > > +static int tracee(const char *luacode) { > > + struct luam_Sysprof_Counters counters = {}; > > + struct luam_Sysprof_Options opt = { > > + /* Collect full backtraces per event. */ > > + .mode = LUAM_SYSPROF_CALLGRAPH, > > + /* > > + * XXX: Setting the "endless timer". The test > > + * requires the single event to be streamed at > > + * <lj_fff_res1> instruction, so to avoid spoiling > > + * the stream with other unwanted events, the > > + * timer is set to some unreachable point, so the > > + * profiler will be guaranteed to stop before any > > + * event is emitted. > > + */ > > + .interval = -1ULL, > > + }; > > + > > + /* Allow tracing for this process */ Nit: Missed dot at the end of the sentence. > > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { Nit: It's more readable to use `ptrace(PTRACE_TRACEME, 0, NULL, NULL)` to match arguments types. > > + perror("Failed to turn the calling thread into a tracee"); > Typo?: s/thread/process/ Also, it's better to use `test_comment()` format here (i.e., with "#") to be TAP-compatible. > > + return EXIT_FAILURE; > > + } > > + > > + /* > > + * XXX: Allow parent (which is our tracer now) to observe > > + * our signal-delivery-stop (i.e. the tracee is ready). > > + * For more info see ptrace(2), "Attaching and detaching". > > + */ > > + raise(SIGSTOP); > > + > > + lua_State *L = utils_lua_init(); > > + > > + /* Customize and start profiler. */ > > + assert(stream_new(&opt) == PROFILE_SUCCESS); > > + assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS); > > + assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); > > + assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); > > + > > + /* FIXME: Make this part test agnostic. */ Typo: s/test agnostic/test-agnostic/ Nit: TODO looks more valid for me (since it's working). Feel free to ignore. > > + assert(luaL_dostring(L, luacode) == LUA_OK); > > + assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); > > + > > + /* Terminate profiler and Lua universe. */ > > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > > + utils_lua_close(L); > It doesn't seem to be semantically correct to terminate Lua universe > before obtaining the counters, even though there is no obstacles to do > so in the API implementation itself. > > + > > + /* > > + * XXX: The only event to be streamed must be FFUNC at > > + * <lj_fff_res1> instruction. Typo: s/<lj_fff_res1>/the <lj_fff_res1>/ > > + * FIXME: Make this part test agnostic. Typo: s/test agnostic/test-agnostic/ Nit: TODO looks more valid for me (since it's working). Feel free to ignore. > > + */ > > + assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); > > + assert(counters.samples == 1); > > + assert(counters.vmst_ffunc == 1); > > + > > + return EXIT_SUCCESS; > > +} > > + > > +const uint8_t INT3 = 0xCC; Maybe it should be moved up to other constants and defines? > > +static inline unsigned long int3poison(unsigned long instruction) { > > + const size_t int3bits = sizeof(INT3) * 8; > > + const unsigned long int3mask = -1UL >> int3bits << int3bits; Minor: It's better to use parentesises here. > > + return (instruction & int3mask) | INT3; > > +} > > + > > +static int continue_until(pid_t chpid, void *addr) { > > + int wstatus; > > + struct user_regs_struct regs; > > + > > + /* Obtain the instructions at the <addr>. */ > > + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); Nit: s/0/NULL/. Here and below. > > + /* > > + * Emit the <int 3> instruction to the <addr>. > > + * XXX: <int 3> is poisoned as the LSB to the <data> > > + * obtained from the <addr> above. > > + */ > > + ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data)); > > + > > + /* Resume <chpid> tracee until SIGTRAP occurs. */ > > + ptrace(PTRACE_CONT, chpid, 0, 0); > > + /* Wait <chpid> tracee signal-delivery-stop. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the `wstatus` here too? > > + > > + /* Obtain GPR set to tweak RIP for further execution. */ > > + ptrace(PTRACE_GETREGS, chpid, 0, ®s); > > + /* > > + * Make sure we indeed are stopped at <addr>. > > + * XXX: RIP points right after <int 3> instruction. > > + */ > > + assert(regs.rip == (long)addr + sizeof(INT3)); > > + > > + /* > > + * XXX: Restore the original instruction at <addr> and > > + * "rewind" RIP by <int 3> size to "replay" the poisoned > > + * instruction at the <addr>. > > + */ > > + regs.rip -= sizeof(INT3); > > + ptrace(PTRACE_SETREGS, chpid, 0, ®s); > > + ptrace(PTRACE_POKETEXT, chpid, addr, data); > > + > > + /* Return wait status to the caller for test checks. */ > > + return wstatus; > > +} > > + > > +static int tracer(pid_t chpid) { > > + int wstatus; > > + > > + /* Wait until <chpid> tracee is ready. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the process status here too? > > + > > + /* Resume <chpid> tracee until <lj_ff_tostring>. */ > > + wstatus = continue_until(chpid, lj_ff_tostring); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > > + > > + /* Resume <chpid> tracee until <lj_fff_res1>. */ > > + wstatus = continue_until(chpid, lj_fff_res1); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > This part is duplicated, let's move it to a separate `healthcheck` > function, or something like that. May be we can move these checks inside the `continue_until()` routine? Do we want to check something except the status is `WIFSTOPPED(status)`? > > + > > + /* Send SIGPROF to make sysprof collect the event. */ > > + ptrace(PTRACE_CONT, chpid, 0, SIGPROF); > > + > > + /* Wait until <chpid> tracee successfully exits. */ > > + waitpid(chpid, &wstatus, 0); > > + assert_true(WIFEXITED(wstatus)); > > + > > + return TEST_EXIT_SUCCESS; > > +} > > + <snipped> > > diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > deleted file mode 100644 > > index f8b409ae..00000000 > > --- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > +++ /dev/null > <snipped> -- Best regards, Sergey Kaplun ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-05 8:37 ` Sergey Kaplun via Tarantool-patches @ 2023-12-05 12:04 ` Igor Munkin via Tarantool-patches 2023-12-05 12:25 ` Sergey Kaplun via Tarantool-patches 0 siblings, 1 reply; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-12-05 12:04 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches Sergey, Thanks for your review! On 05.12.23, Sergey Kaplun wrote: > Hi, Igor! > Thanks for the patch! > Really awesome improvement for our sysprof testing! > Please consider my comments below. > > On 03.12.23, Maxim Kokryashkin wrote: <snipped> > > > --- > > > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > > > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > > > 2 files changed, 269 insertions(+), 55 deletions(-) > > > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > > > > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > > new file mode 100644 > > > index 00000000..4caed8cb > > > --- /dev/null > > > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > > @@ -0,0 +1,269 @@ <snipped> > > > + > > > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 > > Why not use the `LJ_HASSYSPROF` flag? > > I agree here. `LJ_HASSYSPROF` is more foolproof -- if we will port > sysprof for arm64 (for example) we will not forget to update this test. See my arguments for this in my previous reply. > <snipped> > > Minor: This link may be useful too: > https://c9x.me/x86/html/file_module_x86_id_142.html Thanks, added: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index d17386dd..d9d2e481 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -41,6 +41,7 @@ * * See more info here: * * https://man7.org/linux/man-pages/man2/ptrace.2.html + * * https://c9x.me/x86/html/file_module_x86_id_142.html * * https://github.com/tarantool/tarantool/issues/8594 * * https://github.com/tarantool/tarantool/issues/9387 */ ================================================================================ > > > > + */ > > > + > > > +#define MESSAGE "Canary is alive" > > > +#define LUACALL "local a = tostring('" MESSAGE "') return a" > > Nit: I suggest adding an excess empty line here to separate extern asm functions > from Lua code defines. OK, added. > > > > +/* XXX: Resolve the necessary addresses from VM engine. */ > > > +extern void *lj_ff_tostring(void); > > > +extern void *lj_fff_res1(void); > > > + > > > +/* Sysprof "/dev/null" stream helpers. {{{ */ > > > + > > > +/* > > > + * Yep, 8Mb. Tuned in order not to bother the platform with too > > > + * often flushes. > > > + */ > > This comment is misleading since there is no any flushes in the writer > function. Well, it's not about I/O flushes, but rather about sysprof write-buffer pressure, isn't it? Hence, 8Mb is needed to make the test omit almost all buffer-flush related callbacks. > Also, do we need such big buffer? Maybe `PATH_MAX` * 2 (or something > like that) should be enough? Could you elaborate the reason for such a strange constant, please? > > > > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > > > +#define DEVNULL -1 > > `DEVNULL` name is misleading (like we've actually opened `/dev/null`). > Maybe `DUMMY_FD` is better? Also, I suppose we can drop the `fd` field > at all. We can just use static structure in this translation unit and > compare pointer against it, so there is no need for an additional field > fd in the structure itself. I see little reasoning in this suggestion: the approach with the structure is quite straight-forward and copied from another sysprof test implemented in C. Re naming: I prefer to leave it as is. In general, your nit is nice, but I believe it should be applied for the patch moving <ptrace> tricks to utils library. > > > > + <snipped> > > > + > > > +static int stream_new(struct luam_Sysprof_Options *options) { > > Please, use separate line for block start `{`. > Here and below. OK, fixed everywhere in test. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 942787a4..884c62cc 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -72,7 +72,8 @@ struct devnull_ctx { uint8_t buf[STREAM_BUFFER_SIZE]; }; -static int stream_new(struct luam_Sysprof_Options *options) { +static int stream_new(struct luam_Sysprof_Options *options) +{ struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); if (ctx == NULL) return PROFILE_ERRIO; @@ -86,14 +87,16 @@ static int stream_new(struct luam_Sysprof_Options *options) { return PROFILE_SUCCESS; } -static int stream_delete(void *rawctx, uint8_t *buf) { +static int stream_delete(void *rawctx, uint8_t *buf) +{ struct devnull_ctx *ctx = rawctx; assert(ctx->fd == DEVNULL); free(ctx); return PROFILE_SUCCESS; } -static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) +{ struct devnull_ctx *ctx = rawctx; assert(ctx->fd == DEVNULL); /* Do nothing, just return back to the profiler. */ @@ -102,7 +105,8 @@ static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { /* }}} Sysprof "/dev/null" stream helpers. */ -static int tracee(const char *luacode) { +static int tracee(const char *luacode) +{ struct luam_Sysprof_Counters counters = {}; struct luam_Sysprof_Options opt = { /* Collect full backtraces per event. */ @@ -162,7 +166,8 @@ static int tracee(const char *luacode) { return EXIT_SUCCESS; } -static void wait_alive(pid_t chpid) { +static void wait_alive(pid_t chpid) +{ int wstatus; /* Wait <chpid> tracee signal-delivery-stop. */ @@ -175,13 +180,15 @@ static void wait_alive(pid_t chpid) { } const uint8_t INT3 = 0xCC; -static inline unsigned long int3poison(unsigned long instruction) { +static inline unsigned long int3poison(unsigned long instruction) +{ const size_t int3bits = sizeof(INT3) * 8; const unsigned long int3mask = -1UL >> int3bits << int3bits; return (instruction & int3mask) | INT3; } -static void continue_until(pid_t chpid, void *addr) { +static void continue_until(pid_t chpid, void *addr) +{ struct user_regs_struct regs; /* Obtain the instructions at the <addr>. */ @@ -220,7 +227,8 @@ static void continue_until(pid_t chpid, void *addr) { ptrace(PTRACE_POKETEXT, chpid, addr, data); } -static int tracer(pid_t chpid) { +static int tracer(pid_t chpid) +{ int wstatus; /* Wait until <chpid> tracee is ready. */ @@ -250,7 +258,8 @@ static int tracer(pid_t chpid) { return TEST_EXIT_SUCCESS; } -static int test_tostring_call(void *ctx) { +static int test_tostring_call(void *ctx) +{ pid_t chpid = fork(); switch(chpid) { case -1: @@ -271,13 +280,15 @@ static int test_tostring_call(void *ctx) { #else /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ -static int test_tostring_call(void *ctx) { +static int test_tostring_call(void *ctx) +{ return skip_all("sysprof is implemented for Linux/x86_64 only"); } #endif /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ -int main(void) { +int main(void) +{ const struct test_unit tgroup[] = { test_unit_def(test_tostring_call), }; ================================================================================ > <snipped> > > > +static int tracee(const char *luacode) { > > > + struct luam_Sysprof_Counters counters = {}; > > > + struct luam_Sysprof_Options opt = { > > > + /* Collect full backtraces per event. */ > > > + .mode = LUAM_SYSPROF_CALLGRAPH, > > > + /* > > > + * XXX: Setting the "endless timer". The test > > > + * requires the single event to be streamed at > > > + * <lj_fff_res1> instruction, so to avoid spoiling > > > + * the stream with other unwanted events, the > > > + * timer is set to some unreachable point, so the > > > + * profiler will be guaranteed to stop before any > > > + * event is emitted. > > > + */ > > > + .interval = -1ULL, > > > + }; > > > + > > > + /* Allow tracing for this process */ > > Nit: Missed dot at the end of the sentence. Fixed. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 884c62cc..40926a50 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -123,7 +123,7 @@ static int tracee(const char *luacode) .interval = -1ULL, }; - /* Allow tracing for this process */ + /* Allow tracing for this process. */ if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { perror("Failed to turn the calling process into a tracee"); return EXIT_FAILURE; ================================================================================ > > > > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { > > Nit: It's more readable to use `ptrace(PTRACE_TRACEME, 0, NULL, NULL)` > to match arguments types. OK, as you wish. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 40926a50..6e5a9255 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -124,7 +124,7 @@ static int tracee(const char *luacode) }; /* Allow tracing for this process. */ - if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { + if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0) { perror("Failed to turn the calling process into a tracee"); return EXIT_FAILURE; } ================================================================================ > > > > + perror("Failed to turn the calling thread into a tracee"); > > Typo?: s/thread/process/ > > Also, it's better to use `test_comment()` format here (i.e., with "#") > to be TAP-compatible. No, it's not. Tracee output is not TAP compatible, since the tracer is the only TAP output emitter. > > > > + return EXIT_FAILURE; > > > + } > > > + > > > + /* > > > + * XXX: Allow parent (which is our tracer now) to observe > > > + * our signal-delivery-stop (i.e. the tracee is ready). > > > + * For more info see ptrace(2), "Attaching and detaching". > > > + */ > > > + raise(SIGSTOP); > > > + > > > + lua_State *L = utils_lua_init(); > > > + > > > + /* Customize and start profiler. */ > > > + assert(stream_new(&opt) == PROFILE_SUCCESS); > > > + assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS); > > > + assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); > > > + assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); > > > + > > > + /* FIXME: Make this part test agnostic. */ > > Typo: s/test agnostic/test-agnostic/ > > Nit: TODO looks more valid for me (since it's working). > Feel free to ignore. Fixed. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 6e5a9255..44421812 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -144,7 +144,7 @@ static int tracee(const char *luacode) assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); - /* FIXME: Make this part test agnostic. */ + /* TODO: Make this part test-agnostic. */ assert(luaL_dostring(L, luacode) == LUA_OK); assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); ================================================================================ > > > > + assert(luaL_dostring(L, luacode) == LUA_OK); > > > + assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); > > > + > > > + /* Terminate profiler and Lua universe. */ > > > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > > > + utils_lua_close(L); > > It doesn't seem to be semantically correct to terminate Lua universe > > before obtaining the counters, even though there is no obstacles to do > > so in the API implementation itself. > > > + > > > + /* > > > + * XXX: The only event to be streamed must be FFUNC at > > > + * <lj_fff_res1> instruction. > > Typo: s/<lj_fff_res1>/the <lj_fff_res1>/ > > > > + * FIXME: Make this part test agnostic. > > Typo: s/test agnostic/test-agnostic/ > > Nit: TODO looks more valid for me (since it's working). > Feel free to ignore. Fixed. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 44421812..036f8141 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -153,8 +153,8 @@ static int tracee(const char *luacode) /* * XXX: The only event to be streamed must be FFUNC at - * <lj_fff_res1> instruction. - * FIXME: Make this part test agnostic. + * the <lj_fff_res1> instruction. + * TODO: Make this part test agnostic. */ assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); assert(counters.samples == 1); ================================================================================ > <snipped> > > > +const uint8_t INT3 = 0xCC; > > Maybe it should be moved up to other constants and defines? I'd rather prefer to leave it here to localize the scope of its usage. > > > > +static inline unsigned long int3poison(unsigned long instruction) { > > > + const size_t int3bits = sizeof(INT3) * 8; > > > + const unsigned long int3mask = -1UL >> int3bits << int3bits; > > Minor: It's better to use parentesises here. I just want to do tuda-suda to nil eight LSBs. Since operations has the same prio, it will look like (1 + 2) + 3. I see little enhancements, so ignoring. > > > > + return (instruction & int3mask) | INT3; > > > +} > > > + > > > +static int continue_until(pid_t chpid, void *addr) { > > > + int wstatus; > > > + struct user_regs_struct regs; > > > + > > > + /* Obtain the instructions at the <addr>. */ > > > + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); > > Nit: s/0/NULL/. > Here and below. Yes, honey :( ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 036f8141..6bf7562f 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -192,7 +192,7 @@ static void continue_until(pid_t chpid, void *addr) struct user_regs_struct regs; /* Obtain the instructions at the <addr>. */ - unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, NULL); /* * Emit the <int 3> instruction to the <addr>. * XXX: <int 3> is poisoned as the LSB to the <data> @@ -201,7 +201,7 @@ static void continue_until(pid_t chpid, void *addr) ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data)); /* Resume <chpid> tracee until SIGTRAP occurs. */ - ptrace(PTRACE_CONT, chpid, 0, 0); + ptrace(PTRACE_CONT, chpid, NULL, NULL); /* * Wait <chpid> tracee signal-delivery-stop and check @@ -210,7 +210,7 @@ static void continue_until(pid_t chpid, void *addr) wait_alive(chpid); /* Obtain GPR set to tweak RIP for further execution. */ - ptrace(PTRACE_GETREGS, chpid, 0, ®s); + ptrace(PTRACE_GETREGS, chpid, NULL, ®s); /* * Make sure we indeed are stopped at <addr>. * XXX: RIP points right after <int 3> instruction. @@ -223,7 +223,7 @@ static void continue_until(pid_t chpid, void *addr) * instruction at the <addr>. */ regs.rip -= sizeof(INT3); - ptrace(PTRACE_SETREGS, chpid, 0, ®s); + ptrace(PTRACE_SETREGS, chpid, NULL, ®s); ptrace(PTRACE_POKETEXT, chpid, addr, data); } ================================================================================ > <snipped> > > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > > + assert_false(WIFEXITED(wstatus)); > > > + assert_true(WIFSTOPPED(wstatus)); > > This part is duplicated, let's move it to a separate `healthcheck` > > function, or something like that. > > May be we can move these checks inside the `continue_until()` routine? > Do we want to check something except the status is `WIFSTOPPED(status)`? See the updated hunk in the previous reply. > <snipped> > > -- > Best regards, > Sergey Kaplun -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-05 12:04 ` Igor Munkin via Tarantool-patches @ 2023-12-05 12:25 ` Sergey Kaplun via Tarantool-patches 2023-12-05 12:59 ` Igor Munkin via Tarantool-patches 0 siblings, 1 reply; 17+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-12-05 12:25 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi, Igor! Thanks for the fixes! Please, consider my comments below. On 05.12.23, Igor Munkin wrote: > Sergey, > > Thanks for your review! > > On 05.12.23, Sergey Kaplun wrote: > > Hi, Igor! > > Thanks for the patch! > > Really awesome improvement for our sysprof testing! > > Please consider my comments below. > > > > On 03.12.23, Maxim Kokryashkin wrote: <snipped> > > > > +/* Sysprof "/dev/null" stream helpers. {{{ */ > > > > + > > > > +/* > > > > + * Yep, 8Mb. Tuned in order not to bother the platform with too > > > > + * often flushes. > > > > + */ > > > > This comment is misleading since there is no any flushes in the writer > > function. > > Well, it's not about I/O flushes, but rather about sysprof write-buffer > pressure, isn't it? Hence, 8Mb is needed to make the test omit almost > all buffer-flush related callbacks. Ok, got your point. So mention this in the comment instead. And remove "/dev/null" mentioning. > > > Also, do we need such big buffer? Maybe `PATH_MAX` * 2 (or something > > like that) should be enough? > > Could you elaborate the reason for such a strange constant, please? Considering your comment above this isn't relevant no more. Ignore please. > > > > > > > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > > > > +#define DEVNULL -1 > > > > `DEVNULL` name is misleading (like we've actually opened `/dev/null`). > > Maybe `DUMMY_FD` is better? Also, I suppose we can drop the `fd` field > > at all. We can just use static structure in this translation unit and > > compare pointer against it, so there is no need for an additional field > > fd in the structure itself. > > I see little reasoning in this suggestion: the approach with the > structure is quite straight-forward and copied from another sysprof test > implemented in C. Re naming: I prefer to leave it as is. I don't get why it is straight-forward. This approach with unused fields is just misleading. In <test/tarantool-c-tests/misclib-sysprof-capi.test.c> the `fd` field is used since the "/dev/null" is opened and data written to it. Also, since we can use custom writer, that test may be updated wo using `fd` too. But this should be done as a separate refactoring patch. > > In general, your nit is nice, but I believe it should be applied > for the patch moving <ptrace> tricks to utils library. I don't see any relations with ptrace here. It's just the usage of sysprof's C API. > > > > > > > + <snipped> > > > > Typo: s/<lj_fff_res1>/the <lj_fff_res1>/ > > > > > > + * FIXME: Make this part test agnostic. > > > > Typo: s/test agnostic/test-agnostic/ > > > > Nit: TODO looks more valid for me (since it's working). > > Feel free to ignore. > > Fixed. > > ================================================================================ > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > index 44421812..036f8141 100644 > --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > @@ -153,8 +153,8 @@ static int tracee(const char *luacode) > > /* > * XXX: The only event to be streamed must be FFUNC at > - * <lj_fff_res1> instruction. > - * FIXME: Make this part test agnostic. > + * the <lj_fff_res1> instruction. > + * TODO: Make this part test agnostic. Typo: s/test agnostic/test-agnostic/ > <snipped> > > > > -- > > Best regards, > > Sergey Kaplun > > -- > Best regards, > IM -- Best regards, Sergey Kaplun ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-05 12:25 ` Sergey Kaplun via Tarantool-patches @ 2023-12-05 12:59 ` Igor Munkin via Tarantool-patches 2023-12-05 15:08 ` Sergey Kaplun via Tarantool-patches 0 siblings, 1 reply; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-12-05 12:59 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches Sergey, On 05.12.23, Sergey Kaplun wrote: > Hi, Igor! > Thanks for the fixes! > Please, consider my comments below. > <snipped> > > > > > > > > > > > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > > > > > +#define DEVNULL -1 > > > > > > `DEVNULL` name is misleading (like we've actually opened `/dev/null`). > > > Maybe `DUMMY_FD` is better? Also, I suppose we can drop the `fd` field > > > at all. We can just use static structure in this translation unit and > > > compare pointer against it, so there is no need for an additional field > > > fd in the structure itself. > > > > I see little reasoning in this suggestion: the approach with the > > structure is quite straight-forward and copied from another sysprof test > > implemented in C. Re naming: I prefer to leave it as is. > > I don't get why it is straight-forward. This approach with unused fields > is just misleading. In > <test/tarantool-c-tests/misclib-sysprof-capi.test.c> the `fd` field is > used since the "/dev/null" is opened and data written to it. > > Also, since we can use custom writer, that test may be updated wo using > `fd` too. But this should be done as a separate refactoring patch. > > > > > In general, your nit is nice, but I believe it should be applied > > for the patch moving <ptrace> tricks to utils library. > > I don't see any relations with ptrace here. It's just the usage of > sysprof's C API. Well, I've finally got your original idea. Here it is: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 6bf7562f..8967abc4 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -53,35 +53,26 @@ extern void *lj_ff_tostring(void); extern void *lj_fff_res1(void); -/* Sysprof "/dev/null" stream helpers. {{{ */ +/* Sysprof dummy stream helpers. {{{ */ /* * Yep, 8Mb. Tuned in order not to bother the platform with too * often flushes. */ #define STREAM_BUFFER_SIZE (8 * 1024 * 1024) -#define DEVNULL -1 -struct devnull_ctx { - /* - * XXX: Dummy file descriptor to be used as "/dev/null" - * context indicator in writer and on_stop callback. - */ - int fd; +struct dummy_ctx { /* Buffer for data recorded by sysprof. */ uint8_t buf[STREAM_BUFFER_SIZE]; }; +static struct dummy_ctx context; + static int stream_new(struct luam_Sysprof_Options *options) { - struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); - if (ctx == NULL) - return PROFILE_ERRIO; - - /* Set "/dev/null" context indicator. */ - ctx->fd = DEVNULL; - options->ctx = ctx; - options->buf = ctx->buf; + /* Set dummy context. */ + options->ctx = &context; + options->buf = (uint8_t *)&context.buf; options->len = STREAM_BUFFER_SIZE; return PROFILE_SUCCESS; @@ -89,21 +80,19 @@ static int stream_new(struct luam_Sysprof_Options *options) static int stream_delete(void *rawctx, uint8_t *buf) { - struct devnull_ctx *ctx = rawctx; - assert(ctx->fd == DEVNULL); - free(ctx); + assert(rawctx == &context); + /* XXX: No need to release context memory. Just return. */ return PROFILE_SUCCESS; } static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { - struct devnull_ctx *ctx = rawctx; - assert(ctx->fd == DEVNULL); + assert(rawctx == &context); /* Do nothing, just return back to the profiler. */ return STREAM_BUFFER_SIZE; } -/* }}} Sysprof "/dev/null" stream helpers. */ +/* }}} Sysprof dummy stream helpers. */ static int tracee(const char *luacode) { ================================================================================ > <snipped> > > > > > > > Typo: s/<lj_fff_res1>/the <lj_fff_res1>/ > > > > > > > > + * FIXME: Make this part test agnostic. > > > > > > Typo: s/test agnostic/test-agnostic/ > > > > > > Nit: TODO looks more valid for me (since it's working). > > > Feel free to ignore. > > > > Fixed. > > > > ================================================================================ > > > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > index 44421812..036f8141 100644 > > --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > @@ -153,8 +153,8 @@ static int tracee(const char *luacode) > > > > /* > > * XXX: The only event to be streamed must be FFUNC at > > - * <lj_fff_res1> instruction. > > - * FIXME: Make this part test agnostic. > > + * the <lj_fff_res1> instruction. > > + * TODO: Make this part test agnostic. > > Typo: s/test agnostic/test-agnostic/ Fixed (again). ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 8967abc4..a1c4a3ed 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -143,7 +143,7 @@ static int tracee(const char *luacode) /* * XXX: The only event to be streamed must be FFUNC at * the <lj_fff_res1> instruction. - * TODO: Make this part test agnostic. + * TODO: Make this part test-agnostic. */ assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); assert(counters.samples == 1); ================================================================================ > <snipped> > > > -- > > > Best regards, > > > Sergey Kaplun > > > > -- > > Best regards, > > IM > > -- > Best regards, > Sergey Kaplun -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-05 12:59 ` Igor Munkin via Tarantool-patches @ 2023-12-05 15:08 ` Sergey Kaplun via Tarantool-patches 0 siblings, 0 replies; 17+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-12-05 15:08 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi, Igor! Thanks for the fixes! LGTM! -- Best regards, Sergey Kaplun ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-03 14:17 ` Maxim Kokryashkin via Tarantool-patches 2023-12-05 8:37 ` Sergey Kaplun via Tarantool-patches @ 2023-12-05 11:34 ` Igor Munkin via Tarantool-patches 2023-12-05 13:23 ` Maxim Kokryashkin via Tarantool-patches 1 sibling, 1 reply; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-12-05 11:34 UTC (permalink / raw) To: Maxim Kokryashkin; +Cc: tarantool-patches Max, Thanks for your review! On 03.12.23, Maxim Kokryashkin wrote: > Hi, Igor! > Thanks for the patch! > Please consider my comments below. > > Side note: This new testing approach is amazing, thanks a lot! > > On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: > > Often tests for sampling profiler require long running loops to be > Typo: s/Often/Often,/ > > executed, so a certain situation is likely to occur. Thus the test added > Typo: s/Thus/Thus,/ > > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix > Typo: s/in the/in/ > > crash during FFUNC stream") expects FFUNC VM state (and even the > Typo: s/expects FFUNC/expects the FFUNC/ > > particular instruction to be executed) at the moment when stacktrace is > > being collected. Unfortunately, it leads to test routine hang for > Typo: s/to test routine hang/to the test routine hanging/ > > several environments and if it does not we cannot guarantee that the > Typo: s/environments/environments,/ > Typo: s/and if it does not/and even if it does not,/ > > desired scenario is tested (only rely on statistics). As a result the > Typo: s/As a result/As a result,/ > > test for the aforementioned patch was disabled for Tarantool CI in the > > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging > > Tarantool CI by sysprof test") until the issue is not resolved. > > > > This patch introduces the new approach for testing our sampling profiler > > via <ptrace> implementing precise managed execution of the profiled > Typo: s/<ptrace>/<ptrace>,/ > > instance mentioned in tarantool/tarantool#9387. > > > > Instead of running around <tostring> gazillion times we accurately step > Typo: s/times/times,/ > > to the exact place where the issue reproduces and manually emit SIGPROF > > to the Lua VM being profiled. The particular approach implemented in > > this patch is described below. > > > > As it was mentioned, the test makes sysprof to collect the particular > Typo: s/to collect/collect/ > > event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to > > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to > Typo: s/Hence/Hence,/ > > call <tostring> fast function in the profiled instance (i.e. "tracee"). > Typo: s/i.e./i.e.,/ > > To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin, > Typo: s/in scope/in the scope/ > > the manager (i.e. "tracer") is implemented. > Typo: s/i.e./i.e.,/ > > > > Here are the main steps (see comments and `man 2 ptrace' for more info): > > 1. Poison <int 3> instruction as the first instruction at > > <lj_ff_tostring> to stop at the beginning of the fast function; > > 2. Resume the "tracee" from the "tracer"; > > 3. Hit the emitted interruption, restore the original instruction and > > "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; > > 4. Do the hack 1-3 for <lj_fff_res1>; > > 5. Emit SIGPROF while resumimg the "tracee"; > Typo: s/resumimg/resuming/ > > > > As a result sysprof collects the full backtrace with <tostring> fast > Typo: s/a result/a result,/ > > function as the topmost frame. > > > > Resolves tarantool/tarantool#9387 > > Follows up tarantool/tarantool#8594 > > > > Signed-off-by: Igor Munkin <imun@tarantool.org> Here is the updated commit message: ================================================================================ test: rewrite sysprof test using managed execution Often, tests for sampling profiler require long running loops to be executed, so a certain situation is likely to occur. Thus, the test added in commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even the particular instruction to be executed) at the moment when stacktrace is being collected. Unfortunately, it leads to the test routine hang for several environments, and even if it does not, we cannot guarantee that the desired scenario is tested (only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in the commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging Tarantool CI by sysprof test") until the issue is not resolved. This patch introduces the new approach for testing our sampling profiler via <ptrace>, implementing precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running around <tostring> gazillion times, we accurately step to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below. As it was mentioned, the test makes sysprof collect the particular event type (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to call <tostring> fast function in the profiled instance (i.e., "tracee"). To emit SIGPROF right at <lj_fff_res1> in the scope of <tostring> builtin, the manager (i.e., "tracer") is implemented. Here are the main steps (see comments and `man 2 ptrace' for more info): 1. Poison <int 3> instruction as the first instruction at <lj_ff_tostring> to stop at the beginning of the fast function; 2. Resume the "tracee" from the "tracer"; 3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; 4. Do the hack 1-3 for <lj_fff_res1>; 5. Emit SIGPROF while resuming the "tracee"; As a result, sysprof collects the full backtrace with <tostring> fast function as the topmost frame. Resolves tarantool/tarantool#9387 Follows up tarantool/tarantool#8594 ================================================================================ > > --- > > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > > 2 files changed, 269 insertions(+), 55 deletions(-) > > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > new file mode 100644 > > index 00000000..4caed8cb > > --- /dev/null > > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > @@ -0,0 +1,269 @@ > > +#include "lauxlib.h" > > +#include "lmisclib.h" > > +#include "lua.h" > > + > > +#include "test.h" > > +#include "utils.h" > > + > > +#include <signal.h> > > +#include <sys/ptrace.h> > > +#include <sys/user.h> > > +#include <sys/wait.h> > > +#include <unistd.h> > > + > > +/* XXX: Still need normal assert inside <tracee> and helpers. */ > > +#undef NDEBUG > > +#include <assert.h> > > + > > +/* > > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so > > + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS > > + * and LUAJIT_TARGET. > > + */ > > +#include "lj_arch.h" > > + > > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 > Why not use the `LJ_HASSYSPROF` flag? Since the test per se is Linux/x86_64 specific (consider the particular register name used below). E.g. sysprof is implemented for x86 either, but the test is not. If somebody (Sergey Kaplun) finally decided to introduce x86 testing environment, the test would crash. I prefer more developer-friendly approach here (considering <ptrace> peculiarities). > > + > > +/* > > + * XXX: The test makes sysprof to collect the particular event > Typo: s/sysprof to/sysprof/ > > + * (FFUNC) at the particular instruction (<lj_fff_res1>) to > > + * reproduce the issue #8594. Hence it's enough to call <tostring> > Typo: s/Hence/Hence,/ > > + * fast function (this is done in <tracee> function). To emit > > + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast > > + * function, the managed execution is implemented in the function > > + * <tracer>: <int 3> instruction is poisoned as the first > > + * instruction at <lj_ff_tostring> to stop <tracee> at the > > + * beginning of the fast function; <tracer> resumes the <tracee>; > > + * the same hack is done for <lj_fff_res1>. When the <tracee> hits > > + * the interruption at <lj_fff_res1>, SIGPROF is emitted while > > + * resuming the <tracee>. As a result sysprof collects the full > Typo: s/a result/a result,/ > > + * backtrace with <tostring> fast function as the topmost frame. > > + * > > + * See more info here: > > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html > > + * * https://github.com/tarantool/tarantool/issues/8594 > > + * * https://github.com/tarantool/tarantool/issues/9387 > > + */ Fixed. The diff is below: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 4caed8cb..f060760b 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -25,18 +25,18 @@ #if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 /* - * XXX: The test makes sysprof to collect the particular event + * XXX: The test makes sysprof collect the particular event * (FFUNC) at the particular instruction (<lj_fff_res1>) to - * reproduce the issue #8594. Hence it's enough to call <tostring> - * fast function (this is done in <tracee> function). To emit - * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast - * function, the managed execution is implemented in the function - * <tracer>: <int 3> instruction is poisoned as the first + * reproduce the issue #8594. Hence, it's enough to call + * <tostring> fast function (this is done in <tracee> function). + * To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> + * fast function, the managed execution is implemented in the + * function <tracer>: <int 3> instruction is poisoned as the first * instruction at <lj_ff_tostring> to stop <tracee> at the * beginning of the fast function; <tracer> resumes the <tracee>; * the same hack is done for <lj_fff_res1>. When the <tracee> hits * the interruption at <lj_fff_res1>, SIGPROF is emitted while - * resuming the <tracee>. As a result sysprof collects the full + * resuming the <tracee>. As a result, sysprof collects the full * backtrace with <tostring> fast function as the topmost frame. * * See more info here: ================================================================================ > > + <snipped> > > + > > + /* Allow tracing for this process */ > > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { > > + perror("Failed to turn the calling thread into a tracee"); > Typo?: s/thread/process/ Yeah, copy-pasted from the <ptrace> recipe. Fixed. > > + return EXIT_FAILURE; > > + } <snipped> > > + /* Terminate profiler and Lua universe. */ > > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > > + utils_lua_close(L); > It doesn't seem to be semantically correct to terminate Lua universe > before obtaining the counters, even though there is no obstacles to do > so in the API implementation itself. To[MAH]to vs.To[MAY]to: fixed. The diff is below. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index c0f9d1ca..5a2523ef 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -142,9 +142,8 @@ static int tracee(const char *luacode) { assert(luaL_dostring(L, luacode) == LUA_OK); assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); - /* Terminate profiler and Lua universe. */ + /* Terminate profiler. */ assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); - utils_lua_close(L); /* * XXX: The only event to be streamed must be FFUNC at @@ -155,6 +154,9 @@ static int tracee(const char *luacode) { assert(counters.samples == 1); assert(counters.vmst_ffunc == 1); + /* Terminate Lua universe. */ + utils_lua_close(L); + return EXIT_SUCCESS; } ================================================================================ > > + <snipped> > > + /* Resume <chpid> tracee until SIGTRAP occurs. */ > > + ptrace(PTRACE_CONT, chpid, 0, 0); > > + /* Wait <chpid> tracee signal-delivery-stop. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the `wstatus` here too? Implemented via <wait_alive> helper (see the details below). > > + <snipped> > > + > > +static int tracer(pid_t chpid) { > > + int wstatus; > > + > > + /* Wait until <chpid> tracee is ready. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the process status here too? Implemented via <wait_alive> helper (see the implementation below). ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 84c024f0..d17386dd 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -222,7 +222,7 @@ static int tracer(pid_t chpid) { int wstatus; /* Wait until <chpid> tracee is ready. */ - waitpid(chpid, &wstatus, 0); + wait_alive(chpid); /* * Resume <chpid> tracee until <lj_ff_tostring>. ================================================================================ > > + > > + /* Resume <chpid> tracee until <lj_ff_tostring>. */ > > + wstatus = continue_until(chpid, lj_ff_tostring); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > > + > > + /* Resume <chpid> tracee until <lj_fff_res1>. */ > > + wstatus = continue_until(chpid, lj_fff_res1); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > This part is duplicated, let's move it to a separate `healthcheck` > function, or something like that. It's only duplicated in sense of reusing this particular code, no more. However, let's consider these conditions are not parts of the test assertions set, but rather invariants for <continue_until> helper (i.e. <continue_until> has to yield the execution with the "alive" tracee). In this case, the patch for <continue_until> is the following: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 5a2523ef..84c024f0 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -160,6 +160,18 @@ static int tracee(const char *luacode) { return EXIT_SUCCESS; } +static void wait_alive(pid_t chpid) { + int wstatus; + + /* Wait <chpid> tracee signal-delivery-stop. */ + waitpid(chpid, &wstatus, 0); + + /* Check the tracee is still alive and just stopped. */ + assert(!WIFEXITED(wstatus)); + assert(!WIFSIGNALED(wstatus)); + assert(WIFSTOPPED(wstatus)); +} + const uint8_t INT3 = 0xCC; static inline unsigned long int3poison(unsigned long instruction) { const size_t int3bits = sizeof(INT3) * 8; @@ -167,8 +179,7 @@ static inline unsigned long int3poison(unsigned long instruction) { return (instruction & int3mask) | INT3; } -static int continue_until(pid_t chpid, void *addr) { - int wstatus; +static void continue_until(pid_t chpid, void *addr) { struct user_regs_struct regs; /* Obtain the instructions at the <addr>. */ @@ -182,8 +193,12 @@ static int continue_until(pid_t chpid, void *addr) { /* Resume <chpid> tracee until SIGTRAP occurs. */ ptrace(PTRACE_CONT, chpid, 0, 0); - /* Wait <chpid> tracee signal-delivery-stop. */ - waitpid(chpid, &wstatus, 0); + + /* + * Wait <chpid> tracee signal-delivery-stop and check + * whether it's still alive and just stopped via SIGTRAP. + */ + wait_alive(chpid); /* Obtain GPR set to tweak RIP for further execution. */ ptrace(PTRACE_GETREGS, chpid, 0, ®s); @@ -201,9 +216,6 @@ static int continue_until(pid_t chpid, void *addr) { regs.rip -= sizeof(INT3); ptrace(PTRACE_SETREGS, chpid, 0, ®s); ptrace(PTRACE_POKETEXT, chpid, addr, data); - - /* Return wait status to the caller for test checks. */ - return wstatus; } static int tracer(pid_t chpid) { @@ -212,19 +224,19 @@ static int tracer(pid_t chpid) { /* Wait until <chpid> tracee is ready. */ waitpid(chpid, &wstatus, 0); - /* Resume <chpid> tracee until <lj_ff_tostring>. */ - wstatus = continue_until(chpid, lj_ff_tostring); - - /* The tracee has to be alive and stopped by SIGTRAP. */ - assert_false(WIFEXITED(wstatus)); - assert_true(WIFSTOPPED(wstatus)); - - /* Resume <chpid> tracee until <lj_fff_res1>. */ - wstatus = continue_until(chpid, lj_fff_res1); + /* + * Resume <chpid> tracee until <lj_ff_tostring>. + * The tracee is alive and stopped by SIGTRAP if + * <continue_until> returns. + */ + continue_until(chpid, lj_ff_tostring); - /* The tracee has to be alive and stopped by SIGTRAP. */ - assert_false(WIFEXITED(wstatus)); - assert_true(WIFSTOPPED(wstatus)); + /* + * Resume <chpid> tracee until <lj_fff_res1>. + * The tracee is alive and stopped by SIGTRAP if + * <continue_until> returns. + */ + continue_until(chpid, lj_fff_res1); /* Send SIGPROF to make sysprof collect the event. */ ptrace(PTRACE_CONT, chpid, 0, SIGPROF); ================================================================================ > > + <snipped> -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution 2023-12-05 11:34 ` Igor Munkin via Tarantool-patches @ 2023-12-05 13:23 ` Maxim Kokryashkin via Tarantool-patches 0 siblings, 0 replies; 17+ messages in thread From: Maxim Kokryashkin via Tarantool-patches @ 2023-12-05 13:23 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches [-- Attachment #1: Type: text/plain, Size: 18913 bytes --] Hi, Igor! Thanks for the fixes! LGTM -- Best regards, Maxim Kokryashkin > >>Max, >> >>Thanks for your review! >> >>On 03.12.23, Maxim Kokryashkin wrote: >>> Hi, Igor! >>> Thanks for the patch! >>> Please consider my comments below. >>> >>> Side note: This new testing approach is amazing, thanks a lot! >>> >>> On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: >>> > Often tests for sampling profiler require long running loops to be >>> Typo: s/Often/Often,/ >>> > executed, so a certain situation is likely to occur. Thus the test added >>> Typo: s/Thus/Thus,/ >>> > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix >>> Typo: s/in the/in/ >>> > crash during FFUNC stream") expects FFUNC VM state (and even the >>> Typo: s/expects FFUNC/expects the FFUNC/ >>> > particular instruction to be executed) at the moment when stacktrace is >>> > being collected. Unfortunately, it leads to test routine hang for >>> Typo: s/to test routine hang/to the test routine hanging/ >>> > several environments and if it does not we cannot guarantee that the >>> Typo: s/environments/environments,/ >>> Typo: s/and if it does not/and even if it does not,/ >>> > desired scenario is tested (only rely on statistics). As a result the >>> Typo: s/As a result/As a result,/ >>> > test for the aforementioned patch was disabled for Tarantool CI in the >>> > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging >>> > Tarantool CI by sysprof test") until the issue is not resolved. >>> > >>> > This patch introduces the new approach for testing our sampling profiler >>> > via <ptrace> implementing precise managed execution of the profiled >>> Typo: s/<ptrace>/<ptrace>,/ >>> > instance mentioned in tarantool/tarantool#9387. >>> > >>> > Instead of running around <tostring> gazillion times we accurately step >>> Typo: s/times/times,/ >>> > to the exact place where the issue reproduces and manually emit SIGPROF >>> > to the Lua VM being profiled. The particular approach implemented in >>> > this patch is described below. >>> > >>> > As it was mentioned, the test makes sysprof to collect the particular >>> Typo: s/to collect/collect/ >>> > event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to >>> > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to >>> Typo: s/Hence/Hence,/ >>> > call <tostring> fast function in the profiled instance (i.e. "tracee"). >>> Typo: s/i.e./i.e.,/ >>> > To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin, >>> Typo: s/in scope/in the scope/ >>> > the manager (i.e. "tracer") is implemented. >>> Typo: s/i.e./i.e.,/ >>> > >>> > Here are the main steps (see comments and `man 2 ptrace' for more info): >>> > 1. Poison <int 3> instruction as the first instruction at >>> > <lj_ff_tostring> to stop at the beginning of the fast function; >>> > 2. Resume the "tracee" from the "tracer"; >>> > 3. Hit the emitted interruption, restore the original instruction and >>> > "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; >>> > 4. Do the hack 1-3 for <lj_fff_res1>; >>> > 5. Emit SIGPROF while resumimg the "tracee"; >>> Typo: s/resumimg/resuming/ >>> > >>> > As a result sysprof collects the full backtrace with <tostring> fast >>> Typo: s/a result/a result,/ >>> > function as the topmost frame. >>> > >>> > Resolves tarantool/tarantool#9387 >>> > Follows up tarantool/tarantool#8594 >>> > >>> > Signed-off-by: Igor Munkin < imun@tarantool.org > >> >>Here is the updated commit message: >> >>================================================================================ >> >>test: rewrite sysprof test using managed execution >> >>Often, tests for sampling profiler require long running loops to be >>executed, so a certain situation is likely to occur. Thus, the test >>added in commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix >>crash during FFUNC stream") expects the FFUNC VM state (and even the >>particular instruction to be executed) at the moment when stacktrace is >>being collected. Unfortunately, it leads to the test routine hang for >>several environments, and even if it does not, we cannot guarantee that >>the desired scenario is tested (only rely on statistics). As a result, >>the test for the aforementioned patch was disabled for Tarantool CI in >>the commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent >>hanging Tarantool CI by sysprof test") until the issue is not resolved. >> >>This patch introduces the new approach for testing our sampling profiler >>via <ptrace>, implementing precise managed execution of the profiled >>instance mentioned in tarantool/tarantool#9387. >> >>Instead of running around <tostring> gazillion times, we accurately step >>to the exact place where the issue reproduces and manually emit SIGPROF >>to the Lua VM being profiled. The particular approach implemented in >>this patch is described below. >> >>As it was mentioned, the test makes sysprof collect the particular event >>type (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to >>reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to >>call <tostring> fast function in the profiled instance (i.e., "tracee"). >>To emit SIGPROF right at <lj_fff_res1> in the scope of <tostring> >>builtin, the manager (i.e., "tracer") is implemented. >> >>Here are the main steps (see comments and `man 2 ptrace' for more info): >> 1. Poison <int 3> instruction as the first instruction at >> <lj_ff_tostring> to stop at the beginning of the fast function; >> 2. Resume the "tracee" from the "tracer"; >> 3. Hit the emitted interruption, restore the original instruction and >> "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>; >> 4. Do the hack 1-3 for <lj_fff_res1>; >> 5. Emit SIGPROF while resuming the "tracee"; >> >>As a result, sysprof collects the full backtrace with <tostring> fast >>function as the topmost frame. >> >>Resolves tarantool/tarantool#9387 >>Follows up tarantool/tarantool#8594 >> >>================================================================================ >> >>> > --- >>> > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ >>> > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- >>> > 2 files changed, 269 insertions(+), 55 deletions(-) >>> > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>> > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >>> > >>> > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>> > new file mode 100644 >>> > index 00000000..4caed8cb >>> > --- /dev/null >>> > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>> > @@ -0,0 +1,269 @@ >>> > +#include "lauxlib.h" >>> > +#include "lmisclib.h" >>> > +#include "lua.h" >>> > + >>> > +#include "test.h" >>> > +#include "utils.h" >>> > + >>> > +#include <signal.h> >>> > +#include <sys/ptrace.h> >>> > +#include <sys/user.h> >>> > +#include <sys/wait.h> >>> > +#include <unistd.h> >>> > + >>> > +/* XXX: Still need normal assert inside <tracee> and helpers. */ >>> > +#undef NDEBUG >>> > +#include <assert.h> >>> > + >>> > +/* >>> > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so >>> > + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS >>> > + * and LUAJIT_TARGET. >>> > + */ >>> > +#include "lj_arch.h" >>> > + >>> > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 >>> Why not use the `LJ_HASSYSPROF` flag? >> >>Since the test per se is Linux/x86_64 specific (consider the particular >>register name used below). E.g. sysprof is implemented for x86 either, >>but the test is not. If somebody (Sergey Kaplun) finally decided to >>introduce x86 testing environment, the test would crash. I prefer more >>developer-friendly approach here (considering <ptrace> peculiarities). >> >>> > + >>> > +/* >>> > + * XXX: The test makes sysprof to collect the particular event >>> Typo: s/sysprof to/sysprof/ >>> > + * (FFUNC) at the particular instruction (<lj_fff_res1>) to >>> > + * reproduce the issue #8594. Hence it's enough to call <tostring> >>> Typo: s/Hence/Hence,/ >>> > + * fast function (this is done in <tracee> function). To emit >>> > + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast >>> > + * function, the managed execution is implemented in the function >>> > + * <tracer>: <int 3> instruction is poisoned as the first >>> > + * instruction at <lj_ff_tostring> to stop <tracee> at the >>> > + * beginning of the fast function; <tracer> resumes the <tracee>; >>> > + * the same hack is done for <lj_fff_res1>. When the <tracee> hits >>> > + * the interruption at <lj_fff_res1>, SIGPROF is emitted while >>> > + * resuming the <tracee>. As a result sysprof collects the full >>> Typo: s/a result/a result,/ >>> > + * backtrace with <tostring> fast function as the topmost frame. >>> > + * >>> > + * See more info here: >>> > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html >>> > + * * https://github.com/tarantool/tarantool/issues/8594 >>> > + * * https://github.com/tarantool/tarantool/issues/9387 >>> > + */ >> >>Fixed. The diff is below: >> >>================================================================================ >> >>diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>index 4caed8cb..f060760b 100644 >>--- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>+++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>@@ -25,18 +25,18 @@ >> #if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 >> >> /* >>- * XXX: The test makes sysprof to collect the particular event >>+ * XXX: The test makes sysprof collect the particular event >> * (FFUNC) at the particular instruction (<lj_fff_res1>) to >>- * reproduce the issue #8594. Hence it's enough to call <tostring> >>- * fast function (this is done in <tracee> function). To emit >>- * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast >>- * function, the managed execution is implemented in the function >>- * <tracer>: <int 3> instruction is poisoned as the first >>+ * reproduce the issue #8594. Hence, it's enough to call >>+ * <tostring> fast function (this is done in <tracee> function). >>+ * To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> >>+ * fast function, the managed execution is implemented in the >>+ * function <tracer>: <int 3> instruction is poisoned as the first >> * instruction at <lj_ff_tostring> to stop <tracee> at the >> * beginning of the fast function; <tracer> resumes the <tracee>; >> * the same hack is done for <lj_fff_res1>. When the <tracee> hits >> * the interruption at <lj_fff_res1>, SIGPROF is emitted while >>- * resuming the <tracee>. As a result sysprof collects the full >>+ * resuming the <tracee>. As a result, sysprof collects the full >> * backtrace with <tostring> fast function as the topmost frame. >> * >> * See more info here: >> >>================================================================================ >> >>> > + >> >><snipped> >> >>> > + >>> > + /* Allow tracing for this process */ >>> > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { >>> > + perror("Failed to turn the calling thread into a tracee"); >>> Typo?: s/thread/process/ >> >>Yeah, copy-pasted from the <ptrace> recipe. Fixed. >> >>> > + return EXIT_FAILURE; >>> > + } >> >><snipped> >> >>> > + /* Terminate profiler and Lua universe. */ >>> > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); >>> > + utils_lua_close(L); >>> It doesn't seem to be semantically correct to terminate Lua universe >>> before obtaining the counters, even though there is no obstacles to do >>> so in the API implementation itself. >> >>To[MAH]to vs.To[MAY]to: fixed. The diff is below. >> >>================================================================================ >> >>diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>index c0f9d1ca..5a2523ef 100644 >>--- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>+++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>@@ -142,9 +142,8 @@ static int tracee(const char *luacode) { >> assert(luaL_dostring(L, luacode) == LUA_OK); >> assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); >> >>- /* Terminate profiler and Lua universe. */ >>+ /* Terminate profiler. */ >> assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); >>- utils_lua_close(L); >> >> /* >> * XXX: The only event to be streamed must be FFUNC at >>@@ -155,6 +154,9 @@ static int tracee(const char *luacode) { >> assert(counters.samples == 1); >> assert(counters.vmst_ffunc == 1); >> >>+ /* Terminate Lua universe. */ >>+ utils_lua_close(L); >>+ >> return EXIT_SUCCESS; >> } >> >> >>================================================================================ >> >>> > + >> >><snipped> >> >>> > + /* Resume <chpid> tracee until SIGTRAP occurs. */ >>> > + ptrace(PTRACE_CONT, chpid, 0, 0); >>> > + /* Wait <chpid> tracee signal-delivery-stop. */ >>> > + waitpid(chpid, &wstatus, 0); >>> Should we check the `wstatus` here too? >> >>Implemented via <wait_alive> helper (see the details below). >> >>> > + >> >><snipped> >> >>> > + >>> > +static int tracer(pid_t chpid) { >>> > + int wstatus; >>> > + >>> > + /* Wait until <chpid> tracee is ready. */ >>> > + waitpid(chpid, &wstatus, 0); >>> Should we check the process status here too? >> >>Implemented via <wait_alive> helper (see the implementation below). >> >>================================================================================ >> >>diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>index 84c024f0..d17386dd 100644 >>--- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>+++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>@@ -222,7 +222,7 @@ static int tracer(pid_t chpid) { >> int wstatus; >> >> /* Wait until <chpid> tracee is ready. */ >>- waitpid(chpid, &wstatus, 0); >>+ wait_alive(chpid); >> >> /* >> * Resume <chpid> tracee until <lj_ff_tostring>. >> >>================================================================================ >> >>> > + >>> > + /* Resume <chpid> tracee until <lj_ff_tostring>. */ >>> > + wstatus = continue_until(chpid, lj_ff_tostring); >>> > + >>> > + /* The tracee has to be alive and stopped by SIGTRAP. */ >>> > + assert_false(WIFEXITED(wstatus)); >>> > + assert_true(WIFSTOPPED(wstatus)); >>> > + >>> > + /* Resume <chpid> tracee until <lj_fff_res1>. */ >>> > + wstatus = continue_until(chpid, lj_fff_res1); >>> > + >>> > + /* The tracee has to be alive and stopped by SIGTRAP. */ >>> > + assert_false(WIFEXITED(wstatus)); >>> > + assert_true(WIFSTOPPED(wstatus)); >>> This part is duplicated, let's move it to a separate `healthcheck` >>> function, or something like that. >> >>It's only duplicated in sense of reusing this particular code, no more. >>However, let's consider these conditions are not parts of the test >>assertions set, but rather invariants for <continue_until> helper (i.e. >><continue_until> has to yield the execution with the "alive" tracee). >> >>In this case, the patch for <continue_until> is the following: >> >>================================================================================ >> >>diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>index 5a2523ef..84c024f0 100644 >>--- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>+++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c >>@@ -160,6 +160,18 @@ static int tracee(const char *luacode) { >> return EXIT_SUCCESS; >> } >> >>+static void wait_alive(pid_t chpid) { >>+ int wstatus; >>+ >>+ /* Wait <chpid> tracee signal-delivery-stop. */ >>+ waitpid(chpid, &wstatus, 0); >>+ >>+ /* Check the tracee is still alive and just stopped. */ >>+ assert(!WIFEXITED(wstatus)); >>+ assert(!WIFSIGNALED(wstatus)); >>+ assert(WIFSTOPPED(wstatus)); >>+} >>+ >> const uint8_t INT3 = 0xCC; >> static inline unsigned long int3poison(unsigned long instruction) { >> const size_t int3bits = sizeof(INT3) * 8; >>@@ -167,8 +179,7 @@ static inline unsigned long int3poison(unsigned long instruction) { >> return (instruction & int3mask) | INT3; >> } >> >>-static int continue_until(pid_t chpid, void *addr) { >>- int wstatus; >>+static void continue_until(pid_t chpid, void *addr) { >> struct user_regs_struct regs; >> >> /* Obtain the instructions at the <addr>. */ >>@@ -182,8 +193,12 @@ static int continue_until(pid_t chpid, void *addr) { >> >> /* Resume <chpid> tracee until SIGTRAP occurs. */ >> ptrace(PTRACE_CONT, chpid, 0, 0); >>- /* Wait <chpid> tracee signal-delivery-stop. */ >>- waitpid(chpid, &wstatus, 0); >>+ >>+ /* >>+ * Wait <chpid> tracee signal-delivery-stop and check >>+ * whether it's still alive and just stopped via SIGTRAP. >>+ */ >>+ wait_alive(chpid); >> >> /* Obtain GPR set to tweak RIP for further execution. */ >> ptrace(PTRACE_GETREGS, chpid, 0, ®s); >>@@ -201,9 +216,6 @@ static int continue_until(pid_t chpid, void *addr) { >> regs.rip -= sizeof(INT3); >> ptrace(PTRACE_SETREGS, chpid, 0, ®s); >> ptrace(PTRACE_POKETEXT, chpid, addr, data); >>- >>- /* Return wait status to the caller for test checks. */ >>- return wstatus; >> } >> >> static int tracer(pid_t chpid) { >>@@ -212,19 +224,19 @@ static int tracer(pid_t chpid) { >> /* Wait until <chpid> tracee is ready. */ >> waitpid(chpid, &wstatus, 0); >> >>- /* Resume <chpid> tracee until <lj_ff_tostring>. */ >>- wstatus = continue_until(chpid, lj_ff_tostring); >>- >>- /* The tracee has to be alive and stopped by SIGTRAP. */ >>- assert_false(WIFEXITED(wstatus)); >>- assert_true(WIFSTOPPED(wstatus)); >>- >>- /* Resume <chpid> tracee until <lj_fff_res1>. */ >>- wstatus = continue_until(chpid, lj_fff_res1); >>+ /* >>+ * Resume <chpid> tracee until <lj_ff_tostring>. >>+ * The tracee is alive and stopped by SIGTRAP if >>+ * <continue_until> returns. >>+ */ >>+ continue_until(chpid, lj_ff_tostring); >> >>- /* The tracee has to be alive and stopped by SIGTRAP. */ >>- assert_false(WIFEXITED(wstatus)); >>- assert_true(WIFSTOPPED(wstatus)); >>+ /* >>+ * Resume <chpid> tracee until <lj_fff_res1>. >>+ * The tracee is alive and stopped by SIGTRAP if >>+ * <continue_until> returns. >>+ */ >>+ continue_until(chpid, lj_fff_res1); >> >> /* Send SIGPROF to make sysprof collect the event. */ >> ptrace(PTRACE_CONT, chpid, 0, SIGPROF); >> >>================================================================================ >> >>> > + >> >><snipped> >> >>-- >>Best regards, >>IM > [-- Attachment #2: Type: text/html, Size: 22208 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests 2023-11-28 14:53 [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution Igor Munkin via Tarantool-patches @ 2023-11-28 16:14 ` Sergey Bronnikov via Tarantool-patches 2024-01-10 8:50 ` Igor Munkin via Tarantool-patches 3 siblings, 0 replies; 17+ messages in thread From: Sergey Bronnikov via Tarantool-patches @ 2023-11-28 16:14 UTC (permalink / raw) To: Igor Munkin, Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches Hello, On 11/28/23 17:53, Igor Munkin via Tarantool-patches wrote: > Hello there, > > The latter patch of the patchset provides the new approach for > deterministic testing for our sampling profiler. See more info in the > commit message. > > The first patch fixes the issue occurred while reimplementing the > sysprof test via fork(3) + ptrace(2): the output for prove was buffered > and hence duplicated when fork is done. I decided to turn of buffering probably you mean: turn off <snipped> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests 2023-11-28 14:53 [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Igor Munkin via Tarantool-patches ` (2 preceding siblings ...) 2023-11-28 16:14 ` [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Sergey Bronnikov via Tarantool-patches @ 2024-01-10 8:50 ` Igor Munkin via Tarantool-patches 3 siblings, 0 replies; 17+ messages in thread From: Igor Munkin via Tarantool-patches @ 2024-01-10 8:50 UTC (permalink / raw) To: Maxim Kokryashkin, Sergey Kaplun; +Cc: tarantool-patches Pals, thanks for your reviews! I've checked the patchset into all long-term branches in tarantool/luajit and bumped a new version in master, release/2.11 and release/2.10. On 28.11.23, Igor Munkin wrote: > Hello there, > > The latter patch of the patchset provides the new approach for > deterministic testing for our sampling profiler. See more info in the > commit message. > > The first patch fixes the issue occurred while reimplementing the > sysprof test via fork(3) + ptrace(2): the output for prove was buffered > and hence duplicated when fork is done. I decided to turn of buffering > at all, since there is little sense in it for tests and nobody wants to > debug many (still unrevealed) related problems. > > I hope the approach will be moved to our utils for tests written in C, > but I do not see the whole picture at the moment, so the approach is > implemented for the only test being affected by the patch for #8594. > > Branch: https://github.com/tarantool/luajit/commits/imun/sysprof-ptrace-ffunc-test > CI: https://github.com/tarantool/luajit/commit/b48b905 > Tarantool CI: https://github.com/tarantool/tarantool/pull/9424 > Related issues: > * https://github.com/tarantool/tarantool/issues/9387 > * https://github.com/tarantool/tarantool/issues/8594 > * https://github.com/tarantool/tarantool/issues/7900 > > Igor Munkin (2): > test: disable buffering for the C test engine > test: rewrite sysprof test using managed execution > > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > test/tarantool-c-tests/test.c | 6 + > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > 3 files changed, 275 insertions(+), 55 deletions(-) > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > -- > 2.39.2 > -- Best regards, IM ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-01-10 8:57 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-11-28 14:53 [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 1/2] test: disable buffering for the C test engine Igor Munkin via Tarantool-patches 2023-12-03 12:25 ` Maxim Kokryashkin via Tarantool-patches 2023-12-04 9:48 ` Igor Munkin via Tarantool-patches 2023-12-04 8:46 ` Sergey Kaplun via Tarantool-patches 2023-12-04 9:50 ` Igor Munkin via Tarantool-patches 2023-11-28 14:53 ` [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution Igor Munkin via Tarantool-patches 2023-12-03 14:17 ` Maxim Kokryashkin via Tarantool-patches 2023-12-05 8:37 ` Sergey Kaplun via Tarantool-patches 2023-12-05 12:04 ` Igor Munkin via Tarantool-patches 2023-12-05 12:25 ` Sergey Kaplun via Tarantool-patches 2023-12-05 12:59 ` Igor Munkin via Tarantool-patches 2023-12-05 15:08 ` Sergey Kaplun via Tarantool-patches 2023-12-05 11:34 ` Igor Munkin via Tarantool-patches 2023-12-05 13:23 ` Maxim Kokryashkin via Tarantool-patches 2023-11-28 16:14 ` [Tarantool-patches] [PATCH luajit 0/2] Use ptrace for sysprof tests Sergey Bronnikov via Tarantool-patches 2024-01-10 8:50 ` Igor Munkin via Tarantool-patches
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox