From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 3B483825AC1; Sun, 3 Dec 2023 17:17:15 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3B483825AC1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1701613035; bh=iWtwNMdXMo/UZflqMVIf/C9Yz+DwxI6+c1OWzl/kQ4Y=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=enG61/S/yeYqbi4iPVHaSA2zWInO7Mc4UnyOfcNX/f2u8+RCvtcy9Bm9w3BdVEeYz 6Ir+JHkoHIm+Kd8Qujec2Y1yU+9sJ5+9W0GGGrv3s8ztQOGBFB++MWsjQFW1SaT72y ZGUcQmrKNFZCaM4c5RnUNoGS7FOhHpLLsNnsdEJc= Received: from smtp30.i.mail.ru (smtp30.i.mail.ru [95.163.41.71]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 9BDDD825AC1 for ; Sun, 3 Dec 2023 17:17:13 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 9BDDD825AC1 Received: by smtp30.i.mail.ru with esmtpa (envelope-from ) id 1r9nHU-002XJK-1E; Sun, 03 Dec 2023 17:17:13 +0300 Date: Sun, 3 Dec 2023 17:17:12 +0300 To: Igor Munkin Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: 78E4E2B564C1792B X-77F55803: 4F1203BC0FB41BD91486559A3ABF84E7CBAF789B6B32821FF6551DECFE8B152A182A05F5380850406686929DA2B3B39FBF9DB5A9C0182003E652FA24D6BF09053DE89979697DFC05 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE737BB76880A4CA9A4EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063766C9C3EB94AE7C6E8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D866963B7546327DE0C5E9DA1D8371066E117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCECADA55FE5B58BB7A471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F44604297287769387670735201E561CDFBCA1751FCB629EEF1311BF91D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EEB1593CA6EC85F86DFCE65BE3358055BDD8FC6C240DEA76429C9F4D5AE37F343AA9539A8B242431040A6AB1C7CE11FEE3457EE4B4996FC54603F1AB874ED89028C4224003CC836476E2F48590F00D11D6E2021AF6380DFAD1A18204E546F3947C062BEEFFB5F8EA3E2E808ACE2090B5E1725E5C173C3A84C3C5EA940A35A165FF2DBA43225CD8A89F616AD31D0D18CD5C6D8C47C27EEC5E9FB5C8C57E37DE458BEDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A5C5B4FF7FD6DBF17E0C5E7627325DCE5E1A4508520D7C436EF87CCE6106E1FC07E67D4AC08A07B9B0A6C7FFFE744CA7FBCB5012B2E24CD356 X-C8649E89: 1C3962B70DF3F0ADE00A9FD3E00BEEDF3FED46C3ACD6F73ED3581295AF09D3DF87807E0823442EA2ED31085941D9CD0AF7F820E7B07EA4CFD5C24D34B9F07ACABFC7726E0EDB6F2B7D6612D32735E30642B8D31C4814B4434905B00F5DFE93A29592102E2E78BC315B20296BAD9607ACA5C6D67A56B111F5E48CAC7CA610320002C26D483E81D6BE64ACE4A408B72B61B0CA6F94E606A667A52EF62A646584F811BD90D3D42C882D43082AE146A756F3 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojAT457WPMCJwEoaOBd3lu1Q== X-Mailru-Sender: 7940E2A4EB16C997B2A36B21676DDA6B38F6D0E1EAC74352A52D7CFFE4A78006E2527C969975515CFF9FCECFB8D89CB6C77752E0C033A69E235A20A81F3B0E39AB3C5F247CB2F7F93A5DB60FBEB33A8A0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Maxim Kokryashkin via Tarantool-patches Reply-To: Maxim Kokryashkin Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi, Igor! Thanks for the patch! Please consider my comments below. Side note: This new testing approach is amazing, thanks a lot! On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: > Often tests for sampling profiler require long running loops to be Typo: s/Often/Often,/ > executed, so a certain situation is likely to occur. Thus the test added Typo: s/Thus/Thus,/ > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix Typo: s/in the/in/ > crash during FFUNC stream") expects FFUNC VM state (and even the Typo: s/expects FFUNC/expects the FFUNC/ > particular instruction to be executed) at the moment when stacktrace is > being collected. Unfortunately, it leads to test routine hang for Typo: s/to test routine hang/to the test routine hanging/ > several environments and if it does not we cannot guarantee that the Typo: s/environments/environments,/ Typo: s/and if it does not/and even if it does not,/ > desired scenario is tested (only rely on statistics). As a result the Typo: s/As a result/As a result,/ > test for the aforementioned patch was disabled for Tarantool CI in the > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging > Tarantool CI by sysprof test") until the issue is not resolved. > > This patch introduces the new approach for testing our sampling profiler > via implementing precise managed execution of the profiled Typo: s//,/ > instance mentioned in tarantool/tarantool#9387. > > Instead of running around gazillion times we accurately step Typo: s/times/times,/ > to the exact place where the issue reproduces and manually emit SIGPROF > to the Lua VM being profiled. The particular approach implemented in > this patch is described below. > > As it was mentioned, the test makes sysprof to collect the particular Typo: s/to collect/collect/ > event (FFUNC) at the certain instruction in Lua VM () to > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to Typo: s/Hence/Hence,/ > call fast function in the profiled instance (i.e. "tracee"). Typo: s/i.e./i.e.,/ > To emit SIGPROF right at in scope of builtin, Typo: s/in scope/in the scope/ > the manager (i.e. "tracer") is implemented. Typo: s/i.e./i.e.,/ > > Here are the main steps (see comments and `man 2 ptrace' for more info): > 1. Poison instruction as the first instruction at > to stop at the beginning of the fast function; > 2. Resume the "tracee" from the "tracer"; > 3. Hit the emitted interruption, restore the original instruction and > "rewind" the RIP to "replay" the instruction at ; > 4. Do the hack 1-3 for ; > 5. Emit SIGPROF while resumimg the "tracee"; Typo: s/resumimg/resuming/ > > As a result sysprof collects the full backtrace with fast Typo: s/a result/a result,/ > function as the topmost frame. > > Resolves tarantool/tarantool#9387 > Follows up tarantool/tarantool#8594 > > Signed-off-by: Igor Munkin > --- > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > 2 files changed, 269 insertions(+), 55 deletions(-) > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > new file mode 100644 > index 00000000..4caed8cb > --- /dev/null > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > @@ -0,0 +1,269 @@ > +#include "lauxlib.h" > +#include "lmisclib.h" > +#include "lua.h" > + > +#include "test.h" > +#include "utils.h" > + > +#include > +#include > +#include > +#include > +#include > + > +/* XXX: Still need normal assert inside and helpers. */ > +#undef NDEBUG > +#include > + > +/* > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so > + * does the sampling profiler. is needed for LUAJIT_OS > + * and LUAJIT_TARGET. > + */ > +#include "lj_arch.h" > + > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 Why not use the `LJ_HASSYSPROF` flag? > + > +/* > + * XXX: The test makes sysprof to collect the particular event Typo: s/sysprof to/sysprof/ > + * (FFUNC) at the particular instruction () to > + * reproduce the issue #8594. Hence it's enough to call Typo: s/Hence/Hence,/ > + * fast function (this is done in function). To emit > + * SIGPROF right at in scope of fast > + * function, the managed execution is implemented in the function > + * : instruction is poisoned as the first > + * instruction at to stop at the > + * beginning of the fast function; resumes the ; > + * the same hack is done for . When the hits > + * the interruption at , SIGPROF is emitted while > + * resuming the . As a result sysprof collects the full Typo: s/a result/a result,/ > + * backtrace with fast function as the topmost frame. > + * > + * See more info here: > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html > + * * https://github.com/tarantool/tarantool/issues/8594 > + * * https://github.com/tarantool/tarantool/issues/9387 > + */ > + > +#define MESSAGE "Canary is alive" > +#define LUACALL "local a = tostring('" MESSAGE "') return a" > +/* XXX: Resolve the necessary addresses from VM engine. */ > +extern void *lj_ff_tostring(void); > +extern void *lj_fff_res1(void); > + > +/* Sysprof "/dev/null" stream helpers. {{{ */ > + > +/* > + * Yep, 8Mb. Tuned in order not to bother the platform with too > + * often flushes. > + */ > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024) > +#define DEVNULL -1 > + > +struct devnull_ctx { > + /* > + * XXX: Dummy file descriptor to be used as "/dev/null" > + * context indicator in writer and on_stop callback. > + */ > + int fd; > + /* Buffer for data recorded by sysprof. */ > + uint8_t buf[STREAM_BUFFER_SIZE]; > +}; > + > +static int stream_new(struct luam_Sysprof_Options *options) { > + struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx)); > + if (ctx == NULL) > + return PROFILE_ERRIO; > + > + /* Set "/dev/null" context indicator. */ > + ctx->fd = DEVNULL; > + options->ctx = ctx; > + options->buf = ctx->buf; > + options->len = STREAM_BUFFER_SIZE; > + > + return PROFILE_SUCCESS; > +} > + > +static int stream_delete(void *rawctx, uint8_t *buf) { > + struct devnull_ctx *ctx = rawctx; > + assert(ctx->fd == DEVNULL); > + free(ctx); > + return PROFILE_SUCCESS; > +} > + > +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) { > + struct devnull_ctx *ctx = rawctx; > + assert(ctx->fd == DEVNULL); > + /* Do nothing, just return back to the profiler. */ > + return STREAM_BUFFER_SIZE; > +} > + > +/* }}} Sysprof "/dev/null" stream helpers. */ > + > +static int tracee(const char *luacode) { > + struct luam_Sysprof_Counters counters = {}; > + struct luam_Sysprof_Options opt = { > + /* Collect full backtraces per event. */ > + .mode = LUAM_SYSPROF_CALLGRAPH, > + /* > + * XXX: Setting the "endless timer". The test > + * requires the single event to be streamed at > + * instruction, so to avoid spoiling > + * the stream with other unwanted events, the > + * timer is set to some unreachable point, so the > + * profiler will be guaranteed to stop before any > + * event is emitted. > + */ > + .interval = -1ULL, > + }; > + > + /* Allow tracing for this process */ > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { > + perror("Failed to turn the calling thread into a tracee"); Typo?: s/thread/process/ > + return EXIT_FAILURE; > + } > + > + /* > + * XXX: Allow parent (which is our tracer now) to observe > + * our signal-delivery-stop (i.e. the tracee is ready). > + * For more info see ptrace(2), "Attaching and detaching". > + */ > + raise(SIGSTOP); > + > + lua_State *L = utils_lua_init(); > + > + /* Customize and start profiler. */ > + assert(stream_new(&opt) == PROFILE_SUCCESS); > + assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS); > + assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS); > + assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS); > + > + /* FIXME: Make this part test agnostic. */ > + assert(luaL_dostring(L, luacode) == LUA_OK); > + assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); > + > + /* Terminate profiler and Lua universe. */ > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > + utils_lua_close(L); It doesn't seem to be semantically correct to terminate Lua universe before obtaining the counters, even though there is no obstacles to do so in the API implementation itself. > + > + /* > + * XXX: The only event to be streamed must be FFUNC at > + * instruction. > + * FIXME: Make this part test agnostic. > + */ > + assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS); > + assert(counters.samples == 1); > + assert(counters.vmst_ffunc == 1); > + > + return EXIT_SUCCESS; > +} > + > +const uint8_t INT3 = 0xCC; > +static inline unsigned long int3poison(unsigned long instruction) { > + const size_t int3bits = sizeof(INT3) * 8; > + const unsigned long int3mask = -1UL >> int3bits << int3bits; > + return (instruction & int3mask) | INT3; > +} > + > +static int continue_until(pid_t chpid, void *addr) { > + int wstatus; > + struct user_regs_struct regs; > + > + /* Obtain the instructions at the . */ > + unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0); > + /* > + * Emit the instruction to the . > + * XXX: is poisoned as the LSB to the > + * obtained from the above. > + */ > + ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data)); > + > + /* Resume tracee until SIGTRAP occurs. */ > + ptrace(PTRACE_CONT, chpid, 0, 0); > + /* Wait tracee signal-delivery-stop. */ > + waitpid(chpid, &wstatus, 0); Should we check the `wstatus` here too? > + > + /* Obtain GPR set to tweak RIP for further execution. */ > + ptrace(PTRACE_GETREGS, chpid, 0, ®s); > + /* > + * Make sure we indeed are stopped at . > + * XXX: RIP points right after instruction. > + */ > + assert(regs.rip == (long)addr + sizeof(INT3)); > + > + /* > + * XXX: Restore the original instruction at and > + * "rewind" RIP by size to "replay" the poisoned > + * instruction at the . > + */ > + regs.rip -= sizeof(INT3); > + ptrace(PTRACE_SETREGS, chpid, 0, ®s); > + ptrace(PTRACE_POKETEXT, chpid, addr, data); > + > + /* Return wait status to the caller for test checks. */ > + return wstatus; > +} > + > +static int tracer(pid_t chpid) { > + int wstatus; > + > + /* Wait until tracee is ready. */ > + waitpid(chpid, &wstatus, 0); Should we check the process status here too? > + > + /* Resume tracee until . */ > + wstatus = continue_until(chpid, lj_ff_tostring); > + > + /* The tracee has to be alive and stopped by SIGTRAP. */ > + assert_false(WIFEXITED(wstatus)); > + assert_true(WIFSTOPPED(wstatus)); > + > + /* Resume tracee until . */ > + wstatus = continue_until(chpid, lj_fff_res1); > + > + /* The tracee has to be alive and stopped by SIGTRAP. */ > + assert_false(WIFEXITED(wstatus)); > + assert_true(WIFSTOPPED(wstatus)); This part is duplicated, let's move it to a separate `healthcheck` function, or something like that. > + > + /* Send SIGPROF to make sysprof collect the event. */ > + ptrace(PTRACE_CONT, chpid, 0, SIGPROF); > + > + /* Wait until tracee successfully exits. */ > + waitpid(chpid, &wstatus, 0); > + assert_true(WIFEXITED(wstatus)); > + > + return TEST_EXIT_SUCCESS; > +} > + > +static int test_tostring_call(void *ctx) { > + pid_t chpid = fork(); > + switch(chpid) { > + case -1: > + bail_out("Tracee fork failed"); > + case 0: > + /* > + * XXX: Tracee has to instead of > + * to avoid duplicate reports in . > + * Test assertions are used only in the , > + * so the ought to report whether the > + * test succeeded or not. > + */ > + exit(tracee(LUACALL)); > + default: > + return tracer(chpid); > + } > +} > + > +#else /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ > + > +static int test_tostring_call(void *ctx) { > + return skip_all("sysprof is implemented for Linux/x86_64 only"); > +} > + > +#endif /* LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 */ > + > +int main(void) { > + const struct test_unit tgroup[] = { > + test_unit_def(test_tostring_call), > + }; > + return test_run_group(tgroup, NULL); > +} > diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > deleted file mode 100644 > index f8b409ae..00000000 > --- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > +++ /dev/null