From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id B1A23716A53; Tue, 5 Dec 2023 14:35:29 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org B1A23716A53 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1701776129; bh=87BcO8YyYu527yo/C+wStYhAmwHBCz421riXSy95AsA=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=vqi89p02iGb4njtzwEUgQNthYAx/g5yv7VW0DDDjDlKUild3g7kf8vWpMF+7SIBk5 SgS/s319nTDvRgpiGBDZaPKsUiuPIx54vsYBf6/evWH+gCIdMg124T8k+de2E5rGpm dIMMtWZuE0A1nJbCAeCLETEBjWf2EQTKjP1IaRnA= Received: from smtp36.i.mail.ru (smtp36.i.mail.ru [95.163.41.77]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 81CB6716A53 for ; Tue, 5 Dec 2023 14:35:27 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 81CB6716A53 Received: by smtp36.i.mail.ru with esmtpa (envelope-from ) id 1rATi2-00CSx2-1S; Tue, 05 Dec 2023 14:35:27 +0300 Date: Tue, 5 Dec 2023 11:34:39 +0000 To: Maxim Kokryashkin Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Clacks-Overhead: GNU Terry Pratchett X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD91486559A3ABF84E788857612FD0928520F2C4037DC1BB7AD182A05F538085040554FED771F98F3632003A1F03931B44F3AD3960F3B921790450229BBABDFEA29 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE78981306C6E927004EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063719513F03911326708638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D89E7029D8BCA2CF84047552C25580191E117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC974A882099E279BDA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD18F04B652EEC242312D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EE9647ADFADE5905B1452896749CDDA0A6D8FC6C240DEA76429C9F4D5AE37F343AA9539A8B242431040A6AB1C7CE11FEE34AB4081B6A6C2E0703F1AB874ED89028C4224003CC836476E2F48590F00D11D6E2021AF6380DFAD1A18204E546F3947CB861051D4BA689FC2E808ACE2090B5E1725E5C173C3A84C3C5EA940A35A165FF2DBA43225CD8A89FB26E97DCB74E6252C6EABA9B74D0DA47B5C8C57E37DE458BEDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A5BB68805570E5DEF936B01C7322A5EFCBDBA333413C0F5145F87CCE6106E1FC07E67D4AC08A07B9B01F9513A7CA91E555CB5012B2E24CD356 X-C8649E89: 1C3962B70DF3F0ADBF74143AD284FC7177DD89D51EBB7742424CF958EAFF5D571004E42C50DC4CA955A7F0CF078B5EC49A30900B95165D34BC3EEE75EF3BACCF7F583C3DCC3309D3B9E4F136B8471EA378EA69659418DBE95E7B8B987C20E5B41D7E09C32AA3244CFD20D55125A2C1D2DA3FA5D98325104DFE8DA44ABE2443F7BAD658CF5C8AB4025DA084F8E80FEBD376A4ED3E9341DC2FCD72808BE417F3B9E0E7457915DAA85F X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojJEmydpeiNq+wX1QwXiP1zw== X-Mailru-Sender: 2FEBA92C8E508479FE7B9A1DF348D5310C7E8A9DDA059D4EC9673672B615692DEE413AC195CE408C2326FE6F2A341ACE0FB9F97486540B4CD9E8847AB8CFED4D9ABF8A61C016C2CFB0DAF586E7D11B3E67EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Igor Munkin via Tarantool-patches Reply-To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Max, Thanks for your review! On 03.12.23, Maxim Kokryashkin wrote: > Hi, Igor! > Thanks for the patch! > Please consider my comments below. > > Side note: This new testing approach is amazing, thanks a lot! > > On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote: > > Often tests for sampling profiler require long running loops to be > Typo: s/Often/Often,/ > > executed, so a certain situation is likely to occur. Thus the test added > Typo: s/Thus/Thus,/ > > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix > Typo: s/in the/in/ > > crash during FFUNC stream") expects FFUNC VM state (and even the > Typo: s/expects FFUNC/expects the FFUNC/ > > particular instruction to be executed) at the moment when stacktrace is > > being collected. Unfortunately, it leads to test routine hang for > Typo: s/to test routine hang/to the test routine hanging/ > > several environments and if it does not we cannot guarantee that the > Typo: s/environments/environments,/ > Typo: s/and if it does not/and even if it does not,/ > > desired scenario is tested (only rely on statistics). As a result the > Typo: s/As a result/As a result,/ > > test for the aforementioned patch was disabled for Tarantool CI in the > > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging > > Tarantool CI by sysprof test") until the issue is not resolved. > > > > This patch introduces the new approach for testing our sampling profiler > > via implementing precise managed execution of the profiled > Typo: s//,/ > > instance mentioned in tarantool/tarantool#9387. > > > > Instead of running around gazillion times we accurately step > Typo: s/times/times,/ > > to the exact place where the issue reproduces and manually emit SIGPROF > > to the Lua VM being profiled. The particular approach implemented in > > this patch is described below. > > > > As it was mentioned, the test makes sysprof to collect the particular > Typo: s/to collect/collect/ > > event (FFUNC) at the certain instruction in Lua VM () to > > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to > Typo: s/Hence/Hence,/ > > call fast function in the profiled instance (i.e. "tracee"). > Typo: s/i.e./i.e.,/ > > To emit SIGPROF right at in scope of builtin, > Typo: s/in scope/in the scope/ > > the manager (i.e. "tracer") is implemented. > Typo: s/i.e./i.e.,/ > > > > Here are the main steps (see comments and `man 2 ptrace' for more info): > > 1. Poison instruction as the first instruction at > > to stop at the beginning of the fast function; > > 2. Resume the "tracee" from the "tracer"; > > 3. Hit the emitted interruption, restore the original instruction and > > "rewind" the RIP to "replay" the instruction at ; > > 4. Do the hack 1-3 for ; > > 5. Emit SIGPROF while resumimg the "tracee"; > Typo: s/resumimg/resuming/ > > > > As a result sysprof collects the full backtrace with fast > Typo: s/a result/a result,/ > > function as the topmost frame. > > > > Resolves tarantool/tarantool#9387 > > Follows up tarantool/tarantool#8594 > > > > Signed-off-by: Igor Munkin Here is the updated commit message: ================================================================================ test: rewrite sysprof test using managed execution Often, tests for sampling profiler require long running loops to be executed, so a certain situation is likely to occur. Thus, the test added in commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix crash during FFUNC stream") expects the FFUNC VM state (and even the particular instruction to be executed) at the moment when stacktrace is being collected. Unfortunately, it leads to the test routine hang for several environments, and even if it does not, we cannot guarantee that the desired scenario is tested (only rely on statistics). As a result, the test for the aforementioned patch was disabled for Tarantool CI in the commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging Tarantool CI by sysprof test") until the issue is not resolved. This patch introduces the new approach for testing our sampling profiler via , implementing precise managed execution of the profiled instance mentioned in tarantool/tarantool#9387. Instead of running around gazillion times, we accurately step to the exact place where the issue reproduces and manually emit SIGPROF to the Lua VM being profiled. The particular approach implemented in this patch is described below. As it was mentioned, the test makes sysprof collect the particular event type (FFUNC) at the certain instruction in Lua VM () to reproduce the issue from tarantool/tarantool#8594. Hence, it's enough to call fast function in the profiled instance (i.e., "tracee"). To emit SIGPROF right at in the scope of builtin, the manager (i.e., "tracer") is implemented. Here are the main steps (see comments and `man 2 ptrace' for more info): 1. Poison instruction as the first instruction at to stop at the beginning of the fast function; 2. Resume the "tracee" from the "tracer"; 3. Hit the emitted interruption, restore the original instruction and "rewind" the RIP to "replay" the instruction at ; 4. Do the hack 1-3 for ; 5. Emit SIGPROF while resuming the "tracee"; As a result, sysprof collects the full backtrace with fast function as the topmost frame. Resolves tarantool/tarantool#9387 Follows up tarantool/tarantool#8594 ================================================================================ > > --- > > .../gh-8594-sysprof-ffunc-crash.test.c | 269 ++++++++++++++++++ > > .../gh-8594-sysprof-ffunc-crash.test.lua | 55 ---- > > 2 files changed, 269 insertions(+), 55 deletions(-) > > create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua > > > > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > new file mode 100644 > > index 00000000..4caed8cb > > --- /dev/null > > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c > > @@ -0,0 +1,269 @@ > > +#include "lauxlib.h" > > +#include "lmisclib.h" > > +#include "lua.h" > > + > > +#include "test.h" > > +#include "utils.h" > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* XXX: Still need normal assert inside and helpers. */ > > +#undef NDEBUG > > +#include > > + > > +/* > > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so > > + * does the sampling profiler. is needed for LUAJIT_OS > > + * and LUAJIT_TARGET. > > + */ > > +#include "lj_arch.h" > > + > > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 > Why not use the `LJ_HASSYSPROF` flag? Since the test per se is Linux/x86_64 specific (consider the particular register name used below). E.g. sysprof is implemented for x86 either, but the test is not. If somebody (Sergey Kaplun) finally decided to introduce x86 testing environment, the test would crash. I prefer more developer-friendly approach here (considering peculiarities). > > + > > +/* > > + * XXX: The test makes sysprof to collect the particular event > Typo: s/sysprof to/sysprof/ > > + * (FFUNC) at the particular instruction () to > > + * reproduce the issue #8594. Hence it's enough to call > Typo: s/Hence/Hence,/ > > + * fast function (this is done in function). To emit > > + * SIGPROF right at in scope of fast > > + * function, the managed execution is implemented in the function > > + * : instruction is poisoned as the first > > + * instruction at to stop at the > > + * beginning of the fast function; resumes the ; > > + * the same hack is done for . When the hits > > + * the interruption at , SIGPROF is emitted while > > + * resuming the . As a result sysprof collects the full > Typo: s/a result/a result,/ > > + * backtrace with fast function as the topmost frame. > > + * > > + * See more info here: > > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html > > + * * https://github.com/tarantool/tarantool/issues/8594 > > + * * https://github.com/tarantool/tarantool/issues/9387 > > + */ Fixed. The diff is below: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 4caed8cb..f060760b 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -25,18 +25,18 @@ #if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64 /* - * XXX: The test makes sysprof to collect the particular event + * XXX: The test makes sysprof collect the particular event * (FFUNC) at the particular instruction () to - * reproduce the issue #8594. Hence it's enough to call - * fast function (this is done in function). To emit - * SIGPROF right at in scope of fast - * function, the managed execution is implemented in the function - * : instruction is poisoned as the first + * reproduce the issue #8594. Hence, it's enough to call + * fast function (this is done in function). + * To emit SIGPROF right at in scope of + * fast function, the managed execution is implemented in the + * function : instruction is poisoned as the first * instruction at to stop at the * beginning of the fast function; resumes the ; * the same hack is done for . When the hits * the interruption at , SIGPROF is emitted while - * resuming the . As a result sysprof collects the full + * resuming the . As a result, sysprof collects the full * backtrace with fast function as the topmost frame. * * See more info here: ================================================================================ > > + > > + > > + /* Allow tracing for this process */ > > + if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { > > + perror("Failed to turn the calling thread into a tracee"); > Typo?: s/thread/process/ Yeah, copy-pasted from the recipe. Fixed. > > + return EXIT_FAILURE; > > + } > > + /* Terminate profiler and Lua universe. */ > > + assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); > > + utils_lua_close(L); > It doesn't seem to be semantically correct to terminate Lua universe > before obtaining the counters, even though there is no obstacles to do > so in the API implementation itself. To[MAH]to vs.To[MAY]to: fixed. The diff is below. ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index c0f9d1ca..5a2523ef 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -142,9 +142,8 @@ static int tracee(const char *luacode) { assert(luaL_dostring(L, luacode) == LUA_OK); assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0); - /* Terminate profiler and Lua universe. */ + /* Terminate profiler. */ assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS); - utils_lua_close(L); /* * XXX: The only event to be streamed must be FFUNC at @@ -155,6 +154,9 @@ static int tracee(const char *luacode) { assert(counters.samples == 1); assert(counters.vmst_ffunc == 1); + /* Terminate Lua universe. */ + utils_lua_close(L); + return EXIT_SUCCESS; } ================================================================================ > > + > > + /* Resume tracee until SIGTRAP occurs. */ > > + ptrace(PTRACE_CONT, chpid, 0, 0); > > + /* Wait tracee signal-delivery-stop. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the `wstatus` here too? Implemented via helper (see the details below). > > + > > + > > +static int tracer(pid_t chpid) { > > + int wstatus; > > + > > + /* Wait until tracee is ready. */ > > + waitpid(chpid, &wstatus, 0); > Should we check the process status here too? Implemented via helper (see the implementation below). ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 84c024f0..d17386dd 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -222,7 +222,7 @@ static int tracer(pid_t chpid) { int wstatus; /* Wait until tracee is ready. */ - waitpid(chpid, &wstatus, 0); + wait_alive(chpid); /* * Resume tracee until . ================================================================================ > > + > > + /* Resume tracee until . */ > > + wstatus = continue_until(chpid, lj_ff_tostring); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > > + > > + /* Resume tracee until . */ > > + wstatus = continue_until(chpid, lj_fff_res1); > > + > > + /* The tracee has to be alive and stopped by SIGTRAP. */ > > + assert_false(WIFEXITED(wstatus)); > > + assert_true(WIFSTOPPED(wstatus)); > This part is duplicated, let's move it to a separate `healthcheck` > function, or something like that. It's only duplicated in sense of reusing this particular code, no more. However, let's consider these conditions are not parts of the test assertions set, but rather invariants for helper (i.e. has to yield the execution with the "alive" tracee). In this case, the patch for is the following: ================================================================================ diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c index 5a2523ef..84c024f0 100644 --- a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c @@ -160,6 +160,18 @@ static int tracee(const char *luacode) { return EXIT_SUCCESS; } +static void wait_alive(pid_t chpid) { + int wstatus; + + /* Wait tracee signal-delivery-stop. */ + waitpid(chpid, &wstatus, 0); + + /* Check the tracee is still alive and just stopped. */ + assert(!WIFEXITED(wstatus)); + assert(!WIFSIGNALED(wstatus)); + assert(WIFSTOPPED(wstatus)); +} + const uint8_t INT3 = 0xCC; static inline unsigned long int3poison(unsigned long instruction) { const size_t int3bits = sizeof(INT3) * 8; @@ -167,8 +179,7 @@ static inline unsigned long int3poison(unsigned long instruction) { return (instruction & int3mask) | INT3; } -static int continue_until(pid_t chpid, void *addr) { - int wstatus; +static void continue_until(pid_t chpid, void *addr) { struct user_regs_struct regs; /* Obtain the instructions at the . */ @@ -182,8 +193,12 @@ static int continue_until(pid_t chpid, void *addr) { /* Resume tracee until SIGTRAP occurs. */ ptrace(PTRACE_CONT, chpid, 0, 0); - /* Wait tracee signal-delivery-stop. */ - waitpid(chpid, &wstatus, 0); + + /* + * Wait tracee signal-delivery-stop and check + * whether it's still alive and just stopped via SIGTRAP. + */ + wait_alive(chpid); /* Obtain GPR set to tweak RIP for further execution. */ ptrace(PTRACE_GETREGS, chpid, 0, ®s); @@ -201,9 +216,6 @@ static int continue_until(pid_t chpid, void *addr) { regs.rip -= sizeof(INT3); ptrace(PTRACE_SETREGS, chpid, 0, ®s); ptrace(PTRACE_POKETEXT, chpid, addr, data); - - /* Return wait status to the caller for test checks. */ - return wstatus; } static int tracer(pid_t chpid) { @@ -212,19 +224,19 @@ static int tracer(pid_t chpid) { /* Wait until tracee is ready. */ waitpid(chpid, &wstatus, 0); - /* Resume tracee until . */ - wstatus = continue_until(chpid, lj_ff_tostring); - - /* The tracee has to be alive and stopped by SIGTRAP. */ - assert_false(WIFEXITED(wstatus)); - assert_true(WIFSTOPPED(wstatus)); - - /* Resume tracee until . */ - wstatus = continue_until(chpid, lj_fff_res1); + /* + * Resume tracee until . + * The tracee is alive and stopped by SIGTRAP if + * returns. + */ + continue_until(chpid, lj_ff_tostring); - /* The tracee has to be alive and stopped by SIGTRAP. */ - assert_false(WIFEXITED(wstatus)); - assert_true(WIFSTOPPED(wstatus)); + /* + * Resume tracee until . + * The tracee is alive and stopped by SIGTRAP if + * returns. + */ + continue_until(chpid, lj_fff_res1); /* Send SIGPROF to make sysprof collect the event. */ ptrace(PTRACE_CONT, chpid, 0, SIGPROF); ================================================================================ > > + -- Best regards, IM