[Tarantool-patches] [PATCH luajit 2/2] test: rewrite sysprof test using managed execution

Sergey Kaplun skaplun at tarantool.org
Tue Dec 5 11:37:20 MSK 2023


Hi, Igor!
Thanks for the patch!
Really awesome improvement for our sysprof testing!
Please consider my comments below.

On 03.12.23, Maxim Kokryashkin wrote:
> Hi, Igor!
> Thanks for the patch!
> Please consider my comments below.
> 
> Side note: This new testing approach is amazing, thanks a lot!
> 
> On Tue, Nov 28, 2023 at 02:53:17PM +0000, Igor Munkin wrote:
> > Often tests for sampling profiler require long running loops to be
> Typo: s/Often/Often,/
> > executed, so a certain situation is likely to occur. Thus the test added
> Typo: s/Thus/Thus,/
> > in the commit 285a1b0a16c1c3224ab93f7cf1fef17ab29e4ab4 ("sysprof: fix
> Typo: s/in the/in/
> > crash during FFUNC stream") expects FFUNC VM state (and even the
> Typo: s/expects FFUNC/expects the FFUNC/
> > particular instruction to be executed) at the moment when stacktrace is
> > being collected. Unfortunately, it leads to test routine hang for
> Typo: s/to test routine hang/to the test routine hanging/
> > several environments and if it does not we cannot guarantee that the
> Typo: s/environments/environments,/
> Typo: s/and if it does not/and even if it does not,/
> > desired scenario is tested (only rely on statistics). As a result the
> Typo: s/As a result/As a result,/
> > test for the aforementioned patch was disabled for Tarantool CI in the
> > commit fef60a1052b8444be61e9a9932ab4875aff0b2ed ("test: prevent hanging
> > Tarantool CI by sysprof test") until the issue is not resolved.
> >
> > This patch introduces the new approach for testing our sampling profiler
> > via <ptrace> implementing precise managed execution of the profiled
> Typo: s/<ptrace>/<ptrace>,/
> > instance mentioned in tarantool/tarantool#9387.
> >
> > Instead of running around <tostring> gazillion times we accurately step
> Typo: s/times/times,/
> > to the exact place where the issue reproduces and manually emit SIGPROF
> > to the Lua VM being profiled. The particular approach implemented in
> > this patch is described below.
> >
> > As it was mentioned, the test makes sysprof to collect the particular
> Typo: s/to collect/collect/
> > event (FFUNC) at the certain instruction in Lua VM (<lj_fff_res1>) to
> > reproduce the issue from tarantool/tarantool#8594. Hence it's enough to
> Typo: s/Hence/Hence,/
> > call <tostring> fast function in the profiled instance (i.e. "tracee").
> Typo: s/i.e./i.e.,/
> > To emit SIGPROF right at <lj_fff_res1> in scope of <tostring> builtin,
> Typo: s/in scope/in the scope/
> > the manager (i.e. "tracer") is implemented.
> Typo: s/i.e./i.e.,/
> >
> > Here are the main steps (see comments and `man 2 ptrace' for more info):
> >   1. Poison <int 3> instruction as the first instruction at
> >      <lj_ff_tostring> to stop at the beginning of the fast function;
> >   2. Resume the "tracee" from the "tracer";
> >   3. Hit the emitted interruption, restore the original instruction and
> >      "rewind" the RIP to "replay" the instruction at <lj_ff_tostring>;
> >   4. Do the hack 1-3 for <lj_fff_res1>;
> >   5. Emit SIGPROF while resumimg the "tracee";
> Typo: s/resumimg/resuming/
> >
> > As a result sysprof collects the full backtrace with <tostring> fast
> Typo: s/a result/a result,/
> > function as the topmost frame.
> >
> > Resolves tarantool/tarantool#9387
> > Follows up tarantool/tarantool#8594
> >
> > Signed-off-by: Igor Munkin <imun at tarantool.org>
> > ---
> >  .../gh-8594-sysprof-ffunc-crash.test.c        | 269 ++++++++++++++++++
> >  .../gh-8594-sysprof-ffunc-crash.test.lua      |  55 ----
> >  2 files changed, 269 insertions(+), 55 deletions(-)
> >  create mode 100644 test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c
> >  delete mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> >
> > diff --git a/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c
> > new file mode 100644
> > index 00000000..4caed8cb
> > --- /dev/null
> > +++ b/test/tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c
> > @@ -0,0 +1,269 @@
> > +#include "lauxlib.h"
> > +#include "lmisclib.h"
> > +#include "lua.h"
> > +
> > +#include "test.h"
> > +#include "utils.h"
> > +
> > +#include <signal.h>
> > +#include <sys/ptrace.h>
> > +#include <sys/user.h>
> > +#include <sys/wait.h>
> > +#include <unistd.h>
> > +
> > +/* XXX: Still need normal assert inside <tracee> and helpers. */
> > +#undef NDEBUG
> > +#include <assert.h>
> > +
> > +/*
> > + * XXX: The test is *very* Linux/x86_64 specific. Fortunately, so
> > + * does the sampling profiler. <lj_arch.> is needed for LUAJIT_OS
> > + * and LUAJIT_TARGET.
> > + */
> > +#include "lj_arch.h"
> > +
> > +#if LUAJIT_OS == LUAJIT_OS_LINUX && LUAJIT_TARGET == LUAJIT_ARCH_X64
> Why not use the `LJ_HASSYSPROF` flag?

I agree here. `LJ_HASSYSPROF` is more foolproof -- if we will port
sysprof for arm64 (for example) we will not forget to update this test.

> > +
> > +/*
> > + * XXX: The test makes sysprof to collect the particular event
> Typo: s/sysprof to/sysprof/
> > + * (FFUNC) at the particular instruction (<lj_fff_res1>) to
> > + * reproduce the issue #8594. Hence it's enough to call <tostring>
> Typo: s/Hence/Hence,/
> > + * fast function (this is done in <tracee> function). To emit
> > + * SIGPROF right at <lj_fff_res1> in scope of <tostring> fast
> > + * function, the managed execution is implemented in the function
> > + * <tracer>: <int 3> instruction is poisoned as the first
> > + * instruction at <lj_ff_tostring> to stop <tracee> at the
> > + * beginning of the fast function; <tracer> resumes the <tracee>;
> > + * the same hack is done for <lj_fff_res1>. When the <tracee> hits
> > + * the interruption at <lj_fff_res1>, SIGPROF is emitted while
> > + * resuming the <tracee>. As a result sysprof collects the full
> Typo: s/a result/a result,/
> > + * backtrace with <tostring> fast function as the topmost frame.
> > + *
> > + * See more info here:
> > + * * https://man7.org/linux/man-pages/man2/ptrace.2.html
> > + * * https://github.com/tarantool/tarantool/issues/8594
> > + * * https://github.com/tarantool/tarantool/issues/9387

Minor: This link may be useful too:
https://c9x.me/x86/html/file_module_x86_id_142.html

> > + */
> > +
> > +#define MESSAGE "Canary is alive"
> > +#define LUACALL "local a = tostring('" MESSAGE "') return a"

Nit: I suggest adding an excess empty line here to separate extern asm functions
from Lua code defines.

> > +/* XXX: Resolve the necessary addresses from VM engine. */
> > +extern void *lj_ff_tostring(void);
> > +extern void *lj_fff_res1(void);
> > +
> > +/* Sysprof "/dev/null" stream helpers. {{{ */
> > +
> > +/*
> > + * Yep, 8Mb. Tuned in order not to bother the platform with too
> > + * often flushes.
> > + */

This comment is misleading since there is no any flushes in the writer
function.
Also, do we need such big buffer? Maybe `PATH_MAX` * 2 (or something
like that) should be enough?

> > +#define STREAM_BUFFER_SIZE (8 * 1024 * 1024)
> > +#define DEVNULL -1

`DEVNULL` name is misleading (like we've actually opened `/dev/null`).
Maybe `DUMMY_FD` is better? Also, I suppose we can drop the `fd` field
at all. We can just use static structure in this translation unit and
compare pointer against it, so there is no need for an additional field
fd in the structure itself.

> > +
> > +struct devnull_ctx {
> > +	/*
> > +	 * XXX: Dummy file descriptor to be used as "/dev/null"
> > +	 * context indicator in writer and on_stop callback.
> > +	 */
> > +	int fd;

See my comment above.

> > +	/* Buffer for data recorded by sysprof. */
> > +	uint8_t buf[STREAM_BUFFER_SIZE];
> > +};
> > +
> > +static int stream_new(struct luam_Sysprof_Options *options) {

Please, use separate line for block start `{`.
Here and below.

> > +	struct devnull_ctx *ctx = calloc(1, sizeof(struct devnull_ctx));
> > +	if (ctx == NULL)
> > +		return PROFILE_ERRIO;
> > +
> > +	/* Set "/dev/null" context indicator. */
> > +	ctx->fd = DEVNULL;
> > +	options->ctx = ctx;
> > +	options->buf = ctx->buf;
> > +	options->len = STREAM_BUFFER_SIZE;
> > +
> > +	return PROFILE_SUCCESS;
> > +}
> > +
> > +static int stream_delete(void *rawctx, uint8_t *buf) {
> > +	struct devnull_ctx *ctx = rawctx;
> > +	assert(ctx->fd == DEVNULL);
> > +	free(ctx);
> > +	return PROFILE_SUCCESS;
> > +}
> > +
> > +static size_t stream_writer(const void **buf_addr, size_t len, void *rawctx) {
> > +	struct devnull_ctx *ctx = rawctx;
> > +	assert(ctx->fd == DEVNULL);
> > +	/* Do nothing, just return back to the profiler. */
> > +	return STREAM_BUFFER_SIZE;
> > +}
> > +
> > +/* }}} Sysprof "/dev/null" stream helpers. */
> > +
> > +static int tracee(const char *luacode) {
> > +	struct luam_Sysprof_Counters counters = {};
> > +	struct luam_Sysprof_Options opt = {
> > +		/* Collect full backtraces per event. */
> > +		.mode = LUAM_SYSPROF_CALLGRAPH,
> > +		/*
> > +		 * XXX: Setting the "endless timer". The test
> > +		 * requires the single event to be streamed at
> > +		 * <lj_fff_res1> instruction, so to avoid spoiling
> > +		 * the stream with other unwanted events, the
> > +		 * timer is set to some unreachable point, so the
> > +		 * profiler will be guaranteed to stop before any
> > +		 * event is emitted.
> > +		 */
> > +		.interval = -1ULL,
> > +	};
> > +
> > +	/* Allow tracing for this process */

Nit: Missed dot at the end of the sentence.

> > +	if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) {

Nit: It's more readable to use `ptrace(PTRACE_TRACEME, 0, NULL, NULL)`
to match arguments types.

> > +		perror("Failed to turn the calling thread into a tracee");
> Typo?: s/thread/process/

Also, it's better to use `test_comment()` format here (i.e., with "#")
to be TAP-compatible.

> > +		return EXIT_FAILURE;
> > +	}
> > +
> > +	/*
> > +	 * XXX: Allow parent (which is our tracer now) to observe
> > +	 * our signal-delivery-stop (i.e. the tracee is ready).
> > +	 * For more info see ptrace(2), "Attaching and detaching".
> > +	 */
> > +	raise(SIGSTOP);
> > +
> > +	lua_State *L = utils_lua_init();
> > +
> > +	/* Customize and start profiler. */
> > +	assert(stream_new(&opt) == PROFILE_SUCCESS);
> > +	assert(luaM_sysprof_set_writer(stream_writer) == PROFILE_SUCCESS);
> > +	assert(luaM_sysprof_set_on_stop(stream_delete) == PROFILE_SUCCESS);
> > +	assert(luaM_sysprof_start(L, &opt) == PROFILE_SUCCESS);
> > +
> > +	/* FIXME: Make this part test agnostic. */

Typo: s/test agnostic/test-agnostic/

Nit: TODO looks more valid for me (since it's working).
Feel free to ignore.

> > +	assert(luaL_dostring(L, luacode) == LUA_OK);
> > +	assert(strcmp(lua_tostring(L, -1), MESSAGE) == 0);
> > +
> > +	/* Terminate profiler and Lua universe. */
> > +	assert(luaM_sysprof_stop(L) == PROFILE_SUCCESS);
> > +	utils_lua_close(L);
> It doesn't seem to be semantically correct to terminate Lua universe
> before obtaining the counters, even though there is no obstacles to do
> so in the API implementation itself.
> > +
> > +	/*
> > +	 * XXX: The only event to be streamed must be FFUNC at
> > +	 * <lj_fff_res1> instruction.

Typo: s/<lj_fff_res1>/the <lj_fff_res1>/

> > +	 * FIXME: Make this part test agnostic.

Typo: s/test agnostic/test-agnostic/

Nit: TODO looks more valid for me (since it's working).
Feel free to ignore.

> > +	 */
> > +	assert(luaM_sysprof_report(&counters) == PROFILE_SUCCESS);
> > +	assert(counters.samples == 1);
> > +	assert(counters.vmst_ffunc == 1);
> > +
> > +	return EXIT_SUCCESS;
> > +}
> > +
> > +const uint8_t INT3 = 0xCC;

Maybe it should be moved up to other constants and defines?

> > +static inline unsigned long int3poison(unsigned long instruction) {
> > +	const size_t int3bits = sizeof(INT3) * 8;
> > +	const unsigned long int3mask = -1UL >> int3bits << int3bits;

Minor: It's better to use parentesises here.

> > +	return (instruction & int3mask) | INT3;
> > +}
> > +
> > +static int continue_until(pid_t chpid, void *addr) {
> > +	int wstatus;
> > +	struct user_regs_struct regs;
> > +
> > +	/* Obtain the instructions at the <addr>. */
> > +	unsigned long data = ptrace(PTRACE_PEEKTEXT, chpid, addr, 0);

Nit: s/0/NULL/.
Here and below.

> > +	/*
> > +	 * Emit the <int 3> instruction to the <addr>.
> > +	 * XXX: <int 3> is poisoned as the LSB to the <data>
> > +	 * obtained from the <addr> above.
> > +	 */
> > +	ptrace(PTRACE_POKETEXT, chpid, addr, int3poison(data));
> > +
> > +	/* Resume <chpid> tracee until SIGTRAP occurs. */
> > +	ptrace(PTRACE_CONT, chpid, 0, 0);
> > +	/* Wait <chpid> tracee signal-delivery-stop. */
> > +	waitpid(chpid, &wstatus, 0);
> Should we check the `wstatus` here too?
> > +
> > +	/* Obtain GPR set to tweak RIP for further execution. */
> > +	ptrace(PTRACE_GETREGS, chpid, 0, &regs);
> > +	/*
> > +	 * Make sure we indeed are stopped at <addr>.
> > +	 * XXX: RIP points right after <int 3> instruction.
> > +	 */
> > +	assert(regs.rip == (long)addr + sizeof(INT3));
> > +
> > +	/*
> > +	 * XXX: Restore the original instruction at <addr> and
> > +	 * "rewind" RIP by <int 3> size to "replay" the poisoned
> > +	 * instruction at the <addr>.
> > +	 */
> > +	regs.rip -= sizeof(INT3);
> > +	ptrace(PTRACE_SETREGS, chpid, 0, &regs);
> > +	ptrace(PTRACE_POKETEXT, chpid, addr, data);
> > +
> > +	/* Return wait status to the caller for test checks. */
> > +	return wstatus;
> > +}
> > +
> > +static int tracer(pid_t chpid) {
> > +	int wstatus;
> > +
> > +	/* Wait until <chpid> tracee is ready. */
> > +	waitpid(chpid, &wstatus, 0);
> Should we check the process status here too?
> > +
> > +	/* Resume <chpid> tracee until <lj_ff_tostring>. */
> > +	wstatus = continue_until(chpid, lj_ff_tostring);
> > +
> > +	/* The tracee has to be alive and stopped by SIGTRAP. */
> > +	assert_false(WIFEXITED(wstatus));
> > +	assert_true(WIFSTOPPED(wstatus));
> > +
> > +	/* Resume <chpid> tracee until <lj_fff_res1>. */
> > +	wstatus = continue_until(chpid, lj_fff_res1);
> > +
> > +	/* The tracee has to be alive and stopped by SIGTRAP. */
> > +	assert_false(WIFEXITED(wstatus));
> > +	assert_true(WIFSTOPPED(wstatus));
> This part is duplicated, let's move it to a separate `healthcheck`
> function, or something like that.

May be we can move these checks inside the `continue_until()` routine?
Do we want to check something except the status is `WIFSTOPPED(status)`?

> > +
> > +	/* Send SIGPROF to make sysprof collect the event. */
> > +	ptrace(PTRACE_CONT, chpid, 0, SIGPROF);
> > +
> > +	/* Wait until <chpid> tracee successfully exits. */
> > +	waitpid(chpid, &wstatus, 0);
> > +	assert_true(WIFEXITED(wstatus));
> > +
> > +	return TEST_EXIT_SUCCESS;
> > +}
> > +

<snipped>

> > diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> > deleted file mode 100644
> > index f8b409ae..00000000
> > --- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> > +++ /dev/null
> <snipped>

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list