Date: Tue, 27 Jul 2021 14:53:41 +0300
To: Egor Elchinov via Tarantool-patches
Message-ID: <20210727115341.oplysx67zrslvfw7@esperanza>
Subject: Re: [Tarantool-patches] [PATCH v3 0/4] fiber: introduce creation backtrace
From: Vladimir Davydov via Tarantool-patches
Reply-To: Vladimir Davydov
Cc: Egor Elchinov, gorcunov@tarantool.org

On Wed, Jul 14, 2021 at 02:12:48PM +0300, Egor Elchinov via Tarantool-patches wrote:
> https://github.com/tarantool/tarantool/tree/Egor2001/gh-4002-fiber-creation-backtrace
>
> Egor Elchinov (4):
>   fiber: add PoC for fiber creation backtrace
>   fiber: add option and PoC for Lua parent backtrace
>   fiber: refactor lua backtrace routines
>   fiber: refactor C backtrace and add changelog

Kirill asked me to take a look at this series.

The branch contains a different set of patches. I'm going to review
what you pushed to the branch.

Side note: the series is not very well split IMO. The goal of splitting
is to speed up the review process, so it's good to split a series in
such a way that each patch can be reviewed and applied independently of
the following patches. In particular, all code introduced in a patch
should be fully covered by existing tests or tests added in the patch so
that it can be applied before the rest of the series. In your case it
isn't so:

 - fiber: add fiber creation Lua backtrace
   Adds some C functions that are neither used nor tested.

 - fiber: add fiber creation C backtrace with option
   Adds a knob that does nothing + some struct members that aren't used.

 - backtrace: add RIP tracing and resolving API
   Introduces the rest of the feature.

As a result, a reviewer has to jump between patch 3 and patches 1, 2 or
ignore the splitting and review the resulting diff instead. I'm going to
do the latter.

You do a lot of refactoring in your patch, e.g. factor out helpers to
work with proc name cache. I'd do each independent piece of refactoring
in a separate, preparatory patch.

There are two main concerns regarding this patch, which should be
addressed if possible:

 - I think that we should include backtraces of all ancestors into a
   fiber's backtrace, not just its parent.
   Without it, the feature doesn't seem to be complete.

 - There's a lot of code duplication between the code handling child
   and parent backtraces. This makes the code difficult to understand
   and maintain. It would be nice to have one function that captures a
   backtrace and stores it in some format and one function that dumps a
   captured backtrace to Lua. And use these two functions for capturing
   and showing both child and parent backtraces. Can it be achieved?

Looked through the patch. Some minor comments below.

> diff --git a/changelogs/unreleased/gh-4002-fiber-creation-backtrace.md b/changelogs/unreleased/gh-4002-fiber-creation-backtrace.md
> new file mode 100644
> index 000000000000..1e1ac177a6c6
> --- /dev/null
> +++ b/changelogs/unreleased/gh-4002-fiber-creation-backtrace.md
> @@ -0,0 +1,8 @@
> +## feature/fiber

Should be feature/core I think.

> +
> + * Added new subtable `parent_backtrace` to the `fiber.info()`
> +   containing C and Lua backtrace chunks of fiber creation.
> + * Added `fiber.parent_bt_enable()` and `fiber.parent_bt_disable()`
> +   options in order to switch on/off the ability to collect
> +   parent backtraces for newly created fibers and to
> +   show/hide `parent_backtrace` subtables in the `fiber.info()`.

Please rephrase and put under one bullet point - it's one feature that
can be enabled/disabled, not two features.

> diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
> index 85c1aaefd009..a59b77053582 100644
> --- a/src/lib/core/backtrace.cc
> +++ b/src/lib/core/backtrace.cc
> @@ -74,12 +78,68 @@ backtrace_proc_cache_clear(void)
>  	proc_cache = NULL;
>  }
>
> +int
> +backtrace_proc_cache_find(unw_word_t ip, const char **name, unw_word_t *offset)

Should be static.

> +int
> +backtrace_proc_cache_put(unw_word_t ip, const char *name, unw_word_t offset)

Should be static.
The return value is never used, please remove.
Adding these two functions would be a good preparatory patch that could
be applied before the rest of the series is even reviewed.

> +/**
> + * Collect up to `limit' IP register values
> + * for frames of the current stack into `ip_buf'.
> + * Must be by far faster than usual backtrace according to the
> + * libunwind doc for unw_backtrace().
> + */
> +void NOINLINE
> +backtrace_collect_ip(void **ip_buf, int limit)
> +{
> +	memset(ip_buf, 0, limit * sizeof(*ip_buf));
> +#ifndef TARGET_OS_DARWIN
> +	unw_backtrace(ip_buf, limit);
> +#else
> +	/*
> +	 * This dumb implementation was chosen because the DARWIN
> +	 * lacks unw_backtrace() routine from libunwind and
> +	 * usual backtrace() from has less capabilities
> +	 * than the libunwind version which uses DWARF.
> +	 */
> +	unw_cursor_t unw_cur;
> +	unw_context_t unw_ctx;
> +	int frame_no = 0;
> +	unw_word_t ip;
> +
> +	unw_getcontext(&unw_ctx);
> +	unw_init_local(&unw_cur, &unw_ctx);
> +
> +	while (frame_no < limit && unw_step(&unw_cur) > 0) {
> +		unw_get_reg(&unw_cur, UNW_REG_IP, &ip);
> +		ip_buf[frame_no] = (void *)ip;
> +		++frame_no;
> +	}
> +#endif
> +}
> +
> +/**
> + * Call `cb' callback for not more than
> + * first `limit' frames present in the `ip_buf'.
> + *
> + * The implementation uses poorly documented `get_proc_name' callback
> + * from the `unw_accessors_t' to get procedure names via `ip_buf' values.
> + * Although `get_proc_name' is present on most architectures, it's an optional
> + * field, so procedure name is allowed to be absent (NULL) in `cb' call.
> + *
> + * TODO: to add cache and demangling support

Who's going to do that and when? Is there a ticket for that? What won't
work without "cache and demangling support"?

It would be best to avoid introducing any TODOs in the code if possible.
If not possible, please explain thoroughly what doesn't work, what
should be done, create a ticket, and add a reference to the ticket to
the TODO, e.g.

  TODO(gh-12345): ...
> + */
> +void
> +backtrace_foreach_ip(backtrace_cb cb, void **ip_buf, int limit,
> +		     void *cb_ctx)

Please make cb next to last argument, near cb_ctx:

  backtrace_foreach_ip(ip_buf, limit, cb, cb_ctx)

> +{
> +	int demangle_status;
> +	char *demangle_buf = NULL;
> +	size_t demangle_buf_len = 0;
> +#ifndef TARGET_OS_DARWIN
> +	char proc_name[BACKTRACE_NAME_MAX];
> +	unw_word_t ip = 0, offset = 0;
> +	unw_proc_info_t pi;
> +	int frame_no, ret = 0;
> +	const char *proc = NULL;
> +
> +	unw_accessors_t *acc = unw_get_accessors(unw_local_addr_space);
> +
> +	/*
> +	 * RIPs collecting comes from inside a helper routine
> +	 * so we skip the collector function address itself thus
> +	 * start fetching functions with frame number = 1.
> +	 */
> +	for (frame_no = 1; frame_no < limit && ip_buf[frame_no] != NULL;

Can we somehow not return the frame corresponding to
backtrace_collect_ip, since we don't need it anyway?

> +	     frame_no++) {
> +		ip = (unw_word_t)ip_buf[frame_no];
> +
> +		if (backtrace_proc_cache_find(ip, &proc, &offset) == 0) {
> +			ret = 0;
> +		} else if (acc->get_proc_name == NULL) {
> +			ret = unw_get_proc_info_by_ip(unw_local_addr_space,
> +						      ip, &pi, NULL);
> +			offset = ip - pi.start_ip;
> +			proc = NULL;
> +			backtrace_proc_cache_put(ip, proc, offset);
> +		} else {
> +			ret = acc->get_proc_name(unw_local_addr_space, ip,
> +						 proc_name, sizeof(proc_name),
> +						 &offset, NULL);
> +			proc = proc_name;
> +			backtrace_proc_cache_put(ip, proc, offset);
> +		}
> +
> +		if (proc != NULL) {
> +			char *cxxname = abi::__cxa_demangle(proc, demangle_buf,
> +							    &demangle_buf_len,
> +							    &demangle_status);
> +			if (cxxname != NULL) {
> +				demangle_buf = cxxname;
> +				proc = cxxname;
> +			}
> +		}
> +		if (ret != 0 || cb(frame_no - 1, (void *)ip, proc,
> +				   (size_t)offset, cb_ctx) != 0)
> +			break;
> +	}
> +
> +	free(demangle_buf);
> +	if (ret != 0)
> +		say_debug("unwinding error: %s", unw_strerror(ret));
> +#else
> +	int frame_no, ret = 1;
> +	void *ip = NULL;
> +	size_t offset = 0;
> +	Dl_info dli;
> +	const char *proc = NULL;
> +
> +	for (frame_no = 1; frame_no < limit && ip_buf[frame_no] != NULL;
> +	     ++frame_no) {
> +		ip = ip_buf[frame_no];
> +		if (backtrace_proc_cache_find((unw_word_t)ip, &proc,
> +					      &offset) == 0) {
> +			ret = 1;
> +		} else {
> +			ret = dladdr(ip, &dli);
> +			if (ret == 0)
> +				break;
> +			offset = (char *)ip - (char *)dli.dli_saddr;
> +			proc = dli.dli_sname;
> +			backtrace_proc_cache_put((unw_word_t)ip, proc, offset);
> +		}
> +
> +		if (proc != NULL) {
> +			char *cxxname = abi::__cxa_demangle(proc, demangle_buf,
> +							    &demangle_buf_len,
> +							    &demangle_status);
> +			if (cxxname != NULL) {
> +				demangle_buf = cxxname;
> +				proc = cxxname;
> +			}
> +		}

Looks like code duplication between #if TARGET_OS_DARWIN and #else
blocks. Is it possible to factor out only those parts that are actually
different between those two cases and put them in a separate function?

> +		if (cb(frame_no - 1, ip, proc, offset, cb_ctx) != 0)
> +			break;
> +	}
> +
> +	free(demangle_buf);
> +	if (ret == 0)
> +		say_debug("unwinding error: %i", ret);
> +#endif
> +}
> +
>  void
>  print_backtrace(void)
>  {
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 588b78504f38..f0fe6b893cd3 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -45,6 +45,11 @@
>
>  extern void cord_on_yield(void);
>
> +#if ENABLE_BACKTRACE
> +#include "backtrace.h" /* fast_trace */
> +
> +#endif /* ENABLE_BACKTRACE */
> +

The #if/endif isn't needed here.

>  #if ENABLE_FIBER_TOP
>  #include /* __rdtscp() */
>
> @@ -215,6 +220,10 @@ fiber_mprotect(void *addr, size_t len, int prot)
>  static __thread bool fiber_top_enabled = false;
>  #endif /* ENABLE_FIBER_TOP */
>
> +#if ENABLE_BACKTRACE
> +static __thread bool fiber_parent_bt_enabled = false;
> +#endif /* ENABLE_BACKTRACE */
> +

Why __thread?
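
Regarding the duplication between the two branches of
backtrace_foreach_ip() noted above, here is a minimal sketch of one way
to split it, under stated assumptions. The only platform-specific step
(turning an IP into a name and an offset) is hidden behind a resolver,
and the loop is written once. All names here (foreach_ip_sketch,
resolve_ip_f, stub_resolve, count_frames_cb) are hypothetical and not
taken from the patch. The resolver is passed as a function pointer only
to keep the example self-contained; in the real code a single static
function selected with #ifdef would probably be enough.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the shape of backtrace_cb used in the patch. */
typedef int (*frame_cb)(int frameno, void *ip, const char *proc,
			size_t offset, void *cb_ctx);

/*
 * Hypothetical resolver: the only platform-specific step.
 * A libunwind build would wrap get_proc_name() here, a Darwin
 * build would wrap dladdr(). Returns 0 on success.
 */
typedef int (*resolve_ip_f)(void *ip, const char **proc, size_t *offset);

/*
 * Common loop shared by all platforms.
 * Returns the number of frames for which cb returned 0.
 */
static int
foreach_ip_sketch(void **ip_buf, int limit, resolve_ip_f resolve,
		  frame_cb cb, void *cb_ctx)
{
	int frame_no;

	assert(ip_buf != NULL && resolve != NULL && cb != NULL);
	for (frame_no = 0; frame_no < limit && ip_buf[frame_no] != NULL;
	     frame_no++) {
		const char *proc = NULL;
		size_t offset = 0;

		if (resolve(ip_buf[frame_no], &proc, &offset) != 0)
			break;
		/* Cache lookup and demangling would live here, once. */
		if (cb(frame_no, ip_buf[frame_no], proc, offset, cb_ctx) != 0)
			break;
	}
	return frame_no;
}

/* Stub resolver for illustration only: every IP resolves to "stub"+0. */
static int
stub_resolve(void *ip, const char **proc, size_t *offset)
{
	(void)ip;
	*proc = "stub";
	*offset = 0;
	return 0;
}

/* Example callback: counts the frames it is given. */
static int
count_frames_cb(int frameno, void *ip, const char *proc, size_t offset,
		void *cb_ctx)
{
	(void)frameno;
	(void)ip;
	(void)proc;
	(void)offset;
	++*(int *)cb_ctx;
	return 0;
}
```

With this shape the demangling block and the callback invocation exist
in one place, and each platform contributes only its resolver.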
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 38394666b41d..d82a3a20c132 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -210,6 +226,69 @@ dump_lua_frame(struct lua_State *L, lua_Debug *ar, int tb_frame)
>  	lua_settable(L, -3);
>  }
>
> +static void
> +lua_backtrace_foreach(struct lua_State *L, lua_backtrace_cb cb, void *cb_ctx)

This helper function could be introduced in a separate patch.

> @@ -220,38 +299,60 @@ fiber_backtrace_cb(int frameno, void *frameret, const char *func, size_t offset,
>  	 * https://github.com/tarantool/tarantool/issues/5326
>  	 * will not resolved, but is possible afterwards.
>  	 */
> -	if (func != NULL && strstr(func, "lj_BC_FUNCC") == func) {
> +	if (func != NULL && tb_ctx->R && strstr(func, "lj_BC_FUNCC") == func) {
>  		/* We are in the LUA vm. */
> -		lua_Debug ar;
> -		while (tb_ctx->R && lua_getstack(tb_ctx->R, tb_ctx->lua_frame, &ar) > 0) {
> -			/* Skip all following C-frames. */
> -			lua_getinfo(tb_ctx->R, "Sln", &ar);
> -			if (*ar.what != 'C')
> -				break;
> -			if (ar.name != NULL) {
> -				/* Dump frame if it is a C built-in call. */
> -				tb_ctx->tb_frame++;
> -				dump_lua_frame(L, &ar, tb_ctx->tb_frame);
> -			}
> -			tb_ctx->lua_frame++;
> -		}
> -		while (tb_ctx->R && lua_getstack(tb_ctx->R, tb_ctx->lua_frame, &ar) > 0) {
> -			/* Trace Lua frame. */
> -			lua_getinfo(tb_ctx->R, "Sln", &ar);
> -			if (*ar.what == 'C') {
> -				break;
> -			}
> +		lua_backtrace_foreach(tb_ctx->R, dump_lua_frame_cb, tb_ctx);
> +	}
> +	char buf[512];
> +	int l = snprintf(buf, sizeof(buf), "#%-2d %p in ",
> +			 frameno, frameret);
> +	if (func)
> +		snprintf(buf + l, sizeof(buf) - l, "%s+%zu", func, offset);
> +	else
> +		snprintf(buf + l, sizeof(buf) - l, "?");
> +	buf[sizeof(buf) - 1] = 0;
> +	tb_ctx->tb_frame++;
> +	lua_pushnumber(L, tb_ctx->tb_frame);
> +	lua_newtable(L);
> +	lua_pushstring(L, "C");
> +	lua_pushstring(L, buf);
> +	lua_settable(L, -3);
> +	lua_settable(L, -3);

Duplication with the code from fiber_backtrace_cb below.
Should be factored out to a separate helper function.

> +	return 0;
> +}
> +
> +static int
> +fiber_parent_backtrace_cb(int frameno, void *frameret, const char *func,
> +			  size_t offset, void *cb_ctx)
> +{
> +	int lua_frame = 0;
> +	struct lua_parent_tb_ctx *tb_ctx = (struct lua_parent_tb_ctx *)cb_ctx;
> +	struct parent_bt_lua *bt = tb_ctx->bt;
> +	struct lua_State *L = tb_ctx->L;
> +	/*
> +	 * There is impossible to get func == NULL until
> +	 * https://github.com/tarantool/tarantool/issues/5326
> +	 * will not resolved, but is possible afterwards.
> +	 */
> +	if (bt != NULL && func != NULL && strstr(func, "lj_BC_FUNCC") == func) {
> +		/* We are in the LUA vm. */
> +		while (lua_frame < bt->cnt) {
>  			tb_ctx->tb_frame++;
> -			dump_lua_frame(L, &ar, tb_ctx->tb_frame);
> -			tb_ctx->lua_frame++;
> +			dump_lua_frame(L, bt->names[lua_frame],
> +				       bt->sources[lua_frame],
> +				       bt->lines[lua_frame],
> +				       tb_ctx->tb_frame);
> +			lua_frame++;
>  		}
>  	}
>  	char buf[512];
> -	int l = snprintf(buf, sizeof(buf), "#%-2d %p in ", frameno, frameret);
> +	int l = snprintf(buf, sizeof(buf), "#%-2d %p in ",
> +			 frameno, frameret);
>  	if (func)
>  		snprintf(buf + l, sizeof(buf) - l, "%s+%zu", func, offset);
>  	else
>  		snprintf(buf + l, sizeof(buf) - l, "?");
> +	buf[sizeof(buf) - 1] = 0;
>  	tb_ctx->tb_frame++;
>  	lua_pushnumber(L, tb_ctx->tb_frame);
>  	lua_newtable(L);
> @@ -433,6 +569,22 @@ lbox_do_backtrace(struct lua_State *L)
>  	}
>  	return true;
>  }
> +
> +static int
> +lbox_fiber_parent_bt_enable(struct lua_State *L)
> +{
> +	(void) L;
> +	fiber_parent_bt_enable();
> +	return 0;
> +}
> +
> +static int
> +lbox_fiber_parent_bt_disable(struct lua_State *L)
> +{
> +	(void) L;
> +	fiber_parent_bt_disable();
> +	return 0;
> +}
>  #endif /* ENABLE_BACKTRACE */

Better have one function which would take true/false IMO, but I think we
have to wind up with two, for consistency with fiber_top_enable...
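
On the frame-formatting code that is repeated in both backtrace
callbacks above, a minimal sketch of a shared helper could look as
follows. The name format_c_frame is hypothetical. The Lua table pushes
(lua_pushnumber, lua_newtable, ...) are left out so that the example
compiles without the Lua headers; in the real helper they would follow
the formatting step.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one C frame as "#<frameno> <address> in <func>+<offset>",
 * or "... in ?" when the procedure name is unknown.
 * The result is always NUL-terminated: snprintf() guarantees that
 * for size > 0, so no manual buf[size - 1] = 0 is needed.
 */
static void
format_c_frame(char *buf, size_t size, int frameno, void *frameret,
	       const char *func, size_t offset)
{
	assert(buf != NULL && size > 0);
	int l = snprintf(buf, size, "#%-2d %p in ", frameno, frameret);
	if (l < 0) {
		/* Encoding error: return an empty string. */
		buf[0] = '\0';
		return;
	}
	if ((size_t)l >= size)
		return; /* The prefix alone was truncated. */
	if (func != NULL)
		snprintf(buf + l, size - l, "%s+%zu", func, offset);
	else
		snprintf(buf + l, size - l, "?");
}
```

Both fiber_backtrace_cb and fiber_parent_backtrace_cb could then call
the helper with a local char buf[512] and push the resulting string.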
> @@ -498,6 +654,12 @@ fiber_create(struct lua_State *L)
>  		luaT_error(L);
>  	}
>
> +#ifdef ENABLE_BACKTRACE
> +	// TODO: error handling

Please fix in this patch. Either with luaT_error() or with panic(),
which is okay, because malloc doesn't normally fail and if it does
there isn't much we can do.

> +	if (fiber_parent_bt_is_enabled())
> +		fiber_parent_bt_init(f, L);
> +#endif
> +
> diff --git a/src/lua/fiber.h b/src/lua/fiber.h
> index e298987062c5..450840fb045b 100644
> --- a/src/lua/fiber.h
> +++ b/src/lua/fiber.h
> @@ -36,6 +36,25 @@ extern "C" {
>
>  struct lua_State;
>
> +/**
> + * Maximal name length (including '\0')
> + * and backtrace length.
> + */
> +enum {
> +	PARENT_BT_LUA_NAME_MAX = 64,
> +	PARENT_BT_LUA_LEN_MAX = 8

I think we should capture more frames, especially if we capture
backtraces of all ancestors. I think 64 would be an adequate limit.

> +};
> +
> +/**
> + * Stores lua parent backtrace for fiber.
> + */
> +struct parent_bt_lua {
> +	int cnt;
> +	char names[PARENT_BT_LUA_LEN_MAX][PARENT_BT_LUA_NAME_MAX];
> +	char sources[PARENT_BT_LUA_LEN_MAX][PARENT_BT_LUA_NAME_MAX];
> +	int lines[PARENT_BT_LUA_LEN_MAX];
> +};

Instead of having three arrays, better introduce a separate struct and
create one array of those structs.
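
To illustrate the last two comments, here is a minimal sketch of the
array-of-structs layout with the frame limit raised to 64. The names
parent_bt_lua_frame and parent_bt_lua_push are hypothetical, as are the
placeholder strings used for missing names. Only the enum constants and
struct parent_bt_lua come from the patch.

```c
#include <assert.h>
#include <stdio.h>

enum {
	PARENT_BT_LUA_NAME_MAX = 64,
	/* Raised from 8 to 64, as suggested in the review. */
	PARENT_BT_LUA_LEN_MAX = 64,
};

/* One captured Lua frame (hypothetical struct name). */
struct parent_bt_lua_frame {
	char name[PARENT_BT_LUA_NAME_MAX];
	char source[PARENT_BT_LUA_NAME_MAX];
	int line;
};

/* Stores the Lua parent backtrace of a fiber. */
struct parent_bt_lua {
	int cnt;
	struct parent_bt_lua_frame frames[PARENT_BT_LUA_LEN_MAX];
};

/*
 * Append one frame, truncating over-long strings.
 * Returns 0 on success, -1 if the backtrace is already full.
 */
static int
parent_bt_lua_push(struct parent_bt_lua *bt, const char *name,
		   const char *source, int line)
{
	assert(bt != NULL);
	if (bt->cnt >= PARENT_BT_LUA_LEN_MAX)
		return -1;
	struct parent_bt_lua_frame *f = &bt->frames[bt->cnt++];
	snprintf(f->name, sizeof(f->name), "%s",
		 name != NULL ? name : "(unnamed)");
	snprintf(f->source, sizeof(f->source), "%s",
		 source != NULL ? source : "?");
	f->line = line;
	return 0;
}
```

Keeping the three fields of a frame together means a dump routine can
take a single frame pointer instead of three parallel array elements.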