From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f45.google.com (mail-lf1-f45.google.com [209.85.167.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 690FD45C304 for ; Wed, 16 Dec 2020 14:16:11 +0300 (MSK) Received: by mail-lf1-f45.google.com with SMTP id u18so47344419lfd.9 for ; Wed, 16 Dec 2020 03:16:11 -0800 (PST) Date: Wed, 16 Dec 2020 14:16:06 +0300 From: Cyrill Gorcunov Message-ID: <20201216111606.GA14556@grain> References: <20201210161832.729439-1-gorcunov@gmail.com> <20201210161832.729439-5-gorcunov@gmail.com> <3d201857-f809-c758-297c-e3e896fbf06c@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3d201857-f809-c758-297c-e3e896fbf06c@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: TML On Mon, Dec 14, 2020 at 11:54:26PM +0100, Vladislav Shpilevoy wrote: > > 2. infomation -> information. > > It helps to write a commit message in an editor like Google Docs or your > local offline editor, and see all the highlighted errors, if you can't > get a spell checker working in vim or whatever you usually use. Sorry, I've been checking the patch with spell checker but happen to miss merging fixed verion in. Thank you! > > | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID" > > | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO" > > | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs" > > | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN" > > | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW" > > | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe" > > | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh" > > | "MGUgaW4gX3N0YXJ0KzJlCg==", > > | "timestamp": "2020-12-10 17:08:42 MSK" > > | } > > | } > > 3. And what is the reason you encode it all into base64 second time? Because backtrace contains newlines and other special symbols, this is very inconvenient when you print the json-dump into console. > > > The `backtrace` istself is encoded as base64 because of newline symbols > > (and may comprise some nonascii symbols as well). ^^^ here I explained why > > 4. analisys -> analysis. yup, thanks! > > > > is sent to the feedback server. To disable it set `feedback_no_crashinfo` > > to `false`. > > 5. Doesn't the name look wrong to you? To disable something you propose > to set 'no' to 'false'. No_crashinfo = false. It is the same as > crashinfo = true. Vice versa. > > I propose not to use 'no' in option names, as it is always confusing. > crashinfo = false means don't send anything, true means send. ok, will do, thanks! > > setmetatable(daemon, { > > __index = { > > set_feedback_params = function() > > daemon.enabled = box.cfg.feedback_enabled > > daemon.host = box.cfg.feedback_host > > daemon.interval = box.cfg.feedback_interval > > + ffi.C.crash_cfg() > > 6. I realized just now that your new option has nothing to do with > the feedback daemon except the option name and the URL. > > You probably should configure it via Lua C, just like almost all > the other box.cfg options do. I'll try, thanks! > > > > +#include "third_party/base64.h" > > +#include "small/static.h" > > #include "trivia/util.h" > > #include "backtrace.h" > > #include "crash.h" > > +#include "cfg.h" > > 7. By using 'cfg' here and linking with 'misc' you introduce a cyclic > dependency. Because 'cfg' module is a part of 'server' library. > 'Server' depends on 'core', which has the new crash handling system. > Also you start depending on Lua here, because 'cfg' uses Lua inside. > > In addition, 'cfg' depends on box, although implicitly. Because it > takes the config options from box.cfg. I suggest you to configure this > module like other modules - via crash_cfg_*() functions. Or pass > the options as arguments into one crash_cfg(). OK, thanks! > > > > +static char tarantool_path[PATH_MAX]; > > +static char feedback_host[2048]; > > 8. I don't like that you just added +6KB to the process > size, not counting tons of other memory such as backtrace > buffer (+4KB), lots of 8 byte registers, and so on, when > we fight for each byte. At least +10KB in total. But I > have no a good idea how to fix it. Personally I don't like that we use static buffer inside crash handler. I would prefer to have the whole memory we need being preallocated, our executable file is about 11M (for debug build) so I don't think that even 16K of preallocated memory matter somethow. And would like to have a separate preallocated stack for crash handler (which requires to disable simultaneous signals handling as we have now, and frankly I don't understand why we need SA_NODEFER flag). I think we can rethink about memory usage later since this is definitely not a critical part. > > +static void > > +crash_report_feedback_daemon(struct crash_info *cinfo) > > +{ > > + /* > > + * Update to a new number if format get changed. > > + */ > > + static const int crashinfo_version = 1; > > 9. Why is it static? I suspect we discussed that at least 4 > times ... To use stack less, still 8 bytes won't make much difference so I thik indeed we can simply se nonstatick value. > > + > > + char *p = static_alloc(SMALL_STATIC_SIZE); > > + char *e = &p[SMALL_STATIC_SIZE - 1]; > > + char *head = p; > > + char *tail = &p[SMALL_STATIC_SIZE - 8]; > > 10. Why -8? To have at least a gap before the end of a buffer. Note that this addresses are not used for snprintf (where we have -1 as the last addressable pointer) but rather for arbitrary data. For safety in short. > > + > > + /* > > + * Lets reuse tail of the buffer as a temp space. > > + */ > > + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)]; > > + if (p >= (char *)(void *)uname_ptr) > > 11. Why do you need double cast? I can't imagine that it does anything > except adds confusion. Compiler doesn't like to compare a char pointer with a structure pointer src/lib/core/crash.c:251:8: error: comparison of distinct pointer types lacks a cast [-Werror] 251 | if (p >= (void *)uname_ptr) | ^~ > > + if (uname(uname_ptr) != 0) { > > + pr_syserr("uname call failed, ignore"); > > + memset(uname_ptr, 0, sizeof(struct utsname)); > > + } > > + > > + snprintf_safe("{"); > > 12. snprintf_safe uses memory [begin, e]. Where 'begin' is the initial > value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7]. > Assume static size is 12KB, and sizeof(*uname_ptr) is 640. > > [begin = 0, snprintf e = 12287] > [11640 uname 12280] > > So snprintf and uname buffers intersect. If there is a big enough dump, > it will corrupt uname object. > > Is it correct? This is undefined behaviour. I though about it and I don't like this but there is no other way if we want to use a single buffer for all. This was the reason while I've been using preallocated memory for all this and you said to reuse static alloc instead. Another option is to adjust @e pointer to be sure that we're not jumping into the tail. Letme think maybe I manage to make this more clear and readable... > > + if (cinfo->signo == SIGSEGV) { > > + if (cinfo->sicode == SEGV_MAPERR) { > > + snprintf_safe("\"si_code_str\":\"%s\",", > > + "SEGV_MAPERR"); > > + } else if (cinfo->sicode == SEGV_ACCERR) { > > + snprintf_safe("\"si_code_str\":\"%s\",", > > + "SEGV_ACCERR"); > > + } > > + snprintf_safe("\"si_addr\":\"0x%llx\",", > > + (long long)cinfo->siaddr); > > + } > > 13. Instead of using all these global many KB buffers you > could also create a pipe to the new process, which the > child would read as stdin from Lua, without any tiny limits > on how many KBs it can use. And do write() to it directly. Pipes use kernel memory for own buffers and what is worse this memory is not accountable as far as I remember. I'm not sure how critical this would be but I like the idea in general. Gimme some time to figure out the details. > > + > > + char *timestamp_rt_str = &tail[-128]; > > + if (p >= timestamp_rt_str) > > + goto out; > > + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128); > > 14. Why 128? And why is it hardcoded? This should fit any possible timestring I think. The timestring is bound to locale and I think it might vary in size. I choose a big value but to be honest I'm not 100% if this will be enough forever. > > + snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str); > > + snprintf_safe("}"); > > + snprintf_safe("}"); > > + > > + size_t report_len = p - head; > > + size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP); > > + > > + char *report_base64 = &tail[-report_elen-8]; > > 15. Why -8? Already explained. > > 16. What if report_elen is bigger than *uname_ptr? Will you override > the output buffer even more? Need to recheck, thanks! > > > + if (p >= report_base64) > > + goto out; > > + base64_encode(head, report_len, report_base64, > > + report_elen, BASE64_NOWRAP); > > + report_base64[report_elen] = '\0'; > > + ... > > + pid_t pid = vfork(); > > + if (pid == 0) { > > + /* > > + * The script must exit at the end but there > > + * is no simple way to make sure from inside > > + * of a signal crash handler. So just hope it > > + * is running fine. > > + */ > > + execve(exec_argv[0], exec_argv, NULL); > > + pr_crit("exec(%s,[%s,%s,%s,NULL]) failed", > > + exec_argv[0], exec_argv[0], > > + exec_argv[1], exec_argv[2]); > > 17. What will happen if this tarantool also will crash? Will there > be an infinite chain of crashes? For instance, if we will have an > issue with curl again. Or somebody else will build with his own > curl leading to a crash on any usage, like it already happened one > time. Strictly speaking -- yes, you're right. We might have a chain of crashes and I think we might need a command line option for this?