From: Vladislav Shpilevoy
To: Cyrill Gorcunov
Cc: TML
Date: Sun, 20 Dec 2020 15:48:14 +0100
Message-ID: <0bb6f2db-5ef8-9d71-be61-67e113c7e0ad@tarantool.org>
In-Reply-To: <20201216111606.GA14556@grain>
References: <20201210161832.729439-1-gorcunov@gmail.com>
 <20201210161832.729439-5-gorcunov@gmail.com>
 <3d201857-f809-c758-297c-e3e896fbf06c@tarantool.org>
 <20201216111606.GA14556@grain>
List-Id: Tarantool development patches
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server

>>> | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
>>> | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
>>> | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
>>> | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
>>> | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
>>> | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
>>> | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
>>> | "MGUgaW4gX3N0YXJ0KzJlCg==",
>>> | "timestamp": "2020-12-10 17:08:42 MSK"
>>> | }
>>> | }
>>
>> 3. And what is the reason you encode it all into base64 a second time?
>
> Because backtrace contains newlines and other special symbols, which is
> very inconvenient when you print the json-dump into a console.

I didn't ask about the backtrace base64. I am asking why you encode the
crashdump.data field in base64 again.
Currently you do base64 encoding 2 times: once for the signal.backtrace
field, and a second time when you encode the whole {uname: ..., build: ...,
signal: ...} object. Signal.backtrace there is already base64. I am asking
about the second encoding.

>>> +
>>> +	char *p = static_alloc(SMALL_STATIC_SIZE);
>>> +	char *e = &p[SMALL_STATIC_SIZE - 1];
>>> +	char *head = p;
>>> +	char *tail = &p[SMALL_STATIC_SIZE - 8];
>>
>> 10. Why -8?
>
> To have at least a gap before the end of a buffer.

Sorry, but this is not an answer. Why not -7, -100, -3? I don't see these
8 bytes used for anything, even as a special 8-byte gap.

>>> +
>>> +	/*
>>> +	 * Lets reuse tail of the buffer as a temp space.
>>> +	 */
>>> +	struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
>>> +	if (p >= (char *)(void *)uname_ptr)
>>
>> 11. Why do you need the double cast? I can't imagine that it does
>> anything except add confusion.
>
> Compiler doesn't like to compare a char pointer with a structure pointer:
>
>   src/lib/core/crash.c:251:8: error: comparison of distinct pointer types lacks a cast [-Werror]
>     251 |         if (p >= (void *)uname_ptr)
>         |               ^~

Can you just use a (char *) cast instead of (void *)? Could you simply try
that on your own, without me suggesting it?

>>> +	if (uname(uname_ptr) != 0) {
>>> +		pr_syserr("uname call failed, ignore");
>>> +		memset(uname_ptr, 0, sizeof(struct utsname));
>>> +	}
>>> +
>>> +	snprintf_safe("{");
>>
>> 12. snprintf_safe uses memory [begin, e], where 'begin' is the initial
>> value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7].
>> Assume the static size is 12KB, and sizeof(*uname_ptr) is 640:
>>
>>     [begin = 0, ........... snprintf e = 12287]
>>                 [11640 uname 12280]
>>
>> So the snprintf and uname buffers intersect. If there is a big enough
>> dump, it will corrupt the uname object.
>>
>> Is that correct?
>
> This is undefined behaviour. I thought about it, and I don't like it, but
> there is no other way if we want to use a single buffer for everything.
> This was the reason why I had been using preallocated memory for all of
> this, and you said to reuse static alloc instead. Another option is to
> adjust the @e pointer to make sure we're not jumping into the tail. Let
> me think, maybe I can manage to make this more clear and readable...

It is not about clear/readable, for now. You are using a buffer which can
be corrupted. It is a bug, regardless of how clear it will look.

>>> +	if (cinfo->signo == SIGSEGV) {
>>> +		if (cinfo->sicode == SEGV_MAPERR) {
>>> +			snprintf_safe("\"si_code_str\":\"%s\",",
>>> +				      "SEGV_MAPERR");
>>> +		} else if (cinfo->sicode == SEGV_ACCERR) {
>>> +			snprintf_safe("\"si_code_str\":\"%s\",",
>>> +				      "SEGV_ACCERR");
>>> +		}
>>> +		snprintf_safe("\"si_addr\":\"0x%llx\",",
>>> +			      (long long)cinfo->siaddr);
>>> +	}
>>
>> 13. Instead of using all these global many-KB buffers, you could also
>> create a pipe to the new process, which the child would read as stdin
>> from Lua, without any tiny limits on how many KBs it can use. And do
>> write() to it directly.
>
> Pipes use kernel memory for their own buffers and, what is worse, this
> memory is not accountable, as far as I remember.

I don't see any problem with the kernel using memory to support pipes.
But I do see a problem in the code above, which can simply return garbage
or crash again if the dump is big enough.

>>> +
>>> +	char *timestamp_rt_str = &tail[-128];
>>> +	if (p >= timestamp_rt_str)
>>> +		goto out;
>>> +	ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
>>
>> 14. Why 128? And why is it hardcoded?
>
> This should fit any possible timestring, I think. The timestring is bound
> to the locale and I think it might vary in size. I chose a big value but,
> to be honest, I'm not 100% sure it will be enough forever.

Ok, then the second question remains - why is it hardcoded in 2 places?
I remember you sent some commits to the other source files where you
changed such places to use sizeof() or a const variable or something like
that. How is it different here?
>>> +	snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str);
>>> +	snprintf_safe("}");
>>> +	snprintf_safe("}");
>>> +
>>> +	size_t report_len = p - head;
>>> +	size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
>>> +
>>> +	char *report_base64 = &tail[-report_elen-8];
>>
>> 15. Why -8?
>
> Already explained.

I didn't understand the explanation, sorry.

>>> +	pid_t pid = vfork();
>>> +	if (pid == 0) {
>>> +		/*
>>> +		 * The script must exit at the end but there
>>> +		 * is no simple way to make sure from inside
>>> +		 * of a signal crash handler. So just hope it
>>> +		 * is running fine.
>>> +		 */
>>> +		execve(exec_argv[0], exec_argv, NULL);
>>> +		pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
>>> +			exec_argv[0], exec_argv[0],
>>> +			exec_argv[1], exec_argv[2]);
>>
>> 17. What will happen if this tarantool also crashes? Will there be an
>> infinite chain of crashes? For instance, if we have an issue with curl
>> again. Or somebody builds with his own curl, leading to a crash on any
>> usage, like already happened one time.
>
> Strictly speaking -- yes, you're right. We might have a chain of crashes,
> and I think we might need a command line option for this?

If it crashes before checking the command line option, the option won't
help. I see that feedback_daemon is disabled until the first box.cfg. I
suggest you do the same here - disable crash sending by default and
enable it with the first box.cfg. Your script doesn't use box.cfg, so a
chain crash won't happen, unless some bug overrides the flag responsible
for crash sending.

I also realized that before box.cfg you don't know the URL. So if I crash
the process before it, I get this:

    SystemError curl: URL using bad/illegal format or missing URL: Invalid argument
    fatal error, exiting the event loop

This is because you try to send the dump before the crash module is
configured.