From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f50.google.com (mail-lf1-f50.google.com [209.85.167.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2A78D45C305 for ; Sat, 5 Dec 2020 22:58:30 +0300 (MSK) Received: by mail-lf1-f50.google.com with SMTP id l11so12487653lfg.0 for ; Sat, 05 Dec 2020 11:58:30 -0800 (PST) Date: Sat, 5 Dec 2020 22:58:27 +0300 From: Cyrill Gorcunov Message-ID: <20201205195827.GC2303@grain> References: <20201204153003.175555-1-gorcunov@gmail.com> <20201204153003.175555-3-gorcunov@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: Mons Anderson , tml On Sat, Dec 05, 2020 at 07:33:05PM +0100, Vladislav Shpilevoy wrote: > > | "build": { > > | "major": 2, > > | "minor": 7, > > | "patch": 0, > > 1. Why do you need the version numbers and the 'version' > string both? All can be extracted from 'version' string. Sure we can extract it but I thought this gonna be more convenient 'cause it won't require the reader to split string, extract the major/minor. This is not a problem to drop these fields and leave "version" only if you prefer. I don't mind. > > | }, > > | "signal": { > > | "signo": 11, > > | "si_code": 0, > > | "si_addr": "0x3e800009727", > > | "backtrace": "IzAgIDB4NjMyODhiIGlu..", > > | "timestamp": "0x164ceceac0610ae9", > > 2. Why is timestamp in hex? And why do you need both > hex and string? hex is simply more short form for timestamp, I can put plain integer number if you prefer. as to time string -- the prblem is in local timezone, the timestamp itself doesn't show which tz it is. thus we either have to send tz explicitly or a time string. if you mean to simply drop timestamp and use timestring only I'm ok with this.. > > > | "timestamp_str": "2020-12-02 17:34:20 MSK" > > | } > > | } > > > > Part-of #5261 > > 3. Lets better put the issue link in the end, like we do in 100% > of our patches except when a docbot request is present. Because > we usually look for issue reference in the end, and by putting it > into a random place of a commit message you complicate that > search. Ouch, sorry. Been basing so many times that managed to miss this, thanks! > > variables internally and feedback_no_crashinfo will be introduced > > in next paches, thus after this patch the crash report is disabled. > > 4. paches -> patches. You need to turn on a spell checker or something. true, thanks! > > + > > +static struct errstat glob_errstat; > > +static bool cfg_send_crash = false; > > 5. Why do you have a global object for this module, and yet > this part of it is separated into another global variable? Good point. Actually I introduced this cfg_send_crash a way later than glob_errstat. Indeed we can keep this settings inside errstat. Also let me explain why this structure called errstat - it keeps not just report to send but also registers (we use it to print to a console), backtrace, report. And we're planning to gather diag_set() calls here as well and provide some box.info interface for last say 10 errors. On the other hands crash.c may be sounds better, dunno. > > +static unsigned char * > > +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len) > > +{ > > 6. Why can't you use the existing base64_encode() function from > base64.h? It is not different from your version. You both use global > variables (static also is global, just not visible everywhere). So > your version is not 'safer' in terms of being called from a signal > handler. This library is not linked into src/lib/core by default I can add it actually but since it is third_party library I can't be sure that it won't be changed someday leading to unpredicted results. As you might note I try to use local functions as much as I can and I think we need to implement own strlen/snprintf and etc here as well because when we're inside sigsegv some state of libc/third_party components might be broken. If you think it is safe to reuse base64 here I'll link this library in. > > + > > +#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst)) > > 7. What is _a? array, we use sizeof(dst) > > + > > +struct errstat * > > +errstat_get(void) > > 8. Why do you ever need to get it? Why the stat is not entirely > in the .c file? It heavily depends on the platform, so it does > not make much sense to expose it to any external code. It won't > be able to use it properly except for some common things. True. Leftover from previous design. Thanks! > > + > > +static inline > > +uint64_t timespec_to_ns(struct timespec *ts) > > 9. Please, put return type on a separate line. On the same line > as 'static inline'. yeah, sorry for this typo. > > + localtime_r(&sec, &tm); > > + ssize_t total = strftime(start, len, "%F %T %Z", &tm); > > + start += total; > > + if (total < len) > > + return buf; > > + buf[len-1] = '\0'; > > 10. Lets conform with our code style and use spaces before and after > binary operators. OK > > + } else > > + pr_err("can't fetch uname"); > > 11. Lets conform with the code style, and use {} either in both 'if' > and 'else', or in none. Yeah, sorry, thanks! > > +void > > +cfg_errstat_set_params(void) > > 12. I suggest you to rename it to errstat_cfg(). > > 'cfg' is a prefix used by 'cfg' module, which you also > use here. This is probably why we have box_cfg, raft_cfg_*, > iproto_cfg_* and so on. sure > > +{ > > + struct errstat_crash *cinfo = &errstat_get()->crash_info; > > + const char *host = cfg_gets("feedback_host"); > > + int enabled = cfg_getb("feedback_enabled"); > > 13. For flags we use name prefix 'is_'. Please, blend in. nod > > + int no_crashinfo = cfg_getb("feedback_no_crashinfo"); > > 14. That is another proof that the module name is misleading - you > even use crashinfo in the options. Does not it look consistent > to name the module 'crashinfo' too? (Or something else with 'crash' > and without 'stat' in name.) You know I think I agree. It is not clear when error statistics gonna be implemented, so crashinfo looks pretty fine. Will rename. > > + > > +void > > +errstat_reset(void) > > 15. This function is never used. thanks! will drop. > > +static void > > +prepare_report_script(void) > > +{ > > + struct errstat_crash *cinfo = &errstat_get()->crash_info; > > + struct errstat_build *binfo = &errstat_get()->build_info; > > + struct errstat_uname *uinfo = &errstat_get()->uname_info; > > + > > + char *p, *e; > > + > > +#ifdef ENABLE_BACKTRACE > > + /* > > + * We can't use arbitrary data which can be misinterpreted > > + * by Lua script when we pass it to a script. > > + * > > + * WARNING: We use report_encoded as a temp buffer. > > 16. Static buffer is much bigger, and is already there. Why > can't you use it? Instead you somewhy add several kilobytes of > another static buffer padding out the process size. static buffer is three pages only while we might need more size i suspect. let me reconsider this moment maybe indeed using tt_static would suite better. thanks! > > + > > + snprintf_safe("require(\'http.client\').post(\'%s\'," > > + "'{\"crashdump\":{\"version\":\"%d\"," > > + "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);", > > 17. Why so small timeout? this is the same timeout the feedback daemon uses. I can increase it or make it configurable if you prefer. > > +out: > > + pr_crit("unable to prepare a crash report"); > > + struct sigaction sa = { > > + .sa_handler = SIG_DFL, > > + }; > > + sigemptyset(&sa.sa_mask); > > + sigaction(SIGABRT, &sa, NULL); > > 18. Why do you touch the signal handler here? You didn't > change it anywhere in this file. It is taken from main signal handler code. I thought it was done this way in presume that this is safer -- ie even if one day code get changed and someone add sigabrt hander we still will produce a coredump. Probably worth to just drop out. > > + abort(); > > +} > > + > > +void > > +errstat_exec_send_crash(void) > > 19. Why is it called 'exec'? Because it runs exec inside. I can drop this exec_ if you prefer. > > + > > +#if defined(__cplusplus) > > +extern "C" { > > +#endif /* defined(__cplusplus) */ > > + > > +#define ERRSTAT_REPORT_VERSION 1 > > 20. It is not used out of .c file. I suggest you to > move it there. ok > > + > > +/** > > + * Return a pointer to the info keeper. > > + */ > > +extern struct errstat * > > 21. Please, omit 'extern'. They don't make any difference > here, and stand out of all the other libs we have. ok > > + > > +/** > > + * Collect a crash. > > + */ > > +extern void > > +errstat_collect_crash(int signo, siginfo_t *siginfo, void *context); > > + > > +/** > > + * Send a crash report. > > + */ > > +extern void > > +errstat_exec_send_crash(void); > > 22. Why do you have collect and send separated? Why can't it just be > something like 'crashinfo_handle_crash(sig stuff...);' To reuse regs and backtrace