[Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code
Cyrill Gorcunov
gorcunov at gmail.com
Sat Dec 5 22:58:27 MSK 2020
On Sat, Dec 05, 2020 at 07:33:05PM +0100, Vladislav Shpilevoy wrote:
> > | "build": {
> > | "major": 2,
> > | "minor": 7,
> > | "patch": 0,
>
> 1. Why do you need the version numbers and the 'version'
> string both? All can be extracted from 'version' string.
Sure we can extract it but I thought this gonna be more
convenient 'cause it won't require the reader to split
string, extract the major/minor. This is not a problem
to drop these fields and leave "version" only if you prefer.
I don't mind.
> > | },
> > | "signal": {
> > | "signo": 11,
> > | "si_code": 0,
> > | "si_addr": "0x3e800009727",
> > | "backtrace": "IzAgIDB4NjMyODhiIGlu..",
> > | "timestamp": "0x164ceceac0610ae9",
>
> 2. Why is timestamp in hex? And why do you need both
> hex and string?
hex is simply more short form for timestamp, I can put
plain integer number if you prefer.
as to time string -- the prblem is in local timezone,
the timestamp itself doesn't show which tz it is. thus
we either have to send tz explicitly or a time string.
if you mean to simply drop timestamp and use timestring
only I'm ok with this..
>
> > | "timestamp_str": "2020-12-02 17:34:20 MSK"
> > | }
> > | }
> >
> > Part-of #5261
>
> 3. Lets better put the issue link in the end, like we do in 100%
> of our patches except when a docbot request is present. Because
> we usually look for issue reference in the end, and by putting it
> into a random place of a commit message you complicate that
> search.
Ouch, sorry. Been basing so many times that managed to miss this,
thanks!
> > variables internally and feedback_no_crashinfo will be introduced
> > in next paches, thus after this patch the crash report is disabled.
>
> 4. paches -> patches. You need to turn on a spell checker or something.
true, thanks!
> > +
> > +static struct errstat glob_errstat;
> > +static bool cfg_send_crash = false;
>
> 5. Why do you have a global object for this module, and yet
> this part of it is separated into another global variable?
Good point. Actually I introduced this cfg_send_crash a way
later than glob_errstat. Indeed we can keep this settings inside
errstat.
Also let me explain why this structure called errstat - it
keeps not just report to send but also registers (we use it
to print to a console), backtrace, report. And we're planning
to gather diag_set() calls here as well and provide some
box.info interface for last say 10 errors.
On the other hands crash.c may be sounds better, dunno.
> > +static unsigned char *
> > +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
> > +{
>
> 6. Why can't you use the existing base64_encode() function from
> base64.h? It is not different from your version. You both use global
> variables (static also is global, just not visible everywhere). So
> your version is not 'safer' in terms of being called from a signal
> handler.
This library is not linked into src/lib/core by default I can add it
actually but since it is third_party library I can't be sure that
it won't be changed someday leading to unpredicted results. As you
might note I try to use local functions as much as I can and I think
we need to implement own strlen/snprintf and etc here as well because
when we're inside sigsegv some state of libc/third_party components
might be broken.
If you think it is safe to reuse base64 here I'll link this library in.
> > +
> > +#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
>
> 7. What is _a?
array, we use sizeof(dst)
> > +
> > +struct errstat *
> > +errstat_get(void)
>
> 8. Why do you ever need to get it? Why the stat is not entirely
> in the .c file? It heavily depends on the platform, so it does
> not make much sense to expose it to any external code. It won't
> be able to use it properly except for some common things.
True. Leftover from previous design. Thanks!
> > +
> > +static inline
> > +uint64_t timespec_to_ns(struct timespec *ts)
>
> 9. Please, put return type on a separate line. On the same line
> as 'static inline'.
yeah, sorry for this typo.
> > + localtime_r(&sec, &tm);
> > + ssize_t total = strftime(start, len, "%F %T %Z", &tm);
> > + start += total;
> > + if (total < len)
> > + return buf;
> > + buf[len-1] = '\0';
>
> 10. Lets conform with our code style and use spaces before and after
> binary operators.
OK
> > + } else
> > + pr_err("can't fetch uname");
>
> 11. Lets conform with the code style, and use {} either in both 'if'
> and 'else', or in none.
Yeah, sorry, thanks!
> > +void
> > +cfg_errstat_set_params(void)
>
> 12. I suggest you to rename it to errstat_cfg().
>
> 'cfg' is a prefix used by 'cfg' module, which you also
> use here. This is probably why we have box_cfg, raft_cfg_*,
> iproto_cfg_* and so on.
sure
> > +{
> > + struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > + const char *host = cfg_gets("feedback_host");
> > + int enabled = cfg_getb("feedback_enabled");
>
> 13. For flags we use name prefix 'is_'. Please, blend in.
nod
> > + int no_crashinfo = cfg_getb("feedback_no_crashinfo");
>
> 14. That is another proof that the module name is misleading - you
> even use crashinfo in the options. Does not it look consistent
> to name the module 'crashinfo' too? (Or something else with 'crash'
> and without 'stat' in name.)
You know I think I agree. It is not clear when error statistics gonna
be implemented, so crashinfo looks pretty fine. Will rename.
> > +
> > +void
> > +errstat_reset(void)
>
> 15. This function is never used.
thanks! will drop.
> > +static void
> > +prepare_report_script(void)
> > +{
> > + struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > + struct errstat_build *binfo = &errstat_get()->build_info;
> > + struct errstat_uname *uinfo = &errstat_get()->uname_info;
> > +
> > + char *p, *e;
> > +
> > +#ifdef ENABLE_BACKTRACE
> > + /*
> > + * We can't use arbitrary data which can be misinterpreted
> > + * by Lua script when we pass it to a script.
> > + *
> > + * WARNING: We use report_encoded as a temp buffer.
>
> 16. Static buffer is much bigger, and is already there. Why
> can't you use it? Instead you somewhy add several kilobytes of
> another static buffer padding out the process size.
static buffer is three pages only while we might need more
size i suspect. let me reconsider this moment maybe indeed
using tt_static would suite better. thanks!
> > +
> > + snprintf_safe("require(\'http.client\').post(\'%s\',"
> > + "'{\"crashdump\":{\"version\":\"%d\","
> > + "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
>
> 17. Why so small timeout?
this is the same timeout the feedback daemon uses. I can increase it
or make it configurable if you prefer.
> > +out:
> > + pr_crit("unable to prepare a crash report");
> > + struct sigaction sa = {
> > + .sa_handler = SIG_DFL,
> > + };
> > + sigemptyset(&sa.sa_mask);
> > + sigaction(SIGABRT, &sa, NULL);
>
> 18. Why do you touch the signal handler here? You didn't
> change it anywhere in this file.
It is taken from main signal handler code. I thought it was
done this way in presume that this is safer -- ie even if
one day code get changed and someone add sigabrt hander we
still will produce a coredump. Probably worth to just drop
out.
> > + abort();
> > +}
> > +
> > +void
> > +errstat_exec_send_crash(void)
>
> 19. Why is it called 'exec'?
Because it runs exec inside. I can drop this exec_ if you prefer.
> > +
> > +#if defined(__cplusplus)
> > +extern "C" {
> > +#endif /* defined(__cplusplus) */
> > +
> > +#define ERRSTAT_REPORT_VERSION 1
>
> 20. It is not used out of .c file. I suggest you to
> move it there.
ok
> > +
> > +/**
> > + * Return a pointer to the info keeper.
> > + */
> > +extern struct errstat *
>
> 21. Please, omit 'extern'. They don't make any difference
> here, and stand out of all the other libs we have.
ok
> > +
> > +/**
> > + * Collect a crash.
> > + */
> > +extern void
> > +errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
> > +
> > +/**
> > + * Send a crash report.
> > + */
> > +extern void
> > +errstat_exec_send_crash(void);
>
> 22. Why do you have collect and send separated? Why can't it just be
> something like 'crashinfo_handle_crash(sig stuff...);'
To reuse regs and backtrace
More information about the Tarantool-patches
mailing list