[Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code

Cyrill Gorcunov gorcunov at gmail.com
Sat Dec 5 22:58:27 MSK 2020


On Sat, Dec 05, 2020 at 07:33:05PM +0100, Vladislav Shpilevoy wrote:
> >  |   "build": {
> >  |     "major": 2,
> >  |     "minor": 7,
> >  |     "patch": 0,
> 
> 1. Why do you need the version numbers and the 'version'
> string both? All can be extracted from 'version' string.

Sure we can extract it but I thought this gonna be more
convenient 'cause it won't require the reader to split
string, extract the major/minor. This is not a problem
to drop these fields and leave "version" only if you prefer.
I don't mind.

> >  |   },
> >  |   "signal": {
> >  |     "signo": 11,
> >  |     "si_code": 0,
> >  |     "si_addr": "0x3e800009727",
> >  |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
> >  |     "timestamp": "0x164ceceac0610ae9",
> 
> 2. Why is timestamp in hex? And why do you need both
> hex and string?

hex is simply more short form for timestamp, I can put
plain integer number if you prefer.

as to time string -- the prblem is in local timezone,
the timestamp itself doesn't show which tz it is. thus
we either have to send tz explicitly or a time string.

if you mean to simply drop timestamp and use timestring
only I'm ok with this..

> 
> >  |     "timestamp_str": "2020-12-02 17:34:20 MSK"
> >  |   }
> >  | }
> > 
> > Part-of #5261
> 
> 3. Lets better put the issue link in the end, like we do in 100%
> of our patches except when a docbot request is present. Because
> we usually look for issue reference in the end, and by putting it
> into a random place of a commit message you complicate that
> search.

Ouch, sorry. Been basing so many times that managed to miss this,
thanks!

> > variables internally and feedback_no_crashinfo will be introduced
> > in next paches, thus after this patch the crash report is disabled.
> 
> 4. paches -> patches. You need to turn on a spell checker or something.

true, thanks!

> > +
> > +static struct errstat glob_errstat;
> > +static bool cfg_send_crash = false;
> 
> 5. Why do you have a global object for this module, and yet
> this part of it is separated into another global variable?

Good point. Actually I introduced this cfg_send_crash a way
later than glob_errstat. Indeed we can keep this settings inside
errstat.

Also let me explain why this structure called errstat - it
keeps not just report to send but also registers (we use it
to print to a console), backtrace, report. And we're planning
to gather diag_set() calls here as well and provide some
box.info interface for last say 10 errors.

On the other hands crash.c may be sounds better, dunno.

> > +static unsigned char *
> > +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
> > +{
> 
> 6. Why can't you use the existing base64_encode() function from
> base64.h? It is not different from your version. You both use global
> variables (static also is global, just not visible everywhere). So
> your version is not 'safer' in terms of being called from a signal
> handler.

This library is not linked into src/lib/core by default I can add it
actually but since it is third_party library I can't be sure that
it won't be changed someday leading to unpredicted results. As you
might note I try to use local functions as much as I can and I think
we need to implement own strlen/snprintf and etc here as well because
when we're inside sigsegv some state of libc/third_party components
might be broken.

If you think it is safe to reuse base64 here I'll link this library in.

> > +
> > +#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
> 
> 7. What is _a?

array, we use sizeof(dst)

> > +
> > +struct errstat *
> > +errstat_get(void)
> 
> 8. Why do you ever need to get it? Why the stat is not entirely
> in the .c file? It heavily depends on the platform, so it does
> not make much sense to expose it to any external code. It won't
> be able to use it properly except for some common things.

True. Leftover from previous design. Thanks!

> > +
> > +static inline
> > +uint64_t timespec_to_ns(struct timespec *ts)
> 
> 9. Please, put return type on a separate line. On the same line
> as 'static inline'.

yeah, sorry for this typo.

> > +	localtime_r(&sec, &tm);
> > +	ssize_t total = strftime(start, len, "%F %T %Z", &tm);
> > +	start += total;
> > +	if (total < len)
> > +		return buf;
> > +	buf[len-1] = '\0';
> 
> 10. Lets conform with our code style and use spaces before and after
> binary operators.

OK

> > +	} else
> > +		pr_err("can't fetch uname");
> 
> 11. Lets conform with the code style, and use {} either in both 'if'
> and 'else', or in none.

Yeah, sorry, thanks!

> > +void
> > +cfg_errstat_set_params(void)
> 
> 12. I suggest you to rename it to errstat_cfg().
> 
> 'cfg' is a prefix used by 'cfg' module, which you also
> use here. This is probably why we have box_cfg, raft_cfg_*,
> iproto_cfg_* and so on.

sure

> > +{
> > +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > +	const char *host = cfg_gets("feedback_host");
> > +	int enabled = cfg_getb("feedback_enabled");
> 
> 13. For flags we use name prefix 'is_'. Please, blend in.

nod

> > +	int no_crashinfo = cfg_getb("feedback_no_crashinfo");
> 
> 14. That is another proof that the module name is misleading - you
> even use crashinfo in the options. Does not it look consistent
> to name the module 'crashinfo' too? (Or something else with 'crash'
> and without 'stat' in name.)

You know I think I agree. It is not clear when error statistics gonna
be implemented, so crashinfo looks pretty fine. Will rename.

> > +
> > +void
> > +errstat_reset(void)
> 
> 15. This function is never used.

thanks! will drop.

> > +static void
> > +prepare_report_script(void)
> > +{
> > +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > +	struct errstat_build *binfo = &errstat_get()->build_info;
> > +	struct errstat_uname *uinfo = &errstat_get()->uname_info;
> > +
> > +	char *p, *e;
> > +
> > +#ifdef ENABLE_BACKTRACE
> > +	/*
> > +	 * We can't use arbitrary data which can be misinterpreted
> > +	 * by Lua script when we pass it to a script.
> > +	 *
> > +	 * WARNING: We use report_encoded as a temp buffer.
> 
> 16. Static buffer is much bigger, and is already there. Why
> can't you use it? Instead you somewhy add several kilobytes of
> another static buffer padding out the process size.

static buffer is three pages only while we might need more
size i suspect. let me reconsider this moment maybe indeed
using tt_static would suite better. thanks!

> > +
> > +	snprintf_safe("require(\'http.client\').post(\'%s\',"
> > +		      "'{\"crashdump\":{\"version\":\"%d\","
> > +		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
> 
> 17. Why so small timeout?

this is the same timeout the feedback daemon uses. I can increase it
or make it configurable if you prefer.

> > +out:
> > +	pr_crit("unable to prepare a crash report");
> > +	struct sigaction sa = {
> > +		.sa_handler = SIG_DFL,
> > +	};
> > +	sigemptyset(&sa.sa_mask);
> > +	sigaction(SIGABRT, &sa, NULL);
> 
> 18. Why do you touch the signal handler here? You didn't
> change it anywhere in this file.

It is taken from main signal handler code. I thought it was
done this way in presume that this is safer -- ie even if
one day code get changed and someone add sigabrt hander we
still will produce a coredump. Probably worth to just drop
out.

> > +	abort();
> > +}
> > +
> > +void
> > +errstat_exec_send_crash(void)
> 
> 19. Why is it called 'exec'?

Because it runs exec inside. I can drop this exec_ if you prefer.

> > +
> > +#if defined(__cplusplus)
> > +extern "C" {
> > +#endif /* defined(__cplusplus) */
> > +
> > +#define ERRSTAT_REPORT_VERSION 1
> 
> 20. It is not used out of .c file. I suggest you to
> move it there.

ok

> > +
> > +/**
> > + * Return a pointer to the info keeper.
> > + */
> > +extern struct errstat *
> 
> 21. Please, omit 'extern'. They don't make any difference
> here, and stand out of all the other libs we have.

ok

> > +
> > +/**
> > + * Collect a crash.
> > + */
> > +extern void
> > +errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
> > +
> > +/**
> > + * Send a crash report.
> > + */
> > +extern void
> > +errstat_exec_send_crash(void);
> 
> 22. Why do you have collect and send separated? Why can't it just be
> something like 'crashinfo_handle_crash(sig stuff...);'

To reuse regs and backtrace


More information about the Tarantool-patches mailing list