Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback
@ 2020-12-04 15:29 Cyrill Gorcunov
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:29 UTC (permalink / raw)
  To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy

Our feedback daemon sends only a few portions of usage
statistics. But crash dumps are pretty important for us
as well, because real users may catch a way more important
issues than our testing farm, it is simply impossible to
cover all possible scenarios.

For this sake, if crash happens we can send it to our
feedback server.

In this series we implement only base functionality and
may extend it later.

I didn't find yet a simple way to test this code anything
but manually.

Any comments are highly appreciated.

v2:
 - left for internal use
v3 (by Mons):
 - enable sending crash report by default but give
   an ability to disable it
 - use vfork when running subsequent exeutable of
   tarantool to send report, since plain execve won't
   produce local crashdump

issue https://github.com/tarantool/tarantool/issues/5261
branch gorcunov/gh-5261-crash-report-3

Cyrill Gorcunov (4):
  backtrace: allow to specify destination buffer
  errstat: add crash report base code
  crash: use errstat code in fatal signals
  cfg: configure crash report sending

 src/box/lua/feedback_daemon.lua |   7 +
 src/box/lua/load_cfg.lua        |   3 +
 src/lib/core/CMakeLists.txt     |   1 +
 src/lib/core/backtrace.cc       |  12 +-
 src/lib/core/backtrace.h        |   3 +
 src/lib/core/errstat.c          | 409 ++++++++++++++++++++++++++++++++
 src/lib/core/errstat.h          | 253 ++++++++++++++++++++
 src/main.cc                     |  83 ++++---
 test/box/admin.result           |   2 +
 test/box/cfg.result             |   4 +
 10 files changed, 735 insertions(+), 42 deletions(-)
 create mode 100644 src/lib/core/errstat.c
 create mode 100644 src/lib/core/errstat.h


base-commit: a14dad9543360c49c2cf8906a88e09271cad91b6
-- 
2.26.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer
  2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
@ 2020-12-04 15:30 ` Cyrill Gorcunov
  2020-12-05 18:30   ` Vladislav Shpilevoy
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code Cyrill Gorcunov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:30 UTC (permalink / raw)
  To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy

This will allow to reuse this routine in crash
reports.

Part-of #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/backtrace.cc | 12 ++++++------
 src/lib/core/backtrace.h  |  3 +++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
index 456ce9a4d..68d0d3ee6 100644
--- a/src/lib/core/backtrace.cc
+++ b/src/lib/core/backtrace.cc
@@ -131,7 +131,7 @@ get_proc_name(unw_cursor_t *unw_cur, unw_word_t *offset, bool skip_cache)
 }
 
 char *
-backtrace(void)
+backtrace(char *start, char *end)
 {
 	int frame_no = 0;
 	unw_word_t sp = 0, old_sp = 0, ip, offset;
@@ -139,10 +139,8 @@ backtrace(void)
 	unw_getcontext(&unw_context);
 	unw_cursor_t unw_cur;
 	unw_init_local(&unw_cur, &unw_context);
-	char *backtrace_buf = (char *)static_alloc(SMALL_STATIC_SIZE);
-	char *p = backtrace_buf;
-	char *end = p + SMALL_STATIC_SIZE - 1;
 	int unw_status;
+	char *p = start;
 	*p = '\0';
 	while ((unw_status = unw_step(&unw_cur)) > 0) {
 		const char *proc;
@@ -174,7 +172,7 @@ backtrace(void)
 		say_debug("unwinding error: %i", unw_status);
 #endif
 out:
-	return backtrace_buf;
+	return start;
 }
 
 /*
@@ -436,7 +434,9 @@ backtrace_foreach(backtrace_cb cb, coro_context *coro_ctx, void *cb_ctx)
 void
 print_backtrace(void)
 {
-	fdprintf(STDERR_FILENO, "%s", backtrace());
+	char *start = (char *)static_alloc(SMALL_STATIC_SIZE);
+	char *end = start + SMALL_STATIC_SIZE - 1;
+	fdprintf(STDERR_FILENO, "%s", backtrace(start, end));
 }
 #endif /* ENABLE_BACKTRACE */
 
diff --git a/src/lib/core/backtrace.h b/src/lib/core/backtrace.h
index c119d5402..55489c01b 100644
--- a/src/lib/core/backtrace.h
+++ b/src/lib/core/backtrace.h
@@ -40,6 +40,9 @@ extern "C" {
 #ifdef ENABLE_BACKTRACE
 #include <coro.h>
 
+char *
+backtrace(char *start, char *end);
+
 void print_backtrace(void);
 
 typedef int (backtrace_cb)(int frameno, void *frameret,
-- 
2.26.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code
  2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
@ 2020-12-04 15:30 ` Cyrill Gorcunov
  2020-12-05 18:33   ` Vladislav Shpilevoy
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:30 UTC (permalink / raw)
  To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy

The errstat stands for error statistics. At moment it supports
gathering crash dumps into an internal preallocated buffer
which includes:

 - `uname` output
 - build type
 - a reason for the crash
 - call backtrace (linux x86-64 only)

The data is collected into json format and then encoded into
base64 form. Moreover the backtrace itself is preencoded as
base64 earlier because we don't need the json values to consist
some weird characters.

Once data is collected one can run errstat_exec_send_crash function
which executes another copy of tarantool and passes it a script
in form of

 > tarantool -e "require('http.client').post('127.0.0.1:1500', \
 > '{"crashdump":{"version":"1","data":"eyJ1bmFtZSI6eyJzeXNuYW1l.."}}', \
 > {timeout=1}); os.exit(1);"

The address of the network dump collector is configurable via traditional
box.cfg{feedback_host=addr}.

Note that we're trying to use preallocated memory for data collecting
since this code is supposed to be called from inside of a fatal signal
handler. For simplicity we use snprintf, strlen, memcpy inside encoder
but should consider to implement own helpers instead to minimize the
use of system libraries. For exactly the same reason we use inplace
base64 encode helper.

When data is ready we send it with help of vfork+execve, this is done
to make sure that local coredump is still generated even after the
report is sent.

A typical encoded data looks like

 | {
 |   "uname": {
 |     "sysname": "Linux",
 |     "release": "5.9.10-100.fc32.x86_64",
 |     "version": "#1 SMP Mon Nov 23 18:12:36 UTC 2020",
 |     "machine": "x86_64"
 |   },
 |   "build": {
 |     "major": 2,
 |     "minor": 7,
 |     "patch": 0,
 |     "version": "2.7.0-78-gf898822f9",
 |     "cmake_type": "Linux-x86_64-Debug"
 |   },
 |   "signal": {
 |     "signo": 11,
 |     "si_code": 0,
 |     "si_addr": "0x3e800009727",
 |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
 |     "timestamp": "0x164ceceac0610ae9",
 |     "timestamp_str": "2020-12-02 17:34:20 MSK"
 |   }
 | }

Part-of #5261

Worth to mention that the patch uses box.cfg's

 - feedback_host
 - feedback_enabled
 - feedback_no_crashinfo

variables internally and feedback_no_crashinfo will be introduced
in next paches, thus after this patch the crash report is disabled.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/CMakeLists.txt |   1 +
 src/lib/core/errstat.c      | 409 ++++++++++++++++++++++++++++++++++++
 src/lib/core/errstat.h      | 253 ++++++++++++++++++++++
 3 files changed, 663 insertions(+)
 create mode 100644 src/lib/core/errstat.c
 create mode 100644 src/lib/core/errstat.h

diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index 13ed1e7ab..621a4f019 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -1,5 +1,6 @@
 set(core_sources
     diag.c
+    errstat.c
     say.c
     memory.c
     clock.c
diff --git a/src/lib/core/errstat.c b/src/lib/core/errstat.c
new file mode 100644
index 000000000..992bc426a
--- /dev/null
+++ b/src/lib/core/errstat.c
@@ -0,0 +1,409 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#include <string.h>
+#include <unistd.h>
+#include <time.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+
+#include "trivia/util.h"
+#include "backtrace.h"
+#include "errstat.h"
+#include "cfg.h"
+#include "say.h"
+
+#define pr_fmt(fmt)		"errstat: " fmt
+#define pr_info(fmt, ...)	say_info(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)	say_error(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)	fprintf(stderr, pr_fmt(fmt) "\n", ##__VA_ARGS__)
+#define pr_panic(fmt, ...)	panic(pr_fmt(fmt), ##__VA_ARGS__)
+
+static struct errstat glob_errstat;
+static bool cfg_send_crash = false;
+
+/*
+ * We don't need it to be optimized but rather a compact form.
+ */
+static unsigned char *
+base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
+{
+	static int m[] = {0, 2, 1};
+	static unsigned char t[] = {
+		'A','B','C','D','E','F','G','H',
+		'I','J','K','L','M','N','O','P',
+		'Q','R','S','T','U','V','W','X',
+		'Y','Z','a','b','c','d','e','f',
+		'g','h','i','j','k','l','m','n',
+		'o','p','q','r','s','t','u','v',
+		'w','x','y','z','0','1','2','3',
+		'4','5','6','7','8','9','+','/'
+	};
+	size_t i, j;
+
+	for (i = 0, j = 0; i < src_len;) {
+		uint32_t a = i < src_len ? src[i++] : 0;
+		uint32_t b = i < src_len ? src[i++] : 0;
+		uint32_t c = i < src_len ? src[i++] : 0;
+
+		uint32_t d = (a << 0x10) + (b << 0x08) + c;
+
+		dst[j++] = t[(d >> 3 * 6) & 0x3f];
+		dst[j++] = t[(d >> 2 * 6) & 0x3f];
+		dst[j++] = t[(d >> 1 * 6) & 0x3f];
+		dst[j++] = t[(d >> 0 * 6) & 0x3f];
+	}
+
+	size_t dst_len = ERRSTAT_BASE64_LEN(src_len);
+	j = m[src_len % 3];
+	for (i = 0; i < j; i++)
+		dst[dst_len - 1 - i] = '=';
+
+	return dst;
+}
+
+static size_t
+strlcpy(char *dst, const char *src, size_t size)
+{
+	size_t ret = strlen(src);
+	if (size) {
+		size_t len = (ret >= size) ? size - 1 : ret;
+		memcpy(dst, src, len);
+		dst[len] = '\0';
+	}
+	return ret;
+}
+
+#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
+
+struct errstat *
+errstat_get(void)
+{
+	return &glob_errstat;
+}
+
+static inline
+uint64_t timespec_to_ns(struct timespec *ts)
+{
+	return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
+}
+
+static char *
+ns_to_localtime(uint64_t timestamp, char *buf, ssize_t len)
+{
+	time_t sec = timestamp / 1000000000;
+	char *start = buf;
+	struct tm tm;
+
+	/*
+	 * Use similar format as say_x logger. Except plain
+	 * seconds should be enough.
+	 */
+	localtime_r(&sec, &tm);
+	ssize_t total = strftime(start, len, "%F %T %Z", &tm);
+	start += total;
+	if (total < len)
+		return buf;
+	buf[len-1] = '\0';
+	return buf;
+}
+
+void
+errstat_init(const char *tarantool_bin)
+{
+	struct errstat_build *binfo = &errstat_get()->build_info;
+	struct errstat_uname *uinfo = &errstat_get()->uname_info;
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+	binfo->major = PACKAGE_VERSION_MAJOR;
+	binfo->minor = PACKAGE_VERSION_MINOR;
+	binfo->patch = PACKAGE_VERSION_PATCH;
+
+	strlcpy_a(binfo->version, PACKAGE_VERSION);
+	strlcpy_a(binfo->cmake_type, BUILD_INFO);
+
+	static_assert(ERRSTAT_UNAME_BUF_LEN > sizeof(struct errstat_uname),
+		      "uname_buf is too small");
+
+	char uname_buf[ERRSTAT_UNAME_BUF_LEN];
+	struct utsname *uname_ptr = (void *)uname_buf;
+	if (uname(uname_ptr) == 0) {
+		strlcpy_a(uinfo->sysname, uname_ptr->sysname);
+		strlcpy_a(uinfo->nodename, uname_ptr->nodename);
+		strlcpy_a(uinfo->release, uname_ptr->release);
+		strlcpy_a(uinfo->version, uname_ptr->version);
+		strlcpy_a(uinfo->machine, uname_ptr->machine);
+	} else
+		pr_err("can't fetch uname");
+
+	strlcpy_a(cinfo->tarantool_bin, tarantool_bin);
+	if (strlen(cinfo->tarantool_bin) < strlen(tarantool_bin))
+		pr_panic("can't save binary path");
+
+	static_assert(sizeof(cinfo->exec_argv_1) == 4,
+		      "exec_argv_1 is too small");
+	strlcpy_a(cinfo->exec_argv_1, "-e");
+}
+
+void
+cfg_errstat_set_params(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	const char *host = cfg_gets("feedback_host");
+	int enabled = cfg_getb("feedback_enabled");
+	int no_crashinfo = cfg_getb("feedback_no_crashinfo");
+
+	if (host == NULL || enabled == 0 || no_crashinfo == 1) {
+		if (cfg_send_crash) {
+			pr_info("disable sedning crashinfo feedback");
+			cfg_send_crash = false;
+			cinfo->feedback_host[0] = '\0';
+		}
+		return;
+	}
+
+	if (strcmp(cinfo->feedback_host, host) != 0) {
+		strlcpy_a(cinfo->feedback_host, host);
+		if (strlen(cinfo->feedback_host) < strlen(host))
+			pr_panic("feedback_host is too long");
+	}
+
+	if (!cfg_send_crash) {
+		pr_info("enable sedning crashinfo feedback");
+		cfg_send_crash = true;
+	}
+}
+
+void
+errstat_reset(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+#ifdef ENABLE_BACKTRACE
+	cinfo->backtrace_buf[0] = '\0';
+#endif
+	memset(&cinfo->siginfo, 0, sizeof(cinfo->siginfo));
+	cinfo->timestamp_rt = 0;
+}
+
+#ifdef TARGET_OS_LINUX
+static void
+collect_gregs(struct errstat_crash *cinfo, ucontext_t *uc)
+{
+	static_assert(sizeof(cinfo->greg) == sizeof(uc->uc_mcontext),
+		      "GP regs are not matching signal frame");
+
+	/*
+	 * uc_mcontext on libc level looks somehow strange,
+	 * they define an array of uint64_t where each register
+	 * defined by REG_x macro.
+	 *
+	 * In turn the kernel is quite explicit about the context.
+	 * Moreover it is a part of user ABI, thus won't be changed.
+	 *
+	 * Lets use memcpy here to make a copy in a fast way.
+	 */
+	memcpy(&cinfo->greg, &uc->uc_mcontext, sizeof(cinfo->greg));
+}
+#endif
+
+/**
+ * The routine is called inside crash signal handler so
+ * be carefull to not cause additional signals inside.
+ */
+void
+errstat_collect_crash(int signo, siginfo_t *siginfo, void *ucontext)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+	struct timespec ts;
+	if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
+		cinfo->timestamp_rt = timespec_to_ns(&ts);
+		ns_to_localtime(cinfo->timestamp_rt,
+				cinfo->timestamp_rt_str,
+				sizeof(cinfo->timestamp_rt_str));
+	} else {
+		cinfo->timestamp_rt = 0;
+		memset(cinfo->timestamp_rt_str, 0,
+		       sizeof(cinfo->timestamp_rt_str));
+	}
+
+	cinfo->signo = signo;
+	cinfo->siginfo = *siginfo;
+
+	cinfo->context_addr = ucontext;
+	cinfo->siginfo_addr = siginfo;
+
+#ifdef ENABLE_BACKTRACE
+	char *start = cinfo->backtrace_buf;
+	char *end = start + sizeof(cinfo->backtrace_buf) - 1;
+	backtrace(start, end);
+#endif
+
+#ifdef TARGET_OS_LINUX
+	collect_gregs(cinfo, ucontext);
+#endif
+}
+
+/**
+ * Prepare report in json format and put it into a buffer.
+ */
+static void
+prepare_report_script(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	struct errstat_build *binfo = &errstat_get()->build_info;
+	struct errstat_uname *uinfo = &errstat_get()->uname_info;
+
+	char *p, *e;
+
+#ifdef ENABLE_BACKTRACE
+	/*
+	 * We can't use arbitrary data which can be misinterpreted
+	 * by Lua script when we pass it to a script.
+	 *
+	 * WARNING: We use report_encoded as a temp buffer.
+	 */
+	size_t bt_len = strlen(cinfo->backtrace_buf);
+	size_t bt_elen = ERRSTAT_BASE64_LEN(bt_len);
+	if (bt_elen >= sizeof(cinfo->report_encoded))
+		pr_panic("backtrace space is too small");
+
+	base64_encode((unsigned char *)cinfo->report_encoded,
+		      (unsigned char *)cinfo->backtrace_buf, bt_len);
+	cinfo->report_encoded[bt_elen] = '\0';
+	memcpy(cinfo->backtrace_buf, cinfo->report_encoded, bt_elen + 1);
+#endif
+
+#define snprintf_safe(fmt, ...)					\
+	do {							\
+		p += snprintf(p, e - p, fmt, ##__VA_ARGS__);	\
+		if (p >= e)					\
+			goto out;				\
+	} while (0)
+
+	e = cinfo->report + sizeof(cinfo->report) - 1;
+	p = cinfo->report;
+
+	snprintf_safe("{");
+	snprintf_safe("\"uname\":{");
+	snprintf_safe("\"sysname\":\"%s\",", uinfo->sysname);
+#if 0
+	/*
+	 * nodename might a sensitive information so don't
+	 * send it by default.
+	 */
+	snprintf_safe("\"nodename\":\"%s\",", uinfo->nodename);
+#endif
+	snprintf_safe("\"release\":\"%s\",", uinfo->release);
+	snprintf_safe("\"version\":\"%s\",", uinfo->version);
+	snprintf_safe("\"machine\":\"%s\"", uinfo->machine);
+	snprintf_safe("},");
+
+	snprintf_safe("\"build\":{");
+	snprintf_safe("\"major\":%d,", binfo->major);
+	snprintf_safe("\"minor\":%d,", binfo->minor);
+	snprintf_safe("\"patch\":%d,", binfo->patch);
+	snprintf_safe("\"version\":\"%s\",", binfo->version);
+	snprintf_safe("\"cmake_type\":\"%s\"", binfo->cmake_type);
+	snprintf_safe("},");
+
+	snprintf_safe("\"signal\":{");
+	snprintf_safe("\"signo\":%d,", cinfo->signo);
+	snprintf_safe("\"si_code\":%d,", cinfo->siginfo.si_code);
+	if (cinfo->signo == SIGSEGV) {
+		if (cinfo->siginfo.si_code == SEGV_MAPERR) {
+			snprintf_safe("\"si_code_str\":\"%s\",",
+				      "SEGV_MAPERR");
+		} else if (cinfo->siginfo.si_code == SEGV_ACCERR) {
+			snprintf_safe("\"si_code_str\":\"%s\",",
+				      "SEGV_ACCERR");
+		}
+		snprintf_safe("\"si_addr\":\"0x%llx\",",
+			      (long long)cinfo->siginfo.si_addr);
+	}
+#ifdef ENABLE_BACKTRACE
+	snprintf_safe("\"backtrace\":\"%s\",", cinfo->backtrace_buf);
+#endif
+	snprintf_safe("\"timestamp\":\"0x%llx\",",
+		      (long long)cinfo->timestamp_rt);
+	snprintf_safe("\"timestamp_str\":\"%s\"",
+		      cinfo->timestamp_rt_str);
+	snprintf_safe("}");
+	snprintf_safe("}\'");
+
+	size_t report_len = strlen(cinfo->report);
+	size_t report_elen = ERRSTAT_BASE64_LEN(report_len);
+	if (report_elen >= sizeof(cinfo->report_encoded))
+		pr_panic("report encoded space is too small");
+
+	base64_encode((unsigned char *)cinfo->report_encoded,
+		      (unsigned char *)cinfo->report,
+		      report_len);
+	cinfo->report_encoded[report_elen] = '\0';
+
+	e = cinfo->report_script + sizeof(cinfo->report_script) - 1;
+	p = cinfo->report_script;
+
+	snprintf_safe("require(\'http.client\').post(\'%s\',"
+		      "'{\"crashdump\":{\"version\":\"%d\","
+		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
+		      cinfo->feedback_host,
+		      ERRSTAT_REPORT_VERSION,
+		      cinfo->report_encoded);
+
+#undef snprintf_safe
+	return;
+
+out:
+	pr_crit("unable to prepare a crash report");
+	struct sigaction sa = {
+		.sa_handler = SIG_DFL,
+	};
+	sigemptyset(&sa.sa_mask);
+	sigaction(SIGABRT, &sa, NULL);
+
+	abort();
+}
+
+void
+errstat_exec_send_crash(void)
+{
+	if (!cfg_send_crash)
+		return;
+
+	prepare_report_script();
+
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	cinfo->exec_argv[0] = cinfo->tarantool_bin;
+	cinfo->exec_argv[1] = cinfo->exec_argv_1;
+	cinfo->exec_argv[2] = cinfo->report_script;
+	cinfo->exec_argv[3] = NULL;
+
+	/*
+	 * Can't use fork here because libev has own
+	 * at_fork helpers with mutex where we might
+	 * stuck (see popen code).
+	 */
+	pid_t pid = vfork();
+	if (pid < 0) {
+		pr_crit("unable to fork (errno %d)", errno);
+	} else if (pid == 0) {
+		/*
+		 * The script must exit at the end but there
+		 * is no simple way to make sure from inside
+		 * of a signal crash handler. So just hope it
+		 * is running fine.
+		 */
+		execve(cinfo->tarantool_bin, cinfo->exec_argv, NULL);
+		pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
+			cinfo->tarantool_bin,
+			cinfo->exec_argv[0],
+			cinfo->exec_argv[1],
+			cinfo->exec_argv[2]);
+		_exit(1);
+	}
+}
diff --git a/src/lib/core/errstat.h b/src/lib/core/errstat.h
new file mode 100644
index 000000000..77db53052
--- /dev/null
+++ b/src/lib/core/errstat.h
@@ -0,0 +1,253 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+#pragma once
+
+#include <stdint.h>
+#include <signal.h>
+#include <limits.h>
+
+#include "trivia/config.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+#define ERRSTAT_REPORT_VERSION 1
+
+/**
+ * Build type information for statistics.
+ */
+struct errstat_build {
+	/**
+	 * Package major version - 1 for 1.6.7.
+	 */
+	int major;
+	/**
+	 * Package minor version - 6 for 1.6.7
+	 */
+	int minor;
+	/**
+	 * Package patch version - 7 for 1.6.7
+	 */
+	int patch;
+	/**
+	 * A string with major-minor-patch-commit-id identifier of the
+	 * release, e.g. 2.7.0-62-g0b7726571.
+	 */
+	char version[64];
+	/**
+	 * Build type (Debug and etc).
+	 */
+	char cmake_type[64];
+};
+
+#ifdef TARGET_OS_LINUX
+#ifndef __x86_64__
+# error "Non x86-64 architectures are not supported"
+#endif
+struct errstat_greg {
+	uint64_t	r8;
+	uint64_t	r9;
+	uint64_t	r10;
+	uint64_t	r11;
+	uint64_t	r12;
+	uint64_t	r13;
+	uint64_t	r14;
+	uint64_t	r15;
+	uint64_t	di;
+	uint64_t	si;
+	uint64_t	bp;
+	uint64_t	bx;
+	uint64_t	dx;
+	uint64_t	ax;
+	uint64_t	cx;
+	uint64_t	sp;
+	uint64_t	ip;
+	uint64_t	flags;
+	uint16_t	cs;
+	uint16_t	gs;
+	uint16_t	fs;
+	uint16_t	ss;
+	uint64_t	err;
+	uint64_t	trapno;
+	uint64_t	oldmask;
+	uint64_t	cr2;
+	uint64_t	fpstate;
+	uint64_t	reserved1[8];
+};
+#endif /* TARGET_OS_LINUX */
+
+#define ERRSTAT_BASE64_LEN(len)			(4 * (((len) + 2) / 3))
+
+/*
+ * 4K of memory should be enough to keep the backtrace.
+ * In worst case it gonna be simply trimmed. Since we're
+ * reporting it encoded the pain text shrinks to 3070 bytes.
+ */
+#define ERRSTAT_BACKTRACE_MAX			(4096)
+
+/*
+ * The report should include the bactrace
+ * and all additional information we're
+ * going to send.
+ */
+#define ERRSTAT_REPORT_PAYLOAD_MAX		(2048)
+#ifdef ENABLE_BACKTRACE
+# define ERRSTAT_REPORT_MAX			\
+	(ERRSTAT_BACKTRACE_MAX +		\
+	 ERRSTAT_REPORT_PAYLOAD_MAX)
+# else
+# define ERRSTAT_REPORT_MAX			\
+	(ERRSTAT_REPORT_PAYLOAD_MAX)
+#endif
+
+/*
+ * We encode report into base64 because
+ * it is passed inside Lua script.
+ */
+#define ERRSTAT_REPORT_ENCODED_MAX		\
+	ERRSTAT_BASE64_LEN(ERRSTAT_REPORT_MAX)
+
+
+/*
+ * The script to execute should contain encoded
+ * report.
+ */
+#define ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX	(512)
+#define ERRSTAT_REPORT_SCRIPT_MAX		\
+	(ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX +	\
+	 ERRSTAT_REPORT_ENCODED_MAX)
+
+struct errstat_crash {
+	/**
+	 * Exec arguments pointers.
+	 */
+	char *exec_argv[4];
+	/**
+	 * Predefined argument "-e".
+	 */
+	char exec_argv_1[4];
+	/**
+	 * Crash report in plain json format.
+	 */
+	char report[ERRSTAT_REPORT_MAX];
+	/**
+	 * Crash report in base64 form.
+	 */
+	char report_encoded[ERRSTAT_REPORT_ENCODED_MAX];
+	/**
+	 * Tarantool executable to send report stript.
+	 */
+	char tarantool_bin[PATH_MAX];
+	/**
+	 * The script to evaluate by tarantool
+	 * to send the report.
+	 */
+	char report_script[ERRSTAT_REPORT_SCRIPT_MAX];
+#ifdef ENABLE_BACKTRACE
+	/**
+	 * Backtrace buffer.
+	 */
+	char backtrace_buf[ERRSTAT_BACKTRACE_MAX];
+#endif
+	/**
+	 * Crash signal.
+	 */
+	int signo;
+	/**
+	 * Signal information.
+	 */
+	siginfo_t siginfo;
+	/**
+	 * These two are mostly useless as being
+	 * plain addresses but keep for backward
+	 * compatibility.
+	 */
+	void *context_addr;
+	void *siginfo_addr;
+#ifdef TARGET_OS_LINUX
+	/**
+	 * Registers contents.
+	 */
+	struct errstat_greg greg;
+#endif
+	/**
+	 * Timestamp in nanoseconds (realtime).
+	 */
+	uint64_t timestamp_rt;
+	/**
+	 * Timestamp string representation to
+	 * use on demand.
+	 */
+	char timestamp_rt_str[32];
+	/**
+	 * Crash collector host.
+	 */
+	char feedback_host[1024];
+};
+
+#define ERRSTAT_UNAME_BUF_LEN	1024
+#define ERRSTAT_UNAME_FIELD_LEN	128
+/**
+ * Information about node.
+ *
+ * On linux there is new_utsname structure which
+ * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
+ * So lets just reserve more data in advance.
+ */
+struct errstat_uname {
+	char sysname[ERRSTAT_UNAME_FIELD_LEN];
+	char nodename[ERRSTAT_UNAME_FIELD_LEN];
+	char release[ERRSTAT_UNAME_FIELD_LEN];
+	char version[ERRSTAT_UNAME_FIELD_LEN];
+	char machine[ERRSTAT_UNAME_FIELD_LEN];
+};
+
+struct errstat {
+	struct errstat_build build_info;
+	struct errstat_uname uname_info;
+	struct errstat_crash crash_info;
+};
+
+/**
+ * Return a pointer to the info keeper.
+ */
+extern struct errstat *
+errstat_get(void);
+
+/**
+ * Initialize error statistics.
+ */
+extern void
+errstat_init(const char *tarantool_bin);
+
+/**
+ * Configure errstat from feedback daemon.
+ */
+extern void
+cfg_errstat_set_params(void);
+
+/**
+ * Reset everything except build information.
+ */
+extern void
+errstat_reset(void);
+
+/**
+ * Collect a crash.
+ */
+extern void
+errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
+
+/**
+ * Send a crash report.
+ */
+extern void
+errstat_exec_send_crash(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__cplusplus) */
-- 
2.26.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals
  2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code Cyrill Gorcunov
@ 2020-12-04 15:30 ` Cyrill Gorcunov
  2020-12-05 18:33   ` Vladislav Shpilevoy
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending Cyrill Gorcunov
  2020-12-05 18:30 ` [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Vladislav Shpilevoy
  4 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:30 UTC (permalink / raw)
  To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy

In errstat code we fetch the signal statistic and
generate a backtrace for report. We don't send this
data right now but can reuse this code to not decode
registers and generate backtrace twice.

Part-of #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/main.cc | 83 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/src/main.cc b/src/main.cc
index 2f48f474c..260b9a0ff 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -79,6 +79,7 @@
 #include "systemd.h"
 #include "crypto/crypto.h"
 #include "core/popen.h"
+#include "core/errstat.h"
 
 static pid_t master_pid = getpid();
 static struct pidfh *pid_file_handle;
@@ -184,45 +185,43 @@ signal_sigwinch_cb(ev_loop *loop, struct ev_signal *w, int revents)
 		rl_resize_terminal();
 }
 
-#if defined(__linux__) && defined(__amd64)
-
-inline void
-dump_x86_64_register(const char *reg_name, unsigned long long val)
+#ifdef TARGET_OS_LINUX
+static inline void
+dump_register(const char *reg_name, unsigned long long val)
 {
 	fprintf(stderr, "  %-9s0x%-17llx%lld\n", reg_name, val, val);
 }
 
-void
-dump_x86_64_registers(ucontext_t *uc)
+static void
+dump_registers(struct errstat_crash *cinfo)
 {
-	dump_x86_64_register("rax", uc->uc_mcontext.gregs[REG_RAX]);
-	dump_x86_64_register("rbx", uc->uc_mcontext.gregs[REG_RBX]);
-	dump_x86_64_register("rcx", uc->uc_mcontext.gregs[REG_RCX]);
-	dump_x86_64_register("rdx", uc->uc_mcontext.gregs[REG_RDX]);
-	dump_x86_64_register("rsi", uc->uc_mcontext.gregs[REG_RSI]);
-	dump_x86_64_register("rdi", uc->uc_mcontext.gregs[REG_RDI]);
-	dump_x86_64_register("rsp", uc->uc_mcontext.gregs[REG_RSP]);
-	dump_x86_64_register("rbp", uc->uc_mcontext.gregs[REG_RBP]);
-	dump_x86_64_register("r8", uc->uc_mcontext.gregs[REG_R8]);
-	dump_x86_64_register("r9", uc->uc_mcontext.gregs[REG_R9]);
-	dump_x86_64_register("r10", uc->uc_mcontext.gregs[REG_R10]);
-	dump_x86_64_register("r11", uc->uc_mcontext.gregs[REG_R11]);
-	dump_x86_64_register("r12", uc->uc_mcontext.gregs[REG_R12]);
-	dump_x86_64_register("r13", uc->uc_mcontext.gregs[REG_R13]);
-	dump_x86_64_register("r14", uc->uc_mcontext.gregs[REG_R14]);
-	dump_x86_64_register("r15", uc->uc_mcontext.gregs[REG_R15]);
-	dump_x86_64_register("rip", uc->uc_mcontext.gregs[REG_RIP]);
-	dump_x86_64_register("eflags", uc->uc_mcontext.gregs[REG_EFL]);
-	dump_x86_64_register("cs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 0) & 0xffff);
-	dump_x86_64_register("gs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 16) & 0xffff);
-	dump_x86_64_register("fs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 32) & 0xffff);
-	dump_x86_64_register("cr2", uc->uc_mcontext.gregs[REG_CR2]);
-	dump_x86_64_register("err", uc->uc_mcontext.gregs[REG_ERR]);
-	dump_x86_64_register("oldmask", uc->uc_mcontext.gregs[REG_OLDMASK]);
-	dump_x86_64_register("trapno", uc->uc_mcontext.gregs[REG_TRAPNO]);
+	dump_register("rax", cinfo->greg.ax);
+	dump_register("rbx", cinfo->greg.bx);
+	dump_register("rcx", cinfo->greg.cx);
+	dump_register("rdx", cinfo->greg.dx);
+	dump_register("rsi", cinfo->greg.si);
+	dump_register("rdi", cinfo->greg.di);
+	dump_register("rsp", cinfo->greg.sp);
+	dump_register("rbp", cinfo->greg.bp);
+	dump_register("r8", cinfo->greg.r8);
+	dump_register("r9", cinfo->greg.r9);
+	dump_register("r10", cinfo->greg.r10);
+	dump_register("r11", cinfo->greg.r11);
+	dump_register("r12", cinfo->greg.r12);
+	dump_register("r13", cinfo->greg.r13);
+	dump_register("r14", cinfo->greg.r14);
+	dump_register("r15", cinfo->greg.r15);
+	dump_register("rip", cinfo->greg.ip);
+	dump_register("eflags", cinfo->greg.flags);
+	dump_register("cs", cinfo->greg.cs);
+	dump_register("gs", cinfo->greg.gs);
+	dump_register("fs", cinfo->greg.fs);
+	dump_register("cr2", cinfo->greg.cr2);
+	dump_register("err", cinfo->greg.err);
+	dump_register("oldmask", cinfo->greg.oldmask);
+	dump_register("trapno", cinfo->greg.trapno);
 }
-
-#endif /* defined(__linux__) && defined(__amd64) */
+#endif /* TARGET_OS_LINUX */
 
 /** Try to log as much as possible before dumping a core.
  *
@@ -242,6 +241,7 @@ dump_x86_64_registers(ucontext_t *uc)
 static void
 sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 {
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
 	static volatile sig_atomic_t in_cb = 0;
 	int fd = STDERR_FILENO;
 	struct sigaction sa;
@@ -253,6 +253,10 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 	}
 
 	in_cb = 1;
+	/*
+	 * Notify errstat engine about the crash.
+	 */
+	errstat_collect_crash(signo, siginfo, context);
 
 	if (signo == SIGSEGV) {
 		fdprintf(fd, "Segmentation fault\n");
@@ -279,8 +283,8 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 	fprintf(stderr, "  context: %p\n", context);
 	fprintf(stderr, "  siginfo: %p\n", siginfo);
 
-#if defined(__linux__) && defined(__amd64)
-	dump_x86_64_registers((ucontext_t *)context);
+#ifdef TARGET_OS_LINUX
+	dump_registers(cinfo);
 #endif
 
 	fdprintf(fd, "Current time: %u\n", (unsigned) time(0));
@@ -290,8 +294,14 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 #ifdef ENABLE_BACKTRACE
 	fdprintf(fd, "Attempting backtrace... Note: since the server has "
 		 "already crashed, \nthis may fail as well\n");
-	print_backtrace();
+	fdprintf(STDERR_FILENO, "%s", cinfo->backtrace_buf);
 #endif
+	/*
+	 * If sending crash report to the feedback server is
+	 * allowed we won't be generating local core dump but
+	 * rather try to send data and exit.
+	 */
+	errstat_exec_send_crash();
 end:
 	/* Try to dump core. */
 	memset(&sa, 0, sizeof(sa));
@@ -815,6 +825,7 @@ main(int argc, char **argv)
 		title_set_script_name(argv[0]);
 	}
 
+	errstat_init(tarantool_bin);
 	export_syms();
 
 	random_init();
-- 
2.26.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending
  2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
@ 2020-12-04 15:30 ` Cyrill Gorcunov
  2020-12-05 18:34   ` Vladislav Shpilevoy
  2020-12-05 18:30 ` [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Vladislav Shpilevoy
  4 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:30 UTC (permalink / raw)
  To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy

Introcude a new option to box.cfg{} to control sending
of a crash report.

There is no simple way to test all this because it involves
subsequent runs of tarantool. I've been using one terminal
with tarantool running inside as

> box.cfg{feedback_host="127.0.0.1:1500"}

another terminal to listen for incoming data

> while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done

and one another to kill main tarantool instance

> kill -11 `pidof tarantool`

If we run as

> box.cfg{feedback_host="127.0.0.1:1500", feedback_no_crashinfo=true}

then the crash report will not be sent.

Closes #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: Configuration reference
Header: Feedback

For better analisys of program crashes the information associated with
the crash such as

 - utsname (similar to `uname -a` output except the network name)
 - build information
 - reason for a crash
 - call backtrace

is sent to the feedback server. To disable it set `feedback_no_crashinfo`
to `false`.
---
 src/box/lua/feedback_daemon.lua | 7 +++++++
 src/box/lua/load_cfg.lua        | 3 +++
 src/lib/core/errstat.c          | 2 +-
 test/box/admin.result           | 2 ++
 test/box/cfg.result             | 4 ++++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 1d39012ed..613d2864c 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -1,6 +1,7 @@
 -- feedback_daemon.lua (internal file)
 --
 local log   = require('log')
+local ffi   = require('ffi')
 local json  = require('json')
 local fiber = require('fiber')
 local http  = require('http.client')
@@ -360,12 +361,18 @@ local function reload(self)
     self:start()
 end
 
+ffi.cdef[[
+void
+cfg_errstat_set_params(void);
+]]
+
 setmetatable(daemon, {
     __index = {
         set_feedback_params = function()
             daemon.enabled  = box.cfg.feedback_enabled
             daemon.host     = box.cfg.feedback_host
             daemon.interval = box.cfg.feedback_interval
+            ffi.C.cfg_errstat_set_params()
             reload(daemon)
             return
         end,
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 770442052..1878c31bd 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -99,6 +99,7 @@ local default_cfg = {
     replication_skip_conflict = false,
     replication_anon      = false,
     feedback_enabled      = true,
+    feedback_no_crashinfo = false,
     feedback_host         = "https://feedback.tarantool.io",
     feedback_interval     = 3600,
     net_msg_max           = 768,
@@ -179,6 +180,7 @@ local template_cfg = {
     replication_skip_conflict = 'boolean',
     replication_anon      = 'boolean',
     feedback_enabled      = ifdef_feedback('boolean'),
+    feedback_no_crashinfo = ifdef_feedback('boolean'),
     feedback_host         = ifdef_feedback('string'),
     feedback_interval     = ifdef_feedback('number'),
     net_msg_max           = 'number',
@@ -277,6 +279,7 @@ local dynamic_cfg = {
     checkpoint_wal_threshold = private.cfg_set_checkpoint_wal_threshold,
     worker_pool_threads     = private.cfg_set_worker_pool_threads,
     feedback_enabled        = ifdef_feedback_set_params,
+    feedback_no_crashinfo   = ifdef_feedback_set_params,
     feedback_host           = ifdef_feedback_set_params,
     feedback_interval       = ifdef_feedback_set_params,
     -- do nothing, affects new replicas, which query this value on start
diff --git a/src/lib/core/errstat.c b/src/lib/core/errstat.c
index 992bc426a..8664908f6 100644
--- a/src/lib/core/errstat.c
+++ b/src/lib/core/errstat.c
@@ -23,7 +23,7 @@
 #define pr_panic(fmt, ...)	panic(pr_fmt(fmt), ##__VA_ARGS__)
 
 static struct errstat glob_errstat;
-static bool cfg_send_crash = false;
+static bool cfg_send_crash = true;
 
 /*
  * We don't need it to be optimized but rather a compact form.
diff --git a/test/box/admin.result b/test/box/admin.result
index 8c5626c36..b8fa68749 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -47,6 +47,8 @@ cfg_filter(box.cfg)
     - https://feedback.tarantool.io
   - - feedback_interval
     - 3600
+  - - feedback_no_crashinfo
+    - false
   - - force_recovery
     - false
   - - hot_standby
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 5ca6ce72b..68c45b4cc 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -35,6 +35,8 @@ cfg_filter(box.cfg)
  |     - https://feedback.tarantool.io
  |   - - feedback_interval
  |     - 3600
+ |   - - feedback_no_crashinfo
+ |     - false
  |   - - force_recovery
  |     - false
  |   - - hot_standby
@@ -148,6 +150,8 @@ cfg_filter(box.cfg)
  |     - https://feedback.tarantool.io
  |   - - feedback_interval
  |     - 3600
+ |   - - feedback_no_crashinfo
+ |     - false
  |   - - force_recovery
  |     - false
  |   - - hot_standby
-- 
2.26.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback
  2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending Cyrill Gorcunov
@ 2020-12-05 18:30 ` Vladislav Shpilevoy
  4 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-05 18:30 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Mons Anderson

Hi! Thanks for the patch!

I can't build it. Also almost the entire CI is red:
https://gitlab.com/tarantool/tarantool/-/pipelines/225435069

So I am going to look at the patch, but I can't validate if it works.

The first question is why is the module called 'errstat'? It has nothing
to with errors nor with statistics. It is just a dump collector. I
suggest you to rename it to crashinfo or just crash or something like that.
Even the new config option uses 'crashinfo' name.

My compilation errors are below. For strlcpy I can tell, that it already
exists on my system. Probably this is why it fails to compile your version.

/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:69:1: error: expected parameter declarator
strlcpy(char *dst, const char *src, size_t size)
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:112:47: note: expanded from macro 'strlcpy'
                __builtin___strlcpy_chk (dest, __VA_ARGS__, __darwin_obsz (dest))
                                                            ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                             ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
#    define _USE_FORTIFY_LEVEL 2
                               ^
/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:69:1: error: expected ')'
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:112:47: note: expanded from macro 'strlcpy'
                __builtin___strlcpy_chk (dest, __VA_ARGS__, __darwin_obsz (dest))
                                                            ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:39:62: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                             ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:30:32: note: expanded from macro '_USE_FORTIFY_LEVEL'
#    define _USE_FORTIFY_LEVEL 2
                               ^
/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:69:1: note: to match this '('
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:112:47: note: expanded from macro 'strlcpy'
                __builtin___strlcpy_chk (dest, __VA_ARGS__, __darwin_obsz (dest))
                                                            ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:39:53: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                                                    ^
/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:69:1: error: type specifier missing, defaults to 'int' [-Werror,-Wimplicit-int]
strlcpy(char *dst, const char *src, size_t size)
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:112:47: note: expanded from macro 'strlcpy'
                __builtin___strlcpy_chk (dest, __VA_ARGS__, __darwin_obsz (dest))
                                                            ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:39:31: note: expanded from macro '__darwin_obsz'
#define __darwin_obsz(object) __builtin_object_size (object, _USE_FORTIFY_LEVEL > 1 ? 1 : 0)
                              ^
/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:69:1: error: definition of builtin function '__builtin___strlcpy_chk'
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:112:3: note: expanded from macro 'strlcpy'
                __builtin___strlcpy_chk (dest, __VA_ARGS__, __darwin_obsz (dest))
                ^
/Users/gerold/Work/Repositories/tarantool/src/lib/core/errstat.c:74:3: error: too many arguments to function call, expected 1, have 2
                memcpy(dst, src, len);
                ^~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_string.h:63:46: note: expanded from macro 'memcpy'
                __builtin___memcpy_chk (dest, __VA_ARGS__, __darwin_obsz0 (dest))
                                                           ^~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/secure/_common.h:38:63: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                               ~~~~~~~~~~~~~~~~~~~~~          ^
5 errors generated.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
@ 2020-12-05 18:30   ` Vladislav Shpilevoy
  2020-12-05 18:52     ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-05 18:30 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Mons Anderson

Thanks for the patch!

On 04.12.2020 16:30, Cyrill Gorcunov via Tarantool-patches wrote:
> This will allow to reuse this routine in crash
> reports.
> 
> Part-of #5261
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  src/lib/core/backtrace.cc | 12 ++++++------
>  src/lib/core/backtrace.h  |  3 +++
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
> index 456ce9a4d..68d0d3ee6 100644
> --- a/src/lib/core/backtrace.cc
> +++ b/src/lib/core/backtrace.cc
> @@ -131,7 +131,7 @@ get_proc_name(unw_cursor_t *unw_cur, unw_word_t *offset, bool skip_cache)
>  }
>  
>  char *
> -backtrace(void)
> +backtrace(char *start, char *end)

Why so strange choice of arguments? We almost always use char* + size_t,
except for a few cases such as code working with tuples, where we 'save'
time on not calculating 'end' in each next stack frame. Lets be consistent
and use char* + size_t, not to raise unnecessary questions for such simple
code.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code Cyrill Gorcunov
@ 2020-12-05 18:33   ` Vladislav Shpilevoy
  2020-12-05 19:58     ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-05 18:33 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Mons Anderson

Thanks for the patch!

See 22 comments below.

On 04.12.2020 16:30, Cyrill Gorcunov via Tarantool-patches wrote:
> The errstat stands for error statistics. At moment it supports
> gathering crash dumps into an internal preallocated buffer
> which includes:
> 
>  - `uname` output
>  - build type
>  - a reason for the crash
>  - call backtrace (linux x86-64 only)
> 
> The data is collected into json format and then encoded into
> base64 form. Moreover the backtrace itself is preencoded as
> base64 earlier because we don't need the json values to consist
> some weird characters.
> 
> Once data is collected one can run errstat_exec_send_crash function
> which executes another copy of tarantool and passes it a script
> in form of
> 
>  > tarantool -e "require('http.client').post('127.0.0.1:1500', \
>  > '{"crashdump":{"version":"1","data":"eyJ1bmFtZSI6eyJzeXNuYW1l.."}}', \
>  > {timeout=1}); os.exit(1);"
> 
> The address of the network dump collector is configurable via traditional
> box.cfg{feedback_host=addr}.
> 
> Note that we're trying to use preallocated memory for data collecting
> since this code is supposed to be called from inside of a fatal signal
> handler. For simplicity we use snprintf, strlen, memcpy inside encoder
> but should consider to implement own helpers instead to minimize the
> use of system libraries. For exactly the same reason we use inplace
> base64 encode helper.
> 
> When data is ready we send it with help of vfork+execve, this is done
> to make sure that local coredump is still generated even after the
> report is sent.
> 
> A typical encoded data looks like
> 
>  | {
>  |   "uname": {
>  |     "sysname": "Linux",
>  |     "release": "5.9.10-100.fc32.x86_64",
>  |     "version": "#1 SMP Mon Nov 23 18:12:36 UTC 2020",
>  |     "machine": "x86_64"
>  |   },
>  |   "build": {
>  |     "major": 2,
>  |     "minor": 7,
>  |     "patch": 0,

1. Why do you need the version numbers and the 'version'
string both? All can be extracted from 'version' string.

>  |     "version": "2.7.0-78-gf898822f9",
>  |     "cmake_type": "Linux-x86_64-Debug"
>  |   },
>  |   "signal": {
>  |     "signo": 11,
>  |     "si_code": 0,
>  |     "si_addr": "0x3e800009727",
>  |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
>  |     "timestamp": "0x164ceceac0610ae9",

2. Why is timestamp in hex? And why do you need both
hex and string?

>  |     "timestamp_str": "2020-12-02 17:34:20 MSK"
>  |   }
>  | }
> 
> Part-of #5261

3. Lets better put the issue link in the end, like we do in 100%
of our patches except when a docbot request is present. Because
we usually look for issue reference in the end, and by putting it
into a random place of a commit message you complicate that
search.

> Worth to mention that the patch uses box.cfg's
> 
>  - feedback_host
>  - feedback_enabled
>  - feedback_no_crashinfo
> 
> variables internally and feedback_no_crashinfo will be introduced
> in next paches, thus after this patch the crash report is disabled.

4. paches -> patches. You need to turn on a spell checker or something.

> diff --git a/src/lib/core/errstat.c b/src/lib/core/errstat.c
> new file mode 100644
> index 000000000..992bc426a
> --- /dev/null
> +++ b/src/lib/core/errstat.c
> @@ -0,0 +1,409 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +
> +#include <string.h>
> +#include <unistd.h>
> +#include <time.h>
> +#include <sys/types.h>
> +#include <sys/utsname.h>
> +
> +#include "trivia/util.h"
> +#include "backtrace.h"
> +#include "errstat.h"
> +#include "cfg.h"
> +#include "say.h"
> +
> +#define pr_fmt(fmt)		"errstat: " fmt
> +#define pr_info(fmt, ...)	say_info(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_err(fmt, ...)	say_error(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_crit(fmt, ...)	fprintf(stderr, pr_fmt(fmt) "\n", ##__VA_ARGS__)
> +#define pr_panic(fmt, ...)	panic(pr_fmt(fmt), ##__VA_ARGS__)
> +
> +static struct errstat glob_errstat;
> +static bool cfg_send_crash = false;

5. Why do you have a global object for this module, and yet
this part of it is separated into another global variable?

> +
> +/*
> + * We don't need it to be optimized but rather a compact form.
> + */
> +static unsigned char *
> +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
> +{

6. Why can't you use the existing base64_encode() function from
base64.h? It is not different from your version. You both use global
variables (static also is global, just not visible everywhere). So
your version is not 'safer' in terms of being called from a signal
handler.

> +	static int m[] = {0, 2, 1};
> +	static unsigned char t[] = {
> +		'A','B','C','D','E','F','G','H',
> +		'I','J','K','L','M','N','O','P',
> +		'Q','R','S','T','U','V','W','X',
> +		'Y','Z','a','b','c','d','e','f',
> +		'g','h','i','j','k','l','m','n',
> +		'o','p','q','r','s','t','u','v',
> +		'w','x','y','z','0','1','2','3',
> +		'4','5','6','7','8','9','+','/'
> +	};
> +	size_t i, j;
> +
> +	for (i = 0, j = 0; i < src_len;) {
> +		uint32_t a = i < src_len ? src[i++] : 0;
> +		uint32_t b = i < src_len ? src[i++] : 0;
> +		uint32_t c = i < src_len ? src[i++] : 0;
> +
> +		uint32_t d = (a << 0x10) + (b << 0x08) + c;
> +
> +		dst[j++] = t[(d >> 3 * 6) & 0x3f];
> +		dst[j++] = t[(d >> 2 * 6) & 0x3f];
> +		dst[j++] = t[(d >> 1 * 6) & 0x3f];
> +		dst[j++] = t[(d >> 0 * 6) & 0x3f];
> +	}
> +
> +	size_t dst_len = ERRSTAT_BASE64_LEN(src_len);
> +	j = m[src_len % 3];
> +	for (i = 0; i < j; i++)
> +		dst[dst_len - 1 - i] = '=';
> +
> +	return dst;
> +}
> +
> +static size_t
> +strlcpy(char *dst, const char *src, size_t size)
> +{
> +	size_t ret = strlen(src);
> +	if (size) {
> +		size_t len = (ret >= size) ? size - 1 : ret;
> +		memcpy(dst, src, len);
> +		dst[len] = '\0';
> +	}
> +	return ret;
> +}
> +
> +#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))

7. What is _a?

> +
> +struct errstat *
> +errstat_get(void)

8. Why do you ever need to get it? Why the stat is not entirely
in the .c file? It heavily depends on the platform, so it does
not make much sense to expose it to any external code. It won't
be able to use it properly except for some common things.

> +{
> +	return &glob_errstat;
> +}
> +
> +static inline
> +uint64_t timespec_to_ns(struct timespec *ts)

9. Please, put return type on a separate line. On the same line
as 'static inline'.

> +{
> +	return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
> +}
> +
> +static char *
> +ns_to_localtime(uint64_t timestamp, char *buf, ssize_t len)
> +{
> +	time_t sec = timestamp / 1000000000;
> +	char *start = buf;
> +	struct tm tm;
> +
> +	/*
> +	 * Use similar format as say_x logger. Except plain
> +	 * seconds should be enough.
> +	 */
> +	localtime_r(&sec, &tm);
> +	ssize_t total = strftime(start, len, "%F %T %Z", &tm);
> +	start += total;
> +	if (total < len)
> +		return buf;
> +	buf[len-1] = '\0';

10. Lets conform with our code style and use spaces before and after
binary operators.

> +	return buf;
> +}
> +
> +void
> +errstat_init(const char *tarantool_bin)
> +{
> +	struct errstat_build *binfo = &errstat_get()->build_info;
> +	struct errstat_uname *uinfo = &errstat_get()->uname_info;
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +
> +	binfo->major = PACKAGE_VERSION_MAJOR;
> +	binfo->minor = PACKAGE_VERSION_MINOR;
> +	binfo->patch = PACKAGE_VERSION_PATCH;
> +
> +	strlcpy_a(binfo->version, PACKAGE_VERSION);
> +	strlcpy_a(binfo->cmake_type, BUILD_INFO);
> +
> +	static_assert(ERRSTAT_UNAME_BUF_LEN > sizeof(struct errstat_uname),
> +		      "uname_buf is too small");
> +
> +	char uname_buf[ERRSTAT_UNAME_BUF_LEN];
> +	struct utsname *uname_ptr = (void *)uname_buf;
> +	if (uname(uname_ptr) == 0) {
> +		strlcpy_a(uinfo->sysname, uname_ptr->sysname);
> +		strlcpy_a(uinfo->nodename, uname_ptr->nodename);
> +		strlcpy_a(uinfo->release, uname_ptr->release);
> +		strlcpy_a(uinfo->version, uname_ptr->version);
> +		strlcpy_a(uinfo->machine, uname_ptr->machine);
> +	} else
> +		pr_err("can't fetch uname");

11. Lets conform with the code style, and use {} either in both 'if'
and 'else', or in none.

> +
> +	strlcpy_a(cinfo->tarantool_bin, tarantool_bin);
> +	if (strlen(cinfo->tarantool_bin) < strlen(tarantool_bin))
> +		pr_panic("can't save binary path");
> +
> +	static_assert(sizeof(cinfo->exec_argv_1) == 4,
> +		      "exec_argv_1 is too small");
> +	strlcpy_a(cinfo->exec_argv_1, "-e");
> +}
> +
> +void
> +cfg_errstat_set_params(void)

12. I suggest you to rename it to errstat_cfg().

'cfg' is a prefix used by 'cfg' module, which you also
use here. This is probably why we have box_cfg, raft_cfg_*,
iproto_cfg_* and so on.

> +{
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +	const char *host = cfg_gets("feedback_host");
> +	int enabled = cfg_getb("feedback_enabled");

13. For flags we use name prefix 'is_'. Please, blend in.

> +	int no_crashinfo = cfg_getb("feedback_no_crashinfo");

14. That is another proof that the module name is misleading - you
even use crashinfo in the options. Does not it look consistent
to name the module 'crashinfo' too? (Or something else with 'crash'
and without 'stat' in name.)

> +
> +	if (host == NULL || enabled == 0 || no_crashinfo == 1) {
> +		if (cfg_send_crash) {
> +			pr_info("disable sedning crashinfo feedback");
> +			cfg_send_crash = false;
> +			cinfo->feedback_host[0] = '\0';
> +		}
> +		return;
> +	}
> +
> +	if (strcmp(cinfo->feedback_host, host) != 0) {
> +		strlcpy_a(cinfo->feedback_host, host);
> +		if (strlen(cinfo->feedback_host) < strlen(host))
> +			pr_panic("feedback_host is too long");
> +	}
> +
> +	if (!cfg_send_crash) {
> +		pr_info("enable sedning crashinfo feedback");
> +		cfg_send_crash = true;
> +	}
> +}
> +
> +void
> +errstat_reset(void)

15. This function is never used.

> +{
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +
> +#ifdef ENABLE_BACKTRACE
> +	cinfo->backtrace_buf[0] = '\0';
> +#endif
> +	memset(&cinfo->siginfo, 0, sizeof(cinfo->siginfo));
> +	cinfo->timestamp_rt = 0;
> +}
> +
> +#ifdef TARGET_OS_LINUX
> +static void
> +collect_gregs(struct errstat_crash *cinfo, ucontext_t *uc)
> +{
> +	static_assert(sizeof(cinfo->greg) == sizeof(uc->uc_mcontext),
> +		      "GP regs are not matching signal frame");
> +
> +	/*
> +	 * uc_mcontext on libc level looks somehow strange,
> +	 * they define an array of uint64_t where each register
> +	 * defined by REG_x macro.
> +	 *
> +	 * In turn the kernel is quite explicit about the context.
> +	 * Moreover it is a part of user ABI, thus won't be changed.
> +	 *
> +	 * Lets use memcpy here to make a copy in a fast way.
> +	 */
> +	memcpy(&cinfo->greg, &uc->uc_mcontext, sizeof(cinfo->greg));
> +}
> +#endif
> +
> +/**
> + * The routine is called inside crash signal handler so
> + * be carefull to not cause additional signals inside.
> + */
> +void
> +errstat_collect_crash(int signo, siginfo_t *siginfo, void *ucontext)
> +{
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +
> +	struct timespec ts;
> +	if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
> +		cinfo->timestamp_rt = timespec_to_ns(&ts);
> +		ns_to_localtime(cinfo->timestamp_rt,
> +				cinfo->timestamp_rt_str,
> +				sizeof(cinfo->timestamp_rt_str));
> +	} else {
> +		cinfo->timestamp_rt = 0;
> +		memset(cinfo->timestamp_rt_str, 0,
> +		       sizeof(cinfo->timestamp_rt_str));
> +	}
> +
> +	cinfo->signo = signo;
> +	cinfo->siginfo = *siginfo;
> +
> +	cinfo->context_addr = ucontext;
> +	cinfo->siginfo_addr = siginfo;
> +
> +#ifdef ENABLE_BACKTRACE
> +	char *start = cinfo->backtrace_buf;
> +	char *end = start + sizeof(cinfo->backtrace_buf) - 1;
> +	backtrace(start, end);
> +#endif
> +
> +#ifdef TARGET_OS_LINUX
> +	collect_gregs(cinfo, ucontext);
> +#endif
> +}
> +
> +/**
> + * Prepare report in json format and put it into a buffer.
> + */
> +static void
> +prepare_report_script(void)
> +{
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +	struct errstat_build *binfo = &errstat_get()->build_info;
> +	struct errstat_uname *uinfo = &errstat_get()->uname_info;
> +
> +	char *p, *e;
> +
> +#ifdef ENABLE_BACKTRACE
> +	/*
> +	 * We can't use arbitrary data which can be misinterpreted
> +	 * by Lua script when we pass it to a script.
> +	 *
> +	 * WARNING: We use report_encoded as a temp buffer.

16. Static buffer is much bigger, and is already there. Why
can't you use it? Instead you somewhy add several kilobytes of
another static buffer padding out the process size.

> +	 */
> +	size_t bt_len = strlen(cinfo->backtrace_buf);
> +	size_t bt_elen = ERRSTAT_BASE64_LEN(bt_len);
> +	if (bt_elen >= sizeof(cinfo->report_encoded))
> +		pr_panic("backtrace space is too small");
> +
> +	base64_encode((unsigned char *)cinfo->report_encoded,
> +		      (unsigned char *)cinfo->backtrace_buf, bt_len);
> +	cinfo->report_encoded[bt_elen] = '\0';
> +	memcpy(cinfo->backtrace_buf, cinfo->report_encoded, bt_elen + 1);
> +#endif
> +
> +#define snprintf_safe(fmt, ...)					\
> +	do {							\
> +		p += snprintf(p, e - p, fmt, ##__VA_ARGS__);	\
> +		if (p >= e)					\
> +			goto out;				\
> +	} while (0)
> +
> +	e = cinfo->report + sizeof(cinfo->report) - 1;
> +	p = cinfo->report;
> +
> +	snprintf_safe("{");
> +	snprintf_safe("\"uname\":{");
> +	snprintf_safe("\"sysname\":\"%s\",", uinfo->sysname);
> +#if 0> +	/*
> +	 * nodename might a sensitive information so don't
> +	 * send it by default.
> +	 */
> +	snprintf_safe("\"nodename\":\"%s\",", uinfo->nodename);
> +#endif
> +	snprintf_safe("\"release\":\"%s\",", uinfo->release);
> +	snprintf_safe("\"version\":\"%s\",", uinfo->version);
> +	snprintf_safe("\"machine\":\"%s\"", uinfo->machine);
> +	snprintf_safe("},");
> +
> +	snprintf_safe("\"build\":{");
> +	snprintf_safe("\"major\":%d,", binfo->major);
> +	snprintf_safe("\"minor\":%d,", binfo->minor);
> +	snprintf_safe("\"patch\":%d,", binfo->patch);
> +	snprintf_safe("\"version\":\"%s\",", binfo->version);
> +	snprintf_safe("\"cmake_type\":\"%s\"", binfo->cmake_type);
> +	snprintf_safe("},");
> +
> +	snprintf_safe("\"signal\":{");
> +	snprintf_safe("\"signo\":%d,", cinfo->signo);
> +	snprintf_safe("\"si_code\":%d,", cinfo->siginfo.si_code);
> +	if (cinfo->signo == SIGSEGV) {
> +		if (cinfo->siginfo.si_code == SEGV_MAPERR) {
> +			snprintf_safe("\"si_code_str\":\"%s\",",
> +				      "SEGV_MAPERR");
> +		} else if (cinfo->siginfo.si_code == SEGV_ACCERR) {
> +			snprintf_safe("\"si_code_str\":\"%s\",",
> +				      "SEGV_ACCERR");
> +		}
> +		snprintf_safe("\"si_addr\":\"0x%llx\",",
> +			      (long long)cinfo->siginfo.si_addr);
> +	}
> +#ifdef ENABLE_BACKTRACE
> +	snprintf_safe("\"backtrace\":\"%s\",", cinfo->backtrace_buf);
> +#endif
> +	snprintf_safe("\"timestamp\":\"0x%llx\",",
> +		      (long long)cinfo->timestamp_rt);
> +	snprintf_safe("\"timestamp_str\":\"%s\"",
> +		      cinfo->timestamp_rt_str);
> +	snprintf_safe("}");
> +	snprintf_safe("}\'");
> +
> +	size_t report_len = strlen(cinfo->report);
> +	size_t report_elen = ERRSTAT_BASE64_LEN(report_len);
> +	if (report_elen >= sizeof(cinfo->report_encoded))
> +		pr_panic("report encoded space is too small");
> +
> +	base64_encode((unsigned char *)cinfo->report_encoded,
> +		      (unsigned char *)cinfo->report,
> +		      report_len);
> +	cinfo->report_encoded[report_elen] = '\0';
> +
> +	e = cinfo->report_script + sizeof(cinfo->report_script) - 1;
> +	p = cinfo->report_script;
> +
> +	snprintf_safe("require(\'http.client\').post(\'%s\',"
> +		      "'{\"crashdump\":{\"version\":\"%d\","
> +		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",

17. Why so small timeout?

> +		      cinfo->feedback_host,
> +		      ERRSTAT_REPORT_VERSION,
> +		      cinfo->report_encoded);
> +
> +#undef snprintf_safe
> +	return;
> +
> +out:
> +	pr_crit("unable to prepare a crash report");
> +	struct sigaction sa = {
> +		.sa_handler = SIG_DFL,
> +	};
> +	sigemptyset(&sa.sa_mask);
> +	sigaction(SIGABRT, &sa, NULL);

18. Why do you touch the signal handler here? You didn't
change it anywhere in this file.

> +
> +	abort();
> +}
> +
> +void
> +errstat_exec_send_crash(void)

19. Why is it called 'exec'?

> +{
> +	if (!cfg_send_crash)
> +		return;
> +
> +	prepare_report_script();
> +
> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> +	cinfo->exec_argv[0] = cinfo->tarantool_bin;
> +	cinfo->exec_argv[1] = cinfo->exec_argv_1;
> +	cinfo->exec_argv[2] = cinfo->report_script;
> +	cinfo->exec_argv[3] = NULL;
> +
> +	/*
> +	 * Can't use fork here because libev has own
> +	 * at_fork helpers with mutex where we might
> +	 * stuck (see popen code).
> +	 */
> +	pid_t pid = vfork();
> +	if (pid < 0) {
> +		pr_crit("unable to fork (errno %d)", errno);
> +	} else if (pid == 0) {
> +		/*
> +		 * The script must exit at the end but there
> +		 * is no simple way to make sure from inside
> +		 * of a signal crash handler. So just hope it
> +		 * is running fine.
> +		 */
> +		execve(cinfo->tarantool_bin, cinfo->exec_argv, NULL);
> +		pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
> +			cinfo->tarantool_bin,
> +			cinfo->exec_argv[0],
> +			cinfo->exec_argv[1],
> +			cinfo->exec_argv[2]);
> +		_exit(1);
> +	}
> +}
> diff --git a/src/lib/core/errstat.h b/src/lib/core/errstat.h
> new file mode 100644
> index 000000000..77db53052
> --- /dev/null
> +++ b/src/lib/core/errstat.h
> @@ -0,0 +1,253 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +#pragma once
> +
> +#include <stdint.h>
> +#include <signal.h>
> +#include <limits.h>
> +
> +#include "trivia/config.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +#define ERRSTAT_REPORT_VERSION 1

20. It is not used out of .c file. I suggest you to
move it there.

> +
> +/**
> + * Build type information for statistics.
> + */
> +struct errstat_build {
> +	/**
> +	 * Package major version - 1 for 1.6.7.
> +	 */
> +	int major;
> +	/**
> +	 * Package minor version - 6 for 1.6.7
> +	 */
> +	int minor;
> +	/**
> +	 * Package patch version - 7 for 1.6.7
> +	 */
> +	int patch;
> +	/**
> +	 * A string with major-minor-patch-commit-id identifier of the
> +	 * release, e.g. 2.7.0-62-g0b7726571.
> +	 */
> +	char version[64];
> +	/**
> +	 * Build type (Debug and etc).
> +	 */
> +	char cmake_type[64];
> +};
> +
> +#ifdef TARGET_OS_LINUX
> +#ifndef __x86_64__
> +# error "Non x86-64 architectures are not supported"
> +#endif
> +struct errstat_greg {
> +	uint64_t	r8;
> +	uint64_t	r9;
> +	uint64_t	r10;
> +	uint64_t	r11;
> +	uint64_t	r12;
> +	uint64_t	r13;
> +	uint64_t	r14;
> +	uint64_t	r15;
> +	uint64_t	di;
> +	uint64_t	si;
> +	uint64_t	bp;
> +	uint64_t	bx;
> +	uint64_t	dx;
> +	uint64_t	ax;
> +	uint64_t	cx;
> +	uint64_t	sp;
> +	uint64_t	ip;
> +	uint64_t	flags;
> +	uint16_t	cs;
> +	uint16_t	gs;
> +	uint16_t	fs;
> +	uint16_t	ss;
> +	uint64_t	err;
> +	uint64_t	trapno;
> +	uint64_t	oldmask;
> +	uint64_t	cr2;
> +	uint64_t	fpstate;
> +	uint64_t	reserved1[8];
> +};
> +#endif /* TARGET_OS_LINUX */
> +
> +#define ERRSTAT_BASE64_LEN(len)			(4 * (((len) + 2) / 3))
> +
> +/*
> + * 4K of memory should be enough to keep the backtrace.
> + * In worst case it gonna be simply trimmed. Since we're
> + * reporting it encoded the pain text shrinks to 3070 bytes.
> + */
> +#define ERRSTAT_BACKTRACE_MAX			(4096)
> +
> +/*
> + * The report should include the bactrace
> + * and all additional information we're
> + * going to send.
> + */
> +#define ERRSTAT_REPORT_PAYLOAD_MAX		(2048)
> +#ifdef ENABLE_BACKTRACE
> +# define ERRSTAT_REPORT_MAX			\
> +	(ERRSTAT_BACKTRACE_MAX +		\
> +	 ERRSTAT_REPORT_PAYLOAD_MAX)
> +# else
> +# define ERRSTAT_REPORT_MAX			\
> +	(ERRSTAT_REPORT_PAYLOAD_MAX)
> +#endif
> +
> +/*
> + * We encode report into base64 because
> + * it is passed inside Lua script.
> + */
> +#define ERRSTAT_REPORT_ENCODED_MAX		\
> +	ERRSTAT_BASE64_LEN(ERRSTAT_REPORT_MAX)
> +
> +
> +/*
> + * The script to execute should contain encoded
> + * report.
> + */
> +#define ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX	(512)
> +#define ERRSTAT_REPORT_SCRIPT_MAX		\
> +	(ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX +	\
> +	 ERRSTAT_REPORT_ENCODED_MAX)
> +
> +struct errstat_crash {
> +	/**
> +	 * Exec arguments pointers.
> +	 */
> +	char *exec_argv[4];
> +	/**
> +	 * Predefined argument "-e".
> +	 */
> +	char exec_argv_1[4];
> +	/**
> +	 * Crash report in plain json format.
> +	 */
> +	char report[ERRSTAT_REPORT_MAX];
> +	/**
> +	 * Crash report in base64 form.
> +	 */
> +	char report_encoded[ERRSTAT_REPORT_ENCODED_MAX];
> +	/**
> +	 * Tarantool executable to send report stript.
> +	 */
> +	char tarantool_bin[PATH_MAX];
> +	/**
> +	 * The script to evaluate by tarantool
> +	 * to send the report.
> +	 */
> +	char report_script[ERRSTAT_REPORT_SCRIPT_MAX];
> +#ifdef ENABLE_BACKTRACE
> +	/**
> +	 * Backtrace buffer.
> +	 */
> +	char backtrace_buf[ERRSTAT_BACKTRACE_MAX];
> +#endif
> +	/**
> +	 * Crash signal.
> +	 */
> +	int signo;
> +	/**
> +	 * Signal information.
> +	 */
> +	siginfo_t siginfo;
> +	/**
> +	 * These two are mostly useless as being
> +	 * plain addresses but keep for backward
> +	 * compatibility.
> +	 */
> +	void *context_addr;
> +	void *siginfo_addr;
> +#ifdef TARGET_OS_LINUX
> +	/**
> +	 * Registers contents.
> +	 */
> +	struct errstat_greg greg;
> +#endif
> +	/**
> +	 * Timestamp in nanoseconds (realtime).
> +	 */
> +	uint64_t timestamp_rt;
> +	/**
> +	 * Timestamp string representation to
> +	 * use on demand.
> +	 */
> +	char timestamp_rt_str[32];
> +	/**
> +	 * Crash collector host.
> +	 */
> +	char feedback_host[1024];
> +};
> +
> +#define ERRSTAT_UNAME_BUF_LEN	1024
> +#define ERRSTAT_UNAME_FIELD_LEN	128
> +/**
> + * Information about node.
> + *
> + * On linux there is new_utsname structure which
> + * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
> + * So lets just reserve more data in advance.
> + */
> +struct errstat_uname {
> +	char sysname[ERRSTAT_UNAME_FIELD_LEN];
> +	char nodename[ERRSTAT_UNAME_FIELD_LEN];
> +	char release[ERRSTAT_UNAME_FIELD_LEN];
> +	char version[ERRSTAT_UNAME_FIELD_LEN];
> +	char machine[ERRSTAT_UNAME_FIELD_LEN];
> +};
> +
> +struct errstat {
> +	struct errstat_build build_info;
> +	struct errstat_uname uname_info;
> +	struct errstat_crash crash_info;
> +};
> +
> +/**
> + * Return a pointer to the info keeper.
> + */
> +extern struct errstat *

21. Please, omit 'extern'. They don't make any difference
here, and stand out of all the other libs we have.

> +errstat_get(void);
> +
> +/**
> + * Initialize error statistics.
> + */
> +extern void
> +errstat_init(const char *tarantool_bin);
> +
> +/**
> + * Configure errstat from feedback daemon.
> + */
> +extern void
> +cfg_errstat_set_params(void);
> +
> +/**
> + * Reset everything except build information.
> + */
> +extern void
> +errstat_reset(void);
> +
> +/**
> + * Collect a crash.
> + */
> +extern void
> +errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
> +
> +/**
> + * Send a crash report.
> + */
> +extern void
> +errstat_exec_send_crash(void);

22. Why do you have collect and send separated? Why can't it just be
something like 'crashinfo_handle_crash(sig stuff...);'

> +
> +#if defined(__cplusplus)
> +}
> +#endif /* defined(__cplusplus) */
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
@ 2020-12-05 18:33   ` Vladislav Shpilevoy
  2020-12-05 20:04     ` Cyrill Gorcunov
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-05 18:33 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Mons Anderson

Thanks for the patch!

See 2 comments below.

On 04.12.2020 16:30, Cyrill Gorcunov via Tarantool-patches wrote:
> In errstat code we fetch the signal statistic and
> generate a backtrace for report. We don't send this
> data right now but can reuse this code to not decode
> registers and generate backtrace twice.
> 
> Part-of #5261
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  src/main.cc | 83 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 47 insertions(+), 36 deletions(-)
> 
> diff --git a/src/main.cc b/src/main.cc
> index 2f48f474c..260b9a0ff 100644
> --- a/src/main.cc
> +++ b/src/main.cc> @@ -253,6 +253,10 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
>  	}
>  
>  	in_cb = 1;
> +	/*
> +	 * Notify errstat engine about the crash.
> +	 */
> +	errstat_collect_crash(signo, siginfo, context);

1. So I probably see now. You have collect and send separated,
because there is also printing below. In that case I can propose
to make it more explicit somehow. Because when I firstly looked
at that, it took quite some time to realize, that 'cinfo' variable
you obtain above it filled inside errstat_collect_crash. It just is
not passed to there.

For example, you can delete errstat_get(), and make
collect_crash() return struct errstat_crash*, which you will use
below for the printing.

Another option - move entire signal handler to your module, and then
access any internals of it. This option looks the best for me. Having
platform-dependent code both in the new module and in main.cc doing
basically the same but for stderr vs network looks strange.

>  	if (signo == SIGSEGV) {
>  		fdprintf(fd, "Segmentation fault\n");
> @@ -279,8 +283,8 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
>  	fprintf(stderr, "  context: %p\n", context);
>  	fprintf(stderr, "  siginfo: %p\n", siginfo);
>  
> -#if defined(__linux__) && defined(__amd64)
> -	dump_x86_64_registers((ucontext_t *)context);
> +#ifdef TARGET_OS_LINUX
> +	dump_registers(cinfo);
>  #endif
>  
>  	fdprintf(fd, "Current time: %u\n", (unsigned) time(0));
> @@ -290,8 +294,14 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
>  #ifdef ENABLE_BACKTRACE
>  	fdprintf(fd, "Attempting backtrace... Note: since the server has "
>  		 "already crashed, \nthis may fail as well\n");
> -	print_backtrace();
> +	fdprintf(STDERR_FILENO, "%s", cinfo->backtrace_buf);
>  #endif
> +	/*
> +	 * If sending crash report to the feedback server is
> +	 * allowed we won't be generating local core dump but
> +	 * rather try to send data and exit.

2. But why?

If as a dump you mean the text printed above, it is still printed.
If you mean the dump with the data, your module does not send it,
and it is still dumped locally, below.

Btw, having errstat_exec_send_crash() with void args makes existence
of errstat_get() even stranger.

> +	 */
> +	errstat_exec_send_crash();
>  end:
>  	/* Try to dump core. */
>  	memset(&sa, 0, sizeof(sa));
> @@ -815,6 +825,7 @@ main(int argc, char **argv)
>  		title_set_script_name(argv[0]);
>  	}
>  
> +	errstat_init(tarantool_bin);
>  	export_syms();
>  
>  	random_init();
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending
  2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending Cyrill Gorcunov
@ 2020-12-05 18:34   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-05 18:34 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Mons Anderson

Thanks for the patch!

See 4 comments below.

On 04.12.2020 16:30, Cyrill Gorcunov via Tarantool-patches wrote:
> Introcude a new option to box.cfg{} to control sending

1. Introcude -> Introduce.

> of a crash report.
> 
> There is no simple way to test all this because it involves
> subsequent runs of tarantool. I've been using one terminal
> with tarantool running inside as
> 
>> box.cfg{feedback_host="127.0.0.1:1500"}
> 
> another terminal to listen for incoming data
> 
>> while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
> 
> and one another to kill main tarantool instance
> 
>> kill -11 `pidof tarantool`
> 
> If we run as
> 
>> box.cfg{feedback_host="127.0.0.1:1500", feedback_no_crashinfo=true}
> 
> then the crash report will not be sent.
> 
> Closes #5261
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> 
> @TarantoolBot document
> Title: Configuration reference

2. Please, be more specific. I guess about 80% of all
doc tickets are about configuration update.

> Header: Feedback

3. What is 'Header:'? Docbot does not parse anything except
Title.

> For better analisys of program crashes the information associated with
> the crash such as
> 
>  - utsname (similar to `uname -a` output except the network name)
>  - build information
>  - reason for a crash
>  - call backtrace
> 
> is sent to the feedback server. To disable it set `feedback_no_crashinfo`
> to `false`.
> ---
>  src/box/lua/feedback_daemon.lua | 7 +++++++
>  src/box/lua/load_cfg.lua        | 3 +++
>  src/lib/core/errstat.c          | 2 +-
>  test/box/admin.result           | 2 ++
>  test/box/cfg.result             | 4 ++++
>  5 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
> index 1d39012ed..613d2864c 100644
> --- a/src/box/lua/feedback_daemon.lua
> +++ b/src/box/lua/feedback_daemon.lua
> @@ -360,12 +361,18 @@ local function reload(self)
>      self:start()
>  end
>  
> +ffi.cdef[[
> +void
> +cfg_errstat_set_params(void);

4. You must add this function to exports.h,
otherwise it may be deleted by some optimization, like
it happened to other functions and entire modules before.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer
  2020-12-05 18:30   ` Vladislav Shpilevoy
@ 2020-12-05 18:52     ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-05 18:52 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml

On Sat, Dec 05, 2020 at 07:30:28PM +0100, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> >  char *
> > -backtrace(void)
> > +backtrace(char *start, char *end)
> 
> Why so strange choice of arguments? We almost always use char* + size_t,
> except for a few cases such as code working with tuples, where we 'save'
> time on not calculating 'end' in each next stack frame. Lets be consistent
> and use char* + size_t, not to raise unnecessary questions for such simple
> code.

Since start/end were calculated inside I juts moved them to arguments.
No problem, will use size instead.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code
  2020-12-05 18:33   ` Vladislav Shpilevoy
@ 2020-12-05 19:58     ` Cyrill Gorcunov
  2020-12-06 15:50       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-05 19:58 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml

On Sat, Dec 05, 2020 at 07:33:05PM +0100, Vladislav Shpilevoy wrote:
> >  |   "build": {
> >  |     "major": 2,
> >  |     "minor": 7,
> >  |     "patch": 0,
> 
> 1. Why do you need the version numbers and the 'version'
> string both? All can be extracted from 'version' string.

Sure we can extract it but I thought this gonna be more
convenient 'cause it won't require the reader to split
string, extract the major/minor. This is not a problem
to drop these fields and leave "version" only if you prefer.
I don't mind.

> >  |   },
> >  |   "signal": {
> >  |     "signo": 11,
> >  |     "si_code": 0,
> >  |     "si_addr": "0x3e800009727",
> >  |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
> >  |     "timestamp": "0x164ceceac0610ae9",
> 
> 2. Why is timestamp in hex? And why do you need both
> hex and string?

hex is simply more short form for timestamp, I can put
plain integer number if you prefer.

as to time string -- the prblem is in local timezone,
the timestamp itself doesn't show which tz it is. thus
we either have to send tz explicitly or a time string.

if you mean to simply drop timestamp and use timestring
only I'm ok with this..

> 
> >  |     "timestamp_str": "2020-12-02 17:34:20 MSK"
> >  |   }
> >  | }
> > 
> > Part-of #5261
> 
> 3. Lets better put the issue link in the end, like we do in 100%
> of our patches except when a docbot request is present. Because
> we usually look for issue reference in the end, and by putting it
> into a random place of a commit message you complicate that
> search.

Ouch, sorry. Been basing so many times that managed to miss this,
thanks!

> > variables internally and feedback_no_crashinfo will be introduced
> > in next paches, thus after this patch the crash report is disabled.
> 
> 4. paches -> patches. You need to turn on a spell checker or something.

true, thanks!

> > +
> > +static struct errstat glob_errstat;
> > +static bool cfg_send_crash = false;
> 
> 5. Why do you have a global object for this module, and yet
> this part of it is separated into another global variable?

Good point. Actually I introduced this cfg_send_crash a way
later than glob_errstat. Indeed we can keep this settings inside
errstat.

Also let me explain why this structure called errstat - it
keeps not just report to send but also registers (we use it
to print to a console), backtrace, report. And we're planning
to gather diag_set() calls here as well and provide some
box.info interface for last say 10 errors.

On the other hands crash.c may be sounds better, dunno.

> > +static unsigned char *
> > +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
> > +{
> 
> 6. Why can't you use the existing base64_encode() function from
> base64.h? It is not different from your version. You both use global
> variables (static also is global, just not visible everywhere). So
> your version is not 'safer' in terms of being called from a signal
> handler.

This library is not linked into src/lib/core by default I can add it
actually but since it is third_party library I can't be sure that
it won't be changed someday leading to unpredicted results. As you
might note I try to use local functions as much as I can and I think
we need to implement own strlen/snprintf and etc here as well because
when we're inside sigsegv some state of libc/third_party components
might be broken.

If you think it is safe to reuse base64 here I'll link this library in.

> > +
> > +#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
> 
> 7. What is _a?

array, we use sizeof(dst)

> > +
> > +struct errstat *
> > +errstat_get(void)
> 
> 8. Why do you ever need to get it? Why the stat is not entirely
> in the .c file? It heavily depends on the platform, so it does
> not make much sense to expose it to any external code. It won't
> be able to use it properly except for some common things.

True. Leftover from previous design. Thanks!

> > +
> > +static inline
> > +uint64_t timespec_to_ns(struct timespec *ts)
> 
> 9. Please, put return type on a separate line. On the same line
> as 'static inline'.

yeah, sorry for this typo.

> > +	localtime_r(&sec, &tm);
> > +	ssize_t total = strftime(start, len, "%F %T %Z", &tm);
> > +	start += total;
> > +	if (total < len)
> > +		return buf;
> > +	buf[len-1] = '\0';
> 
> 10. Lets conform with our code style and use spaces before and after
> binary operators.

OK

> > +	} else
> > +		pr_err("can't fetch uname");
> 
> 11. Lets conform with the code style, and use {} either in both 'if'
> and 'else', or in none.

Yeah, sorry, thanks!

> > +void
> > +cfg_errstat_set_params(void)
> 
> 12. I suggest you to rename it to errstat_cfg().
> 
> 'cfg' is a prefix used by 'cfg' module, which you also
> use here. This is probably why we have box_cfg, raft_cfg_*,
> iproto_cfg_* and so on.

sure

> > +{
> > +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > +	const char *host = cfg_gets("feedback_host");
> > +	int enabled = cfg_getb("feedback_enabled");
> 
> 13. For flags we use name prefix 'is_'. Please, blend in.

nod

> > +	int no_crashinfo = cfg_getb("feedback_no_crashinfo");
> 
> 14. That is another proof that the module name is misleading - you
> even use crashinfo in the options. Does not it look consistent
> to name the module 'crashinfo' too? (Or something else with 'crash'
> and without 'stat' in name.)

You know I think I agree. It is not clear when error statistics gonna
be implemented, so crashinfo looks pretty fine. Will rename.

> > +
> > +void
> > +errstat_reset(void)
> 
> 15. This function is never used.

thanks! will drop.

> > +static void
> > +prepare_report_script(void)
> > +{
> > +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
> > +	struct errstat_build *binfo = &errstat_get()->build_info;
> > +	struct errstat_uname *uinfo = &errstat_get()->uname_info;
> > +
> > +	char *p, *e;
> > +
> > +#ifdef ENABLE_BACKTRACE
> > +	/*
> > +	 * We can't use arbitrary data which can be misinterpreted
> > +	 * by Lua script when we pass it to a script.
> > +	 *
> > +	 * WARNING: We use report_encoded as a temp buffer.
> 
> 16. Static buffer is much bigger, and is already there. Why
> can't you use it? Instead you somewhy add several kilobytes of
> another static buffer padding out the process size.

static buffer is three pages only while we might need more
size i suspect. let me reconsider this moment maybe indeed
using tt_static would suite better. thanks!

> > +
> > +	snprintf_safe("require(\'http.client\').post(\'%s\',"
> > +		      "'{\"crashdump\":{\"version\":\"%d\","
> > +		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
> 
> 17. Why so small timeout?

this is the same timeout the feedback daemon uses. I can increase it
or make it configurable if you prefer.

> > +out:
> > +	pr_crit("unable to prepare a crash report");
> > +	struct sigaction sa = {
> > +		.sa_handler = SIG_DFL,
> > +	};
> > +	sigemptyset(&sa.sa_mask);
> > +	sigaction(SIGABRT, &sa, NULL);
> 
> 18. Why do you touch the signal handler here? You didn't
> change it anywhere in this file.

It is taken from main signal handler code. I thought it was
done this way in presume that this is safer -- ie even if
one day code get changed and someone add sigabrt hander we
still will produce a coredump. Probably worth to just drop
out.

> > +	abort();
> > +}
> > +
> > +void
> > +errstat_exec_send_crash(void)
> 
> 19. Why is it called 'exec'?

Because it runs exec inside. I can drop this exec_ if you prefer.

> > +
> > +#if defined(__cplusplus)
> > +extern "C" {
> > +#endif /* defined(__cplusplus) */
> > +
> > +#define ERRSTAT_REPORT_VERSION 1
> 
> 20. It is not used out of .c file. I suggest you to
> move it there.

ok

> > +
> > +/**
> > + * Return a pointer to the info keeper.
> > + */
> > +extern struct errstat *
> 
> 21. Please, omit 'extern'. They don't make any difference
> here, and stand out of all the other libs we have.

ok

> > +
> > +/**
> > + * Collect a crash.
> > + */
> > +extern void
> > +errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
> > +
> > +/**
> > + * Send a crash report.
> > + */
> > +extern void
> > +errstat_exec_send_crash(void);
> 
> 22. Why do you have collect and send separated? Why can't it just be
> something like 'crashinfo_handle_crash(sig stuff...);'

To reuse regs and backtrace

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals
  2020-12-05 18:33   ` Vladislav Shpilevoy
@ 2020-12-05 20:04     ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2020-12-05 20:04 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml

On Sat, Dec 05, 2020 at 07:33:49PM +0100, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> See 2 comments below.
> 
> On 04.12.2020 16:30, Cyrill Gorcunov via Tarantool-patches wrote:
> > In errstat code we fetch the signal statistic and
> > generate a backtrace for report. We don't send this
> > data right now but can reuse this code to not decode
> > registers and generate backtrace twice.
> > 
> > Part-of #5261
> > 
> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> > ---
> >  src/main.cc | 83 ++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 47 insertions(+), 36 deletions(-)
> > 
> > diff --git a/src/main.cc b/src/main.cc
> > index 2f48f474c..260b9a0ff 100644
> > --- a/src/main.cc
> > +++ b/src/main.cc> @@ -253,6 +253,10 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
> >  	}
> >  
> >  	in_cb = 1;
> > +	/*
> > +	 * Notify errstat engine about the crash.
> > +	 */
> > +	errstat_collect_crash(signo, siginfo, context);
> 
> 1. So I probably see now. You have collect and send separated,
> because there is also printing below. In that case I can propose
> to make it more explicit somehow. Because when I firstly looked
> at that, it took quite some time to realize, that 'cinfo' variable
> you obtain above it filled inside errstat_collect_crash. It just is
> not passed to there.
> 
> For example, you can delete errstat_get(), and make
> collect_crash() return struct errstat_crash*, which you will use
> below for the printing.
> 
> Another option - move entire signal handler to your module, and then
> access any internals of it. This option looks the best for me. Having
> platform-dependent code both in the new module and in main.cc doing
> basically the same but for stderr vs network looks strange.

Thanks! I'll try to implement. Looks reasonable to me.

> >  	fdprintf(fd, "Current time: %u\n", (unsigned) time(0));
> > @@ -290,8 +294,14 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
> >  #ifdef ENABLE_BACKTRACE
> >  	fdprintf(fd, "Attempting backtrace... Note: since the server has "
> >  		 "already crashed, \nthis may fail as well\n");
> > -	print_backtrace();
> > +	fdprintf(STDERR_FILENO, "%s", cinfo->backtrace_buf);
> >  #endif
> > +	/*
> > +	 * If sending crash report to the feedback server is
> > +	 * allowed we won't be generating local core dump but
> > +	 * rather try to send data and exit.
> 
> 2. But why?
> 
> If as a dump you mean the text printed above, it is still printed.
> If you mean the dump with the data, your module does not send it,
> and it is still dumped locally, below.

Actually this comment left from previous version which I didn't send.
There I was simply using execve, instead of vfork+execve. Thus once
execve passed no more local abort() call happened. Thanks for pointing,
need to drop this comment.

> Btw, having errstat_exec_send_crash() with void args makes existence
> of errstat_get() even stranger.

Indeed. I'll think. Thanks for all your comments!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code
  2020-12-05 19:58     ` Cyrill Gorcunov
@ 2020-12-06 15:50       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-06 15:50 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Mons Anderson, tml

On 05.12.2020 20:58, Cyrill Gorcunov wrote:
> On Sat, Dec 05, 2020 at 07:33:05PM +0100, Vladislav Shpilevoy wrote:
>>>  |   "build": {
>>>  |     "major": 2,
>>>  |     "minor": 7,
>>>  |     "patch": 0,
>>
>> 1. Why do you need the version numbers and the 'version'
>> string both? All can be extracted from 'version' string.
> 
> Sure we can extract it but I thought this gonna be more
> convenient 'cause it won't require the reader to split
> string, extract the major/minor. This is not a problem
> to drop these fields and leave "version" only if you prefer.
> I don't mind.

Currently the reader must concatenate these 3 lines into 1,
which is already present in a next field.

Also this will be handled by the feedback server, not by people.
The people will see some digested form of all of this.

>>>  |   },
>>>  |   "signal": {
>>>  |     "signo": 11,
>>>  |     "si_code": 0,
>>>  |     "si_addr": "0x3e800009727",
>>>  |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
>>>  |     "timestamp": "0x164ceceac0610ae9",
>>
>> 2. Why is timestamp in hex? And why do you need both
>> hex and string?
> 
> hex is simply more short form for timestamp, I can put
> plain integer number if you prefer.

I don't understand why do we need both hex and string. The
string already contains the timestamp + zone, which can be
parsed on the server side into a number if necessary.

> as to time string -- the prblem is in local timezone,
> the timestamp itself doesn't show which tz it is. thus
> we either have to send tz explicitly or a time string.
> 
> if you mean to simply drop timestamp and use timestring
> only I'm ok with this..

Better drop the hex, I think. The number can be calculated
from the string, and it does not improve the readability
anyhow. Which is not necessary anyway, because it will be
parsed by a server, and can be converted to any form there.

>>>  |     "timestamp_str": "2020-12-02 17:34:20 MSK"
>>>  |   }
>>>  | }
>>>
>>> +static struct errstat glob_errstat;
>>> +static bool cfg_send_crash = false;
>>
>> 5. Why do you have a global object for this module, and yet
>> this part of it is separated into another global variable?
> 
> Good point. Actually I introduced this cfg_send_crash a way
> later than glob_errstat. Indeed we can keep this settings inside
> errstat.
> 
> Also let me explain why this structure called errstat - it
> keeps not just report to send but also registers (we use it
> to print to a console), backtrace, report.

But it is also not errors and not statistics.

> And we're planning
> to gather diag_set() calls here as well and provide some
> box.info interface for last say 10 errors.

Why here? Normal errors and their statistics has nothing to
do with crashes. Only common property of them is that they
are not normal, usually.

>>> +static unsigned char *
>>> +base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
>>> +{
>>
>> 6. Why can't you use the existing base64_encode() function from
>> base64.h? It is not different from your version. You both use global
>> variables (static also is global, just not visible everywhere). So
>> your version is not 'safer' in terms of being called from a signal
>> handler.
> 
> This library is not linked into src/lib/core by default I can add it
> actually but since it is third_party library I can't be sure that
> it won't be changed someday leading to unpredicted results.

It is in third_party folder, but it is not a submodule. It is our code,
which we can decide when and how to change.

Talking of third_party, I don't know why some of the code is stored
here. I think it is supposed to store submodules which we can only update
from the upstream sometimes, but we also store files here which should
belong to src/lib, with our own code.

With that said, treat base64 as our own library. You can use it anywhere
and not be afraid that it will be changed or something.

> As you
> might note I try to use local functions as much as I can and I think
> we need to implement own strlen/snprintf and etc here as well because
> when we're inside sigsegv some state of libc/third_party components
> might be broken.
> 
> If you think it is safe to reuse base64 here I'll link this library in.

Yes, it is safe. At least not less safe than the inlined version here.

>>> +static void
>>> +prepare_report_script(void)
>>> +{
>>> +	struct errstat_crash *cinfo = &errstat_get()->crash_info;
>>> +	struct errstat_build *binfo = &errstat_get()->build_info;
>>> +	struct errstat_uname *uinfo = &errstat_get()->uname_info;
>>> +
>>> +	char *p, *e;
>>> +
>>> +#ifdef ENABLE_BACKTRACE
>>> +	/*
>>> +	 * We can't use arbitrary data which can be misinterpreted
>>> +	 * by Lua script when we pass it to a script.
>>> +	 *
>>> +	 * WARNING: We use report_encoded as a temp buffer.
>>
>> 16. Static buffer is much bigger, and is already there. Why
>> can't you use it? Instead you somewhy add several kilobytes of
>> another static buffer padding out the process size.
> 
> static buffer is three pages only while we might need more
> size i suspect. let me reconsider this moment maybe indeed
> using tt_static would suite better. thanks!

Isn't your backtrace buffer smaller than 3 pages?

>>> +
>>> +	snprintf_safe("require(\'http.client\').post(\'%s\',"
>>> +		      "'{\"crashdump\":{\"version\":\"%d\","
>>> +		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
>>
>> 17. Why so small timeout?
> 
> this is the same timeout the feedback daemon uses. I can increase it
> or make it configurable if you prefer.

No, should be fine then.

>>> +	abort();
>>> +}
>>> +
>>> +void
>>> +errstat_exec_send_crash(void)
>>
>> 19. Why is it called 'exec'?
> 
> Because it runs exec inside. I can drop this exec_ if you prefer.

It is not a good idea to expose what this function does inside
if it does not affect the caller. It calls exec in the child
process, so it has nothing to do with the caller. For example,
why popen_new() is not called popen_new_exec() then?

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-12-06 15:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04 15:29 [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Cyrill Gorcunov
2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
2020-12-05 18:30   ` Vladislav Shpilevoy
2020-12-05 18:52     ` Cyrill Gorcunov
2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 2/4] errstat: add crash report base code Cyrill Gorcunov
2020-12-05 18:33   ` Vladislav Shpilevoy
2020-12-05 19:58     ` Cyrill Gorcunov
2020-12-06 15:50       ` Vladislav Shpilevoy
2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
2020-12-05 18:33   ` Vladislav Shpilevoy
2020-12-05 20:04     ` Cyrill Gorcunov
2020-12-04 15:30 ` [Tarantool-patches] [PATCH v3 4/4] cfg: configure crash report sending Cyrill Gorcunov
2020-12-05 18:34   ` Vladislav Shpilevoy
2020-12-05 18:30 ` [Tarantool-patches] [PATCH v3 0/4] crash dump: implement sending feedback Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox