Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback
@ 2020-12-02 15:18 Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-02 15:18 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Our feedback daemon sends only a small portion of usage
statistics. Crash dumps are important to us as well, because
real users may hit far more serious issues than our testing
farm does; it is simply impossible to cover all possible
scenarios.

For this reason, if a crash happens we can send the dump to
our feedback server.

In this series we implement only the base functionality and
may extend it later.

I have not yet found a simple way to test this code other
than manually.

Any comments are highly appreciated.

issue https://github.com/tarantool/tarantool/issues/5261
branch gorcunov/gh-5261-crash-report-2

Cyrill Gorcunov (4):
  backtrace: allow to specify destination buffer
  errstat: add crash report base code
  crash: use errstat code in fatal signals
  cfg: allow to configure crash report sending

 src/box/box.cc              |   2 +
 src/box/lua/load_cfg.lua    |   3 +
 src/lib/core/CMakeLists.txt |   1 +
 src/lib/core/backtrace.cc   |  12 +-
 src/lib/core/backtrace.h    |   3 +
 src/lib/core/errstat.c      | 390 ++++++++++++++++++++++++++++++++++++
 src/lib/core/errstat.h      | 253 +++++++++++++++++++++++
 src/main.cc                 |  83 ++++----
 8 files changed, 705 insertions(+), 42 deletions(-)
 create mode 100644 src/lib/core/errstat.c
 create mode 100644 src/lib/core/errstat.h


base-commit: 71377c28e1c20108a8691481660bea6263f9a2e8
-- 
2.26.2


* [Tarantool-patches] [PATCH 1/4] backtrace: allow to specify destination buffer
  2020-12-02 15:18 [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
@ 2020-12-02 15:18 ` Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 2/4] errstat: add crash report base code Cyrill Gorcunov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-02 15:18 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

This allows the routine to be reused in crash
reports.
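
As a rough illustration of the new calling convention (not the actual
libunwind-based unwinder), the sketch below fills a caller-supplied
buffer the way backtrace(start, end) is expected to, with `end`
pointing at the last usable byte. The helper name `format_frames` and
the frame names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for backtrace(start, end): writes one line
 * per frame into the caller's buffer and never writes past 'end',
 * which points at the last usable byte (reserved for '\0').
 */
static char *
format_frames(char *start, char *end, const char **names, int count)
{
	assert(start <= end);
	char *p = start;
	*p = '\0';
	for (int i = 0; i < count && p < end; i++) {
		/* Room left, including the byte for the final '\0'. */
		int rc = snprintf(p, end - p + 1, "#%d %s\n", i, names[i]);
		if (rc < 0)
			break;
		/* On truncation park the cursor on the terminator. */
		p += (rc > end - p) ? end - p : rc;
	}
	return start;
}
```

A caller such as print_backtrace() would pass a static buffer and
`buf + size - 1`, while the crash reporter can pass its own
preallocated storage.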

Part-of #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/backtrace.cc | 12 ++++++------
 src/lib/core/backtrace.h  |  3 +++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
index 456ce9a4d..68d0d3ee6 100644
--- a/src/lib/core/backtrace.cc
+++ b/src/lib/core/backtrace.cc
@@ -131,7 +131,7 @@ get_proc_name(unw_cursor_t *unw_cur, unw_word_t *offset, bool skip_cache)
 }
 
 char *
-backtrace(void)
+backtrace(char *start, char *end)
 {
 	int frame_no = 0;
 	unw_word_t sp = 0, old_sp = 0, ip, offset;
@@ -139,10 +139,8 @@ backtrace(void)
 	unw_getcontext(&unw_context);
 	unw_cursor_t unw_cur;
 	unw_init_local(&unw_cur, &unw_context);
-	char *backtrace_buf = (char *)static_alloc(SMALL_STATIC_SIZE);
-	char *p = backtrace_buf;
-	char *end = p + SMALL_STATIC_SIZE - 1;
 	int unw_status;
+	char *p = start;
 	*p = '\0';
 	while ((unw_status = unw_step(&unw_cur)) > 0) {
 		const char *proc;
@@ -174,7 +172,7 @@ backtrace(void)
 		say_debug("unwinding error: %i", unw_status);
 #endif
 out:
-	return backtrace_buf;
+	return start;
 }
 
 /*
@@ -436,7 +434,9 @@ backtrace_foreach(backtrace_cb cb, coro_context *coro_ctx, void *cb_ctx)
 void
 print_backtrace(void)
 {
-	fdprintf(STDERR_FILENO, "%s", backtrace());
+	char *start = (char *)static_alloc(SMALL_STATIC_SIZE);
+	char *end = start + SMALL_STATIC_SIZE - 1;
+	fdprintf(STDERR_FILENO, "%s", backtrace(start, end));
 }
 #endif /* ENABLE_BACKTRACE */
 
diff --git a/src/lib/core/backtrace.h b/src/lib/core/backtrace.h
index c119d5402..55489c01b 100644
--- a/src/lib/core/backtrace.h
+++ b/src/lib/core/backtrace.h
@@ -40,6 +40,9 @@ extern "C" {
 #ifdef ENABLE_BACKTRACE
 #include <coro.h>
 
+char *
+backtrace(char *start, char *end);
+
 void print_backtrace(void);
 
 typedef int (backtrace_cb)(int frameno, void *frameret,
-- 
2.26.2


* [Tarantool-patches] [PATCH 2/4] errstat: add crash report base code
  2020-12-02 15:18 [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
@ 2020-12-02 15:18 ` Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-02 15:18 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

errstat stands for error statistics. At the moment it supports
gathering a crash dump into an internal preallocated buffer.
The dump includes:

 - `uname` output
 - build type
 - a reason for the crash
 - call backtrace (linux x86-64 only)

The data is collected in JSON format and then encoded as
base64. The backtrace itself is base64-encoded beforehand,
because we don't want the JSON values to contain characters
that would need escaping.
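
To make the size arithmetic concrete, here is a minimal base64
encoder sketch. The length formula is the same as ERRSTAT_BASE64_LEN
in the patch; the names `BASE64_LEN` and `b64_encode` are
illustrative and this is not the encoder from errstat.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same formula as ERRSTAT_BASE64_LEN in the patch. */
#define BASE64_LEN(len)	(4 * (((len) + 2) / 3))

/*
 * Minimal encoder sketch: 'dst' must hold BASE64_LEN(src_len) + 1
 * bytes. Returns the encoded length without the terminating '\0'.
 */
static size_t
b64_encode(char *dst, const unsigned char *src, size_t src_len)
{
	static const char t[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789+/";
	size_t i = 0, j = 0;

	assert(dst != NULL && (src != NULL || src_len == 0));
	while (i < src_len) {
		size_t left = src_len - i;
		uint32_t a = src[i++];
		uint32_t b = left > 1 ? src[i++] : 0;
		uint32_t c = left > 2 ? src[i++] : 0;
		uint32_t d = (a << 16) | (b << 8) | c;

		dst[j++] = t[(d >> 18) & 0x3f];
		dst[j++] = t[(d >> 12) & 0x3f];
		/* Pad with '=' where the input ran out. */
		dst[j++] = left > 1 ? t[(d >> 6) & 0x3f] : '=';
		dst[j++] = left > 2 ? t[d & 0x3f] : '=';
	}
	dst[j] = '\0';
	return j;
}
```

Every 3 input bytes become 4 output characters, so a 4096-byte buffer
can carry at most about 3 KB of plain backtrace text once encoded.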

Once the data is collected, one can call errstat_exec_send_crash(),
which executes another copy of tarantool and passes it a script
of the form

 > tarantool -e "require('http.client').post('127.0.0.1:1500', \
 > '{"crashdump":{"version":"1","data":"eyJ1bmFtZSI6eyJzeXNuYW1l.."}}', \
 > {timeout=1}); os.exit(1);"

The address of the network dump collector is configurable via traditional
box.cfg{feedback_host=addr}.
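
The script shown above can be composed with a single bounded
snprintf call, roughly as sketched here. The helper name
`format_report_script` is hypothetical, and the sketch assumes the
host string contains no quote characters.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats the Lua one-liner into 'buf'.
 * Returns 0 on success, -1 if the script would not fit. The host
 * and the payload are assumed to be free of quotes, which holds
 * for base64 text but has to be checked for a user-supplied host.
 */
static int
format_report_script(char *buf, size_t size, const char *host,
		     int version, const char *data_b64)
{
	assert(buf != NULL && size > 0);
	int rc = snprintf(buf, size,
			  "require('http.client').post('%s',"
			  "'{\"crashdump\":{\"version\":\"%d\","
			  "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
			  host, version, data_b64);
	return (rc < 0 || (size_t)rc >= size) ? -1 : 0;
}
```

Reporting truncation to the caller matters here: a cut-off script
would be passed to the second tarantool instance as broken Lua.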

Note that we try to use preallocated memory for data collection,
since this code is supposed to be called from inside a fatal signal
handler. For simplicity we use snprintf inside the encoder, but we
should consider implementing our own helpers instead to minimize the
use of system libraries.
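
One possible shape for such helpers is sketched below: a string
append and an unsigned-decimal append that work on a cursor/end pair
in a preallocated buffer. The names `buf_puts` and `buf_putu` are
hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Append a NUL-terminated string at 'p', never writing past 'end'
 * (the last usable byte). Returns the new cursor; the buffer stays
 * terminated. Apart from the assert() precondition (compiled out
 * with NDEBUG) no libc function is called.
 */
static char *
buf_puts(char *p, char *end, const char *s)
{
	assert(p <= end);
	while (*s != '\0' && p < end)
		*p++ = *s++;
	*p = '\0';
	return p;
}

/* Append an unsigned decimal number using the same rules. */
static char *
buf_putu(char *p, char *end, uint64_t v)
{
	char tmp[21]; /* 20 digits of UINT64_MAX + '\0' */
	char *t = tmp + sizeof(tmp) - 1;

	*t = '\0';
	do {
		*--t = (char)('0' + v % 10);
		v /= 10;
	} while (v != 0);
	return buf_puts(p, end, t);
}
```

With these, a fragment like `{"signo":11}` can be built by chaining
calls, and a cursor equal to `end` signals that the output was
truncated.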

Also, when the data is about to be sent, the routine doesn't use
fork/wait calls but a direct execve instead, to eliminate the
possibility of subsequent failures. It doesn't clean up file
descriptors either, which means the caller's terminal might need
a reset afterwards.
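
The exec-without-fork approach can be sketched as follows. The
function names are illustrative; unlike the patch, which passes NULL
as the environment, this sketch passes an explicitly empty one.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill a caller-owned argv[4] for "<bin> -e <script>". Nothing is
 * allocated, so this can run inside a fatal signal handler.
 */
static void
fill_exec_argv(char *argv[4], char *bin, char *script)
{
	static char opt_e[] = "-e";

	assert(bin != NULL && script != NULL);
	argv[0] = bin;
	argv[1] = opt_e;
	argv[2] = script;
	argv[3] = NULL;
}

/*
 * Replace the crashed process image with the reporter. Returns
 * only if execve() fails, so the caller can fall back to its
 * usual crash path.
 */
static int
exec_reporter(char *bin, char *script)
{
	char *argv[4];
	char *envp[] = { NULL };

	fill_exec_argv(argv, bin, script);
	return execve(bin, argv, envp);
}
```

Because the process image is replaced in place, open descriptors
without close-on-exec (including the terminal) are inherited by the
reporter, which is the side effect mentioned above.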

A typical report, before the final base64 encoding, looks like

 | {
 |   "uname": {
 |     "sysname": "Linux",
 |     "release": "5.9.10-100.fc32.x86_64",
 |     "version": "#1 SMP Mon Nov 23 18:12:36 UTC 2020",
 |     "machine": "x86_64"
 |   },
 |   "build": {
 |     "major": 2,
 |     "minor": 7,
 |     "patch": 0,
 |     "version": "2.7.0-78-gf898822f9",
 |     "cmake_type": "Linux-x86_64-Debug"
 |   },
 |   "signal": {
 |     "signo": 11,
 |     "si_code": 0,
 |     "si_addr": "0x3e800009727",
 |     "backtrace": "IzAgIDB4NjMyODhiIGlu..",
 |     "timestamp": "0x164ceceac0610ae9",
 |     "timestamp_str": "2020-12-02 17:34:20 MSK"
 |   }
 | }

Part-of #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/CMakeLists.txt |   1 +
 src/lib/core/errstat.c      | 390 ++++++++++++++++++++++++++++++++++++
 src/lib/core/errstat.h      | 253 +++++++++++++++++++++++
 3 files changed, 644 insertions(+)
 create mode 100644 src/lib/core/errstat.c
 create mode 100644 src/lib/core/errstat.h

diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index 13ed1e7ab..621a4f019 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -1,5 +1,6 @@
 set(core_sources
     diag.c
+    errstat.c
     say.c
     memory.c
     clock.c
diff --git a/src/lib/core/errstat.c b/src/lib/core/errstat.c
new file mode 100644
index 000000000..5413420ba
--- /dev/null
+++ b/src/lib/core/errstat.c
@@ -0,0 +1,390 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#include <string.h>
+#include <time.h>
+#include <sys/utsname.h>
+
+#include "trivia/util.h"
+#include "backtrace.h"
+#include "errstat.h"
+#include "cfg.h"
+#include "say.h"
+
+#define pr_fmt(fmt)		"errstat: " fmt
+#define pr_info(fmt, ...)	say_info(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)	say_error(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)	fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_panic(fmt, ...)	panic(pr_fmt(fmt), ##__VA_ARGS__)
+
+static struct errstat glob_errstat;
+static bool cfg_send_crash = false;
+
+/*
+ * We don't need it to be optimized but rather a compact form.
+ */
+static unsigned char *
+base64_encode(unsigned char *dst, unsigned char *src, size_t src_len)
+{
+	static int m[] = {0, 2, 1};
+	static unsigned char t[] = {
+		'A','B','C','D','E','F','G','H',
+		'I','J','K','L','M','N','O','P',
+		'Q','R','S','T','U','V','W','X',
+		'Y','Z','a','b','c','d','e','f',
+		'g','h','i','j','k','l','m','n',
+		'o','p','q','r','s','t','u','v',
+		'w','x','y','z','0','1','2','3',
+		'4','5','6','7','8','9','+','/'
+	};
+	size_t i, j;
+
+	for (i = 0, j = 0; i < src_len;) {
+		uint32_t a = i < src_len ? src[i++] : 0;
+		uint32_t b = i < src_len ? src[i++] : 0;
+		uint32_t c = i < src_len ? src[i++] : 0;
+
+		uint32_t d = (a << 0x10) + (b << 0x08) + c;
+
+		dst[j++] = t[(d >> 3 * 6) & 0x3f];
+		dst[j++] = t[(d >> 2 * 6) & 0x3f];
+		dst[j++] = t[(d >> 1 * 6) & 0x3f];
+		dst[j++] = t[(d >> 0 * 6) & 0x3f];
+	}
+
+	size_t dst_len = ERRSTAT_BASE64_LEN(src_len);
+	j = m[src_len % 3];
+	for (i = 0; i < j; i++)
+		dst[dst_len - 1 - i] = '=';
+
+	return dst;
+}
+
+static size_t
+strlcpy(char *dst, const char *src, size_t size)
+{
+	size_t ret = strlen(src);
+	if (size) {
+		size_t len = (ret >= size) ? size - 1 : ret;
+		memcpy(dst, src, len);
+		dst[len] = '\0';
+	}
+	return ret;
+}
+
+#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
+
+struct errstat *
+errstat_get(void)
+{
+	return &glob_errstat;
+}
+
+static inline
+uint64_t timespec_to_ns(struct timespec *ts)
+{
+	return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
+}
+
+static char *
+ns_to_localtime(uint64_t timestamp, char *buf, ssize_t len)
+{
+	time_t sec = timestamp / 1000000000;
+	char *start = buf;
+	struct tm tm;
+
+	/*
+	 * Use similar format as say_x logger. Except plain
+	 * seconds should be enough.
+	 */
+	localtime_r(&sec, &tm);
+	ssize_t total = strftime(start, len, "%F %T %Z", &tm);
+	start += total;
+	if (total < len)
+		return buf;
+	buf[len-1] = '\0';
+	return buf;
+}
+
+void
+errstat_init(const char *tarantool_bin)
+{
+	struct errstat_build *binfo = &errstat_get()->build_info;
+	struct errstat_uname *uinfo = &errstat_get()->uname_info;
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+	binfo->major = PACKAGE_VERSION_MAJOR;
+	binfo->minor = PACKAGE_VERSION_MINOR;
+	binfo->patch = PACKAGE_VERSION_PATCH;
+
+	strlcpy_a(binfo->version, PACKAGE_VERSION);
+	strlcpy_a(binfo->cmake_type, BUILD_INFO);
+
+	static_assert(ERRSTAT_UNAME_BUF_LEN > sizeof(struct errstat_uname),
+		      "uname_buf is too small");
+
+	char uname_buf[ERRSTAT_UNAME_BUF_LEN];
+	struct utsname *uname_ptr = (void *)uname_buf;
+	if (uname(uname_ptr) == 0) {
+		strlcpy_a(uinfo->sysname, uname_ptr->sysname);
+		strlcpy_a(uinfo->nodename, uname_ptr->nodename);
+		strlcpy_a(uinfo->release, uname_ptr->release);
+		strlcpy_a(uinfo->version, uname_ptr->version);
+		strlcpy_a(uinfo->machine, uname_ptr->machine);
+	} else
+		pr_err("can't fetch uname");
+
+	strlcpy_a(cinfo->tarantool_bin, tarantool_bin);
+	if (strlen(cinfo->tarantool_bin) < strlen(tarantool_bin))
+		pr_panic("can't save binary path");
+
+	static_assert(sizeof(cinfo->exec_argv_1) == 4,
+		      "exec_argv_1 is too small");
+	strlcpy_a(cinfo->exec_argv_1, "-e");
+}
+
+void
+box_errstat_cfg(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	const char *feedback_host = cfg_gets("feedback_host");
+	int feedback_enabled = cfg_getb("feedback_enabled");
+	int crash_enabled = cfg_getb("feedback_crash");
+
+	if (feedback_enabled == 1 && crash_enabled == 1 &&
+	    feedback_host != NULL) {
+		strlcpy_a(cinfo->feedback_host, feedback_host);
+		if (strlen(cinfo->feedback_host) < strlen(feedback_host))
+			pr_panic("feedback_host is too long");
+		pr_info("enable crash report");
+		cfg_send_crash = true;
+	} else {
+		cfg_send_crash = false;
+		pr_info("disable crash report");
+		cinfo->feedback_host[0] = '\0';
+	}
+}
+
+void
+errstat_reset(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+#ifdef ENABLE_BACKTRACE
+	cinfo->backtrace_buf[0] = '\0';
+#endif
+	memset(&cinfo->siginfo, 0, sizeof(cinfo->siginfo));
+	cinfo->timestamp_rt = 0;
+}
+
+#ifdef TARGET_OS_LINUX
+static void
+collect_gregs(struct errstat_crash *cinfo, ucontext_t *uc)
+{
+	static_assert(sizeof(cinfo->greg) == sizeof(uc->uc_mcontext),
+		      "GP regs are not matching signal frame");
+
+	/*
+	 * uc_mcontext on libc level looks somehow strange,
+	 * they define an array of uint64_t where each register
+	 * defined by REG_x macro.
+	 *
+	 * In turn the kernel is quite explicit about the context.
+	 * Moreover it is a part of user ABI, thus won't be changed.
+	 *
+	 * Let's use memcpy here to make a fast copy.
+	 */
+	memcpy(&cinfo->greg, &uc->uc_mcontext, sizeof(cinfo->greg));
+}
+#endif
+
+/**
+ * The routine is called inside crash signal handler so
+ * be careful not to cause additional signals inside.
+ */
+void
+errstat_collect_crash(int signo, siginfo_t *siginfo, void *ucontext)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+
+	struct timespec ts;
+	if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
+		cinfo->timestamp_rt = timespec_to_ns(&ts);
+		ns_to_localtime(cinfo->timestamp_rt,
+				cinfo->timestamp_rt_str,
+				sizeof(cinfo->timestamp_rt_str));
+	} else {
+		cinfo->timestamp_rt = 0;
+		memset(cinfo->timestamp_rt_str, 0,
+		       sizeof(cinfo->timestamp_rt_str));
+	}
+
+	cinfo->signo = signo;
+	cinfo->siginfo = *siginfo;
+
+	cinfo->context_addr = ucontext;
+	cinfo->siginfo_addr = siginfo;
+
+#ifdef ENABLE_BACKTRACE
+	char *start = cinfo->backtrace_buf;
+	char *end = start + sizeof(cinfo->backtrace_buf) - 1;
+	backtrace(start, end);
+#endif
+
+#ifdef TARGET_OS_LINUX
+	collect_gregs(cinfo, ucontext);
+#endif
+}
+
+/**
+ * Prepare report in json format and put it into a buffer.
+ */
+static void
+prepare_report_script(void)
+{
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	struct errstat_build *binfo = &errstat_get()->build_info;
+	struct errstat_uname *uinfo = &errstat_get()->uname_info;
+
+	char *p, *e;
+
+#ifdef ENABLE_BACKTRACE
+	/*
+	 * We can't use arbitrary data which can be misinterpreted
+	 * by Lua script when we pass it to a script.
+	 *
+	 * WARNING: We use report_encoded as a temp buffer.
+	 */
+	size_t bt_len = strlen(cinfo->backtrace_buf);
+	size_t bt_elen = ERRSTAT_BASE64_LEN(bt_len);
+	if (bt_elen >= sizeof(cinfo->report_encoded))
+		pr_panic("backtrace space is too small");
+
+	base64_encode((unsigned char *)cinfo->report_encoded,
+		      (unsigned char *)cinfo->backtrace_buf, bt_len);
+	cinfo->report_encoded[bt_elen] = '\0';
+	memcpy(cinfo->backtrace_buf, cinfo->report_encoded, bt_elen + 1);
+#endif
+
+#define snprintf_safe(fmt, ...)					\
+	do {							\
+		p += snprintf(p, e - p, fmt, ##__VA_ARGS__);	\
+		if (p >= e)					\
+			goto out;				\
+	} while (0)
+
+	e = cinfo->report + sizeof(cinfo->report) - 1;
+	p = cinfo->report;
+
+	snprintf_safe("{");
+	snprintf_safe("\"uname\":{");
+	snprintf_safe("\"sysname\":\"%s\",", uinfo->sysname);
+#if 0
+	/*
+	 * nodename might be sensitive information so don't
+	 * send it by default.
+	 */
+	snprintf_safe("\"nodename\":\"%s\",", uinfo->nodename);
+#endif
+	snprintf_safe("\"release\":\"%s\",", uinfo->release);
+	snprintf_safe("\"version\":\"%s\",", uinfo->version);
+	snprintf_safe("\"machine\":\"%s\"", uinfo->machine);
+	snprintf_safe("},");
+
+	snprintf_safe("\"build\":{");
+	snprintf_safe("\"major\":%d,", binfo->major);
+	snprintf_safe("\"minor\":%d,", binfo->minor);
+	snprintf_safe("\"patch\":%d,", binfo->patch);
+	snprintf_safe("\"version\":\"%s\",", binfo->version);
+	snprintf_safe("\"cmake_type\":\"%s\"", binfo->cmake_type);
+	snprintf_safe("},");
+
+	snprintf_safe("\"signal\":{");
+	snprintf_safe("\"signo\":%d,", cinfo->signo);
+	snprintf_safe("\"si_code\":%d,", cinfo->siginfo.si_code);
+	if (cinfo->signo == SIGSEGV) {
+		if (cinfo->siginfo.si_code == SEGV_MAPERR) {
+			snprintf_safe("\"si_code_str\":\"%s\",",
+				      "SEGV_MAPERR");
+		} else if (cinfo->siginfo.si_code == SEGV_ACCERR) {
+			snprintf_safe("\"si_code_str\":\"%s\",",
+				      "SEGV_ACCERR");
+		}
+		snprintf_safe("\"si_addr\":\"0x%llx\",",
+			      (long long)cinfo->siginfo.si_addr);
+	}
+#ifdef ENABLE_BACKTRACE
+	snprintf_safe("\"backtrace\":\"%s\",", cinfo->backtrace_buf);
+#endif
+	snprintf_safe("\"timestamp\":\"0x%llx\",",
+		      (long long)cinfo->timestamp_rt);
+	snprintf_safe("\"timestamp_str\":\"%s\"",
+		      cinfo->timestamp_rt_str);
+	snprintf_safe("}");
+	snprintf_safe("}");
+
+	size_t report_len = strlen(cinfo->report);
+	size_t report_elen = ERRSTAT_BASE64_LEN(report_len);
+	if (report_elen >= sizeof(cinfo->report_encoded))
+		pr_panic("report encoded space is too small");
+
+	base64_encode((unsigned char *)cinfo->report_encoded,
+		      (unsigned char *)cinfo->report,
+		      report_len);
+	cinfo->report_encoded[report_elen] = '\0';
+
+	e = cinfo->report_script + sizeof(cinfo->report_script) - 1;
+	p = cinfo->report_script;
+
+	/* feedback_host has been set up by box_errstat_cfg(). */
+	snprintf_safe("require(\'http.client\').post(\'%s\',"
+		      "'{\"crashdump\":{\"version\":\"%d\","
+		      "\"data\":\"%s\"}}',{timeout=1}); os.exit(1);",
+		      cinfo->feedback_host,
+		      ERRSTAT_REPORT_VERSION,
+		      cinfo->report_encoded);
+
+#undef snprintf_safe
+	return;
+
+out:
+	pr_crit("unable to prepare a crash report\n");
+	struct sigaction sa = {
+		.sa_handler = SIG_DFL,
+	};
+	sigemptyset(&sa.sa_mask);
+	sigaction(SIGABRT, &sa, NULL);
+
+	abort();
+}
+
+void
+errstat_exec_send_crash(void)
+{
+	if (!cfg_send_crash)
+		return;
+
+	prepare_report_script();
+
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
+	cinfo->exec_argv[0] = cinfo->tarantool_bin;
+	cinfo->exec_argv[1] = cinfo->exec_argv_1;
+	cinfo->exec_argv[2] = cinfo->report_script;
+	cinfo->exec_argv[3] = NULL;
+
+	/*
+	 * The script must exit at the end but there
+	 * is no simple way to make sure from inside
+	 * of a signal crash handler. So just hope it
+	 * is running fine.
+	 */
+	execve(cinfo->tarantool_bin, cinfo->exec_argv, NULL);
+	pr_panic("exec(%s,[%s,%s,%s,NULL]) failed",
+		 cinfo->tarantool_bin,
+		 cinfo->exec_argv[0],
+		 cinfo->exec_argv[1],
+		 cinfo->exec_argv[2]);
+}
diff --git a/src/lib/core/errstat.h b/src/lib/core/errstat.h
new file mode 100644
index 000000000..eb2a2e8c8
--- /dev/null
+++ b/src/lib/core/errstat.h
@@ -0,0 +1,253 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+#pragma once
+
+#include <stdint.h>
+#include <signal.h>
+#include <limits.h>
+
+#include "trivia/config.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+#define ERRSTAT_REPORT_VERSION 1
+
+/**
+ * Build type information for statistics.
+ */
+struct errstat_build {
+	/**
+	 * Package major version - 1 for 1.6.7.
+	 */
+	int major;
+	/**
+	 * Package minor version - 6 for 1.6.7
+	 */
+	int minor;
+	/**
+	 * Package patch version - 7 for 1.6.7
+	 */
+	int patch;
+	/**
+	 * A string with major-minor-patch-commit-id identifier of the
+	 * release, e.g. 2.7.0-62-g0b7726571.
+	 */
+	char version[64];
+	/**
+	 * Build type (Debug, etc.).
+	 */
+	char cmake_type[64];
+};
+
+#ifdef TARGET_OS_LINUX
+#ifndef __x86_64__
+# error "Non x86-64 architectures are not supported"
+#endif
+struct errstat_greg {
+	uint64_t	r8;
+	uint64_t	r9;
+	uint64_t	r10;
+	uint64_t	r11;
+	uint64_t	r12;
+	uint64_t	r13;
+	uint64_t	r14;
+	uint64_t	r15;
+	uint64_t	di;
+	uint64_t	si;
+	uint64_t	bp;
+	uint64_t	bx;
+	uint64_t	dx;
+	uint64_t	ax;
+	uint64_t	cx;
+	uint64_t	sp;
+	uint64_t	ip;
+	uint64_t	flags;
+	uint16_t	cs;
+	uint16_t	gs;
+	uint16_t	fs;
+	uint16_t	ss;
+	uint64_t	err;
+	uint64_t	trapno;
+	uint64_t	oldmask;
+	uint64_t	cr2;
+	uint64_t	fpstate;
+	uint64_t	reserved1[8];
+};
+#endif /* TARGET_OS_LINUX */
+
+#define ERRSTAT_BASE64_LEN(len)			(4 * (((len) + 2) / 3))
+
+/*
+ * 4K of memory should be enough to keep the backtrace.
+ * In the worst case it is simply going to be trimmed. Since we're
+ * reporting it encoded, the plain text shrinks to 3070 bytes.
+ */
+#define ERRSTAT_BACKTRACE_MAX			(4096)
+
+/*
+ * The report should include the backtrace
+ * and all additional information we're
+ * going to send.
+ */
+#define ERRSTAT_REPORT_PAYLOAD_MAX		(2048)
+#ifdef ENABLE_BACKTRACE
+# define ERRSTAT_REPORT_MAX			\
+	(ERRSTAT_BACKTRACE_MAX +		\
+	 ERRSTAT_REPORT_PAYLOAD_MAX)
+# else
+# define ERRSTAT_REPORT_MAX			\
+	(ERRSTAT_REPORT_PAYLOAD_MAX)
+#endif
+
+/*
+ * We encode report into base64 because
+ * it is passed inside Lua script.
+ */
+#define ERRSTAT_REPORT_ENCODED_MAX		\
+	ERRSTAT_BASE64_LEN(ERRSTAT_REPORT_MAX)
+
+
+/*
+ * The script to execute should contain encoded
+ * report.
+ */
+#define ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX	(512)
+#define ERRSTAT_REPORT_SCRIPT_MAX		\
+	(ERRSTAT_REPORT_SCRIPT_PAYLOAD_MAX +	\
+	 ERRSTAT_REPORT_ENCODED_MAX)
+
+struct errstat_crash {
+	/**
+	 * Exec arguments pointers.
+	 */
+	char *exec_argv[4];
+	/**
+	 * Predefined argument "-e".
+	 */
+	char exec_argv_1[4];
+	/**
+	 * Crash report in plain json format.
+	 */
+	char report[ERRSTAT_REPORT_MAX];
+	/**
+	 * Crash report in base64 form.
+	 */
+	char report_encoded[ERRSTAT_REPORT_ENCODED_MAX];
+	/**
+	 * Tarantool executable to send report script.
+	 */
+	char tarantool_bin[PATH_MAX];
+	/**
+	 * The script to evaluate by tarantool
+	 * to send the report.
+	 */
+	char report_script[ERRSTAT_REPORT_SCRIPT_MAX];
+#ifdef ENABLE_BACKTRACE
+	/**
+	 * Backtrace buffer.
+	 */
+	char backtrace_buf[ERRSTAT_BACKTRACE_MAX];
+#endif
+	/**
+	 * Crash signal.
+	 */
+	int signo;
+	/**
+	 * Signal information.
+	 */
+	siginfo_t siginfo;
+	/**
+	 * These two are mostly useless since they are
+	 * plain addresses, but keep them for backward
+	 * compatibility.
+	 */
+	void *context_addr;
+	void *siginfo_addr;
+#ifdef TARGET_OS_LINUX
+	/**
+	 * Registers contents.
+	 */
+	struct errstat_greg greg;
+#endif
+	/**
+	 * Timestamp in nanoseconds (realtime).
+	 */
+	uint64_t timestamp_rt;
+	/**
+	 * Timestamp string representation to
+	 * use on demand.
+	 */
+	char timestamp_rt_str[32];
+	/**
+	 * Crash collector host.
+	 */
+	char feedback_host[1024];
+};
+
+#define ERRSTAT_UNAME_BUF_LEN	1024
+#define ERRSTAT_UNAME_FIELD_LEN	128
+/**
+ * Information about node.
+ *
+ * On linux there is new_utsname structure which
+ * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
+ * So lets just reserve more data in advance.
+ */
+struct errstat_uname {
+	char sysname[ERRSTAT_UNAME_FIELD_LEN];
+	char nodename[ERRSTAT_UNAME_FIELD_LEN];
+	char release[ERRSTAT_UNAME_FIELD_LEN];
+	char version[ERRSTAT_UNAME_FIELD_LEN];
+	char machine[ERRSTAT_UNAME_FIELD_LEN];
+};
+
+struct errstat {
+	struct errstat_build build_info;
+	struct errstat_uname uname_info;
+	struct errstat_crash crash_info;
+};
+
+/**
+ * Return a pointer to the info keeper.
+ */
+extern struct errstat *
+errstat_get(void);
+
+/**
+ * Initialize error statistics.
+ */
+extern void
+errstat_init(const char *tarantool_bin);
+
+/**
+ * Configure errstat.
+ */
+extern void
+box_errstat_cfg(void);
+
+/**
+ * Reset everything except build information.
+ */
+extern void
+errstat_reset(void);
+
+/**
+ * Collect a crash.
+ */
+extern void
+errstat_collect_crash(int signo, siginfo_t *siginfo, void *context);
+
+/**
+ * Send a crash report.
+ */
+extern void
+errstat_exec_send_crash(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__cplusplus) */
-- 
2.26.2


* [Tarantool-patches] [PATCH 3/4] crash: use errstat code in fatal signals
  2020-12-02 15:18 [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 1/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 2/4] errstat: add crash report base code Cyrill Gorcunov
@ 2020-12-02 15:18 ` Cyrill Gorcunov
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 4/4] cfg: allow to configure crash report sending Cyrill Gorcunov
  2020-12-04 15:29 ` [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  4 siblings, 0 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-02 15:18 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

The errstat code fetches the signal information and
generates a backtrace for the report. We don't send this
data right now, but we can reuse the code so that registers
are not decoded and the backtrace is not generated twice.

Part-of #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/main.cc | 83 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/src/main.cc b/src/main.cc
index 2f48f474c..260b9a0ff 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -79,6 +79,7 @@
 #include "systemd.h"
 #include "crypto/crypto.h"
 #include "core/popen.h"
+#include "core/errstat.h"
 
 static pid_t master_pid = getpid();
 static struct pidfh *pid_file_handle;
@@ -184,45 +185,43 @@ signal_sigwinch_cb(ev_loop *loop, struct ev_signal *w, int revents)
 		rl_resize_terminal();
 }
 
-#if defined(__linux__) && defined(__amd64)
-
-inline void
-dump_x86_64_register(const char *reg_name, unsigned long long val)
+#ifdef TARGET_OS_LINUX
+static inline void
+dump_register(const char *reg_name, unsigned long long val)
 {
 	fprintf(stderr, "  %-9s0x%-17llx%lld\n", reg_name, val, val);
 }
 
-void
-dump_x86_64_registers(ucontext_t *uc)
+static void
+dump_registers(struct errstat_crash *cinfo)
 {
-	dump_x86_64_register("rax", uc->uc_mcontext.gregs[REG_RAX]);
-	dump_x86_64_register("rbx", uc->uc_mcontext.gregs[REG_RBX]);
-	dump_x86_64_register("rcx", uc->uc_mcontext.gregs[REG_RCX]);
-	dump_x86_64_register("rdx", uc->uc_mcontext.gregs[REG_RDX]);
-	dump_x86_64_register("rsi", uc->uc_mcontext.gregs[REG_RSI]);
-	dump_x86_64_register("rdi", uc->uc_mcontext.gregs[REG_RDI]);
-	dump_x86_64_register("rsp", uc->uc_mcontext.gregs[REG_RSP]);
-	dump_x86_64_register("rbp", uc->uc_mcontext.gregs[REG_RBP]);
-	dump_x86_64_register("r8", uc->uc_mcontext.gregs[REG_R8]);
-	dump_x86_64_register("r9", uc->uc_mcontext.gregs[REG_R9]);
-	dump_x86_64_register("r10", uc->uc_mcontext.gregs[REG_R10]);
-	dump_x86_64_register("r11", uc->uc_mcontext.gregs[REG_R11]);
-	dump_x86_64_register("r12", uc->uc_mcontext.gregs[REG_R12]);
-	dump_x86_64_register("r13", uc->uc_mcontext.gregs[REG_R13]);
-	dump_x86_64_register("r14", uc->uc_mcontext.gregs[REG_R14]);
-	dump_x86_64_register("r15", uc->uc_mcontext.gregs[REG_R15]);
-	dump_x86_64_register("rip", uc->uc_mcontext.gregs[REG_RIP]);
-	dump_x86_64_register("eflags", uc->uc_mcontext.gregs[REG_EFL]);
-	dump_x86_64_register("cs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 0) & 0xffff);
-	dump_x86_64_register("gs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 16) & 0xffff);
-	dump_x86_64_register("fs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 32) & 0xffff);
-	dump_x86_64_register("cr2", uc->uc_mcontext.gregs[REG_CR2]);
-	dump_x86_64_register("err", uc->uc_mcontext.gregs[REG_ERR]);
-	dump_x86_64_register("oldmask", uc->uc_mcontext.gregs[REG_OLDMASK]);
-	dump_x86_64_register("trapno", uc->uc_mcontext.gregs[REG_TRAPNO]);
+	dump_register("rax", cinfo->greg.ax);
+	dump_register("rbx", cinfo->greg.bx);
+	dump_register("rcx", cinfo->greg.cx);
+	dump_register("rdx", cinfo->greg.dx);
+	dump_register("rsi", cinfo->greg.si);
+	dump_register("rdi", cinfo->greg.di);
+	dump_register("rsp", cinfo->greg.sp);
+	dump_register("rbp", cinfo->greg.bp);
+	dump_register("r8", cinfo->greg.r8);
+	dump_register("r9", cinfo->greg.r9);
+	dump_register("r10", cinfo->greg.r10);
+	dump_register("r11", cinfo->greg.r11);
+	dump_register("r12", cinfo->greg.r12);
+	dump_register("r13", cinfo->greg.r13);
+	dump_register("r14", cinfo->greg.r14);
+	dump_register("r15", cinfo->greg.r15);
+	dump_register("rip", cinfo->greg.ip);
+	dump_register("eflags", cinfo->greg.flags);
+	dump_register("cs", cinfo->greg.cs);
+	dump_register("gs", cinfo->greg.gs);
+	dump_register("fs", cinfo->greg.fs);
+	dump_register("cr2", cinfo->greg.cr2);
+	dump_register("err", cinfo->greg.err);
+	dump_register("oldmask", cinfo->greg.oldmask);
+	dump_register("trapno", cinfo->greg.trapno);
 }
-
-#endif /* defined(__linux__) && defined(__amd64) */
+#endif /* TARGET_OS_LINUX */
 
 /** Try to log as much as possible before dumping a core.
  *
@@ -242,6 +241,7 @@ dump_x86_64_registers(ucontext_t *uc)
 static void
 sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 {
+	struct errstat_crash *cinfo = &errstat_get()->crash_info;
 	static volatile sig_atomic_t in_cb = 0;
 	int fd = STDERR_FILENO;
 	struct sigaction sa;
@@ -253,6 +253,10 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 	}
 
 	in_cb = 1;
+	/*
+	 * Notify errstat engine about the crash.
+	 */
+	errstat_collect_crash(signo, siginfo, context);
 
 	if (signo == SIGSEGV) {
 		fdprintf(fd, "Segmentation fault\n");
@@ -279,8 +283,8 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 	fprintf(stderr, "  context: %p\n", context);
 	fprintf(stderr, "  siginfo: %p\n", siginfo);
 
-#if defined(__linux__) && defined(__amd64)
-	dump_x86_64_registers((ucontext_t *)context);
+#ifdef TARGET_OS_LINUX
+	dump_registers(cinfo);
 #endif
 
 	fdprintf(fd, "Current time: %u\n", (unsigned) time(0));
@@ -290,8 +294,14 @@ sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
 #ifdef ENABLE_BACKTRACE
 	fdprintf(fd, "Attempting backtrace... Note: since the server has "
 		 "already crashed, \nthis may fail as well\n");
-	print_backtrace();
+	fdprintf(STDERR_FILENO, "%s", cinfo->backtrace_buf);
 #endif
+	/*
+	 * If sending crash report to the feedback server is
+	 * allowed we won't be generating local core dump but
+	 * rather try to send data and exit.
+	 */
+	errstat_exec_send_crash();
 end:
 	/* Try to dump core. */
 	memset(&sa, 0, sizeof(sa));
@@ -815,6 +825,7 @@ main(int argc, char **argv)
 		title_set_script_name(argv[0]);
 	}
 
+	errstat_init(tarantool_bin);
 	export_syms();
 
 	random_init();
-- 
2.26.2


* [Tarantool-patches] [PATCH 4/4] cfg: allow to configure crash report sending
  2020-12-02 15:18 [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 3/4] crash: use errstat code in fatal signals Cyrill Gorcunov
@ 2020-12-02 15:18 ` Cyrill Gorcunov
  2020-12-04 15:29 ` [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
  4 siblings, 0 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-02 15:18 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Introduce a new option in box.cfg{} to control
the sending of crash reports.

There is no simple way to test all this because
it involves subsequent runs of tarantool. I've
been using one terminal with tarantool running
inside as

> box.cfg{feedback_host="127.0.0.1:1500", feedback_crash=true}

another terminal to listen for incoming data

> while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done

and a third one to kill the main tarantool instance

> kill -11 `pidof tarantool`

Closes #5261

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: Send crash dump to feedback server

By default the feedback server gathers only a small portion of usage
statistics. Setting the option `feedback_crash=true` in the box
configuration allows sending detailed information if the program
crashes.

This information includes:

 - utsname information (similar to `uname -a` output except
   the network name)
 - build information
 - reason for a crash
 - call backtrace

By default the option `feedback_crash` is disabled.
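
The utsname item in the list above can be sketched as follows. This is
a minimal illustration under stated assumptions, not the code from
errstat.c: the function name `format_uname_without_nodename` and the
space-separated output layout are hypothetical. It only shows how
uname(2) data can be collected while leaving the network name out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper (not from the patch): format uname(2) data
 * into buf, skipping the nodename (network name) field.
 * Returns the number of characters written, or -1 on failure or
 * if the buffer is too small.
 */
int
format_uname_without_nodename(char *buf, size_t size)
{
	struct utsname uts;

	assert(buf != NULL && size > 0);
	if (uname(&uts) != 0)
		return -1;
	/* uts.nodename is intentionally not printed. */
	int rc = snprintf(buf, size, "%s %s %s %s",
			  uts.sysname, uts.release,
			  uts.version, uts.machine);
	if (rc < 0 || (size_t)rc >= size)
		return -1;
	return rc;
}
```

A caller would pass a fixed-size buffer, for example
`char buf[1024]; format_uname_without_nodename(buf, sizeof(buf));`,
and treat a negative return value as "no utsname data available".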
---
 src/box/box.cc           | 2 ++
 src/box/lua/load_cfg.lua | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/src/box/box.cc b/src/box/box.cc
index 4070cbeab..f6e515af8 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -79,6 +79,7 @@
 #include "sql_stmt_cache.h"
 #include "msgpack.h"
 #include "raft.h"
+#include "errstat.h"
 #include "trivia/util.h"
 
 static char status[64] = "unknown";
@@ -2665,6 +2666,7 @@ box_cfg_xc(void)
 	port_init();
 	iproto_init();
 	sql_init();
+	box_errstat_cfg();
 
 	int64_t wal_max_size = box_check_wal_max_size(cfg_geti64("wal_max_size"));
 	enum wal_mode wal_mode = box_check_wal_mode(cfg_gets("wal_mode"));
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 76e2e92c2..d0591d12a 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -99,6 +99,7 @@ local default_cfg = {
     replication_skip_conflict = false,
     replication_anon      = false,
     feedback_enabled      = true,
+    feedback_crash        = false,
     feedback_host         = "https://feedback.tarantool.io",
     feedback_interval     = 3600,
     net_msg_max           = 768,
@@ -179,6 +180,7 @@ local template_cfg = {
     replication_skip_conflict = 'boolean',
     replication_anon      = 'boolean',
     feedback_enabled      = ifdef_feedback('boolean'),
+    feedback_crash        = ifdef_feedback('boolean'),
     feedback_host         = ifdef_feedback('string'),
     feedback_interval     = ifdef_feedback('number'),
     net_msg_max           = 'number',
@@ -277,6 +279,7 @@ local dynamic_cfg = {
     checkpoint_wal_threshold = private.cfg_set_checkpoint_wal_threshold,
     worker_pool_threads     = private.cfg_set_worker_pool_threads,
     feedback_enabled        = ifdef_feedback_set_params,
+    feedback_crash          = ifdef_feedback_set_params,
     feedback_host           = ifdef_feedback_set_params,
     feedback_interval       = ifdef_feedback_set_params,
     -- do nothing, affects new replicas, which query this value on start
-- 
2.26.2

* Re: [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback
  2020-12-02 15:18 [Tarantool-patches] [PATCH 0/4] crash dump: implement sending feedback Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2020-12-02 15:18 ` [Tarantool-patches] [PATCH 4/4] cfg: allow to configure crash report sending Cyrill Gorcunov
@ 2020-12-04 15:29 ` Cyrill Gorcunov
  4 siblings, 0 replies; 6+ messages in thread
From: Cyrill Gorcunov @ 2020-12-04 15:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

On Wed, Dec 02, 2020 at 06:18:38PM +0300, Cyrill Gorcunov wrote:
> Our feedback daemon sends only a few portions of usage
> statistics. But crash dumps are pretty important for us
> as well, because real users may catch a way more important
> issues than our testing farm, it is simply impossible to
> cover all possible scenarios.

Drop this series, I'll send v3.
