* [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback
@ 2020-12-10 16:18 Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
` (3 more replies)
0 siblings, 4 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-10 16:18 UTC (permalink / raw)
To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy
Our feedback daemon sends only a few portions of usage
statistics. But crash dumps are pretty important for us
as well, because real users may catch a way more important
issues than our testing farm, it is simply impossible to
cover all possible scenarios.
For this sake, if crash happens we can send it to our
feedback server.
In this series we implement only base functionality and
may extend it later.
I didn't find yet a simple way to test this code anything
but manually.
Any comments are highly appreciated.
v2:
- left for internal use
v3 (by Mons):
- enable sending crash report by default but give
an ability to disable it
- use vfork when running subsequent exeutable of
tarantool to send report, since plain execve won't
produce local crashdump
v4 (by Vlad):
- move all fatal's signal handling to crash.c
- export only those helpers which are really needed
- update backtrace to use size as an arument
- provide strlcpy if not present on the system
- use static allocated buffer for report
- use base64 library instead of own implmentation
issue https://github.com/tarantool/tarantool/issues/5261
branch gorcunov/gh-5261-crash-report-4
Cyrill Gorcunov (4):
util: introduce strlcpy helper
backtrace: allow to specify destination buffer
crash: move fatal signal handling in
crash: report crash data to the feedback server
CMakeLists.txt | 1 +
src/box/lua/feedback_daemon.lua | 7 +
src/box/lua/load_cfg.lua | 3 +
src/exports.h | 1 +
src/lib/core/CMakeLists.txt | 3 +-
src/lib/core/backtrace.cc | 12 +-
src/lib/core/backtrace.h | 3 +
src/lib/core/crash.c | 554 ++++++++++++++++++++++++++++++++
src/lib/core/crash.h | 44 +++
src/lib/core/util.c | 14 +
src/main.cc | 139 +-------
src/trivia/config.h.cmake | 5 +
src/trivia/util.h | 15 +
test/box/admin.result | 2 +
test/box/cfg.result | 4 +
15 files changed, 667 insertions(+), 140 deletions(-)
create mode 100644 src/lib/core/crash.c
create mode 100644 src/lib/core/crash.h
base-commit: a14dad9543360c49c2cf8906a88e09271cad91b6
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-10 16:18 [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback Cyrill Gorcunov
@ 2020-12-10 16:18 ` Cyrill Gorcunov
2020-12-11 7:34 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
` (2 subsequent siblings)
3 siblings, 2 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-10 16:18 UTC (permalink / raw)
To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy
Very convenient to have this string extension.
We will use it in crash handling.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
CMakeLists.txt | 1 +
src/lib/core/util.c | 14 ++++++++++++++
src/trivia/config.h.cmake | 5 +++++
src/trivia/util.h | 15 +++++++++++++++
4 files changed, 35 insertions(+)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index fa6818f8e..aa23113b3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -126,6 +126,7 @@ set(CMAKE_REQUIRED_LIBRARIES "")
check_function_exists(clock_gettime HAVE_CLOCK_GETTIME_WITHOUT_RT)
check_symbol_exists(__get_cpuid cpuid.h HAVE_CPUID)
+check_symbol_exists(strlcpy string.h HAVE_STRLCPY)
# Checks for libev
include(CheckStructHasMember)
diff --git a/src/lib/core/util.c b/src/lib/core/util.c
index dfce317f0..7b168a177 100644
--- a/src/lib/core/util.c
+++ b/src/lib/core/util.c
@@ -250,6 +250,20 @@ int2str(long long int val)
return buf;
}
+#ifndef HAVE_STRLCPY
+size_t
+strlcpy(char *dst, const char *src, size_t size)
+{
+ size_t src_len = strlen(src);
+ if (size) {
+ size_t len = (src_len >= size) ? size - 1 : src_len;
+ memcpy(dst, src, len);
+ dst[len] = '\0';
+ }
+ return src_len;
+}
+#endif
+
int
utf8_check_printable(const char *start, size_t length)
{
diff --git a/src/trivia/config.h.cmake b/src/trivia/config.h.cmake
index 89e0d39c6..21efb978e 100644
--- a/src/trivia/config.h.cmake
+++ b/src/trivia/config.h.cmake
@@ -47,6 +47,11 @@
*/
#cmakedefine HAVE_CPUID 1
+/**
+ * Defined if strlcpy() string extension helper present.
+ */
+ #cmakedefine HAVE_STRLCPY 1
+
/*
* Defined if gcov instrumentation should be enabled.
*/
diff --git a/src/trivia/util.h b/src/trivia/util.h
index da5a3705e..8dcc48ada 100644
--- a/src/trivia/util.h
+++ b/src/trivia/util.h
@@ -450,6 +450,21 @@ fpconv_strtod(const char *nptr, char **endptr)
return strtod(nptr, endptr);
}
+#ifndef HAVE_STRLCPY
+/**
+ * Copy string. Unlike \a strncpy the result string
+ * is always null-terminated.
+ *
+ * @param dst destination buffer.
+ * @param src source string.
+ * @param size destination buffer size.
+ *
+ * @return size of \a src string.
+ */
+size_t
+strlcpy(char *dst, const char *src, size_t size);
+#endif
+
/**
* Check that @a str is valid utf-8 sequence and can be printed
* unescaped.
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer
2020-12-10 16:18 [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
@ 2020-12-10 16:18 ` Cyrill Gorcunov
2020-12-11 7:50 ` Serge Petrenko
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server Cyrill Gorcunov
3 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-10 16:18 UTC (permalink / raw)
To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy
This will allow to reuse this routine in crash reports.
Part-of #5261
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/lib/core/backtrace.cc | 12 ++++++------
src/lib/core/backtrace.h | 3 +++
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
index 456ce9a4d..b4048089f 100644
--- a/src/lib/core/backtrace.cc
+++ b/src/lib/core/backtrace.cc
@@ -131,7 +131,7 @@ get_proc_name(unw_cursor_t *unw_cur, unw_word_t *offset, bool skip_cache)
}
char *
-backtrace(void)
+backtrace(char *start, size_t size)
{
int frame_no = 0;
unw_word_t sp = 0, old_sp = 0, ip, offset;
@@ -139,10 +139,9 @@ backtrace(void)
unw_getcontext(&unw_context);
unw_cursor_t unw_cur;
unw_init_local(&unw_cur, &unw_context);
- char *backtrace_buf = (char *)static_alloc(SMALL_STATIC_SIZE);
- char *p = backtrace_buf;
- char *end = p + SMALL_STATIC_SIZE - 1;
int unw_status;
+ char *p = start;
+ char *end = start + size - 1;
*p = '\0';
while ((unw_status = unw_step(&unw_cur)) > 0) {
const char *proc;
@@ -174,7 +173,7 @@ backtrace(void)
say_debug("unwinding error: %i", unw_status);
#endif
out:
- return backtrace_buf;
+ return start;
}
/*
@@ -436,7 +435,8 @@ backtrace_foreach(backtrace_cb cb, coro_context *coro_ctx, void *cb_ctx)
void
print_backtrace(void)
{
- fdprintf(STDERR_FILENO, "%s", backtrace());
+ char *start = (char *)static_alloc(SMALL_STATIC_SIZE);
+ fdprintf(STDERR_FILENO, "%s", backtrace(start, SMALL_STATIC_SIZE));
}
#endif /* ENABLE_BACKTRACE */
diff --git a/src/lib/core/backtrace.h b/src/lib/core/backtrace.h
index c119d5402..e0ae56be4 100644
--- a/src/lib/core/backtrace.h
+++ b/src/lib/core/backtrace.h
@@ -40,6 +40,9 @@ extern "C" {
#ifdef ENABLE_BACKTRACE
#include <coro.h>
+char *
+backtrace(char *start, size_t size);
+
void print_backtrace(void);
typedef int (backtrace_cb)(int frameno, void *frameret,
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-10 16:18 [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
@ 2020-12-10 16:18 ` Cyrill Gorcunov
2020-12-11 9:31 ` Serge Petrenko
` (2 more replies)
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server Cyrill Gorcunov
3 siblings, 3 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-10 16:18 UTC (permalink / raw)
To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy
When SIGSEGV or SIGFPE reaches the tarantool we try to gather
all information related to the crash and print it out to the
console (well, stderr actually). Still there is a request
to not just show this info locally but send it out to the
feedback server.
Thus to keep gathering crash related information in one module,
we move fatal signal handling into the separate crash.c file.
This allows us to collect the data we need in one place and
reuse it when we need to send reports to stderr (and to the
feedback server, which will be implemented in next patch).
Part-of #5261
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/lib/core/CMakeLists.txt | 1 +
src/lib/core/crash.c | 291 ++++++++++++++++++++++++++++++++++++
src/lib/core/crash.h | 32 ++++
src/main.cc | 138 +----------------
4 files changed, 329 insertions(+), 133 deletions(-)
create mode 100644 src/lib/core/crash.c
create mode 100644 src/lib/core/crash.h
diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index 13ed1e7ab..06b2b91e1 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -1,5 +1,6 @@
set(core_sources
diag.c
+ crash.c
say.c
memory.c
clock.c
diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
new file mode 100644
index 000000000..9572a023c
--- /dev/null
+++ b/src/lib/core/crash.c
@@ -0,0 +1,291 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#include <string.h>
+#include <unistd.h>
+#include <time.h>
+
+#include "trivia/util.h"
+#include "backtrace.h"
+#include "crash.h"
+#include "say.h"
+
+#define pr_fmt(fmt) "crash: " fmt
+#define pr_syserr(fmt, ...) say_syserror(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_panic(fmt, ...) panic(pr_fmt(fmt), ##__VA_ARGS__)
+
+#ifdef TARGET_OS_LINUX
+#ifndef __x86_64__
+# error "Non x86-64 architectures are not supported"
+#endif
+struct crash_greg {
+ uint64_t r8;
+ uint64_t r9;
+ uint64_t r10;
+ uint64_t r11;
+ uint64_t r12;
+ uint64_t r13;
+ uint64_t r14;
+ uint64_t r15;
+ uint64_t di;
+ uint64_t si;
+ uint64_t bp;
+ uint64_t bx;
+ uint64_t dx;
+ uint64_t ax;
+ uint64_t cx;
+ uint64_t sp;
+ uint64_t ip;
+ uint64_t flags;
+ uint16_t cs;
+ uint16_t gs;
+ uint16_t fs;
+ uint16_t ss;
+ uint64_t err;
+ uint64_t trapno;
+ uint64_t oldmask;
+ uint64_t cr2;
+ uint64_t fpstate;
+ uint64_t reserved1[8];
+};
+#endif /* TARGET_OS_LINUX */
+
+static struct crash_info {
+ /**
+ * These two are mostly useless as being
+ * plain addresses but keep for backward
+ * compatibility.
+ */
+ void *context_addr;
+ void *siginfo_addr;
+#ifdef TARGET_OS_LINUX
+ /**
+ * Registers contents.
+ */
+ struct crash_greg greg;
+#endif
+ /**
+ * Faulting address.
+ */
+ void *siaddr;
+ /**
+ * Crash signal number.
+ */
+ int signo;
+ /**
+ * Crash signal code.
+ */
+ int sicode;
+#ifdef ENABLE_BACKTRACE
+ /*
+ * 4K of memory should be enough to keep the backtrace.
+ * In worst case it gonna be simply trimmed.
+ */
+ char backtrace_buf[4096];
+#endif
+} crash_info;
+
+/**
+ * The routine is called inside crash signal handler so
+ * be careful to not cause additional signals inside.
+ */
+static struct crash_info *
+crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
+{
+ struct crash_info *cinfo = &crash_info;
+
+ cinfo->signo = signo;
+ cinfo->sicode = siginfo->si_code;
+ cinfo->siaddr = siginfo->si_addr;
+ cinfo->context_addr = ucontext;
+ cinfo->siginfo_addr = siginfo;
+
+#ifdef ENABLE_BACKTRACE
+ char *start = cinfo->backtrace_buf;
+ backtrace(start, sizeof(cinfo->backtrace_buf));
+#endif
+
+#ifdef TARGET_OS_LINUX
+ /*
+ * uc_mcontext on libc level looks somehow strange,
+ * they define an array of uint64_t where each register
+ * defined by REG_x macro.
+ *
+ * In turn the kernel is quite explicit about the context.
+ * Moreover it is a part of user ABI, thus won't be changed.
+ *
+ * Lets use memcpy here to make a copy in a fast way.
+ */
+ ucontext_t *uc = ucontext;
+ memcpy(&cinfo->greg, &uc->uc_mcontext, sizeof(cinfo->greg));
+#endif
+
+ return cinfo;
+}
+
+/**
+ * Report crash information to the stderr
+ * (usually a current console).
+ */
+static void
+crash_report_stderr(struct crash_info *cinfo)
+{
+ if (cinfo->signo == SIGSEGV) {
+ fprintf(stderr, "Segmentation fault\n");
+ const char *signal_code_repr = 0;
+
+ switch (cinfo->sicode) {
+ case SEGV_MAPERR:
+ signal_code_repr = "SEGV_MAPERR";
+ break;
+ case SEGV_ACCERR:
+ signal_code_repr = "SEGV_ACCERR";
+ break;
+ }
+
+ if (signal_code_repr)
+ fprintf(stderr, " code: %s\n", signal_code_repr);
+ else
+ fprintf(stderr, " code: %d\n", cinfo->sicode);
+ /*
+ * fprintf is used instead of fdprintf, because
+ * fdprintf does not understand %p
+ */
+ fprintf(stderr, " addr: %p\n", cinfo->siaddr);
+ } else {
+ fprintf(stderr, "Got a fatal signal %d\n", cinfo->signo);
+ }
+
+ fprintf(stderr, " context: %p\n", cinfo->context_addr);
+ fprintf(stderr, " siginfo: %p\n", cinfo->siginfo_addr);
+
+#ifdef TARGET_OS_LINUX
+# define fprintf_reg(__n, __v) \
+ fprintf(stderr, " %-9s0x%-17llx%lld\n", \
+ __n, (long long)__v, (long long)__v)
+ fprintf_reg("rax", cinfo->greg.ax);
+ fprintf_reg("rbx", cinfo->greg.bx);
+ fprintf_reg("rcx", cinfo->greg.cx);
+ fprintf_reg("rdx", cinfo->greg.dx);
+ fprintf_reg("rsi", cinfo->greg.si);
+ fprintf_reg("rdi", cinfo->greg.di);
+ fprintf_reg("rsp", cinfo->greg.sp);
+ fprintf_reg("rbp", cinfo->greg.bp);
+ fprintf_reg("r8", cinfo->greg.r8);
+ fprintf_reg("r9", cinfo->greg.r9);
+ fprintf_reg("r10", cinfo->greg.r10);
+ fprintf_reg("r11", cinfo->greg.r11);
+ fprintf_reg("r12", cinfo->greg.r12);
+ fprintf_reg("r13", cinfo->greg.r13);
+ fprintf_reg("r14", cinfo->greg.r14);
+ fprintf_reg("r15", cinfo->greg.r15);
+ fprintf_reg("rip", cinfo->greg.ip);
+ fprintf_reg("eflags", cinfo->greg.flags);
+ fprintf_reg("cs", cinfo->greg.cs);
+ fprintf_reg("gs", cinfo->greg.gs);
+ fprintf_reg("fs", cinfo->greg.fs);
+ fprintf_reg("cr2", cinfo->greg.cr2);
+ fprintf_reg("err", cinfo->greg.err);
+ fprintf_reg("oldmask", cinfo->greg.oldmask);
+ fprintf_reg("trapno", cinfo->greg.trapno);
+# undef fprintf_reg
+#endif /* TARGET_OS_LINUX */
+
+ fprintf(stderr, "Current time: %u\n", (unsigned)time(0));
+ fprintf(stderr, "Please file a bug at "
+ "http://github.com/tarantool/tarantool/issues\n");
+
+#ifdef ENABLE_BACKTRACE
+ fprintf(stderr, "Attempting backtrace... Note: since the server has "
+ "already crashed, \nthis may fail as well\n");
+ fprintf(stderr, "%s", cinfo->backtrace_buf);
+#endif
+}
+
+/**
+ * Handle fatal (crashing) signal.
+ *
+ * Try to log as much as possible before dumping a core.
+ *
+ * Core files are not always allowed and it takes an effort to
+ * extract useful information from them.
+ *
+ * *Recursive invocation*
+ *
+ * Unless SIGSEGV is sent by kill(), Linux resets the signal
+ * a default value before invoking the handler.
+ *
+ * Despite that, as an extra precaution to avoid infinite
+ * recursion, we count invocations of the handler, and
+ * quietly _exit() when called for a second time.
+ */
+static void
+crash_signal_cb(int signo, siginfo_t *siginfo, void *context)
+{
+ static volatile sig_atomic_t in_cb = 0;
+ struct crash_info *cinfo;
+
+ if (in_cb == 0) {
+ in_cb = 1;
+ cinfo = crash_collect(signo, siginfo, context);
+ crash_report_stderr(cinfo);
+ } else {
+ /* Got a signal while running the handler. */
+ fprintf(stderr, "Fatal %d while backtracing", signo);
+ }
+
+ /* Try to dump a core */
+ struct sigaction sa = {
+ .sa_handler = SIG_DFL,
+ };
+ sigemptyset(&sa.sa_mask);
+ sigaction(SIGABRT, &sa, NULL);
+ abort();
+}
+
+/**
+ * Fatal signals we generate crash on.
+ */
+static const int crash_signals[] = { SIGSEGV, SIGFPE };
+
+void
+crash_signal_reset(void)
+{
+ struct sigaction sa = {
+ .sa_handler = SIG_DFL,
+ };
+ sigemptyset(&sa.sa_mask);
+
+ for (size_t i = 0; i < lengthof(crash_signals); i++) {
+ if (sigaction(crash_signals[i], &sa, NULL) == 0)
+ continue;
+ pr_syserr("reset sigaction %d", crash_signals[i]);
+ }
+}
+
+void
+crash_signal_init(void)
+{
+ /*
+ * SA_RESETHAND resets handler action to the default
+ * one when entering handler.
+ *
+ * SA_NODEFER allows receiving the same signal
+ * during handler.
+ */
+ struct sigaction sa = {
+ .sa_flags = SA_RESETHAND | SA_NODEFER | SA_SIGINFO,
+ .sa_sigaction = crash_signal_cb,
+ };
+ sigemptyset(&sa.sa_mask);
+
+ for (size_t i = 0; i < lengthof(crash_signals); i++) {
+ if (sigaction(crash_signals[i], &sa, NULL) == 0)
+ continue;
+ pr_panic("sigaction %d (%s)", crash_signals[i],
+ strerror(errno));
+ }
+}
diff --git a/src/lib/core/crash.h b/src/lib/core/crash.h
new file mode 100644
index 000000000..d107cd953
--- /dev/null
+++ b/src/lib/core/crash.h
@@ -0,0 +1,32 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+#pragma once
+
+#include <stdint.h>
+#include <signal.h>
+#include <limits.h>
+
+#include "trivia/config.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+/**
+ * Initialize crash signal handlers.
+ */
+void
+crash_signal_init(void);
+
+/**
+ * Reset crash signal handlers.
+ */
+void
+crash_signal_reset(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__cplusplus) */
diff --git a/src/main.cc b/src/main.cc
index 2f48f474c..391e0f878 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -79,6 +79,7 @@
#include "systemd.h"
#include "crypto/crypto.h"
#include "core/popen.h"
+#include "core/crash.h"
static pid_t master_pid = getpid();
static struct pidfh *pid_file_handle;
@@ -184,124 +185,6 @@ signal_sigwinch_cb(ev_loop *loop, struct ev_signal *w, int revents)
rl_resize_terminal();
}
-#if defined(__linux__) && defined(__amd64)
-
-inline void
-dump_x86_64_register(const char *reg_name, unsigned long long val)
-{
- fprintf(stderr, " %-9s0x%-17llx%lld\n", reg_name, val, val);
-}
-
-void
-dump_x86_64_registers(ucontext_t *uc)
-{
- dump_x86_64_register("rax", uc->uc_mcontext.gregs[REG_RAX]);
- dump_x86_64_register("rbx", uc->uc_mcontext.gregs[REG_RBX]);
- dump_x86_64_register("rcx", uc->uc_mcontext.gregs[REG_RCX]);
- dump_x86_64_register("rdx", uc->uc_mcontext.gregs[REG_RDX]);
- dump_x86_64_register("rsi", uc->uc_mcontext.gregs[REG_RSI]);
- dump_x86_64_register("rdi", uc->uc_mcontext.gregs[REG_RDI]);
- dump_x86_64_register("rsp", uc->uc_mcontext.gregs[REG_RSP]);
- dump_x86_64_register("rbp", uc->uc_mcontext.gregs[REG_RBP]);
- dump_x86_64_register("r8", uc->uc_mcontext.gregs[REG_R8]);
- dump_x86_64_register("r9", uc->uc_mcontext.gregs[REG_R9]);
- dump_x86_64_register("r10", uc->uc_mcontext.gregs[REG_R10]);
- dump_x86_64_register("r11", uc->uc_mcontext.gregs[REG_R11]);
- dump_x86_64_register("r12", uc->uc_mcontext.gregs[REG_R12]);
- dump_x86_64_register("r13", uc->uc_mcontext.gregs[REG_R13]);
- dump_x86_64_register("r14", uc->uc_mcontext.gregs[REG_R14]);
- dump_x86_64_register("r15", uc->uc_mcontext.gregs[REG_R15]);
- dump_x86_64_register("rip", uc->uc_mcontext.gregs[REG_RIP]);
- dump_x86_64_register("eflags", uc->uc_mcontext.gregs[REG_EFL]);
- dump_x86_64_register("cs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 0) & 0xffff);
- dump_x86_64_register("gs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 16) & 0xffff);
- dump_x86_64_register("fs", (uc->uc_mcontext.gregs[REG_CSGSFS] >> 32) & 0xffff);
- dump_x86_64_register("cr2", uc->uc_mcontext.gregs[REG_CR2]);
- dump_x86_64_register("err", uc->uc_mcontext.gregs[REG_ERR]);
- dump_x86_64_register("oldmask", uc->uc_mcontext.gregs[REG_OLDMASK]);
- dump_x86_64_register("trapno", uc->uc_mcontext.gregs[REG_TRAPNO]);
-}
-
-#endif /* defined(__linux__) && defined(__amd64) */
-
-/** Try to log as much as possible before dumping a core.
- *
- * Core files are not aways allowed and it takes an effort to
- * extract useful information from them.
- *
- * *Recursive invocation*
- *
- * Unless SIGSEGV is sent by kill(), Linux
- * resets the signal a default value before invoking
- * the handler.
- *
- * Despite that, as an extra precaution to avoid infinite
- * recursion, we count invocations of the handler, and
- * quietly _exit() when called for a second time.
- */
-static void
-sig_fatal_cb(int signo, siginfo_t *siginfo, void *context)
-{
- static volatile sig_atomic_t in_cb = 0;
- int fd = STDERR_FILENO;
- struct sigaction sa;
-
- /* Got a signal while running the handler. */
- if (in_cb) {
- fdprintf(fd, "Fatal %d while backtracing", signo);
- goto end;
- }
-
- in_cb = 1;
-
- if (signo == SIGSEGV) {
- fdprintf(fd, "Segmentation fault\n");
- const char *signal_code_repr = 0;
- switch (siginfo->si_code) {
- case SEGV_MAPERR:
- signal_code_repr = "SEGV_MAPERR";
- break;
- case SEGV_ACCERR:
- signal_code_repr = "SEGV_ACCERR";
- break;
- }
- if (signal_code_repr)
- fdprintf(fd, " code: %s\n", signal_code_repr);
- else
- fdprintf(fd, " code: %d\n", siginfo->si_code);
- /*
- * fprintf is used insted of fdprintf, because
- * fdprintf does not understand %p
- */
- fprintf(stderr, " addr: %p\n", siginfo->si_addr);
- } else
- fdprintf(fd, "Got a fatal signal %d\n", signo);
- fprintf(stderr, " context: %p\n", context);
- fprintf(stderr, " siginfo: %p\n", siginfo);
-
-#if defined(__linux__) && defined(__amd64)
- dump_x86_64_registers((ucontext_t *)context);
-#endif
-
- fdprintf(fd, "Current time: %u\n", (unsigned) time(0));
- fdprintf(fd,
- "Please file a bug at http://github.com/tarantool/tarantool/issues\n");
-
-#ifdef ENABLE_BACKTRACE
- fdprintf(fd, "Attempting backtrace... Note: since the server has "
- "already crashed, \nthis may fail as well\n");
- print_backtrace();
-#endif
-end:
- /* Try to dump core. */
- memset(&sa, 0, sizeof(sa));
- sigemptyset(&sa.sa_mask);
- sa.sa_handler = SIG_DFL;
- sigaction(SIGABRT, &sa, NULL);
-
- abort();
-}
-
static void
signal_free(void)
{
@@ -328,11 +211,11 @@ signal_reset(void)
sigaction(SIGINT, &sa, NULL) == -1 ||
sigaction(SIGTERM, &sa, NULL) == -1 ||
sigaction(SIGHUP, &sa, NULL) == -1 ||
- sigaction(SIGWINCH, &sa, NULL) == -1 ||
- sigaction(SIGSEGV, &sa, NULL) == -1 ||
- sigaction(SIGFPE, &sa, NULL) == -1)
+ sigaction(SIGWINCH, &sa, NULL) == -1)
say_syserror("sigaction");
+ crash_signal_reset();
+
/* Unblock any signals blocked by libev. */
sigset_t sigset;
sigfillset(&sigset);
@@ -362,18 +245,7 @@ signal_init(void)
if (sigaction(SIGPIPE, &sa, 0) == -1)
panic_syserror("sigaction");
- /*
- * SA_RESETHAND resets handler action to the default
- * one when entering handler.
- * SA_NODEFER allows receiving the same signal during handler.
- */
- sa.sa_flags = SA_RESETHAND | SA_NODEFER | SA_SIGINFO;
- sa.sa_sigaction = sig_fatal_cb;
-
- if (sigaction(SIGSEGV, &sa, 0) == -1 ||
- sigaction(SIGFPE, &sa, 0) == -1) {
- panic_syserror("sigaction");
- }
+ crash_signal_init();
ev_signal_init(&ev_sigs[0], sig_checkpoint, SIGUSR1);
ev_signal_init(&ev_sigs[1], signal_cb, SIGINT);
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-10 16:18 [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback Cyrill Gorcunov
` (2 preceding siblings ...)
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
@ 2020-12-10 16:18 ` Cyrill Gorcunov
2020-12-11 12:57 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
3 siblings, 2 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-10 16:18 UTC (permalink / raw)
To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy
We have a feedback server which gathers information about a running instnace.
While general info is enough for now we may loose a precious information about
crashes (such as call backtrace which caused the issue, type of build and etc).
In the commit we add support of sending this kind of infomation to the feedback
server. Internally we gather the reason of failure, pack it into base64 form
and then run another tarantool instance which sends it out.
A typical report might look like
| {
| "crashdump": {
| "version": "1",
| "data": "eyJ1bmFtZSI6eyJzeXNuYW1lIjoiTGludXgiLCJyZWxlYXNlIjoiNS"
| "45LjExLTEwMC5mYzMyLng4Nl82NCIsInZlcnNpb24iOiIjMSBTTVAg"
| "VHVlIE5vdiAyNCAxOToxNjo1MyBVVEMgMjAyMCIsIm1hY2hpbmUiOi"
| "J4ODZfNjQifSwiYnVpbGQiOnsidmVyc2lvbiI6IjIuNy4wLTgzLWc5"
| "ZTg1MDM4ZmIiLCJjbWFrZV90eXBlIjoiTGludXgteDg2XzY0LURlYn"
| "VnIn0sInNpZ25hbCI6eyJzaWdubyI6MTEsInNpX2NvZGUiOjAsInNp"
| "X2FkZHIiOiIweDNlODAwMDVjOGE5IiwiYmFja3RyYWNlIjoiSXpBZ0"
| "lEQjROak14WkRBeklHbHVJR055WVhOb1gyTnZiR3hsWTNRclltWUtJ"
| "ekVnSURCNE5qTXlaV1JoSUdsdUlHTnlZWE5vWDNOcFoyNWhiRjlqWW"
| "lzMU1Rb2pNaUFnTUhnM1pqRmtORGt5TXpsaE9UQWdhVzRnWm5WdWJH"
| "OWphMlpwYkdVck5qQUtJek1nSURCNE4yWXhaRFE0WkdReFl6VmxJR2"
| "x1SUdWd2IyeHNYM2RoYVhRck5XVUtJelFnSURCNE9ETTNOekV6SUds"
| "dUlHVndiMnhzWDNCdmJHd3JPVGtLSXpVZ0lEQjRPRE5oWkRRNElHbH"
| "VJR1YyWDNKMWJpczBPRGNLSXpZZ0lEQjROakJtTUdGbElHbHVJSFJo"
| "Y21GdWRHOXZiRjlzZFdGZmNuVnVYM05qY21sd2RDc3hNemdLSXpjZ0"
| "lEQjRORFEyTldZMUlHbHVJRzFoYVc0ck5XRmpDaU00SUNBd2VEZG1N"
| "V1EwT0dObU56QTBNaUJwYmlCZlgyeHBZbU5mYzNSaGNuUmZiV0ZwYm"
| "l0bU1nb2pPU0FnTUhnME5EUmhNR1VnYVc0Z1gzTjBZWEowS3pKbENn"
| "PT0iLCJ0aW1lc3RhbXAiOiIyMDIwLTEyLTEwIDE3OjA4OjQyIE1TSyJ9fQ=="
| }
| }
The `data` value is a single string I wrapped it for commit message only.
When `data` is decoded it consists of
| {
| "uname": {
| "sysname": "Linux",
| "release": "5.9.11-100.fc32.x86_64",
| "version": "#1 SMP Tue Nov 24 19:16:53 UTC 2020",
| "machine": "x86_64"
| },
| "build": {
| "version": "2.7.0-83-g9e85038fb",
| "cmake_type": "Linux-x86_64-Debug"
| },
| "signal": {
| "signo": 11,
| "si_code": 0,
| "si_addr": "0x3e80005c8a9",
| "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
| "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
| "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
| "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
| "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
| "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
| "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
| "MGUgaW4gX3N0YXJ0KzJlCg==",
| "timestamp": "2020-12-10 17:08:42 MSK"
| }
| }
The `backtrace` istself is encoded as base64 because of newline symbols
(and may comprise some nonascii symbols as well).
There is no simple way to test this so I did it manually:
1) Run instance with
box.cfg{log_level = 8, feedback_host="127.0.0.1:1500"}
2) Run listener shell as
while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
3) Send SIGSEGV
kill -11 `pidof tarantool`
Once SIGSEGV is delivered the crashinfo data is generated and sent out. For
debug purpose this data is also printed to the terminal on debug log level.
Closes #5261
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@TarantoolBot document
Title: Configuration update, allow to disable sending crash information
For better analisys of program crashes the information associated with
the crash such as
- utsname (similar to `uname -a` output except the network name)
- build information
- reason for a crash
- call backtrace
is sent to the feedback server. To disable it set `feedback_no_crashinfo`
to `false`.
---
src/box/lua/feedback_daemon.lua | 7 +
src/box/lua/load_cfg.lua | 3 +
src/exports.h | 1 +
src/lib/core/CMakeLists.txt | 2 +-
src/lib/core/crash.c | 263 ++++++++++++++++++++++++++++++++
src/lib/core/crash.h | 12 ++
src/main.cc | 1 +
test/box/admin.result | 2 +
test/box/cfg.result | 4 +
9 files changed, 294 insertions(+), 1 deletion(-)
diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 1d39012ed..ae4d1a25f 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -1,5 +1,6 @@
-- feedback_daemon.lua (internal file)
--
+local ffi = require('ffi')
local log = require('log')
local json = require('json')
local fiber = require('fiber')
@@ -360,12 +361,18 @@ local function reload(self)
self:start()
end
+ffi.cdef[[
+void
+crash_cfg(void);
+]]
+
setmetatable(daemon, {
__index = {
set_feedback_params = function()
daemon.enabled = box.cfg.feedback_enabled
daemon.host = box.cfg.feedback_host
daemon.interval = box.cfg.feedback_interval
+ ffi.C.crash_cfg()
reload(daemon)
return
end,
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 770442052..1878c31bd 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -99,6 +99,7 @@ local default_cfg = {
replication_skip_conflict = false,
replication_anon = false,
feedback_enabled = true,
+ feedback_no_crashinfo = false,
feedback_host = "https://feedback.tarantool.io",
feedback_interval = 3600,
net_msg_max = 768,
@@ -179,6 +180,7 @@ local template_cfg = {
replication_skip_conflict = 'boolean',
replication_anon = 'boolean',
feedback_enabled = ifdef_feedback('boolean'),
+ feedback_no_crashinfo = ifdef_feedback('boolean'),
feedback_host = ifdef_feedback('string'),
feedback_interval = ifdef_feedback('number'),
net_msg_max = 'number',
@@ -277,6 +279,7 @@ local dynamic_cfg = {
checkpoint_wal_threshold = private.cfg_set_checkpoint_wal_threshold,
worker_pool_threads = private.cfg_set_worker_pool_threads,
feedback_enabled = ifdef_feedback_set_params,
+ feedback_no_crashinfo = ifdef_feedback_set_params,
feedback_host = ifdef_feedback_set_params,
feedback_interval = ifdef_feedback_set_params,
-- do nothing, affects new replicas, which query this value on start
diff --git a/src/exports.h b/src/exports.h
index 867a027dc..a91edbc1a 100644
--- a/src/exports.h
+++ b/src/exports.h
@@ -113,6 +113,7 @@ EXPORT(coio_wait)
EXPORT(console_get_output_format)
EXPORT(console_set_output_format)
EXPORT(cord_slab_cache)
+EXPORT(crash_cfg)
EXPORT(crc32_calc)
EXPORT(crypto_EVP_MD_CTX_free)
EXPORT(crypto_EVP_MD_CTX_new)
diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index 06b2b91e1..d057c855d 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -40,7 +40,7 @@ endif()
add_library(core STATIC ${core_sources})
-target_link_libraries(core salad small uri decNumber ${LIBEV_LIBRARIES}
+target_link_libraries(core salad small uri decNumber misc ${LIBEV_LIBRARIES}
${LIBEIO_LIBRARIES} ${LIBCORO_LIBRARIES}
${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES})
diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
index 9572a023c..06cc240a4 100644
--- a/src/lib/core/crash.c
+++ b/src/lib/core/crash.c
@@ -7,16 +7,28 @@
#include <string.h>
#include <unistd.h>
#include <time.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include "third_party/base64.h"
+#include "small/static.h"
#include "trivia/util.h"
#include "backtrace.h"
#include "crash.h"
+#include "cfg.h"
#include "say.h"
#define pr_fmt(fmt) "crash: " fmt
+#define pr_debug(fmt, ...) say_debug(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...) say_info(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...) say_error(pr_fmt(fmt), ##__VA_ARGS__)
#define pr_syserr(fmt, ...) say_syserror(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...) fprintf(stderr, pr_fmt(fmt) "\n", ##__VA_ARGS__)
#define pr_panic(fmt, ...) panic(pr_fmt(fmt), ##__VA_ARGS__)
+/* Use strlcpy with destination as an array */
+#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
+
#ifdef TARGET_OS_LINUX
#ifndef __x86_64__
# error "Non x86-64 architectures are not supported"
@@ -67,6 +79,10 @@ static struct crash_info {
*/
struct crash_greg greg;
#endif
+ /**
+ * Timestamp in nanoseconds (realtime).
+ */
+ uint64_t timestamp_rt;
/**
* Faulting address.
*/
@@ -88,6 +104,72 @@ static struct crash_info {
#endif
} crash_info;
+static char tarantool_path[PATH_MAX];
+static char feedback_host[2048];
+static bool send_crashinfo = true;
+
+static inline uint64_t
+timespec_to_ns(struct timespec *ts)
+{
+ return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
+}
+
+static char *
+ns_to_localtime(uint64_t timestamp, char *buf, ssize_t len)
+{
+ time_t sec = timestamp / 1000000000;
+ char *start = buf;
+ struct tm tm;
+
+ /*
+ * Use similar format as say_x logger. Except plain
+ * seconds should be enough.
+ */
+ localtime_r(&sec, &tm);
+ ssize_t total = strftime(start, len, "%F %T %Z", &tm);
+ start += total;
+ if (total < len)
+ return buf;
+ buf[len - 1] = '\0';
+ return buf;
+}
+
+void
+crash_init(const char *tarantool_bin)
+{
+ strlcpy_a(tarantool_path, tarantool_bin);
+ if (strlen(tarantool_path) < strlen(tarantool_bin))
+ pr_panic("executable path is trimmed");
+}
+
+void
+crash_cfg(void)
+{
+ const char *host = cfg_gets("feedback_host");
+ bool is_enabled = cfg_getb("feedback_enabled");
+ bool no_crashinfo = cfg_getb("feedback_no_crashinfo");
+
+ if (host == NULL || !is_enabled || no_crashinfo) {
+ if (send_crashinfo) {
+ pr_info("disable sedning crashinfo feedback");
+ send_crashinfo = false;
+ feedback_host[0] = '\0';
+ }
+ return;
+ }
+
+ if (strcmp(feedback_host, host) != 0) {
+ strlcpy_a(feedback_host, host);
+ if (strlen(feedback_host) < strlen(host))
+ pr_panic("feedback_host is too long");
+ }
+
+ if (!send_crashinfo) {
+ pr_info("enable sedning crashinfo feedback");
+ send_crashinfo = true;
+ }
+}
+
/**
* The routine is called inside crash signal handler so
* be careful to not cause additional signals inside.
@@ -96,6 +178,12 @@ static struct crash_info *
crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
{
struct crash_info *cinfo = &crash_info;
+ struct timespec ts;
+
+ if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
+ cinfo->timestamp_rt = timespec_to_ns(&ts);
+ else
+ cinfo->timestamp_rt = 0;
cinfo->signo = signo;
cinfo->sicode = siginfo->si_code;
@@ -126,6 +214,179 @@ crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
return cinfo;
}
+/**
+ * Report crash information to the feedback daemon
+ * (ie send it to feedback daemon).
+ */
+static void
+crash_report_feedback_daemon(struct crash_info *cinfo)
+{
+ /*
+ * Update to a new number if format get changed.
+ */
+ static const int crashinfo_version = 1;
+
+ char *p = static_alloc(SMALL_STATIC_SIZE);
+ char *e = &p[SMALL_STATIC_SIZE - 1];
+ char *head = p;
+ char *tail = &p[SMALL_STATIC_SIZE - 8];
+
+ /*
+ * Note that while we encode the report we
+ * intensively use a tail of the allocated
+ * buffer as a temporary store.
+ */
+
+#define snprintf_safe(fmt, ...) \
+ do { \
+ p += snprintf(p, e - p, fmt, ##__VA_ARGS__); \
+ if (p >= e) \
+ goto out; \
+ } while (0)
+
+ /*
+ * On linux there is new_utsname structure which
+ * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
+ *
+ * So lets just reserve more data in advance: 5 fields
+ * 128 bytes each => 640 bytes.
+ */
+ static_assert(sizeof(struct utsname) <= 640,
+ "utsname is bigger than expected");
+
+ /*
+ * Lets reuse tail of the buffer as a temp space.
+ */
+ struct utsname *uname_ptr = (void *)&tail[-640];
+ if (p > (char *)(void *)uname_ptr)
+ goto out;
+
+ if (uname(uname_ptr) != 0) {
+ pr_syserr("uname call failed, ignore");
+ memset(uname_ptr, 0, 640);
+ }
+
+ snprintf_safe("{");
+ snprintf_safe("\"uname\":{");
+ snprintf_safe("\"sysname\":\"%s\",", uname_ptr->sysname);
+ /*
+ * nodename might a sensitive information, skip.
+ */
+ snprintf_safe("\"release\":\"%s\",", uname_ptr->release);
+ snprintf_safe("\"version\":\"%s\",", uname_ptr->version);
+ snprintf_safe("\"machine\":\"%s\"", uname_ptr->machine);
+ snprintf_safe("},");
+
+ if (p >= (char *)(void *)uname_ptr)
+ goto out;
+
+ snprintf_safe("\"build\":{");
+ snprintf_safe("\"version\":\"%s\",", PACKAGE_VERSION);
+ snprintf_safe("\"cmake_type\":\"%s\"", BUILD_INFO);
+ snprintf_safe("},");
+
+ snprintf_safe("\"signal\":{");
+ snprintf_safe("\"signo\":%d,", cinfo->signo);
+ snprintf_safe("\"si_code\":%d,", cinfo->sicode);
+ if (cinfo->signo == SIGSEGV) {
+ if (cinfo->sicode == SEGV_MAPERR) {
+ snprintf_safe("\"si_code_str\":\"%s\",",
+ "SEGV_MAPERR");
+ } else if (cinfo->sicode == SEGV_ACCERR) {
+ snprintf_safe("\"si_code_str\":\"%s\",",
+ "SEGV_ACCERR");
+ }
+ snprintf_safe("\"si_addr\":\"0x%llx\",",
+ (long long)cinfo->siaddr);
+ }
+
+#ifdef ENABLE_BACKTRACE
+ /*
+ * The backtrace itself is encoded into base64 form
+ * since it might have arbitrary symbols not suitable
+ * for json encoding (newlines and etc).
+ */
+ size_t bt_len = strlen(cinfo->backtrace_buf);
+ size_t bt_elen = base64_bufsize(bt_len, BASE64_NOWRAP);
+ char *bt_base64 = &tail[-bt_elen-8];
+ if (p >= bt_base64)
+ goto out;
+ base64_encode(cinfo->backtrace_buf, bt_len,
+ bt_base64, bt_elen, BASE64_NOWRAP);
+ bt_base64[bt_elen] = '\0';
+ snprintf_safe("\"backtrace\":\"%s\",", bt_base64);
+#endif
+
+ char *timestamp_rt_str = &tail[-128];
+ if (p >= timestamp_rt_str)
+ goto out;
+ ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
+ snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str);
+ snprintf_safe("}");
+ snprintf_safe("}");
+
+ size_t report_len = p - head;
+ size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
+
+ char *report_base64 = &tail[-report_elen-8];
+ if (p >= report_base64)
+ goto out;
+ base64_encode(head, report_len, report_base64,
+ report_elen, BASE64_NOWRAP);
+ report_base64[report_elen] = '\0';
+
+ /*
+ * Encoded report now sits at report_base64 position,
+ * at the tail of 'small' static buffer. Lets prepare
+ * the script to run (make sure the strings do not
+ * overlap).
+ */
+ p = head;
+ snprintf_safe("require(\'http.client\').post(\'%s\',"
+ "'{\"crashdump\":{\"version\":\"%d\","
+ "\"data\":", feedback_host,
+ crashinfo_version);
+ if (report_base64 - p <= (long)(report_elen - 2))
+ goto out;
+ snprintf_safe("\"%s\"", report_base64);
+ snprintf_safe("}}',{timeout=1}); os.exit(1);");
+
+ pr_debug("crashinfo script: %s", head);
+
+ char *exec_argv[4] = {
+ [0] = tarantool_path,
+ [1] = "-e",
+ [2] = head,
+ [3] = NULL,
+ };
+
+ /*
+ * Can't use fork here because libev has own
+ * at_fork helpers with mutex where we might
+ * stuck (see popen code).
+ */
+ pid_t pid = vfork();
+ if (pid == 0) {
+ /*
+ * The script must exit at the end but there
+ * is no simple way to make sure from inside
+ * of a signal crash handler. So just hope it
+ * is running fine.
+ */
+ execve(exec_argv[0], exec_argv, NULL);
+ pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
+ exec_argv[0], exec_argv[0],
+ exec_argv[1], exec_argv[2]);
+ _exit(1);
+ } else if (pid < 0) {
+ pr_crit("unable to vfork (errno %d)", errno);
+ }
+
+ return;
+out:
+ pr_crit("unable to prepare a crash report");
+}
+
/**
* Report crash information to the stderr
* (usually a current console).
@@ -232,6 +493,8 @@ crash_signal_cb(int signo, siginfo_t *siginfo, void *context)
in_cb = 1;
cinfo = crash_collect(signo, siginfo, context);
crash_report_stderr(cinfo);
+ if (send_crashinfo)
+ crash_report_feedback_daemon(cinfo);
} else {
/* Got a signal while running the handler. */
fprintf(stderr, "Fatal %d while backtracing", signo);
diff --git a/src/lib/core/crash.h b/src/lib/core/crash.h
index d107cd953..b005e3ec4 100644
--- a/src/lib/core/crash.h
+++ b/src/lib/core/crash.h
@@ -15,12 +15,24 @@
extern "C" {
#endif /* defined(__cplusplus) */
+/**
+ * Initialize crash subsystem.
+ */
+void
+crash_init(const char *tarantool_bin);
+
/**
* Initialize crash signal handlers.
*/
void
crash_signal_init(void);
+/**
+ * Configure crash engine from box.cfg.
+ */
+void
+crash_cfg(void);
+
/**
* Reset crash signal handlers.
*/
diff --git a/src/main.cc b/src/main.cc
index 391e0f878..2fce81bb3 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -687,6 +687,7 @@ main(int argc, char **argv)
title_set_script_name(argv[0]);
}
+ crash_init(tarantool_bin);
export_syms();
random_init();
diff --git a/test/box/admin.result b/test/box/admin.result
index 8c5626c36..b8fa68749 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -47,6 +47,8 @@ cfg_filter(box.cfg)
- https://feedback.tarantool.io
- - feedback_interval
- 3600
+ - - feedback_no_crashinfo
+ - false
- - force_recovery
- false
- - hot_standby
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 5ca6ce72b..68c45b4cc 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -35,6 +35,8 @@ cfg_filter(box.cfg)
| - https://feedback.tarantool.io
| - - feedback_interval
| - 3600
+ | - - feedback_no_crashinfo
+ | - false
| - - force_recovery
| - false
| - - hot_standby
@@ -148,6 +150,8 @@ cfg_filter(box.cfg)
| - https://feedback.tarantool.io
| - - feedback_interval
| - 3600
+ | - - feedback_no_crashinfo
+ | - false
| - - force_recovery
| - false
| - - hot_standby
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
@ 2020-12-11 7:34 ` Serge Petrenko
2020-12-11 7:58 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
1 sibling, 1 reply; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 7:34 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy
10.12.2020 19:18, Cyrill Gorcunov пишет:
> Very convenient to have this string extension.
> We will use it in crash handling.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
Hi! Thanks for the patch!
LGTM.
> CMakeLists.txt | 1 +
> src/lib/core/util.c | 14 ++++++++++++++
> src/trivia/config.h.cmake | 5 +++++
> src/trivia/util.h | 15 +++++++++++++++
> 4 files changed, 35 insertions(+)
>
> diff --git a/CMakeLists.txt b/CMakeLists.txt
> index fa6818f8e..aa23113b3 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -126,6 +126,7 @@ set(CMAKE_REQUIRED_LIBRARIES "")
> check_function_exists(clock_gettime HAVE_CLOCK_GETTIME_WITHOUT_RT)
>
> check_symbol_exists(__get_cpuid cpuid.h HAVE_CPUID)
> +check_symbol_exists(strlcpy string.h HAVE_STRLCPY)
>
> # Checks for libev
> include(CheckStructHasMember)
> diff --git a/src/lib/core/util.c b/src/lib/core/util.c
> index dfce317f0..7b168a177 100644
> --- a/src/lib/core/util.c
> +++ b/src/lib/core/util.c
> @@ -250,6 +250,20 @@ int2str(long long int val)
> return buf;
> }
>
> +#ifndef HAVE_STRLCPY
> +size_t
> +strlcpy(char *dst, const char *src, size_t size)
> +{
> + size_t src_len = strlen(src);
> + if (size) {
> + size_t len = (src_len >= size) ? size - 1 : src_len;
> + memcpy(dst, src, len);
> + dst[len] = '\0';
> + }
> + return src_len;
> +}
> +#endif
> +
> int
> utf8_check_printable(const char *start, size_t length)
> {
> diff --git a/src/trivia/config.h.cmake b/src/trivia/config.h.cmake
> index 89e0d39c6..21efb978e 100644
> --- a/src/trivia/config.h.cmake
> +++ b/src/trivia/config.h.cmake
> @@ -47,6 +47,11 @@
> */
> #cmakedefine HAVE_CPUID 1
>
> +/**
> + * Defined if strlcpy() string extension helper present.
> + */
> + #cmakedefine HAVE_STRLCPY 1
> +
> /*
> * Defined if gcov instrumentation should be enabled.
> */
> diff --git a/src/trivia/util.h b/src/trivia/util.h
> index da5a3705e..8dcc48ada 100644
> --- a/src/trivia/util.h
> +++ b/src/trivia/util.h
> @@ -450,6 +450,21 @@ fpconv_strtod(const char *nptr, char **endptr)
> return strtod(nptr, endptr);
> }
>
> +#ifndef HAVE_STRLCPY
> +/**
> + * Copy string. Unlike \a strncpy the result string
> + * is always null-terminated.
> + *
> + * @param dst destination buffer.
> + * @param src source string.
> + * @param size destination buffer size.
> + *
> + * @return size of \a src string.
> + */
> +size_t
> +strlcpy(char *dst, const char *src, size_t size);
> +#endif
> +
> /**
> * Check that @a str is valid utf-8 sequence and can be printed
> * unescaped.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
@ 2020-12-11 7:50 ` Serge Petrenko
0 siblings, 0 replies; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 7:50 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy
10.12.2020 19:18, Cyrill Gorcunov пишет:
> This will allow to reuse this routine in crash reports.
>
> Part-of #5261
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
LGTM.
> src/lib/core/backtrace.cc | 12 ++++++------
> src/lib/core/backtrace.h | 3 +++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/src/lib/core/backtrace.cc b/src/lib/core/backtrace.cc
> index 456ce9a4d..b4048089f 100644
> --- a/src/lib/core/backtrace.cc
> +++ b/src/lib/core/backtrace.cc
> @@ -131,7 +131,7 @@ get_proc_name(unw_cursor_t *unw_cur, unw_word_t *offset, bool skip_cache)
> }
>
> char *
> -backtrace(void)
> +backtrace(char *start, size_t size)
> {
> int frame_no = 0;
> unw_word_t sp = 0, old_sp = 0, ip, offset;
> @@ -139,10 +139,9 @@ backtrace(void)
> unw_getcontext(&unw_context);
> unw_cursor_t unw_cur;
> unw_init_local(&unw_cur, &unw_context);
> - char *backtrace_buf = (char *)static_alloc(SMALL_STATIC_SIZE);
> - char *p = backtrace_buf;
> - char *end = p + SMALL_STATIC_SIZE - 1;
> int unw_status;
> + char *p = start;
> + char *end = start + size - 1;
> *p = '\0';
> while ((unw_status = unw_step(&unw_cur)) > 0) {
> const char *proc;
> @@ -174,7 +173,7 @@ backtrace(void)
> say_debug("unwinding error: %i", unw_status);
> #endif
> out:
> - return backtrace_buf;
> + return start;
> }
>
> /*
> @@ -436,7 +435,8 @@ backtrace_foreach(backtrace_cb cb, coro_context *coro_ctx, void *cb_ctx)
> void
> print_backtrace(void)
> {
> - fdprintf(STDERR_FILENO, "%s", backtrace());
> + char *start = (char *)static_alloc(SMALL_STATIC_SIZE);
> + fdprintf(STDERR_FILENO, "%s", backtrace(start, SMALL_STATIC_SIZE));
> }
> #endif /* ENABLE_BACKTRACE */
>
> diff --git a/src/lib/core/backtrace.h b/src/lib/core/backtrace.h
> index c119d5402..e0ae56be4 100644
> --- a/src/lib/core/backtrace.h
> +++ b/src/lib/core/backtrace.h
> @@ -40,6 +40,9 @@ extern "C" {
> #ifdef ENABLE_BACKTRACE
> #include <coro.h>
>
> +char *
> +backtrace(char *start, size_t size);
> +
> void print_backtrace(void);
>
> typedef int (backtrace_cb)(int frameno, void *frameret,
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-11 7:34 ` Serge Petrenko
@ 2020-12-11 7:58 ` Serge Petrenko
2020-12-11 10:04 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 7:58 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy
11.12.2020 10:34, Serge Petrenko via Tarantool-patches пишет:
>
> 10.12.2020 19:18, Cyrill Gorcunov пишет:
>> Very convenient to have this string extension.
>> We will use it in crash handling.
>>
>> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>> ---
>
> Hi! Thanks for the patch!
>
> LGTM.
>
>> CMakeLists.txt | 1 +
>> src/lib/core/util.c | 14 ++++++++++++++
>> src/trivia/config.h.cmake | 5 +++++
>> src/trivia/util.h | 15 +++++++++++++++
>> 4 files changed, 35 insertions(+)
>>
>> diff --git a/CMakeLists.txt b/CMakeLists.txt
>> index fa6818f8e..aa23113b3 100644
>> --- a/CMakeLists.txt
>> +++ b/CMakeLists.txt
>> @@ -126,6 +126,7 @@ set(CMAKE_REQUIRED_LIBRARIES "")
>> check_function_exists(clock_gettime HAVE_CLOCK_GETTIME_WITHOUT_RT)
>> check_symbol_exists(__get_cpuid cpuid.h HAVE_CPUID)
>> +check_symbol_exists(strlcpy string.h HAVE_STRLCPY)
>> # Checks for libev
>> include(CheckStructHasMember)
>> diff --git a/src/lib/core/util.c b/src/lib/core/util.c
>> index dfce317f0..7b168a177 100644
>> --- a/src/lib/core/util.c
>> +++ b/src/lib/core/util.c
>> @@ -250,6 +250,20 @@ int2str(long long int val)
>> return buf;
>> }
>> +#ifndef HAVE_STRLCPY
>> +size_t
>> +strlcpy(char *dst, const char *src, size_t size)
>> +{
>> + size_t src_len = strlen(src);
>> + if (size) {
One nit: we usually use `if(smth)` for boolean values only.
For integer values please use the explicit variant: `if(size != 0)`
>> + size_t len = (src_len >= size) ? size - 1 : src_len;
>> + memcpy(dst, src, len);
>> + dst[len] = '\0';
>> + }
>> + return src_len;
>> +}
>> +#endif
>> +
>> int
>> utf8_check_printable(const char *start, size_t length)
>> {
>> diff --git a/src/trivia/config.h.cmake b/src/trivia/config.h.cmake
>> index 89e0d39c6..21efb978e 100644
>> --- a/src/trivia/config.h.cmake
>> +++ b/src/trivia/config.h.cmake
>> @@ -47,6 +47,11 @@
>> */
>> #cmakedefine HAVE_CPUID 1
>> +/**
>> + * Defined if strlcpy() string extension helper present.
>> + */
>> + #cmakedefine HAVE_STRLCPY 1
>> +
>> /*
>> * Defined if gcov instrumentation should be enabled.
>> */
>> diff --git a/src/trivia/util.h b/src/trivia/util.h
>> index da5a3705e..8dcc48ada 100644
>> --- a/src/trivia/util.h
>> +++ b/src/trivia/util.h
>> @@ -450,6 +450,21 @@ fpconv_strtod(const char *nptr, char **endptr)
>> return strtod(nptr, endptr);
>> }
>> +#ifndef HAVE_STRLCPY
>> +/**
>> + * Copy string. Unlike \a strncpy the result string
>> + * is always null-terminated.
>> + *
>> + * @param dst destination buffer.
>> + * @param src source string.
>> + * @param size destination buffer size.
>> + *
>> + * @return size of \a src string.
>> + */
>> +size_t
>> +strlcpy(char *dst, const char *src, size_t size);
>> +#endif
>> +
>> /**
>> * Check that @a str is valid utf-8 sequence and can be printed
>> * unescaped.
>
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
@ 2020-12-11 9:31 ` Serge Petrenko
2020-12-11 10:38 ` Cyrill Gorcunov
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-20 15:45 ` Vladislav Shpilevoy
2 siblings, 1 reply; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 9:31 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy
10.12.2020 19:18, Cyrill Gorcunov пишет:
> When SIGSEGV or SIGFPE reaches the tarantool we try to gather
> all information related to the crash and print it out to the
> console (well, stderr actually). Still there is a request
> to not just show this info locally but send it out to the
> feedback server.
>
> Thus to keep gathering crash related information in one module,
> we move fatal signal handling into the separate crash.c file.
> This allows us to collect the data we need in one place and
> reuse it when we need to send reports to stderr (and to the
> feedback server, which will be implemented in next patch).
>
> Part-of #5261
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Thanks for the patch!
Please find a couple of style comments below.
Other than that the patch looks good.
> ---
> src/lib/core/CMakeLists.txt | 1 +
> src/lib/core/crash.c | 291 ++++++++++++++++++++++++++++++++++++
> src/lib/core/crash.h | 32 ++++
> src/main.cc | 138 +----------------
> 4 files changed, 329 insertions(+), 133 deletions(-)
> create mode 100644 src/lib/core/crash.c
> create mode 100644 src/lib/core/crash.h
>
> diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
> index 13ed1e7ab..06b2b91e1 100644
> --- a/src/lib/core/CMakeLists.txt
> +++ b/src/lib/core/CMakeLists.txt
> @@ -1,5 +1,6 @@
> set(core_sources
> diag.c
> + crash.c
> say.c
> memory.c
> clock.c
> diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
> new file mode 100644
> index 000000000..9572a023c
> --- /dev/null
> +++ b/src/lib/core/crash.c
> @@ -0,0 +1,291 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
I haven't seen us using such a license before.
It must be fine, I'm just not familiar with this.
> + */
> +
> +#include <string.h>
> +#include <unistd.h>
> +#include <time.h>
> +
> +/**
> + * Report crash information to the stderr
> + * (usually a current console).
> + */
> +static void
> +crash_report_stderr(struct crash_info *cinfo)
> +{
> + if (cinfo->signo == SIGSEGV) {
> + fprintf(stderr, "Segmentation fault\n");
> + const char *signal_code_repr = 0;
Please use NULL for pointers.
> +
> + switch (cinfo->sicode) {
> + case SEGV_MAPERR:
> + signal_code_repr = "SEGV_MAPERR";
> + break;
> + case SEGV_ACCERR:
> + signal_code_repr = "SEGV_ACCERR";
> + break;
> + }
> +
> + if (signal_code_repr)
Please add `!= NULL` to the condition.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-11 7:58 ` Serge Petrenko
@ 2020-12-11 10:04 ` Cyrill Gorcunov
2020-12-11 11:07 ` Serge Petrenko
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-11 10:04 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Fri, Dec 11, 2020 at 10:58:21AM +0300, Serge Petrenko wrote:
> > > +size_t
> > > +strlcpy(char *dst, const char *src, size_t size)
> > > +{
> > > + size_t src_len = strlen(src);
> > > + if (size) {
>
>
> One nit: we usually use `if(smth)` for boolean values only.
>
> For integer values please use the explicit variant: `if(size != 0)`
Ouch, indeed. I'll force push an update once Vlad comment out
the series.
n.b. You know every time I see `if (x != [0|NULL])` statement
it driving me nuts: the language standart is pretty clear for
`if ()` statement and explains how it is evaluated and I always
wonder who exactly invented this explisit test for non-zero/nil?!
Now *every* if statement requires 5 additional symbols for simply
nothing :( I suspect the person who started to use this form
simply was not aware of the language standart.
Surely I try to follow current code style, just saying :)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-11 9:31 ` Serge Petrenko
@ 2020-12-11 10:38 ` Cyrill Gorcunov
2020-12-11 11:12 ` Serge Petrenko
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-11 10:38 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Fri, Dec 11, 2020 at 12:31:18PM +0300, Serge Petrenko wrote:
> > --- /dev/null
> > +++ b/src/lib/core/crash.c
> > @@ -0,0 +1,291 @@
> > +/*
> > + * SPDX-License-Identifier: BSD-2-Clause
> > + *
> > + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
>
>
> I haven't seen us using such a license before.
> It must be fine, I'm just not familiar with this.
We've been discussing this spdx reference in chat and I've got no
rejection so far. So I think using short ref a way more convenient
instead of pushing the big header to each file.
https://spdx.org/licenses/
https://en.wikipedia.org/wiki/Software_Package_Data_Exchange
> > +static void
> > +crash_report_stderr(struct crash_info *cinfo)
> > +{
> > + if (cinfo->signo == SIGSEGV) {
> > + fprintf(stderr, "Segmentation fault\n");
> > + const char *signal_code_repr = 0;
>
>
> Please use NULL for pointers.
Both comments are fixed and force pushed. Thanks!
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-11 10:04 ` Cyrill Gorcunov
@ 2020-12-11 11:07 ` Serge Petrenko
2020-12-11 11:38 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 11:07 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
11.12.2020 13:04, Cyrill Gorcunov пишет:
> On Fri, Dec 11, 2020 at 10:58:21AM +0300, Serge Petrenko wrote:
>>>> +size_t
>>>> +strlcpy(char *dst, const char *src, size_t size)
>>>> +{
>>>> + size_t src_len = strlen(src);
>>>> + if (size) {
>>
>> One nit: we usually use `if(smth)` for boolean values only.
>>
>> For integer values please use the explicit variant: `if(size != 0)`
> Ouch, indeed. I'll force push an update once Vlad comment out
> the series.
>
> n.b. You know every time I see `if (x != [0|NULL])` statement
> it driving me nuts: the language standart is pretty clear for
> `if ()` statement and explains how it is evaluated and I always
> wonder who exactly invented this explisit test for non-zero/nil?!
> Now *every* if statement requires 5 additional symbols for simply
> nothing :( I suspect the person who started to use this form
> simply was not aware of the language standart.
I guess it's more about code readability rather than producing a
correct expression according to the standard.
I was also surprised by this rule at first, but
got used to it now. It's all about what you're used to, after all.
>
> Surely I try to follow current code style, just saying :)
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-11 10:38 ` Cyrill Gorcunov
@ 2020-12-11 11:12 ` Serge Petrenko
0 siblings, 0 replies; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 11:12 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
11.12.2020 13:38, Cyrill Gorcunov пишет:
> On Fri, Dec 11, 2020 at 12:31:18PM +0300, Serge Petrenko wrote:
>>> --- /dev/null
>>> +++ b/src/lib/core/crash.c
>>> @@ -0,0 +1,291 @@
>>> +/*
>>> + * SPDX-License-Identifier: BSD-2-Clause
>>> + *
>>> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
>>
>> I haven't seen us using such a license before.
>> It must be fine, I'm just not familiar with this.
> We've been discussing this spdx reference in chat and I've got no
> rejection so far. So I think using short ref a way more convenient
> instead of pushing the big header to each file.
>
> https://spdx.org/licenses/
> https://en.wikipedia.org/wiki/Software_Package_Data_Exchange
Ok, I see. Discard this comment then.
>>> +static void
>>> +crash_report_stderr(struct crash_info *cinfo)
>>> +{
>>> + if (cinfo->signo == SIGSEGV) {
>>> + fprintf(stderr, "Segmentation fault\n");
>>> + const char *signal_code_repr = 0;
>>
>> Please use NULL for pointers.
> Both comments are fixed and force pushed. Thanks!
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-11 11:07 ` Serge Petrenko
@ 2020-12-11 11:38 ` Cyrill Gorcunov
2020-12-14 22:54 ` Vladislav Shpilevoy
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-11 11:38 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Fri, Dec 11, 2020 at 02:07:53PM +0300, Serge Petrenko wrote:
...
> >
> > n.b. You know every time I see `if (x != [0|NULL])` statement
> > it driving me nuts: the language standart is pretty clear for
> > `if ()` statement and explains how it is evaluated and I always
> > wonder who exactly invented this explisit test for non-zero/nil?!
> > Now *every* if statement requires 5 additional symbols for simply
> > nothing :( I suspect the person who started to use this form
> > simply was not aware of the language standart.
>
> I guess it's more about code readability rather than producing a
> correct expression according to the standard.
Hardly. I suspect it was due to lack of understanding how code
compiles and evaluates, and being nonfamiliar with standarts ;)
>
> I was also surprised by this rule at first, but got used to it now.
> It's all about what you're used to, after all.
Yeah, not a big deal, I can live with this style as well and try
to follow for sure :)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server Cyrill Gorcunov
@ 2020-12-11 12:57 ` Serge Petrenko
2020-12-12 16:19 ` Cyrill Gorcunov
2020-12-14 22:54 ` Vladislav Shpilevoy
1 sibling, 1 reply; 38+ messages in thread
From: Serge Petrenko @ 2020-12-11 12:57 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy
10.12.2020 19:18, Cyrill Gorcunov пишет:
> We have a feedback server which gathers information about a running instnace.
> While general info is enough for now we may loose a precious information about
> crashes (such as call backtrace which caused the issue, type of build and etc).
>
> In the commit we add support of sending this kind of infomation to the feedback
> server. Internally we gather the reason of failure, pack it into base64 form
> and then run another tarantool instance which sends it out.
>
> A typical report might look like
>
> | {
> | "crashdump": {
> | "version": "1",
> | "data": "eyJ1bmFtZSI6eyJzeXNuYW1lIjoiTGludXgiLCJyZWxlYXNlIjoiNS"
> | "45LjExLTEwMC5mYzMyLng4Nl82NCIsInZlcnNpb24iOiIjMSBTTVAg"
> | "VHVlIE5vdiAyNCAxOToxNjo1MyBVVEMgMjAyMCIsIm1hY2hpbmUiOi"
> | "J4ODZfNjQifSwiYnVpbGQiOnsidmVyc2lvbiI6IjIuNy4wLTgzLWc5"
> | "ZTg1MDM4ZmIiLCJjbWFrZV90eXBlIjoiTGludXgteDg2XzY0LURlYn"
> | "VnIn0sInNpZ25hbCI6eyJzaWdubyI6MTEsInNpX2NvZGUiOjAsInNp"
> | "X2FkZHIiOiIweDNlODAwMDVjOGE5IiwiYmFja3RyYWNlIjoiSXpBZ0"
> | "lEQjROak14WkRBeklHbHVJR055WVhOb1gyTnZiR3hsWTNRclltWUtJ"
> | "ekVnSURCNE5qTXlaV1JoSUdsdUlHTnlZWE5vWDNOcFoyNWhiRjlqWW"
> | "lzMU1Rb2pNaUFnTUhnM1pqRmtORGt5TXpsaE9UQWdhVzRnWm5WdWJH"
> | "OWphMlpwYkdVck5qQUtJek1nSURCNE4yWXhaRFE0WkdReFl6VmxJR2"
> | "x1SUdWd2IyeHNYM2RoYVhRck5XVUtJelFnSURCNE9ETTNOekV6SUds"
> | "dUlHVndiMnhzWDNCdmJHd3JPVGtLSXpVZ0lEQjRPRE5oWkRRNElHbH"
> | "VJR1YyWDNKMWJpczBPRGNLSXpZZ0lEQjROakJtTUdGbElHbHVJSFJo"
> | "Y21GdWRHOXZiRjlzZFdGZmNuVnVYM05qY21sd2RDc3hNemdLSXpjZ0"
> | "lEQjRORFEyTldZMUlHbHVJRzFoYVc0ck5XRmpDaU00SUNBd2VEZG1N"
> | "V1EwT0dObU56QTBNaUJwYmlCZlgyeHBZbU5mYzNSaGNuUmZiV0ZwYm"
> | "l0bU1nb2pPU0FnTUhnME5EUmhNR1VnYVc0Z1gzTjBZWEowS3pKbENn"
> | "PT0iLCJ0aW1lc3RhbXAiOiIyMDIwLTEyLTEwIDE3OjA4OjQyIE1TSyJ9fQ=="
> | }
> | }
>
> The `data` value is a single string I wrapped it for commit message only.
> When `data` is decoded it consists of
>
> | {
> | "uname": {
> | "sysname": "Linux",
> | "release": "5.9.11-100.fc32.x86_64",
> | "version": "#1 SMP Tue Nov 24 19:16:53 UTC 2020",
> | "machine": "x86_64"
> | },
> | "build": {
> | "version": "2.7.0-83-g9e85038fb",
> | "cmake_type": "Linux-x86_64-Debug"
> | },
> | "signal": {
> | "signo": 11,
> | "si_code": 0,
> | "si_addr": "0x3e80005c8a9",
> | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
> | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
> | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
> | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
> | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
> | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
> | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
> | "MGUgaW4gX3N0YXJ0KzJlCg==",
> | "timestamp": "2020-12-10 17:08:42 MSK"
> | }
> | }
Thanks for the patch!
Please find one comment below.
> @@ -126,6 +214,179 @@ crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
> return cinfo;
> }
>
> +/**
> + * Report crash information to the feedback daemon
> + * (ie send it to feedback daemon).
> + */
> +static void
> +crash_report_feedback_daemon(struct crash_info *cinfo)
> +{
> + /*
> + * Update to a new number if format get changed.
> + */
> + static const int crashinfo_version = 1;
> +
> + char *p = static_alloc(SMALL_STATIC_SIZE);
> + char *e = &p[SMALL_STATIC_SIZE - 1];
> + char *head = p;
> + char *tail = &p[SMALL_STATIC_SIZE - 8];
> +
> + /*
> + * Note that while we encode the report we
> + * intensively use a tail of the allocated
> + * buffer as a temporary store.
> + */
> +
> +#define snprintf_safe(fmt, ...) \
> + do { \
> + p += snprintf(p, e - p, fmt, ##__VA_ARGS__); \
> + if (p >= e) \
> + goto out; \
> + } while (0)
> +
> + /*
> + * On linux there is new_utsname structure which
> + * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
> + *
> + * So lets just reserve more data in advance: 5 fields
> + * 128 bytes each => 640 bytes.
> + */
> + static_assert(sizeof(struct utsname) <= 640,
> + "utsname is bigger than expected");
This static assert fails on my mac.
Looks like `struct utsname` is 1280 bytes in size there.
> +
> + /*
> + * Lets reuse tail of the buffer as a temp space.
> + */
> + struct utsname *uname_ptr = (void *)&tail[-640];
> + if (p > (char *)(void *)uname_ptr)
> + goto out;
> +
> + if (uname(uname_ptr) != 0) {
> + pr_syserr("uname call failed, ignore");
> + memset(uname_ptr, 0, 640);
> + }
> +
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-11 12:57 ` Serge Petrenko
@ 2020-12-12 16:19 ` Cyrill Gorcunov
2020-12-12 17:07 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-12 16:19 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Fri, Dec 11, 2020 at 03:57:20PM +0300, Serge Petrenko wrote:
> > +
> > + /*
> > + * On linux there is new_utsname structure which
> > + * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
> > + *
> > + * So lets just reserve more data in advance: 5 fields
> > + * 128 bytes each => 640 bytes.
> > + */
> > + static_assert(sizeof(struct utsname) <= 640,
> > + "utsname is bigger than expected");
>
>
> This static assert fails on my mac.
> Looks like `struct utsname` is 1280 bytes in size there.
Thanks, Serge! I'll force update the size.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-12 16:19 ` Cyrill Gorcunov
@ 2020-12-12 17:07 ` Cyrill Gorcunov
2020-12-14 9:41 ` Serge Petrenko
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-12 17:07 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Sat, Dec 12, 2020 at 07:19:47PM +0300, Cyrill Gorcunov wrote:
> On Fri, Dec 11, 2020 at 03:57:20PM +0300, Serge Petrenko wrote:
> > > +
> > > + /*
> > > + * On linux there is new_utsname structure which
> > > + * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
> > > + *
> > > + * So lets just reserve more data in advance: 5 fields
> > > + * 128 bytes each => 640 bytes.
> > > + */
> > > + static_assert(sizeof(struct utsname) <= 640,
> > > + "utsname is bigger than expected");
> >
> >
> > This static assert fails on my mac.
> > Looks like `struct utsname` is 1280 bytes in size there.
>
> Thanks, Serge! I'll force update the size.
Force pushed the update
---
/*
* Lets reuse tail of the buffer as a temp space.
*/
struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
if (p >= (char *)(void *)uname_ptr)
goto out;
if (uname(uname_ptr) != 0) {
pr_syserr("uname call failed, ignore");
memset(uname_ptr, 0, sizeof(struct utsname));
}
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-12 17:07 ` Cyrill Gorcunov
@ 2020-12-14 9:41 ` Serge Petrenko
0 siblings, 0 replies; 38+ messages in thread
From: Serge Petrenko @ 2020-12-14 9:41 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
12.12.2020 20:07, Cyrill Gorcunov пишет:
> On Sat, Dec 12, 2020 at 07:19:47PM +0300, Cyrill Gorcunov wrote:
>> On Fri, Dec 11, 2020 at 03:57:20PM +0300, Serge Petrenko wrote:
>>>> +
>>>> + /*
>>>> + * On linux there is new_utsname structure which
>>>> + * encodes each field to __NEW_UTS_LEN + 1 => 64 + 1 = 65.
>>>> + *
>>>> + * So lets just reserve more data in advance: 5 fields
>>>> + * 128 bytes each => 640 bytes.
>>>> + */
>>>> + static_assert(sizeof(struct utsname) <= 640,
>>>> + "utsname is bigger than expected");
>>>
>>> This static assert fails on my mac.
>>> Looks like `struct utsname` is 1280 bytes in size there.
>> Thanks, Serge! I'll force update the size.
> Force pushed the update
> ---
> /*
> * Lets reuse tail of the buffer as a temp space.
> */
> struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
> if (p >= (char *)(void *)uname_ptr)
> goto out;
>
> if (uname(uname_ptr) != 0) {
> pr_syserr("uname call failed, ignore");
> memset(uname_ptr, 0, sizeof(struct utsname));
> }
Thanks for the fix!
LGTM.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
2020-12-11 7:34 ` Serge Petrenko
@ 2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-15 8:47 ` Cyrill Gorcunov
1 sibling, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-14 22:54 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
Hi! Thanks for the patch!
> diff --git a/src/trivia/util.h b/src/trivia/util.h
> index da5a3705e..8dcc48ada 100644
> --- a/src/trivia/util.h
> +++ b/src/trivia/util.h
> @@ -450,6 +450,21 @@ fpconv_strtod(const char *nptr, char **endptr)
> return strtod(nptr, endptr);
> }
>
> +#ifndef HAVE_STRLCPY
> +/**
> + * Copy string. Unlike \a strncpy the result string
Please, use @, not \. We use it in all new code.
https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style
> + * is always null-terminated.
> + *
> + * @param dst destination buffer.
> + * @param src source string.
> + * @param size destination buffer size.
> + *
> + * @return size of \a src string.
> + */
> +size_t
> +strlcpy(char *dst, const char *src, size_t size);
> +#endif
> +
> /**
> * Check that @a str is valid utf-8 sequence and can be printed
> * unescaped.
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-11 11:38 ` Cyrill Gorcunov
@ 2020-12-14 22:54 ` Vladislav Shpilevoy
0 siblings, 0 replies; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-14 22:54 UTC (permalink / raw)
To: Cyrill Gorcunov, Serge Petrenko; +Cc: Mons Anderson, tml
On 11.12.2020 12:38, Cyrill Gorcunov wrote:
> On Fri, Dec 11, 2020 at 02:07:53PM +0300, Serge Petrenko wrote:
> ...
>>>
>>> n.b. You know every time I see `if (x != [0|NULL])` statement
>>> it driving me nuts: the language standart is pretty clear for
>>> `if ()` statement and explains how it is evaluated and I always
>>> wonder who exactly invented this explisit test for non-zero/nil?!
>>> Now *every* if statement requires 5 additional symbols for simply
>>> nothing :( I suspect the person who started to use this form
>>> simply was not aware of the language standart.
>>
>> I guess it's more about code readability rather than producing a
>> correct expression according to the standard.
>
> Hardly. I suspect it was due to lack of understanding how code
> compiles and evaluates, and being nonfamiliar with standarts ;)
I don't think it is because of not knowing basics of C. Everyone
knows that you can use anything as a boolean expression and it will
be evaluated to != 0/NULL. Even first year students.
We don't use that intentionally. Because code with implicit boolean
checks is harder to read. You can't deduct type of a variable in 'if'
when you don't use '!' for bools, != 0 for numbers, != NULL for
pointers. Also implicit boolean cast sometimes looks confusing. For
example '!strcmp(a, b)' seems like a != b, but it means the opposite.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
2020-12-11 9:31 ` Serge Petrenko
@ 2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-15 8:16 ` Cyrill Gorcunov
2020-12-20 15:45 ` Vladislav Shpilevoy
2 siblings, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-14 22:54 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
Thanks for the patch!
See 5 comments below.
On 10.12.2020 17:18, Cyrill Gorcunov wrote:
> When SIGSEGV or SIGFPE reaches the tarantool we try to gather
> all information related to the crash and print it out to the
> console (well, stderr actually). Still there is a request
> to not just show this info locally but send it out to the
> feedback server.
>
> Thus to keep gathering crash related information in one module,
> we move fatal signal handling into the separate crash.c file.
> This allows us to collect the data we need in one place and
> reuse it when we need to send reports to stderr (and to the
> feedback server, which will be implemented in next patch).
>
> Part-of #5261
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/lib/core/CMakeLists.txt | 1 +
> src/lib/core/crash.c | 291 ++++++++++++++++++++++++++++++++++++
> src/lib/core/crash.h | 32 ++++
> src/main.cc | 138 +----------------
> 4 files changed, 329 insertions(+), 133 deletions(-)
> create mode 100644 src/lib/core/crash.c
> create mode 100644 src/lib/core/crash.h
>
> diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
> new file mode 100644
> index 000000000..9572a023c
> --- /dev/null
> +++ b/src/lib/core/crash.c
> @@ -0,0 +1,291 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +
> +#include <string.h>
> +#include <unistd.h>
> +#include <time.h>
> +
> +#include "trivia/util.h"
> +#include "backtrace.h"
> +#include "crash.h"
> +#include "say.h"
> +
> +#define pr_fmt(fmt) "crash: " fmt
> +#define pr_syserr(fmt, ...) say_syserror(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_panic(fmt, ...) panic(pr_fmt(fmt), ##__VA_ARGS__)
> +
> +#ifdef TARGET_OS_LINUX
> +#ifndef __x86_64__
> +# error "Non x86-64 architectures are not supported"
> +#endif
> +struct crash_greg {
1. What is 'g' in 'greg'?
> + uint64_t r8;
> + uint64_t r9;
> + uint64_t r10;
> + uint64_t r11;
> + uint64_t r12;
> + uint64_t r13;
> + uint64_t r14;
> + uint64_t r15;
> + uint64_t di;
> + uint64_t si;
> + uint64_t bp;
> + uint64_t bx;
> + uint64_t dx;
> + uint64_t ax;
> + uint64_t cx;
> + uint64_t sp;
> + uint64_t ip;
> + uint64_t flags;
> + uint16_t cs;
> + uint16_t gs;
> + uint16_t fs;
> + uint16_t ss;
> + uint64_t err;
> + uint64_t trapno;
> + uint64_t oldmask;
> + uint64_t cr2;
> + uint64_t fpstate;
> + uint64_t reserved1[8];
> +};
> +#endif /* TARGET_OS_LINUX */
2. Perhaps you could reduce number of #ifdef-#endif
if you would define struct crash_greg as an empty
struct for all the other platforms. Then you wouldn't
need the TARGET_OS_LINUX check inside of crash_info.
But up to you.
> +
> +static struct crash_info {
> + /**
> + * These two are mostly useless as being
> + * plain addresses but keep for backward
> + * compatibility.
3. Why don't you say the same about 'siaddr'? It is also
just a plain address.
> + */
> + void *context_addr;
> + void *siginfo_addr;
> +#ifdef TARGET_OS_LINUX
> + /**
> + * Registers contents.
> + */
> + struct crash_greg greg;
> +#endif
> + /**
> + * Faulting address.
> + */
> + void *siaddr;
> + /**
> + * Crash signal number.
> + */
> + int signo;
> + /**
> + * Crash signal code.
> + */
> + int sicode;
> +#ifdef ENABLE_BACKTRACE
> + /*
4. We usually use /** in out-of-function comment's
first line.
> + * 4K of memory should be enough to keep the backtrace.
> + * In worst case it gonna be simply trimmed.
> + */
> + char backtrace_buf[4096];
5. This is a functional change. Previously for the backtrace
we used the static buffer.
1) Why did you change it?
2) Why isn't it in a separate commit? As I told you, it is really
hard to extract what did you change in a ~460 lines patch titled as
'move' to check if it does not break anything or is even needed.
Please, don't make it harder.
Also print_backtrace() becomes unused after your patch.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server Cyrill Gorcunov
2020-12-11 12:57 ` Serge Petrenko
@ 2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-16 11:16 ` Cyrill Gorcunov
1 sibling, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-14 22:54 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
Thanks for the patch!
See 17 comments below.
On 10.12.2020 17:18, Cyrill Gorcunov wrote:
> We have a feedback server which gathers information about a running instnace.
1. instnace -> instance.
> While general info is enough for now we may loose a precious information about
> crashes (such as call backtrace which caused the issue, type of build and etc).
>
> In the commit we add support of sending this kind of infomation to the feedback
2. infomation -> information.
It helps to write a commit message in an editor like Google Docs or your
local offline editor, and see all the highlighted errors, if you can't
get a spell checker working in vim or whatever you usually use.
> server. Internally we gather the reason of failure, pack it into base64 form
> and then run another tarantool instance which sends it out.
>
> A typical report might look like
>
> | {
> | "crashdump": {
> | "version": "1",
> | "data": "eyJ1bmFtZSI6eyJzeXNuYW1lIjoiTGludXgiLCJyZWxlYXNlIjoiNS"
> | "45LjExLTEwMC5mYzMyLng4Nl82NCIsInZlcnNpb24iOiIjMSBTTVAg"
> | "VHVlIE5vdiAyNCAxOToxNjo1MyBVVEMgMjAyMCIsIm1hY2hpbmUiOi"
> | "J4ODZfNjQifSwiYnVpbGQiOnsidmVyc2lvbiI6IjIuNy4wLTgzLWc5"
> | "ZTg1MDM4ZmIiLCJjbWFrZV90eXBlIjoiTGludXgteDg2XzY0LURlYn"
> | "VnIn0sInNpZ25hbCI6eyJzaWdubyI6MTEsInNpX2NvZGUiOjAsInNp"
> | "X2FkZHIiOiIweDNlODAwMDVjOGE5IiwiYmFja3RyYWNlIjoiSXpBZ0"
> | "lEQjROak14WkRBeklHbHVJR055WVhOb1gyTnZiR3hsWTNRclltWUtJ"
> | "ekVnSURCNE5qTXlaV1JoSUdsdUlHTnlZWE5vWDNOcFoyNWhiRjlqWW"
> | "lzMU1Rb2pNaUFnTUhnM1pqRmtORGt5TXpsaE9UQWdhVzRnWm5WdWJH"
> | "OWphMlpwYkdVck5qQUtJek1nSURCNE4yWXhaRFE0WkdReFl6VmxJR2"
> | "x1SUdWd2IyeHNYM2RoYVhRck5XVUtJelFnSURCNE9ETTNOekV6SUds"
> | "dUlHVndiMnhzWDNCdmJHd3JPVGtLSXpVZ0lEQjRPRE5oWkRRNElHbH"
> | "VJR1YyWDNKMWJpczBPRGNLSXpZZ0lEQjROakJtTUdGbElHbHVJSFJo"
> | "Y21GdWRHOXZiRjlzZFdGZmNuVnVYM05qY21sd2RDc3hNemdLSXpjZ0"
> | "lEQjRORFEyTldZMUlHbHVJRzFoYVc0ck5XRmpDaU00SUNBd2VEZG1N"
> | "V1EwT0dObU56QTBNaUJwYmlCZlgyeHBZbU5mYzNSaGNuUmZiV0ZwYm"
> | "l0bU1nb2pPU0FnTUhnME5EUmhNR1VnYVc0Z1gzTjBZWEowS3pKbENn"
> | "PT0iLCJ0aW1lc3RhbXAiOiIyMDIwLTEyLTEwIDE3OjA4OjQyIE1TSyJ9fQ=="
> | }
> | }
>
> The `data` value is a single string I wrapped it for commit message only.
> When `data` is decoded it consists of
>
> | {
> | "uname": {
> | "sysname": "Linux",
> | "release": "5.9.11-100.fc32.x86_64",
> | "version": "#1 SMP Tue Nov 24 19:16:53 UTC 2020",
> | "machine": "x86_64"
> | },
> | "build": {
> | "version": "2.7.0-83-g9e85038fb",
> | "cmake_type": "Linux-x86_64-Debug"
> | },
> | "signal": {
> | "signo": 11,
> | "si_code": 0,
> | "si_addr": "0x3e80005c8a9",
> | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
> | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
> | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
> | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
> | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
> | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
> | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
> | "MGUgaW4gX3N0YXJ0KzJlCg==",
> | "timestamp": "2020-12-10 17:08:42 MSK"
> | }
> | }
3. And what is the reason you encode it all into base64 second time?
> The `backtrace` istself is encoded as base64 because of newline symbols
> (and may comprise some nonascii symbols as well).
>
> There is no simple way to test this so I did it manually:
> 1) Run instance with
>
> box.cfg{log_level = 8, feedback_host="127.0.0.1:1500"}
>
> 2) Run listener shell as
>
> while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
>
> 3) Send SIGSEGV
>
> kill -11 `pidof tarantool`
>
> Once SIGSEGV is delivered the crashinfo data is generated and sent out. For
> debug purpose this data is also printed to the terminal on debug log level.
>
> Closes #5261
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>
> @TarantoolBot document
> Title: Configuration update, allow to disable sending crash information
>
> For better analisys of program crashes the information associated with
4. analisys -> analysis.
> the crash such as
>
> - utsname (similar to `uname -a` output except the network name)
> - build information
> - reason for a crash
> - call backtrace
>
> is sent to the feedback server. To disable it set `feedback_no_crashinfo`
> to `false`.
5. Doesn't the name look wrong to you? To disable something you propose
to set 'no' to 'false'. No_crashinfo = false. It is the same as
crashinfo = true. Vice versa.
I propose not to use 'no' in option names, as it is always confusing.
crashinfo = false means don't send anything, true means send.
> ---
> src/box/lua/feedback_daemon.lua | 7 +
> src/box/lua/load_cfg.lua | 3 +
> src/exports.h | 1 +
> src/lib/core/CMakeLists.txt | 2 +-
> src/lib/core/crash.c | 263 ++++++++++++++++++++++++++++++++
> src/lib/core/crash.h | 12 ++
> src/main.cc | 1 +
> test/box/admin.result | 2 +
> test/box/cfg.result | 4 +
> 9 files changed, 294 insertions(+), 1 deletion(-)
>
> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
> index 1d39012ed..ae4d1a25f 100644
> --- a/src/box/lua/feedback_daemon.lua
> +++ b/src/box/lua/feedback_daemon.lua
> @@ -360,12 +361,18 @@ local function reload(self)
> self:start()
> end
>
> +ffi.cdef[[
> +void
> +crash_cfg(void);
> +]]
> +
> setmetatable(daemon, {
> __index = {
> set_feedback_params = function()
> daemon.enabled = box.cfg.feedback_enabled
> daemon.host = box.cfg.feedback_host
> daemon.interval = box.cfg.feedback_interval
> + ffi.C.crash_cfg()
6. I realized just now that your new option has nothing to do with
the feedback daemon except the option name and the URL.
You probably should configure it via Lua C, just like almost all
the other box.cfg options do.
> reload(daemon)
> return
> end,
> diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
> index 9572a023c..06cc240a4 100644
> --- a/src/lib/core/crash.c
> +++ b/src/lib/core/crash.c
> @@ -7,16 +7,28 @@
> #include <string.h>
> #include <unistd.h>
> #include <time.h>
> +#include <sys/types.h>
> +#include <sys/utsname.h>
>
> +#include "third_party/base64.h"
> +#include "small/static.h"
> #include "trivia/util.h"
> #include "backtrace.h"
> #include "crash.h"
> +#include "cfg.h"
7. By using 'cfg' here and linking with 'misc' you introduce a cyclic
dependency. Because 'cfg' module is a part of 'server' library.
'Server' depends on 'core', which has the new crash handling system.
Also you start depending on Lua here, because 'cfg' uses Lua inside.
In addition, 'cfg' depends on box, although implicitly. Because it
takes the config options from box.cfg. I suggest you to configure this
module like other modules - via crash_cfg_*() functions. Or pass
the options as arguments into one crash_cfg().
> #include "say.h"
>
> #define pr_fmt(fmt) "crash: " fmt
> +#define pr_debug(fmt, ...) say_debug(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_info(fmt, ...) say_info(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_err(fmt, ...) say_error(pr_fmt(fmt), ##__VA_ARGS__)
> #define pr_syserr(fmt, ...) say_syserror(pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_crit(fmt, ...) fprintf(stderr, pr_fmt(fmt) "\n", ##__VA_ARGS__)
> #define pr_panic(fmt, ...) panic(pr_fmt(fmt), ##__VA_ARGS__)
>
> @@ -88,6 +104,72 @@ static struct crash_info {
> #endif
> } crash_info;
>
> +static char tarantool_path[PATH_MAX];
> +static char feedback_host[2048];
8. I don't like that you just added +6KB to the process
size, not counting tons of other memory such as backtrace
buffer (+4KB), lots of 8 byte registers, and so on, when
we fight for each byte. At least +10KB in total. But I
have no a good idea how to fix it.
> +static bool send_crashinfo = true;
> +
> +static inline uint64_t
> +timespec_to_ns(struct timespec *ts)
> +{
> + return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
> +}
> @@ -126,6 +214,179 @@ crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
> return cinfo;
> }
>
> +/**
> + * Report crash information to the feedback daemon
> + * (ie send it to feedback daemon).
> + */
> +static void
> +crash_report_feedback_daemon(struct crash_info *cinfo)
> +{
> + /*
> + * Update to a new number if format get changed.
> + */
> + static const int crashinfo_version = 1;
9. Why is it static? I suspect we discussed that at least 4
times ...
> +
> + char *p = static_alloc(SMALL_STATIC_SIZE);
> + char *e = &p[SMALL_STATIC_SIZE - 1];
> + char *head = p;
> + char *tail = &p[SMALL_STATIC_SIZE - 8];
10. Why -8?
> +
> + /*
> + * Note that while we encode the report we
> + * intensively use a tail of the allocated
> + * buffer as a temporary store.
> + */
> +
> +#define snprintf_safe(fmt, ...) \
> + do { \
> + p += snprintf(p, e - p, fmt, ##__VA_ARGS__); \
> + if (p >= e) \
> + goto out; \
> + } while (0)
> +
> + /*
> + * Lets reuse tail of the buffer as a temp space.
> + */
> + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
> + if (p >= (char *)(void *)uname_ptr)
11. Why do you need double cast? I can't imagine that it does anything
except adds confusion.
> + goto out;
> +
> + if (uname(uname_ptr) != 0) {
> + pr_syserr("uname call failed, ignore");
> + memset(uname_ptr, 0, sizeof(struct utsname));
> + }
> +
> + snprintf_safe("{");
12. snprintf_safe uses memory [begin, e]. Where 'begin' is the initial
value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7].
Assume static size is 12KB, and sizeof(*uname_ptr) is 640.
[begin = 0, snprintf e = 12287]
[11640 uname 12280]
So snprintf and uname buffers intersect. If there is a big enough dump,
it will corrupt uname object.
Is it correct?
> + snprintf_safe("\"uname\":{");
> + snprintf_safe("\"sysname\":\"%s\",", uname_ptr->sysname);
> + /*
> + * nodename might a sensitive information, skip.
> + */
> + snprintf_safe("\"release\":\"%s\",", uname_ptr->release);
> + snprintf_safe("\"version\":\"%s\",", uname_ptr->version);
> + snprintf_safe("\"machine\":\"%s\"", uname_ptr->machine);
> + snprintf_safe("},");
> +
> + if (p >= (char *)(void *)uname_ptr)
> + goto out;
> +
> + snprintf_safe("\"build\":{");
> + snprintf_safe("\"version\":\"%s\",", PACKAGE_VERSION);
> + snprintf_safe("\"cmake_type\":\"%s\"", BUILD_INFO);
> + snprintf_safe("},");
> +
> + snprintf_safe("\"signal\":{");
> + snprintf_safe("\"signo\":%d,", cinfo->signo);
> + snprintf_safe("\"si_code\":%d,", cinfo->sicode);
> + if (cinfo->signo == SIGSEGV) {
> + if (cinfo->sicode == SEGV_MAPERR) {
> + snprintf_safe("\"si_code_str\":\"%s\",",
> + "SEGV_MAPERR");
> + } else if (cinfo->sicode == SEGV_ACCERR) {
> + snprintf_safe("\"si_code_str\":\"%s\",",
> + "SEGV_ACCERR");
> + }
> + snprintf_safe("\"si_addr\":\"0x%llx\",",
> + (long long)cinfo->siaddr);
> + }
13. Instead of using all these global many KB buffers you
could also create a pipe to the new process, which the
child would read as stdin from Lua, without any tiny limits
on how many KBs it can use. And do write() to it directly.
> +
> +#ifdef ENABLE_BACKTRACE
> + /*
> + * The backtrace itself is encoded into base64 form
> + * since it might have arbitrary symbols not suitable
> + * for json encoding (newlines and etc).
> + */
> + size_t bt_len = strlen(cinfo->backtrace_buf);
> + size_t bt_elen = base64_bufsize(bt_len, BASE64_NOWRAP);
> + char *bt_base64 = &tail[-bt_elen-8];
> + if (p >= bt_base64)
> + goto out;
> + base64_encode(cinfo->backtrace_buf, bt_len,
> + bt_base64, bt_elen, BASE64_NOWRAP);
> + bt_base64[bt_elen] = '\0';
> + snprintf_safe("\"backtrace\":\"%s\",", bt_base64);
> +#endif
> +
> + char *timestamp_rt_str = &tail[-128];
> + if (p >= timestamp_rt_str)
> + goto out;
> + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
14. Why 128? And why is it hardcoded?
> + snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str);
> + snprintf_safe("}");
> + snprintf_safe("}");
> +
> + size_t report_len = p - head;
> + size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
> +
> + char *report_base64 = &tail[-report_elen-8];
15. Why -8?
16. What if report_elen is bigger than *uname_ptr? Will you override
the output buffer even more?
> + if (p >= report_base64)
> + goto out;
> + base64_encode(head, report_len, report_base64,
> + report_elen, BASE64_NOWRAP);
> + report_base64[report_elen] = '\0';
> +
> + /*
> + * Encoded report now sits at report_base64 position,
> + * at the tail of 'small' static buffer. Lets prepare
> + * the script to run (make sure the strings do not
> + * overlap).
> + */
> + p = head;
> + snprintf_safe("require(\'http.client\').post(\'%s\',"
> + "'{\"crashdump\":{\"version\":\"%d\","
> + "\"data\":", feedback_host,
> + crashinfo_version);
> + if (report_base64 - p <= (long)(report_elen - 2))
> + goto out;
> + snprintf_safe("\"%s\"", report_base64);
> + snprintf_safe("}}',{timeout=1}); os.exit(1);");
> +
> + pr_debug("crashinfo script: %s", head);
> +
> + char *exec_argv[4] = {
> + [0] = tarantool_path,
> + [1] = "-e",
> + [2] = head,
> + [3] = NULL,
> + };
> +
> + /*
> + * Can't use fork here because libev has own
> + * at_fork helpers with mutex where we might
> + * stuck (see popen code).
> + */
> + pid_t pid = vfork();
> + if (pid == 0) {
> + /*
> + * The script must exit at the end but there
> + * is no simple way to make sure from inside
> + * of a signal crash handler. So just hope it
> + * is running fine.
> + */
> + execve(exec_argv[0], exec_argv, NULL);
> + pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
> + exec_argv[0], exec_argv[0],
> + exec_argv[1], exec_argv[2]);
17. What will happen if this tarantool also will crash? Will there
be an infinite chain of crashes? For instance, if we will have an
issue with curl again. Or somebody else will build with his own
curl leading to a crash on any usage, like it already happened one
time.
> + _exit(1);
> + } else if (pid < 0) {
> + pr_crit("unable to vfork (errno %d)", errno);
> + }
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-14 22:54 ` Vladislav Shpilevoy
@ 2020-12-15 8:16 ` Cyrill Gorcunov
2020-12-20 14:48 ` Vladislav Shpilevoy
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-15 8:16 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Mon, Dec 14, 2020 at 11:54:23PM +0100, Vladislav Shpilevoy wrote:
> > +
> > +#ifdef TARGET_OS_LINUX
> > +#ifndef __x86_64__
> > +# error "Non x86-64 architectures are not supported"
> > +#endif
> > +struct crash_greg {
>
> 1. What is 'g' in 'greg'?
G(eneral).
> > +#endif /* TARGET_OS_LINUX */
>
> 2. Perhaps you could reduce number of #ifdef-#endif
> if you would define struct crash_greg as an empty
> struct for all the other platforms. Then you wouldn't
> need the TARGET_OS_LINUX check inside of crash_info.
> But up to you.
I thought about it, but you know on other OSes there might
be different names for these registers (greg name comes from
inside of the kernel) so I stick to OS specifics to be more
clear.
> > +static struct crash_info {
> > + /**
> > + * These two are mostly useless as being
> > + * plain addresses but keep for backward
> > + * compatibility.
>
> 3. Why don't you say the same about 'siaddr'? It is also
> just a plain address.
The other members are exported to report while these two
are printed in local console only. To be honest I don't
see any reason in these two members but I kept them to
not break backward compatibility.
> > +#ifdef ENABLE_BACKTRACE
> > + /*
>
> 4. We usually use /** in out-of-function comment's
> first line.
This comment is not a part of doxygen, I left it this
way intentionally. This comment for internal use.
>
> > + * 4K of memory should be enough to keep the backtrace.
> > + * In worst case it gonna be simply trimmed.
> > + */
> > + char backtrace_buf[4096];
>
> 5. This is a functional change. Previously for the backtrace
> we used the static buffer.
>
> 1) Why did you change it?
Because the buffer is used between the calls, iow it filled once
and then passed to plain report to the console and then to
json encoding. And keeping data in static buffer between the
calls is very bad idea, it bounds calls to the context. I'm ready
to spend 4K per instance for this. We can shrink the value down
to 1K if you prefer but keeping static buffer between the calls
definitely is not an option.
> 2) Why isn't it in a separate commit? As I told you, it is really
> hard to extract what did you change in a ~460 lines patch titled as
> 'move' to check if it does not break anything or is even needed.
> Please, don't make it harder.
Vlad, I remember this. The problem is that if I would do interdiff
the result will be simply unreadable (believe me, I tried). This
is why I sent the whole new patch instead. I reworked the patch
too much.
> Also print_backtrace() becomes unused after your patch.
Not really
[cyrill@grain tarantool.git] git grep -n print_backtrace
src/lib/core/backtrace.cc:436:print_backtrace(void)
src/lib/core/backtrace.cc:449: print_backtrace();
src/lib/core/backtrace.h:46:void print_backtrace(void);
src/lua/init.c:367: print_backtrace();
It is still suitable own handler, no?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper
2020-12-14 22:54 ` Vladislav Shpilevoy
@ 2020-12-15 8:47 ` Cyrill Gorcunov
0 siblings, 0 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-15 8:47 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Mon, Dec 14, 2020 at 11:54:04PM +0100, Vladislav Shpilevoy wrote:
> > +#ifndef HAVE_STRLCPY
> > +/**
> > + * Copy string. Unlike \a strncpy the result string
>
> Please, use @, not \. We use it in all new code.
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style
Thanks, force pushed.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-14 22:54 ` Vladislav Shpilevoy
@ 2020-12-16 11:16 ` Cyrill Gorcunov
2020-12-16 20:31 ` Cyrill Gorcunov
2020-12-20 14:48 ` Vladislav Shpilevoy
0 siblings, 2 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-16 11:16 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: TML
On Mon, Dec 14, 2020 at 11:54:26PM +0100, Vladislav Shpilevoy wrote:
>
> 2. infomation -> information.
>
> It helps to write a commit message in an editor like Google Docs or your
> local offline editor, and see all the highlighted errors, if you can't
> get a spell checker working in vim or whatever you usually use.
Sorry, I've been checking the patch with spell checker but happen to miss
merging fixed verion in. Thank you!
> > | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
> > | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
> > | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
> > | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
> > | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
> > | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
> > | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
> > | "MGUgaW4gX3N0YXJ0KzJlCg==",
> > | "timestamp": "2020-12-10 17:08:42 MSK"
> > | }
> > | }
>
> 3. And what is the reason you encode it all into base64 second time?
Because backtrace contains newlines and other special symbols, this is very inconvenient
when you print the json-dump into console.
>
> > The `backtrace` istself is encoded as base64 because of newline symbols
> > (and may comprise some nonascii symbols as well).
^^^ here I explained why
>
> 4. analisys -> analysis.
yup, thanks!
> >
> > is sent to the feedback server. To disable it set `feedback_no_crashinfo`
> > to `false`.
>
> 5. Doesn't the name look wrong to you? To disable something you propose
> to set 'no' to 'false'. No_crashinfo = false. It is the same as
> crashinfo = true. Vice versa.
>
> I propose not to use 'no' in option names, as it is always confusing.
> crashinfo = false means don't send anything, true means send.
ok, will do, thanks!
> > setmetatable(daemon, {
> > __index = {
> > set_feedback_params = function()
> > daemon.enabled = box.cfg.feedback_enabled
> > daemon.host = box.cfg.feedback_host
> > daemon.interval = box.cfg.feedback_interval
> > + ffi.C.crash_cfg()
>
> 6. I realized just now that your new option has nothing to do with
> the feedback daemon except the option name and the URL.
>
> You probably should configure it via Lua C, just like almost all
> the other box.cfg options do.
I'll try, thanks!
> >
> > +#include "third_party/base64.h"
> > +#include "small/static.h"
> > #include "trivia/util.h"
> > #include "backtrace.h"
> > #include "crash.h"
> > +#include "cfg.h"
>
> 7. By using 'cfg' here and linking with 'misc' you introduce a cyclic
> dependency. Because 'cfg' module is a part of 'server' library.
> 'Server' depends on 'core', which has the new crash handling system.
> Also you start depending on Lua here, because 'cfg' uses Lua inside.
>
> In addition, 'cfg' depends on box, although implicitly. Because it
> takes the config options from box.cfg. I suggest you to configure this
> module like other modules - via crash_cfg_*() functions. Or pass
> the options as arguments into one crash_cfg().
OK, thanks!
> >
> > +static char tarantool_path[PATH_MAX];
> > +static char feedback_host[2048];
>
> 8. I don't like that you just added +6KB to the process
> size, not counting tons of other memory such as backtrace
> buffer (+4KB), lots of 8 byte registers, and so on, when
> we fight for each byte. At least +10KB in total. But I
> have no a good idea how to fix it.
Personally I don't like that we use static buffer inside
crash handler. I would prefer to have the whole memory we
need being preallocated, our executable file is about 11M
(for debug build) so I don't think that even 16K of
preallocated memory matter somethow. And would like to
have a separate preallocated stack for crash handler (which
requires to disable simultaneous signals handling as we have
now, and frankly I don't understand why we need SA_NODEFER
flag). I think we can rethink about memory usage later since
this is definitely not a critical part.
> > +static void
> > +crash_report_feedback_daemon(struct crash_info *cinfo)
> > +{
> > + /*
> > + * Update to a new number if format get changed.
> > + */
> > + static const int crashinfo_version = 1;
>
> 9. Why is it static? I suspect we discussed that at least 4
> times ...
To use stack less, still 8 bytes won't make much difference
so I thik indeed we can simply se nonstatick value.
> > +
> > + char *p = static_alloc(SMALL_STATIC_SIZE);
> > + char *e = &p[SMALL_STATIC_SIZE - 1];
> > + char *head = p;
> > + char *tail = &p[SMALL_STATIC_SIZE - 8];
>
> 10. Why -8?
To have at least a gap before the end of a buffer. Note
that this addresses are not used for snprintf (where we have
-1 as the last addressable pointer) but rather for arbitrary
data. For safety in short.
> > +
> > + /*
> > + * Lets reuse tail of the buffer as a temp space.
> > + */
> > + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
> > + if (p >= (char *)(void *)uname_ptr)
>
> 11. Why do you need double cast? I can't imagine that it does anything
> except adds confusion.
Compiler doesn't like to compare a char pointer with a structure pointer
src/lib/core/crash.c:251:8: error: comparison of distinct pointer types lacks a cast [-Werror]
251 | if (p >= (void *)uname_ptr)
| ^~
> > + if (uname(uname_ptr) != 0) {
> > + pr_syserr("uname call failed, ignore");
> > + memset(uname_ptr, 0, sizeof(struct utsname));
> > + }
> > +
> > + snprintf_safe("{");
>
> 12. snprintf_safe uses memory [begin, e]. Where 'begin' is the initial
> value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7].
> Assume static size is 12KB, and sizeof(*uname_ptr) is 640.
>
> [begin = 0, snprintf e = 12287]
> [11640 uname 12280]
>
> So snprintf and uname buffers intersect. If there is a big enough dump,
> it will corrupt uname object.
>
> Is it correct?
This is undefined behaviour. I though about it and I don't like this but
there is no other way if we want to use a single buffer for all. This was
the reason while I've been using preallocated memory for all this and you
said to reuse static alloc instead. Another option is to adjust @e pointer
to be sure that we're not jumping into the tail. Letme think maybe I manage
to make this more clear and readable...
> > + if (cinfo->signo == SIGSEGV) {
> > + if (cinfo->sicode == SEGV_MAPERR) {
> > + snprintf_safe("\"si_code_str\":\"%s\",",
> > + "SEGV_MAPERR");
> > + } else if (cinfo->sicode == SEGV_ACCERR) {
> > + snprintf_safe("\"si_code_str\":\"%s\",",
> > + "SEGV_ACCERR");
> > + }
> > + snprintf_safe("\"si_addr\":\"0x%llx\",",
> > + (long long)cinfo->siaddr);
> > + }
>
> 13. Instead of using all these global many KB buffers you
> could also create a pipe to the new process, which the
> child would read as stdin from Lua, without any tiny limits
> on how many KBs it can use. And do write() to it directly.
Pipes use kernel memory for own buffers and what is worse this
memory is not accountable as far as I remember. I'm not sure how
critical this would be but I like the idea in general. Gimme some
time to figure out the details.
> > +
> > + char *timestamp_rt_str = &tail[-128];
> > + if (p >= timestamp_rt_str)
> > + goto out;
> > + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
>
> 14. Why 128? And why is it hardcoded?
This should fit any possible timestring I think. The timestring is bound
to locale and I think it might vary in size. I choose a big value but
to be honest I'm not 100% if this will be enough forever.
> > + snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str);
> > + snprintf_safe("}");
> > + snprintf_safe("}");
> > +
> > + size_t report_len = p - head;
> > + size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
> > +
> > + char *report_base64 = &tail[-report_elen-8];
>
> 15. Why -8?
Already explained.
>
> 16. What if report_elen is bigger than *uname_ptr? Will you override
> the output buffer even more?
Need to recheck, thanks!
>
> > + if (p >= report_base64)
> > + goto out;
> > + base64_encode(head, report_len, report_base64,
> > + report_elen, BASE64_NOWRAP);
> > + report_base64[report_elen] = '\0';
> > +
...
> > + pid_t pid = vfork();
> > + if (pid == 0) {
> > + /*
> > + * The script must exit at the end but there
> > + * is no simple way to make sure from inside
> > + * of a signal crash handler. So just hope it
> > + * is running fine.
> > + */
> > + execve(exec_argv[0], exec_argv, NULL);
> > + pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
> > + exec_argv[0], exec_argv[0],
> > + exec_argv[1], exec_argv[2]);
>
> 17. What will happen if this tarantool also will crash? Will there
> be an infinite chain of crashes? For instance, if we will have an
> issue with curl again. Or somebody else will build with his own
> curl leading to a crash on any usage, like it already happened one
> time.
Strictly speaking -- yes, you're right. We might have a chain of crashes
and I think we might need a command line option for this?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-16 11:16 ` Cyrill Gorcunov
@ 2020-12-16 20:31 ` Cyrill Gorcunov
2020-12-20 15:16 ` Vladislav Shpilevoy
2020-12-20 14:48 ` Vladislav Shpilevoy
1 sibling, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-16 20:31 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: TML
Vlad, take a look please once time permit. I put the whole patch
here because changes are significant.
- more strict checks for buffer overrides whe filling report
- shrink bactrace buffer down to 1K, should be enough
- use lbox_ interface for crashinfo configuration
- setup env variable to address recursive crashes
I desided to not switch to pipes because this requires more
changes in the script but I think we could swicth to pipes
at any moment if we choose.
---
Subject: [PATCH] crash: report crash data to the feedback server
We have a feedback server which gathers information about a running instance.
While general info is enough for now we may loose a precious information about
crashes (such as call backtrace which caused the issue, type of build and etc).
In the commit we add support of sending this kind of information to the feedback
server. Internally we gather the reason of failure, pack it into base64 form
and then run another Tarantool instance which sends it out.
A typical report might look like
| {
| "crashdump": {
| "version": "1",
| "data": "eyJ1bmFtZSI6eyJzeXNuYW1lIjoiTGludXgiLCJyZWxlYXNlIjoiNS"
| "45LjExLTEwMC5mYzMyLng4Nl82NCIsInZlcnNpb24iOiIjMSBTTVAg"
| "VHVlIE5vdiAyNCAxOToxNjo1MyBVVEMgMjAyMCIsIm1hY2hpbmUiOi"
| "J4ODZfNjQifSwiYnVpbGQiOnsidmVyc2lvbiI6IjIuNy4wLTgzLWc5"
| "ZTg1MDM4ZmIiLCJjbWFrZV90eXBlIjoiTGludXgteDg2XzY0LURlYn"
| "VnIn0sInNpZ25hbCI6eyJzaWdubyI6MTEsInNpX2NvZGUiOjAsInNp"
| "X2FkZHIiOiIweDNlODAwMDVjOGE5IiwiYmFja3RyYWNlIjoiSXpBZ0"
| "lEQjROak14WkRBeklHbHVJR055WVhOb1gyTnZiR3hsWTNRclltWUtJ"
| "ekVnSURCNE5qTXlaV1JoSUdsdUlHTnlZWE5vWDNOcFoyNWhiRjlqWW"
| "lzMU1Rb2pNaUFnTUhnM1pqRmtORGt5TXpsaE9UQWdhVzRnWm5WdWJH"
| "OWphMlpwYkdVck5qQUtJek1nSURCNE4yWXhaRFE0WkdReFl6VmxJR2"
| "x1SUdWd2IyeHNYM2RoYVhRck5XVUtJelFnSURCNE9ETTNOekV6SUds"
| "dUlHVndiMnhzWDNCdmJHd3JPVGtLSXpVZ0lEQjRPRE5oWkRRNElHbH"
| "VJR1YyWDNKMWJpczBPRGNLSXpZZ0lEQjROakJtTUdGbElHbHVJSFJo"
| "Y21GdWRHOXZiRjlzZFdGZmNuVnVYM05qY21sd2RDc3hNemdLSXpjZ0"
| "lEQjRORFEyTldZMUlHbHVJRzFoYVc0ck5XRmpDaU00SUNBd2VEZG1N"
| "V1EwT0dObU56QTBNaUJwYmlCZlgyeHBZbU5mYzNSaGNuUmZiV0ZwYm"
| "l0bU1nb2pPU0FnTUhnME5EUmhNR1VnYVc0Z1gzTjBZWEowS3pKbENn"
| "PT0iLCJ0aW1lc3RhbXAiOiIyMDIwLTEyLTEwIDE3OjA4OjQyIE1TSyJ9fQ=="
| }
| }
The `data` value is a single string I wrapped it for commit message only.
When `data` is decoded it consists of
| {
| "uname": {
| "sysname": "Linux",
| "release": "5.9.11-100.fc32.x86_64",
| "version": "#1 SMP Tue Nov 24 19:16:53 UTC 2020",
| "machine": "x86_64"
| },
| "build": {
| "version": "2.7.0-83-g9e85038fb",
| "cmake_type": "Linux-x86_64-Debug"
| },
| "signal": {
| "signo": 11,
| "si_code": 0,
| "si_addr": "0x3e80005c8a9",
| "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
| "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
| "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
| "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
| "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
| "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
| "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
| "MGUgaW4gX3N0YXJ0KzJlCg==",
| "timestamp": "2020-12-10 17:08:42 MSK"
| }
| }
The `backtrace` itself is encoded as base64 because of newline symbols
(and may comprise some nonascii symbols as well).
There is no simple way to test this so I did it manually:
1) Run instance with
box.cfg{log_level = 8, feedback_host="127.0.0.1:1500"}
2) Run listener shell as
while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
3) Send SIGSEGV
kill -11 `pidof tarantool`
Once SIGSEGV is delivered the crashinfo data is generated and sent out. For
debug purpose this data is also printed to the terminal on debug log level.
Closes #5261
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@TarantoolBot document
Title: Configuration update, allow to disable sending crash information
For better analysis of program crashes the information associated with
the crash such as
- utsname (similar to `uname -a` output except the network name)
- build information
- reason for a crash
- call backtrace
is sent to the feedback server. To disable it set `feedback_crashinfo`
to `false`.
---
src/box/box.cc | 16 ++
src/box/box.h | 1 +
src/box/lua/cfg.cc | 12 ++
src/box/lua/load_cfg.lua | 6 +-
src/lib/core/CMakeLists.txt | 2 +-
src/lib/core/crash.c | 279 +++++++++++++++++++++++++++++++-
src/lib/core/crash.h | 18 +++
src/main.cc | 1 +
test/app-tap/init_script.result | 1 +
test/box/admin.result | 2 +
test/box/cfg.result | 4 +
11 files changed, 338 insertions(+), 4 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index a8bc3471d..27079fd46 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -74,6 +74,7 @@
#include "sql.h"
#include "systemd.h"
#include "call.h"
+#include "crash.h"
#include "func.h"
#include "sequence.h"
#include "sql_stmt_cache.h"
@@ -1213,6 +1214,21 @@ box_set_prepared_stmt_cache_size(void)
return 0;
}
+void
+box_set_crash_params(void)
+{
+ const char *host = cfg_gets("feedback_host");
+ bool is_enabled_1 = cfg_getb("feedback_enabled");
+ bool is_enabled_2 = cfg_getb("feedback_crashinfo");
+
+ if (host != NULL && strlen(host) >= CRASH_FEEDBACK_HOST_MAX) {
+ tnt_raise(ClientError, ER_CFG, "feedback_host",
+ "the address is too long");
+ }
+
+ crash_cfg_set_params(host, is_enabled_1 && is_enabled_2);
+}
+
/* }}} configuration bindings */
/**
diff --git a/src/box/box.h b/src/box/box.h
index b47a220b7..6792ade95 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -257,6 +257,7 @@ void box_set_replication_sync_timeout(void);
void box_set_replication_skip_conflict(void);
void box_set_replication_anon(void);
void box_set_net_msg_max(void);
+void box_set_crash_params(void);
int
box_set_prepared_stmt_cache_size(void);
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 42805e602..089c770e8 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -375,6 +375,17 @@ lbox_cfg_set_replication_skip_conflict(struct lua_State *L)
return 0;
}
+static int
+lbox_cfg_set_crash_params(struct lua_State *L)
+{
+ try {
+ box_set_crash_params();
+ } catch (Exception *) {
+ luaT_error(L);
+ }
+ return 0;
+}
+
void
box_lua_cfg_init(struct lua_State *L)
{
@@ -411,6 +422,7 @@ box_lua_cfg_init(struct lua_State *L)
{"cfg_set_replication_anon", lbox_cfg_set_replication_anon},
{"cfg_set_net_msg_max", lbox_cfg_set_net_msg_max},
{"cfg_set_sql_cache_size", lbox_set_prepared_stmt_cache_size},
+ {"cfg_set_crash_params", lbox_cfg_set_crash_params},
{NULL, NULL}
};
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 770442052..80009c7af 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -33,7 +33,8 @@ end
local ifdef_feedback_set_params =
private.feedback_daemon ~= nil and
- private.feedback_daemon.set_feedback_params or nil
+ private.feedback_daemon.set_feedback_params and
+ private.cfg_set_crash_params or nil
-- all available options
local default_cfg = {
@@ -99,6 +100,7 @@ local default_cfg = {
replication_skip_conflict = false,
replication_anon = false,
feedback_enabled = true,
+ feedback_crashinfo = true,
feedback_host = "https://feedback.tarantool.io",
feedback_interval = 3600,
net_msg_max = 768,
@@ -179,6 +181,7 @@ local template_cfg = {
replication_skip_conflict = 'boolean',
replication_anon = 'boolean',
feedback_enabled = ifdef_feedback('boolean'),
+ feedback_crashinfo = ifdef_feedback('boolean'),
feedback_host = ifdef_feedback('string'),
feedback_interval = ifdef_feedback('number'),
net_msg_max = 'number',
@@ -277,6 +280,7 @@ local dynamic_cfg = {
checkpoint_wal_threshold = private.cfg_set_checkpoint_wal_threshold,
worker_pool_threads = private.cfg_set_worker_pool_threads,
feedback_enabled = ifdef_feedback_set_params,
+ feedback_crashinfo = ifdef_feedback_set_params,
feedback_host = ifdef_feedback_set_params,
feedback_interval = ifdef_feedback_set_params,
-- do nothing, affects new replicas, which query this value on start
diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index 30cf0dd15..358e98ea7 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -40,7 +40,7 @@ endif()
add_library(core STATIC ${core_sources})
-target_link_libraries(core salad small uri decNumber bit ${LIBEV_LIBRARIES}
+target_link_libraries(core salad small uri decNumber bit misc ${LIBEV_LIBRARIES}
${LIBEIO_LIBRARIES} ${LIBCORO_LIBRARIES}
${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES})
diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
index a15a13e8e..f6dd91987 100644
--- a/src/lib/core/crash.c
+++ b/src/lib/core/crash.c
@@ -7,16 +7,27 @@
#include <string.h>
#include <unistd.h>
#include <time.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include "third_party/base64.h"
+#include "small/static.h"
#include "trivia/util.h"
#include "backtrace.h"
#include "crash.h"
#include "say.h"
#define pr_fmt(fmt) "crash: " fmt
+#define pr_debug(fmt, ...) say_debug(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...) say_info(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...) say_error(pr_fmt(fmt), ##__VA_ARGS__)
#define pr_syserr(fmt, ...) say_syserror(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...) fprintf(stderr, pr_fmt(fmt) "\n", ##__VA_ARGS__)
#define pr_panic(fmt, ...) panic(pr_fmt(fmt), ##__VA_ARGS__)
+/* Use strlcpy with destination as an array */
+#define strlcpy_a(dst, src) strlcpy(dst, src, sizeof(dst))
+
#ifdef TARGET_OS_LINUX
#ifndef __x86_64__
# error "Non x86-64 architectures are not supported"
@@ -67,6 +78,10 @@ static struct crash_info {
*/
struct crash_greg greg;
#endif
+ /**
+ * Timestamp in nanoseconds (realtime).
+ */
+ uint64_t timestamp_rt;
/**
* Faulting address.
*/
@@ -81,13 +96,75 @@ static struct crash_info {
int sicode;
#ifdef ENABLE_BACKTRACE
/*
- * 4K of memory should be enough to keep the backtrace.
+ * 1K of memory should be enough to keep the backtrace.
* In worst case it gonna be simply trimmed.
*/
- char backtrace_buf[4096];
+ char backtrace_buf[1024];
#endif
} crash_info;
+static char tarantool_path[PATH_MAX];
+static char feedback_host[CRASH_FEEDBACK_HOST_MAX];
+static bool send_crashinfo = false;
+
+static inline uint64_t
+timespec_to_ns(struct timespec *ts)
+{
+ return (uint64_t)ts->tv_sec * 1000000000 + (uint64_t)ts->tv_nsec;
+}
+
+static char *
+ns_to_localtime(uint64_t timestamp, char *buf, ssize_t len)
+{
+ time_t sec = timestamp / 1000000000;
+ char *start = buf;
+ struct tm tm;
+
+ /*
+ * Use similar format as say_x logger. Except plain
+ * seconds should be enough.
+ */
+ localtime_r(&sec, &tm);
+ ssize_t total = strftime(start, len, "%F %T %Z", &tm);
+ start += total;
+ if (total < len)
+ return buf;
+ buf[len - 1] = '\0';
+ return buf;
+}
+
+void
+crash_init(const char *tarantool_bin)
+{
+ strlcpy_a(tarantool_path, tarantool_bin);
+ if (strlen(tarantool_path) < strlen(tarantool_bin))
+ pr_panic("executable path is trimmed");
+}
+
+void
+crash_cfg_set_params(const char *host, bool is_enabled)
+{
+ if (host == NULL || !is_enabled) {
+ if (send_crashinfo) {
+ pr_debug("disable sending crashinfo feedback");
+ send_crashinfo = false;
+ feedback_host[0] = '\0';
+ }
+ return;
+ }
+
+ if (strcmp(feedback_host, host) != 0) {
+ strlcpy_a(feedback_host, host);
+ if (strlen(feedback_host) < strlen(host))
+ pr_panic("feedback_host is too long");
+ }
+
+ if (!send_crashinfo) {
+ pr_debug("enable sending crashinfo feedback");
+ send_crashinfo = true;
+ }
+}
+
/**
* The routine is called inside crash signal handler so
* be careful to not cause additional signals inside.
@@ -96,6 +173,12 @@ static struct crash_info *
crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
{
struct crash_info *cinfo = &crash_info;
+ struct timespec ts;
+
+ if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
+ cinfo->timestamp_rt = timespec_to_ns(&ts);
+ else
+ cinfo->timestamp_rt = 0;
cinfo->signo = signo;
cinfo->sicode = siginfo->si_code;
@@ -126,6 +209,196 @@ crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
return cinfo;
}
+/**
+ * Mark an environment that we're in crashinfo handling, this
+ * allows us to escape recursive attempts to send report,
+ * if the action of sending report is failing itself.
+ */
+static int
+crash_mark_env_mode(void)
+{
+ const char *env_name = "TT_CRASHINFO_MODE";
+ if (getenv(env_name) != NULL) {
+ pr_crit("recursive failure detected");
+ return -1;
+ }
+
+ if (setenv(env_name, "y", 0) != 0) {
+ pr_crit("unable to setup %s", env_name);
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Report crash information to the feedback daemon
+ * (ie send it to feedback daemon).
+ */
+static void
+crash_report_feedback_daemon(struct crash_info *cinfo)
+{
+ if (crash_mark_env_mode() != 0)
+ return;
+
+ /*
+ * Update to a new number if format get changed.
+ */
+ const int crashinfo_version = 1;
+
+ char *p = static_alloc(SMALL_STATIC_SIZE);
+ char *e = &p[SMALL_STATIC_SIZE - 1];
+ char *head = p;
+ char *tail = &p[SMALL_STATIC_SIZE - 8];
+
+ /*
+ * Note that while we encode the report we
+ * intensively use a tail of the allocated
+ * buffer as a temporary store.
+ */
+
+#define snprintf_safe(__end, __fmt, ...) \
+ do { \
+ size_t size = (char *)(void *)__end - p; \
+ p += snprintf(p, size, __fmt, ##__VA_ARGS__); \
+ if (p >= (char *)(void *)__end) \
+ goto out; \
+ } while (0)
+
+ /*
+ * Lets reuse tail of the buffer as a temp space.
+ */
+ struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
+ if (p >= (char *)(void *)uname_ptr)
+ goto out;
+
+ if (uname(uname_ptr) != 0) {
+ pr_syserr("uname call failed, ignore");
+ memset(uname_ptr, 0, sizeof(struct utsname));
+ }
+
+ snprintf_safe(uname_ptr, "{");
+ snprintf_safe(uname_ptr, "\"uname\":{");
+ snprintf_safe(uname_ptr, "\"sysname\":\"%s\",", uname_ptr->sysname);
+ /*
+ * nodename might a sensitive information, skip.
+ */
+ snprintf_safe(uname_ptr, "\"release\":\"%s\",", uname_ptr->release);
+ snprintf_safe(uname_ptr, "\"version\":\"%s\",", uname_ptr->version);
+ snprintf_safe(uname_ptr, "\"machine\":\"%s\"", uname_ptr->machine);
+ snprintf_safe(uname_ptr, "},");
+
+ snprintf_safe(e, "\"build\":{");
+ snprintf_safe(e, "\"version\":\"%s\",", PACKAGE_VERSION);
+ snprintf_safe(e, "\"cmake_type\":\"%s\"", BUILD_INFO);
+ snprintf_safe(e, "},");
+
+ snprintf_safe(e, "\"signal\":{");
+ snprintf_safe(e, "\"signo\":%d,", cinfo->signo);
+ snprintf_safe(e, "\"si_code\":%d,", cinfo->sicode);
+ if (cinfo->signo == SIGSEGV) {
+ if (cinfo->sicode == SEGV_MAPERR) {
+ snprintf_safe(e, "\"si_code_str\":\"%s\",",
+ "SEGV_MAPERR");
+ } else if (cinfo->sicode == SEGV_ACCERR) {
+ snprintf_safe(e, "\"si_code_str\":\"%s\",",
+ "SEGV_ACCERR");
+ }
+ snprintf_safe(e, "\"si_addr\":\"0x%llx\",",
+ (long long)cinfo->siaddr);
+ }
+
+#ifdef ENABLE_BACKTRACE
+ /*
+ * The backtrace itself is encoded into base64 form
+ * since it might have arbitrary symbols not suitable
+ * for json encoding (newlines and etc).
+ */
+ size_t bt_len = strlen(cinfo->backtrace_buf);
+ size_t bt_elen = base64_bufsize(bt_len, BASE64_NOWRAP);
+ char *bt_base64 = &tail[-bt_elen];
+ if (p >= bt_base64)
+ goto out;
+ base64_encode(cinfo->backtrace_buf, bt_len,
+ bt_base64, bt_elen, BASE64_NOWRAP);
+ bt_base64[bt_elen] = '\0';
+ snprintf_safe(bt_base64, "\"backtrace\":\"%s\",", bt_base64);
+#endif
+
+ /* 64 bytes should be enough for longest localtime */
+ char *timestamp_rt_str = &tail[-64];
+ if (p >= timestamp_rt_str)
+ goto out;
+ ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 64);
+ snprintf_safe(timestamp_rt_str, "\"timestamp\":\"%s\"", timestamp_rt_str);
+ snprintf_safe(timestamp_rt_str, "}");
+ snprintf_safe(timestamp_rt_str, "}");
+
+ size_t report_len = p - head;
+ size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
+
+ char *report_base64 = &tail[-report_elen];
+ if (p >= report_base64)
+ goto out;
+ base64_encode(head, report_len, report_base64,
+ report_elen, BASE64_NOWRAP);
+ report_base64[report_elen] = '\0';
+
+ /*
+ * Encoded report now sits at report_base64 position,
+ * at the tail of 'small' static buffer. Lets prepare
+ * the script to run.
+ */
+ p = head;
+ snprintf_safe(report_base64,
+ "require(\'http.client\').post(\'%s\',"
+ "'{\"crashdump\":{\"version\":\"%d\","
+ "\"data\": \"%s\"}}',{timeout=1});"
+ "os.exit(1);", feedback_host,
+ crashinfo_version, report_base64);
+
+ pr_debug("crashinfo script: %s", head);
+
+ char *exec_argv[4] = {
+ [0] = tarantool_path,
+ [1] = "-e",
+ [2] = head,
+ [3] = NULL,
+ };
+
+ /*
+ * Can't use fork here because libev has own
+ * at_fork helpers with mutex where we might
+ * stuck (see popen code).
+ */
+ pid_t pid = vfork();
+ if (pid == 0) {
+ /*
+ * Environment is needed for recursive
+ * crash protection. See crash_mark_env_mode
+ * above.
+ */
+ extern char **environ;
+ /*
+ * The script must exit at the end but there
+ * is no simple way to make sure from inside
+ * of a signal crash handler. So just hope it
+ * is running fine.
+ */
+ execve(exec_argv[0], exec_argv, environ);
+ pr_crit("exec(%s,[%s,%s,%s]) failed",
+ exec_argv[0], exec_argv[0],
+ exec_argv[1], exec_argv[2]);
+ _exit(1);
+ } else if (pid < 0) {
+ pr_crit("unable to vfork (errno %d)", errno);
+ }
+
+ return;
+out:
+ pr_crit("unable to prepare a crash report");
+}
+
/**
* Report crash information to the stderr
* (usually a current console).
@@ -232,6 +505,8 @@ crash_signal_cb(int signo, siginfo_t *siginfo, void *context)
in_cb = 1;
cinfo = crash_collect(signo, siginfo, context);
crash_report_stderr(cinfo);
+ if (send_crashinfo)
+ crash_report_feedback_daemon(cinfo);
} else {
/* Got a signal while running the handler. */
fprintf(stderr, "Fatal %d while backtracing", signo);
diff --git a/src/lib/core/crash.h b/src/lib/core/crash.h
index d107cd953..800d525e5 100644
--- a/src/lib/core/crash.h
+++ b/src/lib/core/crash.h
@@ -15,6 +15,24 @@
extern "C" {
#endif /* defined(__cplusplus) */
+/**
+ * PATH_MAX is too big and 2K is recommended
+ * limit for web address.
+ */
+#define CRASH_FEEDBACK_HOST_MAX 2048
+
+/**
+ * Initialize crash subsystem.
+ */
+void
+crash_init(const char *tarantool_bin);
+
+/**
+ * Configure crash parameters.
+ */
+void
+crash_cfg_set_params(const char *host, bool is_enabled);
+
/**
* Initialize crash signal handlers.
*/
diff --git a/src/main.cc b/src/main.cc
index 391e0f878..2fce81bb3 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -687,6 +687,7 @@ main(int argc, char **argv)
title_set_script_name(argv[0]);
}
+ crash_init(tarantool_bin);
export_syms();
random_init();
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index 72aa67db2..16c5b01d2 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -10,6 +10,7 @@ checkpoint_wal_threshold:1e+18
coredump:false
election_mode:off
election_timeout:5
+feedback_crashinfo:true
feedback_enabled:true
feedback_host:https://feedback.tarantool.io
feedback_interval:3600
diff --git a/test/box/admin.result b/test/box/admin.result
index e05440f66..05debe673 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -41,6 +41,8 @@ cfg_filter(box.cfg)
- off
- - election_timeout
- 5
+ - - feedback_crashinfo
+ - true
- - feedback_enabled
- true
- - feedback_host
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 10fef006c..22a720c2c 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -29,6 +29,8 @@ cfg_filter(box.cfg)
| - off
| - - election_timeout
| - 5
+ | - - feedback_crashinfo
+ | - true
| - - feedback_enabled
| - true
| - - feedback_host
@@ -142,6 +144,8 @@ cfg_filter(box.cfg)
| - off
| - - election_timeout
| - 5
+ | - - feedback_crashinfo
+ | - true
| - - feedback_enabled
| - true
| - - feedback_host
--
2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-15 8:16 ` Cyrill Gorcunov
@ 2020-12-20 14:48 ` Vladislav Shpilevoy
2020-12-20 15:49 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 14:48 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml
>>> +static struct crash_info {
>>> + /**
>>> + * These two are mostly useless as being
>>> + * plain addresses but keep for backward
>>> + * compatibility.
>>
>> 3. Why don't you say the same about 'siaddr'? It is also
>> just a plain address.
>
> The other members are exported to report while these two
> are printed in local console only. To be honest I don't
> see any reason in these two members but I kept them to
> not break backward compatibility.
This one also is printed to local console only. So the
question is the same, why don't you call it also a useless
plain address?
>>> +#ifdef ENABLE_BACKTRACE
>>> + /*
>>
>> 4. We usually use /** in out-of-function comment's
>> first line.
>
> This comment is not a part of doxygen, I left it this
> way intentionally. This comment for internal use.
So you seriously think everything else should go to doxygen?
Please, lets be consistent. The rule is simple - /** our of
functions, /* - inside. Please, just follow it.
>>> + * 4K of memory should be enough to keep the backtrace.
>>> + * In worst case it gonna be simply trimmed.
>>> + */
>>> + char backtrace_buf[4096];
>>
>> 5. This is a functional change. Previously for the backtrace
>> we used the static buffer.
>>
>> 1) Why did you change it?
>
> Because the buffer is used between the calls, iow it filled once
> and then passed to plain report to the console and then to
> json encoding. And keeping data in static buffer between the
> calls is very bad idea, it bounds calls to the context. I'm ready
> to spend 4K per instance for this. We can shrink the value down
> to 1K if you prefer but keeping static buffer between the calls
> definitely is not an option.
>
>> 2) Why isn't it in a separate commit? As I told you, it is really
>> hard to extract what did you change in a ~460 lines patch titled as
>> 'move' to check if it does not break anything or is even needed.
>> Please, don't make it harder.
>
> Vlad, I remember this. The problem is that if I would do interdiff
> the result will be simply unreadable (believe me, I tried). This
> is why I sent the whole new patch instead. I reworked the patch
> too much.
I don't know what is 'interdiff'. But I do know what is an atomic
commit. And this commit is not atomic. Also I know when a patch is
easy to follow and easy to review. This one isn't. Because I constantly
need to look for changes you did among hundreds of lines of refactoring.
Please, split the independent changes into separate commits so as they
could be properly reviewed and the changes could be justified in the
commit messages.
>> Also print_backtrace() becomes unused after your patch.
>
> Not really
>
> [cyrill@grain tarantool.git] git grep -n print_backtrace
> src/lib/core/backtrace.cc:436:print_backtrace(void)
> src/lib/core/backtrace.cc:449: print_backtrace();
> src/lib/core/backtrace.h:46:void print_backtrace(void);
> src/lua/init.c:367: print_backtrace();
>
> It is still suitable own handler, no?
Yes, my bad.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-16 11:16 ` Cyrill Gorcunov
2020-12-16 20:31 ` Cyrill Gorcunov
@ 2020-12-20 14:48 ` Vladislav Shpilevoy
2020-12-20 18:21 ` Cyrill Gorcunov
1 sibling, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 14:48 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: TML
>>> | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
>>> | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
>>> | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
>>> | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
>>> | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
>>> | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
>>> | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
>>> | "MGUgaW4gX3N0YXJ0KzJlCg==",
>>> | "timestamp": "2020-12-10 17:08:42 MSK"
>>> | }
>>> | }
>>
>> 3. And what is the reason you encode it all into base64 second time?
>
> Because backtrace contains newlines and other special symbols, this is very inconvenient
> when you print the json-dump into console.
I didn't ask about backtrace base64. I ask about why do you encode crashdump.data
field into base64 again. Currently you do base64 encoding 2 times: one for
signal.backtrace field, and second time you encode the whole
{uname: ..., build: ..., signal: ...}. Signal.backtrace here is already in base64.
I ask about the second encoding.
>>> +
>>> + char *p = static_alloc(SMALL_STATIC_SIZE);
>>> + char *e = &p[SMALL_STATIC_SIZE - 1];
>>> + char *head = p;
>>> + char *tail = &p[SMALL_STATIC_SIZE - 8];
>>
>> 10. Why -8?
>
> To have at least a gap before the end of a buffer.
Sorry, but this is not an answer. Why not -7, -100, -3?
I don't see where these 8 bytes are used for anything,
even as a special 8-byte gap.
>>> +
>>> + /*
>>> + * Lets reuse tail of the buffer as a temp space.
>>> + */
>>> + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
>>> + if (p >= (char *)(void *)uname_ptr)
>>
>> 11. Why do you need double cast? I can't imagine that it does anything
>> except adds confusion.
>
> Compiler doesn't like to compare a char pointer with a structure pointer
>
> src/lib/core/crash.c:251:8: error: comparison of distinct pointer types lacks a cast [-Werror]
> 251 | if (p >= (void *)uname_ptr)
> | ^~
Can you just use (char *) cast instead of (void *)? Could you simply
try that on your own without me suggesting it?
>>> + if (uname(uname_ptr) != 0) {
>>> + pr_syserr("uname call failed, ignore");
>>> + memset(uname_ptr, 0, sizeof(struct utsname));
>>> + }
>>> +
>>> + snprintf_safe("{");
>>
>> 12. snprintf_safe uses memory [begin, e]. Where 'begin' is the initial
>> value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7].
>> Assume static size is 12KB, and sizeof(*uname_ptr) is 640.
>>
>> [begin = 0, snprintf e = 12287]
>> [11640 uname 12280]
>>
>> So snprintf and uname buffers intersect. If there is a big enough dump,
>> it will corrupt uname object.
>>
>> Is it correct?
>
> This is undefined behaviour. I though about it and I don't like this but
> there is no other way if we want to use a single buffer for all. This was
> the reason while I've been using preallocated memory for all this and you
> said to reuse static alloc instead. Another option is to adjust @e pointer
> to be sure that we're not jumping into the tail. Letme think maybe I manage
> to make this more clear and readable...
It is not about clear/readable, for now. You just use buffer which can be
corrupted. It is a bug. Regardless of how clear will it look.
>>> + if (cinfo->signo == SIGSEGV) {
>>> + if (cinfo->sicode == SEGV_MAPERR) {
>>> + snprintf_safe("\"si_code_str\":\"%s\",",
>>> + "SEGV_MAPERR");
>>> + } else if (cinfo->sicode == SEGV_ACCERR) {
>>> + snprintf_safe("\"si_code_str\":\"%s\",",
>>> + "SEGV_ACCERR");
>>> + }
>>> + snprintf_safe("\"si_addr\":\"0x%llx\",",
>>> + (long long)cinfo->siaddr);
>>> + }
>>
>> 13. Instead of using all these global many KB buffers you
>> could also create a pipe to the new process, which the
>> child would read as stdin from Lua, without any tiny limits
>> on how many KBs it can use. And do write() to it directly.
>
> Pipes use kernel memory for own buffers and what is worse this
> memory is not accountable as far as I remember.
I don't see any problem with kernel using memory to support pipes.
But I do see a problem in the code above, which can simply return
garbage or crash again if the dump will be big enough.
>>> +
>>> + char *timestamp_rt_str = &tail[-128];
>>> + if (p >= timestamp_rt_str)
>>> + goto out;
>>> + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
>>
>> 14. Why 128? And why is it hardcoded?
>
> This should fit any possible timestring I think. The timestring is bound
> to locale and I think it might vary in size. I choose a big value but
> to be honest I'm not 100% if this will be enough forever.
Ok, then the second question remains - why is it hardcoded in 2 places?
I remember you sent some commits to the other source files where you
changed such places to use sizeof() or const variable or something like
that. How is it different here?
>>> + snprintf_safe("\"timestamp\":\"%s\"", timestamp_rt_str);
>>> + snprintf_safe("}");
>>> + snprintf_safe("}");
>>> +
>>> + size_t report_len = p - head;
>>> + size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
>>> +
>>> + char *report_base64 = &tail[-report_elen-8];
>>
>> 15. Why -8?
>
> Already explained.
Didn't understand, sorry.
>>> + pid_t pid = vfork();
>>> + if (pid == 0) {
>>> + /*
>>> + * The script must exit at the end but there
>>> + * is no simple way to make sure from inside
>>> + * of a signal crash handler. So just hope it
>>> + * is running fine.
>>> + */
>>> + execve(exec_argv[0], exec_argv, NULL);
>>> + pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
>>> + exec_argv[0], exec_argv[0],
>>> + exec_argv[1], exec_argv[2]);
>>
>> 17. What will happen if this tarantool also will crash? Will there
>> be an infinite chain of crashes? For instance, if we will have an
>> issue with curl again. Or somebody else will build with his own
>> curl leading to a crash on any usage, like it already happened one
>> time.
>
> Strictly speaking -- yes, you're right. We might have a chain of crashes
> and I think we might need a command line option for this?
If it will crash before checking the command line option, it won't
help.
I see that feedback_daemon is disabled until first box.cfg. I suggest
you do the same here - disable crash sending by default. Enable it
with first box.cfg. In your script you don't use box.cfg, so a chain
crash won't happen. Unless some bug will override the flag responsible
for crash send.
I also realized, that before box.cfg you don't know the URL. So if I
crash the process before it, I get this:
SystemError curl: URL using bad/illegal format or missing URL: Invalid argument
fatal error, exiting the event loop
This is because you try to send the dump before the crash module is
configured.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-16 20:31 ` Cyrill Gorcunov
@ 2020-12-20 15:16 ` Vladislav Shpilevoy
2020-12-20 18:26 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 15:16 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: TML
I see you didn't push it on the branch. Please, do. Otherwise I can't
validate if it works.
See 6 comments below.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index a8bc3471d..27079fd46 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1213,6 +1214,21 @@ box_set_prepared_stmt_cache_size(void)
> return 0;
> }
>
> +void
> +box_set_crash_params(void)
> +{
> + const char *host = cfg_gets("feedback_host");
> + bool is_enabled_1 = cfg_getb("feedback_enabled");
> + bool is_enabled_2 = cfg_getb("feedback_crashinfo");
> +
> + if (host != NULL && strlen(host) >= CRASH_FEEDBACK_HOST_MAX) {
> + tnt_raise(ClientError, ER_CFG, "feedback_host",
> + "the address is too long");
1. Please, don't use exceptions in the new code. We are getting rid
of them, and such changes complicate it in future.
> + }
> +
> + crash_cfg_set_params(host, is_enabled_1 && is_enabled_2);
2. Just 'crash_cfg', please. We have lots of 'cfg' functions for
various modules, and none of them has 'set_params' suffix. Probably
because this is what 'cfg' means anyway.
> +}
> diff --git a/src/lib/core/crash.c b/src/lib/core/crash.c
> index a15a13e8e..f6dd91987 100644
> --- a/src/lib/core/crash.c
> +++ b/src/lib/core/crash.c
> @@ -81,13 +96,75 @@ static struct crash_info {
> int sicode;
> #ifdef ENABLE_BACKTRACE
> /*
> - * 4K of memory should be enough to keep the backtrace.
> + * 1K of memory should be enough to keep the backtrace.
> * In worst case it gonna be simply trimmed.
> */
> - char backtrace_buf[4096];
> + char backtrace_buf[1024];
3. I am begging you. Please. Stop mixing independent changes with
each other. Please. How can I ask otherwise? What can I do to
explain this? How can I help to understand?
> #endif> @@ -126,6 +209,196 @@ crash_collect(int signo, siginfo_t *siginfo, void *ucontext)
> return cinfo;
> }
>
> +/**
> + * Mark an environment that we're in crashinfo handling, this
> + * allows us to escape recursive attempts to send report,
> + * if the action of sending report is failing itself.
> + */
> +static int
> +crash_mark_env_mode(void)
> +{
> + const char *env_name = "TT_CRASHINFO_MODE";
> + if (getenv(env_name) != NULL) {
> + pr_crit("recursive failure detected");
> + return -1;
> + }
> +
> + if (setenv(env_name, "y", 0) != 0) {
> + pr_crit("unable to setup %s", env_name);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * Report crash information to the feedback daemon
> + * (ie send it to feedback daemon).
> + */
> +static void
> +crash_report_feedback_daemon(struct crash_info *cinfo)
> +{
> + if (crash_mark_env_mode() != 0)
> + return;
> +
> + /*
> + * Update to a new number if format get changed.
> + */
> + const int crashinfo_version = 1;
> +
> + char *p = static_alloc(SMALL_STATIC_SIZE);
> + char *e = &p[SMALL_STATIC_SIZE - 1];
> + char *head = p;
> + char *tail = &p[SMALL_STATIC_SIZE - 8];
> +
> + /*
> + * Note that while we encode the report we
> + * intensively use a tail of the allocated
> + * buffer as a temporary store.
> + */
> +
> +#define snprintf_safe(__end, __fmt, ...) \
> + do { \
> + size_t size = (char *)(void *)__end - p; \
4. Please, remove the double cast. You can use (char *) as is. Here and
in other places.
> + p += snprintf(p, size, __fmt, ##__VA_ARGS__); \
> + if (p >= (char *)(void *)__end) \
> + goto out; \
> + } while (0)
> +
> + /*
> + * Lets reuse tail of the buffer as a temp space.
> + */
> + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
> + if (p >= (char *)(void *)uname_ptr)
> + goto out;
> +
> + if (uname(uname_ptr) != 0) {
> + pr_syserr("uname call failed, ignore");
> + memset(uname_ptr, 0, sizeof(struct utsname));
> + }
> +
> + snprintf_safe(uname_ptr, "{");
> + snprintf_safe(uname_ptr, "\"uname\":{");
> + snprintf_safe(uname_ptr, "\"sysname\":\"%s\",", uname_ptr->sysname);
> + /*
> + * nodename might a sensitive information, skip.
5. 'might' what? You missed a verb.
> + */
> + snprintf_safe(uname_ptr, "\"release\":\"%s\",", uname_ptr->release);
> + snprintf_safe(uname_ptr, "\"version\":\"%s\",", uname_ptr->version);
> + snprintf_safe(uname_ptr, "\"machine\":\"%s\"", uname_ptr->machine);
> + snprintf_safe(uname_ptr, "},");
> +
> + snprintf_safe(e, "\"build\":{");
> + snprintf_safe(e, "\"version\":\"%s\",", PACKAGE_VERSION);
> + snprintf_safe(e, "\"cmake_type\":\"%s\"", BUILD_INFO);
> + snprintf_safe(e, "},");
> +
> + snprintf_safe(e, "\"signal\":{");
> + snprintf_safe(e, "\"signo\":%d,", cinfo->signo);
> + snprintf_safe(e, "\"si_code\":%d,", cinfo->sicode);
> + if (cinfo->signo == SIGSEGV) {
> + if (cinfo->sicode == SEGV_MAPERR) {
> + snprintf_safe(e, "\"si_code_str\":\"%s\",",
> + "SEGV_MAPERR");
> + } else if (cinfo->sicode == SEGV_ACCERR) {
> + snprintf_safe(e, "\"si_code_str\":\"%s\",",
> + "SEGV_ACCERR");
> + }
> + snprintf_safe(e, "\"si_addr\":\"0x%llx\",",
> + (long long)cinfo->siaddr);
> + }
> +
> +#ifdef ENABLE_BACKTRACE
> + /*
> + * The backtrace itself is encoded into base64 form
> + * since it might have arbitrary symbols not suitable
> + * for json encoding (newlines and etc).
> + */
> + size_t bt_len = strlen(cinfo->backtrace_buf);
> + size_t bt_elen = base64_bufsize(bt_len, BASE64_NOWRAP);
> + char *bt_base64 = &tail[-bt_elen];
> + if (p >= bt_base64)
> + goto out;
> + base64_encode(cinfo->backtrace_buf, bt_len,
> + bt_base64, bt_elen, BASE64_NOWRAP);
> + bt_base64[bt_elen] = '\0';
> + snprintf_safe(bt_base64, "\"backtrace\":\"%s\",", bt_base64);
> +#endif
> +
> + /* 64 bytes should be enough for longest localtime */
> + char *timestamp_rt_str = &tail[-64];
> + if (p >= timestamp_rt_str)
> + goto out;
> + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 64);
> + snprintf_safe(timestamp_rt_str, "\"timestamp\":\"%s\"", timestamp_rt_str);
> + snprintf_safe(timestamp_rt_str, "}");
> + snprintf_safe(timestamp_rt_str, "}");
> +
> + size_t report_len = p - head;
> + size_t report_elen = base64_bufsize(report_len, BASE64_NOWRAP);
> +
> + char *report_base64 = &tail[-report_elen];
> + if (p >= report_base64)
> + goto out;
> + base64_encode(head, report_len, report_base64,
> + report_elen, BASE64_NOWRAP);
> + report_base64[report_elen] = '\0';
> +
> + /*
> + * Encoded report now sits at report_base64 position,
> + * at the tail of 'small' static buffer. Lets prepare
> + * the script to run.
> + */
> + p = head;
> + snprintf_safe(report_base64,
> + "require(\'http.client\').post(\'%s\',"
> + "'{\"crashdump\":{\"version\":\"%d\","
> + "\"data\": \"%s\"}}',{timeout=1});"
> + "os.exit(1);", feedback_host,
> + crashinfo_version, report_base64);
6. I think now I understood all these buffer allocs. Except
that I still don't understand why is tail initialized as
char *tail = &p[SMALL_STATIC_SIZE - 8];
instead of just `tail = end;`.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
2020-12-11 9:31 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
@ 2020-12-20 15:45 ` Vladislav Shpilevoy
2 siblings, 0 replies; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 15:45 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
> diff --git a/src/lib/core/crash.h b/src/lib/core/crash.h
> new file mode 100644
> index 000000000..d107cd953
> --- /dev/null
> +++ b/src/lib/core/crash.h
> @@ -0,0 +1,32 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +#pragma once
> +
> +#include <stdint.h>
> +#include <signal.h>
> +#include <limits.h>> +
> +#include "trivia/config.h"
Why are these headers here? Crash.h does not use
anything except what is present in the language syntax.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-20 14:48 ` Vladislav Shpilevoy
@ 2020-12-20 15:49 ` Cyrill Gorcunov
2020-12-20 16:07 ` Vladislav Shpilevoy
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-20 15:49 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Sun, Dec 20, 2020 at 03:48:11PM +0100, Vladislav Shpilevoy wrote:
> >>> +static struct crash_info {
> >>> + /**
> >>> + * These two are mostly useless as being
> >>> + * plain addresses but keep for backward
> >>> + * compatibility.
> >>
> >> 3. Why don't you say the same about 'siaddr'? It is also
> >> just a plain address.
> >
> > The other members are exported to report while these two
> > are printed in local console only. To be honest I don't
> > see any reason in these two members but I kept them to
> > not break backward compatibility.
>
> This one also is printed to local console only. So the
> question is the same, why don't you call it also a useless
> plain address?
Because context_addr and siginfo_addr comes from a signal frame
and they are uselesss if you have no real crash dump under your
hands. You simply cant do anything with them and I don't understand
why we're printing them and who needs them.
In turn siaddr points to the memory caused the signal to trigger
and this address a way more suitable for understanding what
is happeneing than context and siginfo.
I can simply drop the comment if you dont like it.
> >>> +#ifdef ENABLE_BACKTRACE
> >>> + /*
> >>
> >> 4. We usually use /** in out-of-function comment's
> >> first line.
> >
> > This comment is not a part of doxygen, I left it this
> > way intentionally. This comment for internal use.
>
> So you seriously think everything else should go to doxygen?
Yes, I thought so.
> Please, lets be consistent. The rule is simple - /** our of
> functions, /* - inside. Please, just follow it.
Sure, will update.
> >
> >> 2) Why isn't it in a separate commit? As I told you, it is really
> >> hard to extract what did you change in a ~460 lines patch titled as
> >> 'move' to check if it does not break anything or is even needed.
> >> Please, don't make it harder.
> >
> > Vlad, I remember this. The problem is that if I would do interdiff
> > the result will be simply unreadable (believe me, I tried). This
> > is why I sent the whole new patch instead. I reworked the patch
> > too much.
>
> I don't know what is 'interdiff'. But I do know what is an atomic
interdiff shows changes on top of previous commit. the changes were
so intrusive that I posted a new version instead because interdiff
were unreadable.
> commit. And this commit is not atomic. Also I know when a patch is
> easy to follow and easy to review. This one isn't. Because I constantly
> need to look for changes you did among hundreds of lines of refactoring.
>
> Please, split the independent changes into separate commits so as they
> could be properly reviewed and the changes could be justified in the
> commit messages.
Currently the commit is big enough because I jump into signal handling
in one pass. If you prefer I can split the series in more small changes.
Or you mean the updates over previsous series should be changes in small
steps as a separate commits?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-20 15:49 ` Cyrill Gorcunov
@ 2020-12-20 16:07 ` Vladislav Shpilevoy
2020-12-20 16:58 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 16:07 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml
On 20.12.2020 16:49, Cyrill Gorcunov wrote:
> On Sun, Dec 20, 2020 at 03:48:11PM +0100, Vladislav Shpilevoy wrote:
>>>>> +static struct crash_info {
>>>>> + /**
>>>>> + * These two are mostly useless as being
>>>>> + * plain addresses but keep for backward
>>>>> + * compatibility.
>>>>
>>>> 3. Why don't you say the same about 'siaddr'? It is also
>>>> just a plain address.
>>>
>>> The other members are exported to report while these two
>>> are printed in local console only. To be honest I don't
>>> see any reason in these two members but I kept them to
>>> not break backward compatibility.
>>
>> This one also is printed to local console only. So the
>> question is the same, why don't you call it also a useless
>> plain address?
>
> Because context_addr and siginfo_addr comes from a signal frame
> and they are uselesss if you have no real crash dump under your
> hands. You simply cant do anything with them and I don't understand
> why we're printing them and who needs them.
>
> In turn siaddr points to the memory caused the signal to trigger
> and this address a way more suitable for understanding what
> is happeneing than context and siginfo.
>
> I can simply drop the comment if you dont like it.
No. I am trying to understand why one plain address matters,
and the others are not. But now I think I understood, thanks.
Keep the comment.
>> commit. And this commit is not atomic. Also I know when a patch is
>> easy to follow and easy to review. This one isn't. Because I constantly
>> need to look for changes you did among hundreds of lines of refactoring.
>>
>> Please, split the independent changes into separate commits so as they
>> could be properly reviewed and the changes could be justified in the
>> commit messages.
>
> Currently the commit is big enough because I jump into signal handling
> in one pass. If you prefer I can split the series in more small changes.
> Or you mean the updates over previsous series should be changes in small
> steps as a separate commits?
No, I don't mean commits with review fixes. I mean the patch commits.
Currently you have a patch that moves signal handling to crash.c AND it
also changes backtrace to use a 4KB buffer instead of the static buffer.
These 2 changes (move and backtrace buffer source change) are not related.
Therefore it is not correct to put them into 1 commit. It does not matter
in which order you will do these 2 changes, but one should be about move
only, and the other one about backtrace buffer update only.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in
2020-12-20 16:07 ` Vladislav Shpilevoy
@ 2020-12-20 16:58 ` Cyrill Gorcunov
0 siblings, 0 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-20 16:58 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Sun, Dec 20, 2020 at 05:07:14PM +0100, Vladislav Shpilevoy wrote:
> >
> > I can simply drop the comment if you dont like it.
>
> No. I am trying to understand why one plain address matters,
> and the others are not. But now I think I understood, thanks.
> Keep the comment.
OK.
> >> commit. And this commit is not atomic. Also I know when a patch is
> >> easy to follow and easy to review. This one isn't. Because I constantly
> >> need to look for changes you did among hundreds of lines of refactoring.
> >>
> >> Please, split the independent changes into separate commits so as they
> >> could be properly reviewed and the changes could be justified in the
> >> commit messages.
> >
> > Currently the commit is big enough because I jump into signal handling
> > in one pass. If you prefer I can split the series in more small changes.
> > Or you mean the updates over previsous series should be changes in small
> > steps as a separate commits?
>
> No, I don't mean commits with review fixes. I mean the patch commits.
> Currently you have a patch that moves signal handling to crash.c AND it
> also changes backtrace to use a 4KB buffer instead of the static buffer.
Yes, this is my bad :-( I'm really sorry for that, Vlad! Actually I found it
only after I've sent it out. I really appreciate all your comments!
> These 2 changes (move and backtrace buffer source change) are not related.
> Therefore it is not correct to put them into 1 commit. It does not matter
> in which order you will do these 2 changes, but one should be about move
> only, and the other one about backtrace buffer update only.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-20 14:48 ` Vladislav Shpilevoy
@ 2020-12-20 18:21 ` Cyrill Gorcunov
2020-12-20 18:41 ` Vladislav Shpilevoy
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-20 18:21 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: TML
On Sun, Dec 20, 2020 at 03:48:14PM +0100, Vladislav Shpilevoy wrote:
> >>> | "backtrace": "IzAgIDB4NjMxZDAzIGluIGNyYXNoX2NvbGxlY3QrYmYKIzEgID"
> >>> | "B4NjMyZWRhIGluIGNyYXNoX3NpZ25hbF9jYis1MQojMiAgMHg3ZjFkNDkyMzlhO"
> >>> | "TAgaW4gZnVubG9ja2ZpbGUrNjAKIzMgIDB4N2YxZDQ4ZGQxYzVlIGluIGVwb2xs"
> >>> | "X3dhaXQrNWUKIzQgIDB4ODM3NzEzIGluIGVwb2xsX3BvbGwrOTkKIzUgIDB4ODN"
> >>> | "hZDQ4IGluIGV2X3J1bis0ODcKIzYgIDB4NjBmMGFlIGluIHRhcmFudG9vbF9sdW"
> >>> | "FfcnVuX3NjcmlwdCsxMzgKIzcgIDB4NDQ2NWY1IGluIG1haW4rNWFjCiM4ICAwe"
> >>> | "DdmMWQ0OGNmNzA0MiBpbiBfX2xpYmNfc3RhcnRfbWFpbitmMgojOSAgMHg0NDRh"
> >>> | "MGUgaW4gX3N0YXJ0KzJlCg==",
> >>> | "timestamp": "2020-12-10 17:08:42 MSK"
> >>> | }
> >>> | }
> >>
> >> 3. And what is the reason you encode it all into base64 second time?
> >
> > Because backtrace contains newlines and other special symbols, this is very inconvenient
> > when you print the json-dump into console.
>
> I didn't ask about backtrace base64. I ask about why do you encode crashdump.data
> field into base64 again. Currently you do base64 encoding 2 times: one for
> signal.backtrace field, and second time you encode the whole
> {uname: ..., build: ..., signal: ...}. Signal.backtrace here is already in base64.
> I ask about the second encoding.
Ah. The second encoding is needed because we use a json container when
sending this data via http request. IOW, the server receives the form
| {
| "crashdump": {
| "version": "1",
| "data": "eyJ1bmFtZSI6eyJzeXNuYW1lIjoiTGludXgiLCJyZWxlYXNlIjoiNS"
| ...
Because this is more clear I think. We've two stages:
- generation of "data" value
- generation of the toplevel "crashdump"
and if you look into the code you'll see that if we need to extend any
of these stages it gonna be easier to count braces and add new fields.
Still if you prefer to not encode "data" then I can rip off the second
encoding.
> >>> + char *p = static_alloc(SMALL_STATIC_SIZE);
> >>> + char *e = &p[SMALL_STATIC_SIZE - 1];
> >>> + char *head = p;
> >>> + char *tail = &p[SMALL_STATIC_SIZE - 8];
> >>
> >> 10. Why -8?
> >
> > To have at least a gap before the end of a buffer.
>
> Sorry, but this is not an answer. Why not -7, -100, -3?
> I don't see where these 8 bytes are used for anything,
> even as a special 8-byte gap.
-8 is a native alignment pointers. You know, after thinking
more about this gap I suppose we don't need it, it won't
prevent from potential arithmetical errors and gonna be
simply confusng. You're right, I'll drop it.
> >>> +
> >>> + /*
> >>> + * Lets reuse tail of the buffer as a temp space.
> >>> + */
> >>> + struct utsname *uname_ptr = (void *)&tail[-sizeof(struct utsname)];
> >>> + if (p >= (char *)(void *)uname_ptr)
> >>
> >> 11. Why do you need double cast? I can't imagine that it does anything
> >> except adds confusion.
> >
> > Compiler doesn't like to compare a char pointer with a structure pointer
> >
> > src/lib/core/crash.c:251:8: error: comparison of distinct pointer types lacks a cast [-Werror]
> > 251 | if (p >= (void *)uname_ptr)
> > | ^~
>
> Can you just use (char *) cast instead of (void *)? Could you simply
> try that on your own without me suggesting it?
OK. Actually this simply a shorter form of the same meaning.
> >>
> >> 12. snprintf_safe uses memory [begin, e]. Where 'begin' is the initial
> >> value of 'p'. 'uname_ptr' occupies [e - 7 - sizeof(*uname_ptr), e - 7].
> >> Assume static size is 12KB, and sizeof(*uname_ptr) is 640.
> >>
> >> [begin = 0, snprintf e = 12287]
> >> [11640 uname 12280]
> >>
> >> So snprintf and uname buffers intersect. If there is a big enough dump,
> >> it will corrupt uname object.
> >>
> >> Is it correct?
> >
> > This is undefined behaviour. I though about it and I don't like this but
> > there is no other way if we want to use a single buffer for all. This was
> > the reason while I've been using preallocated memory for all this and you
> > said to reuse static alloc instead. Another option is to adjust @e pointer
> > to be sure that we're not jumping into the tail. Letme think maybe I manage
> > to make this more clear and readable...
>
> It is not about clear/readable, for now. You just use buffer which can be
> corrupted. It is a bug. Regardless of how clear will it look.
Reworked.
> >> 13. Instead of using all these global many KB buffers you
> >> could also create a pipe to the new process, which the
> >> child would read as stdin from Lua, without any tiny limits
> >> on how many KBs it can use. And do write() to it directly.
> >
> > Pipes use kernel memory for own buffers and what is worse this
> > memory is not accountable as far as I remember.
>
> I don't see any problem with kernel using memory to support pipes.
I predict problems. We already been fighing with closing inherited
fds in popen code, and this new fds are gonna be more confusing
(not considering unaccounted memory).
What we need now is to walk over _all_ fds we create in tarantool
(including sockets) code and make sure that close-on-exec flag is
set. Only after this we should mark the new fds for crash dump
as intentionally inheritable. Otherwise the code becomes a pure
mess.
Vlad, I propose to keep it in form as it is now and probably think
of fds inheritance later. The potential chain crash is addressed.
> But I do see a problem in the code above, which can simply return
> garbage or crash again if the dump will be big enough.
Which garbage you mean? The potential buffer override is addressed
in latest version.
> >>> + char *timestamp_rt_str = &tail[-128];
> >>> + if (p >= timestamp_rt_str)
> >>> + goto out;
> >>> + ns_to_localtime(cinfo->timestamp_rt, timestamp_rt_str, 128);
> >>
> >> 14. Why 128? And why is it hardcoded?
> >
> > This should fit any possible timestring I think. The timestring is bound
> > to locale and I think it might vary in size. I choose a big value but
> > to be honest I'm not 100% if this will be enough forever.
>
> Ok, then the second question remains - why is it hardcoded in 2 places?
> I remember you sent some commits to the other source files where you
> changed such places to use sizeof() or const variable or something like
> that. How is it different here?
I can define a macro for this length, but since this snippet is small
and both constants sit near each other, should not be a problem if
we opencoded them.
> >>
> >> 15. Why -8?
> >
> > Already explained.
>
> Didn't understand, sorry.
I got your point, will address.
>
> >>> + pid_t pid = vfork();
> >>> + if (pid == 0) {
> >>> + /*
> >>> + * The script must exit at the end but there
> >>> + * is no simple way to make sure from inside
> >>> + * of a signal crash handler. So just hope it
> >>> + * is running fine.
> >>> + */
> >>> + execve(exec_argv[0], exec_argv, NULL);
> >>> + pr_crit("exec(%s,[%s,%s,%s,NULL]) failed",
> >>> + exec_argv[0], exec_argv[0],
> >>> + exec_argv[1], exec_argv[2]);
> >>
> >> 17. What will happen if this tarantool also will crash? Will there
> >> be an infinite chain of crashes? For instance, if we will have an
> >> issue with curl again. Or somebody else will build with his own
> >> curl leading to a crash on any usage, like it already happened one
> >> time.
> >
> > Strictly speaking -- yes, you're right. We might have a chain of crashes
> > and I think we might need a command line option for this?
>
> If it will crash before checking the command line option, it won't
> help.
In update I setup environment variable to intercept chained crash.
> I see that feedback_daemon is disabled until first box.cfg. I suggest
> you do the same here - disable crash sending by default. Enable it
> with first box.cfg. In your script you don't use box.cfg, so a chain
> crash won't happen. Unless some bug will override the flag responsible
> for crash send.
Already did in update, you should have it in your mbox.
> I also realized, that before box.cfg you don't know the URL. So if I
> crash the process before it, I get this:
>
> SystemError curl: URL using bad/illegal format or missing URL: Invalid argument
> fatal error, exiting the event loop
>
> This is because you try to send the dump before the crash module is
> configured.
Updated.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-20 15:16 ` Vladislav Shpilevoy
@ 2020-12-20 18:26 ` Cyrill Gorcunov
0 siblings, 0 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-20 18:26 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: TML
On Sun, Dec 20, 2020 at 04:16:46PM +0100, Vladislav Shpilevoy wrote:
> I see you didn't push it on the branch. Please, do. Otherwise I can't
> validate if it works.
OK, I'll make a new branch and address your comments.
> See 6 comments below.
>
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index a8bc3471d..27079fd46 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1213,6 +1214,21 @@ box_set_prepared_stmt_cache_size(void)
> > return 0;
> > }
> >
> > +void
> > +box_set_crash_params(void)
> > +{
> > + const char *host = cfg_gets("feedback_host");
> > + bool is_enabled_1 = cfg_getb("feedback_enabled");
> > + bool is_enabled_2 = cfg_getb("feedback_crashinfo");
> > +
> > + if (host != NULL && strlen(host) >= CRASH_FEEDBACK_HOST_MAX) {
> > + tnt_raise(ClientError, ER_CFG, "feedback_host",
> > + "the address is too long");
>
> 1. Please, don't use exceptions in the new code. We are getting rid
> of them, and such changes complicate it in future.
Sure!
> > + }
> > +
> > + crash_cfg_set_params(host, is_enabled_1 && is_enabled_2);
>
> 2. Just 'crash_cfg', please. We have lots of 'cfg' functions for
> various modules, and none of them has 'set_params' suffix. Probably
> because this is what 'cfg' means anyway.
OK
> > - char backtrace_buf[4096];
> > + char backtrace_buf[1024];
>
> 3. I am begging you. Please. Stop mixing independent changes with
> each other. Please. How can I ask otherwise? What can I do to
> explain this? How can I help to understand?
Sorry Vlad :(
> > +#define snprintf_safe(__end, __fmt, ...) \
> > + do { \
> > + size_t size = (char *)(void *)__end - p; \
>
> 4. Please, remove the double cast. You can use (char *) as is. Here and
> in other places.
OK
> > + /*
> > + * nodename might a sensitive information, skip.
>
> 5. 'might' what? You missed a verb.
ouch, might contain of course
>
> 6. I think now I understood all these buffer allocs. Except
> that I still don't understand why is tail initialized as
>
> char *tail = &p[SMALL_STATIC_SIZE - 8];
>
> instead of just `tail = end;`.
yes, redundant, will drop.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-20 18:21 ` Cyrill Gorcunov
@ 2020-12-20 18:41 ` Vladislav Shpilevoy
2020-12-20 19:16 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-20 18:41 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: TML
>> I see that feedback_daemon is disabled until first box.cfg. I suggest
>> you do the same here - disable crash sending by default. Enable it
>> with first box.cfg. In your script you don't use box.cfg, so a chain
>> crash won't happen. Unless some bug will override the flag responsible
>> for crash send.
>
> Already did in update, you should have it in your mbox.
>
>> I also realized, that before box.cfg you don't know the URL. So if I
>> crash the process before it, I get this:
>>
>> SystemError curl: URL using bad/illegal format or missing URL: Invalid argument
>> fatal error, exiting the event loop
>>
>> This is because you try to send the dump before the crash module is
>> configured.
>
> Updated.
The error above I checked on your branch. And it remains the same
even with the latest commit which you didn't push yet. Because the
feedback options still are not configured before box.cfg. It has
nothing to do with the recursive crash.
It means, as I said above, that you can't send a crash before
box.cfg is done. Therefore you can simply not even try if the
host is not configured yet. It solves the error message, and alongside
solves the crash chain issue. You don't need environment variables
for this.
I would suggest not to rush responding to these emails, but give it
some thought beforehand.
It is actually strange, that feedback daemon does not work before
box.cfg. We don't see any reports from tarantool instances which are
used as an application server. But it has nothing to do with your
patch anyway.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-20 18:41 ` Vladislav Shpilevoy
@ 2020-12-20 19:16 ` Cyrill Gorcunov
2020-12-21 17:01 ` Vladislav Shpilevoy
0 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2020-12-20 19:16 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: TML
On Sun, Dec 20, 2020 at 07:41:47PM +0100, Vladislav Shpilevoy wrote:
>
> The error above I checked on your branch. And it remains the same
> even with the latest commit which you didn't push yet. Because the
> feedback options still are not configured before box.cfg. It has
> nothing to do with the recursive crash.
>
> It means, as I said above, that you can't send a crash before
> box.cfg is done. Therefore you can simply not even try if the
> host is not configured yet. It solves the error message, and alongside
> solves the crash chain issue. You don't need environment variables
> for this.
>
> I would suggest not to rush responding to these emails, but give it
> some thought beforehand.
>
> It is actually strange, that feedback daemon does not work before
> box.cfg. We don't see any reports from tarantool instances which are
> used as an application server. But it has nothing to do with your
> patch anyway.
The latest updates are in gorcunov/gh-5261-crash-report-4-rev2
---
...
static bool send_crashinfo = false;
...
static void
crash_signal_cb(int signo, siginfo_t *siginfo, void *context)
{
static volatile sig_atomic_t in_cb = 0;
struct crash_info *cinfo;
if (in_cb == 0) {
in_cb = 1;
cinfo = crash_collect(signo, siginfo, context);
crash_report_stderr(cinfo);
if (send_crashinfo)
crash_report_feedback_daemon(cinfo);
} else {
/* Got a signal while running the handler. */
fprintf(stderr, "Fatal %d while backtracing", signo);
}
/* Try to dump a core */
struct sigaction sa = {
.sa_handler = SIG_DFL,
};
sigemptyset(&sa.sa_mask);
sigaction(SIGABRT, &sa, NULL);
abort();
}
...
void
crash_cfg_set_params(const char *host, bool is_enabled)
{
if (host == NULL || !is_enabled) {
if (send_crashinfo) {
pr_debug("disable sending crashinfo feedback");
send_crashinfo = false;
feedback_host[0] = '\0';
}
return;
}
if (strcmp(feedback_host, host) != 0) {
strlcpy_a(feedback_host, host);
if (strlen(feedback_host) < strlen(host))
pr_panic("feedback_host is too long");
}
if (!send_crashinfo) {
pr_debug("enable sending crashinfo feedback");
send_crashinfo = true;
}
}
---
IOW, until explicitly configured via box.cfg{} the crash won't
be sent. I'll address all your previous comments in v5 of the branch.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server
2020-12-20 19:16 ` Cyrill Gorcunov
@ 2020-12-21 17:01 ` Vladislav Shpilevoy
0 siblings, 0 replies; 38+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-21 17:01 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: TML
On 20.12.2020 20:16, Cyrill Gorcunov wrote:
> On Sun, Dec 20, 2020 at 07:41:47PM +0100, Vladislav Shpilevoy wrote:
>>
>> The error above I checked on your branch. And it remains the same
>> even with the latest commit which you didn't push yet. Because the
>> feedback options still are not configured before box.cfg. It has
>> nothing to do with the recursive crash.
>>
>> It means, as I said above, that you can't send a crash before
>> box.cfg is done. Therefore you can simply not even try if the
>> host is not configured yet. It solves the error message, and alongside
>> solves the crash chain issue. You don't need environment variables
>> for this.
>>
>> I would suggest not to rush responding to these emails, but give it
>> some thought beforehand.
>>
>> It is actually strange, that feedback daemon does not work before
>> box.cfg. We don't see any reports from tarantool instances which are
>> used as an application server. But it has nothing to do with your
>> patch anyway.
>
> The latest updates are in gorcunov/gh-5261-crash-report-4-rev2
Indeed, on this branch the crash is not attempted to be sent.
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2020-12-21 17:01 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-10 16:18 [Tarantool-patches] [PATCH v4 0/4] crash: implement sending feedback Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 1/4] util: introduce strlcpy helper Cyrill Gorcunov
2020-12-11 7:34 ` Serge Petrenko
2020-12-11 7:58 ` Serge Petrenko
2020-12-11 10:04 ` Cyrill Gorcunov
2020-12-11 11:07 ` Serge Petrenko
2020-12-11 11:38 ` Cyrill Gorcunov
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-15 8:47 ` Cyrill Gorcunov
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 2/4] backtrace: allow to specify destination buffer Cyrill Gorcunov
2020-12-11 7:50 ` Serge Petrenko
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 3/4] crash: move fatal signal handling in Cyrill Gorcunov
2020-12-11 9:31 ` Serge Petrenko
2020-12-11 10:38 ` Cyrill Gorcunov
2020-12-11 11:12 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-15 8:16 ` Cyrill Gorcunov
2020-12-20 14:48 ` Vladislav Shpilevoy
2020-12-20 15:49 ` Cyrill Gorcunov
2020-12-20 16:07 ` Vladislav Shpilevoy
2020-12-20 16:58 ` Cyrill Gorcunov
2020-12-20 15:45 ` Vladislav Shpilevoy
2020-12-10 16:18 ` [Tarantool-patches] [PATCH v4 4/4] crash: report crash data to the feedback server Cyrill Gorcunov
2020-12-11 12:57 ` Serge Petrenko
2020-12-12 16:19 ` Cyrill Gorcunov
2020-12-12 17:07 ` Cyrill Gorcunov
2020-12-14 9:41 ` Serge Petrenko
2020-12-14 22:54 ` Vladislav Shpilevoy
2020-12-16 11:16 ` Cyrill Gorcunov
2020-12-16 20:31 ` Cyrill Gorcunov
2020-12-20 15:16 ` Vladislav Shpilevoy
2020-12-20 18:26 ` Cyrill Gorcunov
2020-12-20 14:48 ` Vladislav Shpilevoy
2020-12-20 18:21 ` Cyrill Gorcunov
2020-12-20 18:41 ` Vladislav Shpilevoy
2020-12-20 19:16 ` Cyrill Gorcunov
2020-12-21 17:01 ` Vladislav Shpilevoy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox