[Tarantool-patches] [PATCH] tools: implement toolchain for crash artefacts
Sergey Bronnikov
sergeyb at tarantool.org
Wed Mar 3 15:44:24 MSK 2021
Hello!
Igor, thanks for the patch!
I spent a bit of time testing the script. To create a coredump I ran the
tarantool binary and generated a coredump with GDB:
'gdb -batch -ex "generate-core-file" -p PID'.
To create the report I ran:
[s.bronnikov@tarantool-core-dev-mcs1 tarantool]$ ./tools/tarabrt.sh -c core.2715242 -e build/src/tarantool
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
The resulting is located here:
/home/s.bronnikov/work/tarantool/tarantool-core-N-202103031215-tarantool-core-dev-mcs1.tar.gz
If you want to upload it, choose the available resourse
(e.g. http://transfer.sh) and run the following command:
$ curl -T
/home/s.bronnikov/work/tarantool/tarantool-core-N-202103031222-tarantool-core-dev-mcs1.tar.gz
<resourse-uri>
1. Can we suppress the warnings produced by gdb? I believe they are not
useful information for users.
2. I propose to show the files included in the report at the end of its
creation.
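A minimal sketch of point 2, assuming a tar that supports -t with -z (GNU tar does). The files and the archive below are dummy stand-ins created only for the demonstration, and list_report is a hypothetical helper that is not part of the patch:

```shell
#!/bin/sh
set -eu

# Dummy stand-ins for the artefacts tarabrt.sh packs.
WORKDIR=$(mktemp -d)
printf 'dummy core\n' >"${WORKDIR}/coredump"
printf 'dummy version\n' >"${WORKDIR}/version"
ARCHIVENAME="${WORKDIR}/report.tar.gz"
tar -czf "${ARCHIVENAME}" -C "${WORKDIR}" coredump version

# Hypothetical helper: print the archive members, one per line.
list_report() {
	echo "Files included in $1:"
	tar -tzf "$1" | sed 's/^/  /'
}

REPORT_LIST=$(list_report "${ARCHIVENAME}")
echo "${REPORT_LIST}"

# Remove the demonstration files.
rm -rf "${WORKDIR}"
```

In tarabrt.sh itself such a call would probably fit right before the final message, once the archive has been written.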
When I run tarabrt.sh without specifying -e, it says that
/usr/bin/tarantool is not an ELF file:
(venv) [s.bronnikov@tarantool-core-dev-mcs1 tarantool]$ ./tools/tarabrt.sh -c core.2715242
Not an ELF file: /usr/bin/tarantool
The given BINARY file is not an ELF (see elf(5) for more info).
If you see this message, check the BINARY file the following way:
$ file /usr/bin/tarantool
(venv) [s.bronnikov@tarantool-core-dev-mcs1 tarantool]$ file /usr/bin/tarantool
/usr/bin/tarantool: ELF 64-bit LSB shared object, x86-64, version 1
(SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for
GNU/Linux 3.2.0, BuildID[sha1]=58d86a2b8bf88de3496ea4ead0a8d3c7a73e9c36,
stripped, too many notes (256)
(venv) [s.bronnikov@tarantool-core-dev-mcs1 tarantool]$
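A likely cause, judging from the output above: file(1) describes this binary as a "shared object" (which is what it prints for position-independent executables), while the script greps the output for the word "executable". One possible alternative is to look at the ELF magic bytes directly instead of parsing the human-readable description. A minimal sketch, where is_elf is a hypothetical helper and /bin/sh is just a convenient file to probe:

```shell
#!/bin/sh

# Hypothetical helper: succeed if the file starts with the ELF magic,
# i.e. the four bytes 0x7f 'E' 'L' 'F'.
is_elf() {
	magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
	[ "${magic}" = '7f454c46' ]
}

# Demonstration on a file that exists on most systems.
if is_elf /bin/sh; then
	echo "/bin/sh looks like an ELF file"
else
	echo "/bin/sh does not look like an ELF file"
fi
```

This only answers "is it an ELF at all", so a core dump would pass too. Telling an executable from a core would additionally require reading the e_type field described in elf(5).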
Also see my comments inline.
On 25.02.2021 16:23, Igor Munkin wrote:
> This patch introduces two scripts to ease crash artefacts collecting and
> loading for postmortem analysis:
>
> * tarabrt.sh - the tool collecting a tarball with the crash artefacts
> the right way: the coredump with the binary, all loaded shared libs,
> Tarantool version (this is a separate exercise to get it from the
> binary built with -O2). Besides, the tarball has a unified layout, so
> it can be easily processed with the second script:
> - /coredump - core dump file on the root level
> - /binary - tarantool executable on the root level
> - /version - plain text file on the root level with
> `tarantool --version` output
> - /checklist - plain text file on the root level with
> the list of the collected entities
> - all shared libraries used by the crashed instance - their layout
> respects the one on the host machine, so them can be easily loaded
> with the following gdb command: set sysroot $(realpath .)
>
> The script can be easily used either manually or via
> kernel.core_pattern variable.
>
> * gdb.sh - the auxiliary script originally written by @Totktonada, but
> needed to be adjusted to the crash artefacts layout every time. Since
> there is a unified layout, the original script is enhanced a bit to
> automatically load the coredump via gdb the right way.
2. Looks like gdb.sh is an internal script, so I propose to rename it to
"_gdb.sh" to make that clear to users.
>
> Closes #5569
>
> Signed-off-by: Igor Munkin <imun at tarantool.org>
> ---
>
> Issue: https://github.com/tarantool/tarantool/issues/5569
> Branch: https://github.com/tarantool/tarantool/tree/imun/gh-5569-coredump-tooling
>
> changelogs/unreleased/tarabrt.md | 3 +
> tools/gdb.sh | 59 ++++++++
> tools/tarabrt.sh | 234 +++++++++++++++++++++++++++++++
> 3 files changed, 296 insertions(+)
> create mode 100644 changelogs/unreleased/tarabrt.md
> create mode 100755 tools/gdb.sh
> create mode 100755 tools/tarabrt.sh
>
> diff --git a/changelogs/unreleased/tarabrt.md b/changelogs/unreleased/tarabrt.md
> new file mode 100644
> index 000000000..e5e616111
> --- /dev/null
> +++ b/changelogs/unreleased/tarabrt.md
> @@ -0,0 +1,3 @@
> +## feature/tools
> +
> +* Introduced tooling for crash artefacts collecting and postmortem analysis (gh-5569).
> diff --git a/tools/gdb.sh b/tools/gdb.sh
> new file mode 100755
> index 000000000..a58c47cab
> --- /dev/null
> +++ b/tools/gdb.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +set -eu
3. "set -euo pipefail" is better: a pipe is used to set the SUBPATH variable.
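One caveat: both scripts start with #!/bin/sh, and not every /bin/sh accepts "-o pipefail". A minimal sketch of a portable way to enable it, probing in a subshell first; the PIPEFAIL and RESULT variables exist only for this demonstration:

```shell
#!/bin/sh
set -eu

# Probe in a subshell so that an unsupported option does not abort the script.
if (set -o pipefail) 2>/dev/null; then
	set -o pipefail
	PIPEFAIL=on
else
	PIPEFAIL=off
fi

# Without pipefail the status of a pipeline is the status of its last
# command, so the failure of `false` is masked by the successful `cat`.
if false | cat; then
	RESULT=masked
else
	RESULT=detected
fi

echo "pipefail=${PIPEFAIL} failure=${RESULT}"
```

With pipefail on, a failing gdb on the left side of the SUBPATH pipeline would make the whole assignment fail, and "set -e" would then stop the script instead of continuing with an empty value.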
> +
> +# Check that gdb is installed.
> +if ! command -v gdb >/dev/null; then
> + cat <<NOGDB
> +gdb is not installed or not found in the PATH.
> +
> +Install gdb or adjust you PATH if you are using non-system gdb and
> +try once more.
> +NOGDB
> + exit 1;
> +fi
> +
> +VERSION=${PWD}/version
> +
> +# Check the location: if the coredump artefacts are collected via
> +# `tarabrt.sh' there should be /version file in the root of the
> +# unpacked tarball. Otherwise, there is no guarantee the coredump
> +# is collected the right way and we can't proceed loading it.
> +if [ ! -f "${VERSION}" ]; then
> + cat <<NOARTEFACTS
> +${VERSION} file is missing.
> +
> +If the coredump artefacts are collected via \`tararbrt.sh' tool
> +there should be /version file in the root of the unpacked tarball
> +(i.e. ${PWD}).
> +If version file is missing, there is no guarantee the coredump
> +is collected the right way and its loading can't be proceeded
> +with this script. Check whether current working directory is the
> +tarball root, or try load the core dump file manually.
> +NOARTEFACTS
> + exit 1;
> +fi
> +
> +REVISION=$(grep -oP 'Tarantool \d+\.\d+\.\d+-\d+-g\K[a-f0-9]+' "$VERSION")
> +cat <<SOURCES
> +================================================================================
> +
> +Do not forget to properly setup the environment:
> +* git clone https://github.com/tarantool/tarantool.git sources
> +* cd !$
> +* git checkout $REVISION
> +* git submodule update --recursive --init
> +
> +================================================================================
> +SOURCES
> +
> +# Define the build path to be substituted with the source path.
> +# XXX: Check the absolute path on the function <main> definition
> +# considering it is located in src/main.cc within Tarantool repo.
> +SUBPATH=$(gdb -batch -n ./tarantool -ex 'info line main' | \
> + grep -oP 'Line \d+ of \"\K.+(?=\/src\/main\.cc\")')
> +
> +# Launch gdb and load coredump with all related artefacts.
> +gdb ./tarantool \
> + -ex "set sysroot $(realpath .)" \
> + -ex "set substitute-path $SUBPATH sources" \
> + -ex 'core coredump'
> diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
> new file mode 100755
> index 000000000..3d44803be
> --- /dev/null
> +++ b/tools/tarabrt.sh
> @@ -0,0 +1,234 @@
> +#!/bin/sh
> +set -eu
4. "set -euo pipefail" is better here too: pipes are used in the checks and
for collecting the list of shared libraries.
> +
> +TOOL=$(basename "$0")
> +HELP=$(cat <<HELP
> +${TOOL} - Tarantool Automatic Bug Reporting Tool
> +
> +This tool collects all required artefacts (listed below) and packs them into
> +a single archive with unified format:
> + - /checklist - the plain text file with the list of tarball contents
> + - /version - the plain text file containing \`tarantool --version' output
> + - /tarantool - the executable binary file produced the core dump
> + - /coredump - the core dump file produced by the executable
> + - all shared libraries loaded (even via dlopen(3)) at the crash moment.
> +
> +SYNOPSIS
> +
> + ${TOOL} [-h] [-c core] [-d dir] [-e executable] [-p procID] [-t datetime]
> +
> +Supported options are:
> + -c COREDUMP Use file COREDUMP as a core dump to examine.
5. I propose to add links to the documentation on how to set up coredump
paths and where to find coredumps in a system. For Linux it is core(5) [1].
[1] https://man7.org/linux/man-pages/man5/core.5.html
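As an illustration of what such documentation could point users to, here is a minimal sketch for Linux that prints the two settings core(5) describes. It assumes procfs is mounted and that the shell's ulimit builtin supports -c (bash and dash do, though POSIX does not require it):

```shell
#!/bin/sh

# Where the kernel writes cores, or which program it pipes them to.
CORE_PATTERN_FILE=/proc/sys/kernel/core_pattern
if [ -r "${CORE_PATTERN_FILE}" ]; then
	CORE_PATTERN=$(cat "${CORE_PATTERN_FILE}")
else
	CORE_PATTERN=unknown
fi

# Soft limit on the core file size for this shell; 0 disables core dumps.
CORE_LIMIT=$(ulimit -c)

echo "kernel.core_pattern: ${CORE_PATTERN}"
echo "core size limit:     ${CORE_LIMIT}"

case "${CORE_PATTERN}" in
	unknown) echo "core_pattern is not readable on this system";;
	'|'*)    echo "cores are piped to a helper program";;
	/*)      echo "cores are written to an absolute path";;
	*)       echo "cores are written relative to the working directory of the crashed process";;
esac
```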
> +
> + -d DIRECTORY Create the resulting archive with the artefacts
> + within DIRECTORY.
> +
> + -e TARANTOOL Use file TARANTOOL as the executable file for
> + examining with a core dump COREDUMP. If PID is
> + specified, the one from /proc/PID/exe is chosen
> + (see proc(5) for more info). If TARANTOOL is
> + omitted, /usr/bin/tarantool is chosen.
> +
> + -p PID PID of the dumped process, as seen in the PID
> + namespace in which the given process resides
> + (see %p in core(5) for more info). This flag
> + have to be set when ${TOOL} is used as
> + kernel.core_pattern pipeline script.
> +
> + -t DATETIME Time of dump, expressed as seconds since the
> + epoch, 1970-01-01 00:00:00 +0000 (UTC).
> +
6. Why would users want to specify the datetime manually?
We can get the datetime in the script with stat:
$ stat -c '%Y' core.2715242
1614773664
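A minimal sketch of that idea, assuming GNU coreutils (stat -c, touch -d @N and date -d @N are GNU extensions). The core file here is an empty temporary file created only for the demonstration:

```shell
#!/bin/sh
set -eu

# Empty stand-in for a core file.
COREFILE=$(mktemp)

# Pretend the core was written at the moment from the example above.
touch -d @1614773664 "${COREFILE}"

# %Y is the last modification time in seconds since the epoch.
TIME=$(stat -c '%Y' "${COREFILE}")

# The same date format tarabrt.sh uses for the archive name.
STAMP=$(date +%Y%m%d%H%M -d @"${TIME}")

echo "${TIME} ${STAMP}"
rm -f "${COREFILE}"
```

This only works when the core already exists as a file. In the kernel.core_pattern mode the core arrives on stdin, so there the %t value passed via -t is probably still needed.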
> + -h Shows this message and exit.
> +
> +USAGE
> +
> + - Manual usage. User can simply pack all necessary artefacts by running the
> + following command.
> + $ /path/to/${TOOL} -c ./core -d /tmp
> +
> + - Automatic usage. If user faces the failures often, one can set this script
> + as a pipe reciever in kernel.core_pattern syntax.
> + # sysctl -w kernel.core_pattern="|/absolute/path/to/${TOOL} -d /var/core -p %p -t %t"
> +
> +HELP
> +)
> +
> +# Parse CLI options.
> +OPTIONS=$(getopt -o c:d:e:hp:t: -n "${TOOL}" -- "$@")
> +eval set -- "${OPTIONS}"
> +while true; do
> + case "$1" in
> + --) shift; break;;
> + -c) COREFILE=$2; shift 2;;
> + -d) COREDIR=$2; shift 2;;
> + -e) BINARY=$2; shift 2;;
> + -p) PID=$2; shift 2;;
> + -t) TIME=$2; shift 2;;
> + -h) printf "%s\n", "${HELP}";
> + exit 0;;
> + *) printf "Invalid option: $1\n%s\n", "${HELP}";
> + exit 1;;
> + esac
> +done
> +
> +# Use the default values for the remaining parameters.
> +BINARY=${BINARY:-/usr/bin/tarantool}
> +COREDIR=${COREDIR:-${PWD}}
> +COREFILE=${COREFILE:-}
> +PID=${PID:-}
> +TIME=${TIME:-$(date +%s)}
> +
> +# XXX: This section handles the case when the script is used for
> +# kernel.core_pattern. If PID is set and there is a directory in
> +# procfs with this PID, the script processes the core dumped by
> +# this process. If the process exe (or strictly saying its comm)
> +# is not 'tarantool' then the coredump is simply saved to the
> +# COREDIR; otherwise the dumped core is packed to the tarball.
> +if [ -n "${PID}" ] && [ -d /proc/"${PID}" ]; then
> + BINARY=$(readlink /proc/"${PID}"/exe)
> + CMDNAME=$(sed -z 's/\s$//' /proc/"${PID}"/comm)
> + COREFILE=${COREDIR}/${CMDNAME}-core.${PID}.${TIME}
> + cat >"${COREFILE}"
> + if [ "${CMDNAME}" != 'tarantool' ]; then
> + [ -t 1 ] && cat <<ALIENCOREDUMP
> +/proc/${PID}/comm doesn't equal to 'tarantool', so we assume the
> +obtained core is dumped by \`${CMDNAME}' and should be packed in
> +a different way. As a result it is simply stored to the file, so
> +you can process it on your own.
> +
> +The file with core dump: ${COREFILE}
> +ALIENCOREDUMP
> + exit 0;
> + fi
> +fi
> +
> +if [ -z "${COREFILE}" ]; then
> + [ -t 1 ] && cat <<NOCOREDUMP
> +There is no core dump file passed to ${TOOL}. The artefacts can't
> +be collected. If you see this message, check the usage by running
> +\`${TOOL} -h': -c option is the obligatory one.
> +NOCOREDUMP
> + exit 1;
> +fi
> +
> +if file "${COREFILE}" | grep -qv 'core file'; then
> + [ -t 1 ] && cat <<NOTACOREDUMP
> +Not a core dump: ${COREFILE}
> +
> +The given COREDUMP file is not a valid core dump (see core(5) for
> +more info) or not even an ELF (see elf(5) for more info). If you
> +see this message, check the COREDUMP file the following way:
> +$ file ${COREFILE}
> +NOTACOREDUMP
> + exit 1;
> +fi
> +
> +# Check that gdb is installed.
> +if ! command -v gdb >/dev/null; then
> + [ -t 1 ] && cat <<NOGDB
> +gdb is not installed, but it is obligatory for collecting the
> +loaded shared libraries from the core dump.
> +
> +You can proceed collecting the artefacts manually later by running
> +the following command:
> +$ ${TOOL} -e ${BINARY} -c ${COREFILE}
> +NOGDB
> + exit 1;
> +fi
> +
> +if file "${BINARY}" | grep -qv 'executable'; then
> + [ -t 1 ] && cat <<NOTELF
> +Not an ELF file: ${BINARY}
> +
> +The given BINARY file is not an ELF (see elf(5) for more info).
> +If you see this message, check the BINARY file the following way:
> +$ file ${BINARY}
> +NOTELF
> + exit 1;
> +fi
> +
> +if gdb -batch -n "${BINARY}" -ex 'info symbol tarantool_version' 2>/dev/null | \
> + grep -q 'tarantool_version in section .text'
> +then
> + # XXX: This is a very ugly hack to implement 'unless'
> + # operator in bash for a long pipeline as a conditional.
> + :
> +else
> + [ -t 1 ] && cat <<NOTARANTOOL
> +Not a Tarantool binary: ${BINARY}
> +
> +The given BINARY file is not a Tarantool executable: there is no a
> +signature symbol in the binary file. If you see this message,
> +check the BINARY file the following way:
> +$ ${BINARY} --help
> +NOTARANTOOL
> + exit 1;
> +fi
> +
> +# Resolve the host name if possible.
> +HOSTNAME=$(hostname 2>/dev/null || echo hostname)
> +
> +# Proceed with collecting and packing artefacts.
> +TMPDIR=$(mktemp -d -p "${COREDIR}")
> +TARLIST=${TMPDIR}/tarlist
> +VERSION=${TMPDIR}/version
> +ARCHIVENAME=${COREDIR}/tarantool-core-${PID:-N}-$(date +%Y%m%d%H%M -d @"${TIME}")-${HOSTNAME%%.*}.tar.gz
> +
> +# Dump the version to checkout the right commit later.
> +${BINARY} --version >"${VERSION}"
> +
> +# Collect the most important artefacts.
> +{
> + echo "${BINARY}"
> + echo "${COREFILE}"
> + echo "${VERSION}"
> +} >>"${TARLIST}"
> +
> +SEPARATOR1="Shared Object Library"
> +SEPARATOR2="Shared library is missing debugging information"
> +# XXX: This is kinda "postmortem ldd": the command below dumps the
> +# full list of the shared libraries the binary is linked against
> +# or those loaded via dlopen at the platform runtime.
> +# This is black voodoo magic. Do not touch. You are warned.
> +if gdb -batch -n "${BINARY}" -c "${COREFILE}" -ex 'info shared' | \
> + sed -n "/${SEPARATOR1}/,/${SEPARATOR2}/p;/${SEPARATOR2}/q" | \
> + awk '{ print $NF }' | grep '^/' >>"${TARLIST}"
> +then
> + # XXX: This is a very ugly hack to implement 'unless'
> + # operator in bash for a long pipeline as a conditional.
> + :
> +else
> + [ -t 1 ] && cat <<COREMISMATCH
> +Core dump file is produced by the different Tarantool executable.
> +
> +Looks like '${COREFILE}' is not generated by \`${BINARY}'.
> +If you see this message, please check that the given COREDUMP
> +is produced by the specified BINARY.
> +There are some temporary artefacts in ${TMPDIR}.
> +Remove it manually if you don't need them anymore.
> +COREMISMATCH
> + exit 1;
> +fi
> +
> +# Pack everything listed in TARLIST file into a tarball. To unify
> +# the archive format BINARY, COREFILE, VERSION and TARLIST are
> +# renamed while packing.
> +tar -czhf "${ARCHIVENAME}" -P -T "${TARLIST}" \
> + --transform="s|${BINARY}|tarantool|" \
> + --transform="s|${COREFILE}|coredump|" \
> + --transform="s|${TARLIST}|checklist|" \
> + --transform="s|${VERSION}|version|" \
> + --add-file="${TARLIST}"
> +
> +[ -t 1 ] && cat <<FINALIZE
> +The resulting is located here: ${ARCHIVENAME}
> +
> +If you want to upload it, choose the available resourse
7. Typo: resourse -> resource
> +(e.g. http://transfer.sh) and run the following command:
> +$ curl -T ${ARCHIVENAME} <resourse-uri>
8. Typo: resourse-uri -> resource-uri
> +FINALIZE
> +
> +# Cleanup temporary files.
> +[ -f "${TARLIST}" ] && rm -f "${TARLIST}"
> +[ -f "${VERSION}" ] && rm -f "${VERSION}"
> +[ -d "${TMPDIR}" ] && rmdir "${TMPDIR}"