[Tarantool-patches] [PATCH] tools: implement toolchain for crash artefacts

Sergey Bronnikov sergeyb at tarantool.org
Wed Mar 3 15:44:24 MSK 2021


Hello!

Igor, thanks for the patch!

I spent some time testing the script. To create a coredump, I ran the
tarantool binary and generated a coredump with GDB:
"gdb -batch -ex "generate-core-file" -p PID".

To create a report, I ran:

[s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ ./tools/tarabrt.sh -c core.2715242 -e build/src/tarantool

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
The resulting is located here: 
/home/s.bronnikov/work/tarantool/tarantool-core-N-202103031215-tarantool-core-dev-mcs1.tar.gz


If you want to upload it, choose the available resourse
(e.g. http://transfer.sh) and run the following command:
$ curl -T 
/home/s.bronnikov/work/tarantool/tarantool-core-N-202103031222-tarantool-core-dev-mcs1.tar.gz 
<resourse-uri>

1. Can we suppress the warnings produced by gdb? I believe they are not
useful to users.

2. I propose to show the list of files included in the report at the end
of its creation.
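
Both points could be covered roughly as follows. This is only a sketch:
`quiet' and `list_report' are hypothetical helper names that are not part
of the patch, and it assumes gdb prints these warnings to stderr.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the patch.

# Run a command with its stderr discarded, e.g. to hide gdb's
# "Loadable section ... outside of ELF segments" warnings.
quiet() {
	"$@" 2>/dev/null
}

# Print the names of the files packed into the given tarball.
list_report() {
	tar -tzf "$1"
}

# Possible usage inside tarabrt.sh (variable names taken from the patch):
#   quiet gdb -batch -n "${BINARY}" -c "${COREFILE}" -ex 'info shared'
#   list_report "${ARCHIVENAME}"
```

Discarding stderr also hides real gdb errors, so it may be better to do
this only for the calls whose exit status is checked anyway.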


When I run tarabrt.sh without specifying -e, it says that
/usr/bin/tarantool is not an ELF file:

(venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ ./tools/tarabrt.sh -c core.2715242
Not an ELF file: /usr/bin/tarantool

The given BINARY file is not an ELF (see elf(5) for more info).
If you see this message, check the BINARY file the following way:
$ file /usr/bin/tarantool
(venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ file /usr/bin/tarantool
/usr/bin/tarantool: ELF 64-bit LSB shared object, x86-64, version 1 
(SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for 
GNU/Linux 3.2.0, BuildID[sha1]=58d86a2b8bf88de3496ea4ead0a8d3c7a73e9c36, 
stripped, too many notes (256)
(venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$
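
It looks like the check fails because file(1) reports this binary as
"shared object" rather than "executable", so grepping for 'executable'
rejects it. One possible alternative is to check the ELF magic bytes
directly. A minimal sketch, where `is_elf' is a hypothetical helper name
and `head -c' is assumed to be available:

```shell
#!/bin/sh
# Hypothetical replacement for the `file | grep executable' check.
# Succeeds if the file starts with the 4-byte ELF magic:
# 0x7f 'E' 'L' 'F' (see elf(5)).
is_elf() {
	[ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = '7f454c46' ]
}

# Possible usage inside tarabrt.sh:
#   if ! is_elf "${BINARY}"; then ... exit 1; fi
```

This accepts any ELF file, including shared libraries, so the later
check for the tarantool_version symbol is still needed.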

Also see my comments inline.

On 25.02.2021 16:23, Igor Munkin wrote:
> This patch introduces two scripts to ease crash artefacts collecting and
> loading for postmortem analysis:
>
> * tarabrt.sh - the tool collecting a tarball with the crash artefacts
>    the right way: the coredump with the binary, all loaded shared libs,
>    Tarantool version (this is a separate exercise to get it from the
>    binary built with -O2). Besides, the tarball has a unified layout, so
>    it can be easily processed with the second script:
>    - /coredump - core dump file on the root level
>    - /binary - tarantool executable on the root level
>    - /version - plain text file on the root level with
>      `tarantool --version` output
>    - /checklist - plain text file on the root level with
>      the list of the collected entities
>    - all shared libraries used by the crashed instance - their layout
>      respects the one on the host machine, so them can be easily loaded
>      with the following gdb command: set sysroot $(realpath .)
>
>    The script can be easily used either manually or via
>    kernel.core_pattern variable.
>
> * gdb.sh - the auxiliary script originally written by @Totktonada, but
>    needed to be adjusted to the crash artefacts layout every time. Since
>    there is a unified layout, the original script is enhanced a bit to
>    automatically load the coredump via gdb the right way.
2. It looks like gdb.sh is an internal script, so I propose to rename it
to "_gdb.sh" to make this clear to users.
>
> Closes #5569
>
> Signed-off-by: Igor Munkin <imun at tarantool.org>
> ---
>
> Issue: https://github.com/tarantool/tarantool/issues/5569
> Branch: https://github.com/tarantool/tarantool/tree/imun/gh-5569-coredump-tooling
>
>   changelogs/unreleased/tarabrt.md |   3 +
>   tools/gdb.sh                     |  59 ++++++++
>   tools/tarabrt.sh                 | 234 +++++++++++++++++++++++++++++++
>   3 files changed, 296 insertions(+)
>   create mode 100644 changelogs/unreleased/tarabrt.md
>   create mode 100755 tools/gdb.sh
>   create mode 100755 tools/tarabrt.sh
>
> diff --git a/changelogs/unreleased/tarabrt.md b/changelogs/unreleased/tarabrt.md
> new file mode 100644
> index 000000000..e5e616111
> --- /dev/null
> +++ b/changelogs/unreleased/tarabrt.md
> @@ -0,0 +1,3 @@
> +## feature/tools
> +
> +* Introduced tooling for crash artefacts collecting and postmortem analysis (gh-5569).
> diff --git a/tools/gdb.sh b/tools/gdb.sh
> new file mode 100755
> index 000000000..a58c47cab
> --- /dev/null
> +++ b/tools/gdb.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +set -eu
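3. "set -euo pipefail" is better: a pipe is used to set the SUBPATH
variable. Note that pipefail is not supported by every /bin/sh, so the
shebang may need to be changed as well.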
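
If the #!/bin/sh shebang has to stay, pipefail could be enabled only
where the shell knows it. A sketch, where `enable_pipefail' is a
hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn on pipefail only if the running shell
# supports it. The probe runs in a subshell, so an unsupported
# option does not abort the script itself.
enable_pipefail() {
	if (set -o pipefail) 2>/dev/null; then
		set -o pipefail
		return 0
	fi
	return 1
}

# Possible usage right after `set -eu':
#   enable_pipefail || echo 'pipefail is not supported' >&2
```

Without pipefail the exit status of a pipeline is the status of its last
command only, so a failing gdb on the left-hand side goes unnoticed.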
> +
> +# Check that gdb is installed.
> +if ! command -v gdb >/dev/null; then
> +	cat <<NOGDB
> +gdb is not installed or not found in the PATH.
> +
> +Install gdb or adjust you PATH if you are using non-system gdb and
> +try once more.
> +NOGDB
> +	exit 1;
> +fi
> +
> +VERSION=${PWD}/version
> +
> +# Check the location: if the coredump artefacts are collected via
> +# `tarabrt.sh' there should be /version file in the root of the
> +# unpacked tarball. Otherwise, there is no guarantee the coredump
> +# is collected the right way and we can't proceed loading it.
> +if [ ! -f "${VERSION}" ]; then
> +	cat <<NOARTEFACTS
> +${VERSION} file is missing.
> +
> +If the coredump artefacts are collected via \`tararbrt.sh' tool
> +there should be /version file in the root of the unpacked tarball
> +(i.e. ${PWD}).
> +If version file is missing, there is no guarantee the coredump
> +is collected the right way and its loading can't be proceeded
> +with this script. Check whether current working directory is the
> +tarball root, or try load the core dump file manually.
> +NOARTEFACTS
> +	exit 1;
> +fi
> +
> +REVISION=$(grep -oP 'Tarantool \d+\.\d+\.\d+-\d+-g\K[a-f0-9]+' "$VERSION")
> +cat <<SOURCES
> +================================================================================
> +
> +Do not forget to properly setup the environment:
> +* git clone https://github.com/tarantool/tarantool.git sources
> +* cd !$
> +* git checkout $REVISION
> +* git submodule update --recursive --init
> +
> +================================================================================
> +SOURCES
> +
> +# Define the build path to be substituted with the source path.
> +# XXX: Check the absolute path on the function <main> definition
> +# considering it is located in src/main.cc within Tarantool repo.
> +SUBPATH=$(gdb -batch -n ./tarantool -ex 'info line main' | \
> +	grep -oP 'Line \d+ of \"\K.+(?=\/src\/main\.cc\")')
> +
> +# Launch gdb and load coredump with all related artefacts.
> +gdb ./tarantool \
> +    -ex "set sysroot $(realpath .)" \
> +    -ex "set substitute-path $SUBPATH sources" \
> +    -ex 'core coredump'
> diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
> new file mode 100755
> index 000000000..3d44803be
> --- /dev/null
> +++ b/tools/tarabrt.sh
> @@ -0,0 +1,234 @@
> +#!/bin/sh
> +set -eu
4. "set -euo pipefail" is better here too: pipes are used to check the
binary and to collect the list of shared libraries.
> +
> +TOOL=$(basename "$0")
> +HELP=$(cat <<HELP
> +${TOOL} - Tarantool Automatic Bug Reporting Tool
> +
> +This tool collects all required artefacts (listed below) and packs them into
> +a single archive with unified format:
> +  - /checklist - the plain text file with the list of tarball contents
> +  - /version   - the plain text file containing \`tarantool --version' output
> +  - /tarantool - the executable binary file produced the core dump
> +  - /coredump  - the core dump file produced by the executable
> +  - all shared libraries loaded (even via dlopen(3)) at the crash moment.
> +
> +SYNOPSIS
> +
> +  ${TOOL} [-h] [-c core] [-d dir] [-e executable] [-p procID] [-t datetime]
> +
> +Supported options are:
> +  -c COREDUMP                   Use file COREDUMP as a core dump to examine.

5. I propose to add links to documentation on how to set up coredump
paths and where to find coredumps in the system. For Linux it is
core(5) [1].

[1] https://man7.org/linux/man-pages/man5/core.5.html
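
For example, the help could point users to something like the following
to see where the kernel puts core dumps. This is a sketch for Linux
only; `show_core_settings' is a hypothetical helper name, and the procfs
path is the one described in core(5).

```shell
#!/bin/sh
# Hypothetical helper: report where the kernel sends core dumps.
# The procfs path is a parameter only so that the function can be
# tried on a regular file.
show_core_settings() {
	pattern_file=${1:-/proc/sys/kernel/core_pattern}
	if [ -r "${pattern_file}" ]; then
		echo "core_pattern: $(cat "${pattern_file}")"
	else
		echo "core_pattern: unavailable"
	fi
	# A limit of 0 prevents core files from being written to disk.
	echo "core size limit: $(ulimit -c)"
}

show_core_settings
```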


> +
> +  -d DIRECTORY                  Create the resulting archive with the artefacts
> +                                within DIRECTORY.
> +
> +  -e TARANTOOL                  Use file TARANTOOL as the executable file for
> +                                examining with a core dump COREDUMP. If PID is
> +                                specified, the one from /proc/PID/exe is chosen
> +                                (see proc(5) for more info). If TARANTOOL is
> +                                omitted, /usr/bin/tarantool is chosen.
> +
> +  -p PID                        PID of the dumped process, as seen in the PID
> +                                namespace in which the given process resides
> +                                (see %p in core(5) for more info). This flag
> +                                have to be set when ${TOOL} is used as
> +                                kernel.core_pattern pipeline script.
> +
> +  -t DATETIME                   Time of dump, expressed as seconds since the
> +                                epoch, 1970-01-01 00:00:00 +0000 (UTC).
> +

6. Why might users want to specify the datetime manually?

We can get it in the script with stat:

  $ stat -c '%Y' core.2715242
  1614773664
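
A possible default for TIME could look like this. It is a sketch that
assumes GNU stat, and `dump_time' is a hypothetical helper name. When
the script is used via kernel.core_pattern, the core arrives on stdin
and there is no file to inspect yet, so the fallback to the current time
(or to -t %t) is still needed.

```shell
#!/bin/sh
# Hypothetical helper: take the dump time from the modification time
# of the core file, falling back to the current time when no regular
# file is given.
dump_time() {
	if [ -n "${1:-}" ] && [ -f "$1" ]; then
		stat -c '%Y' "$1"
	else
		date +%s
	fi
}

# Possible usage inside tarabrt.sh:
#   TIME=${TIME:-$(dump_time "${COREFILE}")}
```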

> +  -h                            Shows this message and exit.
> +
> +USAGE
> +
> +  - Manual usage. User can simply pack all necessary artefacts by running the
> +    following command.
> +    $ /path/to/${TOOL} -c ./core -d /tmp
> +
> +  - Automatic usage. If user faces the failures often, one can set this script
> +    as a pipe reciever in kernel.core_pattern syntax.
> +    # sysctl -w kernel.core_pattern="|/absolute/path/to/${TOOL} -d /var/core -p %p -t %t"
> +
> +HELP
> +)
> +
> +# Parse CLI options.
> +OPTIONS=$(getopt -o c:d:e:hp:t: -n "${TOOL}" -- "$@")
> +eval set -- "${OPTIONS}"
> +while true; do
> +	case "$1" in
> +		--) shift; break;;
> +		-c) COREFILE=$2; shift 2;;
> +		-d) COREDIR=$2;  shift 2;;
> +		-e) BINARY=$2;   shift 2;;
> +		-p) PID=$2;      shift 2;;
> +		-t) TIME=$2;     shift 2;;
> +		-h) printf "%s\n", "${HELP}";
> +			exit 0;;
> +		*)  printf "Invalid option: $1\n%s\n", "${HELP}";
> +			exit 1;;
> +	esac
> +done
> +
> +# Use the default values for the remaining parameters.
> +BINARY=${BINARY:-/usr/bin/tarantool}
> +COREDIR=${COREDIR:-${PWD}}
> +COREFILE=${COREFILE:-}
> +PID=${PID:-}
> +TIME=${TIME:-$(date +%s)}
> +
> +# XXX: This section handles the case when the script is used for
> +# kernel.core_pattern. If PID is set and there is a directory in
> +# procfs with this PID, the script processes the core dumped by
> +# this process. If the process exe (or strictly saying its comm)
> +# is not 'tarantool' then the coredump is simply saved to the
> +# COREDIR; otherwise the dumped core is packed to the tarball.
> +if [ -n "${PID}" ] && [ -d /proc/"${PID}" ]; then
> +	BINARY=$(readlink /proc/"${PID}"/exe)
> +	CMDNAME=$(sed -z 's/\s$//' /proc/"${PID}"/comm)
> +	COREFILE=${COREDIR}/${CMDNAME}-core.${PID}.${TIME}
> +	cat >"${COREFILE}"
> +	if [ "${CMDNAME}" != 'tarantool' ]; then
> +		[ -t 1 ] && cat <<ALIENCOREDUMP
> +/proc/${PID}/comm doesn't equal to 'tarantool', so we assume the
> +obtained core is dumped by \`${CMDNAME}' and should be packed in
> +a different way. As a result it is simply stored to the file, so
> +you can process it on your own.
> +
> +The file with core dump: ${COREFILE}
> +ALIENCOREDUMP
> +		exit 0;
> +	fi
> +fi
> +
> +if [ -z "${COREFILE}" ]; then
> +	[ -t 1 ] && cat <<NOCOREDUMP
> +There is no core dump file passed to ${TOOL}. The artefacts can't
> +be collected. If you see this message, check the usage by running
> +\`${TOOL} -h': -c option is the obligatory one.
> +NOCOREDUMP
> +	exit 1;
> +fi
> +
> +if file "${COREFILE}" | grep -qv 'core file'; then
> +	[ -t 1 ] && cat <<NOTACOREDUMP
> +Not a core dump: ${COREFILE}
> +
> +The given COREDUMP file is not a valid core dump (see core(5) for
> +more info) or not even an ELF (see elf(5) for more info). If you
> +see this message, check the COREDUMP file the following way:
> +$ file ${COREFILE}
> +NOTACOREDUMP
> +	exit 1;
> +fi
> +
> +# Check that gdb is installed.
> +if ! command -v gdb >/dev/null; then
> +	[ -t 1 ] && cat <<NOGDB
> +gdb is not installed, but it is obligatory for collecting the
> +loaded shared libraries from the core dump.
> +
> +You can proceed collecting the artefacts manually later by running
> +the following command:
> +$ ${TOOL} -e ${BINARY} -c ${COREFILE}
> +NOGDB
> +	exit 1;
> +fi
> +
> +if file "${BINARY}" | grep -qv 'executable'; then
> +	[ -t 1 ] && cat <<NOTELF
> +Not an ELF file: ${BINARY}
> +
> +The given BINARY file is not an ELF (see elf(5) for more info).
> +If you see this message, check the BINARY file the following way:
> +$ file ${BINARY}
> +NOTELF
> +	exit 1;
> +fi
> +
> +if gdb -batch -n "${BINARY}" -ex 'info symbol tarantool_version' 2>/dev/null | \
> +	grep -q 'tarantool_version in section .text'
> +then
> +	# XXX: This is a very ugly hack to implement 'unless'
> +	# operator in bash for a long pipeline as a conditional.
> +	:
> +else
> +	[ -t 1 ] && cat <<NOTARANTOOL
> +Not a Tarantool binary: ${BINARY}
> +
> +The given BINARY file is not a Tarantool executable: there is no a
> +signature symbol in the binary file. If you see this message,
> +check the BINARY file the following way:
> +$ ${BINARY} --help
> +NOTARANTOOL
> +	exit 1;
> +fi
> +
> +# Resolve the host name if possible.
> +HOSTNAME=$(hostname 2>/dev/null || echo hostname)
> +
> +# Proceed with collecting and packing artefacts.
> +TMPDIR=$(mktemp -d -p "${COREDIR}")
> +TARLIST=${TMPDIR}/tarlist
> +VERSION=${TMPDIR}/version
> +ARCHIVENAME=${COREDIR}/tarantool-core-${PID:-N}-$(date +%Y%m%d%H%M -d @"${TIME}")-${HOSTNAME%%.*}.tar.gz
> +
> +# Dump the version to checkout the right commit later.
> +${BINARY} --version >"${VERSION}"
> +
> +# Collect the most important artefacts.
> +{
> +	echo "${BINARY}"
> +	echo "${COREFILE}"
> +	echo "${VERSION}"
> +} >>"${TARLIST}"
> +
> +SEPARATOR1="Shared Object Library"
> +SEPARATOR2="Shared library is missing debugging information"
> +# XXX: This is kinda "postmortem ldd": the command below dumps the
> +# full list of the shared libraries the binary is linked against
> +# or those loaded via dlopen at the platform runtime.
> +# This is black voodoo magic. Do not touch. You are warned.
> +if gdb -batch -n "${BINARY}" -c "${COREFILE}" -ex 'info shared'    | \
> +	sed -n "/${SEPARATOR1}/,/${SEPARATOR2}/p;/${SEPARATOR2}/q" | \
> +	awk '{ print $NF }' | grep '^/' >>"${TARLIST}"
> +then
> +	# XXX: This is a very ugly hack to implement 'unless'
> +	# operator in bash for a long pipeline as a conditional.
> +	:
> +else
> +	[ -t 1 ] && cat <<COREMISMATCH
> +Core dump file is produced by the different Tarantool executable.
> +
> +Looks like '${COREFILE}' is not generated by \`${BINARY}'.
> +If you see this message, please check that the given COREDUMP
> +is produced by the specified BINARY.
> +There are some temporary artefacts in ${TMPDIR}.
> +Remove it manually if you don't need them anymore.
> +COREMISMATCH
> +	exit 1;
> +fi
> +
> +# Pack everything listed in TARLIST file into a tarball. To unify
> +# the archive format BINARY, COREFILE, VERSION and TARLIST are
> +# renamed while packing.
> +tar -czhf "${ARCHIVENAME}" -P -T "${TARLIST}" \
> +	--transform="s|${BINARY}|tarantool|"  \
> +	--transform="s|${COREFILE}|coredump|" \
> +	--transform="s|${TARLIST}|checklist|" \
> +	--transform="s|${VERSION}|version|"   \
> +	--add-file="${TARLIST}"
> +
> +[ -t 1 ] && cat <<FINALIZE
> +The resulting is located here: ${ARCHIVENAME}
> +
> +If you want to upload it, choose the available resourse

7. Typo: resourse -> resource

> +(e.g. http://transfer.sh) and run the following command:
> +$ curl -T ${ARCHIVENAME} <resourse-uri>
8. Typo: resourse-uri -> resource-uri
> +FINALIZE
> +
> +# Cleanup temporary files.
> +[ -f "${TARLIST}" ] && rm -f "${TARLIST}"
> +[ -f "${VERSION}" ] && rm -f "${VERSION}"
> +[ -d "${TMPDIR}" ] && rmdir "${TMPDIR}"

