[Tarantool-patches] [PATCH] tools: implement toolchain for crash artefacts

Igor Munkin imun at tarantool.org
Thu Mar 11 15:13:42 MSK 2021


Sergey,

Thanks for your review!

On 03.03.21, Sergey Bronnikov wrote:
> Hello!
> 
> Igor, thanks for the patch!
> 
> I spend a bit on testing script. To create a coredump I run tarantool binary
> 
> and generated coredump with GDB: "gdb -batch -ex "generate-core-file" -p 
> PID".
> 
> To create report I run:
> 
> [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ ./tools/tarabrt.sh -c 
> core.2715242 -e build/src/tarantool
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> 
> warning: Loadable section ".note.gnu.property" outside of ELF segments
> The resulting is located here: 
> /home/s.bronnikov/work/tarantool/tarantool-core-N-202103031215-tarantool-core-dev-mcs1.tar.gz
> 
> 
> If you want to upload it, choose the available resourse
> (e.g. http://transfer.sh) and run the following command:
> $ curl -T 
> /home/s.bronnikov/work/tarantool/tarantool-core-N-202103031222-tarantool-core-dev-mcs1.tar.gz 
> <resourse-uri>
> 
> 1. Can we suppress warnings produced by gdb? I believe it is not useful 
> information for users.

I don't want to do this, since in the normal case there should be no
warnings at all. Otherwise, it's better if customer share them with us.

> 
> 2. I propose to show files included to the report and the end of creation.

Well, I believe I didn't understand you right before (this list is
included to the archive). Anyway, now I added the list also to the
message at the end.

================================================================================

diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
index 212511813..6b88698a2 100755
--- a/tools/tarabrt.sh
+++ b/tools/tarabrt.sh
@@ -222,7 +222,10 @@ tar -czhf "${ARCHIVENAME}" -P -T "${TARLIST}" \
 	--add-file="${TARLIST}"
 
 [ -t 1 ] && cat <<FINALIZE
-The resulting is located here: ${ARCHIVENAME}
+The resulting archive is located here: ${ARCHIVENAME}
+The archive contents are listed below:
+${TARLIST}
+$(cat ${TARLIST})
 
 If you want to upload it, choose the available resourse
 (e.g. http://transfer.sh) and run the following command:

================================================================================

> 
> 
> When I run tarabrt.sh without specifying -e it says that 
> /usr/bin/tarantool is not ELF:
> 
> (venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ 
> ./tools/tarabrt.sh -c core.2715242
> Not an ELF file: /usr/bin/tarantool
> 
> The given BINARY file is not an ELF (see elf(5) for more info).
> If you see this message, check the BINARY file the following way:
> $ file /usr/bin/tarantool
> (venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$ file 
> /usr/bin/tarantool
> /usr/bin/tarantool: ELF 64-bit LSB shared object, x86-64, version 1 
> (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for 
> GNU/Linux 3.2.0, BuildID[sha1]=58d86a2b8bf88de3496ea4ead0a8d3c7a73e9c36, 
> stripped, too many notes (256)
> (venv) [s.bronnikov at tarantool-core-dev-mcs1 tarantool]$

Oh, crap... Thank you a lot! I've tried on two hosts: my Gentoo working
station and our CentOS 7 dev1 stand: both are looking good for two
reasons:
* There is a bleeding file package on my Gentoo and Tarantool binary is
  built with PIE enabled.
* There is an ancient file package on our CentOS 7 machine and Tarantool
  binary is built without PIE.

Your case discovered the third case: a buggy file utility used against
Tarantool binary built with PIE. Hence I've adjusted the condition and
added the comment with the link providing the highly detailed answer.
Here are the changes.

================================================================================

diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
index c3ec9241e..372ff80c1 100755
--- a/tools/tarabrt.sh
+++ b/tools/tarabrt.sh
@@ -137,7 +137,16 @@ NOGDB
 	exit 1;
 fi
 
-if file "${BINARY}" | grep -qv 'executable'; then
+# XXX: There is annoying bug in file(1), making it claim that PIE
+# is a shared object, but not an executable one. Strictly saying,
+# there was no PIE support until 5.34 at all, but the whole thing
+# didn't work until 5.36. Unfortunately, there are distros (e.g.
+# CentOS 8) providing file utility without valid PIE support, but
+# compiling PIE binaries. As a result we ought to respect both
+# behaviours for the check below. For more info, see the highly
+# detailed answer on Stack Overflow below.
+# https://stackoverflow.com/a/55704865/4609359
+if file "${BINARY}" | grep -qvE 'executable|shared object'; then
 	[ -t 1 ] && cat <<NOTELF
 Not an ELF file: ${BINARY}
 

================================================================================

> 
> Also see my comments inline.
> 
> On 25.02.2021 16:23, Igor Munkin wrote:
> > This patch introduces two scripts to ease crash artefacts collecting and
> > loading for postmortem analysis:
> >
> > * tarabrt.sh - the tool collecting a tarball with the crash artefacts
> >    the right way: the coredump with the binary, all loaded shared libs,
> >    Tarantool version (this is a separate exercise to get it from the
> >    binary built with -O2). Besides, the tarball has a unified layout, so
> >    it can be easily processed with the second script:
> >    - /coredump - core dump file on the root level
> >    - /binary - tarantool executable on the root level
> >    - /version - plain text file on the root level with
> >      `tarantool --version` output
> >    - /checklist - plain text file on the root level with
> >      the list of the collected entities
> >    - all shared libraries used by the crashed instance - their layout
> >      respects the one on the host machine, so them can be easily loaded
> >      with the following gdb command: set sysroot $(realpath .)
> >
> >    The script can be easily used either manually or via
> >    kernel.core_pattern variable.
> >
> > * gdb.sh - the auxiliary script originally written by @Totktonada, but
> >    needed to be adjusted to the crash artefacts layout every time. Since
> >    there is a unified layout, the original script is enhanced a bit to
> >    automatically load the coredump via gdb the right way.
> 2. Looks like gdb.sh is an internal script, so I propose to rename it to 
> "_gdb.sh" to reflect it for users.

No, it's not. I hope this will be useful for the teammates, so its name
should be convenient to be used via shell.

> >
> > Closes #5569
> >
> > Signed-off-by: Igor Munkin <imun at tarantool.org>
> > ---
> >
> > Issue: https://github.com/tarantool/tarantool/issues/5569
> > Branch: https://github.com/tarantool/tarantool/tree/imun/gh-5569-coredump-tooling
> >
> >   changelogs/unreleased/tarabrt.md |   3 +
> >   tools/gdb.sh                     |  59 ++++++++
> >   tools/tarabrt.sh                 | 234 +++++++++++++++++++++++++++++++
> >   3 files changed, 296 insertions(+)
> >   create mode 100644 changelogs/unreleased/tarabrt.md
> >   create mode 100755 tools/gdb.sh
> >   create mode 100755 tools/tarabrt.sh
> >

<snipped>

> > diff --git a/tools/gdb.sh b/tools/gdb.sh
> > new file mode 100755
> > index 000000000..a58c47cab
> > --- /dev/null
> > +++ b/tools/gdb.sh
> > @@ -0,0 +1,59 @@
> > +#!/bin/sh
> > +set -eu
> 3. "set -euo pipefail" is better. Pipe used for SUBPATH var.

There is no "-o pipefail" for /bin/sh, only for /bin/bash.

> > +

<snipped>

> > diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
> > new file mode 100755
> > index 000000000..3d44803be
> > --- /dev/null
> > +++ b/tools/tarabrt.sh
> > @@ -0,0 +1,234 @@
> > +#!/bin/sh
> > +set -eu
> 4. "set -euo pipefail" is better. Pipe used for SUBPATH var.

There is no "-o pipefail" for /bin/sh, only for /bin/bash.

> > +
> > +TOOL=$(basename "$0")
> > +HELP=$(cat <<HELP
> > +${TOOL} - Tarantool Automatic Bug Reporting Tool
> > +
> > +This tool collects all required artefacts (listed below) and packs them into
> > +a single archive with unified format:
> > +  - /checklist - the plain text file with the list of tarball contents
> > +  - /version   - the plain text file containing \`tarantool --version' output
> > +  - /tarantool - the executable binary file produced the core dump
> > +  - /coredump  - the core dump file produced by the executable
> > +  - all shared libraries loaded (even via dlopen(3)) at the crash moment.
> > +
> > +SYNOPSIS
> > +
> > +  ${TOOL} [-h] [-c core] [-d dir] [-e executable] [-p procID] [-t datetime]
> > +
> > +Supported options are:
> > +  -c COREDUMP                   Use file COREDUMP as a core dump to examine.
> 
> 5. I propose to specify somewhere links to documentation how to setup 
> coredump paths and where
> 
> to find coredumps in a system. For Linux it is core(5) [1].
> 
> 1. https://man7.org/linux/man-pages/man5/core.5.html

Added the following line to the help message.

================================================================================

diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
index 3d44803be..212511813 100755
--- a/tools/tarabrt.sh
+++ b/tools/tarabrt.sh
@@ -19,6 +19,7 @@ SYNOPSIS
 
 Supported options are:
   -c COREDUMP                   Use file COREDUMP as a core dump to examine.
+                                See core(5) for more info.
 
   -d DIRECTORY                  Create the resulting archive with the artefacts
                                 within DIRECTORY.

================================================================================

> 
> 
> > +
> > +  -d DIRECTORY                  Create the resulting archive with the artefacts
> > +                                within DIRECTORY.
> > +
> > +  -e TARANTOOL                  Use file TARANTOOL as the executable file for
> > +                                examining with a core dump COREDUMP. If PID is
> > +                                specified, the one from /proc/PID/exe is chosen
> > +                                (see proc(5) for more info). If TARANTOOL is
> > +                                omitted, /usr/bin/tarantool is chosen.
> > +
> > +  -p PID                        PID of the dumped process, as seen in the PID
> > +                                namespace in which the given process resides
> > +                                (see %p in core(5) for more info). This flag
> > +                                have to be set when ${TOOL} is used as
> > +                                kernel.core_pattern pipeline script.
> > +
> > +  -t DATETIME                   Time of dump, expressed as seconds since the
> > +                                epoch, 1970-01-01 00:00:00 +0000 (UTC).
> > +
> 
> 6. Why users may want to specify datetime manually?

To provide the timing in core pattern via %t. I don't know whether the
value differs from the one you mentioned below. If it's not, I guess
this option can be removed.

> 
> We can get datetime in the script with stat:
> 
>   $stat -c '%Y' core.2715242
> 
> 1614773664
> 
> > +  -h                            Shows this message and exit.
> > +
> > +USAGE
> > +
> > +  - Manual usage. User can simply pack all necessary artefacts by running the
> > +    following command.
> > +    $ /path/to/${TOOL} -c ./core -d /tmp
> > +
> > +  - Automatic usage. If user faces the failures often, one can set this script
> > +    as a pipe reciever in kernel.core_pattern syntax.
> > +    # sysctl -w kernel.core_pattern="|/absolute/path/to/${TOOL} -d /var/core -p %p -t %t"
> > +
> > +HELP
> > +)
> > +
> > +# Parse CLI options.
> > +OPTIONS=$(getopt -o c:d:e:hp:t: -n "${TOOL}" -- "$@")
> > +eval set -- "${OPTIONS}"
> > +while true; do
> > +	case "$1" in
> > +		--) shift; break;;
> > +		-c) COREFILE=$2; shift 2;;
> > +		-d) COREDIR=$2;  shift 2;;
> > +		-e) BINARY=$2;   shift 2;;
> > +		-p) PID=$2;      shift 2;;
> > +		-t) TIME=$2;     shift 2;;
> > +		-h) printf "%s\n", "${HELP}";
> > +			exit 0;;
> > +		*)  printf "Invalid option: $1\n%s\n", "${HELP}";
> > +			exit 1;;
> > +	esac
> > +done
> > +
> > +# Use the default values for the remaining parameters.
> > +BINARY=${BINARY:-/usr/bin/tarantool}
> > +COREDIR=${COREDIR:-${PWD}}
> > +COREFILE=${COREFILE:-}
> > +PID=${PID:-}
> > +TIME=${TIME:-$(date +%s)}
> > +
> > +# XXX: This section handles the case when the script is used for
> > +# kernel.core_pattern. If PID is set and there is a directory in
> > +# procfs with this PID, the script processes the core dumped by
> > +# this process. If the process exe (or strictly saying its comm)
> > +# is not 'tarantool' then the coredump is simply saved to the
> > +# COREDIR; otherwise the dumped core is packed to the tarball.
> > +if [ -n "${PID}" ] && [ -d /proc/"${PID}" ]; then
> > +	BINARY=$(readlink /proc/"${PID}"/exe)
> > +	CMDNAME=$(sed -z 's/\s$//' /proc/"${PID}"/comm)
> > +	COREFILE=${COREDIR}/${CMDNAME}-core.${PID}.${TIME}
> > +	cat >"${COREFILE}"
> > +	if [ "${CMDNAME}" != 'tarantool' ]; then
> > +		[ -t 1 ] && cat <<ALIENCOREDUMP
> > +/proc/${PID}/comm doesn't equal to 'tarantool', so we assume the
> > +obtained core is dumped by \`${CMDNAME}' and should be packed in
> > +a different way. As a result it is simply stored to the file, so
> > +you can process it on your own.
> > +
> > +The file with core dump: ${COREFILE}
> > +ALIENCOREDUMP
> > +		exit 0;
> > +	fi
> > +fi
> > +
> > +if [ -z "${COREFILE}" ]; then
> > +	[ -t 1 ] && cat <<NOCOREDUMP
> > +There is no core dump file passed to ${TOOL}. The artefacts can't
> > +be collected. If you see this message, check the usage by running
> > +\`${TOOL} -h': -c option is the obligatory one.
> > +NOCOREDUMP
> > +	exit 1;
> > +fi
> > +
> > +if file "${COREFILE}" | grep -qv 'core file'; then
> > +	[ -t 1 ] && cat <<NOTACOREDUMP
> > +Not a core dump: ${COREFILE}
> > +
> > +The given COREDUMP file is not a valid core dump (see core(5) for
> > +more info) or not even an ELF (see elf(5) for more info). If you
> > +see this message, check the COREDUMP file the following way:
> > +$ file ${COREFILE}
> > +NOTACOREDUMP
> > +	exit 1;
> > +fi
> > +
> > +# Check that gdb is installed.
> > +if ! command -v gdb >/dev/null; then
> > +	[ -t 1 ] && cat <<NOGDB
> > +gdb is not installed, but it is obligatory for collecting the
> > +loaded shared libraries from the core dump.
> > +
> > +You can proceed collecting the artefacts manually later by running
> > +the following command:
> > +$ ${TOOL} -e ${BINARY} -c ${COREFILE}
> > +NOGDB
> > +	exit 1;
> > +fi
> > +
> > +if file "${BINARY}" | grep -qv 'executable'; then
> > +	[ -t 1 ] && cat <<NOTELF
> > +Not an ELF file: ${BINARY}
> > +
> > +The given BINARY file is not an ELF (see elf(5) for more info).
> > +If you see this message, check the BINARY file the following way:
> > +$ file ${BINARY}
> > +NOTELF
> > +	exit 1;
> > +fi
> > +
> > +if gdb -batch -n "${BINARY}" -ex 'info symbol tarantool_version' 2>/dev/null | \
> > +	grep -q 'tarantool_version in section .text'
> > +then
> > +	# XXX: This is a very ugly hack to implement 'unless'
> > +	# operator in bash for a long pipeline as a conditional.
> > +	:
> > +else
> > +	[ -t 1 ] && cat <<NOTARANTOOL
> > +Not a Tarantool binary: ${BINARY}
> > +
> > +The given BINARY file is not a Tarantool executable: there is no a
> > +signature symbol in the binary file. If you see this message,
> > +check the BINARY file the following way:
> > +$ ${BINARY} --help
> > +NOTARANTOOL
> > +	exit 1;
> > +fi
> > +
> > +# Resolve the host name if possible.
> > +HOSTNAME=$(hostname 2>/dev/null || echo hostname)
> > +
> > +# Proceed with collecting and packing artefacts.
> > +TMPDIR=$(mktemp -d -p "${COREDIR}")
> > +TARLIST=${TMPDIR}/tarlist
> > +VERSION=${TMPDIR}/version
> > +ARCHIVENAME=${COREDIR}/tarantool-core-${PID:-N}-$(date +%Y%m%d%H%M -d @"${TIME}")-${HOSTNAME%%.*}.tar.gz
> > +
> > +# Dump the version to checkout the right commit later.
> > +${BINARY} --version >"${VERSION}"
> > +
> > +# Collect the most important artefacts.
> > +{
> > +	echo "${BINARY}"
> > +	echo "${COREFILE}"
> > +	echo "${VERSION}"
> > +} >>"${TARLIST}"
> > +
> > +SEPARATOR1="Shared Object Library"
> > +SEPARATOR2="Shared library is missing debugging information"
> > +# XXX: This is kinda "postmortem ldd": the command below dumps the
> > +# full list of the shared libraries the binary is linked against
> > +# or those loaded via dlopen at the platform runtime.
> > +# This is black voodoo magic. Do not touch. You are warned.
> > +if gdb -batch -n "${BINARY}" -c "${COREFILE}" -ex 'info shared'    | \
> > +	sed -n "/${SEPARATOR1}/,/${SEPARATOR2}/p;/${SEPARATOR2}/q" | \
> > +	awk '{ print $NF }' | grep '^/' >>"${TARLIST}"
> > +then
> > +	# XXX: This is a very ugly hack to implement 'unless'
> > +	# operator in bash for a long pipeline as a conditional.
> > +	:
> > +else
> > +	[ -t 1 ] && cat <<COREMISMATCH
> > +Core dump file is produced by the different Tarantool executable.
> > +
> > +Looks like '${COREFILE}' is not generated by \`${BINARY}'.
> > +If you see this message, please check that the given COREDUMP
> > +is produced by the specified BINARY.
> > +There are some temporary artefacts in ${TMPDIR}.
> > +Remove it manually if you don't need them anymore.
> > +COREMISMATCH
> > +	exit 1;
> > +fi
> > +
> > +# Pack everything listed in TARLIST file into a tarball. To unify
> > +# the archive format BINARY, COREFILE, VERSION and TARLIST are
> > +# renamed while packing.
> > +tar -czhf "${ARCHIVENAME}" -P -T "${TARLIST}" \
> > +	--transform="s|${BINARY}|tarantool|"  \
> > +	--transform="s|${COREFILE}|coredump|" \
> > +	--transform="s|${TARLIST}|checklist|" \
> > +	--transform="s|${VERSION}|version|"   \
> > +	--add-file="${TARLIST}"
> > +
> > +[ -t 1 ] && cat <<FINALIZE
> > +The resulting is located here: ${ARCHIVENAME}
> > +
> > +If you want to upload it, choose the available resourse
> 
> 7. Typo: resourse -> resource
> 
> > +(e.g. http://transfer.sh) and run the following command:
> > +$ curl -T ${ARCHIVENAME} <resourse-uri>
> 8. Typo: resourse-uri -> resource-uri

Thanks, fixed both typos.

================================================================================

diff --git a/tools/tarabrt.sh b/tools/tarabrt.sh
index 6b88698a2..1f9e10867 100755
--- a/tools/tarabrt.sh
+++ b/tools/tarabrt.sh
@@ -227,9 +227,9 @@ The archive contents are listed below:
 ${TARLIST}
 $(cat ${TARLIST})
 
-If you want to upload it, choose the available resourse
+If you want to upload it, choose the available resource
 (e.g. http://transfer.sh) and run the following command:
-$ curl -T ${ARCHIVENAME} <resourse-uri>
+$ curl -T ${ARCHIVENAME} <resource-uri>
 FINALIZE
 
 # Cleanup temporary files.

================================================================================

> > +FINALIZE
> > +
> > +# Cleanup temporary files.
> > +[ -f "${TARLIST}" ] && rm -f "${TARLIST}"
> > +[ -f "${VERSION}" ] && rm -f "${VERSION}"
> > +[ -d "${TMPDIR}" ] && rmdir "${TMPDIR}"

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list