[Tarantool-patches] [PATCH v3] Implement perf testing at gitlab-ci

Alexander Turenko alexander.turenko at tarantool.org
Mon Feb 17 01:04:19 MSK 2020


In order not to make anybody confused, I'll share our agreement:

- Proposal (stage 1):
  - Don't review / change workloads and harness.
  - Enable automatic runs of benchmarks on long-term branches (per-push).
  - Save those results to the existing database (one under
    bench.tarantool.org).
  - Resurrect bench.tarantool.org: it should show new results.

After this we should review workload kinds and sizes, improve
visualization, set up alerts and make other enhancements that will turn
performance tracking / measurement into a useful tool.

Since we are discussing the first stage now, there is nothing to review.

There was a suggestion from me: move everything we have no agreement on
into the separate repository (bench-run), so that we don't do many
fixups within the tarantool repository in the near future and so that we
split the responsibility (the QA team is both producer and consumer of
performance tracking results).

We have no agreement on using docker in performance testing (I'm
strongly against it, but it is not within my responsibility). So any
traces of docker should stay within the bench-run repository, and here I
expect only ./bench-run/prepare.sh and ./bench-run/sysbench.sh calls,
nothing more.

We can pass the docker repository URI and credentials via environment
variables (secret ones for the credentials) and use them in bench-run. I
don't see any problem with doing it this way.
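
For illustration, here is a minimal sketch of what I'd expect the CI
job's script to boil down to (the variable names and the repository URL
are my assumptions, not part of the patch):

    # DOCKER_REPO_URI / DOCKER_REPO_PASSWORD are secret CI variables;
    # they are consumed inside bench-run, not in tarantool's CI config.
    git clone https://github.com/tarantool/bench-run.git
    ./bench-run/prepare.sh
    ./bench-run/sysbench.sh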

Aside from this, I'm against using gitlab-runner on the performance
machines, because I don't know how it works. But okay, maybe everything
will be fine; however, please monitor its behaviour.

My objections to using docker in performance testing are below. Feel
free to skip them: they are here only so that I can say 'I said this!'
in the future.

Several questions about the patch and bench-run are at the end of the
email (they are about stage 2, yep, but anyway).

WBR, Alexander Turenko.

----

Docker virtualizes the network and the disk (both the root filesystem
and volumes). Any virtualization level adds complexity: it requires more
expertise and work to investigate and explain results, and it may affect
the results on its own, making them less predictable and less stable. On
the other hand, it does not give any gains for performance testing.

One may say that it freezes userspace, but that is easily achieved w/o
docker: just don't change it. That's all.

Okay, this topic is not so easy when the machine where performance
testing is performed is not fully controlled: weird processes within an
organization do not save us from somebody who will log in and update
something (strange, yep?).

Docker will not save us from this situation: somebody may update docker
itself, or the kernel, or run something that will affect results that
are in flight. The problem is in the processes, and it should be solved
first.

One may say that docker does not spoil performance results. Maybe. Maybe
not. It is hard to say without a deep investigation. While the gains are
so vague, I would not spend my time looking in this direction.

This is basically all, but I'll share several questions below to show
that my point 'adding a virtualization level requires more expertise'
has some ground.

----

Will vm.dirty_ratio work the same way for dirty pages of a filesystem
within a volume as for the underlying filesystem? Does it depend on the
particular underlying filesystem? Will it use the container's memory
size to calculate the dirty pages percentage, or the system-wide one?
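
A quick experiment one could run to start answering this (just a sketch,
not part of the patch; vm.dirty_ratio is a global sysctl, so I expect
the same value in both places, but which memory size the percentage is
applied against is exactly the open question):

    # on the host
    sysctl vm.dirty_ratio vm.dirty_background_ratio
    # inside a container: /proc/sys/vm is not namespaced, so this should
    # print the same host-wide value
    docker run --rm ubuntu:18.04 cat /proc/sys/vm/dirty_ratio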

Will `sync + drop caches` within a container affect disk buffers outside
of the container (say, ones that remain after a previous run within
another container)?
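
A sketch of how one could check it (drop_caches needs extra privileges,
hence --privileged; the paths and the file here are made up):

    # warm the page cache from one container
    docker run --rm -v /tmp/bench:/data ubuntu:18.04 \
        sh -c 'cat /data/bigfile > /dev/null'
    # drop caches from another container
    docker run --rm --privileged ubuntu:18.04 \
        sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    # then re-read /data/bigfile and time it: a cold read would mean the
    # caches are indeed dropped system-wide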

Will a unix domain socket created within an overlay filesystem behave
the same way as one on a real filesystem (in case we test iproto via a
unix socket)?

Will fsync() flush data to a real disk, or will it be caught somewhere
within docker? We had a related regression [1].

[1]: https://github.com/tarantool/tarantool/issues/3747

----

Black box testing sucks. We should deeply understand what we're testing;
otherwise we will get 'some quality', which will never be good.

Performance testing with docker is a black box for me. When it is
'tarantool + libc + some libs + kernel', I more or less understand (or
am at least able to inspect) what is going on, and I can, say, propose
adding / removing / tuning workloads to cover specific needs.

I can dig into docker, of course, but there are so many things that
deserve the time more than this activity.

----

I looked here and there around the patch and bench-run and have several
questions. Since we agreed not to review anything around workloads now,
these are just questions. Okay to ignore.

I don't see any volume / mount parameters. Doesn't this mean that WAL
writes will go to an overlay fs? I guess it may be far from a real disk
and may have a separate level of caching.
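
If the WAL should land on a real disk, I'd expect a bind mount in the
docker invocation, something like this (the host path is made up; the
official image keeps its data in /var/lib/tarantool, AFAIR):

    # bind-mount a host directory so xlog / snap files bypass overlayfs
    docker run --rm -v /var/perf/tarantool-data:/var/lib/tarantool \
        tarantool/tarantool:2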

AFAIS, the current way of using docker does not even try to freeze
userspace: it uses the 'ubuntu:18.04' tag, which is updated from time to
time, rather than, say, 'ubuntu:bionic-20200112'. It also performs
'apt-get update' inside, so userspace will change on each rebuild of the
image. We are unable to change something inside the image without
updating everything else. This way we don't actually control userspace
updates.
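
Pinning could look like this (the digest below is a placeholder, not a
real one):

    # a date-stamped tag instead of the floating one
    docker pull ubuntu:bionic-20200112
    # or, stricter, pin the exact image content by digest
    docker pull ubuntu@sha256:<digest>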

BTW, why is Ubuntu used, while all the production environments (where
performance matters) are on RHEL / CentOS 7?

Why is the dirty cache not flushed (`sync`) before dropping the clean
caches (`echo 3 > /proc/sys/vm/drop_caches`)?
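
I.e., I would expect the sequence to be:

    sync                               # write dirty pages back to disk first
    echo 3 > /proc/sys/vm/drop_caches  # then drop page cache, dentries, inodes

Otherwise the dirty pages survive the drop and may be written back in
the middle of a benchmark run.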

