[Tarantool-patches] [PATCH v3] Implement perf testing at gitlab-ci

Mon Feb 17 10:25:14 MSK 2020



>Monday, 17 February 2020, 1:04 +03:00 from Alexander Turenko <alexander.turenko at tarantool.org>:
>
>To avoid confusing anybody, I'll share our agreement:
>
>- Proposal (stage 1):
>  - Don't review / change workloads and harness.
>  - Enable automatic runs of benchmarks on long-term branches (per-push).
>  - Save those results to the existing database (one under
>    bench.tarantool.org).
>  - Resurrect bench.tarantool.org: it should show new results.
>
>After this we should review workload kinds and sizes, improve
>visualization, set up alerts and make other enhancements that will turn
>performance tracking / measurements into a useful tool.
>
>Since we are discussing the first stage now, there is nothing to review.
>
>There was a suggestion from me: move everything we have no
>agreement on to a separate repository (bench-run), to avoid many fixups
>within the tarantool repository in the near future and to split the
>responsibility (the QA team is both producer and consumer of performance
>tracking results).
>
>We have no agreement on using docker in performance testing (I'm
>strongly against, but it is not my responsibility). So any traces of
>docker should be within the bench-run repository. So here I expect only
>./bench-run/prepare.sh and ./bench-run/sysbench.sh calls, nothing more.
Ok, sure, I've moved all the actions into bench-run; only the make calls and
the benchmark script runs are left in the Tarantool sources:
- a make call to create/update the Docker images
- calls to the benchmark script runners
- a make call to clean up the short-term Docker image
>
>We can pass the docker repository URI and credentials in environment
>variables (secret ones for credentials) and use them in bench-run. I don't
>see any problem with doing it this way.
Fixed as suggested.
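
To illustrate the idea of passing the registry settings through the
environment, here is a minimal sketch. The variable names
(DOCKER_REPO_URI, DOCKER_REPO_USER, DOCKER_REPO_PASSWORD) and the helper
are hypothetical, chosen for the example; they are not necessarily the
names bench-run uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryConfig:
    uri: str
    user: str
    password: str


def load_registry_config(env=os.environ):
    """Read Docker registry settings from environment variables.

    The variable names below are hypothetical, for illustration only.
    """
    names = ("DOCKER_REPO_URI", "DOCKER_REPO_USER", "DOCKER_REPO_PASSWORD")
    # Fail early with a clear message if the CI job did not pass a variable.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RegistryConfig(*(env[name] for name in names))
```

Failing early on a missing variable keeps a misconfigured CI job from
silently pushing to or pulling from the wrong place.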
>
>Aside from this, I'm against using gitlab-runner on performance
>machines, because I don't know how it works. But okay, maybe everything
>will be fine; however, please monitor its behaviour.
Right, it doesn't affect the performance results. At the next stage of the
performance process development it can also be monitored by Prometheus; this
is still under discussion.
>
>My objections against using docker in performance testing are below.
>Skip them: it is only to say 'I said this!' in the future.
>
>Several questions about the patch and bench-run are at the end of the email
>(it is about stage 2, yep, but anyway).
>
>WBR, Alexander Turenko.
>
>----
>
>Docker virtualizes network and disk (both root and volumes). Any
>virtualization level adds complexity: requires more expertise and work
>to investigate and explain results, may affect results on its own and
>make them less predictable and stable. On the other hand, it does not
>give any gains for performance testing.
>
>One may say that it freezes userspace, but it may be easily achieved w/o
>docker: just don't change it. That's all.
>
>Okay, this topic is not so easy when the machine where performance
>testing is performed is not fully controlled: weird processes within an
>organization do not save us from somebody who will log in and update
>something (strange, yep?).
>
>Docker will not save us from this situation: somebody may update docker
>itself, or the kernel, or run something that will affect results that are in
>flight. The problem is in the processes and it should be solved first.
>
>One may say that docker does not spoil performance results. Maybe. Maybe
>not. It is hard to say without deep investigation. While the gains are so
>vague I would not spend my time looking in this direction.
>
>This is basically all, but I'll share several questions below to show that
>my point 'adding a virtualization level requires more expertise' has
>some ground.
>
>----
>
>Will vm.dirty_ratio work in the same way for dirty pages of a
>filesystem within a volume as for an underlying filesystem? Does it
>depend on a certain underlying filesystem? Will it use docker's
>memory size to calculate the dirty page percentage, or the system-wide one?
>
>Will `sync + drop caches` within a container affect disc buffers
>outside of the container (say, ones that remain after a previous run
>within another container)?
>
>Will a unix domain socket created within an overlay filesystem
>behave in the same way as on a real filesystem (in case we test
>iproto via a unix socket)?
>
>Will fsync() flush data to a real disc or will it be caught somewhere
>within docker? We had a related regression [1].
>
>[1]:  https://github.com/tarantool/tarantool/issues/3747
>
>----
>
>Black box testing sucks. We should deeply understand what we're testing,
>otherwise we will get 'some quality' which will never be good.
>
>Performance testing with docker is a black box for me. When it is
>'tarantool + libc + some libs + kernel' I more or less understand (or am at
>least able to inspect) what is going on and I can, say, propose to add /
>remove / tune workloads to cover specific needs.
>
>I can dig into docker, of course, but there are so many things which
>deserve the time more than this activity.
>
>----
>
>I looked here and there around the patch and bench-run and have several
>questions. Since we agreed not to review anything around workloads
>now, these are just questions. Okay to ignore.
>
>I don't see any volume / mount parameters. Doesn't this mean that WAL
>writes will go to an overlay fs? I guess it may be far from a real
>disc and may have a separate level of caching.
Gitlab-runner uses a Docker image for running the jobs; its configuration at a
very high level is:

    tls_verify = false
    memory = "60g"
    memory_swap = "60g"
    cpuset_cpus = "6,7,8,9,10,11"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/mnt/gitlab_docker_tmpfs_perf:/builds", "/cache"]
    network_mode = "host"
    shm_size = 0

Also, for benchmarks such as linkbench, where disk performance needs to be
checked, there is a special gitlab-runner configuration. It really uses disk
space and has no swap space, because memory == memory_swap:

    memory = "3g"
    memory_swap = "3g"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/test_ssd/gitlab:/builds", "/cache"]
    network_mode = "host"
    shm_size = 0
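
The 'no swap' statement follows from how Docker reads these two values:
memory_swap is the total of memory plus swap, so the swap available to the
container is their difference. A small sketch of that arithmetic (the
helper names are made up for the example, and special values such as -1
for unlimited swap are not handled):

```python
# Size suffixes as used in the runner configuration ("3g", "60g").
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as "3g" or "512m" to bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # a plain number is taken as bytes


def swap_allowance(memory, memory_swap):
    """Return the swap, in bytes, a container may use.

    memory_swap is the memory + swap total, so the swap part is
    the difference between the two settings.
    """
    return parse_size(memory_swap) - parse_size(memory)
```

With the values above, swap_allowance("3g", "3g") is 0, that is, the
container cannot swap at all.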

Also, the docker images can be checked with 'docker inspect' for the fs type;
by default it is 'overlay2', and it can be changed after the needed discussions.
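
To answer the 'where do WAL writes go' question from inside a running job,
one option is to look up the filesystem type of the working directory in
/proc/mounts. This is only a sketch under simple assumptions: the function
names are made up for the example, and escaped characters in mount points
(such as \040 for a space) are not decoded.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field layout: device, mount point, fs type, options, ...
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Pick the fs type of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # ">=" lets a later mount over the same point win.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None
```

Inside the container it could be called as
fs_type_for("/builds", parse_mounts(open("/proc/mounts").read())). A
directory passed through 'volumes' would be expected to report the host
filesystem rather than 'overlay', but that should be checked on the real
runner rather than assumed.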
>
>AFAIS, the current way of using docker doesn't even try to freeze userspace:
>it uses the 'ubuntu:18.04' tag, which is updated from time to time, not,
>say, 'ubuntu:bionic-20200112'. It also performs 'apt-get update' inside,
>so userspace will change on each rebuild of the image. We are
>unable to change something inside the image without updating everything.
>This way we don't actually control userspace updates.
Right, I've removed the extra 'upgrade' call and left only 'apt-get update',
which in fact only refreshes the package lists of the repositories from which
the needed packages are installed.
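
On the floating tag point, a rough check of whether an image reference is
pinned could look like the sketch below. It is a heuristic written for
this example, not something bench-run contains: a reference counts as
pinned if it carries a sha256 digest or a tag ending in a date such as
'bionic-20200112'.

```python
import re

# A tag ending in "-YYYYMMDD", optionally followed by ".N".
DATED_TAG = re.compile(r"-\d{8}(\.\d+)?$")


def is_pinned(image_ref):
    """Heuristically tell a pinned image reference from a floating one."""
    if "@sha256:" in image_ref:
        return True  # a digest names exactly one image
    # Drop the registry/namespace part; it may contain a ":port".
    name = image_ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag means the floating "latest"
    tag = name.split(":", 1)[1]
    return bool(DATED_TAG.search(tag))
```

Here is_pinned("ubuntu:18.04") is False and
is_pinned("ubuntu:bionic-20200112") is True. Even a dated tag can be
pushed again by its publisher, so only the digest form is a strict pin.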
>
>
>BTW, why is Ubuntu used while all production environments (where
>performance matters) are on RHEL / CentOS 7?
The Dockerfiles that install the benchmarks were taken from the benchmarks
run repository and used Ubuntu 18.04, so it was kept until we decide whether
we need to move to CentOS 7 or similar.
>
>
>Why is the dirty cache not written back (`sync`) before dropping the clean
>caches (`echo 3 > /proc/sys/vm/drop_caches`)?
Ok, sure, I've checked it and added it to the bench-run repository scripts.
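
The ordering matters because drop_caches only frees clean pages, so dirty
pages have to be written back first. A minimal sketch of the sequence
follows; it is not the actual bench-run code, the function name and
parameters are made up for the example, and writing to the real
/proc/sys/vm/drop_caches needs root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def flush_and_drop_caches(drop_caches_path=DROP_CACHES, sync=os.sync):
    """Write dirty pages back, then ask the kernel to drop clean caches."""
    # Dirty pages cannot be dropped, so write them back first.
    sync()
    with open(drop_caches_path, "w") as f:
        # 3 = free the page cache plus dentries and inodes.
        f.write("3\n")
```

The path and the sync function are parameters only so that the sequence
can be exercised without root, for example against a temporary file.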

-- 
Alexander Tikhonov