[Tarantool-patches] [small] Revert "Free all slabs on region reset"

Thu Jan 30 10:23:18 MSK 2020

On Thu, Jan 30, 2020 at 12:41:22AM +0300, Konstantin Osipov wrote:
> * Alexander Turenko <alexander.turenko at tarantool.org> [20/01/29 11:11]:
> > This reverts commit 67d7ab44ab09df3356929e3692a03321b31f3ebb.
> > 
> > The goal of the reverted commit was to fix flaky fails of tarantool
> > tests that checks amount of memory used by a fiber:
> > 
> >  | fiber.info()[fiber.self().id()].memory.used
> > 
> > It also attempts to overcome the situation when a fiber holds some
> > amount of memory, which is not used in any way. The high limit of such
> > memory is controlled by a threshold in fiber_gc() tarantool's function
> > (128 KiB at the moment):
> 
> The commit by Georgy was doing the right thing, it just had a bug
> in it. I saw Georgy posted a fix for his commit - but I haven't
> seen it on the list.
> 
> Why revert this commit rather than fix it? It should be pretty
> strightforward to fix, no?
> 
> The fix is to avoid iterating over all the free slabs when we
> failed to find the right slab size several times, and just get a
> new slab at once. I'd even say that Georgy's commit was entirely
> correct, it just didn't account for the "land mine" it is setting
> for alloc case.
> 
> Thoughts?

For the record, the discussed fix in small is [1], the change in
tarantool is [2].

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-January/013798.html
[2]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-January/013795.html

The Georgy's fix for small keeps linear traversal over allocated slabs
on region_reset(). Fixed region_reserve() releases one slab at max per
allocation, so this linear traversal will persist for several
region_reset() calls in row. This causes degradation of first ~2M
requests that are 4096+ bytes size on the workload #4736 just because we
traverse the same slabs again and again. I suspect that even calling of
region_free() would give better performance on the workload (at least
without big local slowdowns).

W/o the fix I saw 15-20k allocated slabs during workload from #4736 (and
it was increasing) that explains 2M number of degraded requests (I
assume that ~1000 fibers work on them).

I think that reserve/reset should give predictable performance and so
should be constant until region_free() will be called. Ideally it should
doing something like Georgy made: put one or several unnecessary slabs
to a slab_cache. Even better would be spread deallocations across
allocation calls like LuaJIT does: perform more deallocations if there
are more garbage.

Anyway, I need more time to propose something certain. There are some
thoughts in #4750, but I don't ready to defend them:

> Save a pointer to unused tail of slabs (or store them in a separate
> list) and pass one or several slabs back to slab_cache on allocations
> (region_reserve()) -- this is in the spirit of Georgy's commit
> b4c11319e1ca637394d87cde7514fb43aec6b02f. It does not require linear
> slab list traversal in region_reset() that was added in the reverted
> commit. This way we'll spread deallocations across time and so
> performance will be more predictable comparing a solution with linear
> traversal in region_reset() (that is vital when you have an SLA).

The core of this idea is to create 'garbage' list to distinguish such
slabs from used ones without the full traversal in region_reset().

It should decrease average cost of region_free(), but increase cost of
region_reset() and region_reserve() at a constant. I need to test it on
different workloads to understand whether it is worthful.

BTW, #4750 is only about tests and will be closed soon. If you have
thoughts about gains we can get from changing of region_*() algorithms,
please, file a separate issue and describe them.

WBR, Alexander Turenko.