Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Konstantin Osipov <kostja.osipov@gmail.com>,
	Ilya Kosarev <i.kosarev@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
Date: Sun, 16 Feb 2020 16:41:45 +0100	[thread overview]
Message-ID: <2b8dee18-aaab-265e-6d38-4dd3a5d67653@tarantool.org> (raw)
In-Reply-To: <20200215163440.GA19240@atlas>

We discussed the ticket and the patches verbally.

On 15/02/2020 17:34, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/02/15 19:24]:
>> Here is what Kostja said, but somewhy without CCing the
>> mailing list, and my answers to it inlined:
>>
>>> This is the first approximation of what I have proposed at best. 
>>> First of all, the mechanism should not be truncation specific. It should be usable by all subsystems that require emergency memory.
>>>
>>> Second, there is no reason to patch small. All the patch needs to do is something along these lines:
>>>
>>>
>>> 1) in memtx_init, reserve the emergency slab.
>>>
>>> 2) in memtx_tuple_alloc, refuse with error if there is no emergency slab
>>>
>>> 3) in memtx_tuple_free, try to reserve emergency slab (replenish it) if it is empty
>>
>> All of this adds code and conditions to a hot path. The way with quota
>> enabling/disabling works only in a rare case, when delete() fails due
>> to OOM, or when truncate is called, which is not often.
> 
> First of all, the way with quota simply doesn't work. You expect
> the quota to shrink back, but it never does. So you simply run out
> of all existing memory. 

This is solvable. Only normal slabs are never returned. 'Huge' slabs
are. When quota is disabled, we can allocate all new data in 'huge'
slabs. Then quota should shrink back when these slabs are freed.

> If it worked, the fix would to reduce the amount of available
> quota by 1 slab at start, and change quota before truncate, 
> and then return the quota back in its place. 

This does not really work as simple as you wanted. If truncate would
not free anything, you won't be able to return the quota back. Because
it is occupied by a tuple in _truncate.

> My main point: under no circumstances tarantool should go beyond
> the amount of memory set by quota. If we need to reserve
> memory/quota for emergency, it's fine, let's do it at start,
> but going beyond is not acceptable.

Ok, we also discussed with you that we could allocate one special
slab in the beginning, and use it only for special cases like
_truncate tuples and :delete() reservations. But this also does
not fix the issue completely, only makes it harder to stumble into
it. Besides, this adds notable complexity to memtx tuple allocation
and freeing.

> Second, yes, it does add a branch to the hot path. Same as in
> memtx_index_extent_reserve().  This will have no impact on performance profile -
> feel free to check. I think, however, given the 10% performance
> regression in tuple_format, your time optimizing performance will
> be better spent elsewhere.

Having one condition on a hot path is not a justification for
adding more conditions and complexity.

>>> 4) at start of truncate, or wherever we need emergency memory, release emergency slab. Simply return it to arena.
>>
>> Once you returned it, all the other operations will be able to take
>> and fill it. Such as insertions. In the first truncate or delete
>> didn't free anything, or freed not enough to fit a new truncate
>> tuple here, there is no more a reserved slab for a next delete/truncate.
>> So your proposal does not seem to change anything.
> 
> Ehm, I did not fully explain the proposal. It also assumes there is 
> "emergency mode" flag set at start of truncate and cleared at end,
> and if there is no reserve slab and there is an emergence flag
> set, nothing can allocate memory.

Yeah, the problem is the same as with reserving quota instead of a
slab. If truncate/delete didn't free anything, the 'emergency'
slab is already taken, and can't be freed.

Here is a summary of what we discussed.

Lets don't change anything about truncate(). Or make it work using
deletes + periodically try to execute a real truncate again. Or
drop _truncate system space. Or use 'huge' slabs idea for _truncate
tuples.

Talking of delete, it looks like the problem is related to a
leak/bug somewhere in btree. (Probably this is it:
https://github.com/tarantool/tarantool/issues/4780).

The thing is that :delete() should never fail in memtx. It is
guaranteed by memtx_index_extent_reserve() in
memtx_space_replace_all_keys(). Indeed, consider an example. Assume,
that we insert tuples into a tree. It is getting filled, and calls
memtx_index_extent_reserve() with 16 blocks each time. When there is
not enough memory for 16 blocks, it is assumed, that there is enough
for 8 blocks.
So memtx_index_extent_reserve() for REPLACE stops working, but
memtx_index_extent_reserve() for DELETE works, because needs only 8
blocks. Moreover, even if DELETE didn't free anything, the tree is
not changed, and these 8 blocks are still available for a next DELETE
attempt.

This was the initial idea with that reserve() call. A problem may be
with a leak, or with the assumption, that not enough for 16 blocks !=
enough for 8 blocks. And it is possible to reach a situation, when both
REPLACE and DELETE can't reserve anything.

I think that firstly we need a stable reproducer. Because the tests
on the branch seems to fail only at insertion. On deletion they fail
artificially via errinj. Perhaps. Because I tried to delete that
errinj, and nothing changed. It means, that they don't really depend
on why memtx_index_extent_reserve() can fail. And this is bad.

Once we have a reproducer, without errinj, we can see at the real
reason why does not it work. Maybe it would be enough just to increase
RESERVE_EXTENTS_BEFORE_REPLACE. So as after any insertion/update there
would be still reserved at least RESERVE_EXTENTS_BEFORE_DELETE.
Maybe we will find a leak in bps. A fair reproducer will show.

      reply	other threads:[~2020-02-16 15:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 19:40 Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 1/4] small: bump small version Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 2/4] b-tree: return NULL on matras_alloc fail Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 3/4] memtx: use reserved slab for truncation Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 4/4] memtx: space:delete() might fail on reserve Ilya Kosarev
2020-02-14 22:48 ` [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Vladislav Shpilevoy
2020-02-15 15:33   ` Vladislav Shpilevoy
2020-02-15 16:34     ` Konstantin Osipov
2020-02-16 15:41       ` Vladislav Shpilevoy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b8dee18-aaab-265e-6d38-4dd3a5d67653@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=i.kosarev@tarantool.org \
    --cc=kostja.osipov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox