Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v3] wal: introduce limits on simultaneous writes
Date: Wed, 3 Mar 2021 21:59:02 +0100	[thread overview]
Message-ID: <8b33b97e-fb7b-6ae5-9007-3f82e3cb49ea@tarantool.org> (raw)
In-Reply-To: <97734227-410b-3e72-2ee7-30e7d6f63217@tarantool.org>

>>>>> @@ -159,6 +276,13 @@ journal_write(struct journal_entry *entry)
>>>>>    static inline int
>>>>>    journal_write_async(struct journal_entry *entry)
>>>>>    {
>>>>> +    /*
>>>>> +     * It's the job of the caller to check whether the queue is full prior
>>>>> +     * to submitting the request.
>>>>> +     */
>>>>> +    assert(!journal_queue_is_full() || journal_queue.is_ready);
>>>>> +    journal_queue_on_append(entry);
>>>> 8. Probably must assert that waiters list is empty. Otherwise you
>>>> could go out of order.
>>> It's not empty by the time the first entry gets to 'journal_write_async'.
>>> Everyone else has been woken up, but not yet removed from the queue.
>>>
>>> Looks like we cannot determine whether a write is called after waiting in queue
>>> or not.
>> It bothers me a lot, the rest looks good. We don't have any protection against
>> a reorder, even not an assertion. How about this? (I didn't test):
> 
> Thanks for thinking this through!
> Yes, indeed, this is a problem. Unfortunately, your solution will allow reordering.
> We needed the waiter count to prevent anyone from sliding in front when the queue
> has already been woken up, but the fibers are not yet scheduled.
> 
> We also discussed that a reorder may happen if the applier tries to apply
> a vinyl tx, which may yield after it has waited in the queue.
> In this case, if some tx is committed (synchronously) and added to the queue
> while the vinyl tx sleeps, the vinyl tx will bypass that synchronous tx in the queue.
> 
> So, there shouldn't be any yields between journal_queue_wait() and
> journal_write(_async)().
> 
> We cannot yield in memtx transactions, so the only solution is to put
> journal_queue_wait() after txn_prepare() inside txn_commit_async().
> 
> Take a look at this patch (on top of the branch):
> I decided to move journal_queue_wait() to journal_write_async() to be
> consistent with journal_write(), which also checks the queue.
> 
> Now reordering is not a problem since waiting in queue is done right before
> the write request is submitted. No yields in between.
> 
> Also, I renamed journal_write_async() to journal_write_try_async() and
> txn_commit_async() to txn_commit_try_async() to indicate that they may
> yield occasionally now.
> 
> What do you think of this approach?

Looks good.
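
Just to spell out how I read the new flow (a sketch only, assuming
journal_queue_wait() and journal_queue_on_append() keep the signatures from
the patch and that the final submit still goes through the journal vtab as
before; the actual code on the branch may differ):

    static inline int
    journal_write_try_async(struct journal_entry *entry)
    {
        /*
         * Wait for a free slot right before submitting the request.
         * journal_queue_wait() is the only call here that may yield
         * (hence the _try_ in the name), so nothing can slip into the
         * queue between leaving it and the actual write, and entries
         * cannot be reordered.
         */
        journal_queue_wait();
        journal_queue_on_append(entry);
        return current_journal->write_async(current_journal, entry);
    }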

> P.S. There's another thing I wanted to discuss: should we set the defaults for
> wal_queue_max_size(len) to some finite value? If yes, then what value should
> it be? wal_queue_max_size = default memtx_memory (256 MB)?
> I don't know which value to choose for wal_queue_max_len at all.

Probably we could try something sanely big. For instance, max_len = 1000000,
on the assumption that you might replicate a million rows per second, and a WAL
write is likely to finish in under a second. If the queue still grows beyond a
million entries, then something is probably wrong anyway. On the other hand, it
should give some protection against OOM, assuming the transactions aren't big.

For max_size we could be a bit braver than 256 MB. Perhaps 5 GB? 10 GB? Just
to have some sane limit up to which it is still imaginable that the instance
really has that much memory. Better than infinity.

I don't know. We could also leave it unlimited, or ask Mons for good defaults,
since he has experience configuring Tarantool in production.
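
Just to make the numbers concrete, I mean something like the following as the
shipped default (option names as in the patch; the values are only a strawman):

    box.cfg{
        -- Maximum number of rows waiting for a WAL write at once.
        wal_queue_max_len = 1000000,
        -- Maximum total size of those rows, in bytes (5 GB here).
        wal_queue_max_size = 5 * 1024 * 1024 * 1024,
    }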

See 2 comments below.

> diff --git a/test/replication/gh-5536-wal-limit.result b/test/replication/gh-5536-wal-limit.result
> new file mode 100644
> index 000000000..f7799baa8
> --- /dev/null
> +++ b/test/replication/gh-5536-wal-limit.result
> @@ -0,0 +1,132 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +
> +--
> +-- gh-5536: out of memory on a joining replica. Introduce a WAL queue limit so
> +-- that appliers stop reading new transactions from master once the queue is
> +-- full.
> +--
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +_ = box.schema.space.create('test')
> + | ---
> + | ...
> +_ = box.space.test:create_index('pk')
> + | ---
> + | ...
> +
> +replication_timeout = box.cfg.replication_timeout

1. The test needs to save the WAL limit, in case it stops being 0 by default
in the future, so that it can be properly restored at the end of the test.
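
Roughly like this (a sketch, assuming the option is exposed as
box.cfg.wal_queue_max_size):

    -- At the top of the test, next to replication_timeout:
    old_wal_queue_max_size = box.cfg.wal_queue_max_size
    -- ... the test body ...
    -- At the end of the test, together with the other cleanup:
    box.cfg{wal_queue_max_size = old_wal_queue_max_size}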

> + | ---
> + | ...
> +box.cfg{replication_timeout=1000}
> + | ---
> + | ...

2. It would also be good to have at least one test for the WAL row count
limit, not only for the byte size.
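
For example, something along these lines (a sketch only; it assumes the row
count option is wal_queue_max_len and that the WAL can be stalled with the
ERRINJ_WAL_DELAY error injection):

    -- On the replica: allow at most 10 rows in the queue and stall the WAL.
    box.cfg{wal_queue_max_len = 10}
    box.error.injection.set('ERRINJ_WAL_DELAY', true)
    -- On the master: write noticeably more than 10 rows, then check that the
    -- applier stops reading once the replica's queue is full.
    -- Finally, release the WAL and check that everything gets applied:
    box.error.injection.set('ERRINJ_WAL_DELAY', false)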

Thread overview: 30+ messages
2021-02-24 19:35 Serge Petrenko via Tarantool-patches
2021-02-24 19:40 ` Serge Petrenko via Tarantool-patches
2021-02-25 13:05 ` Konstantin Osipov via Tarantool-patches
2021-02-26  0:57   ` Vladislav Shpilevoy via Tarantool-patches
2021-02-26  7:18     ` Konstantin Osipov via Tarantool-patches
2021-02-26 20:23       ` Vladislav Shpilevoy via Tarantool-patches
2021-02-26 21:20         ` Konstantin Osipov via Tarantool-patches
2021-02-26 22:44           ` Vladislav Shpilevoy via Tarantool-patches
2021-02-27 13:27             ` Konstantin Osipov via Tarantool-patches
2021-03-01 19:15   ` Serge Petrenko via Tarantool-patches
2021-03-01 21:46     ` Konstantin Osipov via Tarantool-patches
2021-02-26  0:56 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-01 19:08   ` Serge Petrenko via Tarantool-patches
2021-03-01 22:05     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-02 17:51       ` Serge Petrenko via Tarantool-patches
2021-03-03 20:59         ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-03-09 15:10           ` Serge Petrenko via Tarantool-patches
2021-03-09 19:49 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-10  8:18   ` Konstantin Osipov via Tarantool-patches
2021-03-12 17:10     ` Serge Petrenko via Tarantool-patches
2021-03-13 19:14       ` Konstantin Osipov via Tarantool-patches
2021-03-15 23:42       ` Vladislav Shpilevoy via Tarantool-patches
2021-03-16  6:45         ` Konstantin Osipov via Tarantool-patches
2021-03-16 20:27           ` Vladislav Shpilevoy via Tarantool-patches
2021-03-16 10:19         ` Serge Petrenko via Tarantool-patches
2021-03-16 20:48           ` Vladislav Shpilevoy via Tarantool-patches
2021-03-17 12:14             ` Serge Petrenko via Tarantool-patches
2021-03-17 21:02           ` Vladislav Shpilevoy via Tarantool-patches
2021-03-19 11:32             ` Serge Petrenko via Tarantool-patches
2021-03-19 15:36 ` Kirill Yukhin via Tarantool-patches
