From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2] wal: introduce limits on simultaneous writes
Date: Wed, 24 Feb 2021 22:32:00 +0300
Message-ID: <2e8b0be7-52ba-70b6-d1d9-cdf976106a2d@tarantool.org>
In-Reply-To: <77af94ed-71eb-74e5-da73-7ae7286bfeb1@tarantool.org>

On 24.02.2021 01:19, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!

Thanks for the review! Please find my answers below and v3 of the patch
in your inbox.

>
>>>> diff --git a/src/box/journal.c b/src/box/journal.c
>>>> index cb320b557..49441e596 100644
>>>> --- a/src/box/journal.c
>>>> +++ b/src/box/journal.c
>>>> @@ -55,3 +55,66 @@ journal_entry_new(size_t n_rows, struct region *region,
>>>>    			  complete_data);
>>>>    	return entry;
>>>>    }
>>>> +
>>>> +struct journal_queue_entry {
>>>> +	/** The fiber waiting for queue space to free. */
>>>> +	struct fiber *fiber;
>>>> +	/** Whether the fiber should be waken up regardless of queue size. */
>>>> +	bool is_ready;
>>>> +	/** A link in all waiting fibers list. */
>>>> +	struct rlist in_queue;
>>>> +};
>>>> +
>>>> +/**
>>>> + * Wake up the next waiter in journal queue.
>>>> + */
>>>> +static inline void
>>>> +journal_queue_wakeup_next(struct rlist *link, bool force_ready)
>>> 1. The flag is known in all usage places at compilation time. Is it
>>> possible to split the function into force/normal versions? The same
>>> for journal_queue_wakeup() from which this runtime uncertainty arises.
>> Actually, the parameter is not known at compile time when wakeup_next()
>> is called from journal_wait_queue(). For now wakeup_next() only has a
>> single check for force_ready, so moving the check outside would only
>> increase the number of branches.
>>
>> journal_queue_wakeup() is called only once per a whole queue wakeup, so
>> I suppose it doesn't hurt much that it has a compile-time known parameter.
> Is it called once? Then why does it have `if (current_journal->queue_is_woken)`
> check?

I was trying to say "it's called once per a bunch of wakeup_next() calls".
Just ignore this, it's irrelevant.
Actually, no, it may be called multiple times, from every
journal_async_complete(). But it is a no-op for each consequent call,
except the first one (while the queue is being cleared).
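To be concrete, the "no-op" is just the queue_is_woken check at the top of
journal_queue_wakeup(). A commented sketch of that v2 bit (the exact code is
in the patch):

====================
void
journal_queue_wakeup(bool force_ready)
{
	/*
	 * Called from every journal_async_complete(), but while the
	 * queue is being cleared only the first call does anything:
	 * the rest see queue_is_woken set and return right away. The
	 * flag is reset by the wakeup chain once the waiter list is
	 * drained.
	 */
	if (current_journal->queue_is_woken)
		return;
	current_journal->queue_is_woken = true;
	journal_queue_wakeup_next(&current_journal->waiters, force_ready);
}
====================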
>
>>>> +{
>>>> +	/* Empty queue or last entry in queue. */
>>>> +	if (link == rlist_last(&current_journal->waiters)) {
>>> 2. I am not sure I understand what is happening here. Why is this
>>> function in one place called with the pointer at the list itself,
>>> and in another place with the pointer at one element?
>> Well, <list head> -> next is the first list entry, right?
> Perhaps. TBH, I don't remember, and when I see such tricky things in
> the code, it takes time to understand them.
>
>> In queue_wakeup() I wake the first waiter up.
>>
>> Once any waiter gets woken up, it wakes up the next waiter.
>> Which is <in_queue> -> next.
>>
>> That's why I have a common helper for these two cases.
> Ok, I see now. But it seems you could make it simpler, right?
>
> ====================
> @@ -69,10 +69,10 @@ struct journal_queue_entry {
>    * Wake up the next waiter in journal queue.
>    */
>   static inline void
> -journal_queue_wakeup_next(struct rlist *link, bool force_ready)
> +journal_queue_wakeup_first(bool force_ready)
>   {
>   	/* Empty queue or last entry in queue. */
> -	if (link == rlist_last(&current_journal->waiters)) {
> +	if (rlist_empty(&current_journal->waiters)) {
>   		current_journal->queue_is_woken = false;
>   		return;
>   	}
> @@ -97,7 +97,7 @@ journal_queue_wakeup(bool force_ready)
>   	if (current_journal->queue_is_woken)
>   		return;
>   	current_journal->queue_is_woken = true;
> -	journal_queue_wakeup_next(&current_journal->waiters, force_ready);
> +	journal_queue_wakeup_first(force_ready);
>   }
>
>   void
> @@ -114,7 +114,7 @@ journal_wait_queue(void)
>   	while (journal_queue_is_full() && !entry.is_ready)
>   		fiber_yield();
>
> -	journal_queue_wakeup_next(&entry.in_queue, entry.is_ready);
>   	assert(&entry.in_queue == rlist_first(&current_journal->waiters));
>   	rlist_del(&entry.in_queue);
> +	journal_queue_wakeup_first(entry.is_ready);
>   }
> ====================
>
> (I didn't test it.)

Yes, indeed. Thanks! Applied with minor changes.

>>> 5. Can rlist_del be done along with fiber_wakeup()? Then you
>>> wouldn't need is_woken maybe.
>> Looks like it can't.
>> Say we have only one waiter. And remove it from the list on wakeup.
>> The list would become empty and there'd be no way to check whether
>> journal has any waiters, and we may reorder the entries (put new ones
>> before the waiting one). This is not necessarily bad, because I put
>> entries into queue before txn_begin(), but someone may call
>> journal_wait_queue() from inside the transaction, or right before
>> txn_commit(). Then it might be bad to put other transactions before
>> this one.
> Order change is definitely not acceptable.
>
>> So while removing is_woken we would have to introduce a queue_has_waiters
>> flag for the sake of this single waiter.
> It would rather become a counter - the number of waiters. Because there can
> be many. But yeah, I see the problem.
>
>>>> +}
>>>> +
>>>> +/**
>>>> + * Check whether anyone is waiting for the journal queue to empty. If there are
>>>> + * other waiters we must go after them to preserve write order.
>>>> + */
>>>> +static inline bool
>>>> +journal_queue_has_waiters(void)
>>>> +{
>>>> +	return !rlist_empty(&current_journal->waiters);
>>>> +}
>>>> +
>>>> +/** Yield until there's some space in the journal queue. */
>>>> +void
>>>> +journal_wait_queue(void);
>>>> +
>>>> +/** Set maximal journal queue size in bytes. */
>>>> +static inline void
>>>> +journal_queue_set_max_size(struct journal *j, int64_t size)
>>> 7. Why do we have journal parameter here, but don't have it in
>>> the other functions? The same journal_queue_set_max_len.
>> This is my attempt to make sure only wal_writer's journal has a queue.
>> I explicitly set queue_max_... parameters only for wal_writer's journal.
>> And then there's an assert that journal_queue_set_...() is only called
>> with the current journal.
> Or the assertion could be done in wal_set_queue_*() functions. To keep the
> journal API consistent.

Actually, struct journal has a ton of queue_* members now, so I'm following
your older advice and extracting everything related to queues into
struct journal_queue.
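Roughly like this (a sketch of the direction; the exact field set and
comments may still change in v3):

====================
/** All the journal queue state, extracted from struct journal. */
struct journal_queue {
	/** Maximal size of entries enqueued in the journal, in bytes. */
	int64_t max_size;
	/** Current approximate size of enqueued entries. */
	int64_t size;
	/** Maximal allowed number of enqueued entries. */
	int64_t max_len;
	/** Current number of enqueued entries. */
	int64_t len;
	/**
	 * Fibers waiting for some space to free up in the queue.
	 * They are woken up in the same order they entered the queue.
	 */
	struct rlist waiters;
	/** Whether the queue is being woken up right now. */
	bool is_woken;
};
====================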
>
> I just realized, journal can be easily unit-tested. It does not depend on
> anything except small/ and core/ libs. Although seems like a lot of work so
> maybe not now. Probably later, for something more complex and harder to test
> via functional tests. However if you would write tests now, it would be
> greatly appreciated.
>
>>>> +{
>>>> +	assert(j == current_journal);
>>>> +	j->queue_max_size = size;
>>>> +	if (journal_queue_has_waiters() && !journal_queue_is_full())
>>>> +		journal_queue_wakeup(false);
>>>> +}
>>>> @@ -159,6 +264,12 @@ journal_write(struct journal_entry *entry)
>>>>    static inline int
>>>>    journal_write_async(struct journal_entry *entry)
>>>>    {
>>>> +	/*
>>>> +	 * It's the job of the caller to check whether the queue is full prior
>>>> +	 * to submitting the request.
>>> 8. Maybe add an assert though.
>> I wanted to, but it's impossible.
>> The queue may be full when all the waiters are forcefully woken up by a
>> synchronous commit. And it's hard to tell whether it was a "force" wakeup
>> or not. So let's just hope no one misuses this API.
> Yeah, I see now.
>
>> Or, even better, I can remove the is_ready field from queue entries and
>> add a new field to the journal: queue_is_ready or something, in addition
>> to queue_is_awake.
>> Then every entry will check queue_is_ready instead of entry.is_ready and
>> it'll be possible to add an assert here:
>> !journal_queue_is_full || journal_queue_is_ready.
>> Looks like this'll also allow us to extract queue_wakeup_(next)_force,
>> like you suggested in paragraph 1.
>> What do you think?
> Sounds good, worth doing.

I introduced queue_is_ready and removed entry.is_ready. The code looks
cleaner now, and together with your suggestion regarding
journal_queue_wakeup_first() it doesn't have parameters at all. It does have
a check for queue_is_ready internally, but there's no point in separating
_force and normal versions: that would simply move the check outside the
function call.
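For reference, the wakeup now looks roughly like this (a sketch of the v3
code; details and naming may differ slightly in the actual patch):

====================
static inline void
journal_queue_wakeup_first(void)
{
	struct journal_queue_entry *e;
	if (rlist_empty(&current_journal->waiters)) {
		current_journal->queue_is_woken = false;
		return;
	}
	/*
	 * When the queue isn't forcefully emptied, no need to wake
	 * anyone up until there's some free space.
	 */
	if (!current_journal->queue_is_ready && journal_queue_is_full())
		return;
	e = rlist_first_entry(&current_journal->waiters, typeof(*e),
			      in_queue);
	fiber_wakeup(e->fiber);
}
====================

And journal_write_async() may now assert that
!journal_queue_is_full() || journal_queue_is_ready() holds before
submitting the entry.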
>
> See 2 comments below.
>
>>>> +	 */
>>>> +	journal_queue_on_append(entry);
>>>> +
>>>>    	return current_journal->write_async(current_journal, entry);
>>>>    }
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 553db76fc..7c2452d2b 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -967,6 +967,15 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)
>>   		goto success;
>>   	}
>>   
>> +	/*
>> +	 * Do not spam WAL with excess write requests, let it process what's
>> +	 * piled up first.
>> +	 * This is done before opening the transaction to avoid problems with
>> +	 * yielding inside it.
>> +	 */
>> +	if (journal_queue_is_full())
>> +		journal_wait_queue();
> 1. I just noticed you do the waiting before starting the
> transaction. In case of Vinyl the transaction can yield. So
> by the time you get to commit, the queue could be full.
>
> Don't know what to do with this. We can't wait before
> txn_commit_async() because it would kill the memtx transactions.
>
> Maybe we could just not care now. Because overpopulation never
> will exceed the number of appliers, which is tiny.
>
> But when async transactions will go to the public API, we
> will face this issue anyway. I assume we will need to extract
> txn_prepare to the "public" part of txn.h and use it separately
> from writing to the journal. So in our code it would look like
> this:
>
>      sync: txn_begin() ... txn_commit()
>      async: txn_begin() ... txn_prepare() journal_wait() txn_persist()
>
> or something similar. But don't know for sure.
>
> Summary: leave it as is if you don't want to tear commit_async()
> and commit() up into parts now.

Let's leave it as is for now then.

>
>> +
>>   	/**
>>   	 * Explicitly begin the transaction so that we can
>>   	 * control fiber->gc life cycle and, in case of apply
>> diff --git a/src/box/journal.h b/src/box/journal.h
>> index 5d8d5a726..d295dfa4b 100644
>> --- a/src/box/journal.h
>> +++ b/src/box/journal.h
>> @@ -159,6 +264,12 @@ journal_write(struct journal_entry *entry)
>>   static inline int
>>   journal_write_async(struct journal_entry *entry)
>>   {
>> +	/*
>> +	 * It's the job of the caller to check whether the queue is full prior
>> +	 * to submitting the request.
>> +	 */
>> +	journal_queue_on_append(entry);
>> +
>>   	return current_journal->write_async(current_journal, entry);
> 2. What if write_async() is called by some applier when the queue is
> not full, but also not empty? It seems it will bypass the existing
> waiters and lead to the transaction order change. No?

Yes, you're correct, and thanks for noticing this. This is fixed simply:

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 7c2452d2b..27ddd0f29 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -973,7 +973,7 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)
 	 * This is done before opening the transaction to avoid problems with
 	 * yielding inside it.
 	 */
-	if (journal_queue_is_full())
+	if (journal_queue_is_full() || journal_queue_has_waiters())
 		journal_wait_queue();
 
 	/**

Having this fix applied, nothing else could go wrong here AFAICS.

>
> I'm starting to think that we need to queue the journal_entry objects
> right in the journal object. So if their queue is not empty,
> journal_write_async() adds the entry to the queue and does not call
> write_async().

Why?

>
> Also it would be cool to add a test for how the applier can reorder WAL
> writes in the current patch.

-- 
Serge Petrenko