[Tarantool-patches] [PATCH 04/13] wal: refactor wal_write_to_disk()

Wed Jun 16 11:02:11 MSK 2021

On Wed, Jun 16, 2021 at 08:22:15AM +0200, Vladislav Shpilevoy wrote:
> 
> 
> On 15.06.2021 22:46, Cyrill Gorcunov wrote:
> > On Fri, Jun 11, 2021 at 11:56:12PM +0200, Vladislav Shpilevoy wrote:
> >> It didn't have a single fail path. That led to some amount of code
> >> duplication, and it complicated future patches where the journal
> >> entries are going to get a proper error reason instead of default
> >> -1 without any details.
> >>
> >> The patch is a preparation for #6027 where it is wanted to have
> >> more detailed errors on journal entry/transaction fail instead
> >> of ER_WAL_IO for everything. Sometimes it can override a real
> >> error like a cascade rollback, or a transaction conflict.
> >>
> >> Part of #6027
> >> ---
> >> @@ -1038,7 +1036,10 @@ wal_write_to_disk(struct cmsg *msg)
> >>  {
> >>  	struct wal_writer *writer = &wal_writer_singleton;
> >>  	struct wal_msg *wal_msg = (struct wal_msg *) msg;
> >>  	struct error *error;
> >> +	assert(!stailq_empty(&wal_msg->commit));
> > 
> > Hi Vlad, you know I don't understand why we need this assert...
> 
> Otherwise in case of, for instance, rotate fail, the rollback won't
> start.

But the whole current code logic assumes that if the commit chain
is empty then there was no data to write or roll back; in other words,
the code flow uses list emptiness as a sign of whether there is data to
proceed with. If we really want to prohibit calling wal_write_to_disk()
with no entries, then maybe it is better to put a panic here?

The wal_msg->commit entry is accessed later in code anyway via bpu,
which means that if we add

	if (stailq_empty(&wal_msg->commit))
		panic("wal: attempt to update vclock without data");

this won't cause perf degradation, since a few lines later you are going
to test @commit again, when the bpu entry is already filled, and it will
allow us to catch problems on release builds as well.

Please don't get me wrong, I simply dislike assert() with a passion
because it hides problems which might happen on release builds.

Anyway, this is just a proposal; if you still prefer calling assert here,
OK, let it be.
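
To illustrate the point about release builds, here is a minimal sketch.
It is not Tarantool code: struct queue, queue_empty() and both guard
functions are hypothetical stand-ins for the commit queue and the check
under discussion. NDEBUG is defined by hand to mimic a release build;
the C standard lets <assert.h> be re-included so that assert() follows
the current NDEBUG state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the commit queue; not Tarantool's stailq. */
struct queue {
	size_t len;
};

static bool
queue_empty(const struct queue *q)
{
	return q->len == 0;
}

/* Mimic a release build: with NDEBUG defined, assert() expands to nothing. */
#define NDEBUG
#include <assert.h>

/* Returns true if an empty queue slipped past the assert unnoticed. */
static bool
assert_guard_misses_empty(const struct queue *commit)
{
	assert(!queue_empty(commit));	/* compiled out under NDEBUG */
	return queue_empty(commit);
}

/* Restore the regular assert() for everything that follows. */
#undef NDEBUG
#include <assert.h>

/*
 * Explicit check that is always compiled in. It returns false when it
 * fires; real code would call panic() at that point instead.
 */
static bool
explicit_guard_passes(const struct queue *commit)
{
	if (queue_empty(commit))
		return false;
	return true;
}
```

With an empty queue the first function returns true, i.e. execution
continues past the assert, while the explicit check still fires.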

> > 
> > Jumps to the "done" label change the code logic. Before the patch, if we
> > reached the write and, say, wal_opt_rotate failed, we set the is_in_rollback
> > sign and exited early. After the patch we start notifying watchers that
> > a "write" happened, which means the relay code will be woken up while there
> > is no new data on disk at all, so watchers are going to be notified
> > for no reason, no? Or am I missing something obvious?
> 
> You didn't miss anything. But I see no harm in that. A WAL write failure is
> extremely rare, so a rare spurious wakeup won't do anything bad. I
> decided that code reusability and simplicity are more important here.

OK, thanks for the explanation!
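
For the record, the control-flow change discussed above can be sketched
roughly as follows. This is a simplified model, not the actual
wal_write_to_disk(): struct writer_state, its fields and write_batch()
are made up for illustration, and the rotate result is passed in as a
flag.

```c
#include <stdbool.h>

/* Hypothetical writer state, for illustration only. */
struct writer_state {
	bool in_rollback;	/* set when a write attempt fails */
	int rows_written;	/* rows that actually reached "disk" */
	int notifications;	/* how many times watchers were woken up */
};

/*
 * Single fail path: every exit goes through the "done" label, so the
 * watcher notification runs on success and on failure alike.
 */
static void
write_batch(struct writer_state *w, bool rotate_ok, int rows)
{
	if (!rotate_ok) {
		w->in_rollback = true;
		goto done;
	}
	w->rows_written += rows;
done:
	/* Shared tail; on failure this is the spurious wakeup. */
	w->notifications++;
}
```

On a failed rotate the model records one notification with zero rows
written, which is the rare spurious wakeup traded for a single exit path.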