[Tarantool-patches] [PATCH 1/1] applier: process synchro rows after WAL write
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Fri Apr 9 01:56:59 MSK 2021
Hi! Thanks for the review!
> Thanks for the patch!
>
> I'm a bit worried about two different synchro rows coming from two
> appliers. Is everything ok in this case?
> Or even normal rows coming from other appliers. Say some other replica
> has already applied this synchro row and even has written some rows on
> top of it. Its applier won't block on replica_id latch, and may fail to apply
> some txs following this synchro row, because it's not yet written to WAL
> and thus not applied (limbo is still not empty or belongs to other instance).
Should not happen, if I understood you correctly. See my response in the
other email + a test for test-run + a console test.
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 971b2e64c..f7c82548c 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -762,26 +762,21 @@ applier_txn_wal_write_cb(struct trigger *trigger, void *event)
>> }
>> struct synchro_entry {
>> - /** Encoded form of a synchro record. */
>> - struct synchro_body_bin body_bin;
>> -
>> - /** xrow to write, used by the journal engine. */
>> - struct xrow_header row;
>> -
>> + /** Request to process when WAL write is done. */
>> + struct synchro_request *req;
>> + /** Fiber created the entry. To wakeup when WAL write is done. */
>> + struct fiber *owner;
>> /**
>> - * The journal entry itself. Note since
>> - * it has unsized array it must be the
>> - * last entry in the structure.
>> + * The base journal entry. It has unsized array and then must be the
>> + * last entry in the structure. But can workaround it via a union
>> + * adding the needed tail as char[].
>> */
>> - struct journal_entry journal_entry;
>> + union {
>> + struct journal_entry base;
>> + char base_buf[sizeof(base) + sizeof(base.rows[0])];
>> + };
>> };
>
> I don't understand this union stuff.
> The journal_entry is the last entry in synchro_entry anyway.
> Is this a hack for allowing to allocate synchro_entry on the stack?
Yes. Otherwise I would need a more ugly hack:
struct synchro_entry *e;
alignas(alignof(*e)) char e_buf[sizeof(*e) + sizeof(e->base.rows[0])];
e = (struct synchro_entry *)e_buf;
Then I managed to create the union hack, which I find
less ugly, and it seems to be allow to declare it on the stack.
The interesting thing though is that I still can't make that
union not a last member.
I also see now there is a problem in one of the CI jobs:
https://github.com/tarantool/tarantool/runs/2291961909
It complains about out of bound array access. Which has nothing to
do with the union, just a C++ issue I suppose. I fixed it in the new
version, see the other email.
>> diff --git a/test/replication/gh-5213-qsync-applier-order.test.lua b/test/replication/gh-5213-qsync-applier-order.test.lua
>> new file mode 100644
>> index 000000000..3a32626f2
>> --- /dev/null
>> +++ b/test/replication/gh-5213-qsync-applier-order.test.lua
>> @@ -0,0 +1,129 @@
>> +--
>> +-- gh-5213: applier used to process CONFIRM/ROLLBACK before writing them to WAL.
>> +-- As a result it could happen that the transactions became visible on CONFIRM,
>> +-- then somehow weren't written to WAL, and on restart the data might not be
>> +-- visible again. Which means rollback of confirmed data and is not acceptable
>> +-- (on the contrary with commit after rollback).
>> +--
>> +test_run = require('test_run').new()
>> +fiber = require('fiber')
>> +old_synchro_quorum = box.cfg.replication_synchro_quorum
>> +old_synchro_timeout = box.cfg.replication_synchro_timeout
>> +
>> +box.schema.user.grant('guest', 'super')
>> +
>> +s = box.schema.space.create('test', {is_sync = true})
>> +_ = s:create_index('pk')
>> +
>> +test_run:cmd('create server replica with rpl_master=default,\
>> + script="replication/gh-5213-replica.lua"')
>> +test_run:cmd('start server replica')
>> +
>> +test_run:switch('replica')
>> +assert(box.info.id == 2)
>> +lsn = box.info.vclock[1]
>> +
>> +test_run:switch('default')
>> +fiber = require('fiber')
>> +box.cfg{ \
>> + replication_synchro_quorum = 3, \
>> + replication_synchro_timeout = 1000, \
>> +}
>> +f = fiber.new(function() s:replace{1} end)
>> +
>> +test_run:switch('replica')
>> +-- Wait when the transaction is written to WAL.
>> +test_run:wait_cond(function() return box.info.vclock[1] == lsn + 1 end)
>
> This shouldn't go wrong, but I find
> test_run:wait_lsn('replica, 'default') more durable.
> And you wouldn't need to save lsn above then.
>
> Same for other usages of lsn = ... and wait_cond(box.info.vclock[1] == lsn + ...)
>
> Up to you though.
I totally forgot about wait_lsn(), thanks for noticing.
I applied it. See the full diff in the second email.
More information about the Tarantool-patches
mailing list