[Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sat Nov 28 01:10:57 MSK 2020
>> On 24.11.2020 14:18, Serge Petrenko wrote:
>>> When running a cluster with leader election, its useful to wait till the
>>> instance is writeable to determine that it has become a leader. However,
>>> sometimes the instance fails to write data right after transitioning to
>>> leader because its limbo still contains pending transactions from the
>>> old leader. Make sure the instance deals with pending transactions first
>>> and becomes writeable only once the limbo is empty.
>> I just realized one thing. We can add a function txn_limbo_is_ro(),
>> like we did with raft_is_ro(), account it in box_update_ro_summary(),
>> and call box_update_ro_summary() when we see that the limbo is emptied,
>> or when its ownership changes to a different instance.
>>
>> Probably would be simpler, and also we could make it work with manual
>> election! So users could call box.ctl.wait_rw() even without using raft!
>>
>> To show concrete error if somebody still tries to write, we could
>> patch box_check_writable() to show the reason why the instance is not
>> writable. We will do it anyway for raft, to tell the users the real
>> leader in case they are trying to write on a replica. In scope of
>> https://github.com/tarantool/tarantool/issues/5568.
>>
>> Your version of the patch also looks good.
>>
>> What do you think?
>
> Thanks for your answer!
>
> Your proposal looks good. One question though. What about multimaster
> synchro? Are we planning to support it one day? If yes, then limbo
> emptiness will mean nothing.
Indeed. But I have no idea if we will ever support it, and if
yes - when. It is possible in theory, but we never tried to
elaborate so far. I wouldn't expect it happening in the next
year or so.
> So, there're two options:
>
> 1) we may leave this patch as is. Then one won't be
> able to call wait_rw() with manual election. That's a pity, since
> your proposal looks quite logical, especially from the user's point
> of view. Having a single error for all these cases would be good.
It also bothers me, that now box.ctl.wait_rw() actually does not wait
for rw, strictly speaking. So it is probably even a bug, not a feature.
A user can get wait_rw() true, but still won't be able to write.
Value of such helper becomes zero with synchronous replication, in
essence.
> 2) Make limbo affect is_ro. Then everything's good for now, but we'll
> have to rewrite it back once (and if) we decide to implement
> multimaster synchro. Then the patch will look exactly like it does
> now.
Perhaps. If nothing will change in raft and limbo significantly.
> I'm not sure whether we're planning to make multimaster synchro work,
> so I can't choose between these options and leave you to decide.
We discuss it, but there is no really a plan. No customer request, and
no ticket AFAIR. I assume some big customer must ask for this for us to
even start planning.
> Besides, looks like if we take option 2 the patch will differ in a single
> line: `box_update_ro_summary()` after `box_clear_synchro_queue`
Why? You will need to patch the limbo as well, not just remove
box_update_ro_summary from after box_clear_synchro_queue. We will
need to add a method like txn_limbo_is_ro() (similar to raft_is_ro()),
use it in box_update_ro_summary(), and call the latter from some
places in the limbo.
I vote for the option 2. Mostly because I am afraid it is rather a bug.
wait_rw just does not guarantee anything now.
More information about the Tarantool-patches
mailing list