[Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
Serge Petrenko
sergepetrenko at tarantool.org
Wed Jul 14 21:28:00 MSK 2021
04.07.2021 15:27, Vladislav Shpilevoy пишет:
> Thanks for working on this!
Thanks for a thorough review!
> See 11 comments below.
>
> 1. On this commit a lot of tests fail. Why?
Must be because persisting limbo and raft state isn't yet implemented.
I'll move the relevant patches before this one.
> 2. On top of this commit and on top of the branch too I tried to
> promote a candidate and got a strange error in the logs, although
> the promotion was successful:
>
> --
> -- Instance 1
> --
>
> -- Step 1
> box.cfg{
> listen = 3313,
> election_mode = 'candidate',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'}
> }
> box.schema.user.grant('guest', 'super')
>
>
> --
> -- Instance 2
> --
>
> -- Step 2
> box.cfg{
> listen = 3314,
> election_mode = 'voter',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'},
> read_only = true,
> }
>
> -- Step 3
> box.cfg{read_only = false, election_mode = 'candidate'}
>
> -- Step 4
> box.ctl.promote()
>
> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
> ---
> ...
That's because once a candidate becomes the leader, it tries to issue
`box.ctl.promote()`, and fails,
since we're already in `box.ctl.promote()` call.
I'm not sure how to handle that properly. This doesn't break anything
though.
> 3. On top of the branch I tried demote on a leader with election mode
> candidate 2 times. Demote didn't do anything on the second invocation.
> The test:
>
> --
> -- Instance 1
> --
>
> -- Step 1
> box.cfg{
> listen = 3313,
> election_mode = 'candidate',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'}
> }
> box.schema.user.grant('guest', 'super')
>
> -- Step 3
> -- I demote self but become leader again because I am a
> -- candidate. This is fine.
> box.ctl.demote()
>
> -- Step 4
> tarantool> box.info.election
> ---
> - state: leader
> vote: 1
> leader: 1
> term: 3
> ...
>
> -- Step 5
> -- But now I try to demote again and nothing happens.
> -- I expected the same behaviour - bump term and become
> -- a leader again.
> tarantool> box.ctl.demote()
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: leader
> vote: 1
> leader: 1
> term: 3 <- the term is the same, so demote didn't do anything.
> ...
>
> --
> -- Instance 2
> --
>
> -- Step 2
> box.cfg{
> listen = 3314,
> election_mode = 'voter',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'},
> read_only = true,
> }
>
This one is fixed now.
> 4. I tried switching between candidate and manual, trying promote/demote
> in different combinations and got promote not doing anything. The test
> is the same as above but steps >= 3 are different:
>
> tarantool> box.ctl.demote()
> 2021-07-04 14:22:49.550 [588] main/103/interactive I> RAFT: bump term to 3, follow
> 2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: persisted state {term: 3}
> 2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: vote for 1, follow
> 2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: persisted state {term: 3, vote: 1}
> 2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: enter candidate state with 1 self vote
> ---
> ...
>
> tarantool> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: enter leader state with quorum 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: vote request is skipped - the leader is already known - 1
> 2021-07-04 14:22:49.553 [588] relay/[::1]:54006/101/main I> recover from `./00000000000000000000.xlog'
>
> ---
> ...
>
> tarantool> box.cfg{election_mode='manual'}
> 2021-07-04 14:22:55.391 [588] main/103/interactive I> set 'election_mode' configuration option to "manual"
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: follower
> vote: 1
> leader: 0
> term: 3
> ...
>
> tarantool> box.ctl.promote()
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: follower
> vote: 1
> leader: 0
> term: 3
> ...
>
> As you can see, the last promote() didn't do anything. The node
> stayed a follower, and no errors were raised.
>
>
> From the comments 2-4 it seems to me the complexity of
> box_clear_synchro_queue()/box_promote() is getting out of control with
> the number of ifs and flags, sometimes having unexpected meaning (for
> example, try_wait flag causes term bump somewhy). It might be good to
> split it into separate functions and make promotion/demotion out of each
> election mode linear and straight.
I agree. I've reworked promote() to be more sane, please see the
corresponding patches
in v4.
>
>> @TarantoolBot document
>> Title: box.ctl.demote
>>
>> `box.ctl.demote()` is a new function, which works exactly like
>> `box.ctl.promote()`, with one exception that it results in the instance
>> writing DEMOTE request to WAL instead of a PROMOTE request.
>>
>> A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
>> limbo as well), but clears the synchronous transaction queue ownership instead
>> of assigning it to a new instance.
> 5. It is worth mentioning that demote can only be called on a master.
Sure.
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 1d894be97..86c5967b9 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1674,15 +1680,22 @@ box_promote(void)
>> rc = -1;
>> } else {
>> promote:
>> - if (try_wait) {
>> + if (try_wait || demote) {
>> raft_new_term(box_raft());
>> if (box_raft_wait_persisted() < 0)
>> return -1;
> 6. What if now some other node won the elections of the term
> you just bumped? Then you wouldn't have the right to perform
> DEMOTE as the term is not yours.
Yes, indeed. Here's how I reworked this bit:
let's not write demote at all, when elections are enabled. Simply
bumping the term should be enough. Another leader (or maybe this instance)
will take over the limbo in the new term.
When elections are off, assume nothing can happen other than someone else
writes a promote/demote earlier than we do. I handle it as well in the
new patch.
>> }
>> uint64_t term = box_raft()->term;
>> - txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>> + if (demote) {
>> + txn_limbo_write_demote(&txn_limbo, wait_lsn,
>> + term);
>> + } else {
>> + txn_limbo_write_promote(&txn_limbo, wait_lsn,
>> + term);
>> + }
>> + uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
>> struct synchro_request req = {
>> - .type = IPROTO_PROMOTE,
>> + .type = type,
>> .replica_id = former_leader_id,
>> .origin_id = instance_id,
>> .lsn = wait_lsn,
>> diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> index 6b9e324ed..b969df836 100644
>> --- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> +++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> @@ -135,3 +136,5 @@ box.cfg{
>> replication_synchro_quorum = old_synchro_quorum, \
>> replication_synchro_timeout = old_synchro_timeout, \
>> }
>> +box.ctl.demote()
>> +
> 7. Git diff highlights this new line with red as unnecessary.
Sure, fixed.
>> diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
>> new file mode 100644
>> index 000000000..e412b8d53
>> --- /dev/null
>> +++ b/test/replication/gh-6034-limbo-ownership.result
> 8. Could you please write qsync tests with gh-####-qsync-...
> pattern? It is useful when you want to run all qsync tests
> and can do `python test-run.py qsync` then.
Yep.
>> @@ -0,0 +1,189 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +fiber = require('fiber')
> 9. Fiber is not used in the test.
Removed, thanks!
>> + | ---
>> + | ...
>> +
> <...>
>
>> +assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
> 10. There is test_run:get_server_id().
Hm, didn't know that, thanks!
>> + | ---
>> + | - true
>> + | ...
>> +box.space.async:insert{2} -- failure.
>> + | ---
>> + | - error: Can't modify data because this instance is in read-only mode.
>> + | ...
>> +
>> +-- Promotion on the other node. Default should become ro.
>> +box.ctl.promote()
>> + | ---
>> + | ...
>> +assert(not box.info.ro)
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.synchro.queue.owner == box.info.id)
>> + | ---
>> + | - true
>> + | ...
>> +box.space.sync:insert{2} -- success.
>> + | ---
>> + | - [2]
>> + | ...
>> +
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.ro)
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
> 11. Ditto.
Fixed.
--
Serge Petrenko
More information about the Tarantool-patches
mailing list