[Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Jul 4 15:27:19 MSK 2021


Thanks for working on this!

See 11 comments below.

1. On this commit a lot of tests fail. Why?

2. On top of this commit and on top of the branch too I tried to
promote a candidate and got a strange error in the logs, although
the promotion was successful:

--
-- Instance 1
--

-- Step 1
box.cfg{
    listen = 3313,
    election_mode = 'candidate',
    replication_synchro_quorum = 2,
    replication = {'localhost:3314', 'localhost:3313'}
}
box.schema.user.grant('guest', 'super')


--
-- Instance 2
--

-- Step 2
box.cfg{
    listen = 3314,
    election_mode = 'voter',
    replication_synchro_quorum = 2,
    replication = {'localhost:3314', 'localhost:3313'},
    read_only = true,
}

-- Step 3
box.cfg{read_only = false, election_mode = 'candidate'}

-- Step 4
box.ctl.promote()

main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
---
...


3. On top of the branch I tried demote on a leader with election mode
candidate 2 times. Demote didn't do anything on the second invocation.
The test:

--
-- Instance 1
--

-- Step 1
box.cfg{
    listen = 3313,
    election_mode = 'candidate',
    replication_synchro_quorum = 2,
    replication = {'localhost:3314', 'localhost:3313'}
}
box.schema.user.grant('guest', 'super')

-- Step 3
-- I demote self but become leader again because I am a
-- candidate. This is fine.
box.ctl.demote()

-- Step 4
tarantool> box.info.election
---
- state: leader
  vote: 1
  leader: 1
  term: 3
...

-- Step 5
-- But now I try to demote again and nothing happens.
-- I expected the same behaviour - bump term and become
-- a leader again.
tarantool> box.ctl.demote()
---
...

tarantool> box.info.election
---
- state: leader
  vote: 1
  leader: 1
  term: 3 <- the term is the same, so demote didn't do anything.
...

--
-- Instance 2
--

-- Step 2
box.cfg{
    listen = 3314,
    election_mode = 'voter',
    replication_synchro_quorum = 2,
    replication = {'localhost:3314', 'localhost:3313'},
    read_only = true,
}


4. I tried switching between candidate and manual, trying promote/demote
in different combinations and got promote not doing anything. The test
is the same as above but steps >= 3 are different:

tarantool> box.ctl.demote()
2021-07-04 14:22:49.550 [588] main/103/interactive I> RAFT: bump term to 3, follow
2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: persisted state {term: 3}
2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: vote for 1, follow
2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: persisted state {term: 3, vote: 1}
2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: enter candidate state with 1 self vote
---
...

tarantool> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: enter leader state with quorum 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: vote request is skipped - the leader is already known - 1
2021-07-04 14:22:49.553 [588] relay/[::1]:54006/101/main I> recover from `./00000000000000000000.xlog'

---
...

tarantool> box.cfg{election_mode='manual'}
2021-07-04 14:22:55.391 [588] main/103/interactive I> set 'election_mode' configuration option to "manual"
---
...

tarantool> box.info.election
---
- state: follower
  vote: 1
  leader: 0
  term: 3
...

tarantool> box.ctl.promote()
---
...

tarantool> box.info.election
---
- state: follower
  vote: 1
  leader: 0
  term: 3
...

As you can see, the last promote() didn't do anything. The node
stayed a follower, and no errors were raised.


>From the comments 2-4 it seems to me the complexity of
box_clear_synchro_queue()/box_promote() is getting out of control with
the number of ifs and flags, sometimes having unexpected meaning (for
example, try_wait flag causes term bump somewhy). It might be good to
split it into separate functions and make promotion/demotion out of each
election mode linear and straight.

> @TarantoolBot document
> Title: box.ctl.demote
> 
> `box.ctl.demote()` is a new function, which works exactly like
> `box.ctl.promote()`, with one exception that it results in the instance
> writing DEMOTE request to WAL instead of a PROMOTE request.
> 
> A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
> limbo as well), but clears the synchronous transaction queue  ownership instead
> of assigning it to a new instance.

5. It is worth mentioning that demote can only be called on a master.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1d894be97..86c5967b9 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1674,15 +1680,22 @@ box_promote(void)
>  			rc = -1;
>  		} else {
>  promote:
> -			if (try_wait) {
> +			if (try_wait || demote) {
>  				raft_new_term(box_raft());
>  				if (box_raft_wait_persisted() < 0)
>  					return -1;

6. What if now some other node won the elections of the term
you just bumped? Then you wouldn't have the right to perform
DEMOTE as the term is not yours.

>  			}
>  			uint64_t term = box_raft()->term;
> -			txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
> +			if (demote) {
> +				txn_limbo_write_demote(&txn_limbo, wait_lsn,
> +						       term);
> +			} else {
> +				txn_limbo_write_promote(&txn_limbo, wait_lsn,
> +							term);
> +			}
> +			uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
>  			struct synchro_request req = {
> -				.type = IPROTO_PROMOTE,
> +				.type = type,
>  				.replica_id = former_leader_id,
>  				.origin_id = instance_id,
>  				.lsn = wait_lsn,
> diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
> index 6b9e324ed..b969df836 100644
> --- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
> +++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
> @@ -135,3 +136,5 @@ box.cfg{
>      replication_synchro_quorum = old_synchro_quorum,                            \
>      replication_synchro_timeout = old_synchro_timeout,                          \
>  }
> +box.ctl.demote()
> +

7. Git diff highlights this new line with red as unnecessary.

> diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
> new file mode 100644
> index 000000000..e412b8d53
> --- /dev/null
> +++ b/test/replication/gh-6034-limbo-ownership.result

8. Could you please write qsync tests with gh-####-qsync-...
pattern? It is useful when you want to run all qsync tests
and can do `python test-run.py qsync` then.

> @@ -0,0 +1,189 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +fiber = require('fiber')

9. Fiber is not used in the test.

> + | ---
> + | ...
> +

<...>

> +assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])

10. There is test_run:get_server_id().

> + | ---
> + | - true
> + | ...
> +box.space.async:insert{2} -- failure.
> + | ---
> + | - error: Can't modify data because this instance is in read-only mode.
> + | ...
> +
> +-- Promotion on the other node. Default should become ro.
> +box.ctl.promote()
> + | ---
> + | ...
> +assert(not box.info.ro)
> + | ---
> + | - true
> + | ...
> +assert(box.info.synchro.queue.owner == box.info.id)
> + | ---
> + | - true
> + | ...
> +box.space.sync:insert{2} -- success.
> + | ---
> + | - [2]
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +assert(box.info.ro)
> + | ---
> + | - true
> + | ...
> +assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])

11. Ditto.


More information about the Tarantool-patches mailing list