[Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo

Tarantool development patches archive
 help / color / mirror / Atom feed

* [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
@ 2020-11-24 13:18 Serge Petrenko
  2020-11-24 14:14 ` Cyrill Gorcunov
  2020-11-26 21:01 ` Vladislav Shpilevoy
  0 siblings, 2 replies; 7+ messages in thread
From: Serge Petrenko @ 2020-11-24 13:18 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

When running a cluster with leader election, its useful to wait till the
instance is writeable to determine that it has become a leader. However,
sometimes the instance fails to write data right after transitioning to
leader because its limbo still contains pending transactions from the
old leader. Make sure the instance deals with pending transactions first
and becomes writeable only once the limbo is empty.

Closes #5440
---
https://github.com/tarantool/tarantool/issues/5440
sp/gh-5440-on-state-update

 src/box/raft.c                                  | 17 ++++++++++-------
 test/replication/election_qsync.result          | 10 ++++++++--
 test/replication/election_qsync.test.lua        |  6 ++++--
 test/replication/election_qsync_stress.result   |  4 ++++
 test/replication/election_qsync_stress.test.lua |  2 ++
 5 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/src/box/raft.c b/src/box/raft.c
index 6d98f5645..fa2396b4f 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -89,9 +89,12 @@ box_raft_update_synchro_queue(struct raft *raft)
 	 * If the node became a leader, it means it will ignore all records from
 	 * all the other nodes, and won't get late CONFIRM messages anyway. Can
 	 * clear the queue without waiting for confirmations.
+	 * Once the queue is empty it's safe to become writeable.
 	 */
-	if (raft->state == RAFT_STATE_LEADER)
+	if (raft->state == RAFT_STATE_LEADER) {
 		box_clear_synchro_queue(false);
+		box_update_ro_summary();
+	}
 }
 
 static int
@@ -155,14 +158,14 @@ box_raft_on_update_f(struct trigger *trigger, void *event)
 	struct raft *raft = (struct raft *)event;
 	assert(raft == box_raft());
 	/*
-	 * XXX: in case the instance became a leader, RO must be updated only
-	 * after clearing the synchro queue.
-	 *
-	 * When the instance became a follower, then on the contrary - make it
-	 * read-only ASAP, this is good.
+	 * When the instance becomes a follower, it's good to make it read-only
+	 * ASAP. This way we make sure followers don't write anything.
+	 * However, if the instance is transitioning to leader make it writeable
+	 * only after it clears its synchro queue. It won't be able to process
+	 * requests earlier, anyway.
 	 */
-	box_update_ro_summary();
 	if (raft->state != RAFT_STATE_LEADER)
+		box_update_ro_summary();
 		return 0;
 	/*
 	 * If the node became a leader, time to clear the synchro queue. But it
diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index cb349efcc..c06400b38 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -75,7 +75,10 @@ box.cfg{
  | ---
  | ...
 
-test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.ctl.wait_rw()
+ | ---
+ | ...
+assert(box.info.election.state == 'leader')
  | ---
  | - true
  | ...
@@ -130,7 +133,10 @@ box.cfg{
  | ---
  | ...
 
-test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.ctl.wait_rw()
+ | ---
+ | ...
+assert(box.info.election.state == 'leader')
  | ---
  | - true
  | ...
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index eb89e5b79..ea6fc4a61 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -39,7 +39,8 @@ box.cfg{
     replication_timeout = 0.1,                                                  \
 }
 
-test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.ctl.wait_rw()
+assert(box.info.election.state == 'leader')
 lsn = box.info.lsn
 _ = fiber.create(function()                                                     \
     ok, err = pcall(box.space.test.replace, box.space.test, {1})                \
@@ -69,7 +70,8 @@ box.cfg{
     replication_timeout = 0.01,                                                 \
 }
 
-test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.ctl.wait_rw()
+assert(box.info.election.state == 'leader')
 _ = box.space.test:replace{2}
 box.space.test:select{}
 box.space.test:drop()
diff --git a/test/replication/election_qsync_stress.result b/test/replication/election_qsync_stress.result
index 9fab2f1d7..1380c910b 100644
--- a/test/replication/election_qsync_stress.result
+++ b/test/replication/election_qsync_stress.result
@@ -69,6 +69,9 @@ leader_port = test_run:eval(old_leader, 'box.cfg.listen')[1]
 c = netbox.connect(leader_port)
  | ---
  | ...
+c:eval('box.ctl.wait_rw()')
+ | ---
+ | ...
 
 _ = c:eval('box.schema.space.create("test", {is_sync=true})')
  | ---
@@ -94,6 +97,7 @@ for i = 1,10 do
     leader_port = test_run:eval(new_leader, 'box.cfg.listen')[1]
     c = netbox.connect(leader_port)
     c:eval('box.cfg{replication_synchro_timeout=1000}')
+    c:eval('box.ctl.wait_rw()')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
     test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
diff --git a/test/replication/election_qsync_stress.test.lua b/test/replication/election_qsync_stress.test.lua
index 0ba15eef7..d70601cc5 100644
--- a/test/replication/election_qsync_stress.test.lua
+++ b/test/replication/election_qsync_stress.test.lua
@@ -40,6 +40,7 @@ old_leader_nr = get_leader(nrs)
 old_leader = 'election_replica'..old_leader_nr
 leader_port = test_run:eval(old_leader, 'box.cfg.listen')[1]
 c = netbox.connect(leader_port)
+c:eval('box.ctl.wait_rw()')
 
 _ = c:eval('box.schema.space.create("test", {is_sync=true})')
 _ = c:eval('box.space.test:create_index("pk")')
@@ -58,6 +59,7 @@ for i = 1,10 do
     leader_port = test_run:eval(new_leader, 'box.cfg.listen')[1]
     c = netbox.connect(leader_port)
     c:eval('box.cfg{replication_synchro_timeout=1000}')
+    c:eval('box.ctl.wait_rw()')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
     test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-24 13:18 [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo Serge Petrenko
@ 2020-11-24 14:14 ` Cyrill Gorcunov
  2020-11-25  8:48   ` Serge Petrenko
  2020-11-26 21:01 ` Vladislav Shpilevoy
  1 sibling, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-11-24 14:14 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

On Tue, Nov 24, 2020 at 04:18:20PM +0300, Serge Petrenko wrote:
...
>  	if (raft->state != RAFT_STATE_LEADER)
> +		box_update_ro_summary();
>  		return 0;

Either indent is broken or curled braces are missed?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-24 14:14 ` Cyrill Gorcunov
@ 2020-11-25  8:48   ` Serge Petrenko
  0 siblings, 0 replies; 7+ messages in thread
From: Serge Petrenko @ 2020-11-25  8:48 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tarantool-patches, v.shpilevoy


24.11.2020 17:14, Cyrill Gorcunov пишет:
> On Tue, Nov 24, 2020 at 04:18:20PM +0300, Serge Petrenko wrote:
> ...
>>   	if (raft->state != RAFT_STATE_LEADER)
>> +		box_update_ro_summary();
>>   		return 0;
> Either indent is broken or curled braces are missed?

Yep, thanks!

Fixed on the branch.

diff --git a/src/box/raft.c b/src/box/raft.c
index fa2396b4f..aa84b82a6 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -164,9 +164,10 @@ box_raft_on_update_f(struct trigger *trigger, void 
*event)
          * only after it clears its synchro queue. It won't be able to 
process
          * requests earlier, anyway.
          */
-       if (raft->state != RAFT_STATE_LEADER)
+       if (raft->state != RAFT_STATE_LEADER) {
                 box_update_ro_summary();
                 return 0;
+       }
         /*
          * If the node became a leader, time to clear the synchro 
queue. But it
          * must be done in the worker fiber so as not to block the state

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-24 13:18 [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo Serge Petrenko
  2020-11-24 14:14 ` Cyrill Gorcunov
@ 2020-11-26 21:01 ` Vladislav Shpilevoy
  2020-11-27 13:00   ` Serge Petrenko
  1 sibling, 1 reply; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-26 21:01 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Hi! Thanks for the patch!

On 24.11.2020 14:18, Serge Petrenko wrote:
> When running a cluster with leader election, its useful to wait till the
> instance is writeable to determine that it has become a leader. However,
> sometimes the instance fails to write data right after transitioning to
> leader because its limbo still contains pending transactions from the
> old leader. Make sure the instance deals with pending transactions first
> and becomes writeable only once the limbo is empty.

I just realized one thing. We can add a function txn_limbo_is_ro(),
like we did with raft_is_ro(), account it in box_update_ro_summary(),
and call box_update_ro_summary() when we see that the limbo is emptied,
or when its ownership changes to a different instance.

Probably would be simpler, and also we could make it work with manual
election! So users could call box.ctl.wait_rw() even without using raft!

To show concrete error if somebody still tries to write, we could
patch box_check_writable() to show the reason why the instance is not
writable. We will do it anyway for raft, to tell the users the real
leader in case they are trying to write on a replica. In scope of
https://github.com/tarantool/tarantool/issues/5568.

Your version of the patch also looks good.

What do you think?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-26 21:01 ` Vladislav Shpilevoy
@ 2020-11-27 13:00   ` Serge Petrenko
  2020-11-27 22:10     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 7+ messages in thread
From: Serge Petrenko @ 2020-11-27 13:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


27.11.2020 00:01, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!
>
> On 24.11.2020 14:18, Serge Petrenko wrote:
>> When running a cluster with leader election, its useful to wait till the
>> instance is writeable to determine that it has become a leader. However,
>> sometimes the instance fails to write data right after transitioning to
>> leader because its limbo still contains pending transactions from the
>> old leader. Make sure the instance deals with pending transactions first
>> and becomes writeable only once the limbo is empty.
> I just realized one thing. We can add a function txn_limbo_is_ro(),
> like we did with raft_is_ro(), account it in box_update_ro_summary(),
> and call box_update_ro_summary() when we see that the limbo is emptied,
> or when its ownership changes to a different instance.
>
> Probably would be simpler, and also we could make it work with manual
> election! So users could call box.ctl.wait_rw() even without using raft!
>
> To show concrete error if somebody still tries to write, we could
> patch box_check_writable() to show the reason why the instance is not
> writable. We will do it anyway for raft, to tell the users the real
> leader in case they are trying to write on a replica. In scope of
> https://github.com/tarantool/tarantool/issues/5568.
>
> Your version of the patch also looks good.
>
> What do you think?

Thanks for your answer!

Your proposal looks good. One question though. What about multimaster
synchro? Are we planning to support it one day? If yes, then limbo
emptiness will mean nothing.

So, there're two options:

1) we may leave this patch as is. Then one won't be
    able to call wait_rw() with manual election. That's a pity, since
    your proposal looks quite logical, especially from the user's point
    of view. Having a single error for all these cases would be good.

2) Make limbo affect is_ro. Then everything's good for now, but we'll
    have to rewrite it back once (and if) we decide to implement
    multimaster synchro. Then the patch will look exactly like it does
    now.

I'm not sure whether we're planning to make multimaster synchro work,
so I can't choose between these options and leave you to decide.

Besides, looks like if we take option 2 the patch will differ in a single
line: `box_update_ro_summary()` after `box_clear_synchro_queue`

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-27 13:00   ` Serge Petrenko
@ 2020-11-27 22:10     ` Vladislav Shpilevoy
  2020-11-30  9:40       ` Serge Petrenko
  0 siblings, 1 reply; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-27 22:10 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

>> On 24.11.2020 14:18, Serge Petrenko wrote:
>>> When running a cluster with leader election, its useful to wait till the
>>> instance is writeable to determine that it has become a leader. However,
>>> sometimes the instance fails to write data right after transitioning to
>>> leader because its limbo still contains pending transactions from the
>>> old leader. Make sure the instance deals with pending transactions first
>>> and becomes writeable only once the limbo is empty.
>> I just realized one thing. We can add a function txn_limbo_is_ro(),
>> like we did with raft_is_ro(), account it in box_update_ro_summary(),
>> and call box_update_ro_summary() when we see that the limbo is emptied,
>> or when its ownership changes to a different instance.
>>
>> Probably would be simpler, and also we could make it work with manual
>> election! So users could call box.ctl.wait_rw() even without using raft!
>>
>> To show concrete error if somebody still tries to write, we could
>> patch box_check_writable() to show the reason why the instance is not
>> writable. We will do it anyway for raft, to tell the users the real
>> leader in case they are trying to write on a replica. In scope of
>> https://github.com/tarantool/tarantool/issues/5568.
>>
>> Your version of the patch also looks good.
>>
>> What do you think?
> 
> Thanks for your answer!
> 
> Your proposal looks good. One question though. What about multimaster
> synchro? Are we planning to support it one day? If yes, then limbo
> emptiness will mean nothing.

Indeed. But I have no idea if we will ever support it, and if
yes - when. It is possible in theory, but we never tried to
elaborate so far. I wouldn't expect it happening in the next
year or so.

> So, there're two options:
> 
> 1) we may leave this patch as is. Then one won't be
>    able to call wait_rw() with manual election. That's a pity, since
>    your proposal looks quite logical, especially from the user's point
>    of view. Having a single error for all these cases would be good.

It also bothers me, that now box.ctl.wait_rw() actually does not wait
for rw, strictly speaking. So it is probably even a bug, not a feature.
A user can get wait_rw() true, but still won't be able to write.
Value of such helper becomes zero with synchronous replication, in
essence.

> 2) Make limbo affect is_ro. Then everything's good for now, but we'll
>    have to rewrite it back once (and if) we decide to implement
>    multimaster synchro. Then the patch will look exactly like it does
>    now.

Perhaps. If nothing will change in raft and limbo significantly.

> I'm not sure whether we're planning to make multimaster synchro work,
> so I can't choose between these options and leave you to decide.

We discuss it, but there is no really a plan. No customer request, and
no ticket AFAIR. I assume some big customer must ask for this for us to
even start planning.

> Besides, looks like if we take option 2 the patch will differ in a single
> line: `box_update_ro_summary()` after `box_clear_synchro_queue`

Why? You will need to patch the limbo as well, not just remove
box_update_ro_summary from after box_clear_synchro_queue. We will
need to add a method like txn_limbo_is_ro() (similar to raft_is_ro()),
use it in box_update_ro_summary(), and call the latter from some
places in the limbo.

I vote for the option 2. Mostly because I am afraid it is rather a bug.
wait_rw just does not guarantee anything now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo
  2020-11-27 22:10     ` Vladislav Shpilevoy
@ 2020-11-30  9:40       ` Serge Petrenko
  0 siblings, 0 replies; 7+ messages in thread
From: Serge Petrenko @ 2020-11-30  9:40 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


28.11.2020 01:10, Vladislav Shpilevoy пишет:
>>> On 24.11.2020 14:18, Serge Petrenko wrote:
>>>> When running a cluster with leader election, its useful to wait till the
>>>> instance is writeable to determine that it has become a leader. However,
>>>> sometimes the instance fails to write data right after transitioning to
>>>> leader because its limbo still contains pending transactions from the
>>>> old leader. Make sure the instance deals with pending transactions first
>>>> and becomes writeable only once the limbo is empty.
>>> I just realized one thing. We can add a function txn_limbo_is_ro(),
>>> like we did with raft_is_ro(), account it in box_update_ro_summary(),
>>> and call box_update_ro_summary() when we see that the limbo is emptied,
>>> or when its ownership changes to a different instance.
>>>
>>> Probably would be simpler, and also we could make it work with manual
>>> election! So users could call box.ctl.wait_rw() even without using raft!
>>>
>>> To show concrete error if somebody still tries to write, we could
>>> patch box_check_writable() to show the reason why the instance is not
>>> writable. We will do it anyway for raft, to tell the users the real
>>> leader in case they are trying to write on a replica. In scope of
>>> https://github.com/tarantool/tarantool/issues/5568.
>>>
>>> Your version of the patch also looks good.
>>>
>>> What do you think?
>> Thanks for your answer!
>>
>> Your proposal looks good. One question though. What about multimaster
>> synchro? Are we planning to support it one day? If yes, then limbo
>> emptiness will mean nothing.
> Indeed. But I have no idea if we will ever support it, and if
> yes - when. It is possible in theory, but we never tried to
> elaborate so far. I wouldn't expect it happening in the next
> year or so.
Ok, I see.
>> So, there're two options:
>>
>> 1) we may leave this patch as is. Then one won't be
>>     able to call wait_rw() with manual election. That's a pity, since
>>     your proposal looks quite logical, especially from the user's point
>>     of view. Having a single error for all these cases would be good.
> It also bothers me, that now box.ctl.wait_rw() actually does not wait
> for rw, strictly speaking. So it is probably even a bug, not a feature.
> A user can get wait_rw() true, but still won't be able to write.
> Value of such helper becomes zero with synchronous replication, in
> essence.


Ok.


>
>> 2) Make limbo affect is_ro. Then everything's good for now, but we'll
>>     have to rewrite it back once (and if) we decide to implement
>>     multimaster synchro. Then the patch will look exactly like it does
>>     now.
> Perhaps. If nothing will change in raft and limbo significantly.
>
>> I'm not sure whether we're planning to make multimaster synchro work,
>> so I can't choose between these options and leave you to decide.
> We discuss it, but there is no really a plan. No customer request, and
> no ticket AFAIR. I assume some big customer must ask for this for us to
> even start planning.
>
>> Besides, looks like if we take option 2 the patch will differ in a single
>> line: `box_update_ro_summary()` after `box_clear_synchro_queue`
> Why? You will need to patch the limbo as well, not just remove
> box_update_ro_summary from after box_clear_synchro_queue. We will
> need to add a method like txn_limbo_is_ro() (similar to raft_is_ro()),
> use it in box_update_ro_summary(), and call the latter from some
> places in the limbo.


Never mind, I don't know what I was thinking at the time of writing that.


>
> I vote for the option 2. Mostly because I am afraid it is rather a bug.
> wait_rw just does not guarantee anything now.


Done. Please see "[PATCH v2] box: make instace ro while limbo is not empty"

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-11-30  9:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-24 13:18 [Tarantool-patches] [PATCH] raft: make sure the leader stays ro till it clears the limbo Serge Petrenko
2020-11-24 14:14 ` Cyrill Gorcunov
2020-11-25  8:48   ` Serge Petrenko
2020-11-26 21:01 ` Vladislav Shpilevoy
2020-11-27 13:00   ` Serge Petrenko
2020-11-27 22:10     ` Vladislav Shpilevoy
2020-11-30  9:40       ` Serge Petrenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox