Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery
@ 2020-06-19 18:00 Serge Petrenko
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
  0 siblings, 2 replies; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
  To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

Branch:
gh-4842-sync-replication

Serge Petrenko (2):
  box: rework local_recovery to use async txn_commit
  replication: support ROLLBACK and CONFIRM during recovery

 src/box/box.cc      | 60 +++++++++++++++++++++++++++++++++++++++++----
 src/box/txn_limbo.c |  6 ++---
 2 files changed, 57 insertions(+), 9 deletions(-)

-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko
@ 2020-06-19 18:00 ` Serge Petrenko
  2020-06-19 18:45   ` Serge Petrenko
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
  1 sibling, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
  To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

Local recovery should use asynchronous txn commit procedure in order to
get to CONFIRM and ROLLBACK statements for a transaction that needs
confirmation before confirmation timeout happens.
Using async txn commit doesn't harm other transactions, since the
journal used during local recovery fakes writes and its write_async()
method may reuse plain write().

Follow-up #4847
Follow-up #4848
---
 src/box/box.cc | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 8ba7ffafb..f80d6f8e6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -118,6 +118,8 @@ static struct gc_checkpoint_ref backup_gc;
 static bool is_box_configured = false;
 static bool is_ro = true;
 static fiber_cond ro_cond;
+/** Set to true during recovery from local files. */
+static bool is_local_recovery = false;
 
 /**
  * The following flag is set if the instance failed to
@@ -206,7 +208,24 @@ box_process_rw(struct request *request, struct space *space,
 		goto rollback;
 
 	if (is_autocommit) {
-		if (txn_commit(txn) != 0)
+		int res = 0;
+		/*
+		 * During local recovery the commit procedure
+		 * should be async, otherwise the only fiber
+		 * processing recovery will get stuck on the first
+		 * synchronous tx it meets until confirm timeout
+		 * is reached and the tx is rolled back, yielding
+		 * an error.
+		 * Moreover, txn_commit_async() doesn't hurt at
+		 * all during local recovery, since journal_write
+		 * is faked at this stage and returns immediately.
+		 */
+		if (is_local_recovery) {
+			res = txn_commit_async(txn);
+		} else {
+			res = txn_commit(txn);
+		}
+		if (res < 0)
 			goto error;
 	        fiber_gc();
 	}
@@ -327,12 +346,25 @@ recovery_journal_write(struct journal *base,
 	return 0;
 }
 
+static int
+recovery_journal_write_async(struct journal *base,
+			     struct  journal_entry *entry)
+{
+	recovery_journal_write(base, entry);
+	/*
+	 * Since there're no actual writes, fire a
+	 * journal_async_complete callback right away.
+	 */
+	journal_async_complete(base, entry);
+	return 0;
+}
+
 static void
 recovery_journal_create(struct vclock *v)
 {
 	static struct recovery_journal journal;
-	journal_create(&journal.base, journal_no_write_async,
-		       journal_no_write_async_cb,
+	journal_create(&journal.base, recovery_journal_write_async,
+		       txn_complete_async,
 		       recovery_journal_write, NULL);
 	journal.vclock = v;
 	journal_set(&journal.base);
@@ -2315,6 +2347,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	memtx = (struct memtx_engine *)engine_by_name("memtx");
 	assert(memtx != NULL);
 
+	is_local_recovery = true;
 	recovery_journal_create(&recovery->vclock);
 
 	/*
@@ -2356,6 +2389,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 		box_sync_replication(false);
 	}
 	recovery_finalize(recovery);
+	is_local_recovery = false;
 
 	/*
 	 * We must enable WAL before finalizing engine recovery,
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery
  2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
@ 2020-06-19 18:00 ` Serge Petrenko
  2020-06-23 11:50   ` Serge Petrenko
  1 sibling, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
  To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

Follow-up #4847
Follow-up #4848
---
 src/box/box.cc      | 20 ++++++++++++++++++--
 src/box/txn_limbo.c |  6 ++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index f80d6f8e6..f4c22b340 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -374,9 +374,25 @@ static void
 apply_wal_row(struct xstream *stream, struct xrow_header *row)
 {
 	struct request request;
-	// TODO: process confirmation during recovery.
-	if (iproto_type_is_synchro_request(row->type))
+	if (iproto_type_is_synchro_request(row->type)) {
+		uint32_t replica_id;
+		int64_t lsn;
+		switch(row->type) {
+		case IPROTO_CONFIRM:
+			if (xrow_decode_confirm(row, &replica_id, &lsn) < 0)
+				diag_raise();
+			assert(txn_limbo.instance_id == replica_id);
+			txn_limbo_read_confirm(&txn_limbo, lsn);
+			break;
+		case IPROTO_ROLLBACK:
+			if (xrow_decode_rollback(row, &replica_id, &lsn) < 0)
+				diag_raise();
+			assert(txn_limbo.instance_id == replica_id);
+			txn_limbo_read_rollback(&txn_limbo, lsn);
+			break;
+		}
 		return;
+	}
 	xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
 	if (request.type != IPROTO_NOP) {
 		struct space *space = space_cache_find_xc(request.space_id);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 1e1fa1aaf..ed19d4ec5 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -228,8 +228,7 @@ txn_limbo_write_confirm(struct txn_limbo *limbo,
 void
 txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 {
-	assert(limbo->instance_id != REPLICA_ID_NIL &&
-	       limbo->instance_id != instance_id);
+	assert(limbo->instance_id != REPLICA_ID_NIL);
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
 		if (e->lsn > lsn)
@@ -262,8 +261,7 @@ txn_limbo_write_rollback(struct txn_limbo *limbo,
 void
 txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 {
-	assert(limbo->instance_id != REPLICA_ID_NIL &&
-	       limbo->instance_id != instance_id);
+	assert(limbo->instance_id != REPLICA_ID_NIL);
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe_reverse(e, &limbo->queue, in_queue, tmp) {
 		if (e->lsn <= lsn)
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
@ 2020-06-19 18:45   ` Serge Petrenko
  2020-06-21 16:25     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:45 UTC (permalink / raw)
  To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches


19.06.2020 21:00, Serge Petrenko пишет:
> Local recovery should use asynchronous txn commit procedure in order to
> get to CONFIRM and ROLLBACK statements for a transaction that needs
> confirmation before confirmation timeout happens.
> Using async txn commit doesn't harm other transactions, since the
> journal used during local recovery fakes writes and its write_async()
> method may reuse plain write().


Added a new commit. I didn't squash it, because its debatable.

commit 83760f8b0d1a59e17ab6adeeda60efc1bc6e94ad
Author: Serge Petrenko <sergepetrenko@tarantool.org>
Date:   Fri Jun 19 21:42:35 2020 +0300

     box: fix for 'box: rework local_recovery to use async txn_commit'

     [TO BE SQUASHED INTO THE PREVIOUS COMMIT]

diff --git a/src/box/box.cc b/src/box/box.cc
index f80d6f8e6..0fe7625fb 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct 
space *space,
                  */
                 if (is_local_recovery) {
                         res = txn_commit_async(txn);
+                       /*
+                        * Hack: remove the unnecessary trigger.
+                        * I don't know of a better place to do
+                        * it.
+                        */
+ trigger_clear(&txn->on_write_failure);
                 } else {
                         res = txn_commit(txn);
                 }
diff --git a/src/box/txn.c b/src/box/txn.c
index 2360ecae3..d36c11dda 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -688,9 +688,9 @@ txn_commit_async(struct txn *txn)
          * Set a trigger to abort waiting for confirm on WAL write
          * failure.
          */
-       trigger_create(&txn->on_write_failure, txn_limbo_on_rollback,
-                      limbo_entry, NULL);
         if (is_sync) {
+ trigger_create(&txn->on_write_failure, txn_limbo_on_rollback,
+                              limbo_entry, NULL);
                 txn_on_rollback(txn, &txn->on_write_failure);
         }
         return 0;

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-19 18:45   ` Serge Petrenko
@ 2020-06-21 16:25     ` Vladislav Shpilevoy
  2020-06-22 11:19       ` Serge Petrenko
  0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-21 16:25 UTC (permalink / raw)
  To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

Hi! Thanks for the patch!

> diff --git a/src/box/box.cc b/src/box/box.cc
> index f80d6f8e6..0fe7625fb 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
>                  */
>                 if (is_local_recovery) {
>                         res = txn_commit_async(txn);
> +                       /*
> +                        * Hack: remove the unnecessary trigger.
> +                        * I don't know of a better place to do
> +                        * it.
> +                        */

Why is it necessary to remove it?

> + trigger_clear(&txn->on_write_failure);
>                 } else {
>                         res = txn_commit(txn);
>                 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-21 16:25     ` Vladislav Shpilevoy
@ 2020-06-22 11:19       ` Serge Petrenko
  2020-06-22 18:55         ` Serge Petrenko
  0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-22 11:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches


21.06.2020 19:25, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index f80d6f8e6..0fe7625fb 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
>>                   */
>>                  if (is_local_recovery) {
>>                          res = txn_commit_async(txn);
>> +                       /*
>> +                        * Hack: remove the unnecessary trigger.
>> +                        * I don't know of a better place to do
>> +                        * it.
>> +                        */
> Why is it necessary to remove it?

The trigger is set after journal_write_async() returns.
The idea is that journal_write_async() just submits the request for
writing, and later, journal_async_complete(), which is txn_complete_async()
completes the tx processing. The key word here is "later", meaning that
txn_complete_async() is called after txn_commit async() returns.

This is why txn_complete_async() clears the trigger set on write failure:
if write fails, we rollback the tx and remove the entry from the limbo.
If writing doesn't fail, it's limbo's business to remove an entry. The 
tx still
may be rolled back due to a timeout, then limbo will remove the entry, 
and if
an on_rollback trigger isn't cleared, it'll try to remove the entry again.

Now, back to this case: I reworked recovery_journal to use write_async().
recovery_journal_write_async() calls journal_async_complete() right away,
since there're no actual writes and no one else can call 
journal_async_complete().

So, once journal_write_async() returns, the trigger is cleared, but 
later it's
reset at the end of txn_commit_async(), because txn_commit_async() 
assumes that
the write hasn't happened yet, and journal_async_complete() is yet to be 
called.

One option is to call journal_async_complete() after txn_commit_async(), 
this will
solve the problem, but journal_async_complete() receives `struct 
journal_entry *`,
and we have no knowledge of the journal entry outside of 
`txn_commit_async()`.

Another option is to set a txn_flag, like 
"TXN_DONT_EXPECT_WRITE_FAILURE", so that
the txn doesn't set an on_write_failure trigger when we don't need it. 
But this
also looks ugly.

That's why I unconditionally reset the trigger after txn_commit_async().

>
>> + trigger_clear(&txn->on_write_failure);
>>                  } else {
>>                          res = txn_commit(txn);
>>                  }

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-22 11:19       ` Serge Petrenko
@ 2020-06-22 18:55         ` Serge Petrenko
  2020-06-22 21:43           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-22 18:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches


22.06.2020 14:19, Serge Petrenko пишет:
>
> 21.06.2020 19:25, Vladislav Shpilevoy пишет:
>> Hi! Thanks for the patch!
>>
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index f80d6f8e6..0fe7625fb 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct 
>>> space *space,
>>>                   */
>>>                  if (is_local_recovery) {
>>>                          res = txn_commit_async(txn);
>>> +                       /*
>>> +                        * Hack: remove the unnecessary trigger.
>>> +                        * I don't know of a better place to do
>>> +                        * it.
>>> +                        */
>> Why is it necessary to remove it?
>
> The trigger is set after journal_write_async() returns.
> The idea is that journal_write_async() just submits the request for
> writing, and later, journal_async_complete(), which is 
> txn_complete_async()
> completes the tx processing. The key word here is "later", meaning that
> txn_complete_async() is called after txn_commit async() returns.
>
> This is why txn_complete_async() clears the trigger set on write failure:
> if write fails, we rollback the tx and remove the entry from the limbo.
> If writing doesn't fail, it's limbo's business to remove an entry. The 
> tx still
> may be rolled back due to a timeout, then limbo will remove the entry, 
> and if
> an on_rollback trigger isn't cleared, it'll try to remove the entry 
> again.
>
> Now, back to this case: I reworked recovery_journal to use write_async().
> recovery_journal_write_async() calls journal_async_complete() right away,
> since there're no actual writes and no one else can call 
> journal_async_complete().
>
> So, once journal_write_async() returns, the trigger is cleared, but 
> later it's
> reset at the end of txn_commit_async(), because txn_commit_async() 
> assumes that
> the write hasn't happened yet, and journal_async_complete() is yet to 
> be called.
>
> One option is to call journal_async_complete() after 
> txn_commit_async(), this will
> solve the problem, but journal_async_complete() receives `struct 
> journal_entry *`,
> and we have no knowledge of the journal entry outside of 
> `txn_commit_async()`.
>
> Another option is to set a txn_flag, like 
> "TXN_DONT_EXPECT_WRITE_FAILURE", so that
> the txn doesn't set an on_write_failure trigger when we don't need it. 
> But this
> also looks ugly.
>
> That's why I unconditionally reset the trigger after txn_commit_async().

After further discussion, we decided to use different values of 
txn->signature < 0
as reasons for rollbackand instead of resetting the on_write_failure 
triggers,
we'll just make them work onlyfor some rollback reasons (like failed write).

>
>>
>>> + trigger_clear(&txn->on_write_failure);
>>>                  } else {
>>>                          res = txn_commit(txn);
>>>                  }
>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-22 18:55         ` Serge Petrenko
@ 2020-06-22 21:43           ` Vladislav Shpilevoy
  2020-06-23  8:39             ` Serge Petrenko
  0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-22 21:43 UTC (permalink / raw)
  To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

I am looking at the commit which added on_write_failure.
Why can't we reuse on_rollback trigger? As we discussed,
we can pass whatever we want using txn->signature. So
for manual rollback we can set it to -1, for write failure
to -2, for limbo rollback to -3, for timeout to -4, etc.

I am just afraid that +16 bytes for transaction object
may affect performance of the async transactions.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-22 21:43           ` Vladislav Shpilevoy
@ 2020-06-23  8:39             ` Serge Petrenko
  2020-06-26 22:00               ` Vladislav Shpilevoy
  0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-23  8:39 UTC (permalink / raw)
  To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches


23.06.2020 00:43, Vladislav Shpilevoy пишет:
> I am looking at the commit which added on_write_failure.
> Why can't we reuse on_rollback trigger? As we discussed,
> we can pass whatever we want using txn->signature. So
> for manual rollback we can set it to -1, for write failure
> to -2, for limbo rollback to -3, for timeout to -4, etc.
>
> I am just afraid that +16 bytes for transaction object
> may affect performance of the async transactions.

We can. I've just done it & dropped the additionnal commit

(the one that hacked the trigger removal in recovery.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery
  2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
@ 2020-06-23 11:50   ` Serge Petrenko
  0 siblings, 0 replies; 11+ messages in thread
From: Serge Petrenko @ 2020-06-23 11:50 UTC (permalink / raw)
  To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches


19.06.2020 21:00, Serge Petrenko пишет:
> Follow-up #4847
> Follow-up #4848

Added a small test for CONFIRM/ROLLBACK:

commit 5b91fbd3996c4ca43f5c96a1767e5b7ec7cea86d
Author: Serge Petrenko <sergepetrenko@tarantool.org>
Date:   Tue Jun 23 14:20:45 2020 +0300

     replication: add test for synchro CONFIRM/ROLLBACK

     [TO BE SQUASHED INTO THE PREVIOUS COMMIT]

diff --git a/test/replication/sync_replication_sanity.result 
b/test/replication/sync_replication_sanity.result
index 551df7daf..4b9823d77 100644
--- a/test/replication/sync_replication_sanity.result
+++ b/test/replication/sync_replication_sanity.result
@@ -69,3 +69,135 @@ box.schema.create_space('test', {is_sync = true, 
is_local = true})
   | ---
   | - error: 'Failed to create space ''test'': local space can''t be 
synchronous'
   | ...
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+lsn = box.info.lsn
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+ | ---
+ | - 2
+ | ...
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+ | ---
+ | ...
+t = fiber.time()
+ | ---
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=2}
+ | ---
+ | ...
+box.space.sync:insert{3}
+ | ---
+ | - [3]
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+ |
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Cleanup.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+box.cfg{replication_synchro_quorum=quorum, 
replication_synchro_timeout=timeout}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/sync_replication_sanity.test.lua 
b/test/replication/sync_replication_sanity.test.lua
index fd7ed537e..8715a4600 100644
--- a/test/replication/sync_replication_sanity.test.lua
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -27,3 +27,55 @@ s2:drop()

  -- Local space can't be synchronous.
  box.schema.create_space('test', {is_sync = true, is_local = true})
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+test_run = env.new()
+fiber = require('fiber')
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+timeout = box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+
+lsn = box.info.lsn
+box.space.sync:insert{1}
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+t = fiber.time()
+box.space.sync:insert{2}
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2}
+box.space.sync:insert{3}
+box.space.sync:select{}
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+box.space.sync:select{}
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+box.space.sync:select{}
+
+-- Cleanup.
+test_run:cmd('switch default')
+
+box.cfg{replication_synchro_quorum=quorum, 
replication_synchro_timeout=timeout}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.space.sync:drop()
+box.schema.user.revoke('guest', 'replication')

--

Serge Petrenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
  2020-06-23  8:39             ` Serge Petrenko
@ 2020-06-26 22:00               ` Vladislav Shpilevoy
  0 siblings, 0 replies; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-26 22:00 UTC (permalink / raw)
  To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches

On 23/06/2020 10:39, Serge Petrenko wrote:
> 
> 23.06.2020 00:43, Vladislav Shpilevoy пишет:
>> I am looking at the commit which added on_write_failure.
>> Why can't we reuse on_rollback trigger? As we discussed,
>> we can pass whatever we want using txn->signature. So
>> for manual rollback we can set it to -1, for write failure
>> to -2, for limbo rollback to -3, for timeout to -4, etc.
>>
>> I am just afraid that +16 bytes for transaction object
>> may affect performance of the async transactions.
> 
> We can. I've just done it & dropped the additionnal commit
> 
> (the one that hacked the trigger removal in recovery.

We also can drop TXN_IS_ABORTED_BY_YIELD and reuse the signature
for this type of error too.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-06-26 22:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
2020-06-19 18:45   ` Serge Petrenko
2020-06-21 16:25     ` Vladislav Shpilevoy
2020-06-22 11:19       ` Serge Petrenko
2020-06-22 18:55         ` Serge Petrenko
2020-06-22 21:43           ` Vladislav Shpilevoy
2020-06-23  8:39             ` Serge Petrenko
2020-06-26 22:00               ` Vladislav Shpilevoy
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
2020-06-23 11:50   ` Serge Petrenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox