* [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery
@ 2020-06-19 18:00 Serge Petrenko
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
0 siblings, 2 replies; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
Branch:
gh-4842-sync-replication
Serge Petrenko (2):
box: rework local_recovery to use async txn_commit
replication: support ROLLBACK and CONFIRM during recovery
src/box/box.cc | 60 +++++++++++++++++++++++++++++++++++++++++----
src/box/txn_limbo.c | 6 ++---
2 files changed, 57 insertions(+), 9 deletions(-)
--
2.24.3 (Apple Git-128)
* [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko
@ 2020-06-19 18:00 ` Serge Petrenko
2020-06-19 18:45 ` Serge Petrenko
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
1 sibling, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
Local recovery should use asynchronous txn commit procedure in order to
get to CONFIRM and ROLLBACK statements for a transaction that needs
confirmation before confirmation timeout happens.
Using async txn commit doesn't harm other transactions, since the
journal used during local recovery fakes writes and its write_async()
method may reuse plain write().
Follow-up #4847
Follow-up #4848
---
src/box/box.cc | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 8ba7ffafb..f80d6f8e6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -118,6 +118,8 @@ static struct gc_checkpoint_ref backup_gc;
static bool is_box_configured = false;
static bool is_ro = true;
static fiber_cond ro_cond;
+/** Set to true during recovery from local files. */
+static bool is_local_recovery = false;
/**
* The following flag is set if the instance failed to
@@ -206,7 +208,24 @@ box_process_rw(struct request *request, struct space *space,
goto rollback;
if (is_autocommit) {
- if (txn_commit(txn) != 0)
+ int res = 0;
+ /*
+ * During local recovery the commit procedure
+ * should be async, otherwise the only fiber
+ * processing recovery will get stuck on the first
+ * synchronous tx it meets until confirm timeout
+ * is reached and the tx is rolled back, yielding
+ * an error.
+ * Moreover, txn_commit_async() doesn't hurt at
+ * all during local recovery, since journal_write
+ * is faked at this stage and returns immediately.
+ */
+ if (is_local_recovery) {
+ res = txn_commit_async(txn);
+ } else {
+ res = txn_commit(txn);
+ }
+ if (res < 0)
goto error;
fiber_gc();
}
@@ -327,12 +346,25 @@ recovery_journal_write(struct journal *base,
return 0;
}
+static int
+recovery_journal_write_async(struct journal *base,
+ struct journal_entry *entry)
+{
+ recovery_journal_write(base, entry);
+ /*
+ * Since there're no actual writes, fire a
+ * journal_async_complete callback right away.
+ */
+ journal_async_complete(base, entry);
+ return 0;
+}
+
static void
recovery_journal_create(struct vclock *v)
{
static struct recovery_journal journal;
- journal_create(&journal.base, journal_no_write_async,
- journal_no_write_async_cb,
+ journal_create(&journal.base, recovery_journal_write_async,
+ txn_complete_async,
recovery_journal_write, NULL);
journal.vclock = v;
journal_set(&journal.base);
@@ -2315,6 +2347,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
memtx = (struct memtx_engine *)engine_by_name("memtx");
assert(memtx != NULL);
+ is_local_recovery = true;
recovery_journal_create(&recovery->vclock);
/*
@@ -2356,6 +2389,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
box_sync_replication(false);
}
recovery_finalize(recovery);
+ is_local_recovery = false;
/*
* We must enable WAL before finalizing engine recovery,
--
2.24.3 (Apple Git-128)
* [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery
2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
@ 2020-06-19 18:00 ` Serge Petrenko
2020-06-23 11:50 ` Serge Petrenko
1 sibling, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw)
To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
Follow-up #4847
Follow-up #4848
---
src/box/box.cc | 20 ++++++++++++++++++--
src/box/txn_limbo.c | 6 ++----
2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index f80d6f8e6..f4c22b340 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -374,9 +374,25 @@ static void
apply_wal_row(struct xstream *stream, struct xrow_header *row)
{
struct request request;
- // TODO: process confirmation during recovery.
- if (iproto_type_is_synchro_request(row->type))
+ if (iproto_type_is_synchro_request(row->type)) {
+ uint32_t replica_id;
+ int64_t lsn;
+ switch(row->type) {
+ case IPROTO_CONFIRM:
+ if (xrow_decode_confirm(row, &replica_id, &lsn) < 0)
+ diag_raise();
+ assert(txn_limbo.instance_id == replica_id);
+ txn_limbo_read_confirm(&txn_limbo, lsn);
+ break;
+ case IPROTO_ROLLBACK:
+ if (xrow_decode_rollback(row, &replica_id, &lsn) < 0)
+ diag_raise();
+ assert(txn_limbo.instance_id == replica_id);
+ txn_limbo_read_rollback(&txn_limbo, lsn);
+ break;
+ }
return;
+ }
xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
if (request.type != IPROTO_NOP) {
struct space *space = space_cache_find_xc(request.space_id);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 1e1fa1aaf..ed19d4ec5 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -228,8 +228,7 @@ txn_limbo_write_confirm(struct txn_limbo *limbo,
void
txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
{
- assert(limbo->instance_id != REPLICA_ID_NIL &&
- limbo->instance_id != instance_id);
+ assert(limbo->instance_id != REPLICA_ID_NIL);
struct txn_limbo_entry *e, *tmp;
rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
if (e->lsn > lsn)
@@ -262,8 +261,7 @@ txn_limbo_write_rollback(struct txn_limbo *limbo,
void
txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
{
- assert(limbo->instance_id != REPLICA_ID_NIL &&
- limbo->instance_id != instance_id);
+ assert(limbo->instance_id != REPLICA_ID_NIL);
struct txn_limbo_entry *e, *tmp;
rlist_foreach_entry_safe_reverse(e, &limbo->queue, in_queue, tmp) {
if (e->lsn <= lsn)
--
2.24.3 (Apple Git-128)
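The txn_limbo.c hunks in this patch show only the loop headers of txn_limbo_read_confirm() and txn_limbo_read_rollback(), so here is a rough standalone sketch of how the two break conditions split the queue around an LSN. It assumes the limbo queue is ordered by ascending LSN and models it as a plain array; the toy_* names are hypothetical and only count the entries each loop would visit, since the loop bodies are not part of the hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the entries a CONFIRM with the given lsn would visit.
 * Mirrors the forward walk that stops at the first e->lsn > lsn.
 * Assumption: queue[] is sorted by ascending LSN.
 */
size_t
toy_confirm_count(const int64_t *queue, size_t n, int64_t lsn)
{
	size_t visited = 0;
	for (size_t i = 0; i < n; i++) {
		if (queue[i] > lsn)
			break;
		visited++;
	}
	return visited;
}

/*
 * Count the entries a ROLLBACK with the given lsn would visit.
 * Mirrors the reverse walk that stops at the first e->lsn <= lsn.
 */
size_t
toy_rollback_count(const int64_t *queue, size_t n, int64_t lsn)
{
	size_t visited = 0;
	for (size_t i = n; i > 0; i--) {
		if (queue[i - 1] <= lsn)
			break;
		visited++;
	}
	return visited;
}
```

With these conditions the two walks never overlap: for any LSN, every queued entry is reachable by exactly one of them.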
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko
@ 2020-06-19 18:45 ` Serge Petrenko
2020-06-21 16:25 ` Vladislav Shpilevoy
0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-19 18:45 UTC (permalink / raw)
To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
19.06.2020 21:00, Serge Petrenko wrote:
> Local recovery should use asynchronous txn commit procedure in order to
> get to CONFIRM and ROLLBACK statements for a transaction that needs
> confirmation before confirmation timeout happens.
> Using async txn commit doesn't harm other transactions, since the
> journal used during local recovery fakes writes and its write_async()
> method may reuse plain write().
Added a new commit. I didn't squash it, because it's debatable.
commit 83760f8b0d1a59e17ab6adeeda60efc1bc6e94ad
Author: Serge Petrenko <sergepetrenko@tarantool.org>
Date: Fri Jun 19 21:42:35 2020 +0300
box: fix for 'box: rework local_recovery to use async txn_commit'
[TO BE SQUASHED INTO THE PREVIOUS COMMIT]
diff --git a/src/box/box.cc b/src/box/box.cc
index f80d6f8e6..0fe7625fb 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
*/
if (is_local_recovery) {
res = txn_commit_async(txn);
+ /*
+ * Hack: remove the unnecessary trigger.
+ * I don't know of a better place to do
+ * it.
+ */
+ trigger_clear(&txn->on_write_failure);
} else {
res = txn_commit(txn);
}
diff --git a/src/box/txn.c b/src/box/txn.c
index 2360ecae3..d36c11dda 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -688,9 +688,9 @@ txn_commit_async(struct txn *txn)
* Set a trigger to abort waiting for confirm on WAL write
* failure.
*/
- trigger_create(&txn->on_write_failure, txn_limbo_on_rollback,
- limbo_entry, NULL);
if (is_sync) {
+ trigger_create(&txn->on_write_failure, txn_limbo_on_rollback,
+ limbo_entry, NULL);
txn_on_rollback(txn, &txn->on_write_failure);
}
return 0;
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-19 18:45 ` Serge Petrenko
@ 2020-06-21 16:25 ` Vladislav Shpilevoy
2020-06-22 11:19 ` Serge Petrenko
0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-21 16:25 UTC (permalink / raw)
To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
Hi! Thanks for the patch!
> diff --git a/src/box/box.cc b/src/box/box.cc
> index f80d6f8e6..0fe7625fb 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
> */
> if (is_local_recovery) {
> res = txn_commit_async(txn);
> + /*
> + * Hack: remove the unnecessary trigger.
> + * I don't know of a better place to do
> + * it.
> + */
Why is it necessary to remove it?
> + trigger_clear(&txn->on_write_failure);
> } else {
> res = txn_commit(txn);
> }
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-21 16:25 ` Vladislav Shpilevoy
@ 2020-06-22 11:19 ` Serge Petrenko
2020-06-22 18:55 ` Serge Petrenko
0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-22 11:19 UTC (permalink / raw)
To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
21.06.2020 19:25, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index f80d6f8e6..0fe7625fb 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
>> */
>> if (is_local_recovery) {
>> res = txn_commit_async(txn);
>> + /*
>> + * Hack: remove the unnecessary trigger.
>> + * I don't know of a better place to do
>> + * it.
>> + */
> Why is it necessary to remove it?
The trigger is set after journal_write_async() returns.
The idea is that journal_write_async() just submits the request for
writing, and later journal_async_complete(), which is txn_complete_async(),
completes the tx processing. The key word here is "later", meaning that
txn_complete_async() is called after txn_commit_async() returns.
This is why txn_complete_async() clears the trigger set on write failure:
if the write fails, we roll back the tx and remove the entry from the limbo.
If the write doesn't fail, it's the limbo's business to remove the entry.
The tx may still be rolled back due to a timeout; then the limbo will remove
the entry, and if the on_rollback trigger isn't cleared, it'll try to remove
the entry again.
Now, back to this case: I reworked recovery_journal to use write_async().
recovery_journal_write_async() calls journal_async_complete() right away,
since there are no actual writes and no one else can call
journal_async_complete().
So, once journal_write_async() returns, the trigger is already cleared, but
it is set again at the end of txn_commit_async(), because txn_commit_async()
assumes that the write hasn't happened yet and journal_async_complete() is
yet to be called.
One option is to call journal_async_complete() after txn_commit_async().
This would solve the problem, but journal_async_complete() receives
`struct journal_entry *`, and we have no knowledge of the journal entry
outside of `txn_commit_async()`.
Another option is to set a txn_flag, like "TXN_DONT_EXPECT_WRITE_FAILURE",
so that the txn doesn't set an on_write_failure trigger when we don't need
it. But this also looks ugly.
That's why I unconditionally clear the trigger after txn_commit_async().
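A minimal standalone sketch of the ordering problem described above. The toy_* types and functions are hypothetical stand-ins, not the real Tarantool code: toy_complete_async() plays the role of txn_complete_async(), toy_recovery_write_async() the role of recovery_journal_write_async(), and a boolean replaces the on_write_failure trigger.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy transaction: a flag stands in for the on_write_failure trigger. */
struct toy_txn {
	bool write_completed;
	bool on_write_failure_armed;
};

/* Completion callback: the write is done, so the trigger is disarmed. */
static void
toy_complete_async(struct toy_txn *txn)
{
	txn->write_completed = true;
	txn->on_write_failure_armed = false;
}

/*
 * Recovery journal: there is no real write, so the completion
 * callback fires before this function returns.
 */
static int
toy_recovery_write_async(struct toy_txn *txn)
{
	toy_complete_async(txn);
	return 0;
}

/*
 * Commit path: submits the write, then arms the trigger on the
 * assumption that completion has not run yet. During recovery that
 * assumption is false, so the trigger ends up armed after completion.
 */
int
toy_commit_async(struct toy_txn *txn)
{
	if (toy_recovery_write_async(txn) != 0)
		return -1;
	txn->on_write_failure_armed = true;
	return 0;
}

/* The workaround from the fixup commit: clear the trigger by hand. */
void
toy_commit_in_recovery(struct toy_txn *txn)
{
	int rc = toy_commit_async(txn);
	assert(rc == 0);
	(void)rc;
	txn->on_write_failure_armed = false;
}
```

Calling toy_commit_async() alone leaves the flag armed although the write has completed; toy_commit_in_recovery() shows the manual clear applied in box_process_rw().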
>
>> + trigger_clear(&txn->on_write_failure);
>> } else {
>> res = txn_commit(txn);
>> }
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-22 11:19 ` Serge Petrenko
@ 2020-06-22 18:55 ` Serge Petrenko
2020-06-22 21:43 ` Vladislav Shpilevoy
0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-22 18:55 UTC (permalink / raw)
To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
22.06.2020 14:19, Serge Petrenko wrote:
>
> 21.06.2020 19:25, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index f80d6f8e6..0fe7625fb 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space,
>>> */
>>> if (is_local_recovery) {
>>> res = txn_commit_async(txn);
>>> + /*
>>> + * Hack: remove the unnecessary trigger.
>>> + * I don't know of a better place to do
>>> + * it.
>>> + */
>> Why is it necessary to remove it?
>
> The trigger is set after journal_write_async() returns.
> The idea is that journal_write_async() just submits the request for
> writing, and later journal_async_complete(), which is txn_complete_async(),
> completes the tx processing. The key word here is "later", meaning that
> txn_complete_async() is called after txn_commit_async() returns.
>
> This is why txn_complete_async() clears the trigger set on write failure:
> if the write fails, we roll back the tx and remove the entry from the limbo.
> If the write doesn't fail, it's the limbo's business to remove the entry.
> The tx may still be rolled back due to a timeout; then the limbo will remove
> the entry, and if the on_rollback trigger isn't cleared, it'll try to remove
> the entry again.
>
> Now, back to this case: I reworked recovery_journal to use write_async().
> recovery_journal_write_async() calls journal_async_complete() right away,
> since there are no actual writes and no one else can call
> journal_async_complete().
>
> So, once journal_write_async() returns, the trigger is already cleared, but
> it is set again at the end of txn_commit_async(), because txn_commit_async()
> assumes that the write hasn't happened yet and journal_async_complete() is
> yet to be called.
>
> One option is to call journal_async_complete() after txn_commit_async().
> This would solve the problem, but journal_async_complete() receives
> `struct journal_entry *`, and we have no knowledge of the journal entry
> outside of `txn_commit_async()`.
>
> Another option is to set a txn_flag, like "TXN_DONT_EXPECT_WRITE_FAILURE",
> so that the txn doesn't set an on_write_failure trigger when we don't need
> it. But this also looks ugly.
>
> That's why I unconditionally clear the trigger after txn_commit_async().
After further discussion, we decided to use different values of
txn->signature < 0 as reasons for rollback, and instead of resetting the
on_write_failure triggers, we'll just make them work only for some
rollback reasons (like a failed write).
>
>>
>>> + trigger_clear(&txn->on_write_failure);
>>> } else {
>>> res = txn_commit(txn);
>>> }
>
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-22 18:55 ` Serge Petrenko
@ 2020-06-22 21:43 ` Vladislav Shpilevoy
2020-06-23 8:39 ` Serge Petrenko
0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-22 21:43 UTC (permalink / raw)
To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
I am looking at the commit which added on_write_failure.
Why can't we reuse on_rollback trigger? As we discussed,
we can pass whatever we want using txn->signature. So
for manual rollback we can set it to -1, for write failure
to -2, for limbo rollback to -3, for timeout to -4, etc.
I am just afraid that +16 bytes for transaction object
may affect performance of the async transactions.
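A rough sketch of this idea, assuming hypothetical names throughout: the enum constants reuse the example values from this message, and toy_txn, toy_limbo_on_rollback() and toy_txn_rollback() are illustrative stand-ins, not existing Tarantool code. A single on_rollback trigger inspects the negative signature and reacts only to a failed write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rollback reasons, stored as negative signatures. */
enum toy_rollback_reason {
	TOY_ROLLBACK_MANUAL = -1,
	TOY_ROLLBACK_WRITE_FAILURE = -2,
	TOY_ROLLBACK_LIMBO = -3,
	TOY_ROLLBACK_TIMEOUT = -4,
};

struct toy_txn {
	/* A negative value means "rolled back" and encodes the reason. */
	int64_t signature;
	/* Whether the txn still has an entry in the limbo. */
	bool in_limbo;
};

/*
 * The single on_rollback trigger. It removes the limbo entry only
 * when the rollback was caused by a failed write; for every other
 * reason the limbo is assumed to clean up after itself.
 */
void
toy_limbo_on_rollback(struct toy_txn *txn)
{
	if (txn->signature != TOY_ROLLBACK_WRITE_FAILURE)
		return;
	txn->in_limbo = false;
}

/* Roll back with a reason and run the trigger. */
void
toy_txn_rollback(struct toy_txn *txn, enum toy_rollback_reason reason)
{
	assert(reason < 0);
	txn->signature = reason;
	toy_limbo_on_rollback(txn);
}
```

In this form no separate on_write_failure trigger is stored in the transaction object, which is the size concern raised here.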
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-22 21:43 ` Vladislav Shpilevoy
@ 2020-06-23 8:39 ` Serge Petrenko
2020-06-26 22:00 ` Vladislav Shpilevoy
0 siblings, 1 reply; 11+ messages in thread
From: Serge Petrenko @ 2020-06-23 8:39 UTC (permalink / raw)
To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
23.06.2020 00:43, Vladislav Shpilevoy wrote:
> I am looking at the commit which added on_write_failure.
> Why can't we reuse on_rollback trigger? As we discussed,
> we can pass whatever we want using txn->signature. So
> for manual rollback we can set it to -1, for write failure
> to -2, for limbo rollback to -3, for timeout to -4, etc.
>
> I am just afraid that +16 bytes for transaction object
> may affect performance of the async transactions.
We can. I've just done it and dropped the additional commit
(the one that hacked the trigger removal in recovery).
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery
2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko
@ 2020-06-23 11:50 ` Serge Petrenko
0 siblings, 0 replies; 11+ messages in thread
From: Serge Petrenko @ 2020-06-23 11:50 UTC (permalink / raw)
To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
19.06.2020 21:00, Serge Petrenko wrote:
> Follow-up #4847
> Follow-up #4848
Added a small test for CONFIRM/ROLLBACK:
commit 5b91fbd3996c4ca43f5c96a1767e5b7ec7cea86d
Author: Serge Petrenko <sergepetrenko@tarantool.org>
Date: Tue Jun 23 14:20:45 2020 +0300
replication: add test for synchro CONFIRM/ROLLBACK
[TO BE SQUASHED INTO THE PREVIOUS COMMIT]
diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
index 551df7daf..4b9823d77 100644
--- a/test/replication/sync_replication_sanity.result
+++ b/test/replication/sync_replication_sanity.result
@@ -69,3 +69,135 @@ box.schema.create_space('test', {is_sync = true, is_local = true})
| ---
| - error: 'Failed to create space ''test'': local space can''t be synchronous'
| ...
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+lsn = box.info.lsn
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+ | ---
+ | - 2
+ | ...
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+ | ---
+ | ...
+t = fiber.time()
+ | ---
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=2}
+ | ---
+ | ...
+box.space.sync:insert{3}
+ | ---
+ | - [3]
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | - [3]
+ | ...
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | - [3]
+ | ...
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+ |
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | - [3]
+ | ...
+
+-- Cleanup.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
index fd7ed537e..8715a4600 100644
--- a/test/replication/sync_replication_sanity.test.lua
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -27,3 +27,55 @@ s2:drop()
-- Local space can't be synchronous.
box.schema.create_space('test', {is_sync = true, is_local = true})
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+test_run = env.new()
+fiber = require('fiber')
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+timeout = box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+
+lsn = box.info.lsn
+box.space.sync:insert{1}
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+t = fiber.time()
+box.space.sync:insert{2}
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2}
+box.space.sync:insert{3}
+box.space.sync:select{}
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+box.space.sync:select{}
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+box.space.sync:select{}
+
+-- Cleanup.
+test_run:cmd('switch default')
+
+box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.space.sync:drop()
+box.schema.user.revoke('guest', 'replication')
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit
2020-06-23 8:39 ` Serge Petrenko
@ 2020-06-26 22:00 ` Vladislav Shpilevoy
0 siblings, 0 replies; 11+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-26 22:00 UTC (permalink / raw)
To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches
On 23/06/2020 10:39, Serge Petrenko wrote:
>
> 23.06.2020 00:43, Vladislav Shpilevoy wrote:
>> I am looking at the commit which added on_write_failure.
>> Why can't we reuse on_rollback trigger? As we discussed,
>> we can pass whatever we want using txn->signature. So
>> for manual rollback we can set it to -1, for write failure
>> to -2, for limbo rollback to -3, for timeout to -4, etc.
>>
>> I am just afraid that +16 bytes for transaction object
>> may affect performance of the async transactions.
>
> We can. I've just done it and dropped the additional commit
> (the one that hacked the trigger removal in recovery).
We also can drop TXN_IS_ABORTED_BY_YIELD and reuse the signature
for this type of error too.