* [Tarantool-patches] [PATCH v12 0/8] replication: prevent nil dereference on applier rollback
@ 2020-04-07 15:14 Cyrill Gorcunov
2020-04-07 15:14 ` [Tarantool-patches] [PATCH v12 1/8] box: fix bootstrap comment Cyrill Gorcunov
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
This series contains a few fixups, including simple code cleanups.
I've assigned a separate bug to myself for the applier redesign,
since I need more time to understand the code better:
https://github.com/tarantool/tarantool/issues/4853
Issue https://github.com/tarantool/tarantool/issues/4730
Branch gorcunov/gh-4730-diag-raise-master-12
The series has gathered everyone's Acks and is ready for merging.
Cyrill Gorcunov (8):
box: fix bootstrap comment
alter: shrink txn_alter_trigger_new code
request: add missing OutOfMemory diag_set
applier: add missing diag_set on region_alloc failure
replication: merge replica_by_id into replicaset
applier: reduce applier_txn_rollback_cb code density
applier: prevent nil dereference on applier rollback
test: add replication/gh-4730-applier-rollback
src/box/alter.cc | 4 +-
src/box/applier.cc | 24 ++-
src/box/box.cc | 2 +-
src/box/replication.cc | 2 -
src/box/replication.h | 2 +-
src/box/request.c | 8 +-
src/box/txn.c | 13 ++
src/lib/core/errinj.h | 1 +
test/box/errinj.result | 1 +
.../gh-4730-applier-rollback.result | 145 ++++++++++++++++++
.../gh-4730-applier-rollback.test.lua | 73 +++++++++
test/replication/replica-applier-rollback.lua | 16 ++
test/replication/suite.cfg | 1 +
test/replication/suite.ini | 2 +-
14 files changed, 281 insertions(+), 13 deletions(-)
create mode 100644 test/replication/gh-4730-applier-rollback.result
create mode 100644 test/replication/gh-4730-applier-rollback.test.lua
create mode 100644 test/replication/replica-applier-rollback.lua
--
2.20.1
* [Tarantool-patches] [PATCH v12 1/8] box: fix bootstrap comment
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
We're not starting a new master node but
a new instance instead. The comment is simply
a leftover from older modifications.
Acked-by: Konstantin Osipov <kostja.osipov@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/box.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 765d64678..0c15ba5e9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2414,7 +2414,7 @@ box_cfg_xc(void)
local_recovery(&instance_uuid, &replicaset_uuid,
&checkpoint->vclock);
} else {
- /* Bootstrap a new master */
+ /* Bootstrap a new instance */
bootstrap(&instance_uuid, &replicaset_uuid,
&is_bootstrap_leader);
}
--
2.20.1
* [Tarantool-patches] [PATCH v12 2/8] alter: shrink txn_alter_trigger_new code
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
Instead of calling memset, which is useless here,
just use the trigger_create helper.
Acked-by: Konstantin Osipov <kostja.osipov@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/alter.cc | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index d73679fb8..dbbbcbc44 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -820,9 +820,7 @@ txn_alter_trigger_new(trigger_f run, void *data)
diag_set(OutOfMemory, size, "region", "struct trigger");
return NULL;
}
- trigger = (struct trigger *)memset(trigger, 0, size);
- trigger->run = run;
- trigger->data = data;
+ trigger_create(trigger, run, data, NULL);
return trigger;
}
--
2.20.1
* [Tarantool-patches] [PATCH v12 3/8] request: add missing OutOfMemory diag_set
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
In request_create_from_tuple and request_handle_sequence
we may fail to allocate memory for tuples. Don't
forget to set the diag error, otherwise diag_raise will
lead to a nil dereference.
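For illustration only, here is a minimal self-contained sketch of the
contract this patch restores: a function that returns -1 must leave an
error in the diagnostics area, because the caller reads it unconditionally.
All toy_* names are hypothetical stand-ins and are not Tarantool's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a per-fiber diagnostics area (hypothetical). */
struct toy_error { char msg[64]; };
struct toy_diag { struct toy_error *last; };

/* Zero-initialized: no error is set at start. */
static struct toy_diag toy_fiber_diag;

/* Record an out-of-memory error, replacing any previous one. */
static void
toy_diag_set_oom(size_t size, const char *what)
{
	/* Static storage, so reporting the error cannot itself fail. */
	static struct toy_error err;
	snprintf(err.msg, sizeof(err.msg),
		 "failed to allocate %zu bytes for %s", size, what);
	toy_fiber_diag.last = &err;
}

/* Allocator wrapper that can be forced to fail. */
static int toy_alloc_must_fail;

static void *
toy_alloc(size_t size)
{
	return toy_alloc_must_fail ? NULL : malloc(size);
}

/* Honors the contract: whenever it returns -1, the diag is set. */
static int
toy_copy_tuple(const char *data, size_t size, char **out)
{
	assert(data != NULL && out != NULL);
	char *buf = toy_alloc(size);
	if (buf == NULL) {
		toy_diag_set_oom(size, "tuple");
		return -1;
	}
	memcpy(buf, data, size);
	*out = buf;
	return 0;
}

/*
 * A caller that dereferences the last error without checking it.
 * This is only safe if every failure path has set the diag first.
 */
static const char *
toy_last_error_message(void)
{
	return toy_fiber_diag.last->msg;
}
```

With toy_alloc_must_fail set, toy_copy_tuple() returns -1 and
toy_last_error_message() is safe to call. Dropping the toy_diag_set_oom()
call would make the same caller dereference a NULL pointer.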
Acked-by: Sergey Ostanevich <sergos@tarantool.org>
Acked-by: Konstantin Osipov <kostja.osipov@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/request.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/box/request.c b/src/box/request.c
index 82232a155..994f2da62 100644
--- a/src/box/request.c
+++ b/src/box/request.c
@@ -109,8 +109,10 @@ request_create_from_tuple(struct request *request, struct space *space,
* the tuple data to WAL on commit.
*/
char *buf = region_alloc(&fiber()->gc, size);
- if (buf == NULL)
+ if (buf == NULL) {
+ diag_set(OutOfMemory, size, "region_alloc", "tuple");
return -1;
+ }
memcpy(buf, data, size);
request->tuple = buf;
request->tuple_end = buf + size;
@@ -199,8 +201,10 @@ request_handle_sequence(struct request *request, struct space *space)
size_t buf_size = (request->tuple_end - request->tuple) +
mp_sizeof_uint(UINT64_MAX);
char *tuple = region_alloc(&fiber()->gc, buf_size);
- if (tuple == NULL)
+ if (tuple == NULL) {
+ diag_set(OutOfMemory, buf_size, "region_alloc", "tuple");
return -1;
+ }
char *tuple_end = mp_encode_array(tuple, len);
if (unlikely(key != data)) {
--
2.20.1
* [Tarantool-patches] [PATCH v12 4/8] applier: add missing diag_set on region_alloc failure
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
If we hit the memory limit while allocating triggers,
we should set the diag error to prevent a nil dereference
in the diag_raise call (for example from applier_apply_tx).
Note that there are region_alloc_xc helpers which
throw errors, but as far as I understand we need the
rollback action to run first instead of an immediate
throw/catch, thus we use diag_set.
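A rough sketch of the "set the error, then jump to the rollback label"
pattern described above, so the cleanup runs before the failure is
reported. This is a toy model under stated assumptions: the toy_* names
and the string-based error slot are hypothetical, not the applier code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction state used only for this sketch. */
struct toy_txn {
	int is_open;
	int rollbacks;
};

/* Toy error slot: NULL means no error has been recorded. */
static const char *toy_last_error;

static void
toy_txn_rollback(struct toy_txn *txn)
{
	txn->is_open = 0;
	txn->rollbacks++;
}

/*
 * on_rollback/on_commit stand in for the two trigger allocations.
 * Passing NULL for either one simulates an allocation failure.
 */
static int
toy_apply(struct toy_txn *txn, void *on_rollback, void *on_commit)
{
	assert(txn != NULL);
	txn->is_open = 1;
	if (on_rollback == NULL || on_commit == NULL) {
		/* Record the error first, then let the rollback path run. */
		toy_last_error = "out of memory: on_rollback/on_commit";
		goto rollback;
	}
	txn->is_open = 0; /* Pretend the transaction was committed. */
	return 0;
rollback:
	toy_txn_rollback(txn);
	return -1;
}
```

The point of the design is the ordering: the error is recorded before the
jump, the rollback runs, and only then does the caller see -1 with a
non-NULL error to inspect.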
Acked-by: Sergey Ostanevich <sergos@tarantool.org>
Acked-by: Konstantin Osipov <kostja.osipov@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/applier.cc | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 47a26c366..2eb1e04fc 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -796,8 +796,11 @@ applier_apply_tx(struct stailq *rows)
sizeof(struct trigger));
on_commit = (struct trigger *)region_alloc(&txn->region,
sizeof(struct trigger));
- if (on_rollback == NULL || on_commit == NULL)
+ if (on_rollback == NULL || on_commit == NULL) {
+ diag_set(OutOfMemory, sizeof(struct trigger),
+ "region_alloc", "on_rollback/on_commit");
goto rollback;
+ }
trigger_create(on_rollback, applier_txn_rollback_cb, NULL, NULL);
txn_on_rollback(txn, on_rollback);
--
2.20.1
* [Tarantool-patches] [PATCH v12 5/8] replication: merge replica_by_id into replicaset
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
For some reason the replica_by_id member (which is an
array of pointers) is allocated dynamically. Moreover,
VCLOCK_MAX = 32 for now, and extending it to some new
limit will require way more effort than just increasing
the number.
Thus reserve memory for replica_by_id inside replicaset
statically. This allows us to simplify the code a bit and
drop the calloc/free calls.
The former code comes from edd76a2a0ae17e3d without any
explanation of why the dynamic member is needed.
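As a small illustration of the layout change, the sketch below embeds a
fixed-size pointer array in a struct instead of allocating it separately.
The toy_* names are hypothetical; only the limit of 32 is taken from the
message above.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the VCLOCK_MAX = 32 limit mentioned above. */
enum { TOY_VCLOCK_MAX = 32 };

struct toy_replica {
	unsigned id;
};

struct toy_replicaset {
	/* Embedded array: storage lives in the struct, no calloc/free. */
	struct toy_replica *replica_by_id[TOY_VCLOCK_MAX];
};

/* Static storage duration, so every slot starts out as NULL. */
static struct toy_replicaset toy_replicaset;

/* Register a replica in its slot; fails on a bad or occupied id. */
static int
toy_replica_register(struct toy_replica *r)
{
	assert(r != NULL);
	if (r->id >= TOY_VCLOCK_MAX ||
	    toy_replicaset.replica_by_id[r->id] != NULL)
		return -1;
	toy_replicaset.replica_by_id[r->id] = r;
	return 0;
}

/* Bounds-checked lookup; returns NULL for unknown ids. */
static struct toy_replica *
toy_replica_by_id(unsigned id)
{
	return id < TOY_VCLOCK_MAX ? toy_replicaset.replica_by_id[id] : NULL;
}
```

The cost is a fixed 32 pointers inside the struct, in exchange for having
no allocation that can fail and no matching free to forget.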
Acked-by: Konstantin Osipov <kostja.osipov@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/replication.cc | 2 --
src/box/replication.h | 2 +-
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1345f189b..7c10fb6f2 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -89,7 +89,6 @@ replication_init(void)
rlist_create(&replicaset.anon);
vclock_create(&replicaset.vclock);
fiber_cond_create(&replicaset.applier.cond);
- replicaset.replica_by_id = (struct replica **)calloc(VCLOCK_MAX, sizeof(struct replica *));
latch_create(&replicaset.applier.order_latch);
vclock_create(&replicaset.applier.vclock);
@@ -112,7 +111,6 @@ replication_free(void)
relay_cancel(replica->relay);
diag_destroy(&replicaset.applier.diag);
- free(replicaset.replica_by_id);
}
int
diff --git a/src/box/replication.h b/src/box/replication.h
index 2ef1255b3..9df91e611 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -251,7 +251,7 @@ struct replicaset {
struct diag diag;
} applier;
/** Map of all known replica_id's to correspponding replica's. */
- struct replica **replica_by_id;
+ struct replica *replica_by_id[VCLOCK_MAX];
};
extern struct replicaset replicaset;
--
2.20.1
* [Tarantool-patches] [PATCH v12 6/8] applier: reduce applier_txn_rollback_cb code density
From: Cyrill Gorcunov @ 2020-04-07 15:14 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
Add blank lines between the logical steps of the function
to make it a bit more readable.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/applier.cc | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 2eb1e04fc..2f9c9c797 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -695,8 +695,10 @@ applier_txn_rollback_cb(struct trigger *trigger, void *event)
/* Setup shared applier diagnostic area. */
diag_set(ClientError, ER_WAL_IO);
diag_move(&fiber()->diag, &replicaset.applier.diag);
+
/* Broadcast the rollback event across all appliers. */
trigger_run(&replicaset.applier.on_rollback, event);
+
/* Rollback applier vclock to the committed one. */
vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
return 0;
--
2.20.1
* [Tarantool-patches] [PATCH v12 7/8] applier: prevent nil dereference on applier rollback
From: Cyrill Gorcunov @ 2020-04-07 15:15 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
Currently, when a transaction rollback happens, we just drop the existing
error, setting a ClientError and moving it into replicaset.applier.diag. This
leaves the current fiber with diag=nil, which in turn leads to a SIGSEGV once
diag_raise() is called right after applier_apply_tx():
| applier_f
|   try {
|     applier_subscribe
|       applier_apply_tx
|         // error happens
|         txn_rollback
|           diag_set(ClientError, ER_WAL_IO)
|           diag_move(&fiber()->diag, &replicaset.applier.diag)
|           // fiber->diag = nil
|           applier_on_rollback
|             diag_add_error(&applier->diag, diag_last_error(&replicaset.applier.diag))
|             fiber_cancel(applier->reader);
|       diag_raise() -> NULL dereference
|   } catch { ... }
Thus:
- use diag_set_error() instead of diag_move() so that the error is not
  dropped from the current fiber(), preventing a nil dereference;
- put a FIXME mark into the code: we need to rework it in a
  more sensible way.
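The difference between moving and sharing an error can be sketched with
a toy reference-counted model. This is an assumption-laden illustration:
the toy_* functions only imitate the behaviour described above and are
not the real diag implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted error object. */
struct toy_error {
	int refs;
	const char *msg;
};

/* Hypothetical diagnostics area holding at most one error. */
struct toy_diag {
	struct toy_error *last;
};

static void
toy_error_ref(struct toy_error *e)
{
	e->refs++;
}

static void
toy_error_unref(struct toy_error *e)
{
	assert(e->refs > 0);
	e->refs--;
}

/* Share: `d` takes its own reference, other holders keep theirs. */
static void
toy_diag_set_error(struct toy_diag *d, struct toy_error *e)
{
	if (e != NULL)
		toy_error_ref(e);
	if (d->last != NULL)
		toy_error_unref(d->last);
	d->last = e;
}

/* Move: the reference is transferred and the source ends up empty. */
static void
toy_diag_move(struct toy_diag *from, struct toy_diag *to)
{
	if (to->last != NULL)
		toy_error_unref(to->last);
	to->last = from->last;
	from->last = NULL;
}
```

After toy_diag_set_error(&shared, fiber_diag.last) both areas point at the
same error, so a later unconditional read of the fiber's area is safe.
After toy_diag_move(&fiber_diag, &shared) the fiber's area is empty, which
is the state that led to the crash.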
Fixes #4730
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/applier.cc | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 2f9c9c797..68de3c08c 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -692,9 +692,22 @@ static int
applier_txn_rollback_cb(struct trigger *trigger, void *event)
{
(void) trigger;
- /* Setup shared applier diagnostic area. */
+
+ /*
+ * Setup shared applier diagnostic area.
+ *
+ * FIXME: We should consider redesign this
+ * moment and instead of carrying one shared
+ * diag use per-applier diag instead all the time
+ * (which actually already present in the structure).
+ *
+ * But remember that transactions are asynchronous
+ * and rollback may happen a way latter after it
+ * passed to the journal engine.
+ */
diag_set(ClientError, ER_WAL_IO);
- diag_move(&fiber()->diag, &replicaset.applier.diag);
+ diag_set_error(&replicaset.applier.diag,
+ diag_last_error(diag_get()));
/* Broadcast the rollback event across all appliers. */
trigger_run(&replicaset.applier.on_rollback, event);
--
2.20.1
* [Tarantool-patches] [PATCH v12 8/8] test: add replication/gh-4730-applier-rollback
From: Cyrill Gorcunov @ 2020-04-07 15:15 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
Test that the diag_raise nil dereference doesn't happen if an async
transaction fails inside the replication procedure.
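The failure is forced through an error injection point placed at the top
of txn_commit_async. The general idea of such a switchable failure point
can be sketched as follows. This is a simplified toy: the TOY_* macro and
helpers are hypothetical and much smaller than the real errinj machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical list of injection points for this sketch. */
enum toy_errinj_id {
	TOY_ERRINJ_COMMIT_ASYNC,
	toy_errinj_id_MAX
};

/* All injections are disabled by default (static zero-init). */
static bool toy_errinjs[toy_errinj_id_MAX];

static void
toy_errinj_set(enum toy_errinj_id id, bool value)
{
	assert(id < toy_errinj_id_MAX);
	toy_errinjs[id] = value;
}

/* Run `code` only when the given injection is switched on. */
#define TOY_ERROR_INJECT(id, code) \
	do { if (toy_errinjs[id]) code; } while (0)

/* Counts how many times the failure path performed a rollback. */
static int toy_rollbacks;

static int
toy_commit_async(void)
{
	TOY_ERROR_INJECT(TOY_ERRINJ_COMMIT_ASYNC, {
		toy_rollbacks++; /* Stands in for the rollback call. */
		return -1;
	});
	return 0; /* Normal path: the commit is accepted. */
}
```

A test can then flip the switch, drive one operation through the failing
path, and check the observable effect, much like the Lua test sets
ERRINJ_TXN_COMMIT_ASYNC on the replica and greps the log.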
Side note: I don't like merging tests with patches in
general and I hate doing so for big tests with a passion
because it hides the patch code itself. So here is a
separate patch on top of the fix.
Test-of #4730
Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/txn.c | 13 ++
src/lib/core/errinj.h | 1 +
test/box/errinj.result | 1 +
.../gh-4730-applier-rollback.result | 145 ++++++++++++++++++
.../gh-4730-applier-rollback.test.lua | 73 +++++++++
test/replication/replica-applier-rollback.lua | 16 ++
test/replication/suite.cfg | 1 +
test/replication/suite.ini | 2 +-
8 files changed, 251 insertions(+), 1 deletion(-)
create mode 100644 test/replication/gh-4730-applier-rollback.result
create mode 100644 test/replication/gh-4730-applier-rollback.test.lua
create mode 100644 test/replication/replica-applier-rollback.lua
diff --git a/src/box/txn.c b/src/box/txn.c
index f9c3e3675..488aa4bdd 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -34,6 +34,7 @@
#include "journal.h"
#include <fiber.h>
#include "xrow.h"
+#include "errinj.h"
double too_long_threshold;
@@ -576,6 +577,18 @@ txn_commit_async(struct txn *txn)
{
struct journal_entry *req;
+ ERROR_INJECT(ERRINJ_TXN_COMMIT_ASYNC, {
+ diag_set(ClientError, ER_INJECTION,
+ "txn commit async injection");
+ /*
+ * Log it for the testing sake: we grep
+ * output to mark this event.
+ */
+ diag_log();
+ txn_rollback(txn);
+ return -1;
+ });
+
if (txn_prepare(txn) != 0) {
txn_rollback(txn);
return -1;
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index ee6c57a0d..7577ed11a 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -139,6 +139,7 @@ struct errinj {
_(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) \
_(ERRINJ_RELAY_FASTER_THAN_TX, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_INDEX_RESERVE, ERRINJ_BOOL, {.bparam = false})\
+ _(ERRINJ_TXN_COMMIT_ASYNC, ERRINJ_BOOL, {.bparam = false})\
ENUM0(errinj_id, ERRINJ_LIST);
extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 0d3fedeb3..de877b708 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -76,6 +76,7 @@ evals
- ERRINJ_TUPLE_ALLOC: false
- ERRINJ_TUPLE_FIELD: false
- ERRINJ_TUPLE_FORMAT_COUNT: -1
+ - ERRINJ_TXN_COMMIT_ASYNC: false
- ERRINJ_VYRUN_DATA_READ: false
- ERRINJ_VY_COMPACTION_DELAY: false
- ERRINJ_VY_DELAY_PK_LOOKUP: false
diff --git a/test/replication/gh-4730-applier-rollback.result b/test/replication/gh-4730-applier-rollback.result
new file mode 100644
index 000000000..26a0eb6fa
--- /dev/null
+++ b/test/replication/gh-4730-applier-rollback.result
@@ -0,0 +1,145 @@
+-- test-run result file version 2
+#!/usr/bin/env tarantool
+ | ---
+ | ...
+--
+-- vim: ts=4 sw=4 et
+--
+
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- Allow replica to connect to us
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+--
+-- Create replica instance, we're the master and
+-- start it, no data to sync yet though
+test_run:cmd("create server replica_slave with rpl_master=default, script='replication/replica-applier-rollback.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica_slave")
+ | ---
+ | - true
+ | ...
+
+--
+-- Fill initial data on the master instance
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('test')
+ | ---
+ | ...
+s = box.space.test
+ | ---
+ | ...
+
+s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}})
+ | ---
+ | ...
+
+_ = s:create_index('primary', {type = 'tree', parts = {'id'}})
+ | ---
+ | ...
+s:insert({1, '1'})
+ | ---
+ | - [1, '1']
+ | ...
+s:insert({2, '2'})
+ | ---
+ | - [2, '2']
+ | ...
+s:insert({3, '3'})
+ | ---
+ | - [3, '3']
+ | ...
+
+--
+-- Wait for data from master get propagated
+test_run:wait_lsn('replica_slave', 'default')
+ | ---
+ | ...
+
+--
+-- Now inject error into slave instance
+test_run:cmd('switch replica_slave')
+ | ---
+ | - true
+ | ...
+
+--
+-- To make sure we're running
+box.info.status
+ | ---
+ | - running
+ | ...
+
+--
+-- To fail inserting new record.
+errinj = box.error.injection
+ | ---
+ | ...
+errinj.set('ERRINJ_TXN_COMMIT_ASYNC', true)
+ | ---
+ | - ok
+ | ...
+
+--
+-- Jump back to master node and write new
+-- entry which should cause error to happen
+-- on slave instance
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+s:insert({4, '4'})
+ | ---
+ | - [4, '4']
+ | ...
+
+--
+-- Wait for error to trigger
+test_run:cmd('switch replica_slave')
+ | ---
+ | - true
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+while test_run:grep_log('replica_slave', 'ER_INJECTION:[^\n]*') == nil do fiber.sleep(0.1) end
+ | ---
+ | ...
+
+----
+---- Such error cause the applier to be
+---- cancelled and reaped, thus stop the
+---- slave node and cleanup
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+--
+-- Cleanup
+test_run:cmd("stop server replica_slave")
+ | ---
+ | - true
+ | ...
+test_run:cmd("delete server replica_slave")
+ | ---
+ | - true
+ | ...
+box.space.test:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/gh-4730-applier-rollback.test.lua b/test/replication/gh-4730-applier-rollback.test.lua
new file mode 100644
index 000000000..de7a740de
--- /dev/null
+++ b/test/replication/gh-4730-applier-rollback.test.lua
@@ -0,0 +1,73 @@
+#!/usr/bin/env tarantool
+--
+-- vim: ts=4 sw=4 et
+--
+
+test_run = require('test_run').new()
+
+--
+-- Allow replica to connect to us
+box.schema.user.grant('guest', 'replication')
+
+--
+-- Create replica instance, we're the master and
+-- start it, no data to sync yet though
+test_run:cmd("create server replica_slave with rpl_master=default, script='replication/replica-applier-rollback.lua'")
+test_run:cmd("start server replica_slave")
+
+--
+-- Fill initial data on the master instance
+test_run:cmd('switch default')
+
+_ = box.schema.space.create('test')
+s = box.space.test
+
+s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}})
+
+_ = s:create_index('primary', {type = 'tree', parts = {'id'}})
+s:insert({1, '1'})
+s:insert({2, '2'})
+s:insert({3, '3'})
+
+--
+-- Wait for data from master get propagated
+test_run:wait_lsn('replica_slave', 'default')
+
+--
+-- Now inject error into slave instance
+test_run:cmd('switch replica_slave')
+
+--
+-- To make sure we're running
+box.info.status
+
+--
+-- To fail inserting new record.
+errinj = box.error.injection
+errinj.set('ERRINJ_TXN_COMMIT_ASYNC', true)
+
+--
+-- Jump back to master node and write new
+-- entry which should cause error to happen
+-- on slave instance
+test_run:cmd('switch default')
+s:insert({4, '4'})
+
+--
+-- Wait for error to trigger
+test_run:cmd('switch replica_slave')
+fiber = require('fiber')
+while test_run:grep_log('replica_slave', 'ER_INJECTION:[^\n]*') == nil do fiber.sleep(0.1) end
+
+----
+---- Such error cause the applier to be
+---- cancelled and reaped, thus stop the
+---- slave node and cleanup
+test_run:cmd('switch default')
+
+--
+-- Cleanup
+test_run:cmd("stop server replica_slave")
+test_run:cmd("delete server replica_slave")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica-applier-rollback.lua b/test/replication/replica-applier-rollback.lua
new file mode 100644
index 000000000..26fb10055
--- /dev/null
+++ b/test/replication/replica-applier-rollback.lua
@@ -0,0 +1,16 @@
+--
+-- vim: ts=4 sw=4 et
+--
+
+print('arg', arg)
+
+box.cfg({
+ replication = os.getenv("MASTER"),
+ listen = os.getenv("LISTEN"),
+ memtx_memory = 107374182,
+ replication_timeout = 0.1,
+ replication_connect_timeout = 0.5,
+ read_only = true,
+})
+
+require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 90fd53ca6..a6abfd72c 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -16,6 +16,7 @@
"gh-4605-empty-password.test.lua": {},
"gh-4606-admin-creds.test.lua": {},
"gh-4739-vclock-assert.test.lua": {},
+ "gh-4730-applier-rollback.test.lua": {},
"*": {
"memtx": {"engine": "memtx"},
"vinyl": {"engine": "vinyl"}
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index b4e09744a..ac413669d 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
script = master.lua
description = tarantool/box, replication
disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua
config = suite.cfg
lua_libs = lua/fast_replica.lua lua/rlimit.lua
use_unix_sockets = True
--
2.20.1
* Re: [Tarantool-patches] [PATCH v12 0/8] replication: prevent nil dereference on applier rollback
From: Kirill Yukhin @ 2020-04-08 11:04 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
Hello,
On 07 Apr 18:14, Cyrill Gorcunov wrote:
> In the series a few fixups including simple code cleanup.
>
> I've assigned a separate bug for myself for applier redesign
> since I need more time to understand code better
> https://github.com/tarantool/tarantool/issues/4853
>
> Issue https://github.com/tarantool/tarantool/issues/4730
> Branch gorcunov/gh-4730-diag-raise-master-12
>
> The series gathers everyone Acks and ready for merging.
>
> Cyrill Gorcunov (8):
> box: fix bootstrap comment
> alter: shrink txn_alter_trigger_new code
> request: add missing OutOfMemory diag_set
> applier: add missing diag_set on region_alloc failure
> replication: merge replica_by_id into replicaset
> applier: reduce applier_txn_rollback_cb code density
> applier: prevent nil dereference on applier rollback
> test: add replication/gh-4730-applier-rollback
I've checked your patchset into 2.2, 2.3 and master.
--
Regards, Kirill Yukhin