Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v11 8/8] test: add replication/applier-rollback
Date: Tue, 7 Apr 2020 13:26:34 +0300	[thread overview]
Message-ID: <AED406E3-B951-422A-8204-77A2E13A2462@tarantool.org> (raw)
In-Reply-To: <20200404161524.7466-9-gorcunov@gmail.com>

Hi! Thanks for the patch!
Please find my comments below.

> 4 апр. 2020 г., в 19:15, Cyrill Gorcunov <gorcunov@gmail.com> написал(а):
> 
> Test that diag_raise doesn't happen if async transaction
> fails inside replication procedure.
> 
> Side note: I don't like merging tests with patches in
> general and I hate doing so for big tests with a passion
> because it hides the patch code itself. So here is a
> separate patch on top of the fix.

I like that. It was easy to check the test without the fix.

> 
> Test-of #4730
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/box/txn.c                               |  13 ++
> src/lib/core/errinj.h                       |   1 +
> test/box/errinj.result                      |   1 +
> test/replication/applier-rollback-slave.lua |  16 ++
> test/replication/applier-rollback.result    | 162 ++++++++++++++++++++
> test/replication/applier-rollback.test.lua  |  81 ++++++++++

It’s a bugfix, so please name the test gh-4730-something-something.test.lua

> test/replication/suite.ini                  |   2 +-
> 7 files changed, 275 insertions(+), 1 deletion(-)
> create mode 100644 test/replication/applier-rollback-slave.lua
> create mode 100644 test/replication/applier-rollback.result
> create mode 100644 test/replication/applier-rollback.test.lua
> 
> diff --git a/src/box/txn.c b/src/box/txn.c
> index f9c3e3675..488aa4bdd 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -34,6 +34,7 @@
> #include "journal.h"
> #include <fiber.h>
> #include "xrow.h"
> +#include "errinj.h"
> 
> double too_long_threshold;
> 
> @@ -576,6 +577,18 @@ txn_commit_async(struct txn *txn)
> {
> 	struct journal_entry *req;
> 
> +	ERROR_INJECT(ERRINJ_TXN_COMMIT_ASYNC, {
> +		diag_set(ClientError, ER_INJECTION,
> +			 "txn commit async injection");
> +		/*
> +		 * Log it for the testing sake: we grep
> +		 * output to mark this event.
> +		 */
> +		diag_log();
> +		txn_rollback(txn);
> +		return -1;
> +	});
> +
> 	if (txn_prepare(txn) != 0) {
> 		txn_rollback(txn);
> 		return -1;
> diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
> index ee6c57a0d..7577ed11a 100644
> --- a/src/lib/core/errinj.h
> +++ b/src/lib/core/errinj.h
> @@ -139,6 +139,7 @@ struct errinj {
> 	_(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) \
> 	_(ERRINJ_RELAY_FASTER_THAN_TX, ERRINJ_BOOL, {.bparam = false}) \
> 	_(ERRINJ_INDEX_RESERVE, ERRINJ_BOOL, {.bparam = false})\
> +	_(ERRINJ_TXN_COMMIT_ASYNC, ERRINJ_BOOL, {.bparam = false})\
> 
> ENUM0(errinj_id, ERRINJ_LIST);
> extern struct errinj errinjs[];
> diff --git a/test/box/errinj.result b/test/box/errinj.result
> index 0d3fedeb3..de877b708 100644
> --- a/test/box/errinj.result
> +++ b/test/box/errinj.result
> @@ -76,6 +76,7 @@ evals
>   - ERRINJ_TUPLE_ALLOC: false
>   - ERRINJ_TUPLE_FIELD: false
>   - ERRINJ_TUPLE_FORMAT_COUNT: -1
> +  - ERRINJ_TXN_COMMIT_ASYNC: false
>   - ERRINJ_VYRUN_DATA_READ: false
>   - ERRINJ_VY_COMPACTION_DELAY: false
>   - ERRINJ_VY_DELAY_PK_LOOKUP: false
> diff --git a/test/replication/applier-rollback-slave.lua b/test/replication/applier-rollback-slave.lua
> new file mode 100644
> index 000000000..26fb10055
> --- /dev/null
> +++ b/test/replication/applier-rollback-slave.lua

Better name it replica_applier_rollback.lua for the sake of consistency
with other instance file names.

> @@ -0,0 +1,16 @@
> +--
> +-- vim: ts=4 sw=4 et
> +--
> +
> +print('arg', arg)
> +
> +box.cfg({
> +    replication                 = os.getenv("MASTER"),
> +    listen                      = os.getenv("LISTEN"),
> +    memtx_memory                = 107374182,
> +    replication_timeout         = 0.1,
> +    replication_connect_timeout = 0.5,
> +    read_only                   = true,
> +})
> +
> +require('console').listen(os.getenv('ADMIN'))
> diff --git a/test/replication/applier-rollback.result b/test/replication/applier-rollback.result
> new file mode 100644
> index 000000000..3c659f460
> --- /dev/null
> +++ b/test/replication/applier-rollback.result
> @@ -0,0 +1,162 @@
> +-- test-run result file version 2
> +#!/usr/bin/env tarantool
> + | ---
> + | ...
> +--
> +-- vim: ts=4 sw=4 et
> +--
> +
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +errinj = box.error.injection
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +
> +--
> +-- Allow replica to connect to us
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +
> +--
> +-- Create replica instance, we're the master and
> +-- start it, no data to sync yet though
> +test_run:cmd("create server replica_slave with rpl_master=default, script='replication/applier-rollback-slave.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server replica_slave")
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- Fill initial data on the master instance
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +
> +_ = box.schema.space.create('test', {engine=engine})
> + | ---
> + | ...
> +s = box.space.test
> + | ---
> + | ...
> +
> +s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}})
> + | ---
> + | ...
> +
> +_ = s:create_index('primary', {type = 'tree', parts = {'id'}})
> + | ---
> + | ...
> +s:insert({1, '1'})
> + | ---
> + | - [1, '1']
> + | ...
> +s:insert({2, '2'})
> + | ---
> + | - [2, '2']
> + | ...
> +s:insert({3, '3'})
> + | ---
> + | - [3, '3']
> + | ...
> +
> +--
> +-- To make sure we're running
> +box.info.status
> + | ---
> + | - running
> + | ...
> +
> +--
> +-- Wait for data from master get propagated
> +test_run:wait_lsn('replica_slave', 'default')
> + | ---
> + | ...
> +
> +--
> +-- Now inject error into slave instance
> +test_run:cmd('switch replica_slave')
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- To make sure we're running
> +box.info.status
> + | ---
> + | - running
> + | ...
> +
> +--
> +-- To fail inserting new record.
> +errinj = box.error.injection
> + | ---
> + | ...
> +errinj.set('ERRINJ_TXN_COMMIT_ASYNC', true)
> + | ---
> + | - ok
> + | ...
> +
> +--
> +-- Jump back to master node and write new
> +-- entry which should cause error to happen
> +-- on slave instance
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +s:insert({4, '4'})
> + | ---
> + | - [4, '4']
> + | ...
> +
> +--
> +-- Wait for error to trigger
> +test_run:cmd('switch replica_slave')
> + | ---
> + | - true
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +while test_run:grep_log('replica_slave', 'ER_INJECTION:[^\n]*') == nil do fiber.sleep(0.1) end
> + | ---
> + | ...
> +
> +----
> +---- Such error cause the applier to be
> +---- cancelled and reaped, thus stop the
> +---- slave node and cleanup
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- Cleanup
> +test_run:cmd("stop server replica_slave")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("delete server replica_slave")
> + | ---
> + | - true
> + | ...
> +box.cfg{replication=""}
> + | ---
> + | ...
> +box.space.test:drop()
> + | ---
> + | ...
> +box.schema.user.revoke('guest', 'replication')
> + | ---
> + | ...
> diff --git a/test/replication/applier-rollback.test.lua b/test/replication/applier-rollback.test.lua
> new file mode 100644
> index 000000000..2c32af5c6
> --- /dev/null
> +++ b/test/replication/applier-rollback.test.lua
> @@ -0,0 +1,81 @@
> +#!/usr/bin/env tarantool
> +--
> +-- vim: ts=4 sw=4 et
> +--
> +
> +test_run = require('test_run').new()
> +
> +errinj = box.error.injection

You don’t need the errinj on master, only on replica, AFAICS.

> +engine = test_run:get_cfg('engine')

Why test both engines? I suggest you run the test with no arguments.
(you’ll have to add a line to replication/suite.cfg for that)

> +
> +--
> +-- Allow replica to connect to us
> +box.schema.user.grant('guest', 'replication')
> +
> +--
> +-- Create replica instance, we're the master and
> +-- start it, no data to sync yet though
> +test_run:cmd("create server replica_slave with rpl_master=default, script='replication/applier-rollback-slave.lua'")
> +test_run:cmd("start server replica_slave")
> +
> +--
> +-- Fill initial data on the master instance
> +test_run:cmd('switch default')
> +
> +_ = box.schema.space.create('test', {engine=engine})
> +s = box.space.test
> +
> +s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}})
> +
> +_ = s:create_index('primary', {type = 'tree', parts = {'id'}})
> +s:insert({1, '1'})
> +s:insert({2, '2'})
> +s:insert({3, '3'})
> +
> +--
> +-- To make sure we're running
> +box.info.status
> +

This should always evaluate to ‘running’, you’ve just inserted some
values and got a result back, so I’d omit this check.

> +--
> +-- Wait for data from master get propagated
> +test_run:wait_lsn('replica_slave', 'default')
> +
> +--
> +-- Now inject error into slave instance
> +test_run:cmd('switch replica_slave')
> +
> +--
> +-- To make sure we're running
> +box.info.status
> +
> +--
> +-- To fail inserting new record.
> +errinj = box.error.injection
> +errinj.set('ERRINJ_TXN_COMMIT_ASYNC', true)
> +
> +--
> +-- Jump back to master node and write new
> +-- entry which should cause error to happen
> +-- on slave instance
> +test_run:cmd('switch default')
> +s:insert({4, '4'})
> +
> +--
> +-- Wait for error to trigger
> +test_run:cmd('switch replica_slave')
> +fiber = require('fiber')
> +while test_run:grep_log('replica_slave', 'ER_INJECTION:[^\n]*') == nil do fiber.sleep(0.1) end
> +
> +----
> +---- Such error cause the applier to be
> +---- cancelled and reaped, thus stop the
> +---- slave node and cleanup
> +test_run:cmd('switch default')
> +
> +--
> +-- Cleanup
> +test_run:cmd("stop server replica_slave")
> +test_run:cmd("delete server replica_slave")
> +box.cfg{replication=""}

You didn’t set box.cfg.replication, so you shouldn’t reset it.

> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index b4e09744a..f6c924762 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -3,7 +3,7 @@ core = tarantool
> script =  master.lua
> description = tarantool/box, replication
> disabled = consistent.test.lua
> -release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua
> +release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua applier-rollback.test.lua
> config = suite.cfg
> lua_libs = lua/fast_replica.lua lua/rlimit.lua
> use_unix_sockets = True
> -- 
> 2.20.1
> 


--
Serge Petrenko
sergepetrenko@tarantool.org

  reply	other threads:[~2020-04-07 10:26 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-04 16:15 [Tarantool-patches] [PATCH v11 0/8] box/replication: prevent nil dereference on applier rollback Cyrill Gorcunov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 1/8] box: fix bootstrap comment Cyrill Gorcunov
2020-04-05  7:31   ` Konstantin Osipov
2020-04-05  7:56     ` Cyrill Gorcunov
2020-04-05  8:35       ` Konstantin Osipov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 2/8] box/alter: shrink txn_alter_trigger_new code Cyrill Gorcunov
2020-04-06  7:39   ` Konstantin Osipov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 3/8] box/request: add missing OutOfMemory diag_set Cyrill Gorcunov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 4/8] box/applier: add missing diag_set on region_alloc failure Cyrill Gorcunov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 5/8] box/replication: merge replica_by_id into replicaset Cyrill Gorcunov
2020-04-06  7:40   ` Konstantin Osipov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 6/8] applier: reduce applier_txn_rollback_cb code density Cyrill Gorcunov
2020-04-06  7:40   ` Konstantin Osipov
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 7/8] box/applier: prevent nil dereference on applier rollback Cyrill Gorcunov
2020-04-07 10:36   ` Serge Petrenko
2020-04-04 16:15 ` [Tarantool-patches] [PATCH v11 8/8] test: add replication/applier-rollback Cyrill Gorcunov
2020-04-07 10:26   ` Serge Petrenko [this message]
2020-04-07 10:55     ` Cyrill Gorcunov
2020-04-07 10:46 ` [Tarantool-patches] [PATCH v11 0/8] box/replication: prevent nil dereference on applier rollback Serge Petrenko
2020-04-07 11:00   ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AED406E3-B951-422A-8204-77A2E13A2462@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v11 8/8] test: add replication/applier-rollback' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox