* [Tarantool-patches] [PATCH 0/3] box/txn: fix assert in txn_rollback
@ 2020-02-27 14:16 Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 1/3] box/txn: rename txn_write to txn_commit_async Cyrill Gorcunov
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-02-27 14:16 UTC (permalink / raw)
To: tml
Currently both synchronous and asynchronous commits use
async write engine, which implies that commit happens
when txn is already unbound from fiber. Thus we should
make it so right before we pass txn into journal level,
otherwise if error happens for example in allocation
(journal_entry_new) the assert will trigger.
Side note: there is longstanding work on journal methods
redesign which I didn't complete yet but lets fix this
issue earlier since bugs are the bugs and must be fixed.
branch gorcunov/gh-4031-txn_write_to_wal-hotfix
issue https://github.com/tarantool/tarantool/issues/4776
There is a problem in passing this branch via gitlab
since the master branch is failing itself. But I tested
it manually to trigger assert without the patch applied.
Please take a look once time permit.
Cyrill Gorcunov (3):
box/txn: rename txn_write to txn_commit_async
box/txn: log error explicitly in txn_commit
box/txn: clear fiber storage right before journal_write
src/box/applier.cc | 2 +-
src/box/journal.c | 6 ++++++
src/box/txn.c | 10 ++++++----
src/box/txn.h | 2 +-
src/lib/core/errinj.h | 3 ++-
test/box/errinj.result | 20 ++++++++++++++++++++
test/box/errinj.test.lua | 7 +++++++
7 files changed, 43 insertions(+), 7 deletions(-)
base-commit: 9d7686fcc879d7edbb73cf216a76bb290cd64352
--
2.20.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Tarantool-patches] [PATCH 1/3] box/txn: rename txn_write to txn_commit_async
2020-02-27 14:16 [Tarantool-patches] [PATCH 0/3] box/txn: fix assert in txn_rollback Cyrill Gorcunov
@ 2020-02-27 14:16 ` Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 2/3] box/txn: log error explicitly in txn_commit Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write Cyrill Gorcunov
2 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-02-27 14:16 UTC (permalink / raw)
To: tml
This points out that commitment is done in async
way where wal engine will call txn_complete explicitly.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/applier.cc | 2 +-
src/box/txn.c | 4 ++--
src/box/txn.h | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index ae3d281a5..10859a835 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -805,7 +805,7 @@ applier_apply_tx(struct stailq *rows)
trigger_create(on_commit, applier_txn_commit_cb, NULL, NULL);
txn_on_commit(txn, on_commit);
- if (txn_write(txn) < 0)
+ if (txn_commit_async(txn) < 0)
goto fail;
/* Transaction was sent to journal so promote vclock. */
diff --git a/src/box/txn.c b/src/box/txn.c
index a4ca48224..cac97d1eb 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -564,7 +564,7 @@ txn_prepare(struct txn *txn)
}
int
-txn_write(struct txn *txn)
+txn_commit_async(struct txn *txn)
{
if (txn_prepare(txn) != 0) {
txn_rollback(txn);
@@ -592,7 +592,7 @@ txn_commit(struct txn *txn)
{
txn->fiber = fiber();
- if (txn_write(txn) != 0)
+ if (txn_commit_async(txn) != 0)
return -1;
/*
* In case of non-yielding journal the transaction could already
diff --git a/src/box/txn.h b/src/box/txn.h
index ae2c3a58f..215e9c0bc 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -293,7 +293,7 @@ txn_rollback(struct txn *txn);
* freed.
*/
int
-txn_write(struct txn *txn);
+txn_commit_async(struct txn *txn);
/**
* Most txns don't have triggers, and txn objects
--
2.20.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Tarantool-patches] [PATCH 2/3] box/txn: log error explicitly in txn_commit
2020-02-27 14:16 [Tarantool-patches] [PATCH 0/3] box/txn: fix assert in txn_rollback Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 1/3] box/txn: rename txn_write to txn_commit_async Cyrill Gorcunov
@ 2020-02-27 14:16 ` Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write Cyrill Gorcunov
2 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-02-27 14:16 UTC (permalink / raw)
To: tml
To be on same way as we do in txn_write_to_wal.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/txn.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/box/txn.c b/src/box/txn.c
index cac97d1eb..9e2888c09 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -604,8 +604,10 @@ txn_commit(struct txn *txn)
fiber_set_cancellable(cancellable);
}
int res = txn->signature >= 0 ? 0 : -1;
- if (res != 0)
+ if (res != 0) {
diag_set(ClientError, ER_WAL_IO);
+ diag_log();
+ }
/* Synchronous transactions are freed by the calling fiber. */
txn_free(txn);
--
2.20.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write
2020-02-27 14:16 [Tarantool-patches] [PATCH 0/3] box/txn: fix assert in txn_rollback Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 1/3] box/txn: rename txn_write to txn_commit_async Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 2/3] box/txn: log error explicitly in txn_commit Cyrill Gorcunov
@ 2020-02-27 14:16 ` Cyrill Gorcunov
2020-02-27 19:47 ` Kirill Yukhin
2 siblings, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-02-27 14:16 UTC (permalink / raw)
To: tml
Otherwise any previous attempts to rollback the transaction
will triggers assert since rollback action requires the
fiber to carry txn in the fiber->storage.txn. To make
sure the rollback works as expected we inject journal
allocation error to happen.
There are some nonobvious moments to be clarified. Both
asyncronous and synchronous commits are using async commit
engine. And journal_entry_complete is critical for both:
it handles success writes and rollback on failures.
On sync commits (txn_commit) we simply call fiber_yield
and wait until we're woken up by journal_entry_complete
to proceed. In turn txn_commit_async doesn't wait for
commit to happen but leaves journal entry to commit
later (for this reason the fiber->storage.txn is set
to null).
Fixes #4776
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/journal.c | 6 ++++++
src/box/txn.c | 2 +-
src/lib/core/errinj.h | 3 ++-
test/box/errinj.result | 20 ++++++++++++++++++++
test/box/errinj.test.lua | 7 +++++++
5 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/src/box/journal.c b/src/box/journal.c
index 11e78990d..9633e2af3 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -29,6 +29,7 @@
* SUCH DAMAGE.
*/
#include "journal.h"
+#include "errinj.h"
#include <small/region.h>
#include <diag.h>
@@ -62,6 +63,11 @@ journal_entry_new(size_t n_rows, struct region *region,
size_t size = (sizeof(struct journal_entry) +
sizeof(entry->rows[0]) * n_rows);
+ ERROR_INJECT(ERRINJ_JOURNAL_NEW, {
+ diag_set(OutOfMemory, 0, "region", "journal_entry");
+ return NULL;
+ });
+
entry = region_aligned_alloc(region, size,
alignof(struct journal_entry));
if (entry == NULL) {
diff --git a/src/box/txn.c b/src/box/txn.c
index 9e2888c09..8b10c0965 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -520,6 +520,7 @@ txn_write_to_wal(struct txn *txn)
assert(local_row == remote_row + txn->n_new_rows);
/* Send the entry to the journal. */
+ fiber_set_txn(fiber(), NULL);
if (journal_write(req) < 0) {
diag_set(ClientError, ER_WAL_IO);
diag_log();
@@ -583,7 +584,6 @@ txn_commit_async(struct txn *txn)
fiber_set_txn(fiber(), NULL);
return 0;
}
- fiber_set_txn(fiber(), NULL);
return txn_write_to_wal(txn);
}
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index ed0cba903..08ea47f2e 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -136,7 +136,8 @@ struct errinj {
_(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
_(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL, {.bparam = false}) \
- _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1})
+ _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) \
+ _(ERRINJ_JOURNAL_NEW, ERRINJ_BOOL, {.bparam = false})
ENUM0(errinj_id, ERRINJ_LIST);
extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index daa27ed24..927f86402 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -54,6 +54,7 @@ evals
- ERRINJ_HTTP_RESPONSE_ADD_WAIT: false
- ERRINJ_INDEX_ALLOC: false
- ERRINJ_IPROTO_TX_DELAY: false
+ - ERRINJ_JOURNAL_NEW: false
- ERRINJ_LOG_ROTATE: false
- ERRINJ_MEMTX_DELAY_GC: false
- ERRINJ_PORT_DUMP: false
@@ -134,6 +135,25 @@ errinj.set("ERRINJ_TESTING", false)
---
- ok
...
+-- Check journal entries allocation failure
+errinj.set("ERRINJ_JOURNAL_NEW", true)
+---
+- ok
+...
+space:insert{1}
+---
+- error: Failed to allocate 0 bytes in region for journal_entry
+...
+space:get{1}
+---
+...
+errinj.set("ERRINJ_JOURNAL_NEW", false)
+---
+- ok
+...
+space:truncate()
+---
+...
-- Check how well we handle a failed log write
errinj.set("ERRINJ_WAL_IO", true)
---
diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
index 01d2b68df..f007e97f6 100644
--- a/test/box/errinj.test.lua
+++ b/test/box/errinj.test.lua
@@ -30,6 +30,13 @@ errinj.set("ERRINJ_TESTING", true)
space:select{222444}
errinj.set("ERRINJ_TESTING", false)
+-- Check journal entries allocation failure
+errinj.set("ERRINJ_JOURNAL_NEW", true)
+space:insert{1}
+space:get{1}
+errinj.set("ERRINJ_JOURNAL_NEW", false)
+space:truncate()
+
-- Check how well we handle a failed log write
errinj.set("ERRINJ_WAL_IO", true)
space:insert{1}
--
2.20.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write Cyrill Gorcunov
@ 2020-02-27 19:47 ` Kirill Yukhin
2020-02-27 20:20 ` Cyrill Gorcunov
0 siblings, 1 reply; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 19:47 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
Hello,
// This is not an official review. Just a nitpicking.
On 27 фев 17:16, Cyrill Gorcunov wrote:
> --- a/test/box/errinj.result
> +++ b/test/box/errinj.result
> @@ -134,6 +135,25 @@ errinj.set("ERRINJ_TESTING", false)
> ---
> - ok
> ...
> +-- Check journal entries allocation failure
> +errinj.set("ERRINJ_JOURNAL_NEW", true)
> +---
> +- ok
> +...
> +space:insert{1}
> +---
> +- error: Failed to allocate 0 bytes in region for journal_entry
Why 0? How could we ever fail to allocate 0 bytes?
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write
2020-02-27 19:47 ` Kirill Yukhin
@ 2020-02-27 20:20 ` Cyrill Gorcunov
2020-02-27 20:27 ` Kirill Yukhin
0 siblings, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-02-27 20:20 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tml
On Thu, Feb 27, 2020 at 10:47:29PM +0300, Kirill Yukhin wrote:
> Hello,
>
> // This is not an official review. Just a nitpicking.
>
> > +space:insert{1}
> > +---
> > +- error: Failed to allocate 0 bytes in region for journal_entry
>
> Why 0? How could we ever fail to allocate 0 bytes?
To make this message immutable even if size of the journal_entry
structure get changed. We don't care about the size here but rather
the fact that allocation failed. If you prefer I can put some
randomn number here instead.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write
2020-02-27 20:20 ` Cyrill Gorcunov
@ 2020-02-27 20:27 ` Kirill Yukhin
0 siblings, 0 replies; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 20:27 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
On 27 фев 23:20, Cyrill Gorcunov wrote:
> On Thu, Feb 27, 2020 at 10:47:29PM +0300, Kirill Yukhin wrote:
> > Hello,
> >
> > // This is not an official review. Just a nitpicking.
> >
> > > +space:insert{1}
> > > +---
> > > +- error: Failed to allocate 0 bytes in region for journal_entry
> >
> > Why 0? How could we ever fail to allocate 0 bytes?
>
> To make this message immutable even if size of the journal_entry
> structure get changed. We don't care about the size here but rather
> the fact that allocation failed. If you prefer I can put some
> randomn number here instead.
Now I get it. Sure, let's make it 0 (zero).
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-02-27 20:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-27 14:16 [Tarantool-patches] [PATCH 0/3] box/txn: fix assert in txn_rollback Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 1/3] box/txn: rename txn_write to txn_commit_async Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 2/3] box/txn: log error explicitly in txn_commit Cyrill Gorcunov
2020-02-27 14:16 ` [Tarantool-patches] [PATCH 3/3] box/txn: clear fiber storage right before journal_write Cyrill Gorcunov
2020-02-27 19:47 ` Kirill Yukhin
2020-02-27 20:20 ` Cyrill Gorcunov
2020-02-27 20:27 ` Kirill Yukhin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox