From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 14 Aug 2018 19:17:19 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH] xlog: added request details to panic message for broken LSN Message-ID: <20180814161719.rwitrnl5yn4xdelg@esperanza> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: To: Sergei Kalashnikov Cc: tarantool-patches@freelists.org List-ID: Hello, The patch below is mangled: tabs have been replaced with spaces. Please use either `git send-email` or a MUA that doesn't mangle patches. For more details, please see our guideline: https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines/#how-to-submit-a-patch-for-review On Thu, Aug 09, 2018 at 06:02:13PM +0300, Sergei Kalashnikov wrote: > Aid the debugging of replication issues related to out-of-order requests. > Adds the details of request/tuple to the diagnostic message whenever possible. Commit message text width should be < 72 characters > > Fixes: #3105 Should be 'Closes #3105' Also, the subject line should be in the imperative mood (xlog: add request details ...) For more information about how to write a commit message, please see our guideline: https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines/#how-to-write-a-commit-message > --- > Ticket:https://github.com/tarantool/tarantool/issues/3105 > Branch:ztarvos/gh-3105-log-request-broken-lsn "Ticket:" and "Branch:" prefixes are not necessary. It's clear which is which without them. Also, the branch should given as a hyper link. In your case it would be: https://github.com/tarantool/tarantool/commits/ztarvos/gh-3105-log-request-broken-lsn > > src/box/applier.cc | 8 +- > src/box/recovery.cc | 4 +- > src/box/vclock.c | 6 + > src/box/vclock.h | 22 +++ > src/box/wal.c | 12 +- > src/box/xrow.c | 24 +++ > src/box/xrow.h | 15 ++ > test/xlog/log_request_broken_lsn_panic.result | 200 ++++++++++++++++++++++++ > test/xlog/log_request_broken_lsn_panic.test.lua | 105 +++++++++++++ > test/xlog/panic_broken_lsn.lua | 11 ++ > 10 files changed, 395 insertions(+), 12 deletions(-) > create mode 100644 test/xlog/log_request_broken_lsn_panic.result > create mode 100644 test/xlog/log_request_broken_lsn_panic.test.lua > create mode 100644 test/xlog/panic_broken_lsn.lua The commit adds some trailing space. They aren't visible in the diff below, because your MUA mangles spaces, but they are visible on the branch with the commit: see `git show 201114199d`, lines 99, 168, 312, 322, 324, 326, 328, 330, 331, 368. Please look through your patches before submitting them to make sure it doesn't happen. Adding [color] ui = true to your gitconfig might be helpful. Regarding the patch itself. After it is applied, vclock_follow() is only used in mp_decode_vclock(). The latter can't panic on invalid LSN order so you could use vclock_follow_unsafe() there as well. That said, you only need two functions: 1. Plain vclock_follow(). This one doesn't need the panic(). A simple assertion will do. No need in '_unsafe' suffix. 2. A function that takes an xrow instead of replica_id/lsn. Let's call it vclock_follow_xrow(), not vice versa as in your patch, because 'xrow' is a mere argument here, not a subject/object. This function should be a wrapper around vclock_follow() which checks LSN order before calling it. I guess it could be defined in a header, because it's going to be pretty short. Also, this function doesn't need to take replica_id/lsn explicitly, as it can use the corresponding members of the xrow_header struct. I think we should use this function everywhere where we have an xrow, even if the order can't be broken due to some extra checks (this wouldn't make the code any slower than it is now, because vclock_follow(), which is currently used, always does the check). Regarding the test. It's too complex IMHO. And the most of the complexity comes from the code that generates a broken WAL. May be, we could simplify it by using error injection to generate a broken WAL instead (see src/errinj.c)? Also, the test only checks the case of broken LSN order on recovery while this feature is primarily needed for the case of broken replication (because on local recovery we can use `tarantoolctl cat` to find a broken record). Do you think if it's possible to test vclock_follow_xrow() in the applier code as well?