[tarantool-patches] [PATCH] xlog: added request details to panic message for broken LSN

Vladimir Davydov vdavydov.dev at gmail.com
Tue Aug 14 19:17:19 MSK 2018


Hello,

The patch below is mangled: tabs have been replaced with spaces.
Please use either `git send-email` or a MUA that doesn't mangle patches.
For more details, please see our guideline:

https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines/#how-to-submit-a-patch-for-review

On Thu, Aug 09, 2018 at 06:02:13PM +0300, Sergei Kalashnikov wrote:
> Aid the debugging of replication issues related to out-of-order requests.
> Adds the details of request/tuple to the diagnostic message whenever possible.

Commit message text width should be < 72 characters

> 
> Fixes: #3105

Should be 'Closes #3105'

Also, the subject line should be in the imperative mood
(xlog: add request details ...)

For more information about how to write a commit message, please see
our guideline:

https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines/#how-to-write-a-commit-message

> ---
>  Ticket:https://github.com/tarantool/tarantool/issues/3105
>  Branch:ztarvos/gh-3105-log-request-broken-lsn

"Ticket:" and "Branch:" prefixes are not necessary. It's clear which
is which without them. Also, the branch should given as a hyper link.
In your case it would be:

https://github.com/tarantool/tarantool/commits/ztarvos/gh-3105-log-request-broken-lsn

> 
>  src/box/applier.cc                              |   8 +-
>  src/box/recovery.cc                             |   4 +-
>  src/box/vclock.c                                |   6 +
>  src/box/vclock.h                                |  22 +++
>  src/box/wal.c                                   |  12 +-
>  src/box/xrow.c                                  |  24 +++
>  src/box/xrow.h                                  |  15 ++
>  test/xlog/log_request_broken_lsn_panic.result   | 200 ++++++++++++++++++++++++
>  test/xlog/log_request_broken_lsn_panic.test.lua | 105 +++++++++++++
>  test/xlog/panic_broken_lsn.lua                  |  11 ++
>  10 files changed, 395 insertions(+), 12 deletions(-)
>  create mode 100644 test/xlog/log_request_broken_lsn_panic.result
>  create mode 100644 test/xlog/log_request_broken_lsn_panic.test.lua
>  create mode 100644 test/xlog/panic_broken_lsn.lua

The commit adds some trailing space. They aren't visible in the diff
below, because your MUA mangles spaces, but they are visible on the
branch with the commit: see `git show 201114199d`, lines 99, 168, 312,
322, 324, 326, 328, 330, 331, 368. Please look through your patches
before submitting them to make sure it doesn't happen. Adding

[color]
	ui = true

to your gitconfig might be helpful.

Regarding the patch itself. After it is applied, vclock_follow() is only
used in mp_decode_vclock(). The latter can't panic on invalid LSN order
so you could use vclock_follow_unsafe() there as well.

That said, you only need two functions:

 1. Plain vclock_follow(). This one doesn't need the panic().
    A simple assertion will do. No need in '_unsafe' suffix.

 2. A function that takes an xrow instead of replica_id/lsn.

    Let's call it vclock_follow_xrow(), not vice versa as in your patch,
    because 'xrow' is a mere argument here, not a subject/object.

    This function should be a wrapper around vclock_follow() which
    checks LSN order before calling it. I guess it could be defined
    in a header, because it's going to be pretty short.
    
    Also, this function doesn't need to take replica_id/lsn explicitly,
    as it can use the corresponding members of the xrow_header struct.

    I think we should use this function everywhere where we have an
    xrow, even if the order can't be broken due to some extra checks
    (this wouldn't make the code any slower than it is now, because
    vclock_follow(), which is currently used, always does the check).

Regarding the test. It's too complex IMHO. And the most of the
complexity comes from the code that generates a broken WAL. May be,
we could simplify it by using error injection to generate a broken
WAL instead (see src/errinj.c)?

Also, the test only checks the case of broken LSN order on recovery
while this feature is primarily needed for the case of broken
replication (because on local recovery we can use `tarantoolctl cat`
to find a broken record). Do you think if it's possible to test
vclock_follow_xrow() in the applier code as well?



More information about the Tarantool-patches mailing list