Hi! Thanks for your review!

Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.

16 февр. 2020 г., в 19:15, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):

Hi! Thanks for the patch! I will review other commits when
Kostja is fine with them.

Since he finished with this one, here is my nit: lets keep
assertion for the debug build. If not this assertion, we
probably wouldn't notice this bug, and may miss future bugs
without it. During running the tests we won't notice a
warning.

Also we probably should not call vclock_follow() at all, if
lsn is broken. Just keep it as it. It does not look right to

Ok

decrease it. And make it panic() in vclock_follow() to catch
other bugs related to it.

Kostja was against panicking on the previous review iteration,
the assertion is still in `vclock_follow()`


On 13/02/2020 22:52, sergepetrenko wrote:
From: Serge Petrenko <sergepetrenko@tarantool.org>

There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attemt to write a record with a duplicate or otherwise broken lsn.

Follow-up #4739
---
src/box/wal.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/box/wal.c b/src/box/wal.c
index 0ae66ff32..f8ee2b7d8 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
(*row)->tsn = tsn;
(*row)->is_commit = row == end - 1;
} else {
- vclock_follow(vclock_diff, (*row)->replica_id,
-       (*row)->lsn - vclock_get(base,
-        (*row)->replica_id));
+ int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
+ if (diff <= vclock_get(vclock_diff,
+        (*row)->replica_id)) {
+ say_crit("Attempt to write a broken LSN to WAL:"
+  " replica id: %d, committed lsn: %d,"
+  " new lsn %d", (*row)->replica_id,
+  vclock_get(base, (*row)->replica_id) +
+  vclock_get(vclock_diff,
+     (*row)->replica_id),
+     (*row)->lsn);
+ }
+ vclock_follow(vclock_diff, (*row)->replica_id, diff);

On the summary, lets call follow() in 'else' branch, and add unreachable()
after crit log.

I believe `unreachable` doesn’t fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that «shouldn’t happen».
I’ve even seen a ticket in our repo regarding this misuse.
Let’s just leave an `assert(0)` in this branch, unreachable is defined like that anyway.

Here are my changes:

diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
  if (diff <= vclock_get(vclock_diff,
         (*row)->replica_id)) {
  say_crit("Attempt to write a broken LSN to WAL:"
-  " replica id: %d, committed lsn: %d,"
+  " replica id: %d, confirmed lsn: %d,"
   " new lsn %d", (*row)->replica_id,
   vclock_get(base, (*row)->replica_id) +
   vclock_get(vclock_diff,
      (*row)->replica_id),
      (*row)->lsn);
+ assert(0);
+ } else {
+ vclock_follow(vclock_diff, (*row)->replica_id, diff);
  }
- vclock_follow(vclock_diff, (*row)->replica_id, diff);
  }
  }
 }

--
Serge Petrenko
sergepetrenko@tarantool.org