[Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn
Serge Petrenko
sergepetrenko at tarantool.org
Tue Feb 18 20:28:02 MSK 2020
Hi! Thanks for your review!
Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.
> On 16 Feb 2020, at 19:15, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> wrote:
>
> Hi! Thanks for the patch! I will review other commits when
> Kostja is fine with them.
>
> Since he finished with this one, here is my nit: let's keep
> the assertion for the debug build. If not for this assertion, we
> probably wouldn't have noticed this bug, and may miss future bugs
> without it. While running the tests we won't notice a
> warning.
>
> Also we probably should not call vclock_follow() at all, if
> lsn is broken. Just keep it as it is. It does not look right to
> decrease it.
Ok
> And make it panic() in vclock_follow() to catch
> other bugs related to it.
Kostja was against panicking on the previous review iteration;
the assertion is still in `vclock_follow()`.
>
> On 13/02/2020 22:52, sergepetrenko wrote:
>> From: Serge Petrenko <sergepetrenko at tarantool.org>
>>
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attempt to write a record with a duplicate or otherwise broken lsn.
>>
>> Follow-up #4739
>> ---
>> src/box/wal.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 0ae66ff32..f8ee2b7d8 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>> (*row)->tsn = tsn;
>> (*row)->is_commit = row == end - 1;
>> } else {
>> - vclock_follow(vclock_diff, (*row)->replica_id,
>> - (*row)->lsn - vclock_get(base,
>> - (*row)->replica_id));
>> + int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>> + if (diff <= vclock_get(vclock_diff,
>> + (*row)->replica_id)) {
>> + say_crit("Attempt to write a broken LSN to WAL:"
>> + " replica id: %d, committed lsn: %d,"
>> + " new lsn %d", (*row)->replica_id,
>> + vclock_get(base, (*row)->replica_id) +
>> + vclock_get(vclock_diff,
>> + (*row)->replica_id),
>> + (*row)->lsn);
>> + }
>> + vclock_follow(vclock_diff, (*row)->replica_id, diff);
>
> In summary, let's call follow() in the 'else' branch, and add unreachable()
> after the crit log.
I believe `unreachable` doesn’t fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that "shouldn’t happen".
I’ve even seen a ticket in our repo regarding this misuse.
Let’s just leave an `assert(0)` in this branch; `unreachable` is defined like that anyway.
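For illustration, the warn-then-assert pattern discussed above could look roughly like the standalone sketch below. The function name `sketch_check_lsn` and the plain `long long` arguments are hypothetical and not part of the patch; the point is only that `assert(0)` aborts a debug build, while a build with `NDEBUG` defined compiles it out and falls through to the error return after logging.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not from the patch: report a non-increasing LSN.
 * Returns 0 if new_lsn is strictly greater than confirmed_lsn, -1 otherwise.
 */
static int
sketch_check_lsn(long long confirmed_lsn, long long new_lsn)
{
	if (new_lsn <= confirmed_lsn) {
		/* Always log, so release builds leave a trace. */
		fprintf(stderr, "broken LSN: confirmed lsn: %lld, new lsn %lld\n",
			confirmed_lsn, new_lsn);
		/* Debug builds abort here; with NDEBUG this expands to nothing. */
		assert(0);
		/* Reached only when assertions are compiled out. */
		return -1;
	}
	return 0;
}
```

Unlike a compiler-level unreachable hint, this keeps the branch well defined when assertions are disabled: the caller still gets the log line and an error code.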
Here are my changes:
diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
if (diff <= vclock_get(vclock_diff,
(*row)->replica_id)) {
say_crit("Attempt to write a broken LSN to WAL:"
- " replica id: %d, committed lsn: %d,"
+ " replica id: %d, confirmed lsn: %d,"
" new lsn %d", (*row)->replica_id,
vclock_get(base, (*row)->replica_id) +
vclock_get(vclock_diff,
(*row)->replica_id),
(*row)->lsn);
+ assert(0);
+ } else {
+ vclock_follow(vclock_diff, (*row)->replica_id, diff);
}
- vclock_follow(vclock_diff, (*row)->replica_id, diff);
}
}
}
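As a rough standalone sketch of the check after this change (not the actual `wal.c` code): `struct sketch_vclock`, `SKETCH_VCLOCK_MAX` and `sketch_follow_row` are hypothetical stand-ins for the real vclock API, the replica id is assumed to be in range, and the `assert(0)` is left out so that the broken path can be exercised. The sketch's LSNs are `int64_t`, so it prints them with `PRId64`.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { SKETCH_VCLOCK_MAX = 32 };

/* Hypothetical stand-in for struct vclock: one LSN per replica id. */
struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/*
 * Advance vclock_diff for the row's replica only if the row LSN is
 * strictly greater than base + vclock_diff. Otherwise log the problem
 * and leave vclock_diff untouched. Returns true if the diff was advanced.
 */
static bool
sketch_follow_row(struct sketch_vclock *vclock_diff,
		  const struct sketch_vclock *base,
		  uint32_t replica_id, int64_t row_lsn)
{
	int64_t diff = row_lsn - base->lsn[replica_id];
	if (diff <= vclock_diff->lsn[replica_id]) {
		fprintf(stderr, "Attempt to write a broken LSN to WAL: "
			"replica id: %" PRIu32 ", confirmed lsn: %" PRId64
			", new lsn %" PRId64 "\n", replica_id,
			base->lsn[replica_id] + vclock_diff->lsn[replica_id],
			row_lsn);
		return false;
	}
	vclock_diff->lsn[replica_id] = diff;
	return true;
}
```

With a base LSN of 10 for replica 1, a row with LSN 11 sets the diff to 1; a second row with LSN 11 is reported and the diff stays at 1.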
--
Serge Petrenko
sergepetrenko at tarantool.org