[Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn

Serge Petrenko sergepetrenko at tarantool.org
Tue Feb 18 20:28:02 MSK 2020


Hi! Thanks for your review!

Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.

> 16 февр. 2020 г., в 19:15, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> написал(а):
> 
> Hi! Thanks for the patch! I will review other commits when
> Kostja is fine with them.
> 
> Since he finished with this one, here is my nit: lets keep
> assertion for the debug build. If not this assertion, we
> probably wouldn't notice this bug, and may miss future bugs
> without it. During running the tests we won't notice a
> warning.
> 
> Also we probably should not call vclock_follow() at all, if
> lsn is broken. Just keep it as it. It does not look right to

Ok

> decrease it. And make it panic() in vclock_follow() to catch
> other bugs related to it.

Kostja was against panicking on the previous review iteration,
the assertion is still in `vclock_follow()`

> 
> On 13/02/2020 22:52, sergepetrenko wrote:
>> From: Serge Petrenko <sergepetrenko at tarantool.org>
>> 
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attemt to write a record with a duplicate or otherwise broken lsn.
>> 
>> Follow-up #4739
>> ---
>> src/box/wal.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 0ae66ff32..f8ee2b7d8 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>> 			(*row)->tsn = tsn;
>> 			(*row)->is_commit = row == end - 1;
>> 		} else {
>> -			vclock_follow(vclock_diff, (*row)->replica_id,
>> -				      (*row)->lsn - vclock_get(base,
>> -							       (*row)->replica_id));
>> +			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>> +			if (diff <= vclock_get(vclock_diff,
>> +					       (*row)->replica_id)) {
>> +				say_crit("Attempt to write a broken LSN to WAL:"
>> +					 " replica id: %d, committed lsn: %d,"
>> +					 " new lsn %d", (*row)->replica_id,
>> +					 vclock_get(base, (*row)->replica_id) +
>> +					 vclock_get(vclock_diff,
>> +						    (*row)->replica_id),
>> +						    (*row)->lsn);
>> +			}
>> +			vclock_follow(vclock_diff, (*row)->replica_id, diff);
> 
> On the summary, lets call follow() in 'else' branch, and add unreachable()
> after crit log.


I believe `unreachable` doesn’t fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that «shouldn’t happen».
I’ve even seen a ticket in our repo regarding this misuse.
Let’s just leave an `assert(0)` in this branch, unreachable is defined like that anyway.

Here are my changes:

diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			if (diff <= vclock_get(vclock_diff,
 					       (*row)->replica_id)) {
 				say_crit("Attempt to write a broken LSN to WAL:"
-					 " replica id: %d, committed lsn: %d,"
+					 " replica id: %d, confirmed lsn: %d,"
 					 " new lsn %d", (*row)->replica_id,
 					 vclock_get(base, (*row)->replica_id) +
 					 vclock_get(vclock_diff,
 						    (*row)->replica_id),
 						    (*row)->lsn);
+				assert(0);
+			} else {
+				vclock_follow(vclock_diff, (*row)->replica_id, diff);
 			}
-			vclock_follow(vclock_diff, (*row)->replica_id, diff);
 		}
 	}
 }

--
Serge Petrenko
sergepetrenko at tarantool.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20200218/8c2359e5/attachment.html>


More information about the Tarantool-patches mailing list