[Tarantool-patches] [PATCH] replication: stop pushing TimedOut error to the replica

Serge Petrenko sergepetrenko at tarantool.org
Thu Jul 15 13:20:52 MSK 2021



14.07.2021 00:49, Vladislav Shpilevoy пишет:
> Hi! Good job on the patch! Especially on the
> explanation in the commit message.

Hi! Thanks for the review!

TLDR: applied your diff.

Please find my answers below.

>
>> diff --git a/src/box/xrow.c b/src/box/xrow.c
>> index 16cb2484c..c4647c6cb 100644
>> --- a/src/box/xrow.c
>> +++ b/src/box/xrow.c
>> @@ -549,6 +550,10 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
>>   
>>   	size_t region_svp = region_used(region);
>>   	mpstream_iproto_encode_error(&stream, e);
>> +	struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE,
>> +				    ERRINJ_INT);
>> +	if (inj != NULL && inj->iparam > 0)
>> +		mpstream_memset(&stream, 'E', inj->iparam);
> Error injections are supposed to simulate something what can happen
> in reality. Otherwise they might test an impossible case, like here -
> the error object can't be corrupted. If I understood correctly, this

I agree, but I couldn't find a better way to reproduce the issue.

> way you are trying to prove that iproto_write_error() is not called
> at all. Because if it is, you break the data and see it in the test.
> You could make it panic() with the same result I suppose. It is one way.

What I'm trying to do is to achieve the exact same error.

I am trying to write an incomplete row, indeed, but that's not enough to 
trigger
the error: once replica receives an incomplete row, it just exits with
"Unexpected EOF when reading from socket", which is recoverable.

In order to receive ER_INVALID_MSGPACK, not only we need to write an 
incomplete row,
we also need len(incomplete row) + len(iproto error) >= len(original row)

This way length check passes and replication is broken on 
ER_INVALID_MSGPACK,
because obviously you can't have an error object in the middle of other 
row body.

Unfortunately I could only achieve writing an incomplete row with 
relatively large rows
(60 kb in one case and ~ 8 kb in the other). The row is usually split in 
half, so even
while the problem is here (replica receives an iproto error right in the 
middle of another row),
it can't be tested, because all we see is "Unexpected EOF". (8kb / 2 + 
180  < 8kb)

I tried to turn off length checks on the replica with an error 
injection, but stumbled upon a
couple of problems. So I decided to make the error object huge, just so 
it would pass
the length check when written after the partial row.

This way the exact error we need is triggered, ER_INVALID_MSGPACK. And 
it's triggered
not because iproto error contains garbage, but because the row is padded 
with an
iproto error. I checked with verbose logs, and the broken row had the 
beginning of a normal
row, and an iproto error instead of the rest of the row body.
> Another way would be to simulate exactly the happened case - fill the
> buffer with an incomplete xrow.
>
> You tried to do it, from what I see in the test. You create a big
> tuple which should fill the buffer.

>
> But still it didn't help somewhy and you added the invalid error
> object, right?

Yes, as described above.

>
> I managed to remove this injection and still make the test fail 100%
> without your patch. I also think I found an explanation of why the
> corrupted error object was needed before.

>
> See my comment in the test.
>
>> diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result
>> new file mode 100644
>> index 000000000..4348ebd90
>> --- /dev/null
>> +++ b/test/replication/gh-4040-invalid-msgpack.result
>> @@ -0,0 +1,180 @@
> <...>
>
>> +-- Generate enough data to fill the sendbuf.
>> +-- This will make the relay yield in the middle of writing a row waiting for the
>> +-- socket to become writeable.
>> +tbl = {1}
>> + | ---
>> + | ...
>> +filler = string.rep('b', 100)
>> + | ---
>> + | ...
>> +for i = 2, 0.7 * bufsize / 100 do\
>> +    tbl[i] = filler\
> The reason why you needed the corrupted error object is that the
> tuples are not big enough.
>
> Consider my machine (I think you should have the same values). I
> have bufsize = 8192 (8KB). Now consider the tuples you create: each
> is < 6KB. Each tuple even with all the MessagePack meta fits into
> the buffer alone.

My buffer size is 8kb as well, yes.

>
> Relay writes the xrows using writev(). On my machine this is what I
> see when I print its results on your tuples:
>
> 	Written 5744
> 	Written 5744
>
> So the xrows are written one by one, not in parts. I don't know why.
> Probably in the kernel it does not accept even a part of the buffer
> if sees it does not fit. So you never had in this test a partially
> written xrow.
That's strange. I did have partially written rows. And I could only 
reproduce the error
reliably only with row size 0.7 * bufsize.

I checked the logs (there's a hex dump of the row in verbose mode),
the error is due to partially written rows, not due to 
IPROTO_WRITE_ERROR_LARGE.


>
> I changed it slightly:
>
> 	-for i = 2, 0.7 * bufsize / 100 do\
> 	+for i = 2, bufsize / 100 + 1 do\
>
> And removed your ERRINJ_IPROTO_WRITE_ERROR_LARGE injection. Now
> each tuple is > 8KB. The kernel is forced to write it in parts.
> This is what I see:
>
> 	Written 8192
> 	Written 102
>
> The total size of one xrow is > 8KB. The kernel allows to write a
> part of the tuple, and now any error, even a valid one, will corrupt
> the buffer. That happens if I revert your timeout error ignorance.
>
> What I don't get is why did you have 0.7 coefficient in the loop?

With other row lengths the reproducer was not stable,
Tarantool didn't fail every time. It failed once in a couple of runs.

And again, I know the large error errinj looks strange, but it doesn't
make the test fail due to the errinj. The test fails due to partially 
written rows.

>
> My full diff:

Cool! I tried your version and it does exactly what's needed. Applied.

I don't know why that didn't work out when I first tried this approach.
I guess you really need row_size ~ buf size,
this way the part of the row that doesn't fit is small (< iproto error 
size),
and normal iproto error on top is enough to trigger the error.

Thanks for the help!

>
> ====================
> diff --git a/src/box/xrow.c b/src/box/xrow.c
> index c4647c6cb..db72437a9 100644
> --- a/src/box/xrow.c
> +++ b/src/box/xrow.c
> @@ -550,10 +550,6 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
>   
>   	size_t region_svp = region_used(region);
>   	mpstream_iproto_encode_error(&stream, e);
> -	struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE,
> -				    ERRINJ_INT);
> -	if (inj != NULL && inj->iparam > 0)
> -		mpstream_memset(&stream, 'E', inj->iparam);
>   	mpstream_flush(&stream);
>   	if (is_error)
>   		goto cleanup;
> diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
> index 0d8ec967d..e49242807 100644
> --- a/src/lib/core/errinj.h
> +++ b/src/lib/core/errinj.h
> @@ -153,7 +153,6 @@ struct errinj {
>   	_(ERRINJ_SNAP_COMMIT_FAIL, ERRINJ_BOOL, {.bparam = false}) \
>   	_(ERRINJ_IPROTO_SINGLE_THREAD_STAT, ERRINJ_INT, {.iparam = -1}) \
>   	_(ERRINJ_IPROTO_WRITE_ERROR_DELAY, ERRINJ_BOOL, {.bparam = false})\
> -	_(ERRINJ_IPROTO_WRITE_ERROR_LARGE, ERRINJ_INT, {.iparam = -1})\
>   	_(ERRINJ_APPLIER_READ_TX_ROW_DELAY, ERRINJ_BOOL, {.bparam = false})\
>   
>   ENUM0(errinj_id, ERRINJ_LIST);
> diff --git a/test/box/errinj.result b/test/box/errinj.result
> index e5193a3d3..44f86a54e 100644
> --- a/test/box/errinj.result
> +++ b/test/box/errinj.result
> @@ -61,7 +61,6 @@ evals
>     - ERRINJ_IPROTO_SINGLE_THREAD_STAT: -1
>     - ERRINJ_IPROTO_TX_DELAY: false
>     - ERRINJ_IPROTO_WRITE_ERROR_DELAY: false
> -  - ERRINJ_IPROTO_WRITE_ERROR_LARGE: -1
>     - ERRINJ_LOG_ROTATE: false
>     - ERRINJ_MEMTX_DELAY_GC: false
>     - ERRINJ_PORT_DUMP: false
> diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result
> index 4348ebd90..def47ed3c 100644
> --- a/test/replication/gh-4040-invalid-msgpack.result
> +++ b/test/replication/gh-4040-invalid-msgpack.result
> @@ -47,9 +47,6 @@ test_run:switch('replica')
>    | - true
>    | ...
>   
> -box.cfg{log_level=6}
> - | ---
> - | ...
>   sign = box.info.signature
>    | ---
>    | ...
> @@ -81,10 +78,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true)
>    | ---
>    | - ok
>    | ...
> -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize)
> - | ---
> - | - ok
> - | ...
>   -- Generate enough data to fill the sendbuf.
>   -- This will make the relay yield in the middle of writing a row waiting for the
>   -- socket to become writeable.
> @@ -94,7 +87,7 @@ tbl = {1}
>   filler = string.rep('b', 100)
>    | ---
>    | ...
> -for i = 2, 0.7 * bufsize / 100 do\
> +for i = 2, bufsize / 100 + 1 do\
>       tbl[i] = filler\
>   end
>    | ---
> @@ -136,10 +129,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false)
>    | ---
>    | - ok
>    | ...
> -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1)
> - | ---
> - | - ok
> - | ...
>   
>   test_run:switch('replica')
>    | ---
> diff --git a/test/replication/gh-4040-invalid-msgpack.test.lua b/test/replication/gh-4040-invalid-msgpack.test.lua
> index 0e51affac..f7f1c3953 100644
> --- a/test/replication/gh-4040-invalid-msgpack.test.lua
> +++ b/test/replication/gh-4040-invalid-msgpack.test.lua
> @@ -18,7 +18,6 @@ _ = box.space.test:create_index('pk')
>   test_run:cmd('start server replica with args="1000"')
>   test_run:switch('replica')
>   
> -box.cfg{log_level=6}
>   sign = box.info.signature
>   box.error.injection.set('ERRINJ_APPLIER_READ_TX_ROW_DELAY', true)
>   test_run:switch('master')
> @@ -31,13 +30,12 @@ bufsize = soc:getsockopt('SOL_SOCKET', 'SO_SNDBUF')
>   require('log').info("SO_SNDBUF size is %d", bufsize)
>   -- Master shouldn't try to write the error while the socket isn't writeable.
>   box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true)
> -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize)
>   -- Generate enough data to fill the sendbuf.
>   -- This will make the relay yield in the middle of writing a row waiting for the
>   -- socket to become writeable.
>   tbl = {1}
>   filler = string.rep('b', 100)
> -for i = 2, 0.7 * bufsize / 100 do\
> +for i = 2, bufsize / 100 + 1 do\
>       tbl[i] = filler\
>   end
>   for i = 1,10 do\
> @@ -57,7 +55,6 @@ test_run:wait_cond(function() return box.info.signature > sign end)
>   
>   test_run:switch('master')
>   box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false)
> -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1)
>   
>   test_run:switch('replica')
>   

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list