From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH] replication: stop pushing TimedOut error to the replica Date: Tue, 13 Jul 2021 23:49:53 +0200 [thread overview] Message-ID: <9cec77da-70b3-e228-f092-2df40609d9f3@tarantool.org> (raw) In-Reply-To: <20210709074048.18169-1-sergepetrenko@tarantool.org> Hi! Good job on the patch! Especially on the explanation in the commit message. > diff --git a/src/box/xrow.c b/src/box/xrow.c > index 16cb2484c..c4647c6cb 100644 > --- a/src/box/xrow.c > +++ b/src/box/xrow.c > @@ -549,6 +550,10 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version, > > size_t region_svp = region_used(region); > mpstream_iproto_encode_error(&stream, e); > + struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE, > + ERRINJ_INT); > + if (inj != NULL && inj->iparam > 0) > + mpstream_memset(&stream, 'E', inj->iparam); Error injections are supposed to simulate something what can happen in reality. Otherwise they might test an impossible case, like here - the error object can't be corrupted. If I understood correctly, this way you are trying to prove that iproto_write_error() is not called at all. Because if it is, you break the data and see it in the test. You could make it panic() with the same result I suppose. It is one way. Another way would be to simulate exactly the happened case - fill the buffer with an incomplete xrow. You tried to do it, from what I see in the test. You create a big tuple which should fill the buffer. But still it didn't help somewhy and you added the invalid error object, right? I managed to remove this injection and still make the test fail 100% without your patch. I also think I found an explanation of why the corrupted error object was needed before. See my comment in the test. > diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result > new file mode 100644 > index 000000000..4348ebd90 > --- /dev/null > +++ b/test/replication/gh-4040-invalid-msgpack.result > @@ -0,0 +1,180 @@ <...> > +-- Generate enough data to fill the sendbuf. > +-- This will make the relay yield in the middle of writing a row waiting for the > +-- socket to become writeable. > +tbl = {1} > + | --- > + | ... > +filler = string.rep('b', 100) > + | --- > + | ... > +for i = 2, 0.7 * bufsize / 100 do\ > + tbl[i] = filler\ The reason why you needed the corrupted error object is that the tuples are not big enough. Consider my machine (I think you should have the same values). I have bufsize = 8192 (8KB). Now consider the tuples you create: each is < 6KB. Each tuple even with all the MessagePack meta fits into the buffer alone. Relay writes the xrows using writev(). On my machine this is what I see when I print its results on your tuples: Written 5744 Written 5744 So the xrows are written one by one, not in parts. I don't know why. Probably in the kernel it does not accept even a part of the buffer if sees it does not fit. So you never had in this test a partially written xrow. I changed it slightly: -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ And removed your ERRINJ_IPROTO_WRITE_ERROR_LARGE injection. Now each tuple is > 8KB. The kernel is forced to write it in parts. This is what I see: Written 8192 Written 102 The total size of one xrow is > 8KB. The kernel allows to write a part of the tuple, and now any error, even a valid one, will corrupt the buffer. That happens if I revert your timeout error ignorance. What I don't get is why did you have 0.7 coefficient in the loop? My full diff: ==================== diff --git a/src/box/xrow.c b/src/box/xrow.c index c4647c6cb..db72437a9 100644 --- a/src/box/xrow.c +++ b/src/box/xrow.c @@ -550,10 +550,6 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version, size_t region_svp = region_used(region); mpstream_iproto_encode_error(&stream, e); - struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE, - ERRINJ_INT); - if (inj != NULL && inj->iparam > 0) - mpstream_memset(&stream, 'E', inj->iparam); mpstream_flush(&stream); if (is_error) goto cleanup; diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h index 0d8ec967d..e49242807 100644 --- a/src/lib/core/errinj.h +++ b/src/lib/core/errinj.h @@ -153,7 +153,6 @@ struct errinj { _(ERRINJ_SNAP_COMMIT_FAIL, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_IPROTO_SINGLE_THREAD_STAT, ERRINJ_INT, {.iparam = -1}) \ _(ERRINJ_IPROTO_WRITE_ERROR_DELAY, ERRINJ_BOOL, {.bparam = false})\ - _(ERRINJ_IPROTO_WRITE_ERROR_LARGE, ERRINJ_INT, {.iparam = -1})\ _(ERRINJ_APPLIER_READ_TX_ROW_DELAY, ERRINJ_BOOL, {.bparam = false})\ ENUM0(errinj_id, ERRINJ_LIST); diff --git a/test/box/errinj.result b/test/box/errinj.result index e5193a3d3..44f86a54e 100644 --- a/test/box/errinj.result +++ b/test/box/errinj.result @@ -61,7 +61,6 @@ evals - ERRINJ_IPROTO_SINGLE_THREAD_STAT: -1 - ERRINJ_IPROTO_TX_DELAY: false - ERRINJ_IPROTO_WRITE_ERROR_DELAY: false - - ERRINJ_IPROTO_WRITE_ERROR_LARGE: -1 - ERRINJ_LOG_ROTATE: false - ERRINJ_MEMTX_DELAY_GC: false - ERRINJ_PORT_DUMP: false diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result index 4348ebd90..def47ed3c 100644 --- a/test/replication/gh-4040-invalid-msgpack.result +++ b/test/replication/gh-4040-invalid-msgpack.result @@ -47,9 +47,6 @@ test_run:switch('replica') | - true | ... -box.cfg{log_level=6} - | --- - | ... sign = box.info.signature | --- | ... @@ -81,10 +78,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true) | --- | - ok | ... -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize) - | --- - | - ok - | ... -- Generate enough data to fill the sendbuf. -- This will make the relay yield in the middle of writing a row waiting for the -- socket to become writeable. @@ -94,7 +87,7 @@ tbl = {1} filler = string.rep('b', 100) | --- | ... -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ tbl[i] = filler\ end | --- @@ -136,10 +129,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false) | --- | - ok | ... -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1) - | --- - | - ok - | ... test_run:switch('replica') | --- diff --git a/test/replication/gh-4040-invalid-msgpack.test.lua b/test/replication/gh-4040-invalid-msgpack.test.lua index 0e51affac..f7f1c3953 100644 --- a/test/replication/gh-4040-invalid-msgpack.test.lua +++ b/test/replication/gh-4040-invalid-msgpack.test.lua @@ -18,7 +18,6 @@ _ = box.space.test:create_index('pk') test_run:cmd('start server replica with args="1000"') test_run:switch('replica') -box.cfg{log_level=6} sign = box.info.signature box.error.injection.set('ERRINJ_APPLIER_READ_TX_ROW_DELAY', true) test_run:switch('master') @@ -31,13 +30,12 @@ bufsize = soc:getsockopt('SOL_SOCKET', 'SO_SNDBUF') require('log').info("SO_SNDBUF size is %d", bufsize) -- Master shouldn't try to write the error while the socket isn't writeable. box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true) -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize) -- Generate enough data to fill the sendbuf. -- This will make the relay yield in the middle of writing a row waiting for the -- socket to become writeable. tbl = {1} filler = string.rep('b', 100) -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ tbl[i] = filler\ end for i = 1,10 do\ @@ -57,7 +55,6 @@ test_run:wait_cond(function() return box.info.signature > sign end) test_run:switch('master') box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false) -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1) test_run:switch('replica')
next prev parent reply other threads:[~2021-07-13 21:49 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-09 7:40 Serge Petrenko via Tarantool-patches 2021-07-09 7:59 ` Cyrill Gorcunov via Tarantool-patches 2021-07-09 8:30 ` Serge Petrenko via Tarantool-patches 2021-07-09 10:41 ` Cyrill Gorcunov via Tarantool-patches 2021-07-12 10:14 ` Serge Petrenko via Tarantool-patches 2021-07-12 10:38 ` Cyrill Gorcunov via Tarantool-patches 2021-07-13 21:49 ` Vladislav Shpilevoy via Tarantool-patches [this message] 2021-07-15 10:20 ` Serge Petrenko via Tarantool-patches 2021-07-15 21:04 ` Vladislav Shpilevoy via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=9cec77da-70b3-e228-f092-2df40609d9f3@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH] replication: stop pushing TimedOut error to the replica' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox