From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 17F216EC55; Wed, 14 Jul 2021 00:49:57 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 17F216EC55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1626212997; bh=uR5lINjiAIWB4B3H26v0mCE86Xt5wRakO3NugZV9VqU=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=pZ1J7bGLtbS5k7OmPaLEQufgf/O2hnkyWsUzrsYsaEmwsHpFBHoh9qP5daxI/XhJn IqJ7o2/Sy3eb3AuR2FxeP2kKFfaCcJGJ55KDl4X4sRvc33v3fQPZbe7NNTbvlseCqE plzyTGZkZlYqfiAuZdvkrozIBIC2ymWkUZRYwdTE= Received: from smtpng3.i.mail.ru (smtpng3.i.mail.ru [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2B2316EC55 for ; Wed, 14 Jul 2021 00:49:55 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 2B2316EC55 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1m3QHq-0008FU-8M; Wed, 14 Jul 2021 00:49:54 +0300 To: Serge Petrenko , gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org References: <20210709074048.18169-1-sergepetrenko@tarantool.org> Message-ID: <9cec77da-70b3-e228-f092-2df40609d9f3@tarantool.org> Date: Tue, 13 Jul 2021 23:49:53 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210709074048.18169-1-sergepetrenko@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD97BB0EF39AD2B33D5CFD6F66580F08A9E8DA110284E1A7113182A05F538085040C7E80BCD5313A4169B32975544D82AF52137E040689F63AB95BBFC7145EDB60A X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE751BC6685BC61E6BCEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637D81244D2CF6B2D5F8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8301DCDAFAFFE71994A465AF2E435F4DC117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8B974A882099E279BDA471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7CB24F08513AFFAC7943847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 0D63561A33F958A5EC2494124B137F4FBB81E197AEB630ACE2AD85F630058B56D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA753753CEE10E4ED4A7410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34D8FD11F74BDD6E8DD1805A1AB64796C17A638E833153D313B9A37C271290DE1E7591543049AEFC881D7E09C32AA3244C764CD1D9350DEA2066474C8F768ED895853296C06374E602FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojDdSFIg49M1RO7KQZDBl7Tg== X-Mailru-Sender: 689FA8AB762F7393C37E3C1AEC41BA5D5C3CEA90A936027CD49535E6A12D7B383841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E267EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH] replication: stop pushing TimedOut error to the replica X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Good job on the patch! Especially on the explanation in the commit message. > diff --git a/src/box/xrow.c b/src/box/xrow.c > index 16cb2484c..c4647c6cb 100644 > --- a/src/box/xrow.c > +++ b/src/box/xrow.c > @@ -549,6 +550,10 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version, > > size_t region_svp = region_used(region); > mpstream_iproto_encode_error(&stream, e); > + struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE, > + ERRINJ_INT); > + if (inj != NULL && inj->iparam > 0) > + mpstream_memset(&stream, 'E', inj->iparam); Error injections are supposed to simulate something what can happen in reality. Otherwise they might test an impossible case, like here - the error object can't be corrupted. If I understood correctly, this way you are trying to prove that iproto_write_error() is not called at all. Because if it is, you break the data and see it in the test. You could make it panic() with the same result I suppose. It is one way. Another way would be to simulate exactly the happened case - fill the buffer with an incomplete xrow. You tried to do it, from what I see in the test. You create a big tuple which should fill the buffer. But still it didn't help somewhy and you added the invalid error object, right? I managed to remove this injection and still make the test fail 100% without your patch. I also think I found an explanation of why the corrupted error object was needed before. See my comment in the test. > diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result > new file mode 100644 > index 000000000..4348ebd90 > --- /dev/null > +++ b/test/replication/gh-4040-invalid-msgpack.result > @@ -0,0 +1,180 @@ <...> > +-- Generate enough data to fill the sendbuf. > +-- This will make the relay yield in the middle of writing a row waiting for the > +-- socket to become writeable. > +tbl = {1} > + | --- > + | ... > +filler = string.rep('b', 100) > + | --- > + | ... > +for i = 2, 0.7 * bufsize / 100 do\ > + tbl[i] = filler\ The reason why you needed the corrupted error object is that the tuples are not big enough. Consider my machine (I think you should have the same values). I have bufsize = 8192 (8KB). Now consider the tuples you create: each is < 6KB. Each tuple even with all the MessagePack meta fits into the buffer alone. Relay writes the xrows using writev(). On my machine this is what I see when I print its results on your tuples: Written 5744 Written 5744 So the xrows are written one by one, not in parts. I don't know why. Probably in the kernel it does not accept even a part of the buffer if sees it does not fit. So you never had in this test a partially written xrow. I changed it slightly: -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ And removed your ERRINJ_IPROTO_WRITE_ERROR_LARGE injection. Now each tuple is > 8KB. The kernel is forced to write it in parts. This is what I see: Written 8192 Written 102 The total size of one xrow is > 8KB. The kernel allows to write a part of the tuple, and now any error, even a valid one, will corrupt the buffer. That happens if I revert your timeout error ignorance. What I don't get is why did you have 0.7 coefficient in the loop? My full diff: ==================== diff --git a/src/box/xrow.c b/src/box/xrow.c index c4647c6cb..db72437a9 100644 --- a/src/box/xrow.c +++ b/src/box/xrow.c @@ -550,10 +550,6 @@ iproto_write_error(int fd, const struct error *e, uint32_t schema_version, size_t region_svp = region_used(region); mpstream_iproto_encode_error(&stream, e); - struct errinj *inj = errinj(ERRINJ_IPROTO_WRITE_ERROR_LARGE, - ERRINJ_INT); - if (inj != NULL && inj->iparam > 0) - mpstream_memset(&stream, 'E', inj->iparam); mpstream_flush(&stream); if (is_error) goto cleanup; diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h index 0d8ec967d..e49242807 100644 --- a/src/lib/core/errinj.h +++ b/src/lib/core/errinj.h @@ -153,7 +153,6 @@ struct errinj { _(ERRINJ_SNAP_COMMIT_FAIL, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_IPROTO_SINGLE_THREAD_STAT, ERRINJ_INT, {.iparam = -1}) \ _(ERRINJ_IPROTO_WRITE_ERROR_DELAY, ERRINJ_BOOL, {.bparam = false})\ - _(ERRINJ_IPROTO_WRITE_ERROR_LARGE, ERRINJ_INT, {.iparam = -1})\ _(ERRINJ_APPLIER_READ_TX_ROW_DELAY, ERRINJ_BOOL, {.bparam = false})\ ENUM0(errinj_id, ERRINJ_LIST); diff --git a/test/box/errinj.result b/test/box/errinj.result index e5193a3d3..44f86a54e 100644 --- a/test/box/errinj.result +++ b/test/box/errinj.result @@ -61,7 +61,6 @@ evals - ERRINJ_IPROTO_SINGLE_THREAD_STAT: -1 - ERRINJ_IPROTO_TX_DELAY: false - ERRINJ_IPROTO_WRITE_ERROR_DELAY: false - - ERRINJ_IPROTO_WRITE_ERROR_LARGE: -1 - ERRINJ_LOG_ROTATE: false - ERRINJ_MEMTX_DELAY_GC: false - ERRINJ_PORT_DUMP: false diff --git a/test/replication/gh-4040-invalid-msgpack.result b/test/replication/gh-4040-invalid-msgpack.result index 4348ebd90..def47ed3c 100644 --- a/test/replication/gh-4040-invalid-msgpack.result +++ b/test/replication/gh-4040-invalid-msgpack.result @@ -47,9 +47,6 @@ test_run:switch('replica') | - true | ... -box.cfg{log_level=6} - | --- - | ... sign = box.info.signature | --- | ... @@ -81,10 +78,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true) | --- | - ok | ... -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize) - | --- - | - ok - | ... -- Generate enough data to fill the sendbuf. -- This will make the relay yield in the middle of writing a row waiting for the -- socket to become writeable. @@ -94,7 +87,7 @@ tbl = {1} filler = string.rep('b', 100) | --- | ... -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ tbl[i] = filler\ end | --- @@ -136,10 +129,6 @@ box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false) | --- | - ok | ... -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1) - | --- - | - ok - | ... test_run:switch('replica') | --- diff --git a/test/replication/gh-4040-invalid-msgpack.test.lua b/test/replication/gh-4040-invalid-msgpack.test.lua index 0e51affac..f7f1c3953 100644 --- a/test/replication/gh-4040-invalid-msgpack.test.lua +++ b/test/replication/gh-4040-invalid-msgpack.test.lua @@ -18,7 +18,6 @@ _ = box.space.test:create_index('pk') test_run:cmd('start server replica with args="1000"') test_run:switch('replica') -box.cfg{log_level=6} sign = box.info.signature box.error.injection.set('ERRINJ_APPLIER_READ_TX_ROW_DELAY', true) test_run:switch('master') @@ -31,13 +30,12 @@ bufsize = soc:getsockopt('SOL_SOCKET', 'SO_SNDBUF') require('log').info("SO_SNDBUF size is %d", bufsize) -- Master shouldn't try to write the error while the socket isn't writeable. box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', true) -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', bufsize) -- Generate enough data to fill the sendbuf. -- This will make the relay yield in the middle of writing a row waiting for the -- socket to become writeable. tbl = {1} filler = string.rep('b', 100) -for i = 2, 0.7 * bufsize / 100 do\ +for i = 2, bufsize / 100 + 1 do\ tbl[i] = filler\ end for i = 1,10 do\ @@ -57,7 +55,6 @@ test_run:wait_cond(function() return box.info.signature > sign end) test_run:switch('master') box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_DELAY', false) -box.error.injection.set('ERRINJ_IPROTO_WRITE_ERROR_LARGE', -1) test_run:switch('replica')