From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
To: Serge Petrenko, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Fri, 12 Feb 2021 22:48:49 +0100
Subject: Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
In-Reply-To: <20210212112541.27561-1-sergepetrenko@tarantool.org>
References: <20210212112541.27561-1-sergepetrenko@tarantool.org>

Hi! Thanks for the patch!

On 12.02.2021 12:25, Serge Petrenko via Tarantool-patches wrote:
> While sending a WAL, relay only yields in `coio_write_xrow`, once it
> sees the socket isn't ready for writes.
> It may happen that the socket is always ready for a long period of time,
> and relay doesn't yield at all while recovering a whole .xlog file. This
> may take well more than a minute.
> During this period of time, relay doesn't read replica's ACKs due to
> relay reader fiber not being scheduled, and once the reader is finally
> live it times out immediately, causing the replica to reconnect.
>
> The problem is amplified by the fact that replica waits for
> replication_timeout to pass prior to reconnecting, which lets master
> pile up even more ready WALs, and effectively making it impossible for
> the replica to sync.

I couldn't understand this part. Why is it bad? Yeah, replica waits,
but replica is applier, on another instance. How is it related? And
relay_reader does not send anything. So why is it bad?

Couldn't the problem be fixed by reading all the non-consumed
data after reading WAL?

The current solution also looks fine. Maybe even better because
it becomes consistent with local recovery. However I still want to
understand this part about replica.

> To fix the problem let's yield explicitly in relay_send_row every
> WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
> serves the same purpose: to not block the event loop for too long.
>
> Closes #5762
> ---
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index df04f8198..afc57dfbc 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>  {
>  	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>
> +	static uint64_t row_cnt = 0;

Relays are in threads. So this variable either should be
thread-local, or be in struct relay.
Otherwise you get non-atomic updates which may lead to some
increments disappearing.

Given that thread-local variable access is not free, I would
go for having it in struct relay, but up to you.

>  	packet->sync = relay->sync;
>  	relay->last_row_time = ev_monotonic_now(loop());
>  	coio_write_xrow(&relay->io, packet);
>  	fiber_gc();
>
> +	/*
> +	 * It may happen that the socket is always ready for write, so yield
> +	 * explicitly every now and then to not block the event loop.
> +	 */
> +	row_cnt++;
> +	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
> +		fiber_sleep(0);
> +	}

Maybe better drop {} as the if's body is just one line.
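
To illustrate the two comments above (counter in struct relay, no
braces around a one-line body), here is a rough standalone sketch.
It is not the real relay code: relay_sketch, relay_sketch_row_sent
and ROWS_PER_YIELD_SKETCH are hypothetical stand-ins, the constant's
value is arbitrary, and yield_cnt exists only to make the behaviour
observable. In the patch the caller would do fiber_sleep(0) where the
function returns true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for WAL_ROWS_PER_YIELD; the value is illustrative.
constexpr uint64_t ROWS_PER_YIELD_SKETCH = 4;

// Hypothetical, trimmed-down stand-in for struct relay.
struct relay_sketch {
	// Rows sent by this relay; touched only by the relay's own thread.
	uint64_t row_cnt = 0;
	// How many times a yield was requested (demonstration only).
	uint64_t yield_cnt = 0;
};

// Mirrors the tail of relay_send(): count the row and tell the caller
// whether it is time to yield.
bool
relay_sketch_row_sent(relay_sketch *relay)
{
	assert(relay != nullptr);
	if (++relay->row_cnt % ROWS_PER_YIELD_SKETCH != 0)
		return false;
	relay->yield_cnt++;
	return true;
}
```

Since each relay owns its counter, two relays running in different
threads never update the same memory, so no atomics or thread-local
storage are needed for this.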