To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Date: Fri, 12 Feb 2021 14:25:41 +0300
Message-Id: <20210212112541.27561-1-sergepetrenko@tarantool.org>
Subject: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org

While sending a WAL, relay only yields in `coio_write_xrow`, once it sees
the socket isn't ready for writes. The socket may stay ready for a long
period of time, in which case relay doesn't yield at all while recovering
a whole .xlog file. This may take well over a minute. During this period
relay doesn't read the replica's ACKs because the relay reader fiber is
never scheduled, and once the reader finally runs it times out
immediately, causing the replica to reconnect.

The problem is amplified by the fact that the replica waits for
replication_timeout to pass before reconnecting, which lets the master
pile up even more ready WALs, effectively making it impossible for the
replica to sync.

To fix the problem let's yield explicitly in relay_send_row every
WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
serves the same purpose: to not block the event loop for too long.

Closes #5762
---
No test provided as the fix is quite obvious but rather hard to test
automatically.
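
For illustration only, here is a minimal standalone sketch of the
"yield every N rows" pattern described above. It is not Tarantool code:
RowSender, kRowsPerYield and count_yields are hypothetical names, the
yield callback stands in for fiber_sleep(0), and the value 4 is an
arbitrary placeholder for WAL_ROWS_PER_YIELD. The counter is kept per
sender here to keep the sketch self-contained, whereas the patch uses a
function-local static.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical placeholder for WAL_ROWS_PER_YIELD; the real constant is
// defined in the Tarantool sources and its value is not reproduced here.
constexpr std::uint64_t kRowsPerYield = 4;

// Hypothetical sender that counts rows and calls yield_fn once every
// kRowsPerYield rows, mimicking the explicit fiber_sleep(0) in the patch.
class RowSender {
public:
	explicit RowSender(std::function<void()> yield_fn)
		: yield_fn_(std::move(yield_fn))
	{
		assert(yield_fn_ && "a yield callback is required");
	}

	void send_row()
	{
		// A real sender would write the row to the socket here. The
		// write is assumed to never block, which is the problematic
		// case from the commit message.
		++row_cnt_;
		if (row_cnt_ % kRowsPerYield == 0)
			yield_fn_();
	}

private:
	std::function<void()> yield_fn_;
	std::uint64_t row_cnt_ = 0;
};

// Sends `rows` rows through a RowSender and returns how many times the
// sender yielded along the way.
inline std::uint64_t count_yields(std::uint64_t rows)
{
	std::uint64_t yields = 0;
	RowSender sender([&yields] { ++yields; });
	for (std::uint64_t i = 0; i < rows; ++i)
		sender.send_row();
	return yields;
}
```

With these placeholder values, count_yields(10) lets other work run
twice during the send loop, while count_yields(3) never yields. A larger
N means fewer context switches but longer stretches during which other
fibers, such as the relay reader, cannot run.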
https://github.com/tarantool/tarantool/issues/5762
https://github.com/tarantool/tarantool/tree/sp/gh-5762-relay-yield

 src/box/relay.cc | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index df04f8198..afc57dfbc 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
 {
 	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
 
+	static uint64_t row_cnt = 0;
 	packet->sync = relay->sync;
 	relay->last_row_time = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
+	/*
+	 * It may happen that the socket is always ready for write, so yield
+	 * explicitly every now and then to not block the event loop.
+	 */
+	row_cnt++;
+	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
+		fiber_sleep(0);
+	}
 	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
-- 
2.24.3 (Apple Git-128)