From mboxrd@z Thu Jan 1 00:00:00 1970
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Wed, 24 Mar 2021 15:24:11 +0300
Subject: [Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Message-Id: <12bf66a77b755eaadc09665ede9fbcde0516a7a4.1616588119.git.sergepetrenko@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
List-Id: Tarantool development patches
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Since the introduction of synchronous replication it became possible for
final join to fail on the master side due to not being able to gather
acks for some tx around _cluster registration. A replica receives an
error in this case: either ER_SYNC_ROLLBACK or ER_SYNC_QUORUM_TIMEOUT.

These errors make the applier retry final join, but with the wrong
state, APPLIER_REGISTER, which should be used only on an anonymous
replica. This led to a hang in the fiber executing box.cfg, because it
waited for the APPLIER_JOINED state, which was never entered.

Part-of #5566
---
 src/box/applier.cc | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 5a88a013e..326cf18d2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -551,7 +551,7 @@ applier_wait_register(struct applier *applier, uint64_t row_count)
 }
 
 static void
-applier_register(struct applier *applier)
+applier_register(struct applier *applier, bool was_anon)
 {
 	/* Send REGISTER request */
 	struct ev_io *coio = &applier->io;
@@ -566,9 +566,16 @@ applier_register(struct applier *applier)
 	row.type = IPROTO_REGISTER;
 	coio_write_xrow(coio, &row);
 
-	applier_set_state(applier, APPLIER_REGISTER);
+	/*
+	 * Register may serve as a retry for final join. Set corresponding
+	 * states to unblock anyone who's waiting for final join to start or
+	 * end.
+	 */
+	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
+					      APPLIER_FINAL_JOIN);
 	applier_wait_register(applier, 0);
-	applier_set_state(applier, APPLIER_REGISTERED);
+	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
+					      APPLIER_JOINED);
 	applier_set_state(applier, APPLIER_READY);
 }
 
@@ -1303,6 +1310,14 @@ applier_f(va_list ap)
 		return -1;
 	session_set_type(session, SESSION_TYPE_APPLIER);
 
+	/*
+	 * The instance saves replication_anon value on bootstrap.
+	 * If a freshly started instance sees it has received
+	 * REPLICASET_UUID but hasn't yet registered, it must be an
+	 * anonymous replica, hence the default value 'true'.
+	 */
+	bool was_anon = true;
+
 	/* Re-connect loop */
 	while (!fiber_is_cancelled()) {
 		try {
@@ -1316,6 +1331,7 @@ applier_f(va_list ap)
 				 * The join will pause the applier
 				 * until WAL is created.
 				 */
+				was_anon = replication_anon;
 				if (replication_anon)
 					applier_fetch_snapshot(applier);
 				else
@@ -1324,11 +1340,10 @@ applier_f(va_list ap)
 			if (instance_id == REPLICA_ID_NIL &&
 			    !replication_anon) {
 				/*
-				 * The instance transitioned
-				 * from anonymous. Register it
-				 * now.
+				 * The instance transitioned from anonymous or
+				 * is retrying final join.
 				 */
-				applier_register(applier);
+				applier_register(applier, was_anon);
 			}
 			applier_subscribe(applier);
 			/*
-- 
2.24.3 (Apple Git-128)
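
For illustration only, here is a tiny standalone model of the state
selection described in the commit message. It is not Tarantool code: the
enum only borrows the state names from the patch, and the helpers
register_states() and reaches() are hypothetical. It shows why a waiter on
APPLIER_JOINED is woken only when the register round is run with
was_anon == false.

```cpp
#include <cassert>
#include <vector>

/* State names borrowed from the patch; the numeric values are arbitrary. */
enum applier_state {
	APPLIER_REGISTER,
	APPLIER_REGISTERED,
	APPLIER_FINAL_JOIN,
	APPLIER_JOINED,
	APPLIER_READY,
};

/*
 * Hypothetical helper: the sequence of states a register round passes
 * through under the patched logic, depending on was_anon.
 */
static std::vector<applier_state>
register_states(bool was_anon)
{
	return {
		was_anon ? APPLIER_REGISTER : APPLIER_FINAL_JOIN,
		was_anon ? APPLIER_REGISTERED : APPLIER_JOINED,
		APPLIER_READY,
	};
}

/* Hypothetical helper: would a waiter for `target` ever be woken? */
static bool
reaches(const std::vector<applier_state> &seq, applier_state target)
{
	for (applier_state s : seq) {
		if (s == target)
			return true;
	}
	return false;
}
```

In this model, reaches(register_states(false), APPLIER_JOINED) is true,
while the same check with was_anon == true is false, which corresponds to
the pre-patch behaviour where the box.cfg fiber kept waiting.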