To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Message-ID: <5cbaefa9-078a-b00c-2aec-75cf01f732d4@tarantool.org>
Date: Sun, 18 Apr 2021 15:00:43 +0300
Subject: [Tarantool-patches] [PATCH v4 13/12] replication: send accumulated Raft messages after relay start
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

It may happen that a Raft leader fails to send a broadcast to a freshly
connected follower. Here's what happens: a follower subscribes to a
candidate during ongoing elections. box_process_subscribe() sends out the
current node's Raft state, which is "candidate". Suppose a relay from the
follower to the candidate is already set up. The follower immediately
responds to the vote request, which makes the candidate become the leader.
But the candidate's relay is not yet ready to process Raft messages, so the
is_leader message from the candidate gets rejected. Once the relay starts,
it relays all the xlogs, but the follower rejects all the data because it
hasn't received the is_leader notification from the candidate.

Fix this by remembering the last Raft broadcast rejected by the relay and
sending it as soon as the relay starts dispatching Raft messages.

Follow-up #5445
---
Hey, guys, take a look please.
This fixes the flaky replication/gh-5445-leader-inconsistency test and
should probably fix replication/election_qsync_stress as well.

 src/box/relay.cc | 79 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 63 insertions(+), 16 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 7be33ee31..9fdd02bc1 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -160,6 +160,16 @@ struct relay {
          * anonymous replica, for example.
          */
         bool is_raft_enabled;
+        /** Is set to true by the first Raft broadcast which comes while
+         * the relay is not yet ready to dispatch Raft messages.
+         */
+        bool has_pending_broadcast;
+        /**
+         * A Raft broadcast which should be pushed once relay notifies
+         * tx it needs Raft updates. Otherwise this message would be
+         * lost until some new Raft event happens.
+         */
+        struct raft_request pending_broadcast;
     } tx;
 };
 
@@ -626,6 +636,10 @@ struct relay_is_raft_enabled_msg {
     bool value;
     /** Flag to wait for the flag being set, in a relay thread. */
     bool is_finished;
+    /** Whether this message carries a pending raft broadcast to relay. */
+    bool has_pending_broadcast;
+    /** The raft request relay should send upon this message's return. */
+    struct raft_request req;
 };
 
 /** TX thread part of the Raft flag setting, first hop. */
@@ -635,14 +649,28 @@ tx_set_is_raft_enabled(struct cmsg *base)
     struct relay_is_raft_enabled_msg *msg =
         (struct relay_is_raft_enabled_msg *)base;
     msg->relay->tx.is_raft_enabled = msg->value;
+    if (msg->relay->tx.has_pending_broadcast) {
+        msg->has_pending_broadcast = true;
+        msg->req = msg->relay->tx.pending_broadcast;
+    }
 }
 
+static void
+relay_send_raft(struct relay *relay, struct raft_request *req);
+
 /** Relay thread part of the Raft flag setting, second hop. */
 static void
 relay_set_is_raft_enabled(struct cmsg *base)
 {
     struct relay_is_raft_enabled_msg *msg =
         (struct relay_is_raft_enabled_msg *)base;
+    /*
+     * There might have been some pending Raft broadcasts. Send the last of
+     * them as soon as relay is set up.
+     */
+    if (msg->has_pending_broadcast)
+        relay_send_raft(msg->relay, &msg->req);
+
     msg->is_finished = true;
 }
 
@@ -938,25 +966,41 @@ struct relay_raft_msg {
     struct relay *relay;
 };
 
+/**
+ * Send a Raft message to the peer. This is done asynchronously, out of scope
+ * of recover_remaining_wals loop.
+ */
 static void
-relay_raft_msg_push(struct cmsg *base)
+relay_send_raft(struct relay *relay, struct raft_request *req)
 {
-    struct relay_raft_msg *msg = (struct relay_raft_msg *)base;
     struct xrow_header row;
-    xrow_encode_raft(&row, &fiber()->gc, &msg->req);
+    xrow_encode_raft(&row, &fiber()->gc, req);
     try {
-        /*
-         * Send the message before restarting the recovery. Otherwise
-         * all the rows would be sent from under a non-leader role and
-         * would be ignored again.
-         */
-        relay_send(msg->relay, &row);
-        if (msg->req.state == RAFT_STATE_LEADER)
-            relay_restart_recovery(msg->relay);
+        relay_send(relay, &row);
     } catch (Exception *e) {
-        relay_set_error(msg->relay, e);
+        relay_set_error(relay, e);
         fiber_cancel(fiber());
     }
+}
+
+static void
+relay_raft_msg_push(struct cmsg *base)
+{
+    struct relay_raft_msg *msg = (struct relay_raft_msg *)base;
+    /*
+     * Send the message before restarting the recovery. Otherwise
+     * all the rows would be sent from under a non-leader role and
+     * would be ignored again.
+     */
+    relay_send_raft(msg->relay, &msg->req);
+    if (msg->req.state == RAFT_STATE_LEADER) {
+        try {
+            relay_restart_recovery(msg->relay);
+        } catch (Exception *e) {
+            relay_set_error(msg->relay, e);
+            fiber_cancel(fiber());
+        }
+    }
     free(msg);
 }
 
@@ -964,12 +1008,15 @@ void
 relay_push_raft(struct relay *relay, const struct raft_request *req)
 {
     /*
-     * Raft updates don't stack. They are thrown away if can't be pushed
-     * now. This is fine, as long as relay's live much longer that the
-     * timeouts in Raft are set.
+     * Remember the latest Raft update. It might be a notification that
+     * this node is a leader. If sometime later we find out this node needs
+     * Raft updates, we need to send is_leader notification.
      */
-    if (!relay->tx.is_raft_enabled)
+    if (!relay->tx.is_raft_enabled) {
+        relay->tx.has_pending_broadcast = true;
+        relay->tx.pending_broadcast = *req;
         return;
+    }
     /*
      * XXX: the message should be preallocated. It should
      * work like Kharon in IProto. Relay should have 2 raft
-- 
2.24.3 (Apple Git-128)
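The idea behind the fix described above (keep only the latest Raft broadcast that arrives while the relay cannot dispatch Raft messages yet, then send it once the relay asks for Raft updates) can be illustrated with a small single-threaded model. This is a minimal sketch under stated assumptions, not Tarantool code: `RaftState`, `RaftRequest`, `Relay`, `push_raft()` and `set_raft_enabled()` are hypothetical names, and the tx/relay thread hops (the cmsg routes) and the real network send are replaced by appending to a vector of "sent" messages.

```cpp
// Hypothetical single-threaded model of "remember the last rejected Raft
// broadcast and flush it when the relay becomes ready". None of these
// names are Tarantool's API; they only mirror the idea of the patch.
#include <cstdint>
#include <optional>
#include <vector>

enum class RaftState { Follower, Candidate, Leader };

struct RaftRequest {
    uint64_t term;
    RaftState state;
};

class Relay {
public:
    // Called for every Raft broadcast (the tx thread in the real code).
    void push_raft(const RaftRequest &req) {
        if (!is_raft_enabled_) {
            // The relay can't dispatch Raft messages yet: overwrite any
            // earlier pending request so only the latest one is kept.
            pending_ = req;
            return;
        }
        send(req);
    }

    // Called when the relay reports whether it needs Raft updates.
    void set_raft_enabled(bool value) {
        is_raft_enabled_ = value;
        // Assumption of this sketch: flush only when enabling, and
        // forget the pending request once it has been sent.
        if (value && pending_) {
            send(*pending_);
            pending_.reset();
        }
    }

    const std::vector<RaftRequest> &sent() const { return sent_; }

private:
    // Stand-in for encoding the request and writing it to the peer.
    void send(const RaftRequest &req) { sent_.push_back(req); }

    bool is_raft_enabled_ = false;
    std::optional<RaftRequest> pending_;
    std::vector<RaftRequest> sent_;
};
```

For example, pushing a candidate broadcast and then a leader broadcast before Raft is enabled leaves only the leader notification pending, and enabling Raft sends exactly that one message; later broadcasts go out immediately. Keeping a single slot instead of a queue follows the patch's "remember the latest Raft update" approach. Unlike the patch, this sketch clears the stored request after flushing it and flushes only when Raft gets enabled; both choices are simplifications made for the model.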