To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Tue, 20 Apr 2021 13:38:46 +0300
In-Reply-To: <35351452-fbd8-926f-886b-8210ccb8f74e@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v4 13/12] replication: send accumulated Raft messages after relay start
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

20.04.2021 01:36, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> See 2 comments below.
>
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 7be33ee31..85f335cd7 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -628,13 +659,38 @@ struct relay_is_raft_enabled_msg {
>>      bool is_finished;
>>  };
>>
>> +static void
>> +relay_push_raft_msg(struct relay *relay, bool do_restart_recovery)
> 1. Why is the recovery restart flag ignored if a message is already
> sent? This might lead to recovery restart loss if I am not mistaken.

I think it's okay. As long as the message is pushed from
relay_push_raft() rather than from tx_set_is_raft_enabled(), we may
freely restart the recovery. So we only care whether
do_restart_recovery is set when the message gets pushed in the same
call. We don't care whether do_restart_recovery is set or not when the
call exits without pushing the message. The next call will have the
correct value for do_restart_recovery anyway.

Please see a more detailed explanation below.

>
>> +{
>> +    if (!relay->tx.is_raft_enabled || relay->tx.is_raft_push_sent)
>> +        return;
>> +    struct relay_raft_msg *msg =
>> +        &relay->tx.raft_msgs[relay->tx.raft_ready_msg];
>> +    msg->do_restart_recovery = do_restart_recovery;
>> +    cpipe_push(&relay->relay_pipe, &msg->base);
>> +    relay->tx.raft_ready_msg = (relay->tx.raft_ready_msg + 1) % 2;
>> +    relay->tx.is_raft_push_sent = true;
>> +    relay->tx.is_raft_push_pending = false;
>> +}
>> +
>>  /** TX thread part of the Raft flag setting, first hop. */
>>  static void
>>  tx_set_is_raft_enabled(struct cmsg *base)
>>  {
>>      struct relay_is_raft_enabled_msg *msg =
>>          (struct relay_is_raft_enabled_msg *)base;
>> -    msg->relay->tx.is_raft_enabled = msg->value;
>> +    struct relay *relay  = msg->relay;
>> +    relay->tx.is_raft_enabled = msg->value;
>> +    /*
>> +     * Send saved raft message as soon as relay becomes operational.
>> +     * Do not restart recovery upon the message arrival. Recovery is
>> +     * positioned at replica_clock initially, i.e. already "restarted" and
>> +     * restarting it once again would position it at the oldest xlog
>> +     * possible, because relay reader hasn't received replica vclock yet.
>> +     */
>> +    if (relay->tx.is_raft_push_pending) {
>> +        relay_push_raft_msg(msg->relay, false);
> 2. I don't understand. Why wasn't there such a problem before? Recovery
> must be restarted when the node becomes a leader. If you do not restart
> it, the data would be ignored by the replicas. How do you know it is
> positioned right now at replica_clock? You are in tx thread, you can't
> tell. What do I miss?

This is because this `relay_push_raft_msg` is delivered before
`relay_set_is_raft_enabled`, and both messages get processed by the
cbus_process() loop that waits for `relay_set_is_raft_enabled`. This
happens in relay_send_is_raft_enabled(), even before the relay reader
fiber is created, so recv_vclock is zero. Restarting recovery here
would reset it to the first ever WAL this instance has, which is wrong.

Such a problem might have existed before, but it was extremely hard to
catch: relay_push_raft_msg() wasn't called until
relay->tx.is_raft_enabled was set. And when tx.is_raft_enabled was set,
it most probably meant that relay_set_is_raft_enabled had already been
delivered and the relay had exited this first cbus_process() loop,
which ran before the reader fiber was created.
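
As a rough illustration of the ordering described above, here is a toy C
model. It is not the actual relay code: the array-backed FIFO and every
`toy_*` name are assumptions made up for this sketch, and the real cbus
routing is more involved. It only shows that a message pushed from
inside the tx handler lands in the pipe ahead of the handler's own
reply, so the relay's first loop processes it before it sees that reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds; the real code uses cmsg routes instead. */
enum toy_msg_type { TOY_MSG_RAFT_PUSH, TOY_MSG_IS_RAFT_ENABLED_REPLY };

struct toy_msg {
	enum toy_msg_type type;
	bool do_restart_recovery;
};

/* A toy FIFO standing in for the tx -> relay pipe (assumption). */
enum { TOY_PIPE_CAPACITY = 8 };

struct toy_pipe {
	struct toy_msg items[TOY_PIPE_CAPACITY];
	size_t head;
	size_t tail;
};

static void
toy_pipe_push(struct toy_pipe *p, struct toy_msg m)
{
	assert(p->tail < TOY_PIPE_CAPACITY);
	p->items[p->tail++] = m;
}

/*
 * Models the tx-side handler: a pending Raft message is pushed from
 * inside the handler, and the reply is queued only after that.
 */
static void
toy_tx_set_is_raft_enabled(struct toy_pipe *to_relay, bool is_push_pending)
{
	if (is_push_pending) {
		/* Pushed with do_restart_recovery == false, as in the patch. */
		toy_pipe_push(to_relay,
			      (struct toy_msg){TOY_MSG_RAFT_PUSH, false});
	}
	toy_pipe_push(to_relay,
		      (struct toy_msg){TOY_MSG_IS_RAFT_ENABLED_REPLY, false});
}

struct toy_first_loop_result {
	int raft_msgs_seen;
	int recovery_restarts;
};

/*
 * Models the relay's first processing loop: it consumes messages in
 * FIFO order and stops once the reply arrives.
 */
static struct toy_first_loop_result
toy_relay_first_loop(struct toy_pipe *p)
{
	struct toy_first_loop_result res = {0, 0};
	while (p->head < p->tail) {
		struct toy_msg m = p->items[p->head++];
		if (m.type == TOY_MSG_IS_RAFT_ENABLED_REPLY)
			break;
		res.raft_msgs_seen++;
		if (m.do_restart_recovery)
			res.recovery_restarts++;
	}
	return res;
}
```

In this model the Raft message is always consumed by the first loop, so
the only way to keep it from restarting recovery there is to send it
with the flag cleared.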

To solve the problem another way, I would need to make
relay_push_raft_msg() deliver the message to the second cbus_process()
loop, the main one, and I couldn't come up with a way to do that. The
message has to be pushed right in tx_set_is_raft_enabled(), which means
it gets delivered before relay_set_is_raft_enabled.

-- 
Serge Petrenko