From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp35.i.mail.ru (smtp35.i.mail.ru [94.100.177.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2105E4765E9 for ; Wed, 23 Dec 2020 15:01:24 +0300 (MSK) From: Serge Petrenko References: <3600a9d1e06d6c1c8cda41dda88cd0dd088ca3e2.1607633488.git.sergepetrenko@tarantool.org> <5b466891-65b4-ad9c-253e-156869100959@tarantool.org> Message-ID: <0dd72d36-9b82-96d1-ceac-023c160db59a@tarantool.org> Date: Wed, 23 Dec 2020 15:01:23 +0300 MIME-Version: 1.0 In-Reply-To: <5b466891-65b4-ad9c-253e-156869100959@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [Tarantool-patches] [PATCH 2/4] relay: rename is_raft_enabled message to relay_is_running List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , Cyrill Gorcunov Cc: tarantool-patches@dev.tarantool.org 18.12.2020 00:43, Vladislav Shpilevoy пишет: > Thanks for the patch! > > I have an idea how to do this all a bit easier probably. But could > take a few more commits. > > AFAIU, here is what we need, looking from above. > > We need box_clear_synchro_queue() wait until quorum of replicas got > the changes of the instance owning the limbo. > > Limbo already collects the LSNs on the node owning the limbo, through > txn_limbo_ack(). This function populates a vclock object with versions > of the owner LSN how it is seen on the replicas. > > If we would collect LSNs into that vclock on the replica nodes too, it > would be quite simple to check if a quorum is reached. Just do the > same what txn_limbo_assign_local_lsn() does - scan vclock and check if > there is a quorum. > If there is no a quorum, the replica must wait for it. We can add a > trigger to the limbo (on_ack()) or directly to tx_status_update(), > and wait for the moment when the quorum is collected. Thanks for your comments! Please see v2 of this patch. Below are my answers. Which are mostly of historical interest now, since a lot is different in the new patch. Yes, your proposal looks fine and it's rather easy to implement. I didn't look at the limbo code much and simply decided that it'd be too complex to rewrite txn_limbo_ack() that way. Looks like it's not that hard, after all. > Another option - copy limbo's vclock on the stack in box_clear_synchro_queue(), > check current number of confirmed replicas, then hang a trigger on > tx_status_update(), and wait. Even simpler than the first option. > The reason why I don't like or don't get yet (maybe it is good?) the > approach through the relays is that firstly I want is_raft_enabled be > deleted instead of extended. It is an ugly crutch, which probably > should be removed by this: > https://github.com/tarantool/tarantool/issues/5438. Ok, I see. I think though the mechanism itself is not that bad. Maybe we can remove the is_raft_enabled mesage, which is really a crutch, but leave the tx notification mechanism. It looks useful to know when the relay is operational and when it's not. What do you think? P.S. I ended up hanging a trigger on tx_status_update, and make this trigger perform roughly the same work it did in the first version of the patch. > Secondly, any changes in relay not related to just reading and writing > network scare shit out of me. It is a complex thing running in a > separate thread, so better avoid adding any more complex logic to it. > It should be as dumb as possible. Until all relays are moved to one > thread (WAL) with network and disk IO served through worker threads. > What I see in the last commit now looks extremely complex. Ok, I see your point. > This is a discussion proposal. I may be wrong here and miss something. > > On 10.12.2020 21:55, Serge Petrenko via Tarantool-patches wrote: >> Relay sends a raft enabler/disabler message on start and stop. This >> message would also be useful to perform some additional work related to relay >> initialization/deinitialization done in the tx thread. >> >> For example, the message will be used to notify tx thread that there's no >> point in wating until a relay replicates a specific row, since the relay >> is exiting. This is the case in #5435. > Why do we need such a notification? Why can't we let box_clear_synchro_queue() > simply wait until timeout? What if the timeout is 10 sec, and we gave up > immediately because there was no point, but in 5 sec the relays would start, > deliver everything, and the queue could be cleared? As far as I understood, we need to wait for quorum infinitely, not for some timeout. However, it's easy to add such a timeout. As I understand once the node reaches timeout it stays in leader state (in case raft is enabled), but remains read-only? In order to make it operational again the user has to call box.ctl_clear_synchro_queue() manually. Is this correct? > P.S. I read the last commit and now I see that you don't wait for timeout. > But still you want infinitely. So does it matter if a relay has exited or > not anyway? We need the relay to notify tx thread when it's exiting to destroy the watchers. Here's what I do. I do wait infinitely, but only  when it makes sense: i.e. I start a watcher for every connected relay. While there are enough running watchers to gather quorum, I sleep and get waken up by each firing watcher. The relay exit notification is needed to cease waiting once there's not enough connected replicas anymore. Every time a relay thread exits the fiber executing clear_synchro_queue gets notified and sees that the watcher count has decreased. -- Serge Petrenko