From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Thu, 23 Apr 2020 09:58:09 +0300
Message-ID: <20200423065809.GA4528@atlas>
In-Reply-To: <dd4d703e-6918-ccd9-ac5e-76fb54fff0f9@tarantool.org>

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/04/22 01:21]:
> > To my understanding - it's up to user. I was considering a cluster that
> > has no WAL at all - relying on synchro replication and sufficient number
> > of replicas. Everyone who I asked about it told me I'm nuts. To my great
> > surprise Alexander Lyapunov brought exactly the same idea to discuss.
>
> I didn't see an RFC on that, and this can become easily possible, when
> in-memory relay is implemented. If it is implemented in a clean way. We
> just can turn off the disk backoff, and it will work from memory-only.

Sync replication must work from in-memory relay only. It works as
a natural failure detector: a replica which is slow or unavailable
is first removed from the subscribers of in-memory relay, and only
then (possibly much, much later) is marked as down.

By looking at the in-memory relay you have a clear idea which peers
are available, and you can abort a transaction right away if the
cluster is in the downgraded state. You never wait for impossible
events.

If you do have to wait, and say your wait timeout is 1 second, you
quickly run out of fibers in the fiber pool for any work, because
all of them will be waiting for the sync transactions they picked up
from iproto to finish. The system will lose its throttling
capability.

There are other reasons, too: the protocol will eventually be
quite tricky, and the logic has to reside in a single place and not
require inter-thread communication. Committing a transaction
anywhere outside WAL will require inter-thread communication,
which is costly and should be avoided.

I am surprised I have to explain this again and again. I never
assumed this spec to be a precursor to a half-baked
implementation, only a high-level description of the next steps
after in-memory relay is in.

> > All of these is for one resolution: I would keep it for user to decide.
> > Obviously, to speed up the processing leader can disable wal completely,
> > but to do so we have to re-work the relay to work from memory. Replicas
> > can use WAL in a way user wants: 2 replicas with slow HDD shouldn't wait
> > for fsync(), while super-fast Intel DCPMM one can enable it. Balancing
> > is up to user.
>
> Possibility of omitting fsync means that it is possible, that all nodes
> write confirm, which is reported to the client, then the nodes restart,
> and the data is lost. I would say it somewhere.

Worse yet, you can elect a leader "based on WAL length", and then it
is no longer the leader, because it lost its long WAL after
restart. fsync() is mandatory during election; otherwise it
shouldn't affect durability in our case.

-- 
Konstantin Osipov, Moscow, Russia
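
To illustrate the fail-fast argument above, here is a minimal sketch
in Python. It is not Tarantool code: InMemoryRelay, can_reach_quorum,
begin_sync_txn and QuorumUnavailable are hypothetical names made up
for this example, and it assumes the leader's own copy counts as one
acknowledgement. The point is only that the subscriber list of the
in-memory relay is enough to decide, before any waiting starts,
whether a quorum can be reached at all.

```python
from dataclasses import dataclass, field


class QuorumUnavailable(Exception):
    """Raised when a sync transaction cannot possibly gather its quorum."""


@dataclass
class InMemoryRelay:
    # Hypothetical model: ids of the replicas currently fed from memory.
    subscribers: set = field(default_factory=set)

    def drop(self, replica_id):
        # A slow or unreachable replica falls off the in-memory relay first,
        # possibly long before it is formally marked as down.
        self.subscribers.discard(replica_id)


def can_reach_quorum(relay, quorum):
    # Assumption: the leader's own copy counts as one acknowledgement.
    return 1 + len(relay.subscribers) >= quorum


def begin_sync_txn(relay, quorum):
    # Fail fast: never park a fiber waiting for acks that cannot arrive.
    if not can_reach_quorum(relay, quorum):
        possible = 1 + len(relay.subscribers)
        raise QuorumUnavailable(
            f"need {quorum} acks, at most {possible} possible"
        )
    return "pending"


relay = InMemoryRelay(subscribers={"replica-1", "replica-2"})
first = begin_sync_txn(relay, quorum=2)  # both replicas are subscribed

relay.drop("replica-1")
relay.drop("replica-2")
try:
    begin_sync_txn(relay, quorum=2)
    aborted = False
except QuorumUnavailable:
    aborted = True  # rejected immediately, no timeout involved
```

With the check in place, a request that arrives while the cluster is
degraded is rejected at once and never occupies a fiber for the
length of the wait timeout.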