From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Georgy Kirichenko <georgy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] Trigger on vclock change
Date: Fri, 15 Nov 2019 04:57:45 +0300
Message-ID: <20191115015745.GA23299@atlas>
In-Reply-To: <27617293.E4uLSYyink@home.lan>

* Georgy Kirichenko <georgy@tarantool.org> [19/11/15 04:33]:
> On Thursday, November 14, 2019 10:48:06 PM MSK Konstantin Osipov wrote:
> > * Georgy Kirichenko <georgy@tarantool.org> [19/11/14 22:42]:
> > > A replica state is described by 2 vclocks - written and committed ones.
> > > Right now it is not an issue to report them both as an applier submits
> > > transaction asynchronously. In addition to these two vclocks (yes, the
> > > both could be transferred from the WAL thread) applier will report a
> > > reject vclock - the vclock where applying breaks, and this could be done
> > > from TX. I do not like the idea to split transmission between 2 threads.
> > > The write and reject vclocks are used to evaluate majority whereas commit
> > > vclock instructs a whole cluster that majority was already reached. The
> > > main point is that any replica member could commit a transaction - this
> > > relaxes RAFT limitations and increases the whole cluster durability (and
> > > it is simpler in design and implementation, really). Also the new
> > > synchronous replication design has a lot of advantages in comparison with
> > > RAFT but let us discuss it in another thread. If you interested please
> > > ask for details as I have not enough time to write public document right
> > > now.
> > > Returning to the subject, I would like to conclude that wal on_commit and
> > > on_write triggers are good source to initiate status transmission. And the
> > > trigger implemented by Maria will be replaced by replica on_commit which
> > > allows us not to change anything at higher levels.
> > >
> > Congratulations, Georgy, maybe you even get a Turing award for
> > inventing a new protocol.
> >
> > Wait... they don't give a Turing award for "protocols" which have
> > no proof and yield inconsistent results, or do they?
> You do not even know details of the protocol but make such suggestion, so I
> could only repeat your last statement: "what a shame", seriously.
> Please, remember all my attempts to discuss it with you or, for instance, our
> one-per-2-week meetings which all (except the first one) were skipped by you.

If you want to discuss anything with me, feel free to reach out.

I am following the process as I believe it should work in a
distributed open source project: before there is a change, there
is a design document on which everyone can equally comment.

> > Meanwhile, if you have a design in mind, you could send an RFC. I
> > will respond to the RFC.
> Anybody could see the design document after this protocol research will be
> done. Yes, the research requires to be implemented first.

You don't need to waste time on implementation. Your approach,
just by the description of it, is neither consistent, nor durable:

- if you allow active-active, you can have lost writes.

  Here's a simple example:

      box.begin()
      local a = box.space.t:get{1}[2]
      box.space.t:replace{1, a+1}
      box.commit()

  By running this transaction concurrently on two masters, you will
  get lost writes. RAFT would not let that happen.

But let's imagine for a second that this is not an issue.

Your proposal is missing the critical parts of RAFT: neutralizing
old leaders and completing transactions upon leader failure - i.e.
when the new leader commits writes accepted by the majority and
rolls back the rest, on behalf of the deceased.
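
The lost-write scenario in the transaction example above can be reproduced without a database. The following is a minimal Python sketch, not Tarantool code: the two dictionaries stand in for the space `t` on two masters, and `run_increment`, `apply_replicated` and `simulate_two_masters` are hypothetical names chosen for illustration. It assumes asynchronous replication that ships the resulting row and applies it as a plain replace on the peer.

```python
def run_increment(local_store, key):
    """Read-modify-write on the local copy, like the get/replace transaction."""
    current = local_store[key]
    new_value = current + 1
    local_store[key] = new_value
    # The row that would be shipped to the peer (assumption: the replicated
    # operation is a replace of the whole row, not an "increment").
    return (key, new_value)


def apply_replicated(local_store, row):
    """Apply a row received from the peer; it overwrites the local value."""
    key, value = row
    local_store[key] = value


def simulate_two_masters():
    # Both masters start from the same state: key 1 holds the counter 10.
    master_a = {1: 10}
    master_b = {1: 10}

    # The same transaction runs concurrently on both masters, before either
    # has seen the other's write.
    row_from_a = run_increment(master_a, 1)
    row_from_b = run_increment(master_b, 1)

    # Replication then delivers each row to the other master.
    apply_replicated(master_b, row_from_a)
    apply_replicated(master_a, row_from_b)

    return master_a[1], master_b[1]


if __name__ == "__main__":
    print(simulate_two_masters())
```

Two increments were executed, yet both copies end at 11 rather than 12: one of the writes is lost, even though the replicas agree with each other.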

Imagine one of your active replicas fails midway:
- it can fail after a record is written to wal by one of the peers
- it can fail after a record is written to wal by the majority of
  the peers, bu
- it can fail after a record is committed by one of the peers, but
  not all.

Who and how is going to repair these replicas upon master failure?
You just threw RAFT "longest log wins" principle into a garbage
bin, so you would never be able to identify which specific
transactions need repair, on which replica, and what this repair
should do. Needless to say that you haven't stopped transaction
processing on these replicas, so even if you knew which specific
transactions needed completion and on which replica, the data they
modify could be easily overwritten by the time you get to finish
these transactions.

As to your suggestion to track commit/wal write vclock in tx
thread, well, this has fortunately nothing to do with correctness,
but has all to do with efficiency and performance. There was a
plan to move out the applier to iproto thread from tx, and you
even wrote a patch, which wasn't finished like many others,
because you never addressed the review comments. Now you chose to
go in the opposite direction by throwing more logic to the tx
thread, adding to the scalability bottleneck of the
single-threaded architecture. We discussed that before - but
somehow it slips from your mind each time.

Meanwhile, Vlad's review of your patch for in-memory WAL is not
addressed. You could complain that my reviews are too harsh and
asking too much, but this is Vlad's...

-- 
Konstantin Osipov, Moscow, Russia
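
The failure points discussed in the message differ in how many peers have written or committed a given record when the failure happens. The Python sketch below classifies a record from per-peer progress. It is an illustration under stated assumptions, not Tarantool's implementation and not the protocol under discussion: `classify_record` and its labels are hypothetical, and each vclock is reduced to a single LSN for one origin.

```python
def classify_record(lsn, written_lsn, committed_lsn):
    """Classify one record by how far it got across the peers.

    written_lsn / committed_lsn map a peer name to the highest LSN that
    peer has written to its WAL / committed (assumption: a single origin,
    so each vclock is reduced to one number).
    """
    peers = len(written_lsn)
    majority = peers // 2 + 1

    # Count the peers whose progress already covers this record.
    written = sum(1 for v in written_lsn.values() if v >= lsn)
    committed = sum(1 for v in committed_lsn.values() if v >= lsn)

    if committed == peers:
        return "committed everywhere"
    if committed > 0:
        return "committed on some peers"
    if written >= majority:
        return "written by a majority, not committed"
    if written > 0:
        return "written by a minority"
    return "unknown to all peers"


if __name__ == "__main__":
    written = {"a": 5, "b": 4, "c": 4}
    committed = {"a": 4, "b": 4, "c": 4}
    print(classify_record(5, written, committed))
```

The function needs the written and committed positions of every peer at once. After a failure that global view has to be reconstructed by whoever performs the repair, and the peers may keep advancing while it is being collected.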