From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 9011D452566
	for ; Thu, 14 Nov 2019 20:33:40 +0300 (MSK)
Received: by mail-lf1-f67.google.com with SMTP id q28so5731941lfa.5
	for ; Thu, 14 Nov 2019 09:33:40 -0800 (PST)
Date: Thu, 14 Nov 2019 20:33:38 +0300
From: Konstantin Osipov
Message-ID: <20191114173338.GC17735@atlas>
References: <20191114125705.26760-1-maria.khaydich@tarantool.org>
	<13232148.CanJsBF7IP@home.lan>
	<20191114152656.GA12369@atlas>
	<9545327.LWTAaRDJnC@home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9545327.LWTAaRDJnC@home.lan>
Subject: Re: [Tarantool-patches] [PATCH] Trigger on vclock change
List-Id: Tarantool development patches
To: Georgy Kirichenko
Cc: tarantool-patches@dev.tarantool.org

* Georgy Kirichenko [19/11/14 20:14]:
> > I also think it's a pair of server_id, lsn, rather than entire
> > vclock - usually you know what you're waiting for, and it's only
> > one component of vclock, not all of them.
> But there are some issues
> 1. what if we wish to have a timeout
> 2. what if lsn are waited in non-strictly increasing order
> 3. what if awaiting fiber is canceled
> The approach you suggested looks to me like a bike-shed trigger
> implementation but the implementation is limited to use only for wait for lsn.
> So I would like to propose to ask Alexander Tikhonov to provide us with a
> benchmark result first and then make a conclusion about performance impact.

Maybe you're right. But isn't the entire idea of wait_lsn() a
bike-shed, as you put it, because we don't have sync replication?

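
For reference, waiting on a single (server_id, lsn) pair could look
roughly like the sketch below. It is only an illustration in Python
with OS threads, not Tarantool code: the VClock class and its follow()
and wait_lsn() methods are hypothetical names. It covers the timeout
case and waiters arriving in arbitrary LSN order (each waiter re-checks
its own condition), but says nothing about fiber cancellation.

```python
import threading


class VClock:
    """Toy vector clock: server_id -> last LSN seen (hypothetical)."""

    def __init__(self):
        self._lsn = {}
        self._cond = threading.Condition()

    def follow(self, server_id, lsn):
        """Advance one component and wake all waiters to re-check."""
        with self._cond:
            if lsn > self._lsn.get(server_id, 0):
                self._lsn[server_id] = lsn
                self._cond.notify_all()

    def wait_lsn(self, server_id, lsn, timeout=None):
        """Block until the component reaches lsn; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._lsn.get(server_id, 0) >= lsn,
                timeout=timeout,
            )
```

A waiter only looks at its own component, so waiting for LSN 3 after
LSN 5 has already been reached returns immediately.
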
> > > Anyway, we will need to have such trigger in order to make applier able to
> > > report local replica wal and committed vclock in scope of synchronous
> > > replication issue.
> >
> > This has to happen in WAL thread, not in main thread, and has to
> > watch relay-from-memory vclock, not async-replication vclock. And
> > it also needs to roll back the transaction locally on failure,
> > i.e. write some sort of undo records to the WAL.
> This will work in an applier which lives in the TX cord, as an applier
> processes incoming transactions through the TX. And an applier should be able
> to answer with two vclocks - committed and written ones. Yes, WAL will batch
> such vclocks updates but this is still of hundreds of events per second.
> Unfortunately there is no point to move an applier to the WAL thread because a
> transaction could not be validated without TX.

OK, now I get where you're heading. I think sending acks from the
tx thread has the following disadvantages:

- we mix up the "committed" event and the "written to the commit log"
  event. They become indistinguishable in the tx thread. Per Raft, we
  should send back acks as soon as we write to the local commit
  log, and when the leader gets acks from enough commit logs
  it sends another message which makes the local transaction
  commit. If you ack when you commit the local transaction, how
  would you be able to roll it back on leader change or majority
  failure? So the event you need to be acknowledging is not the
  event the trigger in question is capturing.

- the second issue is latency. The tx/wal scheduling delay can be in
  the hundreds of microseconds, and this is close to networking
  delays on fast networks within the same rack/data center. So
  acknowledging commit log writes from the WAL thread will also speed
  up the leader quite a bit, since the round trip will be shorter.

To sum up, I still think you should not use this trigger to
acknowledge commit log writes.
Better to have a separate socket for this altogether, or move the
write end of the existing socket to the WAL thread, while keeping the
read end where it is now, in tx/applier.

-- 
Konstantin Osipov, Moscow, Russia
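
The ack flow argued for above (a follower acks as soon as the entry is
in its local commit log, and commits only after the leader has seen a
quorum of such acks) can be sketched as follows. This is a simplified
illustration in Python, not Tarantool code and not a full Raft
implementation: Leader, Follower and all their methods are hypothetical
names, and terms, elections and networking are left out.

```python
class Leader:
    def __init__(self, cluster_size):
        self.quorum = cluster_size // 2 + 1
        self.acks = {}          # lsn -> ids of nodes whose log holds lsn
        self.committed = set()

    def propose(self, lsn):
        # The leader's own log write counts as the first ack.
        self.acks[lsn] = {"leader"}

    def on_ack(self, replica_id, lsn):
        """Record a 'written' ack; True when lsn has just reached quorum."""
        self.acks[lsn].add(replica_id)
        if lsn not in self.committed and len(self.acks[lsn]) >= self.quorum:
            self.committed.add(lsn)
            return True
        return False


class Follower:
    def __init__(self, replica_id):
        self.id = replica_id
        self.written = set()    # in the local commit log, not yet committed
        self.committed = set()

    def on_write(self, lsn):
        """Local log write finished: this is the event that gets acked."""
        self.written.add(lsn)
        return (self.id, lsn)

    def on_commit(self, lsn):
        """Second message from the leader: quorum reached, commit locally."""
        if lsn in self.written:
            self.committed.add(lsn)

    def on_rollback(self, lsn):
        """Leader change or majority failure: undo a written-only entry.
        A real implementation would write an undo record to the log."""
        if lsn not in self.committed:
            self.written.discard(lsn)
```

Because "written" and "committed" are tracked separately, an entry that
was acked but never reached a quorum can still be rolled back, which is
not possible if the ack is sent only at commit time.
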