From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kostja.osipov@gmail.com>
Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com
 [209.85.167.67])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 9011D452566
 for <tarantool-patches@dev.tarantool.org>;
 Thu, 14 Nov 2019 20:33:40 +0300 (MSK)
Received: by mail-lf1-f67.google.com with SMTP id q28so5731941lfa.5
 for <tarantool-patches@dev.tarantool.org>;
 Thu, 14 Nov 2019 09:33:40 -0800 (PST)
Date: Thu, 14 Nov 2019 20:33:38 +0300
From: Konstantin Osipov <kostja.osipov@gmail.com>
Message-ID: <20191114173338.GC17735@atlas>
References: <20191114125705.26760-1-maria.khaydich@tarantool.org>
 <13232148.CanJsBF7IP@home.lan> <20191114152656.GA12369@atlas>
 <9545327.LWTAaRDJnC@home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9545327.LWTAaRDJnC@home.lan>
Subject: Re: [Tarantool-patches] [PATCH] Trigger on vclock change
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Georgy Kirichenko <georgy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org

* Georgy Kirichenko <georgy@tarantool.org> [19/11/14 20:14]:
> > I also think it's a pair of server_id, lsn, rather than entire
> > vclock - usually you know what you're waiting for, and it's only
> > one component of vclock, not all of them.
> But there are some issues
> 1. what if we wish to have a timeout
> 2. what if lsn are waited in non-strictly increasing order
> 3. what if awaiting fiber is canceled
> The approach you suggested looks for me like a bike-shed trigger 
> implementation but the implementation is limited to use only for wait for lsn. 
> So I would like to propose to ask Alexander Tikhonov to provide us with a 
> benchmark result first and then make a conclusion about performance impact.

Maybe you're right. But isn't the entire idea of wait_lsn() a
bike-shed, as you put it, because we don't have sync replication?
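
To make the (server_id, lsn) idea concrete, here is a minimal sketch
of waiting on a single vclock component with a timeout. It is only an
illustration: ToyVClock, follow() and this wait_lsn() signature are
hypothetical names, not Tarantool's API, and it uses OS threads
rather than fibers. Each waiter has its own predicate, so LSNs may be
awaited in any order; cancellation of the waiter is not covered.

```python
import threading


class ToyVClock:
    """Toy vclock: server_id -> last LSN seen (hypothetical, not Tarantool's)."""

    def __init__(self):
        self._lsn = {}
        self._cond = threading.Condition()

    def follow(self, server_id, lsn):
        """Advance one component and wake up every waiter to re-check."""
        with self._cond:
            if lsn > self._lsn.get(server_id, 0):
                self._lsn[server_id] = lsn
                self._cond.notify_all()

    def wait_lsn(self, server_id, lsn, timeout=None):
        """Block until the component reaches lsn; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._lsn.get(server_id, 0) >= lsn, timeout
            )
```

A caller would write something like vc.wait_lsn(1, 100, timeout=0.5)
and treat a False result as a timeout.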

> > > Anyway, we will need to have such trigger in order to make applier able to
> > > report local replica wal and committed vclock in scope of synchronous
> > > replication issue.
> > 
> > This has to happen in WAL thread, not in main thread, and has to
> > watch relay-from-memory vclock, not async-replication vclock. And
> > it also needs to roll back the transaction locally on failure,
> > i.e. write some sort of undo records to the WAL.
> This will work in an applier which lives in the TX cord, as an applier 
> processes incoming transactions through the TX. And an applier should be able 
> to answer with two vclocks - committed and written ones. Yes, WAL will batch 
> such vclocks updates but this is still of hundreds of events per second. 
> Unfortunately there is no point to move an applier to the WAL thread because a 
> transaction could not be validated without TX.

OK, now I get where you're heading. I think sending acks from
the tx thread has the following disadvantages:
- we mix up the "committed" event and the "written to the commit
  log" event. They become indistinguishable in the tx thread. Per
  Raft, we should send back acks as soon as we write to the local
  commit log, and when the leader gets acks from enough commit
  logs it sends another message which makes the local transaction
  commit. If you ack when you commit the local transaction, how
  would you be able to roll it back on leader change or majority
  failure?

  So the event you need to acknowledge is not the event the
  trigger in question captures.

- the second issue is latency. The tx/wal scheduling delay can be
  hundreds of microseconds, which is close to network delays on
  fast networks within the same rack/data center. So
  acknowledging commit log writes from the WAL thread will also
  speed up the leader quite a bit, since the round trip will be
  shorter.
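
The difference between the two events in the first point can be
sketched as follows. This is a heavily simplified model in the
spirit of Raft, not Tarantool code: Follower, Leader and all method
names are hypothetical, and terms, elections and networking are left
out. The ack is produced by the log write, and the commit only by a
later message from the leader, so an acked entry can still be rolled
back.

```python
class Follower:
    """Toy replica: acks on log write, commits only when told to."""

    def __init__(self):
        self.log = []       # LSNs written to the local commit log
        self.committed = 0  # highest LSN the leader has confirmed

    def append(self, lsn):
        """Write to the local log and return the ack for the leader."""
        self.log.append(lsn)
        return lsn

    def commit(self, lsn):
        """Leader's second message: entries up to lsn are now committed."""
        last_written = self.log[-1] if self.log else 0
        self.committed = max(self.committed, min(lsn, last_written))

    def rollback_uncommitted(self):
        """On leader change or majority failure, drop acked-only entries."""
        dropped = [l for l in self.log if l > self.committed]
        self.log = [l for l in self.log if l <= self.committed]
        return dropped


class Leader:
    """Toy leader: counts acks per LSN until a majority is reached."""

    def __init__(self, cluster_size):
        self.quorum = cluster_size // 2 + 1
        self.acks = {}  # lsn -> set of node ids that wrote it

    def on_ack(self, node_id, lsn):
        """Record an ack; return True once a majority has the entry."""
        self.acks.setdefault(lsn, set()).add(node_id)
        return len(self.acks[lsn]) >= self.quorum
```

If the follower acked only at commit time, rollback_uncommitted()
would have nothing left to undo, which is the problem described
above.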

To sum up, I still think you should not use this trigger to
acknowledge commit log writes. Better to have a separate socket
for this altogether, or move the write end of the existing socket
to the WAL thread, while keeping the read end where it is now, in
tx/applier.

-- 
Konstantin Osipov, Moscow, Russia