From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
Received: from [87.239.111.99] (localhost [127.0.0.1])
	by dev.tarantool.org (Postfix) with ESMTP id 07F7F6EC40;
	Wed,  2 Jun 2021 09:55:41 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 07F7F6EC40
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev;
	t=1622616941; bh=T3u+UhWurEjWor7l77d6P8HAS4OtfFBSYwDwqn5Ru20=;
	h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=b53UVNGePjaRTxVhHPqb0Fw/MA0RtBxSBSQq86lMy+FVZYETaBV8NPYn+lLm7KIgT
	 APkuU8228zB6NtwC9xrcUP5n9WEDZaQ6uAuG/YUicKhjDaUWKAtSKO6yhUolGayC31
	 c0Hza6qfv/nLrv1pn/ihYQ5elExhCefYLuNs3NnQ=
Received: from smtp46.i.mail.ru (smtp46.i.mail.ru [94.100.177.106])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 1B5546EC40
 for <tarantool-patches@dev.tarantool.org>;
 Wed,  2 Jun 2021 09:55:40 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 1B5546EC40
Received: by smtp46.i.mail.ru with esmtpa (envelope-from
 <sergepetrenko@tarantool.org>)
 id 1loKmw-0000W7-Nb; Wed, 02 Jun 2021 09:55:39 +0300
To: Cyrill Gorcunov <gorcunov@gmail.com>,
 Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
References: <YLauiZStUcSIqDKp@grain>
Message-ID: <43b6b0d8-b5e9-e8e5-1b6d-ed550c674c11@tarantool.org>
Date: Wed, 2 Jun 2021 09:55:38 +0300
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
 Gecko/20100101 Thunderbird/78.10.2
MIME-Version: 1.0
In-Reply-To: <YLauiZStUcSIqDKp@grain>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
X-7564579A: B8F34718100C35BD
X-77F55803: 4F1203BC0FB41BD9D5B0DA836B685C5423CDB5763716BB867A249A0077715520182A05F538085040C72C697D90AEB0AFF1813D9C3FA5A21984AE56E4863757C3D1FD1E9E8B1F2C74
X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE79961E86438F5BDAEEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637FD85A7F5EB0E97528638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8FF7F4E2BC0EC9D6E1A65702EDCE754A0117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC974A882099E279BDA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F44604297287769387670735201E561CDFBCA1751F6FD1C55BDD38FC3FD2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EE7B96B19DC409332149AF716F719AB83ED8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE3A6C7FFFE744CA7FB2D242C3BD2E3F4C6C4224003CC836476EA7A3FFF5B025636E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637870CFFD37CCFDD3AEFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A
X-C1DE0DAB: 0D63561A33F958A56B4763394913B515C31593DAEFF87C2A25533368AC0EBDDAD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75FBC5FED0552DA851410CA545F18667F91A7EA1CDA0B5A7A0
X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34D4140D813EC137C62A46A2FBFA71186ABE25C02B8BEA28F51278C6D6BDB6672E3ECFFCCBF3A57B4A1D7E09C32AA3244C536FAE93AE94BBBA1798E8B55949DF115A1673A01BA68E40927AC6DF5659F194
X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj+mfSpkNmA2ouppZYt6CfMA==
X-Mailru-Sender: 583F1D7ACE8F49BD9DF7A8DAE6E2B08A0776BA88A9E497F0D8D50ECB39C929CAFE82372B44F232A1424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387
X-Mras: Ok
Subject: Re: [Tarantool-patches] [RFC] on downstream.lag design
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
From: Serge Petrenko via Tarantool-patches
 <tarantool-patches@dev.tarantool.org>
Reply-To: Serge Petrenko <sergepetrenko@tarantool.org>
Cc: TML <tarantool-patches@dev.tarantool.org>
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches" <tarantool-patches-bounces@dev.tarantool.org>



02.06.2021 01:02, Cyrill Gorcunov пишет:
> Guys, I would like to discuss option 3 from downstream.lag
> proposal.
>
> Quoting https://github.com/tarantool/tarantool/issues/5447
> ---
> Option 3
> Downstream.lag is updated constantly until there are non-received ACKs.
> It becomes 0 when no ACKs to wait for. The difference with the option 2
> is that the update is literally continuous - each read of downstream.lag
> shows a bigger value until an ACK is received and corrects it.
>
> Pros: with long transactions it won't freeze for seconds, and would show the truth when not 0.
> Cons: the same as in the option 2. Also it is more complex to implement.
> ---
>
> Here is a code flow I've in mind

Hi! Thanks for working on this!

>
> ~~~
> master (stage 1)
> ================
>
> TX                        WAL                         RELAY
> --                        ---                         -----
> txn_commit
>    txn_limbo_append
>    journal_write       --> wal_write
>      fiber_yield()           ...
>                              [xrow.tm = 1]
>                              wal_write_to_disk
>                                wal_watcher_notify -->  recover_remaining_wals
>                        <--   fiber_up()                  recover_xlog
>                                                            relay_send_row
>                                                              relay_send [xrow.tm = 1, lag = arm to count]
>                                                              {remember in relay's wal_st}

What's wal_st? Is it a list of all sent out xrow.tms?

>    txn_limbo_wait_complete                                        |
>      (stage 1 complete,                                           |
>       waiting for data from                                       |
>       replica, to gather ACKs)                                    |
>                                                                   |
>                                                                   |
> replica (stage 2)                                                |
> =================   +--------------------------------------------+
>                     /
> TX                /               WAL                         RELAY
> --               |                ---                         -----
>             [xrow.tm = 1]
>                   |
>                   V
>
> applier_apply_tx
>    apply_plain_tx
>      txn_commit_try_async
>        journal_write_try_async --> wal_write_async
>                                      wal_write_to_disk
>                                        wal_watcher_notify -->  recover_remaining_wals
>                                                                  recover_xlog
>                                                                    relay_send_row
>                                                                    (filtered out)
>      applier_txn_wal_write_cb
>        [xrow.tm = 1] -> {remember in wal_st}
>
> finally transfer comes to applier_writer_f
>
> applier_writer_f
>    xrow_encode_vclock
>      {encode [xrow.tm = 1] from wal_st}
>      coio_write_xrow  -
>                        \
>                         \
> master (stage 3)       |
> ================       |
>                         |
> RELAY                  |
> -----                 /
> relay_reader_f    <--+
>    receive ack [xrow.tm = 1]
>      modify_relay_lag() (to implement)
>        armed value from stage 1 minus xrow.tm
>
> ~~~
>
> Once txn_commit() the pre-send stage is relay thread woken by
> the WAL thread where we catch rows to be send and if there is
> a sync transaction we remember the timestamp from first row
> somewhere in the relay structure, this timestamp is assigned
> by WAL thread itself right before flushing data to the disk.
>
> If user start reading box.info().relay.downstream.lag it will
> see increasing counter like [ev_now - xrow.tm] until ACK is received.
>
> Once ACK is obtained the lag set to some positive value [ev_now - xrow.tm].
> This value remains immutable until new sync transaction is sent. On new
> sync transaction we do the same -- asign value from row.tm and count
> time until ACK is received.

I'm not sure I understood it all correstly. Correct me if I'm wrong,
here's how I unerstand it:

You save xrow.tm in a list or something in relay, once sending the 
corresponding
row out.

When applier sends out an acks for the row, the ack contains the same 
xrow.tm received earlier.

Relay, upon receiving an ack, finds the corresponding xrow.tm in list 
and removes it.

Every time downstream.lag is read it is equal ev_now() - oldest xrow.tm 
in list.

Is this right?

-- 
Serge Petrenko