Date: Wed, 2 Jun 2021 01:02:49 +0300
From: Cyrill Gorcunov via Tarantool-patches
Reply-To: Cyrill Gorcunov
To: Vladislav Shpilevoy
Cc: TML
Subject: [Tarantool-patches] [RFC] on downstream.lag design
List-Id: Tarantool development patches

Guys,

I would like to discuss option 3 from the downstream.lag proposal.
Quoting https://github.com/tarantool/tarantool/issues/5447

---
Option 3

Downstream.lag is updated constantly until there are non-received ACKs.
It becomes 0 when no ACKs to wait for. The difference with the option 2
is that the update is literally continuous - each read of downstream.lag
shows a bigger value until an ACK is received and corrects it.

Pros: with long transactions it won't freeze for seconds, and would show
the truth when not 0.

Cons: the same as in the option 2. Also it is more complex to implement.
---

Here is the code flow I have in mind

~~~
master (stage 1)
================

TX                        WAL                       RELAY
--                        ---                       -----
txn_commit
  txn_limbo_append
  journal_write     -->   wal_write
  fiber_yield()           ...
                          [xrow.tm = 1]
                          wal_write_to_disk
                          wal_watcher_notify  -->   recover_remaining_wals
                    <--   fiber_up()                  recover_xlog
                                                        relay_send_row
                                                          relay_send
                                                          [xrow.tm = 1, lag = arm to count]
                                                          {remember in relay's wal_st}
  txn_limbo_wait_complete                                       |
  (stage 1 complete,                                            |
   waiting for data from                                        |
   replica, to gather ACKs)                                     |
                                                                |
                                                                |
replica (stage 2)                                               |
=================   +-------------------------------------------+
                   /
TX                /               WAL                       RELAY
--               |                ---                       -----
[xrow.tm = 1]    |
                 V
applier_apply_tx
  apply_plain_tx
    txn_commit_try_async
      journal_write_try_async --> wal_write_async
                                  wal_write_to_disk
                                  wal_watcher_notify  -->   recover_remaining_wals
                                                              recover_xlog
                                                                relay_send_row
                                                                (filtered out)
applier_txn_wal_write_cb
  [xrow.tm = 1] -> {remember in wal_st}

finally transfer comes to applier_writer_f

applier_writer_f
  xrow_encode_vclock
    {encode [xrow.tm = 1] from wal_st}
  coio_write_xrow -
                   \
                    \
master (stage 3)     |
================     |
                     |
RELAY                |
-----               /
relay_reader_f  <--+
  receive ack
  [xrow.tm = 1]
  modify_relay_lag() (to implement)
    armed value from stage 1 minus xrow.tm
~~~

Once txn_commit() is called, the pre-send stage is the relay thread
being woken by the WAL thread. There we catch the rows to be sent, and
if there is a sync transaction we remember the timestamp from its first
row somewhere in the relay structure. This timestamp is assigned by the
WAL thread itself right before flushing data to the disk.

If a user starts reading box.info().relay.downstream.lag, they will see
an increasing counter, [ev_now - xrow.tm], until the ACK is received.
Once the ACK is obtained, the lag is set to the positive value
[ev_now - xrow.tm] computed at that moment. This value remains immutable
until a new sync transaction is sent. On a new sync transaction we do
the same: assign the value from row.tm and count time until the ACK is
received.
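
To make the arm / count / freeze behaviour described above concrete, here is a minimal sketch in C. It is only an illustration of the idea, not Tarantool code: struct relay_lag and the relay_lag_*() helpers are hypothetical names, the current time is passed in explicitly instead of being taken from the event loop, and the sketch assumes only one sync transaction is awaiting an ACK at a time.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-relay lag state. This is NOT the real struct relay,
 * only the three values the description above needs.
 */
struct relay_lag {
	/* xrow.tm of the first row of the sync txn awaiting an ACK. */
	double armed_tm;
	/* True while the ACK for armed_tm is still outstanding. */
	bool is_armed;
	/* Lag frozen at the moment the last ACK arrived. */
	double last_lag;
};

/* Stage 1: the relay sends a sync txn and starts counting from its tm. */
static inline void
relay_lag_arm(struct relay_lag *lag, double xrow_tm)
{
	lag->armed_tm = xrow_tm;
	lag->is_armed = true;
}

/* Stage 3: the ACK brings xrow.tm back, freeze the lag at now - tm. */
static inline void
relay_lag_on_ack(struct relay_lag *lag, double ack_tm, double now)
{
	/* Nothing is armed, so there is nothing to correct. */
	if (!lag->is_armed)
		return;
	/* Both values come from the master's clock in this sketch. */
	assert(now >= ack_tm);
	lag->last_lag = now - ack_tm;
	lag->is_armed = false;
}

/* The value a read of downstream.lag would report at time `now`. */
static inline double
relay_lag_read(const struct relay_lag *lag, double now)
{
	/* While waiting for the ACK every read shows a bigger value. */
	if (lag->is_armed)
		return now - lag->armed_tm;
	/* After the ACK the value stays put until the next arm. */
	return lag->last_lag;
}
```

With this sketch, arming at tm = 10.0 and reading at 12.0 reports 2.0; an ACK arriving at 13.0 freezes the value at 3.0, and later reads keep returning 3.0 until the next sync transaction arms the counter again. Note that this follows the description in this mail (the value is kept after the ACK), while the quoted option 3 text says the lag becomes 0 when there are no ACKs to wait for; which of the two is wanted is part of what needs to be decided.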