Date: Thu, 6 May 2021 01:19:55 +0300
From: Cyrill Gorcunov via Tarantool-patches
To: Vladislav Shpilevoy
Cc: Mons Anderson, tml
Subject: Re: [Tarantool-patches] [RFC v3 2/3] applier: send first row's WAL time in the applier_writer_f
In-Reply-To: <5c0161b3-0f00-901d-1c94-f40915fe6f6b@tarantool.org>

On Wed, May 05, 2021 at 10:47:23PM +0200, Vladislav Shpilevoy wrote:
> >>> +static void
> >>> +awstat_update(struct awstat *awstat)
> >>> +{
> >>> +	/* Ignore if not needed */
> >>> +	if (awstat->instance_id == 0)
> >>> +		return;
> >>
> >> 2. Why did you even allocate this stat if it is not needed?
> >> Maybe it would be better to have it NULL then and check
> >> for NULL? AFAIU these are the initial and final join cases.
> >
> > They are allocated on the stack together with synchro_entry,
> > so there is no penalty.
>
> They are allocated on the region for plain transactions, which
> are way more important to optimize.

I thought you meant the synchro entry. Anyway, I changed the code so that
this is no longer called for synchro entries in the "join" stages. I'll
send a new version for review (hopefully soon).

> >> Did you try the way I proposed about waiting for all applier's
> >> WAL writes to end in applier_stop()? Does it look worse? After
> >> the fiber stop and wal_sync() it would be safe to assume there
> >> are no WAL writes in flight from this applier. But I don't know if
> >> it would look better.
> >
> > I thought about it a lot. And you know, I don't really like what we
> > are to implement:
> >
> > - currently applier_stop() doesn't wait for the journal to finish its
> >   write. The main applier reader is spinning in a !fiber_is_cancelled()
> >   cycle in a polling way while the applier tries to read new data from
> >   the remote relay peer. If the peer doesn't reply for some reason, we
> >   throw an exception which is caught by the caller code, and the caller
> >   iterates a new cycle testing whether the fiber is cancelled.
>
> I do not follow here. You needed to wait only for the pending journal
> writes. It has nothing to do with reading new data from anywhere, does
> it?

The pending writes may take too much time. And if we need to stop the
applier, such a "wait" procedure brings a penalty: right now, if we call
applier_stop(), in the worst case it waits up to 1 second (with the
default config) before the applier reader is joined. And we don't know
yet for which reasons (other than a cfg reconfig) we will need to stop
appliers in the future. This is the main reason why I dropped the idea of
introducing a "wait until the async write completes" approach.

> > - in applier_stop() we will have to implement some kind of reference
> >   counting, which would be modified on journal completion, and I think
> >   this makes the code even more complex, since we have to add some
> >   additional logic for when the applier is allowed to cancel.
>
> We don't really. You only need to call wal_sync() after which you can be
> sure all the WAL writes started before this call are now finished. At
> least AFAIR it works. But these refs might be needed for assertions, so
> yes, maybe not a good idea.

Yes, wal_sync() could do the trick (I must confess I forgot about it). So
if we put aside the code with refs + assertions, how would the final code
flow look? I think it might be something like:

    applier_reader
        apply_plain_tx
            save tm in applier::some-member

    applier_writer
        read applier::some-member

    applier_stop
        wal_sync

but for debugging purposes we will definitely need the binding with refs,
which is not that suitable.
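To make the idea a bit more concrete, here is a standalone toy sketch of
that flow (every name in it is a placeholder made up for illustration; it
is not the real applier/wal code or API):

/*
 * Toy model of the proposed flow. All names below are placeholders.
 */
#include <stdio.h>

struct toy_applier {
	double first_row_wal_time;	/* the "applier::some-member" above */
	int writes_in_flight;		/* journal entries not yet completed */
};

/* Journal completion callback: remember the first row's WAL time. */
static void
toy_on_wal_write_done(struct toy_applier *a, double first_row_tm)
{
	a->first_row_wal_time = first_row_tm;
	a->writes_in_flight--;
}

/* applier_writer_f counterpart: the ACK carries the saved time. */
static double
toy_ack_tm(const struct toy_applier *a)
{
	return a->first_row_wal_time;
}

/* applier_stop counterpart: drain the pending writes; in the real code
 * this draining is what a single wal_sync() call would stand in for. */
static void
toy_applier_stop(struct toy_applier *a)
{
	while (a->writes_in_flight > 0)
		a->writes_in_flight--;
}

int
main(void)
{
	struct toy_applier a = { .first_row_wal_time = 0.0, .writes_in_flight = 2 };
	toy_on_wal_write_done(&a, 100.5);
	printf("ACK would carry tm = %.1f\n", toy_ack_tm(&a));
	toy_applier_stop(&a);
	printf("writes in flight after stop = %d\n", a.writes_in_flight);
	return 0;
}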
> >>> +	return;
> >>> +
> >>> +	r->applier->first_row_wal_time = awstat->first_row_tm;
> >>
> >> 4. In case there was a batch of transactions written to WAL,
> >> the latest one will override the timestamp of the previous ones and
> >> this would make the lag incorrect, because you missed the older
> >> transactions. Exactly like when you tried to take a timestamp of
> >> the last row instead of the first row, but in a bigger scope.
> >> Unless I missed something.
> >
> > I'm not sure I follow you here. Say we have a batch of transactions.
> > The update happens on every journal_entry completion; if several
> > entries are flushed, the completions are called in order (one
> > followed by another).
>
> Exactly. The latest update will override the previous timestamps.
>
> > The update happens in the same tx thread where
> > the appliers are running, which means the ack sending procedure is
> > ordered relative to the update calls. Thus we may have a situation
> > where we complete the first entry and then either send it in an ack
> > message or update it to a new value and only then send the ack.
>
> In one event loop iteration many transactions might be submitted to
> WAL by the applier. And they will also end in one event loop iteration
> later. There won't even be any yields between their completions.
>
> Yes, the first of them will wake up the ack-sender fiber, but the next
> ones will override the first timestamp. After a yield the ACK will
> contain the newest timestamp, ignoring the timestamps of the older
> transactions.

Seems I get what bothers you. Let me think about it.
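Just to convince myself with numbers, a tiny standalone model of the
effect (the numbers are made up, and the ACK send time stands in for
whatever clock the lag would be computed against):

/* Toy numbers: three txns written to WAL in one event loop iteration. */
#include <stdio.h>

int
main(void)
{
	/* First-row WAL times of the three txns, in seconds. */
	double first_row_tm[3] = { 10.0, 10.2, 10.4 };
	/* The ack-sender fiber runs only after all three completions. */
	double ack_sent_at = 11.0;

	/* Overriding on every completion leaves only the newest time. */
	double tm_override = first_row_tm[2];
	/* Keeping the oldest not-yet-acked time preserves the real latency. */
	double tm_oldest = first_row_tm[0];

	printf("lag if ACK carries newest tm: %.1f s\n", ack_sent_at - tm_override);
	printf("lag if ACK carries oldest tm: %.1f s\n", ack_sent_at - tm_oldest);
	return 0;
}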
> > As far as I understand
> > there might be a gap in fiber scheduling before several journal entries
> > get completed. Thus the lag calculated on the relay side will be bigger
> > for some moment but on the next ack will shrink to the latest written
> > journal entry.
>
> On the contrary, it will be smaller on the other side than it really is.
> Because you send the ACK with the latest timestamp instead of the
> oldest one. Smaller would be misleading about the real latency.

Agreed, need to think. Thanks!