Date: Wed, 5 May 2021 16:06:08 +0300
From: Cyrill Gorcunov via Tarantool-patches
Reply-To: Cyrill Gorcunov
To: Vladislav Shpilevoy
Cc: Mons Anderson, tml
Subject: Re: [Tarantool-patches] [RFC v3 2/3] applier: send first row's WAL time in the applier_writer_f
In-Reply-To: <6984ca72-28ec-d86f-5653-c3beff626158@tarantool.org>
References: <20210430153940.121271-1-gorcunov@gmail.com>
 <20210430153940.121271-3-gorcunov@gmail.com>
 <6984ca72-28ec-d86f-5653-c3beff626158@tarantool.org>

On Fri, Apr 30, 2021 at 10:49:10PM +0200, Vladislav Shpilevoy wrote:
> >
> > Same time the apply_plain_tx uses asynchronous WAL write completion
> > and at moment when the write procedure is finished the applier might
> > be removed from replicaset already thus we use applier's instance
>
> Did you mean instance id?

Yes.

> > to lookup if it is still alive.
> >
> > The calculation of the downstream lag itself lag will be addressed
>
> One of the 'lag' words is redundant.

Thanks for catching!

> > +/** Applier WAL related statistics */
> > +struct awstat {
>
> 1. Please, let's avoid such hard contractions. At first I
> thought you decided to do something related to AWS when I saw
> this name.
>
>         struct applier_lag
>
> would be just fine.

Sure, will update.

> > +        uint32_t instance_id;
> > +        double first_row_tm;
> > +};
> > +
> > +static void
> > +awstat_update(struct awstat *awstat)
> > +{
> > +        /* Ignore if not needed */
> > +        if (awstat->instance_id == 0)
> > +                return;
>
> 2. Why did you even allocate this stat if it is not needed?
> Maybe it would be better to have it NULL then and check
> for NULL?

AFAIU these are the initial and final join cases. They are allocated
on the stack together with synchro_entry, so there is no penalty.
After your comment I've changed it to

struct synchro_entry {
        /** Request to process when WAL write is done. */
        struct synchro_request *req;
        /** Fiber created the entry. To wakeup when WAL write is done. */
        struct fiber *owner;
        /** WAL bound statistics. */
        struct applier_lag *applier_lag;
        /**
         * The base journal entry. It has an unsized array and thus must
         * be the last member in the structure. We can work around that
         * via a union, adding the needed tail as char[].
         */
        union {
                struct journal_entry base;
                char base_buf[sizeof(base) + sizeof(base.rows[0])];
        };
};

and then allocate applier_lag on the stack and assign the pointer if
needed:

static int
apply_synchro_row(struct applier *applier, struct xrow_header *row,
                  bool use_awstat)
{
        ...
        struct applier_lag applier_lag;
        struct synchro_entry entry;
        ...
        if (use_awstat) {
                applier_lag.instance_id = applier->instance_id;
                applier_lag.first_row_tm = row->tm;
                entry.applier_lag = &applier_lag;
        } else {
                entry.applier_lag = NULL;
        }
}

Strictly speaking there is no difference.

> Did you try the way I proposed about waiting for all applier's
> WAL writes to end in applier_stop()? Does it look worse? After
> the fiber stop and wal_sync() it would be safe to assume there
> are no WAL writes in flight from this applier. But I don't know if
> it would look better.

I thought about it a lot. And you know, I don't really like what we
are about to implement - currently applier_stop() doesn't wait for the
journal to finish its writes. The main applier reader spins in a
!fiber_is_cancelled() cycle in a polling way, while the applier tries
to read new data from the remote relay peer. If the peer doesn't reply
for some reason, we throw an exception which is caught by the caller
code, and the caller starts a new iteration, testing whether the fiber
is cancelled. If reconfiguration happens (and timeouts are set to
defaults), we try to prune old appliers by calling fiber_join() on
them, which means fiber_join() won't exit until a new
fiber_is_cancelled() iteration gets processed. With default configs
this could take up to replication_disconnect_timeout(), i.e. 1 second
by default.
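To illustrate the loop shape I mean - a rough sketch only, not the
real applier code (the function and exception names are approximate):

/*
 * Rough sketch of the applier reader fiber loop described above;
 * names are approximate, this is not the actual applier.cc code.
 */
static int
applier_f(va_list ap)
{
        struct applier *applier = va_arg(ap, struct applier *);
        while (!fiber_is_cancelled()) {
                try {
                        /* May block on the peer up to the timeout. */
                        applier_subscribe(applier);
                } catch (Exception *e) {
                        /*
                         * The error is handled here and we simply go
                         * to the next iteration, so fiber_join() in
                         * applier_stop() waits for the next
                         * fiber_is_cancelled() check - up to
                         * replication_disconnect_timeout().
                         */
                }
        }
        return 0;
}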
In turn, if we bind to journal write completion, this becomes
uncontrollable, because the journal may hang as long as it wishes and
we would keep spinning in a waiting cycle - in applier_stop() we would
have to implement some kind of reference counting, modified on journal
completion, and I think this makes the code even more complex, since
we have to add additional logic for when the applier is allowed to
cancel.

> > +
> > +        /*
> > +         * Write to WAL happens in two contexts: as
> > +         * synchronous writes and as asynchronous. In
> > +         * second case the applier might be already
> > +         * stopped and removed.
> > +         */
> > +        struct replica *r = replica_by_id(awstat->instance_id);
> > +        if (r == NULL || r->applier == NULL)
>
> 3. There is another idea - store the timestamp in struct replica.
> Then it is -1 dereference. Although you would need to call
> replica_by_id() before each ACK, but one ACK covers multiple
> transactions and it would mean fewer lookups than now.

I like this idea, let me try to implement it.

> > +                return;
> > +
> > +        r->applier->first_row_wal_time = awstat->first_row_tm;
>
> 4. In case there was a batch of transactions written to WAL,
> the latest one will override the timestamp of the previous ones and
> this would make the lag incorrect, because you missed the older
> transactions. Exactly like when you tried to take a timestamp of
> the last row instead of the first row, but in a bigger scope.
> Unless I missed something.

I'm not sure I follow you here. Say we have a batch of transactions.
The update happens on every journal_entry completion; if several
entries are flushed, their completions are called in order (one
followed by another). The update happens in the same tx thread where
the appliers are running, which means the ack sending procedure is
ordered relative to the update calls. Thus we may have a situation
where we complete the first entry and then either send it in an ack
message, or update to a new value and only then send the ack.

As far as I understand, there might be a gap in fiber scheduling
before several journal entries get completed. Thus the lag calculated
on the relay side will be bigger for some moment, but on the next ack
it will shrink to the latest written journal entry, because the relay
uses the current time minus the time value obtained from the applier's
ack. Hopefully I didn't miss something either.

> Probably you need to assign a new timestamp only when the old
> one is not 0, and reset it to 0 on each sent ACK.

I don't know. Give me some time to investigate this vague moment.

> > @@ -817,6 +852,12 @@ apply_synchro_row(struct xrow_header *row)
> >                           apply_synchro_row_cb, &entry);
> >          entry.req = &req;
> >          entry.owner = fiber();
> > +        if (use_awstat) {
>
> 5. You don't really need this flag. Because during joins applier's
> instance ID should be zero anyway. Therefore you would assign
> stat.instance_id = 0 in this case regardless of the flag.
>
> Also this means you don't need the entire applier struct. You only
> need the instance_id as an argument. I am saying this because these
> apply_* functions did not depend on a concrete applier object,
> probably exactly because it could be deleted. Not having an applier
> pointer in their arguments could be intentional.

Sounds reasonable, thanks!
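I.e., the function could take just the id - a rough sketch of what I
plan, untested, mirroring the snippet above:

static int
apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
{
        ...
        struct applier_lag applier_lag;
        struct synchro_entry entry;
        ...
        /*
         * During initial/final join replica_id is 0, so no
         * separate use_awstat flag is needed.
         */
        if (replica_id != 0) {
                applier_lag.instance_id = replica_id;
                applier_lag.first_row_tm = row->tm;
                entry.applier_lag = &applier_lag;
        } else {
                entry.applier_lag = NULL;
        }
        ...
}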
> > @@ -93,6 +93,11 @@ struct applier {
> >          ev_tstamp last_row_time;
> >          /** Number of seconds this replica is behind the remote master */
> >          ev_tstamp lag;
> > +        /**
> > +         * WAL time of first applied row in a transaction.
> > +         * For relay statistics sake.
>
> 6. It is not the first applied row on the whole - only the one of
> the latest transaction written to WAL.

+1

Thanks a lot for the comments, Vlad!