From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 02B1D6EC5F; Fri, 30 Apr 2021 23:49:15 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 02B1D6EC5F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1619815755; bh=w8x47hEMX/6n7QsOJ3n3/h4w2kdmkEL84ir493J5ukI=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=O7bbggXzdceqiG3y1VZzi+PB2ElXItk98DkhIXii93KRqA4CuI+PkfHYcfafE6222 W9w734LLoo/rWyo5tcIb9+wOI0ZNjryIkGfMYwWGs3ZXGGGTRURXFnaYJ9Sm94y/G9 z9RRi4UXkcwVCPw/43/T3K2PeAchGe2Kcc94yvPw= Received: from smtp35.i.mail.ru (smtp35.i.mail.ru [94.100.177.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 36CA26EC5F for ; Fri, 30 Apr 2021 23:49:13 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 36CA26EC5F Received: by smtp35.i.mail.ru with esmtpa (envelope-from ) id 1lca4V-0008Rr-OU; Fri, 30 Apr 2021 23:49:12 +0300 To: Cyrill Gorcunov , tml References: <20210430153940.121271-1-gorcunov@gmail.com> <20210430153940.121271-3-gorcunov@gmail.com> Message-ID: <6984ca72-28ec-d86f-5653-c3beff626158@tarantool.org> Date: Fri, 30 Apr 2021 22:49:10 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: <20210430153940.121271-3-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD95978C26455E69BE0FB16CA32063744B94D060C3A2F6991A4182A05F538085040E73DBCB680395B93AEAB3733C3D1CED9BD9FC18CFCF76DCE7CBA15EDE4539607 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7F12ABE79F2AB44EAEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006370FDF1CE57EA9D07C8638F802B75D45FF914D58D5BE9E6BC1A93B80C6DEB9DEE97C6FB206A91F05B25D4AEA3C9A79E2D3BC5DDA1834B2CDCFEBA8952C4932A151D2E47CDBA5A96583C09775C1D3CA48CFE3800B164E348C91117882F4460429724CE54428C33FAD30A8DF7F3B2552694AC26CFBAC0749D213D2E47CDBA5A9658378DA827A17800CE7A46ADF0C403417779FA2833FD35BB23DF004C90652538430302FCEF25BFAB3454AD6D5ED66289B5278DA827A17800CE7F37C1F64ECBCA44B7B076A6E789B0E97A8DF7F3B2552694A1E7802607F20496D49FD398EE364050F1E561CDFBCA1751F500AC0B2F9B62304B3661434B16C20AC78D18283394535A9E827F84554CEF5019E625A9149C048EE9ECD01F8117BC8BEE2021AF6380DFAD18AA50765F790063735872C767BF85DA227C277FBC8AE2E8BB07C9E286C61B7F975ECD9A6C639B01B4E70A05D1297E1BBCB5012B2E24CD356 X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A2368A440D3B0F6089093C9A16E5BC824A2A04A2ABAA09D25379311020FFC8D4ADAC16E8A1BF356EE9F985BD6351FFA5B1 X-C1DE0DAB: 0D63561A33F958A5C73161D318A6902ACB8A0B96B4BE64058EE2A6D0CB8543A8D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7502E6951B79FF9A3F410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34E229BE567979C9404E6E3758D0108918C48797C5CEB2448180AD3E9B6F13A40B6BDB687259BF84C81D7E09C32AA3244C7C475795635969E47B74BF9C504ED179408A6A02710B730483B48618A63566E0 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj9hvEon6tehO2Zi9eHUpK7g== X-Mailru-Sender: 504CC1E875BF3E7D9BC0E5172ADA3110EB2C8C43034BAB51E2187E3722ABD60EC9B4BD4ACBE989A207784C02288277CA03E0582D3806FB6A5317862B1921BA260ED6CFD6382C13A6112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [RFC v3 2/3] applier: send first row's WAL time in the applier_writer_f X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Cc: Mons Anderson Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Thanks for working on this! On 30.04.2021 17:39, Cyrill Gorcunov wrote: > This fibers sends current vclock of the node to remote relay reader. > This packet carries xrow_header::tm field as a zero but we can reuse > it to provide information about first timestamp in a transaction we > wrote to our WAL. Since old instances of Tarantool simply ignore this > field such extension won't cause any problems. > > This timestamp will be needed to account lags of downstream replicas > suitable for information purpose and cluster health monitoring. > > We update applier statistics in WAL callbacks but since both > apply_synchro_row and apply_plain_tx are used not only in real data > application but in final join stage as well (in this stage we're not > yet writing the data) the apply_synchro_row is extended with a flag > pointing that applier update is needed. > > Same time the apply_plain_tx uses asynchronous WAL write completion > and at moment when the write procedure is finished the applier might > be removed from replicaset already thus we use applier's instance Did you mean instance id? > to lookup if it is still alive. > > The calculation of the downstream lag itself lag will be addressed One of the 'lag' words is redundant. > in next patch because sending the timestamp and its observation > are independent actions. See 6 comments below. > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 33181fdbf..626dc0324 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -751,11 +752,41 @@ applier_txn_rollback_cb(struct trigger *trigger, void *event) > return 0; > } > > +/** Applier WAL related statistics */ > +struct awstat { 1. Please, lets avoid such hard contractions. At first I thought you decided to do something related to AWS when saw this name. struct applier_lag would be just fine. > + uint32_t instance_id; > + double first_row_tm; > +}; > + > +static void > +awstat_update(struct awstat *awstat) > +{ > + /* Ignore if not needed */ > + if (awstat->instance_id == 0) > + return; 2. Why did you even allocate this stat if it is not needed? Maybe it would be better to have it NULL then and check for NULL? AFAIU these are the initial and final join cases. Did you try the way I proposed about waiting for all applier's WAL writes to end in applier_stop()? Does it look worse? After the fiber stop and wal_sync() it would be safe to assume there are no WAL writes in fly from this applier. But I don't know if it would look better. > + > + /* > + * Write to WAL happens in two contexts: as > + * synchronous writes and as asynchronous. In > + * second case the applier might be already > + * stopped and removed.> + */ > + struct replica *r = replica_by_id(awstat->instance_id); > + if (r == NULL && r->applier == NULL) 3. There is another idea - store the timestamp in struct replica. Then it is -1 dereference. Although you would need to call replica_by_id() before each ACK, but one ACK covers multiple transactions and it would mean less lookups than now. > + return; > + > + r->applier->first_row_wal_time = awstat->first_row_tm; 4. In case there was a batch of transactions written to WAL, the latest one will override the timestamp of the previous ones and this would make the lag incorrect, because you missed the older transactions. Exactly like when you tried to take a timestamp of the last row instead of the first row, but in a bigger scope. Unless I missed something. Probably you need to assign a new timestamp only when the old one is not 0, and reset it to 0 on each sent ACK. Don't know. > @@ -817,6 +852,12 @@ apply_synchro_row(struct xrow_header *row) > apply_synchro_row_cb, &entry); > entry.req = &req; > entry.owner = fiber(); > + if (use_awstat) { 5. You don't really need this flag. Because during joins applier's instance ID should be zero anyway. Therefore you would assign stat.instance_id = 0 in this case regardless of the flag. Also this means you don't need the entire applier struct. You only need the instance_id as an argument. I am saying this because these functions apply_* didn't not depend on a concrete applier object probably exactly because it could be deleted. Not having an applier pointer in their arguments could be intentional. > + entry.awstat.instance_id = applier->instance_id; > + entry.awstat.first_row_tm = row->tm; > + } else { > + entry.awstat.instance_id = 0; > + }> diff --git a/src/box/applier.h b/src/box/applier.h > index 15ca1fcfd..bd98827e7 100644 > --- a/src/box/applier.h > +++ b/src/box/applier.h > @@ -93,6 +93,11 @@ struct applier { > ev_tstamp last_row_time; > /** Number of seconds this replica is behind the remote master */ > ev_tstamp lag; > + /** > + * WAL time of first applied row in a transaction. > + * For relay statistics sake. 6. It is not a first applied row on the whole. Only for the latest transaction written to WAL. > + */ > + double first_row_wal_time; > /** The last box_error_code() logged to avoid log flooding */ > uint32_t last_logged_errcode; > /** Remote instance ID. */ >