From mboxrd@z Thu Jan  1 00:00:00 1970
To: Cyrill Gorcunov, tml
References: <20210607155519.109626-1-gorcunov@gmail.com> <20210607155519.109626-2-gorcunov@gmail.com>
Message-ID: <22738ee6-74e1-0090-4eb0-c08183bb16e8@tarantool.org>
Date: Tue, 15 Jun 2021 12:36:02 +0300
In-Reply-To: <20210607155519.109626-2-gorcunov@gmail.com>
Subject: Re: [Tarantool-patches] [PATCH v8 1/2] applier: send transaction's first row WAL time in the applier_writer_f
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Cc: Vladislav Shpilevoy

07.06.2021 18:55, Cyrill Gorcunov wrote:
> Applier fiber sends current vclock of the node to remote relay reader,
> pointing current state of fetched WAL data so the relay will know which
> new data should be sent. The packet applier sends carries xrow_header::tm
> field as a zero but we can reuse it to provide information about first
> timestamp in a transaction we wrote to our WAL. Since old instances of
> Tarantool simply ignore this field such extension won't cause any
> problems.
>
> The timestamp will be needed to account lag of downstream replicas
> suitable for information purpose and cluster health monitoring.
>
> We update applier statistics in WAL callbacks but since both
> apply_synchro_row and apply_plain_tx are used not only in real data
> application but in final join stage as well (in this stage we're not
> writing the data yet) the apply_synchro_row is extended with replica_id
> argument which is non zero when applier is subscribed.
>
> The calculation of the downstream lag itself lag will be addressed
> in next patch because sending the timestamp and its observation
> are independent actions.

Hi! Thanks for the patch!

Looks good generally. Please find one question below.

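As I read the commit message above, the ack keeps tm at zero unless the applier has a WAL timestamp to report, and old peers ignore the field either way. A minimal standalone sketch of that behavior, only to make the discussion concrete. Replica, AckRow, ReplicaRegistry and make_ack are hypothetical stand-ins I made up for illustration; they are not Tarantool's real definitions:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the per-replica state; not Tarantool's struct replica.
struct Replica {
	// Timestamp of the first row of the last transaction written to WAL.
	double applier_txn_start_tm = 0.0;
};

// Hypothetical stand-in for the ack packet; only the tm field matters here.
struct AckRow {
	double tm = 0.0; // stays zero when there is nothing to report
};

// Hypothetical registry playing the role of a lookup by replica ID.
class ReplicaRegistry {
public:
	void add(uint32_t id, double txn_start_tm) {
		replicas_[id].applier_txn_start_tm = txn_start_tm;
	}
	void remove(uint32_t id) { replicas_.erase(id); }
	// Returns nullptr when the replica is no longer registered.
	const Replica *by_id(uint32_t id) const {
		auto it = replicas_.find(id);
		return it == replicas_.end() ? nullptr : &it->second;
	}
private:
	std::unordered_map<uint32_t, Replica> replicas_;
};

// Build one ack: look the replica up again on every call and fill tm
// only if the lookup succeeds, otherwise leave it at zero.
AckRow make_ack(const ReplicaRegistry &registry, uint32_t replica_id) {
	AckRow ack;
	const Replica *r = registry.by_id(replica_id);
	if (r != nullptr)
		ack.tm = r->applier_txn_start_tm;
	return ack;
}
```

In this sketch a removed replica simply yields an ack with tm equal to zero, which is the same value the field carried before the patch.
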
>
> Part-of #5447
>
> Signed-off-by: Cyrill Gorcunov
> ---
>  src/box/applier.cc     | 90 +++++++++++++++++++++++++++++++++++-------
>  src/box/replication.cc |  1 +
>  src/box/replication.h  |  5 +++
>  3 files changed, 81 insertions(+), 15 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 33181fdbf..38695a54f 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -163,6 +163,9 @@ applier_writer_f(va_list ap)
>  	struct ev_io io;
>  	coio_create(&io, applier->io.fd);
>
> +	/* ID is permanent while applier is alive */
> +	uint32_t replica_id = applier->instance_id;
> +
>  	while (!fiber_is_cancelled()) {
>  		/*
>  		 * Tarantool >= 1.7.7 sends periodic heartbeat
> @@ -193,6 +196,16 @@ applier_writer_f(va_list ap)
>  			applier->has_acks_to_send = false;
>  			struct xrow_header xrow;
>  			xrow_encode_vclock(&xrow, &replicaset.vclock);
> +			/*
> +			 * For relay lag statistics we report last
> +			 * written transaction timestamp in tm field.
> +			 *
> +			 * Replica might be dead already so we have to
> +			 * test on each iteration.
> +			 */
> +			struct replica *r = replica_by_id(replica_id);
> +			if (likely(r != NULL))
> +				xrow.tm = r->applier_txn_start_tm;

How could a replica be dead here?
AFAIR we delete a replica only when it's deleted from _cluster. Shouldn't
the applier writer be dead as well by that time?

>  			coio_write_xrow(&io, &xrow);
>  			ERROR_INJECT(ERRINJ_APPLIER_SLOW_ACK, {
>  				fiber_sleep(0.01);
> @@ -490,7 +503,7 @@ static uint64_t
>  applier_read_tx(struct applier *applier, struct stailq *rows, double timeout);
>
>  static int
> -apply_final_join_tx(struct stailq *rows);
> +apply_final_join_tx(uint32_t replica_id, struct stailq *rows);
>
>  /**
>   * A helper struct to link xrow objects in a list.
> @@ -535,7 +548,7 @@ applier_wait_register(struct applier *applier, uint64_t row_count)
>  						next)->row);
>  			break;
>  		}
> -		if (apply_final_join_tx(&rows) != 0)
> +		if (apply_final_join_tx(applier->instance_id, &rows) != 0)
>  			diag_raise();
>  	}
>
> @@ -751,11 +764,35 @@ applier_txn_rollback_cb(struct trigger *trigger, void *event)
>  	return 0;
>  }
>
> +struct replica_cb_data {
> +	/** Replica ID the data belongs to. */
> +	uint32_t replica_id;
> +	/**
> +	 * Timestamp of a transaction to be accounted
> +	 * for relay lag. Usually it is a first row in
> +	 * a transaction.
> +	 */
> +	double txn_start_tm;
> +};
> +
>

-- 
Serge Petrenko