From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: Cyrill Gorcunov, tml
Cc: Vladislav Shpilevoy
Subject: Re: [Tarantool-patches] [PATCH v4] qsync: provide box.info.synchro interface for monitoring
Date: Thu, 8 Apr 2021 15:58:38 +0300
Message-ID: <75632023-f2e4-b038-f800-990181566e64@tarantool.org>
In-Reply-To: <20210408121813.1633911-1-gorcunov@gmail.com>
References: <20210408121813.1633911-1-gorcunov@gmail.com>
List-Id: Tarantool development patches

08.04.2021 15:18, Cyrill Gorcunov wrote:
> In commit 14fa5fd82 (cfg: support symbolic evaluation of
> replication_synchro_quorum) we implemented support of
> symbolic evaluation of `replication_synchro_quorum` parameter
> and there is no easy way to obtain its current run-time value,
> i.e. the evaluated number value.
>
> Moreover we would like to fetch queue length on transaction
> limbo for tests and extend these statistics in future. Thus
> let's add them.
>
> Closes #5191

Thanks for the fixes!

Please restore the `box.info.synchro.quorum` assertions in the test
that covers quorum evaluation, as you had them in the previous patch
version.

Other than that, LGTM.

>
> Signed-off-by: Cyrill Gorcunov
>
> @TarantoolBot document
> Title: Provide `box.info.synchro` interface
>
> The `box.info.synchro` leaf provides information about details of
> synchronous replication.
>
> In particular `quorum` represents the current value of synchronous
> replication quorum defined by `replication_synchro_quorum`
> configuration parameter because it can be set as dynamic formula
> such as `N/2+1` and the value depends on current number of replicas.
>
> Since synchronous replication does not commit data immediately
> but waits for its propagation to replicas the data sits in a queue
> gathering `commit` responses from remote nodes. Current number of
> entries waiting in the queue is shown via `queue.len` member.
>
> A typical output is the following
>
> ``` Lua
> tarantool> box.info.synchro
> ---
> - queue:
>     len: 0
>   quorum: 1
> ...
> ```
>
> The `len` member shows current number of entries in the queue.
> And the `quorum` member shows an evaluated value of
> `replication_synchro_quorum` parameter.
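
For illustration only, the two values described in the documentation
request above can be modelled in a few lines of standalone C. This is a
rough sketch, not the Tarantool code: `struct limbo_model`,
`limbo_model_create()`, `limbo_model_append()`, `limbo_model_remove()`
and `quorum_eval()` are hypothetical names, and `quorum_eval()`
hard-codes the `N/2+1` example formula instead of evaluating an
arbitrary expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct txn_limbo: only the counter is modelled. */
struct limbo_model {
	/* Number of entries currently waiting in the queue. */
	int64_t len;
};

void
limbo_model_create(struct limbo_model *limbo)
{
	limbo->len = 0;
}

/* A synchronous transaction enters the queue. */
void
limbo_model_append(struct limbo_model *limbo)
{
	limbo->len++;
}

/* A transaction leaves the queue, either committed or rolled back. */
void
limbo_model_remove(struct limbo_model *limbo)
{
	assert(limbo->len > 0);
	limbo->len--;
}

/*
 * Assumption: the quorum formula is N/2+1 with integer division,
 * where N is the current number of replicas.
 */
int
quorum_eval(int replica_count)
{
	return replica_count / 2 + 1;
}
```

With this model, an append followed by a remove brings `len` back to 0,
which mirrors the 0, 1, 0 sequence the new test waits for, and three
replicas give a quorum of 2.
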
> ---
> issue https://github.com/tarantool/tarantool/issues/5191
> branch gorcunov/gh-5191-qsync-stat-4
>
>  changelogs/unreleased/box-info-limbo.md |  4 +++
>  src/box/lua/info.c                      | 22 ++++++++++++++
>  src/box/txn_limbo.c                     |  5 +++-
>  src/box/txn_limbo.h                     |  4 +++
>  test/box/info.result                    |  1 +
>  test/replication/qsync_basic.result     | 40 +++++++++++++++++++++----
>  test/replication/qsync_basic.test.lua   | 21 +++++++++++--
>  7 files changed, 89 insertions(+), 8 deletions(-)
>  create mode 100644 changelogs/unreleased/box-info-limbo.md
>
> diff --git a/changelogs/unreleased/box-info-limbo.md b/changelogs/unreleased/box-info-limbo.md
> new file mode 100644
> index 000000000..0f75a911d
> --- /dev/null
> +++ b/changelogs/unreleased/box-info-limbo.md
> @@ -0,0 +1,4 @@
> +## feature/core
> +
> +* Provide information about state of synchronous replication via
> +  `box.info.synchro` interface (gh-5191).
> diff --git a/src/box/lua/info.c b/src/box/lua/info.c
> index 8cd379756..7c63c8f5e 100644
> --- a/src/box/lua/info.c
> +++ b/src/box/lua/info.c
> @@ -50,6 +50,7 @@
>  #include "version.h"
>  #include "box/box.h"
>  #include "box/raft.h"
> +#include "box/txn_limbo.h"
>  #include "lua/utils.h"
>  #include "fiber.h"
>  #include "sio.h"
> @@ -599,6 +600,26 @@ lbox_info_election(struct lua_State *L)
>  	return 1;
>  }
>
> +static int
> +lbox_info_synchro(struct lua_State *L)
> +{
> +	lua_createtable(L, 0, 2);
> +
> +	/* Quorum value may be evaluated via formula */
> +	lua_pushinteger(L, replication_synchro_quorum);
> +	lua_setfield(L, -2, "quorum");
> +
> +	/* Queue information. */
> +	struct txn_limbo *queue = &txn_limbo;
> +	lua_createtable(L, 0, 1);
> +	lua_pushnumber(L, queue->len);
> +	lua_setfield(L, -2, "len");
> +	lua_setfield(L, -2, "queue");
> +
> +	return 1;
> +}
> +
> +
>  static const struct luaL_Reg lbox_info_dynamic_meta[] = {
>  	{"id", lbox_info_id},
>  	{"uuid", lbox_info_uuid},
> @@ -618,6 +639,7 @@ static const struct luaL_Reg lbox_info_dynamic_meta[] = {
>  	{"sql", lbox_info_sql},
>  	{"listen", lbox_info_listen},
>  	{"election", lbox_info_election},
> +	{"synchro", lbox_info_synchro},
>  	{NULL, NULL}
>  };
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index cf0ad9350..a22e0861a 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -41,6 +41,7 @@ static inline void
>  txn_limbo_create(struct txn_limbo *limbo)
>  {
>  	rlist_create(&limbo->queue);
> +	limbo->len = 0;
>  	limbo->owner_id = REPLICA_ID_NIL;
>  	fiber_cond_create(&limbo->wait_cond);
>  	vclock_create(&limbo->vclock);
> @@ -118,6 +119,7 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
>  	e->is_commit = false;
>  	e->is_rollback = false;
>  	rlist_add_tail_entry(&limbo->queue, e, in_queue);
> +	limbo->len++;
>  	/*
>  	 * We added new entries from a remote instance to an empty limbo.
>  	 * Time to make this instance read-only.
> @@ -132,8 +134,8 @@ txn_limbo_remove(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  {
>  	assert(!rlist_empty(&entry->in_queue));
>  	assert(txn_limbo_first_entry(limbo) == entry);
> -	(void) limbo;
>  	rlist_del_entry(entry, in_queue);
> +	limbo->len--;
>  }
>
>  static inline void
> @@ -144,6 +146,7 @@ txn_limbo_pop(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  	assert(entry->is_rollback);
>
>  	rlist_del_entry(entry, in_queue);
> +	limbo->len--;
>  	++limbo->rollback_count;
>  }
>
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> index af0addf8d..f2a98c8bb 100644
> --- a/src/box/txn_limbo.h
> +++ b/src/box/txn_limbo.h
> @@ -94,6 +94,10 @@ struct txn_limbo {
>  	 * them LSNs in the same order.
>  	 */
>  	struct rlist queue;
> +	/**
> +	 * Number of entries in limbo queue.
> +	 */
> +	int64_t len;
>  	/**
>  	 * Instance ID of the owner of all the transactions in the
>  	 * queue. Strictly speaking, nothing prevents to store not
> diff --git a/test/box/info.result b/test/box/info.result
> index c8037818b..3ee653773 100644
> --- a/test/box/info.result
> +++ b/test/box/info.result
> @@ -89,6 +89,7 @@ t
>    - signature
>    - sql
>    - status
> +  - synchro
>    - uptime
>    - uuid
>    - vclock
> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> index bd3c3cce1..3457d2cc9 100644
> --- a/test/replication/qsync_basic.result
> +++ b/test/replication/qsync_basic.result
> @@ -637,12 +637,46 @@ box.space.sync:count()
>   | - 0
>   | ...
>
> --- Cleanup.
> +--
> +-- gh-5191: test box.info.synchro interface. For
> +-- this sake we stop the replica and initiate data
> +-- write in sync space which won't pass due to timeout.
> +-- While we're sitting in a wait cycle the queue should
> +-- not be empty.
> +--
> +-- Make sure this test is the *LAST* one since we stop
> +-- the replica node and never restart it back before the
> +-- cleanup procedure, also we're spinning on default node
> +-- and do not switch to other nodes.
> +--
>  test_run:cmd('switch default')
>   | ---
>   | - true
>   | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +assert(box.info.synchro.queue.len == 0)
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_timeout = 2}
> + | ---
> + | ...
> +f = fiber.new(function() box.space.sync:insert{1024} end)
> + | ---
> + | ...
> +test_run:wait_cond(function() return box.info.synchro.queue.len == 1 end)
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return box.info.synchro.queue.len == 0 end)
> + | ---
> + | - true
> + | ...
>
> +-- Cleanup
>  box.cfg{ \
>      replication_synchro_quorum = old_synchro_quorum, \
>      replication_synchro_timeout = old_synchro_timeout, \
> @@ -650,10 +684,6 @@ box.cfg{
>  }
>   | ---
>   | ...
> -test_run:cmd('stop server replica')
> - | ---
> - | - true
> - | ...
>  test_run:cmd('delete server replica')
>   | ---
>   | - true
> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
> index 94235547d..a604d80ee 100644
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -248,15 +248,32 @@ for i = 1, 100 do box.space.sync:delete{i} end
>  test_run:cmd('switch replica')
>  box.space.sync:count()
>
> --- Cleanup.
> +--
> +-- gh-5191: test box.info.synchro interface. For
> +-- this sake we stop the replica and initiate data
> +-- write in sync space which won't pass due to timeout.
> +-- While we're sitting in a wait cycle the queue should
> +-- not be empty.
> +--
> +-- Make sure this test is the *LAST* one since we stop
> +-- the replica node and never restart it back before the
> +-- cleanup procedure, also we're spinning on default node
> +-- and do not switch to other nodes.
> +--
>  test_run:cmd('switch default')
> +test_run:cmd('stop server replica')
> +assert(box.info.synchro.queue.len == 0)
> +box.cfg{replication_synchro_timeout = 2}
> +f = fiber.new(function() box.space.sync:insert{1024} end)
> +test_run:wait_cond(function() return box.info.synchro.queue.len == 1 end)
> +test_run:wait_cond(function() return box.info.synchro.queue.len == 0 end)
>
> +-- Cleanup
>  box.cfg{ \
>      replication_synchro_quorum = old_synchro_quorum, \
>      replication_synchro_timeout = old_synchro_timeout, \
>      replication_timeout = old_timeout, \
>  }
> -test_run:cmd('stop server replica')
>  test_run:cmd('delete server replica')
>  box.space.test:drop()
>  box.space.sync:drop()

--
Serge Petrenko