From: Sergey Petrenko via Tarantool-patches
Reply-To: Sergey Petrenko
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com
Date: Sun, 25 Jul 2021 21:31:51 +0300
Subject: Re: [Tarantool-patches] [PATCH 1/1] replication: set replica ID before _cluster commit
List-Id: Tarantool development patches

On 25.07.2021 19:53, Vladislav Shpilevoy wrote:
> Replica registration works via looking for the smallest not
> occupied ID in _cluster and inserting it into the space.
>
> It works not so good when mvcc is enabled. In particular, if more
> than 1 replica try to register at the same time, they might get
> the same replica_id because don't see changes of each other until
> the registration in _cluster is complete.
>
> This in the end leads to all replicas failing the registration
> except one with the 'duplicate key' error (primary index in
> _cluster is replica ID).
>
> The patch makes the replicas occupy their ID before they commit it
> into _cluster. And new replica ID search now uses the replica ID
> map instead of _cluster iterator.
>
> This way the registration works like before - like MVCC does not
> exist which is fine.
>
> Part of #5430
> ---

Hi! Thanks for the patch!

Please find a couple of comments below.
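
To make the race from the commit message concrete, here is a minimal sketch in plain C. It is not Tarantool code: `struct registry`, `ID_MAX`, and every function name below are made up for illustration. `committed[]` stands in for the committed rows of _cluster, and `occupied[]` stands in for the in-memory replica ID map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VCLOCK_MAX; the real value is not shown here. */
enum { ID_MAX = 32 };

/* Illustrative model only, not a Tarantool structure. */
struct registry {
	/* IDs visible as committed rows of _cluster. */
	bool committed[ID_MAX];
	/* IDs taken in memory, including not yet committed ones. */
	bool occupied[ID_MAX];
};

/*
 * Old scheme: pick the smallest ID missing from the committed rows.
 * A transaction that cannot see concurrent uncommitted inserts gets
 * the same answer as its neighbour. Returns 0 when no ID is left.
 */
static uint32_t
pick_id_from_committed(const struct registry *r)
{
	for (uint32_t i = 1; i < ID_MAX; i++) {
		if (!r->committed[i])
			return i;
	}
	return 0;
}

/*
 * New scheme: pick the smallest ID missing from the in-memory map
 * and occupy it right away, before the commit. Returns 0 when no ID
 * is left.
 */
static uint32_t
pick_id_and_occupy(struct registry *r)
{
	for (uint32_t i = 1; i < ID_MAX; i++) {
		if (!r->occupied[i]) {
			r->occupied[i] = true;
			return i;
		}
	}
	return 0;
}

/* The row reached _cluster: the ID is both committed and occupied. */
static void
commit_id(struct registry *r, uint32_t id)
{
	assert(id > 0 && id < ID_MAX);
	r->committed[id] = true;
	r->occupied[id] = true;
}

/* The transaction was rolled back: release an uncommitted ID. */
static void
rollback_id(struct registry *r, uint32_t id)
{
	assert(id > 0 && id < ID_MAX);
	if (!r->committed[id])
		r->occupied[id] = false;
}
```

With ID 1 committed, two joins that both call pick_id_from_committed() before either commits get ID 2 twice, which models the 'duplicate key' failure. With pick_id_and_occupy() the first join gets 2 and the second gets 3, and rollback_id() frees an ID again if its transaction fails.
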
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5430-cluster-duplicate
> Issue: https://github.com/tarantool/tarantool/issues/5430
>
>  .../gh-5430-cluster-mvcc-duplicate.md         |   7 +
>  src/box/alter.cc                              |  96 ++++++------
>  src/box/box.cc                                |  19 +--
>  src/box/replication.cc                        |  13 ++
>  src/box/replication.h                         |   4 +
>  test/replication/gh-5430-cluster-mvcc.result  | 146 ++++++++++++++++++
>  .../replication/gh-5430-cluster-mvcc.test.lua |  62 ++++++++
>  test/replication/gh-5430-mvcc-master.lua      |  11 ++
>  test/replication/gh-5430-mvcc-replica1.lua    |  10 ++
>  test/replication/gh-5430-mvcc-replica2.lua    |   1 +
>  test/replication/suite.cfg                    |   1 +
>  test/replication/suite.ini                    |   2 +-
>  12 files changed, 306 insertions(+), 66 deletions(-)
>  create mode 100644 changelogs/unreleased/gh-5430-cluster-mvcc-duplicate.md
>  create mode 100644 test/replication/gh-5430-cluster-mvcc.result
>  create mode 100644 test/replication/gh-5430-cluster-mvcc.test.lua
>  create mode 100644 test/replication/gh-5430-mvcc-master.lua
>  create mode 100644 test/replication/gh-5430-mvcc-replica1.lua
>  create mode 120000 test/replication/gh-5430-mvcc-replica2.lua
>
> diff --git a/changelogs/unreleased/gh-5430-cluster-mvcc-duplicate.md b/changelogs/unreleased/gh-5430-cluster-mvcc-duplicate.md
> new file mode 100644
> index 000000000..59b90f026
> --- /dev/null
> +++ b/changelogs/unreleased/gh-5430-cluster-mvcc-duplicate.md
> @@ -0,0 +1,7 @@
> +## bugfix/replication
> +
> +* Fixed a rare error appearing when MVCC (`box.cfg.memtx_use_mvcc_engine`) was
> +  enabled and more than one replica was joined to a cluster. The join could fail
> +  with the error `"ER_TUPLE_FOUND: Duplicate key exists in unique index
> +  'primary' in space '_cluster'"`. The same could happen at bootstrap of a
> +  cluster having >= 3 nodes (gh-5430).
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 89bb5946c..64ba09021 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -4178,47 +4178,11 @@ on_replace_dd_schema(struct trigger * /* trigger */, void *event)
>  	return 0;
>  }
>
> -/**
> - * A record with id of the new instance has been synced to the
> - * write ahead log. Update the cluster configuration cache
> - * with it.
> - */
> -static int
> -register_replica(struct trigger *trigger, void * /* event */)
> -{
> -	struct tuple *new_tuple = (struct tuple *)trigger->data;
> -	uint32_t id;
> -	if (tuple_field_u32(new_tuple, BOX_CLUSTER_FIELD_ID, &id) != 0)
> -		return -1;
> -	tt_uuid uuid;
> -	if (tuple_field_uuid(new_tuple, BOX_CLUSTER_FIELD_UUID, &uuid) != 0)
> -		return -1;
> -	struct replica *replica = replica_by_uuid(&uuid);
> -	if (replica != NULL) {
> -		replica_set_id(replica, id);
> -	} else {
> -		try {
> -			replica = replicaset_add(id, &uuid);
> -			/* Can't throw exceptions from on_commit trigger */
> -		} catch(Exception *e) {
> -			panic("Can't register replica: %s", e->errmsg);
> -		}
> -	}
> -	return 0;
> -}
> -
> +/** Unregister the replica affected by the change. */
>  static int
> -unregister_replica(struct trigger *trigger, void * /* event */)
> +on_replace_cluster_clear_id(struct trigger *trigger, void * /* event */)
>  {
> -	struct tuple *old_tuple = (struct tuple *)trigger->data;
> -
> -	struct tt_uuid old_uuid;
> -	if (tuple_field_uuid(old_tuple, BOX_CLUSTER_FIELD_UUID, &old_uuid) != 0)
> -		return -1;
> -
> -	struct replica *replica = replica_by_uuid(&old_uuid);
> -	assert(replica != NULL);
> -	replica_clear_id(replica);
> +	replica_clear_id((struct replica *)trigger->data);
>  	return 0;
>  }
>
> @@ -4280,14 +4244,34 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
>  					 "updates of instance uuid");
>  				return -1;
>  			}
> -		} else {
> -			struct trigger *on_commit;
> -			on_commit = txn_alter_trigger_new(register_replica,
> -							  new_tuple);
> -			if (on_commit == NULL)
> -				return -1;
> -			txn_stmt_on_commit(stmt, on_commit);
> +			return 0;
> +		}
> +		/*
> +		 * With read-views enabled there might be already a replica
> +		 * whose registration is in progress in another transaction.
> +		 * With the same replica ID.
> +		 */
> +		if (replica_by_id(replica_id) != NULL) {
> +			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool",
> +				 "more than 1 replica with the same ID");
> +			return -1;
>  		}
> ====================
>
> I couldn't test this check because of the bug in mvcc:
> https://github.com/tarantool/tarantool/issues/6246
>
> ====================

I don't understand how this could happen
(I mean the if branch being taken).
Would you mind explaining?

> +		struct trigger *on_rollback = txn_alter_trigger_new(
> +			on_replace_cluster_clear_id, NULL);
> +		if (on_rollback == NULL)
> +			return -1;
> +		/*
> +		 * Register the replica before commit so as to occupy the
> +		 * replica ID now. While WAL write is in progress, new replicas
> +		 * might come, they should see the ID is already in use.
> +		 */
> +		struct replica *replica = replica_by_uuid(&replica_uuid);
> +		if (replica != NULL)
> +			replica_set_id(replica, replica_id);
> +		else
> +			replica = replicaset_add(replica_id, &replica_uuid);
> +		on_rollback->data = replica;
> +		txn_stmt_on_rollback(stmt, on_rollback);
>  	} else {
>  		/*
>  		 * Don't allow deletion of the record for this instance
> @@ -4300,9 +4284,23 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
>  		if (replica_check_id(replica_id) != 0)
>  			return -1;
>
> -		struct trigger *on_commit;
> -		on_commit = txn_alter_trigger_new(unregister_replica,
> -						  old_tuple);
> +		struct replica *replica = replica_by_id(replica_id);
> +		if (replica == NULL) {
> +			/*
> +			 * Impossible, but it is important not to leave
> +			 * undefined behaviour if there is a bug. Too sensitive
> +			 * subsystem is affected.
> +			 */
> +			panic("Tried to unregister a replica not stored in "
> +			      "replica_by_id map, id is %u", replica_id);
> +		}
> +		/*
> +		 * Unregister only after commit. Otherwise if the transaction
> +		 * would be rolled back, there might be already another replica
> +		 * taken the freed ID.
> +		 */
> +		struct trigger *on_commit = txn_alter_trigger_new(
> +			on_replace_cluster_clear_id, replica);
>  		if (on_commit == NULL)
>  			return -1;
>  		txn_stmt_on_commit(stmt, on_commit);
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8c10a99dd..5c10aceff 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -2407,22 +2407,9 @@ box_on_join(const tt_uuid *instance_uuid)
>  		return; /* nothing to do - already registered */
>
>  	box_check_writable_xc();
> -
> -	/** Find the largest existing replica id. */
> -	struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
> -	struct index *index = index_find_system_xc(space, 0);
> -	struct iterator *it = index_create_iterator_xc(index, ITER_ALL,
> -						       NULL, 0);
> -	IteratorGuard iter_guard(it);
> -	struct tuple *tuple;
> -	/** Assign a new replica id. */
> -	uint32_t replica_id = 1;
> -	while ((tuple = iterator_next_xc(it)) != NULL) {
> -		if (tuple_field_u32_xc(tuple,
> -				       BOX_CLUSTER_FIELD_ID) != replica_id)
> -			break;
> -		replica_id++;
> -	}
> +	uint32_t replica_id;
> +	if (replica_find_new_id(&replica_id) != 0)
> +		diag_raise();
>  	box_register_replica(replica_id, instance_uuid);
>  }
>
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 45ad03dfd..1288bc9b1 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -1032,3 +1032,16 @@ replica_by_id(uint32_t replica_id)
>  {
>  	return replicaset.replica_by_id[replica_id];
>  }
> +
> +int
> +replica_find_new_id(uint32_t *replica_id)
> +{
> +	for (uint32_t i = 1; i < VCLOCK_MAX; ++i) {
> +		if (replicaset.replica_by_id[i] == NULL) {
> +			*replica_id = i;
> +			return 0;
> +		}
> +	}
> +	diag_set(ClientError, ER_REPLICA_MAX, VCLOCK_MAX);
> +	return -1;
> +}
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 57e0f10ae..5d1fa1255 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -360,6 +360,10 @@ replica_by_uuid(const struct tt_uuid *uuid);
>  struct replica *
>  replica_by_id(uint32_t replica_id);
>
> +/** Find the smallest free replica ID in the available range. */

nit: free -> empty / unoccupied

> +int
> +replica_find_new_id(uint32_t *replica_id);
> +
>  /**
>   * Find a node in the replicaset on which the instance can try to register to
>   * join the replicaset.