From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Wed, 26 Sep 2018 17:46:58 +0300
From: Vladimir Davydov
Subject: Re: [tarantool-patches] [PATCH rfc] schema: add possibility to find and throw away dead replicas
Message-ID: <20180926144658.dru6pw44e33nnnpi@esperanza>
References: <20180921182503.14027-1-arkholga@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180921182503.14027-1-arkholga@tarantool.org>
To: Olga Arkhangelskaia
Cc: tarantool-patches@freelists.org
List-ID:

On Fri, Sep 21, 2018 at 09:25:03PM +0300, Olga Arkhangelskaia wrote:
> Adds possibility to get list of alive replicas in a replicaset,
> prune from box.space_cluster those who is not considered as alive,
> and if one has doubts see state of replicaset.
>
> Replica is considered alive if it is just added, its status after
> timeout period is not stopped or disconnected. However it it has both
> roles (master and replica) we consider such instance dead only if its
> upstream and downstream status is stopped or disconnected.
>
> If replica is considered dead we can prune its uuid from _cluster space.
> If one not sure if the replica is dead or is there is any activity on it
> it is possible to list replicas with its role, status and lsn
> statistics.
>
> If you have some ideas how else we can/should decide whether replica is dead
> please share.
>
> Closes #3110
> ---
>
> https://github.com/tarantool/tarantool/issues/3110
> https://github.com/tarantool/tarantool/tree/OKriw/gh-3110-prune-dead-replica-from-replicaset-1.10

A documentation request with the new API description is missing.

Tests don't pass on Travis CI.

Regarding the code:

1. Why do you add a function that lists *alive* replicas? The issue
author didn't ask for that. He asked for a script that would delete
dead replicas from the _cluster system space.
We might want to add a function that would list *dead* replicas so that
the user could check which replicas would be deleted (aka "dry run"),
but it doesn't make sense to list alive replicas.

2. Dead replica detection is utterly ridiculous: the function sleeps
for the given amount of time and then deletes inactive replicas. As a
user, I'd want to be able to delete replicas that have been inactive
for, say, a day. Does this mean that I have to wait for a whole day
before this function completes? Obviously, no. I guess tarantool core
should keep track of the time each replica was last active so that the
function would work instantly. The time should probably be persisted
so that restarts wouldn't affect the way the function works.
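
To make point 2 more concrete, here is a rough sketch of the idea in
Python. It is not Tarantool code and not a proposed API: the class name,
the method names and the JSON file are all made up for illustration. It
only shows the shape of the approach: record the last activity time per
replica, persist it, and let both the "dry run" listing and the actual
prune compare against the stored timestamps instead of sleeping.

```python
import json
import os
import time


class ReplicaActivity:
    """Hypothetical tracker of when each replica was last seen active."""

    def __init__(self, path):
        self.path = path
        self.last_active = {}  # replica uuid -> unix timestamp
        # Reload persisted timestamps so a restart doesn't reset them.
        if os.path.exists(path):
            with open(path) as f:
                self.last_active = json.load(f)

    def touch(self, uuid, now=None):
        """Record activity from a replica and persist it."""
        self.last_active[uuid] = time.time() if now is None else now
        self._save()

    def list_dead(self, max_idle, now=None):
        """Dry run: uuids that were inactive for more than max_idle seconds."""
        now = time.time() if now is None else now
        return sorted(u for u, t in self.last_active.items()
                      if now - t > max_idle)

    def prune_dead(self, max_idle, now=None):
        """Forget replicas reported by list_dead() and return their uuids."""
        dead = self.list_dead(max_idle, now)
        for uuid in dead:
            del self.last_active[uuid]
        self._save()
        return dead

    def _save(self):
        # Write to a temporary file and rename it, so that a crash in the
        # middle of a write doesn't leave a truncated file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.last_active, f)
        os.replace(tmp, self.path)
```

With something like this, list_dead(86400) answers "which replicas have
been inactive for a day" immediately, and prune_dead(86400) removes
exactly the replicas the dry run reported. In Tarantool itself the
timestamps would presumably live in the core and be persisted there
rather than in a side file; that part is an assumption of the sketch.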