From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 823804696C3 for ; Sat, 28 Mar 2020 09:17:24 +0300 (MSK) Received: by mail-lf1-f67.google.com with SMTP id n20so9651318lfl.10 for ; Fri, 27 Mar 2020 23:17:24 -0700 (PDT) Date: Sat, 28 Mar 2020 09:17:22 +0300 From: Konstantin Osipov Message-ID: <20200328061722.GC23207@atlas> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Serge Petrenko Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org * Serge Petrenko [20/03/27 18:08]: One thing I don't understand is how is this extra patch necessary to close #4114? Aren't the first 3 patches enough? This is a nice optimization and is perhaps a pre-requisite for a fix for some anonymous replica bug, but how does it affect vinyl? > Sign local space requests with a zero instance id. This allows to split > local changes aside from the changes, which should be visible to the whole > cluster, and stop sending NOPs to replicas to follow local vclock. > > Moreover, it fixes the following bug with local spaces and replication. > In a situation when there are a master and a replica set up, replica may > still write to local spaces even if it's read-only. Local space > operations used to promote instance's lsn before this patch. Eventually, > master would have vclock {1:x} and replica'd have vclock {1:x, 2:y}, > where y > 0, due to local space requests performed by the replica. > If a snapshot happens on replica side, replica will delete it's .xlog > files prior to the snapshot, since no one replicates from it and thus it > doesn't have any registered GC consumers. > From this point, every attempt to configure replication from replica to > master will fail, since master will try to fetch records which account > for the difference in master's and replica's vclocks: {1:x} vs {1:x,2:y}, > even though master will never need the rows in range {2:1} - {2:y}, > since they'll be turned to NOPs during replication. I think the issue above is fixed by lexicographical search for min vclock in gc. > > Starting from this patch, in the situation described above, replica's > clock will be {0:y, 1:x}, and, since local space requests are now not > replicated at all, master will be able to follow replica, turning the > configuration to master-master. > @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header) > say_info("remote vclock %s local vclock %s", > vclock_to_string(&replica_clock), vclock_to_string(&vclock)); > > + /* > + * Replica clock is used in gc state and recovery > + * initialization, so we need to replace the remote 0-th > + * component with our own one. This makes recovery work > + * correctly: we're trying to recover from an xlog whose > + * vclock is smaller than remote replica clock in each > + * component, exluding the 0-th component. This leads to > + * finding the correct WAL, if it exists, since we do not > + * need to recover local rows (the ones, that contribute > + * to the 0-th vclock component). It would be bad to set > + * 0-th vclock component to a smaller value, since it > + * would unnecessarily require additional WALs, which may > + * have already been deleted. > + * Same stands for gc. Remote instances do not need this > + * instance's local rows, and after gc was reworked to > + * track individual vclock components instead of > + * signatures it's safe to set the local component to the > + * most recent value. > + */ > + vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0)); As I mentioned above, it seems unnecessary for 4114, but it also seems insufficient you change to use replica id 0 for local requests later in this patch. relay_subscribe will use signature-based search of xlog so playing around with components may be not enough. E.g. if local component is huge, we may skip over some important changes the replica is missing. Apparently relay has to use lexigraphical search in xlog_dir index as well! Please provide an explanation in CS/code comment or lets discuss this if you disagree. -- Konstantin Osipov, Moscow, Russia