[Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it

Konstantin Osipov kostja.osipov at gmail.com
Thu Feb 27 09:52:40 MSK 2020


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [20/02/27 09:42]:
> > +	/*
> > +	 * Stop accepting local rows coming from a remote
> > +	 * instance as soon as local WAL starts accepting writes.
> > +	 */
> > +	unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;
> 
> 1. I was always wondering, what if the instance got orphaned after it
> started accepting writes? WAL is fully functional, it syncs whatever is
> needed, and then a resubscribe happens. Can this break anything?

Good catch. I wanted to make this comment too, but I checked the
code and it seems we're safe, since we also switch the engine vtab
to read-only.

It is important not to break this invariant in future commits,
though, so a comment in box_set_orphan on the significance of orphan
mode for replication correctness would be nice.

Alternatively, extend the comment for the is_orphan variable to
explain how it differs from is_ro. Basically, is_ro is a user-level
setting which doesn't prevent writes to temporary tables and some
internal writes, e.g. to the _cluster table, while is_orphan is a
server-wide internal setting which is expected to freeze *all* writes
except those from a remote instance; otherwise there will be
correctness issues.
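
To make the distinction concrete, here is a minimal sketch of the
intended semantics. It is not the actual Tarantool code: struct
node_state, enum write_kind, write_allowed() and subscribe_id_filter()
are hypothetical names invented for illustration, and allowing remote
rows while is_ro is set is an assumption. Only the id_filter
expression mirrors the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a write, for illustration only. */
enum write_kind {
	WRITE_USER,      /* ordinary user DML */
	WRITE_TEMPORARY, /* write to a temporary table */
	WRITE_INTERNAL,  /* internal write, e.g. to _cluster */
	WRITE_REMOTE,    /* row applied from a remote instance */
};

/* Hypothetical snapshot of the flags discussed above. */
struct node_state {
	bool is_ro;           /* user-level setting */
	bool is_orphan;       /* server-wide internal setting */
	uint32_t instance_id; /* id of this instance in the replica set */
};

/*
 * is_orphan freezes everything except rows from a remote, while
 * is_ro only blocks ordinary user writes (assumed model).
 */
static bool
write_allowed(const struct node_state *s, enum write_kind kind)
{
	if (kind == WRITE_REMOTE)
		return true;
	if (s->is_orphan)
		return false;
	if (s->is_ro)
		return kind != WRITE_USER;
	return true;
}

/*
 * Same expression as in the patch: while orphan, accept our own rows
 * from a remote (empty filter); afterwards filter out our own id.
 */
static uint32_t
subscribe_id_filter(const struct node_state *s)
{
	/* The mask is 32 bits wide in this sketch. */
	assert(s->instance_id < 32);
	return s->is_orphan ? 0 : UINT32_C(1) << s->instance_id;
}
```

The point of the sketch is that the filter is safe only as long as
the two functions agree: whenever the filter is empty, no local write
may pass write_allowed().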

> 2. Consider this hack which I just invented. In that way you won't
> depend on ERRINJ and NDEBUG interconnection.
> 
> ====================
> @@ -282,9 +282,7 @@ tx_schedule_commit(struct cmsg *msg)
>  	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
>  	/* Update the tx vclock to the latest written by wal. */
>  	vclock_copy(&replicaset.vclock, &batch->vclock);
> -#ifndef NDEBUG
> -skip_update:
> -#endif
> +	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, {skip_update:;});
>  	tx_schedule_queue(&batch->commit);
>  	mempool_free(&writer->msg_pool, container_of(msg, struct wal_msg, base));
>  }
> ====================
> 
> Talking of the injection itself - don't know really. Perhaps
> it would be better to add a delay to the wal_write_to_disk()
> function, to its very end, after wal_notify_watchers(). In
> that case relay will wake up, send whatever it wants, and TX
> won't update the vclock until you let wal_write_to_disk()
> finish. Seems more natural this way.

I agree.
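
For readers unfamiliar with the label trick in point 2, a
self-contained sketch of why it compiles in both build modes. The
ERROR_INJECT macro below is a simplified stand-in, not Tarantool's
real definition, and skip_update_injected and commit_step() are
hypothetical names. The idea is that the label lives inside the macro
argument, so it disappears together with the goto when the macro
expands to nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the injection state. */
static bool skip_update_injected = false;

#ifndef NDEBUG
/* Simplified stand-in: run CODE only if the injection is enabled. */
# define ERROR_INJECT(FLAG, CODE) do { if (FLAG) CODE; } while (0)
#else
/* Release build: injections, and any labels inside them, vanish. */
# define ERROR_INJECT(FLAG, CODE)
#endif

/* Returns the tx vclock (modelled as an int) after a commit step. */
static int
commit_step(int tx_vclock, int wal_vclock)
{
	/* Sanity check for this sketch: the WAL is never behind tx. */
	assert(wal_vclock >= tx_vclock);
	ERROR_INJECT(skip_update_injected, { goto skip_update; });
	/* Update the tx vclock to the latest written by wal. */
	tx_vclock = wal_vclock;
	/*
	 * The label is part of the macro argument. Jumping into the
	 * block is legal C, and the block body is an empty statement.
	 */
	ERROR_INJECT(skip_update_injected, {skip_update:;});
	return tx_vclock;
}
```

With the injection off, commit_step(1, 5) returns 5; with it on, the
update is skipped and the function returns 1. No #ifndef NDEBUG is
needed around the label.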


-- 
Konstantin Osipov, Moscow, Russia

