Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v3 06/10] raft: keep track of greatest known term and filter replication sources based on that
Date: Fri, 16 Apr 2021 01:27:40 +0200	[thread overview]
Message-ID: <72b60d0e-f657-1102-9129-e9f719af338b@tarantool.org> (raw)
In-Reply-To: <578222e82d897fd042d73afefa14766a0518de09.1618409665.git.sergepetrenko@tarantool.org>

Good job on the patch!

Please, try to reduce length of the lines in the commit
message, or at least its title. It is suuuper long now.

See 10 comments below.

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 4898f9f7b..3fb864686 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -790,6 +790,12 @@ apply_synchro_row_cb(struct journal_entry *entry)
>  		applier_rollback_by_wal_io();
>  	} else {
>  		txn_limbo_process(&txn_limbo, synchro_entry->req);
> +		if (iproto_type_is_promote_request(synchro_entry->req->type)) {
> +			raft_source_update_term(box_raft(),
> +						synchro_entry->req->origin_id,
> +						synchro_entry->req->term);

1. How about moving that to txn_limbo_read_promote()? What do you think? I see
you do it in 3 places where txn_limbo_process() or txn_limbo_read_promote()
are called on PROMOTE rows.

> +
> +		}
>  		trigger_run(&replicaset.applier.on_wal_write, NULL);
>  	}
>  	/* The fiber is the same on final join. */
> @@ -1027,6 +1033,28 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)
>  		}
>  	}
>  
> +	/*
> +	 * When elections are enabled we must filter out synchronous rows coming
> +	 * from an instance that fell behind the current leader. This includes
> +	 * both synchronous tx rows and rows for txs following unconfirmed
> +	 * synchronous transactions.
> +	 * The rows are replaced with NOPs to preserve the vclock consistency.
> +	 */
> +	struct applier_tx_row *item;
> +	if (raft_source_has_outdated_term(box_raft(), applier->instance_id) &&

2. The names are too long IMO. I would propose

	raft_is_node_outdated(raft, id)   // Check if behind
	raft_process_term(raft, id, term) // Set term for a node or skip if
						the same or older
	raft_node_term(raft, id)         // Get term

'source' is not really a perfect name, because raft nodes send
messages to each other. There are no one-directional channels
AFAIR like we have with upstream and downstream in the replication.

I used 'source' in raft_process_heartbeat() as like a source of the
heartbeat message. Note like the nodes are called sources everywhere.

Also I used 'process' for the new term, because we already have
raft_process_heartbeat() to handle info from a node with a given
ID, and I thought it makes sense to keep them similar.


3. Why does raft_is_source_allowed() still exist when we have this wonder?

> +	    (last_row->wait_sync ||
> +	     (iproto_type_is_synchro_request(first_row->type) &&
> +	     !iproto_type_is_promote_request(first_row->type)))) {
> +		stailq_foreach_entry(item, rows, next) {
> +			struct xrow_header *row = &item->row;
> +			row->type = IPROTO_NOP;
> +			/*
> +			 * Row body is saved to fiber's region and will be freed
> +			 * on next fiber_gc() call.
> +			 */
> +			row->bodycnt = 0;
> +		}
> +	}
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 722fc23b7..f44dd0e54 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -426,6 +426,10 @@ wal_stream_apply_synchro_row(struct wal_stream *stream, struct xrow_header *row)
>  		return -1;
>  	}
>  	txn_limbo_process(&txn_limbo, &syn_req);
> +	if (syn_req.type == IPROTO_PROMOTE) {
> +			raft_source_update_term(box_raft(), syn_req.origin_id,
> +						syn_req.term);

4. Misaligned. Also see the first comment.

> @@ -1558,20 +1567,21 @@ box_clear_synchro_queue(bool try_wait)
>  			rc = -1;
>  		} else {
>  promote:
> -			/*
> -			 * Term parameter is unused now, We'll pass
> -			 * box_raft()->term there later.
> -			 */
> -			txn_limbo_write_promote(&txn_limbo, wait_lsn, 0);
> +			/* We cannot possibly get here in a volatile state. */
> +			assert(box_raft()->volatile_term == box_raft()->term);
> +			txn_limbo_write_promote(&txn_limbo, wait_lsn,
> +						box_raft()->term);
>  			struct synchro_request req = {
>  				.type = 0, /* unused */
>  				.replica_id = 0, /* unused */
>  				.origin_id = instance_id,
>  				.lsn = wait_lsn,
> -				.term = 0, /* unused */
> +				.term = box_raft()->term,
>  			};
>  			txn_limbo_read_promote(&txn_limbo, &req);
>  			assert(txn_limbo_is_empty(&txn_limbo));
> +			raft_source_update_term(box_raft(), req.origin_id,
> +						req.term);

5. See the first comment.

>  		}
>  	}
> diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
> index e447f6634..01f548fee 100644
> --- a/src/lib/raft/raft.h
> +++ b/src/lib/raft/raft.h
> @@ -207,6 +207,19 @@ struct raft {
>  	 * subsystems, such as Raft.
>  	 */
>  	const struct vclock *vclock;
> +	/**
> +	 * The biggest term seen by this instance and persisted in WAL as part
> +	 * of a PROMOTE request. May be smaller than @a term, while there are
> +	 * ongoing elections, or the leader is already known, but this instance
> +	 * hasn't read its PROMOTE request yet.
> +	 * During other times must be equal to @a term.
> +	 */
> +	uint64_t greatest_known_term;

6. Maybe omit 'known'. There can't be 'greatest_unknown_term'.

> +	/**
> +	 * Latest terms received with PROMOTE entries from remote instances.
> +	 * Raft uses them to determine data from which sources may be applied.
> +	 */
> +	struct vclock term_map;

7. I have a feeling it is similar to the limbo's LSN map. Like
they should be merged into something one. Can't formulate that
properly. I hope we will see it more clear when will move all that
to the WAL thread someday.

>  	/** State machine timed event trigger. */
>  	struct ev_timer timer;
>  	/** Configured election timeout in seconds. */
> @@ -243,6 +256,39 @@ raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
>  	return !raft->is_enabled || raft->leader == source_id;
>  }
>  
> +/**
> + * Return the latest term as seen in PROMOTE requests from instance with id
> + * @a source_id.
> + */
> +static inline uint64_t
> +raft_source_term(const struct raft *raft, uint32_t source_id)
> +{
> +	assert(source_id != 0 && source_id < VCLOCK_MAX);
> +	return vclock_get(&raft->term_map, source_id);
> +}
> +
> +/**
> + * Check whether replica with id @a source_id is too old to apply synchronous
> + * data from it. The check is only valid when elections are enabled.
> + */
> +static inline bool
> +raft_source_has_outdated_term(const struct raft *raft, uint32_t source_id)
> +{
> +	uint64_t source_term = vclock_get(&raft->term_map, source_id);
> +	return raft->is_enabled && source_term < raft->greatest_known_term;
> +}
> +
> +/** Remember the last term seen for replica  with id @a source_id. */
> +static inline void
> +raft_source_update_term(struct raft *raft, uint32_t source_id, uint64_t term)
> +{
> +	if ((uint64_t) vclock_get(&raft->term_map, source_id) >= term)

8. Probably having the term as uint64_t was a mistake from the beginning.
Feel free to change it to int64_t if you want, in a separate commit.

> +		return;
> +	vclock_follow(&raft->term_map, source_id, term);
> +	if (term > raft->greatest_known_term)
> +		raft->greatest_known_term = term;
> +}

9. I see these are not used in the raft code at all. Did you think about
moving it all to box/raft.h and box/raft.c? Or about covering this all
with unit tests in unit/raft.c if you decide to keep it here?

> +
>  /** Check if Raft is enabled. */
>  static inline bool
>  raft_is_enabled(const struct raft *raft)
> diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result
> new file mode 100644
> index 000000000..ff3104de5
> --- /dev/null
> +++ b/test/replication/gh-5445-leader-inconsistency.result
> @@ -0,0 +1,291 @@
> +-- test-run result file version 2
> +test_run = require("test_run").new()
> + | ---
> + | ...
> +
> +is_leader_cmd = "return box.info.election.state == 'leader'"
> + | ---
> + | ...
> +
> +-- Auxiliary.
> +test_run:cmd('setopt delimiter ";"')
> + | ---
> + | - true
> + | ...
> +function get_leader(nrs)
> +    local leader_nr = 0
> +    test_run:wait_cond(function()
> +        for nr, do_check in pairs(nrs) do
> +            if do_check then
> +                local is_leader = test_run:eval('election_replica'..nr,
> +                                                is_leader_cmd)[1]
> +                if is_leader then
> +                    leader_nr = nr
> +                    return true
> +                end
> +            end
> +        end
> +        return false
> +    end)
> +    assert(leader_nr ~= 0)
> +    return leader_nr
> +end;
> + | ---
> + | ...
> +
> +function name(id)
> +    return 'election_replica'..id
> +end;

10. You can move this function above get_leader() and use it
in there.

  reply	other threads:[~2021-04-15 23:27 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14 14:17 [Tarantool-patches] [PATCH v3 00/10] raft: introduce manual elections and fix a bug with re-applying rolled back transactions Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 01/10] wal: enrich row's meta information with sync replication flags Serge Petrenko via Tarantool-patches
2021-04-15 23:18   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  7:08     ` Serge Petrenko via Tarantool-patches
2021-04-16  7:11       ` Serge Petrenko via Tarantool-patches
2021-04-16  8:57       ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 02/10] xrow: introduce a PROMOTE entry Serge Petrenko via Tarantool-patches
2021-04-15 23:19   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:18     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 03/10] box: actualise iproto_key_type array Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 04/10] box: make clear_synchro_queue() write a PROMOTE entry instead of CONFIRM + ROLLBACK Serge Petrenko via Tarantool-patches
2021-04-15 23:20   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  9:28     ` Serge Petrenko via Tarantool-patches
2021-04-16 14:03       ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 05/10] box: write PROMOTE even for empty limbo Serge Petrenko via Tarantool-patches
2021-04-15 23:21   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  9:33     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 06/10] raft: keep track of greatest known term and filter replication sources based on that Serge Petrenko via Tarantool-patches
2021-04-15 23:27   ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-04-16 14:16     ` Serge Petrenko via Tarantool-patches
2021-04-16 22:13       ` Vladislav Shpilevoy via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 07/10] replication: introduce a new election mode: "manual" Serge Petrenko via Tarantool-patches
2021-04-15 23:27   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 14:18     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()` Serge Petrenko via Tarantool-patches
2021-04-15 23:30   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 15:38     ` Serge Petrenko via Tarantool-patches
2021-04-16 15:40       ` Serge Petrenko via Tarantool-patches
2021-04-16 15:50         ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 09/10] box: remove parameter from clear_synchro_queue Serge Petrenko via Tarantool-patches
2021-04-14 14:18 ` [Tarantool-patches] [PATCH v3 10/10] box.ctl: rename clear_synchro_queue to promote Serge Petrenko via Tarantool-patches
2021-04-15 23:31   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:13     ` Serge Petrenko via Tarantool-patches
2021-04-14 18:21 ` [Tarantool-patches] [PATCH v3 00/10] raft: introduce manual elections and fix a bug with re-applying rolled back transactions Cyrill Gorcunov via Tarantool-patches
2021-04-15 23:16 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:35   ` Serge Petrenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=72b60d0e-f657-1102-9129-e9f719af338b@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v3 06/10] raft: keep track of greatest known term and filter replication sources based on that' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox