[Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms

Cyrill Gorcunov gorcunov at gmail.com
Tue Aug 3 14:23:23 MSK 2021


On Tue, Aug 03, 2021 at 01:48:18AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 6 comments below.
> 
> > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > index f621fa657..a7f472714 100644
> > --- a/src/box/applier.cc
> > +++ b/src/box/applier.cc
> > @@ -909,12 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
> >  	 * transactions side, including the async ones.
> >  	 */
> >  	if (journal_write(&entry.base) != 0)
> > -		goto err;
> > +		goto err_unlock;
> >  	if (entry.base.res < 0) {
> >  		diag_set_journal_res(entry.base.res);
> > -		goto err;
> > +		goto err_unlock;
> >  	}
> > +	txn_limbo_term_unlock(&txn_limbo);
> >  	return 0;
> > +err_unlock:
> > +	txn_limbo_term_unlock(&txn_limbo);
> 
> 1. Could be done simpler:
> 
> ====================
> @@ -908,7 +909,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>  	 * before trying to commit. But that requires extra steps from the
>  	 * transactions side, including the async ones.
>  	 */
> -	if (journal_write(&entry.base) != 0)
> +	int rc = journal_write(&entry.base);
> +	txn_limbo_term_unlock(&txn_limbo);
> +	if (rc != 0)
>  		goto err;
>  	if (entry.base.res < 0) {
>  		diag_set_journal_res(entry.base.res);
> ====================
> 

I thought about it, and initially used exactly this way. But I think
the error code is bound to the execution context and we should setup
fiber's error in a locked way simply because our fiber's schedule
could be reworked in future where unlock would cause immediate resched
with fiber switch and execution (say we introduce a fiber's priority),
in result another fiber get running while this one woud sit without
error assigned.

Anyway, since sched reworks seems not happen in near future I'll
update the code and use your snippet above. Kind of, I've to declare
@rc earlier because otherwise I get

/home/cyrill/projects/tarantool/tarantool.git/src/box/applier.cc:912:13: note:   crosses initialization of ‘int rc’
  912 |         int rc = journal_write(&entry.base);
      |             ^~


> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 535f30292..5ca617e32 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1573,7 +1573,7 @@ box_run_elections(void)
> >  static int
> >  box_check_promote_term_intact(uint64_t promote_term)
> >  {
> > -	if (txn_limbo.promote_greatest_term != promote_term) {
> > +	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {
> 
> 2. In raft terminology we call such data 'volatile' instead
> of 'raw'.

We need to access unlocked value for read only purpose knowing that
it might be changed after we've read it, and _raw suffix is commonly
used for such things. Since you don't like the suffix I think better
would be simply use direct access to the @promote_greatest_term variable
since _volatile suffix would be too long.

> 
> >  		diag_set(ClientError, ER_INTERFERING_PROMOTE,
> >  			 txn_limbo.owner_id);
> >  		return -1;
> > @@ -1728,7 +1728,7 @@ box_promote(void)
> >  	 * Currently active leader (the instance that is seen as leader by both
> >  	 * raft and txn_limbo) can't issue another PROMOTE.
> >  	 */
> > -	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
> > +	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
> 
> 3. Why did you change the name? The old one was closer to reality. When
> you say 'limbo term', it is not clear whether you mean the max term or
> term of one of the nodes.
> 
> Please, extract all such renames into a separate commit. Otherwise
> it is harder to find the functional changes in this one.

Because we pass the replica_id as an argument which implies that we
fetch replica's term. But not a problem, will revert it back to the
original naming.

> > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> > index 570f77c46..be5e0adf5 100644
> > --- a/src/box/txn_limbo.c
> > +++ b/src/box/txn_limbo.c
> > @@ -724,22 +725,23 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
> >  }
> >  
> >  void
> > -txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> > +txn_limbo_process_locked(struct txn_limbo *limbo,
> > +			 const struct synchro_request *req)
> >  {
> 
> 4. Please, add an assertion that the latch is occupied. The same for the
> other _locked functions.

OK

> >  
> > +/** Lock promote data. */
> > +static inline void
> > +txn_limbo_term_lock(struct txn_limbo *limbo)
> > +{
> > +	latch_lock(&limbo->promote_latch);
> > +}
> > +
> > +/** Unlock promote data. */
> > +static void
> 
> 5. Why isn't it inline while the others are?

Seems to loose it while been reordering the code, will
add it back, thanks!

> > +
> > +/** Fetch replica's term with lock taken. */
> > +static inline uint64_t
> > +txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
> > +{
> > +	panic_on(!txn_limbo_term_is_locked(limbo),
> > +		 "limbo: unlocked term read for replica_id %u",
> > +		 replica_id);
> 
> 6. This new macro seems counter-intuitive because works vice versa
> compared to assert(). Can you try to rename it somehow and make it
> accept a condition which must be true instead of false?

I'll drop it.
---
Here is a diff on top

 - drop panic_on helper
 - access promote_greatest_term directly
 - bring txn_limbo_replica_term name back
 - use assert(latch_is_locked(&limbo->promote_latch)) test in _locked helpers
 - drop txn_limbo_term_is_locked helper

I didn't pushed it out yet, because of the next patch which we have to discuss
and rework, so mostl likely I'll send another series later, so this diff is
rather to show what have been changed.
---
 src/box/applier.cc  | 12 ++++++------
 src/box/box.cc      | 12 ++++++------
 src/box/txn_limbo.c | 14 ++++++++------
 src/box/txn_limbo.h | 32 +++++---------------------------
 4 files changed, 25 insertions(+), 45 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index a7f472714..9db286ae2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -867,6 +867,7 @@ static int
 apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 {
 	assert(iproto_type_is_synchro_request(row->type));
+	int rc = 0;
 
 	struct synchro_request req;
 	if (xrow_decode_synchro(row, &req) != 0)
@@ -909,16 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * before trying to commit. But that requires extra steps from the
 	 * transactions side, including the async ones.
 	 */
-	if (journal_write(&entry.base) != 0)
-		goto err_unlock;
+	rc = journal_write(&entry.base);
+	txn_limbo_term_unlock(&txn_limbo);
+	if (rc != 0)
+		goto err;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
-		goto err_unlock;
+		goto err;
 	}
-	txn_limbo_term_unlock(&txn_limbo);
 	return 0;
-err_unlock:
-	txn_limbo_term_unlock(&txn_limbo);
 err:
 	diag_log();
 	return -1;
diff --git a/src/box/box.cc b/src/box/box.cc
index a9f6da19d..31adb6c7d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1573,7 +1573,7 @@ box_run_elections(void)
 static int
 box_check_promote_term_intact(uint64_t promote_term)
 {
-	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {
+	if (txn_limbo.promote_greatest_term != promote_term) {
 		diag_set(ClientError, ER_INTERFERING_PROMOTE,
 			 txn_limbo.owner_id);
 		return -1;
@@ -1585,7 +1585,7 @@ box_check_promote_term_intact(uint64_t promote_term)
 static int
 box_trigger_elections(void)
 {
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	raft_new_term(box_raft());
 	if (box_raft_wait_term_persisted() < 0)
 		return -1;
@@ -1596,7 +1596,7 @@ box_trigger_elections(void)
 static int
 box_try_wait_confirm(double timeout)
 {
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	txn_limbo_wait_empty(&txn_limbo, timeout);
 	return box_check_promote_term_intact(promote_term);
 }
@@ -1612,7 +1612,7 @@ box_wait_limbo_acked(void)
 	if (txn_limbo_is_empty(&txn_limbo))
 		return txn_limbo.confirmed_lsn;
 
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	int quorum = replication_synchro_quorum;
 	struct txn_limbo_entry *last_entry;
 	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
@@ -1728,7 +1728,7 @@ box_promote(void)
 	 * Currently active leader (the instance that is seen as leader by both
 	 * raft and txn_limbo) can't issue another PROMOTE.
 	 */
-	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
 			 raft->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && raft->state == RAFT_STATE_LEADER;
@@ -1784,7 +1784,7 @@ box_demote(void)
 		return 0;
 
 	/* Currently active leader is the only one who can issue a DEMOTE. */
-	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
 			 box_raft()->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index be5e0adf5..5a5565a70 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -46,7 +46,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	fiber_cond_create(&limbo->wait_cond);
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
-	limbo->promote_term_max = 0;
+	limbo->promote_greatest_term = 0;
 	latch_create(&limbo->promote_latch);
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
@@ -309,7 +309,7 @@ txn_limbo_checkpoint(const struct txn_limbo *limbo,
 	req->type = IPROTO_PROMOTE;
 	req->replica_id = limbo->owner_id;
 	req->lsn = limbo->confirmed_lsn;
-	req->term = limbo->promote_term_max;
+	req->term = limbo->promote_greatest_term;
 }
 
 static void
@@ -728,20 +728,22 @@ void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req)
 {
+	assert(latch_is_locked(&limbo->promote_latch));
+
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
 	if (txn_limbo_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
-		if (term > limbo->promote_term_max)
-			limbo->promote_term_max = term;
+		if (term > limbo->promote_greatest_term)
+			limbo->promote_greatest_term = term;
 	} else if (iproto_type_is_promote_request(req->type) &&
-		   limbo->promote_term_max > 1) {
+		   limbo->promote_greatest_term > 1) {
 		/* PROMOTE for outdated term. Ignore. */
 		say_info("RAFT: ignoring %s request from instance "
 			 "id %u for term %llu. Greatest term seen "
 			 "before (%llu) is bigger.",
 			 iproto_type_name(req->type), origin, (long long)term,
-			 (long long)limbo->promote_term_max);
+			 (long long)limbo->promote_greatest_term);
 		return;
 	}
 
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 25faffd2b..abec9f72a 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -147,7 +147,7 @@ struct txn_limbo {
 	 * instance hasn't read its PROMOTE request yet. During other times the
 	 * limbo and raft are in sync and the terms are the same.
 	 */
-	uint64_t promote_term_max;
+	uint64_t promote_greatest_term;
 	/**
 	 * To order access to the promote data.
 	 */
@@ -224,26 +224,17 @@ txn_limbo_term_lock(struct txn_limbo *limbo)
 }
 
 /** Unlock promote data. */
-static void
+static inline void
 txn_limbo_term_unlock(struct txn_limbo *limbo)
 {
 	latch_unlock(&limbo->promote_latch);
 }
 
-/** Test if promote data is locked. */
-static inline bool
-txn_limbo_term_is_locked(const struct txn_limbo *limbo)
-{
-	return latch_is_locked(&limbo->promote_latch);
-}
-
 /** Fetch replica's term with lock taken. */
 static inline uint64_t
 txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
 {
-	panic_on(!txn_limbo_term_is_locked(limbo),
-		 "limbo: unlocked term read for replica_id %u",
-		 replica_id);
+	assert(latch_is_locked(&limbo->promote_latch));
 	return vclock_get(&limbo->promote_term_map, replica_id);
 }
 
@@ -252,7 +243,7 @@ txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_term(struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
 	txn_limbo_term_lock(limbo);
 	uint64_t v = txn_limbo_term_locked(limbo, replica_id);
@@ -260,19 +251,6 @@ txn_limbo_term(struct txn_limbo *limbo, uint32_t replica_id)
 	return v;
 }
 
-/**
- * Fiber's preempt not safe read of @a terms_max.
- *
- * Use it if you're interested in current value
- * only and ready that the value is getting updated
- * if after the read yield happens.
- */
-static inline uint64_t
-txn_limbo_term_max_raw(struct txn_limbo *limbo)
-{
-	return limbo->promote_term_max;
-}
-
 /**
  * Check whether replica with id @a source_id is too old to apply synchronous
  * data from it. The check is only valid when elections are enabled.
@@ -283,7 +261,7 @@ txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 {
 	txn_limbo_term_lock(limbo);
 	bool res = txn_limbo_term_locked(limbo, replica_id) <
-		limbo->promote_term_max;
+		limbo->promote_greatest_term;
 	txn_limbo_term_unlock(limbo);
 	return res;
 }
-- 
2.31.1



More information about the Tarantool-patches mailing list