[Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms

Cyrill Gorcunov gorcunov at gmail.com
Wed Sep 1 19:04:37 MSK 2021


On Mon, Aug 23, 2021 at 02:41:51PM +0300, Cyrill Gorcunov wrote:
> On Mon, Aug 23, 2021 at 02:32:27PM +0300, Serge Petrenko wrote:
> > > 
> > > In the next patch you would make txn_limbo_process_begin()
> > > also take the request to validate it. Then the 'filtering'
> > > would become internal to the limbo.
> > 
> > I agree with Vlad here.
> > 
> > txn_limbo_process_begin()/commit()/rollback
> > 
> > looks more clean than calling lock()/unlock() manually.
> > 
> > Let's stick with Vlad's proposal then.
> 
> Sure thing, thanks!

Guys, this scheme doesn't work, it makes code even more complex.
The problem is that the completion is called from another fiber.
Here is a log from replica

[cyrill at grain 001_replication] grep promote_latch replica.log 
2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: unlock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.224 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.225 [2061868] main I> limbo: unlock promote_latch 1: sched
tarantool: /home/cyrill/projects/tarantool/tarantool.git/src/lib/core/latch.h:175: latch_unlock: Assertion `l->owner == fiber()' failed.

Initially the limbo get latched by applier fiber but the completion called from
sched fiber then which cause assertion to trigger. Actually we could add another
helper which would unlock the promote_latch from applier but as to me this all
become an extreme mess, and I'm pretty sure this is a bad design: locks always
should be taken and released from the same function body, until there are some
very serious reasons to not do so.

I used

int
txn_limbo_process_begin(struct txn_limbo *limbo,
			const struct synchro_request *req)
{
	...
	latch_lock(&limbo->promote_latch);
	...
}

txn_limbo_process_commit(struct txn_limbo *limbo,
			 const struct synchro_request *req)
{
	assert(latch_is_locked(&limbo->promote_latch));
	...
	latch_unlock(&limbo->promote_latch);
}

void
txn_limbo_process_rollback(struct txn_limbo *limbo)
{
	assert(latch_is_locked(&limbo->promote_latch));
	latch_unlock(&limbo->promote_latch);
}

and it triggers from

static void
apply_synchro_row_cb(struct journal_entry *entry)
{
	assert(entry->complete_data != NULL);
	struct synchro_entry *synchro_entry =
		(struct synchro_entry *)entry->complete_data;
	if (entry->res < 0) {
		applier_rollback_by_wal_io(entry->res);
-->		txn_limbo_process_rollback(&txn_limbo);
	} else {
		replica_txn_wal_write_cb(synchro_entry->rcb);
-->		txn_limbo_process_commit(&txn_limbo, synchro_entry->req);
		trigger_run(&replicaset.applier.on_wal_write, NULL);
	}
	fiber_wakeup(synchro_entry->owner);
}

I tend to step back to explicit locking/unlocking to not mess with
triggers and completion contexts.

	Cyrill


More information about the Tarantool-patches mailing list