Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
Date: Tue, 15 Jun 2021 22:55:21 +0200
Message-ID: <797fdc67-944d-67f2-cd44-ff8ce79ce342@tarantool.org> (raw)
In-Reply-To: <fd68d2657ec9d4974b2c7641f6faacde9a99d39a.1623331925.git.sergepetrenko@tarantool.org>

Good job on the patch!

See 4 comments below.

On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
> Forbid limbo ownership transition without an explicit promote.
> Make it so that synchronous transactions may be committed only after it
> is claimed by some instance via a PROMOTE request.
> 
> Make everyone but the limbo owner read-only even when the limbo is
> empty.
> 
> Part-of #6034
> 
> @TarantoolBot document
> Title: synchronous replication changes
> 
> `box.info.synchro.queue` receives a new field: `owner`. It's a replica
> id of the instance owning the synchronous transaction limbo.
> 
> Once some instance owns the limbo, every other instance becomes
> read-only. When the limbo is unclaimed, e.g.
> `box.info.synchro.queue.owner` is `0`, everyone may be writeable, but
> cannot create synchronous transactions.
> 
> In order to claim or re-claim the limbo, you have to issue
> `box.ctl.promote()` on the instance you wish to promote.
> 
> When elections are enabled, the instance issues `box.ctl.promote()`
> automatically once it wins the elections.

1. It might be not a good idea to mention the limbo in the public
documentation. It is not described there anyhow, and raises the
question "what is limbo?".

Maybe better replace "limbo" with "queue" everywhere in the
public texts such as doc requests, changelogs, and error
messages.

> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index d93820e96..e75f54a01 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -277,6 +277,7 @@ struct errcode_record {
>  	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
>  	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>  	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
> +	/*225 */_(ER_LIMBO_UNCLAIMED,		"Synchronous transaction limbo doesn't belong to any instance")\

2. The same as above - lets not mention limbo in the public
space. Might be ER_SYNC_UNCLAIMED, or ER_SYNC_QUEUE_UNCLAIMED,
or a similar option. And in the error message:

	transaction limbo -> transaction queue

> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
> index 75c9b222b..6a49e2b01 100644
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -248,29 +249,6 @@ for i = 1, 100 do box.space.sync:delete{i} end
>  test_run:cmd('switch replica')
>  box.space.sync:count()
>  
> ---
> --- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
> --- instances, but also works for local rows.

3. Please, try to keep this test somehow. You can try to perform
these nops locally on the default node. Otherwise is_fully_nop in
txn_journal_entry_new() remains untested.


4. When I tried to run the tests, I got a crash:

[010] replication/gh-5195-qsync-replica-write.test.l> memtx           
[010] 
[010] [Instance "replica" killed by signal: 6 (SIGABRT)]
[010] 
[010] Last 15 lines of Tarantool Log file [Instance "replica"][/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/replica.log]:
[010] 2021-06-15 22:13:04.821 [40345] main/112/applier/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto I> RAFT: message {term: 1, state: follower} from 1
[010] 2021-06-15 22:13:04.824 [40345] main/113/applierw/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto C> leaving orphan mode
[010] 2021-06-15 22:13:04.824 [40345] main/103/replica I> replica set sync complete
[010] 2021-06-15 22:13:04.824 [40345] main/103/replica C> leaving orphan mode
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'log_level' configuration option to 5
[010] 2021-06-15 22:13:04.825 [40345] main/107/checkpoint_daemon I> scheduled next checkpoint for Tue Jun 15 23:20:29 2021
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'replication_timeout' configuration option to 0.1
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'memtx_memory' configuration option to 107374182
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication_sync_timeout' configuration option to 100
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'listen' configuration option to "\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/replica.socket-iproto"
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication' configuration option to ["\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/master.socket-iproto"]
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'log_format' configuration option to "plain"
[010] 2021-06-15 22:13:04.827 [40345] main C> entering the event loop
[010] 2021-06-15 22:13:04.903 [40345] main/128/console/unix/: I> set 'replication_synchro_timeout' configuration option to 0.001
[010] Assertion failed: (txn_limbo_is_empty(&txn_limbo)), function box_promote, file /Users/gerold/Work/Repositories/tarantool/src/box/box.cc, line 1678.

And a wrong result:

[005] replication/gh-5298-qsync-recovery-snap.test.l> vinyl           [ fail ]
[005] 
[005] Test failed! Result content mismatch:
[005] --- replication/gh-5298-qsync-recovery-snap.result	Tue Jun 15 21:48:57 2021
[005] +++ var/rejects/replication/gh-5298-qsync-recovery-snap.reject	Tue Jun 15 22:13:44 2021
[005] @@ -46,7 +46,7 @@
[005]  -- Could hang if the limbo would incorrectly handle the snapshot end.
[005]  box.space.sync:replace{11}
[005]   | ---
[005] - | - [11]
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  
[005]  old_synchro_quorum = box.cfg.replication_synchro_quorum
[005] @@ -64,7 +64,7 @@
[005]   | ...
[005]  box.space.sync:replace{12}
[005]   | ---
[005] - | - error: Quorum collection for a synchronous transaction is timed out
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  
[005]  box.cfg{                                                                        \
[005] @@ -75,18 +75,16 @@
[005]   | ...
[005]  box.space.sync:replace{13}
[005]   | ---
[005] - | - [13]
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  box.space.sync:get({11})
[005]   | ---
[005] - | - [11]
[005]   | ...
[005]  box.space.sync:get({12})
[005]   | ---
[005]   | ...
[005]  box.space.sync:get({13})
[005]   | ---
[005] - | - [13]
[005]   | ...
[005]  
[005]  box.cfg{                                                                        \
[005] 

Then I stopped.

  reply	other threads:[~2021-06-15 20:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
2021-06-10 16:47   ` Cyrill Gorcunov via Tarantool-patches
2021-06-11  8:43     ` Serge Petrenko via Tarantool-patches
2021-06-11  8:44       ` Cyrill Gorcunov via Tarantool-patches
2021-06-15 20:53   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
2021-06-15 20:55   ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-18 22:49   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-21 10:13     ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
2021-06-15 20:57   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-18 22:49       ` Vladislav Shpilevoy via Tarantool-patches
2021-06-21  8:55         ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
2021-06-15 20:59   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-18 22:52       ` Vladislav Shpilevoy via Tarantool-patches
2021-06-21 10:12         ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
2021-06-18 22:52   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-21 14:56     ` Serge Petrenko via Tarantool-patches
2021-06-10 13:32 ` [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term Serge Petrenko via Tarantool-patches
2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
2021-06-18 22:53   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-21 15:02     ` Serge Petrenko via Tarantool-patches
2021-06-15 20:53 ` [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=797fdc67-944d-67f2-cd44-ff8ce79ce342@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git