[Tarantool-patches] [RFC] Quorum-based synchronous replication

Konstantin Osipov kostja.osipov at gmail.com
Thu May 14 02:31:59 MSK 2020


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [20/05/14 00:37]:
> >>> 3) One can quickly run out of memory for undo. Any sync
> >>>    transaction should be capped with a timeout to avoid OOMs. I
> >>>    don't know how many times I should repeat it. The only good
> >>>    solution for load control is an in-memory WAL, which allows
> >>>    rolling back all transactions as soon as network partitioning is
> >>>    detected.
> >>
> >> How can an in-memory WAL help save on _undo_ memory?
> >> To roll back any number of transactions, one needs to store the undo.
> > 
> > I wrote earlier that it works as a natural failure detector and
> > throttling mechanism. If
> > there is no quorum, we can see it immediately by looking at the
> > number of active subscribers of the in-memory WAL, so do not
> > accumulate undo.
> 
> Here we go again ...
> 
> Talking of throttling. Without in-memory WAL no need for throttling. All is
> 'slow' by design already, as you think.

What is the limit on the number of transactions in the txn_limbo
list? How does this limit work? What about the fibers, which stay
pinned for as long as the transaction is not committed?
> 
> Talking of failure detection - what??? I don't get it. This is something new.
> With in-memory relay or without you anyway can see if there is a quorum.

How do you "see" it? You write to the WAL and wait for acks. You
could add a wait timeout, and assume there is no quorum if there
are no acks within the timeout. This is not the best strategy, but
there is no other. The spec doesn't even say that; it simply says
that lack of quorum is somehow detected, but how it is detected is
not clear.
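
To make the timeout approach concrete, here is a rough sketch in
Python. It is not Tarantool code: AckWaiter, on_ack() and status()
are made-up names, and I assume the quorum is counted in distinct
replica acks and that the clock is injectable for testing.

```python
import time


class AckWaiter:
    """Hypothetical per-transaction ack tracker (illustration only)."""

    def __init__(self, quorum, timeout, now=time.monotonic):
        self.quorum = quorum            # acks needed to commit (assumption)
        self.now = now                  # injectable clock
        self.deadline = now() + timeout
        self.acked = set()              # ids of replicas that acked

    def on_ack(self, replica_id):
        # A set makes duplicate acks from the same replica count once.
        self.acked.add(replica_id)

    def status(self):
        if len(self.acked) >= self.quorum:
            return "commit"
        if self.now() >= self.deadline:
            # No quorum within the timeout: assume it is lost.
            return "rollback"
        return "wait"
```

Note that the only signal here is the passage of time: until the
deadline expires, the transaction and its undo stay in memory.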

With in-memory WAL you can afford to wait longer if you have space
in the ring buffer, and you know immediately if you shouldn't wait
because you see that the ring buffer is full and the majority of
subscribers are behind the start of the buffer.
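
A rough sketch of that check, again in Python and again with
made-up names (RingWal, should_wait); the LSN bookkeeping is my
assumption about how such a buffer could be tracked, and it assumes
at least one subscriber is connected:

```python
from dataclasses import dataclass


@dataclass
class RingWal:
    """Hypothetical in-memory WAL ring buffer (illustration only)."""
    capacity: int    # max number of entries held in memory
    start_lsn: int   # oldest LSN still in the buffer
    end_lsn: int     # next LSN to be written (exclusive)

    def is_full(self):
        return self.end_lsn - self.start_lsn >= self.capacity


def should_wait(wal, subscriber_lsns):
    """Return False when waiting for acks is pointless.

    subscriber_lsns holds, per subscriber, the next LSN it needs.
    A subscriber is "behind" if that LSN was already evicted.
    """
    behind = sum(1 for lsn in subscriber_lsns if lsn < wal.start_lsn)
    majority_behind = behind * 2 > len(subscriber_lsns)
    return not (wal.is_full() and majority_behind)
```

The check needs no timer: it only looks at the buffer bounds and the
subscriber positions, so the answer is available immediately.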


> This is a matter of API of replication and transaction modules, and their
> interaction with each other, solved by txn_limbo in my branch.

How is it "solved"?

> But still, I don't see how knowing number of subscribers helps with the
> quorum. Subscriber presence does not add to quorums by itself. Anyway every
> transaction needs to be replicated before you can say that its quorum got
> +1 replica ack.

It helps to quickly see the absence of a quorum, not its presence.

-- 
Konstantin Osipov, Moscow, Russia

