[Tarantool-patches] [PATCH v4 00/11] Replication from memory

Georgy Kirichenko georgy at tarantool.org
Wed Feb 12 12:39:09 MSK 2020


This is a complete redesign of the previous version of the
feature. The first five patches are refactoring that makes the
corresponding facilities (recovery, coio and xstream)
C-compliant. Two minor changes are split out in order to
facilitate review.

The sixth patch extracts xlog batch writing into a separate
routine, which helps with further reviews too.

Matrix clock is a structure that maintains a set of vclocks and
is used to build an n-majority vclock (a vclock each component
of which is greater than or equal to n corresponding components
of all contained vclocks). This feature is used to determine
a vclock read by all replicas (0-majority) or a vclock which is
applied by n replicas in case of synchronous replication.
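
To make the n-majority idea concrete, here is a minimal sketch,
not the code from the patch. All sketch_* names and the fixed
array sizes are hypothetical. It assumes that each component is
picked by sorting the values from all tracked vclocks in
ascending order and taking the one at index n, so that n = 0
yields the component-wise minimum, i.e. the vclock every
replica has reached. The real mclock may index differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits, chosen only to keep the sketch small. */
enum { SKETCH_VCLOCK_MAX = 4, SKETCH_REPLICA_MAX = 8 };

/* A simplified vclock: one lsn per replica id. */
struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/* A simplified matrix clock: one vclock per tracked replica. */
struct sketch_mclock {
	int count;
	struct sketch_vclock rows[SKETCH_REPLICA_MAX];
};

static int
sketch_cmp_lsn(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
	return (x > y) - (x < y);
}

/*
 * Build the n-majority vclock under the assumed convention:
 * for every component, sort the values reported by all tracked
 * vclocks in ascending order and take the one at index n.
 * Returns 0 on success, -1 if n is out of range.
 */
static int
sketch_mclock_get(const struct sketch_mclock *m, int n,
		  struct sketch_vclock *out)
{
	if (n < 0 || n >= m->count)
		return -1;
	for (int c = 0; c < SKETCH_VCLOCK_MAX; c++) {
		int64_t column[SKETCH_REPLICA_MAX];
		for (int r = 0; r < m->count; r++)
			column[r] = m->rows[r].lsn[c];
		qsort(column, m->count, sizeof(column[0]),
		      sketch_cmp_lsn);
		out->lsn[c] = column[n];
	}
	return 0;
}
```

With three tracked vclocks {10, 5}, {7, 9} and {12, 3}, n = 0
gives {7, 3}, which is safe to use as a garbage collection
boundary, and n = 1 gives {10, 5}. Sorting on every call keeps
the sketch short; a real structure would keep the columns
ordered incrementally.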

Matrix clock allows wal to track relay vclocks and collect
garbage without the tx thread, which is implemented in the next
patch.

An xrow buffer object is an in-memory data structure that places
encoded xrow data as well as the corresponding xrow headers into
rotating memory buffers. Its main purpose is to let a transaction
live in memory for some time even after the transaction is
finalized. Encoded xrow data is stored in obufs whereas
headers are stored in arrays. This approach allows analyzing
an xrow header (replica id, lsn and group) without decoding the
blob data as recovery does now. Additionally, it is possible to
scan xrow headers and build a large range of already encoded
data in order to send it with one call (not implemented yet).
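
The layout described above can be illustrated with a small
sketch. It is not the xrow_buf API from the patch: the sketch_*
names, the fixed chunk sizes and the plain char array standing
in for an obuf are all assumptions made for illustration. What
it shows is the separation of headers from encoded bodies and
the rotation that drops the oldest chunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, kept tiny so rotation is easy to see. */
enum {
	SKETCH_CHUNK_COUNT = 4,
	SKETCH_CHUNK_ROWS = 2,
	SKETCH_CHUNK_BYTES = 64,
};

/* Row metadata, kept apart from the encoded body. */
struct sketch_row_header {
	uint32_t replica_id;
	uint32_t group_id;
	int64_t lsn;
	uint32_t offset; /* Body position inside chunk data. */
	uint32_t size;
};

struct sketch_chunk {
	int row_count;
	uint32_t used;
	struct sketch_row_header headers[SKETCH_CHUNK_ROWS];
	char data[SKETCH_CHUNK_BYTES]; /* Stands in for an obuf. */
};

/* Must be zero-initialized before the first write. */
struct sketch_xrow_buf {
	uint64_t last; /* Monotonic index of the chunk in use. */
	struct sketch_chunk chunks[SKETCH_CHUNK_COUNT];
};

/*
 * Append one already encoded row. When the current chunk is
 * full, rotate to the next one and discard its old content.
 * Returns 0 on success, -1 if the body cannot fit in a chunk.
 */
static int
sketch_xrow_buf_write(struct sketch_xrow_buf *buf,
		      uint32_t replica_id, uint32_t group_id,
		      int64_t lsn, const void *body, uint32_t size)
{
	if (size > SKETCH_CHUNK_BYTES)
		return -1;
	struct sketch_chunk *c =
		&buf->chunks[buf->last % SKETCH_CHUNK_COUNT];
	if (c->row_count == SKETCH_CHUNK_ROWS ||
	    c->used + size > SKETCH_CHUNK_BYTES) {
		buf->last++;
		c = &buf->chunks[buf->last % SKETCH_CHUNK_COUNT];
		c->row_count = 0;
		c->used = 0;
	}
	struct sketch_row_header *h = &c->headers[c->row_count++];
	h->replica_id = replica_id;
	h->group_id = group_id;
	h->lsn = lsn;
	h->offset = c->used;
	h->size = size;
	memcpy(c->data + c->used, body, size);
	c->used += size;
	return 0;
}

/*
 * Find a row by replica id and lsn by scanning headers only,
 * never touching the encoded bodies. Returns NULL if the row
 * was never written or has already been rotated out.
 */
static const struct sketch_row_header *
sketch_xrow_buf_find(const struct sketch_xrow_buf *buf,
		     uint32_t replica_id, int64_t lsn)
{
	uint64_t first = buf->last >= SKETCH_CHUNK_COUNT ?
			 buf->last - SKETCH_CHUNK_COUNT + 1 : 0;
	for (uint64_t i = first; i <= buf->last; i++) {
		const struct sketch_chunk *c =
			&buf->chunks[i % SKETCH_CHUNK_COUNT];
		for (int r = 0; r < c->row_count; r++) {
			const struct sketch_row_header *h =
				&c->headers[r];
			if (h->replica_id == replica_id &&
			    h->lsn == lsn)
				return h;
		}
	}
	return NULL;
}
```

After ten rows are written with these sizes, the chunk holding
the first two rows has been reused, so a lookup for them
returns NULL while later rows are still found. That miss is the
case in which a reader has to go to the files.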

The tenth patch refactors wal so that it encodes entries into
an xrow buffer before any actual write.

The last patch implements in-memory replication. From now on a
relay lives in the wal thread (which is inevitable in case of
synchronous replication) as a pair of fibers: a writer and a
reader. The reader has the same mission as before: to read and
process the replica status vclock. The writer fetches rows from
the wal xrow buffer and then transmits them to a replica. If wal
memory does not contain the required rows, the writer fiber
spawns a cord which reads logs from files. The relay also
provides a special filter function which the writer uses to
implement the previous relaying logic (skip rows, nops).
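
The two decisions the writer makes can be sketched roughly as
follows. This is a simplification, not the relay code: the
sketch_* names are hypothetical, a single lsn stands in for
the vclock comparison, and the filter rules (skip rows that
originated on the destination replica, replace local-group
rows with nops) are assumed from the pre-existing relay
behaviour without its special cases.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_source { SKETCH_FROM_MEMORY, SKETCH_FROM_FILES };
enum sketch_action { SKETCH_SEND, SKETCH_SKIP, SKETCH_SEND_NOP };
enum { SKETCH_GROUP_DEFAULT = 0, SKETCH_GROUP_LOCAL = 1 };

/* The header fields the filter needs; no body is decoded. */
struct sketch_row {
	uint32_t replica_id;
	uint32_t group_id;
	int64_t lsn;
};

/*
 * Decide where the writer reads the next row from. If the row
 * is older than anything still kept in wal memory, it has to
 * come from the log files (in the patch, via a spawned cord).
 */
static enum sketch_source
sketch_relay_source(int64_t next_lsn, int64_t oldest_lsn_in_memory)
{
	return next_lsn >= oldest_lsn_in_memory ?
	       SKETCH_FROM_MEMORY : SKETCH_FROM_FILES;
}

/*
 * Decide what to do with a fetched row. Local-group rows are
 * sent as nops so the replica's vclock still advances; rows
 * that originated on the destination replica are skipped.
 */
static enum sketch_action
sketch_relay_filter(const struct sketch_row *row,
		    uint32_t dest_replica_id)
{
	if (row->group_id == SKETCH_GROUP_LOCAL)
		return SKETCH_SEND_NOP;
	if (row->replica_id == dest_replica_id)
		return SKETCH_SKIP;
	return SKETCH_SEND;
}
```

Keeping the filter a pure function of the row header is what
lets the same logic serve both the in-memory path and the
file-reading path.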

Branch:
https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3794-memory-replication
Issue: https://github.com/tarantool/tarantool/issues/3794

Georgy Kirichenko (11):
  recovery: do not call recovery_stop_local inside recovery_delete
  recovery: do not throw an error
  coio: do not allow parallel usage of coio
  coio: do not throw an error, minor refactoring
  xstream: get rid of an exception
  wal: extract log write batch into a separate routine
  wal: matrix clock structure
  wal: track relay vclock and collect logs in wal thread
  wal: xrow memory buffer and cursor
  wal: use a xrow buffer object for entry encoding
  replication: use wal memory buffer to fetch rows

 src/box/CMakeLists.txt                        |   6 +-
 src/box/applier.cc                            |  49 +-
 src/box/box.cc                                |  81 +-
 src/box/gc.c                                  | 216 ++---
 src/box/gc.h                                  |  95 +-
 src/box/lua/info.c                            |  33 +-
 src/box/mclock.c                              | 374 ++++++++
 src/box/mclock.h                              | 125 +++
 src/box/recovery.cc                           | 100 ++-
 src/box/recovery.h                            |  14 +-
 src/box/relay.cc                              | 649 ++++----------
 src/box/relay.h                               |   6 +-
 src/box/replication.cc                        |  37 +-
 src/box/wal.c                                 | 829 ++++++++++++++++--
 src/box/wal.h                                 |  97 +-
 src/box/xlog.c                                |  57 +-
 src/box/xlog.h                                |  14 +
 src/box/xrow_buf.c                            | 374 ++++++++
 src/box/xrow_buf.h                            | 197 +++++
 src/box/xrow_io.cc                            |  59 +-
 src/box/xrow_io.h                             |  11 +-
 src/box/xstream.cc                            |  44 -
 src/box/xstream.h                             |   9 +-
 src/lib/core/coio.cc                          | 534 ++++++-----
 src/lib/core/coio.h                           |  19 +-
 src/lib/core/coio_buf.h                       |   8 +
 src/lib/core/errinj.h                         |   1 +
 test/box-py/iproto.test.py                    |   9 +-
 test/box/errinj.result                        | 134 +--
 test/replication/force_recovery.result        |   8 +
 test/replication/force_recovery.test.lua      |   2 +
 test/replication/gc_no_space.result           |  30 +-
 test/replication/gc_no_space.test.lua         |  12 +-
 test/replication/replica_rejoin.result        |   8 +
 test/replication/replica_rejoin.test.lua      |   2 +
 .../show_error_on_disconnect.result           |   8 +
 .../show_error_on_disconnect.test.lua         |   2 +
 test/replication/suite.ini                    |   2 +-
 test/unit/CMakeLists.txt                      |   2 +
 test/unit/mclock.result                       |  18 +
 test/unit/mclock.test.c                       | 160 ++++
 test/xlog/panic_on_wal_error.result           |  12 +
 test/xlog/panic_on_wal_error.test.lua         |   3 +
 test/xlog/suite.ini                           |   2 +-
 44 files changed, 3063 insertions(+), 1389 deletions(-)
 create mode 100644 src/box/mclock.c
 create mode 100644 src/box/mclock.h
 create mode 100644 src/box/xrow_buf.c
 create mode 100644 src/box/xrow_buf.h
 delete mode 100644 src/box/xstream.cc
 create mode 100644 test/unit/mclock.result
 create mode 100644 test/unit/mclock.test.c

-- 
2.25.0


