From: Georgy Kirichenko
Date: Wed, 12 Feb 2020 12:39:09 +0300
Subject: [Tarantool-patches] [PATCH v4 00/11] Replication from memory
To: tarantool-patches@dev.tarantool.org

This is a complete redesign of the previous version of the feature.

The first five patches are refactoring that makes the corresponding
facilities (recovery, coio and xstream) C-compliant. Two minor changes
are split out to facilitate review.

The sixth patch extracts xlog batch writing into a separate routine,
which also helps with further review.

The matrix clock is a structure that maintains a set of vclocks and is
used to build an n-majority vclock (a vclock each component of which is
greater than or equal to n corresponding components of all contained
vclocks). It is used to determine a vclock read by all replicas
(0-majority), or a vclock applied by n replicas in the case of
synchronous replication. The matrix clock allows wal to track relay
vclocks and collect garbage without the tx thread, which is implemented
in the next patch.

The xrow buffer object is an in-memory data structure that places
encoded xrow data, as well as the corresponding xrow headers, into
rotating memory buffers. Its main purpose is to let a transaction live
in memory for some time even after the transaction is finalized.
Encoded xrow data is stored in obufs, whereas headers are stored in
arrays. This approach makes it possible to analyze an xrow header
(replica id, lsn and group) without decoding the blob data the way
recovery does now.
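
To illustrate the header/blob split described above, here is a minimal
sketch in C. It is not taken from the patch set: the names
sketch_row_header, sketch_chunk and sketch_chunk_find are hypothetical,
the header carries only the fields named in this letter plus an assumed
offset and size into the blob, and the real layout in src/box/xrow_buf.h
may differ. The point is only that a row can be located by replica id
and lsn while the encoded data stays untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header: not the real struct xrow_header. */
struct sketch_row_header {
	uint32_t replica_id;
	uint32_t group_id;
	int64_t lsn;
	/* Assumed bookkeeping: where the encoded row sits in the blob. */
	size_t data_offset;
	size_t data_size;
};

/* Hypothetical chunk: a header array next to an opaque encoded blob. */
struct sketch_chunk {
	const struct sketch_row_header *headers;
	size_t row_count;
	/* Encoded rows; the scan below never dereferences this pointer. */
	const char *data;
};

/*
 * Return the index of the first row written by replica_id whose lsn is
 * greater than after_lsn, or chunk->row_count if there is no such row.
 * Only the header array is inspected.
 */
size_t
sketch_chunk_find(const struct sketch_chunk *chunk, uint32_t replica_id,
		  int64_t after_lsn)
{
	assert(chunk != NULL);
	for (size_t i = 0; i < chunk->row_count; i++) {
		const struct sketch_row_header *h = &chunk->headers[i];
		if (h->replica_id == replica_id && h->lsn > after_lsn)
			return i;
	}
	return chunk->row_count;
}
```

A caller could use the returned index to read data_offset and data_size
from the matching header and hand that byte range to the network layer
as is, without decoding the rows in between.
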
Additionally, it is possible to scan xrow headers and build a large
data range of already encoded data in order to send it with one call
(not implemented yet).

The tenth patch refactors wal so that it uses the xrow buffer before
any actual write.

The last patch implements in-memory replication. From now on, a relay
lives in the wal thread (which is inevitable in the case of synchronous
replication) as a pair of fibers: a writer and a reader. The reader has
the same mission as before: to read and process the replica status
vclock. The writer fetches rows from the wal xrow buffer and then
transmits them to a replica. If wal memory does not contain the
required rows, the writer fiber spawns a cord that reads logs from
files. The relay also provides a special filter function, which the
writer uses to implement the previous relaying logic (skip rows, nops).

Branch: https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3794-memory-replication
Issue: https://github.com/tarantool/tarantool/issues/3794

Georgy Kirichenko (11):
  recovery: do not call recovery_stop_local inside recovery_delete
  recovery: do not throw an error
  coio: do not allow parallel usage of coio
  coio: do not throw an error, minor refactoring
  xstream: get rid of an exception
  wal: extract log write batch into a separate routine
  wal: matrix clock structure
  wal: track relay vclock and collect logs in wal thread
  wal: xrow memory buffer and cursor
  wal: use a xrow buffer object for entry encoding
  replication: use wal memory buffer to fetch rows

 src/box/CMakeLists.txt                   |   6 +-
 src/box/applier.cc                       |  49 +-
 src/box/box.cc                           |  81 +-
 src/box/gc.c                             | 216 ++---
 src/box/gc.h                             |  95 +-
 src/box/lua/info.c                       |  33 +-
 src/box/mclock.c                         | 374 ++++++++
 src/box/mclock.h                         | 125 +++
 src/box/recovery.cc                      | 100 ++-
 src/box/recovery.h                       |  14 +-
 src/box/relay.cc                         | 649 ++++----------
 src/box/relay.h                          |   6 +-
 src/box/replication.cc                   |  37 +-
 src/box/wal.c                            | 829 ++++++++++++++++--
 src/box/wal.h                            |  97 +-
 src/box/xlog.c                           |  57 +-
 src/box/xlog.h                           |  14 +
 src/box/xrow_buf.c                       | 374 ++++++++
 src/box/xrow_buf.h                       | 197 +++++
 src/box/xrow_io.cc                       |  59 +-
 src/box/xrow_io.h                        |  11 +-
 src/box/xstream.cc                       |  44 -
 src/box/xstream.h                        |   9 +-
 src/lib/core/coio.cc                     | 534 ++++++-----
 src/lib/core/coio.h                      |  19 +-
 src/lib/core/coio_buf.h                  |   8 +
 src/lib/core/errinj.h                    |   1 +
 test/box-py/iproto.test.py               |   9 +-
 test/box/errinj.result                   | 134 +--
 test/replication/force_recovery.result   |   8 +
 test/replication/force_recovery.test.lua |   2 +
 test/replication/gc_no_space.result      |  30 +-
 test/replication/gc_no_space.test.lua    |  12 +-
 test/replication/replica_rejoin.result   |   8 +
 test/replication/replica_rejoin.test.lua |   2 +
 .../show_error_on_disconnect.result      |   8 +
 .../show_error_on_disconnect.test.lua    |   2 +
 test/replication/suite.ini               |   2 +-
 test/unit/CMakeLists.txt                 |   2 +
 test/unit/mclock.result                  |  18 +
 test/unit/mclock.test.c                  | 160 ++++
 test/xlog/panic_on_wal_error.result      |  12 +
 test/xlog/panic_on_wal_error.test.lua    |   3 +
 test/xlog/suite.ini                      |   2 +-
 44 files changed, 3063 insertions(+), 1389 deletions(-)
 create mode 100644 src/box/mclock.c
 create mode 100644 src/box/mclock.h
 create mode 100644 src/box/xrow_buf.c
 create mode 100644 src/box/xrow_buf.h
 delete mode 100644 src/box/xstream.cc
 create mode 100644 test/unit/mclock.result
 create mode 100644 test/unit/mclock.test.c

-- 
2.25.0