Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine
@ 2020-07-23 12:29 Cyrill Gorcunov
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 1/7] journal: drop redundant declaration Cyrill Gorcunov
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

In this series we write CONFIRM/ROLLBACK messages into the WAL directly
without involving the txn engine.

Vlad, take a look please, once time permit.

issue https://github.com/tarantool/tarantool/issues/5129
branch gorcunov/gh-5129-journal-2

Cyrill Gorcunov (7):
  journal: drop redundant declaration
  wal: bind asynchronous write completion to an entry
  journal: add journal_entry_create helper
  qsync: provide a binary form of CONFIRM/ROLLBACK entries
  qsync: provide a way to encode preallocated CONFIRM/ROLLBACK entries
  qsync: implement direct write of CONFIRM/ROLLBACK into a journal
  qsync: drop no longer used xrow_encode_confirm,rollback

 src/box/box.cc             | 14 +++---
 src/box/iproto_constants.h | 27 +++++++++++
 src/box/journal.c          |  8 ++--
 src/box/journal.h          | 40 ++++++++++++-----
 src/box/txn.c              |  2 +-
 src/box/txn_limbo.c        | 92 +++++++++++++++++++++-----------------
 src/box/vy_log.c           |  2 +-
 src/box/wal.c              | 20 ++++-----
 src/box/wal.h              |  4 +-
 src/box/xrow.c             | 41 +++--------------
 src/box/xrow.h             | 29 ++++--------
 11 files changed, 146 insertions(+), 133 deletions(-)


base-commit: 8c50e069e3183222682896867c7e1ef98a5b99a7
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 1/7] journal: drop redundant declaration
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry Cyrill Gorcunov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

We declare journal_entry right below no need for
more declarations.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/journal.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/box/journal.h b/src/box/journal.h
index 1a10e66c3..9049a2ce0 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -40,7 +40,6 @@ extern "C" {
 #endif /* defined(__cplusplus) */
 
 struct xrow_header;
-struct journal_entry;
 
 /**
  * An entry for an abstract journal.
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 1/7] journal: drop redundant declaration Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper Cyrill Gorcunov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

In commit 77ba0e3504464131fe81c672d508d0275be2173a we've redesigned
wal journal operations such that asynchronous write completion
is a signle instance per journal.

It turned out that such simplification is too tight and doesn't
allow us to pass entries into the journal with custom completions.

Thus lets allow back such ability. We will need it to be able
to write "confirm" records into wal directly without touching
trasactions code at all.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/box.cc    | 14 ++++++++------
 src/box/journal.c |  2 ++
 src/box/journal.h | 18 +++++++++---------
 src/box/txn.c     |  2 +-
 src/box/vy_log.c  |  2 +-
 src/box/wal.c     | 20 +++++++++-----------
 src/box/wal.h     |  4 ++--
 7 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 83eef5d98..7d61f2ed2 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -348,7 +348,7 @@ recovery_journal_write(struct journal *base,
 	 * Since there're no actual writes, fire a
 	 * journal_async_complete callback right away.
 	 */
-	journal_async_complete(base, entry);
+	journal_async_complete(entry);
 	return 0;
 }
 
@@ -357,7 +357,7 @@ recovery_journal_create(struct vclock *v)
 {
 	static struct recovery_journal journal;
 	journal_create(&journal.base, recovery_journal_write,
-		       txn_complete_async, recovery_journal_write);
+		       recovery_journal_write);
 	journal.vclock = v;
 	journal_set(&journal.base);
 }
@@ -2193,8 +2193,10 @@ engine_init()
 static int
 bootstrap_journal_write(struct journal *base, struct journal_entry *entry)
 {
+	(void)base;
+
 	entry->res = 0;
-	journal_async_complete(base, entry);
+	journal_async_complete(entry);
 	return 0;
 }
 
@@ -2580,8 +2582,8 @@ box_cfg_xc(void)
 
 	int64_t wal_max_size = box_check_wal_max_size(cfg_geti64("wal_max_size"));
 	enum wal_mode wal_mode = box_check_wal_mode(cfg_gets("wal_mode"));
-	if (wal_init(wal_mode, txn_complete_async, cfg_gets("wal_dir"),
-		     wal_max_size, &INSTANCE_UUID, on_wal_garbage_collection,
+	if (wal_init(wal_mode, cfg_gets("wal_dir"), wal_max_size,
+		     &INSTANCE_UUID, on_wal_garbage_collection,
 		     on_wal_checkpoint_threshold) != 0) {
 		diag_raise();
 	}
@@ -2628,7 +2630,7 @@ box_cfg_xc(void)
 	}
 
 	struct journal bootstrap_journal;
-	journal_create(&bootstrap_journal, NULL, txn_complete_async,
+	journal_create(&bootstrap_journal, bootstrap_journal_write,
 		       bootstrap_journal_write);
 	journal_set(&bootstrap_journal);
 	auto bootstrap_journal_guard = make_scoped_guard([] {
diff --git a/src/box/journal.c b/src/box/journal.c
index f1e89aaa2..fb81acb39 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -36,6 +36,7 @@ struct journal *current_journal = NULL;
 
 struct journal_entry *
 journal_entry_new(size_t n_rows, struct region *region,
+		  void (*write_async_cb)(struct journal_entry *entry),
 		  void *complete_data)
 {
 	struct journal_entry *entry;
@@ -50,6 +51,7 @@ journal_entry_new(size_t n_rows, struct region *region,
 		return NULL;
 	}
 
+	entry->write_async_cb = write_async_cb;
 	entry->complete_data = complete_data;
 	entry->approx_len = 0;
 	entry->n_rows = n_rows;
diff --git a/src/box/journal.h b/src/box/journal.h
index 9049a2ce0..74a5eb050 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -60,6 +60,10 @@ struct journal_entry {
 	 * A journal entry completion callback argument.
 	 */
 	void *complete_data;
+	/**
+	 * Asynchronous write completion function.
+	 */
+	void (*write_async_cb)(struct journal_entry *entry);
 	/**
 	 * Approximate size of this request when encoded.
 	 */
@@ -83,6 +87,7 @@ struct region;
  */
 struct journal_entry *
 journal_entry_new(size_t n_rows, struct region *region,
+		  void (*write_async_cb)(struct journal_entry *entry),
 		  void *complete_data);
 
 /**
@@ -95,22 +100,19 @@ struct journal {
 	int (*write_async)(struct journal *journal,
 			   struct journal_entry *entry);
 
-	/** Asynchronous write completion */
-	void (*write_async_cb)(struct journal_entry *entry);
-
 	/** Synchronous write */
 	int (*write)(struct journal *journal,
 		     struct journal_entry *entry);
 };
 
 /**
- * Finalize a single entry.
+ * Complete asynchronous write.
  */
 static inline void
-journal_async_complete(struct journal *journal, struct journal_entry *entry)
+journal_async_complete(struct journal_entry *entry)
 {
-	assert(journal->write_async_cb != NULL);
-	journal->write_async_cb(entry);
+	assert(entry->write_async_cb != NULL);
+	entry->write_async_cb(entry);
 }
 
 /**
@@ -172,12 +174,10 @@ static inline void
 journal_create(struct journal *journal,
 	       int (*write_async)(struct journal *journal,
 				  struct journal_entry *entry),
-	       void (*write_async_cb)(struct journal_entry *entry),
 	       int (*write)(struct journal *journal,
 			    struct journal_entry *entry))
 {
 	journal->write_async	= write_async;
-	journal->write_async_cb	= write_async_cb;
 	journal->write		= write;
 }
 
diff --git a/src/box/txn.c b/src/box/txn.c
index 9c21258c5..cc1f496c5 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -551,7 +551,7 @@ txn_journal_entry_new(struct txn *txn)
 
 	/* Save space for an additional NOP row just in case. */
 	req = journal_entry_new(txn->n_new_rows + txn->n_applier_rows + 1,
-				&txn->region, txn);
+				&txn->region, txn_complete_async, txn);
 	if (req == NULL)
 		return NULL;
 
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 311985c72..de4c5205c 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -818,7 +818,7 @@ vy_log_tx_flush(struct vy_log_tx *tx)
 	size_t used = region_used(&fiber()->gc);
 
 	struct journal_entry *entry;
-	entry = journal_entry_new(tx_size, &fiber()->gc, NULL);
+	entry = journal_entry_new(tx_size, &fiber()->gc, NULL, NULL);
 	if (entry == NULL)
 		goto err;
 
diff --git a/src/box/wal.c b/src/box/wal.c
index 37a8bd483..4e6025104 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -266,10 +266,9 @@ xlog_write_entry(struct xlog *l, struct journal_entry *entry)
 static void
 tx_schedule_queue(struct stailq *queue)
 {
-	struct wal_writer *writer = &wal_writer_singleton;
 	struct journal_entry *req, *tmp;
 	stailq_foreach_entry_safe(req, tmp, queue, fifo)
-		journal_async_complete(&writer->base, req);
+		journal_async_complete(req);
 }
 
 /**
@@ -403,9 +402,8 @@ tx_notify_checkpoint(struct cmsg *msg)
  */
 static void
 wal_writer_create(struct wal_writer *writer, enum wal_mode wal_mode,
-		  void (*wall_async_cb)(struct journal_entry *entry),
-		  const char *wal_dirname,
-		  int64_t wal_max_size, const struct tt_uuid *instance_uuid,
+		  const char *wal_dirname, int64_t wal_max_size,
+		  const struct tt_uuid *instance_uuid,
 		  wal_on_garbage_collection_f on_garbage_collection,
 		  wal_on_checkpoint_threshold_f on_checkpoint_threshold)
 {
@@ -415,7 +413,6 @@ wal_writer_create(struct wal_writer *writer, enum wal_mode wal_mode,
 	journal_create(&writer->base,
 		       wal_mode == WAL_NONE ?
 		       wal_write_none_async : wal_write_async,
-		       wall_async_cb,
 		       wal_mode == WAL_NONE ?
 		       wal_write_none : wal_write);
 
@@ -525,15 +522,15 @@ wal_open(struct wal_writer *writer)
 }
 
 int
-wal_init(enum wal_mode wal_mode, void (*wall_async_cb)(struct journal_entry *entry),
-	 const char *wal_dirname, int64_t wal_max_size, const struct tt_uuid *instance_uuid,
+wal_init(enum wal_mode wal_mode, const char *wal_dirname,
+	 int64_t wal_max_size, const struct tt_uuid *instance_uuid,
 	 wal_on_garbage_collection_f on_garbage_collection,
 	 wal_on_checkpoint_threshold_f on_checkpoint_threshold)
 {
 	/* Initialize the state. */
 	struct wal_writer *writer = &wal_writer_singleton;
-	wal_writer_create(writer, wal_mode, wall_async_cb, wal_dirname,
-			  wal_max_size, instance_uuid, on_garbage_collection,
+	wal_writer_create(writer, wal_mode, wal_dirname, wal_max_size,
+			  instance_uuid, on_garbage_collection,
 			  on_checkpoint_threshold);
 
 	/* Start WAL thread. */
@@ -1304,7 +1301,8 @@ wal_write_none_async(struct journal *journal,
 	vclock_merge(&writer->vclock, &vclock_diff);
 	vclock_copy(&replicaset.vclock, &writer->vclock);
 	entry->res = vclock_sum(&writer->vclock);
-	journal_async_complete(journal, entry);
+
+	journal_async_complete(entry);
 	return 0;
 }
 
diff --git a/src/box/wal.h b/src/box/wal.h
index f348dc636..9d0cada46 100644
--- a/src/box/wal.h
+++ b/src/box/wal.h
@@ -81,8 +81,8 @@ typedef void (*wal_on_checkpoint_threshold_f)(void);
  * Start WAL thread and initialize WAL writer.
  */
 int
-wal_init(enum wal_mode wal_mode, void (*wall_async_cb)(struct journal_entry *entry),
-	 const char *wal_dirname, int64_t wal_max_size, const struct tt_uuid *instance_uuid,
+wal_init(enum wal_mode wal_mode, const char *wal_dirname,
+	 int64_t wal_max_size, const struct tt_uuid *instance_uuid,
 	 wal_on_garbage_collection_f on_garbage_collection,
 	 wal_on_checkpoint_threshold_f on_checkpoint_threshold);
 
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 1/7] journal: drop redundant declaration Cyrill Gorcunov
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries Cyrill Gorcunov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

To create raw journal entries. We will use it
to write confirm/rollback entries.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/journal.c |  8 ++------
 src/box/journal.h | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/box/journal.c b/src/box/journal.c
index fb81acb39..159e32ff3 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -51,11 +51,7 @@ journal_entry_new(size_t n_rows, struct region *region,
 		return NULL;
 	}
 
-	entry->write_async_cb = write_async_cb;
-	entry->complete_data = complete_data;
-	entry->approx_len = 0;
-	entry->n_rows = n_rows;
-	entry->res = -1;
-
+	journal_entry_create(entry, n_rows, 0, write_async_cb,
+			     complete_data);
 	return entry;
 }
diff --git a/src/box/journal.h b/src/box/journal.h
index 74a5eb050..fc33fd6f0 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -80,6 +80,27 @@ struct journal_entry {
 
 struct region;
 
+/**
+ * Initialize a new journal entry.
+ */
+static inline void
+journal_entry_create(struct journal_entry *entry, size_t n_rows,
+		     size_t approx_len,
+		     void (*write_async_cb)(struct journal_entry *entry),
+		     void *complete_data)
+{
+	/*
+	 * fifo member is left untouched because
+	 * it is used by the journal engine internally,
+	 * no need to waste time here.
+	 */
+	entry->write_async_cb	= write_async_cb;
+	entry->complete_data	= complete_data;
+	entry->approx_len	= approx_len;
+	entry->n_rows		= n_rows;
+	entry->res		= -1;
+}
+
 /**
  * Create a new journal entry.
  *
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated " Cyrill Gorcunov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

These msgpack entries will be needed to write them
down to a journal without involving txn engine.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/iproto_constants.h | 27 +++++++++++++++++++++++++++
 src/box/xrow.c             | 23 ++++++++---------------
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 6b850f101..1e6566e0f 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -328,6 +328,33 @@ iproto_type_is_synchro_request(uint32_t type)
 	return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK;
 }
 
+/** CONFIRM/ROLLBACK entries encoded in MsgPack. */
+struct PACKED request_synchro_body {
+	uint8_t m_body;
+	uint8_t k_replica_id;
+	uint8_t m_replica_id;
+	uint32_t v_replica_id;
+	uint8_t k_lsn;
+	uint8_t m_lsn;
+	uint64_t v_lsn;
+};
+
+/**
+ * Creates CONFIRM/ROLLBACK entry.
+ */
+static inline void
+request_synchro_body_create(struct request_synchro_body *body,
+			    uint32_t replica_id, int64_t lsn)
+{
+	body->m_body = 0x80 | 2;
+	body->k_replica_id = IPROTO_REPLICA_ID;
+	body->m_replica_id = 0xce;
+	body->v_replica_id = mp_bswap_u32(replica_id);
+	body->k_lsn = IPROTO_LSN;
+	body->m_lsn = 0xcf;
+	body->v_lsn = mp_bswap_u64(lsn);
+}
+
 /** This is an error. */
 static inline bool
 iproto_type_is_error(uint32_t type)
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 0c797a9d5..c2f4ba40a 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -897,26 +897,19 @@ static int
 xrow_encode_confirm_rollback(struct xrow_header *row, struct region *region,
 			     uint32_t replica_id, int64_t lsn, int type)
 {
-	size_t len = mp_sizeof_map(2) + mp_sizeof_uint(IPROTO_REPLICA_ID) +
-		     mp_sizeof_uint(replica_id) + mp_sizeof_uint(IPROTO_LSN) +
-		     mp_sizeof_uint(lsn);
-	char *buf = (char *)region_alloc(region, len);
-	if (buf == NULL) {
-		diag_set(OutOfMemory, len, "region_alloc", "buf");
+	struct request_synchro_body *body;
+
+	body = region_alloc(region, sizeof(*body));
+	if (body == NULL) {
+		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
 		return -1;
 	}
-	char *pos = buf;
-
-	pos = mp_encode_map(pos, 2);
-	pos = mp_encode_uint(pos, IPROTO_REPLICA_ID);
-	pos = mp_encode_uint(pos, replica_id);
-	pos = mp_encode_uint(pos, IPROTO_LSN);
-	pos = mp_encode_uint(pos, lsn);
+	request_synchro_body_create(body, replica_id, lsn);
 
 	memset(row, 0, sizeof(*row));
 
-	row->body[0].iov_base = buf;
-	row->body[0].iov_len = len;
+	row->body[0].iov_base = (void *)body;
+	row->body[0].iov_len = sizeof(*body);
 	row->bodycnt = 1;
 
 	row->type = type;
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated CONFIRM/ROLLBACK entries
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal Cyrill Gorcunov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Force the caller to allocate memory. Eventually we are
planning to use preallocated memory without calling
region_alloc at all.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/xrow.c | 32 +++++++++++++++++++++-----------
 src/box/xrow.h | 15 +++++++++++++++
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/src/box/xrow.c b/src/box/xrow.c
index c2f4ba40a..bebad2dd0 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -893,17 +893,11 @@ xrow_encode_dml(const struct request *request, struct region *region,
 	return iovcnt;
 }
 
-static int
-xrow_encode_confirm_rollback(struct xrow_header *row, struct region *region,
+int
+xrow_encode_confirm_rollback(struct xrow_header *row,
+			     struct request_synchro_body *body,
 			     uint32_t replica_id, int64_t lsn, int type)
 {
-	struct request_synchro_body *body;
-
-	body = region_alloc(region, sizeof(*body));
-	if (body == NULL) {
-		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
-		return -1;
-	}
 	request_synchro_body_create(body, replica_id, lsn);
 
 	memset(row, 0, sizeof(*row));
@@ -921,7 +915,15 @@ int
 xrow_encode_confirm(struct xrow_header *row, struct region *region,
 		    uint32_t replica_id, int64_t lsn)
 {
-	return xrow_encode_confirm_rollback(row, region, replica_id, lsn,
+	struct request_synchro_body *body;
+
+	body = region_alloc(region, sizeof(*body));
+	if (body == NULL) {
+		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
+		return -1;
+	}
+
+	return xrow_encode_confirm_rollback(row, body, replica_id, lsn,
 					    IPROTO_CONFIRM);
 }
 
@@ -929,7 +931,15 @@ int
 xrow_encode_rollback(struct xrow_header *row, struct region *region,
 		     uint32_t replica_id, int64_t lsn)
 {
-	return xrow_encode_confirm_rollback(row, region, replica_id, lsn,
+	struct request_synchro_body *body;
+
+	body = region_alloc(region, sizeof(*body));
+	if (body == NULL) {
+		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
+		return -1;
+	}
+
+	return xrow_encode_confirm_rollback(row, body, replica_id, lsn,
 					    IPROTO_ROLLBACK);
 }
 
diff --git a/src/box/xrow.h b/src/box/xrow.h
index e21ede5a3..503140d1e 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -54,6 +54,7 @@ enum {
 	IPROTO_SELECT_HEADER_LEN = IPROTO_HEADER_LEN + 7,
 };
 
+struct request_synchro_body;
 struct region;
 
 struct xrow_header {
@@ -215,6 +216,20 @@ int
 xrow_encode_dml(const struct request *request, struct region *region,
 		struct iovec *iov);
 
+
+/**
+ * Encode the CONFIRM or ROLLBACK to row body and set row type.
+ * @param row xrow header.
+ * @param body body to encode.
+ * @param replica_id master's instance id.
+ * @param lsn last confirmed lsn.
+ * @param type IPROTO_CONFIRM or IPROTO_ROLLBACK.
+ */
+int
+xrow_encode_confirm_rollback(struct xrow_header *row,
+			     struct request_synchro_body *body,
+			     uint32_t replica_id, int64_t lsn, int type);
+
 /**
  * Encode the CONFIRM to row body and set row type to
  * IPROTO_CONFIRM.
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
                   ` (4 preceding siblings ...)
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated " Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 7/7] qsync: drop no longer used xrow_encode_confirm, rollback Cyrill Gorcunov
  2020-07-23 22:13 ` [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Vladislav Shpilevoy
  7 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

When we need to write CONFIRM or ROLLBACK message (which is just
a binary record in msgpack format) into a journal we use txn code
to allocate a new transaction, encode there a message and pass it
to walk the long txn path before it hit the journal. This is not
only resource wasting but also somehow strange from arhitectural
point of view.

Instead lets encode a record on the stack and write it
directly to the journal.

Closes #5129

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/txn_limbo.c | 92 +++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 41 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index a74bfe244..3eca244dc 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -32,6 +32,9 @@
 #include "txn_limbo.h"
 #include "replication.h"
 
+#include "iproto_constants.h"
+#include "journal.h"
+
 struct txn_limbo txn_limbo;
 
 static inline void
@@ -237,62 +240,69 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	return 0;
 }
 
+static void
+txn_limbo_write_cb(struct journal_entry *entry)
+{
+	/*
+	 * It is safe to call fiber_wakeup on
+	 * active fiber but we need to call wakeup
+	 * in case if we're sitting in WAL thread.
+	 */
+	assert(entry->complete_data != NULL);
+	fiber_wakeup(entry->complete_data);
+}
+
+/**
+ * Write CONFIRM or ROLLBACK message to a journal directly
+ * without involving transaction engine because using txn
+ * engine is far from being cheap while we only need to
+ * write a small message.
+ */
 static int
-txn_limbo_write_confirm_rollback(struct txn_limbo *limbo, int64_t lsn,
-				 bool is_confirm)
+txn_limbo_write(uint32_t replica_id, int64_t lsn, int type)
 {
+	assert(replica_id != REPLICA_ID_NIL);
+	assert(type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK);
 	assert(lsn > 0);
 
-	struct xrow_header row;
-	struct request request = {
-		.header = &row,
-	};
+	char buf[sizeof(struct journal_entry) +
+		 sizeof(struct xrow_header *) +
+		 sizeof(struct xrow_header)];
 
-	struct txn *txn = txn_begin();
-	if (txn == NULL)
-		return -1;
+	struct journal_entry *entry = (void *)buf;
+	struct xrow_header *row = (void *)&entry->rows[1];
+	entry->rows[0] = row;
 
-	int res = 0;
-	if (is_confirm) {
-		res = xrow_encode_confirm(&row, &txn->region,
-					  limbo->instance_id, lsn);
-	} else {
-		/*
-		 * This LSN is the first to be rolled back, so
-		 * the last "safe" lsn is lsn - 1.
-		 */
-		res = xrow_encode_rollback(&row, &txn->region,
-					   limbo->instance_id, lsn);
+	struct request_synchro_body body;
+	xrow_encode_confirm_rollback(row, &body, replica_id,
+				     lsn, type);
+
+	journal_entry_create(entry, 1, xrow_approx_len(row),
+			     txn_limbo_write_cb, fiber());
+
+	if (journal_write(entry) != 0) {
+		diag_set(ClientError, ER_WAL_IO);
+		diag_log();
+		return -1;
 	}
-	if (res == -1)
-		goto rollback;
-	/*
-	 * This is not really a transaction. It just uses txn API
-	 * to put the data into WAL. And obviously it should not
-	 * go to the limbo and block on the very same sync
-	 * transaction which it tries to confirm now.
-	 */
-	txn_set_flag(txn, TXN_FORCE_ASYNC);
 
-	if (txn_begin_stmt(txn, NULL) != 0)
-		goto rollback;
-	if (txn_commit_stmt(txn, &request) != 0)
-		goto rollback;
+	if (entry->res < 0) {
+		diag_set(ClientError, ER_WAL_IO);
+		diag_log();
+		return -1;
+	}
 
-	return txn_commit(txn);
-rollback:
-	txn_rollback(txn);
-	return -1;
+	return 0;
 }
 
 /**
  * Write a confirmation entry to WAL. After it's written all the
  * transactions waiting for confirmation may be finished.
  */
-static int
+static inline int
 txn_limbo_write_confirm(struct txn_limbo *limbo, int64_t lsn)
 {
-	return txn_limbo_write_confirm_rollback(limbo, lsn, true);
+	return txn_limbo_write(limbo->instance_id, lsn, IPROTO_CONFIRM);
 }
 
 void
@@ -338,10 +348,10 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
  * transactions following the current one and waiting for
  * confirmation must be rolled back.
  */
-static int
+static inline int
 txn_limbo_write_rollback(struct txn_limbo *limbo, int64_t lsn)
 {
-	return txn_limbo_write_confirm_rollback(limbo, lsn, false);
+	return txn_limbo_write(limbo->instance_id, lsn, IPROTO_ROLLBACK);
 }
 
 void
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Tarantool-patches] [PATCH v2 7/7] qsync: drop no longer used xrow_encode_confirm, rollback
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
                   ` (5 preceding siblings ...)
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal Cyrill Gorcunov
@ 2020-07-23 12:29 ` Cyrill Gorcunov
  2020-07-23 22:13 ` [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Vladislav Shpilevoy
  7 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-23 12:29 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Since we no longer allocate these entries via region_alloc
just drop them out as unused.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/xrow.c | 32 --------------------------------
 src/box/xrow.h | 28 ----------------------------
 2 files changed, 60 deletions(-)

diff --git a/src/box/xrow.c b/src/box/xrow.c
index bebad2dd0..4fac170b3 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -911,38 +911,6 @@ xrow_encode_confirm_rollback(struct xrow_header *row,
 	return 0;
 }
 
-int
-xrow_encode_confirm(struct xrow_header *row, struct region *region,
-		    uint32_t replica_id, int64_t lsn)
-{
-	struct request_synchro_body *body;
-
-	body = region_alloc(region, sizeof(*body));
-	if (body == NULL) {
-		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
-		return -1;
-	}
-
-	return xrow_encode_confirm_rollback(row, body, replica_id, lsn,
-					    IPROTO_CONFIRM);
-}
-
-int
-xrow_encode_rollback(struct xrow_header *row, struct region *region,
-		     uint32_t replica_id, int64_t lsn)
-{
-	struct request_synchro_body *body;
-
-	body = region_alloc(region, sizeof(*body));
-	if (body == NULL) {
-		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
-		return -1;
-	}
-
-	return xrow_encode_confirm_rollback(row, body, replica_id, lsn,
-					    IPROTO_ROLLBACK);
-}
-
 static int
 xrow_decode_confirm_rollback(struct xrow_header *row, uint32_t *replica_id,
 			     int64_t *lsn)
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 503140d1e..86a9e3542 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -230,20 +230,6 @@ xrow_encode_confirm_rollback(struct xrow_header *row,
 			     struct request_synchro_body *body,
 			     uint32_t replica_id, int64_t lsn, int type);
 
-/**
- * Encode the CONFIRM to row body and set row type to
- * IPROTO_CONFIRM.
- * @param row xrow header.
- * @param region Region to use to encode the confirmation body.
- * @param replica_id master's instance id.
- * @param lsn last confirmed lsn.
- * @retval -1 on error.
- * @retval 0 success.
- */
-int
-xrow_encode_confirm(struct xrow_header *row, struct region *region,
-		    uint32_t replica_id, int64_t lsn);
-
 /**
  * Decode the CONFIRM request body.
  * @param row xrow header.
@@ -255,20 +241,6 @@ xrow_encode_confirm(struct xrow_header *row, struct region *region,
 int
 xrow_decode_confirm(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn);
 
-/**
- * Encode the ROLLBACK row body and set row type to
- * IPROTO_ROLLBACK.
- * @param row xrow header.
- * @param region Region to use to encode the rollback body.
- * @param replica_id master's instance id.
- * @param lsn lsn to rollback from, including it.
- * @retval -1  on error.
- * @retval 0 success.
- */
-int
-xrow_encode_rollback(struct xrow_header *row, struct region *region,
-		     uint32_t replica_id, int64_t lsn);
-
 /**
  * Decode the ROLLBACK row body.
  * @param row xrow header.
-- 
2.26.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry Cyrill Gorcunov
@ 2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-24 17:48     ` Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

See 5 comments below.

On 23.07.2020 14:29, Cyrill Gorcunov wrote:
> In commit 77ba0e3504464131fe81c672d508d0275be2173a we've redesigned
> wal journal operations such that asynchronous write completion
> is a signle instance per journal.

1. signle -> single

> It turned out that such simplification is too tight and doesn't
> allow us to pass entries into the journal with custom completions.
> 
> Thus lets allow back such ability. We will need it to be able
> to write "confirm" records into wal directly without touching
> trasactions code at all.

2. trasactions -> transactions

3. Please, add 'Part of #5129' to the commits related to the issue.

> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 83eef5d98..7d61f2ed2 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -2628,7 +2630,7 @@ box_cfg_xc(void)
>  	}
>  
>  	struct journal bootstrap_journal;
> -	journal_create(&bootstrap_journal, NULL, txn_complete_async,
> +	journal_create(&bootstrap_journal, bootstrap_journal_write,

4. Please, leave the sync write() NULL. I did it intentionally so as
it would early fail if bootstrap will call blocking write anywhere.

>  		       bootstrap_journal_write);
>  	journal_set(&bootstrap_journal);
>  	auto bootstrap_journal_guard = make_scoped_guard([] {
> diff --git a/src/box/journal.c b/src/box/journal.c
> index f1e89aaa2..fb81acb39 100644
> --- a/src/box/journal.c
> +++ b/src/box/journal.c
> @@ -36,6 +36,7 @@ struct journal *current_journal = NULL;
>  
>  struct journal_entry *
>  journal_entry_new(size_t n_rows, struct region *region,
> +		  void (*write_async_cb)(struct journal_entry *entry),

5.I think it would be better to move the callback definition to
a typedef. Otherwise it looks too bulky and too many places to change,
if will need to change the signature.

>  		  void *complete_data)
>  {
>  	struct journal_entry *entry;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper Cyrill Gorcunov
@ 2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-24 17:50     ` Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

> diff --git a/src/box/journal.h b/src/box/journal.h
> index 74a5eb050..fc33fd6f0 100644
> --- a/src/box/journal.h
> +++ b/src/box/journal.h
> @@ -80,6 +80,27 @@ struct journal_entry {
>  
>  struct region;
>  
> +/**
> + * Initialize a new journal entry.
> + */
> +static inline void
> +journal_entry_create(struct journal_entry *entry, size_t n_rows,
> +		     size_t approx_len,
> +		     void (*write_async_cb)(struct journal_entry *entry),
> +		     void *complete_data)
> +{
> +	/*
> +	 * fifo member is left untouched because
> +	 * it is used by the journal engine internally,
> +	 * no need to waste time here.

fifo -> Fifo.

Also you can use up to 80 symbols for comments now. No need to keep them
tight. Here and in all other new places.

> +	 */
> +	entry->write_async_cb	= write_async_cb;
> +	entry->complete_data	= complete_data;
> +	entry->approx_len	= approx_len;
> +	entry->n_rows		= n_rows;
> +	entry->res		= -1;
> +}
> +
>  /**
>   * Create a new journal entry.
>   *
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries Cyrill Gorcunov
@ 2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-24 18:07     ` Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

See 2 commits below.

On 23.07.2020 14:29, Cyrill Gorcunov wrote:
> These msgpack entries will be needed to write them
> down to a journal without involving txn engine.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  src/box/iproto_constants.h | 27 +++++++++++++++++++++++++++
>  src/box/xrow.c             | 23 ++++++++---------------
>  2 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> index 6b850f101..1e6566e0f 100644
> --- a/src/box/iproto_constants.h
> +++ b/src/box/iproto_constants.h
> @@ -328,6 +328,33 @@ iproto_type_is_synchro_request(uint32_t type)
>  	return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK;
>  }
>  
> +/** CONFIRM/ROLLBACK entries encoded in MsgPack. */
> +struct PACKED request_synchro_body {

1. Please, call it _bin, not _body. Look at the examples iproto_header_bin,
iproto_body_bin. _body_bin also would be fine. But should end with _bin
anyway.

> +	uint8_t m_body;
> +	uint8_t k_replica_id;
> +	uint8_t m_replica_id;
> +	uint32_t v_replica_id;
> +	uint8_t k_lsn;
> +	uint8_t m_lsn;
> +	uint64_t v_lsn;
> +};
> +
> +/**
> + * Creates CONFIRM/ROLLBACK entry.
> + */
> +static inline void
> +request_synchro_body_create(struct request_synchro_body *body,
> +			    uint32_t replica_id, int64_t lsn)
> +{
> +	body->m_body = 0x80 | 2;
> +	body->k_replica_id = IPROTO_REPLICA_ID;
> +	body->m_replica_id = 0xce;
> +	body->v_replica_id = mp_bswap_u32(replica_id);
> +	body->k_lsn = IPROTO_LSN;
> +	body->m_lsn = 0xcf;
> +	body->v_lsn = mp_bswap_u64(lsn);
> +}
> +
>  /** This is an error. */
>  static inline bool
>  iproto_type_is_error(uint32_t type)
> diff --git a/src/box/xrow.c b/src/box/xrow.c
> index 0c797a9d5..c2f4ba40a 100644
> --- a/src/box/xrow.c
> +++ b/src/box/xrow.c
> @@ -897,26 +897,19 @@ static int
>  xrow_encode_confirm_rollback(struct xrow_header *row, struct region *region,
>  			     uint32_t replica_id, int64_t lsn, int type)
>  {
> -	size_t len = mp_sizeof_map(2) + mp_sizeof_uint(IPROTO_REPLICA_ID) +
> -		     mp_sizeof_uint(replica_id) + mp_sizeof_uint(IPROTO_LSN) +
> -		     mp_sizeof_uint(lsn);
> -	char *buf = (char *)region_alloc(region, len);
> -	if (buf == NULL) {
> -		diag_set(OutOfMemory, len, "region_alloc", "buf");
> +	struct request_synchro_body *body;
> +
> +	body = region_alloc(region, sizeof(*body));

2. I see you omit explicit type casts. And use (void *) in the next commits,
when need to cast between 2 non-void types. Both ways are not used in Tarantool.
At least they are not part of the code style. Personally I am fine if we will
adopt your way. It is not too conflicting with the old code (don't need to
change the old code then), is more compact. But you need to raise that question
with other teammates. So as we could write down that way in the doc, if everyone
agrees. And force it in the newer commits.

> +	if (body == NULL) {
> +		diag_set(OutOfMemory, sizeof(*body), "region_alloc", "body");
>  		return -1;
>  	}
> -	char *pos = buf;
> -
> -	pos = mp_encode_map(pos, 2);
> -	pos = mp_encode_uint(pos, IPROTO_REPLICA_ID);
> -	pos = mp_encode_uint(pos, replica_id);
> -	pos = mp_encode_uint(pos, IPROTO_LSN);
> -	pos = mp_encode_uint(pos, lsn);
> +	request_synchro_body_create(body, replica_id, lsn);
>  
>  	memset(row, 0, sizeof(*row));
>  
> -	row->body[0].iov_base = buf;
> -	row->body[0].iov_len = len;
> +	row->body[0].iov_base = (void *)body;
> +	row->body[0].iov_len = sizeof(*body);
>  	row->bodycnt = 1;
>  
>  	row->type = type;
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated CONFIRM/ROLLBACK entries
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated " Cyrill Gorcunov
@ 2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-24 18:08     ` Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

> diff --git a/src/box/xrow.h b/src/box/xrow.h
> index e21ede5a3..503140d1e 100644
> --- a/src/box/xrow.h
> +++ b/src/box/xrow.h
> @@ -215,6 +216,20 @@ int
>  xrow_encode_dml(const struct request *request, struct region *region,
>  		struct iovec *iov);
>  
> +
> +/**
> + * Encode the CONFIRM or ROLLBACK to row body and set row type.
> + * @param row xrow header.
> + * @param body body to encode.
> + * @param replica_id master's instance id.
> + * @param lsn last confirmed lsn.
> + * @param type IPROTO_CONFIRM or IPROTO_ROLLBACK.

1. Lets start sentences from capital letters.

> + */
> +int
> +xrow_encode_confirm_rollback(struct xrow_header *row,

2. I would also rename xrow_encode_confirm_rollback -> xrow_encode_synchro.
While we are here. I don't really like the current name. Too long, and
a bit confusing. When I saw it first time, I thought it encodes confirmation
of a rollback.

> +			     struct request_synchro_body *body,
> +			     uint32_t replica_id, int64_t lsn, int type);
> +
>  /**
>   * Encode the CONFIRM to row body and set row type to
>   * IPROTO_CONFIRM.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal Cyrill Gorcunov
@ 2020-07-23 22:10   ` Vladislav Shpilevoy
  2020-07-24 18:16     ` Cyrill Gorcunov
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

See 5 comments below.

On 23.07.2020 14:29, Cyrill Gorcunov wrote:
> When we need to write CONFIRM or ROLLBACK message (which is just
> a binary record in msgpack format) into a journal we use txn code
> to allocate a new transaction, encode there a message and pass it
> to walk the long txn path before it hit the journal. This is not
> only resource wasting but also somehow strange from arhitectural

1. arhitectural -> architectural.

> point of view.
> 
> Instead lets encode a record on the stack and write it
> directly to the journal.
> 
> Closes #5129
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  src/box/txn_limbo.c | 92 +++++++++++++++++++++++++--------------------
>  1 file changed, 51 insertions(+), 41 deletions(-)
> 
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index a74bfe244..3eca244dc 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -32,6 +32,9 @@
>  #include "txn_limbo.h"
>  #include "replication.h"
>  
> +#include "iproto_constants.h"
> +#include "journal.h"
> +
>  struct txn_limbo txn_limbo;
>  
>  static inline void
> @@ -237,62 +240,69 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  	return 0;
>  }
>  
> +static void
> +txn_limbo_write_cb(struct journal_entry *entry)
> +{
> +	/*
> +	 * It is safe to call fiber_wakeup on
> +	 * active fiber but we need to call wakeup
> +	 * in case if we're sitting in WAL thread.

2. Actually it is not safe to call wakeup on the active fiber. It
leads to something being broken in the scheduler. This is why we
have so many checks like

	if (txn->fiber != fiber()) fiber_wakeup(txn->fiber);

But yeah, in this case this is fine - the fiber is sleeping for
sure.

> +	 */
> +	assert(entry->complete_data != NULL);
> +	fiber_wakeup(entry->complete_data);
> +}
> +
> +/**
> + * Write CONFIRM or ROLLBACK message to a journal directly
> + * without involving transaction engine because using txn
> + * engine is far from being cheap while we only need to
> + * write a small message.
> + */
>  static int
> -txn_limbo_write_confirm_rollback(struct txn_limbo *limbo, int64_t lsn,
> -				 bool is_confirm)
> +txn_limbo_write(uint32_t replica_id, int64_t lsn, int type)
>  {
> +	assert(replica_id != REPLICA_ID_NIL);
> +	assert(type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK);
>  	assert(lsn > 0);
>  
> -	struct xrow_header row;
> -	struct request request = {
> -		.header = &row,
> -	};
> +	char buf[sizeof(struct journal_entry) +
> +		 sizeof(struct xrow_header *) +
> +		 sizeof(struct xrow_header)];

3. The last addition is an overkill a bit. Why can't
you declare {struct xrow_header row;} on stack separately? I
understand

	sizeof(struct journal_entry) + sizeof(struct xrow_header *)

- they should be one object. But the third addition seems not
necessarily to be done in the same buffer variable.


4. I think all these changes are going to interfere with
https://github.com/tarantool/tarantool/issues/5151.

What are you going to do to request_synchro_body to implement 5151? Will they be
two separate structs? Or will the struct synchro_request contain the body? - this
would lead to data duplication though, because the point of synchro_request is to
provide the synchro fields not encoded nor bswaped.

Would be good to think through how will these changes live with 5151.

> -	struct txn *txn = txn_begin();
> -	if (txn == NULL)
> -		return -1;
> +	struct journal_entry *entry = (void *)buf;
> +	struct xrow_header *row = (void *)&entry->rows[1];
> +	entry->rows[0] = row;
>  
> -	int res = 0;
> -	if (is_confirm) {
> -		res = xrow_encode_confirm(&row, &txn->region,
> -					  limbo->instance_id, lsn);
> -	} else {
> -		/*
> -		 * This LSN is the first to be rolled back, so
> -		 * the last "safe" lsn is lsn - 1.
> -		 */
> -		res = xrow_encode_rollback(&row, &txn->region,
> -					   limbo->instance_id, lsn);
> +	struct request_synchro_body body;

5. It looks like you could make things simpler, if you would create a
struct containing the journal_entry, body, *xrow_header[1], and
xrow_header. Would simplify things IMO. You wouldn't need the stack
manipulations like you did above. 

> +	xrow_encode_confirm_rollback(row, &body, replica_id,
> +				     lsn, type);
> +
> +	journal_entry_create(entry, 1, xrow_approx_len(row),
> +			     txn_limbo_write_cb, fiber());
> +
> +	if (journal_write(entry) != 0) {
> +		diag_set(ClientError, ER_WAL_IO);
> +		diag_log();
> +		return -1;
>  	}
> -	if (res == -1)
> -		goto rollback;
> -	/*
> -	 * This is not really a transaction. It just uses txn API
> -	 * to put the data into WAL. And obviously it should not
> -	 * go to the limbo and block on the very same sync
> -	 * transaction which it tries to confirm now.
> -	 */
> -	txn_set_flag(txn, TXN_FORCE_ASYNC);
>  
> -	if (txn_begin_stmt(txn, NULL) != 0)
> -		goto rollback;
> -	if (txn_commit_stmt(txn, &request) != 0)
> -		goto rollback;
> +	if (entry->res < 0) {
> +		diag_set(ClientError, ER_WAL_IO);
> +		diag_log();
> +		return -1;
> +	}
>  
> -	return txn_commit(txn);
> -rollback:
> -	txn_rollback(txn);
> -	return -1;
> +	return 0;
>  }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine
  2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
                   ` (6 preceding siblings ...)
  2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 7/7] qsync: drop no longer used xrow_encode_confirm, rollback Cyrill Gorcunov
@ 2020-07-23 22:13 ` Vladislav Shpilevoy
  2020-07-24 18:16   ` Cyrill Gorcunov
  7 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 22:13 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

It is worth fixing txn.c in the same patchset.

txn_add_redo(), shouldn't work with synchro requests anymore.

Probably some other things can be simplified too. You need to
look at txn.c history to see what else was added here related to
writing synchro requests, and remove it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry
  2020-07-23 22:10   ` Vladislav Shpilevoy
@ 2020-07-24 17:48     ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 17:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:10:43AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 5 comments below.
> 

Thanks! Addressed in a new series (I will send it out soon, hopefully)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper
  2020-07-23 22:10   ` Vladislav Shpilevoy
@ 2020-07-24 17:50     ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 17:50 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:10:47AM +0200, Vladislav Shpilevoy wrote:
> >  
> > +/**
> > + * Initialize a new journal entry.
> > + */
> > +static inline void
> > +journal_entry_create(struct journal_entry *entry, size_t n_rows,
> > +		     size_t approx_len,
> > +		     void (*write_async_cb)(struct journal_entry *entry),
> > +		     void *complete_data)
> > +{
> > +	/*
> > +	 * fifo member is left untouched because
> > +	 * it is used by the journal engine internally,
> > +	 * no need to waste time here.
> 
> fifo -> Fifo.
> 
> Also you can use up to 80 symbols for comments now. No need to keep them
> tight. Here and in all other new places.

The fifo is a member of structure. Actually I just ripped off the comment
simply because we already have a number of _create helpers which implies
to initialize only the member which has to be initialized. If something
is not touched it means that the member is left for internal use and
can be left unitialized.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries
  2020-07-23 22:10   ` Vladislav Shpilevoy
@ 2020-07-24 18:07     ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 18:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:10:52AM +0200, Vladislav Shpilevoy wrote:
> >  
> > +/** CONFIRM/ROLLBACK entries encoded in MsgPack. */
> > +struct PACKED request_synchro_body {
> 
> 1. Please, call it _bin, not _body. Look at the examples iproto_header_bin,
> iproto_body_bin. _body_bin also would be fine. But should end with _bin
> anyway.

Actually in thi file we already have

struct PACKED request_replace_body {
	uint8_t m_body;
	uint8_t k_space_id;
	uint8_t m_space_id;
	uint32_t v_space_id;
	uint8_t k_tuple;
};

that's why I named it this way. But sure renamed to
have _bin postfix.

> > @@ -897,26 +897,19 @@ static int
> >  xrow_encode_confirm_rollback(struct xrow_header *row, struct region *region,
> >  			     uint32_t replica_id, int64_t lsn, int type)
> >  {
> > -	size_t len = mp_sizeof_map(2) + mp_sizeof_uint(IPROTO_REPLICA_ID) +
> > -		     mp_sizeof_uint(replica_id) + mp_sizeof_uint(IPROTO_LSN) +
> > -		     mp_sizeof_uint(lsn);
> > -	char *buf = (char *)region_alloc(region, len);
> > -	if (buf == NULL) {
> > -		diag_set(OutOfMemory, len, "region_alloc", "buf");
> > +	struct request_synchro_body *body;
> > +
> > +	body = region_alloc(region, sizeof(*body));
> 
> 2. I see you omit explicit type casts. And use (void *) in the next commits,
> when need to cast between 2 non-void types. Both ways are not used in Tarantool.
> At least they are not part of the code style. Personally I am fine if we will
> adopt your way. It is not too conflicting with the old code (don't need to
> change the old code then), is more compact. But you need to raise that question
> with other teammates. So as we could write down that way in the doc, if everyone
> agrees. And force it in the newer commits.

I tried to ask, which end up in a flame war. I gave up. I think the
region_alloc is the same as malloc and etc and we already have a
number of places where we use implicit casting (grep region_alloc).
So I think we can use both approaches, while I prefer a short from
which is allowed by C standart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated CONFIRM/ROLLBACK entries
  2020-07-23 22:10   ` Vladislav Shpilevoy
@ 2020-07-24 18:08     ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 18:08 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:10:57AM +0200, Vladislav Shpilevoy wrote:
> > +
> > +/**
> > + * Encode the CONFIRM or ROLLBACK to row body and set row type.
> > + * @param row xrow header.
> > + * @param body body to encode.
> > + * @param replica_id master's instance id.
> > + * @param lsn last confirmed lsn.
> > + * @param type IPROTO_CONFIRM or IPROTO_ROLLBACK.
> 
> 1. Lets start sentences from capital letters.

Another code in this file doesn't use such convention but fine, will do.

> 2. I would also rename xrow_encode_confirm_rollback -> xrow_encode_synchro.
> While we are here. I don't really like the current name. Too long, and
> a bit confusing. When I saw it first time, I thought it encodes confirmation
> of a rollback.

Done.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal
  2020-07-23 22:10   ` Vladislav Shpilevoy
@ 2020-07-24 18:16     ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 18:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:10:58AM +0200, Vladislav Shpilevoy wrote:
> 
> 1. arhitectural -> architectural.

Thanks!

> > +static void
> > +txn_limbo_write_cb(struct journal_entry *entry)
> > +{
> > +	/*
> > +	 * It is safe to call fiber_wakeup on
> > +	 * active fiber but we need to call wakeup
> > +	 * in case if we're sitting in WAL thread.
> 
> 2. Actually it is not safe to call wakeup on the active fiber. It
> leads to something being broken in the scheduler. This is why we
> have so many checks like
> 
> 	if (txn->fiber != fiber()) fiber_wakeup(txn->fiber);
> 
> But yeah, in this case this is fine - the fiber is sleeping for
> sure.

As we've been taking f2f it is safe.

> > +txn_limbo_write(uint32_t replica_id, int64_t lsn, int type)
> >  {
> > +	assert(replica_id != REPLICA_ID_NIL);
> > +	assert(type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK);
> >  	assert(lsn > 0);
> >  
> > -	struct xrow_header row;
> > -	struct request request = {
> > -		.header = &row,
> > -	};
> > +	char buf[sizeof(struct journal_entry) +
> > +		 sizeof(struct xrow_header *) +
> > +		 sizeof(struct xrow_header)];
> 
> 3. The last addition is an overkill a bit. Why can't
> you declare {struct xrow_header row;} on stack separately? I
> understand
> 
> 	sizeof(struct journal_entry) + sizeof(struct xrow_header *)
> 
> - they should be one object. But the third addition seems not
> necessarily to be done in the same buffer variable.

In new series (which I've not sent yet) I've done the folloing

static int
txn_limbo_write(uint32_t replica_id, int64_t lsn, int type)
{
	assert(replica_id != REPLICA_ID_NIL);
	assert(type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK);
	assert(lsn > 0);

	/*
	 * When allocated statically some compilers (such as
	 * clang + asan) requires the journal_entry::rows to
	 * be last in a container structure. So it it simplier
	 * just to create a cummulative buffer.
	 */
	char buf[sizeof(struct journal_entry) +
		 sizeof(struct xrow_header *)];

	struct synchro_body_bin body_bin;
	struct xrow_header row;

	struct journal_entry *entry = (struct journal_entry *)buf;
	entry->rows[0] = &row;

	xrow_encode_synchro(&row, &body_bin, replica_id, lsn, type);

	journal_entry_create(entry, 1, xrow_approx_len(&row),
			     txn_limbo_write_cb, fiber());

...
> 
> 
> 4. I think all these changes are going to interfere with
> https://github.com/tarantool/tarantool/issues/5151.
> 
> What are you going to do to request_synchro_body to implement 5151? Will they be
> two separate structs? Or will the struct synchro_request contain the body? - this
> would lead to data duplication though, because the point of synchro_request is to
> provide the synchro fields not encoded nor bswaped.
> 
> Would be good to think through how will these changes live with 5151.


I though of declaring new structure like

struct synchro_request {
	uint32_t replica_id;
	int64_t lsn;
}

the convention to binary form has to be done anyway but when
we have _bin structure we can encode it fastly and use stack
instead of some dynamically allocated memory with mp_() helper
used to encode entries (current mp_ helpers use too much if's
inernally and we catually should really consider performance
loss due to them, I think not compressing msgpack would gain
significant speedup and keep in mind that later we encode data
second time when pushing down to journal binary records).

Back to synchro_request -- instead of opencoded replica_id
and lsn pair we could switch to synchro_request structure
in arguments. This will address 5151. I hope to implement
it at Monday and show you te result.
> 
> 5. It looks like you could make things simpler, if you would create a
> struct containing the journal_entry, body, *xrow_header[1], and
> xrow_header. Would simplify things IMO. You wouldn't need the stack
> manipulations like you did above. 

Above I've posted which new way I use.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine
  2020-07-23 22:13 ` [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Vladislav Shpilevoy
@ 2020-07-24 18:16   ` Cyrill Gorcunov
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov @ 2020-07-24 18:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 24, 2020 at 12:13:58AM +0200, Vladislav Shpilevoy wrote:
> It is worth fixing txn.c in the same patchset.
> 
> txn_add_redo(), shouldn't work with synchro requests anymore.
> 
> Probably some other things can be simplified too. You need to
> look at txn.c history to see what else was added here related to
> writing synchro requests, and remove it.

I'll take a look, thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-07-24 18:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-23 12:29 [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 1/7] journal: drop redundant declaration Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 2/7] wal: bind asynchronous write completion to an entry Cyrill Gorcunov
2020-07-23 22:10   ` Vladislav Shpilevoy
2020-07-24 17:48     ` Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 3/7] journal: add journal_entry_create helper Cyrill Gorcunov
2020-07-23 22:10   ` Vladislav Shpilevoy
2020-07-24 17:50     ` Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 4/7] qsync: provide a binary form of CONFIRM/ROLLBACK entries Cyrill Gorcunov
2020-07-23 22:10   ` Vladislav Shpilevoy
2020-07-24 18:07     ` Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 5/7] qsync: provide a way to encode preallocated " Cyrill Gorcunov
2020-07-23 22:10   ` Vladislav Shpilevoy
2020-07-24 18:08     ` Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 6/7] qsync: implement direct write of CONFIRM/ROLLBACK into a journal Cyrill Gorcunov
2020-07-23 22:10   ` Vladislav Shpilevoy
2020-07-24 18:16     ` Cyrill Gorcunov
2020-07-23 12:29 ` [Tarantool-patches] [PATCH v2 7/7] qsync: drop no longer used xrow_encode_confirm, rollback Cyrill Gorcunov
2020-07-23 22:13 ` [Tarantool-patches] [PATCH v2 0/5] qsync: write CONFIRM/ROLLBACK without txn engine Vladislav Shpilevoy
2020-07-24 18:16   ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox