Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
@ 2020-07-15 13:55 Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 01/13] Update license file (2020) Aleksandr Lyapunov
                   ` (17 more replies)
  0 siblings, 18 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Changes in v3:
  - Fixes after code review
  - Lots of comments added
  - Code cleanup
  - A couple of bugs fixed

Aleksandr Lyapunov (13):
  Update license file (2020)
  Check data_offset overflow in struct tuple
  vinyl: rename tx_manager -> vy_tx_manager
  txm: introduce dirty tuples
  txm: save txn in txn_stmt
  txm: add TX status
  txm: save does_require_old_tuple flag in txn_stmt
  txm: introduce tx manager
  tmx: introduce prepare sequence number
  tmx: introduce conflict tracker
  txm: introduce txm_story
  txm: clarify all fetched tuples
  tmx: use new tx manager in memtx

 LICENSE                               |    2 +-
 src/box/errcode.h                     |    1 +
 src/box/lua/load_cfg.lua              |    2 +
 src/box/memtx_bitset.c                |   30 +-
 src/box/memtx_engine.c                |   60 +-
 src/box/memtx_hash.c                  |   79 ++-
 src/box/memtx_rtree.c                 |   30 +-
 src/box/memtx_space.c                 |   45 +-
 src/box/memtx_tree.c                  |  119 +++-
 src/box/space.c                       |    2 +
 src/box/space.h                       |    4 +
 src/box/tuple.c                       |   12 +-
 src/box/tuple.h                       |   12 +-
 src/box/tuple_format.c                |    4 +-
 src/box/txn.c                         | 1186 +++++++++++++++++++++++++++++++++
 src/box/txn.h                         |  355 ++++++++++
 src/box/vinyl.c                       |   44 +-
 src/box/vy_scheduler.h                |    2 +-
 src/box/vy_stmt.c                     |    9 +
 src/box/vy_tx.c                       |   51 +-
 src/box/vy_tx.h                       |   33 +-
 src/main.cc                           |    5 +
 test/app-tap/init_script.result       |    1 +
 test/box/admin.result                 |    2 +
 test/box/cfg.result                   |    4 +
 test/box/error.result                 |    1 +
 test/box/huge_field_map.result        |   49 ++
 test/box/huge_field_map.test.lua      |   22 +
 test/box/huge_field_map_long.result   |   51 ++
 test/box/huge_field_map_long.test.lua |   28 +
 test/box/suite.ini                    |    1 +
 31 files changed, 2123 insertions(+), 123 deletions(-)
 create mode 100644 test/box/huge_field_map.result
 create mode 100644 test/box/huge_field_map.test.lua
 create mode 100644 test/box/huge_field_map_long.result
 create mode 100644 test/box/huge_field_map_long.test.lua

-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 01/13] Update license file (2020)
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple Aleksandr Lyapunov
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

---
 LICENSE | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/LICENSE b/LICENSE
index d797c05..734ba6c 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,4 +1,4 @@
-Copyright 2010-2016 Tarantool AUTHORS: please see AUTHORS file.
+Copyright 2010-2020 Tarantool AUTHORS: please see AUTHORS file.
 
 Redistribution and use in source and binary forms, with or
 without modification, are permitted provided that the following
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 01/13] Update license file (2020) Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-16 14:27   ` Nikita Pettik
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager Aleksandr Lyapunov
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

data_offset member of tuple is uint16_t now. At the same time
this field is calculated from field_map_size which is uint32_t.
That could lead to overflows and crashes.

Fixes #5084
---
 src/box/errcode.h                     |  1 +
 src/box/memtx_engine.c                | 19 ++++++++-----
 src/box/tuple.c                       | 11 ++++++--
 src/box/vy_stmt.c                     |  8 ++++++
 test/box/error.result                 |  1 +
 test/box/huge_field_map.result        | 49 +++++++++++++++++++++++++++++++++
 test/box/huge_field_map.test.lua      | 22 +++++++++++++++
 test/box/huge_field_map_long.result   | 51 +++++++++++++++++++++++++++++++++++
 test/box/huge_field_map_long.test.lua | 28 +++++++++++++++++++
 test/box/suite.ini                    |  1 +
 10 files changed, 183 insertions(+), 8 deletions(-)
 create mode 100644 test/box/huge_field_map.result
 create mode 100644 test/box/huge_field_map.test.lua
 create mode 100644 test/box/huge_field_map_long.result
 create mode 100644 test/box/huge_field_map_long.test.lua

diff --git a/src/box/errcode.h b/src/box/errcode.h
index ea521aa..3c21375 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -270,6 +270,7 @@ struct errcode_record {
 	/*215 */_(ER_SYNC_MASTER_MISMATCH,	"CONFIRM message arrived for an unknown master id %d, expected %d") \
         /*216 */_(ER_SYNC_QUORUM_TIMEOUT,       "Quorum collection for a synchronous transaction is timed out") \
         /*217 */_(ER_SYNC_ROLLBACK,             "A rollback for a synchronous transaction is received") \
+	/*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG,	"Can't create tuple: metadata size %u is too big") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 6ce8cac..b5b6b14 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1125,6 +1125,18 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end)
 	if (tuple_field_map_create(format, data, true, &builder) != 0)
 		goto end;
 	uint32_t field_map_size = field_map_build_size(&builder);
+	/*
+	 * Data offset is calculated from the begin of the struct
+	 * tuple base, not from memtx_tuple, because the struct
+	 * tuple is not the first field of the memtx_tuple.
+	 */
+	uint32_t data_offset = sizeof(struct tuple) + field_map_size;
+	if (data_offset > UINT16_MAX) {
+		/** tuple->data_offset is 16 bits */
+		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
+			 data_offset);
+		goto end;
+	}
 
 	size_t tuple_len = end - data;
 	size_t total = sizeof(struct memtx_tuple) + field_map_size + tuple_len;
@@ -1157,12 +1169,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end)
 	tuple->bsize = tuple_len;
 	tuple->format_id = tuple_format_id(format);
 	tuple_format_ref(format);
-	/*
-	 * Data offset is calculated from the begin of the struct
-	 * tuple base, not from memtx_tuple, because the struct
-	 * tuple is not the first field of the memtx_tuple.
-	 */
-	tuple->data_offset = sizeof(struct tuple) + field_map_size;
+	tuple->data_offset = data_offset;
 	char *raw = (char *) tuple + tuple->data_offset;
 	field_map_build(&builder, raw - field_map_size);
 	memcpy(raw, data, tuple_len);
diff --git a/src/box/tuple.c b/src/box/tuple.c
index 1f52a8c..e48ee08 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -83,6 +83,13 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end
 	if (tuple_field_map_create(format, data, true, &builder) != 0)
 		goto end;
 	uint32_t field_map_size = field_map_build_size(&builder);
+	uint32_t data_offset = sizeof(struct tuple) + field_map_size;
+	if (data_offset > UINT16_MAX) {
+		/** tuple->data_offset is 16 bits */
+		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
+			 data_offset);
+		goto end;
+	}
 
 	size_t data_len = end - data;
 	size_t total = sizeof(struct tuple) + field_map_size + data_len;
@@ -97,8 +104,8 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end
 	tuple->bsize = data_len;
 	tuple->format_id = tuple_format_id(format);
 	tuple_format_ref(format);
-	tuple->data_offset = sizeof(struct tuple) + field_map_size;
-	char *raw = (char *) tuple + tuple->data_offset;
+	tuple->data_offset = data_offset;
+	char *raw = (char *) tuple + data_offset;
 	field_map_build(&builder, raw - field_map_size);
 	memcpy(raw, data, data_len);
 	say_debug("%s(%zu) = %p", __func__, data_len, tuple);
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index 392f3da..f59c418 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -159,6 +159,14 @@ static struct tuple *
 vy_stmt_alloc(struct tuple_format *format, uint32_t data_offset, uint32_t bsize)
 {
 	assert(data_offset >= sizeof(struct vy_stmt) + format->field_map_size);
+
+	if (data_offset > UINT16_MAX) {
+		/** tuple->data_offset is 16 bits */
+		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
+			 data_offset);
+		return NULL;
+	}
+
 	struct vy_stmt_env *env = format->engine;
 	uint32_t total_size = data_offset + bsize;
 	if (unlikely(total_size > env->max_tuple_size)) {
diff --git a/test/box/error.result b/test/box/error.result
index 8241ec1..cdecdb2 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -436,6 +436,7 @@ t;
  |   215: box.error.SYNC_MASTER_MISMATCH
  |   216: box.error.SYNC_QUORUM_TIMEOUT
  |   217: box.error.SYNC_ROLLBACK
+ |   218: box.error.TUPLE_METADATA_IS_TOO_BIG
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/box/huge_field_map.result b/test/box/huge_field_map.result
new file mode 100644
index 0000000..11b4da3
--- /dev/null
+++ b/test/box/huge_field_map.result
@@ -0,0 +1,49 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+
+s = box.schema.space.create('test', {engine = 'memtx'})
+ | ---
+ | ...
+i1 = s:create_index('pk')
+ | ---
+ | ...
+i2 = s:create_index('mk', {parts={{'[2][*]', 'uint'}}})
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function test()
+    local t = {1, {}}
+    for i = 1,65536 do
+        table.insert(t[2], i)
+        if (i % 4096 == 0) then
+            s:replace(t)
+        end
+    end
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+pcall(test) -- must fail but not crash
+ | ---
+ | - false
+ | - 'Can''t create tuple: metadata size 65558 is too big'
+ | ...
+
+test = nil
+ | ---
+ | ...
+s:drop()
+ | ---
+ | ...
diff --git a/test/box/huge_field_map.test.lua b/test/box/huge_field_map.test.lua
new file mode 100644
index 0000000..9042751
--- /dev/null
+++ b/test/box/huge_field_map.test.lua
@@ -0,0 +1,22 @@
+env = require('test_run')
+test_run = env.new()
+
+s = box.schema.space.create('test', {engine = 'memtx'})
+i1 = s:create_index('pk')
+i2 = s:create_index('mk', {parts={{'[2][*]', 'uint'}}})
+test_run:cmd("setopt delimiter ';'")
+function test()
+    local t = {1, {}}
+    for i = 1,65536 do
+        table.insert(t[2], i)
+        if (i % 4096 == 0) then
+            s:replace(t)
+        end
+    end
+end;
+test_run:cmd("setopt delimiter ''");
+
+pcall(test) -- must fail but not crash
+
+test = nil
+s:drop()
\ No newline at end of file
diff --git a/test/box/huge_field_map_long.result b/test/box/huge_field_map_long.result
new file mode 100644
index 0000000..d7971ae
--- /dev/null
+++ b/test/box/huge_field_map_long.result
@@ -0,0 +1,51 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+
+s = box.schema.space.create('test', {engine = 'memtx'})
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function test()
+    local t = {}
+    local k = {}
+    for i = 1,128 do
+        local parts = {}
+        for j = 0,127 do
+            table.insert(parts, {i * 128 - j, 'uint'})
+            table.insert(t, 1)
+        end
+        if i == 1 then k = table.deepcopy(t) end
+        s:create_index('test'..i, {parts = parts})
+        if i % 16 == 0 then
+            s:replace(t)
+            s:delete(k)
+        end
+    end
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+pcall(test) -- must fail but not crash
+ | ---
+ | - false
+ | - 'Can''t create tuple: metadata size 65542 is too big'
+ | ...
+
+test = nil
+ | ---
+ | ...
+s:drop()
+ | ---
+ | ...
diff --git a/test/box/huge_field_map_long.test.lua b/test/box/huge_field_map_long.test.lua
new file mode 100644
index 0000000..6415615
--- /dev/null
+++ b/test/box/huge_field_map_long.test.lua
@@ -0,0 +1,28 @@
+env = require('test_run')
+test_run = env.new()
+
+s = box.schema.space.create('test', {engine = 'memtx'})
+test_run:cmd("setopt delimiter ';'")
+function test()
+    local t = {}
+    local k = {}
+    for i = 1,128 do
+        local parts = {}
+        for j = 0,127 do
+            table.insert(parts, {i * 128 - j, 'uint'})
+            table.insert(t, 1)
+        end
+        if i == 1 then k = table.deepcopy(t) end
+        s:create_index('test'..i, {parts = parts})
+        if i % 16 == 0 then
+            s:replace(t)
+            s:delete(k)
+        end
+    end
+end;
+test_run:cmd("setopt delimiter ''");
+
+pcall(test) -- must fail but not crash
+
+test = nil
+s:drop()
\ No newline at end of file
diff --git a/test/box/suite.ini b/test/box/suite.ini
index 7f2f5d8..a9ed671 100644
--- a/test/box/suite.ini
+++ b/test/box/suite.ini
@@ -3,6 +3,7 @@ core = tarantool
 description = Database tests
 script = box.lua
 disabled = rtree_errinj.test.lua tuple_bench.test.lua
+long_run = huge_field_map_long.test.lua
 config = engine.cfg
 release_disabled = errinj.test.lua errinj_index.test.lua rtree_errinj.test.lua upsert_errinj.test.lua iproto_stress.test.lua gh-4648-func-load-unload.test.lua
 lua_libs = lua/fifo.lua lua/utils.lua lua/bitset.lua lua/index_random_test.lua lua/push.lua lua/identifier.lua
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 01/13] Update license file (2020) Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:04   ` Nikita Pettik
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples Aleksandr Lyapunov
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Apart from other vinyl objects that are named with "vy_" prefix,
its transaction manager (tx_manager) have no such prefix.
It should have in order to avoid conflicts with global tx manager.

Needed for #4897
---
 src/box/vinyl.c        | 30 ++++++++++++++---------------
 src/box/vy_scheduler.h |  2 +-
 src/box/vy_tx.c        | 51 +++++++++++++++++++++++++-------------------------
 src/box/vy_tx.h        | 33 ++++++++++++++++----------------
 4 files changed, 59 insertions(+), 57 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 32301d7..f9252f1 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -98,7 +98,7 @@ struct vy_env {
 	/** Recovery status */
 	enum vy_status status;
 	/** TX manager */
-	struct tx_manager   *xm;
+	struct vy_tx_manager *xm;
 	/** Upsert squash queue */
 	struct vy_squash_queue *squash_queue;
 	/** Memory pool for index iterator. */
@@ -267,7 +267,7 @@ vy_info_append_regulator(struct vy_env *env, struct info_handler *h)
 static void
 vy_info_append_tx(struct vy_env *env, struct info_handler *h)
 {
-	struct tx_manager *xm = env->xm;
+	struct vy_tx_manager *xm = env->xm;
 
 	info_table_begin(h, "tx");
 
@@ -292,7 +292,7 @@ static void
 vy_info_append_memory(struct vy_env *env, struct info_handler *h)
 {
 	info_table_begin(h, "memory");
-	info_append_int(h, "tx", tx_manager_mem_used(env->xm));
+	info_append_int(h, "tx", vy_tx_manager_mem_used(env->xm));
 	info_append_int(h, "level0", lsregion_used(&env->mem_env.allocator));
 	info_append_int(h, "tuple_cache", env->cache_env.mem_used);
 	info_append_int(h, "page_index", env->lsm_env.page_index_size);
@@ -509,7 +509,7 @@ vinyl_engine_memory_stat(struct engine *engine, struct engine_memory_stat *stat)
 	stat->index += env->lsm_env.bloom_size;
 	stat->index += env->lsm_env.page_index_size;
 	stat->cache += env->cache_env.mem_used;
-	stat->tx += tx_manager_mem_used(env->xm);
+	stat->tx += vy_tx_manager_mem_used(env->xm);
 }
 
 static void
@@ -517,7 +517,7 @@ vinyl_engine_reset_stat(struct engine *engine)
 {
 	struct vy_env *env = vy_env(engine);
 
-	struct tx_manager *xm = env->xm;
+	struct vy_tx_manager *xm = env->xm;
 	memset(&xm->stat, 0, sizeof(xm->stat));
 
 	vy_scheduler_reset_stat(&env->scheduler);
@@ -1007,7 +1007,7 @@ vinyl_space_invalidate(struct space *space)
 	 * request bail out early, without dereferencing the space.
 	 */
 	bool unused;
-	tx_manager_abort_writers_for_ddl(env->xm, space, &unused);
+	vy_tx_manager_abort_writers_for_ddl(env->xm, space, &unused);
 }
 
 /** Argument passed to vy_check_format_on_replace(). */
@@ -1075,7 +1075,7 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
 	 * be checked with on_replace trigger so we abort them.
 	 */
 	bool need_wal_sync;
-	tx_manager_abort_writers_for_ddl(env->xm, space, &need_wal_sync);
+	vy_tx_manager_abort_writers_for_ddl(env->xm, space, &need_wal_sync);
 
 	if (!need_wal_sync && vy_lsm_is_empty(pk))
 		return 0; /* space is empty, nothing to do */
@@ -2497,7 +2497,7 @@ static void
 vinyl_engine_switch_to_ro(struct engine *engine)
 {
 	struct vy_env *env = vy_env(engine);
-	tx_manager_abort_writers_for_ro(env->xm);
+	vy_tx_manager_abort_writers_for_ro(env->xm);
 }
 
 /* }}} Public API of transaction control */
@@ -2584,7 +2584,7 @@ vy_env_new(const char *path, size_t memory,
 		goto error_path;
 	}
 
-	e->xm = tx_manager_new();
+	e->xm = vy_tx_manager_new();
 	if (e->xm == NULL)
 		goto error_xm;
 	e->squash_queue = vy_squash_queue_new();
@@ -2620,7 +2620,7 @@ error_lsm_env:
 	vy_scheduler_destroy(&e->scheduler);
 	vy_squash_queue_delete(e->squash_queue);
 error_squash_queue:
-	tx_manager_delete(e->xm);
+	vy_tx_manager_delete(e->xm);
 error_xm:
 	free(e->path);
 error_path:
@@ -2634,7 +2634,7 @@ vy_env_delete(struct vy_env *e)
 	vy_regulator_destroy(&e->regulator);
 	vy_scheduler_destroy(&e->scheduler);
 	vy_squash_queue_delete(e->squash_queue);
-	tx_manager_delete(e->xm);
+	vy_tx_manager_delete(e->xm);
 	free(e->path);
 	mempool_destroy(&e->iterator_pool);
 	vy_run_env_destroy(&e->run_env);
@@ -2893,7 +2893,7 @@ vinyl_engine_end_recovery(struct engine *engine)
 		/*
 		 * During recovery we skip statements that have
 		 * been dumped to disk - see vy_is_committed() -
-		 * so it may turn out that tx_manager::lsn stays
+		 * so it may turn out that vy_tx_manager::lsn stays
 		 * behind the instance vclock while we need it
 		 * to be up-to-date once recovery is complete,
 		 * because we use it while building an index to
@@ -3717,7 +3717,7 @@ vinyl_snapshot_iterator_free(struct snapshot_iterator *base)
 	struct vy_lsm *lsm = it->iterator.lsm;
 	struct vy_env *env = vy_env(lsm->base.engine);
 	vy_read_iterator_close(&it->iterator);
-	tx_manager_destroy_read_view(env->xm, it->rv);
+	vy_tx_manager_destroy_read_view(env->xm, it->rv);
 	vy_lsm_unref(lsm);
 	free(it);
 }
@@ -3737,7 +3737,7 @@ vinyl_index_create_snapshot_iterator(struct index *base)
 	it->base.next = vinyl_snapshot_iterator_next;
 	it->base.free = vinyl_snapshot_iterator_free;
 
-	it->rv = tx_manager_read_view(env->xm);
+	it->rv = vy_tx_manager_read_view(env->xm);
 	if (it->rv == NULL) {
 		free(it);
 		return NULL;
@@ -4160,7 +4160,7 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
 	 * be checked with on_replace trigger so we abort them.
 	 */
 	bool need_wal_sync;
-	tx_manager_abort_writers_for_ddl(env->xm, src_space, &need_wal_sync);
+	vy_tx_manager_abort_writers_for_ddl(env->xm, src_space, &need_wal_sync);
 
 	if (!need_wal_sync && vy_lsm_is_empty(pk))
 		return 0; /* space is empty, nothing to do */
diff --git a/src/box/vy_scheduler.h b/src/box/vy_scheduler.h
index 42e7b2f..f487b42 100644
--- a/src/box/vy_scheduler.h
+++ b/src/box/vy_scheduler.h
@@ -148,7 +148,7 @@ struct vy_scheduler {
 	 * by the dump.
 	 */
 	vy_scheduler_dump_complete_f dump_complete_cb;
-	/** List of read views, see tx_manager::read_views. */
+	/** List of read views, see vy_tx_manager::read_views. */
 	struct rlist *read_views;
 	/** Context needed for writing runs. */
 	struct vy_run_env *run_env;
diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index 846c632..ff63cd7 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -98,13 +98,13 @@ vy_global_read_view_create(struct vy_read_view *rv, int64_t lsn)
 	rv->refs = 0;
 }
 
-struct tx_manager *
-tx_manager_new(void)
+struct vy_tx_manager *
+vy_tx_manager_new(void)
 {
-	struct tx_manager *xm = calloc(1, sizeof(*xm));
+	struct vy_tx_manager *xm = calloc(1, sizeof(*xm));
 	if (xm == NULL) {
 		diag_set(OutOfMemory, sizeof(*xm),
-			 "malloc", "struct tx_manager");
+			 "malloc", "struct vy_tx_manager");
 		return NULL;
 	}
 
@@ -128,7 +128,7 @@ tx_manager_new(void)
 }
 
 void
-tx_manager_delete(struct tx_manager *xm)
+vy_tx_manager_delete(struct vy_tx_manager *xm)
 {
 	mempool_destroy(&xm->read_view_mempool);
 	mempool_destroy(&xm->read_interval_mempool);
@@ -138,7 +138,7 @@ tx_manager_delete(struct tx_manager *xm)
 }
 
 size_t
-tx_manager_mem_used(struct tx_manager *xm)
+vy_tx_manager_mem_used(struct vy_tx_manager *xm)
 {
 	struct mempool_stats mstats;
 	size_t ret = 0;
@@ -157,7 +157,7 @@ tx_manager_mem_used(struct tx_manager *xm)
 }
 
 struct vy_read_view *
-tx_manager_read_view(struct tx_manager *xm)
+vy_tx_manager_read_view(struct vy_tx_manager *xm)
 {
 	struct vy_read_view *rv;
 	/*
@@ -195,7 +195,8 @@ tx_manager_read_view(struct tx_manager *xm)
 }
 
 void
-tx_manager_destroy_read_view(struct tx_manager *xm, struct vy_read_view *rv)
+vy_tx_manager_destroy_read_view(struct vy_tx_manager *xm,
+                                struct vy_read_view *rv)
 {
 	if (rv == xm->p_global_read_view)
 		return;
@@ -209,7 +210,7 @@ tx_manager_destroy_read_view(struct tx_manager *xm, struct vy_read_view *rv)
 static struct txv *
 txv_new(struct vy_tx *tx, struct vy_lsm *lsm, struct vy_entry entry)
 {
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 	struct txv *v = mempool_alloc(&xm->txv_mempool);
 	if (v == NULL) {
 		diag_set(OutOfMemory, sizeof(*v), "mempool", "struct txv");
@@ -234,7 +235,7 @@ txv_new(struct vy_tx *tx, struct vy_lsm *lsm, struct vy_entry entry)
 static void
 txv_delete(struct txv *v)
 {
-	struct tx_manager *xm = v->tx->xm;
+	struct vy_tx_manager *xm = v->tx->xm;
 	xm->write_set_size -= tuple_size(v->entry.stmt);
 	vy_stmt_counter_unacct_tuple(&v->lsm->stat.txw.count, v->entry.stmt);
 	tuple_unref(v->entry.stmt);
@@ -248,7 +249,7 @@ txv_delete(struct txv *v)
 static void
 vy_read_interval_acct(struct vy_read_interval *interval)
 {
-	struct tx_manager *xm = interval->tx->xm;
+	struct vy_tx_manager *xm = interval->tx->xm;
 	xm->read_set_size += tuple_size(interval->left.stmt);
 	if (interval->left.stmt != interval->right.stmt)
 		xm->read_set_size += tuple_size(interval->right.stmt);
@@ -260,7 +261,7 @@ vy_read_interval_acct(struct vy_read_interval *interval)
 static void
 vy_read_interval_unacct(struct vy_read_interval *interval)
 {
-	struct tx_manager *xm = interval->tx->xm;
+	struct vy_tx_manager *xm = interval->tx->xm;
 	xm->read_set_size -= tuple_size(interval->left.stmt);
 	if (interval->left.stmt != interval->right.stmt)
 		xm->read_set_size -= tuple_size(interval->right.stmt);
@@ -271,7 +272,7 @@ vy_read_interval_new(struct vy_tx *tx, struct vy_lsm *lsm,
 		     struct vy_entry left, bool left_belongs,
 		     struct vy_entry right, bool right_belongs)
 {
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 	struct vy_read_interval *interval;
 	interval = mempool_alloc(&xm->read_interval_mempool);
 	if (interval == NULL) {
@@ -296,7 +297,7 @@ vy_read_interval_new(struct vy_tx *tx, struct vy_lsm *lsm,
 static void
 vy_read_interval_delete(struct vy_read_interval *interval)
 {
-	struct tx_manager *xm = interval->tx->xm;
+	struct vy_tx_manager *xm = interval->tx->xm;
 	vy_read_interval_unacct(interval);
 	vy_lsm_unref(interval->lsm);
 	tuple_unref(interval->left.stmt);
@@ -316,7 +317,7 @@ vy_tx_read_set_free_cb(vy_tx_read_set_t *read_set,
 }
 
 void
-vy_tx_create(struct tx_manager *xm, struct vy_tx *tx)
+vy_tx_create(struct vy_tx_manager *xm, struct vy_tx *tx)
 {
 	tx->last_stmt_space = NULL;
 	stailq_create(&tx->log);
@@ -339,7 +340,7 @@ vy_tx_destroy(struct vy_tx *tx)
 	trigger_run(&tx->on_destroy, NULL);
 	trigger_destroy(&tx->on_destroy);
 
-	tx_manager_destroy_read_view(tx->xm, tx->read_view);
+	vy_tx_manager_destroy_read_view(tx->xm, tx->read_view);
 
 	struct txv *v, *tmp;
 	stailq_foreach_entry_safe(v, tmp, &tx->log, next_in_log)
@@ -392,7 +393,7 @@ vy_tx_send_to_read_view(struct vy_tx *tx, struct txv *v)
 		/* already in (earlier) read view */
 		if (vy_tx_is_in_read_view(abort))
 			continue;
-		struct vy_read_view *rv = tx_manager_read_view(tx->xm);
+		struct vy_read_view *rv = vy_tx_manager_read_view(tx->xm);
 		if (rv == NULL)
 			return -1;
 		abort->read_view = rv;
@@ -422,7 +423,7 @@ vy_tx_abort_readers(struct vy_tx *tx, struct txv *v)
 }
 
 struct vy_tx *
-vy_tx_begin(struct tx_manager *xm)
+vy_tx_begin(struct vy_tx_manager *xm)
 {
 	struct vy_tx *tx = mempool_alloc(&xm->tx_mempool);
 	if (unlikely(tx == NULL)) {
@@ -662,7 +663,7 @@ vy_tx_handle_deferred_delete(struct vy_tx *tx, struct txv *v)
 int
 vy_tx_prepare(struct vy_tx *tx)
 {
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 
 	if (tx->state == VINYL_TX_ABORT) {
 		/* Conflict is already accounted - see vy_tx_abort(). */
@@ -793,7 +794,7 @@ void
 vy_tx_commit(struct vy_tx *tx, int64_t lsn)
 {
 	assert(tx->state == VINYL_TX_COMMIT);
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 
 	xm->stat.commit++;
 
@@ -833,7 +834,7 @@ vy_tx_rollback_after_prepare(struct vy_tx *tx)
 {
 	assert(tx->state == VINYL_TX_COMMIT);
 
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 
 	/*
 	 * There are two reasons of rollback_after_prepare:
@@ -878,7 +879,7 @@ vy_tx_rollback_after_prepare(struct vy_tx *tx)
 void
 vy_tx_rollback(struct vy_tx *tx)
 {
-	struct tx_manager *xm = tx->xm;
+	struct vy_tx_manager *xm = tx->xm;
 
 	xm->stat.rollback++;
 
@@ -1140,8 +1141,8 @@ vy_tx_set(struct vy_tx *tx, struct vy_lsm *lsm, struct tuple *stmt)
 }
 
 void
-tx_manager_abort_writers_for_ddl(struct tx_manager *xm, struct space *space,
-				 bool *need_wal_sync)
+vy_tx_manager_abort_writers_for_ddl(struct vy_tx_manager *xm,
+				    struct space *space, bool *need_wal_sync)
 {
 	*need_wal_sync = false;
 	if (space->index_count == 0)
@@ -1166,7 +1167,7 @@ tx_manager_abort_writers_for_ddl(struct tx_manager *xm, struct space *space,
 }
 
 void
-tx_manager_abort_writers_for_ro(struct tx_manager *xm)
+vy_tx_manager_abort_writers_for_ro(struct vy_tx_manager *xm)
 {
 	struct vy_tx *tx;
 	rlist_foreach_entry(tx, &xm->writers, in_writers) {
diff --git a/src/box/vy_tx.h b/src/box/vy_tx.h
index 3144c92..4fac5f6 100644
--- a/src/box/vy_tx.h
+++ b/src/box/vy_tx.h
@@ -54,7 +54,7 @@ extern "C" {
 
 struct space;
 struct tuple;
-struct tx_manager;
+struct vy_tx_manager;
 struct vy_mem;
 struct vy_tx;
 struct vy_history;
@@ -140,16 +140,16 @@ write_set_search_key(write_set_t *tree, struct vy_lsm *lsm,
 
 /** Transaction object. */
 struct vy_tx {
-	/** Link in tx_manager::writers. */
+	/** Link in vy_tx_manager::writers. */
 	struct rlist in_writers;
 	/** Transaction manager. */
-	struct tx_manager *xm;
+	struct vy_tx_manager *xm;
 	/**
 	 * Pointer to the space affected by the last prepared statement.
 	 * We need it so that we can abort a transaction on DDL even
 	 * if it hasn't inserted anything into the write set yet (e.g.
 	 * yielded on unique check) and therefore would otherwise be
-	 * ignored by tx_manager_abort_writers_for_ddl().
+	 * ignored by vy_tx_manager_abort_writers_for_ddl().
 	 */
 	struct space *last_stmt_space;
 	/**
@@ -209,7 +209,7 @@ vy_tx_read_view(struct vy_tx *tx)
 }
 
 /** Transaction manager object. */
-struct tx_manager {
+struct vy_tx_manager {
 	/**
 	 * The last committed log sequence number known to
 	 * vinyl. Updated in vy_commit().
@@ -278,24 +278,25 @@ struct tx_manager {
 };
 
 /** Allocate a tx manager object. */
-struct tx_manager *
-tx_manager_new(void);
+struct vy_tx_manager *
+vy_tx_manager_new(void);
 
 /** Delete a tx manager object. */
 void
-tx_manager_delete(struct tx_manager *xm);
+vy_tx_manager_delete(struct vy_tx_manager *xm);
 
 /** Return total amount of memory used by active transactions. */
 size_t
-tx_manager_mem_used(struct tx_manager *xm);
+vy_tx_manager_mem_used(struct vy_tx_manager *xm);
 
 /** Create or reuse an instance of a read view. */
 struct vy_read_view *
-tx_manager_read_view(struct tx_manager *xm);
+vy_tx_manager_read_view(struct vy_tx_manager *xm);
 
 /** Dereference and possibly destroy a read view. */
 void
-tx_manager_destroy_read_view(struct tx_manager *xm, struct vy_read_view *rv);
+vy_tx_manager_destroy_read_view(struct vy_tx_manager *xm,
+                                struct vy_read_view *rv);
 
 /**
  * Abort all rw transactions that affect the given space
@@ -307,19 +308,19 @@ tx_manager_destroy_read_view(struct tx_manager *xm, struct vy_read_view *rv);
  * to call wal_sync() to flush them.
  */
 void
-tx_manager_abort_writers_for_ddl(struct tx_manager *xm, struct space *space,
-				 bool *need_wal_sync);
+vy_tx_manager_abort_writers_for_ddl(struct vy_tx_manager *xm,
+                                    struct space *space, bool *need_wal_sync);
 
 /**
  * Abort all local rw transactions that haven't reached WAL yet.
  * Called before switching to read-only mode.
  */
 void
-tx_manager_abort_writers_for_ro(struct tx_manager *xm);
+vy_tx_manager_abort_writers_for_ro(struct vy_tx_manager *xm);
 
 /** Initialize a tx object. */
 void
-vy_tx_create(struct tx_manager *xm, struct vy_tx *tx);
+vy_tx_create(struct vy_tx_manager *xm, struct vy_tx *tx);
 
 /** Destroy a tx object. */
 void
@@ -327,7 +328,7 @@ vy_tx_destroy(struct vy_tx *tx);
 
 /** Begin a new transaction. */
 struct vy_tx *
-vy_tx_begin(struct tx_manager *xm);
+vy_tx_begin(struct vy_tx_manager *xm);
 
 /** Prepare a transaction to be committed. */
 int
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (2 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:22   ` Nikita Pettik
  2020-07-16  0:05   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt Aleksandr Lyapunov
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

If the tuple is marked as dirty that usually means that it was
somehow affected by a transaction. If a tuple found in index is
dirty - we cannot immediately return to user, instead we must
clarify it in transaction manager.

Part of #4897
---
 src/box/memtx_engine.c              |  5 +++--
 src/box/tuple.c                     |  5 +++--
 src/box/tuple.h                     | 12 ++++++++++--
 src/box/tuple_format.c              |  4 ++--
 src/box/vy_stmt.c                   |  5 +++--
 test/box/huge_field_map.result      |  2 +-
 test/box/huge_field_map_long.result |  2 +-
 7 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index b5b6b14..dfd6fce 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1131,8 +1131,8 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end)
 	 * tuple is not the first field of the memtx_tuple.
 	 */
 	uint32_t data_offset = sizeof(struct tuple) + field_map_size;
-	if (data_offset > UINT16_MAX) {
-		/** tuple->data_offset is 16 bits */
+	if (data_offset > INT16_MAX) {
+		/** tuple->data_offset is 15 bits */
 		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
 			 data_offset);
 		goto end;
@@ -1170,6 +1170,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end)
 	tuple->format_id = tuple_format_id(format);
 	tuple_format_ref(format);
 	tuple->data_offset = data_offset;
+	tuple->is_dirty = false;
 	char *raw = (char *) tuple + tuple->data_offset;
 	field_map_build(&builder, raw - field_map_size);
 	memcpy(raw, data, tuple_len);
diff --git a/src/box/tuple.c b/src/box/tuple.c
index e48ee08..9f0f24c 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -84,8 +84,8 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end
 		goto end;
 	uint32_t field_map_size = field_map_build_size(&builder);
 	uint32_t data_offset = sizeof(struct tuple) + field_map_size;
-	if (data_offset > UINT16_MAX) {
-		/** tuple->data_offset is 16 bits */
+	if (data_offset > INT16_MAX) {
+		/** tuple->data_offset is 15 bits */
 		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
 			 data_offset);
 		goto end;
@@ -105,6 +105,7 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end
 	tuple->format_id = tuple_format_id(format);
 	tuple_format_ref(format);
 	tuple->data_offset = data_offset;
+	tuple->is_dirty = false;
 	char *raw = (char *) tuple + data_offset;
 	field_map_build(&builder, raw - field_map_size);
 	memcpy(raw, data, data_len);
diff --git a/src/box/tuple.h b/src/box/tuple.h
index 9a88772..4752323 100644
--- a/src/box/tuple.h
+++ b/src/box/tuple.h
@@ -319,7 +319,13 @@ struct PACKED tuple
 	/**
 	 * Offset to the MessagePack from the begin of the tuple.
 	 */
-	uint16_t data_offset;
+	uint16_t data_offset : 15;
+	/**
+	 * The tuple (if it's found in index for example) could be invisible
+	 * for current transactions. The flag means that the tuple must
+	 * be clarified by transaction engine.
+	 */
+	bool is_dirty : 1;
 	/**
 	 * Engine specific fields and offsets array concatenated
 	 * with MessagePack fields array.
@@ -1081,8 +1087,10 @@ tuple_unref(struct tuple *tuple)
 	assert(tuple->refs - 1 >= 0);
 	if (unlikely(tuple->is_bigref))
 		tuple_unref_slow(tuple);
-	else if (--tuple->refs == 0)
+	else if (--tuple->refs == 0) {
+		assert(!tuple->is_dirty);
 		tuple_delete(tuple);
+	}
 }
 
 extern struct tuple *box_tuple_last;
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index faf038a..bae6c67 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -501,8 +501,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	assert(tuple_format_field(format, 0)->offset_slot == TUPLE_OFFSET_SLOT_NIL
 	       || json_token_is_multikey(&tuple_format_field(format, 0)->token));
 	size_t field_map_size = -current_slot * sizeof(uint32_t);
-	if (field_map_size > UINT16_MAX) {
-		/** tuple->data_offset is 16 bits */
+	if (field_map_size > INT16_MAX) {
+		/** tuple->data_offset is 15 bits */
 		diag_set(ClientError, ER_INDEX_FIELD_COUNT_LIMIT,
 			 -current_slot);
 		return -1;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index f59c418..92e0aa1 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -160,8 +160,8 @@ vy_stmt_alloc(struct tuple_format *format, uint32_t data_offset, uint32_t bsize)
 {
 	assert(data_offset >= sizeof(struct vy_stmt) + format->field_map_size);
 
-	if (data_offset > UINT16_MAX) {
-		/** tuple->data_offset is 16 bits */
+	if (data_offset > INT16_MAX) {
+		/** tuple->data_offset is 15 bits */
 		diag_set(ClientError, ER_TUPLE_METADATA_IS_TOO_BIG,
 			 data_offset);
 		return NULL;
@@ -198,6 +198,7 @@ vy_stmt_alloc(struct tuple_format *format, uint32_t data_offset, uint32_t bsize)
 		tuple_format_ref(format);
 	tuple->bsize = bsize;
 	tuple->data_offset = data_offset;
+	tuple->is_dirty = false;
 	vy_stmt_set_lsn(tuple, 0);
 	vy_stmt_set_type(tuple, 0);
 	vy_stmt_set_flags(tuple, 0);
diff --git a/test/box/huge_field_map.result b/test/box/huge_field_map.result
index 11b4da3..45022cc 100644
--- a/test/box/huge_field_map.result
+++ b/test/box/huge_field_map.result
@@ -38,7 +38,7 @@ test_run:cmd("setopt delimiter ''");
 pcall(test) -- must fail but not crash
  | ---
  | - false
- | - 'Can''t create tuple: metadata size 65558 is too big'
+ | - 'Can''t create tuple: metadata size 32790 is too big'
  | ...
 
 test = nil
diff --git a/test/box/huge_field_map_long.result b/test/box/huge_field_map_long.result
index d7971ae..cb47900 100644
--- a/test/box/huge_field_map_long.result
+++ b/test/box/huge_field_map_long.result
@@ -40,7 +40,7 @@ test_run:cmd("setopt delimiter ''");
 pcall(test) -- must fail but not crash
  | ---
  | - false
- | - 'Can''t create tuple: metadata size 65542 is too big'
+ | - 'Can''t create tuple: metadata size 32774 is too big'
  | ...
 
 test = nil
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (3 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:23   ` Nikita Pettik
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 06/13] txm: add TX status Aleksandr Lyapunov
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

There is a lot of places in transaction engine (see futher commits)
where it's convenient to store just a pointer to tx statement while
having a way to get the transaction itself by this pointer.
Let's store a pointer to TX in TX statement for that purpose.

Part of #4897
---
 src/box/txn.c | 1 +
 src/box/txn.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/box/txn.c b/src/box/txn.c
index 287f352..62b91d6 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -109,6 +109,7 @@ txn_stmt_new(struct region *region)
 	}
 
 	/* Initialize members explicitly to save time on memset() */
+	stmt->txn = in_txn();
 	stmt->space = NULL;
 	stmt->old_tuple = NULL;
 	stmt->new_tuple = NULL;
diff --git a/src/box/txn.h b/src/box/txn.h
index c1f06db..36b1a03 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -129,6 +129,8 @@ struct txn_stmt {
 
 	/** A linked list of all statements. */
 	struct stailq_entry next;
+	/** Owner of that statement. */
+	struct txn *txn;
 	/** Undo info. */
 	struct space *space;
 	struct tuple *old_tuple;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 06/13] txm: add TX status
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (4 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:42   ` Nikita Pettik
  2020-07-16  0:08   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt Aleksandr Lyapunov
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Transaction engine (see further commits) needs to distinguish and
maniputate transactions by their status. The status describe the
lifetime point of a transaction (inprogress, prepared, committed)
and its abilities (conflicted, read view).

Part of #4897
Part of #5108
---
 src/box/txn.c |  5 +++++
 src/box/txn.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/src/box/txn.c b/src/box/txn.c
index 62b91d6..0372047 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -231,6 +231,7 @@ txn_begin(void)
 	txn->flags = 0;
 	txn->in_sub_stmt = 0;
 	txn->id = ++tsn;
+	txn->status = TXN_INPROGRESS;
 	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn->engine = NULL;
 	txn->engine_tx = NULL;
@@ -455,6 +456,7 @@ txn_complete(struct txn *txn)
 	 * IPROTO_NOP or IPROTO_CONFIRM statements.
 	 */
 	if (txn->signature < 0) {
+		txn->status = TXN_ABORTED;
 		/* Undo the transaction. */
 		struct txn_stmt *stmt;
 		stailq_reverse(&txn->stmts);
@@ -465,6 +467,7 @@ txn_complete(struct txn *txn)
 		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
 			txn_run_rollback_triggers(txn, &txn->on_rollback);
 	} else if (!txn_has_flag(txn, TXN_WAIT_SYNC)) {
+		txn->status = TXN_COMMITTED;
 		/* Commit the transaction. */
 		if (txn->engine != NULL)
 			engine_commit(txn->engine, txn);
@@ -662,6 +665,7 @@ txn_prepare(struct txn *txn)
 		trigger_clear(&txn->fiber_on_yield);
 
 	txn->start_tm = ev_monotonic_now(loop());
+	txn->status = TXN_PREPARED;
 	return 0;
 }
 
@@ -895,6 +899,7 @@ void
 txn_rollback(struct txn *txn)
 {
 	assert(txn == in_txn());
+	txn->status = TXN_ABORTED;
 	trigger_clear(&txn->fiber_on_stop);
 	if (!txn_has_flag(txn, TXN_CAN_YIELD))
 		trigger_clear(&txn->fiber_on_yield);
diff --git a/src/box/txn.h b/src/box/txn.h
index 36b1a03..e261852 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -121,6 +121,40 @@ enum {
 };
 
 /**
+ * Status of a transaction.
+ */
+enum txn_status {
+	/**
+	 * Initial state of TX. The only state of a TX that allowed to do
+	 * read or write actions.
+	 */
+	TXN_INPROGRESS,
+	/**
+	 * The TX have passed conflict checks and is ready to be committed.
+	 */
+	TXN_PREPARED,
+	/**
+	 * The TX was aborted when other TX was committed due to conflict.
+	 */
+	TXN_CONFLICTED,
+	/**
+	 * The TX was read_only, has a conflict and was sent to read view.
+	 * Read-only and does not participate in conflict resolution ever more.
+	 * This transaction can onlu see state of the database at some fixed
+	 * point in the past.
+	 */
+	TXN_IN_READ_VIEW,
+	/**
+	 * The TX was committed.
+	 */
+	TXN_COMMITTED,
+	/**
+	 * The TX was aborted by user.
+	 */
+	TXN_ABORTED,
+};
+
+/**
  * A single statement of a multi-statement
  * transaction: undo and redo info.
  */
@@ -217,6 +251,8 @@ struct txn {
 	 * Valid IDs start from 1.
 	 */
 	int64_t id;
+	/** Status of the TX */
+	enum txn_status status;
 	/** List of statements in a transaction. */
 	struct stailq stmts;
 	/** Number of new rows without an assigned LSN. */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (5 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 06/13] txm: add TX status Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:49   ` Nikita Pettik
  2020-07-16  0:09   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

That flag is needed for transactional conflict manager - if any
other transaction replaces old_tuple before current one and the flag
is set - the current transaction will be aborted.
For example REPLACE just replaces a key, no matter what tuple
lays in the index and thus does_require_old_tuple = false.
In contrast, UPDATE makes new tuple using old_tuple and thus
the statement will require old_tuple (does_require_old_tuple = true).
INSERT also does_require_old_tuple = true because it requires
old_tuple to be NULL.

Part of #4897
---
 src/box/memtx_space.c | 17 +++++++++++++++++
 src/box/txn.c         |  3 +++
 src/box/txn.h         | 13 +++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 8452ab4..e48ed3a 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -316,6 +316,10 @@ memtx_space_execute_replace(struct space *space, struct txn *txn,
 	if (stmt->new_tuple == NULL)
 		return -1;
 	tuple_ref(stmt->new_tuple);
+
+	if (mode == DUP_INSERT)
+		stmt->does_require_old_tuple = true;
+
 	if (memtx_space->replace(space, NULL, stmt->new_tuple,
 				 mode, &stmt->old_tuple) != 0)
 		return -1;
@@ -342,6 +346,13 @@ memtx_space_execute_delete(struct space *space, struct txn *txn,
 	struct tuple *old_tuple;
 	if (index_get(pk, key, part_count, &old_tuple) != 0)
 		return -1;
+
+	/*
+	 * We have to delete exactly old_tuple just because we return it as
+	 * a result.
+	 */
+	stmt->does_require_old_tuple = true;
+
 	if (old_tuple != NULL &&
 	    memtx_space->replace(space, old_tuple, NULL,
 				 DUP_REPLACE_OR_INSERT, &stmt->old_tuple) != 0)
@@ -390,6 +401,9 @@ memtx_space_execute_update(struct space *space, struct txn *txn,
 	if (stmt->new_tuple == NULL)
 		return -1;
 	tuple_ref(stmt->new_tuple);
+
+	stmt->does_require_old_tuple = true;
+
 	if (memtx_space->replace(space, old_tuple, stmt->new_tuple,
 				 DUP_REPLACE, &stmt->old_tuple) != 0)
 		return -1;
@@ -496,6 +510,9 @@ memtx_space_execute_upsert(struct space *space, struct txn *txn,
 			stmt->new_tuple = NULL;
 		}
 	}
+
+	stmt->does_require_old_tuple = true;
+
 	/*
 	 * It's OK to use DUP_REPLACE_OR_INSERT: we don't risk
 	 * inserting a new tuple if the old one exists, since
diff --git a/src/box/txn.c b/src/box/txn.c
index 0372047..d254edb 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -116,6 +116,7 @@ txn_stmt_new(struct region *region)
 	stmt->engine_savepoint = NULL;
 	stmt->row = NULL;
 	stmt->has_triggers = false;
+	stmt->does_require_old_tuple = false;
 	return stmt;
 }
 
@@ -360,6 +361,8 @@ txn_commit_stmt(struct txn *txn, struct request *request)
 	 */
 	if (stmt->space != NULL && !rlist_empty(&stmt->space->on_replace) &&
 	    stmt->space->run_triggers && (stmt->old_tuple || stmt->new_tuple)) {
+		/* Triggers see old_tuple and that tuple must remain the same */
+		stmt->does_require_old_tuple = true;
 		int rc = 0;
 		if(!space_is_temporary(stmt->space)) {
 			rc = trigger_run(&stmt->space->on_replace, txn);
diff --git a/src/box/txn.h b/src/box/txn.h
index e261852..962ada0 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -175,6 +175,19 @@ struct txn_stmt {
 	struct xrow_header *row;
 	/** on_commit and/or on_rollback list is not empty. */
 	bool has_triggers;
+	/**
+	 * Whether the stmt requires to replace exactly old_tuple (member).
+	 * That flag is needed for transactional conflict manager - if any
+	 * other transaction replaces old_tuple before current one and the flag
+	 * is set - the current transaction will be aborted.
+	 * For example REPLACE just replaces a key, no matter what tuple
+	 * lays in the index and thus does_require_old_tuple = false.
+	 * In contrast, UPDATE makes new tuple using old_tuple and thus
+	 * the statement will require old_tuple (does_require_old_tuple = true).
+	 * INSERT also does_require_old_tuple = true because it requires
+	 * old_tuple to be NULL.
+	 */
+	bool does_require_old_tuple;
 	/** Commit/rollback triggers associated with this statement. */
 	struct rlist on_commit;
 	struct rlist on_rollback;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (6 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 16:51   ` Nikita Pettik
                     ` (2 more replies)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number Aleksandr Lyapunov
                   ` (9 subsequent siblings)
  17 siblings, 3 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Define TX manager. It will store data for MVCC and conflict manager.
Define also 'use_mvcc_engine' in config that enables MVCC engine.

Part of #4897
---
 src/box/lua/load_cfg.lua        |  2 ++
 src/box/txn.c                   | 21 +++++++++++++++++++++
 src/box/txn.h                   | 12 ++++++++++++
 src/main.cc                     |  5 +++++
 test/app-tap/init_script.result |  1 +
 test/box/admin.result           |  2 ++
 test/box/cfg.result             |  4 ++++
 7 files changed, 47 insertions(+)

diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 107bc15..8b40b29 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -82,6 +82,7 @@ local default_cfg = {
     coredump            = false,
     read_only           = false,
     hot_standby         = false,
+    use_mvcc_engine     = false,
     checkpoint_interval = 3600,
     checkpoint_wal_threshold = 1e18,
     checkpoint_count    = 2,
@@ -162,6 +163,7 @@ local template_cfg = {
     checkpoint_count    = 'number',
     read_only           = 'boolean',
     hot_standby         = 'boolean',
+    use_mvcc_engine     = 'boolean',
     worker_pool_threads = 'number',
     replication_timeout = 'number',
     replication_sync_lag = 'number',
diff --git a/src/box/txn.c b/src/box/txn.c
index d254edb..d647fc9 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -38,6 +38,16 @@
 #include "errinj.h"
 #include "iproto_constants.h"
 
+struct tx_manager
+{
+};
+
+/** That's a definition, see declaration for description. */
+bool tx_manager_use_mvcc_engine = false;
+
+/** The one and only instance of tx_manager. */
+static struct tx_manager txm;
+
 double too_long_threshold;
 
 /* Txn cache. */
@@ -1165,3 +1175,14 @@ txn_on_yield(struct trigger *trigger, void *event)
 	txn_set_flag(txn, TXN_IS_ABORTED_BY_YIELD);
 	return 0;
 }
+
+void
+tx_manager_init()
+{
+	(void)txm;
+}
+
+void
+tx_manager_free()
+{
+}
diff --git a/src/box/txn.h b/src/box/txn.h
index 962ada0..a2374f3 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -43,6 +43,12 @@ extern "C" {
 
 /** box statistics */
 extern struct rmean *rmean_box;
+/**
+ * Global flag that enables mvcc engine.
+ * If set, memtx starts to apply statements through txm history mechanism
+ * and tx manager itself transaction reads in order to detect conflicts.
+ */
+extern bool tx_manager_use_mvcc_engine;
 
 struct journal_entry;
 struct engine;
@@ -700,6 +706,12 @@ box_txn_savepoint(void);
 API_EXPORT int
 box_txn_rollback_to_savepoint(box_txn_savepoint_t *savepoint);
 
+void
+tx_manager_init();
+
+void
+tx_manager_free();
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
diff --git a/src/main.cc b/src/main.cc
index 65b1606..07eafb8 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -75,6 +75,7 @@
 #include <libutil.h>
 #include "box/lua/init.h" /* box_lua_init() */
 #include "box/session.h"
+#include "box/txn.h"
 #include "systemd.h"
 #include "crypto/crypto.h"
 #include "core/popen.h"
@@ -569,6 +570,8 @@ load_cfg(void)
 			log_format,
 			background);
 
+	tx_manager_use_mvcc_engine = cfg_getb("use_mvcc_engine");
+
 	if (background)
 		daemonize();
 
@@ -667,6 +670,7 @@ tarantool_free(void)
 	random_free();
 #endif
 	crypto_free();
+	tx_manager_free();
 	coll_free();
 	systemd_free();
 	say_logger_free();
@@ -830,6 +834,7 @@ main(int argc, char **argv)
 	signal_init();
 	cbus_init();
 	coll_init();
+	tx_manager_init();
 	crypto_init();
 	systemd_init();
 	tarantool_lua_init(tarantool_bin, main_argc, main_argv);
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index 857f0c9..3838e35 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -37,6 +37,7 @@ slab_alloc_factor:1.05
 sql_cache_size:5242880
 strip_core:true
 too_long_threshold:0.5
+use_mvcc_engine:false
 vinyl_bloom_fpr:0.05
 vinyl_cache:134217728
 vinyl_dir:.
diff --git a/test/box/admin.result b/test/box/admin.result
index ab3e80a..278e8c5 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -95,6 +95,8 @@ cfg_filter(box.cfg)
     - true
   - - too_long_threshold
     - 0.5
+  - - use_mvcc_engine
+    - false
   - - vinyl_bloom_fpr
     - 0.05
   - - vinyl_cache
diff --git a/test/box/cfg.result b/test/box/cfg.result
index bdd210b..b5ba165 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -83,6 +83,8 @@ cfg_filter(box.cfg)
  |     - true
  |   - - too_long_threshold
  |     - 0.5
+ |   - - use_mvcc_engine
+ |     - false
  |   - - vinyl_bloom_fpr
  |     - 0.05
  |   - - vinyl_cache
@@ -190,6 +192,8 @@ cfg_filter(box.cfg)
  |     - true
  |   - - too_long_threshold
  |     - 0.5
+ |   - - use_mvcc_engine
+ |     - false
  |   - - vinyl_bloom_fpr
  |     - 0.05
  |   - - vinyl_cache
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (7 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 17:13   ` Nikita Pettik
  2020-07-16  0:11   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker Aleksandr Lyapunov
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Prepare sequence number is a monotonically increasing ID that is
assigned to any prepared transaction. This ID is suitable for
serialization order resolution: the bigger is ID - the latter the
transaction exist in the serialization order of transactions.

Note that id of transactions has quite different order in case
when transaction could yield - an younger (bigger id) transaction
can prepare/commit first (lower psn) while older tx sleeps in vain.

Also it should be mentioned that LSN has the same order as PSN,
but it has two general differences:
1. The LSN sequence has no holes, i.e. it is a natural number
sequence. This property is useless for transaction engine.
2. The LSN sequence is provided by WAL writer and thus LSN is not
available for TX thas was prepared and haven't been committed yet.
That feature makes psn more suitable sequence for transactions as
it allows to order prepared but not committed transaction and
allows, for example, to create a read view between prepared
transactions.

Part of #4897
---
 src/box/txn.c | 5 +++++
 src/box/txn.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/src/box/txn.c b/src/box/txn.c
index d647fc9..7a15986 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -40,6 +40,8 @@
 
 struct tx_manager
 {
+	/** Last prepare-sequence-number that was assigned to prepared TX. */
+	int64_t last_psn;
 };
 
 /** That's a definition, see declaration for description. */
@@ -242,6 +244,7 @@ txn_begin(void)
 	txn->flags = 0;
 	txn->in_sub_stmt = 0;
 	txn->id = ++tsn;
+	txn->psn = 0;
 	txn->status = TXN_INPROGRESS;
 	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn->engine = NULL;
@@ -650,6 +653,8 @@ txn_journal_entry_new(struct txn *txn)
 static int
 txn_prepare(struct txn *txn)
 {
+	txn->psn = ++txm.last_psn;
+
 	if (txn_has_flag(txn, TXN_IS_ABORTED_BY_YIELD)) {
 		assert(!txn_has_flag(txn, TXN_CAN_YIELD));
 		diag_set(ClientError, ER_TRANSACTION_YIELD);
diff --git a/src/box/txn.h b/src/box/txn.h
index a2374f3..b5b94fd 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -270,6 +270,11 @@ struct txn {
 	 * Valid IDs start from 1.
 	 */
 	int64_t id;
+	/**
+	 * A sequential ID that is assigned when the TX become prepared.
+	 * Transactions are committed in that order.
+	 */
+	int64_t psn;
 	/** Status of the TX */
 	enum txn_status status;
 	/** List of statements in a transaction. */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (8 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-16  0:16   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story Aleksandr Lyapunov
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

There are situations when we have to track that if some TX is
committed then some others must be aborted due to conflict.
The common case is that one r/w TX have read some value while the
second is about to overwrite the value; is the second is committed,
the first must be aborted.
Thus we have to store many-to-many TX relations between breaker
TX and victim TX.
The patch implements that.

Part of #4897
---
 src/box/txn.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/box/txn.h |  32 ++++++++++++
 2 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index 7a15986..ba81b58 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -42,6 +42,12 @@ struct tx_manager
 {
 	/** Last prepare-sequence-number that was assigned to prepared TX. */
 	int64_t last_psn;
+	/**
+	 * List of all transactions that are in a read view.
+	 * New transactions are added to the tail of this list,
+	 * so the list is ordered by rv_psn.
+	 */
+	struct rlist read_view_txs;
 };
 
 /** That's a definition, see declaration for description. */
@@ -50,6 +56,21 @@ bool tx_manager_use_mvcc_engine = false;
 /** The one and only instance of tx_manager. */
 static struct tx_manager txm;
 
+/**
+ * Record that links two transactions, breaker and victim.
+ * See txm_cause_conflict for details.
+ */
+struct tx_conflict_tracker {
+	/** TX that aborts victim on commit. */
+	struct txn *breaker;
+	/** TX that aborts will be aborted on breaker's commit. */
+	struct txn *victim;
+	/** Link in breaker->conflict_list. */
+	struct rlist in_conflict_list;
+	/** Link in victim->conflicted_by_list. */
+	struct rlist in_conflicted_by_list;
+};
+
 double too_long_threshold;
 
 /* Txn cache. */
@@ -209,6 +230,9 @@ txn_new(void)
 	}
 	assert(region_used(&region) == sizeof(*txn));
 	txn->region = region;
+	rlist_create(&txn->conflict_list);
+	rlist_create(&txn->conflicted_by_list);
+	rlist_create(&txn->in_read_view_txs);
 	return txn;
 }
 
@@ -218,6 +242,22 @@ txn_new(void)
 inline static void
 txn_free(struct txn *txn)
 {
+	struct tx_conflict_tracker *entry, *next;
+	rlist_foreach_entry_safe(entry, &txn->conflict_list,
+				 in_conflict_list, next) {
+		rlist_del(&entry->in_conflict_list);
+		rlist_del(&entry->in_conflicted_by_list);
+	}
+	rlist_foreach_entry_safe(entry, &txn->conflicted_by_list,
+				 in_conflicted_by_list, next) {
+		rlist_del(&entry->in_conflict_list);
+		rlist_del(&entry->in_conflicted_by_list);
+	}
+	assert(rlist_empty(&txn->conflict_list));
+	assert(rlist_empty(&txn->conflicted_by_list));
+
+	rlist_del(&txn->in_read_view_txs);
+
 	struct txn_stmt *stmt;
 	stailq_foreach_entry(stmt, &txn->stmts, next)
 		txn_stmt_destroy(stmt);
@@ -235,6 +275,8 @@ txn_begin(void)
 	struct txn *txn = txn_new();
 	if (txn == NULL)
 		return NULL;
+	assert(rlist_empty(&txn->conflict_list));
+	assert(rlist_empty(&txn->conflicted_by_list));
 
 	/* Initialize members explicitly to save time on memset() */
 	stailq_create(&txn->stmts);
@@ -245,6 +287,7 @@ txn_begin(void)
 	txn->in_sub_stmt = 0;
 	txn->id = ++tsn;
 	txn->psn = 0;
+	txn->rv_psn = 0;
 	txn->status = TXN_INPROGRESS;
 	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn->engine = NULL;
@@ -293,6 +336,15 @@ txn_begin_stmt(struct txn *txn, struct space *space)
 		diag_set(ClientError, ER_SUB_STMT_MAX);
 		return -1;
 	}
+
+	/*
+	 * A conflict have happened; there is no reason to continue the TX.
+	 */
+	if (txn->status == TXN_CONFLICTED) {
+		diag_set(ClientError, ER_TRANSACTION_CONFLICT);
+		return -1;
+	}
+
 	struct txn_stmt *stmt = txn_stmt_new(&txn->region);
 	if (stmt == NULL)
 		return -1;
@@ -647,6 +699,32 @@ txn_journal_entry_new(struct txn *txn)
 	return req;
 }
 
+/**
+ * Handle conflict when @breaker transaction is prepared.
+ * The conflict is happened if @victim have read something that @breaker
+ * overwrites.
+ * If @victim is read-only or haven't made any changes, it should be send
+ * to read view, in which is will not see @breaker.
+ * Otherwise @vistim must be marked as conflicted.
+ */
+static void
+txn_handle_conflict(struct txn *breaker, struct txn *victim)
+{
+	assert(breaker->psn != 0);
+	if (!victim->status != TXN_INPROGRESS) {
+		/* Was conflicted by somebody else. */
+		return;
+	}
+	if (stailq_empty(&victim->stmts)) {
+		/* Send to read view. */
+		victim->rv_psn = breaker->psn;
+		rlist_add_tail(&txm.read_view_txs, &victim->in_read_view_txs);
+	} else {
+		/* Mark as conflicted. */
+		victim->status = TXN_CONFLICTED;
+	}
+}
+
 /*
  * Prepare a transaction using engines.
  */
@@ -670,6 +748,17 @@ txn_prepare(struct txn *txn)
 		diag_set(ClientError, ER_FOREIGN_KEY_CONSTRAINT);
 		return -1;
 	}
+
+	/*
+	 * Somebody else has written some value that we have read.
+	 * The transaction is not possible.
+	 */
+	if (txn->status == TXN_CONFLICTED || txn->status == TXN_IN_READ_VIEW) {
+		diag_set(ClientError, ER_TRANSACTION_CONFLICT);
+		return -1;
+	}
+	assert(txn->status == TXN_INPROGRESS);
+
 	/*
 	 * Perform transaction conflict resolution. Engine == NULL when
 	 * we have a bunch of IPROTO_NOP statements.
@@ -678,6 +767,26 @@ txn_prepare(struct txn *txn)
 		if (engine_prepare(txn->engine, txn) != 0)
 			return -1;
 	}
+
+	struct tx_conflict_tracker *entry, *next;
+	/* Handle conflicts. */
+	rlist_foreach_entry_safe(entry, &txn->conflict_list,
+				 in_conflict_list, next) {
+		assert(entry->breaker == txn);
+		txn_handle_conflict(txn, entry->victim);
+		rlist_del(&entry->in_conflict_list);
+		rlist_del(&entry->in_conflicted_by_list);
+	}
+	/* Just free conflict list - we don't need it anymore. */
+	rlist_foreach_entry_safe(entry, &txn->conflicted_by_list,
+				 in_conflicted_by_list, next) {
+		assert(entry->victim == txn);
+		rlist_del(&entry->in_conflict_list);
+		rlist_del(&entry->in_conflicted_by_list);
+	}
+	assert(rlist_empty(&txn->conflict_list));
+	assert(rlist_empty(&txn->conflicted_by_list));
+
 	trigger_clear(&txn->fiber_on_stop);
 	if (!txn_has_flag(txn, TXN_CAN_YIELD))
 		trigger_clear(&txn->fiber_on_yield);
@@ -1184,10 +1293,55 @@ txn_on_yield(struct trigger *trigger, void *event)
 void
 tx_manager_init()
 {
-	(void)txm;
+	rlist_create(&txm.read_view_txs);
 }
 
 void
 tx_manager_free()
 {
 }
+
+int
+txm_cause_conflict(struct txn *breaker, struct txn *victim)
+{
+	struct tx_conflict_tracker *tracker = NULL;
+	struct rlist *r1 = breaker->conflict_list.next;
+	struct rlist *r2 = victim->conflicted_by_list.next;
+	while (r1 != &breaker->conflict_list &&
+	       r2 != &victim->conflicted_by_list) {
+		tracker = rlist_entry(r1, struct tx_conflict_tracker,
+				      in_conflict_list);
+		assert(tracker->breaker == breaker);
+		if (tracker->victim == victim)
+			break;
+		tracker = rlist_entry(r2, struct tx_conflict_tracker,
+				      in_conflicted_by_list);
+		assert(tracker->victim == victim);
+		if (tracker->breaker == breaker)
+			break;
+		tracker = NULL;
+		r1 = r1->next;
+		r2 = r2->next;
+	}
+	if (tracker != NULL) {
+		/* Move to the beginning of a list
+		 * for a case of subsequent lookups */
+		rlist_del(&tracker->in_conflict_list);
+		rlist_del(&tracker->in_conflicted_by_list);
+	} else {
+		size_t size;
+		tracker = region_alloc_object(&victim->region,
+					      struct tx_conflict_tracker,
+					      &size);
+		if (tracker == NULL) {
+			diag_set(OutOfMemory, size, "tx region",
+				 "conflict_tracker");
+			return -1;
+		}
+		tracker->breaker = breaker;
+		tracker->victim = victim;
+	}
+	rlist_add(&breaker->conflict_list, &tracker->in_conflict_list);
+	rlist_add(&victim->conflicted_by_list, &tracker->in_conflicted_by_list);
+	return 0;
+}
diff --git a/src/box/txn.h b/src/box/txn.h
index b5b94fd..03ccb76 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -275,6 +275,11 @@ struct txn {
 	 * Transactions are committed in that order.
 	 */
 	int64_t psn;
+	/**
+	 * Read view of that TX. The TX can see only changes with ps < rv_psn.
+	 * Is nonzero if and only if status = TXN_IN_READ_VIEW.
+	 */
+	int64_t rv_psn;
 	/** Status of the TX */
 	enum txn_status status;
 	/** List of statements in a transaction. */
@@ -336,6 +341,22 @@ struct txn {
 	uint32_t fk_deferred_count;
 	/** List of savepoints to find savepoint by name. */
 	struct rlist savepoints;
+	/**
+	 * List of tx_conflict_tracker records where .breaker is the current
+	 * transaction and .victim is the transactions that must be aborted
+	 * if the current transaction is committed.
+	 */
+	struct rlist conflict_list;
+	/**
+	 * List of tx_conflict_tracker records where .victim is the current
+	 * transaction and .breaker is the transactions that, if committed,
+	 * will abort the current transaction.
+	 */
+	struct rlist conflicted_by_list;
+	/**
+	 * Link in tx_manager::read_view_txs.
+	 */
+	struct rlist in_read_view_txs;
 };
 
 static inline bool
@@ -717,6 +738,17 @@ tx_manager_init();
 void
 tx_manager_free();
 
+/**
+ * Notify TX manager that if transaction @breaker is committed then the
+ * transactions @victim must be aborted due to conflict.
+ * For example: there's two rw transaction in progress, one have read
+ * some value while the second is about to overwrite it. If the second
+ * is committed first, the first must be aborted.
+ * @return 0 on success, -1 on memory error.
+ */
+int
+txm_cause_conflict(struct txn *breaker, struct txn *victim);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (9 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-16  0:20   ` Vladislav Shpilevoy
  2020-07-16 22:25   ` Vladislav Shpilevoy
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 12/13] txm: clarify all fetched tuples Aleksandr Lyapunov
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

TXM story is a part of a history of a value in space.
It's a story about a tuple, from the point it was added to space
to the point when it was deleted from the space.
All stories are linked into a list of stories of the same key of
each index.

Part of #4897
---
 src/box/txn.c | 798 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/box/txn.h | 193 ++++++++++++++
 2 files changed, 991 insertions(+)

diff --git a/src/box/txn.c b/src/box/txn.c
index ba81b58..4c46b60 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -37,6 +37,28 @@
 #include "xrow.h"
 #include "errinj.h"
 #include "iproto_constants.h"
+#include "small/mempool.h"
+
+static uint32_t
+txm_story_key_hash(const struct tuple *a)
+{
+	uintptr_t u = (uintptr_t)a;
+	if (sizeof(uintptr_t) <= sizeof(uint32_t))
+		return u;
+	else
+		return u ^ (u >> 32);
+}
+
+#define mh_name _history
+#define mh_key_t struct tuple *
+#define mh_node_t struct txm_story *
+#define mh_arg_t int
+#define mh_hash(a, arg) (txm_story_key_hash((*(a))->tuple))
+#define mh_hash_key(a, arg) (txm_story_key_hash(a))
+#define mh_cmp(a, b, arg) ((*(a))->tuple != (*(b))->tuple)
+#define mh_cmp_key(a, b, arg) ((a) != (*(b))->tuple)
+#define MH_SOURCE
+#include "salad/mhash.h"
 
 struct tx_manager
 {
@@ -48,6 +70,14 @@ struct tx_manager
 	 * so the list is ordered by rv_psn.
 	 */
 	struct rlist read_view_txs;
+	/** Mempools for tx_story objects with difference index count. */
+	struct mempool txm_story_pool[BOX_INDEX_MAX];
+	/** Hash table tuple -> txm_story of that tuple. */
+	struct mh_history_t *history;
+	/** List of all txm_story objects. */
+	struct rlist all_stories;
+	/** Iterator that sequentially traverses all txm_story objects. */
+	struct rlist *traverse_all_stories;
 };
 
 /** That's a definition, see declaration for description. */
@@ -146,6 +176,9 @@ txn_stmt_new(struct region *region)
 	stmt->space = NULL;
 	stmt->old_tuple = NULL;
 	stmt->new_tuple = NULL;
+	stmt->add_story = NULL;
+	stmt->del_story = NULL;
+	stmt->next_in_del_list = NULL;
 	stmt->engine_savepoint = NULL;
 	stmt->row = NULL;
 	stmt->has_triggers = false;
@@ -1294,11 +1327,23 @@ void
 tx_manager_init()
 {
 	rlist_create(&txm.read_view_txs);
+	for (size_t i = 0; i < BOX_INDEX_MAX; i++) {
+		size_t item_size = sizeof(struct txm_story) +
+			i * sizeof(struct txm_story_link);
+		mempool_create(&txm.txm_story_pool[i],
+			       cord_slab_cache(), item_size);
+	}
+	txm.history = mh_history_new();
+	rlist_create(&txm.all_stories);
+	txm.traverse_all_stories = &txm.all_stories;
 }
 
 void
 tx_manager_free()
 {
+	mh_history_delete(txm.history);
+	for (size_t i = 0; i < BOX_INDEX_MAX; i++)
+		mempool_destroy(&txm.txm_story_pool[i]);
 }
 
 int
@@ -1345,3 +1390,756 @@ txm_cause_conflict(struct txn *breaker, struct txn *victim)
 	rlist_add(&victim->conflicted_by_list, &tracker->in_conflicted_by_list);
 	return 0;
 }
+
+/**
+ * Creates new story and links it with the @tuple.
+ * @return story on success, NULL on error (diag is set).
+ */
+static struct txm_story *
+txm_story_new(struct space *space, struct tuple *tuple)
+{
+	assert(!tuple->is_dirty);
+	uint32_t index_count = space->index_count;
+	assert(index_count < BOX_INDEX_MAX);
+	struct mempool *pool = &txm.txm_story_pool[index_count];
+	struct txm_story *story = (struct txm_story *)mempool_alloc(pool);
+	if (story == NULL) {
+		size_t item_size = sizeof(struct txm_story) +
+			index_count * sizeof(struct txm_story_link);
+		diag_set(OutOfMemory, item_size, "tx_manager", "tx story");
+		return story;
+	}
+	story->tuple = tuple;
+
+	const struct txm_story **put_story = (const struct txm_story **)&story;
+	struct txm_story **empty = NULL;
+	mh_int_t pos = mh_history_put(txm.history, put_story, &empty, 0);
+	if (pos == mh_end(txm.history)) {
+		mempool_free(pool, story);
+		diag_set(OutOfMemory, pos + 1,
+			 "tx_manager", "tx history hash table");
+		return NULL;
+	}
+	tuple->is_dirty = true;
+	tuple_ref(tuple);
+
+	story->index_count = index_count;
+	story->add_stmt = NULL;
+	story->add_psn = 0;
+	story->del_stmt = NULL;
+	story->del_psn = 0;
+	rlist_create(&story->reader_list);
+	rlist_add_tail(&txm.all_stories, &story->in_all_stories);
+	memset(story->link, 0, sizeof(story->link[0]) * index_count);
+	return story;
+}
+
+static void
+txm_story_delete(struct txm_story *story);
+
+/**
+ * Creates new story of a @tuple that was added by @stmt.
+ * @return story on success, NULL on error (diag is set).
+ */
+static struct txm_story *
+txm_story_new_add_stmt(struct tuple *tuple, struct txn_stmt *stmt)
+{
+	struct txm_story *res = txm_story_new(stmt->space, tuple);
+	if (res == NULL)
+		return NULL;
+	res->add_stmt = stmt;
+	assert(stmt->add_story == NULL);
+	stmt->add_story = res;
+	return res;
+}
+
+/**
+ * Creates new story of a @tuple that was deleted by @stmt.
+ * @return story on success, NULL on error (diag is set).
+ */
+static struct txm_story *
+txm_story_new_del_stmt(struct tuple *tuple, struct txn_stmt *stmt)
+{
+	struct txm_story *res = txm_story_new(stmt->space, tuple);
+	if (res == NULL)
+		return NULL;
+	res->del_stmt = stmt;
+	assert(stmt->del_story == NULL);
+	stmt->del_story = res;
+	return res;
+}
+
+/**
+ * Undo txm_story_new_add_stmt.
+ */
+static void
+txm_story_delete_add_stmt(struct txm_story *story)
+{
+	story->add_stmt->add_story = NULL;
+	story->add_stmt = NULL;
+	txm_story_delete(story);
+}
+
+/**
+ * Undo txm_story_new_del_stmt.
+ */
+static void
+txm_story_delete_del_stmt(struct txm_story *story)
+{
+	story->del_stmt->del_story = NULL;
+	story->del_stmt = NULL;
+	txm_story_delete(story);
+}
+
+
+/**
+ * Find a story of a @tuple. The story expected to be present (assert).
+ */
+static struct txm_story *
+txm_story_get(struct tuple *tuple)
+{
+	assert(tuple->is_dirty);
+
+	mh_int_t pos = mh_history_find(txm.history, tuple, 0);
+	assert(pos != mh_end(txm.history));
+	return *mh_history_node(txm.history, pos);
+}
+
+/**
+ * Get the older tuple, extracting it from older story if necessary.
+ */
+static struct tuple *
+txm_story_older_tuple(struct txm_story_link *link)
+{
+	return link->older.is_story ? link->older.story->tuple
+				    : link->older.tuple;
+}
+
+/**
+ * Link a @story with older story in @index (in both directions).
+ */
+static void
+txm_story_link_story(struct txm_story *story, struct txm_story *older_story,
+		     uint32_t index)
+{
+	assert(older_story != NULL);
+	struct txm_story_link *link = &story->link[index];
+	/* Must be unlinked. */
+	assert(!link->older.is_story);
+	assert(link->older.tuple == NULL);
+	link->older.is_story = true;
+	link->older.story = older_story;
+	older_story->link[index].newer_story = story;
+}
+
+/**
+ * Link a @story with older tuple in @index. In case if the tuple is dirty -
+ * find and link with the corresponding story.
+ */
+static void
+txm_story_link_tuple(struct txm_story *story, struct tuple *older_tuple,
+                     uint32_t index)
+{
+	struct txm_story_link *link = &story->link[index];
+	/* Must be unlinked. */
+	assert(!link->older.is_story);
+	assert(link->older.tuple == NULL);
+	if (older_tuple == NULL)
+		return;
+	if (older_tuple->is_dirty) {
+		txm_story_link_story(story, txm_story_get(older_tuple), index);
+		return;
+	}
+	link->older.tuple = older_tuple;
+	tuple_ref(link->older.tuple);
+}
+
+/**
+ * Unlink a @story with older story/tuple in @index.
+ */
+static void
+txm_story_unlink(struct txm_story *story, uint32_t index)
+{
+	struct txm_story_link *link = &story->link[index];
+	if (link->older.is_story) {
+		link->older.story->link[index].newer_story = NULL;
+	} else if (link->older.tuple != NULL) {
+		tuple_unref(link->older.tuple);
+		link->older.tuple = NULL;
+	}
+	link->older.is_story = false;
+	link->older.tuple = NULL;
+}
+
+/**
+ * Check if a @story is visible for transaction @txn. Return visible tuple to
+ * @visible_tuple (can be set to NULL).
+ * @param is_prepared_ok - whether prepared (not committed) change is acceptable.
+ * @param own_change - return true if the change was made by @txn itself.
+ * @return true if the story is visible, false otherwise.
+ */
+static bool
+txm_story_is_visible(struct txm_story *story, struct txn *txn,
+		     struct tuple **visible_tuple, bool is_prepared_ok,
+		     bool *own_change)
+{
+	*own_change = false;
+	*visible_tuple = NULL;
+	struct txn_stmt *dels = story->del_stmt;
+	while (dels != NULL) {
+		if (dels->txn == txn) {
+			/* Tuple is deleted by us (@txn). */
+			*own_change = true;
+			return true;
+		}
+		dels = dels->next_in_del_list;
+	}
+	if (is_prepared_ok && story->del_psn != 0 &&
+	    (txn->rv_psn == 0 || story->del_psn < txn->rv_psn)) {
+		/* Tuple is deleted by prepared TX. */
+		return true;
+	}
+	if (story->del_psn != 0 && story->del_stmt == NULL &&
+	    (txn->rv_psn == 0 || story->del_psn < txn->rv_psn)) {
+		/* Tuple is deleted by committed TX. */
+		return true;
+	}
+
+	if (story->add_stmt != NULL && story->add_stmt->txn == txn) {
+		/* Tuple is added by us (@txn). */
+		*visible_tuple = story->tuple;
+		*own_change = true;
+		return true;
+	}
+	if (is_prepared_ok && story->add_psn != 0 &&
+	    (txn->rv_psn == 0 || story->add_psn < txn->rv_psn)) {
+		/* Tuple is added by another prepared TX. */
+		*visible_tuple = story->tuple;
+		return true;
+	}
+	if (story->add_psn != 0 && story->add_stmt == NULL &&
+		(txn->rv_psn == 0 || story->add_psn < txn->rv_psn)) {
+		/* Tuple is added by committed TX. */
+		*visible_tuple = story->tuple;
+		return true;
+	}
+	if (story->add_psn == 0 && story->add_stmt == NULL) {
+		/* added long time ago. */
+		*visible_tuple = story->tuple;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * Temporary (allocated on region) struct that stores a conflicting TX.
+ */
+struct txn_conflict
+{
+	struct txn *other_txn;
+	struct txn_conflict *next;
+};
+
+/**
+ * Save @other_txn in list with head @coflicts_head. New list node is allocated
+ * on @region.
+ * @return 0 on success, -1 on memory error.
+ */
+static int
+txm_save_conflict(struct txn *other_txn, struct txn_conflict **coflicts_head,
+		  struct region *region)
+{
+	size_t err_size;
+	struct txn_conflict *next_conflict;
+	next_conflict = region_alloc_object(region, struct txn_conflict,
+					    &err_size);
+	if (next_conflict == NULL) {
+		diag_set(OutOfMemory, err_size, "txn_region", "txn conflict");
+		return -1;
+	}
+	next_conflict->other_txn = other_txn;
+	next_conflict->next = *coflicts_head;
+	*coflicts_head = next_conflict;
+	return 0;
+}
+
+/**
+ * Scan a history starting by @stmt statement in @index for a visible tuple
+ * (prepared suits), returned via @visible_replaced.
+ * Collect a list of transactions that will abort current transaction if they
+ * are committed.
+ *
+ * @return 0 on success, -1 on memory error.
+ */
+static int
+txm_story_find_visible(struct txn_stmt *stmt, struct txm_story *story,
+		       uint32_t index, struct tuple **visible_replaced,
+		       struct txn_conflict **collected_conflicts,
+		       struct region *region)
+{
+	while (true) {
+		if (!story->link[index].older.is_story) {
+			/*
+			 * the tuple is so old that we doesn't
+			 * know its story.
+			 */
+			*visible_replaced = story->link[index].older.tuple;
+			assert(*visible_replaced == NULL ||
+			       !(*visible_replaced)->is_dirty);
+			break;
+		}
+		story = story->link[index].older.story;
+		bool unused;
+		if (txm_story_is_visible(story, stmt->txn, visible_replaced,
+					 true, &unused))
+			break;
+
+		/*
+		 * We skip the story but once the story is committed
+		 * before out TX that may cause conflict.
+		 * The conflict will be unavoidable if this statement
+		 * relies on old_tuple. If not (it's a replace),
+		 * the conflict will take place only for secondary
+		 * index if the story will not be overwritten in primary
+		 * index.
+		 */
+		bool cross_conflict = false;
+		if (stmt->does_require_old_tuple) {
+			cross_conflict = true;
+		} else if (index != 0) {
+			struct txm_story *look_up = story;
+			cross_conflict = true;
+			while (look_up->link[index].newer_story != NULL) {
+				struct txm_story *over;
+				over = look_up->link[index].newer_story;
+				if (over->add_stmt->txn == stmt->txn) {
+					cross_conflict = false;
+					break;
+				}
+				look_up = over;
+			}
+		}
+		if (cross_conflict) {
+			if (txm_save_conflict(story->add_stmt->txn,
+					      collected_conflicts,
+					      region) != 0)
+				return -1;
+
+		}
+	}
+	return 0;
+}
+
+int
+txm_history_add_stmt(struct txn_stmt *stmt, struct tuple *old_tuple,
+		     struct tuple *new_tuple, enum dup_replace_mode mode,
+		     struct tuple **result)
+{
+	assert(new_tuple != NULL || old_tuple != NULL);
+	struct space *space = stmt->space;
+	struct txm_story *add_story = NULL;
+	uint32_t add_story_linked = 0;
+	struct txm_story *del_story = NULL;
+	bool del_story_created = false;
+	struct region *region = &stmt->txn->region;
+	size_t region_svp = region_used(region);
+
+	/*
+	 * List of transactions that will conflict us once one of them
+	 * become committed.
+	 */
+	struct txn_conflict *collected_conflicts = NULL;
+
+	/* Create add_story if necessary. */
+	if (new_tuple != NULL) {
+		add_story = txm_story_new_add_stmt(new_tuple, stmt);
+		if (add_story == NULL)
+			goto fail;
+
+		for (uint32_t i = 0; i < space->index_count; i++) {
+			struct tuple *replaced;
+			struct index *index = space->index[i];
+			if (index_replace(index, NULL, new_tuple,
+					  DUP_REPLACE_OR_INSERT,
+					  &replaced) != 0)
+				goto fail;
+			txm_story_link_tuple(add_story, replaced, i);
+			add_story_linked++;
+
+			struct tuple *visible_replaced = NULL;
+			if (txm_story_find_visible(stmt, add_story, i,
+						   &visible_replaced,
+						   &collected_conflicts,
+						   region) != 0)
+				goto fail;
+
+			uint32_t errcode;
+			errcode = replace_check_dup(old_tuple, visible_replaced,
+						    i == 0 ? mode : DUP_INSERT);
+			if (errcode != 0) {
+				struct space *sp = stmt->space;
+				if (sp != NULL)
+					diag_set(ClientError, errcode,
+						 sp->index[i]->def->name,
+						 space_name(sp));
+				goto fail;
+			}
+
+			if (i == 0) {
+				old_tuple = visible_replaced;
+			}
+		}
+	}
+
+	/* Create del_story if necessary. */
+	struct tuple *del_tuple = NULL;
+	if (new_tuple != NULL) {
+		struct txm_story_link *link = &add_story->link[0];
+		if (link->older.is_story) {
+			del_story = link->older.story;
+			del_tuple = del_story->tuple;
+		} else {
+			del_tuple = link->older.tuple;
+		}
+	} else {
+		del_tuple = old_tuple;
+	}
+	if (del_tuple != NULL && del_story == NULL) {
+		if (del_tuple->is_dirty) {
+			del_story = txm_story_get(del_tuple);
+		} else {
+			del_story = txm_story_new_del_stmt(del_tuple, stmt);
+			if (del_story == NULL)
+				goto fail;
+			del_story_created = true;
+		}
+	}
+	if (new_tuple != NULL && del_story_created) {
+		for (uint32_t i = 0; i < add_story->index_count; i++) {
+			struct txm_story_link *link = &add_story->link[i];
+			if (link->older.is_story)
+				continue;
+			if (link->older.tuple == del_tuple) {
+				txm_story_unlink(add_story, i);
+				txm_story_link_story(add_story, del_story, i);
+			}
+		}
+	}
+	if (del_story != NULL && !del_story_created) {
+		stmt->next_in_del_list = del_story->del_stmt;
+		del_story->del_stmt = stmt;
+		stmt->del_story = del_story;
+	}
+
+	/* Purge found conflicts. */
+	while (collected_conflicts != NULL) {
+		if (txm_cause_conflict(collected_conflicts->other_txn,
+				       stmt->txn) != 0)
+			goto fail;
+		collected_conflicts = collected_conflicts->next;
+	}
+	region_truncate(region, region_svp);
+
+	/*
+	 * We now reference both new and old tuple because the stmt holds
+	 * pointers to them.
+	 */
+	if (stmt->new_tuple != NULL)
+		tuple_ref(stmt->new_tuple);
+	*result = old_tuple;
+	if (*result != NULL)
+		tuple_ref(*result);
+	return 0;
+
+fail:
+	if (add_story != NULL) {
+		while (add_story_linked > 0) {
+			--add_story_linked;
+			uint32_t i = add_story_linked;
+
+			struct index *index = space->index[i];
+			struct txm_story_link *link = &add_story->link[i];
+			struct tuple *was = txm_story_older_tuple(link);
+			struct tuple *unused;
+			if (index_replace(index, new_tuple, was,
+					  DUP_INSERT,
+					  &unused) != 0) {
+				diag_log();
+				unreachable();
+				panic("failed to rollback change");
+			}
+
+			txm_story_unlink(stmt->add_story, i);
+
+		}
+		txm_story_delete_add_stmt(stmt->add_story);
+	}
+
+	if (del_story != NULL && del_story->del_stmt == stmt) {
+		del_story->del_stmt = stmt->next_in_del_list;
+		stmt->next_in_del_list = NULL;
+	}
+
+	if (del_story_created)
+		txm_story_delete_del_stmt(stmt->del_story);
+	else
+		stmt->del_story = NULL;
+
+	region_truncate(region, region_svp);
+	return -1;
+}
+
+void
+txm_history_rollback_stmt(struct txn_stmt *stmt)
+{
+	if (stmt->add_story != NULL) {
+		assert(stmt->add_story->tuple == stmt->new_tuple);
+		struct txm_story *story = stmt->add_story;
+
+		for (uint32_t i = 0; i < story->index_count; i++) {
+			struct txm_story_link *link = &story->link[i];
+			if (link->newer_story == NULL) {
+				struct tuple *unused;
+				struct index *index = stmt->space->index[i];
+				struct tuple *was = txm_story_older_tuple(link);
+				if (index_replace(index, story->tuple, was,
+						  DUP_INSERT, &unused) != 0) {
+					diag_log();
+					unreachable();
+					panic("failed to rollback change");
+				}
+			} else {
+				struct txm_story *newer = link->newer_story;
+				assert(newer->link[i].older.is_story);
+				assert(newer->link[i].older.story == story);
+				txm_story_unlink(newer, i);
+				if (link->older.is_story) {
+					struct txm_story *to = link->older.story;
+					txm_story_link_story(newer,to, i);
+				} else {
+					struct tuple *to = link->older.tuple;
+					txm_story_link_tuple(newer, to, i);
+				}
+			}
+			txm_story_unlink(story, i);
+		}
+		stmt->add_story->add_stmt = NULL;
+		stmt->add_story = NULL;
+		tuple_unref(stmt->new_tuple);
+	}
+
+	if (stmt->del_story != NULL) {
+		struct txm_story *story = stmt->del_story;
+
+		struct txn_stmt **prev = &story->del_stmt;
+		while (*prev != stmt) {
+			prev = &(*prev)->next_in_del_list;
+			assert(*prev != NULL);
+		}
+		*prev = stmt->next_in_del_list;
+		stmt->next_in_del_list = NULL;
+
+		stmt->del_story->del_stmt = NULL;
+		stmt->del_story = NULL;
+	}
+}
+
+void
+txm_history_prepare_stmt(struct txn_stmt *stmt)
+{
+	assert(stmt->txn->psn != 0);
+
+	/* Move story to the past to prepared stories. */
+
+	struct txm_story *story = stmt->add_story;
+	uint32_t index_count = story == NULL ? 0 : story->index_count;
+	/*
+	 * Note that if stmt->add_story == NULL, the index_count is set to 0,
+	 * and we will not enter the loop.
+	 */
+	for (uint32_t i = 0; i < index_count; ) {
+		if (!story->link[i].older.is_story) {
+			/* tuple is old. */
+			i++;
+			continue;
+		}
+		struct txm_story *old_story =
+			story->link[i].older.story;
+		if (old_story->del_psn != 0) {
+			/* is psn is set, the change is prepared. */
+			i++;
+			continue;
+		}
+		if (old_story->add_psn != 0) {
+			/* is psn is set, the change is prepared. */
+			i++;
+			continue;
+		}
+
+		if (old_story->add_stmt != NULL) {
+			/* ancient. */
+			i++;
+			continue;
+		}
+		if (old_story->add_stmt->txn == stmt->txn) {
+			/* added by us. */
+			i++;
+			continue;
+		}
+
+		if (old_story->add_stmt->does_require_old_tuple || i != 0)
+			old_story->add_stmt->txn->status = TXN_CONFLICTED;
+
+		/* Swap story and old story. */
+		struct txm_story_link *link = &story->link[i];
+		if (link->newer_story == NULL) {
+			/* we have to replace the tuple in index. */
+			struct tuple *unused;
+			struct index *index = stmt->space->index[i];
+			if (index_replace(index, story->tuple, old_story->tuple,
+					  DUP_INSERT, &unused) != 0) {
+				diag_log();
+				panic("failed to rollback change");
+			}
+		} else {
+			struct txm_story *newer = link->newer_story;
+			assert(newer->link[i].older.is_story);
+			assert(newer->link[i].older.story == story);
+			txm_story_unlink(newer, i);
+			txm_story_link_story(newer, old_story, i);
+		}
+
+		txm_story_unlink(story, i);
+		if (old_story->link[i].older.is_story) {
+			struct txm_story *to =
+				old_story->link[i].older.story;
+			txm_story_unlink(old_story, i);
+			txm_story_link_story(story, to, i);
+		} else {
+			struct tuple *to =
+				old_story->link[i].older.tuple;
+			txm_story_unlink(old_story, i);
+			txm_story_link_tuple(story, to, i);
+		}
+
+		txm_story_link_story(old_story, story, i);
+
+		if (i == 0) {
+			assert(stmt->del_story == old_story);
+			assert(story->link[0].older.is_story ||
+			       story->link[0].older.tuple == NULL);
+
+			struct txn_stmt *dels = old_story->del_stmt;
+			assert(dels != NULL);
+			do {
+				if (dels->txn != stmt->txn)
+					dels->txn->status = TXN_CONFLICTED;
+				dels->del_story = NULL;
+				struct txn_stmt *next = dels->next_in_del_list;
+				dels->next_in_del_list = NULL;
+				dels = next;
+			} while (dels != NULL);
+			old_story->del_stmt = NULL;
+
+			if (story->link[0].older.is_story) {
+				struct txm_story *oldest_story =
+					story->link[0].older.story;
+				dels = oldest_story->del_stmt;
+				while (dels != NULL) {
+					assert(dels->txn != stmt->txn);
+					dels->del_story = NULL;
+					struct txn_stmt *next =
+						dels->next_in_del_list;
+					dels->next_in_del_list = NULL;
+					dels = next;
+				}
+				oldest_story->del_stmt = stmt;
+				stmt->del_story = oldest_story;
+			}
+		}
+	}
+	if (stmt->add_story != NULL)
+		stmt->add_story->add_psn = stmt->txn->psn;
+
+	if (stmt->del_story != NULL)
+		stmt->del_story->del_psn = stmt->txn->psn;
+}
+
+ssize_t
+txm_history_commit_stmt(struct txn_stmt *stmt)
+{
+	size_t res = 0;
+	if (stmt->add_story != NULL) {
+		assert(stmt->add_story->add_stmt == stmt);
+		res += stmt->add_story->tuple->bsize;
+		stmt->add_story->add_stmt = NULL;
+		stmt->add_story = NULL;
+	}
+	if (stmt->del_story != NULL) {
+		assert(stmt->del_story->del_stmt == stmt);
+		assert(stmt->next_in_del_list == NULL);
+		res -= stmt->del_story->tuple->bsize;
+		tuple_unref(stmt->del_story->tuple);
+		stmt->del_story->del_stmt = NULL;
+		stmt->del_story = NULL;
+	}
+	return res;
+}
+
+struct tuple *
+txm_tuple_clarify_slow(struct txn *txn, struct space *space,
+		       struct tuple *tuple, uint32_t index,
+		       uint32_t mk_index, bool is_prepared_ok)
+{
+	(void)space;
+	assert(tuple->is_dirty);
+	struct txm_story *story = txm_story_get(tuple);
+	bool own_change = false;
+	struct tuple *result = NULL;
+
+	while (true) {
+		if (txm_story_is_visible(story, txn, &result,
+					 is_prepared_ok, &own_change)) {
+			break;
+		}
+		if (story->link[index].older.is_story) {
+			story = story->link[index].older.story;
+		} else {
+			result = story->link[index].older.tuple;
+			break;
+		}
+	}
+	(void)own_change; /* TODO: add conflict */
+	(void)mk_index; /* TODO: multiindex */
+	return result;
+}
+
+static void
+txm_story_delete(struct txm_story *story)
+{
+	assert(story->add_stmt == NULL);
+	assert(story->del_stmt == NULL);
+
+	if (txm.traverse_all_stories == &story->in_all_stories)
+		txm.traverse_all_stories = rlist_next(txm.traverse_all_stories);
+	rlist_del(&story->in_all_stories);
+
+	mh_int_t pos = mh_history_find(txm.history, story->tuple, 0);
+	assert(pos != mh_end(txm.history));
+	mh_history_del(txm.history, pos, 0);
+
+	story->tuple->is_dirty = false;
+	tuple_unref(story->tuple);
+
+#ifndef NDEBUG
+	/* Expecting to delete fully unlinked story. */
+	for (uint32_t i = 0; i < story->index_count; i++) {
+		assert(story->link[i].newer_story == NULL);
+		assert(story->link[i].older.is_story == false);
+		assert(story->link[i].older.tuple == NULL);
+	}
+#endif
+
+	struct mempool *pool = &txm.txm_story_pool[story->index_count];
+	mempool_free(pool, story);
+}
diff --git a/src/box/txn.h b/src/box/txn.h
index 03ccb76..e294497 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -36,6 +36,7 @@
 #include "trigger.h"
 #include "fiber.h"
 #include "space.h"
+#include "tuple.h"
 
 #if defined(__cplusplus)
 extern "C" {
@@ -175,6 +176,27 @@ struct txn_stmt {
 	struct space *space;
 	struct tuple *old_tuple;
 	struct tuple *new_tuple;
+	/**
+	 * If new_tuple != NULL and this transaction was not prepared,
+	 * this member holds added story of the new_tuple.
+	 */
+	struct txm_story *add_story;
+	/**
+	 * If new_tuple == NULL and this transaction was not prepared,
+	 * this member holds added story of the old_tuple.
+	 */
+	struct txm_story *del_story;
+	/**
+	 * Link in txm_story::del_stmt linked list.
+	 * Only one prepared TX can delete a tuple and a story. But
+	 * when there are several in-progress transactions and they delete
+	 * the same tuple we have to store several delete statements in one
+	 * story. It's implemented in that way: story has a pointer to the first
+	 * deleting statement, that statement has a pointer to the next etc,
+	 * with NULL in the end.
+	 * That member is that the pointer to next deleting statement.
+	 */
+	struct txn_stmt *next_in_del_list;
 	/** Engine savepoint for the start of this statement. */
 	void *engine_savepoint;
 	/** Redo info: the binary log row */
@@ -359,6 +381,88 @@ struct txn {
 	struct rlist in_read_view_txs;
 };
 
+/**
+ * Pointer to tuple or story.
+ */
+struct txm_story_or_tuple {
+	/** Flag whether it's a story. */
+	bool is_story;
+	union {
+		/** Pointer to story, it must be reverse liked. */
+		struct txm_story *story;
+		/** Smart pointer to tuple: the tuple is referenced if set. */
+		struct tuple *tuple;
+	};
+};
+
+/**
+ * Link that connects a txm_story with older and newer stories of the same
+ * key in index.
+ */
+struct txm_story_link {
+	/** Story that was happened after that story was ended. */
+	struct txm_story *newer_story;
+	/**
+	 * Older story or ancient tuple (so old that its story was lost).
+	 * In case of tuple is can also be NULL.
+	 */
+	struct txm_story_or_tuple older;
+};
+
+/**
+ * A part of a history of a value in space.
+ * It's a story about a tuple, from the point it was added to space to the
+ * point when it was deleted from a space.
+ * All stories are linked into a list of stories of the same key of each index.
+ */
+struct txm_story {
+	/** The story is about this tuple. The tuple is referenced. */
+
+	struct tuple *tuple;
+	/**
+	 * Statement that told this story. Is set to NULL when the statement's
+	 * transaction becomes committed. Can also be NULL if we don't know who
+	 * introduced that story, the tuple was added by a transaction that
+	 * was completed and destroyed some time ago.
+	 */
+	struct txn_stmt *add_stmt;
+	/**
+	 * Prepare sequence number of add_stmt's transaction. Is set when
+	 * the transactions is prepared. Can be 0 if the transaction is
+	 * in progress or we don't know who introduced that story.
+	 */
+	int64_t add_psn;
+	/**
+	 * Statement that ended this story. Is set to NULL when the statement's
+	 * transaction becomes committed. Can also be NULL if the tuple has not
+	 * been deleted yet.
+	 */
+	struct txn_stmt *del_stmt;
+	/**
+	 * Prepare sequence number of del_stmt's transaction. Is set when
+	 * the transactions is prepared. Can be 0 if the transaction is
+	 * in progress or if nobody has deleted the tuple.
+	 */
+	int64_t del_psn;
+	/**
+	 * List of trackers - transactions that has read this tuple.
+	 */
+	struct rlist reader_list;
+	/**
+	 * Link in tx_manager::all_stories
+	 */
+	struct rlist in_all_stories;
+	/**
+	 * Number of indexes in this space - and the count of link[].
+	 */
+	uint32_t index_count;
+	/**
+	 * Link with older and newer stories (and just tuples) for each
+	 * index respectively.
+	 */
+	struct txm_story_link link[];
+};
+
 static inline bool
 txn_has_flag(struct txn *txn, enum txn_flag flag)
 {
@@ -749,6 +853,95 @@ tx_manager_free();
 int
 txm_cause_conflict(struct txn *breaker, struct txn *victim);
 
+/**
+ * @brief Add a statement to transaction manager's history.
+ * Until unlinking or releasing the space could internally contain
+ * wrong tuples and must be cleaned through txm_tuple_clarify call.
+ * With that clarifying the statement will be visible to current transaction,
+ * but invisible to all others.
+ * Follows signature of @sa memtx_space_replace_all_keys .
+ *
+ * @param stmt current statement.
+ * @param old_tuple the tuple that should be removed (can be NULL).
+ * @param new_tuple the tuple that should be inserted (can be NULL).
+ * @param mode      dup_replace_mode, used only if new_tuple is not
+ *                  NULL and old_tuple is NULL, and only for the
+ *                  primary key.
+ * @param result - old or replaced tuple.
+ * @return 0 on success, -1 on error (diag is set).
+ */
+int
+txm_history_add_stmt(struct txn_stmt *stmt, struct tuple *old_tuple,
+		     struct tuple *new_tuple, enum dup_replace_mode mode,
+		     struct tuple **result);
+
+/**
+ * @brief Rollback (undo) a statement from transaction manager's history.
+ * It's just make the statement invisible to all.
+ * Prepared statements could be also removed, but for consistency all latter
+ * prepared statement must be also rolled back.
+ *
+ * @param stmt current statement.
+ */
+void
+txm_history_rollback_stmt(struct txn_stmt *stmt);
+
+/**
+ * @brief Prepare statement in history for further commit.
+ * Prepared statements are still invisible for read-only transactions
+ * but are visible to all read-write transactions.
+ * Prepared and in-progress transactions use the same links for creating
+ * chains of stories in history. The difference is that the order of
+ * prepared transactions is fixed while in-progress transactions are
+ * added to the end of list in any order. Thus to switch to prepared
+ * we have to reorder story in such a way that current story will be
+ * between earlier prepared stories and in-progress stories. That's what
+ * this function does.
+ *
+ * @param stmt current statement.
+ */
+void
+txm_history_prepare_stmt(struct txn_stmt *stmt);
+
+/**
+ * @brief Commit statement in history.
+ * Make the statement's changes permanent. It becomes visible to all.
+ *
+ * @param stmt current statement.
+ * @return the change in space bsize.
+ */
+ssize_t
+txm_history_commit_stmt(struct txn_stmt *stmt);
+
+/** Helper of txm_tuple_clarify */
+struct tuple *
+txm_tuple_clarify_slow(struct txn *txn, struct space *space,
+		       struct tuple *tuple, uint32_t index,
+		       uint32_t mk_index, bool is_prepared_ok);
+
+/**
+ * Cleans a tuple if it's dirty - finds a visible tuple in history.
+ * @param txn - current transactions.
+ * @param space - space in which the tuple was found.
+ * @param tuple - tuple to clean.
+ * @param index - index number.
+ * @param mk_index - multikey index (iа the index is multikey).
+ * @param is_prepared_ok - allow to return prepared tuples.
+ * @return clean tuple (can be NULL).
+ */
+static inline struct tuple *
+txm_tuple_clarify(struct txn *txn, struct space *space,
+		  struct tuple *tuple, uint32_t index,
+		  uint32_t mk_index, bool is_prepared_ok)
+{
+	if (!tx_manager_use_mvcc_engine)
+		return tuple;
+	if (!tuple->is_dirty)
+		return tuple;
+	return txm_tuple_clarify_slow(txn, space, tuple, index, mk_index,
+				      is_prepared_ok);
+}
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 12/13] txm: clarify all fetched tuples
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (10 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx Aleksandr Lyapunov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

If a tuple fetched from an index is dirty - it must be clarified.
Let's fix all fetched from indexeds in that way.
Also fix a snapshot iterator - it must save a part of history
along with creating a read view in order to clean tuple during
iteration from another thread.

Part of #4897
---
 src/box/memtx_bitset.c |  30 ++++---
 src/box/memtx_hash.c   |  79 ++++++++++++++---
 src/box/memtx_rtree.c  |  30 ++++++-
 src/box/memtx_tree.c   | 119 +++++++++++++++++++++----
 src/box/space.c        |   2 +
 src/box/space.h        |   4 +
 src/box/txn.c          | 230 +++++++++++++++++++++++++++++++++++++++++++++----
 src/box/txn.h          |  64 +++++++++++++-
 8 files changed, 499 insertions(+), 59 deletions(-)

diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c
index 67eaf6f..68845c5 100644
--- a/src/box/memtx_bitset.c
+++ b/src/box/memtx_bitset.c
@@ -39,7 +39,9 @@
 #include "bitset/index.h"
 #include "fiber.h"
 #include "index.h"
+#include "schema.h"
 #include "tuple.h"
+#include "txn.h"
 #include "memtx_engine.h"
 
 struct memtx_bitset_index {
@@ -198,19 +200,27 @@ bitset_index_iterator_next(struct iterator *iterator, struct tuple **ret)
 	assert(iterator->free == bitset_index_iterator_free);
 	struct bitset_index_iterator *it = bitset_index_iterator(iterator);
 
-	size_t value = tt_bitset_iterator_next(&it->bitset_it);
-	if (value == SIZE_MAX) {
-		*ret = NULL;
-		return 0;
-	}
-
+	do {
+		size_t value = tt_bitset_iterator_next(&it->bitset_it);
+		if (value == SIZE_MAX) {
+			*ret = NULL;
+			return 0;
+		}
 #ifndef OLD_GOOD_BITSET
-	struct memtx_bitset_index *index =
-		(struct memtx_bitset_index *)iterator->index;
-	*ret = memtx_bitset_index_value_to_tuple(index, value);
+		struct memtx_bitset_index *index =
+			(struct memtx_bitset_index *)iterator->index;
+		struct tuple *tuple =
+			memtx_bitset_index_value_to_tuple(index, value);
 #else /* #ifndef OLD_GOOD_BITSET */
-	*ret = value_to_tuple(value);
+		struct tuple *tuple = value_to_tuple(value);
 #endif /* #ifndef OLD_GOOD_BITSET */
+		uint32_t iid = iterator->index->def->iid;
+		struct txn *txn = in_txn();
+		struct space *space = space_by_id(iterator->space_id);
+		bool is_rw = txn != NULL;
+		*ret = txm_tuple_clarify(txn, space, tuple, iid, 0, is_rw);
+	} while (*ret == NULL);
+
 	return 0;
 }
 
diff --git a/src/box/memtx_hash.c b/src/box/memtx_hash.c
index cdd531c..6e3bc18 100644
--- a/src/box/memtx_hash.c
+++ b/src/box/memtx_hash.c
@@ -33,9 +33,10 @@
 #include "fiber.h"
 #include "index.h"
 #include "tuple.h"
+#include "txn.h"
 #include "memtx_engine.h"
 #include "space.h"
-#include "schema.h" /* space_cache_find() */
+#include "schema.h" /* space_by_id(), space_cache_find() */
 #include "errinj.h"
 
 #include <small/mempool.h>
@@ -101,7 +102,7 @@ hash_iterator_free(struct iterator *iterator)
 }
 
 static int
-hash_iterator_ge(struct iterator *ptr, struct tuple **ret)
+hash_iterator_ge_base(struct iterator *ptr, struct tuple **ret)
 {
 	assert(ptr->free == hash_iterator_free);
 	struct hash_iterator *it = (struct hash_iterator *) ptr;
@@ -113,10 +114,10 @@ hash_iterator_ge(struct iterator *ptr, struct tuple **ret)
 }
 
 static int
-hash_iterator_gt(struct iterator *ptr, struct tuple **ret)
+hash_iterator_gt_base(struct iterator *ptr, struct tuple **ret)
 {
 	assert(ptr->free == hash_iterator_free);
-	ptr->next = hash_iterator_ge;
+	ptr->next = hash_iterator_ge_base;
 	struct hash_iterator *it = (struct hash_iterator *) ptr;
 	struct memtx_hash_index *index = (struct memtx_hash_index *)ptr->index;
 	struct tuple **res = light_index_iterator_get_and_next(&index->hash_table,
@@ -128,6 +129,32 @@ hash_iterator_gt(struct iterator *ptr, struct tuple **ret)
 	return 0;
 }
 
+#define WRAP_ITERATOR_METHOD(name)						\
+static int									\
+name(struct iterator *iterator, struct tuple **ret)				\
+{										\
+	struct txn *txn = in_txn();						\
+	struct space *space = space_by_id(iterator->space_id);			\
+	bool is_rw = txn != NULL;						\
+	uint32_t iid = iterator->index->def->iid;				\
+	bool is_first = true;							\
+	do {									\
+		int rc = is_first ? name##_base(iterator, ret)			\
+				  : hash_iterator_ge_base(iterator, ret);	\
+		if (rc != 0 || *ret == NULL)					\
+			return rc;						\
+		is_first = false;						\
+		*ret = txm_tuple_clarify(txn, space, *ret, iid, 0, is_rw);	\
+	} while (*ret == NULL);							\
+	return 0;								\
+}										\
+struct forgot_to_add_semicolon
+
+WRAP_ITERATOR_METHOD(hash_iterator_ge);
+WRAP_ITERATOR_METHOD(hash_iterator_gt);
+
+#undef WRAP_ITERATOR_METHOD
+
 static int
 hash_iterator_eq_next(MAYBE_UNUSED struct iterator *it, struct tuple **ret)
 {
@@ -139,7 +166,14 @@ static int
 hash_iterator_eq(struct iterator *it, struct tuple **ret)
 {
 	it->next = hash_iterator_eq_next;
-	return hash_iterator_ge(it, ret);
+	hash_iterator_ge_base(it, ret); /* always returns zero. */
+	if (*ret == NULL)
+		return 0;
+	struct txn *txn = in_txn();
+	struct space *sp = space_by_id(it->space_id);
+	bool is_rw = txn != NULL;
+	*ret = txm_tuple_clarify(txn, sp, *ret, it->index->def->iid, 0, is_rw);
+	return 0;
 }
 
 /* }}} */
@@ -279,11 +313,17 @@ memtx_hash_index_get(struct index *base, const char *key,
 	       part_count == base->def->key_def->part_count);
 	(void) part_count;
 
+	struct space *space = space_by_id(base->def->space_id);
 	*result = NULL;
 	uint32_t h = key_hash(key, base->def->key_def);
 	uint32_t k = light_index_find_key(&index->hash_table, h, key);
-	if (k != light_index_end)
-		*result = light_index_get(&index->hash_table, k);
+	if (k != light_index_end) {
+		struct tuple *tuple = light_index_get(&index->hash_table, k);
+		uint32_t iid = base->def->iid;
+		struct txn *txn = in_txn();
+		bool is_rw = txn != NULL;
+		*result = txm_tuple_clarify(txn, space, tuple, iid, 0, is_rw);
+	}
 	return 0;
 }
 
@@ -401,6 +441,7 @@ struct hash_snapshot_iterator {
 	struct snapshot_iterator base;
 	struct memtx_hash_index *index;
 	struct light_index_iterator iterator;
+	struct txm_snapshot_cleaner cleaner;
 };
 
 /**
@@ -418,6 +459,7 @@ hash_snapshot_iterator_free(struct snapshot_iterator *iterator)
 				      it->index->base.engine);
 	light_index_iterator_destroy(&it->index->hash_table, &it->iterator);
 	index_unref(&it->index->base);
+	txm_snapshot_cleaner_destroy(&it->cleaner);
 	free(iterator);
 }
 
@@ -434,13 +476,24 @@ hash_snapshot_iterator_next(struct snapshot_iterator *iterator,
 	struct hash_snapshot_iterator *it =
 		(struct hash_snapshot_iterator *) iterator;
 	struct light_index_core *hash_table = &it->index->hash_table;
-	struct tuple **res = light_index_iterator_get_and_next(hash_table,
-							       &it->iterator);
-	if (res == NULL) {
-		*data = NULL;
-		return 0;
+
+	while (true) {
+		struct tuple **res =
+			light_index_iterator_get_and_next(hash_table,
+			                                  &it->iterator);
+		if (res == NULL) {
+			*data = NULL;
+			return 0;
+		}
+
+		struct tuple *tuple = *res;
+		tuple = txm_snapshot_clarify(&it->cleaner, tuple);
+
+		if (tuple != NULL) {
+			*data = tuple_data_range(*res, size);
+			return 0;
+		}
 	}
-	*data = tuple_data_range(*res, size);
 	return 0;
 }
 
diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
index 612fcb2..c7f1655 100644
--- a/src/box/memtx_rtree.c
+++ b/src/box/memtx_rtree.c
@@ -40,7 +40,9 @@
 #include "trivia/util.h"
 
 #include "tuple.h"
+#include "txn.h"
 #include "space.h"
+#include "schema.h"
 #include "memtx_engine.h"
 
 struct memtx_rtree_index {
@@ -148,7 +150,16 @@ static int
 index_rtree_iterator_next(struct iterator *i, struct tuple **ret)
 {
 	struct index_rtree_iterator *itr = (struct index_rtree_iterator *)i;
-	*ret = (struct tuple *)rtree_iterator_next(&itr->impl);
+	do {
+		*ret = (struct tuple *) rtree_iterator_next(&itr->impl);
+		if (*ret == NULL)
+			break;
+		uint32_t iid = i->index->def->iid;
+		struct txn *txn = in_txn();
+		struct space *space = space_by_id(i->space_id);
+		bool is_rw = txn != NULL;
+		*ret = txm_tuple_clarify(txn, space, *ret, iid, 0, is_rw);
+	} while (*ret == NULL);
 	return 0;
 }
 
@@ -213,8 +224,21 @@ memtx_rtree_index_get(struct index *base, const char *key,
 		unreachable();
 
 	*result = NULL;
-	if (rtree_search(&index->tree, &rect, SOP_OVERLAPS, &iterator))
-		*result = (struct tuple *)rtree_iterator_next(&iterator);
+	if (!rtree_search(&index->tree, &rect, SOP_OVERLAPS, &iterator)) {
+		rtree_iterator_destroy(&iterator);
+		return 0;
+	}
+	do {
+		struct tuple *tuple = (struct tuple *)
+			rtree_iterator_next(&iterator);
+		if (tuple == NULL)
+			break;
+		uint32_t iid = base->def->iid;
+		struct txn *txn = in_txn();
+		struct space *space = space_by_id(base->def->space_id);
+		bool is_rw = txn != NULL;
+		*result = txm_tuple_clarify(txn, space, tuple, iid, 0, is_rw);
+	} while (*result == NULL);
 	rtree_iterator_destroy(&iterator);
 	return 0;
 }
diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c
index 76ff3fc..fe93467 100644
--- a/src/box/memtx_tree.c
+++ b/src/box/memtx_tree.c
@@ -31,12 +31,13 @@
 #include "memtx_tree.h"
 #include "memtx_engine.h"
 #include "space.h"
-#include "schema.h" /* space_cache_find() */
+#include "schema.h" /* space_by_id(), space_cache_find() */
 #include "errinj.h"
 #include "memory.h"
 #include "fiber.h"
 #include "key_list.h"
 #include "tuple.h"
+#include "txn.h"
 #include <third_party/qsort_arg.h>
 #include <small/mempool.h>
 
@@ -175,7 +176,7 @@ tree_iterator_dummie(struct iterator *iterator, struct tuple **ret)
 }
 
 static int
-tree_iterator_next(struct iterator *iterator, struct tuple **ret)
+tree_iterator_next_base(struct iterator *iterator, struct tuple **ret)
 {
 	struct memtx_tree_index *index =
 		(struct memtx_tree_index *)iterator->index;
@@ -205,7 +206,7 @@ tree_iterator_next(struct iterator *iterator, struct tuple **ret)
 }
 
 static int
-tree_iterator_prev(struct iterator *iterator, struct tuple **ret)
+tree_iterator_prev_base(struct iterator *iterator, struct tuple **ret)
 {
 	struct memtx_tree_index *index =
 		(struct memtx_tree_index *)iterator->index;
@@ -234,7 +235,7 @@ tree_iterator_prev(struct iterator *iterator, struct tuple **ret)
 }
 
 static int
-tree_iterator_next_equal(struct iterator *iterator, struct tuple **ret)
+tree_iterator_next_equal_base(struct iterator *iterator, struct tuple **ret)
 {
 	struct memtx_tree_index *index =
 		(struct memtx_tree_index *)iterator->index;
@@ -270,7 +271,7 @@ tree_iterator_next_equal(struct iterator *iterator, struct tuple **ret)
 }
 
 static int
-tree_iterator_prev_equal(struct iterator *iterator, struct tuple **ret)
+tree_iterator_prev_equal_base(struct iterator *iterator, struct tuple **ret)
 {
 	struct memtx_tree_index *index =
 		(struct memtx_tree_index *)iterator->index;
@@ -304,6 +305,47 @@ tree_iterator_prev_equal(struct iterator *iterator, struct tuple **ret)
 	return 0;
 }
 
+#define WRAP_ITERATOR_METHOD(name)						\
+static int									\
+name(struct iterator *iterator, struct tuple **ret)				\
+{										\
+	struct memtx_tree *tree =						\
+		&((struct memtx_tree_index *)iterator->index)->tree;		\
+	struct tree_iterator *it = tree_iterator(iterator);			\
+	struct memtx_tree_iterator *ti = &it->tree_iterator;			\
+	uint32_t iid = iterator->index->def->iid;				\
+	bool is_multikey = iterator->index->def->key_def->is_multikey;		\
+	struct txn *txn = in_txn();						\
+	struct space *space = space_by_id(iterator->space_id);			\
+	bool is_rw = txn != NULL;						\
+	do {									\
+		int rc = name##_base(iterator, ret);				\
+		if (rc != 0 || *ret == NULL)					\
+			return rc;						\
+		uint32_t mk_index = 0;						\
+		if (is_multikey) {						\
+			struct memtx_tree_data *check =				\
+				memtx_tree_iterator_get_elem(tree, ti);		\
+			assert(check != NULL);					\
+			mk_index = check->hint;					\
+		}								\
+		*ret = txm_tuple_clarify(txn, space, *ret,			\
+					 iid, mk_index, is_rw);			\
+	} while (*ret == NULL);							\
+	tuple_unref(it->current.tuple);						\
+	it->current.tuple = *ret;						\
+	tuple_ref(it->current.tuple);						\
+	return 0;								\
+}										\
+struct forgot_to_add_semicolon
+
+WRAP_ITERATOR_METHOD(tree_iterator_next);
+WRAP_ITERATOR_METHOD(tree_iterator_prev);
+WRAP_ITERATOR_METHOD(tree_iterator_next_equal);
+WRAP_ITERATOR_METHOD(tree_iterator_prev_equal);
+
+#undef WRAP_ITERATOR_METHOD
+
 static void
 tree_iterator_set_next_method(struct tree_iterator *it)
 {
@@ -388,6 +430,22 @@ tree_iterator_start(struct iterator *iterator, struct tuple **ret)
 	tuple_ref(*ret);
 	it->current = *res;
 	tree_iterator_set_next_method(it);
+
+	uint32_t iid = iterator->index->def->iid;
+	bool is_multikey = iterator->index->def->key_def->is_multikey;
+	struct txn *txn = in_txn();
+	struct space *space = space_by_id(iterator->space_id);
+	bool is_rw = txn != NULL;
+	uint32_t mk_index = is_multikey ? res->hint : 0;
+	*ret = txm_tuple_clarify(txn, space, *ret, iid, mk_index, is_rw);
+	if (*ret == NULL) {
+		return iterator->next(iterator, ret);
+	} else {
+		tuple_unref(it->current.tuple);
+		it->current.tuple = *ret;
+		tuple_ref(it->current.tuple);
+	}
+
 	return 0;
 }
 
@@ -539,7 +597,16 @@ memtx_tree_index_get(struct index *base, const char *key,
 	key_data.part_count = part_count;
 	key_data.hint = key_hint(key, part_count, cmp_def);
 	struct memtx_tree_data *res = memtx_tree_find(&index->tree, &key_data);
-	*result = res != NULL ? res->tuple : NULL;
+	if (res == NULL) {
+		*result = NULL;
+		return 0;
+	}
+	struct txn *txn = in_txn();
+	struct space *space = space_by_id(base->def->space_id);
+	bool is_rw = txn != NULL;
+	uint32_t mk_index = base->def->key_def->is_multikey ? res->hint : 0;
+	*result = txm_tuple_clarify(txn, space, res->tuple, base->def->iid,
+				    mk_index, is_rw);
 	return 0;
 }
 
@@ -1208,6 +1275,7 @@ struct tree_snapshot_iterator {
 	struct snapshot_iterator base;
 	struct memtx_tree_index *index;
 	struct memtx_tree_iterator tree_iterator;
+	struct txm_snapshot_cleaner cleaner;
 };
 
 static void
@@ -1220,6 +1288,7 @@ tree_snapshot_iterator_free(struct snapshot_iterator *iterator)
 				      it->index->base.engine);
 	memtx_tree_iterator_destroy(&it->index->tree, &it->tree_iterator);
 	index_unref(&it->index->base);
+	txm_snapshot_cleaner_destroy(&it->cleaner);
 	free(iterator);
 }
 
@@ -1231,14 +1300,27 @@ tree_snapshot_iterator_next(struct snapshot_iterator *iterator,
 	struct tree_snapshot_iterator *it =
 		(struct tree_snapshot_iterator *)iterator;
 	struct memtx_tree *tree = &it->index->tree;
-	struct memtx_tree_data *res = memtx_tree_iterator_get_elem(tree,
-							&it->tree_iterator);
-	if (res == NULL) {
-		*data = NULL;
-		return 0;
+
+	while (true) {
+		struct memtx_tree_data *res =
+			memtx_tree_iterator_get_elem(tree, &it->tree_iterator);
+
+		if (res == NULL) {
+			*data = NULL;
+			return 0;
+		}
+
+		memtx_tree_iterator_next(tree, &it->tree_iterator);
+
+		struct tuple *tuple = res->tuple;
+		tuple = txm_snapshot_clarify(&it->cleaner, tuple);
+
+		if (tuple != NULL) {
+			*data = tuple_data_range(tuple, size);
+			return 0;
+		}
 	}
-	memtx_tree_iterator_next(tree, &it->tree_iterator);
-	*data = tuple_data_range(res->tuple, size);
+
 	return 0;
 }
 
@@ -1251,14 +1333,21 @@ static struct snapshot_iterator *
 memtx_tree_index_create_snapshot_iterator(struct index *base)
 {
 	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct tree_snapshot_iterator *it = (struct tree_snapshot_iterator *)
-		calloc(1, sizeof(*it));
+	struct tree_snapshot_iterator *it =
+		(struct tree_snapshot_iterator *) calloc(1, sizeof(*it));
 	if (it == NULL) {
 		diag_set(OutOfMemory, sizeof(struct tree_snapshot_iterator),
 			 "memtx_tree_index", "create_snapshot_iterator");
 		return NULL;
 	}
 
+	struct space *space = space_cache_find(base->def->space_id);
+	if (txm_snapshot_cleaner_create(&it->cleaner, space,
+					"memtx_tree_index") != 0) {
+		free(it);
+		return NULL;
+	}
+
 	it->base.free = tree_snapshot_iterator_free;
 	it->base.next = tree_snapshot_iterator_next;
 	it->index = index;
diff --git a/src/box/space.c b/src/box/space.c
index 1d375cc..7394316 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -210,6 +210,7 @@ space_create(struct space *space, struct engine *engine,
 			 "constraint_ids");
 		goto fail;
 	}
+	rlist_create(&space->txm_stories);
 	return 0;
 
 fail_free_indexes:
@@ -252,6 +253,7 @@ space_new_ephemeral(struct space_def *def, struct rlist *key_list)
 void
 space_delete(struct space *space)
 {
+	rlist_del(&space->txm_stories);
 	assert(space->ck_constraint_trigger == NULL);
 	for (uint32_t j = 0; j <= space->index_id_max; j++) {
 		struct index *index = space->index_map[j];
diff --git a/src/box/space.h b/src/box/space.h
index bbdd3ef..9dcc4e7 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -239,6 +239,10 @@ struct space {
 	 * Hash table with constraint identifiers hashed by name.
 	 */
 	struct mh_strnptr_t *constraint_ids;
+	/**
+	 * List of all tx stories in the space.
+	 */
+	struct rlist txm_stories;
 };
 
 /** Initialize a base space instance. */
diff --git a/src/box/txn.c b/src/box/txn.c
index 4c46b60..b4888b3 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -101,6 +101,20 @@ struct tx_conflict_tracker {
 	struct rlist in_conflicted_by_list;
 };
 
+/**
+ * Record that links transaction and a story that the transaction have read.
+ */
+struct tx_read_tracker {
+	/** The TX that read story. */
+	struct txn *reader;
+	/** The story that was read by reader. */
+	struct txm_story *story;
+	/** Link in story->reader_list. */
+	struct rlist in_reader_list;
+	/** Link in reader->read_set. */
+	struct rlist in_read_set;
+};
+
 double too_long_threshold;
 
 /* Txn cache. */
@@ -263,6 +277,7 @@ txn_new(void)
 	}
 	assert(region_used(&region) == sizeof(*txn));
 	txn->region = region;
+	rlist_create(&txn->read_set);
 	rlist_create(&txn->conflict_list);
 	rlist_create(&txn->conflicted_by_list);
 	rlist_create(&txn->in_read_view_txs);
@@ -275,6 +290,14 @@ txn_new(void)
 inline static void
 txn_free(struct txn *txn)
 {
+	struct tx_read_tracker *tracker, *tmp;
+	rlist_foreach_entry_safe(tracker, &txn->read_set,
+				 in_read_set, tmp) {
+		rlist_del(&tracker->in_reader_list);
+		rlist_del(&tracker->in_read_set);
+	}
+	assert(rlist_empty(&txn->read_set));
+
 	struct tx_conflict_tracker *entry, *next;
 	rlist_foreach_entry_safe(entry, &txn->conflict_list,
 				 in_conflict_list, next) {
@@ -1430,6 +1453,7 @@ txm_story_new(struct space *space, struct tuple *tuple)
 	story->del_psn = 0;
 	rlist_create(&story->reader_list);
 	rlist_add_tail(&txm.all_stories, &story->in_all_stories);
+	rlist_add(&space->txm_stories, &story->in_space_stories);
 	memset(story->link, 0, sizeof(story->link[0]) * index_count);
 	return story;
 }
@@ -1963,26 +1987,32 @@ txm_history_prepare_stmt(struct txn_stmt *stmt)
 			i++;
 			continue;
 		}
+		bool old_story_is_prepared = false;
 		struct txm_story *old_story =
 			story->link[i].older.story;
 		if (old_story->del_psn != 0) {
-			/* is psn is set, the change is prepared. */
-			i++;
-			continue;
-		}
-		if (old_story->add_psn != 0) {
-			/* is psn is set, the change is prepared. */
-			i++;
-			continue;
-		}
-
-		if (old_story->add_stmt != NULL) {
+			/* if psn is set, the change is prepared. */
+			old_story_is_prepared = true;
+		} else if (old_story->add_psn != 0) {
+			/* if psn is set, the change is prepared. */
+			old_story_is_prepared = true;
+		} else if (old_story->add_stmt != NULL) {
 			/* ancient. */
-			i++;
-			continue;
-		}
-		if (old_story->add_stmt->txn == stmt->txn) {
+			old_story_is_prepared = true;
+		} else if (old_story->add_stmt->txn == stmt->txn) {
 			/* added by us. */
+		}
+
+		if (old_story_is_prepared) {
+			struct tx_read_tracker *tracker;
+			rlist_foreach_entry(tracker, &old_story->reader_list,
+					    in_reader_list) {
+				if (tracker->reader == stmt->txn)
+					continue;
+				if (tracker->reader->status != TXN_INPROGRESS)
+					continue;
+				tracker->reader->status = TXN_CONFLICTED;
+			}
 			i++;
 			continue;
 		}
@@ -2091,7 +2121,6 @@ txm_tuple_clarify_slow(struct txn *txn, struct space *space,
 		       struct tuple *tuple, uint32_t index,
 		       uint32_t mk_index, bool is_prepared_ok)
 {
-	(void)space;
 	assert(tuple->is_dirty);
 	struct txm_story *story = txm_story_get(tuple);
 	bool own_change = false;
@@ -2109,7 +2138,8 @@ txm_tuple_clarify_slow(struct txn *txn, struct space *space,
 			break;
 		}
 	}
-	(void)own_change; /* TODO: add conflict */
+	if (!own_change)
+		txm_track_read(txn, space, tuple);
 	(void)mk_index; /* TODO: multiindex */
 	return result;
 }
@@ -2123,6 +2153,7 @@ txm_story_delete(struct txm_story *story)
 	if (txm.traverse_all_stories == &story->in_all_stories)
 		txm.traverse_all_stories = rlist_next(txm.traverse_all_stories);
 	rlist_del(&story->in_all_stories);
+	rlist_del(&story->in_space_stories);
 
 	mh_int_t pos = mh_history_find(txm.history, story->tuple, 0);
 	assert(pos != mh_end(txm.history));
@@ -2143,3 +2174,168 @@ txm_story_delete(struct txm_story *story)
 	struct mempool *pool = &txm.txm_story_pool[story->index_count];
 	mempool_free(pool, story);
 }
+
+
+static uint32_t
+txm_snapshot_cleaner_hash(const struct tuple *a)
+{
+	uintptr_t u = (uintptr_t)a;
+	if (sizeof(uintptr_t) <= sizeof(uint32_t))
+		return u;
+	else
+		return u ^ (u >> 32);
+}
+
+struct txm_snapshot_cleaner_entry
+{
+	struct tuple *from;
+	struct tuple *to;
+};
+
+#define mh_name _snapshot_cleaner
+#define mh_key_t struct tuple *
+#define mh_node_t struct txm_snapshot_cleaner_entry
+#define mh_arg_t int
+#define mh_hash(a, arg) (txm_snapshot_cleaner_hash((a)->from))
+#define mh_hash_key(a, arg) (txm_snapshot_cleaner_hash(a))
+#define mh_cmp(a, b, arg) (((a)->from) != ((b)->from))
+#define mh_cmp_key(a, b, arg) ((a) != ((b)->from))
+#define MH_SOURCE
+#include "salad/mhash.h"
+
+int
+txm_snapshot_cleaner_create(struct txm_snapshot_cleaner *cleaner,
+			    struct space *space, const char *index_name)
+{
+	cleaner->ht = NULL;
+	if (space == NULL || rlist_empty(&space->txm_stories))
+		return 0;
+	struct mh_snapshot_cleaner_t *ht = mh_snapshot_cleaner_new();
+	if (ht == NULL) {
+		diag_set(OutOfMemory, sizeof(*ht),
+			 index_name, "snapshot cleaner");
+		free(ht);
+		return -1;
+	}
+
+	struct txm_story *story;
+	rlist_foreach_entry(story, &space->txm_stories, in_space_stories) {
+		struct tuple *tuple = story->tuple;
+		struct tuple *clean =
+			txm_tuple_clarify_slow(NULL, space, tuple, 0, 0, true);
+		if (clean == tuple)
+			continue;
+
+		struct txm_snapshot_cleaner_entry entry;
+		entry.from = tuple;
+		entry.to = clean;
+		mh_int_t res =  mh_snapshot_cleaner_put(ht,  &entry, NULL, 0);
+		if (res == mh_end(ht)) {
+			diag_set(OutOfMemory, sizeof(entry),
+				 index_name, "snapshot rollback entry");
+			mh_snapshot_cleaner_delete(ht);
+			return -1;
+		}
+	}
+
+	cleaner->ht = ht;
+	return 0;
+}
+
+struct tuple *
+txm_snapshot_clarify_slow(struct txm_snapshot_cleaner *cleaner,
+			  struct tuple *tuple)
+{
+	assert(cleaner->ht != NULL);
+
+	struct mh_snapshot_cleaner_t *ht = cleaner->ht;
+	while (true) {
+		mh_int_t pos =  mh_snapshot_cleaner_find(ht, tuple, 0);
+		if (pos == mh_end(ht))
+			break;
+		struct txm_snapshot_cleaner_entry *entry =
+			mh_snapshot_cleaner_node(ht, pos);
+		assert(entry->from == tuple);
+		tuple = entry->to;
+	}
+
+	return tuple;
+}
+
+
+void
+txm_snapshot_cleaner_destroy(struct txm_snapshot_cleaner *cleaner)
+{
+	if (cleaner->ht != NULL)
+		mh_snapshot_cleaner_delete(cleaner->ht);
+}
+
+int
+txm_track_read(struct txn *txn, struct space *space, struct tuple *tuple)
+{
+	if (tuple == NULL)
+		return 0;
+	if (txn == NULL)
+		return 0;
+	if (space == NULL)
+		return 0;
+
+	struct txm_story *story;
+	struct tx_read_tracker *tracker = NULL;
+
+	if (!tuple->is_dirty) {
+		story = txm_story_new(space, tuple);
+		if (story != NULL)
+			return -1;
+		size_t sz;
+		tracker = region_alloc_object(&txn->region,
+					      struct tx_read_tracker, &sz);
+		if (tracker == NULL) {
+			diag_set(OutOfMemory, sz, "tx region", "read_tracker");
+			txm_story_delete(story);
+			return -1;
+		}
+		tracker->reader = txn;
+		tracker->story = story;
+		rlist_add(&story->reader_list, &tracker->in_reader_list);
+		rlist_add(&txn->read_set, &tracker->in_read_set);
+		return 0;
+	}
+	story = txm_story_get(tuple);
+
+	struct rlist *r1 = story->reader_list.next;
+	struct rlist *r2 = txn->read_set.next;
+	while (r1 != &story->reader_list && r2 != &txn->read_set) {
+		tracker = rlist_entry(r1, struct tx_read_tracker,
+				      in_reader_list);
+		assert(tracker->story == story);
+		if (tracker->reader == txn)
+			break;
+		tracker = rlist_entry(r2, struct tx_read_tracker,
+				      in_read_set);
+		assert(tracker->reader == txn);
+		if (tracker->story == story)
+			break;
+		tracker = NULL;
+		r1 = r1->next;
+		r2 = r2->next;
+	}
+	if (tracker != NULL) {
+		/* Move to the beginning of a list for faster further lookups.*/
+		rlist_del(&tracker->in_reader_list);
+		rlist_del(&tracker->in_read_set);
+	} else {
+		size_t sz;
+		tracker = region_alloc_object(&txn->region,
+					      struct tx_read_tracker, &sz);
+		if (tracker == NULL) {
+			diag_set(OutOfMemory, sz, "tx region", "read_tracker");
+			return -1;
+		}
+		tracker->reader = txn;
+		tracker->story = story;
+	}
+	rlist_add(&story->reader_list, &tracker->in_reader_list);
+	rlist_add(&txn->read_set, &tracker->in_read_set);
+	return 0;
+}
diff --git a/src/box/txn.h b/src/box/txn.h
index e294497..5fe87c5 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -379,6 +379,8 @@ struct txn {
 	 * Link in tx_manager::read_view_txs.
 	 */
 	struct rlist in_read_view_txs;
+	/** List of tx_read_trackers with stories that the TX have read. */
+	struct rlist read_set;
 };
 
 /**
@@ -453,6 +455,10 @@ struct txm_story {
 	 */
 	struct rlist in_all_stories;
 	/**
+	 * Link in space::txm_stories.
+	 */
+	struct rlist in_space_stories;
+	/**
 	 * Number of indexes in this space - and the count of link[].
 	 */
 	uint32_t index_count;
@@ -913,6 +919,13 @@ txm_history_prepare_stmt(struct txn_stmt *stmt);
 ssize_t
 txm_history_commit_stmt(struct txn_stmt *stmt);
 
+/**
+ * Record in TX manager that a transaction @txn have read a @tuple in @space.
+ * @return 0 on success, -1 on memory error.
+ */
+int
+txm_track_read(struct txn *txn, struct space *space, struct tuple *tuple);
+
 /** Helper of txm_tuple_clarify */
 struct tuple *
 txm_tuple_clarify_slow(struct txn *txn, struct space *space,
@@ -936,12 +949,61 @@ txm_tuple_clarify(struct txn *txn, struct space *space,
 {
 	if (!tx_manager_use_mvcc_engine)
 		return tuple;
-	if (!tuple->is_dirty)
+	if (!tuple->is_dirty) {
+		txm_track_read(txn, space, tuple);
 		return tuple;
+	}
 	return txm_tuple_clarify_slow(txn, space, tuple, index, mk_index,
 				      is_prepared_ok);
 }
 
+/**
+ * Snapshot cleaner is a short part of history that is supposed to clarify
+ * tuples in a index snapshot. It's also supposed to be used in another
+ * thread while common clarify would probably crash in that case.
+ */
+struct txm_snapshot_cleaner {
+	struct mh_snapshot_cleaner_t *ht;
+};
+
+/**
+ * Create a snapshot cleaner.
+ * @param cleaner - cleaner to create.
+ * @param space - space for which the cleaner must be created.
+ * @param index_name - name of index for diag in case of memory error.
+ * @return 0 on success, -1 on memory erorr.
+ */
+int
+txm_snapshot_cleaner_create(struct txm_snapshot_cleaner *cleaner,
+			    struct space *space, const char *index_name);
+
+/** Helper of txm_snapshot_clafify. */
+struct tuple *
+txm_snapshot_clarify_slow(struct txm_snapshot_cleaner *cleaner,
+			  struct tuple *tuple);
+
+/**
+ * Like a common clarify that function returns proper tuple if original
+ * tuple in index is dirty.
+ * @param cleaner - pre-created snapshot cleaner.
+ * @param tuple - tuple to clean.
+ * @return cleaned tuple, can be NULL.
+ */
+static inline struct tuple *
+txm_snapshot_clarify(struct txm_snapshot_cleaner *cleaner,
+		     struct tuple *tuple)
+{
+	if (cleaner->ht == NULL)
+		return tuple;
+	return txm_snapshot_clarify_slow(cleaner, tuple);
+}
+
+/**
+ * Free resources.in shapshot @cleaner.
+ */
+void
+txm_snapshot_cleaner_destroy(struct txm_snapshot_cleaner *cleaner);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (11 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 12/13] txm: clarify all fetched tuples Aleksandr Lyapunov
@ 2020-07-15 13:55 ` Aleksandr Lyapunov
  2020-07-16 22:26   ` Vladislav Shpilevoy
  2020-07-15 15:47 ` [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 13:55 UTC (permalink / raw)
  To: tarantool-patches

Use mvcc transaction engine in memtx if the engine is enabled.

Closes #4897
---
 src/box/memtx_engine.c | 40 ++++++++++++++++++++++++++++++++++++----
 src/box/memtx_space.c  | 28 ++++++++++++++++++++++++----
 src/box/txn.c          |  3 +++
 src/box/vinyl.c        | 14 ++++++++++----
 4 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index dfd6fce..1402dd7 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -335,10 +335,39 @@ static int
 memtx_engine_begin(struct engine *engine, struct txn *txn)
 {
 	(void)engine;
-	txn_can_yield(txn, false);
+	if (!tx_manager_use_mvcc_engine)
+		txn_can_yield(txn, false);
 	return 0;
 }
 
+static int
+memtx_engine_prepare(struct engine *engine, struct txn *txn)
+{
+	(void)engine;
+	struct txn_stmt *stmt;
+	stailq_foreach_entry(stmt, &txn->stmts, next) {
+		if (stmt->add_story != NULL || stmt->del_story != NULL)
+			txm_history_prepare_stmt(stmt);
+	}
+	return 0;
+}
+
+static void
+memtx_engine_commit(struct engine *engine, struct txn *txn)
+{
+	(void)engine;
+	struct txn_stmt *stmt;
+	stailq_foreach_entry(stmt, &txn->stmts, next) {
+		if (stmt->add_story != NULL || stmt->del_story != NULL) {
+			ssize_t bsize = txm_history_commit_stmt(stmt);
+			assert(stmt->space->engine == engine);
+			struct memtx_space *mspace =
+				(struct memtx_space *)stmt->space;
+			mspace->bsize += bsize;
+		}
+	}
+}
+
 static void
 memtx_engine_rollback_statement(struct engine *engine, struct txn *txn,
 				struct txn_stmt *stmt)
@@ -348,13 +377,16 @@ memtx_engine_rollback_statement(struct engine *engine, struct txn *txn,
 	if (stmt->old_tuple == NULL && stmt->new_tuple == NULL)
 		return;
 	struct space *space = stmt->space;
-	struct memtx_space *memtx_space = (struct memtx_space *)space;
+	struct memtx_space *memtx_space = (struct memtx_space*) space;
 	uint32_t index_count;
 
 	/* Only roll back the changes if they were made. */
 	if (stmt->engine_savepoint == NULL)
 		return;
 
+	if (stmt->add_story != NULL || stmt->del_story != NULL)
+		return txm_history_rollback_stmt(stmt);
+
 	if (memtx_space->replace == memtx_space_replace_all_keys)
 		index_count = space->index_count;
 	else if (memtx_space->replace == memtx_space_replace_primary_key)
@@ -914,8 +946,8 @@ static const struct engine_vtab memtx_engine_vtab = {
 	/* .complete_join = */ memtx_engine_complete_join,
 	/* .begin = */ memtx_engine_begin,
 	/* .begin_statement = */ generic_engine_begin_statement,
-	/* .prepare = */ generic_engine_prepare,
-	/* .commit = */ generic_engine_commit,
+	/* .prepare = */ memtx_engine_prepare,
+	/* .commit = */ memtx_engine_commit,
 	/* .rollback_statement = */ memtx_engine_rollback_statement,
 	/* .rollback = */ generic_engine_rollback,
 	/* .switch_to_ro = */ generic_engine_switch_to_ro,
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index e48ed3a..66f28c9 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -260,6 +260,20 @@ memtx_space_replace_all_keys(struct space *space, struct tuple *old_tuple,
 	if (pk == NULL)
 		return -1;
 	assert(pk->def->opts.is_unique);
+
+	if (tx_manager_use_mvcc_engine) {
+		struct txn *txn = in_txn();
+		struct txn_stmt *stmt =
+			txn == NULL ? NULL : txn_current_stmt(txn);
+		if (stmt != NULL) {
+			return txm_history_add_stmt(stmt, old_tuple, new_tuple,
+						    mode, result);
+		} else {
+			/** Ephemeral space */
+			assert(space->def->id == 0);
+		}
+	}
+
 	/*
 	 * If old_tuple is not NULL, the index has to
 	 * find and delete it, or return an error.
@@ -896,7 +910,9 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
 	if (txn_check_singlestatement(txn, "space format check") != 0)
 		return -1;
 
-	txn_can_yield(txn, true);
+	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
+	if (!could_yield)
+		txn_can_yield(txn, true);
 
 	struct memtx_engine *memtx = (struct memtx_engine *)space->engine;
 	struct memtx_ddl_state state;
@@ -940,7 +956,8 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
 	iterator_delete(it);
 	diag_destroy(&state.diag);
 	trigger_clear(&on_replace);
-	txn_can_yield(txn, false);
+	if (!could_yield)
+		txn_can_yield(txn, false);
 	return rc;
 }
 
@@ -1054,7 +1071,9 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
 	if (txn_check_singlestatement(txn, "index build") != 0)
 		return -1;
 
-	txn_can_yield(txn, true);
+	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
+	if (!could_yield)
+		txn_can_yield(txn, true);
 
 	struct memtx_engine *memtx = (struct memtx_engine *)src_space->engine;
 	struct memtx_ddl_state state;
@@ -1132,7 +1151,8 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
 	iterator_delete(it);
 	diag_destroy(&state.diag);
 	trigger_clear(&on_replace);
-	txn_can_yield(txn, false);
+	if (!could_yield)
+		txn_can_yield(txn, false);
 	return rc;
 }
 
diff --git a/src/box/txn.c b/src/box/txn.c
index b4888b3..b9a190f 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -203,6 +203,9 @@ txn_stmt_new(struct region *region)
 static inline void
 txn_stmt_destroy(struct txn_stmt *stmt)
 {
+	if (stmt->add_story != NULL || stmt->del_story != NULL)
+		txm_history_rollback_stmt(stmt);
+
 	if (stmt->old_tuple != NULL)
 		tuple_unref(stmt->old_tuple);
 	if (stmt->new_tuple != NULL)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index f9252f1..f69d3d9 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -1084,7 +1084,9 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
 		return -1;
 
 	/* See the comment in vinyl_space_build_index(). */
-	txn_can_yield(txn, true);
+	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
+	if (!could_yield)
+		txn_can_yield(txn, true);
 
 	struct trigger on_replace;
 	struct vy_check_format_ctx ctx;
@@ -1136,7 +1138,8 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
 out:
 	diag_destroy(&ctx.diag);
 	trigger_clear(&on_replace);
-	txn_can_yield(txn, false);
+	if (!could_yield)
+		txn_can_yield(txn, false);
 	return rc;
 }
 
@@ -4183,7 +4186,9 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
 	 * change the data dictionary, so there is no dirty state
 	 * that can be observed.
 	 */
-	txn_can_yield(txn, true);
+	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
+	if (!could_yield)
+		txn_can_yield(txn, true);
 
 	/*
 	 * Iterate over all tuples stored in the space and insert
@@ -4284,7 +4289,8 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
 out:
 	diag_destroy(&ctx.diag);
 	trigger_clear(&on_replace);
-	txn_can_yield(txn, false);
+	if (!could_yield)
+		txn_can_yield(txn, false);
 	return rc;
 }
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (12 preceding siblings ...)
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx Aleksandr Lyapunov
@ 2020-07-15 15:47 ` Aleksandr Lyapunov
  2020-07-15 16:38 ` Aleksandr Lyapunov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 15:47 UTC (permalink / raw)
  To: tarantool-patches

Forgot again
GH issue: https://github.com/tarantool/tarantool/issues/4897
GH branch 
https://github.com/tarantool/tarantool/tree/alyapunov/gh-4897-memtx-tx-engine

On 15.07.2020 16:55, Aleksandr Lyapunov wrote:
> Changes in v3:
>    - Fixes after code review
>    - Lots of comments added
>    - Code cleanup
>    - A couple of bugs fixed
>
> Aleksandr Lyapunov (13):
>    Update license file (2020)
>    Check data_offset overflow in struct tuple
>    vinyl: rename tx_manager -> vy_tx_manager
>    txm: introduce dirty tuples
>    txm: save txn in txn_stmt
>    txm: add TX status
>    txm: save does_require_old_tuple flag in txn_stmt
>    txm: introduce tx manager
>    tmx: introduce prepare sequence number
>    tmx: introduce conflict tracker
>    txm: introduce txm_story
>    txm: clarify all fetched tuples
>    tmx: use new tx manager in memtx
>
>   LICENSE                               |    2 +-
>   src/box/errcode.h                     |    1 +
>   src/box/lua/load_cfg.lua              |    2 +
>   src/box/memtx_bitset.c                |   30 +-
>   src/box/memtx_engine.c                |   60 +-
>   src/box/memtx_hash.c                  |   79 ++-
>   src/box/memtx_rtree.c                 |   30 +-
>   src/box/memtx_space.c                 |   45 +-
>   src/box/memtx_tree.c                  |  119 +++-
>   src/box/space.c                       |    2 +
>   src/box/space.h                       |    4 +
>   src/box/tuple.c                       |   12 +-
>   src/box/tuple.h                       |   12 +-
>   src/box/tuple_format.c                |    4 +-
>   src/box/txn.c                         | 1186 +++++++++++++++++++++++++++++++++
>   src/box/txn.h                         |  355 ++++++++++
>   src/box/vinyl.c                       |   44 +-
>   src/box/vy_scheduler.h                |    2 +-
>   src/box/vy_stmt.c                     |    9 +
>   src/box/vy_tx.c                       |   51 +-
>   src/box/vy_tx.h                       |   33 +-
>   src/main.cc                           |    5 +
>   test/app-tap/init_script.result       |    1 +
>   test/box/admin.result                 |    2 +
>   test/box/cfg.result                   |    4 +
>   test/box/error.result                 |    1 +
>   test/box/huge_field_map.result        |   49 ++
>   test/box/huge_field_map.test.lua      |   22 +
>   test/box/huge_field_map_long.result   |   51 ++
>   test/box/huge_field_map_long.test.lua |   28 +
>   test/box/suite.ini                    |    1 +
>   31 files changed, 2123 insertions(+), 123 deletions(-)
>   create mode 100644 test/box/huge_field_map.result
>   create mode 100644 test/box/huge_field_map.test.lua
>   create mode 100644 test/box/huge_field_map_long.result
>   create mode 100644 test/box/huge_field_map_long.test.lua
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager Aleksandr Lyapunov
@ 2020-07-15 16:04   ` Nikita Pettik
  2020-07-16  8:17     ` Aleksandr Lyapunov
  0 siblings, 1 reply; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:04 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> Apart from other vinyl objects that are named with "vy_" prefix,
> its transaction manager (tx_manager) have no such prefix.
> It should have in order to avoid conflicts with global tx manager.
> 
> Needed for #4897
> ---

LGTM. The only concern I have now - mb we'd better call
new TX manager in memtx not tx_manager, but mem_tx_manager or
memtx_tx_manager? I mean tx_ prefix is more general than memtx_..

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples Aleksandr Lyapunov
@ 2020-07-15 16:22   ` Nikita Pettik
  2020-07-16  0:05   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:22 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> If the tuple is marked as dirty that usually means that it was
> somehow affected by a transaction. If a tuple found in index is
> dirty - we cannot immediately return to user, instead we must
> clarify it in transaction manager.
> 
> Part of #4897
> ---

LGTM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt Aleksandr Lyapunov
@ 2020-07-15 16:23   ` Nikita Pettik
  0 siblings, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:23 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> There is a lot of places in transaction engine (see futher commits)
> where it's convenient to store just a pointer to tx statement while
> having a way to get the transaction itself by this pointer.
> Let's store a pointer to TX in TX statement for that purpose.
> 
> Part of #4897
> ---

LGTM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (13 preceding siblings ...)
  2020-07-15 15:47 ` [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
@ 2020-07-15 16:38 ` Aleksandr Lyapunov
  2020-07-15 16:39 ` Aleksandr Lyapunov
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 16:38 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

And again CC didn't work. Sorry, it's very hard to control.
Adding CC

On 15.07.2020 16:55, Aleksandr Lyapunov wrote:
> Changes in v3:
>    - Fixes after code review
>    - Lots of comments added
>    - Code cleanup
>    - A couple of bugs fixed
>
> Aleksandr Lyapunov (13):
>    Update license file (2020)
>    Check data_offset overflow in struct tuple
>    vinyl: rename tx_manager -> vy_tx_manager
>    txm: introduce dirty tuples
>    txm: save txn in txn_stmt
>    txm: add TX status
>    txm: save does_require_old_tuple flag in txn_stmt
>    txm: introduce tx manager
>    tmx: introduce prepare sequence number
>    tmx: introduce conflict tracker
>    txm: introduce txm_story
>    txm: clarify all fetched tuples
>    tmx: use new tx manager in memtx
>
>   LICENSE                               |    2 +-
>   src/box/errcode.h                     |    1 +
>   src/box/lua/load_cfg.lua              |    2 +
>   src/box/memtx_bitset.c                |   30 +-
>   src/box/memtx_engine.c                |   60 +-
>   src/box/memtx_hash.c                  |   79 ++-
>   src/box/memtx_rtree.c                 |   30 +-
>   src/box/memtx_space.c                 |   45 +-
>   src/box/memtx_tree.c                  |  119 +++-
>   src/box/space.c                       |    2 +
>   src/box/space.h                       |    4 +
>   src/box/tuple.c                       |   12 +-
>   src/box/tuple.h                       |   12 +-
>   src/box/tuple_format.c                |    4 +-
>   src/box/txn.c                         | 1186 +++++++++++++++++++++++++++++++++
>   src/box/txn.h                         |  355 ++++++++++
>   src/box/vinyl.c                       |   44 +-
>   src/box/vy_scheduler.h                |    2 +-
>   src/box/vy_stmt.c                     |    9 +
>   src/box/vy_tx.c                       |   51 +-
>   src/box/vy_tx.h                       |   33 +-
>   src/main.cc                           |    5 +
>   test/app-tap/init_script.result       |    1 +
>   test/box/admin.result                 |    2 +
>   test/box/cfg.result                   |    4 +
>   test/box/error.result                 |    1 +
>   test/box/huge_field_map.result        |   49 ++
>   test/box/huge_field_map.test.lua      |   22 +
>   test/box/huge_field_map_long.result   |   51 ++
>   test/box/huge_field_map_long.test.lua |   28 +
>   test/box/suite.ini                    |    1 +
>   31 files changed, 2123 insertions(+), 123 deletions(-)
>   create mode 100644 test/box/huge_field_map.result
>   create mode 100644 test/box/huge_field_map.test.lua
>   create mode 100644 test/box/huge_field_map_long.result
>   create mode 100644 test/box/huge_field_map_long.test.lua
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (14 preceding siblings ...)
  2020-07-15 16:38 ` Aleksandr Lyapunov
@ 2020-07-15 16:39 ` Aleksandr Lyapunov
  2020-07-15 16:40 ` Aleksandr Lyapunov
  2020-07-16  0:05 ` Vladislav Shpilevoy
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 16:39 UTC (permalink / raw)
  To: tarantool-patches

And again CC didn't work. Sorry, it's very hard to control.
Adding CC

On 15.07.2020 16:55, Aleksandr Lyapunov wrote:
> Changes in v3:
>    - Fixes after code review
>    - Lots of comments added
>    - Code cleanup
>    - A couple of bugs fixed
>
> Aleksandr Lyapunov (13):
>    Update license file (2020)
>    Check data_offset overflow in struct tuple
>    vinyl: rename tx_manager -> vy_tx_manager
>    txm: introduce dirty tuples
>    txm: save txn in txn_stmt
>    txm: add TX status
>    txm: save does_require_old_tuple flag in txn_stmt
>    txm: introduce tx manager
>    tmx: introduce prepare sequence number
>    tmx: introduce conflict tracker
>    txm: introduce txm_story
>    txm: clarify all fetched tuples
>    tmx: use new tx manager in memtx
>
>   LICENSE                               |    2 +-
>   src/box/errcode.h                     |    1 +
>   src/box/lua/load_cfg.lua              |    2 +
>   src/box/memtx_bitset.c                |   30 +-
>   src/box/memtx_engine.c                |   60 +-
>   src/box/memtx_hash.c                  |   79 ++-
>   src/box/memtx_rtree.c                 |   30 +-
>   src/box/memtx_space.c                 |   45 +-
>   src/box/memtx_tree.c                  |  119 +++-
>   src/box/space.c                       |    2 +
>   src/box/space.h                       |    4 +
>   src/box/tuple.c                       |   12 +-
>   src/box/tuple.h                       |   12 +-
>   src/box/tuple_format.c                |    4 +-
>   src/box/txn.c                         | 1186 +++++++++++++++++++++++++++++++++
>   src/box/txn.h                         |  355 ++++++++++
>   src/box/vinyl.c                       |   44 +-
>   src/box/vy_scheduler.h                |    2 +-
>   src/box/vy_stmt.c                     |    9 +
>   src/box/vy_tx.c                       |   51 +-
>   src/box/vy_tx.h                       |   33 +-
>   src/main.cc                           |    5 +
>   test/app-tap/init_script.result       |    1 +
>   test/box/admin.result                 |    2 +
>   test/box/cfg.result                   |    4 +
>   test/box/error.result                 |    1 +
>   test/box/huge_field_map.result        |   49 ++
>   test/box/huge_field_map.test.lua      |   22 +
>   test/box/huge_field_map_long.result   |   51 ++
>   test/box/huge_field_map_long.test.lua |   28 +
>   test/box/suite.ini                    |    1 +
>   31 files changed, 2123 insertions(+), 123 deletions(-)
>   create mode 100644 test/box/huge_field_map.result
>   create mode 100644 test/box/huge_field_map.test.lua
>   create mode 100644 test/box/huge_field_map_long.result
>   create mode 100644 test/box/huge_field_map_long.test.lua
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (15 preceding siblings ...)
  2020-07-15 16:39 ` Aleksandr Lyapunov
@ 2020-07-15 16:40 ` Aleksandr Lyapunov
  2020-07-16  0:05 ` Vladislav Shpilevoy
  17 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-15 16:40 UTC (permalink / raw)
  To: tarantool-patches, Nikita Pettik


On 15.07.2020 16:55, Aleksandr Lyapunov wrote:
> Changes in v3:
>    - Fixes after code review
>    - Lots of comments added
>    - Code cleanup
>    - A couple of bugs fixed
>
> Aleksandr Lyapunov (13):
>    Update license file (2020)
>    Check data_offset overflow in struct tuple
>    vinyl: rename tx_manager -> vy_tx_manager
>    txm: introduce dirty tuples
>    txm: save txn in txn_stmt
>    txm: add TX status
>    txm: save does_require_old_tuple flag in txn_stmt
>    txm: introduce tx manager
>    tmx: introduce prepare sequence number
>    tmx: introduce conflict tracker
>    txm: introduce txm_story
>    txm: clarify all fetched tuples
>    tmx: use new tx manager in memtx
>
>   LICENSE                               |    2 +-
>   src/box/errcode.h                     |    1 +
>   src/box/lua/load_cfg.lua              |    2 +
>   src/box/memtx_bitset.c                |   30 +-
>   src/box/memtx_engine.c                |   60 +-
>   src/box/memtx_hash.c                  |   79 ++-
>   src/box/memtx_rtree.c                 |   30 +-
>   src/box/memtx_space.c                 |   45 +-
>   src/box/memtx_tree.c                  |  119 +++-
>   src/box/space.c                       |    2 +
>   src/box/space.h                       |    4 +
>   src/box/tuple.c                       |   12 +-
>   src/box/tuple.h                       |   12 +-
>   src/box/tuple_format.c                |    4 +-
>   src/box/txn.c                         | 1186 +++++++++++++++++++++++++++++++++
>   src/box/txn.h                         |  355 ++++++++++
>   src/box/vinyl.c                       |   44 +-
>   src/box/vy_scheduler.h                |    2 +-
>   src/box/vy_stmt.c                     |    9 +
>   src/box/vy_tx.c                       |   51 +-
>   src/box/vy_tx.h                       |   33 +-
>   src/main.cc                           |    5 +
>   test/app-tap/init_script.result       |    1 +
>   test/box/admin.result                 |    2 +
>   test/box/cfg.result                   |    4 +
>   test/box/error.result                 |    1 +
>   test/box/huge_field_map.result        |   49 ++
>   test/box/huge_field_map.test.lua      |   22 +
>   test/box/huge_field_map_long.result   |   51 ++
>   test/box/huge_field_map_long.test.lua |   28 +
>   test/box/suite.ini                    |    1 +
>   31 files changed, 2123 insertions(+), 123 deletions(-)
>   create mode 100644 test/box/huge_field_map.result
>   create mode 100644 test/box/huge_field_map.test.lua
>   create mode 100644 test/box/huge_field_map_long.result
>   create mode 100644 test/box/huge_field_map_long.test.lua
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 06/13] txm: add TX status
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 06/13] txm: add TX status Aleksandr Lyapunov
@ 2020-07-15 16:42   ` Nikita Pettik
  2020-07-16  0:08   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:42 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> Transaction engine (see further commits) needs to distinguish and
> maniputate transactions by their status. The status describe the
> lifetime point of a transaction (inprogress, prepared, committed)
> and its abilities (conflicted, read view).
> 
> Part of #4897
> Part of #5108
> ---
> diff --git a/src/box/txn.h b/src/box/txn.h
> index 36b1a03..e261852 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -121,6 +121,40 @@ enum {
>  };
>  
>  /**
> + * Status of a transaction.
> + */
> +enum txn_status {
> +	/**
> +	 * Initial state of TX. The only state of a TX that allowed to do
> +	 * read or write actions.
> +	 */
> +	TXN_INPROGRESS,
> +	/**
> +	 * The TX have passed conflict checks and is ready to be committed.
> +	 */
> +	TXN_PREPARED,
> +	/**
> +	 * The TX was aborted when other TX was committed due to conflict.
> +	 */
> +	TXN_CONFLICTED,
> +	/**
> +	 * The TX was read_only, has a conflict and was sent to read view.
> +	 * Read-only and does not participate in conflict resolution ever more.
> +	 * This transaction can onlu see state of the database at some fixed

Nit: onlu -> only

> +	 * point in the past.
> +	 */
> +	TXN_IN_READ_VIEW,
> +	/**
> +	 * The TX was committed.
> +	 */
> +	TXN_COMMITTED,
> +	/**
> +	 * The TX was aborted by user.
> +	 */

I guess not only by user. If commit stage fails, tx also features this flag.

> +	TXN_ABORTED,
> +};
> +
> +/**
>   * A single statement of a multi-statement
>   * transaction: undo and redo info.
>   */

The rest I guess is OK.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt Aleksandr Lyapunov
@ 2020-07-15 16:49   ` Nikita Pettik
  2020-07-16  0:09   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:49 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> That flag is needed for transactional conflict manager - if any
> other transaction replaces old_tuple before current one and the flag
> is set - the current transaction will be aborted.
> For example REPLACE just replaces a key, no matter what tuple
> lays in the index and thus does_require_old_tuple = false.
> In contrast, UPDATE makes new tuple using old_tuple and thus
> the statement will require old_tuple (does_require_old_tuple = true).
> INSERT also does_require_old_tuple = true because it requires
> old_tuple to be NULL.
> 
> Part of #4897
> ---
>  src/box/memtx_space.c | 17 +++++++++++++++++
>  src/box/txn.c         |  3 +++
>  src/box/txn.h         | 13 +++++++++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index 8452ab4..e48ed3a 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -316,6 +316,10 @@ memtx_space_execute_replace(struct space *space, struct txn *txn,
>  	if (stmt->new_tuple == NULL)
>  		return -1;
>  	tuple_ref(stmt->new_tuple);
> +
> +	if (mode == DUP_INSERT)
> +		stmt->does_require_old_tuple = true;

I'd prefer is_old_tuple_required name, but it's up to you.

LGTM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
@ 2020-07-15 16:51   ` Nikita Pettik
  2020-07-15 22:01   ` Vladislav Shpilevoy
  2020-07-16  0:10   ` Vladislav Shpilevoy
  2 siblings, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 16:51 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> Define TX manager. It will store data for MVCC and conflict manager.
> Define also 'use_mvcc_engine' in config that enables MVCC engine.
> 
> Part of #4897
> ---
>  src/box/lua/load_cfg.lua        |  2 ++
>  src/box/txn.c                   | 21 +++++++++++++++++++++
>  src/box/txn.h                   | 12 ++++++++++++
>  src/main.cc                     |  5 +++++
>  test/app-tap/init_script.result |  1 +
>  test/box/admin.result           |  2 ++
>  test/box/cfg.result             |  4 ++++
>  7 files changed, 47 insertions(+)
> 
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index 107bc15..8b40b29 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -82,6 +82,7 @@ local default_cfg = {
>      coredump            = false,
>      read_only           = false,
>      hot_standby         = false,
> +    use_mvcc_engine     = false,

Mb memtx_mvcc or memtx_mvcc_engine? I mean to tell it from vinyl.

The rest is obvious, LGTM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number Aleksandr Lyapunov
@ 2020-07-15 17:13   ` Nikita Pettik
  2020-07-16  0:11   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-15 17:13 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> Prepare sequence number is a monotonically increasing ID that is
> assigned to any prepared transaction. This ID is suitable for
> serialization order resolution: the bigger is ID - the latter the
> transaction exist in the serialization order of transactions.

Nit: exists
 
> Note that id of transactions has quite different order in case
> when transaction could yield - an younger (bigger id) transaction
> can prepare/commit first (lower psn) while older tx sleeps in vain.
> 
> Also it should be mentioned that LSN has the same order as PSN,
> but it has two general differences:
> 1. The LSN sequence has no holes, i.e. it is a natural number
> sequence. This property is useless for transaction engine.
> 2. The LSN sequence is provided by WAL writer and thus LSN is not
> available for TX thas was prepared and haven't been committed yet.

Nit: thas -> that

> That feature makes psn more suitable sequence for transactions as
> it allows to order prepared but not committed transaction and
> allows, for example, to create a read view between prepared
> transactions.
> 
> Part of #4897
> ---

LGTM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
  2020-07-15 16:51   ` Nikita Pettik
@ 2020-07-15 22:01   ` Vladislav Shpilevoy
  2020-07-16  0:10   ` Vladislav Shpilevoy
  2 siblings, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-15 22:01 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

I agree with Nikita here. Better add 'memtx_' prefix to the
transaction manager's name. And to the option name too.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine
  2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
                   ` (16 preceding siblings ...)
  2020-07-15 16:40 ` Aleksandr Lyapunov
@ 2020-07-16  0:05 ` Vladislav Shpilevoy
  17 siblings, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:05 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Hi! Thanks for the patchset!

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> Changes in v3:
>   - Fixes after code review
>   - Lots of comments added
>   - Code cleanup
>   - A couple of bugs fixed
> 
> Aleksandr Lyapunov (13):
>   Update license file (2020)
>   Check data_offset overflow in struct tuple
>   vinyl: rename tx_manager -> vy_tx_manager
>   txm: introduce dirty tuples
>   txm: save txn in txn_stmt
>   txm: add TX status
>   txm: save does_require_old_tuple flag in txn_stmt
>   txm: introduce tx manager
>   tmx: introduce prepare sequence number
>   tmx: introduce conflict tracker

In some commit titles you use tmx instead of txm. Is it
intentional? If yes, then what tmx means?

>   txm: introduce txm_story
>   txm: clarify all fetched tuples
>   tmx: use new tx manager in memtx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples Aleksandr Lyapunov
  2020-07-15 16:22   ` Nikita Pettik
@ 2020-07-16  0:05   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:05 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

> diff --git a/src/box/tuple.h b/src/box/tuple.h
> index 9a88772..4752323 100644
> --- a/src/box/tuple.h
> +++ b/src/box/tuple.h
> @@ -1081,8 +1087,10 @@ tuple_unref(struct tuple *tuple)
>  	assert(tuple->refs - 1 >= 0);
>  	if (unlikely(tuple->is_bigref))
>  		tuple_unref_slow(tuple);
> -	else if (--tuple->refs == 0)
> +	else if (--tuple->refs == 0) {
> +		assert(!tuple->is_dirty);

My question regarding this assertion remains - who is supposed
to remove the flag and when?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 06/13] txm: add TX status
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 06/13] txm: add TX status Aleksandr Lyapunov
  2020-07-15 16:42   ` Nikita Pettik
@ 2020-07-16  0:08   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:08 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

Firstly, I agree with all comments from Nikita.

> diff --git a/src/box/txn.h b/src/box/txn.h
> index 36b1a03..e261852 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -121,6 +121,40 @@ enum {
>  };
>  
>  /**
> + * Status of a transaction.
> + */
> +enum txn_status {
> +	/**
> +	 * Initial state of TX. The only state of a TX that allowed to do

Lets follow the code style and keep the comments in 66 symbols
border. In this commit and in all the others.

> +	 * read or write actions.
> +	 */
> +	TXN_INPROGRESS,
> +	/**
> +	 * The TX have passed conflict checks and is ready to be committed.
> +	 */
> +	TXN_PREPARED,
> +	/**
> +	 * The TX was aborted when other TX was committed due to conflict.
> +	 */
> +	TXN_CONFLICTED,
> +	/**
> +	 * The TX was read_only, has a conflict and was sent to read view.
> +	 * Read-only and does not participate in conflict resolution ever more.
> +	 * This transaction can onlu see state of the database at some fixed
> +	 * point in the past.
> +	 */
> +	TXN_IN_READ_VIEW,
> +	/**
> +	 * The TX was committed.
> +	 */
> +	TXN_COMMITTED,
> +	/**
> +	 * The TX was aborted by user.
> +	 */
> +	TXN_ABORTED,
> +};

I started a ticket for follow-up optimisations. I suggest to collect all of
them here: https://github.com/tarantool/tarantool/issues/5172. The ones
which are trivial and don't need separate tickets.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt Aleksandr Lyapunov
  2020-07-15 16:49   ` Nikita Pettik
@ 2020-07-16  0:09   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:09 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

See 3 comments below.

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> That flag is needed for transactional conflict manager - if any
> other transaction replaces old_tuple before current one and the flag
> is set - the current transaction will be aborted.
> For example REPLACE just replaces a key, no matter what tuple
> lays in the index and thus does_require_old_tuple = false.
> In contrast, UPDATE makes new tuple using old_tuple and thus
> the statement will require old_tuple (does_require_old_tuple = true).
> INSERT also does_require_old_tuple = true because it requires
> old_tuple to be NULL.
> 
> Part of #4897
> ---
>  src/box/memtx_space.c | 17 +++++++++++++++++
>  src/box/txn.c         |  3 +++
>  src/box/txn.h         | 13 +++++++++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index 8452ab4..e48ed3a 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -316,6 +316,10 @@ memtx_space_execute_replace(struct space *space, struct txn *txn,
>  	if (stmt->new_tuple == NULL)
>  		return -1;
>  	tuple_ref(stmt->new_tuple);
> +
> +	if (mode == DUP_INSERT)
> +		stmt->does_require_old_tuple = true;

1. Yeah, now I understand. NULL is also considered a tuple in txm.
Would be nice to have a comment here about it. In other places too,
btw. This is not so trivial, and it won't harm to write some
explanations.

> +
>  	if (memtx_space->replace(space, NULL, stmt->new_tuple,
>  				 mode, &stmt->old_tuple) != 0)
>  		return -1;
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 0372047..d254edb 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -360,6 +361,8 @@ txn_commit_stmt(struct txn *txn, struct request *request)
>  	 */
>  	if (stmt->space != NULL && !rlist_empty(&stmt->space->on_replace) &&
>  	    stmt->space->run_triggers && (stmt->old_tuple || stmt->new_tuple)) {
> +		/* Triggers see old_tuple and that tuple must remain the same */
> +		stmt->does_require_old_tuple = true;

2. What about before_replace triggers?

>  		int rc = 0;
>  		if(!space_is_temporary(stmt->space)) {
>  			rc = trigger_run(&stmt->space->on_replace, txn);
> diff --git a/src/box/txn.h b/src/box/txn.h
> index e261852..962ada0 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -175,6 +175,19 @@ struct txn_stmt {
>  	struct xrow_header *row;
>  	/** on_commit and/or on_rollback list is not empty. */
>  	bool has_triggers;
> +	/**
> +	 * Whether the stmt requires to replace exactly old_tuple (member).
> +	 * That flag is needed for transactional conflict manager - if any
> +	 * other transaction replaces old_tuple before current one and the flag
> +	 * is set - the current transaction will be aborted.

3. Do you mean the other transaction *commits* replacement of the old_tuple?
Because if there was another transaction touched the old tuple, but the current
one managed to commit earlier, the current txn won't be aborted. The other txn
will be.

> +	 * For example REPLACE just replaces a key, no matter what tuple
> +	 * lays in the index and thus does_require_old_tuple = false.
> +	 * In contrast, UPDATE makes new tuple using old_tuple and thus
> +	 * the statement will require old_tuple (does_require_old_tuple = true).
> +	 * INSERT also does_require_old_tuple = true because it requires
> +	 * old_tuple to be NULL.
> +	 */
> +	bool does_require_old_tuple;
>  	/** Commit/rollback triggers associated with this statement. */
>  	struct rlist on_commit;
>  	struct rlist on_rollback;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
  2020-07-15 16:51   ` Nikita Pettik
  2020-07-15 22:01   ` Vladislav Shpilevoy
@ 2020-07-16  0:10   ` Vladislav Shpilevoy
  2 siblings, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:10 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

> diff --git a/src/box/txn.h b/src/box/txn.h
> index 962ada0..a2374f3 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -700,6 +706,12 @@ box_txn_savepoint(void);
>  API_EXPORT int
>  box_txn_rollback_to_savepoint(box_txn_savepoint_t *savepoint);
>  
> +void
> +tx_manager_init();

In addition to Nikita's comment about adding memtx_ prefix to the
tx manager, there a more global problem - the memtx tx manager
lives in txn.h/.c. But since it is memtx-specific, it would be
more correct to have a separate file for it, memtx_tx.h, like
we have for vinyl vy_tx.h/.c. Otherwise this manager took place
of the future global tx manager.

What is also bothering me is that you use struct txn and
struct txn_stmt for the memtx-specific tx manager things. But
specially for engine-local transactions we already have txn->engine_tx
and txn_stmt->engine_savepoint.

The same for txm_story, conflict manager and all the other new
structs and functions. They all are called txm, and are located
in txn.c, but are related to memtx only, and are used only in it.

So my old comment about moving the new code into memtx-specific
structs is relevant again.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number Aleksandr Lyapunov
  2020-07-15 17:13   ` Nikita Pettik
@ 2020-07-16  0:11   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:11 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> Prepare sequence number is a monotonically increasing ID that is
> assigned to any prepared transaction. This ID is suitable for
> serialization order resolution: the bigger is ID - the latter the

1. Latter -> later.

> transaction exist in the serialization order of transactions.

2. What it means 'later in the order'? It can mean both older and
newer depending on from where to count. Would be better to call it
explicitly 'newer' or 'older'.

> Note that id of transactions has quite different order in case
> when transaction could yield - an younger (bigger id) transaction
> can prepare/commit first (lower psn) while older tx sleeps in vain.
> 
> Also it should be mentioned that LSN has the same order as PSN,
> but it has two general differences:
> 1. The LSN sequence has no holes, i.e. it is a natural number
> sequence. This property is useless for transaction engine.
> 2. The LSN sequence is provided by WAL writer and thus LSN is not
> available for TX thas was prepared and haven't been committed yet.
> That feature makes psn more suitable sequence for transactions as
> it allows to order prepared but not committed transaction and
> allows, for example, to create a read view between prepared
> transactions.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker Aleksandr Lyapunov
@ 2020-07-16  0:16   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:16 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

See 11 comments below.

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> There are situations when we have to track that if some TX is
> committed then some others must be aborted due to conflict.
> The common case is that one r/w TX have read some value while the
> second is about to overwrite the value; is the second is committed,

1. 'is the second' -> 'if the second'.

> the first must be aborted.
> Thus we have to store many-to-many TX relations between breaker
> TX and victim TX.
> The patch implements that.
> 
> Part of #4897
> ---
>  src/box/txn.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  src/box/txn.h |  32 ++++++++++++
>  2 files changed, 187 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 7a15986..ba81b58 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -42,6 +42,12 @@ struct tx_manager
>  {
>  	/** Last prepare-sequence-number that was assigned to prepared TX. */
>  	int64_t last_psn;
> +	/**
> +	 * List of all transactions that are in a read view.
> +	 * New transactions are added to the tail of this list,
> +	 * so the list is ordered by rv_psn.
> +	 */
> +	struct rlist read_view_txs;

2. How is it going to be used? I see you create this list, but it
is not used for anything meaningful (in the branch version on this
commit). You add something, remove, but never search in there.

The same for rv_psn. It is assigned, but never used (on top of
this commit).

>  };
>  
>  /** That's a definition, see declaration for description. */
> @@ -50,6 +56,21 @@ bool tx_manager_use_mvcc_engine = false;
>  /** The one and only instance of tx_manager. */
>  static struct tx_manager txm;
>  
> +/**
> + * Record that links two transactions, breaker and victim.
> + * See txm_cause_conflict for details.
> + */
> +struct tx_conflict_tracker {
> +	/** TX that aborts victim on commit. */
> +	struct txn *breaker;
> +	/** TX that aborts will be aborted on breaker's commit. */

3. 'aborts' seems like copy-paste artifact. I guess you meant:

    TX that will be aborted on breaker's commit

> +	struct txn *victim;
> +	/** Link in breaker->conflict_list. */
> +	struct rlist in_conflict_list;
> +	/** Link in victim->conflicted_by_list. */
> +	struct rlist in_conflicted_by_list;
> +};
> +
>  double too_long_threshold;
>  
>  /* Txn cache. */
> @@ -647,6 +699,32 @@ txn_journal_entry_new(struct txn *txn)
>  	return req;
>  }
>  
> +/**
> + * Handle conflict when @breaker transaction is prepared.
> + * The conflict is happened if @victim have read something that @breaker
> + * overwrites.
> + * If @victim is read-only or haven't made any changes, it should be send

4. send -> sent.

> + * to read view, in which is will not see @breaker.
> + * Otherwise @vistim must be marked as conflicted.

5. vistim -> victim.

Also please take a look at https://github.com/tarantool/tarantool/wiki/Code-review-procedure
for correct parameters referencing in the comments. Grep by '@a'.
Apply the fixes to this place and all the others in all the
commits.

> + */
> +static void
> +txn_handle_conflict(struct txn *breaker, struct txn *victim)
> +{
> +	assert(breaker->psn != 0);
> +	if (!victim->status != TXN_INPROGRESS) {

6. I assume the first '!' is redundant. And is even a bug.
(You wrote '!victim->status' instead of 'victim->status').

> +		/* Was conflicted by somebody else. */
> +		return;
> +	}
> +	if (stailq_empty(&victim->stmts)) {

7. Unfortunately, the statement list emptiness is not the
only sign of the transaction being read-only. The list can
be not empty, and still the transaction can be read-only.
For example, it had some statements, but rolled them back.
Or its statements were turned into NOP in a before_replace
trigger.

> +		/* Send to read view. */
> +		victim->rv_psn = breaker->psn;
> +		rlist_add_tail(&txm.read_view_txs, &victim->in_read_view_txs);
> +	} else {
> +		/* Mark as conflicted. */
> +		victim->status = TXN_CONFLICTED;

8. What if tx is conflicted, but the next thing it does
is rollback of all its statements, and then calls
box.commit()? In that case the transaction is empty, and
technically nothing should prevent it from committing. It
won't change anything already.

More general question - what if transaction is conflicted,
but rolled back the conflicted statements? And then tried
to commit the rest.

> +	}
> +}
> +
>  /*
>   * Prepare a transaction using engines.
>   */
> @@ -670,6 +748,17 @@ txn_prepare(struct txn *txn)
>  		diag_set(ClientError, ER_FOREIGN_KEY_CONSTRAINT);
>  		return -1;
>  	}
> +
> +	/*
> +	 * Somebody else has written some value that we have read.
> +	 * The transaction is not possible.
> +	 */
> +	if (txn->status == TXN_CONFLICTED || txn->status == TXN_IN_READ_VIEW) {

9. What if it is a read-only transaction? It shouldn't fail
on box.commit() I think. Because it didn't do anything. But
from this code it seems it will fail, because of TXN_IN_READ_VIEW.

Hard to tell, because this patch does not assign TXN_IN_READ_VIEW
anywhere.

> +		diag_set(ClientError, ER_TRANSACTION_CONFLICT);
> +		return -1;
> +	}
> +	assert(txn->status == TXN_INPROGRESS);
> +
>  	/*
>  	 * Perform transaction conflict resolution. Engine == NULL when
>  	 * we have a bunch of IPROTO_NOP statements.
> @@ -1184,10 +1293,55 @@ txn_on_yield(struct trigger *trigger, void *event)
>  void
>  tx_manager_init()
>  {
> -	(void)txm;
> +	rlist_create(&txm.read_view_txs);
>  }
>  
>  void
>  tx_manager_free()
>  {
>  }
> +
> +int
> +txm_cause_conflict(struct txn *breaker, struct txn *victim)
> +{
> +	struct tx_conflict_tracker *tracker = NULL;
> +	struct rlist *r1 = breaker->conflict_list.next;
> +	struct rlist *r2 = victim->conflicted_by_list.next;
> +	while (r1 != &breaker->conflict_list &&
> +	       r2 != &victim->conflicted_by_list) {
> +		tracker = rlist_entry(r1, struct tx_conflict_tracker,
> +				      in_conflict_list);
> +		assert(tracker->breaker == breaker);
> +		if (tracker->victim == victim)
> +			break;
> +		tracker = rlist_entry(r2, struct tx_conflict_tracker,
> +				      in_conflicted_by_list);
> +		assert(tracker->victim == victim);
> +		if (tracker->breaker == breaker)
> +			break;
> +		tracker = NULL;
> +		r1 = r1->next;
> +		r2 = r2->next;
> +	}
> +	if (tracker != NULL) {
> +		/* Move to the beginning of a list
> +		 * for a case of subsequent lookups */

10. Lets use the correct code style and put /* and */
on separate lines. Also put a dot in the end of the
sentence.

> +		rlist_del(&tracker->in_conflict_list);
> +		rlist_del(&tracker->in_conflicted_by_list);
> +	} else {
> +		size_t size;
> +		tracker = region_alloc_object(&victim->region,
> +					      struct tx_conflict_tracker,
> +					      &size);
> +		if (tracker == NULL) {
> +			diag_set(OutOfMemory, size, "tx region",
> +				 "conflict_tracker");
> +			return -1;
> +		}
> +		tracker->breaker = breaker;
> +		tracker->victim = victim;
> +	}
> +	rlist_add(&breaker->conflict_list, &tracker->in_conflict_list);
> +	rlist_add(&victim->conflicted_by_list, &tracker->in_conflicted_by_list);
> +	return 0;
> +}
> diff --git a/src/box/txn.h b/src/box/txn.h
> index b5b94fd..03ccb76 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -717,6 +738,17 @@ tx_manager_init();
>  void
>  tx_manager_free();
>  
> +/**
> + * Notify TX manager that if transaction @breaker is committed then the
> + * transactions @victim must be aborted due to conflict.

11. transactions -> transaction.

> + * For example: there's two rw transaction in progress, one have read
> + * some value while the second is about to overwrite it. If the second
> + * is committed first, the first must be aborted.
> + * @return 0 on success, -1 on memory error.
> + */
> +int
> +txm_cause_conflict(struct txn *breaker, struct txn *victim);
> +
>  #if defined(__cplusplus)
>  } /* extern "C" */
>  #endif /* defined(__cplusplus) */
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story Aleksandr Lyapunov
@ 2020-07-16  0:20   ` Vladislav Shpilevoy
  2020-07-17  6:16     ` Aleksandr Lyapunov
  2020-07-16 22:25   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16  0:20 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

See 18 comments below.

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> TXM story is a part of a history of a value in space.
> It's a story about a tuple, from the point it was added to space
> to the point when it was deleted from the space.
> All stories are linked into a list of stories of the same key of
> each index.
> 
> Part of #4897
> ---
>  src/box/txn.c | 798 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/box/txn.h | 193 ++++++++++++++
>  2 files changed, 991 insertions(+)
> 
> diff --git a/src/box/txn.c b/src/box/txn.c
> index ba81b58..4c46b60 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -37,6 +37,28 @@
>  #include "xrow.h"
>  #include "errinj.h"
>  #include "iproto_constants.h"
> +#include "small/mempool.h"
> +
> +static uint32_t
> +txm_story_key_hash(const struct tuple *a)
> +{
> +	uintptr_t u = (uintptr_t)a;
> +	if (sizeof(uintptr_t) <= sizeof(uint32_t))
> +		return u;
> +	else
> +		return u ^ (u >> 32);
> +}
> +
> +#define mh_name _history

1. Why is the hash called 'history', but the objects are 'story'?
Or as 'history' you mean a list of stories of the same key in one
index?

> +#define mh_key_t struct tuple *
> +#define mh_node_t struct txm_story *
> +#define mh_arg_t int
> +#define mh_hash(a, arg) (txm_story_key_hash((*(a))->tuple))
> +#define mh_hash_key(a, arg) (txm_story_key_hash(a))
> +#define mh_cmp(a, b, arg) ((*(a))->tuple != (*(b))->tuple)
> +#define mh_cmp_key(a, b, arg) ((a) != (*(b))->tuple)
> +#define MH_SOURCE
> +#include "salad/mhash.h"
>  
>  struct tx_manager
>  {
> @@ -48,6 +70,14 @@ struct tx_manager
>  	 * so the list is ordered by rv_psn.
>  	 */
>  	struct rlist read_view_txs;
> +	/** Mempools for tx_story objects with difference index count. */

2. difference -> different.

> +	struct mempool txm_story_pool[BOX_INDEX_MAX];
> +	/** Hash table tuple -> txm_story of that tuple. */
> +	struct mh_history_t *history;
> +	/** List of all txm_story objects. */
> +	struct rlist all_stories;
> +	/** Iterator that sequentially traverses all txm_story objects. */
> +	struct rlist *traverse_all_stories;

3. The iterator is initialized and assigned to something, but is
never used for anything meaningful. The same for all_stories list.
At least on top of this commit.

>  };
>  
>  /** That's a definition, see declaration for description. */
> @@ -1294,11 +1327,23 @@ void
>  tx_manager_init()
>  {
>  	rlist_create(&txm.read_view_txs);
> +	for (size_t i = 0; i < BOX_INDEX_MAX; i++) {
> +		size_t item_size = sizeof(struct txm_story) +
> +			i * sizeof(struct txm_story_link);
> +		mempool_create(&txm.txm_story_pool[i],
> +			       cord_slab_cache(), item_size);

4. Is it correct you also create a pool for 0 index count?
Why?

Probably would be much simpler to just create one mempool with
objects having BOX_INDEX_MAX records in them. It would also solve
the DDL problem. At least partially.

Can the story objects be allocated on txn's region? AFAIU,
stories are created by transactions, and can't outlive their
creator.

> +	}
> +	txm.history = mh_history_new();
> +	rlist_create(&txm.all_stories);
> +	txm.traverse_all_stories = &txm.all_stories;
>  }
>  
>  void
>  tx_manager_free()
>  {
> +	mh_history_delete(txm.history);
> +	for (size_t i = 0; i < BOX_INDEX_MAX; i++)
> +		mempool_destroy(&txm.txm_story_pool[i]);
>  }
>  
>  int
> @@ -1345,3 +1390,756 @@ txm_cause_conflict(struct txn *breaker, struct txn *victim)
>  	rlist_add(&victim->conflicted_by_list, &tracker->in_conflicted_by_list);
>  	return 0;
>  }
> +
> +/**
> + * Creates new story and links it with the @tuple.
> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new(struct space *space, struct tuple *tuple)
> +{
> +	assert(!tuple->is_dirty);
> +	uint32_t index_count = space->index_count;
> +	assert(index_count < BOX_INDEX_MAX);
> +	struct mempool *pool = &txm.txm_story_pool[index_count];
> +	struct txm_story *story = (struct txm_story *)mempool_alloc(pool);
> +	if (story == NULL) {
> +		size_t item_size = sizeof(struct txm_story) +
> +			index_count * sizeof(struct txm_story_link);
> +		diag_set(OutOfMemory, item_size, "tx_manager", "tx story");

5. Diag OOM takes size, allocator, and variable name. Here the
allocator is "mempool_alloc", and the variable name is "story".
Lets be consistent with other diag_sets. The same for all the
other new places.

> +		return story;

6. I know story is NULL, but why not to return NULL explicitly?
This looks confusing.

> +	}
> +	story->tuple = tuple;

7. Nit: would be better to call tuple_ref() right near this assignment.
For the sake of OOM handler below, worth moving this line down to the
tuple_ref() call below.

> +
> +	const struct txm_story **put_story = (const struct txm_story **)&story;
> +	struct txm_story **empty = NULL;
> +	mh_int_t pos = mh_history_put(txm.history, put_story, &empty, 0);
> +	if (pos == mh_end(txm.history)) {
> +		mempool_free(pool, story);
> +		diag_set(OutOfMemory, pos + 1,
> +			 "tx_manager", "tx history hash table");
> +		return NULL;
> +	}
> +	tuple->is_dirty = true;
> +	tuple_ref(tuple);
> +
> +	story->index_count = index_count;
> +	story->add_stmt = NULL;
> +	story->add_psn = 0;
> +	story->del_stmt = NULL;
> +	story->del_psn = 0;
> +	rlist_create(&story->reader_list);
> +	rlist_add_tail(&txm.all_stories, &story->in_all_stories);
> +	memset(story->link, 0, sizeof(story->link[0]) * index_count);
> +	return story;
> +}
> +
> +static void
> +txm_story_delete(struct txm_story *story);
> +
> +/**
> + * Creates new story of a @tuple that was added by @stmt.

8. Comments for functions should use imperative mood. The same
in other similar places.

> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new_add_stmt(struct tuple *tuple, struct txn_stmt *stmt)
> +{
> +	struct txm_story *res = txm_story_new(stmt->space, tuple);
> +	if (res == NULL)
> +		return NULL;
> +	res->add_stmt = stmt;
> +	assert(stmt->add_story == NULL);
> +	stmt->add_story = res;
> +	return res;
> +}
> +
> +/**
> + * Creates new story of a @tuple that was deleted by @stmt.
> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new_del_stmt(struct tuple *tuple, struct txn_stmt *stmt)
> +{
> +	struct txm_story *res = txm_story_new(stmt->space, tuple);
> +	if (res == NULL)
> +		return NULL;
> +	res->del_stmt = stmt;
> +	assert(stmt->del_story == NULL);
> +	stmt->del_story = res;
> +	return res;
> +}
> +
> +/**
> + * Undo txm_story_new_add_stmt.
> + */
> +static void
> +txm_story_delete_add_stmt(struct txm_story *story)
> +{
> +	story->add_stmt->add_story = NULL;
> +	story->add_stmt = NULL;
> +	txm_story_delete(story);
> +}
> +
> +/**
> + * Undo txm_story_new_del_stmt.
> + */
> +static void
> +txm_story_delete_del_stmt(struct txm_story *story)
> +{
> +	story->del_stmt->del_story = NULL;
> +	story->del_stmt = NULL;
> +	txm_story_delete(story);
> +}
> +
> +
> +/**
> + * Find a story of a @tuple. The story expected to be present (assert).
> + */
> +static struct txm_story *
> +txm_story_get(struct tuple *tuple)
> +{
> +	assert(tuple->is_dirty);
> +
> +	mh_int_t pos = mh_history_find(txm.history, tuple, 0);
> +	assert(pos != mh_end(txm.history));
> +	return *mh_history_node(txm.history, pos);
> +}
> +
> +/**
> + * Get the older tuple, extracting it from older story if necessary.
> + */
> +static struct tuple *
> +txm_story_older_tuple(struct txm_story_link *link)
> +{
> +	return link->older.is_story ? link->older.story->tuple
> +				    : link->older.tuple;

9. Why so? If older.is_story, it means it stores a story,
not a tuple. But you return vice versa.

10. Why do you need all the new structs in txn.h if they aren't
used out of txn.c anyway. At least some of them.

> +}
> +
> +/**
> + * Link a @story with older story in @index (in both directions).
> + */
> +static void
> +txm_story_link_story(struct txm_story *story, struct txm_story *older_story,
> +		     uint32_t index)

11. What is index? Index id? Index ordinal number?

> +{
> +	assert(older_story != NULL);
> +	struct txm_story_link *link = &story->link[index];
> +	/* Must be unlinked. */
> +	assert(!link->older.is_story);
> +	assert(link->older.tuple == NULL);
> +	link->older.is_story = true;
> +	link->older.story = older_story;
> +	older_story->link[index].newer_story = story;
> +}
> +
> +/**
> + * Link a @story with older tuple in @index. In case if the tuple is dirty -
> + * find and link with the corresponding story.
> + */
> +static void
> +txm_story_link_tuple(struct txm_story *story, struct tuple *older_tuple,
> +                     uint32_t index)
> +{
> +	struct txm_story_link *link = &story->link[index];
> +	/* Must be unlinked. */
> +	assert(!link->older.is_story);
> +	assert(link->older.tuple == NULL);
> +	if (older_tuple == NULL)
> +		return;
> +	if (older_tuple->is_dirty) {
> +		txm_story_link_story(story, txm_story_get(older_tuple), index);
> +		return;
> +	}
> +	link->older.tuple = older_tuple;
> +	tuple_ref(link->older.tuple);
> +}
> +
> +/**
> + * Unlink a @story with older story/tuple in @index.
> + */
> +static void
> +txm_story_unlink(struct txm_story *story, uint32_t index)
> +{
> +	struct txm_story_link *link = &story->link[index];
> +	if (link->older.is_story) {
> +		link->older.story->link[index].newer_story = NULL;
> +	} else if (link->older.tuple != NULL) {
> +		tuple_unref(link->older.tuple);
> +		link->older.tuple = NULL;
> +	}
> +	link->older.is_story = false;
> +	link->older.tuple = NULL;
> +}
> +
> +/**
> + * Check if a @story is visible for transaction @txn. Return visible tuple to
> + * @visible_tuple (can be set to NULL).
> + * @param is_prepared_ok - whether prepared (not committed) change is acceptable.

12. Would be easier to underand if you would reverse the flag and use the
existing concept 'read committed' for it. So the variable meaning would
be reversed and it would be named like 'do_read_committed_only'.

> + * @param own_change - return true if the change was made by @txn itself.

13. own_change -> is_own_change. In the code too. Check all the other
new flags too.

> + * @return true if the story is visible, false otherwise.
> + */
> +static bool
> +txm_story_is_visible(struct txm_story *story, struct txn *txn,
> +		     struct tuple **visible_tuple, bool is_prepared_ok,
> +		     bool *own_change)
> +{
> +	*own_change = false;
> +	*visible_tuple = NULL;
> +	struct txn_stmt *dels = story->del_stmt;
> +	while (dels != NULL) {
> +		if (dels->txn == txn) {
> +			/* Tuple is deleted by us (@txn). */
> +			*own_change = true;
> +			return true;
> +		}
> +		dels = dels->next_in_del_list;
> +	}
> +	if (is_prepared_ok && story->del_psn != 0 &&
> +	    (txn->rv_psn == 0 || story->del_psn < txn->rv_psn)) {
> +		/* Tuple is deleted by prepared TX. */

14. How does it follow from the condition? I don't understand.
You check some flags for the current transaction, which is
clearly not prepared. And there are no our own del stmt. So
it means you somehow made this conclusion from story->del_psn != 0
and story->del_psn < txn->rv_psn. I don't understand how does
it mean the author of the story is prepared but not committed.
And yet below the condition is almost the same, but means committed.

The same for prepared not committed add_stmt.

> +		return true;
> +	}
> +	if (story->del_psn != 0 && story->del_stmt == NULL &&
> +	    (txn->rv_psn == 0 || story->del_psn < txn->rv_psn)) {
> +		/* Tuple is deleted by committed TX. */
> +		return true;
> +	}
> +
> +	if (story->add_stmt != NULL && story->add_stmt->txn == txn) {
> +		/* Tuple is added by us (@txn). */
> +		*visible_tuple = story->tuple;
> +		*own_change = true;
> +		return true;

15. What if this transaction added the tuple, and then another transaction
also added a tuple with the same key. Shouldn't there by a cycle like for
dels?

> +	}
> +	if (is_prepared_ok && story->add_psn != 0 &&
> +	    (txn->rv_psn == 0 || story->add_psn < txn->rv_psn)) {
> +		/* Tuple is added by another prepared TX. */
> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	if (story->add_psn != 0 && story->add_stmt == NULL &&
> +		(txn->rv_psn == 0 || story->add_psn < txn->rv_psn)) {

16. Bad indentation.

> +		/* Tuple is added by committed TX. */
> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	if (story->add_psn == 0 && story->add_stmt == NULL) {
> +		/* added long time ago. */

17. Did't understand the comment nor the condition.

> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/**
> + * Temporary (allocated on region) struct that stores a conflicting TX.
> + */
> +struct txn_conflict
> +{
> +	struct txn *other_txn;
> +	struct txn_conflict *next;

18. What are these members?

I am going to stop here. The patch is very interesting, but I need to get
some sleep. Will return to it afterwards.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager
  2020-07-15 16:04   ` Nikita Pettik
@ 2020-07-16  8:17     ` Aleksandr Lyapunov
  0 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-16  8:17 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

I'm not sure.. Actually I believe one there will be only one 
cross-engine transaction engine..
That's why I made it outstanding. But actually I'm note sure it's possible.
What do you think is better? use memtx specific name and if/when it'll 
be possible
to used in both engines - rename it OR use general name and rename to memtx
specific if/when it becomes obvious that it's not possible?

On 15.07.2020 19:04, Nikita Pettik wrote:
> On 15 Jul 16:55, Aleksandr Lyapunov wrote:
>> Apart from other vinyl objects that are named with "vy_" prefix,
>> its transaction manager (tx_manager) have no such prefix.
>> It should have in order to avoid conflicts with global tx manager.
>>
>> Needed for #4897
>> ---
> LGTM. The only concern I have now - mb we'd better call
> new TX manager in memtx not tx_manager, but mem_tx_manager or
> memtx_tx_manager? I mean tx_ prefix is more general than memtx_..
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple Aleksandr Lyapunov
@ 2020-07-16 14:27   ` Nikita Pettik
  0 siblings, 0 replies; 41+ messages in thread
From: Nikita Pettik @ 2020-07-16 14:27 UTC (permalink / raw)
  To: Aleksandr Lyapunov; +Cc: tarantool-patches

On 15 Jul 16:55, Aleksandr Lyapunov wrote:
> data_offset member of tuple is uint16_t now. At the same time
> this field is calculated from field_map_size which is uint32_t.
> That could lead to overflows and crashes.
> 
> Fixes #5084
> ---

Pushed to master, 2.4, 2.3 and backported to 1.10 (without multikey
test). Changelogs are updated correspondingly. 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story Aleksandr Lyapunov
  2020-07-16  0:20   ` Vladislav Shpilevoy
@ 2020-07-16 22:25   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16 22:25 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

Here I continue the review I didn't finish yesterday.

The patch has changed on the branch. I took the version
I just fetched from it and pasted below among with my
comments inlined.

See 24 comments below. But tbh I have many many more. The patch
broke my brain. I fill physical pain in my head, and I didn't even
review it whole. This patch is really really hard to understad.

>     txm: introduce txm_story
>     
>     TXM story is a part of a history of a value in space.
>     It's a story about a tuple, from the point it was added to space
>     to the point when it was deleted from the space.
>     All stories are linked into a list of stories of the same key of
>     each index.
>     
>     Part of #4897
> 
> diff --git a/src/box/txn.c b/src/box/txn.c
> index bbea5e8ac..3d6180631 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -37,6 +37,28 @@
>  #include "xrow.h"
>  #include "errinj.h"
>  #include "iproto_constants.h"
> +#include "small/mempool.h"
> +
> +static uint32_t
> +txm_story_key_hash(const struct tuple *a)
> +{
> +	uintptr_t u = (uintptr_t)a;
> +	if (sizeof(uintptr_t) <= sizeof(uint32_t))
> +		return u;
> +	else
> +		return u ^ (u >> 32);
> +}
> +
> +#define mh_name _history
> +#define mh_key_t struct tuple *
> +#define mh_node_t struct txm_story *
> +#define mh_arg_t int
> +#define mh_hash(a, arg) (txm_story_key_hash((*(a))->tuple))
> +#define mh_hash_key(a, arg) (txm_story_key_hash(a))
> +#define mh_cmp(a, b, arg) ((*(a))->tuple != (*(b))->tuple)
> +#define mh_cmp_key(a, b, arg) ((a) != (*(b))->tuple)
> +#define MH_SOURCE
> +#include "salad/mhash.h"
>  
>  struct tx_manager
>  {
> @@ -48,6 +70,14 @@ struct tx_manager
>  	 * so the list is ordered by rv_psn.
>  	 */
>  	struct rlist read_view_txs;
> +	/** Mempools for tx_story objects with difference index count. */
> +	struct mempool txm_story_pool[BOX_INDEX_MAX];
> +	/** Hash table tuple -> txm_story of that tuple. */
> +	struct mh_history_t *history;
> +	/** List of all txm_story objects. */
> +	struct rlist all_stories;
> +	/** Iterator that sequentially traverses all txm_story objects. */
> +	struct rlist *traverse_all_stories;
>  };
>  
>  /** That's a definition, see declaration for description. */
> @@ -146,6 +176,9 @@ txn_stmt_new(struct region *region)
>  	stmt->space = NULL;
>  	stmt->old_tuple = NULL;
>  	stmt->new_tuple = NULL;
> +	stmt->add_story = NULL;
> +	stmt->del_story = NULL;
> +	stmt->next_in_del_list = NULL;
>  	stmt->engine_savepoint = NULL;
>  	stmt->row = NULL;
>  	stmt->has_triggers = false;
> @@ -1295,11 +1328,23 @@ void
>  tx_manager_init()
>  {
>  	rlist_create(&txm.read_view_txs);
> +	for (size_t i = 0; i < BOX_INDEX_MAX; i++) {
> +		size_t item_size = sizeof(struct txm_story) +
> +			i * sizeof(struct txm_story_link);
> +		mempool_create(&txm.txm_story_pool[i],
> +			       cord_slab_cache(), item_size);
> +	}
> +	txm.history = mh_history_new();
> +	rlist_create(&txm.all_stories);
> +	txm.traverse_all_stories = &txm.all_stories;
>  }
>  
>  void
>  tx_manager_free()
>  {
> +	mh_history_delete(txm.history);
> +	for (size_t i = 0; i < BOX_INDEX_MAX; i++)
> +		mempool_destroy(&txm.txm_story_pool[i]);
>  }
>  
>  int
> @@ -1346,3 +1391,759 @@ txm_cause_conflict(struct txn *breaker, struct txn *victim)
>  	rlist_add(&victim->conflicted_by_list, &tracker->in_conflicted_by_list);
>  	return 0;
>  }
> +
> +/**
> + * Creates new story and links it with the @tuple.
> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new(struct space *space, struct tuple *tuple)
> +{
> +	assert(!tuple->is_dirty);
> +	uint32_t index_count = space->index_count;
> +	assert(index_count < BOX_INDEX_MAX);
> +	struct mempool *pool = &txm.txm_story_pool[index_count];
> +	struct txm_story *story = (struct txm_story *)mempool_alloc(pool);
> +	if (story == NULL) {
> +		size_t item_size = sizeof(struct txm_story) +
> +			index_count * sizeof(struct txm_story_link);
> +		diag_set(OutOfMemory, item_size, "tx_manager", "tx story");
> +		return story;
> +	}
> +	story->tuple = tuple;
> +
> +	const struct txm_story **put_story = (const struct txm_story **)&story;
> +	struct txm_story **empty = NULL;
> +	mh_int_t pos = mh_history_put(txm.history, put_story, &empty, 0);
> +	if (pos == mh_end(txm.history)) {
> +		mempool_free(pool, story);
> +		diag_set(OutOfMemory, pos + 1,
> +			 "tx_manager", "tx history hash table");
> +		return NULL;
> +	}
> +	tuple->is_dirty = true;
> +	tuple_ref(tuple);
> +
> +	story->index_count = index_count;
> +	story->add_stmt = NULL;
> +	story->add_psn = 0;
> +	story->del_stmt = NULL;
> +	story->del_psn = 0;
> +	rlist_create(&story->reader_list);
> +	rlist_add_tail(&txm.all_stories, &story->in_all_stories);
> +	memset(story->link, 0, sizeof(story->link[0]) * index_count);
> +	return story;
> +}
> +
> +static void
> +txm_story_delete(struct txm_story *story);
> +
> +/**
> + * Creates new story of a @tuple that was added by @stmt.
> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new_add_stmt(struct tuple *tuple, struct txn_stmt *stmt)
> +{
> +	struct txm_story *res = txm_story_new(stmt->space, tuple);
> +	if (res == NULL)
> +		return NULL;
> +	res->add_stmt = stmt;
> +	assert(stmt->add_story == NULL);
> +	stmt->add_story = res;
> +	return res;
> +}
> +
> +/**
> + * Creates new story of a @tuple that was deleted by @stmt.
> + * @return story on success, NULL on error (diag is set).
> + */
> +static struct txm_story *
> +txm_story_new_del_stmt(struct tuple *tuple, struct txn_stmt *stmt)
> +{
> +	struct txm_story *res = txm_story_new(stmt->space, tuple);
> +	if (res == NULL)
> +		return NULL;
> +	res->del_stmt = stmt;
> +	assert(stmt->del_story == NULL);
> +	stmt->del_story = res;
> +	return res;
> +}
> +
> +/**
> + * Undo txm_story_new_add_stmt.
> + */
> +static void
> +txm_story_delete_add_stmt(struct txm_story *story)
> +{
> +	story->add_stmt->add_story = NULL;
> +	story->add_stmt = NULL;
> +	txm_story_delete(story);
> +}
> +
> +/**
> + * Undo txm_story_new_del_stmt.
> + */
> +static void
> +txm_story_delete_del_stmt(struct txm_story *story)
> +{
> +	story->del_stmt->del_story = NULL;
> +	story->del_stmt = NULL;
> +	txm_story_delete(story);
> +}
> +
> +
> +/**
> + * Find a story of a @tuple. The story expected to be present (assert).
> + */
> +static struct txm_story *
> +txm_story_get(struct tuple *tuple)
> +{
> +	assert(tuple->is_dirty);
> +
> +	mh_int_t pos = mh_history_find(txm.history, tuple, 0);
> +	assert(pos != mh_end(txm.history));
> +	return *mh_history_node(txm.history, pos);
> +}
> +
> +/**
> + * Get the older tuple, extracting it from older story if necessary.
> + */
> +static struct tuple *
> +txm_story_older_tuple(struct txm_story_link *link)
> +{
> +	return link->older.is_story ? link->older.story->tuple
> +				    : link->older.tuple;
> +}
> +
> +/**
> + * Link a @story with older story in @index (in both directions).
> + */
> +static void
> +txm_story_link_story(struct txm_story *story, struct txm_story *older_story,
> +		     uint32_t index)
> +{
> +	assert(older_story != NULL);
> +	struct txm_story_link *link = &story->link[index];
> +	/* Must be unlinked. */
> +	assert(!link->older.is_story);
> +	assert(link->older.tuple == NULL);
> +	link->older.is_story = true;
> +	link->older.story = older_story;
> +	older_story->link[index].newer_story = story;
> +}
> +
> +/**
> + * Link a @story with older tuple in @index. In case if the tuple is dirty -
> + * find and link with the corresponding story.
> + */
> +static void
> +txm_story_link_tuple(struct txm_story *story, struct tuple *older_tuple,
> +                     uint32_t index)
> +{
> +	struct txm_story_link *link = &story->link[index];
> +	/* Must be unlinked. */
> +	assert(!link->older.is_story);
> +	assert(link->older.tuple == NULL);
> +	if (older_tuple == NULL)
> +		return;
> +	if (older_tuple->is_dirty) {
> +		txm_story_link_story(story, txm_story_get(older_tuple), index);
> +		return;
> +	}
> +	link->older.tuple = older_tuple;
> +	tuple_ref(link->older.tuple);
> +}
> +
> +/**
> + * Unlink a @story with older story/tuple in @index.
> + */
> +static void
> +txm_story_unlink(struct txm_story *story, uint32_t index)
> +{
> +	struct txm_story_link *link = &story->link[index];
> +	if (link->older.is_story) {
> +		link->older.story->link[index].newer_story = NULL;
> +	} else if (link->older.tuple != NULL) {
> +		tuple_unref(link->older.tuple);
> +		link->older.tuple = NULL;
> +	}
> +	link->older.is_story = false;
> +	link->older.tuple = NULL;
> +}
> +
> +/**
> + * Check if a @story is visible for transaction @txn. Return visible tuple to
> + * @visible_tuple (can be set to NULL).
> + * @param is_prepared_ok - whether prepared (not committed) change is acceptable.
> + * @param own_change - return true if the change was made by @txn itself.
> + * @return true if the story is visible, false otherwise.
> + */
> +static bool
> +txm_story_is_visible(struct txm_story *story, struct txn *txn,
> +		     struct tuple **visible_tuple, bool is_prepared_ok,
> +		     bool *own_change)

1. The fact that this functions is called 'is_visible', but returns 3!!!
values, including one tuple, is very wrong. The name is simply misleading.
It does not just check something and returns bool. It looks for something,
does some heavy actions. 'is' definitely does not fit here.

> +{
> +	*own_change = false;
> +	*visible_tuple = NULL;
> +
> +	int64_t rv_psn = INT64_MAX;
> +	if (txn != NULL && txn->rv_psn != 0)
> +		rv_psn = txn->rv_psn;
> +
> +	struct txn_stmt *dels = story->del_stmt;
> +	while (dels != NULL) {
> +		if (dels->txn == txn) {
> +			/* Tuple is deleted by us (@txn). */
> +			*own_change = true;
> +			return true;
> +		}
> +		dels = dels->next_in_del_list;
> +	}
> +	if (is_prepared_ok && story->del_psn != 0 && story->del_psn < rv_psn) {
> +		/* Tuple is deleted by prepared TX. */

2. What if story->del_stmt is NULL? It would mean the transaction is already
committed. But still this condition won't pass. The one below will pass,
but looks strange. Like if there are transactions, which need *only* prepared,
but not committed tuples. I would propose this:

	if (story->del_psn != 0 && story->del_psn < rv_psn) {
		/* Check if already committed. */
		if (story->del_stmt == NULL)
			return true;
		/* The stmt is prepared, because psn is set, and the stmt is still here. */
		if (is_prepared_ok)
			return true;
	}

Also I don't really like that we need to check committed/prepared/in-progress
state by these indirect attributes. Would be better to have a state or flags in
the story object.

> +		return true;
> +	}
> +	if (story->del_psn != 0 && story->del_stmt == NULL &&
> +	    story->del_psn < rv_psn) {
> +		/* Tuple is deleted by committed TX. */
> +		return true;
> +	}
> +
> +	if (story->add_stmt != NULL && story->add_stmt->txn == txn) {
> +		/* Tuple is added by us (@txn). */
> +		*visible_tuple = story->tuple;
> +		*own_change = true;
> +		return true;
> +	}
> +	if (is_prepared_ok && story->add_psn != 0 && story->add_psn < rv_psn) {

3. The same problem as for the comment above.

These checks for del and add stmt could be generalized if you
would store psn and stmt as a struct. Like this:

	struct txm_story_stmt {
		int64_t psn;
		struct txn_stmt *txn_stmt;
	};

Then you add two of them to the story object:

	struct txm_story {
		struct txm_story_stmt add_stmt;
		struct txm_story_stmt del_stmt;
		...
	};

Then you can have functions with meaningful names to do the
checks you do here.

	static inline bool
	txm_story_stmt_is_prepared(struct txm_story_stmt *stmt)
	{
		return stmt->psn > 0 && stmt->txn_stmt != NULL;
	}

	static inline bool
	txm_story_stmt_is_committed(struct txm_story_stmt *stmt)
	{
		return stmt->psn > 0 && stmt->txn_stmt == NULL;
	}

And more. And use them like this:

	static bool
	txm_story_is_visible(...)
	{
		...
		if (txm_story_stmt_is_committed(&story->del_stmt))
			return true;
		if (txm_story_stmt_is_committed(&story->add_stmt))
			return true;
		if (!is_prepared_ok)
			return false;
		if (txm_story_stmt_is_prepared(&story->del_stmt))
			return true;
		if (txm_story_stmt_is_prepared(&story->add_stmt))
			return true;
		...
	}

> +		/* Tuple is added by another prepared TX. */
> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	if (story->add_psn != 0 && story->add_stmt == NULL &&
> +	    story->add_psn < rv_psn) {
> +		/* Tuple is added by committed TX. */
> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	if (story->add_psn == 0 && story->add_stmt == NULL) {
> +		/* added long time ago. */
> +		*visible_tuple = story->tuple;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/**
> + * Temporary (allocated on region) struct that stores a conflicting TX.
> + */
> +struct txn_conflict
> +{
> +	struct txn *other_txn;
> +	struct txn_conflict *next;
> +};
> +
> +/**
> + * Save @other_txn in list with head @coflicts_head. New list node is allocated
> + * on @region.
> + * @return 0 on success, -1 on memory error.
> + */
> +static int
> +txm_save_conflict(struct txn *other_txn, struct txn_conflict **coflicts_head,

4. coflicts_head -> conflicts_head. In the comment too.

> +		  struct region *region)
> +{
> +	size_t err_size;
> +	struct txn_conflict *next_conflict;
> +	next_conflict = region_alloc_object(region, struct txn_conflict,
> +					    &err_size);
> +	if (next_conflict == NULL) {
> +		diag_set(OutOfMemory, err_size, "txn_region", "txn conflict");
> +		return -1;
> +	}
> +	next_conflict->other_txn = other_txn;
> +	next_conflict->next = *coflicts_head;
> +	*coflicts_head = next_conflict;

5. Why can't you just return the new head? Or NULL in case of OOM. This
is how we usually do functions which allocate something.

> +	return 0;
> +}
> +
> +/**
> + * Scan a history starting by @stmt statement in @index for a visible tuple
> + * (prepared suits), returned via @visible_replaced.
> + * Collect a list of transactions that will abort current transaction if they
> + * are committed.
> + *
> + * @return 0 on success, -1 on memory error.
> + */
> +static int
> +txm_story_find_visible(struct txn_stmt *stmt, struct txm_story *story,

6. If a function is a method of a struct, it should take the struct in the
first argument.

> +		       uint32_t index, struct tuple **visible_replaced,

7. Why is it called visible_replaced instead of just 'result'? What 'replaced'
means?

> +		       struct txn_conflict **collected_conflicts,
> +		       struct region *region)
> +{
> +	while (true) {
> +		if (!story->link[index].older.is_story) {
> +			/*
> +			 * the tuple is so old that we doesn't

8. the -> The. doesn't -> don't.

> +			 * know its story.
> +			 */
> +			*visible_replaced = story->link[index].older.tuple;
> +			assert(*visible_replaced == NULL ||
> +			       !(*visible_replaced)->is_dirty);
> +			break;
> +		}
> +		story = story->link[index].older.story;
> +		bool unused;
> +		if (txm_story_is_visible(story, stmt->txn, visible_replaced,
> +					 true, &unused))
> +			break;
> +
> +		/*
> +		 * We skip the story but once the story is committed
> +		 * before out TX that may cause conflict.
> +		 * The conflict will be unavoidable if this statement
> +		 * relies on old_tuple. If not (it's a replace),
> +		 * the conflict will take place only for secondary
> +		 * index if the story will not be overwritten in primary
> +		 * index.
> +		 */
> +		bool cross_conflict = false;
> +		if (stmt->does_require_old_tuple) {
> +			cross_conflict = true;
> +		} else if (index != 0) {
> +			struct txm_story *look_up = story;
> +			cross_conflict = true;
> +			while (look_up->link[0].newer_story != NULL) {
> +				struct txm_story *over;
> +				over = look_up->link[0].newer_story;
> +				if (over->add_stmt->txn == stmt->txn) {
> +					cross_conflict = false;
> +					break;
> +				}
> +				look_up = over;
> +			}
> +		}
> +		if (cross_conflict) {
> +			if (txm_save_conflict(story->add_stmt->txn,
> +					      collected_conflicts,
> +					      region) != 0)
> +				return -1;
> +
> +		}
> +	}
> +	return 0;
> +}
> +
> +int
> +txm_history_add_stmt(struct txn_stmt *stmt, struct tuple *old_tuple,
> +		     struct tuple *new_tuple, enum dup_replace_mode mode,
> +		     struct tuple **result)
> +{
> +	assert(new_tuple != NULL || old_tuple != NULL);
> +	struct space *space = stmt->space;
> +	struct txm_story *add_story = NULL;
> +	uint32_t add_story_linked = 0;
> +	struct txm_story *del_story = NULL;
> +	bool del_story_created = false;
> +	struct region *region = &stmt->txn->region;
> +	size_t region_svp = region_used(region);
> +
> +	/*
> +	 * List of transactions that will conflict us once one of them
> +	 * become committed.
> +	 */
> +	struct txn_conflict *collected_conflicts = NULL;
> +
> +	/* Create add_story if necessary. */
> +	if (new_tuple != NULL) {
> +		add_story = txm_story_new_add_stmt(new_tuple, stmt);
> +		if (add_story == NULL)
> +			goto fail;
> +
> +		for (uint32_t i = 0; i < space->index_count; i++) {
> +			struct tuple *replaced;
> +			struct index *index = space->index[i];
> +			if (index_replace(index, NULL, new_tuple,
> +					  DUP_REPLACE_OR_INSERT,
> +					  &replaced) != 0)

9. Why does this function change indexes? Why txn.c accesses index API
directly in the first place? The function is called txm_history_add_stmt,
but actually it behaves like space:replace().

> +				goto fail;
> +			txm_story_link_tuple(add_story, replaced, i);
> +			add_story_linked++;
> +
> +			struct tuple *visible_replaced = NULL;
> +			if (txm_story_find_visible(stmt, add_story, i,
> +						   &visible_replaced,
> +						   &collected_conflicts,
> +						   region) != 0)
> +				goto fail;
> +
> +			uint32_t errcode;
> +			errcode = replace_check_dup(old_tuple, visible_replaced,
> +						    i == 0 ? mode : DUP_INSERT);
> +			if (errcode != 0) {
> +				struct space *sp = stmt->space;

10. You already have 'space' variable, it stores the same value.

> +				if (sp != NULL)
> +					diag_set(ClientError, errcode,
> +						 sp->index[i]->def->name,

11. You already have 'index' variable. You can use 'index->def->name'.

> +						 space_name(sp));
> +				goto fail;
> +			}
> +
> +			if (i == 0) {
> +				old_tuple = visible_replaced;
> +			}

12. This cycle's body remained a complete mistery to me, in general. Every
line raises questions.

> +		}
> +	}
> +
> +	/* Create del_story if necessary. */
> +	struct tuple *del_tuple = NULL;
> +	if (new_tuple != NULL) {

13. You have exactly the same condition 'new_tuple != NULL' a few lines above.
Why can't the code below be inside this diff?

> +		struct txm_story_link *link = &add_story->link[0];
> +		if (link->older.is_story) {
> +			del_story = link->older.story;
> +			del_tuple = del_story->tuple;
> +		} else {
> +			del_tuple = link->older.tuple;
> +		}
> +	} else {
> +		del_tuple = old_tuple;

14. I don't understand why del_tuple is not equal to old_tuple
always.

> +	}
> +	if (del_tuple != NULL && del_story == NULL) {
> +		if (del_tuple->is_dirty) {
> +			del_story = txm_story_get(del_tuple);
> +		} else {
> +			del_story = txm_story_new_del_stmt(del_tuple, stmt);
> +			if (del_story == NULL)
> +				goto fail;
> +			del_story_created = true;
> +		}
> +	}
> +	if (new_tuple != NULL && del_story_created) {
> +		for (uint32_t i = 0; i < add_story->index_count; i++) {

15. Why do you use sometimes story->index_count, sometimes space->index_count?
They should be equal here.

> +			struct txm_story_link *link = &add_story->link[i];
> +			if (link->older.is_story)
> +				continue;
> +			if (link->older.tuple == del_tuple) {
> +				txm_story_unlink(add_story, i);
> +				txm_story_link_story(add_story, del_story, i);
> +			}
> +		}
> +	}
> +	if (del_story != NULL && !del_story_created) {
> +		stmt->next_in_del_list = del_story->del_stmt;
> +		del_story->del_stmt = stmt;
> +		stmt->del_story = del_story;
> +	}
> +
> +	/* Purge found conflicts. */
> +	while (collected_conflicts != NULL) {
> +		if (txm_cause_conflict(collected_conflicts->other_txn,
> +				       stmt->txn) != 0)
> +			goto fail;
> +		collected_conflicts = collected_conflicts->next;

16. Why can't call txm_cause_conflict() right when a conflict is
discovered? Why do you need to collect all of them into a list
first?

> +	}
> +
> +	/*
> +	 * We now reference both new and old tuple because the stmt holds
> +	 * pointers to them.
> +	 */

17. Why do you need old and new tuple as separate arguments of this
function, if they are already stored in stmt? Also they were already
referenced when were added to stmt.

If you are talking about story->tuple, it is already referenced in
txm_story_new().

> +	if (stmt->new_tuple != NULL)
> +		tuple_ref(stmt->new_tuple);
> +	*result = old_tuple;

18. I don't see that old_tuple is added to stmt in all code paths.

> +	if (*result != NULL)
> +		tuple_ref(*result);
> +	return 0;
> +
> +fail:
> +	if (add_story != NULL) {
> +		while (add_story_linked > 0) {
> +			--add_story_linked;
> +			uint32_t i = add_story_linked;

19. This easily can be a for() loop. Why do you need while()?

> +
> +			struct index *index = space->index[i];
> +			struct txm_story_link *link = &add_story->link[i];
> +			struct tuple *was = txm_story_older_tuple(link);
> +			struct tuple *unused;
> +			if (index_replace(index, new_tuple, was,
> +					  DUP_INSERT,
> +					  &unused) != 0) {

20. What happens to 'unused' returned tuple? Who unrefs it when
it is removed from the primary index? The same for the similar
code in txm_history_rollback_stmt().

> +				diag_log();
> +				unreachable();
> +				panic("failed to rollback change");
> +			}
> +
> +			txm_story_unlink(stmt->add_story, i);
> +
> +		}
> +		txm_story_delete_add_stmt(stmt->add_story);
> +	}
> +
> +	if (del_story != NULL && del_story->del_stmt == stmt) {
> +		del_story->del_stmt = stmt->next_in_del_list;
> +		stmt->next_in_del_list = NULL;
> +	}
> +
> +	if (del_story_created)
> +		txm_story_delete_del_stmt(stmt->del_story);
> +	else
> +		stmt->del_story = NULL;
> +
> +	region_truncate(region, region_svp);
> +	return -1;
> +}
> +
> +void
> +txm_history_rollback_stmt(struct txn_stmt *stmt)
> +{
> +	if (stmt->add_story != NULL) {
> +		assert(stmt->add_story->tuple == stmt->new_tuple);
> +		struct txm_story *story = stmt->add_story;
> +
> +		for (uint32_t i = 0; i < story->index_count; i++) {
> +			struct txm_story_link *link = &story->link[i];
> +			if (link->newer_story == NULL) {
> +				struct tuple *unused;
> +				struct index *index = stmt->space->index[i];
> +				struct tuple *was = txm_story_older_tuple(link);
> +				if (index_replace(index, story->tuple, was,
> +						  DUP_INSERT, &unused) != 0) {
> +					diag_log();
> +					unreachable();
> +					panic("failed to rollback change");
> +				}
> +			} else {
> +				struct txm_story *newer = link->newer_story;
> +				assert(newer->link[i].older.is_story);
> +				assert(newer->link[i].older.story == story);
> +				txm_story_unlink(newer, i);
> +				if (link->older.is_story) {
> +					struct txm_story *to = link->older.story;
> +					txm_story_link_story(newer,to, i);

21. ',to,' -> ', to,'.

> +				} else {
> +					struct tuple *to = link->older.tuple;
> +					txm_story_link_tuple(newer, to, i);
> +				}
> +			}
> +			txm_story_unlink(story, i);
> +		}
> +		stmt->add_story->add_stmt = NULL;
> +		txm_story_delete(stmt->add_story);
> +		stmt->add_story = NULL;
> +		tuple_unref(stmt->new_tuple);

22. I don't understand why this function contols new_ and old_ tuple in
txn_stmt. They have their own destructor in txn_stmt_destroy(), their own
refs, lifetime. It does not look right that you break the encapsulation
here.

> +	}
> +
> +	if (stmt->del_story != NULL) {
> +		struct txm_story *story = stmt->del_story;
> +
> +		struct txn_stmt **prev = &story->del_stmt;
> +		while (*prev != stmt) {
> +			prev = &(*prev)->next_in_del_list;
> +			assert(*prev != NULL);
> +		}
> +		*prev = stmt->next_in_del_list;
> +		stmt->next_in_del_list = NULL;
> +
> +		stmt->del_story->del_stmt = NULL;
> +		stmt->del_story = NULL;
> +	}
> +}
> +
> +void
> +txm_history_prepare_stmt(struct txn_stmt *stmt)
> +{
> +	assert(stmt->txn->psn != 0);
> +
> +	/* Move story to the past to prepared stories. */

23. Couldn't parse the comment.

> +
> +	struct txm_story *story = stmt->add_story;
> +	uint32_t index_count = story == NULL ? 0 : story->index_count;
> +	/*
> +	 * Note that if stmt->add_story == NULL, the index_count is set to 0,
> +	 * and we will not enter the loop.
> +	 */
> +	for (uint32_t i = 0; i < index_count; ) {
> +		if (!story->link[i].older.is_story) {
> +			/* tuple is old. */

24. Don't understand. How does it follow from the condition? And
why the 'tuple is old' prevents from doing the things below?

> +			i++;
> +			continue;
> +		}
> +		struct txm_story *old_story =
> +			story->link[i].older.story;
> +		if (old_story->del_psn != 0) {
> +			/* if psn is set, the change is prepared. */
> +			i++;
> +			continue;
> +		}
> +		if (old_story->add_psn != 0) {
> +			/* if psn is set, the change is prepared. */
> +			i++;
> +			continue;
> +		}
> +
> +		if (old_story->add_stmt == NULL) {
> +			/* ancient. */
> +			i++;
> +			continue;
> +		}
> +		if (old_story->add_stmt->txn == stmt->txn) {
> +			/* added by us. */
> +			i++;
> +			continue;
> +		}
> +
> +		if (old_story->add_stmt->does_require_old_tuple || i != 0)
> +			old_story->add_stmt->txn->status = TXN_CONFLICTED;
> +
> +		/* Swap story and old story. */
> +		struct txm_story_link *link = &story->link[i];
> +		if (link->newer_story == NULL) {
> +			/* we have to replace the tuple in index. */
> +			struct tuple *unused;
> +			struct index *index = stmt->space->index[i];
> +			if (index_replace(index, story->tuple, old_story->tuple,
> +					  DUP_INSERT, &unused) != 0) {
> +				diag_log();
> +				panic("failed to rollback change");
> +			}
> +		} else {
> +			struct txm_story *newer = link->newer_story;
> +			assert(newer->link[i].older.is_story);
> +			assert(newer->link[i].older.story == story);
> +			txm_story_unlink(newer, i);
> +			txm_story_link_story(newer, old_story, i);
> +		}
> +
> +		txm_story_unlink(story, i);
> +		if (old_story->link[i].older.is_story) {
> +			struct txm_story *to =
> +				old_story->link[i].older.story;
> +			txm_story_unlink(old_story, i);
> +			txm_story_link_story(story, to, i);
> +		} else {
> +			struct tuple *to =
> +				old_story->link[i].older.tuple;
> +			txm_story_unlink(old_story, i);
> +			txm_story_link_tuple(story, to, i);
> +		}
> +
> +		txm_story_link_story(old_story, story, i);
> +
> +		if (i == 0) {
> +			assert(stmt->del_story == old_story);
> +			assert(story->link[0].older.is_story ||
> +			       story->link[0].older.tuple == NULL);
> +
> +			struct txn_stmt *dels = old_story->del_stmt;
> +			assert(dels != NULL);
> +			do {
> +				if (dels->txn != stmt->txn)
> +					dels->txn->status = TXN_CONFLICTED;
> +				dels->del_story = NULL;
> +				struct txn_stmt *next = dels->next_in_del_list;
> +				dels->next_in_del_list = NULL;
> +				dels = next;
> +			} while (dels != NULL);
> +			old_story->del_stmt = NULL;
> +
> +			if (story->link[0].older.is_story) {
> +				struct txm_story *oldest_story =
> +					story->link[0].older.story;
> +				dels = oldest_story->del_stmt;
> +				while (dels != NULL) {
> +					assert(dels->txn != stmt->txn);
> +					dels->del_story = NULL;
> +					struct txn_stmt *next =
> +						dels->next_in_del_list;
> +					dels->next_in_del_list = NULL;
> +					dels = next;
> +				}
> +				oldest_story->del_stmt = stmt;
> +				stmt->del_story = oldest_story;
> +			}
> +		}
> +	}
> +	if (stmt->add_story != NULL)
> +		stmt->add_story->add_psn = stmt->txn->psn;
> +
> +	if (stmt->del_story != NULL)
> +		stmt->del_story->del_psn = stmt->txn->psn;
> +}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx
  2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx Aleksandr Lyapunov
@ 2020-07-16 22:26   ` Vladislav Shpilevoy
  2020-07-17  5:08     ` Aleksandr Lyapunov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16 22:26 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

Thanks for the patch!

See 5 comments below.

On 15.07.2020 15:55, Aleksandr Lyapunov wrote:
> Use mvcc transaction engine in memtx if the engine is enabled.
> 
> Closes #4897
> ---
>  src/box/memtx_engine.c | 40 ++++++++++++++++++++++++++++++++++++----
>  src/box/memtx_space.c  | 28 ++++++++++++++++++++++++----
>  src/box/txn.c          |  3 +++
>  src/box/vinyl.c        | 14 ++++++++++----
>  4 files changed, 73 insertions(+), 12 deletions(-)
> 
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index dfd6fce..1402dd7 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -335,10 +335,39 @@ static int
>  memtx_engine_begin(struct engine *engine, struct txn *txn)
>  {
>  	(void)engine;
> -	txn_can_yield(txn, false);
> +	if (!tx_manager_use_mvcc_engine)
> +		txn_can_yield(txn, false);

1. You can pass tx_manager_use_mvcc_engine as a second
argument to txn_can_yield() instead of adding one another
'if'.

>  	return 0;
>  }
>  
> +static int
> +memtx_engine_prepare(struct engine *engine, struct txn *txn)
> +{
> +	(void)engine;
> +	struct txn_stmt *stmt;
> +	stailq_foreach_entry(stmt, &txn->stmts, next) {
> +		if (stmt->add_story != NULL || stmt->del_story != NULL)
> +			txm_history_prepare_stmt(stmt);
> +	}
> +	return 0;
> +}
> +
> +static void
> +memtx_engine_commit(struct engine *engine, struct txn *txn)
> +{
> +	(void)engine;
> +	struct txn_stmt *stmt;
> +	stailq_foreach_entry(stmt, &txn->stmts, next) {
> +		if (stmt->add_story != NULL || stmt->del_story != NULL) {
> +			ssize_t bsize = txm_history_commit_stmt(stmt);
> +			assert(stmt->space->engine == engine);
> +			struct memtx_space *mspace =
> +				(struct memtx_space *)stmt->space;
> +			mspace->bsize += bsize;
> +		}
> +	}
> +}
> +
>  static void
>  memtx_engine_rollback_statement(struct engine *engine, struct txn *txn,
>  				struct txn_stmt *stmt)
> @@ -348,13 +377,16 @@ memtx_engine_rollback_statement(struct engine *engine, struct txn *txn,
>  	if (stmt->old_tuple == NULL && stmt->new_tuple == NULL)
>  		return;
>  	struct space *space = stmt->space;
> -	struct memtx_space *memtx_space = (struct memtx_space *)space;
> +	struct memtx_space *memtx_space = (struct memtx_space*) space;
>  	uint32_t index_count;
>  
>  	/* Only roll back the changes if they were made. */
>  	if (stmt->engine_savepoint == NULL)
>  		return;
>  
> +	if (stmt->add_story != NULL || stmt->del_story != NULL)
> +		return txm_history_rollback_stmt(stmt);
> +
>  	if (memtx_space->replace == memtx_space_replace_all_keys)
>  		index_count = space->index_count;
>  	else if (memtx_space->replace == memtx_space_replace_primary_key)
> @@ -914,8 +946,8 @@ static const struct engine_vtab memtx_engine_vtab = {
>  	/* .complete_join = */ memtx_engine_complete_join,
>  	/* .begin = */ memtx_engine_begin,
>  	/* .begin_statement = */ generic_engine_begin_statement,
> -	/* .prepare = */ generic_engine_prepare,
> -	/* .commit = */ generic_engine_commit,
> +	/* .prepare = */ memtx_engine_prepare,
> +	/* .commit = */ memtx_engine_commit,
>  	/* .rollback_statement = */ memtx_engine_rollback_statement,
>  	/* .rollback = */ generic_engine_rollback,
>  	/* .switch_to_ro = */ generic_engine_switch_to_ro,
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index e48ed3a..66f28c9 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -260,6 +260,20 @@ memtx_space_replace_all_keys(struct space *space, struct tuple *old_tuple,
>  	if (pk == NULL)
>  		return -1;
>  	assert(pk->def->opts.is_unique);
> +
> +	if (tx_manager_use_mvcc_engine) {
> +		struct txn *txn = in_txn();
> +		struct txn_stmt *stmt =
> +			txn == NULL ? NULL : txn_current_stmt(txn);
> +		if (stmt != NULL) {
> +			return txm_history_add_stmt(stmt, old_tuple, new_tuple,
> +						    mode, result);
> +		} else {
> +			/** Ephemeral space */
> +			assert(space->def->id == 0);
> +		}
> +	}
> +
>  	/*
>  	 * If old_tuple is not NULL, the index has to
>  	 * find and delete it, or return an error.
> @@ -896,7 +910,9 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
>  	if (txn_check_singlestatement(txn, "space format check") != 0)
>  		return -1;
>  
> -	txn_can_yield(txn, true);
> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
> +	if (!could_yield)
> +		txn_can_yield(txn, true);

2. This change looks unnecessary. You just added 'if'.
And still the result is that the txn can yield.

>  
>  	struct memtx_engine *memtx = (struct memtx_engine *)space->engine;
>  	struct memtx_ddl_state state;
> @@ -940,7 +956,8 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
>  	iterator_delete(it);
>  	diag_destroy(&state.diag);
>  	trigger_clear(&on_replace);
> -	txn_can_yield(txn, false);
> +	if (!could_yield)
> +		txn_can_yield(txn, false);

3. The same. And the same in the 2 hunks below.

>  	return rc;
>  }
>  
> @@ -1054,7 +1071,9 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>  	if (txn_check_singlestatement(txn, "index build") != 0)
>  		return -1;
>  
> -	txn_can_yield(txn, true);
> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
> +	if (!could_yield)
> +		txn_can_yield(txn, true);
>  
>  	struct memtx_engine *memtx = (struct memtx_engine *)src_space->engine;
>  	struct memtx_ddl_state state;
> @@ -1132,7 +1151,8 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
>  	iterator_delete(it);
>  	diag_destroy(&state.diag);
>  	trigger_clear(&on_replace);
> -	txn_can_yield(txn, false);
> +	if (!could_yield)
> +		txn_can_yield(txn, false);
>  	return rc;
>  }
>  
> diff --git a/src/box/txn.c b/src/box/txn.c
> index b4888b3..b9a190f 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -203,6 +203,9 @@ txn_stmt_new(struct region *region)
>  static inline void
>  txn_stmt_destroy(struct txn_stmt *stmt)
>  {
> +	if (stmt->add_story != NULL || stmt->del_story != NULL)
> +		txm_history_rollback_stmt(stmt);

4. Doing rollback from a destructor is totally cursed. Why are you
doing it from there?

> +
>  	if (stmt->old_tuple != NULL)
>  		tuple_unref(stmt->old_tuple);
>  	if (stmt->new_tuple != NULL)
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index f9252f1..f69d3d9 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -1084,7 +1084,9 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
>  		return -1;
>  
>  	/* See the comment in vinyl_space_build_index(). */
> -	txn_can_yield(txn, true);
> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
> +	if (!could_yield)
> +		txn_can_yield(txn, true);

5. Unnecessary change. The same for all the other similar places.

>  
>  	struct trigger on_replace;
>  	struct vy_check_format_ctx ctx;
> @@ -1136,7 +1138,8 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
>  out:
>  	diag_destroy(&ctx.diag);
>  	trigger_clear(&on_replace);
> -	txn_can_yield(txn, false);
> +	if (!could_yield)
> +		txn_can_yield(txn, false);
>  	return rc;
>  }
>  
> @@ -4183,7 +4186,9 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
>  	 * change the data dictionary, so there is no dirty state
>  	 * that can be observed.
>  	 */
> -	txn_can_yield(txn, true);
> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
> +	if (!could_yield)
> +		txn_can_yield(txn, true);
>  
>  	/*
>  	 * Iterate over all tuples stored in the space and insert
> @@ -4284,7 +4289,8 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
>  out:
>  	diag_destroy(&ctx.diag);
>  	trigger_clear(&on_replace);
> -	txn_can_yield(txn, false);
> +	if (!could_yield)
> +		txn_can_yield(txn, false);
>  	return rc;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx
  2020-07-16 22:26   ` Vladislav Shpilevoy
@ 2020-07-17  5:08     ` Aleksandr Lyapunov
  2020-07-23 20:58       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-17  5:08 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches

Hi, thanks for review!

On 7/17/20 1:26 AM, Vladislav Shpilevoy wrote:
> Th
> 1. You can pass tx_manager_use_mvcc_engine as a second
> argument to txn_can_yield() instead of adding one another
> 'if'.
Actually it's not so simple. It would fall in assert.
>
>>   	
>> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, true);
> 2. This change looks unnecessary. You just added 'if'.
> And still the result is that the txn can yield.
See above. It's just a common restore-to-the-original state semantics.
>
>>   
>>   	struct memtx_engine *memtx = (struct memtx_engine *)space->engine;
>>   	struct memtx_ddl_state state;
>> @@ -940,7 +956,8 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
>>   	iterator_delete(it);
>>   	diag_destroy(&state.diag);
>>   	trigger_clear(&on_replace);
>> -	txn_can_yield(txn, false);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, false);
> 3. The same. And the same in the 2 hunks below.
The same!
> @@ -203,6 +203,9 @@ txn_stmt_new(struct region *region)
>   static inline void
>   txn_stmt_destroy(struct txn_stmt *stmt)
>   {
> +	if (stmt->add_story != NULL || stmt->del_story != NULL)
> +		txm_history_rollback_stmt(stmt);
> 4. Doing rollback from a destructor is totally cursed. Why are you
> doing it from there?
Good, replaced with assert.
>
>> +
>>   	if (stmt->old_tuple != NULL)
>>   		tuple_unref(stmt->old_tuple);
>>   	if (stmt->new_tuple != NULL)
>> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
>> index f9252f1..f69d3d9 100644
>> --- a/src/box/vinyl.c
>> +++ b/src/box/vinyl.c
>> @@ -1084,7 +1084,9 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
>>   		return -1;
>>   
>>   	/* See the comment in vinyl_space_build_index(). */
>> -	txn_can_yield(txn, true);
>> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, true);
> 5. Unnecessary change. The same for all the other similar places.
>
>>   
>>   	struct trigger on_replace;
>>   	struct vy_check_format_ctx ctx;
>> @@ -1136,7 +1138,8 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
>>   out:
>>   	diag_destroy(&ctx.diag);
>>   	trigger_clear(&on_replace);
>> -	txn_can_yield(txn, false);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, false);
>>   	return rc;
>>   }
>>   
>> @@ -4183,7 +4186,9 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
>>   	 * change the data dictionary, so there is no dirty state
>>   	 * that can be observed.
>>   	 */
>> -	txn_can_yield(txn, true);
>> +	bool could_yield = txn_has_flag(txn, TXN_CAN_YIELD);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, true);
>>   
>>   	/*
>>   	 * Iterate over all tuples stored in the space and insert
>> @@ -4284,7 +4289,8 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
>>   out:
>>   	diag_destroy(&ctx.diag);
>>   	trigger_clear(&on_replace);
>> -	txn_can_yield(txn, false);
>> +	if (!could_yield)
>> +		txn_can_yield(txn, false);
>>   	return rc;
>>   }
>>   
>>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story
  2020-07-16  0:20   ` Vladislav Shpilevoy
@ 2020-07-17  6:16     ` Aleksandr Lyapunov
  0 siblings, 0 replies; 41+ messages in thread
From: Aleksandr Lyapunov @ 2020-07-17  6:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches

Hello, thank for the review!

On 7/16/20 3:20 AM, Vladislav Shpilevoy wrote:
> 1. Why is the hash called 'history', but the objects are 'story'?
> Or as 'history' you mean a list of stories of the same key in one
> index?
I mean history is a set of all stories.
Like a big book with a lot of paragraphs and stories about things in time.
> 2. difference -> different.
thanks
>
> 3. The iterator is initialized and assigned to something, but is
> never used for anything meaningful. The same for all_stories list.
> At least on top of this commit.
Implemented garbage collector.
> +			       cord_slab_cache(), item_size);
> 4. Is it correct you also create a pool for 0 index count?
> Why?
>
> Probably would be much simpler to just create one mempool with
> objects having BOX_INDEX_MAX records in them. It would also solve
> the DDL problem. At least partially.
>
> Can the story objects be allocated on txn's region? AFAIU,
> stories are created by transactions, and can't outlive their
> creator.
Discussed in call, that would be too much memory wasted.
>
> 5. Diag OOM takes size, allocator, and variable name. Here the
> allocator is "mempool_alloc", and the variable name is "story".
> Lets be consistent with other diag_sets. The same for all the
> other new places.
OK
>
> 6. I know story is NULL, but why not to return NULL explicitly?
> This looks confusing.
OK
>
> 7. Nit: would be better to call tuple_ref() right near this assignment.
> For the sake of OOM handler below, worth moving this line down to the
> tuple_ref() call below.
The hash table requires tuple to be set.
>
> 8. Comments for functions should use imperative mood. The same
> in other similar places.
OK
>
> +static struct tuple *
> +txm_story_older_tuple(struct txm_story_link *link)
> +{
> +	return link->older.is_story ? link->older.story->tuple
> +				    : link->older.tuple;
> 9. Why so? If older.is_story, it means it stores a story,
> not a tuple. But you return vice versa.
I don't see any mistake. I return tuple in any case.
>
> 10. Why do you need all the new structs in txn.h if they aren't
> used out of txn.c anyway. At least some of them.
OK, I'll try to.

I have to make a pause, will return later.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx
  2020-07-17  5:08     ` Aleksandr Lyapunov
@ 2020-07-23 20:58       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 41+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-23 20:58 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches

> On 7/17/20 1:26 AM, Vladislav Shpilevoy wrote:
>> Th
>> 1. You can pass tx_manager_use_mvcc_engine as a second
>> argument to txn_can_yield() instead of adding one another
>> 'if'.
> Actually it's not so simple. It would fall in assert.

Well, then remove the assert. Better patch txn_can_yield(),
that what you did. You just moved the common 'if' out of
the function and added it to each call, even though you
could move it inside.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-07-23 20:58 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 13:55 [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 01/13] Update license file (2020) Aleksandr Lyapunov
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 02/13] Check data_offset overflow in struct tuple Aleksandr Lyapunov
2020-07-16 14:27   ` Nikita Pettik
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 03/13] vinyl: rename tx_manager -> vy_tx_manager Aleksandr Lyapunov
2020-07-15 16:04   ` Nikita Pettik
2020-07-16  8:17     ` Aleksandr Lyapunov
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 04/13] txm: introduce dirty tuples Aleksandr Lyapunov
2020-07-15 16:22   ` Nikita Pettik
2020-07-16  0:05   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 05/13] txm: save txn in txn_stmt Aleksandr Lyapunov
2020-07-15 16:23   ` Nikita Pettik
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 06/13] txm: add TX status Aleksandr Lyapunov
2020-07-15 16:42   ` Nikita Pettik
2020-07-16  0:08   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 07/13] txm: save does_require_old_tuple flag in txn_stmt Aleksandr Lyapunov
2020-07-15 16:49   ` Nikita Pettik
2020-07-16  0:09   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 08/13] txm: introduce tx manager Aleksandr Lyapunov
2020-07-15 16:51   ` Nikita Pettik
2020-07-15 22:01   ` Vladislav Shpilevoy
2020-07-16  0:10   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 09/13] tmx: introduce prepare sequence number Aleksandr Lyapunov
2020-07-15 17:13   ` Nikita Pettik
2020-07-16  0:11   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 10/13] tmx: introduce conflict tracker Aleksandr Lyapunov
2020-07-16  0:16   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 11/13] txm: introduce txm_story Aleksandr Lyapunov
2020-07-16  0:20   ` Vladislav Shpilevoy
2020-07-17  6:16     ` Aleksandr Lyapunov
2020-07-16 22:25   ` Vladislav Shpilevoy
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 12/13] txm: clarify all fetched tuples Aleksandr Lyapunov
2020-07-15 13:55 ` [Tarantool-patches] [PATCH v3 13/13] tmx: use new tx manager in memtx Aleksandr Lyapunov
2020-07-16 22:26   ` Vladislav Shpilevoy
2020-07-17  5:08     ` Aleksandr Lyapunov
2020-07-23 20:58       ` Vladislav Shpilevoy
2020-07-15 15:47 ` [Tarantool-patches] [PATCH v3 00/13] Transaction engine for memtx engine Aleksandr Lyapunov
2020-07-15 16:38 ` Aleksandr Lyapunov
2020-07-15 16:39 ` Aleksandr Lyapunov
2020-07-15 16:40 ` Aleksandr Lyapunov
2020-07-16  0:05 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox