Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH 0/6] Get rid of the schema lock
@ 2019-06-30 19:40 Vladimir Davydov
  2019-06-30 19:40 ` [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers Vladimir Davydov
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The schema lock was initially introduced to prevent a space from being
dropped while the memtx checkpoint thread dumps its content to a snap
file. Later on it was reused for syncing DDL-vs-DDL. Now we take it
before any statement involving _space, _index, and _truncate spaces.

This prevents us from batching several non-yielding DDL statements in
the same transaction (see #4083). For instance, suppose we want to
create a space and then an index for the space in the same transaction.
Then we would deadlock while trying to take the schema lock before
executing the second statement. We could, of course, make the lock
reentrant, but that would rather look like a hack.

Actually, we don't really need this lock for syncing checkpointing vs
DDL - we can simply postpone index destruction until after checkpointing
is complete. Moreover, all the necessary infrastructure has already been
implemented in the scope of #3408 ("Free tuples on space drop/truncate
in background fiber"). It also seems that we don't need this lock for
syncing DDL vs DDL, either, since instead we can lock individual spaces.

So this patch removes the schema lock altogether. This is a preparation
for transaction DDL (see #4083).

https://github.com/tarantool/tarantool/issues/4083
https://github.com/tarantool/tarantool/commits/dv/zap-schema-lock

Vladimir Davydov (6):
  Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers
  Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY
  Don't take schema lock for checkpointing
  test: make vinyl/replica_rejoin more stable
  vinyl: don't yield while logging index creation
  Replace schema lock with fine-grained locking

 src/box/alter.cc                   |  68 +++--------------
 src/box/alter.h                    |   3 -
 src/box/gc.c                       |   8 --
 src/box/memtx_engine.c             |  64 +++++++++-------
 src/box/memtx_engine.h             |   5 ++
 src/box/memtx_space.c              |  79 +++++++------------
 src/box/relay.cc                   |   6 +-
 src/box/schema.cc                  |  41 +++++-----
 src/box/schema.h                   |   6 --
 src/box/space.c                    |   2 -
 src/box/space.h                    |   7 +-
 src/box/txn.c                      |   3 -
 src/box/vinyl.c                    |  23 +-----
 src/box/vy_log.c                   |   6 +-
 src/box/vy_lsm.c                   |   4 +-
 src/box/vy_run.c                   |   6 +-
 src/box/vy_scheduler.c             |  22 ++----
 src/box/wal.c                      |   4 +-
 src/curl.c                         |   7 +-
 src/lib/core/errinj.h              |  12 ++-
 src/lib/core/latch.h               |  10 ---
 test/app-tap/snapshot.test.lua     |   5 +-
 test/box/errinj.result             |  38 +++++-----
 test/box/errinj.test.lua           |   6 +-
 test/box/on_replace.result         |   8 +-
 test/box/transaction.result        |   2 +-
 test/engine/errinj_ddl.result      | 152 +++++++++++++++++++++++++++++++++++++
 test/engine/errinj_ddl.test.lua    |  62 +++++++++++++++
 test/unit/vy_log_stub.c            |   3 +
 test/vinyl/errinj_vylog.result     |  23 +++++-
 test/vinyl/errinj_vylog.test.lua   |  10 ++-
 test/vinyl/replica_rejoin.result   | 119 ++++++++++++++---------------
 test/vinyl/replica_rejoin.test.lua |  34 +++++----
 test/xlog/errinj.result            |   4 +-
 test/xlog/errinj.test.lua          |   4 +-
 35 files changed, 489 insertions(+), 367 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-07-03 19:12   ` Konstantin Osipov
  2019-06-30 19:40 ` [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY Vladimir Davydov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

ERROR_INJECT_YIELD yields the current fiber execution by calling
fiber_sleep(0) while the given error injection is set.

ERROR_INJECT_SLEEP suspends the current thread execution by calling
usleep(1000) while the given error injection is set.
---
 src/box/memtx_engine.c | 16 ++--------
 src/box/memtx_space.c  | 79 ++++++++++++++++++--------------------------------
 src/box/relay.cc       |  6 ++--
 src/box/vinyl.c        | 23 ++-------------
 src/box/vy_log.c       |  6 +---
 src/box/vy_run.c       |  6 +---
 src/box/vy_scheduler.c | 22 ++++----------
 src/box/wal.c          |  4 +--
 src/curl.c             |  7 +----
 src/lib/core/errinj.h  |  8 +++++
 10 files changed, 53 insertions(+), 124 deletions(-)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index f371d147..c60f831c 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -774,14 +774,7 @@ memtx_engine_commit_checkpoint(struct engine *engine,
 		snprintf(to, sizeof(to), "%s",
 			 xdir_format_filename(dir, lsn, NONE));
 		const char *from = xdir_format_filename(dir, lsn, INPROGRESS);
-#ifndef NDEBUG
-		struct errinj *delay = errinj(ERRINJ_SNAP_COMMIT_DELAY,
-					       ERRINJ_BOOL);
-		if (delay != NULL && delay->bparam) {
-			while (delay->bparam)
-				fiber_sleep(0.001);
-		}
-#endif
+		ERROR_INJECT_YIELD(ERRINJ_SNAP_COMMIT_DELAY);
 		int rc = coio_rename(from, to);
 		if (rc != 0)
 			panic("can't rename .snap.inprogress");
@@ -990,12 +983,7 @@ memtx_engine_gc_f(va_list va)
 	struct memtx_engine *memtx = va_arg(va, struct memtx_engine *);
 	while (!fiber_is_cancelled()) {
 		bool stop;
-		struct errinj *delay = errinj(ERRINJ_MEMTX_DELAY_GC,
-					      ERRINJ_BOOL);
-		if (delay != NULL && delay->bparam) {
-			while (delay->bparam)
-				fiber_sleep(0.001);
-		}
+		ERROR_INJECT_YIELD(ERRINJ_MEMTX_DELAY_GC);
 		memtx_engine_run_gc(memtx, &stop);
 		if (stop) {
 			fiber_yield_timeout(TIMEOUT_INFINITY);
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 77a8f7d5..f0e1cfd2 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -910,33 +910,21 @@ memtx_space_check_format(struct space *space, struct tuple_format *format)
 		rc = tuple_validate(format, tuple);
 		if (rc != 0)
 			break;
+
+		state.cursor = tuple;
+		tuple_ref(state.cursor);
+
 		if (++count % MEMTX_DDL_YIELD_LOOPS == 0 &&
-		    memtx->state == MEMTX_OK) {
-			state.cursor = tuple;
-			tuple_ref(state.cursor);
+		    memtx->state == MEMTX_OK)
 			fiber_sleep(0);
-			tuple_unref(state.cursor);
 
-			if (state.rc != 0) {
-				rc = -1;
-				diag_move(&state.diag, diag_get());
-				break;
-			}
-		}
+		ERROR_INJECT_YIELD(ERRINJ_CHECK_FORMAT_DELAY);
 
-		struct errinj *inj = errinj(ERRINJ_CHECK_FORMAT_DELAY, ERRINJ_BOOL);
-		if (inj != NULL && inj->bparam && count == 1) {
-			state.cursor = tuple;
-			tuple_ref(state.cursor);
-			do {
-				fiber_sleep(0);
-			} while (inj->bparam);
-			tuple_unref(state.cursor);
-			if (state.rc != 0) {
-				rc = -1;
-				diag_move(&state.diag, diag_get());
-				break;
-			}
+		tuple_unref(state.cursor);
+		if (state.rc != 0) {
+			rc = -1;
+			diag_move(&state.diag, diag_get());
+			break;
 		}
 	}
 	iterator_delete(it);
@@ -1086,39 +1074,30 @@ memtx_space_build_index(struct space *src_space, struct index *new_index,
 		 */
 		if (new_index->def->iid == 0)
 			tuple_ref(tuple);
+		/*
+		 * Remember the latest inserted tuple to
+		 * avoid processing yet to be added tuples
+		 * in on_replace triggers.
+		 */
+		state.cursor = tuple;
+		tuple_ref(state.cursor);
 		if (++count % MEMTX_DDL_YIELD_LOOPS == 0 &&
-		    memtx->state == MEMTX_OK) {
-			/*
-			 * Remember the latest inserted tuple to
-			 * avoid processing yet to be added tuples
-			 * in on_replace triggers.
-			 */
-			state.cursor = tuple;
-			tuple_ref(state.cursor);
+		    memtx->state == MEMTX_OK)
 			fiber_sleep(0);
-			tuple_unref(state.cursor);
-			/*
-			 * The on_replace trigger may have failed
-			 * during the yield.
-			 */
-			if (state.rc != 0) {
-				rc = -1;
-				diag_move(&state.diag, diag_get());
-				break;
-			}
-		}
 		/*
 		 * Sleep after at least one tuple is inserted to test
 		 * on_replace triggers for index build.
 		 */
-		struct errinj *inj = errinj(ERRINJ_BUILD_INDEX_DELAY, ERRINJ_BOOL);
-		if (inj != NULL && inj->bparam && count == 1) {
-			state.cursor = tuple;
-			tuple_ref(state.cursor);
-			do {
-				fiber_sleep(0);
-			} while (inj->bparam);
-			tuple_unref(state.cursor);
+		ERROR_INJECT_YIELD(ERRINJ_BUILD_INDEX_DELAY);
+		tuple_unref(state.cursor);
+		/*
+		 * The on_replace trigger may have failed
+		 * during the yield.
+		 */
+		if (state.rc != 0) {
+			rc = -1;
+			diag_move(&state.diag, diag_get());
+			break;
 		}
 	}
 	iterator_delete(it);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 906bf8ef..e9f5bdca 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -684,16 +684,14 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 static void
 relay_send(struct relay *relay, struct xrow_header *packet)
 {
-	struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
-	while (inj != NULL && inj->bparam)
-		fiber_sleep(0.01);
+	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
 
 	packet->sync = relay->sync;
 	relay->last_row_time = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
-	inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
 }
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 814325da..128b1199 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -1156,12 +1156,7 @@ vinyl_space_check_format(struct space *space, struct tuple_format *format)
 		 */
 		if (++loops % VY_YIELD_LOOPS == 0)
 			fiber_sleep(0);
-		struct errinj *inj = errinj(ERRINJ_CHECK_FORMAT_DELAY, ERRINJ_BOOL);
-		if (inj != NULL && inj->bparam && loops == 1) {
-			do {
-				fiber_sleep(0);
-			} while (inj->bparam);
-		}
+		ERROR_INJECT_YIELD(ERRINJ_CHECK_FORMAT_DELAY);
 		if (ctx.is_failed) {
 			diag_move(&ctx.diag, diag_get());
 			rc = -1;
@@ -3882,14 +3877,7 @@ next:
 		*ret = NULL;
 		return 0;
 	}
-#ifndef NDEBUG
-	struct errinj *delay = errinj(ERRINJ_VY_DELAY_PK_LOOKUP,
-				      ERRINJ_BOOL);
-	if (delay && delay->bparam) {
-		while (delay->bparam)
-			fiber_sleep(0.01);
-	}
-#endif
+	ERROR_INJECT_YIELD(ERRINJ_VY_DELAY_PK_LOOKUP);
 	/* Get the full tuple from the primary index. */
 	if (vy_get_by_secondary_tuple(it->lsm, it->tx,
 				      vy_tx_read_view(it->tx),
@@ -4422,12 +4410,7 @@ vinyl_space_build_index(struct space *src_space, struct index *new_index,
 		 * Sleep after one tuple is inserted to test
 		 * on_replace triggers for index build.
 		 */
-		inj = errinj(ERRINJ_BUILD_INDEX_DELAY, ERRINJ_BOOL);
-		if (inj != NULL && inj->bparam && loops == 1) {
-			do {
-				fiber_sleep(0);
-			} while (inj->bparam);
-		}
+		ERROR_INJECT_YIELD(ERRINJ_BUILD_INDEX_DELAY);
 	}
 	vy_read_iterator_close(&itr);
 
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index bf50f552..cb291f3c 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -807,11 +807,7 @@ vy_log_tx_flush(struct vy_log_tx *tx)
 		diag_set(ClientError, ER_INJECTION, "vinyl log flush");
 		return -1;
 	});
-	struct errinj *delay = errinj(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL);
-	if (delay != NULL && delay->bparam) {
-		while (delay->bparam)
-			fiber_sleep(0.001);
-	}
+	ERROR_INJECT_YIELD(ERRINJ_VY_LOG_FLUSH_DELAY);
 
 	int tx_size = 0;
 	struct vy_log_record *record;
diff --git a/src/box/vy_run.c b/src/box/vy_run.c
index dfd71f1f..c6c17aee 100644
--- a/src/box/vy_run.c
+++ b/src/box/vy_run.c
@@ -916,11 +916,7 @@ vy_page_read(struct vy_page *page, const struct vy_page_info *page_info,
 	if (inj != NULL && inj->dparam > 0)
 		usleep(inj->dparam * 1000000);
 
-	inj = errinj(ERRINJ_VY_READ_PAGE_DELAY, ERRINJ_BOOL);
-	if (inj != NULL) {
-		while (inj->bparam)
-			usleep(10000);
-	}
+	ERROR_INJECT_SLEEP(ERRINJ_VY_READ_PAGE_DELAY);
 
 	/* decode xlog tx */
 	const char *data_pos = data;
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index 0df55818..f3bded20 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1040,12 +1040,7 @@ vy_task_write_run(struct vy_task *task, bool no_compression)
 	ERROR_INJECT(ERRINJ_VY_RUN_WRITE,
 		     {diag_set(ClientError, ER_INJECTION,
 			       "vinyl dump"); return -1;});
-
-	struct errinj *inj = errinj(ERRINJ_VY_RUN_WRITE_DELAY, ERRINJ_BOOL);
-	if (inj != NULL && inj->bparam) {
-		while (inj->bparam)
-			usleep(10000);
-	}
+	ERROR_INJECT_SLEEP(ERRINJ_VY_RUN_WRITE_DELAY);
 
 	struct vy_run_writer writer;
 	if (vy_run_writer_create(&writer, task->new_run, lsm->env->path,
@@ -1061,7 +1056,8 @@ vy_task_write_run(struct vy_task *task, bool no_compression)
 	int loops = 0;
 	struct vy_entry entry = vy_entry_none();
 	while ((rc = wi->iface->next(wi, &entry)) == 0 && entry.stmt != NULL) {
-		inj = errinj(ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT, ERRINJ_DOUBLE);
+		struct errinj *inj = errinj(ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT,
+					    ERRINJ_DOUBLE);
 		if (inj != NULL && inj->dparam > 0)
 			usleep(inj->dparam * 1000000);
 
@@ -1095,11 +1091,7 @@ fail:
 static int
 vy_task_dump_execute(struct vy_task *task)
 {
-	struct errinj *errinj = errinj(ERRINJ_VY_DUMP_DELAY, ERRINJ_BOOL);
-	if (errinj != NULL && errinj->bparam) {
-		while (errinj->bparam)
-			fiber_sleep(0.01);
-	}
+	ERROR_INJECT_SLEEP(ERRINJ_VY_DUMP_DELAY);
 	/*
 	 * Don't compress L1 runs as they are most frequently read
 	 * and smallest runs at the same time and so we would gain
@@ -1441,11 +1433,7 @@ err:
 static int
 vy_task_compaction_execute(struct vy_task *task)
 {
-	struct errinj *errinj = errinj(ERRINJ_VY_COMPACTION_DELAY, ERRINJ_BOOL);
-	if (errinj != NULL && errinj->bparam) {
-		while (errinj->bparam)
-			fiber_sleep(0.01);
-	}
+	ERROR_INJECT_SLEEP(ERRINJ_VY_COMPACTION_DELAY);
 	return vy_task_write_run(task, false);
 }
 
diff --git a/src/box/wal.c b/src/box/wal.c
index 6f5d0a58..58a58e5b 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -936,9 +936,7 @@ wal_write_to_disk(struct cmsg *msg)
 	struct vclock vclock_diff;
 	vclock_create(&vclock_diff);
 
-	struct errinj *inj = errinj(ERRINJ_WAL_DELAY, ERRINJ_BOOL);
-	while (inj != NULL && inj->bparam)
-		usleep(10);
+	ERROR_INJECT_SLEEP(ERRINJ_WAL_DELAY);
 
 	if (writer->in_rollback.route != NULL) {
 		/* We're rolling back a failed write. */
diff --git a/src/curl.c b/src/curl.c
index 3b48f907..a41c65e4 100644
--- a/src/curl.c
+++ b/src/curl.c
@@ -297,12 +297,7 @@ curl_execute(struct curl_request *curl_request, struct curl_env *env,
 	mcode = curl_multi_add_handle(env->multi, curl_request->easy);
 	if (mcode != CURLM_OK)
 		goto curl_merror;
-#ifndef NDEBUG
-	struct errinj *errinj = errinj(ERRINJ_HTTP_RESPONSE_ADD_WAIT,
-				       ERRINJ_BOOL);
-	while (errinj != NULL && errinj->bparam)
-		fiber_sleep(0.001);
-#endif
+	ERROR_INJECT_YIELD(ERRINJ_HTTP_RESPONSE_ADD_WAIT);
 	/* Don't wait on a cond if request has already failed or finished. */
 	if (curl_request->code == CURLE_OK && curl_request->in_progress) {
 		++env->stat.active_requests;
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index 78783651..a55f583c 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -153,6 +153,7 @@ errinj_foreach(errinj_cb cb, void *cb_ctx);
 
 #ifdef NDEBUG
 #  define ERROR_INJECT(ID, CODE)
+#  define ERROR_INJECT_WHILE(ID, CODE)
 #  define errinj(ID, TYPE) ((struct errinj *) NULL)
 #else
 #  /* Returns the error injection by id */
@@ -167,9 +168,16 @@ errinj_foreach(errinj_cb cb, void *cb_ctx);
 		if (errinj(ID, ERRINJ_BOOL)->bparam) \
 			CODE; \
 	} while (0)
+#  define ERROR_INJECT_WHILE(ID, CODE) \
+	do { \
+		while (errinj(ID, ERRINJ_BOOL)->bparam) \
+			CODE; \
+	} while (0)
 #endif
 
 #define ERROR_INJECT_RETURN(ID) ERROR_INJECT(ID, return -1)
+#define ERROR_INJECT_SLEEP(ID) ERROR_INJECT_WHILE(ID, usleep(1000))
+#define ERROR_INJECT_YIELD(ID) ERROR_INJECT_WHILE(ID, fiber_sleep(0))
 
 #if defined(__cplusplus)
 } /* extern "C" */
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
  2019-06-30 19:40 ` [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-07-03 19:13   ` Konstantin Osipov
  2019-06-30 19:40 ` [PATCH 3/6] Don't take schema lock for checkpointing Vladimir Davydov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Timeout injections are unstable and difficult to use. Injecting a delay
is much more convenient.
---
 src/box/memtx_engine.c    |  6 +-----
 src/lib/core/errinj.h     |  4 ++--
 test/box/errinj.result    | 38 +++++++++++++++++++-------------------
 test/box/errinj.test.lua  |  6 +++---
 test/xlog/errinj.result   |  4 ++--
 test/xlog/errinj.test.lua |  4 ++--
 6 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index c60f831c..e259bf62 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -497,11 +497,6 @@ memtx_engine_bootstrap(struct engine *engine)
 static int
 checkpoint_write_row(struct xlog *l, struct xrow_header *row)
 {
-	struct errinj *errinj = errinj(ERRINJ_SNAP_WRITE_ROW_TIMEOUT,
-				       ERRINJ_DOUBLE);
-	if (errinj != NULL && errinj->dparam > 0)
-		usleep(errinj->dparam * 1000000);
-
 	static ev_tstamp last = 0;
 	if (last == 0) {
 		ev_now_update(loop());
@@ -674,6 +669,7 @@ checkpoint_f(va_list ap)
 		return -1;
 
 	say_info("saving snapshot `%s'", snap.filename);
+	ERROR_INJECT_SLEEP(ERRINJ_SNAP_WRITE_DELAY);
 	struct checkpoint_entry *entry;
 	rlist_foreach_entry(entry, &ckpt->entries, link) {
 		uint32_t size;
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index a55f583c..3933e1e3 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -117,8 +117,8 @@ struct errinj {
 	_(ERRINJ_IPROTO_TX_DELAY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_HTTPC_EXECUTE, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_LOG_ROTATE, ERRINJ_BOOL, {.bparam = false}) \
-	_(ERRINJ_SNAP_COMMIT_DELAY, ERRINJ_BOOL, {.bparam = 0}) \
-	_(ERRINJ_SNAP_WRITE_ROW_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
+	_(ERRINJ_SNAP_WRITE_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+	_(ERRINJ_SNAP_COMMIT_DELAY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_HTTP_RESPONSE_ADD_WAIT, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_RUN_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 6c5af29e..cb463e1a 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -20,12 +20,12 @@ errinj.info()
     state: false
   ERRINJ_VYRUN_DATA_READ:
     state: false
-  ERRINJ_SNAP_WRITE_ROW_TIMEOUT:
-    state: 0
   ERRINJ_SQL_NAME_NORMALIZATION:
     state: false
   ERRINJ_VY_SCHED_TIMEOUT:
     state: 0
+  ERRINJ_COIO_SENDFILE_CHUNK:
+    state: -1
   ERRINJ_WAL_WRITE_PARTIAL:
     state: -1
   ERRINJ_VY_GC:
@@ -36,34 +36,34 @@ errinj.info()
     state: -1
   ERRINJ_WAL_WRITE_EOF:
     state: false
-  ERRINJ_COIO_SENDFILE_CHUNK:
-    state: -1
+  ERRINJ_VY_LOG_FILE_RENAME:
+    state: false
   ERRINJ_VYRUN_INDEX_GARBAGE:
     state: false
   ERRINJ_BUILD_INDEX_DELAY:
     state: false
-  ERRINJ_VY_RUN_FILE_RENAME:
-    state: false
   ERRINJ_BUILD_INDEX:
     state: -1
-  ERRINJ_RELAY_BREAK_LSN:
-    state: -1
+  ERRINJ_VY_INDEX_FILE_RENAME:
+    state: false
+  ERRINJ_CHECK_FORMAT_DELAY:
+    state: false
   ERRINJ_VY_DELAY_PK_LOOKUP:
     state: false
   ERRINJ_VY_TASK_COMPLETE:
     state: false
   ERRINJ_PORT_DUMP:
     state: false
-  ERRINJ_CHECK_FORMAT_DELAY:
+  ERRINJ_VY_DUMP_DELAY:
     state: false
   ERRINJ_WAL_IO:
     state: false
   ERRINJ_WAL_FALLOCATE:
     state: 0
-  ERRINJ_VY_DUMP_DELAY:
-    state: false
   ERRINJ_WAL_BREAK_LSN:
     state: -1
+  ERRINJ_RELAY_BREAK_LSN:
+    state: -1
   ERRINJ_TUPLE_FORMAT_COUNT:
     state: -1
   ERRINJ_TUPLE_ALLOC:
@@ -74,7 +74,7 @@ errinj.info()
     state: false
   ERRINJ_RELAY_REPORT_INTERVAL:
     state: 0
-  ERRINJ_VY_INDEX_FILE_RENAME:
+  ERRINJ_VY_RUN_FILE_RENAME:
     state: false
   ERRINJ_VY_READ_PAGE_TIMEOUT:
     state: 0
@@ -82,14 +82,14 @@ errinj.info()
     state: false
   ERRINJ_SIO_READ_MAX:
     state: -1
-  ERRINJ_VY_LOG_FILE_RENAME:
-    state: false
-  ERRINJ_WAL_WRITE_DISK:
-    state: false
   ERRINJ_HTTP_RESPONSE_ADD_WAIT:
     state: false
+  ERRINJ_WAL_WRITE_DISK:
+    state: false
   ERRINJ_SNAP_COMMIT_DELAY:
     state: false
+  ERRINJ_SNAP_WRITE_DELAY:
+    state: false
   ERRINJ_VY_RUN_WRITE:
     state: false
   ERRINJ_LOG_ROTATE:
@@ -1485,7 +1485,7 @@ for i = 1, count do box.space.tmp:insert{i, pad} end
 c = fiber.channel(1)
 ---
 ...
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', true)
 ---
 - ok
 ...
@@ -1504,7 +1504,7 @@ _ = collectgarbage('collect')
 for i = 1, count do box.space.tmp:insert{i, pad} end
 ---
 ...
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', false)
 ---
 - ok
 ...
@@ -1549,7 +1549,7 @@ _ = box.space.test:create_index('primary')
 for i = 1, 10 do box.space.test:insert{i} end
 ---
 ...
-errinj.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 9000)
+errinj.set('ERRINJ_SNAP_WRITE_DELAY', true)
 ---
 - ok
 ...
diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
index 386d2beb..093168b7 100644
--- a/test/box/errinj.test.lua
+++ b/test/box/errinj.test.lua
@@ -501,7 +501,7 @@ for i = 1, count do box.space.tmp:insert{i, pad} end
 
 -- Start background snapshot.
 c = fiber.channel(1)
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', true)
 _ = fiber.create(function() box.snapshot() c:put(true) end)
 
 -- Overwrite data stored in the temporary space while snapshot
@@ -511,7 +511,7 @@ for i = 1, count do box.space.tmp:delete{i} end
 _ = collectgarbage('collect')
 for i = 1, count do box.space.tmp:insert{i, pad} end
 
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', false)
 c:get()
 
 box.space.tmp:drop()
@@ -532,7 +532,7 @@ _ = box.schema.space.create('test')
 _ = box.space.test:create_index('primary')
 for i = 1, 10 do box.space.test:insert{i} end
 
-errinj.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 9000)
+errinj.set('ERRINJ_SNAP_WRITE_DELAY', true)
 _ = fiber.create(function() box.snapshot() end)
 path = fio.pathjoin(box.cfg.memtx_dir, '*.snap.inprogress')
 while #fio.glob(path) == 0 do fiber.sleep(0.001) end
diff --git a/test/xlog/errinj.result b/test/xlog/errinj.result
index 7f15bef3..d6d4141b 100644
--- a/test/xlog/errinj.result
+++ b/test/xlog/errinj.result
@@ -94,7 +94,7 @@ fio = require('fio')
 box.space._schema:delete('test') -- bump LSN
 ---
 ...
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0.1)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', true)
 ---
 - ok
 ...
@@ -132,7 +132,7 @@ box.cfg{checkpoint_interval = 10}
 box.cfg{checkpoint_interval = default_checkpoint_interval}
 ---
 ...
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', false)
 ---
 - ok
 ...
diff --git a/test/xlog/errinj.test.lua b/test/xlog/errinj.test.lua
index e433b2fc..de0e2e7e 100644
--- a/test/xlog/errinj.test.lua
+++ b/test/xlog/errinj.test.lua
@@ -48,7 +48,7 @@ errinj = nil
 --
 fio = require('fio')
 box.space._schema:delete('test') -- bump LSN
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0.1)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', true)
 default_checkpoint_interval = box.cfg.checkpoint_interval
 box.cfg{checkpoint_interval = 0.01}
 -- Wait for the checkpoint daemon to start making a checkpoint.
@@ -62,7 +62,7 @@ box.cfg{checkpoint_interval = 0.1}
 box.cfg{checkpoint_interval = 1}
 box.cfg{checkpoint_interval = 10}
 box.cfg{checkpoint_interval = default_checkpoint_interval}
-box.error.injection.set('ERRINJ_SNAP_WRITE_ROW_TIMEOUT', 0)
+box.error.injection.set('ERRINJ_SNAP_WRITE_DELAY', false)
 -- Wait for the checkpoint daemon to finish making a checkpoint.
 test_run:cmd("setopt delimiter ';'")
 test_run:wait_cond(function()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/6] Don't take schema lock for checkpointing
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
  2019-06-30 19:40 ` [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers Vladimir Davydov
  2019-06-30 19:40 ` [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-07-03 19:21   ` Konstantin Osipov
  2019-06-30 19:40 ` [PATCH 4/6] test: make vinyl/replica_rejoin more stable Vladimir Davydov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Memtx checkpointing proceeds as follows: first we open iterators over
primary indexes of all spaces and save them to a list, then we start
a thread that uses the iterators to dump space contents to a snap file.
To avoid accessing a freed tuple, we put the small allocator to the
delayed free mode. However, this doesn't prevent an index from being
dropped so we also take the schema lock to lock out any DDL operation
that can potentially destroy a space or an index. Note, vinyl doesn't
need this lock, because it implements index reference counting under
the hood.

Actually, we don't really need to take a lock - instead we can simply
postpone index destruction until checkpointing is complete, similarly
to how we postpone destruction of individual tuples. We even have all
the infrastructure for this - it's delayed garbage collection. So this
patch tweaks it a bit to delay the actual index destruction to be done
after checkpointing is complete.

This is a step forward towards removal of the schema lock, which stands
in the way of transactional DDL.
---
 src/box/gc.c                    |  8 ------
 src/box/memtx_engine.c          | 42 ++++++++++++++++++++++++++------
 src/box/memtx_engine.h          |  5 ++++
 test/app-tap/snapshot.test.lua  |  5 +++-
 test/engine/errinj_ddl.result   | 54 +++++++++++++++++++++++++++++++++++++++++
 test/engine/errinj_ddl.test.lua | 24 ++++++++++++++++++
 6 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/src/box/gc.c b/src/box/gc.c
index 5639edd8..a2c963e0 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -54,7 +54,6 @@
 #include "say.h"
 #include "vclock.h"
 #include "cbus.h"
-#include "schema.h"
 #include "engine.h"		/* engine_collect_garbage() */
 #include "wal.h"		/* wal_collect_garbage() */
 #include "checkpoint_schedule.h"
@@ -369,12 +368,6 @@ gc_do_checkpoint(void)
 	gc.checkpoint_is_in_progress = true;
 
 	/*
-	 * We don't support DDL operations while making a checkpoint.
-	 * Lock them out.
-	 */
-	latch_lock(&schema_lock);
-
-	/*
 	 * Rotate WAL and call engine callbacks to create a checkpoint
 	 * on disk for each registered engine.
 	 */
@@ -398,7 +391,6 @@ out:
 	if (rc != 0)
 		engine_abort_checkpoint();
 
-	latch_unlock(&schema_lock);
 	gc.checkpoint_is_in_progress = false;
 	return rc;
 }
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index e259bf62..c8376110 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -527,20 +527,20 @@ checkpoint_write_row(struct xlog *l, struct xrow_header *row)
 }
 
 static int
-checkpoint_write_tuple(struct xlog *l, struct space *space,
+checkpoint_write_tuple(struct xlog *l, uint32_t space_id, uint32_t group_id,
 		       const char *data, uint32_t size)
 {
 	struct request_replace_body body;
 	body.m_body = 0x82; /* map of two elements. */
 	body.k_space_id = IPROTO_SPACE_ID;
 	body.m_space_id = 0xce; /* uint32 */
-	body.v_space_id = mp_bswap_u32(space_id(space));
+	body.v_space_id = mp_bswap_u32(space_id);
 	body.k_tuple = IPROTO_TUPLE;
 
 	struct xrow_header row;
 	memset(&row, 0, sizeof(struct xrow_header));
 	row.type = IPROTO_INSERT;
-	row.group_id = space_group_id(space);
+	row.group_id = group_id;
 
 	row.bodycnt = 2;
 	row.body[0].iov_base = &body;
@@ -551,7 +551,8 @@ checkpoint_write_tuple(struct xlog *l, struct space *space,
 }
 
 struct checkpoint_entry {
-	struct space *space;
+	uint32_t space_id;
+	uint32_t group_id;
 	struct snapshot_iterator *iterator;
 	struct rlist link;
 };
@@ -641,7 +642,8 @@ checkpoint_add_space(struct space *sp, void *data)
 	}
 	rlist_add_tail_entry(&ckpt->entries, entry, link);
 
-	entry->space = sp;
+	entry->space_id = space_id(sp);
+	entry->group_id = space_group_id(sp);
 	entry->iterator = index_create_snapshot_iterator(pk);
 	if (entry->iterator == NULL)
 		return -1;
@@ -677,8 +679,8 @@ checkpoint_f(va_list ap)
 		struct snapshot_iterator *it = entry->iterator;
 		for (data = it->next(it, &size); data != NULL;
 		     data = it->next(it, &size)) {
-			if (checkpoint_write_tuple(&snap, entry->space,
-					data, size) != 0) {
+			if (checkpoint_write_tuple(&snap, entry->space_id,
+					entry->group_id, data, size) != 0) {
 				xlog_close(&snap, false);
 				return -1;
 			}
@@ -748,6 +750,19 @@ memtx_engine_wait_checkpoint(struct engine *engine,
 	return result;
 }
 
+/**
+ * Called after checkpointing is complete to free indexes dropped
+ * while checkpointing was in progress, see memtx_engine_run_gc().
+ */
+static void
+memtx_engine_gc_after_checkpoint(struct memtx_engine *memtx)
+{
+	struct memtx_gc_task *task, *next;
+	stailq_foreach_entry_safe(task, next, &memtx->gc_to_free, link)
+		task->vtab->free(task);
+	stailq_create(&memtx->gc_to_free);
+}
+
 static void
 memtx_engine_commit_checkpoint(struct engine *engine,
 			       const struct vclock *vclock)
@@ -785,6 +800,8 @@ memtx_engine_commit_checkpoint(struct engine *engine,
 
 	checkpoint_delete(memtx->checkpoint);
 	memtx->checkpoint = NULL;
+
+	memtx_engine_gc_after_checkpoint(memtx);
 }
 
 static void
@@ -969,7 +986,15 @@ memtx_engine_run_gc(struct memtx_engine *memtx, bool *stop)
 	task->vtab->run(task, &task_done);
 	if (task_done) {
 		stailq_shift(&memtx->gc_queue);
-		task->vtab->free(task);
+		/*
+		 * If checkpointing is in progress, the index may be
+		 * used by the checkpoint thread so we postpone freeing
+		 * until checkpointing is complete.
+		 */
+		if (memtx->checkpoint == NULL)
+			task->vtab->free(task);
+		else
+			stailq_add_entry(&memtx->gc_to_free, task, link);
 	}
 }
 
@@ -1041,6 +1066,7 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery,
 	}
 
 	stailq_create(&memtx->gc_queue);
+	stailq_create(&memtx->gc_to_free);
 	memtx->gc_fiber = fiber_new("memtx.gc", memtx_engine_gc_f);
 	if (memtx->gc_fiber == NULL)
 		goto fail;
diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
index ccb51678..fcf595e7 100644
--- a/src/box/memtx_engine.h
+++ b/src/box/memtx_engine.h
@@ -155,6 +155,11 @@ struct memtx_engine {
 	 * memtx_gc_task::link.
 	 */
 	struct stailq gc_queue;
+	/**
+	 * List of tasks awaiting to be freed once checkpointing
+	 * is complete, linked by memtx_gc_task::link.
+	 */
+	struct stailq gc_to_free;
 };
 
 struct memtx_gc_task;
diff --git a/test/app-tap/snapshot.test.lua b/test/app-tap/snapshot.test.lua
index f8ad1631..d86f32fe 100755
--- a/test/app-tap/snapshot.test.lua
+++ b/test/app-tap/snapshot.test.lua
@@ -86,7 +86,8 @@ local i2 = s2:create_index('test', { type = 'tree', parts = {1, 'unsigned'} })
 
 for i = 1,1000 do s1:insert{i, i, i} end
 
-fiber.create(function () box.snapshot() end)
+local snap_chan = fiber.channel()
+fiber.create(function () box.snapshot() snap_chan:put(true) end)
 
 fiber.sleep(0)
 
@@ -96,6 +97,8 @@ s2:update({1}, {{'+', 2, 2}})
 s1:drop()
 s2:drop()
 
+snap_chan:get()
+
 test:ok(true, "gh-1185: no crash in matras_touch")
 
 -------------------------------------------------------------------------------
diff --git a/test/engine/errinj_ddl.result b/test/engine/errinj_ddl.result
index 5f404300..a28223a6 100644
--- a/test/engine/errinj_ddl.result
+++ b/test/engine/errinj_ddl.result
@@ -350,3 +350,57 @@ s:count() -- 200
 s:drop()
 ---
 ...
+--
+-- Dropping and recreating a space while it is being written
+-- to a checkpoint.
+--
+s = box.schema.space.create('test', {engine = engine})
+---
+...
+_ = s:create_index('pk')
+---
+...
+for i = 1, 5 do s:insert{i, i} end
+---
+...
+errinj.set("ERRINJ_SNAP_WRITE_DELAY", true)
+---
+- ok
+...
+ch = fiber.channel(1)
+---
+...
+_ = fiber.create(function() box.snapshot() ch:put(true) end)
+---
+...
+s:drop()
+---
+...
+s = box.schema.space.create('test', {engine = engine})
+---
+...
+_ = s:create_index('pk')
+---
+...
+for i = 1, 5 do s:insert{i, i * 10} end
+---
+...
+errinj.set("ERRINJ_SNAP_WRITE_DELAY", false)
+---
+- ok
+...
+ch:get()
+---
+- true
+...
+s:select()
+---
+- - [1, 10]
+  - [2, 20]
+  - [3, 30]
+  - [4, 40]
+  - [5, 50]
+...
+s:drop()
+---
+...
diff --git a/test/engine/errinj_ddl.test.lua b/test/engine/errinj_ddl.test.lua
index a382daa4..c40eae93 100644
--- a/test/engine/errinj_ddl.test.lua
+++ b/test/engine/errinj_ddl.test.lua
@@ -164,3 +164,27 @@ ch:get()
 
 s:count() -- 200
 s:drop()
+
+--
+-- Dropping and recreating a space while it is being written
+-- to a checkpoint.
+--
+s = box.schema.space.create('test', {engine = engine})
+_ = s:create_index('pk')
+for i = 1, 5 do s:insert{i, i} end
+
+errinj.set("ERRINJ_SNAP_WRITE_DELAY", true)
+ch = fiber.channel(1)
+_ = fiber.create(function() box.snapshot() ch:put(true) end)
+
+s:drop()
+
+s = box.schema.space.create('test', {engine = engine})
+_ = s:create_index('pk')
+for i = 1, 5 do s:insert{i, i * 10} end
+
+errinj.set("ERRINJ_SNAP_WRITE_DELAY", false)
+ch:get()
+
+s:select()
+s:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/6] test: make vinyl/replica_rejoin more stable
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
                   ` (2 preceding siblings ...)
  2019-06-30 19:40 ` [PATCH 3/6] Don't take schema lock for checkpointing Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-07-03 19:23   ` Konstantin Osipov
  2019-06-30 19:40 ` [PATCH 5/6] vinyl: don't yield while logging index creation Vladimir Davydov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The test checks that files left after rebootstrap are removed by the
garbage collector. It does that by printing file names to the result
file. This is inherently unstable, because should timing change, and
we can easily get an extra dump or compaction resulting in a different
set of files and hence test failure. Let's rewrite the test so that
it checks that files are actually removed using fio.path.exists().
---
 test/vinyl/replica_rejoin.result   | 119 ++++++++++++++++++-------------------
 test/vinyl/replica_rejoin.test.lua |  34 ++++++-----
 2 files changed, 79 insertions(+), 74 deletions(-)

diff --git a/test/vinyl/replica_rejoin.result b/test/vinyl/replica_rejoin.result
index 1dfcb91b..6e8156fe 100644
--- a/test/vinyl/replica_rejoin.result
+++ b/test/vinyl/replica_rejoin.result
@@ -11,6 +11,12 @@ test_run = env.new()
 box.schema.user.grant('guest', 'replication')
 ---
 ...
+_ = box.schema.space.create('mem')
+---
+...
+_ = box.space.mem:create_index('pk', {parts = {1, 'string'}})
+---
+...
 _ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' })
 ---
 ...
@@ -36,28 +42,12 @@ test_run:cmd("start server replica")
 ---
 - true
 ...
-test_run:cmd("switch replica")
+files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
 ---
-- true
 ...
-fio = require('fio')
+_ = box.space.mem:replace{'files', files}
 ---
 ...
-fio.chdir(box.cfg.vinyl_dir)
----
-- true
-...
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
----
-- - 9000/0/00000000000000000002.index
-  - 9000/0/00000000000000000002.run
-  - 9000/0/00000000000000000004.index
-  - 9000/0/00000000000000000004.run
-...
-test_run:cmd("switch default")
----
-- true
-...
 test_run:cmd("stop server replica")
 ---
 - true
@@ -97,20 +87,6 @@ box.snapshot()
 ---
 - ok
 ...
-fio = require('fio')
----
-...
-fio.chdir(box.cfg.vinyl_dir)
----
-- true
-...
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
----
-- - 9000/0/00000000000000000008.index
-  - 9000/0/00000000000000000008.run
-  - 9000/0/00000000000000000010.index
-  - 9000/0/00000000000000000010.run
-...
 box.space.test:count() -- 99
 ---
 - 99
@@ -119,6 +95,28 @@ test_run:cmd("switch default")
 ---
 - true
 ...
+files = box.space.mem:get('files')[2]
+---
+...
+ok = true
+---
+...
+fio = require('fio')
+---
+...
+for _, f in ipairs(files) do ok = ok and not fio.path.exists(f) end
+---
+...
+ok
+---
+- true
+...
+files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
+---
+...
+_ = box.space.mem:replace{'files', files}
+---
+...
 test_run:cmd("stop server replica")
 ---
 - true
@@ -164,20 +162,6 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
-fio = require('fio')
----
-...
-fio.chdir(box.cfg.vinyl_dir)
----
-- true
-...
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
----
-- - 9000/0/00000000000000000008.index
-  - 9000/0/00000000000000000008.run
-  - 9000/0/00000000000000000010.index
-  - 9000/0/00000000000000000010.run
-...
 box.space.test:count() -- 99
 ---
 - 99
@@ -186,6 +170,16 @@ test_run:cmd("switch default")
 ---
 - true
 ...
+old_files = box.space.mem:get('files')[2]
+---
+...
+new_files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
+---
+...
+#old_files == #new_files
+---
+- true
+...
 test_run:cmd("stop server replica")
 ---
 - true
@@ -211,20 +205,6 @@ box.snapshot()
 ---
 - ok
 ...
-fio = require('fio')
----
-...
-fio.chdir(box.cfg.vinyl_dir)
----
-- true
-...
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
----
-- - 9000/0/00000000000000000022.index
-  - 9000/0/00000000000000000022.run
-  - 9000/0/00000000000000000024.index
-  - 9000/0/00000000000000000024.run
-...
 box.space.test:count() -- 98
 ---
 - 98
@@ -233,6 +213,22 @@ test_run:cmd("switch default")
 ---
 - true
 ...
+files = box.space.mem:get('files')[2]
+---
+...
+ok = true
+---
+...
+fio = require('fio')
+---
+...
+for _, f in ipairs(files) do ok = ok and not fio.path.exists(f) end
+---
+...
+ok
+---
+- true
+...
 test_run:cmd("stop server replica")
 ---
 - true
@@ -245,6 +241,9 @@ test_run:cmd("cleanup server replica")
 box.space.test:drop()
 ---
 ...
+box.space.mem:drop()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/vinyl/replica_rejoin.test.lua b/test/vinyl/replica_rejoin.test.lua
index 2c5a69e0..cbb17ef2 100644
--- a/test/vinyl/replica_rejoin.test.lua
+++ b/test/vinyl/replica_rejoin.test.lua
@@ -6,6 +6,8 @@ test_run = env.new()
 -- after rebootstrap.
 --
 box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('mem')
+_ = box.space.mem:create_index('pk', {parts = {1, 'string'}})
 _ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' })
 _ = box.space.test:create_index('pk')
 pad = string.rep('x', 12 * 1024)
@@ -15,11 +17,8 @@ box.snapshot()
 -- Join a replica. Check its files.
 test_run:cmd("create server replica with rpl_master=default, script='vinyl/replica_rejoin.lua'")
 test_run:cmd("start server replica")
-test_run:cmd("switch replica")
-fio = require('fio')
-fio.chdir(box.cfg.vinyl_dir)
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
-test_run:cmd("switch default")
+files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
+_ = box.space.mem:replace{'files', files}
 test_run:cmd("stop server replica")
 
 -- Invoke garbage collector on the master.
@@ -36,11 +35,15 @@ test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.cfg{checkpoint_count = 1}
 box.snapshot()
-fio = require('fio')
-fio.chdir(box.cfg.vinyl_dir)
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
 box.space.test:count() -- 99
 test_run:cmd("switch default")
+files = box.space.mem:get('files')[2]
+ok = true
+fio = require('fio')
+for _, f in ipairs(files) do ok = ok and not fio.path.exists(f) end
+ok
+files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
+_ = box.space.mem:replace{'files', files}
 test_run:cmd("stop server replica")
 
 -- Invoke garbage collector on the master.
@@ -59,11 +62,11 @@ test_run:cmd("start server replica with crash_expected=True") -- fail
 test_run:cmd("start server replica with crash_expected=True") -- fail again
 test_run:cmd("start server replica with args='disable_replication'")
 test_run:cmd("switch replica")
-fio = require('fio')
-fio.chdir(box.cfg.vinyl_dir)
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
 box.space.test:count() -- 99
 test_run:cmd("switch default")
+old_files = box.space.mem:get('files')[2]
+new_files = test_run:eval("replica", "fio = require('fio') return fio.glob(fio.pathjoin(box.cfg.vinyl_dir, box.space.test.id, 0, '*'))")[1]
+#old_files == #new_files
 test_run:cmd("stop server replica")
 box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', false)
 
@@ -73,14 +76,17 @@ test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.cfg{checkpoint_count = 1}
 box.snapshot()
-fio = require('fio')
-fio.chdir(box.cfg.vinyl_dir)
-fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
 box.space.test:count() -- 98
 test_run:cmd("switch default")
+files = box.space.mem:get('files')[2]
+ok = true
+fio = require('fio')
+for _, f in ipairs(files) do ok = ok and not fio.path.exists(f) end
+ok
 test_run:cmd("stop server replica")
 
 -- Cleanup.
 test_run:cmd("cleanup server replica")
 box.space.test:drop()
+box.space.mem:drop()
 box.schema.user.revoke('guest', 'replication')
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/6] vinyl: don't yield while logging index creation
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
                   ` (3 preceding siblings ...)
  2019-06-30 19:40 ` [PATCH 4/6] test: make vinyl/replica_rejoin more stable Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-06-30 19:40 ` [PATCH 6/6] Replace schema lock with fine-grained locking Vladimir Davydov
  2019-07-05  8:53 ` [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
  6 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Currently, we always log a vinyl index creation in the vylog file
synchronously, i.e. wait for the write to complete successfully. This
makes any index creation a yielding operation, even if the target space
is empty. To implement transactional DDL for non-yielding statements, we
need to eliminate yields in this case. We can do that by simply using
vy_log_try_commit() instead of vy_log_commit() for logging index
creation, because we can handle a missing VY_LOG_PREPARE_INDEX record
during recovery - the code was left since before commit dd0827ba7948
("vinyl: log new index before WAL write on DDL") which split index
creation into PREPARE and COMMIT stages so all we need to do is slightly
modify the test.

The reason why I'm doing this now, in the series removing the schema
lock, is that removal of the schema lock without making space truncation
non-yielding (remember space truncation basically drops and recreates
all indexes) may result in a failure while executing space.truncate()
from concurrent fibers, which is rather unexpected. In particular, this
is checked by engine/truncate.test.lua. So to prevent the test failure
once the schema lock is removed (see the next patch), let's make empty
index creation non-yielding right now.
---
 src/box/vy_lsm.c                 |  3 +--
 test/unit/vy_log_stub.c          |  3 +++
 test/vinyl/errinj_vylog.result   | 23 +++++++++++++++++++----
 test/vinyl/errinj_vylog.test.lua | 10 ++++++++--
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index c4bcc74c..3e134cc1 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -339,8 +339,7 @@ vy_lsm_create(struct vy_lsm *lsm)
 	vy_log_prepare_lsm(id, lsm->space_id, lsm->index_id,
 			   lsm->group_id, lsm->key_def);
 	vy_log_insert_range(id, range->id, NULL, NULL);
-	if (vy_log_tx_commit() < 0)
-		return -1;
+	vy_log_tx_try_commit();
 
 	/* Assign the id. */
 	assert(lsm->id < 0);
diff --git a/test/unit/vy_log_stub.c b/test/unit/vy_log_stub.c
index d4668408..be73adc7 100644
--- a/test/unit/vy_log_stub.c
+++ b/test/unit/vy_log_stub.c
@@ -42,6 +42,9 @@ vy_log_next_id(void)
 void
 vy_log_tx_begin(void) {}
 
+void
+vy_log_tx_try_commit(void) {}
+
 int
 vy_log_tx_commit(void)
 {
diff --git a/test/vinyl/errinj_vylog.result b/test/vinyl/errinj_vylog.result
index 06cc6818..8fb5deda 100644
--- a/test/vinyl/errinj_vylog.result
+++ b/test/vinyl/errinj_vylog.result
@@ -161,10 +161,24 @@ s2.index.pk:drop()
 ---
 ...
 -- pending records must not be rolled back on error
-s2:create_index('pk') -- error
+SCHED_TIMEOUT = 0.05
+---
+...
+box.error.injection.set('ERRINJ_VY_SCHED_TIMEOUT', SCHED_TIMEOUT)
+---
+- ok
+...
+box.snapshot() -- error
 ---
 - error: Error injection 'vinyl log flush'
 ...
+fiber.sleep(2 * SCHED_TIMEOUT)
+---
+...
+box.error.injection.set('ERRINJ_VY_SCHED_TIMEOUT', 0)
+---
+- ok
+...
 box.error.injection.set('ERRINJ_VY_LOG_FLUSH', false);
 ---
 - ok
@@ -260,6 +274,10 @@ _ = s1:insert{333, 'ccc'}
 s2.index.pk:drop()
 ---
 ...
+-- VY_LOG_PREPARE_LSM missing
+_ = s2:create_index('pk')
+---
+...
 test_run:cmd('restart server default')
 s1 = box.space.test1
 ---
@@ -270,9 +288,6 @@ s2 = box.space.test2
 _ = s1:insert{444, 'ddd'}
 ---
 ...
-_ = s2:create_index('pk')
----
-...
 _ = s2:insert{555, 'eee'}
 ---
 ...
diff --git a/test/vinyl/errinj_vylog.test.lua b/test/vinyl/errinj_vylog.test.lua
index 2936f879..5ec58682 100644
--- a/test/vinyl/errinj_vylog.test.lua
+++ b/test/vinyl/errinj_vylog.test.lua
@@ -79,7 +79,11 @@ _ = s1:insert{3, 'c'}
 s2.index.pk:drop()
 
 -- pending records must not be rolled back on error
-s2:create_index('pk') -- error
+SCHED_TIMEOUT = 0.05
+box.error.injection.set('ERRINJ_VY_SCHED_TIMEOUT', SCHED_TIMEOUT)
+box.snapshot() -- error
+fiber.sleep(2 * SCHED_TIMEOUT)
+box.error.injection.set('ERRINJ_VY_SCHED_TIMEOUT', 0)
 
 box.error.injection.set('ERRINJ_VY_LOG_FLUSH', false);
 
@@ -126,13 +130,15 @@ _ = s1:insert{333, 'ccc'}
 -- VY_LOG_DROP_LSM missing
 s2.index.pk:drop()
 
+-- VY_LOG_PREPARE_LSM missing
+_ = s2:create_index('pk')
+
 test_run:cmd('restart server default')
 
 s1 = box.space.test1
 s2 = box.space.test2
 
 _ = s1:insert{444, 'ddd'}
-_ = s2:create_index('pk')
 _ = s2:insert{555, 'eee'}
 
 s1:select()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
                   ` (4 preceding siblings ...)
  2019-06-30 19:40 ` [PATCH 5/6] vinyl: don't yield while logging index creation Vladimir Davydov
@ 2019-06-30 19:40 ` Vladimir Davydov
  2019-07-03 19:35   ` Konstantin Osipov
  2019-07-05  8:53 ` [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
  6 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-06-30 19:40 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Now, as we don't need to take the schema lock for checkpointing, it is
only used to synchronize concurrent space modifications (drop, truncate,
alter). Actually, a global lock is a way too heavy means to achieve this
goal, because we only care about forbidding concurrent modifications of
the same space while concurrent modifications of different spaces should
work just fine. So this patch replaces schema lock with a per-space
flag. The flag is called is_in_alter and set by alter_space_new() and
cleared by alter_space_delete(). If the flag is already set when
alter_space_new() is called, an error is thrown.

Removal of the schema lock allows us to remove latch_steal() helper and
on_begin_stmt txn trigger altogether, as they were introduced solely to
support locking.

This is a prerequisite for transactional DDL, because it's unclear how
to preserve the global schema lock while allowing to combine several DDL
statements in the same transaction.
---
 src/box/alter.cc                | 68 +++++-----------------------
 src/box/alter.h                 |  3 --
 src/box/schema.cc               | 41 +++++++----------
 src/box/schema.h                |  6 ---
 src/box/space.c                 |  2 -
 src/box/space.h                 |  7 ++-
 src/box/txn.c                   |  3 --
 src/box/vy_lsm.c                |  1 +
 src/lib/core/latch.h            | 10 -----
 test/box/on_replace.result      |  8 ++--
 test/box/transaction.result     |  2 +-
 test/engine/errinj_ddl.result   | 98 +++++++++++++++++++++++++++++++++++++++++
 test/engine/errinj_ddl.test.lua | 38 ++++++++++++++++
 13 files changed, 176 insertions(+), 111 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index e76b9e68..f82993c3 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -568,7 +568,6 @@ space_swap_triggers(struct space *new_space, struct space *old_space)
 {
 	rlist_swap(&new_space->before_replace, &old_space->before_replace);
 	rlist_swap(&new_space->on_replace, &old_space->on_replace);
-	rlist_swap(&new_space->on_stmt_begin, &old_space->on_stmt_begin);
 	/** Swap SQL Triggers pointer. */
 	struct sql_trigger *new_value = new_space->sql_triggers;
 	new_space->sql_triggers = old_space->sql_triggers;
@@ -707,6 +706,10 @@ struct alter_space {
 static struct alter_space *
 alter_space_new(struct space *old_space)
 {
+	if (old_space->is_in_alter) {
+		tnt_raise(ClientError, ER_ALTER_SPACE, space_name(old_space),
+			  "the space is already being modified");
+	}
 	struct alter_space *alter =
 		region_calloc_object_xc(&in_txn()->region, struct alter_space);
 	rlist_create(&alter->ops);
@@ -716,6 +719,8 @@ alter_space_new(struct space *old_space)
 		alter->new_min_field_count = old_space->format->min_field_count;
 	else
 		alter->new_min_field_count = 0;
+
+	old_space->is_in_alter = true;
 	return alter;
 }
 
@@ -723,6 +728,11 @@ alter_space_new(struct space *old_space)
 static void
 alter_space_delete(struct alter_space *alter)
 {
+	if (alter->old_space != NULL) {
+		assert(alter->old_space->is_in_alter);
+		alter->old_space->is_in_alter = false;
+	}
+
 	/* Destroy the ops. */
 	while (! rlist_empty(&alter->ops)) {
 		AlterSpaceOp *op = rlist_shift_entry(&alter->ops,
@@ -769,6 +779,7 @@ alter_space_commit(struct trigger *trigger, void *event)
 	 * going to use it.
 	 */
 	space_delete(alter->old_space);
+	alter->old_space = NULL;
 	alter_space_delete(alter);
 }
 
@@ -3570,49 +3581,6 @@ on_replace_dd_space_sequence(struct trigger * /* trigger */, void *event)
 
 /* }}} sequence */
 
-static void
-unlock_after_dd(struct trigger *trigger, void *event)
-{
-	(void) trigger;
-	(void) event;
-	/*
-	 * In case of yielding journal this trigger will be processed
-	 * in a context of tx_prio endpoint instead of a context of
-	 * a fiber which has this latch locked. So steal the latch first.
-	 */
-	latch_steal(&schema_lock);
-	latch_unlock(&schema_lock);
-}
-
-static void
-lock_before_dd(struct trigger *trigger, void *event)
-{
-	(void) trigger;
-	if (fiber() == latch_owner(&schema_lock))
-		return;
-	struct txn *txn = (struct txn *)event;
-	/*
-	 * This trigger is executed before any check and may yield
-	 * on the latch lock. But a yield in a non-autocommit
-	 * memtx transaction will roll it back silently, rather
-	 * than produce an error, which is very confusing.
-	 * So don't try to lock a latch if there is
-	 * a multi-statement transaction.
-	 */
-	txn_check_singlestatement_xc(txn, "DDL");
-	struct trigger *on_commit =
-		txn_alter_trigger_new(unlock_after_dd, NULL);
-	struct trigger *on_rollback =
-		txn_alter_trigger_new(unlock_after_dd, NULL);
-	/*
-	 * Setting triggers doesn't fail. Lock the latch last
-	 * to avoid leaking the latch in case of exception.
-	 */
-	txn_on_commit(txn, on_commit);
-	txn_on_rollback(txn, on_rollback);
-	latch_lock(&schema_lock);
-}
-
 /**
  * Trigger invoked on rollback in the _trigger space.
  * Since rollback trigger is invoked after insertion to hash and space,
@@ -4473,18 +4441,6 @@ struct trigger on_replace_space_sequence = {
 	RLIST_LINK_INITIALIZER, on_replace_dd_space_sequence, NULL, NULL
 };
 
-struct trigger on_stmt_begin_space = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
-struct trigger on_stmt_begin_index = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
-struct trigger on_stmt_begin_truncate = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
 struct trigger on_replace_trigger = {
 	RLIST_LINK_INITIALIZER, on_replace_dd_trigger, NULL, NULL
 };
diff --git a/src/box/alter.h b/src/box/alter.h
index b9ba7b84..c339ccea 100644
--- a/src/box/alter.h
+++ b/src/box/alter.h
@@ -47,8 +47,5 @@ extern struct trigger on_replace_space_sequence;
 extern struct trigger on_replace_trigger;
 extern struct trigger on_replace_fk_constraint;
 extern struct trigger on_replace_ck_constraint;
-extern struct trigger on_stmt_begin_space;
-extern struct trigger on_stmt_begin_index;
-extern struct trigger on_stmt_begin_truncate;
 
 #endif /* INCLUDES_TARANTOOL_BOX_ALTER_H */
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 64412fac..0ab256ed 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -37,6 +37,7 @@
 #include "scoped_guard.h"
 #include "user.h"
 #include "vclock.h"
+#include "fiber.h"
 
 /**
  * @module Data Dictionary
@@ -74,11 +75,6 @@ struct rlist on_alter_space = RLIST_HEAD_INITIALIZER(on_alter_space);
 struct rlist on_alter_sequence = RLIST_HEAD_INITIALIZER(on_alter_sequence);
 struct rlist on_alter_func = RLIST_HEAD_INITIALIZER(on_alter_func);
 
-/**
- * Lock of scheme modification
- */
-struct latch schema_lock = LATCH_INITIALIZER(schema_lock);
-
 struct entity_access entity_access;
 
 bool
@@ -268,8 +264,7 @@ static void
 sc_space_new(uint32_t id, const char *name,
 	     struct key_part_def *key_parts,
 	     uint32_t key_part_count,
-	     struct trigger *replace_trigger,
-	     struct trigger *stmt_begin_trigger)
+	     struct trigger *replace_trigger)
 {
 	struct key_def *key_def = key_def_new(key_parts, key_part_count);
 	if (key_def == NULL)
@@ -298,8 +293,6 @@ sc_space_new(uint32_t id, const char *name,
 	space_cache_replace(NULL, space);
 	if (replace_trigger)
 		trigger_add(&space->on_replace, replace_trigger);
-	if (stmt_begin_trigger)
-		trigger_add(&space->on_stmt_begin, stmt_begin_trigger);
 	/*
 	 * Data dictionary spaces are fully built since:
 	 * - they contain data right from the start
@@ -394,56 +387,56 @@ schema_init()
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_SCHEMA_ID, "_schema", key_parts, 1,
-		     &on_replace_schema, NULL);
+		     &on_replace_schema);
 
 	/* _collation - collation description. */
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_COLLATION_ID, "_collation", key_parts, 1,
-		     &on_replace_collation, NULL);
+		     &on_replace_collation);
 
 	/* _space - home for all spaces. */
 	sc_space_new(BOX_SPACE_ID, "_space", key_parts, 1,
-		     &alter_space_on_replace_space, &on_stmt_begin_space);
+		     &alter_space_on_replace_space);
 
 	/* _truncate - auxiliary space for triggering space truncation. */
 	sc_space_new(BOX_TRUNCATE_ID, "_truncate", key_parts, 1,
-		     &on_replace_truncate, &on_stmt_begin_truncate);
+		     &on_replace_truncate);
 
 	/* _sequence - definition of all sequence objects. */
 	sc_space_new(BOX_SEQUENCE_ID, "_sequence", key_parts, 1,
-		     &on_replace_sequence, NULL);
+		     &on_replace_sequence);
 
 	/* _sequence_data - current sequence value. */
 	sc_space_new(BOX_SEQUENCE_DATA_ID, "_sequence_data", key_parts, 1,
-		     &on_replace_sequence_data, NULL);
+		     &on_replace_sequence_data);
 
 	/* _space_seq - association space <-> sequence. */
 	sc_space_new(BOX_SPACE_SEQUENCE_ID, "_space_sequence", key_parts, 1,
-		     &on_replace_space_sequence, NULL);
+		     &on_replace_space_sequence);
 
 	/* _user - all existing users */
-	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user, NULL);
+	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user);
 
 	/* _func - all executable objects on which one can have grants */
-	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func, NULL);
+	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func);
 	/*
 	 * _priv - association user <-> object
 	 * The real index is defined in the snapshot.
 	 */
-	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv, NULL);
+	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv);
 	/*
 	 * _cluster - association instance uuid <-> instance id
 	 * The real index is defined in the snapshot.
 	 */
 	sc_space_new(BOX_CLUSTER_ID, "_cluster", key_parts, 1,
-		     &on_replace_cluster, NULL);
+		     &on_replace_cluster);
 
 	/* _trigger - all existing SQL triggers. */
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_TRIGGER_ID, "_trigger", key_parts, 1,
-		     &on_replace_trigger, NULL);
+		     &on_replace_trigger);
 
 	/* _index - definition of all space indexes. */
 	key_parts[0].fieldno = 0; /* space id */
@@ -451,7 +444,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* index id */
 	key_parts[1].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_INDEX_ID, "_index", key_parts, 2,
-		     &alter_space_on_replace_index, &on_stmt_begin_index);
+		     &alter_space_on_replace_index);
 
 	/* _fk_сonstraint - foreign keys constraints. */
 	key_parts[0].fieldno = 0; /* constraint name */
@@ -459,7 +452,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* child space */
 	key_parts[1].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_FK_CONSTRAINT_ID, "_fk_constraint", key_parts, 2,
-		     &on_replace_fk_constraint, NULL);
+		     &on_replace_fk_constraint);
 
 	/* _ck_сonstraint - check constraints. */
 	key_parts[0].fieldno = 0; /* space id */
@@ -467,7 +460,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* constraint name */
 	key_parts[1].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_CK_CONSTRAINT_ID, "_ck_constraint", key_parts, 2,
-		     &on_replace_ck_constraint, NULL);
+		     &on_replace_ck_constraint);
 
 	/*
 	 * _vinyl_deferred_delete - blackhole that is needed
diff --git a/src/box/schema.h b/src/box/schema.h
index f0039b29..9f8f6345 100644
--- a/src/box/schema.h
+++ b/src/box/schema.h
@@ -35,7 +35,6 @@
 #include <stdio.h> /* snprintf */
 #include "error.h"
 #include "space.h"
-#include "latch.h"
 
 #if defined(__cplusplus)
 extern "C" {
@@ -48,11 +47,6 @@ extern uint32_t space_cache_version;
 extern struct rlist on_schema_init;
 
 /**
- * Lock of schema modification
- */
-extern struct latch schema_lock;
-
-/**
  * Try to look up a space by space number in the space cache.
  * FFI-friendly no-exception-thrown space lookup function.
  *
diff --git a/src/box/space.c b/src/box/space.c
index b6ad87bf..e7babb2a 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -137,7 +137,6 @@ space_create(struct space *space, struct engine *engine,
 	space->index_id_max = index_id_max;
 	rlist_create(&space->before_replace);
 	rlist_create(&space->on_replace);
-	rlist_create(&space->on_stmt_begin);
 	space->run_triggers = true;
 
 	space->format = format;
@@ -219,7 +218,6 @@ space_delete(struct space *space)
 		tuple_format_unref(space->format);
 	trigger_destroy(&space->before_replace);
 	trigger_destroy(&space->on_replace);
-	trigger_destroy(&space->on_stmt_begin);
 	space_def_delete(space->def);
 	/*
 	 * SQL Triggers should be deleted with
diff --git a/src/box/space.h b/src/box/space.h
index 949f37d4..2cc0643d 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -154,8 +154,6 @@ struct space {
 	struct rlist before_replace;
 	/** Triggers fired after space_replace() -- see txn_commit_stmt(). */
 	struct rlist on_replace;
-	/** Triggers fired before space statement */
-	struct rlist on_stmt_begin;
 	/** SQL Trigger list. */
 	struct sql_trigger *sql_triggers;
 	/**
@@ -183,6 +181,11 @@ struct space {
 	/** Enable/disable triggers. */
 	bool run_triggers;
 	/**
+	 * Set if the space is being altered and hence any other
+	 * DDL request must be rejected.
+	 */
+	bool is_in_alter;
+	/**
 	 * Space format or NULL if space does not have format
 	 * (sysview engine, for example).
 	 */
diff --git a/src/box/txn.c b/src/box/txn.c
index 95076773..818f405b 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -242,9 +242,6 @@ txn_begin_stmt(struct txn *txn, struct space *space)
 	if (space == NULL)
 		return 0;
 
-	if (trigger_run(&space->on_stmt_begin, txn) != 0)
-		goto fail;
-
 	struct engine *engine = space->engine;
 	if (txn_begin_in_engine(engine, txn) != 0)
 		goto fail;
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 3e134cc1..48590521 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -38,6 +38,7 @@
 #include <small/mempool.h>
 
 #include "diag.h"
+#include "fiber.h"
 #include "errcode.h"
 #include "histogram.h"
 #include "index_def.h"
diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 58094256..49c59cf6 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -156,16 +156,6 @@ latch_trylock(struct latch *l)
 }
 
 /**
- * Take a latch ownership
- */
-static inline void
-latch_steal(struct latch *l)
-{
-	assert(l->owner != NULL);
-	l->owner = fiber();
-}
-
-/**
  * \copydoc box_latch_unlock
  */
 static inline void
diff --git a/test/box/on_replace.result b/test/box/on_replace.result
index a2f51409..b71c9878 100644
--- a/test/box/on_replace.result
+++ b/test/box/on_replace.result
@@ -477,7 +477,7 @@ t = s:on_replace(function () s:create_index('sec') end, t)
 ...
 s:replace({2, 3})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 t = s:on_replace(function () box.schema.user.create('newu') end, t)
 ---
@@ -512,7 +512,7 @@ t = s:on_replace(function () s:drop() end, t)
 ...
 s:replace({5, 6})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 t = s:on_replace(function () box.schema.func.create('newf') end, t)
 ---
@@ -533,14 +533,14 @@ t = s:on_replace(function () s:rename('newname') end, t)
 ...
 s:replace({8, 9})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _space does not support multi-statement transactions
 ...
 t = s:on_replace(function () s.index.pk:rename('newname') end, t)
 ---
 ...
 s:replace({9, 10})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 s:select()
 ---
diff --git a/test/box/transaction.result b/test/box/transaction.result
index bc50735b..9da53e5b 100644
--- a/test/box/transaction.result
+++ b/test/box/transaction.result
@@ -87,7 +87,7 @@ while f:status() ~= 'dead' do fiber.sleep(0) end;
 -- some operation involves more than one ddl spaces, so they should fail
 box.begin() box.schema.space.create('test');
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _space does not support multi-statement transactions
 ...
 box.rollback();
 ---
diff --git a/test/engine/errinj_ddl.result b/test/engine/errinj_ddl.result
index a28223a6..60bf34d9 100644
--- a/test/engine/errinj_ddl.result
+++ b/test/engine/errinj_ddl.result
@@ -404,3 +404,101 @@ s:select()
 s:drop()
 ---
 ...
+--
+-- Concurrent DDL.
+--
+s1 = box.schema.space.create('test1', {engine = engine})
+---
+...
+_ = s1:create_index('pk')
+---
+...
+for i = 1, 5 do s1:insert{i, i} end
+---
+...
+s2 = box.schema.space.create('test2', {engine = engine})
+---
+...
+_ = s2:create_index('pk')
+---
+...
+for i = 1, 5 do s2:insert{i, i} end
+---
+...
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
+---
+- ok
+...
+ch = fiber.channel(1)
+---
+...
+_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
+---
+...
+-- Modification of the same space must fail while an index creation
+-- is in progress.
+s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+s1:truncate()
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+s1:drop()
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+-- Modification of another space must work though.
+s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+---
+...
+s2:truncate()
+---
+...
+_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
+---
+...
+s2:drop()
+---
+...
+s2 = box.schema.space.create('test2', {engine = engine})
+---
+...
+_ = s2:create_index('pk')
+---
+...
+s2:drop()
+---
+...
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
+---
+- ok
+...
+ch:get()
+---
+- true
+...
+s1.index.pk:select()
+---
+- - [1, 1]
+  - [2, 2]
+  - [3, 3]
+  - [4, 4]
+  - [5, 5]
+...
+s1.index.sk:select()
+---
+- - [1, 1]
+  - [2, 2]
+  - [3, 3]
+  - [4, 4]
+  - [5, 5]
+...
+s1:drop()
+---
+...
diff --git a/test/engine/errinj_ddl.test.lua b/test/engine/errinj_ddl.test.lua
index c40eae93..207610c4 100644
--- a/test/engine/errinj_ddl.test.lua
+++ b/test/engine/errinj_ddl.test.lua
@@ -188,3 +188,41 @@ ch:get()
 
 s:select()
 s:drop()
+
+--
+-- Concurrent DDL.
+--
+s1 = box.schema.space.create('test1', {engine = engine})
+_ = s1:create_index('pk')
+for i = 1, 5 do s1:insert{i, i} end
+
+s2 = box.schema.space.create('test2', {engine = engine})
+_ = s2:create_index('pk')
+for i = 1, 5 do s2:insert{i, i} end
+
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
+ch = fiber.channel(1)
+_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
+
+-- Modification of the same space must fail while an index creation
+-- is in progress.
+s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+s1:truncate()
+_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
+s1:drop()
+
+-- Modification of another space must work though.
+s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+s2:truncate()
+_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
+s2:drop()
+s2 = box.schema.space.create('test2', {engine = engine})
+_ = s2:create_index('pk')
+s2:drop()
+
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
+ch:get()
+
+s1.index.pk:select()
+s1.index.sk:select()
+s1:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers
  2019-06-30 19:40 ` [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers Vladimir Davydov
@ 2019-07-03 19:12   ` Konstantin Osipov
  2019-07-04 15:50     ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:12 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> ERROR_INJECT_YIELD yields the current fiber execution by calling
> fiber_sleep(0) while the given error injection is set.

I would sleep for some minimal non-zero time, e.g. a millisecond,
to avoid consuming CPU in parallel tests.
 
> ERROR_INJECT_SLEEP suspends the current thread execution by calling
> usleep(1000) while the given error injection is set.

LGTM.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY
  2019-06-30 19:40 ` [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY Vladimir Davydov
@ 2019-07-03 19:13   ` Konstantin Osipov
  2019-07-04 15:51     ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:13 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> Timeout injections are unstable and difficult to use. Injecting a delay
> is much more convenient.

lgtm


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] Don't take schema lock for checkpointing
  2019-06-30 19:40 ` [PATCH 3/6] Don't take schema lock for checkpointing Vladimir Davydov
@ 2019-07-03 19:21   ` Konstantin Osipov
  2019-07-03 20:05     ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:21 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> Memtx checkpointing proceeds as follows: first we open iterators over
> primary indexes of all spaces and save them to a list, then we start
> a thread that uses the iterators to dump space contents to a snap file.
> To avoid accessing a freed tuple, we put the small allocator to the
> delayed free mode. However, this doesn't prevent an index from being
> dropped so we also take the schema lock to lock out any DDL operation
> that can potentially destroy a space or an index. Note, vinyl doesn't
> need this lock, because it implements index reference counting under
> the hood.
> 
> Actually, we don't really need to take a lock - instead we can simply
> postpone index destruction until checkpointing is complete, similarly
> to how we postpone destruction of individual tuples. We even have all
> the infrastructure for this - it's delayed garbage collection. So this
> patch tweaks it a bit to delay the actual index destruction to be done
> after checkpointing is complete.
> 
> This is a step forward towards removal of the schema lock, which stands
> in the way of transactional DDL.

Looks like you do it because I said once I hate reference
counting.

First, I don't mind having reference counting for memtx index objects now
that we've approached transactional ddl frontier.

But even reference counting would be a bit cumbersome for this.
Please take a look at how bps does it - it links all pages into a
fifo-like list, and a checkpoint simply sets a savepoint on that
list. Committing a checkpoint releases the savepoint and garbage
collects all objects that have been freed after the savepoint. 

SQL will need multiple concurrent snapshots - so it will need
multiple versions. So please consider turning the algorithm you've
just used a general-purpose one - so that any database object
could add itself for delayed destruction, the delayed destruction
would take place immediately if there are no savepoints, or
immediately after the savepoint up until the next savepoint.

Then we can move all subsystems to this.

If you have a better general-purpose object garbage collection
idea, please share/implement it too.


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] test: make vinyl/replica_rejoin more stable
  2019-06-30 19:40 ` [PATCH 4/6] test: make vinyl/replica_rejoin more stable Vladimir Davydov
@ 2019-07-03 19:23   ` Konstantin Osipov
  2019-07-04 15:51     ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:23 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> The test checks that files left after rebootstrap are removed by the
> garbage collector. It does that by printing file names to the result
> file. This is inherently unstable, because should timing change, and
> we can easily get an extra dump or compaction resulting in a different
> set of files and hence test failure. Let's rewrite the test so that
> it checks that files are actually removed using fio.path.exists().

lgtm


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-06-30 19:40 ` [PATCH 6/6] Replace schema lock with fine-grained locking Vladimir Davydov
@ 2019-07-03 19:35   ` Konstantin Osipov
  2019-07-03 19:56     ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:35 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> Now, as we don't need to take the schema lock for checkpointing, it is
> only used to synchronize concurrent space modifications (drop, truncate,
> alter). Actually, a global lock is a way too heavy means to achieve this
> goal, because we only care about forbidding concurrent modifications of
> the same space while concurrent modifications of different spaces should
> work just fine. So this patch replaces schema lock with a per-space
> flag. The flag is called is_in_alter and set by alter_space_new() and
> cleared by alter_space_delete(). If the flag is already set when
> alter_space_new() is called, an error is thrown.

Uh-oh.

Could you please do a bit more coding?

There are inherent dangers in using a boolean flag rather than a
normal lock:
- lock life time is bound to object life time
- there is no way to put itself into a wait queue
- there is an implicit assumption that a fiber only takes one lock
  and no way to inspect/free all locks of a fiber.
- deadlock detection is impossible.

Let's add a normal name-based locking for this:

struct lock {
    enum object_type object_type;
    char *object_name;
    enum { S, X } type;
    struct lock *pending;
    struct fiber *owner; // ideally it should be struct txn, or int txn_id, not struct fiber
};

hash<lock> metadata_locks;

This could be a separate module in box.cc. The api should take
locks by name:

struct lock *metadata_lock_get(enum object_type type, char
*object_name, enum lock_type type, int txn_id);

and unlock by value or txn_id:

void metadata_lock_free(struct lock *lock);

void metadata_lock_free_all(int txn_id);

> Removal of the schema lock allows us to remove latch_steal() helper and
> on_begin_stmt txn trigger altogether, as they were introduced solely to
> support locking.
> 
> This is a prerequisite for transactional DDL, because it's unclear how
> to preserve the global schema lock while allowing to combine several DDL
> statements in the same transaction.
> ---
>  src/box/alter.cc                | 68 +++++-----------------------
>  src/box/alter.h                 |  3 --
>  src/box/schema.cc               | 41 +++++++----------
>  src/box/schema.h                |  6 ---
>  src/box/space.c                 |  2 -
>  src/box/space.h                 |  7 ++-
>  src/box/txn.c                   |  3 --
>  src/box/vy_lsm.c                |  1 +
>  src/lib/core/latch.h            | 10 -----
>  test/box/on_replace.result      |  8 ++--
>  test/box/transaction.result     |  2 +-
>  test/engine/errinj_ddl.result   | 98 +++++++++++++++++++++++++++++++++++++++++
>  test/engine/errinj_ddl.test.lua | 38 ++++++++++++++++
>  13 files changed, 176 insertions(+), 111 deletions(-)
> 
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index e76b9e68..f82993c3 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -568,7 +568,6 @@ space_swap_triggers(struct space *new_space, struct space *old_space)
>  {
>  	rlist_swap(&new_space->before_replace, &old_space->before_replace);
>  	rlist_swap(&new_space->on_replace, &old_space->on_replace);
> -	rlist_swap(&new_space->on_stmt_begin, &old_space->on_stmt_begin);
>  	/** Swap SQL Triggers pointer. */
>  	struct sql_trigger *new_value = new_space->sql_triggers;
>  	new_space->sql_triggers = old_space->sql_triggers;
> @@ -707,6 +706,10 @@ struct alter_space {
>  static struct alter_space *
>  alter_space_new(struct space *old_space)
>  {
> +	if (old_space->is_in_alter) {
> +		tnt_raise(ClientError, ER_ALTER_SPACE, space_name(old_space),
> +			  "the space is already being modified");
> +	}
>  	struct alter_space *alter =
>  		region_calloc_object_xc(&in_txn()->region, struct alter_space);
>  	rlist_create(&alter->ops);
> @@ -716,6 +719,8 @@ alter_space_new(struct space *old_space)
>  		alter->new_min_field_count = old_space->format->min_field_count;
>  	else
>  		alter->new_min_field_count = 0;
> +
> +	old_space->is_in_alter = true;
>  	return alter;
>  }
>  
> @@ -723,6 +728,11 @@ alter_space_new(struct space *old_space)
>  static void
>  alter_space_delete(struct alter_space *alter)
>  {
> +	if (alter->old_space != NULL) {
> +		assert(alter->old_space->is_in_alter);
> +		alter->old_space->is_in_alter = false;
> +	}
> +
>  	/* Destroy the ops. */
>  	while (! rlist_empty(&alter->ops)) {
>  		AlterSpaceOp *op = rlist_shift_entry(&alter->ops,
> @@ -769,6 +779,7 @@ alter_space_commit(struct trigger *trigger, void *event)
>  	 * going to use it.
>  	 */
>  	space_delete(alter->old_space);
> +	alter->old_space = NULL;
>  	alter_space_delete(alter);
>  }
>  
> @@ -3570,49 +3581,6 @@ on_replace_dd_space_sequence(struct trigger * /* trigger */, void *event)
>  
>  /* }}} sequence */
>  
> -static void
> -unlock_after_dd(struct trigger *trigger, void *event)
> -{
> -	(void) trigger;
> -	(void) event;
> -	/*
> -	 * In case of yielding journal this trigger will be processed
> -	 * in a context of tx_prio endpoint instead of a context of
> -	 * a fiber which has this latch locked. So steal the latch first.
> -	 */
> -	latch_steal(&schema_lock);
> -	latch_unlock(&schema_lock);
> -}
> -
> -static void
> -lock_before_dd(struct trigger *trigger, void *event)
> -{
> -	(void) trigger;
> -	if (fiber() == latch_owner(&schema_lock))
> -		return;
> -	struct txn *txn = (struct txn *)event;
> -	/*
> -	 * This trigger is executed before any check and may yield
> -	 * on the latch lock. But a yield in a non-autocommit
> -	 * memtx transaction will roll it back silently, rather
> -	 * than produce an error, which is very confusing.
> -	 * So don't try to lock a latch if there is
> -	 * a multi-statement transaction.
> -	 */
> -	txn_check_singlestatement_xc(txn, "DDL");
> -	struct trigger *on_commit =
> -		txn_alter_trigger_new(unlock_after_dd, NULL);
> -	struct trigger *on_rollback =
> -		txn_alter_trigger_new(unlock_after_dd, NULL);
> -	/*
> -	 * Setting triggers doesn't fail. Lock the latch last
> -	 * to avoid leaking the latch in case of exception.
> -	 */
> -	txn_on_commit(txn, on_commit);
> -	txn_on_rollback(txn, on_rollback);
> -	latch_lock(&schema_lock);
> -}
> -
>  /**
>   * Trigger invoked on rollback in the _trigger space.
>   * Since rollback trigger is invoked after insertion to hash and space,
> @@ -4473,18 +4441,6 @@ struct trigger on_replace_space_sequence = {
>  	RLIST_LINK_INITIALIZER, on_replace_dd_space_sequence, NULL, NULL
>  };
>  
> -struct trigger on_stmt_begin_space = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
> -struct trigger on_stmt_begin_index = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
> -struct trigger on_stmt_begin_truncate = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
>  struct trigger on_replace_trigger = {
>  	RLIST_LINK_INITIALIZER, on_replace_dd_trigger, NULL, NULL
>  };
> diff --git a/src/box/alter.h b/src/box/alter.h
> index b9ba7b84..c339ccea 100644
> --- a/src/box/alter.h
> +++ b/src/box/alter.h
> @@ -47,8 +47,5 @@ extern struct trigger on_replace_space_sequence;
>  extern struct trigger on_replace_trigger;
>  extern struct trigger on_replace_fk_constraint;
>  extern struct trigger on_replace_ck_constraint;
> -extern struct trigger on_stmt_begin_space;
> -extern struct trigger on_stmt_begin_index;
> -extern struct trigger on_stmt_begin_truncate;
>  
>  #endif /* INCLUDES_TARANTOOL_BOX_ALTER_H */
> diff --git a/src/box/schema.cc b/src/box/schema.cc
> index 64412fac..0ab256ed 100644
> --- a/src/box/schema.cc
> +++ b/src/box/schema.cc
> @@ -37,6 +37,7 @@
>  #include "scoped_guard.h"
>  #include "user.h"
>  #include "vclock.h"
> +#include "fiber.h"
>  
>  /**
>   * @module Data Dictionary
> @@ -74,11 +75,6 @@ struct rlist on_alter_space = RLIST_HEAD_INITIALIZER(on_alter_space);
>  struct rlist on_alter_sequence = RLIST_HEAD_INITIALIZER(on_alter_sequence);
>  struct rlist on_alter_func = RLIST_HEAD_INITIALIZER(on_alter_func);
>  
> -/**
> - * Lock of scheme modification
> - */
> -struct latch schema_lock = LATCH_INITIALIZER(schema_lock);
> -
>  struct entity_access entity_access;
>  
>  bool
> @@ -268,8 +264,7 @@ static void
>  sc_space_new(uint32_t id, const char *name,
>  	     struct key_part_def *key_parts,
>  	     uint32_t key_part_count,
> -	     struct trigger *replace_trigger,
> -	     struct trigger *stmt_begin_trigger)
> +	     struct trigger *replace_trigger)
>  {
>  	struct key_def *key_def = key_def_new(key_parts, key_part_count);
>  	if (key_def == NULL)
> @@ -298,8 +293,6 @@ sc_space_new(uint32_t id, const char *name,
>  	space_cache_replace(NULL, space);
>  	if (replace_trigger)
>  		trigger_add(&space->on_replace, replace_trigger);
> -	if (stmt_begin_trigger)
> -		trigger_add(&space->on_stmt_begin, stmt_begin_trigger);
>  	/*
>  	 * Data dictionary spaces are fully built since:
>  	 * - they contain data right from the start
> @@ -394,56 +387,56 @@ schema_init()
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_SCHEMA_ID, "_schema", key_parts, 1,
> -		     &on_replace_schema, NULL);
> +		     &on_replace_schema);
>  
>  	/* _collation - collation description. */
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_COLLATION_ID, "_collation", key_parts, 1,
> -		     &on_replace_collation, NULL);
> +		     &on_replace_collation);
>  
>  	/* _space - home for all spaces. */
>  	sc_space_new(BOX_SPACE_ID, "_space", key_parts, 1,
> -		     &alter_space_on_replace_space, &on_stmt_begin_space);
> +		     &alter_space_on_replace_space);
>  
>  	/* _truncate - auxiliary space for triggering space truncation. */
>  	sc_space_new(BOX_TRUNCATE_ID, "_truncate", key_parts, 1,
> -		     &on_replace_truncate, &on_stmt_begin_truncate);
> +		     &on_replace_truncate);
>  
>  	/* _sequence - definition of all sequence objects. */
>  	sc_space_new(BOX_SEQUENCE_ID, "_sequence", key_parts, 1,
> -		     &on_replace_sequence, NULL);
> +		     &on_replace_sequence);
>  
>  	/* _sequence_data - current sequence value. */
>  	sc_space_new(BOX_SEQUENCE_DATA_ID, "_sequence_data", key_parts, 1,
> -		     &on_replace_sequence_data, NULL);
> +		     &on_replace_sequence_data);
>  
>  	/* _space_seq - association space <-> sequence. */
>  	sc_space_new(BOX_SPACE_SEQUENCE_ID, "_space_sequence", key_parts, 1,
> -		     &on_replace_space_sequence, NULL);
> +		     &on_replace_space_sequence);
>  
>  	/* _user - all existing users */
> -	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user, NULL);
> +	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user);
>  
>  	/* _func - all executable objects on which one can have grants */
> -	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func, NULL);
> +	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func);
>  	/*
>  	 * _priv - association user <-> object
>  	 * The real index is defined in the snapshot.
>  	 */
> -	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv, NULL);
> +	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv);
>  	/*
>  	 * _cluster - association instance uuid <-> instance id
>  	 * The real index is defined in the snapshot.
>  	 */
>  	sc_space_new(BOX_CLUSTER_ID, "_cluster", key_parts, 1,
> -		     &on_replace_cluster, NULL);
> +		     &on_replace_cluster);
>  
>  	/* _trigger - all existing SQL triggers. */
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_TRIGGER_ID, "_trigger", key_parts, 1,
> -		     &on_replace_trigger, NULL);
> +		     &on_replace_trigger);
>  
>  	/* _index - definition of all space indexes. */
>  	key_parts[0].fieldno = 0; /* space id */
> @@ -451,7 +444,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* index id */
>  	key_parts[1].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_INDEX_ID, "_index", key_parts, 2,
> -		     &alter_space_on_replace_index, &on_stmt_begin_index);
> +		     &alter_space_on_replace_index);
>  
>  	/* _fk_сonstraint - foreign keys constraints. */
>  	key_parts[0].fieldno = 0; /* constraint name */
> @@ -459,7 +452,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* child space */
>  	key_parts[1].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_FK_CONSTRAINT_ID, "_fk_constraint", key_parts, 2,
> -		     &on_replace_fk_constraint, NULL);
> +		     &on_replace_fk_constraint);
>  
>  	/* _ck_сonstraint - check constraints. */
>  	key_parts[0].fieldno = 0; /* space id */
> @@ -467,7 +460,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* constraint name */
>  	key_parts[1].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_CK_CONSTRAINT_ID, "_ck_constraint", key_parts, 2,
> -		     &on_replace_ck_constraint, NULL);
> +		     &on_replace_ck_constraint);
>  
>  	/*
>  	 * _vinyl_deferred_delete - blackhole that is needed
> diff --git a/src/box/schema.h b/src/box/schema.h
> index f0039b29..9f8f6345 100644
> --- a/src/box/schema.h
> +++ b/src/box/schema.h
> @@ -35,7 +35,6 @@
>  #include <stdio.h> /* snprintf */
>  #include "error.h"
>  #include "space.h"
> -#include "latch.h"
>  
>  #if defined(__cplusplus)
>  extern "C" {
> @@ -48,11 +47,6 @@ extern uint32_t space_cache_version;
>  extern struct rlist on_schema_init;
>  
>  /**
> - * Lock of schema modification
> - */
> -extern struct latch schema_lock;
> -
> -/**
>   * Try to look up a space by space number in the space cache.
>   * FFI-friendly no-exception-thrown space lookup function.
>   *
> diff --git a/src/box/space.c b/src/box/space.c
> index b6ad87bf..e7babb2a 100644
> --- a/src/box/space.c
> +++ b/src/box/space.c
> @@ -137,7 +137,6 @@ space_create(struct space *space, struct engine *engine,
>  	space->index_id_max = index_id_max;
>  	rlist_create(&space->before_replace);
>  	rlist_create(&space->on_replace);
> -	rlist_create(&space->on_stmt_begin);
>  	space->run_triggers = true;
>  
>  	space->format = format;
> @@ -219,7 +218,6 @@ space_delete(struct space *space)
>  		tuple_format_unref(space->format);
>  	trigger_destroy(&space->before_replace);
>  	trigger_destroy(&space->on_replace);
> -	trigger_destroy(&space->on_stmt_begin);
>  	space_def_delete(space->def);
>  	/*
>  	 * SQL Triggers should be deleted with
> diff --git a/src/box/space.h b/src/box/space.h
> index 949f37d4..2cc0643d 100644
> --- a/src/box/space.h
> +++ b/src/box/space.h
> @@ -154,8 +154,6 @@ struct space {
>  	struct rlist before_replace;
>  	/** Triggers fired after space_replace() -- see txn_commit_stmt(). */
>  	struct rlist on_replace;
> -	/** Triggers fired before space statement */
> -	struct rlist on_stmt_begin;
>  	/** SQL Trigger list. */
>  	struct sql_trigger *sql_triggers;
>  	/**
> @@ -183,6 +181,11 @@ struct space {
>  	/** Enable/disable triggers. */
>  	bool run_triggers;
>  	/**
> +	 * Set if the space is being altered and hence any other
> +	 * DDL request must be rejected.
> +	 */
> +	bool is_in_alter;
> +	/**
>  	 * Space format or NULL if space does not have format
>  	 * (sysview engine, for example).
>  	 */
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 95076773..818f405b 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -242,9 +242,6 @@ txn_begin_stmt(struct txn *txn, struct space *space)
>  	if (space == NULL)
>  		return 0;
>  
> -	if (trigger_run(&space->on_stmt_begin, txn) != 0)
> -		goto fail;
> -
>  	struct engine *engine = space->engine;
>  	if (txn_begin_in_engine(engine, txn) != 0)
>  		goto fail;
> diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
> index 3e134cc1..48590521 100644
> --- a/src/box/vy_lsm.c
> +++ b/src/box/vy_lsm.c
> @@ -38,6 +38,7 @@
>  #include <small/mempool.h>
>  
>  #include "diag.h"
> +#include "fiber.h"
>  #include "errcode.h"
>  #include "histogram.h"
>  #include "index_def.h"
> diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
> index 58094256..49c59cf6 100644
> --- a/src/lib/core/latch.h
> +++ b/src/lib/core/latch.h
> @@ -156,16 +156,6 @@ latch_trylock(struct latch *l)
>  }
>  
>  /**
> - * Take a latch ownership
> - */
> -static inline void
> -latch_steal(struct latch *l)
> -{
> -	assert(l->owner != NULL);
> -	l->owner = fiber();
> -}
> -
> -/**
>   * \copydoc box_latch_unlock
>   */
>  static inline void
> diff --git a/test/box/on_replace.result b/test/box/on_replace.result
> index a2f51409..b71c9878 100644
> --- a/test/box/on_replace.result
> +++ b/test/box/on_replace.result
> @@ -477,7 +477,7 @@ t = s:on_replace(function () s:create_index('sec') end, t)
>  ...
>  s:replace({2, 3})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () box.schema.user.create('newu') end, t)
>  ---
> @@ -512,7 +512,7 @@ t = s:on_replace(function () s:drop() end, t)
>  ...
>  s:replace({5, 6})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () box.schema.func.create('newf') end, t)
>  ---
> @@ -533,14 +533,14 @@ t = s:on_replace(function () s:rename('newname') end, t)
>  ...
>  s:replace({8, 9})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _space does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () s.index.pk:rename('newname') end, t)
>  ---
>  ...
>  s:replace({9, 10})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  s:select()
>  ---
> diff --git a/test/box/transaction.result b/test/box/transaction.result
> index bc50735b..9da53e5b 100644
> --- a/test/box/transaction.result
> +++ b/test/box/transaction.result
> @@ -87,7 +87,7 @@ while f:status() ~= 'dead' do fiber.sleep(0) end;
>  -- some operation involves more than one ddl spaces, so they should fail
>  box.begin() box.schema.space.create('test');
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _space does not support multi-statement transactions
>  ...
>  box.rollback();
>  ---
> diff --git a/test/engine/errinj_ddl.result b/test/engine/errinj_ddl.result
> index a28223a6..60bf34d9 100644
> --- a/test/engine/errinj_ddl.result
> +++ b/test/engine/errinj_ddl.result
> @@ -404,3 +404,101 @@ s:select()
>  s:drop()
>  ---
>  ...
> +--
> +-- Concurrent DDL.
> +--
> +s1 = box.schema.space.create('test1', {engine = engine})
> +---
> +...
> +_ = s1:create_index('pk')
> +---
> +...
> +for i = 1, 5 do s1:insert{i, i} end
> +---
> +...
> +s2 = box.schema.space.create('test2', {engine = engine})
> +---
> +...
> +_ = s2:create_index('pk')
> +---
> +...
> +for i = 1, 5 do s2:insert{i, i} end
> +---
> +...
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
> +---
> +- ok
> +...
> +ch = fiber.channel(1)
> +---
> +...
> +_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
> +---
> +...
> +-- Modification of the same space must fail while an index creation
> +-- is in progress.
> +s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +s1:truncate()
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +s1:drop()
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +-- Modification of another space must work though.
> +s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +---
> +...
> +s2:truncate()
> +---
> +...
> +_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
> +---
> +...
> +s2:drop()
> +---
> +...
> +s2 = box.schema.space.create('test2', {engine = engine})
> +---
> +...
> +_ = s2:create_index('pk')
> +---
> +...
> +s2:drop()
> +---
> +...
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
> +---
> +- ok
> +...
> +ch:get()
> +---
> +- true
> +...
> +s1.index.pk:select()
> +---
> +- - [1, 1]
> +  - [2, 2]
> +  - [3, 3]
> +  - [4, 4]
> +  - [5, 5]
> +...
> +s1.index.sk:select()
> +---
> +- - [1, 1]
> +  - [2, 2]
> +  - [3, 3]
> +  - [4, 4]
> +  - [5, 5]
> +...
> +s1:drop()
> +---
> +...
> diff --git a/test/engine/errinj_ddl.test.lua b/test/engine/errinj_ddl.test.lua
> index c40eae93..207610c4 100644
> --- a/test/engine/errinj_ddl.test.lua
> +++ b/test/engine/errinj_ddl.test.lua
> @@ -188,3 +188,41 @@ ch:get()
>  
>  s:select()
>  s:drop()
> +
> +--
> +-- Concurrent DDL.
> +--
> +s1 = box.schema.space.create('test1', {engine = engine})
> +_ = s1:create_index('pk')
> +for i = 1, 5 do s1:insert{i, i} end
> +
> +s2 = box.schema.space.create('test2', {engine = engine})
> +_ = s2:create_index('pk')
> +for i = 1, 5 do s2:insert{i, i} end
> +
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
> +ch = fiber.channel(1)
> +_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
> +
> +-- Modification of the same space must fail while an index creation
> +-- is in progress.
> +s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +s1:truncate()
> +_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
> +s1:drop()
> +
> +-- Modification of another space must work though.
> +s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +s2:truncate()
> +_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
> +s2:drop()
> +s2 = box.schema.space.create('test2', {engine = engine})
> +_ = s2:create_index('pk')
> +s2:drop()
> +
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
> +ch:get()
> +
> +s1.index.pk:select()
> +s1.index.sk:select()
> +s1:drop()
> -- 
> 2.11.0
> 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-07-03 19:35   ` Konstantin Osipov
@ 2019-07-03 19:56     ` Vladimir Davydov
  2019-07-04  8:09       ` Konstantin Osipov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-03 19:56 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Wed, Jul 03, 2019 at 10:35:41PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> > Now, as we don't need to take the schema lock for checkpointing, it is
> > only used to synchronize concurrent space modifications (drop, truncate,
> > alter). Actually, a global lock is a way too heavy means to achieve this
> > goal, because we only care about forbidding concurrent modifications of
> > the same space while concurrent modifications of different spaces should
> > work just fine. So this patch replaces schema lock with a per-space
> > flag. The flag is called is_in_alter and set by alter_space_new() and
> > cleared by alter_space_delete(). If the flag is already set when
> > alter_space_new() is called, an error is thrown.
> 
> Uh-oh.
> 
> Could you please do a bit more coding?
> 
> There are inherent dangers in using a boolean flag rather than a
> normal lock:
> - lock life time is bound to object life time
> - there is no way to put itself into a wait queue
> - there is an implicit assumption that a fiber only takes one lock
>   and no way to inspect/free all locks of a fiber.
> - deadlock detection is impossible.

TBO I don't think you follow. It isn't a lock, actually. It's just a
flag saying the space is busy building an index. If someone tries to do
something with a space that is busy, they will fail. This is consistent
with vinyl tx manager behavior. No deadlock is possible by design.

I don't see any point implementing some kind of generic locking scheme
at this point, because as I said, there are actually no locks. I think
we should get to this once we start thinking about locking in the
transaction manager, not now.

> 
> Let's add a normal name-based locking for this:
> 
> struct lock {
>     enum object_type object_type;
>     char *object_name;
>     enum { S, X } type;
>     struct lock *pending;
>     struct fiber *owner; // ideally it should be struct txn, or int txn_id, not struct fiber
> };
> 
> hash<lock> metadata_locks;
> 
> This could be a separate module in box.cc. The api should take
> locks by name:
> 
> struct lock *metadata_lock_get(enum object_type type, char
> *object_name, enum lock_type type, int txn_id);
> 
> and unlock by value or txn_id:
> 
> void metadata_lock_free(struct lock *lock);
> 
> void metadata_lock_free_all(int txn_id);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] Don't take schema lock for checkpointing
  2019-07-03 19:21   ` Konstantin Osipov
@ 2019-07-03 20:05     ` Vladimir Davydov
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-03 20:05 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Wed, Jul 03, 2019 at 10:21:53PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> > Memtx checkpointing proceeds as follows: first we open iterators over
> > primary indexes of all spaces and save them to a list, then we start
> > a thread that uses the iterators to dump space contents to a snap file.
> > To avoid accessing a freed tuple, we put the small allocator to the
> > delayed free mode. However, this doesn't prevent an index from being
> > dropped so we also take the schema lock to lock out any DDL operation
> > that can potentially destroy a space or an index. Note, vinyl doesn't
> > need this lock, because it implements index reference counting under
> > the hood.
> > 
> > Actually, we don't really need to take a lock - instead we can simply
> > postpone index destruction until checkpointing is complete, similarly
> > to how we postpone destruction of individual tuples. We even have all
> > the infrastructure for this - it's delayed garbage collection. So this
> > patch tweaks it a bit to delay the actual index destruction to be done
> > after checkpointing is complete.
> > 
> > This is a step forward towards removal of the schema lock, which stands
> > in the way of transactional DDL.
> 
> Looks like you do it because I said once I hate reference
> counting.
> 
> First, I don't mind having reference counting for memtx index objects now
> that we've approached transactional ddl frontier.
> 
> But even reference counting would be a bit cumbersome for this.
> Please take a look at how bps does it - it links all pages into a
> fifo-like list, and a checkpoint simply sets a savepoint on that
> list. Committing a checkpoint releases the savepoint and garbage
> collects all objects that have been freed after the savepoint. 
> 
> SQL will need multiple concurrent snapshots - so it will need
> multiple versions. So please consider turning the algorithm you've
> just used a general-purpose one - so that any database object
> could add itself for delayed destruction, the delayed destruction
> would take place immediately if there are no savepoints, or
> immediately after the savepoint up until the next savepoint.
> 
> Then we can move all subsystems to this.
> 
> If you have a better general-purpose object garbage collection
> idea, please share/implement it too.

I thought we'd agreed that we aren't going to implement schema multi
versioning in foreseeable future so this patch does a quick fix for
memtx snapshotting to get rid of the schema lock which conflicts with
the notion of transactional DDL.

If you think that reference counting is a better design - fine, I'm okay
with it - we can pull the patch by Georgy that did exactly that. Just
that I think that introducing base index reference counting solely to
fix a problem in memtx seems to be a bit of an overkill for me.

Regarding SQL. I don't think I follow what you're talking about.
Let's discuss it f2f.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-07-03 19:56     ` Vladimir Davydov
@ 2019-07-04  8:09       ` Konstantin Osipov
  2019-07-04 17:06         ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-04  8:09 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/03 23:00]:
> TBO I don't think you follow. It isn't a lock, actually. It's just a
> flag saying the space is busy building an index. If someone tries to do
> something with a space that is busy, they will fail. This is consistent
> with vinyl tx manager behavior. No deadlock is possible by design.

It's a kind of locking called optimistic locking. Any lock can be acquired in
optimistic mode (_trylock()) and in pessimistic mode. Once you
have normal locks you can trylock them ad nausea, the only
question is whether users will like it. Imagine I have a DDL
script which I use to update/upgrade my data on production. Do I really
want to add re-tries to this DDL script now that I know it worked
in my test environment? Most users don't. 

With transactional data dictionary multiple metadata locks are
around the corner. 

Imagine this:

box.begin()
box.space.foo:rename('tmp')
box.space.bar:rename('foo')
box.space.foo:rename('bar')
box.commit()

the same can will soon be possible to do in SQL as well. 

all these locks will have to stay around till commit, so leave
through WAL yield.

In other words, we either need to switch the data dictionary to
MVCC or to use locks, or better yet, have both mechanisms
available so that we can use whichever we need when we need it.


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers
  2019-07-03 19:12   ` Konstantin Osipov
@ 2019-07-04 15:50     ` Vladimir Davydov
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-04 15:50 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Wed, Jul 03, 2019 at 10:12:57PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> > ERROR_INJECT_YIELD yields the current fiber execution by calling
> > fiber_sleep(0) while the given error injection is set.
> 
> I would sleep for some minimal non-zero time, e.g. a millisecond,
> to avoid consuming CPU in parallel tests.

Done and pushed to master.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY
  2019-07-03 19:13   ` Konstantin Osipov
@ 2019-07-04 15:51     ` Vladimir Davydov
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-04 15:51 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Wed, Jul 03, 2019 at 10:13:57PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> > Timeout injections are unstable and difficult to use. Injecting a delay
> > is much more convenient.
> 
> lgtm

Pushed to master.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] test: make vinyl/replica_rejoin more stable
  2019-07-03 19:23   ` Konstantin Osipov
@ 2019-07-04 15:51     ` Vladimir Davydov
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-04 15:51 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Wed, Jul 03, 2019 at 10:23:03PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/01 10:04]:
> > The test checks that files left after rebootstrap are removed by the
> > garbage collector. It does that by printing file names to the result
> > file. This is inherently unstable, because should timing change, and
> > we can easily get an extra dump or compaction resulting in a different
> > set of files and hence test failure. Let's rewrite the test so that
> > it checks that files are actually removed using fio.path.exists().
> 
> lgtm

Pushed to master.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-07-04  8:09       ` Konstantin Osipov
@ 2019-07-04 17:06         ` Vladimir Davydov
  2019-07-08  7:40           ` Konstantin Osipov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-04 17:06 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Thu, Jul 04, 2019 at 11:09:09AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/03 23:00]:
> > TBO I don't think you follow. It isn't a lock, actually. It's just a
> > flag saying the space is busy building an index. If someone tries to do
> > something with a space that is busy, they will fail. This is consistent
> > with vinyl tx manager behavior. No deadlock is possible by design.
> 
> It's a kind of locking called optimistic locking. Any lock can be acquired in
> optimistic mode (_trylock()) and in pessimistic mode. Once you
> have normal locks you can trylock them ad nausea, the only
> question is whether users will like it. Imagine I have a DDL
> script which I use to update/upgrade my data on production. Do I really
> want to add re-tries to this DDL script now that I know it worked
> in my test environment? Most users don't. 
> 
> With transactional data dictionary multiple metadata locks are
> around the corner. 
> 
> Imagine this:
> 
> box.begin()
> box.space.foo:rename('tmp')
> box.space.bar:rename('foo')
> box.space.foo:rename('bar')
> box.commit()
> 
> the same can will soon be possible to do in SQL as well. 
> 
> all these locks will have to stay around till commit, so leave
> through WAL yield.
> 
> In other words, we either need to switch the data dictionary to
> MVCC or to use locks, or better yet, have both mechanisms
> available so that we can use whichever we need when we need it.

As discussed verbally, a lock is held only during a yielding operation,
such as building of a new index or checking space format. A lock is
released before the statement is submitted to WAL.

As we agreed, I reworked the locking implementation to make the lock
lifetime clear: now I keep a set of all space locks rather than using
a per-space flag.

Please find the updated patch below. I also updated the branch.
---

From 5725bb952acb2aa1c7bb96331a0db271ff8b4c30 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Date: Thu, 4 Jul 2019 19:49:00 +0300
Subject: [PATCH] Replace schema lock with fine-grained locking

Now, as we don't need to take the schema lock for checkpointing, it is
only used to synchronize concurrent space modifications (drop, truncate,
alter). Actually, a global lock is a way too heavy means to achieve this
goal, because we only care about forbidding concurrent modifications of
the same space while concurrent modifications of different spaces should
work just fine. So this patch replaces the global schema lock with per
space locking.

A space lock is held while alter_space_do() is in progress so as to make
sure that while AlterSpaceOp::prepare() is performing a potentially
yielding operation, such as building a new index, the space struct
doesn't get freed from under our feet. Note, the lock is released right
after index build is complete, before the transaction is committed to
WAL, so if the transaction is non-yielding it can modify the space again
in the next statement (this is impossible now, but will be done in the
scope of the transactional DDL feature).

If alter_space_do() sees that the space is already locked it bails out
and throws an error. This should be fine, because long-lasting operation
involving schema change, such as building an index, are rare and only
performed under the supervision of the user so throwing an error rather
than waiting seems to be adequate.

Removal of the schema lock allows us to remove latch_steal() helper and
on_begin_stmt txn trigger altogether, as they were introduced solely to
support locking.

This is a prerequisite for transactional DDL, because it's unclear how
to preserve the global schema lock while allowing to combine several DDL
statements in the same transaction.

diff --git a/src/box/alter.cc b/src/box/alter.cc
index b470b763..80fd50d1 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -29,6 +29,7 @@
  * SUCH DAMAGE.
  */
 #include "alter.h"
+#include "assoc.h"
 #include "ck_constraint.h"
 #include "column_mask.h"
 #include "schema.h"
@@ -572,7 +573,6 @@ space_swap_triggers(struct space *new_space, struct space *old_space)
 {
 	rlist_swap(&new_space->before_replace, &old_space->before_replace);
 	rlist_swap(&new_space->on_replace, &old_space->on_replace);
-	rlist_swap(&new_space->on_stmt_begin, &old_space->on_stmt_begin);
 	/** Swap SQL Triggers pointer. */
 	struct sql_trigger *new_value = new_space->sql_triggers;
 	new_space->sql_triggers = old_space->sql_triggers;
@@ -745,6 +745,40 @@ AlterSpaceOp::AlterSpaceOp(struct alter_space *alter)
 	rlist_add_tail_entry(&alter->ops, this, link);
 }
 
+class AlterSpaceLock {
+	/** Set of all taken locks. */
+	static struct mh_i32_t *registry;
+	/** Identifier of the space this lock is for. */
+	uint32_t space_id;
+public:
+	/** Take a lock for the altered space. */
+	AlterSpaceLock(struct alter_space *alter) {
+		if (registry == NULL) {
+			registry = mh_i32_new();
+			if (registry == NULL) {
+				tnt_raise(OutOfMemory, 0, "mh_i32_new",
+					  "alter lock registry");
+			}
+		}
+		space_id = alter->old_space->def->id;
+		if (mh_i32_find(registry, space_id, NULL) != mh_end(registry)) {
+			tnt_raise(ClientError, ER_ALTER_SPACE,
+				  space_name(alter->old_space),
+				  "the space is already being modified");
+		}
+		mh_int_t k = mh_i32_put(registry, &space_id, NULL, NULL);
+		if (k == mh_end(registry))
+			tnt_raise(OutOfMemory, 0, "mh_i32_put", "alter lock");
+	}
+	~AlterSpaceLock() {
+		mh_int_t k = mh_i32_find(registry, space_id, NULL);
+		assert(k != mh_end(registry));
+		mh_i32_del(registry, k, NULL);
+	}
+};
+
+struct mh_i32_t *AlterSpaceLock::registry;
+
 /**
  * Commit the alter.
  *
@@ -773,6 +807,7 @@ alter_space_commit(struct trigger *trigger, void *event)
 	 * going to use it.
 	 */
 	space_delete(alter->old_space);
+	alter->old_space = NULL;
 	alter_space_delete(alter);
 }
 
@@ -838,6 +873,14 @@ alter_space_rollback(struct trigger *trigger, void * /* event */)
 static void
 alter_space_do(struct txn *txn, struct alter_space *alter)
 {
+	/**
+	 * AlterSpaceOp::prepare() may perform a potentially long
+	 * lasting operation that may yield, e.g. building of a new
+	 * index. We really don't want the space to be replaced by
+	 * another DDL operation while this one is in progress so
+	 * we lock out all concurrent DDL for this space.
+	 */
+	AlterSpaceLock lock(alter);
 	/*
 	 * Prepare triggers while we may fail. Note, we don't have to
 	 * free them in case of failure, because they are allocated on
@@ -3681,49 +3724,6 @@ on_replace_dd_space_sequence(struct trigger * /* trigger */, void *event)
 
 /* }}} sequence */
 
-static void
-unlock_after_dd(struct trigger *trigger, void *event)
-{
-	(void) trigger;
-	(void) event;
-	/*
-	 * In case of yielding journal this trigger will be processed
-	 * in a context of tx_prio endpoint instead of a context of
-	 * a fiber which has this latch locked. So steal the latch first.
-	 */
-	latch_steal(&schema_lock);
-	latch_unlock(&schema_lock);
-}
-
-static void
-lock_before_dd(struct trigger *trigger, void *event)
-{
-	(void) trigger;
-	if (fiber() == latch_owner(&schema_lock))
-		return;
-	struct txn *txn = (struct txn *)event;
-	/*
-	 * This trigger is executed before any check and may yield
-	 * on the latch lock. But a yield in a non-autocommit
-	 * memtx transaction will roll it back silently, rather
-	 * than produce an error, which is very confusing.
-	 * So don't try to lock a latch if there is
-	 * a multi-statement transaction.
-	 */
-	txn_check_singlestatement_xc(txn, "DDL");
-	struct trigger *on_commit =
-		txn_alter_trigger_new(unlock_after_dd, NULL);
-	struct trigger *on_rollback =
-		txn_alter_trigger_new(unlock_after_dd, NULL);
-	/*
-	 * Setting triggers doesn't fail. Lock the latch last
-	 * to avoid leaking the latch in case of exception.
-	 */
-	txn_on_commit(txn, on_commit);
-	txn_on_rollback(txn, on_rollback);
-	latch_lock(&schema_lock);
-}
-
 /** Delete the new trigger on rollback of an INSERT statement. */
 static void
 on_create_trigger_rollback(struct trigger *trigger, void * /* event */)
@@ -4612,18 +4612,6 @@ struct trigger on_replace_space_sequence = {
 	RLIST_LINK_INITIALIZER, on_replace_dd_space_sequence, NULL, NULL
 };
 
-struct trigger on_stmt_begin_space = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
-struct trigger on_stmt_begin_index = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
-struct trigger on_stmt_begin_truncate = {
-	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
-};
-
 struct trigger on_replace_trigger = {
 	RLIST_LINK_INITIALIZER, on_replace_dd_trigger, NULL, NULL
 };
diff --git a/src/box/alter.h b/src/box/alter.h
index b9ba7b84..c339ccea 100644
--- a/src/box/alter.h
+++ b/src/box/alter.h
@@ -47,8 +47,5 @@ extern struct trigger on_replace_space_sequence;
 extern struct trigger on_replace_trigger;
 extern struct trigger on_replace_fk_constraint;
 extern struct trigger on_replace_ck_constraint;
-extern struct trigger on_stmt_begin_space;
-extern struct trigger on_stmt_begin_index;
-extern struct trigger on_stmt_begin_truncate;
 
 #endif /* INCLUDES_TARANTOOL_BOX_ALTER_H */
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 3f90b8d4..20666386 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -37,6 +37,7 @@
 #include "scoped_guard.h"
 #include "user.h"
 #include "vclock.h"
+#include "fiber.h"
 
 /**
  * @module Data Dictionary
@@ -74,11 +75,6 @@ struct rlist on_alter_space = RLIST_HEAD_INITIALIZER(on_alter_space);
 struct rlist on_alter_sequence = RLIST_HEAD_INITIALIZER(on_alter_sequence);
 struct rlist on_alter_func = RLIST_HEAD_INITIALIZER(on_alter_func);
 
-/**
- * Lock of scheme modification
- */
-struct latch schema_lock = LATCH_INITIALIZER(schema_lock);
-
 struct entity_access entity_access;
 
 bool
@@ -268,8 +264,7 @@ static void
 sc_space_new(uint32_t id, const char *name,
 	     struct key_part_def *key_parts,
 	     uint32_t key_part_count,
-	     struct trigger *replace_trigger,
-	     struct trigger *stmt_begin_trigger)
+	     struct trigger *replace_trigger)
 {
 	struct key_def *key_def = key_def_new(key_parts, key_part_count);
 	if (key_def == NULL)
@@ -298,8 +293,6 @@ sc_space_new(uint32_t id, const char *name,
 	space_cache_replace(NULL, space);
 	if (replace_trigger)
 		trigger_add(&space->on_replace, replace_trigger);
-	if (stmt_begin_trigger)
-		trigger_add(&space->on_stmt_begin, stmt_begin_trigger);
 	/*
 	 * Data dictionary spaces are fully built since:
 	 * - they contain data right from the start
@@ -394,56 +387,56 @@ schema_init()
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_SCHEMA_ID, "_schema", key_parts, 1,
-		     &on_replace_schema, NULL);
+		     &on_replace_schema);
 
 	/* _collation - collation description. */
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_COLLATION_ID, "_collation", key_parts, 1,
-		     &on_replace_collation, NULL);
+		     &on_replace_collation);
 
 	/* _space - home for all spaces. */
 	sc_space_new(BOX_SPACE_ID, "_space", key_parts, 1,
-		     &alter_space_on_replace_space, &on_stmt_begin_space);
+		     &alter_space_on_replace_space);
 
 	/* _truncate - auxiliary space for triggering space truncation. */
 	sc_space_new(BOX_TRUNCATE_ID, "_truncate", key_parts, 1,
-		     &on_replace_truncate, &on_stmt_begin_truncate);
+		     &on_replace_truncate);
 
 	/* _sequence - definition of all sequence objects. */
 	sc_space_new(BOX_SEQUENCE_ID, "_sequence", key_parts, 1,
-		     &on_replace_sequence, NULL);
+		     &on_replace_sequence);
 
 	/* _sequence_data - current sequence value. */
 	sc_space_new(BOX_SEQUENCE_DATA_ID, "_sequence_data", key_parts, 1,
-		     &on_replace_sequence_data, NULL);
+		     &on_replace_sequence_data);
 
 	/* _space_seq - association space <-> sequence. */
 	sc_space_new(BOX_SPACE_SEQUENCE_ID, "_space_sequence", key_parts, 1,
-		     &on_replace_space_sequence, NULL);
+		     &on_replace_space_sequence);
 
 	/* _user - all existing users */
-	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user, NULL);
+	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user);
 
 	/* _func - all executable objects on which one can have grants */
-	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func, NULL);
+	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func);
 	/*
 	 * _priv - association user <-> object
 	 * The real index is defined in the snapshot.
 	 */
-	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv, NULL);
+	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv);
 	/*
 	 * _cluster - association instance uuid <-> instance id
 	 * The real index is defined in the snapshot.
 	 */
 	sc_space_new(BOX_CLUSTER_ID, "_cluster", key_parts, 1,
-		     &on_replace_cluster, NULL);
+		     &on_replace_cluster);
 
 	/* _trigger - all existing SQL triggers. */
 	key_parts[0].fieldno = 0;
 	key_parts[0].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_TRIGGER_ID, "_trigger", key_parts, 1,
-		     &on_replace_trigger, NULL);
+		     &on_replace_trigger);
 
 	/* _index - definition of all space indexes. */
 	key_parts[0].fieldno = 0; /* space id */
@@ -451,7 +444,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* index id */
 	key_parts[1].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_INDEX_ID, "_index", key_parts, 2,
-		     &alter_space_on_replace_index, &on_stmt_begin_index);
+		     &alter_space_on_replace_index);
 
 	/* _fk_сonstraint - foreign keys constraints. */
 	key_parts[0].fieldno = 0; /* constraint name */
@@ -459,7 +452,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* child space */
 	key_parts[1].type = FIELD_TYPE_UNSIGNED;
 	sc_space_new(BOX_FK_CONSTRAINT_ID, "_fk_constraint", key_parts, 2,
-		     &on_replace_fk_constraint, NULL);
+		     &on_replace_fk_constraint);
 
 	/* _ck_сonstraint - check constraints. */
 	key_parts[0].fieldno = 0; /* space id */
@@ -467,7 +460,7 @@ schema_init()
 	key_parts[1].fieldno = 1; /* constraint name */
 	key_parts[1].type = FIELD_TYPE_STRING;
 	sc_space_new(BOX_CK_CONSTRAINT_ID, "_ck_constraint", key_parts, 2,
-		     &on_replace_ck_constraint, NULL);
+		     &on_replace_ck_constraint);
 
 	/*
 	 * _vinyl_deferred_delete - blackhole that is needed
diff --git a/src/box/schema.h b/src/box/schema.h
index 84f0d33f..f9d15b38 100644
--- a/src/box/schema.h
+++ b/src/box/schema.h
@@ -35,7 +35,6 @@
 #include <stdio.h> /* snprintf */
 #include "error.h"
 #include "space.h"
-#include "latch.h"
 
 #if defined(__cplusplus)
 extern "C" {
@@ -50,11 +49,6 @@ extern uint32_t space_cache_version;
 extern struct rlist on_schema_init;
 
 /**
- * Lock of schema modification
- */
-extern struct latch schema_lock;
-
-/**
  * Try to look up a space by space number in the space cache.
  * FFI-friendly no-exception-thrown space lookup function.
  *
diff --git a/src/box/space.c b/src/box/space.c
index b6ad87bf..e7babb2a 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -137,7 +137,6 @@ space_create(struct space *space, struct engine *engine,
 	space->index_id_max = index_id_max;
 	rlist_create(&space->before_replace);
 	rlist_create(&space->on_replace);
-	rlist_create(&space->on_stmt_begin);
 	space->run_triggers = true;
 
 	space->format = format;
@@ -219,7 +218,6 @@ space_delete(struct space *space)
 		tuple_format_unref(space->format);
 	trigger_destroy(&space->before_replace);
 	trigger_destroy(&space->on_replace);
-	trigger_destroy(&space->on_stmt_begin);
 	space_def_delete(space->def);
 	/*
 	 * SQL Triggers should be deleted with
diff --git a/src/box/space.h b/src/box/space.h
index 949f37d4..88db4ec5 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -154,8 +154,6 @@ struct space {
 	struct rlist before_replace;
 	/** Triggers fired after space_replace() -- see txn_commit_stmt(). */
 	struct rlist on_replace;
-	/** Triggers fired before space statement */
-	struct rlist on_stmt_begin;
 	/** SQL Trigger list. */
 	struct sql_trigger *sql_triggers;
 	/**
diff --git a/src/box/txn.c b/src/box/txn.c
index 95076773..818f405b 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -242,9 +242,6 @@ txn_begin_stmt(struct txn *txn, struct space *space)
 	if (space == NULL)
 		return 0;
 
-	if (trigger_run(&space->on_stmt_begin, txn) != 0)
-		goto fail;
-
 	struct engine *engine = space->engine;
 	if (txn_begin_in_engine(engine, txn) != 0)
 		goto fail;
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 3e134cc1..48590521 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -38,6 +38,7 @@
 #include <small/mempool.h>
 
 #include "diag.h"
+#include "fiber.h"
 #include "errcode.h"
 #include "histogram.h"
 #include "index_def.h"
diff --git a/src/lib/core/assoc.h b/src/lib/core/assoc.h
index 74f87eee..fca02e8d 100644
--- a/src/lib/core/assoc.h
+++ b/src/lib/core/assoc.h
@@ -40,6 +40,19 @@ extern "C" {
 #include "third_party/PMurHash.h"
 
 /*
+ * Set: (i32)
+ */
+#define mh_name _i32
+#define mh_key_t uint32_t
+#define mh_node_t uint32_t
+#define mh_arg_t void *
+#define mh_hash(a, arg) (*(a))
+#define mh_hash_key(a, arg) (a)
+#define mh_cmp(a, b, arg) (*(a) != *(b))
+#define mh_cmp_key(a, b, arg) ((a) != *(b))
+#include "salad/mhash.h"
+
+/*
  * Map: (i32) => (void *)
  */
 #define mh_name _i32ptr
diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 58094256..49c59cf6 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -156,16 +156,6 @@ latch_trylock(struct latch *l)
 }
 
 /**
- * Take a latch ownership
- */
-static inline void
-latch_steal(struct latch *l)
-{
-	assert(l->owner != NULL);
-	l->owner = fiber();
-}
-
-/**
  * \copydoc box_latch_unlock
  */
 static inline void
diff --git a/test/box/on_replace.result b/test/box/on_replace.result
index a2f51409..b71c9878 100644
--- a/test/box/on_replace.result
+++ b/test/box/on_replace.result
@@ -477,7 +477,7 @@ t = s:on_replace(function () s:create_index('sec') end, t)
 ...
 s:replace({2, 3})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 t = s:on_replace(function () box.schema.user.create('newu') end, t)
 ---
@@ -512,7 +512,7 @@ t = s:on_replace(function () s:drop() end, t)
 ...
 s:replace({5, 6})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 t = s:on_replace(function () box.schema.func.create('newf') end, t)
 ---
@@ -533,14 +533,14 @@ t = s:on_replace(function () s:rename('newname') end, t)
 ...
 s:replace({8, 9})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _space does not support multi-statement transactions
 ...
 t = s:on_replace(function () s.index.pk:rename('newname') end, t)
 ---
 ...
 s:replace({9, 10})
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _index does not support multi-statement transactions
 ...
 s:select()
 ---
diff --git a/test/box/transaction.result b/test/box/transaction.result
index bc50735b..9da53e5b 100644
--- a/test/box/transaction.result
+++ b/test/box/transaction.result
@@ -87,7 +87,7 @@ while f:status() ~= 'dead' do fiber.sleep(0) end;
 -- some operation involves more than one ddl spaces, so they should fail
 box.begin() box.schema.space.create('test');
 ---
-- error: DDL does not support multi-statement transactions
+- error: Space _space does not support multi-statement transactions
 ...
 box.rollback();
 ---
diff --git a/test/engine/errinj_ddl.result b/test/engine/errinj_ddl.result
index a28223a6..60bf34d9 100644
--- a/test/engine/errinj_ddl.result
+++ b/test/engine/errinj_ddl.result
@@ -404,3 +404,101 @@ s:select()
 s:drop()
 ---
 ...
+--
+-- Concurrent DDL.
+--
+s1 = box.schema.space.create('test1', {engine = engine})
+---
+...
+_ = s1:create_index('pk')
+---
+...
+for i = 1, 5 do s1:insert{i, i} end
+---
+...
+s2 = box.schema.space.create('test2', {engine = engine})
+---
+...
+_ = s2:create_index('pk')
+---
+...
+for i = 1, 5 do s2:insert{i, i} end
+---
+...
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
+---
+- ok
+...
+ch = fiber.channel(1)
+---
+...
+_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
+---
+...
+-- Modification of the same space must fail while an index creation
+-- is in progress.
+s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+s1:truncate()
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+s1:drop()
+---
+- error: 'Can''t modify space ''test1'': the space is already being modified'
+...
+-- Modification of another space must work though.
+s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+---
+...
+s2:truncate()
+---
+...
+_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
+---
+...
+s2:drop()
+---
+...
+s2 = box.schema.space.create('test2', {engine = engine})
+---
+...
+_ = s2:create_index('pk')
+---
+...
+s2:drop()
+---
+...
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
+---
+- ok
+...
+ch:get()
+---
+- true
+...
+s1.index.pk:select()
+---
+- - [1, 1]
+  - [2, 2]
+  - [3, 3]
+  - [4, 4]
+  - [5, 5]
+...
+s1.index.sk:select()
+---
+- - [1, 1]
+  - [2, 2]
+  - [3, 3]
+  - [4, 4]
+  - [5, 5]
+...
+s1:drop()
+---
+...
diff --git a/test/engine/errinj_ddl.test.lua b/test/engine/errinj_ddl.test.lua
index c40eae93..207610c4 100644
--- a/test/engine/errinj_ddl.test.lua
+++ b/test/engine/errinj_ddl.test.lua
@@ -188,3 +188,41 @@ ch:get()
 
 s:select()
 s:drop()
+
+--
+-- Concurrent DDL.
+--
+s1 = box.schema.space.create('test1', {engine = engine})
+_ = s1:create_index('pk')
+for i = 1, 5 do s1:insert{i, i} end
+
+s2 = box.schema.space.create('test2', {engine = engine})
+_ = s2:create_index('pk')
+for i = 1, 5 do s2:insert{i, i} end
+
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
+ch = fiber.channel(1)
+_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
+
+-- Modification of the same space must fail while an index creation
+-- is in progress.
+s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+s1:truncate()
+_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
+s1:drop()
+
+-- Modification of another space must work though.
+s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
+s2:truncate()
+_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
+s2:drop()
+s2 = box.schema.space.create('test2', {engine = engine})
+_ = s2:create_index('pk')
+s2:drop()
+
+errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
+ch:get()
+
+s1.index.pk:select()
+s1.index.sk:select()
+s1:drop()

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] Get rid of the schema lock
  2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
                   ` (5 preceding siblings ...)
  2019-06-30 19:40 ` [PATCH 6/6] Replace schema lock with fine-grained locking Vladimir Davydov
@ 2019-07-05  8:53 ` Vladimir Davydov
  6 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-05  8:53 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

On Sun, Jun 30, 2019 at 10:40:13PM +0300, Vladimir Davydov wrote:
>   Don't take schema lock for checkpointing
>   vinyl: don't yield while logging index creation
>   Replace schema lock with fine-grained locking

Pushed the rest of the series to master, after soliciting verbal
approval from Kostja.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-07-04 17:06         ` Vladimir Davydov
@ 2019-07-08  7:40           ` Konstantin Osipov
  2019-07-08  8:41             ` Vladimir Davydov
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Osipov @ 2019-07-08  7:40 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/04 20:10]:

Come to think about this commit, it still raises a lot of
questions.

What is it taken for entire alter_space_do while it protects alter_space_prepare
only?

Why is it necessary to protect the yields from prepare, but not
from WAL?

Why is not not taken when a new space is created? What happens
when box.schema.space:create() is run concurrently from two
fibers, would that work without a lock? Why?

These questions should at least be addressed in the comments to
the lock class. 

And since the lock is wrapped around long yielding operations,
perhaps it should be taken by objects which do these operations,
like CheckSpaceFormat and CreateIndex (it is easy to grant the
lock to a fiber which already owns it, so each object could be
taken its own lock)?

> On Thu, Jul 04, 2019 at 11:09:09AM +0300, Konstantin Osipov wrote:
> > * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/03 23:00]:
> > > TBO I don't think you follow. It isn't a lock, actually. It's just a
> > > flag saying the space is busy building an index. If someone tries to do
> > > something with a space that is busy, they will fail. This is consistent
> > > with vinyl tx manager behavior. No deadlock is possible by design.
> > 
> > It's a kind of locking called optimistic locking. Any lock can be acquired in
> > optimistic mode (_trylock()) and in pessimistic mode. Once you
> > have normal locks you can trylock them ad nausea, the only
> > question is whether users will like it. Imagine I have a DDL
> > script which I use to update/upgrade my data on production. Do I really
> > want to add re-tries to this DDL script now that I know it worked
> > in my test environment? Most users don't. 
> > 
> > With transactional data dictionary multiple metadata locks are
> > around the corner. 
> > 
> > Imagine this:
> > 
> > box.begin()
> > box.space.foo:rename('tmp')
> > box.space.bar:rename('foo')
> > box.space.foo:rename('bar')
> > box.commit()
> > 
> > the same can will soon be possible to do in SQL as well. 
> > 
> > all these locks will have to stay around till commit, so leave
> > through WAL yield.
> > 
> > In other words, we either need to switch the data dictionary to
> > MVCC or to use locks, or better yet, have both mechanisms
> > available so that we can use whichever we need when we need it.
> 
> As discussed verbally, a lock is held only during a yielding operation,
> such as building of a new index or checking space format. A lock is
> released before the statement is submitted to WAL.
> 
> As we agreed, I reworked the locking implementation to make the lock
> lifetime clear: now I keep a set of all space locks rather than using
> a per-space flag.
> 
> Please find the updated patch below. I also updated the branch.
> ---
> 
> From 5725bb952acb2aa1c7bb96331a0db271ff8b4c30 Mon Sep 17 00:00:00 2001
> From: Vladimir Davydov <vdavydov.dev@gmail.com>
> Date: Thu, 4 Jul 2019 19:49:00 +0300
> Subject: [PATCH] Replace schema lock with fine-grained locking
> 
> Now, as we don't need to take the schema lock for checkpointing, it is
> only used to synchronize concurrent space modifications (drop, truncate,
> alter). Actually, a global lock is a way too heavy means to achieve this
> goal, because we only care about forbidding concurrent modifications of
> the same space while concurrent modifications of different spaces should
> work just fine. So this patch replaces the global schema lock with per
> space locking.
> 
> A space lock is held while alter_space_do() is in progress so as to make
> sure that while AlterSpaceOp::prepare() is performing a potentially
> yielding operation, such as building a new index, the space struct
> doesn't get freed from under our feet. Note, the lock is released right
> after index build is complete, before the transaction is committed to
> WAL, so if the transaction is non-yielding it can modify the space again
> in the next statement (this is impossible now, but will be done in the
> scope of the transactional DDL feature).
> 
> If alter_space_do() sees that the space is already locked it bails out
> and throws an error. This should be fine, because long-lasting operation
> involving schema change, such as building an index, are rare and only
> performed under the supervision of the user so throwing an error rather
> than waiting seems to be adequate.
> 
> Removal of the schema lock allows us to remove latch_steal() helper and
> on_begin_stmt txn trigger altogether, as they were introduced solely to
> support locking.
> 
> This is a prerequisite for transactional DDL, because it's unclear how
> to preserve the global schema lock while allowing to combine several DDL
> statements in the same transaction.
> 
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index b470b763..80fd50d1 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -29,6 +29,7 @@
>   * SUCH DAMAGE.
>   */
>  #include "alter.h"
> +#include "assoc.h"
>  #include "ck_constraint.h"
>  #include "column_mask.h"
>  #include "schema.h"
> @@ -572,7 +573,6 @@ space_swap_triggers(struct space *new_space, struct space *old_space)
>  {
>  	rlist_swap(&new_space->before_replace, &old_space->before_replace);
>  	rlist_swap(&new_space->on_replace, &old_space->on_replace);
> -	rlist_swap(&new_space->on_stmt_begin, &old_space->on_stmt_begin);
>  	/** Swap SQL Triggers pointer. */
>  	struct sql_trigger *new_value = new_space->sql_triggers;
>  	new_space->sql_triggers = old_space->sql_triggers;
> @@ -745,6 +745,40 @@ AlterSpaceOp::AlterSpaceOp(struct alter_space *alter)
>  	rlist_add_tail_entry(&alter->ops, this, link);
>  }
>  
> +class AlterSpaceLock {
> +	/** Set of all taken locks. */
> +	static struct mh_i32_t *registry;
> +	/** Identifier of the space this lock is for. */
> +	uint32_t space_id;
> +public:
> +	/** Take a lock for the altered space. */
> +	AlterSpaceLock(struct alter_space *alter) {
> +		if (registry == NULL) {
> +			registry = mh_i32_new();
> +			if (registry == NULL) {
> +				tnt_raise(OutOfMemory, 0, "mh_i32_new",
> +					  "alter lock registry");
> +			}
> +		}
> +		space_id = alter->old_space->def->id;
> +		if (mh_i32_find(registry, space_id, NULL) != mh_end(registry)) {
> +			tnt_raise(ClientError, ER_ALTER_SPACE,
> +				  space_name(alter->old_space),
> +				  "the space is already being modified");
> +		}
> +		mh_int_t k = mh_i32_put(registry, &space_id, NULL, NULL);
> +		if (k == mh_end(registry))
> +			tnt_raise(OutOfMemory, 0, "mh_i32_put", "alter lock");
> +	}
> +	~AlterSpaceLock() {
> +		mh_int_t k = mh_i32_find(registry, space_id, NULL);
> +		assert(k != mh_end(registry));
> +		mh_i32_del(registry, k, NULL);
> +	}
> +};
> +
> +struct mh_i32_t *AlterSpaceLock::registry;
> +
>  /**
>   * Commit the alter.
>   *
> @@ -773,6 +807,7 @@ alter_space_commit(struct trigger *trigger, void *event)
>  	 * going to use it.
>  	 */
>  	space_delete(alter->old_space);
> +	alter->old_space = NULL;
>  	alter_space_delete(alter);
>  }
>  
> @@ -838,6 +873,14 @@ alter_space_rollback(struct trigger *trigger, void * /* event */)
>  static void
>  alter_space_do(struct txn *txn, struct alter_space *alter)
>  {
> +	/**
> +	 * AlterSpaceOp::prepare() may perform a potentially long
> +	 * lasting operation that may yield, e.g. building of a new
> +	 * index. We really don't want the space to be replaced by
> +	 * another DDL operation while this one is in progress so
> +	 * we lock out all concurrent DDL for this space.
> +	 */
> +	AlterSpaceLock lock(alter);
>  	/*
>  	 * Prepare triggers while we may fail. Note, we don't have to
>  	 * free them in case of failure, because they are allocated on
> @@ -3681,49 +3724,6 @@ on_replace_dd_space_sequence(struct trigger * /* trigger */, void *event)
>  
>  /* }}} sequence */
>  
> -static void
> -unlock_after_dd(struct trigger *trigger, void *event)
> -{
> -	(void) trigger;
> -	(void) event;
> -	/*
> -	 * In case of yielding journal this trigger will be processed
> -	 * in a context of tx_prio endpoint instead of a context of
> -	 * a fiber which has this latch locked. So steal the latch first.
> -	 */
> -	latch_steal(&schema_lock);
> -	latch_unlock(&schema_lock);
> -}
> -
> -static void
> -lock_before_dd(struct trigger *trigger, void *event)
> -{
> -	(void) trigger;
> -	if (fiber() == latch_owner(&schema_lock))
> -		return;
> -	struct txn *txn = (struct txn *)event;
> -	/*
> -	 * This trigger is executed before any check and may yield
> -	 * on the latch lock. But a yield in a non-autocommit
> -	 * memtx transaction will roll it back silently, rather
> -	 * than produce an error, which is very confusing.
> -	 * So don't try to lock a latch if there is
> -	 * a multi-statement transaction.
> -	 */
> -	txn_check_singlestatement_xc(txn, "DDL");
> -	struct trigger *on_commit =
> -		txn_alter_trigger_new(unlock_after_dd, NULL);
> -	struct trigger *on_rollback =
> -		txn_alter_trigger_new(unlock_after_dd, NULL);
> -	/*
> -	 * Setting triggers doesn't fail. Lock the latch last
> -	 * to avoid leaking the latch in case of exception.
> -	 */
> -	txn_on_commit(txn, on_commit);
> -	txn_on_rollback(txn, on_rollback);
> -	latch_lock(&schema_lock);
> -}
> -
>  /** Delete the new trigger on rollback of an INSERT statement. */
>  static void
>  on_create_trigger_rollback(struct trigger *trigger, void * /* event */)
> @@ -4612,18 +4612,6 @@ struct trigger on_replace_space_sequence = {
>  	RLIST_LINK_INITIALIZER, on_replace_dd_space_sequence, NULL, NULL
>  };
>  
> -struct trigger on_stmt_begin_space = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
> -struct trigger on_stmt_begin_index = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
> -struct trigger on_stmt_begin_truncate = {
> -	RLIST_LINK_INITIALIZER, lock_before_dd, NULL, NULL
> -};
> -
>  struct trigger on_replace_trigger = {
>  	RLIST_LINK_INITIALIZER, on_replace_dd_trigger, NULL, NULL
>  };
> diff --git a/src/box/alter.h b/src/box/alter.h
> index b9ba7b84..c339ccea 100644
> --- a/src/box/alter.h
> +++ b/src/box/alter.h
> @@ -47,8 +47,5 @@ extern struct trigger on_replace_space_sequence;
>  extern struct trigger on_replace_trigger;
>  extern struct trigger on_replace_fk_constraint;
>  extern struct trigger on_replace_ck_constraint;
> -extern struct trigger on_stmt_begin_space;
> -extern struct trigger on_stmt_begin_index;
> -extern struct trigger on_stmt_begin_truncate;
>  
>  #endif /* INCLUDES_TARANTOOL_BOX_ALTER_H */
> diff --git a/src/box/schema.cc b/src/box/schema.cc
> index 3f90b8d4..20666386 100644
> --- a/src/box/schema.cc
> +++ b/src/box/schema.cc
> @@ -37,6 +37,7 @@
>  #include "scoped_guard.h"
>  #include "user.h"
>  #include "vclock.h"
> +#include "fiber.h"
>  
>  /**
>   * @module Data Dictionary
> @@ -74,11 +75,6 @@ struct rlist on_alter_space = RLIST_HEAD_INITIALIZER(on_alter_space);
>  struct rlist on_alter_sequence = RLIST_HEAD_INITIALIZER(on_alter_sequence);
>  struct rlist on_alter_func = RLIST_HEAD_INITIALIZER(on_alter_func);
>  
> -/**
> - * Lock of scheme modification
> - */
> -struct latch schema_lock = LATCH_INITIALIZER(schema_lock);
> -
>  struct entity_access entity_access;
>  
>  bool
> @@ -268,8 +264,7 @@ static void
>  sc_space_new(uint32_t id, const char *name,
>  	     struct key_part_def *key_parts,
>  	     uint32_t key_part_count,
> -	     struct trigger *replace_trigger,
> -	     struct trigger *stmt_begin_trigger)
> +	     struct trigger *replace_trigger)
>  {
>  	struct key_def *key_def = key_def_new(key_parts, key_part_count);
>  	if (key_def == NULL)
> @@ -298,8 +293,6 @@ sc_space_new(uint32_t id, const char *name,
>  	space_cache_replace(NULL, space);
>  	if (replace_trigger)
>  		trigger_add(&space->on_replace, replace_trigger);
> -	if (stmt_begin_trigger)
> -		trigger_add(&space->on_stmt_begin, stmt_begin_trigger);
>  	/*
>  	 * Data dictionary spaces are fully built since:
>  	 * - they contain data right from the start
> @@ -394,56 +387,56 @@ schema_init()
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_SCHEMA_ID, "_schema", key_parts, 1,
> -		     &on_replace_schema, NULL);
> +		     &on_replace_schema);
>  
>  	/* _collation - collation description. */
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_COLLATION_ID, "_collation", key_parts, 1,
> -		     &on_replace_collation, NULL);
> +		     &on_replace_collation);
>  
>  	/* _space - home for all spaces. */
>  	sc_space_new(BOX_SPACE_ID, "_space", key_parts, 1,
> -		     &alter_space_on_replace_space, &on_stmt_begin_space);
> +		     &alter_space_on_replace_space);
>  
>  	/* _truncate - auxiliary space for triggering space truncation. */
>  	sc_space_new(BOX_TRUNCATE_ID, "_truncate", key_parts, 1,
> -		     &on_replace_truncate, &on_stmt_begin_truncate);
> +		     &on_replace_truncate);
>  
>  	/* _sequence - definition of all sequence objects. */
>  	sc_space_new(BOX_SEQUENCE_ID, "_sequence", key_parts, 1,
> -		     &on_replace_sequence, NULL);
> +		     &on_replace_sequence);
>  
>  	/* _sequence_data - current sequence value. */
>  	sc_space_new(BOX_SEQUENCE_DATA_ID, "_sequence_data", key_parts, 1,
> -		     &on_replace_sequence_data, NULL);
> +		     &on_replace_sequence_data);
>  
>  	/* _space_seq - association space <-> sequence. */
>  	sc_space_new(BOX_SPACE_SEQUENCE_ID, "_space_sequence", key_parts, 1,
> -		     &on_replace_space_sequence, NULL);
> +		     &on_replace_space_sequence);
>  
>  	/* _user - all existing users */
> -	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user, NULL);
> +	sc_space_new(BOX_USER_ID, "_user", key_parts, 1, &on_replace_user);
>  
>  	/* _func - all executable objects on which one can have grants */
> -	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func, NULL);
> +	sc_space_new(BOX_FUNC_ID, "_func", key_parts, 1, &on_replace_func);
>  	/*
>  	 * _priv - association user <-> object
>  	 * The real index is defined in the snapshot.
>  	 */
> -	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv, NULL);
> +	sc_space_new(BOX_PRIV_ID, "_priv", key_parts, 1, &on_replace_priv);
>  	/*
>  	 * _cluster - association instance uuid <-> instance id
>  	 * The real index is defined in the snapshot.
>  	 */
>  	sc_space_new(BOX_CLUSTER_ID, "_cluster", key_parts, 1,
> -		     &on_replace_cluster, NULL);
> +		     &on_replace_cluster);
>  
>  	/* _trigger - all existing SQL triggers. */
>  	key_parts[0].fieldno = 0;
>  	key_parts[0].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_TRIGGER_ID, "_trigger", key_parts, 1,
> -		     &on_replace_trigger, NULL);
> +		     &on_replace_trigger);
>  
>  	/* _index - definition of all space indexes. */
>  	key_parts[0].fieldno = 0; /* space id */
> @@ -451,7 +444,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* index id */
>  	key_parts[1].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_INDEX_ID, "_index", key_parts, 2,
> -		     &alter_space_on_replace_index, &on_stmt_begin_index);
> +		     &alter_space_on_replace_index);
>  
>  	/* _fk_сonstraint - foreign keys constraints. */
>  	key_parts[0].fieldno = 0; /* constraint name */
> @@ -459,7 +452,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* child space */
>  	key_parts[1].type = FIELD_TYPE_UNSIGNED;
>  	sc_space_new(BOX_FK_CONSTRAINT_ID, "_fk_constraint", key_parts, 2,
> -		     &on_replace_fk_constraint, NULL);
> +		     &on_replace_fk_constraint);
>  
>  	/* _ck_сonstraint - check constraints. */
>  	key_parts[0].fieldno = 0; /* space id */
> @@ -467,7 +460,7 @@ schema_init()
>  	key_parts[1].fieldno = 1; /* constraint name */
>  	key_parts[1].type = FIELD_TYPE_STRING;
>  	sc_space_new(BOX_CK_CONSTRAINT_ID, "_ck_constraint", key_parts, 2,
> -		     &on_replace_ck_constraint, NULL);
> +		     &on_replace_ck_constraint);
>  
>  	/*
>  	 * _vinyl_deferred_delete - blackhole that is needed
> diff --git a/src/box/schema.h b/src/box/schema.h
> index 84f0d33f..f9d15b38 100644
> --- a/src/box/schema.h
> +++ b/src/box/schema.h
> @@ -35,7 +35,6 @@
>  #include <stdio.h> /* snprintf */
>  #include "error.h"
>  #include "space.h"
> -#include "latch.h"
>  
>  #if defined(__cplusplus)
>  extern "C" {
> @@ -50,11 +49,6 @@ extern uint32_t space_cache_version;
>  extern struct rlist on_schema_init;
>  
>  /**
> - * Lock of schema modification
> - */
> -extern struct latch schema_lock;
> -
> -/**
>   * Try to look up a space by space number in the space cache.
>   * FFI-friendly no-exception-thrown space lookup function.
>   *
> diff --git a/src/box/space.c b/src/box/space.c
> index b6ad87bf..e7babb2a 100644
> --- a/src/box/space.c
> +++ b/src/box/space.c
> @@ -137,7 +137,6 @@ space_create(struct space *space, struct engine *engine,
>  	space->index_id_max = index_id_max;
>  	rlist_create(&space->before_replace);
>  	rlist_create(&space->on_replace);
> -	rlist_create(&space->on_stmt_begin);
>  	space->run_triggers = true;
>  
>  	space->format = format;
> @@ -219,7 +218,6 @@ space_delete(struct space *space)
>  		tuple_format_unref(space->format);
>  	trigger_destroy(&space->before_replace);
>  	trigger_destroy(&space->on_replace);
> -	trigger_destroy(&space->on_stmt_begin);
>  	space_def_delete(space->def);
>  	/*
>  	 * SQL Triggers should be deleted with
> diff --git a/src/box/space.h b/src/box/space.h
> index 949f37d4..88db4ec5 100644
> --- a/src/box/space.h
> +++ b/src/box/space.h
> @@ -154,8 +154,6 @@ struct space {
>  	struct rlist before_replace;
>  	/** Triggers fired after space_replace() -- see txn_commit_stmt(). */
>  	struct rlist on_replace;
> -	/** Triggers fired before space statement */
> -	struct rlist on_stmt_begin;
>  	/** SQL Trigger list. */
>  	struct sql_trigger *sql_triggers;
>  	/**
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 95076773..818f405b 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -242,9 +242,6 @@ txn_begin_stmt(struct txn *txn, struct space *space)
>  	if (space == NULL)
>  		return 0;
>  
> -	if (trigger_run(&space->on_stmt_begin, txn) != 0)
> -		goto fail;
> -
>  	struct engine *engine = space->engine;
>  	if (txn_begin_in_engine(engine, txn) != 0)
>  		goto fail;
> diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
> index 3e134cc1..48590521 100644
> --- a/src/box/vy_lsm.c
> +++ b/src/box/vy_lsm.c
> @@ -38,6 +38,7 @@
>  #include <small/mempool.h>
>  
>  #include "diag.h"
> +#include "fiber.h"
>  #include "errcode.h"
>  #include "histogram.h"
>  #include "index_def.h"
> diff --git a/src/lib/core/assoc.h b/src/lib/core/assoc.h
> index 74f87eee..fca02e8d 100644
> --- a/src/lib/core/assoc.h
> +++ b/src/lib/core/assoc.h
> @@ -40,6 +40,19 @@ extern "C" {
>  #include "third_party/PMurHash.h"
>  
>  /*
> + * Set: (i32)
> + */
> +#define mh_name _i32
> +#define mh_key_t uint32_t
> +#define mh_node_t uint32_t
> +#define mh_arg_t void *
> +#define mh_hash(a, arg) (*(a))
> +#define mh_hash_key(a, arg) (a)
> +#define mh_cmp(a, b, arg) (*(a) != *(b))
> +#define mh_cmp_key(a, b, arg) ((a) != *(b))
> +#include "salad/mhash.h"
> +
> +/*
>   * Map: (i32) => (void *)
>   */
>  #define mh_name _i32ptr
> diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
> index 58094256..49c59cf6 100644
> --- a/src/lib/core/latch.h
> +++ b/src/lib/core/latch.h
> @@ -156,16 +156,6 @@ latch_trylock(struct latch *l)
>  }
>  
>  /**
> - * Take a latch ownership
> - */
> -static inline void
> -latch_steal(struct latch *l)
> -{
> -	assert(l->owner != NULL);
> -	l->owner = fiber();
> -}
> -
> -/**
>   * \copydoc box_latch_unlock
>   */
>  static inline void
> diff --git a/test/box/on_replace.result b/test/box/on_replace.result
> index a2f51409..b71c9878 100644
> --- a/test/box/on_replace.result
> +++ b/test/box/on_replace.result
> @@ -477,7 +477,7 @@ t = s:on_replace(function () s:create_index('sec') end, t)
>  ...
>  s:replace({2, 3})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () box.schema.user.create('newu') end, t)
>  ---
> @@ -512,7 +512,7 @@ t = s:on_replace(function () s:drop() end, t)
>  ...
>  s:replace({5, 6})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () box.schema.func.create('newf') end, t)
>  ---
> @@ -533,14 +533,14 @@ t = s:on_replace(function () s:rename('newname') end, t)
>  ...
>  s:replace({8, 9})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _space does not support multi-statement transactions
>  ...
>  t = s:on_replace(function () s.index.pk:rename('newname') end, t)
>  ---
>  ...
>  s:replace({9, 10})
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _index does not support multi-statement transactions
>  ...
>  s:select()
>  ---
> diff --git a/test/box/transaction.result b/test/box/transaction.result
> index bc50735b..9da53e5b 100644
> --- a/test/box/transaction.result
> +++ b/test/box/transaction.result
> @@ -87,7 +87,7 @@ while f:status() ~= 'dead' do fiber.sleep(0) end;
>  -- some operation involves more than one ddl spaces, so they should fail
>  box.begin() box.schema.space.create('test');
>  ---
> -- error: DDL does not support multi-statement transactions
> +- error: Space _space does not support multi-statement transactions
>  ...
>  box.rollback();
>  ---
> diff --git a/test/engine/errinj_ddl.result b/test/engine/errinj_ddl.result
> index a28223a6..60bf34d9 100644
> --- a/test/engine/errinj_ddl.result
> +++ b/test/engine/errinj_ddl.result
> @@ -404,3 +404,101 @@ s:select()
>  s:drop()
>  ---
>  ...
> +--
> +-- Concurrent DDL.
> +--
> +s1 = box.schema.space.create('test1', {engine = engine})
> +---
> +...
> +_ = s1:create_index('pk')
> +---
> +...
> +for i = 1, 5 do s1:insert{i, i} end
> +---
> +...
> +s2 = box.schema.space.create('test2', {engine = engine})
> +---
> +...
> +_ = s2:create_index('pk')
> +---
> +...
> +for i = 1, 5 do s2:insert{i, i} end
> +---
> +...
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
> +---
> +- ok
> +...
> +ch = fiber.channel(1)
> +---
> +...
> +_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
> +---
> +...
> +-- Modification of the same space must fail while an index creation
> +-- is in progress.
> +s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +s1:truncate()
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +s1:drop()
> +---
> +- error: 'Can''t modify space ''test1'': the space is already being modified'
> +...
> +-- Modification of another space must work though.
> +s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +---
> +...
> +s2:truncate()
> +---
> +...
> +_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
> +---
> +...
> +s2:drop()
> +---
> +...
> +s2 = box.schema.space.create('test2', {engine = engine})
> +---
> +...
> +_ = s2:create_index('pk')
> +---
> +...
> +s2:drop()
> +---
> +...
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
> +---
> +- ok
> +...
> +ch:get()
> +---
> +- true
> +...
> +s1.index.pk:select()
> +---
> +- - [1, 1]
> +  - [2, 2]
> +  - [3, 3]
> +  - [4, 4]
> +  - [5, 5]
> +...
> +s1.index.sk:select()
> +---
> +- - [1, 1]
> +  - [2, 2]
> +  - [3, 3]
> +  - [4, 4]
> +  - [5, 5]
> +...
> +s1:drop()
> +---
> +...
> diff --git a/test/engine/errinj_ddl.test.lua b/test/engine/errinj_ddl.test.lua
> index c40eae93..207610c4 100644
> --- a/test/engine/errinj_ddl.test.lua
> +++ b/test/engine/errinj_ddl.test.lua
> @@ -188,3 +188,41 @@ ch:get()
>  
>  s:select()
>  s:drop()
> +
> +--
> +-- Concurrent DDL.
> +--
> +s1 = box.schema.space.create('test1', {engine = engine})
> +_ = s1:create_index('pk')
> +for i = 1, 5 do s1:insert{i, i} end
> +
> +s2 = box.schema.space.create('test2', {engine = engine})
> +_ = s2:create_index('pk')
> +for i = 1, 5 do s2:insert{i, i} end
> +
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", true)
> +ch = fiber.channel(1)
> +_ = fiber.create(function() s1:create_index('sk', {parts = {2, 'unsigned'}}) ch:put(true) end)
> +
> +-- Modification of the same space must fail while an index creation
> +-- is in progress.
> +s1:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +s1:truncate()
> +_ = s1:create_index('tk', {parts = {3, 'unsigned'}})
> +s1:drop()
> +
> +-- Modification of another space must work though.
> +s2:format{{'a', 'unsigned'}, {'b', 'unsigned'}}
> +s2:truncate()
> +_ = s2:create_index('sk', {parts = {2, 'unsigned'}})
> +s2:drop()
> +s2 = box.schema.space.create('test2', {engine = engine})
> +_ = s2:create_index('pk')
> +s2:drop()
> +
> +errinj.set("ERRINJ_BUILD_INDEX_DELAY", false)
> +ch:get()
> +
> +s1.index.pk:select()
> +s1.index.sk:select()
> +s1:drop()

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] Replace schema lock with fine-grained locking
  2019-07-08  7:40           ` Konstantin Osipov
@ 2019-07-08  8:41             ` Vladimir Davydov
  0 siblings, 0 replies; 22+ messages in thread
From: Vladimir Davydov @ 2019-07-08  8:41 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Mon, Jul 08, 2019 at 10:40:53AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/07/04 20:10]:
> 
> Come to think about this commit, it still raises a lot of
> questions.
> 
> What is it taken for entire alter_space_do while it protects alter_space_prepare
> only?

Simply because it's easier to write the code that way. Since we only
care about yields, the lock has no effect if no operation yields.
OTOH taking the lock early makes alter_space_do fail early in case the
space happens to be busy.

> 
> Why is it necessary to protect the yields from prepare, but not
> from WAL?

Once we submitted a request to WAL, we don't really care about
concurrent DDL modifying the space as we are done working with it.
We only care that the space doesn't go away until we run the commit
or rollback trigger. And it can't go away, because other DDL operations
will wait for WAL as well, which orders them to happen after us.

> 
> Why is not not taken when a new space is created? What happens
> when box.schema.space:create() is run concurrently from two
> fibers, would that work without a lock? Why?

We don't need to take the lock when a space is created or dropped,
because those operations never yield.

Concurrent requests are ordered by the WAL just like any other
concurrent DDL.

> 
> These questions should at least be addressed in the comments to
> the lock class. 

Urgh... Isn't it obvious?

> 
> And since the lock is wrapped around long yielding operations,
> perhaps it should be taken by objects which do these operations,
> like CheckSpaceFormat and CreateIndex (it is easy to grant the
> lock to a fiber which already owns it, so each object could be
> taken its own lock)?

I think of locks as protecting objects, not pieces of code. This
particular lock protects the space struct from getting modified by
a concurrent request. I don't see any benefit resulting from taking
it deeper in the call stack - it would only spread the logic between
different functions.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-07-08  8:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-30 19:40 [PATCH 0/6] Get rid of the schema lock Vladimir Davydov
2019-06-30 19:40 ` [PATCH 1/6] Add ERROR_INJECT_YIELD and ERROR_INJECT_SLEEP helpers Vladimir Davydov
2019-07-03 19:12   ` Konstantin Osipov
2019-07-04 15:50     ` Vladimir Davydov
2019-06-30 19:40 ` [PATCH 2/6] Replace ERRINJ_SNAP_WRITE_ROW_TIMEOUT with ERRINJ_SNAP_WRITE_DELAY Vladimir Davydov
2019-07-03 19:13   ` Konstantin Osipov
2019-07-04 15:51     ` Vladimir Davydov
2019-06-30 19:40 ` [PATCH 3/6] Don't take schema lock for checkpointing Vladimir Davydov
2019-07-03 19:21   ` Konstantin Osipov
2019-07-03 20:05     ` Vladimir Davydov
2019-06-30 19:40 ` [PATCH 4/6] test: make vinyl/replica_rejoin more stable Vladimir Davydov
2019-07-03 19:23   ` Konstantin Osipov
2019-07-04 15:51     ` Vladimir Davydov
2019-06-30 19:40 ` [PATCH 5/6] vinyl: don't yield while logging index creation Vladimir Davydov
2019-06-30 19:40 ` [PATCH 6/6] Replace schema lock with fine-grained locking Vladimir Davydov
2019-07-03 19:35   ` Konstantin Osipov
2019-07-03 19:56     ` Vladimir Davydov
2019-07-04  8:09       ` Konstantin Osipov
2019-07-04 17:06         ` Vladimir Davydov
2019-07-08  7:40           ` Konstantin Osipov
2019-07-08  8:41             ` Vladimir Davydov
2019-07-05  8:53 ` [PATCH 0/6] Get rid of the schema lock Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox