Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH] memtx: cancel checkpoint thread at exit
@ 2019-04-18 15:46 Vladimir Davydov
  2019-04-18 17:15 ` Vladimir Davydov
  2019-04-18 17:24 ` Konstantin Osipov
  0 siblings, 2 replies; 5+ messages in thread
From: Vladimir Davydov @ 2019-04-18 15:46 UTC (permalink / raw)
  To: tarantool-patches

If a tarantool instance exits while checkpointing is in progress, the
memtx checkpoint thread, which writes the snap file, can access already
freed data resulting in a crash. Let's fix this the same way we did for
relay and vinyl threads - simply cancel the thread forcefully and wait
for it to terminate.

Closes #4170
---
https://github.com/tarantool/tarantool/issues/4170
https://github.com/tarantool/tarantool/commits/dv/gh-4170-memtx-cancel-checkpoint-thread-at-exit

 src/box/memtx_engine.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 48f700a0..2b9bef24 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -52,6 +52,9 @@
 /* sync snapshot every 16MB */
 #define SNAP_SYNC_INTERVAL	(1 << 24)
 
+static void
+checkpoint_cancel(struct checkpoint *ckpt);
+
 /*
  * Memtx yield-in-transaction trigger: roll back the effects
  * of the transaction and mark the transaction as aborted.
@@ -179,6 +182,8 @@ static void
 memtx_engine_shutdown(struct engine *engine)
 {
 	struct memtx_engine *memtx = (struct memtx_engine *)engine;
+	if (memtx->checkpoint != NULL)
+		checkpoint_cancel(memtx->checkpoint);
 	mempool_destroy(&memtx->iterator_pool);
 	if (mempool_is_initialized(&memtx->rtree_iterator_pool))
 		mempool_destroy(&memtx->rtree_iterator_pool);
@@ -590,6 +595,11 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit)
 	xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
 	vclock_create(&ckpt->vclock);
 	ckpt->touch = false;
+	/*
+	 * Reset the checkpoint thread id so that checkpoint_cancel()
+	 * doesn't attempt to cancel the thread if it isn't running.
+	 */
+	ckpt->cord.id = 0;
 	return ckpt;
 }
 
@@ -605,6 +615,19 @@ checkpoint_delete(struct checkpoint *ckpt)
 	free(ckpt);
 }
 
+static void
+checkpoint_cancel(struct checkpoint *ckpt)
+{
+	/*
+	 * Cancel the checkpoint thread if it's running and wait
+	 * for it to terminate so as to eliminate the possibility
+	 * of use-after-free.
+	 */
+	if (ckpt->cord.id != 0 &&
+	    tt_pthread_cancel(ckpt->cord.id) != ESRCH)
+		tt_pthread_join(ckpt->cord.id, NULL);
+	checkpoint_delete(ckpt);
+}
 
 static int
 checkpoint_add_space(struct space *sp, void *data)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] memtx: cancel checkpoint thread at exit
  2019-04-18 15:46 [PATCH] memtx: cancel checkpoint thread at exit Vladimir Davydov
@ 2019-04-18 17:15 ` Vladimir Davydov
  2019-04-18 17:24   ` [tarantool-patches] " Konstantin Osipov
  2019-04-18 17:24 ` Konstantin Osipov
  1 sibling, 1 reply; 5+ messages in thread
From: Vladimir Davydov @ 2019-04-18 17:15 UTC (permalink / raw)
  To: tarantool-patches

On Thu, Apr 18, 2019 at 06:46:54PM +0300, Vladimir Davydov wrote:
> +static void
> +checkpoint_cancel(struct checkpoint *ckpt)
> +{
> +	/*
> +	 * Cancel the checkpoint thread if it's running and wait
> +	 * for it to terminate so as to eliminate the possibility
> +	 * of use-after-free.
> +	 */
> +	if (ckpt->cord.id != 0 &&
> +	    tt_pthread_cancel(ckpt->cord.id) != ESRCH)
> +		tt_pthread_join(ckpt->cord.id, NULL);
> +	checkpoint_delete(ckpt);
> +}

Oops, turned out this check isn't quite correct as we don't reset
cord.id after joining the thread. As a result, we might call
pthread_cancel on an already joined thread, which leads to a crash
(caught by Travis CI):

 | #0  0x557f6332e9a5 in print_backtrace+1b
 | #1  0x557f6310614f in _ZL12sig_fatal_cbiP9siginfo_tPv+451
 | #2  0x7f98504e10e0 in __restore_rt+0
 | #3  0x7f98504de499 in pthread_cancel+9
 | #4  0x557f6314aa48 in checkpoint_cancel+52
 | #5  0x557f6314866c in memtx_engine_shutdown+55
 | #6  0x557f63144fb1 in engine_shutdown+6f
 | #7  0x557f6326bb52 in box_free+98
 | #8  0x557f631083cf in _Z14tarantool_freev+a9
 | #9  0x557f63109ae2 in main+10cf
 | #10 0x7f984f24e2e1 in __libc_start_main+f1
 | #11 0x557f63104f6a in _start+2a
 | #12 (nil) in +2a

Actually, we don't need to check cord.id at all - we can use already
existing waiting_for_snap_thread flag instead. Fixed this as follows
(this is an incremental diff; for the full diff see below):

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 2b9bef24..4d99910c 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -595,11 +595,6 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit)
 	xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
 	vclock_create(&ckpt->vclock);
 	ckpt->touch = false;
-	/*
-	 * Reset the checkpoint thread id so that checkpoint_cancel()
-	 * doesn't attempt to cancel the thread if it isn't running.
-	 */
-	ckpt->cord.id = 0;
 	return ckpt;
 }
 
@@ -623,9 +618,10 @@ checkpoint_cancel(struct checkpoint *ckpt)
 	 * for it to terminate so as to eliminate the possibility
 	 * of use-after-free.
 	 */
-	if (ckpt->cord.id != 0 &&
-	    tt_pthread_cancel(ckpt->cord.id) != ESRCH)
+	if (ckpt->waiting_for_snap_thread) {
+		tt_pthread_cancel(ckpt->cord.id);
 		tt_pthread_join(ckpt->cord.id, NULL);
+	}
 	checkpoint_delete(ckpt);
 }
 

From 66ff448a89864c3de7d34563a515295d62f7f944 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Date: Thu, 18 Apr 2019 18:03:26 +0300
Subject: [PATCH] memtx: cancel checkpoint thread at exit

If a tarantool instance exits while checkpointing is in progress, the
memtx checkpoint thread, which writes the snap file, can access already
freed data resulting in a crash. Let's fix this the same way we did for
relay and vinyl threads - simply cancel the thread forcefully and wait
for it to terminate.

Closes #4170

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 48f700a0..4d99910c 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -52,6 +52,9 @@
 /* sync snapshot every 16MB */
 #define SNAP_SYNC_INTERVAL	(1 << 24)
 
+static void
+checkpoint_cancel(struct checkpoint *ckpt);
+
 /*
  * Memtx yield-in-transaction trigger: roll back the effects
  * of the transaction and mark the transaction as aborted.
@@ -179,6 +182,8 @@ static void
 memtx_engine_shutdown(struct engine *engine)
 {
 	struct memtx_engine *memtx = (struct memtx_engine *)engine;
+	if (memtx->checkpoint != NULL)
+		checkpoint_cancel(memtx->checkpoint);
 	mempool_destroy(&memtx->iterator_pool);
 	if (mempool_is_initialized(&memtx->rtree_iterator_pool))
 		mempool_destroy(&memtx->rtree_iterator_pool);
@@ -605,6 +610,20 @@ checkpoint_delete(struct checkpoint *ckpt)
 	free(ckpt);
 }
 
+static void
+checkpoint_cancel(struct checkpoint *ckpt)
+{
+	/*
+	 * Cancel the checkpoint thread if it's running and wait
+	 * for it to terminate so as to eliminate the possibility
+	 * of use-after-free.
+	 */
+	if (ckpt->waiting_for_snap_thread) {
+		tt_pthread_cancel(ckpt->cord.id);
+		tt_pthread_join(ckpt->cord.id, NULL);
+	}
+	checkpoint_delete(ckpt);
+}
 
 static int
 checkpoint_add_space(struct space *sp, void *data)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] Re: [PATCH] memtx: cancel checkpoint thread at exit
  2019-04-18 15:46 [PATCH] memtx: cancel checkpoint thread at exit Vladimir Davydov
  2019-04-18 17:15 ` Vladimir Davydov
@ 2019-04-18 17:24 ` Konstantin Osipov
  1 sibling, 0 replies; 5+ messages in thread
From: Konstantin Osipov @ 2019-04-18 17:24 UTC (permalink / raw)
  To: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/04/18 18:47]:
> If a tarantool instance exits while checkpointing is in progress, the
> memtx checkpoint thread, which writes the snap file, can access already
> freed data resulting in a crash. Let's fix this the same way we did for
> relay and vinyl threads - simply cancel the thread forcefully and wait
> for it to terminate.

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] Re: [PATCH] memtx: cancel checkpoint thread at exit
  2019-04-18 17:15 ` Vladimir Davydov
@ 2019-04-18 17:24   ` Konstantin Osipov
  2019-04-18 18:23     ` Vladimir Davydov
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Osipov @ 2019-04-18 17:24 UTC (permalink / raw)
  To: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/04/18 20:19]:
> On Thu, Apr 18, 2019 at 06:46:54PM +0300, Vladimir Davydov wrote:
> > +static void
> > +checkpoint_cancel(struct checkpoint *ckpt)
> > +{
> > +	/*
> > +	 * Cancel the checkpoint thread if it's running and wait
> > +	 * for it to terminate so as to eliminate the possibility
> > +	 * of use-after-free.
> > +	 */
> > +	if (ckpt->cord.id != 0 &&
> > +	    tt_pthread_cancel(ckpt->cord.id) != ESRCH)
> > +		tt_pthread_join(ckpt->cord.id, NULL);
> > +	checkpoint_delete(ckpt);
> > +}
> 
> Oops, turned out this check isn't quite correct as we don't reset
> cord.id after joining the thread. As a result, we might call
> pthread_cancel on an already joined thread, which leads to a crash
> (caught by Travis CI):

I would reset cord.id instead.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] Re: [PATCH] memtx: cancel checkpoint thread at exit
  2019-04-18 17:24   ` [tarantool-patches] " Konstantin Osipov
@ 2019-04-18 18:23     ` Vladimir Davydov
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2019-04-18 18:23 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Thu, Apr 18, 2019 at 08:24:55PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/04/18 20:19]:
> > On Thu, Apr 18, 2019 at 06:46:54PM +0300, Vladimir Davydov wrote:
> > > +static void
> > > +checkpoint_cancel(struct checkpoint *ckpt)
> > > +{
> > > +	/*
> > > +	 * Cancel the checkpoint thread if it's running and wait
> > > +	 * for it to terminate so as to eliminate the possibility
> > > +	 * of use-after-free.
> > > +	 */
> > > +	if (ckpt->cord.id != 0 &&
> > > +	    tt_pthread_cancel(ckpt->cord.id) != ESRCH)
> > > +		tt_pthread_join(ckpt->cord.id, NULL);
> > > +	checkpoint_delete(ckpt);
> > > +}
> > 
> > Oops, turned out this check isn't quite correct as we don't reset
> > cord.id after joining the thread. As a result, we might call
> > pthread_cancel on an already joined thread, which leads to a crash
> > (caught by Travis CI):
> 
> I would reset cord.id instead.

Umm, no - the flag is already there and is used for the same purpose so
I don't see any point in resetting cord.id. Anyway, it's a bikeshed.
Pushed as is to master.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-04-18 18:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-18 15:46 [PATCH] memtx: cancel checkpoint thread at exit Vladimir Davydov
2019-04-18 17:15 ` Vladimir Davydov
2019-04-18 17:24   ` [tarantool-patches] " Konstantin Osipov
2019-04-18 18:23     ` Vladimir Davydov
2019-04-18 17:24 ` Konstantin Osipov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox