* [PATCH 00/13] box: garbage collection refactoring and fixes
@ 2018-10-04 17:20 Vladimir Davydov
2018-10-04 17:20 ` [PATCH 01/13] vinyl: fix master crash on replica join failure Vladimir Davydov
` (13 more replies)
0 siblings, 14 replies; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Commit 9c5d851d7830 ("replication: remove old snapshot files not needed
by replicas") introduced gc_consumer types so that a consumer could pin
either WALs or checkpoints, not necessarily both. This makes sense,
because a replica doesn't need to pin any checkpoints, however the
implementation looks rather dubious: consumers of all kinds are stored
in the same binary search tree so to find the consumer that needs the
oldest checkpoint we have to linearly scan this tree, which is
inefficient and ugly (see gc_tree_first_checkpoint). This also
complicates further work on the garbage collector, in particular
auto-deletion of WAL files on ENOSPC (#3397) and persistent garbage
collector state (#3442).
So this patch set separates WAL consumers from checkpoint references:
gc_consumer can now only be used to pin WALs while to pin a checkpoint
one has to use gc_checkpoint_ref, which has a more lightweight API and
implementation (e.g. it doesn't have "advance" method, because it
doesn't make sense to advance a checkpoint consumer). Along the way,
it does some related cleanups and fixes bug #3708, which was also
introduced by the above mentioned commit.
https://github.com/tarantool/tarantool/issues/3708
https://github.com/tarantool/tarantool/tree/dv/gh-3708-box-gc-fixes
Vladimir Davydov (13):
vinyl: fix master crash on replica join failure
vinyl: force deletion of runs left from unfinished indexes on restart
gc: make gc_consumer and gc_state structs transparent
gc: use fixed length buffer for storing consumer name
gc: fold gc_consumer_new and gc_consumer_delete
gc: format consumer name in gc_consumer_register
gc: rename checkpoint_count to min_checkpoint_count
gc: keep track of available checkpoints
gc: cleanup garbage collection procedure
gc: improve box.info.gc output
gc: separate checkpoint references from wal consumers
gc: call gc_run unconditionally when consumer is advanced
replication: ref checkpoint needed to join replica
src/box/CMakeLists.txt | 1 -
src/box/box.cc | 103 ++++++------
src/box/checkpoint.c | 72 ---------
src/box/checkpoint.h | 97 ------------
src/box/gc.c | 312 ++++++++++++++++---------------------
src/box/gc.h | 187 +++++++++++++++++-----
src/box/lua/info.c | 44 ++++--
src/box/memtx_engine.c | 7 +
src/box/relay.cc | 5 +-
src/box/vinyl.c | 11 +-
src/box/vy_scheduler.c | 1 -
test/replication/gc.result | 19 ++-
test/replication/gc.test.lua | 8 +-
test/vinyl/errinj.result | 51 ++++++
test/vinyl/errinj.test.lua | 18 +++
test/vinyl/errinj_gc.result | 4 -
test/vinyl/errinj_gc.test.lua | 1 -
test/vinyl/replica_rejoin.result | 7 -
test/vinyl/replica_rejoin.test.lua | 2 -
19 files changed, 480 insertions(+), 470 deletions(-)
delete mode 100644 src/box/checkpoint.c
delete mode 100644 src/box/checkpoint.h
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 01/13] vinyl: fix master crash on replica join failure
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:43 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart Vladimir Davydov
` (12 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
This patch fixes a trivial error on vy_send_range() error path which
results in a master crash in case a file needed to join a replica is
missing or corrupted.
See #3708
---
src/box/vinyl.c | 6 +++---
test/vinyl/errinj.result | 51 ++++++++++++++++++++++++++++++++++++++++++++++
test/vinyl/errinj.test.lua | 18 ++++++++++++++++
3 files changed, 72 insertions(+), 3 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 2d0360c1..0cd7b188 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -2983,13 +2983,13 @@ vy_send_range(struct vy_join_ctx *ctx,
vy_send_range_f, NULL, TIMEOUT_INFINITY);
fiber_set_cancellable(cancellable);
+out_delete_wi:
+ ctx->wi->iface->close(ctx->wi);
+ ctx->wi = NULL;
out_delete_slices:
rlist_foreach_entry_safe(slice, &ctx->slices, in_join, tmp)
vy_slice_delete(slice);
rlist_create(&ctx->slices);
-out_delete_wi:
- ctx->wi->iface->close(ctx->wi);
- ctx->wi = NULL;
out:
return rc;
}
diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result
index b4dc5b69..63d14c61 100644
--- a/test/vinyl/errinj.result
+++ b/test/vinyl/errinj.result
@@ -2232,3 +2232,54 @@ test_run:cmd("clear filter")
---
- true
...
+--
+-- Check that an instance doesn't crash if a run file needed for
+-- joining a replica is corrupted (see gh-3708).
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+s:replace{1, 2, 3}
+---
+- [1, 2, 3]
+...
+box.snapshot()
+---
+- ok
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+errinj.set('ERRINJ_VYRUN_INDEX_GARBAGE', true)
+---
+- ok
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica with crash_expected=True")
+---
+- false
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+test_run:cmd("delete server replica")
+---
+- true
+...
+errinj.set('ERRINJ_VYRUN_INDEX_GARBAGE', false)
+---
+- ok
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/errinj.test.lua b/test/vinyl/errinj.test.lua
index e82b6aee..b6f697d3 100644
--- a/test/vinyl/errinj.test.lua
+++ b/test/vinyl/errinj.test.lua
@@ -882,3 +882,21 @@ i:stat().disk.compact.queue -- none
s:drop()
test_run:cmd("clear filter")
+
+--
+-- Check that an instance doesn't crash if a run file needed for
+-- joining a replica is corrupted (see gh-3708).
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+s:replace{1, 2, 3}
+box.snapshot()
+box.schema.user.grant('guest', 'replication')
+errinj.set('ERRINJ_VYRUN_INDEX_GARBAGE', true)
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica with crash_expected=True")
+test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+errinj.set('ERRINJ_VYRUN_INDEX_GARBAGE', false)
+box.schema.user.revoke('guest', 'replication')
+s:drop()
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
2018-10-04 17:20 ` [PATCH 01/13] vinyl: fix master crash on replica join failure Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:44 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent Vladimir Davydov
` (11 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
If an instance is restarted while building a new vinyl index, there will
probably be some run files left. Currently, we won't delete such files
until box.snapshot() is called, even though there's no point in keeping
them around. Let's tweak vy_gc_lsm() so that it marks all runs that
belong to an unfinished index as incomplete to force vy_gc() to remove
them immediately after recovery is complete.
This also removes files left from a failed rebootstrap attempt so we can
remove a call to box.snapshot() from vinyl/replica_rejoin.test.lua.
---
src/box/vinyl.c | 2 ++
test/vinyl/errinj_gc.result | 4 ----
test/vinyl/errinj_gc.test.lua | 1 -
test/vinyl/replica_rejoin.result | 7 -------
test/vinyl/replica_rejoin.test.lua | 2 --
5 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 0cd7b188..fd24f9b5 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -3246,6 +3246,8 @@ vy_gc_lsm(struct vy_lsm_recovery_info *lsm_info)
}
struct vy_run_recovery_info *run_info;
rlist_foreach_entry(run_info, &lsm_info->runs, in_lsm) {
+ if (lsm_info->create_lsn < 0)
+ run_info->is_incomplete = true;
if (!run_info->is_dropped) {
run_info->is_dropped = true;
run_info->gc_lsn = lsm_info->drop_lsn;
diff --git a/test/vinyl/errinj_gc.result b/test/vinyl/errinj_gc.result
index 7c963103..13a43171 100644
--- a/test/vinyl/errinj_gc.result
+++ b/test/vinyl/errinj_gc.result
@@ -254,10 +254,6 @@ ch:get()
- true
...
test_run:cmd('restart server default')
-box.snapshot()
----
-- ok
-...
fio = require('fio')
---
...
diff --git a/test/vinyl/errinj_gc.test.lua b/test/vinyl/errinj_gc.test.lua
index 4c3d1bb3..aaa499a0 100644
--- a/test/vinyl/errinj_gc.test.lua
+++ b/test/vinyl/errinj_gc.test.lua
@@ -120,7 +120,6 @@ ch:get()
#fio.listdir(fio.pathjoin(box.cfg.vinyl_dir, s.id, 1)) > 0
test_run:cmd('restart server default')
-box.snapshot()
fio = require('fio')
s = box.space.test
diff --git a/test/vinyl/replica_rejoin.result b/test/vinyl/replica_rejoin.result
index bd5d1ed3..d153e346 100644
--- a/test/vinyl/replica_rejoin.result
+++ b/test/vinyl/replica_rejoin.result
@@ -164,13 +164,6 @@ test_run:cmd("switch replica")
---
- true
...
-box.cfg{checkpoint_count = 1}
----
-...
-box.snapshot()
----
-- ok
-...
fio = require('fio')
---
...
diff --git a/test/vinyl/replica_rejoin.test.lua b/test/vinyl/replica_rejoin.test.lua
index 972b04e5..8226fb94 100644
--- a/test/vinyl/replica_rejoin.test.lua
+++ b/test/vinyl/replica_rejoin.test.lua
@@ -59,8 +59,6 @@ test_run:cmd("start server replica with crash_expected=True") -- fail
test_run:cmd("start server replica with crash_expected=True") -- fail again
test_run:cmd("start server replica with args='disable_replication'")
test_run:cmd("switch replica")
-box.cfg{checkpoint_count = 1}
-box.snapshot()
fio = require('fio')
fio.chdir(box.cfg.vinyl_dir)
fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
2018-10-04 17:20 ` [PATCH 01/13] vinyl: fix master crash on replica join failure Vladimir Davydov
2018-10-04 17:20 ` [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:47 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 04/13] gc: use fixed length buffer for storing consumer name Vladimir Davydov
` (10 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
It's exasperating to write trivial external functions for each member of
an opaque struct (gc_consumer_vclock, gc_consumer_name, etc) while we
could simply access those fields directly if we made those structs
transparent. Since we usually define structs as transparent if we need
to use them outside a source file, let's do the same for gc_consumer and
gc_state and remove all those one-line wrappers.
---
src/box/box.cc | 2 +-
src/box/gc.c | 63 +-----------------------------------------------------
src/box/gc.h | 63 ++++++++++++++++++++++++++++++++++++------------------
src/box/lua/info.c | 4 ++--
4 files changed, 46 insertions(+), 86 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 804fc00e..207e411c 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1613,7 +1613,7 @@ box_process_vote(struct ballot *ballot)
{
ballot->is_ro = cfg_geti("read_only") != 0;
vclock_copy(&ballot->vclock, &replicaset.vclock);
- gc_vclock(&ballot->gc_vclock);
+ vclock_copy(&ballot->gc_vclock, &gc.wal_vclock);
}
/** Insert a new cluster into _schema */
diff --git a/src/box/gc.c b/src/box/gc.c
index 3ab76626..a45806b6 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -48,44 +48,7 @@
#include "engine.h" /* engine_collect_garbage() */
#include "wal.h" /* wal_collect_garbage() */
-typedef rb_node(struct gc_consumer) gc_node_t;
-
-/**
- * The object of this type is used to prevent garbage
- * collection from removing files that are still in use.
- */
-struct gc_consumer {
- /** Link in gc_state::consumers. */
- gc_node_t node;
- /** Human-readable name. */
- char *name;
- /** The vclock tracked by this consumer. */
- struct vclock vclock;
- /** Consumer type, indicating that consumer only consumes
- * WAL files, or both - SNAP and WAL.
- */
- enum gc_consumer_type type;
-};
-
-typedef rb_tree(struct gc_consumer) gc_tree_t;
-
-/** Garbage collection state. */
-struct gc_state {
- /** Number of checkpoints to maintain. */
- int checkpoint_count;
- /** Max vclock WAL garbage collection has been called for. */
- struct vclock wal_vclock;
- /** Max vclock checkpoint garbage collection has been called for. */
- struct vclock checkpoint_vclock;
- /** Registered consumers, linked by gc_consumer::node. */
- gc_tree_t consumers;
- /**
- * Latch serializing concurrent invocations of engine
- * garbage collection callbacks.
- */
- struct latch latch;
-};
-static struct gc_state gc;
+struct gc_state gc;
/**
* Comparator used for ordering gc_consumer objects by signature
@@ -163,12 +126,6 @@ gc_free(void)
}
}
-void
-gc_vclock(struct vclock *vclock)
-{
- vclock_copy(vclock, &gc.wal_vclock);
-}
-
/** Find the consumer that uses the oldest checkpoint. */
struct gc_consumer *
gc_tree_first_checkpoint(gc_tree_t *consumers)
@@ -321,24 +278,6 @@ gc_consumer_advance(struct gc_consumer *consumer, const struct vclock *vclock)
gc_run();
}
-const char *
-gc_consumer_name(const struct gc_consumer *consumer)
-{
- return consumer->name;
-}
-
-void
-gc_consumer_vclock(const struct gc_consumer *consumer, struct vclock *vclock)
-{
- vclock_copy(vclock, &consumer->vclock);
-}
-
-int64_t
-gc_consumer_signature(const struct gc_consumer *consumer)
-{
- return vclock_sum(&consumer->vclock);
-}
-
struct gc_consumer *
gc_consumer_iterator_next(struct gc_consumer_iterator *it)
{
diff --git a/src/box/gc.h b/src/box/gc.h
index 36a951bf..2c03770d 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -32,13 +32,14 @@
*/
#include <stddef.h>
-#include <stdint.h>
+
+#include "vclock.h"
+#include "latch.h"
#if defined(__cplusplus)
extern "C" {
#endif /* defined(__cplusplus) */
-struct vclock;
struct gc_consumer;
/** Consumer type: WAL consumer, or SNAP */
@@ -48,6 +49,45 @@ enum gc_consumer_type {
GC_CONSUMER_ALL = 3,
};
+typedef rb_node(struct gc_consumer) gc_node_t;
+
+/**
+ * The object of this type is used to prevent garbage
+ * collection from removing files that are still in use.
+ */
+struct gc_consumer {
+ /** Link in gc_state::consumers. */
+ gc_node_t node;
+ /** Human-readable name. */
+ char *name;
+ /** The vclock tracked by this consumer. */
+ struct vclock vclock;
+ /** Consumer type, indicating that consumer only consumes
+ * WAL files, or both - SNAP and WAL.
+ */
+ enum gc_consumer_type type;
+};
+
+typedef rb_tree(struct gc_consumer) gc_tree_t;
+
+/** Garbage collection state. */
+struct gc_state {
+ /** Number of checkpoints to maintain. */
+ int checkpoint_count;
+ /** Max vclock WAL garbage collection has been called for. */
+ struct vclock wal_vclock;
+ /** Max vclock checkpoint garbage collection has been called for. */
+ struct vclock checkpoint_vclock;
+ /** Registered consumers, linked by gc_consumer::node. */
+ gc_tree_t consumers;
+ /**
+ * Latch serializing concurrent invocations of engine
+ * garbage collection callbacks.
+ */
+ struct latch latch;
+};
+extern struct gc_state gc;
+
/**
* Initialize the garbage collection state.
*/
@@ -61,12 +101,6 @@ void
gc_free(void);
/**
- * Get the oldest available vclock.
- */
-void
-gc_vclock(struct vclock *vclock);
-
-/**
* Invoke garbage collection in order to remove files left
* from old checkpoints. The number of checkpoints saved by
* this function is specified by box.cfg.checkpoint_count.
@@ -112,19 +146,6 @@ gc_consumer_unregister(struct gc_consumer *consumer);
void
gc_consumer_advance(struct gc_consumer *consumer, const struct vclock *vclock);
-/** Return the name of a consumer. */
-const char *
-gc_consumer_name(const struct gc_consumer *consumer);
-
-/** Return the vclock a consumer tracks. */
-void
-gc_consumer_vclock(const struct gc_consumer *consumer, struct vclock *vclock);
-
-
-/** Return the vclock signature a consumer tracks. */
-int64_t
-gc_consumer_signature(const struct gc_consumer *consumer);
-
/**
* Iterator over registered consumers. The iterator is valid
* as long as the caller doesn't yield.
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index d9ea73a6..85b21c65 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -397,11 +397,11 @@ lbox_info_gc_call(struct lua_State *L)
lua_createtable(L, 0, 2);
lua_pushstring(L, "name");
- lua_pushstring(L, gc_consumer_name(consumer));
+ lua_pushstring(L, consumer->name);
lua_settable(L, -3);
lua_pushstring(L, "signature");
- luaL_pushint64(L, gc_consumer_signature(consumer));
+ luaL_pushint64(L, vclock_sum(&consumer->vclock));
lua_settable(L, -3);
lua_rawseti(L, -2, ++count);
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 04/13] gc: use fixed length buffer for storing consumer name
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (2 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:47 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete Vladimir Davydov
` (9 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
The length of a consumer name never exceeds 64 characters so no use to
allocate a string. This is a mere code simplification.
---
src/box/gc.c | 11 ++---------
src/box/gc.h | 4 +++-
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/src/box/gc.c b/src/box/gc.c
index a45806b6..7de50e84 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -34,7 +34,7 @@
#include <stdint.h>
#include <stdlib.h>
-#include <string.h>
+#include <stdio.h>
#define RB_COMPACT 1
#include <small/rb.h>
@@ -82,13 +82,7 @@ gc_consumer_new(const char *name, const struct vclock *vclock,
"malloc", "struct gc_consumer");
return NULL;
}
- consumer->name = strdup(name);
- if (consumer->name == NULL) {
- diag_set(OutOfMemory, strlen(name) + 1,
- "malloc", "struct gc_consumer");
- free(consumer);
- return NULL;
- }
+ snprintf(consumer->name, GC_NAME_MAX, "%s", name);
vclock_copy(&consumer->vclock, vclock);
consumer->type = type;
return consumer;
@@ -98,7 +92,6 @@ gc_consumer_new(const char *name, const struct vclock *vclock,
static void
gc_consumer_delete(struct gc_consumer *consumer)
{
- free(consumer->name);
TRASH(consumer);
free(consumer);
}
diff --git a/src/box/gc.h b/src/box/gc.h
index 2c03770d..fde4facf 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -42,6 +42,8 @@ extern "C" {
struct gc_consumer;
+enum { GC_NAME_MAX = 64 };
+
/** Consumer type: WAL consumer, or SNAP */
enum gc_consumer_type {
GC_CONSUMER_WAL = 1,
@@ -59,7 +61,7 @@ struct gc_consumer {
/** Link in gc_state::consumers. */
gc_node_t node;
/** Human-readable name. */
- char *name;
+ char name[GC_NAME_MAX];
/** The vclock tracked by this consumer. */
struct vclock vclock;
/** Consumer type, indicating that consumer only consumes
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (3 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 04/13] gc: use fixed length buffer for storing consumer name Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:50 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 06/13] gc: format consumer name in gc_consumer_register Vladimir Davydov
` (8 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
gc_consumer_new is used in one place while gc_consumer_delete is used in
two places, but it's a one-liner. Let's fold them to make the code flow
more straightforward.
---
src/box/gc.c | 44 ++++++++++++++------------------------------
1 file changed, 14 insertions(+), 30 deletions(-)
diff --git a/src/box/gc.c b/src/box/gc.c
index 7de50e84..449416b5 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -71,31 +71,6 @@ gc_consumer_cmp(const struct gc_consumer *a, const struct gc_consumer *b)
rb_gen(MAYBE_UNUSED static inline, gc_tree_, gc_tree_t,
struct gc_consumer, node, gc_consumer_cmp);
-/** Allocate a consumer object. */
-static struct gc_consumer *
-gc_consumer_new(const char *name, const struct vclock *vclock,
- enum gc_consumer_type type)
-{
- struct gc_consumer *consumer = calloc(1, sizeof(*consumer));
- if (consumer == NULL) {
- diag_set(OutOfMemory, sizeof(*consumer),
- "malloc", "struct gc_consumer");
- return NULL;
- }
- snprintf(consumer->name, GC_NAME_MAX, "%s", name);
- vclock_copy(&consumer->vclock, vclock);
- consumer->type = type;
- return consumer;
-}
-
-/** Free a consumer object. */
-static void
-gc_consumer_delete(struct gc_consumer *consumer)
-{
- TRASH(consumer);
- free(consumer);
-}
-
void
gc_init(void)
{
@@ -114,7 +89,7 @@ gc_free(void)
struct gc_consumer *next = gc_tree_next(&gc.consumers,
consumer);
gc_tree_remove(&gc.consumers, consumer);
- gc_consumer_delete(consumer);
+ free(consumer);
consumer = next;
}
}
@@ -213,9 +188,18 @@ struct gc_consumer *
gc_consumer_register(const char *name, const struct vclock *vclock,
enum gc_consumer_type type)
{
- struct gc_consumer *consumer = gc_consumer_new(name, vclock, type);
- if (consumer != NULL)
- gc_tree_insert(&gc.consumers, consumer);
+ struct gc_consumer *consumer = calloc(1, sizeof(*consumer));
+ if (consumer == NULL) {
+ diag_set(OutOfMemory, sizeof(*consumer),
+ "malloc", "struct gc_consumer");
+ return NULL;
+ }
+
+ snprintf(consumer->name, GC_NAME_MAX, "%s", name);
+ vclock_copy(&consumer->vclock, vclock);
+ consumer->type = type;
+
+ gc_tree_insert(&gc.consumers, consumer);
return consumer;
}
@@ -225,7 +209,7 @@ gc_consumer_unregister(struct gc_consumer *consumer)
int64_t signature = vclock_sum(&consumer->vclock);
gc_tree_remove(&gc.consumers, consumer);
- gc_consumer_delete(consumer);
+ free(consumer);
/*
* Rerun garbage collection after removing the consumer
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 06/13] gc: format consumer name in gc_consumer_register
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (4 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:50 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count Vladimir Davydov
` (7 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
It's better than using tt_snprintf at call sites.
---
src/box/box.cc | 7 +++----
src/box/gc.c | 11 ++++++++---
src/box/gc.h | 10 ++++++----
src/box/relay.cc | 6 +++---
4 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 207e411c..bdf71eb2 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1458,9 +1458,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
tnt_raise(ClientError, ER_MISSING_SNAPSHOT);
/* Register the replica with the garbage collector. */
- struct gc_consumer *gc = gc_consumer_register(
- tt_sprintf("replica %s", tt_uuid_str(&instance_uuid)),
- &start_vclock, GC_CONSUMER_WAL);
+ struct gc_consumer *gc = gc_consumer_register(&start_vclock,
+ GC_CONSUMER_WAL, "replica %s", tt_uuid_str(&instance_uuid));
if (gc == NULL)
diag_raise();
auto gc_guard = make_scoped_guard([=]{
@@ -2176,7 +2175,7 @@ box_backup_start(int checkpoint_idx, box_backup_cb cb, void *cb_arg)
return -1;
}
} while (checkpoint_idx-- > 0);
- backup_gc = gc_consumer_register("backup", vclock, GC_CONSUMER_ALL);
+ backup_gc = gc_consumer_register(vclock, GC_CONSUMER_ALL, "backup");
if (backup_gc == NULL)
return -1;
int rc = engine_backup(vclock, cb, cb_arg);
diff --git a/src/box/gc.c b/src/box/gc.c
index 449416b5..100c972f 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -32,6 +32,7 @@
#include <trivia/util.h>
+#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
@@ -185,8 +186,8 @@ gc_set_checkpoint_count(int checkpoint_count)
}
struct gc_consumer *
-gc_consumer_register(const char *name, const struct vclock *vclock,
- enum gc_consumer_type type)
+gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
+ const char *format, ...)
{
struct gc_consumer *consumer = calloc(1, sizeof(*consumer));
if (consumer == NULL) {
@@ -195,7 +196,11 @@ gc_consumer_register(const char *name, const struct vclock *vclock,
return NULL;
}
- snprintf(consumer->name, GC_NAME_MAX, "%s", name);
+ va_list ap;
+ va_start(ap, format);
+ vsnprintf(consumer->name, GC_NAME_MAX, format, ap);
+ va_end(ap);
+
vclock_copy(&consumer->vclock, vclock);
consumer->type = type;
diff --git a/src/box/gc.h b/src/box/gc.h
index fde4facf..6e2a4910 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -35,6 +35,7 @@
#include "vclock.h"
#include "latch.h"
+#include "trivia/util.h"
#if defined(__cplusplus)
extern "C" {
@@ -122,17 +123,18 @@ gc_set_checkpoint_count(int checkpoint_count);
*
* This will stop garbage collection of objects newer than
* @vclock until the consumer is unregistered or advanced.
- * @name is a human-readable name of the consumer, it will
- * be used for reporting the consumer to the user.
+ * @format... specifies a human-readable name of the consumer,
+ * it will be used for listing the consumer in box.info.gc().
* @type consumer type, reporting whether consumer only depends
* on WAL files.
*
* Returns a pointer to the new consumer object or NULL on
* memory allocation failure.
*/
+CFORMAT(printf, 3, 4)
struct gc_consumer *
-gc_consumer_register(const char *name, const struct vclock *vclock,
- enum gc_consumer_type type);
+gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
+ const char *format, ...);
/**
* Unregister a consumer and invoke garbage collection
diff --git a/src/box/relay.cc b/src/box/relay.cc
index c90383d4..88430856 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -592,9 +592,9 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
* join.
*/
if (replica->gc == NULL) {
- replica->gc = gc_consumer_register(
- tt_sprintf("replica %s", tt_uuid_str(&replica->uuid)),
- replica_clock, GC_CONSUMER_WAL);
+ replica->gc = gc_consumer_register(replica_clock,
+ GC_CONSUMER_WAL, "replica %s",
+ tt_uuid_str(&replica->uuid));
if (replica->gc == NULL)
diag_raise();
}
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (5 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 06/13] gc: format consumer name in gc_consumer_register Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:51 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 08/13] gc: keep track of available checkpoints Vladimir Davydov
` (6 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Because it's the minimal number of checkpoints that must not be deleted,
not the actual number of preserved checkpoints. Do it now, in a separate
patch so as to ease review of the next patch.
While we are at it, fix the comment to gc_set_(min_)checkpoint_count()
which got outdated by commit 5512053fa281 ("box: gc: do not remove files
being backed up").
---
src/box/box.cc | 2 +-
src/box/gc.c | 15 ++++++++-------
src/box/gc.h | 17 ++++++++++++-----
3 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index bdf71eb2..49deea61 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -826,7 +826,7 @@ box_set_checkpoint_count(void)
{
int checkpoint_count = cfg_geti("checkpoint_count");
box_check_checkpoint_count(checkpoint_count);
- gc_set_checkpoint_count(checkpoint_count);
+ gc_set_min_checkpoint_count(checkpoint_count);
}
void
diff --git a/src/box/gc.c b/src/box/gc.c
index 100c972f..dca278af 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -108,8 +108,8 @@ gc_tree_first_checkpoint(gc_tree_t *consumers)
void
gc_run(void)
{
- int checkpoint_count = gc.checkpoint_count;
- assert(checkpoint_count > 0);
+ int min_checkpoint_count = gc.min_checkpoint_count;
+ assert(min_checkpoint_count > 0);
/* Look up the consumer that uses the oldest WAL. */
struct gc_consumer *leftmost = gc_tree_first(&gc.consumers);
@@ -119,8 +119,9 @@ gc_run(void)
/*
* Find the oldest checkpoint that must be preserved.
- * We have to maintain @checkpoint_count oldest checkpoints,
- * plus we can't remove checkpoints that are still in use.
+ * We have to preserve @min_checkpoint_count oldest
+ * checkpoints, plus we can't remove checkpoints that
+ * are still in use.
*/
struct vclock gc_checkpoint_vclock;
vclock_create(&gc_checkpoint_vclock);
@@ -130,7 +131,7 @@ gc_run(void)
const struct vclock *vclock;
while ((vclock = checkpoint_iterator_prev(&checkpoints)) != NULL) {
- if (--checkpoint_count > 0)
+ if (--min_checkpoint_count > 0)
continue;
if (leftmost_checkpoint != NULL &&
vclock_sum(&leftmost_checkpoint->vclock) < vclock_sum(vclock))
@@ -180,9 +181,9 @@ gc_run(void)
}
void
-gc_set_checkpoint_count(int checkpoint_count)
+gc_set_min_checkpoint_count(int min_checkpoint_count)
{
- gc.checkpoint_count = checkpoint_count;
+ gc.min_checkpoint_count = min_checkpoint_count;
}
struct gc_consumer *
diff --git a/src/box/gc.h b/src/box/gc.h
index 6e2a4910..418f8d5e 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -75,8 +75,11 @@ typedef rb_tree(struct gc_consumer) gc_tree_t;
/** Garbage collection state. */
struct gc_state {
- /** Number of checkpoints to maintain. */
- int checkpoint_count;
+ /**
+ * Minimal number of checkpoints to preserve.
+ * Configured by box.cfg.checkpoint_count.
+ */
+ int min_checkpoint_count;
/** Max vclock WAL garbage collection has been called for. */
struct vclock wal_vclock;
/** Max vclock checkpoint garbage collection has been called for. */
@@ -112,11 +115,15 @@ void
gc_run(void);
/**
- * Update the checkpoint_count configuration option and
- * rerun garbage collection.
+ * Update the minimal number of checkpoints to preserve.
+ * Called when box.cfg.checkpoint_count is updated.
+ *
+ * Note, this function doesn't run garbage collector so
+ * changes will take effect only after a new checkpoint
+ * is created or a consumer is unregistered.
*/
void
-gc_set_checkpoint_count(int checkpoint_count);
+gc_set_min_checkpoint_count(int min_checkpoint_count);
/**
* Register a consumer.
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 08/13] gc: keep track of available checkpoints
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (6 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 21:59 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 09/13] gc: cleanup garbage collection procedure Vladimir Davydov
` (5 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Currently, the checkpoint iterator is in fact a wrapper around
memtx_engine::snap_dir while the garbage collector knows nothing about
checkpoints. This feels like encapsulation violation. Let's keep track
of all available checkpoints right in the garbage collector instead
and export gc_ API to iterate over checkpoints.
---
src/box/CMakeLists.txt | 1 -
src/box/box.cc | 50 ++++++++++++++------------
src/box/checkpoint.c | 72 -------------------------------------
src/box/checkpoint.h | 97 --------------------------------------------------
src/box/gc.c | 84 ++++++++++++++++++++++++++++++++++---------
src/box/gc.h | 68 ++++++++++++++++++++++++++++++-----
src/box/lua/info.c | 10 ++----
src/box/memtx_engine.c | 7 ++++
src/box/vinyl.c | 3 +-
src/box/vy_scheduler.c | 1 -
10 files changed, 167 insertions(+), 226 deletions(-)
delete mode 100644 src/box/checkpoint.c
delete mode 100644 src/box/checkpoint.h
diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
index 67750898..52413d3c 100644
--- a/src/box/CMakeLists.txt
+++ b/src/box/CMakeLists.txt
@@ -102,7 +102,6 @@ add_library(box STATIC
txn.c
box.cc
gc.c
- checkpoint.c
user_def.c
user.cc
authentication.cc
diff --git a/src/box/box.cc b/src/box/box.cc
index 49deea61..a33b80ef 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -68,7 +68,6 @@
#include "authentication.h"
#include "path_lock.h"
#include "gc.h"
-#include "checkpoint.h"
#include "systemd.h"
#include "call.h"
#include "func.h"
@@ -1446,17 +1445,20 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
"wal_mode = 'none'");
}
- /* Remember start vclock. */
- struct vclock start_vclock;
/*
* The only case when the directory index is empty is
* when someone has deleted a snapshot and tries to join
* as a replica. Our best effort is to not crash in such
* case: raise ER_MISSING_SNAPSHOT.
*/
- if (checkpoint_last(&start_vclock) < 0)
+ struct gc_checkpoint *checkpoint = gc_last_checkpoint();
+ if (checkpoint == NULL)
tnt_raise(ClientError, ER_MISSING_SNAPSHOT);
+ /* Remember start vclock. */
+ struct vclock start_vclock;
+ vclock_copy(&start_vclock, &checkpoint->vclock);
+
/* Register the replica with the garbage collector. */
struct gc_consumer *gc = gc_consumer_register(&start_vclock,
GC_CONSUMER_WAL, "replica %s", tt_uuid_str(&instance_uuid));
@@ -1725,6 +1727,8 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
if (engine_begin_checkpoint() ||
engine_commit_checkpoint(&replicaset.vclock))
panic("failed to create a checkpoint");
+
+ gc_add_checkpoint(&replicaset.vclock);
}
/**
@@ -1785,6 +1789,8 @@ bootstrap_from_master(struct replica *master)
if (engine_begin_checkpoint() ||
engine_commit_checkpoint(&replicaset.vclock))
panic("failed to create a checkpoint");
+
+ gc_add_checkpoint(&replicaset.vclock);
}
/**
@@ -2036,8 +2042,7 @@ box_cfg_xc(void)
xstream_create(&join_stream, apply_initial_join_row);
xstream_create(&subscribe_stream, apply_row);
- struct vclock last_checkpoint_vclock;
- int64_t last_checkpoint_lsn = checkpoint_last(&last_checkpoint_vclock);
+ struct gc_checkpoint *checkpoint = gc_last_checkpoint();
/*
* Lock the write ahead log directory to avoid multiple
@@ -2051,14 +2056,14 @@ box_cfg_xc(void)
* refuse to start. In hot standby mode, a busy
* WAL dir must contain at least one xlog.
*/
- if (!cfg_geti("hot_standby") || last_checkpoint_lsn < 0)
+ if (!cfg_geti("hot_standby") || checkpoint == NULL)
tnt_raise(ClientError, ER_ALREADY_RUNNING, cfg_gets("wal_dir"));
}
bool is_bootstrap_leader = false;
- if (last_checkpoint_lsn >= 0) {
+ if (checkpoint != NULL) {
/* Recover the instance from the local directory */
local_recovery(&instance_uuid, &replicaset_uuid,
- &last_checkpoint_vclock);
+ &checkpoint->vclock);
} else {
/* Bootstrap a new master */
bootstrap(&instance_uuid, &replicaset_uuid,
@@ -2151,7 +2156,8 @@ end:
if (rc)
engine_abort_checkpoint();
else
- gc_run();
+ gc_add_checkpoint(&vclock);
+
latch_unlock(&schema_lock);
box_checkpoint_is_in_progress = false;
return rc;
@@ -2165,20 +2171,20 @@ box_backup_start(int checkpoint_idx, box_backup_cb cb, void *cb_arg)
diag_set(ClientError, ER_BACKUP_IN_PROGRESS);
return -1;
}
- const struct vclock *vclock;
- struct checkpoint_iterator it;
- checkpoint_iterator_init(&it);
- do {
- vclock = checkpoint_iterator_prev(&it);
- if (vclock == NULL) {
- diag_set(ClientError, ER_MISSING_SNAPSHOT);
- return -1;
- }
- } while (checkpoint_idx-- > 0);
- backup_gc = gc_consumer_register(vclock, GC_CONSUMER_ALL, "backup");
+ struct gc_checkpoint *checkpoint;
+ gc_foreach_checkpoint_reverse(checkpoint) {
+ if (checkpoint_idx-- == 0)
+ break;
+ }
+ if (checkpoint_idx >= 0) {
+ diag_set(ClientError, ER_MISSING_SNAPSHOT);
+ return -1;
+ }
+ backup_gc = gc_consumer_register(&checkpoint->vclock,
+ GC_CONSUMER_ALL, "backup");
if (backup_gc == NULL)
return -1;
- int rc = engine_backup(vclock, cb, cb_arg);
+ int rc = engine_backup(&checkpoint->vclock, cb, cb_arg);
if (rc != 0) {
gc_consumer_unregister(backup_gc);
backup_gc = NULL;
diff --git a/src/box/checkpoint.c b/src/box/checkpoint.c
deleted file mode 100644
index cc32e75c..00000000
--- a/src/box/checkpoint.c
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Copyright 2010-2017, Tarantool AUTHORS, please see AUTHORS file.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * 1. Redistributions of source code must retain the above
- * copyright notice, this list of conditions and the
- * following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials
- * provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY AUTHORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
- * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
- * AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
- * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
- * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
- * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
- * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-#include "checkpoint.h"
-
-#include <stdbool.h>
-#include <stdint.h>
-
-#include "engine.h"
-#include "memtx_engine.h"
-
-int64_t
-checkpoint_last(struct vclock *vclock)
-{
- struct memtx_engine *memtx;
- memtx = (struct memtx_engine *)engine_by_name("memtx");
- assert(memtx != NULL);
- return xdir_last_vclock(&memtx->snap_dir, vclock);
-}
-
-const struct vclock *
-checkpoint_iterator_next(struct checkpoint_iterator *it)
-{
- struct memtx_engine *memtx;
- memtx = (struct memtx_engine *)engine_by_name("memtx");
- assert(memtx != NULL);
- it->curr = it->curr == NULL ?
- vclockset_first(&memtx->snap_dir.index) :
- vclockset_next(&memtx->snap_dir.index,
- (struct vclock *)it->curr);
- return it->curr;
-}
-
-const struct vclock *
-checkpoint_iterator_prev(struct checkpoint_iterator *it)
-{
- struct memtx_engine *memtx;
- memtx = (struct memtx_engine *)engine_by_name("memtx");
- assert(memtx != NULL);
- it->curr = it->curr == NULL ?
- vclockset_last(&memtx->snap_dir.index) :
- vclockset_prev(&memtx->snap_dir.index,
- (struct vclock *)it->curr);
- return it->curr;
-}
diff --git a/src/box/checkpoint.h b/src/box/checkpoint.h
deleted file mode 100644
index 00a1e705..00000000
--- a/src/box/checkpoint.h
+++ /dev/null
@@ -1,97 +0,0 @@
-#ifndef TARANTOOL_BOX_CHECKPOINT_H_INCLUDED
-#define TARANTOOL_BOX_CHECKPOINT_H_INCLUDED
-/*
- * Copyright 2010-2017, Tarantool AUTHORS, please see AUTHORS file.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * 1. Redistributions of source code must retain the above
- * copyright notice, this list of conditions and the
- * following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials
- * provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY AUTHORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
- * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
- * AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
- * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
- * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
- * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
- * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-
-#include <stdbool.h>
-#include <stddef.h>
-#include <stdint.h>
-
-/**
- * This module implements a simple API for working with checkpoints.
- * As checkpoints are, in fact, memtx snapshots, functions exported
- * by this module are C wrappers around corresponding memtx_engine
- * methods.
- */
-
-#if defined(__cplusplus)
-extern "C" {
-#endif /* defined(__cplusplus) */
-
-struct vclock;
-
-/**
- * Return LSN and vclock (unless @vclock is NULL) of the most
- * recent checkpoint or -1 if there is no checkpoint.
- */
-int64_t
-checkpoint_last(struct vclock *vclock);
-
-/** Iterator over all existing checkpoints. */
-struct checkpoint_iterator {
- const struct vclock *curr;
-};
-
-/**
- * Init a checkpoint iterator. The iterator is valid as long
- * as the caller doesn't yield.
- */
-static inline void
-checkpoint_iterator_init(struct checkpoint_iterator *it)
-{
- it->curr = NULL;
-}
-
-/**
- * Iterate to the next checkpoint. Return NULL if the current
- * checkpoint is the most recent one.
- *
- * If called on the last iteration, this function positions
- * the iterator to the oldest checkpoint.
- */
-const struct vclock *
-checkpoint_iterator_next(struct checkpoint_iterator *it);
-
-/**
- * Iterate to the previous checkpoint. Return NULL if the current
- * checkpoint is the oldest one.
- *
- * If called on the first iteration, this function positions
- * the iterator to the newest checkpoint.
- */
-const struct vclock *
-checkpoint_iterator_prev(struct checkpoint_iterator *it);
-
-#if defined(__cplusplus)
-} /* extern "C" */
-#endif /* defined(__cplusplus) */
-
-#endif /* TARANTOOL_BOX_CHECKPOINT_H_INCLUDED */
diff --git a/src/box/gc.c b/src/box/gc.c
index dca278af..e6951f4c 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -32,7 +32,10 @@
#include <trivia/util.h>
+#include <assert.h>
+#include <limits.h>
#include <stdarg.h>
+#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
@@ -45,7 +48,6 @@
#include "say.h"
#include "latch.h"
#include "vclock.h"
-#include "checkpoint.h"
#include "engine.h" /* engine_collect_garbage() */
#include "wal.h" /* wal_collect_garbage() */
@@ -75,6 +77,10 @@ rb_gen(MAYBE_UNUSED static inline, gc_tree_, gc_tree_t,
void
gc_init(void)
{
+ /* Don't delete any files until recovery is complete. */
+ gc.min_checkpoint_count = INT_MAX;
+
+ rlist_create(&gc.checkpoints);
vclock_create(&gc.wal_vclock);
vclock_create(&gc.checkpoint_vclock);
gc_tree_new(&gc.consumers);
@@ -84,6 +90,12 @@ gc_init(void)
void
gc_free(void)
{
+ /* Free checkpoints. */
+ struct gc_checkpoint *checkpoint, *next_checkpoint;
+ rlist_foreach_entry_safe(checkpoint, &gc.checkpoints, in_checkpoints,
+ next_checkpoint) {
+ free(checkpoint);
+ }
/* Free all registered consumers. */
struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
while (consumer != NULL) {
@@ -105,12 +117,14 @@ gc_tree_first_checkpoint(gc_tree_t *consumers)
return consumer;
}
-void
+/**
+ * Invoke garbage collection in order to remove files left
+ * from old checkpoints. The number of checkpoints saved by
+ * this function is specified by box.cfg.checkpoint_count.
+ */
+static void
gc_run(void)
{
- int min_checkpoint_count = gc.min_checkpoint_count;
- assert(min_checkpoint_count > 0);
-
/* Look up the consumer that uses the oldest WAL. */
struct gc_consumer *leftmost = gc_tree_first(&gc.consumers);
/* Look up the consumer that uses the oldest checkpoint. */
@@ -126,20 +140,25 @@ gc_run(void)
struct vclock gc_checkpoint_vclock;
vclock_create(&gc_checkpoint_vclock);
- struct checkpoint_iterator checkpoints;
- checkpoint_iterator_init(&checkpoints);
-
- const struct vclock *vclock;
- while ((vclock = checkpoint_iterator_prev(&checkpoints)) != NULL) {
- if (--min_checkpoint_count > 0)
- continue;
+ struct gc_checkpoint *checkpoint = NULL;
+ while (true) {
+ checkpoint = rlist_first_entry(&gc.checkpoints,
+ struct gc_checkpoint, in_checkpoints);
+ vclock_copy(&gc_checkpoint_vclock, &checkpoint->vclock);
+ if (gc.checkpoint_count <= gc.min_checkpoint_count)
+ break;
if (leftmost_checkpoint != NULL &&
- vclock_sum(&leftmost_checkpoint->vclock) < vclock_sum(vclock))
- continue;
- vclock_copy(&gc_checkpoint_vclock, vclock);
- break;
+ vclock_sum(&checkpoint->vclock) >=
+ vclock_sum(&leftmost_checkpoint->vclock))
+ break; /* checkpoint is in use */
+ rlist_del_entry(checkpoint, in_checkpoints);
+ free(checkpoint);
+ gc.checkpoint_count--;
}
+ /* At least one checkpoint must always be available. */
+ assert(checkpoint != NULL);
+
struct vclock gc_wal_vclock;
if (leftmost != NULL &&
vclock_sum(&leftmost->vclock) < vclock_sum(&gc_checkpoint_vclock))
@@ -186,6 +205,39 @@ gc_set_min_checkpoint_count(int min_checkpoint_count)
gc.min_checkpoint_count = min_checkpoint_count;
}
+void
+gc_add_checkpoint(const struct vclock *vclock)
+{
+ struct gc_checkpoint *last_checkpoint = gc_last_checkpoint();
+ if (last_checkpoint != NULL &&
+ vclock_sum(&last_checkpoint->vclock) == vclock_sum(vclock)) {
+ /*
+ * No new checkpoint was actually created.
+ * Rerun the garbage collector to delete old
+ * files in case box.cfg.checkpoint_count
+ * was changed.
+ */
+ gc_run();
+ return;
+ }
+ assert(last_checkpoint == NULL ||
+ vclock_sum(&last_checkpoint->vclock) < vclock_sum(vclock));
+
+ struct gc_checkpoint *checkpoint = calloc(1, sizeof(*checkpoint));
+ /*
+ * This function is called after a checkpoint is written
+ * to disk so it can't fail.
+ */
+ if (checkpoint == NULL)
+ panic("out of memory");
+
+ vclock_copy(&checkpoint->vclock, vclock);
+ rlist_add_tail_entry(&gc.checkpoints, checkpoint, in_checkpoints);
+ gc.checkpoint_count++;
+
+ gc_run();
+}
+
struct gc_consumer *
gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
const char *format, ...)
diff --git a/src/box/gc.h b/src/box/gc.h
index 418f8d5e..e26b1017 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -32,6 +32,7 @@
*/
#include <stddef.h>
+#include <small/rlist.h>
#include "vclock.h"
#include "latch.h"
@@ -55,6 +56,17 @@ enum gc_consumer_type {
typedef rb_node(struct gc_consumer) gc_node_t;
/**
+ * Garbage collector keeps track of all preserved checkpoints.
+ * The following structure represents a checkpoint.
+ */
+struct gc_checkpoint {
+ /** Link in gc_state::checkpoints. */
+ struct rlist in_checkpoints;
+ /** VClock of the checkpoint. */
+ struct vclock vclock;
+};
+
+/**
* The object of this type is used to prevent garbage
* collection from removing files that are still in use.
*/
@@ -80,6 +92,17 @@ struct gc_state {
* Configured by box.cfg.checkpoint_count.
*/
int min_checkpoint_count;
+ /**
+ * Number of preserved checkpoints. May be greater than
+ * @min_checkpoint_count, because some checkpoints may
+ * be in use by replication or backup.
+ */
+ int checkpoint_count;
+ /**
+ * List of preserved checkpoints. New checkpoints are added
+ * to the tail. Linked by gc_checkpoint::in_checkpoints.
+ */
+ struct rlist checkpoints;
/** Max vclock WAL garbage collection has been called for. */
struct vclock wal_vclock;
/** Max vclock checkpoint garbage collection has been called for. */
@@ -95,6 +118,35 @@ struct gc_state {
extern struct gc_state gc;
/**
+ * Iterate over all checkpoints tracked by the garbage collector,
+ * starting from the oldest and ending with the newest.
+ */
+#define gc_foreach_checkpoint(checkpoint) \
+ rlist_foreach_entry(checkpoint, &gc.checkpoints, in_checkpoints)
+
+/**
+ * Iterate over all checkpoints tracked by the garbage collector
+ * in the reverse order, that is starting from the newest and
+ * ending with the oldest.
+ */
+#define gc_foreach_checkpoint_reverse(checkpoint) \
+ rlist_foreach_entry_reverse(checkpoint, &gc.checkpoints, in_checkpoints)
+
+/**
+ * Return the last (newest) checkpoint known to the garbage
+ * collector. If there's no checkpoint, return NULL.
+ */
+static inline struct gc_checkpoint *
+gc_last_checkpoint(void)
+{
+ if (rlist_empty(&gc.checkpoints))
+ return NULL;
+
+ return rlist_last_entry(&gc.checkpoints, struct gc_checkpoint,
+ in_checkpoints);
+}
+
+/**
* Initialize the garbage collection state.
*/
void
@@ -107,14 +159,6 @@ void
gc_free(void);
/**
- * Invoke garbage collection in order to remove files left
- * from old checkpoints. The number of checkpoints saved by
- * this function is specified by box.cfg.checkpoint_count.
- */
-void
-gc_run(void);
-
-/**
* Update the minimal number of checkpoints to preserve.
* Called when box.cfg.checkpoint_count is updated.
*
@@ -126,6 +170,14 @@ void
gc_set_min_checkpoint_count(int min_checkpoint_count);
/**
+ * Track a new checkpoint in the garbage collector state.
+ * Note, this function may run garbage collector to remove
+ * old checkpoints.
+ */
+void
+gc_add_checkpoint(const struct vclock *vclock);
+
+/**
* Register a consumer.
*
* This will stop garbage collection of objects newer than
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 85b21c65..97d5aba3 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -47,7 +47,6 @@
#include "box/replication.h"
#include "box/info.h"
#include "box/gc.h"
-#include "box/checkpoint.h"
#include "box/engine.h"
#include "box/vinyl.h"
#include "main.h"
@@ -363,22 +362,19 @@ static int
lbox_info_gc_call(struct lua_State *L)
{
int count;
- const struct vclock *vclock;
lua_newtable(L);
lua_pushstring(L, "checkpoints");
lua_newtable(L);
- struct checkpoint_iterator checkpoints;
- checkpoint_iterator_init(&checkpoints);
-
count = 0;
- while ((vclock = checkpoint_iterator_next(&checkpoints)) != NULL) {
+ struct gc_checkpoint *checkpoint;
+ gc_foreach_checkpoint(checkpoint) {
lua_createtable(L, 0, 1);
lua_pushstring(L, "signature");
- luaL_pushint64(L, vclock_sum(vclock));
+ luaL_pushint64(L, vclock_sum(&checkpoint->vclock));
lua_settable(L, -3);
lua_rawseti(L, -2, ++count);
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 1f80ce54..ae1f5a0e 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1046,6 +1046,13 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery,
xlog_cursor_close(&cursor, false);
}
+ /* Apprise the garbage collector of available checkpoints. */
+ for (struct vclock *vclock = vclockset_first(&memtx->snap_dir.index);
+ vclock != NULL;
+ vclock = vclockset_next(&memtx->snap_dir.index, vclock)) {
+ gc_add_checkpoint(vclock);
+ }
+
stailq_create(&memtx->gc_queue);
memtx->gc_fiber = fiber_new("memtx.gc", memtx_engine_gc_f);
if (memtx->gc_fiber == NULL)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index fd24f9b5..acfe86d1 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -70,7 +70,6 @@
#include "info.h"
#include "column_mask.h"
#include "trigger.h"
-#include "checkpoint.h"
#include "session.h"
#include "wal.h" /* wal_mode() */
@@ -3304,7 +3303,7 @@ vinyl_engine_collect_garbage(struct engine *engine, int64_t lsn)
vy_log_collect_garbage(lsn);
/* Cleanup run files. */
- int64_t signature = checkpoint_last(NULL);
+ int64_t signature = vy_log_signature();
struct vy_recovery *recovery = vy_recovery_new(signature, 0);
if (recovery == NULL) {
say_error("failed to recover vylog for garbage collection");
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index 2f85424a..eab3f6c5 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -39,7 +39,6 @@
#include <small/rlist.h>
#include <tarantool_ev.h>
-#include "checkpoint.h"
#include "diag.h"
#include "errcode.h"
#include "errinj.h"
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 09/13] gc: cleanup garbage collection procedure
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (7 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 08/13] gc: keep track of available checkpoints Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 22:00 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 10/13] gc: improve box.info.gc output Vladimir Davydov
` (4 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Do some refactoring intended to make the code of gc_run() easier for
understanding:
- Remove gc_state::checkpoint_vclock. It was used to avoid rerunning
engine gc callback in case no checkpoint was deleted. Since we
maintain a list of all available checkpoints, we don't need it for
this anymore - we can run gc only if a checkpoint was actually
removed from the list.
- Rename gc_state::wal_vclock back to gc_state::vclock.
- Use bool variables with descriptive names instead of comparing
vclock signatures.
- Add some comments.
---
src/box/box.cc | 2 +-
src/box/gc.c | 52 +++++++++++++++++++++++++---------------------------
src/box/gc.h | 6 ++----
3 files changed, 28 insertions(+), 32 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index a33b80ef..2b2d2307 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1614,7 +1614,7 @@ box_process_vote(struct ballot *ballot)
{
ballot->is_ro = cfg_geti("read_only") != 0;
vclock_copy(&ballot->vclock, &replicaset.vclock);
- vclock_copy(&ballot->gc_vclock, &gc.wal_vclock);
+ vclock_copy(&ballot->gc_vclock, &gc.vclock);
}
/** Insert a new cluster into _schema */
diff --git a/src/box/gc.c b/src/box/gc.c
index e6951f4c..eb07a3b8 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -80,9 +80,8 @@ gc_init(void)
/* Don't delete any files until recovery is complete. */
gc.min_checkpoint_count = INT_MAX;
+ vclock_create(&gc.vclock);
rlist_create(&gc.checkpoints);
- vclock_create(&gc.wal_vclock);
- vclock_create(&gc.checkpoint_vclock);
gc_tree_new(&gc.consumers);
latch_create(&gc.latch);
}
@@ -125,26 +124,23 @@ gc_tree_first_checkpoint(gc_tree_t *consumers)
static void
gc_run(void)
{
- /* Look up the consumer that uses the oldest WAL. */
- struct gc_consumer *leftmost = gc_tree_first(&gc.consumers);
/* Look up the consumer that uses the oldest checkpoint. */
struct gc_consumer *leftmost_checkpoint =
gc_tree_first_checkpoint(&gc.consumers);
+ bool run_wal_gc = false;
+ bool run_engine_gc = false;
+
/*
* Find the oldest checkpoint that must be preserved.
* We have to preserve @min_checkpoint_count oldest
* checkpoints, plus we can't remove checkpoints that
* are still in use.
*/
- struct vclock gc_checkpoint_vclock;
- vclock_create(&gc_checkpoint_vclock);
-
struct gc_checkpoint *checkpoint = NULL;
while (true) {
checkpoint = rlist_first_entry(&gc.checkpoints,
struct gc_checkpoint, in_checkpoints);
- vclock_copy(&gc_checkpoint_vclock, &checkpoint->vclock);
if (gc.checkpoint_count <= gc.min_checkpoint_count)
break;
if (leftmost_checkpoint != NULL &&
@@ -154,20 +150,29 @@ gc_run(void)
rlist_del_entry(checkpoint, in_checkpoints);
free(checkpoint);
gc.checkpoint_count--;
+ run_engine_gc = true;
}
/* At least one checkpoint must always be available. */
assert(checkpoint != NULL);
- struct vclock gc_wal_vclock;
- if (leftmost != NULL &&
- vclock_sum(&leftmost->vclock) < vclock_sum(&gc_checkpoint_vclock))
- vclock_copy(&gc_wal_vclock, &leftmost->vclock);
- else
- vclock_copy(&gc_wal_vclock, &gc_checkpoint_vclock);
+ /*
+ * Find the vclock of the oldest WAL row to keep.
+ * Note, we must keep all WALs created after the
+ * oldest checkpoint, even if no consumer needs them.
+ */
+ const struct vclock *vclock = (gc_tree_empty(&gc.consumers) ? NULL :
+ &gc_tree_first(&gc.consumers)->vclock);
+ if (vclock == NULL ||
+ vclock_sum(vclock) > vclock_sum(&checkpoint->vclock))
+ vclock = &checkpoint->vclock;
+
+ if (vclock_sum(vclock) > vclock_sum(&gc.vclock)) {
+ vclock_copy(&gc.vclock, vclock);
+ run_wal_gc = true;
+ }
- if (vclock_sum(&gc_wal_vclock) <= vclock_sum(&gc.wal_vclock) &&
- vclock_sum(&gc_checkpoint_vclock) <= vclock_sum(&gc.checkpoint_vclock))
+ if (!run_engine_gc && !run_wal_gc)
return; /* nothing to do */
/*
@@ -185,17 +190,10 @@ gc_run(void)
* fails - see comment to memtx_engine_collect_garbage().
*/
int rc = 0;
-
- if (vclock_sum(&gc_checkpoint_vclock) > vclock_sum(&gc.checkpoint_vclock)) {
- vclock_copy(&gc.checkpoint_vclock, &gc_checkpoint_vclock);
- rc = engine_collect_garbage(vclock_sum(&gc_checkpoint_vclock));
- }
- if (vclock_sum(&gc_wal_vclock) > vclock_sum(&gc.wal_vclock)) {
- vclock_copy(&gc.wal_vclock, &gc_wal_vclock);
- if (rc == 0)
- wal_collect_garbage(vclock_sum(&gc_wal_vclock));
- }
-
+ if (run_engine_gc)
+ rc = engine_collect_garbage(vclock_sum(&checkpoint->vclock));
+ if (run_wal_gc && rc == 0)
+ wal_collect_garbage(vclock_sum(vclock));
latch_unlock(&gc.latch);
}
diff --git a/src/box/gc.h b/src/box/gc.h
index e26b1017..f6f2279e 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -87,6 +87,8 @@ typedef rb_tree(struct gc_consumer) gc_tree_t;
/** Garbage collection state. */
struct gc_state {
+ /** VClock of the oldest WAL row available on the instance. */
+ struct vclock vclock;
/**
* Minimal number of checkpoints to preserve.
* Configured by box.cfg.checkpoint_count.
@@ -103,10 +105,6 @@ struct gc_state {
* to the tail. Linked by gc_checkpoint::in_checkpoints.
*/
struct rlist checkpoints;
- /** Max vclock WAL garbage collection has been called for. */
- struct vclock wal_vclock;
- /** Max vclock checkpoint garbage collection has been called for. */
- struct vclock checkpoint_vclock;
/** Registered consumers, linked by gc_consumer::node. */
gc_tree_t consumers;
/**
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 10/13] gc: improve box.info.gc output
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (8 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 09/13] gc: cleanup garbage collection procedure Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 22:01 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 11/13] gc: separate checkpoint references from wal consumers Vladimir Davydov
` (3 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Report vclocks in addition to signatures. When box.info.gc was first
introduced we used signatures in gc. Now we use vclocks so there's no
reason not to report them. This is consistent with box.info output
(there's vclock and signature).
Report the vclock and signature of the oldest WAL row available on the
instance under box.info.gc().vclock. Without this information the user
would have to figure it out by looking at box.info.gc().consumers.
---
src/box/lua/info.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 97d5aba3..9e51b823 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -365,13 +365,25 @@ lbox_info_gc_call(struct lua_State *L)
lua_newtable(L);
+ lua_pushstring(L, "vclock");
+ lbox_pushvclock(L, &gc.vclock);
+ lua_settable(L, -3);
+
+ lua_pushstring(L, "signature");
+ luaL_pushint64(L, vclock_sum(&gc.vclock));
+ lua_settable(L, -3);
+
lua_pushstring(L, "checkpoints");
lua_newtable(L);
count = 0;
struct gc_checkpoint *checkpoint;
gc_foreach_checkpoint(checkpoint) {
- lua_createtable(L, 0, 1);
+ lua_createtable(L, 0, 2);
+
+ lua_pushstring(L, "vclock");
+ lbox_pushvclock(L, &checkpoint->vclock);
+ lua_settable(L, -3);
lua_pushstring(L, "signature");
luaL_pushint64(L, vclock_sum(&checkpoint->vclock));
@@ -390,12 +402,16 @@ lbox_info_gc_call(struct lua_State *L)
count = 0;
struct gc_consumer *consumer;
while ((consumer = gc_consumer_iterator_next(&consumers)) != NULL) {
- lua_createtable(L, 0, 2);
+ lua_createtable(L, 0, 3);
lua_pushstring(L, "name");
lua_pushstring(L, consumer->name);
lua_settable(L, -3);
+ lua_pushstring(L, "vclock");
+ lbox_pushvclock(L, &consumer->vclock);
+ lua_settable(L, -3);
+
lua_pushstring(L, "signature");
luaL_pushint64(L, vclock_sum(&consumer->vclock));
lua_settable(L, -3);
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 11/13] gc: separate checkpoint references from wal consumers
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (9 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 10/13] gc: improve box.info.gc output Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 22:05 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced Vladimir Davydov
` (2 subsequent siblings)
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Initially, gc_consumer object was used for pinning both checkpoint and
WAL files, but commit 9c5d851d7830 ("replication: remove old snapshot
files not needed by replicas") changed that. Now whether a consumer pins
WALs or checkpoints or both depends on gc_consumer_type. This was done
so that replicas wouldn't prevent garbage collection of checkpoint
files, which they don't need after initial join is complete.
The way the feature was implemented is rather questionable though:
- Since consumers of both types are stored in the same binary search
tree, we have to iterate through the tree to find the leftmost
checkpoint consumer, see gc_tree_first_checkpoint. This looks
inefficient and ugly.
- The notion of advancing a checkpoint consumer (gc_consumer_advance)
is dubious: there's no point to move on to the next checkpoint after
reading one - instead the consumer needs incremental changes, i.e.
WALs.
To eliminate those questionable aspects and make the code easier for
understanding, let's separate WAL and checkpoint consumers. We do this
by removing gc_consumer_type and making gc_consumer track WALs only.
For pinning the files corresponding to a checkpoint a new object class
is introduced, gc_checkpoint_ref. To pin a checkpoint, gc_ref_checkpoint
needs to be called. It is passed the gc_checkpoint object to pin, the
consumer name, and the gc_checkpoint_ref to store the ref in. To unpin a
previously pinned checkpoint, gc_checkpoint_unref should be called.
References are listed by box.info.gc() for each checkpoint under
'references' key.
---
src/box/box.cc | 30 ++++++++++++++----------
src/box/gc.c | 43 +++++++++++++++++-----------------
src/box/gc.h | 69 ++++++++++++++++++++++++++++++++++++++++--------------
src/box/lua/info.c | 10 ++++++++
src/box/relay.cc | 5 ++--
5 files changed, 102 insertions(+), 55 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 2b2d2307..2679f6da 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -89,11 +89,17 @@ static void title(const char *new_status)
bool box_checkpoint_is_in_progress = false;
/**
- * If backup is in progress, this points to the gc consumer
+ * Set if backup is in progress, i.e. box_backup_start() was
+ * called but box_backup_stop() hasn't been yet.
+ */
+static bool backup_is_in_progress;
+
+/**
+ * If backup is in progress, this points to the gc reference
* object that prevents the garbage collector from deleting
* the checkpoint files that are currently being backed up.
*/
-static struct gc_consumer *backup_gc;
+static struct gc_checkpoint_ref backup_gc;
/**
* The instance is in read-write mode: the local checkpoint
@@ -1461,7 +1467,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
/* Register the replica with the garbage collector. */
struct gc_consumer *gc = gc_consumer_register(&start_vclock,
- GC_CONSUMER_WAL, "replica %s", tt_uuid_str(&instance_uuid));
+ "replica %s", tt_uuid_str(&instance_uuid));
if (gc == NULL)
diag_raise();
auto gc_guard = make_scoped_guard([=]{
@@ -2167,7 +2173,7 @@ int
box_backup_start(int checkpoint_idx, box_backup_cb cb, void *cb_arg)
{
assert(checkpoint_idx >= 0);
- if (backup_gc != NULL) {
+ if (backup_is_in_progress) {
diag_set(ClientError, ER_BACKUP_IN_PROGRESS);
return -1;
}
@@ -2180,14 +2186,12 @@ box_backup_start(int checkpoint_idx, box_backup_cb cb, void *cb_arg)
diag_set(ClientError, ER_MISSING_SNAPSHOT);
return -1;
}
- backup_gc = gc_consumer_register(&checkpoint->vclock,
- GC_CONSUMER_ALL, "backup");
- if (backup_gc == NULL)
- return -1;
+ backup_is_in_progress = true;
+ gc_ref_checkpoint(checkpoint, &backup_gc, "backup");
int rc = engine_backup(&checkpoint->vclock, cb, cb_arg);
if (rc != 0) {
- gc_consumer_unregister(backup_gc);
- backup_gc = NULL;
+ gc_unref_checkpoint(&backup_gc);
+ backup_is_in_progress = false;
}
return rc;
}
@@ -2195,9 +2199,9 @@ box_backup_start(int checkpoint_idx, box_backup_cb cb, void *cb_arg)
void
box_backup_stop(void)
{
- if (backup_gc != NULL) {
- gc_consumer_unregister(backup_gc);
- backup_gc = NULL;
+ if (backup_is_in_progress) {
+ gc_unref_checkpoint(&backup_gc);
+ backup_is_in_progress = false;
}
}
diff --git a/src/box/gc.c b/src/box/gc.c
index eb07a3b8..f8100e3f 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -106,16 +106,6 @@ gc_free(void)
}
}
-/** Find the consumer that uses the oldest checkpoint. */
-struct gc_consumer *
-gc_tree_first_checkpoint(gc_tree_t *consumers)
-{
- struct gc_consumer *consumer = gc_tree_first(consumers);
- while (consumer != NULL && consumer->type == GC_CONSUMER_WAL)
- consumer = gc_tree_next(consumers, consumer);
- return consumer;
-}
-
/**
* Invoke garbage collection in order to remove files left
* from old checkpoints. The number of checkpoints saved by
@@ -124,10 +114,6 @@ gc_tree_first_checkpoint(gc_tree_t *consumers)
static void
gc_run(void)
{
- /* Look up the consumer that uses the oldest checkpoint. */
- struct gc_consumer *leftmost_checkpoint =
- gc_tree_first_checkpoint(&gc.consumers);
-
bool run_wal_gc = false;
bool run_engine_gc = false;
@@ -143,9 +129,7 @@ gc_run(void)
struct gc_checkpoint, in_checkpoints);
if (gc.checkpoint_count <= gc.min_checkpoint_count)
break;
- if (leftmost_checkpoint != NULL &&
- vclock_sum(&checkpoint->vclock) >=
- vclock_sum(&leftmost_checkpoint->vclock))
+ if (!rlist_empty(&checkpoint->refs))
break; /* checkpoint is in use */
rlist_del_entry(checkpoint, in_checkpoints);
free(checkpoint);
@@ -229,6 +213,7 @@ gc_add_checkpoint(const struct vclock *vclock)
if (checkpoint == NULL)
panic("out of memory");
+ rlist_create(&checkpoint->refs);
vclock_copy(&checkpoint->vclock, vclock);
rlist_add_tail_entry(&gc.checkpoints, checkpoint, in_checkpoints);
gc.checkpoint_count++;
@@ -236,9 +221,27 @@ gc_add_checkpoint(const struct vclock *vclock)
gc_run();
}
+void
+gc_ref_checkpoint(struct gc_checkpoint *checkpoint,
+ struct gc_checkpoint_ref *ref, const char *format, ...)
+{
+ va_list ap;
+ va_start(ap, format);
+ vsnprintf(ref->name, GC_NAME_MAX, format, ap);
+ va_end(ap);
+
+ rlist_add_tail_entry(&checkpoint->refs, ref, in_refs);
+}
+
+void
+gc_unref_checkpoint(struct gc_checkpoint_ref *ref)
+{
+ rlist_del_entry(ref, in_refs);
+ gc_run();
+}
+
struct gc_consumer *
-gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
- const char *format, ...)
+gc_consumer_register(const struct vclock *vclock, const char *format, ...)
{
struct gc_consumer *consumer = calloc(1, sizeof(*consumer));
if (consumer == NULL) {
@@ -253,8 +256,6 @@ gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
va_end(ap);
vclock_copy(&consumer->vclock, vclock);
- consumer->type = type;
-
gc_tree_insert(&gc.consumers, consumer);
return consumer;
}
diff --git a/src/box/gc.h b/src/box/gc.h
index f6f2279e..a5392cef 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -46,13 +46,6 @@ struct gc_consumer;
enum { GC_NAME_MAX = 64 };
-/** Consumer type: WAL consumer, or SNAP */
-enum gc_consumer_type {
- GC_CONSUMER_WAL = 1,
- GC_CONSUMER_SNAP = 2,
- GC_CONSUMER_ALL = 3,
-};
-
typedef rb_node(struct gc_consumer) gc_node_t;
/**
@@ -64,11 +57,30 @@ struct gc_checkpoint {
struct rlist in_checkpoints;
/** VClock of the checkpoint. */
struct vclock vclock;
+ /**
+ * List of checkpoint references, linked by
+ * gc_checkpoint_ref::in_refs.
+ *
+ * We use a list rather than a reference counter so
+ * that we can list reference names in box.info.gc().
+ */
+ struct rlist refs;
+};
+
+/**
+ * The following structure represents a checkpoint reference.
+ * See also gc_checkpoint::refs.
+ */
+struct gc_checkpoint_ref {
+ /** Link in gc_checkpoint::refs. */
+ struct rlist in_refs;
+ /** Human-readable name of this checkpoint reference. */
+ char name[GC_NAME_MAX];
};
/**
* The object of this type is used to prevent garbage
- * collection from removing files that are still in use.
+ * collection from removing WALs that are still in use.
*/
struct gc_consumer {
/** Link in gc_state::consumers. */
@@ -77,10 +89,6 @@ struct gc_consumer {
char name[GC_NAME_MAX];
/** The vclock tracked by this consumer. */
struct vclock vclock;
- /** Consumer type, indicating that consumer only consumes
- * WAL files, or both - SNAP and WAL.
- */
- enum gc_consumer_type type;
};
typedef rb_tree(struct gc_consumer) gc_tree_t;
@@ -131,6 +139,12 @@ extern struct gc_state gc;
rlist_foreach_entry_reverse(checkpoint, &gc.checkpoints, in_checkpoints)
/**
+ * Iterate over all references to the given checkpoint.
+ */
+#define gc_foreach_checkpoint_ref(ref, checkpoint) \
+ rlist_foreach_entry(ref, &(checkpoint)->refs, in_refs)
+
+/**
* Return the last (newest) checkpoint known to the garbage
* collector. If there's no checkpoint, return NULL.
*/
@@ -176,22 +190,41 @@ void
gc_add_checkpoint(const struct vclock *vclock);
/**
+ * Get a reference to @checkpoint and store it in @ref.
+ * This will block the garbage collector from deleting
+ * the checkpoint files until the reference is released
+ * with gc_put_checkpoint_ref().
+ *
+ * @format... specifies a human-readable name that will be
+ * used for listing the reference in box.info.gc().
+ */
+CFORMAT(printf, 3, 4)
+void
+gc_ref_checkpoint(struct gc_checkpoint *checkpoint,
+ struct gc_checkpoint_ref *ref, const char *format, ...);
+
+/**
+ * Release a reference to a checkpoint previously taken
+ * with gc_ref_checkpoint(). This function may trigger
+ * garbage collection.
+ */
+void
+gc_unref_checkpoint(struct gc_checkpoint_ref *ref);
+
+/**
* Register a consumer.
*
- * This will stop garbage collection of objects newer than
+ * This will stop garbage collection of WAL files newer than
* @vclock until the consumer is unregistered or advanced.
* @format... specifies a human-readable name of the consumer,
* it will be used for listing the consumer in box.info.gc().
- * @type consumer type, reporting whether consumer only depends
- * on WAL files.
*
* Returns a pointer to the new consumer object or NULL on
* memory allocation failure.
*/
-CFORMAT(printf, 3, 4)
+CFORMAT(printf, 2, 3)
struct gc_consumer *
-gc_consumer_register(const struct vclock *vclock, enum gc_consumer_type type,
- const char *format, ...);
+gc_consumer_register(const struct vclock *vclock, const char *format, ...);
/**
* Unregister a consumer and invoke garbage collection
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 9e51b823..655768ec 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -389,6 +389,16 @@ lbox_info_gc_call(struct lua_State *L)
luaL_pushint64(L, vclock_sum(&checkpoint->vclock));
lua_settable(L, -3);
+ lua_pushstring(L, "references");
+ lua_newtable(L);
+ int ref_idx = 0;
+ struct gc_checkpoint_ref *ref;
+ gc_foreach_checkpoint_ref(ref, checkpoint) {
+ lua_pushstring(L, ref->name);
+ lua_rawseti(L, -2, ++ref_idx);
+ }
+ lua_settable(L, -3);
+
lua_rawseti(L, -2, ++count);
}
lua_settable(L, -3);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 88430856..d5df487e 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -592,9 +592,8 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
* join.
*/
if (replica->gc == NULL) {
- replica->gc = gc_consumer_register(replica_clock,
- GC_CONSUMER_WAL, "replica %s",
- tt_uuid_str(&replica->uuid));
+ replica->gc = gc_consumer_register(replica_clock, "replica %s",
+ tt_uuid_str(&replica->uuid));
if (replica->gc == NULL)
diag_raise();
}
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (10 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 11/13] gc: separate checkpoint references from wal consumers Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 22:26 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 13/13] replication: ref checkpoint needed to join replica Vladimir Davydov
2018-10-05 17:03 ` [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
gc_consumer_unregister and gc_consumer_advance don't call gc_run in case
the consumer in question isn't leftmost. This code was written back when
gc_run was kinda heavy and would call engine/wal callbacks even if it
wouldn't really need to. Today gc_run will bail out shortly, without
making any complex computation, let alone invoking garbage collection
callbacks, in case it has nothing to do so those optimizations are
pointless. Let's remove them.
---
src/box/gc.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)
diff --git a/src/box/gc.c b/src/box/gc.c
index f8100e3f..6322bc67 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -263,18 +263,9 @@ gc_consumer_register(const struct vclock *vclock, const char *format, ...)
void
gc_consumer_unregister(struct gc_consumer *consumer)
{
- int64_t signature = vclock_sum(&consumer->vclock);
-
gc_tree_remove(&gc.consumers, consumer);
free(consumer);
-
- /*
- * Rerun garbage collection after removing the consumer
- * if it referenced the oldest vclock.
- */
- struct gc_consumer *leftmost = gc_tree_first(&gc.consumers);
- if (leftmost == NULL || vclock_sum(&leftmost->vclock) > signature)
- gc_run();
+ gc_run();
}
void
@@ -303,13 +294,7 @@ gc_consumer_advance(struct gc_consumer *consumer, const struct vclock *vclock)
if (update_tree)
gc_tree_insert(&gc.consumers, consumer);
- /*
- * Rerun garbage collection after advancing the consumer
- * if it referenced the oldest vclock.
- */
- struct gc_consumer *leftmost = gc_tree_first(&gc.consumers);
- if (leftmost == NULL || vclock_sum(&leftmost->vclock) > prev_signature)
- gc_run();
+ gc_run();
}
struct gc_consumer *
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 13/13] replication: ref checkpoint needed to join replica
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (11 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced Vladimir Davydov
@ 2018-10-04 17:20 ` Vladimir Davydov
2018-10-04 22:27 ` Konstantin Osipov
2018-10-05 17:03 ` [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
13 siblings, 1 reply; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-04 17:20 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Before joining a new replica we register a gc_consumer to prevent
garbage collection of files needed for join and following subscribe.
Before commit 9c5d851d7830 ("replication: remove old snapshot files not
needed by replicas") a consumer would pin both checkpoints and WALs so
that would work as expected. However, the above mentioned commit
introduced consumer types and marked a consumer registered on replica
join as WAL-only so if the garbage collector was invoked during join, it
could delete files corresponding to the relayed checkpoint resulting in
replica join failure. Fix this issue by pinning the checkpoint used for
joining a replica with gc_ref_checkpoint and unpinning once join is
complete.
The issue can only be reproduced if there are vinyl spaces, because
deletion of an open snap file doesn't prevent the relay from reading it.
The existing replication/gc test would catch the issue if it triggered
compaction on the master so we simply tweak it accordingly instead of
adding a new test case.
Closes #3708
---
src/box/box.cc | 26 ++++++++++++++++----------
test/replication/gc.result | 19 +++++++++++++------
test/replication/gc.test.lua | 8 +++++---
3 files changed, 34 insertions(+), 19 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 2679f6da..7e32b9fc 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1465,14 +1465,14 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
struct vclock start_vclock;
vclock_copy(&start_vclock, &checkpoint->vclock);
- /* Register the replica with the garbage collector. */
- struct gc_consumer *gc = gc_consumer_register(&start_vclock,
- "replica %s", tt_uuid_str(&instance_uuid));
- if (gc == NULL)
- diag_raise();
- auto gc_guard = make_scoped_guard([=]{
- gc_consumer_unregister(gc);
- });
+ /*
+ * Make sure the checkpoint files won't be deleted while
+ * initial join is in progress.
+ */
+ struct gc_checkpoint_ref gc;
+ gc_ref_checkpoint(checkpoint, &gc, "replica %s",
+ tt_uuid_str(&instance_uuid));
+ auto gc_guard = make_scoped_guard([&]{ gc_unref_checkpoint(&gc); });
/* Respond to JOIN request with start_vclock. */
struct xrow_header row;
@@ -1496,8 +1496,14 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
replica = replica_by_uuid(&instance_uuid);
assert(replica != NULL);
- replica->gc = gc;
- gc_guard.is_active = false;
+
+ /* Register the replica as a WAL consumer. */
+ if (replica->gc != NULL)
+ gc_consumer_unregister(replica->gc);
+ replica->gc = gc_consumer_register(&start_vclock, "replica %s",
+ tt_uuid_str(&instance_uuid));
+ if (replica->gc == NULL)
+ diag_raise();
/* Remember master's vclock after the last request */
struct vclock stop_vclock;
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 83d0de29..5944ba51 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -16,6 +16,10 @@ fiber = require('fiber')
test_run:cleanup_cluster()
---
...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
-- Make each snapshot trigger garbage collection.
default_checkpoint_count = box.cfg.checkpoint_count
---
@@ -40,10 +44,17 @@ box.error.injection.set("ERRINJ_RELAY_REPORT_INTERVAL", 0.05)
s = box.schema.space.create('test', {engine = engine});
---
...
-_ = s:create_index('pk')
+_ = s:create_index('pk', {run_count_per_level = 1})
---
...
-for i = 1, 100 do s:auto_increment{} end
+for i = 1, 50 do s:auto_increment{} end
+---
+...
+box.snapshot()
+---
+- ok
+...
+for i = 1, 50 do s:auto_increment{} end
---
...
box.snapshot()
@@ -76,10 +87,6 @@ test_run:cmd("setopt delimiter ''");
---
...
-- Start the replica.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
----
-- true
-...
test_run:cmd("start server replica")
---
- true
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index eed76850..84c980dd 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -5,6 +5,7 @@ replica_set = require('fast_replica')
fiber = require('fiber')
test_run:cleanup_cluster()
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
-- Make each snapshot trigger garbage collection.
default_checkpoint_count = box.cfg.checkpoint_count
@@ -21,8 +22,10 @@ box.error.injection.set("ERRINJ_RELAY_REPORT_INTERVAL", 0.05)
-- Create and populate the space we will replicate.
s = box.schema.space.create('test', {engine = engine});
-_ = s:create_index('pk')
-for i = 1, 100 do s:auto_increment{} end
+_ = s:create_index('pk', {run_count_per_level = 1})
+for i = 1, 50 do s:auto_increment{} end
+box.snapshot()
+for i = 1, 50 do s:auto_increment{} end
box.snapshot()
for i = 1, 100 do s:auto_increment{} end
@@ -43,7 +46,6 @@ end)
test_run:cmd("setopt delimiter ''");
-- Start the replica.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
test_run:cmd("start server replica")
-- Despite the fact that we invoked garbage collection that
--
2.11.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 01/13] vinyl: fix master crash on replica join failure
2018-10-04 17:20 ` [PATCH 01/13] vinyl: fix master crash on replica join failure Vladimir Davydov
@ 2018-10-04 21:43 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:43 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> This patch fixes a trivial error on vy_send_range() error path which
> results in a master crash in case a file needed to join a replica is
> missing or corrupted.
>
> See #3708
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart
2018-10-04 17:20 ` [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart Vladimir Davydov
@ 2018-10-04 21:44 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:44 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> If an instance is restarted while building a new vinyl index, there will
> probably be some run files left. Currently, we won't delete such files
> until box.snapshot() is called, even though there's no point in keeping
> them around. Let's tweak vy_gc_lsm() so that it marks all runs that
> belong to an unfinished index as incomplete to force vy_gc() to remove
> them immediately after recovery is complete.
I don't see a point yet, but I assume you need it for something,
so OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent
2018-10-04 17:20 ` [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent Vladimir Davydov
@ 2018-10-04 21:47 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:47 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> It's exasperating to write trivial external functions for each member of
With growth of the project I tend to
be erring on the side of greater encapsulation.
But we have no policy and I trust your common sense on this, so OK
to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 04/13] gc: use fixed length buffer for storing consumer name
2018-10-04 17:20 ` [PATCH 04/13] gc: use fixed length buffer for storing consumer name Vladimir Davydov
@ 2018-10-04 21:47 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:47 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> The length of a consumer name never exceeds 64 characters so no use to
> allocate a string. This is a mere code simplification.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete
2018-10-04 17:20 ` [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete Vladimir Davydov
@ 2018-10-04 21:50 ` Konstantin Osipov
2018-10-05 8:56 ` Vladimir Davydov
0 siblings, 1 reply; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:50 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> gc_consumer_new is used in one place while gc_consumer_delete is used in
> two places, but it's a one-liner. Let's fold them to make the code flow
> more straightforward.
Can't agree with you on unrestricted use of free(), it's a time
bomb, please keep gc_consumer_delete().
Otherwise OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 06/13] gc: format consumer name in gc_consumer_register
2018-10-04 17:20 ` [PATCH 06/13] gc: format consumer name in gc_consumer_register Vladimir Davydov
@ 2018-10-04 21:50 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:50 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> It's better than using tt_snprintf at call sites.
Trivial, oK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count
2018-10-04 17:20 ` [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count Vladimir Davydov
@ 2018-10-04 21:51 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:51 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> Because it's the minimal number of checkpoints that must not be deleted,
> not the actual number of preserved checkpoints. Do it now, in a separate
> patch so as to ease review of the next patch.
Trivial, thanks for extracting into a separate patch, OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 08/13] gc: keep track of available checkpoints
2018-10-04 17:20 ` [PATCH 08/13] gc: keep track of available checkpoints Vladimir Davydov
@ 2018-10-04 21:59 ` Konstantin Osipov
2018-10-05 8:50 ` Vladimir Davydov
0 siblings, 1 reply; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 21:59 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> Currently, the checkpoint iterator is in fact a wrapper around
> memtx_engine::snap_dir while the garbage collector knows nothing about
> checkpoints. This feels like encapsulation violation. Let's keep track
> of all available checkpoints right in the garbage collector instead
> and export gc_ API to iterate over checkpoints.
> +void
> +gc_add_checkpoint(const struct vclock *vclock)
> +{
> + struct gc_checkpoint *last_checkpoint = gc_last_checkpoint();
> + if (last_checkpoint != NULL &&
> + vclock_sum(&last_checkpoint->vclock) == vclock_sum(vclock)) {
> + /*
> + * No new checkpoint was actually created.
> + * Rerun the garbage collector to delete old
> + * files in case box.cfg.checkpoint_count
> + * was changed.
> + */
> + gc_run();
You only compare with the last checkpoint, not all existing
checkpoints. When do you need this? When do you add an existing checkpoint
and know it could be a duplicate of the last one? Is it initial
recovery? Why do you need to deal with it this way? Please explain.
Otherwise the patch is OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 09/13] gc: cleanup garbage collection procedure
2018-10-04 17:20 ` [PATCH 09/13] gc: cleanup garbage collection procedure Vladimir Davydov
@ 2018-10-04 22:00 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 22:00 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> Do some refactoring intended to make the code of gc_run() easier for
> understanding:
> - Remove gc_state::checkpoint_vclock. It was used to avoid rerunning
> engine gc callback in case no checkpoint was deleted. Since we
> maintain a list of all available checkpoints, we don't need it for
> this anymore - we can run gc only if a checkpoint was actually
> removed from the list.
> - Rename gc_state::wal_vclock back to gc_state::vclock.
> - Use bool variables with descriptive names instead of comparing
> vclock signatures.
> - Add some comments.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 10/13] gc: improve box.info.gc output
2018-10-04 17:20 ` [PATCH 10/13] gc: improve box.info.gc output Vladimir Davydov
@ 2018-10-04 22:01 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 22:01 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> Report vclocks in addition to signatures. When box.info.gc was first
> introduced we used signatures in gc. Now we use vclocks so there's no
> reason not to report them. This is consistent with box.info output
> (there's vclock and signature).
>
> Report the vclock and signature of the oldest WAL row available on the
> instance under box.info.gc().vclock. Without this information the user
> would have to figure it out by looking at box.info.gc().consumers.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 11/13] gc: separate checkpoint references from wal consumers
2018-10-04 17:20 ` [PATCH 11/13] gc: separate checkpoint references from wal consumers Vladimir Davydov
@ 2018-10-04 22:05 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 22:05 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
Pinning and referencing are different concepts :)
I agree with your reasoning and the patch is OK to push.
> Initially, gc_consumer object was used for pinning both checkpoint and
> WAL files, but commit 9c5d851d7830 ("replication: remove old snapshot
> files not needed by replicas") changed that. Now whether a consumer pins
> WALs or checkpoints or both depends on gc_consumer_type. This was done
> so that replicas wouldn't prevent garbage collection of checkpoint
> files, which they don't need after initial join is complete.
>
> The way the feature was implemented is rather questionable though:
> - Since consumers of both types are stored in the same binary search
> tree, we have to iterate through the tree to find the leftmost
> checkpoint consumer, see gc_tree_first_checkpoint. This looks
> inefficient and ugly.
> - The notion of advancing a checkpoint consumer (gc_consumer_advance)
> is dubious: there's no point to move on to the next checkpoint after
> reading one - instead the consumer needs incremental changes, i.e.
> WALs.
>
> To eliminate those questionable aspects and make the code easier for
> understanding, let's separate WAL and checkpoint consumers. We do this
> by removing gc_consumer_type and making gc_consumer track WALs only.
> For pinning the files corresponding to a checkpoint a new object class
> is introduced, gc_checkpoint_ref. To pin a checkpoint, gc_ref_checkpoint
> needs to be called. It is passed the gc_checkpoint object to pin, the
> consumer name, and the gc_checkpoint_ref to store the ref in. To unpin a
> previously pinned checkpoint, gc_checkpoint_unref should be called.
>
> References are listed by box.info.gc() for each checkpoint under
> 'references' key.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced
2018-10-04 17:20 ` [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced Vladimir Davydov
@ 2018-10-04 22:26 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 22:26 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:12]:
> gc_consumer_unregister and gc_consumer_advance don't call gc_run in case
> the consumer in question isn't leftmost. This code was written back when
> gc_run was kinda heavy and would call engine/wal callbacks even if it
> wouldn't really need to. Today gc_run will bail out shortly, without
> making any complex computation, let alone invoking garbage collection
> callbacks, in case it has nothing to do so those optimizations are
> pointless. Let's remove them.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 13/13] replication: ref checkpoint needed to join replica
2018-10-04 17:20 ` [PATCH 13/13] replication: ref checkpoint needed to join replica Vladimir Davydov
@ 2018-10-04 22:27 ` Konstantin Osipov
0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Osipov @ 2018-10-04 22:27 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:12]:
> Before joining a new replica we register a gc_consumer to prevent
> garbage collection of files needed for join and following subscribe.
> Before commit 9c5d851d7830 ("replication: remove old snapshot files not
> needed by replicas") a consumer would pin both checkpoints and WALs so
> that would work as expected. However, the above mentioned commit
> introduced consumer types and marked a consumer registered on replica
> join as WAL-only so if the garbage collector was invoked during join, it
> could delete files corresponding to the relayed checkpoint resulting in
> replica join failure. Fix this issue by pinning the checkpoint used for
> joining a replica with gc_ref_checkpoint and unpinning once join is
> complete.
>
> The issue can only be reproduced if there are vinyl spaces, because
> deletion of an open snap file doesn't prevent the relay from reading it.
> The existing replication/gc test would catch the issue if it triggered
> compaction on the master so we simply tweak it accordingly instead of
> adding a new test case.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 08/13] gc: keep track of available checkpoints
2018-10-04 21:59 ` Konstantin Osipov
@ 2018-10-05 8:50 ` Vladimir Davydov
0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-05 8:50 UTC (permalink / raw)
To: Konstantin Osipov; +Cc: tarantool-patches
On Fri, Oct 05, 2018 at 12:59:32AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> > Currently, the checkpoint iterator is in fact a wrapper around
> > memtx_engine::snap_dir while the garbage collector knows nothing about
> > checkpoints. This feels like encapsulation violation. Let's keep track
> > of all available checkpoints right in the garbage collector instead
> > and export gc_ API to iterate over checkpoints.
>
>
> > +void
> > +gc_add_checkpoint(const struct vclock *vclock)
> > +{
> > + struct gc_checkpoint *last_checkpoint = gc_last_checkpoint();
> > + if (last_checkpoint != NULL &&
> > + vclock_sum(&last_checkpoint->vclock) == vclock_sum(vclock)) {
> > + /*
> > + * No new checkpoint was actually created.
> > + * Rerun the garbage collector to delete old
> > + * files in case box.cfg.checkpoint_count
> > + * was changed.
> > + */
> > + gc_run();
>
> You only compare with the last checkpoint, not all existing
> checkpoints. When do you need this? When do you add an existing checkpoint
> and know it could be a duplicate of the last one? Is it initial
> recovery? Why do you need to deal with it this way? Please explain.
There could be no changes (xlogs) after the last checkpoint, in which
case box.snapshot() will still call engine methods and try to add a new
checkpoint to gc. Currently, it's up to engines and gc to handle this
case. Why can't we detect this in box.snapshot() and bail out without
calling engine/gc methods? Because memtx needs to touch the snapshot in
in this case, see commit c10874f4b ("gh-2045: Update snapshot timestamp
if snapshot already exists"). Why? I guess this is needed for checkpoint
daemon.
We could introduce another engine callback for this though, something
like engine_touch_checkpoint.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete
2018-10-04 21:50 ` Konstantin Osipov
@ 2018-10-05 8:56 ` Vladimir Davydov
0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-05 8:56 UTC (permalink / raw)
To: Konstantin Osipov; +Cc: tarantool-patches
On Fri, Oct 05, 2018 at 12:50:04AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 00:11]:
> > gc_consumer_new is used in one place while gc_consumer_delete is used in
> > two places, but it's a one-liner. Let's fold them to make the code flow
> > more straightforward.
>
> Can't agree with you on unrestricted use of free(), it's a time
> bomb, please keep gc_consumer_delete().
OK I'll leave gc_consumer_delete() and add gc_checkpoint_delete(),
which will be a wrapper around free() too, in the patch that introduces
gc_checkpoint struct, for consistency.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 00/13] box: garbage collection refactoring and fixes
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
` (12 preceding siblings ...)
2018-10-04 17:20 ` [PATCH 13/13] replication: ref checkpoint needed to join replica Vladimir Davydov
@ 2018-10-05 17:03 ` Vladimir Davydov
13 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2018-10-05 17:03 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
I addressed reviews remarks by Kostja and pushed all patches to 1.10.
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2018-10-05 17:03 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-04 17:20 [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
2018-10-04 17:20 ` [PATCH 01/13] vinyl: fix master crash on replica join failure Vladimir Davydov
2018-10-04 21:43 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 02/13] vinyl: force deletion of runs left from unfinished indexes on restart Vladimir Davydov
2018-10-04 21:44 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 03/13] gc: make gc_consumer and gc_state structs transparent Vladimir Davydov
2018-10-04 21:47 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 04/13] gc: use fixed length buffer for storing consumer name Vladimir Davydov
2018-10-04 21:47 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 05/13] gc: fold gc_consumer_new and gc_consumer_delete Vladimir Davydov
2018-10-04 21:50 ` Konstantin Osipov
2018-10-05 8:56 ` Vladimir Davydov
2018-10-04 17:20 ` [PATCH 06/13] gc: format consumer name in gc_consumer_register Vladimir Davydov
2018-10-04 21:50 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 07/13] gc: rename checkpoint_count to min_checkpoint_count Vladimir Davydov
2018-10-04 21:51 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 08/13] gc: keep track of available checkpoints Vladimir Davydov
2018-10-04 21:59 ` Konstantin Osipov
2018-10-05 8:50 ` Vladimir Davydov
2018-10-04 17:20 ` [PATCH 09/13] gc: cleanup garbage collection procedure Vladimir Davydov
2018-10-04 22:00 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 10/13] gc: improve box.info.gc output Vladimir Davydov
2018-10-04 22:01 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 11/13] gc: separate checkpoint references from wal consumers Vladimir Davydov
2018-10-04 22:05 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 12/13] gc: call gc_run unconditionally when consumer is advanced Vladimir Davydov
2018-10-04 22:26 ` Konstantin Osipov
2018-10-04 17:20 ` [PATCH 13/13] replication: ref checkpoint needed to join replica Vladimir Davydov
2018-10-04 22:27 ` Konstantin Osipov
2018-10-05 17:03 ` [PATCH 00/13] box: garbage collection refactoring and fixes Vladimir Davydov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox