* [PATCH v2 0/3] Improve replication related logging
@ 2019-02-15 12:25 Vladimir Davydov
2019-02-15 12:25 ` [PATCH v2 1/3] vclock: use static buffer to format vclock Vladimir Davydov
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 12:25 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Post-mortem analysis of a replication failure is painful as it is, but
with poor logging that we have it is usually nearly to impossible to
understand what happened in the frist place. This patch set targets on
ameliorating the situration by logging all relevant vclocks and ids on
join/subscribe.
Changes in v2:
- Log all replica id changes, because without them vclocks don't make
much sense.
- Print remote and local vclocks on subscribe to replica's vclock.
It may be useful if master's log rotated.
v1: https://www.freelists.org/post/tarantool-patches/PATCH-replication-improve-joinsubscribe-logging
Vladimir Davydov (3):
vclock: use static buffer to format vclock
box: don't set cluster uuid and instance id in initial snapshot
replication: improve join/subscribe logging
src/box/alter.cc | 3 +-
src/box/applier.cc | 17 ++++++++---
src/box/bootstrap.snap | Bin 1911 -> 1831 bytes
src/box/box.cc | 27 +++++++++++------
src/box/error.cc | 6 ++--
src/box/gc.c | 6 ++--
src/box/lua/upgrade.lua | 7 +++--
src/box/replication.cc | 17 +++++------
src/box/vclock.c | 64 +++++++++++------------------------------
src/box/vclock.h | 5 ++--
src/box/xlog.c | 16 +++--------
test/box-py/bootstrap.result | 5 ++--
test/box-py/bootstrap.test.py | 16 ++---------
test/replication/misc.result | 9 +++++-
test/replication/misc.test.lua | 5 +++-
test/unit/vclock.cc | 3 +-
16 files changed, 91 insertions(+), 115 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] vclock: use static buffer to format vclock
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
@ 2019-02-15 12:25 ` Vladimir Davydov
2019-02-15 12:41 ` Konstantin Osipov
2019-02-15 12:25 ` [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot Vladimir Davydov
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 12:25 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
Currently, vclock_to_string() allocates the formatted vclock string
using malloc() and hence the caller is responsible for freeing it, which
isn't very user-friendly. Let's use a static buffer as we do to format
other objects.
---
src/box/error.cc | 6 ++---
src/box/gc.c | 6 ++---
src/box/replication.cc | 12 ++++------
src/box/vclock.c | 64 +++++++++++++-------------------------------------
src/box/vclock.h | 5 ++--
src/box/xlog.c | 16 ++++---------
test/unit/vclock.cc | 3 +--
7 files changed, 31 insertions(+), 81 deletions(-)
diff --git a/src/box/error.cc b/src/box/error.cc
index aa81a390..47dce330 100644
--- a/src/box/error.cc
+++ b/src/box/error.cc
@@ -183,14 +183,12 @@ XlogGapError::XlogGapError(const char *file, unsigned line,
const struct vclock *from, const struct vclock *to)
: XlogError(&type_XlogGapError, file, line)
{
- char *s_from = vclock_to_string(from);
- char *s_to = vclock_to_string(to);
+ const char *s_from = vclock_to_string(from);
+ const char *s_to = vclock_to_string(to);
snprintf(errmsg, sizeof(errmsg),
"Missing .xlog file between LSN %lld %s and %lld %s",
(long long) vclock_sum(from), s_from ? s_from : "",
(long long) vclock_sum(to), s_to ? s_to : "");
- free(s_from);
- free(s_to);
}
struct error *
diff --git a/src/box/gc.c b/src/box/gc.c
index 7411314a..5639edd8 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -294,10 +294,8 @@ gc_advance(const struct vclock *vclock)
consumer->is_inactive = true;
gc_tree_remove(&gc.consumers, consumer);
- char *vclock_str = vclock_to_string(&consumer->vclock);
- say_crit("deactivated WAL consumer %s at %s",
- consumer->name, vclock_str);
- free(vclock_str);
+ say_crit("deactivated WAL consumer %s at %s", consumer->name,
+ vclock_to_string(&consumer->vclock));
consumer = next;
}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 2cb4ec0f..19ad5026 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -686,9 +686,9 @@ replicaset_needs_rejoin(struct replica **master)
const char *uuid_str = tt_uuid_str(&replica->uuid);
const char *addr_str = sio_strfaddr(&applier->addr,
applier->addr_len);
- char *local_vclock_str = vclock_to_string(&replicaset.vclock);
- char *remote_vclock_str = vclock_to_string(&ballot->vclock);
- char *gc_vclock_str = vclock_to_string(&ballot->gc_vclock);
+ const char *local_vclock_str = vclock_to_string(&replicaset.vclock);
+ const char *remote_vclock_str = vclock_to_string(&ballot->vclock);
+ const char *gc_vclock_str = vclock_to_string(&ballot->gc_vclock);
say_info("can't follow %s at %s: required %s available %s",
uuid_str, addr_str, local_vclock_str, gc_vclock_str);
@@ -703,7 +703,7 @@ replicaset_needs_rejoin(struct replica **master)
"replica has local rows: local %s remote %s",
uuid_str, addr_str, local_vclock_str,
remote_vclock_str);
- goto next;
+ continue;
}
/* Prefer a master with the max vclock. */
@@ -711,10 +711,6 @@ replicaset_needs_rejoin(struct replica **master)
vclock_sum(&ballot->vclock) >
vclock_sum(&leader->applier->ballot.vclock))
leader = replica;
-next:
- free(local_vclock_str);
- free(remote_vclock_str);
- free(gc_vclock_str);
}
if (leader == NULL)
return false;
diff --git a/src/box/vclock.c b/src/box/vclock.c
index d4b2ba75..cea67017 100644
--- a/src/box/vclock.c
+++ b/src/box/vclock.c
@@ -35,6 +35,7 @@
#include <ctype.h>
#include "diag.h"
+#include "trivia/util.h"
int64_t
vclock_follow(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
@@ -50,64 +51,31 @@ vclock_follow(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
return prev_lsn;
}
-CFORMAT(printf, 4, 0) static inline int
-rsnprintf(char **buf, char **pos, char **end, const char *fmt, ...)
+static int
+vclock_snprint(char *buf, int size, const struct vclock *vclock)
{
- int rc = 0;
- va_list ap;
-
- while (1) {
- va_start(ap, fmt);
- int n = vsnprintf(*pos, *end - *pos, fmt, ap);
- va_end(ap);
- assert(n > -1); /* glibc >= 2.0.6, see vsnprintf(3) */
- if (n < *end - *pos) {
- *pos += n;
- break;
- }
-
- /* Reallocate buffer */
- ptrdiff_t cap = (*end - *buf) > 0 ? (*end - *buf) : 32;
- while (cap <= *pos - *buf + n)
- cap *= 2;
- char *chunk = (char *) realloc(*buf, cap);
- if (chunk == NULL) {
- diag_set(OutOfMemory, cap, "malloc", "vclock");
- free(*buf);
- *buf = *end = *pos = NULL;
- rc = -1;
- break;
- }
- *pos = chunk + (*pos - *buf);
- *end = chunk + cap;
- *buf = chunk;
- }
-
- return rc;
-}
-
-char *
-vclock_to_string(const struct vclock *vclock)
-{
- (void) vclock;
- char *buf = NULL, *pos = NULL, *end = NULL;
-
- if (rsnprintf(&buf, &pos, &end, "{") != 0)
- return NULL;
+ int total = 0;
+ SNPRINT(total, snprintf, buf, size, "{");
const char *sep = "";
struct vclock_iterator it;
vclock_iterator_init(&it, vclock);
vclock_foreach(&it, replica) {
- if (rsnprintf(&buf, &pos, &end, "%s%u: %lld", sep,
- replica.id, (long long) replica.lsn) != 0)
- return NULL;
+ SNPRINT(total, snprintf, buf, size, "%s%u: %lld",
+ sep, (unsigned)replica.id, (long long)replica.lsn);
sep = ", ";
}
- if (rsnprintf(&buf, &pos, &end, "}") != 0)
- return NULL;
+ SNPRINT(total, snprintf, buf, size, "}");
+ return total;
+}
+const char *
+vclock_to_string(const struct vclock *vclock)
+{
+ char *buf = tt_static_buf();
+ if (vclock_snprint(buf, TT_STATIC_BUF_LEN, vclock) < 0)
+ return "<failed to format vclock>";
return buf;
}
diff --git a/src/box/vclock.h b/src/box/vclock.h
index 0c999690..1a174c1e 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -245,10 +245,9 @@ vclock_merge(struct vclock *dst, struct vclock *diff)
* \brief Format vclock to YAML-compatible string representation:
* { replica_id: lsn, replica_id:lsn })
* \param vclock vclock
- * \return fomatted string. This pointer should be passed to free(3) to
- * release the allocated storage when it is no longer needed.
+ * \return fomatted string, stored in a static buffer.
*/
-char *
+const char *
vclock_to_string(const struct vclock *vclock);
/**
diff --git a/src/box/xlog.c b/src/box/xlog.c
index bd5614f6..d5de8e6d 100644
--- a/src/box/xlog.c
+++ b/src/box/xlog.c
@@ -151,20 +151,12 @@ xlog_meta_format(const struct xlog_meta *meta, char *buf, int size)
meta->filetype, v13, PACKAGE_VERSION,
tt_uuid_str(&meta->instance_uuid));
if (vclock_is_set(&meta->vclock)) {
- char *vstr = vclock_to_string(&meta->vclock);
- if (vstr == NULL)
- return -1;
- SNPRINT(total, snprintf, buf, size,
- VCLOCK_KEY ": %s\n", vstr);
- free(vstr);
+ SNPRINT(total, snprintf, buf, size, VCLOCK_KEY ": %s\n",
+ vclock_to_string(&meta->vclock));
}
if (vclock_is_set(&meta->prev_vclock)) {
- char *vstr = vclock_to_string(&meta->prev_vclock);
- if (vstr == NULL)
- return -1;
- SNPRINT(total, snprintf, buf, size,
- PREV_VCLOCK_KEY ": %s\n", vstr);
- free(vstr);
+ SNPRINT(total, snprintf, buf, size, PREV_VCLOCK_KEY ": %s\n",
+ vclock_to_string(&meta->prev_vclock));
}
SNPRINT(total, snprintf, buf, size, "\n");
assert(total > 0);
diff --git a/test/unit/vclock.cc b/test/unit/vclock.cc
index 6a1d3bc2..15e9f937 100644
--- a/test/unit/vclock.cc
+++ b/test/unit/vclock.cc
@@ -249,11 +249,10 @@ test_tostring_one(uint32_t count, const int64_t *lsns, const char *res)
if (lsns[node_id] > 0)
vclock_follow(&vclock, node_id, lsns[node_id]);
}
- char *str = vclock_to_string(&vclock);
+ const char *str = vclock_to_string(&vclock);
int result = strcmp(str, res);
if (result)
diag("\n!!!new result!!! %s\n", str);
- free(str);
return !result;
}
--
2.11.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
2019-02-15 12:25 ` [PATCH v2 1/3] vclock: use static buffer to format vclock Vladimir Davydov
@ 2019-02-15 12:25 ` Vladimir Davydov
2019-02-15 12:58 ` [tarantool-patches] " Konstantin Osipov
2019-02-15 13:00 ` Konstantin Osipov
2019-02-15 12:25 ` [PATCH v2 3/3] replication: improve join/subscribe logging Vladimir Davydov
` (2 subsequent siblings)
4 siblings, 2 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 12:25 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
They aren't needed there as we reset them anyway once the snapshot is
replayed on initial bootstrap. OTOH having them results in calling
replica_{set,clear}_id twice on initial bootstrap, which will look weird
when I patch them to log the ids. So let's remove them from the initial
snapshot. This makes the initial bootstrap impossible to recover from as
it is, but that shouldn't be an issue since one can always bootstrap a
new instance in a normal way. This also allows us to make cluster uuid
truly immutable (currently, one can update it with REPLACE).
---
src/box/alter.cc | 2 +-
src/box/bootstrap.snap | Bin 1911 -> 1831 bytes
src/box/box.cc | 6 +-----
src/box/lua/upgrade.lua | 7 ++++---
test/box-py/bootstrap.result | 5 ++---
test/box-py/bootstrap.test.py | 16 +++-------------
test/replication/misc.result | 9 ++++++++-
test/replication/misc.test.lua | 5 ++++-
8 files changed, 23 insertions(+), 27 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index 6e0ed973..de8372b3 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -3012,7 +3012,7 @@ on_replace_dd_schema(struct trigger * /* trigger */, void *event)
const char *key = tuple_field_cstr_xc(new_tuple ? new_tuple : old_tuple,
BOX_SCHEMA_FIELD_KEY);
if (strcmp(key, "cluster") == 0) {
- if (new_tuple == NULL)
+ if (old_tuple != NULL)
tnt_raise(ClientError, ER_REPLICASET_UUID_IS_RO);
tt_uuid uu;
tuple_field_uuid_xc(new_tuple, BOX_CLUSTER_FIELD_UUID, &uu);
diff --git a/src/box/bootstrap.snap b/src/box/bootstrap.snap
index d6cc821fbf449852b62aace82a2c751e7f328365..0bb446fb6903ac3ef630c419b909f7db3df0372a 100644
GIT binary patch
delta 1827
zcmV+;2i*Ag4yO)~8GkZ0H7#c}H)3WoGdVK~Np5p=VQyn(Iv`^*Gd5&pFkvk=W@R@m
zG%zzYEjT!4F)d?aGh{VnH8Wx{HDL-?Lu_wjYdRo%eF_TIx(m9^2Ce|k)s!0pr2qf`
z001bpFZ}>eEj0jYM??=v;25U>Fw8KA5aR@KMG#m>L*QW(A%7yj2IH-OV2p`GQD8SB
zQ<9aEzBH}eU*u^eqq>gJE=JE6+w)C1z1FnIIjXms%tRqmN&(pb)d1!IBbLvpo3p4b
z$3!pY+ag{1#zxmGiG3RH&dZ<IlG)Q)&mu4DR~@%A`ZD0|eaX+E*h$TE%zOP|u0`61
zt}gSi{qBaG_J6FaWj{gG!!{Jb{n32pXK<yYyE#?+{`o!Y{ob#7&t{j#En9_BTGWIi
z32^PJajM(`Y;}&VeOY*ivsM4}@6KsHxU)kQfDaclV)Q+*!&;nCA#Sv7b&gA(2-$&i
zV@8XTz6P~y3UQ;Y&T-PJLCp8JW=!Z-=h%f7dD!+`R)6{8Ug(@!UVgZAS1>>+ZNbR`
zpw6Eex9|>YH{XtzHQT#Sk^+hT!LINAW-Yu!OH$Mbwx5e92V0%vs*4x4pNp2HrUpzp
zEvj0!q%;h+I!CcTXVPg=Wl1{oBGj@o$}QM8Bymir=RJ*WNo5o`Y1ttflb~PEN}&`G
zY;}(Met!nRRPJ2A>TXB&eLE+6uEwm`{_nfSv+z5YIngifJhwFpbAab@i@W%&WoNWx
zpkLMoRpocHitJhhJL~1gDk6<P{bdtdQc{7f&M~~K81Eq{yB(Oz(AD>A9*~40q(R(o
z&R}PU=!n2pXC@OgC1^+h(Ttc8F%x2|a|D_>aet;42STRiCG#Q1LrjMl4l90F6lh>(
zU^c*LC}k*NC|#(PWP!ebu22$3lA&}|VXJc-#mpBFmKS%h7E7Q1-DUHI771}O6t+4?
zq6~IJ*9%1r)>d7=ERtg){riSBFXL6`q-%V4=Njy+5|x!gJ(#V|k#%|LX<62>o!2p;
z5r0M|bS~@ha4e^K?9^}PHt%pdM2kr9zTf@&Kg)3xqM#|XM`^2bBwycer^VfEjp~rL
zI>(^z!@FA=&55wpIf}AZ=VI<Gjov_L;T_sM*y<d=4;51M-fT#!j<q>*GD1^?BN8)&
zmV|6|j&($f$Qp!d5cf`tDq5v54h5u3mwzr==#Q<=ku;z0Z(%Lcd~9`&0P3VL4%K6S
zC~}fk;L+h6{wAW^j?dthbvUQQu%s2nnRIB(@-9hfj;+p7#iv?@E}n5{#OONc81g)>
z8{HJr;Q7|7W#x2N{6lzGUCn1d7iFEl_>pwNI1`So&Jo8)jTi-08rSUSVogmCn17-|
zDU6%BK|LE=ougMHz$?fIjaLyJ8lBN2M@I(DiID??-n_VRVbj6}$L8i$RuqegTCu%?
zP%2I-PAE<%Og1J|fo-VqTx6&OMi77i1war0(FF%F=!ymsV6Z@pqcDzv7zjfUiXs~a
zKmZIv3N(mXFhZ*abuTo5t}r0bJb#VtMSW;5s(;`EhOL>ZG@V_Dw&p-8+e?Qb5ZH(U
z8w&cNkRT-mlY5E*j$JyRR7_U;;{~ix4OzC?q3X4Ia+*1cr!ninwax4~%hK_LeTB&v
zLjVBRs^4;QWQeYPv}dd-Dk13b>Y^0K&07c!{KzY;+^{zwba*L5k*TZ4R)2A5)rMV5
z1W#@}b)GGRnL+;xjWVv<D$TnPrhAM5C5b{upNYCzXQ?ndZ_JlrcixyU53||%^AMcJ
zXn-!gCygMQX9hfEOg&wHPV7l;!Y8+w%G6)<X<;+(Ard$AAApCZBTbsYJgW!(nR&|J
zKW8&<KWHVG1Y*2nb*s||dVfD!iR^d$j8txkAl2j9AER4<f-G5IreqOrTQ9UL|8iT+
z`jed*i{oi{_|G(#F*%Eg@(e*se+*q_EnS!*7j%53&FC-ksx8Q~^uVod^Bkx>ON?E8
zB&%*d-&__)6H<((ZgQw$7zx{+W69LE*&h?7B7a<sAC~}Z0mJEG=YKq-7ghsTx}5mh
zm2YW>?Z;EnR68??zGSv7f?8y1z%Mw_woZ)_#d6LLT2tbG!rMPqA;^9qOVp6)w)Uk3
zWj-M@9sOHqC33{?D9#~wr0H7TN8Lc)Ob&^*Dez)Cm?wW<i_>jj;1n9Cw{1vsrpxC{
z>={t>jCkioCWhRtm4EX|*239%3IO;oK41!#Ues%p1qCUL?4u+<m}YW<L(kyhjyf?G
zTJAly-7>J7ywZd0|9Dm~is&PHOHUGW6(y5)oZ3z2@GO5Ksw8hRS@Hb?aZr+0cGNIK
z43t=2$s?w>#u23tKrzBb1(Hfa^nV<p45fgF?oxj_`)Ep_UK{`xaiyL&vvbKES-e0i
R9JX=Dom4Gm<qp*lt?d=eU4H-o
delta 1907
zcmV-(2aNcq4)+d_8GkW3EoV7lF*jvmFg6NFZgX^DZewLSAZ2DaWnnThWGyyhGh;0@
zWn*D2IA$_6EjKVTW;tOpGdN;qISN)oY;R+0Iv{&}3JTS_3%bn){{YS>%iw^e00000
z04TLD{Qyw4Hvqa!co0g!SZxA0ilQirqA1Ryb&CSuuIQd&-G3xfV4Io8{7rEye_;kT
znUWzlQ%n+(-(8ZDn8bwe9Ba74@W94Wq6%^`_@km@kx;TsDFyEU<^cKtMP@CXcju>&
z%m@YQuYZ~N^)8R!T6H0S_r~UjBiT*O!`b(`mANKuKgK%k%lADza@)VKrU?<%FW-?A
zcgq$;KNMX>JbxUrZr?xr?|#4cga6$yrZen9skJ5{=}H4)8|%zEIRTudWo+YD-eb+x
zTl#nBwxI0YBO4$`3Kz5bpXg;x)+zwb($a{M9yWGlxIl5%*P^CTqwa;{EG=oZTFiVu
ztagO6wA@mYzI=i3>#Urm<%54xopM$;D10)lzLd(8^M5HE6AtQGCh!KcSgTYDLzexW
zlc86WbuN|CD=tDrK~fk@(Sl{i3II!m0yi9DkDs^Oy(hvFk^aT5@ov>%d5_nuss}~;
zIf;7VEG=D~#6|l#dChuqaKzIj2)k#kX~0=plKuG+Pm|y^D-8o9*sayhW#iGzGb5n?
zbiQYuS$~iOc8{uzf{wr2q*6%0Sz2oRp(wIKFZM1EduHFba}x;b{F?1=eb>1*e&@6#
z`bnSX_f}&N{CtM-7_Y%@t->7W__amwY~c_zyCy~Ne&J^jN#~aS6UjYm$pB|*IpWuh
z_fe!_PYlm7*7xfl(OhDr#oTciirzh{F9kSD%YP~#vTzuZ`tpqQWQbvAOygM@`&Okx
zRVO$Zk;(+B1d0S|#7V?ybyC%YsL2_s0YQU9bHiC$4$-7Kl&m|=e2~r9$e3h2$aHXV
z?4rOSrY43aW&?}{m<%u&tSw!rvLPz9WTDj6`ZB{=TE0}EvY{$Z`eaR>LI1ncs7p;U
z>VIZrI7>^UQ1p)QH<DVsxx9{FCeKLvHy&?am|b3yt~1`9Ytg$!z}uvL6gW%E7v|@u
zX`!C^yq+211c~XK2K41w$m%mx$DQBMhu@<LWWx6y@7LeDJTpNea#C?f;4Cdo*>~(|
za`$_yJusZ5B~h!DcfYpEQ{pTwN!hhkoqq_hyvG#;&eF2=VTR>B1}l)Ix=G;`cykJ2
z0+9k$ChR?q$-D7#OkXO6Uj;sfELGO&d=<RP(&4b}8M3-4AvHldATmEHj1SJz(!SIr
z)Ap!Z%-z!@XtTAbQ}NicWy_VigR`_O(dYYNS(7#$oTViIOKVZ5%CUbWxmhdV@PBeR
z{zj_YjSt05^K!@{$Ftg^?qoyf*N4qoYjBp9C0d|sjPY>i1q-i34Ilp#lhIG(i=T(N
zU{O%m#Z4o2dAFeboTPRBWXr6ys5{Z%EG>6(z?fAL)Y)b~Cv9wiMAnm9i#nPa^^?I_
zT7EGC$cTc#kO6sNt35be^zc-k7Job}s?G`>6{{306ssNC8G&)A@myr61V#`500Bq<
z(FF%F=!ymsV6aGxqbQDn7zjfUj3OIHKmZCt3N(mXFf<d)r@FPDj8ZtjF@B65qsQnm
zdb9%{Fl^1V(safGu{8#2*<KzDLjXf$f}zk4k~}GqnAB4i;Mo6|T=B8$ihmcoLN#Qv
zW`~N`>gw8ZR8M1;glo6i^PHvd1pEY(&x<hFTNb}f<S>HleZ*(1sZ}bd;nqbLhnu?)
z0`og5tbB2A^d=Bhh~iUM!dCI{s;#+32u9v;>YObAvqAp~jk2!#9L;wjRQK2glSCDx
zFBkO*&!Gm{V4Rf$_slpeAAhjfv*!n#4QK>iGEW*WG{+2BNSyk+_ME_z)<IEjF@>qV
z=+eSw)<Yz2$i?pqk~>{y)|f4W?yNCQ+dpSBZ@;?|3<Pk@vAVt02zx(eiP@+7Leg!C
z@ci)%AEN_6RhEb^Q?dw$txDK}f4Qw@{mE{^qH-GS`!lg*OtodA$$u$t>4m41*-{rc
z$YmX0X@L6U{9_j+UwQ_sL%asso)ThLU&*SQl0SjPahi&z5t<wdFpNZPPrp=Z+w6~t
zQqe!|z>jmRgTbcno!<<-Mi@An<;2&gd{cG!-&K8-wdm3NZ#&`GLWf0givTzH+J|nz
zRZ)&ej;x_-MVwD#`G2o11jR2TN*t2i)?KngnV*P3N3RxIi5&4disNHD($!kVQFq%n
zqXVnj07fyL%<aF&#OW4d;1n3Aw{1{%rVHmxEDb2iB;?M5Cx+aumGhHS!ilmJ0B}=$
zz!WULsMm-KnnxJcM@xQ?&Nzj;-s0hkI*}JT?p^AkWnedXwQPsj|M;v@6md!P#-1d(
zDoP~v+qxUl;TQcxOG$3hx1zuYJJAoVS(G9}Fv@kW-Vp&@<A~A+pcq-B0!cL?`acdq
thElOZm#P0s`v^&(M?w~{P|y3<d8L3woSg;Vjxglw@t!2#kPg)lt?fmKrxpMJ
diff --git a/src/box/box.cc b/src/box/box.cc
index 6ea83a65..8136cf33 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1661,7 +1661,7 @@ box_set_replicaset_uuid(const struct tt_uuid *replicaset_uuid)
else
tt_uuid_create(&uu);
/* Save replica set UUID in _schema */
- if (boxk(IPROTO_REPLACE, BOX_SCHEMA_ID, "[%s%s]", "cluster",
+ if (boxk(IPROTO_INSERT, BOX_SCHEMA_ID, "[%s%s]", "cluster",
tt_uuid_str(&uu)))
diag_raise();
}
@@ -1736,10 +1736,6 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
uint32_t replica_id = 1;
- /* Unregister a local replica if it was registered by bootstrap.bin */
- if (boxk(IPROTO_DELETE, BOX_CLUSTER_ID, "[%u]", 1) != 0)
- diag_raise();
-
/* Register the first replica in the replica set */
box_register_replica(replica_id, &INSTANCE_UUID);
assert(replica_by_uuid(&INSTANCE_UUID)->id == 1);
diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index 3d9acc97..70cfb4f2 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -71,6 +71,8 @@ local function set_system_triggers(val)
box.space._priv:run_triggers(val)
box.space._trigger:run_triggers(val)
box.space._collation:run_triggers(val)
+ box.space._schema:run_triggers(val)
+ box.space._cluster:run_triggers(val)
end
--------------------------------------------------------------------------------
@@ -88,9 +90,8 @@ local function erase()
truncate(box.space._truncate)
truncate(box.space._collation)
truncate(box.space._trigger)
- --truncate(box.space._schema)
- box.space._schema:delete('version')
- box.space._schema:delete('max_id')
+ truncate(box.space._schema)
+ truncate(box.space._cluster)
end
local function create_sysview(source_id, target_id)
diff --git a/test/box-py/bootstrap.result b/test/box-py/bootstrap.result
index 506aca3d..3e439455 100644
--- a/test/box-py/bootstrap.result
+++ b/test/box-py/bootstrap.result
@@ -3,13 +3,12 @@ box.internal.bootstrap()
...
box.space._schema:select{}
---
-- - ['cluster', '<cluster uuid>']
- - ['max_id', 511]
+- - ['max_id', 511]
- ['version', 2, 1, 0]
...
box.space._cluster:select{}
---
-- - [1, '<server uuid>']
+- []
...
box.space._space:select{}
---
diff --git a/test/box-py/bootstrap.test.py b/test/box-py/bootstrap.test.py
index 9d690b03..4f2f55a7 100644
--- a/test/box-py/bootstrap.test.py
+++ b/test/box-py/bootstrap.test.py
@@ -1,16 +1,4 @@
-
-import sys
-import yaml
-
-server_uuid = server.get_param('uuid')
-sys.stdout.push_filter(server_uuid, '<server uuid>')
-cluster_uuid = yaml.load(server.admin('box.space._schema:get("cluster")',
- silent = True))[0][1]
-sys.stdout.push_filter(cluster_uuid, '<cluster uuid>')
-
server.admin('box.internal.bootstrap()')
-server.restart()
-
server.admin('box.space._schema:select{}')
server.admin('box.space._cluster:select{}')
server.admin('box.space._space:select{}')
@@ -20,4 +8,6 @@ server.admin('box.space._func:select{}')
server.admin('box.space._priv:select{}')
# Cleanup
-sys.stdout.pop_filter()
+server.stop()
+server.cleanup()
+server.deploy()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index ab827c50..684d7f7c 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -627,11 +627,18 @@ box.info.replication[2].downstream.status
- follow
...
-- change master's cluster uuid and check that replica doesn't connect.
+test_run:cmd("restart server default with cleanup=1")
+box.schema.user.grant("guest", "replication")
+---
+...
+replica_uuid = test_run:eval("replica", "box.info.uuid")[1]
+---
+...
test_run:cmd("stop server replica")
---
- true
...
-_ = box.space._schema:replace{'cluster', tostring(uuid.new())}
+_ = box.space._cluster:insert{2, replica_uuid}
---
...
-- master believes replica is in cluster, but their cluster UUIDs differ.
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index eda5310b..b0b64dc7 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -253,8 +253,11 @@ test_run:cmd("start server replica")
test_run:grep_log("replica", "REPLICASET_UUID_MISMATCH")
box.info.replication[2].downstream.status
-- change master's cluster uuid and check that replica doesn't connect.
+test_run:cmd("restart server default with cleanup=1")
+box.schema.user.grant("guest", "replication")
+replica_uuid = test_run:eval("replica", "box.info.uuid")[1]
test_run:cmd("stop server replica")
-_ = box.space._schema:replace{'cluster', tostring(uuid.new())}
+_ = box.space._cluster:insert{2, replica_uuid}
-- master believes replica is in cluster, but their cluster UUIDs differ.
test_run:cmd("start server replica")
test_run:wait_log("replica", "REPLICASET_UUID_MISMATCH", nil, 1.0)
--
2.11.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 3/3] replication: improve join/subscribe logging
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
2019-02-15 12:25 ` [PATCH v2 1/3] vclock: use static buffer to format vclock Vladimir Davydov
2019-02-15 12:25 ` [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot Vladimir Davydov
@ 2019-02-15 12:25 ` Vladimir Davydov
2019-02-15 12:59 ` [tarantool-patches] " Konstantin Osipov
2019-02-15 12:26 ` [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
2019-02-15 13:47 ` Vladimir Davydov
4 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 12:25 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
This patch makes logging a little bit more verbose so as to facilitate
post-mortem replication failure analysis:
- When an instance starts up, print its uuid and vclock.
- When a replica connects to a master, print the master's uuid to
the replica's log.
- Print replica uuid and address to the master's log on join/subscribe.
- Log local and remote vclock on subscribe, both on master and on
replica.
- Log ids assigned to replicas both on recovery and when online.
Also, log when a replica is deleted from the cluster.
---
src/box/alter.cc | 1 +
src/box/applier.cc | 17 +++++++++++++----
src/box/box.cc | 21 ++++++++++++++++++---
src/box/replication.cc | 5 +++++
4 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index de8372b3..0e340468 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -3017,6 +3017,7 @@ on_replace_dd_schema(struct trigger * /* trigger */, void *event)
tt_uuid uu;
tuple_field_uuid_xc(new_tuple, BOX_CLUSTER_FIELD_UUID, &uu);
REPLICASET_UUID = uu;
+ say_info("cluster uuid %s", tt_uuid_str(&uu));
}
}
diff --git a/src/box/applier.cc b/src/box/applier.cc
index c09253be..e9addcb3 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -207,11 +207,12 @@ applier_connect(struct applier *applier)
}
if (applier->version_id != greeting.version_id) {
- say_info("remote master is %u.%u.%u at %s",
+ say_info("remote master %s at %s running Tarantool %u.%u.%u",
+ tt_uuid_str(&greeting.uuid),
+ sio_strfaddr(&applier->addr, applier->addr_len),
version_id_major(greeting.version_id),
version_id_minor(greeting.version_id),
- version_id_patch(greeting.version_id),
- sio_strfaddr(&applier->addr, applier->addr_len));
+ version_id_patch(greeting.version_id));
}
/* Save the remote instance version and UUID on connect. */
@@ -393,8 +394,11 @@ applier_subscribe(struct applier *applier)
struct vclock remote_vclock_at_subscribe;
struct tt_uuid cluster_id = uuid_nil;
+ struct vclock vclock;
+ vclock_create(&vclock);
+ vclock_copy(&vclock, &replicaset.vclock);
xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
- &replicaset.vclock);
+ &vclock);
coio_write_xrow(coio, &row);
/* Read SUBSCRIBE response */
@@ -429,6 +433,11 @@ applier_subscribe(struct applier *applier)
tt_uuid_str(&cluster_id),
tt_uuid_str(&REPLICASET_UUID));
}
+
+ say_info("subscribed");
+ say_info("remote vclock %s local vclock %s",
+ vclock_to_string(&remote_vclock_at_subscribe),
+ vclock_to_string(&vclock));
}
/*
* Tarantool < 1.6.7:
diff --git a/src/box/box.cc b/src/box/box.cc
index 8136cf33..058ba266 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1509,6 +1509,9 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
row.sync = header->sync;
coio_write_xrow(io, &row);
+ say_info("joining replica %s at %s",
+ tt_uuid_str(&instance_uuid), sio_socketname(io->fd));
+
/*
* Initial stream: feed replica with dirty data from engines.
*/
@@ -1600,6 +1603,9 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
"wal_mode = 'none'");
}
+ struct vclock vclock;
+ vclock_create(&vclock);
+ vclock_copy(&vclock, &replicaset.vclock);
/*
* Send a response to SUBSCRIBE request, tell
* the replica how many rows we have in stock for it,
@@ -1612,9 +1618,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
* the additional field.
*/
struct xrow_header row;
- xrow_encode_subscribe_response_xc(&row,
- &REPLICASET_UUID,
- &replicaset.vclock);
+ xrow_encode_subscribe_response_xc(&row, &REPLICASET_UUID, &vclock);
/*
* Identify the message with the replica id of this
* instance, this is the only way for a replica to find
@@ -1626,6 +1630,11 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
row.sync = header->sync;
coio_write_xrow(io, &row);
+ say_info("subscribed replica %s at %s",
+ tt_uuid_str(&replica_uuid), sio_socketname(io->fd));
+ say_info("remote vclock %s local vclock %s",
+ vclock_to_string(&replica_clock), vclock_to_string(&vclock));
+
/*
* Process SUBSCRIBE request via replication relay
* Send current recovery vector clock as a marker
@@ -1848,6 +1857,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
INSTANCE_UUID = *instance_uuid;
else
tt_uuid_create(&INSTANCE_UUID);
+
+ say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
+
/*
* Begin listening on the socket to enable
* master-master replication leader election.
@@ -1905,6 +1917,8 @@ local_recovery(const struct tt_uuid *instance_uuid,
tt_uuid_str(&INSTANCE_UUID));
}
+ say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
+
struct wal_stream wal_stream;
wal_stream_create(&wal_stream, cfg_geti64("rows_per_wal"));
@@ -1930,6 +1944,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
* not attempt to apply these rows twice.
*/
recovery_scan(recovery, &replicaset.vclock, &gc.vclock);
+ say_info("instance vclock %s", vclock_to_string(&replicaset.vclock));
if (wal_dir_lock >= 0) {
box_listen();
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 19ad5026..e3ceb009 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -197,6 +197,9 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
instance_id = replica_id;
}
replicaset.replica_by_id[replica_id] = replica;
+
+ say_info("assigned id %d to replica %s",
+ replica->id, tt_uuid_str(&replica->uuid));
}
void
@@ -214,6 +217,8 @@ replica_clear_id(struct replica *replica)
*/
replicaset.replica_by_id[replica->id] = NULL;
replica->id = REPLICA_ID_NIL;
+ say_info("removed replica %s", tt_uuid_str(&replica->uuid));
+
/*
* The replica will never resubscribe so we don't need to keep
* WALs for it anymore. Unregister it with the garbage collector
--
2.11.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] Improve replication related logging
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
` (2 preceding siblings ...)
2019-02-15 12:25 ` [PATCH v2 3/3] replication: improve join/subscribe logging Vladimir Davydov
@ 2019-02-15 12:26 ` Vladimir Davydov
2019-02-15 13:47 ` Vladimir Davydov
4 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 12:26 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
On Fri, Feb 15, 2019 at 03:25:46PM +0300, Vladimir Davydov wrote:
> Post-mortem analysis of a replication failure is painful as it is, but
> with poor logging that we have it is usually nearly to impossible to
> understand what happened in the frist place. This patch set targets on
> ameliorating the situration by logging all relevant vclocks and ids on
> join/subscribe.
Link to the branch:
https://github.com/tarantool/tarantool/commits/dv/make-replication-logging-more-verbose
>
> Changes in v2:
> - Log all replica id changes, because without them vclocks don't make
> much sense.
> - Print remote and local vclocks on subscribe to replica's vclock.
> It may be useful if master's log rotated.
>
> v1: https://www.freelists.org/post/tarantool-patches/PATCH-replication-improve-joinsubscribe-logging
>
> Vladimir Davydov (3):
> vclock: use static buffer to format vclock
> box: don't set cluster uuid and instance id in initial snapshot
> replication: improve join/subscribe logging
>
> src/box/alter.cc | 3 +-
> src/box/applier.cc | 17 ++++++++---
> src/box/bootstrap.snap | Bin 1911 -> 1831 bytes
> src/box/box.cc | 27 +++++++++++------
> src/box/error.cc | 6 ++--
> src/box/gc.c | 6 ++--
> src/box/lua/upgrade.lua | 7 +++--
> src/box/replication.cc | 17 +++++------
> src/box/vclock.c | 64 +++++++++++------------------------------
> src/box/vclock.h | 5 ++--
> src/box/xlog.c | 16 +++--------
> test/box-py/bootstrap.result | 5 ++--
> test/box-py/bootstrap.test.py | 16 ++---------
> test/replication/misc.result | 9 +++++-
> test/replication/misc.test.lua | 5 +++-
> test/unit/vclock.cc | 3 +-
> 16 files changed, 91 insertions(+), 115 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] vclock: use static buffer to format vclock
2019-02-15 12:25 ` [PATCH v2 1/3] vclock: use static buffer to format vclock Vladimir Davydov
@ 2019-02-15 12:41 ` Konstantin Osipov
0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-02-15 12:41 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [19/02/15 15:26]:
> Currently, vclock_to_string() allocates the formatted vclock string
> using malloc() and hence the caller is responsible for freeing it, which
> isn't very user-friendly. Let's use a static buffer as we do to format
> other objects.
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tarantool-patches] Re: [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot
2019-02-15 12:25 ` [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot Vladimir Davydov
@ 2019-02-15 12:58 ` Konstantin Osipov
2019-02-15 13:00 ` Konstantin Osipov
1 sibling, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-02-15 12:58 UTC (permalink / raw)
To: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [19/02/15 15:26]:
> They aren't needed there as we reset them anyway once the snapshot is
> replayed on initial bootstrap. OTOH having them results in calling
> replica_{set,clear}_id twice on initial bootstrap, which will look weird
> when I patch them to log the ids. So let's remove them from the initial
> snapshot. This makes the initial bootstrap impossible to recover from as
> it is, but that shouldn't be an issue since one can always bootstrap a
> new instance in a normal way. This also allows us to make cluster uuid
> truly immutable (currently, one can update it with REPLACE).
This patch is Ok to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] replication: improve join/subscribe logging
2019-02-15 12:25 ` [PATCH v2 3/3] replication: improve join/subscribe logging Vladimir Davydov
@ 2019-02-15 12:59 ` Konstantin Osipov
0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-02-15 12:59 UTC (permalink / raw)
To: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [19/02/15 15:26]:
OK to push.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tarantool-patches] Re: [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot
2019-02-15 12:25 ` [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot Vladimir Davydov
2019-02-15 12:58 ` [tarantool-patches] " Konstantin Osipov
@ 2019-02-15 13:00 ` Konstantin Osipov
1 sibling, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-02-15 13:00 UTC (permalink / raw)
To: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [19/02/15 15:26]:
> @@ -3012,7 +3012,7 @@ on_replace_dd_schema(struct trigger * /* trigger */, void *event)
> const char *key = tuple_field_cstr_xc(new_tuple ? new_tuple : old_tuple,
> BOX_SCHEMA_FIELD_KEY);
> if (strcmp(key, "cluster") == 0) {
> - if (new_tuple == NULL)
> + if (old_tuple != NULL)
> tnt_raise(ClientError, ER_REPLICASET_UUID_IS_RO);
> tt_uuid uu;
> tuple_field_uuid_xc(new_tuple, BOX_CLUSTER_FIELD_UUID, &uu);
Please allow to change cluster uuid after bootstrap.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] Improve replication related logging
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
` (3 preceding siblings ...)
2019-02-15 12:26 ` [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
@ 2019-02-15 13:47 ` Vladimir Davydov
4 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-02-15 13:47 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
On Fri, Feb 15, 2019 at 03:25:46PM +0300, Vladimir Davydov wrote:
> Vladimir Davydov (3):
> vclock: use static buffer to format vclock
> box: don't set cluster uuid and instance id in initial snapshot
> replication: improve join/subscribe logging
Pushed to 2.1 and 1.10.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-02-15 13:47 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15 12:25 [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
2019-02-15 12:25 ` [PATCH v2 1/3] vclock: use static buffer to format vclock Vladimir Davydov
2019-02-15 12:41 ` Konstantin Osipov
2019-02-15 12:25 ` [PATCH v2 2/3] box: don't set cluster uuid and instance id in initial snapshot Vladimir Davydov
2019-02-15 12:58 ` [tarantool-patches] " Konstantin Osipov
2019-02-15 13:00 ` Konstantin Osipov
2019-02-15 12:25 ` [PATCH v2 3/3] replication: improve join/subscribe logging Vladimir Davydov
2019-02-15 12:59 ` [tarantool-patches] " Konstantin Osipov
2019-02-15 12:26 ` [PATCH v2 0/3] Improve replication related logging Vladimir Davydov
2019-02-15 13:47 ` Vladimir Davydov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox