[Tarantool-patches] [RFC v7 0/2] relay: provide downstream lag information
Cyrill Gorcunov
gorcunov at gmail.com
Fri Jun 4 20:06:05 MSK 2021
Guys, take a look once time permits. The previous version is here
https://lists.tarantool.org/tarantool-patches/20210506214132.533913-1-gorcunov@gmail.com/
Not for merging yet! I think instead of the applier_wal_stat structure we might need some
_commonly shared_ statistics structure, probably bound to the WAL code, which all other
threads will update in a lockless way, because we might need to collect more
details on WAL processing in the future. I thought of something like
	enum {
		WAL_ICNT__APPLIER_TXN_START_TM,
		WAL_ICNT__MAX,
	};

	struct wal_stat {
		int64_t icnt[WAL_ICNT__MAX];
	} wal_st[VCLOCK_MAX];

and introduce

	wal_st__icnt_read(unsigned id);
	wal_st__icnt_write(unsigned id);

then the applier will simply push the last timestamp to the WAL_ICNT__APPLIER_TXN_START_TM
counter, and later when we need to send an ACK we use wal_st__icnt_read() to
fetch it back. We won't need to allocate any dynamic memory for it but
rather use statically preallocated memory shared between threads. Not sure though.
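
A minimal sketch of how such counters could look, assuming C11 atomics with relaxed ordering (enough for a single independent value). The extra replica_id and val parameters, the bounds checks, and VCLOCK_MAX = 32 are assumptions made for illustration only; they are not part of the prototypes above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed value, used here only to size the static array. */
enum { VCLOCK_MAX = 32 };

enum {
	WAL_ICNT__APPLIER_TXN_START_TM,
	WAL_ICNT__MAX,
};

/* One statically preallocated slot per replica id, no dynamic memory. */
struct wal_stat {
	_Atomic int64_t icnt[WAL_ICNT__MAX];
};

static struct wal_stat wal_st[VCLOCK_MAX];

/* Writer side (e.g. the applier): publish the latest value. */
static inline void
wal_st__icnt_write(unsigned replica_id, unsigned id, int64_t val)
{
	assert(replica_id < VCLOCK_MAX && id < WAL_ICNT__MAX);
	atomic_store_explicit(&wal_st[replica_id].icnt[id], val,
			      memory_order_relaxed);
}

/* Reader side (e.g. the ACK writer): fetch the last published value. */
static inline int64_t
wal_st__icnt_read(unsigned replica_id, unsigned id)
{
	assert(replica_id < VCLOCK_MAX && id < WAL_ICNT__MAX);
	return atomic_load_explicit(&wal_st[replica_id].icnt[id],
				    memory_order_relaxed);
}
```

Each slot is written and read as a whole, so a reader never sees a torn value; whether relaxed ordering is sufficient depends on what else would need to be observed together with the counter.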
v4 (by Vlad):
- add a test case
- add docbot request
- dropped xrow_encode_vclock_timed, we use an open-coded assignment
for the tm value when sending the ACK
- struct awstat renamed to applier_wal_stat. Vlad, I think this is a
better name than "applier_lag" because this is statistics on WAL,
we simply track remote WAL propagation here, so a more general name
is better for grep's sake and for future extensions
- instead of passing the applier structure we pass replica_id
- the real keeper of this statistics now lives in the "replica" structure,
thus it is not bound to the applier itself
- for synchro entries we pass a pointer to the applier_wal_stat instead
of using replica_id = 0 as a sign that we don't need to update statistics
for initial and final join cases
- to write and read statistics we provide wal_stat_update and wal_stat_ack
helpers to cover the case where a single ACK spans several transactions
v7:
- reworked the idea, so we always send the last applied transaction's timestamp
inside the applier's ACK message
- fixed changelog
- in replica structure opencode the applier_txn_start_tm member
- drop multiple ifs in applier_apply_tx
- drop if statement in apply_synchro_row
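
Roughly, the v7 flow can be sketched as below. This is an illustration only: the struct and function names are hypothetical and are not the ones used in the patches, and computing the lag as "current time minus the timestamp carried in the ACK" is an assumption about how the relay side uses the value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ACK payload: timestamp of the last applied transaction. */
struct ack_msg {
	double txn_start_tm;
};

/* Applier side: always put the last applied transaction's timestamp
 * into the ACK, no matter how many transactions the ACK covers. */
static inline struct ack_msg
applier_make_ack(double last_applied_txn_tm)
{
	struct ack_msg ack = { .txn_start_tm = last_applied_txn_tm };
	return ack;
}

/* Relay side (assumed): the lag is the time elapsed between the
 * timestamp echoed back in the ACK and the moment the ACK arrives. */
static inline double
relay_lag_from_ack(double now, const struct ack_msg *ack)
{
	assert(ack != NULL);
	return now - ack->txn_start_tm;
}
```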
Vlad, you pointed out
> 4. Lag is updated in the relay thread, therefore you can't
> simply read it in TX thread like you do in the diff block
> above.
Actually I can read the relay's lag in the box.info() output: if the
relay object is removed then it won't have the RELAY_FOLLOW state,
so we're safe. Is this what you meant?
branch gorcunov/gh-5447-relay-lag-7-notest
issue https://github.com/tarantool/tarantool/issues/5447
Cyrill Gorcunov (2):
applier: send transaction's first row WAL time in the applier_writer_f
relay: provide information about downstream lag
.../unreleased/gh-5447-downstream-lag.md | 6 ++
src/box/applier.cc | 74 ++++++++++++---
src/box/applier.h | 14 +++
src/box/lua/info.c | 3 +
src/box/relay.cc | 51 ++++++++++
src/box/relay.h | 6 ++
src/box/replication.cc | 1 +
src/box/replication.h | 5 +
.../replication/gh-5447-downstream-lag.result | 93 +++++++++++++++++++
.../gh-5447-downstream-lag.test.lua | 41 ++++++++
10 files changed, 279 insertions(+), 15 deletions(-)
create mode 100644 changelogs/unreleased/gh-5447-downstream-lag.md
create mode 100644 test/replication/gh-5447-downstream-lag.result
create mode 100644 test/replication/gh-5447-downstream-lag.test.lua
--
2.31.1