[Tarantool-patches] [RFC v3 0/3] relay: provide downstream lag information

Cyrill Gorcunov gorcunov at gmail.com
Fri Apr 30 18:39:37 MSK 2021


Guys, this is *NOT* for merging but rather to gather comments on the
code structure and overall idea.

Here is the code flow, as a memory refresher:

  MASTER NODE
  ===========

  TX
  ==
  main.sched
  |
  `- box_process_rw
  ^  `- txn_commit
  |     `- alloc xrow
  |        `- journal_write
  |           `- wal_assign_lsn
  |           `- write to disk
  |           `- wal_notify_watchers
  |               |
  +---------------+ wakeup relay thread
                  |
                  v
                RELAY THREAD
                ============
                relay_subscribe_f
                `- relay_reader_f
                |   `- coio_read_xrow_timeout_xc <------------------+
                |                                                   |
                `- relay_process_wal_event                          |
                   `- recover_remaining_wals                        |
                      `- relay_send                                 |
                          |                                         |
                          | read xrows from disk                    |
                          | and send them to replica's              |
                          | applier                                 |
                          |                                         |
                          |                                         |
  REPLICA NODE            |                                         |
  ============            |                                         ^
  TX                      |                                         |
  ==                      |                                         |
  main.sched              |                                         |
  `- applier_apply_tx <---+                                         |
  |  `- apply_synchro_row (if CONFIRM | ROLLBACK)                   |
  |  |  `- journal_write                                            |
  |  |  `- applier->first_row_wal_time from xrow::tm                |
  |  `- apply_plain_tx                                              |
  |     `- txn_commit_try_async                                     |
  |        `- applier_txn_wal_write_cb                              |
  |           `- applier->first_row_wal_time from xrow::tm          |
  |                                                                 |
  `- applier_writer_f                                               |
     `- xrow_encode_vclock_timed(applier->first_row_wal_time)       |
        `- coio_write_xrow -----------------------------------------+
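A minimal C sketch of the replica (applier) half of the flow above. The names `struct applier_model`, `struct ack`, `applier_on_wal_write()` and `applier_encode_ack()` are hypothetical. The sketch assumes the applier keeps only the timestamp of the first row written since the previous ACK and clears it once the ACK is encoded. In the series itself the value lives in `applier->first_row_wal_time` and is sent with `xrow_encode_vclock_timed()`.

```c
#include <assert.h>

/*
 * Hypothetical model of the replica (applier) side. In the patch the
 * timestamp lives in applier->first_row_wal_time and is sent with
 * xrow_encode_vclock_timed(); the names below are illustrative only.
 */
struct ack {
	long long lsn; /* stand-in for the replica's vclock */
	double tm;     /* master WAL time of the first unacked row, 0 if none */
};

struct applier_model {
	long long lsn;             /* last LSN written to the local WAL */
	double first_row_wal_time; /* 0 means nothing is pending */
};

/* WAL write callback: a row stamped row_tm by the master hit our WAL. */
static void
applier_on_wal_write(struct applier_model *a, long long lsn, double row_tm)
{
	assert(row_tm > 0);
	a->lsn = lsn;
	/* Remember only the first row written since the last ACK. */
	if (a->first_row_wal_time == 0)
		a->first_row_wal_time = row_tm;
}

/* Writer fiber: build the ACK and start a new reporting window. */
static struct ack
applier_encode_ack(struct applier_model *a)
{
	struct ack ack = { .lsn = a->lsn, .tm = a->first_row_wal_time };
	a->first_row_wal_time = 0;
	return ack;
}
```

For example, two WAL writes stamped 100.5 and 100.7 followed by one ACK produce an ACK carrying `tm == 100.5`. A second ACK with no new writes in between carries `tm == 0`.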

Typical output is something like

 (freshly started)
 |tarantool> box.info.replication
 |---
 |- 1:
 |    id: 1
 |    uuid: f94edca8-71d4-46c9-b9d2-620a6a2bd977
 |    lsn: 121
 |  2:
 |    id: 2
 |    uuid: f6ac84e1-a040-48d9-a9c7-f8147b8e2c9e
 |    lsn: 0
 |    upstream:
 |      status: follow
 |      idle: 0.56554910800332
 |      peer: replicator@127.0.0.1:3302
 |      lag: 0.00021719932556152
 |    downstream:
 |      status: follow
 |      idle: 0.52823433600133
 |      vclock: {1: 121}
 |      lag: 0
 |...

After sending new data:

 |tarantool> box.space.sync:insert{55}
 |---
 |- [55]
 |...
 |tarantool> box.info.replication
 |---
 |- 1:
 |    id: 1
 |    uuid: f94edca8-71d4-46c9-b9d2-620a6a2bd977
 |    lsn: 123
 |  2:
 |    id: 2
 |    uuid: f6ac84e1-a040-48d9-a9c7-f8147b8e2c9e
 |    lsn: 0
 |    upstream:
 |      status: follow
 |      idle: 0.96756215799542
 |      peer: replicator@127.0.0.1:3302
 |      lag: 0.0002143383026123
 |    downstream:
 |      status: follow
 |      idle: 0.31903971399879
 |      vclock: {1: 123}
 |      lag: 0.0010807514190674
 |...

Please take a look at the applier notification structure and naming. Actually,
I don't really like the `downstream.lag` name either: as far as I understand,
it is not a counterpart of `upstream.lag` but rather a packet traversal time
(master WAL -> replica WAL -> ACK back to the relay), so maybe
`downstream.wal-lag` would be more suitable? Also, during idle cycles
`downstream.lag` does not change, which might confuse readers because
`upstream.lag` does.
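To make the idle-cycle behavior concrete, here is a minimal C sketch of the master-side bookkeeping. It assumes the relay updates the lag only when an ACK carries a non-zero timestamp, i.e. when the replica has written new rows since the previous ACK. `struct relay_model` and `relay_on_ack()` are hypothetical names, and `now` is passed in explicitly instead of being read from a real clock.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the master (relay) side; names are illustrative,
 * not Tarantool's API. `now` is passed in explicitly so the example is
 * deterministic; a real relay would read its own clock.
 */
struct relay_model {
	double lag; /* value that would be shown as downstream.lag */
};

/*
 * Handle an ACK from the replica. ack_tm is the master WAL time echoed
 * back by the applier, or 0 when the ACK carries no newly written rows.
 */
static void
relay_on_ack(struct relay_model *r, double ack_tm, double now)
{
	assert(r != NULL);
	if (ack_tm == 0)
		return; /* idle ACK: the previous lag value is kept */
	/* Clamp at zero in case the clock stepped backwards. */
	r->lag = now > ack_tm ? now - ack_tm : 0;
}
```

With an ACK stamped 100.0 received at 100.25 the lag becomes 0.25. A following idle ACK (timestamp 0) leaves it at 0.25, which is the stale value described above.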

Anyway, any comments on the idea and the code structure would be highly
appreciated. Again, this series is not for merging: there are no docs and no
tests yet, and I did manual testing only.

Previous version: https://lists.tarantool.org/tarantool-patches/20210201100037.212301-1-gorcunov@gmail.com/

branch: gorcunov/gh-5447-relay-lag-3
issue: https://github.com/tarantool/tarantool/issues/5447

Cyrill Gorcunov (3):
  xrow: allow to pass timestamp via xrow_encode_vclock_timed helper
  applier: send first row's WAL time in the applier_writer_f
  relay: provide information about downstream lag

 src/box/applier.cc | 84 ++++++++++++++++++++++++++++++++++++++--------
 src/box/applier.h  |  5 +++
 src/box/lua/info.c |  3 ++
 src/box/relay.cc   | 46 ++++++++++++++++++++++---
 src/box/relay.h    |  3 ++
 src/box/xrow.c     |  5 ++-
 src/box/xrow.h     | 21 ++++++++++--
 7 files changed, 146 insertions(+), 21 deletions(-)


base-commit: 7fd53b4c5264bdbc8f01858409abe52bc38764c8
-- 
2.30.2


