From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vdavydov.dev@gmail.com>
Date: Tue, 29 Jan 2019 11:53:18 +0300
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] [PATCH] Relay logs from wal thread
Message-ID: <20190129085318.fggigft4p5pqb5kq@esperanza>
References: <ab1c9c4f67a93cc666417cab63e00886b8749d81.1548166552.git.georgy@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ab1c9c4f67a93cc666417cab63e00886b8749d81.1548166552.git.georgy@tarantool.org>
To: Georgy Kirichenko <georgy@tarantool.org>
Cc: tarantool-patches@freelists.org
List-ID: <tarantool-patches.dev.tarantool.org>

On Tue, Jan 22, 2019 at 05:18:36PM +0300, Georgy Kirichenko wrote:
> Use wal thread to relay rows. Relay writer and reader fibers live
> in a wal thread. Writer fiber uses wal memory to send rows to the
> peer when it possible or spawns cords to recover rows from a file.
> 
> Wal writer stores rows into two memory buffers swapping by buffer
> threshold. Memory buffers are splitted into chunks with the
> same server id issued by chunk threshold. In order to search
> position in wal memory all chunks indexed by wal_mem_index
> array. Each wal_mem_index contains corresponding replica_id and
> vclock just before first row in the chunk as well as chunk buffer
> number, position and size of the chunk in memory.
> 
> Closes: #3794

It's too early to comment on the code. Here are some general issues that
I think need to be addressed first:

 - Moving relay code to WAL looks ugly as it turns the code into a messy
   bundle. Let's try to hide relay behind an interface. Currently, we
   only need one method - write() - but in future we might need more,
   for instance prepare() or commit(). May be, we could reuse journal
   struct for this?

 - Memory index looks confusing to me. Why would we need it at all?
   AFAIU it is only used to quickly position relay after switching to
   memory buffer, but it's not a big deal to linearly scan the buffer
   instead when that happens - it's a rare event and scanning memory
   should be fast enough. At least, we should start with it, because
   that would be clear and simple, and build any indexing later in
   separate patches provided we realize that we really need it, which I
   doubt.

 - We shouldn't stop/start relay thread when switching to/from memory
   source. Instead on SUBSCRIBE we should always start a relay thread,
   as we do now, and implement some machinery to suspend/resume it and
   switch to memory/disk when it falls in/out the memory buffer. All
   that machinery should live in relay.cc.

 - When in sync mode, WAL should write directly to the relay socket,
   without using any intermediate buffer. Buffer should only be used if
   the socket is blocked or the relay is running in asynchronous mode.
   There's no need in extra memory copying otherwise.

 - Maintaining a single memory buffer in WAL would complicate relay sync
   interface implementation. May be, we could use a separate buffer per
   each relay? That would be less efficient, obviously, because it would
   mean extra memory copying and usage, but that would simplify the code
   a great deal. Besides, we would need to use the buffers only for
   async relays - once a relay is switched to sync mode, we could write
   to its socket directly. So I think we should at least consider this
   option.

 - You removed the code that tried to advance GC consumers as rarely as
   possible, namely only when a WAL file is closed. I'd try to save it
   somehow, because without it every ACK would result in GC going to WAL
   if relay is running in async mode (because only WAL knows about WAL
   files; for others it's just a continuous stream). When a relay is
   running in sync mode, advancing GC on each ACK seems to be OK,
   because WAL won't be invoked then (as we never remove WAL files
   created after the last checkpoint).

 - I really don't like code duplication. I think we should reuse the
   code used for sending rows and receiving ACKs between sync and sync
   modes. Hiding sync relay implementation behind an interface would
   allow us to do that.

 - I don't see how this patch depends on #1025 (sending rows in
   batches). Let's implement relaying from memory first and do #1025
   later, when we see the full picture.