From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 386DF255E8
	for <tarantool-patches@freelists.org>; Fri,  8 Feb 2019 12:09:24 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id GdCGLgvu_oTe for <tarantool-patches@freelists.org>;
	Fri,  8 Feb 2019 12:09:24 -0500 (EST)
Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id D5B3F21BA5
	for <tarantool-patches@freelists.org>; Fri,  8 Feb 2019 12:09:23 -0500 (EST)
Date: Fri, 8 Feb 2019 20:09:21 +0300
From: Konstantin Osipov <kostja@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] Relay logs from wal thread
Message-ID: <20190208170921.GD4588@chai>
References: <ab1c9c4f67a93cc666417cab63e00886b8749d81.1548166552.git.georgy@tarantool.org>
 <20190129085318.fggigft4p5pqb5kq@esperanza>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190129085318.fggigft4p5pqb5kq@esperanza>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org
Cc: Georgy Kirichenko <georgy@tarantool.org>

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/01/29 11:57]:
> It's too early to comment on the code. Here are some general issues that
> I think need to be addressed first:
> 
>  - Moving relay code to WAL looks ugly as it turns the code into a messy
>    bundle. Let's try to hide relay behind an interface. Currently, we
>    only need one method - write() - but in future we might need more,
>    for instance prepare() or commit(). May be, we could reuse journal
>    struct for this?

Agree.

> 
>  - Memory index looks confusing to me. Why would we need it at all?
>    AFAIU it is only used to quickly position relay after switching to
>    memory buffer, but it's not a big deal to linearly scan the buffer
>    instead when that happens - it's a rare event and scanning memory
>    should be fast enough. At least, we should start with it, because
>    that would be clear and simple, and build any indexing later in
>    separate patches provided we realize that we really need it, which I
>    doubt.

I agree it could go in a separate patch. I disagree we need an
index: cheaper to have an index than to ensure it is never needed.

>  - We shouldn't stop/start relay thread when switching to/from memory
>    source. Instead on SUBSCRIBE we should always start a relay thread,
>    as we do now, and implement some machinery to suspend/resume it and
>    switch to memory/disk when it falls in/out the memory buffer. All
>    that machinery should live in relay.cc.

I agree. The relay thread should simply suspend itself when
switching to bsync.

>  - When in sync mode, WAL should write directly to the relay socket,
>    without using any intermediate buffer. Buffer should only be used if
>    the socket is blocked or the relay is running in asynchronous mode.
>    There's no need in extra memory copying otherwise.

This is not always possible. The socket doesn't have to be ready
all the time. I agree we could try to write to the socket first.
But we still need a buffer, and I think we need a single one, I
see no technical issue with having a single ring buffer with multiple
cursors. If a cursor falls behind the valid read position in the
buffer, it's invalidated and the relay is switched to async mode.
> 
>  - Maintaining a single memory buffer in WAL would complicate relay sync
>    interface implementation. May be, we could use a separate buffer per
>    each relay? That would be less efficient, obviously, because it would
>    mean extra memory copying and usage, but that would simplify the code
>    a great deal. Besides, we would need to use the buffers only for
>    async relays - once a relay is switched to sync mode, we could write
>    to its socket directly. So I think we should at least consider this
>    option.

Disagree, we have to solve the difficult problem of such buffer. If we have
multiple buffers, we have to solve it multiple times. Having a
single buffer for all replicas is equivalent to maintaining all
these replicas in a quorum - as soon as a replica position falls
behind valid read position in the buffer, it means it's out of the
sync replication quorum and our replication group has changed,
this should be reflected in the write ahead log.

>  - You removed the code that tried to advance GC consumers as rarely as
>    possible, namely only when a WAL file is closed. I'd try to save it
>    somehow, because without it every ACK would result in GC going to WAL
>    if relay is running in async mode (because only WAL knows about WAL
>    files; for others it's just a continuous stream). When a relay is
>    running in sync mode, advancing GC on each ACK seems to be OK,
>    because WAL won't be invoked then (as we never remove WAL files
>    created after the last checkpoint).

I agree.

>  - I really don't like code duplication. I think we should reuse the
>    code used for sending rows and receiving ACKs between sync and sync
>    modes. Hiding sync relay implementation behind an interface would
>    allow us to do that.

I agree.
> 
>  - I don't see how this patch depends on #1025 (sending rows in
>    batches). Let's implement relaying from memory first and do #1025
>    later, when we see the full picture.

I agree.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov