From mboxrd@z Thu Jan 1 00:00:00 1970
To: Serge Petrenko , gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
References: <20210224193549.70017-1-sergepetrenko@tarantool.org> <469ef629-699c-0180-facb-f166d6d3c073@tarantool.org> <17046c4d-b239-f3d8-f150-941026096b1e@tarantool.org> <97734227-410b-3e72-2ee7-30e7d6f63217@tarantool.org>
Message-ID: <8b33b97e-fb7b-6ae5-9007-3f82e3cb49ea@tarantool.org>
Date: Wed, 3 Mar 2021 21:59:02 +0100
In-Reply-To: <97734227-410b-3e72-2ee7-30e7d6f63217@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH
v3] wal: introduce limits on simultaneous writes
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

>>>>> @@ -159,6 +276,13 @@ journal_write(struct journal_entry *entry)
>>>>>    static inline int
>>>>>    journal_write_async(struct journal_entry *entry)
>>>>>    {
>>>>> +    /*
>>>>> +     * It's the job of the caller to check whether the queue is full prior
>>>>> +     * to submitting the request.
>>>>> +     */
>>>>> +    assert(!journal_queue_is_full() || journal_queue.is_ready);
>>>>> +    journal_queue_on_append(entry);
>>>> 8. Probably must assert that the waiters list is empty. Otherwise you
>>>> could go out of order.
>>> It's not empty by the time the first entry gets to 'journal_write_async'.
>>> Everyone else has been woken up, but not yet removed from the queue.
>>>
>>> Looks like we cannot determine whether a write is called after waiting in
>>> the queue or not.
>> It bothers me a lot, the rest looks good. We don't have any protection
>> against a reorder, not even an assertion. How about this? (I didn't test):
>
> Thanks for thinking this through!
> Yes, indeed, this is a problem. Unfortunately, your solution will allow
> reordering. We needed the waiter count to prevent anyone from sliding in
> front when the queue has already been woken, but the fibers are not yet
> scheduled.
>
> We also discussed that a reorder may happen if an applier tries to apply
> a vinyl tx, which may yield after it has waited in the queue.
> In this case, if some tx is committed (synchronously) and added to the queue
> while the vinyl tx sleeps, the vinyl tx will bypass that synchronous tx in
> the queue.
>
> So, there shouldn't be any yields between journal_queue_wait() and
> journal_write(_async)().
>
> We cannot yield in memtx transactions, so the only solution is to put
> journal_queue_wait() after txn_prepare() inside txn_commit_async().
>
> Take a look at this patch (on top of the branch):
> I decided to move journal_queue_wait() to journal_write_async() to be
> consistent with journal_write(), which also checks the queue.
>
> Now reordering is not a problem, since waiting in the queue is done right
> before the write request is submitted. No yields in between.
>
> Also, I renamed journal_write_async() to journal_write_try_async() and
> txn_commit_async() to txn_commit_try_async() to indicate that they may
> yield occasionally now.
>
> What do you think of this approach?

Looks good.

> P.S. There's another thing I wanted to discuss: should we set the defaults
> for wal_queue_max_size(len) to some finite value? If yes, then what value
> should it be? wal_queue_max_size = the default memtx_memory (256 MB)?
> I don't know which value to choose for wal_queue_max_len at all.

Probably we could try something sanely big. For instance, max_len = 1000000, on
the assumption that you might replicate a million rows per second, and a WAL
write is likely to finish faster than a second. If you still have a queue bigger
than a million entries, then something looks wrong anyway. On the other hand, it
would give some bit of protection against OOM, assuming the transactions aren't
big.

For max_size we could be a bit more brave than 256 MB. Perhaps 5 GB? 10 GB? Just
to have some sane limit up to which it is still imaginable that the instance
really has that much memory. Better than infinity. I don't know. We could also
leave it unlimited, or ask Mons for good defaults, since he has experience of
configuring Tarantool in production.

See 2 comments below.
> diff --git a/test/replication/gh-5536-wal-limit.result b/test/replication/gh-5536-wal-limit.result
> new file mode 100644
> index 000000000..f7799baa8
> --- /dev/null
> +++ b/test/replication/gh-5536-wal-limit.result
> @@ -0,0 +1,132 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +
> +--
> +-- gh-5536: out of memory on a joining replica. Introduce a WAL queue limit so
> +-- that appliers stop reading new transactions from master once the queue is
> +-- full.
> +--
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +_ = box.schema.space.create('test')
> + | ---
> + | ...
> +_ = box.space.test:create_index('pk')
> + | ---
> + | ...
> +
> +replication_timeout = box.cfg.replication_timeout

1. Need to save the WAL limit too, in case it stops defaulting to 0 in the
future, so that it can be properly restored at the end of the test.

> + | ---
> + | ...
> +box.cfg{replication_timeout=1000}
> + | ---
> + | ...

2. It would also be good to have at least one test for the WAL row count limit,
not only for the byte size.