From mboxrd@z Thu Jan 1 00:00:00 1970
To: Vladislav Shpilevoy, Cyrill Gorcunov
Cc: tarantool-patches@dev.tarantool.org
References: <8011f87bb9b5e1f53f5bee3124f3a8e9dbe1917c.1621935783.git.sergepetrenko@tarantool.org> <3552b10e-370d-ddbd-11ed-8c3d5310e651@tarantool.org>
Message-ID: <008e0d5d-7b1a-0599-b76c-c3dc8a481d60@tarantool.org>
Date: Thu, 27 May 2021 13:53:54 +0300
In-Reply-To: <3552b10e-370d-ddbd-11ed-8c3d5310e651@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] box: fix an assertion failure in box.ctl.promote()
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Errors-To: 
tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

26.05.2021 21:46, Vladislav Shpilevoy wrote:
>>>> @@ -1618,14 +1618,29 @@ box_promote(void)
>>>>                   txn_limbo.owner_id);
>>>>              return -1;
>>>>          }
>>>> +        if (txn_limbo_is_empty(&txn_limbo)) {
>>>> +            wait_lsn = txn_limbo.confirmed_lsn;
>>>> +            goto promote;
>>>> +        }
>>>>      }
>>>>
>>>> -    /*
>>>> -     * promote() is a no-op on the limbo owner, so all the rows
>>>> -     * in the limbo must've come through the applier meaning they already
>>>> -     * have an lsn assigned, even if their WAL write hasn't finished yet.
>>>> -     */
>>>> -    wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn;
>>>> +    struct txn_limbo_entry *last_entry;
>>>> +    last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
>>>> +    /* Wait for the last entries WAL write. */
>>>> +    if (last_entry->lsn < 0) {
>>>> +        if (wal_sync(NULL) < 0)
>>>> +            return -1;
>>>> +        if (txn_limbo_is_empty(&txn_limbo)) {
>>>> +            wait_lsn = txn_limbo.confirmed_lsn;
>>>> +            goto promote;
>>>> +        }
>>>> +        if (last_entry != txn_limbo_last_synchro_entry(&txn_limbo)) {
>>> This is a bit dangerous. We cache a pointer and then go to fiber_yield,
>>> which switches context; at this moment the pointer becomes a dangling one
>>> and we simply can't be sure whether it _was_ reused. IOW, Serge, are we
>>> 100% sure that the same pointer with the same address but with new data
>>> won't appear here as the last entry in the limbo?
>> I agree this solution is not perfect.
>>
>> An alternative would be to do the following:
>> 1) Check that the limbo owner hasn't changed.
>> 2) Check that the last entry has a positive lsn (i.e. it's not a new entry
>>    which wasn't yet written to WAL), and that this lsn is equal to the lsn
>>    of our entry.
>>
>> But what if our entry was confirmed and destroyed during wal_sync()?
>> We can't compare other entries' lsn with this one's.
> As decided in the chat, you can use txn->id. It is unique until
> restart and should help to detect if the last transaction has
> changed.

Yep, thanks for the suggestion!

Here's the diff:

=============================================================

diff --git a/src/box/box.cc b/src/box/box.cc
index 3d9cd0e57..3baae6afe 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1628,13 +1628,19 @@ box_promote(void)
        last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
        /* Wait for the last entries WAL write. */
        if (last_entry->lsn < 0) {
+               int64_t tid = last_entry->txn->id;
                if (wal_sync(NULL) < 0)
                        return -1;
+               if (former_leader_id != txn_limbo.owner_id) {
+                       diag_set(ClientError, ER_INTERFERING_PROMOTE,
+                                txn_limbo.owner_id);
+                       return -1;
+               }
                if (txn_limbo_is_empty(&txn_limbo)) {
                        wait_lsn = txn_limbo.confirmed_lsn;
                        goto promote;
                }
-               if (last_entry != txn_limbo_last_synchro_entry(&txn_limbo)) {
+               if (tid != txn_limbo_last_synchro_entry(&txn_limbo)->txn->id) {
                        diag_set(ClientError, ER_QUORUM_WAIT, quorum,
                                 "new synchronous transactions appeared");
                        return -1;

=============================================================

-- 
Serge Petrenko