From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 1 Sep 2021 19:04:37 +0300
From: Cyrill Gorcunov via Tarantool-patches
To: Serge Petrenko, Vladislav Shpilevoy
Cc: tml
Subject: Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
References: <20210804190752.488147-1-gorcunov@gmail.com> <20210804190752.488147-3-gorcunov@gmail.com>

On Mon, Aug 23, 2021 at 02:41:51PM +0300, Cyrill Gorcunov wrote:
> On Mon, Aug 23, 2021 at 02:32:27PM +0300, Serge Petrenko wrote:
> > >
> > > In the next patch you would make txn_limbo_process_begin()
> > > also take the request to validate it. Then the 'filtering'
> > > would become internal to the limbo.
> >
> > I agree with Vlad here.
> >
> > txn_limbo_process_begin()/commit()/rollback
> >
> > looks more clean than calling lock()/unlock() manually.
> >
> > Let's stick with Vlad's proposal then.
>
> Sure thing, thanks!

Guys, this scheme doesn't work; it makes the code even more complex. The
problem is that the completion is called from another fiber.
Here is a log from the replica:

 [cyrill@grain 001_replication] grep promote_latch replica.log
 2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
 2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: unlock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
 2021-09-01 18:41:42.224 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
 2021-09-01 18:41:42.225 [2061868] main I> limbo: unlock promote_latch 1: sched
 tarantool: /home/cyrill/projects/tarantool/tarantool.git/src/lib/core/latch.h:175: latch_unlock: Assertion `l->owner == fiber()' failed.

Initially the limbo gets latched by the applier fiber, but the completion
is then called from the sched fiber, which causes the assertion to trigger.
We could add another helper to unlock promote_latch from the applier, but
to me this all becomes an extreme mess, and I'm pretty sure it is a bad
design: locks should always be taken and released from the same function
body, unless there are very serious reasons not to.

I used

int
txn_limbo_process_begin(struct txn_limbo *limbo,
			const struct synchro_request *req)
{
	...
	latch_lock(&limbo->promote_latch);
	...
}

void
txn_limbo_process_commit(struct txn_limbo *limbo,
			 const struct synchro_request *req)
{
	assert(latch_is_locked(&limbo->promote_latch));
	...
	latch_unlock(&limbo->promote_latch);
}

void
txn_limbo_process_rollback(struct txn_limbo *limbo)
{
	assert(latch_is_locked(&limbo->promote_latch));
	latch_unlock(&limbo->promote_latch);
}

and it triggers from

static void
apply_synchro_row_cb(struct journal_entry *entry)
{
	assert(entry->complete_data != NULL);
	struct synchro_entry *synchro_entry =
		(struct synchro_entry *)entry->complete_data;
	if (entry->res < 0) {
		applier_rollback_by_wal_io(entry->res);
-->		txn_limbo_process_rollback(&txn_limbo);
	} else {
		replica_txn_wal_write_cb(synchro_entry->rcb);
-->		txn_limbo_process_commit(&txn_limbo, synchro_entry->req);
		trigger_run(&replicaset.applier.on_wal_write, NULL);
	}
	fiber_wakeup(synchro_entry->owner);
}

I tend to step back to explicit locking/unlocking, so as not to mess with
triggers and completion contexts.

	Cyrill