From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Oct 2018 12:03:46 +0300
From: Konstantin Osipov
Subject: [tarantool-patches] Re: [PATCH 3/3] vinyl: force major compaction if there are too many DELETEs
Message-ID: <20181023090346.GC17670@chai>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Reply-To: tarantool-patches@freelists.org
List-Id: tarantool-patches
To: tarantool-patches@freelists.org

* Vladimir Davydov [18/10/20 23:19]:
> Even a perfectly shaped LSM tree can accumulate a huge number of DELETE
> statements over time in case indexed fields are frequently updated. This
> can significantly increase read and space amplification, especially for
> secondary indexes.
>
> One way to deal with it is to propagate read amplification back to the
> scheduler so that it can raise compaction priority accordingly.
> Although this would probably make sense, it wouldn't be enough, because it
> wouldn't deal with space amplification growth in case the workload is
> write-mostly.

I disagree with the reasoning. We need a weighted norm of all
parameters of the LSM tree when calculating compaction priority.
It's pretty easy to do. Imagine a multi-dimensional space whose
dimensions are write amplification, read amplification, and space
amplification. We need to scale each dimension and calculate the
distance to the center of the space, which stands for a perfectly
shaped LSM tree.

In any case, reducing read amplification and reducing space
amplification address independent concerns: we need to ensure that
space amplification stays within bounds so that we do not run out of
disk space.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
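
A minimal sketch of the weighted-norm idea described in the reply above, assuming
the "center of the space" is a point of ideal amplification values and the
norm is Euclidean. The names IDEAL, WEIGHTS and compaction_priority, and all
the numbers, are hypothetical and do not come from the vinyl scheduler.

```python
import math

# Hypothetical amplification values of a perfectly shaped LSM tree
# (the "center of the space"). Assumed for illustration only.
IDEAL = {"write": 1.0, "read": 1.0, "space": 1.0}

# Hypothetical per-dimension scale factors. Space is weighted higher here
# only to show how one concern can be made to dominate.
WEIGHTS = {"write": 1.0, "read": 1.0, "space": 2.0}


def compaction_priority(amp, ideal=IDEAL, weights=WEIGHTS):
    """Return the weighted Euclidean distance of `amp` from `ideal`.

    A dimension that is at or below its ideal value contributes nothing,
    so only excess amplification raises the priority.
    """
    total = 0.0
    for dim, target in ideal.items():
        excess = max(amp[dim] - target, 0.0)
        total += (weights[dim] * excess) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    # Read excess 3 (weight 1) and space excess 2 (weight 2):
    # sqrt(3^2 + 4^2) = 5.0
    print(compaction_priority({"write": 1.0, "read": 4.0, "space": 3.0}))
```

With a single scalar like this the scheduler could rank trees by distance
from the ideal shape. A separate hard limit on space amplification would
still be needed to address the disk-space concern on its own.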