From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 9 May 2020 10:37:21 +0300
From: Konstantin Osipov
Message-ID: <20200509073721.GA10454@atlas>
References: <80bacfc685233ad047f6a80ddadd72b8903eae5b.1588292014.git.v.shpilevoy@tarantool.org> <20200507224535.GA14285@atlas> <908c2496-cf78-9286-c175-95046d4195f6@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <908c2496-cf78-9286-c175-95046d4195f6@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH vshard 6/7] router: make discovery smoother in a big cluster
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

* Vladislav Shpilevoy [20/05/08 22:57]:

Here's a good description of how a Merkle tree can be used:

https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archAntiEntropyRepair.html

With a Merkle tree, you will only need to transfer the tree itself
and then the ranges which have actually changed.

> On 08/05/2020 00:45, Konstantin Osipov wrote:
> > * Vladislav Shpilevoy [20/05/01 22:16]:
> >
> > Why not use merkle trees to only fetch the changed subset?
>
> This is what I found: https://en.wikipedia.org/wiki/Merkle_tree
> "Every leaf node is labelled with the cryptographic hash of a data
> block, and every non-leaf node is labelled with the cryptographic
> hash in the labels of its child nodes. They can help ensure that
> data blocks received from other peers in a peer-to-peer network
> are received undamaged and unaltered, and even to check that the
> other peers do not lie and send fake blocks."
>
> Correct me if I found something wrong.
>
> Firstly, hashes have nothing to do with that. Discovery fetches
> bucket ids (in ranges, usually). And I still need to fetch bucket
> ids. It can't dehash a value received from the storage into a
> range of buckets.
>
> Secondly, the storage does not depend on the router and can't keep
> the state of every router, if you meant that the storage should
> keep something.
>
> Thirdly, there is no in-place change, which would mean you just need
> to fetch a new version of a bucket from the same storage. A change means
> the bucket was moved to a different replicaset (in 99.99999% of cases):
> deleted from one and added on another. So you need to download it from
> the new place.
>
> Otherwise I probably didn't understand what you meant.

-- 
Konstantin Osipov, Moscow, Russia
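
The idea in the reply above (transfer the tree, then only the ranges whose hashes differ) can be sketched roughly as follows. This is an illustration in Python, not vshard code: the function names, the fixed-size leaf ranges, and the use of SHA-256 are all assumptions made for the example, and both trees are assumed to be built with the same `total` and `leaf_size`.

```python
import hashlib


def leaf_hash(bucket_ids):
    """Hash the sorted bucket ids that fall into one leaf range."""
    data = ",".join(str(b) for b in sorted(bucket_ids)).encode()
    return hashlib.sha256(data).digest()


def build_tree(buckets, total, leaf_size):
    """Return the tree as a list of levels: leaves first, root last.

    `buckets` is the set of bucket ids present on one storage (hypothetical
    input); ids run from 1 to `total`; each leaf covers `leaf_size` ids.
    """
    leaf_count = (total + leaf_size - 1) // leaf_size
    groups = [[] for _ in range(leaf_count)]
    for b in buckets:
        groups[(b - 1) // leaf_size].append(b)
    level = [leaf_hash(g) for g in groups]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its (up to two) children.
        level = [hashlib.sha256(b"".join(level[i:i + 2])).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def changed_ranges(old, new, leaf_size, total):
    """Descend from the root, following only subtrees whose hashes differ.

    Returns (first_id, last_id) for every leaf range that changed.
    """
    if old[-1] == new[-1]:
        return []
    suspects = [0]  # node indexes at the current level, starting at the root
    for depth in range(len(new) - 1, 0, -1):
        lower_old, lower_new = old[depth - 1], new[depth - 1]
        suspects = [child
                    for i in suspects
                    for child in (2 * i, 2 * i + 1)
                    if child < len(lower_new)
                    and lower_old[child] != lower_new[child]]
    return [(i * leaf_size + 1, min((i + 1) * leaf_size, total))
            for i in suspects]


# The router keeps the tree from its previous discovery round and compares
# it with the tree the storage reports now.
before = build_tree(set(range(1, 9)), total=16, leaf_size=4)
after = build_tree((set(range(1, 9)) - {6}) | {14}, total=16, leaf_size=4)
print(changed_ranges(before, after, leaf_size=4, total=16))
```

In this sketch the hashes only locate the ranges that changed; the bucket ids inside those ranges still have to be fetched from the storage. The previous tree is held by the router, so the storage keeps no per-router state.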