Tarantool development patches archive
 help / color / mirror / Atom feed
From: Konstantin Osipov <kostja@tarantool.org>
To: tarantool-patches@freelists.org
Cc: georgy@tarantool.org
Subject: [tarantool-patches] Re: [replication 1/1] replication: Add rfc on vclock implementation
Date: Thu, 17 May 2018 14:46:53 +0300	[thread overview]
Message-ID: <20180517114653.GA9161@atlas> (raw)
In-Reply-To: <b58de1bce01ec6584ce8424a2a7769c3d6ba20c8.1526546132.git.imarkov@tarantool.org>

* Ilya Markov <imarkov@tarantool.org> [18/05/17 12:09]:
> Add description of possible redesigning of vector clocks.
> ---
>  doc/rfc/vclock_struct.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++

Please rename the rfc following the naming convention. E.g.
gh-XXX-vclock-struct

>  1 file changed, 73 insertions(+)
>  create mode 100644 doc/rfc/vclock_struct.md
> 
> diff --git a/doc/rfc/vclock_struct.md b/doc/rfc/vclock_struct.md
> new file mode 100644
> index 0000000..417caa9
> --- /dev/null
> +++ b/doc/rfc/vclock_struct.md
> @@ -0,0 +1,73 @@
> +# RFC Vclock implementation design
> +
> +* **Status**: in progress
> +* **Start date**: 16-05-2018
> +* **Authors**: Ilya Markov @IlyaMarkovMipt \<imarkov@tarantool.org\>
> +* **Issues**:
> +
> +## Summary
> +
> +Overview of possible implementations of vector clocks in large scale replicasets.
> +
> +## Background and motivation
> +
> +Vector clocks are used for following states(more exactly LSN) of nodes in replicasets.
> +Currently, the clocks are implemented with static arrays with size limited by constant `VCLOCK_MAX`
> +Indices of the array represent replica identifier in replicaset, value is LSN.
> +In a large scale environment array is far from the best implementation in terms of time and memory consumption.
> +
> +The main problem here is that within large scale nodes may be added and deleted and the array may contain
> + large gaps. So most of memory space might turn out to be useless.
> +
> +For example, in star topology, one replica has fully filled vclock,
> + others have large arrays with only two valuable for them cells.
> +
> +## Ideas
> +The new design must address the following requirements:
> +1. Minimize memory consumption within constantly changing replicaset.
> +2. Fast vector clock comparison, following taking into account frequent updated nodes.
> +
> +### Tree
> +As a possible solution to address the gap problem is to use a tree.
> + The tree allocates nodes only for non-empty values. So memory usage in this case is minimized.
> + Comparison and vclock following would take O(N), N -size of replicaset.
> +This time complexity is the same as in implementation with static array but with worse constant.
> +
> +Though operations like set and get take O(logN) instead constant time in array.
> +As we can notice vclock_get is highly used with replica ids, which are written in logs.
> +Under assumption that number of writing replicas is less than the size of replicaset,
> +the problem with vclock_get may be solved with some fixed size cache in front of tree,
> + which will contain frequently replicas lsns.
> +
> +### Remapping with garbage collecting
> +Another approach addressing gap problem is shifting replica id to the start of vclock array,
> +getting rid of gaps.
> +
> +This idea helps avoiding gaps and simplifies comparison, setting vector clocks.
> +On the other hand, it requires dedicated calls which follow the state of vclock and shift it, when gaps are found.
> +Also the shift requires remapping of replica identifiers which also costs something in terms of memory and time consumption.
> +
> +### Paging
> +Allocate fixed size arrays for ranges of ids and store references to them in hash/tree index.
> +For example, we have several ranges of ids: 1-10, 65-100. Let's assume size of each array is 32.
> +For this set, there would 3 ranges: 1-32, 65-96, 97-128. The index would contain 3 records, which could be get by 1, 65, 97 respectively.
> +
> +In this approach gaps are limited to the certain size, there is no need in shifting.
> +Copying and comparison are almost the same as in approach with static size array.
> +
> +### Skip-lists
> +One more possible solution to gap problem may be lists.
> +But, as we need to index sometimes, we can use skip-lists, which in terms of time complexity of indexing are almost the same as trees.
> +Moreover, traversing lists is faster than trees.
> +
> +Bad side of the idea is that it consumes memory excessively.
> +
> +## Conclusion
> +The most easiest to implement solution is a tree. Nevertheless, it needs an optimizations for vclock_get.
> +
> +The paging looks like an approach which solves the current problem with gaps and doesn't create new problems or complexities.
> +
> +The shifting with remapping looks the worst one to my mind, mostly because of its difficulty
> +and generating new maintaining processes(e.g remapping) and, therefore, new possible problems.
> +
> +Skip-lists are just one of variations of trees, but with extra memory consumption.
> -- 
> 2.7.4
> 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

  reply	other threads:[~2018-05-17 11:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-17  8:36 [tarantool-patches] " Ilya Markov
2018-05-17 11:46 ` Konstantin Osipov [this message]
2018-05-17 14:36 ` [tarantool-patches] " Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180517114653.GA9161@atlas \
    --to=kostja@tarantool.org \
    --cc=georgy@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [replication 1/1] replication: Add rfc on vclock implementation' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox