[tarantool-patches] Re: [PATCH 6/9] vinyl: set range size automatically

Konstantin Osipov kostja at tarantool.org
Tue Feb 5 20:09:09 MSK 2019


* Vladimir Davydov <vdavydov.dev at gmail.com> [19/01/21 06:58]:
> +int64_t
> +vy_lsm_range_size(struct vy_lsm *lsm)
> +{
> +	/* Use the configured range size if available. */
> +	if (lsm->opts.range_size > 0)
> +		return lsm->opts.range_size;
> +	/*
> +	 * It doesn't make much sense to create too small ranges.
> +	 * Limit the max number of ranges per index to 1000 and
> +	 * never create ranges smaller than 16 MB.
> +	 */
> +	enum { MIN_RANGE_SIZE = 16 * 1024 * 1024 };
> +	enum { MAX_RANGE_COUNT = 1000 };
> +	/*
> +	 * Ideally, we want to compact roughly the same amount of
> +	 * data after each dump so as to avoid IO bursts caused by
> +	 * simultaneous major compaction of a bunch of ranges,
> +	 * because such IO bursts can lead to a deviation of the
> +	 * LSM tree from the configured shape and, as a result,
> +	 * increased read amplification.  To achieve that, we need
> +	 * to have at least as many ranges as the number of dumps
> +	 * it takes to trigger major compaction in a range.
> +	 */
> +	int range_count = vy_lsm_dumps_per_compaction(lsm);
> +	range_count = MIN(range_count, MAX_RANGE_COUNT);
> +	int64_t range_size = lsm->stat.disk.last_level_count.bytes /
> +						(range_count + 1);
> +	range_size = MAX(range_size, MIN_RANGE_SIZE);
> +	return range_size;
> +}
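
For reference, the arithmetic above reduces to the following standalone sketch. `auto_range_size()` is a hypothetical name, and the LSM state is collapsed into two plain parameters (last-level size in bytes and the number of dumps per major compaction) instead of `struct vy_lsm`:

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

enum { MIN_RANGE_SIZE = 16 * 1024 * 1024 };
enum { MAX_RANGE_COUNT = 1000 };

/*
 * Hypothetical stand-in for vy_lsm_range_size() with no configured
 * range_size: split the last level into (range_count + 1) pieces,
 * where range_count is capped at MAX_RANGE_COUNT, and never go
 * below MIN_RANGE_SIZE.
 */
static int64_t
auto_range_size(int64_t last_level_bytes, int dumps_per_compaction)
{
	int range_count = MIN(dumps_per_compaction, MAX_RANGE_COUNT);
	int64_t range_size = last_level_bytes / (range_count + 1);
	return MAX(range_size, (int64_t)MIN_RANGE_SIZE);
}
```

E.g. a 1 GB last level with 3 dumps per compaction yields 1 GB / 4 = 256 MB ranges, while a 100 MB last level with 9 dumps per compaction yields 10 MB, which gets clamped up to 16 MB.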

OK, you could say the value is rarely used, so it can be calculated
each time it is needed. But why not instead recalculate it on each
major compaction and cache it? This would spare us the technical
debt and save us from having to think about a potential performance
bottleneck in the future.
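
A rough sketch of what I mean, assuming a hypothetical trimmed-down `struct lsm_state` and a hypothetical hook invoked after major compaction (neither exists in the patch as is): the range size is cached and recomputed only when major compaction changes the last level, so the accessor becomes O(1).

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

enum { MIN_RANGE_SIZE = 16 * 1024 * 1024 };
enum { MAX_RANGE_COUNT = 1000 };

/* Hypothetical, trimmed-down LSM state; not the real struct vy_lsm. */
struct lsm_state {
	int64_t opts_range_size;   /* configured range_size, 0 if unset */
	int64_t last_level_bytes;  /* size of the last LSM level */
	int dumps_per_compaction;  /* dumps it takes to trigger major compaction */
	int64_t cached_range_size; /* recomputed on each major compaction */
};

/* Recompute the cached range size from the current LSM shape. */
static void
lsm_update_range_size(struct lsm_state *lsm)
{
	int range_count = MIN(lsm->dumps_per_compaction, MAX_RANGE_COUNT);
	int64_t size = lsm->last_level_bytes / (range_count + 1);
	lsm->cached_range_size = MAX(size, (int64_t)MIN_RANGE_SIZE);
}

/* O(1) accessor: the configured value wins, else the cached estimate. */
static int64_t
lsm_range_size(const struct lsm_state *lsm)
{
	if (lsm->opts_range_size > 0)
		return lsm->opts_range_size;
	return lsm->cached_range_size;
}

/* Hypothetical hook: major compaction produced a new last level. */
static void
lsm_on_major_compaction(struct lsm_state *lsm, int64_t new_last_level_bytes)
{
	lsm->last_level_bytes = new_last_level_bytes;
	lsm_update_range_size(lsm);
}
```

The cached value still has to be initialized once when the index is loaded (e.g. by calling `lsm_update_range_size()`); otherwise it reads as 0 until the first major compaction.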

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
