From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vdavydov.dev@gmail.com>
Date: Sat, 9 Jun 2018 15:36:02 +0300
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] [PATCH v2 1/2] box: create bigrefs for tuples
Message-ID: <20180609123602.3wfqvq5abodoeyqg@esperanza>
References: <cover.1528544705.git.imeevma@gmail.com>
 <6601bc3bb229282c7489ad6391c93319c5ff0366.1528544705.git.imeevma@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6601bc3bb229282c7489ad6391c93319c5ff0366.1528544705.git.imeevma@gmail.com>
To: imeevma@tarantool.org
Cc: tarantool-patches@freelists.org
List-ID: <tarantool-patches.dev.tarantool.org>

On Sat, Jun 09, 2018 at 02:49:06PM +0300, imeevma@tarantool.org wrote:
> Due to limitation of reference counters for tuple being only
> 65535 it was possible to reach this limitation. This patch
> increases capacity of reference counters to 4 billions.
> 
> Closes #3224
> ---

> Branch: https://github.com/tarantool/tarantool/tree/imeevma/gh-3224-tuple-bigrefs
> Issue: https://github.com/tarantool/tarantool/issues/3224

If there's a cover letter, links should be added only there, not to
individual patches.

> @@ -174,20 +174,22 @@ process_rw(struct request *request, struct space *space, struct tuple **result)
>  		txn_rollback_stmt();
>  		return -1;
>  	}
> +	if (result == NULL)
> +		return txn_commit_stmt(txn, request);
> +	*result = tuple;
> +	if (tuple == NULL)
> +		return txn_commit_stmt(txn, request);
>  	/*
>  	 * Pin the tuple locally before the commit,
>  	 * otherwise it may go away during yield in
>  	 * when WAL is written in autocommit mode.
>  	 */
> -	TupleRefNil ref(tuple);
> -	if (txn_commit_stmt(txn, request) != 0)
> -		return -1;
> -	if (result != NULL) {
> -		if (tuple != NULL && tuple_bless(tuple) == NULL)
> -			return -1;
> -		*result = tuple;
> -	}
> -	return 0;
> +	tuple_ref(tuple);
> +	int rc = txn_commit_stmt(txn, request);
> +	if (rc == 0)
> +		tuple_bless(tuple);
> +	tuple_unref(tuple);
> +	return rc;

I don't like they you rework process_rw. IMO calling txn_commit_stmt
conditionally obfuscates the function work flow, which should be as
simple as

    begin_stmt
    execute_request
    commit_stmt

> diff --git a/src/box/tuple.c b/src/box/tuple.c
> index 014f374..68540f4 100644
> --- a/src/box/tuple.c
> +++ b/src/box/tuple.c
> @@ -48,6 +48,26 @@ enum {
>  	OBJSIZE_MIN = 16,
>  };
>  
> +/**
> + * Container for big reference counters. Contains array of big
> + * reference counters, size of this array and number of non-zero
> + * big reference counters. When reference counter of tuple becomes
> + * more than 32767, field refs of this tuple becomes index of big
> + * reference counter in big reference counter array and field
> + * is_bigref is set true. The moment big reference becomes equal
> + * 32767 it is set to 0, refs of the tuple becomes 32767 and
> + * is_bigref becomes false. Big reference counter can be equal to
> + * 0 or be more than 32767.
> + */
> +static struct bigref_list {
> +	/** Array of big reference counters. */
> +	uint32_t *refs;
> +	/** Number of non-zero elements in the array. */
> +	uint16_t size;
> +	/** Capacity of the array. */
> +	uint16_t capacity;
> +} bigref_list;

Technically, it isn't a list, but rather an array. Anyway, I think I'd
call this structure simply 'bigref', and functions for allocating and
freeing big references - 'bigref_alloc' and 'bigref_free'.

> +/**
> + * Return index for new big reference counter and allocate memory
> + * if needed.
> + * @retval index for new big reference counter.
> + */
> +static inline uint16_t
> +bigref_list_new_index()
> +{
> +	if (bigref_list.size < bigref_list.capacity) {
> +		uint16_t vacant_index = 0;
> +		while (bigref_list.refs[vacant_index] != 0)
> +			++vacant_index;
> +		++bigref_list.size;
> +		return vacant_index;
> +	}

Why don't you use a free list for this? This would improve the
worst case complexity of a bigref allocation from O(N) to O(1).

> +	/* Extend the array. */
> +	uint16_t capacity = bigref_list.capacity;
> +	if (capacity == 0)
> +		capacity = BIGREF_MIN_CAPACITY;
> +	else if (capacity < BIGREF_MAX_CAPACITY)
> +		capacity = MIN(capacity * BIGREF_FACTOR, BIGREF_MAX_CAPACITY);
> +	else
> +		panic("Too many big references");
> +	uint32_t *refs = (uint32_t *) realloc(bigref_list.refs, capacity *
> +					      sizeof(*bigref_list.refs));
> +	if (refs == NULL) {
> +		panic("failed to reallocate %zu bytes: Cannot allocate "\
> +		      "memory.", capacity * sizeof(*bigref_list.refs));
> +	}

The max size of a bigref array is 32767 ints, i.e. just 128 KB, and
there's the only object of this type. Is there any point in growing it
dynamically rather than allocating the max size statically on startup?

> +	bigref_list.refs = refs;
> +	memset(bigref_list.refs + bigref_list.capacity, 0, (capacity -
> +	       bigref_list.capacity) * sizeof(*bigref_list.refs));
> +	bigref_list.capacity = capacity;
> +	return bigref_list.size++;
> +}
> +
> +void
> +tuple_ref_slow(struct tuple *tuple)
> +{
> +	assert(tuple->is_bigref || tuple->refs == TUPLE_REF_MAX);
> +	if (! tuple->is_bigref) {
> +		tuple->ref_index = bigref_list_new_index();
> +		tuple->is_bigref = true;
> +		bigref_list.refs[tuple->ref_index] = TUPLE_REF_MAX;
> +	} else if (bigref_list.refs[tuple->ref_index] == BIGREF_MAX) {
> +		panic("Tuple big reference counter overflow");
> +	}
> +	bigref_list.refs[tuple->ref_index]++;
> +}
> +
> +/**
> + * Try to decrease allocated memory if it is possible. Free memory
> + * when size == 0.
> + */
> +static inline void
> +bigref_list_delete_index(uint16_t index)
> +{
> +	bigref_list.refs[index] = 0;
> +	if (--bigref_list.size == 0) {
> +		bigref_list_reset();
> +		return;
> +	}
> +	if (bigref_list.capacity == BIGREF_MIN_CAPACITY ||
> +	    bigref_list.size > bigref_list.capacity / BIGREF_FACTOR)
> +		return;
> +
> +	uint16_t top_index = bigref_list.capacity - 1;
> +	while (bigref_list.refs[top_index] == 0)
> +		top_index--;
> +
> +	uint16_t needed_capacity = top_index + 1;
> +	if (needed_capacity < BIGREF_MIN_CAPACITY)
> +		needed_capacity = BIGREF_MIN_CAPACITY;
> +	if (needed_capacity > bigref_list.capacity / BIGREF_FACTOR)
> +		return;
> +	/* Round up capacity to the next highest power of 2. */
> +	assert(sizeof(needed_capacity) == sizeof(uint16_t));
> +	needed_capacity--;
> +	needed_capacity |= needed_capacity >> 1;
> +	needed_capacity |= needed_capacity >> 2;
> +	needed_capacity |= needed_capacity >> 4;
> +	needed_capacity |= needed_capacity >> 8;
> +	assert(needed_capacity < UINT16_MAX);
> +	needed_capacity++;
> +	uint32_t *refs =
> +		(uint32_t *) realloc(bigref_list.refs, needed_capacity *
> +				     sizeof(*bigref_list.refs));
> +	if (refs == NULL) {
> +		panic("failed to reallocate %zu bytes: Cannot allocate "\
> +		      "memory.", needed_capacity * sizeof(*bigref_list.refs));
> +	}

Even if we agree to grow the bigref array dynamically, I see absolutely
no point in shrinking it: if we used N big refs once, we are likely to
use N big refs again in the future.

> +	bigref_list.refs = refs;
> +	bigref_list.capacity = needed_capacity;
> +}