[PATCH v2 11/11] vinyl: introduce quota consumer priorities

Konstantin Osipov kostja at tarantool.org
Sat Oct 6 16:24:05 MSK 2018


* Vladimir Davydov <vdavydov.dev at gmail.com> [18/09/28 21:00]:

Since we agreed to go with the implemented approach, please see a
few implementation comments below.

> Currently, we only limit quota consumption rate so that writers won't
> hit the hard limit before memory dump is complete. However, it isn't
> enough, because we also need to consider compaction: if it doesn't keep
> up with dumps, read and space amplification will grow uncontrollably.
> 
> The problem is compaction may be a quota consumer by itself, as it may
> generate deferred DELETE statements for secondary indexes. We can't
> ignore quota completely there, because if we do, we may hit the memory
> limit and stall all writers, which is unacceptable, but we do want to
> ignore the rate limit imposed to make sure that compaction keeps up with
> dumps, otherwise compaction won't benefit from such a throttling.
> 
> To tackle this problem, this patch introduces quota consumer priorities.
> Now a quota object maintains one rate limit and one wait queue per
> transaction priority. Rate limit i is used by a consumer with priority
> prio if and only if prio <= i. Whenever a consumer has to be throttled,
> it is put to sleep in the wait queue corresponding to its priority.
> When quota is replenished, we pick the oldest consumer among all that
> may be woken up. This ensures fairness.
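For reference, the rule described above (rate limit i applies to a consumer with priority prio iff prio <= i, so higher-priority consumers are subject to fewer limits) can be sketched roughly like this. All names here are hypothetical, not the actual patch code:

```c
#include <stdbool.h>

enum { SKETCH_PRIO_MAX = 2 };

struct quota_sketch {
	/* Bytes still available under rate limit i this interval. */
	long rate_left[SKETCH_PRIO_MAX];
	/* Bytes left before the hard memory limit. */
	long mem_left;
};

static bool
sketch_may_use(const struct quota_sketch *q, int prio, long size)
{
	/* The hard memory limit applies to every consumer. */
	if (size > q->mem_left)
		return false;
	/* Rate limit i throttles this consumer only if prio <= i. */
	for (int i = prio; i < SKETCH_PRIO_MAX; i++)
		if (size > q->rate_left[i])
			return false;
	return true;
}
```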

As discussed, these are not priorities; they are resource types.
Please change the patch to pass a bit mask of the resource types
for which quota is requested.
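A rough sketch of the suggested interface, with made-up names: each request carries a bit mask of the resource types it should be accounted against, so compaction can pass only the memory bit (ignoring the rate limit) while regular writers pass both:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource-type bits, for illustration only. */
enum sketch_resource {
	SKETCH_RESOURCE_MEMORY = 1 << 0, /* hard memory limit */
	SKETCH_RESOURCE_DISK   = 1 << 1, /* dump/compaction rate limit */
};

struct quota_sketch {
	long mem_left;  /* bytes left before the hard memory limit */
	long rate_left; /* bytes left under the rate limit */
};

static bool
sketch_may_use(const struct quota_sketch *q, uint32_t mask, long size)
{
	/* Check only the limits the caller asked to be accounted against. */
	if ((mask & SKETCH_RESOURCE_MEMORY) && size > q->mem_left)
		return false;
	if ((mask & SKETCH_RESOURCE_DISK) && size > q->rate_left)
		return false;
	return true;
}
```

With this shape, adding another resource type means adding a bit, not a queue.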

The quota refill interval should be variable; please schedule this
work as a separate patch.


>  static void
>  vy_quota_signal(struct vy_quota *q)
>  {
> -	if (!rlist_empty(&q->wait_queue)) {
> +	/*
> +	 * Wake up the consumer that has waited longest, no matter
> +	 * whether it is high or low priority. This ensures that
> +	 * high priority consumers don't uncontrollably throttle
> +	 * low priority ones.
> +	 */
> +	struct vy_quota_wait_node *oldest = NULL;
> +	for (int i = 0; i < vy_quota_consumer_prio_MAX; i++) {
> +		struct rlist *wq = &q->wait_queue[i];
> +		if (rlist_empty(wq))
> +			continue;

I still do not understand why you need a second queue and
timestamps. If you need to ensure total fairness, a single queue
should be enough.
> +
>  		struct vy_quota_wait_node *n;
> -		n = rlist_first_entry(&q->wait_queue,
> -				      struct vy_quota_wait_node, in_wait_queue);
> +		n = rlist_first_entry(wq, struct vy_quota_wait_node,
> +				      in_wait_queue);
>  		/*
>  		 * No need in waking up a consumer if it will have
>  		 * to go back to sleep immediately.
>  		 */
> -		if (vy_quota_may_use(q, n->size))
> -			fiber_wakeup(n->fiber);
> +		if (!vy_quota_may_use(q, i, n->size))
> +			continue;
> +
> +		if (oldest == NULL || oldest->timestamp > n->timestamp)
> +			oldest = n;
>  	}
> +	if (oldest != NULL)
> +		fiber_wakeup(oldest->fiber);

This is some kind of black magic to me. Imagine you have another
resource type, say, CPU. Will you add a third queue? What will this
piece of code look like then?

I'm OK if you go with a single queue in which you register all
requests, regardless of whether they are for disk, memory, or both,
and satisfy all requests in vy_quota_signal(). But I don't
immediately understand the magic with two wait queues, and frankly
I still don't see any need for it.
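The single-queue alternative could look roughly like this: every waiter, regardless of which resources it asked for, is appended to one FIFO list, and on replenishment waiters are woken strictly in arrival order, so no per-queue timestamps are needed. Names are made up for illustration; the awake flag stands in for fiber_wakeup():

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_wait_node {
	long size;                     /* requested quota, in bytes */
	bool awake;                    /* set instead of fiber_wakeup() */
	struct sketch_wait_node *next; /* next-oldest waiter */
};

struct quota_sketch {
	long mem_left;                 /* bytes currently available */
	struct sketch_wait_node *head; /* oldest waiter first */
};

static void
sketch_signal(struct quota_sketch *q)
{
	/*
	 * Wake waiters in arrival order and stop at the first one
	 * whose request cannot be satisfied, so nobody is woken up
	 * only to go back to sleep immediately.
	 */
	while (q->head != NULL && q->head->size <= q->mem_left) {
		q->head->awake = true;
		q->mem_left -= q->head->size; /* reserve for the woken fiber */
		q->head = q->head->next;
	}
}
```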
> +	/**
> +	 * Timestamp assigned to this fiber when it was put to
> +	 * sleep, see vy_quota::wait_timestamp for more details.
> +	 */
> +	int64_t timestamp;

If you have a single queue, you don't need this. The name is too
general in any case.

>  };
>  
>  /**
> @@ -144,13 +186,27 @@ struct vy_quota {
>  	 */
>  	vy_quota_exceeded_f quota_exceeded_cb;
>  	/**
> -	 * Queue of consumers waiting for quota, linked by
> -	 * vy_quota_wait_node::state. Newcomers are added
> -	 * to the tail.
> +	 * Monotonically growing timestamp assigned to consumers
> +	 * waiting for quota. It is used for balancing wakeups
> +	 * among wait queues: if two fibers from different wait
> +	 * queues may proceed, the one with the lowest timestamp
> +	 * will be picked.
> +	 *
> +	 * See also vy_quota_wait_node::timestamp.
> +	 */
> +	int64_t wait_timestamp;

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov


