[PATCH 3/3] core/fiber: Put watermarks into stack to track its usage

Cyrill Gorcunov gorcunov at gmail.com
Wed Mar 6 22:03:49 MSK 2019


On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote:
...
> > +
> > +static void
> > +fiber_wmark_recycle(struct fiber *fiber)
> > +{
> > +	static bool overflow_warned = false;
> > +
> > +	if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> > +		return;
> > +
> > +#ifndef TARGET_OS_DARWIN
> > +	/*
> > +	 * On recycle we're trying to shrink stack
> > +	 * as much as we can until first mark overwrite
> > +	 * detected, then we simply zap watermark and
> > +	 * assume the stack is balanced and won't change
> > +	 * much in future.
> > +	 */
> 
> I don't understand. If the watermark has been exceeded once for a
> particular fiber, we'll never ever shrink the stack back to normal?

I guess you meant "extend", not "shrink". But true -- on every recycle
we try to shrink the stack, i.e. make it smaller. And once we find
that the wmark has vanished we stop further analysis.
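
To spell out the behaviour in question, here is a rough standalone sketch,
not the patch code. struct fake_fiber, fake_fiber_recycle() and the
pattern bytes are made up for this mail; the real code would release the
pages beyond the mark where the sketch only returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical watermark bytes, chosen only for illustration. */
static const unsigned char wmark_pattern[8] = {
	0xde, 0xad, 0xbe, 0xef, 0xfe, 0xed, 0xfa, 0xce
};

/* Stand-in for struct fiber: only the field the sketch needs. */
struct fake_fiber {
	/* Address of the dynamic watermark, NULL once tracking stopped. */
	unsigned char *dynamic_wmark;
};

/*
 * Return true if the mark is intact, i.e. the stack tail beyond it
 * was not touched and may be released on this recycle. Once the mark
 * is found overwritten, tracking is dropped for good.
 */
static bool
fake_fiber_recycle(struct fake_fiber *f)
{
	assert(f != NULL);
	if (f->dynamic_wmark == NULL)
		return false;
	if (memcmp(f->dynamic_wmark, wmark_pattern,
		   sizeof(wmark_pattern)) == 0)
		return true;
	f->dynamic_wmark = NULL;
	return false;
}
```

Note the last branch: after a single overwrite the fiber is never looked
at again, even if the mark were put back.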

> That is if a task eager for stack is rescheduled on different fibers
> they will all end up with huge stacks. That's not what we want.
> We want the stack size to be set dynamically so that the *majority* of
> fibers don't need to shrink/grow stacks back and forth. Or am I reading
> the code wrong?

No, you're right. But it is still unclear to me how we should reach
this goal. Track stack usage on context switch?
...
> > +
> > +	if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> > +		say_warn("fiber %d seems to overflow the stack soon",
> > +			 fiber->name, fiber->fid);
> 
> No point in printing fiber->name and id - they are printed by the say
> module, anyway. At the same time I'd prefer to keep max stack size here
> and print its value whenever we have to grow stack, similarly to how
> the Linux kernel handles it.

ok
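
Something along these lines, I suppose. A rough sketch only:
stack_max_used and stack_usage_account() are placeholder names, and
fprintf() stands in for the say module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical high-water mark shared by all fibers, in bytes. */
static size_t stack_max_used;

/*
 * Account the stack usage seen for one fiber. Report only when a new
 * maximum is reached, so the log is not flooded on every recycle.
 * Return true if the maximum has grown.
 */
static bool
stack_usage_account(size_t used, size_t stack_size)
{
	assert(used <= stack_size);
	if (used <= stack_max_used)
		return false;
	stack_max_used = used;
	fprintf(stderr, "fiber stack usage reached %zu of %zu bytes\n",
		used, stack_size);
	return true;
}
```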

> > +	if (stack_direction < 0) {
> > +		fiber->stack_overflow_wmark  = fiber->stack;
> > +		fiber->stack_overflow_wmark += wmark_inpage_offset;
> > +
> > +		fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> > +	} else {
> > +		fiber->stack_overflow_wmark  = fiber->stack + fiber->stack_size;
> > +		fiber->stack_overflow_wmark -= page_size;
> > +		fiber->stack_overflow_wmark -= wmark_inpage_offset;
> > +
> > +		fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
> 
> I'd rather start with the minimal allowed stack size and grow it on
> demand, if there are too many fibers that exceed the watermark now and
> then, not vice versa (imagine we start from 16 MB).

This won't work. Imagine we put the wmark in the _first_ page, and then we
find that the wmark has been overwritten. Where do we put the next mark
then? Or do you propose to catch signals?
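
For reference, the placement from the hunk above restated as a standalone
function. It is only a sketch: struct wmarks and wmarks_place() are names
made up for this mail, and the arithmetic mirrors the quoted code, nothing
more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of watermark addresses for one stack. */
struct wmarks {
	char *overflow;	/* mark for overflow detection */
	char *dynamic;	/* mark for usage tracking, one page away */
};

/*
 * Compute both mark addresses the same way the quoted hunk does:
 * the overflow mark sits inpage_offset bytes into the page at the
 * chosen end of the mapping, the dynamic mark exactly one page from
 * it towards the other end.
 */
static struct wmarks
wmarks_place(char *stack, size_t stack_size, size_t page_size,
	     size_t inpage_offset, int stack_direction)
{
	struct wmarks w;

	/* Both marks must fit inside the mapping. */
	assert(inpage_offset < page_size);
	assert(stack_size >= 3 * page_size);

	if (stack_direction < 0) {
		w.overflow = stack + inpage_offset;
		w.dynamic = w.overflow + page_size;
	} else {
		w.overflow = stack + stack_size - page_size - inpage_offset;
		w.dynamic = w.overflow - page_size;
	}
	return w;
}
```

Each mark answers a single yes/no question: was this address written or
not. An overwritten mark does not say how far past it the stack went.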

> > +
> > +	/*
> > +	 * To increase probability of the stack overflow
> > +	 * detection we put _first_ mark at random position
> > +	 * in first 128 bytes range. The rest of the marks
> > +	 * are put with constant step (because we should not
> > +	 * pressure random generator much in case of high
> > +	 * number of fibers).
> > +	 */
> > +	random_bytes((void *)&v, sizeof(v));
> 
> Why not rand?

Because we already have a helper for random numbers :)

> 
> Anyway, I think we should recalculate the offset for each fiber
> individually each time it's recycled. Setting it once on system
> initialization is no better than not using randomization at all.
> As for pressure on the random generator - I don't understand why
> it can possibly be bad. After all, it's a pseudo random generator,
> not /dev/random.

ok
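
Recomputing the offset on every recycle could look roughly like this.
Again a sketch under assumptions: WMARK_RANGE comes from the 128 bytes
mentioned in the comment above, WMARK_SIZE and wmark_random_offset() are
made up, and rand() is used only because it was suggested (the real code
may keep random_bytes()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum {
	/* Range for the first mark, as in the comment above. */
	WMARK_RANGE = 128,
	/* Hypothetical watermark size in bytes. */
	WMARK_SIZE = 8,
};

/*
 * Pick a fresh in-page offset for the first mark. The offset is
 * a multiple of WMARK_SIZE and the whole mark stays inside the
 * first WMARK_RANGE bytes. Meant to be called per fiber on each
 * recycle rather than once at startup.
 */
static size_t
wmark_random_offset(void)
{
	size_t slots = (WMARK_RANGE - WMARK_SIZE) / WMARK_SIZE + 1;
	size_t off = ((size_t)rand() % slots) * WMARK_SIZE;

	assert(off + WMARK_SIZE <= WMARK_RANGE);
	return off;
}
```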

> > @@ -348,6 +348,10 @@ struct fiber {
> >  	struct slab *stack_slab;
> >  	/** Coro stack addr. */
> >  	void *stack;
> > +	/** Stack dynamic watermark addr for usage optimization. */
> > +	void *stack_dynamic_wmark;
> > +	/** Stack watermark addr for overflow detection. */
> > +	void *stack_overflow_wmark;
> 
> Here I'd like to see a more elaborate comment, explaining why we need
> these members, what exactly the stack usage optimization is about.

ok


