[PATCH 3/3] core/fiber: Put watermarks into stack to track its usage

Vladimir Davydov vdavydov.dev at gmail.com
Thu Mar 7 11:27:58 MSK 2019


On Wed, Mar 06, 2019 at 10:03:49PM +0300, Cyrill Gorcunov wrote:
> On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote:
> ...
> > > +
> > > +static void
> > > +fiber_wmark_recycle(struct fiber *fiber)
> > > +{
> > > +	static bool overflow_warned = false;
> > > +
> > > +	if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> > > +		return;
> > > +
> > > +#ifndef TARGET_OS_DARWIN
> > > +	/*
> > > +	 * On recycle we're trying to shrink stack
> > > +	 * as much as we can until first mark overwrite
> > > +	 * detected, then we simply zap watermark and
> > > +	 * assume the stack is balanced and won't change
> > > +	 * much in future.
> > > +	 */
> > 
> > I don't understand. If the watermark has been exceeded once for a
> > particular fiber, we'll never ever shrink the stack back to normal?
> 
> I guess you meant "extend", not "shrink". But true -- every recycle
> we're trying to shrink stack, ie to make it less. And once we found
> that wmark is vanished we stop further analisys.
> 
> > That is if a task eager for stack is rescheduled on different fibers
> > they will all end up with huge stacks. That's not what we want.
> > We want the stack size to be set dynamically so that the *majority* of
> > fibers don't need to shrink/grow stacks back and forth. Or am I reading
> > the code wrong?
> 
> No, you're right. But it is unclean for me yet how we should reach
> this goal? Track stack usage in context switch?

You can track the ratio of fibers that have overwritten the watermark
recently. If the ratio gets too high (say 90% out of 1000 recently
scheduled fibers), the minimal stack allocation should be increased.

> ...
> > > +
> > > +	if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> > > +		say_warn("fiber %d seems to overflow the stack soon",
> > > +			 fiber->name, fiber->fid);
> > 
> > No point in printing fiber->name and id - they are printed by the say
> > module, anyway. At the same time I'd prefer to keep max stack size here
> > and print its value whenever we have to grow stack, similarly to how
> > the Linux kernel handles it.
> 
> ok
> 
> > > +	if (stack_direction < 0) {
> > > +		fiber->stack_overflow_wmark  = fiber->stack;
> > > +		fiber->stack_overflow_wmark += wmark_inpage_offset;
> > > +
> > > +		fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> > > +	} else {
> > > +		fiber->stack_overflow_wmark  = fiber->stack + fiber->stack_size;
> > > +		fiber->stack_overflow_wmark -= page_size;
> > > +		fiber->stack_overflow_wmark -= wmark_inpage_offset;
> > > +
> > > +		fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
> > 
> > I'd rather start with the minimal allowed stack size and grow it on
> > demand, if there are too many fibers that exceed the watermark now and
> > then, not vice versa (imagine we start from 16 MB).
> 
> This won't work. Imagine we put wmark in _first_ page, and then we find
> that the wmark has been overwriten, where to put next mark then? Or
> you propose to catch signals?

Put the next watermark in the second page then. If it gets overwritten,
too, put it in the third page, and so on. Perhaps, we can even double
the minimal stack allocation each time.

> 
> > > +
> > > +	/*
> > > +	 * To increase probability of the stack overflow
> > > +	 * detection we put _first_ mark at random position
> > > +	 * in first 128 bytes range. The rest of the marks
> > > +	 * are put with constant step (because we should not
> > > +	 * pressue random generator much in case of hight
> > > +	 * number of fibers).
> > > +	 */
> > > +	random_bytes((void *)&v, sizeof(v));
> > 
> > Why not rand?
> 
> Because we already have a helper for random numbers :)

It's a helper for filling an array with random bytes. I don't see much
point in using it for generating a single random number.



More information about the Tarantool-patches mailing list