[PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
Cyrill Gorcunov
gorcunov at gmail.com
Wed Mar 6 22:03:49 MSK 2019
On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote:
...
> > +
> > +static void
> > +fiber_wmark_recycle(struct fiber *fiber)
> > +{
> > + static bool overflow_warned = false;
> > +
> > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> > + return;
> > +
> > +#ifndef TARGET_OS_DARWIN
> > + /*
> > + * On recycle we're trying to shrink stack
> > + * as much as we can until first mark overwrite
> > + * detected, then we simply zap watermark and
> > + * assume the stack is balanced and won't change
> > + * much in future.
> > + */
>
> I don't understand. If the watermark has been exceeded once for a
> particular fiber, we'll never ever shrink the stack back to normal?
I guess you meant "extend", not "shrink". But true -- every recycle
we're trying to shrink stack, ie to make it less. And once we found
that wmark is vanished we stop further analisys.
> That is if a task eager for stack is rescheduled on different fibers
> they will all end up with huge stacks. That's not what we want.
> We want the stack size to be set dynamically so that the *majority* of
> fibers don't need to shrink/grow stacks back and forth. Or am I reading
> the code wrong?
No, you're right. But it is unclean for me yet how we should reach
this goal? Track stack usage in context switch?
...
> > +
> > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> > + say_warn("fiber %d seems to overflow the stack soon",
> > + fiber->name, fiber->fid);
>
> No point in printing fiber->name and id - they are printed by the say
> module, anyway. At the same time I'd prefer to keep max stack size here
> and print its value whenever we have to grow stack, similarly to how
> the Linux kernel handles it.
ok
> > + if (stack_direction < 0) {
> > + fiber->stack_overflow_wmark = fiber->stack;
> > + fiber->stack_overflow_wmark += wmark_inpage_offset;
> > +
> > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> > + } else {
> > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size;
> > + fiber->stack_overflow_wmark -= page_size;
> > + fiber->stack_overflow_wmark -= wmark_inpage_offset;
> > +
> > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
>
> I'd rather start with the minimal allowed stack size and grow it on
> demand, if there are too many fibers that exceed the watermark now and
> then, not vice versa (imagine we start from 16 MB).
This won't work. Imagine we put wmark in _first_ page, and then we find
that the wmark has been overwriten, where to put next mark then? Or
you propose to catch signals?
> > +
> > + /*
> > + * To increase probability of the stack overflow
> > + * detection we put _first_ mark at random position
> > + * in first 128 bytes range. The rest of the marks
> > + * are put with constant step (because we should not
> > + * pressue random generator much in case of hight
> > + * number of fibers).
> > + */
> > + random_bytes((void *)&v, sizeof(v));
>
> Why not rand?
Because we already have a helper for random numbers :)
>
> Anyway, I think we should recalculate the offset for each fiber
> individually each time it's recycled. Setting it once on system
> initialization is no better than not using randomization at all.
> As for pressure on the random generator - I don't understand why
> it can possibly be bad. After all, it's a pseudo random generator,
> not /dev/random.
ok
> > @@ -348,6 +348,10 @@ struct fiber {
> > struct slab *stack_slab;
> > /** Coro stack addr. */
> > void *stack;
> > + /** Stack dynamic watermark addr for usage optimization. */
> > + void *stack_dynamic_wmark;
> > + /** Stack watermark addr for overflow detection. */
> > + void *stack_overflow_wmark;
>
> Here I'd like to see a more elaborate comment, explaining why we need
> these members, what exactly the stack usage optimization is about.
ok
More information about the Tarantool-patches
mailing list