* [PATCH v3 0/3] core/fiber: Rework stack management @ 2019-03-05 22:38 Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov Guys, please take a look once time permits. Deep testing is still needed, though I have no clue how to force Lua to use the stack intensively. Cyrill Gorcunov (3): core/fiber: Increase default stack size core/fiber: Mark stack as unneeded on creation core/fiber: Put watermarks into stack to track its usage src/lib/core/fiber.c | 200 ++++++++++++++++++++++++++++++++++++++++++- src/lib/core/fiber.h | 4 + 2 files changed, 200 insertions(+), 4 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] core/fiber: Increase default stack size 2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov @ 2019-03-05 22:38 ` Cyrill Gorcunov 2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko 2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov 2 siblings, 1 reply; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov The default 64K stack size used for years has become too small for modern distros (Fedora 29, etc.) where third-party libraries (such as ncurses) started to use 64K for their own buffers, and we get SIGSEGV early, before reaching the interactive console phase. Thus we increase the default size up to 256K, which should fit the common case. Later we will make this value configurable to address arbitrary stack sizes without the need to rebuild the code. Note the values are switched to 4K page granularity for the sake of future modifications -- we are going to manipulate pages to reduce RSS usage if the OS allows. Closes #3418 --- src/lib/core/fiber.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index abd6c6b11..d4d16c05e 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -92,10 +92,10 @@ static size_t page_size; static int stack_direction; enum { - /* The minimum allowable fiber stack size in bytes */ - FIBER_STACK_SIZE_MINIMAL = 16384, - /* Default fiber stack size in bytes */ - FIBER_STACK_SIZE_DEFAULT = 65536 + /* The minimum allowable fiber stack size in 4K pages */ + FIBER_STACK_SIZE_MINIMAL = 4 << 12, + /* Default fiber stack size in 4K pages */ + FIBER_STACK_SIZE_DEFAULT = 64 << 12, }; /** Default fiber attributes */ -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size 2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov @ 2019-03-05 23:43 ` Alexander Turenko 2019-03-06 7:09 ` Cyrill Gorcunov 2019-03-06 7:30 ` Konstantin Osipov 0 siblings, 2 replies; 12+ messages in thread From: Alexander Turenko @ 2019-03-05 23:43 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches, vdavydov.dev, kostja On Wed, Mar 06, 2019 at 01:38:52AM +0300, Cyrill Gorcunov wrote: > The default 64K stack size used for years has become too small > for modern distros (Fedora 29, etc.) where third-party libraries > (such as ncurses) started to use 64K for their own buffers, and we get > SIGSEGV early, before reaching the interactive console phase. > > Thus we increase the default size up to 256K, which should fit > the common case. Later we will make this value configurable > to address arbitrary stack sizes without the need to rebuild > the code. > > Note the values are switched to 4K page granularity for the sake > of future modifications -- we are going to manipulate pages to > reduce RSS usage if the OS allows. > > Closes #3418 > --- > src/lib/core/fiber.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index abd6c6b11..d4d16c05e 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -92,10 +92,10 @@ static size_t page_size; > static int stack_direction; > > enum { > - /* The minimum allowable fiber stack size in bytes */ > - FIBER_STACK_SIZE_MINIMAL = 16384, > - /* Default fiber stack size in bytes */ > - FIBER_STACK_SIZE_DEFAULT = 65536 > + /* The minimum allowable fiber stack size in 4K pages */ > + FIBER_STACK_SIZE_MINIMAL = 4 << 12, > + /* Default fiber stack size in 4K pages */ > + FIBER_STACK_SIZE_DEFAULT = 64 << 12, I recently looked at #3564 and observed that 256 KiB of stack is not enough for our http client (the wrapper around libcurl). 
I bisected how much it requires and found that it fails with 448 KiB, but succeeds with 480 KiB. I just tried the following case with different FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine: ./src/tarantool <<< "print(require('json').encode(require('http.client').get('http://google.com')))" Tarantool is linked to libcurl-7.61.0: $ otool -L ./src/tarantool | grep libcurl /usr/local/opt/curl/lib/libcurl.4.dylib (compatibility version 10.0.0, current version 10.0.0) $ strings /usr/local/opt/curl/lib/libcurl.4.dylib | grep libcurl/ libcurl/7.61.0 However this case succeeds on my Linux machine on tarantool with a 64 KiB stack with libcurl-7.63.0. I'm not sure what matters: the libcurl version, the OS, or some compilation or runtime options of libcurl. But at least we see that there are environments where 256 KiB is not enough. Maybe we should investigate what led libcurl to use so much stack in that environment and possibly find a way to configure it (at runtime) to avoid this. I'm not sure. So this message is just for the record. https://github.com/tarantool/tarantool/issues/3564 > }; > > /** Default fiber attributes */ > -- > 2.20.1 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size 2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko @ 2019-03-06 7:09 ` Cyrill Gorcunov 2019-03-06 7:30 ` Konstantin Osipov 1 sibling, 0 replies; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-06 7:09 UTC (permalink / raw) To: Alexander Turenko; +Cc: tarantool-patches, vdavydov.dev, kostja On Wed, Mar 06, 2019 at 02:43:37AM +0300, Alexander Turenko wrote: ... > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > > index abd6c6b11..d4d16c05e 100644 > > --- a/src/lib/core/fiber.c > > +++ b/src/lib/core/fiber.c > > @@ -92,10 +92,10 @@ static size_t page_size; > > static int stack_direction; > > > > enum { > > - /* The minimum allowable fiber stack size in bytes */ > > - FIBER_STACK_SIZE_MINIMAL = 16384, > > - /* Default fiber stack size in bytes */ > > - FIBER_STACK_SIZE_DEFAULT = 65536 > > + /* The minimum allowable fiber stack size in 4K pages */ > > + FIBER_STACK_SIZE_MINIMAL = 4 << 12, > > + /* Default fiber stack size in 4K pages */ > > + FIBER_STACK_SIZE_DEFAULT = 64 << 12, > > I recently looked at #3564 and observed that 256 KiB of stack is not > enough for our http client (the wrapper around libcurl). I bisected how > much it requires and found that it fails with 448 KiB, but succeeds with > 480 KiB. I just tried the following case with different > FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine: > > ./src/tarantool <<< "print(require('json').encode(require('http.client').get('http://google.com')))" > > Tarantool is linked to libcurl-7.61.0: > > $ otool -L ./src/tarantool | grep libcurl > /usr/local/opt/curl/lib/libcurl.4.dylib (compatibility version 10.0.0, current version 10.0.0) > $ strings /usr/local/opt/curl/lib/libcurl.4.dylib | grep libcurl/ > libcurl/7.61.0 You know, I think this catch-up game will continue forever, because we simply don't know in which environments users will run tarantool. 
As discussed with Vladimir, we must provide a way for a user to configure the stack size. Most probably via an environment variable. I simply didn't implement it yet. Also, initially we started with a 1M stack but then decided to shrink back to 256K. Looks like 512K would satisfy all sides. > However this case succeeds on my Linux machine on tarantool with a 64 KiB > stack with libcurl-7.63.0. I'm not sure what matters: the libcurl version, > the OS, or some compilation or runtime options of libcurl. But at least we see > that there are environments where 256 KiB is not enough. > > Maybe we should investigate what led libcurl to use so much stack > in that environment and possibly find a way to configure it (at > runtime) to avoid this. I'm not sure. So this message is just for the > record. > > https://github.com/tarantool/tarantool/issues/3564 libcurl etc. are third-party libs which we don't control, so we could of course investigate, but only once time permits. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size 2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko 2019-03-06 7:09 ` Cyrill Gorcunov @ 2019-03-06 7:30 ` Konstantin Osipov 1 sibling, 0 replies; 12+ messages in thread From: Konstantin Osipov @ 2019-03-06 7:30 UTC (permalink / raw) To: Alexander Turenko; +Cc: Cyrill Gorcunov, tarantool-patches, vdavydov.dev * Alexander Turenko <alexander.turenko@tarantool.org> [19/03/06 10:24]: Alexander, we plan to make the default stack size configurable. > On Wed, Mar 06, 2019 at 01:38:52AM +0300, Cyrill Gorcunov wrote: > > The default 64K stack size used for years has become too small > > for modern distros (Fedora 29, etc.) where third-party libraries > > (such as ncurses) started to use 64K for their own buffers, and we get > > SIGSEGV early, before reaching the interactive console phase. > > > > Thus we increase the default size up to 256K, which should fit > > the common case. Later we will make this value configurable > > to address arbitrary stack sizes without the need to rebuild > > the code. > > > > Note the values are switched to 4K page granularity for the sake > > of future modifications -- we are going to manipulate pages to > > reduce RSS usage if the OS allows. 
> > > > Closes #3418 > > --- > > src/lib/core/fiber.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > > index abd6c6b11..d4d16c05e 100644 > > --- a/src/lib/core/fiber.c > > +++ b/src/lib/core/fiber.c > > @@ -92,10 +92,10 @@ static size_t page_size; > > static int stack_direction; > > > > enum { > > - /* The minimum allowable fiber stack size in bytes */ > > - FIBER_STACK_SIZE_MINIMAL = 16384, > > - /* Default fiber stack size in bytes */ > > - FIBER_STACK_SIZE_DEFAULT = 65536 > > + /* The minimum allowable fiber stack size in 4K pages */ > > + FIBER_STACK_SIZE_MINIMAL = 4 << 12, > > + /* Default fiber stack size in 4K pages */ > > + FIBER_STACK_SIZE_DEFAULT = 64 << 12, > > I recently looked at #3564 and observed that 256 KiB of stack is not > enough for our http client (the wrapper around libcurl). I bisected how > much it requires and found that it fails with 448 KiB, but succeeds with > 480 KiB. I just tried the following case with different > FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine: > > ./src/tar -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/3] core/fiber: Mark stack as unneeded on creation 2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov @ 2019-03-05 22:38 ` Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov 2 siblings, 0 replies; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov Since we've increased the default stack size, we hope the whole 256K won't be used under regular loads, thus we mark the stack area as unneeded to minimize RSS pressure. Note we do this on fiber creation at the moment; a more detailed stack usage analysis will come in the next patches since it doesn't depend on this change. Closes #3418 --- src/lib/core/fiber.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index d4d16c05e..64bcda26a 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -749,6 +749,13 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size) fiber->stack_id = VALGRIND_STACK_REGISTER(fiber->stack, (char *)fiber->stack + fiber->stack_size); +#ifndef TARGET_OS_DARWIN + /* + * We don't expect the whole stack to be used under + * regular loads, let's try to minimize RSS pressure. + */ + madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED); +#endif mprotect(guard, page_size, PROT_NONE); return 0; -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov @ 2019-03-05 22:38 ` Cyrill Gorcunov 2019-03-06 18:05 ` Vladimir Davydov 2 siblings, 1 reply; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov We want to detect a situation where a task in a fiber is too eager for stack and close to exhausting it. To this end, upon stack creation we put 8 marks on the last stack page with a step of 128 bytes. These parameters allow us to fill ~1/4 of a page, which seems reasonable, but we might change them with time. Since the watermark position is permanent, once some task gets close to the stack limit we report the situation only once, to not spam a user much, and stop putting the mark on recycling. Still, this technique doesn't guarantee to handle all possible situations, so to increase the probability of catching eager fibers we put the marks *not* at the page start address but within some offset, randomly generated during the startup procedure. To minimize pressure on the memory system we try to relax stack usage with the so-called dynamic watermark. Basically it is the same marks as for overflow detection but placed "earlier", and on every recycle we try to shrink stack usage, moving the dynamic mark closer to the stack start and dropping clean pages with madvise's help if the OS supports it. 
Thus the stack will look like (for "growsdown" case) +--- <- fiber->stack + fiber->stack_size | | first stack page | +--- | ~~~~ | +--- | ### <- fiber->stack_dynamic_wmark | | +--- | ### <- fiber->stack_overflow_wmark | last usable page | +--- | guard page +--- Closes #3418 --- v3: - add a random offset from the beginning of the page so the mark is not put at a predefined address - drop freeze/frozen macros, just use NULL - guard with TARGET_OS_DARWIN where needed src/lib/core/fiber.c | 185 +++++++++++++++++++++++++++++++++++++++++++ src/lib/core/fiber.h | 4 + 2 files changed, 189 insertions(+) diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 64bcda26a..d18227e7f 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -41,6 +41,7 @@ #include "assoc.h" #include "memory.h" #include "trigger.h" +#include "random.h" #include "third_party/valgrind/memcheck.h" @@ -104,6 +105,32 @@ static const struct fiber_attr fiber_attr_default = { .flags = FIBER_DEFAULT_FLAGS }; +/* + * Random values generated with uuid. + */ +static const uint64_t poison_pool[] = { + 0x74f31d37285c4c37, 0xb10269a05bf10c29, + 0x0994d845bd284e0f, 0x9ffd4f7129c184df, + 0x357151e6711c4415, 0x8c5e5f41aafe6f28, + 0x6917dd79e78049d5, 0xba61957c65ca2465, +}; + +/* + * An offset inside page to the first mark. + */ +static unsigned int wmark_inpage_offset; + +/* + * We poison by 8 bytes as it is the natural stack + * step on x86-64. Also a 128-byte gap between + * poison values should cover the common cases. 
+ */ +#define POISON_SIZE (sizeof(poison_pool) / sizeof(poison_pool[0])) +#define POISON_GAP (128 + sizeof(poison_pool[0])) +#define POISON_OFF (POISON_GAP / sizeof(poison_pool[0])) + +static void fiber_wmark_recycle(struct fiber *fiber); + void fiber_attr_create(struct fiber_attr *fiber_attr) { @@ -624,6 +651,7 @@ fiber_recycle(struct fiber *fiber) /* no pending wakeup */ assert(rlist_empty(&fiber->state)); bool has_custom_stack = fiber->flags & FIBER_CUSTOM_STACK; + fiber_wmark_recycle(fiber); fiber_reset(fiber); fiber->name[0] = '\0'; fiber->f = NULL; @@ -710,6 +738,157 @@ page_align_up(void *ptr) return page_align_down(ptr + page_size - 1); } +static bool +stack_has_wmark(void *addr) +{ + const uint64_t *src = poison_pool; + const uint64_t *dst = addr; + size_t i; + + for (i = 0; i < POISON_SIZE; i++) { + if (*dst != src[i]) + return false; + dst += POISON_OFF; + } + + return true; +} + +static void +stack_put_wmark(void *addr) +{ + const uint64_t *src = poison_pool; + uint64_t *dst = addr; + size_t i; + + for (i = 0; i < POISON_SIZE; i++) { + *dst = src[i]; + dst += POISON_OFF; + } +} + +#ifndef TARGET_OS_DARWIN +static void +stack_shrink(struct fiber *fiber) +{ + void *hi, *lo; + + if (stack_direction < 0) { + hi = fiber->stack + fiber->stack_size; + lo = fiber->stack_overflow_wmark + page_size; + } else { + hi = fiber->stack_overflow_wmark - page_size; + lo = fiber->stack - fiber->stack_size; + } + + if (fiber->stack_dynamic_wmark <= lo || + fiber->stack_dynamic_wmark >= hi) + return; + + if (stack_direction < 0) { + madvise(page_align_up(fiber->stack_dynamic_wmark), + page_size, MADV_DONTNEED); + fiber->stack_dynamic_wmark += page_size; + } else { + madvise(page_align_down(fiber->stack_dynamic_wmark), + page_size, MADV_DONTNEED); + fiber->stack_dynamic_wmark -= page_size; + } + stack_put_wmark(fiber->stack_dynamic_wmark); +} +#endif + +static void +fiber_wmark_recycle(struct fiber *fiber) +{ + static bool overflow_warned = false; + + if (fiber->stack == NULL || fiber->flags & 
FIBER_CUSTOM_STACK) + return; + +#ifndef TARGET_OS_DARWIN + /* + * On recycle we're trying to shrink the stack + * as much as we can until the first mark overwrite + * is detected, then we simply zap the watermark and + * assume the stack is balanced and won't change + * much in the future. + */ + if (fiber->stack_dynamic_wmark != NULL) { + if (!stack_has_wmark(fiber->stack_dynamic_wmark)) + fiber->stack_dynamic_wmark = NULL; + else + stack_shrink(fiber); + } +#endif + + /* + * We are watching for stack overflow in a one-shot way: + * simply to not spam a user with messages; if someone + * triggered the problem it is highly likely that another + * fiber hits the same. + */ + if (overflow_warned) + return; + + if (!stack_has_wmark(fiber->stack_overflow_wmark)) { + say_warn("fiber %s/%d seems to overflow the stack soon", + fiber->name, fiber->fid); + overflow_warned = true; + } +} + +static void +fiber_wmark_init(struct fiber *fiber) +{ + /* + * No tracking on custom stacks + * for the sake of simplicity. + */ + if (fiber->flags & FIBER_CUSTOM_STACK) { + fiber->stack_overflow_wmark = NULL; + fiber->stack_dynamic_wmark = NULL; + return; + } + + /* + * Initially we arm the last page of the stack + * to catch if we're getting close to exhausting it. + * In turn the dynamic mark is put into the next page so + * we could drop pages later if they are unused. 
+ */ + if (stack_direction < 0) { + fiber->stack_overflow_wmark = fiber->stack; + fiber->stack_overflow_wmark += wmark_inpage_offset; + + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size; + } else { + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size; + fiber->stack_overflow_wmark -= page_size; + fiber->stack_overflow_wmark -= wmark_inpage_offset; + + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size; + } + stack_put_wmark(fiber->stack_overflow_wmark); +} + +static void +stack_wmark_init(void) +{ + uint16_t v; + + /* + * To increase the probability of stack overflow + * detection we put the _first_ mark at a random position + * within the first 128 bytes. The rest of the marks + * are put with a constant step (because we should not + * pressure the random generator much in case of a high + * number of fibers). + */ + random_bytes((void *)&v, sizeof(v)); + wmark_inpage_offset = ((v % 128) + 8) & ~7; +} + static int fiber_stack_create(struct fiber *fiber, size_t stack_size) { @@ -757,6 +936,7 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size) madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED); #endif + fiber_wmark_init(fiber); mprotect(guard, page_size, PROT_NONE); return 0; } @@ -926,8 +1106,12 @@ cord_create(struct cord *cord, const char *name) /* Record stack extents */ tt_pthread_attr_getstack(cord->id, &cord->sched.stack, &cord->sched.stack_size); + cord->sched.stack_overflow_wmark = cord->sched.stack; + cord->sched.stack_dynamic_wmark = cord->sched.stack; #else cord->sched.stack = NULL; + cord->sched.stack_overflow_wmark = NULL; + cord->sched.stack_dynamic_wmark = NULL; cord->sched.stack_size = 0; #endif } @@ -1238,6 +1422,7 @@ fiber_init(int (*invoke)(fiber_func f, va_list ap)) { page_size = sysconf(_SC_PAGESIZE); stack_direction = check_stack_direction(__builtin_frame_address(0)); + stack_wmark_init(); fiber_invoke = invoke; main_thread_id = pthread_self(); main_cord.loop = ev_default_loop(EVFLAG_AUTO | 
EVFLAG_ALLOCFD); diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index f1f5a0555..271d2b4d0 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -348,6 +348,10 @@ struct fiber { struct slab *stack_slab; /** Coro stack addr. */ void *stack; + /** Stack dynamic watermark addr for usage optimization. */ + void *stack_dynamic_wmark; + /** Stack watermark addr for overflow detection. */ + void *stack_overflow_wmark; /** Coro stack size. */ size_t stack_size; /** Valgrind stack id. */ -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov @ 2019-03-06 18:05 ` Vladimir Davydov 2019-03-06 19:03 ` Cyrill Gorcunov 0 siblings, 1 reply; 12+ messages in thread From: Vladimir Davydov @ 2019-03-06 18:05 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja On Wed, Mar 06, 2019 at 01:38:54AM +0300, Cyrill Gorcunov wrote: > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 64bcda26a..d18227e7f 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -41,6 +41,7 @@ > #include "assoc.h" > #include "memory.h" > #include "trigger.h" > +#include "random.h" > > #include "third_party/valgrind/memcheck.h" > > @@ -104,6 +105,32 @@ static const struct fiber_attr fiber_attr_default = { > .flags = FIBER_DEFAULT_FLAGS > }; > > +/* > + * Random values generated with uuid. > + */ > +static const uint64_t poison_pool[] = { > + 0x74f31d37285c4c37, 0xb10269a05bf10c29, > + 0x0994d845bd284e0f, 0x9ffd4f7129c184df, > + 0x357151e6711c4415, 0x8c5e5f41aafe6f28, > + 0x6917dd79e78049d5, 0xba61957c65ca2465, > +}; > + > +/* > + * An offset inside page to the first mark. > + */ > +static unsigned int wmark_inpage_offset; > + > +/* > + * We poison by 8 bytes as it natural for stack > + * step on x86-64. Also 128 byte gap between > + * poison values should cover a common cases. 
> + */ > +#define POISON_SIZE (sizeof(poison_pool) / sizeof(poison_pool[0])) > +#define POISON_GAP (128 + sizeof(poison_pool[0])) > +#define POISON_OFF (POISON_GAP / sizeof(poison_pool[0])) > + > +static void fiber_wmark_recycle(struct fiber *fiber); > + > void > fiber_attr_create(struct fiber_attr *fiber_attr) > { > @@ -624,6 +651,7 @@ fiber_recycle(struct fiber *fiber) > /* no pending wakeup */ > assert(rlist_empty(&fiber->state)); > bool has_custom_stack = fiber->flags & FIBER_CUSTOM_STACK; > + fiber_wmark_recycle(fiber); > fiber_reset(fiber); > fiber->name[0] = '\0'; > fiber->f = NULL; > @@ -710,6 +738,157 @@ page_align_up(void *ptr) > return page_align_down(ptr + page_size - 1); > } > > +static bool > +stack_has_wmark(void *addr) > +{ > + const uint64_t *src = poison_pool; > + const uint64_t *dst = addr; > + size_t i; > + > + for (i = 0; i < POISON_SIZE; i++) { > + if (*dst != src[i]) > + return false; > + dst += POISON_OFF; > + } > + > + return true; > +} > + > +static void > +stack_put_wmark(void *addr) > +{ > + const uint64_t *src = poison_pool; > + uint64_t *dst = addr; > + size_t i; > + > + for (i = 0; i < POISON_SIZE; i++) { > + *dst = src[i]; > + dst += POISON_OFF; > + } > +} > + > +#ifndef TARGET_OS_DARWIN > +static void > +stack_shrink(struct fiber *fiber) > +{ > + void *hi, *lo; > + > + if (stack_direction < 0) { > + hi = fiber->stack + fiber->stack_size; > + lo = fiber->stack_overflow_wmark+ page_size; > + } else { > + hi = fiber->stack_overflow_wmark - page_size; > + lo = fiber->stack - fiber->stack_size; > + } > + > + if (fiber->stack_dynamic_wmark <= lo || > + fiber->stack_dynamic_wmark >= hi) > + return; > + > + if (stack_direction < 0) { > + madvise(page_align_up(fiber->stack_dynamic_wmark), > + page_size, MADV_DONTNEED); > + fiber->stack_dynamic_wmark += page_size; > + } else { > + madvise(page_align_down(fiber->stack_dynamic_wmark), > + page_size, MADV_DONTNEED); > + fiber->stack_dynamic_wmark -= page_size; > + } > + stack_put_wmark(fiber); > 
+} > +#endif > + > +static void > +fiber_wmark_recycle(struct fiber *fiber) > +{ > + static bool overflow_warned = false; > + > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK) > + return; > + > +#ifndef TARGET_OS_DARWIN > + /* > + * On recycle we're trying to shrink stack > + * as much as we can until first mark overwrite > + * detected, then we simply zap watermark and > + * assume the stack is balanced and won't change > + * much in future. > + */ I don't understand. If the watermark has been exceeded once for a particular fiber, we'll never ever shrink the stack back to normal? That is if a task eager for stack is rescheduled on different fibers they will all end up with huge stacks. That's not what we want. We want the stack size to be set dynamically so that the *majority* of fibers don't need to shrink/grow stacks back and forth. Or am I reading the code wrong? > + if (fiber->stack_dynamic_wmark != NULL) { > + if (!stack_has_wmark(fiber->stack_dynamic_wmark)) > + fiber->stack_dynamic_wmark = NULL; > + else > + stack_shrink(fiber); > + } > +#endif > + > + /* > + * We are watching for stack overflow in one shot way: > + * simply to not spam a user with messages, if someone > + * triggered the problem it is highly likely that another > + * fiber hit the same. > + */ > + if (overflow_warned) > + return; > + > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) { > + say_warn("fiber %d seems to overflow the stack soon", > + fiber->name, fiber->fid); No point in printing fiber->name and id - they are printed by the say module, anyway. At the same time I'd prefer to keep max stack size here and print its value whenever we have to grow stack, similarly to how the Linux kernel handles it. > + overflow_warned = true; > + } > +} > + > +static void > +fiber_wmark_init(struct fiber *fiber) > +{ > + /* > + * No tracking on custom stacks > + * in a sake of simplicity. 
> + */ > + if (fiber->flags & FIBER_CUSTOM_STACK) { > + fiber->stack_overflow_wmark = NULL; > + fiber->stack_dynamic_wmark = NULL; > + return; > + } > + > + /* > + * Initially we arm the last page of the stack > + * to catch if we're getting close to its exhausting. > + * In turn the dynamic mark is put into next page so > + * we could drop pages later if they are unused. > + */ > + if (stack_direction < 0) { > + fiber->stack_overflow_wmark = fiber->stack; > + fiber->stack_overflow_wmark += wmark_inpage_offset; > + > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size; > + } else { > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size; > + fiber->stack_overflow_wmark -= page_size; > + fiber->stack_overflow_wmark -= wmark_inpage_offset; > + > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size; I'd rather start with the minimal allowed stack size and grow it on demand, if there are too many fibers that exceed the watermark now and then, not vice versa (imagine we start from 16 MB). > + } > + stack_put_wmark(fiber->stack_overflow_wmark); > +} > + > +static void > +stack_wmark_init(void) > +{ > + uint16_t v; > + > + /* > + * To increase probability of the stack overflow > + * detection we put _first_ mark at random position > + * in first 128 bytes range. The rest of the marks > + * are put with constant step (because we should not > + * pressue random generator much in case of hight > + * number of fibers). > + */ > + random_bytes((void *)&v, sizeof(v)); Why not rand? Anyway, I think we should recalculate the offset for each fiber individually each time it's recycled. Setting it once on system initialization is no better than not using randomization at all. As for pressure on the random generator - I don't understand why it can possibly be bad. After all, it's a pseudo random generator, not /dev/random. 
> + wmark_inpage_offset = ((v % 128) + 8) & ~7; > +} > + > static int > fiber_stack_create(struct fiber *fiber, size_t stack_size) > { > @@ -757,6 +936,7 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size) > madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED); > #endif > > + fiber_wmark_init(fiber); > mprotect(guard, page_size, PROT_NONE); > return 0; > } > @@ -926,8 +1106,12 @@ cord_create(struct cord *cord, const char *name) > /* Record stack extents */ > tt_pthread_attr_getstack(cord->id, &cord->sched.stack, > &cord->sched.stack_size); > + cord->sched.stack_overflow_wmark = cord->sched.stack; > + cord->sched.stack_dynamic_wmark = cord->sched.stack; > #else > cord->sched.stack = NULL; > + cord->sched.stack_overflow_wmark = NULL; > + cord->sched.stack_dynamic_wmark = NULL; > cord->sched.stack_size = 0; > #endif > } > @@ -1238,6 +1422,7 @@ fiber_init(int (*invoke)(fiber_func f, va_list ap)) > { > page_size = sysconf(_SC_PAGESIZE); > stack_direction = check_stack_direction(__builtin_frame_address(0)); > + stack_wmark_init(); > fiber_invoke = invoke; > main_thread_id = pthread_self(); > main_cord.loop = ev_default_loop(EVFLAG_AUTO | EVFLAG_ALLOCFD); > diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h > index f1f5a0555..271d2b4d0 100644 > --- a/src/lib/core/fiber.h > +++ b/src/lib/core/fiber.h > @@ -348,6 +348,10 @@ struct fiber { > struct slab *stack_slab; > /** Coro stack addr. */ > void *stack; > + /** Stack dynamic watermark addr for usage optimization. */ > + void *stack_dynamic_wmark; > + /** Stack watermark addr for overflow detection. */ > + void *stack_overflow_wmark; Here I'd like to see a more elaborate comment, explaining why we need these members, what exactly the stack usage optimization is about. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-06 18:05 ` Vladimir Davydov @ 2019-03-06 19:03 ` Cyrill Gorcunov 2019-03-07 8:27 ` Vladimir Davydov 0 siblings, 1 reply; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-06 19:03 UTC (permalink / raw) To: Vladimir Davydov; +Cc: tarantool-patches, kostja On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote: ... > > + > > +static void > > +fiber_wmark_recycle(struct fiber *fiber) > > +{ > > + static bool overflow_warned = false; > > + > > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK) > > + return; > > + > > +#ifndef TARGET_OS_DARWIN > > + /* > > + * On recycle we're trying to shrink stack > > + * as much as we can until first mark overwrite > > + * detected, then we simply zap watermark and > > + * assume the stack is balanced and won't change > > + * much in future. > > + */ > > I don't understand. If the watermark has been exceeded once for a > particular fiber, we'll never ever shrink the stack back to normal? I guess you meant "extend", not "shrink". But true -- on every recycle we're trying to shrink the stack, i.e. to make it smaller. And once we find that the wmark has vanished we stop further analysis. > That is if a task eager for stack is rescheduled on different fibers > they will all end up with huge stacks. That's not what we want. > We want the stack size to be set dynamically so that the *majority* of > fibers don't need to shrink/grow stacks back and forth. Or am I reading > the code wrong? No, you're right. But it is unclear to me yet how we should reach this goal. Track stack usage in the context switch? ... > > + > > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) { > > + say_warn("fiber %d seems to overflow the stack soon", > > + fiber->name, fiber->fid); > > No point in printing fiber->name and id - they are printed by the say > module, anyway. 
At the same time I'd prefer to keep max stack size here > and print its value whenever we have to grow stack, similarly to how > the Linux kernel handles it. ok > > + if (stack_direction < 0) { > > + fiber->stack_overflow_wmark = fiber->stack; > > + fiber->stack_overflow_wmark += wmark_inpage_offset; > > + > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size; > > + } else { > > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size; > > + fiber->stack_overflow_wmark -= page_size; > > + fiber->stack_overflow_wmark -= wmark_inpage_offset; > > + > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size; > > I'd rather start with the minimal allowed stack size and grow it on > demand, if there are too many fibers that exceed the watermark now and > then, not vice versa (imagine we start from 16 MB). This won't work. Imagine we put wmark in _first_ page, and then we find that the wmark has been overwritten, where to put next mark then? Or you propose to catch signals? > > + > > + /* > > + * To increase probability of the stack overflow > > + * detection we put _first_ mark at random position > > + * in first 128 bytes range. The rest of the marks > > + * are put with constant step (because we should not > > + * pressure random generator much in case of high > > + * number of fibers). > > + */ > > + random_bytes((void *)&v, sizeof(v)); > > Why not rand? Because we already have a helper for random numbers :) > > Anyway, I think we should recalculate the offset for each fiber > individually each time it's recycled. Setting it once on system > initialization is no better than not using randomization at all. > As for pressure on the random generator - I don't understand why > it can possibly be bad. After all, it's a pseudo random generator, > not /dev/random. ok > > @@ -348,6 +348,10 @@ struct fiber { > > struct slab *stack_slab; > > /** Coro stack addr. 
*/ > > void *stack; > > + /** Stack dynamic watermark addr for usage optimization. */ > > + void *stack_dynamic_wmark; > > + /** Stack watermark addr for overflow detection. */ > > + void *stack_overflow_wmark; > > Here I'd like to see a more elaborate comment, explaining why we need > these members, what exactly the stack usage optimization is about. ok ^ permalink raw reply [flat|nested] 12+ messages in thread
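The in-page offset computation quoted in this subthread -- ((v % 128) + 8) & ~7 -- can be reproduced with plain rand(), as the review suggests, and recomputed per fiber on each recycle. A hedged sketch (the helper name is illustrative):

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Sketch of the in-page offset computation discussed above, using
 * plain rand() instead of random_bytes(): an 8-byte-aligned offset
 * in roughly the first 128 bytes of the page, so the mark position
 * is not fully predictable to runaway code.
 */
static size_t
wmark_inpage_offset(void)
{
	unsigned v = (unsigned)rand();
	/* v % 128 -> [0,127]; +8 then align down to 8 -> [8,128]. */
	return ((v % 128) + 8) & ~(size_t)7;
}
```

Since each call is a single rand() invocation, calling this on every recycle is cheap, which addresses the "pressure on the random generator" concern raised above.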
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-06 19:03 ` Cyrill Gorcunov @ 2019-03-07 8:27 ` Vladimir Davydov 2019-03-07 9:04 ` Cyrill Gorcunov 0 siblings, 1 reply; 12+ messages in thread From: Vladimir Davydov @ 2019-03-07 8:27 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja On Wed, Mar 06, 2019 at 10:03:49PM +0300, Cyrill Gorcunov wrote: > On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote: > ... > > > + > > > +static void > > > +fiber_wmark_recycle(struct fiber *fiber) > > > +{ > > > + static bool overflow_warned = false; > > > + > > > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK) > > > + return; > > > + > > > +#ifndef TARGET_OS_DARWIN > > > + /* > > > + * On recycle we're trying to shrink stack > > > + * as much as we can until first mark overwrite > > > + * detected, then we simply zap watermark and > > > + * assume the stack is balanced and won't change > > > + * much in future. > > > + */ > > > > I don't understand. If the watermark has been exceeded once for a > > particular fiber, we'll never ever shrink the stack back to normal? > > I guess you meant "extend", not "shrink". But true -- on every recycle > we try to shrink the stack, i.e. make it smaller. And once we find > that the wmark has vanished we stop further analysis. > > > That is if a task eager for stack is rescheduled on different fibers > > they will all end up with huge stacks. That's not what we want. > > We want the stack size to be set dynamically so that the *majority* of > > fibers don't need to shrink/grow stacks back and forth. Or am I reading > > the code wrong? > > No, you're right. But it is still unclear to me how we should reach > this goal. Track stack usage in context switch? You can track the ratio of fibers that have overwritten the watermark recently. If the ratio gets too high (say 90% out of 1000 recently scheduled fibers), the minimal stack allocation should be increased. > ... 
> > > + > > > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) { > > > + say_warn("fiber %d seems to overflow the stack soon", > > > + fiber->name, fiber->fid); > > > > No point in printing fiber->name and id - they are printed by the say > > module, anyway. At the same time I'd prefer to keep max stack size here > > and print its value whenever we have to grow stack, similarly to how > > the Linux kernel handles it. > > ok > > > > + if (stack_direction < 0) { > > > + fiber->stack_overflow_wmark = fiber->stack; > > > + fiber->stack_overflow_wmark += wmark_inpage_offset; > > > + > > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size; > > > + } else { > > > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size; > > > + fiber->stack_overflow_wmark -= page_size; > > > + fiber->stack_overflow_wmark -= wmark_inpage_offset; > > > + > > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size; > > > > I'd rather start with the minimal allowed stack size and grow it on > > demand, if there are too many fibers that exceed the watermark now and > > then, not vice versa (imagine we start from 16 MB). > > This won't work. Imagine we put wmark in _first_ page, and then we find > that the wmark has been overwritten, where to put next mark then? Or > you propose to catch signals? Put the next watermark in the second page then. If it gets overwritten, too, put it in the third page, and so on. Perhaps, we can even double the minimal stack allocation each time. > > > > + > > > + /* > > > + * To increase probability of the stack overflow > > > + * detection we put _first_ mark at random position > > > + * in first 128 bytes range. The rest of the marks > > > + * are put with constant step (because we should not > > > + * pressure random generator much in case of high > > > + * number of fibers). > > > + */ > > > + random_bytes((void *)&v, sizeof(v)); > > > > Why not rand? 
> > Because we already have a helper for random numbers :) It's a helper for filling an array with random bytes. I don't see much point in using it for generating a single random number. ^ permalink raw reply [flat|nested] 12+ messages in thread
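The "track the ratio of fibers that have overwritten the watermark" heuristic proposed above could look roughly like the following. This is an illustrative sketch, not the patch code; the names, window size, and threshold are assumptions:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many of the last WINDOW recycled fibers had
 * overwritten their watermark; once the ratio crosses
 * GROW_PERCENT, double the default stack allocation so the
 * majority of new fibers stop hitting the mark.
 */
enum { WINDOW = 1000, GROW_PERCENT = 90 };

static unsigned recycled, overwritten;
static size_t stack_size_default = 4 * 4096;

static void
fiber_recycle_account(bool wmark_overwritten)
{
	if (wmark_overwritten)
		overwritten++;
	if (++recycled < WINDOW)
		return;
	if (overwritten * 100 >= recycled * GROW_PERCENT)
		stack_size_default *= 2;	/* give new fibers more room */
	recycled = overwritten = 0;		/* start a fresh window */
}
```

The windowed reset keeps the statistic cheap (two counters, no per-fiber history) while still adapting to workload changes.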
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-07 8:27 ` Vladimir Davydov @ 2019-03-07 9:04 ` Cyrill Gorcunov 2019-03-07 9:20 ` Vladimir Davydov 0 siblings, 1 reply; 12+ messages in thread From: Cyrill Gorcunov @ 2019-03-07 9:04 UTC (permalink / raw) To: Vladimir Davydov; +Cc: tarantool-patches, kostja On Thu, Mar 07, 2019 at 11:27:58AM +0300, Vladimir Davydov wrote: ... > > You can track the ratio of fibers that have overwritten the watermark > recently. If the ratio gets too high (say 90% out of 1000 recently > scheduled fibers), the minimal stack allocation should be increased. IOW, we need to gather statistics :/ I would rather split this task into two: - make stack 1M by default, madvise it as not needed, and merge it upstream asap, to not block users - second series is to optimize stack usage > > > > This won't work. Imagine we put wmark in _first_ page, and then we find > > that the wmark has been overwritten, where to put next mark then? Or > > you propose to catch signals? > > Put the next watermark in the second page then. If it gets overwritten, > too, put it in the third page, and so on. Perhaps, we can even double > the minimal stack allocation each time. ok > > > > > > Why not rand? > > > > Because we already have a helper for random numbers :) > > It's a helper for filling an array with random bytes. I don't see much > point in using it for generating a single random number. ok Cyrill ^ permalink raw reply [flat|nested] 12+ messages in thread
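The "move the mark one page further on each overwrite" idea agreed above can be sketched for a downward-growing stack as follows. All names here are illustrative; the real patch would also need to re-poison the new location:

```c
#include <stddef.h>

enum { PAGE = 4096 };

/*
 * Given a mark that was found overwritten, return the next
 * candidate position one page closer to the end of a
 * downward-growing stack, or NULL once only the guard page
 * is left and the stack cannot grow any further.
 */
static char *
wmark_next(char *old_wmark, char *stack_bottom)
{
	char *next = old_wmark - PAGE;
	if (next < stack_bottom + PAGE)	/* don't step onto the guard page */
		return NULL;
	return next;
}
```

Returning NULL at the guard page is the natural point to emit the "about to overflow" warning instead of moving the mark again.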
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage 2019-03-07 9:04 ` Cyrill Gorcunov @ 2019-03-07 9:20 ` Vladimir Davydov 0 siblings, 0 replies; 12+ messages in thread From: Vladimir Davydov @ 2019-03-07 9:20 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja On Thu, Mar 07, 2019 at 12:04:39PM +0300, Cyrill Gorcunov wrote: > On Thu, Mar 07, 2019 at 11:27:58AM +0300, Vladimir Davydov wrote: > ... > > > > You can track the ratio of fibers that have overwritten the watermark > > recently. If the ratio gets too high (say 90% out of 1000 recently > > scheduled fibers), the minimal stack allocation should be increased. > > IOW, we need to gather statistics :/ I would rather splt this task > into two: > > - make stack 1M by default and madvise as not needed and merge, > it upstream asap, to not block users > > - second series is to optimize stack usage Fair enough. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-03-07 9:20 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov 2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko 2019-03-06 7:09 ` Cyrill Gorcunov 2019-03-06 7:30 ` Konstantin Osipov 2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov 2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov 2019-03-06 18:05 ` Vladimir Davydov 2019-03-06 19:03 ` Cyrill Gorcunov 2019-03-07 8:27 ` Vladimir Davydov 2019-03-07 9:04 ` Cyrill Gorcunov 2019-03-07 9:20 ` Vladimir Davydov