* [PATCH v3 0/3] core/fiber: Rework stack management
@ 2019-03-05 22:38 Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov
Guys, please take a look once time permits. Deep testing
is very much needed, though I have no clue how to force Lua
to use the stack intensively.
Cyrill Gorcunov (3):
core/fiber: Increase default stack size
core/fiber: Mark stack as unneeded on creation
core/fiber: Put watermarks into stack to track its usage
src/lib/core/fiber.c | 200 ++++++++++++++++++++++++++++++++++++++++++-
src/lib/core/fiber.h | 4 +
2 files changed, 200 insertions(+), 4 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] core/fiber: Increase default stack size
2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov
@ 2019-03-05 22:38 ` Cyrill Gorcunov
2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko
2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov
2 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov
The default 64K stack size used for years has become too small
for modern distros (Fedora 29 and so on), where third-party libraries
(such as ncurses) started to use 64K for their own buffers, so we get
a SIGSEGV early, before even reaching the interactive console phase.
Thus we increase the default size to 256K, which should fit
the common case. Later we will make this value configurable
to address arbitrary stack sizes without the need to rebuild
the code.
Note the values are switched to 4K page granularity for the sake
of future modifications -- we are going to manipulate pages to
relax RSS usage if the OS allows.
Closes #3418
---
src/lib/core/fiber.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index abd6c6b11..d4d16c05e 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -92,10 +92,10 @@ static size_t page_size;
static int stack_direction;
enum {
- /* The minimum allowable fiber stack size in bytes */
- FIBER_STACK_SIZE_MINIMAL = 16384,
- /* Default fiber stack size in bytes */
- FIBER_STACK_SIZE_DEFAULT = 65536
+ /* The minimum allowable fiber stack size in 4K pages */
+ FIBER_STACK_SIZE_MINIMAL = 4 << 12,
+ /* Default fiber stack size in 4K pages */
+ FIBER_STACK_SIZE_DEFAULT = 64 << 12,
};
/** Default fiber attributes */
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/3] core/fiber: Mark stack as unneeded on creation
2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov
@ 2019-03-05 22:38 ` Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov
2 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov
Since we've increased the default stack size, we hope
the whole 256K won't be used under regular loads, thus
we mark the stack area as unneeded to minimize RSS
pressure.
Note that for now we do this on fiber creation; more
detailed stack usage analysis will come in the next patches
since it doesn't depend on this change.
Closes #3418
---
src/lib/core/fiber.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index d4d16c05e..64bcda26a 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -749,6 +749,13 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size)
fiber->stack_id = VALGRIND_STACK_REGISTER(fiber->stack,
(char *)fiber->stack +
fiber->stack_size);
+#ifndef TARGET_OS_DARWIN
+ /*
+ * We don't expect the whole stack to be used under
+ * regular loads, so let's try to minimize RSS pressure.
+ */
+ madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED);
+#endif
mprotect(guard, page_size, PROT_NONE);
return 0;
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov
@ 2019-03-05 22:38 ` Cyrill Gorcunov
2019-03-06 18:05 ` Vladimir Davydov
2 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-05 22:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, kostja, Cyrill Gorcunov
We want to detect a situation where the task in a fiber is too eager for
stack and close to exhausting it. For this sake, upon stack creation
we put 8 marks on the last stack page with a step of 128 bytes. Such
parameters allow us to fill ~1/4 of a page, which seems reasonable, but
we may tune them over time.
Since the watermark position is permanent, once some task gets close
to the stack limit we print a warning only once, to not spam
the user, and stop putting the mark on recycling.
Still, this technique doesn't guarantee that all possible situations
are handled, so to increase the probability of catching eager fibers
we put the marks *not* at a page start address but within some offset,
randomly generated during the startup procedure.
To minimize pressure on the memory system, we try to relax stack
usage with the so-called dynamic watermark. Basically it is the
same kind of mark as for overflow detection, but placed "earlier",
and on every recycle we try to shrink stack usage, moving the dynamic
mark closer to the stack start and dropping clean pages with madvise
if the OS supports it.
Thus the stack will look like this (for the "grows down" case):
+--- <- fiber->stack + fiber->stack_size
|
| first stack page
|
+---
|
~~~~
|
+---
| ### <- fiber->stack_dynamic_wmark
|
|
+---
| ### <- fiber->stack_overflow_wmark
| last usable page
|
+---
| guard page
+---
Closes #3418
---
v3:
- add a random offset from the beginning of the page so
the mark is not put at a predefined address
- drop freeze/frozen macros just use NULL
- guard with TARGET_OS_DARWIN where needed
src/lib/core/fiber.c | 185 +++++++++++++++++++++++++++++++++++++++++++
src/lib/core/fiber.h | 4 +
2 files changed, 189 insertions(+)
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 64bcda26a..d18227e7f 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -41,6 +41,7 @@
#include "assoc.h"
#include "memory.h"
#include "trigger.h"
+#include "random.h"
#include "third_party/valgrind/memcheck.h"
@@ -104,6 +105,32 @@ static const struct fiber_attr fiber_attr_default = {
.flags = FIBER_DEFAULT_FLAGS
};
+/*
+ * Random values generated with uuid.
+ */
+static const uint64_t poison_pool[] = {
+ 0x74f31d37285c4c37, 0xb10269a05bf10c29,
+ 0x0994d845bd284e0f, 0x9ffd4f7129c184df,
+ 0x357151e6711c4415, 0x8c5e5f41aafe6f28,
+ 0x6917dd79e78049d5, 0xba61957c65ca2465,
+};
+
+/*
+ * An offset inside page to the first mark.
+ */
+static unsigned int wmark_inpage_offset;
+
+/*
+ * We poison by 8 bytes as it is natural for the stack
+ * step on x86-64. Also a 128-byte gap between
+ * poison values should cover common cases.
+ */
+#define POISON_SIZE (sizeof(poison_pool) / sizeof(poison_pool[0]))
+#define POISON_GAP (128 + sizeof(poison_pool[0]))
+#define POISON_OFF (POISON_GAP / sizeof(poison_pool[0]))
+
+static void fiber_wmark_recycle(struct fiber *fiber);
+
void
fiber_attr_create(struct fiber_attr *fiber_attr)
{
@@ -624,6 +651,7 @@ fiber_recycle(struct fiber *fiber)
/* no pending wakeup */
assert(rlist_empty(&fiber->state));
bool has_custom_stack = fiber->flags & FIBER_CUSTOM_STACK;
+ fiber_wmark_recycle(fiber);
fiber_reset(fiber);
fiber->name[0] = '\0';
fiber->f = NULL;
@@ -710,6 +738,157 @@ page_align_up(void *ptr)
return page_align_down(ptr + page_size - 1);
}
+static bool
+stack_has_wmark(void *addr)
+{
+ const uint64_t *src = poison_pool;
+ const uint64_t *dst = addr;
+ size_t i;
+
+ for (i = 0; i < POISON_SIZE; i++) {
+ if (*dst != src[i])
+ return false;
+ dst += POISON_OFF;
+ }
+
+ return true;
+}
+
+static void
+stack_put_wmark(void *addr)
+{
+ const uint64_t *src = poison_pool;
+ uint64_t *dst = addr;
+ size_t i;
+
+ for (i = 0; i < POISON_SIZE; i++) {
+ *dst = src[i];
+ dst += POISON_OFF;
+ }
+}
+
+#ifndef TARGET_OS_DARWIN
+static void
+stack_shrink(struct fiber *fiber)
+{
+ void *hi, *lo;
+
+ if (stack_direction < 0) {
+ hi = fiber->stack + fiber->stack_size;
+ lo = fiber->stack_overflow_wmark + page_size;
+ } else {
+ hi = fiber->stack_overflow_wmark - page_size;
+ lo = fiber->stack - fiber->stack_size;
+ }
+
+ if (fiber->stack_dynamic_wmark <= lo ||
+ fiber->stack_dynamic_wmark >= hi)
+ return;
+
+ if (stack_direction < 0) {
+ madvise(page_align_up(fiber->stack_dynamic_wmark),
+ page_size, MADV_DONTNEED);
+ fiber->stack_dynamic_wmark += page_size;
+ } else {
+ madvise(page_align_down(fiber->stack_dynamic_wmark),
+ page_size, MADV_DONTNEED);
+ fiber->stack_dynamic_wmark -= page_size;
+ }
+ stack_put_wmark(fiber->stack_dynamic_wmark);
+}
+#endif
+
+static void
+fiber_wmark_recycle(struct fiber *fiber)
+{
+ static bool overflow_warned = false;
+
+ if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
+ return;
+
+#ifndef TARGET_OS_DARWIN
+ /*
+ * On recycle we're trying to shrink stack
+ * as much as we can until first mark overwrite
+ * detected, then we simply zap watermark and
+ * assume the stack is balanced and won't change
+ * much in future.
+ */
+ if (fiber->stack_dynamic_wmark != NULL) {
+ if (!stack_has_wmark(fiber->stack_dynamic_wmark))
+ fiber->stack_dynamic_wmark = NULL;
+ else
+ stack_shrink(fiber);
+ }
+#endif
+
+ /*
+ * We are watching for stack overflow in one shot way:
+ * simply to not spam a user with messages, if someone
+ * triggered the problem it is highly likely that another
+ * fiber hit the same.
+ */
+ if (overflow_warned)
+ return;
+
+ if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
+ say_warn("fiber %s/%d seems to overflow the stack soon",
+ fiber->name, fiber->fid);
+ overflow_warned = true;
+ }
+}
+
+static void
+fiber_wmark_init(struct fiber *fiber)
+{
+ /*
+ * No tracking on custom stacks
+ * for the sake of simplicity.
+ */
+ if (fiber->flags & FIBER_CUSTOM_STACK) {
+ fiber->stack_overflow_wmark = NULL;
+ fiber->stack_dynamic_wmark = NULL;
+ return;
+ }
+
+ /*
+ * Initially we arm the last page of the stack
+ * to catch when we're getting close to exhausting it.
+ * In turn the dynamic mark is put into next page so
+ * we could drop pages later if they are unused.
+ */
+ if (stack_direction < 0) {
+ fiber->stack_overflow_wmark = fiber->stack;
+ fiber->stack_overflow_wmark += wmark_inpage_offset;
+
+ fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
+ } else {
+ fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size;
+ fiber->stack_overflow_wmark -= page_size;
+ fiber->stack_overflow_wmark -= wmark_inpage_offset;
+
+ fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
+ }
+ stack_put_wmark(fiber->stack_overflow_wmark);
+}
+
+static void
+stack_wmark_init(void)
+{
+ uint16_t v;
+
+ /*
+ * To increase the probability of stack overflow
+ * detection we put the _first_ mark at a random position
+ * within the first 128 bytes. The rest of the marks
+ * are put with a constant step (because we should not
+ * pressure the random generator much in case of a high
+ * number of fibers).
+ */
+ random_bytes((void *)&v, sizeof(v));
+ wmark_inpage_offset = ((v % 128) + 8) & ~7;
+}
+
static int
fiber_stack_create(struct fiber *fiber, size_t stack_size)
{
@@ -757,6 +936,7 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size)
madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED);
#endif
+ fiber_wmark_init(fiber);
mprotect(guard, page_size, PROT_NONE);
return 0;
}
@@ -926,8 +1106,12 @@ cord_create(struct cord *cord, const char *name)
/* Record stack extents */
tt_pthread_attr_getstack(cord->id, &cord->sched.stack,
&cord->sched.stack_size);
+ cord->sched.stack_overflow_wmark = cord->sched.stack;
+ cord->sched.stack_dynamic_wmark = cord->sched.stack;
#else
cord->sched.stack = NULL;
+ cord->sched.stack_overflow_wmark = NULL;
+ cord->sched.stack_dynamic_wmark = NULL;
cord->sched.stack_size = 0;
#endif
}
@@ -1238,6 +1422,7 @@ fiber_init(int (*invoke)(fiber_func f, va_list ap))
{
page_size = sysconf(_SC_PAGESIZE);
stack_direction = check_stack_direction(__builtin_frame_address(0));
+ stack_wmark_init();
fiber_invoke = invoke;
main_thread_id = pthread_self();
main_cord.loop = ev_default_loop(EVFLAG_AUTO | EVFLAG_ALLOCFD);
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index f1f5a0555..271d2b4d0 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -348,6 +348,10 @@ struct fiber {
struct slab *stack_slab;
/** Coro stack addr. */
void *stack;
+ /** Stack dynamic watermark addr for usage optimization. */
+ void *stack_dynamic_wmark;
+ /** Stack watermark addr for overflow detection. */
+ void *stack_overflow_wmark;
/** Coro stack size. */
size_t stack_size;
/** Valgrind stack id. */
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size
2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov
@ 2019-03-05 23:43 ` Alexander Turenko
2019-03-06 7:09 ` Cyrill Gorcunov
2019-03-06 7:30 ` Konstantin Osipov
0 siblings, 2 replies; 12+ messages in thread
From: Alexander Turenko @ 2019-03-05 23:43 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, vdavydov.dev, kostja
On Wed, Mar 06, 2019 at 01:38:52AM +0300, Cyrill Gorcunov wrote:
> The default 64K stack size used for years has become too small
> for modern distros (Fedora 29 and so on), where third-party libraries
> (such as ncurses) started to use 64K for their own buffers, so we get
> a SIGSEGV early, before even reaching the interactive console phase.
>
> Thus we increase the default size to 256K, which should fit
> the common case. Later we will make this value configurable
> to address arbitrary stack sizes without the need to rebuild
> the code.
>
> Note the values are switched to 4K page granularity for the sake
> of future modifications -- we are going to manipulate pages to
> relax RSS usage if the OS allows.
>
> Closes #3418
> ---
> src/lib/core/fiber.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index abd6c6b11..d4d16c05e 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -92,10 +92,10 @@ static size_t page_size;
> static int stack_direction;
>
> enum {
> - /* The minimum allowable fiber stack size in bytes */
> - FIBER_STACK_SIZE_MINIMAL = 16384,
> - /* Default fiber stack size in bytes */
> - FIBER_STACK_SIZE_DEFAULT = 65536
> + /* The minimum allowable fiber stack size in 4K page */
> + FIBER_STACK_SIZE_MINIMAL = 4 << 12,
> + /* Default fiber stack size in 4K page */
> + FIBER_STACK_SIZE_DEFAULT = 64 << 12,
I recently looked at #3564 and observed that 256 KiB of stack is not
enough for our http client (the wrapper around libcurl). I bisected how
much it requires and found that it fails with 448 KiB but succeeds with
480 KiB. I just tried the following case with different
FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine:
./src/tarantool <<< "print(require('json').encode(require('http.client').get('http://google.com')))"
Tarantool is linked to libcurl-7.61.0:
$ otool -L ./src/tarantool | grep libcurl
/usr/local/opt/curl/lib/libcurl.4.dylib (compatibility version 10.0.0, current version 10.0.0)
$ strings /usr/local/opt/curl/lib/libcurl.4.dylib | grep libcurl/
libcurl/7.61.0
However, this case succeeds on my Linux machine on tarantool with a 64 KiB
stack with libcurl-7.63.0. I'm not sure what matters: the libcurl version,
the OS, or some compilation or runtime options of libcurl. But at least we see
that there are environments where 256 KiB is not enough.
Maybe we should investigate what leads libcurl to use so much stack
in that environment, and possibly we will find a way to configure it (at
runtime) to avoid this. Not sure. So this message is just for the
record.
https://github.com/tarantool/tarantool/issues/3564
> };
>
> /** Default fiber attributes */
> --
> 2.20.1
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size
2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko
@ 2019-03-06 7:09 ` Cyrill Gorcunov
2019-03-06 7:30 ` Konstantin Osipov
1 sibling, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-06 7:09 UTC (permalink / raw)
To: Alexander Turenko; +Cc: tarantool-patches, vdavydov.dev, kostja
On Wed, Mar 06, 2019 at 02:43:37AM +0300, Alexander Turenko wrote:
...
> > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> > index abd6c6b11..d4d16c05e 100644
> > --- a/src/lib/core/fiber.c
> > +++ b/src/lib/core/fiber.c
> > @@ -92,10 +92,10 @@ static size_t page_size;
> > static int stack_direction;
> >
> > enum {
> > - /* The minimum allowable fiber stack size in bytes */
> > - FIBER_STACK_SIZE_MINIMAL = 16384,
> > - /* Default fiber stack size in bytes */
> > - FIBER_STACK_SIZE_DEFAULT = 65536
> > + /* The minimum allowable fiber stack size in 4K page */
> > + FIBER_STACK_SIZE_MINIMAL = 4 << 12,
> > + /* Default fiber stack size in 4K page */
> > + FIBER_STACK_SIZE_DEFAULT = 64 << 12,
>
> I recently looked at #3564 and observed that 256 KiB of stack is not
> enough for our http client (the wrapper around libcurl). I bisected how
> much it requires and found that it fails with 448 KiB but succeeds with
> 480 KiB. I just tried the following case with different
> FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine:
>
> ./src/tarantool <<< "print(require('json').encode(require('http.client').get('http://google.com')))"
>
> Tarantool is linked to libcurl-7.61.0:
>
> $ otool -L ./src/tarantool | grep libcurl
> /usr/local/opt/curl/lib/libcurl.4.dylib (compatibility version 10.0.0, current version 10.0.0)
> $ strings /usr/local/opt/curl/lib/libcurl.4.dylib | grep libcurl/
> libcurl/7.61.0
You know, I think this catch-up game will continue forever,
because we simply don't know in which environment users will
run tarantool. As discussed with Vladimir, we must provide
a way for a user to configure the stack size, most probably via
an environment variable. I simply haven't implemented it yet.
Also, initially we started with a 1M stack but then decided to
shrink back to 256K. Looks like 512K would satisfy all sides.
> However, this case succeeds on my Linux machine on tarantool with a 64 KiB
> stack with libcurl-7.63.0. I'm not sure what matters: the libcurl version,
> the OS, or some compilation or runtime options of libcurl. But at least we see
> that there are environments where 256 KiB is not enough.
>
> Maybe we should investigate what leads libcurl to use so much stack
> in that environment, and possibly we will find a way to configure it (at
> runtime) to avoid this. Not sure. So this message is just for the
> record.
>
> https://github.com/tarantool/tarantool/issues/3564
libcurl and friends are third-party libs which we don't control, so
we could of course investigate, but only once time permits.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tarantool-patches] [PATCH 1/3] core/fiber: Increase default stack size
2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko
2019-03-06 7:09 ` Cyrill Gorcunov
@ 2019-03-06 7:30 ` Konstantin Osipov
1 sibling, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-03-06 7:30 UTC (permalink / raw)
To: Alexander Turenko; +Cc: Cyrill Gorcunov, tarantool-patches, vdavydov.dev
* Alexander Turenko <alexander.turenko@tarantool.org> [19/03/06 10:24]:
Alexander, we plan to make the default stack size configurable.
> On Wed, Mar 06, 2019 at 01:38:52AM +0300, Cyrill Gorcunov wrote:
> > The default 64K stack size used for years become too small
> > for modern distors (Fedora 29 and etc) where third party libraries
> > (such as ncurses) started to use 64K for own buffers and we get
> > SIGSGV early without reaching interactive console phase.
> >
> > Thus we increase default size up to 256K which should fit
> > for common case. Later we will make this value configurable
> > to address arbitrary stack sizes without a need to rebuild
> > the code.
> >
> > Note the values are switched to 4K page granularity for sake
> > of future modifications -- we gonna manipulate pages to
> > relax rss usage if OS allows.
> >
> > Closes #3418
> > ---
> > src/lib/core/fiber.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> > index abd6c6b11..d4d16c05e 100644
> > --- a/src/lib/core/fiber.c
> > +++ b/src/lib/core/fiber.c
> > @@ -92,10 +92,10 @@ static size_t page_size;
> > static int stack_direction;
> >
> > enum {
> > - /* The minimum allowable fiber stack size in bytes */
> > - FIBER_STACK_SIZE_MINIMAL = 16384,
> > - /* Default fiber stack size in bytes */
> > - FIBER_STACK_SIZE_DEFAULT = 65536
> > + /* The minimum allowable fiber stack size in 4K page */
> > + FIBER_STACK_SIZE_MINIMAL = 4 << 12,
> > + /* Default fiber stack size in 4K page */
> > + FIBER_STACK_SIZE_DEFAULT = 64 << 12,
>
> I recently looked at #3564 and observed that 256 KiB of stack is not
> enough for our http client (the wrapper around libcurl). I bisected how
> much it requires and found that it fails with 448 KiB but succeeds with
> 480 KiB. I just tried the following case with different
> FIBER_STACK_SIZE_DEFAULT constants on our tt-mac machine:
>
> ./src/tar
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov
@ 2019-03-06 18:05 ` Vladimir Davydov
2019-03-06 19:03 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Davydov @ 2019-03-06 18:05 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja
On Wed, Mar 06, 2019 at 01:38:54AM +0300, Cyrill Gorcunov wrote:
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 64bcda26a..d18227e7f 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -41,6 +41,7 @@
> #include "assoc.h"
> #include "memory.h"
> #include "trigger.h"
> +#include "random.h"
>
> #include "third_party/valgrind/memcheck.h"
>
> @@ -104,6 +105,32 @@ static const struct fiber_attr fiber_attr_default = {
> .flags = FIBER_DEFAULT_FLAGS
> };
>
> +/*
> + * Random values generated with uuid.
> + */
> +static const uint64_t poison_pool[] = {
> + 0x74f31d37285c4c37, 0xb10269a05bf10c29,
> + 0x0994d845bd284e0f, 0x9ffd4f7129c184df,
> + 0x357151e6711c4415, 0x8c5e5f41aafe6f28,
> + 0x6917dd79e78049d5, 0xba61957c65ca2465,
> +};
> +
> +/*
> + * An offset inside page to the first mark.
> + */
> +static unsigned int wmark_inpage_offset;
> +
> +/*
> + * We poison by 8 bytes as it natural for stack
> + * step on x86-64. Also 128 byte gap between
> + * poison values should cover a common cases.
> + */
> +#define POISON_SIZE (sizeof(poison_pool) / sizeof(poison_pool[0]))
> +#define POISON_GAP (128 + sizeof(poison_pool[0]))
> +#define POISON_OFF (POISON_GAP / sizeof(poison_pool[0]))
> +
> +static void fiber_wmark_recycle(struct fiber *fiber);
> +
> void
> fiber_attr_create(struct fiber_attr *fiber_attr)
> {
> @@ -624,6 +651,7 @@ fiber_recycle(struct fiber *fiber)
> /* no pending wakeup */
> assert(rlist_empty(&fiber->state));
> bool has_custom_stack = fiber->flags & FIBER_CUSTOM_STACK;
> + fiber_wmark_recycle(fiber);
> fiber_reset(fiber);
> fiber->name[0] = '\0';
> fiber->f = NULL;
> @@ -710,6 +738,157 @@ page_align_up(void *ptr)
> return page_align_down(ptr + page_size - 1);
> }
>
> +static bool
> +stack_has_wmark(void *addr)
> +{
> + const uint64_t *src = poison_pool;
> + const uint64_t *dst = addr;
> + size_t i;
> +
> + for (i = 0; i < POISON_SIZE; i++) {
> + if (*dst != src[i])
> + return false;
> + dst += POISON_OFF;
> + }
> +
> + return true;
> +}
> +
> +static void
> +stack_put_wmark(void *addr)
> +{
> + const uint64_t *src = poison_pool;
> + uint64_t *dst = addr;
> + size_t i;
> +
> + for (i = 0; i < POISON_SIZE; i++) {
> + *dst = src[i];
> + dst += POISON_OFF;
> + }
> +}
> +
> +#ifndef TARGET_OS_DARWIN
> +static void
> +stack_shrink(struct fiber *fiber)
> +{
> + void *hi, *lo;
> +
> + if (stack_direction < 0) {
> + hi = fiber->stack + fiber->stack_size;
> + lo = fiber->stack_overflow_wmark+ page_size;
> + } else {
> + hi = fiber->stack_overflow_wmark - page_size;
> + lo = fiber->stack - fiber->stack_size;
> + }
> +
> + if (fiber->stack_dynamic_wmark <= lo ||
> + fiber->stack_dynamic_wmark >= hi)
> + return;
> +
> + if (stack_direction < 0) {
> + madvise(page_align_up(fiber->stack_dynamic_wmark),
> + page_size, MADV_DONTNEED);
> + fiber->stack_dynamic_wmark += page_size;
> + } else {
> + madvise(page_align_down(fiber->stack_dynamic_wmark),
> + page_size, MADV_DONTNEED);
> + fiber->stack_dynamic_wmark -= page_size;
> + }
> + stack_put_wmark(fiber);
> +}
> +#endif
> +
> +static void
> +fiber_wmark_recycle(struct fiber *fiber)
> +{
> + static bool overflow_warned = false;
> +
> + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> + return;
> +
> +#ifndef TARGET_OS_DARWIN
> + /*
> + * On recycle we're trying to shrink stack
> + * as much as we can until first mark overwrite
> + * detected, then we simply zap watermark and
> + * assume the stack is balanced and won't change
> + * much in future.
> + */
I don't understand. If the watermark has been exceeded once for a
particular fiber, we'll never ever shrink the stack back to normal?
That is if a task eager for stack is rescheduled on different fibers
they will all end up with huge stacks. That's not what we want.
We want the stack size to be set dynamically so that the *majority* of
fibers don't need to shrink/grow stacks back and forth. Or am I reading
the code wrong?
> + if (fiber->stack_dynamic_wmark != NULL) {
> + if (!stack_has_wmark(fiber->stack_dynamic_wmark))
> + fiber->stack_dynamic_wmark = NULL;
> + else
> + stack_shrink(fiber);
> + }
> +#endif
> +
> + /*
> + * We are watching for stack overflow in one shot way:
> + * simply to not spam a user with messages, if someone
> + * triggered the problem it is highly likely that another
> + * fiber hit the same.
> + */
> + if (overflow_warned)
> + return;
> +
> + if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> + say_warn("fiber %d seems to overflow the stack soon",
> + fiber->name, fiber->fid);
No point in printing fiber->name and id - they are printed by the say
module, anyway. At the same time I'd prefer to keep max stack size here
and print its value whenever we have to grow stack, similarly to how
the Linux kernel handles it.
> + overflow_warned = true;
> + }
> +}
> +
> +static void
> +fiber_wmark_init(struct fiber *fiber)
> +{
> + /*
> + * No tracking on custom stacks
> + * in a sake of simplicity.
> + */
> + if (fiber->flags & FIBER_CUSTOM_STACK) {
> + fiber->stack_overflow_wmark = NULL;
> + fiber->stack_dynamic_wmark = NULL;
> + return;
> + }
> +
> + /*
> + * Initially we arm the last page of the stack
> + * to catch if we're getting close to its exhausting.
> + * In turn the dynamic mark is put into next page so
> + * we could drop pages later if they are unused.
> + */
> + if (stack_direction < 0) {
> + fiber->stack_overflow_wmark = fiber->stack;
> + fiber->stack_overflow_wmark += wmark_inpage_offset;
> +
> + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> + } else {
> + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size;
> + fiber->stack_overflow_wmark -= page_size;
> + fiber->stack_overflow_wmark -= wmark_inpage_offset;
> +
> + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
I'd rather start with the minimal allowed stack size and grow it on
demand, if there are too many fibers that exceed the watermark now and
then, not vice versa (imagine we start from 16 MB).
> + }
> + stack_put_wmark(fiber->stack_overflow_wmark);
> +}
> +
> +static void
> +stack_wmark_init(void)
> +{
> + uint16_t v;
> +
> + /*
> + * To increase probability of the stack overflow
> + * detection we put _first_ mark at random position
> + * in first 128 bytes range. The rest of the marks
> + * are put with constant step (because we should not
> + * pressue random generator much in case of hight
> + * number of fibers).
> + */
> + random_bytes((void *)&v, sizeof(v));
Why not rand?
Anyway, I think we should recalculate the offset for each fiber
individually each time it's recycled. Setting it once on system
initialization is no better than not using randomization at all.
As for pressure on the random generator - I don't understand why
it can possibly be bad. After all, it's a pseudo random generator,
not /dev/random.
> + wmark_inpage_offset = ((v % 128) + 8) & ~7;
> +}
> +
> static int
> fiber_stack_create(struct fiber *fiber, size_t stack_size)
> {
> @@ -757,6 +936,7 @@ fiber_stack_create(struct fiber *fiber, size_t stack_size)
> madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED);
> #endif
>
> + fiber_wmark_init(fiber);
> mprotect(guard, page_size, PROT_NONE);
> return 0;
> }
> @@ -926,8 +1106,12 @@ cord_create(struct cord *cord, const char *name)
> /* Record stack extents */
> tt_pthread_attr_getstack(cord->id, &cord->sched.stack,
> &cord->sched.stack_size);
> + cord->sched.stack_overflow_wmark = cord->sched.stack;
> + cord->sched.stack_dynamic_wmark = cord->sched.stack;
> #else
> cord->sched.stack = NULL;
> + cord->sched.stack_overflow_wmark = NULL;
> + cord->sched.stack_dynamic_wmark = NULL;
> cord->sched.stack_size = 0;
> #endif
> }
> @@ -1238,6 +1422,7 @@ fiber_init(int (*invoke)(fiber_func f, va_list ap))
> {
> page_size = sysconf(_SC_PAGESIZE);
> stack_direction = check_stack_direction(__builtin_frame_address(0));
> + stack_wmark_init();
> fiber_invoke = invoke;
> main_thread_id = pthread_self();
> main_cord.loop = ev_default_loop(EVFLAG_AUTO | EVFLAG_ALLOCFD);
> diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
> index f1f5a0555..271d2b4d0 100644
> --- a/src/lib/core/fiber.h
> +++ b/src/lib/core/fiber.h
> @@ -348,6 +348,10 @@ struct fiber {
> struct slab *stack_slab;
> /** Coro stack addr. */
> void *stack;
> + /** Stack dynamic watermark addr for usage optimization. */
> + void *stack_dynamic_wmark;
> + /** Stack watermark addr for overflow detection. */
> + void *stack_overflow_wmark;
Here I'd like to see a more elaborate comment, explaining why we need
these members, what exactly the stack usage optimization is about.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-06 18:05 ` Vladimir Davydov
@ 2019-03-06 19:03 ` Cyrill Gorcunov
2019-03-07 8:27 ` Vladimir Davydov
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-06 19:03 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches, kostja
On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote:
...
> > +
> > +static void
> > +fiber_wmark_recycle(struct fiber *fiber)
> > +{
> > + static bool overflow_warned = false;
> > +
> > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> > + return;
> > +
> > +#ifndef TARGET_OS_DARWIN
> > + /*
> > + * On recycle we're trying to shrink stack
> > + * as much as we can until first mark overwrite
> > + * detected, then we simply zap watermark and
> > + * assume the stack is balanced and won't change
> > + * much in future.
> > + */
>
> I don't understand. If the watermark has been exceeded once for a
> particular fiber, we'll never ever shrink the stack back to normal?
I guess you meant "extend", not "shrink". But true -- every time a
fiber is recycled we try to shrink its stack, i.e. make it smaller. And
once we find that the wmark has vanished we stop further analysis.
> That is if a task eager for stack is rescheduled on different fibers
> they will all end up with huge stacks. That's not what we want.
> We want the stack size to be set dynamically so that the *majority* of
> fibers don't need to shrink/grow stacks back and forth. Or am I reading
> the code wrong?
No, you're right. But it is still unclear to me how we should reach
this goal. Track stack usage on every context switch?
...
> > +
> > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> > + say_warn("fiber %d seems to overflow the stack soon",
> > + fiber->name, fiber->fid);
>
> No point in printing fiber->name and id - they are printed by the say
> module, anyway. At the same time I'd prefer to keep max stack size here
> and print its value whenever we have to grow stack, similarly to how
> the Linux kernel handles it.
ok
> > + if (stack_direction < 0) {
> > + fiber->stack_overflow_wmark = fiber->stack;
> > + fiber->stack_overflow_wmark += wmark_inpage_offset;
> > +
> > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> > + } else {
> > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size;
> > + fiber->stack_overflow_wmark -= page_size;
> > + fiber->stack_overflow_wmark -= wmark_inpage_offset;
> > +
> > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
>
> I'd rather start with the minimal allowed stack size and grow it on
> demand, if there are too many fibers that exceed the watermark now and
> then, not vice versa (imagine we start from 16 MB).
This won't work. Imagine we put the wmark in the _first_ page, and then
we find that the wmark has been overwritten -- where do we put the next
mark then? Or do you propose catching signals?
> > +
> > + /*
> > + * To increase probability of the stack overflow
> > + * detection we put _first_ mark at random position
> > + * in first 128 bytes range. The rest of the marks
> > + * are put with constant step (because we should not
> > + * pressure random generator much in case of high
> > + * number of fibers).
> > + */
> > + random_bytes((void *)&v, sizeof(v));
>
> Why not rand?
Because we already have a helper for random numbers :)
>
> Anyway, I think we should recalculate the offset for each fiber
> individually each time it's recycled. Setting it once on system
> initialization is no better than not using randomization at all.
> As for pressure on the random generator - I don't understand why
> it can possibly be bad. After all, it's a pseudo random generator,
> not /dev/random.
ok
> > @@ -348,6 +348,10 @@ struct fiber {
> > struct slab *stack_slab;
> > /** Coro stack addr. */
> > void *stack;
> > + /** Stack dynamic watermark addr for usage optimization. */
> > + void *stack_dynamic_wmark;
> > + /** Stack watermark addr for overflow detection. */
> > + void *stack_overflow_wmark;
>
> Here I'd like to see a more elaborate comment, explaining why we need
> these members, what exactly the stack usage optimization is about.
ok
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-06 19:03 ` Cyrill Gorcunov
@ 2019-03-07 8:27 ` Vladimir Davydov
2019-03-07 9:04 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Davydov @ 2019-03-07 8:27 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja
On Wed, Mar 06, 2019 at 10:03:49PM +0300, Cyrill Gorcunov wrote:
> On Wed, Mar 06, 2019 at 09:05:04PM +0300, Vladimir Davydov wrote:
> ...
> > > +
> > > +static void
> > > +fiber_wmark_recycle(struct fiber *fiber)
> > > +{
> > > + static bool overflow_warned = false;
> > > +
> > > + if (fiber->stack == NULL || fiber->flags & FIBER_CUSTOM_STACK)
> > > + return;
> > > +
> > > +#ifndef TARGET_OS_DARWIN
> > > + /*
> > > + * On recycle we're trying to shrink stack
> > > + * as much as we can until first mark overwrite
> > > + * detected, then we simply zap watermark and
> > > + * assume the stack is balanced and won't change
> > > + * much in future.
> > > + */
> >
> > I don't understand. If the watermark has been exceeded once for a
> > particular fiber, we'll never ever shrink the stack back to normal?
>
> I guess you meant "extend", not "shrink". But true -- every time a
> fiber is recycled we try to shrink its stack, i.e. make it smaller. And
> once we find that the wmark has vanished we stop further analysis.
>
> > That is if a task eager for stack is rescheduled on different fibers
> > they will all end up with huge stacks. That's not what we want.
> > We want the stack size to be set dynamically so that the *majority* of
> > fibers don't need to shrink/grow stacks back and forth. Or am I reading
> > the code wrong?
>
> No, you're right. But it is still unclear to me how we should reach
> this goal. Track stack usage on every context switch?
You can track the ratio of fibers that have overwritten the watermark
recently. If the ratio gets too high (say 90% out of 1000 recently
scheduled fibers), the minimal stack allocation should be increased.
> ...
> > > +
> > > + if (!stack_has_wmark(fiber->stack_overflow_wmark)) {
> > > + say_warn("fiber %d seems to overflow the stack soon",
> > > + fiber->name, fiber->fid);
> >
> > No point in printing fiber->name and id - they are printed by the say
> > module, anyway. At the same time I'd prefer to keep max stack size here
> > and print its value whenever we have to grow stack, similarly to how
> > the Linux kernel handles it.
>
> ok
>
> > > + if (stack_direction < 0) {
> > > + fiber->stack_overflow_wmark = fiber->stack;
> > > + fiber->stack_overflow_wmark += wmark_inpage_offset;
> > > +
> > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark + page_size;
> > > + } else {
> > > + fiber->stack_overflow_wmark = fiber->stack + fiber->stack_size;
> > > + fiber->stack_overflow_wmark -= page_size;
> > > + fiber->stack_overflow_wmark -= wmark_inpage_offset;
> > > +
> > > + fiber->stack_dynamic_wmark = fiber->stack_overflow_wmark - page_size;
> >
> > I'd rather start with the minimal allowed stack size and grow it on
> > demand, if there are too many fibers that exceed the watermark now and
> > then, not vice versa (imagine we start from 16 MB).
>
> This won't work. Imagine we put the wmark in the _first_ page, and then
> we find that the wmark has been overwritten -- where do we put the next
> mark then? Or do you propose catching signals?
Put the next watermark in the second page then. If it gets overwritten,
too, put it in the third page, and so on. Perhaps, we can even double
the minimal stack allocation each time.
>
> > > +
> > > + /*
> > > + * To increase probability of the stack overflow
> > > + * detection we put _first_ mark at random position
> > > + * in first 128 bytes range. The rest of the marks
> > > + * are put with constant step (because we should not
> > > + * pressure random generator much in case of high
> > > + * number of fibers).
> > > + */
> > > + random_bytes((void *)&v, sizeof(v));
> >
> > Why not rand?
>
> Because we already have a helper for random numbers :)
It's a helper for filling an array with random bytes. I don't see much
point in using it for generating a single random number.
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-07 8:27 ` Vladimir Davydov
@ 2019-03-07 9:04 ` Cyrill Gorcunov
2019-03-07 9:20 ` Vladimir Davydov
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2019-03-07 9:04 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches, kostja
On Thu, Mar 07, 2019 at 11:27:58AM +0300, Vladimir Davydov wrote:
...
>
> You can track the ratio of fibers that have overwritten the watermark
> recently. If the ratio gets too high (say 90% out of 1000 recently
> scheduled fibers), the minimal stack allocation should be increased.
IOW, we need to gather statistics :/ I would rather split this task
into two:
- make the stack 1M by default, madvise it as not needed, and merge
it upstream asap, to not block users
- a second series to optimize stack usage
> >
> > This won't work. Imagine we put wmark in _first_ page, and then we find
> > that the wmark has been overwriten, where to put next mark then? Or
> > you propose to catch signals?
>
> Put the next watermark in the second page then. If it gets overwritten,
> too, put it in the third page, and so on. Perhaps, we can even double
> the minimal stack allocation each time.
ok
> > >
> > > Why not rand?
> >
> > Because we already have a helper for random numbers :)
>
> It's a helper for filling an array with random bytes. I don't see much
> point in using it for generating a single random number.
ok
Cyrill
* Re: [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage
2019-03-07 9:04 ` Cyrill Gorcunov
@ 2019-03-07 9:20 ` Vladimir Davydov
0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Davydov @ 2019-03-07 9:20 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, kostja
On Thu, Mar 07, 2019 at 12:04:39PM +0300, Cyrill Gorcunov wrote:
> On Thu, Mar 07, 2019 at 11:27:58AM +0300, Vladimir Davydov wrote:
> ...
> >
> > You can track the ratio of fibers that have overwritten the watermark
> > recently. If the ratio gets too high (say 90% out of 1000 recently
> > scheduled fibers), the minimal stack allocation should be increased.
>
> IOW, we need to gather statistics :/ I would rather split this task
> into two:
>
> - make the stack 1M by default, madvise it as not needed, and merge
> it upstream asap, to not block users
>
> - a second series to optimize stack usage
Fair enough.
Thread overview: 12+ messages
2019-03-05 22:38 [PATCH v3 0/3] core/fiber: Rework stack management Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 1/3] core/fiber: Increase default stack size Cyrill Gorcunov
2019-03-05 23:43 ` [tarantool-patches] " Alexander Turenko
2019-03-06 7:09 ` Cyrill Gorcunov
2019-03-06 7:30 ` Konstantin Osipov
2019-03-05 22:38 ` [PATCH 2/3] core/fiber: Mark stack as unneeded on creation Cyrill Gorcunov
2019-03-05 22:38 ` [PATCH 3/3] core/fiber: Put watermarks into stack to track its usage Cyrill Gorcunov
2019-03-06 18:05 ` Vladimir Davydov
2019-03-06 19:03 ` Cyrill Gorcunov
2019-03-07 8:27 ` Vladimir Davydov
2019-03-07 9:04 ` Cyrill Gorcunov
2019-03-07 9:20 ` Vladimir Davydov