* [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors @ 2020-02-06 12:31 Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure Cyrill Gorcunov ` (4 more replies) 0 siblings, 5 replies; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-06 12:31 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy Sasha, I put your reviewed-by tag, please let me know if you're not agreed. Vlad, take a look please once time permit. v6 (by @alexander.turenko): - polish unit test (style and results) - fix bracing in core/fiber.c issue https://github.com/tarantool/tarantool/issues/4722 branch gorcunov/gh-4722-mprotect-diag-error-6 Cyrill Gorcunov (2): fiber: set diagnostics at madvise/mprotect failure fiber: leak slab if unable to bring prots back src/lib/core/errinj.h | 2 + src/lib/core/fiber.c | 93 +++++++++++++++++++----- test/box/errinj.result | 136 ++++++++++++++++++----------------- test/unit/CMakeLists.txt | 4 ++ test/unit/fiber_stack.cc | 83 +++++++++++++++++++++ test/unit/fiber_stack.result | 7 ++ test/unit/suite.ini | 2 +- 7 files changed, 242 insertions(+), 85 deletions(-) create mode 100644 test/unit/fiber_stack.cc create mode 100644 test/unit/fiber_stack.result -- 2.20.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov @ 2020-02-06 12:31 ` Cyrill Gorcunov 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back Cyrill Gorcunov ` (3 subsequent siblings) 4 siblings, 1 reply; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-06 12:31 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy Both madvise and mprotect calls can fail due to various reasons, mostly because of lack of free memory in the system. We log such cases via say_x helpers but this is not enough. In particular tarantool/memcached relies on diag error to be set to detect an error condition: | expire_fiber = fiber_new(name, memcached_expire_loop); | const box_error_t *err = box_error_last(); | if (err) { | say_error("Can't start the expire fiber"); | say_error("%s", box_error_message(err)); | return -1; | } Thus lets use diag_set() helper here and instead of macros use inline functions for better readability. Fixes #4722 Reported-by: Alexander Turenko <alexander.turenko@tarantool.org> Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- src/lib/core/errinj.h | 2 + src/lib/core/fiber.c | 84 +++++++++++++++++----- test/box/errinj.result | 136 ++++++++++++++++++----------------- test/unit/CMakeLists.txt | 4 ++ test/unit/fiber_stack.cc | 83 +++++++++++++++++++++ test/unit/fiber_stack.result | 7 ++ test/unit/suite.ini | 2 +- 7 files changed, 234 insertions(+), 84 deletions(-) create mode 100644 test/unit/fiber_stack.cc create mode 100644 test/unit/fiber_stack.result diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h index 672da2119..ed0cba903 100644 --- a/src/lib/core/errinj.h +++ b/src/lib/core/errinj.h @@ -135,6 +135,8 @@ struct errinj { _(ERRINJ_COIO_SENDFILE_CHUNK, ERRINJ_INT, {.iparam = -1}) \ _(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \ + _(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL, {.bparam = false}) \ + _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) ENUM0(errinj_id, ERRINJ_LIST); extern struct errinj errinjs[]; diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 00ae8cded..57e4cb6ef 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -41,6 +41,7 @@ #include "assoc.h" #include "memory.h" #include "trigger.h" +#include "errinj.h" #if ENABLE_FIBER_TOP #include <x86intrin.h> /* __rdtscp() */ @@ -173,21 +174,40 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); #define ASAN_FINISH_SWITCH_FIBER(var_name) #endif -#define fiber_madvise(addr, len, advice) \ -({ \ - int err = madvise((addr), (len), (advice)); \ - if (err) \ - say_syserror("madvise"); \ - err; \ -}) - -#define fiber_mprotect(addr, len, prot) \ -({ \ - int err = mprotect((addr), (len), (prot)); \ - if (err) \ - say_syserror("mprotect"); \ - err; \ -}) +static inline int +fiber_madvise(void *addr, size_t len, int advice) +{ + int rc = 0; + + ERROR_INJECT(ERRINJ_FIBER_MADVISE, { + errno = ENOMEM; + rc = -1; + }); + + if (rc != 0 || madvise(addr, len, advice) != 0) { + diag_set(SystemError, "fiber madvise failed"); + return -1; + } + return 0; +} + +static inline int +fiber_mprotect(void *addr, size_t len, int prot) +{ + int rc = 0; + + struct errinj *inj = errinj(ERRINJ_FIBER_MPROTECT, ERRINJ_INT); + if (inj != NULL && inj->iparam == prot) { + errno = ENOMEM; + rc = -1; + } + + if (rc != 0 || mprotect(addr, len, prot) != 0) { + diag_set(SystemError, "fiber mprotect failed"); + return -1; + } + return 0; +} #if ENABLE_FIBER_TOP static __thread bool fiber_top_enabled = false; @@ -951,6 +971,12 @@ fiber_stack_recycle(struct fiber *fiber) start = page_align_up(fiber->stack_watermark); end = fiber->stack + fiber->stack_size; } + + /* + * Ignore errors on MADV_DONTNEED because this is + * just a hint for OS and not critical for + * functionality. + */ fiber_madvise(start, end - start, MADV_DONTNEED); stack_put_watermark(fiber->stack_watermark); } @@ -969,7 +995,9 @@ fiber_stack_watermark_create(struct fiber *fiber) /* * We don't expect the whole stack usage in regular - * loads, let's try to minimize rss pressure. + * loads, let's try to minimize rss pressure. But do + * not exit if MADV_DONTNEED failed, it is just a hint + * for OS, not critical one. */ fiber_madvise(fiber->stack, fiber->stack_size, MADV_DONTNEED); @@ -1007,6 +1035,8 @@ fiber_stack_watermark_create(struct fiber *fiber) static void fiber_stack_destroy(struct fiber *fiber, struct slab_cache *slabc) { + static const int mprotect_flags = PROT_READ | PROT_WRITE; + if (fiber->stack != NULL) { VALGRIND_STACK_DEREGISTER(fiber->stack_id); #if ENABLE_ASAN @@ -1017,7 +1047,22 @@ fiber_stack_destroy(struct fiber *fiber, struct slab_cache *slabc) guard = page_align_down(fiber->stack - page_size); else guard = page_align_up(fiber->stack + fiber->stack_size); - fiber_mprotect(guard, page_size, PROT_READ | PROT_WRITE); + + if (fiber_mprotect(guard, page_size, mprotect_flags) != 0) { + /* + * FIXME: We need some intelligent handling: + * say put this slab into a queue and retry + * to setup the original protection back in + * background. + * + * Note that in case if we're called from + * fiber_stack_create() the @mprotect_flags is + * the same as the slab been created with, so + * calling mprotect for VMA with same flags + * won't fail. + */ + diag_log(); + } slab_put(slabc, fiber->stack_slab); } } @@ -1064,6 +1109,11 @@ fiber_stack_create(struct fiber *fiber, struct slab_cache *slabc, fiber->stack_size); if (fiber_mprotect(guard, page_size, PROT_NONE)) { + /* + * Write an error into the log since a guard + * page is critical for functionality. + */ + diag_log(); fiber_stack_destroy(fiber, slabc); return -1; } diff --git a/test/box/errinj.result b/test/box/errinj.result index babe36b1b..16f6b40c2 100644 --- a/test/box/errinj.result +++ b/test/box/errinj.result @@ -23,132 +23,136 @@ errinj.info() --- - ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT: state: 0 - ERRINJ_WAL_WRITE: - state: false - ERRINJ_RELAY_BREAK_LSN: + ERRINJ_WAL_BREAK_LSN: state: -1 - ERRINJ_HTTPC_EXECUTE: - state: false ERRINJ_VYRUN_DATA_READ: state: false - ERRINJ_SWIM_FD_ONLY: - state: false - ERRINJ_SQL_NAME_NORMALIZATION: - state: false ERRINJ_VY_SCHED_TIMEOUT: state: 0 - ERRINJ_COIO_SENDFILE_CHUNK: - state: -1 ERRINJ_HTTP_RESPONSE_ADD_WAIT: state: false - ERRINJ_WAL_WRITE_PARTIAL: - state: -1 - ERRINJ_VY_GC: - state: false - ERRINJ_WAL_DELAY: - state: false - ERRINJ_INDEX_ALLOC: - state: false ERRINJ_WAL_WRITE_EOF: state: false - ERRINJ_WAL_SYNC: - state: false - ERRINJ_BUILD_INDEX: - state: -1 ERRINJ_BUILD_INDEX_DELAY: state: false - ERRINJ_VY_RUN_FILE_RENAME: - state: false - ERRINJ_VY_COMPACTION_DELAY: - state: false - ERRINJ_VY_DUMP_DELAY: - state: false ERRINJ_VY_DELAY_PK_LOOKUP: state: false - ERRINJ_VY_TASK_COMPLETE: - state: false - ERRINJ_PORT_DUMP: + ERRINJ_VY_POINT_ITER_WAIT: state: false - ERRINJ_WAL_BREAK_LSN: - state: -1 ERRINJ_WAL_IO: state: false - ERRINJ_WAL_FALLOCATE: - state: 0 - ERRINJ_DYN_MODULE_COUNT: - state: 0 ERRINJ_VY_INDEX_FILE_RENAME: state: false ERRINJ_TUPLE_FORMAT_COUNT: state: -1 ERRINJ_TUPLE_ALLOC: state: false - ERRINJ_VY_RUN_WRITE_DELAY: + ERRINJ_VY_RUN_FILE_RENAME: state: false ERRINJ_VY_READ_PAGE: state: false ERRINJ_RELAY_REPORT_INTERVAL: state: 0 - ERRINJ_VY_LOG_FILE_RENAME: - state: false - ERRINJ_VY_READ_PAGE_TIMEOUT: - state: 0 + ERRINJ_RELAY_BREAK_LSN: + state: -1 ERRINJ_XLOG_META: state: false - ERRINJ_SIO_READ_MAX: - state: -1 ERRINJ_SNAP_COMMIT_DELAY: state: false - ERRINJ_WAL_WRITE_DISK: + ERRINJ_VY_RUN_WRITE: state: false - ERRINJ_SNAP_WRITE_DELAY: + ERRINJ_BUILD_INDEX: + state: -1 + ERRINJ_RELAY_FINAL_JOIN: + state: false + ERRINJ_REPLICA_JOIN_DELAY: state: false ERRINJ_LOG_ROTATE: state: false - ERRINJ_VY_RUN_WRITE: + ERRINJ_MEMTX_DELAY_GC: state: false - ERRINJ_CHECK_FORMAT_DELAY: + ERRINJ_XLOG_GARBAGE: + state: false + ERRINJ_VY_READ_PAGE_DELAY: + state: false + ERRINJ_SWIM_FD_ONLY: + state: false + ERRINJ_WAL_WRITE: + state: false + ERRINJ_HTTPC_EXECUTE: + state: false + ERRINJ_SQL_NAME_NORMALIZATION: + state: false + ERRINJ_WAL_WRITE_PARTIAL: + state: -1 + ERRINJ_VY_GC: + state: false + ERRINJ_WAL_DELAY: + state: false + ERRINJ_XLOG_READ: + state: -1 + ERRINJ_WAL_SYNC: + state: false + ERRINJ_VY_TASK_COMPLETE: state: false + ERRINJ_PORT_DUMP: + state: false + ERRINJ_COIO_SENDFILE_CHUNK: + state: -1 + ERRINJ_DYN_MODULE_COUNT: + state: 0 + ERRINJ_SIO_READ_MAX: + state: -1 + ERRINJ_FIBER_MPROTECT: + state: -1 + ERRINJ_FIBER_MADVISE: + state: false + ERRINJ_RELAY_TIMEOUT: + state: 0 + ERRINJ_VY_DUMP_DELAY: + state: false + ERRINJ_VY_SQUASH_TIMEOUT: + state: 0 ERRINJ_VY_LOG_FLUSH_DELAY: state: false - ERRINJ_RELAY_FINAL_JOIN: + ERRINJ_RELAY_SEND_DELAY: state: false - ERRINJ_REPLICA_JOIN_DELAY: + ERRINJ_VY_COMPACTION_DELAY: state: false - ERRINJ_RELAY_FINAL_SLEEP: + ERRINJ_VY_LOG_FILE_RENAME: state: false ERRINJ_VY_RUN_DISCARD: state: false ERRINJ_WAL_ROTATE: state: false - ERRINJ_RELAY_EXIT_DELAY: + ERRINJ_VY_READ_PAGE_TIMEOUT: state: 0 - ERRINJ_VY_POINT_ITER_WAIT: + ERRINJ_VY_INDEX_DUMP: + state: -1 + ERRINJ_TUPLE_FIELD: state: false - ERRINJ_MEMTX_DELAY_GC: + ERRINJ_SNAP_WRITE_DELAY: state: false ERRINJ_IPROTO_TX_DELAY: state: false - ERRINJ_XLOG_READ: - state: -1 - ERRINJ_TUPLE_FIELD: + ERRINJ_RELAY_EXIT_DELAY: + state: 0 + ERRINJ_RELAY_FINAL_SLEEP: state: false - ERRINJ_XLOG_GARBAGE: + ERRINJ_WAL_WRITE_DISK: state: false - ERRINJ_VY_INDEX_DUMP: - state: -1 - ERRINJ_VY_READ_PAGE_DELAY: + ERRINJ_CHECK_FORMAT_DELAY: state: false ERRINJ_TESTING: state: false - ERRINJ_RELAY_SEND_DELAY: + ERRINJ_VY_RUN_WRITE_DELAY: state: false - ERRINJ_VY_SQUASH_TIMEOUT: + ERRINJ_WAL_FALLOCATE: state: 0 ERRINJ_VY_LOG_FLUSH: state: false - ERRINJ_RELAY_TIMEOUT: - state: 0 + ERRINJ_INDEX_ALLOC: + state: false ... errinj.set("some-injection", true) --- diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt index 4a57597e9..212a02d36 100644 --- a/test/unit/CMakeLists.txt +++ b/test/unit/CMakeLists.txt @@ -74,6 +74,10 @@ add_executable(fiber.test fiber.cc) set_source_files_properties(fiber.cc PROPERTIES COMPILE_FLAGS -O0) target_link_libraries(fiber.test core unit) +add_executable(fiber_stack.test fiber_stack.cc) +set_source_files_properties(fiber_stack.cc PROPERTIES COMPILE_FLAGS -O0) +target_link_libraries(fiber_stack.test core unit) + if (NOT ENABLE_GCOV) # This test is known to be broken with GCOV add_executable(guard.test guard.cc) diff --git a/test/unit/fiber_stack.cc b/test/unit/fiber_stack.cc new file mode 100644 index 000000000..de7fe90e3 --- /dev/null +++ b/test/unit/fiber_stack.cc @@ -0,0 +1,83 @@ +#include "memory.h" +#include "fiber.h" +#include "unit.h" +#include "trivia/util.h" +#include "errinj.h" + +static struct fiber_attr default_attr; + +static int +noop_f(va_list ap) +{ + return 0; +} + +static void +fiber_stack_fail_test(void) +{ + struct errinj *inj; + struct fiber *fiber; + + header(); + + /* + * Set non-default stack size to prevent reusing of an + * existing fiber. + */ + struct fiber_attr *fiber_attr = fiber_attr_new(); + fiber_attr_setstacksize(fiber_attr, default_attr.stack_size * 2); + + /* + * Clear the fiber's diagnostics area to check that failed + * fiber_new() sets an error. + */ + diag_clear(diag_get()); + + /* + * Check guard page setup via mprotect. We can't test the fiber + * destroy path since it clears fiber's diag. + */ + inj = errinj(ERRINJ_FIBER_MPROTECT, ERRINJ_INT); + inj->iparam = PROT_NONE; + fiber = fiber_new_ex("test_mprotect", fiber_attr, noop_f); + inj->iparam = -1; + + ok(fiber == NULL, "mprotect: failed to setup fiber guard page"); + ok(diag_get() != NULL, "mprotect: diag is armed after error"); + + /* + * Check madvise. We can't test the fiber destroy + * path since it is cleaning error. + */ + diag_clear(diag_get()); + inj = errinj(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL); + inj->bparam = true; + fiber = fiber_new_ex("test_madvise", fiber_attr, noop_f); + inj->bparam = false; + + ok(fiber != NULL, "madvise: non critical error on madvise hint"); + ok(diag_get() != NULL, "madvise: diag is armed after error"); + + footer(); +} + +static int +main_f(va_list ap) +{ + fiber_stack_fail_test(); + ev_break(loop(), EVBREAK_ALL); + return 0; +} + +int main() +{ + memory_init(); + fiber_init(fiber_cxx_invoke); + fiber_attr_create(&default_attr); + struct fiber *main = fiber_new_xc("main", main_f); + fiber_wakeup(main); + ev_run(loop(), 0); + fiber_free(); + memory_free(); + return 0; +} diff --git a/test/unit/fiber_stack.result b/test/unit/fiber_stack.result new file mode 100644 index 000000000..83f0d1490 --- /dev/null +++ b/test/unit/fiber_stack.result @@ -0,0 +1,7 @@ +SystemError fiber mprotect failed: Cannot allocate memory + *** fiber_stack_fail_test *** +ok 1 - mprotect: failed to setup fiber guard page +ok 2 - mprotect: diag is armed after error +ok 3 - madvise: non critical error on madvise hint +ok 4 - madvise: diag is armed after error + *** fiber_stack_fail_test: done *** diff --git a/test/unit/suite.ini b/test/unit/suite.ini index 89c8499fc..d429c95e9 100644 --- a/test/unit/suite.ini +++ b/test/unit/suite.ini @@ -1,5 +1,5 @@ [default] core = unittest description = unit tests -release_disabled = swim_errinj.test +release_disabled = fiber_stack.test swim_errinj.test is_parallel = True -- 2.20.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure Cyrill Gorcunov @ 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-13 8:19 ` Cyrill Gorcunov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-02-13 0:08 UTC (permalink / raw) To: Cyrill Gorcunov, tml Thanks for the patch! See 7 comments below. On 06/02/2020 13:31, Cyrill Gorcunov wrote: > Both madvise and mprotect calls can fail due to various > reasons, mostly because of lack of free memory in the > system. > > We log such cases via say_x helpers but this is not enough. > In particular tarantool/memcached relies on diag error to be > set to detect an error condition: > > | expire_fiber = fiber_new(name, memcached_expire_loop); > | const box_error_t *err = box_error_last(); > | if (err) { > | say_error("Can't start the expire fiber"); > | say_error("%s", box_error_message(err)); > | return -1; > | } > > Thus lets use diag_set() helper here and instead of macros > use inline functions for better readability. > > Fixes #4722 > > Reported-by: Alexander Turenko <alexander.turenko@tarantool.org> > Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > --- > src/lib/core/errinj.h | 2 + > src/lib/core/fiber.c | 84 +++++++++++++++++----- > test/box/errinj.result | 136 ++++++++++++++++++----------------- > test/unit/CMakeLists.txt | 4 ++ > test/unit/fiber_stack.cc | 83 +++++++++++++++++++++ > test/unit/fiber_stack.result | 7 ++ > test/unit/suite.ini | 2 +- > 7 files changed, 234 insertions(+), 84 deletions(-) > create mode 100644 test/unit/fiber_stack.cc > create mode 100644 test/unit/fiber_stack.result > > ENUM0(errinj_id, ERRINJ_LIST); > extern struct errinj errinjs[]; > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 00ae8cded..57e4cb6ef 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -1017,7 +1047,22 @@ fiber_stack_destroy(struct fiber *fiber, struct slab_cache *slabc) > guard = page_align_down(fiber->stack - page_size); > else > guard = page_align_up(fiber->stack + fiber->stack_size); > - fiber_mprotect(guard, page_size, PROT_READ | PROT_WRITE); > + > + if (fiber_mprotect(guard, page_size, mprotect_flags) != 0) { > + /* > + * FIXME: We need some intelligent handling: > + * say put this slab into a queue and retry > + * to setup the original protection back in > + * background. > + * > + * Note that in case if we're called from > + * fiber_stack_create() the @mprotect_flags is 1. Nit: we use Doxygen comment style, so any word right after @ is treated as a Doxygen command. If you want to mention some parameter, use command @a. Like this: '@a mprotect_flags'. > + * the same as the slab been created with, so > + * calling mprotect for VMA with same flags > + * won't fail. > + */ 2. Yeah, good idea. We can do it not only in background though, but do similar to pthread and how it manages stacks to free. We can put stacks into a list, and try to free something from that list on each next creation of a fiber. Or even reuse right away, without freeing. > + diag_log(); > + } > slab_put(slabc, fiber->stack_slab); > } > } > diff --git a/test/box/errinj.result b/test/box/errinj.result > index babe36b1b..16f6b40c2 100644 > --- a/test/box/errinj.result > +++ b/test/box/errinj.result 3. So big diff for such a small change. I propose to fix this. We can sort result of errinj.info before printing it. It would make all changes to errinj produce much smaller diff. The patch should be trivial. Could you please create it? > diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt > index 4a57597e9..212a02d36 100644 > --- a/test/unit/CMakeLists.txt > +++ b/test/unit/CMakeLists.txt > @@ -74,6 +74,10 @@ add_executable(fiber.test fiber.cc) > set_source_files_properties(fiber.cc PROPERTIES COMPILE_FLAGS -O0) > target_link_libraries(fiber.test core unit) > > +add_executable(fiber_stack.test fiber_stack.cc) > +set_source_files_properties(fiber_stack.cc PROPERTIES COMPILE_FLAGS -O0) > +target_link_libraries(fiber_stack.test core unit) > + > if (NOT ENABLE_GCOV) > # This test is known to be broken with GCOV > add_executable(guard.test guard.cc) > diff --git a/test/unit/fiber_stack.cc b/test/unit/fiber_stack.cc > new file mode 100644 > index 000000000..de7fe90e3 > --- /dev/null > +++ b/test/unit/fiber_stack.cc 4. Do you need C++ here? 5. I would call it fiber_errinj.c. In future we may want to add more tests here, which use error injections. Up to you. > @@ -0,0 +1,83 @@ > +#include "memory.h" > +#include "fiber.h" > +#include "unit.h" > +#include "trivia/util.h" > +#include "errinj.h" > + > +static struct fiber_attr default_attr; > + > +static int > +noop_f(va_list ap) > +{ > + return 0; > +} > + > +static void > +fiber_stack_fail_test(void) > +{ > + struct errinj *inj; > + struct fiber *fiber; > + > + header(); 6. After header() there should be a plan() call saying how many tests you are going to do. In the end there should be check_plan() call, which prints a summary. Could you, please, add them? > + > + /* > + * Set non-default stack size to prevent reusing of an > + * existing fiber. > + */ > + struct fiber_attr *fiber_attr = fiber_attr_new(); > + fiber_attr_setstacksize(fiber_attr, default_attr.stack_size * 2); > + > + /* > + * Clear the fiber's diagnostics area to check that failed > + * fiber_new() sets an error. > + */ > + diag_clear(diag_get()); > + > + /* > + * Check guard page setup via mprotect. We can't test the fiber > + * destroy path since it clears fiber's diag. > + */ > + inj = errinj(ERRINJ_FIBER_MPROTECT, ERRINJ_INT); > + inj->iparam = PROT_NONE; > + fiber = fiber_new_ex("test_mprotect", fiber_attr, noop_f); > + inj->iparam = -1; > + > + ok(fiber == NULL, "mprotect: failed to setup fiber guard page"); > + ok(diag_get() != NULL, "mprotect: diag is armed after error"); > + > + /* > + * Check madvise. We can't test the fiber destroy > + * path since it is cleaning error. > + */ > + diag_clear(diag_get()); > + inj = errinj(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL); > + inj->bparam = true; > + fiber = fiber_new_ex("test_madvise", fiber_attr, noop_f); > + inj->bparam = false; > + > + ok(fiber != NULL, "madvise: non critical error on madvise hint"); > + ok(diag_get() != NULL, "madvise: diag is armed after error"); > + > + footer(); > +} > + > +static int > +main_f(va_list ap) > +{ 7. Main caller of test functions usually also has header(), plan(), check_plan(), and footer(). This is useful when you have many tests, and each has lots of cases. Here you have only one test, so far. Probably these calls can be omitted in main_f(). Up to you. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure 2020-02-13 0:08 ` Vladislav Shpilevoy @ 2020-02-13 8:19 ` Cyrill Gorcunov 2020-02-13 23:26 ` Vladislav Shpilevoy 0 siblings, 1 reply; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-13 8:19 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Thu, Feb 13, 2020 at 01:08:18AM +0100, Vladislav Shpilevoy wrote: > > + * > > + * Note that in case if we're called from > > + * fiber_stack_create() the @mprotect_flags is > > 1. Nit: we use Doxygen comment style, so any word right after @ is > treated as a Doxygen command. If you want to mention some parameter, > use command @a. Like this: '@a mprotect_flags'. OK > > + * the same as the slab been created with, so > > + * calling mprotect for VMA with same flags > > + * won't fail. > > + */ > > 2. Yeah, good idea. We can do it not only in background though, but > do similar to pthread and how it manages stacks to free. We can put > stacks into a list, and try to free something from that list on each > next creation of a fiber. Or even reuse right away, without freeing. Nod, this is a task for near future. > > diff --git a/test/box/errinj.result b/test/box/errinj.result > > index babe36b1b..16f6b40c2 100644 > > --- a/test/box/errinj.result > > +++ b/test/box/errinj.result > > 3. So big diff for such a small change. I propose to fix this. We > can sort result of errinj.info before printing it. It would make all > changes to errinj produce much smaller diff. The patch should be > trivial. Could you please create it? Wait, first I don't really understand what you mean by "sort". The output is in yaml format, you propose to sort nodes by names? > > diff --git a/test/unit/fiber_stack.cc b/test/unit/fiber_stack.cc > > new file mode 100644 > > index 000000000..de7fe90e3 > > --- /dev/null > > +++ b/test/unit/fiber_stack.cc > > 4. Do you need C++ here? The C++ was choosen because we already have fiber_XXX.cc tests so I simply tried to follow. > 5. I would call it fiber_errinj.c. In future we may want to > add more tests here, which use error injections. Up to you. From one point of view having one big test which would cover various aspects of tarantool might help discovering unexpectedly connected bugs. On the other hands having a set of small tests seems to be more neat in terms of their managing. I must confess I don't see strong preferrence here. > > +#include "errinj.h" > > + > > +static struct fiber_attr default_attr; > > + > > +static int > > +noop_f(va_list ap) > > +{ > > + return 0; > > +} > > + > > +static void > > +fiber_stack_fail_test(void) > > +{ > > + struct errinj *inj; > > + struct fiber *fiber; > > + > > + header(); > > 6. After header() there should be a plan() call saying how > many tests you are going to do. In the end there should be > check_plan() call, which prints a summary. Could you, > please, add them? Will take a look, thanks! > > + > > +static int > > +main_f(va_list ap) > > +{ > > 7. Main caller of test functions usually also has header(), > plan(), check_plan(), and footer(). This is useful when > you have many tests, and each has lots of cases. > > Here you have only one test, so far. Probably these calls > can be omitted in main_f(). Up to you. OK, I'll review, thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure 2020-02-13 8:19 ` Cyrill Gorcunov @ 2020-02-13 23:26 ` Vladislav Shpilevoy 2020-02-14 7:56 ` Cyrill Gorcunov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-02-13 23:26 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml >>> diff --git a/test/unit/fiber_stack.cc b/test/unit/fiber_stack.cc >>> new file mode 100644 >>> index 000000000..de7fe90e3 >>> --- /dev/null >>> +++ b/test/unit/fiber_stack.cc >> >> 4. Do you need C++ here? > > The C++ was choosen because we already have fiber_XXX.cc tests > so I simply tried to follow. > >> 5. I would call it fiber_errinj.c. In future we may want to >> add more tests here, which use error injections. Up to you. > > From one point of view having one big test which would cover > various aspects of tarantool might help discovering unexpectedly > connected bugs. In case of unit tests it is rather about reducing code duplication. Because any fiber unit test on C includes the same headers, does the same preparations in main(), may need some common utilities. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure 2020-02-13 23:26 ` Vladislav Shpilevoy @ 2020-02-14 7:56 ` Cyrill Gorcunov 0 siblings, 0 replies; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-14 7:56 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Fri, Feb 14, 2020 at 12:26:09AM +0100, Vladislav Shpilevoy wrote: > >> 4. Do you need C++ here? > > > > The C++ was choosen because we already have fiber_XXX.cc tests > > so I simply tried to follow. > > > >> 5. I would call it fiber_errinj.c. In future we may want to > >> add more tests here, which use error injections. Up to you. > > > > From one point of view having one big test which would cover > > various aspects of tarantool might help discovering unexpectedly > > connected bugs. > > In case of unit tests it is rather about reducing code duplication. > Because any fiber unit test on C includes the same headers, does > the same preparations in main(), may need some common utilities. I see what you mean but still prefer to have stack testings in separate test. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure Cyrill Gorcunov @ 2020-02-06 12:31 ` Cyrill Gorcunov 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-09 17:39 ` [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Vladislav Shpilevoy ` (2 subsequent siblings) 4 siblings, 1 reply; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-06 12:31 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy In case if we unable to revert guard page back to read|write we should never use such slab again. Initially I thought of just put panic here and exit but it is too destructive. I think better print an error and continue. If node admin ignore this message then one moment at future there won't be slab left for use and creating new fibers get prohibited. In future (hopefully near one) we plan to drop guard pages to prevent VMA fracturing and use stack marks instead. Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- src/lib/core/fiber.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 57e4cb6ef..a501d846a 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -1055,15 +1055,22 @@ fiber_stack_destroy(struct fiber *fiber, struct slab_cache *slabc) * to setup the original protection back in * background. * + * For now lets keep such slab referenced and + * leaked: if mprotect failed we must not allow + * to reuse such slab with PROT_NONE'ed page + * inside. + * * Note that in case if we're called from * fiber_stack_create() the @mprotect_flags is * the same as the slab been created with, so * calling mprotect for VMA with same flags * won't fail. */ - diag_log(); + say_syserror("fiber: Can't put guard page to slab. " + "Leak %zu bytes", (size_t)fiber->stack_size); + } else { + slab_put(slabc, fiber->stack_slab); } - slab_put(slabc, fiber->stack_slab); } } -- 2.20.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back Cyrill Gorcunov @ 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-13 8:20 ` Cyrill Gorcunov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-02-13 0:08 UTC (permalink / raw) To: Cyrill Gorcunov, tml Thanks for the patch! On 06/02/2020 13:31, Cyrill Gorcunov wrote: > In case if we unable to revert guard page back to > read|write we should never use such slab again. > > Initially I thought of just put panic here and > exit but it is too destructive. I think better > print an error and continue. If node admin ignore > this message then one moment at future there won't > be slab left for use and creating new fibers get > prohibited. > > In future (hopefully near one) we plan to drop > guard pages to prevent VMA fracturing and use > stack marks instead. > > Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > --- > src/lib/core/fiber.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 57e4cb6ef..a501d846a 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -1055,15 +1055,22 @@ fiber_stack_destroy(struct fiber *fiber, struct slab_cache *slabc) > * to setup the original protection back in > * background. > * > + * For now lets keep such slab referenced and > + * leaked: if mprotect failed we must not allow > + * to reuse such slab with PROT_NONE'ed page > + * inside. > + * > * Note that in case if we're called from > * fiber_stack_create() the @mprotect_flags is > * the same as the slab been created with, so > * calling mprotect for VMA with same flags > * won't fail. > */ > - diag_log(); > + say_syserror("fiber: Can't put guard page to slab. " > + "Leak %zu bytes", (size_t)fiber->stack_size); This behaviour actually should be testable, no? You could not test diag, but you should be able to detect a leak. Using slab_cache_used(), for example. Before and after a test. > + } else { > + slab_put(slabc, fiber->stack_slab); > } > - slab_put(slabc, fiber->stack_slab); > } > } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back 2020-02-13 0:08 ` Vladislav Shpilevoy @ 2020-02-13 8:20 ` Cyrill Gorcunov 0 siblings, 0 replies; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-13 8:20 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Thu, Feb 13, 2020 at 01:08:45AM +0100, Vladislav Shpilevoy wrote: > > - diag_log(); > > + say_syserror("fiber: Can't put guard page to slab. " > > + "Leak %zu bytes", (size_t)fiber->stack_size); > > This behaviour actually should be testable, no? You could not test > diag, but you should be able to detect a leak. Using slab_cache_used(), > for example. Before and after a test. Will try, thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back Cyrill Gorcunov @ 2020-02-09 17:39 ` Vladislav Shpilevoy 2020-02-09 18:34 ` Cyrill Gorcunov 2020-02-13 0:07 ` Vladislav Shpilevoy 2020-02-20 19:25 ` Alexander Turenko 4 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-02-09 17:39 UTC (permalink / raw) To: Cyrill Gorcunov, tml Hi! I remember about this patch, will review soon. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors 2020-02-09 17:39 ` [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Vladislav Shpilevoy @ 2020-02-09 18:34 ` Cyrill Gorcunov 0 siblings, 0 replies; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-09 18:34 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Sun, Feb 09, 2020 at 06:39:34PM +0100, Vladislav Shpilevoy wrote: > Hi! I remember about this patch, will review soon. Thanks, no rush. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov ` (2 preceding siblings ...) 2020-02-09 17:39 ` [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Vladislav Shpilevoy @ 2020-02-13 0:07 ` Vladislav Shpilevoy 2020-02-13 7:16 ` Cyrill Gorcunov 2020-02-20 19:25 ` Alexander Turenko 4 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-02-13 0:07 UTC (permalink / raw) To: Cyrill Gorcunov, tml Hi! Thanks for the patchset! On 06/02/2020 13:31, Cyrill Gorcunov wrote: > Sasha, I put your reviewed-by tag, please let me know if > you're not agreed. > > Vlad, take a look please once time permit. > > v6 (by @alexander.turenko): > - polish unit test (style and results) > - fix bracing in core/fiber.c > > issue https://github.com/tarantool/tarantool/issues/4722 > branch gorcunov/gh-4722-mprotect-diag-error-6 Could you please put a link at the branch next time? Usually link is very useful when you can just click and look whether CI is ok. > Cyrill Gorcunov (2): > fiber: set diagnostics at madvise/mprotect failure > fiber: leak slab if unable to bring prots back > > src/lib/core/errinj.h | 2 + > src/lib/core/fiber.c | 93 +++++++++++++++++++----- > test/box/errinj.result | 136 ++++++++++++++++++----------------- > test/unit/CMakeLists.txt | 4 ++ > test/unit/fiber_stack.cc | 83 +++++++++++++++++++++ > test/unit/fiber_stack.result | 7 ++ > test/unit/suite.ini | 2 +- > 7 files changed, 242 insertions(+), 85 deletions(-) > create mode 100644 test/unit/fiber_stack.cc > create mode 100644 test/unit/fiber_stack.result > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors 2020-02-13 0:07 ` Vladislav Shpilevoy @ 2020-02-13 7:16 ` Cyrill Gorcunov 0 siblings, 0 replies; 14+ messages in thread From: Cyrill Gorcunov @ 2020-02-13 7:16 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Thu, Feb 13, 2020 at 01:07:15AM +0100, Vladislav Shpilevoy wrote: > > issue https://github.com/tarantool/tarantool/issues/4722 > > branch gorcunov/gh-4722-mprotect-diag-error-6 > > Could you please put a link at the branch next time? Usually > link is very useful when you can just click and look whether > CI is ok. Sure ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov ` (3 preceding siblings ...) 2020-02-13 0:07 ` Vladislav Shpilevoy @ 2020-02-20 19:25 ` Alexander Turenko 4 siblings, 0 replies; 14+ messages in thread From: Alexander Turenko @ 2020-02-20 19:25 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy > Sasha, I put your reviewed-by tag, please let me know if > you're not agreed. Yep, I'm okay. So it seems the patch only waits for Vlad's ack. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-02-20 19:25 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-02-06 12:31 [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 1/2] fiber: set diagnostics at madvise/mprotect failure Cyrill Gorcunov 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-13 8:19 ` Cyrill Gorcunov 2020-02-13 23:26 ` Vladislav Shpilevoy 2020-02-14 7:56 ` Cyrill Gorcunov 2020-02-06 12:31 ` [Tarantool-patches] [PATCH v6 2/2] fiber: leak slab if unable to bring prots back Cyrill Gorcunov 2020-02-13 0:08 ` Vladislav Shpilevoy 2020-02-13 8:20 ` Cyrill Gorcunov 2020-02-09 17:39 ` [Tarantool-patches] [PATCH v6 0/2] fiber: Handle stack madvise/mprotect errors Vladislav Shpilevoy 2020-02-09 18:34 ` Cyrill Gorcunov 2020-02-13 0:07 ` Vladislav Shpilevoy 2020-02-13 7:16 ` Cyrill Gorcunov 2020-02-20 19:25 ` Alexander Turenko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox