* [Tarantool-patches] [PATCH 0/2] Fiber storage leak @ 2019-12-08 19:28 Vladislav Shpilevoy 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state Vladislav Shpilevoy 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto Vladislav Shpilevoy 0 siblings, 2 replies; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-08 19:28 UTC (permalink / raw) To: tarantool-patches, kostja.osipov, imun The patchset makes fiber.storage be always deleted. Regardless of where was it created - in a fiber born in the Lua land, or in a fiber serving IProto requests and accessing Lua. That removes - a leak occurred each time when fiber storage was created for an IProto request; - a possibility to see previous request's artifacts left in fiber.storage; And makes possible to use fiber.storage as request-local data. Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4662-fiber-storage-leak Issue: https://github.com/tarantool/tarantool/issues/4662 Issue: https://github.com/tarantool/tarantool/issues/3462 Vladislav Shpilevoy (2): fiber: unref fiber.storage via global Lua state fiber: destroy fiber.storage created by iproto src/box/session.cc | 11 ++- src/box/session.h | 7 +- src/box/txn.c | 13 +-- src/box/txn.h | 7 +- src/lib/core/fiber.c | 23 +++-- src/lib/core/fiber.h | 13 ++- src/lib/core/fiber_pool.c | 6 ++ src/lua/fiber.c | 35 ++++++-- test/app/gh-4662-fiber-storage-leak.result | 88 ++++++++++++++++++++ test/app/gh-4662-fiber-storage-leak.test.lua | 43 ++++++++++ 10 files changed, 217 insertions(+), 29 deletions(-) create mode 100644 test/app/gh-4662-fiber-storage-leak.result create mode 100644 test/app/gh-4662-fiber-storage-leak.test.lua -- 2.21.0 (Apple Git-122.2) ^ permalink raw reply [flat|nested] 26+ messages in thread
* [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state 2019-12-08 19:28 [Tarantool-patches] [PATCH 0/2] Fiber storage leak Vladislav Shpilevoy @ 2019-12-08 19:28 ` Vladislav Shpilevoy 2019-12-11 23:57 ` Igor Munkin 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto Vladislav Shpilevoy 1 sibling, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-08 19:28 UTC (permalink / raw) To: tarantool-patches, kostja.osipov, imun Fiber.storage is a table, available from anywhere in the fiber. It is destroyed after fiber function is finished. That provides a reliable fiber-local storage, similar to thread-local in C/C++. But there is a problem that the storage may be created via one struct lua_State, and destroyed via another. Here is an example: function test_storage() fiber.self().storage.key = 100 end box.schema.func.create('test_storage') _ = fiber.create(function() box.func.test_storage:call() end) There are 3 struct lua_State: tarantool_L - global always alive Lua state; L1 - stack of the fiber, created by fiber.create(); L2 - stack created by that fiber to execute test_storage(). Fiber.storage is created on stack of L2 and referenced by global LUA_REGISTRYINDEX. Then it is unreferenced from L1 when the fiber is being destroyed. That is generally ok as soon as the storage object is always in LUA_REGISTRYINDEX, which is shared by all Lua states. But soon during destruction of the fiber.storage there will be only tarantool_L and the original L2. Original L2 may be already deleted by the time the storage is being destroyed. So this patch makes unref of the storage via reliable tarantool_L. Needed for #4662 --- src/lua/fiber.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lua/fiber.c b/src/lua/fiber.c index 53ebec9aa..a7b75f9bf 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -445,7 +445,7 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) /* Destroy local storage */ int storage_ref = f->storage.lua.ref; if (storage_ref > 0) - luaL_unref(L, LUA_REGISTRYINDEX, storage_ref); + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); /* * If fiber is not joinable * We can unref child stack here, -- 2.21.0 (Apple Git-122.2) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state Vladislav Shpilevoy @ 2019-12-11 23:57 ` Igor Munkin 2019-12-12 8:50 ` Konstantin Osipov 2019-12-12 23:38 ` Vladislav Shpilevoy 0 siblings, 2 replies; 26+ messages in thread From: Igor Munkin @ 2019-12-11 23:57 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Vlad, Thanks for the patch! I spent some time knee deep into fibers machinery and totally agree with this fix: the most safe approach is "unrefing" storage entity via tarantool_L coro, taking into account the possible unref of the fiber's one and its consequtive collection. However, I don't get why we need to apply these changes, considering your follow-up patch. Could you please provide a bit more detailed rationale? On 08.12.19, Vladislav Shpilevoy wrote: > Fiber.storage is a table, available from anywhere in the fiber. It > is destroyed after fiber function is finished. That provides a > reliable fiber-local storage, similar to thread-local in C/C++. > > But there is a problem that the storage may be created via one > struct lua_State, and destroyed via another. Here is an example: > > function test_storage() > fiber.self().storage.key = 100 > end > box.schema.func.create('test_storage') > _ = fiber.create(function() > box.func.test_storage:call() > end) > > There are 3 struct lua_State: > tarantool_L - global always alive Lua state; > L1 - stack of the fiber, created by fiber.create(); > L2 - stack created by that fiber to execute test_storage(). Minor: Strictly saying, lua_State is a Lua coroutine, and not just a guest stack. Please, consider adjusting the commit message, but feel free to ignore this remark. > > Fiber.storage is created on stack of L2 and referenced by global > LUA_REGISTRYINDEX. Then it is unreferenced from L1 when the fiber > is being destroyed. > > That is generally ok as soon as the storage object is always in > LUA_REGISTRYINDEX, which is shared by all Lua states. > > But soon during destruction of the fiber.storage there will be > only tarantool_L and the original L2. Original L2 may be already > deleted by the time the storage is being destroyed. So this patch > makes unref of the storage via reliable tarantool_L. > > Needed for #4662 > --- > src/lua/fiber.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/lua/fiber.c b/src/lua/fiber.c > index 53ebec9aa..a7b75f9bf 100644 > --- a/src/lua/fiber.c > +++ b/src/lua/fiber.c > @@ -445,7 +445,7 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) > /* Destroy local storage */ > int storage_ref = f->storage.lua.ref; > if (storage_ref > 0) > - luaL_unref(L, LUA_REGISTRYINDEX, storage_ref); > + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); > /* > * If fiber is not joinable > * We can unref child stack here, > -- > 2.21.0 (Apple Git-122.2) > -- Best regards, IM ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state 2019-12-11 23:57 ` Igor Munkin @ 2019-12-12 8:50 ` Konstantin Osipov 2019-12-12 23:38 ` Vladislav Shpilevoy 1 sibling, 0 replies; 26+ messages in thread From: Konstantin Osipov @ 2019-12-12 8:50 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy * Igor Munkin <imun@tarantool.org> [19/12/12 03:03]: > I spent some time knee deep into fibers machinery and totally agree with > this fix: the most safe approach is "unrefing" storage entity via > tarantool_L coro, taking into account the possible unref of the fiber's > one and its consequtive collection. > > However, I don't get why we need to apply these changes, considering > your follow-up patch. Could you please provide a bit more detailed > rationale? I think it's worth applying them simply because the follow up changes may end up being something completely different. I agree this is a good fix. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state 2019-12-11 23:57 ` Igor Munkin 2019-12-12 8:50 ` Konstantin Osipov @ 2019-12-12 23:38 ` Vladislav Shpilevoy 2019-12-13 10:44 ` Igor Munkin 1 sibling, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-12 23:38 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi! Thanks for the review! On 12/12/2019 00:57, Igor Munkin wrote: > Vlad, > > Thanks for the patch! > > I spent some time knee deep into fibers machinery and totally agree with > this fix: the most safe approach is "unrefing" storage entity via > tarantool_L coro, taking into account the possible unref of the fiber's > one and its consequtive collection. > > However, I don't get why we need to apply these changes, considering > your follow-up patch. Could you please provide a bit more detailed > rationale? My follow up patch is about a different bug and I wanted to keep these independent fixes separated. That would have been strange, if in the second commit I had just silently replaced the old L with tarantool_L. > > On 08.12.19, Vladislav Shpilevoy wrote: >> Fiber.storage is a table, available from anywhere in the fiber. It >> is destroyed after fiber function is finished. That provides a >> reliable fiber-local storage, similar to thread-local in C/C++. >> >> But there is a problem that the storage may be created via one >> struct lua_State, and destroyed via another. Here is an example: >> >> function test_storage() >> fiber.self().storage.key = 100 >> end >> box.schema.func.create('test_storage') >> _ = fiber.create(function() >> box.func.test_storage:call() >> end) >> >> There are 3 struct lua_State: >> tarantool_L - global always alive Lua state; >> L1 - stack of the fiber, created by fiber.create(); >> L2 - stack created by that fiber to execute test_storage(). > > Minor: Strictly saying, lua_State is a Lua coroutine, and not just a > guest stack. Please, consider adjusting the commit message, but feel > free to ignore this remark. > Agree, here is a new version of this paragraph: " There are 3 struct lua_State: tarantool_L - global always alive state; L1 - Lua coroutine of the fiber, created by fiber.create(); L2 - Lua coroutine created by that fiber to execute test_storage(). " ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state 2019-12-12 23:38 ` Vladislav Shpilevoy @ 2019-12-13 10:44 ` Igor Munkin 0 siblings, 0 replies; 26+ messages in thread From: Igor Munkin @ 2019-12-13 10:44 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Vlad, On 13.12.19, Vladislav Shpilevoy wrote: > Hi! Thanks for the review! > > On 12/12/2019 00:57, Igor Munkin wrote: > > Vlad, > > > > Thanks for the patch! > > > > I spent some time knee deep into fibers machinery and totally agree with > > this fix: the most safe approach is "unrefing" storage entity via > > tarantool_L coro, taking into account the possible unref of the fiber's > > one and its consequtive collection. > > > > However, I don't get why we need to apply these changes, considering > > your follow-up patch. Could you please provide a bit more detailed > > rationale? > > My follow up patch is about a different bug and I wanted to keep > these independent fixes separated. That would have been strange, > if in the second commit I had just silently replaced the old L > with tarantool_L. > Yes, on second thought I agree with you and Kostja, so here is my LGTM for this patch. > > > > On 08.12.19, Vladislav Shpilevoy wrote: > >> Fiber.storage is a table, available from anywhere in the fiber. It > >> is destroyed after fiber function is finished. That provides a > >> reliable fiber-local storage, similar to thread-local in C/C++. > >> > >> But there is a problem that the storage may be created via one > >> struct lua_State, and destroyed via another. Here is an example: > >> > >> function test_storage() > >> fiber.self().storage.key = 100 > >> end > >> box.schema.func.create('test_storage') > >> _ = fiber.create(function() > >> box.func.test_storage:call() > >> end) > >> > >> There are 3 struct lua_State: > >> tarantool_L - global always alive Lua state; > >> L1 - stack of the fiber, created by fiber.create(); > >> L2 - stack created by that fiber to execute test_storage(). > > > > Minor: Strictly saying, lua_State is a Lua coroutine, and not just a > > guest stack. Please, consider adjusting the commit message, but feel > > free to ignore this remark. > > > > Agree, here is a new version of this paragraph: > > " > There are 3 struct lua_State: > tarantool_L - global always alive state; > L1 - Lua coroutine of the fiber, created by fiber.create(); > L2 - Lua coroutine created by that fiber to execute > test_storage(). > " -- Best regards, IM ^ permalink raw reply [flat|nested] 26+ messages in thread
* [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-08 19:28 [Tarantool-patches] [PATCH 0/2] Fiber storage leak Vladislav Shpilevoy 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state Vladislav Shpilevoy @ 2019-12-08 19:28 ` Vladislav Shpilevoy 2019-12-09 7:21 ` Konstantin Osipov 2019-12-10 8:32 ` Konstantin Osipov 1 sibling, 2 replies; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-08 19:28 UTC (permalink / raw) To: tarantool-patches, kostja.osipov, imun Fiber.storage was not deleted when created in a fiber started from the thread pool used by IProto requests. The problem was that fiber.storage was created and deleted in Lua land only, assuming that only Lua-born fibers could have it. But in fact any fiber can create a Lua storage. Including the ones used to serve IProto requests. Not deletion of the storage led to a possibility of meeting a non-empty fiber.storage in the beginning of an iproto request, and not deletion of the memory caught by the storage until its explicit nullification. Now the storage destructor works for any fiber, which managed to create the storage. The destructor unrefs and nullifies the storage. For destructor purposes the fiber.on_stop triggers were reworked. Now they are on_cleanup triggers, and can be called multiple times during fiber's lifetime. After every request done by that fiber. Closes #4662 Closes #3462 --- src/box/session.cc | 11 ++- src/box/session.h | 7 +- src/box/txn.c | 13 +-- src/box/txn.h | 7 +- src/lib/core/fiber.c | 23 +++-- src/lib/core/fiber.h | 13 ++- src/lib/core/fiber_pool.c | 6 ++ src/lua/fiber.c | 35 ++++++-- test/app/gh-4662-fiber-storage-leak.result | 88 ++++++++++++++++++++ test/app/gh-4662-fiber-storage-leak.test.lua | 43 ++++++++++ 10 files changed, 217 insertions(+), 29 deletions(-) create mode 100644 test/app/gh-4662-fiber-storage-leak.result create mode 100644 test/app/gh-4662-fiber-storage-leak.test.lua diff --git a/src/box/session.cc b/src/box/session.cc index 461d1cf25..a08043986 100644 --- a/src/box/session.cc +++ b/src/box/session.cc @@ -81,7 +81,7 @@ sid_max() } static int -session_on_stop(struct trigger *trigger, void * /* event */) +session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) { /* * Remove on_stop trigger from the fiber, otherwise the @@ -167,11 +167,10 @@ session_create_on_demand() struct session *s = session_create(SESSION_TYPE_BACKGROUND); if (s == NULL) return NULL; - s->fiber_on_stop = { - RLIST_LINK_INITIALIZER, session_on_stop, NULL, NULL - }; - /* Add a trigger to destroy session on fiber stop */ - trigger_add(&fiber()->on_stop, &s->fiber_on_stop); + trigger_create(&s->fiber_on_cleanup, session_on_fiber_cleanup, + NULL, NULL); + /* Add a trigger to destroy session on fiber cleanup. */ + trigger_add(&fiber()->on_cleanup, &s->fiber_on_cleanup); credentials_reset(&s->credentials, admin_user); fiber_set_session(fiber(), s); fiber_set_user(fiber(), &s->credentials); diff --git a/src/box/session.h b/src/box/session.h index eff3d7a67..710e4919b 100644 --- a/src/box/session.h +++ b/src/box/session.h @@ -103,8 +103,11 @@ struct session { union session_meta meta; /** Session user id and global grants */ struct credentials credentials; - /** Trigger for fiber on_stop to cleanup created on-demand session */ - struct trigger fiber_on_stop; + /** + * Trigger for fiber on_cleanup to cleanup created + * on-demand session. + */ + struct trigger fiber_on_cleanup; }; struct session_vtab { diff --git a/src/box/txn.c b/src/box/txn.c index 963ec8eeb..7d549a6a9 100644 --- a/src/box/txn.c +++ b/src/box/txn.c @@ -41,7 +41,7 @@ double too_long_threshold; static struct stailq txn_cache = {NULL, &txn_cache.first}; static int -txn_on_stop(struct trigger *trigger, void *event); +txn_on_fiber_cleanup(struct trigger *trigger, void *event); static int txn_on_yield(struct trigger *trigger, void *event); @@ -226,8 +226,9 @@ txn_begin() txn->fiber = NULL; fiber_set_txn(fiber(), txn); /* fiber_on_yield is initialized by engine on demand */ - trigger_create(&txn->fiber_on_stop, txn_on_stop, NULL, NULL); - trigger_add(&fiber()->on_stop, &txn->fiber_on_stop); + trigger_create(&txn->fiber_on_cleanup, txn_on_fiber_cleanup, + NULL, NULL); + trigger_add(&fiber()->on_cleanup, &txn->fiber_on_cleanup); /* * By default all transactions may yield. * It's a responsibility of an engine to disable yields @@ -550,7 +551,7 @@ txn_prepare(struct txn *txn) if (engine_prepare(txn->engine, txn) != 0) return -1; } - trigger_clear(&txn->fiber_on_stop); + trigger_clear(&txn->fiber_on_cleanup); if (!txn_has_flag(txn, TXN_CAN_YIELD)) trigger_clear(&txn->fiber_on_yield); return 0; @@ -618,7 +619,7 @@ void txn_rollback(struct txn *txn) { assert(txn == in_txn()); - trigger_clear(&txn->fiber_on_stop); + trigger_clear(&txn->fiber_on_cleanup); if (!txn_has_flag(txn, TXN_CAN_YIELD)) trigger_clear(&txn->fiber_on_yield); txn->signature = -1; @@ -840,7 +841,7 @@ txn_savepoint_release(struct txn_savepoint *svp) } static int -txn_on_stop(struct trigger *trigger, void *event) +txn_on_fiber_cleanup(struct trigger *trigger, void *event) { (void) trigger; (void) event; diff --git a/src/box/txn.h b/src/box/txn.h index da12feebf..755c464f0 100644 --- a/src/box/txn.h +++ b/src/box/txn.h @@ -207,10 +207,11 @@ struct txn { */ struct trigger fiber_on_yield; /** - * Trigger on fiber stop, to rollback transaction - * in case a fiber stops (all engines). + * Trigger on fiber cleanup, to rollback transaction + * in case a fiber is getting reset/recycled/destroyed + * (all engines). */ - struct trigger fiber_on_stop; + struct trigger fiber_on_cleanup; /** Commit and rollback triggers. */ struct rlist on_commit, on_rollback; /** diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 00ae8cded..db8fdd5fe 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -325,6 +325,20 @@ fiber_attr_getstacksize(struct fiber_attr *fiber_attr) fiber_attr_default.stack_size; } +void +fiber_cleanup(struct fiber *f) +{ + if (trigger_run(&f->on_cleanup, f) != 0) + panic("Fiber cleanup can't fail"); + /* + * Double call of cleanup routines is a bad idea. It is + * like calling a destructor 2 times. On_cleanup can be + * called only once until the fiber is reused again, and + * new triggers are set. + */ + trigger_destroy(&f->on_cleanup); +} + static void fiber_recycle(struct fiber *fiber); @@ -787,7 +801,7 @@ static void fiber_reset(struct fiber *fiber) { rlist_create(&fiber->on_yield); - rlist_create(&fiber->on_stop); + rlist_create(&fiber->on_cleanup); fiber->flags = FIBER_DEFAULT_FLAGS; #if ENABLE_FIBER_TOP clock_stat_reset(&fiber->clock_stat); @@ -856,8 +870,7 @@ fiber_loop(MAYBE_UNUSED void *data) assert(f != fiber); fiber_wakeup(f); } - if (! rlist_empty(&fiber->on_stop)) - trigger_run(&fiber->on_stop, fiber); + fiber_cleanup(fiber); /* reset pending wakeups */ rlist_del(&fiber->state); if (! (fiber->flags & FIBER_IS_JOINABLE)) @@ -1161,7 +1174,7 @@ fiber_destroy(struct cord *cord, struct fiber *f) assert(f != &cord->sched); trigger_destroy(&f->on_yield); - trigger_destroy(&f->on_stop); + trigger_destroy(&f->on_cleanup); rlist_del(&f->state); rlist_del(&f->link); region_destroy(&f->gc); @@ -1555,7 +1568,7 @@ cord_costart_thread_func(void *arg) * Got to be in a trigger, to break the loop even * in case of an exception. */ - trigger_add(&f->on_stop, &break_ev_loop); + trigger_add(&f->on_cleanup, &break_ev_loop); fiber_set_joinable(f, true); fiber_start(f, ctx.arg); if (!fiber_is_dead(f)) { diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index c5b975513..4e21b9999 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -458,8 +458,13 @@ struct fiber { /** Triggers invoked before this fiber yields. Must not throw. */ struct rlist on_yield; - /** Triggers invoked before this fiber stops. Must not throw. */ - struct rlist on_stop; + /** + * Triggers invoked before this fiber is + * stopped/reset/recycled/destroyed/reused. In other + * words, each time when the fiber has finished execution + * of a request. + */ + struct rlist on_cleanup; /** * The list of fibers awaiting for this fiber's timely * (or untimely) death. @@ -506,6 +511,10 @@ struct fiber { char name[FIBER_NAME_MAX + 1]; }; +/** Invoke cleanup triggers and delete them. */ +void +fiber_cleanup(struct fiber *f); + struct cord_on_exit; /** diff --git a/src/lib/core/fiber_pool.c b/src/lib/core/fiber_pool.c index 77f89c9fa..31f366fac 100644 --- a/src/lib/core/fiber_pool.c +++ b/src/lib/core/fiber_pool.c @@ -62,6 +62,12 @@ restart: assert(f->caller->caller == &cord->sched); } cmsg_deliver(msg); + /* + * Different messages = different requests. They + * should not affect each other. This is why + * cleanup is done here. + */ + fiber_cleanup(f); } /** Put the current fiber into a fiber cache. */ if (!fiber_is_cancelled(fiber()) && (msg != NULL || diff --git a/src/lua/fiber.c b/src/lua/fiber.c index a7b75f9bf..cba625082 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -441,11 +441,6 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) int coro_ref = lua_tointeger(L, -1); lua_pop(L, 1); result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); - - /* Destroy local storage */ - int storage_ref = f->storage.lua.ref; - if (storage_ref > 0) - luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); /* * If fiber is not joinable * We can unref child stack here, @@ -606,12 +601,42 @@ lbox_fiber_name(struct lua_State *L) } } +/** + * Trigger invoked when the fiber has finished execution of its + * current request. Only purpose - delete storage.lua.ref keeping + * a reference of Lua fiber.storage object. Unlike Lua stack, + * Lua fiber storage may be created not only for fibers born from + * Lua land. For example, an IProto request may execute a Lua + * function, which can create the storage. Trigger guarantees, + * that even for non-Lua fibers the Lua storage is destroyed. + */ +static int +lbox_fiber_on_cleanup(struct trigger *trigger, void *event) +{ + (void) trigger; + struct fiber *f = (struct fiber *) event; + int storage_ref = f->storage.lua.ref; + assert(storage_ref > 0); + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); + f->storage.lua.ref = 0; + return 0; +} + static int lbox_fiber_storage(struct lua_State *L) { struct fiber *f = lbox_checkfiber(L, 1); int storage_ref = f->storage.lua.ref; if (storage_ref <= 0) { + struct trigger *t = (struct trigger *) + malloc(sizeof(*t)); + if (t == NULL) { + diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); + return luaT_error(L); + } + trigger_create(t, lbox_fiber_on_cleanup, NULL, + (trigger_f0) free); + trigger_add(&f->on_cleanup, t); lua_newtable(L); /* create local storage on demand */ storage_ref = luaL_ref(L, LUA_REGISTRYINDEX); f->storage.lua.ref = storage_ref; diff --git a/test/app/gh-4662-fiber-storage-leak.result b/test/app/gh-4662-fiber-storage-leak.result new file mode 100644 index 000000000..4ade017a4 --- /dev/null +++ b/test/app/gh-4662-fiber-storage-leak.result @@ -0,0 +1,88 @@ +-- test-run result file version 2 +fiber = require('fiber') + | --- + | ... +netbox = require('net.box') + | --- + | ... + +-- +-- gh-4662: fiber.storage was not deleted when created in a fiber +-- started from the thread pool used by IProto requests. The +-- problem was that fiber.storage was created and deleted in Lua +-- land only, assuming that only Lua-born fibers could have it. +-- But in fact any fiber can create a Lua storage. Including the +-- ones used to serve IProto requests. +-- The test checks if fiber.storage is really deleted, and is not +-- shared between requests. +-- + +box.schema.user.grant('guest', 'execute', 'universe') + | --- + | ... +storage = nil + | --- + | ... +i = 0 + | --- + | ... +weak_table = setmetatable({}, {__mode = 'v'}) + | --- + | ... +object = {'object'} + | --- + | ... +weak_table.object = object + | --- + | ... +function ref_object_in_fiber() \ + storage = fiber.self().storage \ + assert(next(storage) == nil) \ + i = i + 1 \ + fiber.self().storage.key = i \ + fiber.self().storage.object = object \ +end + | --- + | ... + +c = netbox.connect(box.cfg.listen) + | --- + | ... +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') + | --- + | ... +storage + | --- + | - key: 2 + | object: + | - object + | ... +i + | --- + | - 2 + | ... +object = nil + | --- + | ... +storage = nil + | --- + | ... +collectgarbage('collect') + | --- + | - 0 + | ... +-- The weak table should be empty, because the only two hard +-- references were in the fibers used to serve +-- ref_object_in_fiber() requests. And their storages should be +-- cleaned up. +weak_table + | --- + | - [] + | ... + +c:close() + | --- + | ... +box.schema.user.revoke('guest', 'execute', 'universe') + | --- + | ... diff --git a/test/app/gh-4662-fiber-storage-leak.test.lua b/test/app/gh-4662-fiber-storage-leak.test.lua new file mode 100644 index 000000000..25acf5679 --- /dev/null +++ b/test/app/gh-4662-fiber-storage-leak.test.lua @@ -0,0 +1,43 @@ +fiber = require('fiber') +netbox = require('net.box') + +-- +-- gh-4662: fiber.storage was not deleted when created in a fiber +-- started from the thread pool used by IProto requests. The +-- problem was that fiber.storage was created and deleted in Lua +-- land only, assuming that only Lua-born fibers could have it. +-- But in fact any fiber can create a Lua storage. Including the +-- ones used to serve IProto requests. +-- The test checks if fiber.storage is really deleted, and is not +-- shared between requests. +-- + +box.schema.user.grant('guest', 'execute', 'universe') +storage = nil +i = 0 +weak_table = setmetatable({}, {__mode = 'v'}) +object = {'object'} +weak_table.object = object +function ref_object_in_fiber() \ + storage = fiber.self().storage \ + assert(next(storage) == nil) \ + i = i + 1 \ + fiber.self().storage.key = i \ + fiber.self().storage.object = object \ +end + +c = netbox.connect(box.cfg.listen) +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') +storage +i +object = nil +storage = nil +collectgarbage('collect') +-- The weak table should be empty, because the only two hard +-- references were in the fibers used to serve +-- ref_object_in_fiber() requests. And their storages should be +-- cleaned up. +weak_table + +c:close() +box.schema.user.revoke('guest', 'execute', 'universe') -- 2.21.0 (Apple Git-122.2) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto Vladislav Shpilevoy @ 2019-12-09 7:21 ` Konstantin Osipov 2019-12-09 23:31 ` Vladislav Shpilevoy 2019-12-10 8:32 ` Konstantin Osipov 1 sibling, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-09 7:21 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/08 23:47]: The danger of reporting, self-assigning and fixing a bug is that ... it might be not a bug. Users should know that iproto fibers are pooled, not created/destroyed on demand. So fiber local storage is simply not making any sense for them. A proper fix would be to disable fiber local storage for these fibers, not try to make it work. Even though your patch "works", it is pretty much useless. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-09 7:21 ` Konstantin Osipov @ 2019-12-09 23:31 ` Vladislav Shpilevoy 2019-12-10 8:21 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-09 23:31 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches, imun Hi! Thanks for the review! Disclaimer: in case something looks 'toxic', this is an accident, and is not done intentionally. On 09/12/2019 08:21, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/08 23:47]: > > The danger of reporting, self-assigning and fixing a bug is that > ... it might be not a bug. I agree. Except when something is obviously a bug which is clearly the case. > > Users should know that iproto fibers are pooled, not > created/destroyed on demand. So fiber local storage is simply not > making any sense for them. Here I disagree. Users should not know anything about whether the fibers are pooled, when, and how. If so, then please, give an example, when such a strange fiber.storage is needed? I just really can't come up with a usage case. I did a public poll to understand what a behaviour is expected, and not a single person expects, that pooled vs not pooled fiber would affect its behaviour. Pool is like a cache. It should not be visible. You literally said it in another thread about SQL PREPARE. > > A proper fix would be to disable fiber local storage for these > fibers, not try to make it work. That would lead to a problem, that functions, using fiber.storage, would work properly being called from a background fiber, but would fail being called from an iproto-used fiber. It is clearly not a fix, but rather a much worse bug, sorry. > > Even though your patch "works", it is pretty much useless. > Well, sorry, with all respect, but I disagree here too. Appeared, there were repetitive requests to fix this weird behaviour of fiber.storage a couple of years ago, and again recently, which were successfully rejected by you and by the same reason, that pooling, as you think, should be exposed. My patch fixes that problem, what makes it not useless. Moreover, that fixes another problem about pooled fibers never freeing the storages (= leak) and therefore occupying Lua memory, which is very limited in size. That may become a real problem, when there are many pooled fibers. Their count may be thousands, and depends on net_msg_max. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-09 23:31 ` Vladislav Shpilevoy @ 2019-12-10 8:21 ` Konstantin Osipov 0 siblings, 0 replies; 26+ messages in thread From: Konstantin Osipov @ 2019-12-10 8:21 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/10 10:22]: > > The danger of reporting, self-assigning and fixing a bug is that > > ... it might be not a bug. > > I agree. Except when something is obviously a bug which is > clearly the case. > > > > > Users should know that iproto fibers are pooled, not > > created/destroyed on demand. So fiber local storage is simply not > > making any sense for them. > > Here I disagree. Users should not know anything about whether > the fibers are pooled, when, and how. > > If so, then please, give an example, when such a strange > fiber.storage is needed? I just really can't come up with a > usage case. > > I did a public poll to understand what a behaviour is expected, > and not a single person expects, that pooled vs not pooled fiber > would affect its behaviour. Pool is like a cache. It should not > be visible. You literally said it in another thread about SQL > PREPARE. You had a discussion with the community on the channel and many people favour your fix. This is how you handle this! I will send a review. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto Vladislav Shpilevoy 2019-12-09 7:21 ` Konstantin Osipov @ 2019-12-10 8:32 ` Konstantin Osipov 2019-12-10 22:59 ` Vladislav Shpilevoy 1 sibling, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-10 8:32 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/08 23:47]: > static int > -session_on_stop(struct trigger *trigger, void * /* event */) > +session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) Please come up with a better name, and provide a comment. > - trigger_add(&fiber()->on_stop, &s->fiber_on_stop); > + trigger_create(&s->fiber_on_cleanup, session_on_fiber_cleanup, > + NULL, NULL); > + /* Add a trigger to destroy session on fiber cleanup. */ > + trigger_add(&fiber()->on_cleanup, &s->fiber_on_cleanup); I don't understand why these two concerns are intermixed here. Why is this code part of session now? I think if you change the semantics of the trigger, you should move setting/resetting it to the place which controls the relevant object and its life cycle. Specifically, if you change fiber.storage semantics to be per-request *or* per Lua fiber, you should move the trigger to lua fiber create/destroy + iproto request processing start/end. Instead you move it to the core library, which is oblivious of these events. The way the core is implemented may change, and whoever is changing it will have to be aware of the legacy behaviour you make it depend on. > index eff3d7a67..710e4919b 100644 > --- a/src/box/session.h > +++ b/src/box/session.h > @@ -103,8 +103,11 @@ struct session { > union session_meta meta; > /** Session user id and global grants */ > struct credentials credentials; > - /** Trigger for fiber on_stop to cleanup created on-demand session */ > - struct trigger fiber_on_stop; > + /** > + * Trigger for fiber on_cleanup to cleanup created > + * on-demand session. > + */ > + struct trigger fiber_on_cleanup; ... > - trigger_clear(&txn->fiber_on_stop); > + trigger_clear(&txn->fiber_on_cleanup); Please see the above comment. Fiber cleanup trigger, as defined now, does not have anything to do with the session, or fiber, or transaction. It belongs to the request processing layer and should be set/cleared from it. > - struct rlist on_stop; > + /** > + * Triggers invoked before this fiber is > + * stopped/reset/recycled/destroyed/reused. In other > + * words, each time when the fiber has finished execution > + * of a request. > + */ > + struct rlist on_cleanup; Please make the comment describing the timing when this trigger is called less broad and more accurate, calling out the specific places when the trigger is called, and call the trigger accordingly. on_request_or_fiber_end? > +/** Invoke cleanup triggers and delete them. */ > +void > +fiber_cleanup(struct fiber *f); The call for a better name for the event and the accompanying action applies to all of the new api methods this patch introduces. > cmsg_deliver(msg); > + /* > + * Different messages = different requests. They > + * should not affect each other. This is why > + * cleanup is done here. > + */ > + fiber_cleanup(f); Ugh. I think this is a layering violation, and the cleanup call should be somewhere inside cmsg_deliver for the relevant messages. The rlist with cleanup triggers is better stored in a fiber key. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-10 8:32 ` Konstantin Osipov @ 2019-12-10 22:59 ` Vladislav Shpilevoy 2019-12-11 7:08 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-10 22:59 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches Hi! Thanks for the review! On 10/12/2019 09:32, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/08 23:47]: >> static int >> -session_on_stop(struct trigger *trigger, void * /* event */) >> +session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) > > Please come up with a better name, and provide a comment. > Well, I don't know a better name. IMO this name perfectly describes what is happening. 'Cleanup' means that the fiber is getting cleaned up. Quite straightforward explanation. If you have other ideas - please, bring them up, and lets discuss. I was thinking about 'on_reload', 'on_reuse', 'on_retire', 'on_reset', 'on_request_end', 'on_destroy'. Or keep 'on_stop'. Reload - bad, because it is also called when the fiber goes to the dead list. It is not reload. Reuse - bad by the same reason. Retire - bad, because basically = death, but the fiber may be getting reused. Reset - bad, too common. Request_end - bad, because some fibers don't serve requests. For example, single fibers of background threads. Stop/destroy - bad, because would be called after each request, and that != fiber stop or destruction. Waiting for your feedback. I really don't know what an other name to use. Comment update: =================================================================== +/** + * Destroy session of a background fiber when the + * fiber is getting destroyed. + */ static int session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) { =================================================================== Even though I think that it is useless here. The function purpose is obvious already. >> - trigger_add(&fiber()->on_stop, &s->fiber_on_stop); >> + trigger_create(&s->fiber_on_cleanup, session_on_fiber_cleanup, >> + NULL, NULL); >> + /* Add a trigger to destroy session on fiber cleanup. */ >> + trigger_add(&fiber()->on_cleanup, &s->fiber_on_cleanup); > > I don't understand why these two concerns are intermixed here. > Why is this code part of session now? Looks like you didn't read the patch carefully enough. This piece of code is kept as is. I just used a proper constructor for trigger object instead of assigning its members manually. And added a comment. That is all. I would not change this hunk at all, if no the 'on_stop' -> 'on_cleanup' rename. Talking of why is that code part of the session, *and always was* - I guess so as to destroy only local sessions. We can't just destroy any session in struct fiber.storage.session when the fiber is being destroyed or reused, right in fiber.c. Because remote sessions live longer than their fibers. This is why local sessions schedule their own destruction via on_stop/on_cleanup trigger in the fiber. And remote sessions don't. Although that problem could be solved by reference counting. Anyway, that is not a subject to discuss in scope of that issue. > > I think if you change the semantics of the trigger, you should > move setting/resetting it to the place which controls the relevant > object and its life cycle. Specifically, if you change > fiber.storage semantics to be per-request *or* per Lua fiber, > you should move the trigger to lua fiber create/destroy + > iproto request processing start/end. Well, this is not about Lua only. Background sessions can be created from C API too. I could patch both Lua lbox_fiber_create and API_EXPORT fiber_new(), but again certainly not here. I would like to keep these not related changes away from the problem of storage leaking and garbaging. I would be happy to fix them separately. Btw, Lua fiber storage in that patch is created and destroyed in Lua part. So it conforms to your vision of how the trigger should be used. > Instead you move it > to the core library, which is oblivious of these events. Nope. I don't move anything anywhere. Everything is kept where it was. > The way the core is implemented may change, and whoever is > changing it will have to be aware of the legacy behaviour you make > it depend on. > >> index eff3d7a67..710e4919b 100644 >> --- a/src/box/session.h >> +++ b/src/box/session.h >> @@ -103,8 +103,11 @@ struct session { >> union session_meta meta; >> /** Session user id and global grants */ >> struct credentials credentials; >> - /** Trigger for fiber on_stop to cleanup created on-demand session */ >> - struct trigger fiber_on_stop; >> + /** >> + * Trigger for fiber on_cleanup to cleanup created >> + * on-demand session. >> + */ >> + struct trigger fiber_on_cleanup; > ... > >> - trigger_clear(&txn->fiber_on_stop); >> + trigger_clear(&txn->fiber_on_cleanup); > > Please see the above comment. Fiber cleanup trigger, as defined > now, does not have anything to do with the session, or fiber, or > transaction. But it does. As I said, my patch does not change anything here. Txn and local session were destroyed in fiber cleanup, and they still are. > It belongs to the request processing layer and should be > set/cleared from it. Fiber loop and pool are the request processing layers. They process all requests, local and remote. So cleanup was and still is invoked there. But see my last comment below. >> - struct rlist on_stop; >> + /** >> + * Triggers invoked before this fiber is >> + * stopped/reset/recycled/destroyed/reused. In other >> + * words, each time when the fiber has finished execution >> + * of a request. >> + */ >> + struct rlist on_cleanup; > > Please make the comment describing the timing when this > trigger is called less broad and more accurate, calling out > the specific places when the trigger is called, and call > the trigger accordingly. > on_request_or_fiber_end? IMO, that names looks terrible. See my first comment. Talking of the code comment. I am not sure what is wrong with accuracy. I said exactly when cleanup is being done: stop/reset/recycle/destroy/reuse/request end. But ok: =================================================================== @@ -463,6 +463,10 @@ struct fiber { * stopped/reset/recycled/destroyed/reused. In other * words, each time when the fiber has finished execution * of a request. + * In particular, for fibers not from fiber pool the + * cleanup is done before destroy and death. + * Pooled fibers are cleaned up after each request, even + * if they are never destroyed. */ struct rlist on_cleanup; /** =================================================================== I am going to stop here and not to give any source code references in the comment, because such refs tend to outdate. >> +/** Invoke cleanup triggers and delete them. */ >> +void >> +fiber_cleanup(struct fiber *f); > > The call for a better name for the event and the accompanying > action applies to all of the new api > methods this patch introduces. > This is mutual. Propose a better name or choose one of mine, please. I gave them in the first comment. I can't read your mind, so if you want a particular name - say it, and lets discuss. >> cmsg_deliver(msg); >> + /* >> + * Different messages = different requests. They >> + * should not affect each other. This is why >> + * cleanup is done here. >> + */ >> + fiber_cleanup(f); > > Ugh. I think this is a layering violation, and the cleanup > call should be somewhere inside cmsg_deliver for the relevant > messages. The rlist with cleanup triggers is better stored in > a fiber key. > Sorry. I really don't do that on purpose. This is not because I just like to disagree with everything. This is because I don't understand you sometimes. - Here I again partially disagree :) cmsg_deliver() is used in two places - the pooled fiber loop, and cbus loop. Cbus loop is used in background threads such as vinyl readers, relay, etc. They have no a concept of isolated request or a context. The whole cbus_loop is one long request working in one long-living fiber. So the only place needed that patch is the fiber pool loop, which serves many requests. An attempt to call cleanup in cbus loop would destroy that fiber's session after each request, what looks like needless oscillation. That session is empty anyway. Also cbus has nothing to do with fiber control. It uses some fiber things such as conds, but that is all. On the other hand I agree, that fiber pooled loop probably should not know anything about requests. Another option is to call fiber_cleanup() right in iproto.cc after each request. We would need to patch tx_process_misc(), tx_process_call(), tx_process1(), tx_process_select(), tx_process_sql(), tx_process_join_subscribe(). Not so many places. What do you think? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-10 22:59 ` Vladislav Shpilevoy @ 2019-12-11 7:08 ` Konstantin Osipov 2019-12-11 21:23 ` Vladislav Shpilevoy 0 siblings, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-11 7:08 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/11 10:01]: > On the other hand I agree, that fiber pooled loop probably should > not know anything about requests. Another option is to call > fiber_cleanup() right in iproto.cc after each request. We would need > to patch tx_process_misc(), tx_process_call(), tx_process1(), > tx_process_select(), tx_process_sql(), tx_process_join_subscribe(). > Not so many places. What do you think? This is what I suggest. Let's begin with this, then the rest of my comments will fit in their place. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-11 7:08 ` Konstantin Osipov @ 2019-12-11 21:23 ` Vladislav Shpilevoy 2019-12-12 0:00 ` Igor Munkin 2019-12-12 8:46 ` Konstantin Osipov 0 siblings, 2 replies; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-11 21:23 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches Thanks for the review! On 11/12/2019 08:08, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/11 10:01]: >> On the other hand I agree, that fiber pooled loop probably should >> not know anything about requests. Another option is to call >> fiber_cleanup() right in iproto.cc after each request. We would need >> to patch tx_process_misc(), tx_process_call(), tx_process1(), >> tx_process_select(), tx_process_sql(), tx_process_join_subscribe(). >> Not so many places. What do you think? > > This is what I suggest. Let's begin with this, then the rest of > my comments will fit in their place. > Here it is: - I moved cleanup to iproto.cc, and called it tx_fiber_cleanup(). By analogue with tx_fiber_init() which currently initializes the fiber; - I made cleanup triggers always clean them by themselves to remove trigger_destroy() from fiber.c. But it didn't help me to understand, why are you saying, that tx and session should not use these triggers. Tx, in case it was not removed from fiber before its cleanup, should be destroyed. Session uses that trigger to destroy local one-fiber sessions, which die together with the fiber getting recycled/killed. So what is a problem? That trigger perfectly fits all the needs. ========================================================================= diff --git a/src/box/iproto.cc b/src/box/iproto.cc index c39b8e7bf..600a21ac8 100644 --- a/src/box/iproto.cc +++ b/src/box/iproto.cc @@ -1306,6 +1306,12 @@ static void tx_fiber_init(struct session *session, uint64_t sync) { struct fiber *f = fiber(); + /* + * There should not be any not executed + * destructors from a previous request + * executed in that fiber. + */ + assert(rlist_empty(&f->on_cleanup)); f->storage.net.sync = sync; /* * We do not cleanup fiber keys at the end of each request. @@ -1321,6 +1327,17 @@ tx_fiber_init(struct session *session, uint64_t sync) fiber_set_user(f, &session->credentials); } +/** + * Cleanup current fiber after a request is + * executed to make it possible to reuse the fiber + * for a next request. + */ +static inline void +tx_fiber_cleanup() +{ + fiber_cleanup(fiber()); +} + static void tx_process_disconnect(struct cmsg *m) { @@ -1331,6 +1348,7 @@ tx_process_disconnect(struct cmsg *m) if (! rlist_empty(&session_on_disconnect)) { tx_fiber_init(con->session, 0); session_run_on_disconnect_triggers(con->session); + tx_fiber_cleanup(); } } } @@ -1486,6 +1504,7 @@ tx_reply_iproto_error(struct cmsg *m) iproto_reply_error(out, diag_last_error(&msg->diag), msg->header.sync, ::schema_version); iproto_wpos_create(&msg->wpos, out); + tx_fiber_cleanup(); } /** Inject a short delay on tx request processing for testing. */ @@ -1519,9 +1538,11 @@ tx_process1(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, tuple != 0); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1562,9 +1583,11 @@ tx_process_select(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, count); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static int @@ -1652,9 +1675,11 @@ tx_process_call(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, count); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1695,9 +1720,11 @@ tx_process_misc(struct cmsg *m) } catch (Exception *e) { tx_reply_error(msg); } - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1711,8 +1738,6 @@ tx_process_sql(struct cmsg *m) const char *sql; uint32_t len; - tx_fiber_init(msg->connection->session, msg->header.sync); - if (tx_check_schema(msg->header.schema_version)) goto error; assert(msg->header.type == IPROTO_EXECUTE); @@ -1746,9 +1771,11 @@ tx_process_sql(struct cmsg *m) port_destroy(&port); iproto_reply_sql(out, &header_svp, msg->header.sync, schema_version); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1781,11 +1808,12 @@ tx_process_join_subscribe(struct cmsg *m) unreachable(); } } catch (SocketError *e) { - return; /* don't write error response to prevent SIGPIPE */ + /* don't write error response to prevent SIGPIPE */ } catch (Exception *e) { iproto_write_error(con->input.fd, e, ::schema_version, msg->header.sync); } + tx_fiber_cleanup(); } static void @@ -1891,6 +1919,7 @@ tx_process_connect(struct cmsg *m) tx_reply_error(msg); msg->close_connection = true; } + tx_fiber_cleanup(); } /** diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index db8fdd5fe..d89c640aa 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -331,12 +331,12 @@ fiber_cleanup(struct fiber *f) if (trigger_run(&f->on_cleanup, f) != 0) panic("Fiber cleanup can't fail"); /* - * Double call of cleanup routines is a bad idea. It is - * like calling a destructor 2 times. On_cleanup can be - * called only once until the fiber is reused again, and - * new triggers are set. + * All cleanup triggers are supposed to + * remove themselves. So as no to waste + * time on that here, and to make them all + * work uniformly. */ - trigger_destroy(&f->on_cleanup); + assert(rlist_empty(&f->on_cleanup)); } static void @@ -1538,8 +1538,8 @@ cord_cojoin(struct cord *cord) int break_ev_loop_f(struct trigger *trigger, void *event) { - (void) trigger; (void) event; + trigger_clear(trigger); ev_break(loop(), EVBREAK_ALL); return 0; } diff --git a/src/lib/core/fiber_pool.c b/src/lib/core/fiber_pool.c index 31f366fac..77f89c9fa 100644 --- a/src/lib/core/fiber_pool.c +++ b/src/lib/core/fiber_pool.c @@ -62,12 +62,6 @@ restart: assert(f->caller->caller == &cord->sched); } cmsg_deliver(msg); - /* - * Different messages = different requests. They - * should not affect each other. This is why - * cleanup is done here. - */ - fiber_cleanup(f); } /** Put the current fiber into a fiber cache. */ if (!fiber_is_cancelled(fiber()) && (msg != NULL || diff --git a/src/lua/fiber.c b/src/lua/fiber.c index cba625082..4d2828f1c 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -613,12 +613,13 @@ lbox_fiber_name(struct lua_State *L) static int lbox_fiber_on_cleanup(struct trigger *trigger, void *event) { - (void) trigger; struct fiber *f = (struct fiber *) event; int storage_ref = f->storage.lua.ref; assert(storage_ref > 0); luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); f->storage.lua.ref = 0; + trigger_clear(trigger); + free(trigger); return 0; } ========================================================================= The whole new patch: ========================================================================= fiber: destroy fiber.storage created by iproto Fiber.storage was not deleted when created in a fiber started from the thread pool used by IProto requests. The problem was that fiber.storage was created and deleted in Lua land only, assuming that only Lua-born fibers could have it. But in fact any fiber can create a Lua storage. Including the ones used to serve IProto requests. Not deletion of the storage led to a possibility of meeting a non-empty fiber.storage in the beginning of an iproto request, and not deletion of the memory caught by the storage until its explicit nullification. Now the storage destructor works for any fiber, which managed to create the storage. The destructor unrefs and nullifies the storage. For destructor purposes the fiber.on_stop triggers were reworked. Now they are on_cleanup triggers, and can be called multiple times during fiber's lifetime. After every request done by that fiber. Closes #4662 Closes #3462 diff --git a/src/box/iproto.cc b/src/box/iproto.cc index c39b8e7bf..600a21ac8 100644 --- a/src/box/iproto.cc +++ b/src/box/iproto.cc @@ -1306,6 +1306,12 @@ static void tx_fiber_init(struct session *session, uint64_t sync) { struct fiber *f = fiber(); + /* + * There should not be any not executed + * destructors from a previous request + * executed in that fiber. + */ + assert(rlist_empty(&f->on_cleanup)); f->storage.net.sync = sync; /* * We do not cleanup fiber keys at the end of each request. @@ -1321,6 +1327,17 @@ tx_fiber_init(struct session *session, uint64_t sync) fiber_set_user(f, &session->credentials); } +/** + * Cleanup current fiber after a request is + * executed to make it possible to reuse the fiber + * for a next request. + */ +static inline void +tx_fiber_cleanup() +{ + fiber_cleanup(fiber()); +} + static void tx_process_disconnect(struct cmsg *m) { @@ -1331,6 +1348,7 @@ tx_process_disconnect(struct cmsg *m) if (! rlist_empty(&session_on_disconnect)) { tx_fiber_init(con->session, 0); session_run_on_disconnect_triggers(con->session); + tx_fiber_cleanup(); } } } @@ -1486,6 +1504,7 @@ tx_reply_iproto_error(struct cmsg *m) iproto_reply_error(out, diag_last_error(&msg->diag), msg->header.sync, ::schema_version); iproto_wpos_create(&msg->wpos, out); + tx_fiber_cleanup(); } /** Inject a short delay on tx request processing for testing. */ @@ -1519,9 +1538,11 @@ tx_process1(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, tuple != 0); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1562,9 +1583,11 @@ tx_process_select(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, count); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static int @@ -1652,9 +1675,11 @@ tx_process_call(struct cmsg *m) iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, count); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1695,9 +1720,11 @@ tx_process_misc(struct cmsg *m) } catch (Exception *e) { tx_reply_error(msg); } - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1711,8 +1738,6 @@ tx_process_sql(struct cmsg *m) const char *sql; uint32_t len; - tx_fiber_init(msg->connection->session, msg->header.sync); - if (tx_check_schema(msg->header.schema_version)) goto error; assert(msg->header.type == IPROTO_EXECUTE); @@ -1746,9 +1771,11 @@ tx_process_sql(struct cmsg *m) port_destroy(&port); iproto_reply_sql(out, &header_svp, msg->header.sync, schema_version); iproto_wpos_create(&msg->wpos, out); - return; + goto end; error: tx_reply_error(msg); +end: + tx_fiber_cleanup(); } static void @@ -1781,11 +1808,12 @@ tx_process_join_subscribe(struct cmsg *m) unreachable(); } } catch (SocketError *e) { - return; /* don't write error response to prevent SIGPIPE */ + /* don't write error response to prevent SIGPIPE */ } catch (Exception *e) { iproto_write_error(con->input.fd, e, ::schema_version, msg->header.sync); } + tx_fiber_cleanup(); } static void @@ -1891,6 +1919,7 @@ tx_process_connect(struct cmsg *m) tx_reply_error(msg); msg->close_connection = true; } + tx_fiber_cleanup(); } /** diff --git a/src/box/session.cc b/src/box/session.cc index 461d1cf25..09e988148 100644 --- a/src/box/session.cc +++ b/src/box/session.cc @@ -80,8 +80,12 @@ sid_max() return sid_max; } +/** + * Destroy session of a background fiber when the + * fiber is getting destroyed. + */ static int -session_on_stop(struct trigger *trigger, void * /* event */) +session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) { /* * Remove on_stop trigger from the fiber, otherwise the @@ -167,11 +171,10 @@ session_create_on_demand() struct session *s = session_create(SESSION_TYPE_BACKGROUND); if (s == NULL) return NULL; - s->fiber_on_stop = { - RLIST_LINK_INITIALIZER, session_on_stop, NULL, NULL - }; - /* Add a trigger to destroy session on fiber stop */ - trigger_add(&fiber()->on_stop, &s->fiber_on_stop); + trigger_create(&s->fiber_on_cleanup, session_on_fiber_cleanup, + NULL, NULL); + /* Add a trigger to destroy session on fiber cleanup. */ + trigger_add(&fiber()->on_cleanup, &s->fiber_on_cleanup); credentials_reset(&s->credentials, admin_user); fiber_set_session(fiber(), s); fiber_set_user(fiber(), &s->credentials); diff --git a/src/box/session.h b/src/box/session.h index eff3d7a67..710e4919b 100644 --- a/src/box/session.h +++ b/src/box/session.h @@ -103,8 +103,11 @@ struct session { union session_meta meta; /** Session user id and global grants */ struct credentials credentials; - /** Trigger for fiber on_stop to cleanup created on-demand session */ - struct trigger fiber_on_stop; + /** + * Trigger for fiber on_cleanup to cleanup created + * on-demand session. + */ + struct trigger fiber_on_cleanup; }; struct session_vtab { diff --git a/src/box/txn.c b/src/box/txn.c index 963ec8eeb..7d549a6a9 100644 --- a/src/box/txn.c +++ b/src/box/txn.c @@ -41,7 +41,7 @@ double too_long_threshold; static struct stailq txn_cache = {NULL, &txn_cache.first}; static int -txn_on_stop(struct trigger *trigger, void *event); +txn_on_fiber_cleanup(struct trigger *trigger, void *event); static int txn_on_yield(struct trigger *trigger, void *event); @@ -226,8 +226,9 @@ txn_begin() txn->fiber = NULL; fiber_set_txn(fiber(), txn); /* fiber_on_yield is initialized by engine on demand */ - trigger_create(&txn->fiber_on_stop, txn_on_stop, NULL, NULL); - trigger_add(&fiber()->on_stop, &txn->fiber_on_stop); + trigger_create(&txn->fiber_on_cleanup, txn_on_fiber_cleanup, + NULL, NULL); + trigger_add(&fiber()->on_cleanup, &txn->fiber_on_cleanup); /* * By default all transactions may yield. * It's a responsibility of an engine to disable yields @@ -550,7 +551,7 @@ txn_prepare(struct txn *txn) if (engine_prepare(txn->engine, txn) != 0) return -1; } - trigger_clear(&txn->fiber_on_stop); + trigger_clear(&txn->fiber_on_cleanup); if (!txn_has_flag(txn, TXN_CAN_YIELD)) trigger_clear(&txn->fiber_on_yield); return 0; @@ -618,7 +619,7 @@ void txn_rollback(struct txn *txn) { assert(txn == in_txn()); - trigger_clear(&txn->fiber_on_stop); + trigger_clear(&txn->fiber_on_cleanup); if (!txn_has_flag(txn, TXN_CAN_YIELD)) trigger_clear(&txn->fiber_on_yield); txn->signature = -1; @@ -840,7 +841,7 @@ txn_savepoint_release(struct txn_savepoint *svp) } static int -txn_on_stop(struct trigger *trigger, void *event) +txn_on_fiber_cleanup(struct trigger *trigger, void *event) { (void) trigger; (void) event; diff --git a/src/box/txn.h b/src/box/txn.h index da12feebf..755c464f0 100644 --- a/src/box/txn.h +++ b/src/box/txn.h @@ -207,10 +207,11 @@ struct txn { */ struct trigger fiber_on_yield; /** - * Trigger on fiber stop, to rollback transaction - * in case a fiber stops (all engines). + * Trigger on fiber cleanup, to rollback transaction + * in case a fiber is getting reset/recycled/destroyed + * (all engines). */ - struct trigger fiber_on_stop; + struct trigger fiber_on_cleanup; /** Commit and rollback triggers. */ struct rlist on_commit, on_rollback; /** diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 00ae8cded..d89c640aa 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -325,6 +325,20 @@ fiber_attr_getstacksize(struct fiber_attr *fiber_attr) fiber_attr_default.stack_size; } +void +fiber_cleanup(struct fiber *f) +{ + if (trigger_run(&f->on_cleanup, f) != 0) + panic("Fiber cleanup can't fail"); + /* + * All cleanup triggers are supposed to + * remove themselves. So as no to waste + * time on that here, and to make them all + * work uniformly. + */ + assert(rlist_empty(&f->on_cleanup)); +} + static void fiber_recycle(struct fiber *fiber); @@ -787,7 +801,7 @@ static void fiber_reset(struct fiber *fiber) { rlist_create(&fiber->on_yield); - rlist_create(&fiber->on_stop); + rlist_create(&fiber->on_cleanup); fiber->flags = FIBER_DEFAULT_FLAGS; #if ENABLE_FIBER_TOP clock_stat_reset(&fiber->clock_stat); @@ -856,8 +870,7 @@ fiber_loop(MAYBE_UNUSED void *data) assert(f != fiber); fiber_wakeup(f); } - if (! rlist_empty(&fiber->on_stop)) - trigger_run(&fiber->on_stop, fiber); + fiber_cleanup(fiber); /* reset pending wakeups */ rlist_del(&fiber->state); if (! (fiber->flags & FIBER_IS_JOINABLE)) @@ -1161,7 +1174,7 @@ fiber_destroy(struct cord *cord, struct fiber *f) assert(f != &cord->sched); trigger_destroy(&f->on_yield); - trigger_destroy(&f->on_stop); + trigger_destroy(&f->on_cleanup); rlist_del(&f->state); rlist_del(&f->link); region_destroy(&f->gc); @@ -1525,8 +1538,8 @@ cord_cojoin(struct cord *cord) int break_ev_loop_f(struct trigger *trigger, void *event) { - (void) trigger; (void) event; + trigger_clear(trigger); ev_break(loop(), EVBREAK_ALL); return 0; } @@ -1555,7 +1568,7 @@ cord_costart_thread_func(void *arg) * Got to be in a trigger, to break the loop even * in case of an exception. */ - trigger_add(&f->on_stop, &break_ev_loop); + trigger_add(&f->on_cleanup, &break_ev_loop); fiber_set_joinable(f, true); fiber_start(f, ctx.arg); if (!fiber_is_dead(f)) { diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index c5b975513..21fff8f88 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -458,8 +458,17 @@ struct fiber { /** Triggers invoked before this fiber yields. Must not throw. */ struct rlist on_yield; - /** Triggers invoked before this fiber stops. Must not throw. */ - struct rlist on_stop; + /** + * Triggers invoked before this fiber is + * stopped/reset/recycled/destroyed/reused. In other + * words, each time when the fiber has finished execution + * of a request. + * In particular, for fibers not from fiber pool the + * cleanup is done before destroy and death. + * Pooled fibers are cleaned up after each request, even + * if they are never destroyed. + */ + struct rlist on_cleanup; /** * The list of fibers awaiting for this fiber's timely * (or untimely) death. @@ -506,6 +515,10 @@ struct fiber { char name[FIBER_NAME_MAX + 1]; }; +/** Invoke cleanup triggers and delete them. */ +void +fiber_cleanup(struct fiber *f); + struct cord_on_exit; /** diff --git a/src/lua/fiber.c b/src/lua/fiber.c index a7b75f9bf..4d2828f1c 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -441,11 +441,6 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) int coro_ref = lua_tointeger(L, -1); lua_pop(L, 1); result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); - - /* Destroy local storage */ - int storage_ref = f->storage.lua.ref; - if (storage_ref > 0) - luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); /* * If fiber is not joinable * We can unref child stack here, @@ -606,12 +601,43 @@ lbox_fiber_name(struct lua_State *L) } } +/** + * Trigger invoked when the fiber has finished execution of its + * current request. Only purpose - delete storage.lua.ref keeping + * a reference of Lua fiber.storage object. Unlike Lua stack, + * Lua fiber storage may be created not only for fibers born from + * Lua land. For example, an IProto request may execute a Lua + * function, which can create the storage. Trigger guarantees, + * that even for non-Lua fibers the Lua storage is destroyed. + */ +static int +lbox_fiber_on_cleanup(struct trigger *trigger, void *event) +{ + struct fiber *f = (struct fiber *) event; + int storage_ref = f->storage.lua.ref; + assert(storage_ref > 0); + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); + f->storage.lua.ref = 0; + trigger_clear(trigger); + free(trigger); + return 0; +} + static int lbox_fiber_storage(struct lua_State *L) { struct fiber *f = lbox_checkfiber(L, 1); int storage_ref = f->storage.lua.ref; if (storage_ref <= 0) { + struct trigger *t = (struct trigger *) + malloc(sizeof(*t)); + if (t == NULL) { + diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); + return luaT_error(L); + } + trigger_create(t, lbox_fiber_on_cleanup, NULL, + (trigger_f0) free); + trigger_add(&f->on_cleanup, t); lua_newtable(L); /* create local storage on demand */ storage_ref = luaL_ref(L, LUA_REGISTRYINDEX); f->storage.lua.ref = storage_ref; diff --git a/test/app/gh-4662-fiber-storage-leak.result b/test/app/gh-4662-fiber-storage-leak.result new file mode 100644 index 000000000..4ade017a4 --- /dev/null +++ b/test/app/gh-4662-fiber-storage-leak.result @@ -0,0 +1,88 @@ +-- test-run result file version 2 +fiber = require('fiber') + | --- + | ... +netbox = require('net.box') + | --- + | ... + +-- +-- gh-4662: fiber.storage was not deleted when created in a fiber +-- started from the thread pool used by IProto requests. The +-- problem was that fiber.storage was created and deleted in Lua +-- land only, assuming that only Lua-born fibers could have it. +-- But in fact any fiber can create a Lua storage. Including the +-- ones used to serve IProto requests. +-- The test checks if fiber.storage is really deleted, and is not +-- shared between requests. +-- + +box.schema.user.grant('guest', 'execute', 'universe') + | --- + | ... +storage = nil + | --- + | ... +i = 0 + | --- + | ... +weak_table = setmetatable({}, {__mode = 'v'}) + | --- + | ... +object = {'object'} + | --- + | ... +weak_table.object = object + | --- + | ... +function ref_object_in_fiber() \ + storage = fiber.self().storage \ + assert(next(storage) == nil) \ + i = i + 1 \ + fiber.self().storage.key = i \ + fiber.self().storage.object = object \ +end + | --- + | ... + +c = netbox.connect(box.cfg.listen) + | --- + | ... +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') + | --- + | ... +storage + | --- + | - key: 2 + | object: + | - object + | ... +i + | --- + | - 2 + | ... +object = nil + | --- + | ... +storage = nil + | --- + | ... +collectgarbage('collect') + | --- + | - 0 + | ... +-- The weak table should be empty, because the only two hard +-- references were in the fibers used to serve +-- ref_object_in_fiber() requests. And their storages should be +-- cleaned up. +weak_table + | --- + | - [] + | ... + +c:close() + | --- + | ... +box.schema.user.revoke('guest', 'execute', 'universe') + | --- + | ... diff --git a/test/app/gh-4662-fiber-storage-leak.test.lua b/test/app/gh-4662-fiber-storage-leak.test.lua new file mode 100644 index 000000000..25acf5679 --- /dev/null +++ b/test/app/gh-4662-fiber-storage-leak.test.lua @@ -0,0 +1,43 @@ +fiber = require('fiber') +netbox = require('net.box') + +-- +-- gh-4662: fiber.storage was not deleted when created in a fiber +-- started from the thread pool used by IProto requests. The +-- problem was that fiber.storage was created and deleted in Lua +-- land only, assuming that only Lua-born fibers could have it. +-- But in fact any fiber can create a Lua storage. Including the +-- ones used to serve IProto requests. +-- The test checks if fiber.storage is really deleted, and is not +-- shared between requests. +-- + +box.schema.user.grant('guest', 'execute', 'universe') +storage = nil +i = 0 +weak_table = setmetatable({}, {__mode = 'v'}) +object = {'object'} +weak_table.object = object +function ref_object_in_fiber() \ + storage = fiber.self().storage \ + assert(next(storage) == nil) \ + i = i + 1 \ + fiber.self().storage.key = i \ + fiber.self().storage.object = object \ +end + +c = netbox.connect(box.cfg.listen) +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') +storage +i +object = nil +storage = nil +collectgarbage('collect') +-- The weak table should be empty, because the only two hard +-- references were in the fibers used to serve +-- ref_object_in_fiber() requests. And their storages should be +-- cleaned up. +weak_table + +c:close() +box.schema.user.revoke('guest', 'execute', 'universe') ========================================================================= ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-11 21:23 ` Vladislav Shpilevoy @ 2019-12-12 0:00 ` Igor Munkin 2019-12-12 23:37 ` Vladislav Shpilevoy 2019-12-12 8:46 ` Konstantin Osipov 1 sibling, 1 reply; 26+ messages in thread From: Igor Munkin @ 2019-12-12 0:00 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Vlad, Thanks for the patch! Sorry, I needed some intro into all this machinery thus the review was postponed a bit and you've managed to prepare a new patch regarding Kostja's remarks. Therefore, I decided to move all my notes to the new patch instead of splitting them into two replies. This patch LGTM in general, but please consider several minor comments I left below. On 11.12.19, Vladislav Shpilevoy wrote: <snipped> > ========================================================================= > > The whole new patch: > > ========================================================================= > > fiber: destroy fiber.storage created by iproto > > Fiber.storage was not deleted when created in a fiber started from > the thread pool used by IProto requests. The problem was that > fiber.storage was created and deleted in Lua land only, assuming > that only Lua-born fibers could have it. But in fact any fiber can > create a Lua storage. Including the ones used to serve IProto > requests. > > Not deletion of the storage led to a possibility of meeting a > non-empty fiber.storage in the beginning of an iproto request, and > not deletion of the memory caught by the storage until its > explicit nullification. > > Now the storage destructor works for any fiber, which managed to > create the storage. The destructor unrefs and nullifies the > storage. > > For destructor purposes the fiber.on_stop triggers were reworked. > Now they are on_cleanup triggers, and can be called multiple times > during fiber's lifetime. After every request done by that fiber. > > Closes #4662 > Closes #3462 > > diff --git a/src/box/iproto.cc b/src/box/iproto.cc > index c39b8e7bf..600a21ac8 100644 > --- a/src/box/iproto.cc > +++ b/src/box/iproto.cc > @@ -1306,6 +1306,12 @@ static void > tx_fiber_init(struct session *session, uint64_t sync) > { > struct fiber *f = fiber(); > + /* > + * There should not be any not executed > + * destructors from a previous request > + * executed in that fiber. > + */ > + assert(rlist_empty(&f->on_cleanup)); > f->storage.net.sync = sync; > /* > * We do not cleanup fiber keys at the end of each request. > @@ -1321,6 +1327,17 @@ tx_fiber_init(struct session *session, uint64_t sync) > fiber_set_user(f, &session->credentials); > } > > +/** > + * Cleanup current fiber after a request is > + * executed to make it possible to reuse the fiber > + * for a next request. > + */ > +static inline void > +tx_fiber_cleanup() > +{ > + fiber_cleanup(fiber()); > +} > + > static void > tx_process_disconnect(struct cmsg *m) > { > @@ -1331,6 +1348,7 @@ tx_process_disconnect(struct cmsg *m) > if (! rlist_empty(&session_on_disconnect)) { > tx_fiber_init(con->session, 0); > session_run_on_disconnect_triggers(con->session); > + tx_fiber_cleanup(); > } > } > } > @@ -1486,6 +1504,7 @@ tx_reply_iproto_error(struct cmsg *m) > iproto_reply_error(out, diag_last_error(&msg->diag), > msg->header.sync, ::schema_version); > iproto_wpos_create(&msg->wpos, out); > + tx_fiber_cleanup(); > } > > /** Inject a short delay on tx request processing for testing. */ > @@ -1519,9 +1538,11 @@ tx_process1(struct cmsg *m) > iproto_reply_select(out, &svp, msg->header.sync, ::schema_version, > tuple != 0); > iproto_wpos_create(&msg->wpos, out); > - return; > + goto end; > error: > tx_reply_error(msg); > +end: > + tx_fiber_cleanup(); > } > > static void > @@ -1562,9 +1583,11 @@ tx_process_select(struct cmsg *m) > iproto_reply_select(out, &svp, msg->header.sync, > ::schema_version, count); > iproto_wpos_create(&msg->wpos, out); > - return; > + goto end; > error: > tx_reply_error(msg); > +end: > + tx_fiber_cleanup(); > } > > static int > @@ -1652,9 +1675,11 @@ tx_process_call(struct cmsg *m) > iproto_reply_select(out, &svp, msg->header.sync, > ::schema_version, count); > iproto_wpos_create(&msg->wpos, out); > - return; > + goto end; > error: > tx_reply_error(msg); > +end: > + tx_fiber_cleanup(); > } > > static void > @@ -1695,9 +1720,11 @@ tx_process_misc(struct cmsg *m) > } catch (Exception *e) { > tx_reply_error(msg); > } > - return; > + goto end; > error: > tx_reply_error(msg); > +end: > + tx_fiber_cleanup(); > } > > static void > @@ -1711,8 +1738,6 @@ tx_process_sql(struct cmsg *m) > const char *sql; > uint32_t len; > > - tx_fiber_init(msg->connection->session, msg->header.sync); > - > if (tx_check_schema(msg->header.schema_version)) > goto error; > assert(msg->header.type == IPROTO_EXECUTE); > @@ -1746,9 +1771,11 @@ tx_process_sql(struct cmsg *m) > port_destroy(&port); > iproto_reply_sql(out, &header_svp, msg->header.sync, schema_version); > iproto_wpos_create(&msg->wpos, out); > - return; > + goto end; > error: > tx_reply_error(msg); > +end: > + tx_fiber_cleanup(); > } > > static void > @@ -1781,11 +1808,12 @@ tx_process_join_subscribe(struct cmsg *m) > unreachable(); > } > } catch (SocketError *e) { > - return; /* don't write error response to prevent SIGPIPE */ > + /* don't write error response to prevent SIGPIPE */ > } catch (Exception *e) { > iproto_write_error(con->input.fd, e, ::schema_version, > msg->header.sync); > } > + tx_fiber_cleanup(); > } > > static void > @@ -1891,6 +1919,7 @@ tx_process_connect(struct cmsg *m) > tx_reply_error(msg); > msg->close_connection = true; > } > + tx_fiber_cleanup(); > } > > /** > diff --git a/src/box/session.cc b/src/box/session.cc > index 461d1cf25..09e988148 100644 > --- a/src/box/session.cc > +++ b/src/box/session.cc > @@ -80,8 +80,12 @@ sid_max() > return sid_max; > } > > +/** > + * Destroy session of a background fiber when the > + * fiber is getting destroyed. > + */ > static int > -session_on_stop(struct trigger *trigger, void * /* event */) > +session_on_fiber_cleanup(struct trigger *trigger, void * /* event */) > { > /* > * Remove on_stop trigger from the fiber, otherwise the > @@ -167,11 +171,10 @@ session_create_on_demand() > struct session *s = session_create(SESSION_TYPE_BACKGROUND); > if (s == NULL) > return NULL; > - s->fiber_on_stop = { > - RLIST_LINK_INITIALIZER, session_on_stop, NULL, NULL > - }; > - /* Add a trigger to destroy session on fiber stop */ > - trigger_add(&fiber()->on_stop, &s->fiber_on_stop); > + trigger_create(&s->fiber_on_cleanup, session_on_fiber_cleanup, > + NULL, NULL); > + /* Add a trigger to destroy session on fiber cleanup. */ > + trigger_add(&fiber()->on_cleanup, &s->fiber_on_cleanup); > credentials_reset(&s->credentials, admin_user); > fiber_set_session(fiber(), s); > fiber_set_user(fiber(), &s->credentials); > diff --git a/src/box/session.h b/src/box/session.h > index eff3d7a67..710e4919b 100644 > --- a/src/box/session.h > +++ b/src/box/session.h > @@ -103,8 +103,11 @@ struct session { > union session_meta meta; > /** Session user id and global grants */ > struct credentials credentials; > - /** Trigger for fiber on_stop to cleanup created on-demand session */ > - struct trigger fiber_on_stop; > + /** > + * Trigger for fiber on_cleanup to cleanup created > + * on-demand session. > + */ > + struct trigger fiber_on_cleanup; > }; > > struct session_vtab { > diff --git a/src/box/txn.c b/src/box/txn.c > index 963ec8eeb..7d549a6a9 100644 > --- a/src/box/txn.c > +++ b/src/box/txn.c > @@ -41,7 +41,7 @@ double too_long_threshold; > static struct stailq txn_cache = {NULL, &txn_cache.first}; > > static int > -txn_on_stop(struct trigger *trigger, void *event); > +txn_on_fiber_cleanup(struct trigger *trigger, void *event); > > static int > txn_on_yield(struct trigger *trigger, void *event); > @@ -226,8 +226,9 @@ txn_begin() > txn->fiber = NULL; > fiber_set_txn(fiber(), txn); > /* fiber_on_yield is initialized by engine on demand */ > - trigger_create(&txn->fiber_on_stop, txn_on_stop, NULL, NULL); > - trigger_add(&fiber()->on_stop, &txn->fiber_on_stop); > + trigger_create(&txn->fiber_on_cleanup, txn_on_fiber_cleanup, > + NULL, NULL); > + trigger_add(&fiber()->on_cleanup, &txn->fiber_on_cleanup); > /* > * By default all transactions may yield. > * It's a responsibility of an engine to disable yields > @@ -550,7 +551,7 @@ txn_prepare(struct txn *txn) > if (engine_prepare(txn->engine, txn) != 0) > return -1; > } > - trigger_clear(&txn->fiber_on_stop); > + trigger_clear(&txn->fiber_on_cleanup); > if (!txn_has_flag(txn, TXN_CAN_YIELD)) > trigger_clear(&txn->fiber_on_yield); > return 0; > @@ -618,7 +619,7 @@ void > txn_rollback(struct txn *txn) > { > assert(txn == in_txn()); > - trigger_clear(&txn->fiber_on_stop); > + trigger_clear(&txn->fiber_on_cleanup); > if (!txn_has_flag(txn, TXN_CAN_YIELD)) > trigger_clear(&txn->fiber_on_yield); > txn->signature = -1; > @@ -840,7 +841,7 @@ txn_savepoint_release(struct txn_savepoint *svp) > } > > static int > -txn_on_stop(struct trigger *trigger, void *event) > +txn_on_fiber_cleanup(struct trigger *trigger, void *event) > { > (void) trigger; > (void) event; > diff --git a/src/box/txn.h b/src/box/txn.h > index da12feebf..755c464f0 100644 > --- a/src/box/txn.h > +++ b/src/box/txn.h > @@ -207,10 +207,11 @@ struct txn { > */ > struct trigger fiber_on_yield; > /** > - * Trigger on fiber stop, to rollback transaction > - * in case a fiber stops (all engines). > + * Trigger on fiber cleanup, to rollback transaction > + * in case a fiber is getting reset/recycled/destroyed > + * (all engines). > */ > - struct trigger fiber_on_stop; > + struct trigger fiber_on_cleanup; > /** Commit and rollback triggers. */ > struct rlist on_commit, on_rollback; > /** > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 00ae8cded..d89c640aa 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -325,6 +325,20 @@ fiber_attr_getstacksize(struct fiber_attr *fiber_attr) > fiber_attr_default.stack_size; > } > > +void > +fiber_cleanup(struct fiber *f) > +{ > + if (trigger_run(&f->on_cleanup, f) != 0) > + panic("Fiber cleanup can't fail"); > + /* > + * All cleanup triggers are supposed to > + * remove themselves. So as no to waste > + * time on that here, and to make them all > + * work uniformly. > + */ > + assert(rlist_empty(&f->on_cleanup)); > +} > + > static void > fiber_recycle(struct fiber *fiber); > > @@ -787,7 +801,7 @@ static void > fiber_reset(struct fiber *fiber) > { > rlist_create(&fiber->on_yield); > - rlist_create(&fiber->on_stop); > + rlist_create(&fiber->on_cleanup); > fiber->flags = FIBER_DEFAULT_FLAGS; > #if ENABLE_FIBER_TOP > clock_stat_reset(&fiber->clock_stat); > @@ -856,8 +870,7 @@ fiber_loop(MAYBE_UNUSED void *data) > assert(f != fiber); > fiber_wakeup(f); > } > - if (! rlist_empty(&fiber->on_stop)) > - trigger_run(&fiber->on_stop, fiber); > + fiber_cleanup(fiber); I see you dropped the if above and fiber_cleanup doesn't have any corresponding within. Is this removal an intentional one? Minor: There is a mess with whitespace above. Feel free to ignore the remark, since indentation was broken before your changes. I blamed several lines and it looks more like a ridiculous code style, and not a problem with editor. Thereby you just implicitly continue the mix. > /* reset pending wakeups */ > rlist_del(&fiber->state); > if (! (fiber->flags & FIBER_IS_JOINABLE)) > @@ -1161,7 +1174,7 @@ fiber_destroy(struct cord *cord, struct fiber *f) > assert(f != &cord->sched); > > trigger_destroy(&f->on_yield); > - trigger_destroy(&f->on_stop); > + trigger_destroy(&f->on_cleanup); > rlist_del(&f->state); > rlist_del(&f->link); > region_destroy(&f->gc); > @@ -1525,8 +1538,8 @@ cord_cojoin(struct cord *cord) > int > break_ev_loop_f(struct trigger *trigger, void *event) > { > - (void) trigger; > (void) event; > + trigger_clear(trigger); > ev_break(loop(), EVBREAK_ALL); > return 0; > } > @@ -1555,7 +1568,7 @@ cord_costart_thread_func(void *arg) > * Got to be in a trigger, to break the loop even > * in case of an exception. > */ > - trigger_add(&f->on_stop, &break_ev_loop); > + trigger_add(&f->on_cleanup, &break_ev_loop); > fiber_set_joinable(f, true); > fiber_start(f, ctx.arg); > if (!fiber_is_dead(f)) { > diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h > index c5b975513..21fff8f88 100644 > --- a/src/lib/core/fiber.h > +++ b/src/lib/core/fiber.h > @@ -458,8 +458,17 @@ struct fiber { > > /** Triggers invoked before this fiber yields. Must not throw. */ > struct rlist on_yield; > - /** Triggers invoked before this fiber stops. Must not throw. */ > - struct rlist on_stop; > + /** > + * Triggers invoked before this fiber is > + * stopped/reset/recycled/destroyed/reused. In other > + * words, each time when the fiber has finished execution > + * of a request. > + * In particular, for fibers not from fiber pool the > + * cleanup is done before destroy and death. > + * Pooled fibers are cleaned up after each request, even > + * if they are never destroyed. > + */ > + struct rlist on_cleanup; Minor: both trigger lists above has the "Must not throw" note. Does the introduced list respect this rule? Please, adjust the comment if yes. > /** > * The list of fibers awaiting for this fiber's timely > * (or untimely) death. > @@ -506,6 +515,10 @@ struct fiber { > char name[FIBER_NAME_MAX + 1]; > }; > > +/** Invoke cleanup triggers and delete them. */ > +void > +fiber_cleanup(struct fiber *f); > + > struct cord_on_exit; > > /** > diff --git a/src/lua/fiber.c b/src/lua/fiber.c > index a7b75f9bf..4d2828f1c 100644 > --- a/src/lua/fiber.c > +++ b/src/lua/fiber.c > @@ -441,11 +441,6 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) > int coro_ref = lua_tointeger(L, -1); > lua_pop(L, 1); > result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); > - > - /* Destroy local storage */ > - int storage_ref = f->storage.lua.ref; > - if (storage_ref > 0) > - luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); > /* > * If fiber is not joinable > * We can unref child stack here, > @@ -606,12 +601,43 @@ lbox_fiber_name(struct lua_State *L) > } > } > > +/** > + * Trigger invoked when the fiber has finished execution of its > + * current request. Only purpose - delete storage.lua.ref keeping > + * a reference of Lua fiber.storage object. Unlike Lua stack, > + * Lua fiber storage may be created not only for fibers born from > + * Lua land. For example, an IProto request may execute a Lua > + * function, which can create the storage. Trigger guarantees, > + * that even for non-Lua fibers the Lua storage is destroyed. > + */ > +static int > +lbox_fiber_on_cleanup(struct trigger *trigger, void *event) > +{ > + struct fiber *f = (struct fiber *) event; > + int storage_ref = f->storage.lua.ref; > + assert(storage_ref > 0); > + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); > + f->storage.lua.ref = 0; Though 0 value is specific "ref" (a table slot, where the LRU ref is stored) and this value cannot be yield from luaL_ref. Please, consider using LUA_NOREF value for such cases, since it's being provided by Lua and lua_unref does nothing for it, as well as for LUA_REFNIL. > + trigger_clear(trigger); > + free(trigger); > + return 0; > +} > + > static int > lbox_fiber_storage(struct lua_State *L) > { > struct fiber *f = lbox_checkfiber(L, 1); > int storage_ref = f->storage.lua.ref; > if (storage_ref <= 0) { > + struct trigger *t = (struct trigger *) > + malloc(sizeof(*t)); > + if (t == NULL) { > + diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); > + return luaT_error(L); > + } > + trigger_create(t, lbox_fiber_on_cleanup, NULL, > + (trigger_f0) free); > + trigger_add(&f->on_cleanup, t); > lua_newtable(L); /* create local storage on demand */ > storage_ref = luaL_ref(L, LUA_REGISTRYINDEX); > f->storage.lua.ref = storage_ref; > diff --git a/test/app/gh-4662-fiber-storage-leak.result b/test/app/gh-4662-fiber-storage-leak.result > new file mode 100644 > index 000000000..4ade017a4 > --- /dev/null > +++ b/test/app/gh-4662-fiber-storage-leak.result > @@ -0,0 +1,88 @@ > +-- test-run result file version 2 > +fiber = require('fiber') > + | --- > + | ... > +netbox = require('net.box') > + | --- > + | ... > + > +-- > +-- gh-4662: fiber.storage was not deleted when created in a fiber > +-- started from the thread pool used by IProto requests. The > +-- problem was that fiber.storage was created and deleted in Lua > +-- land only, assuming that only Lua-born fibers could have it. > +-- But in fact any fiber can create a Lua storage. Including the > +-- ones used to serve IProto requests. > +-- The test checks if fiber.storage is really deleted, and is not > +-- shared between requests. > +-- > + > +box.schema.user.grant('guest', 'execute', 'universe') > + | --- > + | ... > +storage = nil > + | --- > + | ... > +i = 0 > + | --- > + | ... > +weak_table = setmetatable({}, {__mode = 'v'}) > + | --- > + | ... > +object = {'object'} > + | --- > + | ... > +weak_table.object = object > + | --- > + | ... > +function ref_object_in_fiber() \ > + storage = fiber.self().storage \ > + assert(next(storage) == nil) \ > + i = i + 1 \ > + fiber.self().storage.key = i \ > + fiber.self().storage.object = object \ > +end > + | --- > + | ... > + > +c = netbox.connect(box.cfg.listen) > + | --- > + | ... > +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') > + | --- > + | ... > +storage > + | --- > + | - key: 2 > + | object: > + | - object > + | ... > +i > + | --- > + | - 2 > + | ... > +object = nil > + | --- > + | ... > +storage = nil > + | --- > + | ... > +collectgarbage('collect') > + | --- > + | - 0 > + | ... > +-- The weak table should be empty, because the only two hard > +-- references were in the fibers used to serve > +-- ref_object_in_fiber() requests. And their storages should be > +-- cleaned up. > +weak_table > + | --- > + | - [] > + | ... > + > +c:close() > + | --- > + | ... > +box.schema.user.revoke('guest', 'execute', 'universe') > + | --- > + | ... > diff --git a/test/app/gh-4662-fiber-storage-leak.test.lua b/test/app/gh-4662-fiber-storage-leak.test.lua > new file mode 100644 > index 000000000..25acf5679 > --- /dev/null > +++ b/test/app/gh-4662-fiber-storage-leak.test.lua > @@ -0,0 +1,43 @@ > +fiber = require('fiber') > +netbox = require('net.box') > + > +-- > +-- gh-4662: fiber.storage was not deleted when created in a fiber > +-- started from the thread pool used by IProto requests. The > +-- problem was that fiber.storage was created and deleted in Lua > +-- land only, assuming that only Lua-born fibers could have it. > +-- But in fact any fiber can create a Lua storage. Including the > +-- ones used to serve IProto requests. > +-- The test checks if fiber.storage is really deleted, and is not > +-- shared between requests. > +-- > + > +box.schema.user.grant('guest', 'execute', 'universe') > +storage = nil > +i = 0 > +weak_table = setmetatable({}, {__mode = 'v'}) > +object = {'object'} > +weak_table.object = object > +function ref_object_in_fiber() \ > + storage = fiber.self().storage \ > + assert(next(storage) == nil) \ > + i = i + 1 \ > + fiber.self().storage.key = i \ > + fiber.self().storage.object = object \ > +end > + > +c = netbox.connect(box.cfg.listen) > +c:call('ref_object_in_fiber') c:call('ref_object_in_fiber') > +storage > +i > +object = nil > +storage = nil > +collectgarbage('collect') > +-- The weak table should be empty, because the only two hard > +-- references were in the fibers used to serve > +-- ref_object_in_fiber() requests. And their storages should be > +-- cleaned up. > +weak_table > + > +c:close() > +box.schema.user.revoke('guest', 'execute', 'universe') > > ========================================================================= -- Best regards, IM ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-12 0:00 ` Igor Munkin @ 2019-12-12 23:37 ` Vladislav Shpilevoy 2019-12-13 13:35 ` Igor Munkin 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-12 23:37 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Thanks for the review! >> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c >> index 00ae8cded..d89c640aa 100644 >> --- a/src/lib/core/fiber.c >> +++ b/src/lib/core/fiber.c >> @@ -787,7 +801,7 @@ static void >> fiber_reset(struct fiber *fiber) >> { >> rlist_create(&fiber->on_yield); >> - rlist_create(&fiber->on_stop); >> + rlist_create(&fiber->on_cleanup); >> fiber->flags = FIBER_DEFAULT_FLAGS; >> #if ENABLE_FIBER_TOP >> clock_stat_reset(&fiber->clock_stat); >> @@ -856,8 +870,7 @@ fiber_loop(MAYBE_UNUSED void *data) >> assert(f != fiber); >> fiber_wakeup(f); >> } >> - if (! rlist_empty(&fiber->on_stop)) >> - trigger_run(&fiber->on_stop, fiber); >> + fiber_cleanup(fiber); > > I see you dropped the if above and fiber_cleanup doesn't have any > corresponding within. Is this removal an intentional one? Yes. This is because trigger_run works fine with empty rlist, so this check was useless. > > Minor: There is a mess with whitespace above. Feel free to ignore the > remark, since indentation was broken before your changes. I blamed > several lines and it looks more like a ridiculous code style, and not a > problem with editor. Thereby you just implicitly continue the mix. Wow, good catch! I didn't notice it in git diff nor in the editor. Fixed. > >> /* reset pending wakeups */ >> rlist_del(&fiber->state); >> if (! (fiber->flags & FIBER_IS_JOINABLE)) >> diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h >> index c5b975513..21fff8f88 100644 >> --- a/src/lib/core/fiber.h >> +++ b/src/lib/core/fiber.h >> @@ -458,8 +458,17 @@ struct fiber { >> >> /** Triggers invoked before this fiber yields. Must not throw. */ >> struct rlist on_yield; >> - /** Triggers invoked before this fiber stops. Must not throw. */ >> - struct rlist on_stop; >> + /** >> + * Triggers invoked before this fiber is >> + * stopped/reset/recycled/destroyed/reused. In other >> + * words, each time when the fiber has finished execution >> + * of a request. >> + * In particular, for fibers not from fiber pool the >> + * cleanup is done before destroy and death. >> + * Pooled fibers are cleaned up after each request, even >> + * if they are never destroyed. >> + */ >> + struct rlist on_cleanup; > > Minor: both trigger lists above has the "Must not throw" note. Does the > introduced list respect this rule? Please, adjust the comment if yes. Nope. These 'must not throw' are outdated. Our triggers does not throw errors anymore by design. So I deleted it from the updated comment. >> /** >> * The list of fibers awaiting for this fiber's timely >> * (or untimely) death. >> diff --git a/src/lua/fiber.c b/src/lua/fiber.c >> index a7b75f9bf..4d2828f1c 100644 >> --- a/src/lua/fiber.c >> +++ b/src/lua/fiber.c >> @@ -606,12 +601,43 @@ lbox_fiber_name(struct lua_State *L) >> } >> } >> >> +/** >> + * Trigger invoked when the fiber has finished execution of its >> + * current request. Only purpose - delete storage.lua.ref keeping >> + * a reference of Lua fiber.storage object. Unlike Lua stack, >> + * Lua fiber storage may be created not only for fibers born from >> + * Lua land. For example, an IProto request may execute a Lua >> + * function, which can create the storage. Trigger guarantees, >> + * that even for non-Lua fibers the Lua storage is destroyed. >> + */ >> +static int >> +lbox_fiber_on_cleanup(struct trigger *trigger, void *event) >> +{ >> + struct fiber *f = (struct fiber *) event; >> + int storage_ref = f->storage.lua.ref; >> + assert(storage_ref > 0); >> + luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); >> + f->storage.lua.ref = 0; > > Though 0 value is specific "ref" (a table slot, where the LRU ref is > stored) and this value cannot be yield from luaL_ref. Please, consider > using LUA_NOREF value for such cases, since it's being provided by Lua > and lua_unref does nothing for it, as well as for LUA_REFNIL. Thanks, didn't know about these constants. =================================================================== assert(storage_ref > 0); luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref); - f->storage.lua.ref = 0; + f->storage.lua.ref = LUA_NOREF; trigger_clear(trigger); free(trigger); =================================================================== ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-12 23:37 ` Vladislav Shpilevoy @ 2019-12-13 13:35 ` Igor Munkin 0 siblings, 0 replies; 26+ messages in thread From: Igor Munkin @ 2019-12-13 13:35 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Vlad, On 13.12.19, Vladislav Shpilevoy wrote: > Thanks for the review! > <snipped> > > > > >> /* reset pending wakeups */ > >> rlist_del(&fiber->state); > >> if (! (fiber->flags & FIBER_IS_JOINABLE)) > >> diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h > >> index c5b975513..21fff8f88 100644 > >> --- a/src/lib/core/fiber.h > >> +++ b/src/lib/core/fiber.h > >> @@ -458,8 +458,17 @@ struct fiber { > >> > >> /** Triggers invoked before this fiber yields. Must not throw. */ > >> struct rlist on_yield; > >> - /** Triggers invoked before this fiber stops. Must not throw. */ > >> - struct rlist on_stop; > >> + /** > >> + * Triggers invoked before this fiber is > >> + * stopped/reset/recycled/destroyed/reused. In other > >> + * words, each time when the fiber has finished execution > >> + * of a request. > >> + * In particular, for fibers not from fiber pool the > >> + * cleanup is done before destroy and death. > >> + * Pooled fibers are cleaned up after each request, even > >> + * if they are never destroyed. > >> + */ > >> + struct rlist on_cleanup; > > > > Minor: both trigger lists above has the "Must not throw" note. Does the > > introduced list respect this rule? Please, adjust the comment if yes. > > Nope. These 'must not throw' are outdated. Our triggers does not > throw errors anymore by design. So I deleted it from the updated > comment. > OK, thanks, now it's clear. Guess we need an activity to update comments for other trigger lists. -- Best regards, IM ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-11 21:23 ` Vladislav Shpilevoy 2019-12-12 0:00 ` Igor Munkin @ 2019-12-12 8:46 ` Konstantin Osipov 2019-12-13 0:02 ` Vladislav Shpilevoy 1 sibling, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-12 8:46 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/12 00:27]: > - I moved cleanup to iproto.cc, and called it tx_fiber_cleanup(). > By analogue with tx_fiber_init() which currently initializes the > fiber; > > - I made cleanup triggers always clean them by themselves to remove > trigger_destroy() from fiber.c. > > But it didn't help me to understand, why are you saying, that > tx and session should not use these triggers. Tx, in case it was > not removed from fiber before its cleanup, should be destroyed. > Session uses that trigger to destroy local one-fiber sessions, > which die together with the fiber getting recycled/killed. So > what is a problem? That trigger perfectly fits all the needs. OK, the function itself may be a match, but it doesn't make the idea of using the same trigger object for everything right, don't you think? My point is that you do not only move where the function is invoked, but also where the trigger list object is stored. Can you move it to Lua fiber object and iproto_msg object? There could be no other entry points to the request, so my point is that the trigger is part of request state, not fiber state. Does it make any sense? -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-12 8:46 ` Konstantin Osipov @ 2019-12-13 0:02 ` Vladislav Shpilevoy 2019-12-13 7:58 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-13 0:02 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches Hi! Thanks for the review! On 12/12/2019 09:46, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/12 00:27]: >> - I moved cleanup to iproto.cc, and called it tx_fiber_cleanup(). >> By analogue with tx_fiber_init() which currently initializes the >> fiber; >> >> - I made cleanup triggers always clean them by themselves to remove >> trigger_destroy() from fiber.c. >> >> But it didn't help me to understand, why are you saying, that >> tx and session should not use these triggers. Tx, in case it was >> not removed from fiber before its cleanup, should be destroyed. >> Session uses that trigger to destroy local one-fiber sessions, >> which die together with the fiber getting recycled/killed. So >> what is a problem? That trigger perfectly fits all the needs. > > OK, the function itself may be a match, but it doesn't make the > idea of using the same trigger object for everything right, don't > you think? For everything possible to imagine - no. For txn, session, and language specific fiber-local data - yes. And by coincidence that is pretty much everything we have now. > > My point is that you do not only move where the function is > invoked, but also where the trigger list object is stored. Nope. The trigger list is still in struct fiber. > > Can you move it to Lua fiber object and iproto_msg object? I am not sure I understand what you mean. I can't move lua fiber storage destructor to iproto. I need Lua for that. So this should be in Lua folder. If you are talking about moving trigger list to iproto and Lua, and somehow saying pointer at that list to a member in struct fiber, then 1) I would need to patch C version too, because we have public C API. The fact you forgot about C API says, that it is not a good idea to make that cleanup in each front end. Easy to miss something; 2) That will give exactly the same - a member in struct fiber. There just no other communication channel between different frontends in which a fiber may participate during lifetime; 3) We would need to add a new member to struct fiber and increase its size. > There could be no other entry points to the request, so my point > is that the trigger is part of request state, not fiber state. 'Request' idea is different for session, for txn, for iproto. You can't define one request for all things. While you are trying to define a 'request' concept, there is an already defined concept of fiber function. The function which a fiber executes. That should be the anchor. This is the 'request' in terms of fibers. Fiber owns session, fiber owns txn, fiber owns Lua resources. So it is his task to clean them up. And it should be in his triggers. On_cleanup is fiber's destructor. Yes, fiber can hand out these resources, but all access to them goes through fiber object. Fiber owns them, and fiber should destroy them when its current task is done. This is fiber's responsibility. > > Does it make any sense? > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-13 0:02 ` Vladislav Shpilevoy @ 2019-12-13 7:58 ` Konstantin Osipov 2019-12-13 23:11 ` Vladislav Shpilevoy 0 siblings, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-13 7:58 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/13 09:42]: > > Can you move it to Lua fiber object and iproto_msg object? > > I am not sure I understand what you mean. I can't move lua fiber > storage destructor to iproto. I need Lua for that. So this should > be in Lua folder. I mean struct rlist on_request_end should be a member of iproto_msg, not struct fiber. Does it make sense? > If you are talking about moving trigger list to iproto and Lua, > and somehow saying pointer at that list to a member in struct > fiber, then No, why do you need to store the list in struct fiber? > 1) I would need to patch C version too, because we have public > C API. The fact you forgot about C API says, that it is not > a good idea to make that cleanup in each front end. Easy to > miss something; C api will enter into Lua land (lbox_pushfiber) as soon as it creates a Lua fiber, and the Lua land will call the trigger in the end of lbox_fiber_run_f. > 2) That will give exactly the same - a member in struct fiber. > There just no other communication channel between different > frontends in which a fiber may participate during lifetime; Why do you think there will be different frontends involved? The storage is pure-Lua, there is no fiber storage for non-Lua fibers. > 3) We would need to add a new member to struct fiber and > increase its size. Please see above, do you get the idea now? -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-13 7:58 ` Konstantin Osipov @ 2019-12-13 23:11 ` Vladislav Shpilevoy 2019-12-14 12:26 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-13 23:11 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches On 13/12/2019 08:58, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/13 09:42]: >>> Can you move it to Lua fiber object and iproto_msg object? >> >> I am not sure I understand what you mean. I can't move lua fiber >> storage destructor to iproto. I need Lua for that. So this should >> be in Lua folder. > > I mean struct rlist on_request_end should be a member of > iproto_msg, not struct fiber. Does it make sense? > >> If you are talking about moving trigger list to iproto and Lua, >> and somehow saying pointer at that list to a member in struct >> fiber, then > > No, why do you need to store the list in struct fiber? > Because the only thing which is available in lua/fiber.c is the fiber object. >> 1) I would need to patch C version too, because we have public >> C API. The fact you forgot about C API says, that it is not >> a good idea to make that cleanup in each front end. Easy to >> miss something; > > C api will enter into Lua land (lbox_pushfiber) as soon as it > creates a Lua fiber, and the Lua land will call the trigger in > the end of lbox_fiber_run_f. lbox_fiber_run_f is not called for C fibers. This is for Lua born fibers only. To invoke a C function Lua is not used. >> 2) That will give exactly the same - a member in struct fiber. >> There just no other communication channel between different >> frontends in which a fiber may participate during lifetime; > > Why do you think there will be different frontends involved? The > storage is pure-Lua, there is no fiber storage for non-Lua fibers. > Because other frontends will appear. >> 3) We would need to add a new member to struct fiber and >> increase its size. > > Please see above, do you get the idea now? > > No. I don't understand what you want and how. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-13 23:11 ` Vladislav Shpilevoy @ 2019-12-14 12:26 ` Konstantin Osipov 2019-12-14 12:30 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-14 12:26 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/14 02:13]: > lbox_fiber_run_f is not called for C fibers. This is for Lua born > fibers only. To invoke a C function Lua is not used. It can't access fiber storage. > >> 2) That will give exactly the same - a member in struct fiber. > >> There just no other communication channel between different > >> frontends in which a fiber may participate during lifetime; > > > > Why do you think there will be different frontends involved? The > > storage is pure-Lua, there is no fiber storage for non-Lua fibers. > > > > Because other frontends will appear. request storage is specific to Lua which doesn't have Go-style deferrables, Python-style finally() or any other working means for controlled garbage collection. Even in Lua it is better done as a third party library, which works the same way as box.atomic(): wraps the request body with a RAII-style function. This feature *is* broken & redundant, and instead of making the damage controlled, you want to "fix" it by making the feature universal and implementing it on C level. > > >> 3) We would need to add a new member to struct fiber and > >> increase its size. > > > > Please see above, do you get the idea now? > > No. I don't understand what you want and how. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-14 12:26 ` Konstantin Osipov @ 2019-12-14 12:30 ` Konstantin Osipov 2019-12-14 12:33 ` Konstantin Osipov 0 siblings, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-14 12:30 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches * Konstantin Osipov <kostja.osipov@gmail.com> [19/12/14 15:26]: > request storage is specific to Lua which doesn't have Go-style deferrables, > Python-style finally() or any other working means for controlled > garbage collection. > > Even in Lua it is better done as a third party library, which works > the same way as box.atomic(): wraps the request body with > a RAII-style function. > > This feature *is* broken & redundant, and instead of making the > damage controlled, you want to "fix" it by making the feature > universal and implementing it on C level. I checked now, and Lua addeded debug.getupvalue() to Lua C API. On other words, fiber storage now can be implemented in pure Lua, without any triggers. Back when we were writing, upvalue API was only available in C API, and AFAIR there was no way to access upvalues from Lua, this is why we had to come up with this hack. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-14 12:30 ` Konstantin Osipov @ 2019-12-14 12:33 ` Konstantin Osipov 2019-12-14 16:49 ` Vladislav Shpilevoy 0 siblings, 1 reply; 26+ messages in thread From: Konstantin Osipov @ 2019-12-14 12:33 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches * Konstantin Osipov <kostja.osipov@gmail.com> [19/12/14 15:30]: > I checked now, and Lua addeded debug.getupvalue() to Lua C API. > > On other words, fiber storage now can be implemented in pure Lua, > without any triggers. > > Back when we were writing, upvalue API was only available in C > API, and AFAIR there was no way to access upvalues from Lua, this > is why we had to come up with this hack. The feature can actually be deprecated. There is now a completely legitimate and easy to use way to have a request-local object with a user defined destructor in pure Lua. If you do want to keep it, you should try to minimized the damage to the core, not extend it. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-14 12:33 ` Konstantin Osipov @ 2019-12-14 16:49 ` Vladislav Shpilevoy 2019-12-14 20:59 ` Vladislav Shpilevoy 0 siblings, 1 reply; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-14 16:49 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches Well, I still don't understand what you are keep saying and what do you want, and since it does not look like I will, I am out of this ticket. On 14/12/2019 13:33, Konstantin Osipov wrote: > * Konstantin Osipov <kostja.osipov@gmail.com> [19/12/14 15:30]: >> I checked now, and Lua addeded debug.getupvalue() to Lua C API. >> >> On other words, fiber storage now can be implemented in pure Lua, >> without any triggers. >> >> Back when we were writing, upvalue API was only available in C >> API, and AFAIR there was no way to access upvalues from Lua, this >> is why we had to come up with this hack. > > The feature can actually be deprecated. There is now a completely > legitimate and easy to use way to have a request-local object with > a user defined destructor in pure Lua. > > If you do want to keep it, you should try to minimized the damage > to the core, not extend it. > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto 2019-12-14 16:49 ` Vladislav Shpilevoy @ 2019-12-14 20:59 ` Vladislav Shpilevoy 0 siblings, 0 replies; 26+ messages in thread From: Vladislav Shpilevoy @ 2019-12-14 20:59 UTC (permalink / raw) To: Konstantin Osipov, tarantool-patches Ok, we discussed the ticket verbally. It was decided to keep it as is but revert all renames. Also I will add a doc request in which I will describe with details when and how fiber storage behaves. For example, what about triggers, applier. On 14/12/2019 17:49, Vladislav Shpilevoy wrote: > Well, I still don't understand what you are keep saying and > what do you want, and since it does not look like I will, I > am out of this ticket. > > On 14/12/2019 13:33, Konstantin Osipov wrote: >> * Konstantin Osipov <kostja.osipov@gmail.com> [19/12/14 15:30]: >>> I checked now, and Lua addeded debug.getupvalue() to Lua C API. >>> >>> On other words, fiber storage now can be implemented in pure Lua, >>> without any triggers. >>> >>> Back when we were writing, upvalue API was only available in C >>> API, and AFAIR there was no way to access upvalues from Lua, this >>> is why we had to come up with this hack. >> >> The feature can actually be deprecated. There is now a completely >> legitimate and easy to use way to have a request-local object with >> a user defined destructor in pure Lua. >> >> If you do want to keep it, you should try to minimized the damage >> to the core, not extend it. >> ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2019-12-14 20:59 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-08 19:28 [Tarantool-patches] [PATCH 0/2] Fiber storage leak Vladislav Shpilevoy 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 1/2] fiber: unref fiber.storage via global Lua state Vladislav Shpilevoy 2019-12-11 23:57 ` Igor Munkin 2019-12-12 8:50 ` Konstantin Osipov 2019-12-12 23:38 ` Vladislav Shpilevoy 2019-12-13 10:44 ` Igor Munkin 2019-12-08 19:28 ` [Tarantool-patches] [PATCH 2/2] fiber: destroy fiber.storage created by iproto Vladislav Shpilevoy 2019-12-09 7:21 ` Konstantin Osipov 2019-12-09 23:31 ` Vladislav Shpilevoy 2019-12-10 8:21 ` Konstantin Osipov 2019-12-10 8:32 ` Konstantin Osipov 2019-12-10 22:59 ` Vladislav Shpilevoy 2019-12-11 7:08 ` Konstantin Osipov 2019-12-11 21:23 ` Vladislav Shpilevoy 2019-12-12 0:00 ` Igor Munkin 2019-12-12 23:37 ` Vladislav Shpilevoy 2019-12-13 13:35 ` Igor Munkin 2019-12-12 8:46 ` Konstantin Osipov 2019-12-13 0:02 ` Vladislav Shpilevoy 2019-12-13 7:58 ` Konstantin Osipov 2019-12-13 23:11 ` Vladislav Shpilevoy 2019-12-14 12:26 ` Konstantin Osipov 2019-12-14 12:30 ` Konstantin Osipov 2019-12-14 12:33 ` Konstantin Osipov 2019-12-14 16:49 ` Vladislav Shpilevoy 2019-12-14 20:59 ` Vladislav Shpilevoy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox