References: <31e884f2b6d2ab7b222229f1b204fa067d63ae65.1579211601.git.v.shpilevoy@tarantool.org>
 <20200117074552.GD24940@atlas>
 <20200120072234.GB19835@atlas>
From: Vladislav Shpilevoy
Message-ID: <36fe0fdc-c192-f850-c241-6a761777ad6b@tarantool.org>
Date: Tue, 21 Jan 2020 23:21:59 +0100
In-Reply-To: <20200120072234.GB19835@atlas>
Subject: Re: [Tarantool-patches] [PATCH v2 2/3] fiber: destroy fiber.storage
 created by iproto
List-Id: Tarantool development patches
To: Konstantin Osipov, tarantool-patches@dev.tarantool.org

On 20/01/2020 08:22, Konstantin Osipov wrote:
> * Vladislav Shpilevoy [20/01/20 09:59]:
>>> Why did you have to add so many invocation points to
>>> fiber_on_stop() rather than simply adding fiber_on_stop invocation to
>>> fiber_pool.c?
>>
>> We already discussed that in the previous patch version. We decided
>> to move cleanup to iproto.cc, because it depends on when a request
>> ends. Fiber pool knows nothing about requests. Iproto.cc is request
>> processing layer, and this is the right place for request data
>> destruction.
>
> True, but since you abstracted out the destruction via an opaque
> trigger, why not move the invocation of the trigger to fiber pool?
> fiber pool has most knowledge about fiber life cycle, so it seems
> natural to invoke the triggers in it - it will tie the *timing* to
> fiber pool, but not what's going on inside the trigger.
>
> Thoughts?

And we came to exactly the same patch I had in the first version, with
the on_stop -> on_cleanup rename rolled back. The motivation for
calling on_stop in iproto was that only iproto is interested in
clearing the fiber's state after each request, and the fiber pool
should not decide when a request ends - at least that is how I
understood it. But I am ok with both solutions. Anyway, here is the
new patch.

================================================================================

commit 422068e5ea4aaf4e6f4b4dc56f252f3b5971aec2
Author: Vladislav Shpilevoy
Date: Thu Jan 16 21:47:18 2020 +0100

fiber: destroy fiber.storage created by iproto

Fiber.storage was not deleted when created in a fiber started from
the thread pool used by IProto requests. The problem was that
fiber.storage was created and deleted in Lua land only, assuming
that only Lua-born fibers could have it. But in fact any fiber can
create a Lua storage, including the ones used to serve IProto
requests. Failure to delete the storage meant that a non-empty
fiber.storage could show up at the beginning of an IProto request,
and that the memory held by the storage was not freed until the
storage was explicitly nullified.

Now the storage destructor works for any fiber that managed to
create the storage. The destructor unrefs and nullifies the storage.

For destructor purposes the fiber.on_stop triggers were reworked:
now they can be called multiple times during a fiber's lifetime,
after every request served by that fiber.

Closes #4662
Closes #3462

@TarantoolBot document
Title: Clarify fiber.storage lifetime
Fiber.storage is a Lua table created when it is first accessed.
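For illustration, a minimal sketch of that on-demand creation
(hypothetical usage, not part of the patch; runs inside any fiber):

local fiber = require('fiber')
-- The first access creates the storage table for the current fiber.
fiber.self().storage.request_id = 42
assert(fiber.self().storage.request_id == 42)
-- When the fiber finishes - or, for pooled IProto fibers, when the
-- current request ends - the storage is destroyed.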
The documentation says it is deleted when the fiber is cancelled via
fiber:cancel(), but that is not the whole truth. Fiber.storage is
destroyed when the fiber is finished, regardless of how it finished -
via :cancel() or a normal 'return' from the fiber's function.
Moreover, from that moment the storage is cleaned up even for pooled
fibers used to serve IProto requests. Pooled fibers never really die,
but nonetheless their storage is cleaned up after each request. That
makes it possible to use fiber.storage as a full-featured
request-local storage.

Fiber.storage may be created for a fiber no matter how the fiber
itself was created - from C or from Lua. For example, a fiber could
be created in C using fiber_new(), then insert into a space that has
Lua on_replace triggers, and one of the triggers could create
fiber.storage. That storage will be deleted when the fiber is
stopped.

Another place where fiber.storage may be created is the replication
applier fiber. The applier has a fiber from which it applies
transactions received from a remote instance. If the applier fiber
somehow creates a fiber.storage (for example, from a space trigger
again), the storage won't be deleted until the applier fiber is
stopped.

diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index 5e83ab0ea..f313af6ae 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -1310,6 +1310,11 @@ static void
 tx_fiber_init(struct session *session, uint64_t sync)
 {
 	struct fiber *f = fiber();
+	/*
+	 * There must not be any unprocessed on_stop triggers
+	 * left over from a previous request executed in this fiber.
+	 */
+	assert(rlist_empty(&f->on_stop));
 	f->storage.net.sync = sync;
 	/*
 	 * We do not cleanup fiber keys at the end of each request.
@@ -1716,8 +1721,6 @@ tx_process_sql(struct cmsg *m)
 	uint32_t len;
 	bool is_unprepare = false;
 
-	tx_fiber_init(msg->connection->session, msg->header.sync);
-
 	if (tx_check_schema(msg->header.schema_version))
 		goto error;
 	assert(msg->header.type == IPROTO_EXECUTE ||
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 00ae8cded..354749549 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -325,6 +325,25 @@ fiber_attr_getstacksize(struct fiber_attr *fiber_attr)
 		fiber_attr_default.stack_size;
 }
 
+void
+fiber_on_stop(struct fiber *f)
+{
+	/*
+	 * The most common case is when the list is empty. Do an
+	 * inlined check before calling trigger_run().
+	 */
+	if (rlist_empty(&f->on_stop))
+		return;
+	if (trigger_run(&f->on_stop, f) != 0)
+		panic("On_stop triggers can't fail");
+	/*
+	 * All on_stop triggers are supposed to remove themselves,
+	 * so as not to waste time on that here, and to make them
+	 * all work uniformly.
+	 */
+	assert(rlist_empty(&f->on_stop));
+}
+
 static void
 fiber_recycle(struct fiber *fiber);
 
@@ -856,8 +875,7 @@ fiber_loop(MAYBE_UNUSED void *data)
 			assert(f != fiber);
 			fiber_wakeup(f);
 		}
-		if (! rlist_empty(&fiber->on_stop))
-			trigger_run(&fiber->on_stop, fiber);
+		fiber_on_stop(fiber);
 		/* reset pending wakeups */
 		rlist_del(&fiber->state);
 		if (! (fiber->flags & FIBER_IS_JOINABLE))
@@ -1525,8 +1543,8 @@ cord_cojoin(struct cord *cord)
 int
 break_ev_loop_f(struct trigger *trigger, void *event)
 {
-	(void) trigger;
 	(void) event;
+	trigger_clear(trigger);
 	ev_break(loop(), EVBREAK_ALL);
 	return 0;
 }
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index faaf2e0da..8a3e5aef5 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -457,7 +457,15 @@ struct fiber {
 	/** Triggers invoked before this fiber yields. Must not throw. */
 	struct rlist on_yield;
-	/** Triggers invoked before this fiber stops. Must not throw. */
+	/**
+	 * Triggers invoked before this fiber is stopped/reset/
+	 * recycled/destroyed/reused. In other words, each time
+	 * the fiber finishes executing a request. In particular,
+	 * for fibers not from a fiber pool the stop event is
+	 * emitted before destruction and death. Pooled fibers
+	 * receive the stop event after each request, even if
+	 * they are never destroyed.
+	 */
 	struct rlist on_stop;
 	/**
 	 * The list of fibers awaiting for this fiber's timely
@@ -505,6 +513,10 @@ struct fiber {
 	char name[FIBER_NAME_MAX + 1];
 };
 
+/** Invoke on_stop triggers and delete them. */
+void
+fiber_on_stop(struct fiber *f);
+
 struct cord_on_exit;
 
 /**
diff --git a/src/lib/core/fiber_pool.c b/src/lib/core/fiber_pool.c
index 77f89c9fa..d40c9bcc3 100644
--- a/src/lib/core/fiber_pool.c
+++ b/src/lib/core/fiber_pool.c
@@ -62,6 +62,16 @@ restart:
 			assert(f->caller->caller == &cord->sched);
 		}
 		cmsg_deliver(msg);
+		/*
+		 * Normally fibers die after their function
+		 * returns, and then run the on_stop() triggers.
+		 * The pool optimization interferes with that
+		 * logic: a pooled fiber doesn't die after its
+		 * function returns. But on_stop triggers still
+		 * should be called, so that the pool doesn't
+		 * affect the fiber's visible lifecycle.
+		 */
+		fiber_on_stop(f);
 	}
 	/** Put the current fiber into a fiber cache. */
 	if (!fiber_is_cancelled(fiber()) && (msg != NULL ||
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index a7b75f9bf..575a020d0 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -441,11 +441,6 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 	int coro_ref = lua_tointeger(L, -1);
 	lua_pop(L, 1);
 	result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
-
-	/* Destroy local storage */
-	int storage_ref = f->storage.lua.ref;
-	if (storage_ref > 0)
-		luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref);
 	/*
 	 * If fiber is not joinable
 	 * We can unref child stack here,
@@ -606,12 +601,42 @@ lbox_fiber_name(struct lua_State *L)
 	}
 }
 
+/**
+ * Trigger invoked when the fiber has stopped executing its
+ * current request. Its only purpose is to delete storage.lua.ref,
+ * which keeps a reference to the Lua fiber.storage object. Unlike
+ * the Lua stack, a Lua fiber storage may be created even for
+ * fibers not born in Lua land. For example, an IProto request may
+ * execute a Lua function, which can create the storage. The trigger
+ * guarantees that the Lua storage is destroyed even for non-Lua fibers.
+ */
+static int
+lbox_fiber_on_stop(struct trigger *trigger, void *event)
+{
+	struct fiber *f = (struct fiber *) event;
+	int storage_ref = f->storage.lua.ref;
+	assert(storage_ref > 0);
+	luaL_unref(tarantool_L, LUA_REGISTRYINDEX, storage_ref);
+	f->storage.lua.ref = LUA_NOREF;
+	trigger_clear(trigger);
+	free(trigger);
+	return 0;
+}
+
 static int
 lbox_fiber_storage(struct lua_State *L)
 {
 	struct fiber *f = lbox_checkfiber(L, 1);
 	int storage_ref = f->storage.lua.ref;
 	if (storage_ref <= 0) {
+		struct trigger *t = (struct trigger *)
+			malloc(sizeof(*t));
+		if (t == NULL) {
+			diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
+			return luaT_error(L);
+		}
+		trigger_create(t, lbox_fiber_on_stop, NULL, (trigger_f0) free);
+		trigger_add(&f->on_stop, t);
 		lua_newtable(L); /* create local storage on demand */
 		storage_ref = luaL_ref(L, LUA_REGISTRYINDEX);
 		f->storage.lua.ref = storage_ref;
diff --git a/test/app/gh-4662-fiber-storage-leak.result b/test/app/gh-4662-fiber-storage-leak.result
new file mode 100644
index 000000000..4ade017a4
--- /dev/null
+++ b/test/app/gh-4662-fiber-storage-leak.result
@@ -0,0 +1,88 @@
+-- test-run result file version 2
+fiber = require('fiber')
+ | ---
+ | ...
+netbox = require('net.box')
+ | ---
+ | ...
+
+--
+-- gh-4662: fiber.storage was not deleted when created in a fiber
+-- started from the thread pool used by IProto requests. The
+-- problem was that fiber.storage was created and deleted in Lua
+-- land only, assuming that only Lua-born fibers could have it.
+-- But in fact any fiber can create a Lua storage, including the
+-- ones used to serve IProto requests.
+-- The test checks that fiber.storage is really deleted and is not
+-- shared between requests.
+--
+
+box.schema.user.grant('guest', 'execute', 'universe')
+ | ---
+ | ...
+storage = nil
+ | ---
+ | ...
+i = 0
+ | ---
+ | ...
+weak_table = setmetatable({}, {__mode = 'v'})
+ | ---
+ | ...
+object = {'object'}
+ | ---
+ | ...
+weak_table.object = object
+ | ---
+ | ...
+function ref_object_in_fiber() \
+    storage = fiber.self().storage \
+    assert(next(storage) == nil) \
+    i = i + 1 \
+    fiber.self().storage.key = i \
+    fiber.self().storage.object = object \
+end
+ | ---
+ | ...
+
+c = netbox.connect(box.cfg.listen)
+ | ---
+ | ...
+c:call('ref_object_in_fiber') c:call('ref_object_in_fiber')
+ | ---
+ | ...
+storage
+ | ---
+ | - key: 2
+ |   object:
+ |   - object
+ | ...
+i
+ | ---
+ | - 2
+ | ...
+object = nil
+ | ---
+ | ...
+storage = nil
+ | ---
+ | ...
+collectgarbage('collect')
+ | ---
+ | - 0
+ | ...
+-- The weak table should be empty, because the only two hard
+-- references were in the fibers used to serve
+-- ref_object_in_fiber() requests, and their storages should be
+-- cleaned up.
+weak_table
+ | ---
+ | - []
+ | ...
+
+c:close()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'execute', 'universe')
+ | ---
+ | ...
diff --git a/test/app/gh-4662-fiber-storage-leak.test.lua b/test/app/gh-4662-fiber-storage-leak.test.lua
new file mode 100644
index 000000000..25acf5679
--- /dev/null
+++ b/test/app/gh-4662-fiber-storage-leak.test.lua
@@ -0,0 +1,43 @@
+fiber = require('fiber')
+netbox = require('net.box')
+
+--
+-- gh-4662: fiber.storage was not deleted when created in a fiber
+-- started from the thread pool used by IProto requests. The
+-- problem was that fiber.storage was created and deleted in Lua
+-- land only, assuming that only Lua-born fibers could have it.
+-- But in fact any fiber can create a Lua storage, including the
+-- ones used to serve IProto requests.
+-- The test checks that fiber.storage is really deleted and is not
+-- shared between requests.
+--
+
+box.schema.user.grant('guest', 'execute', 'universe')
+storage = nil
+i = 0
+weak_table = setmetatable({}, {__mode = 'v'})
+object = {'object'}
+weak_table.object = object
+function ref_object_in_fiber() \
+    storage = fiber.self().storage \
+    assert(next(storage) == nil) \
+    i = i + 1 \
+    fiber.self().storage.key = i \
+    fiber.self().storage.object = object \
+end
+
+c = netbox.connect(box.cfg.listen)
+c:call('ref_object_in_fiber') c:call('ref_object_in_fiber')
+storage
+i
+object = nil
+storage = nil
+collectgarbage('collect')
+-- The weak table should be empty, because the only two hard
+-- references were in the fibers used to serve
+-- ref_object_in_fiber() requests, and their storages should be
+-- cleaned up.
+weak_table
+
+c:close()
+box.schema.user.revoke('guest', 'execute', 'universe')
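
For a quick manual check of the request-local behavior outside the test
harness, a hypothetical console session (a sketch, not part of the patch;
it assumes the instance was started with box.cfg{listen = <port>}, and
the probe() name is illustrative):

fiber = require('fiber')
netbox = require('net.box')
box.schema.user.grant('guest', 'execute', 'universe')
-- Return true when the fiber's storage is empty on entry, then
-- leave a mark in it for the next request to (not) find.
function probe() local e = next(fiber.self().storage) == nil fiber.self().storage.mark = true return e end
c = netbox.connect(box.cfg.listen)
c:call('probe') -- true
c:call('probe') -- true again: the first request's storage was destroyed
c:close()
box.schema.user.revoke('guest', 'execute', 'universe')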