* [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization @ 2020-07-10 12:01 Roman Khabibov 2020-07-10 12:29 ` Cyrill Gorcunov 0 siblings, 1 reply; 9+ messages in thread From: Roman Khabibov @ 2020-07-10 12:01 UTC (permalink / raw) To: tarantool-patches Add a limit to the number of calls to the __serialize function. Throw error in case of very deep (most likely endless) recursion. Closes #3228 --- Branch: https://github.com/tarantool/tarantool/tree/romanhabibov/gh-3228-serialize Issue: https://github.com/tarantool/tarantool/issues/3228 @ChangeLog - Fix bug with bus error when __serialize function generates infinite recursion. src/lua/utils.c | 8 ++++++++ ...-3228-serializer-look-for-recursion.result | 19 +++++++++++++++++++ ...228-serializer-look-for-recursion.test.lua | 8 ++++++++ 3 files changed, 35 insertions(+) create mode 100644 test/app/gh-3228-serializer-look-for-recursion.result create mode 100644 test/app/gh-3228-serializer-look-for-recursion.test.lua diff --git a/src/lua/utils.c b/src/lua/utils.c index 0b05d7257..7e55d43f1 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -50,6 +50,9 @@ static uint32_t CTID_CONST_CHAR_PTR; static uint32_t CTID_UUID; uint32_t CTID_DECIMAL; +enum { + SERIALIZER_CRITICAL_RECURSION_DEPTH = 256 +}; void * luaL_pushcdata(struct lua_State *L, uint32_t ctypeid) @@ -490,6 +493,11 @@ static int lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg, int idx, struct luaL_field *field) { + if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) { + diag_set(LuajitError, LUAL_SERIALIZE " generates too deep " + "recursion"); + return -1; + } if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0) return 1; if (lua_isfunction(L, -1)) { diff --git a/test/app/gh-3228-serializer-look-for-recursion.result b/test/app/gh-3228-serializer-look-for-recursion.result new file mode 100644 index 000000000..f105bfae9 --- /dev/null +++ b/test/app/gh-3228-serializer-look-for-recursion.result @@ -0,0 +1,19 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... + +-- +-- gh-3228: Check the error message in the case of a __serialize +-- function generating infinite recursion. +-- +setmetatable({}, {__serialize = function(a) return a end}) + | --- + | - error: 'console: an exception occurred when formatting the output: __serialize generates + | too deep recursion' + | ... +setmetatable({}, {__serialize = function(a, b, c) return a, b, c end}) + | --- + | - error: 'console: an exception occurred when formatting the output: __serialize generates + | too deep recursion' + | ... diff --git a/test/app/gh-3228-serializer-look-for-recursion.test.lua b/test/app/gh-3228-serializer-look-for-recursion.test.lua new file mode 100644 index 000000000..d3c76ef0c --- /dev/null +++ b/test/app/gh-3228-serializer-look-for-recursion.test.lua @@ -0,0 +1,8 @@ +test_run = require('test_run').new() + +-- +-- gh-3228: Check the error message in the case of a __serialize +-- function generating infinite recursion. +-- +setmetatable({}, {__serialize = function(a) return a end}) +setmetatable({}, {__serialize = function(a, b, c) return a, b, c end}) -- 2.21.0 (Apple Git-122) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-07-10 12:01 [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization Roman Khabibov @ 2020-07-10 12:29 ` Cyrill Gorcunov 2020-07-14 9:45 ` Igor Munkin 0 siblings, 1 reply; 9+ messages in thread From: Cyrill Gorcunov @ 2020-07-10 12:29 UTC (permalink / raw) To: Roman Khabibov; +Cc: tarantool-patches On Fri, Jul 10, 2020 at 03:01:09PM +0300, Roman Khabibov wrote: > luaL_pushcdata(struct lua_State *L, uint32_t ctypeid) > @@ -490,6 +493,11 @@ static int > lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg, > int idx, struct luaL_field *field) > { > + if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) { > + diag_set(LuajitError, LUAL_SERIALIZE " generates too deep " > + "recursion"); > + return -1; > + } > if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0) > return 1; Wait. The @idx stands for index in a table as far as I remember, this just happen to hit when you're calling youself recursively but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive calls as well. Lets CC our Lua's master: Igor. I might be simply wrong. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-07-10 12:29 ` Cyrill Gorcunov @ 2020-07-14 9:45 ` Igor Munkin 2020-07-14 10:40 ` Cyrill Gorcunov 0 siblings, 1 reply; 9+ messages in thread From: Igor Munkin @ 2020-07-14 9:45 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches Cyrill, On 10.07.20, Cyrill Gorcunov wrote: > On Fri, Jul 10, 2020 at 03:01:09PM +0300, Roman Khabibov wrote: > > luaL_pushcdata(struct lua_State *L, uint32_t ctypeid) > > @@ -490,6 +493,11 @@ static int > > lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg, > > int idx, struct luaL_field *field) > > { > > + if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) { > > + diag_set(LuajitError, LUAL_SERIALIZE " generates too deep " > > + "recursion"); > > + return -1; > > + } > > if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0) > > return 1; > > Wait. The @idx stands for index in a table as far as I remember, > this just happen to hit when you're calling youself recursively > but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive > calls as well. I guess you've just misread this part: @idx is just a guest stack slot where Lua object being serialized is stored. If I get your concerns right, you're talking about the following case: * before the patch | $ ./src/tarantool | Tarantool 2.5.0-184-gbefa36cfd | type 'help' for interactive help | tarantool> local b = 0 t = setmetatable({}, {__serialize = function(a) | > if b < 256 then b = b + 1 return a end | > return "QQ" end}) | --- | ... | | tarantool> t | --- | - QQ | ... | * after the patch | $ ./src/tarantool | Tarantool 2.5.0-185-gc9501fa8e | type 'help' for interactive help | tarantool> local b = 0 t = setmetatable({}, {__serialize = function(a) | > if b < 256 then b = b + 1 return a end | > return "QQ" end}) | --- | ... | | tarantool> t | --- | - error: 'console: an exception occurred when formatting the output: __serialize generates | too deep recursion' | ... | | tarantool> This example looks a bit synthetic, doesn't it? So, I guess we can assume the patch is fine (omitting that Mons proposed another solution in the issue but nobody provided its pros and cons). However, I agree that stack overflow can be handled in a different way. Roma's implementation is quite similar to the way Python handles stack overflow: | $ python <<EOF | heredoc> def f(n): | heredoc> if not n: return 1 | heredoc> return n * f(n - 1) | heredoc> | heredoc> f(2**64) | heredoc> EOF | Traceback (most recent call last): | File "<stdin>", line 5, in <module> | File "<stdin>", line 3, in f | File "<stdin>", line 3, in f | File "<stdin>", line 3, in f | [Previous line repeated 996 more times] | RecursionError: maximum recursion depth exceeded However, there is a significant difference between Roma's and Python approaches that inquisitive reader can find here[1]. Frankly speaking, I don't like such workaround at all and prefer the way Perl works when deep recursion occurs: | $ perl -wE 'sub f { my $n = shift; return 1 unless $n; return $n * f($n - 1) } f(2**64)' | Deep recursion on subroutine "main::f" at -e line 1. | Killed IMHO, it would be nice to drop a line to Tarantool log when "soft" stack overflow is violated, instead of user sandboxing. These are just my thoughts and not a full review, so feel free to ignore everything above. > > Lets CC our Lua's master: Igor. I might be simply wrong. [1]: https://docs.python.org/3/library/sys.html#sys.setrecursionlimit -- Best regards, IM ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-07-14 9:45 ` Igor Munkin @ 2020-07-14 10:40 ` Cyrill Gorcunov 2020-09-14 14:43 ` Roman Khabibov 0 siblings, 1 reply; 9+ messages in thread From: Cyrill Gorcunov @ 2020-07-14 10:40 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches On Tue, Jul 14, 2020 at 12:45:33PM +0300, Igor Munkin wrote: > > > > Wait. The @idx stands for index in a table as far as I remember, > > this just happen to hit when you're calling youself recursively > > but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive > > calls as well. > > I guess you've just misread this part: @idx is just a guest stack slot > where Lua object being serialized is stored. If I get your concerns > right, you're talking about the following case: Yes, you're right. Thanks, Igor! I don't have strong preference here on how to handle the case. Thus Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-07-14 10:40 ` Cyrill Gorcunov @ 2020-09-14 14:43 ` Roman Khabibov 2020-09-14 16:06 ` Cyrill Gorcunov 2020-09-16 7:29 ` Igor Munkin 0 siblings, 2 replies; 9+ messages in thread From: Roman Khabibov @ 2020-09-14 14:43 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tarantool-patches Hi, Cyrill and Igor! I tried to compare the addresses of the previous and the current iteration, if they are equal, then throw "looks like recursion, bad function!”. But I got swim tests failing. That is, they use some recursive serializers that do not overflow the stack. Therefore, I settled on the idea of introducing a recursion limit. > On Jul 14, 2020, at 13:40, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > On Tue, Jul 14, 2020 at 12:45:33PM +0300, Igor Munkin wrote: >>> >>> Wait. The @idx stands for index in a table as far as I remember, >>> this just happen to hit when you're calling youself recursively >>> but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive >>> calls as well. >> >> I guess you've just misread this part: @idx is just a guest stack slot >> where Lua object being serialized is stored. If I get your concerns >> right, you're talking about the following case: > > Yes, you're right. Thanks, Igor! I don't have strong preference here > on how to handle the case. Thus > > Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-09-14 14:43 ` Roman Khabibov @ 2020-09-14 16:06 ` Cyrill Gorcunov 2020-09-16 7:29 ` Igor Munkin 1 sibling, 0 replies; 9+ messages in thread From: Cyrill Gorcunov @ 2020-09-14 16:06 UTC (permalink / raw) To: Roman Khabibov; +Cc: tarantool-patches On Mon, Sep 14, 2020 at 05:43:56PM +0300, Roman Khabibov wrote: > Hi, Cyrill and Igor! > > I tried to compare the addresses of the previous and the current iteration, > if they are equal, then throw "looks like recursion, bad function!”. But I > got swim tests failing. That is, they use some recursive serializers that do > not overflow the stack. Therefore, I settled on the idea of introducing a > recursion limit. Look ok to me, I already gave an ack. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-09-14 14:43 ` Roman Khabibov 2020-09-14 16:06 ` Cyrill Gorcunov @ 2020-09-16 7:29 ` Igor Munkin 2020-09-30 21:49 ` Roman Khabibov 1 sibling, 1 reply; 9+ messages in thread From: Igor Munkin @ 2020-09-16 7:29 UTC (permalink / raw) To: Roman Khabibov; +Cc: tarantool-patches Roma, On 14.09.20, Roman Khabibov wrote: > Hi, Cyrill and Igor! > > I tried to compare the addresses of the previous and the current iteration, > if they are equal, then throw "looks like recursion, bad function!”. Could you please share your patch? > But I got swim tests failing. That is, they use some recursive > serializers that do not overflow the stack. Is it done intentionally or this behaviour can be changed? > Therefore, I settled on the idea of introducing a recursion limit. Nevertheless, I still propose one of the following: * make the limit configurable via box interface * introduce a "soft" limit to inform user when recursion occurs (e.g. using log with "WARN" facility) and a "hard" one to stop the instance IMHO, the best is to implement both proposals. Thoughts? Side note: __tostring Lua metamethod obligues user to yield a string, but I see __serialize method doesn't have such restrictions. Otherwise, the fix would be brief and clear. > <snipped> > -- Best regards, IM ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-09-16 7:29 ` Igor Munkin @ 2020-09-30 21:49 ` Roman Khabibov 2020-10-01 14:40 ` Igor Munkin 0 siblings, 1 reply; 9+ messages in thread From: Roman Khabibov @ 2020-09-30 21:49 UTC (permalink / raw) To: Igor Munkin; +Cc: tarantool-patches Hi! Thanks for the review. > On Sep 16, 2020, at 10:29, Igor Munkin <imun@tarantool.org> wrote: > > Roma, > > On 14.09.20, Roman Khabibov wrote: >> Hi, Cyrill and Igor! >> >> I tried to compare the addresses of the previous and the current iteration, >> if they are equal, then throw "looks like recursion, bad function!”. > > Could you please share your patch? > >> But I got swim tests failing. That is, they use some recursive >> serializers that do not overflow the stack. > > Is it done intentionally or this behaviour can be changed? Hm, I think intentionally. Then I need to study the swim code to fix this. Perhaps we can't do without it :( >> Therefore, I settled on the idea of introducing a recursion limit. > > Nevertheless, I still propose one of the following: > * make the limit configurable via box interface > * introduce a "soft" limit to inform user when recursion occurs (e.g. > using log with "WARN" facility) and a "hard" one to stop the instance > > IMHO, the best is to implement both proposals. Thoughts? I started doing this and thought, why? Why would a user need to regulate this? In my opinion, the main goal of the patch is to avoid the "bus error". > Side note: __tostring Lua metamethod obligues user to yield a string, > but I see __serialize method doesn't have such restrictions. Otherwise, > the fix would be brief and clear. Do you mean to add this check after serialization? Based on the serializers in swim, __serialize may return an accepted value, which may not be a string. >> > > <snipped> > >> > > -- > Best regards, > IM commit fa3131372bdaeb54015b851d2a84afc1e5d2449a Author: Roman Khabibov <roman.habibov@tarantool.org> Date: Tue Mar 10 19:29:10 2020 +0300 serilaizer: check for recursive serialization Add a limit to the number of calls to the __serialize function. Throw error in case of very deep (most likely endless) recursion. Closes #3228 diff --git a/src/lua/utils.c b/src/lua/utils.c index af114b0a2..8d3aa3450 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -52,6 +52,9 @@ static uint32_t CTID_CONST_CHAR_PTR; static uint32_t CTID_UUID; uint32_t CTID_DECIMAL; +enum { + SERIALIZER_CRITICAL_RECURSION_DEPTH = 256 +}; void * luaL_pushcdata(struct lua_State *L, uint32_t ctypeid) @@ -492,6 +495,11 @@ static int lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg, int idx, struct luaL_field *field) { + if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) { + diag_set(LuajitError, LUAL_SERIALIZE " generates too deep " + "recursion"); + return -1; + } if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0) return 1; if (lua_isfunction(L, -1)) { diff --git a/test/app/gh-3228-serializer-look-for-recursion.result b/test/app/gh-3228-serializer-look-for-recursion.result new file mode 100644 index 000000000..f105bfae9 --- /dev/null +++ b/test/app/gh-3228-serializer-look-for-recursion.result @@ -0,0 +1,19 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... + +-- +-- gh-3228: Check the error message in the case of a __serialize +-- function generating infinite recursion. +-- +setmetatable({}, {__serialize = function(a) return a end}) + | --- + | - error: 'console: an exception occurred when formatting the output: __serialize generates + | too deep recursion' + | ... +setmetatable({}, {__serialize = function(a, b, c) return a, b, c end}) + | --- + | - error: 'console: an exception occurred when formatting the output: __serialize generates + | too deep recursion' + | ... diff --git a/test/app/gh-3228-serializer-look-for-recursion.test.lua b/test/app/gh-3228-serializer-look-for-recursion.test.lua new file mode 100644 index 000000000..d3c76ef0c --- /dev/null +++ b/test/app/gh-3228-serializer-look-for-recursion.test.lua @@ -0,0 +1,8 @@ +test_run = require('test_run').new() + +-- +-- gh-3228: Check the error message in the case of a __serialize +-- function generating infinite recursion. +-- +setmetatable({}, {__serialize = function(a) return a end}) +setmetatable({}, {__serialize = function(a, b, c) return a, b, c end}) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization 2020-09-30 21:49 ` Roman Khabibov @ 2020-10-01 14:40 ` Igor Munkin 0 siblings, 0 replies; 9+ messages in thread From: Igor Munkin @ 2020-10-01 14:40 UTC (permalink / raw) To: Roman Khabibov; +Cc: tarantool-patches Roma, On 01.10.20, Roman Khabibov wrote: > Hi! Thanks for the review. > > > On Sep 16, 2020, at 10:29, Igor Munkin <imun@tarantool.org> wrote: > > > > Roma, > > > > On 14.09.20, Roman Khabibov wrote: > >> Hi, Cyrill and Igor! > >> > >> I tried to compare the addresses of the previous and the current iteration, > >> if they are equal, then throw "looks like recursion, bad function!”. > > > > Could you please share your patch? Well, the patch below looks like the original one with no addresses comparison at all. > > > >> But I got swim tests failing. That is, they use some recursive > >> serializers that do not overflow the stack. > > > > Is it done intentionally or this behaviour can be changed? > Hm, I think intentionally. Then I need to study the swim code > to fix this. Perhaps we can't do without it :( Anyway, let's look at SWIM code/tests together or even summon Vlad for the help as the major SWIM developer. > > >> Therefore, I settled on the idea of introducing a recursion limit. > > > > Nevertheless, I still propose one of the following: > > * make the limit configurable via box interface > > * introduce a "soft" limit to inform user when recursion occurs (e.g. > > using log with "WARN" facility) and a "hard" one to stop the instance > > > > IMHO, the best is to implement both proposals. Thoughts? > I started doing this and thought, why? Why would a user need to > regulate this? In my opinion, the main goal of the patch is to > avoid the "bus error". E.g. if he definitely needs 257 iterations. This value is your estimate, not the exact stack limit. So the end user might need to exceed it in the user code, but has no way to tune this limit for his needs. > > > Side note: __tostring Lua metamethod obligues user to yield a string, > > but I see __serialize method doesn't have such restrictions. Otherwise, > > the fix would be brief and clear. > > Do you mean to add this check after serialization? Based on the > serializers in swim, __serialize may return an accepted value, > which may not be a string. OK, your patch *perfectly* solves the "bus error" issue, but I believe it's a good point to think about the following: do we need __serialize metamethod going into recursion at all? What interface/contract does this function have? I mean, if we prohibit "echo" __serialize functions, then it's better to implement a check the similar way Mons originally proposed in the issue (yeah, I know you've already tried, but let's see whether we can do anything more here). Otherwise (i.e. if "echo" __serialize functions are allowed), it's better to implement a configurable limit than a hardcoded one, isn't it? I'm not quite sure "256 slots is ought to be enough". > Nevertheless, feel free to ignore everything above if you're sure. I'm not fully in the issue context, so don't see everything around the problem. Just want to make the changes more user-friendly. > >> > > > > <snipped> > > > >> > > > > -- > > Best regards, > > IM <snipped> -- Best regards, IM ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-10-01 14:51 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-10 12:01 [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization Roman Khabibov 2020-07-10 12:29 ` Cyrill Gorcunov 2020-07-14 9:45 ` Igor Munkin 2020-07-14 10:40 ` Cyrill Gorcunov 2020-09-14 14:43 ` Roman Khabibov 2020-09-14 16:06 ` Cyrill Gorcunov 2020-09-16 7:29 ` Igor Munkin 2020-09-30 21:49 ` Roman Khabibov 2020-10-01 14:40 ` Igor Munkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox