Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
@ 2020-07-10 12:01 Roman Khabibov
  2020-07-10 12:29 ` Cyrill Gorcunov
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Khabibov @ 2020-07-10 12:01 UTC (permalink / raw)
  To: tarantool-patches

Add a limit to the number of calls to the __serialize function.
Throw error in case of very deep (most likely endless) recursion.

Closes #3228
---

Branch: https://github.com/tarantool/tarantool/tree/romanhabibov/gh-3228-serialize
Issue: https://github.com/tarantool/tarantool/issues/3228

@ChangeLog
- Fix bug with bus error when __serialize function generates
infinite recursion.

 src/lua/utils.c                               |  8 ++++++++
 ...-3228-serializer-look-for-recursion.result | 19 +++++++++++++++++++
 ...228-serializer-look-for-recursion.test.lua |  8 ++++++++
 3 files changed, 35 insertions(+)
 create mode 100644 test/app/gh-3228-serializer-look-for-recursion.result
 create mode 100644 test/app/gh-3228-serializer-look-for-recursion.test.lua

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d7257..7e55d43f1 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -50,6 +50,9 @@ static uint32_t CTID_CONST_CHAR_PTR;
 static uint32_t CTID_UUID;
 uint32_t CTID_DECIMAL;
 
+enum {
+	SERIALIZER_CRITICAL_RECURSION_DEPTH = 256
+};
 
 void *
 luaL_pushcdata(struct lua_State *L, uint32_t ctypeid)
@@ -490,6 +493,11 @@ static int
 lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg,
 			int idx, struct luaL_field *field)
 {
+	if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) {
+		diag_set(LuajitError, LUAL_SERIALIZE " generates too deep "
+			 "recursion");
+		return -1;
+	}
 	if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0)
 		return 1;
 	if (lua_isfunction(L, -1)) {
diff --git a/test/app/gh-3228-serializer-look-for-recursion.result b/test/app/gh-3228-serializer-look-for-recursion.result
new file mode 100644
index 000000000..f105bfae9
--- /dev/null
+++ b/test/app/gh-3228-serializer-look-for-recursion.result
@@ -0,0 +1,19 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-3228: Check the error message in the case of a __serialize
+-- function generating infinite recursion.
+--
+setmetatable({}, {__serialize = function(a) return a end})
+ | ---
+ | - error: 'console: an exception occurred when formatting the output: __serialize generates
+ |     too deep recursion'
+ | ...
+setmetatable({}, {__serialize = function(a, b, c) return a, b, c end})
+ | ---
+ | - error: 'console: an exception occurred when formatting the output: __serialize generates
+ |     too deep recursion'
+ | ...
diff --git a/test/app/gh-3228-serializer-look-for-recursion.test.lua b/test/app/gh-3228-serializer-look-for-recursion.test.lua
new file mode 100644
index 000000000..d3c76ef0c
--- /dev/null
+++ b/test/app/gh-3228-serializer-look-for-recursion.test.lua
@@ -0,0 +1,8 @@
+test_run = require('test_run').new()
+
+--
+-- gh-3228: Check the error message in the case of a __serialize
+-- function generating infinite recursion.
+--
+setmetatable({}, {__serialize = function(a) return a end})
+setmetatable({}, {__serialize = function(a, b, c) return a, b, c end})
-- 
2.21.0 (Apple Git-122)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-07-10 12:01 [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization Roman Khabibov
@ 2020-07-10 12:29 ` Cyrill Gorcunov
  2020-07-14  9:45   ` Igor Munkin
  0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2020-07-10 12:29 UTC (permalink / raw)
  To: Roman Khabibov; +Cc: tarantool-patches

On Fri, Jul 10, 2020 at 03:01:09PM +0300, Roman Khabibov wrote:
>  luaL_pushcdata(struct lua_State *L, uint32_t ctypeid)
> @@ -490,6 +493,11 @@ static int
>  lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg,
>  			int idx, struct luaL_field *field)
>  {
> +	if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) {
> +		diag_set(LuajitError, LUAL_SERIALIZE " generates too deep "
> +			 "recursion");
> +		return -1;
> +	}
>  	if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0)
>  		return 1;

Wait. The @idx stands for index in a table as far as I remember,
this just happen to hit when you're calling youself recursively
but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive
calls as well.

Lets CC our Lua's master: Igor. I might be simply wrong.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-07-10 12:29 ` Cyrill Gorcunov
@ 2020-07-14  9:45   ` Igor Munkin
  2020-07-14 10:40     ` Cyrill Gorcunov
  0 siblings, 1 reply; 9+ messages in thread
From: Igor Munkin @ 2020-07-14  9:45 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tarantool-patches

Cyrill,

On 10.07.20, Cyrill Gorcunov wrote:
> On Fri, Jul 10, 2020 at 03:01:09PM +0300, Roman Khabibov wrote:
> >  luaL_pushcdata(struct lua_State *L, uint32_t ctypeid)
> > @@ -490,6 +493,11 @@ static int
> >  lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg,
> >  			int idx, struct luaL_field *field)
> >  {
> > +	if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) {
> > +		diag_set(LuajitError, LUAL_SERIALIZE " generates too deep "
> > +			 "recursion");
> > +		return -1;
> > +	}
> >  	if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0)
> >  		return 1;
> 
> Wait. The @idx stands for index in a table as far as I remember,
> this just happen to hit when you're calling youself recursively
> but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive
> calls as well.

I guess you've just misread this part: @idx is just a guest stack slot
where Lua object being serialized is stored. If I get your concerns
right, you're talking about the following case:
* before the patch
| $ ./src/tarantool
| Tarantool 2.5.0-184-gbefa36cfd
| type 'help' for interactive help
| tarantool> local b = 0 t = setmetatable({}, {__serialize = function(a)
|          > if b < 256 then b = b + 1 return a end
|          > return "QQ" end})
|  ---
|  ...
|
|  tarantool> t
|  ---
|  - QQ
|  ...
|
* after the patch
| $ ./src/tarantool
| Tarantool 2.5.0-185-gc9501fa8e
| type 'help' for interactive help
| tarantool> local b = 0 t = setmetatable({}, {__serialize = function(a)
|          > if b < 256 then b = b + 1 return a end
|          > return "QQ" end})
| ---
| ...
|
| tarantool> t
| ---
| - error: 'console: an exception occurred when formatting the output: __serialize generates
|     too deep recursion'
| ...
|
| tarantool>

This example looks a bit synthetic, doesn't it? So, I guess we can
assume the patch is fine (omitting that Mons proposed another solution
in the issue but nobody provided its pros and cons).

However, I agree that stack overflow can be handled in a different way.
Roma's implementation is quite similar to the way Python handles stack
overflow:
| $ python <<EOF
| heredoc> def f(n):
| heredoc>   if not n: return 1
| heredoc>   return n * f(n - 1)
| heredoc>
| heredoc> f(2**64)
| heredoc> EOF
| Traceback (most recent call last):
|   File "<stdin>", line 5, in <module>
|   File "<stdin>", line 3, in f
|   File "<stdin>", line 3, in f
|   File "<stdin>", line 3, in f
|   [Previous line repeated 996 more times]
| RecursionError: maximum recursion depth exceeded

However, there is a significant difference between Roma's and Python
approaches that inquisitive reader can find here[1].

Frankly speaking, I don't like such workaround at all and prefer the way
Perl works when deep recursion occurs:
| $ perl -wE 'sub f { my $n = shift; return 1 unless $n; return $n * f($n - 1) } f(2**64)'
| Deep recursion on subroutine "main::f" at -e line 1.
| Killed

IMHO, it would be nice to drop a line to Tarantool log when "soft" stack
overflow is violated, instead of user sandboxing. These are just my
thoughts and not a full review, so feel free to ignore everything above.

> 
> Lets CC our Lua's master: Igor. I might be simply wrong.

[1]: https://docs.python.org/3/library/sys.html#sys.setrecursionlimit

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-07-14  9:45   ` Igor Munkin
@ 2020-07-14 10:40     ` Cyrill Gorcunov
  2020-09-14 14:43       ` Roman Khabibov
  0 siblings, 1 reply; 9+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 10:40 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

On Tue, Jul 14, 2020 at 12:45:33PM +0300, Igor Munkin wrote:
> > 
> > Wait. The @idx stands for index in a table as far as I remember,
> > this just happen to hit when you're calling youself recursively
> > but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive
> > calls as well.
> 
> I guess you've just misread this part: @idx is just a guest stack slot
> where Lua object being serialized is stored. If I get your concerns
> right, you're talking about the following case:

Yes, you're right. Thanks, Igor! I don't have strong preference here
on how to handle the case. Thus

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-07-14 10:40     ` Cyrill Gorcunov
@ 2020-09-14 14:43       ` Roman Khabibov
  2020-09-14 16:06         ` Cyrill Gorcunov
  2020-09-16  7:29         ` Igor Munkin
  0 siblings, 2 replies; 9+ messages in thread
From: Roman Khabibov @ 2020-09-14 14:43 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tarantool-patches

Hi, Cyrill and Igor!

I tried to compare the addresses of the previous and the current iteration,
if they are equal, then throw "looks like recursion, bad function!”. But I
got swim tests failing. That is, they use some recursive serializers that do
not overflow the stack. Therefore, I settled on the idea of ​​introducing a 
recursion limit.

> On Jul 14, 2020, at 13:40, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> 
> On Tue, Jul 14, 2020 at 12:45:33PM +0300, Igor Munkin wrote:
>>> 
>>> Wait. The @idx stands for index in a table as far as I remember,
>>> this just happen to hit when you're calling youself recursively
>>> but @idx may be > SERIALIZER_CRITICAL_RECURSION_DEPTH for nonrecursive
>>> calls as well.
>> 
>> I guess you've just misread this part: @idx is just a guest stack slot
>> where Lua object being serialized is stored. If I get your concerns
>> right, you're talking about the following case:
> 
> Yes, you're right. Thanks, Igor! I don't have strong preference here
> on how to handle the case. Thus
> 
> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-09-14 14:43       ` Roman Khabibov
@ 2020-09-14 16:06         ` Cyrill Gorcunov
  2020-09-16  7:29         ` Igor Munkin
  1 sibling, 0 replies; 9+ messages in thread
From: Cyrill Gorcunov @ 2020-09-14 16:06 UTC (permalink / raw)
  To: Roman Khabibov; +Cc: tarantool-patches

On Mon, Sep 14, 2020 at 05:43:56PM +0300, Roman Khabibov wrote:
> Hi, Cyrill and Igor!
> 
> I tried to compare the addresses of the previous and the current iteration,
> if they are equal, then throw "looks like recursion, bad function!”. But I
> got swim tests failing. That is, they use some recursive serializers that do
> not overflow the stack. Therefore, I settled on the idea of introducing a 
> recursion limit.

Look ok to me, I already gave an ack.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-09-14 14:43       ` Roman Khabibov
  2020-09-14 16:06         ` Cyrill Gorcunov
@ 2020-09-16  7:29         ` Igor Munkin
  2020-09-30 21:49           ` Roman Khabibov
  1 sibling, 1 reply; 9+ messages in thread
From: Igor Munkin @ 2020-09-16  7:29 UTC (permalink / raw)
  To: Roman Khabibov; +Cc: tarantool-patches

Roma,

On 14.09.20, Roman Khabibov wrote:
> Hi, Cyrill and Igor!
> 
> I tried to compare the addresses of the previous and the current iteration,
> if they are equal, then throw "looks like recursion, bad function!”.

Could you please share your patch?

> But I got swim tests failing. That is, they use some recursive
> serializers that do not overflow the stack.

Is it done intentionally or this behaviour can be changed?

> Therefore, I settled on the idea of introducing a recursion limit.

Nevertheless, I still propose one of the following:
* make the limit configurable via box interface
* introduce a "soft" limit to inform user when recursion occurs (e.g.
  using log with "WARN" facility) and a "hard" one to stop the instance

IMHO, the best is to implement both proposals. Thoughts?

Side note: __tostring Lua metamethod obligues user to yield a string,
but I see __serialize method doesn't have such restrictions. Otherwise,
the fix would be brief and clear.

> 

<snipped>

> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-09-16  7:29         ` Igor Munkin
@ 2020-09-30 21:49           ` Roman Khabibov
  2020-10-01 14:40             ` Igor Munkin
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Khabibov @ 2020-09-30 21:49 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the review.

> On Sep 16, 2020, at 10:29, Igor Munkin <imun@tarantool.org> wrote:
> 
> Roma,
> 
> On 14.09.20, Roman Khabibov wrote:
>> Hi, Cyrill and Igor!
>> 
>> I tried to compare the addresses of the previous and the current iteration,
>> if they are equal, then throw "looks like recursion, bad function!”.
> 
> Could you please share your patch?
> 
>> But I got swim tests failing. That is, they use some recursive
>> serializers that do not overflow the stack.
> 
> Is it done intentionally or this behaviour can be changed?
Hm, I think intentionally. Then I need to study the swim code
to fix this. Perhaps we can't do without it :(

>> Therefore, I settled on the idea of introducing a recursion limit.
> 
> Nevertheless, I still propose one of the following:
> * make the limit configurable via box interface
> * introduce a "soft" limit to inform user when recursion occurs (e.g.
>  using log with "WARN" facility) and a "hard" one to stop the instance
> 
> IMHO, the best is to implement both proposals. Thoughts?
I started doing this and thought, why? Why would a user need to
regulate this? In my opinion, the main goal of the patch is to
avoid the "bus error".

> Side note: __tostring Lua metamethod obligues user to yield a string,
> but I see __serialize method doesn't have such restrictions. Otherwise,
> the fix would be brief and clear.

Do you mean to add this check after serialization? Based on the
serializers in swim, __serialize may return an accepted value,
which may not be a string.

>> 
> 
> <snipped>
> 
>> 
> 
> -- 
> Best regards,
> IM

commit fa3131372bdaeb54015b851d2a84afc1e5d2449a
Author: Roman Khabibov <roman.habibov@tarantool.org>
Date:   Tue Mar 10 19:29:10 2020 +0300

    serilaizer: check for recursive serialization
    
    Add a limit to the number of calls to the __serialize function.
    Throw error in case of very deep (most likely endless) recursion.
    
    Closes #3228

diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..8d3aa3450 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -52,6 +52,9 @@ static uint32_t CTID_CONST_CHAR_PTR;
 static uint32_t CTID_UUID;
 uint32_t CTID_DECIMAL;
 
+enum {
+	SERIALIZER_CRITICAL_RECURSION_DEPTH = 256
+};
 
 void *
 luaL_pushcdata(struct lua_State *L, uint32_t ctypeid)
@@ -492,6 +495,11 @@ static int
 lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg,
 			int idx, struct luaL_field *field)
 {
+	if (idx > SERIALIZER_CRITICAL_RECURSION_DEPTH) {
+		diag_set(LuajitError, LUAL_SERIALIZE " generates too deep "
+			 "recursion");
+		return -1;
+	}
 	if (luaL_getmetafield(L, idx, LUAL_SERIALIZE) == 0)
 		return 1;
 	if (lua_isfunction(L, -1)) {
diff --git a/test/app/gh-3228-serializer-look-for-recursion.result b/test/app/gh-3228-serializer-look-for-recursion.result
new file mode 100644
index 000000000..f105bfae9
--- /dev/null
+++ b/test/app/gh-3228-serializer-look-for-recursion.result
@@ -0,0 +1,19 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-3228: Check the error message in the case of a __serialize
+-- function generating infinite recursion.
+--
+setmetatable({}, {__serialize = function(a) return a end})
+ | ---
+ | - error: 'console: an exception occurred when formatting the output: __serialize generates
+ |     too deep recursion'
+ | ...
+setmetatable({}, {__serialize = function(a, b, c) return a, b, c end})
+ | ---
+ | - error: 'console: an exception occurred when formatting the output: __serialize generates
+ |     too deep recursion'
+ | ...
diff --git a/test/app/gh-3228-serializer-look-for-recursion.test.lua b/test/app/gh-3228-serializer-look-for-recursion.test.lua
new file mode 100644
index 000000000..d3c76ef0c
--- /dev/null
+++ b/test/app/gh-3228-serializer-look-for-recursion.test.lua
@@ -0,0 +1,8 @@
+test_run = require('test_run').new()
+
+--
+-- gh-3228: Check the error message in the case of a __serialize
+-- function generating infinite recursion.
+--
+setmetatable({}, {__serialize = function(a) return a end})
+setmetatable({}, {__serialize = function(a, b, c) return a, b, c end})

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization
  2020-09-30 21:49           ` Roman Khabibov
@ 2020-10-01 14:40             ` Igor Munkin
  0 siblings, 0 replies; 9+ messages in thread
From: Igor Munkin @ 2020-10-01 14:40 UTC (permalink / raw)
  To: Roman Khabibov; +Cc: tarantool-patches

Roma,

On 01.10.20, Roman Khabibov wrote:
> Hi! Thanks for the review.
> 
> > On Sep 16, 2020, at 10:29, Igor Munkin <imun@tarantool.org> wrote:
> > 
> > Roma,
> > 
> > On 14.09.20, Roman Khabibov wrote:
> >> Hi, Cyrill and Igor!
> >> 
> >> I tried to compare the addresses of the previous and the current iteration,
> >> if they are equal, then throw "looks like recursion, bad function!”.
> > 
> > Could you please share your patch?

Well, the patch below looks like the original one with no addresses
comparison at all.

> > 
> >> But I got swim tests failing. That is, they use some recursive
> >> serializers that do not overflow the stack.
> > 
> > Is it done intentionally or this behaviour can be changed?
> Hm, I think intentionally. Then I need to study the swim code
> to fix this. Perhaps we can't do without it :(

Anyway, let's look at SWIM code/tests together or even summon Vlad for
the help as the major SWIM developer.

> 
> >> Therefore, I settled on the idea of introducing a recursion limit.
> > 
> > Nevertheless, I still propose one of the following:
> > * make the limit configurable via box interface
> > * introduce a "soft" limit to inform user when recursion occurs (e.g.
> >  using log with "WARN" facility) and a "hard" one to stop the instance
> > 
> > IMHO, the best is to implement both proposals. Thoughts?
> I started doing this and thought, why? Why would a user need to
> regulate this? In my opinion, the main goal of the patch is to
> avoid the "bus error".

E.g. if he definitely needs 257 iterations. This value is your estimate,
not the exact stack limit. So the end user might need to exceed it in
the user code, but has no way to tune this limit for his needs.

> 
> > Side note: __tostring Lua metamethod obligues user to yield a string,
> > but I see __serialize method doesn't have such restrictions. Otherwise,
> > the fix would be brief and clear.
> 
> Do you mean to add this check after serialization? Based on the
> serializers in swim, __serialize may return an accepted value,
> which may not be a string.

OK, your patch *perfectly* solves the "bus error" issue, but I believe
it's a good point to think about the following: do we need __serialize
metamethod going into recursion at all? What interface/contract does
this function have? I mean, if we prohibit "echo" __serialize functions,
then it's better to implement a check the similar way Mons originally
proposed in the issue (yeah, I know you've already tried, but let's see
whether we can do anything more here). Otherwise (i.e. if "echo"
__serialize functions are allowed), it's better to implement a
configurable limit than a hardcoded one, isn't it? I'm not quite sure
"256 slots is ought to be enough".

> 

Nevertheless, feel free to ignore everything above if you're sure. I'm
not fully in the issue context, so don't see everything around the
problem. Just want to make the changes more user-friendly.

> >> 
> > 
> > <snipped>
> > 
> >> 
> > 
> > -- 
> > Best regards,
> > IM

<snipped>

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-01 14:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-10 12:01 [Tarantool-patches] [PATCH] serilaizer: check for recursive serialization Roman Khabibov
2020-07-10 12:29 ` Cyrill Gorcunov
2020-07-14  9:45   ` Igor Munkin
2020-07-14 10:40     ` Cyrill Gorcunov
2020-09-14 14:43       ` Roman Khabibov
2020-09-14 16:06         ` Cyrill Gorcunov
2020-09-16  7:29         ` Igor Munkin
2020-09-30 21:49           ` Roman Khabibov
2020-10-01 14:40             ` Igor Munkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox