Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
@ 2020-07-21 22:32 Ilya Kosarev
  2020-07-22  8:59 ` Nikita Pettik
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-21 22:32 UTC (permalink / raw)
  To: imun; +Cc: tarantool-patches

According to user report (details provided in tarantool/tarantool#4649)
lua_gettop() returns negative value in encode_lua_call(). Such behavior
is not expected and violates Lua reference manual. Thus corresponding
sane check with assert is introduced.

Relates to tarantool/tarantool#4649
---
Branch: https://github.com/tarantool/luajit/tree/i.kosarev/gh-4649-assert-in-lua_gettop
Issue: https://github.com/tarantool/tarantool/issues/4649

Changes in v2:
- added reasoning in comment

Changes in v3:
- moved assert to LuaJIT itself

(gdb) bt
#0  0x00007f0012bbf387 in raise () from .../lib64/libc.so.6
#1  0x00007f0012bc0a78 in abort () from .../lib64/libc.so.6
#2  0x000000000041108e in sig_fatal_cb (signo=11, siginfo=0x7ef54107f8f0, context=0x7ef54107f7c0) at .../src/main.cc:258
#3  <signal handler called>
#4  0x00007f0012cf8691 in __strlen_sse2_pminub () from .../lib64/libc.so.6
#5  0x000000000061b18c in iproto_reply_error (out=0x7f00098692e8, e=0x0, sync=54211898, schema_version=495) at .../src/box/xrow.c:434
#6  0x00000000004168c2 in tx_reply_error (msg=0x7f0009816d80) at .../src/box/iproto.cc:1465
#7  0x0000000000416ec6 in tx_process_call (m=0x7f0009816d80) at .../src/box/iproto.cc:1649
#8  0x0000000000520e5a in cmsg_deliver (msg=0x7f0009816d80) at .../src/cbus.c:375
#9  0x0000000000521e34 in fiber_pool_f (ap=0x7ef5448c1728) at .../src/fiber_pool.c:64
#10 0x0000000000410a7a in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=0x521ca1 <fiber_pool_f>, ap=0x7ef5448c1728) at .../src/fiber.h:678
#11 0x000000000051af4e in fiber_loop (data=0x0) at .../src/fiber.c:713
#12 0x00000000006abae3 in coro_init () at .../third_party/coro/coro.c:110
(gdb) fr 7
#7  0x0000000000416ec6 in tx_process_call (m=0x7f0009816d80) at .../src/box/iproto.cc:1649
1649	in .../src/box/iproto.cc
(gdb) p rc
$29 = 0
(gdb) p count
$30 = -1
(gdb) p msg->header.type
$31 = 10
(gdb) p port->vtab->dump_msgpack
$32 = (int (*)(port *, obuf *)) 0x4e759f <port_lua_dump>

static void
tx_process_call(struct cmsg *m)
{
	struct iproto_msg *msg = tx_accept_msg(m);
	if (tx_check_schema(msg->header.schema_version))
		goto error;

	/*
	 * CALL/EVAL should copy its arguments so we can discard
	 * input on yield to avoid stalling other connections by
	 * a long polling request.
	 */
	struct trigger fiber_on_yield;
	trigger_create(&fiber_on_yield, tx_process_call_on_yield, msg, NULL);
	trigger_add(&fiber()->on_yield, &fiber_on_yield);

	int rc;
	struct port port;

	switch (msg->header.type) {
	case IPROTO_CALL:
	case IPROTO_CALL_16:
		rc = box_process_call(&msg->call, &port);
		break;
	case IPROTO_EVAL:
		rc = box_process_eval(&msg->call, &port);
		break;
	default:
		unreachable();
	}

	trigger_clear(&fiber_on_yield);

	if (rc != 0)
		goto error;

	/*
	 * Add all elements returned by the function to iproto.
	 *
	 * To allow clients to understand a complex return from
	 * a procedure, we are compatible with SELECT protocol,
	 * and return the number of return values first, and
	 * then each return value as a tuple.
	 *
	 * (!) Please note that a save point for output buffer
	 * must be taken only after finishing executing of Lua
	 * function because Lua can yield and leave the
	 * buffer in inconsistent state (a parallel request
	 * from the same connection will break the protocol).
	 */

	int count;
	struct obuf *out;
	struct obuf_svp svp;

	out = msg->connection->tx.p_obuf;
	if (iproto_prepare_select(out, &svp) != 0) {
		port_destroy(&port);
		goto error;
	}

	if (msg->header.type == IPROTO_CALL_16)
		count = port_dump_msgpack_16(&port, out);
	else
		count = port_dump_msgpack(&port, out);
	port_destroy(&port);
	if (count < 0) {
		obuf_rollback_to_svp(out, &svp);
		goto error;
	}

	iproto_reply_select(out, &svp, msg->header.sync,
			    ::schema_version, count);
	iproto_wpos_create(&msg->wpos, out);
	return;
error:
	tx_reply_error(msg);
}

As we can see, we fail to the error path through count == -1. It comes
from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().

static int
port_lua_dump(struct port *base, struct obuf *out)
{
	return port_lua_do_dump(base, out, encode_lua_call);
}

encode_lua_call() actually sets port->size equal to lua_gettop(), being
returned by port_lua_do_dump().

 src/lj_api.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/lj_api.c b/src/lj_api.c
index c9b5d22..5ca2c57 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
 
 LUA_API int lua_gettop(lua_State *L)
 {
-  return (int)(L->top - L->base);
+  int stack_size = (int)(L->top - L->base);
+  /*
+   * Stack size has to be non-negative according to Lua reference manual.
+   */
+  assert(stack_size >= 0);
+  return stack_size;
 }
 
 LUA_API void lua_settop(lua_State *L, int idx)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-21 22:32 [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size Ilya Kosarev
@ 2020-07-22  8:59 ` Nikita Pettik
  2020-07-22  9:46   ` Ilya Kosarev
  0 siblings, 1 reply; 8+ messages in thread
From: Nikita Pettik @ 2020-07-22  8:59 UTC (permalink / raw)
  To: Ilya Kosarev; +Cc: tarantool-patches

On 22 Jul 01:32, Ilya Kosarev wrote:
> (gdb) p port->vtab->dump_msgpack
> 		count = port_dump_msgpack_16(&port, out);
> 	else
> 		count = port_dump_msgpack(&port, out);
> 	port_destroy(&port);
> 	if (count < 0) {
> 		obuf_rollback_to_svp(out, &svp);
> 		goto error;
> 	}
> 
> 	iproto_reply_select(out, &svp, msg->header.sync,
> 			    ::schema_version, count);
> 	iproto_wpos_create(&msg->wpos, out);
> 	return;
> error:
> 	tx_reply_error(msg);
> }
> 
> As we can see, we fail to the error path through count == -1. It comes
> from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().
> 
> static int
> port_lua_dump(struct port *base, struct obuf *out)
> {
> 	return port_lua_do_dump(base, out, encode_lua_call);
> }
> 
> encode_lua_call() actually sets port->size equal to lua_gettop(), being
> returned by port_lua_do_dump().

As a first assumption it is OK. But there might be other reasons
for negative value in size var. For instance, out-of-bound memory
access in luamp_encode() which overwrites value of stack var..
 
>  src/lj_api.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/lj_api.c b/src/lj_api.c
> index c9b5d22..5ca2c57 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
>  
>  LUA_API int lua_gettop(lua_State *L)
>  {
> -  return (int)(L->top - L->base);
> +  int stack_size = (int)(L->top - L->base);
> +  /*
> +   * Stack size has to be non-negative according to Lua reference manual.
> +   */
> +  assert(stack_size >= 0);
> +  return stack_size;
>  }
>  
>  LUA_API void lua_settop(lua_State *L, int idx)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22  8:59 ` Nikita Pettik
@ 2020-07-22  9:46   ` Ilya Kosarev
  2020-07-22  9:46     ` Igor Munkin
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-22  9:46 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 2293 bytes --]


Hi!

Thanks to @igormunkin we discovered more details of this problem.
Turns out we can see lua_State stored in port and it seems that
something is totally wrong here: it points to some function address
(port_lua_dump_16()). Thus I will make some more research into it and
this patch should be discarded for now.
(gdb) p ((struct port_lua *)port)->vtab
$19 = (const port_vtab *) 0x4e759f <port_lua_dump>
(gdb) p ((struct port_lua *)port)->L
$20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>
  
>Среда, 22 июля 2020, 11:59 +03:00 от Nikita Pettik <korablev@tarantool.org>:
> 
>On 22 Jul 01:32, Ilya Kosarev wrote:
>> (gdb) p port->vtab->dump_msgpack
>> count = port_dump_msgpack_16(&port, out);
>> else
>> count = port_dump_msgpack(&port, out);
>> port_destroy(&port);
>> if (count < 0) {
>> obuf_rollback_to_svp(out, &svp);
>> goto error;
>> }
>>
>> iproto_reply_select(out, &svp, msg->header.sync,
>> ::schema_version, count);
>> iproto_wpos_create(&msg->wpos, out);
>> return;
>> error:
>> tx_reply_error(msg);
>> }
>>
>> As we can see, we fail to the error path through count == -1. It comes
>> from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().
>>
>> static int
>> port_lua_dump(struct port *base, struct obuf *out)
>> {
>> return port_lua_do_dump(base, out, encode_lua_call);
>> }
>>
>> encode_lua_call() actually sets port->size equal to lua_gettop(), being
>> returned by port_lua_do_dump().
>
>As a first assumption it is OK. But there might be other reasons
>for negative value in size var. For instance, out-of-bound memory
>access in luamp_encode() which overwrites value of stack var..
> 
>> src/lj_api.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/lj_api.c b/src/lj_api.c
>> index c9b5d22..5ca2c57 100644
>> --- a/src/lj_api.c
>> +++ b/src/lj_api.c
>> @@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
>>
>> LUA_API int lua_gettop(lua_State *L)
>> {
>> - return (int)(L->top - L->base);
>> + int stack_size = (int)(L->top - L->base);
>> + /*
>> + * Stack size has to be non-negative according to Lua reference manual.
>> + */
>> + assert(stack_size >= 0);
>> + return stack_size;
>> }
>>
>> LUA_API void lua_settop(lua_State *L, int idx)
>> --
>> 2.17.1
>> 
 
 
--
Ilya Kosarev
 

[-- Attachment #2: Type: text/html, Size: 3126 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22  9:46   ` Ilya Kosarev
@ 2020-07-22  9:46     ` Igor Munkin
  2020-07-22 10:25       ` Ilya Kosarev
  0 siblings, 1 reply; 8+ messages in thread
From: Igor Munkin @ 2020-07-22  9:46 UTC (permalink / raw)
  To: Ilya Kosarev; +Cc: tarantool-patches

Ilya,

On 22.07.20, Ilya Kosarev wrote:
> 
> Hi!
> 
> Thanks to @igormunkin we discovered more details of this problem.
> Turns out we can see lua_State stored in port and it seems that
> something is totally wrong here: it points to some function address
> (port_lua_dump_16()). Thus I will make some more research into it and
> this patch should be discarded for now.
> (gdb) p ((struct port_lua *)port)->vtab
> $19 = (const port_vtab *) 0x4e759f <port_lua_dump>
> (gdb) p ((struct port_lua *)port)->L
> $20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>

I've just noticed: port is a stucture, not a pointer, so the values are
totally irrelevant. I guess you need something like this:
| ((struct port_lua *)&port)->L

Could you please share the results for the right dereference?

>   

<snipped>

>  
> --
> Ilya Kosarev
>  

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22  9:46     ` Igor Munkin
@ 2020-07-22 10:25       ` Ilya Kosarev
  2020-07-22 10:47         ` Igor Munkin
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-22 10:25 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]


Right, error fixed and now we found more details. Now the best
assumption is that lua_State is somehow being broken by
user-called function (which name we now know).
Now we will look into user code. 
>Среда, 22 июля 2020, 12:56 +03:00 от Igor Munkin <imun@tarantool.org>:
> 
>Ilya,
>
>On 22.07.20, Ilya Kosarev wrote:
>>
>> Hi!
>>
>> Thanks to @igormunkin we discovered more details of this problem.
>> Turns out we can see lua_State stored in port and it seems that
>> something is totally wrong here: it points to some function address
>> (port_lua_dump_16()). Thus I will make some more research into it and
>> this patch should be discarded for now.
>> (gdb) p ((struct port_lua *)port)->vtab
>> $19 = (const port_vtab *) 0x4e759f <port_lua_dump>
>> (gdb) p ((struct port_lua *)port)->L
>> $20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>
>
>I've just noticed: port is a stucture, not a pointer, so the values are
>totally irrelevant. I guess you need something like this:
>| ((struct port_lua *)&port)->L
>
>Could you please share the results for the right dereference?
>
>>  
>
><snipped>
>
>>  
>> --
>> Ilya Kosarev
>>  
>
>--
>Best regards,
>IM 
 
 
--
Ilya Kosarev
 

[-- Attachment #2: Type: text/html, Size: 1839 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22 10:25       ` Ilya Kosarev
@ 2020-07-22 10:47         ` Igor Munkin
  2020-07-22 11:08           ` Nikita Pettik
  0 siblings, 1 reply; 8+ messages in thread
From: Igor Munkin @ 2020-07-22 10:47 UTC (permalink / raw)
  To: Ilya Kosarev; +Cc: tarantool-patches

I'll add more info for Nikita and Sergos since they are also involved
to the investigation and the patch review.

On 22.07.20, Ilya Kosarev wrote:
> 
> Right, error fixed and now we found more details.

Here are the details:
| (gdb) p ((struct port_lua *)&port)->L
| $1 = (lua_State *) 0x41b8fa08
| (gdb) p ((struct port_lua *)&port)
| $2 = (port_lua *) 0x7ef54107fe30
| (gdb) p ((struct port_lua *)&port)->vtab
| $3 = (const port_vtab *) 0x708d60 <port_lua_vtab>
| (gdb) p ((struct port_lua *)&port)->ref
| $4 = 181

port->ref value is rotten since port->L is removed from Lua registry
within <port_destroy> (i.e. <port_lua_destroy>) function. But the
pointer to port->L is fine.

| (gdb) p ((struct port_lua *)&port)->size
| $5 = -1

size value is initialized to -1 prior to encoding loop, so there is no
<luamp_encode> call.

| (gdb) p ((struct port_lua *)&port)->out
| $6 = (obuf *) 0x7f00098692e8
| (gdb) p ((struct port_lua *)&port)->L->top
| $7 = (TValue *) 0x4107fa08
| (gdb) p ((struct port_lua *)&port)->L->base
| $8 = (TValue *) 0x4107fa10

size value is initialized properly since L->base is greater than L->top
(guest stack addresses grow downwards). Oops...

> Now the best assumption is that lua_State is somehow being broken by
> user-called function (which name we now know).

The port (and ergo port->L coroutine) is created in scope of
<box_process_lua> call. Considering the message type (IPROTO_CALL)
<execute_lua_call> handler is called. Both Ilya and me found nothing
suspicious there: the function to be called is found by its name and
then execution enters Lua space.

Considering the results (<box_process_call> rc is 0) the call succeeds
and execution proceeds with reply packing. There is also nothing
corrupting port->L coroutine internal structure prior to the place
port->size is initialized to -1.

> Now we will look into user code.

For now I see no reason to add even the assert, since Ilya's assumptions
are confirmed.


<snipped>

>  
> --
> Ilya Kosarev
>  

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22 10:47         ` Igor Munkin
@ 2020-07-22 11:08           ` Nikita Pettik
  2020-07-22 12:05             ` Igor Munkin
  0 siblings, 1 reply; 8+ messages in thread
From: Nikita Pettik @ 2020-07-22 11:08 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

On 22 Jul 13:47, Igor Munkin wrote:
> I'll add more info for Nikita and Sergos since they are also involved
> to the investigation and the patch review.
> 
> On 22.07.20, Ilya Kosarev wrote:
> > 
> > Right, error fixed and now we found more details.
> 
> Here are the details:
> | (gdb) p ((struct port_lua *)&port)->L
> | $1 = (lua_State *) 0x41b8fa08
> | (gdb) p ((struct port_lua *)&port)
> | $2 = (port_lua *) 0x7ef54107fe30
> | (gdb) p ((struct port_lua *)&port)->vtab
> | $3 = (const port_vtab *) 0x708d60 <port_lua_vtab>
> | (gdb) p ((struct port_lua *)&port)->ref
> | $4 = 181
> 
> port->ref value is rotten since port->L is removed from Lua registry
> within <port_destroy> (i.e. <port_lua_destroy>) function. But the
> pointer to port->L is fine.
> 
> | (gdb) p ((struct port_lua *)&port)->size
> | $5 = -1
> 
> size value is initialized to -1 prior to encoding loop, so there is no
> <luamp_encode> call.
> 
> | (gdb) p ((struct port_lua *)&port)->out
> | $6 = (obuf *) 0x7f00098692e8
> | (gdb) p ((struct port_lua *)&port)->L->top
> | $7 = (TValue *) 0x4107fa08
> | (gdb) p ((struct port_lua *)&port)->L->base
> | $8 = (TValue *) 0x4107fa10
> 
> size value is initialized properly since L->base is greater than L->top
> (guest stack addresses grow downwards). Oops...
> 
> > Now the best assumption is that lua_State is somehow being broken by
> > user-called function (which name we now know).
> 
> The port (and ergo port->L coroutine) is created in scope of
> <box_process_lua> call. Considering the message type (IPROTO_CALL)
> <execute_lua_call> handler is called. Both Ilya and me found nothing
> suspicious there: the function to be called is found by its name and
> then execution enters Lua space.
> 
> Considering the results (<box_process_call> rc is 0) the call succeeds
> and execution proceeds with reply packing. There is also nothing
> corrupting port->L coroutine internal structure prior to the place
> port->size is initialized to -1.
> 
> > Now we will look into user code.
> 
> For now I see no reason to add even the assert, since Ilya's assumptions
> are confirmed.

Which one? Assuming that lua_gettop() returns negative value due
to spoiled lua state?
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
  2020-07-22 11:08           ` Nikita Pettik
@ 2020-07-22 12:05             ` Igor Munkin
  0 siblings, 0 replies; 8+ messages in thread
From: Igor Munkin @ 2020-07-22 12:05 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

Nikita,

On 22.07.20, Nikita Pettik wrote:

<snipped>

> > 
> > For now I see no reason to add even the assert, since Ilya's assumptions
> > are confirmed.
> 
> Which one? Assuming that lua_gettop() returns negative value due
> to spoiled lua state?

Yes, no need to check whether L->base is greater than L->top, it's
already known from the coredump. I doubt this change allows to localize
the problem much closer than now. Let's see customer's user code and
proceed with the investigation.

>  

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-07-22 12:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21 22:32 [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size Ilya Kosarev
2020-07-22  8:59 ` Nikita Pettik
2020-07-22  9:46   ` Ilya Kosarev
2020-07-22  9:46     ` Igor Munkin
2020-07-22 10:25       ` Ilya Kosarev
2020-07-22 10:47         ` Igor Munkin
2020-07-22 11:08           ` Nikita Pettik
2020-07-22 12:05             ` Igor Munkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox