* [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
@ 2020-07-21 22:32 Ilya Kosarev
2020-07-22 8:59 ` Nikita Pettik
0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-21 22:32 UTC (permalink / raw)
To: imun; +Cc: tarantool-patches
According to user report (details provided in tarantool/tarantool#4649)
lua_gettop() returns negative value in encode_lua_call(). Such behavior
is not expected and violates Lua reference manual. Thus corresponding
sane check with assert is introduced.
Relates to tarantool/tarantool#4649
---
Branch: https://github.com/tarantool/luajit/tree/i.kosarev/gh-4649-assert-in-lua_gettop
Issue: https://github.com/tarantool/tarantool/issues/4649
Changes in v2:
- added reasoning in comment
Changes in v3:
- moved assert to LuaJIT itself
(gdb) bt
#0 0x00007f0012bbf387 in raise () from .../lib64/libc.so.6
#1 0x00007f0012bc0a78 in abort () from .../lib64/libc.so.6
#2 0x000000000041108e in sig_fatal_cb (signo=11, siginfo=0x7ef54107f8f0, context=0x7ef54107f7c0) at .../src/main.cc:258
#3 <signal handler called>
#4 0x00007f0012cf8691 in __strlen_sse2_pminub () from .../lib64/libc.so.6
#5 0x000000000061b18c in iproto_reply_error (out=0x7f00098692e8, e=0x0, sync=54211898, schema_version=495) at .../src/box/xrow.c:434
#6 0x00000000004168c2 in tx_reply_error (msg=0x7f0009816d80) at .../src/box/iproto.cc:1465
#7 0x0000000000416ec6 in tx_process_call (m=0x7f0009816d80) at .../src/box/iproto.cc:1649
#8 0x0000000000520e5a in cmsg_deliver (msg=0x7f0009816d80) at .../src/cbus.c:375
#9 0x0000000000521e34 in fiber_pool_f (ap=0x7ef5448c1728) at .../src/fiber_pool.c:64
#10 0x0000000000410a7a in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=0x521ca1 <fiber_pool_f>, ap=0x7ef5448c1728) at .../src/fiber.h:678
#11 0x000000000051af4e in fiber_loop (data=0x0) at .../src/fiber.c:713
#12 0x00000000006abae3 in coro_init () at .../third_party/coro/coro.c:110
(gdb) fr 7
#7 0x0000000000416ec6 in tx_process_call (m=0x7f0009816d80) at .../src/box/iproto.cc:1649
1649 in .../src/box/iproto.cc
(gdb) p rc
$29 = 0
(gdb) p count
$30 = -1
(gdb) p msg->header.type
$31 = 10
(gdb) p port->vtab->dump_msgpack
$32 = (int (*)(port *, obuf *)) 0x4e759f <port_lua_dump>
static void
tx_process_call(struct cmsg *m)
{
struct iproto_msg *msg = tx_accept_msg(m);
if (tx_check_schema(msg->header.schema_version))
goto error;
/*
* CALL/EVAL should copy its arguments so we can discard
* input on yield to avoid stalling other connections by
* a long polling request.
*/
struct trigger fiber_on_yield;
trigger_create(&fiber_on_yield, tx_process_call_on_yield, msg, NULL);
trigger_add(&fiber()->on_yield, &fiber_on_yield);
int rc;
struct port port;
switch (msg->header.type) {
case IPROTO_CALL:
case IPROTO_CALL_16:
rc = box_process_call(&msg->call, &port);
break;
case IPROTO_EVAL:
rc = box_process_eval(&msg->call, &port);
break;
default:
unreachable();
}
trigger_clear(&fiber_on_yield);
if (rc != 0)
goto error;
/*
* Add all elements returned by the function to iproto.
*
* To allow clients to understand a complex return from
* a procedure, we are compatible with SELECT protocol,
* and return the number of return values first, and
* then each return value as a tuple.
*
* (!) Please note that a save point for output buffer
* must be taken only after finishing executing of Lua
* function because Lua can yield and leave the
* buffer in inconsistent state (a parallel request
* from the same connection will break the protocol).
*/
int count;
struct obuf *out;
struct obuf_svp svp;
out = msg->connection->tx.p_obuf;
if (iproto_prepare_select(out, &svp) != 0) {
port_destroy(&port);
goto error;
}
if (msg->header.type == IPROTO_CALL_16)
count = port_dump_msgpack_16(&port, out);
else
count = port_dump_msgpack(&port, out);
port_destroy(&port);
if (count < 0) {
obuf_rollback_to_svp(out, &svp);
goto error;
}
iproto_reply_select(out, &svp, msg->header.sync,
::schema_version, count);
iproto_wpos_create(&msg->wpos, out);
return;
error:
tx_reply_error(msg);
}
As we can see, we fail to the error path through count == -1. It comes
from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().
static int
port_lua_dump(struct port *base, struct obuf *out)
{
return port_lua_do_dump(base, out, encode_lua_call);
}
encode_lua_call() actually sets port->size equal to lua_gettop(), being
returned by port_lua_do_dump().
src/lj_api.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/lj_api.c b/src/lj_api.c
index c9b5d22..5ca2c57 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
LUA_API int lua_gettop(lua_State *L)
{
- return (int)(L->top - L->base);
+ int stack_size = (int)(L->top - L->base);
+ /*
+ * Stack size has to be non-negative according to Lua reference manual.
+ */
+ assert(stack_size >= 0);
+ return stack_size;
}
LUA_API void lua_settop(lua_State *L, int idx)
--
2.17.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-21 22:32 [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size Ilya Kosarev
@ 2020-07-22 8:59 ` Nikita Pettik
2020-07-22 9:46 ` Ilya Kosarev
0 siblings, 1 reply; 8+ messages in thread
From: Nikita Pettik @ 2020-07-22 8:59 UTC (permalink / raw)
To: Ilya Kosarev; +Cc: tarantool-patches
On 22 Jul 01:32, Ilya Kosarev wrote:
> (gdb) p port->vtab->dump_msgpack
> count = port_dump_msgpack_16(&port, out);
> else
> count = port_dump_msgpack(&port, out);
> port_destroy(&port);
> if (count < 0) {
> obuf_rollback_to_svp(out, &svp);
> goto error;
> }
>
> iproto_reply_select(out, &svp, msg->header.sync,
> ::schema_version, count);
> iproto_wpos_create(&msg->wpos, out);
> return;
> error:
> tx_reply_error(msg);
> }
>
> As we can see, we fail to the error path through count == -1. It comes
> from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().
>
> static int
> port_lua_dump(struct port *base, struct obuf *out)
> {
> return port_lua_do_dump(base, out, encode_lua_call);
> }
>
> encode_lua_call() actually sets port->size equal to lua_gettop(), being
> returned by port_lua_do_dump().
As a first assumption it is OK. But there might be other reasons
for negative value in size var. For instance, out-of-bound memory
access in luamp_encode() which overwrites value of stack var..
> src/lj_api.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/src/lj_api.c b/src/lj_api.c
> index c9b5d22..5ca2c57 100644
> --- a/src/lj_api.c
> +++ b/src/lj_api.c
> @@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
>
> LUA_API int lua_gettop(lua_State *L)
> {
> - return (int)(L->top - L->base);
> + int stack_size = (int)(L->top - L->base);
> + /*
> + * Stack size has to be non-negative according to Lua reference manual.
> + */
> + assert(stack_size >= 0);
> + return stack_size;
> }
>
> LUA_API void lua_settop(lua_State *L, int idx)
> --
> 2.17.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 8:59 ` Nikita Pettik
@ 2020-07-22 9:46 ` Ilya Kosarev
2020-07-22 9:46 ` Igor Munkin
0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-22 9:46 UTC (permalink / raw)
To: Nikita Pettik; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 2293 bytes --]
Hi!
Thanks to @igormunkin we discovered more details of this problem.
Turns out we can see lua_State stored in port and it seems that
something is totally wrong here: it points to some function address
(port_lua_dump_16()). Thus I will make some more research into it and
this patch should be discarded for now.
(gdb) p ((struct port_lua *)port)->vtab
$19 = (const port_vtab *) 0x4e759f <port_lua_dump>
(gdb) p ((struct port_lua *)port)->L
$20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>
>Среда, 22 июля 2020, 11:59 +03:00 от Nikita Pettik <korablev@tarantool.org>:
>
>On 22 Jul 01:32, Ilya Kosarev wrote:
>> (gdb) p port->vtab->dump_msgpack
>> count = port_dump_msgpack_16(&port, out);
>> else
>> count = port_dump_msgpack(&port, out);
>> port_destroy(&port);
>> if (count < 0) {
>> obuf_rollback_to_svp(out, &svp);
>> goto error;
>> }
>>
>> iproto_reply_select(out, &svp, msg->header.sync,
>> ::schema_version, count);
>> iproto_wpos_create(&msg->wpos, out);
>> return;
>> error:
>> tx_reply_error(msg);
>> }
>>
>> As we can see, we fail to the error path through count == -1. It comes
>> from port_lua_dump() calling port_lua_do_dump() with encode_lua_call().
>>
>> static int
>> port_lua_dump(struct port *base, struct obuf *out)
>> {
>> return port_lua_do_dump(base, out, encode_lua_call);
>> }
>>
>> encode_lua_call() actually sets port->size equal to lua_gettop(), being
>> returned by port_lua_do_dump().
>
>As a first assumption it is OK. But there might be other reasons
>for negative value in size var. For instance, out-of-bound memory
>access in luamp_encode() which overwrites value of stack var..
>
>> src/lj_api.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/lj_api.c b/src/lj_api.c
>> index c9b5d22..5ca2c57 100644
>> --- a/src/lj_api.c
>> +++ b/src/lj_api.c
>> @@ -135,7 +135,12 @@ LUA_API const lua_Number *lua_version(lua_State *L)
>>
>> LUA_API int lua_gettop(lua_State *L)
>> {
>> - return (int)(L->top - L->base);
>> + int stack_size = (int)(L->top - L->base);
>> + /*
>> + * Stack size has to be non-negative according to Lua reference manual.
>> + */
>> + assert(stack_size >= 0);
>> + return stack_size;
>> }
>>
>> LUA_API void lua_settop(lua_State *L, int idx)
>> --
>> 2.17.1
>>
--
Ilya Kosarev
[-- Attachment #2: Type: text/html, Size: 3126 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 9:46 ` Ilya Kosarev
@ 2020-07-22 9:46 ` Igor Munkin
2020-07-22 10:25 ` Ilya Kosarev
0 siblings, 1 reply; 8+ messages in thread
From: Igor Munkin @ 2020-07-22 9:46 UTC (permalink / raw)
To: Ilya Kosarev; +Cc: tarantool-patches
Ilya,
On 22.07.20, Ilya Kosarev wrote:
>
> Hi!
>
> Thanks to @igormunkin we discovered more details of this problem.
> Turns out we can see lua_State stored in port and it seems that
> something is totally wrong here: it points to some function address
> (port_lua_dump_16()). Thus I will make some more research into it and
> this patch should be discarded for now.
> (gdb) p ((struct port_lua *)port)->vtab
> $19 = (const port_vtab *) 0x4e759f <port_lua_dump>
> (gdb) p ((struct port_lua *)port)->L
> $20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>
I've just noticed: port is a stucture, not a pointer, so the values are
totally irrelevant. I guess you need something like this:
| ((struct port_lua *)&port)->L
Could you please share the results for the right dereference?
>
<snipped>
>
> --
> Ilya Kosarev
>
--
Best regards,
IM
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 9:46 ` Igor Munkin
@ 2020-07-22 10:25 ` Ilya Kosarev
2020-07-22 10:47 ` Igor Munkin
0 siblings, 1 reply; 8+ messages in thread
From: Ilya Kosarev @ 2020-07-22 10:25 UTC (permalink / raw)
To: Igor Munkin; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]
Right, error fixed and now we found more details. Now the best
assumption is that lua_State is somehow being broken by
user-called function (which name we now know).
Now we will look into user code.
>Среда, 22 июля 2020, 12:56 +03:00 от Igor Munkin <imun@tarantool.org>:
>
>Ilya,
>
>On 22.07.20, Ilya Kosarev wrote:
>>
>> Hi!
>>
>> Thanks to @igormunkin we discovered more details of this problem.
>> Turns out we can see lua_State stored in port and it seems that
>> something is totally wrong here: it points to some function address
>> (port_lua_dump_16()). Thus I will make some more research into it and
>> this patch should be discarded for now.
>> (gdb) p ((struct port_lua *)port)->vtab
>> $19 = (const port_vtab *) 0x4e759f <port_lua_dump>
>> (gdb) p ((struct port_lua *)port)->L
>> $20 = (lua_State *) 0x4e75c9 <port_lua_dump_16>
>
>I've just noticed: port is a stucture, not a pointer, so the values are
>totally irrelevant. I guess you need something like this:
>| ((struct port_lua *)&port)->L
>
>Could you please share the results for the right dereference?
>
>>
>
><snipped>
>
>>
>> --
>> Ilya Kosarev
>>
>
>--
>Best regards,
>IM
--
Ilya Kosarev
[-- Attachment #2: Type: text/html, Size: 1839 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 10:25 ` Ilya Kosarev
@ 2020-07-22 10:47 ` Igor Munkin
2020-07-22 11:08 ` Nikita Pettik
0 siblings, 1 reply; 8+ messages in thread
From: Igor Munkin @ 2020-07-22 10:47 UTC (permalink / raw)
To: Ilya Kosarev; +Cc: tarantool-patches
I'll add more info for Nikita and Sergos since they are also involved
to the investigation and the patch review.
On 22.07.20, Ilya Kosarev wrote:
>
> Right, error fixed and now we found more details.
Here are the details:
| (gdb) p ((struct port_lua *)&port)->L
| $1 = (lua_State *) 0x41b8fa08
| (gdb) p ((struct port_lua *)&port)
| $2 = (port_lua *) 0x7ef54107fe30
| (gdb) p ((struct port_lua *)&port)->vtab
| $3 = (const port_vtab *) 0x708d60 <port_lua_vtab>
| (gdb) p ((struct port_lua *)&port)->ref
| $4 = 181
port->ref value is rotten since port->L is removed from Lua registry
within <port_destroy> (i.e. <port_lua_destroy>) function. But the
pointer to port->L is fine.
| (gdb) p ((struct port_lua *)&port)->size
| $5 = -1
size value is initialized to -1 prior to encoding loop, so there is no
<luamp_encode> call.
| (gdb) p ((struct port_lua *)&port)->out
| $6 = (obuf *) 0x7f00098692e8
| (gdb) p ((struct port_lua *)&port)->L->top
| $7 = (TValue *) 0x4107fa08
| (gdb) p ((struct port_lua *)&port)->L->base
| $8 = (TValue *) 0x4107fa10
size value is initialized properly since L->base is greater than L->top
(guest stack addresses grow downwards). Oops...
> Now the best assumption is that lua_State is somehow being broken by
> user-called function (which name we now know).
The port (and ergo port->L coroutine) is created in scope of
<box_process_lua> call. Considering the message type (IPROTO_CALL)
<execute_lua_call> handler is called. Both Ilya and me found nothing
suspicious there: the function to be called is found by its name and
then execution enters Lua space.
Considering the results (<box_process_call> rc is 0) the call succeeds
and execution proceeds with reply packing. There is also nothing
corrupting port->L coroutine internal structure prior to the place
port->size is initialized to -1.
> Now we will look into user code.
For now I see no reason to add even the assert, since Ilya's assumptions
are confirmed.
<snipped>
>
> --
> Ilya Kosarev
>
--
Best regards,
IM
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 10:47 ` Igor Munkin
@ 2020-07-22 11:08 ` Nikita Pettik
2020-07-22 12:05 ` Igor Munkin
0 siblings, 1 reply; 8+ messages in thread
From: Nikita Pettik @ 2020-07-22 11:08 UTC (permalink / raw)
To: Igor Munkin; +Cc: tarantool-patches
On 22 Jul 13:47, Igor Munkin wrote:
> I'll add more info for Nikita and Sergos since they are also involved
> to the investigation and the patch review.
>
> On 22.07.20, Ilya Kosarev wrote:
> >
> > Right, error fixed and now we found more details.
>
> Here are the details:
> | (gdb) p ((struct port_lua *)&port)->L
> | $1 = (lua_State *) 0x41b8fa08
> | (gdb) p ((struct port_lua *)&port)
> | $2 = (port_lua *) 0x7ef54107fe30
> | (gdb) p ((struct port_lua *)&port)->vtab
> | $3 = (const port_vtab *) 0x708d60 <port_lua_vtab>
> | (gdb) p ((struct port_lua *)&port)->ref
> | $4 = 181
>
> port->ref value is rotten since port->L is removed from Lua registry
> within <port_destroy> (i.e. <port_lua_destroy>) function. But the
> pointer to port->L is fine.
>
> | (gdb) p ((struct port_lua *)&port)->size
> | $5 = -1
>
> size value is initialized to -1 prior to encoding loop, so there is no
> <luamp_encode> call.
>
> | (gdb) p ((struct port_lua *)&port)->out
> | $6 = (obuf *) 0x7f00098692e8
> | (gdb) p ((struct port_lua *)&port)->L->top
> | $7 = (TValue *) 0x4107fa08
> | (gdb) p ((struct port_lua *)&port)->L->base
> | $8 = (TValue *) 0x4107fa10
>
> size value is initialized properly since L->base is greater than L->top
> (guest stack addresses grow downwards). Oops...
>
> > Now the best assumption is that lua_State is somehow being broken by
> > user-called function (which name we now know).
>
> The port (and ergo port->L coroutine) is created in scope of
> <box_process_lua> call. Considering the message type (IPROTO_CALL)
> <execute_lua_call> handler is called. Both Ilya and me found nothing
> suspicious there: the function to be called is found by its name and
> then execution enters Lua space.
>
> Considering the results (<box_process_call> rc is 0) the call succeeds
> and execution proceeds with reply packing. There is also nothing
> corrupting port->L coroutine internal structure prior to the place
> port->size is initialized to -1.
>
> > Now we will look into user code.
>
> For now I see no reason to add even the assert, since Ilya's assumptions
> are confirmed.
Which one? Assuming that lua_gettop() returns negative value due
to spoiled lua state?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size
2020-07-22 11:08 ` Nikita Pettik
@ 2020-07-22 12:05 ` Igor Munkin
0 siblings, 0 replies; 8+ messages in thread
From: Igor Munkin @ 2020-07-22 12:05 UTC (permalink / raw)
To: Nikita Pettik; +Cc: tarantool-patches
Nikita,
On 22.07.20, Nikita Pettik wrote:
<snipped>
> >
> > For now I see no reason to add even the assert, since Ilya's assumptions
> > are confirmed.
>
> Which one? Assuming that lua_gettop() returns negative value due
> to spoiled lua state?
Yes, no need to check whether L->base is greater than L->top, it's
already known from the coredump. I doubt this change allows to localize
the problem much closer than now. Let's see customer's user code and
proceed with the investigation.
>
--
Best regards,
IM
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-07-22 12:15 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21 22:32 [Tarantool-patches] [PATCH v3] lua: assert in lua_gettop() in case of negative stack size Ilya Kosarev
2020-07-22 8:59 ` Nikita Pettik
2020-07-22 9:46 ` Ilya Kosarev
2020-07-22 9:46 ` Igor Munkin
2020-07-22 10:25 ` Ilya Kosarev
2020-07-22 10:47 ` Igor Munkin
2020-07-22 11:08 ` Nikita Pettik
2020-07-22 12:05 ` Igor Munkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox