[Tarantool-patches] [PATCH] raft: more precise verification of incoming request state

Sat Jun 26 16:34:56 MSK 2021

On Fri, Jun 25, 2021 at 11:49:10PM +0200, Vladislav Shpilevoy wrote:
> > Closes #6067
> 
> Unfortunately, the patch does not fix much. Firstly,
> the state is decoded from MessagePack as mp_decode_uint().
> It can't be negative originally by design.

No, Vlad, this is not how signed\unsigned numbers are processed.
The decoder doesn't care about two complements at all, we just
force it to read 8 bytes as unsigned long integer and that's all.
And it is up to _us_ as programmers how to treat this 8 bytes
next: either we continue using it as an unsigned integer and
generated instructions won't be the ones which consiged "sign flag",
or we treat it as signed integer in the code so the sign flag
will be either "used" in instructions results or extended up
to register size. IOW, if we decode a number as unsigned integer
it doesn't mean it can't be used as a signed one later.

> Secondly, and most importantly, the state is decoded as
> uint64_t raft_request.state, but it is saved without checks
> into enum struct raft_msg.state in box_raft_request_to_msg().

I suppose this is done because box_raft_request_to_msg a kind of
general wrapper which doesn't care much about contents validation
but simply represent the data in a new structure? When stage
of processing starts (ie raft_process_msg helper) we validate
each member, so I think this is a good architecture to investigate
contents in raft_process_msg, but if you prefer to use
box_raft_request_to_msg instead I would make it so.

> So the current patch does not change almost anything I
> suppose? Because if I send ((UINT32_MAX << 31) | 0x01), it
> will work. Because the upper 32 bits would be truncated
> (assuming the enum is 32 bits), and your checks are done too
> late to notice it.
> 
> Correct? If so, then I would propose to fix the issue entirely,
> if you want to work on that.

I think I didnt put more technical details in commit message,
let me try to explain in a more verbose way: the enum is stored
as integer type but standart doesn't require it to be signed
or unsigned, so it is implementation defined. Currently on my
gcc-11 series the if statement consider @state member as
_unsigned_ integer thus basically my patch does absolutely
nothing, the code remains the same on asm level. Still a compiler
has all rights to use a signed "state" and in this case our
`if` won't hit leading to undefined behaviour in case if the
"state" is in form (0x80000000 | x) and what is worse we won't
even able to figure out what has happened.

We can emulate this behaviour by using

raft_process_msg
	...
	if ((int)req->state < 0) {
		panic("RAFT: req->state > 0 -> %d", (int)req->state >= (int)raft_state_MAX);
		assert(req->state > 0);
	}

in result if we pass -1u as state in the test we will have
[001] +2021-06-26 15:52:58.610 [57492] main/101/main F> RAFT: req->state > 0 -> 0

thus if compiler use signed int type as internal representation then
(int)req->state >= (int)raft_state_MAX won't take the branch because
it is a negative value.

Surely the patch won't catch integer overflows since we trim the
data as you noted above because, gimme some time to think about,
I really like how our box_raft_request_to_msg helpers are done
and if it possible would love to not change them much: they are
simple and short now.

> > diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
> > index eacdddb7e..409e983f0 100644
> > --- a/src/lib/raft/raft.c
> > +++ b/src/lib/raft/raft.c
> > @@ -309,7 +309,9 @@ raft_process_msg(struct raft *raft, const struct raft_msg *req, uint32_t source)
> >  	say_info("RAFT: message %s from %u", raft_msg_to_string(req), source);
> >  	assert(source > 0);
> >  	assert(source != raft->self);
> > -	if (req->term == 0 || req->state == 0 || req->state >= raft_state_MAX) {
> > +
> > +	if (req->term == 0 || req->state == 0 ||
> > +	    (unsigned)req->state >= raft_state_MAX) {
> 
> Please, perform a normal < 0 comparison. No need for unsigned cast
> cast.

We need an explicit cast here because as I said the internal
implementation is compiler defined.

On the other hands I played with a number of gcc and llvm compilers
and for this code they all does correct comparision (as unsigned
integers) so maybe we could just leave it as is, simply close
the bug as won't implement and spend time on more important things?
I don't mind actually, so up to you just say me what you think.