Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads)
@ 2020-12-01 16:30 Sergey Nikiforov
  2020-12-03 21:35 ` Vladislav Shpilevoy
  0 siblings, 1 reply; 4+ messages in thread
From: Sergey Nikiforov @ 2020-12-01 16:30 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

It also caused data corruption.

Also:
Fixed read access beyond decode table (noticed along the way).
Minimized number of condition checks in internal loops (performance).

Fixes: #3069
---
 third_party/base64.c | 57 ++++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 18 deletions(-)

diff --git a/third_party/base64.c b/third_party/base64.c
index 8ecab23eb..ab644df22 100644
--- a/third_party/base64.c
+++ b/third_party/base64.c
@@ -222,7 +222,8 @@ base64_decode_value(int value)
 		32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
 		44, 45, 46, 47, 48, 49, 50, 51
 	};
-	static const int decoding_size = sizeof(decoding);
+	static const int decoding_size =
+		sizeof(decoding) / sizeof(decoding[0]);
 	int codepos = value;
 	codepos -= 43;
 	if (codepos < 0 || codepos >= decoding_size)
@@ -247,8 +248,9 @@ base64_decode_block(const char *in_base64, int in_len,
 	char *out_pos = out_bin;
 	char *out_end = out_bin + out_len;
 	int fragment;
+	char curr_byte;
 
-	*out_pos = state->result;
+	curr_byte = state->result;
 
 	switch (state->step)
 	{
@@ -256,52 +258,71 @@ base64_decode_block(const char *in_base64, int in_len,
 		{
 	case step_a:
 			do {
-				if (in_pos == in_end || out_pos >= out_end)
+				if (in_pos >= in_end)
 				{
 					state->step = step_a;
-					state->result = *out_pos;
+					state->result = curr_byte;
 					return out_pos - out_bin;
 				}
 				fragment = base64_decode_value(*in_pos++);
 			} while (fragment < 0);
-			*out_pos    = (fragment & 0x03f) << 2;
+			curr_byte = (fragment & 0x03f) << 2;
 	case step_b:
 			do {
-				if (in_pos == in_end || out_pos >= out_end)
+				if (in_pos >= in_end)
 				{
 					state->step = step_b;
-					state->result = *out_pos;
+					state->result = curr_byte;
 					return out_pos - out_bin;
 				}
 				fragment = base64_decode_value(*in_pos++);
 			} while (fragment < 0);
-			*out_pos++ |= (fragment & 0x030) >> 4;
-			if (out_pos < out_end)
-				*out_pos = (fragment & 0x00f) << 4;
+			curr_byte |= (fragment & 0x030) >> 4;
+			*out_pos = curr_byte;
+			curr_byte = (fragment & 0x00f) << 4;
+			if (++out_pos >= out_end)
+			{
+				state->step = step_c;
+				state->result = curr_byte;
+				return out_pos - out_bin;
+			}
 	case step_c:
 			do {
-				if (in_pos == in_end || out_pos >= out_end)
+				if (in_pos >= in_end)
 				{
 					state->step = step_c;
-					state->result = *out_pos;
+					state->result = curr_byte;
 					return out_pos - out_bin;
 				}
 				fragment = base64_decode_value(*in_pos++);
 			} while (fragment < 0);
-			*out_pos++ |= (fragment & 0x03c) >> 2;
-			if (out_pos < out_end)
-				*out_pos = (fragment & 0x003) << 6;
+			curr_byte |= (fragment & 0x03c) >> 2;
+			*out_pos = curr_byte;
+			curr_byte = (fragment & 0x003) << 6;
+			if (++out_pos >= out_end)
+			{
+				state->step = step_d;
+				state->result = curr_byte;
+				return out_pos - out_bin;
+			}
 	case step_d:
 			do {
-				if (in_pos == in_end || out_pos >= out_end)
+				if (in_pos >= in_end)
 				{
 					state->step = step_d;
-					state->result = *out_pos;
+					state->result = curr_byte;
 					return out_pos - out_bin;
 				}
 				fragment = base64_decode_value(*in_pos++);
 			} while (fragment < 0);
-			*out_pos++   |= (fragment & 0x03f);
+			curr_byte |= (fragment & 0x03f);
+			*out_pos = curr_byte;
+			if (++out_pos >= out_end)
+			{
+				state->step = step_a;
+				state->result = curr_byte;
+				return out_pos - out_bin;
+			}
 		}
 	}
 	/* control should not reach here */
-- 
2.25.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads)
  2020-12-01 16:30 [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads) Sergey Nikiforov
@ 2020-12-03 21:35 ` Vladislav Shpilevoy
  2020-12-15 14:20   ` Sergey Nikiforov
  0 siblings, 1 reply; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-03 21:35 UTC (permalink / raw)
  To: Sergey Nikiforov, tarantool-patches

Hi! Thanks for the patch!

I recommend you to read this document:
https://github.com/tarantool/tarantool/wiki/Code-review-procedure

See 5 comments below.

1. Please, use subsystem prefix in the commit title. In your case it
should be 'base64: ...'.

On 01.12.2020 17:30, Sergey Nikiforov wrote:
> It also caused data corruption.

2. What do you mean 'also'? What did it cause besides data corruption?

> Also:
> Fixed read access beyond decode table (noticed along the way).
> Minimized number of condition checks in internal loops (performance).

3. Please, never mix unrelated changed into one commit. That
complicates the review; makes it harder to cherry-pick things to
the older branches; ruins git history; and you can introduce a new
bug while doing 'refactoring'.

Btw, can you prove your optimizations actually do any notable
impact on the performance? Do you have numbers showing that it is
worth optimizing?

> Fixes: #3069
> ---

4. See
https://github.com/tarantool/tarantool/wiki/Code-review-procedure#sending-the-patch.

Please, put here issue and branch links. I don't know the branch name,
and I don't see it in `git branch -a | grep 3069`. So I can't check if
the patch works.

>  third_party/base64.c | 57 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/third_party/base64.c b/third_party/base64.c
> index 8ecab23eb..ab644df22 100644
> --- a/third_party/base64.c
> +++ b/third_party/base64.c
> @@ -222,7 +222,8 @@ base64_decode_value(int value)
>  		32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
>  		44, 45, 46, 47, 48, 49, 50, 51
>  	};
> -	static const int decoding_size = sizeof(decoding);
> +	static const int decoding_size =
> +		sizeof(decoding) / sizeof(decoding[0]);

5. Since this is a bugfix, it should be covered with a test when
possible. Here it looks like it is possible. If 'value' was
big enough, it would access some memory out of the array, and
would return garbage instead of an error. I believe it shouldn't
be hard to create a proper unit test on that.

Everything below is just refactoring, hopefully without new bugs.
But I recommend to remove it, since it is not related to the bug
anyhow, and hardly makes performance any notable better. Unless
you have numbers. If it really makes difference, please, extract
these optimizations into a new commit on a different branch so as
we could handle it out of the 3069 bug context.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads)
  2020-12-03 21:35 ` Vladislav Shpilevoy
@ 2020-12-15 14:20   ` Sergey Nikiforov
  2020-12-16 23:22     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 4+ messages in thread
From: Sergey Nikiforov @ 2020-12-15 14:20 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches

Hi!

I would submit new patches in separate e-mail threads. Please see 
comments below.

On 04.12.2020 0:35, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> I recommend you to read this document:
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure
> 
> See 5 comments below.
> 
> 1. Please, use subsystem prefix in the commit title. In your case it
> should be 'base64: ...'.

ok. I have used 
http://www.tarantool.io/en/doc/latest/dev_guide/developer_guidelines/ 
which does not say that "subsystem" can be anything (there is a list).

> On 01.12.2020 17:30, Sergey Nikiforov wrote:
>> It also caused data corruption.
> 
> 2. What do you mean 'also'? What did it cause besides data corruption?

ASAN faults (see #3069). I have made commit description more clear.

>> Also:
>> Fixed read access beyond decode table (noticed along the way).
>> Minimized number of condition checks in internal loops (performance).
> 
> 3. Please, never mix unrelated changed into one commit. That
> complicates the review; makes it harder to cherry-pick things to
> the older branches; ruins git history; and you can introduce a new
> bug while doing 'refactoring'.

While I agree with you in general, in this particular case I had to 
create "intermediate" version containing only this specific fix w/o 
optimization (there was no such thing before - I was fixing and cleaning 
up logic in single pass) and test it.

> Btw, can you prove your optimizations actually do any notable
> impact on the performance? Do you have numbers showing that it is
> worth optimizing?

release, old:
It took 6369332219 ns to decode 7087960000 bytes, speed is 1112826236 bps

release, optimized base64 decoder:
It took 5550868992 ns to decode 7087960000 bytes, speed is 1276909977 bps

~1.15 times faster (Intel Core I7-9700K, single thread)

Where can I commit performance testing code?

>> Fixes: #3069
>> ---
> 
> 4. See
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#sending-the-patch.
> 
> Please, put here issue and branch links. I don't know the branch name,
> and I don't see it in `git branch -a | grep 3069`. So I can't check if
> the patch works.

ok

>>   third_party/base64.c | 57 ++++++++++++++++++++++++++++++--------------
>>   1 file changed, 39 insertions(+), 18 deletions(-)
>>
>> diff --git a/third_party/base64.c b/third_party/base64.c
>> index 8ecab23eb..ab644df22 100644
>> --- a/third_party/base64.c
>> +++ b/third_party/base64.c
>> @@ -222,7 +222,8 @@ base64_decode_value(int value)
>>   		32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
>>   		44, 45, 46, 47, 48, 49, 50, 51
>>   	};
>> -	static const int decoding_size = sizeof(decoding);
>> +	static const int decoding_size =
>> +		sizeof(decoding) / sizeof(decoding[0]);
> 
> 5. Since this is a bugfix, it should be covered with a test when
> possible. Here it looks like it is possible. If 'value' was
> big enough, it would access some memory out of the array, and
> would return garbage instead of an error. I believe it shouldn't
> be hard to create a proper unit test on that.

Done. I have created #5627 for this bug and fix + test would go into 
separate e-mail thread.

> Everything below is just refactoring, hopefully without new bugs.
> But I recommend to remove it, since it is not related to the bug
> anyhow, and hardly makes performance any notable better. Unless
> you have numbers. If it really makes difference, please, extract
> these optimizations into a new commit on a different branch so as
> we could handle it out of the 3069 bug context.

It would be second patch in #3069 series because of the dependency on 
the fix. Or we could merge it later after fix for #3069 is merged.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads)
  2020-12-15 14:20   ` Sergey Nikiforov
@ 2020-12-16 23:22     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-16 23:22 UTC (permalink / raw)
  To: Sergey Nikiforov, tarantool-patches

> On 04.12.2020 0:35, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> I recommend you to read this document:
>> https://github.com/tarantool/tarantool/wiki/Code-review-procedure
>>
>> See 5 comments below.
>>
>> 1. Please, use subsystem prefix in the commit title. In your case it
>> should be 'base64: ...'.
> 
> ok. I have used http://www.tarantool.io/en/doc/latest/dev_guide/developer_guidelines/ which does not say that "subsystem" can be anything (there is a list).

It also does not say, that the subsystem shall be one of the
items in the list. Usually we try to use one of already used
in the commit titles. Sometimes invent a new one.

Although I would like to have such a strict list. Would make
things a bit simpler.

>> On 01.12.2020 17:30, Sergey Nikiforov wrote:
>>> It also caused data corruption.
>>
>> 2. What do you mean 'also'? What did it cause besides data corruption?
> 
> ASAN faults (see #3069). I have made commit description more clear.

Now with the separate commits I understood everything, thanks.

>>> Also:
>>> Fixed read access beyond decode table (noticed along the way).
>>> Minimized number of condition checks in internal loops (performance).
>>
>> 3. Please, never mix unrelated changed into one commit. That
>> complicates the review; makes it harder to cherry-pick things to
>> the older branches; ruins git history; and you can introduce a new
>> bug while doing 'refactoring'.
> 
> While I agree with you in general, in this particular case I had to create "intermediate" version containing only this specific fix w/o optimization (there was no such thing before - I was fixing and cleaning up logic in single pass) and test it.

And this is a good thing. I understood the problem much easier. Other
people also will. And I found a bug in the optimization patch, which
I probably wouldn't notice if it would be one big commit.

>> Btw, can you prove your optimizations actually do any notable
>> impact on the performance? Do you have numbers showing that it is
>> worth optimizing?
> 
> release, old:
> It took 6369332219 ns to decode 7087960000 bytes, speed is 1112826236 bps
> 
> release, optimized base64 decoder:
> It took 5550868992 ns to decode 7087960000 bytes, speed is 1276909977 bps
> 
> ~1.15 times faster (Intel Core I7-9700K, single thread)

Looks nice, worth doing.

> Where can I commit performance testing code?

About this I don't know anything. You can ask Alexander L. if you want
your bench committed. I know that we have our microbenches somewhere,
and Alexander took a lead here recently.

>> Everything below is just refactoring, hopefully without new bugs.
>> But I recommend to remove it, since it is not related to the bug
>> anyhow, and hardly makes performance any notable better. Unless
>> you have numbers. If it really makes difference, please, extract
>> these optimizations into a new commit on a different branch so as
>> we could handle it out of the 3069 bug context.
> 
> It would be second patch in #3069 series because of the dependency on the fix. Or we could merge it later after fix for #3069 is merged.

Yes, exactly. It is a separate patch and this is great. It is ok
if it is done on top of another patch.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-12-16 23:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-01 16:30 [Tarantool-patches] [PATCH] Fix base64 decoder output buffer overrun (reads) Sergey Nikiforov
2020-12-03 21:35 ` Vladislav Shpilevoy
2020-12-15 14:20   ` Sergey Nikiforov
2020-12-16 23:22     ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox