[Tarantool-patches] [PATCH v2 2/2] base64: Improve decoder performance
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Tue Dec 22 18:05:31 MSK 2020
>>>> This test now crashes:
>>>>
>>>> ====================
>>>> --- a/test/unit/base64.c
>>>> +++ b/test/unit/base64.c
>>>> @@ -7,7 +7,7 @@ static void
>>>> base64_test(const char *str, int options, const char *no_symbols,
>>>> int no_symbols_len)
>>>> {
>>>> - plan(3 + no_symbols_len);
>>>> + plan(4 + no_symbols_len);
>>>> int len = strlen(str);
>>>> int base64_buflen = base64_bufsize(len + 1, options);
>>>> @@ -34,6 +34,11 @@ base64_test(const char *str, int options, const char *no_symbols,
>>>> free(base64_buf);
>>>> free(strbuf);
>>>> + const char *in = "sIIpHw==";
>>>> + int in_len = strlen(in);
>>>> + rc = base64_decode(in, in_len, NULL, 0);
>>>> + is(rc, 0, "no space in out buffer");
>>>> +
>>>> check_plan();
>>>> }
>>>> ====================
>>>>
>>>> It didn't crash while the checks were in place.
>>>
>>> I knew about this "problem" even before first patch version. Glad someone noticed. More on that below.
>>>
>>>> I would suggest to
>>>> add this test to the previous commit. Because technically it is
>>>> also 'buffer overrun'. And it will crash 3069 bug even without ASAN.
>>>
>>> I do not think this is a good idea.
>>>
>>> Both old code and your proposed change (below) behave like this: if output buffer size if 0, stop everything and return zero. No input chars are decoded even when this is possible (no "stored" bits in state - aka "step_a" - and only one valid input char with 6 new bits). My "optimized" code already handles such situation properly. And this is "free" (no extra if's).
>>>
>>> It is of course possible to process "zero output buffer size" situation properly but it would require additional "if" in base64_decode_block() beginning and some logic - and in every case except "step_a with no more than one useful input char" would cause some input characters to be lost anyway. Existing API for base64_decode() completely ignores situation when output buffer is too small (it is easy to calculate worst-case output buffer size) - not-yet-processed input data is ignored in such case. No reasonable programmer would use "0" as output buffer size. Such requirement is even DOCUMENTED in base64.h.
>>>
>>> What would you say?
>>
>> I don't mind leaving it as is. But then I suggest to add an assertion, that
>> the out-buffer length >= in-buffer length * 3 / 4. If it will crash
>> anywhere in the tests, it will mean the function is used not only with
>> the buffers of the given size. And you can't assume it can be not big
>> enough to fit the whole output, but never 0.
>
> This would break code which accepts base64 string from whatever and decodes it into fixed-sized buffer - extra characters are silently ignored now.
>
>> If it will not crash, it means you can drop the out-buffer length
>> parameter as it does not matter. Also you need to validate all the
>> usages of base64_decode() just in case.
>
> I believe that overhauling API is outside of the task assigned to me.
The problem is that this whole 'optimization' is also outside of the
task assigned to you. But you decided to do it anyway. So lets do it
properly then.
> I have changed out_pos checks to be performed before *out_pos writes, now code correctly processes out_len==0 and it even works faster. Differential patch is below, I will e-mail everything from git in a new thread. Note that there no point in saving state at all in places marked as "We are losing some data" - our caller would have to guess the offset from in_base64 where we have stopped decoding.
>
> diff --git a/third_party/base64.c b/third_party/base64.c
> index f4fbbf477..93442c04b 100644
> --- a/third_party/base64.c
> +++ b/third_party/base64.c
> @@ -260,7 +260,8 @@ base64_decode_block(const char *in_base64, int in_len,
> if (in_pos >= in_end)
> {
> state->step = step_a;
> - state->result = curr_byte;
> + /* curr_byte is useless now */
> + /* state->result = curr_byte; */
> return out_pos - out_bin;
> }
> fragment = base64_decode_value(*in_pos++);
> @@ -276,15 +277,16 @@ base64_decode_block(const char *in_base64, int in_len,
> }
> fragment = base64_decode_value(*in_pos++);
> } while (fragment < 0);
> - curr_byte |= (fragment & 0x030) >> 4;
> - *out_pos = curr_byte;
> - curr_byte = (fragment & 0x00f) << 4;
> - if (++out_pos >= out_end)
> + if (out_pos >= out_end)
> {
> - state->step = step_c;
> + /* We are losing some data */
> + state->step = step_b;
> state->result = curr_byte;
> return out_pos - out_bin;
Can you just not loose the data? For example, move this check
one line below, after curr_byte |= ? I don't know if it helps though.
Since you decided to change this function, I would suggest to make
it work properly for all inputs. And even better - expose it in
base64.h and test separately.
It was initially taken from there:
https://github.com/libb64/libb64/blob/master/src/cdecode.c
in this commit:
https://github.com/tarantool/tarantool/commit/4e707c0f989434b74eb3a859a1b26822a87c42e2#diff-4fd3a92107ebfe91b691f08cd53a84eccbcf89f1a7bad47e20f950d6c4214e6cR216
Readme of libb64 says it uses 'coroutines' implemented via
switch-while-case. Which means the function is supposed to be
called multiple times. Although in the source repository it
does not accept output buffer length at all.
The reenterability means the function should be allowed to
process all the input step by step, and must not loose data.
Otherwise it makes no sense to have this 'state machine' via
'coroutines', and it also can be removed in order to pursue
more performance. You don't even need the state variable.
The bottom line: you either need to fix the function not to loose
data + keep it reenterable + test properly; or remove the state
entirely because if it looses information, it is useless anyway.
> }
> + curr_byte |= (fragment & 0x030) >> 4;
> + *out_pos++ = curr_byte;
> + curr_byte = (fragment & 0x00f) << 4;
> case step_c:
> do {
> if (in_pos >= in_end)
More information about the Tarantool-patches
mailing list