[Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)

Thu Jan 21 05:16:27 MSK 2021

On Mon, Jan 11, 2021 at 12:45:00PM +0300, Sergey Nikiforov wrote:
> Was caught by base64 test with enabled ASAN.

It seems we have a problem in CI, otherwise this would have been
detected. At least, I don't see an explicit suppression relevant to the
base64 code, nor is the test disabled under this CI job.

Can you please investigate how to enable it in CI, so that we catch a
similar problem the next time one appears?

> 
> It also caused data corruption - garbage instead of "extra bits" was
> saved into state->result if there was no space in output buffer.

We have dead code and it appears to be broken. Why not remove it?
(AFAIS, the rest of the code does not read outside the buffer.)

Is it kept because of the small probability that we'll decide to support
chunked decoding and that we'll implement it in exactly this way (rather
than just leaving undecoded bytes in, say, ibuf)?

The other side of this small probability is:

* Code complexity, and so wasted time for anyone who needs to dive into
  it.
* Untested code in the code base that may give us more surprises.
* Extra def-use dependencies that may hide optimization opportunities
  and increase register pressure.
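
To illustrate the alternative mentioned above (leaving undecoded bytes
in the input buffer instead of keeping decoder state), here is a rough
sketch. It is not the Tarantool code: decode_full_groups() and
b64_val() are hypothetical names, and padding ('=') is not handled, the
decoder simply stops at it.

```c
#include <stddef.h>

/* Hypothetical helper: 6-bit value of a base64 character, -1 if invalid. */
static int
b64_val(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '+')
		return 62;
	if (c == '/')
		return 63;
	return -1;
}

/*
 * Hypothetical stateless decoder: consumes only whole 4-character
 * groups and only while 3 output bytes still fit. Returns the number
 * of input bytes consumed; *out_len receives the bytes written.
 * Whatever was not consumed stays in the caller's input buffer.
 */
static size_t
decode_full_groups(const char *in, size_t in_len,
		   unsigned char *out, size_t out_size, size_t *out_len)
{
	size_t consumed = 0;
	size_t written = 0;
	while (in_len - consumed >= 4 && out_size - written >= 3) {
		int v[4];
		int valid = 1;
		for (int i = 0; i < 4; i++) {
			v[i] = b64_val((unsigned char)in[consumed + i]);
			if (v[i] < 0)
				valid = 0;
		}
		if (!valid)
			break; /* Stop at the first non-alphabet character. */
		out[written++] = (unsigned char)((v[0] << 2) | (v[1] >> 4));
		out[written++] = (unsigned char)(((v[1] & 0x0f) << 4) |
						 (v[2] >> 2));
		out[written++] = (unsigned char)(((v[2] & 0x03) << 6) | v[3]);
		consumed += 4;
	}
	*out_len = written;
	return consumed;
}
```

The caller would keep the unconsumed tail (at most 3 characters, or
more if the output was full) at the start of its input buffer and append
the next chunk after it, so no "extra bits" need to be saved anywhere.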

This is not a question about the patch itself, but this code looks
broken in several other ways. At least:

* It skips unrecognized symbols and does not report a decoding error.
* If the output buffer is too short, it reports neither an error nor the
  required buffer length (like snprintf()). There is no way to
  distinguish successful processing from 'interrupted' processing.
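
To make the two points above concrete, here is a minimal sketch of the
behaviour I would expect. It is an assumption about a possible
interface, not a proposal for the exact API: decode_strict() and
sextet() are hypothetical names. It rejects unrecognized symbols and,
like snprintf(), returns the length the full output needs even when the
buffer is too short. It does not check that unused trailing bits are
zero.

```c
#include <stddef.h>

/* Hypothetical helper: 6-bit value of a base64 character, -1 if invalid. */
static int
sextet(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '+')
		return 62;
	if (c == '/')
		return 63;
	return -1;
}

/*
 * Hypothetical strict decoder. Returns the number of bytes the complete
 * decoded output requires, or -1 on malformed input. Never writes more
 * than out_size bytes, so a return value greater than out_size means
 * the output was truncated.
 */
static long
decode_strict(const char *in, size_t in_len,
	      unsigned char *out, size_t out_size)
{
	if (in_len % 4 != 0)
		return -1;
	size_t needed = 0;
	for (size_t i = 0; i < in_len; i += 4) {
		/* Padding is accepted only at the end of the last group. */
		int pad = 0;
		if (i + 4 == in_len && in[i + 3] == '=')
			pad = (in[i + 2] == '=') ? 2 : 1;
		unsigned v[4] = {0, 0, 0, 0};
		for (int k = 0; k < 4 - pad; k++) {
			int s = sextet((unsigned char)in[i + k]);
			if (s < 0)
				return -1; /* Report, do not skip. */
			v[k] = (unsigned)s;
		}
		unsigned char bytes[3] = {
			(unsigned char)((v[0] << 2) | (v[1] >> 4)),
			(unsigned char)(((v[1] & 0x0f) << 4) | (v[2] >> 2)),
			(unsigned char)(((v[2] & 0x03) << 6) | v[3]),
		};
		for (int k = 0; k < 3 - pad; k++) {
			if (needed < out_size)
				out[needed] = bytes[k];
			needed++; /* Keep counting past the buffer end. */
		}
	}
	return (long)needed;
}
```

With such a contract the caller can tell the three outcomes apart: a
negative result is a decoding error, a result not greater than the
buffer size is success, and anything larger is the required length.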

> 
> Added test for "zero-sized output buffer" case.

Nice catch.

> 
> Fixes: #3069
> ---
> 
> Branch: https://github.com/tarantool/tarantool/tree/void234/gh-3069-fix-base64-memory-overrun-v6
> Issue: https://github.com/tarantool/tarantool/issues/3069