Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Alexander Turenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Nikiforov <void@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)
Date: Sat, 20 Feb 2021 15:49:33 +0300	[thread overview]
Message-ID: <20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb> (raw)
In-Reply-To: <a015ce9d-5c46-a3f6-0fa0-b776b01d82b3@tarantool.org>

> > > It also caused data corruption - garbage instead of "extra bits" was
> > > saved into state->result if there was no space in output buffer.
> > 
> > We have the dead code and it appears to be broken. Why don't remove it?
> > (AFAIS, the rest of the code does not read off the buffer.)
> > 
> > Is it due to a little probability that we'll decide to support chunked
> > decoding and we'll decide to implement it in exactly this way (and not
> > just leaving undecoded bytes in, say, ibuf)?
> > 
> > Another side of this little probability is:
> > 
> > * The code complexity and so waste of the time for anyone who need to
> >    dive into it.
> > * Have untested code in the code base that may give us more surprises.
> > * Extra def-use dependencies may hide optimization opportunities and
> >    increase register pressure.
> 
> And yet you are youself proposing to improve performance:
> "entirely eliminate the output buffer lengthchecks for the first (out_len *
> 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving
> state and reporting input buffer stop position. So: do we want complexity
> (and performance) or simplicity?

Nope, it does not. Just a kind of loop unrolling optimization:

 | for (int i = 0; i < count; ++i) {
 |         <..processing..>
 | }

->

 | for (int i = 0; i < count; i += 4) {
 |     <..processing by 4 bytes..>
 | }
 |
 | /* Remainder. */
 | for (int i = count - count % 4 - 1; i < count; ++i) {
 |         <..processing..>
 | }

(I didn't verify the arithmetic, just for the idea.)

(Sure, it is meaningful only for large inputs.)

The remainder always starts from 'state_a'.

But here we don't discuss optimizations. You completely ignored my
main point. I'll repeat:

 | We have the dead code and it appears to be broken. Why don't remove it?
 | (AFAIS, the rest of the code does not read off the buffer.)

(I verified it locally with ASAN before I wrote to you the past time.)

Please, confirm that it is true or show that it is wrong.

It is not a dialogue if you respond only to those parts that looks
interesting for you.

> > This is not the question regarding the patch, but this code looks broken
> > in several other ways. At least:
> > 
> > * It skips unrecognized symbols and does not report a decoding error.
> 
> Some of these symbols should "legally" be skipped - like newlines in e-mail
> use case. And there are probably other use cases which use some other
> symbols. We could break something and gain nothing.

Sure, backward compatibility is the question.

However 'some symbols' and 'all unrecognized symbols' is not the same
thing. I may want to skip newlines, but catch ill-formed incoming data.
For example, it may be part of functional requirements for an API of my
service that receives or reads base64 encoded data. Say, when an
external service send me a text in ascii instead of base64, I want to
report an error and decline the request rather than accept the request
that is known to be ill-formed.

> Nothing prevents somebody from corrupting "legal" base64 characters, this is
> not detected nor should be. These issues are outside base64 decoder scope,
> CRCs or digital signatures should be used when data can be accidentally or
> intentionally corrupted.

There are cases, when a user wants to follow the robustness principle
and when (s)he wants the opposite. There is no silver bullet here,
different usage scenarious are different.

> Summary: I propose commiting this particular patch (1/2) "as is" (it was
> posted unmodified several times already) and discussing performance patch
> (2/2) a little further.

First I need a response to the question above regarding the unused code.

next prev parent reply	other threads:[~2021-02-20 12:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-11  9:44 [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Sergey Nikiforov via Tarantool-patches
2021-01-11  9:45 ` [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Sergey Nikiforov via Tarantool-patches
2021-01-21  2:16   ` Alexander Turenko via Tarantool-patches
2021-01-26 16:37     ` Sergey Nikiforov via Tarantool-patches
2021-01-26 16:48       ` Sergey Nikiforov via Tarantool-patches
2021-02-20 11:30         ` Alexander Turenko via Tarantool-patches
2021-02-20 12:49       ` Alexander Turenko via Tarantool-patches [this message]
2021-02-25  8:25         ` Sergey Nikiforov via Tarantool-patches
2021-02-27 18:47           ` Alexander Turenko via Tarantool-patches
2021-01-11  9:45 ` [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance Sergey Nikiforov via Tarantool-patches
2021-01-21 15:31   ` Alexander Turenko via Tarantool-patches
2021-01-26 16:37     ` Sergey Nikiforov via Tarantool-patches
2021-02-20 12:51       ` Alexander Turenko via Tarantool-patches
2021-01-12 19:39 ` [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Alexander Turenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=alexander.turenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --cc=void@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox