From: Alexander Turenko via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Sergey Nikiforov <void@tarantool.org> Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Date: Sat, 20 Feb 2021 15:49:33 +0300 [thread overview] Message-ID: <20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb> (raw) In-Reply-To: <a015ce9d-5c46-a3f6-0fa0-b776b01d82b3@tarantool.org> > > > It also caused data corruption - garbage instead of "extra bits" was > > > saved into state->result if there was no space in output buffer. > > > > We have the dead code and it appears to be broken. Why don't remove it? > > (AFAIS, the rest of the code does not read off the buffer.) > > > > Is it due to a little probability that we'll decide to support chunked > > decoding and we'll decide to implement it in exactly this way (and not > > just leaving undecoded bytes in, say, ibuf)? > > > > Another side of this little probability is: > > > > * The code complexity and so waste of the time for anyone who need to > > dive into it. > > * Have untested code in the code base that may give us more surprises. > > * Extra def-use dependencies may hide optimization opportunities and > > increase register pressure. > > And yet you are youself proposing to improve performance: > "entirely eliminate the output buffer lengthchecks for the first (out_len * > 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving > state and reporting input buffer stop position. So: do we want complexity > (and performance) or simplicity? Nope, it does not. Just a kind of loop unrolling optimization: | for (int i = 0; i < count; ++i) { | <..processing..> | } -> | for (int i = 0; i < count; i += 4) { | <..processing by 4 bytes..> | } | | /* Remainder. */ | for (int i = count - count % 4 - 1; i < count; ++i) { | <..processing..> | } (I didn't verify the arithmetic, just for the idea.) (Sure, it is meaningful only for large inputs.) The remainder always starts from 'state_a'. But here we don't discuss optimizations. You completely ignored my main point. I'll repeat: | We have the dead code and it appears to be broken. Why don't remove it? | (AFAIS, the rest of the code does not read off the buffer.) (I verified it locally with ASAN before I wrote to you the past time.) Please, confirm that it is true or show that it is wrong. It is not a dialogue if you respond only to those parts that looks interesting for you. > > This is not the question regarding the patch, but this code looks broken > > in several other ways. At least: > > > > * It skips unrecognized symbols and does not report a decoding error. > > Some of these symbols should "legally" be skipped - like newlines in e-mail > use case. And there are probably other use cases which use some other > symbols. We could break something and gain nothing. Sure, backward compatibility is the question. However 'some symbols' and 'all unrecognized symbols' is not the same thing. I may want to skip newlines, but catch ill-formed incoming data. For example, it may be part of functional requirements for an API of my service that receives or reads base64 encoded data. Say, when an external service send me a text in ascii instead of base64, I want to report an error and decline the request rather than accept the request that is known to be ill-formed. > Nothing prevents somebody from corrupting "legal" base64 characters, this is > not detected nor should be. These issues are outside base64 decoder scope, > CRCs or digital signatures should be used when data can be accidentally or > intentionally corrupted. There are cases, when a user wants to follow the robustness principle and when (s)he wants the opposite. There is no silver bullet here, different usage scenarious are different. > Summary: I propose commiting this particular patch (1/2) "as is" (it was > posted unmodified several times already) and discussing performance patch > (2/2) a little further. First I need a response to the question above regarding the unused code.
next prev parent reply other threads:[~2021-02-20 12:49 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-01-11 9:44 [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Sergey Nikiforov via Tarantool-patches 2021-01-11 9:45 ` [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Sergey Nikiforov via Tarantool-patches 2021-01-21 2:16 ` Alexander Turenko via Tarantool-patches 2021-01-26 16:37 ` Sergey Nikiforov via Tarantool-patches 2021-01-26 16:48 ` Sergey Nikiforov via Tarantool-patches 2021-02-20 11:30 ` Alexander Turenko via Tarantool-patches 2021-02-20 12:49 ` Alexander Turenko via Tarantool-patches [this message] 2021-02-25 8:25 ` Sergey Nikiforov via Tarantool-patches 2021-02-27 18:47 ` Alexander Turenko via Tarantool-patches 2021-01-11 9:45 ` [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance Sergey Nikiforov via Tarantool-patches 2021-01-21 15:31 ` Alexander Turenko via Tarantool-patches 2021-01-26 16:37 ` Sergey Nikiforov via Tarantool-patches 2021-02-20 12:51 ` Alexander Turenko via Tarantool-patches 2021-01-12 19:39 ` [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Alexander Turenko via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb \ --to=tarantool-patches@dev.tarantool.org \ --cc=alexander.turenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --cc=void@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox