From: Sergey Nikiforov via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Alexander Turenko <alexander.turenko@tarantool.org> Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Date: Thu, 25 Feb 2021 11:25:21 +0300 [thread overview] Message-ID: <d81a7ad0-fd98-1798-3102-83bfc93eea76@tarantool.org> (raw) In-Reply-To: <20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb> On 20.02.2021 15:49, Alexander Turenko wrote: >>>> It also caused data corruption - garbage instead of "extra bits" was >>>> saved into state->result if there was no space in output buffer. >>> >>> We have the dead code and it appears to be broken. Why don't remove it? >>> (AFAIS, the rest of the code does not read off the buffer.) >>> >>> Is it due to a little probability that we'll decide to support chunked >>> decoding and we'll decide to implement it in exactly this way (and not >>> just leaving undecoded bytes in, say, ibuf)? >>> >>> Another side of this little probability is: >>> >>> * The code complexity and so waste of the time for anyone who need to >>> dive into it. >>> * Have untested code in the code base that may give us more surprises. >>> * Extra def-use dependencies may hide optimization opportunities and >>> increase register pressure. >> >> And yet you are youself proposing to improve performance: >> "entirely eliminate the output buffer lengthchecks for the first (out_len * >> 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving >> state and reporting input buffer stop position. So: do we want complexity >> (and performance) or simplicity? > > Nope, it does not. Just a kind of loop unrolling optimization: > > | for (int i = 0; i < count; ++i) { > | <..processing..> > | } > > -> > > | for (int i = 0; i < count; i += 4) { > | <..processing by 4 bytes..> > | } > | > | /* Remainder. */ > | for (int i = count - count % 4 - 1; i < count; ++i) { > | <..processing..> > | } > > (I didn't verify the arithmetic, just for the idea.) > > (Sure, it is meaningful only for large inputs.) > > The remainder always starts from 'state_a'. Alas, it is not applicable - there could be skipped characters in input. So we get unknown intermediate state. The alternative - unknown processed input stop position - does not look good because w/o extra checks within loop we would overrun input. > But here we don't discuss optimizations. You completely ignored my > main point. I'll repeat: > > | We have the dead code and it appears to be broken. Why don't remove it? > | (AFAIS, the rest of the code does not read off the buffer.) > > (I verified it locally with ASAN before I wrote to you the past time.) > > Please, confirm that it is true or show that it is wrong. Confirm. > It is not a dialogue if you respond only to those parts that looks > interesting for you. There are two parallel e-mail threads, I am trying to avoid duplicating my responses. There was no request to verify your findings in earlier e-mails. >>> This is not the question regarding the patch, but this code looks broken >>> in several other ways. At least: >>> >>> * It skips unrecognized symbols and does not report a decoding error. >> >> Some of these symbols should "legally" be skipped - like newlines in e-mail >> use case. And there are probably other use cases which use some other >> symbols. We could break something and gain nothing. > > Sure, backward compatibility is the question. > > However 'some symbols' and 'all unrecognized symbols' is not the same > thing. I may want to skip newlines, but catch ill-formed incoming data. > For example, it may be part of functional requirements for an API of my > service that receives or reads base64 encoded data. Say, when an > external service send me a text in ascii instead of base64, I want to > report an error and decline the request rather than accept the request > that is known to be ill-formed. > >> Nothing prevents somebody from corrupting "legal" base64 characters, this is >> not detected nor should be. These issues are outside base64 decoder scope, >> CRCs or digital signatures should be used when data can be accidentally or >> intentionally corrupted. > > There are cases, when a user wants to follow the robustness principle > and when (s)he wants the opposite. There is no silver bullet here, > different usage scenarious are different. I still think CRC/signatures should be used when input is untrusted. Extra validation is hardly useful (even taking compatibility issues out of the picture) but increases code complexity and degrades performance. >> Summary: I propose commiting this particular patch (1/2) "as is" (it was >> posted unmodified several times already) and discussing performance patch >> (2/2) a little further. > > First I need a response to the question above regarding the unused code. I can easily create patch which fixes overrun bug and gets rid of state in one stroke. Should I do that? We may need state to implement optimizations later. I will postpone responding in another e-mail thread (API changes for robustness; optimizations).
next prev parent reply other threads:[~2021-02-25 8:25 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-01-11 9:44 [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Sergey Nikiforov via Tarantool-patches 2021-01-11 9:45 ` [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) Sergey Nikiforov via Tarantool-patches 2021-01-21 2:16 ` Alexander Turenko via Tarantool-patches 2021-01-26 16:37 ` Sergey Nikiforov via Tarantool-patches 2021-01-26 16:48 ` Sergey Nikiforov via Tarantool-patches 2021-02-20 11:30 ` Alexander Turenko via Tarantool-patches 2021-02-20 12:49 ` Alexander Turenko via Tarantool-patches 2021-02-25 8:25 ` Sergey Nikiforov via Tarantool-patches [this message] 2021-02-27 18:47 ` Alexander Turenko via Tarantool-patches 2021-01-11 9:45 ` [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance Sergey Nikiforov via Tarantool-patches 2021-01-21 15:31 ` Alexander Turenko via Tarantool-patches 2021-01-26 16:37 ` Sergey Nikiforov via Tarantool-patches 2021-02-20 12:51 ` Alexander Turenko via Tarantool-patches 2021-01-12 19:39 ` [Tarantool-patches] [PATCH v6 0/2] base64: Fix decoder, improve its performance Alexander Turenko via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=d81a7ad0-fd98-1798-3102-83bfc93eea76@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=alexander.turenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --cc=void@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads)' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox