From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id AC56F6EC71; Sat, 27 Feb 2021 21:47:48 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org AC56F6EC71 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1614451668; bh=HmNiRY+qBor5M37fh2L1Q8ykiYoVW6GFD8GlZi3as9M=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=EVgWB+DzraHxekM6GXCGTmq6DjGl+QuoNq1f82nDqt+yap6UJcvjLkO5XUh8btImD FbQKU/Ohy0MEGg73nm2ZJPN8Vtj0+xbLD4pqEBffkk9yJR2spqGXI+VuwLGI2ikPBt HxUILJuy+q6sLUSkNU5YOcULSWFBrIfBh/OdnzSU= Received: from smtp50.i.mail.ru (smtp50.i.mail.ru [94.100.177.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 167806EC71 for ; Sat, 27 Feb 2021 21:47:47 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 167806EC71 Received: by smtp50.i.mail.ru with esmtpa (envelope-from ) id 1lG4d0-00024y-3P; Sat, 27 Feb 2021 21:47:46 +0300 Date: Sat, 27 Feb 2021 21:47:51 +0300 To: Sergey Nikiforov Message-ID: <20210227184751.xsq34g6tfkr57z3s@tkn_work_nb> References: <7f5ce3321cda6d4305282af486ed2f493b920d1f.1610357625.git.void@tarantool.org> <20210121021627.2dzyh7fho2cvjlrz@tkn_work_nb> <20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD9795828B892398B720FF7727D08DA7A588E6C33AE9CCC9B44182A05F538085040F45A4D875026060BD1F05AB0FAA90E0691BAF38E5AB842CEEA03ECBB7DCDBF6C X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE704BA85F3D5A9F85BEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637670E1B164B70895C8638F802B75D45FF914D58D5BE9E6BC131B5C99E7648C95C25752893F242F32C00548A214FB5CAC28D66D8EE2D5B351CA471835C12D1D9774AD6D5ED66289B5278DA827A17800CE7D9442B0B5983000E8941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B6668FBEE05F116134CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C22497052A37C3792C05A76E601842F6C81A12EF20D2F80756B5F7E9C4E3C761E06A776E601842F6C81A127C277FBC8AE2E8B2BE06DBDC430765AD81D268191BDAD3D698AB9A7B718F8C442539A7722CA490C13377AFFFEAFD26923F8577A6DFFEA7CFA80D66F452D417A93EC92FD9297F6715571747095F342E857739F23D657EF2BD5E8D9A59859A8B6D75B66E98413B381089D37D7C0E48F6C5571747095F342E857739F23D657EF2B6825BDBE14D8E70227F01F88B6EF2528BD9CCCA9EDD067B1EDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A5597D37CBD4F46F6AABDAA28725F4008461BA4D43C8FF325AD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75968C9853642EB7C3410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34D1AE09A115117C9640E7F2B23F6704049F16EB2B68196E0E5DC9543992D2AAD280E3A75819FDCD281D7E09C32AA3244CA9149DB047386248A42141D4349F468A51E887DA02A9F7BF927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojzT24cXffn6xYcjPUy45J0A== X-Mailru-Sender: FFAA8E4AEE17E37C3731A083A1A85ADE211A5E24A170458059714B32CC735B5DB7EA9FE7735C3DBFC664A44C781FCEA7C77752E0C033A69EDF9F2CE1E9CF805D8CD356D4F938FF726C18EFA0BB12DBB0 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Alexander Turenko via Tarantool-patches Reply-To: Alexander Turenko Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On Thu, Feb 25, 2021 at 11:25:21AM +0300, Sergey Nikiforov wrote: > On 20.02.2021 15:49, Alexander Turenko wrote: > > > > > It also caused data corruption - garbage instead of "extra bits" was > > > > > saved into state->result if there was no space in output buffer. > > > > > > > > We have the dead code and it appears to be broken. Why don't remove it? > > > > (AFAIS, the rest of the code does not read off the buffer.) > > > > > > > > Is it due to a little probability that we'll decide to support chunked > > > > decoding and we'll decide to implement it in exactly this way (and not > > > > just leaving undecoded bytes in, say, ibuf)? > > > > > > > > Another side of this little probability is: > > > > > > > > * The code complexity and so waste of the time for anyone who need to > > > > dive into it. > > > > * Have untested code in the code base that may give us more surprises. > > > > * Extra def-use dependencies may hide optimization opportunities and > > > > increase register pressure. > > > > > > And yet you are youself proposing to improve performance: > > > "entirely eliminate the output buffer lengthchecks for the first (out_len * > > > 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving > > > state and reporting input buffer stop position. So: do we want complexity > > > (and performance) or simplicity? > > > > Nope, it does not. Just a kind of loop unrolling optimization: > > > > | for (int i = 0; i < count; ++i) { > > | <..processing..> > > | } > > > > -> > > > > | for (int i = 0; i < count; i += 4) { > > | <..processing by 4 bytes..> > > | } > > | > > | /* Remainder. */ > > | for (int i = count - count % 4 - 1; i < count; ++i) { > > | <..processing..> > > | } > > > > (I didn't verify the arithmetic, just for the idea.) > > > > (Sure, it is meaningful only for large inputs.) > > > > The remainder always starts from 'state_a'. > > Alas, it is not applicable - there could be skipped characters in input. So > we get unknown intermediate state. The alternative - unknown processed input > stop position - does not look good because w/o extra checks within loop we > would overrun input. Okay, some optimization ways may require the state (but locally, not as part of the function contract). Some does not. I have some thoughts in the mind, but I'll hold myself to don't discuss optimizations. > > > But here we don't discuss optimizations. You completely ignored my > > main point. I'll repeat: > > > > | We have the dead code and it appears to be broken. Why don't remove it? > > | (AFAIS, the rest of the code does not read off the buffer.) > > > > (I verified it locally with ASAN before I wrote to you the past time.) > > > > Please, confirm that it is true or show that it is wrong. > > Confirm. So I would remove it. There is no confidence that we'll need it in a future. > > > > This is not the question regarding the patch, but this code looks broken > > > > in several other ways. At least: > > > > > > > > * It skips unrecognized symbols and does not report a decoding error. > > > > > > Some of these symbols should "legally" be skipped - like newlines in e-mail > > > use case. And there are probably other use cases which use some other > > > symbols. We could break something and gain nothing. > > > > Sure, backward compatibility is the question. > > > > However 'some symbols' and 'all unrecognized symbols' is not the same > > thing. I may want to skip newlines, but catch ill-formed incoming data. > > For example, it may be part of functional requirements for an API of my > > service that receives or reads base64 encoded data. Say, when an > > external service send me a text in ascii instead of base64, I want to > > report an error and decline the request rather than accept the request > > that is known to be ill-formed. > > > > > Nothing prevents somebody from corrupting "legal" base64 characters, this is > > > not detected nor should be. These issues are outside base64 decoder scope, > > > CRCs or digital signatures should be used when data can be accidentally or > > > intentionally corrupted. > > > > There are cases, when a user wants to follow the robustness principle > > and when (s)he wants the opposite. There is no silver bullet here, > > different usage scenarious are different. > > I still think CRC/signatures should be used when input is untrusted. Extra > validation is hardly useful (even taking compatibility issues out of the > picture) but increases code complexity and degrades performance. CRC is not a silver validation bullet. An external service may just send data in a wrong format (not base64), because of a developer mistake (say, forgotten encode call). There is no question whether validation is useful or not, when it is part of requirements for an application that you develop. Implementation of a particular format (without skipping unknown symbols, without skipping newlines, without accepting more than one newline in row, with certain amount of symbols in a line) may be simpler and more performant than the common implementation. > > > > Summary: I propose commiting this particular patch (1/2) "as is" (it was > > > posted unmodified several times already) and discussing performance patch > > > (2/2) a little further. > > > > First I need a response to the question above regarding the unused code. > > I can easily create patch which fixes overrun bug and gets rid of state in > one stroke. Should I do that? > We may need state to implement optimizations later. > > I will postpone responding in another e-mail thread (API changes for > robustness; optimizations). I'm sure we should remove the unused code now. We may need some way to store state in a future, but there is no confidence now.