From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 888816EC58; Sat, 20 Feb 2021 15:49:33 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 888816EC58 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1613825373; bh=dtRnd5TliH4xPM90lgFCvJByBgMxrFdG4337IX7Nd1w=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=VHX2TcCG10ubDsnWaAhTpEeHWCZSouS24mn68QpxADnFAX8Zd70J21CtwkGjUp42s twU1DgVjFoCb1RJVY8L2RqAp8bev+802EHXeyMnuhQ8hQphGcQpHTkfEJpVkjTL3ng kuLTxaM/Mq2GBfzZo4SgIvpDTovS60v3COMgoShM= Received: from smtp38.i.mail.ru (smtp38.i.mail.ru [94.100.177.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id DBB156EC58 for ; Sat, 20 Feb 2021 15:49:31 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org DBB156EC58 Received: by smtp38.i.mail.ru with esmtpa (envelope-from ) id 1lDRhS-0005Fa-Vy; Sat, 20 Feb 2021 15:49:31 +0300 Date: Sat, 20 Feb 2021 15:49:33 +0300 To: Sergey Nikiforov Message-ID: <20210220124933.rg4fg4ouvljuv6dq@tkn_work_nb> References: <7f5ce3321cda6d4305282af486ed2f493b920d1f.1610357625.git.void@tarantool.org> <20210121021627.2dzyh7fho2cvjlrz@tkn_work_nb> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD975C3EC174F56692254B0AABE1FB071B2BA6557555153D6A0182A05F53808504032C6E0D7A72AC8E292FAD5D054A08E96D28B9F6172B24B2F7C45E865F4192858 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7AA1605287C7F04D6EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063719899BAB9B61B3948638F802B75D45FF5571747095F342E8C7A0BC55FA0FE5FCD8E8D49B3732BB3FDE7FEFE61F3937501BAEBA2F4E0F88AD389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C0565C7A4E90E531F78941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B6505D71D783575ABECC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C2249D034FE23A273109076E601842F6C81A12EF20D2F80756B5F7E9C4E3C761E06A776E601842F6C81A127C277FBC8AE2E8BC19EB8E125D93F263AA81AA40904B5D9DBF02ECDB25306B2B25CBF701D1BE8734AD6D5ED66289B5278DA827A17800CE75A9E79F66F1C28F367F23339F89546C5A8DF7F3B2552694A6FED454B719173D6725E5C173C3A84C37C6C241D9975906435872C767BF85DA2F004C906525384306FED454B719173D6462275124DF8B9C934F12F0C005D1A85E5BFE6E7EFDEDCD789D4C264860C145E X-C1DE0DAB: 0D63561A33F958A5B7AE91D48DEF8E768922D4AB3597B83F2D54721B92B2C77CD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75448CF9D3A7B2C848410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D345DB600F8E858000FB3335905E025F09CC07A9522C3884B2590F19804A12707C5C6F1392A7B7D57C11D7E09C32AA3244C187EFBD3E90A304B516AAC88464A670AD08D48398F32B4A6927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj2CfMKaWP9xvyPxbeqIj2Dg== X-Mailru-Sender: FFAA8E4AEE17E37C3731A083A1A85ADEB67EB180E3E27D117439DBC7DD29C4A9B7EA9FE7735C3DBFC664A44C781FCEA7C77752E0C033A69EDF9F2CE1E9CF805D8CD356D4F938FF726C18EFA0BB12DBB0 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Alexander Turenko via Tarantool-patches Reply-To: Alexander Turenko Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" > > > It also caused data corruption - garbage instead of "extra bits" was > > > saved into state->result if there was no space in output buffer. > > > > We have the dead code and it appears to be broken. Why don't remove it? > > (AFAIS, the rest of the code does not read off the buffer.) > > > > Is it due to a little probability that we'll decide to support chunked > > decoding and we'll decide to implement it in exactly this way (and not > > just leaving undecoded bytes in, say, ibuf)? > > > > Another side of this little probability is: > > > > * The code complexity and so waste of the time for anyone who need to > > dive into it. > > * Have untested code in the code base that may give us more surprises. > > * Extra def-use dependencies may hide optimization opportunities and > > increase register pressure. > > And yet you are youself proposing to improve performance: > "entirely eliminate the output buffer lengthchecks for the first (out_len * > 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving > state and reporting input buffer stop position. So: do we want complexity > (and performance) or simplicity? Nope, it does not. Just a kind of loop unrolling optimization: | for (int i = 0; i < count; ++i) { | <..processing..> | } -> | for (int i = 0; i < count; i += 4) { | <..processing by 4 bytes..> | } | | /* Remainder. */ | for (int i = count - count % 4 - 1; i < count; ++i) { | <..processing..> | } (I didn't verify the arithmetic, just for the idea.) (Sure, it is meaningful only for large inputs.) The remainder always starts from 'state_a'. But here we don't discuss optimizations. You completely ignored my main point. I'll repeat: | We have the dead code and it appears to be broken. Why don't remove it? | (AFAIS, the rest of the code does not read off the buffer.) (I verified it locally with ASAN before I wrote to you the past time.) Please, confirm that it is true or show that it is wrong. It is not a dialogue if you respond only to those parts that looks interesting for you. > > This is not the question regarding the patch, but this code looks broken > > in several other ways. At least: > > > > * It skips unrecognized symbols and does not report a decoding error. > > Some of these symbols should "legally" be skipped - like newlines in e-mail > use case. And there are probably other use cases which use some other > symbols. We could break something and gain nothing. Sure, backward compatibility is the question. However 'some symbols' and 'all unrecognized symbols' is not the same thing. I may want to skip newlines, but catch ill-formed incoming data. For example, it may be part of functional requirements for an API of my service that receives or reads base64 encoded data. Say, when an external service send me a text in ascii instead of base64, I want to report an error and decline the request rather than accept the request that is known to be ill-formed. > Nothing prevents somebody from corrupting "legal" base64 characters, this is > not detected nor should be. These issues are outside base64 decoder scope, > CRCs or digital signatures should be used when data can be accidentally or > intentionally corrupted. There are cases, when a user wants to follow the robustness principle and when (s)he wants the opposite. There is no silver bullet here, different usage scenarious are different. > Summary: I propose commiting this particular patch (1/2) "as is" (it was > posted unmodified several times already) and discussing performance patch > (2/2) a little further. First I need a response to the question above regarding the unused code.