From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 610027030B; Tue, 26 Jan 2021 19:37:35 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 610027030B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1611679055; bh=jEawnWFGb3kSPgVto1wgO0e+WIP1iwwO9V2OENNI2+s=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=aN550bgPEVOWuA6pBQtwO2b1+VJR41wA145UOouex5bQR39Fa+7p8ivwmCn7kmU+e J3s+5P6D42ZWe3mbTBe5+Q4MQxGT45TZHB6nmJIRnA0FVnTxVgXEJQ9rMccu8lYSyP jBW7fp49R78afsu/ZdQHoAQ7/Yg50p5x3Z6CUlxc= Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 8BFA15B359 for ; Tue, 26 Jan 2021 19:37:06 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 8BFA15B359 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1l4RKz-0007OD-GQ; Tue, 26 Jan 2021 19:37:05 +0300 To: Alexander Turenko References: <20210121153109.5rgx6wttrkkmcrc5@tkn_work_nb> Message-ID: Date: Tue, 26 Jan 2021 19:37:04 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: <20210121153109.5rgx6wttrkkmcrc5@tkn_work_nb> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92BC6D7A73B5E1EC959C1FBF906E5CB90956EF1D42741DB9200894C459B0CD1B9FC9E7EB7BFD184ECC35201EAAC3EC4342A9F757D15EFBF3B3C5A36B8DDD67158 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE792E16514AF283DFAEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637BC6099E116BE74F18638F802B75D45FF5571747095F342E8C7A0BC55FA0FE5FC404929DF707E7A88B6682262BC83D068268643EA7FE776E0389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C0A29E2F051442AF778941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B68424CA1AAF98A6958941B15DA834481F9449624AB7ADAF3735872C767BF85DA29E625A9149C048EE22A4E020D69BCAC1F1C9CF18C8EB22694AD6D5ED66289B524E70A05D1297E1BB35872C767BF85DA227C277FBC8AE2E8B82A9524F326DA8DB75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57D4B828FA1BC0F1ACBD9CCCA9EDD067B1EDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A52F86757EC373F88D1F0CD5C7E07B22E45F9957AC4F0BAC1CD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75F04B387B5D7535DE410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34F1ADD4D8CD3C81CEE66BB1151191A372BE33CFF54857A819764311A0D6506CE29F6A73C84CAA13B61D7E09C32AA3244CCDA10376A3FCEB3DC5F0E58EA21DCC9B3FD9C8CA1B0515E083B48618A63566E0 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojbLwCnVtyriffClGi5tUfvg== X-Mailru-Sender: 689FA8AB762F73936BC43F508A063822D513A4ED6B0AE52EBF6A4872BBE18922DD675A873A6B1A573284F99205A65E8EFB559BB5D741EB966AABCD5B59A9F6DF9ABAAAF6BC5F075B67EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v6 2/2] base64: improve decoder performance X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Nikiforov via Tarantool-patches Reply-To: Sergey Nikiforov Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On 21.01.2021 18:31, Alexander Turenko wrote: > On Mon, Jan 11, 2021 at 12:45:01PM +0300, Sergey Nikiforov wrote: >> Unnecessary checks were removed from internal loops. >> Benchmark shows that performance is now ~1.19 times higher >> (release build, Intel Core I7-9700K, only one thread). >> --- >> >> Branch: https://github.com/tarantool/tarantool/tree/void234/gh-3069-fix-base64-memory-overrun-v6 >> >> third_party/base64.c | 35 ++++++++++++++++++++++++++--------- >> 1 file changed, 26 insertions(+), 9 deletions(-) >> >> diff --git a/third_party/base64.c b/third_party/base64.c >> index 3ecd0843a..74260971b 100644 >> --- a/third_party/base64.c >> +++ b/third_party/base64.c >> @@ -258,10 +258,10 @@ base64_decode_block(const char *in_base64, int in_len, >> { >> case step_a: >> do { >> - if (in_pos == in_end || out_pos >= out_end) >> + if (in_pos >= in_end) > > Okay, it is the loop invariant and so we can move it outside the loop. > But why we sink it down rather than hoisting up? We know that there is > no space in the output buffer -- so there is no sense to read the input, > right? Or it will break 'correctness' of the hypotetical use of the > `state`? Yes. It also prevents us from skipping newlines and other ignored characters at the end of input. > So we should imagine how the state would be used if it would be > possible. From which input position the processing would be resumed if > the code would provide such ability? How a caller supposed to > distinguish end of the output buffer from successful processing the > whole input? > > Aside of this, skipping unrecognized input bytes looks as another > incorrectness. We should be ready to work with untrusted input, perform > appropriate validation and report errors. There is no point because "legal" characters can also be corrupted. CRCs / Digital Signatures, that is outside of base64 decoder scope. > We can skip newlines, but I guess there is no sense to do this in a > loop: only one newline is a row is allowed (but it worth to verify the > guess with the standard). This will make code complex and slow (we have to calculate line position - and it can be split between chunks). It would also break compatibility. A lot of real-life code breaks standards. > It is hard to discuss an optimization until we'll define how the result > should look in the corner cases like 'short output buffer', 'unexpected > end of an input buffer' and so on. Current code does not report errors: > it silently skips invalid input, As it should, in my opinion. > accepts trimmed input, There is no way to determine if input is trimmed or not. Our caller can determine whether enough bytes were decoded. >silently trims output at the buffer boundary. Which in our case (no chunks) means we have received more input than necessary. Just look at base64_decode() call sites - even returned value is not always checked. Should we fix this right now? > I don't understand how it supposed to > work with untrusted input. > > I propose to stop attempts to do some 'quick fixups' and either work on > the code or don't. I haven't attempted to fix anything in this particular patch, it is about performance only. We can drop it entirely. > If we would define behaviour first, I would spend MUCH less time trying > to deduce it. This patch is about "keeping existing behavior" to avoid breaking something (right now extra "legal" characters are silently ignored). > That's really hard to discuss correctness of dead code. To be honest, it > looks easier to me to finish the chained processing implementation, test > it and remove than rather than trying to interpter the code (both > existing and imaginary) in the mind. > > BTW, it seems we able to entirely eliminate the output buffer length > checks for the first (out_len * 3 / 4) input bytes and than process the > tail with full checks. (After we'll done with strict and correct API.) This means processing input in two chunks, so we need state and "final" input position reporting (and a lot of tests for corner cases). Extra logic would hurt performance for small buffers (our typical use case) - even simple check in base64_decode() from patch v4 does, I have tested this. Benchmark code with "barely enough output buffer" and "definitely large enough output buffer" (it accounted for newlines added by encoder) got lost somewhere, but I still have numbers from patch v4: Without length check in the beginning of base64_decode() - output buffer size is checked on all iterations It took 6170225078 ns to decode 7087960000 bytes, speed is 1148736052 bps It took 6141109481 ns to decode 7087960000 bytes, speed is 1154182321 bps It took 6142448824 ns to decode 7087960000 bytes, speed is 1153930655 bps It took 6165578853 ns to decode 7087960000 bytes, speed is 1149601711 bps It took 6126750494 ns to decode 7087960000 bytes, speed is 1156887326 bps It took 6146041244 ns to decode 7087960000 bytes, speed is 1153256172 bps With length check in the beginning of base64_decode() - output buffer size is checked on all iterations because its size is "suspicuous" (may not be large enough) It took 6310650551 ns to decode 7087960000 bytes, speed is 1123174218 bps It took 6243847147 ns to decode 7087960000 bytes, speed is 1135191146 bps It took 6263236313 ns to decode 7087960000 bytes, speed is 1131676923 bps It took 6260555388 ns to decode 7087960000 bytes, speed is 1132161535 bps With length check in the beginning of base64_decode() - output buffer size is definitely large enough so it is not checked within loops. It took 6037055181 ns to decode 7087960000 bytes, speed is 1174075735 bps It took 5993200086 ns to decode 7087960000 bytes, speed is 1182667005 bps It took 5976677666 ns to decode 7087960000 bytes, speed is 1185936467 bps It took 5990214242 ns to decode 7087960000 bytes, speed is 1183256510 bps Or should we do as patch v4 - eliminate chunks support completely? Then at least in_len should be made size_t instead of int (easiest way to avoid integer overflow) and probably tests for corner cases (in_len/out_len close to 4/3) should be added. Loop invariant can be moved up in that case. Vladislav Shpilevoy is clearly voting on patch v4 approach (Quote: "As I said - please, move on to another 2 reviewers, if you don't want to finish the version with state removed.") Leonid Vasiliev is happy even with v5. Please make up your mind.