From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 4176C7030B; Tue, 26 Jan 2021 19:37:04 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 4176C7030B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1611679024; bh=dvVOKLRZPOJ9yaDPrzmcu3B22HoVG3ZZqPCDbwDqr7g=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=sJHKCWa8QCcEpGsQotCxbM2mVYquEbO3JyRoZlVFWjvjzCIsdGNeGeXmr914XS1T8 jPXkKlgGbVhQ+IxLzKQWbb61J/H7NBaTg0Kam/KqyvIi7XEZyGIc4c5mvXNZBJlml0 bBJrr6bQU6VA317sCd2r7RweNYeAKFAccp3TRL3o= Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id D3D1E7030B for ; Tue, 26 Jan 2021 19:37:02 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org D3D1E7030B Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1l4RKv-0007JJ-IU; Tue, 26 Jan 2021 19:37:02 +0300 To: Alexander Turenko References: <7f5ce3321cda6d4305282af486ed2f493b920d1f.1610357625.git.void@tarantool.org> <20210121021627.2dzyh7fho2cvjlrz@tkn_work_nb> Message-ID: Date: Tue, 26 Jan 2021 19:37:00 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: <20210121021627.2dzyh7fho2cvjlrz@tkn_work_nb> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92BC6D7A73B5E1EC98EC91D74BAC48C3C5ED0B073A515FC4C00894C459B0CD1B9B6AD7083A3D2EE745ADDB92A5262F08E2A9F757D15EFBF3B66F2506D529139D3 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE760302A529BCAAAFCEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637C5BAFDCCAC60DA968638F802B75D45FF5571747095F342E8C7A0BC55FA0FE5FC404929DF707E7A88DB04131915D2A1B661CE66FDF8CDDE19389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C07E7E81EEA8A9722B8941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B68424CA1AAF98A6958941B15DA834481F9449624AB7ADAF3735872C767BF85DA29E625A9149C048EE22A4E020D69BCAC1F1C9CF18C8EB22694AD6D5ED66289B524E70A05D1297E1BB35872C767BF85DA227C277FBC8AE2E8B82A9524F326DA8DB75ECD9A6C639B01B4E70A05D1297E1BBC6867C52282FAC85D9B7C4F32B44FF57D4B828FA1BC0F1ACBD9CCCA9EDD067B1EDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A5A24B3FDA85D10FDBA0DAE054ED9F4084A443EFBA1FAA6DE6D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75F04B387B5D7535DE410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34D041FB2E16F174C4E10C3456A9FE93C2000D4E1ED3F2FA5D5C7B0157B7C1DDC37F27E4AAFE2C32511D7E09C32AA3244C38D464BCF0CADFC466419554263BD21E408A6A02710B730483B48618A63566E0 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojbLwCnVtyrif/nSpe6IlZyQ== X-Mailru-Sender: 689FA8AB762F73936BC43F508A063822A90691ECE646D39198EA164305C06D05DD675A873A6B1A573284F99205A65E8EFB559BB5D741EB966AABCD5B59A9F6DF9ABAAAF6BC5F075B67EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v6 1/2] base64: fix decoder output buffer overrun (reads) X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Nikiforov via Tarantool-patches Reply-To: Sergey Nikiforov Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On 21.01.2021 5:16, Alexander Turenko wrote: > On Mon, Jan 11, 2021 at 12:45:00PM +0300, Sergey Nikiforov wrote: >> Was caught by base64 test with enabled ASAN. > > It seems, we have a problem in CI, otherwise it would be detected. At > least, I don't see an explicit suppression relevant to the base64 code > or disabling the test under this CI job. > > Can you, please, investigate, how to enable it in CI, so we'll catch the > similar problem next time if it'll appear? ASAN is not used in CI now. Which is clearly wrong. Right now a lot of tests fail if LeakSanitizer is enabled (the default for ASAN), but only 1 test (unit/guard.test) fails if LeakSanitizer if disabled. So it is quite straightforward: CC=clang CXX=clang++ cmake . -DENABLE_ASAN=ON && make -j ASAN_OPTIONS=detect_leaks=0 test/test-run.py (test-run.py is launched from several Makefiles) I propose creating tasks to make unit/guard.test "ASAN-tolerant" (ASAN prints warning which causes .result mismatch) and to add ASAN targets to CI. Should it be GitLab or GitHub Actions? We should probably also look on LeakSanitizer issues, some of them are probably real bugs and not just tests sloppiness. >> >> It also caused data corruption - garbage instead of "extra bits" was >> saved into state->result if there was no space in output buffer. > > We have the dead code and it appears to be broken. Why don't remove it? > (AFAIS, the rest of the code does not read off the buffer.) > > Is it due to a little probability that we'll decide to support chunked > decoding and we'll decide to implement it in exactly this way (and not > just leaving undecoded bytes in, say, ibuf)? > > Another side of this little probability is: > > * The code complexity and so waste of the time for anyone who need to > dive into it. > * Have untested code in the code base that may give us more surprises. > * Extra def-use dependencies may hide optimization opportunities and > increase register pressure. And yet you are youself proposing to improve performance: "entirely eliminate the output buffer lengthchecks for the first (out_len * 3 / 4) input bytes" (quoted from your e-mail about 2/2). This means saving state and reporting input buffer stop position. So: do we want complexity (and performance) or simplicity? > This is not the question regarding the patch, but this code looks broken > in several other ways. At least: > > * It skips unrecognized symbols and does not report a decoding error. Some of these symbols should "legally" be skipped - like newlines in e-mail use case. And there are probably other use cases which use some other symbols. We could break something and gain nothing. Nothing prevents somebody from corrupting "legal" base64 characters, this is not detected nor should be. These issues are outside base64 decoder scope, CRCs or digital signatures should be used when data can be accidentally or intentionally corrupted. > * If the output buffer is too short, it neither report an error, nor a > required buffer length (like snprintf()). No way to distinguish a > successful and an 'interrupted' processing. Here I agree. There was no output buffer length checks in code imported into Tarantool. That is clearly stopgap to prevent buffer overrun. Summary: I propose commiting this particular patch (1/2) "as is" (it was posted unmodified several times already) and discussing performance patch (2/2) a little further. >> >> Added test for "zero-sized output buffer" case. > > Nice catch. > >> >> Fixes: #3069 >> --- >> >> Branch: https://github.com/tarantool/tarantool/tree/void234/gh-3069-fix-base64-memory-overrun-v6 >> Issue: https://github.com/tarantool/tarantool/issues/3069