From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id BB25D491189 for ; Thu, 17 Dec 2020 16:04:59 +0300 (MSK) References: <20201215142527.560937-1-void@tarantool.org> From: Sergey Nikiforov Message-ID: Date: Thu, 17 Dec 2020 16:04:58 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH v2] base64: Properly ignore invalid characters List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Leonid Vasiliev , tarantool-patches@dev.tarantool.org Cc: Vladislav Shpilevoy Hi! On 17.12.2020 12:41, Leonid Vasiliev wrote: > Hi! Thank you for the patch. > Generally LGTM. > See some comments below: > > According to > https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message > > "Description is what the patch does, started from lowercase letter, > without a dot in the end, in the imperative mood." > ("properly..."). > I could be wrong, but it seems like is not written in an > imperative mood. Mmm. I am not that great in English, but how "Properly ignore..." is not "imperative mood"? What would you suggest? I should, however, use lowercase (alas, force of habit). > On 15.12.2020 17:25, Sergey Nikiforov wrote: >> Not all invalid characters were ignored by base64 decoder >> causing data corruption and reads beyond decode table >> (faults under ASAN). >> >> Added corresponding check into base64 unit test. >> >> Fixes: #5627 >> --- >> >> Branch: >> https://github.com/tarantool/tarantool/tree/void234/gh-5627-fix-base64-invalid-chars-processing >> >> Issue: https://github.com/tarantool/tarantool/issues/5627 >> >>   test/unit/base64.c      | 23 ++++++++++++++++++++++- >>   test/unit/base64.result |  5 ++++- >>   third_party/base64.c    |  3 ++- >>   3 files changed, 28 insertions(+), 3 deletions(-) >> >> diff --git a/test/unit/base64.c b/test/unit/base64.c >> index ada497adf..c0f53a5e1 100644 >> --- a/test/unit/base64.c >> +++ b/test/unit/base64.c >> @@ -58,9 +58,28 @@ base64_nowrap_test(const char *str) >>       base64_test(str, BASE64_NOWRAP, symbols, lengthof(symbols)); >>   } >> +static void >> +base64_invalid_chars_test(void) >> +{ >> +    /* Upper bit must be cleared */ >> +    const char invalid_data[] = { '\x7b', '\x7c', '\x7d', '\x7e' }; >> +    char outbuf[8]; >> + >> +    plan(1); > > Usually `plan ()` is called as the first call in a function. It's just > easier to see how many checks there will be. I don't know any rule about > this. So, it's up to you. I would move it if you like (if there would be another patch revision). I have just tried to be C89-friendly. Force of habit (useful one). >> + >> +    /* Invalid chars should be ignored, not decoded into garbage */ >> +    is(base64_decode(invalid_data, sizeof(invalid_data), >> +                     outbuf, sizeof(outbuf)), >> +       0, "ignoring invalid chars"); >> + >> +    check_plan(); >> +} >> + >>   int main(int argc, char *argv[]) >>   { >> -    plan(28); >> +    plan(28 >> +         + 1 /* invalid chars test */ >> +         ); > > I agree with Vlad. Why `+ 1` and not just 29? Using "magic" values without explanation is a bad idea for readability. Should I "decode" how 28 was calculated as well (using constants where appropriate) or no one bothers so much with tests? >>       header(); >>       const char *option_tests[] = { >> @@ -78,6 +97,8 @@ int main(int argc, char *argv[]) >>           base64_nowrap_test(option_tests[i]); >>       } >> +    base64_invalid_chars_test(); >> + >>       footer(); >>       return check_plan(); >>   } >> diff --git a/test/unit/base64.result b/test/unit/base64.result >> index cd1f2b3f6..3bc2c2275 100644 >> --- a/test/unit/base64.result >> +++ b/test/unit/base64.result >> @@ -1,4 +1,4 @@ >> -1..28 >> +1..29 >>       *** main *** >>       1..3 >>       ok 1 - length >> @@ -175,4 +175,7 @@ ok 27 - subtests >>       ok 3 - decode length ok >>       ok 4 - encode/decode >>   ok 28 - subtests >> +    1..1 >> +    ok 1 - ignoring invalid chars >> +ok 29 - subtests >>       *** main: done *** >> diff --git a/third_party/base64.c b/third_party/base64.c >> index 8ecab23eb..7c69315ea 100644 >> --- a/third_party/base64.c >> +++ b/third_party/base64.c >> @@ -222,7 +222,8 @@ base64_decode_value(int value) >>           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, >>           44, 45, 46, 47, 48, 49, 50, 51 >>       }; >> -    static const int decoding_size = sizeof(decoding); >> +    static const int decoding_size = >> +        sizeof(decoding) / sizeof(decoding[0]); >>       int codepos = value; >>       codepos -= 43; >>       if (codepos < 0 || codepos >= decoding_size) >>