From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 0FDBB45C304 for ; Thu, 17 Dec 2020 17:53:26 +0300 (MSK) References: <20201215142527.560937-1-void@tarantool.org> From: Leonid Vasiliev Message-ID: Date: Thu, 17 Dec 2020 17:52:25 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH v2] base64: Properly ignore invalid characters List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Nikiforov , tarantool-patches@dev.tarantool.org Cc: Vladislav Shpilevoy Hi! On 17.12.2020 16:04, Sergey Nikiforov wrote: > Hi! > > On 17.12.2020 12:41, Leonid Vasiliev wrote: >> Hi! Thank you for the patch. >> Generally LGTM. >> See some comments below: >> >> According to >> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message >> >> "Description is what the patch does, started from lowercase letter, >> without a dot in the end, in the imperative mood." >> ("properly..."). >> I could be wrong, but it seems like is not written in an >> imperative mood. > > Mmm. I am not that great in English, but how "Properly ignore..." is not > "imperative mood"? What would you suggest? > I should, however, use lowercase (alas, force of habit). I will refer to my phrase:"I could be wrong") Leave it as is. > >> On 15.12.2020 17:25, Sergey Nikiforov wrote: >>> Not all invalid characters were ignored by base64 decoder >>> causing data corruption and reads beyond decode table >>> (faults under ASAN). >>> >>> Added corresponding check into base64 unit test. >>> >>> Fixes: #5627 >>> --- >>> >>> Branch: >>> https://github.com/tarantool/tarantool/tree/void234/gh-5627-fix-base64-invalid-chars-processing >>> >>> Issue: https://github.com/tarantool/tarantool/issues/5627 >>> >>>   test/unit/base64.c      | 23 ++++++++++++++++++++++- >>>   test/unit/base64.result |  5 ++++- >>>   third_party/base64.c    |  3 ++- >>>   3 files changed, 28 insertions(+), 3 deletions(-) >>> >>> diff --git a/test/unit/base64.c b/test/unit/base64.c >>> index ada497adf..c0f53a5e1 100644 >>> --- a/test/unit/base64.c >>> +++ b/test/unit/base64.c >>> @@ -58,9 +58,28 @@ base64_nowrap_test(const char *str) >>>       base64_test(str, BASE64_NOWRAP, symbols, lengthof(symbols)); >>>   } >>> +static void >>> +base64_invalid_chars_test(void) >>> +{ >>> +    /* Upper bit must be cleared */ >>> +    const char invalid_data[] = { '\x7b', '\x7c', '\x7d', '\x7e' }; >>> +    char outbuf[8]; >>> + >>> +    plan(1); >> >> Usually `plan ()` is called as the first call in a function. It's just >> easier to see how many checks there will be. I don't know any rule about >> this. So, it's up to you. > > I would move it if you like (if there would be another patch revision). > I have just tried to be C89-friendly. Force of habit (useful one). > Ok. >>> + >>> +    /* Invalid chars should be ignored, not decoded into garbage */ >>> +    is(base64_decode(invalid_data, sizeof(invalid_data), >>> +                     outbuf, sizeof(outbuf)), >>> +       0, "ignoring invalid chars"); >>> + >>> +    check_plan(); >>> +} >>> + >>>   int main(int argc, char *argv[]) >>>   { >>> -    plan(28); >>> +    plan(28 >>> +         + 1 /* invalid chars test */ >>> +         ); >> >> I agree with Vlad. Why `+ 1` and not just 29? > > Using "magic" values without explanation is a bad idea for readability. > Should I "decode" how 28 was calculated as well (using constants where > appropriate) or no one bothers so much with tests? AFAIK in `plan()` we always just write the "total" number of checks. > >>>       header(); >>>       const char *option_tests[] = { >>> @@ -78,6 +97,8 @@ int main(int argc, char *argv[]) >>>           base64_nowrap_test(option_tests[i]); >>>       } >>> +    base64_invalid_chars_test(); >>> + >>>       footer(); >>>       return check_plan(); >>>   } >>> diff --git a/test/unit/base64.result b/test/unit/base64.result >>> index cd1f2b3f6..3bc2c2275 100644 >>> --- a/test/unit/base64.result >>> +++ b/test/unit/base64.result >>> @@ -1,4 +1,4 @@ >>> -1..28 >>> +1..29 >>>       *** main *** >>>       1..3 >>>       ok 1 - length >>> @@ -175,4 +175,7 @@ ok 27 - subtests >>>       ok 3 - decode length ok >>>       ok 4 - encode/decode >>>   ok 28 - subtests >>> +    1..1 >>> +    ok 1 - ignoring invalid chars >>> +ok 29 - subtests >>>       *** main: done *** >>> diff --git a/third_party/base64.c b/third_party/base64.c >>> index 8ecab23eb..7c69315ea 100644 >>> --- a/third_party/base64.c >>> +++ b/third_party/base64.c >>> @@ -222,7 +222,8 @@ base64_decode_value(int value) >>>           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, >>>           44, 45, 46, 47, 48, 49, 50, 51 >>>       }; >>> -    static const int decoding_size = sizeof(decoding); >>> +    static const int decoding_size = >>> +        sizeof(decoding) / sizeof(decoding[0]); >>>       int codepos = value; >>>       codepos -= 43; >>>       if (codepos < 0 || codepos >= decoding_size) >>>