From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp50.i.mail.ru (smtp50.i.mail.ru [94.100.177.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 910434765E0 for ; Wed, 23 Dec 2020 15:18:04 +0300 (MSK) References: <20201215142527.560937-1-void@tarantool.org> From: Leonid Vasiliev Message-ID: <3567727f-90ae-0309-1cd7-1c726822e7d2@tarantool.org> Date: Wed, 23 Dec 2020 15:17:59 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH v2] base64: Properly ignore invalid characters List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Nikiforov , tarantool-patches@dev.tarantool.org Cc: Vladislav Shpilevoy Hi! Just to be clear. Perhaps before I have not clearly expressed my opinion. If you ok with 29 instead ``` + plan(28 + + 1 /* invalid chars test */ + ); ``` then LGTM. On 17.12.2020 17:52, Leonid Vasiliev via Tarantool-patches wrote: > Hi! > > On 17.12.2020 16:04, Sergey Nikiforov wrote: >> Hi! >> >> On 17.12.2020 12:41, Leonid Vasiliev wrote: >>> Hi! Thank you for the patch. >>> Generally LGTM. >>> See some comments below: >>> >>> According to >>> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message >>> >>> "Description is what the patch does, started from lowercase letter, >>> without a dot in the end, in the imperative mood." >>> ("properly..."). >>> I could be wrong, but it seems like is not written in an >>> imperative mood. >> >> Mmm. I am not that great in English, but how "Properly ignore..." is >> not "imperative mood"? What would you suggest? >> I should, however, use lowercase (alas, force of habit). > > I will refer to my phrase:"I could be wrong") Leave it as is. > >> >>> On 15.12.2020 17:25, Sergey Nikiforov wrote: >>>> Not all invalid characters were ignored by base64 decoder >>>> causing data corruption and reads beyond decode table >>>> (faults under ASAN). >>>> >>>> Added corresponding check into base64 unit test. >>>> >>>> Fixes: #5627 >>>> --- >>>> >>>> Branch: >>>> https://github.com/tarantool/tarantool/tree/void234/gh-5627-fix-base64-invalid-chars-processing >>>> >>>> Issue: https://github.com/tarantool/tarantool/issues/5627 >>>> >>>>   test/unit/base64.c      | 23 ++++++++++++++++++++++- >>>>   test/unit/base64.result |  5 ++++- >>>>   third_party/base64.c    |  3 ++- >>>>   3 files changed, 28 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/test/unit/base64.c b/test/unit/base64.c >>>> index ada497adf..c0f53a5e1 100644 >>>> --- a/test/unit/base64.c >>>> +++ b/test/unit/base64.c >>>> @@ -58,9 +58,28 @@ base64_nowrap_test(const char *str) >>>>       base64_test(str, BASE64_NOWRAP, symbols, lengthof(symbols)); >>>>   } >>>> +static void >>>> +base64_invalid_chars_test(void) >>>> +{ >>>> +    /* Upper bit must be cleared */ >>>> +    const char invalid_data[] = { '\x7b', '\x7c', '\x7d', '\x7e' }; >>>> +    char outbuf[8]; >>>> + >>>> +    plan(1); >>> >>> Usually `plan ()` is called as the first call in a function. It's just >>> easier to see how many checks there will be. I don't know any rule about >>> this. So, it's up to you. >> >> I would move it if you like (if there would be another patch >> revision). I have just tried to be C89-friendly. Force of habit >> (useful one). >> > > Ok. > >>>> + >>>> +    /* Invalid chars should be ignored, not decoded into garbage */ >>>> +    is(base64_decode(invalid_data, sizeof(invalid_data), >>>> +                     outbuf, sizeof(outbuf)), >>>> +       0, "ignoring invalid chars"); >>>> + >>>> +    check_plan(); >>>> +} >>>> + >>>>   int main(int argc, char *argv[]) >>>>   { >>>> -    plan(28); >>>> +    plan(28 >>>> +         + 1 /* invalid chars test */ >>>> +         ); >>> >>> I agree with Vlad. Why `+ 1` and not just 29? >> >> Using "magic" values without explanation is a bad idea for >> readability. Should I "decode" how 28 was calculated as well (using >> constants where appropriate) or no one bothers so much with tests? > > AFAIK in `plan()` we always just write the "total" number of checks. > >> >>>>       header(); >>>>       const char *option_tests[] = { >>>> @@ -78,6 +97,8 @@ int main(int argc, char *argv[]) >>>>           base64_nowrap_test(option_tests[i]); >>>>       } >>>> +    base64_invalid_chars_test(); >>>> + >>>>       footer(); >>>>       return check_plan(); >>>>   } >>>> diff --git a/test/unit/base64.result b/test/unit/base64.result >>>> index cd1f2b3f6..3bc2c2275 100644 >>>> --- a/test/unit/base64.result >>>> +++ b/test/unit/base64.result >>>> @@ -1,4 +1,4 @@ >>>> -1..28 >>>> +1..29 >>>>       *** main *** >>>>       1..3 >>>>       ok 1 - length >>>> @@ -175,4 +175,7 @@ ok 27 - subtests >>>>       ok 3 - decode length ok >>>>       ok 4 - encode/decode >>>>   ok 28 - subtests >>>> +    1..1 >>>> +    ok 1 - ignoring invalid chars >>>> +ok 29 - subtests >>>>       *** main: done *** >>>> diff --git a/third_party/base64.c b/third_party/base64.c >>>> index 8ecab23eb..7c69315ea 100644 >>>> --- a/third_party/base64.c >>>> +++ b/third_party/base64.c >>>> @@ -222,7 +222,8 @@ base64_decode_value(int value) >>>>           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, >>>>           44, 45, 46, 47, 48, 49, 50, 51 >>>>       }; >>>> -    static const int decoding_size = sizeof(decoding); >>>> +    static const int decoding_size = >>>> +        sizeof(decoding) / sizeof(decoding[0]); >>>>       int codepos = value; >>>>       codepos -= 43; >>>>       if (codepos < 0 || codepos >= decoding_size) >>>>