From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <lvasiliev@tarantool.org>
Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 0FDBB45C304
 for <tarantool-patches@dev.tarantool.org>;
 Thu, 17 Dec 2020 17:53:26 +0300 (MSK)
References: <20201215142527.560937-1-void@tarantool.org>
 <e129fc18-a7b0-3e7f-35eb-caff5b4eadc2@tarantool.org>
 <bac931d0-cec9-4a8d-72ec-8173bba5920f@tarantool.org>
From: Leonid Vasiliev <lvasiliev@tarantool.org>
Message-ID: <c1008f58-b572-877b-0e5a-61d1cdd41f47@tarantool.org>
Date: Thu, 17 Dec 2020 17:52:25 +0300
MIME-Version: 1.0
In-Reply-To: <bac931d0-cec9-4a8d-72ec-8173bba5920f@tarantool.org>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [Tarantool-patches] [PATCH v2] base64: Properly ignore invalid
	characters
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Sergey Nikiforov <void@tarantool.org>, tarantool-patches@dev.tarantool.org
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>

Hi!

On 17.12.2020 16:04, Sergey Nikiforov wrote:
> Hi!
> 
> On 17.12.2020 12:41, Leonid Vasiliev wrote:
>> Hi! Thank you for the patch.
>> Generally LGTM.
>> See some comments below:
>>
>> According to 
>> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message 
>>
>> "Description is what the patch does, started from lowercase letter,
>> without a dot in the end, in the imperative mood."
>> ("properly...").
>> I could be wrong, but it seems like <description> is not written in an
>> imperative mood.
> 
> Mmm. I am not that great in English, but how "Properly ignore..." is not 
> "imperative mood"? What would you suggest?
> I should, however, use lowercase (alas, force of habit).

I will refer to my phrase:"I could be wrong") Leave it as is.

> 
>> On 15.12.2020 17:25, Sergey Nikiforov wrote:
>>> Not all invalid characters were ignored by base64 decoder
>>> causing data corruption and reads beyond decode table
>>> (faults under ASAN).
>>>
>>> Added corresponding check into base64 unit test.
>>>
>>> Fixes: #5627
>>> ---
>>>
>>> Branch: 
>>> https://github.com/tarantool/tarantool/tree/void234/gh-5627-fix-base64-invalid-chars-processing 
>>>
>>> Issue: https://github.com/tarantool/tarantool/issues/5627
>>>
>>>   test/unit/base64.c      | 23 ++++++++++++++++++++++-
>>>   test/unit/base64.result |  5 ++++-
>>>   third_party/base64.c    |  3 ++-
>>>   3 files changed, 28 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/test/unit/base64.c b/test/unit/base64.c
>>> index ada497adf..c0f53a5e1 100644
>>> --- a/test/unit/base64.c
>>> +++ b/test/unit/base64.c
>>> @@ -58,9 +58,28 @@ base64_nowrap_test(const char *str)
>>>       base64_test(str, BASE64_NOWRAP, symbols, lengthof(symbols));
>>>   }
>>> +static void
>>> +base64_invalid_chars_test(void)
>>> +{
>>> +    /* Upper bit must be cleared */
>>> +    const char invalid_data[] = { '\x7b', '\x7c', '\x7d', '\x7e' };
>>> +    char outbuf[8];
>>> +
>>> +    plan(1);
>>
>> Usually `plan ()` is called as the first call in a function. It's just
>> easier to see how many checks there will be. I don't know any rule about
>> this. So, it's up to you.
> 
> I would move it if you like (if there would be another patch revision). 
> I have just tried to be C89-friendly. Force of habit (useful one).
> 

Ok.

>>> +
>>> +    /* Invalid chars should be ignored, not decoded into garbage */
>>> +    is(base64_decode(invalid_data, sizeof(invalid_data),
>>> +                     outbuf, sizeof(outbuf)),
>>> +       0, "ignoring invalid chars");
>>> +
>>> +    check_plan();
>>> +}
>>> +
>>>   int main(int argc, char *argv[])
>>>   {
>>> -    plan(28);
>>> +    plan(28
>>> +         + 1 /* invalid chars test */
>>> +         );
>>
>> I agree with Vlad. Why `+ 1` and not just 29?
> 
> Using "magic" values without explanation is a bad idea for readability. 
> Should I "decode" how 28 was calculated as well (using constants where 
> appropriate) or no one bothers so much with tests?

AFAIK in `plan()` we always just write the "total" number of checks.

> 
>>>       header();
>>>       const char *option_tests[] = {
>>> @@ -78,6 +97,8 @@ int main(int argc, char *argv[])
>>>           base64_nowrap_test(option_tests[i]);
>>>       }
>>> +    base64_invalid_chars_test();
>>> +
>>>       footer();
>>>       return check_plan();
>>>   }
>>> diff --git a/test/unit/base64.result b/test/unit/base64.result
>>> index cd1f2b3f6..3bc2c2275 100644
>>> --- a/test/unit/base64.result
>>> +++ b/test/unit/base64.result
>>> @@ -1,4 +1,4 @@
>>> -1..28
>>> +1..29
>>>       *** main ***
>>>       1..3
>>>       ok 1 - length
>>> @@ -175,4 +175,7 @@ ok 27 - subtests
>>>       ok 3 - decode length ok
>>>       ok 4 - encode/decode
>>>   ok 28 - subtests
>>> +    1..1
>>> +    ok 1 - ignoring invalid chars
>>> +ok 29 - subtests
>>>       *** main: done ***
>>> diff --git a/third_party/base64.c b/third_party/base64.c
>>> index 8ecab23eb..7c69315ea 100644
>>> --- a/third_party/base64.c
>>> +++ b/third_party/base64.c
>>> @@ -222,7 +222,8 @@ base64_decode_value(int value)
>>>           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
>>>           44, 45, 46, 47, 48, 49, 50, 51
>>>       };
>>> -    static const int decoding_size = sizeof(decoding);
>>> +    static const int decoding_size =
>>> +        sizeof(decoding) / sizeof(decoding[0]);
>>>       int codepos = value;
>>>       codepos -= 43;
>>>       if (codepos < 0 || codepos >= decoding_size)
>>>