[tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue
Nikita Tatunov
n.tatunov at tarantool.org
Tue Sep 11 09:06:22 MSK 2018
> On 11 Sep 2018, at 01:20, Alex Khatskevich <avkhatskevich at tarantool.org> wrote:
>
>>
>>
>>> On 17 Aug 2018, at 14:42, Alex Khatskevich <avkhatskevich at tarantool.org> wrote:
>>>
>>>
>>> On 17.08.2018 14:17, Alexander Turenko wrote:
>>>> 0xffff is the result of 'end of a string' check as well as internal buffer
>>>> overflow error. I have the relevant code pasted in the first review of
>>>> the patch (July, 18).
>>>>
>>>> // source/common/ucnv.c::ucnv_getNextUChar
>>>> s=*source;
>>>> if(sourceLimit<s) {
>>>>     *err=U_ILLEGAL_ARGUMENT_ERROR;
>>>>     return 0xffff;
>>>> }
>>>>
>>>> We should not handle the buffer overflow case as an invalid symbol. Of
>>>> course, we should not handle it as the 'end of the string' situation either.
>>>> Ideally we should perform the pointer check ourselves and raise an error in
>>>> case of 0xffff. I had thought that a buffer overflow error is unlikely to
>>>> occur, but you are right: we should differentiate these situations.
>>>>
>>>> In one of the previous versions of the patch we performed this check like so:
>>>>
>>>> #define Utf8Read(s, e) (((s) < (e)) ?\
>>>> ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
>>>>
>>>> I am not sure why it was changed. Maybe it was an attempt to correctly
>>>> handle the '\0' symbol (it is a valid Unicode character)?
>>> The define you have pasted can return 0xffff.
>>> The reasons to change it back are described in the previous patchset.
>>> In short:
>>> 1. It is equivalent to
>>> a. check s < e in a while loop
>>> b. read the next character inside the while loop body.
>>> 2. In some usages of the code this check (s<e) was redundant (it was performed a couple of lines above)
>>> 3. There is no reason to rewrite the old version of this function. (So, we decided to use the old version of the function)
>>>> So I see three ways to proceed:
>>>>
>>>> 1. Lean on icu's check and ignore possibility of the buffer overflow.
>>>> 2. Use our own check and possibly meet '\0' problems.
>>>> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a string, raise
>>>> the error for other 0xffff.
>>>>
>>>> Alex, what do you suggest here?
>>> As I understand it, 0xffff is currently used ONLY to handle the case of an unexpectedly truncated symbol.
>>> E.g. some symbol consists of 2 characters, but the length of the input buffer is 1.
>>> In my opinion this is the same as an invalid symbol.
>>>
>>> I guess that an internal buffer overflow cannot occur in the `ucnv_getNextUChar` function.
>>>
>>> I suppose that it is Nikita's duty to investigate this problem and explain it to us all. I have just noticed a strange usage.
>>
>>
>> Hello, please consider my comments.
>>
>> There are some cases when 0xffff can occur, but:
>> 1) Cannot trigger in our context.
>> 2) Cannot trigger in our context.
>> 3) Only triggers if end < start. (Cannot happen in sql_utf8_pattern_compare, I guess)
>> 4) Only triggers if string length > (size_t) 0x7fffffff (can it actually happen? I don’t think so).
>> 5) Occurs when trying to access unindexed data.
>> 6) Cannot occur in our context.
>> 7) Cannot occur in our context.
> I do not understand what those numbers relate to. Please describe them.
They correspond to the possible cases in which 0xffff is returned in the ICU source code (function ucnv_getNextUChar()).
>>
>> 0xfffd only means that symbol cannot be treated as a unicode symbol.
>>
>> Shall I change it somehow then?
>>
>>
>>> On 17 Aug 2018, at 12:23, Alex Khatskevich <avkhatskevich at tarantool.org> wrote:
>>>
>>> I had a look at the ICU code and it seems like 0xffff is an error, and it is more similar to
>>> an invalid symbol than to "end of string". Check it, and fix the code, so that it is treated as
>>> an error.
>>> For example it is not handled in the main pattern loop:
>>>
>>> + while (pattern < pattern_end) {
>>> c = Utf8Read(pattern, pattern_end);
>>> + if (c == SQL_INVALID_UTF8_SYMBOL)
>>> + return SQL_INVALID_PATTERN;
>>>
>>> It seems like the 0xffff should be checked there too.
>>
>> No, it should not. That would only introduce a bug: for example, ’select “” like “”’
>> would be treated as an error.
> I do not understand.
> ’select “” like “”’ should not even enter the while loop
> (because `pattern < pattern_end` is false).
Ah, you’re right, sorry. Then it just doesn’t matter, since the pattern < pattern_end
check already covers the 0xffff case according to the comment above.
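
To illustrate the loop-guard point, here is a minimal sketch. It is not the
patch code: read_symbol(), count_symbols(), the enum and both macros are
hypothetical names, and the decoder is a simplified stand-in for
ucnv_getNextUChar() that only understands 1- and 2-byte UTF-8 sequences
(no overlong-form checks). It only shows that with the `s < end` guard the
reader is never called on an empty range, and that '\0' is read as an
ordinary symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they are not taken from the patch. */
enum read_status { READ_OK, READ_END, READ_INVALID };

#define INVALID_SYMBOL 0xfffd	/* malformed or truncated sequence */
#define NO_SYMBOL      0xffff	/* nothing left to read */

/*
 * Simplified stand-in for ucnv_getNextUChar(): decodes one symbol
 * from [*s, end) and advances *s. Handles 1- and 2-byte sequences only.
 */
static uint32_t
read_symbol(const char **s, const char *end, enum read_status *st)
{
	if (*s >= end) {
		/* Empty range: reported separately from invalid input. */
		*st = READ_END;
		return NO_SYMBOL;
	}
	unsigned char c = (unsigned char)*(*s)++;
	if (c < 0x80) {
		/* ASCII, including '\0', is a valid symbol. */
		*st = READ_OK;
		return c;
	}
	if ((c & 0xe0) == 0xc0 && *s < end) {
		unsigned char c2 = (unsigned char)**s;
		if ((c2 & 0xc0) == 0x80) {
			(*s)++;
			*st = READ_OK;
			return ((uint32_t)(c & 0x1f) << 6) | (c2 & 0x3f);
		}
	}
	/* Truncated or malformed sequence. */
	*st = READ_INVALID;
	return INVALID_SYMBOL;
}

/*
 * Counts symbols in [s, end). Returns -1 on invalid input.
 * The loop guard guarantees read_symbol() never sees an empty range,
 * so READ_END cannot be observed inside the loop body.
 */
static int
count_symbols(const char *s, const char *end)
{
	int n = 0;
	while (s < end) {
		enum read_status st;
		(void)read_symbol(&s, end, &st);
		if (st != READ_OK)
			return -1;
		n++;
	}
	return n;
}
```

With an empty string the loop body is skipped entirely, so an empty pattern
is not reported as an error, while a 2-byte symbol cut off after its first
byte is reported as invalid.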
--
WBR, Nikita Tatunov.
n.tatunov at tarantool.org