>
>> On 17 Aug 2018, at 14:42, Alex Khatskevich 
>> <avkhatskevich@tarantool.org <mailto:avkhatskevich@tarantool.org>> wrote:
>>
>>
>> On 17.08.2018 14:17, Alexander Turenko wrote:
>>> 0xffff is the result of 'end of a string' check as well as internal 
>>> buffer
>>> overflow error. I have the relevant code pasted in the first review of
>>> the patch (July, 18).
>>>
>>> // source/common/ucnv.c::ucnv_getNextUChar
>>> 1860     s=*source;
>>> 1861     if(sourceLimit<s) {
>>> 1862         *err=U_ILLEGAL_ARGUMENT_ERROR;
>>> 1863         return 0xffff;
>>> 1864     }
>>>
>>> We should not handle the buffer overflow case as an invalid symbol. Of
>>> course we should not handle it as the 'end of the string' situation.
>>> Ideally we should perform pointer myself and raise an error in case of
>>> 0xffff. I had thought that a buffer overflow error is unlikely to meet,
>>> but you are right: we should differentiate these situations.
>>>
>>> In one of the previous version of a patch we perform this check like so:
>>>
>>> #define Utf8Read(s, e) (((s) < (e)) ?\
>>> ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
>>>
>>> Don't sure why it was changed. Maybe it is try to correctly handle '\0'
>>> symbol (it is valid unicode character)?
>> The define you have pasted can return 0xffff.
>> The reasons to change it back are described in the previous patchset.
>> In short:
>> 1. It is equivalent to
>>    a. check s < e in a while loop
>>    b. read next character inside of where loop body.
>> 2. In some usages of the code this check (s<e) was redundant (it was 
>> performed a couple lines above)
>> 3. There is no reason to rewrite the old version of this function. 
>> (So, we decided to use old version of the function)
>>> So I see two ways to proceed:
>>>
>>> 1. Lean on icu's check and ignore possibility of the buffer overflow.
>>> 2. Use our own check and possibly meet '\0' problems.
>>> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a string, raise
>>>    the error for other 0xffff.
>>>
>>> Alex, what do you suggests here?
>> As I understand, by now the 0xffff is used ONLY to handle the case of 
>> unexpectedly ended symbol.
>> E.g. some symbol consists of 2 characters, but the length of the 
>> input buffer is 1.
>> In my opinion this is the same as an invalid symbol.
>>
>> I guess that internal buffer overflow cannot occur in the 
>> `ucnv_getNextChar` function.
>>
>> I suppose that it is Nikitas duty to investigate this problem and 
>> explain it to us all. I just have noticed a strange usage.
>
> Hello, please consider my comments.
>
> There are some cases when 0xffff can occur, but:
> 1) Cannot trigger in our context.
> 2) Cannot trigger in our context.
> 3) Only triggers if end < start. (Cannot happen in 
> sql_utf8_pattern_compare, i guess)
> 4) Only triggers if string length > (size_t) 0x7ffffffff (can it 
> actually happen? I don’t think so).
> 5) Occurs when trying to access to not unindexed data.
> 6) Cannot occur in our context.
> 7) Cannot occur in our context.
I do not understand what are those numbers related to. Please, describe it.
>
> 0xfffd only means that symbol cannot be treated as a unicode symbol.
>
> Shall I change it somehow then?
>
>
>> On 17 Aug 2018, at 12:23, Alex Khatskevich 
>> <avkhatskevich@tarantool.org <mailto:avkhatskevich@tarantool.org>> wrote:
>>
>> I have a look at icu code and It seems like 0xffff is an error, and 
>> it is more similar to
>> invalid symbol that to "end of string". Check it, and fix the code, 
>> so that it is treated as
>> an error.
>> For example it is not handled in the main pattern loop:
>>
>> +while (pattern < pattern_end) {
>> c = Utf8Read(pattern, pattern_end);
>> +if (c == SQL_INVALID_UTF8_SYMBOL)
>> +return SQL_INVALID_PATTERN;
>>
>> It seems like the 0xffff should be checked there too.
>
> No, it should not. This way it will only cause a bug when, for example 
> ’select “” like “”’
> will be treated as an error.
I do not understand.
’select “” like “”’ should not even trap inside of the while loop
(because `pattern < pattern_end` is false).