Tarantool development patches archive
From: Alex Khatskevich <avkhatskevich@tarantool.org>
To: Nikita Tatunov <n.tatunov@tarantool.org>,
	tarantool-patches@freelists.org
Cc: Alexander Turenko <alexander.turenko@tarantool.org>,
	korablev@tarantool.org
Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue
Date: Tue, 11 Sep 2018 13:06:57 +0300
Message-ID: <860a125b-19f3-3bf1-8705-25156ff508ab@tarantool.org>
In-Reply-To: <58B407E2-AF5D-4531-A9FF-9DC57CE0070B@tarantool.org>

On 11.09.2018 09:06, Nikita Tatunov wrote:
>
>
>> On 11 Sep 2018, at 01:20, Alex Khatskevich <avkhatskevich@tarantool.org> wrote:
>>
>>>
>>>
>>>> On 17 Aug 2018, at 14:42, Alex Khatskevich <avkhatskevich@tarantool.org> wrote:
>>>>
>>>>
>>>> On 17.08.2018 14:17, Alexander Turenko wrote:
>>>>> 0xffff is the result of the 'end of a string' check as well as of an
>>>>> internal buffer overflow error. I pasted the relevant code in the first
>>>>> review of the patch (July, 18).
>>>>>
>>>>> // source/common/ucnv.c::ucnv_getNextUChar
>>>>>     s=*source;
>>>>>     if(sourceLimit<s) {
>>>>>         *err=U_ILLEGAL_ARGUMENT_ERROR;
>>>>>         return 0xffff;
>>>>>     }
>>>>>
>>>>> We should not handle the buffer overflow case as an invalid symbol. Of
>>>>> course, we should not handle it as the 'end of the string' situation
>>>>> either. Ideally we should perform the pointer check ourselves and raise
>>>>> an error in case of 0xffff. I had thought that a buffer overflow error
>>>>> is unlikely to occur, but you are right: we should differentiate these
>>>>> situations.
>>>>>
>>>>> In one of the previous versions of the patch we performed this check
>>>>> like so:
>>>>>
>>>>> #define Utf8Read(s, e) (((s) < (e)) ?\
>>>>> ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
>>>>>
>>>>> I am not sure why it was changed. Maybe it is an attempt to correctly
>>>>> handle the '\0' symbol (it is a valid Unicode character)?
>>>> The define you have pasted can return 0xffff.
>>>> The reasons to change it back are described in the previous patchset.
>>>> In short:
>>>> 1. It is equivalent to
>>>> a. checking s < e in a while loop
>>>> b. reading the next character inside of the while loop body.
>>>> 2. In some usages of the code this check (s<e) was redundant (it
>>>> was performed a couple of lines above)
>>>> 3. There is no reason to rewrite the old version of this function.
>>>> (So, we decided to use the old version of the function)
>>>>> So I see three ways to proceed:
>>>>>
>>>>> 1. Lean on ICU's check and ignore the possibility of the buffer overflow.
>>>>> 2. Use our own check and possibly meet '\0' problems.
>>>>> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat it as the end of a string,
>>>>>    and raise the error for any other 0xffff.
>>>>>
>>>>> Alex, what do you suggest here?
>>>> As I understand, by now 0xffff is used ONLY to handle the case
>>>> of an unexpectedly ended symbol.
>>>> E.g. some symbol consists of 2 characters, but the length of the
>>>> input buffer is 1.
>>>> In my opinion this is the same as an invalid symbol.
>>>>
>>>> I guess that an internal buffer overflow cannot occur in the
>>>> `ucnv_getNextUChar` function.
>>>>
>>>> I suppose that it is Nikita's duty to investigate this problem and
>>>> explain it to us all. I have just noticed a strange usage.
>>>
>>> Hello, please consider my comments.
>>>
>>> There are some cases when 0xffff can occur, but:
>>> 1) Cannot trigger in our context.
>>> 2) Cannot trigger in our context.
>>> 3) Only triggers if end < start. (Cannot happen in
>>> sql_utf8_pattern_compare, I guess)
>>> 4) Only triggers if string length > (size_t) 0x7fffffff (can it
>>> actually happen? I don't think so).
>>> 5) Occurs when trying to access to not unindexed data.
>>> 6) Cannot occur in our context.
>>> 7) Cannot occur in our context.
>> I do not understand what those numbers refer to. Please
>> describe them.
>
> They refer to the possible cases in which 0xffff is returned in the ICU
> source code (function ucnv_getNextUChar()).
Can you just copy it here, so that anyone interested in this conversation can
analyze it without looking for the source files?
>
>>>
>>> 0xfffd only means that symbol cannot be treated as a unicode symbol.
>>>
>>> Shall I change it somehow then?
>>>
>>>
>>>> On 17 Aug 2018, at 12:23, Alex Khatskevich <avkhatskevich@tarantool.org> wrote:
>>>>
>>>> I had a look at the ICU code, and it seems that 0xffff is an error that
>>>> is more similar to an invalid symbol than to the "end of string". Check
>>>> it, and fix the code so that it is treated as an error.
>>>> For example, it is not handled in the main pattern loop:
>>>>
>>>> +while (pattern < pattern_end) {
>>>> c = Utf8Read(pattern, pattern_end);
>>>> +if (c == SQL_INVALID_UTF8_SYMBOL)
>>>> +return SQL_INVALID_PATTERN;
>>>>
>>>> It seems like the 0xffff should be checked there too.
>>>
>>> No, it should not. This way it will only cause a bug: for example,
>>> 'select "" like ""' will be treated as an error.
>> I do not understand.
>> 'select "" like ""' should not even get inside the while loop
>> (because `pattern < pattern_end` is false).
>
> Ah, you're right, sorry, then it just doesn't matter, since the
> `pattern < pattern_end` check is equivalent to the 0xffff check,
> according to the comment above.
>
> --
> WBR, Nikita Tatunov.
> n.tatunov@tarantool.org
>
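
For illustration, here is a minimal sketch of the distinction discussed in
this thread: "end of string", "symbol cut off by the end of the buffer" and
"invalid symbol" are reported as separate statuses instead of being folded
into a single 0xffff return value. This is a hand-written, simplified UTF-8
reader, not ICU's ucnv_getNextUChar() and not the code from the patch. The
names utf8_read, count_symbols and enum read_status are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; they are not part of ICU or Tarantool. */
enum read_status {
	READ_OK = 0,
	READ_END,       /* s >= e: nothing left to read */
	READ_TRUNCATED, /* multi-byte symbol cut off by the end of the buffer */
	READ_INVALID    /* bytes do not form a UTF-8 sequence */
};

/*
 * Decode one code point from [*s, e) into *out and advance *s.
 * Simplified on purpose: overlong forms and surrogates are not rejected.
 */
static enum read_status
utf8_read(const unsigned char **s, const unsigned char *e, uint32_t *out)
{
	assert(s != NULL && *s != NULL && out != NULL);
	if (*s >= e)
		return READ_END;
	unsigned char b = **s;
	int len;
	uint32_t c;
	if (b < 0x80) {
		len = 1;
		c = b;
	} else if ((b & 0xE0) == 0xC0) {
		len = 2;
		c = b & 0x1F;
	} else if ((b & 0xF0) == 0xE0) {
		len = 3;
		c = b & 0x0F;
	} else if ((b & 0xF8) == 0xF0) {
		len = 4;
		c = b & 0x07;
	} else {
		/* Stray continuation byte or a byte that never starts a symbol. */
		(*s)++;
		return READ_INVALID;
	}
	if ((ptrdiff_t)(e - *s) < len) {
		/* The lead byte promises more bytes than the buffer holds. */
		*s = e;
		return READ_TRUNCATED;
	}
	for (int i = 1; i < len; i++) {
		unsigned char t = (*s)[i];
		if ((t & 0xC0) != 0x80) {
			*s += i;
			return READ_INVALID;
		}
		c = (c << 6) | (t & 0x3F);
	}
	*s += len;
	*out = c;
	return READ_OK;
}

/*
 * Count symbols in a buffer of len bytes. The s < e check lives in the
 * loop condition, so an empty string never enters the loop, and '\0' is
 * counted as an ordinary symbol. Returns -1 on malformed input.
 */
static int
count_symbols(const char *str, size_t len)
{
	const unsigned char *s = (const unsigned char *)str;
	const unsigned char *e = s + len;
	uint32_t c;
	int n = 0;
	while (s < e) {
		if (utf8_read(&s, e, &c) != READ_OK)
			return -1;
		n++;
	}
	return n;
}
```

With a reader shaped like this, the caller decides whether a truncated
symbol is reported as an invalid pattern or handled in some other way,
instead of inferring it from one sentinel value.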



Thread overview: 46+ messages
2018-08-16 17:00 [tarantool-patches] [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal N.Tatunov
2018-08-16 17:00 ` [tarantool-patches] [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue N.Tatunov
2018-08-17  9:23   ` [tarantool-patches] " Alex Khatskevich
2018-08-17 11:17     ` Alexander Turenko
2018-08-17 11:42       ` Alex Khatskevich
2018-09-09 13:33         ` Nikita Tatunov
2018-09-10 22:20           ` Alex Khatskevich
2018-09-11  6:06             ` Nikita Tatunov
2018-09-11 10:06               ` Alex Khatskevich [this message]
2018-09-11 13:31                 ` Nikita Tatunov
2018-10-18 18:02                   ` Nikita Tatunov
2018-10-21  3:51                     ` Alexander Turenko
2018-10-26 15:19                       ` Nikita Tatunov
2018-10-29 13:01                         ` Alexander Turenko
2018-10-31  5:25                           ` Nikita Tatunov
2018-11-01 10:30                             ` Alexander Turenko
2018-11-14 14:16                               ` n.pettik
2018-11-14 17:06                                 ` Alexander Turenko
2018-08-16 17:00 ` [tarantool-patches] [PATCH 2/2] sql: remove GLOB from Tarantool N.Tatunov
2018-08-17  8:25   ` [tarantool-patches] " Alex Khatskevich
2018-08-17  8:49     ` n.pettik
2018-08-17  9:01       ` Alex Khatskevich
2018-08-17  9:20         ` n.pettik
2018-08-17  9:28           ` Alex Khatskevich
     [not found]     ` <04D02794-07A5-4146-9144-84EE720C8656@corp.mail.ru>
2018-08-17  8:53       ` Alex Khatskevich
2018-08-17 11:26     ` Alexander Turenko
2018-08-17 11:34       ` Alexander Turenko
2018-08-17 13:46     ` Nikita Tatunov
2018-09-09 14:57     ` Nikita Tatunov
2018-09-10 22:06       ` Alex Khatskevich
2018-09-11  7:38         ` Nikita Tatunov
2018-09-11 10:11           ` Alexander Turenko
2018-09-11 10:22             ` Alex Khatskevich
2018-09-11 12:03           ` Alex Khatskevich
2018-10-18 20:28             ` Nikita Tatunov
2018-10-21  3:48               ` Alexander Turenko
2018-10-26 15:21                 ` Nikita Tatunov
2018-10-29 12:15                   ` Alexander Turenko
2018-11-08 15:09                     ` Nikita Tatunov
2018-11-09 12:18                       ` Alexander Turenko
2018-11-10  3:38                         ` Nikita Tatunov
2018-11-13 19:23                           ` Alexander Turenko
2018-11-14 14:16                             ` n.pettik
2018-11-14 17:41                               ` Alexander Turenko
2018-11-14 21:48                                 ` n.pettik
2018-11-15  4:57 ` [tarantool-patches] Re: [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal Kirill Yukhin
