From: Alex Khatskevich <avkhatskevich@tarantool.org> To: tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue Date: Tue, 11 Sep 2018 01:20:39 +0300 [thread overview] Message-ID: <6a1352e9-425c-d656-1bec-bb04d9f0fee6@tarantool.org> (raw) In-Reply-To: <87897608-173E-45EB-80A1-8B249706D8A1@tarantool.org> [-- Attachment #1: Type: text/plain, Size: 4139 bytes --] > >> On 17 Aug 2018, at 14:42, Alex Khatskevich >> <avkhatskevich@tarantool.org <mailto:avkhatskevich@tarantool.org>> wrote: >> >> >> On 17.08.2018 14:17, Alexander Turenko wrote: >>> 0xffff is the result of 'end of a string' check as well as internal >>> buffer >>> overflow error. I have the relevant code pasted in the first review of >>> the patch (July, 18). >>> >>> // source/common/ucnv.c::ucnv_getNextUChar >>> 1860 s=*source; >>> 1861 if(sourceLimit<s) { >>> 1862 *err=U_ILLEGAL_ARGUMENT_ERROR; >>> 1863 return 0xffff; >>> 1864 } >>> >>> We should not handle the buffer overflow case as an invalid symbol. Of >>> course we should not handle it as the 'end of the string' situation. >>> Ideally we should perform pointer myself and raise an error in case of >>> 0xffff. I had thought that a buffer overflow error is unlikely to meet, >>> but you are right: we should differentiate these situations. >>> >>> In one of the previous version of a patch we perform this check like so: >>> >>> #define Utf8Read(s, e) (((s) < (e)) ?\ >>> ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0) >>> >>> Don't sure why it was changed. Maybe it is try to correctly handle '\0' >>> symbol (it is valid unicode character)? >> The define you have pasted can return 0xffff. >> The reasons to change it back are described in the previous patchset. >> In short: >> 1. It is equivalent to >> a. check s < e in a while loop >> b. read next character inside of where loop body. >> 2. In some usages of the code this check (s<e) was redundant (it was >> performed a couple lines above) >> 3. There is no reason to rewrite the old version of this function. >> (So, we decided to use old version of the function) >>> So I see two ways to proceed: >>> >>> 1. Lean on icu's check and ignore possibility of the buffer overflow. >>> 2. Use our own check and possibly meet '\0' problems. >>> 3. Check for U_ILLEGAL_ARGUMENT_ERROR to treat as end of a string, raise >>> the error for other 0xffff. >>> >>> Alex, what do you suggests here? >> As I understand, by now the 0xffff is used ONLY to handle the case of >> unexpectedly ended symbol. >> E.g. some symbol consists of 2 characters, but the length of the >> input buffer is 1. >> In my opinion this is the same as an invalid symbol. >> >> I guess that internal buffer overflow cannot occur in the >> `ucnv_getNextChar` function. >> >> I suppose that it is Nikitas duty to investigate this problem and >> explain it to us all. I just have noticed a strange usage. > > Hello, please consider my comments. > > There are some cases when 0xffff can occur, but: > 1) Cannot trigger in our context. > 2) Cannot trigger in our context. > 3) Only triggers if end < start. (Cannot happen in > sql_utf8_pattern_compare, i guess) > 4) Only triggers if string length > (size_t) 0x7ffffffff (can it > actually happen? I don’t think so). > 5) Occurs when trying to access to not unindexed data. > 6) Cannot occur in our context. > 7) Cannot occur in our context. I do not understand what are those numbers related to. Please, describe it. > > 0xfffd only means that symbol cannot be treated as a unicode symbol. > > Shall I change it somehow then? > > >> On 17 Aug 2018, at 12:23, Alex Khatskevich >> <avkhatskevich@tarantool.org <mailto:avkhatskevich@tarantool.org>> wrote: >> >> I have a look at icu code and It seems like 0xffff is an error, and >> it is more similar to >> invalid symbol that to "end of string". Check it, and fix the code, >> so that it is treated as >> an error. >> For example it is not handled in the main pattern loop: >> >> +while (pattern < pattern_end) { >> c = Utf8Read(pattern, pattern_end); >> +if (c == SQL_INVALID_UTF8_SYMBOL) >> +return SQL_INVALID_PATTERN; >> >> It seems like the 0xffff should be checked there too. > > No, it should not. This way it will only cause a bug when, for example > ’select “” like “”’ > will be treated as an error. I do not understand. ’select “” like “”’ should not even trap inside of the while loop (because `pattern < pattern_end` is false). [-- Attachment #2: Type: text/html, Size: 25136 bytes --]
next prev parent reply other threads:[~2018-09-10 22:20 UTC|newest] Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-16 17:00 [tarantool-patches] [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal N.Tatunov 2018-08-16 17:00 ` [tarantool-patches] [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue N.Tatunov 2018-08-17 9:23 ` [tarantool-patches] " Alex Khatskevich 2018-08-17 11:17 ` Alexander Turenko 2018-08-17 11:42 ` Alex Khatskevich 2018-09-09 13:33 ` Nikita Tatunov 2018-09-10 22:20 ` Alex Khatskevich [this message] 2018-09-11 6:06 ` Nikita Tatunov 2018-09-11 10:06 ` Alex Khatskevich 2018-09-11 13:31 ` Nikita Tatunov 2018-10-18 18:02 ` Nikita Tatunov 2018-10-21 3:51 ` Alexander Turenko 2018-10-26 15:19 ` Nikita Tatunov 2018-10-29 13:01 ` Alexander Turenko 2018-10-31 5:25 ` Nikita Tatunov 2018-11-01 10:30 ` Alexander Turenko 2018-11-14 14:16 ` n.pettik 2018-11-14 17:06 ` Alexander Turenko 2018-08-16 17:00 ` [tarantool-patches] [PATCH 2/2] sql: remove GLOB from Tarantool N.Tatunov 2018-08-17 8:25 ` [tarantool-patches] " Alex Khatskevich 2018-08-17 8:49 ` n.pettik 2018-08-17 9:01 ` Alex Khatskevich 2018-08-17 9:20 ` n.pettik 2018-08-17 9:28 ` Alex Khatskevich [not found] ` <04D02794-07A5-4146-9144-84EE720C8656@corp.mail.ru> 2018-08-17 8:53 ` Alex Khatskevich 2018-08-17 11:26 ` Alexander Turenko 2018-08-17 11:34 ` Alexander Turenko 2018-08-17 13:46 ` Nikita Tatunov 2018-09-09 14:57 ` Nikita Tatunov 2018-09-10 22:06 ` Alex Khatskevich 2018-09-11 7:38 ` Nikita Tatunov 2018-09-11 10:11 ` Alexander Turenko 2018-09-11 10:22 ` Alex Khatskevich 2018-09-11 12:03 ` Alex Khatskevich 2018-10-18 20:28 ` Nikita Tatunov 2018-10-21 3:48 ` Alexander Turenko 2018-10-26 15:21 ` Nikita Tatunov 2018-10-29 12:15 ` Alexander Turenko 2018-11-08 15:09 ` Nikita Tatunov 2018-11-09 12:18 ` Alexander Turenko 2018-11-10 3:38 ` Nikita Tatunov 2018-11-13 19:23 ` Alexander Turenko 2018-11-14 14:16 ` n.pettik 2018-11-14 17:41 ` Alexander Turenko 2018-11-14 21:48 ` n.pettik 2018-11-15 4:57 ` [tarantool-patches] Re: [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal Kirill Yukhin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=6a1352e9-425c-d656-1bec-bb04d9f0fee6@tarantool.org \ --to=avkhatskevich@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox