From: Nikita Tatunov <n.tatunov@tarantool.org> To: Alexander Turenko <alexander.turenko@tarantool.org> Cc: tarantool-patches@freelists.org, korablev@tarantool.org Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue Date: Wed, 31 Oct 2018 08:25:12 +0300 [thread overview] Message-ID: <3D4337BA-F528-425C-B352-C195C20DA282@tarantool.org> (raw) In-Reply-To: <20181029130123.f254chdxxuwi6c4w@tkn_work_nb> > On Oct 29, 2018, at 16:01, Alexander Turenko <alexander.turenko@tarantool.org> wrote: > > The patch is okay except one note re test case. > > WBR, Alexander Turenko. > >> --- EVIDENCE-OF: R-39414-35489 The infix GLOB operator is implemented by >> --- calling the function glob(Y,X) and can be modified by overriding that >> --- function. > > This test case was removed, while we have not a similar one for LIKE. I guess it is concerned more with the second patch. Anyways. If you mean the tests following this comment then actually there are some similar tests for LIKE (15.1.x). > >>>>> if(((size_t)(sourceLimit-s)>(size_t)0x7fffffff && sourceLimit>s)) { >>>>> *err=U_ILLEGAL_ARGUMENT_ERROR; >>>>> return 0xffff; >>>> >>>> 4. I’m not sure if string data can be this long in our context. >>>> (string length > (size_t) 0x7ffffffff) >>> >>> Note: not 0x7ffffffff, but 0x7fffffff. >>> >>> This limit seems to be some weird internal thing related to using >>> ucnv_getNextUChar inside libicu. >>> >>> I propose to lie libicu about the buffer size in case when it exceeds >>> this limit. A UTF-8 encoded symbol is 4 bytes long at max, so we can >>> pass the following instead of pattern_end: >>> >>> ((size_t) (pattern_end - pattern) > (size_t) 0x7fffffff ? pattern + 0x7fffffff : pattern_end >>> >>> I think this trick need to be covered with a unit test (because it is unclear >>> how to create a string of size >1GiB from lua). Don't sure whether it is >>> okay to allocate such amount of memory in the test, though... >>> >>> Please, don't do that within this patch, because it is about the another bug. >>> File an issue with all needed information instead (you can provide a link to >>> this message for example). >> >> Ok, thank you for advice. I think that’s a good idea, but there’s one thing >> I’m getting concerned about: it will cause a lot of operations especially >> in case we’re using LIKE for scanning a lot of data (). I guess even if it’s >> relevant it’s a discussion inside of an issue that’s going to be filed. > > Filed https://github.com/tarantool/tarantool/issues/3773 > >>>>> } else if(U_SUCCESS(*err) && c>=0) { >>>>> return c; >>>> >>>> 6. Returns symbol (can also be 0xfffd, as it is not treated as an actual error). >>>> >>>> So if I’m not mistaken we will get results in our function either from >>>> ‘return’ number 5 or number 6 and the following code will not be executed. >>> >>> It is not so. We'll fall through in case of U_ILLEGAL_CHAR_FOUND or >>> U_TRUNCATED_CHAR_FOUND error. >>> >>> To be honest I don't want to continue. It seems we should not lean on >>> the fact that 0xffff always means end of the buffer, because it does not >>> guaranteed by the API and is not clear from the code. >>> >>> AFAIR, the problem was to choose appropriate symbol to mark end of the >>> buffer situation and distinguish it from a real error. It seems we have >>> not one. So we should fairly (and always) check for the buffer before a >>> call to ucnv_getNextUChar() or check the status it provide after the >>> call. I would prefer to check it in our code. It seems that it is how >>> the API works. >>> >>> I propose to use the same code pattern for all Utf8Read calls, e.g.: >>> >>> if (pattern < pattern_end) >>> c = Utf8Read(pattern, pattern_end) >>> else >>> return SQL_...; >>> if (c == SQL_INVALID_UTF8_SYMBOL) >>> return SQL_...; >>> assert(U_SUCCESS(status)); >>> >>> Note: I have added the assert, because it is not clear what we can do >>> with, say, U_INVALID_TABLE_FORMAT (improper libicu build / >>> installation). Hope Nikita P. suggests right way, but now I think we >>> should at least assert on that. >>> >>> It seems the code above can be even wrapped into a macro that will get >>> two pointers (pattern and pattern_end / string and string_end) and two >>> SQL_... error code to handle two possible errors. Yep, it is generally >>> discouraged to return from a macro, but if it'll greatly improves the >>> code readability, so it is appropriate, I think. Just define the macro >>> right before the function and undefne it after to show a reader it is >>> some pure internal thing. >>> >>> Note: If you will going that way, don't wrap Utf8Read macro into another >>> macro. Use one with ucnv_getNextUChar call. >>> >>> It is refactoring of the code and our of the scope of your issue. >>> Please, file an issue and link this message into it (but please ask >>> Nikita P. opinion before). >>> >>> It is not good IMHO, but it seems now it worth to leave the code with >>> assumption 0xffff is the end of buffer. This is kind of splitting the >>> problem into parts and allow us to proceed with this patch re parsing >>> bug. > > Filed https://github.com/tarantool/tarantool/issues/3774
next prev parent reply other threads:[~2018-10-31 5:25 UTC|newest] Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-16 17:00 [tarantool-patches] [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal N.Tatunov 2018-08-16 17:00 ` [tarantool-patches] [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue N.Tatunov 2018-08-17 9:23 ` [tarantool-patches] " Alex Khatskevich 2018-08-17 11:17 ` Alexander Turenko 2018-08-17 11:42 ` Alex Khatskevich 2018-09-09 13:33 ` Nikita Tatunov 2018-09-10 22:20 ` Alex Khatskevich 2018-09-11 6:06 ` Nikita Tatunov 2018-09-11 10:06 ` Alex Khatskevich 2018-09-11 13:31 ` Nikita Tatunov 2018-10-18 18:02 ` Nikita Tatunov 2018-10-21 3:51 ` Alexander Turenko 2018-10-26 15:19 ` Nikita Tatunov 2018-10-29 13:01 ` Alexander Turenko 2018-10-31 5:25 ` Nikita Tatunov [this message] 2018-11-01 10:30 ` Alexander Turenko 2018-11-14 14:16 ` n.pettik 2018-11-14 17:06 ` Alexander Turenko 2018-08-16 17:00 ` [tarantool-patches] [PATCH 2/2] sql: remove GLOB from Tarantool N.Tatunov 2018-08-17 8:25 ` [tarantool-patches] " Alex Khatskevich 2018-08-17 8:49 ` n.pettik 2018-08-17 9:01 ` Alex Khatskevich 2018-08-17 9:20 ` n.pettik 2018-08-17 9:28 ` Alex Khatskevich [not found] ` <04D02794-07A5-4146-9144-84EE720C8656@corp.mail.ru> 2018-08-17 8:53 ` Alex Khatskevich 2018-08-17 11:26 ` Alexander Turenko 2018-08-17 11:34 ` Alexander Turenko 2018-08-17 13:46 ` Nikita Tatunov 2018-09-09 14:57 ` Nikita Tatunov 2018-09-10 22:06 ` Alex Khatskevich 2018-09-11 7:38 ` Nikita Tatunov 2018-09-11 10:11 ` Alexander Turenko 2018-09-11 10:22 ` Alex Khatskevich 2018-09-11 12:03 ` Alex Khatskevich 2018-10-18 20:28 ` Nikita Tatunov 2018-10-21 3:48 ` Alexander Turenko 2018-10-26 15:21 ` Nikita Tatunov 2018-10-29 12:15 ` Alexander Turenko 2018-11-08 15:09 ` Nikita Tatunov 2018-11-09 12:18 ` Alexander Turenko 2018-11-10 3:38 ` Nikita Tatunov 2018-11-13 19:23 ` Alexander Turenko 2018-11-14 14:16 ` n.pettik 2018-11-14 17:41 ` Alexander Turenko 2018-11-14 21:48 ` n.pettik 2018-11-15 4:57 ` [tarantool-patches] Re: [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal Kirill Yukhin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3D4337BA-F528-425C-B352-C195C20DA282@tarantool.org \ --to=n.tatunov@tarantool.org \ --cc=alexander.turenko@tarantool.org \ --cc=korablev@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox