Tarantool development patches archive
 help / color / mirror / Atom feed
From: Nikita Tatunov <n.tatunov@tarantool.org>
To: Alexander Turenko <alexander.turenko@tarantool.org>
Cc: tarantool-patches@freelists.org, korablev@tarantool.org
Subject: [tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue
Date: Wed, 31 Oct 2018 08:25:12 +0300	[thread overview]
Message-ID: <3D4337BA-F528-425C-B352-C195C20DA282@tarantool.org> (raw)
In-Reply-To: <20181029130123.f254chdxxuwi6c4w@tkn_work_nb>



> On Oct 29, 2018, at 16:01, Alexander Turenko <alexander.turenko@tarantool.org> wrote:
> 
> The patch is okay except one note re test case.
> 
> WBR, Alexander Turenko.
> 
>> --- EVIDENCE-OF: R-39414-35489 The infix GLOB operator is implemented by
>> --- calling the function glob(Y,X) and can be modified by overriding that
>> --- function.
> 
> This test case was removed, while we have not a similar one for LIKE.

I guess it is concerned more with the second patch.
Anyways. If you mean the tests following this comment then
actually there are some similar tests for LIKE (15.1.x).

> 
>>>>>   if(((size_t)(sourceLimit-s)>(size_t)0x7fffffff && sourceLimit>s)) {
>>>>>       *err=U_ILLEGAL_ARGUMENT_ERROR;
>>>>>       return 0xffff;
>>>> 
>>>> 4. I’m not sure if string data can be this long in our context. 
>>>>  (string length > (size_t) 0x7ffffffff)
>>> 
>>> Note: not 0x7ffffffff, but 0x7fffffff.
>>> 
>>> This limit seems to be some weird internal thing related to using
>>> ucnv_getNextUChar inside libicu.
>>> 
>>> I propose to lie libicu about the buffer size in case when it exceeds
>>> this limit. A UTF-8 encoded symbol is 4 bytes long at max, so we can
>>> pass the following instead of pattern_end:
>>> 
>>> ((size_t) (pattern_end - pattern) > (size_t) 0x7fffffff ? pattern + 0x7fffffff : pattern_end
>>> 
>>> I think this trick need to be covered with a unit test (because it is unclear
>>> how to create a string of size >1GiB from lua). Don't sure whether it is
>>> okay to allocate such amount of memory in the test, though...
>>> 
>>> Please, don't do that within this patch, because it is about the another bug.
>>> File an issue with all needed information instead (you can provide a link to
>>> this message for example).
>> 
>> Ok, thank you for advice. I think that’s a good idea, but there’s one thing
>> I’m getting concerned about: it will cause a lot of operations especially
>> in case we’re using LIKE for scanning a lot of data (). I guess even if it’s
>> relevant it’s a discussion inside of an issue that’s going to be filed.
> 
> Filed https://github.com/tarantool/tarantool/issues/3773
> 
>>>>>           } else if(U_SUCCESS(*err) && c>=0) {
>>>>>               return c;
>>>> 
>>>> 6. Returns symbol (can also be 0xfffd, as it is not treated as an actual error).
>>>> 
>>>> So if I’m not mistaken we will get results in our function either from
>>>> ‘return’ number 5 or number 6 and the following code will not be executed.
>>> 
>>> It is not so. We'll fall through in case of U_ILLEGAL_CHAR_FOUND or
>>> U_TRUNCATED_CHAR_FOUND error.
>>> 
>>> To be honest I don't want to continue. It seems we should not lean on
>>> the fact that 0xffff always means end of the buffer, because it does not
>>> guaranteed by the API and is not clear from the code.
>>> 
>>> AFAIR, the problem was to choose appropriate symbol to mark end of the
>>> buffer situation and distinguish it from a real error. It seems we have
>>> not one. So we should fairly (and always) check for the buffer before a
>>> call to ucnv_getNextUChar() or check the status it provide after the
>>> call. I would prefer to check it in our code. It seems that it is how
>>> the API works.
>>> 
>>> I propose to use the same code pattern for all Utf8Read calls, e.g.:
>>> 
>>> if (pattern < pattern_end)
>>> 	c = Utf8Read(pattern, pattern_end)
>>> else
>>> 	return SQL_...;
>>> if (c == SQL_INVALID_UTF8_SYMBOL)
>>> 	return SQL_...;
>>> assert(U_SUCCESS(status));
>>> 
>>> Note: I have added the assert, because it is not clear what we can do
>>> with, say, U_INVALID_TABLE_FORMAT (improper libicu build /
>>> installation). Hope Nikita P. suggests right way, but now I think we
>>> should at least assert on that.
>>> 
>>> It seems the code above can be even wrapped into a macro that will get
>>> two pointers (pattern and pattern_end / string and string_end) and two
>>> SQL_...  error code to handle two possible errors. Yep, it is generally
>>> discouraged to return from a macro, but if it'll greatly improves the
>>> code readability, so it is appropriate, I think. Just define the macro
>>> right before the function and undefne it after to show a reader it is
>>> some pure internal thing.
>>> 
>>> Note: If you will going that way, don't wrap Utf8Read macro into another
>>> macro. Use one with ucnv_getNextUChar call.
>>> 
>>> It is refactoring of the code and our of the scope of your issue.
>>> Please, file an issue and link this message into it (but please ask
>>> Nikita P. opinion before).
>>> 
>>> It is not good IMHO, but it seems now it worth to leave the code with
>>> assumption 0xffff is the end of buffer. This is kind of splitting the
>>> problem into parts and allow us to proceed with this patch re parsing
>>> bug.
> 
> Filed https://github.com/tarantool/tarantool/issues/3774

  reply	other threads:[~2018-10-31  5:25 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-16 17:00 [tarantool-patches] [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal N.Tatunov
2018-08-16 17:00 ` [tarantool-patches] [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue N.Tatunov
2018-08-17  9:23   ` [tarantool-patches] " Alex Khatskevich
2018-08-17 11:17     ` Alexander Turenko
2018-08-17 11:42       ` Alex Khatskevich
2018-09-09 13:33         ` Nikita Tatunov
2018-09-10 22:20           ` Alex Khatskevich
2018-09-11  6:06             ` Nikita Tatunov
2018-09-11 10:06               ` Alex Khatskevich
2018-09-11 13:31                 ` Nikita Tatunov
2018-10-18 18:02                   ` Nikita Tatunov
2018-10-21  3:51                     ` Alexander Turenko
2018-10-26 15:19                       ` Nikita Tatunov
2018-10-29 13:01                         ` Alexander Turenko
2018-10-31  5:25                           ` Nikita Tatunov [this message]
2018-11-01 10:30                             ` Alexander Turenko
2018-11-14 14:16                               ` n.pettik
2018-11-14 17:06                                 ` Alexander Turenko
2018-08-16 17:00 ` [tarantool-patches] [PATCH 2/2] sql: remove GLOB from Tarantool N.Tatunov
2018-08-17  8:25   ` [tarantool-patches] " Alex Khatskevich
2018-08-17  8:49     ` n.pettik
2018-08-17  9:01       ` Alex Khatskevich
2018-08-17  9:20         ` n.pettik
2018-08-17  9:28           ` Alex Khatskevich
     [not found]     ` <04D02794-07A5-4146-9144-84EE720C8656@corp.mail.ru>
2018-08-17  8:53       ` Alex Khatskevich
2018-08-17 11:26     ` Alexander Turenko
2018-08-17 11:34       ` Alexander Turenko
2018-08-17 13:46     ` Nikita Tatunov
2018-09-09 14:57     ` Nikita Tatunov
2018-09-10 22:06       ` Alex Khatskevich
2018-09-11  7:38         ` Nikita Tatunov
2018-09-11 10:11           ` Alexander Turenko
2018-09-11 10:22             ` Alex Khatskevich
2018-09-11 12:03           ` Alex Khatskevich
2018-10-18 20:28             ` Nikita Tatunov
2018-10-21  3:48               ` Alexander Turenko
2018-10-26 15:21                 ` Nikita Tatunov
2018-10-29 12:15                   ` Alexander Turenko
2018-11-08 15:09                     ` Nikita Tatunov
2018-11-09 12:18                       ` Alexander Turenko
2018-11-10  3:38                         ` Nikita Tatunov
2018-11-13 19:23                           ` Alexander Turenko
2018-11-14 14:16                             ` n.pettik
2018-11-14 17:41                               ` Alexander Turenko
2018-11-14 21:48                                 ` n.pettik
2018-11-15  4:57 ` [tarantool-patches] Re: [PATCH v2 0/2] sql: pattern comparison fixes & GLOB removal Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D4337BA-F528-425C-B352-C195C20DA282@tarantool.org \
    --to=n.tatunov@tarantool.org \
    --cc=alexander.turenko@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH 1/2] sql: LIKE & GLOB pattern comparison issue' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox