[tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0'
n.pettik
korablev at tarantool.org
Fri Feb 22 15:59:35 MSK 2019
> On 20 Feb 2019, at 22:24, i.koptelov <ivan.koptelov at tarantool.org> wrote:
>
>>>
>>> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov at tarantool.org> wrote:
>>>
>>> Thanks to Alexander, I fixed my patch to use a function
>>> from icu to count the length of the string.
>>>
>>> Changes:
Travis has failed. Please make sure it passes before sending the patch.
It doesn’t fail on my local (Mac) machine, so I guess the failure appears
only on Linux.
>> Furthermore, the description says that it “assumes well-formed UTF-8”,
>> which in our case is not true. So who knows what may happen if we pass
>> a malformed byte sequence. Not to mention that the behaviour of
>> this function on invalid inputs may change later.
>
> In its current implementation U8_FWD_1_UNSAFE satisfies our needs safely. The
> returned symbol count would never exceed byte_len.
>
> static int
> utf8_char_count(const unsigned char *str, int byte_len)
> {
> 	int symbol_count = 0;
> 	for (int i = 0; i < byte_len;) {
> 		U8_FWD_1_UNSAFE(str, i);
> 		symbol_count++;
> 	}
> 	return symbol_count;
> }
>
> I agree that it is a bad idea to rely on library behaviour which may
> change later. So maybe I should just inline this one-line macro?
> Or use my own implementation, since it’s more efficient (but less beautiful)?
Never mind, let's keep it as is.
My only real concern is that other places use SQL_SKIP_UTF8 instead.
It handles only two-byte UTF-8 symbols, whereas U8_FWD_1_UNSAFE() also
accounts for three- and four-byte symbols.
Can we use the same pattern everywhere?
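
To illustrate what a single shared pattern could look like, here is a
minimal sketch that does not depend on ICU. The names utf8_seq_len() and
utf8_char_count_bounded() are hypothetical and exist neither in Tarantool
nor in ICU. The sketch assumes that the symbol length is taken from the
lead byte only (trail bytes are not validated) and that a sequence cut off
by the end of the buffer counts as one symbol, so the index never steps
past byte_len even on malformed input.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the number of bytes that the lead
 * byte claims for its sequence (1..4). Stray continuation bytes and
 * invalid lead bytes are treated as one-byte symbols, so scanning
 * always makes progress.
 */
static int
utf8_seq_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;	/* ASCII, including '\0' */
	if (lead >= 0xc2 && lead <= 0xdf)
		return 2;
	if (lead >= 0xe0 && lead <= 0xef)
		return 3;
	if (lead >= 0xf0 && lead <= 0xf4)
		return 4;
	return 1;	/* stray continuation or invalid lead byte */
}

/*
 * Counts symbols in the first byte_len bytes of str. The step is
 * clamped to the remaining length, so a truncated sequence at the
 * end of the buffer counts as one symbol and str is never indexed
 * at or beyond byte_len.
 */
static int
utf8_char_count_bounded(const unsigned char *str, int byte_len)
{
	int symbol_count = 0;
	int i = 0;
	while (i < byte_len) {
		int step = utf8_seq_len(str[i]);
		if (step > byte_len - i)
			step = byte_len - i;
		i += step;
		symbol_count++;
	}
	return symbol_count;
}
```

Since every iteration advances by at least one byte, the result can never
exceed byte_len, and an embedded '\0' is counted as an ordinary one-byte
symbol.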