[Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Oct 29 01:11:37 MSK 2021


Thanks for the fixes!

>>> +/** Implementation of the CHAR_LENGTH() function. */
>>> +static inline uint8_t
>>> +utf8_len_char(char c)
>>> +{
>>> +	uint8_t u = (uint8_t)c;
>>> +	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);
>>
>> It is not that simple really. Consider either using the old
>> lengthFunc() and other sqlite utf8 helpers or use the approach
>> similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT()
>> and has handling for special symbols like U_SENTINEL.
>>
>> Otherwise you are adding a third set of functions for working
>> with utf8.
>>
>> I would even prefer to refactor lengthFunc() to stop using sqlite
>> legacy and drop sqlite utf8 entirely, but I suspect it might be
>> not so trivial to do and should be done later.
> I was able to use ucnv_getNextUChar() here. In fact, I was able to use this
> function everywhere in this patch-set where we had to rely on my own or
> SQLite's functions for UTF8 characters. I think I can remove sql/utf.c
> in the next patchset, since I refactor the LENGTH() and UNICODE() functions
> there.

We discussed in private that U8_NEXT() would work here just fine.
ucnv_getNextUChar() is overkill, here and in the other places of the
patchset.
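
For illustration, a minimal sketch of why the lead-byte formula quoted above
is not enough on its own. utf8_count_checked() and utf8_count_demo() are
hypothetical names, not part of the patch. The checks are a simplified,
hand-written stand-in for the kind of validation U8_NEXT() provides (it sets
the code point to a negative value on an illegal sequence). In the patch
itself U8_NEXT() should do the decoding, not a loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lead-byte length guess, as in the quoted patch. */
static inline uint8_t
utf8_len_char(char c)
{
	uint8_t u = (uint8_t)c;
	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);
}

/*
 * Hypothetical helper: count characters and validate each sequence.
 * Returns -1 on a bad lead byte, truncated input or a bad continuation
 * byte. Simplified on purpose: overlong forms and surrogates are not
 * rejected here, which a real decoder would also have to handle.
 */
static int
utf8_count_checked(const char *s, size_t size)
{
	int count = 0;
	size_t i = 0;
	while (i < size) {
		uint8_t u = (uint8_t)s[i];
		/* 0x80..0xc1 and 0xf5..0xff never start a character. */
		if ((u >= 0x80 && u < 0xc2) || u > 0xf4)
			return -1;
		size_t n = utf8_len_char(s[i]);
		/* The lead byte promises more bytes than are left. */
		if (n > size - i)
			return -1;
		for (size_t k = 1; k < n; k++) {
			if (((uint8_t)s[i + k] & 0xc0) != 0x80)
				return -1;
		}
		i += n;
		count++;
	}
	return count;
}

/* Hypothetical demo of the cases the formula alone does not cover. */
static void
utf8_count_demo(void)
{
	/* "a", U+00E9 and U+20AC: 6 bytes, 3 characters. */
	assert(utf8_count_checked("a\xc3\xa9\xe2\x82\xac", 6) == 3);
	/* Truncated 3-byte sequence: the formula still reports 3. */
	assert(utf8_len_char('\xe2') == 3);
	assert(utf8_count_checked("\xe2\x82", 2) == -1);
	/* 0xff is never valid in UTF-8, yet the formula reports 4. */
	assert(utf8_len_char('\xff') == 4);
}
```

Stepping through a buffer by utf8_len_char() alone would read past the end
on the truncated input and count 0xff as a 4-byte character. That is the
"not that simple" part.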


