[Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sat Oct 9 00:56:59 MSK 2021


Thanks for the patch!

On 01.10.2021 18:29, imeevma at tarantool.org wrote:
> Part of #4145
> ---
>  src/box/sql/func.c | 38 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 54b03f359..2e53b32d8 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -263,6 +263,38 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv)
>  	mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r);
>  }
>  
> +/** Implementation of the CHAR_LENGTH() function. */
> +static inline uint8_t
> +utf8_len_char(char c)
> +{
> +	uint8_t u = (uint8_t)c;
> +	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);

It is not really that simple. Consider either using the old
lengthFunc() and the other sqlite utf8 helpers, or an approach
similar to utf8_len() in utf8.c. It uses the ICU macro U8_NEXT()
and handles special cases like U_SENTINEL.
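
To illustrate the concern, here is a minimal self-contained sketch. It
is not the code from utf8.c and it does not use ICU. The names
count_by_lead_byte() and count_checked() are hypothetical, and the
policy "a malformed byte counts as one character" is only an
assumption made for the example. It shows that stepping by the lead
byte alone can skip over valid characters when the input is truncated
or malformed. The checked variant is still not a full validator: it
ignores overlong encodings and surrogates, which is part of why a
shared helper is preferable.

```c
#include <stddef.h>
#include <stdint.h>

/* The formula from the patch, copied here for comparison. */
static inline uint8_t
utf8_len_char(char c)
{
	uint8_t u = (uint8_t)c;
	return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0);
}

/* Hypothetical: count characters by trusting the lead byte only. */
static size_t
count_by_lead_byte(const char *s, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i += utf8_len_char(s[i]))
		count++;
	return count;
}

/*
 * Hypothetical: also look at the trailing bytes. Assumption for this
 * sketch: a malformed or truncated sequence counts as one character
 * and consumes exactly one byte.
 */
static size_t
count_checked(const char *s, size_t n)
{
	size_t count = 0;
	size_t i = 0;
	while (i < n) {
		size_t len = utf8_len_char(s[i]);
		/* Lead bytes above 0xf4 are never valid; neither is a cut tail. */
		if ((uint8_t)s[i] > 0xf4 || len > n - i)
			len = 1;
		/* Every trailing byte must look like 10xxxxxx. */
		for (size_t j = 1; j < len; j++) {
			if (((uint8_t)s[i + j] & 0xc0) != 0x80) {
				len = 1;
				break;
			}
		}
		i += len;
		count++;
	}
	return count;
}
```

For the three bytes 0xe4 'a' 'b', the lead-byte version treats the
whole input as a single character, while the checked version sees one
broken byte followed by two ASCII characters.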

Otherwise you are adding what is already a third set of functions
for working with utf8.

I would even prefer to refactor lengthFunc() to stop using the sqlite
legacy code and drop the sqlite utf8 helpers entirely, but I suspect
that might not be trivial and should be done later.

