* [Tarantool-patches] [PATCH v1 0/8] Rework standard function @ 2021-10-01 16:29 Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion Mergen Imeev via Tarantool-patches ` (9 more replies) 0 siblings, 10 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches This patch refactors all non-aggregate standard functions and makes these functions work according to ANSI rules, if it is possible. However, the ANSI specific syntax for these functions will be presented later. Mergen Imeev (8): sql: refactor ABS() funcion sql: refactor CHAR_LENGTH() function sql: refactor UPPER() and LOWER() functions sql: refactor NULLIF() function sql: rework TRIM() function sql: rework POSITION() function sql: rework SUBSTR() function sql: refactor LIKE() function src/box/sql/func.c | 1137 +++++++++++++++----------------- test/sql-tap/badutf1.test.lua | 41 +- test/sql-tap/func.test.lua | 16 +- test/sql-tap/func2.test.lua | 150 ++--- test/sql-tap/position.test.lua | 80 ++- test/sql-tap/substr.test.lua | 121 +++- 6 files changed, 786 insertions(+), 759 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-08 21:55 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function Mergen Imeev via Tarantool-patches ` (8 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches Part of #4145 --- src/box/sql/func.c | 83 +++++++++++++++++----------------------------- 1 file changed, 30 insertions(+), 53 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index b23636d7e..54b03f359 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -237,6 +237,32 @@ step_group_concat(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementations of the ABS() function. */ +static void +func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (arg->type == MEM_TYPE_NULL) + return; + assert(mem_is_int(arg)); + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; + mem_set_uint(ctx->pOut, u); +} + +static void +func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (arg->type == MEM_TYPE_NULL) + return; + assert(arg->type == MEM_TYPE_DOUBLE); + mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -400,57 +426,6 @@ lengthFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the abs() function. - * - * IMP: R-23979-26855 The abs(X) function returns the absolute value of - * the numeric argument X. - */ -static void -absFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - assert(argc == 1); - UNUSED_PARAMETER(argc); - switch (sql_value_type(&argv[0])) { - case MP_UINT: { - sql_result_uint(context, mem_get_uint_unsafe(&argv[0])); - break; - } - case MP_INT: { - int64_t value = mem_get_int_unsafe(&argv[0]); - assert(value < 0); - sql_result_uint(context, -value); - break; - } - case MP_NIL:{ - /* IMP: R-37434-19929 Abs(X) returns NULL if X is NULL. */ - sql_result_null(context); - break; - } - case MP_BOOL: - case MP_BIN: - case MP_EXT: - case MP_ARRAY: - case MP_MAP: { - diag_set(ClientError, ER_INCONSISTENT_TYPES, "number", - mem_str(&argv[0])); - context->is_aborted = true; - return; - } - default:{ - /* - * Abs(X) returns 0.0 if X is a string or blob - * that cannot be converted to a numeric value. - */ - double rVal = mem_get_double_unsafe(&argv[0]); - if (rVal < 0) - rVal = -rVal; - sql_result_double(context, rVal); - break; - } - } -} - /** * Implementation of the position() function. * @@ -1936,8 +1911,10 @@ struct sql_func_definition { * function should be defined in succession. */ static struct sql_func_definition definitions[] = { - {"ABS", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, absFunc, NULL}, - {"ABS", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, absFunc, NULL}, + {"ABS", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, func_abs_int, + NULL}, + {"ABS", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, func_abs_double, + NULL}, {"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg}, {"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg}, {"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL}, -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion Mergen Imeev via Tarantool-patches @ 2021-10-08 21:55 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 16:52 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-08 21:55 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Hi! Thanks for the patch! On 01.10.2021 18:29, imeevma@tarantool.org wrote: > Part of #4145 > --- > src/box/sql/func.c | 83 +++++++++++++++++----------------------------- > 1 file changed, 30 insertions(+), 53 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index b23636d7e..54b03f359 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -237,6 +237,32 @@ step_group_concat(struct sql_context *ctx, int argc, struct Mem *argv) > ctx->is_aborted = true; > } > > +/** Implementations of the ABS() function. */ > +static void > +func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) > +{ > + assert(argc == 1); > + (void)argc; > + struct Mem *arg = &argv[0]; > + if (arg->type == MEM_TYPE_NULL) > + return; 1. Need mem_is_null(). The same for the other places if there are more. > + assert(mem_is_int(arg)); > + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; 2. You could make return when mem_is_uint(). It would remove '?' and mem_set_uint() which would calls mem_clear() inside. > + mem_set_uint(ctx->pOut, u); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-10-08 21:55 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-10-20 16:52 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 16:52 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Hi! Thank you for the review! My answers, diff and new patch below. On Fri, Oct 08, 2021 at 11:55:33PM +0200, Vladislav Shpilevoy wrote: > Hi! Thanks for the patch! > > On 01.10.2021 18:29, imeevma@tarantool.org wrote: > > Part of #4145 > > --- > > src/box/sql/func.c | 83 +++++++++++++++++----------------------------- > > 1 file changed, 30 insertions(+), 53 deletions(-) > > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index b23636d7e..54b03f359 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -237,6 +237,32 @@ step_group_concat(struct sql_context *ctx, int argc, struct Mem *argv) > > ctx->is_aborted = true; > > } > > > > +/** Implementations of the ABS() function. */ > > +static void > > +func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) > > +{ > > + assert(argc == 1); > > + (void)argc; > > + struct Mem *arg = &argv[0]; > > + if (arg->type == MEM_TYPE_NULL) > > + return; > > 1. Need mem_is_null(). The same for the other places if there > are more. > Fixed here in all other places in this patch-set. > > + assert(mem_is_int(arg)); > > + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; > > 2. You could make return when mem_is_uint(). It would remove '?' and > mem_set_uint() which would calls mem_clear() inside. > I am not sure that I understood correctly. In case of argument being uint we can use mem_copy_as_ephemeral() instead of mem_set_uint(), but I am not sure if it would be better. > > + mem_set_uint(ctx->pOut, u); Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 5f90fb597..dbeb38bee 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -239,7 +239,7 @@ func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) assert(argc == 1); (void)argc; struct Mem *arg = &argv[0]; - if (arg->type == MEM_TYPE_NULL) + if (mem_is_null(arg)) return; assert(mem_is_int(arg)); uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; @@ -252,9 +252,9 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) assert(argc == 1); (void)argc; struct Mem *arg = &argv[0]; - if (arg->type == MEM_TYPE_NULL) + if (mem_is_null(arg)) return; - assert(arg->type == MEM_TYPE_DOUBLE); + assert(mem_is_double(arg)); mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); } New patch: commit 2cf5a2417954cb06bcc17cabf0636b6ba130cf8d Author: Mergen Imeev <imeevma@gmail.com> Date: Tue Sep 21 14:39:57 2021 +0300 sql: refactor ABS() funcion Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 0cf17572b..dbeb38bee 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -232,6 +232,32 @@ step_group_concat(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementations of the ABS() function. */ +static void +func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (mem_is_null(arg)) + return; + assert(mem_is_int(arg)); + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; + mem_set_uint(ctx->pOut, u); +} + +static void +func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (mem_is_null(arg)) + return; + assert(mem_is_double(arg)); + mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -395,57 +421,6 @@ lengthFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the abs() function. - * - * IMP: R-23979-26855 The abs(X) function returns the absolute value of - * the numeric argument X. - */ -static void -absFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - assert(argc == 1); - UNUSED_PARAMETER(argc); - switch (sql_value_type(&argv[0])) { - case MP_UINT: { - sql_result_uint(context, mem_get_uint_unsafe(&argv[0])); - break; - } - case MP_INT: { - int64_t value = mem_get_int_unsafe(&argv[0]); - assert(value < 0); - sql_result_uint(context, -value); - break; - } - case MP_NIL:{ - /* IMP: R-37434-19929 Abs(X) returns NULL if X is NULL. */ - sql_result_null(context); - break; - } - case MP_BOOL: - case MP_BIN: - case MP_EXT: - case MP_ARRAY: - case MP_MAP: { - diag_set(ClientError, ER_INCONSISTENT_TYPES, "number", - mem_str(&argv[0])); - context->is_aborted = true; - return; - } - default:{ - /* - * Abs(X) returns 0.0 if X is a string or blob - * that cannot be converted to a numeric value. - */ - double rVal = mem_get_double_unsafe(&argv[0]); - if (rVal < 0) - rVal = -rVal; - sql_result_double(context, rVal); - break; - } - } -} - /** * Implementation of the position() function. * @@ -1930,8 +1905,10 @@ struct sql_func_definition { * function should be defined in succession. */ static struct sql_func_definition definitions[] = { - {"ABS", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, absFunc, NULL}, - {"ABS", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, absFunc, NULL}, + {"ABS", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, func_abs_int, + NULL}, + {"ABS", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, func_abs_double, + NULL}, {"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg}, {"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg}, {"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL}, ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-10-20 16:52 ` Mergen Imeev via Tarantool-patches @ 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:11 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-28 22:11 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-patches Hi! Thanks for the fixes! >>> + assert(mem_is_int(arg)); >>> + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; >> >> 2. You could make return when mem_is_uint(). It would remove '?' and >> mem_set_uint() which would calls mem_clear() inside. >> > I am not sure that I understood correctly. In case of argument being uint we > can use mem_copy_as_ephemeral() instead of mem_set_uint(), but I am not sure > if it would be better. I mean this: ==================== diff --git a/src/box/sql/func.c b/src/box/sql/func.c index dbeb38bee..2a848be31 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -239,11 +239,10 @@ func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) assert(argc == 1); (void)argc; struct Mem *arg = &argv[0]; - if (mem_is_null(arg)) + if (mem_is_null(arg) || mem_is_uint(arg)) return; assert(mem_is_int(arg)); - uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; - mem_set_uint(ctx->pOut, u); + mem_set_uint(ctx->pOut, (uint64_t)-arg->u.i); } ==================== Up to you. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 10:11 ` Mergen Imeev via Tarantool-patches 2021-11-01 13:37 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:11 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Hi! Thank you the review! My answer below. On Fri, Oct 29, 2021 at 12:11:11AM +0200, Vladislav Shpilevoy wrote: > Hi! Thanks for the fixes! > > >>> + assert(mem_is_int(arg)); > >>> + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; > >> > >> 2. You could make return when mem_is_uint(). It would remove '?' and > >> mem_set_uint() which would calls mem_clear() inside. > >> > > I am not sure that I understood correctly. In case of argument being uint we > > can use mem_copy_as_ephemeral() instead of mem_set_uint(), but I am not sure > > if it would be better. > > I mean this: > > ==================== > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index dbeb38bee..2a848be31 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -239,11 +239,10 @@ func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) > assert(argc == 1); > (void)argc; > struct Mem *arg = &argv[0]; > - if (mem_is_null(arg)) > + if (mem_is_null(arg) || mem_is_uint(arg)) > return; > assert(mem_is_int(arg)); > - uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; > - mem_set_uint(ctx->pOut, u); > + mem_set_uint(ctx->pOut, (uint64_t)-arg->u.i); > } > ==================== > > Up to you. This would work for aggregate function in some cases, but not here. If we apply such diff the result of ABS() for UNSIGNED will be NULL. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion 2021-11-01 10:11 ` Mergen Imeev via Tarantool-patches @ 2021-11-01 13:37 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 13:37 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-patches >>>>> + assert(mem_is_int(arg)); >>>>> + uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; >>>> >>>> 2. You could make return when mem_is_uint(). It would remove '?' and >>>> mem_set_uint() which would calls mem_clear() inside. >>>> >>> I am not sure that I understood correctly. In case of argument being uint we >>> can use mem_copy_as_ephemeral() instead of mem_set_uint(), but I am not sure >>> if it would be better. >> >> I mean this: >> >> ==================== >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index dbeb38bee..2a848be31 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -239,11 +239,10 @@ func_abs_int(struct sql_context *ctx, int argc, struct Mem *argv) >> assert(argc == 1); >> (void)argc; >> struct Mem *arg = &argv[0]; >> - if (mem_is_null(arg)) >> + if (mem_is_null(arg) || mem_is_uint(arg)) >> return; >> assert(mem_is_int(arg)); >> - uint64_t u = mem_is_uint(arg) ? arg->u.u : (uint64_t)-arg->u.i; >> - mem_set_uint(ctx->pOut, u); >> + mem_set_uint(ctx->pOut, (uint64_t)-arg->u.i); >> } >> ==================== >> >> Up to you. > This would work for aggregate function in some cases, but not here. > If we apply such diff the result of ABS() for UNSIGNED will be NULL. Sorry, I forgot that we need to copy the result into pOut. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-08 21:56 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions Mergen Imeev via Tarantool-patches ` (7 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches Part of #4145 --- src/box/sql/func.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 54b03f359..2e53b32d8 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -263,6 +263,38 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); } +/** Implementation of the CHAR_LENGTH() function. */ +static inline uint8_t +utf8_len_char(char c) +{ + uint8_t u = (uint8_t)c; + return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); +} + +static inline uint32_t +utf8_len_str(const char *str, uint32_t size) +{ + uint32_t len = 0; + uint32_t pos = 0; + while (pos < size) { + pos += utf8_len_char(str[pos]); + ++len; + } + return len; +} + +static void +func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (arg->type == MEM_TYPE_NULL) + return; + assert(arg->type == MEM_TYPE_STR && arg->n >= 0); + mem_set_uint(ctx->pOut, utf8_len_str(arg->z, arg->n)); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1918,8 +1950,8 @@ static struct sql_func_definition definitions[] = { {"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg}, {"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg}, {"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL}, - {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, - NULL}, + {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, + func_char_length, NULL}, {"COALESCE", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, sql_builtin_stub, NULL}, {"COUNT", 0, {}, FIELD_TYPE_INTEGER, step_count, fin_count}, @@ -1963,7 +1995,7 @@ static struct sql_func_definition definitions[] = { {"LEAST", -1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, minmaxFunc, NULL}, {"LEAST", -1, {FIELD_TYPE_SCALAR}, FIELD_TYPE_SCALAR, minmaxFunc, NULL}, - {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, + {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, func_char_length, NULL}, {"LENGTH", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_INTEGER, lengthFunc, NULL}, -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function Mergen Imeev via Tarantool-patches @ 2021-10-08 21:56 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 16:58 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-08 21:56 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Thanks for the patch! On 01.10.2021 18:29, imeevma@tarantool.org wrote: > Part of #4145 > --- > src/box/sql/func.c | 38 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 54b03f359..2e53b32d8 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -263,6 +263,38 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) > mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); > } > > +/** Implementation of the CHAR_LENGTH() function. */ > +static inline uint8_t > +utf8_len_char(char c) > +{ > + uint8_t u = (uint8_t)c; > + return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); It is not that simple really. Consider either using the old lengthFunc() and other sqlite utf8 helpers or use the approach similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT() and has handling for special symbols like U_SENTINEL. Otherwise you are making already third version of functions to work with utf8. I would even prefer to refactor lengthFunc() to stop using sqlite legacy and drop sqlite utf8 entirely, but I suspect it might be not so trivial to do and should be done later. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function 2021-10-08 21:56 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-10-20 16:58 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 16:58 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answer, diff and new patch below. On Fri, Oct 08, 2021 at 11:56:59PM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > On 01.10.2021 18:29, imeevma@tarantool.org wrote: > > Part of #4145 > > --- > > src/box/sql/func.c | 38 +++++++++++++++++++++++++++++++++++--- > > 1 file changed, 35 insertions(+), 3 deletions(-) > > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index 54b03f359..2e53b32d8 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -263,6 +263,38 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) > > mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); > > } > > > > +/** Implementation of the CHAR_LENGTH() function. */ > > +static inline uint8_t > > +utf8_len_char(char c) > > +{ > > + uint8_t u = (uint8_t)c; > > + return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); > > It is not that simple really. Consider either using the old > lengthFunc() and other sqlite utf8 helpers or use the approach > similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT() > and has handling for special symbols like U_SENTINEL. > > Otherwise you are making already third version of functions to > work with utf8. > > I would even prefer to refactor lengthFunc() to stop using sqlite > legacy and drop sqlite utf8 entirely, but I suspect it might be > not so trivial to do and should be done later. I was able to use ucnv_getNextUChar() here. In fact, I was able to use this functions in all the places in this patch-set where we had to work with my or SQLite functions that work with UTF8 characters. I think I can remove sql/utf.c in the next patchset, since I refactor the LENGTH() and UNICODE() functions there. Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index e7d1988e0..faef0eef3 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -259,35 +259,24 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) } /** Implementation of the CHAR_LENGTH() function. */ -static inline uint8_t -utf8_len_char(char c) -{ - uint8_t u = (uint8_t)c; - return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); -} - -static inline uint32_t -utf8_len_str(const char *str, uint32_t size) -{ - uint32_t len = 0; - uint32_t pos = 0; - while (pos < size) { - pos += utf8_len_char(str[pos]); - ++len; - } - return len; -} - static void func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) { assert(argc == 1); (void)argc; struct Mem *arg = &argv[0]; - if (arg->type == MEM_TYPE_NULL) + if (mem_is_null(arg)) return; - assert(arg->type == MEM_TYPE_STR && arg->n >= 0); - mem_set_uint(ctx->pOut, utf8_len_str(arg->z, arg->n)); + assert(mem_is_str(arg) && arg->n >= 0); + uint32_t len = 0; + UErrorCode err = U_ZERO_ERROR; + const char *pos = arg->z; + const char *end = arg->z + arg->n; + while (pos < end) { + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + ++len; + } + mem_set_uint(ctx->pOut, len); } static const unsigned char * diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index b25436186..27f17168b 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -287,7 +287,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") end, { -- <badutf-3.8> - "X", 7 + "X", 10 -- </badutf-3.8> }) @@ -297,7 +297,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff') AS x") end, { -- <badutf-3.9> - "X", 7 + "X", 10 -- </badutf-3.9> }) diff --git a/test/sql-tap/built-in-functions.test.lua b/test/sql-tap/built-in-functions.test.lua index 6fae811dc..7fe987abc 100755 --- a/test/sql-tap/built-in-functions.test.lua +++ b/test/sql-tap/built-in-functions.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(52) +test:plan(54) -- -- Make sure that number of arguments check is checked properly for SQL built-in @@ -545,4 +545,28 @@ test:do_test( {name = "COLUMN_2", type = "scalar"}, }) +-- gh-4145: Make sure the character is now checked when calculating its length. + +-- Character with UTF-8 code F0808080 does not exist. +test:do_execsql_test( + "builtins-4.1", + [[ + SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING)); + ]], + { + 4 + } +) + +-- Character with UTF-8 code F0908080 is '𐀀'. +test:do_execsql_test( + "builtins-4.2", + [[ + SELECT CHAR_LENGTH(CAST(x'f0908080' AS STRING)); + ]], + { + 1 + } +) + test:finish_test() diff --git a/test/sql-tap/func3.test.lua b/test/sql-tap/func3.test.lua index 7f1d8d33c..6999fea67 100755 --- a/test/sql-tap/func3.test.lua +++ b/test/sql-tap/func3.test.lua @@ -297,8 +297,8 @@ suits[5] = {str = '\x61\xc0', len = 2} suits[6] = {str = '\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 12} suits[7] = {str = '\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 11} suits[8] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} -suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80', len = 7} -suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff', len = 7} +suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\x80', len = 7} +suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\xff', len = 7} for k,v in pairs(suits) do test:do_execsql_test( New patch: commit 080a23b1d84ce7749d887819d328e988ee7fe766 Author: Mergen Imeev <imeevma@gmail.com> Date: Fri Oct 1 11:12:39 2021 +0300 sql: rework CHAR_LENGTH() function The CHAR_LENGTH() and CHARACTER_LENGTH() functions now use ICU functions to determine the length of a string. This gives us a better pricision when determining the length. Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index dbeb38bee..faef0eef3 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -258,6 +258,27 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); } +/** Implementation of the CHAR_LENGTH() function. */ +static void +func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (mem_is_null(arg)) + return; + assert(mem_is_str(arg) && arg->n >= 0); + uint32_t len = 0; + UErrorCode err = U_ZERO_ERROR; + const char *pos = arg->z; + const char *end = arg->z + arg->n; + while (pos < end) { + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + ++len; + } + mem_set_uint(ctx->pOut, len); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1912,8 +1933,8 @@ static struct sql_func_definition definitions[] = { {"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg}, {"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg}, {"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL}, - {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, - NULL}, + {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, + func_char_length, NULL}, {"COALESCE", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, sql_builtin_stub, NULL}, {"COUNT", 0, {}, FIELD_TYPE_INTEGER, step_count, fin_count}, @@ -1957,7 +1978,7 @@ static struct sql_func_definition definitions[] = { {"LEAST", -1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, minmaxFunc, NULL}, {"LEAST", -1, {FIELD_TYPE_SCALAR}, FIELD_TYPE_SCALAR, minmaxFunc, NULL}, - {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, + {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, func_char_length, NULL}, {"LENGTH", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_INTEGER, lengthFunc, NULL}, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index b25436186..27f17168b 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -287,7 +287,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") end, { -- <badutf-3.8> - "X", 7 + "X", 10 -- </badutf-3.8> }) @@ -297,7 +297,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff') AS x") end, { -- <badutf-3.9> - "X", 7 + "X", 10 -- </badutf-3.9> }) diff --git a/test/sql-tap/built-in-functions.test.lua b/test/sql-tap/built-in-functions.test.lua index 6fae811dc..7fe987abc 100755 --- a/test/sql-tap/built-in-functions.test.lua +++ b/test/sql-tap/built-in-functions.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(52) +test:plan(54) -- -- Make sure that number of arguments check is checked properly for SQL built-in @@ -545,4 +545,28 @@ test:do_test( {name = "COLUMN_2", type = "scalar"}, }) +-- gh-4145: Make sure the character is now checked when calculating its length. + +-- Character with UTF-8 code F0808080 does not exist. +test:do_execsql_test( + "builtins-4.1", + [[ + SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING)); + ]], + { + 4 + } +) + +-- Character with UTF-8 code F0908080 is '𐀀'. +test:do_execsql_test( + "builtins-4.2", + [[ + SELECT CHAR_LENGTH(CAST(x'f0908080' AS STRING)); + ]], + { + 1 + } +) + test:finish_test() diff --git a/test/sql-tap/func3.test.lua b/test/sql-tap/func3.test.lua index 7f1d8d33c..6999fea67 100755 --- a/test/sql-tap/func3.test.lua +++ b/test/sql-tap/func3.test.lua @@ -297,8 +297,8 @@ suits[5] = {str = '\x61\xc0', len = 2} suits[6] = {str = '\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 12} suits[7] = {str = '\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 11} suits[8] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} -suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80', len = 7} -suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff', len = 7} +suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\x80', len = 7} +suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\xff', len = 7} for k,v in pairs(suits) do test:do_execsql_test( ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function 2021-10-20 16:58 ` Mergen Imeev via Tarantool-patches @ 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:20 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-28 22:11 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-patches Thanks for the fixes! >>> +/** Implementation of the CHAR_LENGTH() function. */ >>> +static inline uint8_t >>> +utf8_len_char(char c) >>> +{ >>> + uint8_t u = (uint8_t)c; >>> + return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); >> >> It is not that simple really. Consider either using the old >> lengthFunc() and other sqlite utf8 helpers or use the approach >> similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT() >> and has handling for special symbols like U_SENTINEL. >> >> Otherwise you are making already third version of functions to >> work with utf8. >> >> I would even prefer to refactor lengthFunc() to stop using sqlite >> legacy and drop sqlite utf8 entirely, but I suspect it might be >> not so trivial to do and should be done later. > I was able to use ucnv_getNextUChar() here. In fact, I was able to use this > functions in all the places in this patch-set where we had to work with my or > SQLite functions that work with UTF8 characters. I think I can remove sql/utf.c > in the next patchset, since I refactor the LENGTH() and UNICODE() functions > there. Discussed in private that U8_NEXT() would work here just fine. ucnv_getNextUChar() is an overkill. In other places of the patchset too. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 10:20 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:20 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answer, diff and new patch below. I also dropped some tests, including the ones I added in the previous version of the patch. I did this because the behavior of the ICU functions caused some of the CI targets to crash. I added a doc-bot request for this issue in the commit message. On Fri, Oct 29, 2021 at 12:11:37AM +0200, Vladislav Shpilevoy wrote: > Thanks for the fixes! > > >>> +/** Implementation of the CHAR_LENGTH() function. */ > >>> +static inline uint8_t > >>> +utf8_len_char(char c) > >>> +{ > >>> + uint8_t u = (uint8_t)c; > >>> + return 1 + (u >= 0xc2) + (u >= 0xe0) + (u >= 0xf0); > >> > >> It is not that simple really. Consider either using the old > >> lengthFunc() and other sqlite utf8 helpers or use the approach > >> similar to utf8_len() in utf8.c. It uses ICU macro U8_NEXT() > >> and has handling for special symbols like U_SENTINEL. > >> > >> Otherwise you are making already third version of functions to > >> work with utf8. > >> > >> I would even prefer to refactor lengthFunc() to stop using sqlite > >> legacy and drop sqlite utf8 entirely, but I suspect it might be > >> not so trivial to do and should be done later. > > I was able to use ucnv_getNextUChar() here. In fact, I was able to use this > > functions in all the places in this patch-set where we had to work with my or > > SQLite functions that work with UTF8 characters. I think I can remove sql/utf.c > > in the next patchset, since I refactor the LENGTH() and UNICODE() functions > > there. > > Discussed in private that U8_NEXT() would work here just fine. > ucnv_getNextUChar() is an overkill. In other places of the patchset too. > Thank you for the suggestion! I replaces ucnv_getNextUChar() by U8_NEXT(). Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index faef0eef3..bc7a1fedd 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -269,11 +269,10 @@ func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) return; assert(mem_is_str(arg) && arg->n >= 0); uint32_t len = 0; - UErrorCode err = U_ZERO_ERROR; - const char *pos = arg->z; - const char *end = arg->z + arg->n; - while (pos < end) { - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + int offset = 0; + while (offset < arg->n) { + UChar32 c; + U8_NEXT((uint8_t *)arg->z, offset, arg->n, c); ++len; } mem_set_uint(ctx->pOut, len); diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index 27f17168b..ce8354840 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(23) +test:plan(19) --!./tcltestrunner.lua -- 2007 May 15 @@ -253,26 +253,6 @@ test:do_test( test:do_test( "badutf-3.5", - function() - return test:execsql2("SELECT length('\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.5> - "X", 12 - -- </badutf-3.5> - }) - -test:do_test( - "badutf-3.6", - function() - return test:execsql2("SELECT length('\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.6> - "X", 11 - -- </badutf-3.6> - }) - -test:do_test( - "badutf-3.7", function() return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { @@ -281,26 +261,6 @@ test:do_test( -- </badutf-3.7> }) -test:do_test( - "badutf-3.8", - function() - return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.8> - "X", 10 - -- </badutf-3.8> - }) - -test:do_test( - "badutf-3.9", - function() - return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff') AS x") - end, { - -- <badutf-3.9> - "X", 10 - -- </badutf-3.9> - }) - test:do_test( "badutf-4.1", function() diff --git a/test/sql-tap/built-in-functions.test.lua b/test/sql-tap/built-in-functions.test.lua index 7fe987abc..6fae811dc 100755 --- a/test/sql-tap/built-in-functions.test.lua +++ b/test/sql-tap/built-in-functions.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(54) +test:plan(52) -- -- Make sure that number of arguments check is checked properly for SQL built-in @@ -545,28 +545,4 @@ test:do_test( {name = "COLUMN_2", type = "scalar"}, }) --- gh-4145: Make sure the character is now checked when calculating its length. - --- Character with UTF-8 code F0808080 does not exist. -test:do_execsql_test( - "builtins-4.1", - [[ - SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING)); - ]], - { - 4 - } -) - --- Character with UTF-8 code F0908080 is '𐀀'. -test:do_execsql_test( - "builtins-4.2", - [[ - SELECT CHAR_LENGTH(CAST(x'f0908080' AS STRING)); - ]], - { - 1 - } -) - test:finish_test() diff --git a/test/sql-tap/func3.test.lua b/test/sql-tap/func3.test.lua index 6999fea67..7ed0bb27f 100755 --- a/test/sql-tap/func3.test.lua +++ b/test/sql-tap/func3.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(35) +test:plan(33) --!./tcltestrunner.lua -- 2010 August 27 @@ -294,11 +294,9 @@ suits[2] = {str = '\x80', len = 1} suits[3] = {str = '\x61\x62\x63', len = 3} suits[4] = {str = '\x7f\x80\x81', len = 3} suits[5] = {str = '\x61\xc0', len = 2} -suits[6] = {str = '\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 12} -suits[7] = {str = '\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 11} -suits[8] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} -suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\x80', len = 7} -suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\xff', len = 7} +suits[6] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} +suits[7] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\x80', len = 7} +suits[8] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\xff', len = 7} for k,v in pairs(suits) do test:do_execsql_test( New patch: commit 918fd18760f4491b81266b279dc5c4b581dc1ed6 Author: Mergen Imeev <imeevma@gmail.com> Date: Fri Oct 1 11:12:39 2021 +0300 sql: rework CHAR_LENGTH() function The CHAR_LENGTH() and CHARACTER_LENGTH() functions now use ICU functions to determine the length of a string. Part of #4145 @TarantoolBot document Title: Invalid UTF-8 values and ICU Invalid UTF-8 values may be handled differently depending on the ICU version. For example, for this request: ``` SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING)); ``` On `centos 7` with `libicu-devel-50.2-4.el7_7.x86_64` the result will be: ``` tarantool> box.execute([[SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING));]]) --- - metadata: - name: COLUMN_1 type: integer rows: - [1] ... ``` On `ubuntu 20.04` with `libicu-dev` version `66.1-2ubuntu2` the result will be: ``` tarantool> box.execute([[SELECT CHAR_LENGTH(CAST(x'f0808080' AS STRING));]]) --- - metadata: - name: COLUMN_1 type: integer rows: - [4] ... ``` diff --git a/src/box/sql/func.c b/src/box/sql/func.c index dbeb38bee..bc7a1fedd 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -258,6 +258,26 @@ func_abs_double(struct sql_context *ctx, int argc, struct Mem *argv) mem_set_double(ctx->pOut, arg->u.r < 0 ? -arg->u.r : arg->u.r); } +/** Implementation of the CHAR_LENGTH() function. */ +static void +func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (mem_is_null(arg)) + return; + assert(mem_is_str(arg) && arg->n >= 0); + uint32_t len = 0; + int offset = 0; + while (offset < arg->n) { + UChar32 c; + U8_NEXT((uint8_t *)arg->z, offset, arg->n, c); + ++len; + } + mem_set_uint(ctx->pOut, len); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1912,8 +1932,8 @@ static struct sql_func_definition definitions[] = { {"AVG", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_avg, fin_avg}, {"AVG", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_avg, fin_avg}, {"CHAR", -1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_STRING, charFunc, NULL}, - {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, - NULL}, + {"CHAR_LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, + func_char_length, NULL}, {"COALESCE", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, sql_builtin_stub, NULL}, {"COUNT", 0, {}, FIELD_TYPE_INTEGER, step_count, fin_count}, @@ -1957,7 +1977,7 @@ static struct sql_func_definition definitions[] = { {"LEAST", -1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, minmaxFunc, NULL}, {"LEAST", -1, {FIELD_TYPE_SCALAR}, FIELD_TYPE_SCALAR, minmaxFunc, NULL}, - {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, lengthFunc, + {"LENGTH", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, func_char_length, NULL}, {"LENGTH", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_INTEGER, lengthFunc, NULL}, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index b25436186..ce8354840 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(23) +test:plan(19) --!./tcltestrunner.lua -- 2007 May 15 @@ -253,26 +253,6 @@ test:do_test( test:do_test( "badutf-3.5", - function() - return test:execsql2("SELECT length('\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.5> - "X", 12 - -- </badutf-3.5> - }) - -test:do_test( - "badutf-3.6", - function() - return test:execsql2("SELECT length('\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.6> - "X", 11 - -- </badutf-3.6> - }) - -test:do_test( - "badutf-3.7", function() return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { @@ -281,26 +261,6 @@ test:do_test( -- </badutf-3.7> }) -test:do_test( - "badutf-3.8", - function() - return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") - end, { - -- <badutf-3.8> - "X", 7 - -- </badutf-3.8> - }) - -test:do_test( - "badutf-3.9", - function() - return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff') AS x") - end, { - -- <badutf-3.9> - "X", 7 - -- </badutf-3.9> - }) - test:do_test( "badutf-4.1", function() diff --git a/test/sql-tap/func3.test.lua b/test/sql-tap/func3.test.lua index 7f1d8d33c..7ed0bb27f 100755 --- a/test/sql-tap/func3.test.lua +++ b/test/sql-tap/func3.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(35) +test:plan(33) --!./tcltestrunner.lua -- 2010 August 27 @@ -294,11 +294,9 @@ suits[2] = {str = '\x80', len = 1} suits[3] = {str = '\x61\x62\x63', len = 3} suits[4] = {str = '\x7f\x80\x81', len = 3} suits[5] = {str = '\x61\xc0', len = 2} -suits[6] = {str = '\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 12} -suits[7] = {str = '\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 11} -suits[8] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} -suits[9] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80', len = 7} -suits[10] = {str = '\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff', len = 7} +suits[6] = {str = '\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80', len = 10} +suits[7] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\x80', len = 7} +suits[8] = {str = '\x80\x80\x80\x80\x80\xf0\x90\x80\x80\xff', len = 7} for k,v in pairs(suits) do test:do_execsql_test( ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-20 17:02 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 4/8] sql: refactor NULLIF() function Mergen Imeev via Tarantool-patches ` (6 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches Part of #4145 --- src/box/sql/func.c | 118 +++++++++++++++++++++------------------------ 1 file changed, 55 insertions(+), 63 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 2e53b32d8..565ebcabe 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -295,6 +295,59 @@ func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) mem_set_uint(ctx->pOut, utf8_len_str(arg->z, arg->n)); } +/** Implementation of the UPPER() and LOWER() functions. */ +static void +func_lower_upper(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 1); + (void)argc; + struct Mem *arg = &argv[0]; + if (arg->type == MEM_TYPE_NULL) + return; + assert(arg->type == MEM_TYPE_STR && arg->n >= 0); + if (arg->n == 0) + return mem_set_str0_static(ctx->pOut, ""); + const char *str = arg->z; + int32_t len = arg->n; + struct sql *db = sql_get(); + char *res = sqlDbMallocRawNN(db, len); + if (res == NULL) { + ctx->is_aborted = true; + return; + } + int32_t size = sqlDbMallocSize(db, res); + assert(size >= len); + UErrorCode status = U_ZERO_ERROR; + const char *locale = NULL; + if (ctx->coll != NULL && ctx->coll->type == COLL_TYPE_ICU) { + locale = ucol_getLocaleByType(ctx->coll->collator, + ULOC_VALID_LOCALE, &status); + } + UCaseMap *cm = ucasemap_open(locale, 0, &status); + assert(cm != NULL); + assert(ctx->func->def->name[0] == 'U' || + ctx->func->def->name[0] == 'L'); + bool is_upper = ctx->func->def->name[0] == 'U'; + int32_t new_len = + is_upper ? + ucasemap_utf8ToUpper(cm, res, size, str, len, &status) : + ucasemap_utf8ToLower(cm, res, size, str, len, &status); + if (new_len > size) { + res = sqlDbRealloc(db, res, new_len); + if (db->mallocFailed != 0) { + ctx->is_aborted = true; + return; + } + status = U_ZERO_ERROR; + if (is_upper) + ucasemap_utf8ToUpper(cm, res, size, str, len, &status); + else + ucasemap_utf8ToLower(cm, res, size, str, len, &status); + } + ucasemap_close(cm); + mem_set_str_allocated(ctx->pOut, res, new_len); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -808,67 +861,6 @@ contextMalloc(struct sql_context *context, i64 nByte) return z; } -/* - * Implementation of the upper() and lower() SQL functions. - */ - -#define ICU_CASE_CONVERT(case_type) \ -static void \ -case_type##ICUFunc(sql_context *context, int argc, struct Mem *argv) \ -{ \ - char *z1; \ - const char *z2; \ - int n; \ - UNUSED_PARAMETER(argc); \ - if (mem_is_bin(&argv[0]) || mem_is_map(&argv[0]) || \ - mem_is_array(&argv[0])) { \ - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", \ - mem_str(&argv[0])); \ - context->is_aborted = true; \ - return; \ - } \ - z2 = mem_as_str0(&argv[0]); \ - n = mem_len_unsafe(&argv[0]); \ - /* \ - * Verify that the call to _bytes() \ - * does not invalidate the _text() pointer. \ - */ \ - assert(z2 == mem_as_str0(&argv[0])); \ - if (!z2) \ - return; \ - z1 = contextMalloc(context, ((i64) n) + 1); \ - if (z1 == NULL) { \ - context->is_aborted = true; \ - return; \ - } \ - UErrorCode status = U_ZERO_ERROR; \ - struct coll *coll = context->coll; \ - const char *locale = NULL; \ - if (coll != NULL && coll->type == COLL_TYPE_ICU) { \ - locale = ucol_getLocaleByType(coll->collator, \ - ULOC_VALID_LOCALE, &status); \ - } \ - UCaseMap *case_map = ucasemap_open(locale, 0, &status); \ - assert(case_map != NULL); \ - int len = ucasemap_utf8To##case_type(case_map, z1, n, z2, n, &status); \ - if (len > n) { \ - status = U_ZERO_ERROR; \ - sql_free(z1); \ - z1 = contextMalloc(context, ((i64) len) + 1); \ - if (z1 == NULL) { \ - context->is_aborted = true; \ - return; \ - } \ - ucasemap_utf8To##case_type(case_map, z1, len, z2, n, &status); \ - } \ - ucasemap_close(case_map); \ - sql_result_text(context, z1, len, sql_free); \ -} \ - -ICU_CASE_CONVERT(Lower); -ICU_CASE_CONVERT(Upper); - - /* * Some functions like COALESCE() and IFNULL() and UNLIKELY() are implemented * as VDBE code so that unused argument values do not have to be computed. @@ -2007,7 +1999,7 @@ static struct sql_func_definition definitions[] = { FIELD_TYPE_BOOLEAN, sql_builtin_stub, NULL}, {"LIKELY", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_BOOLEAN, sql_builtin_stub, NULL}, - {"LOWER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, LowerICUFunc, + {"LOWER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, func_lower_upper, NULL}, {"MAX", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_minmax, NULL}, @@ -2082,7 +2074,7 @@ static struct sql_func_definition definitions[] = { NULL}, {"UNLIKELY", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_BOOLEAN, sql_builtin_stub, NULL}, - {"UPPER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, UpperICUFunc, + {"UPPER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, func_lower_upper, NULL}, {"UUID", 0, {}, FIELD_TYPE_UUID, sql_func_uuid, NULL}, {"UUID", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_UUID, sql_func_uuid, NULL}, -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions Mergen Imeev via Tarantool-patches @ 2021-10-20 17:02 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 17:02 UTC (permalink / raw) To: v.shpilevoy, tarantool-patches Thank you for the review! I replaced check of MEM type by mem_is_*() functions. Diff below. On Fri, Oct 01, 2021 at 07:29:29PM +0300, Mergen Imeev via Tarantool-patches wrote: > Part of #4145 > --- > src/box/sql/func.c | 118 +++++++++++++++++++++------------------------ > 1 file changed, 55 insertions(+), 63 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 2e53b32d8..565ebcabe 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -295,6 +295,59 @@ func_char_length(struct sql_context *ctx, int argc, struct Mem *argv) > mem_set_uint(ctx->pOut, utf8_len_str(arg->z, arg->n)); > } > > +/** Implementation of the UPPER() and LOWER() functions. */ > +static void > +func_lower_upper(struct sql_context *ctx, int argc, struct Mem *argv) > +{ > + assert(argc == 1); > + (void)argc; > + struct Mem *arg = &argv[0]; > + if (arg->type == MEM_TYPE_NULL) > + return; > + assert(arg->type == MEM_TYPE_STR && arg->n >= 0); > + if (arg->n == 0) > + return mem_set_str0_static(ctx->pOut, ""); > + const char *str = arg->z; > + int32_t len = arg->n; > + struct sql *db = sql_get(); > + char *res = sqlDbMallocRawNN(db, len); > + if (res == NULL) { > + ctx->is_aborted = true; > + return; > + } > + int32_t size = sqlDbMallocSize(db, res); > + assert(size >= len); > + UErrorCode status = U_ZERO_ERROR; > + const char *locale = NULL; > + if (ctx->coll != NULL && ctx->coll->type == COLL_TYPE_ICU) { > + locale = ucol_getLocaleByType(ctx->coll->collator, > + ULOC_VALID_LOCALE, &status); > + } > + UCaseMap *cm = ucasemap_open(locale, 0, &status); > + assert(cm != NULL); > + assert(ctx->func->def->name[0] == 'U' || > + ctx->func->def->name[0] == 'L'); > + bool is_upper = ctx->func->def->name[0] == 'U'; > + int32_t new_len = > + is_upper ? > + ucasemap_utf8ToUpper(cm, res, size, str, len, &status) : > + ucasemap_utf8ToLower(cm, res, size, str, len, &status); > + if (new_len > size) { > + res = sqlDbRealloc(db, res, new_len); > + if (db->mallocFailed != 0) { > + ctx->is_aborted = true; > + return; > + } > + status = U_ZERO_ERROR; > + if (is_upper) > + ucasemap_utf8ToUpper(cm, res, size, str, len, &status); > + else > + ucasemap_utf8ToLower(cm, res, size, str, len, &status); > + } > + ucasemap_close(cm); > + mem_set_str_allocated(ctx->pOut, res, new_len); > +} > + > static const unsigned char * > mem_as_ustr(struct Mem *mem) > { > @@ -808,67 +861,6 @@ contextMalloc(struct sql_context *context, i64 nByte) > return z; > } > > -/* > - * Implementation of the upper() and lower() SQL functions. > - */ > - > -#define ICU_CASE_CONVERT(case_type) \ > -static void \ > -case_type##ICUFunc(sql_context *context, int argc, struct Mem *argv) \ > -{ \ > - char *z1; \ > - const char *z2; \ > - int n; \ > - UNUSED_PARAMETER(argc); \ > - if (mem_is_bin(&argv[0]) || mem_is_map(&argv[0]) || \ > - mem_is_array(&argv[0])) { \ > - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", \ > - mem_str(&argv[0])); \ > - context->is_aborted = true; \ > - return; \ > - } \ > - z2 = mem_as_str0(&argv[0]); \ > - n = mem_len_unsafe(&argv[0]); \ > - /* \ > - * Verify that the call to _bytes() \ > - * does not invalidate the _text() pointer. \ > - */ \ > - assert(z2 == mem_as_str0(&argv[0])); \ > - if (!z2) \ > - return; \ > - z1 = contextMalloc(context, ((i64) n) + 1); \ > - if (z1 == NULL) { \ > - context->is_aborted = true; \ > - return; \ > - } \ > - UErrorCode status = U_ZERO_ERROR; \ > - struct coll *coll = context->coll; \ > - const char *locale = NULL; \ > - if (coll != NULL && coll->type == COLL_TYPE_ICU) { \ > - locale = ucol_getLocaleByType(coll->collator, \ > - ULOC_VALID_LOCALE, &status); \ > - } \ > - UCaseMap *case_map = ucasemap_open(locale, 0, &status); \ > - assert(case_map != NULL); \ > - int len = ucasemap_utf8To##case_type(case_map, z1, n, z2, n, &status); \ > - if (len > n) { \ > - status = U_ZERO_ERROR; \ > - sql_free(z1); \ > - z1 = contextMalloc(context, ((i64) len) + 1); \ > - if (z1 == NULL) { \ > - context->is_aborted = true; \ > - return; \ > - } \ > - ucasemap_utf8To##case_type(case_map, z1, len, z2, n, &status); \ > - } \ > - ucasemap_close(case_map); \ > - sql_result_text(context, z1, len, sql_free); \ > -} \ > - > -ICU_CASE_CONVERT(Lower); > -ICU_CASE_CONVERT(Upper); > - > - > /* > * Some functions like COALESCE() and IFNULL() and UNLIKELY() are implemented > * as VDBE code so that unused argument values do not have to be computed. > @@ -2007,7 +1999,7 @@ static struct sql_func_definition definitions[] = { > FIELD_TYPE_BOOLEAN, sql_builtin_stub, NULL}, > {"LIKELY", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_BOOLEAN, sql_builtin_stub, > NULL}, > - {"LOWER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, LowerICUFunc, > + {"LOWER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, func_lower_upper, > NULL}, > > {"MAX", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_minmax, NULL}, > @@ -2082,7 +2074,7 @@ static struct sql_func_definition definitions[] = { > NULL}, > {"UNLIKELY", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_BOOLEAN, sql_builtin_stub, > NULL}, > - {"UPPER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, UpperICUFunc, > + {"UPPER", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, func_lower_upper, > NULL}, > {"UUID", 0, {}, FIELD_TYPE_UUID, sql_func_uuid, NULL}, > {"UUID", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_UUID, sql_func_uuid, NULL}, > -- > 2.25.1 > Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 441bcb51d..d6c557c54 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -286,9 +286,9 @@ func_lower_upper(struct sql_context *ctx, int argc, struct Mem *argv) assert(argc == 1); (void)argc; struct Mem *arg = &argv[0]; - if (arg->type == MEM_TYPE_NULL) + if (mem_is_null(arg)) return; - assert(arg->type == MEM_TYPE_STR && arg->n >= 0); + assert(mem_is_str(arg) && arg->n >= 0); if (arg->n == 0) return mem_set_str0_static(ctx->pOut, ""); const char *str = arg->z; ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 4/8] sql: refactor NULLIF() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (2 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches ` (5 subsequent siblings) 9 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches Part of #4145 --- src/box/sql/func.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 565ebcabe..17d5fab8f 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -348,6 +348,18 @@ func_lower_upper(struct sql_context *ctx, int argc, struct Mem *argv) mem_set_str_allocated(ctx->pOut, res, new_len); } +/** Implementation of the NULLIF() function. */ +static void +func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_cmp_scalar(&argv[0], &argv[1], ctx->coll) == 0) + return; + if (mem_copy(ctx->pOut, &argv[0]) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1210,20 +1222,6 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) sql_result_bool(context, res == MATCH); } -/* - * Implementation of the NULLIF(x,y) function. The result is the first - * argument if the arguments are different. The result is NULL if the - * arguments are equal to each other. - */ -static void -nullifFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - (void)argc; - struct coll *pColl = context->coll; - if (mem_cmp_scalar(&argv[0], &argv[1], pColl) != 0) - sql_result_value(context, &argv[0]); -} - /** * Implementation of the version() function. The result is the * version of the Tarantool that is running. @@ -2021,7 +2019,7 @@ static struct sql_func_definition definitions[] = { {"MIN", 1, {FIELD_TYPE_SCALAR}, FIELD_TYPE_SCALAR, step_minmax, NULL}, {"NULLIF", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, - nullifFunc, NULL}, + func_nullif, NULL}, {"POSITION", 2, {FIELD_TYPE_STRING, FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, position_func, NULL}, {"PRINTF", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, printfFunc, -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (3 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 4/8] sql: refactor NULLIF() function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-20 17:05 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function Mergen Imeev via Tarantool-patches ` (4 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches This patch refactoring TRIM() and fixes an issue with incorrect trimming of some VARBINARY values. Part of #4415 --- src/box/sql/func.c | 365 ++++++++++++++++------------------ test/sql-tap/badutf1.test.lua | 41 ++-- 2 files changed, 202 insertions(+), 204 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 17d5fab8f..1d1a8b0cd 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -360,6 +360,176 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +static inline void +return_empty_str(struct sql_context *ctx, bool is_str) +{ + return is_str ? mem_set_str_static(ctx->pOut, "", 0) : + mem_set_bin_static(ctx->pOut, "", 0); +} + +/** Implementation of the TRIM() function. */ +static inline int +trim_bin_end(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + char c = str[end - 1]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + --end; + } + return end; +} + +static inline int +trim_bin_start(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + char c = str[start]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + ++start; + } + return start; +} + +static void +func_trim_bin(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_bin(&argv[2]))); + assert(mem_is_bin(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *octets; + int octets_size; + if (argc == 3) { + octets = argv[2].z; + octets_size = argv[2].n; + } else { + octets = "\0"; + octets_size = 1; + } + + int flags = argv[1].u.u; + int end = trim_bin_end(str, size, octets, octets_size, flags); + int start = trim_bin_start(str, end, octets, octets_size, flags); + + if (start >= end) + return return_empty_str(ctx, false); + if (mem_copy_bin(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + +static inline int +trim_str_end(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + end - len; + is_trimmed = len <= end && memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + end -= len; + } + return end; +} + +static inline int +trim_str_start(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + start; + is_trimmed = start + len <= end && + memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + start += len; + } + return start; +} + +static void +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *chars; + int chars_size; + if (argc == 3) { + chars = argv[2].z; + chars_size = argv[2].n; + } else { + chars = " "; + chars_size = 1; + } + + uint8_t *chars_len = sqlDbMallocRawNN(sql_get(), + chars_size * sizeof(uint8_t)); + if (chars_len == NULL) { + ctx->is_aborted = true; + return; + } + int chars_count = 0; + int i = 0; + while (i < chars_size) { + uint8_t len = utf8_len_char(chars[i]); + i += len; + if (i <= chars_size) + chars_len[chars_count++] = len; + } + + uint64_t flags = argv[1].u.u; + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); + int start = trim_str_start(str, end, chars, chars_len, chars_count, + flags); + sqlDbFree(sql_get(), chars_len); + + if (start >= end) + return mem_set_str0_static(ctx->pOut, ""); + if (mem_copy_str(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1545,193 +1715,6 @@ replaceFunc(struct sql_context *context, int argc, struct Mem *argv) mem_set_bin_dynamic(context->pOut, (char *)zOut, j); } -/** - * Remove characters included in @a trim_set from @a input_str - * until encounter a character that doesn't belong to @a trim_set. - * Remove from the side specified by @a flags. - * @param context SQL context. - * @param flags Trim specification: left, right or both. - * @param trim_set The set of characters for trimming. - * @param char_len Lengths of each UTF-8 character in @a trim_set. - * @param char_cnt A number of UTF-8 characters in @a trim_set. - * @param input_str Input string for trimming. - * @param input_str_sz Input string size in bytes. - */ -static void -trim_procedure(struct sql_context *context, enum trim_side_mask flags, - const unsigned char *trim_set, const uint8_t *char_len, - int char_cnt, const unsigned char *input_str, int input_str_sz) -{ - if (char_cnt == 0) - goto finish; - int i, len; - const unsigned char *z; - if ((flags & TRIM_LEADING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(input_str, z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str += len; - input_str_sz -= len; - } - } - if ((flags & TRIM_TRAILING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(&input_str[input_str_sz - len], - z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str_sz -= len; - } - } -finish: - if (context->func->def->returns == FIELD_TYPE_STRING) - mem_copy_str(context->pOut, (char *)input_str, input_str_sz); - else - mem_copy_bin(context->pOut, (char *)input_str, input_str_sz); -} - -/** - * Prepare arguments for trimming procedure. Allocate memory for - * @a char_len (array of lengths each character in @a trim_set) - * and fill it. - * - * @param context SQL context. - * @param trim_set The set of characters for trimming. - * @param[out] char_len Lengths of each character in @ trim_set. - * @retval >=0 A number of UTF-8 characters in @a trim_set. - * @retval -1 Memory allocation error. - */ -static int -trim_prepare_char_len(struct sql_context *context, - const unsigned char *trim_set, int trim_set_sz, - uint8_t **char_len) -{ - /* - * Count the number of UTF-8 characters passing through - * the entire char set, but not up to the '\0' or X'00' - * character. This allows to handle trimming set - * containing such characters. - */ - int char_cnt = sql_utf8_char_count(trim_set, trim_set_sz); - if (char_cnt == 0) { - *char_len = NULL; - return 0; - } - - if ((*char_len = (uint8_t *)contextMalloc(context, char_cnt)) == NULL) - return -1; - - int i = 0, j = 0; - while(j < char_cnt) { - int old_i = i; - SQL_UTF8_FWD_1(trim_set, i, trim_set_sz); - (*char_len)[j++] = i - old_i; - } - - return char_cnt; -} - -/** - * Normalize args from @a argv input array when it has two args. - * - * Case: TRIM(<str>) - * Call trimming procedure with TRIM_BOTH as the flags and " " as - * the trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>) - * If user has specified side keyword only, then call trimming - * procedure with the specified side and " " as the trimming set. - */ -static void -trim_func_two_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2) -{ - const unsigned char *trim_set; - if (mem_is_bin(arg1)) - trim_set = (const unsigned char *)"\0"; - else - trim_set = (const unsigned char *)" "; - const unsigned char *input_str; - if ((input_str = mem_as_ustr(arg1)) == NULL) - return; - - int input_str_sz = mem_len_unsafe(arg1); - assert(arg2->type == MEM_TYPE_UINT); - uint8_t len_one = 1; - trim_procedure(context, arg2->u.u, trim_set, - &len_one, 1, input_str, input_str_sz); -} - -/** - * Normalize args from @a argv input array when it has three args. - * - * Case: TRIM(<character_set> FROM <str>) - * If user has specified <character_set> only, call trimming procedure with - * TRIM_BOTH as the flags and that trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH <character_set> FROM <str>) - * If user has specified side keyword and <character_set>, then - * call trimming procedure with that args. - */ -static void -trim_func_three_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2, sql_value *arg3) -{ - assert(arg2->type == MEM_TYPE_UINT); - const unsigned char *input_str, *trim_set; - if ((input_str = mem_as_ustr(arg1)) == NULL || - (trim_set = mem_as_ustr(arg3)) == NULL) - return; - - int trim_set_sz = mem_len_unsafe(arg3); - int input_str_sz = mem_len_unsafe(arg1); - uint8_t *char_len; - int char_cnt = trim_prepare_char_len(context, trim_set, trim_set_sz, - &char_len); - if (char_cnt == -1) - return; - trim_procedure(context, arg2->u.u, trim_set, char_len, - char_cnt, input_str, input_str_sz); - sql_free(char_len); -} - -/** - * Normalize args from @a argv input array when it has one, - * two or three args. - * - * This is a dispatcher function that calls corresponding - * implementation depending on the number of arguments. -*/ -static void -trim_func(struct sql_context *context, int argc, struct Mem *argv) -{ - switch (argc) { - case 2: - trim_func_two_args(context, &argv[0], &argv[1]); - break; - case 3: - trim_func_three_args(context, &argv[0], &argv[1], &argv[2]); - break; - default: - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "TRIM", - "2 or 3", argc); - context->is_aborted = true; - } -} - /* * Compute the soundex encoding of a word. * @@ -2058,14 +2041,14 @@ static struct sql_func_definition definitions[] = { fin_total}, {"TRIM", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_STRING}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TRIM", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_VARBINARY}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TYPEOF", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, typeofFunc, NULL}, {"UNICODE", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, unicodeFunc, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index b25436186..9fae91621 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(23) +test:plan(24) --!./tcltestrunner.lua -- 2007 May 15 @@ -336,47 +336,62 @@ test:do_test( test:do_test( "badutf-4.4", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'808080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'808080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.4> - "X", "808080F0808080FF" + "X", "F0" -- </badutf-4.4> }) test:do_test( "badutf-4.5", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff8080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff8080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.5> - "X", "80F0808080FF" + "X", "F0" -- </badutf-4.5> }) test:do_test( "badutf-4.6", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.6> - "X", "F0808080FF" + "X", "F0" -- </badutf-4.6> }) test:do_test( "badutf-4.7", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff8080' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff8080' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.7> - "X", "FF80F0808080FF" + "X", "F0" -- </badutf-4.7> }) +-- gh-4145: Make sure that TRIM() properly work with VARBINARY. +test:do_execsql_test( + "badutf-5", + [[ + SELECT HEX(TRIM(x'ff1234' from x'1234125678123412')); + ]], + { + '5678' + } +) + --db2("close") -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches @ 2021-10-20 17:05 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:12 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 17:05 UTC (permalink / raw) To: v.shpilevoy, tarantool-patches Thank you for the review! I replaced self-created function by ucnv_getNextUChar(). Diff and new patch below. On Fri, Oct 01, 2021 at 07:29:32PM +0300, Mergen Imeev via Tarantool-patches wrote: <cut> Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index ed7a85c83..1294ff5b3 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -494,12 +494,14 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) return; } int chars_count = 0; - int i = 0; - while (i < chars_size) { - uint8_t len = utf8_len_char(chars[i]); - i += len; - if (i <= chars_size) - chars_len[chars_count++] = len; + + UErrorCode err = U_ZERO_ERROR; + const char *pos_start = chars; + const char *pos_end = chars + chars_size; + while (pos_start < pos_end) { + const char *cur = pos_start; + ucnv_getNextUChar(icu_utf8_conv, &pos_start, pos_end, &err); + chars_len[chars_count++] = pos_start - cur; } uint64_t flags = argv[1].u.u; New patch: commit d91a925090822bf074751f96b65d26bb35f5e5e2 Author: Mergen Imeev <imeevma@gmail.com> Date: Tue Sep 21 19:45:36 2021 +0300 sql: rework TRIM() function This patch refactoring TRIM() and fixes an issue with incorrect trimming of some VARBINARY values. Also, TRIM() now use ICU functions instead of self-created. Part of #4415 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index d36c83501..1294ff5b3 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -344,6 +344,178 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +static inline void +return_empty_str(struct sql_context *ctx, bool is_str) +{ + return is_str ? mem_set_str_static(ctx->pOut, "", 0) : + mem_set_bin_static(ctx->pOut, "", 0); +} + +/** Implementation of the TRIM() function. */ +static inline int +trim_bin_end(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + char c = str[end - 1]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + --end; + } + return end; +} + +static inline int +trim_bin_start(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + char c = str[start]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + ++start; + } + return start; +} + +static void +func_trim_bin(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_bin(&argv[2]))); + assert(mem_is_bin(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *octets; + int octets_size; + if (argc == 3) { + octets = argv[2].z; + octets_size = argv[2].n; + } else { + octets = "\0"; + octets_size = 1; + } + + int flags = argv[1].u.u; + int end = trim_bin_end(str, size, octets, octets_size, flags); + int start = trim_bin_start(str, end, octets, octets_size, flags); + + if (start >= end) + return return_empty_str(ctx, false); + if (mem_copy_bin(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + +static inline int +trim_str_end(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + end - len; + is_trimmed = len <= end && memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + end -= len; + } + return end; +} + +static inline int +trim_str_start(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + start; + is_trimmed = start + len <= end && + memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + start += len; + } + return start; +} + +static void +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *chars; + int chars_size; + if (argc == 3) { + chars = argv[2].z; + chars_size = argv[2].n; + } else { + chars = " "; + chars_size = 1; + } + + uint8_t *chars_len = sqlDbMallocRawNN(sql_get(), + chars_size * sizeof(uint8_t)); + if (chars_len == NULL) { + ctx->is_aborted = true; + return; + } + int chars_count = 0; + + UErrorCode err = U_ZERO_ERROR; + const char *pos_start = chars; + const char *pos_end = chars + chars_size; + while (pos_start < pos_end) { + const char *cur = pos_start; + ucnv_getNextUChar(icu_utf8_conv, &pos_start, pos_end, &err); + chars_len[chars_count++] = pos_start - cur; + } + + uint64_t flags = argv[1].u.u; + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); + int start = trim_str_start(str, end, chars, chars_len, chars_count, + flags); + sqlDbFree(sql_get(), chars_len); + + if (start >= end) + return mem_set_str0_static(ctx->pOut, ""); + if (mem_copy_str(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1528,193 +1700,6 @@ replaceFunc(struct sql_context *context, int argc, struct Mem *argv) mem_set_bin_dynamic(context->pOut, (char *)zOut, j); } -/** - * Remove characters included in @a trim_set from @a input_str - * until encounter a character that doesn't belong to @a trim_set. - * Remove from the side specified by @a flags. - * @param context SQL context. - * @param flags Trim specification: left, right or both. - * @param trim_set The set of characters for trimming. - * @param char_len Lengths of each UTF-8 character in @a trim_set. - * @param char_cnt A number of UTF-8 characters in @a trim_set. - * @param input_str Input string for trimming. - * @param input_str_sz Input string size in bytes. - */ -static void -trim_procedure(struct sql_context *context, enum trim_side_mask flags, - const unsigned char *trim_set, const uint8_t *char_len, - int char_cnt, const unsigned char *input_str, int input_str_sz) -{ - if (char_cnt == 0) - goto finish; - int i, len; - const unsigned char *z; - if ((flags & TRIM_LEADING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(input_str, z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str += len; - input_str_sz -= len; - } - } - if ((flags & TRIM_TRAILING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(&input_str[input_str_sz - len], - z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str_sz -= len; - } - } -finish: - if (context->func->def->returns == FIELD_TYPE_STRING) - mem_copy_str(context->pOut, (char *)input_str, input_str_sz); - else - mem_copy_bin(context->pOut, (char *)input_str, input_str_sz); -} - -/** - * Prepare arguments for trimming procedure. Allocate memory for - * @a char_len (array of lengths each character in @a trim_set) - * and fill it. - * - * @param context SQL context. - * @param trim_set The set of characters for trimming. - * @param[out] char_len Lengths of each character in @ trim_set. - * @retval >=0 A number of UTF-8 characters in @a trim_set. - * @retval -1 Memory allocation error. - */ -static int -trim_prepare_char_len(struct sql_context *context, - const unsigned char *trim_set, int trim_set_sz, - uint8_t **char_len) -{ - /* - * Count the number of UTF-8 characters passing through - * the entire char set, but not up to the '\0' or X'00' - * character. This allows to handle trimming set - * containing such characters. - */ - int char_cnt = sql_utf8_char_count(trim_set, trim_set_sz); - if (char_cnt == 0) { - *char_len = NULL; - return 0; - } - - if ((*char_len = (uint8_t *)contextMalloc(context, char_cnt)) == NULL) - return -1; - - int i = 0, j = 0; - while(j < char_cnt) { - int old_i = i; - SQL_UTF8_FWD_1(trim_set, i, trim_set_sz); - (*char_len)[j++] = i - old_i; - } - - return char_cnt; -} - -/** - * Normalize args from @a argv input array when it has two args. - * - * Case: TRIM(<str>) - * Call trimming procedure with TRIM_BOTH as the flags and " " as - * the trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>) - * If user has specified side keyword only, then call trimming - * procedure with the specified side and " " as the trimming set. - */ -static void -trim_func_two_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2) -{ - const unsigned char *trim_set; - if (mem_is_bin(arg1)) - trim_set = (const unsigned char *)"\0"; - else - trim_set = (const unsigned char *)" "; - const unsigned char *input_str; - if ((input_str = mem_as_ustr(arg1)) == NULL) - return; - - int input_str_sz = mem_len_unsafe(arg1); - assert(arg2->type == MEM_TYPE_UINT); - uint8_t len_one = 1; - trim_procedure(context, arg2->u.u, trim_set, - &len_one, 1, input_str, input_str_sz); -} - -/** - * Normalize args from @a argv input array when it has three args. - * - * Case: TRIM(<character_set> FROM <str>) - * If user has specified <character_set> only, call trimming procedure with - * TRIM_BOTH as the flags and that trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH <character_set> FROM <str>) - * If user has specified side keyword and <character_set>, then - * call trimming procedure with that args. - */ -static void -trim_func_three_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2, sql_value *arg3) -{ - assert(arg2->type == MEM_TYPE_UINT); - const unsigned char *input_str, *trim_set; - if ((input_str = mem_as_ustr(arg1)) == NULL || - (trim_set = mem_as_ustr(arg3)) == NULL) - return; - - int trim_set_sz = mem_len_unsafe(arg3); - int input_str_sz = mem_len_unsafe(arg1); - uint8_t *char_len; - int char_cnt = trim_prepare_char_len(context, trim_set, trim_set_sz, - &char_len); - if (char_cnt == -1) - return; - trim_procedure(context, arg2->u.u, trim_set, char_len, - char_cnt, input_str, input_str_sz); - sql_free(char_len); -} - -/** - * Normalize args from @a argv input array when it has one, - * two or three args. - * - * This is a dispatcher function that calls corresponding - * implementation depending on the number of arguments. -*/ -static void -trim_func(struct sql_context *context, int argc, struct Mem *argv) -{ - switch (argc) { - case 2: - trim_func_two_args(context, &argv[0], &argv[1]); - break; - case 3: - trim_func_three_args(context, &argv[0], &argv[1], &argv[2]); - break; - default: - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "TRIM", - "2 or 3", argc); - context->is_aborted = true; - } -} - /* * Compute the soundex encoding of a word. * @@ -2041,14 +2026,14 @@ static struct sql_func_definition definitions[] = { fin_total}, {"TRIM", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_STRING}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TRIM", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_VARBINARY}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TYPEOF", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, typeofFunc, NULL}, {"UNICODE", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, unicodeFunc, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index 27f17168b..e8b162ecc 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(23) +test:plan(24) --!./tcltestrunner.lua -- 2007 May 15 @@ -336,47 +336,62 @@ test:do_test( test:do_test( "badutf-4.4", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'808080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'808080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.4> - "X", "808080F0808080FF" + "X", "F0" -- </badutf-4.4> }) test:do_test( "badutf-4.5", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff8080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff8080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.5> - "X", "80F0808080FF" + "X", "F0" -- </badutf-4.5> }) test:do_test( "badutf-4.6", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.6> - "X", "F0808080FF" + "X", "F0" -- </badutf-4.6> }) test:do_test( "badutf-4.7", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff8080' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff8080' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.7> - "X", "FF80F0808080FF" + "X", "F0" -- </badutf-4.7> }) +-- gh-4145: Make sure that TRIM() properly work with VARBINARY. +test:do_execsql_test( + "badutf-5", + [[ + SELECT HEX(TRIM(x'ff1234' from x'1234125678123412')); + ]], + { + '5678' + } +) + --db2("close") ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function 2021-10-20 17:05 ` Mergen Imeev via Tarantool-patches @ 2021-10-28 22:12 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:35 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-28 22:12 UTC (permalink / raw) To: Mergen Imeev, tarantool-patches Thanks for the fixes! See 3 comments below. > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index d36c83501..1294ff5b3 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -344,6 +344,178 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) > ctx->is_aborted = true; > } > > +static inline void > +return_empty_str(struct sql_context *ctx, bool is_str) 1. It is called in a single place with is_str = false. Could we maybe inline it? > +{ > + return is_str ? mem_set_str_static(ctx->pOut, "", 0) : > + mem_set_bin_static(ctx->pOut, "", 0); > +} ... > + > +static void > +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) > +{ > + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) > + return; > + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); > + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); > + const char *str = argv[0].z; > + int size = argv[0].n; > + const char *chars; > + int chars_size; > + if (argc == 3) { > + chars = argv[2].z; > + chars_size = argv[2].n; > + } else { > + chars = " "; > + chars_size = 1; > + } > + > + uint8_t *chars_len = sqlDbMallocRawNN(sql_get(), > + chars_size * sizeof(uint8_t)); 2. Could use fiber region here. Up to you. > + if (chars_len == NULL) { > + ctx->is_aborted = true; > + return; > + } > + int chars_count = 0; > + > + UErrorCode err = U_ZERO_ERROR; > + const char *pos_start = chars; > + const char *pos_end = chars + chars_size; > + while (pos_start < pos_end) { > + const char *cur = pos_start; > + ucnv_getNextUChar(icu_utf8_conv, &pos_start, pos_end, &err); > + chars_len[chars_count++] = pos_start - cur; > + } > + > + uint64_t flags = argv[1].u.u; > + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); > + int start = trim_str_start(str, end, chars, chars_len, chars_count, > + flags); 3. The second line of the call is misaligned a bit. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function 2021-10-28 22:12 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 10:35 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:35 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answers, diff and new patch below. Also, I replaced ucnv_getNextUChar() by U8_NEXT(). On Fri, Oct 29, 2021 at 12:12:25AM +0200, Vladislav Shpilevoy wrote: > Thanks for the fixes! > > See 3 comments below. > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index d36c83501..1294ff5b3 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -344,6 +344,178 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) > > ctx->is_aborted = true; > > } > > > > +static inline void > > +return_empty_str(struct sql_context *ctx, bool is_str) > > 1. It is called in a single place with is_str = false. Could we maybe > inline it? > There is couple more places in the next patches, but I agree that this function is not needed. Dropped. > > +{ > > + return is_str ? mem_set_str_static(ctx->pOut, "", 0) : > > + mem_set_bin_static(ctx->pOut, "", 0); > > +} > > ... > > > + > > +static void > > +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) > > +{ > > + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) > > + return; > > + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); > > + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); > > + const char *str = argv[0].z; > > + int size = argv[0].n; > > + const char *chars; > > + int chars_size; > > + if (argc == 3) { > > + chars = argv[2].z; > > + chars_size = argv[2].n; > > + } else { > > + chars = " "; > > + chars_size = 1; > > + } > > + > > + uint8_t *chars_len = sqlDbMallocRawNN(sql_get(), > > + chars_size * sizeof(uint8_t)); > > 2. Could use fiber region here. Up to you. > Fixed. > > + if (chars_len == NULL) { > > + ctx->is_aborted = true; > > + return; > > + } > > + int chars_count = 0; > > + > > + UErrorCode err = U_ZERO_ERROR; > > + const char *pos_start = chars; > > + const char *pos_end = chars + chars_size; > > + while (pos_start < pos_end) { > > + const char *cur = pos_start; > > + ucnv_getNextUChar(icu_utf8_conv, &pos_start, pos_end, &err); > > + chars_len[chars_count++] = pos_start - cur; > > + } > > + > > + uint64_t flags = argv[1].u.u; > > + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); > > + int start = trim_str_start(str, end, chars, chars_len, chars_count, > > + flags); > > 3. The second line of the call is misaligned a bit. Fixed. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 0ad6ac966..ba6b9246d 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -343,13 +343,6 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } -static inline void -return_empty_str(struct sql_context *ctx, bool is_str) -{ - return is_str ? mem_set_str_static(ctx->pOut, "", 0) : - mem_set_bin_static(ctx->pOut, "", 0); -} - /** Implementation of the TRIM() function. */ static inline int trim_bin_end(const char *str, int end, const char *octets, int octets_size, @@ -412,7 +405,7 @@ func_trim_bin(struct sql_context *ctx, int argc, struct Mem *argv) int start = trim_bin_start(str, end, octets, octets_size, flags); if (start >= end) - return return_empty_str(ctx, false); + return mem_set_bin_static(ctx->pOut, "", 0); if (mem_copy_bin(ctx->pOut, &str[start], end - start) != 0) ctx->is_aborted = true; } @@ -486,28 +479,29 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) chars_size = 1; } - uint8_t *chars_len = sqlDbMallocRawNN(sql_get(), - chars_size * sizeof(uint8_t)); + struct region *region = &fiber()->gc; + size_t svp = region_used(region); + uint8_t *chars_len = region_alloc(region, chars_size); if (chars_len == NULL) { ctx->is_aborted = true; + diag_set(OutOfMemory, chars_size, "region_alloc", "chars_len"); return; } int chars_count = 0; - UErrorCode err = U_ZERO_ERROR; - const char *pos_start = chars; - const char *pos_end = chars + chars_size; - while (pos_start < pos_end) { - const char *cur = pos_start; - ucnv_getNextUChar(icu_utf8_conv, &pos_start, pos_end, &err); - chars_len[chars_count++] = pos_start - cur; + int offset = 0; + while (offset < chars_size) { + UChar32 c; + int prev = offset; + U8_NEXT((uint8_t *)chars, offset, chars_size, c); + chars_len[chars_count++] = offset - prev; } uint64_t flags = argv[1].u.u; int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); int start = trim_str_start(str, end, chars, chars_len, chars_count, - flags); - sqlDbFree(sql_get(), chars_len); + flags); + region_truncate(region, svp); if (start >= end) return mem_set_str0_static(ctx->pOut, ""); New patch: commit 2e74fe281daee22a67d9ec61659ccee569f7fd65 Author: Mergen Imeev <imeevma@gmail.com> Date: Tue Sep 21 19:45:36 2021 +0300 sql: rework TRIM() function This patch refactoring TRIM() and fixes an issue with incorrect trimming of some VARBINARY values. Also, TRIM() now use ICU functions instead of self-created. Part of #4415 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 7d54a39cd..ba6b9246d 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -343,6 +343,172 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementation of the TRIM() function. */ +static inline int +trim_bin_end(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + char c = str[end - 1]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + --end; + } + return end; +} + +static inline int +trim_bin_start(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + char c = str[start]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + ++start; + } + return start; +} + +static void +func_trim_bin(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_bin(&argv[2]))); + assert(mem_is_bin(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *octets; + int octets_size; + if (argc == 3) { + octets = argv[2].z; + octets_size = argv[2].n; + } else { + octets = "\0"; + octets_size = 1; + } + + int flags = argv[1].u.u; + int end = trim_bin_end(str, size, octets, octets_size, flags); + int start = trim_bin_start(str, end, octets, octets_size, flags); + + if (start >= end) + return mem_set_bin_static(ctx->pOut, "", 0); + if (mem_copy_bin(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + +static inline int +trim_str_end(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + end - len; + is_trimmed = len <= end && memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + end -= len; + } + return end; +} + +static inline int +trim_str_start(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + start; + is_trimmed = start + len <= end && + memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + start += len; + } + return start; +} + +static void +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *chars; + int chars_size; + if (argc == 3) { + chars = argv[2].z; + chars_size = argv[2].n; + } else { + chars = " "; + chars_size = 1; + } + + struct region *region = &fiber()->gc; + size_t svp = region_used(region); + uint8_t *chars_len = region_alloc(region, chars_size); + if (chars_len == NULL) { + ctx->is_aborted = true; + diag_set(OutOfMemory, chars_size, "region_alloc", "chars_len"); + return; + } + int chars_count = 0; + + int offset = 0; + while (offset < chars_size) { + UChar32 c; + int prev = offset; + U8_NEXT((uint8_t *)chars, offset, chars_size, c); + chars_len[chars_count++] = offset - prev; + } + + uint64_t flags = argv[1].u.u; + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); + int start = trim_str_start(str, end, chars, chars_len, chars_count, + flags); + region_truncate(region, svp); + + if (start >= end) + return mem_set_str0_static(ctx->pOut, ""); + if (mem_copy_str(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1527,193 +1693,6 @@ replaceFunc(struct sql_context *context, int argc, struct Mem *argv) mem_set_bin_dynamic(context->pOut, (char *)zOut, j); } -/** - * Remove characters included in @a trim_set from @a input_str - * until encounter a character that doesn't belong to @a trim_set. - * Remove from the side specified by @a flags. - * @param context SQL context. - * @param flags Trim specification: left, right or both. - * @param trim_set The set of characters for trimming. - * @param char_len Lengths of each UTF-8 character in @a trim_set. - * @param char_cnt A number of UTF-8 characters in @a trim_set. - * @param input_str Input string for trimming. - * @param input_str_sz Input string size in bytes. - */ -static void -trim_procedure(struct sql_context *context, enum trim_side_mask flags, - const unsigned char *trim_set, const uint8_t *char_len, - int char_cnt, const unsigned char *input_str, int input_str_sz) -{ - if (char_cnt == 0) - goto finish; - int i, len; - const unsigned char *z; - if ((flags & TRIM_LEADING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(input_str, z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str += len; - input_str_sz -= len; - } - } - if ((flags & TRIM_TRAILING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(&input_str[input_str_sz - len], - z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str_sz -= len; - } - } -finish: - if (context->func->def->returns == FIELD_TYPE_STRING) - mem_copy_str(context->pOut, (char *)input_str, input_str_sz); - else - mem_copy_bin(context->pOut, (char *)input_str, input_str_sz); -} - -/** - * Prepare arguments for trimming procedure. Allocate memory for - * @a char_len (array of lengths each character in @a trim_set) - * and fill it. - * - * @param context SQL context. - * @param trim_set The set of characters for trimming. - * @param[out] char_len Lengths of each character in @ trim_set. - * @retval >=0 A number of UTF-8 characters in @a trim_set. - * @retval -1 Memory allocation error. - */ -static int -trim_prepare_char_len(struct sql_context *context, - const unsigned char *trim_set, int trim_set_sz, - uint8_t **char_len) -{ - /* - * Count the number of UTF-8 characters passing through - * the entire char set, but not up to the '\0' or X'00' - * character. This allows to handle trimming set - * containing such characters. - */ - int char_cnt = sql_utf8_char_count(trim_set, trim_set_sz); - if (char_cnt == 0) { - *char_len = NULL; - return 0; - } - - if ((*char_len = (uint8_t *)contextMalloc(context, char_cnt)) == NULL) - return -1; - - int i = 0, j = 0; - while(j < char_cnt) { - int old_i = i; - SQL_UTF8_FWD_1(trim_set, i, trim_set_sz); - (*char_len)[j++] = i - old_i; - } - - return char_cnt; -} - -/** - * Normalize args from @a argv input array when it has two args. - * - * Case: TRIM(<str>) - * Call trimming procedure with TRIM_BOTH as the flags and " " as - * the trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>) - * If user has specified side keyword only, then call trimming - * procedure with the specified side and " " as the trimming set. - */ -static void -trim_func_two_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2) -{ - const unsigned char *trim_set; - if (mem_is_bin(arg1)) - trim_set = (const unsigned char *)"\0"; - else - trim_set = (const unsigned char *)" "; - const unsigned char *input_str; - if ((input_str = mem_as_ustr(arg1)) == NULL) - return; - - int input_str_sz = mem_len_unsafe(arg1); - assert(arg2->type == MEM_TYPE_UINT); - uint8_t len_one = 1; - trim_procedure(context, arg2->u.u, trim_set, - &len_one, 1, input_str, input_str_sz); -} - -/** - * Normalize args from @a argv input array when it has three args. - * - * Case: TRIM(<character_set> FROM <str>) - * If user has specified <character_set> only, call trimming procedure with - * TRIM_BOTH as the flags and that trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH <character_set> FROM <str>) - * If user has specified side keyword and <character_set>, then - * call trimming procedure with that args. - */ -static void -trim_func_three_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2, sql_value *arg3) -{ - assert(arg2->type == MEM_TYPE_UINT); - const unsigned char *input_str, *trim_set; - if ((input_str = mem_as_ustr(arg1)) == NULL || - (trim_set = mem_as_ustr(arg3)) == NULL) - return; - - int trim_set_sz = mem_len_unsafe(arg3); - int input_str_sz = mem_len_unsafe(arg1); - uint8_t *char_len; - int char_cnt = trim_prepare_char_len(context, trim_set, trim_set_sz, - &char_len); - if (char_cnt == -1) - return; - trim_procedure(context, arg2->u.u, trim_set, char_len, - char_cnt, input_str, input_str_sz); - sql_free(char_len); -} - -/** - * Normalize args from @a argv input array when it has one, - * two or three args. - * - * This is a dispatcher function that calls corresponding - * implementation depending on the number of arguments. -*/ -static void -trim_func(struct sql_context *context, int argc, struct Mem *argv) -{ - switch (argc) { - case 2: - trim_func_two_args(context, &argv[0], &argv[1]); - break; - case 3: - trim_func_three_args(context, &argv[0], &argv[1], &argv[2]); - break; - default: - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "TRIM", - "2 or 3", argc); - context->is_aborted = true; - } -} - /* * Compute the soundex encoding of a word. * @@ -2040,14 +2019,14 @@ static struct sql_func_definition definitions[] = { fin_total}, {"TRIM", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_STRING}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TRIM", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_VARBINARY}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TYPEOF", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, typeofFunc, NULL}, {"UNICODE", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, unicodeFunc, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index ce8354840..d1e17ca3e 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(19) +test:plan(20) --!./tcltestrunner.lua -- 2007 May 15 @@ -296,47 +296,62 @@ test:do_test( test:do_test( "badutf-4.4", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'808080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'808080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.4> - "X", "808080F0808080FF" + "X", "F0" -- </badutf-4.4> }) test:do_test( "badutf-4.5", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff8080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff8080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.5> - "X", "80F0808080FF" + "X", "F0" -- </badutf-4.5> }) test:do_test( "badutf-4.6", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.6> - "X", "F0808080FF" + "X", "F0" -- </badutf-4.6> }) test:do_test( "badutf-4.7", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff8080' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff8080' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.7> - "X", "FF80F0808080FF" + "X", "F0" -- </badutf-4.7> }) +-- gh-4145: Make sure that TRIM() properly work with VARBINARY. +test:do_execsql_test( + "badutf-5", + [[ + SELECT HEX(TRIM(x'ff1234' from x'1234125678123412')); + ]], + { + '5678' + } +) + --db2("close") ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (4 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-08 21:58 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function Mergen Imeev via Tarantool-patches ` (3 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches This patch is a refactoring of POSITION(). In addition, VARBINARY arguments can now be used in this function. Part of #4145 --- src/box/sql/func.c | 201 +++++++++++---------------------- test/sql-tap/position.test.lua | 80 +++++++------ 2 files changed, 110 insertions(+), 171 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 1d1a8b0cd..415a92738 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -530,6 +530,68 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementation of the POSITION() function. */ +static void +func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + /* Matching time O(n * m). */ + for (int i = 0; i <= str_size - key_size; ++i) { + if (memcmp(&str[i], key, key_size) == 0) + return mem_set_uint(ctx->pOut, i + 1); + } + return mem_set_uint(ctx->pOut, 0); +} + +static void +func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + int key_len = utf8_len_str(key, key_size); + + int start = 0; + int end = 0; + for (int i = 0; i < key_len && end <= str_size; ++i) + end += utf8_len_char(str[end]); + if (end > str_size) + return mem_set_uint(ctx->pOut, 0); + int i = 0; + while (end <= str_size) { + struct coll *coll = ctx->coll; + const char *s = &str[start]; + if (coll->cmp(key, key_size, s, end - start, coll) == 0) + return mem_set_uint(ctx->pOut, i + 1); + start += utf8_len_char(str[start]); + if (end == str_size) + break; + end += utf8_len_char(str[end]); + ++i; + } + return mem_set_uint(ctx->pOut, 0); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -693,141 +755,6 @@ lengthFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/** - * Implementation of the position() function. - * - * position(needle, haystack) finds the first occurrence of needle - * in haystack and returns the number of previous characters - * plus 1, or 0 if needle does not occur within haystack. - * - * If both haystack and needle are BLOBs, then the result is one - * more than the number of bytes in haystack prior to the first - * occurrence of needle, or 0 if needle never occurs in haystack. - */ -static void -position_func(struct sql_context *context, int argc, struct Mem *argv) -{ - UNUSED_PARAMETER(argc); - struct Mem *needle = &argv[0]; - struct Mem *haystack = &argv[1]; - enum mp_type needle_type = sql_value_type(needle); - enum mp_type haystack_type = sql_value_type(haystack); - - if (haystack_type == MP_NIL || needle_type == MP_NIL) - return; - /* - * Position function can be called only with string - * or blob params. - */ - struct Mem *inconsistent_type_arg = NULL; - if (needle_type != MP_STR && needle_type != MP_BIN) - inconsistent_type_arg = needle; - if (haystack_type != MP_STR && haystack_type != MP_BIN) - inconsistent_type_arg = haystack; - if (inconsistent_type_arg != NULL) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - "string or varbinary", mem_str(inconsistent_type_arg)); - context->is_aborted = true; - return; - } - /* - * Both params of Position function must be of the same - * type. - */ - if (haystack_type != needle_type) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - mem_type_to_str(needle), mem_str(haystack)); - context->is_aborted = true; - return; - } - - int n_needle_bytes = mem_len_unsafe(needle); - int n_haystack_bytes = mem_len_unsafe(haystack); - int position = 1; - if (n_needle_bytes > 0) { - const unsigned char *haystack_str; - const unsigned char *needle_str; - if (haystack_type == MP_BIN) { - needle_str = mem_as_bin(needle); - haystack_str = mem_as_bin(haystack); - assert(needle_str != NULL); - assert(haystack_str != NULL || n_haystack_bytes == 0); - /* - * Naive implementation of substring - * searching: matching time O(n * m). - * Can be improved. - */ - while (n_needle_bytes <= n_haystack_bytes && - memcmp(haystack_str, needle_str, n_needle_bytes) != 0) { - position++; - n_haystack_bytes--; - haystack_str++; - } - if (n_needle_bytes > n_haystack_bytes) - position = 0; - } else { - /* - * Code below handles not only simple - * cases like position('a', 'bca'), but - * also more complex ones: - * position('a', 'bcá' COLLATE "unicode_ci") - * To do so, we need to use comparison - * window, which has constant character - * size, but variable byte size. - * Character size is equal to - * needle char size. - */ - haystack_str = mem_as_ustr(haystack); - needle_str = mem_as_ustr(needle); - - int n_needle_chars = - sql_utf8_char_count(needle_str, n_needle_bytes); - int n_haystack_chars = - sql_utf8_char_count(haystack_str, - n_haystack_bytes); - - if (n_haystack_chars < n_needle_chars) { - position = 0; - goto finish; - } - /* - * Comparison window is determined by - * beg_offset and end_offset. beg_offset - * is offset in bytes from haystack - * beginning to window beginning. - * end_offset is offset in bytes from - * haystack beginning to window end. - */ - int end_offset = 0; - for (int c = 0; c < n_needle_chars; c++) { - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - int beg_offset = 0; - struct coll *coll = context->coll; - int c; - for (c = 0; c + n_needle_chars <= n_haystack_chars; c++) { - if (coll->cmp((const char *) haystack_str + beg_offset, - end_offset - beg_offset, - (const char *) needle_str, - n_needle_bytes, coll) == 0) - goto finish; - position++; - /* Update offsets. */ - SQL_UTF8_FWD_1(haystack_str, beg_offset, - n_haystack_bytes); - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - /* Needle was not found in the haystack. */ - position = 0; - } - } -finish: - assert(position >= 0); - sql_result_uint(context, position); -} - /* * Implementation of the printf() function. */ @@ -2004,7 +1931,9 @@ static struct sql_func_definition definitions[] = { {"NULLIF", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, func_nullif, NULL}, {"POSITION", 2, {FIELD_TYPE_STRING, FIELD_TYPE_STRING}, - FIELD_TYPE_INTEGER, position_func, NULL}, + FIELD_TYPE_INTEGER, func_position_characters, NULL}, + {"POSITION", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_VARBINARY}, + FIELD_TYPE_INTEGER, func_position_octets, NULL}, {"PRINTF", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, printfFunc, NULL}, {"QUOTE", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, quoteFunc, NULL}, diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua index 6a96ed9bc..5f62c7f54 100755 --- a/test/sql-tap/position.test.lua +++ b/test/sql-tap/position.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(80) +test:plan(81) test:do_test( "position-1.1", @@ -305,130 +305,130 @@ test:do_test( test:do_test( "position-1.31", function() - return test:catchsql "SELECT position(x'01', x'0102030405');" + return test:execsql "SELECT position(x'01', x'0102030405');" end, { -- <position-1.31> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.31> }) test:do_test( "position-1.32", function() - return test:catchsql "SELECT position(x'02', x'0102030405');" + return test:execsql "SELECT position(x'02', x'0102030405');" end, { -- <position-1.32> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.32> }) test:do_test( "position-1.33", function() - return test:catchsql "SELECT position(x'03', x'0102030405');" + return test:execsql "SELECT position(x'03', x'0102030405');" end, { -- <position-1.33> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.33> }) test:do_test( "position-1.34", function() - return test:catchsql "SELECT position(x'04', x'0102030405');" + return test:execsql "SELECT position(x'04', x'0102030405');" end, { -- <position-1.34> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.34> }) test:do_test( "position-1.35", function() - return test:catchsql "SELECT position(x'05', x'0102030405');" + return test:execsql "SELECT position(x'05', x'0102030405');" end, { -- <position-1.35> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 5 -- </position-1.35> }) test:do_test( "position-1.36", function() - return test:catchsql "SELECT position(x'06', x'0102030405');" + return test:execsql "SELECT position(x'06', x'0102030405');" end, { -- <position-1.36> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.36> }) test:do_test( "position-1.37", function() - return test:catchsql "SELECT position(x'0102030405', x'0102030405');" + return test:execsql "SELECT position(x'0102030405', x'0102030405');" end, { -- <position-1.37> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.37> }) test:do_test( "position-1.38", function() - return test:catchsql "SELECT position(x'02030405', x'0102030405');" + return test:execsql "SELECT position(x'02030405', x'0102030405');" end, { -- <position-1.38> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.38> }) test:do_test( "position-1.39", function() - return test:catchsql "SELECT position(x'030405', x'0102030405');" + return test:execsql "SELECT position(x'030405', x'0102030405');" end, { -- <position-1.39> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.39> }) test:do_test( "position-1.40", function() - return test:catchsql "SELECT position(x'0405', x'0102030405');" + return test:execsql "SELECT position(x'0405', x'0102030405');" end, { -- <position-1.40> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.40> }) test:do_test( "position-1.41", function() - return test:catchsql "SELECT position(x'0506', x'0102030405');" + return test:execsql "SELECT position(x'0506', x'0102030405');" end, { -- <position-1.41> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.41> }) test:do_test( "position-1.42", function() - return test:catchsql "SELECT position(x'', x'0102030405');" + return test:execsql "SELECT position(x'', x'0102030405');" end, { -- <position-1.42> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.42> }) test:do_test( "position-1.43", function() - return test:catchsql "SELECT position(x'', x'');" + return test:execsql "SELECT position(x'', x'');" end, { -- <position-1.43> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.43> }) @@ -571,40 +571,40 @@ test:do_test( test:do_test( "position-1.56.1", function() - return test:catchsql "SELECT position(x'79', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'79', x'78c3a4e282ac79');" end, { -- <position-1.56.1> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 7 -- </position-1.56.1> }) test:do_test( "position-1.56.2", function() - return test:catchsql "SELECT position(x'7a', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'7a', x'78c3a4e282ac79');" end, { -- <position-1.56.2> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.56.2> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'78', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'78', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.56.3> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'a4', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'a4', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.56.3> }) @@ -858,4 +858,14 @@ test:do_catchsql_test( } ) +-- gh-4145: Make sure that POSITION() can wirk with VARBINARY. +test:do_execsql_test( + "position-2", + [[ + SELECT POSITION(x'313233', x'30313231323334353132333435'); + ]], { + 4 + } +) + test:finish_test() -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function Mergen Imeev via Tarantool-patches @ 2021-10-08 21:58 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:08 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-08 21:58 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Thanks for the patch! See 3 comments below. > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 1d1a8b0cd..415a92738 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -530,6 +530,68 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) > ctx->is_aborted = true; > } > > +/** Implementation of the POSITION() function. */ > +static void > +func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) > +{ > + assert(argc == 2); > + (void)argc; > + if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) 1. There is mem_is_any_null(). The same in the next function. > + return; > + assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); > + > + const char *key = argv[0].z; > + const char *str = argv[1].z; > + int key_size = argv[0].n; > + int str_size = argv[1].n; > + if (key_size <= 0) > + return mem_set_uint(ctx->pOut, 1); > + /* Matching time O(n * m). */ > + for (int i = 0; i <= str_size - key_size; ++i) { > + if (memcmp(&str[i], key, key_size) == 0) > + return mem_set_uint(ctx->pOut, i + 1); > + } 2. There is memmem(). > + return mem_set_uint(ctx->pOut, 0); > +} > diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua > index 6a96ed9bc..5f62c7f54 100755 > --- a/test/sql-tap/position.test.lua > +++ b/test/sql-tap/position.test.lua > @@ -858,4 +858,14 @@ test:do_catchsql_test( > } > ) > > +-- gh-4145: Make sure that POSITION() can wirk with VARBINARY. 3. wirk -> work. > +test:do_execsql_test( > + "position-2", > + [[ > + SELECT POSITION(x'313233', x'30313231323334353132333435'); > + ]], { > + 4 > + } > +) > + > test:finish_test() > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function 2021-10-08 21:58 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-10-20 17:08 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:41 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 17:08 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answers, diff and new patch below. Also, I replaced self-created functions by ucnv_getNextUChar(). On Fri, Oct 08, 2021 at 11:58:32PM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > See 3 comments below. > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index 1d1a8b0cd..415a92738 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -530,6 +530,68 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) > > ctx->is_aborted = true; > > } > > > > +/** Implementation of the POSITION() function. */ > > +static void > > +func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) > > +{ > > + assert(argc == 2); > > + (void)argc; > > + if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) > > 1. There is mem_is_any_null(). The same in the next function. > Thanks, fixed. > > + return; > > + assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); > > + > > + const char *key = argv[0].z; > > + const char *str = argv[1].z; > > + int key_size = argv[0].n; > > + int str_size = argv[1].n; > > + if (key_size <= 0) > > + return mem_set_uint(ctx->pOut, 1); > > + /* Matching time O(n * m). */ > > + for (int i = 0; i <= str_size - key_size; ++i) { > > + if (memcmp(&str[i], key, key_size) == 0) > > + return mem_set_uint(ctx->pOut, i + 1); > > + } > > 2. There is memmem(). > Thanks, fixed. > > + return mem_set_uint(ctx->pOut, 0); > > +} > > diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua > > index 6a96ed9bc..5f62c7f54 100755 > > --- a/test/sql-tap/position.test.lua > > +++ b/test/sql-tap/position.test.lua > > @@ -858,4 +858,14 @@ test:do_catchsql_test( > > } > > ) > > > > +-- gh-4145: Make sure that POSITION() can wirk with VARBINARY. > > 3. wirk -> work. > Fixed. > > +test:do_execsql_test( > > + "position-2", > > + [[ > > + SELECT POSITION(x'313233', x'30313231323334353132333435'); > > + ]], { > > + 4 > > + } > > +) > > + > > test:finish_test() > > Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index d145e9cc0..80b075dcf 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -522,7 +522,7 @@ func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) { assert(argc == 2); (void)argc; - if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) + if (mem_is_any_null(&argv[0], &argv[1])) return; assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); @@ -532,12 +532,8 @@ func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) int str_size = argv[1].n; if (key_size <= 0) return mem_set_uint(ctx->pOut, 1); - /* Matching time O(n * m). */ - for (int i = 0; i <= str_size - key_size; ++i) { - if (memcmp(&str[i], key, key_size) == 0) - return mem_set_uint(ctx->pOut, i + 1); - } - return mem_set_uint(ctx->pOut, 0); + const char *pos = memmem(str, str_size, key, key_size); + return mem_set_uint(ctx->pOut, pos == NULL ? 0 : pos - str + 1); } static void @@ -545,7 +541,7 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) { assert(argc == 2); (void)argc; - if (mem_is_null(&argv[0]) || mem_is_null(&argv[1])) + if (mem_is_any_null(&argv[0], &argv[1])) return; assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); @@ -555,26 +551,29 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) int str_size = argv[1].n; if (key_size <= 0) return mem_set_uint(ctx->pOut, 1); - int key_len = utf8_len_str(key, key_size); - int start = 0; - int end = 0; - for (int i = 0; i < key_len && end <= str_size; ++i) - end += utf8_len_char(str[end]); - if (end > str_size) - return mem_set_uint(ctx->pOut, 0); + UErrorCode err = U_ZERO_ERROR; + const char *pos = str; + const char *cur = str; + const char *end = str + str_size; + const char *tmp_pos = key; + const char *tmp_end = key + key_size; + assert(icu_utf8_conv != NULL); + while (tmp_pos < tmp_end && err == U_ZERO_ERROR) { + ucnv_getNextUChar(icu_utf8_conv, &tmp_pos, tmp_end, &err); + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + } + int i = 0; - while (end <= str_size) { + while (err == U_ZERO_ERROR) { struct coll *coll = ctx->coll; - const char *s = &str[start]; - if (coll->cmp(key, key_size, s, end - start, coll) == 0) + if (coll->cmp(key, key_size, pos, cur - pos, coll) == 0) return mem_set_uint(ctx->pOut, i + 1); - start += utf8_len_char(str[start]); - if (end == str_size) - break; - end += utf8_len_char(str[end]); + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); ++i; } + assert(err == U_INDEX_OUTOFBOUNDS_ERROR && cur == end); return mem_set_uint(ctx->pOut, 0); } diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua index 5f62c7f54..e49f4665a 100755 --- a/test/sql-tap/position.test.lua +++ b/test/sql-tap/position.test.lua @@ -858,7 +858,7 @@ test:do_catchsql_test( } ) --- gh-4145: Make sure that POSITION() can wirk with VARBINARY. +-- gh-4145: Make sure POSITION() can work with VARBINARY. test:do_execsql_test( "position-2", [[ New patch: commit cdc02ef02866bdc603f8389e09d3ac0078c1e782 Author: Mergen Imeev <imeevma@gmail.com> Date: Wed Sep 22 14:36:40 2021 +0300 sql: rework POSITION() function This patch is a refactoring of POSITION(). In addition, VARBINARY arguments can now be used in this function. In addition, POSITION() now uses ICU functions instead of self-created. Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 1294ff5b3..80b075dcf 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -516,6 +516,67 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementation of the POSITION() function. */ +static void +func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + const char *pos = memmem(str, str_size, key, key_size); + return mem_set_uint(ctx->pOut, pos == NULL ? 0 : pos - str + 1); +} + +static void +func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + + UErrorCode err = U_ZERO_ERROR; + const char *pos = str; + const char *cur = str; + const char *end = str + str_size; + const char *tmp_pos = key; + const char *tmp_end = key + key_size; + assert(icu_utf8_conv != NULL); + while (tmp_pos < tmp_end && err == U_ZERO_ERROR) { + ucnv_getNextUChar(icu_utf8_conv, &tmp_pos, tmp_end, &err); + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + } + + int i = 0; + while (err == U_ZERO_ERROR) { + struct coll *coll = ctx->coll; + if (coll->cmp(key, key_size, pos, cur - pos, coll) == 0) + return mem_set_uint(ctx->pOut, i + 1); + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + ++i; + } + assert(err == U_INDEX_OUTOFBOUNDS_ERROR && cur == end); + return mem_set_uint(ctx->pOut, 0); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -679,141 +740,6 @@ lengthFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/** - * Implementation of the position() function. - * - * position(needle, haystack) finds the first occurrence of needle - * in haystack and returns the number of previous characters - * plus 1, or 0 if needle does not occur within haystack. - * - * If both haystack and needle are BLOBs, then the result is one - * more than the number of bytes in haystack prior to the first - * occurrence of needle, or 0 if needle never occurs in haystack. - */ -static void -position_func(struct sql_context *context, int argc, struct Mem *argv) -{ - UNUSED_PARAMETER(argc); - struct Mem *needle = &argv[0]; - struct Mem *haystack = &argv[1]; - enum mp_type needle_type = sql_value_type(needle); - enum mp_type haystack_type = sql_value_type(haystack); - - if (haystack_type == MP_NIL || needle_type == MP_NIL) - return; - /* - * Position function can be called only with string - * or blob params. - */ - struct Mem *inconsistent_type_arg = NULL; - if (needle_type != MP_STR && needle_type != MP_BIN) - inconsistent_type_arg = needle; - if (haystack_type != MP_STR && haystack_type != MP_BIN) - inconsistent_type_arg = haystack; - if (inconsistent_type_arg != NULL) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - "string or varbinary", mem_str(inconsistent_type_arg)); - context->is_aborted = true; - return; - } - /* - * Both params of Position function must be of the same - * type. - */ - if (haystack_type != needle_type) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - mem_type_to_str(needle), mem_str(haystack)); - context->is_aborted = true; - return; - } - - int n_needle_bytes = mem_len_unsafe(needle); - int n_haystack_bytes = mem_len_unsafe(haystack); - int position = 1; - if (n_needle_bytes > 0) { - const unsigned char *haystack_str; - const unsigned char *needle_str; - if (haystack_type == MP_BIN) { - needle_str = mem_as_bin(needle); - haystack_str = mem_as_bin(haystack); - assert(needle_str != NULL); - assert(haystack_str != NULL || n_haystack_bytes == 0); - /* - * Naive implementation of substring - * searching: matching time O(n * m). - * Can be improved. - */ - while (n_needle_bytes <= n_haystack_bytes && - memcmp(haystack_str, needle_str, n_needle_bytes) != 0) { - position++; - n_haystack_bytes--; - haystack_str++; - } - if (n_needle_bytes > n_haystack_bytes) - position = 0; - } else { - /* - * Code below handles not only simple - * cases like position('a', 'bca'), but - * also more complex ones: - * position('a', 'bcá' COLLATE "unicode_ci") - * To do so, we need to use comparison - * window, which has constant character - * size, but variable byte size. - * Character size is equal to - * needle char size. - */ - haystack_str = mem_as_ustr(haystack); - needle_str = mem_as_ustr(needle); - - int n_needle_chars = - sql_utf8_char_count(needle_str, n_needle_bytes); - int n_haystack_chars = - sql_utf8_char_count(haystack_str, - n_haystack_bytes); - - if (n_haystack_chars < n_needle_chars) { - position = 0; - goto finish; - } - /* - * Comparison window is determined by - * beg_offset and end_offset. beg_offset - * is offset in bytes from haystack - * beginning to window beginning. - * end_offset is offset in bytes from - * haystack beginning to window end. - */ - int end_offset = 0; - for (int c = 0; c < n_needle_chars; c++) { - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - int beg_offset = 0; - struct coll *coll = context->coll; - int c; - for (c = 0; c + n_needle_chars <= n_haystack_chars; c++) { - if (coll->cmp((const char *) haystack_str + beg_offset, - end_offset - beg_offset, - (const char *) needle_str, - n_needle_bytes, coll) == 0) - goto finish; - position++; - /* Update offsets. */ - SQL_UTF8_FWD_1(haystack_str, beg_offset, - n_haystack_bytes); - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - /* Needle was not found in the haystack. */ - position = 0; - } - } -finish: - assert(position >= 0); - sql_result_uint(context, position); -} - /* * Implementation of the printf() function. */ @@ -1989,7 +1915,9 @@ static struct sql_func_definition definitions[] = { {"NULLIF", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, func_nullif, NULL}, {"POSITION", 2, {FIELD_TYPE_STRING, FIELD_TYPE_STRING}, - FIELD_TYPE_INTEGER, position_func, NULL}, + FIELD_TYPE_INTEGER, func_position_characters, NULL}, + {"POSITION", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_VARBINARY}, + FIELD_TYPE_INTEGER, func_position_octets, NULL}, {"PRINTF", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, printfFunc, NULL}, {"QUOTE", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, quoteFunc, NULL}, diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua index 6a96ed9bc..e49f4665a 100755 --- a/test/sql-tap/position.test.lua +++ b/test/sql-tap/position.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(80) +test:plan(81) test:do_test( "position-1.1", @@ -305,130 +305,130 @@ test:do_test( test:do_test( "position-1.31", function() - return test:catchsql "SELECT position(x'01', x'0102030405');" + return test:execsql "SELECT position(x'01', x'0102030405');" end, { -- <position-1.31> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.31> }) test:do_test( "position-1.32", function() - return test:catchsql "SELECT position(x'02', x'0102030405');" + return test:execsql "SELECT position(x'02', x'0102030405');" end, { -- <position-1.32> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.32> }) test:do_test( "position-1.33", function() - return test:catchsql "SELECT position(x'03', x'0102030405');" + return test:execsql "SELECT position(x'03', x'0102030405');" end, { -- <position-1.33> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.33> }) test:do_test( "position-1.34", function() - return test:catchsql "SELECT position(x'04', x'0102030405');" + return test:execsql "SELECT position(x'04', x'0102030405');" end, { -- <position-1.34> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.34> }) test:do_test( "position-1.35", function() - return test:catchsql "SELECT position(x'05', x'0102030405');" + return test:execsql "SELECT position(x'05', x'0102030405');" end, { -- <position-1.35> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 5 -- </position-1.35> }) test:do_test( "position-1.36", function() - return test:catchsql "SELECT position(x'06', x'0102030405');" + return test:execsql "SELECT position(x'06', x'0102030405');" end, { -- <position-1.36> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.36> }) test:do_test( "position-1.37", function() - return test:catchsql "SELECT position(x'0102030405', x'0102030405');" + return test:execsql "SELECT position(x'0102030405', x'0102030405');" end, { -- <position-1.37> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.37> }) test:do_test( "position-1.38", function() - return test:catchsql "SELECT position(x'02030405', x'0102030405');" + return test:execsql "SELECT position(x'02030405', x'0102030405');" end, { -- <position-1.38> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.38> }) test:do_test( "position-1.39", function() - return test:catchsql "SELECT position(x'030405', x'0102030405');" + return test:execsql "SELECT position(x'030405', x'0102030405');" end, { -- <position-1.39> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.39> }) test:do_test( "position-1.40", function() - return test:catchsql "SELECT position(x'0405', x'0102030405');" + return test:execsql "SELECT position(x'0405', x'0102030405');" end, { -- <position-1.40> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.40> }) test:do_test( "position-1.41", function() - return test:catchsql "SELECT position(x'0506', x'0102030405');" + return test:execsql "SELECT position(x'0506', x'0102030405');" end, { -- <position-1.41> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.41> }) test:do_test( "position-1.42", function() - return test:catchsql "SELECT position(x'', x'0102030405');" + return test:execsql "SELECT position(x'', x'0102030405');" end, { -- <position-1.42> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.42> }) test:do_test( "position-1.43", function() - return test:catchsql "SELECT position(x'', x'');" + return test:execsql "SELECT position(x'', x'');" end, { -- <position-1.43> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.43> }) @@ -571,40 +571,40 @@ test:do_test( test:do_test( "position-1.56.1", function() - return test:catchsql "SELECT position(x'79', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'79', x'78c3a4e282ac79');" end, { -- <position-1.56.1> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 7 -- </position-1.56.1> }) test:do_test( "position-1.56.2", function() - return test:catchsql "SELECT position(x'7a', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'7a', x'78c3a4e282ac79');" end, { -- <position-1.56.2> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.56.2> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'78', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'78', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.56.3> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'a4', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'a4', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.56.3> }) @@ -858,4 +858,14 @@ test:do_catchsql_test( } ) +-- gh-4145: Make sure POSITION() can work with VARBINARY. +test:do_execsql_test( + "position-2", + [[ + SELECT POSITION(x'313233', x'30313231323334353132333435'); + ]], { + 4 + } +) + test:finish_test() ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function 2021-10-20 17:08 ` Mergen Imeev via Tarantool-patches @ 2021-11-01 10:41 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:41 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thank you for the review! I replaced ucnv_getNextUChar() by U8_NEXT(). Diff and new patch below. On Wed, Oct 20, 2021 at 08:08:24PM +0300, Mergen Imeev via Tarantool-patches wrote: <cut> Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 600366b36..7914d2ec7 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -545,28 +545,31 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) if (key_size <= 0) return mem_set_uint(ctx->pOut, 1); - UErrorCode err = U_ZERO_ERROR; - const char *pos = str; - const char *cur = str; - const char *end = str + str_size; - const char *tmp_pos = key; - const char *tmp_end = key + key_size; - assert(icu_utf8_conv != NULL); - while (tmp_pos < tmp_end && err == U_ZERO_ERROR) { - ucnv_getNextUChar(icu_utf8_conv, &tmp_pos, tmp_end, &err); - ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); - } - - int i = 0; - while (err == U_ZERO_ERROR) { - struct coll *coll = ctx->coll; - if (coll->cmp(key, key_size, pos, cur - pos, coll) == 0) - return mem_set_uint(ctx->pOut, i + 1); - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); - ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + int key_end = 0; + int str_end = 0; + while (key_end < key_size && str_end < str_size) { + UChar32 c; + U8_NEXT((uint8_t *)key, key_end, key_size, c); + U8_NEXT((uint8_t *)str, str_end, str_size, c); + } + if (key_end < key_size) + return mem_set_uint(ctx->pOut, 0); + + struct coll *coll = ctx->coll; + if (coll->cmp(key, key_size, str, str_end, coll) == 0) + return mem_set_uint(ctx->pOut, 1); + + int i = 2; + int str_pos = 0; + while (str_end < str_size) { + UChar32 c; + U8_NEXT((uint8_t *)str, str_pos, str_size, c); + U8_NEXT((uint8_t *)str, str_end, str_size, c); + const char *s = str + str_pos; + if (coll->cmp(key, key_size, s, str_end - str_pos, coll) == 0) + return mem_set_uint(ctx->pOut, i); ++i; } - assert(err == U_INDEX_OUTOFBOUNDS_ERROR && cur == end); return mem_set_uint(ctx->pOut, 0); } New patch: commit cb4bcb4bc7da6a80c040cf0864b7bf538af02aaa Author: Mergen Imeev <imeevma@gmail.com> Date: Wed Sep 22 14:36:40 2021 +0300 sql: rework POSITION() function This patch is a refactoring of POSITION(). In addition, VARBINARY arguments can now be used in this function. In addition, POSITION() now uses ICU functions instead of self-created. Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index ba6b9246d..7914d2ec7 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -509,6 +509,70 @@ func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementation of the POSITION() function. */ +static void +func_position_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_bytes(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + const char *pos = memmem(str, str_size, key, key_size); + return mem_set_uint(ctx->pOut, pos == NULL ? 0 : pos - str + 1); +} + +static void +func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + + const char *key = argv[0].z; + const char *str = argv[1].z; + int key_size = argv[0].n; + int str_size = argv[1].n; + if (key_size <= 0) + return mem_set_uint(ctx->pOut, 1); + + int key_end = 0; + int str_end = 0; + while (key_end < key_size && str_end < str_size) { + UChar32 c; + U8_NEXT((uint8_t *)key, key_end, key_size, c); + U8_NEXT((uint8_t *)str, str_end, str_size, c); + } + if (key_end < key_size) + return mem_set_uint(ctx->pOut, 0); + + struct coll *coll = ctx->coll; + if (coll->cmp(key, key_size, str, str_end, coll) == 0) + return mem_set_uint(ctx->pOut, 1); + + int i = 2; + int str_pos = 0; + while (str_end < str_size) { + UChar32 c; + U8_NEXT((uint8_t *)str, str_pos, str_size, c); + U8_NEXT((uint8_t *)str, str_end, str_size, c); + const char *s = str + str_pos; + if (coll->cmp(key, key_size, s, str_end - str_pos, coll) == 0) + return mem_set_uint(ctx->pOut, i); + ++i; + } + return mem_set_uint(ctx->pOut, 0); +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -672,141 +736,6 @@ lengthFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/** - * Implementation of the position() function. - * - * position(needle, haystack) finds the first occurrence of needle - * in haystack and returns the number of previous characters - * plus 1, or 0 if needle does not occur within haystack. - * - * If both haystack and needle are BLOBs, then the result is one - * more than the number of bytes in haystack prior to the first - * occurrence of needle, or 0 if needle never occurs in haystack. - */ -static void -position_func(struct sql_context *context, int argc, struct Mem *argv) -{ - UNUSED_PARAMETER(argc); - struct Mem *needle = &argv[0]; - struct Mem *haystack = &argv[1]; - enum mp_type needle_type = sql_value_type(needle); - enum mp_type haystack_type = sql_value_type(haystack); - - if (haystack_type == MP_NIL || needle_type == MP_NIL) - return; - /* - * Position function can be called only with string - * or blob params. - */ - struct Mem *inconsistent_type_arg = NULL; - if (needle_type != MP_STR && needle_type != MP_BIN) - inconsistent_type_arg = needle; - if (haystack_type != MP_STR && haystack_type != MP_BIN) - inconsistent_type_arg = haystack; - if (inconsistent_type_arg != NULL) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - "string or varbinary", mem_str(inconsistent_type_arg)); - context->is_aborted = true; - return; - } - /* - * Both params of Position function must be of the same - * type. - */ - if (haystack_type != needle_type) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - mem_type_to_str(needle), mem_str(haystack)); - context->is_aborted = true; - return; - } - - int n_needle_bytes = mem_len_unsafe(needle); - int n_haystack_bytes = mem_len_unsafe(haystack); - int position = 1; - if (n_needle_bytes > 0) { - const unsigned char *haystack_str; - const unsigned char *needle_str; - if (haystack_type == MP_BIN) { - needle_str = mem_as_bin(needle); - haystack_str = mem_as_bin(haystack); - assert(needle_str != NULL); - assert(haystack_str != NULL || n_haystack_bytes == 0); - /* - * Naive implementation of substring - * searching: matching time O(n * m). - * Can be improved. - */ - while (n_needle_bytes <= n_haystack_bytes && - memcmp(haystack_str, needle_str, n_needle_bytes) != 0) { - position++; - n_haystack_bytes--; - haystack_str++; - } - if (n_needle_bytes > n_haystack_bytes) - position = 0; - } else { - /* - * Code below handles not only simple - * cases like position('a', 'bca'), but - * also more complex ones: - * position('a', 'bcá' COLLATE "unicode_ci") - * To do so, we need to use comparison - * window, which has constant character - * size, but variable byte size. - * Character size is equal to - * needle char size. - */ - haystack_str = mem_as_ustr(haystack); - needle_str = mem_as_ustr(needle); - - int n_needle_chars = - sql_utf8_char_count(needle_str, n_needle_bytes); - int n_haystack_chars = - sql_utf8_char_count(haystack_str, - n_haystack_bytes); - - if (n_haystack_chars < n_needle_chars) { - position = 0; - goto finish; - } - /* - * Comparison window is determined by - * beg_offset and end_offset. beg_offset - * is offset in bytes from haystack - * beginning to window beginning. - * end_offset is offset in bytes from - * haystack beginning to window end. - */ - int end_offset = 0; - for (int c = 0; c < n_needle_chars; c++) { - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - int beg_offset = 0; - struct coll *coll = context->coll; - int c; - for (c = 0; c + n_needle_chars <= n_haystack_chars; c++) { - if (coll->cmp((const char *) haystack_str + beg_offset, - end_offset - beg_offset, - (const char *) needle_str, - n_needle_bytes, coll) == 0) - goto finish; - position++; - /* Update offsets. */ - SQL_UTF8_FWD_1(haystack_str, beg_offset, - n_haystack_bytes); - SQL_UTF8_FWD_1(haystack_str, end_offset, - n_haystack_bytes); - } - /* Needle was not found in the haystack. */ - position = 0; - } - } -finish: - assert(position >= 0); - sql_result_uint(context, position); -} - /* * Implementation of the printf() function. */ @@ -1982,7 +1911,9 @@ static struct sql_func_definition definitions[] = { {"NULLIF", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY}, FIELD_TYPE_SCALAR, func_nullif, NULL}, {"POSITION", 2, {FIELD_TYPE_STRING, FIELD_TYPE_STRING}, - FIELD_TYPE_INTEGER, position_func, NULL}, + FIELD_TYPE_INTEGER, func_position_characters, NULL}, + {"POSITION", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_VARBINARY}, + FIELD_TYPE_INTEGER, func_position_octets, NULL}, {"PRINTF", -1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, printfFunc, NULL}, {"QUOTE", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, quoteFunc, NULL}, diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua index 6a96ed9bc..e49f4665a 100755 --- a/test/sql-tap/position.test.lua +++ b/test/sql-tap/position.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(80) +test:plan(81) test:do_test( "position-1.1", @@ -305,130 +305,130 @@ test:do_test( test:do_test( "position-1.31", function() - return test:catchsql "SELECT position(x'01', x'0102030405');" + return test:execsql "SELECT position(x'01', x'0102030405');" end, { -- <position-1.31> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.31> }) test:do_test( "position-1.32", function() - return test:catchsql "SELECT position(x'02', x'0102030405');" + return test:execsql "SELECT position(x'02', x'0102030405');" end, { -- <position-1.32> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.32> }) test:do_test( "position-1.33", function() - return test:catchsql "SELECT position(x'03', x'0102030405');" + return test:execsql "SELECT position(x'03', x'0102030405');" end, { -- <position-1.33> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.33> }) test:do_test( "position-1.34", function() - return test:catchsql "SELECT position(x'04', x'0102030405');" + return test:execsql "SELECT position(x'04', x'0102030405');" end, { -- <position-1.34> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.34> }) test:do_test( "position-1.35", function() - return test:catchsql "SELECT position(x'05', x'0102030405');" + return test:execsql "SELECT position(x'05', x'0102030405');" end, { -- <position-1.35> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 5 -- </position-1.35> }) test:do_test( "position-1.36", function() - return test:catchsql "SELECT position(x'06', x'0102030405');" + return test:execsql "SELECT position(x'06', x'0102030405');" end, { -- <position-1.36> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.36> }) test:do_test( "position-1.37", function() - return test:catchsql "SELECT position(x'0102030405', x'0102030405');" + return test:execsql "SELECT position(x'0102030405', x'0102030405');" end, { -- <position-1.37> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.37> }) test:do_test( "position-1.38", function() - return test:catchsql "SELECT position(x'02030405', x'0102030405');" + return test:execsql "SELECT position(x'02030405', x'0102030405');" end, { -- <position-1.38> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 2 -- </position-1.38> }) test:do_test( "position-1.39", function() - return test:catchsql "SELECT position(x'030405', x'0102030405');" + return test:execsql "SELECT position(x'030405', x'0102030405');" end, { -- <position-1.39> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.39> }) test:do_test( "position-1.40", function() - return test:catchsql "SELECT position(x'0405', x'0102030405');" + return test:execsql "SELECT position(x'0405', x'0102030405');" end, { -- <position-1.40> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 4 -- </position-1.40> }) test:do_test( "position-1.41", function() - return test:catchsql "SELECT position(x'0506', x'0102030405');" + return test:execsql "SELECT position(x'0506', x'0102030405');" end, { -- <position-1.41> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.41> }) test:do_test( "position-1.42", function() - return test:catchsql "SELECT position(x'', x'0102030405');" + return test:execsql "SELECT position(x'', x'0102030405');" end, { -- <position-1.42> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.42> }) test:do_test( "position-1.43", function() - return test:catchsql "SELECT position(x'', x'');" + return test:execsql "SELECT position(x'', x'');" end, { -- <position-1.43> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.43> }) @@ -571,40 +571,40 @@ test:do_test( test:do_test( "position-1.56.1", function() - return test:catchsql "SELECT position(x'79', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'79', x'78c3a4e282ac79');" end, { -- <position-1.56.1> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 7 -- </position-1.56.1> }) test:do_test( "position-1.56.2", function() - return test:catchsql "SELECT position(x'7a', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'7a', x'78c3a4e282ac79');" end, { -- <position-1.56.2> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 0 -- </position-1.56.2> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'78', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'78', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 1 -- </position-1.56.3> }) test:do_test( "position-1.56.3", function() - return test:catchsql "SELECT position(x'a4', x'78c3a4e282ac79');" + return test:execsql "SELECT position(x'a4', x'78c3a4e282ac79');" end, { -- <position-1.56.3> - 1, "Failed to execute SQL statement: wrong arguments for function POSITION()" + 3 -- </position-1.56.3> }) @@ -858,4 +858,14 @@ test:do_catchsql_test( } ) +-- gh-4145: Make sure POSITION() can work with VARBINARY. +test:do_execsql_test( + "position-2", + [[ + SELECT POSITION(x'313233', x'30313231323334353132333435'); + ]], { + 4 + } +) + test:finish_test() ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (5 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function Mergen Imeev via Tarantool-patches ` (2 subsequent siblings) 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() function. Also, SUBSTR() can now work correctly with large INTEGER values. The SUBSTR() syntax has not changed. Part of #4145 @TarantoolBot document Title: SUBSTR() function SUBSTR() now works according to the ANSI rules for SUBSTRING(), but with no syntax changes. What changed: 1) if the third argument is specified and it is less than 0, an error is thrown; 2) if the sum of the second and third arguments is less than 1, an empty string is returned; 3) if the second argument is less than 0, and the sum of the second and third arguments is greater than 1, then the result is equal to the result of SUBSTR() with the second argument equal to 1 and the third argument equal to the sum of the original second and third arguments minus 1. --- src/box/sql/func.c | 264 ++++++++++++++++++++--------------- test/sql-tap/func.test.lua | 16 +-- test/sql-tap/func2.test.lua | 150 ++++++++++---------- test/sql-tap/substr.test.lua | 121 ++++++++++++---- 4 files changed, 330 insertions(+), 221 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 415a92738..1b4d52225 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -592,6 +592,152 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) return mem_set_uint(ctx->pOut, 0); } +/** Implementation of the SUBSTR() function. */ +int +substr_normalize(int64_t base_start, bool is_start_neg, uint64_t base_length, + uint64_t *start, uint64_t *length) +{ + if (!is_start_neg && base_start > 0) { + *start = (uint64_t)base_start - 1; + *length = base_length; + return 0; + } + *start = 0; + if (base_length == 0) { + *length = 0; + return 0; + } + /* + * We are subtracting 1 from base_length instead of subtracting from + * base_start, since base_start can be INT64_MIN. At the same time, + * base_length is not less than 1. + */ + int64_t a = base_start; + int64_t b = (int64_t)(base_length - 1); + int64_t res; + bool is_neg; + /* + * Integer cannot overflow since non-positive value is added to positive + * value. + */ + if (sql_add_int(a, a != 0, b, false, &res, &is_neg) != 0) { + diag_set(ClientError, ER_SQL_EXECUTE, "integer is overflowed"); + return -1; + } + *length = is_neg ? 0 : (uint64_t)res; + return 0; +} + +static void +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); + + bool is_str = mem_is_str(&argv[0]); + uint64_t size = argv[0].n; + + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + if (start >= size) + return return_empty_str(ctx, is_str); + char *s = &argv[0].z[start]; + uint64_t n = size - start; + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : + mem_copy_bin(ctx->pOut, s, n) != 0; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (start >= size || length == 0) + return return_empty_str(ctx, is_str); + char *str = &argv[0].z[start]; + uint64_t len = MIN(size - start, length); + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : + mem_copy_bin(ctx->pOut, str, len) != 0; +} + +static void +func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); + + uint64_t size = argv[0].n; + + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + char *str = argv[0].z; + uint64_t s = 0; + for (uint64_t i = 0; i < start && s < size; ++i) + s += utf8_len_char(str[s]); + + if (s >= size) + return mem_set_str_static(ctx->pOut, "", 0); + uint64_t len = size - s; + if (mem_copy_str(ctx->pOut, &str[s], len) != 0) + ctx->is_aborted = true; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (length == 0) + return mem_set_str_static(ctx->pOut, "", 0); + + char *str = argv[0].z; + uint64_t s = 0; + for (uint64_t i = 0; i < start && s < size; ++i) + s += utf8_len_char(str[s]); + uint64_t e = s; + for (uint64_t i = 0; i < length && e < size; ++i) + e += utf8_len_char(str[e]); + if (s >= size) + return mem_set_str_static(ctx->pOut, "", 0); + + uint64_t len = MIN(e, size) - s; + if (mem_copy_str(ctx->pOut, &str[s], len) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -789,116 +935,6 @@ printfFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the substr() function. - * - * substr(x,p1,p2) returns p2 characters of x[] beginning with p1. - * p1 is 1-indexed. So substr(x,1,1) returns the first character - * of x. If x is text, then we actually count UTF-8 characters. - * If x is a blob, then we count bytes. - * - * If p1 is negative, then we begin abs(p1) from the end of x[]. - * - * If p2 is negative, return the p2 characters preceding p1. - */ -static void -substrFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - const unsigned char *z; - const unsigned char *z2; - int len; - int p0type; - int64_t p1, p2; - int negP2 = 0; - - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "SUBSTR", - "1 or 2", argc); - context->is_aborted = true; - return; - } - if (mem_is_null(&argv[1]) || (argc == 3 && mem_is_null(&argv[2]))) - return; - p0type = sql_value_type(&argv[0]); - p1 = mem_get_int_unsafe(&argv[1]); - if (p0type == MP_BIN) { - z = mem_as_bin(&argv[0]); - len = mem_len_unsafe(&argv[0]); - if (z == 0) - return; - assert(len == mem_len_unsafe(&argv[0])); - } else { - z = mem_as_ustr(&argv[0]); - if (z == 0) - return; - len = 0; - if (p1 < 0) - len = sql_utf8_char_count(z, mem_len_unsafe(&argv[0])); - } - if (argc == 3) { - p2 = mem_get_int_unsafe(&argv[2]); - if (p2 < 0) { - p2 = -p2; - negP2 = 1; - } - } else { - p2 = sql_context_db_handle(context)-> - aLimit[SQL_LIMIT_LENGTH]; - } - if (p1 < 0) { - p1 += len; - if (p1 < 0) { - p2 += p1; - if (p2 < 0) - p2 = 0; - p1 = 0; - } - } else if (p1 > 0) { - p1--; - } else if (p2 > 0) { - p2--; - } - if (negP2) { - p1 -= p2; - if (p1 < 0) { - p2 += p1; - p1 = 0; - } - } - assert(p1 >= 0 && p2 >= 0); - if (p0type != MP_BIN) { - /* - * In the code below 'cnt' and 'n_chars' is - * used because '\0' is not supposed to be - * end-of-string symbol. - */ - int byte_size = mem_len_unsafe(&argv[0]); - int n_chars = sql_utf8_char_count(z, byte_size); - int cnt = 0; - int i = 0; - while (cnt < n_chars && p1) { - SQL_UTF8_FWD_1(z, i, byte_size); - cnt++; - p1--; - } - z += i; - i = 0; - for (z2 = z; cnt < n_chars && p2; p2--) { - SQL_UTF8_FWD_1(z2, i, byte_size); - cnt++; - } - z2 += i; - mem_copy_str(context->pOut, (char *)z, z2 - z); - } else { - if (p1 + p2 > len) { - p2 = len - p1; - if (p2 < 0) - p2 = 0; - } - mem_copy_bin(context->pOut, (char *)&z[p1], p2); - } -} - /* * Implementation of the round() function */ @@ -1953,15 +1989,15 @@ static struct sql_func_definition definitions[] = { {"SOUNDEX", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, soundexFunc, NULL}, {"SUBSTR", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUBSTR", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUM", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_sum, NULL}, {"SUM", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_sum, NULL}, {"TOTAL", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_DOUBLE, step_total, diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 416f27d69..dc4dfdc0e 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -139,7 +139,7 @@ test:do_execsql_test( SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.3> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.3> }) @@ -149,7 +149,7 @@ test:do_execsql_test( SELECT substr(t1,-1,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.4> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.4> }) @@ -159,7 +159,7 @@ test:do_execsql_test( SELECT substr(t1,-2,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.5> - "e", "i", "a", "r", "i" + "", "", "", "", "" -- </func-2.5> }) @@ -169,7 +169,7 @@ test:do_execsql_test( SELECT substr(t1,-2,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.6> - "ee", "is", "am", "re", "is" + "", "", "", "", "" -- </func-2.6> }) @@ -179,7 +179,7 @@ test:do_execsql_test( SELECT substr(t1,-4,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.7> - "fr", "", "gr", "wa", "th" + "", "", "", "", "" -- </func-2.7> }) @@ -288,7 +288,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-3.8> - "8", "s", "s", "o" + "", "", "", "" -- </func-3.8> }) @@ -298,7 +298,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-3,2) FROM tbl1 ORDER BY t1 ]], { -- <func-3.9> - "F-", "er", "in", "ሴh" + "", "", "", "" -- </func-3.9> }) @@ -308,7 +308,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-4,3) FROM tbl1 ORDER BY t1 ]], { -- <func-3.10> - "TF-", "ter", "ain", "iሴh" + "", "", "", "" -- </func-3.10> }) diff --git a/test/sql-tap/func2.test.lua b/test/sql-tap/func2.test.lua index 792f020f1..b786b4d96 100755 --- a/test/sql-tap/func2.test.lua +++ b/test/sql-tap/func2.test.lua @@ -162,7 +162,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1) ]], { -- <func2-1.11> - "s" + "Supercalifragilisticexpialidocious" -- </func2-1.11> }) @@ -172,7 +172,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2) ]], { -- <func2-1.12> - "us" + "Supercalifragilisticexpialidocious" -- </func2-1.12> }) @@ -182,7 +182,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30) ]], { -- <func2-1.13> - "rcalifragilisticexpialidocious" + "Supercalifragilisticexpialidocious" -- </func2-1.13> }) @@ -344,7 +344,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 1) ]], { -- <func2-1.25.1> - "s" + "" -- </func2-1.25.1> }) @@ -354,7 +354,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 2) ]], { -- <func2-1.25.2> - "s" + "" -- </func2-1.25.2> }) @@ -364,7 +364,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2, 1) ]], { -- <func2-1.26> - "u" + "" -- </func2-1.26> }) @@ -374,7 +374,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30, 1) ]], { -- <func2-1.27> - "r" + "" -- </func2-1.27> }) @@ -394,7 +394,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 1) ]], { -- <func2-1.28.1> - "S" + "" -- </func2-1.28.1> }) @@ -404,7 +404,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 2) ]], { -- <func2-1.28.2> - "Su" + "" -- </func2-1.28.2> }) @@ -424,7 +424,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -35, 2) ]], { -- <func2-1.29.2> - "S" + "" -- </func2-1.29.2> }) @@ -464,11 +464,13 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -36, 3) ]], { -- <func2-1.30.3> - "S" + "" -- </func2-1.30.3> }) -- p1 is 1-indexed, p2 length to return, p2<0 return p2 chars before p1 +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_execsql_test( "func2-1.31.0", [[ @@ -479,23 +481,23 @@ test:do_execsql_test( -- </func2-1.31.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -1) ]], { -- <func2-1.31.1> - "" + 1, err -- </func2-1.31.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -2) ]], { -- <func2-1.31.2> - "" + 1, err -- </func2-1.31.2> }) @@ -509,13 +511,13 @@ test:do_execsql_test( -- </func2-1.32.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.32.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 1, -1) ]], { -- <func2-1.32.1> - "" + 1, err -- </func2-1.32.1> }) @@ -529,23 +531,23 @@ test:do_execsql_test( -- </func2-1.33.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -1) ]], { -- <func2-1.33.1> - "S" + 1, err -- </func2-1.33.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -2) ]], { -- <func2-1.33.2> - "S" + 1, err -- </func2-1.33.2> }) @@ -559,63 +561,63 @@ test:do_execsql_test( -- </func2-1.34.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -1) ]], { -- <func2-1.34.1> - "u" + 1, err -- </func2-1.34.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -2) ]], { -- <func2-1.34.2> - "Su" + 1, err -- </func2-1.34.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -1) ]], { -- <func2-1.35.1> - "o" + 1, err -- </func2-1.35.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -2) ]], { -- <func2-1.35.2> - "do" + 1, err -- </func2-1.35.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.36", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 34, -1) ]], { -- <func2-1.36> - "u" + 1, err -- </func2-1.36> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.37", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 35, -1) ]], { -- <func2-1.37> - "s" + 1, err -- </func2-1.37> }) @@ -629,23 +631,23 @@ test:do_execsql_test( -- </func2-1.38.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -1) ]], { -- <func2-1.38.1> - "" + 1, err -- </func2-1.38.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -2) ]], { -- <func2-1.38.2> - "s" + 1, err -- </func2-1.38.2> }) @@ -993,7 +995,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 1) ]], { -- <func2-2.8.1> - "ሴ" + "" -- </func2-2.8.1> }) @@ -1003,7 +1005,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 2) ]], { -- <func2-2.8.2> - "ሴ" + "" -- </func2-2.8.2> }) @@ -1130,21 +1132,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "12") test:do_test( "func2-3.5.0", @@ -1158,21 +1160,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "12") + end, "") test:do_test( "func2-3.5.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.5.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.6.0", @@ -1181,26 +1183,28 @@ test:do_test( return bin_to_hex(test.lindex(blob, 0)) end, "") +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_test( "func2-3.6.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -1)") + end, + {1, err}) test:do_test( "func2-3.6.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -2)") + end, + {1, err}) test:do_test( "func2-3.6.3", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -3)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -3)") + end, + {1, err}) test:do_test( "func2-3.7.0", @@ -1212,16 +1216,16 @@ test:do_test( test:do_test( "func2-3.7.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -1)") + end, + {1, err}) test:do_test( "func2-3.7.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -2)") + end, + {1, err}) test:do_test( "func2-3.8.0", @@ -1233,16 +1237,16 @@ test:do_test( test:do_test( "func2-3.8.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -1)") + end, + {1, err}) test:do_test( "func2-3.8.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -2)") + end, + {1, err}) test:do_test( "func2-3.9.0", @@ -1254,17 +1258,15 @@ test:do_test( test:do_test( "func2-3.9.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:do_test( "func2-3.9.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") - - + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:finish_test() diff --git a/test/sql-tap/substr.test.lua b/test/sql-tap/substr.test.lua index e7e6d7aca..45aae8506 100755 --- a/test/sql-tap/substr.test.lua +++ b/test/sql-tap/substr.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(93) +test:plan(85) --!./tcltestrunner.lua -- 2007 May 14 @@ -74,17 +74,11 @@ substr_test("1.2", "abcdefg","2","1","b") substr_test("1.3", "abcdefg","1","2","ab") substr_test("1.4", "abcdefg","1","100","abcdefg") substr_test("1.5", "abcdefg","0","2","a") -substr_test("1.6", "abcdefg","-1","1","g") -substr_test("1.7", "abcdefg","-1","10","g") -substr_test("1.8", "abcdefg","-5","3","cde") -substr_test("1.9", "abcdefg","-7","3","abc") -substr_test("1.10", "abcdefg","-100","98","abcde") -substr_test("1.11", "abcdefg","5","-1","d") -substr_test("1.12", "abcdefg","5","-4","abcd") -substr_test("1.13", "abcdefg","5","-5","abcd") -substr_test("1.14", "abcdefg","-5","-1","b") -substr_test("1.15", "abcdefg","-5","-2","ab") -substr_test("1.16", "abcdefg","-5","-3","ab") +substr_test("1.6", "abcdefg","-1","1","") +substr_test("1.7", "abcdefg","-1","10","abcdefg") +substr_test("1.8", "abcdefg","-5","3","") +substr_test("1.9", "abcdefg","-7","3","") +substr_test("1.10", "abcdefg","-100","98","") substr_test("1.17", "abcdefg","100","200","") substr_test("1.18", "abcdefg","200","100","") -- Make sure NULL is returned if any parameter is NULL @@ -144,21 +138,20 @@ test:do_test( substr_test("2.1", "ሴ⍅㑖","1","1","ሴ") substr_test("2.2", "ሴ⍅㑖","2","1","⍅") substr_test("2.3", "ሴ⍅㑖","1","2","ሴ⍅") -substr_test("2.4", "ሴ⍅㑖","-1","1","㑖") -substr_test("2.5", "aሴb⍅c㑖c","-5","3","b⍅c") -substr_test("2.6", "aሴb⍅c㑖c","-2","-3","b⍅c") +substr_test("2.4", "ሴ⍅㑖","-1","1","") +substr_test("2.5", "aሴb⍅c㑖c","-5","3","") -- Basic functionality for BLOBs -- subblob_test("3.1", "61626364656667","1","1","61") subblob_test("3.2", "61626364656667","2","1","62") subblob_test("3.3", "61626364656667","1","2","6162") subblob_test("3.4", "61626364656667","1","100","61626364656667") -subblob_test("3.5", "61626364656667","0","2","61") -subblob_test("3.6", "61626364656667","-1","1","67") -subblob_test("3.7", "61626364656667","-1","10","67") -subblob_test("3.8", "61626364656667","-5","3","636465") -subblob_test("3.9", "61626364656667","-7","3","616263") -subblob_test("3.10", "61626364656667","-100","98","6162636465") +subblob_test("3.5", "61626364656667", "0", "2", "61") +subblob_test("3.6", "61626364656667", "-1", "1", "") +subblob_test("3.7", "61626364656667", "-1", "10", "61626364656667") +subblob_test("3.8", "61626364656667", "-5", "3", "") +subblob_test("3.9", "61626364656667","-7","3", "") +subblob_test("3.10", "61626364656667", "-100", "98", "") subblob_test("3.11", "61626364656667","100","200","") subblob_test("3.12", "61626364656667","200","100","") -- If these blobs were strings, then they would contain multi-byte @@ -168,9 +161,9 @@ subblob_test("3.12", "61626364656667","200","100","") subblob_test("4.1", "61E188B462E28D8563E3919663","1","1","61") subblob_test("4.2", "61E188B462E28D8563E3919663","2","1","E1") subblob_test("4.3", "61E188B462E28D8563E3919663","1","2","61E1") -subblob_test("4.4", "61E188B462E28D8563E3919663","-2","1","96") -subblob_test("4.5", "61E188B462E28D8563E3919663","-5","4","63E39196") -subblob_test("4.6", "61E188B462E28D8563E3919663","-100","98","61E188B462E28D8563E391") +subblob_test("4.4", "61E188B462E28D8563E3919663", "-2", "1", "") +subblob_test("4.5", "61E188B462E28D8563E3919663", "-5", "4", "") +subblob_test("4.6", "61E188B462E28D8563E3919663", "-100", "98", "") -- Two-argument SUBSTR -- local function substr_2_test(id, string, idx, result) @@ -193,7 +186,85 @@ local function substr_2_test(id, string, idx, result) end substr_2_test("5.1","abcdefghijklmnop","5","efghijklmnop") -substr_2_test("5.2","abcdef","-5","bcdef") +substr_2_test("5.2","abcdef","-5","abcdef") +-- +-- gh-4145: Make sure SUBSTR() throws an error if the third argument is +-- negative. +-- +test:do_catchsql_test( + "substr-6.1", + [[ + SELECT SUBSTR('12345', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +test:do_catchsql_test( + "substr-6.2", + [[ + SELECT SUBSTR(x'3132333435', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +-- gh-4145: Make sure that SUBSTR() works according to ANSI. + +-- +-- Make sure SUBSTR() returns "" if the sum of the second and third arguments is +-- 1 or less. +-- +test:do_execsql_test( + "builtins-6.3", + [[ + SELECT SUBSTR('asdfg', -10, 5), SUBSTR('asdfg', -4, 5); + ]], + { + '', '' + } +) + +-- +-- Make sure that if the sum of the second and third arguments is more than 1 +-- and the second argument is negative, the result starts from the start of the +-- string and length of the result will be one less than sum of the second and +-- third arguments. +-- +test:do_execsql_test( + "builtins-6.4", + [[ + SELECT SUBSTR('123456789', -5, 10); + ]], + { + '1234' + } +) + +-- Make sure SUBSTR() can work with big INTEGERs. +test:do_execsql_test( + "builtins-6.5", + [[ + SELECT SUBSTR('123456789', -9223372036854775808, 9223372036854775812); + ]], + { + '123' + } +) + +test:do_execsql_test( + "builtins-6.6", + [[ + SELECT SUBSTR('123456789', 0, 18000000000000000000); + ]], + { + '123456789' + } +) test:finish_test() -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function Mergen Imeev via Tarantool-patches @ 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:15 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-08 22:02 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Thanks for the patch! On 01.10.2021 18:29, Mergen Imeev via Tarantool-patches wrote: > This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() > function. Also, SUBSTR() can now work correctly with large INTEGER > values. The SUBSTR() syntax has not changed. Does not make much sense given that Mem.n is just int. You can't have strings or blobs larger than that. > Part of #4145 > > @TarantoolBot document > Title: SUBSTR() function > > SUBSTR() now works according to the ANSI rules for SUBSTRING(), but with > no syntax changes. What changed: > 1) if the third argument is specified and it is less than 0, an error is > thrown; > 2) if the sum of the second and third arguments is less than 1, an empty > string is returned; > 3) if the second argument is less than 0, and the sum of the second and > third arguments is greater than 1, then the result is equal to the > result of SUBSTR() with the second argument equal to 1 and the third > argument equal to the sum of the original second and third arguments > minus 1. I don't understand how it works now. What are the arguments? What do they do? Can you explain fully? I see that a lot of tests now has turned into errors and I don't understand why did they stop working. In the code I couldn't find any comments. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-10-20 17:15 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:13 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 17:15 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for review! My answers, diff and new patch below. Also, I replaced self-created functions ucnv_getNextUChar(). On Sat, Oct 09, 2021 at 12:02:14AM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > On 01.10.2021 18:29, Mergen Imeev via Tarantool-patches wrote: > > This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() > > function. Also, SUBSTR() can now work correctly with large INTEGER > > values. The SUBSTR() syntax has not changed. > > Does not make much sense given that Mem.n is just int. You can't > have strings or blobs larger than that. > > > Part of #4145 > > > > @TarantoolBot document > > Title: SUBSTR() function > > > > SUBSTR() now works according to the ANSI rules for SUBSTRING(), but with > > no syntax changes. What changed: > > 1) if the third argument is specified and it is less than 0, an error is > > thrown; > > 2) if the sum of the second and third arguments is less than 1, an empty > > string is returned; > > 3) if the second argument is less than 0, and the sum of the second and > > third arguments is greater than 1, then the result is equal to the > > result of SUBSTR() with the second argument equal to 1 and the third > > argument equal to the sum of the original second and third arguments > > minus 1. > > I don't understand how it works now. What are the arguments? What > do they do? Can you explain fully? I see that a lot of tests now has > turned into errors and I don't understand why did they stop working. > > In the code I couldn't find any comments. Rewrote the docbot request. I thought about adding comments, but maybe it's better to divide these two functions into four? One for two VARBINARY arguments, one for three VARBINARY arguments, and the same for STRING arguments. Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 441fab482..65bf03250 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -670,20 +670,17 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) return; assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); - uint64_t size = argv[0].n; - + UErrorCode err = U_ZERO_ERROR; + const char *pos = argv[0].z; + const char *end = argv[0].z + argv[0].n; if (argc == 2) { uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? argv[1].u.u - 1 : 0; - char *str = argv[0].z; - uint64_t s = 0; - for (uint64_t i = 0; i < start && s < size; ++i) - s += utf8_len_char(str[s]); - - if (s >= size) + for (uint64_t i = 0; i < start && pos < end; ++i) + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + if (pos == end) return mem_set_str_static(ctx->pOut, "", 0); - uint64_t len = size - s; - if (mem_copy_str(ctx->pOut, &str[s], len) != 0) + if (mem_copy_str(ctx->pOut, pos, end - pos) != 0) ctx->is_aborted = true; return; } @@ -708,18 +705,15 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) if (length == 0) return mem_set_str_static(ctx->pOut, "", 0); - char *str = argv[0].z; - uint64_t s = 0; - for (uint64_t i = 0; i < start && s < size; ++i) - s += utf8_len_char(str[s]); - uint64_t e = s; - for (uint64_t i = 0; i < length && e < size; ++i) - e += utf8_len_char(str[e]); - if (s >= size) + for (uint64_t i = 0; i < start && err == U_ZERO_ERROR; ++i) + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + const char *cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + if (err != U_ZERO_ERROR || cur == pos) return mem_set_str_static(ctx->pOut, "", 0); - uint64_t len = MIN(e, size) - s; - if (mem_copy_str(ctx->pOut, &str[s], len) != 0) + if (mem_copy_str(ctx->pOut, pos, cur - pos) != 0) ctx->is_aborted = true; } New patch: commit 8bf344a18cac835818142a9c76f327136ce29903 Author: Mergen Imeev <imeevma@gmail.com> Date: Mon Sep 27 10:15:14 2021 +0300 sql: rework SUBSTR() function This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() function. Also, SUBSTR() can now work correctly with large INTEGER values. The SUBSTR() syntax has not changed. Part of #4145 @TarantoolBot document Title: SUBSTR() function SUBSTR() now works according to the ANSI rules for SUBSTRING(). Rules for SUBSTR() with 2 arguments: 1) let the first argument be VALUE, and the second argument be START; 2) VALUE should be STRING or VARBINARY, START should be INTEGER; 3) if any of arguments is NULL, NULL is returned; 4) let POS be MAX(START - 1, 0), END be length of the VALUE; 5) if POS >= END, the result is empty string; 6) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. Rules for SUBSTR() with 3 arguments: 1) let the first argument be VALUE, the second argument be START, and the third argument be LENGTH; 2) VALUE should be STRING or VARBINARY, START and LENGTH should be INTEGERs; 3) if any of arguments is NULL, NULL is returned; 4) if LENGTH < 0, an error is thrown; 5) let POS be MAX(START - 1, 0), END be START + LENGTH - 1; 6) if POS >= END, the result is empty string; 7) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 80b075dcf..65bf03250 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -577,6 +577,146 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) return mem_set_uint(ctx->pOut, 0); } +/** Implementation of the SUBSTR() function. */ +int +substr_normalize(int64_t base_start, bool is_start_neg, uint64_t base_length, + uint64_t *start, uint64_t *length) +{ + if (!is_start_neg && base_start > 0) { + *start = (uint64_t)base_start - 1; + *length = base_length; + return 0; + } + *start = 0; + if (base_length == 0) { + *length = 0; + return 0; + } + /* + * We are subtracting 1 from base_length instead of subtracting from + * base_start, since base_start can be INT64_MIN. At the same time, + * base_length is not less than 1. + */ + int64_t a = base_start; + int64_t b = (int64_t)(base_length - 1); + int64_t res; + bool is_neg; + /* + * Integer cannot overflow since non-positive value is added to positive + * value. + */ + if (sql_add_int(a, a != 0, b, false, &res, &is_neg) != 0) { + diag_set(ClientError, ER_SQL_EXECUTE, "integer is overflowed"); + return -1; + } + *length = is_neg ? 0 : (uint64_t)res; + return 0; +} + +static void +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); + + bool is_str = mem_is_str(&argv[0]); + uint64_t size = argv[0].n; + + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + if (start >= size) + return return_empty_str(ctx, is_str); + char *s = &argv[0].z[start]; + uint64_t n = size - start; + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : + mem_copy_bin(ctx->pOut, s, n) != 0; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (start >= size || length == 0) + return return_empty_str(ctx, is_str); + char *str = &argv[0].z[start]; + uint64_t len = MIN(size - start, length); + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : + mem_copy_bin(ctx->pOut, str, len) != 0; +} + +static void +func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); + + UErrorCode err = U_ZERO_ERROR; + const char *pos = argv[0].z; + const char *end = argv[0].z + argv[0].n; + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + for (uint64_t i = 0; i < start && pos < end; ++i) + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + if (pos == end) + return mem_set_str_static(ctx->pOut, "", 0); + if (mem_copy_str(ctx->pOut, pos, end - pos) != 0) + ctx->is_aborted = true; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (length == 0) + return mem_set_str_static(ctx->pOut, "", 0); + + for (uint64_t i = 0; i < start && err == U_ZERO_ERROR; ++i) + ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + const char *cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) + ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); + if (err != U_ZERO_ERROR || cur == pos) + return mem_set_str_static(ctx->pOut, "", 0); + + if (mem_copy_str(ctx->pOut, pos, cur - pos) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -774,116 +914,6 @@ printfFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the substr() function. - * - * substr(x,p1,p2) returns p2 characters of x[] beginning with p1. - * p1 is 1-indexed. So substr(x,1,1) returns the first character - * of x. If x is text, then we actually count UTF-8 characters. - * If x is a blob, then we count bytes. - * - * If p1 is negative, then we begin abs(p1) from the end of x[]. - * - * If p2 is negative, return the p2 characters preceding p1. - */ -static void -substrFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - const unsigned char *z; - const unsigned char *z2; - int len; - int p0type; - int64_t p1, p2; - int negP2 = 0; - - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "SUBSTR", - "1 or 2", argc); - context->is_aborted = true; - return; - } - if (mem_is_null(&argv[1]) || (argc == 3 && mem_is_null(&argv[2]))) - return; - p0type = sql_value_type(&argv[0]); - p1 = mem_get_int_unsafe(&argv[1]); - if (p0type == MP_BIN) { - z = mem_as_bin(&argv[0]); - len = mem_len_unsafe(&argv[0]); - if (z == 0) - return; - assert(len == mem_len_unsafe(&argv[0])); - } else { - z = mem_as_ustr(&argv[0]); - if (z == 0) - return; - len = 0; - if (p1 < 0) - len = sql_utf8_char_count(z, mem_len_unsafe(&argv[0])); - } - if (argc == 3) { - p2 = mem_get_int_unsafe(&argv[2]); - if (p2 < 0) { - p2 = -p2; - negP2 = 1; - } - } else { - p2 = sql_context_db_handle(context)-> - aLimit[SQL_LIMIT_LENGTH]; - } - if (p1 < 0) { - p1 += len; - if (p1 < 0) { - p2 += p1; - if (p2 < 0) - p2 = 0; - p1 = 0; - } - } else if (p1 > 0) { - p1--; - } else if (p2 > 0) { - p2--; - } - if (negP2) { - p1 -= p2; - if (p1 < 0) { - p2 += p1; - p1 = 0; - } - } - assert(p1 >= 0 && p2 >= 0); - if (p0type != MP_BIN) { - /* - * In the code below 'cnt' and 'n_chars' is - * used because '\0' is not supposed to be - * end-of-string symbol. - */ - int byte_size = mem_len_unsafe(&argv[0]); - int n_chars = sql_utf8_char_count(z, byte_size); - int cnt = 0; - int i = 0; - while (cnt < n_chars && p1) { - SQL_UTF8_FWD_1(z, i, byte_size); - cnt++; - p1--; - } - z += i; - i = 0; - for (z2 = z; cnt < n_chars && p2; p2--) { - SQL_UTF8_FWD_1(z2, i, byte_size); - cnt++; - } - z2 += i; - mem_copy_str(context->pOut, (char *)z, z2 - z); - } else { - if (p1 + p2 > len) { - p2 = len - p1; - if (p2 < 0) - p2 = 0; - } - mem_copy_bin(context->pOut, (char *)&z[p1], p2); - } -} - /* * Implementation of the round() function */ @@ -1937,15 +1967,15 @@ static struct sql_func_definition definitions[] = { {"SOUNDEX", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, soundexFunc, NULL}, {"SUBSTR", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUBSTR", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUM", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_sum, NULL}, {"SUM", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_sum, NULL}, {"TOTAL", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_DOUBLE, step_total, diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 416f27d69..dc4dfdc0e 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -139,7 +139,7 @@ test:do_execsql_test( SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.3> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.3> }) @@ -149,7 +149,7 @@ test:do_execsql_test( SELECT substr(t1,-1,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.4> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.4> }) @@ -159,7 +159,7 @@ test:do_execsql_test( SELECT substr(t1,-2,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.5> - "e", "i", "a", "r", "i" + "", "", "", "", "" -- </func-2.5> }) @@ -169,7 +169,7 @@ test:do_execsql_test( SELECT substr(t1,-2,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.6> - "ee", "is", "am", "re", "is" + "", "", "", "", "" -- </func-2.6> }) @@ -179,7 +179,7 @@ test:do_execsql_test( SELECT substr(t1,-4,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.7> - "fr", "", "gr", "wa", "th" + "", "", "", "", "" -- </func-2.7> }) @@ -288,7 +288,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-3.8> - "8", "s", "s", "o" + "", "", "", "" -- </func-3.8> }) @@ -298,7 +298,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-3,2) FROM tbl1 ORDER BY t1 ]], { -- <func-3.9> - "F-", "er", "in", "ሴh" + "", "", "", "" -- </func-3.9> }) @@ -308,7 +308,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-4,3) FROM tbl1 ORDER BY t1 ]], { -- <func-3.10> - "TF-", "ter", "ain", "iሴh" + "", "", "", "" -- </func-3.10> }) diff --git a/test/sql-tap/func2.test.lua b/test/sql-tap/func2.test.lua index 792f020f1..b786b4d96 100755 --- a/test/sql-tap/func2.test.lua +++ b/test/sql-tap/func2.test.lua @@ -162,7 +162,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1) ]], { -- <func2-1.11> - "s" + "Supercalifragilisticexpialidocious" -- </func2-1.11> }) @@ -172,7 +172,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2) ]], { -- <func2-1.12> - "us" + "Supercalifragilisticexpialidocious" -- </func2-1.12> }) @@ -182,7 +182,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30) ]], { -- <func2-1.13> - "rcalifragilisticexpialidocious" + "Supercalifragilisticexpialidocious" -- </func2-1.13> }) @@ -344,7 +344,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 1) ]], { -- <func2-1.25.1> - "s" + "" -- </func2-1.25.1> }) @@ -354,7 +354,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 2) ]], { -- <func2-1.25.2> - "s" + "" -- </func2-1.25.2> }) @@ -364,7 +364,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2, 1) ]], { -- <func2-1.26> - "u" + "" -- </func2-1.26> }) @@ -374,7 +374,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30, 1) ]], { -- <func2-1.27> - "r" + "" -- </func2-1.27> }) @@ -394,7 +394,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 1) ]], { -- <func2-1.28.1> - "S" + "" -- </func2-1.28.1> }) @@ -404,7 +404,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 2) ]], { -- <func2-1.28.2> - "Su" + "" -- </func2-1.28.2> }) @@ -424,7 +424,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -35, 2) ]], { -- <func2-1.29.2> - "S" + "" -- </func2-1.29.2> }) @@ -464,11 +464,13 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -36, 3) ]], { -- <func2-1.30.3> - "S" + "" -- </func2-1.30.3> }) -- p1 is 1-indexed, p2 length to return, p2<0 return p2 chars before p1 +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_execsql_test( "func2-1.31.0", [[ @@ -479,23 +481,23 @@ test:do_execsql_test( -- </func2-1.31.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -1) ]], { -- <func2-1.31.1> - "" + 1, err -- </func2-1.31.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -2) ]], { -- <func2-1.31.2> - "" + 1, err -- </func2-1.31.2> }) @@ -509,13 +511,13 @@ test:do_execsql_test( -- </func2-1.32.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.32.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 1, -1) ]], { -- <func2-1.32.1> - "" + 1, err -- </func2-1.32.1> }) @@ -529,23 +531,23 @@ test:do_execsql_test( -- </func2-1.33.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -1) ]], { -- <func2-1.33.1> - "S" + 1, err -- </func2-1.33.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -2) ]], { -- <func2-1.33.2> - "S" + 1, err -- </func2-1.33.2> }) @@ -559,63 +561,63 @@ test:do_execsql_test( -- </func2-1.34.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -1) ]], { -- <func2-1.34.1> - "u" + 1, err -- </func2-1.34.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -2) ]], { -- <func2-1.34.2> - "Su" + 1, err -- </func2-1.34.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -1) ]], { -- <func2-1.35.1> - "o" + 1, err -- </func2-1.35.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -2) ]], { -- <func2-1.35.2> - "do" + 1, err -- </func2-1.35.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.36", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 34, -1) ]], { -- <func2-1.36> - "u" + 1, err -- </func2-1.36> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.37", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 35, -1) ]], { -- <func2-1.37> - "s" + 1, err -- </func2-1.37> }) @@ -629,23 +631,23 @@ test:do_execsql_test( -- </func2-1.38.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -1) ]], { -- <func2-1.38.1> - "" + 1, err -- </func2-1.38.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -2) ]], { -- <func2-1.38.2> - "s" + 1, err -- </func2-1.38.2> }) @@ -993,7 +995,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 1) ]], { -- <func2-2.8.1> - "ሴ" + "" -- </func2-2.8.1> }) @@ -1003,7 +1005,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 2) ]], { -- <func2-2.8.2> - "ሴ" + "" -- </func2-2.8.2> }) @@ -1130,21 +1132,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "12") test:do_test( "func2-3.5.0", @@ -1158,21 +1160,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "12") + end, "") test:do_test( "func2-3.5.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.5.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.6.0", @@ -1181,26 +1183,28 @@ test:do_test( return bin_to_hex(test.lindex(blob, 0)) end, "") +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_test( "func2-3.6.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -1)") + end, + {1, err}) test:do_test( "func2-3.6.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -2)") + end, + {1, err}) test:do_test( "func2-3.6.3", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -3)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -3)") + end, + {1, err}) test:do_test( "func2-3.7.0", @@ -1212,16 +1216,16 @@ test:do_test( test:do_test( "func2-3.7.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -1)") + end, + {1, err}) test:do_test( "func2-3.7.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -2)") + end, + {1, err}) test:do_test( "func2-3.8.0", @@ -1233,16 +1237,16 @@ test:do_test( test:do_test( "func2-3.8.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -1)") + end, + {1, err}) test:do_test( "func2-3.8.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -2)") + end, + {1, err}) test:do_test( "func2-3.9.0", @@ -1254,17 +1258,15 @@ test:do_test( test:do_test( "func2-3.9.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:do_test( "func2-3.9.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") - - + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:finish_test() diff --git a/test/sql-tap/substr.test.lua b/test/sql-tap/substr.test.lua index e7e6d7aca..45aae8506 100755 --- a/test/sql-tap/substr.test.lua +++ b/test/sql-tap/substr.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(93) +test:plan(85) --!./tcltestrunner.lua -- 2007 May 14 @@ -74,17 +74,11 @@ substr_test("1.2", "abcdefg","2","1","b") substr_test("1.3", "abcdefg","1","2","ab") substr_test("1.4", "abcdefg","1","100","abcdefg") substr_test("1.5", "abcdefg","0","2","a") -substr_test("1.6", "abcdefg","-1","1","g") -substr_test("1.7", "abcdefg","-1","10","g") -substr_test("1.8", "abcdefg","-5","3","cde") -substr_test("1.9", "abcdefg","-7","3","abc") -substr_test("1.10", "abcdefg","-100","98","abcde") -substr_test("1.11", "abcdefg","5","-1","d") -substr_test("1.12", "abcdefg","5","-4","abcd") -substr_test("1.13", "abcdefg","5","-5","abcd") -substr_test("1.14", "abcdefg","-5","-1","b") -substr_test("1.15", "abcdefg","-5","-2","ab") -substr_test("1.16", "abcdefg","-5","-3","ab") +substr_test("1.6", "abcdefg","-1","1","") +substr_test("1.7", "abcdefg","-1","10","abcdefg") +substr_test("1.8", "abcdefg","-5","3","") +substr_test("1.9", "abcdefg","-7","3","") +substr_test("1.10", "abcdefg","-100","98","") substr_test("1.17", "abcdefg","100","200","") substr_test("1.18", "abcdefg","200","100","") -- Make sure NULL is returned if any parameter is NULL @@ -144,21 +138,20 @@ test:do_test( substr_test("2.1", "ሴ⍅㑖","1","1","ሴ") substr_test("2.2", "ሴ⍅㑖","2","1","⍅") substr_test("2.3", "ሴ⍅㑖","1","2","ሴ⍅") -substr_test("2.4", "ሴ⍅㑖","-1","1","㑖") -substr_test("2.5", "aሴb⍅c㑖c","-5","3","b⍅c") -substr_test("2.6", "aሴb⍅c㑖c","-2","-3","b⍅c") +substr_test("2.4", "ሴ⍅㑖","-1","1","") +substr_test("2.5", "aሴb⍅c㑖c","-5","3","") -- Basic functionality for BLOBs -- subblob_test("3.1", "61626364656667","1","1","61") subblob_test("3.2", "61626364656667","2","1","62") subblob_test("3.3", "61626364656667","1","2","6162") subblob_test("3.4", "61626364656667","1","100","61626364656667") -subblob_test("3.5", "61626364656667","0","2","61") -subblob_test("3.6", "61626364656667","-1","1","67") -subblob_test("3.7", "61626364656667","-1","10","67") -subblob_test("3.8", "61626364656667","-5","3","636465") -subblob_test("3.9", "61626364656667","-7","3","616263") -subblob_test("3.10", "61626364656667","-100","98","6162636465") +subblob_test("3.5", "61626364656667", "0", "2", "61") +subblob_test("3.6", "61626364656667", "-1", "1", "") +subblob_test("3.7", "61626364656667", "-1", "10", "61626364656667") +subblob_test("3.8", "61626364656667", "-5", "3", "") +subblob_test("3.9", "61626364656667","-7","3", "") +subblob_test("3.10", "61626364656667", "-100", "98", "") subblob_test("3.11", "61626364656667","100","200","") subblob_test("3.12", "61626364656667","200","100","") -- If these blobs were strings, then they would contain multi-byte @@ -168,9 +161,9 @@ subblob_test("3.12", "61626364656667","200","100","") subblob_test("4.1", "61E188B462E28D8563E3919663","1","1","61") subblob_test("4.2", "61E188B462E28D8563E3919663","2","1","E1") subblob_test("4.3", "61E188B462E28D8563E3919663","1","2","61E1") -subblob_test("4.4", "61E188B462E28D8563E3919663","-2","1","96") -subblob_test("4.5", "61E188B462E28D8563E3919663","-5","4","63E39196") -subblob_test("4.6", "61E188B462E28D8563E3919663","-100","98","61E188B462E28D8563E391") +subblob_test("4.4", "61E188B462E28D8563E3919663", "-2", "1", "") +subblob_test("4.5", "61E188B462E28D8563E3919663", "-5", "4", "") +subblob_test("4.6", "61E188B462E28D8563E3919663", "-100", "98", "") -- Two-argument SUBSTR -- local function substr_2_test(id, string, idx, result) @@ -193,7 +186,85 @@ local function substr_2_test(id, string, idx, result) end substr_2_test("5.1","abcdefghijklmnop","5","efghijklmnop") -substr_2_test("5.2","abcdef","-5","bcdef") +substr_2_test("5.2","abcdef","-5","abcdef") +-- +-- gh-4145: Make sure SUBSTR() throws an error if the third argument is +-- negative. +-- +test:do_catchsql_test( + "substr-6.1", + [[ + SELECT SUBSTR('12345', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +test:do_catchsql_test( + "substr-6.2", + [[ + SELECT SUBSTR(x'3132333435', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +-- gh-4145: Make sure that SUBSTR() works according to ANSI. + +-- +-- Make sure SUBSTR() returns "" if the sum of the second and third arguments is +-- 1 or less. +-- +test:do_execsql_test( + "builtins-6.3", + [[ + SELECT SUBSTR('asdfg', -10, 5), SUBSTR('asdfg', -4, 5); + ]], + { + '', '' + } +) + +-- +-- Make sure that if the sum of the second and third arguments is more than 1 +-- and the second argument is negative, the result starts from the start of the +-- string and length of the result will be one less than sum of the second and +-- third arguments. +-- +test:do_execsql_test( + "builtins-6.4", + [[ + SELECT SUBSTR('123456789', -5, 10); + ]], + { + '1234' + } +) + +-- Make sure SUBSTR() can work with big INTEGERs. +test:do_execsql_test( + "builtins-6.5", + [[ + SELECT SUBSTR('123456789', -9223372036854775808, 9223372036854775812); + ]], + { + '123' + } +) + +test:do_execsql_test( + "builtins-6.6", + [[ + SELECT SUBSTR('123456789', 0, 18000000000000000000); + ]], + { + '123456789' + } +) test:finish_test() ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function 2021-10-20 17:15 ` Mergen Imeev via Tarantool-patches @ 2021-10-28 22:13 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:45 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-28 22:13 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-patches Thanks for the fixes! > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 80b075dcf..65bf03250 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > + > +static void > +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) > +{ > + assert(argc == 2 || argc == 3); > + if (mem_is_any_null(&argv[0], &argv[1])) > + return; > + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); > + > + bool is_str = mem_is_str(&argv[0]); > + uint64_t size = argv[0].n; > + > + if (argc == 2) { > + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? > + argv[1].u.u - 1 : 0; > + if (start >= size) > + return return_empty_str(ctx, is_str); > + char *s = &argv[0].z[start]; > + uint64_t n = size - start; > + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : > + mem_copy_bin(ctx->pOut, s, n) != 0; > + return; > + } > + > + assert(argc == 3); > + if (mem_is_null(&argv[2])) > + return; > + assert(mem_is_int(&argv[2])); > + if (!mem_is_uint(&argv[2])) { > + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " > + "cannot be less than 0"); > + ctx->is_aborted = true; > + return; > + } > + uint64_t start; > + uint64_t length; > + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, > + &start, &length) != 0) { The second line of the call is misaligned. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function 2021-10-28 22:13 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 10:45 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:45 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answer, diff and new patch below. Also, I replaced ucnv_getNextUChar() by U8_NEXT(). On Fri, Oct 29, 2021 at 12:13:05AM +0200, Vladislav Shpilevoy wrote: > Thanks for the fixes! > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index 80b075dcf..65bf03250 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > + > > +static void > > +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) > > +{ > > + assert(argc == 2 || argc == 3); > > + if (mem_is_any_null(&argv[0], &argv[1])) > > + return; > > + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); > > + > > + bool is_str = mem_is_str(&argv[0]); > > + uint64_t size = argv[0].n; > > + > > + if (argc == 2) { > > + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? > > + argv[1].u.u - 1 : 0; > > + if (start >= size) > > + return return_empty_str(ctx, is_str); > > + char *s = &argv[0].z[start]; > > + uint64_t n = size - start; > > + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : > > + mem_copy_bin(ctx->pOut, s, n) != 0; > > + return; > > + } > > + > > + assert(argc == 3); > > + if (mem_is_null(&argv[2])) > > + return; > > + assert(mem_is_int(&argv[2])); > > + if (!mem_is_uint(&argv[2])) { > > + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " > > + "cannot be less than 0"); > > + ctx->is_aborted = true; > > + return; > > + } > > + uint64_t start; > > + uint64_t length; > > + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, > > + &start, &length) != 0) { > > The second line of the call is misaligned. Fixed. Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index a76b023eb..7e71a757b 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -623,8 +623,12 @@ func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) if (argc == 2) { uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? argv[1].u.u - 1 : 0; - if (start >= size) - return return_empty_str(ctx, is_str); + if (start >= size) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } char *s = &argv[0].z[start]; uint64_t n = size - start; ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : @@ -645,12 +649,16 @@ func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) uint64_t start; uint64_t length; if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, - &start, &length) != 0) { + &start, &length) != 0) { ctx->is_aborted = true; return; } - if (start >= size || length == 0) - return return_empty_str(ctx, is_str); + if (start >= size || length == 0) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } char *str = &argv[0].z[start]; uint64_t len = MIN(size - start, length); ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : @@ -666,17 +674,19 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) return; assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); - UErrorCode err = U_ZERO_ERROR; - const char *pos = argv[0].z; - const char *end = argv[0].z + argv[0].n; + const char *str = argv[0].z; + int pos = 0; + int end = argv[0].n; if (argc == 2) { uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? argv[1].u.u - 1 : 0; - for (uint64_t i = 0; i < start && pos < end; ++i) - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } if (pos == end) return mem_set_str_static(ctx->pOut, "", 0); - if (mem_copy_str(ctx->pOut, pos, end - pos) != 0) + if (mem_copy_str(ctx->pOut, str + pos, end - pos) != 0) ctx->is_aborted = true; return; } @@ -694,22 +704,27 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) uint64_t start; uint64_t length; if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, - &start, &length) != 0) { + &start, &length) != 0) { ctx->is_aborted = true; return; } if (length == 0) return mem_set_str_static(ctx->pOut, "", 0); - for (uint64_t i = 0; i < start && err == U_ZERO_ERROR; ++i) - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); - const char *cur = pos; - for (uint64_t i = 0; i < length && cur < end; ++i) - ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); - if (err != U_ZERO_ERROR || cur == pos) + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) return mem_set_str_static(ctx->pOut, "", 0); - if (mem_copy_str(ctx->pOut, pos, cur - pos) != 0) + int cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, cur, end, c); + } + assert(cur > pos); + if (mem_copy_str(ctx->pOut, str + pos, cur - pos) != 0) ctx->is_aborted = true; } New patch: commit 56a7a57544cebcbf2992fe4543a8b4a30cdc3f67 Author: Mergen Imeev <imeevma@gmail.com> Date: Mon Sep 27 10:15:14 2021 +0300 sql: rework SUBSTR() function This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() function. Also, SUBSTR() can now work correctly with large INTEGER values. The SUBSTR() syntax has not changed. Part of #4145 @TarantoolBot document Title: SUBSTR() function SUBSTR() now works according to the ANSI rules for SUBSTRING(). Rules for SUBSTR() with 2 arguments: 1) let the first argument be VALUE, and the second argument be START; 2) VALUE should be STRING or VARBINARY, START should be INTEGER; 3) if any of arguments is NULL, NULL is returned; 4) let POS be MAX(START - 1, 0), END be length of the VALUE; 5) if POS >= END, the result is empty string; 6) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. Rules for SUBSTR() with 3 arguments: 1) let the first argument be VALUE, the second argument be START, and the third argument be LENGTH; 2) VALUE should be STRING or VARBINARY, START and LENGTH should be INTEGERs; 3) if any of arguments is NULL, NULL is returned; 4) if LENGTH < 0, an error is thrown; 5) let POS be MAX(START - 1, 0), END be START + LENGTH - 1; 6) if POS >= END, the result is empty string; 7) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 7914d2ec7..7e71a757b 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -573,6 +573,161 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) return mem_set_uint(ctx->pOut, 0); } +/** Implementation of the SUBSTR() function. */ +int +substr_normalize(int64_t base_start, bool is_start_neg, uint64_t base_length, + uint64_t *start, uint64_t *length) +{ + if (!is_start_neg && base_start > 0) { + *start = (uint64_t)base_start - 1; + *length = base_length; + return 0; + } + *start = 0; + if (base_length == 0) { + *length = 0; + return 0; + } + /* + * We are subtracting 1 from base_length instead of subtracting from + * base_start, since base_start can be INT64_MIN. At the same time, + * base_length is not less than 1. + */ + int64_t a = base_start; + int64_t b = (int64_t)(base_length - 1); + int64_t res; + bool is_neg; + /* + * Integer cannot overflow since non-positive value is added to positive + * value. + */ + if (sql_add_int(a, a != 0, b, false, &res, &is_neg) != 0) { + diag_set(ClientError, ER_SQL_EXECUTE, "integer is overflowed"); + return -1; + } + *length = is_neg ? 0 : (uint64_t)res; + return 0; +} + +static void +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); + + bool is_str = mem_is_str(&argv[0]); + uint64_t size = argv[0].n; + + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + if (start >= size) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } + char *s = &argv[0].z[start]; + uint64_t n = size - start; + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : + mem_copy_bin(ctx->pOut, s, n) != 0; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (start >= size || length == 0) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } + char *str = &argv[0].z[start]; + uint64_t len = MIN(size - start, length); + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : + mem_copy_bin(ctx->pOut, str, len) != 0; +} + +static void +func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); + + const char *str = argv[0].z; + int pos = 0; + int end = argv[0].n; + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) + return mem_set_str_static(ctx->pOut, "", 0); + if (mem_copy_str(ctx->pOut, str + pos, end - pos) != 0) + ctx->is_aborted = true; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (length == 0) + return mem_set_str_static(ctx->pOut, "", 0); + + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) + return mem_set_str_static(ctx->pOut, "", 0); + + int cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, cur, end, c); + } + assert(cur > pos); + if (mem_copy_str(ctx->pOut, str + pos, cur - pos) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -770,116 +925,6 @@ printfFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the substr() function. - * - * substr(x,p1,p2) returns p2 characters of x[] beginning with p1. - * p1 is 1-indexed. So substr(x,1,1) returns the first character - * of x. If x is text, then we actually count UTF-8 characters. - * If x is a blob, then we count bytes. - * - * If p1 is negative, then we begin abs(p1) from the end of x[]. - * - * If p2 is negative, return the p2 characters preceding p1. - */ -static void -substrFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - const unsigned char *z; - const unsigned char *z2; - int len; - int p0type; - int64_t p1, p2; - int negP2 = 0; - - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "SUBSTR", - "1 or 2", argc); - context->is_aborted = true; - return; - } - if (mem_is_null(&argv[1]) || (argc == 3 && mem_is_null(&argv[2]))) - return; - p0type = sql_value_type(&argv[0]); - p1 = mem_get_int_unsafe(&argv[1]); - if (p0type == MP_BIN) { - z = mem_as_bin(&argv[0]); - len = mem_len_unsafe(&argv[0]); - if (z == 0) - return; - assert(len == mem_len_unsafe(&argv[0])); - } else { - z = mem_as_ustr(&argv[0]); - if (z == 0) - return; - len = 0; - if (p1 < 0) - len = sql_utf8_char_count(z, mem_len_unsafe(&argv[0])); - } - if (argc == 3) { - p2 = mem_get_int_unsafe(&argv[2]); - if (p2 < 0) { - p2 = -p2; - negP2 = 1; - } - } else { - p2 = sql_context_db_handle(context)-> - aLimit[SQL_LIMIT_LENGTH]; - } - if (p1 < 0) { - p1 += len; - if (p1 < 0) { - p2 += p1; - if (p2 < 0) - p2 = 0; - p1 = 0; - } - } else if (p1 > 0) { - p1--; - } else if (p2 > 0) { - p2--; - } - if (negP2) { - p1 -= p2; - if (p1 < 0) { - p2 += p1; - p1 = 0; - } - } - assert(p1 >= 0 && p2 >= 0); - if (p0type != MP_BIN) { - /* - * In the code below 'cnt' and 'n_chars' is - * used because '\0' is not supposed to be - * end-of-string symbol. - */ - int byte_size = mem_len_unsafe(&argv[0]); - int n_chars = sql_utf8_char_count(z, byte_size); - int cnt = 0; - int i = 0; - while (cnt < n_chars && p1) { - SQL_UTF8_FWD_1(z, i, byte_size); - cnt++; - p1--; - } - z += i; - i = 0; - for (z2 = z; cnt < n_chars && p2; p2--) { - SQL_UTF8_FWD_1(z2, i, byte_size); - cnt++; - } - z2 += i; - mem_copy_str(context->pOut, (char *)z, z2 - z); - } else { - if (p1 + p2 > len) { - p2 = len - p1; - if (p2 < 0) - p2 = 0; - } - mem_copy_bin(context->pOut, (char *)&z[p1], p2); - } -} - /* * Implementation of the round() function */ @@ -1933,15 +1978,15 @@ static struct sql_func_definition definitions[] = { {"SOUNDEX", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, soundexFunc, NULL}, {"SUBSTR", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUBSTR", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUM", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_sum, NULL}, {"SUM", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_sum, NULL}, {"TOTAL", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_DOUBLE, step_total, diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 416f27d69..dc4dfdc0e 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -139,7 +139,7 @@ test:do_execsql_test( SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.3> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.3> }) @@ -149,7 +149,7 @@ test:do_execsql_test( SELECT substr(t1,-1,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.4> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.4> }) @@ -159,7 +159,7 @@ test:do_execsql_test( SELECT substr(t1,-2,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.5> - "e", "i", "a", "r", "i" + "", "", "", "", "" -- </func-2.5> }) @@ -169,7 +169,7 @@ test:do_execsql_test( SELECT substr(t1,-2,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.6> - "ee", "is", "am", "re", "is" + "", "", "", "", "" -- </func-2.6> }) @@ -179,7 +179,7 @@ test:do_execsql_test( SELECT substr(t1,-4,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.7> - "fr", "", "gr", "wa", "th" + "", "", "", "", "" -- </func-2.7> }) @@ -288,7 +288,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-3.8> - "8", "s", "s", "o" + "", "", "", "" -- </func-3.8> }) @@ -298,7 +298,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-3,2) FROM tbl1 ORDER BY t1 ]], { -- <func-3.9> - "F-", "er", "in", "ሴh" + "", "", "", "" -- </func-3.9> }) @@ -308,7 +308,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-4,3) FROM tbl1 ORDER BY t1 ]], { -- <func-3.10> - "TF-", "ter", "ain", "iሴh" + "", "", "", "" -- </func-3.10> }) diff --git a/test/sql-tap/func2.test.lua b/test/sql-tap/func2.test.lua index 792f020f1..b786b4d96 100755 --- a/test/sql-tap/func2.test.lua +++ b/test/sql-tap/func2.test.lua @@ -162,7 +162,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1) ]], { -- <func2-1.11> - "s" + "Supercalifragilisticexpialidocious" -- </func2-1.11> }) @@ -172,7 +172,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2) ]], { -- <func2-1.12> - "us" + "Supercalifragilisticexpialidocious" -- </func2-1.12> }) @@ -182,7 +182,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30) ]], { -- <func2-1.13> - "rcalifragilisticexpialidocious" + "Supercalifragilisticexpialidocious" -- </func2-1.13> }) @@ -344,7 +344,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 1) ]], { -- <func2-1.25.1> - "s" + "" -- </func2-1.25.1> }) @@ -354,7 +354,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 2) ]], { -- <func2-1.25.2> - "s" + "" -- </func2-1.25.2> }) @@ -364,7 +364,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2, 1) ]], { -- <func2-1.26> - "u" + "" -- </func2-1.26> }) @@ -374,7 +374,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30, 1) ]], { -- <func2-1.27> - "r" + "" -- </func2-1.27> }) @@ -394,7 +394,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 1) ]], { -- <func2-1.28.1> - "S" + "" -- </func2-1.28.1> }) @@ -404,7 +404,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 2) ]], { -- <func2-1.28.2> - "Su" + "" -- </func2-1.28.2> }) @@ -424,7 +424,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -35, 2) ]], { -- <func2-1.29.2> - "S" + "" -- </func2-1.29.2> }) @@ -464,11 +464,13 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -36, 3) ]], { -- <func2-1.30.3> - "S" + "" -- </func2-1.30.3> }) -- p1 is 1-indexed, p2 length to return, p2<0 return p2 chars before p1 +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_execsql_test( "func2-1.31.0", [[ @@ -479,23 +481,23 @@ test:do_execsql_test( -- </func2-1.31.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -1) ]], { -- <func2-1.31.1> - "" + 1, err -- </func2-1.31.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -2) ]], { -- <func2-1.31.2> - "" + 1, err -- </func2-1.31.2> }) @@ -509,13 +511,13 @@ test:do_execsql_test( -- </func2-1.32.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.32.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 1, -1) ]], { -- <func2-1.32.1> - "" + 1, err -- </func2-1.32.1> }) @@ -529,23 +531,23 @@ test:do_execsql_test( -- </func2-1.33.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -1) ]], { -- <func2-1.33.1> - "S" + 1, err -- </func2-1.33.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -2) ]], { -- <func2-1.33.2> - "S" + 1, err -- </func2-1.33.2> }) @@ -559,63 +561,63 @@ test:do_execsql_test( -- </func2-1.34.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -1) ]], { -- <func2-1.34.1> - "u" + 1, err -- </func2-1.34.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -2) ]], { -- <func2-1.34.2> - "Su" + 1, err -- </func2-1.34.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -1) ]], { -- <func2-1.35.1> - "o" + 1, err -- </func2-1.35.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -2) ]], { -- <func2-1.35.2> - "do" + 1, err -- </func2-1.35.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.36", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 34, -1) ]], { -- <func2-1.36> - "u" + 1, err -- </func2-1.36> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.37", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 35, -1) ]], { -- <func2-1.37> - "s" + 1, err -- </func2-1.37> }) @@ -629,23 +631,23 @@ test:do_execsql_test( -- </func2-1.38.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -1) ]], { -- <func2-1.38.1> - "" + 1, err -- </func2-1.38.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -2) ]], { -- <func2-1.38.2> - "s" + 1, err -- </func2-1.38.2> }) @@ -993,7 +995,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 1) ]], { -- <func2-2.8.1> - "ሴ" + "" -- </func2-2.8.1> }) @@ -1003,7 +1005,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 2) ]], { -- <func2-2.8.2> - "ሴ" + "" -- </func2-2.8.2> }) @@ -1130,21 +1132,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "12") test:do_test( "func2-3.5.0", @@ -1158,21 +1160,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "12") + end, "") test:do_test( "func2-3.5.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.5.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.6.0", @@ -1181,26 +1183,28 @@ test:do_test( return bin_to_hex(test.lindex(blob, 0)) end, "") +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_test( "func2-3.6.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -1)") + end, + {1, err}) test:do_test( "func2-3.6.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -2)") + end, + {1, err}) test:do_test( "func2-3.6.3", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -3)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -3)") + end, + {1, err}) test:do_test( "func2-3.7.0", @@ -1212,16 +1216,16 @@ test:do_test( test:do_test( "func2-3.7.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -1)") + end, + {1, err}) test:do_test( "func2-3.7.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -2)") + end, + {1, err}) test:do_test( "func2-3.8.0", @@ -1233,16 +1237,16 @@ test:do_test( test:do_test( "func2-3.8.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -1)") + end, + {1, err}) test:do_test( "func2-3.8.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -2)") + end, + {1, err}) test:do_test( "func2-3.9.0", @@ -1254,17 +1258,15 @@ test:do_test( test:do_test( "func2-3.9.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:do_test( "func2-3.9.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") - - + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:finish_test() diff --git a/test/sql-tap/substr.test.lua b/test/sql-tap/substr.test.lua index e7e6d7aca..45aae8506 100755 --- a/test/sql-tap/substr.test.lua +++ b/test/sql-tap/substr.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(93) +test:plan(85) --!./tcltestrunner.lua -- 2007 May 14 @@ -74,17 +74,11 @@ substr_test("1.2", "abcdefg","2","1","b") substr_test("1.3", "abcdefg","1","2","ab") substr_test("1.4", "abcdefg","1","100","abcdefg") substr_test("1.5", "abcdefg","0","2","a") -substr_test("1.6", "abcdefg","-1","1","g") -substr_test("1.7", "abcdefg","-1","10","g") -substr_test("1.8", "abcdefg","-5","3","cde") -substr_test("1.9", "abcdefg","-7","3","abc") -substr_test("1.10", "abcdefg","-100","98","abcde") -substr_test("1.11", "abcdefg","5","-1","d") -substr_test("1.12", "abcdefg","5","-4","abcd") -substr_test("1.13", "abcdefg","5","-5","abcd") -substr_test("1.14", "abcdefg","-5","-1","b") -substr_test("1.15", "abcdefg","-5","-2","ab") -substr_test("1.16", "abcdefg","-5","-3","ab") +substr_test("1.6", "abcdefg","-1","1","") +substr_test("1.7", "abcdefg","-1","10","abcdefg") +substr_test("1.8", "abcdefg","-5","3","") +substr_test("1.9", "abcdefg","-7","3","") +substr_test("1.10", "abcdefg","-100","98","") substr_test("1.17", "abcdefg","100","200","") substr_test("1.18", "abcdefg","200","100","") -- Make sure NULL is returned if any parameter is NULL @@ -144,21 +138,20 @@ test:do_test( substr_test("2.1", "ሴ⍅㑖","1","1","ሴ") substr_test("2.2", "ሴ⍅㑖","2","1","⍅") substr_test("2.3", "ሴ⍅㑖","1","2","ሴ⍅") -substr_test("2.4", "ሴ⍅㑖","-1","1","㑖") -substr_test("2.5", "aሴb⍅c㑖c","-5","3","b⍅c") -substr_test("2.6", "aሴb⍅c㑖c","-2","-3","b⍅c") +substr_test("2.4", "ሴ⍅㑖","-1","1","") +substr_test("2.5", "aሴb⍅c㑖c","-5","3","") -- Basic functionality for BLOBs -- subblob_test("3.1", "61626364656667","1","1","61") subblob_test("3.2", "61626364656667","2","1","62") subblob_test("3.3", "61626364656667","1","2","6162") subblob_test("3.4", "61626364656667","1","100","61626364656667") -subblob_test("3.5", "61626364656667","0","2","61") -subblob_test("3.6", "61626364656667","-1","1","67") -subblob_test("3.7", "61626364656667","-1","10","67") -subblob_test("3.8", "61626364656667","-5","3","636465") -subblob_test("3.9", "61626364656667","-7","3","616263") -subblob_test("3.10", "61626364656667","-100","98","6162636465") +subblob_test("3.5", "61626364656667", "0", "2", "61") +subblob_test("3.6", "61626364656667", "-1", "1", "") +subblob_test("3.7", "61626364656667", "-1", "10", "61626364656667") +subblob_test("3.8", "61626364656667", "-5", "3", "") +subblob_test("3.9", "61626364656667","-7","3", "") +subblob_test("3.10", "61626364656667", "-100", "98", "") subblob_test("3.11", "61626364656667","100","200","") subblob_test("3.12", "61626364656667","200","100","") -- If these blobs were strings, then they would contain multi-byte @@ -168,9 +161,9 @@ subblob_test("3.12", "61626364656667","200","100","") subblob_test("4.1", "61E188B462E28D8563E3919663","1","1","61") subblob_test("4.2", "61E188B462E28D8563E3919663","2","1","E1") subblob_test("4.3", "61E188B462E28D8563E3919663","1","2","61E1") -subblob_test("4.4", "61E188B462E28D8563E3919663","-2","1","96") -subblob_test("4.5", "61E188B462E28D8563E3919663","-5","4","63E39196") -subblob_test("4.6", "61E188B462E28D8563E3919663","-100","98","61E188B462E28D8563E391") +subblob_test("4.4", "61E188B462E28D8563E3919663", "-2", "1", "") +subblob_test("4.5", "61E188B462E28D8563E3919663", "-5", "4", "") +subblob_test("4.6", "61E188B462E28D8563E3919663", "-100", "98", "") -- Two-argument SUBSTR -- local function substr_2_test(id, string, idx, result) @@ -193,7 +186,85 @@ local function substr_2_test(id, string, idx, result) end substr_2_test("5.1","abcdefghijklmnop","5","efghijklmnop") -substr_2_test("5.2","abcdef","-5","bcdef") +substr_2_test("5.2","abcdef","-5","abcdef") +-- +-- gh-4145: Make sure SUBSTR() throws an error if the third argument is +-- negative. +-- +test:do_catchsql_test( + "substr-6.1", + [[ + SELECT SUBSTR('12345', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +test:do_catchsql_test( + "substr-6.2", + [[ + SELECT SUBSTR(x'3132333435', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +-- gh-4145: Make sure that SUBSTR() works according to ANSI. + +-- +-- Make sure SUBSTR() returns "" if the sum of the second and third arguments is +-- 1 or less. +-- +test:do_execsql_test( + "builtins-6.3", + [[ + SELECT SUBSTR('asdfg', -10, 5), SUBSTR('asdfg', -4, 5); + ]], + { + '', '' + } +) + +-- +-- Make sure that if the sum of the second and third arguments is more than 1 +-- and the second argument is negative, the result starts from the start of the +-- string and length of the result will be one less than sum of the second and +-- third arguments. +-- +test:do_execsql_test( + "builtins-6.4", + [[ + SELECT SUBSTR('123456789', -5, 10); + ]], + { + '1234' + } +) + +-- Make sure SUBSTR() can work with big INTEGERs. +test:do_execsql_test( + "builtins-6.5", + [[ + SELECT SUBSTR('123456789', -9223372036854775808, 9223372036854775812); + ]], + { + '123' + } +) + +test:do_execsql_test( + "builtins-6.6", + [[ + SELECT SUBSTR('123456789', 0, 18000000000000000000); + ]], + { + '123456789' + } +) test:finish_test() ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (6 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 ` Mergen Imeev via Tarantool-patches 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-04 13:32 ` [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches 2021-11-01 13:38 ` Vladislav Shpilevoy via Tarantool-patches 9 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-01 16:29 UTC (permalink / raw) To: v.shpilevoy; +Cc: tarantool-patches Part of #4145 --- src/box/sql/func.c | 40 +++++++++++----------------------------- 1 file changed, 11 insertions(+), 29 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 1b4d52225..27106ced6 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1281,46 +1281,28 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) { u32 escape = SQL_END_OF_STRING; int nPat; - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, - "LIKE", "2 or 3", argc); - context->is_aborted = true; - return; - } - sql *db = sql_context_db_handle(context); - int rhs_type = sql_value_type(&argv[0]); - int lhs_type = sql_value_type(&argv[1]); - - if (lhs_type != MP_STR || rhs_type != MP_STR) { - if (lhs_type == MP_NIL || rhs_type == MP_NIL) - return; - const char *str = rhs_type != MP_STR ? - mem_str(&argv[0]) : mem_str(&argv[1]); - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", str); - context->is_aborted = true; + assert(argc == 2 || argc == 3); + struct sql *db = sql_context_db_handle(context); + if (mem_is_any_null(&argv[0], &argv[1])) return; - } - const char *zB = mem_as_str0(&argv[0]); - const char *zA = mem_as_str0(&argv[1]); - const char *zB_end = zB + mem_len_unsafe(&argv[0]); - const char *zA_end = zA + mem_len_unsafe(&argv[1]); + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + const char *zB = argv[0].z; + const char *zA = argv[1].z; + const char *zB_end = zB + argv[0].n; + const char *zA_end = zA + argv[1].n; /* * Limit the length of the LIKE pattern to avoid problems * of deep recursion and N*N behavior in * sql_utf8_pattern_compare(). */ - nPat = mem_len_unsafe(&argv[0]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH] + 1); + nPat = argv[0].n; if (nPat > db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { diag_set(ClientError, ER_SQL_EXECUTE, "LIKE pattern is too "\ "complex"); context->is_aborted = true; return; } - /* Encoding did not change */ - assert(zB == mem_as_str0(&argv[0])); if (argc == 3) { /* @@ -1328,10 +1310,10 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) * single UTF-8 character. Otherwise, return an * error. */ - const unsigned char *zEsc = mem_as_ustr(&argv[2]); + const unsigned char *zEsc = (const unsigned char *)argv[2].z; if (zEsc == 0) return; - if (sql_utf8_char_count(zEsc, mem_len_unsafe(&argv[2])) != 1) { + if (sql_utf8_char_count(zEsc, argv[2].n) != 1) { diag_set(ClientError, ER_SQL_EXECUTE, "ESCAPE "\ "expression must be a single character"); context->is_aborted = true; -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function Mergen Imeev via Tarantool-patches @ 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:19 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-10-08 22:02 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Thanks for the patch! > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 1b4d52225..27106ced6 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1281,46 +1281,28 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) > { > u32 escape = SQL_END_OF_STRING; > int nPat; > - if (argc != 2 && argc != 3) { > - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, > - "LIKE", "2 or 3", argc); > - context->is_aborted = true; > - return; > - } > - sql *db = sql_context_db_handle(context); > - int rhs_type = sql_value_type(&argv[0]); > - int lhs_type = sql_value_type(&argv[1]); > - > - if (lhs_type != MP_STR || rhs_type != MP_STR) { > - if (lhs_type == MP_NIL || rhs_type == MP_NIL) > - return; > - const char *str = rhs_type != MP_STR ? > - mem_str(&argv[0]) : mem_str(&argv[1]); > - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", str); > - context->is_aborted = true; > + assert(argc == 2 || argc == 3); > + struct sql *db = sql_context_db_handle(context); You can get the db after the is_null check below. It is not used when the check passes. > + if (mem_is_any_null(&argv[0], &argv[1])) > return; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-10-20 17:19 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:48 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-20 17:19 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches Thank you for the review! My answer, diff and new patch below. Also, I replaced self-created and SQLite3 functions to work with UTF8 characters by iucnv_getNextUChar(). And I replaced sql_result_bool() by mem_set_bool(), so I will be able to drop sql_result_bool() later. On Sat, Oct 09, 2021 at 12:02:35AM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index 1b4d52225..27106ced6 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -1281,46 +1281,28 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) > > { > > u32 escape = SQL_END_OF_STRING; > > int nPat; > > - if (argc != 2 && argc != 3) { > > - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, > > - "LIKE", "2 or 3", argc); > > - context->is_aborted = true; > > - return; > > - } > > - sql *db = sql_context_db_handle(context); > > - int rhs_type = sql_value_type(&argv[0]); > > - int lhs_type = sql_value_type(&argv[1]); > > - > > - if (lhs_type != MP_STR || rhs_type != MP_STR) { > > - if (lhs_type == MP_NIL || rhs_type == MP_NIL) > > - return; > > - const char *str = rhs_type != MP_STR ? > > - mem_str(&argv[0]) : mem_str(&argv[1]); > > - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", str); > > - context->is_aborted = true; > > + assert(argc == 2 || argc == 3); > > + struct sql *db = sql_context_db_handle(context); > > You can get the db after the is_null check below. It is not > used when the check passes. > Fixed. Dropped this line and used sql_get() instead. > > + if (mem_is_any_null(&argv[0], &argv[1])) > > return; Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 90417c892..afe34f7f0 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1261,7 +1261,6 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) u32 escape = SQL_END_OF_STRING; int nPat; assert(argc == 2 || argc == 3); - struct sql *db = sql_context_db_handle(context); if (mem_is_any_null(&argv[0], &argv[1])) return; assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); @@ -1276,7 +1275,7 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) * sql_utf8_pattern_compare(). */ nPat = argv[0].n; - if (nPat > db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { + if (nPat > sql_get()->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { diag_set(ClientError, ER_SQL_EXECUTE, "LIKE pattern is too "\ "complex"); context->is_aborted = true; @@ -1284,21 +1283,23 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) } if (argc == 3) { + if (mem_is_null(&argv[2])) + return; /* * The escape character string must consist of a * single UTF-8 character. Otherwise, return an * error. */ - const unsigned char *zEsc = (const unsigned char *)argv[2].z; - if (zEsc == 0) - return; - if (sql_utf8_char_count(zEsc, argv[2].n) != 1) { + UErrorCode err = U_ZERO_ERROR; + const char *pos = argv[2].z; + const char *end = argv[2].z + argv[2].n; + escape = ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + if (pos != end || err != U_ZERO_ERROR) { diag_set(ClientError, ER_SQL_EXECUTE, "ESCAPE "\ "expression must be a single character"); context->is_aborted = true; return; } - escape = sqlUtf8Read(&zEsc); } if (!zA || !zB) return; @@ -1313,7 +1314,7 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) context->is_aborted = true; return; } - sql_result_bool(context, res == MATCH); + mem_set_bool(context->pOut, res == MATCH); } /** New patch: commit a739926ff3178d499650cff67c8597ff21e18b3c Author: Mergen Imeev <imeevma@gmail.com> Date: Mon Sep 27 19:29:28 2021 +0300 sql: refactor LIKE() function Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 65bf03250..afe34f7f0 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1260,63 +1260,46 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) { u32 escape = SQL_END_OF_STRING; int nPat; - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, - "LIKE", "2 or 3", argc); - context->is_aborted = true; - return; - } - sql *db = sql_context_db_handle(context); - int rhs_type = sql_value_type(&argv[0]); - int lhs_type = sql_value_type(&argv[1]); - - if (lhs_type != MP_STR || rhs_type != MP_STR) { - if (lhs_type == MP_NIL || rhs_type == MP_NIL) - return; - const char *str = rhs_type != MP_STR ? - mem_str(&argv[0]) : mem_str(&argv[1]); - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", str); - context->is_aborted = true; + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) return; - } - const char *zB = mem_as_str0(&argv[0]); - const char *zA = mem_as_str0(&argv[1]); - const char *zB_end = zB + mem_len_unsafe(&argv[0]); - const char *zA_end = zA + mem_len_unsafe(&argv[1]); + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + const char *zB = argv[0].z; + const char *zA = argv[1].z; + const char *zB_end = zB + argv[0].n; + const char *zA_end = zA + argv[1].n; /* * Limit the length of the LIKE pattern to avoid problems * of deep recursion and N*N behavior in * sql_utf8_pattern_compare(). */ - nPat = mem_len_unsafe(&argv[0]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH] + 1); - if (nPat > db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { + nPat = argv[0].n; + if (nPat > sql_get()->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { diag_set(ClientError, ER_SQL_EXECUTE, "LIKE pattern is too "\ "complex"); context->is_aborted = true; return; } - /* Encoding did not change */ - assert(zB == mem_as_str0(&argv[0])); if (argc == 3) { + if (mem_is_null(&argv[2])) + return; /* * The escape character string must consist of a * single UTF-8 character. Otherwise, return an * error. */ - const unsigned char *zEsc = mem_as_ustr(&argv[2]); - if (zEsc == 0) - return; - if (sql_utf8_char_count(zEsc, mem_len_unsafe(&argv[2])) != 1) { + UErrorCode err = U_ZERO_ERROR; + const char *pos = argv[2].z; + const char *end = argv[2].z + argv[2].n; + escape = ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + if (pos != end || err != U_ZERO_ERROR) { diag_set(ClientError, ER_SQL_EXECUTE, "ESCAPE "\ "expression must be a single character"); context->is_aborted = true; return; } - escape = sqlUtf8Read(&zEsc); } if (!zA || !zB) return; @@ -1331,7 +1314,7 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) context->is_aborted = true; return; } - sql_result_bool(context, res == MATCH); + mem_set_bool(context->pOut, res == MATCH); } /** ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function 2021-10-20 17:19 ` Mergen Imeev via Tarantool-patches @ 2021-11-01 10:48 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:53 ` Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:48 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thank you for the review! I replaced ucnv_getNextUChar() by U8_NEXT(). Diff and new patch below. On Wed, Oct 20, 2021 at 08:19:38PM +0300, Mergen Imeev via Tarantool-patches wrote: <cut> Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index a76b023eb..7e71a757b 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -623,8 +623,12 @@ func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) if (argc == 2) { uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? argv[1].u.u - 1 : 0; - if (start >= size) - return return_empty_str(ctx, is_str); + if (start >= size) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } char *s = &argv[0].z[start]; uint64_t n = size - start; ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : @@ -645,12 +649,16 @@ func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) uint64_t start; uint64_t length; if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, - &start, &length) != 0) { + &start, &length) != 0) { ctx->is_aborted = true; return; } - if (start >= size || length == 0) - return return_empty_str(ctx, is_str); + if (start >= size || length == 0) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } char *str = &argv[0].z[start]; uint64_t len = MIN(size - start, length); ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : @@ -666,17 +674,19 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) return; assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); - UErrorCode err = U_ZERO_ERROR; - const char *pos = argv[0].z; - const char *end = argv[0].z + argv[0].n; + const char *str = argv[0].z; + int pos = 0; + int end = argv[0].n; if (argc == 2) { uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? argv[1].u.u - 1 : 0; - for (uint64_t i = 0; i < start && pos < end; ++i) - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } if (pos == end) return mem_set_str_static(ctx->pOut, "", 0); - if (mem_copy_str(ctx->pOut, pos, end - pos) != 0) + if (mem_copy_str(ctx->pOut, str + pos, end - pos) != 0) ctx->is_aborted = true; return; } @@ -694,22 +704,27 @@ func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) uint64_t start; uint64_t length; if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, - &start, &length) != 0) { + &start, &length) != 0) { ctx->is_aborted = true; return; } if (length == 0) return mem_set_str_static(ctx->pOut, "", 0); - for (uint64_t i = 0; i < start && err == U_ZERO_ERROR; ++i) - ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); - const char *cur = pos; - for (uint64_t i = 0; i < length && cur < end; ++i) - ucnv_getNextUChar(icu_utf8_conv, &cur, end, &err); - if (err != U_ZERO_ERROR || cur == pos) + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) return mem_set_str_static(ctx->pOut, "", 0); - if (mem_copy_str(ctx->pOut, pos, cur - pos) != 0) + int cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, cur, end, c); + } + assert(cur > pos); + if (mem_copy_str(ctx->pOut, str + pos, cur - pos) != 0) ctx->is_aborted = true; } New patch: commit 56a7a57544cebcbf2992fe4543a8b4a30cdc3f67 Author: Mergen Imeev <imeevma@gmail.com> Date: Mon Sep 27 10:15:14 2021 +0300 sql: rework SUBSTR() function This patch makes SUBSTR() work according to ANSI rules for SUBSTRING() function. Also, SUBSTR() can now work correctly with large INTEGER values. The SUBSTR() syntax has not changed. Part of #4145 @TarantoolBot document Title: SUBSTR() function SUBSTR() now works according to the ANSI rules for SUBSTRING(). Rules for SUBSTR() with 2 arguments: 1) let the first argument be VALUE, and the second argument be START; 2) VALUE should be STRING or VARBINARY, START should be INTEGER; 3) if any of arguments is NULL, NULL is returned; 4) let POS be MAX(START - 1, 0), END be length of the VALUE; 5) if POS >= END, the result is empty string; 6) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. Rules for SUBSTR() with 3 arguments: 1) let the first argument be VALUE, the second argument be START, and the third argument be LENGTH; 2) VALUE should be STRING or VARBINARY, START and LENGTH should be INTEGERs; 3) if any of arguments is NULL, NULL is returned; 4) if LENGTH < 0, an error is thrown; 5) let POS be MAX(START - 1, 0), END be START + LENGTH - 1; 6) if POS >= END, the result is empty string; 7) if POS < END, the result will be substring of VALUE, starting from the position POS to the position END. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 7914d2ec7..7e71a757b 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -573,6 +573,161 @@ func_position_characters(struct sql_context *ctx, int argc, struct Mem *argv) return mem_set_uint(ctx->pOut, 0); } +/** Implementation of the SUBSTR() function. */ +int +substr_normalize(int64_t base_start, bool is_start_neg, uint64_t base_length, + uint64_t *start, uint64_t *length) +{ + if (!is_start_neg && base_start > 0) { + *start = (uint64_t)base_start - 1; + *length = base_length; + return 0; + } + *start = 0; + if (base_length == 0) { + *length = 0; + return 0; + } + /* + * We are subtracting 1 from base_length instead of subtracting from + * base_start, since base_start can be INT64_MIN. At the same time, + * base_length is not less than 1. + */ + int64_t a = base_start; + int64_t b = (int64_t)(base_length - 1); + int64_t res; + bool is_neg; + /* + * Integer cannot overflow since non-positive value is added to positive + * value. + */ + if (sql_add_int(a, a != 0, b, false, &res, &is_neg) != 0) { + diag_set(ClientError, ER_SQL_EXECUTE, "integer is overflowed"); + return -1; + } + *length = is_neg ? 0 : (uint64_t)res; + return 0; +} + +static void +func_substr_octets(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_bytes(&argv[0]) && mem_is_int(&argv[1])); + + bool is_str = mem_is_str(&argv[0]); + uint64_t size = argv[0].n; + + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + if (start >= size) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } + char *s = &argv[0].z[start]; + uint64_t n = size - start; + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, s, n) != 0 : + mem_copy_bin(ctx->pOut, s, n) != 0; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (start >= size || length == 0) { + if (is_str) + return mem_set_str0_static(ctx->pOut, ""); + else + return mem_set_bin_static(ctx->pOut, "", 0); + } + char *str = &argv[0].z[start]; + uint64_t len = MIN(size - start, length); + ctx->is_aborted = is_str ? mem_copy_str(ctx->pOut, str, len) != 0 : + mem_copy_bin(ctx->pOut, str, len) != 0; +} + +static void +func_substr_characters(struct sql_context *ctx, int argc, struct Mem *argv) +{ + assert(argc == 2 || argc == 3); + (void)argc; + if (mem_is_any_null(&argv[0], &argv[1])) + return; + assert(mem_is_str(&argv[0]) && mem_is_int(&argv[1])); + + const char *str = argv[0].z; + int pos = 0; + int end = argv[0].n; + if (argc == 2) { + uint64_t start = mem_is_uint(&argv[1]) && argv[1].u.u > 1 ? + argv[1].u.u - 1 : 0; + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) + return mem_set_str_static(ctx->pOut, "", 0); + if (mem_copy_str(ctx->pOut, str + pos, end - pos) != 0) + ctx->is_aborted = true; + return; + } + + assert(argc == 3); + if (mem_is_null(&argv[2])) + return; + assert(mem_is_int(&argv[2])); + if (!mem_is_uint(&argv[2])) { + diag_set(ClientError, ER_SQL_EXECUTE, "Length of the result " + "cannot be less than 0"); + ctx->is_aborted = true; + return; + } + uint64_t start; + uint64_t length; + if (substr_normalize(argv[1].u.i, !mem_is_uint(&argv[1]), argv[2].u.u, + &start, &length) != 0) { + ctx->is_aborted = true; + return; + } + if (length == 0) + return mem_set_str_static(ctx->pOut, "", 0); + + for (uint64_t i = 0; i < start && pos < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, pos, end, c); + } + if (pos == end) + return mem_set_str_static(ctx->pOut, "", 0); + + int cur = pos; + for (uint64_t i = 0; i < length && cur < end; ++i) { + UChar32 c; + U8_NEXT((uint8_t *)str, cur, end, c); + } + assert(cur > pos); + if (mem_copy_str(ctx->pOut, str + pos, cur - pos) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -770,116 +925,6 @@ printfFunc(struct sql_context *context, int argc, struct Mem *argv) } } -/* - * Implementation of the substr() function. - * - * substr(x,p1,p2) returns p2 characters of x[] beginning with p1. - * p1 is 1-indexed. So substr(x,1,1) returns the first character - * of x. If x is text, then we actually count UTF-8 characters. - * If x is a blob, then we count bytes. - * - * If p1 is negative, then we begin abs(p1) from the end of x[]. - * - * If p2 is negative, return the p2 characters preceding p1. - */ -static void -substrFunc(struct sql_context *context, int argc, struct Mem *argv) -{ - const unsigned char *z; - const unsigned char *z2; - int len; - int p0type; - int64_t p1, p2; - int negP2 = 0; - - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "SUBSTR", - "1 or 2", argc); - context->is_aborted = true; - return; - } - if (mem_is_null(&argv[1]) || (argc == 3 && mem_is_null(&argv[2]))) - return; - p0type = sql_value_type(&argv[0]); - p1 = mem_get_int_unsafe(&argv[1]); - if (p0type == MP_BIN) { - z = mem_as_bin(&argv[0]); - len = mem_len_unsafe(&argv[0]); - if (z == 0) - return; - assert(len == mem_len_unsafe(&argv[0])); - } else { - z = mem_as_ustr(&argv[0]); - if (z == 0) - return; - len = 0; - if (p1 < 0) - len = sql_utf8_char_count(z, mem_len_unsafe(&argv[0])); - } - if (argc == 3) { - p2 = mem_get_int_unsafe(&argv[2]); - if (p2 < 0) { - p2 = -p2; - negP2 = 1; - } - } else { - p2 = sql_context_db_handle(context)-> - aLimit[SQL_LIMIT_LENGTH]; - } - if (p1 < 0) { - p1 += len; - if (p1 < 0) { - p2 += p1; - if (p2 < 0) - p2 = 0; - p1 = 0; - } - } else if (p1 > 0) { - p1--; - } else if (p2 > 0) { - p2--; - } - if (negP2) { - p1 -= p2; - if (p1 < 0) { - p2 += p1; - p1 = 0; - } - } - assert(p1 >= 0 && p2 >= 0); - if (p0type != MP_BIN) { - /* - * In the code below 'cnt' and 'n_chars' is - * used because '\0' is not supposed to be - * end-of-string symbol. - */ - int byte_size = mem_len_unsafe(&argv[0]); - int n_chars = sql_utf8_char_count(z, byte_size); - int cnt = 0; - int i = 0; - while (cnt < n_chars && p1) { - SQL_UTF8_FWD_1(z, i, byte_size); - cnt++; - p1--; - } - z += i; - i = 0; - for (z2 = z; cnt < n_chars && p2; p2--) { - SQL_UTF8_FWD_1(z2, i, byte_size); - cnt++; - } - z2 += i; - mem_copy_str(context->pOut, (char *)z, z2 - z); - } else { - if (p1 + p2 > len) { - p2 = len - p1; - if (p2 < 0) - p2 = 0; - } - mem_copy_bin(context->pOut, (char *)&z[p1], p2); - } -} - /* * Implementation of the round() function */ @@ -1933,15 +1978,15 @@ static struct sql_func_definition definitions[] = { {"SOUNDEX", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_STRING, soundexFunc, NULL}, {"SUBSTR", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, substrFunc, NULL}, + FIELD_TYPE_STRING, func_substr_characters, NULL}, {"SUBSTR", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUBSTR", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, substrFunc, NULL}, + FIELD_TYPE_VARBINARY, func_substr_octets, NULL}, {"SUM", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_INTEGER, step_sum, NULL}, {"SUM", 1, {FIELD_TYPE_DOUBLE}, FIELD_TYPE_DOUBLE, step_sum, NULL}, {"TOTAL", 1, {FIELD_TYPE_INTEGER}, FIELD_TYPE_DOUBLE, step_total, diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 416f27d69..dc4dfdc0e 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -139,7 +139,7 @@ test:do_execsql_test( SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.3> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.3> }) @@ -149,7 +149,7 @@ test:do_execsql_test( SELECT substr(t1,-1,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.4> - "e", "s", "m", "e", "s" + "", "", "", "", "" -- </func-2.4> }) @@ -159,7 +159,7 @@ test:do_execsql_test( SELECT substr(t1,-2,1) FROM tbl1 ORDER BY t1 ]], { -- <func-2.5> - "e", "i", "a", "r", "i" + "", "", "", "", "" -- </func-2.5> }) @@ -169,7 +169,7 @@ test:do_execsql_test( SELECT substr(t1,-2,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.6> - "ee", "is", "am", "re", "is" + "", "", "", "", "" -- </func-2.6> }) @@ -179,7 +179,7 @@ test:do_execsql_test( SELECT substr(t1,-4,2) FROM tbl1 ORDER BY t1 ]], { -- <func-2.7> - "fr", "", "gr", "wa", "th" + "", "", "", "", "" -- </func-2.7> }) @@ -288,7 +288,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-1,1) FROM tbl1 ORDER BY t1 ]], { -- <func-3.8> - "8", "s", "s", "o" + "", "", "", "" -- </func-3.8> }) @@ -298,7 +298,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-3,2) FROM tbl1 ORDER BY t1 ]], { -- <func-3.9> - "F-", "er", "in", "ሴh" + "", "", "", "" -- </func-3.9> }) @@ -308,7 +308,7 @@ if ("ሴ" ~= "u1234") then SELECT substr(t1,-4,3) FROM tbl1 ORDER BY t1 ]], { -- <func-3.10> - "TF-", "ter", "ain", "iሴh" + "", "", "", "" -- </func-3.10> }) diff --git a/test/sql-tap/func2.test.lua b/test/sql-tap/func2.test.lua index 792f020f1..b786b4d96 100755 --- a/test/sql-tap/func2.test.lua +++ b/test/sql-tap/func2.test.lua @@ -162,7 +162,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1) ]], { -- <func2-1.11> - "s" + "Supercalifragilisticexpialidocious" -- </func2-1.11> }) @@ -172,7 +172,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2) ]], { -- <func2-1.12> - "us" + "Supercalifragilisticexpialidocious" -- </func2-1.12> }) @@ -182,7 +182,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30) ]], { -- <func2-1.13> - "rcalifragilisticexpialidocious" + "Supercalifragilisticexpialidocious" -- </func2-1.13> }) @@ -344,7 +344,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 1) ]], { -- <func2-1.25.1> - "s" + "" -- </func2-1.25.1> }) @@ -354,7 +354,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -1, 2) ]], { -- <func2-1.25.2> - "s" + "" -- </func2-1.25.2> }) @@ -364,7 +364,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -2, 1) ]], { -- <func2-1.26> - "u" + "" -- </func2-1.26> }) @@ -374,7 +374,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -30, 1) ]], { -- <func2-1.27> - "r" + "" -- </func2-1.27> }) @@ -394,7 +394,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 1) ]], { -- <func2-1.28.1> - "S" + "" -- </func2-1.28.1> }) @@ -404,7 +404,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -34, 2) ]], { -- <func2-1.28.2> - "Su" + "" -- </func2-1.28.2> }) @@ -424,7 +424,7 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -35, 2) ]], { -- <func2-1.29.2> - "S" + "" -- </func2-1.29.2> }) @@ -464,11 +464,13 @@ test:do_execsql_test( SELECT SUBSTR('Supercalifragilisticexpialidocious', -36, 3) ]], { -- <func2-1.30.3> - "S" + "" -- </func2-1.30.3> }) -- p1 is 1-indexed, p2 length to return, p2<0 return p2 chars before p1 +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_execsql_test( "func2-1.31.0", [[ @@ -479,23 +481,23 @@ test:do_execsql_test( -- </func2-1.31.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -1) ]], { -- <func2-1.31.1> - "" + 1, err -- </func2-1.31.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.31.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 0, -2) ]], { -- <func2-1.31.2> - "" + 1, err -- </func2-1.31.2> }) @@ -509,13 +511,13 @@ test:do_execsql_test( -- </func2-1.32.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.32.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 1, -1) ]], { -- <func2-1.32.1> - "" + 1, err -- </func2-1.32.1> }) @@ -529,23 +531,23 @@ test:do_execsql_test( -- </func2-1.33.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -1) ]], { -- <func2-1.33.1> - "S" + 1, err -- </func2-1.33.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.33.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 2, -2) ]], { -- <func2-1.33.2> - "S" + 1, err -- </func2-1.33.2> }) @@ -559,63 +561,63 @@ test:do_execsql_test( -- </func2-1.34.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -1) ]], { -- <func2-1.34.1> - "u" + 1, err -- </func2-1.34.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.34.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 3, -2) ]], { -- <func2-1.34.2> - "Su" + 1, err -- </func2-1.34.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -1) ]], { -- <func2-1.35.1> - "o" + 1, err -- </func2-1.35.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.35.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 30, -2) ]], { -- <func2-1.35.2> - "do" + 1, err -- </func2-1.35.2> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.36", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 34, -1) ]], { -- <func2-1.36> - "u" + 1, err -- </func2-1.36> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.37", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 35, -1) ]], { -- <func2-1.37> - "s" + 1, err -- </func2-1.37> }) @@ -629,23 +631,23 @@ test:do_execsql_test( -- </func2-1.38.0> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.1", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -1) ]], { -- <func2-1.38.1> - "" + 1, err -- </func2-1.38.1> }) -test:do_execsql_test( +test:do_catchsql_test( "func2-1.38.2", [[ SELECT SUBSTR('Supercalifragilisticexpialidocious', 36, -2) ]], { -- <func2-1.38.2> - "s" + 1, err -- </func2-1.38.2> }) @@ -993,7 +995,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 1) ]], { -- <func2-2.8.1> - "ሴ" + "" -- </func2-2.8.1> }) @@ -1003,7 +1005,7 @@ if ("ሴ" ~= "u1234") SELECT SUBSTR('ሴ', -1, 2) ]], { -- <func2-2.8.2> - "ሴ" + "" -- </func2-2.8.2> }) @@ -1130,21 +1132,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "") test:do_test( "func2-3.4.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -1, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "34") + end, "12") test:do_test( "func2-3.5.0", @@ -1158,21 +1160,21 @@ test:do_test( function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 1)" return bin_to_hex(test.lindex(blob, 0)) - end, "12") + end, "") test:do_test( "func2-3.5.2", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 2)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.5.3", function() local blob = test:execsql "SELECT SUBSTR(x'1234', -2, 3)" return bin_to_hex(test.lindex(blob, 0)) - end, "1234") + end, "") test:do_test( "func2-3.6.0", @@ -1181,26 +1183,28 @@ test:do_test( return bin_to_hex(test.lindex(blob, 0)) end, "") +local err = [[Failed to execute SQL statement: Length of the result cannot ]].. + [[be less than 0]] test:do_test( "func2-3.6.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -1)") + end, + {1, err}) test:do_test( "func2-3.6.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -2)") + end, + {1, err}) test:do_test( "func2-3.6.3", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -1, -3)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', -1, -3)") + end, + {1, err}) test:do_test( "func2-3.7.0", @@ -1212,16 +1216,16 @@ test:do_test( test:do_test( "func2-3.7.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -1)") + end, + {1, err}) test:do_test( "func2-3.7.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', -2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', -2, -2)") + end, + {1, err}) test:do_test( "func2-3.8.0", @@ -1233,16 +1237,16 @@ test:do_test( test:do_test( "func2-3.8.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -1)") + end, + {1, err}) test:do_test( "func2-3.8.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 1, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "") + return test:catchsql("SELECT SUBSTR(x'1234', 1, -2)") + end, + {1, err}) test:do_test( "func2-3.9.0", @@ -1254,17 +1258,15 @@ test:do_test( test:do_test( "func2-3.9.1", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -1)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:do_test( "func2-3.9.2", function() - local blob = test:execsql "SELECT SUBSTR(x'1234', 2, -2)" - return bin_to_hex(test.lindex(blob, 0)) - end, "12") - - + return test:catchsql("SELECT SUBSTR(x'1234', 2, -2)") + end, + {1, err}) test:finish_test() diff --git a/test/sql-tap/substr.test.lua b/test/sql-tap/substr.test.lua index e7e6d7aca..45aae8506 100755 --- a/test/sql-tap/substr.test.lua +++ b/test/sql-tap/substr.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(93) +test:plan(85) --!./tcltestrunner.lua -- 2007 May 14 @@ -74,17 +74,11 @@ substr_test("1.2", "abcdefg","2","1","b") substr_test("1.3", "abcdefg","1","2","ab") substr_test("1.4", "abcdefg","1","100","abcdefg") substr_test("1.5", "abcdefg","0","2","a") -substr_test("1.6", "abcdefg","-1","1","g") -substr_test("1.7", "abcdefg","-1","10","g") -substr_test("1.8", "abcdefg","-5","3","cde") -substr_test("1.9", "abcdefg","-7","3","abc") -substr_test("1.10", "abcdefg","-100","98","abcde") -substr_test("1.11", "abcdefg","5","-1","d") -substr_test("1.12", "abcdefg","5","-4","abcd") -substr_test("1.13", "abcdefg","5","-5","abcd") -substr_test("1.14", "abcdefg","-5","-1","b") -substr_test("1.15", "abcdefg","-5","-2","ab") -substr_test("1.16", "abcdefg","-5","-3","ab") +substr_test("1.6", "abcdefg","-1","1","") +substr_test("1.7", "abcdefg","-1","10","abcdefg") +substr_test("1.8", "abcdefg","-5","3","") +substr_test("1.9", "abcdefg","-7","3","") +substr_test("1.10", "abcdefg","-100","98","") substr_test("1.17", "abcdefg","100","200","") substr_test("1.18", "abcdefg","200","100","") -- Make sure NULL is returned if any parameter is NULL @@ -144,21 +138,20 @@ test:do_test( substr_test("2.1", "ሴ⍅㑖","1","1","ሴ") substr_test("2.2", "ሴ⍅㑖","2","1","⍅") substr_test("2.3", "ሴ⍅㑖","1","2","ሴ⍅") -substr_test("2.4", "ሴ⍅㑖","-1","1","㑖") -substr_test("2.5", "aሴb⍅c㑖c","-5","3","b⍅c") -substr_test("2.6", "aሴb⍅c㑖c","-2","-3","b⍅c") +substr_test("2.4", "ሴ⍅㑖","-1","1","") +substr_test("2.5", "aሴb⍅c㑖c","-5","3","") -- Basic functionality for BLOBs -- subblob_test("3.1", "61626364656667","1","1","61") subblob_test("3.2", "61626364656667","2","1","62") subblob_test("3.3", "61626364656667","1","2","6162") subblob_test("3.4", "61626364656667","1","100","61626364656667") -subblob_test("3.5", "61626364656667","0","2","61") -subblob_test("3.6", "61626364656667","-1","1","67") -subblob_test("3.7", "61626364656667","-1","10","67") -subblob_test("3.8", "61626364656667","-5","3","636465") -subblob_test("3.9", "61626364656667","-7","3","616263") -subblob_test("3.10", "61626364656667","-100","98","6162636465") +subblob_test("3.5", "61626364656667", "0", "2", "61") +subblob_test("3.6", "61626364656667", "-1", "1", "") +subblob_test("3.7", "61626364656667", "-1", "10", "61626364656667") +subblob_test("3.8", "61626364656667", "-5", "3", "") +subblob_test("3.9", "61626364656667","-7","3", "") +subblob_test("3.10", "61626364656667", "-100", "98", "") subblob_test("3.11", "61626364656667","100","200","") subblob_test("3.12", "61626364656667","200","100","") -- If these blobs were strings, then they would contain multi-byte @@ -168,9 +161,9 @@ subblob_test("3.12", "61626364656667","200","100","") subblob_test("4.1", "61E188B462E28D8563E3919663","1","1","61") subblob_test("4.2", "61E188B462E28D8563E3919663","2","1","E1") subblob_test("4.3", "61E188B462E28D8563E3919663","1","2","61E1") -subblob_test("4.4", "61E188B462E28D8563E3919663","-2","1","96") -subblob_test("4.5", "61E188B462E28D8563E3919663","-5","4","63E39196") -subblob_test("4.6", "61E188B462E28D8563E3919663","-100","98","61E188B462E28D8563E391") +subblob_test("4.4", "61E188B462E28D8563E3919663", "-2", "1", "") +subblob_test("4.5", "61E188B462E28D8563E3919663", "-5", "4", "") +subblob_test("4.6", "61E188B462E28D8563E3919663", "-100", "98", "") -- Two-argument SUBSTR -- local function substr_2_test(id, string, idx, result) @@ -193,7 +186,85 @@ local function substr_2_test(id, string, idx, result) end substr_2_test("5.1","abcdefghijklmnop","5","efghijklmnop") -substr_2_test("5.2","abcdef","-5","bcdef") +substr_2_test("5.2","abcdef","-5","abcdef") +-- +-- gh-4145: Make sure SUBSTR() throws an error if the third argument is +-- negative. +-- +test:do_catchsql_test( + "substr-6.1", + [[ + SELECT SUBSTR('12345', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +test:do_catchsql_test( + "substr-6.2", + [[ + SELECT SUBSTR(x'3132333435', 1, -1); + ]], + { + 1, [[Failed to execute SQL statement: ]].. + [[Length of the result cannot be less than 0]] + } +) + +-- gh-4145: Make sure that SUBSTR() works according to ANSI. + +-- +-- Make sure SUBSTR() returns "" if the sum of the second and third arguments is +-- 1 or less. +-- +test:do_execsql_test( + "builtins-6.3", + [[ + SELECT SUBSTR('asdfg', -10, 5), SUBSTR('asdfg', -4, 5); + ]], + { + '', '' + } +) + +-- +-- Make sure that if the sum of the second and third arguments is more than 1 +-- and the second argument is negative, the result starts from the start of the +-- string and length of the result will be one less than sum of the second and +-- third arguments. +-- +test:do_execsql_test( + "builtins-6.4", + [[ + SELECT SUBSTR('123456789', -5, 10); + ]], + { + '1234' + } +) + +-- Make sure SUBSTR() can work with big INTEGERs. +test:do_execsql_test( + "builtins-6.5", + [[ + SELECT SUBSTR('123456789', -9223372036854775808, 9223372036854775812); + ]], + { + '123' + } +) + +test:do_execsql_test( + "builtins-6.6", + [[ + SELECT SUBSTR('123456789', 0, 18000000000000000000); + ]], + { + '123456789' + } +) test:finish_test() ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function 2021-11-01 10:48 ` Mergen Imeev via Tarantool-patches @ 2021-11-01 10:53 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-01 10:53 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches I am sorry, I uncluded wrong diff and patch. Diff and patch below. On Mon, Nov 01, 2021 at 01:48:51PM +0300, Mergen Imeev via Tarantool-patches wrote: > Thank you for the review! I replaced ucnv_getNextUChar() by U8_NEXT(). Diff and > new patch below. > > On Wed, Oct 20, 2021 at 08:19:38PM +0300, Mergen Imeev via Tarantool-patches wrote: > <cut> > Diff: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index b055c7515..868d51145 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1301,11 +1301,11 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) * single UTF-8 character. Otherwise, return an * error. */ - UErrorCode err = U_ZERO_ERROR; - const char *pos = argv[2].z; - const char *end = argv[2].z + argv[2].n; - escape = ucnv_getNextUChar(icu_utf8_conv, &pos, end, &err); - if (pos != end || err != U_ZERO_ERROR) { + const char *str = argv[2].z; + int pos = 0; + int end = argv[2].n; + U8_NEXT((uint8_t *)str, pos, end, escape); + if (pos != end || end == 0) { diag_set(ClientError, ER_SQL_EXECUTE, "ESCAPE "\ "expression must be a single character"); context->is_aborted = true; Patch: commit 43d41680d9198a098f866ba7669739221952f446 Author: Mergen Imeev <imeevma@gmail.com> Date: Mon Sep 27 19:29:28 2021 +0300 sql: refactor LIKE() function Part of #4145 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 7e71a757b..868d51145 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1271,63 +1271,46 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) { u32 escape = SQL_END_OF_STRING; int nPat; - if (argc != 2 && argc != 3) { - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, - "LIKE", "2 or 3", argc); - context->is_aborted = true; - return; - } - sql *db = sql_context_db_handle(context); - int rhs_type = sql_value_type(&argv[0]); - int lhs_type = sql_value_type(&argv[1]); - - if (lhs_type != MP_STR || rhs_type != MP_STR) { - if (lhs_type == MP_NIL || rhs_type == MP_NIL) - return; - const char *str = rhs_type != MP_STR ? - mem_str(&argv[0]) : mem_str(&argv[1]); - diag_set(ClientError, ER_INCONSISTENT_TYPES, "string", str); - context->is_aborted = true; + assert(argc == 2 || argc == 3); + if (mem_is_any_null(&argv[0], &argv[1])) return; - } - const char *zB = mem_as_str0(&argv[0]); - const char *zA = mem_as_str0(&argv[1]); - const char *zB_end = zB + mem_len_unsafe(&argv[0]); - const char *zA_end = zA + mem_len_unsafe(&argv[1]); + assert(mem_is_str(&argv[0]) && mem_is_str(&argv[1])); + const char *zB = argv[0].z; + const char *zA = argv[1].z; + const char *zB_end = zB + argv[0].n; + const char *zA_end = zA + argv[1].n; /* * Limit the length of the LIKE pattern to avoid problems * of deep recursion and N*N behavior in * sql_utf8_pattern_compare(). */ - nPat = mem_len_unsafe(&argv[0]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]); - testcase(nPat == db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH] + 1); - if (nPat > db->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { + nPat = argv[0].n; + if (nPat > sql_get()->aLimit[SQL_LIMIT_LIKE_PATTERN_LENGTH]) { diag_set(ClientError, ER_SQL_EXECUTE, "LIKE pattern is too "\ "complex"); context->is_aborted = true; return; } - /* Encoding did not change */ - assert(zB == mem_as_str0(&argv[0])); if (argc == 3) { + if (mem_is_null(&argv[2])) + return; /* * The escape character string must consist of a * single UTF-8 character. Otherwise, return an * error. */ - const unsigned char *zEsc = mem_as_ustr(&argv[2]); - if (zEsc == 0) - return; - if (sql_utf8_char_count(zEsc, mem_len_unsafe(&argv[2])) != 1) { + const char *str = argv[2].z; + int pos = 0; + int end = argv[2].n; + U8_NEXT((uint8_t *)str, pos, end, escape); + if (pos != end || end == 0) { diag_set(ClientError, ER_SQL_EXECUTE, "ESCAPE "\ "expression must be a single character"); context->is_aborted = true; return; } - escape = sqlUtf8Read(&zEsc); } if (!zA || !zB) return; @@ -1342,7 +1325,7 @@ likeFunc(sql_context *context, int argc, struct Mem *argv) context->is_aborted = true; return; } - sql_result_bool(context, res == MATCH); + mem_set_bool(context->pOut, res == MATCH); } /** ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 0/8] Rework standard function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (7 preceding siblings ...) 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function Mergen Imeev via Tarantool-patches @ 2021-10-04 13:32 ` Mergen Imeev via Tarantool-patches 2021-11-01 13:38 ` Vladislav Shpilevoy via Tarantool-patches 9 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-10-04 13:32 UTC (permalink / raw) To: v.shpilevoy, tarantool-patches Hi! I'm sorry, I forgot to add links to the issue and the branch. Here they are: https://github.com/tarantool/tarantool/issues/4145 https://github.com/tarantool/tarantool/tree/imeevma/gh-4145-rework-sql-built-in-funcs On Fri, Oct 01, 2021 at 07:29:23PM +0300, Mergen Imeev via Tarantool-patches wrote: > This patch refactors all non-aggregate standard functions and makes these > functions work according to ANSI rules, if it is possible. However, the ANSI > specific syntax for these functions will be presented later. > > Mergen Imeev (8): > sql: refactor ABS() funcion > sql: refactor CHAR_LENGTH() function > sql: refactor UPPER() and LOWER() functions > sql: refactor NULLIF() function > sql: rework TRIM() function > sql: rework POSITION() function > sql: rework SUBSTR() function > sql: refactor LIKE() function > > src/box/sql/func.c | 1137 +++++++++++++++----------------- > test/sql-tap/badutf1.test.lua | 41 +- > test/sql-tap/func.test.lua | 16 +- > test/sql-tap/func2.test.lua | 150 ++--- > test/sql-tap/position.test.lua | 80 ++- > test/sql-tap/substr.test.lua | 121 +++- > 6 files changed, 786 insertions(+), 759 deletions(-) > > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH v1 0/8] Rework standard function 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches ` (8 preceding siblings ...) 2021-10-04 13:32 ` [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches @ 2021-11-01 13:38 ` Vladislav Shpilevoy via Tarantool-patches 9 siblings, 0 replies; 36+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-11-01 13:38 UTC (permalink / raw) To: imeevma; +Cc: tarantool-patches Hi! Thanks for the fixes! LGTM. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 0/8] Rework standard function @ 2021-11-11 10:45 Mergen Imeev via Tarantool-patches 2021-11-11 10:45 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches 0 siblings, 1 reply; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-11 10:45 UTC (permalink / raw) To: kyukhin; +Cc: tarantool-patches This patch refactors all non-aggregate standard functions and makes these functions work according to ANSI rules, if it is possible. However, the ANSI specific syntax for these functions will be presented later. https://github.com/tarantool/tarantool/issues/4145 https://github.com/tarantool/tarantool/tree/imeevma/gh-4145-rework-sql-built-in-funcs Mergen Imeev (8): sql: refactor ABS() funcion sql: rework CHAR_LENGTH() function sql: refactor UPPER() and LOWER() functions sql: refactor NULLIF() function sql: rework TRIM() function sql: rework POSITION() function sql: rework SUBSTR() function sql: refactor LIKE() function src/box/sql/func.c | 1143 +++++++++++++++----------------- test/sql-tap/badutf1.test.lua | 81 +-- test/sql-tap/func.test.lua | 16 +- test/sql-tap/func2.test.lua | 150 ++--- test/sql-tap/func3.test.lua | 10 +- test/sql-tap/position.test.lua | 80 ++- test/sql-tap/substr.test.lua | 121 +++- 7 files changed, 791 insertions(+), 810 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function 2021-11-11 10:45 Mergen Imeev via Tarantool-patches @ 2021-11-11 10:45 ` Mergen Imeev via Tarantool-patches 0 siblings, 0 replies; 36+ messages in thread From: Mergen Imeev via Tarantool-patches @ 2021-11-11 10:45 UTC (permalink / raw) To: kyukhin; +Cc: tarantool-patches This patch refactoring TRIM() and fixes an issue with incorrect trimming of some VARBINARY values. Also, TRIM() now use ICU functions instead of self-created. Part of #4415 --- src/box/sql/func.c | 361 ++++++++++++++++------------------ test/sql-tap/badutf1.test.lua | 41 ++-- 2 files changed, 198 insertions(+), 204 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 7d54a39cd..ba6b9246d 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -343,6 +343,172 @@ func_nullif(struct sql_context *ctx, int argc, struct Mem *argv) ctx->is_aborted = true; } +/** Implementation of the TRIM() function. */ +static inline int +trim_bin_end(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + char c = str[end - 1]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + --end; + } + return end; +} + +static inline int +trim_bin_start(const char *str, int end, const char *octets, int octets_size, + int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + char c = str[start]; + for (int i = 0; i < octets_size && !is_trimmed; ++i) + is_trimmed = c == octets[i]; + if (!is_trimmed) + break; + ++start; + } + return start; +} + +static void +func_trim_bin(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_bin(&argv[2]))); + assert(mem_is_bin(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *octets; + int octets_size; + if (argc == 3) { + octets = argv[2].z; + octets_size = argv[2].n; + } else { + octets = "\0"; + octets_size = 1; + } + + int flags = argv[1].u.u; + int end = trim_bin_end(str, size, octets, octets_size, flags); + int start = trim_bin_start(str, end, octets, octets_size, flags); + + if (start >= end) + return mem_set_bin_static(ctx->pOut, "", 0); + if (mem_copy_bin(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + +static inline int +trim_str_end(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_TRAILING) == 0) + return end; + while (end > 0) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + end - len; + is_trimmed = len <= end && memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + end -= len; + } + return end; +} + +static inline int +trim_str_start(const char *str, int end, const char *chars, uint8_t *chars_len, + int chars_count, int flags) +{ + if ((flags & TRIM_LEADING) == 0) + return 0; + int start = 0; + while (start < end) { + bool is_trimmed = false; + const char *c = chars; + int len; + for (int i = 0; i < chars_count && !is_trimmed; ++i) { + len = chars_len[i]; + const char *s = str + start; + is_trimmed = start + len <= end && + memcmp(c, s, len) == 0; + c += len; + } + if (!is_trimmed) + break; + assert(len > 0); + start += len; + } + return start; +} + +static void +func_trim_str(struct sql_context *ctx, int argc, struct Mem *argv) +{ + if (mem_is_null(&argv[0]) || (argc == 3 && mem_is_null(&argv[2]))) + return; + assert(argc == 2 || (argc == 3 && mem_is_str(&argv[2]))); + assert(mem_is_str(&argv[0]) && mem_is_uint(&argv[1])); + const char *str = argv[0].z; + int size = argv[0].n; + const char *chars; + int chars_size; + if (argc == 3) { + chars = argv[2].z; + chars_size = argv[2].n; + } else { + chars = " "; + chars_size = 1; + } + + struct region *region = &fiber()->gc; + size_t svp = region_used(region); + uint8_t *chars_len = region_alloc(region, chars_size); + if (chars_len == NULL) { + ctx->is_aborted = true; + diag_set(OutOfMemory, chars_size, "region_alloc", "chars_len"); + return; + } + int chars_count = 0; + + int offset = 0; + while (offset < chars_size) { + UChar32 c; + int prev = offset; + U8_NEXT((uint8_t *)chars, offset, chars_size, c); + chars_len[chars_count++] = offset - prev; + } + + uint64_t flags = argv[1].u.u; + int end = trim_str_end(str, size, chars, chars_len, chars_count, flags); + int start = trim_str_start(str, end, chars, chars_len, chars_count, + flags); + region_truncate(region, svp); + + if (start >= end) + return mem_set_str0_static(ctx->pOut, ""); + if (mem_copy_str(ctx->pOut, &str[start], end - start) != 0) + ctx->is_aborted = true; +} + static const unsigned char * mem_as_ustr(struct Mem *mem) { @@ -1527,193 +1693,6 @@ replaceFunc(struct sql_context *context, int argc, struct Mem *argv) mem_set_bin_dynamic(context->pOut, (char *)zOut, j); } -/** - * Remove characters included in @a trim_set from @a input_str - * until encounter a character that doesn't belong to @a trim_set. - * Remove from the side specified by @a flags. - * @param context SQL context. - * @param flags Trim specification: left, right or both. - * @param trim_set The set of characters for trimming. - * @param char_len Lengths of each UTF-8 character in @a trim_set. - * @param char_cnt A number of UTF-8 characters in @a trim_set. - * @param input_str Input string for trimming. - * @param input_str_sz Input string size in bytes. - */ -static void -trim_procedure(struct sql_context *context, enum trim_side_mask flags, - const unsigned char *trim_set, const uint8_t *char_len, - int char_cnt, const unsigned char *input_str, int input_str_sz) -{ - if (char_cnt == 0) - goto finish; - int i, len; - const unsigned char *z; - if ((flags & TRIM_LEADING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(input_str, z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str += len; - input_str_sz -= len; - } - } - if ((flags & TRIM_TRAILING) != 0) { - while (input_str_sz > 0) { - z = trim_set; - for (i = 0; i < char_cnt; ++i, z += len) { - len = char_len[i]; - if (len <= input_str_sz - && memcmp(&input_str[input_str_sz - len], - z, len) == 0) - break; - } - if (i >= char_cnt) - break; - input_str_sz -= len; - } - } -finish: - if (context->func->def->returns == FIELD_TYPE_STRING) - mem_copy_str(context->pOut, (char *)input_str, input_str_sz); - else - mem_copy_bin(context->pOut, (char *)input_str, input_str_sz); -} - -/** - * Prepare arguments for trimming procedure. Allocate memory for - * @a char_len (array of lengths each character in @a trim_set) - * and fill it. - * - * @param context SQL context. - * @param trim_set The set of characters for trimming. - * @param[out] char_len Lengths of each character in @ trim_set. - * @retval >=0 A number of UTF-8 characters in @a trim_set. - * @retval -1 Memory allocation error. - */ -static int -trim_prepare_char_len(struct sql_context *context, - const unsigned char *trim_set, int trim_set_sz, - uint8_t **char_len) -{ - /* - * Count the number of UTF-8 characters passing through - * the entire char set, but not up to the '\0' or X'00' - * character. This allows to handle trimming set - * containing such characters. - */ - int char_cnt = sql_utf8_char_count(trim_set, trim_set_sz); - if (char_cnt == 0) { - *char_len = NULL; - return 0; - } - - if ((*char_len = (uint8_t *)contextMalloc(context, char_cnt)) == NULL) - return -1; - - int i = 0, j = 0; - while(j < char_cnt) { - int old_i = i; - SQL_UTF8_FWD_1(trim_set, i, trim_set_sz); - (*char_len)[j++] = i - old_i; - } - - return char_cnt; -} - -/** - * Normalize args from @a argv input array when it has two args. - * - * Case: TRIM(<str>) - * Call trimming procedure with TRIM_BOTH as the flags and " " as - * the trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>) - * If user has specified side keyword only, then call trimming - * procedure with the specified side and " " as the trimming set. - */ -static void -trim_func_two_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2) -{ - const unsigned char *trim_set; - if (mem_is_bin(arg1)) - trim_set = (const unsigned char *)"\0"; - else - trim_set = (const unsigned char *)" "; - const unsigned char *input_str; - if ((input_str = mem_as_ustr(arg1)) == NULL) - return; - - int input_str_sz = mem_len_unsafe(arg1); - assert(arg2->type == MEM_TYPE_UINT); - uint8_t len_one = 1; - trim_procedure(context, arg2->u.u, trim_set, - &len_one, 1, input_str, input_str_sz); -} - -/** - * Normalize args from @a argv input array when it has three args. - * - * Case: TRIM(<character_set> FROM <str>) - * If user has specified <character_set> only, call trimming procedure with - * TRIM_BOTH as the flags and that trimming set. - * - * Case: TRIM(LEADING/TRAILING/BOTH <character_set> FROM <str>) - * If user has specified side keyword and <character_set>, then - * call trimming procedure with that args. - */ -static void -trim_func_three_args(struct sql_context *context, sql_value *arg1, - sql_value *arg2, sql_value *arg3) -{ - assert(arg2->type == MEM_TYPE_UINT); - const unsigned char *input_str, *trim_set; - if ((input_str = mem_as_ustr(arg1)) == NULL || - (trim_set = mem_as_ustr(arg3)) == NULL) - return; - - int trim_set_sz = mem_len_unsafe(arg3); - int input_str_sz = mem_len_unsafe(arg1); - uint8_t *char_len; - int char_cnt = trim_prepare_char_len(context, trim_set, trim_set_sz, - &char_len); - if (char_cnt == -1) - return; - trim_procedure(context, arg2->u.u, trim_set, char_len, - char_cnt, input_str, input_str_sz); - sql_free(char_len); -} - -/** - * Normalize args from @a argv input array when it has one, - * two or three args. - * - * This is a dispatcher function that calls corresponding - * implementation depending on the number of arguments. -*/ -static void -trim_func(struct sql_context *context, int argc, struct Mem *argv) -{ - switch (argc) { - case 2: - trim_func_two_args(context, &argv[0], &argv[1]); - break; - case 3: - trim_func_three_args(context, &argv[0], &argv[1], &argv[2]); - break; - default: - diag_set(ClientError, ER_FUNC_WRONG_ARG_COUNT, "TRIM", - "2 or 3", argc); - context->is_aborted = true; - } -} - /* * Compute the soundex encoding of a word. * @@ -2040,14 +2019,14 @@ static struct sql_func_definition definitions[] = { fin_total}, {"TRIM", 2, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 3, {FIELD_TYPE_STRING, FIELD_TYPE_INTEGER, FIELD_TYPE_STRING}, - FIELD_TYPE_STRING, trim_func, NULL}, + FIELD_TYPE_STRING, func_trim_str, NULL}, {"TRIM", 2, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TRIM", 3, {FIELD_TYPE_VARBINARY, FIELD_TYPE_INTEGER, FIELD_TYPE_VARBINARY}, - FIELD_TYPE_VARBINARY, trim_func, NULL}, + FIELD_TYPE_VARBINARY, func_trim_bin, NULL}, {"TYPEOF", 1, {FIELD_TYPE_ANY}, FIELD_TYPE_STRING, typeofFunc, NULL}, {"UNICODE", 1, {FIELD_TYPE_STRING}, FIELD_TYPE_INTEGER, unicodeFunc, diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index ce8354840..d1e17ca3e 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool local test = require("sqltester") -test:plan(19) +test:plan(20) --!./tcltestrunner.lua -- 2007 May 15 @@ -296,47 +296,62 @@ test:do_test( test:do_test( "badutf-4.4", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'808080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'808080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.4> - "X", "808080F0808080FF" + "X", "F0" -- </badutf-4.4> }) test:do_test( "badutf-4.5", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff8080f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff8080f0808080ff')) AS x; + ]]) end, { -- <badutf-4.5> - "X", "80F0808080FF" + "X", "F0" -- </badutf-4.5> }) test:do_test( "badutf-4.6", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff80' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff80' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.6> - "X", "F0808080FF" + "X", "F0" -- </badutf-4.6> }) test:do_test( "badutf-4.7", function() - return test:execsql2([[SELECT hex(CAST(TRIM(x'ff8080' FROM ]].. - [[x'ff80f0808080ff') AS VARBINARY)) AS x]]) + return test:execsql2([[ + SELECT hex(TRIM(x'ff8080' FROM x'ff80f0808080ff')) AS x; + ]]) end, { -- <badutf-4.7> - "X", "FF80F0808080FF" + "X", "F0" -- </badutf-4.7> }) +-- gh-4145: Make sure that TRIM() properly work with VARBINARY. +test:do_execsql_test( + "badutf-5", + [[ + SELECT HEX(TRIM(x'ff1234' from x'1234125678123412')); + ]], + { + '5678' + } +) + --db2("close") -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2021-11-11 10:47 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-01 16:29 [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 1/8] sql: refactor ABS() funcion Mergen Imeev via Tarantool-patches 2021-10-08 21:55 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 16:52 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:11 ` Mergen Imeev via Tarantool-patches 2021-11-01 13:37 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 2/8] sql: refactor CHAR_LENGTH() function Mergen Imeev via Tarantool-patches 2021-10-08 21:56 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 16:58 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:11 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:20 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 3/8] sql: refactor UPPER() and LOWER() functions Mergen Imeev via Tarantool-patches 2021-10-20 17:02 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 4/8] sql: refactor NULLIF() function Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches 2021-10-20 17:05 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:12 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:35 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 6/8] sql: rework POSITION() function Mergen Imeev via Tarantool-patches 2021-10-08 21:58 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:08 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:41 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 7/8] sql: rework SUBSTR() function Mergen Imeev via Tarantool-patches 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:15 ` Mergen Imeev via Tarantool-patches 2021-10-28 22:13 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-01 10:45 ` Mergen Imeev via Tarantool-patches 2021-10-01 16:29 ` [Tarantool-patches] [PATCH v1 8/8] sql: refactor LIKE() function Mergen Imeev via Tarantool-patches 2021-10-08 22:02 ` Vladislav Shpilevoy via Tarantool-patches 2021-10-20 17:19 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:48 ` Mergen Imeev via Tarantool-patches 2021-11-01 10:53 ` Mergen Imeev via Tarantool-patches 2021-10-04 13:32 ` [Tarantool-patches] [PATCH v1 0/8] Rework standard function Mergen Imeev via Tarantool-patches 2021-11-01 13:38 ` Vladislav Shpilevoy via Tarantool-patches 2021-11-11 10:45 Mergen Imeev via Tarantool-patches 2021-11-11 10:45 ` [Tarantool-patches] [PATCH v1 5/8] sql: rework TRIM() function Mergen Imeev via Tarantool-patches
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox