From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 70646245C9 for ; Tue, 12 Mar 2019 09:31:33 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5RmhnbCmiGeK for ; Tue, 12 Mar 2019 09:31:33 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 096DC23EB6 for ; Tue, 12 Mar 2019 09:31:32 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: [tarantool-patches] Re: [PATCH] sql: replace instr() with position() From: "n.pettik" In-Reply-To: <20190310074112.3260-1-ivan.koptelov@tarantool.org> Date: Tue, 12 Mar 2019 16:31:30 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20190310074112.3260-1-ivan.koptelov@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Ivan Koptelov > On 10 Mar 2019, at 10:41, Ivan Koptelov = wrote: >=20 > Before this patch we had instr() SQL function. > After the patch it is renamed to position(). > Also a few things have been changed: arguments > order, allowed arguments types and usage of > collations. >=20 > The order of the arguments has changed: > Before: instr(haystack, needle). > After: position(needle, haystack). What are reasons to change arguments' order? I guess to make it closer to ANSI syntax . > Type checking became more strict: before it was > possible to call the function with INTEGER arguments > or with arguments of different types. Now both arguments > must have the same type and be either text or binary > strings. Are there any reasons to support BLOBs as arguments? For instance, we=E2=80=99ve banned this opportunity for LIKE func. > Before the patch collations were not taken into > consideration during the search. Now it is fixed, and > both implicit (column) collations and explicit > (using COLLATE expression) are used. There are > several possible cases: >=20 > - One of collations is mentioned alongside with explicit > COLLATE clause, which forces this collation over another > one. It is allowed to have the same forced collations; > - Both collations are derived from table columns and they > are the same; > - One collation is derived from table column and another > one is not specified (i.e. COLL_NONE); This is regulated by ANSI =E2=80=9CType combination=E2=80=9D rules. Skip this paragraph and simply mention this fact. > In all other cases they are not accounted to be compatible > and error is raised. >=20 > Related to #3933 I would use =E2=80=9CWorkaround for=E2=80=9D label. Also, please underline the fact that syntax is still different from ANSI one and describe reasons for that. >=20 > @TarantoolBot document > Title: instr() is replaced with position() > Changes are described in the associated commit > message. AFAIK we do this vice versa: move vital parts of commit message under doc bot tag request. > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c > index a75f23756..59b43ec41 100644 > --- a/src/box/sql/expr.c > +++ b/src/box/sql/expr.c > @@ -4054,7 +4054,8 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, = int target) > constMask |=3D MASKBIT32(i); > } > if ((pDef->funcFlags & = SQL_FUNC_NEEDCOLL) !=3D > - 0 && coll =3D=3D NULL) { > + 0 && coll =3D=3D NULL && > + !(pDef->funcFlags & = SQL_FUNC_ARG_COLL)) { Why these conditions can=E2=80=99t be compatible? What is the difference between SQL_FUNC_NEEDCOLL and new flag? Does NEEDCOLL refer to returning value? Please, comment this part. > bool unused; > uint32_t id; > if (sql_expr_coll(pParse, > @@ -4064,6 +4065,37 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, = int target) > return 0; > } > } > + if (pDef->funcFlags & SQL_FUNC_ARG_COLL) { > + assert(nFarg =3D=3D 2); Add comment to this assertion or better to the whole branch. It is not obvious at first sight. Why only functions with two args=E2=80=A6= For instance, there=E2=80=99s MAX function, which may take any number of arguments. Collations are also vital for this function, since they are used to compare strings. What is more, I suggest to move this part in a separate patch, since position is not the only function that must apply collations on its operands. I don=E2=80=99t insist on dozens of tests on each = built-in function, just a few. > + struct coll *colls[2] =3D {NULL}; I=E2=80=99d rather use explicit naming: struct coll *lhs_coll =3D NULL; struct coll *rhs_coll =3D NULL; It makes code a bit longer, but on the other hand IMHO makes it less prone to typos. > + uint32_t colls_ids[2] =3D {0}; > + bool is_forced[2] =3D {false}; > + if (sql_expr_coll(pParse, = pFarg->a[0].pExpr, > + &is_forced[0], = &colls_ids[0], > + &colls[0]) !=3D 0) > + return 0; > + if (sql_expr_coll(pParse, = pFarg->a[1].pExpr, > + &is_forced[1], = &colls_ids[1], > + &colls[1]) !=3D 0) > + return 0; > + > + uint32_t coll_id; > + if = (collations_check_compatibility(colls_ids[0], > + = is_forced[0], > + = colls_ids[1], > + = is_forced[1], > + = &coll_id) > + !=3D = 0) { > + pParse->rc =3D = SQL_TARANTOOL_ERROR; > + pParse->nErr++; > + return 0; > + } > + > + coll =3D (coll_id =3D=3D colls_ids[0]) ? = colls[0] : > + = colls[1]; > + if (coll =3D=3D NULL) > + coll =3D = coll_by_id(COLL_NONE)->coll; Why do we need this? > + } > if (pFarg) { > if (constMask) { > r1 =3D pParse->nMem + 1; > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 2de6ef5ce..f44b7ce78 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -38,6 +38,7 @@ > #include "vdbeInt.h" > #include "version.h" > #include "coll/coll.h" > +#include "tarantoolInt.h" > #include > #include > #include > @@ -212,59 +213,173 @@ absFunc(sql_context * context, int argc, = sql_value ** argv) > } >=20 > /* > - * Implementation of the instr() function. > + * Returns name of SQL type, which is given by the int code. > * > - * instr(haystack,needle) finds the first occurrence of needle > + * @param sql_type One of enum sql_type codes. > + * @retval string representing name of the given type. > + */ > +static inline char * > +sql_value_type_to_str(enum sql_type sql_type) { I don=E2=80=99t recommend using this function: mem_type_to_str() already exists, just make it non-static. sql_type is extracted as type of VDBE memory cell, so in fact it is the same thing.=20 > + switch(sql_type) { > + case SQL_INTEGER: > + return "INTEGER"; > + case SQL_TEXT: > + return "TEXT"; > + case SQL_FLOAT: > + return "REAL"; > + case SQL_BLOB: > + return "BLOB"; > + default: > + return "NULL"; > + } > +} > + > +/* > + * Implementation of the position() function. > + * > + * position(haystack,needle) finds the first occurrence of needle > * in haystack and returns the number of previous characters plus 1, > * or 0 if needle does not occur within haystack. > * > - * If both haystack and needle are BLOBs, then the result is one more = than > - * the number of bytes in haystack prior to the first occurrence of = needle, > - * or 0 if needle never occurs in haystack. > + * Haystack and needle must both have the same type, either > + * TEXT or BLOB. > + * > + * If haystack and needle are BLOBs, then the result is one more > + * than the number of bytes in haystack prior to the first > + * occurrence of needle, or 0 if needle never occurs in haystack. > + * > + * If haystack and needle are TEXTs, then their collations (if > + * any) are taken into consideration. If only one param has a > + * collation, then it is used. If both params have collations, > + * then the right one is chosen by > + * box/sql/sqlInt.h/collations_check_compatibility() function > + * (If collations are not compatible then the error is raised). > + * > + * Note that the "collation-selection" logic is implemented in > + * box/sql/expr.c/sqlExprCodeTarget() function. > */ > static void > -instrFunc(sql_context * context, int argc, sql_value ** argv) > +positionFunc(sql_context *context, int argc, sql_value **argv) Please, refactor this function in two steps: firstly provide code style fixes, then functional ones. Otherwise it complicates review process. > { > - const unsigned char *zHaystack; > - const unsigned char *zNeedle; > - int nHaystack; > - int nNeedle; > - int typeHaystack, typeNeedle; > + const unsigned char *haystack; > + const unsigned char *needle; > + int n_haystack_bytes; > + int n_needle_bytes; Ok, if you started refactoring, please finish it: move args declaration to their initialisations, fix name of function, !=3D 0 -> !=3D NULL etc. > + int type_haystack, type_needle; > int N =3D 1; > - int isText; >=20 > UNUSED_PARAMETER(argc); > - typeHaystack =3D sql_value_type(argv[0]); > - typeNeedle =3D sql_value_type(argv[1]); > - if (typeHaystack =3D=3D SQL_NULL || typeNeedle =3D=3D SQL_NULL) > + type_haystack =3D sql_value_type(argv[1]); > + type_needle =3D sql_value_type(argv[0]); > + if (type_haystack =3D=3D SQL_NULL || type_needle =3D=3D = SQL_NULL) > + return; > + /* > + * Position function can be called only with string > + * or blob params. > + */ > + int inconsistent_type =3D 0; > + if (type_needle !=3D SQL_TEXT && type_needle !=3D SQL_BLOB) > + inconsistent_type =3D type_needle; > + if (type_haystack !=3D SQL_TEXT && type_haystack !=3D SQL_BLOB) > + inconsistent_type =3D type_haystack; > + if (inconsistent_type > 0) { > + diag_set(ClientError, ER_INCONSISTENT_TYPES, "TEXT or = BLOB", > + sql_value_type_to_str(inconsistent_type)); > + context->pVdbe->pParse->rc =3D SQL_TARANTOOL_ERROR; > + context->pVdbe->pParse->nErr++; > + sql_result_error(context, tarantoolErrorMessage(), -1); > + return; > + } > + /* > + * Both params of Position function must be of the same > + * type. > + */ > + if (type_haystack !=3D type_needle) { > + diag_set(ClientError, ER_INCONSISTENT_TYPES, > + sql_value_type_to_str(type_needle), > + sql_value_type_to_str(type_haystack)); > + context->pVdbe->pParse->rc =3D SQL_TARANTOOL_ERROR; > + context->pVdbe->pParse->nErr++; > + sql_result_error(context, tarantoolErrorMessage(), -1); > return; > - nHaystack =3D sql_value_bytes(argv[0]); > - nNeedle =3D sql_value_bytes(argv[1]); > - if (nNeedle > 0) { > - if (typeHaystack =3D=3D SQL_BLOB && typeNeedle =3D=3D = SQL_BLOB) { > - zHaystack =3D sql_value_blob(argv[0]); > - zNeedle =3D sql_value_blob(argv[1]); > - assert(zNeedle !=3D 0); > - assert(zHaystack !=3D 0 || nHaystack =3D=3D 0); > - isText =3D 0; > + } > + n_haystack_bytes =3D sql_value_bytes(argv[1]); > + n_needle_bytes =3D sql_value_bytes(argv[0]); > + if (n_needle_bytes > 0) { > + if (type_haystack =3D=3D SQL_BLOB) { > + haystack =3D sql_value_blob(argv[1]); > + needle =3D sql_value_blob(argv[0]); > + assert(needle !=3D 0); > + assert(haystack !=3D 0 || n_haystack_bytes =3D=3D = 0); > + > + while (n_needle_bytes <=3D n_haystack_bytes > + && memcmp(haystack, needle, = n_needle_bytes) !=3D 0) { > + N++; > + n_haystack_bytes--; > + haystack++; > + } > + if (n_needle_bytes > n_haystack_bytes) > + N =3D 0; > } else { > - zHaystack =3D sql_value_text(argv[0]); > - zNeedle =3D sql_value_text(argv[1]); > - isText =3D 1; > - if (zHaystack =3D=3D 0 || zNeedle =3D=3D 0) > + /* > + * Code below handles not only simple > + * cases like position('a', 'bca'), but > + * also more complex ones: > + * position('a', 'bc=C3=A1' COLLATE = "unicode_ci") > + * To do so, we need to use comparison > + * window, which has constant character > + * size, but variable byte size. > + * Character size is equal to > + * needle char size. > + */ > + haystack =3D sql_value_text(argv[1]); > + needle =3D sql_value_text(argv[0]); > + if (haystack =3D=3D 0 || needle =3D=3D 0) > return; How they can be NULL, if this case is filtered in the beginning: if (type_haystack =3D=3D SQL_NULL || type_needle =3D=3D SQL_NULL) return; > - } > - while (nNeedle <=3D nHaystack > - && memcmp(zHaystack, zNeedle, nNeedle) !=3D 0) { > - N++; > - do { > - nHaystack--; > - zHaystack++; > - } while (isText && (zHaystack[0] & 0xc0) =3D=3D = 0x80); > - } > - if (nNeedle > nHaystack) > + struct coll *coll =3D = sqlGetFuncCollSeq(context); > + > + int n_needle_chars =3D > + sql_utf8_char_count(needle, = n_needle_bytes); > + int n_haystack_chars =3D > + sql_utf8_char_count(haystack, = n_haystack_bytes); > + > + if (n_haystack_chars < n_needle_chars) { > + N =3D 0; What=E2=80=99s N? > diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h > index 1d8fae5b0..4d059dd7e 100644 > --- a/src/box/sql/sqlInt.h > +++ b/src/box/sql/sqlInt.h > @@ -1666,6 +1666,13 @@ struct FuncDestructor { > #define SQL_FUNC_SLOCHNG 0x2000 /* "Slow Change". Value constant = during a > * single query - might change = over time > */ > +#define SQL_FUNC_ARG_COLL 0x4000 ARG_NEED_COLL?