From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Roman Khabibov <roman.habibov@tarantool.org>,
	tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] sql: modify TRIM() function signature
Date: Tue, 16 Apr 2019 20:14:52 +0300
Message-ID: <49c88b8e-95ff-1a37-9ec3-fc16597891e3@tarantool.org>
In-Reply-To: <D522318E-9237-4A00-8915-4A8D970B5770@tarantool.org>

Hi! Thanks for the fixes! Much better now, seriously,
but see 21 comments below.

>> 8. Please, create a enum with normal names for these constants.
> +enum trim_specification {
> +	LEADING = 1,
> +	TRAILING = 2,
> +	BOTH = 3

1. These values are used as a bitmask in the TRIM function
implementation. I expected that you would account for it. BOTH
should be a bit combination of LEADING and TRAILING. Also, in
such a case it should be a 'trim_side_mask' enum, not just
'trim_specification' - what does it specify?

In addition, we have a strict policy of naming enum values,
because they are visible in the whole namespace. We do not have
C++ namespaces. The C way of namespacing is prefixing all
functions and constants with a certain name. It means that the
values should be prefixed with the uppercased enum name (or its
part, when it is too long). Here I would use just the 'TRIM_'
prefix.

Finally, add a comment to that enum.

2. Ok, you added the enum, but you do not use it anywhere. What
is the point of such an enum? You still use raw constants in both
parse.y and trim_procedure. Please, do a self-review. It is easy
to find such places by yourself just by diligently scanning the
diff a couple of times before sending.
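To illustrate comments 1 and 2, here is a minimal sketch of what such an enum could look like and how its values might be checked as bits. Only the 'trim_side_mask' name and the 'TRIM_' prefix come from the comments above; the helper trim_char_sketch() and its signature are hypothetical and are not part of the patch or of Tarantool.

```c
/**
 * Sides of a string to trim, usable as a bitmask.
 * TRIM_BOTH is the combination of the two single-side bits.
 */
enum trim_side_mask {
	TRIM_LEADING = 1,
	TRIM_TRAILING = 2,
	TRIM_BOTH = TRIM_LEADING | TRIM_TRAILING,
};

/**
 * Hypothetical helper, for illustration only: skip the byte
 * @a ch on the sides of @a str selected by @a mask. Return
 * the new start of the string and store the new length into
 * @a out_len. The input is not modified.
 */
static const char *
trim_char_sketch(const char *str, int len, char ch,
		 enum trim_side_mask mask, int *out_len)
{
	if ((mask & TRIM_LEADING) != 0) {
		while (len > 0 && *str == ch) {
			++str;
			--len;
		}
	}
	if ((mask & TRIM_TRAILING) != 0) {
		while (len > 0 && str[len - 1] == ch)
			--len;
	}
	*out_len = len;
	return str;
}
```

With such a layout a caller passes TRIM_BOTH and both branches run, so no separate 'both' case is needed in the implementation.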
>>
>>>  			return;
>>>  		} else {
>>>  			const unsigned char *z = zCharSet;
>>> -			int trim_set_sz = sql_value_bytes(argv[1]);
>>> +			int trim_set_sz = sql_value_bytes(argv[source_index - 1]);
>>>  			/*
>>>  			 * Count the number of UTF-8 characters passing
>>>  			 * through the entire char set, but not up
>>> @@ -1272,8 +1288,7 @@ trimFunc(sql_context * context, int argc, sql_value ** argv)
>>>  			}
>>>  		}
>>>  		if (nChar > 0) {
>>> -			flags = SQL_PTR_TO_INT(sql_user_data(context));
>>> -			if (flags & 1) {
>>> +			if (trim_side & 1) {
>>
>> 13. When checking flags, use (flag & ...) != 0 instead of an
>> implicit conversion. In other places too.
> +		if ((flags & 1) != 0) {
> +		if ((flags & 2) != 0) {

3. Use enum bitmask values instead of 1 and 2.

    (flags & TRIM_TRAILING) != 0
    (flags & TRIM_LEADING) != 0

>>
>> 14. Better write three trim functions taking different number of
>> args, converting them to normal types, and calling the single
>> trim function. Instead of making a pile of 'if's about argc inside
>> the current implementation.
> Done. But now I have dublicated pieces of code:

4. Then do not duplicate it, extract it into another function.
It is one of your tasks as a programmer to reduce code
duplication. You should not be a silent text editor into which I
insert my own code and ideas via the mailing list.

Probably after fixing my next comments the code duplication
will be minor or will even disappear.

> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index abeecefa1..bf7e7a652 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1286,108 +1286,223 @@ replaceFunc(sql_context * context, int argc, sql_value ** argv)
>  	sql_result_text(context, (char *)zOut, j, sql_free);
>  }
>
> -/*
> - * Implementation of the TRIM(), LTRIM(), and RTRIM() functions.
> - * The userdata is 0x1 for left trim, 0x2 for right trim, 0x3 for both.
> +enum trim_specification {
> +	LEADING = 1,
> +	TRAILING = 2,
> +	BOTH = 3
> +};
> +
> +/**
> + * Remove chars included into @a collation from @a input_str.
> + * @param context SQL context.
> + * @param flags Trim specification: left, right or both.
> + * @param collation Character set.
> + * @param coll_sz Character set size in bytes.
> + * @param input_str Input string for trimming.
> + * @param input_str_sz Input string size in bytes.
>   */
>  static void
> -trimFunc(sql_context * context, int argc, sql_value ** argv)
> +trim_procedure(sql_context * context, enum trim_specification flags,
> +		const unsigned char *collation, int coll_sz,
> +		const unsigned char *input_str, int input_str_sz)

5. Broken alignment.

6. Why do you really need 'unsigned char'? I do not see any
arithmetical operations here. Only assignments.

>  {
> -	const unsigned char *zIn;	/* Input string */
> -	const unsigned char *zCharSet;	/* Set of characters to trim */
> -	int nIn;		/* Number of bytes in input */
> -	int flags;		/* 1: trimleft 2: trimright 3: trim */
> -	int i;			/* Loop counter */
> -	unsigned char *aLen = 0;	/* Length of each character in zCharSet */
> -	unsigned char **azChar = 0;	/* Individual characters in zCharSet */
> -	int nChar;		/* Number of characters in zCharSet */
> +	int i;
> +	/*

7. Trailing whitespaces here and below. As far as I know, git
highlights them in red, which means that you haven't reviewed
the patch before sending. Please, do it next time. Also, you can
avoid automatic trailing whitespaces if you install one of the
comment packages for Sublime.

> +	 * Length of each character in collation.

8. Ok, now I see what you meant by 'character set' in the
previous version. Sorry, in such a case it is not a collation of
course, and it is strange that you blindly renamed it without
any opposition. It is ok to argue with me.

> +	 */
> +	unsigned char *aLen = 0;

9. Please, do not use camel case style for new code. We never
use it in Tarantool. Use normal names.
> +	/*
> +	 * Individual characters in collation.
> +	 */
> +	unsigned char **azChar = 0;
> +	/*
> +	 * Number of characters in collation.
> +	 */
> +	int nChar;
>
> +	const unsigned char *z = collation;
> +	/*
> +	 * Count the number of UTF-8 characters passing
> +	 * through the entire char set, but not up
> +	 * to the '\0' or X'00' character. This allows
> +	 * to handle trimming set containing such
> +	 * characters.

10. The comment's indentation is reduced, so the text can be
realigned into fewer lines.

> +	 */
> +	nChar = sql_utf8_char_count(z, coll_sz);

11. It is not C89. You do not need to declare all the variables
at the beginning of a function before their usage.

> +/**
> + * Normalize args from @a argv input array when it has one arg only.

12. Out of 66 columns. In some other places below too. Sublime
has facilities to show 66 and 80 borders, google the phrase
'sublime rulers'. Please, use them.

> + *
> + * Case: TRIM(<str>)
> + * Call trimming procedure with BOTH as the flags and " " as the collation.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.

13. Comments on such simple args are useless, and on the other
hand there is nothing more to say. We often omit the
@param/@retval section in such a case, and I think it is
applicable here. I mean that everything above the first @param
is ok, but what is below is not necessary. You can keep it if
you want, up to you.

> + */
> +static void
> +trim_func_one_arg(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 1);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[0]) == SQL_NULL) {
> +		return;
> +	}

14. We do not use curly braces when an 'if' or 'for' body
consists of one line. What is more, you do not need this check
at all, because sql_value_text returns NULL when the value is
NULL as well. The same in the other helper functions.
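The quoted code comment before comment 10 says the trim set is counted over its whole byte range rather than up to the first '\0'. A minimal sketch of that idea follows, written with variables declared at first use as comment 11 suggests. The function utf8_char_count_sketch() is hypothetical: it is not sql_utf8_char_count() and its behaviour is only an assumption about what a length-based counter does; it also assumes valid UTF-8 input.

```c
/**
 * Hypothetical illustration: count UTF-8 characters in the
 * first @a byte_len bytes of @a str. Every byte that is not a
 * continuation byte (10xxxxxx) starts a character, so embedded
 * '\0' bytes are counted like any other character.
 */
static int
utf8_char_count_sketch(const char *str, int byte_len)
{
	int count = 0;
	for (int i = 0; i < byte_len; ++i) {
		/* Bit tests are done on an unsigned value. */
		unsigned char byte = (unsigned char) str[i];
		if ((byte & 0xC0) != 0x80)
			++count;
	}
	return count;
}
```

Because the length is passed explicitly, a trim set such as X'00' is handled the same way as any other one-character set.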
> +	if ((input_str = sql_value_text(argv[0])) == NULL) {
> +		return;
> +	}
> +
> +	int input_str_sz = sql_value_bytes(argv[0]);
> +	assert(input_str == sql_value_text(argv[0]));

15. What is the point of that assertion? You assigned input_str
to this value literally 5 lines above.

> +
> +	trim_procedure(context, BOTH, (const unsigned char *) " ",
> +		       1, input_str, input_str_sz);
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has two args.
> + *
> + * Case: TRIM(<trim_collation> FROM <str>)
> + * If user has specified <trim_collation> only, call trimming procedure with
> + * BOTH as the flags and that collation.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>)
> + * If user has specified side keyword only, call trimming procedure
> + * with the specified side and " " as the collation.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.
> + */
> +static void
> +trim_func_two_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 2);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[1]) == SQL_NULL) {
> +		return;
> +	}
> +	if ((input_str = sql_value_text(argv[1])) == NULL) {
> +		return;
> +	}
> +
> +	int input_str_sz = sql_value_bytes(argv[1]);
> +	assert(input_str == sql_value_text(argv[1]));
> +
> +	const unsigned char *collation;
> +	if (sql_value_type(argv[0]) == SQL_INTEGER) {
> +		trim_procedure(context, sql_value_int(argv[0]),
> +			       (const unsigned char *) " ", 1,
> +			       input_str, input_str_sz);
> +	} else if ((collation = sql_value_text(argv[0])) == NULL) {
> +		return;
> +	} else {
> +		int coll_sz = sql_value_bytes(argv[0]);
> +		trim_procedure(context, BOTH, collation, coll_sz, input_str,
> +			       input_str_sz);
> +	}
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has three args.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH <trim_collation> FROM <str>)
> + * User has specified side keyword and <trim_collation>, call trimming
> + * procedure with that args.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.
> + */
> +static void
> +trim_func_three_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 3);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[2]) == SQL_NULL) {
> +		return;
> +	}
> +	if ((input_str = sql_value_text(argv[2])) == NULL) {
> +		return;
> +	}
> +
> +	int input_str_sz = sql_value_bytes(argv[2]);
> +	assert(input_str == sql_value_text(argv[2]));
> +
> +	const unsigned char *collation;
> +	assert(sql_value_type(argv[0]) == SQL_INTEGER);
> +	if ((collation = sql_value_text(argv[1])) != 0) {

16. As I said in the previous review, and in reviews of other
patches - use NULL to check if a pointer is NULL. When a code
hunk is tall, and someone sees code like

    variable = func()
    if (variable != 0)
        ....

they could think that the variable is an integer. It is
confusing (the variable can be declared somewhere above, and
the reader does not see its type).

> +		int coll_sz = sql_value_bytes(argv[1]);
> +		trim_procedure(context, sql_value_int(argv[0]), collation,
> +			       coll_sz, input_str, input_str_sz);
> +	} else {
> +		return;

17. What is the point of this last return? Even without 'else'
the compiler inserts an implicit 'ret' instruction at the end of
a 'void' function.

> +	}
>  }
>
>  #ifdef SQL_ENABLE_UNKNOWN_SQL_FUNCTION
> diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
> index 099daf512..985d33605 100644
> --- a/src/box/sql/parse.y
> +++ b/src/box/sql/parse.y
> @@ -1032,6 +1032,51 @@ expr(A) ::= CAST(X) LP expr(E) AS typedef(T) RP(Y). {
>    sqlExprAttachSubtrees(pParse->db, A.pExpr, E.pExpr, 0);
>  }
>  %endif  SQL_OMIT_CAST
> +
> +expr(A) ::= TRIM(X) LP trim_operands(Y) RP(E). {
> +  A.pExpr = sqlExprFunction(pParse, Y, &X);
> +  spanSet(&A, &X, &E);
> +}
> +
> +%type trim_operands {struct ExprList *}
> +%destructor trim_operands { sql_expr_list_delete(pParse->db, $$); }
> +
> +trim_operands(A) ::= trim_from_clause(F) expr(Y). {
> +  A = sql_expr_list_append(pParse->db, F, Y.pExpr);
> +}
> +
> +trim_operands(A) ::= expr(Y). {
> +  A = sql_expr_list_append(pParse->db, NULL, Y.pExpr);
> +}
> +
> +%type trim_from_clause {struct ExprList *}
> +%destructor trim_from_clause { sql_expr_list_delete(pParse->db, $$); }
> +
> +trim_from_clause(A) ::= expr(Y) FROM. {
> +  A = sql_expr_list_append(pParse->db, NULL, Y.pExpr);
> +}
> +
> +trim_from_clause(A) ::= trim_specification(N) trim_character(Y) FROM. {

18. I understand why you did not use the trim_character rule
above, but someone looking at this code for the first time, who
has not seen our discussion, will not understand. I would add a
comment about it.

> +  struct Expr *p = sql_expr_new_dequoted(pParse->db, TK_INTEGER,
> +                                         &sqlIntTokens[N]);
> +  A = sql_expr_list_append(pParse->db, NULL, p);
> +  if (Y != NULL) {
> +    A = sql_expr_list_append(pParse->db, A, Y);
> +  }
> +}
> +
> +%type trim_character {struct Expr *}
> +%destructor trim_character {sql_expr_delete(pParse->db, $$, false);}
> +
> +trim_character(A) ::= . { A = NULL; }
> +trim_character(A) ::= expr(X). { A = X.pExpr; }

19. Exactly the same rule already exists: case_operand. I think
it is worth merging them into one rule like

    expr_optional(A) ::= .         { A = NULL; }
    expr_optional(A) ::= expr(X).  { A = X.pExpr; }

and using it in both places.

> +
> +%type trim_specification {int}
> +
> +trim_specification(A) ::= LEADING.  {A = 1;}
> +trim_specification(A) ::= TRAILING. {A = 2;}
> +trim_specification(A) ::= BOTH.     {A = 3;}
> +
> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua
> index 251cc3534..8fe04fab1 100755
> --- a/test/sql-tap/func.test.lua
> +++ b/test/sql-tap/func.test.lua
> @@ -1,6 +1,6 @@
>  #!/usr/bin/env tarantool
>  test = require("sqltester")
> -test:plan(14586)
> +test:plan(14590)
>
>  --!./tcltestrunner.lua
>  -- 2001 September 15
> @@ -1912,37 +1912,37 @@ test:do_test(
>  test:do_catchsql_test(
>      "func-22.1",
>      [[
> -        SELECT trim(1,2,3)
> +        SELECT TRIM(1,2,3)

20. Why? I thought that all identifiers are normalized anyway,
including function names, and you do not need to uppercase
everything manually. The same applies to the tests func-22.4
and func-22.20.

>      ]], {
>          -- <func-22.1>
> -        1, "wrong number of arguments to function TRIM()"
> +        1, "Syntax error near ','"
>          -- </func-22.1>
>      })
> @@ -2215,13 +2215,55 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "func-22.34",
>      [[
> -        SELECT RTRIM(X'00004100420000', X'00');
> +        SELECT TRIM(TRAILING X'00' FROM X'00004100420000');
>      ]], {
>          -- <func-22.34>
>          "\0\0A\0B"
>          -- </func-22.34>
>      })
>
> +-- gh-3879 Check new TRIM() grammar, particularly BOTH keyword and FROM without
> +-- any agrs before. LEADING and TRAILING keywords is checked above.

21. Out of 66 columns.