[tarantool-patches] Re: [PATCH] sql: modify TRIM() function signature

Tue Apr 16 20:14:52 MSK 2019

Hi! Thanks for the fixes! Much better now, seriously, but
see 21 comments below.

>> 8. Please, create a enum with normal names for these constants.
> +enum trim_specification {
> +	LEADING = 1,
> +	TRAILING = 2,
> +	BOTH = 3

1. These values are used as a bitmask in the TRIM function
implementation. I expected that you would account it. BOTH
should be a bit combination of LEADING and TRAILING.

Also, in such a case it should be 'trim_side_mask' enum,
not just 'trim_specification' - what does it specify.

In addition, we have a strict policy of naming enum values,
because they are visible in the whole namespace. We do not
have C++ namespaces. C-way of namespacing is prefixing all
functions and constants with a certain name. It means, that
the values should be prefixed with uppercased enum name
(or its part, when it is too long). Here I would use just
'TRIM_' prefix.

Finally, add a comment to that enum.

2. Ok, you added enum, but you do not use it at all
anywhere. What is a point of such enum? You still use
constants in both parse.y and trim_procedure.

Please, do a self-review. In is easy to find such
places by yourself just diligently scanning the diff
couple of times before a send.

>>
>>> 		return;
>>> 	} else {
>>> 		const unsigned char *z = zCharSet;
>>> -		int trim_set_sz = sql_value_bytes(argv[1]);
>>> +		int trim_set_sz = sql_value_bytes(argv[source_index - 1]);
>>> 		/*
>>> 		* Count the number of UTF-8 characters passing
>>> 		* through the entire char set, but not up
>>> @@ -1272,8 +1288,7 @@ trimFunc(sql_context * context, int argc, sql_value ** argv)
>>> 		}
>>> 	}
>>> 	if (nChar > 0) {
>>> -		flags = SQL_PTR_TO_INT(sql_user_data(context));
>>> -		if (flags & 1) {
>>> +		if (trim_side & 1) {
>>
>> 13. When checking flags, use (flag & ...) != 0 instead of an
>> implicit conversion. In other places too.
> +		if ((flags & 1) != 0) {
> +		if ((flags & 2) != 0) {

3. Use enum bitmask values instead of 1 and 2.

    (flags & TRIM_TRAILING) != 0
    (flags & TRIM_LEADING) != 0

>>
>> 14. Better write three trim functions taking different number of
>> args, converting them to normal types, and calling the single
>> trim function. Instead of making a pile of 'if's about argc inside
>> the current implementation.
> Done. But now I have dublicated pieces of code:

4. Then do not duplicate and extract it into another function. It is
one of your tasks as a programmer to reduce code duplication. You should
not be a silent text-editor into which I insert my own code and ideas
via the mailing list.

Probably after fixing my next comments the code duplication will be
minor or will even disappear.

> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index abeecefa1..bf7e7a652 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1286,108 +1286,223 @@ replaceFunc(sql_context * context, int argc, sql_value ** argv)
>  	sql_result_text(context, (char *)zOut, j, sql_free);
>  }
>  
> -/*
> - * Implementation of the TRIM(), LTRIM(), and RTRIM() functions.
> - * The userdata is 0x1 for left trim, 0x2 for right trim, 0x3 for both.
> +enum trim_specification {
> +	LEADING = 1,
> +	TRAILING = 2,
> +	BOTH = 3
> +};
> +
> +/**
> + * Remove chars included into @a collation from @a input_str.
> + * @param context SQL context.
> + * @param flags Trim specification: left, right or both.
> + * @param collation Character set.
> + * @param coll_sz Character set size in bytes.
> + * @param input_str Input string for trimming.
> + * @param input_str_sz Input string size in bytes.
>   */
>  static void
> -trimFunc(sql_context * context, int argc, sql_value ** argv)
> +trim_procedure(sql_context * context, enum trim_specification flags,
> +	     const unsigned char *collation, int coll_sz,
> +	     const unsigned char *input_str, int input_str_sz)

5. Broken alignment.

6. Why do you really need 'unsigned char'? I do not see any
arithmetical operations here. Only assignments.

>  {
> -	const unsigned char *zIn;	/* Input string */
> -	const unsigned char *zCharSet;	/* Set of characters to trim */
> -	int nIn;		/* Number of bytes in input */
> -	int flags;		/* 1: trimleft  2: trimright  3: trim */
> -	int i;			/* Loop counter */
> -	unsigned char *aLen = 0;	/* Length of each character in zCharSet */
> -	unsigned char **azChar = 0;	/* Individual characters in zCharSet */
> -	int nChar;		/* Number of characters in zCharSet */
> +	int i;
> +	/* 

7. Trailing whitespaces here and below. As I know, git highlights them
with red color, which means, that you haven't reviewed that patch
before sending. Please, do it next time. Also, you can avoid automatic
trailing whitespaces if install one of the comment packages for Sublime.

> +	 * Length of each character in collation.

8. Ok, now I see what did you mean as 'character set' in the
previous version. Sorry, in such a case it is not collation of
course, and it is strange, that you blindly renamed it without any
opposition. It is ok to argue with me.

> +	 */
> +	unsigned char *aLen = 0;

9. Please, do not use camel code style for new code. We
never use it in Tarantool. Use normal names.

> +	/*
> +	 * Individual characters in collation.
> +	 */
> +	unsigned char **azChar = 0;
> +	/* 
> +	 * Number of characters in collation.
> +	 */
> +	int nChar;
>  
> +	const unsigned char *z = collation;
> +	/*
> +	 * Count the number of UTF-8 characters passing
> +	 * through the entire char set, but not up
> +	 * to the '\0' or X'00' character. This allows
> +	 * to handle trimming set containing such
> +	 * characters.

10. The comment's indentation is reduced and the text can be
realligned with less number of lines.

> +	 */
> +	nChar = sql_utf8_char_count(z, coll_sz);

11. It is not C89. You do not need to declare all the variables
at the beginning of function before their usage.

> +/**
> + * Normalize args from @a argv input array when it has one arg only.

12. Out of 66. In some other places below too. Sublime
has facilities to show 66 and 80 borders, google by the
phrase 'sublime rulers'. Please, use them.

> + * 
> + * Case: TRIM(<str>)
> + * Call trimming procedure with BOTH as the flags and " " as the collation.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.

13. Comments on such simple args are useless and on the other hand there
is nothing more to say. We often omit @param/@retval section in such a
case, and I think it is applicable here.

I mean, that everything above first @param is ok, but below is not
necessary. You can keep it if you want, up to you.

> + */
> +static void
> +trim_func_one_arg(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 1);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[0]) == SQL_NULL) {
> +		return;
> +	}

14. We do not use curly braces when 'if' or 'for' body
consists of one line. What is more, you do not need
this check at all, because sql_value_text returns NULL,
when value is NULL as well. The same in other helper
functions.

> +	if ((input_str = sql_value_text(argv[0])) == NULL) {
> +		return;
> +	}> +
> +	int input_str_sz = sql_value_bytes(argv[0]);
> +	assert(input_str == sql_value_text(argv[0]));

15. What is a point of that assertion? You assigned input_str
to this value literally 5 lines above.

> +
> +	trim_procedure(context, BOTH, (const unsigned char *) " ",
> +		       1, input_str, input_str_sz);
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has two args.
> + * 
> + * Case: TRIM(<trim_collation> FROM <str>)
> + * If user has specified <trim_collation> only, call trimming procedure with
> + * BOTH as the flags and that collation.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>)
> + * If user has specified side keyword only, call trimming procedure
> + * with the specified side and " " as the collation.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.
> + */
> +static void
> +trim_func_two_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 2);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[1]) == SQL_NULL) {
> +		return;
> +	}
> +	if ((input_str = sql_value_text(argv[1])) == NULL) {
> +		return;
> +	}
> +
> +	int input_str_sz = sql_value_bytes(argv[1]);
> +	assert(input_str == sql_value_text(argv[1]));
> +
> +	const unsigned char *collation;
> +	if (sql_value_type(argv[0]) == SQL_INTEGER) {
> +		trim_procedure(context, sql_value_int(argv[0]),
> +			       (const unsigned char *) " ", 1,
> +			       input_str, input_str_sz);
> +	} else if ((collation = sql_value_text(argv[0])) == NULL) {
> +		return;
> +	} else {
> +		int coll_sz = sql_value_bytes(argv[0]);
> +		trim_procedure(context, BOTH, collation, coll_sz, input_str,
> +			       input_str_sz);
> +	}
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has three args.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH <trim_collation> FROM <str>)
> + * User has specified side keyword and <trim_collation>, call trimming
> + * procedure with that args.
> + *
> + * @param context SQL context.
> + * @param argc Number of args.
> + * @param argv Args array.
> + */
> +static void
> +trim_func_three_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 3);
> +	(void) argc;
> +
> +	if (sql_value_type(argv[2]) == SQL_NULL) {
> +		return;
> +	}
> +	if ((input_str = sql_value_text(argv[2])) == NULL) {
> +		return;
> +	}
> +
> +	int input_str_sz = sql_value_bytes(argv[2]);
> +	assert(input_str == sql_value_text(argv[2]));
> +
> +	const unsigned char *collation;
> +	assert(sql_value_type(argv[0]) == SQL_INTEGER);
> +	if ((collation = sql_value_text(argv[1])) != 0) {

16. As I said in the previous review, and in reviews to
other patches - use NULL to check if a pointer is NULL.

When a code hunk is tall, and someone sees code like

    variable = func()
    if (variable != 0)
        ....

they could think that the variable is integer. It is
confusing (variable can be declared somewhere above
and the one does not see its type).

> +		int coll_sz = sql_value_bytes(argv[1]);
> +		trim_procedure(context, sql_value_int(argv[0]), collation,
> +			       coll_sz, input_str, input_str_sz);
> +	} else {
> +		return;

17. What is a point of this last return? Even without 'else'
the compiler inserts implicit 'ret' instruction at the end of
'void' function.

> +	}
>  }
>  
>  #ifdef SQL_ENABLE_UNKNOWN_SQL_FUNCTION
> diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
> index 099daf512..985d33605 100644
> --- a/src/box/sql/parse.y
> +++ b/src/box/sql/parse.y
> @@ -1032,6 +1032,51 @@ expr(A) ::= CAST(X) LP expr(E) AS typedef(T) RP(Y). {
>    sqlExprAttachSubtrees(pParse->db, A.pExpr, E.pExpr, 0);
>  }
>  %endif  SQL_OMIT_CAST
> +
> +expr(A) ::= TRIM(X) LP trim_operands(Y) RP(E). {
> +    A.pExpr = sqlExprFunction(pParse, Y, &X);
> +    spanSet(&A, &X, &E);
> +  }
> +
> +%type trim_operands {struct ExprList *}
> +%destructor trim_operands { sql_expr_list_delete(pParse->db, $$); }
> +
> +trim_operands(A) ::= trim_from_clause(F) expr(Y). {
> +    A = sql_expr_list_append(pParse->db, F, Y.pExpr);
> +}
> +
> +trim_operands(A) ::= expr(Y). {
> +    A = sql_expr_list_append(pParse->db, NULL, Y.pExpr);
> +}
> +
> +%type trim_from_clause {struct ExprList *}
> +%destructor trim_from_clause { sql_expr_list_delete(pParse->db, $$); }
> +
> +trim_from_clause(A) ::= expr(Y) FROM. {
> +    A = sql_expr_list_append(pParse->db, NULL, Y.pExpr);
> +}
> +
> +trim_from_clause(A) ::= trim_specification(N) trim_character(Y) FROM. {

18. I understand, why you did not use trim_character rule above,
but someone looking at this code first time and not seen our
discussion will not understand. I would add a comment about it.

> +  struct Expr *p = sql_expr_new_dequoted(pParse->db, TK_INTEGER,
> +                                         &sqlIntTokens[N]);
> +  A = sql_expr_list_append(pParse->db, NULL, p);
> +  if (Y != NULL) {
> +    A = sql_expr_list_append(pParse->db, A, Y);
> +  }
> +}
> +
> +%type trim_character {struct Expr *}
> +%destructor trim_character {sql_expr_delete(pParse->db, $$, false);}
> +
> +trim_character(A) ::= . { A = NULL; }
> +trim_character(A) ::= expr(X). { A = X.pExpr; }

19. Exactly the same rule already exists: case_operand. I think, it
is worth merging them into one rule like

    expr_optional(A) ::= . { A = NULL; }
    expr_optional(A) ::= expr(X). { A = X.pExpr; }

And using in both places.

> +
> +%type trim_specification {int}
> +
> +trim_specification(A) ::= LEADING.  {A = 1;}
> +trim_specification(A) ::= TRAILING. {A = 2;}
> +trim_specification(A) ::= BOTH.     {A = 3;}
> +
> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua
> index 251cc3534..8fe04fab1 100755
> --- a/test/sql-tap/func.test.lua
> +++ b/test/sql-tap/func.test.lua
> @@ -1,6 +1,6 @@
>  #!/usr/bin/env tarantool
>  test = require("sqltester")
> -test:plan(14586)
> +test:plan(14590)
>  
>  --!./tcltestrunner.lua
>  -- 2001 September 15
> @@ -1912,37 +1912,37 @@ test:do_test(
>  test:do_catchsql_test(
>      "func-22.1",
>      [[
> -        SELECT trim(1,2,3)
> +        SELECT TRIM(1,2,3)

20. Why? I thought that all identifiers are normalized
anyway, including function names, and you do not need
to uppercase everything manually. The same about the
test func-22.4, func-22.20.

>      ]], {
>          -- <func-22.1>
> -        1, "wrong number of arguments to function TRIM()"
> +        1, "Syntax error near ','"
>          -- </func-22.1>
>      })
> @@ -2215,13 +2215,55 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "func-22.34",
>      [[
> -        SELECT RTRIM(X'00004100420000', X'00');
> +        SELECT TRIM(TRAILING X'00' FROM X'00004100420000');
>      ]], {
>          -- <func-22.34>
>          "\0\0A\0B"
>          -- </func-22.34>
>      })
>  
> +-- gh-3879 Check new TRIM() grammar, particularly BOTH keyword and FROM without
> +-- any agrs before. LEADING and TRAILING keywords is checked above.

21. Out of 66.