[tarantool-patches] Re: [PATCH] sql: modify TRIM() function signature

Fri Apr 19 15:49:08 MSK 2019

Hi! Thanks for the fixes, much better already,
almost done! But see 16 comments below.

>>> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua
>>> index 251cc3534..8fe04fab1 100755
>>> --- a/test/sql-tap/func.test.lua
>>> +++ b/test/sql-tap/func.test.lua
>>> @@ -1,6 +1,6 @@
>>> #!/usr/bin/env tarantool
>>> test = require("sqltester")
>>> -test:plan(14586)
>>> +test:plan(14590)
>>>
>>> --!./tcltestrunner.lua
>>> -- 2001 September 15
>>> @@ -1912,37 +1912,37 @@ test:do_test(
>>> test:do_catchsql_test(
>>>     "func-22.1",
>>>     [[
>>> -        SELECT trim(1,2,3)
>>> +        SELECT TRIM(1,2,3)
>>
>> 20. Why? I thought that all identifiers are normalized
>> anyway, including function names, and you do not need
>> to uppercase everything manually. The same about the
>> test func-22.4, func-22.20.
> For consistency.

1. Please do not make dubious changes that the patch does not
require: they wipe the git history and pad out the diff. Keep
the old version, both here and in the other places where the
only change was uppercasing.

>     sql: modify TRIM() function signature
>     
>     According to the ANSI standard, ltrim, rtrim and trim should
>     be merged into one unified TRIM() function. The specialization of
>     trimming (left, right or both and trimming characters) determined
>     in arguments of this function.
>     
>     Closes #3879
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index abeecefa1..ac52ddda2 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1286,108 +1286,183 @@ replaceFunc(sql_context * context, int argc, sql_value ** argv)
>  	sql_result_text(context, (char *)zOut, j, sql_free);
>  }
>  
> -/*
> - * Implementation of the TRIM(), LTRIM(), and RTRIM() functions.
> - * The userdata is 0x1 for left trim, 0x2 for right trim, 0x3 for both.
> +/**
> + * Remove chars included into @a trimming set from @a input_str.

2. There is no parameter named 'trimming'. According to
doxygen documentation, @a takes one word as an argument
and it is usually the function argument's name.
http://www.doxygen.nl/manual/commands.html#cmda

Please, prune old parameter names from the comments.

> + * @param context SQL context.
> + * @param flags Trim specification: left, right or both.
> + * @param trim_set The set of characters for trimming.
> + * @param trim_set_sz Character set size in bytes.
> + * @param input_str Input string for trimming.
> + * @param input_str_sz Input string size in bytes.
>   */
>  static void
> -trimFunc(sql_context * context, int argc, sql_value ** argv)
> +trim_procedure(sql_context * context, enum trim_side_mask flags,
> +	       const unsigned char *trim_set, int trim_set_sz,
> +	       const unsigned char *input_str, int input_str_sz)

3. Now that I have looked at this function more closely, I see
that you rewrote it completely, which means that you should use
the Tarantool code style, not SQLite's. See some concrete
points below.

>  {
> -	const unsigned char *zIn;	/* Input string */
> -	const unsigned char *zCharSet;	/* Set of characters to trim */
> -	int nIn;		/* Number of bytes in input */
> -	int flags;		/* 1: trimleft  2: trimright  3: trim */
> -	int i;			/* Loop counter */
> -	unsigned char *aLen = 0;	/* Length of each character in zCharSet */
> -	unsigned char **azChar = 0;	/* Individual characters in zCharSet */
> -	int nChar;		/* Number of characters in zCharSet */
> +	int i;
> +	/* Length of each character in the character set. */
> +	char unsigned *char_len = 0;

4. You again ignored my comment about NULL. Please, find all other
places and fix it finally. I said it already 1000 times in 1000
reviews - we do not use 0 for pointers. It is a simple rule. Just
follow it. Write it down somewhere in a list of code style rules
and check them all before sending a patch.

Seeing how many of my comments you repeatedly ignore, I think
you should reconsider the way you do self-reviews. If you do it
by just looking at the code in the text editor for a couple of
seconds, then it is definitely a bad way.

First of all, use 'git diff/show' in the console to look only at
the patch changes, not at the entire files and functions. If you
do not like the console, which is fine, then you can use the
Sublime Merge desktop program or the Sublime Git package for the
editor. When you look at the diff only, it is much simpler to
notice such violations and even bugs.



5. In our code style we do not use 'char' to represent numbers, we
use 'uint8_t' or 'int8_t' when we want one-byte numbers. It is the
same as 'char'/'unsigned char', but it is shorter and makes it
obvious that these values are used as numbers, not text. At first
I thought that char_len was an array of characters, but it turned
out to be an array of symbol sizes. In summary, I suggest using
'uint8_t *' for the char_len array.

> +	/* Individual characters in the character set. */
> +	char unsigned **ind_chars = 0;

6. If you declare it as 'const char unsigned **', then you
can remove the unnecessary type cast from line 1330.

7. Normally, we do not reorder 'unsigned' and 'char/int/long'.

    char unsigned -> unsigned char
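
To sum up points 4-7, here is a rough sketch of how the character
set could be split with such declarations. It is not the final
code, only an illustration under several assumptions:
utf8_char_len() is a hypothetical stand-in for SQL_UTF8_FWD_1(),
plain malloc() replaces contextMalloc(), and the block is sized by
the byte count instead of sql_utf8_char_count() to keep the
example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for SQL_UTF8_FWD_1(): byte length of the
 * UTF-8 character starting at z[0]. Malformed or truncated
 * sequences count as one byte.
 */
static uint8_t
utf8_char_len(const unsigned char *z, int bytes_left)
{
	uint8_t len = 1;
	if ((z[0] & 0xE0) == 0xC0)
		len = 2;
	else if ((z[0] & 0xF0) == 0xE0)
		len = 3;
	else if ((z[0] & 0xF8) == 0xF0)
		len = 4;
	return len <= bytes_left ? len : 1;
}

/*
 * Split trim_set into characters. One malloc() block holds the
 * pointer array followed by the length array, like in the patch.
 * Returns the character count, or -1 on allocation failure. The
 * caller frees *ind_chars_out.
 */
static int
split_trim_set(const unsigned char *trim_set, int trim_set_sz,
	       const unsigned char ***ind_chars_out,
	       uint8_t **char_len_out)
{
	/* NULL, not 0, for pointers. */
	*ind_chars_out = NULL;
	*char_len_out = NULL;
	if (trim_set_sz <= 0)
		return 0;
	assert(trim_set != NULL);
	/* Upper bound: at most one character per byte. */
	const unsigned char **ind_chars =
		malloc(trim_set_sz * (sizeof(*ind_chars) + 1));
	if (ind_chars == NULL)
		return -1;
	/* Sizes are numbers, so uint8_t instead of unsigned char. */
	uint8_t *char_len = (uint8_t *)&ind_chars[trim_set_sz];
	int char_cnt = 0;
	for (int i = 0; i < trim_set_sz; char_cnt++) {
		/* No cast: ind_chars already points to const data. */
		ind_chars[char_cnt] = trim_set + i;
		char_len[char_cnt] = utf8_char_len(trim_set + i,
						   trim_set_sz - i);
		i += char_len[char_cnt];
	}
	*ind_chars_out = ind_chars;
	*char_len_out = char_len;
	return char_cnt;
}
```

With 'const unsigned char **' the assignment into ind_chars needs
no cast, and 'uint8_t *' makes it clear that char_len stores
sizes, not text.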

>  
> -	if (sql_value_type(argv[0]) == SQL_NULL) {
> -		return;
> -	}
> -	zIn = sql_value_text(argv[0]);
> -	if (zIn == 0)
> -		return;
> -	nIn = sql_value_bytes(argv[0]);
> -	assert(zIn == sql_value_text(argv[0]));
> -	if (argc == 1) {
> -		static const unsigned char lenOne[] = { 1 };
> -		static unsigned char *const azOne[] = { (u8 *) " " };
> -		nChar = 1;
> -		aLen = (u8 *) lenOne;
> -		azChar = (unsigned char **)azOne;
> -		zCharSet = 0;
> -	} else if ((zCharSet = sql_value_text(argv[1])) == 0) {
> -		return;
> -	} else {
> -		const unsigned char *z = zCharSet;
> -		int trim_set_sz = sql_value_bytes(argv[1]);
> -		/*
> -		* Count the number of UTF-8 characters passing
> -		* through the entire char set, but not up
> -		* to the '\0' or X'00' character. This allows
> -		* to handle trimming set containing such
> -		* characters.
> -		*/
> -		nChar = sql_utf8_char_count(z, trim_set_sz);
> -		if (nChar > 0) {
> -			azChar =
> -			    contextMalloc(context,
> -					  ((i64) nChar) * (sizeof(char *) + 1));
> -			if (azChar == 0) {
> -				return;
> -			}
> -			aLen = (unsigned char *)&azChar[nChar];
> -			z = zCharSet;
> -			i = 0;
> -			nChar = 0;
> -			int handled_bytes_cnt = trim_set_sz;
> -			while(handled_bytes_cnt > 0) {
> -				azChar[nChar] = (unsigned char *)(z + i);
> -				SQL_UTF8_FWD_1(z, i, trim_set_sz);
> -				aLen[nChar] = (u8) (z + i - azChar[nChar]);
> -				handled_bytes_cnt -= aLen[nChar];
> -				nChar++;
> -			}
> +	const unsigned char *z = trim_set;
> +	/*
> +	 * Count the number of UTF-8 characters passing through
> +	 * the entire char set, but not up to the '\0' or X'00'
> +	 * character. This allows to handle trimming set
> +	 * containing such characters.
> +	 */
> +	int char_cnt = sql_utf8_char_count(z, trim_set_sz);
> +	if (char_cnt > 0) {
> +		ind_chars =
> +		    contextMalloc(context,
> +				  ((i64) char_cnt) *

8. Why do you need that cast to 'i64'? Anyway you access that memory by
'int' indexes in the next lines. Please, remove it.

> +				  (sizeof(unsigned char *) + 1));
> +		if (ind_chars == 0)
> +			return;
> +		char_len = (unsigned char *)&ind_chars[char_cnt];
> +		z = trim_set;
> +		i = 0;
> +		char_cnt = 0;
> +		int handled_bytes_cnt = trim_set_sz;
> +		while(handled_bytes_cnt > 0) {
> +			ind_chars[char_cnt] = (unsigned char *)(z + i);
> +			SQL_UTF8_FWD_1(z, i, trim_set_sz);
> +			char_len[char_cnt] = (u8) (z + i - ind_chars[char_cnt]);

9. Why do you need that cast to 'u8'? 'u8' == 'unsigned char', and the
type of that expression is already 'unsigned char'.

> +			handled_bytes_cnt -= char_len[char_cnt];
> +			char_cnt++;
>  		}
>  	}
> -	if (nChar > 0) {
> -		flags = SQL_PTR_TO_INT(sql_user_data(context));
> -		if (flags & 1) {
> -			while (nIn > 0) {
> +	if (char_cnt > 0) {

10. The indentation of the next 33 lines is huge, and they are
followed by just one 2-line function call. Just do 'goto result'
here if char_cnt == 0 and reduce the indentation. The same can be
done at line 1317 in order to reduce the indentation of the next
17 lines.
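
Roughly, I mean a control flow like in the sketch below. It is a
simplified illustration, not the real function: trim_both_sketch()
is a hypothetical name, it handles only single-byte characters,
always trims both sides, and returns the result through out
parameters instead of calling sql_result_text().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Trim single-byte characters listed in set from both ends of
 * str. The early 'goto result' keeps the trimming loops at the
 * first indentation level.
 */
static void
trim_both_sketch(const char *str, int str_sz, const char *set,
		 int set_sz, const char **res, int *res_sz)
{
	assert(str != NULL && set != NULL);
	/* Nothing to trim with: jump straight to the result. */
	if (set_sz == 0)
		goto result;
	/* Leading side. */
	while (str_sz > 0 && memchr(set, str[0], set_sz) != NULL) {
		str++;
		str_sz--;
	}
	/* Trailing side. */
	while (str_sz > 0 &&
	       memchr(set, str[str_sz - 1], set_sz) != NULL)
		str_sz--;
result:
	/* The real code would call sql_result_text() here. */
	*res = str;
	*res_sz = str_sz;
}
```

The body after the check is no longer wrapped into 'if', so it
fits into 80 symbols much more easily.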

> +		if ((flags & TRIM_LEADING) != 0) {
> +			while (input_str_sz > 0) {
>  				int len = 0;
> -				for (i = 0; i < nChar; i++) {
> -					len = aLen[i];
> -					if (len <= nIn
> -					    && memcmp(zIn, azChar[i], len) == 0)
> +				for (i = 0; i < char_cnt; i++) {
> +					len = char_len[i];
> +					if (len <= input_str_sz
> +					    && memcmp(input_str,
> +						      ind_chars[i], len) == 0)
>  						break;
>  				}
> -				if (i >= nChar)
> +				if (i >= char_cnt)
>  					break;
> -				zIn += len;
> -				nIn -= len;
> +				input_str += len;
> +				input_str_sz -= len;
>  			}
>  		}
> -		if (flags & 2) {
> -			while (nIn > 0) {
> +		if ((flags & TRIM_TRAILING) != 0) {
> +			while (input_str_sz > 0) {
>  				int len = 0;
> -				for (i = 0; i < nChar; i++) {
> -					len = aLen[i];
> -					if (len <= nIn
> -					    && memcmp(&zIn[nIn - len],
> -						      azChar[i], len) == 0)
> +				for (i = 0; i < char_cnt; i++) {
> +					len = char_len[i];
> +					if (len <= input_str_sz
> +					    && memcmp(&input_str[input_str_sz - len],
> +						      ind_chars[i], len) == 0)

11. This line is out of 80 symbols. And you saw that in your
editor, even without 'git diff' and console, because you have
80-rulers. So why did you decide not to fix it?

>  						break;
>  				}
> -				if (i >= nChar)
> +				if (i >= char_cnt)
>  					break;
> -				nIn -= len;
> +				input_str_sz -= len;
>  			}
>  		}
> -		if (zCharSet) {
> -			sql_free(azChar);
> -		}
> +		if (trim_set_sz != 0)
> +			sql_free(ind_chars);
> +	}
> +	sql_result_text(context, (char *)input_str, input_str_sz,
> +			SQL_TRANSIENT);
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has one arg
> + * only.
> + *
> + * Case: TRIM(<str>)
> + * Call trimming procedure with TRIM_BOTH as the flags and " " as
> + * the trimming set.
> + *
> + * @param context SQL context.

12. As I said in the previous reviews, we either omit doxygen
formal style completely, or use it correctly. If you want to use
doxygen, please, describe all the 3 parameters. If you do not
want, then omit @param/@retval section. The same for other places.

> + */
> +static void
> +trim_func_one_arg(sql_context * context, int argc, sql_value **argv)

13. In new code we use the explicit 'struct' keyword for struct
types - sql_context and sql_value. Also, we do not put whitespaces
after '*' when declaring a pointer type value. The same for other
places.
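
For example, the declaration could look like the sketch below. The
struct definitions are hypothetical stand-ins added only to make
the fragment compile on its own (call_cnt does not exist in the
real sql_context), and the @param descriptions of argc and argv
are my guesses, so reword them as you see fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in the SQL sources. */
struct sql_value;
struct sql_context {
	int call_cnt;
};

/**
 * Normalize args from @a argv input array when it has one arg
 * only.
 *
 * @param context SQL context.
 * @param argc Number of arguments in @a argv, always 1 here.
 * @param argv Function arguments, argv[0] is the input string.
 */
static void
trim_func_one_arg(struct sql_context *context, int argc,
		  struct sql_value **argv)
{
	assert(argc == 1);
	(void) argc;
	(void) argv;
	/* The real body would fetch the string and trim it. */
	context->call_cnt++;
}
```

Note the explicit 'struct', no space after '*', and all three
parameters described in the doxygen block.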

> +/**
> + * Normalize args from @a argv input array when it has two args.
> + *
> + * Case: TRIM(<character_set> FROM <str>)
> + * If user has specified <character_set> only, call trimming
> + * procedure with TRIM_BOTH as the flags and that trimming set.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH FROM <str>)
> + * If user has specified side keyword only, then call trimming
> + * procedure with the specified side and " " as the trimming set.
> + *
> + * @param context SQL context.
> + */
> +static void
> +trim_func_two_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 2);
> +	(void) argc;
> +
> +	if ((input_str = sql_value_text(argv[1])) == NULL)
> +		return;
> +	int input_str_sz = sql_value_bytes(argv[1]);
> +
> +	const char unsigned *trim_set;
> +	if (sql_value_type(argv[0]) == SQL_INTEGER) {
> +		trim_procedure(context, sql_value_int(argv[0]),
> +			       (const unsigned char *) " ", 1,
> +			       input_str, input_str_sz);
> +	} else if ((trim_set = sql_value_text(argv[0])) == NULL) {
> +		return;
> +	} else {

14. Please, apply.

@@ -1427,9 +1427,7 @@ trim_func_two_args(sql_context * context, int argc, sql_value **argv)
 		trim_procedure(context, sql_value_int(argv[0]),
 			       (const unsigned char *) " ", 1,
 			       input_str, input_str_sz);
-	} else if ((trim_set = sql_value_text(argv[0])) == NULL) {
-		return;
-	} else {
+	} else if ((trim_set = sql_value_text(argv[0])) != NULL) {
 		int trim_set_sz = sql_value_bytes(argv[0]);
 		trim_procedure(context, TRIM_BOTH, trim_set, trim_set_sz,
 			       input_str, input_str_sz);

> +		int trim_set_sz = sql_value_bytes(argv[0]);
> +		trim_procedure(context, TRIM_BOTH, trim_set, trim_set_sz,
> +			       input_str, input_str_sz);
> +	}
> +}
> +
> +/**
> + * Normalize args from @a argv input array when it has three args.
> + *
> + * Case: TRIM(LEADING/TRAILING/BOTH <character_set> FROM <str>)
> + * If user has specified side keyword and <character_set>, then
> + * call trimming procedure with that args.
> + *
> + * @param context SQL context.
> + */
> +static void
> +trim_func_three_args(sql_context * context, int argc, sql_value **argv)
> +{
> +	const unsigned char *input_str;
> +	assert(argc == 3);
> +	assert(sql_value_type(argv[0]) == SQL_INTEGER);
> +	(void) argc;
> +
> +	if ((input_str = sql_value_text(argv[2])) == NULL)
> +		return;
> +	int input_str_sz = sql_value_bytes(argv[2]);
> +
> +	const char unsigned *trim_set;
> +	if ((trim_set = sql_value_text(argv[1])) != NULL) {
> +		int trim_set_sz = sql_value_bytes(argv[1]);
> +		trim_procedure(context, sql_value_int(argv[0]), trim_set,
> +			       trim_set_sz, input_str, input_str_sz);
>  	}
> -	sql_result_text(context, (char *)zIn, nIn, SQL_TRANSIENT);

15. Please, apply.

@@ -1448,21 +1446,18 @@ trim_func_two_args(sql_context * context, int argc, sql_value **argv)
 static void
 trim_func_three_args(sql_context * context, int argc, sql_value **argv)
 {
-	const unsigned char *input_str;
+	const unsigned char *input_str, *trim_set;
 	assert(argc == 3);
 	assert(sql_value_type(argv[0]) == SQL_INTEGER);
 	(void) argc;
 
-	if ((input_str = sql_value_text(argv[2])) == NULL)
+	if ((input_str = sql_value_text(argv[2])) == NULL ||
+	    (trim_set = sql_value_text(argv[1])) == NULL)
 		return;
 	int input_str_sz = sql_value_bytes(argv[2]);
-
-	const char unsigned *trim_set;
-	if ((trim_set = sql_value_text(argv[1])) != NULL) {
-		int trim_set_sz = sql_value_bytes(argv[1]);
-		trim_procedure(context, sql_value_int(argv[0]), trim_set,
-			       trim_set_sz, input_str, input_str_sz);
-	}
+	int trim_set_sz = sql_value_bytes(argv[1]);
+	trim_procedure(context, sql_value_int(argv[0]), trim_set, trim_set_sz,
+		       input_str, input_str_sz);
 }

> diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
> index 099daf512..b49638d44 100644
> --- a/src/box/sql/parse.y
> +++ b/src/box/sql/parse.y
> +
> +%type expr_optional {struct Expr *}
> +%destructor expr_optional {sql_expr_delete(pParse->db, $$, false);}
> +
> +expr_optional(A) ::= . { A = NULL; }
> +expr_optional(A) ::= expr(X). { A = X.pExpr; }
> +
> +%type trim_specification {int}

16. It is not int - it is enum trim_side_mask.
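
A sketch of what I mean, under assumptions: the enum values below
are guessed from the old 0x1/0x2/0x3 userdata convention in the
removed trimFunc() comment, and trim_side_has() is a hypothetical
helper that exists only to show a typed consumer.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 1 for left, 2 for right, 3 for both. */
enum trim_side_mask {
	TRIM_LEADING = 1,
	TRIM_TRAILING = 2,
	TRIM_BOTH = TRIM_LEADING | TRIM_TRAILING
};

/*
 * With the grammar type declared as
 *     %type trim_specification {enum trim_side_mask}
 * the value reaches the C code already typed, so a consumer can
 * take the enum directly instead of a bare int.
 */
static bool
trim_side_has(enum trim_side_mask side, enum trim_side_mask bit)
{
	assert(side >= TRIM_LEADING && side <= TRIM_BOTH);
	return (side & bit) != 0;
}
```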