[tarantool-patches] Re: [PATCH 3/3] sql: fix passing FP values to integer iterator

Tue May 28 00:49:14 MSK 2019

Thanks for the patch! See 7 comments below.

1. Double whitespace in the commit title before 'FP'.

2. The patch does not work with unsigned numbers.

		tarantool> format = {}
		---
		...

		tarantool> format[1] = {name = 'ID', type = 'unsigned'}
		---
		...

		tarantool> format[2] = {name = 'A', type = 'unsigned'}
		---
		...

		tarantool> s = box.schema.create_space('T1', {format = format})
		---
		...

		tarantool> pk = s:create_index('pk')
		---
		...

		tarantool> sk = s:create_index('sk', {parts = {'A'}})
		---
		...

		tarantool> box.execute("INSERT INTO t1 VALUES (1, 1);")
		---
		- row_count: 1
		...

		tarantool> box.execute("SELECT a FROM t1 WHERE a IN (1.1, 2.1);")
		2019-05-28 00:34:19.922 [1452] main/102/interactive key_def.h:488 E> ER_KEY_PART_TYPE: Supplied key type of part 0 does not match index part type: expected unsigned
		---
		- error: 'Failed to execute SQL statement: Supplied key type of part 0 does not match
		    index part type: expected unsigned'
		...

On 24/05/2019 20:39, Nikita Pettik wrote:
> Before this patch it was impossible to compare indexed field of integer
> type and floating point value. For instance:
> 
> CREATE TABLE t1(id INT PRIMARY KEY, a INT UNIQUE);
> INSERT INTO t1 VALUES (1, 1);
> SELECT * FROM t1 WHERE a = 1.5;
> ---
> - error: 'Failed to execute SQL statement: Supplied key type of part 0 does not match
>     index part type: expected integer'
> ...
> 
> That happened due to the fact that type casting mechanism (OP_ApplyType)
> doesn't affect FP value when it is converted to integer. Hence, FP value
> was passed to the iterator over integer field which resulted in error.
> Meanwhile, comparison of integer and FP values is legal in SQL.  To cope
> with this problem for each equality comparison involving integer field
> we emit OP_MustBeInt, which checks whether value to be compared is
> integer or not. If the latter, we assume that result of comparison is
> always false and continue processing query.  For inequality constraints
> we pass auxiliary flag to OP_Seek** opcodes to notify it that one of key
> fields must be truncated to integer (in case of FP value) alongside with
> changing iterator's type: a > 1.5 -> a >= 2.
> 
> Closes #4187
> ---
>  src/box/sql/vdbe.c      | 23 +++++++++------
>  src/box/sql/wherecode.c | 76 +++++++++++++++++++++++++++++++++++--------------
>  test/sql/types.result   | 40 ++++++++++++++++++++++++++
>  test/sql/types.test.lua |  9 ++++++
>  4 files changed, 118 insertions(+), 30 deletions(-)
> 
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index 3bd82234e..3ead6b40f 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -3496,15 +3495,22 @@ case OP_SeekGT: {       /* jump, in3 */
>  	pC->seekOp = pOp->opcode;
>  #endif
>  	iKey = 0;
> -	reg_ipk = pOp->p5;
> -
> -	if (reg_ipk > 0) {
> +	/*
> +	 * In case floating value is intended to be passed to
> +	 * iterator over integer field, we must truncate it to
> +	 * integer value and change type of iterator:
> +	 * a > 1.5 -> a >= 2
> +	 */
> +	int int_field = pOp->p5;

3. Comments on P5 are now outdated.

>  
> +	if (int_field > 0) {
>  		/* The input value in P3 might be of any type: integer, real, string,
>  		 * blob, or NULL.  But it needs to be an integer before we can do
>  		 * the seek, so convert it.
>  		 */
> -		pIn3 = &aMem[reg_ipk];
> +		pIn3 = &aMem[int_field];
> +		if ((pIn3->flags & MEM_Null) != 0)
> +			goto skip_truncate;
>  		if ((pIn3->flags & (MEM_Int|MEM_Real|MEM_Str))==MEM_Str) {
>  			mem_apply_numeric_type(pIn3);
>  		}
> diff --git a/src/box/sql/wherecode.c b/src/box/sql/wherecode.c
> index 977c0fced..e779729c7 100644
> --- a/src/box/sql/wherecode.c
> +++ b/src/box/sql/wherecode.c
> @@ -1086,31 +1086,61 @@ sqlWhereCodeOneLoopStart(WhereInfo * pWInfo,	/* Complete information about the W
>  			start_constraints = 1;
>  		}
>  		struct index_def *idx_pk = space->index[0]->def;
> -		int fieldno = idx_pk->key_def->parts[0].fieldno;
> -
>  		uint32_t pk_part_count = idx_pk->key_def->part_count;
> -		if (pk_part_count == 1 &&
> -		    space->def->fields[fieldno].type == FIELD_TYPE_INTEGER) {
> -			/* Right now INTEGER PRIMARY KEY is the only option to
> -			 * get Tarantool's INTEGER column type. Need special handling
> -			 * here: try to loosely convert FLOAT to INT. If RHS type
> -			 * is not INT or FLOAT - skip this ites, i.e. goto addrNxt.
> -			 */
> -			int limit = pRangeStart == NULL ? nEq : nEq + 1;
> -			for (int i = 0; i < limit; i++) {
> -				if (idx_def->key_def->parts[i].fieldno ==
> -				    idx_pk->key_def->parts[0].fieldno) {
> -					/* Here: we know for sure that table has INTEGER
> -					   PRIMARY KEY, single column, and Index we're
> -					   trying to use for scan contains this column. */
> -					if (i < nEq)
> -						sqlVdbeAddOp2(v, OP_MustBeInt, regBase + i, addrNxt);
> -					else
> -						force_integer_reg = regBase + i;
> -					break;
> -				}
> +		/*
> +		 * Tarantool's iterator over integer fields doesn't
> +		 * tolerate floating point values. Hence, if term
> +		 * is equality comparison and value of operand is
> +		 * not integer, we can skip it since it always
> +		 * results in false: INT a == 0.5 -> false;
> +		 * It is done using OP_MustBeInt facilities.
> +		 * In case term is greater comparison (a > ?), we
> +		 * should notify OP_SeekGT to process truncation of
> +		 * floating point value: a > 0.5 -> a >= 1;
> +		 * It is done by setting P5 flag for OP_Seek*.
> +		 * It is worth mentioning that we do not need
> +		 * this step when it comes for less comparison:
> +		 * key is NULL in this case: values are ordered
> +		 * as  NULL, ... NULL, min_value, ..., so to
> +		 * fetch min value we pass NULL to GT iterator.

4. I did not get the 'less comparator == compare with NULL'
idea. Firstly you describe how to compare 'a > ?', this is
fully understandable. Then you, as I see, go to the 'a < ?'
case, but somewhy say, that it is the same as 'NULL < ?'.
Do not understand.

> +		 * The only exception is less comparison in
> +		 * conjunction with ORDER BY DESC clause:
> +		 * in such situation we use LE iterator and
> +		 * truncated value to compare. But then
> +		 * pRangeStart == NULL.
> +		 * This procedure is correct for compound index:
> +		 * only one comparison of less/greater type can be
> +		 * used at the same time. For instance,
> +		 * a < 1.5 AND b > 0.5 is handled by SeekGT using
> +		 * column a and fetching column b from tuple and
> +		 * OP_Le comparison.
> +		 *
> +		 * Note that OP_ApplyType, which is emitted before
> +		 * OP_Seek** doesn't truncate floating point to
> +		 * integer. That's why we need this routine.

5. Why can't OP_ApplyType truncate the numbers? Because it does
not know, to which side round the values? Then probably we can
drop OP_ApplyType invocations?

> +		 * Also, note that terms are separated by OR
> +		 * predicates, so we consider term as sequence
> +		 * of AND'ed predicates.
> +		 */
> +		int seek_addr = 0;
> +		for (int i = 0; i < nEq; i++) {
> +			enum field_type type = idx_def->key_def->parts[i].type;
> +			if (type == FIELD_TYPE_INTEGER) {
> +				/*
> +				 * OP_MustBeInt consider NULLs as
> +				 * non-integer values, so firstly
> +				 * check whether value is NULL or not.
> +				 */
> +				seek_addr = sqlVdbeAddOp1(v, OP_IsNull,
> +							  regBase);

6. I do not understand, how does it work. You rewrite
seek_addr in the cycle. It means, that if there are 2 integers,
the first OP_IsNull address is rewritten by the second, and
below, in sqlVdbeJumpHere(v, seek_addr), you will update P2
for the last comparison only. The first OP_IsNull.P2 will stay 0.

> +				sqlVdbeAddOp2(v, OP_MustBeInt, regBase + i,
> +					      addrNxt);
>  			}
>  		}
> +		/* Inequality constraint comes always at the end of list. */
> +		if (pRangeStart != NULL &&
> +		    idx_def->key_def->parts[nEq].type == FIELD_TYPE_INTEGER)
> +			force_integer_reg = regBase + nEq;
>  		emit_apply_type(pParse, regBase, nConstraint - bSeekPastNull,
>  				start_types);
>  		if (pLoop->nSkip > 0 && nConstraint == pLoop->nSkip) {
> @@ -1122,6 +1152,8 @@ sqlWhereCodeOneLoopStart(WhereInfo * pWInfo,	/* Complete information about the W
>  			op = aStartOp[(start_constraints << 2) +
>  				      (startEq << 1) + bRev];
>  			assert(op != 0);
> +			if (seek_addr != 0)
> +				sqlVdbeJumpHere(v, seek_addr);
>  			sqlVdbeAddOp4Int(v, op, iIdxCur, addrNxt, regBase,
>  					     nConstraint);
>  			/* If this is Seek* opcode, and IPK is detected in the
> diff --git a/test/sql/types.result b/test/sql/types.result
> index 758018eb5..274d73dd2 100644
> --- a/test/sql/types.result
> +++ b/test/sql/types.result
> @@ -927,3 +927,43 @@ box.execute("SELECT * FROM tboolean WHERE s1 = 1.123;")
>  box.space.TBOOLEAN:drop()
>  ---
>  ...
> +box.execute("CREATE TABLE t1(id INT PRIMARY KEY, a INT UNIQUE);")
> +---
> +- row_count: 1
> +...
> +box.execute("INSERT INTO t1 VALUES (1, 1);")
> +---
> +- row_count: 1
> +...
> +box.execute("SELECT a FROM t1 WHERE a IN (1.1, 2.1);")
> +---
> +- metadata:
> +  - name: A
> +    type: integer
> +  rows: []
> +...
> +box.execute("SELECT a FROM t1 WHERE a = 1.1;")
> +---
> +- metadata:
> +  - name: A
> +    type: integer
> +  rows: []
> +...
> +box.execute("SELECT a FROM t1 WHERE a > 1.1;")
> +---
> +- metadata:
> +  - name: A
> +    type: integer
> +  rows: []
> +...
> +box.execute("SELECT a FROM t1 WHERE a < 1.1;")
> +---
> +- metadata:
> +  - name: A
> +    type: integer
> +  rows:
> +  - [1]

7. Please, add the Kostja's example about '= 1.0'.

> +...
> +box.space.T1:drop()
> +---
> +...