[tarantool-patches] Re: [PATCH] sql: add index_def to struct Index

Tue May 22 22:10:13 MSK 2018

Hello. Thanks for the patch! See 26 comments below.

On 22/05/2018 19:03, Ivan Koptelov wrote:
> Currently there is a problem with
> non-deterministic nullability in
> index_def. To see the error just
> uncomment printfs on build.c 2999
> and run any sql-tap test that
> has to do with indexes, e.g
> distinct.test.lua.
> github branch: https://github.com/tarantool/tarantool/tree/sb/gh-3369-use-index-def-in-select-and-where
> ---
>   src/box/sql/build.c     | 179 +++++++++++++++++++++++++++++++++++++++++++++++-
>   src/box/sql/sqliteInt.h |   2 +
>   2 files changed, 179 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/sql/build.c b/src/box/sql/build.c
> index 742f8a572..6bd70fa9f 100644
> --- a/src/box/sql/build.c
> +++ b/src/box/sql/build.c
> @@ -43,6 +43,7 @@
>    *     COMMIT
>    *     ROLLBACK
>    */
> +#include <ctype.h>

1. See the comment 2 and remove it, please.

>   #include "sqliteInt.h"
>   #include "vdbeInt.h"
>   #include "tarantoolInt.h"
> @@ -2841,9 +2842,173 @@ index_is_unique(Index *idx)
>   	return tnt_index->def->opts.is_unique;
>   }
>   
> +/*
> + * In CREATE INDEX requests user may set binary collation with COLLATE
> + * [BINARY, binary, "binary" and "bInarY"] This function is to handle all
> + * these cases.
> + */
> +bool is_binary_collation(const char *coll_name)
> +{
> +	char *upper_case_name =
> +		malloc(sizeof(*coll_name) * strlen(coll_name));
> +	for (int i = 0; i < (int) strlen(coll_name); i++) {
> +		upper_case_name[i] = toupper(coll_name[i]);
> +	}
> +
> +	bool res = (strcmp(upper_case_name, "BINARY") == 0);
> +	free(upper_case_name);
> +	return res;

2. All this function is identical to simple strcasecmp(coll_name, "binary").

> +}
> +
> +//bool is_index_equal(Index * old, struct index_def *new)
> +//{
> +//	assert(strcmp(old->zName, new->name);
> +//	return true;
> +//}

3. Garbage diff.

> +
> +struct index_def *create_index_def(Table *pTab, const char *name,
> +				   uint32_t name_len, int onError,
> +				   ExprList *pList, u8 idxType)

4. According to Tarantool code style you should wrap line after return type
definition. And please, use snake_case for variable names.

> +{
> +	struct space_def *space_def = pTab->def;
> +	struct space *space = space_by_id(space_def->id);

5. space_by_id takes a space from the space cache, but during
'CREATE TABLE' generation the space is not in the cache. This will
return NULL.

You must not use space * here at all. I see, that you use it
in a single place to find a primary index key_def, but you can get it
from the pTab from the first pIndex - it is always primary one.

> +
> +	uint32_t iid = 0;
> +	Index *temp = pTab->pIndex;
> +	while (temp != NULL) {
> +		iid++;
> +		temp = temp->pNext;

6. Bad idea. You must take Index * as the function argument, and initialize its
index_def.

> +	}
> +
> +	struct index_opts *opts = (struct index_opts *) malloc(sizeof(*opts));

7. Why can not you declare it on the stack?

> +	index_opts_create(opts);
> +	opts->sql = NULL;
> +
> +	if (onError == ON_CONFLICT_ACTION_NONE) {
> +		opts->is_unique = false;
> +	} else {
> +		opts->is_unique = true;
> +	}

8. Do not put 'if' body in {} when it consists of a single line.

9. Here you can simply do this:
opts->is_unique = onError != ON_CONFLICT_ACTION_NONE;

> +
> +	struct key_part_def *part_def_tmp = (struct key_part_def *)
> +		malloc(sizeof(*part_def_tmp) * pList->nExpr);

10. Please, do not do this. Use key_def_new + multiple key_def_set_part.
You use too many mallocs.

> +
> +	uint32_t part_count = 0;
> +	int i = 0;
> +	struct ExprList_item *pListItem;
> +
> +	/*
> +	 * In case of ordinary index creation we want ot store a
> +	 * SQL statement of this creation.
> +	 */
> +	if (idxType == SQLITE_IDXTYPE_APPDEF) {
> +		asprintf(&opts->sql,"CREATE INDEX %s ON %s(",
> +			 name, space_def->name);

11. You should check if asprintf is failed. Or do not use asprintf at
all. See below how.

> +	}
> +
> +	for (i = 0, pListItem = pList->a; i < pList->nExpr; pListItem++, i++) {
> +		Expr *pCExpr = sqlite3ExprSkipCollate(pListItem->pExpr);
> +
> +		if (pCExpr->op == TK_COLUMN) {
> +			uint32_t fieldno = pCExpr->iColumn;
> +			part_def_tmp[i].fieldno = fieldno;
> +			part_def_tmp[i].nullable_action =
> +				ON_CONFLICT_ACTION_DEFAULT;

12. Wrong - action default is internal thing, that can not be
set in key_part actually. You must set action to the same value as
the space field is set.

> +			part_def_tmp[i].type =
> +				space_def->fields[fieldno].type;
> +
> +//			part_def_tmp[i].is_nullable =
> +//				(fieldno < space_def->field_count ?
> +//				 space_def->fields[fieldno].is_nullable :
> +//				 false);
> +
> +			part_def_tmp[i].is_nullable =
> +				table_column_is_nullable(pTab->def, fieldno);

13. Please, do not use table_column_is_nullable. It does space_by_id(def->id),
but here def->id is not defined when you create a primary index inside a CREATE
TABLE request.

> +
> +			part_def_tmp[i].coll_id =
> +				space_def->fields[fieldno].coll_id ?
> +				space_def->fields[fieldno].coll_id : COLL_NONE;

14. Do not use integers as booleans.

15. Why not just part_def_tmp[i].coll_id = space_def->fields[fieldno].coll_id?
Besides, coll_id == 0 is valid collation identifier. It is not the same as COLL_NONE.

> +			part_count++;

16. Why do you need this counter? You already know part count - it is equal
to pList->nExpr.

> +
> +			if (opts->sql != NULL) {

17. How can it be NULL?

> +				if (part_count == 1) {
> +					asprintf(&opts->sql,"%s%s,",opts->sql,
> +						 space_def->fields[fieldno].name);
> +				} else {
> +					asprintf(&opts->sql,"%s %s,",opts->sql,
> +						 space_def->fields[fieldno].name);
> +				}
> +			}
> +		} else {
> +			assert(pCExpr->op == TK_COLUMN);

18. So I see this:
if (something) {
  ...
} else {
  assert(something);
}

Very strange. This assert fails always.

> +		}
> +
> +		if (pListItem->pExpr->op == TK_COLLATE) {

19. Here you completely skipped column initialization when it has
COLLATE. See sqlite3ExprSkipCollate, the same code on line 3235, and
how COLLATE is stored in Expr tree (it replaces the original Expr making
it to be COLLATE's chils). This is why here pList->nExpr is the same as
part count.

> +			const char *coll_name = pListItem->pExpr->u.zToken;
> +			assert(coll_name != NULL);
> +			size_t coll_name_len = strlen(coll_name);
> +			struct coll *coll = coll_by_name(coll_name, coll_name_len);
> +
> +			if (is_binary_collation(coll_name)) {
> +				part_def_tmp[i].coll_id = COLL_NONE;
> +			} else {
> +				assert(coll != NULL);

20. Why? Collation can be not found. Use sqlite3GetCollSeq. But before do
rebase on the latest 2.0. You branch looks very out of date. Many things
were changed after the latest 1.10 into 2.0 merge.

> +				part_def_tmp[i].coll_id = coll->id;
> +			}
> +
> +			if (opts->sql != NULL) {

21. How can it be NULL?

> +				/*
> +				 * If index part has a collation then
> +				 * corresponding part of statement should
> +				 * be changed from 'column_name,' to
> +				 * 'column_name COLLATE collation_name,'.
> +				 */
> +				opts->sql[strlen(opts->sql) - 1] = ' ';
> +				asprintf(&opts->sql,"%sCOLLATE %s,", opts->sql,
> +					 coll_name);

22. Here and in all others asprintf you have leak - you reset opts->sql with new
malloc (called in asprintf), but do not free the original. And please, use region
instead. You can accumulate SQL parts via multiple region_alloc + single
region_join + strdup at the end. It makes leaks be impossible, and does not
require any malloc + free.

> +			}
> +		}
> +		if (pListItem->pExpr->op != TK_COLUMN &&
> +		    pListItem->pExpr->op != TK_COLLATE) {
> +			printf("error %d\n", pListItem->pExpr->op);
> +		}
> +	}
> +
> +	if (idxType == SQLITE_IDXTYPE_APPDEF) {
> +		opts->sql[strlen(opts->sql) - 1] = ')';
> +	}
> +
> +	struct key_part_def *part_def = (struct key_part_def *)
> +		malloc(sizeof(*part_def) * part_count + 1);
> +
> +	memcpy(part_def, part_def_tmp, sizeof(*part_def) * part_count);
> +	free(part_def_tmp);
> +
> +	struct key_def *key_def = key_def_new_with_parts(part_def, part_count);
> +	assert(key_def != NULL);

23. All this things can be removed by usage of key_def_new +
multiple key_def_set_part.

> +
> +	struct key_def *pk_key_def;
> +	if (idxType == SQLITE_IDXTYPE_APPDEF) {
> +		pk_key_def = space_index_key_def(space, 0);

24. You must not use a space here.

> +	} else {
> +		pk_key_def = NULL;
> +	}
> +
> +	enum index_type default_type = TREE;
> +
> +	printf("space id %d index name %s index iid %d\n", space_def->id, name, iid);
> +	printf("nullables\n");
> +	for (uint32_t j = 0; j < key_def->part_count; j++){
> +		printf("%d\n", key_part_is_nullable(&key_def->parts[i]));

25. You asked me why you see unstable nullability. Here the answer. You iterate over
j but use i as an index for key_def->parts array.

> +	}
> +
> +	return index_def_new(space_def->id, iid, name, name_len,
> +			     default_type, opts, key_def, pk_key_def);

26. Why no just inline TREE instead of declaring a variable default_type?