[tarantool-patches] Re: [PATCH 3/3] sql: move default col values to Tarantool's core

Kirill Yukhin kyukhin at tarantool.org
Sat Mar 31 06:55:04 MSK 2018


Hello Vlad!
I've fixed most of your inputs.

On 29 мар 17:00, v.shpilevoy at tarantool.org wrote:
> See below 20 comments.
> 
> > 29 марта 2018 г., в 9:42, Kirill Yukhin <kyukhin at tarantool.org> написал(а):
> > 
> > Extract expressions parsing into separate routine to
> > allow Tarantool's backend compile DEFAULT statements w/o
> > SQL machinery at all. So far, for DEFAULT values no context
> > is needed: only constant expressions and built-ins are allowed.
> > In future, table context will be added to allow use column
> > values for CHECK constraint expressions.
> > 
> > Move DEFAULT string value to space_def. Move compiled expresion
> > to field_def as well. Reason not to move compiled expression
> > to tuple_field as follows: we do not want to engage parser
> > during tuple validation. So, do it in alter.cc.
> > 
> > In order to allow expression duplication in alter.cc: expose
> > those routines from expr.c and make their names correspond to
> > coding style.
> > 
> > Since recovery is performed before SQL fronend initialization:
> > split it into two pieces: 1. create SQL handler, enable
> > all subsystems 2. Do recovery.  This will allow to run
> > parser during recovery, since it needs db handle so far.
> > 
> > Part of #3235
> > ---
> > src/CMakeLists.txt      |   2 +-
> > src/box/alter.cc        |  17 +
> > src/box/box.cc          |   3 +-
> > src/box/field_def.c     |   5 +-
> > src/box/field_def.h     |   4 +
> > src/box/space_def.c     | 105 ++++--
> > src/box/space_def.h     |  10 +-
> > src/box/sql.c           |  59 +++-
> > src/box/sql.h           |  30 ++
> > src/box/sql/build.c     |  18 +-
> > src/box/sql/delete.c    |  12 +-
> > src/box/sql/expr.c      | 114 ++++---
> > src/box/sql/fkey.c      |  13 +-
> > src/box/sql/insert.c    |  32 +-
> > src/box/sql/main.c      |  15 -
> > src/box/sql/parse.c     | 876 ++++++++++++++++++++++++------------------------
> > src/box/sql/parse.y     |  32 +-
> > src/box/sql/resolve.c   |  12 +-
> > src/box/sql/select.c    |  28 +-
> > src/box/sql/sqliteInt.h |  11 +-
> > src/box/sql/tokenize.c  | 121 +++++++
> > src/box/sql/trigger.c   |  12 +-
> > src/box/sql/update.c    |   2 +-
> > src/box/sql/wherecode.c |   2 +-
> > src/box/sql/whereexpr.c |   6 +-
> > 25 files changed, 928 insertions(+), 613 deletions(-)
> > 
> > diff --git a/src/box/alter.cc b/src/box/alter.cc
> > index 299dcb0..b976d0f 100644
> > --- a/src/box/alter.cc
> > @@ -403,6 +404,22 @@ field_def_decode(struct field_def *field, const char **data,
> > 			  tt_sprintf("collation is reasonable only for "
> > 				     "string, scalar and any fields"));
> > 	}
> > +
> > +	if (field->default_value != NULL) {
> > +		struct Expr *expr = NULL;
> > +		char *err = 0;
> 
> 1. err = NUlL.
Fixed.

> > +		/* Recovery is performed before SQL FE is initialized.
> > +		 * Postpone AST compilation if in process of recovery.
> > +		 */
> > +		if (sql_get() != NULL) {
> 
> 1. In the commit message you said: 
> "Since recovery is performed before SQL fronend initialization:
> split it into two pieces: 1. create SQL handler, enable
> all subsystems 2. Do recovery.  This will allow to run
> parser during recovery, since it needs db handle so far."
> 
> Why sql_get() can be NULL here?
> And why do you ignore error?
> In a case of error in one of next field definitions, or because of
> OOM, or error further in space_def_new_from_tuple, the default_value_ast
> leaks. Other field_def fields are not deleted explicitly here, because they
> are on region memory, that is deleted on a transaction rollback.
As agreed verbally, I've put scope guard to free memory in case something
went wrong. The only place which I was managed to find was in
on_replace_dd_space(). It looks like already properly guarded everywhere
else.
Also, I've refactored tarantoolSqlite3EphemeralCreate, to avoid multiple
leaks there.
Vlad, this is new area to me, so could you pls put especial attention
to such stuff while reviewing my changes?

> > +			sql_compile_expr(sql_get(),
> > +					 field->default_value,
> > +					 &expr,
> > +					 &err);
> > +			assert(err == NULL);
> > +			field->default_value_ast = expr;
> 
> 2. How about another name? For example, default_value_expr? I know,
> formally it is AST, but actually it is expr. Or lets rename struct Expr to
> struct expr_ast, or struct sql_ast or something.
Rgr. Done, although I am still referencing in field def that the field is
actually AST.

> > diff --git a/src/box/space_def.c b/src/box/space_def.c
> > index 1b3c305..7b9a664 100644
> > --- a/src/box/space_def.c
> > +++ b/src/box/space_def.c
> > @@ -49,7 +50,7 @@ const struct opt_def space_opts_reg[] = {
> > /**
> >  * Size of the space_def.
> >  * @param name_len Length of the space name.
> > - * @param field_names_size Size of all names.
> > + * @param fields Fields array of space format.
> >  * @param field_count Space field count.
> >  * @param[out] names_offset Offset from the beginning of a def to
> >  *             a field names memory.
> 
> 3. Add a comment about second out parameter, please.
Done.

> > @@ -58,25 +59,37 @@ const struct opt_def space_opts_reg[] = {
> >  * @retval Size in bytes.
> >  */
> > static inline size_t
> > -space_def_sizeof(uint32_t name_len, uint32_t field_names_size,
> > +space_def_sizeof(uint32_t name_len, const struct field_def *fields,
> > 		 uint32_t field_count, uint32_t *names_offset,
> > -		 uint32_t *fields_offset)
> > +		 uint32_t *fields_offset, uint32_t *def_ast_offset)
> > {
> > +	uint32_t field_strs_size = 0;
> > +	uint32_t def_exprs_size = 0;
> > +	for (uint32_t i = 0; i < field_count; ++i) {
> > +		field_strs_size += strlen(fields[i].name) + 1;
> > +		if (fields[i].default_value != NULL) {
> > +			int len = strlen(fields[i].default_value);
> > +			field_strs_size += len + 1;
> > +		}
> > +		if (fields[i].default_value_ast != NULL) {
> 
> 4. If I correctly understand, (fields[i].default_value != NULL) ==
>                               (fields[i].default_value_ast != NULL).
> So if a default_value != NULL, then it is guaranteed the
> default_value_ast != NULL too. Please, correct me, or move this code
> into the body of the previous 'if' and add an assertion.
Done, although frankly speaking, I see no benefit from this.

> > @@ -92,12 +105,25 @@ space_def_dup(const struct space_def *src)
> > 			return NULL;
> > 		}
> > 	}
> > -	char *name_pos = (char *)ret + names_offset;
> > +	char *strs_pos = (char *)ret + strs_offset;
> > +	char *expr_pos = (char *)ret + def_ast_offset;
> > 	if (src->field_count > 0) {
> > 		ret->fields = (struct field_def *)((char *)ret + fields_offset);
> > 		for (uint32_t i = 0; i < src->field_count; ++i) {
> > -			ret->fields[i].name = name_pos;
> > -			name_pos += strlen(name_pos) + 1;
> > +			ret->fields[i].name = strs_pos;
> > +			strs_pos += strlen(strs_pos) + 1;
> > +			if (src->fields[i].default_value != NULL) {
> > +				ret->fields[i].default_value = strs_pos;
> > +				strs_pos += strlen(strs_pos) + 1;
> > +			}
> 
> 5. Same as 3.
Same as 4. I guess. Done.

> > +			struct Expr *def_expr_ast = src->fields[i].default_value_ast;
> > +			if (def_expr_ast != NULL) {
> > +				struct Expr *expr = NULL;
> 
> 6. Useless initialization.
Fixed.

> > +				expr = sql_expr_dup(sql_get(), def_expr_ast,
> > +						    0, &expr_pos);
> 
> 7. If I correctly understand the sql_expr_dup, after the expression is duplicated,
> expr_pos points the byte right after the duplicated Expr. Can you please check it?
> If it is true, then you must skip this line "expr_pos += sql_duped_expr_size(expr, 0);",
> and I can not understand why it works now.
This is total mess. I think, I've mentioned that in cover letter: Expr is not changed after
invocation of this routine. That's why I am explicitly promoting pointer's value few lines
later. All problems reside in fact, that memory management for ASTs in legacy SQL is
inconsistent: skeleton of the tree uses a buffer (if provided), everything else is
simply malloc'ed. Buffer pointer is promored only on non-reduced expressions, for
unknown reason. I'll submit separate issue asking move all allocations for AST to
region allocator. Also, I've put an assert that pointer wasn't promoted and commented
this piece of code.

> > @@ -147,18 +171,35 @@ space_def_new(uint32_t id, uint32_t uid, uint32_t exact_field_count,
> > 		}
> > 	}
> > 	def->field_count = field_count;
> > -	if (field_count == 0) {
> > +	if (field_count == 0)
> > 		def->fields = NULL;
> > -	} else {
> > -		char *name_pos = (char *)def + names_offset;
> > +	else {
> > +		char *strs_pos = (char *)def + strs_offset;
> > +		char *expr_pos = (char *)def + def_ast_offset;
> > 		def->fields = (struct field_def *)((char *)def + fields_offset);
> > 		for (uint32_t i = 0; i < field_count; ++i) {
> > 			def->fields[i] = fields[i];
> > -			def->fields[i].name = name_pos;
> > +			def->fields[i].name = strs_pos;
> > 			uint32_t len = strlen(fields[i].name);
> > 			memcpy(def->fields[i].name, fields[i].name, len);
> > 			def->fields[i].name[len] = 0;
> > -			name_pos += len + 1;
> > +			strs_pos += len + 1;
> > +
> > +			if (fields[i].default_value != NULL) {
> > +				def->fields[i].default_value = strs_pos;
> > +				len = strlen(fields[i].default_value);
> > +				memcpy(def->fields[i].default_value, fields[i].default_value, len);
> > +				def->fields[i].default_value[len] = 0;
> > +				strs_pos += len + 1;
> > +			}
> 
> 8. Same as 5, 6, 7.
Fixed/commented.

> > @@ -217,3 +258,17 @@ space_def_check_compatibility(const struct space_def *old_def,
> > 	return 0;
> > }
> > 
> > +void
> > +space_def_delete(struct space_def *def)
> > +{
> > +	space_opts_destroy(&def->opts);
> > +	tuple_dictionary_unref(def->dict);
> > +	for (uint32_t i = 0; i < def->field_count; ++i) {
> > +		if (def->fields[i].default_value_ast != NULL) {
> > +			sql_expr_free(sql_get(),
> > +				      def->fields[i].default_value_ast);
> > +		}
> 
> 9. Why does it work, if your expressions are the part of megamallocs above?
Fix is because of inconsistent allocation ficilities of AST. I've introduced
new flag designating if Expr's skeleton was allocated externally. In such a
case nothing is actually freed for skeleton, postponing deallocation to single
mega-free.
 
> > diff --git a/src/box/sql.c b/src/box/sql.c
> > index 6d8ef9a..8606be6 100644
> > --- a/src/box/sql.c
> > +++ b/src/box/sql.c
> > @@ -56,7 +56,7 @@
> > #include "xrow.h"
> > #include "iproto_constants.h"
> > 
> > -static sqlite3 *db;
> > +static sqlite3 *db = 0;
> 
> 10. Please, use NULL instead of 0. Common comment: assigning 0 in the
> code, where the variable declaration is not strictly visible is very
> confusing - it seems to be integer.
Done.
 
> > @@ -1781,3 +1814,25 @@ tarantoolSqlGetMaxId(uint32_t space_id, uint32_t index_id, uint32_t fieldno,
> > 
> > 	return tuple_field_u64(tuple, fieldno, max_id);
> > }
> > +
> > +/**
> > + * Given space_id and field number, return default value
> > + * for the field.
> > + * @param space_id Space ID.
> > + * @param fieldno Field index.
> > + * @retval Pointer to AST corresponding to default value.
> > + * Can be NULL if no DEFAULT specified or it is a view.
> > + */
> 
> 11. Please, put a function description above a declaration instead of implementation.
> We put description above implementation only if a declaration is absent.
Done. For all new routines in sql.h 

> > +struct Expr*
> > +space_get_column_default_expr(uint32_t space_id, uint32_t fieldno)
> > +{
> > +	struct space *space;
> > +	space = space_cache_find(space_id);
> > +	assert(space != NULL);
> > +	assert(space->def != NULL);
> > +	if (space->def->opts.is_view)
> > +		return NULL;
> 
> 12. How is it possible? Must not it be an assertion?
Yes, it's possible. 
Please, see example from sql-tap/triggerC-11.4:
            CREATE TABLE t2(a PRIMARY KEY, b);
            CREATE VIEW v2 AS SELECT * FROM t2;
            CREATE TRIGGER tv2 INSTEAD OF INSERT ON v2 BEGIN
              INSERT INTO log VALUES((SELECT coalesce(max(id),0) + 1 FROM log),
                                     new.a, new.b);
            END;
            INSERT INTO v2 DEFAULT VALUES;
In such a case, executor tries to retrieve default value column of a  VIEW,
however semantics is to return absent default value and do insert ""/NULLs.

> > diff --git a/src/box/sql.h b/src/box/sql.h
> > @@ -55,6 +60,31 @@ sql_free();
> > struct sqlite3 *
> > sql_get();
> > 
> > +struct Expr;
> > +struct Parse;
> > +struct Select;
> > +
> > +int
> > +sql_compile_expr(struct sqlite3 *db,
> > +		 const char *expr,
> > +		 struct Expr **result,
> > +		 char **err);
> > +
> > +void
> > +sql_extract_select_expr(struct Parse *parser, struct Select *select);
> > +
> > +struct Expr*
> > +space_get_column_default_expr(uint32_t space_id, uint32_t fieldno);
> > +
> > +int
> > +sql_duped_expr_size(struct Expr *p, int flags);
> 
> 13. How about simply expr_size, not duped? And please, name methods of a specific
> struct using convention <struct_name>_<method_name>. And for struct size
> methods use sizeof name (see space_def_sizeof, key_def_sizeof). For 'free' methods
> we use name 'delete'. Delete means, that the struct's memory is freed. Also we
> have 'destroy' methods, which frees only struct members, but not the struct itself.
> Here it is expr_sizeof(), expr_delete(), expr_dup().
> 
> And please, do not use 'get' for getters. Here it is space_column_default_expr().
Done.

> > diff --git a/src/box/sql/build.c b/src/box/sql/build.c
> > index 05a7cc9..c3bd637 100644
> > --- a/src/box/sql/build.c
> > +++ b/src/box/sql/build.c
> > @@ -1018,9 +1018,7 @@ sqlite3AddCheckConstraint(Parse * pParse,	/* Parsing context */
> > 		}
> > 	} else
> > #endif
> > -	{
> > -		sqlite3ExprDelete(pParse->db, pCheckExpr);
> > -	}
> > +		sql_expr_free(pParse->db, pCheckExpr);
> 
> 14. Extra tab.
Why? It belongs to `else`, which is under ifndef.

> > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
> > index 89dcb23..093f99e 100644
> > --- a/src/box/sql/expr.c
> > +++ b/src/box/sql/expr.c
> > @@ -1203,11 +1203,16 @@ sqlite3ExprDeleteNN(sqlite3 * db, Expr * p)
> > 	}
> > }
> > 
> > +/**
> > + * Recursively free storage occupied by AST expr.
> > + * @param db SQL handle.
> > + * @param p Pointer to root node.
> > + */
> 
> 15. Same as 11.
Fixed.

> > @@ -1401,29 +1412,28 @@ exprDup(sqlite3 * db, Expr * p, int dupFlags, u8 ** pzBuffer)
> > 			if (ExprHasProperty(p, EP_xIsSelect)) {
> > 				pNew->x.pSelect =
> > 				    sqlite3SelectDup(db, p->x.pSelect,
> > -						     dupFlags);
> > +						     flags);
> > 			} else {
> > 				pNew->x.pList =
> > 				    sqlite3ExprListDup(db, p->x.pList,
> > -						       dupFlags);
> > +						       flags);
> > 			}
> > 		}
> > 
> > 		/* Fill in pNew->pLeft and pNew->pRight. */
> > 		if (ExprHasProperty(pNew, EP_Reduced | EP_TokenOnly)) {
> > -			zAlloc += dupedExprNodeSize(p, dupFlags);
> > +			zAlloc += dupedExprNodeSize(p, flags);
> > 			if (!ExprHasProperty(pNew, EP_TokenOnly | EP_Leaf)) {
> > 				pNew->pLeft = p->pLeft ?
> > -				    exprDup(db, p->pLeft, EXPRDUP_REDUCE,
> > +				    sql_expr_dup(db, p->pLeft, EXPRDUP_REDUCE,
> > 					    &zAlloc) : 0;
> 
> 16. Bad alignment.
Fixed.

> > diff --git a/src/box/sql/insert.c b/src/box/sql/insert.c
> > index 42254dd..00e263a 100644
> > --- a/src/box/sql/insert.c
> > +++ b/src/box/sql/insert.c
> > @@ -674,10 +678,15 @@ sqlite3Insert(Parse * pParse,	/* Parser context */
> > 				if (i == pTab->iAutoIncPKey)
> > 					sqlite3VdbeAddOp2(v, OP_Integer, -1,
> > 							  regCols + i + 1);
> > -				else
> > +				else {
> 
> 17. If 'else' body is inside {}, then its 'if' body must be too, even if
> consists of single line.
Huh. Okay.

> > diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
> > index e9c0b31..6d4ae7c 100644
> > --- a/src/box/sql/sqliteInt.h
> > +++ b/src/box/sql/sqliteInt.h
> > @@ -3007,6 +3009,10 @@ struct Parse {
> > 	With *pWithToFree;	/* Free this WITH object at the end of the parse */
> > 
> > 	bool initiateTTrans;	/* Initiate Tarantool transaction */
> > +	/* If set - do not emit byte code at all, just parse.  */
> > +	bool parse_only;
> > +	/* If parse_only is set to true, store parsed expression. */
> > +	struct Expr *parsed_expr;
> 
> 18. For struct members please use /** comments instead of /*.
Done.

> > diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
> > index f810db4..ae52731 100644
> > --- a/src/box/sql/tokenize.c
> > +++ b/src/box/sql/tokenize.c
> > @@ -652,3 +652,124 @@ sqlite3RunParser(Parse * pParse, const char *zSql, char **pzErrMsg)
> > 	assert(nErr == 0 || pParse->rc != SQLITE_OK);
> > 	return nErr;
> > }
> > +
> > +/**
> > + * Perform parsing of provided expression. This is done by
> > + * surrounding the expression w/ 'SELECT ' prefix and perform
> > + * convetional parsing. Then extract result expression value from
> > + * stuct Select and return it.
> > + * @param db SQL context handle.
> > + * @param expr Expression to parse.
> > + * @param[out] result Result: AST structure.
> > + * @param[out] err Error string if any.
> > + *
> > + * @retval Error code if any.
> > + */
> > +int
> > +sql_compile_expr(sqlite3 *db,
> > +		 const char *expr,
> > +		 struct Expr **result,
> > +		 char **err)
> > +{
> > +	struct Parse parser;
> > +	int nErr = 0;		/* Number of errors encountered */
> > +	int i;			/* Loop counter */
> > +	void *engine;		/* The LEMON-generated LALR(1) parser */
> > +	int token_type;		/* type of the next token */
> > +	int last_token_type = -1;	/* type of the previous token */
> > +	int mxSqlLen;		/* Max length of an SQL string */
> > +
> > +	assert(expr != 0);
> > +	mxSqlLen = db->aLimit[SQLITE_LIMIT_SQL_LENGTH];
> > +	memset(&parser, 0, sizeof(struct Parse));
> > +	parser.rc = SQLITE_OK;
> > +	parser.zTail = expr;
> > +	parser.db = db;
> > +	parser.pToplevel = NULL;
> > +	parser.parse_only = true;
> > +	i = 0;
> > +	assert(err != 0);
> > +	engine = sqlite3ParserAlloc(sqlite3Malloc);
> > +	if (engine == NULL) {
> > +		sqlite3OomFault(db);
> > +		return SQLITE_NOMEM_BKPT;
> > +	}
> > +
> > +	const char *outer = "SELECT ";
> > +	char *stmt = malloc(strlen(outer) + strlen(expr) + 1);
> 
> 19. Please, use region. You do this in a DDL transaction only, so
> region usage allows you to do not free this manually.
Done.

> > +	if (stmt == NULL) {
> > +		sqlite3OomFault(db);
> > +		return SQLITE_NOMEM_BKPT;
> > +	}
> > +	sprintf(stmt, "%s%s", outer, expr);
> > +	while (1) {
> 
> 20. While (true). For boolean variables and constants please use
> boolean values.
Done.

--
Thanks, Kirill




More information about the Tarantool-patches mailing list