Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Roman Khabibov <roman.habibov@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] sql: support column addition
Date: Sat, 14 Mar 2020 18:17:44 +0100	[thread overview]
Message-ID: <de4ef1f2-54e5-3b52-287c-3cfb628f8a02@tarantool.org> (raw)
In-Reply-To: <20200131150537.90826-1-roman.habibov@tarantool.org>

Hi! Thanks for the patch!

See 18 comments below.

On 31/01/2020 16:05, Roman Khabibov wrote:
> Enable to add column to existing space with
> <ALTER TABLE ADD [COLUMN]> statement. Column definition can be
> supplemented with <DEFAULT>, <COLLATE> clauses and <[NOT] NULL>
> column constraint.
> 
> Closes #3075
> 
> @TarantoolBot document
> Title: Add columns to existing tables in SQL
> Now, it is possible to add columns to existing empty spaces using
> <ALTER TABLE table_name ADD [COLUMN] column_name column_type ...>

1. Why do you require emptiness? At least for nullable columns it
is not necessary - nullable fields can be absent.

Moreover, someday we may even make possible to create a table
without a PK in SQL, and they allow to add a PK column as ADD COLUMN
instead of ADD CONSTRAINT like it works now (which works on a space
not having columns, btw).

> statement. The column definition is the same as in <CREATE TABLE>
> statement, except that table constraints (PRIMARY KEY, UNIQUE,
> REFERENCES, CHECK) cannot be specified yet.
> 
> For example:
> 
> tarantool> box.execute([[CREATE TABLE test (
>                             a INTEGER PRIMARY KEY
>                        );]])
> ---
> - row_count: 1
> ...
> 
> tarantool> box.execute([[ALTER TABLE test ADD COLUMN
>                             b TEXT
>                             NOT NULL
>                             DEFAULT ('a')
>                             COLLATE "unicode_ci"
>                         ;]])
> ---
> - row_count: 0
> ...
> ---
> Branch: https://github.com/tarantool/tarantool/tree/romanhabibov/gh-3075-add-column
> Issue: https://github.com/tarantool/tarantool/issues/3075
> 
> P.S.: I didn't implement column constraints in ADD COLUMN statement,
> because of the sql_create_index(), sql_create_foreign_key() and
> sql_create_check_contraint() functions. These functions have
> branching inside:

2. Are constraints allowed by the standard to be used in ADD COLUMN?

> 1) CREATE TABLE (parse->create_table_def.new_space != NULL)
> We work with the ephemeral space. Use its def, indexes array,
> ck and fk rlists for checks, opcode emitting, etc and modify it on
> parsing level.

3. Column-related functions (null, collate, default) use create_table_def
only to extract a field_def. See my proposal in the comments far below
how to make their code more reusable.

Talking of constraints, I don't see at this moment how to make their
code completely resuable and intact. We will need to add some ifs.

For sql_create_index() you need to fill create_index_def in struct
Parse. However, that would collide with create_column_def stored
in the same union. So probably you should consider moving create_column_def
to a separate member of struct Parse. Or as a union with create_table_def
in case it will be possible. Although maybe it is not. The problem
is that sql_create_index() emits insertion into _index right away, if
it is not a CREATE TABLE, and to check that it uses create_table_def.

So you need to add a check that if it is ADD COLUMN, that don't generate
code now, and just keep the new index somewhere. You could save it into
create_column_def into a new member. Anyway you can't add more than one
index, so it should be simple. Then you generate the index insertion in
sql_alter_add_column_end().

For sql_create_foreign_key() and sql_create_check_contraint() it is the
same, from what I see.

> 2) ALTER TABLE (parse->create_table_def.new_space == NULL)
> We work with existing space = space_by_name(), but don't modify it.
> This space will be modified after parsing by alter.cc triggers.
> 
> In the case of
> ALTER TABLE test ADD a INT PRIMARY KEY CONSTRAINT c UNIQUE CONSTRAINT d UNIQUE ...
> we need to modifiy the existing space, for example UNIQUE constraint
> creation/opcode emitting requires new space def with new modified
> fields array and new index_id_max. The same is for REFERENCES> constraint. With CHECK it seems easier, but I didn't implement it,
> because I want to do patch in the futue: "Enable constraints in ADD COLUMN",
> if it will be needed.

5. This should be a separate commit. But in scope of this
patchset, I think.

> There is obvious way to implement constraints parsing during
> ADD COLUMN. To do partial copy of space = space_by_name() on region,
> to keep it in parse->create_column_>def and to work with it. Then

6. Yes, you can create a copy of space_def. However it can be a
'shallow', light copy. Because constraint addition does not change
space_def. The only thing you need to add is the new column to the
field array so as constraint addition functions would see it. And it
seems to be fine to make a shallow copy of space_def for that, on a
region. You already create a field array copy. Just copy struct
space_def object too.

> we will have to change the current code a bit, but it is necessary
> to write API to create an ephemeral space based on the existing.
> 
> I would like to hear your opinion. Maybe I'm generally wrong in
> thinking about all this, and it could be made easier.

7. It can't be done much easier. But a few simplifications
can be done. See my comments about setting NULL, COLLATE, and DEFAULT.

General idea seems to be fine, about trying to reuse the existing code.

> P.P.S.: Could you, please, help me with COLUMN keyword support.
> I couldn't add it as I wanted, i.e., the line in parse.y near the
> line 1754:
> column_name(N) ::= COLUMN nm(A). { N = A; }
> because I had:
> /Users/r.khabibov/tarantool/src/box/sql/parse.h:171:9: error: 'TK_COLUMN' macro redefined
> #define TK_COLUMN
> 
> I don't understand why it is so? For example, the definition of
> CONSTRAINT keyword in mkkeywordhash.c is the same, but there is no
> error about this #define during compilation.
> 
>   { "COLUMN",                 "TK_COLUMN",      true  },
>   { "CONSTRAINT",             "TK_CONSTRAINT",  true  },

8. This is because of addopcodes.sh. It generates TK_COLUMN to
use for tokens treated as a column name. Seems like the only way
is to make a preparatory patch, which will change the old TK_COLUMN
to something else. For example, TK_COLUMN_NAME, it would be more
correct. Or TK_COLUMN_REF.

>  extra/mkkeywordhash.c       |   6 +-
>  src/box/errcode.h           |   1 +
>  src/box/sql.c               |  28 +++++---
>  src/box/sql/alter.c         | 125 ++++++++++++++++++++++++++++++++++++
>  src/box/sql/build.c         | 106 ++++++++++++++++++------------
>  src/box/sql/parse.y         |  59 +++++++++++++++--
>  src/box/sql/parse_def.h     |  23 +++++++
>  src/box/sql/sqlInt.h        |  40 ++++++++++++
>  src/box/sql/tarantoolInt.h  |   6 +-
>  test/box/misc.result        |   1 +
>  test/sql-tap/alter.test.lua |  76 +++++++++++++++++++++-
>  11 files changed, 405 insertions(+), 66 deletions(-)
> 
> diff --git a/src/box/sql.c b/src/box/sql.c
> index 1256df856..7157cabe6 100644
> --- a/src/box/sql.c
> +++ b/src/box/sql.c
> @@ -980,6 +980,14 @@ sql_encode_table(struct region *region, struct space_def *def, uint32_t *size)
>  	return raw;
>  }
>  
> +char *
> +sql_encode_table(struct region *region, struct space_def *def, uint32_t *size)
> +{
> +	assert(def != NULL);
> +	return sql_encode_table_format(region, def->fields, def->field_count,
> +				       size);
> +}

9. The function is unused.

> +
>  char *
>  sql_encode_table_opts(struct region *region, struct space_def *def,
>  		      uint32_t *size)
> diff --git a/src/box/sql/alter.c b/src/box/sql/alter.c
> index 973b420cf..c12d9373a 100644
> --- a/src/box/sql/alter.c
> +++ b/src/box/sql/alter.c
> @@ -209,3 +210,127 @@ rename_trigger(sql *db, char const *sql_stmt, char const *table_name,
>  				      table_name, tname.z + tname.n);
>  	return new_sql_stmt;
>  }
> +
> +void
> +sql_alter_add_column_start(struct Parse *parse)
> +{
> +	struct create_column_def *create_column_def = &parse->create_column_def;
> +	struct alter_entity_def *alter_entity_def =
> +		&create_column_def->base.base;
> +	assert(alter_entity_def->entity_type == ENTITY_TYPE_COLUMN);
> +	assert(alter_entity_def->alter_action == ALTER_ACTION_CREATE);
> +
> +	const char *space_name =
> +		alter_entity_def->entity_name->a[0].zName;
> +	struct space *space = space_by_name(space_name);
> +	struct sql *db = parse->db;
> +	if (space == NULL) {
> +		diag_set(ClientError, ER_NO_SUCH_SPACE, space_name);
> +		goto tnt_error;
> +	}
> +
> +	struct space_def *def = space->def;
> +	create_column_def->space_def = def;
> +	if (sql_space_def_check_format(def) != 0)
> +		goto tnt_error;

10. Why I can't add a field to a space not having any fields?

> +	uint32_t new_field_count = def->field_count + 1;
> +	create_column_def->new_field_count = new_field_count;
> +
> +#if SQL_MAX_COLUMN
> +	if ((int)new_field_count > db->aLimit[SQL_LIMIT_COLUMN]) {
> +		diag_set(ClientError, ER_SQL_COLUMN_COUNT_MAX, def->name,
> +			 new_field_count, db->aLimit[SQL_LIMIT_COLUMN]);
> +		goto tnt_error;
> +	}
> +#endif
> +	/*
> +	 * Create new fields array and add the new column's
> +	 * field_def to its end. During copying don't make
> +	 * duplicates of the non-scalar fields (name,
> +	 * default_value, default_value_expr), because they will
> +	 * be just copied and encoded to the msgpack array on sql
> +	 * allocator in sql_alter_add_column_end().
> +	 */
> +	uint32_t size = sizeof(struct field_def) * new_field_count;
> +	struct region *region = &parse->region;
> +	struct field_def *new_fields = region_alloc(region, size);
> +	if (new_fields == NULL) {
> +		diag_set(OutOfMemory, size, "region_alloc", "new_fields");
> +		goto tnt_error;
> +	}
> +	memcpy(new_fields, def->fields,
> +	       sizeof(struct field_def) * def->field_count);
> +	create_column_def->new_fields = new_fields;
> +	struct field_def *new_column_def = &new_fields[def->field_count];
> +
> +	memcpy(new_column_def, &field_def_default, sizeof(struct field_def));
> +	struct Token *name = &create_column_def->base.name;
> +	new_column_def->name =
> +		sql_normalized_name_region_new(region, name->z, name->n);
> +	if (new_column_def->name == NULL)
> +		goto tnt_error;
> +	if (def->opts.is_view) {
> +		diag_set(ClientError, ER_SQL_CANT_ADD_COLUMN_TO_VIEW,
> +			 new_column_def->name, def->name);
> +		goto tnt_error;
> +	}
> +
> +	new_column_def->type = create_column_def->type_def->type;

11. Seems like lots of duplication from sqlAddColumn(). You could reuse
it. Or reuse this function in sqlAddColumn(). Or try to extract their
common part. Because they are totally the same not counting how do they
get space_def. For example, you will be able to use sql_field_retrieve()
to realloc field array.

You could copy entire space_def on a region for ADD COLUMN. In that case
both functions will be even more the same.

> +
> +exit_alter_add_column_start:

12. The label is never used except from one goto below. Please,
move it to there and drop the label.

> +	sqlSrcListDelete(db, alter_entity_def->entity_name);
> +	return;
> +tnt_error:
> +	parse->is_aborted = true;
> +	goto exit_alter_add_column_start;
> +}
> +
> +void
> +sql_alter_add_column_end(struct Parse *parse)
> +{
> +	struct create_column_def *create_column_def = &parse->create_column_def;
> +	uint32_t new_field_count = create_column_def->new_field_count;
> +	uint32_t table_stmt_sz = 0;
> +	char *table_stmt =
> +		sql_encode_table_format(&parse->region,
> +					create_column_def->new_fields,
> +					new_field_count, &table_stmt_sz);
> +	if (table_stmt == NULL) {
> +		parse->is_aborted = true;
> +		return;
> +	}
> +	char *raw = sqlDbMallocRaw(parse->db, table_stmt_sz);
> +	if (raw == NULL){
> +		parse->is_aborted = true;
> +		return;
> +	}
> +	memcpy(raw, table_stmt, table_stmt_sz);
> +
> +	struct space *system_space = space_by_id(BOX_SPACE_ID);
> +	assert(system_space != NULL);
> +	int cursor = parse->nTab++;
> +	struct Vdbe *v = sqlGetVdbe(parse);
> +	assert(v != NULL);
> +	vdbe_emit_open_cursor(parse, cursor, 0, system_space);
> +	sqlVdbeChangeP5(v, OPFLAG_SYSTEMSP);
> +
> +	int key_reg = sqlGetTempReg(parse);
> +	sqlVdbeAddOp2(v, OP_Integer, create_column_def->space_def->id, key_reg);
> +	int addr = sqlVdbeAddOp4Int(v, OP_Found, cursor, 0, key_reg, 1);
> +	sqlVdbeAddOp2(v, OP_Halt, -1, ON_CONFLICT_ACTION_ABORT);
> +	sqlVdbeJumpHere(v, addr);
> +
> +	const int system_space_field_count = 7;

13. It does not look right to hardcode the constant here. Please, add a
new constant to schema_def.h, where all other _space fields are defined.

> +	int tuple_reg = sqlGetTempRange(parse, system_space_field_count + 1);
> +	for (int i = 0; i < system_space_field_count - 1; ++i)
> +		sqlVdbeAddOp3(v, OP_Column, cursor, i, tuple_reg + i);
> +	sqlVdbeAddOp1(v, OP_Close, cursor);
> +
> +	sqlVdbeAddOp2(v, OP_Integer, new_field_count, tuple_reg + 4);
> +	sqlVdbeAddOp4(v, OP_Blob, table_stmt_sz, tuple_reg + 6,
> +		      SQL_SUBTYPE_MSGPACK, raw, P4_DYNAMIC);
> +	sqlVdbeAddOp3(v, OP_MakeRecord, tuple_reg, system_space_field_count,
> +		      tuple_reg + system_space_field_count);
> +	sqlVdbeAddOp4(v, OP_IdxReplace, tuple_reg + system_space_field_count, 0,
> +		      0, (char *)system_space, P4_SPACEPTR);
> +}
> diff --git a/src/box/sql/build.c b/src/box/sql/build.c
> index bc50ecbfa..2f51b55df 100644
> --- a/src/box/sql/build.c
> +++ b/src/box/sql/build.c
> @@ -368,11 +368,21 @@ void
>  sql_column_add_nullable_action(struct Parse *parser,
>  			       enum on_conflict_action nullable_action)
>  {
> -	struct space *space = parser->create_table_def.new_space;
> -	if (space == NULL || NEVER(space->def->field_count < 1))
> +	struct space_def *def = NULL;
> +	struct field_def *field = NULL;
> +	if (parser->create_table_def.new_space != NULL) {
> +		def = parser->create_table_def.new_space->def;
> +		if (NEVER(def->field_count < 1))
> +			return;
> +		field = &def->fields[def->field_count - 1];
> +	} else if (parser->create_column_def.space_def != NULL) {
> +		def = parser->create_column_def.space_def;
> +		uint32_t field_count =
> +			parser->create_column_def.new_field_count;
> +		field = &parser->create_column_def.new_fields[field_count - 1];
> +	} else {

14. I think you could try to use create_column_def both for CREATE TABLE
and for ADD COLUMN. Try storing the new field in create_column_def separately,
as a special attribute. CREATE TABLE will recreate that field everytime a new
column is scanned. ADD COLUMN will create it once and use in all such
functions.

The same hack can help to make sqlAddDefaultValue(), sqlAddCollateType() more
straightforward and reusable.

>  		return;
> -	struct space_def *def = space->def;
> -	struct field_def *field = &def->fields[def->field_count - 1];
> +	}
> diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
> index cfe1c0012..2eba72ac3 100644
> --- a/src/box/sql/parse.y
> +++ b/src/box/sql/parse.y
> @@ -335,7 +344,8 @@ ccons ::= cconsname(N) REFERENCES nm(T) eidlist_opt(TA) matcharg(M) refargs(R).
> +
> +/*
> + * "alter_carglist" is a list of additional constraints and
> + * clauses that come after the column name and column type in a
> + * ALTER TABLE ADD [COLUMN] statement.
> + */
> +alter_carglist ::= alter_carglist alter_ccons.
> +alter_carglist ::= .
> +alter_ccons ::= default_clause_1.
> +alter_ccons ::= default_clause_2.
> +alter_ccons ::= default_clause_3.
> +alter_ccons ::= default_clause_4.
> +alter_ccons ::= null_cons.
> +alter_ccons ::= not_null_cons.
> +alter_ccons ::= collate_clause.

15. Seems like you could just take ccons instead of inventing
alter_ccons. After you enable constraints in ADD COLUMN, right?

> +
> +add_column_end ::= . {
> +  sql_alter_add_column_end(pParse);
> +}
> diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
> index d1fcf4761..46d3d6d2a 100644
> --- a/src/box/sql/sqlInt.h
> +++ b/src/box/sql/sqlInt.h
> @@ -2842,6 +2843,18 @@ struct space *
>  sqlStartTable(Parse *, Token *);
>  void sqlAddColumn(Parse *, Token *, struct type_def *);
>  
> +/**
> + * Set the is_nullable flag on the column @a field.
> + *
> + * @param field           Column to set.
> + * @param nullable_action on_conflict_action value.
> + * @param space_name Name of the space to which the column
> + *                   belongs. Needed to set error.
> + *
> + * @retval 0  Success.
> + * @retval -1 nullable_action has been already set.
> + */
> +

16. The comment just hangs not belonging to any function. The
function below already has a comment.

>  /**
>   * This routine is called by the parser while in the middle of
>   * parsing a CREATE TABLE statement.  A "NOT NULL" constraint has
> @@ -2868,6 +2881,19 @@ sqlAddPrimaryKey(struct Parse *parse);
>  void
>  sql_create_check_contraint(Parse *parser);
>  
> +/**
> + * Add default value to the column @a field.
> + *
> + * @param parse      Parsing context.
> + * @param field      Column to set.
> + * @param span       Default expression.
> + * @param space_name Name of the space to which the column
> + *                   belongs. Needed to set error.
> + *
> + * @retval 0  Success.
> + * @retval -1 Error.
> + */

17. This comment also hangs, and contains not existing parameters.

> +
>  void sqlAddDefaultValue(Parse *, ExprSpan *);
>  void sqlAddCollateType(Parse *, Token *);
>  
> diff --git a/test/sql-tap/alter.test.lua b/test/sql-tap/alter.test.lua
> index 615b9d8a6..283dc1759 100755
> --- a/test/sql-tap/alter.test.lua
> +++ b/test/sql-tap/alter.test.lua

18. You will need to add much more tests. For example, ADD COLUMN
in a transaction, add multiple columns in a transaction, add to a
non-empty space a non-nullable field, add a nullable fields to a
non-empty space, add a column to a space not having any columns,
and so on.

  parent reply	other threads:[~2020-03-14 17:17 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-31 15:05 Roman Khabibov
2020-03-09 15:08 ` Roman Khabibov
2020-03-09 22:16   ` Vladislav Shpilevoy
2020-03-14 17:17 ` Vladislav Shpilevoy [this message]
2020-07-06 13:37   ` Roman Khabibov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de4ef1f2-54e5-3b52-287c-3cfb628f8a02@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=roman.habibov@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] sql: support column addition' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox