[Tarantool-patches] [PATCH 2/2] vinyl: rework upsert operation

Sat Aug 8 17:51:15 MSK 2020

On 02 Aug 16:44, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> ASAN tests on the branch are failing:
> https://gitlab.com/tarantool/tarantool/-/jobs/661977877

Hi. I've sent the second iteration of the patch, which contains fixes
for all your comments below. ASAN is now OK as well:
https://gitlab.com/tarantool/tarantool/-/jobs/677105339

> > +static bool
> > +vy_apply_result_does_cross_pk(struct tuple *old_stmt, const char *result,
> > +			      const char *result_end, struct key_def *cmp_def,
> > +			      uint64_t col_mask)
> > +{
> > +	if (!key_update_can_be_skipped(cmp_def->column_mask, col_mask)) {
> > +		struct tuple *tuple =
> > +			vy_stmt_new_replace(tuple_format(old_stmt), result,
> > +					    result_end);
> > +		int cmp_res = vy_stmt_compare(old_stmt, HINT_NONE, tuple,
> > +					       HINT_NONE, cmp_def);
> 
> 1. Bad indentation.

Fixed.

> > +		tuple_unref(tuple);
> > +		return cmp_res != 0;
> > +	}
> > +	return false;
> > +}
> > +
> > +/**
> > + * Apply update operations stored in @new_stmt (which is assumed to
> 
> 2. Please, look at the doxygen syntax on the official page or here:
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure
> This is a single very simple rule I keep repeating I don't already
> remember how many times - use @a <param_name>, not @<param_name>.
> I don't understand why does everyone keep violating it.

I'm really sorry for that (haven't been writing comments in doxy
style for a while). Fixed all comments.

> 2. Parameter 'new_stmt' does not exist. As well as 'old_stmt'. What
> did you mean?
> 
> > + * be upsert statement) on tuple @old_stmt. If @old_stmt is void
> > + * statement (i.e. it is NULL or delete statement) then operations
> > + * are applied on tuple @new_stmt. All operations which can't be
> > + * applied are skipped; errors may be logged depending on @supress_error
> 
> 3. supress_error -> suppress_error.

Fixed.

> 4. What do you mean as 'all operations'? Operation groups from various
> upserts? Or individual operations?
> 
> > + * flag.
> > + *
> > + * @upsert Upsert statement to be applied on @stmt.
> 
> 5. If you want to use doxygen, use @param <param_name>.

Fixed.

> > + * @stmt Statement to be used as base for upsert operations.
> > + * @cmp_def Key definition required to provide check of primary key
> > + *          modification.
> > + * @retrun Tuple containing result of upsert application;
> > + *         NULL in case OOM.
> 
> 6. retrun -> return.

Fixed.

> 7. I guess you are among the ones who voted for 80 symbol comments - I
> suggest you to use it. Since this is our new code style now.

OK!

> > + */
> > +static struct tuple *
> > +vy_apply_upsert_on_terminal_stmt(struct tuple *upsert, struct tuple *stmt,
> > +				 struct key_def *cmp_def, bool suppress_error)
> > +{
> > +	assert(vy_stmt_type(upsert) == IPROTO_UPSERT);
> > +	assert(stmt == NULL || vy_stmt_type(stmt) != IPROTO_UPSERT);
> > +
> > +	uint32_t mp_size;
> > +	const char *new_ops = vy_stmt_upsert_ops(upsert, &mp_size);
> > +	/* Msgpack containing result of upserts application. */
> > +	const char *result_mp;
> > +	if (vy_stmt_is_void(stmt))
> 
> 8. This {is_void} helper is used 2 times inside one function on the same
> value. Seems like you could simply inline it, remember result into a variable
> {bool is_void;} and use it instead.

OK, refactored and dropped the commit that introduced vy_stmt_is_void().

> > +	for (uint32_t i = 0; i < ups_cnt; ++i) {
> > +		assert(mp_typeof(*ups_ops) == MP_ARRAY);
> > +		const char *ups_ops_end = ups_ops;
> > +		mp_next(&ups_ops_end);
> > +		const char *exec_res = result_mp;
> > +		exec_res = xrow_upsert_execute(ups_ops, ups_ops_end, result_mp,
> > +					       result_mp_end, format, &mp_size,
> > +					       0, suppress_error, &column_mask);
> > +		if (exec_res == NULL) {
> > +			if (! suppress_error) {
> 
> 9. According to one another recent code style change, unary operators
> should not have a whitespace after them.

Fixed.

> > +				assert(diag_last_error(diag_get()) != NULL);
> 
> 10. Use {diag_is_empty}. Or better - save {diag_last_error(diag_get())} into
> {struct error *e} before the assertion, and use {assert(e != NULL);}.

OK, fixed.

> > +				struct error *e = diag_last_error(diag_get());
> > +				/* Bail out immediately in case of OOM. */
> > +				if (e->type != &type_ClientError) {
> > +					region_truncate(region, region_svp);
> > +					return NULL;
> > +				}
> > +				diag_log();
> > +			}
> > +			ups_ops = ups_ops_end;
> > +			continue;
> > +		}
> > +		/*
> > +		 * If it turns out that resulting tuple modifies primary
> > +		 * key, than simply ignore this upsert.
> 
> 11. than -> then.

Fixed.

> > +static bool
> > +tuple_format_is_suitable_for_squash(struct tuple_format *format)
> > +{
> > +	struct tuple_field *field;
> > +	json_tree_foreach_entry_preorder(field, &format->fields.root,
> > +					 struct tuple_field, token) {
> > +		if (field->type == FIELD_TYPE_UNSIGNED)
> > +				return false;
> 
> 12. Bad indentation.
> Also this squash rule is not going to work because integer type also can
> overflow, both below INT64_MIN and above UINT64_MAX. Decimal types
> can overflow. Besides, decimal can fail when a non-decimal value does not
> fit the decimal type during conversion. For example, a huge double value.
> DBL_MAX is bigger than maximal value available in our decimal type. See
> xrow_update_arith_make() for all errors.
> 
> Since squash is mostly about squashing +/-, looks like it won't work
> almost always, and becomes useless.

That's true. In its previous implementation squash was almost useless.
I've reworked this part in V2 and integrated the format check right
into xrow_upsert_squash(), so that we can operate on the particular
values of the squash result (e.g. if the format declares an unsigned
field and the result of the squash is negative, the operations are not
squashed).
See update_arith_op_does_satisfy_format() and vy_upsert_try_to_squash().
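
To illustrate the idea (this is NOT the code from V2, only a minimal
standalone sketch): two '+' operations on the same field are merged
only if the combined result still fits the declared field type. All
names below (sketch_*) are hypothetical, and the model is simplified
to int64_t values, an assumed known base value, and two field types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real field type constants. */
enum sketch_field_type {
	SKETCH_FIELD_UNSIGNED,
	SKETCH_FIELD_INTEGER,
};

/* Add two int64_t values; return false instead of overflowing. */
static bool
sketch_add_checked(int64_t a, int64_t b, int64_t *sum)
{
	if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
		return false;
	*sum = a + b;
	return true;
}

/**
 * Try to merge '+ delta_old' and '+ delta_new' into one '+' operation
 * for a field which holds @a base. Return true and store the merged
 * argument in @a squashed_delta if the final value fits the field
 * type; return false if the operations must stay unsquashed.
 */
bool
sketch_try_squash_add(enum sketch_field_type type, int64_t base,
		      int64_t delta_old, int64_t delta_new,
		      int64_t *squashed_delta)
{
	assert(squashed_delta != NULL);
	int64_t delta, result;
	/* The merged argument itself must not overflow. */
	if (!sketch_add_checked(delta_old, delta_new, &delta))
		return false;
	/* Neither must the value produced by applying it. */
	if (!sketch_add_checked(base, delta, &result))
		return false;
	/* An unsigned field can't hold a negative result. */
	if (type == SKETCH_FIELD_UNSIGNED && result < 0)
		return false;
	*squashed_delta = delta;
	return true;
}
```

For example, with base 1 and arguments -5 and +2 the squash is refused
for an unsigned field (1 - 3 is negative), but allowed for an integer
field. The sketch checks only the final value; the real check also has
to deal with values above INT64_MAX, doubles and decimals.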

> P.S.
> 
> In the end of the review I noticed that this prevents squashing not
> only of operations with unsigned fields. It will prevent squashing if
> the whole format has at least one unsigned. This makes current implementation
> of squash even more useless, because forbids to use the fastest field type,
> which is default when you create an index without specification of field type
> btw.
> 
> > @@ -87,122 +248,74 @@ vy_apply_upsert(struct tuple *new_stmt, struct tuple *old_stmt,
> >  	assert(new_stmt != old_stmt);
> >  	assert(vy_stmt_type(new_stmt) == IPROTO_UPSERT);
> >  
> > -	if (old_stmt == NULL || vy_stmt_type(old_stmt) == IPROTO_DELETE) {
> > -		/*
> > -		 * INSERT case: return new stmt.
> > -		 */
> > -		return vy_stmt_replace_from_upsert(new_stmt);
> > +	struct tuple *result_stmt = NULL;
> > +	if (old_stmt == NULL || vy_stmt_type(old_stmt) != IPROTO_UPSERT) {
> > +		return vy_apply_upsert_on_terminal_stmt(new_stmt, old_stmt,
> > +						        cmp_def, suppress_error);
> >  	}
> >  
> > +	assert(vy_stmt_type(old_stmt) == IPROTO_UPSERT);
> 
> 13. The assertion looks useless, since it is reverse of the {if}
> condition above, but up to you.

Skipped (IMHO it slightly improves code readability).

> >  	/*
> > -	 * Unpack UPSERT operation from the new stmt
> > +	 * Unpack UPSERT operation from the old and new stmts.
> >  	 */
> > +	assert(old_stmt != NULL);
> 
> 14. It is strange to check old_stmt->type in the previous assertion before
> you checked old_stmt != NULL.

Agree, swapped these asserts.

> > -	if (result_mp == NULL) {
> > -		region_truncate(region, region_svp);
> > -		return NULL;
> > +	if (tuple_format_is_suitable_for_squash(format)) {
> > +		const char *new_ops_end = new_ops + mp_size;
> > +		if (vy_upsert_try_to_squash(format, old_stmt_mp, old_stmt_mp_end,
> > +					    old_ops, old_ops_end, new_ops,
> > +					    new_ops_end, &result_stmt) != 0) {
> > +			/* OOM */
> > +			region_truncate(region, region_svp);
> > +			return NULL;
> > +		}
> 
> 15. vy_upsert_try_to_squash() returns a result into result_stmt. But
> you ignore it. Basically, whatever it returns, you act like squash
> didn't happen and it never works now. You continue to work with 2 old
> operation set arrays. Also result_stmt leaks.
> 
> What is also strange - I added {assert(false);} here and the tests
> passed. I thought we had quite a lot squash tests. Seems they are all
> for formats having unsigned field type.

My apologies for this broken part of the patch, somehow I missed it.
I've reworked it in patch V2 (see the comment above).

> (Actually the tests failed, but not here - on my machine vinyl tests
> fail in almost 100% runs somewhere with random errors, could be
> luajit problems on Mac maybe.)
> 
> > +	 * If upsert corresponding to old_ops becomes insert, then
> > +	 * {{op1}, {op2}} update operations are not applied.
> >  	 */
> > -	assert(old_ops_end - old_ops > 0);
> > -	if (vy_upsert_try_to_squash(format, result_mp, result_mp_end,
> > -				    old_ops, old_ops_end, new_ops, new_ops_end,
> > -				    &result_stmt) != 0) {
> > +	uint32_t old_ops_cnt = mp_decode_array(&old_ops);
> > +	uint32_t new_ops_cnt = mp_decode_array(&new_ops);
> > +	size_t ops_size = sizeof(struct iovec) * (old_ops_cnt + new_ops_cnt);
> > +	struct iovec *operations = region_alloc(region, ops_size);
> 
> 16. region_alloc_array.
> 
> 17. But you don't really need that. Nor upsert_ops_to_iovec() function.

iovecs really simplify the code and the workflow with update operations.
I use them more extensively in the new patch version.

> You could keep the old code almost as is, because for vy_stmt_new_with_ops()
> to work correctly, it is not necessary to have each operation set in a
> separate iovec. Anyway they are all copied as is without unpacking. You
> could have 1 iovec for the root MP_ARRAY, 1 iovec for the old operation sets,
> 1 iovec for the new operation sets.
> 
> Having first iovec with root MP_ARRAY would allow to delete is_ops_encoded.

Not sure if it is possible in V2...
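
For reference, here is how I read the suggested layout, as a rough
standalone sketch (not code from the patch): one iovec for the root
MP_ARRAY header, one for the old operation sets, one for the new
ones. The sketch_* names are hypothetical; both ops pointers are
assumed to point past their own array headers, as after
mp_decode_array(). Handing the iovecs over to vy_stmt_new_with_ops()
is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/**
 * Encode a MsgPack array header for @a count elements into @a buf
 * (at least 5 bytes). Return the number of bytes written.
 */
size_t
sketch_encode_array_header(char *buf, uint32_t count)
{
	unsigned char *p = (unsigned char *)buf;
	if (count <= 15) {
		/* fixarray: 1001xxxx */
		p[0] = (unsigned char)(0x90 | count);
		return 1;
	}
	if (count <= UINT16_MAX) {
		/* array 16: 0xdc + big-endian uint16 */
		p[0] = 0xdc;
		p[1] = (unsigned char)(count >> 8);
		p[2] = (unsigned char)(count & 0xff);
		return 3;
	}
	/* array 32: 0xdd + big-endian uint32 */
	p[0] = 0xdd;
	p[1] = (unsigned char)(count >> 24);
	p[2] = (unsigned char)((count >> 16) & 0xff);
	p[3] = (unsigned char)((count >> 8) & 0xff);
	p[4] = (unsigned char)(count & 0xff);
	return 5;
}

/**
 * Fill three iovecs: root header, old operation sets, new operation
 * sets. The bodies are referenced as is, without unpacking.
 * Return the total size of the resulting operations array.
 */
size_t
sketch_fill_ops_iovec(struct iovec iov[3], char header[5],
		      uint32_t old_cnt, const char *old_ops, size_t old_size,
		      uint32_t new_cnt, const char *new_ops, size_t new_size)
{
	assert(iov != NULL && header != NULL);
	iov[0].iov_base = header;
	iov[0].iov_len = sketch_encode_array_header(header,
						    old_cnt + new_cnt);
	iov[1].iov_base = (void *)old_ops;
	iov[1].iov_len = old_size;
	iov[2].iov_base = (void *)new_ops;
	iov[2].iov_len = new_size;
	return iov[0].iov_len + old_size + new_size;
}
```

With such a layout the root header is always present in iov[0], which
is presumably why is_ops_encoded would become unnecessary.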

> > +	if (operations == NULL) {
> >  		region_truncate(region, region_svp);
> > +		diag_set(OutOfMemory, ops_size, "region_alloc", "operations");
> >  		return NULL;
> >  	}
> > -	if (result_stmt != NULL) {
> > -		region_truncate(region, region_svp);
> > -		vy_stmt_set_lsn(result_stmt, vy_stmt_lsn(new_stmt));
> > -		goto check_key;
> > -	}
> > +	upsert_ops_to_iovec(old_ops, old_ops_cnt, operations);
> > +	upsert_ops_to_iovec(new_ops, new_ops_cnt, &operations[old_ops_cnt]);
> >  
> 
> 18. You need to put references to the relevant issues in the tests,
> using
> 
>     --
>     -- gh-NNNN: description.
>     --

Fixed (added tags).

> > +- ok
> > +...
> > +s:select{}
> > +---
> > +- - [1, 2, 3, 'upserted']
> > +...
> > +s:drop()
> > +---
> > +...
> 
> 19. All tests work with unsigned fields. So squashing is not tested here.

In the new patch the squashing requirements are more tolerant, so squashing
now takes place in these tests.