[Tarantool-patches] [PATCH 2/2] vinyl: rework upsert operation
Nikita Pettik
korablev at tarantool.org
Sat Aug 8 17:51:15 MSK 2020
On 02 Aug 16:44, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> ASAN tests on the branch are failing:
> https://gitlab.com/tarantool/tarantool/-/jobs/661977877
Hi. I've sent the second iteration of the patch, containing fixes for
all your comments below. ASAN is now OK as well:
https://gitlab.com/tarantool/tarantool/-/jobs/677105339
> > +static bool
> > +vy_apply_result_does_cross_pk(struct tuple *old_stmt, const char *result,
> > + const char *result_end, struct key_def *cmp_def,
> > + uint64_t col_mask)
> > +{
> > + if (!key_update_can_be_skipped(cmp_def->column_mask, col_mask)) {
> > + struct tuple *tuple =
> > + vy_stmt_new_replace(tuple_format(old_stmt), result,
> > + result_end);
> > + int cmp_res = vy_stmt_compare(old_stmt, HINT_NONE, tuple,
> > + HINT_NONE, cmp_def);
>
> 1. Bad indentation.
Fixed.
> > + tuple_unref(tuple);
> > + return cmp_res != 0;
> > + }
> > + return false;
> > +}
> > +
> > +/**
> > + * Apply update operations stored in @new_stmt (which is assumed to
>
> 2. Please look at the Doxygen syntax on the official page or here:
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure
> This is a single, very simple rule that I have repeated more times than
> I can remember: use @a <param_name>, not @<param_name>.
> I don't understand why everyone keeps violating it.
I'm really sorry for that (I haven't written comments in Doxygen
style for a while). Fixed all comments.
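For reference, a minimal C sketch of the convention the reviewer asks for: `@param` introduces each parameter description, and `@a` marks a parameter name mentioned in running text. The function is a hypothetical illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/**
 * Count the upsert operation groups that were applied successfully.
 * A negative entry in @a results marks a group that failed and was
 * skipped; such groups are not counted.
 *
 * @param results Per-group results; a negative value means "skipped".
 * @param count Number of elements in @a results.
 * @return Number of non-negative entries in @a results.
 */
static size_t
count_applied_groups(const int *results, size_t count)
{
	size_t applied = 0;
	for (size_t i = 0; i < count; ++i) {
		if (results[i] >= 0)
			applied++;
	}
	return applied;
}
```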
> 2. Parameter 'new_stmt' does not exist. As well as 'old_stmt'. What
> did you mean?
>
> > + * be upsert statement) on tuple @old_stmt. If @old_stmt is void
> > + * statement (i.e. it is NULL or delete statement) then operations
> > + * are applied on tuple @new_stmt. All operations which can't be
> > + * applied are skipped; errors may be logged depending on @supress_error
>
> 3. supress_error -> suppress_error.
Fixed.
> 4. What do you mean by 'all operations'? Operation groups from various
> upserts? Or individual operations?
>
> > + * flag.
> > + *
> > + * @upsert Upsert statement to be applied on @stmt.
>
> 5. If you want to use doxygen, use @param <param_name>.
Fixed.
> > + * @stmt Statement to be used as base for upsert operations.
> > + * @cmp_def Key definition required to provide check of primary key
> > + * modification.
> > + * @retrun Tuple containing result of upsert application;
> > + * NULL in case OOM.
>
> 6. retrun -> return.
Fixed.
> 7. I guess you are among those who voted for 80-column comments; I
> suggest you use that limit, since this is our code style now.
OK!
> > + */
> > +static struct tuple *
> > +vy_apply_upsert_on_terminal_stmt(struct tuple *upsert, struct tuple *stmt,
> > + struct key_def *cmp_def, bool suppress_error)
> > +{
> > + assert(vy_stmt_type(upsert) == IPROTO_UPSERT);
> > + assert(stmt == NULL || vy_stmt_type(stmt) != IPROTO_UPSERT);
> > +
> > + uint32_t mp_size;
> > + const char *new_ops = vy_stmt_upsert_ops(upsert, &mp_size);
> > + /* Msgpack containing result of upserts application. */
> > + const char *result_mp;
> > + if (vy_stmt_is_void(stmt))
>
> 8. This {is_void} helper is called twice inside one function on the same
> value. It seems you could simply inline it, remember the result in a
> variable {bool is_void;} and use that instead.
OK, refactored and dropped the commit introducing vy_stmt_is_void().
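The refactoring the reviewer suggests (evaluate the "void statement" check once and reuse the flag) can be sketched as follows. `struct stmt`, the statement types and the integer payload are hypothetical simplifications, not vinyl types; in this toy model a void base simply yields the insert value unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement types standing in for the vinyl ones. */
enum stmt_type { STMT_REPLACE, STMT_DELETE, STMT_UPSERT };

struct stmt {
	enum stmt_type type;
	int value;
};

/*
 * Toy upsert "add delta": the void check (NULL or DELETE base) is
 * computed once and the flag is reused at both decision points.
 */
static int
upsert_result(const struct stmt *base, int insert_value, int delta)
{
	bool is_void = base == NULL || base->type == STMT_DELETE;
	int result = is_void ? insert_value : base->value;
	if (!is_void)
		result += delta;
	return result;
}
```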
> > + for (uint32_t i = 0; i < ups_cnt; ++i) {
> > + assert(mp_typeof(*ups_ops) == MP_ARRAY);
> > + const char *ups_ops_end = ups_ops;
> > + mp_next(&ups_ops_end);
> > + const char *exec_res = result_mp;
> > + exec_res = xrow_upsert_execute(ups_ops, ups_ops_end, result_mp,
> > + result_mp_end, format, &mp_size,
> > + 0, suppress_error, &column_mask);
> > + if (exec_res == NULL) {
> > + if (! suppress_error) {
>
> 9. According to another recent code style change, unary operators
> should not be followed by a whitespace.
Fixed.
> > + assert(diag_last_error(diag_get()) != NULL);
>
> 10. Use {diag_is_empty}. Or better - save {diag_last_error(diag_get())} into
> {struct error *e} before the assertion, and use {assert(e != NULL);}.
OK, fixed.
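A simplified model of the error handling in the quoted loop, with the reviewer's suggestion applied: the error pointer is saved before the assertion, a client error only causes the current operation group to be skipped, and any other error (e.g. OOM) aborts the whole application. The error types and `struct group` are hypothetical stand-ins for Tarantool's diag API and msgpack-encoded operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error model: only the two kinds the loop distinguishes. */
enum err_type { ERR_CLIENT, ERR_OOM };

struct err {
	enum err_type type;
};

/* Hypothetical operation group: adds @a delta unless it fails. */
struct group {
	int delta;
	bool ok;                 /* false if applying the group fails */
	const struct err *error; /* set when ok == false */
};

/**
 * Apply operation groups in order. Groups failing with a client error
 * are skipped; any other error aborts the whole application.
 *
 * @param groups Operation groups to apply.
 * @param count Number of elements in @a groups.
 * @param[in,out] value Value the groups are applied to.
 * @retval 0 Success (some groups may have been skipped).
 * @retval -1 Non-client error (e.g. OOM); @a value is left unchanged.
 */
static int
apply_groups(const struct group *groups, size_t count, int *value)
{
	int result = *value;
	for (size_t i = 0; i < count; ++i) {
		if (!groups[i].ok) {
			/* Save the error once, then assert on the saved pointer. */
			const struct err *e = groups[i].error;
			assert(e != NULL);
			/* Bail out immediately on anything but a client error. */
			if (e->type != ERR_CLIENT)
				return -1;
			/* The real code would call diag_log() here; just skip. */
			continue;
		}
		result += groups[i].delta;
	}
	*value = result;
	return 0;
}
```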
> > + struct error *e = diag_last_error(diag_get());
> > + /* Bail out immediately in case of OOM. */
> > + if (e->type != &type_ClientError) {
> > + region_truncate(region, region_svp);
> > + return NULL;
> > + }
> > + diag_log();
> > + }
> > + ups_ops = ups_ops_end;
> > + continue;
> > + }
> > + /*
> > + * If it turns out that resulting tuple modifies primary
> > + * key, than simply ignore this upsert.
>
> 11. than -> then.
Fixed.
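The primary-key check in this code (vy_apply_result_does_cross_pk() in the patch) first uses the column mask to rule out a key change cheaply and only then compares keys. A simplified C sketch of that two-step idea, with keys modelled as int64_t arrays; the helper names are hypothetical, and the sketch assumes the column-mask convention that bit 63 stands for field 63 and every field after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask bit for one field; fields >= 63 share the last bit (assumed). */
static uint64_t
column_mask_of_field(uint32_t fieldno)
{
	return fieldno < 63 ? (UINT64_C(1) << fieldno) : (UINT64_C(1) << 63);
}

/* True if the update may have modified at least one key field. */
static bool
update_may_touch_key(uint64_t key_mask, uint64_t update_mask)
{
	return (key_mask & update_mask) != 0;
}

/*
 * Decide whether an upsert result changes the primary key: the cheap
 * mask test runs first, and only if it cannot rule out a change are
 * the key parts actually compared.
 */
static bool
result_crosses_pk(const int64_t *old_key, const int64_t *new_key,
		  uint32_t part_count, uint64_t key_mask, uint64_t update_mask)
{
	if (!update_may_touch_key(key_mask, update_mask))
		return false;
	for (uint32_t i = 0; i < part_count; ++i) {
		if (old_key[i] != new_key[i])
			return true;
	}
	return false;
}
```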
> > +static bool
> > +tuple_format_is_suitable_for_squash(struct tuple_format *format)
> > +{
> > + struct tuple_field *field;
> > + json_tree_foreach_entry_preorder(field, &format->fields.root,
> > + struct tuple_field, token) {
> > + if (field->type == FIELD_TYPE_UNSIGNED)
> > + return false;
>
> 12. Bad indentation.
> Also, this squash rule is not going to work, because the integer type can
> also overflow, both below INT64_MIN and above UINT64_MAX. Decimal types
> can overflow too. Besides, a decimal operation can fail when a non-decimal
> value does not fit the decimal type during conversion, for example a huge
> double value: DBL_MAX is bigger than the maximal value of our decimal type.
> See xrow_update_arith_make() for all errors.
>
> Since squash is mostly about squashing +/-, it looks like it will almost
> never work and becomes useless.
That's true: in its previous implementation it was almost useless.
I've reworked this part in V2 and integrated the format check directly
into xrow_upsert_squash(), so that we can operate on the particular values
of the squash result (e.g. if the format declares an unsigned field and the
result of squash is negative, the operations are not squashed).
See update_arith_op_does_satisfy_format() and vy_upsert_try_to_squash().
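To make the overflow concern concrete: once a failing operation is skipped rather than aborting the whole upsert, merging "+d1" and "+d2" into a single "+(d1 + d2)" can change the result, because the sequential and merged forms can fail at different points. The following C model of a single unsigned field uses hypothetical helpers, not the xrow_update code, and shows the effect at both ends of the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of an unsigned field: adding a signed delta fails (and the
 * operation is skipped) if the result would leave [0, UINT64_MAX].
 */
static bool
unsigned_add(uint64_t value, int64_t delta, uint64_t *result)
{
	if (delta >= 0) {
		uint64_t d = (uint64_t)delta;
		if (value > UINT64_MAX - d)
			return false;
		*result = value + d;
	} else {
		/* Negate in unsigned arithmetic so INT64_MIN is handled. */
		uint64_t d = (uint64_t)0 - (uint64_t)delta;
		if (value < d)
			return false;
		*result = value - d;
	}
	return true;
}

/* Apply deltas one by one, skipping every operation that fails. */
static uint64_t
apply_sequentially(uint64_t value, const int64_t *deltas, int count)
{
	for (int i = 0; i < count; ++i) {
		uint64_t next;
		if (unsigned_add(value, deltas[i], &next))
			value = next;
	}
	return value;
}

/* Merge two deltas into one; fails if the sum does not fit int64_t. */
static bool
squash_deltas(int64_t d1, int64_t d2, int64_t *sum)
{
	if ((d2 > 0 && d1 > INT64_MAX - d2) ||
	    (d2 < 0 && d1 < INT64_MIN - d2))
		return false;
	*sum = d1 + d2;
	return true;
}
```

With base 1 and deltas {3, -5}, sequential application yields 4 (the -5 is skipped), while the squashed -2 is skipped entirely and leaves 1. So a squash decision has to account for the values involved, not only for the field type.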
> P.S.
>
> At the end of the review I noticed that this prevents squashing not
> only of operations on unsigned fields: it prevents squashing whenever
> the format has at least one unsigned field. This makes the current
> implementation of squash even more useless, because it forbids using the
> fastest field type, which, by the way, is the default when you create an
> index without specifying a field type.
>
> > @@ -87,122 +248,74 @@ vy_apply_upsert(struct tuple *new_stmt, struct tuple *old_stmt,
> > assert(new_stmt != old_stmt);
> > assert(vy_stmt_type(new_stmt) == IPROTO_UPSERT);
> >
> > - if (old_stmt == NULL || vy_stmt_type(old_stmt) == IPROTO_DELETE) {
> > - /*
> > - * INSERT case: return new stmt.
> > - */
> > - return vy_stmt_replace_from_upsert(new_stmt);
> > + struct tuple *result_stmt = NULL;
> > + if (old_stmt == NULL || vy_stmt_type(old_stmt) != IPROTO_UPSERT) {
> > + return vy_apply_upsert_on_terminal_stmt(new_stmt, old_stmt,
> > + cmp_def, suppress_error);
> > }
> >
> > + assert(vy_stmt_type(old_stmt) == IPROTO_UPSERT);
>
> 13. The assertion looks useless, since it is reverse of the {if}
> condition above, but up to you.
Skipped (IMHO it slightly improves code readability).
> > /*
> > - * Unpack UPSERT operation from the new stmt
> > + * Unpack UPSERT operation from the old and new stmts.
> > */
> > + assert(old_stmt != NULL);
>
> 14. It is strange to check the type of old_stmt in the previous assertion
> before you check old_stmt != NULL.
Agreed, swapped these asserts.
> > - if (result_mp == NULL) {
> > - region_truncate(region, region_svp);
> > - return NULL;
> > + if (tuple_format_is_suitable_for_squash(format)) {
> > + const char *new_ops_end = new_ops + mp_size;
> > + if (vy_upsert_try_to_squash(format, old_stmt_mp, old_stmt_mp_end,
> > + old_ops, old_ops_end, new_ops,
> > + new_ops_end, &result_stmt) != 0) {
> > + /* OOM */
> > + region_truncate(region, region_svp);
> > + return NULL;
> > + }
>
> 15. vy_upsert_try_to_squash() returns its result via result_stmt, but
> you ignore it. Whatever it returns, you act as if squash didn't happen,
> so squash never works now. You continue to work with the 2 old
> operation set arrays. Also, result_stmt leaks.
>
> What is also strange: I added {assert(false);} here and the tests
> passed. I thought we had quite a lot of squash tests. It seems they all
> use formats with an unsigned field type.
My apologies for this broken part of the patch; somehow I missed it.
I've reworked it in patch V2 (see comment above).
> (Actually the tests failed, but not here: on my machine vinyl tests
> fail somewhere with random errors in almost 100% of runs; it could be
> LuaJIT problems on Mac.)
>
> > + * If upsert corresponding to old_ops becomes insert, then
> > + * {{op1}, {op2}} update operations are not applied.
> > */
> > - assert(old_ops_end - old_ops > 0);
> > - if (vy_upsert_try_to_squash(format, result_mp, result_mp_end,
> > - old_ops, old_ops_end, new_ops, new_ops_end,
> > - &result_stmt) != 0) {
> > + uint32_t old_ops_cnt = mp_decode_array(&old_ops);
> > + uint32_t new_ops_cnt = mp_decode_array(&new_ops);
> > + size_t ops_size = sizeof(struct iovec) * (old_ops_cnt + new_ops_cnt);
> > + struct iovec *operations = region_alloc(region, ops_size);
>
> 16. region_alloc_array.
>
> 17. But you don't really need that. Nor upsert_ops_to_iovec() function.
iovecs really simplify the code and the workflow with update operations.
I use them even more intensively in the new patch version.
> You could keep the old code almost as is, because for vy_stmt_new_with_ops()
> to work correctly, it is not necessary to have each operation set in a
> separate iovec. Anyway they are all copied as is without unpacking. You
> could have 1 iovec for the root MP_ARRAY, 1 iovec for the old operation sets,
> 1 iovec for the new operation sets.
>
> Having the first iovec with the root MP_ARRAY would allow deleting is_ops_encoded.
Not sure if it is possible in V2...
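The reviewer's three-iovec layout can be sketched as follows: one iovec holds the root MP_ARRAY header for old_cnt + new_cnt elements, and the other two point at the already-encoded old and new operation sets, which are copied verbatim. The header encoder is hand-rolled from the MessagePack specification only to keep the example self-contained (in Tarantool one would presumably use msgpuck's mp_encode_array() instead); build_ops_iovec() and iovec_copy() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Encode a MessagePack array header for @a count elements into @a buf
 * (at least 5 bytes); returns the number of bytes written.
 */
static size_t
encode_array_header(char *buf, uint32_t count)
{
	unsigned char *p = (unsigned char *)buf;
	if (count <= 15) {
		p[0] = (unsigned char)(0x90 | count); /* fixarray */
		return 1;
	}
	if (count <= UINT16_MAX) {
		p[0] = 0xdc; /* array 16, big-endian length */
		p[1] = (unsigned char)(count >> 8);
		p[2] = (unsigned char)count;
		return 3;
	}
	p[0] = 0xdd; /* array 32, big-endian length */
	p[1] = (unsigned char)(count >> 24);
	p[2] = (unsigned char)(count >> 16);
	p[3] = (unsigned char)(count >> 8);
	p[4] = (unsigned char)count;
	return 5;
}

/*
 * Describe the merged operation list with three iovecs: the root array
 * header, then the old and new operation sets as-is (each range holds
 * encoded MP_ARRAY elements without a root header of its own).
 */
static void
build_ops_iovec(struct iovec iov[3], char *header, uint32_t old_cnt,
		const char *old_sets, size_t old_size,
		uint32_t new_cnt, const char *new_sets, size_t new_size)
{
	iov[0].iov_base = header;
	iov[0].iov_len = encode_array_header(header, old_cnt + new_cnt);
	iov[1].iov_base = (void *)old_sets;
	iov[1].iov_len = old_size;
	iov[2].iov_base = (void *)new_sets;
	iov[2].iov_len = new_size;
}

/* Flatten @a iovcnt iovecs into @a out; returns the total size. */
static size_t
iovec_copy(char *out, const struct iovec *iov, int iovcnt)
{
	size_t total = 0;
	for (int i = 0; i < iovcnt; ++i) {
		memcpy(out + total, iov[i].iov_base, iov[i].iov_len);
		total += iov[i].iov_len;
	}
	return total;
}
```

With one old operation set and two new ones (each an empty array, 0x90), the flattened buffer is 0x93 0x90 0x90 0x90: a single fixarray header followed by the three sets copied without unpacking.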
> > + if (operations == NULL) {
> > region_truncate(region, region_svp);
> > + diag_set(OutOfMemory, ops_size, "region_alloc", "operations");
> > return NULL;
> > }
> > - if (result_stmt != NULL) {
> > - region_truncate(region, region_svp);
> > - vy_stmt_set_lsn(result_stmt, vy_stmt_lsn(new_stmt));
> > - goto check_key;
> > - }
> > + upsert_ops_to_iovec(old_ops, old_ops_cnt, operations);
> > + upsert_ops_to_iovec(new_ops, new_ops_cnt, &operations[old_ops_cnt]);
> >
>
> 18. You need to put references to the relevant issues in the tests,
> using
>
> --
> -- gh-NNNN: description.
> --
Fixed (added tags).
> > +- ok
> > +...
> > +s:select{}
> > +---
> > +- - [1, 2, 3, 'upserted']
> > +...
> > +s:drop()
> > +---
> > +...
>
> 19. All tests work with unsigned fields. So squashing is not tested here.
In the new patch, squashing requirements are more tolerant, so squashing
now takes place in these tests.