From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id AE249430407 for ; Sat, 8 Aug 2020 17:51:16 +0300 (MSK) Date: Sat, 8 Aug 2020 14:51:15 +0000 From: Nikita Pettik Message-ID: <20200808145115.GB28157@tarantool.org> References: <14aec86329c7820470e40e47a3d1b5655b531732.1595985135.git.korablev@tarantool.org> <444ff8e3-d6c2-e6a5-8b54-18a521208ba5@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <444ff8e3-d6c2-e6a5-8b54-18a521208ba5@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 2/2] vinyl: rework upsert operation List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org On 02 Aug 16:44, Vladislav Shpilevoy wrote: > Thanks for the patch! > > ASAN tests on the branch are failing: > https://gitlab.com/tarantool/tarantool/-/jobs/661977877 Hi. I've sent second iteration of patch containing fixes to all your comments below. ASAN now is OK as well: https://gitlab.com/tarantool/tarantool/-/jobs/677105339 > > +static bool > > +vy_apply_result_does_cross_pk(struct tuple *old_stmt, const char *result, > > + const char *result_end, struct key_def *cmp_def, > > + uint64_t col_mask) > > +{ > > + if (!key_update_can_be_skipped(cmp_def->column_mask, col_mask)) { > > + struct tuple *tuple = > > + vy_stmt_new_replace(tuple_format(old_stmt), result, > > + result_end); > > + int cmp_res = vy_stmt_compare(old_stmt, HINT_NONE, tuple, > > + HINT_NONE, cmp_def); > > 1. Bad indentation. Fixed. > > + tuple_unref(tuple); > > + return cmp_res != 0; > > + } > > + return false; > > +} > > + > > +/** > > + * Apply update operations stored in @new_stmt (which is assumed to > > 2. Please, look at the doxygen syntax on the official page or here: > https://github.com/tarantool/tarantool/wiki/Code-review-procedure > This is a single very simple rule I keep repeating I don't already > remember how many times - use @a , not @. > I don't understand why does everyone keep violating it. I'm really sorry for that (haven't been writing comments in doxy style for a while). Fixed all comments. > 2. Parameter 'new_stmt' does not exist. As well as 'old_stmt'. What > did you mean? > > > + * be upsert statement) on tuple @old_stmt. If @old_stmt is void > > + * statement (i.e. it is NULL or delete statement) then operations > > + * are applied on tuple @new_stmt. All operations which can't be > > + * applied are skipped; errors may be logged depending on @supress_error > > 3. supress_error -> supress_error. Fixed. > 4. What do you mean as 'all operations'? Operation groups from various > upserts? Or individual operations? > > > + * flag. > > + * > > + * @upsert Upsert statement to be applied on @stmt. > > 5. If you want to use doxygen, use @param . Fixed. > > + * @stmt Statement to be used as base for upsert operations. > > + * @cmp_def Key definition required to provide check of primary key > > + * modification. > > + * @retrun Tuple containing result of upsert application; > > + * NULL in case OOM. > > 6. retrun -> return. Fixed. > 7. I guess you are among the ones who voted for 80 symbol comments - I > suggest you to use it. Since this is our new code style now. OK! > > + */ > > +static struct tuple * > > +vy_apply_upsert_on_terminal_stmt(struct tuple *upsert, struct tuple *stmt, > > + struct key_def *cmp_def, bool suppress_error) > > +{ > > + assert(vy_stmt_type(upsert) == IPROTO_UPSERT); > > + assert(stmt == NULL || vy_stmt_type(stmt) != IPROTO_UPSERT); > > + > > + uint32_t mp_size; > > + const char *new_ops = vy_stmt_upsert_ops(upsert, &mp_size); > > + /* Msgpack containing result of upserts application. */ > > + const char *result_mp; > > + if (vy_stmt_is_void(stmt)) > > 8. This {is_void} helper is used 2 times inside one funtion on the same > value. Seems like you could simply inline it, remeber result into a variable > {bool is_void;} and use it instead. Ok, refactored and dropped commit containing introduction of vy_stmt_is_void(). > > + for (uint32_t i = 0; i < ups_cnt; ++i) { > > + assert(mp_typeof(*ups_ops) == MP_ARRAY); > > + const char *ups_ops_end = ups_ops; > > + mp_next(&ups_ops_end); > > + const char *exec_res = result_mp; > > + exec_res = xrow_upsert_execute(ups_ops, ups_ops_end, result_mp, > > + result_mp_end, format, &mp_size, > > + 0, suppress_error, &column_mask); > > + if (exec_res == NULL) { > > + if (! suppress_error) { > > 9. According to one another recent code style change, unary operators > should not have a whitespace after them. Fixed. > > + assert(diag_last_error(diag_get()) != NULL); > > 10. Use {diag_is_empty}. Or better - save {diag_last_error(diag_get())} into > {struct error *e} before the assertion, and use {assert(e != NULL);}. OK, fixed. > > + struct error *e = diag_last_error(diag_get()); > > + /* Bail out immediately in case of OOM. */ > > + if (e->type != &type_ClientError) { > > + region_truncate(region, region_svp); > > + return NULL; > > + } > > + diag_log(); > > + } > > + ups_ops = ups_ops_end; > > + continue; > > + } > > + /* > > + * If it turns out that resulting tuple modifies primary > > + * key, than simply ignore this upsert. > > 11. than -> then. Fixed. > > +static bool > > +tuple_format_is_suitable_for_squash(struct tuple_format *format) > > +{ > > + struct tuple_field *field; > > + json_tree_foreach_entry_preorder(field, &format->fields.root, > > + struct tuple_field, token) { > > + if (field->type == FIELD_TYPE_UNSIGNED) > > + return false; > > 12. Bad indentation. > Also this squash rule is not going to work because integer type also can > overflow, both below INT64_MIN and above UINT64_MAX. Decimal types > can overflow. Besides, decimal can fail when a non-decimal value does not > fit the decimal type during conversion. For example, a huge double value. > DBL_MAX is bigger than maximal value available in our decimal type. See > xrow_update_arith_make() for all errors. > > Since squash is mostly about squashing +/-, looks like it won't work > almost always, and becomes useless. It's true. In its previous implementation it was almost useless. I've reworked this part in V2 and integrated format check right in xrow_upsert_squash() so that we can operate on particular values of squash result (e.g. if format declares unsigned field and the result of squash is negative - operations are not squashed). See update_arith_op_does_satisfy_format() and vy_upsert_try_to_squash(). > P.S. > > In the end of the review I noticed that this prevents squashing not > only of operations with unsigned fields. It will prevent squashing if > the whole format has at least one unsigned. This makes current implementation > of squash even more useless, because forbids to use the fastest field type, > which is default when you create an index without specification of field type > btw. > > > @@ -87,122 +248,74 @@ vy_apply_upsert(struct tuple *new_stmt, struct tuple *old_stmt, > > assert(new_stmt != old_stmt); > > assert(vy_stmt_type(new_stmt) == IPROTO_UPSERT); > > > > - if (old_stmt == NULL || vy_stmt_type(old_stmt) == IPROTO_DELETE) { > > - /* > > - * INSERT case: return new stmt. > > - */ > > - return vy_stmt_replace_from_upsert(new_stmt); > > + struct tuple *result_stmt = NULL; > > + if (old_stmt == NULL || vy_stmt_type(old_stmt) != IPROTO_UPSERT) { > > + return vy_apply_upsert_on_terminal_stmt(new_stmt, old_stmt, > > + cmp_def, suppress_error); > > } > > > > + assert(vy_stmt_type(old_stmt) == IPROTO_UPSERT); > > 13. The assertion looks useless, since it is reverse of the {if} > condition above, but up to you. Skipped (imho it increases a bit code readability). > > /* > > - * Unpack UPSERT operation from the new stmt > > + * Unpack UPSERT operation from the old and new stmts. > > */ > > + assert(old_stmt != NULL); > > 14. This is strage to check old_stmt->type in the previous assertion before > you checked old_stmt != NULL. Agree, swapped these asserts. > > - if (result_mp == NULL) { > > - region_truncate(region, region_svp); > > - return NULL; > > + if (tuple_format_is_suitable_for_squash(format)) { > > + const char *new_ops_end = new_ops + mp_size; > > + if (vy_upsert_try_to_squash(format, old_stmt_mp, old_stmt_mp_end, > > + old_ops, old_ops_end, new_ops, > > + new_ops_end, &result_stmt) != 0) { > > + /* OOM */ > > + region_truncate(region, region_svp); > > + return NULL; > > + } > > 15. vy_upsert_try_to_squash() returns a result into result_stmt. But > you ignore it. Basically, whatever it returns, you act like squash > didn't happen and it never works now. You continue to work with 2 old > operation set arrays. Also result_stmt leaks. > > What is also strange - I added {assert(false);} here and the tests > passed. I thought we had quite a lot squash tests. Seems they are all > for formats having unsigned field type. My aplogies for this broken part of patch, somehow I've missed it.. I've reworked it in patch V2 (see comment above). > (Actualy the tests failed, but not here - on my machine vinyl tests > fail in almost 100% runs somewhere with random errors, could be > luajit problems on Mac maybe.) > > > + * If upsert corresponding to old_ops becomes insert, then > > + * {{op1}, {op2}} update operations are not applied. > > */ > > - assert(old_ops_end - old_ops > 0); > > - if (vy_upsert_try_to_squash(format, result_mp, result_mp_end, > > - old_ops, old_ops_end, new_ops, new_ops_end, > > - &result_stmt) != 0) { > > + uint32_t old_ops_cnt = mp_decode_array(&old_ops); > > + uint32_t new_ops_cnt = mp_decode_array(&new_ops); > > + size_t ops_size = sizeof(struct iovec) * (old_ops_cnt + new_ops_cnt); > > + struct iovec *operations = region_alloc(region, ops_size); > > 16. region_alloc_array. > > 17. But you don't really need that. Nor upsert_ops_to_iovec() function. iovecs really simply code and workflow with update operations. I use them more intensely in new patch version. > You could keep the old code almost as is, because for vy_stmt_new_with_ops() > to work correctly, it is not necessary to have each operation set in a > separate iovec. Anyway they are all copied as is without unpacking. You > could have 1 iovec for the root MP_ARRAY, 1 iovec for the old operation sets, > 1 iovec for the new operation sets. > > Having first iovec with root MP_ARRAY would allow to delete is_ops_encoded. Not sure if it possible in V2... > > + if (operations == NULL) { > > region_truncate(region, region_svp); > > + diag_set(OutOfMemory, ops_size, "region_alloc", "operations"); > > return NULL; > > } > > - if (result_stmt != NULL) { > > - region_truncate(region, region_svp); > > - vy_stmt_set_lsn(result_stmt, vy_stmt_lsn(new_stmt)); > > - goto check_key; > > - } > > + upsert_ops_to_iovec(old_ops, old_ops_cnt, operations); > > + upsert_ops_to_iovec(new_ops, new_ops_cnt, &operations[old_ops_cnt]); > > > > 18. You need to put references to the relevant issues in the tests, > using > > -- > -- gh-NNNN: description. > -- Fixed (added tags). > > +- ok > > +... > > +s:select{} > > +--- > > +- - [1, 2, 3, 'upserted'] > > +... > > +s:drop() > > +--- > > +... > > 19. All tests work with unsigned fields. So squashing is not tested here. In new patch squashing requirements become more tolerant, so squashing now takes place in these tests.