From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 4F08E2B2CB for ; Thu, 4 Apr 2019 11:03:55 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id X7nfK3dl7JrL for ; Thu, 4 Apr 2019 11:03:55 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 4CD582AEC6 for ; Thu, 4 Apr 2019 11:03:54 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH v1 1/1] sql: fix perf degradation on name normalization References: <09c8ef39eaf35ae7fa6825236a3b32b54d13dec5.1554386791.git.kshcherbatov@tarantool.org> From: Vladislav Shpilevoy Message-ID: Date: Thu, 4 Apr 2019 18:03:50 +0300 MIME-Version: 1.0 In-Reply-To: <09c8ef39eaf35ae7fa6825236a3b32b54d13dec5.1554386791.git.kshcherbatov@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org, Kirill Shcherbatov Hi! Thanks for the patch. See 13 comments below. On 04/04/2019 17:07, Kirill Shcherbatov wrote: > Because sql_normalize_name used to be called twice - to estimate > the size of the name buffer and to process data querying the > UCaseMap object each time DDL performance felt by 15%. 1. It is not about DDL only. Names are normalized always, on DML and DQL as well. You just can't look up spaces and columns by not normalized names. > > This patch should eliminate some of the negative effects of using > ICU for name normalization. > > Thanks @avtikhon for a bechmark > > Follow up e7558062d3559e6bcc18f91eacb88269428321dc > --- 2. As I see, you did the patch on top of 2.1 branch, but our policy is implement patches on top of the master, and then cherry-pick. > https://github.com/tarantool/msgpuck/tree/kshch/gh-3931-perfomance-fixup > https://github.com/tarantool/tarantool/issues/3931 > > src/box/sql.c | 8 ++++ > src/box/sql.h | 9 +++++ > src/box/sql/expr.c | 25 ++++++------- > src/box/sql/parse.y | 22 ++++++----- > src/box/sql/sqlInt.h | 9 +---- > src/box/sql/trigger.c | 22 +++++++---- > src/box/sql/util.c | 80 ++++++++++++++++++++-------------------- > src/lib/core/errinj.h | 1 - > test/box/errinj.result | 2 - > test/sql/errinj.result | 18 --------- > test/sql/errinj.test.lua | 8 ---- > 11 files changed, 97 insertions(+), 107 deletions(-) > > diff --git a/src/box/sql.c b/src/box/sql.c > index 4fac020b0..872bfc4f4 100644 > --- a/src/box/sql.c > +++ b/src/box/sql.c> @@ -74,6 +77,11 @@ sql_init() > if (sql_init_db(&db) != SQL_OK) > panic("failed to initialize SQL subsystem"); > > + UErrorCode status = U_ZERO_ERROR; > + sql_case_map = ucasemap_open(NULL, 0, &status); > + if (sql_case_map == NULL) > + panic("failed to initialize SQL subsystem"); 3. As you can see, there are two exactly the same messages. Please, avoid this duplication. ================================================================ @@ -71,15 +71,10 @@ void sql_init() { default_flags |= default_sql_flags; - current_session()->sql_flags |= default_sql_flags; - - if (sql_init_db(&db) != SQL_OK) - panic("failed to initialize SQL subsystem"); - UErrorCode status = U_ZERO_ERROR; sql_case_map = ucasemap_open(NULL, 0, &status); - if (sql_case_map == NULL) + if (sql_init_db(&db) != SQL_OK || sql_case_map == NULL) panic("failed to initialize SQL subsystem"); assert(db != NULL); ================================================================ > + > assert(db != NULL); > } > > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c > index 838fbd21a..0668fcca6 100644 > --- a/src/box/sql/expr.c > +++ b/src/box/sql/expr.c> @@ -973,14 +973,7 @@ sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token) > if (sql_expr_token_to_int(op, token, &val) == 0) > return sql_expr_new_int(db, val); > is_name = op == TK_ID || op == TK_COLLATE || op == TK_FUNCTION; > - if (is_name) { > - extra_size = sql_normalize_name(NULL, 0, token->z, > - token->n); > - if (extra_size < 0) > - return NULL; > - } else { > - extra_size = token->n + 1; > - } > + extra_size = token->n + 1; 4. is_name now is not necessary to keep as a variable. Please, inline it in its single usage place. > } > struct Expr *e = sql_expr_new_empty(db, op, extra_size); > if (e == NULL || token == NULL || token->n == 0) > @@ -992,10 +985,16 @@ sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token) > memcpy(e->u.zToken, token->z, token->n); > e->u.zToken[token->n] = '\0'; > sqlDequote(e->u.zToken); > - } else if (sql_normalize_name(e->u.zToken, extra_size, token->z, > - token->n) < 0) { > - sql_expr_delete(db, e, false); > - return NULL; > + } else if ((rc = sql_normalize_name(e->u.zToken, extra_size, token->z, > + token->n)) > extra_size) { > + extra_size = rc; > + e = sqlDbReallocOrFree(db, e, sizeof(*e) + extra_size); > + if (e == NULL) > + return NULL; > + e->u.zToken = (char *) &e[1]; > + if (sql_normalize_name(e->u.zToken, extra_size, token->z, > + token->n) > extra_size) > + unreachable(); > } > return e; > } > diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y > index c37b8d429..357335d23 100644 > --- a/src/box/sql/parse.y > +++ b/src/box/sql/parse.y > @@ -899,8 +892,17 @@ idlist(A) ::= nm(Y). { > p->iAgg = -1; > p->u.zToken = (char*)&p[1]; > if (op != TK_VARIABLE) { > - if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) < 0) > - goto tarantool_error; > + int rc; > + if ((rc = sql_normalize_name(p->u.zToken, name_sz, t.z, > + t.n)) > name_sz) { 5. Why do you declare 'rc' and on the next line set it inside 'if'? It does not make the code shorter, not more readable. Just write rc = sql_normalize...(); and check its result on the next line. > + name_sz = rc; > + p = sqlDbReallocOrFree(pParse->db, p, sizeof(Expr) + name_sz); 6. Either sizeof(struct Expr), or sizeof(*p). We do not use typedefs normally, and typedef Expr will be dropped someday. > + if (p == NULL) > + goto tarantool_error; > + p->u.zToken = (char*) &p[1]; 7. In our code we put whitespace before '*'. > + if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) > name_sz) > + unreachable(); > + } > } else { > memcpy(p->u.zToken, t.z, t.n); > p->u.zToken[t.n] = 0; > diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h > index 4a2197f96..1374f364c 100644 > --- a/src/box/sql/sqlInt.h > +++ b/src/box/sql/sqlInt.h > @@ -3207,15 +3207,10 @@ void sqlDequote(char *); > * @param dst A buffer for the result string. The result will be > * 0-terminated if the buffer is large enough. The contents > * is undefined in case of failure. > - * @param dst_size The size of the buffer (number of bytes). If it > - * is 0, then dest may be NULL and the function will only > - * return the length of the result without writing any of > - * the result string > + * @param dst_size The size of the buffer (number of bytes). > * @param src The original string. > * @param src_len The length of the original string. > - * @retval The count of bytes written(or need to be written) on > - * success. > - * @retval < 0 Otherwise. The diag message is set. > + * @retval The count of bytes written. 8. What is a motivation of dropping '(or need to be written)' ? I understand removal of 'on success', but why did you drop the text in the braces? You still can get a value > dst_size from that function. And since you dropped the comment on dst_size, here you need to write a more detailed one, when and why it can return > dst_size. > */ > int > sql_normalize_name(char *dst, int dst_size, const char *src, int src_len); > diff --git a/src/box/sql/trigger.c b/src/box/sql/trigger.c > index c94880086..add792c63 100644 > --- a/src/box/sql/trigger.c > +++ b/src/box/sql/trigger.c > @@ -290,10 +287,19 @@ sql_trigger_step_new(struct sql *db, u8 op, struct Token *target_name) > return NULL; > } > char *z = (char *)&trigger_step[1]; > - if (sql_normalize_name(z, name_size, target_name->z, > - target_name->n) < 0) { > - sqlDbFree(db, trigger_step); > - return NULL; > + int rc; > + if ((rc = sql_normalize_name(z, name_size, target_name->z, > + target_name->n)) > name_size) { 9. The same as in parse.y. > + name_size = rc; > + trigger_step = sqlDbReallocOrFree(db, trigger_step, > + sizeof(*trigger_step) + > + name_size); > + if (trigger_step == NULL) > + return NULL; > + z = (char *) &trigger_step[1]; > + if (sql_normalize_name(z, name_size, target_name->z, > + target_name->n) > name_size) > + unreachable(); > } > trigger_step->zTarget = z; > trigger_step->op = op; > diff --git a/src/box/sql/util.c b/src/box/sql/util.c > index e9553b3a4..b365966d5 100644 > --- a/src/box/sql/util.c > +++ b/src/box/sql/util.c> > char * > sql_normalized_name_db_new(struct sql *db, const char *name, int len) > { > - int size = sql_normalize_name(NULL, 0, name, len); > - if (size < 0) > - return NULL; > + int size = len + 1, rc; > char *res = sqlDbMallocRawNN(db, size); > - if (res == NULL) { > - diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res"); > - return NULL; > - } > - if (sql_normalize_name(res, size, name, len) < 0) { > - sqlDbFree(db, res); > - return NULL; > - } > + if (res == NULL) > + goto oom_error; 10. Funny thing, but if we looked at the history of sql malloc usages and changes we could see that firstly it was not setting diag. Then Mergen started setting diag right inside malloc.c. Then we started to set diag twice in certain places in scope of our normalization patch, because I missed Mergen's patch. Then I see this patch, and here you set diag in half of it, but do not in the other half. As I understand, sqlDbMallocRawNN already sets diag, and you don't need to do it again here. > + if ((rc = sql_normalize_name(res, size, name, len)) <= size) > + return res; > + > + size = rc; > + res = sqlDbReallocOrFree(db, res, size); > + if (res == NULL) > + goto oom_error; > + if (sql_normalize_name(res, size, name, len) > size) > + unreachable(); > return res; > + > +oom_error: > + diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res"); > + return NULL; > } > > char * > sql_normalized_name_region_new(struct region *r, const char *name, int len) > { > - int size = sql_normalize_name(NULL, 0, name, len); > - if (size < 0) > - return NULL; > - char *res = (char *) region_alloc(r, size); > - if (res == NULL) { > - diag_set(OutOfMemory, size, "region_alloc", "res"); > - return NULL; > - } > - if (sql_normalize_name(res, size, name, len) < 0) > - return NULL; > + int size = len + 1, rc; > + size_t region_svp = region_used(r); > + char *res = region_alloc(r, size); > + if (res == NULL) > + goto oom_error; > + if ((rc = sql_normalize_name(res, size, name, len)) <= size) > + return res; > + > + size = rc; > + region_truncate(r, region_svp); > + res = region_alloc(r, size); > + if (res == NULL) > + goto oom_error; > + if (sql_normalize_name(res, size, name, len) > size) > + unreachable(); > return res; > + > +oom_error: > + diag_set(OutOfMemory, size, "region", "res"); 11. Third argument of OutOfMemory is a name of the function failed to allocate memory. Here you call region_alloc, not region. > + return NULL; > } > > /* > diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h > index c823d3597..41783cc74 100644 > --- a/src/lib/core/errinj.h > +++ b/src/lib/core/errinj.h > @@ -125,7 +125,6 @@ struct errinj { > _(ERRINJ_VY_COMPACTION_DELAY, ERRINJ_BOOL, {.bparam = false}) \ > _(ERRINJ_TUPLE_FORMAT_COUNT, ERRINJ_INT, {.iparam = -1}) \ > _(ERRINJ_MEMTX_DELAY_GC, ERRINJ_BOOL, {.bparam = false}) \ > - _(ERRINJ_SQL_NAME_NORMALIZATION, ERRINJ_BOOL, {.bparam = false}) \ 12. What is a motivation of drop of that test? You still can fail in OOM, but a bit later, in sqlDbReallocOrFree in sql_expr_new_dequoted. The test was not about OOM specifically, but about normalization error, and that the error goes up to user normally, without being replaced or ignored. > > ENUM0(errinj_id, ERRINJ_LIST); > extern struct errinj errinjs[]; 13. We already have global case_map in lua/utf8.c. I think, it is worth moving it to lib/coll and use both in lua/utf8.c and here.