From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [tarantool-patches] Re: [PATCH v1 1/1] sql: fix perf degradation on name normalization
References: <09c8ef39eaf35ae7fa6825236a3b32b54d13dec5.1554386791.git.kshcherbatov@tarantool.org> <7cdb5d69-8ace-3899-872e-c97c477866e6@tarantool.org>
From: Vladislav Shpilevoy
Date: Thu, 4 Apr 2019 20:31:27 +0300
MIME-Version: 1.0
In-Reply-To: <7cdb5d69-8ace-3899-872e-c97c477866e6@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches
To: Kirill Shcherbatov , tarantool-patches@freelists.org

>> 5. Why do you declare 'rc' and on the next line set it inside 'if'?
>> It does not make the code shorter, nor more readable. Just write
>> rc = sql_normalize...(); and check its result on the next line.
> I like this more. Ok. Reworked here and everywhere.

I could understand that, if not for the fact that throughout the patch
you used both styles, sometimes in one function on neighbouring lines.

>> 11.
>> Third argument of OutOfMemory is the name of the function
>> that failed to allocate memory. Here you call region_alloc, not region.
> Vova prefers "region" in such cases, if I am not mistaken. I don't care.
> "region_alloc"

I do not remember Vova ever telling me anything like this. But I clearly
remember that Kostja said to use the function name, and we do that
throughout the code. See the 3 comments below.

>
> Because sql_normalize_name used to be called twice - to estimate
> the size of the name buffer and to process data, querying the
> UCaseMap object each time - performance in SQL fell by 15%.
>
> This patch should eliminate some of the negative effects of using
> ICU for name normalization.
>
> Thanks @avtikhon for a benchmark
>
> Follow up e7558062d3559e6bcc18f91eacb88269428321dc
> ---
>  src/box/sql/expr.c       | 29 ++++++++--------
>  src/box/sql/parse.y      | 21 ++++++------
>  src/box/sql/sqlInt.h     |  9 ++---
>  src/box/sql/trigger.c    | 22 ++++++++-----
>  src/box/sql/util.c       | 71 +++++++++++++++++++++-------------------
>  src/lib/coll/coll.c      |  8 ++++-
>  src/lib/coll/coll.h      |  3 ++
>  src/lua/utf8.c           | 11 +------
>  test/sql/errinj.result   |  9 ++++-
>  test/sql/errinj.test.lua |  2 ++
>  10 files changed, 98 insertions(+), 87 deletions(-)
>
> diff --git a/src/box/sql/util.c b/src/box/sql/util.c
> index a13efa682..2f3c17c9a 100644
> --- a/src/box/sql/util.c
> +++ b/src/box/sql/util.c
> @@ -259,66 +260,68 @@ int
>  char *
>  sql_normalized_name_region_new(struct region *r, const char *name, int len)
>  {
> -	int size = sql_normalize_name(NULL, 0, name, len);
> -	if (size < 0)
> -		return NULL;
> -	char *res = (char *) region_alloc(r, size);
> -	if (res == NULL) {
> +	int size = len + 1;
> +	ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
>  		diag_set(OutOfMemory, size, "region_alloc", "res");
>  		return NULL;
> -	}
> -	if (sql_normalize_name(res, size, name, len) < 0)
> +	});
> +	size_t region_svp = region_used(r);
> +	char *res = region_alloc(r, size);
> +	if (res == NULL)
> +		return NULL;

1. Missed diag_set.
> +	int rc = sql_normalize_name(res, size, name, len);
> +	if (rc <= size)
> +		return res;
> +
> +	size = rc;
> +	region_truncate(r, region_svp);
> +	res = region_alloc(r, size);
> +	if (res == NULL)
>  		return NULL;

2. Again: missed diag_set.

> +	if (sql_normalize_name(res, size, name, len) > size)
> +		unreachable();
>  	return res;
>  }
>
> diff --git a/src/lib/coll/coll.c b/src/lib/coll/coll.c
> index b83f0fdc7..21f2489d4 100644
> --- a/src/lib/coll/coll.c
> +++ b/src/lib/coll/coll.c
> @@ -34,8 +34,11 @@
>  #include "diag.h"
>  #include "assoc.h"
>  #include
> +#include
>  #include
>
> +struct UCaseMap *root_map = NULL;

3. That name was OK while it was local to utf8.c, but now it is global,
and 'root_map' in the scope of the whole of tarantool looks ambiguous.
What is it? A MessagePack map? A Lua table map? An RB-tree map? What is
'root'? I propose icu_ucase_default_map.
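
To illustrate comments 1 and 2, here is a minimal standalone sketch of the "guess the size, retry once with the exact size" pattern with an error set on every allocation failure. It is not the patch's code. normalize_name(), set_oom_error() and normalized_name_new() are hypothetical stand-ins for sql_normalize_name(), diag_set(OutOfMemory, ...) and sql_normalized_name_region_new(), and malloc() replaces the region allocator. The toy normalizer expands '&' to "AND" only so that the output can outgrow the input, as a real case mapping may.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the diagnostics area: keeps the last error. */
static char last_error[128];

/* Hypothetical stand-in for diag_set(OutOfMemory, size, func, var). */
static void
set_oom_error(size_t size, const char *func, const char *var)
{
	snprintf(last_error, sizeof(last_error),
		 "failed to allocate %zu bytes in %s for %s", size, func, var);
}

/*
 * Hypothetical stand-in for sql_normalize_name(): upper-cases src and
 * expands '&' to "AND". Returns the buffer size needed for the whole
 * result, terminating zero included. Writes nothing if dst_size is
 * smaller than that.
 */
static int
normalize_name(char *dst, int dst_size, const char *src, int len)
{
	int needed = 1;
	for (int i = 0; i < len; i++)
		needed += src[i] == '&' ? 3 : 1;
	if (dst_size < needed)
		return needed;
	char *pos = dst;
	for (int i = 0; i < len; i++) {
		if (src[i] == '&') {
			memcpy(pos, "AND", 3);
			pos += 3;
		} else {
			*pos++ = (char) toupper((unsigned char) src[i]);
		}
	}
	*pos = '\0';
	return needed;
}

/* One optimistic attempt with len + 1 bytes, then one exact retry. */
static char *
normalized_name_new(const char *name, int len)
{
	int size = len + 1;
	char *res = malloc(size);
	if (res == NULL) {
		/* Report the function that actually failed to allocate. */
		set_oom_error(size, "malloc", "res");
		return NULL;
	}
	int rc = normalize_name(res, size, name, len);
	if (rc <= size)
		return res;
	/* The guess was too small: drop it and use the reported size. */
	free(res);
	size = rc;
	res = malloc(size);
	if (res == NULL) {
		set_oom_error(size, "malloc", "res");
		return NULL;
	}
	rc = normalize_name(res, size, name, len);
	assert(rc <= size);
	(void) rc;
	return res;
}
```

In the common case the name does not grow and the normalizer is called once. The second call happens only when the first one reports a larger size, and both allocation failure paths leave an error behind for the caller.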