[tarantool-patches] Re: [PATCH v1 1/1] sql: fix perf degradation on name normalization

Kirill Shcherbatov kshcherbatov at tarantool.org
Thu Apr 4 20:12:17 MSK 2019


Hi! Thank you for review!

> 1. It is not about DDL only. Names are normalized always, on DML
> and DQL as well. You just can't look up spaces and columns by
> not normalized names.
DDL -> SQL

> 2. As I see, you did the patch on top of 2.1 branch, but our
> policy is implement patches on top of the master, and then
> cherry-pick.
Done, based on master.

> 3. As you can see, there are two exactly the same messages.
> Please, avoid this duplication.
Non-actual with last proposal.>> +		extra_size = token->n + 1;
> 
> 4. is_name now is not necessary to keep as a variable. Please,
> inline it in its single usage place.
Got rid of.

> 5. Why do you declare 'rc' and on the next line set it inside 'if'?
> It does not make the code shorter, not more readable. Just write
> rc = sql_normalize...(); and check its result on the next line.
I like this more. Ok. Reworked here and everywhere.

> 6. Either sizeof(struct Expr), or sizeof(*p). We do not use typedefs
> normally, and typedef Expr will be dropped someday.
 sizeof(*p).

> 7. In our code we put whitespace before '*'.
Fixed.

> 8. What is a motivation of dropping '(or need to be written)' ? I
> understand removal of 'on success', but why did you drop the text in
> the braces? You still can get a value > dst_size from that function.
> And since you dropped the comment on dst_size, here you need to write
> a more detailed one, when and why it can return > dst_size.
No real motivation. Returned.

> 9. The same as in parse.y.
Ok

> 10. Funny thing, but if we looked at the history of sql malloc usages
> and changes we could see that firstly it was not setting diag. Then
> Mergen started setting diag right inside malloc.c. Then we started to
> set diag twice in certain places in scope of our normalization patch,
> because I missed Mergen's patch. Then I see this patch, and here you
> set diag in half of it, but do not in the other half.
> 
> As I understand, sqlDbMallocRawNN already sets diag, and you don't need
> to do it again here.
You are right.

> 11. Third argument of OutOfMemory is a name of the function
> failed to allocate memory. Here you call region_alloc, not region.
Vova prefer "region" in such cases if I not mistaken. I don't care. "region_alloc"

> 12. What is a motivation of drop of that test? You still can
> fail in OOM, but a bit later, in sqlDbReallocOrFree in sql_expr_new_dequoted.
> The test was not about OOM specifically, but about normalization error,
> and that the error goes up to user normally, without being replaced or
> ignored.
New ERRINJ:
 char *
 sql_normalized_name_db_new(struct sql *db, const char *name, int len)
 {
-       int size = sql_normalize_name(NULL, 0, name, len);
-       if (size < 0)
+       int size = len + 1;
+       ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
+               diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
                return NULL;
+       });


 char *
 sql_normalized_name_region_new(struct region *r, const char *name, int len)
 {
-       int size = sql_normalize_name(NULL, 0, name, len);
-       if (size < 0)
-               return NULL;
-       char *res = (char *) region_alloc(r, size);
-       if (res == NULL) {
+       int size = len + 1;
+       ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
                diag_set(OutOfMemory, size, "region_alloc", "res");
                return NULL;
-       }

+++ b/test/sql/errinj.result
@@ -450,7 +450,14 @@ errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
 ...
 box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
 ---
-- error: 'string conversion to the uppercase failed: U_MEMORY_ALLOCATION_ERROR'
+- error: Failed to allocate 6 bytes in sqlDbMallocRawNN for res
+...
+dummy_f = function(int) return 1 end
+---
+...
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
+---
+- error: Failed to allocate 9 bytes in region_alloc for res
 ...


> 13. We already have global case_map in lua/utf8.c. I think, it is worth
> moving it to lib/coll and use both in lua/utf8.c and here.

Looks good.

========================================================

Because sql_normalize_name used to be called twice - to estimate
the size of the name buffer and to process data querying the
UCaseMap object each time performance in SQL felt by 15%.

This patch should eliminate some of the negative effects of using
ICU for name normalization.

Thanks @avtikhon for a bechmark

Follow up e7558062d3559e6bcc18f91eacb88269428321dc
---
 src/box/sql/expr.c       | 29 ++++++++--------
 src/box/sql/parse.y      | 21 ++++++------
 src/box/sql/sqlInt.h     |  9 ++---
 src/box/sql/trigger.c    | 22 ++++++++-----
 src/box/sql/util.c       | 71 +++++++++++++++++++++-------------------
 src/lib/coll/coll.c      |  8 ++++-
 src/lib/coll/coll.h      |  3 ++
 src/lua/utf8.c           | 11 +------
 test/sql/errinj.result   |  9 ++++-
 test/sql/errinj.test.lua |  2 ++
 10 files changed, 98 insertions(+), 87 deletions(-)

diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 4b98bd175..0d61be344 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -965,22 +965,13 @@ sql_expr_new(struct sql *db, int op, const struct Token *token)
 struct Expr *
 sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token)
 {
-	int extra_size = 0;
-	bool is_name = false;
+	int extra_size = 0, rc;
 	if (token != NULL) {
 		int val;
 		assert(token->z != NULL || token->n == 0);
 		if (sql_expr_token_to_int(op, token, &val) == 0)
 			return sql_expr_new_int(db, val);
-		is_name = op == TK_ID || op == TK_COLLATE || op == TK_FUNCTION;
-		if (is_name) {
-			extra_size = sql_normalize_name(NULL, 0, token->z,
-							token->n);
-			if (extra_size < 0)
-				return NULL;
-		} else {
-			extra_size = token->n + 1;
-		}
+		extra_size = token->n + 1;
 	}
 	struct Expr *e = sql_expr_new_empty(db, op, extra_size);
 	if (e == NULL || token == NULL || token->n == 0)
@@ -988,14 +979,20 @@ sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token)
 	e->u.zToken = (char *) &e[1];
 	if (token->z[0] == '"')
 		e->flags |= EP_DblQuoted;
-	if (! is_name) {
+	if (op != TK_ID && op != TK_COLLATE && op != TK_FUNCTION) {
 		memcpy(e->u.zToken, token->z, token->n);
 		e->u.zToken[token->n] = '\0';
 		sqlDequote(e->u.zToken);
-	} else if (sql_normalize_name(e->u.zToken, extra_size, token->z,
-				      token->n) < 0) {
-		sql_expr_delete(db, e, false);
-		return NULL;
+	} else if ((rc = sql_normalize_name(e->u.zToken, extra_size, token->z,
+					    token->n)) > extra_size) {
+		extra_size = rc;
+		e = sqlDbReallocOrFree(db, e, sizeof(*e) + extra_size);
+		if (e == NULL)
+			return NULL;
+		e->u.zToken = (char *) &e[1];
+		if (sql_normalize_name(e->u.zToken, extra_size, token->z,
+				       token->n) > extra_size)
+			unreachable();
 	}
 	return e;
 }
diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
index ed5c05436..4581ef6e7 100644
--- a/src/box/sql/parse.y
+++ b/src/box/sql/parse.y
@@ -893,15 +893,8 @@ idlist(A) ::= nm(Y). {
   ** that created the expression.
   */
   static void spanExpr(ExprSpan *pOut, Parse *pParse, int op, Token t){
-    int name_sz = 0;
     struct Expr *p = NULL;
-    if (op != TK_VARIABLE) {
-      name_sz = sql_normalize_name(NULL, 0, t.z, t.n);
-      if (name_sz < 0)
-        goto tarantool_error;
-    } else {
-      name_sz = t.n + 1;
-    }
+    int name_sz = t.n + 1;
     p = sqlDbMallocRawNN(pParse->db, sizeof(Expr) + name_sz);
     if( p ){
       memset(p, 0, sizeof(Expr));
@@ -936,8 +929,16 @@ idlist(A) ::= nm(Y). {
       p->iAgg = -1;
       p->u.zToken = (char*)&p[1];
       if (op != TK_VARIABLE) {
-        if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) < 0)
-          goto tarantool_error;
+        int rc = sql_normalize_name(p->u.zToken, name_sz, t.z, t.n);
+        if (rc > name_sz) {
+          name_sz = rc;
+          p = sqlDbReallocOrFree(pParse->db, p, sizeof(*p) + name_sz);
+          if (p == NULL)
+            goto tarantool_error;
+          p->u.zToken = (char *) &p[1];
+          if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) > name_sz)
+              unreachable();
+        }
       } else {
         memcpy(p->u.zToken, t.z, t.n);
         p->u.zToken[t.n] = 0;
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index d8dc03284..b322602dc 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -3175,15 +3175,10 @@ void sqlDequote(char *);
  * @param dst A buffer for the result string. The result will be
  *        0-terminated if the buffer is large enough. The contents
  *        is undefined in case of failure.
- * @param dst_size The size of the buffer (number of bytes). If it
- *        is 0, then dest may be NULL and the function will only
- *        return the length of the result without writing any of
- *        the result string
+ * @param dst_size The size of the buffer (number of bytes).
  * @param src The original string.
  * @param src_len The length of the original string.
- * @retval The count of bytes written(or need to be written) on
- *         success.
- * @retval < 0 Otherwise. The diag message is set.
+ * @retval The count of bytes written (or need to be written).
  */
 int
 sql_normalize_name(char *dst, int dst_size, const char *src, int src_len);
diff --git a/src/box/sql/trigger.c b/src/box/sql/trigger.c
index 20b2c970c..14e4198a8 100644
--- a/src/box/sql/trigger.c
+++ b/src/box/sql/trigger.c
@@ -278,10 +278,7 @@ sql_trigger_select_step(struct sql *db, struct Select *select)
 static struct TriggerStep *
 sql_trigger_step_new(struct sql *db, u8 op, struct Token *target_name)
 {
-	int name_size =
-		sql_normalize_name(NULL, 0, target_name->z, target_name->n);
-	if (name_size < 0)
-		return NULL;
+	int name_size = target_name->n + 1;
 	int size = sizeof(struct TriggerStep) + name_size;
 	struct TriggerStep *trigger_step = sqlDbMallocZero(db, size);
 	if (trigger_step == NULL) {
@@ -289,10 +286,19 @@ sql_trigger_step_new(struct sql *db, u8 op, struct Token *target_name)
 		return NULL;
 	}
 	char *z = (char *)&trigger_step[1];
-	if (sql_normalize_name(z, name_size, target_name->z,
-			       target_name->n) < 0) {
-		sqlDbFree(db, trigger_step);
-		return NULL;
+	int rc = sql_normalize_name(z, name_size, target_name->z,
+				    target_name->n);
+	if (rc > name_size) {
+		name_size = rc;
+		trigger_step = sqlDbReallocOrFree(db, trigger_step,
+						  sizeof(*trigger_step) +
+						  name_size);
+		if (trigger_step == NULL)
+			return NULL;
+		z = (char *) &trigger_step[1];
+		if (sql_normalize_name(z, name_size, target_name->z,
+				       target_name->n) > name_size)
+			unreachable();
 	}
 	trigger_step->zTarget = z;
 	trigger_step->op = op;
diff --git a/src/box/sql/util.c b/src/box/sql/util.c
index a13efa682..2f3c17c9a 100644
--- a/src/box/sql/util.c
+++ b/src/box/sql/util.c
@@ -41,6 +41,7 @@
 #if HAVE_ISNAN || SQL_HAVE_ISNAN
 #include <math.h>
 #endif
+#include "coll/coll.h"
 #include <unicode/ucasemap.h>
 #include "errinj.h"
 
@@ -259,66 +260,68 @@ int
 sql_normalize_name(char *dst, int dst_size, const char *src, int src_len)
 {
 	assert(src != NULL);
+	assert(dst != NULL && dst_size > 0);
 	if (sqlIsquote(src[0])){
-		if (dst_size == 0)
-			return src_len + 1;
 		memcpy(dst, src, src_len);
 		dst[src_len] = '\0';
 		sqlDequote(dst);
 		return src_len + 1;
 	}
 	UErrorCode status = U_ZERO_ERROR;
-	ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
-		status = U_MEMORY_ALLOCATION_ERROR;
-		goto error;
-	});
-	UCaseMap *case_map = ucasemap_open(NULL, 0, &status);
-	if (case_map == NULL)
-		goto error;
-	int len = ucasemap_utf8ToUpper(case_map, dst, dst_size, src, src_len,
-				       &status);
-	ucasemap_close(case_map);
-	assert(U_SUCCESS(status) ||
-	       (dst_size == 0 && status == U_BUFFER_OVERFLOW_ERROR));
+	assert(root_map != NULL);
+	int len = ucasemap_utf8ToUpper(root_map, dst, dst_size, src,
+				       src_len, &status);
+	assert(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR);
 	return len + 1;
-error:
-	diag_set(CollationError,
-		 "string conversion to the uppercase failed: %s",
-		 u_errorName(status));
-	return -1;
 }
 
 char *
 sql_normalized_name_db_new(struct sql *db, const char *name, int len)
 {
-	int size = sql_normalize_name(NULL, 0, name, len);
-	if (size < 0)
+	int size = len + 1;
+	ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
+		diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
 		return NULL;
+	});
 	char *res = sqlDbMallocRawNN(db, size);
-	if (res == NULL) {
-		diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
+	if (res == NULL)
 		return NULL;
-	}
-	if (sql_normalize_name(res, size, name, len) < 0) {
-		sqlDbFree(db, res);
+	int rc = sql_normalize_name(res, size, name, len);
+	if (rc <= size)
+		return res;
+
+	size = rc;
+	res = sqlDbReallocOrFree(db, res, size);
+	if (res == NULL)
 		return NULL;
-	}
+	if (sql_normalize_name(res, size, name, len) > size)
+		unreachable();
 	return res;
 }
 
 char *
 sql_normalized_name_region_new(struct region *r, const char *name, int len)
 {
-	int size = sql_normalize_name(NULL, 0, name, len);
-	if (size < 0)
-		return NULL;
-	char *res = (char *) region_alloc(r, size);
-	if (res == NULL) {
+	int size = len + 1;
+	ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
 		diag_set(OutOfMemory, size, "region_alloc", "res");
 		return NULL;
-	}
-	if (sql_normalize_name(res, size, name, len) < 0)
+	});
+	size_t region_svp = region_used(r);
+	char *res = region_alloc(r, size);
+	if (res == NULL)
+		return NULL;
+	int rc = sql_normalize_name(res, size, name, len);
+	if (rc <= size)
+		return res;
+
+	size = rc;
+	region_truncate(r, region_svp);
+	res = region_alloc(r, size);
+	if (res == NULL)
 		return NULL;
+	if (sql_normalize_name(res, size, name, len) > size)
+		unreachable();
 	return res;
 }
 
diff --git a/src/lib/coll/coll.c b/src/lib/coll/coll.c
index b83f0fdc7..21f2489d4 100644
--- a/src/lib/coll/coll.c
+++ b/src/lib/coll/coll.c
@@ -34,8 +34,11 @@
 #include "diag.h"
 #include "assoc.h"
 #include <unicode/ucol.h>
+#include <unicode/ucasemap.h>
 #include <trivia/config.h>
 
+struct UCaseMap *root_map = NULL;
+
 #define mh_name _coll
 struct mh_coll_key_t {
 	const char *str;
@@ -414,13 +417,16 @@ coll_unref(struct coll *coll)
 void
 coll_init()
 {
+	UErrorCode err = U_ZERO_ERROR;
 	coll_cache = mh_coll_new();
-	if (coll_cache == NULL)
+	root_map = ucasemap_open("", 0, &err);
+	if (coll_cache == NULL || root_map == NULL)
 		panic("Can not create system collations cache");
 }
 
 void
 coll_free()
 {
+	ucasemap_close(root_map);
 	mh_coll_delete(coll_cache);
 }
diff --git a/src/lib/coll/coll.h b/src/lib/coll/coll.h
index c505804f6..713c9646e 100644
--- a/src/lib/coll/coll.h
+++ b/src/lib/coll/coll.h
@@ -53,6 +53,9 @@ typedef size_t (*coll_hint_f)(const char *s, size_t s_len, char *buf,
 
 struct UCollator;
 
+/** Default universal casemap for case transformations. */
+extern struct UCaseMap *root_map;
+
 /**
  * Collation. It has no unique features like name, id or owner.
  * Only functional part - comparator, locale, ICU settings.
diff --git a/src/lua/utf8.c b/src/lua/utf8.c
index de026a921..311698bda 100644
--- a/src/lua/utf8.c
+++ b/src/lua/utf8.c
@@ -38,9 +38,6 @@
 
 extern struct ibuf *tarantool_lua_ibuf;
 
-/** Default universal casemap for case transformations. */
-static UCaseMap *root_map = NULL;
-
 /** Collations for cmp/casecmp functions. */
 static struct coll *unicode_coll = NULL;
 static struct coll *unicode_ci_coll = NULL;
@@ -447,12 +444,7 @@ static const struct luaL_Reg utf8_lib[] = {
 void
 tarantool_lua_utf8_init(struct lua_State *L)
 {
-	UErrorCode err = U_ZERO_ERROR;
-	root_map = ucasemap_open("", 0, &err);
-	if (root_map == NULL) {
-		luaL_error(L, tt_sprintf("error in ICU ucasemap_open: %s",
-					 u_errorName(err)));
-	}
+	assert(root_map != NULL);
 	struct coll_def def;
 	memset(&def, 0, sizeof(def));
 	def.icu.strength = COLL_ICU_STRENGTH_TERTIARY;
@@ -474,7 +466,6 @@ error_coll:
 void
 tarantool_lua_utf8_free()
 {
-	ucasemap_close(root_map);
 	if (unicode_coll != NULL)
 		coll_unref(unicode_coll);
 	if (unicode_ci_coll != NULL)
diff --git a/test/sql/errinj.result b/test/sql/errinj.result
index dead00f7d..5a427e855 100644
--- a/test/sql/errinj.result
+++ b/test/sql/errinj.result
@@ -450,7 +450,14 @@ errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
 ...
 box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
 ---
-- error: 'string conversion to the uppercase failed: U_MEMORY_ALLOCATION_ERROR'
+- error: Failed to allocate 6 bytes in sqlDbMallocRawNN for res
+...
+dummy_f = function(int) return 1 end
+---
+...
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
+---
+- error: Failed to allocate 9 bytes in region_alloc for res
 ...
 errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", false)
 ---
diff --git a/test/sql/errinj.test.lua b/test/sql/errinj.test.lua
index 736e66957..7f4b30baf 100644
--- a/test/sql/errinj.test.lua
+++ b/test/sql/errinj.test.lua
@@ -146,4 +146,6 @@ errinj.set("ERRINJ_WAL_DELAY", false)
 errinj = box.error.injection
 errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
 box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
+dummy_f = function(int) return 1 end
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
 errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", false)
-- 
2.21.0





More information about the Tarantool-patches mailing list