[tarantool-patches] Re: [PATCH v1 1/1] sql: fix perf degradation on name normalization
Kirill Shcherbatov
kshcherbatov at tarantool.org
Thu Apr 4 20:12:17 MSK 2019
Hi! Thank you for review!
> 1. It is not about DDL only. Names are normalized always, on DML
> and DQL as well. You just can't look up spaces and columns by
> not normalized names.
DDL -> SQL
> 2. As I see, you did the patch on top of 2.1 branch, but our
> policy is implement patches on top of the master, and then
> cherry-pick.
Done, based on master.
> 3. As you can see, there are two exactly the same messages.
> Please, avoid this duplication.
Non-actual with last proposal.>> + extra_size = token->n + 1;
>
> 4. is_name now is not necessary to keep as a variable. Please,
> inline it in its single usage place.
Got rid of.
> 5. Why do you declare 'rc' and on the next line set it inside 'if'?
> It does not make the code shorter, not more readable. Just write
> rc = sql_normalize...(); and check its result on the next line.
I like this more. Ok. Reworked here and everywhere.
> 6. Either sizeof(struct Expr), or sizeof(*p). We do not use typedefs
> normally, and typedef Expr will be dropped someday.
sizeof(*p).
> 7. In our code we put whitespace before '*'.
Fixed.
> 8. What is a motivation of dropping '(or need to be written)' ? I
> understand removal of 'on success', but why did you drop the text in
> the braces? You still can get a value > dst_size from that function.
> And since you dropped the comment on dst_size, here you need to write
> a more detailed one, when and why it can return > dst_size.
No real motivation. Returned.
> 9. The same as in parse.y.
Ok
> 10. Funny thing, but if we looked at the history of sql malloc usages
> and changes we could see that firstly it was not setting diag. Then
> Mergen started setting diag right inside malloc.c. Then we started to
> set diag twice in certain places in scope of our normalization patch,
> because I missed Mergen's patch. Then I see this patch, and here you
> set diag in half of it, but do not in the other half.
>
> As I understand, sqlDbMallocRawNN already sets diag, and you don't need
> to do it again here.
You are right.
> 11. Third argument of OutOfMemory is a name of the function
> failed to allocate memory. Here you call region_alloc, not region.
Vova prefer "region" in such cases if I not mistaken. I don't care. "region_alloc"
> 12. What is a motivation of drop of that test? You still can
> fail in OOM, but a bit later, in sqlDbReallocOrFree in sql_expr_new_dequoted.
> The test was not about OOM specifically, but about normalization error,
> and that the error goes up to user normally, without being replaced or
> ignored.
New ERRINJ:
char *
sql_normalized_name_db_new(struct sql *db, const char *name, int len)
{
- int size = sql_normalize_name(NULL, 0, name, len);
- if (size < 0)
+ int size = len + 1;
+ ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
+ diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
return NULL;
+ });
char *
sql_normalized_name_region_new(struct region *r, const char *name, int len)
{
- int size = sql_normalize_name(NULL, 0, name, len);
- if (size < 0)
- return NULL;
- char *res = (char *) region_alloc(r, size);
- if (res == NULL) {
+ int size = len + 1;
+ ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
diag_set(OutOfMemory, size, "region_alloc", "res");
return NULL;
- }
+++ b/test/sql/errinj.result
@@ -450,7 +450,14 @@ errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
...
box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
---
-- error: 'string conversion to the uppercase failed: U_MEMORY_ALLOCATION_ERROR'
+- error: Failed to allocate 6 bytes in sqlDbMallocRawNN for res
+...
+dummy_f = function(int) return 1 end
+---
+...
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
+---
+- error: Failed to allocate 9 bytes in region_alloc for res
...
> 13. We already have global case_map in lua/utf8.c. I think, it is worth
> moving it to lib/coll and use both in lua/utf8.c and here.
Looks good.
========================================================
Because sql_normalize_name used to be called twice - to estimate
the size of the name buffer and to process data querying the
UCaseMap object each time performance in SQL felt by 15%.
This patch should eliminate some of the negative effects of using
ICU for name normalization.
Thanks @avtikhon for a bechmark
Follow up e7558062d3559e6bcc18f91eacb88269428321dc
---
src/box/sql/expr.c | 29 ++++++++--------
src/box/sql/parse.y | 21 ++++++------
src/box/sql/sqlInt.h | 9 ++---
src/box/sql/trigger.c | 22 ++++++++-----
src/box/sql/util.c | 71 +++++++++++++++++++++-------------------
src/lib/coll/coll.c | 8 ++++-
src/lib/coll/coll.h | 3 ++
src/lua/utf8.c | 11 +------
test/sql/errinj.result | 9 ++++-
test/sql/errinj.test.lua | 2 ++
10 files changed, 98 insertions(+), 87 deletions(-)
diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 4b98bd175..0d61be344 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -965,22 +965,13 @@ sql_expr_new(struct sql *db, int op, const struct Token *token)
struct Expr *
sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token)
{
- int extra_size = 0;
- bool is_name = false;
+ int extra_size = 0, rc;
if (token != NULL) {
int val;
assert(token->z != NULL || token->n == 0);
if (sql_expr_token_to_int(op, token, &val) == 0)
return sql_expr_new_int(db, val);
- is_name = op == TK_ID || op == TK_COLLATE || op == TK_FUNCTION;
- if (is_name) {
- extra_size = sql_normalize_name(NULL, 0, token->z,
- token->n);
- if (extra_size < 0)
- return NULL;
- } else {
- extra_size = token->n + 1;
- }
+ extra_size = token->n + 1;
}
struct Expr *e = sql_expr_new_empty(db, op, extra_size);
if (e == NULL || token == NULL || token->n == 0)
@@ -988,14 +979,20 @@ sql_expr_new_dequoted(struct sql *db, int op, const struct Token *token)
e->u.zToken = (char *) &e[1];
if (token->z[0] == '"')
e->flags |= EP_DblQuoted;
- if (! is_name) {
+ if (op != TK_ID && op != TK_COLLATE && op != TK_FUNCTION) {
memcpy(e->u.zToken, token->z, token->n);
e->u.zToken[token->n] = '\0';
sqlDequote(e->u.zToken);
- } else if (sql_normalize_name(e->u.zToken, extra_size, token->z,
- token->n) < 0) {
- sql_expr_delete(db, e, false);
- return NULL;
+ } else if ((rc = sql_normalize_name(e->u.zToken, extra_size, token->z,
+ token->n)) > extra_size) {
+ extra_size = rc;
+ e = sqlDbReallocOrFree(db, e, sizeof(*e) + extra_size);
+ if (e == NULL)
+ return NULL;
+ e->u.zToken = (char *) &e[1];
+ if (sql_normalize_name(e->u.zToken, extra_size, token->z,
+ token->n) > extra_size)
+ unreachable();
}
return e;
}
diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
index ed5c05436..4581ef6e7 100644
--- a/src/box/sql/parse.y
+++ b/src/box/sql/parse.y
@@ -893,15 +893,8 @@ idlist(A) ::= nm(Y). {
** that created the expression.
*/
static void spanExpr(ExprSpan *pOut, Parse *pParse, int op, Token t){
- int name_sz = 0;
struct Expr *p = NULL;
- if (op != TK_VARIABLE) {
- name_sz = sql_normalize_name(NULL, 0, t.z, t.n);
- if (name_sz < 0)
- goto tarantool_error;
- } else {
- name_sz = t.n + 1;
- }
+ int name_sz = t.n + 1;
p = sqlDbMallocRawNN(pParse->db, sizeof(Expr) + name_sz);
if( p ){
memset(p, 0, sizeof(Expr));
@@ -936,8 +929,16 @@ idlist(A) ::= nm(Y). {
p->iAgg = -1;
p->u.zToken = (char*)&p[1];
if (op != TK_VARIABLE) {
- if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) < 0)
- goto tarantool_error;
+ int rc = sql_normalize_name(p->u.zToken, name_sz, t.z, t.n);
+ if (rc > name_sz) {
+ name_sz = rc;
+ p = sqlDbReallocOrFree(pParse->db, p, sizeof(*p) + name_sz);
+ if (p == NULL)
+ goto tarantool_error;
+ p->u.zToken = (char *) &p[1];
+ if (sql_normalize_name(p->u.zToken, name_sz, t.z, t.n) > name_sz)
+ unreachable();
+ }
} else {
memcpy(p->u.zToken, t.z, t.n);
p->u.zToken[t.n] = 0;
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index d8dc03284..b322602dc 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -3175,15 +3175,10 @@ void sqlDequote(char *);
* @param dst A buffer for the result string. The result will be
* 0-terminated if the buffer is large enough. The contents
* is undefined in case of failure.
- * @param dst_size The size of the buffer (number of bytes). If it
- * is 0, then dest may be NULL and the function will only
- * return the length of the result without writing any of
- * the result string
+ * @param dst_size The size of the buffer (number of bytes).
* @param src The original string.
* @param src_len The length of the original string.
- * @retval The count of bytes written(or need to be written) on
- * success.
- * @retval < 0 Otherwise. The diag message is set.
+ * @retval The count of bytes written (or need to be written).
*/
int
sql_normalize_name(char *dst, int dst_size, const char *src, int src_len);
diff --git a/src/box/sql/trigger.c b/src/box/sql/trigger.c
index 20b2c970c..14e4198a8 100644
--- a/src/box/sql/trigger.c
+++ b/src/box/sql/trigger.c
@@ -278,10 +278,7 @@ sql_trigger_select_step(struct sql *db, struct Select *select)
static struct TriggerStep *
sql_trigger_step_new(struct sql *db, u8 op, struct Token *target_name)
{
- int name_size =
- sql_normalize_name(NULL, 0, target_name->z, target_name->n);
- if (name_size < 0)
- return NULL;
+ int name_size = target_name->n + 1;
int size = sizeof(struct TriggerStep) + name_size;
struct TriggerStep *trigger_step = sqlDbMallocZero(db, size);
if (trigger_step == NULL) {
@@ -289,10 +286,19 @@ sql_trigger_step_new(struct sql *db, u8 op, struct Token *target_name)
return NULL;
}
char *z = (char *)&trigger_step[1];
- if (sql_normalize_name(z, name_size, target_name->z,
- target_name->n) < 0) {
- sqlDbFree(db, trigger_step);
- return NULL;
+ int rc = sql_normalize_name(z, name_size, target_name->z,
+ target_name->n);
+ if (rc > name_size) {
+ name_size = rc;
+ trigger_step = sqlDbReallocOrFree(db, trigger_step,
+ sizeof(*trigger_step) +
+ name_size);
+ if (trigger_step == NULL)
+ return NULL;
+ z = (char *) &trigger_step[1];
+ if (sql_normalize_name(z, name_size, target_name->z,
+ target_name->n) > name_size)
+ unreachable();
}
trigger_step->zTarget = z;
trigger_step->op = op;
diff --git a/src/box/sql/util.c b/src/box/sql/util.c
index a13efa682..2f3c17c9a 100644
--- a/src/box/sql/util.c
+++ b/src/box/sql/util.c
@@ -41,6 +41,7 @@
#if HAVE_ISNAN || SQL_HAVE_ISNAN
#include <math.h>
#endif
+#include "coll/coll.h"
#include <unicode/ucasemap.h>
#include "errinj.h"
@@ -259,66 +260,68 @@ int
sql_normalize_name(char *dst, int dst_size, const char *src, int src_len)
{
assert(src != NULL);
+ assert(dst != NULL && dst_size > 0);
if (sqlIsquote(src[0])){
- if (dst_size == 0)
- return src_len + 1;
memcpy(dst, src, src_len);
dst[src_len] = '\0';
sqlDequote(dst);
return src_len + 1;
}
UErrorCode status = U_ZERO_ERROR;
- ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
- status = U_MEMORY_ALLOCATION_ERROR;
- goto error;
- });
- UCaseMap *case_map = ucasemap_open(NULL, 0, &status);
- if (case_map == NULL)
- goto error;
- int len = ucasemap_utf8ToUpper(case_map, dst, dst_size, src, src_len,
- &status);
- ucasemap_close(case_map);
- assert(U_SUCCESS(status) ||
- (dst_size == 0 && status == U_BUFFER_OVERFLOW_ERROR));
+ assert(root_map != NULL);
+ int len = ucasemap_utf8ToUpper(root_map, dst, dst_size, src,
+ src_len, &status);
+ assert(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR);
return len + 1;
-error:
- diag_set(CollationError,
- "string conversion to the uppercase failed: %s",
- u_errorName(status));
- return -1;
}
char *
sql_normalized_name_db_new(struct sql *db, const char *name, int len)
{
- int size = sql_normalize_name(NULL, 0, name, len);
- if (size < 0)
+ int size = len + 1;
+ ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
+ diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
return NULL;
+ });
char *res = sqlDbMallocRawNN(db, size);
- if (res == NULL) {
- diag_set(OutOfMemory, size, "sqlDbMallocRawNN", "res");
+ if (res == NULL)
return NULL;
- }
- if (sql_normalize_name(res, size, name, len) < 0) {
- sqlDbFree(db, res);
+ int rc = sql_normalize_name(res, size, name, len);
+ if (rc <= size)
+ return res;
+
+ size = rc;
+ res = sqlDbReallocOrFree(db, res, size);
+ if (res == NULL)
return NULL;
- }
+ if (sql_normalize_name(res, size, name, len) > size)
+ unreachable();
return res;
}
char *
sql_normalized_name_region_new(struct region *r, const char *name, int len)
{
- int size = sql_normalize_name(NULL, 0, name, len);
- if (size < 0)
- return NULL;
- char *res = (char *) region_alloc(r, size);
- if (res == NULL) {
+ int size = len + 1;
+ ERROR_INJECT(ERRINJ_SQL_NAME_NORMALIZATION, {
diag_set(OutOfMemory, size, "region_alloc", "res");
return NULL;
- }
- if (sql_normalize_name(res, size, name, len) < 0)
+ });
+ size_t region_svp = region_used(r);
+ char *res = region_alloc(r, size);
+ if (res == NULL)
+ return NULL;
+ int rc = sql_normalize_name(res, size, name, len);
+ if (rc <= size)
+ return res;
+
+ size = rc;
+ region_truncate(r, region_svp);
+ res = region_alloc(r, size);
+ if (res == NULL)
return NULL;
+ if (sql_normalize_name(res, size, name, len) > size)
+ unreachable();
return res;
}
diff --git a/src/lib/coll/coll.c b/src/lib/coll/coll.c
index b83f0fdc7..21f2489d4 100644
--- a/src/lib/coll/coll.c
+++ b/src/lib/coll/coll.c
@@ -34,8 +34,11 @@
#include "diag.h"
#include "assoc.h"
#include <unicode/ucol.h>
+#include <unicode/ucasemap.h>
#include <trivia/config.h>
+struct UCaseMap *root_map = NULL;
+
#define mh_name _coll
struct mh_coll_key_t {
const char *str;
@@ -414,13 +417,16 @@ coll_unref(struct coll *coll)
void
coll_init()
{
+ UErrorCode err = U_ZERO_ERROR;
coll_cache = mh_coll_new();
- if (coll_cache == NULL)
+ root_map = ucasemap_open("", 0, &err);
+ if (coll_cache == NULL || root_map == NULL)
panic("Can not create system collations cache");
}
void
coll_free()
{
+ ucasemap_close(root_map);
mh_coll_delete(coll_cache);
}
diff --git a/src/lib/coll/coll.h b/src/lib/coll/coll.h
index c505804f6..713c9646e 100644
--- a/src/lib/coll/coll.h
+++ b/src/lib/coll/coll.h
@@ -53,6 +53,9 @@ typedef size_t (*coll_hint_f)(const char *s, size_t s_len, char *buf,
struct UCollator;
+/** Default universal casemap for case transformations. */
+extern struct UCaseMap *root_map;
+
/**
* Collation. It has no unique features like name, id or owner.
* Only functional part - comparator, locale, ICU settings.
diff --git a/src/lua/utf8.c b/src/lua/utf8.c
index de026a921..311698bda 100644
--- a/src/lua/utf8.c
+++ b/src/lua/utf8.c
@@ -38,9 +38,6 @@
extern struct ibuf *tarantool_lua_ibuf;
-/** Default universal casemap for case transformations. */
-static UCaseMap *root_map = NULL;
-
/** Collations for cmp/casecmp functions. */
static struct coll *unicode_coll = NULL;
static struct coll *unicode_ci_coll = NULL;
@@ -447,12 +444,7 @@ static const struct luaL_Reg utf8_lib[] = {
void
tarantool_lua_utf8_init(struct lua_State *L)
{
- UErrorCode err = U_ZERO_ERROR;
- root_map = ucasemap_open("", 0, &err);
- if (root_map == NULL) {
- luaL_error(L, tt_sprintf("error in ICU ucasemap_open: %s",
- u_errorName(err)));
- }
+ assert(root_map != NULL);
struct coll_def def;
memset(&def, 0, sizeof(def));
def.icu.strength = COLL_ICU_STRENGTH_TERTIARY;
@@ -474,7 +466,6 @@ error_coll:
void
tarantool_lua_utf8_free()
{
- ucasemap_close(root_map);
if (unicode_coll != NULL)
coll_unref(unicode_coll);
if (unicode_ci_coll != NULL)
diff --git a/test/sql/errinj.result b/test/sql/errinj.result
index dead00f7d..5a427e855 100644
--- a/test/sql/errinj.result
+++ b/test/sql/errinj.result
@@ -450,7 +450,14 @@ errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
...
box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
---
-- error: 'string conversion to the uppercase failed: U_MEMORY_ALLOCATION_ERROR'
+- error: Failed to allocate 6 bytes in sqlDbMallocRawNN for res
+...
+dummy_f = function(int) return 1 end
+---
+...
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
+---
+- error: Failed to allocate 9 bytes in region_alloc for res
...
errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", false)
---
diff --git a/test/sql/errinj.test.lua b/test/sql/errinj.test.lua
index 736e66957..7f4b30baf 100644
--- a/test/sql/errinj.test.lua
+++ b/test/sql/errinj.test.lua
@@ -146,4 +146,6 @@ errinj.set("ERRINJ_WAL_DELAY", false)
errinj = box.error.injection
errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", true)
box.execute("CREATE TABLE hello (id INT primary key,x INT,y INT);")
+dummy_f = function(int) return 1 end
+box.internal.sql_create_function("counter1", "INT", dummy_f, -1, false)
errinj.set("ERRINJ_SQL_NAME_NORMALIZATION", false)
--
2.21.0
More information about the Tarantool-patches
mailing list