* [tarantool-patches] [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set @ 2018-12-15 10:57 Roman Khabibov 2018-12-18 8:40 ` [tarantool-patches] " n.pettik ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Roman Khabibov @ 2018-12-15 10:57 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy The reason for the bug was that X'00' is a terminal symbol. If the char set contained X'00', all other characters after its (inculding itself) are then ignored. Closes #3543 Branch: https://github.com/tarantool/tarantool/tree/romankhabibov/gh-3543-trim-terminal Issue: https://github.com/tarantool/tarantool/issues/3543 --- src/box/sql/func.c | 18 +++++- test/sql-tap/func.test.lua | 126 ++++++++++++++++++++++++++++++++++++- 2 files changed, 140 insertions(+), 4 deletions(-) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 9667aead5..5beba7bd2 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1224,8 +1224,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; } else { const unsigned char *z; - for (z = zCharSet, nChar = 0; *z; nChar++) { + int sizeInChar = sqlite3_value_bytes(argv[1]); + int length = 0; + z = zCharSet; + nChar = 0; + const unsigned char *zStepBack; + while(sizeInChar - length) { + zStepBack = z; SQLITE_SKIP_UTF8(z); + length += z - zStepBack; + nChar++; } if (nChar > 0) { azChar = @@ -1235,10 +1243,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; } aLen = (unsigned char *)&azChar[nChar]; - for (z = zCharSet, nChar = 0; *z; nChar++) { + z = zCharSet; + nChar = 0; + length = 0; + while(sizeInChar - length) { azChar[nChar] = (unsigned char *)z; + zStepBack = z; SQLITE_SKIP_UTF8(z); + length += z - zStepBack; aLen[nChar] = (u8) (z - azChar[nChar]); + nChar++; } } } diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 393212968..b7b8e7c4c 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool test = require("sqltester") -test:plan(14535) +test:plan(14547) --!./tcltestrunner.lua -- 2001 September 15 @@ -2100,11 +2100,133 @@ test:do_execsql_test( -- </func-22.22> }) +-- gh-3543 Check trimming of binary string when X'00' in trimming char set. + +test:do_execsql_test( + "func-22.23", + [[ + SELECT TRIM(X'004100', X'00'); + ]], { + -- <func-22.23> + "A" + -- </func-22.23> + }) + +test:do_execsql_test( + "func-22.24", + [[ + SELECT TRIM(X'004100', X'0000'); + ]], { + -- <func-22.24> + "A" + -- </func-22.24> + }) + +test:do_execsql_test( + "func-22.25", + [[ + SELECT TRIM(X'004100', X'0042'); + ]], { + -- <func-22.25> + "A" + -- </func-22.25> + }) + +test:do_execsql_test( + "func-22.26", + [[ + SELECT TRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.26> + "A\0B" + -- </func-22.26> + }) + +test:do_execsql_test( + "func-22.27", + [[ + SELECT LTRIM(X'004100', X'00'); + ]], { + -- <func-22.27> + "A\0" + -- </func-22.27> + }) + +test:do_execsql_test( + "func-22.28", + [[ + SELECT LTRIM(X'004100', X'0000'); + ]], { + -- <func-22.28> + "A\0" + -- </func-22.28> + }) + +test:do_execsql_test( + "func-22.29", + [[ + SELECT LTRIM(X'004100', X'0042'); + ]], { + -- <func-22.29> + "A\0" + -- </func-22.29> + }) + +test:do_execsql_test( + "func-22.30", + [[ + SELECT LTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.30> + "A\0B\0\0" + -- </func-22.30> + }) + +test:do_execsql_test( + "func-22.31", + [[ + SELECT RTRIM(X'004100', X'00'); + ]], { + -- <func-22.31> + "\0A" + -- </func-22.31> + }) + +test:do_execsql_test( + "func-22.32", + [[ + SELECT RTRIM(X'004100', X'0000'); + ]], { + -- <func-22.32> + "\0A" + -- </func-22.32> + }) + +test:do_execsql_test( + "func-22.33", + [[ + SELECT RTRIM(X'004100', X'0042'); + ]], { + -- <func-22.33> + "\0A" + -- </func-22.33> + }) + +test:do_execsql_test( + "func-22.34", + [[ + SELECT RTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.34> + "\0\0A\0B" + -- </func-22.34> + }) + -- This is to test the deprecated sqlite3_aggregate_count() API. -- --test:do_test( -- "func-23.1", --- function() +-- function()S -- sqlite3_create_aggregate("db") -- return test:execsql([[ -- SELECT legacy_count() FROM t6; -- 2.17.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-15 10:57 [tarantool-patches] [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set Roman Khabibov @ 2018-12-18 8:40 ` n.pettik 2018-12-18 14:30 ` Roman 2018-12-20 20:41 ` Vladislav Shpilevoy 2018-12-27 12:28 ` Kirill Yukhin 2 siblings, 1 reply; 8+ messages in thread From: n.pettik @ 2018-12-18 8:40 UTC (permalink / raw) To: tarantool-patches; +Cc: Roman Khabibov > On 15 Dec 2018, at 13:57, Roman Khabibov <roman.habibov@tarantool.org> wrote: > > The reason for the bug was that X'00' is a terminal symbol. If the char set > contained X'00', all other characters after its (inculding itself) are then > ignored. Nit: ‘… after its’ -> ‘… after it’. Nit: inculding -> including. Nit: ‘are then ignored’ -> ‘are ignored’. Nit: I wouldn’t mix first and second conditions: ‘If the char set contains …, all characters are ignored...' > > Closes #3543 > > Branch: https://github.com/tarantool/tarantool/tree/romankhabibov/gh-3543-trim-terminal > Issue: https://github.com/tarantool/tarantool/issues/3543 You should put links to the branch and issue below delimiter ---, not above. > --- > src/box/sql/func.c | 18 +++++- > test/sql-tap/func.test.lua | 126 ++++++++++++++++++++++++++++++++++++- > 2 files changed, 140 insertions(+), 4 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 9667aead5..5beba7bd2 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1224,8 +1224,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } else { > const unsigned char *z; > - for (z = zCharSet, nChar = 0; *z; nChar++) { Lets start from putting here comments which explain what is going on here (how you algorithm/approach works). Then, lets choose more suitable names for variables: length, sizeInChar and nChar sound very similar (if you can’t come up with better names, then at least mention in comment what they mean). > + int sizeInChar = sqlite3_value_bytes(argv[1]); > + int length = 0; What is the difference between sizeInChar, length and nChar? > + z = zCharSet; > + nChar = 0; > + const unsigned char *zStepBack; > + while(sizeInChar - length) { We use explicit conditions as a predicate value. In other words: while (sizeInChar - length > 0) Also, it would be better if you declare and initialise var at once: const unsigned char *z = zCharSet; > + zStepBack = z; > SQLITE_SKIP_UTF8(z); > + length += z - zStepBack; > + nChar++; > } > if (nChar > 0) { > azChar = > @@ -1235,10 +1243,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } > aLen = (unsigned char *)&azChar[nChar]; > - for (z = zCharSet, nChar = 0; *z; nChar++) { > + z = zCharSet; > + nChar = 0; > + length = 0; > + while(sizeInChar - length) { > azChar[nChar] = (unsigned char *)z; > + zStepBack = z; > SQLITE_SKIP_UTF8(z); > + length += z - zStepBack; > aLen[nChar] = (u8) (z - azChar[nChar]); > + nChar++; > } > } > } > diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua > index 393212968..b7b8e7c4c 100755 > --- a/test/sql-tap/func.test.lua > +++ b/test/sql-tap/func.test.lua > @@ -1,6 +1,6 @@ > #!/usr/bin/env tarantool > test = require("sqltester") > -test:plan(14535) > +test:plan(14547) > > --!./tcltestrunner.lua > -- 2001 September 15 > @@ -2100,11 +2100,133 @@ test:do_execsql_test( > -- </func-22.22> > }) > > -- This is to test the deprecated sqlite3_aggregate_count() API. > -- > --test:do_test( > -- "func-23.1", > --- function() > +-- function()S Garbage diff. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-18 8:40 ` [tarantool-patches] " n.pettik @ 2018-12-18 14:30 ` Roman 2018-12-25 11:40 ` n.pettik 0 siblings, 1 reply; 8+ messages in thread From: Roman @ 2018-12-18 14:30 UTC (permalink / raw) To: tarantool-patches, n.pettik Hi! Thanks for review. > Nit: ‘… after its’ -> ‘… after it’. > > Nit: inculding -> including. > > Nit: ‘are then ignored’ -> ‘are ignored’. > > Nit: I wouldn’t mix first and second conditions: > ‘If the char set contains …, all characters are ignored...' Fixed. > You should put links to the branch and issue below delimiter ---, not above. Sorry. > Lets start from putting here comments which explain > what is going on here (how you algorithm/approach works). > Then, lets choose more suitable names for variables: > length, sizeInChar and nChar sound very similar > (if you can’t come up with better names, then at least > mention in comment what they mean). > >> + int sizeInChar = sqlite3_value_bytes(argv[1]); >> + int length = 0; > What is the difference between sizeInChar, length and nChar? I add comments and renamed vars. Hope, now it's clearer. > We use explicit conditions as a predicate value. In other words: > > while (sizeInChar - length > 0) > > Also, it would be better if you declare and initialise var at once: > > const unsigned char *z = zCharSet; Fixed. >> + zStepBack = z; >> SQLITE_SKIP_UTF8(z); >> + length += z - zStepBack; >> + nChar++; >> } >> if (nChar > 0) { >> azChar = >> @@ -1235,10 +1243,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >> return; >> } >> aLen = (unsigned char *)&azChar[nChar]; >> - for (z = zCharSet, nChar = 0; *z; nChar++) { >> + z = zCharSet; >> + nChar = 0; >> + length = 0; >> + while(sizeInChar - length) { >> azChar[nChar] = (unsigned char *)z; >> + zStepBack = z; >> SQLITE_SKIP_UTF8(z); >> + length += z - zStepBack; >> aLen[nChar] = (u8) (z - azChar[nChar]); >> + nChar++; >> } >> } >> } >> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua >> index 393212968..b7b8e7c4c 100755 >> --- a/test/sql-tap/func.test.lua >> +++ b/test/sql-tap/func.test.lua >> @@ -1,6 +1,6 @@ >> #!/usr/bin/env tarantool >> test = require("sqltester") >> -test:plan(14535) >> +test:plan(14547) >> >> --!./tcltestrunner.lua >> -- 2001 September 15 >> @@ -2100,11 +2100,133 @@ test:do_execsql_test( >> -- </func-22.22> >> }) >> >> -- This is to test the deprecated sqlite3_aggregate_count() API. >> -- >> --test:do_test( >> -- "func-23.1", >> --- function() >> +-- function()S > Garbage diff. Sorry. I's an accident. commit 39b2481341cd2e71d67a04bec9aeed0e1189740c Author: Roman Khabibov <roman.habibov@tarantool.org> Date: Sat Dec 15 13:21:59 2018 +0300 sql: fix bug with BLOB TRIM() when X'00' in char set The reason for the bug was that X'00' is a terminal symbol. If the char set contained X'00', all characters are ignored after it (including itself). Closes #3543 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 9667aead5..f397e23c1 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1223,9 +1223,19 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) } else if ((zCharSet = sqlite3_value_text(argv[1])) == 0) { return; } else { - const unsigned char *z; - for (z = zCharSet, nChar = 0; *z; nChar++) { + const unsigned char *z = zCharSet; + int sizeOfCharSet = \ + sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */ + int nProcessedBytes = 0; + nChar = 0; + const unsigned char *zStepBack; + /* Count the number of UTF-8 characters passing through the + * entire char set, but not up to the '\0' or X'00' character. */ + while(sizeOfCharSet - nProcessedBytes > 0) { + zStepBack = z; SQLITE_SKIP_UTF8(z); + nProcessedBytes += z - zStepBack; + nChar++; } if (nChar > 0) { azChar = @@ -1235,10 +1245,18 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; } aLen = (unsigned char *)&azChar[nChar]; - for (z = zCharSet, nChar = 0; *z; nChar++) { + z = zCharSet; + nChar = 0; + nProcessedBytes = 0; + /* Similar to the previous cycle. But + * now write into "azCharSet". */ + while(sizeOfCharSet - nProcessedBytes > 0) { azChar[nChar] = (unsigned char *)z; + zStepBack = z; SQLITE_SKIP_UTF8(z); + nProcessedBytes += z - zStepBack; aLen[nChar] = (u8) (z - azChar[nChar]); + nChar++; } } } diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 393212968..b7de1d955 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool test = require("sqltester") -test:plan(14535) +test:plan(14547) --!./tcltestrunner.lua -- 2001 September 15 @@ -2100,6 +2100,128 @@ test:do_execsql_test( -- </func-22.22> }) +-- gh-3543 Check trimming of binary string when X'00' in trimming char set. + +test:do_execsql_test( + "func-22.23", + [[ + SELECT TRIM(X'004100', X'00'); + ]], { + -- <func-22.23> + "A" + -- </func-22.23> + }) + +test:do_execsql_test( + "func-22.24", + [[ + SELECT TRIM(X'004100', X'0000'); + ]], { + -- <func-22.24> + "A" + -- </func-22.24> + }) + +test:do_execsql_test( + "func-22.25", + [[ + SELECT TRIM(X'004100', X'0042'); + ]], { + -- <func-22.25> + "A" + -- </func-22.25> + }) + +test:do_execsql_test( + "func-22.26", + [[ + SELECT TRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.26> + "A\0B" + -- </func-22.26> + }) + +test:do_execsql_test( + "func-22.27", + [[ + SELECT LTRIM(X'004100', X'00'); + ]], { + -- <func-22.27> + "A\0" + -- </func-22.27> + }) + +test:do_execsql_test( + "func-22.28", + [[ + SELECT LTRIM(X'004100', X'0000'); + ]], { + -- <func-22.28> + "A\0" + -- </func-22.28> + }) + +test:do_execsql_test( + "func-22.29", + [[ + SELECT LTRIM(X'004100', X'0042'); + ]], { + -- <func-22.29> + "A\0" + -- </func-22.29> + }) + +test:do_execsql_test( + "func-22.30", + [[ + SELECT LTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.30> + "A\0B\0\0" + -- </func-22.30> + }) + +test:do_execsql_test( + "func-22.31", + [[ + SELECT RTRIM(X'004100', X'00'); + ]], { + -- <func-22.31> + "\0A" + -- </func-22.31> + }) + +test:do_execsql_test( + "func-22.32", + [[ + SELECT RTRIM(X'004100', X'0000'); + ]], { + -- <func-22.32> + "\0A" + -- </func-22.32> + }) + +test:do_execsql_test( + "func-22.33", + [[ + SELECT RTRIM(X'004100', X'0042'); + ]], { + -- <func-22.33> + "\0A" + -- </func-22.33> + }) + +test:do_execsql_test( + "func-22.34", + [[ + SELECT RTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.34> + "\0\0A\0B" + -- </func-22.34> + }) + -- This is to test the deprecated sqlite3_aggregate_count() API. -- --test:do_test( ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-18 14:30 ` Roman @ 2018-12-25 11:40 ` n.pettik 2018-12-26 13:56 ` Roman 0 siblings, 1 reply; 8+ messages in thread From: n.pettik @ 2018-12-25 11:40 UTC (permalink / raw) To: tarantool-patches; +Cc: Roman Khabibov > commit 39b2481341cd2e71d67a04bec9aeed0e1189740c > Author: Roman Khabibov <roman.habibov@tarantool.org> > Date: Sat Dec 15 13:21:59 2018 +0300 > > sql: fix bug with BLOB TRIM() when X'00' in char set > > The reason for the bug was that X'00' is a terminal symbol. If the char set > contained X'00', all characters are ignored after it (including itself). > > Closes #3543 > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 9667aead5..f397e23c1 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1223,9 +1223,19 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > } else if ((zCharSet = sqlite3_value_text(argv[1])) == 0) { > return; > } else { > - const unsigned char *z; > - for (z = zCharSet, nChar = 0; *z; nChar++) { > + const unsigned char *z = zCharSet; > + int sizeOfCharSet = \ We don’t use backslashes to carry code (it is used for macroses and sometimes for comments). > + sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */ We put comments on the top of code to be commented: /* Size of char set in bytes. */ int sizeOfCharSet = sqlite3_value_bytes(argv[1]); I guess this comment is completely useless: name of var and function say exactly the same as comment does. > + int nProcessedBytes = 0; > + nChar = 0; > + const unsigned char *zStepBack; > + /* Count the number of UTF-8 characters passing through the > + * entire char set, but not up to the '\0' or X'00' character. */ Use tnt-style comments. > + while(sizeOfCharSet - nProcessedBytes > 0) { > + zStepBack = z; > SQLITE_SKIP_UTF8(z); > + nProcessedBytes += z - zStepBack; > + nChar++; > } > if (nChar > 0) { > azChar = > @@ -1235,10 +1245,18 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } > aLen = (unsigned char *)&azChar[nChar]; > - for (z = zCharSet, nChar = 0; *z; nChar++) { > + z = zCharSet; > + nChar = 0; > + nProcessedBytes = 0; This comment is again useless. > + /* Similar to the previous cycle. But I see trailing space. Use git diff to spot such places. > + * now write into "azCharSet". */ Use tnt-style comments. > + while(sizeOfCharSet - nProcessedBytes > 0) { > azChar[nChar] = (unsigned char *)z; > + zStepBack = z; You don’t need here ‘zStepBack’, you already saved current str position to azChar. > SQLITE_SKIP_UTF8(z); > + nProcessedBytes += z - zStepBack; > aLen[nChar] = (u8) (z - azChar[nChar]); > + nChar++; > } > } > } All points considered, I suggest diff like this: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index f397e23c1..9b5773321 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1203,7 +1203,8 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) int i; /* Loop counter */ unsigned char *aLen = 0; /* Length of each character in zCharSet */ unsigned char **azChar = 0; /* Individual characters in zCharSet */ - int nChar; /* Number of characters in zCharSet */ + /* Number of UTF-8 characters in zCharSet. */ + int nChar; if (sqlite3_value_type(argv[0]) == SQLITE_NULL) { return; @@ -1224,17 +1225,20 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; } else { const unsigned char *z = zCharSet; - int sizeOfCharSet = \ - sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */ - int nProcessedBytes = 0; + int trim_set_sz = sqlite3_value_bytes(argv[1]); + int handled_bytes_cnt = trim_set_sz; nChar = 0; - const unsigned char *zStepBack; - /* Count the number of UTF-8 characters passing through the - * entire char set, but not up to the '\0' or X'00' character. */ - while(sizeOfCharSet - nProcessedBytes > 0) { - zStepBack = z; + /* + * Count the number of UTF-8 characters passing + * through the entire char set, but not up + * to the '\0' or X'00' character. This allows + * to handle trimming set containing such + * characters. + */ + while(handled_bytes_cnt > 0) { + const unsigned char *prev_byte = z; SQLITE_SKIP_UTF8(z); - nProcessedBytes += z - zStepBack; + handled_bytes_cnt -= (z - prev_byte); nChar++; } if (nChar > 0) { @@ -1247,15 +1251,12 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) aLen = (unsigned char *)&azChar[nChar]; z = zCharSet; nChar = 0; - nProcessedBytes = 0; - /* Similar to the previous cycle. But - * now write into "azCharSet". */ - while(sizeOfCharSet - nProcessedBytes > 0) { + handled_bytes_cnt = trim_set_sz; + while(handled_bytes_cnt > 0) { azChar[nChar] = (unsigned char *)z; - zStepBack = z; SQLITE_SKIP_UTF8(z); - nProcessedBytes += z - zStepBack; aLen[nChar] = (u8) (z - azChar[nChar]); + handled_bytes_cnt -= aLen[nChar]; nChar++; Check it out. If you are ok with it, you can apply it (partially or fully). ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-25 11:40 ` n.pettik @ 2018-12-26 13:56 ` Roman 2018-12-28 11:09 ` n.pettik 0 siblings, 1 reply; 8+ messages in thread From: Roman @ 2018-12-26 13:56 UTC (permalink / raw) To: n.pettik, tarantool-patches On 25.12.2018 14:40, n.pettik wrote: > All points considered, I suggest diff like this: > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index f397e23c1..9b5773321 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1203,7 +1203,8 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > int i; /* Loop counter */ > unsigned char *aLen = 0; /* Length of each character in zCharSet */ > unsigned char **azChar = 0; /* Individual characters in zCharSet */ > - int nChar; /* Number of characters in zCharSet */ > + /* Number of UTF-8 characters in zCharSet. */ > + int nChar; > > if (sqlite3_value_type(argv[0]) == SQLITE_NULL) { > return; > @@ -1224,17 +1225,20 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } else { > const unsigned char *z = zCharSet; > - int sizeOfCharSet = \ > - sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */ > - int nProcessedBytes = 0; > + int trim_set_sz = sqlite3_value_bytes(argv[1]); > + int handled_bytes_cnt = trim_set_sz; > nChar = 0; > - const unsigned char *zStepBack; > - /* Count the number of UTF-8 characters passing through the > - * entire char set, but not up to the '\0' or X'00' character. */ > - while(sizeOfCharSet - nProcessedBytes > 0) { > - zStepBack = z; > + /* > + * Count the number of UTF-8 characters passing > + * through the entire char set, but not up > + * to the '\0' or X'00' character. This allows > + * to handle trimming set containing such > + * characters. > + */ > + while(handled_bytes_cnt > 0) { > + const unsigned char *prev_byte = z; > SQLITE_SKIP_UTF8(z); > - nProcessedBytes += z - zStepBack; > + handled_bytes_cnt -= (z - prev_byte); > nChar++; > } > if (nChar > 0) { > @@ -1247,15 +1251,12 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > aLen = (unsigned char *)&azChar[nChar]; > z = zCharSet; > nChar = 0; > - nProcessedBytes = 0; > - /* Similar to the previous cycle. But > - * now write into "azCharSet". */ > - while(sizeOfCharSet - nProcessedBytes > 0) { > + handled_bytes_cnt = trim_set_sz; > + while(handled_bytes_cnt > 0) { > azChar[nChar] = (unsigned char *)z; > - zStepBack = z; > SQLITE_SKIP_UTF8(z); > - nProcessedBytes += z - zStepBack; > aLen[nChar] = (u8) (z - azChar[nChar]); > + handled_bytes_cnt -= aLen[nChar]; > nChar++; > > Check it out. If you are ok with it, you can apply it (partially or fully). > Applied your diff. commit 844d438852be6e3bc06a7020ec0aeb96d3d5ee4e Author: Roman Khabibov <roman.habibov@tarantool.org> Date: Sat Dec 15 13:21:59 2018 +0300 sql: fix bug with BLOB TRIM() when X'00' in char set The reason for the bug was that X'00' is a terminal symbol. If the char set contained X'00', all characters are ignored after it (including itself). Closes #3543 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 9667aead5..e46b162d9 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1223,9 +1223,22 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) } else if ((zCharSet = sqlite3_value_text(argv[1])) == 0) { return; } else { - const unsigned char *z; - for (z = zCharSet, nChar = 0; *z; nChar++) { + const unsigned char *z = zCharSet; + int trim_set_sz = sqlite3_value_bytes(argv[1]); + int handled_bytes_cnt = trim_set_sz; + nChar = 0; + /* + * Count the number of UTF-8 characters passing + * through the entire char set, but not up + * to the '\0' or X'00' character. This allows + * to handle trimming set containing such + * characters. + */ + while(handled_bytes_cnt > 0) { + const unsigned char *prev_byte = z; SQLITE_SKIP_UTF8(z); + handled_bytes_cnt -= (z - prev_byte); + nChar++; } if (nChar > 0) { azChar = @@ -1235,10 +1248,15 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; } aLen = (unsigned char *)&azChar[nChar]; - for (z = zCharSet, nChar = 0; *z; nChar++) { + z = zCharSet; + nChar = 0; + handled_bytes_cnt = trim_set_sz; + while(handled_bytes_cnt > 0) { azChar[nChar] = (unsigned char *)z; SQLITE_SKIP_UTF8(z); aLen[nChar] = (u8) (z - azChar[nChar]); + handled_bytes_cnt -= aLen[nChar]; + nChar++; } } } diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 393212968..b7de1d955 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool test = require("sqltester") -test:plan(14535) +test:plan(14547) --!./tcltestrunner.lua -- 2001 September 15 @@ -2100,6 +2100,128 @@ test:do_execsql_test( -- </func-22.22> }) +-- gh-3543 Check trimming of binary string when X'00' in trimming char set. + +test:do_execsql_test( + "func-22.23", + [[ + SELECT TRIM(X'004100', X'00'); + ]], { + -- <func-22.23> + "A" + -- </func-22.23> + }) + +test:do_execsql_test( + "func-22.24", + [[ + SELECT TRIM(X'004100', X'0000'); + ]], { + -- <func-22.24> + "A" + -- </func-22.24> + }) + +test:do_execsql_test( + "func-22.25", + [[ + SELECT TRIM(X'004100', X'0042'); + ]], { + -- <func-22.25> + "A" + -- </func-22.25> + }) + +test:do_execsql_test( + "func-22.26", + [[ + SELECT TRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.26> + "A\0B" + -- </func-22.26> + }) + +test:do_execsql_test( + "func-22.27", + [[ + SELECT LTRIM(X'004100', X'00'); + ]], { + -- <func-22.27> + "A\0" + -- </func-22.27> + }) + +test:do_execsql_test( + "func-22.28", + [[ + SELECT LTRIM(X'004100', X'0000'); + ]], { + -- <func-22.28> + "A\0" + -- </func-22.28> + }) + +test:do_execsql_test( + "func-22.29", + [[ + SELECT LTRIM(X'004100', X'0042'); + ]], { + -- <func-22.29> + "A\0" + -- </func-22.29> + }) + +test:do_execsql_test( + "func-22.30", + [[ + SELECT LTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.30> + "A\0B\0\0" + -- </func-22.30> + }) + +test:do_execsql_test( + "func-22.31", + [[ + SELECT RTRIM(X'004100', X'00'); + ]], { + -- <func-22.31> + "\0A" + -- </func-22.31> + }) + +test:do_execsql_test( + "func-22.32", + [[ + SELECT RTRIM(X'004100', X'0000'); + ]], { + -- <func-22.32> + "\0A" + -- </func-22.32> + }) + +test:do_execsql_test( + "func-22.33", + [[ + SELECT RTRIM(X'004100', X'0042'); + ]], { + -- <func-22.33> + "\0A" + -- </func-22.33> + }) + +test:do_execsql_test( + "func-22.34", + [[ + SELECT RTRIM(X'00004100420000', X'00'); + ]], { + -- <func-22.34> + "\0\0A\0B" + -- </func-22.34> + }) + -- This is to test the deprecated sqlite3_aggregate_count() API. -- --test:do_test( ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-26 13:56 ` Roman @ 2018-12-28 11:09 ` n.pettik 0 siblings, 0 replies; 8+ messages in thread From: n.pettik @ 2018-12-28 11:09 UTC (permalink / raw) To: tarantool-patches; +Cc: Roman Khabibov LGTM. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-15 10:57 [tarantool-patches] [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set Roman Khabibov 2018-12-18 8:40 ` [tarantool-patches] " n.pettik @ 2018-12-20 20:41 ` Vladislav Shpilevoy 2018-12-27 12:28 ` Kirill Yukhin 2 siblings, 0 replies; 8+ messages in thread From: Vladislav Shpilevoy @ 2018-12-20 20:41 UTC (permalink / raw) To: Roman Khabibov, tarantool-patches, Nikita Pettik Nikita, please, do first review. On 15/12/2018 13:57, Roman Khabibov wrote: > The reason for the bug was that X'00' is a terminal symbol. If the char set > contained X'00', all other characters after its (inculding itself) are then > ignored. > > Closes #3543 > > Branch: https://github.com/tarantool/tarantool/tree/romankhabibov/gh-3543-trim-terminal > Issue: https://github.com/tarantool/tarantool/issues/3543 > --- > src/box/sql/func.c | 18 +++++- > test/sql-tap/func.test.lua | 126 ++++++++++++++++++++++++++++++++++++- > 2 files changed, 140 insertions(+), 4 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 9667aead5..5beba7bd2 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1224,8 +1224,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } else { > const unsigned char *z; > - for (z = zCharSet, nChar = 0; *z; nChar++) { > + int sizeInChar = sqlite3_value_bytes(argv[1]); > + int length = 0; > + z = zCharSet; > + nChar = 0; > + const unsigned char *zStepBack; > + while(sizeInChar - length) { > + zStepBack = z; > SQLITE_SKIP_UTF8(z); > + length += z - zStepBack; > + nChar++; > } > if (nChar > 0) { > azChar = > @@ -1235,10 +1243,16 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > return; > } > aLen = (unsigned char *)&azChar[nChar]; > - for (z = zCharSet, nChar = 0; *z; nChar++) { > + z = zCharSet; > + nChar = 0; > + length = 0; > + while(sizeInChar - length) { > azChar[nChar] = (unsigned char *)z; > + zStepBack = z; > SQLITE_SKIP_UTF8(z); > + length += z - zStepBack; > aLen[nChar] = (u8) (z - azChar[nChar]); > + nChar++; > } > } > } > diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua > index 393212968..b7b8e7c4c 100755 > --- a/test/sql-tap/func.test.lua > +++ b/test/sql-tap/func.test.lua > @@ -1,6 +1,6 @@ > #!/usr/bin/env tarantool > test = require("sqltester") > -test:plan(14535) > +test:plan(14547) > > --!./tcltestrunner.lua > -- 2001 September 15 > @@ -2100,11 +2100,133 @@ test:do_execsql_test( > -- </func-22.22> > }) > > +-- gh-3543 Check trimming of binary string when X'00' in trimming char set. > + > +test:do_execsql_test( > + "func-22.23", > + [[ > + SELECT TRIM(X'004100', X'00'); > + ]], { > + -- <func-22.23> > + "A" > + -- </func-22.23> > + }) > + > +test:do_execsql_test( > + "func-22.24", > + [[ > + SELECT TRIM(X'004100', X'0000'); > + ]], { > + -- <func-22.24> > + "A" > + -- </func-22.24> > + }) > + > +test:do_execsql_test( > + "func-22.25", > + [[ > + SELECT TRIM(X'004100', X'0042'); > + ]], { > + -- <func-22.25> > + "A" > + -- </func-22.25> > + }) > + > +test:do_execsql_test( > + "func-22.26", > + [[ > + SELECT TRIM(X'00004100420000', X'00'); > + ]], { > + -- <func-22.26> > + "A\0B" > + -- </func-22.26> > + }) > + > +test:do_execsql_test( > + "func-22.27", > + [[ > + SELECT LTRIM(X'004100', X'00'); > + ]], { > + -- <func-22.27> > + "A\0" > + -- </func-22.27> > + }) > + > +test:do_execsql_test( > + "func-22.28", > + [[ > + SELECT LTRIM(X'004100', X'0000'); > + ]], { > + -- <func-22.28> > + "A\0" > + -- </func-22.28> > + }) > + > +test:do_execsql_test( > + "func-22.29", > + [[ > + SELECT LTRIM(X'004100', X'0042'); > + ]], { > + -- <func-22.29> > + "A\0" > + -- </func-22.29> > + }) > + > +test:do_execsql_test( > + "func-22.30", > + [[ > + SELECT LTRIM(X'00004100420000', X'00'); > + ]], { > + -- <func-22.30> > + "A\0B\0\0" > + -- </func-22.30> > + }) > + > +test:do_execsql_test( > + "func-22.31", > + [[ > + SELECT RTRIM(X'004100', X'00'); > + ]], { > + -- <func-22.31> > + "\0A" > + -- </func-22.31> > + }) > + > +test:do_execsql_test( > + "func-22.32", > + [[ > + SELECT RTRIM(X'004100', X'0000'); > + ]], { > + -- <func-22.32> > + "\0A" > + -- </func-22.32> > + }) > + > +test:do_execsql_test( > + "func-22.33", > + [[ > + SELECT RTRIM(X'004100', X'0042'); > + ]], { > + -- <func-22.33> > + "\0A" > + -- </func-22.33> > + }) > + > +test:do_execsql_test( > + "func-22.34", > + [[ > + SELECT RTRIM(X'00004100420000', X'00'); > + ]], { > + -- <func-22.34> > + "\0\0A\0B" > + -- </func-22.34> > + }) > + > -- This is to test the deprecated sqlite3_aggregate_count() API. > -- > --test:do_test( > -- "func-23.1", > --- function() > +-- function()S > -- sqlite3_create_aggregate("db") > -- return test:execsql([[ > -- SELECT legacy_count() FROM t6; > ^ permalink raw reply [flat|nested] 8+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set 2018-12-15 10:57 [tarantool-patches] [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set Roman Khabibov 2018-12-18 8:40 ` [tarantool-patches] " n.pettik 2018-12-20 20:41 ` Vladislav Shpilevoy @ 2018-12-27 12:28 ` Kirill Yukhin 2 siblings, 0 replies; 8+ messages in thread From: Kirill Yukhin @ 2018-12-27 12:28 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy Hello, On 15 Dec 13:57, Roman Khabibov wrote: > The reason for the bug was that X'00' is a terminal symbol. If the char set > contained X'00', all other characters after its (inculding itself) are then > ignored. > > Closes #3543 > > Branch: https://github.com/tarantool/tarantool/tree/romankhabibov/gh-3543-trim-terminal > Issue: https://github.com/tarantool/tarantool/issues/3543 I've checked your patch into 2.1 branch. -- Regards, Kirill Yukhin ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-12-28 11:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-15 10:57 [tarantool-patches] [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set Roman Khabibov 2018-12-18 8:40 ` [tarantool-patches] " n.pettik 2018-12-18 14:30 ` Roman 2018-12-25 11:40 ` n.pettik 2018-12-26 13:56 ` Roman 2018-12-28 11:09 ` n.pettik 2018-12-20 20:41 ` Vladislav Shpilevoy 2018-12-27 12:28 ` Kirill Yukhin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox