Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER
@ 2020-02-13  8:16 imeevma
  2020-02-20 19:58 ` Nikita Pettik
  0 siblings, 1 reply; 4+ messages in thread
From: imeevma @ 2020-02-13  8:16 UTC (permalink / raw)
  To: korablev; +Cc: tarantool-patches

This patch limits the number of decoded bytes during CAST from
BLOB to INTEGER according to the specified BLOB size.

Closes #4766
---
https://github.com/tarantool/tarantool/issues/4766
https://github.com/tarantool/tarantool/tree/imeevma/gh-4766-fix-blob-size-for-cast

@ChangeLog
- added '\0' to the end of VARBINARY when casting to INTEGER or
  UNSIGNED (gh-4766).

 src/box/sql/util.c         | 13 ++++++++++---
 test/sql-tap/cast.test.lua | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 4 deletions(-)

Hi! Thank you for review. I send a new version to add a changelog.
My answers, diff and new patch below.

On 2/11/20 5:50 PM, Nikita Pettik wrote:
> On 11 Feb 12:52, imeevma@tarantool.org wrote:
>> This patch limits the number of decoded bytes during CAST from
>> BLOB to INTEGER according to the specified BLOB size.
>>
>> Closes #4766
>
> Note that issue has no assigned milestone. As far as I remember it is
> not recommended to work on such issues (personally I do not care, but
> I guess Kirill does). What is more, according to our new policy, each
> patch should come with change-log: see 'Support ChangeLogs' chapter in SOP.
>
Thanks, fixed. Also, 2.2.3 milestone was assigned.

>>
>> diff --git a/src/box/sql/util.c b/src/box/sql/util.c
>> index f908e9c..8f70ce6 100644
>> --- a/src/box/sql/util.c
>> +++ b/src/box/sql/util.c
>> @@ -467,14 +467,21 @@ sql_atoi64(const char *z, int64_t *val, bool *is_neg, int length)
>>  	if (*z == '-')
>>  		*is_neg = true;
>>  
>> +	/*
>> +	 * BLOB data may not end with '\0'. This leads to an error
>> +	 * in the strtoll() and strtoull() functions.
>
> Nit: does strtoll() raise an error, or return wrong result?
>
Return wrong result. Fixed the comment.

>> To fix this,
>> +	 * let's copy the value for decoding into static memory
>> +	 * and add '\0' to it.
>> +	 */
>> +	const char *str_value = tt_cstr(z, length);
>>  	char *end = NULL;
>>  	errno = 0;
>> -	if (*z == '-') {
>> +	if (*str_value == '-') {
>>  		*is_neg = true;
>> -		*val = strtoll(z, &end, 10);
>> +		*val = strtoll(str_value, &end, 10);
>>  	} else {
>>  		*is_neg = false;
>> -		uint64_t u_val = strtoull(z, &end, 10);
>> +		uint64_t u_val = strtoull(str_value, &end, 10);
>>  		*val = u_val;
>>  	}
>>  	/* Overflow and underflow errors. */
>> diff --git a/test/sql-tap/cast.test.lua b/test/sql-tap/cast.test.lua
>> index 9c937a0..a579f24 100755
>> --- a/test/sql-tap/cast.test.lua
>> +++ b/test/sql-tap/cast.test.lua
>> @@ -1,6 +1,6 @@
>>  #!/usr/bin/env tarantool
>>  test = require("sqltester")
>> -test:plan(85)
>> +test:plan(87)
>>  
>>  --!./tcltestrunner.lua
>>  -- 2005 June 25
>> @@ -905,4 +905,34 @@ test:do_execsql_test(
>>          -- </cast-5.1>
>>      })
>>  
>> +--
>> +-- gh-4766: Make sure that a blob as part of a tuple can be cast
>> +-- to NUMBER, INTEGER and UNSIGNED.
>
> I'd outline the fact that bug appears due to blob may not be terminated
> with \0 while stored in space. So make sure that function providing
> conversion from blob to numbers does account blob's lenght.
>
Changed the comment. Added info about '\0'.

>> +--
>> +test:do_execsql_test(
>> +    "cast-6.1",
>> +    [[
>> +        CREATE TABLE t (a VARBINARY PRIMARY KEY);
>> +        INSERT INTO t VALUES (X'33'), (X'372020202020');
>> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
>> +        DROP TABLE t;
>> +    ]], {
>> +        -- <cast-6.1>
>> +        '3', 3, 3, 3, '7     ', 7, 7, 7
>> +        -- </cast-6.1>
>> +    })
>> +
>> +test:do_execsql_test(
>> +    "cast-6.2",
>> +    [[
>> +        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
>> +        INSERT INTO t VALUES (X'33', 1);
>> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
>> +        DROP TABLE t;
>
> What's the (in functional sense) difference between 6.1 and 6.2?
>
True, in this test it isn't obvious what it should show. I fixed
the test. I made this test to show that the '\0' at the end of the
BLOB is important. One possible way to solve this problem was to
limit the length of the decoded BLOB (without copying and adding
'\0'), but this could lead to an incorrect result of this test.
I mean this:

tarantool> CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT) WITH ENGINE = 'vinyl';
---
- row_count: 1
...

tarantool> INSERT INTO t VALUES (X'33', 0x33);
---
- row_count: 1
...

tarantool> SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
---
- metadata:
  - name: A
    type: varbinary
  - name: CAST(a AS NUMBER)
    type: number
  - name: CAST(a AS INTEGER)
    type: integer
  - name: CAST(a AS UNSIGNED)
    type: unsigned
  rows:
  - ['3', 33, 33, 33]
...

tarantool> DROP TABLE t;
---
- row_count: 1
...

>> +    ]], {
>> +        -- <cast-6.2>
>> +        '3', 3, 3, 3
>> +        -- </cast-6.2>
>> +    })
>> +
>>  test:finish_test()
>> -- 
>> 2.7.4
>>

Diff:

diff --git a/src/box/sql/util.c b/src/box/sql/util.c
index 8f70ce6..b01b1ec 100644
--- a/src/box/sql/util.c
+++ b/src/box/sql/util.c
@@ -468,10 +468,10 @@ sql_atoi64(const char *z, int64_t *val, bool *is_neg, int length)
 		*is_neg = true;
 
 	/*
-	 * BLOB data may not end with '\0'. This leads to an error
-	 * in the strtoll() and strtoull() functions. To fix this,
-	 * let's copy the value for decoding into static memory
-	 * and add '\0' to it.
+	 * BLOB data may not end with '\0'. Because of this, the
+	 * strtoll() and strtoull() functions may return an
+	 * incorrect result. To fix this, let's copy the value for
+	 * decoding into static memory and add '\0' to it.
 	 */
 	const char *str_value = tt_cstr(z, length);
 	char *end = NULL;
diff --git a/test/sql-tap/cast.test.lua b/test/sql-tap/cast.test.lua
index b1ef27a..86c0fee 100755
--- a/test/sql-tap/cast.test.lua
+++ b/test/sql-tap/cast.test.lua
@@ -873,7 +873,8 @@ test:do_execsql_test(
 
 --
 -- gh-4766: Make sure that a blob as part of a tuple can be cast
--- to NUMBER, INTEGER and UNSIGNED.
+-- to NUMBER, INTEGER and UNSIGNED. Prior to this patch, an error
+-- could appear due to the absence of '\0' at the end of the BLOB.
 --
 test:do_execsql_test(
     "cast-6.1",
@@ -888,11 +889,15 @@ test:do_execsql_test(
         -- </cast-6.1>
     })
 
+--
+-- In some cases, the absence of '\0' could lead to an incorrect
+-- result. Make sure this does not happen now.
+--
 test:do_execsql_test(
     "cast-6.2",
     [[
         CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
-        INSERT INTO t VALUES (X'33', 1);
+        INSERT INTO t VALUES (X'33', 0x33);
         SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
         DROP TABLE t;
     ]], {

New patch:

From 4f7cb05d8161597c0e58520b75af04167ce0b5e6 Mon Sep 17 00:00:00 2001
From: Mergen Imeev <imeevma@gmail.com>
Date: Wed, 5 Feb 2020 16:38:33 +0300
Subject: [PATCH] sql: limit blob size during CAST AS INTEGER

This patch limits the number of decoded bytes during CAST from
BLOB to INTEGER according to the specified BLOB size.

Closes #4766

diff --git a/src/box/sql/util.c b/src/box/sql/util.c
index f908e9c..b01b1ec 100644
--- a/src/box/sql/util.c
+++ b/src/box/sql/util.c
@@ -467,14 +467,21 @@ sql_atoi64(const char *z, int64_t *val, bool *is_neg, int length)
 	if (*z == '-')
 		*is_neg = true;
 
+	/*
+	 * BLOB data may not end with '\0'. Because of this, the
+	 * strtoll() and strtoull() functions may return an
+	 * incorrect result. To fix this, let's copy the value for
+	 * decoding into static memory and add '\0' to it.
+	 */
+	const char *str_value = tt_cstr(z, length);
 	char *end = NULL;
 	errno = 0;
-	if (*z == '-') {
+	if (*str_value == '-') {
 		*is_neg = true;
-		*val = strtoll(z, &end, 10);
+		*val = strtoll(str_value, &end, 10);
 	} else {
 		*is_neg = false;
-		uint64_t u_val = strtoull(z, &end, 10);
+		uint64_t u_val = strtoull(str_value, &end, 10);
 		*val = u_val;
 	}
 	/* Overflow and underflow errors. */
diff --git a/test/sql-tap/cast.test.lua b/test/sql-tap/cast.test.lua
index fb0790d..86c0fee 100755
--- a/test/sql-tap/cast.test.lua
+++ b/test/sql-tap/cast.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(79)
+test:plan(81)
 
 --!./tcltestrunner.lua
 -- 2005 June 25
@@ -871,4 +871,39 @@ test:do_execsql_test(
         -- </cast-5.1>
     })
 
+--
+-- gh-4766: Make sure that a blob as part of a tuple can be cast
+-- to NUMBER, INTEGER and UNSIGNED. Prior to this patch, an error
+-- could appear due to the absence of '\0' at the end of the BLOB.
+--
+test:do_execsql_test(
+    "cast-6.1",
+    [[
+        CREATE TABLE t (a VARBINARY PRIMARY KEY);
+        INSERT INTO t VALUES (X'33'), (X'372020202020');
+        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
+        DROP TABLE t;
+    ]], {
+        -- <cast-6.1>
+        '3', 3, 3, 3, '7     ', 7, 7, 7
+        -- </cast-6.1>
+    })
+
+--
+-- In some cases, the absence of '\0' could lead to an incorrect
+-- result. Make sure this does not happen now.
+--
+test:do_execsql_test(
+    "cast-6.2",
+    [[
+        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
+        INSERT INTO t VALUES (X'33', 0x33);
+        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
+        DROP TABLE t;
+    ]], {
+        -- <cast-6.2>
+        '3', 3, 3, 3
+        -- </cast-6.2>
+    })
+
 test:finish_test()

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER
  2020-02-13  8:16 [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER imeevma
@ 2020-02-20 19:58 ` Nikita Pettik
  2020-02-22  8:27   ` Mergen Imeev
  0 siblings, 1 reply; 4+ messages in thread
From: Nikita Pettik @ 2020-02-20 19:58 UTC (permalink / raw)
  To: imeevma; +Cc: tarantool-patches

On 13 Feb 11:16, imeevma@tarantool.org wrote:
> >> +test:do_execsql_test(
> >> +    "cast-6.1",
> >> +    [[
> >> +        CREATE TABLE t (a VARBINARY PRIMARY KEY);
> >> +        INSERT INTO t VALUES (X'33'), (X'372020202020');
> >> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> >> +        DROP TABLE t;
> >> +    ]], {
> >> +        -- <cast-6.1>
> >> +        '3', 3, 3, 3, '7     ', 7, 7, 7
> >> +        -- </cast-6.1>
> >> +    })
> >> +
> >> +test:do_execsql_test(
> >> +    "cast-6.2",
> >> +    [[
> >> +        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
> >> +        INSERT INTO t VALUES (X'33', 1);
> >> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> >> +        DROP TABLE t;
> >
> > What's the (in functional sense) difference between 6.1 and 6.2?
> >
> True, in this test it isn't obvious what it should show. I fixed
> the test. I made this test to show that the '\0' at the end of the
> BLOB is important. One possible way to solve this problem was to
> limit the length of the decoded BLOB (without copying and adding
> '\0'), but this could lead to an incorrect result of this test.
> I mean this:
> 
> tarantool> CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT) WITH ENGINE = 'vinyl';
> ---
> - row_count: 1
> ...
> 
> tarantool> INSERT INTO t VALUES (X'33', 0x33);
> ---
> - row_count: 1
> ...

So now you insert 0x33 instead of 1 to integer field. But how does it
affect test? I failed to understand. In both cases you fetch and operate
on blob, meanwhile integer field doesn't seem to be involved.

> 
> tarantool> SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> ---
> - metadata:
>   - name: A
>     type: varbinary
>   - name: CAST(a AS NUMBER)
>     type: number
>   - name: CAST(a AS INTEGER)
>     type: integer
>   - name: CAST(a AS UNSIGNED)
>     type: unsigned
>   rows:
>   - ['3', 33, 33, 33]
> ...
> 
> +--
> +-- In some cases, the absence of '\0' could lead to an incorrect
> +-- result. Make sure this does not happen now.
> +--
>  test:do_execsql_test(
>      "cast-6.2",
>      [[
>          CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
> -        INSERT INTO t VALUES (X'33', 1);
> +        INSERT INTO t VALUES (X'33', 0x33);
>          SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
>          DROP TABLE t;
>      ]], {
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER
  2020-02-20 19:58 ` Nikita Pettik
@ 2020-02-22  8:27   ` Mergen Imeev
  2020-03-11 16:15     ` Nikita Pettik
  0 siblings, 1 reply; 4+ messages in thread
From: Mergen Imeev @ 2020-02-22  8:27 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

Hi! Thank you for review. I changed a test once again.
Diff below.

On Thu, Feb 20, 2020 at 10:58:21PM +0300, Nikita Pettik wrote:
> On 13 Feb 11:16, imeevma@tarantool.org wrote:
> > >> +test:do_execsql_test(
> > >> +    "cast-6.1",
> > >> +    [[
> > >> +        CREATE TABLE t (a VARBINARY PRIMARY KEY);
> > >> +        INSERT INTO t VALUES (X'33'), (X'372020202020');
> > >> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> > >> +        DROP TABLE t;
> > >> +    ]], {
> > >> +        -- <cast-6.1>
> > >> +        '3', 3, 3, 3, '7     ', 7, 7, 7
> > >> +        -- </cast-6.1>
> > >> +    })
> > >> +
> > >> +test:do_execsql_test(
> > >> +    "cast-6.2",
> > >> +    [[
> > >> +        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
> > >> +        INSERT INTO t VALUES (X'33', 1);
> > >> +        SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> > >> +        DROP TABLE t;
> > >
> > > What's the (in functional sense) difference between 6.1 and 6.2?
> > >
> > True, in this test it isn't obvious what it should show. I fixed
> > the test. I made this test to show that the '\0' at the end of the
> > BLOB is important. One possible way to solve this problem was to
> > limit the length of the decoded BLOB (without copying and adding
> > '\0'), but this could lead to an incorrect result of this test.
> > I mean this:
> > 
> > tarantool> CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT) WITH ENGINE = 'vinyl';
> > ---
> > - row_count: 1
> > ...
> > 
> > tarantool> INSERT INTO t VALUES (X'33', 0x33);
> > ---
> > - row_count: 1
> > ...
> 
> So now you insert 0x33 instead of 1 to integer field. But how does it
> affect test? I failed to understand. In both cases you fetch and operate
> on blob, meanwhile integer field doesn't seem to be involved.
> 
As I wrote in the last letter, we have a way to make sure
that with the first case everything will be in order,
without creating a duplicate of this binary value.
Obviously, that method will definitely not affect
performance. But it can lead to the part of the value that
looks like X'333300' being decoded as 33. See the example
from the last letter.

> > 
> > tarantool> SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> > ---
> > - metadata:
> >   - name: A
> >     type: varbinary
> >   - name: CAST(a AS NUMBER)
> >     type: number
> >   - name: CAST(a AS INTEGER)
> >     type: integer
> >   - name: CAST(a AS UNSIGNED)
> >     type: unsigned
> >   rows:
> >   - ['3', 33, 33, 33]
> > ...
> > 
> > +--
> > +-- In some cases, the absence of '\0' could lead to an incorrect
> > +-- result. Make sure this does not happen now.
> > +--
> >  test:do_execsql_test(
> >      "cast-6.2",
> >      [[
> >          CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
> > -        INSERT INTO t VALUES (X'33', 1);
> > +        INSERT INTO t VALUES (X'33', 0x33);
> >          SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
> >          DROP TABLE t;
> >      ]], {
> > 

Diff:

diff --git a/test/sql-tap/cast.test.lua b/test/sql-tap/cast.test.lua
index 86c0fee..74844e0 100755
--- a/test/sql-tap/cast.test.lua
+++ b/test/sql-tap/cast.test.lua
@@ -891,13 +891,15 @@ test:do_execsql_test(
 
 --
 -- In some cases, the absence of '\0' could lead to an incorrect
--- result. Make sure this does not happen now.
+-- result. For example, in this case, part of the value is as
+-- follows: X'333300', which can be decoded as the number 33. Make
+-- sure this does not happen now.
 --
 test:do_execsql_test(
     "cast-6.2",
     [[
-        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
-        INSERT INTO t VALUES (X'33', 0x33);
+        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT, u INT);
+        INSERT INTO t VALUES (X'33', 0x33, 0x00);
         SELECT a, CAST(a AS NUMBER), CAST(a AS INTEGER), CAST(a AS UNSIGNED) FROM t;
         DROP TABLE t;
     ]], {

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER
  2020-02-22  8:27   ` Mergen Imeev
@ 2020-03-11 16:15     ` Nikita Pettik
  0 siblings, 0 replies; 4+ messages in thread
From: Nikita Pettik @ 2020-03-11 16:15 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-patches

On 22 Feb 11:27, Mergen Imeev wrote:
> Hi! Thank you for review. I changed a test once again.
> Diff below.
> 
> On Thu, Feb 20, 2020 at 10:58:21PM +0300, Nikita Pettik wrote:
> > On 13 Feb 11:16, imeevma@tarantool.org wrote:
> > So now you insert 0x33 instead of 1 to integer field. But how does it
> > affect test? I failed to understand. In both cases you fetch and operate
> > on blob, meanwhile integer field doesn't seem to be involved.
> > 
> As I wrote in the last letter, we have a way to make sure
> that with the first case everything will be in order,
> without creating a duplicate of this binary value.
> Obviously, that method will definitely not affect
> performance. But it can lead to the part of the value that
> looks like X'333300' being decoded as 33. See the example
> from the last letter.
> 
> 
> Diff:
> 
> diff --git a/test/sql-tap/cast.test.lua b/test/sql-tap/cast.test.lua
> index 86c0fee..74844e0 100755
> --- a/test/sql-tap/cast.test.lua
> +++ b/test/sql-tap/cast.test.lua
> @@ -891,13 +891,15 @@ test:do_execsql_test(
>  
>  --
>  -- In some cases, the absence of '\0' could lead to an incorrect
> --- result. Make sure this does not happen now.
> +-- result. For example, in this case, part of the value is as
> +-- follows: X'333300', which can be decoded as the number 33. Make
> +-- sure this does not happen now.
>  --
>  test:do_execsql_test(
>      "cast-6.2",
>      [[
> -        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT);
> -        INSERT INTO t VALUES (X'33', 0x33);
> +        CREATE TABLE t (a VARBINARY PRIMARY KEY, i INT, u INT);
> +        INSERT INTO t VALUES (X'33', 0x33, 0x00);

Still don't understand the purpose of creating separate table and so on.
Again: next/prev fields don't affect content of field 'A': blob is
stored in msgpack alongside with its length, so OP_Column can't decode
more/less bytes than indicated in msgpack.

What is more, found that your implementation relies on tt_cstr() which
uses static buffer which in turn restricted by 3 * 4096 bytes. So users
may get wrong results of cast with ease. Example:

long_str = string.rep('0', 15000) 
long_str = long_str..'123'
box.execute(string.format("insert into test values(2, '%s')", long_str)) 
box.execute("select cast(s as INTEGER) from test")

Result is 0 meanwhile should lead to error.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-03-11 16:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13  8:16 [Tarantool-patches] [PATCH v2 1/1] sql: limit blob size during CAST AS INTEGER imeevma
2020-02-20 19:58 ` Nikita Pettik
2020-02-22  8:27   ` Mergen Imeev
2020-03-11 16:15     ` Nikita Pettik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox