Tarantool development patches archive
 help / color / mirror / Atom feed
From: Safin Timur via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Mergen Imeev <imeevma@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: fix a segfault in hex() on receiving zeroblob
Date: Mon, 6 Sep 2021 23:32:06 +0300	[thread overview]
Message-ID: <7ac838fc-44cf-ccf0-2ca2-c126be437cb2@tarantool.org> (raw)
In-Reply-To: <20210906094528.GA24664@tarantool.org>


On 06.09.2021 12:45, Mergen Imeev wrote:
> Hi! Thank you for the review! My answer below.
> 
> On Fri, Sep 03, 2021 at 10:19:56PM +0300, Safin Timur wrote:
>>
>>
>> On 01.09.2021 11:44, Mergen Imeev wrote:
>>> Hi! Thank you for the review. My answers below.
>>>
>>> On Tue, Aug 31, 2021 at 10:32:46PM +0300, Timur Safin wrote:
>>>> I may miss something obvious, but prior version of a code
>>>> with pBlob and n was much shorter, compacter and more readable.
>>>> I'm curious, why do you prefer to always use argv[0]->n and
>>>> argv[0]->z instead?
>>>>
>>> If we talk about the old function, then it really looks simpler. However, it did
>>> not work correctly and also made some unnecessary changes to the arguments. You
>>> can compare to the fixed version of old function on this branch:
>>> imeevma/gh-6113-fix-hex-segfault-2.8 (which I also sent you for review). You will
>>> see much less difference there.
>>
>> I meant that newer code was a little bit .. mouthful, with unnecessary code
>> substitution and visual noise which harmed readability. Here is an example
>> of version which is not using argv[0]->.. wherever we refer to fields.
>>
>> ----------------------------------------------------
>> /** Implementation of the HEX() SQL built-in function. */
>> static void
>> func_hex(struct sql_context *ctx, int argc, struct Mem **argv)
>> {
>> 	assert(argc == 1);
>> 	(void)argc;
>> 	if (argv[0]->type == MEM_TYPE_NULL)
>> 		return mem_set_null(ctx->pOut);
>>
>> 	int n = argv[0]->n;
>> 	int zero_len = argv[0]->u.nZero;

> I believe you cannot use undefined value.

That's good question, and wording in standard C is(was) confusing here, 
and one could get an impression that it's UB to acsess field of union, 
other than that has been initialized last. It's called type-punning. And 
always was just implemented as a trickier way for "reinterpret cast" 
which is compatible with aliasing analysis in compiler. There is 
correndum in C99 defect report which clarifies that:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm
```
78a If the member used to access the contents of a union object is not 
the same as the member last used to store a value in the object, the 
appropriate part of the object representation of the value is 
reinterpreted as an object representation in the new type as described 
in 6.2.6 (a process sometimes called "type punning"). This might be a 
trap representation.
```

i.e. it's either cast to asked type, or trap of such cast is impossible 
(like if target type is double, and value in memory creates invalid 
floating value, and host hardware programed to raise in such cases).

In this case we always get integer value, moreover we use this value 
below __only__ if argv[0]->flags & MEM_Zero is non-zero, so we reuse 
this value only in correct circumstances.

So, in short, that's ok in code below.

> 
>> 	assert(argv[0]->type == MEM_TYPE_BIN && n >= 0);
>> 	assert((argv[0]->flags & MEM_Zero) == 0 || zero_len >= 0);
>>
>> 	uint32_t size = 2 * n;
>> 	if ((argv[0]->flags & MEM_Zero) != 0)
>> 		size += 2 * zero_len;
>> 	if (size == 0)
>> 		return mem_set_str0_static(ctx->pOut, "");
>>
>> 	char *str = sqlDbMallocRawNN(sql_get(), size);
>> 	if (str == NULL) {
>> 		ctx->is_aborted = true;
>> 		return;
>> 	}
>> 	for (int i = 0; i < n; ++i) {
>> 		char c = argv[0]->z[i];
>> 		str[2 * i] = hexdigits[(c >> 4) & 0xf];
>> 		str[2 * i + 1] = hexdigits[c & 0xf];
>> 	}
>> 	if ((argv[0]->flags & MEM_Zero) != 0)
>> 		memset(&str[2 * n], '0', 2 * zero_len);
>> 	mem_set_str_allocated(ctx->pOut, str, size);
>> }
>>
>> ----------------------------------------------------
>>
>> It's more resembling original code (and that was done intentionally).
>>
> I don't like that you define a variable with an undefined value in some cases.
> I would introduce some new variables if there was some complicated logic,
> however I don't see the need to do this here since I don't see complex
> expressions.

I still insist that excessive usage of argv[0]->xxx in every line make 
this code uglier and less readable. Please, make code less verbose.

> 
>> Also (and I didn't change it in the sample) there is apparent missing check
>> for SQL_LIMIT_LENGTH limit which used to be done in contextMalloc() before,
>> but now is missing once we use sqlDbMallocRawNN(). I assume we better return
>> this check (once again as a proper wrapper which contextMalloc() essentially
>> was).
>>
> This will be verified in VDBE. I think it is better to have such a check
> centralized for all functions.

Is it already verified at the moment? Or you meant it __will be__ 
verified eventually in future code.

In general, though, we may not assume that the code will always be 
called in correct context where all bounds checks processed. So 
assertions should be local, placed where they are assumed to be present.

Remember recent discussion elsewhere - assertions may not be too much. 
Please put them here. (And in any case, they will not hurt performance 
in release mode, even if there will be coincidentally some duplicates 
here and there).

> 
>>>
>>>> Also, it seems to me we better to limit the number of bytes customer
>>>> may request to allocate from HEX()? What about to check against SQL_LIMIT_LENGTH?
>>>>
>>> This check is performed in the OP_BuiltinFunction opcode.
>>
>> That's nice, so it's not a problem then.

Though, assertions help, as I said above...

>>
>>>
>>>> Thanks,
>>>> Timur
>>>>

Thanks,
Timur


>>>>> -----Original Message-----
>>>>> From: imeevma@tarantool.org <imeevma@tarantool.org>
>>>>> Sent: Monday, August 30, 2021 9:31 AM
>>>>> To: tsafin@tarantool.org
>>>>> Cc: tarantool-patches@dev.tarantool.org
>>>>> Subject: [PATCH v1 1/1] sql: fix a segfault in hex() on receiving
>>>>> zeroblob
>>>>>
>>>>> This patch fixes a segmentation fault when zeroblob is received by
>>>>> the
>>>>> SQL built-in HEX() function.
>>>>>
>>>>> Closes #6113
>>>>> ---
>>>>> https://github.com/tarantool/tarantool/issues/6113
>>>>> https://github.com/tarantool/tarantool/tree/imeevma/gh-6113-fix-hex-
>>>>> segfault-2.10
>>>>>
>>>>>    .../gh-6113-fix-segfault-in-hex-func.md       |  5 ++
>>>>>    src/box/sql/func.c                            | 75 ++++++++++-------
>>>>> --
>>>>>    test/sql-tap/engine.cfg                       |  1 +
>>>>>    ...gh-6113-assert-in-hex-on-zeroblob.test.lua | 13 ++++
>>>>>    4 files changed, 58 insertions(+), 36 deletions(-)
>>>>>    create mode 100644 changelogs/unreleased/gh-6113-fix-segfault-in-
>>>>> hex-func.md
>>>>>    create mode 100755 test/sql-tap/gh-6113-assert-in-hex-on-
>>>>> zeroblob.test.lua
>>>>>
>>>>> diff --git a/changelogs/unreleased/gh-6113-fix-segfault-in-hex-
>>>>> func.md b/changelogs/unreleased/gh-6113-fix-segfault-in-hex-func.md
>>>>> new file mode 100644
>>>>> index 000000000..c59be4d96
>>>>> --- /dev/null
>>>>> +++ b/changelogs/unreleased/gh-6113-fix-segfault-in-hex-func.md
>>>>> @@ -0,0 +1,5 @@
>>>>> +## bugfix/sql
>>>>> +
>>>>> +* The HEX() SQL built-in function now does not throw an assert on
>>>>> receiving
>>>>> +  varbinary values that consist of zero-bytes (gh-6113).
>>>>> +
>>>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>>>> index c063552d6..fa2a2c245 100644
>>>>> --- a/src/box/sql/func.c
>>>>> +++ b/src/box/sql/func.c
>>>>> @@ -53,6 +53,44 @@
>>>>>    static struct mh_strnptr_t *built_in_functions = NULL;
>>>>>    static struct func_sql_builtin **functions;
>>>>>
>>>>> +/** Array for converting from half-bytes into ASCII hex digits. */
>>>>> +static const char hexdigits[] = {
>>>>> +	'0', '1', '2', '3', '4', '5', '6', '7',
>>>>> +	'8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
>>>>> +};
>>>>> +
>>>>> +/** Implementation of the HEX() SQL built-in function. */
>>>>> +static void
>>>>> +func_hex(struct sql_context *ctx, int argc, struct Mem **argv)
>>>>> +{
>>>>> +	assert(argc == 1);
>>>>> +	(void)argc;
>>>>> +	if (argv[0]->type == MEM_TYPE_NULL)
>>>>> +		return mem_set_null(ctx->pOut);
>>>>> +
>>>>> +	assert(argv[0]->type == MEM_TYPE_BIN && argv[0]->n >= 0);
>>>>> +	assert((argv[0]->flags & MEM_Zero) == 0 || argv[0]->u.nZero >=
>>>>> 0);
>>>>> +	uint32_t size = 2 * argv[0]->n;
>>>>> +	if ((argv[0]->flags & MEM_Zero) != 0)
>>>>> +		size += 2 * argv[0]->u.nZero;
>>>>> +	if (size == 0)
>>>>> +		return mem_set_str0_static(ctx->pOut, "");
>>>>> +
>>>>> +	char *str = sqlDbMallocRawNN(sql_get(), size);
>>>>> +	if (str == NULL) {
>>>>> +		ctx->is_aborted = true;
>>>>> +		return;
>>>>> +	}
>>>>> +	for (int i = 0; i < argv[0]->n; ++i) {
>>>>> +		char c = argv[0]->z[i];
>>>>> +		str[2 * i] = hexdigits[(c >> 4) & 0xf];
>>>>> +		str[2 * i + 1] = hexdigits[c & 0xf];
>>>>> +	}
>>>>> +	if ((argv[0]->flags & MEM_Zero) != 0)
>>>>> +		memset(&str[2 * argv[0]->n], '0', 2 * argv[0]->u.nZero);
>>>>> +	mem_set_str_allocated(ctx->pOut, str, size);
>>>>> +}
>>>>> +
>>>>>    static const unsigned char *
>>>>>    mem_as_ustr(struct Mem *mem)
>>>>>    {
>>>>> @@ -1072,14 +1110,6 @@ sql_func_version(struct sql_context *context,
>>>>>    	sql_result_text(context, tarantool_version(), -1, SQL_STATIC);
>>>>>    }
>>>>>
>>>>> -/* Array for converting from half-bytes (nybbles) into ASCII hex
>>>>> - * digits.
>>>>> - */
>>>>> -static const char hexdigits[] = {
>>>>> -	'0', '1', '2', '3', '4', '5', '6', '7',
>>>>> -	'8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
>>>>> -};
>>>>> -
>>>>>    /*
>>>>>     * Implementation of the QUOTE() function.  This function takes a
>>>>> single
>>>>>     * argument.  If the argument is numeric, the return value is the
>>>>> same as
>>>>> @@ -1233,33 +1263,6 @@ charFunc(sql_context * context, int argc,
>>>>> sql_value ** argv)
>>>>>    	sql_result_text64(context, (char *)z, zOut - z, sql_free);
>>>>>    }
>>>>>
>>>>> -/*
>>>>> - * The hex() function.  Interpret the argument as a blob.  Return
>>>>> - * a hexadecimal rendering as text.
>>>>> - */
>>>>> -static void
>>>>> -hexFunc(sql_context * context, int argc, sql_value ** argv)
>>>>> -{
>>>>> -	int i, n;
>>>>> -	const unsigned char *pBlob;
>>>>> -	char *zHex, *z;
>>>>> -	assert(argc == 1);
>>>>> -	UNUSED_PARAMETER(argc);
>>>>> -	pBlob = mem_as_bin(argv[0]);
>>>>> -	n = mem_len_unsafe(argv[0]);
>>>>> -	assert(pBlob == mem_as_bin(argv[0]));	/* No encoding change */
>>>>> -	z = zHex = contextMalloc(context, ((i64) n) * 2 + 1);
>>>>> -	if (zHex) {
>>>>> -		for (i = 0; i < n; i++, pBlob++) {
>>>>> -			unsigned char c = *pBlob;
>>>>> -			*(z++) = hexdigits[(c >> 4) & 0xf];
>>>>> -			*(z++) = hexdigits[c & 0xf];
>>>>> -		}
>>>>> -		*z = 0;
>>>>> -		sql_result_text(context, zHex, n * 2, sql_free);
>>>>> -	}
>>>>> -}
>>>>> -
>>>>>    /*
>>>>>     * The zeroblob(N) function returns a zero-filled blob of size N
>>>>> bytes.
>>>>>     */
>>>>> @@ -2034,7 +2037,7 @@ static struct sql_func_definition definitions[]
>>>>> = {
>>>>>    	{"GROUP_CONCAT", 2, {FIELD_TYPE_VARBINARY,
>>>>> FIELD_TYPE_VARBINARY},
>>>>>    	 FIELD_TYPE_VARBINARY, groupConcatStep, groupConcatFinalize},
>>>>>
>>>>> -	{"HEX", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_STRING, hexFunc,
>>>>> NULL},
>>>>> +	{"HEX", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_STRING, func_hex,
>>>>> NULL},
>>>>>    	{"IFNULL", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY},
>>>>> FIELD_TYPE_SCALAR,
>>>>>    	 sql_builtin_stub, NULL},
>>>>>
>>
>> Regards,
>> Timur

  reply	other threads:[~2021-09-06 20:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-30  6:30 Mergen Imeev via Tarantool-patches
2021-08-31 19:32 ` Timur Safin via Tarantool-patches
2021-09-01  8:44   ` Mergen Imeev via Tarantool-patches
2021-09-03 19:19     ` Safin Timur via Tarantool-patches
2021-09-06  9:45       ` Mergen Imeev via Tarantool-patches
2021-09-06 20:32         ` Safin Timur via Tarantool-patches [this message]
2021-09-07  9:16           ` Mergen Imeev via Tarantool-patches
  -- strict thread matches above, loose matches on Subject: below --
2021-10-05 12:49 Mergen Imeev via Tarantool-patches
2021-08-30  6:20 Mergen Imeev via Tarantool-patches
2021-09-03 19:20 ` Safin Timur via Tarantool-patches
2021-08-26 11:11 Mergen Imeev via Tarantool-patches
2021-08-26 20:42 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-27  8:26   ` Mergen Imeev via Tarantool-patches
2021-08-27 21:31     ` Vladislav Shpilevoy via Tarantool-patches
2021-08-26 11:10 Mergen Imeev via Tarantool-patches
2021-08-26 20:31 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-27  7:54   ` Mergen Imeev via Tarantool-patches
2021-08-27 21:52     ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ac838fc-44cf-ccf0-2ca2-c126be437cb2@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imeevma@tarantool.org \
    --cc=tsafin@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: fix a segfault in hex() on receiving zeroblob' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox