Tarantool development patches archive
 help / color / mirror / Atom feed
From: Mergen Imeev via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Safin Timur <tsafin@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: fix a segfault in hex() on receiving zeroblob
Date: Tue, 7 Sep 2021 12:16:24 +0300	[thread overview]
Message-ID: <20210907091624.GA4768@tarantool.org> (raw)
In-Reply-To: <7ac838fc-44cf-ccf0-2ca2-c126be437cb2@tarantool.org>

Hi! Thank you for the review! My answers below. Also, I think there is no point
to continue with this review since we cannot get through such basic questions.
Thanks for you time.

On Mon, Sep 06, 2021 at 11:32:06PM +0300, Safin Timur wrote:
> 
> On 06.09.2021 12:45, Mergen Imeev wrote:
> > Hi! Thank you for the review! My answer below.
> > 
> > On Fri, Sep 03, 2021 at 10:19:56PM +0300, Safin Timur wrote:
> > > 
> > > 
> > > On 01.09.2021 11:44, Mergen Imeev wrote:
> > > > Hi! Thank you for the review. My answers below.
> > > > 
> > > > On Tue, Aug 31, 2021 at 10:32:46PM +0300, Timur Safin wrote:
> > > > > I may miss something obvious, but prior version of a code
> > > > > with pBlob and n was much shorter, compacter and more readable.
> > > > > I'm curious, why do you prefer to always use argv[0]->n and
> > > > > argv[0]->z instead?
> > > > > 
> > > > If we talk about the old function, then it really looks simpler. However, it did
> > > > not work correctly and also made some unnecessary changes to the arguments. You
> > > > can compare to the fixed version of old function on this branch:
> > > > imeevma/gh-6113-fix-hex-segfault-2.8 (which I also sent you for review). You will
> > > > see much less difference there.
> > > 
> > > I meant that newer code was a little bit .. mouthful, with unnecessary code
> > > substitution and visual noise which harmed readability. Here is an example
> > > of version which is not using argv[0]->.. wherever we refer to fields.
> > > 
> > > ----------------------------------------------------
> > > /** Implementation of the HEX() SQL built-in function. */
> > > static void
> > > func_hex(struct sql_context *ctx, int argc, struct Mem **argv)
> > > {
> > > 	assert(argc == 1);
> > > 	(void)argc;
> > > 	if (argv[0]->type == MEM_TYPE_NULL)
> > > 		return mem_set_null(ctx->pOut);
> > > 
> > > 	int n = argv[0]->n;
> > > 	int zero_len = argv[0]->u.nZero;
> 
> > I believe you cannot use undefined value.
> 
> That's good question, and wording in standard C is(was) confusing here, and
> one could get an impression that it's UB to acsess field of union, other
> than that has been initialized last. It's called type-punning. And always
> was just implemented as a trickier way for "reinterpret cast" which is
> compatible with aliasing analysis in compiler. There is correndum in C99
> defect report which clarifies that:
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm
> ```
> 78a If the member used to access the contents of a union object is not the
> same as the member last used to store a value in the object, the appropriate
> part of the object representation of the value is reinterpreted as an object
> representation in the new type as described in 6.2.6 (a process sometimes
> called "type punning"). This might be a trap representation.
> ```
> 
> i.e. it's either cast to asked type, or trap of such cast is impossible
> (like if target type is double, and value in memory creates invalid floating
> value, and host hardware programed to raise in such cases).
> 
> In this case we always get integer value, moreover we use this value below
> __only__ if argv[0]->flags & MEM_Zero is non-zero, so we reuse this value
> only in correct circumstances.
> 
> So, in short, that's ok in code below.
> 
You want to declare variable with value, which will be properly defined only if
MEM has MEM_Zero flag. I think that it is not good idea.

> > 
> > > 	assert(argv[0]->type == MEM_TYPE_BIN && n >= 0);
> > > 	assert((argv[0]->flags & MEM_Zero) == 0 || zero_len >= 0);
> > > 
> > > 	uint32_t size = 2 * n;
> > > 	if ((argv[0]->flags & MEM_Zero) != 0)
> > > 		size += 2 * zero_len;
> > > 	if (size == 0)
> > > 		return mem_set_str0_static(ctx->pOut, "");
> > > 
> > > 	char *str = sqlDbMallocRawNN(sql_get(), size);
> > > 	if (str == NULL) {
> > > 		ctx->is_aborted = true;
> > > 		return;
> > > 	}
> > > 	for (int i = 0; i < n; ++i) {
> > > 		char c = argv[0]->z[i];
> > > 		str[2 * i] = hexdigits[(c >> 4) & 0xf];
> > > 		str[2 * i + 1] = hexdigits[c & 0xf];
> > > 	}
> > > 	if ((argv[0]->flags & MEM_Zero) != 0)
> > > 		memset(&str[2 * n], '0', 2 * zero_len);
> > > 	mem_set_str_allocated(ctx->pOut, str, size);
> > > }
> > > 
> > > ----------------------------------------------------
> > > 
> > > It's more resembling original code (and that was done intentionally).
> > > 
> > I don't like that you define a variable with an undefined value in some cases.
> > I would introduce some new variables if there was some complicated logic,
> > however I don't see the need to do this here since I don't see complex
> > expressions.
> 
> I still insist that excessive usage of argv[0]->xxx in every line make this
> code uglier and less readable. Please, make code less verbose.
> 
> > 
> > > Also (and I didn't change it in the sample) there is apparent missing check
> > > for SQL_LIMIT_LENGTH limit which used to be done in contextMalloc() before,
> > > but now is missing once we use sqlDbMallocRawNN(). I assume we better return
> > > this check (once again as a proper wrapper which contextMalloc() essentially
> > > was).
> > > 
> > This will be verified in VDBE. I think it is better to have such a check
> > centralized for all functions.
> 
> Is it already verified at the moment? Or you meant it __will be__ verified
> eventually in future code.
> 
See vdbe.c line 1337.

> In general, though, we may not assume that the code will always be called in
> correct context where all bounds checks processed. So assertions should be
> local, placed where they are assumed to be present.
> 
> Remember recent discussion elsewhere - assertions may not be too much.
> Please put them here. (And in any case, they will not hurt performance in
> release mode, even if there will be coincidentally some duplicates here and
> there).
> 
This will be plainly wrong. If some value will have too much memory, the
error will be thrown in VDBE (see vdbe.c line 1337). However, if we insert
assert before this check, we will get assertion instead of the error, which
doesn't look good, I believe.

> > 
> > > > 
> > > > > Also, it seems to me we better to limit the number of bytes customer
> > > > > may request to allocate from HEX()? What about to check against SQL_LIMIT_LENGTH?
> > > > > 
> > > > This check is performed in the OP_BuiltinFunction opcode.
> > > 
> > > That's nice, so it's not a problem then.
> 
> Though, assertions help, as I said above...
> 
> > > 
> > > > 
> > > > > Thanks,
> > > > > Timur
> > > > > 
> 
> Thanks,
> Timur
> 
> 
> > > > > > -----Original Message-----
> > > > > > From: imeevma@tarantool.org <imeevma@tarantool.org>
> > > > > > Sent: Monday, August 30, 2021 9:31 AM
> > > > > > To: tsafin@tarantool.org
> > > > > > Cc: tarantool-patches@dev.tarantool.org
> > > > > > Subject: [PATCH v1 1/1] sql: fix a segfault in hex() on receiving
> > > > > > zeroblob
> > > > > > 
> > > > > > This patch fixes a segmentation fault when zeroblob is received by
> > > > > > the
> > > > > > SQL built-in HEX() function.
> > > > > > 
> > > > > > Closes #6113
> > > > > > ---
> > > > > > https://github.com/tarantool/tarantool/issues/6113
> > > > > > https://github.com/tarantool/tarantool/tree/imeevma/gh-6113-fix-hex-
> > > > > > segfault-2.10
> > > > > > 
> > > > > >    .../gh-6113-fix-segfault-in-hex-func.md       |  5 ++
> > > > > >    src/box/sql/func.c                            | 75 ++++++++++-------
> > > > > > --
> > > > > >    test/sql-tap/engine.cfg                       |  1 +
> > > > > >    ...gh-6113-assert-in-hex-on-zeroblob.test.lua | 13 ++++
> > > > > >    4 files changed, 58 insertions(+), 36 deletions(-)
> > > > > >    create mode 100644 changelogs/unreleased/gh-6113-fix-segfault-in-
> > > > > > hex-func.md
> > > > > >    create mode 100755 test/sql-tap/gh-6113-assert-in-hex-on-
> > > > > > zeroblob.test.lua
> > > > > > 
> > > > > > diff --git a/changelogs/unreleased/gh-6113-fix-segfault-in-hex-
> > > > > > func.md b/changelogs/unreleased/gh-6113-fix-segfault-in-hex-func.md
> > > > > > new file mode 100644
> > > > > > index 000000000..c59be4d96
> > > > > > --- /dev/null
> > > > > > +++ b/changelogs/unreleased/gh-6113-fix-segfault-in-hex-func.md
> > > > > > @@ -0,0 +1,5 @@
> > > > > > +## bugfix/sql
> > > > > > +
> > > > > > +* The HEX() SQL built-in function now does not throw an assert on
> > > > > > receiving
> > > > > > +  varbinary values that consist of zero-bytes (gh-6113).
> > > > > > +
> > > > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> > > > > > index c063552d6..fa2a2c245 100644
> > > > > > --- a/src/box/sql/func.c
> > > > > > +++ b/src/box/sql/func.c
> > > > > > @@ -53,6 +53,44 @@
> > > > > >    static struct mh_strnptr_t *built_in_functions = NULL;
> > > > > >    static struct func_sql_builtin **functions;
> > > > > > 
> > > > > > +/** Array for converting from half-bytes into ASCII hex digits. */
> > > > > > +static const char hexdigits[] = {
> > > > > > +	'0', '1', '2', '3', '4', '5', '6', '7',
> > > > > > +	'8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
> > > > > > +};
> > > > > > +
> > > > > > +/** Implementation of the HEX() SQL built-in function. */
> > > > > > +static void
> > > > > > +func_hex(struct sql_context *ctx, int argc, struct Mem **argv)
> > > > > > +{
> > > > > > +	assert(argc == 1);
> > > > > > +	(void)argc;
> > > > > > +	if (argv[0]->type == MEM_TYPE_NULL)
> > > > > > +		return mem_set_null(ctx->pOut);
> > > > > > +
> > > > > > +	assert(argv[0]->type == MEM_TYPE_BIN && argv[0]->n >= 0);
> > > > > > +	assert((argv[0]->flags & MEM_Zero) == 0 || argv[0]->u.nZero >=
> > > > > > 0);
> > > > > > +	uint32_t size = 2 * argv[0]->n;
> > > > > > +	if ((argv[0]->flags & MEM_Zero) != 0)
> > > > > > +		size += 2 * argv[0]->u.nZero;
> > > > > > +	if (size == 0)
> > > > > > +		return mem_set_str0_static(ctx->pOut, "");
> > > > > > +
> > > > > > +	char *str = sqlDbMallocRawNN(sql_get(), size);
> > > > > > +	if (str == NULL) {
> > > > > > +		ctx->is_aborted = true;
> > > > > > +		return;
> > > > > > +	}
> > > > > > +	for (int i = 0; i < argv[0]->n; ++i) {
> > > > > > +		char c = argv[0]->z[i];
> > > > > > +		str[2 * i] = hexdigits[(c >> 4) & 0xf];
> > > > > > +		str[2 * i + 1] = hexdigits[c & 0xf];
> > > > > > +	}
> > > > > > +	if ((argv[0]->flags & MEM_Zero) != 0)
> > > > > > +		memset(&str[2 * argv[0]->n], '0', 2 * argv[0]->u.nZero);
> > > > > > +	mem_set_str_allocated(ctx->pOut, str, size);
> > > > > > +}
> > > > > > +
> > > > > >    static const unsigned char *
> > > > > >    mem_as_ustr(struct Mem *mem)
> > > > > >    {
> > > > > > @@ -1072,14 +1110,6 @@ sql_func_version(struct sql_context *context,
> > > > > >    	sql_result_text(context, tarantool_version(), -1, SQL_STATIC);
> > > > > >    }
> > > > > > 
> > > > > > -/* Array for converting from half-bytes (nybbles) into ASCII hex
> > > > > > - * digits.
> > > > > > - */
> > > > > > -static const char hexdigits[] = {
> > > > > > -	'0', '1', '2', '3', '4', '5', '6', '7',
> > > > > > -	'8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
> > > > > > -};
> > > > > > -
> > > > > >    /*
> > > > > >     * Implementation of the QUOTE() function.  This function takes a
> > > > > > single
> > > > > >     * argument.  If the argument is numeric, the return value is the
> > > > > > same as
> > > > > > @@ -1233,33 +1263,6 @@ charFunc(sql_context * context, int argc,
> > > > > > sql_value ** argv)
> > > > > >    	sql_result_text64(context, (char *)z, zOut - z, sql_free);
> > > > > >    }
> > > > > > 
> > > > > > -/*
> > > > > > - * The hex() function.  Interpret the argument as a blob.  Return
> > > > > > - * a hexadecimal rendering as text.
> > > > > > - */
> > > > > > -static void
> > > > > > -hexFunc(sql_context * context, int argc, sql_value ** argv)
> > > > > > -{
> > > > > > -	int i, n;
> > > > > > -	const unsigned char *pBlob;
> > > > > > -	char *zHex, *z;
> > > > > > -	assert(argc == 1);
> > > > > > -	UNUSED_PARAMETER(argc);
> > > > > > -	pBlob = mem_as_bin(argv[0]);
> > > > > > -	n = mem_len_unsafe(argv[0]);
> > > > > > -	assert(pBlob == mem_as_bin(argv[0]));	/* No encoding change */
> > > > > > -	z = zHex = contextMalloc(context, ((i64) n) * 2 + 1);
> > > > > > -	if (zHex) {
> > > > > > -		for (i = 0; i < n; i++, pBlob++) {
> > > > > > -			unsigned char c = *pBlob;
> > > > > > -			*(z++) = hexdigits[(c >> 4) & 0xf];
> > > > > > -			*(z++) = hexdigits[c & 0xf];
> > > > > > -		}
> > > > > > -		*z = 0;
> > > > > > -		sql_result_text(context, zHex, n * 2, sql_free);
> > > > > > -	}
> > > > > > -}
> > > > > > -
> > > > > >    /*
> > > > > >     * The zeroblob(N) function returns a zero-filled blob of size N
> > > > > > bytes.
> > > > > >     */
> > > > > > @@ -2034,7 +2037,7 @@ static struct sql_func_definition definitions[]
> > > > > > = {
> > > > > >    	{"GROUP_CONCAT", 2, {FIELD_TYPE_VARBINARY,
> > > > > > FIELD_TYPE_VARBINARY},
> > > > > >    	 FIELD_TYPE_VARBINARY, groupConcatStep, groupConcatFinalize},
> > > > > > 
> > > > > > -	{"HEX", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_STRING, hexFunc,
> > > > > > NULL},
> > > > > > +	{"HEX", 1, {FIELD_TYPE_VARBINARY}, FIELD_TYPE_STRING, func_hex,
> > > > > > NULL},
> > > > > >    	{"IFNULL", 2, {FIELD_TYPE_ANY, FIELD_TYPE_ANY},
> > > > > > FIELD_TYPE_SCALAR,
> > > > > >    	 sql_builtin_stub, NULL},
> > > > > > 
> > > 
> > > Regards,
> > > Timur

  reply	other threads:[~2021-09-07  9:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-30  6:30 Mergen Imeev via Tarantool-patches
2021-08-31 19:32 ` Timur Safin via Tarantool-patches
2021-09-01  8:44   ` Mergen Imeev via Tarantool-patches
2021-09-03 19:19     ` Safin Timur via Tarantool-patches
2021-09-06  9:45       ` Mergen Imeev via Tarantool-patches
2021-09-06 20:32         ` Safin Timur via Tarantool-patches
2021-09-07  9:16           ` Mergen Imeev via Tarantool-patches [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-10-05 12:49 Mergen Imeev via Tarantool-patches
2021-08-30  6:20 Mergen Imeev via Tarantool-patches
2021-09-03 19:20 ` Safin Timur via Tarantool-patches
2021-08-26 11:11 Mergen Imeev via Tarantool-patches
2021-08-26 20:42 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-27  8:26   ` Mergen Imeev via Tarantool-patches
2021-08-27 21:31     ` Vladislav Shpilevoy via Tarantool-patches
2021-08-26 11:10 Mergen Imeev via Tarantool-patches
2021-08-26 20:31 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-27  7:54   ` Mergen Imeev via Tarantool-patches
2021-08-27 21:52     ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210907091624.GA4768@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imeevma@tarantool.org \
    --cc=tsafin@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: fix a segfault in hex() on receiving zeroblob' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox