Tarantool development patches archive
 help / color / mirror / Atom feed
From: "n.pettik" <korablev@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean
Date: Tue, 23 Apr 2019 22:58:26 +0300	[thread overview]
Message-ID: <45E647F0-79F4-41DE-BD69-996A2D3EB7F1@tarantool.org> (raw)
In-Reply-To: <afbeb182-e5d0-4e5a-e010-57913e36d345@tarantool.org>


>> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
>> index 6b38e8e66..d70b64c45 100644
>> --- a/src/box/sql/expr.c
>> +++ b/src/box/sql/expr.c
>> @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>>                }
>>        case TK_TRUE:
>>        case TK_FALSE: {
>> -                       vdbe_emit_bool(pParse->pVdbe, pExpr, target);
>> +                       sqlVdbeAddOp2(v, OP_Bool, op == TK_TRUE, target);
>>                        return target;
>>                }
>> 
>>> 6. I see, that all other 'case's set pExpr->type. Why don't you
>>> do it here?
>> 
>> It is required only for non-constant values. For literals it is assigned
>> in parse.y spanExpr().
> 
> 1. Then why case TK_INTEGER sets 'pExpr->type = FIELD_TYPE_INTEGER',
> TK_STRING sets 'pExpr->type = FIELD_TYPE_STRING;', TK_BLOB sets
> 'pExpr->type = FIELD_TYPE_SCALAR;' - I see all of them being set in
> parse.y.

Those are literals, so as I said above, their types are assigned
right during parsing. If you are asking “why do we need to do so?”,
then answer is “they are leaf nodes and it is most appropriate
moment to do so”.

> On the contrary, there are places which create a new Expr, and
> does not set type even in parse.y. For example:
> 
>    1240: A.pExpr = sql_expr_new_dequoted(pParse->db, TK_INTEGER, &sqlIntTokens[N]);

Speaking of this particular example, yes, it was buggy place.
I’ve fixed it in "sql: make predicates accept and return boolean" patch

> Here the expr is created and spanExpr() is not used. How does it
> work?
> 
>>> 8. Looking at all these type comparison prohibitions I am getting
>>> afraid of how will we select from SCALAR columns?
>>> 
>>> A schema:
>>> 
>>>   box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);')
>>>   box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')")
>>> 
>>> SQL select:
>>> 
>>>> box.execute('SELECT * FROM t WHERE a > 1')
>>>   ---
>>>   - error: 'Type mismatch: can not convert INTEGER to boolean'
>>>   ...
>>> 
>>> The same Lua select:
>>> 
>>>> box.space.T.index[1]:select({1}, {iterator = 'GT'})
>>>   ---
>>>   - - [4, 'str']
>>>   ...
>>> 
>>> In fact, now we can not use SCALAR in SQL for any comparisons
>>> because it will raise type mismatch on literally everything.
>>> 
>>> What are we going to do with it? IMO, we could add a flag like
>>> 'is_scalar' to struct Mem in its flags section, which would allow
>>> to compare this value with anything.
>> 
>> Konstantin suggested to extend struct Mem with field_type,
>> which is going to substitute this flag.
>> 
>>> Looks crutchy though. And is
>>> not a part of this issue of course, but should be filed into the
>>> issue list.
>>> 
>>> I see a similar issue https://github.com/tarantool/tarantool/issues/4124.
>>> Probably, it is worth extending it instead of opening new.
>> 
>> I consider this as a trade-off of using scalar: users shouldn’t use this
>> type until they are really have to. Nevertheless, they still can use CAST
>> operator and switch case, or write their own lua-function converting values
>> to required type.
> 
> 2. Users may have good designed schema and work in each request with only
> one type in such a column, but still they will get errors just because
> the iterator can not walk to the needed tree node. Lets modify my example:
> 
>    SELECT * FROM t WHERE a = 1;
> 
> This request implicitly says that it will work with numbers only, but
> somehow its success depends on the content of the space, not even
> related to '1' value. In a nutshell it means that insertion of something
> into a space can break existing requests not related to the inserted
> data. It is weird. Such scalar type would be useless.

Yep, we can observe the same behaviour in DB2. For instance:

db2 => create table t (id varchar(10))
DB20000I  The SQL command completed successfully.
db2 => insert into t values(1)
DB20000I  The SQL command completed successfully.
db2 => select id from t where id = 1

ID        
----------
1         

  1 record(s) selected.

db2 => insert into t values('abc’)
DB20000I  The SQL command completed successfully.
db2 => insert into t values(1)
DB20000I  The SQL command completed successfully.
db2 => select id from t where id = 1

ID        
----------
1         
SQL0420N  Invalid character found in a character string argument of the 
function "DECFLOAT".  SQLSTATE=22018

After insertion of unconvertible to integer value ‘abc’,
select starts to fail. Note that DB2 is considered to be
most compatible with ANSI according to members of
our server team.

>>>> + *
>>>> + * @param str String to be converted to boolean.
>>>> + *            Assumed to be null terminated.
>>>> + * @param result Resulting value of cast.
>>>> + * @retval 0 If string satisfies conditions above.
>>>> + * @retval -1 Otherwise.
>>>> + */
>>>> +static int
>>>> +str_cast_to_boolean(const char *str, bool *result)
>>>> +{
>>>> +	assert(str != NULL);
>>>> +	for (; *str == ' '; str++);
>>>> +	size_t rest_str_len = strlen(str);
>>>> +	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>>>> +		*result = true;
>>>> +		return 0;
>>>> +	}
>>>> +	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>>>> +		*result = false;
>>>> +		return 0;
>>>> +	}
>>> 
>>> 16. I guess it is far from the most efficient implementation. You
>>> calculate length of the whole string before comparison, scanning
>>> its twice in the worst and most common case. Please, consider my
>>> fixes below. I did not test them though, so probably there are some
>>> typos.
>>> Note, that in my fixes I've already accounted my comment about
>>> trimming trailing whitespaces.
>>> 
>>> ============================================================================
>>> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result)
>>> {
>>> 	assert(str != NULL);
>>> 	for (; *str == ' '; str++);
>>> -	size_t rest_str_len = strlen(str);
>>> -	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>>> +	if (strncasecmp(str, "true", 4) == 0) {
>> 
>> We can’t use strncasecmp() without verification the fact that
>> rest_str_len (i.e. string length after skipping leading spaces) is >= 4.
> 
> 3.1. Why? According to the standard, strncasecmp() compares not more than
> 'n' characters taking into account terminating zero. I tried these
> examples to simulate our case (when 'str' points to the beginning of a
> possible 'true/false' token).
> 
> 	printf("%d\n", strncasecmp("", "true", 4));
> 	printf("%d\n", strncasecmp("t", "true", 4));
> 	printf("%d\n", strncasecmp("tr", "true", 4));
> 	printf("%d\n", strncasecmp("tru", "true", 4));
> 	printf("%d\n", strncasecmp("true   ", "true", 4));
> 
> I got this output:
> 
>    -116
>    -114
>    -117
>    -101
>    0
> 
> It means, that you do not need to call 'strlen' before
> 'strncasecmp'. I've dropped 'strlen' again and the tests
> passed. If you think that explicit length check is mandatory,
> then please, provide a test. Probably I'm wrong somewhere in
> my speculations.

Well, you are right: I missed the fact that it compares UP to n symbols.
I’m sorry to have confused you.

>> -       if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>> +       if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
>>                *result = true;
>> -               return 0;
>> -       }
>> -       if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>> +               str += 4;
>> +       } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
>>                *result = false;
>> -               return 0;
>> +               str += 5;
>> +       } else {
>> +               return -1;
>>        }
>> -       return -1;
>> +       for (; *str != '\0'; ++str) {
>> +               if (*str != ' ')
>> +                       return -1;
>> +       }
>> +       return 0;
>> }
> 
> My fixes, on the branch:

Ok, applied.

  reply	other threads:[~2019-04-23 19:58 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-14 15:03 [tarantool-patches] [PATCH 0/9] Introduce type BOOLEAN in SQL Nikita Pettik
2019-04-14 15:03 ` [tarantool-patches] [PATCH 1/9] sql: refactor mem_apply_numeric_type() Nikita Pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 2/9] sql: disallow text values participate in sum() aggregate Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 3/9] sql: use msgpack types instead of custom ones Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 4/9] sql: introduce type boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik [this message]
2019-04-23 21:06           ` Vladislav Shpilevoy
2019-04-14 15:04 ` [tarantool-patches] [PATCH 5/9] sql: improve type determination for column meta Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 6/9] sql: make comparison predicate return boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 7/9] sql: make predicates accept and " Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 9/9] sql: make <search condition> accept only boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:59         ` n.pettik
2019-04-23 21:06           ` Vladislav Shpilevoy
2019-04-23 22:01             ` n.pettik
     [not found] ` <b2a84f129c2343d3da3311469cbb7b20488a21c2.1555252410.git.korablev@tarantool.org>
2019-04-16 14:12   ` [tarantool-patches] Re: [PATCH 8/9] sql: make LIKE predicate return boolean result Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-24 10:28 ` [tarantool-patches] Re: [PATCH 0/9] Introduce type BOOLEAN in SQL Vladislav Shpilevoy
2019-04-25  8:46 ` Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45E647F0-79F4-41DE-BD69-996A2D3EB7F1@tarantool.org \
    --to=korablev@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='[tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox