From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <imeevma@tarantool.org>
Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
 bits)) (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id C2417469719
 for <tarantool-discussions@dev.tarantool.org>;
 Sat, 14 Mar 2020 12:19:52 +0300 (MSK)
Date: Sat, 14 Mar 2020 12:19:48 +0300
From: Mergen Imeev <imeevma@tarantool.org>
Message-ID: <20200314091948.GA8144@tarantool.org>
References: <20200310132339.GB9563@tarantool.org>
 <7da68689-123f-6ec6-2846-4ff35f40831c@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <7da68689-123f-6ec6-2846-4ff35f40831c@tarantool.org>
Subject: Re: [Tarantool-discussions] Rules for comparing numbers using
	Tarantool indexes in SQL.
List-Id: Tarantool development process
 <tarantool-discussions.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-discussions>, 
 <mailto:tarantool-discussions-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-discussions/>
List-Post: <mailto:tarantool-discussions@dev.tarantool.org>
List-Help: <mailto:tarantool-discussions-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-discussions>, 
 <mailto:tarantool-discussions-request@dev.tarantool.org?subject=subscribe>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-discussions@dev.tarantool.org

Hi! Thanks for answering. Sorry, I did not fully describe
everything that I should in the original letter.

At first, when I say "value of type INTEGER", I mean the integers
in [-2^63, 2^64-1]; when I say "value of type DOUBLE", I mean a
double precision floating point number (IEEE 754).

Rules for comparing two values that I plan to implement:
1) If both values are of the same type, they can be compared
without reduction.
2) If the type of the first value is contained in the type of the
second value, then they can be compared. The first value is cast
to the type of the second value. (For example, a value of type
UNSIGNED can be compared with a value of type INTEGER, the same
for UNSIGNED/INTEGER/ DOUBLE and NUMBER. In addition, "first" and
"second" here do not mean "left operand" or "right operand". We
must read it as "one" and "the other".)
3) If we compare the value d of type DOUBLE with the value i of
type INTEGER/UNSIGNED, then we will use these rules:
  a) If d <-2 ^ 63 or d> = 2 ^ 64, then we can say the result of
     the comparison without actual comparison.
  b) If -2 ^ 53 <= i <= 2 ^ 53, then we will convert i to the type
     DOUBLE.
  c) If -2 ^ 63 <= d <2 ^ 53 or 2 ^ 53 <d <= 2 ^ 64, then we cast
     d to INTEGER.

Variables that I use in tables: i is value of type INTEGER, d is
value of type DOUBLE.

The operation "(int)d" means that we are trying to convert
value d of type DOUBLE to a value of type INTEGER,
discarding the part after the floating point. If d >= 0,
than (int)d <= d, else (int)d >= d.

The operation "(double)i" means that we are trying to convert
value i of type INTEGER/UNSIGNED to a value of type DOUBLE.


On Thu, Mar 12, 2020 at 10:37:25PM +0100, Vladislav Shpilevoy wrote:
> Hi!
> 
> Nice table. Perhaps it is worth documenting it somewhere on the
> site, or on wiki, or in an RFC. To persist it somehow, and send
> users to there, when they ask something about cross-type comparison.
> 
> On 10/03/2020 14:23, Mergen Imeev wrote:
> > Hi all,
> > I would like to discuss comparison between numbers using an index
> > in SQL. I admit that I was wrong when I said that we can compare
> > numbers without any explicit conversion.
> 
> I don't think I fully understand what it means to compare
> numbers using an index.
> 
I mean: "when using comparison operators, we should get the
same result when searching by index, as in the case of a
fullscan."

> See 2 comments below.
> 
> > To achieve the same result as if we would not use any index, I
> > propose the following rules.
> > 
> > Column names in the tables below:
> > opcode - original operation ('>' == GT, etc.);
> > conditions - additional conditions on the value of the variable;
> > iter - iterator that will be used to search in Tarantool;
> > value - value that will be used to search in Tarantool;
> > is_none - flag, is_none == true if tuples will not be found.
> > 
> > I) Field type of index part == UNSIGNED:
> > 
> > 1) mp_type of value == MP_INT:
> > 
> > All values with mp_type == MP_INT are less than 0 by definition.
> > 
> > | opcode   | conditions | iter   | value | is_none |
> > --------------------------------------------------------
> > | GE/GT    | -          | GE     | 0     | No      |
> > | LE/LT/EQ | -          | -      | -     | Yes     |
> > 
> > 2) mp_type of value == MP_DOUBLE:
> > 
> > | opcode   | conditions     | iter   | value      | is_none |
> > -------------------------------------------------------------
> > | GE/GT    | d < 0          | GE     | 0          | No      |
> > | LE/LT/EQ | d < 0          | -      | -          | Yes     |
> > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> > 
> > In the next table we assert that d >= 0 and d <= UINT64_MAX.
> 
> 1. What is 'd'? Value to search by?
> 
Yes. Sorry, forgot to mention that previously.

> > 
> > | opcode   | conditions  | iter   | value  | is_none |
> > ----------------------------------------------------------
> > | GT       | -           | GT     | (int)d     | No      |
> > | GE       | d == (int)d | GE     | (int)d     | No      |
> > | GE       | d != (int)d | GT     | (int)d     | No      |
> > | EQ       | d == (int)d | EQ     | (int)d     | No      |
> > | EQ       | d != (int)d | -      | -          | Yes     |
> > | LT       | d == (int)d | LT     | (int)d     | No      |
> > | LT       | d != (int)d | LE     | (int)d     | No      |
> > | LE       | -           | LE     | (int)d     | No      |
> > 
> > II) Field type of index part == INTEGER:
> > 
> > In this case the only possible mp_type of value is MP_DOUBLE.
> 
> 2. Why? INTEGER field type can store MP_UINT and MP_INT. So you
> can search here by any number mp_type.
> 
True, but these tables should be used only in cases where
we cannot use given value to search by index.


> > | opcode   | conditions     | iter   | value      | is_none |
> > -------------------------------------------------------------
> > | GE/GT    | d < INT64_MIN  | GE     | INT64_MIN  | No      |
> > | LE/LT/EQ | d < INT64_MIN  | -      | -          | Yes     |
> > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> > 
> > In the next table we assert that d >= INT64_MIN and
> > d < 0. In case d >= 0 and d <= UINT64_MAX we may use table for
> > UNSIGNED.
> > 
> > | opcode   | conditions  | iter   | value     | is_none |
> > ---------------------------------------------------------
> > | GT       | d == (int)d | GT     | (int)d    | No      |
> > | GT       | d != (int)d | GE     | (int)d    | No      |
> > | GE       | -           | GE     | (int)d    | No      |
> > | EQ       | d == (int)d | EQ     | (int)d    | No      |
> > | EQ       | d != (int)d | -      | -         | Yes     |
> > | LT       | -           | LT     | (int)d    | No      |
> > | LE       | d == (int)d | LE     | (int)d    | No      |
> > | LE       | d != (int)d | LT     | (int)d    | No      |
> > 
> > III) Field type of index part == DOUBLE:
> > 
> > In this case, the rules are the same for MP_INT and MP_UINT. In
> > addition, we exclude the case when the value is less than 2^53 and
> > more than -2^53, since we can simply convert it to DOUBLE without
> > precision loss.
> > 
> > In the table below, we assert that the value is INTEGER which is
> > greater than 2^53 or less than -2^53.
> > 
> > | opcode   | conditions     | iter   | value     | is_none |
> > ------------------------------------------------------------
> > | GT       | (double)i > i  | GE     | (double)i | No      |
> > | GT       | (double)i <= i | GT     | (double)i | No      |
> > | GE       | (double)i >= i | GE     | (double)i | No      |
> > | GE       | (double)i < i  | GT     | (double)i | No      |
> > | EQ       | (double)i == i | EQ     | (double)i | No      |
> > | EQ       | (double)i != i | -      | -         | Yes     |
> > | LT       | (double)i < i  | LE     | (double)i | No      |
> > | LT       | (double)i >= i | LT     | (double)i | No      |
> > | LE       | (double)i <= i | LE     | (double)i | No      |
> > | LE       | (double)i > i  | LT     | (double)i | No      |
> >