Tarantool discussions archive
 help / color / mirror / Atom feed
* [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
@ 2020-03-10 13:23 Mergen Imeev
  2020-03-12 21:37 ` Vladislav Shpilevoy
  2020-03-13 16:25 ` Peter Gulutzan
  0 siblings, 2 replies; 6+ messages in thread
From: Mergen Imeev @ 2020-03-10 13:23 UTC (permalink / raw)
  To: tarantool-discussions, korablev, alexander.turenko, v.shpilevoy, kyukhin

Hi all,
I would like to discuss comparison between numbers using an index
in SQL. I admit that I was wrong when I said that we can compare
numbers without any explicit conversion.

To achieve the same result as if we would not use any index, I
propose the following rules.

Column names in the tables below:
opcode - original operation ('>' == GT, etc.);
conditions - additional conditions on the value of the variable;
iter - iterator that will be used to search in Tarantool;
value - value that will be used to search in Tarantool;
is_none - flag, is_none == true if tuples will not be found.

I) Field type of index part == UNSIGNED:

1) mp_type of value == MP_INT:

All values with mp_type == MP_INT are less than 0 by definition.

| opcode   | conditions | iter   | value | is_none |
--------------------------------------------------------
| GE/GT    | -          | GE     | 0     | No      |
| LE/LT/EQ | -          | -      | -     | Yes     |

2) mp_type of value == MP_DOUBLE:

| opcode   | conditions     | iter   | value      | is_none |
-------------------------------------------------------------
| GE/GT    | d < 0          | GE     | 0          | No      |
| LE/LT/EQ | d < 0          | -      | -          | Yes     |
| GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
| LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |

In the next table we assert that d >= 0 and d <= UINT64_MAX.

| opcode   | conditions  | iter   | value  | is_none |
----------------------------------------------------------
| GT       | -           | GT     | (int)d     | No      |
| GE       | d == (int)d | GE     | (int)d     | No      |
| GE       | d != (int)d | GT     | (int)d     | No      |
| EQ       | d == (int)d | EQ     | (int)d     | No      |
| EQ       | d != (int)d | -      | -          | Yes     |
| LT       | d == (int)d | LT     | (int)d     | No      |
| LT       | d != (int)d | LE     | (int)d     | No      |
| LE       | -           | LE     | (int)d     | No      |

II) Field type of index part == INTEGER:

In this case the only possible mp_type of value is MP_DOUBLE.

| opcode   | conditions     | iter   | value      | is_none |
-------------------------------------------------------------
| GE/GT    | d < INT64_MIN  | GE     | INT64_MIN  | No      |
| LE/LT/EQ | d < INT64_MIN  | -      | -          | Yes     |
| GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
| LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |

In the next table we assert that d >= INT64_MIN and
d < 0. In case d >= 0 and d <= UINT64_MAX we may use table for
UNSIGNED.

| opcode   | conditions  | iter   | value     | is_none |
---------------------------------------------------------
| GT       | d == (int)d | GT     | (int)d    | No      |
| GT       | d != (int)d | GE     | (int)d    | No      |
| GE       | -           | GE     | (int)d    | No      |
| EQ       | d == (int)d | EQ     | (int)d    | No      |
| EQ       | d != (int)d | -      | -         | Yes     |
| LT       | -           | LT     | (int)d    | No      |
| LE       | d == (int)d | LE     | (int)d    | No      |
| LE       | d != (int)d | LT     | (int)d    | No      |

III) Field type of index part == DOUBLE:

In this case, the rules are the same for MP_INT and MP_UINT. In
addition, we exclude the case when the value is less than 2^53 and
more than -2^53, since we can simply convert it to DOUBLE without
precision loss.

In the table below, we assert that the value is INTEGER which is
greater than 2^53 or less than -2^53.

| opcode   | conditions     | iter   | value     | is_none |
------------------------------------------------------------
| GT       | (double)i > i  | GE     | (double)i | No      |
| GT       | (double)i <= i | GT     | (double)i | No      |
| GE       | (double)i >= i | GE     | (double)i | No      |
| GE       | (double)i < i  | GT     | (double)i | No      |
| EQ       | (double)i == i | EQ     | (double)i | No      |
| EQ       | (double)i != i | -      | -         | Yes     |
| LT       | (double)i < i  | LE     | (double)i | No      |
| LT       | (double)i >= i | LT     | (double)i | No      |
| LE       | (double)i <= i | LE     | (double)i | No      |
| LE       | (double)i > i  | LT     | (double)i | No      |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
  2020-03-10 13:23 [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL Mergen Imeev
@ 2020-03-12 21:37 ` Vladislav Shpilevoy
  2020-03-14  9:19   ` Mergen Imeev
  2020-03-13 16:25 ` Peter Gulutzan
  1 sibling, 1 reply; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-12 21:37 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, korablev, alexander.turenko,
	kyukhin

Hi!

Nice table. Perhaps it is worth documenting it somewhere on the
site, or on wiki, or in an RFC. To persist it somehow, and send
users to there, when they ask something about cross-type comparison.

On 10/03/2020 14:23, Mergen Imeev wrote:
> Hi all,
> I would like to discuss comparison between numbers using an index
> in SQL. I admit that I was wrong when I said that we can compare
> numbers without any explicit conversion.

I don't think I fully understand what it means to compare
numbers using an index.

See 2 comments below.

> To achieve the same result as if we would not use any index, I
> propose the following rules.
> 
> Column names in the tables below:
> opcode - original operation ('>' == GT, etc.);
> conditions - additional conditions on the value of the variable;
> iter - iterator that will be used to search in Tarantool;
> value - value that will be used to search in Tarantool;
> is_none - flag, is_none == true if tuples will not be found.
> 
> I) Field type of index part == UNSIGNED:
> 
> 1) mp_type of value == MP_INT:
> 
> All values with mp_type == MP_INT are less than 0 by definition.
> 
> | opcode   | conditions | iter   | value | is_none |
> --------------------------------------------------------
> | GE/GT    | -          | GE     | 0     | No      |
> | LE/LT/EQ | -          | -      | -     | Yes     |
> 
> 2) mp_type of value == MP_DOUBLE:
> 
> | opcode   | conditions     | iter   | value      | is_none |
> -------------------------------------------------------------
> | GE/GT    | d < 0          | GE     | 0          | No      |
> | LE/LT/EQ | d < 0          | -      | -          | Yes     |
> | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> 
> In the next table we assert that d >= 0 and d <= UINT64_MAX.

1. What is 'd'? Value to search by?

> 
> | opcode   | conditions  | iter   | value  | is_none |
> ----------------------------------------------------------
> | GT       | -           | GT     | (int)d     | No      |
> | GE       | d == (int)d | GE     | (int)d     | No      |
> | GE       | d != (int)d | GT     | (int)d     | No      |
> | EQ       | d == (int)d | EQ     | (int)d     | No      |
> | EQ       | d != (int)d | -      | -          | Yes     |
> | LT       | d == (int)d | LT     | (int)d     | No      |
> | LT       | d != (int)d | LE     | (int)d     | No      |
> | LE       | -           | LE     | (int)d     | No      |
> 
> II) Field type of index part == INTEGER:
> 
> In this case the only possible mp_type of value is MP_DOUBLE.

2. Why? INTEGER field type can store MP_UINT and MP_INT. So you
can search here by any number mp_type.

> | opcode   | conditions     | iter   | value      | is_none |
> -------------------------------------------------------------
> | GE/GT    | d < INT64_MIN  | GE     | INT64_MIN  | No      |
> | LE/LT/EQ | d < INT64_MIN  | -      | -          | Yes     |
> | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> 
> In the next table we assert that d >= INT64_MIN and
> d < 0. In case d >= 0 and d <= UINT64_MAX we may use table for
> UNSIGNED.
> 
> | opcode   | conditions  | iter   | value     | is_none |
> ---------------------------------------------------------
> | GT       | d == (int)d | GT     | (int)d    | No      |
> | GT       | d != (int)d | GE     | (int)d    | No      |
> | GE       | -           | GE     | (int)d    | No      |
> | EQ       | d == (int)d | EQ     | (int)d    | No      |
> | EQ       | d != (int)d | -      | -         | Yes     |
> | LT       | -           | LT     | (int)d    | No      |
> | LE       | d == (int)d | LE     | (int)d    | No      |
> | LE       | d != (int)d | LT     | (int)d    | No      |
> 
> III) Field type of index part == DOUBLE:
> 
> In this case, the rules are the same for MP_INT and MP_UINT. In
> addition, we exclude the case when the value is less than 2^53 and
> more than -2^53, since we can simply convert it to DOUBLE without
> precision loss.
> 
> In the table below, we assert that the value is INTEGER which is
> greater than 2^53 or less than -2^53.
> 
> | opcode   | conditions     | iter   | value     | is_none |
> ------------------------------------------------------------
> | GT       | (double)i > i  | GE     | (double)i | No      |
> | GT       | (double)i <= i | GT     | (double)i | No      |
> | GE       | (double)i >= i | GE     | (double)i | No      |
> | GE       | (double)i < i  | GT     | (double)i | No      |
> | EQ       | (double)i == i | EQ     | (double)i | No      |
> | EQ       | (double)i != i | -      | -         | Yes     |
> | LT       | (double)i < i  | LE     | (double)i | No      |
> | LT       | (double)i >= i | LT     | (double)i | No      |
> | LE       | (double)i <= i | LE     | (double)i | No      |
> | LE       | (double)i > i  | LT     | (double)i | No      |
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
  2020-03-10 13:23 [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL Mergen Imeev
  2020-03-12 21:37 ` Vladislav Shpilevoy
@ 2020-03-13 16:25 ` Peter Gulutzan
  2020-03-14  9:54   ` Mergen Imeev
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Gulutzan @ 2020-03-13 16:25 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

Hi,

On 2020-03-10 7:23 a.m., Mergen Imeev wrote:
 > Hi all,
 > I would like to discuss comparison between numbers using an index
 > in SQL. I admit that I was wrong when I said that we can compare
 > numbers without any explicit conversion.
 >

Do you mean that you will require explicit conversion
i.e. the CAST function when comparing numbers, sometimes?

 > To achieve the same result as if we would not use any index, I
 > propose the following rules.
 >
 > Column names in the tables below:
 > opcode - original operation ('>' == GT, etc.);
 > conditions - additional conditions on the value of the variable;
 > iter - iterator that will be used to search in Tarantool;
 > value - value that will be used to search in Tarantool;
 > is_none - flag, is_none == true if tuples will not be found.
 >

I'll assume that when you say "the value of the variable",
you mean the value in the index. So later, when you say
"d < 0", you mean "the value in the index is less than 0".

I'll assume that when you say "is_none == true if tuples will be found",
you do not mean that is_none == false implies that tuples
will be found, that is unknown.
And of course when you put "No" and "Yes" in the is_none column,
you mean False and True respectively.

I'll assume that when you put "-" in the value column, you mean
that value can be anything.

I'll assume that when you say "d < 0" in the conditions column
and "-" in the value column and "Yes" in the is_none column,
you do not mean "if the index contains any value that is less
than 0 then there will be no returned tuples", you mean
"if the value in the index is less than 0 then the tuple
associated with that value will not be returned (but for
another value in the index there might be a returned tuple)".

 > I) Field type of index part == UNSIGNED:
 >
 > 1) mp_type of value == MP_INT:
 >
 > All values with mp_type == MP_INT are less than 0 by definition.
 >
 > | opcode   | conditions | iter   | value | is_none |
 > --------------------------------------------------------
 > | GE/GT    | -          | GE     | 0     | No      |
 > | LE/LT/EQ | -          | -      | -     | Yes     |
 >

MessagePack definition
https://github.com/msgpack/msgpack/blob/master/spec.md#int-format-family
does not say that numbers in the "int" family are all less than 0.
So I'll assume that when you say MP_INT values are less than 0
"by definition", you do not mean MessagePack definition, you
mean that you will guarantee that any search values which are
greater than or equal to 0 will not be identified as MP_INT.

 > 2) mp_type of value == MP_DOUBLE:
 >
 > | opcode   | conditions     | iter   | value      | is_none |
 > -------------------------------------------------------------
 > | GE/GT    | d < 0          | GE     | 0          | No |
 > | LE/LT/EQ | d < 0          | -      | -          | Yes |
 > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes |
 > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No |
 >
<cut>

Because of the assumptions that I made about what you mean,
I think that these statements are true:
If index on column1 has -1: column1 >= 0.0 maybe will return tuples.
If index on column1 has -1: column1 <= 0.0 will not return tuples.
If index on column1 has UINT64_MAX+1: column1 > 0.0 will not return tuples.
If index on column1 has UINT64_MAX+1: column1 < 0.0 maybe will return 
tuples.

But those results are impossible or absurd.

Therefore some or all of my assumptions, about what you mean,
must be incorrect. Perhaps you can explain what I did wrong?

Peter Gulutzan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
  2020-03-12 21:37 ` Vladislav Shpilevoy
@ 2020-03-14  9:19   ` Mergen Imeev
  0 siblings, 0 replies; 6+ messages in thread
From: Mergen Imeev @ 2020-03-14  9:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-discussions

Hi! Thanks for answering. Sorry, I did not fully describe
everything that I should in the original letter.

At first, when I say "value of type INTEGER", I mean the integers
in [-2^63, 2^64-1]; when I say "value of type DOUBLE", I mean a
double precision floating point number (IEEE 754).

Rules for comparing two values that I plan to implement:
1) If both values are of the same type, they can be compared
without reduction.
2) If the type of the first value is contained in the type of the
second value, then they can be compared. The first value is cast
to the type of the second value. (For example, a value of type
UNSIGNED can be compared with a value of type INTEGER, the same
for UNSIGNED/INTEGER/ DOUBLE and NUMBER. In addition, "first" and
"second" here do not mean "left operand" or "right operand". We
must read it as "one" and "the other".)
3) If we compare the value d of type DOUBLE with the value i of
type INTEGER/UNSIGNED, then we will use these rules:
  a) If d <-2 ^ 63 or d> = 2 ^ 64, then we can say the result of
     the comparison without actual comparison.
  b) If -2 ^ 53 <= i <= 2 ^ 53, then we will convert i to the type
     DOUBLE.
  c) If -2 ^ 63 <= d <2 ^ 53 or 2 ^ 53 <d <= 2 ^ 64, then we cast
     d to INTEGER.

Variables that I use in tables: i is value of type INTEGER, d is
value of type DOUBLE.

The operation "(int)d" means that we are trying to convert
value d of type DOUBLE to a value of type INTEGER,
discarding the part after the floating point. If d >= 0,
than (int)d <= d, else (int)d >= d.

The operation "(double)i" means that we are trying to convert
value i of type INTEGER/UNSIGNED to a value of type DOUBLE.


On Thu, Mar 12, 2020 at 10:37:25PM +0100, Vladislav Shpilevoy wrote:
> Hi!
> 
> Nice table. Perhaps it is worth documenting it somewhere on the
> site, or on wiki, or in an RFC. To persist it somehow, and send
> users to there, when they ask something about cross-type comparison.
> 
> On 10/03/2020 14:23, Mergen Imeev wrote:
> > Hi all,
> > I would like to discuss comparison between numbers using an index
> > in SQL. I admit that I was wrong when I said that we can compare
> > numbers without any explicit conversion.
> 
> I don't think I fully understand what it means to compare
> numbers using an index.
> 
I mean: "when using comparison operators, we should get the
same result when searching by index, as in the case of a
fullscan."

> See 2 comments below.
> 
> > To achieve the same result as if we would not use any index, I
> > propose the following rules.
> > 
> > Column names in the tables below:
> > opcode - original operation ('>' == GT, etc.);
> > conditions - additional conditions on the value of the variable;
> > iter - iterator that will be used to search in Tarantool;
> > value - value that will be used to search in Tarantool;
> > is_none - flag, is_none == true if tuples will not be found.
> > 
> > I) Field type of index part == UNSIGNED:
> > 
> > 1) mp_type of value == MP_INT:
> > 
> > All values with mp_type == MP_INT are less than 0 by definition.
> > 
> > | opcode   | conditions | iter   | value | is_none |
> > --------------------------------------------------------
> > | GE/GT    | -          | GE     | 0     | No      |
> > | LE/LT/EQ | -          | -      | -     | Yes     |
> > 
> > 2) mp_type of value == MP_DOUBLE:
> > 
> > | opcode   | conditions     | iter   | value      | is_none |
> > -------------------------------------------------------------
> > | GE/GT    | d < 0          | GE     | 0          | No      |
> > | LE/LT/EQ | d < 0          | -      | -          | Yes     |
> > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> > 
> > In the next table we assert that d >= 0 and d <= UINT64_MAX.
> 
> 1. What is 'd'? Value to search by?
> 
Yes. Sorry, forgot to mention that previously.

> > 
> > | opcode   | conditions  | iter   | value  | is_none |
> > ----------------------------------------------------------
> > | GT       | -           | GT     | (int)d     | No      |
> > | GE       | d == (int)d | GE     | (int)d     | No      |
> > | GE       | d != (int)d | GT     | (int)d     | No      |
> > | EQ       | d == (int)d | EQ     | (int)d     | No      |
> > | EQ       | d != (int)d | -      | -          | Yes     |
> > | LT       | d == (int)d | LT     | (int)d     | No      |
> > | LT       | d != (int)d | LE     | (int)d     | No      |
> > | LE       | -           | LE     | (int)d     | No      |
> > 
> > II) Field type of index part == INTEGER:
> > 
> > In this case the only possible mp_type of value is MP_DOUBLE.
> 
> 2. Why? INTEGER field type can store MP_UINT and MP_INT. So you
> can search here by any number mp_type.
> 
True, but these tables should be used only in cases where
we cannot use given value to search by index.


> > | opcode   | conditions     | iter   | value      | is_none |
> > -------------------------------------------------------------
> > | GE/GT    | d < INT64_MIN  | GE     | INT64_MIN  | No      |
> > | LE/LT/EQ | d < INT64_MIN  | -      | -          | Yes     |
> > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes     |
> > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No      |
> > 
> > In the next table we assert that d >= INT64_MIN and
> > d < 0. In case d >= 0 and d <= UINT64_MAX we may use table for
> > UNSIGNED.
> > 
> > | opcode   | conditions  | iter   | value     | is_none |
> > ---------------------------------------------------------
> > | GT       | d == (int)d | GT     | (int)d    | No      |
> > | GT       | d != (int)d | GE     | (int)d    | No      |
> > | GE       | -           | GE     | (int)d    | No      |
> > | EQ       | d == (int)d | EQ     | (int)d    | No      |
> > | EQ       | d != (int)d | -      | -         | Yes     |
> > | LT       | -           | LT     | (int)d    | No      |
> > | LE       | d == (int)d | LE     | (int)d    | No      |
> > | LE       | d != (int)d | LT     | (int)d    | No      |
> > 
> > III) Field type of index part == DOUBLE:
> > 
> > In this case, the rules are the same for MP_INT and MP_UINT. In
> > addition, we exclude the case when the value is less than 2^53 and
> > more than -2^53, since we can simply convert it to DOUBLE without
> > precision loss.
> > 
> > In the table below, we assert that the value is INTEGER which is
> > greater than 2^53 or less than -2^53.
> > 
> > | opcode   | conditions     | iter   | value     | is_none |
> > ------------------------------------------------------------
> > | GT       | (double)i > i  | GE     | (double)i | No      |
> > | GT       | (double)i <= i | GT     | (double)i | No      |
> > | GE       | (double)i >= i | GE     | (double)i | No      |
> > | GE       | (double)i < i  | GT     | (double)i | No      |
> > | EQ       | (double)i == i | EQ     | (double)i | No      |
> > | EQ       | (double)i != i | -      | -         | Yes     |
> > | LT       | (double)i < i  | LE     | (double)i | No      |
> > | LT       | (double)i >= i | LT     | (double)i | No      |
> > | LE       | (double)i <= i | LE     | (double)i | No      |
> > | LE       | (double)i > i  | LT     | (double)i | No      |
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
  2020-03-13 16:25 ` Peter Gulutzan
@ 2020-03-14  9:54   ` Mergen Imeev
  2020-03-14 14:44     ` Peter Gulutzan
  0 siblings, 1 reply; 6+ messages in thread
From: Mergen Imeev @ 2020-03-14  9:54 UTC (permalink / raw)
  To: Peter Gulutzan; +Cc: tarantool-discussions

Hi! Thanks for answering. Sorry, I did not fully describe
everything that I should in the original letter.

At first, when I say "value of type INTEGER", I mean the integers
in [-2^63, 2^64-1]; when I say "value of type DOUBLE", I mean a
double precision floating point number (IEEE 754).

Rules for comparing two numeric values that I plan to implement:
1) If both values are of the same type, they can be compared
without reduction.
2) If the type of the first value is contained in the type of the
second value, then they can be compared. The first value is cast
to the type of the second value. (For example, a value of type
UNSIGNED can be compared with a value of type INTEGER, the same
for UNSIGNED/INTEGER/ DOUBLE and NUMBER. In addition, "first" and
"second" here do not mean "left operand" or "right operand". We
must read it as "one" and "the other".)
3) If we compare the value d of type DOUBLE with the value i of
type INTEGER/UNSIGNED, then we will use these rules:
  a) If d <-2 ^ 63 or d> = 2 ^ 64, then we can say the result of
     the comparison without actual comparison.
  b) If -2 ^ 53 <= i <= 2 ^ 53, then we will convert i to the type
     DOUBLE.
  c) If -2 ^ 63 <= d <2 ^ 53 or 2 ^ 53 <d <= 2 ^ 64, then we cast
     d to INTEGER.

Variables that I use in tables: i is value of type INTEGER, d is
value of type DOUBLE.

The operation "(int)d" means that we are trying to convert
value d of type DOUBLE to a value of type INTEGER,
discarding the part after the floating point. If d >= 0,
than (int)d <= d, else (int)d >= d.

The operation "(double)i" means that we are trying to convert
value i of type INTEGER/UNSIGNED to a value of type DOUBLE.


On Fri, Mar 13, 2020 at 10:25:26AM -0600, Peter Gulutzan wrote:
> Hi,
> 
> On 2020-03-10 7:23 a.m., Mergen Imeev wrote:
> > Hi all,
> > I would like to discuss comparison between numbers using an index
> > in SQL. I admit that I was wrong when I said that we can compare
> > numbers without any explicit conversion.
> >
> 
> Do you mean that you will require explicit conversion
> i.e. the CAST function when comparing numbers, sometimes?
> 
No, I mean that before we pass the value to the Tarantool
iterator, we must implicitly cast it to the appropriate
type.

> > To achieve the same result as if we would not use any index, I
> > propose the following rules.
> >
> > Column names in the tables below:
> > opcode - original operation ('>' == GT, etc.);
> > conditions - additional conditions on the value of the variable;
> > iter - iterator that will be used to search in Tarantool;
> > value - value that will be used to search in Tarantool;
> > is_none - flag, is_none == true if tuples will not be found.
> >
> 
> I'll assume that when you say "the value of the variable",
> you mean the value in the index. So later, when you say
> "d < 0", you mean "the value in the index is less than 0".
No, I mean the temporary variables I use in these tables.
There is two of them: d of type DOUBLE and i of type
INTEGER.

> 
> I'll assume that when you say "is_none == true if tuples will be found",
Actually, I wrote "is_none == true if tuples will not be
found", but I think you meant the same.

> you do not mean that is_none == false implies that tuples
> will be found, that is unknown.
I do not think so. For example, if we search values less
than -10 in field of type UNSIGNED, we won't find
anything. But, it doesn't mean, that we will find
something in case we search value more or equal than -10
in the same field. The table may be empty.

> And of course when you put "No" and "Yes" in the is_none column,
> you mean False and True respectively.
True.

> 
> I'll assume that when you put "-" in the value column, you mean
> that value can be anything.
Yes. However, the conditions before the table should be
applied.

> 
> I'll assume that when you say "d < 0" in the conditions column
> and "-" in the value column and "Yes" in the is_none column,
> you do not mean "if the index contains any value that is less
> than 0 then there will be no returned tuples", you mean
> "if the value in the index is less than 0 then the tuple
> associated with that value will not be returned (but for
> another value in the index there might be a returned tuple)".
Let me just say what I meant: "If we looking for values in
field of type UNSIGNED that are LE/LT/EQ value d of type
DOUBLE than we will find nothing". For example:
CREATE TABLE t(i UNSIGNED PRIMARY KEY);
... -- all insertions are here.
SELECT i FROM t WHERE i < -3.5;
Variable d is -3.5 here.

> 
> > I) Field type of index part == UNSIGNED:
> >
> > 1) mp_type of value == MP_INT:
> >
> > All values with mp_type == MP_INT are less than 0 by definition.
> >
> > | opcode   | conditions | iter   | value | is_none |
> > --------------------------------------------------------
> > | GE/GT    | -          | GE     | 0     | No      |
> > | LE/LT/EQ | -          | -      | -     | Yes     |
> >
> 
> MessagePack definition
> https://github.com/msgpack/msgpack/blob/master/spec.md#int-format-family
> does not say that numbers in the "int" family are all less than 0.
> So I'll assume that when you say MP_INT values are less than 0
> "by definition", you do not mean MessagePack definition, you
> mean that you will guarantee that any search values which are
> greater than or equal to 0 will not be identified as MP_INT.
> 
It is more like that "in our implementation, all integers >= 0
have mp_type MP_UINT, and all integers < 0 have mp_type MP_INT".
Also, about field_types and mp_types:
FIELD_TYPE_UNSIGNED = {MP_UINT}
FIELD_TYPE_INTEGER = {MP_INT, MP_UINT}
FIELD_TYPE_DOUBLE = {MP_DOUBLE}
FIELD_TYPE_NUMBER = {MP_INT, MP_UINT, MP_DOUBLE, MP_FLOAT}

MP_FLOAT isn't widely used right now.

From this we can see, that FIELD_TYPE_INTEGER contains
FIELD_TYPE_UNSIGNED. Also, FIELD_TYPE_NUMBER contains all
other numeric field_types.

> > 2) mp_type of value == MP_DOUBLE:
> >
> > | opcode   | conditions     | iter   | value      | is_none |
> > -------------------------------------------------------------
> > | GE/GT    | d < 0          | GE     | 0          | No |
> > | LE/LT/EQ | d < 0          | -      | -          | Yes |
> > | GE/GT/EQ | d > UINT64_MAX | -      | -          | Yes |
> > | LE/LT    | d > UINT64_MAX | LE     | UINT64_MAX | No |
> >
> <cut>
> 
> Because of the assumptions that I made about what you mean,
> I think that these statements are true:
> If index on column1 has -1: column1 >= 0.0 maybe will return tuples.
> If index on column1 has -1: column1 <= 0.0 will not return tuples.
> If index on column1 has UINT64_MAX+1: column1 > 0.0 will not return tuples.
> If index on column1 has UINT64_MAX+1: column1 < 0.0 maybe will return
> tuples.
> 
> But those results are impossible or absurd.
> 
> Therefore some or all of my assumptions, about what you mean,
> must be incorrect. Perhaps you can explain what I did wrong?
> 
Have I resolved all the misunderstandings mentioned?

> Peter Gulutzan
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL.
  2020-03-14  9:54   ` Mergen Imeev
@ 2020-03-14 14:44     ` Peter Gulutzan
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Gulutzan @ 2020-03-14 14:44 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

Hi,


On 2020-03-14 3:54 a.m., Mergen Imeev wrote:
<cut>
 > Have I resolved all the misunderstandings mentioned?

Yes. Thank you very much.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-03-14 14:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-10 13:23 [Tarantool-discussions] Rules for comparing numbers using Tarantool indexes in SQL Mergen Imeev
2020-03-12 21:37 ` Vladislav Shpilevoy
2020-03-14  9:19   ` Mergen Imeev
2020-03-13 16:25 ` Peter Gulutzan
2020-03-14  9:54   ` Mergen Imeev
2020-03-14 14:44     ` Peter Gulutzan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox