[PATCH 5/5] decimal: allow to index decimals
Vladimir Davydov
vdavydov.dev at gmail.com
Wed Jul 24 19:56:16 MSK 2019
On Wed, Jul 17, 2019 at 06:33:46PM +0300, Serge Petrenko wrote:
> Indices can now be built over decimal fields.
> A new field type - 'decimal' is introduced.
> Decimal values may be stored in 'decimal' columns, as well as in
> 'scalar' and 'any' columns.
You can remove this text as it reiterates the doc bot request below.
>
> Closes #4333
>
> @TarantoolBot document
> Title: Document decimal field type.
>
> Decimals may now be stored in spaces. A corresponding field type is
> introduced: 'decimal'. Decimal values are also allowed in 'scalar' and
> 'number' fields.
>
> 'decimal' field type is appropriate for both memtx HASH and TREE
> indices, as well as for vinyl TREE index.
Please describe how decimals are ordered in respect to other numeric
types. Give an example of creating and using an index over a decimal
field.
Can one use decimal in space format? If so, please mention it too
and add some tests.
What about update operations (index.update, tuple.update). Do they
support decimals? I assume not for now. This should be addressed
(a separate patch is okay).
> ---
> src/box/field_def.c | 43 +++++--
> src/box/field_def.h | 16 ++-
> src/box/key_def.h | 2 +-
> src/box/tuple_compare.cc | 160 ++++++++++++++++++++++++-
> src/box/tuple_format.c | 2 +-
> src/lib/core/decimal.h | 8 ++
> src/lib/core/mp_decimal.h | 8 ++
> test/engine/decimal.result | 226 +++++++++++++++++++++++++++++++++++
> test/engine/decimal.test.lua | 65 ++++++++++
> 9 files changed, 510 insertions(+), 20 deletions(-)
> create mode 100644 test/engine/decimal.result
> create mode 100644 test/engine/decimal.test.lua
>
> diff --git a/src/box/field_def.c b/src/box/field_def.c
> index 346042b98..da06e6bde 100644
> --- a/src/box/field_def.c
> +++ b/src/box/field_def.c
> @@ -52,17 +52,32 @@ const uint32_t field_mp_type[] = {
> /* [FIELD_TYPE_UNSIGNED] = */ 1U << MP_UINT,
> /* [FIELD_TYPE_STRING] = */ 1U << MP_STR,
> /* [FIELD_TYPE_NUMBER] = */ (1U << MP_UINT) | (1U << MP_INT) |
> - (1U << MP_FLOAT) | (1U << MP_DOUBLE),
> + (1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_EXT),
> /* [FIELD_TYPE_INTEGER] = */ (1U << MP_UINT) | (1U << MP_INT),
> /* [FIELD_TYPE_BOOLEAN] = */ 1U << MP_BOOL,
> /* [FIELD_TYPE_VARBINARY] = */ 1U << MP_BIN,
> /* [FIELD_TYPE_SCALAR] = */ (1U << MP_UINT) | (1U << MP_INT) |
> (1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_STR) |
> - (1U << MP_BIN) | (1U << MP_BOOL),
> + (1U << MP_BIN) | (1U << MP_BOOL) | (1U << MP_EXT),
> + /* [FIELD_TYPE_DECIMAL] = */ 1U << MP_EXT,
> /* [FIELD_TYPE_ARRAY] = */ 1U << MP_ARRAY,
> /* [FIELD_TYPE_MAP] = */ (1U << MP_MAP),
> };
>
> +const uint32_t field_ext_type[] = {
field_mp_ext_type
> + /* [FIELD_TYPE_ANY] = */ UINT32_MAX & ~(1U << MP_UNKNOWN),
Can one insert MP_EXT other than decimal into a space? Do we need to
support that?
> + /* [FIELD_TYPE_UNSIGNED] = */ 0,
> + /* [FIELD_TYPE_STRING] = */ 0,
> + /* [FIELD_TYPE_NUMBER] = */ 1U << MP_DECIMAL,
> + /* [FIELD_TYPE_INTEGER] = */ 0,
> + /* [FIELD_TYPE_BOOLEAN] = */ 0,
> + /* [FIELD_TYPE_VARBINARY] = */ 0,
> + /* [FIELD_TYPE_SCALAR] = */ 1U << MP_DECIMAL,
> + /* [FIELD_TYPE_DECIMAL] = */ 1U << MP_DECIMAL,
> + /* [FIELD_TYPE_ARRAY] = */ 0,
> + /* [FIELD_TYPE_MAP] = */ 0,
> +};
> +
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index f4a1a8fd1..ed3354ade 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -505,7 +505,7 @@ static inline int
> key_part_validate(enum field_type key_type, const char *key,
> uint32_t field_no, bool is_nullable)
> {
> - if (unlikely(!field_mp_type_is_compatible(key_type, mp_typeof(*key),
> + if (unlikely(!field_mp_type_is_compatible(key_type, key,
> is_nullable))) {
> diag_set(ClientError, ER_KEY_PART_TYPE, field_no,
> field_type_strs[key_type]);
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index 95a0f58c9..6ab28f662 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -33,6 +33,9 @@
> #include "coll/coll.h"
> #include "trivia/util.h" /* NOINLINE */
> #include <math.h>
> +#include "lib/core/decimal.h"
> +#include "lib/core/mp_decimal.h"
> +#include "lib/core/mp_user_types.h"
>
> /* {{{ tuple_compare */
>
> @@ -87,7 +90,7 @@ static enum mp_class mp_classes[] = {
> /* .MP_BOOL = */ MP_CLASS_BOOL,
> /* .MP_FLOAT = */ MP_CLASS_NUMBER,
> /* .MP_DOUBLE = */ MP_CLASS_NUMBER,
> - /* .MP_BIN = */ MP_CLASS_BIN
> + /* .MP_EXT = */ MP_CLASS_NUMBER /* requires additional parsing */
This, as well as the use of MP_EXT below, looks confusing: you imply
that MP_EXT is always a number, but what happens when we add TIMESTAMP
or UUID ext type?
I would instead add a new enum for all comparable types (bool, float,
decimal, etc) and a helper that would determine the type by msgpack
data. This way all switch-cases below would be flat and the code would
be generally easier for understanding IMO. Please consider this. This
could be done in a separate patch.
> };
>
> #define COMPARE_RESULT(a, b) (a < b ? -1 : a > b)
> @@ -264,6 +267,50 @@ mp_compare_double_any_number(double lhs, const char *rhs,
> return k * COMPARE_RESULT(lqbit, rqbit);
> }
>
> +static int
> +mp_compare_decimal_any_number(decimal_t *lhs, const char *rhs,
> + enum mp_type rhs_type, int k)
> +{
> + decimal_t rhs_dec;
> + switch (rhs_type) {
> + case MP_FLOAT:
> + {
> + double d = mp_decode_float(&rhs);
> + decimal_from_double(&rhs_dec, d);
> + break;
> + }
> + case MP_DOUBLE:
> + {
> + double d = mp_decode_double(&rhs);
> + decimal_from_double(&rhs_dec, d);
> + break;
> + }
> + case MP_INT:
> + {
> + int64_t num = mp_decode_int(&rhs);
> + decimal_from_int64(&rhs_dec, num);
> + break;
> + }
> + case MP_UINT:
> + {
> + uint64_t num = mp_decode_uint(&rhs);
> + decimal_from_uint64(&rhs_dec, num);
> + break;
> + }
> + case MP_EXT:
> + {
> + int8_t ext_type;
> + uint32_t len = mp_decode_extl(&rhs, &ext_type);
> + assert(ext_type == MP_DECIMAL);
> + decimal_unpack(&rhs, len, &rhs_dec);
> + break;
> + }
> + default:
> + unreachable();
> + }
> + return k * decimal_compare(lhs, &rhs_dec);
> +}
> +
> static int
> mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type,
> const char *rhs, enum mp_type rhs_type)
> @@ -271,6 +318,21 @@ mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type,
> assert(mp_classof(lhs_type) == MP_CLASS_NUMBER);
> assert(mp_classof(rhs_type) == MP_CLASS_NUMBER);
>
> + /*
> + * test decimals first, so that we don't have to
> + * account for them in other comarators.
> + */
> + decimal_t dec;
> + if (rhs_type == MP_EXT) {
> + return mp_compare_decimal_any_number(
> + mp_decode_decimal(&rhs, &dec), lhs, lhs_type, -1
> + );
> + }
> + if (lhs_type == MP_EXT) {
> + return mp_compare_decimal_any_number(
> + mp_decode_decimal(&lhs, &dec), rhs, rhs_type, 1
> + );
> + }
> if (rhs_type == MP_FLOAT) {
> return mp_compare_double_any_number(
> mp_decode_float(&rhs), lhs, lhs_type, -1
> @@ -410,6 +472,8 @@ tuple_compare_field(const char *field_a, const char *field_b,
> return coll != NULL ?
> mp_compare_scalar_coll(field_a, field_b, coll) :
> mp_compare_scalar(field_a, field_b);
> + case FIELD_TYPE_DECIMAL:
> + return mp_compare_number(field_a, field_b);
> default:
> unreachable();
> return 0;
> @@ -443,6 +507,8 @@ tuple_compare_field_with_type(const char *field_a, enum mp_type a_type,
> mp_compare_scalar_coll(field_a, field_b, coll) :
> mp_compare_scalar_with_type(field_a, a_type,
> field_b, b_type);
> + case FIELD_TYPE_DECIMAL:
> + return mp_compare_number(field_a, field_b);
> default:
> unreachable();
> return 0;
> @@ -1356,6 +1422,25 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
> #define HINT_VALUE_DOUBLE_MAX (exp2(HINT_VALUE_BITS - 1) - 1)
> #define HINT_VALUE_DOUBLE_MIN (-exp2(HINT_VALUE_BITS - 1))
>
> +/**
> + * Max and min decimal numbers whose integral parts fit
> + * in a hint value.
> + */
> +static const decimal_t HINT_VALUE_DECIMAL_MAX = {
> + 18, /* decimal digits */
> + 0, /* exponent */
> + 0, /* no special bits. */
> + {487, 423, 303, 752, 460, 576} /* 576,460,752,303,423,488 = 2^59 - 1 */
> + /* 59 == HINT_VALUE_BITS - 1 */
> +};
> +
> +static const decimal_t HINT_VALUE_DECIMAL_MIN = {
> + 18, /* decimal digits */
> + 0, /* exponent */
> + 0x80, /* negative bit */
> + {488, 423, 303, 752, 460, 576} /* 576,460,752,303,423,488 = 2^59 */
> +};
> +
This looks fragile. Suppose we remove one bit from a hint, how should
one update those? Please define these constants using decimal_from_int64
or, even better, get rid of them altogether (see below).
> /*
> * HINT_CLASS_BITS should be big enough to store any mp_class value.
> * Note, ((1 << HINT_CLASS_BITS) - 1) is reserved for HINT_NONE.
> @@ -1415,6 +1500,25 @@ hint_double(double d)
> return hint_create(MP_CLASS_NUMBER, val);
> }
>
> +static inline hint_t
> +hint_decimal(decimal_t *dec)
> +{
> + uint64_t val = 0;
> + int64_t num;
> + if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MAX) >= 0)
> + val = HINT_VALUE_MAX;
> + else if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MIN) <= 0)
> + val = 0;
> + else {
> + dec = decimal_to_int64(dec, &num);
> + /* We've checked boundaries above. */
> + assert(dec != NULL);
> + val = num - HINT_VALUE_INT_MIN;
> + }
Why not call decimal_to_int64 first and then calculate a hint from the
returned number? I mean
if (decimal_to_int64(dec, &num) &&
num >= HINT_VALUE_INT_MIN && num <= HINT_VALUE_INT_MAX)
hint = num;
else
...
> +
> + return hint_create(MP_CLASS_NUMBER, val);
> +}
> +
> static inline uint64_t
> hint_str_raw(const char *s, uint32_t len)
> {
> @@ -1491,12 +1595,42 @@ field_hint_number(const char *field)
> return hint_double(mp_decode_float(&field));
> case MP_DOUBLE:
> return hint_double(mp_decode_double(&field));
> + case MP_EXT:
> + {
> + int8_t ext_type;
> + uint32_t len = mp_decode_extl(&field, &ext_type);
> + switch(ext_type) {
> + case MP_DECIMAL:
> + {
> + decimal_t dec;
> + decimal_t *res;
> + /*
> + * The effect of mp_decode_extl() +
> + * decimal_unpack() is the same that
> + * the one of mp_decode_decimal().
> + */
> + res = decimal_unpack(&field, len, &dec);
> + assert(res != NULL);
> + return hint_decimal(res);
> + }
> + default:
> + unreachable();
> + }
> + }
> default:
> unreachable();
> }
> return HINT_NONE;
> }
>
> +static inline hint_t
> +field_hint_decimal(const char *field)
> +{
> + assert(mp_typeof(*field) == MP_EXT);
> + decimal_t dec;
> + return hint_decimal(mp_decode_decimal(&field, &dec));
> +}
> +
> static inline hint_t
> field_hint_string(const char *field, struct coll *coll)
> {
> @@ -1536,6 +1670,25 @@ field_hint_scalar(const char *field, struct coll *coll)
> case MP_BIN:
> len = mp_decode_binl(&field);
> return hint_bin(field, len);
> + case MP_EXT:
> + {
> + int8_t ext_type;
> + uint32_t len = mp_decode_extl(&field, &ext_type);
> + switch(ext_type) {
> + case MP_DECIMAL:
> + {
> + decimal_t dec;
> + /*
> + * The effect of mp_decode_extl() +
> + * decimal_unpack() is the same that
> + * the one of mp_decode_decimal().
> + */
> + return hint_decimal(decimal_unpack(&field, len, &dec));
> + }
Please try to eliminate code duplication.
> diff --git a/test/engine/decimal.test.lua b/test/engine/decimal.test.lua
> new file mode 100644
> index 000000000..52a300e72
> --- /dev/null
> +++ b/test/engine/decimal.test.lua
> @@ -0,0 +1,65 @@
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +decimal = require('decimal')
> +ffi = require('ffi')
> +
> +_ = box.schema.space.create('test', {engine=engine})
> +_ = box.space.test:create_index('pk', {parts={1,'decimal'}})
> +
> +test_run:cmd('setopt delimiter ";"')
> +for i = 0,16 do
> + box.space.test:insert{decimal.new((i-8)/4)}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +box.space.test:select{}
> +
> +-- check invalid values
> +box.space.test:insert{1.23}
> +box.space.test:insert{'str'}
> +box.space.test:insert{ffi.new('uint64_t', 0)}
> +-- check duplicates
> +box.space.test:insert{decimal.new(0)}
> +
> +box.space.test.index.pk:drop()
> +
> +_ = box.space.test:create_index('pk', {parts={1, 'number'}})
> +
> +test_run:cmd('setopt delimiter ";"')
> +for i = 0, 32 do
> + local val = (i - 16) / 8
> + if i % 2 == 1 then val = decimal.new(val) end
> + box.space.test:insert{val}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +box.space.test:select{}
> +
> +-- check duplicates
> +box.space.test:insert{-2}
> +box.space.test:insert{decimal.new(-2)}
> +box.space.test:insert{decimal.new(-1.875)}
> +box.space.test:insert{-1.875}
> +
> +box.space.test.index.pk:drop()
> +
> +_ = box.space.test:create_index('pk')
> +test_run:cmd('setopt delimiter ";"')
> +for i = 1,10 do
> + box.space.test:insert{i, decimal.new(i/10)}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +-- a bigger test with a secondary index this time.
> +box.space.test:insert{11, 'str'}
> +box.space.test:insert{12, 0.63}
> +box.space.test:insert{13, 0.57}
> +box.space.test:insert{14, 0.33}
> +box.space.test:insert{16, 0.71}
> +
> +_ = box.space.test:create_index('sk', {parts={2, 'scalar'}})
> +box.space.test.index.sk:select{}
> +
> +box.space.test:drop()
Please add a test for space.format and index.alter.
More information about the Tarantool-patches
mailing list