From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 24 Jul 2019 19:56:16 +0300 From: Vladimir Davydov Subject: Re: [PATCH 5/5] decimal: allow to index decimals Message-ID: <20190724165616.GC24631@esperanza> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: To: Serge Petrenko Cc: tarantool-patches@freelists.org List-ID: On Wed, Jul 17, 2019 at 06:33:46PM +0300, Serge Petrenko wrote: > Indices can now be built over decimal fields. > A new field type - 'decimal' is introduced. > Decimal values may be stored in 'decimal' columns, as well as in > 'scalar' and 'any' columns. You can remove this text as it reiterates the doc bot request below. > > Closes #4333 > > @TarantoolBot document > Title: Document decimal field type. > > Decimals may now be stored in spaces. A corresponding field type is > introduced: 'decimal'. Decimal values are also allowed in 'scalar' and > 'number' fields. > > 'decimal' field type is appropriate for both memtx HASH and TREE > indices, as well as for vinyl TREE index. Please describe how decimals are ordered in respect to other numeric types. Give an example of creating and using an index over a decimal field. Can one use decimal in space format? If so, please mention it too and add some tests. What about update operations (index.update, tuple.update). Do they support decimals? I assume not for now. This should be addressed (a separate patch is okay). > --- > src/box/field_def.c | 43 +++++-- > src/box/field_def.h | 16 ++- > src/box/key_def.h | 2 +- > src/box/tuple_compare.cc | 160 ++++++++++++++++++++++++- > src/box/tuple_format.c | 2 +- > src/lib/core/decimal.h | 8 ++ > src/lib/core/mp_decimal.h | 8 ++ > test/engine/decimal.result | 226 +++++++++++++++++++++++++++++++++++ > test/engine/decimal.test.lua | 65 ++++++++++ > 9 files changed, 510 insertions(+), 20 deletions(-) > create mode 100644 test/engine/decimal.result > create mode 100644 test/engine/decimal.test.lua > > diff --git a/src/box/field_def.c b/src/box/field_def.c > index 346042b98..da06e6bde 100644 > --- a/src/box/field_def.c > +++ b/src/box/field_def.c > @@ -52,17 +52,32 @@ const uint32_t field_mp_type[] = { > /* [FIELD_TYPE_UNSIGNED] = */ 1U << MP_UINT, > /* [FIELD_TYPE_STRING] = */ 1U << MP_STR, > /* [FIELD_TYPE_NUMBER] = */ (1U << MP_UINT) | (1U << MP_INT) | > - (1U << MP_FLOAT) | (1U << MP_DOUBLE), > + (1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_EXT), > /* [FIELD_TYPE_INTEGER] = */ (1U << MP_UINT) | (1U << MP_INT), > /* [FIELD_TYPE_BOOLEAN] = */ 1U << MP_BOOL, > /* [FIELD_TYPE_VARBINARY] = */ 1U << MP_BIN, > /* [FIELD_TYPE_SCALAR] = */ (1U << MP_UINT) | (1U << MP_INT) | > (1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_STR) | > - (1U << MP_BIN) | (1U << MP_BOOL), > + (1U << MP_BIN) | (1U << MP_BOOL) | (1U << MP_EXT), > + /* [FIELD_TYPE_DECIMAL] = */ 1U << MP_EXT, > /* [FIELD_TYPE_ARRAY] = */ 1U << MP_ARRAY, > /* [FIELD_TYPE_MAP] = */ (1U << MP_MAP), > }; > > +const uint32_t field_ext_type[] = { field_mp_ext_type > + /* [FIELD_TYPE_ANY] = */ UINT32_MAX & ~(1U << MP_UNKNOWN), Can one insert MP_EXT other than decimal into a space? Do we need to support that? > + /* [FIELD_TYPE_UNSIGNED] = */ 0, > + /* [FIELD_TYPE_STRING] = */ 0, > + /* [FIELD_TYPE_NUMBER] = */ 1U << MP_DECIMAL, > + /* [FIELD_TYPE_INTEGER] = */ 0, > + /* [FIELD_TYPE_BOOLEAN] = */ 0, > + /* [FIELD_TYPE_VARBINARY] = */ 0, > + /* [FIELD_TYPE_SCALAR] = */ 1U << MP_DECIMAL, > + /* [FIELD_TYPE_DECIMAL] = */ 1U << MP_DECIMAL, > + /* [FIELD_TYPE_ARRAY] = */ 0, > + /* [FIELD_TYPE_MAP] = */ 0, > +}; > + > diff --git a/src/box/key_def.h b/src/box/key_def.h > index f4a1a8fd1..ed3354ade 100644 > --- a/src/box/key_def.h > +++ b/src/box/key_def.h > @@ -505,7 +505,7 @@ static inline int > key_part_validate(enum field_type key_type, const char *key, > uint32_t field_no, bool is_nullable) > { > - if (unlikely(!field_mp_type_is_compatible(key_type, mp_typeof(*key), > + if (unlikely(!field_mp_type_is_compatible(key_type, key, > is_nullable))) { > diag_set(ClientError, ER_KEY_PART_TYPE, field_no, > field_type_strs[key_type]); > diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc > index 95a0f58c9..6ab28f662 100644 > --- a/src/box/tuple_compare.cc > +++ b/src/box/tuple_compare.cc > @@ -33,6 +33,9 @@ > #include "coll/coll.h" > #include "trivia/util.h" /* NOINLINE */ > #include > +#include "lib/core/decimal.h" > +#include "lib/core/mp_decimal.h" > +#include "lib/core/mp_user_types.h" > > /* {{{ tuple_compare */ > > @@ -87,7 +90,7 @@ static enum mp_class mp_classes[] = { > /* .MP_BOOL = */ MP_CLASS_BOOL, > /* .MP_FLOAT = */ MP_CLASS_NUMBER, > /* .MP_DOUBLE = */ MP_CLASS_NUMBER, > - /* .MP_BIN = */ MP_CLASS_BIN > + /* .MP_EXT = */ MP_CLASS_NUMBER /* requires additional parsing */ This, as well as the use of MP_EXT below, looks confusing: you imply that MP_EXT is always a number, but what happens when we add TIMESTAMP or UUID ext type? I would instead add a new enum for all comparable types (bool, float, decimal, etc) and a helper that would determine the type by msgpack data. This way all switch-cases below would be flat and the code would be generally easier for understanding IMO. Please consider this. This could be done in a separate patch. > }; > > #define COMPARE_RESULT(a, b) (a < b ? -1 : a > b) > @@ -264,6 +267,50 @@ mp_compare_double_any_number(double lhs, const char *rhs, > return k * COMPARE_RESULT(lqbit, rqbit); > } > > +static int > +mp_compare_decimal_any_number(decimal_t *lhs, const char *rhs, > + enum mp_type rhs_type, int k) > +{ > + decimal_t rhs_dec; > + switch (rhs_type) { > + case MP_FLOAT: > + { > + double d = mp_decode_float(&rhs); > + decimal_from_double(&rhs_dec, d); > + break; > + } > + case MP_DOUBLE: > + { > + double d = mp_decode_double(&rhs); > + decimal_from_double(&rhs_dec, d); > + break; > + } > + case MP_INT: > + { > + int64_t num = mp_decode_int(&rhs); > + decimal_from_int64(&rhs_dec, num); > + break; > + } > + case MP_UINT: > + { > + uint64_t num = mp_decode_uint(&rhs); > + decimal_from_uint64(&rhs_dec, num); > + break; > + } > + case MP_EXT: > + { > + int8_t ext_type; > + uint32_t len = mp_decode_extl(&rhs, &ext_type); > + assert(ext_type == MP_DECIMAL); > + decimal_unpack(&rhs, len, &rhs_dec); > + break; > + } > + default: > + unreachable(); > + } > + return k * decimal_compare(lhs, &rhs_dec); > +} > + > static int > mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type, > const char *rhs, enum mp_type rhs_type) > @@ -271,6 +318,21 @@ mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type, > assert(mp_classof(lhs_type) == MP_CLASS_NUMBER); > assert(mp_classof(rhs_type) == MP_CLASS_NUMBER); > > + /* > + * test decimals first, so that we don't have to > + * account for them in other comarators. > + */ > + decimal_t dec; > + if (rhs_type == MP_EXT) { > + return mp_compare_decimal_any_number( > + mp_decode_decimal(&rhs, &dec), lhs, lhs_type, -1 > + ); > + } > + if (lhs_type == MP_EXT) { > + return mp_compare_decimal_any_number( > + mp_decode_decimal(&lhs, &dec), rhs, rhs_type, 1 > + ); > + } > if (rhs_type == MP_FLOAT) { > return mp_compare_double_any_number( > mp_decode_float(&rhs), lhs, lhs_type, -1 > @@ -410,6 +472,8 @@ tuple_compare_field(const char *field_a, const char *field_b, > return coll != NULL ? > mp_compare_scalar_coll(field_a, field_b, coll) : > mp_compare_scalar(field_a, field_b); > + case FIELD_TYPE_DECIMAL: > + return mp_compare_number(field_a, field_b); > default: > unreachable(); > return 0; > @@ -443,6 +507,8 @@ tuple_compare_field_with_type(const char *field_a, enum mp_type a_type, > mp_compare_scalar_coll(field_a, field_b, coll) : > mp_compare_scalar_with_type(field_a, a_type, > field_b, b_type); > + case FIELD_TYPE_DECIMAL: > + return mp_compare_number(field_a, field_b); > default: > unreachable(); > return 0; > @@ -1356,6 +1422,25 @@ static const comparator_with_key_signature cmp_wk_arr[] = { > #define HINT_VALUE_DOUBLE_MAX (exp2(HINT_VALUE_BITS - 1) - 1) > #define HINT_VALUE_DOUBLE_MIN (-exp2(HINT_VALUE_BITS - 1)) > > +/** > + * Max and min decimal numbers whose integral parts fit > + * in a hint value. > + */ > +static const decimal_t HINT_VALUE_DECIMAL_MAX = { > + 18, /* decimal digits */ > + 0, /* exponent */ > + 0, /* no special bits. */ > + {487, 423, 303, 752, 460, 576} /* 576,460,752,303,423,488 = 2^59 - 1 */ > + /* 59 == HINT_VALUE_BITS - 1 */ > +}; > + > +static const decimal_t HINT_VALUE_DECIMAL_MIN = { > + 18, /* decimal digits */ > + 0, /* exponent */ > + 0x80, /* negative bit */ > + {488, 423, 303, 752, 460, 576} /* 576,460,752,303,423,488 = 2^59 */ > +}; > + This looks fragile. Suppose we remove one bit from a hint, how should one update those? Please define these constants using decimal_from_int64 or, even better, get rid of them altogether (see below). > /* > * HINT_CLASS_BITS should be big enough to store any mp_class value. > * Note, ((1 << HINT_CLASS_BITS) - 1) is reserved for HINT_NONE. > @@ -1415,6 +1500,25 @@ hint_double(double d) > return hint_create(MP_CLASS_NUMBER, val); > } > > +static inline hint_t > +hint_decimal(decimal_t *dec) > +{ > + uint64_t val = 0; > + int64_t num; > + if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MAX) >= 0) > + val = HINT_VALUE_MAX; > + else if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MIN) <= 0) > + val = 0; > + else { > + dec = decimal_to_int64(dec, &num); > + /* We've checked boundaries above. */ > + assert(dec != NULL); > + val = num - HINT_VALUE_INT_MIN; > + } Why not call decimal_to_int64 first and then calculate a hint from the returned number? I mean if (decimal_to_int64(dec, &num) && num >= HINT_VALUE_INT_MIN && num <= HINT_VALUE_INT_MAX) hint = num; else ... > + > + return hint_create(MP_CLASS_NUMBER, val); > +} > + > static inline uint64_t > hint_str_raw(const char *s, uint32_t len) > { > @@ -1491,12 +1595,42 @@ field_hint_number(const char *field) > return hint_double(mp_decode_float(&field)); > case MP_DOUBLE: > return hint_double(mp_decode_double(&field)); > + case MP_EXT: > + { > + int8_t ext_type; > + uint32_t len = mp_decode_extl(&field, &ext_type); > + switch(ext_type) { > + case MP_DECIMAL: > + { > + decimal_t dec; > + decimal_t *res; > + /* > + * The effect of mp_decode_extl() + > + * decimal_unpack() is the same that > + * the one of mp_decode_decimal(). > + */ > + res = decimal_unpack(&field, len, &dec); > + assert(res != NULL); > + return hint_decimal(res); > + } > + default: > + unreachable(); > + } > + } > default: > unreachable(); > } > return HINT_NONE; > } > > +static inline hint_t > +field_hint_decimal(const char *field) > +{ > + assert(mp_typeof(*field) == MP_EXT); > + decimal_t dec; > + return hint_decimal(mp_decode_decimal(&field, &dec)); > +} > + > static inline hint_t > field_hint_string(const char *field, struct coll *coll) > { > @@ -1536,6 +1670,25 @@ field_hint_scalar(const char *field, struct coll *coll) > case MP_BIN: > len = mp_decode_binl(&field); > return hint_bin(field, len); > + case MP_EXT: > + { > + int8_t ext_type; > + uint32_t len = mp_decode_extl(&field, &ext_type); > + switch(ext_type) { > + case MP_DECIMAL: > + { > + decimal_t dec; > + /* > + * The effect of mp_decode_extl() + > + * decimal_unpack() is the same that > + * the one of mp_decode_decimal(). > + */ > + return hint_decimal(decimal_unpack(&field, len, &dec)); > + } Please try to eliminate code duplication. > diff --git a/test/engine/decimal.test.lua b/test/engine/decimal.test.lua > new file mode 100644 > index 000000000..52a300e72 > --- /dev/null > +++ b/test/engine/decimal.test.lua > @@ -0,0 +1,65 @@ > +env = require('test_run') > +test_run = env.new() > +engine = test_run:get_cfg('engine') > + > +decimal = require('decimal') > +ffi = require('ffi') > + > +_ = box.schema.space.create('test', {engine=engine}) > +_ = box.space.test:create_index('pk', {parts={1,'decimal'}}) > + > +test_run:cmd('setopt delimiter ";"') > +for i = 0,16 do > + box.space.test:insert{decimal.new((i-8)/4)} > +end; > +test_run:cmd('setopt delimiter ""'); > + > +box.space.test:select{} > + > +-- check invalid values > +box.space.test:insert{1.23} > +box.space.test:insert{'str'} > +box.space.test:insert{ffi.new('uint64_t', 0)} > +-- check duplicates > +box.space.test:insert{decimal.new(0)} > + > +box.space.test.index.pk:drop() > + > +_ = box.space.test:create_index('pk', {parts={1, 'number'}}) > + > +test_run:cmd('setopt delimiter ";"') > +for i = 0, 32 do > + local val = (i - 16) / 8 > + if i % 2 == 1 then val = decimal.new(val) end > + box.space.test:insert{val} > +end; > +test_run:cmd('setopt delimiter ""'); > + > +box.space.test:select{} > + > +-- check duplicates > +box.space.test:insert{-2} > +box.space.test:insert{decimal.new(-2)} > +box.space.test:insert{decimal.new(-1.875)} > +box.space.test:insert{-1.875} > + > +box.space.test.index.pk:drop() > + > +_ = box.space.test:create_index('pk') > +test_run:cmd('setopt delimiter ";"') > +for i = 1,10 do > + box.space.test:insert{i, decimal.new(i/10)} > +end; > +test_run:cmd('setopt delimiter ""'); > + > +-- a bigger test with a secondary index this time. > +box.space.test:insert{11, 'str'} > +box.space.test:insert{12, 0.63} > +box.space.test:insert{13, 0.57} > +box.space.test:insert{14, 0.33} > +box.space.test:insert{16, 0.71} > + > +_ = box.space.test:create_index('sk', {parts={2, 'scalar'}}) > +box.space.test.index.sk:select{} > + > +box.space.test:drop() Please add a test for space.format and index.alter.