[PATCH 5/5] decimal: allow to index decimals

Wed Jul 24 19:56:16 MSK 2019

On Wed, Jul 17, 2019 at 06:33:46PM +0300, Serge Petrenko wrote:
> Indices can now be built over decimal fields.
> A new field type - 'decimal' is introduced.
> Decimal values may be stored in 'decimal' columns, as well as in
> 'scalar' and 'any' columns.

You can remove this text as it reiterates the doc bot request below.

> 
> Closes #4333
> 
> @TarantoolBot document
> Title: Document decimal field type.
> 
> Decimals may now be stored in spaces. A corresponding field type is
> introduced: 'decimal'. Decimal values are also allowed in 'scalar' and
> 'number' fields.
> 
> 'decimal' field type is appropriate for both memtx HASH and TREE
> indices, as well as for vinyl TREE index.

Please describe how decimals are ordered in respect to other numeric
types. Give an example of creating and using an index over a decimal
field.

Can one use decimal in space format? If so, please mention it too
and add some tests.

What about update operations (index.update, tuple.update). Do they
support decimals? I assume not for now. This should be addressed
(a separate patch is okay).

> ---
>  src/box/field_def.c          |  43 +++++--
>  src/box/field_def.h          |  16 ++-
>  src/box/key_def.h            |   2 +-
>  src/box/tuple_compare.cc     | 160 ++++++++++++++++++++++++-
>  src/box/tuple_format.c       |   2 +-
>  src/lib/core/decimal.h       |   8 ++
>  src/lib/core/mp_decimal.h    |   8 ++
>  test/engine/decimal.result   | 226 +++++++++++++++++++++++++++++++++++
>  test/engine/decimal.test.lua |  65 ++++++++++
>  9 files changed, 510 insertions(+), 20 deletions(-)
>  create mode 100644 test/engine/decimal.result
>  create mode 100644 test/engine/decimal.test.lua
> 
> diff --git a/src/box/field_def.c b/src/box/field_def.c
> index 346042b98..da06e6bde 100644
> --- a/src/box/field_def.c
> +++ b/src/box/field_def.c
> @@ -52,17 +52,32 @@ const uint32_t field_mp_type[] = {
>  	/* [FIELD_TYPE_UNSIGNED] =  */ 1U << MP_UINT,
>  	/* [FIELD_TYPE_STRING]   =  */ 1U << MP_STR,
>  	/* [FIELD_TYPE_NUMBER]   =  */ (1U << MP_UINT) | (1U << MP_INT) |
> -		(1U << MP_FLOAT) | (1U << MP_DOUBLE),
> +		(1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_EXT),
>  	/* [FIELD_TYPE_INTEGER]  =  */ (1U << MP_UINT) | (1U << MP_INT),
>  	/* [FIELD_TYPE_BOOLEAN]  =  */ 1U << MP_BOOL,
>  	/* [FIELD_TYPE_VARBINARY] =  */ 1U << MP_BIN,
>  	/* [FIELD_TYPE_SCALAR]   =  */ (1U << MP_UINT) | (1U << MP_INT) |
>  		(1U << MP_FLOAT) | (1U << MP_DOUBLE) | (1U << MP_STR) |
> -		(1U << MP_BIN) | (1U << MP_BOOL),
> +		(1U << MP_BIN) | (1U << MP_BOOL) | (1U << MP_EXT),
> +	/* [FIELD_TYPE_DECIMAL]  =  */ 1U << MP_EXT,
>  	/* [FIELD_TYPE_ARRAY]    =  */ 1U << MP_ARRAY,
>  	/* [FIELD_TYPE_MAP]      =  */ (1U << MP_MAP),
>  };
>  
> +const uint32_t field_ext_type[] = {

field_mp_ext_type

> +	/* [FIELD_TYPE_ANY]      =  */ UINT32_MAX & ~(1U << MP_UNKNOWN),

Can one insert MP_EXT other than decimal into a space? Do we need to
support that?

> +	/* [FIELD_TYPE_UNSIGNED] =  */ 0,
> +	/* [FIELD_TYPE_STRING]   =  */ 0,
> +	/* [FIELD_TYPE_NUMBER]   =  */ 1U << MP_DECIMAL,
> +	/* [FIELD_TYPE_INTEGER]  =  */ 0,
> +	/* [FIELD_TYPE_BOOLEAN]  =  */ 0,
> +	/* [FIELD_TYPE_VARBINARY] = */ 0,
> +	/* [FIELD_TYPE_SCALAR]   =  */ 1U << MP_DECIMAL,
> +	/* [FIELD_TYPE_DECIMAL]  =  */ 1U << MP_DECIMAL,
> +	/* [FIELD_TYPE_ARRAY]    =  */ 0,
> +	/* [FIELD_TYPE_MAP]      =  */ 0,
> +};
> +
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index f4a1a8fd1..ed3354ade 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -505,7 +505,7 @@ static inline int
>  key_part_validate(enum field_type key_type, const char *key,
>  		  uint32_t field_no, bool is_nullable)
>  {
> -	if (unlikely(!field_mp_type_is_compatible(key_type, mp_typeof(*key),
> +	if (unlikely(!field_mp_type_is_compatible(key_type, key,
>  						  is_nullable))) {
>  		diag_set(ClientError, ER_KEY_PART_TYPE, field_no,
>  			 field_type_strs[key_type]);
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index 95a0f58c9..6ab28f662 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -33,6 +33,9 @@
>  #include "coll/coll.h"
>  #include "trivia/util.h" /* NOINLINE */
>  #include <math.h>
> +#include "lib/core/decimal.h"
> +#include "lib/core/mp_decimal.h"
> +#include "lib/core/mp_user_types.h"
>  
>  /* {{{ tuple_compare */
>  
> @@ -87,7 +90,7 @@ static enum mp_class mp_classes[] = {
>  	/* .MP_BOOL    = */ MP_CLASS_BOOL,
>  	/* .MP_FLOAT   = */ MP_CLASS_NUMBER,
>  	/* .MP_DOUBLE  = */ MP_CLASS_NUMBER,
> -	/* .MP_BIN     = */ MP_CLASS_BIN
> +	/* .MP_EXT     = */ MP_CLASS_NUMBER /* requires additional parsing */

This, as well as the use of MP_EXT below, looks confusing: you imply
that MP_EXT is always a number, but what happens when we add TIMESTAMP
or UUID ext type?

I would instead add a new enum for all comparable types (bool, float,
decimal, etc) and a helper that would determine the type by msgpack
data. This way all switch-cases below would be flat and the code would
be generally easier for understanding IMO. Please consider this. This
could be done in a separate patch.

>  };
>  
>  #define COMPARE_RESULT(a, b) (a < b ? -1 : a > b)
> @@ -264,6 +267,50 @@ mp_compare_double_any_number(double lhs, const char *rhs,
>  	return k * COMPARE_RESULT(lqbit, rqbit);
>  }
>  
> +static int
> +mp_compare_decimal_any_number(decimal_t *lhs, const char *rhs,
> +			      enum mp_type rhs_type, int k)
> +{
> +	decimal_t rhs_dec;
> +	switch (rhs_type) {
> +	case MP_FLOAT:
> +	{
> +		double d = mp_decode_float(&rhs);
> +		decimal_from_double(&rhs_dec, d);
> +		break;
> +	}
> +	case MP_DOUBLE:
> +	{
> +		double d = mp_decode_double(&rhs);
> +		decimal_from_double(&rhs_dec, d);
> +		break;
> +	}
> +	case MP_INT:
> +	{
> +		int64_t num = mp_decode_int(&rhs);
> +		decimal_from_int64(&rhs_dec, num);
> +		break;
> +	}
> +	case MP_UINT:
> +	{
> +		uint64_t num = mp_decode_uint(&rhs);
> +		decimal_from_uint64(&rhs_dec, num);
> +		break;
> +	}
> +	case MP_EXT:
> +	{
> +		int8_t ext_type;
> +		uint32_t len = mp_decode_extl(&rhs, &ext_type);
> +		assert(ext_type == MP_DECIMAL);
> +		decimal_unpack(&rhs, len, &rhs_dec);
> +		break;
> +	}
> +	default:
> +		unreachable();
> +	}
> +	return k * decimal_compare(lhs, &rhs_dec);
> +}
> +
>  static int
>  mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type,
>  			    const char *rhs, enum mp_type rhs_type)
> @@ -271,6 +318,21 @@ mp_compare_number_with_type(const char *lhs, enum mp_type lhs_type,
>  	assert(mp_classof(lhs_type) == MP_CLASS_NUMBER);
>  	assert(mp_classof(rhs_type) == MP_CLASS_NUMBER);
>  
> +	/*
> +	 * test decimals first, so that we don't have to
> +	 * account for them in other comarators.
> +	 */
> +	decimal_t dec;
> +	if (rhs_type == MP_EXT) {
> +		return mp_compare_decimal_any_number(
> +			mp_decode_decimal(&rhs, &dec), lhs, lhs_type, -1
> +		);
> +	}
> +	if (lhs_type == MP_EXT) {
> +		return mp_compare_decimal_any_number(
> +			mp_decode_decimal(&lhs, &dec), rhs, rhs_type, 1
> +		);
> +	}
>  	if (rhs_type == MP_FLOAT) {
>  		return mp_compare_double_any_number(
>  			mp_decode_float(&rhs), lhs, lhs_type, -1
> @@ -410,6 +472,8 @@ tuple_compare_field(const char *field_a, const char *field_b,
>  		return coll != NULL ?
>  		       mp_compare_scalar_coll(field_a, field_b, coll) :
>  		       mp_compare_scalar(field_a, field_b);
> +	case FIELD_TYPE_DECIMAL:
> +		return mp_compare_number(field_a, field_b);
>  	default:
>  		unreachable();
>  		return 0;
> @@ -443,6 +507,8 @@ tuple_compare_field_with_type(const char *field_a, enum mp_type a_type,
>  		       mp_compare_scalar_coll(field_a, field_b, coll) :
>  		       mp_compare_scalar_with_type(field_a, a_type,
>  						   field_b, b_type);
> +	case FIELD_TYPE_DECIMAL:
> +		return mp_compare_number(field_a, field_b);
>  	default:
>  		unreachable();
>  		return 0;
> @@ -1356,6 +1422,25 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
>  #define HINT_VALUE_DOUBLE_MAX	(exp2(HINT_VALUE_BITS - 1) - 1)
>  #define HINT_VALUE_DOUBLE_MIN	(-exp2(HINT_VALUE_BITS - 1))
>  
> +/**
> + * Max and min decimal numbers whose integral parts fit
> + * in a hint value.
> + */
> +static const decimal_t HINT_VALUE_DECIMAL_MAX = {
> +	18,				/* decimal digits */
> +	0,				/* exponent */
> +	0,				/* no special bits. */
> +	{487, 423, 303, 752, 460, 576}	/* 576,460,752,303,423,488 = 2^59 - 1 */
> +					/* 59 == HINT_VALUE_BITS - 1 */
> +};
> +
> +static const decimal_t HINT_VALUE_DECIMAL_MIN = {
> +	18,				/* decimal digits */
> +	0,				/* exponent */
> +	0x80,				/* negative bit */
> +	{488, 423, 303, 752, 460, 576}	/* 576,460,752,303,423,488 = 2^59 */
> +};
> +

This looks fragile. Suppose we remove one bit from a hint, how should
one update those? Please define these constants using decimal_from_int64
or, even better, get rid of them altogether (see below).

>  /*
>   * HINT_CLASS_BITS should be big enough to store any mp_class value.
>   * Note, ((1 << HINT_CLASS_BITS) - 1) is reserved for HINT_NONE.
> @@ -1415,6 +1500,25 @@ hint_double(double d)
>  	return hint_create(MP_CLASS_NUMBER, val);
>  }
>  
> +static inline hint_t
> +hint_decimal(decimal_t *dec)
> +{
> +	uint64_t val = 0;
> +	int64_t num;
> +	if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MAX) >= 0)
> +		val = HINT_VALUE_MAX;
> +	else if (decimal_compare(dec, &HINT_VALUE_DECIMAL_MIN) <= 0)
> +		val = 0;
> +	else {
> +		dec = decimal_to_int64(dec, &num);
> +		/* We've checked boundaries above. */
> +		assert(dec != NULL);
> +		val = num - HINT_VALUE_INT_MIN;
> +	}

Why not call decimal_to_int64 first and then calculate a hint from the
returned number? I mean

  if (decimal_to_int64(dec, &num) &&
      num >= HINT_VALUE_INT_MIN && num <= HINT_VALUE_INT_MAX)
          hint = num;
  else
          ...

> +
> +	return hint_create(MP_CLASS_NUMBER, val);
> +}
> +
>  static inline uint64_t
>  hint_str_raw(const char *s, uint32_t len)
>  {
> @@ -1491,12 +1595,42 @@ field_hint_number(const char *field)
>  		return hint_double(mp_decode_float(&field));
>  	case MP_DOUBLE:
>  		return hint_double(mp_decode_double(&field));
> +	case MP_EXT:
> +	{
> +		int8_t ext_type;
> +		uint32_t len = mp_decode_extl(&field, &ext_type);
> +		switch(ext_type) {
> +		case MP_DECIMAL:
> +		{
> +			decimal_t dec;
> +			decimal_t *res;
> +			/*
> +			 * The effect of mp_decode_extl() +
> +			 * decimal_unpack() is the same that
> +			 * the one of mp_decode_decimal().
> +			 */
> +			res = decimal_unpack(&field, len, &dec);
> +			assert(res != NULL);
> +			return hint_decimal(res);
> +		}
> +		default:
> +			unreachable();
> +		}
> +	}
>  	default:
>  		unreachable();
>  	}
>  	return HINT_NONE;
>  }
>  
> +static inline hint_t
> +field_hint_decimal(const char *field)
> +{
> +	assert(mp_typeof(*field) == MP_EXT);
> +	decimal_t dec;
> +	return hint_decimal(mp_decode_decimal(&field, &dec));
> +}
> +
>  static inline hint_t
>  field_hint_string(const char *field, struct coll *coll)
>  {
> @@ -1536,6 +1670,25 @@ field_hint_scalar(const char *field, struct coll *coll)
>  	case MP_BIN:
>  		len = mp_decode_binl(&field);
>  		return hint_bin(field, len);
> +	case MP_EXT:
> +	{
> +		int8_t ext_type;
> +		uint32_t len = mp_decode_extl(&field, &ext_type);
> +		switch(ext_type) {
> +		case MP_DECIMAL:
> +		{
> +			decimal_t dec;
> +			/*
> +			 * The effect of mp_decode_extl() +
> +			 * decimal_unpack() is the same that
> +			 * the one of mp_decode_decimal().
> +			 */
> +			return hint_decimal(decimal_unpack(&field, len, &dec));
> +		}

Please try to eliminate code duplication.

> diff --git a/test/engine/decimal.test.lua b/test/engine/decimal.test.lua
> new file mode 100644
> index 000000000..52a300e72
> --- /dev/null
> +++ b/test/engine/decimal.test.lua
> @@ -0,0 +1,65 @@
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +decimal = require('decimal')
> +ffi = require('ffi')
> +
> +_ = box.schema.space.create('test', {engine=engine})
> +_ = box.space.test:create_index('pk', {parts={1,'decimal'}})
> +
> +test_run:cmd('setopt delimiter ";"')
> +for i = 0,16 do
> +    box.space.test:insert{decimal.new((i-8)/4)}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +box.space.test:select{}
> +
> +-- check invalid values
> +box.space.test:insert{1.23}
> +box.space.test:insert{'str'}
> +box.space.test:insert{ffi.new('uint64_t', 0)}
> +-- check duplicates
> +box.space.test:insert{decimal.new(0)}
> +
> +box.space.test.index.pk:drop()
> +
> +_ = box.space.test:create_index('pk', {parts={1, 'number'}})
> +
> +test_run:cmd('setopt delimiter ";"')
> +for i = 0, 32 do
> +    local val = (i - 16) / 8
> +    if i % 2 == 1 then val = decimal.new(val) end
> +    box.space.test:insert{val}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +box.space.test:select{}
> +
> +-- check duplicates
> +box.space.test:insert{-2}
> +box.space.test:insert{decimal.new(-2)}
> +box.space.test:insert{decimal.new(-1.875)}
> +box.space.test:insert{-1.875}
> +
> +box.space.test.index.pk:drop()
> +
> +_ = box.space.test:create_index('pk')
> +test_run:cmd('setopt delimiter ";"')
> +for i = 1,10 do
> +    box.space.test:insert{i, decimal.new(i/10)}
> +end;
> +test_run:cmd('setopt delimiter ""');
> +
> +-- a bigger test with a secondary index this time.
> +box.space.test:insert{11, 'str'}
> +box.space.test:insert{12, 0.63}
> +box.space.test:insert{13, 0.57}
> +box.space.test:insert{14, 0.33}
> +box.space.test:insert{16, 0.71}
> +
> +_ = box.space.test:create_index('sk', {parts={2, 'scalar'}})
> +box.space.test.index.sk:select{}
> +
> +box.space.test:drop()

Please add a test for space.format and index.alter.