[Tarantool-patches] [PATCH v2 10/15] module api: add box_key_def_new_v2()

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Oct 11 18:25:54 MSK 2020


Thanks for the patch!

See 5 comments below.

On 11.10.2020 14:57, Alexander Turenko wrote:
> Unlike box_key_def_new() it allows to set nullability, collation and
> JSON path.
> 
> Note: JSON paths are not supported in the backported version of the
> patch for 1.10.
> 
> Provided public non-opaque key part definition structure to create a key
> definition. The next commit will also use this structure to dump a key
> definition.
> 
> There are several technical points around the box_key_part_def_t
> structure. They are mainly about providing stable ABI.
> 
> - Two uint32_t fields are placed first for better aligning of next
>   fields (pointers, which are usually 64 bit wide).
> 
> - A padding is used to guarantee that the structure size will remain the
>   same across tarantool versions. It allows to allocate an array of such
>   structures.
> 
> - The padding array is not a field of the structure itself, but added as
>   a union variant (see the code). It allows to get rid of manual
>   calculation of cumulative fields size, which is hard to do in a
>   platform independent way.
> 
> - A minimal size of the structure is guaranteed by the union with
>   padding, but a static assert is required to guarantee that the size
>   will not overrun the predefined value.
> 
> - PACKED is added as an extra remedy to make the structure layout
>   predictable (on given target architecture).
> 
> - A bit flag is used for is_nullable. bool is considered as too
>   expensive (it requires 8 bits). bitfields (int:1 and so on) do no
>   guarantee certain data layout (it is compiler implementation detail),
>   while a module is compiled outside of tarantool build and may use
>   different toolchain. A bit flag is the only choice.
> 
> - A collation is identified using a string. Different IDs may be used on
>   different tarantool instances for collations. The only 'real'
>   identifier is a collation name, so using it as identifier in the API
>   should be more convenient and less error-prone.

1. What if key_def/merger module is used on client tarantool? For example,
vshard router. It does not store collation structs in the hash you use
to lookup the name. Should we fix that on the clients, to make them fill
the hash?

This kind of issue exists for all schema-dependent values. Mostly for SQL,
which heavily depends on struct space/index/... even during parsing.

> - A field type is also identified using a string instead of a number. We
>   have <enum field_type> in the module API, but it cannot be used here,
>   because IDs are changed across tarantool versions. Aside of this, size
>   of a enum is compiler defined. Anyway, we can expose field types as
>   numbers and implement number-to-name and name-to-number mapping
>   functions, but IMHO it would just add extra complexity.
> 
> The dependency on the fiber compilation unit is added for key_def: it is
> purely to obtain the box region in the module API functions. The key_def
> compilation unit does not use anything fiber related in fact.
> ---
>  src/box/key_def.c                | 141 ++++++++++++++++++
>  src/box/key_def.h                | 144 ++++++++++++++++++-
>  src/exports.h                    |   2 +
>  test/app-tap/module_api.c        | 240 +++++++++++++++++++++++++++++++
>  test/app-tap/module_api.test.lua |   2 +-
>  5 files changed, 527 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index a03537227..e2bcfe321 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -341,6 +342,76 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts,
>  	return 0;
>  }
>  
> + /* {{{ Module API helpers */
> +
> +static int
> +key_def_set_internal_part(struct key_part_def *internal_part,
> +			  box_key_part_def_t *part, struct region *region)
> +{
> +	*internal_part = key_part_def_default;
> +
> +	/* Set internal_part->fieldno. */
> +	internal_part->fieldno = part->fieldno;
> +
> +	/* Set internal_part->type. */
> +	if (part->field_type == NULL) {
> +		diag_set(IllegalParams, "Field type is mandatory");
> +		return -1;
> +	}
> +	size_t type_len = strlen(part->field_type);
> +	internal_part->type = field_type_by_name(part->field_type, type_len);
> +	if (internal_part->type == field_type_MAX) {
> +		diag_set(IllegalParams, "Unknown field type: \"%s\"",
> +			 part->field_type);
> +		return -1;
> +	}
> +
> +	/* Set internal_part->{is_nullable,nullable_action}. */
> +	bool is_nullable = (part->flags & BOX_KEY_PART_DEF_IS_NULLABLE) ==
> +		BOX_KEY_PART_DEF_IS_NULLABLE;
> +	if (is_nullable) {
> +		internal_part->is_nullable = is_nullable;
> +		internal_part->nullable_action = ON_CONFLICT_ACTION_NONE;
> +	}
> +
> +	/* Set internal_part->coll_id. */
> +	if (part->collation != NULL) {
> +		size_t collation_len = strlen(part->collation);
> +		struct coll_id *coll_id = coll_by_name(part->collation,
> +						       collation_len);
> +		if (coll_id == NULL) {
> +			diag_set(IllegalParams, "Unknown collation: \"%s\"",
> +				 part->collation);
> +			return -1;
> +		}
> +		internal_part->coll_id = coll_id->id;
> +	}
> +
> +	/* Set internal_part->path (JSON path). */
> +	if (part->path) {

2. Better use != NULL. At least to be consistent with the other code in
in the patch.

> +		size_t path_len = strlen(part->path);
> +		if (json_path_validate(part->path, path_len,
> +				       TUPLE_INDEX_BASE) != 0) {
> +			diag_set(IllegalParams, "Invalid JSON path: \"%s\"",
> +				 part->path);
> +			return -1;
> +		}
> +		char *tmp = region_alloc(region, path_len + 1);
> +		if (tmp == NULL) {
> +			diag_set(OutOfMemory, path_len + 1, "region", "path");
> +			return -1;
> +		}
> diff --git a/test/app-tap/module_api.c b/test/app-tap/module_api.c
> index 184fdf8f8..26853dd9b 100644
> --- a/test/app-tap/module_api.c
> +++ b/test/app-tap/module_api.c
> @@ -365,6 +368,242 @@ test_key_def_api(lua_State *L)
>  	return 1;
>  }
>  
> +/* }}} key_def api */
> +
> +/* {{{ key_def api v2 */
> +
> +/*
> + * More functions around key_def were exposed to the module API
> + * in order to implement external tuple.keydef and tuple.merger
> + * modules (gh-5273, gh-5384).
> + */
> +
> +/**
> + * Verify type and message of an error in the diagnostics area.
> + *
> + * Accepts a prefix of an actual type or a message. Pass an empty
> + * string if you need to verify only type or only message.
> + */
> +static void
> +check_diag(const char *exp_err_type, const char *exp_err_msg)
> +{
> +	box_error_t *e = box_error_last();
> +	assert(!strcmp(box_error_type(e), exp_err_type));
> +	assert(!strcmp(box_error_message(e), exp_err_msg));

3. strcmp() returns not a boolean. Better check for == 0
explicitly. Otherwise !strcmp(a, b) looks like a != b intuitively.
In other patches too.

> +}
> +
> +/**
> + * Create a tuple on runtime arena.
> + *
> + * Release this tuple using <box_tuple_unref>().
> + */
> +static box_tuple_t *
> +new_runtime_tuple(const char *tuple_data, size_t tuple_size)
> +{
> +	box_tuple_format_t *fmt = box_tuple_format_default();
> +	const char *tuple_end = tuple_data + tuple_size;
> +	box_tuple_t *tuple = box_tuple_new(fmt, tuple_data, tuple_end);
> +	assert(tuple != NULL);
> +	box_tuple_ref(tuple);
> +	return tuple;
> +}
> +
> +/**
> + * Where padding bytes starts.
> + */
> +static size_t
> +key_part_padding_offset(void)
> +{
> +	if (sizeof(void *) * CHAR_BIT == 64)
> +		return 32;
> +	if (sizeof(void *) * CHAR_BIT == 32)
> +		return 20;
> +	assert(false);
> +}
> +
> +/**
> + * Mask of all defined flags.
> + */
> +static uint32_t
> +key_part_def_known_flags(void)
> +{
> +	return BOX_KEY_PART_DEF_IS_NULLABLE;
> +}
> +
> +/**
> + * Default flags value.
> + *
> + * All unknown bits are set to zero.
> + */
> +static uint32_t
> +key_part_def_default_flags(void)
> +{
> +	return 0;
> +}
> +
> +/**
> + * Set all <box_key_part_def_t> fields to nondefault values.
> + *
> + * It also set all padding bytes and unknown flags to non-zero
> + * values.
> + */
> +static void
> +key_part_def_set_nondefault(box_key_part_def_t *part)
> +{
> +	size_t padding_offset = key_part_padding_offset();
> +	uint32_t default_flags = key_part_def_default_flags();
> +
> +	/*
> +	 * Give correct non-default values for known fields and
> +	 * flags. Set unknown flags to non-zero values.
> +	 */
> +	part->fieldno = 1;
> +	part->flags = ~default_flags;
> +	part->field_type = "string";
> +	part->collation = "unicode_ci";
> +	part->path = "foo";
> +
> +	/* Fill padding with non-zero bytes. */
> +	char *padding = ((char *) part) + padding_offset;
> +	size_t padding_size = sizeof(box_key_part_def_t) - padding_offset;
> +	memset(padding, 0xff, padding_size);
> +}
> +
> +/**
> + * Verify that all known fields and flags are set to default
> + * values.
> + */
> +static void
> +key_part_def_check_default(box_key_part_def_t *part)
> +{
> +	uint32_t known_flags = key_part_def_known_flags();
> +	uint32_t default_flags = key_part_def_default_flags();
> +
> +	assert(part->fieldno == 0);
> +	assert((part->flags & known_flags) == default_flags);
> +	assert(part->field_type == NULL);
> +	assert(part->collation == NULL);
> +	assert(part->path == NULL);
> +}
> +
> +/**
> + * Verify that all padding bytes and unknown flags are set to
> + * zeros.
> + */
> +static void
> +key_part_def_check_zeros(const box_key_part_def_t *part)
> +{
> +	size_t padding_offset = key_part_padding_offset();
> +	uint32_t unknown_flags = ~key_part_def_known_flags();
> +
> +	char *padding = ((char *) part) + padding_offset;
> +	char *padding_end = ((char *) part) + sizeof(box_key_part_def_t);
> +	for (char *p = padding; p < padding_end; ++p) {
> +		assert(*p == 0);
> +	}
> +
> +	assert((part->flags & unknown_flags) == 0);
> +}
> +
> +/**
> + * Basic <box_key_part_def_create>() and <test_key_def_new_v2>()

4. test_key_def_new_v2 -> box_key_def_new_v2.

> + * test.
> + */
> +static int
> +test_key_def_new_v2(struct lua_State *L)
> +{
> +	/* Verify <box_key_part_def_t> binary layout. */
> +	assert(BOX_KEY_PART_DEF_T_SIZE == 64);
> +	assert(sizeof(box_key_part_def_t) == BOX_KEY_PART_DEF_T_SIZE);
> +	assert(offsetof(box_key_part_def_t, fieldno) == 0);
> +	assert(offsetof(box_key_part_def_t, flags) == 4);
> +	assert(offsetof(box_key_part_def_t, field_type) == 8);
> +	if (sizeof(void *) * CHAR_BIT == 64) {
> +		assert(offsetof(box_key_part_def_t, collation) == 16);
> +		assert(offsetof(box_key_part_def_t, path) == 24);
> +	} else if (sizeof(void *) * CHAR_BIT == 32) {
> +		assert(offsetof(box_key_part_def_t, collation) == 12);
> +		assert(offsetof(box_key_part_def_t, path) == 16);
> +	} else {
> +		assert(false);
> +	}
> +
> +	/*
> +	 * Fill key part definition with nondefault values.
> +	 * Fill padding and unknown flags with non-zero values.
> +	 */
> +	box_key_part_def_t part;
> +	key_part_def_set_nondefault(&part);
> +
> +	/*
> +	 * Verify that all known fields are set to default values and
> +	 * all unknown fields and flags are set to zeros.
> +	 */
> +	box_key_part_def_create(&part);
> +	key_part_def_check_default(&part);
> +	key_part_def_check_zeros(&part);
> +
> +	box_key_def_t *key_def;
> +
> +	/* Should not accept zero part count. */
> +	key_def = box_key_def_new_v2(NULL, 0);
> +	assert(key_def == NULL);
> +	check_diag("IllegalParams", "At least one key part is required");
> +
> +	/* Should not accept NULL as a <field_type>. */
> +	key_def = box_key_def_new_v2(&part, 1);
> +	assert(key_def == NULL);
> +	check_diag("IllegalParams", "Field type is mandatory");

5. Will you test not existing collation, unknown field type, bad JSON path?

> +
> +	/* Success case. */
> +	part.field_type = "unsigned";
> +	key_def = box_key_def_new_v2(&part, 1);
> +	assert(key_def != NULL);
> +
> +	/*
> +	 * Prepare tuples to do some comparisons.
> +	 *
> +	 * [1, 2, 3] and [3, 2, 1].
> +	 */
> +	box_tuple_t *tuple_1 = new_runtime_tuple("\x93\x01\x02\x03", 4);
> +	box_tuple_t *tuple_2 = new_runtime_tuple("\x93\x03\x02\x01", 4);
> +
> +	/*
> +	 * Verify that key_def actually can be used in functions
> +	 * that accepts it.
> +	 *
> +	 * Do several comparisons. Far away from being an
> +	 * exhaustive comparator test.
> +	 */
> +	int rc;
> +	rc = box_tuple_compare(tuple_1, tuple_1, key_def);
> +	assert(rc == 0);
> +	rc = box_tuple_compare(tuple_2, tuple_2, key_def);
> +	assert(rc == 0);
> +	rc = box_tuple_compare(tuple_1, tuple_2, key_def);
> +	assert(rc < 0);
> +	rc = box_tuple_compare(tuple_2, tuple_1, key_def);
> +	assert(rc > 0);
> +
> +	/* The same idea, but perform comparisons against keys. */
> +	rc = box_tuple_compare_with_key(tuple_1, "\x91\x00", key_def);
> +	assert(rc > 0);
> +	rc = box_tuple_compare_with_key(tuple_1, "\x91\x01", key_def);
> +	assert(rc == 0);
> +	rc = box_tuple_compare_with_key(tuple_1, "\x91\x02", key_def);
> +	assert(rc < 0);
> +
> +	/* Clean up. */
> +	box_tuple_unref(tuple_1);
> +	box_tuple_unref(tuple_2);
> +	box_key_def_delete(key_def);
> +
> +	lua_pushboolean(L, 1);
> +	return 1;
> +}


More information about the Tarantool-patches mailing list