[Tarantool-patches] [PATCH 09/14] WIP: module api: add box_key_def_new_ex()
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sat Sep 26 01:58:49 MSK 2020
Thanks for the patch!
See 12 comments below.
1. Why _ex()? Why not 2? What are we going to do when will need a new
parameter? _ex_ex()? _ex2()? With a number at least we could just bump
it.
On 23.09.2020 03:14, Alexander Turenko wrote:
> Unlike box_key_def_new() it allows to set nullability, collation and
> JSON path.
>
> Note: JSON paths are not supported in the backported version of the
> patch for 1.10.
>
> Provided public non-opaque key part definition structure to create a key
> definition. The next commit will also use this structure to dump a key
> definition.
2. Did you think of making a function which would create a key_def taking
only part_count, and then exporting functions like
box_key_def_set_part_type(...);
box_key_def_set_part_json_path(...);
box_key_def_set_part_collation(...);
etc.
to customize it after creation? Then we wouldn't need to export
box_key_part_def_t and care about ABI shit.
> There are several techinal points around the box_key_part_def_t
3. techinal -> techincal.
> structure. They are mainly about providing stable ABI.
>
> - Two uint32_t fields are placed first for better aligning of next
> fields (pointers, which are usually 64 bit wide).
>
> - A padding is used to guarantee that the structure size will remain the
> same across tarantool versions. It allows to allocate an array of such
> structures.
>
> - The padding array is not a field of the structure itself, but added as
> a union variant (see the code). It allows to get rid of manual
> calculation of cumulative fields size, which is hard to do in a
> platform independent way.
>
> - A minimal size of the structure is guaranteed by the union with
> padding, but a static assert is required to guarantee that the size
> will not overrun the predefined value.
>
> - PACKED is added as an extra remedy to make the structure layout
> predictable.
4. Is pointer size predictable?
> - A bit flag is used for is_nullable. bool is considered as too
> expensive (it requires 8 bits). bitfields (int:1 and so on) do no
> guarantee certain data layout (it is compiler implementation detail),
> while a module is compiled outside of tarantool build and may use
> different toolchain. A bit flag is the only choice.
>
> - A collation is identified using a string. Different IDs may be used on
> different tarantool instances for collations. The only 'real'
> identifier is a collation name, so using it as identifier in the API
> should be more convenient and less error-prone.
>
> - A field type is also identified using a string instead of a number. We
> have <enum field_type> in the module API, but it cannot be used here,
> because IDs are changed across tarantool versions. Aside of this, size
> of a enum is compiler defined. Anyway, we can expose field types as
> numbers and implement number-to-name and name-to-number mapping
> functions, but IMHO it would just add extra complexity.
5. For record - I still don't like the strings. Because it is not
obvious what to specify here. There is no a set of values to use, and it
makes the creation slower.
It simplifies API, but not module.h. It simplifies the Lua module you
built on top of this. For pure C modules and for the module.h itself it
complicates things.
But I do not like it not enough to insist.
> XXX: Add a module API test.
>
> Part of #5273
> ---
> src/box/key_def_api.c | 146 ++++++++++++++++++++++++++++++++++++++++++
> src/box/key_def_api.h | 123 +++++++++++++++++++++++++++++++++++
> src/exports.h | 2 +
> 3 files changed, 271 insertions(+)
>
> diff --git a/src/box/key_def_api.c b/src/box/key_def_api.c
> index 7f6c0ac55..19590095d 100644
> --- a/src/box/key_def_api.c
> +++ b/src/box/key_def_api.c
> @@ -55,6 +138,67 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
> return key_def;
> }
>
> +box_key_def_t *
> +box_key_def_new_ex(box_key_part_def_t *parts, uint32_t part_count)
> +{
> + struct region *region = &fiber()->gc;
> + size_t region_svp = region_used(region);
> + size_t internal_parts_size;
> + struct key_part_def *internal_parts =
> + region_alloc_array(region, typeof(internal_parts[0]),
> + part_count, &internal_parts_size);
> + if (parts == NULL) {
6. I assume you wanted to check for internal_parts == NULL.
> + diag_set(OutOfMemory, internal_parts_size, "region_alloc_array",
> + "parts");
> + return NULL;
> + }
> + if (part_count == 0) {
7. internal_parts leaks here? I don't even know if region allocates
anything when size is 0. I propose to move the check before the
allocation.
> + diag_set(IllegalParams, "At least one key part is required");
> + return NULL;
> + }
> +
> + /*
> + * It is possible to implement a function similar to
> + * key_def_new() and eliminate <box_key_part_def_t> ->
> + * <struct key_part_def> copying. However this would lead
> + * to code duplication and would complicate maintanence,
> + * so it worth to do so only if key_def creation will
> + * appear on a hot path in some meaningful use case.
> + */
> + uint32_t min_field_count = 0;
> + for (uint32_t i = 0; i < part_count; ++i) {
> + if (key_def_set_internal_part(&internal_parts[i], &parts[i],
> + region) != 0) {
> + region_truncate(region, region_svp);
> + return NULL;
> + }
> + bool is_nullable =
> + (parts[i].flags & BOX_KEY_PART_DEF_IS_NULLABLE_MASK) ==
> + BOX_KEY_PART_DEF_IS_NULLABLE_MASK;
8. You have internal_parts[i].is_nullable for that.
> + if (!is_nullable && parts[i].fieldno > min_field_count)
> + min_field_count = parts[i].fieldno;
> + }
> +
> + struct key_def *key_def = key_def_new(internal_parts, part_count,
> + false);
> + region_truncate(region, region_svp);
> + if (key_def == NULL)
> + return NULL;
> +
> + /*
> + * Update key_def->has_optional_parts and function
> + * pointers.
> + *
> + * FIXME: It seems, this call should be part of
> + * key_def_new(), because otherwise a callee function may
> + * obtain an incorrect key_def. However I don't know any
> + * case that would prove this guess.
9. Just do that and if all works, then it is fine to change. Looks like
a bug which didn't have a place to explode until now.
> + */
> + key_def_update_optionality(key_def, min_field_count);
> +
> + return key_def;
> +}
> +
> void
> box_key_def_delete(box_key_def_t *key_def)
> diff --git a/src/box/key_def_api.h b/src/box/key_def_api.h
> index 5b1c861f5..328a58c70 100644
> --- a/src/box/key_def_api.h
> +++ b/src/box/key_def_api.h
> @@ -44,11 +44,104 @@ typedef struct tuple box_tuple_t;
>
> typedef struct key_def box_key_def_t;
>
> +/** Key part definition flags. */
> +enum {
> + BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT = 0,
> + BOX_KEY_PART_DEF_IS_NULLABLE_MASK =
> + 1 << BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT,
10. Does not it seem a little overcomplicated to you? Why
do you need BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT? Why not
call BOX_KEY_PART_DEF_IS_NULLABLE_MASK
BOX_KEY_PART_DEF_IS_NULLABLE_FLAG? Or
BOX_KEY_PART_DEF_IS_NULLABLE.
> +};
> +
> +/**
> + * It is recommended to verify size of <box_key_part_def_t>
> + * against this constant on the module side at build time.
> + * Example:
> + *
> + * | #if !defined(__cplusplus) && !defined(static_assert)
> + * | #define static_assert _Static_assert
> + * | #endif
> + * |
> + * | (slash)*
> + * | * Verify that <box_key_part_def_t> has the same size when
> + * | * compiled within tarantool and within the module.
> + * | *
> + * | * It is important, because the module allocates an array of key
> + * | * parts and passes it to <box_key_def_new_ex>() tarantool
> + * | * function.
> + * | *(slash)
> + * | static_assert(sizeof(box_key_part_def_t) == BOX_KEY_PART_DEF_T_SIZE,
> + * | "sizeof(box_key_part_def_t)");
> + *
> + * This snippet is not part of module.h, because portability of
> + * static_assert() / _Static_assert() is dubious. It should be
> + * decision of a module author how portable its code should be.
> + */
> +enum {
> + BOX_KEY_PART_DEF_T_SIZE = 64,
> +};
> +
> +/**
> + * Public representation of a key part definition.
> + *
> + * Usage: Allocate an array of such key parts, initialize each
> + * key part (call <box_key_part_def_create>() and set necessary
> + * fields), pass the array into <box_key_def_new_ex>() function.
> + *
> + * The idea of separation from internal <struct key_part_def> is
> + * to provide stable API and ABI for modules.
> + *
> + * New fields may be added into the end of the structure in later
> + * tarantool versions. Also new flags may be introduced within
> + * <flags> field. <collation> cannot be changed to a union (to
> + * reuse for some other value), because it is verified even for
> + * a non-string key part by <box_key_def_new_ex>().
> + *
> + * Fields that are unknown at given tarantool version are ignored.
> + */
> +typedef union PACKED {
> + struct {
> + /** Index of a tuple field (zero based). */
> + uint32_t fieldno;
> + /** Flags, e.g. nullability. */
> + uint32_t flags;
> + /** Type of the tuple field. */
> + const char *field_type;
> + /** Collation name for string comparisons. */
> + const char *collation;
> + /**
> + * JSON path to point a nested field.
> + *
> + * Example:
> + *
> + * tuple: [1, {"foo": "bar"}]
> + * key parts: [
> + * {
> + * "fieldno": 2,
> + * "type": "string",
> + * "path": "foo"
> + * }
> + * ]
> + *
> + * => key: ["bar"]
> + *
> + * Note: When the path is given, <field_type>
> + * means type of the nested field.
> + */
> + const char *path;
> + };
> + /**
> + * Padding to guarantee certain size across different
> + * tarantool versions.
> + */
> + char padding[BOX_KEY_PART_DEF_T_SIZE];
> +} box_key_part_def_t;
11. You know, we could also export a function like box_key_def_part_size(),
which would return sizeof(key_part_def). Then the module would need
to call it to allocate the part. And then use functions box_key_def_set_part_*()
to set the part def fields. So the struct would be opaque, could be
allocated in an array, and would be ABI stable. However probably it is
the same what I proposed in the beginning of this email but with extra
steps.
> +
> /**
> * Create key definition with given field numbers and field types.
> *
> * May be used for tuple format creation and/or tuple comparison.
> *
> + * \sa <box_key_def_new_ex>().
> + *
> * \param fields array with key field identifiers
> * \param types array with key field types (see enum field_type)
> * \param part_count the number of key fields
> @@ -57,6 +150,28 @@ typedef struct key_def box_key_def_t;
> API_EXPORT box_key_def_t *
> box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count);
>
> +/**
> + * Initialize a key part with default values.
> + *
> + * All trailing padding bytes are set to zero.
12. Why does it need to touch the padding? How is it
related to ABI?
> + *
> + * All unknown <flags> bits are set to zero.
> + */
> +API_EXPORT void
> +box_key_part_def_create(box_key_part_def_t *part);
> +
> +/**
> + * Create a key_def from given key parts.
> + *
> + * Unlike <box_key_def_new>() this function allows to define
> + * nullability, collation and other options for each key part.
> + *
> + * <box_key_part_def_t> fields that are unknown at given tarantool
> + * version are ignored. The same for unknown <flags> bits.
> + */
> +API_EXPORT box_key_def_t *
> +box_key_def_new_ex(box_key_part_def_t *parts, uint32_t part_count);
More information about the Tarantool-patches
mailing list