From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 68FD944643B for ; Sat, 26 Sep 2020 01:58:51 +0300 (MSK) References: From: Vladislav Shpilevoy Message-ID: <04e9b777-b0de-d38b-f080-f54ff55a31c1@tarantool.org> Date: Sat, 26 Sep 2020 00:58:49 +0200 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 09/14] WIP: module api: add box_key_def_new_ex() List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexander Turenko Cc: tarantool-patches@dev.tarantool.org Thanks for the patch! See 12 comments below. 1. Why _ex()? Why not 2? What are we going to do when will need a new parameter? _ex_ex()? _ex2()? With a number at least we could just bump it. On 23.09.2020 03:14, Alexander Turenko wrote: > Unlike box_key_def_new() it allows to set nullability, collation and > JSON path. > > Note: JSON paths are not supported in the backported version of the > patch for 1.10. > > Provided public non-opaque key part definition structure to create a key > definition. The next commit will also use this structure to dump a key > definition. 2. Did you think of making a function which would create a key_def taking only part_count, and then exporting functions like box_key_def_set_part_type(...); box_key_def_set_part_json_path(...); box_key_def_set_part_collation(...); etc. to customize it after creation? Then we wouldn't need to export box_key_part_def_t and care about ABI shit. > There are several techinal points around the box_key_part_def_t 3. techinal -> techincal. > structure. They are mainly about providing stable ABI. > > - Two uint32_t fields are placed first for better aligning of next > fields (pointers, which are usually 64 bit wide). > > - A padding is used to guarantee that the structure size will remain the > same across tarantool versions. It allows to allocate an array of such > structures. > > - The padding array is not a field of the structure itself, but added as > a union variant (see the code). It allows to get rid of manual > calculation of cumulative fields size, which is hard to do in a > platform independent way. > > - A minimal size of the structure is guaranteed by the union with > padding, but a static assert is required to guarantee that the size > will not overrun the predefined value. > > - PACKED is added as an extra remedy to make the structure layout > predictable. 4. Is pointer size predictable? > - A bit flag is used for is_nullable. bool is considered as too > expensive (it requires 8 bits). bitfields (int:1 and so on) do no > guarantee certain data layout (it is compiler implementation detail), > while a module is compiled outside of tarantool build and may use > different toolchain. A bit flag is the only choice. > > - A collation is identified using a string. Different IDs may be used on > different tarantool instances for collations. The only 'real' > identifier is a collation name, so using it as identifier in the API > should be more convenient and less error-prone. > > - A field type is also identified using a string instead of a number. We > have in the module API, but it cannot be used here, > because IDs are changed across tarantool versions. Aside of this, size > of a enum is compiler defined. Anyway, we can expose field types as > numbers and implement number-to-name and name-to-number mapping > functions, but IMHO it would just add extra complexity. 5. For record - I still don't like the strings. Because it is not obvious what to specify here. There is no a set of values to use, and it makes the creation slower. It simplifies API, but not module.h. It simplifies the Lua module you built on top of this. For pure C modules and for the module.h itself it complicates things. But I do not like it not enough to insist. > XXX: Add a module API test. > > Part of #5273 > --- > src/box/key_def_api.c | 146 ++++++++++++++++++++++++++++++++++++++++++ > src/box/key_def_api.h | 123 +++++++++++++++++++++++++++++++++++ > src/exports.h | 2 + > 3 files changed, 271 insertions(+) > > diff --git a/src/box/key_def_api.c b/src/box/key_def_api.c > index 7f6c0ac55..19590095d 100644 > --- a/src/box/key_def_api.c > +++ b/src/box/key_def_api.c > @@ -55,6 +138,67 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) > return key_def; > } > > +box_key_def_t * > +box_key_def_new_ex(box_key_part_def_t *parts, uint32_t part_count) > +{ > + struct region *region = &fiber()->gc; > + size_t region_svp = region_used(region); > + size_t internal_parts_size; > + struct key_part_def *internal_parts = > + region_alloc_array(region, typeof(internal_parts[0]), > + part_count, &internal_parts_size); > + if (parts == NULL) { 6. I assume you wanted to check for internal_parts == NULL. > + diag_set(OutOfMemory, internal_parts_size, "region_alloc_array", > + "parts"); > + return NULL; > + } > + if (part_count == 0) { 7. internal_parts leaks here? I don't even know if region allocates anything when size is 0. I propose to move the check before the allocation. > + diag_set(IllegalParams, "At least one key part is required"); > + return NULL; > + } > + > + /* > + * It is possible to implement a function similar to > + * key_def_new() and eliminate -> > + * copying. However this would lead > + * to code duplication and would complicate maintanence, > + * so it worth to do so only if key_def creation will > + * appear on a hot path in some meaningful use case. > + */ > + uint32_t min_field_count = 0; > + for (uint32_t i = 0; i < part_count; ++i) { > + if (key_def_set_internal_part(&internal_parts[i], &parts[i], > + region) != 0) { > + region_truncate(region, region_svp); > + return NULL; > + } > + bool is_nullable = > + (parts[i].flags & BOX_KEY_PART_DEF_IS_NULLABLE_MASK) == > + BOX_KEY_PART_DEF_IS_NULLABLE_MASK; 8. You have internal_parts[i].is_nullable for that. > + if (!is_nullable && parts[i].fieldno > min_field_count) > + min_field_count = parts[i].fieldno; > + } > + > + struct key_def *key_def = key_def_new(internal_parts, part_count, > + false); > + region_truncate(region, region_svp); > + if (key_def == NULL) > + return NULL; > + > + /* > + * Update key_def->has_optional_parts and function > + * pointers. > + * > + * FIXME: It seems, this call should be part of > + * key_def_new(), because otherwise a callee function may > + * obtain an incorrect key_def. However I don't know any > + * case that would prove this guess. 9. Just do that and if all works, then it is fine to change. Looks like a bug which didn't have a place to explode until now. > + */ > + key_def_update_optionality(key_def, min_field_count); > + > + return key_def; > +} > + > void > box_key_def_delete(box_key_def_t *key_def) > diff --git a/src/box/key_def_api.h b/src/box/key_def_api.h > index 5b1c861f5..328a58c70 100644 > --- a/src/box/key_def_api.h > +++ b/src/box/key_def_api.h > @@ -44,11 +44,104 @@ typedef struct tuple box_tuple_t; > > typedef struct key_def box_key_def_t; > > +/** Key part definition flags. */ > +enum { > + BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT = 0, > + BOX_KEY_PART_DEF_IS_NULLABLE_MASK = > + 1 << BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT, 10. Does not it seem a little overcomplicated to you? Why do you need BOX_KEY_PART_DEF_IS_NULLABLE_SHIFT? Why not call BOX_KEY_PART_DEF_IS_NULLABLE_MASK BOX_KEY_PART_DEF_IS_NULLABLE_FLAG? Or BOX_KEY_PART_DEF_IS_NULLABLE. > +}; > + > +/** > + * It is recommended to verify size of > + * against this constant on the module side at build time. > + * Example: > + * > + * | #if !defined(__cplusplus) && !defined(static_assert) > + * | #define static_assert _Static_assert > + * | #endif > + * | > + * | (slash)* > + * | * Verify that has the same size when > + * | * compiled within tarantool and within the module. > + * | * > + * | * It is important, because the module allocates an array of key > + * | * parts and passes it to () tarantool > + * | * function. > + * | *(slash) > + * | static_assert(sizeof(box_key_part_def_t) == BOX_KEY_PART_DEF_T_SIZE, > + * | "sizeof(box_key_part_def_t)"); > + * > + * This snippet is not part of module.h, because portability of > + * static_assert() / _Static_assert() is dubious. It should be > + * decision of a module author how portable its code should be. > + */ > +enum { > + BOX_KEY_PART_DEF_T_SIZE = 64, > +}; > + > +/** > + * Public representation of a key part definition. > + * > + * Usage: Allocate an array of such key parts, initialize each > + * key part (call () and set necessary > + * fields), pass the array into () function. > + * > + * The idea of separation from internal is > + * to provide stable API and ABI for modules. > + * > + * New fields may be added into the end of the structure in later > + * tarantool versions. Also new flags may be introduced within > + * field. cannot be changed to a union (to > + * reuse for some other value), because it is verified even for > + * a non-string key part by (). > + * > + * Fields that are unknown at given tarantool version are ignored. > + */ > +typedef union PACKED { > + struct { > + /** Index of a tuple field (zero based). */ > + uint32_t fieldno; > + /** Flags, e.g. nullability. */ > + uint32_t flags; > + /** Type of the tuple field. */ > + const char *field_type; > + /** Collation name for string comparisons. */ > + const char *collation; > + /** > + * JSON path to point a nested field. > + * > + * Example: > + * > + * tuple: [1, {"foo": "bar"}] > + * key parts: [ > + * { > + * "fieldno": 2, > + * "type": "string", > + * "path": "foo" > + * } > + * ] > + * > + * => key: ["bar"] > + * > + * Note: When the path is given, > + * means type of the nested field. > + */ > + const char *path; > + }; > + /** > + * Padding to guarantee certain size across different > + * tarantool versions. > + */ > + char padding[BOX_KEY_PART_DEF_T_SIZE]; > +} box_key_part_def_t; 11. You know, we could also export a function like box_key_def_part_size(), which would return sizeof(key_part_def). Then the module would need to call it to allocate the part. And then use functions box_key_def_set_part_*() to set the part def fields. So the struct would be opaque, could be allocated in an array, and would be ABI stable. However probably it is the same what I proposed in the beginning of this email but with extra steps. > + > /** > * Create key definition with given field numbers and field types. > * > * May be used for tuple format creation and/or tuple comparison. > * > + * \sa (). > + * > * \param fields array with key field identifiers > * \param types array with key field types (see enum field_type) > * \param part_count the number of key fields > @@ -57,6 +150,28 @@ typedef struct key_def box_key_def_t; > API_EXPORT box_key_def_t * > box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count); > > +/** > + * Initialize a key part with default values. > + * > + * All trailing padding bytes are set to zero. 12. Why does it need to touch the padding? How is it related to ABI? > + * > + * All unknown bits are set to zero. > + */ > +API_EXPORT void > +box_key_part_def_create(box_key_part_def_t *part); > + > +/** > + * Create a key_def from given key parts. > + * > + * Unlike () this function allows to define > + * nullability, collation and other options for each key part. > + * > + * fields that are unknown at given tarantool > + * version are ignored. The same for unknown bits. > + */ > +API_EXPORT box_key_def_t * > +box_key_def_new_ex(box_key_part_def_t *parts, uint32_t part_count);