From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 18 Dec 2018 23:46:57 +0300 From: Vladimir Davydov Subject: Re: [PATCH v6 0/8] box: Indexes by JSON path Message-ID: <20181218204657.x6lgyogmtnmvaklv@esperanza> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: To: Kirill Shcherbatov Cc: tarantool-patches@freelists.org, kostja@tarantool.org List-ID: On Mon, Dec 17, 2018 at 09:52:44AM +0300, Kirill Shcherbatov wrote: > Sometimes field data could have complex document structure. > When this structure is consistent across whole document, > you are able to create an index by JSON path. > > Changes in version 6: > - key_part(s) path string is not 0-terminated anymore > - global formats_epoch pool used to initialize tuple_format:epoch > numeric field > - more informative error messages > - splitted changes to few new additional preparatory commits > - got rid off hashtable in vy_stmt_build > - prevent traversing field tree twice in vy_stmt_build > - inline basic field_by_part with slowpath alternative > - simplified tuple_format1_can_store_format2_tuples > - splitted tuple_init_field_map to helper routines > - moved tests to separate file > - large and descriptive comments for many things > > http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes > https://github.com/tarantool/tarantool/issues/1012 > > Kirill Shcherbatov (8): > box: refactor tuple_validate_raw > box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} > box: build path to field string uniformly on error I'd love to apply these preparatory patches before looking at the rest, but they are badly split and leave the code in a rather incomplete state. Let's sort them out first. What I want to see: 1. Implement tuple_validate with tuple_init_field_map. Do not mix field_map_template in this patch, as it has nothing to do with that. And it's not just refactoring. Actually, it can degrade the function performance. Please fix the commit message accordingly. 2. Refactor field type/nullability checks. Basically, this is what patch #2 of your series does, but it's not good enough. First, mp_type_is_compatible and key_mp_type_validate share the same piece of code. You should factor it out instead of copying and pasting, e.g. - field_mp_type_is_compatible returns true if given field and msgpack types are compatible. It should probably be defined in field_def.c. key_mp_type should probably be renamed to field_mp_type and moved to field_def.c, as it deals with field_type. I think it's also worth moving mp_type_strs to field_def.c, too. - key_part_validate takes a pointer to msgpack and a field type. Calls field_mp_type_is_compatible(mp_typeof(*key), field_type) to check if the given key part is valid. Sets diag on error. - key_validate_parts calls key_part_validate for each msgpack part. I'm not sure about function names - please think/try by yourself. Second, key_mp_type_validate doesn't need to take an error code anymore - now we always pass ER_KEY_PART_COUNT to it. Third, changing %u to %s in error messages should be done later, see step 4. And you should use int2str instead of tt_snprintf for it. This patch is going to do quite a bit of work - that's OK I think, just enumerate all the changes done by it in a bullet list in the commit message. 3. Add a helper function for getting a path to a json_token in a json_tree. Should be defined in json.c. We will use this function for reporting tuple format violations. I understand that if we go down this path, we won't be able to report top-level field names as they are defined in a tuple_dictionary, not in a json_tree. I think it's OK as the messages will be descriptive enough even if we use field numbers instead. At the same time, this will allow us to move a relatively big chunk of code that doesn't really have anything to do with tuples out of tuple_format.c. This function should be implemented without an alloc() callback. It should be passed a buffer of a fixed length. It should return the number of bytes that would be printed if there were enough space in the buffer, in snprintf-like manner, so that one could first call it with a 0-length buffer to figure out the buffer length, then allocate a buffer and call it again, this time for real. Don't forget to add a unit test. 4. Use the helper introduced at step #3 to report field names in error messages. You should change %u to %s in error messages in this patch, not in patch 2, because after patch 2 we still use field numbers, not paths. Let's leave field_map validation introduction to the main patch of the series for now. It would be great to add it in a separate patch, but it wouldn't work for plain indexes as missing fields are already checked with ER_MIN_FIELD_COUNT, i.e. we would just add some dead code, and unfortunately we can't get rid of ER_MIN_FIELD_COUNT, because space.format fields aren't assigned slots in the field map so we can't use the field map to check if they are present. This makes me wonder what we are going to do when we get to support setting JSON schema in space.format. How will we check that all formatted fields are present then? We need to figure it out before proceeding with the main patch. Let's discuss it. If you agree with my points above, please send the four patches in a separate series. Do not resubmit the rest of the series before I've committed the preparatory patches. Side notes: - Never add dead code in a patch. All code added by a patch should be tested. Introducing tuple_field_map_validate before implementing JSON index support violates this rule. - Try not to rewrite the same piece of code multiple times in the same series, because this practically doubles my work as a reviewer. Patches of a patch set should fit in like pieces of a jigsaw puzzle. Here's an example of you doing this: @@ -547,8 +559,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (validate && !mp_type_is_compatible(type, field->type, is_nullable) != 0) { - diag_set(ClientError, ER_FIELD_TYPE, - tt_sprintf("%u", token.num + 1), + const char *path = + tuple_field_json_path(format, field); + assert(path != NULL); + diag_set(ClientError, ER_FIELD_TYPE, path, field_type_strs[field->type]); return -1; } and in the next patch @@ -560,10 +922,13 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, !mp_type_is_compatible(type, field->type, is_nullable) != 0) { const char *path = - tuple_field_json_path(format, field); - assert(path != NULL); - diag_set(ClientError, ER_FIELD_TYPE, path, - field_type_strs[field->type]); + tuple_field_json_path(format, field, + region); + if (path != NULL) { + diag_set(ClientError, ER_FIELD_TYPE, + path, + field_type_strs[field->type]); + } return -1; } if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {