From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Kirill Shcherbatov <kshcherbatov@tarantool.org>
Cc: tarantool-patches@freelists.org, kostja@tarantool.org
Subject: Re: [PATCH v6 0/8] box: Indexes by JSON path
Date: Tue, 18 Dec 2018 23:46:57 +0300 [thread overview]
Message-ID: <20181218204657.x6lgyogmtnmvaklv@esperanza> (raw)
In-Reply-To: <cover.1544995259.git.kshcherbatov@tarantool.org>
On Mon, Dec 17, 2018 at 09:52:44AM +0300, Kirill Shcherbatov wrote:
> Sometimes field data could have complex document structure.
> When this structure is consistent across whole document,
> you are able to create an index by JSON path.
>
> Changes in version 6:
> - key_part(s) path string is not 0-terminated anymore
> - global formats_epoch pool used to initialize tuple_format:epoch
> numeric field
> - more informative error messages
> - splitted changes to few new additional preparatory commits
> - got rid off hashtable in vy_stmt_build
> - prevent traversing field tree twice in vy_stmt_build
> - inline basic field_by_part with slowpath alternative
> - simplified tuple_format1_can_store_format2_tuples
> - splitted tuple_init_field_map to helper routines
> - moved tests to separate file
> - large and descriptive comments for many things
>
> http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
> https://github.com/tarantool/tarantool/issues/1012
>
> Kirill Shcherbatov (8):
> box: refactor tuple_validate_raw
> box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH}
> box: build path to field string uniformly on error
I'd love to apply these preparatory patches before looking at the rest,
but they are badly split and leave the code in a rather incomplete
state. Let's sort them out first.
What I want to see:
1. Implement tuple_validate with tuple_init_field_map. Do not mix
field_map_template in this patch, as it has nothing to do with that.
And it's not just refactoring. Actually, it can degrade the function
performance. Please fix the commit message accordingly.
2. Refactor field type/nullability checks. Basically, this is what patch
#2 of your series does, but it's not good enough.
First, mp_type_is_compatible and key_mp_type_validate share the same
piece of code. You should factor it out instead of copying and
pasting, e.g.
- field_mp_type_is_compatible returns true if given field and msgpack
types are compatible. It should probably be defined in field_def.c.
key_mp_type should probably be renamed to field_mp_type and moved
to field_def.c, as it deals with field_type. I think it's also
worth moving mp_type_strs to field_def.c, too.
- key_part_validate takes a pointer to msgpack and a field type.
Calls field_mp_type_is_compatible(mp_typeof(*key), field_type) to
check if the given key part is valid. Sets diag on error.
- key_validate_parts calls key_part_validate for each msgpack part.
I'm not sure about function names - please think/try by yourself.
Second, key_mp_type_validate doesn't need to take an error code
anymore - now we always pass ER_KEY_PART_COUNT to it.
Third, changing %u to %s in error messages should be done later, see
step 4. And you should use int2str instead of tt_snprintf for it.
This patch is going to do quite a bit of work - that's OK I think,
just enumerate all the changes done by it in a bullet list in the
commit message.
3. Add a helper function for getting a path to a json_token in a
json_tree. Should be defined in json.c. We will use this function for
reporting tuple format violations. I understand that if we go down
this path, we won't be able to report top-level field names as they
are defined in a tuple_dictionary, not in a json_tree. I think it's
OK as the messages will be descriptive enough even if we use field
numbers instead. At the same time, this will allow us to move a
relatively big chunk of code that doesn't really have anything to do
with tuples out of tuple_format.c.
This function should be implemented without an alloc() callback. It
should be passed a buffer of a fixed length. It should return the
number of bytes that would be printed if there were enough space in
the buffer, in snprintf-like manner, so that one could first call it
with a 0-length buffer to figure out the buffer length, then allocate
a buffer and call it again, this time for real.
Don't forget to add a unit test.
4. Use the helper introduced at step #3 to report field names in error
messages. You should change %u to %s in error messages in this patch,
not in patch 2, because after patch 2 we still use field numbers, not
paths.
Let's leave field_map validation introduction to the main patch of the
series for now. It would be great to add it in a separate patch, but it
wouldn't work for plain indexes as missing fields are already checked
with ER_MIN_FIELD_COUNT, i.e. we would just add some dead code, and
unfortunately we can't get rid of ER_MIN_FIELD_COUNT, because
space.format fields aren't assigned slots in the field map so we can't
use the field map to check if they are present. This makes me wonder
what we are going to do when we get to support setting JSON schema in
space.format. How will we check that all formatted fields are present
then? We need to figure it out before proceeding with the main patch.
Let's discuss it.
If you agree with my points above, please send the four patches in a
separate series. Do not resubmit the rest of the series before I've
committed the preparatory patches.
Side notes:
- Never add dead code in a patch. All code added by a patch should be
tested. Introducing tuple_field_map_validate before implementing JSON
index support violates this rule.
- Try not to rewrite the same piece of code multiple times in the same
series, because this practically doubles my work as a reviewer.
Patches of a patch set should fit in like pieces of a jigsaw puzzle.
Here's an example of you doing this:
@@ -547,8 +559,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
if (validate &&
!mp_type_is_compatible(type, field->type,
is_nullable) != 0) {
- diag_set(ClientError, ER_FIELD_TYPE,
- tt_sprintf("%u", token.num + 1),
+ const char *path =
+ tuple_field_json_path(format, field);
+ assert(path != NULL);
+ diag_set(ClientError, ER_FIELD_TYPE, path,
field_type_strs[field->type]);
return -1;
}
and in the next patch
@@ -560,10 +922,13 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
!mp_type_is_compatible(type, field->type,
is_nullable) != 0) {
const char *path =
- tuple_field_json_path(format, field);
- assert(path != NULL);
- diag_set(ClientError, ER_FIELD_TYPE, path,
- field_type_strs[field->type]);
+ tuple_field_json_path(format, field,
+ region);
+ if (path != NULL) {
+ diag_set(ClientError, ER_FIELD_TYPE,
+ path,
+ field_type_strs[field->type]);
+ }
return -1;
}
if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
prev parent reply other threads:[~2018-12-18 20:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-17 6:52 Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 1/8] box: refactor tuple_validate_raw Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 2/8] box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 3/8] box: build path to field string uniformly on error Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 4/8] box: introduce JSON Indexes Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 5/8] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2018-12-27 11:51 ` [tarantool-patches] " Konstantin Osipov
2018-12-27 11:52 ` Konstantin Osipov
2018-12-27 11:57 ` [tarantool-patches] " Konstantin Osipov
2018-12-17 6:52 ` [PATCH v6 6/8] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 7/8] box: introduce offset_slot cache in key_part Kirill Shcherbatov
2018-12-17 6:52 ` [PATCH v6 8/8] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-12-18 20:58 ` Vladimir Davydov
2018-12-18 20:46 ` Vladimir Davydov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181218204657.x6lgyogmtnmvaklv@esperanza \
--to=vdavydov.dev@gmail.com \
--cc=kostja@tarantool.org \
--cc=kshcherbatov@tarantool.org \
--cc=tarantool-patches@freelists.org \
--subject='Re: [PATCH v6 0/8] box: Indexes by JSON path' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox