[tarantool-patches] [PATCH v2 03/10] yaml: introduce yaml.decode_tag
Konstantin Osipov
kostja at tarantool.org
Thu May 10 21:41:07 MSK 2018
* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/04/20 16:25]:
> Yaml.decode_tag allows to decode a single tag of a YAML document.
Why do you need a function to decode just the tag, not an entire
document with tags?
>
> To distinguish them YAML tags will be used. A client console for
> each message will try to find a tag. If a tag is absent, then the
> message is a simple response on a request.
response to
> If a tag is !print!, then the document consists of a single
> string, that must be printed. Such a document must be decoded to
> get the printed string. So the calls sequence is yaml.decode_tag
> + yaml.decode. The reason why a print message must be decoded
> is that a print() result on a server side can be not
> well-formatted YAML, and must be encoded into it to be correctly
> send. For example, when I do on a server side something like
> console.print('very bad YAML string')
>
> The result of a print() is not a YAML document, and to be sent it
> must be encoded into YAML on a server side.
>
> If a tag is !push!, then the document is sent via
> box.session.push, and must not be decoded. It can be just printed
> or ignored or something.
It is nice that you explain this convention in a changeset comment, but
I'd suggest moving the explanation to the relevant commit, i.e.
the one that uses the API you're adding in this commit.
>
> Needed for #2677
> ---
> test/app-tap/yaml.test.lua | 30 ++++++++++++++++++-
> third_party/lua-yaml/lyaml.cc | 67 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/test/app-tap/yaml.test.lua b/test/app-tap/yaml.test.lua
> index b81402cc0..ef64509fe 100755
> --- a/test/app-tap/yaml.test.lua
> +++ b/test/app-tap/yaml.test.lua
> @@ -70,7 +70,10 @@ local function test_output(test, s)
> end
>
> local function test_tagged(test, s)
> - test:plan(7)
> + test:plan(15)
> + --
> + -- Test encoding tags.
> + --
> local must_be_err = 'Usage: encode_tagged({prefix = <string>, handle = <string>}, object)'
> local prefix = 'tag:tarantool.io/push,2018'
> local ok, err = pcall(s.encode_tagged)
> @@ -87,6 +90,31 @@ local function test_tagged(test, s)
> test:is(ret, "%TAG !push! "..prefix.."\n--- 300\n...\n", "encode_tagged usage")
> ret = s.encode_tagged({handle = '!print!', prefix = prefix}, {a = 100, b = 200})
> test:is(ret, "%TAG !print! tag:tarantool.io/push,2018\n---\na: 100\nb: 200\n...\n", 'encode_tagged usage')
> + --
> + -- Test decoding tags.
> + --
> + must_be_err = "Usage: decode_tag(<string>)"
This is a bad test: any change to the error message will require
changes to the test. If you're testing that there is a relevant
usage message, you could search for a substring.
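Something along these lines would do. This is only a sketch in plain Lua: decode_tag_stub is a hypothetical stand-in that reproduces just the usage error (and, unlike lua_isstring, rejects numbers), and raises_with is a helper name I made up. In the real test the result would go through test:ok.

```lua
-- Hypothetical stand-in for yaml.decode_tag: it only reproduces the
-- argument check, so the sketch runs without Tarantool.
local function decode_tag_stub(str)
    if type(str) ~= 'string' then
        error('Usage: decode_tag(<string>)')
    end
end

-- True if calling f(...) raises an error whose message contains
-- `needle` as a plain substring (no pattern matching).
local function raises_with(needle, f, ...)
    local ok, err = pcall(f, ...)
    return not ok and string.find(tostring(err), needle, 1, true) ~= nil
end

-- The checks survive a rewording of the rest of the message.
assert(raises_with('Usage', decode_tag_stub))
assert(raises_with('decode_tag', decode_tag_stub, false))
```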
As a person reading your code and the test for it, I am at a loss as to
what kind of string the API expects. Is it a fragment of a YAML document
containing the tag? Is it a single YAML document? Is it a stream
with multiple documents? I can see you're passing the return value
from the encoder into decode_tag(). Why is the API asymmetric?
It should not be asymmetric in the first place, but if you decided
to make it so, this deserves an explanation in the
changeset comment.
> + ok, err = pcall(s.decode_tag)
I suggest you simply have encode_tagged and decode_tagged.
Or, even simpler, extend dump/load with tag support.
> + test:is(err, must_be_err, "decode_tag usage")
> + ok, err = pcall(s.decode_tag, false)
> + test:is(err, must_be_err, "decode_tag usage")
> + local handle, prefix = s.decode_tag(ret)
> + test:is(handle, "!print!", "handle is decoded ok")
> + test:is(prefix, "tag:tarantool.io/push,2018", "prefix is decoded ok")
> + local several_tags =
> +[[%TAG !tag1! tag:tarantool.io/tag1,2018
> +%TAG !tag2! tag:tarantool.io/tag2,2018
Please add a test case for multiple documents.
> +---
> +- 100
> +...
> +]]
> + ok, err = s.decode_tag(several_tags)
> + test:is(ok, nil, "can not decode multiple tags")
> + test:is(err, "can not decode multiple tags", "same")
> + local no_tags = s.encode(100)
> + handle, prefix = s.decode_tag(no_tags)
> + test:is(handle, nil, "no tag - no handle")
> + test:is(prefix, nil, "no tag - no prefix")
> end
>
> tap.test("yaml", function(test)
> diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
> index 8329f84e9..d24715edd 100644
> --- a/third_party/lua-yaml/lyaml.cc
> +++ b/third_party/lua-yaml/lyaml.cc
> @@ -361,6 +361,72 @@ static int l_load(lua_State *L) {
> return loader.document_count;
> }
>
> +/**
> + * Decode a global tag of document. Multiple tags can not be
> + * decoded. In a case of multiple documents only first is
> + * processed.
Decode a document taking into account document tags.
In case of success, pops the input from the stack and pushes
the document and a table with options, containing the tag prefix and
tag handle.
> + * @param YAML document in string.
> + * @retval nil, err Error occured during decoding. In the second
> + * value is error description.
> + * @retval nil, nil A document does not contain tags.
> + * @retval handle, prefix Handle and prefix of a tag.
> + */
> +static int
> +l_load_tag(struct lua_State *L)
Function name in C does not match the Lua name. Please make sure
the names match.
I understand you sometimes may avoid extra work because you don't
believe I am ever going to look at the patch, but this is not
extra work, this is just sloppy code.
> +{
> + if (lua_gettop(L) != 1 || !lua_isstring(L, 1))
> + return luaL_error(L, "Usage: decode_tag(<string>)");
> + size_t len;
> + const char *str = lua_tolstring(L, 1, &len);
> + struct lua_yaml_loader loader;
> + memset(&loader, 0, sizeof(loader));
> + loader.L = L;
> + loader.cfg = luaL_checkserializer(L);
> + if (yaml_parser_initialize(&loader.parser) == 0)
> + luaL_error(L, OOM_ERRMSG);
> + yaml_tag_directive_t *start, *end;
> + yaml_parser_set_input_string(&loader.parser, (const unsigned char *) str,
> + len);
Shouldn't you use a yaml typedef here rather than const unsigned
char *?
> + /* Initial parser step. Detect the documents start position. */
> + if (do_parse(&loader) == 0)
> + goto parse_error;
> + if (loader.event.type != YAML_STREAM_START_EVENT) {
> + lua_pushnil(L);
> + lua_pushstring(L, "expected STREAM_START_EVENT");
> + return 2;
> + }
What is the current convention for the dump/load API? Does it use nil,
err or lua_error() for errors?
Why did you decide to depart from the current convention?
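To make the two conventions concrete, here is a small sketch in plain Lua of how each looks from the caller's side. Both parse_* functions are hypothetical and accept only the string 'ok'. The sketch does not say which convention dump/load follow today, only that mixing the two in one module forces callers to handle both.

```lua
-- Convention 1: raise an error (what lua_error()/luaL_error() do in C).
local function parse_raising(s)
    if s ~= 'ok' then
        error('parse error', 0) -- level 0: no position prefix
    end
    return s
end

-- Convention 2: return nil plus an error message.
local function parse_returning(s)
    if s ~= 'ok' then
        return nil, 'parse error'
    end
    return s
end

-- Raising: the caller needs pcall to survive a failure.
local ok, err = pcall(parse_raising, 'bad')
assert(ok == false and err == 'parse error')

-- Returning: the caller inspects the first result.
local res, err2 = parse_returning('bad')
assert(res == nil and err2 == 'parse error')
```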
> + /* Parse a document start. */
> + if (do_parse(&loader) == 0)
> + goto parse_error;
> + if (loader.event.type == YAML_STREAM_END_EVENT)
> + goto no_tags;
> + assert(loader.event.type == YAML_DOCUMENT_START_EVENT);
> + start = loader.event.data.document_start.tag_directives.start;
> + end = loader.event.data.document_start.tag_directives.end;
> + if (start == end)
> + goto no_tags;
> + if (end - start > 1) {
> + lua_pushnil(L);
> + lua_pushstring(L, "can not decode multiple tags");
> + return 2;
> + }
> + lua_pushstring(L, (const char *) start->handle);
> + lua_pushstring(L, (const char *) start->prefix);
> + delete_event(&loader);
> + yaml_parser_delete(&loader.parser);
> + return 2;
Why not make the API symmetric in what it expects and returns?
dump(object, options) -> stream
load(stream) -> object, options or error
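A rough sketch of the shape I mean, in plain Lua. toy_dump and toy_load are hypothetical toys that handle a single scalar document and at most one %TAG directive. They are not the lyaml implementation and only illustrate that the options passed to the encoder come back from the decoder.

```lua
-- Toy encoder: object plus optional {handle =, prefix =} -> stream.
local function toy_dump(object, options)
    local head = ''
    if options ~= nil then
        head = ('%%TAG %s %s\n'):format(options.handle, options.prefix)
    end
    return head .. '--- ' .. tostring(object) .. '\n...\n'
end

-- Toy decoder: stream -> object, options (options is nil if untagged).
-- Raises an error on input it cannot parse.
local function toy_load(stream)
    local options
    local handle, prefix = stream:match('^%%TAG (%S+) (%S+)\n')
    if handle ~= nil then
        options = {handle = handle, prefix = prefix}
    end
    local body = stream:match('%-%-%- (.-)\n%.%.%.\n')
    if body == nil then
        error('malformed document')
    end
    return tonumber(body) or body, options
end

-- Round trip: what goes into the encoder comes out of the decoder.
local opts = {handle = '!push!', prefix = 'tag:tarantool.io/push,2018'}
local object, decoded = toy_load(toy_dump(300, opts))
assert(object == 300)
assert(decoded.handle == opts.handle and decoded.prefix == opts.prefix)
```

With this shape the console needs one call per message and can branch on the returned options.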
> +
> +parse_error:
> + lua_pushnil(L);
> + /* Make nil be before an error message. */
> + lua_insert(L, -2);
> + return 2;
> +
> +no_tags:
> + lua_pushnil(L);
> + return 1;
> +}
> +
> static int dump_node(struct lua_yaml_dumper *dumper);
>
> static yaml_char_t *get_yaml_anchor(struct lua_yaml_dumper *dumper) {
> @@ -753,6 +819,7 @@ static const luaL_Reg yamllib[] = {
> { "encode", l_dump },
> { "encode_tagged", l_dump_tagged },
> { "decode", l_load },
> + { "decode_tag", l_load_tag },
> { "new", l_new },
> { NULL, NULL}
> };
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov