[tarantool-patches] [PATCH v2 03/10] yaml: introduce yaml.decode_tag

Thu May 10 21:41:07 MSK 2018

* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/04/20 16:25]:

> Yaml.decode_tag allows to decode a single tag of a YAML document.

Why do you need a function to decode just the tag, not an entire
document with tags?

> 
> To distinguish them YAML tags will be used. A client console for
> each message will try to find a tag. If a tag is absent, then the
> message is a simple response on a request.

response to

> If a tag is !print!, then the document consists of a single
> string, that must be printed. Such a document must be decoded to
> get the printed string. So the calls sequence is yaml.decode_tag
> + yaml.decode. The reason why a print message must be decoded
> is that a print() result on a server side can be not
> well-formatted YAML, and must be encoded into it to be correctly
> send. For example, when I do on a server side something like
> console.print('very bad YAML string')
> 
> The result of a print() is not a YAML document, and to be sent it
> must be encoded into YAML on a server side.
> 
> If a tag is !push!, then the document is sent via
> box.session.push, and must not be decoded. It can be just printed
> or ignored or something.

It is nice you explain this convention in a changeset comment, but
I'd suggest to move the explanation to the relevant commit, i.e.
the one which uses the api you're adding in this commit.

> 
> Needed for #2677
> ---
>  test/app-tap/yaml.test.lua    | 30 ++++++++++++++++++-
>  third_party/lua-yaml/lyaml.cc | 67 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/test/app-tap/yaml.test.lua b/test/app-tap/yaml.test.lua
> index b81402cc0..ef64509fe 100755
> --- a/test/app-tap/yaml.test.lua
> +++ b/test/app-tap/yaml.test.lua
> @@ -70,7 +70,10 @@ local function test_output(test, s)
>  end
>  
>  local function test_tagged(test, s)
> -    test:plan(7)
> +    test:plan(15)
> +    --
> +    -- Test encoding tags.
> +    --
>      local must_be_err = 'Usage: encode_tagged({prefix = <string>, handle = <string>}, object)'
>      local prefix = 'tag:tarantool.io/push,2018'
>      local ok, err = pcall(s.encode_tagged)
> @@ -87,6 +90,31 @@ local function test_tagged(test, s)
>      test:is(ret, "%TAG !push! "..prefix.."\n--- 300\n...\n", "encode_tagged usage")
>      ret = s.encode_tagged({handle = '!print!', prefix = prefix}, {a = 100, b = 200})
>      test:is(ret, "%TAG !print! tag:tarantool.io/push,2018\n---\na: 100\nb: 200\n...\n", 'encode_tagged usage')
> +    --
> +    -- Test decoding tags.
> +    --
> +    must_be_err = "Usage: decode_tag(<string>)"

This is a bad test - any change to the error message will require
changes to the test. If you're testing that there is a relevant
usage message, you could search for substring.

As a person reading your code and test for it, I am at a loss what
kind of string the API expects. Is it a fragment of YAML document
containing the tag? Is it a single YAML document? Is it a stream
with multiple documents? I can see you're passing the return value
from the encoder into decode_tag(). Why is the API a-symmetric?

It should not be asymmetric in the first place, but if you decided
to make it one please this deserves an explanation in the
changeset comment.

> +    ok, err = pcall(s.decode_tag)

I suggest you simply have encode_tagged and decode_tagged. 
Or even simpler, extend dump/load, with tag support.

> +    test:is(err, must_be_err, "decode_tag usage")
> +    ok, err = pcall(s.decode_tag, false)
> +    test:is(err, must_be_err, "decode_tag usage")
> +    local handle, prefix = s.decode_tag(ret)
> +    test:is(handle, "!print!", "handle is decoded ok")
> +    test:is(prefix, "tag:tarantool.io/push,2018", "prefix is decoded ok")
> +    local several_tags =
> +[[%TAG !tag1! tag:tarantool.io/tag1,2018
> +%TAG !tag2! tag:tarantool.io/tag2,2018

Please add a  test case for multiple documents.

> +---
> +- 100
> +...
> +]]
> +    ok, err = s.decode_tag(several_tags)
> +    test:is(ok, nil, "can not decode multiple tags")
> +    test:is(err, "can not decode multiple tags", "same")
> +    local no_tags = s.encode(100)
> +    handle, prefix = s.decode_tag(no_tags)
> +    test:is(handle, nil, "no tag - no handle")
> +    test:is(prefix, nil, "no tag - no prefix")
>  end
>  
>  tap.test("yaml", function(test)
> diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
> index 8329f84e9..d24715edd 100644
> --- a/third_party/lua-yaml/lyaml.cc
> +++ b/third_party/lua-yaml/lyaml.cc
> @@ -361,6 +361,72 @@ static int l_load(lua_State *L) {
>     return loader.document_count;
>  }
>  
> +/**
> + * Decode a global tag of document. Multiple tags can not be
> + * decoded. In a case of multiple documents only first is
> + * processed.

Decode a document taking into account document tags. 
In case of success, pops the input from the stack and pushed
the document and a table with options, containing tag prefix and
tag handle.

> + * @param YAML document in string.
> + * @retval nil, err Error occured during decoding. In the second
> + *         value is error description.

> + * @retval nil, nil A document does not contain tags.
> + * @retval handle, prefix Handle and prefix of a tag.
> + */
> +static int
> +l_load_tag(struct lua_State *L)

Function name in C does not match the Lua name. Please make sure
the names match. 

I understand you sometimes may avoid extra work because you don't
believe I am ever going  to look at the patch, but this is not
extra work, this is just sloppy code.

> +{
> +   if (lua_gettop(L) != 1 || !lua_isstring(L, 1))
> +      return luaL_error(L, "Usage: decode_tag(<string>)");
> +   size_t len;
> +   const char *str = lua_tolstring(L, 1, &len);
> +   struct lua_yaml_loader loader;
> +   memset(&loader, 0, sizeof(loader));
> +   loader.L = L;
> +   loader.cfg = luaL_checkserializer(L);
> +   if (yaml_parser_initialize(&loader.parser) == 0)
> +      luaL_error(L, OOM_ERRMSG);
> +   yaml_tag_directive_t *start, *end;
> +   yaml_parser_set_input_string(&loader.parser, (const unsigned char *) str,
> +                                len);

Shouldn't you use yaml typedef here rather than const unsigned
char *?

> +   /* Initial parser step. Detect the documents start position. */
> +   if (do_parse(&loader) == 0)
> +      goto parse_error;
> +   if (loader.event.type != YAML_STREAM_START_EVENT) {
> +      lua_pushnil(L);
> +      lua_pushstring(L, "expected STREAM_START_EVENT");
> +      return 2;
> +   }

What is the current convention for dump/load API? Does it use nil,
err or lua_error() for errors? 

Why did you decide to depart from the current convention?

> +   /* Parse a document start. */
> +   if (do_parse(&loader) == 0)
> +      goto parse_error;
> +   if (loader.event.type == YAML_STREAM_END_EVENT)
> +      goto no_tags;
> +   assert(loader.event.type == YAML_DOCUMENT_START_EVENT);
> +   start = loader.event.data.document_start.tag_directives.start;
> +   end = loader.event.data.document_start.tag_directives.end;
> +   if (start == end)
> +      goto no_tags;
> +   if (end - start > 1) {
> +      lua_pushnil(L);
> +      lua_pushstring(L, "can not decode multiple tags");
> +      return 2;
> +   }
> +   lua_pushstring(L, (const char *) start->handle);
> +   lua_pushstring(L, (const char *) start->prefix);
> +   delete_event(&loader);
> +   yaml_parser_delete(&loader.parser);
> +   return 2;

Why not make the API symmetric in what it expects and returns?

dump(object, options) -> stream

load(stream) -> object, options or error

> +
> +parse_error:
> +   lua_pushnil(L);
> +   /* Make nil be before an error message. */
> +   lua_insert(L, -2);
> +   return 2;
> +
> +no_tags:
> +   lua_pushnil(L);
> +   return 1;
> +}
> +
>  static int dump_node(struct lua_yaml_dumper *dumper);
>  
>  static yaml_char_t *get_yaml_anchor(struct lua_yaml_dumper *dumper) {
> @@ -753,6 +819,7 @@ static const luaL_Reg yamllib[] = {
>     { "encode", l_dump },
>     { "encode_tagged", l_dump_tagged },
>     { "decode", l_load },
> +   { "decode_tag", l_load_tag },
>     { "new",    l_new },
>     { NULL, NULL}
>  };
-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov