[tarantool-patches] Re: [PATCH v2 03/10] yaml: introduce yaml.decode_tag

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Thu May 24 23:50:36 MSK 2018



On 10/05/2018 21:41, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/04/20 16:25]:
> 
>> Yaml.decode_tag allows to decode a single tag of a YAML document.
> 
> Why do you need a function to decode just the tag, not an entire
> document with tags?

Fixed on the branch. See the complete patch at the end of the letter. But
IMO now decode() function looks ugly. Option 'tag_only' requires to
create a Lua table on each call with the single option when it is needed.
And 'tag_only' itself is very strange. It is the same as we would make
message pack types as options like this:

- msgpack.decode_array/unsigned/string ...
+ msgpack.decode(..., {array/unsigned/string... = true})

But as you wish.

> 
>>
>> To distinguish them YAML tags will be used. A client console for
>> each message will try to find a tag. If a tag is absent, then the
>> message is a simple response on a request.
> 
> response to

Ditto.

>> @@ -87,6 +90,31 @@ local function test_tagged(test, s)
>>       test:is(ret, "%TAG !push! "..prefix.."\n--- 300\n...\n", "encode_tagged usage")
>>       ret = s.encode_tagged({handle = '!print!', prefix = prefix}, {a = 100, b = 200})
>>       test:is(ret, "%TAG !print! tag:tarantool.io/push,2018\n---\na: 100\nb: 200\n...\n", 'encode_tagged usage')
>> +    --
>> +    -- Test decoding tags.
>> +    --
>> +    must_be_err = "Usage: decode_tag(<string>)"
> 
> This is a bad test - any change to the error message will require
> changes to the test. If you're testing that there is a relevant
> usage message, you could search for substring.

@@ -93,11 +93,10 @@ local function test_tagged(test, s)
      --
      -- Test decoding tags.
      --
-    must_be_err = "Usage: yaml.decode(document, [{tag_only = boolean}])"
      ok, err = pcall(s.decode)
-    test:is(err, must_be_err, "decode usage")
+    test:isnt(err:find('Usage'), nil, "decode usage")
      ok, err = pcall(s.decode, false)
-    test:is(err, must_be_err, "decode usage")
+    test:isnt(err:find('Usage'), nil, "decode usage")
      local handle, prefix = s.decode(ret, {tag_only = true})
      test:is(handle, "!print!", "handle is decoded ok")
      test:is(prefix, "tag:tarantool.io/push,2018", "prefix is decoded ok")

Answers on your questions below are in the commit comment already. I will
copy-paste them here for you.

> 
> As a person reading your code and test for it, I am at a loss what
> kind of string the API expects. Is it a fragment of YAML document
> containing the tag? Is it a single YAML document? Is it a stream
> with multiple documents?

Cite: "Yaml.decode tag_only option allows to decode a single tag of a YAML document.".

> I can see you're passing the return value
> from the encoder into decode_tag(). Why is the API a-symmetric?
> 
> It should not be asymmetric in the first place, but if you decided
> to make it one please this deserves an explanation in the
> changeset comment.

Cite: "If a tag is !print!, then the document consists of a single
     string, that must be printed. Such a document must be decoded to
     get the printed string."

Cite: "If a tag is !push!, then the document is sent via
     box.session.push, and must not be decoded. It can be just printed
     or ignored or something."

Summary: it makes no sense to decode a push message in console, if it will be
in any case encoded back and printed. So the message can be just printed without
decode+encode.

> 
>> +    ok, err = pcall(s.decode_tag)
> 
> I suggest you simply have encode_tagged and decode_tagged.
> Or even simpler, extend dump/load, with tag support.

I do not need full tag support. I need to be able to encode with
a single tag - 'push' or 'print', and decode a *single* tag, not
the entire document. I explained why in the commit comment, and above.

> 
>> +    test:is(err, must_be_err, "decode_tag usage")
>> +    ok, err = pcall(s.decode_tag, false)
>> +    test:is(err, must_be_err, "decode_tag usage")
>> +    local handle, prefix = s.decode_tag(ret)
>> +    test:is(handle, "!print!", "handle is decoded ok")
>> +    test:is(prefix, "tag:tarantool.io/push,2018", "prefix is decoded ok")
>> +    local several_tags =
>> +[[%TAG !tag1! tag:tarantool.io/tag1,2018
>> +%TAG !tag2! tag:tarantool.io/tag2,2018
> 
> Please add a  test case for multiple documents.

+    local several_documents =
+[[
+%TAG !tag1! tag:tarantool.io/tag1,2018
+--- 1
+...
+
+%TAG !tag2! tag:tarantool.io/tag2,2018
+--- 2
+...
+
+]]
+    handle, prefix = s.decode(several_documents, {tag_only = true})
+    test:is(handle, "!tag1!", "tag handle on multiple documents")
+    test:is(prefix, "tag:tarantool.io/tag1,2018",
+            "tag prefix on multiple documents")

>> diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
>> index 8329f84e9..d24715edd 100644
>> --- a/third_party/lua-yaml/lyaml.cc
>> +++ b/third_party/lua-yaml/lyaml.cc
>> @@ -361,6 +361,72 @@ static int l_load(lua_State *L) {
>>      return loader.document_count;
>>   }
>>   
>> +/**
>> + * Decode a global tag of document. Multiple tags can not be
>> + * decoded. In a case of multiple documents only first is
>> + * processed.
> 
> Decode a document taking into account document tags.
> In case of success, pops the input from the stack and pushed
> the document and a table with options, containing tag prefix and
> tag handle.

Is not relevant after the previous comment.

> 
>> + * @param YAML document in string.
>> + * @retval nil, err Error occured during decoding. In the second
>> + *         value is error description.
> 
> 
>> + * @retval nil, nil A document does not contain tags.
>> + * @retval handle, prefix Handle and prefix of a tag.
>> + */
>> +static int
>> +l_load_tag(struct lua_State *L)
> 
> Function name in C does not match the Lua name. Please make sure
> the names match.

I did it advisedly. Lyaml library has such naming convention. We must
either convert the entire file, or obey this style.

> 
> I understand you sometimes may avoid extra work

Ok, thanks for the 'compliment'. I am 'happy' you think so about me. I
am working every day more then 12 hours for you, so 'obviously' I very very
'do not like' to work trying to avoid any. You are right as always.

> because you don't believe I am ever going  to look at the patch, but this is not
> extra work, this is just sloppy code.
> 
>> +{
>> +   if (lua_gettop(L) != 1 || !lua_isstring(L, 1))
>> +      return luaL_error(L, "Usage: decode_tag(<string>)");
>> +   size_t len;
>> +   const char *str = lua_tolstring(L, 1, &len);
>> +   struct lua_yaml_loader loader;
>> +   memset(&loader, 0, sizeof(loader));
>> +   loader.L = L;
>> +   loader.cfg = luaL_checkserializer(L);
>> +   if (yaml_parser_initialize(&loader.parser) == 0)
>> +      luaL_error(L, OOM_ERRMSG);
>> +   yaml_tag_directive_t *start, *end;
>> +   yaml_parser_set_input_string(&loader.parser, (const unsigned char *) str,
>> +                                len);
> 
> Shouldn't you use yaml typedef here rather than const unsigned
> char *?

It is not used in yaml_parser_set_input_string declaration, but as you wish.

-   yaml_parser_set_input_string(&loader.parser,
-                                (const unsigned char *) document, len);
+   yaml_parser_set_input_string(&loader.parser, (yaml_char_t *) document, len);
     if (lua_gettop(L) > 1) {


> 
>> +   /* Initial parser step. Detect the documents start position. */
>> +   if (do_parse(&loader) == 0)
>> +      goto parse_error;
>> +   if (loader.event.type != YAML_STREAM_START_EVENT) {
>> +      lua_pushnil(L);
>> +      lua_pushstring(L, "expected STREAM_START_EVENT");
>> +      return 2;
>> +   }
> 
> What is the current convention for dump/load API? Does it use nil,
> err or lua_error() for errors?

Throw on OOM, return nil and error object on other error types.

> 
> Why did you decide to depart from the current convention?

I respect the convention. Our convention is throw on OOM, and nil + error
object else.

> 
>> +   /* Parse a document start. */
>> +   if (do_parse(&loader) == 0)
>> +      goto parse_error;
>> +   if (loader.event.type == YAML_STREAM_END_EVENT)
>> +      goto no_tags;
>> +   assert(loader.event.type == YAML_DOCUMENT_START_EVENT);
>> +   start = loader.event.data.document_start.tag_directives.start;
>> +   end = loader.event.data.document_start.tag_directives.end;
>> +   if (start == end)
>> +      goto no_tags;
>> +   if (end - start > 1) {
>> +      lua_pushnil(L);
>> +      lua_pushstring(L, "can not decode multiple tags");
>> +      return 2;
>> +   }
>> +   lua_pushstring(L, (const char *) start->handle);
>> +   lua_pushstring(L, (const char *) start->prefix);
>> +   delete_event(&loader);
>> +   yaml_parser_delete(&loader.parser);
>> +   return 2;
> 
> Why not make the API symmetric in what it expects and returns?
> 
> dump(object, options) -> stream

Dump does not return a stream. It returns a single string with YAML documents.

Maybe you meant load? Load returns multiple values, and throws on any error, when
no options are specified. This is its old behavior that I can not break. But for
new functionality (tag_only option) I use the correct convention: objects or
nil + error.

See below the entire patch.

===============================================================================

commit 2d30cb1f3c28faf25fc39e33d7215d09d0814e3f
Author: Vladislav Shpilevoy <v.shpilevoy at tarantool.org>
Date:   Tue Apr 3 23:12:50 2018 +0300

     yaml: introduce yaml.decode tag_only option
     
     Yaml.decode tag_only option allows to decode a single tag of a
     YAML document. For #2677 it is needed to detect different push
     types in text console: print pushes via console.print, and actual
     pushes via box.session.push.
     
     To distinguish them YAML tags will be used. A client console for
     each message will try to find a tag. If a tag is absent, then the
     message is a simple response to a request.
     
     If a tag is !print!, then the document consists of a single
     string, that must be printed. Such a document must be decoded to
     get the printed string. So the calls sequence is
     yaml.decode(tag_only) + yaml.decode. The reason why a print
     message must be decoded is that a print() result on a server side
     can be not well-formatted YAML, and must be encoded into it to be
     correctly sent. For example, when I do on a server side something
     like this:
     
     console.print('very bad YAML string')
     
     The result of a print() is not a YAML document, and to be sent it
     must be encoded into YAML on a server side.
     
     If a tag is !push!, then the document is sent via
     box.session.push, and must not be decoded. It can be just printed
     or ignored or something.
     
     Needed for #2677

diff --git a/test/app-tap/yaml.test.lua b/test/app-tap/yaml.test.lua
index bb75ce4ea..c6eca100d 100755
--- a/test/app-tap/yaml.test.lua
+++ b/test/app-tap/yaml.test.lua
@@ -70,7 +70,10 @@ local function test_output(test, s)
  end
  
  local function test_tagged(test, s)
-    test:plan(7)
+    test:plan(17)
+    --
+    -- Test encoding tags.
+    --
      local prefix = 'tag:tarantool.io/push,2018'
      local ok, err = pcall(s.encode_tagged)
      test:isnt(err:find('Usage'), nil, "encode_tagged usage")
@@ -86,6 +89,45 @@ local function test_tagged(test, s)
      test:is(ret, "%TAG !push! "..prefix.."\n--- 300\n...\n", "encode_tagged usage")
      ret = s.encode_tagged({a = 100, b = 200}, {handle = '!print!', prefix = prefix})
      test:is(ret, "%TAG !print! tag:tarantool.io/push,2018\n---\na: 100\nb: 200\n...\n", 'encode_tagged usage')
+    --
+    -- Test decoding tags.
+    --
+    ok, err = pcall(s.decode)
+    test:isnt(err:find('Usage'), nil, "decode usage")
+    ok, err = pcall(s.decode, false)
+    test:isnt(err:find('Usage'), nil, "decode usage")
+    local handle, prefix = s.decode(ret, {tag_only = true})
+    test:is(handle, "!print!", "handle is decoded ok")
+    test:is(prefix, "tag:tarantool.io/push,2018", "prefix is decoded ok")
+    local several_tags =
+[[%TAG !tag1! tag:tarantool.io/tag1,2018
+%TAG !tag2! tag:tarantool.io/tag2,2018
+---
+- 100
+...
+]]
+    ok, err = s.decode(several_tags, {tag_only = true})
+    test:is(ok, nil, "can not decode multiple tags")
+    test:is(err, "can not decode multiple tags", "same")
+    local no_tags = s.encode(100)
+    handle, prefix = s.decode(no_tags, {tag_only = true})
+    test:is(handle, nil, "no tag - no handle")
+    test:is(prefix, nil, "no tag - no prefix")
+    local several_documents =
+[[
+%TAG !tag1! tag:tarantool.io/tag1,2018
+--- 1
+...
+
+%TAG !tag2! tag:tarantool.io/tag2,2018
+--- 2
+...
+
+]]
+    handle, prefix = s.decode(several_documents, {tag_only = true})
+    test:is(handle, "!tag1!", "tag handle on multiple documents")
+    test:is(prefix, "tag:tarantool.io/tag1,2018",
+            "tag prefix on multiple documents")
  end
  
  tap.test("yaml", function(test)
diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
index 33057da5d..03573dd39 100644
--- a/third_party/lua-yaml/lyaml.cc
+++ b/third_party/lua-yaml/lyaml.cc
@@ -309,6 +309,53 @@ static int load_node(struct lua_yaml_loader *loader) {
     }
  }
  
+/**
+ * Decode YAML document global tag onto Lua stack.
+ * @param loader Initialized loader to load tag from.
+ * @retval 2 Tag handle and prefix are pushed. Both are not nil.
+ * @retval 2 If an error occurred during decoding. Nil and error
+ *         string are pushed.
+ */
+static int load_tag(struct lua_yaml_loader *loader) {
+   yaml_tag_directive_t *start, *end;
+   /* Initial parser step. Detect the documents start position. */
+   if (do_parse(loader) == 0)
+      goto parse_error;
+   if (loader->event.type != YAML_STREAM_START_EVENT) {
+      lua_pushnil(loader->L);
+      lua_pushstring(loader->L, "expected STREAM_START_EVENT");
+      return 2;
+   }
+   /* Parse a document start. */
+   if (do_parse(loader) == 0)
+      goto parse_error;
+   if (loader->event.type == YAML_STREAM_END_EVENT)
+      goto no_tags;
+   assert(loader->event.type == YAML_DOCUMENT_START_EVENT);
+   start = loader->event.data.document_start.tag_directives.start;
+   end = loader->event.data.document_start.tag_directives.end;
+   if (start == end)
+      goto no_tags;
+   if (end - start > 1) {
+      lua_pushnil(loader->L);
+      lua_pushstring(loader->L, "can not decode multiple tags");
+      return 2;
+   }
+   lua_pushstring(loader->L, (const char *) start->handle);
+   lua_pushstring(loader->L, (const char *) start->prefix);
+   return 2;
+
+parse_error:
+   lua_pushnil(loader->L);
+   /* Make nil be before an error message. */
+   lua_insert(loader->L, -2);
+   return 2;
+
+no_tags:
+   lua_pushnil(loader->L);
+   return 1;
+}
+
  static void load(struct lua_yaml_loader *loader) {
     if (!do_parse(loader))
        return;
@@ -340,34 +387,59 @@ static void load(struct lua_yaml_loader *loader) {
     }
  }
  
+/**
+ * Decode YAML documents onto Lua stack. First value on stack
+ * is string with YAML document. Second value is options:
+ * {tag_only = boolean}. Options are not required.
+ * @retval N Pushed document count, if tag_only option is not
+ *         specified, or equals to false.
+ * @retval 2 If tag_only option is true. Tag handle and prefix
+ *         are pushed.
+ * @retval 2 If an error occurred during decoding. Nil and error
+ *         string are pushed.
+ */
  static int l_load(lua_State *L) {
     struct lua_yaml_loader loader;
-
-   luaL_argcheck(L, lua_isstring(L, 1), 1, "must provide a string argument");
-
+   if (! lua_isstring(L, 1)) {
+usage_error:
+      return luaL_error(L, "Usage: yaml.decode(document, "\
+                        "[{tag_only = boolean}])");
+   }
+   size_t len;
+   const char *document = lua_tolstring(L, 1, &len);
     loader.L = L;
     loader.cfg = luaL_checkserializer(L);
     loader.validevent = 0;
     loader.error = 0;
     loader.document_count = 0;
-
-   /* create table used to track anchors */
-   lua_newtable(L);
-   loader.anchortable_index = lua_gettop(L);
-
     if (!yaml_parser_initialize(&loader.parser))
-      luaL_error(L, OOM_ERRMSG);
-   yaml_parser_set_input_string(&loader.parser,
-      (const unsigned char *)lua_tostring(L, 1), lua_strlen(L, 1));
-   load(&loader);
+      return luaL_error(L, OOM_ERRMSG);
+   yaml_parser_set_input_string(&loader.parser, (yaml_char_t *) document, len);
+   bool tag_only;
+   if (lua_gettop(L) > 1) {
+      if (! lua_istable(L, 2))
+         goto usage_error;
+      lua_getfield(L, 2, "tag_only");
+      tag_only = lua_isboolean(L, -1) && lua_toboolean(L, -1);
+   } else {
+      tag_only = false;
+   }
  
+   int rc;
+   if (! tag_only) {
+      /* create table used to track anchors */
+      lua_newtable(L);
+      loader.anchortable_index = lua_gettop(L);
+      load(&loader);
+      if (loader.error)
+         lua_error(L);
+      rc = loader.document_count;
+   } else {
+      rc = load_tag(&loader);
+   }
     delete_event(&loader);
     yaml_parser_delete(&loader.parser);
-
-   if (loader.error)
-      lua_error(L);
-
-   return loader.document_count;
+   return rc;
  }
  
  static int dump_node(struct lua_yaml_dumper *dumper);



More information about the Tarantool-patches mailing list