[tarantool-patches] Re: [PATCH v2 2/3] lua-yaml: fix boolean/null representation in yaml

Alexander Turenko alexander.turenko at tarantool.org
Mon Feb 18 21:55:22 MSK 2019


I answered to the comments below and noted the fixes were made. Rebased
the patchset on top of fresh 2.1.

The new patch is at end of the email.

WBR, Alexander Turenko.

On Tue, Feb 05, 2019 at 10:36:40PM +0300, Vladislav Shpilevoy wrote:
> Thanks for the fixes!
> 
> > > > diff --git a/test/app-tap/console.test.lua b/test/app-tap/console.test.lua
> > > > index 4f915afd7..68f273234 100755
> > > > --- a/test/app-tap/console.test.lua
> > > > +++ b/test/app-tap/console.test.lua
> > > > @@ -77,11 +77,24 @@ test:is(yaml.decode(client:read(EOL)), '', "clear delimiter")
> > > >    --
> > > >    -- gh-3476: yaml.encode encodes 'false' and 'true' incorrectly.
> > > > +-- gh-3662: yaml.encode encodes booleans with multiline format.
> > > > +-- gh-3583: yaml.encode encodes null incorrectly.
> > > >    --
> > > > -test:is(type(yaml.decode(yaml.encode('false'))), 'string')
> > > > -test:is(type(yaml.decode(yaml.encode('true'))), 'string')
> > > > -test:is(type(yaml.decode(yaml.encode({a = 'false'})).a), 'string')
> > > > -test:is(type(yaml.decode(yaml.encode({a = 'false'})).a), 'string')
> > > 
> > > 1. Are you sure that this is the correct behaviour? So now it is
> > > not true that decode(encode(obj)) == obj? Looks utterly confusing.
> > > Or why else you removed these tests?
> > 
> > They are work as before, but now superseded with simpler ones.
> 
> It is not obvious. I see a test
> 
>     yaml.encode(false) == "--- false\n...\n"
> 
> I see another test:
> 
>     yaml.decode('false') == false
> 
> But where is this? -
> 
>     yaml.decode("--- false\n...\n") == false

The removed test cases covered one case that is not covered with the new
cases: they decode a yaml document, not just a scalar. And it is
confusing why I remove test cases in the patch. So it worth to just add
them back.

Added.

> 
> And why "false" == "--- false\n...\n" according to the tests
> in the patch?

`---` and `...` is a yaml document start and end, see [1]. A sequence of
documents forms a yaml stream like a sequence of stanzas forms a xml
stream. We decode one document to its value. Multiple documents however
forms a table. Anyway, it does not matter much in context of this issue.

[1]: https://yaml.org/spec/1.2/spec.html#id2800401

> 
> > > 
> > > > +{
> > > > +   if (len > 5)
> > > > +      return YAML_NO_MATCH;
> > > > +   if (strcmp(str, "false") == 0 || strcmp(str, "no") == 0)
> > > > +      return YAML_FALSE;
> > > > +   if (strcmp(str, "true") == 0 || strcmp(str, "yes") == 0)
> > > 
> > > 3. I think, it is redundant to firstly check length and then
> > > call strcmp, which also calculates length (by checking zero
> > > byte). We need either use only strcmp(), or use length comparison
> > > + memcmp. The same below for the null parser.
> > 
> > Okay, it was the premature optimization.
> > 
> > The more important thing is that is was incorrect: "fals" gives
> > YAML_FALSE.
> > 
> > I had fixed it like so, is it okay?
> > 
> > ```
> > if ((len == 5 && memcmp(str, "false", 5) == 0) ||
> >      (len == 2 && memcmp(str, "no", 2) == 0)) {
> >     if (out != NULL)
> >        *out = false;
> >     return 0;
> > }
> > ...
> > ```
> 
> Yes, this fix is ok, thanks.

Ok.

> 
> > 
> > > 
> > > > +      return YAML_TRUE;
> > > > +   return YAML_NO_MATCH;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Verify whether a string represents a null literal in YAML.
> > > > + *
> > > > + * Non-standard: don't match an empty string as null.
> > > > + */
> > > > +static yaml_type
> > > > +yaml_get_null(const char *str, const size_t len){
> > > 
> > > 4. ?
> > > 
> > >      static inline int
> > >      yaml_parse_null(const char *str, size_t len, bool *is_null);
> > 
> > is_null is redundant here, because a return code contain all needed
> > information. Agreed except that. Changed.
> 
> My point was to standardize parser functions so as to always
> return 0/-1 on success/error, and always return an actual value
> via an out parameter. But now I see, that -1 here is not really an
> error, but rather 'not being of a requested type'. So here we
> do not have even a notion of 'error'.
> 
> In such a case, please, consider the fixes below. Also, I removed
> optionality of out parameter in yaml_parse_bool in order to get rid
> of 'if'.

I don't mind. Integrated these changes into the patch (with minor
wording / formatting changes).

> 
> diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
> index 5afcace1b..a4e03a8ba 100644
> --- a/third_party/lua-yaml/lyaml.cc
> +++ b/third_party/lua-yaml/lyaml.cc
> @@ -96,46 +96,43 @@ struct lua_yaml_dumper {
>  /**
>   * Verify whether a string represents a boolean literal in YAML.
> - *
>   * Non-standard: only subset of YAML 1.1 boolean literals are
>   * treated as boolean values.
> - *
> - * Return 0 at success, -1 at error.
> + * @param str Literal.
> + * @param len Length of @a str.
> + * @param[out] out Result boolean value.
> + * @retval Whether @a str represents a boolean value or not.
>   */
> -static inline int
> -yaml_parse_bool(const char *str, size_t len, bool *out)
> +static inline bool
> +yaml_is_bool(const char *str, size_t len, bool *out)
>  {
>     if ((len == 5 && memcmp(str, "false", 5) == 0) ||
>         (len == 2 && memcmp(str, "no", 2) == 0)) {
> -      if (out != NULL)
> -         *out = false;
> -      return 0;
> +      *out = false;
> +      return true;
>     }
>     if ((len == 4 && memcmp(str, "true", 4) == 0) ||
>         (len == 3 && memcmp(str, "yes", 3) == 0)) {
> -      if (out != NULL)
> -         *out = true;
> -      return 0;
> +      *out = true;
> +      return true;
>     }
> -   return -1;
> +   return false;
>  }
>  /**
>   * Verify whether a string represents a null literal in YAML.
> - *
>   * Non-standard: don't match an empty string, 'Null' and 'NULL' as
>   * null.
> - *
> - * Return 0 at success, -1 at error.
> + * @retval Whether @a str represents NULL or not.
>   */
> -static inline int
> -yaml_parse_null(const char *str, size_t len)
> +static inline bool
> +yaml_is_null(const char *str, size_t len)
>  {
>     if (len == 1 && str[0] == '~')
> -      return 0;
> +      return true;
>     if (len == 4 && memcmp(str, "null", 4) == 0)
> -      return 0;
> -   return -1;
> +      return true;
> +   return false;
>  }
>  static void generate_error_message(struct lua_yaml_loader *loader) {
> @@ -254,9 +251,9 @@ static void load_scalar(struct lua_yaml_loader *loader) {
>           return;
>        } else if (!strcmp(tag, "bool")) {
>           bool value = false;
> -         int rc = yaml_parse_bool(str, length, &value);
> +         bool rc = yaml_is_bool(str, length, &value);
>           (void) rc;
> -         assert(rc == 0);
> +         assert(rc);

I removed the assert() here, because yaml.decode('!!bool null') should
not lead to an assertion fail. I didn't investigate why the behaviour is
to produce `false` rather then raise an error, just keep the behaviour
the same as before. However I don't add a test, because it is unclear
whether the module behaves in this way with intention or by occasion.

>           lua_pushboolean(loader->L, value);
>           return;
>        } else if (!strcmp(tag, "binary")) {
> @@ -275,10 +272,10 @@ static void load_scalar(struct lua_yaml_loader *loader) {
>            */
>           lua_pushliteral(loader->L, "");
>           return;
> -      } else if (yaml_parse_null(str, length) == 0) {
> +      } else if (yaml_is_null(str, length)) {
>           luaL_pushnull(loader->L);
>           return;
> -      } else if (yaml_parse_bool(str, length, &value) == 0) {
> +      } else if (yaml_is_bool(str, length, &value)) {
>           lua_pushboolean(loader->L, value);
>           return;
>        }
> @@ -610,6 +607,7 @@ static int dump_node(struct lua_yaml_dumper *dumper)
>     yaml_event_t ev;
>     yaml_scalar_style_t style = YAML_PLAIN_SCALAR_STYLE;
>     int is_binary = 0;
> +   bool unused;
>     char buf[FPCONV_G_FMT_BUFSIZE];
>     struct luaL_field field;
> @@ -644,9 +642,8 @@ static int dump_node(struct lua_yaml_dumper *dumper)
>        return dump_table(dumper, &field);
>     case MP_STR:
>        str = lua_tolstring(dumper->L, -1, &len);
> -      if ((yaml_parse_null(str, len) == 0)
> -         || (yaml_parse_bool(str, len, NULL) == 0)
> -         || (lua_isnumber(dumper->L, -1))) {
> +      if (yaml_is_null(str, len) || yaml_is_bool(str, len, &unused) ||
> +          lua_isnumber(dumper->L, -1)) {
>           /*
>            * The string is convertible to a null, a boolean or
>            * a number, quote it to preserve its type.
> 

----

commit 8531b4af03016a69d82bff35504ef6ae3ffbdb82
Author: Alexander Turenko <alexander.turenko at tarantool.org>
Date:   Tue Jan 22 01:03:01 2019 +0300

    lua-yaml: fix strings literals encoding in yaml
    
    yaml.encode() now wraps a string literal whose content is equal to a
    null or a boolean value representation in YAML into single quotes. Those
    literals are: 'false', 'no', 'true', 'yes', '~', 'null'.
    
    Reverted the a2d7643c commit to use single quotes instead of multiline
    encoding for 'true' and 'false' string literals.
    
    Follows up #3476
    Closes #3662
    Closes #3583

diff --git a/test/app-tap/console.test.lua b/test/app-tap/console.test.lua
index 4f915afd7..a03df5277 100755
--- a/test/app-tap/console.test.lua
+++ b/test/app-tap/console.test.lua
@@ -21,7 +21,7 @@ local EOL = "\n...\n"
 
 test = tap.test("console")
 
-test:plan(59)
+test:plan(70)
 
 -- Start console and connect to it
 local server = console.listen(CONSOLE_SOCKET)
@@ -77,12 +77,28 @@ test:is(yaml.decode(client:read(EOL)), '', "clear delimiter")
 
 --
 -- gh-3476: yaml.encode encodes 'false' and 'true' incorrectly.
+-- gh-3662: yaml.encode encodes booleans with multiline format.
+-- gh-3583: yaml.encode encodes null incorrectly.
 --
+
 test:is(type(yaml.decode(yaml.encode('false'))), 'string')
 test:is(type(yaml.decode(yaml.encode('true'))), 'string')
 test:is(type(yaml.decode(yaml.encode({a = 'false'})).a), 'string')
 test:is(type(yaml.decode(yaml.encode({a = 'false'})).a), 'string')
 
+test:is(yaml.encode(false), "--- false\n...\n")
+test:is(yaml.encode(true), "--- true\n...\n")
+test:is(yaml.encode('false'), "--- 'false'\n...\n")
+test:is(yaml.encode('true'), "--- 'true'\n...\n")
+test:is(yaml.encode(nil), "--- null\n...\n")
+
+test:is(yaml.decode('false'), false)
+test:is(yaml.decode('no'), false)
+test:is(yaml.decode('true'), true)
+test:is(yaml.decode('yes'), true)
+test:is(yaml.decode('~'), nil)
+test:is(yaml.decode('null'), nil)
+
 box.cfg{
     listen=IPROTO_SOCKET;
     memtx_memory = 107374182,
diff --git a/third_party/lua-yaml/lyaml.cc b/third_party/lua-yaml/lyaml.cc
index bd876ab29..46c98bde1 100644
--- a/third_party/lua-yaml/lyaml.cc
+++ b/third_party/lua-yaml/lyaml.cc
@@ -94,6 +94,55 @@ struct lua_yaml_dumper {
    luaL_Buffer yamlbuf;
 };
 
+/**
+ * Verify whether a string represents a boolean literal in YAML.
+ *
+ * Non-standard: only subset of YAML 1.1 boolean literals are
+ * treated as boolean values.
+ *
+ * @param str Literal to check.
+ * @param len Length of @a str.
+ * @param[out] out Result boolean value.
+ *
+ * @retval Whether @a str represents a boolean value.
+ */
+static inline bool
+yaml_is_bool(const char *str, size_t len, bool *out)
+{
+   if ((len == 5 && memcmp(str, "false", 5) == 0) ||
+       (len == 2 && memcmp(str, "no", 2) == 0)) {
+      *out = false;
+      return true;
+   }
+   if ((len == 4 && memcmp(str, "true", 4) == 0) ||
+       (len == 3 && memcmp(str, "yes", 3) == 0)) {
+      *out = true;
+      return true;
+   }
+   return false;
+}
+
+/**
+ * Verify whether a string represents a null literal in YAML.
+ *
+ * Non-standard: don't match an empty string, 'Null' and 'NULL' as
+ * null.
+ *
+ * @param str Literal to check.
+ * @param len Length of @a str.
+ *
+ * @retval Whether @a str represents a null value.
+ */
+static inline bool
+yaml_is_null(const char *str, size_t len)
+{
+   if (len == 1 && str[0] == '~')
+      return true;
+   if (len == 4 && memcmp(str, "null", 4) == 0)
+      return true;
+   return false;
+}
+
 static void generate_error_message(struct lua_yaml_loader *loader) {
    char buf[256];
    luaL_Buffer b;
@@ -209,7 +258,9 @@ static void load_scalar(struct lua_yaml_loader *loader) {
          lua_pushnumber(loader->L, dval);
          return;
       } else if (!strcmp(tag, "bool")) {
-         lua_pushboolean(loader->L, !strcmp(str, "true") || !strcmp(str, "yes"));
+         bool value = false;
+         yaml_is_bool(str, length, &value);
+         lua_pushboolean(loader->L, value);
          return;
       } else if (!strcmp(tag, "binary")) {
          frombase64(loader->L, (const unsigned char *)str, length);
@@ -218,20 +269,20 @@ static void load_scalar(struct lua_yaml_loader *loader) {
    }
 
    if (loader->event.data.scalar.style == YAML_PLAIN_SCALAR_STYLE) {
-      if (!strcmp(str, "~")) {
-         luaL_pushnull(loader->L);
-         return;
-      } else if (!strcmp(str, "true") || !strcmp(str, "yes")) {
-         lua_pushboolean(loader->L, 1);
-         return;
-      } else if (!strcmp(str, "false") || !strcmp(str, "no")) {
-         lua_pushboolean(loader->L, 0);
+      bool value;
+      if (!length) {
+         /*
+          * Non-standard: an empty value/document is null
+          * according to the standard, but we decode it as an
+          * empty string.
+          */
+         lua_pushliteral(loader->L, "");
          return;
-      } else if (!strcmp(str, "null")) {
+      } else if (yaml_is_null(str, length)) {
          luaL_pushnull(loader->L);
          return;
-      } else if (!length) {
-         lua_pushliteral(loader->L, "");
+      } else if (yaml_is_bool(str, length, &value)) {
+         lua_pushboolean(loader->L, value);
          return;
       }
 
@@ -547,7 +598,6 @@ static int yaml_is_flow_mode(struct lua_yaml_dumper *dumper) {
                (evp->type == YAML_MAPPING_START_EVENT &&
                 evp->data.mapping_start.style == YAML_FLOW_MAPPING_STYLE)) {
             return 1;
-            break;
          }
       }
    }
@@ -564,6 +614,8 @@ static int dump_node(struct lua_yaml_dumper *dumper)
    int is_binary = 0;
    char buf[FPCONV_G_FMT_BUFSIZE];
    struct luaL_field field;
+   bool unused;
+   (void) unused;
 
    int top = lua_gettop(dumper->L);
    luaL_checkfield(dumper->L, dumper->cfg, top, &field);
@@ -596,8 +648,12 @@ static int dump_node(struct lua_yaml_dumper *dumper)
       return dump_table(dumper, &field);
    case MP_STR:
       str = lua_tolstring(dumper->L, -1, &len);
-      if (lua_isnumber(dumper->L, -1)) {
-         /* string is convertible to number, quote it to preserve type */
+      if (yaml_is_null(str, len) || yaml_is_bool(str, len, &unused) ||
+          lua_isnumber(dumper->L, -1)) {
+         /*
+          * The string is convertible to a null, a boolean or
+          * a number, quote it to preserve its type.
+          */
          style = YAML_SINGLE_QUOTED_SCALAR_STYLE;
          break;
       }
@@ -605,12 +661,10 @@ static int dump_node(struct lua_yaml_dumper *dumper)
       if (utf8_check_printable(str, len)) {
          if (yaml_is_flow_mode(dumper)) {
             style = YAML_SINGLE_QUOTED_SCALAR_STYLE;
-         } else if (strstr(str, "\n\n") != NULL || strcmp(str, "true") == 0 ||
-		    strcmp(str, "false") == 0) {
+         } else if (strstr(str, "\n\n") != 0) {
             /*
              * Tarantool-specific: use literal style for string
-             * with empty lines and strings representing boolean
-             * types.
+             * with empty lines.
              * Useful for tutorial().
              */
             style = YAML_LITERAL_SCALAR_STYLE;




More information about the Tarantool-patches mailing list