[Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages

Sat Dec 7 00:16:57 MSK 2019

Hi! Thanks for the review.

> On Oct 17, 2019, at 00:45, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> wrote:
> 
> Hi! Thanks for the patch!
> 
> Sorry, but I am not sure that now the messages are
> better. Just look at it:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^...
>    ...
> 
> What is `^^^`, what is `...`? In the ticket it is said, that a user should have a
> context, but there is no a context.
> 
> I was thinking, that the author of 4339 rather meant that ^^^ should be below the
> error in an error string, like this:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here:
>             "{ "
>              ^^^
>    ...
> 
> But I do realize, that it is too complex. And still ^^^ looks confusing.
> It looks like arrows pointing up, while you are trying to put a
> context after them.
> 
> At least, you could make a context like this:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here
>             "{>>\0<<"
>    ...
> 
> So you would have arrows pointing right at the bad position.
> 
> I tried an error not related to eof, and got the same confusing
> result:
> 
>    tarantool> json.decode('{1: "world"}')
>    ---
>    - error: 'Expected object key string but found int on line 1 at character 2 here ^^^:
>        "world"}...'
>    ...
> 
> From that message looks like the error is in ':', or in "world", or
> somewhere above because of up arrows.
Let’s just remove “T_*” tokens. Let print of a context will be “wontfix". Double
arrows isn’t very good, IMO, because we don’t know the length of wrong token.

> See 3 comments below.
> 
>> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
>> index 3d7edbf28..aab662585 100644
>> --- a/third_party/lua-cjson/lua_cjson.c
>> +++ b/third_party/lua-cjson/lua_cjson.c
>> @@ -72,23 +72,23 @@ typedef enum {
>> } json_token_type_t;
>> 
>> static const char *json_token_type_name[] = {
>> -    "T_OBJ_BEGIN",
>> -    "T_OBJ_END",
>> -    "T_ARR_BEGIN",
>> -    "T_ARR_END",
>> -    "T_STRING",
>> -    "T_UINT",
>> -    "T_INT",
>> -    "T_NUMBER",
>> -    "T_BOOLEAN",
>> -    "T_NULL",
>> -    "T_COLON",
>> -    "T_COMMA",
>> -    "T_END",
>> -    "T_WHITESPACE",
>> -    "T_LINEFEED",
>> -    "T_ERROR",
>> -    "T_UNKNOWN",
>> +    "'{'",
>> +    "'}'",
>> +    "'['",
>> +    "']'",
>> +    "string",
>> +    "unsigned int",
>> +    "int",
>> +    "number",
>> +    "boolean",
>> +    "null",
>> +    "':'",
>> +    "','",
>> +    "'\\0'",
> 
> 1. I don't think this is a good idea. Probably,
> just 'end' would be better. Users don't write
> the terminating zero explicitly, and its presence
> in an error message is confusing.
Done.

>> +    "whitespace",
>> +    "'\\n'",
> 
> 2. Please, no special characters: no \0, no \n, no \t
> or anything else. Here 'new line' would look easier
> to understand.
Done.

>> +    "error",
>> +    "unknown symbol",
>>     NULL
>> };
>> 
>> @@ -825,6 +825,21 @@ static void json_next_token(json_parse_t *json, json_token_t *token)
>>     json_set_token_error(token, json, "invalid token");
>> }
>> 
>> +/*
>> + * Copy 10 or less (if string @a ptr is shorter than 10)
>> + * characters to static string buffer @a aux_str.
>> + *
>> + * @param aux_str String static buffer to fill.
>> + * @param ptr String with the context.
>> + */
>> +static void fill_context(char *aux_str, const char *ptr)
> 
> 3. So as a context you understand 10 symbols *after* the error?
> I don't think that will help with error understanding, because
> a user won't even see the error position. I think, you need to
> take some symbols before and some after the bad position.
> 
> Also, please, make 10 a enum constant.
> 
>> +{
>> +    int i = 0;
>> +    for(; ptr[i] != '\0' && i < 10; i++)
>> +        aux_str[i] = ptr[i];
>> +    aux_str[i] = '\0';
>> +}
>> +
>> /* This function does not return.
>>  * DO NOT CALL WITH DYNAMIC MEMORY ALLOCATED.
>>  * The only supported exception is the temporary parser string
> 

commit 5642c534a599dcb1ada004a4d6ff4555063ff03d
Author: Roman Khabibov <roman.habibov at tarantool.org>
Date:   Wed Oct 9 17:07:41 2019 +0300

    json: make error messages more readable
    
    Print tokens themselves instead of token names "T_*" in the error
    messages.
    
    Closes #4339

diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index fadfc74ec..de277f2f7 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -22,7 +22,7 @@ end
 
 tap.test("json", function(test)
     local serializer = require('json')
-    test:plan(40)
+    test:plan(51)
 
     test:test("unsigned", common.test_unsigned, serializer)
     test:test("signed", common.test_signed, serializer)
@@ -154,4 +154,34 @@ tap.test("json", function(test)
     _, err_msg = pcall(serializer.decode, '{"hello": "world",\n 100: 200}')
     test:ok(string.find(err_msg, 'line 2 at character 2') ~= nil,
             'mistake on second line')
+
+    --
+    -- gh-3316: Make sure that tokens 'T_*' are  absent in error
+    -- messages and a context is printed.
+    --
+    _, err_msg = pcall(serializer.decode, '{{: "world"}')
+    test:ok(string.find(err_msg, '\'{\'') ~= nil, '"{" instead of T_OBJ_BEGIN')
+    _, err_msg = pcall(serializer.decode, '{"a": "world"}}')
+    test:ok(string.find(err_msg, '\'}\'') ~= nil, '"}" instead of T_OBJ_END')
+    _, err_msg = pcall(serializer.decode, '{[: "world"}')
+    test:ok(string.find(err_msg, '\'[\'', 1, true) ~= nil,
+            '"[" instead of T_ARR_BEGIN')
+    _, err_msg = pcall(serializer.decode, '{]: "world"}')
+    test:ok(string.find(err_msg, '\']\'', 1, true) ~= nil,
+            '"]" instead of T_ARR_END')
+    _, err_msg = pcall(serializer.decode, '{1: "world"}')
+    test:ok(string.find(err_msg, 'int') ~= nil, 'int instead of T_INT')
+    _, err_msg = pcall(serializer.decode, '{1.0: "world"}')
+    test:ok(string.find(err_msg, 'number') ~= nil, 'number instead of T_NUMBER')
+    _, err_msg = pcall(serializer.decode, '{true: "world"}')
+    test:ok(string.find(err_msg, 'boolean') ~= nil,
+            'boolean instead of T_BOOLEAN')
+    _, err_msg = pcall(serializer.decode, '{null: "world"}')
+    test:ok(string.find(err_msg, 'null') ~= nil, 'null instead of T_NULL')
+    _, err_msg = pcall(serializer.decode, '{:: "world"}')
+    test:ok(string.find(err_msg, 'colon') ~= nil, 'colon instead of T_COLON')
+    _, err_msg = pcall(serializer.decode, '{,: "world"}')
+    test:ok(string.find(err_msg, 'comma') ~= nil, 'comma instead of T_COMMA')
+    _, err_msg = pcall(serializer.decode, '{')
+    test:ok(string.find(err_msg, 'end') ~= nil, 'end instead of T_END')
 end)
diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
index 3d25814f3..e68b52847 100644
--- a/third_party/lua-cjson/lua_cjson.c
+++ b/third_party/lua-cjson/lua_cjson.c
@@ -72,23 +72,23 @@ typedef enum {
 } json_token_type_t;
 
 static const char *json_token_type_name[] = {
-    "T_OBJ_BEGIN",
-    "T_OBJ_END",
-    "T_ARR_BEGIN",
-    "T_ARR_END",
-    "T_STRING",
-    "T_UINT",
-    "T_INT",
-    "T_NUMBER",
-    "T_BOOLEAN",
-    "T_NULL",
-    "T_COLON",
-    "T_COMMA",
-    "T_END",
-    "T_WHITESPACE",
-    "T_LINEFEED",
-    "T_ERROR",
-    "T_UNKNOWN",
+    "'{'",
+    "'}'",
+    "'['",
+    "']'",
+    "string",
+    "unsigned int",
+    "int",
+    "number",
+    "boolean",
+    "null",
+    "colon",
+    "comma",
+    "end",
+    "whitespace",
+    "line feed",
+    "error",
+    "unknown symbol",
     NULL
 };
 
@@ -914,7 +914,7 @@ static void json_parse_object_context(lua_State *l, json_parse_t *json)
         }
 
         if (token.type != T_COMMA)
-            json_throw_parse_error(l, json, "comma or object end", &token);
+            json_throw_parse_error(l, json, "comma or '}'", &token);
 
         json_next_token(json, &token);
     }
@@ -954,7 +954,7 @@ static void json_parse_array_context(lua_State *l, json_parse_t *json)
         }
 
         if (token.type != T_COMMA)
-            json_throw_parse_error(l, json, "comma or array end", &token);
+            json_throw_parse_error(l, json, "comma or ']'", &token);
 
         json_next_token(json, &token);
     }