Tarantool development patches archive
 help / color / mirror / Atom feed
* Re: [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages
       [not found] ` <ad789ccf-092e-99ff-d8cf-1b439c788e27@tarantool.org>
@ 2019-12-06 21:16   ` Roman Khabibov
  2019-12-10 23:20     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 2+ messages in thread
From: Roman Khabibov @ 2019-12-06 21:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi! Thanks for the review.

> On Oct 17, 2019, at 00:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the patch!
> 
> Sorry, but I am not sure that now the messages are
> better. Just look at it:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^...
>    ...
> 
> What is `^^^`, what is `...`? In the ticket it is said, that a user should have a
> context, but there is no a context.
> 
> I was thinking, that the author of 4339 rather meant that ^^^ should be below the
> error in an error string, like this:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here:
>             "{ "
>              ^^^
>    ...
> 
> But I do realize, that it is too complex. And still ^^^ looks confusing.
> It looks like arrows pointing up, while you are trying to put a
> context after them.
> 
> At least, you could make a context like this:
> 
>    tarantool> json.decode('{')
>    ---
>    - error: Expected object key string but found '\0' on line 1 at character 2 here
>             "{>>\0<<"
>    ...
> 
> So you would have arrows pointing right at the bad position.
> 
> I tried an error not related to eof, and got the same confusing
> result:
> 
>    tarantool> json.decode('{1: "world"}')
>    ---
>    - error: 'Expected object key string but found int on line 1 at character 2 here ^^^:
>        "world"}...'
>    ...
> 
> From that message looks like the error is in ':', or in "world", or
> somewhere above because of up arrows.
Let’s just remove “T_*” tokens. Let print of a context will be “wontfix". Double
arrows isn’t very good, IMO, because we don’t know the length of wrong token.

> See 3 comments below.
> 
>> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
>> index 3d7edbf28..aab662585 100644
>> --- a/third_party/lua-cjson/lua_cjson.c
>> +++ b/third_party/lua-cjson/lua_cjson.c
>> @@ -72,23 +72,23 @@ typedef enum {
>> } json_token_type_t;
>> 
>> static const char *json_token_type_name[] = {
>> -    "T_OBJ_BEGIN",
>> -    "T_OBJ_END",
>> -    "T_ARR_BEGIN",
>> -    "T_ARR_END",
>> -    "T_STRING",
>> -    "T_UINT",
>> -    "T_INT",
>> -    "T_NUMBER",
>> -    "T_BOOLEAN",
>> -    "T_NULL",
>> -    "T_COLON",
>> -    "T_COMMA",
>> -    "T_END",
>> -    "T_WHITESPACE",
>> -    "T_LINEFEED",
>> -    "T_ERROR",
>> -    "T_UNKNOWN",
>> +    "'{'",
>> +    "'}'",
>> +    "'['",
>> +    "']'",
>> +    "string",
>> +    "unsigned int",
>> +    "int",
>> +    "number",
>> +    "boolean",
>> +    "null",
>> +    "':'",
>> +    "','",
>> +    "'\\0'",
> 
> 1. I don't think this is a good idea. Probably,
> just 'end' would be better. Users don't write
> the terminating zero explicitly, and its presence
> in an error message is confusing.
Done.

>> +    "whitespace",
>> +    "'\\n'",
> 
> 2. Please, no special characters: no \0, no \n, no \t
> or anything else. Here 'new line' would look easier
> to understand.
Done.

>> +    "error",
>> +    "unknown symbol",
>>     NULL
>> };
>> 
>> @@ -825,6 +825,21 @@ static void json_next_token(json_parse_t *json, json_token_t *token)
>>     json_set_token_error(token, json, "invalid token");
>> }
>> 
>> +/*
>> + * Copy 10 or less (if string @a ptr is shorter than 10)
>> + * characters to static string buffer @a aux_str.
>> + *
>> + * @param aux_str String static buffer to fill.
>> + * @param ptr String with the context.
>> + */
>> +static void fill_context(char *aux_str, const char *ptr)
> 
> 3. So as a context you understand 10 symbols *after* the error?
> I don't think that will help with error understanding, because
> a user won't even see the error position. I think, you need to
> take some symbols before and some after the bad position.
> 
> Also, please, make 10 a enum constant.
> 
>> +{
>> +    int i = 0;
>> +    for(; ptr[i] != '\0' && i < 10; i++)
>> +        aux_str[i] = ptr[i];
>> +    aux_str[i] = '\0';
>> +}
>> +
>> /* This function does not return.
>>  * DO NOT CALL WITH DYNAMIC MEMORY ALLOCATED.
>>  * The only supported exception is the temporary parser string
> 

commit 5642c534a599dcb1ada004a4d6ff4555063ff03d
Author: Roman Khabibov <roman.habibov@tarantool.org>
Date:   Wed Oct 9 17:07:41 2019 +0300

    json: make error messages more readable
    
    Print tokens themselves instead of token names "T_*" in the error
    messages.
    
    Closes #4339

diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index fadfc74ec..de277f2f7 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -22,7 +22,7 @@ end
 
 tap.test("json", function(test)
     local serializer = require('json')
-    test:plan(40)
+    test:plan(51)
 
     test:test("unsigned", common.test_unsigned, serializer)
     test:test("signed", common.test_signed, serializer)
@@ -154,4 +154,34 @@ tap.test("json", function(test)
     _, err_msg = pcall(serializer.decode, '{"hello": "world",\n 100: 200}')
     test:ok(string.find(err_msg, 'line 2 at character 2') ~= nil,
             'mistake on second line')
+
+    --
+    -- gh-3316: Make sure that tokens 'T_*' are  absent in error
+    -- messages and a context is printed.
+    --
+    _, err_msg = pcall(serializer.decode, '{{: "world"}')
+    test:ok(string.find(err_msg, '\'{\'') ~= nil, '"{" instead of T_OBJ_BEGIN')
+    _, err_msg = pcall(serializer.decode, '{"a": "world"}}')
+    test:ok(string.find(err_msg, '\'}\'') ~= nil, '"}" instead of T_OBJ_END')
+    _, err_msg = pcall(serializer.decode, '{[: "world"}')
+    test:ok(string.find(err_msg, '\'[\'', 1, true) ~= nil,
+            '"[" instead of T_ARR_BEGIN')
+    _, err_msg = pcall(serializer.decode, '{]: "world"}')
+    test:ok(string.find(err_msg, '\']\'', 1, true) ~= nil,
+            '"]" instead of T_ARR_END')
+    _, err_msg = pcall(serializer.decode, '{1: "world"}')
+    test:ok(string.find(err_msg, 'int') ~= nil, 'int instead of T_INT')
+    _, err_msg = pcall(serializer.decode, '{1.0: "world"}')
+    test:ok(string.find(err_msg, 'number') ~= nil, 'number instead of T_NUMBER')
+    _, err_msg = pcall(serializer.decode, '{true: "world"}')
+    test:ok(string.find(err_msg, 'boolean') ~= nil,
+            'boolean instead of T_BOOLEAN')
+    _, err_msg = pcall(serializer.decode, '{null: "world"}')
+    test:ok(string.find(err_msg, 'null') ~= nil, 'null instead of T_NULL')
+    _, err_msg = pcall(serializer.decode, '{:: "world"}')
+    test:ok(string.find(err_msg, 'colon') ~= nil, 'colon instead of T_COLON')
+    _, err_msg = pcall(serializer.decode, '{,: "world"}')
+    test:ok(string.find(err_msg, 'comma') ~= nil, 'comma instead of T_COMMA')
+    _, err_msg = pcall(serializer.decode, '{')
+    test:ok(string.find(err_msg, 'end') ~= nil, 'end instead of T_END')
 end)
diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
index 3d25814f3..e68b52847 100644
--- a/third_party/lua-cjson/lua_cjson.c
+++ b/third_party/lua-cjson/lua_cjson.c
@@ -72,23 +72,23 @@ typedef enum {
 } json_token_type_t;
 
 static const char *json_token_type_name[] = {
-    "T_OBJ_BEGIN",
-    "T_OBJ_END",
-    "T_ARR_BEGIN",
-    "T_ARR_END",
-    "T_STRING",
-    "T_UINT",
-    "T_INT",
-    "T_NUMBER",
-    "T_BOOLEAN",
-    "T_NULL",
-    "T_COLON",
-    "T_COMMA",
-    "T_END",
-    "T_WHITESPACE",
-    "T_LINEFEED",
-    "T_ERROR",
-    "T_UNKNOWN",
+    "'{'",
+    "'}'",
+    "'['",
+    "']'",
+    "string",
+    "unsigned int",
+    "int",
+    "number",
+    "boolean",
+    "null",
+    "colon",
+    "comma",
+    "end",
+    "whitespace",
+    "line feed",
+    "error",
+    "unknown symbol",
     NULL
 };
 
@@ -914,7 +914,7 @@ static void json_parse_object_context(lua_State *l, json_parse_t *json)
         }
 
         if (token.type != T_COMMA)
-            json_throw_parse_error(l, json, "comma or object end", &token);
+            json_throw_parse_error(l, json, "comma or '}'", &token);
 
         json_next_token(json, &token);
     }
@@ -954,7 +954,7 @@ static void json_parse_array_context(lua_State *l, json_parse_t *json)
         }
 
         if (token.type != T_COMMA)
-            json_throw_parse_error(l, json, "comma or array end", &token);
+            json_throw_parse_error(l, json, "comma or ']'", &token);
 
         json_next_token(json, &token);
     }

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages
  2019-12-06 21:16   ` [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages Roman Khabibov
@ 2019-12-10 23:20     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 2+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-10 23:20 UTC (permalink / raw)
  To: Roman Khabibov; +Cc: tarantool-patches

Hi! Thanks for the patch!

On 06/12/2019 22:16, Roman Khabibov wrote:
> Hi! Thanks for the review.
> 
>> On Oct 17, 2019, at 00:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>>
>> Hi! Thanks for the patch!
>>
>> Sorry, but I am not sure that now the messages are
>> better. Just look at it:
>>
>>    tarantool> json.decode('{')
>>    ---
>>    - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^...
>>    ...
>>
>> What is `^^^`, what is `...`? In the ticket it is said, that a user should have a
>> context, but there is no a context.
>>
>> I was thinking, that the author of 4339 rather meant that ^^^ should be below the
>> error in an error string, like this:
>>
>>    tarantool> json.decode('{')
>>    ---
>>    - error: Expected object key string but found '\0' on line 1 at character 2 here:
>>             "{ "
>>              ^^^
>>    ...
>>
>> But I do realize, that it is too complex. And still ^^^ looks confusing.
>> It looks like arrows pointing up, while you are trying to put a
>> context after them.
>>
>> At least, you could make a context like this:
>>
>>    tarantool> json.decode('{')
>>    ---
>>    - error: Expected object key string but found '\0' on line 1 at character 2 here
>>             "{>>\0<<"
>>    ...
>>
>> So you would have arrows pointing right at the bad position.
>>
>> I tried an error not related to eof, and got the same confusing
>> result:
>>
>>    tarantool> json.decode('{1: "world"}')
>>    ---
>>    - error: 'Expected object key string but found int on line 1 at character 2 here ^^^:
>>        "world"}...'
>>    ...
>>
>> From that message looks like the error is in ':', or in "world", or
>> somewhere above because of up arrows.
> Let’s just remove “T_*” tokens. Let print of a context will be “wontfix". Double
> arrows isn’t very good, IMO, because we don’t know the length of wrong token.
> 

Sorry, but the context is a part of the ticket. The ticket
consists of 3 points:

  - Replace T_*;
  - Provide context;
  - Provide line number.

Line numbers already work. This patch removes T_*. But
the context still is not provided. Context is asked not
because it is necessary. But because it is really hard to
count character count, when it is big.

Please, add the context.

Talking of "don't know the length of a wrong token" - ok, then
you can provide just arrows pointing at the error from left,
if you don't know right end:

    tarantool> json.decode('{blablabla')
    ---
    - error: Expected object key string but found 'b' on line 1 at character 2 here
             "{ >> blabla..."
    ...

> @@ -154,4 +154,34 @@ tap.test("json", function(test)
>      _, err_msg = pcall(serializer.decode, '{"hello": "world",\n 100: 200}')
>      test:ok(string.find(err_msg, 'line 2 at character 2') ~= nil,
>              'mistake on second line')
> +
> +    --
> +    -- gh-3316: Make sure that tokens 'T_*' are  absent in error

Double whitespace after 'are'.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-12-10 23:20 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191015155953.10246-1-roman.habibov@tarantool.org>
     [not found] ` <ad789ccf-092e-99ff-d8cf-1b439c788e27@tarantool.org>
2019-12-06 21:16   ` [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages Roman Khabibov
2019-12-10 23:20     ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox