[Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages
Roman Khabibov
roman.habibov at tarantool.org
Sat Dec 7 00:16:57 MSK 2019
Hi! Thanks for the review.
> On Oct 17, 2019, at 00:45, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> wrote:
>
> Hi! Thanks for the patch!
>
> Sorry, but I am not sure that now the messages are
> better. Just look at it:
>
> tarantool> json.decode('{')
> ---
> - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^...
> ...
>
> What is `^^^`, what is `...`? In the ticket it is said, that a user should have a
> context, but there is no a context.
>
> I was thinking, that the author of 4339 rather meant that ^^^ should be below the
> error in an error string, like this:
>
> tarantool> json.decode('{')
> ---
> - error: Expected object key string but found '\0' on line 1 at character 2 here:
> "{ "
> ^^^
> ...
>
> But I do realize, that it is too complex. And still ^^^ looks confusing.
> It looks like arrows pointing up, while you are trying to put a
> context after them.
>
> At least, you could make a context like this:
>
> tarantool> json.decode('{')
> ---
> - error: Expected object key string but found '\0' on line 1 at character 2 here
> "{>>\0<<"
> ...
>
> So you would have arrows pointing right at the bad position.
>
> I tried an error not related to eof, and got the same confusing
> result:
>
> tarantool> json.decode('{1: "world"}')
> ---
> - error: 'Expected object key string but found int on line 1 at character 2 here ^^^:
> "world"}...'
> ...
>
> From that message looks like the error is in ':', or in "world", or
> somewhere above because of up arrows.
Let’s just remove “T_*” tokens. Let print of a context will be “wontfix". Double
arrows isn’t very good, IMO, because we don’t know the length of wrong token.
> See 3 comments below.
>
>> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
>> index 3d7edbf28..aab662585 100644
>> --- a/third_party/lua-cjson/lua_cjson.c
>> +++ b/third_party/lua-cjson/lua_cjson.c
>> @@ -72,23 +72,23 @@ typedef enum {
>> } json_token_type_t;
>>
>> static const char *json_token_type_name[] = {
>> - "T_OBJ_BEGIN",
>> - "T_OBJ_END",
>> - "T_ARR_BEGIN",
>> - "T_ARR_END",
>> - "T_STRING",
>> - "T_UINT",
>> - "T_INT",
>> - "T_NUMBER",
>> - "T_BOOLEAN",
>> - "T_NULL",
>> - "T_COLON",
>> - "T_COMMA",
>> - "T_END",
>> - "T_WHITESPACE",
>> - "T_LINEFEED",
>> - "T_ERROR",
>> - "T_UNKNOWN",
>> + "'{'",
>> + "'}'",
>> + "'['",
>> + "']'",
>> + "string",
>> + "unsigned int",
>> + "int",
>> + "number",
>> + "boolean",
>> + "null",
>> + "':'",
>> + "','",
>> + "'\\0'",
>
> 1. I don't think this is a good idea. Probably,
> just 'end' would be better. Users don't write
> the terminating zero explicitly, and its presence
> in an error message is confusing.
Done.
>> + "whitespace",
>> + "'\\n'",
>
> 2. Please, no special characters: no \0, no \n, no \t
> or anything else. Here 'new line' would look easier
> to understand.
Done.
>> + "error",
>> + "unknown symbol",
>> NULL
>> };
>>
>> @@ -825,6 +825,21 @@ static void json_next_token(json_parse_t *json, json_token_t *token)
>> json_set_token_error(token, json, "invalid token");
>> }
>>
>> +/*
>> + * Copy 10 or less (if string @a ptr is shorter than 10)
>> + * characters to static string buffer @a aux_str.
>> + *
>> + * @param aux_str String static buffer to fill.
>> + * @param ptr String with the context.
>> + */
>> +static void fill_context(char *aux_str, const char *ptr)
>
> 3. So as a context you understand 10 symbols *after* the error?
> I don't think that will help with error understanding, because
> a user won't even see the error position. I think, you need to
> take some symbols before and some after the bad position.
>
> Also, please, make 10 a enum constant.
>
>> +{
>> + int i = 0;
>> + for(; ptr[i] != '\0' && i < 10; i++)
>> + aux_str[i] = ptr[i];
>> + aux_str[i] = '\0';
>> +}
>> +
>> /* This function does not return.
>> * DO NOT CALL WITH DYNAMIC MEMORY ALLOCATED.
>> * The only supported exception is the temporary parser string
>
commit 5642c534a599dcb1ada004a4d6ff4555063ff03d
Author: Roman Khabibov <roman.habibov at tarantool.org>
Date: Wed Oct 9 17:07:41 2019 +0300
json: make error messages more readable
Print tokens themselves instead of token names "T_*" in the error
messages.
Closes #4339
diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index fadfc74ec..de277f2f7 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -22,7 +22,7 @@ end
tap.test("json", function(test)
local serializer = require('json')
- test:plan(40)
+ test:plan(51)
test:test("unsigned", common.test_unsigned, serializer)
test:test("signed", common.test_signed, serializer)
@@ -154,4 +154,34 @@ tap.test("json", function(test)
_, err_msg = pcall(serializer.decode, '{"hello": "world",\n 100: 200}')
test:ok(string.find(err_msg, 'line 2 at character 2') ~= nil,
'mistake on second line')
+
+ --
+ -- gh-3316: Make sure that tokens 'T_*' are absent in error
+ -- messages and a context is printed.
+ --
+ _, err_msg = pcall(serializer.decode, '{{: "world"}')
+ test:ok(string.find(err_msg, '\'{\'') ~= nil, '"{" instead of T_OBJ_BEGIN')
+ _, err_msg = pcall(serializer.decode, '{"a": "world"}}')
+ test:ok(string.find(err_msg, '\'}\'') ~= nil, '"}" instead of T_OBJ_END')
+ _, err_msg = pcall(serializer.decode, '{[: "world"}')
+ test:ok(string.find(err_msg, '\'[\'', 1, true) ~= nil,
+ '"[" instead of T_ARR_BEGIN')
+ _, err_msg = pcall(serializer.decode, '{]: "world"}')
+ test:ok(string.find(err_msg, '\']\'', 1, true) ~= nil,
+ '"]" instead of T_ARR_END')
+ _, err_msg = pcall(serializer.decode, '{1: "world"}')
+ test:ok(string.find(err_msg, 'int') ~= nil, 'int instead of T_INT')
+ _, err_msg = pcall(serializer.decode, '{1.0: "world"}')
+ test:ok(string.find(err_msg, 'number') ~= nil, 'number instead of T_NUMBER')
+ _, err_msg = pcall(serializer.decode, '{true: "world"}')
+ test:ok(string.find(err_msg, 'boolean') ~= nil,
+ 'boolean instead of T_BOOLEAN')
+ _, err_msg = pcall(serializer.decode, '{null: "world"}')
+ test:ok(string.find(err_msg, 'null') ~= nil, 'null instead of T_NULL')
+ _, err_msg = pcall(serializer.decode, '{:: "world"}')
+ test:ok(string.find(err_msg, 'colon') ~= nil, 'colon instead of T_COLON')
+ _, err_msg = pcall(serializer.decode, '{,: "world"}')
+ test:ok(string.find(err_msg, 'comma') ~= nil, 'comma instead of T_COMMA')
+ _, err_msg = pcall(serializer.decode, '{')
+ test:ok(string.find(err_msg, 'end') ~= nil, 'end instead of T_END')
end)
diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
index 3d25814f3..e68b52847 100644
--- a/third_party/lua-cjson/lua_cjson.c
+++ b/third_party/lua-cjson/lua_cjson.c
@@ -72,23 +72,23 @@ typedef enum {
} json_token_type_t;
static const char *json_token_type_name[] = {
- "T_OBJ_BEGIN",
- "T_OBJ_END",
- "T_ARR_BEGIN",
- "T_ARR_END",
- "T_STRING",
- "T_UINT",
- "T_INT",
- "T_NUMBER",
- "T_BOOLEAN",
- "T_NULL",
- "T_COLON",
- "T_COMMA",
- "T_END",
- "T_WHITESPACE",
- "T_LINEFEED",
- "T_ERROR",
- "T_UNKNOWN",
+ "'{'",
+ "'}'",
+ "'['",
+ "']'",
+ "string",
+ "unsigned int",
+ "int",
+ "number",
+ "boolean",
+ "null",
+ "colon",
+ "comma",
+ "end",
+ "whitespace",
+ "line feed",
+ "error",
+ "unknown symbol",
NULL
};
@@ -914,7 +914,7 @@ static void json_parse_object_context(lua_State *l, json_parse_t *json)
}
if (token.type != T_COMMA)
- json_throw_parse_error(l, json, "comma or object end", &token);
+ json_throw_parse_error(l, json, "comma or '}'", &token);
json_next_token(json, &token);
}
@@ -954,7 +954,7 @@ static void json_parse_array_context(lua_State *l, json_parse_t *json)
}
if (token.type != T_COMMA)
- json_throw_parse_error(l, json, "comma or array end", &token);
+ json_throw_parse_error(l, json, "comma or ']'", &token);
json_next_token(json, &token);
}
More information about the Tarantool-patches
mailing list