From: Roman Khabibov <roman.habibov@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages Date: Sat, 7 Dec 2019 00:16:57 +0300 [thread overview] Message-ID: <822D629F-3969-4FD2-B5A3-B30C1EB7FB52@tarantool.org> (raw) In-Reply-To: <ad789ccf-092e-99ff-d8cf-1b439c788e27@tarantool.org> Hi! Thanks for the review. > On Oct 17, 2019, at 00:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hi! Thanks for the patch! > > Sorry, but I am not sure that now the messages are > better. Just look at it: > > tarantool> json.decode('{') > --- > - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^... > ... > > What is `^^^`, what is `...`? In the ticket it is said, that a user should have a > context, but there is no a context. > > I was thinking, that the author of 4339 rather meant that ^^^ should be below the > error in an error string, like this: > > tarantool> json.decode('{') > --- > - error: Expected object key string but found '\0' on line 1 at character 2 here: > "{ " > ^^^ > ... > > But I do realize, that it is too complex. And still ^^^ looks confusing. > It looks like arrows pointing up, while you are trying to put a > context after them. > > At least, you could make a context like this: > > tarantool> json.decode('{') > --- > - error: Expected object key string but found '\0' on line 1 at character 2 here > "{>>\0<<" > ... > > So you would have arrows pointing right at the bad position. > > I tried an error not related to eof, and got the same confusing > result: > > tarantool> json.decode('{1: "world"}') > --- > - error: 'Expected object key string but found int on line 1 at character 2 here ^^^: > "world"}...' > ... > > From that message looks like the error is in ':', or in "world", or > somewhere above because of up arrows. Let’s just remove “T_*” tokens. Let print of a context will be “wontfix". Double arrows isn’t very good, IMO, because we don’t know the length of wrong token. > See 3 comments below. > >> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c >> index 3d7edbf28..aab662585 100644 >> --- a/third_party/lua-cjson/lua_cjson.c >> +++ b/third_party/lua-cjson/lua_cjson.c >> @@ -72,23 +72,23 @@ typedef enum { >> } json_token_type_t; >> >> static const char *json_token_type_name[] = { >> - "T_OBJ_BEGIN", >> - "T_OBJ_END", >> - "T_ARR_BEGIN", >> - "T_ARR_END", >> - "T_STRING", >> - "T_UINT", >> - "T_INT", >> - "T_NUMBER", >> - "T_BOOLEAN", >> - "T_NULL", >> - "T_COLON", >> - "T_COMMA", >> - "T_END", >> - "T_WHITESPACE", >> - "T_LINEFEED", >> - "T_ERROR", >> - "T_UNKNOWN", >> + "'{'", >> + "'}'", >> + "'['", >> + "']'", >> + "string", >> + "unsigned int", >> + "int", >> + "number", >> + "boolean", >> + "null", >> + "':'", >> + "','", >> + "'\\0'", > > 1. I don't think this is a good idea. Probably, > just 'end' would be better. Users don't write > the terminating zero explicitly, and its presence > in an error message is confusing. Done. >> + "whitespace", >> + "'\\n'", > > 2. Please, no special characters: no \0, no \n, no \t > or anything else. Here 'new line' would look easier > to understand. Done. >> + "error", >> + "unknown symbol", >> NULL >> }; >> >> @@ -825,6 +825,21 @@ static void json_next_token(json_parse_t *json, json_token_t *token) >> json_set_token_error(token, json, "invalid token"); >> } >> >> +/* >> + * Copy 10 or less (if string @a ptr is shorter than 10) >> + * characters to static string buffer @a aux_str. >> + * >> + * @param aux_str String static buffer to fill. >> + * @param ptr String with the context. >> + */ >> +static void fill_context(char *aux_str, const char *ptr) > > 3. So as a context you understand 10 symbols *after* the error? > I don't think that will help with error understanding, because > a user won't even see the error position. I think, you need to > take some symbols before and some after the bad position. > > Also, please, make 10 a enum constant. > >> +{ >> + int i = 0; >> + for(; ptr[i] != '\0' && i < 10; i++) >> + aux_str[i] = ptr[i]; >> + aux_str[i] = '\0'; >> +} >> + >> /* This function does not return. >> * DO NOT CALL WITH DYNAMIC MEMORY ALLOCATED. >> * The only supported exception is the temporary parser string > commit 5642c534a599dcb1ada004a4d6ff4555063ff03d Author: Roman Khabibov <roman.habibov@tarantool.org> Date: Wed Oct 9 17:07:41 2019 +0300 json: make error messages more readable Print tokens themselves instead of token names "T_*" in the error messages. Closes #4339 diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua index fadfc74ec..de277f2f7 100755 --- a/test/app-tap/json.test.lua +++ b/test/app-tap/json.test.lua @@ -22,7 +22,7 @@ end tap.test("json", function(test) local serializer = require('json') - test:plan(40) + test:plan(51) test:test("unsigned", common.test_unsigned, serializer) test:test("signed", common.test_signed, serializer) @@ -154,4 +154,34 @@ tap.test("json", function(test) _, err_msg = pcall(serializer.decode, '{"hello": "world",\n 100: 200}') test:ok(string.find(err_msg, 'line 2 at character 2') ~= nil, 'mistake on second line') + + -- + -- gh-3316: Make sure that tokens 'T_*' are absent in error + -- messages and a context is printed. + -- + _, err_msg = pcall(serializer.decode, '{{: "world"}') + test:ok(string.find(err_msg, '\'{\'') ~= nil, '"{" instead of T_OBJ_BEGIN') + _, err_msg = pcall(serializer.decode, '{"a": "world"}}') + test:ok(string.find(err_msg, '\'}\'') ~= nil, '"}" instead of T_OBJ_END') + _, err_msg = pcall(serializer.decode, '{[: "world"}') + test:ok(string.find(err_msg, '\'[\'', 1, true) ~= nil, + '"[" instead of T_ARR_BEGIN') + _, err_msg = pcall(serializer.decode, '{]: "world"}') + test:ok(string.find(err_msg, '\']\'', 1, true) ~= nil, + '"]" instead of T_ARR_END') + _, err_msg = pcall(serializer.decode, '{1: "world"}') + test:ok(string.find(err_msg, 'int') ~= nil, 'int instead of T_INT') + _, err_msg = pcall(serializer.decode, '{1.0: "world"}') + test:ok(string.find(err_msg, 'number') ~= nil, 'number instead of T_NUMBER') + _, err_msg = pcall(serializer.decode, '{true: "world"}') + test:ok(string.find(err_msg, 'boolean') ~= nil, + 'boolean instead of T_BOOLEAN') + _, err_msg = pcall(serializer.decode, '{null: "world"}') + test:ok(string.find(err_msg, 'null') ~= nil, 'null instead of T_NULL') + _, err_msg = pcall(serializer.decode, '{:: "world"}') + test:ok(string.find(err_msg, 'colon') ~= nil, 'colon instead of T_COLON') + _, err_msg = pcall(serializer.decode, '{,: "world"}') + test:ok(string.find(err_msg, 'comma') ~= nil, 'comma instead of T_COMMA') + _, err_msg = pcall(serializer.decode, '{') + test:ok(string.find(err_msg, 'end') ~= nil, 'end instead of T_END') end) diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c index 3d25814f3..e68b52847 100644 --- a/third_party/lua-cjson/lua_cjson.c +++ b/third_party/lua-cjson/lua_cjson.c @@ -72,23 +72,23 @@ typedef enum { } json_token_type_t; static const char *json_token_type_name[] = { - "T_OBJ_BEGIN", - "T_OBJ_END", - "T_ARR_BEGIN", - "T_ARR_END", - "T_STRING", - "T_UINT", - "T_INT", - "T_NUMBER", - "T_BOOLEAN", - "T_NULL", - "T_COLON", - "T_COMMA", - "T_END", - "T_WHITESPACE", - "T_LINEFEED", - "T_ERROR", - "T_UNKNOWN", + "'{'", + "'}'", + "'['", + "']'", + "string", + "unsigned int", + "int", + "number", + "boolean", + "null", + "colon", + "comma", + "end", + "whitespace", + "line feed", + "error", + "unknown symbol", NULL }; @@ -914,7 +914,7 @@ static void json_parse_object_context(lua_State *l, json_parse_t *json) } if (token.type != T_COMMA) - json_throw_parse_error(l, json, "comma or object end", &token); + json_throw_parse_error(l, json, "comma or '}'", &token); json_next_token(json, &token); } @@ -954,7 +954,7 @@ static void json_parse_array_context(lua_State *l, json_parse_t *json) } if (token.type != T_COMMA) - json_throw_parse_error(l, json, "comma or array end", &token); + json_throw_parse_error(l, json, "comma or ']'", &token); json_next_token(json, &token); }
next parent reply other threads:[~2019-12-06 21:16 UTC|newest] Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <20191015155953.10246-1-roman.habibov@tarantool.org> [not found] ` <ad789ccf-092e-99ff-d8cf-1b439c788e27@tarantool.org> 2019-12-06 21:16 ` Roman Khabibov [this message] 2019-12-10 23:20 ` Vladislav Shpilevoy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=822D629F-3969-4FD2-B5A3-B30C1EB7FB52@tarantool.org \ --to=roman.habibov@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [tarantool-patches] [PATCH] json: improve error messages' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox