[tarantool-patches] Re: [PATCH] json: improve error messages

Thu Oct 17 00:45:51 MSK 2019

Hi! Thanks for the patch!

Sorry, but I am not sure that now the messages are
better. Just look at it:

    tarantool> json.decode('{')
    ---
    - error: Expected object key string but found '\0' on line 1 at character 2 here ^^^...
    ...

What is `^^^`, what is `...`? In the ticket it is said, that a user should have a
context, but there is no a context.

I was thinking, that the author of 4339 rather meant that ^^^ should be below the
error in an error string, like this:

    tarantool> json.decode('{')
    ---
    - error: Expected object key string but found '\0' on line 1 at character 2 here:
             "{ "
              ^^^
    ...

But I do realize, that it is too complex. And still ^^^ looks confusing.
It looks like arrows pointing up, while you are trying to put a
context after them.

At least, you could make a context like this:

    tarantool> json.decode('{')
    ---
    - error: Expected object key string but found '\0' on line 1 at character 2 here
             "{>>\0<<"
    ...

So you would have arrows pointing right at the bad position.

I tried an error not related to eof, and got the same confusing
result:

    tarantool> json.decode('{1: "world"}')
    ---
    - error: 'Expected object key string but found int on line 1 at character 2 here ^^^:
        "world"}...'
    ...

>From that message looks like the error is in ':', or in "world", or
somewhere above because of up arrows.

See 3 comments below.

> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
> index 3d7edbf28..aab662585 100644
> --- a/third_party/lua-cjson/lua_cjson.c
> +++ b/third_party/lua-cjson/lua_cjson.c
> @@ -72,23 +72,23 @@ typedef enum {
>  } json_token_type_t;
>  
>  static const char *json_token_type_name[] = {
> -    "T_OBJ_BEGIN",
> -    "T_OBJ_END",
> -    "T_ARR_BEGIN",
> -    "T_ARR_END",
> -    "T_STRING",
> -    "T_UINT",
> -    "T_INT",
> -    "T_NUMBER",
> -    "T_BOOLEAN",
> -    "T_NULL",
> -    "T_COLON",
> -    "T_COMMA",
> -    "T_END",
> -    "T_WHITESPACE",
> -    "T_LINEFEED",
> -    "T_ERROR",
> -    "T_UNKNOWN",
> +    "'{'",
> +    "'}'",
> +    "'['",
> +    "']'",
> +    "string",
> +    "unsigned int",
> +    "int",
> +    "number",
> +    "boolean",
> +    "null",
> +    "':'",
> +    "','",
> +    "'\\0'",

1. I don't think this is a good idea. Probably,
just 'end' would be better. Users don't write
the terminating zero explicitly, and its presence
in an error message is confusing.

> +    "whitespace",
> +    "'\\n'",

2. Please, no special characters: no \0, no \n, no \t
or anything else. Here 'new line' would look easier
to understand.

> +    "error",
> +    "unknown symbol",
>      NULL
>  };
>  
> @@ -825,6 +825,21 @@ static void json_next_token(json_parse_t *json, json_token_t *token)
>      json_set_token_error(token, json, "invalid token");
>  }
>  
> +/*
> + * Copy 10 or less (if string @a ptr is shorter than 10)
> + * characters to static string buffer @a aux_str.
> + *
> + * @param aux_str String static buffer to fill.
> + * @param ptr String with the context.
> + */
> +static void fill_context(char *aux_str, const char *ptr)

3. So as a context you understand 10 symbols *after* the error?
I don't think that will help with error understanding, because
a user won't even see the error position. I think, you need to
take some symbols before and some after the bad position.

Also, please, make 10 a enum constant.

> +{
> +    int i = 0;
> +    for(; ptr[i] != '\0' && i < 10; i++)
> +        aux_str[i] = ptr[i];
> +    aux_str[i] = '\0';
> +}
> +
>  /* This function does not return.
>   * DO NOT CALL WITH DYNAMIC MEMORY ALLOCATED.
>   * The only supported exception is the temporary parser string