[Tarantool-patches] [PATCH] json: fix silent change of global json settings

Sun Feb 16 03:06:43 MSK 2020

Pushed to master, 2.3, 2.2 and 1.10 with minor changes. CCed Kirill.

The changes I made are described below and I hope they are NOT about my
personal feeling how code should be organized and commented, but how the
whole team operates on it. I explained a reason that is under each
change (where it is not very obvious), so you may verify future patches
versus those points youself.

Some changes, however, may look as my personal taste of a good code
shape. Excuse me if it is so, I tried to refrain myself from this kind
of modifications.

WBR, Alexander Turenko.

----

Pasted the new patch to comment it, because the diff of diffs you pasted
does not have all recent changes (at least usage of
luaL_serializer_copy_options()).

> commit cc7e3fc49cc71ac0b4616c6f073a40fdc7ed3979
> Author: Olga Arkhangelskaia <arkholga at tarantool.org>
> Date:   Wed Feb 5 14:05:25 2020 +0300
> 
>     json: don't spoil instance options with per-call

I see, you removed 'ones' from the end to fit 50 symbols limit. However
'per-call' is adjective and it can not be used w/o a noun (or noun
group). So I rephrased it a bit:

 | json: don't spoil instance with per-call options

>     
>     When json.decode is used with 2 arguments, 2nd argument seeps out to the json
>     configuration of the instance. Moreover, due to current serializer.cfg
>     implementation it remains invisible while checking settings by json.cfg.

Reformatted to fit 72 symbols. The limit comes from the tradition to
format any text in the way that will look good on 80x25 screen: 4
symbols for the indent `git log` adds before the text, 72 for the text
itself and 4 symbols to make the text visually centered on the screen.

Changed 'by json.cfg' (at end of the sentence) to 'using json.cfg
table', it looks more strict.

>     
>     This fixes commit 6508ddb ("json: fix stack-use-after-scope in json_decode()").

Use full 40 hex digits commit hash to follow our usual style (GitHub
renders it to shorter hash, but git log shows as is).

>     
>     Closes #4761
> 
> diff --git a/test/app-tap/gh-4761-fix-json-decode.test.lua b/test/app-tap/gh-4761-fix-json-decode.test.lua
> new file mode 100755
> index 000000000..d10849317
> --- /dev/null
> +++ b/test/app-tap/gh-4761-fix-json-decode.test.lua

Changed the test name to 'gh-4761-json-per-call-options.test.lua' to
follow the style of other 'gh-*' regression tests: it is either noun
group describing a component where the problem was or a noun group
explaining the problem itself. Not a sentence in the imperative mood.

> @@ -0,0 +1,28 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +tap.test("json", function(test)

Changed double quites to single quotes to unify the style.

Changed the test name to 'gh-4761-json-per-call-options',
because 'json' looks too common here.

> +    local serializer = require('json')

Moved the 'require' statement to the top level and changed the variable
name to 'json': it is more usual for our Lua code.

> +    test:plan(4)
> +
> +    --
> +    -- gh-4761: json.decode silently changes instances settings when called
> +    -- with 2nd parameter.

Cite from the previous review notes:

 | * Fit a comment within 66 symbols (see [2]; it is for C, but in fact we
 |   apply the rule to Lua).
 |
 | [2]: https://www.tarantool.io/en/doc/1.10/dev_guide/c_style_guide/

Reformatted the comment to follow this rule. Fixed typo: 'instances' ->
'instance' (plural is not needed here).

Moved the comment above: since we test both :decode() and :encode() and
the comment describes the situation that is theoretically possible for
both methods, it looks more applicable for the entire test. Added
'verify json.encode as well' sentence.

> +    --
> +    test:ok(not pcall(serializer.decode, '{"foo":{"bar": 1}}',
> +                      {decode_max_depth = 1}),
> +            'error: too many nested data structures')

I would use test:ok() and so for the test case itself, but assert() for
test preparation code.

I would also split the call to :decode() and result verification code
for readability matters.

So, rewrote in this way:

 |  -- Preparation code: call :decode() with a custom option.
 |  local ok, err = pcall(json.decode, '{"foo": {"bar": 1}}',
 |                        {decode_max_depth = 1})
 |  assert(not ok, 'expect "too many nested data structures" error')

> +    test:ok(pcall(serializer.decode, '{"foo":{"bar": 1}}'),
> +            'json instance settings are unchanged')

Splitted :decode() call from test:ok() as above. Also verify the result
of decoding:

 | -- Verify that the instance option remains unchanged.
 | local exp_res = {foo = {bar = 1}}
 | local ok, res = pcall(json.decode, '{"foo": {"bar": 1}}')
 | test:is_deeply({ok, res}, {true, exp_res},
 |                'json instance settings remain unchanged after :decode()')

Added :decode() to the test case name to differentiate it from the one
below (re :encode()) in the TAP13 output.

> +
> +    --
> +    -- Same check for json.encode.
> +    --
> +    local nan = 1/0
> +    test:ok(not pcall(serializer.encode, {a = 1/0},

Changed '1/0' to 'nan'.

> +                      {encode_invalid_numbers = false}),
> +            'expected error with NaN encoding with .encode')
> +    test:ok(pcall(serializer.encode, {a=nan}),
> +            "json instance settings are unchanged")

Made the changes in the spirit of ones described above.

> +
> +end)

It is good to exit from a test with a non-zero exit code in case of a
fail (because our test harness may miss some kinds of fails listed in
TAP13 output), so I have added the check of tap.test() result:

 | local res = tap.test(<...>, function(test)
 |     <...test cases...>
 | end)
 |
 | os.exit(res and 0 or 1)

It is mentioned in [1] for the following form:

 | local test = tap.test(<...>)
 | <...test cases...>
 | os.exit(test:check() and 0 or 1)

test:check() returns exactly same code as tap.test(<...>) when the
latter is called with a function.

[1]: https://github.com/tarantool/doc/issues/1004

> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
> index 3d25814f3..8ff2ca89a 100644
> --- a/third_party/lua-cjson/lua_cjson.c
> +++ b/third_party/lua-cjson/lua_cjson.c
> @@ -1004,13 +1004,26 @@ static int json_decode(lua_State *l)
>      luaL_argcheck(l, lua_gettop(l) == 2 || lua_gettop(l) == 1, 1,
>                    "expected 1 or 2 arguments");
>  
> +    struct luaL_serializer *cfg = luaL_checkserializer(l);
> +
> +    /* user_cfg is per-call local version of global cfg: it is

Changed 'global cfg' to 'serializer instance options'.

> +     * used if user passes custom options to :decode() method

Changed: user -> a user.

> +     * as a separate arguments. In this case it is required

Changed 'arguments' to the singular form.

> +     * to avoid modifying global parameters. Life span of

Changed 'global parameters' to 'options of the instance'.

> +     * user_cfg is restricted by the scope of :decode() so it
> +     * is enough to allocate it on the stack. */
> +    struct luaL_serializer user_cfg;

We use two styles of comments (for one-line and multi-line comments):

 | /* Hello. */

 | /*
 |  * Hello,
 |  * world.
 |  */

Changed the comment to follow the latter style.

If you're curious, Linus about comments style (strong language, you have
been warned): https://lkml.org/lkml/2016/7/8/625

> +    json.cfg = cfg;
>      if (lua_gettop(l) == 2) {
> -        struct luaL_serializer *user_cfg = luaL_checkserializer(l);
> -        luaL_serializer_parse_options(l, user_cfg);
> -        lua_pop(l, 1);
> -        json.cfg = user_cfg;
> -    } else {
> -        json.cfg = luaL_checkserializer(l);
> +
> +        /* Only copy cfg options and left update triggers, see
> +	 * struct luaL_serializer, uninitialized. The code should
> +	 * never run them, since we must not change the json config of the
> +	 * instance. */
> +	luaL_serializer_copy_options(&user_cfg, cfg);
> +	luaL_serializer_parse_options(l, &user_cfg);
> +	lua_pop(l, 1);
> +	json.cfg = &user_cfg;
>      }

Tab / space mix. The code is inherited from an other source, so we
should follow the surrounding style and use spaces for indentation.
Fixed.

I rewrote the comment:

 | /*
 |  * on_update triggers are left uninitialized for user_cfg.
 |  * The decoding code don't (and shouldn't) run them.
 |  */

The reason is to make the comment technically strict:

* The code not only copy options from a serializer instance, but also
  parse and apply user-provided options. I removed this clause to don't
  describe this (it is quite obvious from the code now).
* Only changes from Lua run on_update triggers, because C code don't
  change options expect when a serializer is created or when the change
  is result of on_update() trigger run. So even if the decoding code
  would change an option the triggers will not be run. So I only said
  that triggers don't and shouldn't be run rather than that the decoding
  code don't change the options.

Hope you don't mind.

>  
>      json.data = luaL_checklstring(l, 1, &json_len);