Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
@ 2019-09-09 19:00 ` Vladislav Shpilevoy
  2019-09-12 23:22   ` [tarantool-patches] " Alexander Turenko
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option Vladislav Shpilevoy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-09 19:00 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

There are some objects called serializers - msgpack, cjson, yaml,
maybe more. They are global objects affecting both Lua and C
modules.

A serializer have settings which can be updated. But before the
patch an update changed only C structure of the serializer. It
made impossible to use settings of the serializers from Lua.

Now any update of any serializer is reflected both in its C and
Lua structures.

Part of #4434
---
 src/lua/utils.c                      | 34 +++++++++++++++-------------
 test/app-tap/json.test.lua           |  3 ++-
 test/app-tap/lua/serializer_test.lua | 13 +++++++++++
 test/app-tap/msgpack.test.lua        |  3 ++-
 4 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 75efe0ed2..a082a2e5b 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -270,10 +270,8 @@ luaL_serializer_parse_option(struct lua_State *L, int i,
 			     struct luaL_serializer *cfg)
 {
 	lua_getfield(L, 2, OPTIONS[i].name);
-	if (lua_isnil(L, -1)) {
-		lua_pop(L, 1);
+	if (lua_isnil(L, -1))
 		return NULL;
-	}
 	/*
 	 * Update struct luaL_serializer using pointer to a
 	 * configuration value (all values must be `int` for that).
@@ -289,7 +287,6 @@ luaL_serializer_parse_option(struct lua_State *L, int i,
 	default:
 		unreachable();
 	}
-	lua_pop(L, 1);
 	return pval;
 }
 
@@ -297,26 +294,31 @@ void
 luaL_serializer_parse_options(struct lua_State *L,
 			      struct luaL_serializer *cfg)
 {
-	for (int i = 0; OPTIONS[i].name != NULL; i++)
+	for (int i = 0; OPTIONS[i].name != NULL; ++i) {
 		luaL_serializer_parse_option(L, i, cfg);
+		lua_pop(L, 1);
+	}
 }
 
 /**
- * @brief serializer.cfg{} Lua binding for serializers.
- * serializer.cfg is a table that contains current configuration values from
- * luaL_serializer structure. serializer.cfg has overriden __call() method
- * to change configuration keys in internal userdata (like box.cfg{}).
- * Please note that direct change in serializer.cfg.key will not affect
- * internal state of userdata.
- * @param L lua stack
- * @return 0
+ * Serializer.cfg.__call implementation. Merge new parameters into
+ * both C structure of the serializer, and into its Lua table
+ * representation: serializer.cfg.
  */
 static int
 luaL_serializer_cfg(struct lua_State *L)
 {
-	luaL_checktype(L, 1, LUA_TTABLE); /* serializer */
-	luaL_checktype(L, 2, LUA_TTABLE); /* serializer.cfg */
-	luaL_serializer_parse_options(L, luaL_checkserializer(L));
+	/* Serializer.cfg */
+	luaL_checktype(L, 1, LUA_TTABLE);
+	/* Updated parameters. */
+	luaL_checktype(L, 2, LUA_TTABLE);
+	struct luaL_serializer *cfg = luaL_checkserializer(L);
+	for (int i = 0; OPTIONS[i].name != NULL; ++i) {
+		if (luaL_serializer_parse_option(L, i, cfg) == NULL)
+			lua_pop(L, 1);
+		else
+			lua_setfield(L, 1, OPTIONS[i].name);
+	}
 	return 0;
 }
 
diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index 10f6f4ab2..0a8966866 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -22,7 +22,7 @@ end
 
 tap.test("json", function(test)
     local serializer = require('json')
-    test:plan(32)
+    test:plan(33)
 
     test:test("unsigned", common.test_unsigned, serializer)
     test:test("signed", common.test_signed, serializer)
@@ -32,6 +32,7 @@ tap.test("json", function(test)
     test:test("nil", common.test_nil, serializer)
     test:test("table", common.test_table, serializer, is_array, is_map)
     test:test("ucdata", common.test_ucdata, serializer)
+    test:test("depth", common.test_depth, serializer)
     test:test("misc", test_misc, serializer)
 
     --
diff --git a/test/app-tap/lua/serializer_test.lua b/test/app-tap/lua/serializer_test.lua
index e7048da4d..a44247d70 100644
--- a/test/app-tap/lua/serializer_test.lua
+++ b/test/app-tap/lua/serializer_test.lua
@@ -402,6 +402,18 @@ local function test_ucdata(test, s)
     ss = nil
 end
 
+local function test_depth(test, s)
+    test:plan(1)
+    --
+    -- gh-4434: serializer update should be reflected in Lua.
+    --
+    local max_depth = s.cfg.encode_max_depth
+    s.cfg({encode_max_depth = max_depth + 5})
+    test:is(s.cfg.encode_max_depth, max_depth + 5,
+            "cfg({<name> = value}) is reflected in cfg.<name>")
+    s.cfg({encode_max_depth = max_depth})
+end
+
 return {
     test_unsigned = test_unsigned;
     test_signed = test_signed;
@@ -412,4 +424,5 @@ return {
     test_table = test_table;
     test_ucdata = test_ucdata;
     test_decimal = test_decimal;
+    test_depth = test_depth;
 }
diff --git a/test/app-tap/msgpack.test.lua b/test/app-tap/msgpack.test.lua
index bd095e5ae..752f107a8 100755
--- a/test/app-tap/msgpack.test.lua
+++ b/test/app-tap/msgpack.test.lua
@@ -231,7 +231,7 @@ end
 
 tap.test("msgpack", function(test)
     local serializer = require('msgpack')
-    test:plan(11)
+    test:plan(12)
     test:test("unsigned", common.test_unsigned, serializer)
     test:test("signed", common.test_signed, serializer)
     test:test("double", common.test_double, serializer)
@@ -240,6 +240,7 @@ tap.test("msgpack", function(test)
     test:test("nil", common.test_nil, serializer)
     test:test("table", common.test_table, serializer, is_array, is_map)
     test:test("ucdata", common.test_ucdata, serializer)
+    test:test("depth", common.test_depth, serializer)
     test:test("offsets", test_offsets, serializer)
     test:test("misc", test_misc, serializer)
     test:test("decode_array_map", test_decode_array_map_header, serializer)
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua Vladislav Shpilevoy
@ 2019-09-09 19:00 ` Vladislav Shpilevoy
  2019-09-12 23:24   ` [tarantool-patches] " Alexander Turenko
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-09 19:00 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

Msgpack Lua module is not a simple set of functions. It is a
global serializer object used by plenty of other Lua and C
modules. Msgpack as a serializer can be configured, and in theory
its configuration updates should affect all other modules. For
example, a user could change encode_max_depth:

    require('msgpack').cfg({encode_max_depth = <new_value>})

And that would make tuple:update() accept tables with <new_value>
depth without a crop.

But in fact msgpack configuration didn't affect some places, such
as this one. And all the others who use msgpackffi.

This patch fixes it, for encode_max_depth option. Other options
are still ignored.

Part of #4434
---
 src/lua/msgpackffi.lua           |  3 +--
 test/app-tap/msgpackffi.test.lua | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
index f7ee44291..65c57aa6e 100644
--- a/src/lua/msgpackffi.lua
+++ b/src/lua/msgpackffi.lua
@@ -4,7 +4,6 @@ local ffi = require('ffi')
 local buffer = require('buffer')
 local builtin = ffi.C
 local msgpack = require('msgpack') -- .NULL, .array_mt, .map_mt, .cfg
-local MAXNESTING = 16
 local int8_ptr_t = ffi.typeof('int8_t *')
 local uint8_ptr_t = ffi.typeof('uint8_t *')
 local uint16_ptr_t = ffi.typeof('uint16_t *')
@@ -219,7 +218,7 @@ local function encode_r(buf, obj, level)
     elseif type(obj) == "string" then
         encode_str(buf, obj)
     elseif type(obj) == "table" then
-        if level >= MAXNESTING then -- Limit nested tables
+        if level >= msgpack.cfg.encode_max_depth then
             encode_nil(buf)
             return
         end
diff --git a/test/app-tap/msgpackffi.test.lua b/test/app-tap/msgpackffi.test.lua
index f2a8f254b..e26247625 100755
--- a/test/app-tap/msgpackffi.test.lua
+++ b/test/app-tap/msgpackffi.test.lua
@@ -36,7 +36,7 @@ local function test_offsets(test, s)
 end
 
 local function test_other(test, s)
-    test:plan(19)
+    test:plan(21)
     local buf = string.char(0x93, 0x6e, 0xcb, 0x42, 0x2b, 0xed, 0x30, 0x47,
         0x6f, 0xff, 0xff, 0xac, 0x77, 0x6b, 0x61, 0x71, 0x66, 0x7a, 0x73,
         0x7a, 0x75, 0x71, 0x71, 0x78)
@@ -68,6 +68,30 @@ local function test_other(test, s)
     test:is(#s.encode(-0x8001), 5, "len(encode(-0x8001))")
     test:is(#s.encode(-0x80000000), 5, "len(encode(-0x80000000))")
     test:is(#s.encode(-0x80000001), 9, "len(encode(-0x80000001))")
+
+    --
+    -- gh-4434: msgpackffi does not care about msgpack serializer
+    -- configuration, but it should.
+    --
+    local function check_depth(depth_to_try)
+        local t = nil
+        for i = 1, depth_to_try do t = {t} end
+        t = s.decode_unchecked(s.encode(t))
+        local level = 0
+        while t ~= nil do level = level + 1 t = t[1] end
+        return level
+    end
+    local msgpack = require('msgpack')
+    local max_depth = msgpack.cfg.encode_max_depth
+    local result_depth = check_depth(max_depth + 5)
+    test:is(result_depth, max_depth,
+            "msgpackffi uses msgpack.cfg.encode_max_depth")
+
+    msgpack.cfg({encode_max_depth = max_depth + 5})
+    result_depth = check_depth(max_depth + 5)
+    test:is(result_depth, max_depth + 5, "and uses it dynamically")
+
+    msgpack.cfg({encode_max_depth = max_depth})
 end
 
 tap.test("msgpackffi", function(test)
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua Vladislav Shpilevoy
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option Vladislav Shpilevoy
@ 2019-09-09 19:00 ` Vladislav Shpilevoy
  2019-09-12 23:27   ` [tarantool-patches] " Alexander Turenko
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 4/4] app: allow to raise an error on too nested tables Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-09 19:00 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

Tuple is a C library exposed to Lua. In Lua to translate Lua
objects into tuples and back luaL_serializer structure is used.

In Tarantool we have several global serializers, one of which is
for msgpack. Tuples store data in msgpack, and in theory should
have used that global msgpack serializer. But in fact the tuple
module had its own private serializer because of tuples encoding
specifics such as never encode sparse arrays as maps.

This patch makes tuple Lua module use global msgpack serializer
always. But how does tuple handle sparse arrays now? In fact,
the tuple module still has its own serializer, but it is updated
each time when the msgpack serializer is changed.

Part of #4434
---
 src/box/lua/tuple.c        | 33 ++++++++++++++++++++-------
 src/lua/utils.c            | 12 ++++++++--
 src/lua/utils.h            | 26 +++++++++++++++++++++
 test/box/tuple.result      | 46 ++++++++++++++++++++++++++++++++++++++
 test/box/tuple.test.lua    | 21 +++++++++++++++++
 test/unit/luaT_tuple_new.c |  2 +-
 6 files changed, 129 insertions(+), 11 deletions(-)

diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index 3902288bf..fc22f5572 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -58,6 +58,11 @@
 
 static const char *tuplelib_name = "box.tuple";
 static const char *tuple_iteratorlib_name = "box.tuple.iterator";
+/*
+ * Special serializer for box.tuple.new() to disable storage
+ * optimization for excessively sparse arrays as a tuple always
+ * must be regular MP_ARRAY.
+ */
 static struct luaL_serializer tuple_serializer;
 
 extern char tuple_lua[]; /* Lua source */
@@ -548,6 +553,21 @@ static const struct luaL_Reg lbox_tuple_iterator_meta[] = {
 
 /* }}} */
 
+static inline void
+tuple_serializer_fill(void)
+{
+	luaL_serializer_copy_options(&tuple_serializer, luaL_msgpack_default);
+	tuple_serializer.encode_sparse_ratio = 0;
+}
+
+static void
+on_msgpack_serializer_update(struct trigger *trigger, void *event)
+{
+	(void) trigger;
+	(void) event;
+	tuple_serializer_fill();
+}
+
 void
 box_lua_tuple_init(struct lua_State *L)
 {
@@ -564,14 +584,11 @@ box_lua_tuple_init(struct lua_State *L)
 
 	luamp_set_encode_extension(luamp_encode_extension_box);
 
-	/*
-	 * Create special serializer for box.tuple.new().
-	 * Disable storage optimization for excessively
-	 * sparse arrays as a tuple always must be regular
-	 * MP_ARRAY.
-	 */
-	luaL_serializer_create(&tuple_serializer);
-	tuple_serializer.encode_sparse_ratio = 0;
+	tuple_serializer_fill();
+	trigger_create(&tuple_serializer.update_trigger,
+		       on_msgpack_serializer_update, NULL, NULL);
+	trigger_add(&luaL_msgpack_default->on_update,
+		    &tuple_serializer.update_trigger);
 
 	/* Get CTypeID for `struct tuple' */
 	int rc = luaL_cdef(L, "struct tuple;");
diff --git a/src/lua/utils.c b/src/lua/utils.c
index a082a2e5b..b5cefd07f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -251,12 +251,20 @@ static struct {
 void
 luaL_serializer_create(struct luaL_serializer *cfg)
 {
+	rlist_create(&cfg->on_update);
 	for (int i = 0; OPTIONS[i].name != NULL; i++) {
 		int *pval = (int *) ((char *) cfg + OPTIONS[i].offset);
 		*pval = OPTIONS[i].defvalue;
 	}
 }
 
+void
+luaL_serializer_copy_options(struct luaL_serializer *dst,
+			     const struct luaL_serializer *src)
+{
+	memcpy(dst, src, offsetof(struct luaL_serializer, end_of_options));
+}
+
 /**
  * Configure one field in @a cfg.
  * @param L Lua stack.
@@ -319,6 +327,7 @@ luaL_serializer_cfg(struct lua_State *L)
 		else
 			lua_setfield(L, 1, OPTIONS[i].name);
 	}
+	trigger_run(&cfg->on_update, cfg);
 	return 0;
 }
 
@@ -342,7 +351,7 @@ luaL_newserializer(struct lua_State *L, const char *modname, const luaL_Reg *reg
 			lua_newuserdata(L, sizeof(*serializer));
 	luaL_getmetatable(L, LUAL_SERIALIZER);
 	lua_setmetatable(L, -2);
-	memset(serializer, 0, sizeof(*serializer));
+	luaL_serializer_create(serializer);
 
 	for (; reg->name != NULL; reg++) {
 		/* push luaL_serializer as upvalue */
@@ -362,7 +371,6 @@ luaL_newserializer(struct lua_State *L, const char *modname, const luaL_Reg *reg
 	/* Save configuration values to serializer.cfg */
 	for (int i = 0; OPTIONS[i].name != NULL; i++) {
 		int *pval = (int *) ((char *) serializer + OPTIONS[i].offset);
-		*pval = OPTIONS[i].defvalue;
 		switch (OPTIONS[i].type) {
 		case LUA_TBOOLEAN:
 			lua_pushboolean(L, *pval);
diff --git a/src/lua/utils.h b/src/lua/utils.h
index d42cc3992..e9c316d22 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -36,6 +36,7 @@
 #include <math.h> /* modf, isfinite */
 
 #include <msgpuck.h> /* enum mp_type */
+#include "trigger.h"
 
 #if defined(__cplusplus)
 extern "C" {
@@ -242,6 +243,23 @@ struct luaL_serializer {
 
 	/** Enable support for compact represenation (internal, YAML-only). */
 	int has_compact;
+	/**
+	 * Border where copyable fields end. Is used to copy
+	 * serializer options into an existing serializer without
+	 * erasure of its non-option fields like triggers.
+	 */
+	char end_of_options[0];
+	/**
+	 * Trigger object to subscribe on updates of a more
+	 * general serializer. For example, tuple serializer
+	 * subscribes on msgpack.
+	 */
+	struct trigger update_trigger;
+	/**
+	 * List of triggers on update of this serializer. To push
+	 * updates down to dependent serializers.
+	 */
+	struct rlist on_update;
 };
 
 extern int luaL_nil_ref;
@@ -254,6 +272,14 @@ extern int luaL_array_metatable_ref;
 struct luaL_serializer *
 luaL_newserializer(struct lua_State *L, const char *modname, const luaL_Reg *reg);
 
+/**
+ * Copy all option fields of @a src into @a dst. Other fields,
+ * such as triggers, are not touched.
+ */
+void
+luaL_serializer_copy_options(struct luaL_serializer *dst,
+			     const struct luaL_serializer *src);
+
 static inline struct luaL_serializer *
 luaL_checkserializer(struct lua_State *L) {
 	return (struct luaL_serializer *)
diff --git a/test/box/tuple.result b/test/box/tuple.result
index 895462518..83f74d111 100644
--- a/test/box/tuple.result
+++ b/test/box/tuple.result
@@ -1395,3 +1395,49 @@ d:update{{'-', 1, dec.new('1e37')}}
 ---
 - [80000000000000000000000000000000000000]
 ...
+--
+-- gh-4434: tuple should use global msgpack serializer.
+--
+max_depth = msgpack.cfg.encode_max_depth
+---
+...
+t = nil
+---
+...
+for i = 1, max_depth + 5 do t = {t} end
+---
+...
+tuple = box.tuple.new(t):totable()
+---
+...
+level = 0
+---
+...
+while tuple ~= nil do level = level + 1 tuple = tuple[1] end
+---
+...
+level == max_depth or {level, max_depth}
+---
+- true
+...
+msgpack.cfg({encode_max_depth = max_depth + 5})
+---
+...
+tuple = box.tuple.new(t):totable()
+---
+...
+level = 0
+---
+...
+while tuple ~= nil do level = level + 1 tuple = tuple[1] end
+---
+...
+-- Level should be bigger now, because the default msgpack
+-- serializer allows deeper tables.
+level == max_depth + 5 or {level, max_depth}
+---
+- true
+...
+msgpack.cfg({encode_max_depth = max_depth})
+---
+...
diff --git a/test/box/tuple.test.lua b/test/box/tuple.test.lua
index 9762fc8b3..c8a0c03c5 100644
--- a/test/box/tuple.test.lua
+++ b/test/box/tuple.test.lua
@@ -473,3 +473,24 @@ d = box.tuple.new(dec.new('9e37'))
 d
 d:update{{'+', 1, dec.new('1e37')}}
 d:update{{'-', 1, dec.new('1e37')}}
+
+--
+-- gh-4434: tuple should use global msgpack serializer.
+--
+max_depth = msgpack.cfg.encode_max_depth
+t = nil
+for i = 1, max_depth + 5 do t = {t} end
+tuple = box.tuple.new(t):totable()
+level = 0
+while tuple ~= nil do level = level + 1 tuple = tuple[1] end
+level == max_depth or {level, max_depth}
+
+msgpack.cfg({encode_max_depth = max_depth + 5})
+tuple = box.tuple.new(t):totable()
+level = 0
+while tuple ~= nil do level = level + 1 tuple = tuple[1] end
+-- Level should be bigger now, because the default msgpack
+-- serializer allows deeper tables.
+level == max_depth + 5 or {level, max_depth}
+
+msgpack.cfg({encode_max_depth = max_depth})
diff --git a/test/unit/luaT_tuple_new.c b/test/unit/luaT_tuple_new.c
index 609d64e45..8f25c8e07 100644
--- a/test/unit/luaT_tuple_new.c
+++ b/test/unit/luaT_tuple_new.c
@@ -170,8 +170,8 @@ main()
 	luaL_openlibs(L);
 
 	box_init();
-	box_lua_tuple_init(L);
 	luaopen_msgpack(L);
+	box_lua_tuple_init(L);
 	lua_pop(L, 1);
 
 	return test_basic(L);
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 4/4] app: allow to raise an error on too nested tables
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple Vladislav Shpilevoy
@ 2019-09-09 19:00 ` Vladislav Shpilevoy
  2019-09-12 23:32   ` [tarantool-patches] " Alexander Turenko
  2019-09-10 20:25 ` [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
  2019-09-12 23:44 ` Alexander Turenko
  5 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-09 19:00 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

Closes #4434
Follow-up #4366

@TarantoolBot document
Title: json/msgpack.cfg.encode_crop_too_deep option

Tarantool has several so called serializers to convert data
between Lua and another format: YAML, JSON, msgpack.

YAML is a crazy serializer without depth restrictions. But for
JSON and msgpack a user could set encode_max_depth option. That
option led to crop of a table when it had too many nested levels.
Sometimes such behaviour is undesirable.

Now a user can choose to raise an error instead of data
corruption:

    msgpack.cfg({encode_crop_too_deep = false})
    t = nil
    for i = 1, 100 do t = {t} end
    msgpack.encode(t) -- Here an exception is thrown.

Option encode_crop_too_deep works for JSON and msgpack modules,
and is false by default. It means, that now if some existing
users have cropping, even intentional, they will get the
exception.
---
 src/lua/msgpack.c                    |  4 ++++
 src/lua/msgpackffi.lua               |  3 +++
 src/lua/utils.c                      |  1 +
 src/lua/utils.h                      |  6 ++++++
 test/app-tap/lua/serializer_test.lua | 20 +++++++++++++++++++-
 test/app-tap/msgpackffi.test.lua     |  7 ++++++-
 test/box/tuple.result                |  8 +++++++-
 test/box/tuple.test.lua              |  4 +++-
 third_party/lua-cjson/lua_cjson.c    | 10 ++++++++--
 9 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/src/lua/msgpack.c b/src/lua/msgpack.c
index c2be0b3e8..b1354776d 100644
--- a/src/lua/msgpack.c
+++ b/src/lua/msgpack.c
@@ -143,6 +143,8 @@ restart: /* used by MP_EXT */
 	case MP_MAP:
 		/* Map */
 		if (level >= cfg->encode_max_depth) {
+			if (! cfg->encode_crop_too_deep)
+				return luaL_error(L, "Too high nest level");
 			mpstream_encode_nil(stream); /* Limit nested maps */
 			return MP_NIL;
 		}
@@ -164,6 +166,8 @@ restart: /* used by MP_EXT */
 	case MP_ARRAY:
 		/* Array */
 		if (level >= cfg->encode_max_depth) {
+			if (! cfg->encode_crop_too_deep)
+				return luaL_error(L, "Too high nest level");
 			mpstream_encode_nil(stream); /* Limit nested arrays */
 			return MP_NIL;
 		}
diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
index 65c57aa6e..73d0d6fe2 100644
--- a/src/lua/msgpackffi.lua
+++ b/src/lua/msgpackffi.lua
@@ -219,6 +219,9 @@ local function encode_r(buf, obj, level)
         encode_str(buf, obj)
     elseif type(obj) == "table" then
         if level >= msgpack.cfg.encode_max_depth then
+            if not msgpack.cfg.encode_crop_too_deep then
+                error('Too high nest level')
+            end
             encode_nil(buf)
             return
         end
diff --git a/src/lua/utils.c b/src/lua/utils.c
index b5cefd07f..c338e8dfa 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -237,6 +237,7 @@ static struct {
 	OPTION(LUA_TNUMBER,  encode_sparse_ratio, 2),
 	OPTION(LUA_TNUMBER,  encode_sparse_safe, 10),
 	OPTION(LUA_TNUMBER,  encode_max_depth, 32),
+	OPTION(LUA_TBOOLEAN, encode_crop_too_deep, 0),
 	OPTION(LUA_TBOOLEAN, encode_invalid_numbers, 1),
 	OPTION(LUA_TNUMBER,  encode_number_precision, 14),
 	OPTION(LUA_TBOOLEAN, encode_load_metatables, 1),
diff --git a/src/lua/utils.h b/src/lua/utils.h
index e9c316d22..d581e47fd 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -213,6 +213,12 @@ struct luaL_serializer {
 	int encode_sparse_safe;
 	/** Max recursion depth for encoding (MsgPack, CJSON only) */
 	int encode_max_depth;
+	/**
+	 * A flag whether a table with too high nest level should
+	 * be cropped. If not set, than this is considered an
+	 * error.
+	 */
+	int encode_crop_too_deep;
 	/** Enables encoding of NaN and Inf numbers */
 	int encode_invalid_numbers;
 	/** Floating point numbers precision (YAML, CJSON only) */
diff --git a/test/app-tap/lua/serializer_test.lua b/test/app-tap/lua/serializer_test.lua
index a44247d70..4fa2924cf 100644
--- a/test/app-tap/lua/serializer_test.lua
+++ b/test/app-tap/lua/serializer_test.lua
@@ -403,7 +403,7 @@ local function test_ucdata(test, s)
 end
 
 local function test_depth(test, s)
-    test:plan(1)
+    test:plan(3)
     --
     -- gh-4434: serializer update should be reflected in Lua.
     --
@@ -412,6 +412,24 @@ local function test_depth(test, s)
     test:is(s.cfg.encode_max_depth, max_depth + 5,
             "cfg({<name> = value}) is reflected in cfg.<name>")
     s.cfg({encode_max_depth = max_depth})
+
+    --
+    -- gh-4434 (yes, the same issue): let users choose whether
+    -- they want to raise an error on tables with too high nest
+    -- level.
+    --
+    s.cfg({encode_crop_too_deep = false})
+
+    local t = nil
+    for i = 1, max_depth + 1 do t = {t} end
+    local ok, err = pcall(s.encode, t)
+    test:ok(not ok, "too deep encode depth")
+
+    s.cfg({encode_max_depth = max_depth + 1})
+    ok, err = pcall(s.encode, t)
+    test:ok(ok, "no throw in a corner case")
+
+    s.cfg({encode_crop_too_deep = true, encode_max_depth = max_depth})
 end
 
 return {
diff --git a/test/app-tap/msgpackffi.test.lua b/test/app-tap/msgpackffi.test.lua
index e26247625..d80978939 100755
--- a/test/app-tap/msgpackffi.test.lua
+++ b/test/app-tap/msgpackffi.test.lua
@@ -36,7 +36,7 @@ local function test_offsets(test, s)
 end
 
 local function test_other(test, s)
-    test:plan(21)
+    test:plan(22)
     local buf = string.char(0x93, 0x6e, 0xcb, 0x42, 0x2b, 0xed, 0x30, 0x47,
         0x6f, 0xff, 0xff, 0xac, 0x77, 0x6b, 0x61, 0x71, 0x66, 0x7a, 0x73,
         0x7a, 0x75, 0x71, 0x71, 0x78)
@@ -82,6 +82,7 @@ local function test_other(test, s)
         return level
     end
     local msgpack = require('msgpack')
+    msgpack.cfg({encode_crop_too_deep = true})
     local max_depth = msgpack.cfg.encode_max_depth
     local result_depth = check_depth(max_depth + 5)
     test:is(result_depth, max_depth,
@@ -91,6 +92,10 @@ local function test_other(test, s)
     result_depth = check_depth(max_depth + 5)
     test:is(result_depth, max_depth + 5, "and uses it dynamically")
 
+    msgpack.cfg({encode_crop_too_deep = false})
+    local ok = pcall(check_depth, max_depth + 6)
+    test:ok(not ok, "exception is thrown when crop is not allowed")
+
     msgpack.cfg({encode_max_depth = max_depth})
 end
 
diff --git a/test/box/tuple.result b/test/box/tuple.result
index 83f74d111..088400276 100644
--- a/test/box/tuple.result
+++ b/test/box/tuple.result
@@ -1401,6 +1401,12 @@ d:update{{'-', 1, dec.new('1e37')}}
 max_depth = msgpack.cfg.encode_max_depth
 ---
 ...
+crop_too_deep = msgpack.cfg.encode_crop_too_deep
+---
+...
+msgpack.cfg({encode_crop_too_deep = true})
+---
+...
 t = nil
 ---
 ...
@@ -1438,6 +1444,6 @@ level == max_depth + 5 or {level, max_depth}
 ---
 - true
 ...
-msgpack.cfg({encode_max_depth = max_depth})
+msgpack.cfg({encode_max_depth = max_depth, encode_crop_too_deep = crop_too_deep})
 ---
 ...
diff --git a/test/box/tuple.test.lua b/test/box/tuple.test.lua
index c8a0c03c5..7660f651c 100644
--- a/test/box/tuple.test.lua
+++ b/test/box/tuple.test.lua
@@ -478,6 +478,8 @@ d:update{{'-', 1, dec.new('1e37')}}
 -- gh-4434: tuple should use global msgpack serializer.
 --
 max_depth = msgpack.cfg.encode_max_depth
+crop_too_deep = msgpack.cfg.encode_crop_too_deep
+msgpack.cfg({encode_crop_too_deep = true})
 t = nil
 for i = 1, max_depth + 5 do t = {t} end
 tuple = box.tuple.new(t):totable()
@@ -493,4 +495,4 @@ while tuple ~= nil do level = level + 1 tuple = tuple[1] end
 -- serializer allows deeper tables.
 level == max_depth + 5 or {level, max_depth}
 
-msgpack.cfg({encode_max_depth = max_depth})
+msgpack.cfg({encode_max_depth = max_depth, encode_crop_too_deep = crop_too_deep})
diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
index 6b03e391a..0c325d03c 100644
--- a/third_party/lua-cjson/lua_cjson.c
+++ b/third_party/lua-cjson/lua_cjson.c
@@ -400,14 +400,20 @@ static void json_append_data(lua_State *l, struct luaL_serializer *cfg,
     json_append_nil(cfg, json);
     break;
     case MP_MAP:
-    if (current_depth >= cfg->encode_max_depth)
+    if (current_depth >= cfg->encode_max_depth) {
+        if (! cfg->encode_crop_too_deep)
+            luaL_error(l, "Too high nest level");
         return json_append_nil(cfg, json); /* Limit nested maps */
+    }
     json_append_object(l, cfg, current_depth + 1, json);
     return;
     case MP_ARRAY:
     /* Array */
-    if (current_depth >= cfg->encode_max_depth)
+    if (current_depth >= cfg->encode_max_depth) {
+        if (! cfg->encode_crop_too_deep)
+            luaL_error(l, "Too high nest level");
         return json_append_nil(cfg, json); /* Limit nested arrays */
+    }
     json_append_array(l, cfg, current_depth + 1, json, field.size);
     return;
     case MP_EXT:
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 0/4] Serializer bugs
@ 2019-09-10 20:24 Vladislav Shpilevoy
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua Vladislav Shpilevoy
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-10 20:24 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

The patchset fixes several problems about JSON and msgpack serializers.

1) msgpackffi didn't use msgpack serializer options;
2) tuple serializer either;
3) update of a serializer option was not reflected in its Lua representation;
4) during serialization too nested tables are silently cropped and there was no
  way to prevent it.

Also it was discovered, that msgpackffi does not care *all* options. Not only
about max_depth. I am not sure if it is worth fixing here (or at all) though.

Changes in v2:
- Renames;
- Default value a flag for whether we need to raise an error on a too nested msgpack (was crop, now error).

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4434-box-lua-msgpack-cfg
Issue: https://github.com/tarantool/tarantool/issues/4434

Vladislav Shpilevoy (4):
  app: serializers update now is reflected in Lua
  msgpack: make msgpackffi use encode_max_depth option
  tuple: use global msgpack serializer in Lua tuple
  app: allow to raise an error on too nested tables

 src/box/lua/tuple.c                  | 33 +++++++++++++-----
 src/lua/msgpack.c                    |  4 +++
 src/lua/msgpackffi.lua               |  6 ++--
 src/lua/utils.c                      | 47 +++++++++++++++----------
 src/lua/utils.h                      | 32 +++++++++++++++++
 test/app-tap/json.test.lua           |  3 +-
 test/app-tap/lua/serializer_test.lua | 31 +++++++++++++++++
 test/app-tap/msgpack.test.lua        |  3 +-
 test/app-tap/msgpackffi.test.lua     | 31 ++++++++++++++++-
 test/box/tuple.result                | 52 ++++++++++++++++++++++++++++
 test/box/tuple.test.lua              | 23 ++++++++++++
 test/unit/luaT_tuple_new.c           |  2 +-
 third_party/lua-cjson/lua_cjson.c    | 10 ++++--
 13 files changed, 243 insertions(+), 34 deletions(-)

-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 4/4] app: allow to raise an error on too nested tables Vladislav Shpilevoy
@ 2019-09-10 20:25 ` Vladislav Shpilevoy
  2019-09-12 23:44 ` Alexander Turenko
  5 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-10 20:25 UTC (permalink / raw)
  To: tarantool-patches; +Cc: alexander.turenko

Sorry, accidentally sent again. It is a duplicate, ignore please.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 1/4] app: serializers update now is reflected in Lua
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua Vladislav Shpilevoy
@ 2019-09-12 23:22   ` Alexander Turenko
  2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Turenko @ 2019-09-12 23:22 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

LGTM.

Small comments are below.

WBR, Alexander Turenko.

>  src/lua/utils.c                      | 34 +++++++++++++++-------------
>  test/app-tap/json.test.lua           |  3 ++-
>  test/app-tap/lua/serializer_test.lua | 13 +++++++++++
>  test/app-tap/msgpack.test.lua        |  3 ++-

It looks that it worth to add the test to test/app-tap/yaml.test.lua
too.

>  4 files changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 75efe0ed2..a082a2e5b 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -270,10 +270,8 @@ luaL_serializer_parse_option(struct lua_State *L, int i,
>  			     struct luaL_serializer *cfg)

Now the function pushes a value to a Lua stack, I think it worth to
update the doxygen-style comment.

>  /**
> - * @brief serializer.cfg{} Lua binding for serializers.
> - * serializer.cfg is a table that contains current configuration values from
> - * luaL_serializer structure. serializer.cfg has overriden __call() method
> - * to change configuration keys in internal userdata (like box.cfg{}).
> - * Please note that direct change in serializer.cfg.key will not affect
> - * internal state of userdata.

The note is still actual, I would not remove it.

( Sooner or later we should just do
https://github.com/tarantool/tarantool/issues/2867 )

Side note: I think that it is better to have one source of information
and have only a handle in Lua. Or maybe (if option(s) access cost is
matter) update the Lua table using some kind of triggers: if we'll
change an option from C, then the current implementation will fail to
track it.

Anyway, it is not a part of your issue. Just side thoughts.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option Vladislav Shpilevoy
@ 2019-09-12 23:24   ` Alexander Turenko
  2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Turenko @ 2019-09-12 23:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

LGTM.

One small comment about the test case, see below.

WBR, Alexander Turenko.

> +    --
> +    -- gh-4434: msgpackffi does not care about msgpack serializer
> +    -- configuration, but it should.
> +    --
> +    local function check_depth(depth_to_try)
> +        local t = nil
> +        for i = 1, depth_to_try do t = {t} end
> +        t = s.decode_unchecked(s.encode(t))
> +        local level = 0
> +        while t ~= nil do level = level + 1 t = t[1] end
> +        return level
> +    end
> +    local msgpack = require('msgpack')
> +    local max_depth = msgpack.cfg.encode_max_depth
> +    local result_depth = check_depth(max_depth + 5)
> +    test:is(result_depth, max_depth,
> +            "msgpackffi uses msgpack.cfg.encode_max_depth")
> +
> +    msgpack.cfg({encode_max_depth = max_depth + 5})
> +    result_depth = check_depth(max_depth + 5)
> +    test:is(result_depth, max_depth + 5, "and uses it dynamically")
> +
> +    msgpack.cfg({encode_max_depth = max_depth})

Just in case: we recently close an [issue][1] when json module handles
max depth correctly for maps, but non-correctly for arrays. I think it
worth to ensure that those cases works good both for msgpackffi too.

[1]: https://github.com/tarantool/tarantool/issues/4366

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple Vladislav Shpilevoy
@ 2019-09-12 23:27   ` Alexander Turenko
  2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Turenko @ 2019-09-12 23:27 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

LGTM.

Several questions are below.

I agree that the variant with triggers looks more natural.

Should not we add box_lua_tuple_free() and call trigger_destroy()
inside? I know, we now don't call tarantool_lua_free(), but I hope it
will be fixed someday.

If you'll going to add it, please, add also the call to luaT_tuple_new.c
for the symmetry with box_lua_tuple_init(). It also will help to keep
this test clean from ASAN / Valgrind point of view: I did verify it
aganst one of those tools at the time of writing the test (don't
remember against which of them).

Hm. We don't have box_lua_free() at all. I'm doubtful now.

WBR, Alexander Turenko.

> +static inline void
> +tuple_serializer_fill(void)
> +{
> +	luaL_serializer_copy_options(&tuple_serializer, luaL_msgpack_default);
> +	tuple_serializer.encode_sparse_ratio = 0;
> +}

Is not this name quite common? Maybe tuple_serializer_update_options()?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 4/4] app: allow to raise an error on too nested tables
  2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 4/4] app: allow to raise an error on too nested tables Vladislav Shpilevoy
@ 2019-09-12 23:32   ` Alexander Turenko
  2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Turenko @ 2019-09-12 23:32 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

> app: allow to raise an error on too nested tables

The commit header and the description looks like we have 'crop'
behaviour as default, however it is not more so.

I agree that we should change the default behaviour and raise an error
by default. This however should be clearly stated in the docbot request.

See more comments below.

WBR, Alexander Turenko.

>     msgpack.cfg({encode_crop_too_deep = false})

We have encode_invalid_as_nil with the similar meanings, so, maybe,
encode_too_deep_as_nil?

Or maybe rephrase 'too deep' as one word: foots, dregs, underside? Those
variants don't look good, but maybe you know some word that would fit
better.

> +			if (! cfg->encode_crop_too_deep)
> +				return luaL_error(L, "Too high nest level");

We can easily give more information: say, a current max depth level; so
I think it worth to add to the error message. (The same for msgpackffi
and json.)

> +    -- gh-4434 (yes, the same issue): let users choose whether
> +    -- they want to raise an error on tables with too high nest
> +    -- level.
> +    --
> +    s.cfg({encode_crop_too_deep = false})
> +
> +    local t = nil
> +    for i = 1, max_depth + 1 do t = {t} end
> +    local ok, err = pcall(s.encode, t)
> +    test:ok(not ok, "too deep encode depth")
> +
> +    s.cfg({encode_max_depth = max_depth + 1})
> +    ok, err = pcall(s.encode, t)
> +    test:ok(ok, "no throw in a corner case")
> +
> +    s.cfg({encode_crop_too_deep = true, encode_max_depth = max_depth})

I think we should save a default value of the option and set it here as
we do with encode_max_depth. (The same for msgpackffi.test.lua.)

BTW, the default is 'false' and we should set the default value here
(now we set 'true'). Following test cases (e.g. gh-2888 test case in
app-tap/json.test.lua) should set and restore the option locally.

Maybe it also worth add a test case that will verify that the default
behaviour is to raise an error?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs
  2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
                   ` (4 preceding siblings ...)
  2019-09-10 20:25 ` [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
@ 2019-09-12 23:44 ` Alexander Turenko
  2019-09-13 22:32   ` Vladislav Shpilevoy
  5 siblings, 1 reply; 16+ messages in thread
From: Alexander Turenko @ 2019-09-12 23:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Mon, Sep 09, 2019 at 09:00:06PM +0200, Vladislav Shpilevoy wrote:
> The patchset fixes several problems about JSON and msgpack serializers.
> 
> 1) msgpackffi didn't use msgpack serializer options;
> 2) tuple serializer either;
> 3) update of a serializer option was not reflected in its Lua representation;
> 4) during serialization too nested tables are silently cropped and there was no
>   way to prevent it.
> 
> Also it was discovered, that msgpackffi does not care *all* options. Not only
> about max_depth. I am not sure if it is worth fixing here (or at all) though.

I would not do it here. However it worth to file a follow up issue: we
should either fix or document this behaviour. It seems that it may be
very unexpected that space:update / upsert does not follow, say,
encode_invalid_numbers (if a user use this option and see that it works
for other box functions).

> 
> Changes in v2:
> - Renames;
> - Default value a flag for whether we need to raise an error on a too nested msgpack (was crop, now error).
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4434-box-lua-msgpack-cfg
> Issue: https://github.com/tarantool/tarantool/issues/4434

Thank you!

The pathset looks mostly okay for me. I gave LGTM (with some minor
comments) for the first three patches. I would discuss the option name
in the fourth patch.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 4/4] app: allow to raise an error on too nested tables
  2019-09-12 23:32   ` [tarantool-patches] " Alexander Turenko
@ 2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-13 22:32 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Thanks for the review!

On 13/09/2019 01:32, Alexander Turenko wrote:
>> app: allow to raise an error on too nested tables
> 
> The commit header and the description looks like we have 'crop'
> behaviour as default, however it is not more so.
> 
> I agree that we should change the default behaviour and raise an error
> by default. This however should be clearly stated in the docbot request.

Ok, new commit message:

    app: raise an error on too nested tables serialization
    
    Closes #4434
    Follow-up #4366
    
    @TarantoolBot document
    Title: json/msgpack.cfg.encode_crop_too_deep option
    
    Tarantool has several so called serializers to convert data
    between Lua and another format: YAML, JSON, msgpack.
    
    YAML is a crazy serializer without depth restrictions. But for
    JSON and msgpack a user could set encode_max_depth option. That
    option led to crop of a table when it had too many nested levels.
    Sometimes such behaviour is undesirable.
    
    Now an error is raised instead of data corruption:
    
        t = nil
        for i = 1, 100 do t = {t} end
        msgpack.encode(t) -- Here an exception is thrown.
    
    To disable it and return the old behaviour back here is a new
    option:
    
        <serializer>.cfg({encode_crop_too_deep = true})
    
    Option encode_crop_too_deep works for JSON and msgpack modules,
    and is false by default. It means, that now if some existing
    users have cropping, even intentional, they will get the
    exception.

> 
> See more comments below.
> 
> WBR, Alexander Turenko.
> 
>>     msgpack.cfg({encode_crop_too_deep = false})
> 
> We have encode_invalid_as_nil with the similar meanings, so, maybe,
> encode_too_deep_as_nil?
> 
> Or maybe rephrase 'too deep' as one word: foots, dregs, underside? Those
> variants don't look good, but maybe you know some word that would fit
> better.

No, I don't know a good synonym. But perhaps we could use
'encode_max_depth_and_crop'. That 1) makes clear, that the option is
related to 'encode_max_depth', 2) is obvious, that data deeper than
max_depth is cropped. Is it ok?

Or 'encode_crop_after_max_depth'. Or 'encode_trim_by_max_depth'.

>> +			if (! cfg->encode_crop_too_deep)
>> +				return luaL_error(L, "Too high nest level");
> 
> We can easily give more information: say, a current max depth level; so
> I think it worth to add to the error message. (The same for msgpackffi
> and json.)
> 

The error message will always just contain cfg.encode_max_depth + 1, I
don't think it makes much sense to return it. But ok, it won't make worse:

diff --git a/src/lua/msgpack.c b/src/lua/msgpack.c
index b1354776d..06f6dc53e 100644
--- a/src/lua/msgpack.c
+++ b/src/lua/msgpack.c
@@ -143,8 +143,10 @@ restart: /* used by MP_EXT */
 	case MP_MAP:
 		/* Map */
 		if (level >= cfg->encode_max_depth) {
-			if (! cfg->encode_crop_too_deep)
-				return luaL_error(L, "Too high nest level");
+			if (! cfg->encode_crop_too_deep) {
+				return luaL_error(L, "Too high nest level - %d",
+						  level + 1);
+			}
 			mpstream_encode_nil(stream); /* Limit nested maps */
 			return MP_NIL;
 		}
@@ -167,7 +169,8 @@ restart: /* used by MP_EXT */
 		/* Array */
 		if (level >= cfg->encode_max_depth) {
-			if (! cfg->encode_crop_too_deep)
-				return luaL_error(L, "Too high nest level");
+			if (! cfg->encode_crop_too_deep) {
+				return luaL_error(L, "Too high nest level - %d",
+						  level + 1);
+			}
 			mpstream_encode_nil(stream); /* Limit nested arrays */
 			return MP_NIL;
 		}
diff --git a/src/lua/msgpackffi.lua b/src/lua/msgpackffi.lua
index 73d0d6fe2..138663ccb 100644
--- a/src/lua/msgpackffi.lua
+++ b/src/lua/msgpackffi.lua
@@ -220,7 +220,8 @@ local function encode_r(buf, obj, level)
     elseif type(obj) == "table" then
         if level >= msgpack.cfg.encode_max_depth then
             if not msgpack.cfg.encode_crop_too_deep then
-                error('Too high nest level')
+                error(string.format('Too high nest level - %d',
+                                    msgpack.cfg.encode_max_depth + 1))
             end
             encode_nil(buf)
             return

>> +    -- gh-4434 (yes, the same issue): let users choose whether
>> +    -- they want to raise an error on tables with too high nest
>> +    -- level.
>> +    --
>> +    s.cfg({encode_crop_too_deep = false})
>> +
>> +    local t = nil
>> +    for i = 1, max_depth + 1 do t = {t} end
>> +    local ok, err = pcall(s.encode, t)
>> +    test:ok(not ok, "too deep encode depth")
>> +
>> +    s.cfg({encode_max_depth = max_depth + 1})
>> +    ok, err = pcall(s.encode, t)
>> +    test:ok(ok, "no throw in a corner case")
>> +
>> +    s.cfg({encode_crop_too_deep = true, encode_max_depth = max_depth})
> 
> I think we should save a default value of the option and set it here as
> we do with encode_max_depth. (The same for msgpackffi.test.lua.)

Agree, it looks more correct. Only not 'default' but just 'previous',
used before a test:

diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index 0a8966866..ba77a8d77 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -38,9 +38,10 @@ tap.test("json", function(test)
     --
     -- gh-2888: Check the possibility of using options in encode()/decode().
     --
+    local orig_encode_crop_too_deep = serializer.cfg.encode_crop_too_deep
     local orig_encode_max_depth = serializer.cfg.encode_max_depth
     local sub = {a = 1, { b = {c = 1, d = {e = 1}}}}
-    serializer.cfg({encode_max_depth = 1})
+    serializer.cfg({encode_max_depth = 1, encode_crop_too_deep = true})
     test:ok(serializer.encode(sub) == '{"1":null,"a":1}',
             'depth of encoding is 1 with .cfg')
     serializer.cfg({encode_max_depth = orig_encode_max_depth})
@@ -121,5 +122,6 @@ tap.test("json", function(test)
     rec4['b'] = rec4
     test:is(serializer.encode(rec4),
             '{"a":{"a":null,"b":null},"b":{"a":null,"b":null}}')
-    serializer.cfg({encode_max_depth = orig_encode_max_depth})
+    serializer.cfg({encode_max_depth = orig_encode_max_depth,
+                    encode_crop_too_deep = orig_encode_crop_too_deep})
 end)
diff --git a/test/app-tap/lua/serializer_test.lua b/test/app-tap/lua/serializer_test.lua
index 4fa2924cf..85738924a 100644
--- a/test/app-tap/lua/serializer_test.lua
+++ b/test/app-tap/lua/serializer_test.lua
@@ -418,6 +418,7 @@ local function test_depth(test, s)
     -- they want to raise an error on tables with too high nest
     -- level.
     --
+    local crop_too_deep = s.cfg.encode_crop_too_deep
     s.cfg({encode_crop_too_deep = false})
 
     local t = nil
@@ -429,7 +430,7 @@ local function test_depth(test, s)
     ok, err = pcall(s.encode, t)
     test:ok(ok, "no throw in a corner case")
 
-    s.cfg({encode_crop_too_deep = true, encode_max_depth = max_depth})
+    s.cfg({encode_crop_too_deep = crop_too_deep, encode_max_depth = max_depth})
 end
 
 return {
diff --git a/test/app-tap/msgpackffi.test.lua b/test/app-tap/msgpackffi.test.lua
index a418f5ac1..dbae72362 100755
--- a/test/app-tap/msgpackffi.test.lua
+++ b/test/app-tap/msgpackffi.test.lua
@@ -82,6 +82,7 @@ local function test_other(test, s)
         return level
     end
     local msgpack = require('msgpack')
+    local crop_too_deep = msgpack.cfg.encode_crop_too_deep
     msgpack.cfg({encode_crop_too_deep = true})
     local max_depth = msgpack.cfg.encode_max_depth
     local result_depth = check_depth(max_depth + 5)
@@ -110,7 +111,8 @@ local function test_other(test, s)
     local ok = pcall(check_depth, max_depth + 6)
     test:ok(not ok, "exception is thrown when crop is not allowed")
 
-    msgpack.cfg({encode_max_depth = max_depth})
+    msgpack.cfg({encode_crop_too_deep = crop_too_deep,
+                 encode_max_depth = max_depth})
 end
 
 tap.test("msgpackffi", function(test)

> 
> BTW, the default is 'false' and we should set the default value here
> (now we set 'true'). Following test cases (e.g. gh-2888 test case in
> app-tap/json.test.lua) should set and restore the option locally.
> 
> Maybe it also worth add a test case that will verify that the default
> behaviour is to raise an error?
> 

Not sure. Such a test would assume a certain value of
msgpack.cfg.encode_crop_too_deep. This is exactly what I was trying to
avoid in the tests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 1/4] app: serializers update now is reflected in Lua
  2019-09-12 23:22   ` [tarantool-patches] " Alexander Turenko
@ 2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-13 22:32 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Thanks for the review!

On 13/09/2019 01:22, Alexander Turenko wrote:
> LGTM.
> 
> Small comments are below.
> 
> WBR, Alexander Turenko.
> 
>>  src/lua/utils.c                      | 34 +++++++++++++++-------------
>>  test/app-tap/json.test.lua           |  3 ++-
>>  test/app-tap/lua/serializer_test.lua | 13 +++++++++++
>>  test/app-tap/msgpack.test.lua        |  3 ++-
> 
> It looks that it worth to add the test to test/app-tap/yaml.test.lua
> too.

The test is called 'test_depth' and YAML serializer does not check max
depth. This is why I omitted it for YAML, deliberately. Next commits
add here more depth-specific tests, and they will fail for YAML.

>>  4 files changed, 35 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/lua/utils.c b/src/lua/utils.c
>> index 75efe0ed2..a082a2e5b 100644
>> --- a/src/lua/utils.c
>> +++ b/src/lua/utils.c
>> @@ -270,10 +270,8 @@ luaL_serializer_parse_option(struct lua_State *L, int i,
>>  			     struct luaL_serializer *cfg)
> 
> Now the function pushes a value to a Lua stack, I think it worth to
> update the doxygen-style comment.
> 

I thought it is quite obvious, and the old comment was still valid. But
ok:

diff --git a/src/lua/utils.c b/src/lua/utils.c
index a082a2e5b..6789a8477 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -258,7 +258,8 @@ luaL_serializer_create(struct luaL_serializer *cfg)
 }
 
 /**
- * Configure one field in @a cfg.
+ * Configure one field in @a cfg. Value of the field is kept on
+ * Lua stack after this function, and should be popped manually.
  * @param L Lua stack.
  * @param i Index of option in OPTIONS[].
  * @param cfg Serializer to inherit configuration.

(I didn't describe it in @retval, because this C function does not
return Lua values. It still returns the same - int pointer).

>>  /**
>> - * @brief serializer.cfg{} Lua binding for serializers.
>> - * serializer.cfg is a table that contains current configuration values from
>> - * luaL_serializer structure. serializer.cfg has overriden __call() method
>> - * to change configuration keys in internal userdata (like box.cfg{}).
>> - * Please note that direct change in serializer.cfg.key will not affect
>> - * internal state of userdata.
> 
> The note is still actual, I would not remove it.

The old comment is super overcomplicated and obvious I think. But ok:

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 6789a8477..93e00d81f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -302,9 +302,15 @@ luaL_serializer_parse_options(struct lua_State *L,
 }
 
 /**
- * Serializer.cfg.__call implementation. Merge new parameters into
- * both C structure of the serializer, and into its Lua table
- * representation: serializer.cfg.
+ * @brief serializer.cfg{} Lua binding for serializers.
+ * serializer.cfg is a table that contains current configuration values from
+ * luaL_serializer structure. serializer.cfg has overriden __call() method
+ * to change configuration keys in internal userdata (like box.cfg{}).
+ * Please note that direct change in serializer.cfg.key will not affect
+ * internal state of userdata. Changes via cfg() are reflected in
+ * both Lua cfg table, and C serializer structure.
+ * @param L lua stack
+ * @return 0
  */
 static int
 luaL_serializer_cfg(struct lua_State *L)

> 
> ( Sooner or later we should just do
> https://github.com/tarantool/tarantool/issues/2867 )
> 
> Side note: I think that it is better to have one source of information
> and have only a handle in Lua. Or maybe (if option(s) access cost is
> matter) update the Lua table using some kind of triggers: if we'll
> change an option from C, then the current implementation will fail to
> track it.

We could copy luaL_serializer to ffi.cdef so as Lua would be able to work
with that structure directly. That looks like a piece of cake.

> 
> Anyway, it is not a part of your issue. Just side thoughts.
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option
  2019-09-12 23:24   ` [tarantool-patches] " Alexander Turenko
@ 2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-13 22:32 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Thanks for the review!

On 13/09/2019 01:24, Alexander Turenko wrote:
> LGTM.
> 
> One small comment about the test case, see below.
> 
> WBR, Alexander Turenko.
> 
>> +    --
>> +    -- gh-4434: msgpackffi does not care about msgpack serializer
>> +    -- configuration, but it should.
>> +    --
>> +    local function check_depth(depth_to_try)
>> +        local t = nil
>> +        for i = 1, depth_to_try do t = {t} end
>> +        t = s.decode_unchecked(s.encode(t))
>> +        local level = 0
>> +        while t ~= nil do level = level + 1 t = t[1] end
>> +        return level
>> +    end
>> +    local msgpack = require('msgpack')
>> +    local max_depth = msgpack.cfg.encode_max_depth
>> +    local result_depth = check_depth(max_depth + 5)
>> +    test:is(result_depth, max_depth,
>> +            "msgpackffi uses msgpack.cfg.encode_max_depth")
>> +
>> +    msgpack.cfg({encode_max_depth = max_depth + 5})
>> +    result_depth = check_depth(max_depth + 5)
>> +    test:is(result_depth, max_depth + 5, "and uses it dynamically")
>> +
>> +    msgpack.cfg({encode_max_depth = max_depth})
> 
> Just in case: we recently close an [issue][1] when json module handles
> max depth correctly for maps, but non-correctly for arrays. I think it
> worth to ensure that those cases works good both for msgpackffi too.
> 
> [1]: https://github.com/tarantool/tarantool/issues/4366
> 

Ok:

diff --git a/test/app-tap/msgpackffi.test.lua b/test/app-tap/msgpackffi.test.lua
index e26247625..a1028e0ba 100755
--- a/test/app-tap/msgpackffi.test.lua
+++ b/test/app-tap/msgpackffi.test.lua
@@ -36,7 +36,7 @@ local function test_offsets(test, s)
 end
 
 local function test_other(test, s)
-    test:plan(21)
+    test:plan(23)
     local buf = string.char(0x93, 0x6e, 0xcb, 0x42, 0x2b, 0xed, 0x30, 0x47,
         0x6f, 0xff, 0xff, 0xac, 0x77, 0x6b, 0x61, 0x71, 0x66, 0x7a, 0x73,
         0x7a, 0x75, 0x71, 0x71, 0x78)
@@ -91,6 +91,20 @@ local function test_other(test, s)
     result_depth = check_depth(max_depth + 5)
     test:is(result_depth, max_depth + 5, "and uses it dynamically")
 
+    -- Recursive tables are handled correctly.
+    local level = 0
+    local t = {}
+    t[1] = t
+    t = s.decode(s.encode(t))
+    while t ~= nil do level = level + 1 t = t[1] end
+    test:is(level, max_depth + 5, "recursive array")
+    t = {}
+    t.key = t
+    t = s.decode(s.encode(t))
+    level = 0
+    while t ~= nil do level = level + 1 t = t.key end
+    test:is(level, max_depth + 5, "recursive map")
+
     msgpack.cfg({encode_max_depth = max_depth})
 end
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple
  2019-09-12 23:27   ` [tarantool-patches] " Alexander Turenko
@ 2019-09-13 22:32     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-13 22:32 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Thanks for the review!

On 13/09/2019 01:27, Alexander Turenko wrote:
> LGTM.
> 
> Several questions are below.
> 
> I agree that the variant with triggers looks more natural.
> 
> Should not we add box_lua_tuple_free() and call trigger_destroy()
> inside? I know, we now don't call tarantool_lua_free(), but I hope it
> will be fixed someday.
> 
> If you'll going to add it, please, add also the call to luaT_tuple_new.c
> for the symmetry with box_lua_tuple_init(). It also will help to keep
> this test clean from ASAN / Valgrind point of view: I did verify it
> aganst one of those tools at the time of writing the test (don't
> remember against which of them).

I tried to add box_lua_tuple_free, but as you mentioned, we don't
have box_lua_free() so that function is unused. I don't think it
should be added now.

Also I stumbled into a problem, that luaT_tuple_new.c uses
luaopen_msgpack, but we don't have luaclose_msgpack. So there
are 2 holes. Moreover, as I understand, no one of our Lua C
modules really drops its bindings from Lua namespace. It is
not related to this issue, but I remember that someday we wanted
to be able to 'uncfg' box, like it was not called.

> 
> Hm. We don't have box_lua_free() at all. I'm doubtful now.
> 
> WBR, Alexander Turenko.
> 
>> +static inline void
>> +tuple_serializer_fill(void)
>> +{
>> +	luaL_serializer_copy_options(&tuple_serializer, luaL_msgpack_default);
>> +	tuple_serializer.encode_sparse_ratio = 0;
>> +}
> 
> Is not this name quite common? Maybe tuple_serializer_update_options()?
> 

Honestly, I don't really care that much. The function with all its invocations
fits in one screen. Both names are ok for me:

diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index fc22f5572..8b59466b9 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -554,7 +554,7 @@ static const struct luaL_Reg lbox_tuple_iterator_meta[] = {
 /* }}} */
 
 static inline void
-tuple_serializer_fill(void)
+tuple_serializer_update_options(void)
 {
 	luaL_serializer_copy_options(&tuple_serializer, luaL_msgpack_default);
 	tuple_serializer.encode_sparse_ratio = 0;
@@ -565,7 +565,7 @@ on_msgpack_serializer_update(struct trigger *trigger, void *event)
 {
 	(void) trigger;
 	(void) event;
-	tuple_serializer_fill();
+	tuple_serializer_update_options();
 }
 
 void
@@ -584,7 +584,7 @@ box_lua_tuple_init(struct lua_State *L)
 
 	luamp_set_encode_extension(luamp_encode_extension_box);
 
-	tuple_serializer_fill();
+	tuple_serializer_update_options();
 	trigger_create(&tuple_serializer.update_trigger,
 		       on_msgpack_serializer_update, NULL, NULL);
 	trigger_add(&luaL_msgpack_default->on_update,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs
  2019-09-12 23:44 ` Alexander Turenko
@ 2019-09-13 22:32   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2019-09-13 22:32 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Hi!

Thanks for the review!

On 13/09/2019 01:44, Alexander Turenko wrote:
> On Mon, Sep 09, 2019 at 09:00:06PM +0200, Vladislav Shpilevoy wrote:
>> The patchset fixes several problems about JSON and msgpack serializers.
>>
>> 1) msgpackffi didn't use msgpack serializer options;
>> 2) tuple serializer either;
>> 3) update of a serializer option was not reflected in its Lua representation;
>> 4) during serialization too nested tables are silently cropped and there was no
>>   way to prevent it.
>>
>> Also it was discovered, that msgpackffi does not care *all* options. Not only
>> about max_depth. I am not sure if it is worth fixing here (or at all) though.
> 
> I would not do it here. However it worth to file a follow up issue: we
> should either fix or document this behaviour. It seems that it may be
> very unexpected that space:update / upsert does not follow, say,
> encode_invalid_numbers (if a user use this option and see that it works
> for other box functions).

https://github.com/tarantool/tarantool/issues/4499

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-09-13 22:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-10 20:24 [tarantool-patches] [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 1/4] app: serializers update now is reflected in Lua Vladislav Shpilevoy
2019-09-12 23:22   ` [tarantool-patches] " Alexander Turenko
2019-09-13 22:32     ` Vladislav Shpilevoy
2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 2/4] msgpack: make msgpackffi use encode_max_depth option Vladislav Shpilevoy
2019-09-12 23:24   ` [tarantool-patches] " Alexander Turenko
2019-09-13 22:32     ` Vladislav Shpilevoy
2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 3/4] tuple: use global msgpack serializer in Lua tuple Vladislav Shpilevoy
2019-09-12 23:27   ` [tarantool-patches] " Alexander Turenko
2019-09-13 22:32     ` Vladislav Shpilevoy
2019-09-09 19:00 ` [tarantool-patches] [PATCH v2 4/4] app: allow to raise an error on too nested tables Vladislav Shpilevoy
2019-09-12 23:32   ` [tarantool-patches] " Alexander Turenko
2019-09-13 22:32     ` Vladislav Shpilevoy
2019-09-10 20:25 ` [tarantool-patches] Re: [PATCH v2 0/4] Serializer bugs Vladislav Shpilevoy
2019-09-12 23:44 ` Alexander Turenko
2019-09-13 22:32   ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox