Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
@ 2021-05-23 14:06 Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-05-23 14:06 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, olegrok

Lua json module used to have a global buffer for all encodings. It
was reused by each next encode().

This was not correct, because during encode() might happen a GC
step, which might call encode() again and spoil the global buffer.

The same problem was already fixed for the global static buffer in
scope of #5632. Similarly to that time, the patch makes Lua json
module use cord_ibuf to prevent "concurrent" usage of the buffer
data. The global buffer is deleted.

According to a few microbenchmarks it didn't affect the perf
anyhow.

Core part of the patch is strbuf changes. Firstly, its destruction
is now optional, cord_ibuf can free itself on a next yield.
Secondly, its reallocation algorithm is kept intact - ibuf is used
as an allocator, not as the buffer itself. This is done so as not
to be too intrusive in the third party module which might need an
upgrade to the upstream in the future.

Closes #6050
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6050-lua-json-static-buf-gc
Issue: https://github.com/tarantool/tarantool/issues/6050

 .../unreleased/gh-6050-gc-global-buf-json.md  |  5 ++
 ...lua => gh-5632-6050-gc-buf-reuse.test.lua} | 61 ++++++++++++++--
 test/app-tap/json.test.lua                    | 14 +++-
 third_party/lua-cjson/lua_cjson.c             | 38 ++++++----
 third_party/lua-cjson/strbuf.c                | 70 +++++--------------
 third_party/lua-cjson/strbuf.h                | 16 ++---
 6 files changed, 121 insertions(+), 83 deletions(-)
 create mode 100644 changelogs/unreleased/gh-6050-gc-global-buf-json.md
 rename test/app-tap/{gh-5632-gc-buf-reuse.test.lua => gh-5632-6050-gc-buf-reuse.test.lua} (70%)

diff --git a/changelogs/unreleased/gh-6050-gc-global-buf-json.md b/changelogs/unreleased/gh-6050-gc-global-buf-json.md
new file mode 100644
index 000000000..f214ea230
--- /dev/null
+++ b/changelogs/unreleased/gh-6050-gc-global-buf-json.md
@@ -0,0 +1,5 @@
+## bugfix/core
+
+* Fixed invalid results produced by `json` module's `encode` function when it
+  was used from Lua's garbage collector. For instance, in functions used as
+  `ffi.gc()` (gh-6050).
diff --git a/test/app-tap/gh-5632-gc-buf-reuse.test.lua b/test/app-tap/gh-5632-6050-gc-buf-reuse.test.lua
similarity index 70%
rename from test/app-tap/gh-5632-gc-buf-reuse.test.lua
rename to test/app-tap/gh-5632-6050-gc-buf-reuse.test.lua
index 6efddb714..bf7590a14 100755
--- a/test/app-tap/gh-5632-gc-buf-reuse.test.lua
+++ b/test/app-tap/gh-5632-6050-gc-buf-reuse.test.lua
@@ -1,16 +1,17 @@
 #!/usr/bin/env tarantool
 
 --
--- gh-5632: Lua code should not use any global buffers or objects without
--- proper ownership protection. Otherwise these items might be suddenly reused
--- during Lua GC which happens almost at any moment. That might lead to data
--- corruption.
+-- gh-5632, gh-6050: Lua code should not use any global buffers or objects
+-- without proper ownership protection. Otherwise these items might be suddenly
+-- reused during Lua GC which happens almost at any moment. That might lead to
+-- data corruption.
 --
 
 local tap = require('tap')
 local ffi = require('ffi')
 local uuid = require('uuid')
 local uri = require('uri')
+local json = require('json')
 local msgpackffi = require('msgpackffi')
 
 local function test_uuid(test)
@@ -142,10 +143,58 @@ local function test_msgpackffi(test)
     test:ok(is_success, 'msgpackffi in gc')
 end
 
-local test = tap.test('gh-5632-gc-buf-reuse')
-test:plan(3)
+local function test_json(test)
+    test:plan(1)
+
+    local encode = json.encode
+    local decode = json.decode
+    local gc_count = 100
+    local iter_count = 1000
+    local is_success = true
+    local data1 = {1, 2, 3, 4, 5}
+    local data2 = {6, 7, 8, 9, 10}
+
+    local function do_encode(data)
+        if not is_success then
+            return
+        end
+        local t = encode(data)
+        t = decode(t)
+        if #t ~= #data then
+            is_success = false
+            return
+        end
+        for i = 1, #t do
+            if t[i] ~= data[i] then
+                is_success = false
+                return
+            end
+        end
+    end
+
+    local function gc_encode()
+        return do_encode(data1)
+    end
+
+    local function create_gc()
+        for _ = 1, gc_count do
+            ffi.gc(ffi.new('char[1]'), gc_encode)
+        end
+    end
+
+    for _ = 1, iter_count do
+        create_gc()
+        do_encode(data2)
+    end
+
+   test:ok(is_success, 'json in gc')
+end
+
+local test = tap.test('gh-5632-6050-gc-buf-reuse')
+test:plan(4)
 test:test('uuid in __gc', test_uuid)
 test:test('uri in __gc', test_uri)
 test:test('msgpackffi in __gc', test_msgpackffi)
+test:test('json in __gc', test_json)
 
 os.exit(test:check() and 0 or 1)
diff --git a/test/app-tap/json.test.lua b/test/app-tap/json.test.lua
index be60e45c9..12a37d3ec 100755
--- a/test/app-tap/json.test.lua
+++ b/test/app-tap/json.test.lua
@@ -21,7 +21,7 @@ end
 
 tap.test("json", function(test)
     local serializer = require('json')
-    test:plan(57)
+    test:plan(58)
 
     test:test("unsigned", common.test_unsigned, serializer)
     test:test("signed", common.test_signed, serializer)
@@ -201,4 +201,16 @@ tap.test("json", function(test)
     _, err_msg = pcall(serializer.decode, '{"a": {a = {}}}')
     test:ok(string.find(err_msg, '{"a":  >> {a = {}}') ~= nil, 'context #6')
 
+    --
+    -- Create a big JSON string to ensure the string builder works fine with
+    -- internal reallocs.
+    --
+    local bigstr = string.rep('a', 16384)
+    local t = {}
+    for i = 1, 10 do
+        table.insert(t, bigstr)
+    end
+    local bigjson = serializer.encode(t)
+    local t_dec = serializer.decode(bigjson)
+    test:is_deeply(t_dec, t, 'encode/decode big strings')
 end)
diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
index 38e999870..85186d6d5 100644
--- a/third_party/lua-cjson/lua_cjson.c
+++ b/third_party/lua-cjson/lua_cjson.c
@@ -51,8 +51,7 @@
 #include "mp_extension_types.h" /* MP_DECIMAL, MP_UUID */
 #include "tt_static.h"
 #include "uuid/tt_uuid.h" /* tt_uuid_to_string(), UUID_STR_LEN */
-
-#define DEFAULT_ENCODE_KEEP_BUFFER 1
+#include "cord_buf.h"
 
 typedef enum {
     T_OBJ_BEGIN,
@@ -100,10 +99,6 @@ static struct luaL_serializer *luaL_json_default;
 static json_token_type_t ch2token[256];
 static char escape2char[256];  /* Decoding */
 
-/* encode_buf is only allocated and used when
- * encode_keep_buffer is set */
-static strbuf_t encode_buf;
-
 typedef struct {
     const char *data;
     const char *ptr;
@@ -182,9 +177,6 @@ static int json_destroy_config(lua_State *l)
 static void json_create_tokens()
 {
     int i;
-#if DEFAULT_ENCODE_KEEP_BUFFER > 0
-    strbuf_init(&encode_buf, 0);
-#endif
 
     /* Decoding init */
 
@@ -444,7 +436,9 @@ static int json_encode(lua_State *l) {
                   "expected 1 or 2 arguments");
 
     /* Reuse existing buffer. */
-    strbuf_reset(&encode_buf);
+    strbuf_t encode_buf;
+    struct ibuf *ibuf = cord_ibuf_take();
+    strbuf_create(&encode_buf, -1, ibuf);
     struct luaL_serializer *cfg = luaL_checkserializer(l);
 
     if (lua_gettop(l) == 2) {
@@ -458,6 +452,13 @@ static int json_encode(lua_State *l) {
 
     char *json = strbuf_string(&encode_buf, NULL);
     lua_pushlstring(l, json, strbuf_length(&encode_buf));
+    /*
+     * Even if an exception is raised above, it is fine to skip the buffer
+     * destruction. The strbuf object destructor does not free anything, and
+     * the cord_ibuf object is freed automatically on a next yield.
+     */
+    strbuf_destroy(&encode_buf);
+    cord_ibuf_put(ibuf);
     return 1;
 }
 
@@ -885,8 +886,9 @@ static void json_throw_parse_error(lua_State *l, json_parse_t *json,
                                    const char *exp, json_token_t *token)
 {
     const char *found;
-
-    strbuf_free(json->tmp);
+    struct ibuf *ibuf = json->tmp->ibuf;
+    strbuf_destroy(json->tmp);
+    cord_ibuf_put(ibuf);
 
     if (token->type == T_ERROR)
         found = token->value.string;
@@ -919,7 +921,9 @@ static void json_decode_descend(lua_State *l, json_parse_t *json, int slots)
     char err_context[ERR_CONTEXT_MAX_LENGTH + 1];
     fill_err_context(err_context, json, json->ptr - json->cur_line_ptr - 1);
 
-    strbuf_free(json->tmp);
+    struct ibuf *ibuf = json->tmp->ibuf;
+    strbuf_destroy(json->tmp);
+    cord_ibuf_put(ibuf);
     luaL_error(l, "Found too many nested data structures (%d) on line %d at cha"
                "racter %d here '%s'", json->current_depth, json->line_count,
                json->ptr - json->cur_line_ptr, err_context);
@@ -1101,7 +1105,10 @@ static int json_decode(lua_State *l)
     /* Ensure the temporary buffer can hold the entire string.
      * This means we no longer need to do length checks since the decoded
      * string must be smaller than the entire json string */
-    json.tmp = strbuf_new(json_len);
+    strbuf_t decode_buf;
+    json.tmp = &decode_buf;
+    struct ibuf *ibuf = cord_ibuf_take();
+    strbuf_create(&decode_buf, json_len, ibuf);
 
     json_next_token(&json, &token);
     json_process_value(l, &json, &token);
@@ -1112,7 +1119,8 @@ static int json_decode(lua_State *l)
     if (token.type != T_END)
         json_throw_parse_error(l, &json, "the end", &token);
 
-    strbuf_free(json.tmp);
+    strbuf_destroy(&decode_buf);
+    cord_ibuf_put(ibuf);
 
     return 1;
 }
diff --git a/third_party/lua-cjson/strbuf.c b/third_party/lua-cjson/strbuf.c
index f0f7f4b9a..22c1f2093 100644
--- a/third_party/lua-cjson/strbuf.c
+++ b/third_party/lua-cjson/strbuf.c
@@ -28,6 +28,7 @@
 #include <string.h>
 
 #include "strbuf.h"
+#include "small/ibuf.h"
 
 static void die(const char *fmt, ...)
 {
@@ -41,7 +42,7 @@ static void die(const char *fmt, ...)
     exit(-1);
 }
 
-void strbuf_init(strbuf_t *s, int len)
+void strbuf_create(strbuf_t *s, int len, struct ibuf *ibuf)
 {
     int size;
 
@@ -50,37 +51,19 @@ void strbuf_init(strbuf_t *s, int len)
     else
         size = len + 1;         /* \0 terminator */
 
-    s->buf = NULL;
+    s->buf = ibuf_reserve(ibuf, size);
     s->size = size;
     s->length = 0;
     s->increment = STRBUF_DEFAULT_INCREMENT;
-    s->dynamic = 0;
     s->reallocs = 0;
     s->debug = 0;
-
-    s->buf = malloc(size);
+    s->ibuf = ibuf;
     if (!s->buf)
         die("Out of memory");
 
     strbuf_ensure_null(s);
 }
 
-strbuf_t *strbuf_new(int len)
-{
-    strbuf_t *s;
-
-    s = malloc(sizeof(strbuf_t));
-    if (!s)
-        die("Out of memory");
-
-    strbuf_init(s, len);
-
-    /* Dynamic strbuf allocation / deallocation */
-    s->dynamic = 1;
-
-    return s;
-}
-
 void strbuf_set_increment(strbuf_t *s, int increment)
 {
     /* Increment > 0:  Linear buffer growth rate
@@ -99,36 +82,9 @@ static inline void debug_stats(strbuf_t *s)
     }
 }
 
-/* If strbuf_t has not been dynamically allocated, strbuf_free() can
- * be called any number of times strbuf_init() */
-void strbuf_free(strbuf_t *s)
-{
-    debug_stats(s);
-
-    if (s->buf) {
-        free(s->buf);
-        s->buf = NULL;
-    }
-    if (s->dynamic)
-        free(s);
-}
-
-char *strbuf_free_to_string(strbuf_t *s, int *len)
+void strbuf_destroy(strbuf_t *s)
 {
-    char *buf;
-
     debug_stats(s);
-
-    strbuf_ensure_null(s);
-
-    buf = s->buf;
-    if (len)
-        *len = s->length;
-
-    if (s->dynamic)
-        free(s);
-
-    return buf;
 }
 
 static int calculate_new_size(strbuf_t *s, int len)
@@ -171,9 +127,21 @@ void strbuf_resize(strbuf_t *s, int len)
         fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
                 (long)s, s->size, newsize);
     }
-
     s->size = newsize;
-    s->buf = realloc(s->buf, s->size);
+
+    struct ibuf *ibuf = s->ibuf;
+    /*
+     * Propagate the write position to enable memcpy() when realloc happens
+     * inside of the ibuf.
+     */
+    ibuf->wpos += s->length;
+    s->buf = ibuf_reserve(ibuf, newsize - s->length) - s->length;
+    /*
+     * But then it is reverted because memcpy() of the old data is done, and the
+     * write position + capacity are managed by strbuf itself.
+     */
+    ibuf->wpos = s->buf;
+
     if (!s->buf)
         die("Out of memory");
     s->reallocs++;
diff --git a/third_party/lua-cjson/strbuf.h b/third_party/lua-cjson/strbuf.h
index d861108c1..b2f8a6efa 100644
--- a/third_party/lua-cjson/strbuf.h
+++ b/third_party/lua-cjson/strbuf.h
@@ -25,6 +25,8 @@
 #include <stdlib.h>
 #include <stdarg.h>
 
+struct ibuf;
+
 /* Size: Total bytes allocated to *buf
  * Length: String length, excluding optional NULL terminator.
  * Increment: Allocation increments when resizing the string buffer.
@@ -36,9 +38,10 @@ typedef struct {
     int size;
     int length;
     int increment;
-    int dynamic;
     int reallocs;
     int debug;
+    /** Backend allocator for the buffer data. */
+    struct ibuf *ibuf;
 } strbuf_t;
 
 #ifndef STRBUF_DEFAULT_SIZE
@@ -49,13 +52,11 @@ typedef struct {
 #endif
 
 /* Initialise */
-extern strbuf_t *strbuf_new(int len);
-extern void strbuf_init(strbuf_t *s, int len);
+extern void strbuf_create(strbuf_t *s, int len, struct ibuf *ibuf);
 extern void strbuf_set_increment(strbuf_t *s, int increment);
 
 /* Release */
-extern void strbuf_free(strbuf_t *s);
-extern char *strbuf_free_to_string(strbuf_t *s, int *len);
+extern void strbuf_destroy(strbuf_t *s);
 
 /* Management */
 extern void strbuf_resize(strbuf_t *s, int len);
@@ -80,11 +81,6 @@ static inline void strbuf_reset(strbuf_t *s)
     s->length = 0;
 }
 
-static inline int strbuf_allocated(strbuf_t *s)
-{
-    return s->buf != NULL;
-}
-
 /* Return bytes remaining in the string buffer
  * Ensure there is space for a NULL terminator. */
 static inline int strbuf_empty_length(strbuf_t *s)
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-23 14:06 [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding Vladislav Shpilevoy via Tarantool-patches
@ 2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
  2021-05-24 15:49   ` Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
  2021-05-25 21:20 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 1 reply; 10+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-05-24 10:04 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, sergepetrenko

Hi! Thanks for your patch.


I see strange effect. After a patch following script:

```

for i = 1, 1e9 do pcall(json.encode, function() end) end

```

produces quite strange effects with memory. After some time

my system kills a process - also I see in htop that process consumes 
about 20% of memory.

In contrast before the patch process uses 0.1% of memory and doesn't 
have any oscillations

in "VIRT" and "RES" columns. Yes, it's a negative case but I believe 
such behaviour shouldn't be affected as well.


Valid json encoding looks to be worked ok.

One nit below.


On 23.05.2021 17:06, Vladislav Shpilevoy wrote:
> Lua json module used to have a global buffer for all encodings. It
> was reused by each next encode().
>
> This was not correct, because during encode() might happen a GC
> step, which might call encode() again and spoil the global buffer.
>
> The same problem was already fixed for the global static buffer in
> scope of #5632. Similarly to that time, the patch makes Lua json
> module use cord_ibuf to prevent "concurrent" usage of the buffer
> data. The global buffer is deleted.
>
> According to a few microbenchmarks it didn't affect the perf
> anyhow.
>
> Core part of the patch is strbuf changes. Firstly, its destruction
> is now optional, cord_ibuf can free itself on a next yield.
> Secondly, its reallocation algorithm is kept intact - ibuf is used
> as an allocator, not as the buffer itself. This is done so as not
> to be too intrusive in the third party module which might need an
> upgrade to the upstream in the future.
>
> Closes #6050
> ---
> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
> index 38e999870..85186d6d5 100644
> --- a/third_party/lua-cjson/lua_cjson.c
> +++ b/third_party/lua-cjson/lua_cjson.c
> @@ -51,8 +51,7 @@
>   #include "mp_extension_types.h" /* MP_DECIMAL, MP_UUID */
>   #include "tt_static.h"
>   #include "uuid/tt_uuid.h" /* tt_uuid_to_string(), UUID_STR_LEN */
> -
> -#define DEFAULT_ENCODE_KEEP_BUFFER 1
> +#include "cord_buf.h"
>   
>   typedef enum {
>       T_OBJ_BEGIN,
> @@ -100,10 +99,6 @@ static struct luaL_serializer *luaL_json_default;
>   static json_token_type_t ch2token[256];
>   static char escape2char[256];  /* Decoding */
>   
> -/* encode_buf is only allocated and used when
> - * encode_keep_buffer is set */
> -static strbuf_t encode_buf;
> -
>   typedef struct {
>       const char *data;
>       const char *ptr;
> @@ -182,9 +177,6 @@ static int json_destroy_config(lua_State *l)
>   static void json_create_tokens()
>   {
>       int i;
> -#if DEFAULT_ENCODE_KEEP_BUFFER > 0
> -    strbuf_init(&encode_buf, 0);
> -#endif
>   
>       /* Decoding init */
>   
> @@ -444,7 +436,9 @@ static int json_encode(lua_State *l) {
>                     "expected 1 or 2 arguments");
>   
>       /* Reuse existing buffer. */
> -    strbuf_reset(&encode_buf);
> +    strbuf_t encode_buf;
> +    struct ibuf *ibuf = cord_ibuf_take();
> +    strbuf_create(&encode_buf, -1, ibuf);

Maybe it's better to use "0" here. I know it has the same effect but 
usually 0 is default value. But up to you.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-23 14:06 [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
@ 2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
  2021-05-24 13:05   ` Serge Petrenko via Tarantool-patches
  2021-05-24 15:47   ` Vladislav Shpilevoy via Tarantool-patches
  2021-05-25 21:20 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 2 replies; 10+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-05-24 13:01 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, olegrok



23.05.2021 17:06, Vladislav Shpilevoy пишет:
> Lua json module used to have a global buffer for all encodings. It
> was reused by each next encode().
>
> This was not correct, because during encode() might happen a GC
> step, which might call encode() again and spoil the global buffer.
>
> The same problem was already fixed for the global static buffer in
> scope of #5632. Similarly to that time, the patch makes Lua json
> module use cord_ibuf to prevent "concurrent" usage of the buffer
> data. The global buffer is deleted.
>
> According to a few microbenchmarks it didn't affect the perf
> anyhow.
>
> Core part of the patch is strbuf changes. Firstly, its destruction
> is now optional, cord_ibuf can free itself on a next yield.
> Secondly, its reallocation algorithm is kept intact - ibuf is used
> as an allocator, not as the buffer itself. This is done so as not
> to be too intrusive in the third party module which might need an
> upgrade to the upstream in the future.
>
> Closes #6050
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6050-lua-json-static-buf-gc
> Issue: https://github.com/tarantool/tarantool/issues/6050

Hi! Thanks for the patch!
LGTM with one nit below.


...

> diff --git a/third_party/lua-cjson/strbuf.c b/third_party/lua-cjson/strbuf.c
> index f0f7f4b9a..22c1f2093 100644
> --- a/third_party/lua-cjson/strbuf.c
> +++ b/third_party/lua-cjson/strbuf.c
> @@ -28,6 +28,7 @@
>   #include <string.h>
>   
>   #include "strbuf.h"
> +#include "small/ibuf.h"
>   
>   static void die(const char *fmt, ...)
>   {
> @@ -41,7 +42,7 @@ static void die(const char *fmt, ...)
>       exit(-1);
>   }
>   
> -void strbuf_init(strbuf_t *s, int len)
> +void strbuf_create(strbuf_t *s, int len, struct ibuf *ibuf)
>   {
>       int size;
>   
> @@ -50,37 +51,19 @@ void strbuf_init(strbuf_t *s, int len)
>       else
>           size = len + 1;         /* \0 terminator */
>   
> -    s->buf = NULL;
> +    s->buf = ibuf_reserve(ibuf, size);
>       s->size = size;
>       s->length = 0;
>       s->increment = STRBUF_DEFAULT_INCREMENT;
> -    s->dynamic = 0;
>       s->reallocs = 0;
>       s->debug = 0;
> -
> -    s->buf = malloc(size);
> +    s->ibuf = ibuf;
>       if (!s->buf)
>           die("Out of memory");
>   
>       strbuf_ensure_null(s);
>   }
>   
> -strbuf_t *strbuf_new(int len)
> -{
> -    strbuf_t *s;
> -
> -    s = malloc(sizeof(strbuf_t));
> -    if (!s)
> -        die("Out of memory");
> -
> -    strbuf_init(s, len);
> -
> -    /* Dynamic strbuf allocation / deallocation */
> -    s->dynamic = 1;
> -
> -    return s;
> -}
> -
>   void strbuf_set_increment(strbuf_t *s, int increment)
>   {
>       /* Increment > 0:  Linear buffer growth rate
> @@ -99,36 +82,9 @@ static inline void debug_stats(strbuf_t *s)
>       }
>   }
>   
> -/* If strbuf_t has not been dynamically allocated, strbuf_free() can
> - * be called any number of times strbuf_init() */
> -void strbuf_free(strbuf_t *s)
> -{
> -    debug_stats(s);
> -
> -    if (s->buf) {
> -        free(s->buf);
> -        s->buf = NULL;
> -    }
> -    if (s->dynamic)
> -        free(s);
> -}
> -
> -char *strbuf_free_to_string(strbuf_t *s, int *len)
> +void strbuf_destroy(strbuf_t *s)
>   {
> -    char *buf;
> -
>       debug_stats(s);
> -
> -    strbuf_ensure_null(s);
> -
> -    buf = s->buf;
> -    if (len)
> -        *len = s->length;
> -
> -    if (s->dynamic)
> -        free(s);
> -
> -    return buf;
>   }
>   
>   static int calculate_new_size(strbuf_t *s, int len)
> @@ -171,9 +127,21 @@ void strbuf_resize(strbuf_t *s, int len)
>           fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
>                   (long)s, s->size, newsize);
>       }
> -
>       s->size = newsize;

  ^ Extraneous change

> -    s->buf = realloc(s->buf, s->size);
> +
> +    struct ibuf *ibuf = s->ibuf;
> +    /*
> +     * Propagate the write position to enable memcpy() when realloc happens
> +     * inside of the ibuf.
> +     */
> +    ibuf->wpos += s->length;
> +    s->buf = ibuf_reserve(ibuf, newsize - s->length) - s->length;
> +    /*
> +     * But then it is reverted because memcpy() of the old data is done, and the
> +     * write position + capacity are managed by strbuf itself.
> +     */
> +    ibuf->wpos = s->buf;
> +
>       if (!s->buf)
>           die("Out of memory");
>       s->reallocs++;

...

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
@ 2021-05-24 13:05   ` Serge Petrenko via Tarantool-patches
  2021-05-24 15:47     ` Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 15:47   ` Vladislav Shpilevoy via Tarantool-patches
  1 sibling, 1 reply; 10+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-05-24 13:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, olegrok



24.05.2021 16:01, Serge Petrenko via Tarantool-patches пишет:
>
>
> 23.05.2021 17:06, Vladislav Shpilevoy пишет:
>> Lua json module used to have a global buffer for all encodings. It
>> was reused by each next encode().
>>
>> This was not correct, because during encode() might happen a GC
>> step, which might call encode() again and spoil the global buffer.
>>
>> The same problem was already fixed for the global static buffer in
>> scope of #5632. Similarly to that time, the patch makes Lua json
>> module use cord_ibuf to prevent "concurrent" usage of the buffer
>> data. The global buffer is deleted.
>>
>> According to a few microbenchmarks it didn't affect the perf
>> anyhow.
>>
>> Core part of the patch is strbuf changes. Firstly, its destruction
>> is now optional, cord_ibuf can free itself on a next yield.
>> Secondly, its reallocation algorithm is kept intact - ibuf is used
>> as an allocator, not as the buffer itself. This is done so as not
>> to be too intrusive in the third party module which might need an
>> upgrade to the upstream in the future.
>>
>> Closes #6050
>> ---
>> Branch: 
>> http://github.com/tarantool/tarantool/tree/gerold103/gh-6050-lua-json-static-buf-gc
>> Issue: https://github.com/tarantool/tarantool/issues/6050
>
> Hi! Thanks for the patch!
> LGTM with one nit below.

Sorry, I noticed this too late: luacheck has one warning on your branch:

Checking test/app-tap/json.test.lua 1 warning
422 <https://github.com/tarantool/tarantool/runs/2650197964#step:4:422>
423 
<https://github.com/tarantool/tarantool/runs/2650197964#step:4:423>test/app-tap/json.test.lua:210:9: 
(W213) unused loop variable i
424 <https://github.com/tarantool/tarantool/runs/2650197964#step:4:424>



>
>
> ...
>
>> diff --git a/third_party/lua-cjson/strbuf.c 
>> b/third_party/lua-cjson/strbuf.c
>> index f0f7f4b9a..22c1f2093 100644
>> --- a/third_party/lua-cjson/strbuf.c
>> +++ b/third_party/lua-cjson/strbuf.c
>> @@ -28,6 +28,7 @@
>>   #include <string.h>
>>     #include "strbuf.h"
>> +#include "small/ibuf.h"
>>     static void die(const char *fmt, ...)
>>   {
>> @@ -41,7 +42,7 @@ static void die(const char *fmt, ...)
>>       exit(-1);
>>   }
>>   -void strbuf_init(strbuf_t *s, int len)
>> +void strbuf_create(strbuf_t *s, int len, struct ibuf *ibuf)
>>   {
>>       int size;
>>   @@ -50,37 +51,19 @@ void strbuf_init(strbuf_t *s, int len)
>>       else
>>           size = len + 1;         /* \0 terminator */
>>   -    s->buf = NULL;
>> +    s->buf = ibuf_reserve(ibuf, size);
>>       s->size = size;
>>       s->length = 0;
>>       s->increment = STRBUF_DEFAULT_INCREMENT;
>> -    s->dynamic = 0;
>>       s->reallocs = 0;
>>       s->debug = 0;
>> -
>> -    s->buf = malloc(size);
>> +    s->ibuf = ibuf;
>>       if (!s->buf)
>>           die("Out of memory");
>>         strbuf_ensure_null(s);
>>   }
>>   -strbuf_t *strbuf_new(int len)
>> -{
>> -    strbuf_t *s;
>> -
>> -    s = malloc(sizeof(strbuf_t));
>> -    if (!s)
>> -        die("Out of memory");
>> -
>> -    strbuf_init(s, len);
>> -
>> -    /* Dynamic strbuf allocation / deallocation */
>> -    s->dynamic = 1;
>> -
>> -    return s;
>> -}
>> -
>>   void strbuf_set_increment(strbuf_t *s, int increment)
>>   {
>>       /* Increment > 0:  Linear buffer growth rate
>> @@ -99,36 +82,9 @@ static inline void debug_stats(strbuf_t *s)
>>       }
>>   }
>>   -/* If strbuf_t has not been dynamically allocated, strbuf_free() can
>> - * be called any number of times strbuf_init() */
>> -void strbuf_free(strbuf_t *s)
>> -{
>> -    debug_stats(s);
>> -
>> -    if (s->buf) {
>> -        free(s->buf);
>> -        s->buf = NULL;
>> -    }
>> -    if (s->dynamic)
>> -        free(s);
>> -}
>> -
>> -char *strbuf_free_to_string(strbuf_t *s, int *len)
>> +void strbuf_destroy(strbuf_t *s)
>>   {
>> -    char *buf;
>> -
>>       debug_stats(s);
>> -
>> -    strbuf_ensure_null(s);
>> -
>> -    buf = s->buf;
>> -    if (len)
>> -        *len = s->length;
>> -
>> -    if (s->dynamic)
>> -        free(s);
>> -
>> -    return buf;
>>   }
>>     static int calculate_new_size(strbuf_t *s, int len)
>> @@ -171,9 +127,21 @@ void strbuf_resize(strbuf_t *s, int len)
>>           fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
>>                   (long)s, s->size, newsize);
>>       }
>> -
>>       s->size = newsize;
>
>  ^ Extraneous change
>
>> -    s->buf = realloc(s->buf, s->size);
>> +
>> +    struct ibuf *ibuf = s->ibuf;
>> +    /*
>> +     * Propagate the write position to enable memcpy() when realloc 
>> happens
>> +     * inside of the ibuf.
>> +     */
>> +    ibuf->wpos += s->length;
>> +    s->buf = ibuf_reserve(ibuf, newsize - s->length) - s->length;
>> +    /*
>> +     * But then it is reverted because memcpy() of the old data is 
>> done, and the
>> +     * write position + capacity are managed by strbuf itself.
>> +     */
>> +    ibuf->wpos = s->buf;
>> +
>>       if (!s->buf)
>>           die("Out of memory");
>>       s->reallocs++;
>
> ...
>

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 13:05   ` Serge Petrenko via Tarantool-patches
@ 2021-05-24 15:47     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-05-24 15:47 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, olegrok

Thanks for the review!

> Sorry, I noticed this too late: luacheck has one warning on your branch:
> 
> Checking test/app-tap/json.test.lua 1 warning
> 422 <https://github.com/tarantool/tarantool/runs/2650197964#step:4:422>
> 423 <https://github.com/tarantool/tarantool/runs/2650197964#step:4:423>test/app-tap/json.test.lua:210:9: (W213) unused loop variable i
> 424 <https://github.com/tarantool/tarantool/runs/2650197964#step:4:424>

Fixed:

====================
@@ -207,7 +207,7 @@ tap.test("json", function(test)
     --
     local bigstr = string.rep('a', 16384)
     local t = {}
-    for i = 1, 10 do
+    for _ = 1, 10 do
         table.insert(t, bigstr)
     end
     local bigjson = serializer.encode(t)
====================

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
  2021-05-24 13:05   ` Serge Petrenko via Tarantool-patches
@ 2021-05-24 15:47   ` Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 16:17     ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-05-24 15:47 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, olegrok

Hi! Thanks for the review!

>> @@ -171,9 +127,21 @@ void strbuf_resize(strbuf_t *s, int len)
>>           fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
>>                   (long)s, s->size, newsize);
>>       }
>> -
>>       s->size = newsize;
> 
>  ^ Extraneous change

This was rather intentional, there are too many empty lines in this
function now. But I don't mind to leave it:

====================
@@ -127,6 +127,7 @@ void strbuf_resize(strbuf_t *s, int len)
         fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
                 (long)s, s->size, newsize);
     }
+
     s->size = newsize;
 
     struct ibuf *ibuf = s->ibuf;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
@ 2021-05-24 15:49   ` Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 16:00     ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-05-24 15:49 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, sergepetrenko

Hi! Thanks for the review!

On 24.05.2021 12:04, Oleg Babin wrote:
> Hi! Thanks for your patch.
> 
> 
> I see strange effect. After a patch following script:
> 
> ```
> 
> for i = 1, 1e9 do pcall(json.encode, function() end) end
> 
> ```
> 
> produces quite strange effects with memory. After some time
> 
> my system kills a process - also I see in htop that process consumes about 20% of memory.
> 
> In contrast before the patch process uses 0.1% of memory and doesn't have any oscillations
> 
> in "VIRT" and "RES" columns. Yes, it's a negative case but I believe such behaviour shouldn't be affected as well.

This is happening because you didn't do any yields. Cord buffer is freed
automatically when a yield happens. This is a workaround for not being
able to use a global buffer, which wouldn't need freeing at all.

This is a known issue with the cord buffer, and the only working alternative
I see is to wrap all related Lua C calls into lua_pcall(). This leads to
perf issues for the success case, because pcall does more work; because
you usually need to re-push the arguments; and because pcall is not jitted
AFAIK. For instance about arguments re-push, to use lua_pcall() in lua_cjson
in json_encode() I would need to push the Lua json.encode(...) arguments on
the stack again.

I couldn't find any good solution for the error-case so far. The same issue
exists now with all the code which used IBUF_SHARED/tarantool_ibuf and now
uses cord_ibuf_take()/put(). It does not justify the problem though.

I was thinking about using pcall anyway; about pushing a GC function on Lua
stack to free the cord buffer; about having a global buffer for normal context
and another global buffer per each level of GC recursion. The last idea is
not possible to implement due to lack of a concept of GC level in our Lua
implementation. The other ideas are going to hit the perf for the success
case. All looks bad.

Your particular example started working when I added a yield every 10k
encodes.

>> diff --git a/third_party/lua-cjson/lua_cjson.c b/third_party/lua-cjson/lua_cjson.c
>> index 38e999870..85186d6d5 100644
>> --- a/third_party/lua-cjson/lua_cjson.c
>> +++ b/third_party/lua-cjson/lua_cjson.c
>> @@ -182,9 +177,6 @@ static int json_destroy_config(lua_State *l)
>>   static void json_create_tokens()
>>   {
>>       int i;
>> -#if DEFAULT_ENCODE_KEEP_BUFFER > 0
>> -    strbuf_init(&encode_buf, 0);
>> -#endif
>>         /* Decoding init */
>>   @@ -444,7 +436,9 @@ static int json_encode(lua_State *l) {
>>                     "expected 1 or 2 arguments");
>>         /* Reuse existing buffer. */
>> -    strbuf_reset(&encode_buf);
>> +    strbuf_t encode_buf;
>> +    struct ibuf *ibuf = cord_ibuf_take();
>> +    strbuf_create(&encode_buf, -1, ibuf);
> 
> Maybe it's better to use "0" here. I know it has the same effect but usually 0 is default value. But up to you.

0 looks like "do not pre-allocate anything". I used the default value
explicitly now:

====================
@@ -438,7 +438,7 @@ static int json_encode(lua_State *l) {
     /* Reuse existing buffer. */
     strbuf_t encode_buf;
     struct ibuf *ibuf = cord_ibuf_take();
-    strbuf_create(&encode_buf, -1, ibuf);
+    strbuf_create(&encode_buf, STRBUF_DEFAULT_SIZE, ibuf);
     struct luaL_serializer *cfg = luaL_checkserializer(l);
 
     if (lua_gettop(l) == 2) {
====================

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 15:49   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-05-24 16:00     ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-05-24 16:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, sergepetrenko

Thanks for your explanation. LGTM.

I put several comments below.

On 24.05.2021 18:49, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
>
> On 24.05.2021 12:04, Oleg Babin wrote:
>> Hi! Thanks for your patch.
>>
>>
>> I see strange effect. After a patch following script:
>>
>> ```
>>
>> for i = 1, 1e9 do pcall(json.encode, function() end) end
>>
>> ```
>>
>> produces quite strange effects with memory. After some time
>>
>> my system kills a process - also I see in htop that process consumes about 20% of memory.
>>
>> In contrast before the patch process uses 0.1% of memory and doesn't have any oscillations
>>
>> in "VIRT" and "RES" columns. Yes, it's a negative case but I believe such behaviour shouldn't be affected as well.
> This is happening because you didn't do any yields. Cord buffer is freed
> automatically when a yield happens. This is a workaround for not being
> able to use a global buffer, which wouldn't need freeing at all.

After your explanation my comment looks irrelevant. Agree, in real projects

it's impossible to run code without any yields. Thanks!

> This is a known issue with the cord buffer, and the only working alternative
> I see is to wrap all related Lua C calls into lua_pcall(). This leads to
> perf issues for the success case, because pcall does more work; because
> you usually need to re-push the arguments; and because pcall is not jitted
> AFAIK.

No, pcall in Lua is jitted (see 
http://wiki.luajit.org/NYI#libraries_base-library).

But anyway I agree it gives huge overhead on hot paths.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-24 15:47   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-05-24 16:17     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 10+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-05-24 16:17 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, olegrok



24.05.2021 18:47, Vladislav Shpilevoy пишет:
> Hi! Thanks for the review!
>
>>> @@ -171,9 +127,21 @@ void strbuf_resize(strbuf_t *s, int len)
>>>            fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
>>>                    (long)s, s->size, newsize);
>>>        }
>>> -
>>>        s->size = newsize;
>>   ^ Extraneous change
> This was rather intentional, there are too many empty lines in this
> function now. But I don't mind to leave it:
>
> ====================
> @@ -127,6 +127,7 @@ void strbuf_resize(strbuf_t *s, int len)
>           fprintf(stderr, "strbuf(%lx) resize: %d => %d\n",
>                   (long)s, s->size, newsize);
>       }
> +
>       s->size = newsize;
>   
>       struct ibuf *ibuf = s->ibuf;
Thanks for the fixes!

LGTM.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding
  2021-05-23 14:06 [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding Vladislav Shpilevoy via Tarantool-patches
  2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
  2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
@ 2021-05-25 21:20 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-05-25 21:20 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, olegrok

Pushed to master, 2.8, 2.7, 1.10.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-05-25 21:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-23 14:06 [Tarantool-patches] [PATCH 1/1] json: use cord_ibuf for encoding and decoding Vladislav Shpilevoy via Tarantool-patches
2021-05-24 10:04 ` Oleg Babin via Tarantool-patches
2021-05-24 15:49   ` Vladislav Shpilevoy via Tarantool-patches
2021-05-24 16:00     ` Oleg Babin via Tarantool-patches
2021-05-24 13:01 ` Serge Petrenko via Tarantool-patches
2021-05-24 13:05   ` Serge Petrenko via Tarantool-patches
2021-05-24 15:47     ` Vladislav Shpilevoy via Tarantool-patches
2021-05-24 15:47   ` Vladislav Shpilevoy via Tarantool-patches
2021-05-24 16:17     ` Serge Petrenko via Tarantool-patches
2021-05-25 21:20 ` Vladislav Shpilevoy via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox