[tarantool-patches] Re: [PATCH 2/2] wal: drop rows_per_wal option

Sun Oct 6 18:59:28 MSK 2019

Hi! Thanks for the review!

On 05/10/2019 23:21, Alexander Turenko wrote:
> On Sat, Sep 07, 2019 at 06:05:37PM +0200, Vladislav Shpilevoy wrote:
>> Rows_per_wal option was deprecated because it can be covered by
>> wal_max_size. In order not to complicate WAL code with that
>> option's support this commit drops it completely.
> 
> It is just one 'if' subcondition in the code and one field for storing
> yields frequency. (And the field can be replaced with a static variable
> w/o removing the option.) I don't see how this commit simplifies the
> code.

AFAIU this complicates not only the present code, but also future code,
according to Kostja's comment in the issue description:

    "We will have trouble with automatic size management if we
     continue using wal_max_rows"

But honestly I don't really care that much about this option. I just
decided to fix the unassigned issue, marked as breaking change, as
soon as possible, to break less users. So I can't provide any serious
evidence how rows_per_wal was going to complicate auto size management.

> 
> The option was convenient for tests at least. Now additional
> calculations are necessary to dig into cases that verifies xlog files
> count.

If that is needed for testing only, then it should not be exposed to
public API.

> 
> However I don't think that those cons and pros are important, so I don't
> want to block the patch. The patch itself looks good except minor
> comments below.
> 
>>
>> In some tests the option was used to create several small xlog
>> files. Now the same is done via wal_max_size. Where it was
>> needed, number of rows per wal is estimated as wal_max_size / 50.
>> Because struct xrow_header size ~= 50 not counting paddings and
>> body.
>>
>> Closes #3762
>> ---
>>  src/box/box.cc                       |  30 +-------
>>  src/box/lua/load_cfg.lua             |   5 --
>>  src/box/wal.c                        |  14 +---
>>  src/box/wal.h                        |  13 +++-
>>  test/app-tap/init_script.result      |  39 +++++-----
>>  test/app-tap/snapshot.test.lua       |   2 +-
>>  test/app/app.lua                     |   2 +-
>>  test/app/fiber.result                |   6 +-
>>  test/app/fiber.test.lua              |   4 +-
>>  test/box-py/box.lua                  |   2 +-
>>  test/box-tap/cfg.test.lua            |   3 +-
>>  test/box/admin.result                |   2 -
>>  test/box/cfg.result                  |   4 -
>>  test/box/configuration.result        | 107 ---------------------------
>>  test/box/proxy.lua                   |   2 +-
>>  test/box/tiny.lua                    |   1 -
>>  test/engine/box.lua                  |   2 +-
>>  test/engine_long/box.lua             |   1 -
>>  test/long_run-py/box.lua             |   1 -
>>  test/vinyl/vinyl.lua                 |   1 -
>>  test/xlog-py/box.lua                 |   1 -
>>  test/xlog/checkpoint_daemon.result   |  11 ++-
>>  test/xlog/checkpoint_daemon.test.lua |   9 ++-
>>  test/xlog/errinj.result              |   7 +-
>>  test/xlog/errinj.test.lua            |   5 +-
>>  test/xlog/panic.lua                  |   1 -
>>  test/xlog/upgrade/fill.lua           |   2 +-
>>  test/xlog/xlog.lua                   |   2 +-
>>  28 files changed, 72 insertions(+), 207 deletions(-)
>>  delete mode 100644 test/box/configuration.result
> 
>>  static void
>> -wal_stream_create(struct wal_stream *ctx, size_t wal_max_rows)
>> +wal_stream_create(struct wal_stream *ctx)
>>  {
>>  	xstream_create(&ctx->base, apply_wal_row);
>>  	ctx->rows = 0;
>> -	/**
>> -	 * Make the yield logic covered by the functional test
>> -	 * suite, which has a small setting for rows_per_wal.
>> -	 * Each yield can take up to 1ms if there are no events,
>> -	 * so we can't afford many of them during recovery.
>> -	 */
>> -	ctx->yield = (wal_max_rows >> 4)  + 1;
> 
> Yield frequency is not more depend on wal file size / rows count, so we
> we'll not test this scenario anymore. Look:
> 
> https://coveralls.io/builds/26111576/source?filename=src/box/box.cc#L345
> https://coveralls.io/builds/25591832/source?filename=src/box/box.cc#L343
> 
> Don't sure whether it is really important, but it seems that someone
> paid special attention to this and even leave the comment about.
> 
> Can we, say, store 'rows per yield' in a static variable with 32000 as
> the default value and allow tests to set it with errinj?

There are too many tests which had small rows_per_wal setting. I don't
think it is a good idea to make them all errinj dependent. That would
require to disable them in release build, and to move to separate files.
Instead, I added a new test:

===========================================================================

diff --git a/test/engine/recover_snapshot_wal.result b/test/engine/recover_snapshot_wal.result
index d2667559d..6b7b05362 100644
--- a/test/engine/recover_snapshot_wal.result
+++ b/test/engine/recover_snapshot_wal.result
@@ -23,9 +23,9 @@ box.snapshot()
 ---
 - ok
 ...
-space:insert({1001})
+space:insert({33001})
 ---
-- [1001]
+- [33001]
 ...
 test_run:cmd("restart server default")
 space = box.space['test']
@@ -37,7 +37,7 @@ index = space.index['primary']
 index:select({}, {iterator = box.index.ALL})
 ---
 - - [0]
-  - [1001]
+  - [33001]
 ...
 for key = 1, 351 do space:insert({key}) end
 ---
@@ -46,7 +46,18 @@ box.snapshot()
 ---
 - ok
 ...
-for key = 352, 1000 do space:insert({key}) end
+-- Insert so many tuples, that recovery would need to yield
+-- periodically to allow other fibers do something. At the moment
+-- of writing this the yield period was 32k tuples.
+box.begin()                                                     \
+for key = 352, 33000 do                                         \
+    space:insert({key})                                         \
+    if key % 10000 == 0 then                                    \
+        box.commit()                                            \
+        box.begin()                                             \
+    end                                                         \
+end                                                             \
+box.commit()
 ---
 ...
 test_run:cmd("restart server default")
@@ -56,1010 +67,24 @@ space = box.space['test']
 index = space.index['primary']
 ---
 ...
-index:select({}, {iterator = box.index.ALL})
+i = 0
 ---
-- - [0]
-  - [1]
...           -- lots of deletions
-  - [999]
-  - [1000]
-  - [1001]
+...
+err = nil
+---
+...
+for _, t in space:pairs() do                                    \
+    if t[1] ~= i then                                           \
+        err = {i, t}                                            \
+        break                                                   \
+    end                                                         \
+    i = i + 1                                                   \
+end
 ---
 ...
+i
+---
+- 33002
+...
+err
+---
+- null
 ...
 space:drop()
 ---
diff --git a/test/engine/recover_snapshot_wal.test.lua b/test/engine/recover_snapshot_wal.test.lua
index 1f133dd72..ff286bc4f 100644
--- a/test/engine/recover_snapshot_wal.test.lua
+++ b/test/engine/recover_snapshot_wal.test.lua
@@ -10,7 +10,7 @@ index = space:create_index('primary')
 
 space:insert({0})
 box.snapshot()
-space:insert({1001})
+space:insert({33001})
 
 test_run:cmd("restart server default")
 
@@ -20,13 +20,33 @@ index:select({}, {iterator = box.index.ALL})
 
 for key = 1, 351 do space:insert({key}) end
 box.snapshot()
-for key = 352, 1000 do space:insert({key}) end
+-- Insert so many tuples, that recovery would need to yield
+-- periodically to allow other fibers do something. At the moment
+-- of writing this the yield period was 32k tuples.
+box.begin()                                                     \
+for key = 352, 33000 do                                         \
+    space:insert({key})                                         \
+    if key % 10000 == 0 then                                    \
+        box.commit()                                            \
+        box.begin()                                             \
+    end                                                         \
+end                                                             \
+box.commit()
 
 test_run:cmd("restart server default")
 
 space = box.space['test']
 index = space.index['primary']
-index:select({}, {iterator = box.index.ALL})
+i = 0
+err = nil
+for _, t in space:pairs() do                                    \
+    if t[1] ~= i then                                           \
+        err = {i, t}                                            \
+        break                                                   \
+    end                                                         \
+    i = i + 1                                                   \
+end
+i
+err
 
 space:drop()
 test_run:cmd("restart server default with cleanup=1")

===========================================================================

But I don't think it was necessary. According to the comment (and
to the patch introduced it), this signal-handling-sleep(0) was never
properly tested. And even though this line was covered by our tests,
its purpose was not tested.

> 
>> diff --git a/test/box/configuration.result b/test/box/configuration.result
>> deleted file mode 100644
>> index c885c28dc..000000000
>> --- a/test/box/configuration.result
>> +++ /dev/null
> 
> I would say that the test was removed in
> fdc3d1dd2c3b120ff6426d074285908da7fb646d to don't confuse a reader.

Ok, I don't mind. Added to the commit message:

"
    Note, file box/configuration.result was deleted here, because it
    is a stray result file, and it contained the rows_per_wal option
    mentioning. Its test was dropped much earlier in
    fdc3d1dd2c3b120ff6426d074285908da7fb646d.
"

> 
>> diff --git a/test/xlog/errinj.test.lua b/test/xlog/errinj.test.lua
>> index de0e2e7e8..3d72dc4e4 100644
>> --- a/test/xlog/errinj.test.lua
>> +++ b/test/xlog/errinj.test.lua
>> @@ -30,12 +30,13 @@ _ = test:create_index('primary')
>>  
>>  box.schema.user.grant('guest', 'write', 'space', 'test')
>>  
>> -for i=1, box.cfg.rows_per_wal do test:insert{i, 'test'} end
>> +row_count_per_wal = box.cfg.wal_max_size / 50 + 10
>> +for i=1, row_count_per_wal do test:insert{i, 'test'} end
> 
> Is +10 here because tuples are a bit larger than in other tests?
> 

No. Rather because in other tests it is too +10, and here it wasn't.
Size of one wal record I estimated as >= 50 (size of xrow_header +
msgpack headers + body size), so perhaps in theory / 50 was already
enough. +10 is just in case, to ensure that row_count_per_wal 100%
will create at least one new xlog file.