[tarantool-patches] Re: [PATCH 2/2] wal: drop rows_per_wal option
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sun Oct 6 18:59:28 MSK 2019
Hi! Thanks for the review!
On 05/10/2019 23:21, Alexander Turenko wrote:
> On Sat, Sep 07, 2019 at 06:05:37PM +0200, Vladislav Shpilevoy wrote:
>> Rows_per_wal option was deprecated because it can be covered by
>> wal_max_size. In order not to complicate WAL code with that
>> option's support this commit drops it completely.
>
> It is just one 'if' subcondition in the code and one field for storing
> yields frequency. (And the field can be replaced with a static variable
> w/o removing the option.) I don't see how this commit simplifies the
> code.
AFAIU this complicates not only the present code, but also future code,
according to Kostja's comment in the issue description:
"We will have trouble with automatic size management if we
continue using wal_max_rows"
But honestly I don't really care that much about this option. I just
decided to fix the unassigned issue, marked as breaking change, as
soon as possible, to break less users. So I can't provide any serious
evidence how rows_per_wal was going to complicate auto size management.
>
> The option was convenient for tests at least. Now additional
> calculations are necessary to dig into cases that verifies xlog files
> count.
If that is needed for testing only, then it should not be exposed to
public API.
>
> However I don't think that those cons and pros are important, so I don't
> want to block the patch. The patch itself looks good except minor
> comments below.
>
>>
>> In some tests the option was used to create several small xlog
>> files. Now the same is done via wal_max_size. Where it was
>> needed, number of rows per wal is estimated as wal_max_size / 50.
>> Because struct xrow_header size ~= 50 not counting paddings and
>> body.
>>
>> Closes #3762
>> ---
>> src/box/box.cc | 30 +-------
>> src/box/lua/load_cfg.lua | 5 --
>> src/box/wal.c | 14 +---
>> src/box/wal.h | 13 +++-
>> test/app-tap/init_script.result | 39 +++++-----
>> test/app-tap/snapshot.test.lua | 2 +-
>> test/app/app.lua | 2 +-
>> test/app/fiber.result | 6 +-
>> test/app/fiber.test.lua | 4 +-
>> test/box-py/box.lua | 2 +-
>> test/box-tap/cfg.test.lua | 3 +-
>> test/box/admin.result | 2 -
>> test/box/cfg.result | 4 -
>> test/box/configuration.result | 107 ---------------------------
>> test/box/proxy.lua | 2 +-
>> test/box/tiny.lua | 1 -
>> test/engine/box.lua | 2 +-
>> test/engine_long/box.lua | 1 -
>> test/long_run-py/box.lua | 1 -
>> test/vinyl/vinyl.lua | 1 -
>> test/xlog-py/box.lua | 1 -
>> test/xlog/checkpoint_daemon.result | 11 ++-
>> test/xlog/checkpoint_daemon.test.lua | 9 ++-
>> test/xlog/errinj.result | 7 +-
>> test/xlog/errinj.test.lua | 5 +-
>> test/xlog/panic.lua | 1 -
>> test/xlog/upgrade/fill.lua | 2 +-
>> test/xlog/xlog.lua | 2 +-
>> 28 files changed, 72 insertions(+), 207 deletions(-)
>> delete mode 100644 test/box/configuration.result
>
>> static void
>> -wal_stream_create(struct wal_stream *ctx, size_t wal_max_rows)
>> +wal_stream_create(struct wal_stream *ctx)
>> {
>> xstream_create(&ctx->base, apply_wal_row);
>> ctx->rows = 0;
>> - /**
>> - * Make the yield logic covered by the functional test
>> - * suite, which has a small setting for rows_per_wal.
>> - * Each yield can take up to 1ms if there are no events,
>> - * so we can't afford many of them during recovery.
>> - */
>> - ctx->yield = (wal_max_rows >> 4) + 1;
>
> Yield frequency is not more depend on wal file size / rows count, so we
> we'll not test this scenario anymore. Look:
>
> https://coveralls.io/builds/26111576/source?filename=src/box/box.cc#L345
> https://coveralls.io/builds/25591832/source?filename=src/box/box.cc#L343
>
> Don't sure whether it is really important, but it seems that someone
> paid special attention to this and even leave the comment about.
>
> Can we, say, store 'rows per yield' in a static variable with 32000 as
> the default value and allow tests to set it with errinj?
There are too many tests which had small rows_per_wal setting. I don't
think it is a good idea to make them all errinj dependent. That would
require to disable them in release build, and to move to separate files.
Instead, I added a new test:
===========================================================================
diff --git a/test/engine/recover_snapshot_wal.result b/test/engine/recover_snapshot_wal.result
index d2667559d..6b7b05362 100644
--- a/test/engine/recover_snapshot_wal.result
+++ b/test/engine/recover_snapshot_wal.result
@@ -23,9 +23,9 @@ box.snapshot()
---
- ok
...
-space:insert({1001})
+space:insert({33001})
---
-- [1001]
+- [33001]
...
test_run:cmd("restart server default")
space = box.space['test']
@@ -37,7 +37,7 @@ index = space.index['primary']
index:select({}, {iterator = box.index.ALL})
---
- - [0]
- - [1001]
+ - [33001]
...
for key = 1, 351 do space:insert({key}) end
---
@@ -46,7 +46,18 @@ box.snapshot()
---
- ok
...
-for key = 352, 1000 do space:insert({key}) end
+-- Insert so many tuples, that recovery would need to yield
+-- periodically to allow other fibers do something. At the moment
+-- of writing this the yield period was 32k tuples.
+box.begin() \
+for key = 352, 33000 do \
+ space:insert({key}) \
+ if key % 10000 == 0 then \
+ box.commit() \
+ box.begin() \
+ end \
+end \
+box.commit()
---
...
test_run:cmd("restart server default")
@@ -56,1010 +67,24 @@ space = box.space['test']
index = space.index['primary']
---
...
-index:select({}, {iterator = box.index.ALL})
+i = 0
---
-- - [0]
- - [1]
... -- lots of deletions
- - [999]
- - [1000]
- - [1001]
+...
+err = nil
+---
+...
+for _, t in space:pairs() do \
+ if t[1] ~= i then \
+ err = {i, t} \
+ break \
+ end \
+ i = i + 1 \
+end
---
...
+i
+---
+- 33002
+...
+err
+---
+- null
...
space:drop()
---
diff --git a/test/engine/recover_snapshot_wal.test.lua b/test/engine/recover_snapshot_wal.test.lua
index 1f133dd72..ff286bc4f 100644
--- a/test/engine/recover_snapshot_wal.test.lua
+++ b/test/engine/recover_snapshot_wal.test.lua
@@ -10,7 +10,7 @@ index = space:create_index('primary')
space:insert({0})
box.snapshot()
-space:insert({1001})
+space:insert({33001})
test_run:cmd("restart server default")
@@ -20,13 +20,33 @@ index:select({}, {iterator = box.index.ALL})
for key = 1, 351 do space:insert({key}) end
box.snapshot()
-for key = 352, 1000 do space:insert({key}) end
+-- Insert so many tuples, that recovery would need to yield
+-- periodically to allow other fibers do something. At the moment
+-- of writing this the yield period was 32k tuples.
+box.begin() \
+for key = 352, 33000 do \
+ space:insert({key}) \
+ if key % 10000 == 0 then \
+ box.commit() \
+ box.begin() \
+ end \
+end \
+box.commit()
test_run:cmd("restart server default")
space = box.space['test']
index = space.index['primary']
-index:select({}, {iterator = box.index.ALL})
+i = 0
+err = nil
+for _, t in space:pairs() do \
+ if t[1] ~= i then \
+ err = {i, t} \
+ break \
+ end \
+ i = i + 1 \
+end
+i
+err
space:drop()
test_run:cmd("restart server default with cleanup=1")
===========================================================================
But I don't think it was necessary. According to the comment (and
to the patch introduced it), this signal-handling-sleep(0) was never
properly tested. And even though this line was covered by our tests,
its purpose was not tested.
>
>> diff --git a/test/box/configuration.result b/test/box/configuration.result
>> deleted file mode 100644
>> index c885c28dc..000000000
>> --- a/test/box/configuration.result
>> +++ /dev/null
>
> I would say that the test was removed in
> fdc3d1dd2c3b120ff6426d074285908da7fb646d to don't confuse a reader.
Ok, I don't mind. Added to the commit message:
"
Note, file box/configuration.result was deleted here, because it
is a stray result file, and it contained the rows_per_wal option
mentioning. Its test was dropped much earlier in
fdc3d1dd2c3b120ff6426d074285908da7fb646d.
"
>
>> diff --git a/test/xlog/errinj.test.lua b/test/xlog/errinj.test.lua
>> index de0e2e7e8..3d72dc4e4 100644
>> --- a/test/xlog/errinj.test.lua
>> +++ b/test/xlog/errinj.test.lua
>> @@ -30,12 +30,13 @@ _ = test:create_index('primary')
>>
>> box.schema.user.grant('guest', 'write', 'space', 'test')
>>
>> -for i=1, box.cfg.rows_per_wal do test:insert{i, 'test'} end
>> +row_count_per_wal = box.cfg.wal_max_size / 50 + 10
>> +for i=1, row_count_per_wal do test:insert{i, 'test'} end
>
> Is +10 here because tuples are a bit larger than in other tests?
>
No. Rather because in other tests it is too +10, and here it wasn't.
Size of one wal record I estimated as >= 50 (size of xrow_header +
msgpack headers + body size), so perhaps in theory / 50 was already
enough. +10 is just in case, to ensure that row_count_per_wal 100%
will create at least one new xlog file.
More information about the Tarantool-patches
mailing list