From: "Илья Конюхов" <runsfor@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org, alexander.turenko@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
Date: Wed, 10 Jun 2020 02:06:18 +0300
Message-ID: <EE69E031-21CF-49AC-89CE-8E57CB006E84@gmail.com>
In-Reply-To: <67c75c01-8503-2355-e1f7-9644def2179c@tarantool.org>

Thanks for the detailed review! I've addressed most of the comments you
highlighted.

You also mentioned that we may want to collect some internal statistics in C.
That seems to be a broader topic. It is still not clear to me which stats
exactly we need to collect, so we should discuss it further in the related GH
issue. Right now I would focus on the parts we can easily reach from Lua.

I still think this patch is useful for tracking how spaces and indices are
used. I think performance should not be a major issue here, since the code now
uses caching and is intended to run once an hour.

In general, based on your feedback I've decided to refactor this patch a bit:
- it now adds caching for space and index statistics based on the schema version;
- redundant primary and secondary index flags are removed;
- flags were replaced by counters.

Also, I've left detailed answers under your comments below.

> On 7 Jun 2020, at 19:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>
> Thanks for the patch!
>
> Generally, I don't like having so much Lua code in
> the daemon, and system space full scans. Because it
> is slow and produces Lua garbage.
>
> Also anyway it can't collect some internal things
> such as whether SQL is used (it is not exposed in
> any system spaces), popen, swim, etc. These things
> don't register self in any global place.
>
> I was rather thinking about keeping track of all
> these modules and their statistics in C.
> So as collection
> of the statistics would be right when it changes, in
> a set of int counters. And statistics dump would cost O(1)
> by time, right into a JSON string, without Lua participation
> except that it would call this C dumper and put its result
> into an http request.
>
> In other words, I am not sure this commit is needed at all,
> until we understand how to collect all the other features
> too.
>
> See 6 comments below.
>
> On 05/06/2020 10:35, Ilya Konyukhov wrote:
>> This patch adds basic db features to feedback report.
>> It collects info about what engine and which types of
>> indexes are setup by the user.
>>
>> Here is how report may look like if all the features used:
>>
>> ```json
>> {
>>     "arch": "x64",
>>     "features": {
>>         "has_bitset_index": true,
>>         "has_jsonpath_index": true,
>>         "vinyl": true,
>>         "has_tree_index": true,
>>         "has_primary_index": true,
>>         "has_hash_index": true,
>>         "memtx": true,
>>         "has_temporary_spaces": true,
>>         "has_local_spaces": true,
>>         "has_rtree_index": true,
>>         "has_secondary_index": true,
>>         "has_functional_index": true
>>     },
>>     "server_id": "7c8490f7-61c5-4e12-a7ff-d9fed05ad8ac",
>>     "is_docker": false,
>>     "os": "OSX",
>>     "feedback_type": "version",
>>     "cluster_id": "1eb7d98e-3344-4f15-a439-c287464f09e7",
>>     "tarantool_version": "2.5.0-90-g27fbe6ecd",
>>     "feedback_version": 1
>> }
>> ```
>>
>> Part of #4943
>> ---
>>  src/box/lua/feedback_daemon.lua       | 65 +++++++++++++++++++++++++++
>>  test/box-tap/feedback_daemon.test.lua | 42 ++++++++++++++++-
>>  2 files changed, 106 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
>> index 2ce49fb22..0fcd8ed87 100644
>> --- a/src/box/lua/feedback_daemon.lua
>> +++ b/src/box/lua/feedback_daemon.lua
>> @@ -41,6 +41,15 @@ local function detect_docker_environment()
>>      return true
>>  end
>>
>> +local function is_system_space(sp)
>> +    local sp_id = sp.id
>> +    if box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX then
>> +        return true
>> +    end
>
> 1. Please, keep code lines inside 80 symbols border. Also this function
> return can be simplified to
>
>     return box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX

Ok, collapsed that part

>
>> +
>> +    return false
>> +end
>> +
>>  local function fill_in_base_info(feedback)
>>      if box.info.status ~= "running" then
>>          return nil, "not running"
>> @@ -56,9 +65,65 @@ local function fill_in_platform_info(feedback)
>>      feedback.is_docker = detect_docker_environment()
>>  end
>>
>> +local function fill_in_space_indices(feedback, sp)
>> +    if not sp.index[0] then return end
>> +
>> +    feedback.features.has_primary_index = true
>
> 2. What is a purpose of this field? Zero-index spaces always
> exist, at least because indexes are created in a separate DDL
> statement.
>
> Besides, the function and spaces iteration may be really heavy,
> if space count is thousands. Or even hundreds, but with many
> indexes. And there is no a yield.
>
> In addition to yields I ask you to add caching of this function
> results using schema version counter. Schema changes very rarely,
> so caching would make this function practically free almost
> always.

- removed primary/secondary fields
- added caching.
  Cache invalidates when schema version updates
- added yield after each space iteration

>
>> +    local idx_count = 0
>> +    for _, idx in pairs(sp.index) do
>> +        for _, part in pairs(idx.parts) do
>> +            if part.path ~= nil then
>> +                feedback.features.has_jsonpath_index = true
>> +                break
>> +            end
>> +        end
>> +        if idx.func ~= nil then
>> +            feedback.features.has_functional_index = true
>> +        end
>> +        if idx.type == 'TREE' then
>> +            feedback.features.has_tree_index = true
>> +        elseif idx.type == 'HASH' then
>> +            feedback.features.has_hash_index = true
>> +        elseif idx.type == 'RTREE' then
>> +            feedback.features.has_rtree_index = true
>> +        elseif idx.type == 'BITSET' then
>> +            feedback.features.has_bitset_index = true
>> +        end
>> +        idx_count = idx_count + 1
>> +    end
>> +
>> +    if idx_count > 1 then
>> +        feedback.features.has_secondary_index = true
>
> 3. This does not look really useful. What is this flag going
> to tell us? Secondary indexes exist almost always.
>
> Besides, I agree with Dmitry's comment about counters instead
> of flags.

- removed secondary index tracking
- introduced counters for other indices.

>
>> +    end
>> +end
>> +
>> +local function fill_in_features(feedback)
>> +    feedback.features = feedback.features or {}
>> +
>> +    local is_memtx, is_vinyl, is_temporary, is_local
>> +    for _, sp in pairs(box.space) do
>> +        local is_system = is_system_space(sp)
>> +        if not is_system then
>> +            if sp.engine == 'vinyl' then is_vinyl = true end
>> +            if sp.engine == 'memtx' then
>> +                if sp.temporary ~= nil then is_temporary = true end
>> +                is_memtx = true
>> +            end
>> +            if sp.is_local ~= nil then is_local = true end
>> +            fill_in_space_indices(feedback, sp)
>> +        end
>> +    end
>> +
>> +    feedback.features.has_temporary_spaces = is_temporary
>> +    feedback.features.has_local_spaces = is_local
>> +    feedback.features.memtx = is_memtx
>> +    feedback.features.vinyl = is_vinyl
>
> 4. Why do some flags have prefix 'has_', some have 'is_',
> and some are just nouns like 'memtx', 'vinyl'?
> Lets be
> consistent and use one name template. For that type of
> flags in C we would use 'has_'.

With counters, all keys now have suffixes like "_spaces" and "_indices".

>
>> +end
>> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
>> index c36b2a694..e382af8e8 100755
>> --- a/test/box-tap/feedback_daemon.test.lua
>> +++ b/test/box-tap/feedback_daemon.test.lua
>> @@ -113,6 +113,46 @@ check("feedback after start")
>>  daemon.send_test()
>>  check("feedback after feedback send_test")
>>
>> +local feedback_json = json.decode(feedback_save)
>
> 5. When write a test for an issue, please, mention the
> issue in a comment and describe it shortly. Like this:
>
>     --
>     -- gh-####: description.
>     --
>

Done

>> +test:is(type(feedback_json.features), 'table', 'features field is present')
>> +test:isnil(next(feedback_json.features), 'features are empty at the moment')
>> +
>> +box.schema.create_space('features_vinyl', {engine = 'vinyl'})
>> +box.schema.create_space('features_memtx', {engine = 'memtx', is_local = true, temporary = true})
>> +box.space.features_memtx:create_index('vinyl_pk', {type = 'tree'})
>> +box.space.features_memtx:create_index('memtx_pk', {type = 'hash'})
>> +box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
>> +box.space.features_memtx:create_index('memtx_rtree', {type = 'rtree', parts = {3, 'array'}})
>> +box.space.features_memtx:create_index('memtx_jpath',
>> +        {parts = {{field=4, type='str', path='data.name'}}})
>
> 6. Please, be consistent in the code style. Surround '=' with whitespaces,
> add a whitespace after ',' (see your code below).
>

Adjusted code style here. Thanks for pointing it out.
>> +box.schema.func.create('features_func', {
>> +        body = "function(tuple) return {string.sub(tuple[2],1,1)} end",
>> +        is_deterministic = true,
>> +        is_sandboxed = true})
>> +box.space.features_memtx:create_index('j',
>> +        {parts={{field = 1, type = 'number'}},func = 'features_func'})
>> +
>> +check('old feedback received')
>> +feedback_reset()
>> +check('feedback with db features received')
>> +
>> +feedback_json = json.decode(feedback_save)
>> +test:test('features', function(t)
>> +    t:plan(12)
>> +    t:ok(feedback_json.features.memtx, 'memtx engine usage gathered')
>> +    t:ok(feedback_json.features.vinyl, 'vinyl engine usage gathered')
>> +    t:ok(feedback_json.features.has_temporary_spaces, 'temporary space usage gathered')
>> +    t:ok(feedback_json.features.has_local_spaces, 'local space usage gathered')
>> +    t:ok(feedback_json.features.has_primary_index, 'primary index gathered')
>> +    t:ok(feedback_json.features.has_secondary_index, 'secondary index gathered')
>> +    t:ok(feedback_json.features.has_tree_index, 'tree index gathered')
>> +    t:ok(feedback_json.features.has_hash_index, 'hash index gathered')
>> +    t:ok(feedback_json.features.has_rtree_index, 'rtree index gathered')
>> +    t:ok(feedback_json.features.has_bitset_index, 'bitset index gathered')
>> +    t:ok(feedback_json.features.has_jsonpath_index, 'jsonpath index gathered')
>> +    t:ok(feedback_json.features.has_functional_index, 'functional index gathered')
>> +end)
>> +
>>  daemon.stop()
>>
>>  box.feedback.save("feedback.json")
>>

diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 21e69d511..1f177a204 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -50,6 +50,25 @@ local function detect_docker_environment()
     return cached_detect_docker_env
 end
 
+local function is_system_space(sp)
+    return box.schema.SYSTEM_ID_MIN <= sp.id and
+           sp.id <= box.schema.SYSTEM_ID_MAX
+end
+
+local function is_jsonpath_index(idx)
+    for _, part in pairs(idx.parts) do
+        if part.path ~= nil then
+            return true
+        end
+    end
+
+    return false
+end
+
+local function is_functional_index(idx)
+    return idx.func ~= nil
+end
+
 local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
@@ -65,9 +84,98 @@ local function fill_in_platform_info(feedback)
     feedback.is_docker = detect_docker_environment()
 end
 
+local function fill_in_indices_stats(space, stats)
+    if not space.index[0] then return end
+
+    for name, idx in pairs(space.index) do
+        if type(name) == 'number' then
+            local idx_type = idx.type
+            if idx_type == 'TREE' then
+                if is_functional_index(idx) then
+                    stats.functional = stats.functional + 1
+                elseif is_jsonpath_index(idx) then
+                    stats.jsonpath = stats.jsonpath + 1
+                end
+                stats.tree = stats.tree + 1
+            elseif idx_type == 'HASH' then
+                stats.hash = stats.hash + 1
+            elseif idx_type == 'RTREE' then
+                stats.rtree = stats.rtree + 1
+            elseif idx_type == 'BITSET' then
+                stats.bitset = stats.bitset + 1
+            end
+        end
+    end
+end
+
+local function fill_in_space_stats(features)
+    local spaces = {
+        memtx = 0,
+        vinyl = 0,
+        temporary = 0,
+        ['local'] = 0,
+    }
+
+    local indices = {
+        hash = 0,
+        tree = 0,
+        rtree = 0,
+        bitset = 0,
+        jsonpath = 0,
+        functional = 0,
+    }
+
+    for name, space in pairs(box.space) do
+        local is_system = is_system_space(space)
+        if not is_system and type(name) == 'number' then
+            if space.engine == 'vinyl' then
+                spaces.vinyl = spaces.vinyl + 1
+            elseif space.engine == 'memtx' then
+                if space.temporary ~= nil then
+                    spaces.temporary = spaces.temporary + 1
+                end
+                spaces.memtx = spaces.memtx + 1
+            end
+            if space.is_local then
+                spaces['local'] = spaces['local'] + 1
+            end
+            fill_in_indices_stats(space, indices)
+        end
+        fiber.yield()
+    end
+
+    for k, v in pairs(spaces) do
+        features[k..'_spaces'] = v
+    end
+
+    for k, v in pairs(indices) do
+        features[k..'_indices'] = v
+    end
+end
+
+local function fill_in_features_impl(features)
+    fill_in_space_stats(features)
+end
+
+local cached_schema_version = 0
+local cached_feedback_features = {}
+
+local function fill_in_features(feedback)
+    local schema_version = box.internal.schema_version()
+    if cached_schema_version < schema_version then
+        local features = {}
+        fill_in_features_impl(features)
+        cached_schema_version = schema_version
+        cached_feedback_features = features
+    end
+
+    feedback.features = cached_feedback_features
+end
+
 local function fill_in_feedback(feedback)
     fill_in_base_info(feedback)
     fill_in_platform_info(feedback)
+    fill_in_features(feedback)
 
     return feedback
 end
diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index d4adb71f1..8ef20e0d0 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -67,7 +67,7 @@ if not ok then
     os.exit(0)
 end
 
-test:plan(11)
+test:plan(19)
 
 local function check(message)
     while feedback_count < 1 do
@@ -113,6 +113,71 @@ check("feedback after start")
 daemon.send_test()
 check("feedback after feedback send_test")
 
+--
+-- gh-4943: Collect engines and indices statistics.
+--
+
+local feedback_json = json.decode(feedback_save)
+test:is(type(feedback_json.features), 'table', 'features field is present')
+local expected = {
+    memtx_spaces = 0,
+    vinyl_spaces = 0,
+    temporary_spaces = 0,
+    local_spaces = 0,
+    tree_indices = 0,
+    rtree_indices = 0,
+    hash_indices = 0,
+    bitset_indices = 0,
+    jsonpath_indices = 0,
+    functional_indices = 0,
+}
+test:is_deeply(feedback_json.features, expected, 'features are empty at the moment')
+
+box.schema.create_space('features_vinyl', {engine = 'vinyl'})
+box.schema.create_space('features_memtx',
+        {engine = 'memtx', is_local = true, temporary = true})
+box.space.features_vinyl:create_index('vinyl_pk', {type = 'tree'})
+box.space.features_memtx:create_index('memtx_pk', {type = 'tree'})
+box.space.features_memtx:create_index('memtx_hash', {type = 'hash'})
+box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
+box.space.features_memtx:create_index('memtx_rtree',
+        {type = 'rtree', parts = {{field = 3, type = 'array'}}})
+box.space.features_memtx:create_index('memtx_jpath',
+        {parts = {{field = 4, type = 'str', path = 'data.name'}}})
+box.schema.func.create('features_func', {
+        body = "function(tuple) return {string.sub(tuple[2], 1, 1)} end",
+        is_deterministic = true,
+        is_sandboxed = true})
+box.space.features_memtx:create_index('j',
+        {parts = {{field = 1, type = 'number'}}, func = 'features_func'})
+
+check('old feedback received')
+feedback_reset()
+check('feedback with db features received')
+
+feedback_json = json.decode(feedback_save)
+test:test('features', function(t)
+    t:plan(10)
+    t:is(feedback_json.features.memtx_spaces, 1, 'memtx engine usage gathered')
+    t:is(feedback_json.features.vinyl_spaces, 1, 'vinyl engine usage gathered')
+    t:is(feedback_json.features.temporary_spaces, 1, 'temporary space usage gathered')
+    t:is(feedback_json.features.local_spaces, 1, 'local space usage gathered')
+    t:is(feedback_json.features.tree_indices, 4, 'tree index gathered')
+    t:is(feedback_json.features.hash_indices, 1, 'hash index gathered')
+    t:is(feedback_json.features.rtree_indices, 1, 'rtree index gathered')
+    t:is(feedback_json.features.bitset_indices, 1, 'bitset index gathered')
+    t:is(feedback_json.features.jsonpath_indices, 1, 'jsonpath index gathered')
+    t:is(feedback_json.features.functional_indices, 1, 'functional index gathered')
+end)
+
+box.space.features_memtx:create_index('memtx_sec', {type = 'hash'})
+
+check('old feedback received')
+feedback_reset()
+check('feedback with new db features received')
+feedback_json = json.decode(feedback_save)
+test:is(feedback_json.features.hash_indices, 2, 'internal cache invalidates when schema changes')
+
 daemon.stop()
 
 box.feedback.save("feedback.json")
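
For illustration only, the schema-version caching idea from the patch can be sketched in plain Lua, without Tarantool. This is a simplified sketch, not the daemon code itself: `schema_version`, `collect_features` and `get_features` are hypothetical stand-ins for box.internal.schema_version(), fill_in_features_impl() and fill_in_features(), and the "scan" here only records the version it ran at.

```lua
-- Standalone sketch of schema-version-based caching (plain Lua).
-- All names below are hypothetical stand-ins, not Tarantool APIs.
local schema_version = 1   -- pretend schema version, bumped on every DDL
local collect_calls = 0    -- counts how often the expensive scan runs

-- Stand-in for the full scan over spaces and indices.
local function collect_features()
    collect_calls = collect_calls + 1
    return {scanned_at_version = schema_version}
end

local cached_version = 0
local cached_features = {}

local function get_features()
    -- Rescan only when the schema changed since the last collection.
    if cached_version < schema_version then
        cached_features = collect_features()
        cached_version = schema_version
    end
    return cached_features
end

local first = get_features()          -- first call: scan runs
local second = get_features()         -- same version: served from cache
schema_version = schema_version + 1   -- simulate a DDL statement
local third = get_features()          -- stale cache: scan runs again
```

With this shape, the scan cost is paid only on the first report after a schema change; all other reports reuse the same cached table.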