From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f65.google.com (mail-lf1-f65.google.com [209.85.167.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id ACD0D469710 for ; Wed, 10 Jun 2020 02:06:25 +0300 (MSK) Received: by mail-lf1-f65.google.com with SMTP id w15so292875lfe.11 for ; Tue, 09 Jun 2020 16:06:25 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) From: =?utf-8?B?0JjQu9GM0Y8g0JrQvtC90Y7RhdC+0LI=?= In-Reply-To: <67c75c01-8503-2355-e1f7-9644def2179c@tarantool.org> Date: Wed, 10 Jun 2020 02:06:18 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <67c75c01-8503-2355-e1f7-9644def2179c@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org, alexander.turenko@tarantool.org Thanks for the detailed review! I=E2=80=99ve corrected most of the comments, you highlighted. You've = also mentioned we may want to collect some internal statistics in C. It = seems to be more broader topic to talk about. There is still not so = clear to me which stats exactly we need to collect. We should discuss it = more in related GH issue. Right now I would focus on parts we can easily reach from lua. And I = still think this patch may be useful to track a distribution how spaces = and indices are used. I think performance should not be a major issue = here since it is now use caching and intended to run once an hour. In general, based on your feedback I=E2=80=99ve decided to refactor this = patch a bit and: - it now adds a caching for space and index statistics based on schema = version; - redundant primary and secondary index flags are removed; - flags were replaced by counters. Also, I=E2=80=99ve left detailed answers under your comments below. > On 7 Jun 2020, at 19:45, Vladislav Shpilevoy = wrote: >=20 > Thanks for the patch! >=20 > Generally, I don't like having so much Lua code in > the daemon, and system space full scans. Because it > is slow and produces Lua garbage. >=20 > Also anyway it can't collect some internal things > such as whether SQL is used (it is not exposed in > any system spaces), popen, swim, etc. These things > don't register self in any global place. >=20 > I was rather thinking about keeping track of all > these modules and their statistics in C. So as collection > of the statistics would be right when it changes, in > a set of int counters. And statistics dump would cost O(1) > by time, right into a JSON string, without Lua participation > except that it would call this C dumper and put its result > into an http request. >=20 > In other words, I am not sure this commit is needed at all, > until we understand how to collect all the other features > too. >=20 > See 6 comments below. >=20 > On 05/06/2020 10:35, Ilya Konyukhov wrote: >> This patch adds basic db features to feedback report. >> It collects info about what engine and which types of >> indexes are setup by the user. >>=20 >> Here is how report may look like if all the features used: >>=20 >> ```json >> { >> "arch": "x64", >> "features": { >> "has_bitset_index": true, >> "has_jsonpath_index": true, >> "vinyl": true, >> "has_tree_index": true, >> "has_primary_index": true, >> "has_hash_index": true, >> "memtx": true, >> "has_temporary_spaces": true, >> "has_local_spaces": true, >> "has_rtree_index": true, >> "has_secondary_index": true, >> "has_functional_index": true >> }, >> "server_id": "7c8490f7-61c5-4e12-a7ff-d9fed05ad8ac", >> "is_docker": false, >> "os": "OSX", >> "feedback_type": "version", >> "cluster_id": "1eb7d98e-3344-4f15-a439-c287464f09e7", >> "tarantool_version": "2.5.0-90-g27fbe6ecd", >> "feedback_version": 1 >> } >> ``` >>=20 >> Part of #4943 >> --- >> src/box/lua/feedback_daemon.lua | 65 = +++++++++++++++++++++++++++ >> test/box-tap/feedback_daemon.test.lua | 42 ++++++++++++++++- >> 2 files changed, 106 insertions(+), 1 deletion(-) >>=20 >> diff --git a/src/box/lua/feedback_daemon.lua = b/src/box/lua/feedback_daemon.lua >> index 2ce49fb22..0fcd8ed87 100644 >> --- a/src/box/lua/feedback_daemon.lua >> +++ b/src/box/lua/feedback_daemon.lua >> @@ -41,6 +41,15 @@ local function detect_docker_environment() >> return true >> end >>=20 >> +local function is_system_space(sp) >> + local sp_id =3D sp.id >> + if box.schema.SYSTEM_ID_MIN <=3D sp_id and sp_id <=3D = box.schema.SYSTEM_ID_MAX then >> + return true >> + end >=20 > 1. Please, keep code lines inside 80 symbols border. Also this = function > return can be simplified to >=20 > return box.schema.SYSTEM_ID_MIN <=3D sp_id and sp_id <=3D = box.schema.SYSTEM_ID_MAX Ok, collapsed that part >=20 >> + >> + return false >> +end >> + >> local function fill_in_base_info(feedback) >> if box.info.status ~=3D "running" then >> return nil, "not running" >> @@ -56,9 +65,65 @@ local function fill_in_platform_info(feedback) >> feedback.is_docker =3D detect_docker_environment() >> end >>=20 >> +local function fill_in_space_indices(feedback, sp) >> + if not sp.index[0] then return end >> + >> + feedback.features.has_primary_index =3D true >=20 > 2. What is a purpose of this field? Zero-index spaces always > exist, at least because indexes are created in a separate DDL > statement. >=20 > Besides, the function and spaces iteration may be really heavy, > if space count is thousands. Or even hundreds, but with many > indexes. And there is no a yield. >=20 > In addition to yields I ask you to add caching of this function > results using schema version counter. Schema changes very rarely, > so caching would make this function practically free almost > always. - removed primary/secondary fields - added caching. Cache invalidates when schema version updates - added yield after each space iteration >=20 >> + local idx_count =3D 0 >> + for _, idx in pairs(sp.index) do >> + for _, part in pairs(idx.parts) do >> + if part.path ~=3D nil then >> + feedback.features.has_jsonpath_index =3D true >> + break >> + end >> + end >> + if idx.func ~=3D nil then >> + feedback.features.has_functional_index =3D true >> + end >> + if idx.type =3D=3D 'TREE' then >> + feedback.features.has_tree_index =3D true >> + elseif idx.type =3D=3D 'HASH' then >> + feedback.features.has_hash_index =3D true >> + elseif idx.type =3D=3D 'RTREE' then >> + feedback.features.has_rtree_index =3D true >> + elseif idx.type =3D=3D 'BITSET' then >> + feedback.features.has_bitset_index =3D true >> + end >> + idx_count =3D idx_count + 1 >> + end >> + >> + if idx_count > 1 then >> + feedback.features.has_secondary_index =3D true >=20 > 3. This does not look really useful. What is this flag going > to tell us? Secondary indexes exist almost always. >=20 > Besides, I agree with Dmitry's comment about counters instead > of flags. - removed secondary index tracking - introduced counters for other indices. >=20 >> + end >> +end >> + >> +local function fill_in_features(feedback) >> + feedback.features =3D feedback.features or {} >> + >> + local is_memtx, is_vinyl, is_temporary, is_local >> + for _, sp in pairs(box.space) do >> + local is_system =3D is_system_space(sp) >> + if not is_system then >> + if sp.engine =3D=3D 'vinyl' then is_vinyl =3D true end >> + if sp.engine =3D=3D 'memtx' then >> + if sp.temporary ~=3D nil then is_temporary =3D true = end >> + is_memtx =3D true >> + end >> + if sp.is_local ~=3D nil then is_local =3D true end >> + fill_in_space_indices(feedback, sp) >> + end >> + end >> + >> + feedback.features.has_temporary_spaces =3D is_temporary >> + feedback.features.has_local_spaces =3D is_local >> + feedback.features.memtx =3D is_memtx >> + feedback.features.vinyl =3D is_vinyl >=20 > 4. Why do some flags have prefix 'has_', some have 'is_', > and some are just nouns like 'memtx', 'vinyl'? Lets be > consistent and use one name template. For that type of > flags in C we would use 'has_'. With counters it all now has suffixes like =E2=80=9C_spaces=E2=80=9D and = =E2=80=9C_indices" >=20 >> +end >> diff --git a/test/box-tap/feedback_daemon.test.lua = b/test/box-tap/feedback_daemon.test.lua >> index c36b2a694..e382af8e8 100755 >> --- a/test/box-tap/feedback_daemon.test.lua >> +++ b/test/box-tap/feedback_daemon.test.lua >> @@ -113,6 +113,46 @@ check("feedback after start") >> daemon.send_test() >> check("feedback after feedback send_test") >>=20 >> +local feedback_json =3D json.decode(feedback_save) >=20 > 5. When write a test for an issue, please, mention the > issue in a comment and describe it shortly. Like this: >=20 > -- > -- gh-####: description. > =E2=80=94 >=20 Done >> +test:is(type(feedback_json.features), 'table', 'features field is = present') >> +test:isnil(next(feedback_json.features), 'features are empty at the = moment') >> + >> +box.schema.create_space('features_vinyl', {engine =3D 'vinyl'}) >> +box.schema.create_space('features_memtx', {engine =3D 'memtx', = is_local =3D true, temporary =3D true}) >> +box.space.features_memtx:create_index('vinyl_pk', {type =3D 'tree'}) >> +box.space.features_memtx:create_index('memtx_pk', {type =3D 'hash'}) >> +box.space.features_memtx:create_index('memtx_bitset', {type =3D = 'bitset'}) >> +box.space.features_memtx:create_index('memtx_rtree', {type =3D = 'rtree', parts =3D {3, 'array'}}) >> +box.space.features_memtx:create_index('memtx_jpath', >> + {parts =3D {{field=3D4, type=3D'str', path=3D'data.name'}}}) >=20 > 6. Please, be consistent in the code style. Surround '=3D' with = whitespaces, > add a whitespace after ',' (see your code below). >=20 Adjusted code style here. Thanks for pointing it out. >> +box.schema.func.create('features_func', { >> + body =3D "function(tuple) return {string.sub(tuple[2],1,1)} = end", >> + is_deterministic =3D true, >> + is_sandboxed =3D true}) >> +box.space.features_memtx:create_index('j', >> + {parts=3D{{field =3D 1, type =3D 'number'}},func =3D = 'features_func'}) >> + >> +check('old feedback received') >> +feedback_reset() >> +check('feedback with db features received') >> + >> +feedback_json =3D json.decode(feedback_save) >> +test:test('features', function(t) >> + t:plan(12) >> + t:ok(feedback_json.features.memtx, 'memtx engine usage = gathered') >> + t:ok(feedback_json.features.vinyl, 'vinyl engine usage = gathered') >> + t:ok(feedback_json.features.has_temporary_spaces, 'temporary = space usage gathered') >> + t:ok(feedback_json.features.has_local_spaces, 'local space usage = gathered') >> + t:ok(feedback_json.features.has_primary_index, 'primary index = gathered') >> + t:ok(feedback_json.features.has_secondary_index, 'secondary = index gathered') >> + t:ok(feedback_json.features.has_tree_index, 'tree index = gathered') >> + t:ok(feedback_json.features.has_hash_index, 'hash index = gathered') >> + t:ok(feedback_json.features.has_rtree_index, 'rtree index = gathered') >> + t:ok(feedback_json.features.has_bitset_index, 'bitset index = gathered') >> + t:ok(feedback_json.features.has_jsonpath_index, 'jsonpath index = gathered') >> + t:ok(feedback_json.features.has_functional_index, 'functional = index gathered') >> +end) >> + >> daemon.stop() >>=20 >> box.feedback.save("feedback.json=E2=80=9D) >>=20 diff --git a/src/box/lua/feedback_daemon.lua = b/src/box/lua/feedback_daemon.lua index 21e69d511..1f177a204 100644 --- a/src/box/lua/feedback_daemon.lua +++ b/src/box/lua/feedback_daemon.lua @@ -50,6 +50,25 @@ local function detect_docker_environment() return cached_detect_docker_env end =20 +local function is_system_space(sp) + return box.schema.SYSTEM_ID_MIN <=3D sp.id and + sp.id <=3D box.schema.SYSTEM_ID_MAX +end + +local function is_jsonpath_index(idx) + for _, part in pairs(idx.parts) do + if part.path ~=3D nil then + return true + end + end + + return false +end + +local function is_functional_index(idx) + return idx.func ~=3D nil +end + local function fill_in_base_info(feedback) if box.info.status ~=3D "running" then return nil, "not running" @@ -65,9 +84,98 @@ local function fill_in_platform_info(feedback) feedback.is_docker =3D detect_docker_environment() end =20 +local function fill_in_indices_stats(space, stats) + if not space.index[0] then return end + + for name, idx in pairs(space.index) do + if type(name) =3D=3D 'number' then + local idx_type =3D idx.type + if idx_type =3D=3D 'TREE' then + if is_functional_index(idx) then + stats.functional =3D stats.functional + 1 + elseif is_jsonpath_index(idx) then + stats.jsonpath =3D stats.jsonpath + 1 + end + stats.tree =3D stats.tree + 1 + elseif idx_type =3D=3D 'HASH' then + stats.hash =3D stats.hash + 1 + elseif idx_type =3D=3D 'RTREE' then + stats.rtree =3D stats.rtree + 1 + elseif idx_type =3D=3D 'BITSET' then + stats.bitset =3D stats.bitset + 1 + end + end + end +end + +local function fill_in_space_stats(features) + local spaces =3D { + memtx =3D 0, + vinyl =3D 0, + temporary =3D 0, + ['local'] =3D 0, + } + + local indices =3D { + hash =3D 0, + tree =3D 0, + rtree =3D 0, + bitset =3D 0, + jsonpath =3D 0, + functional =3D 0, + } + + for name, space in pairs(box.space) do + local is_system =3D is_system_space(space) + if not is_system and type(name) =3D=3D 'number' then + if space.engine =3D=3D 'vinyl' then + spaces.vinyl =3D spaces.vinyl + 1 + elseif space.engine =3D=3D 'memtx' then + if space.temporary ~=3D nil then + spaces.temporary =3D spaces.temporary + 1 + end + spaces.memtx =3D spaces.memtx + 1 + end + if space.is_local =3D=3D false then + spaces['local'] =3D spaces['local'] + 1 + end + fill_in_indices_stats(space, indices) + end + fiber.yield() + end + + for k, v in pairs(spaces) do + features[k..'_spaces'] =3D v + end + + for k, v in pairs(indices) do + features[k..'_indices'] =3D v + end +end + +local function fill_in_features_impl(features) + fill_in_space_stats(features) +end + +local cached_schema_version =3D 0 +local cached_feedback_features =3D {} + +local function fill_in_features(feedback) + local schema_version =3D box.internal.schema_version() + if cached_schema_version < schema_version then + local features =3D {} + fill_in_features_impl(features) + cached_schema_version =3D schema_version + cached_feedback_features =3D features + end + + feedback.features =3D cached_feedback_features +end + local function fill_in_feedback(feedback) fill_in_base_info(feedback) fill_in_platform_info(feedback) + fill_in_features(feedback) =20 return feedback end diff --git a/test/box-tap/feedback_daemon.test.lua = b/test/box-tap/feedback_daemon.test.lua index d4adb71f1..8ef20e0d0 100755 --- a/test/box-tap/feedback_daemon.test.lua +++ b/test/box-tap/feedback_daemon.test.lua @@ -67,7 +67,7 @@ if not ok then os.exit(0) end =20 -test:plan(11) +test:plan(19) =20 local function check(message) while feedback_count < 1 do @@ -113,6 +113,71 @@ check("feedback after start") daemon.send_test() check("feedback after feedback send_test") =20 +-- +-- gh-4943: Collect engines and indices statistics. +-- + +local feedback_json =3D json.decode(feedback_save) +test:is(type(feedback_json.features), 'table', 'features field is = present') +local expected =3D { + memtx_spaces =3D 0, + vinyl_spaces =3D 0, + temporary_spaces =3D 0, + local_spaces =3D 0, + tree_indices =3D 0, + rtree_indices =3D 0, + hash_indices =3D 0, + bitset_indices =3D 0, + jsonpath_indices =3D 0, + functional_indices =3D 0, +} +test:is_deeply(feedback_json.features, expected, 'features are empty at = the moment') + +box.schema.create_space('features_vinyl', {engine =3D 'vinyl'}) +box.schema.create_space('features_memtx', + {engine =3D 'memtx', is_local =3D true, temporary =3D true}) +box.space.features_vinyl:create_index('vinyl_pk', {type =3D 'tree'}) +box.space.features_memtx:create_index('memtx_pk', {type =3D 'tree'}) +box.space.features_memtx:create_index('memtx_hash', {type =3D 'hash'}) +box.space.features_memtx:create_index('memtx_bitset', {type =3D = 'bitset'}) +box.space.features_memtx:create_index('memtx_rtree', + {type =3D 'rtree', parts =3D {{field =3D 3, type =3D = 'array'}}}) +box.space.features_memtx:create_index('memtx_jpath', + {parts =3D {{field =3D 4, type =3D 'str', path =3D = 'data.name'}}}) +box.schema.func.create('features_func', { + body =3D "function(tuple) return {string.sub(tuple[2], 1, 1)} end", + is_deterministic =3D true, + is_sandboxed =3D true}) +box.space.features_memtx:create_index('j', + {parts =3D {{field =3D 1, type =3D 'number'}}, func =3D = 'features_func'}) + +check('old feedback received') +feedback_reset() +check('feedback with db features received') + +feedback_json =3D json.decode(feedback_save) +test:test('features', function(t) + t:plan(10) + t:is(feedback_json.features.memtx_spaces, 1, 'memtx engine usage = gathered') + t:is(feedback_json.features.vinyl_spaces, 1, 'vinyl engine usage = gathered') + t:is(feedback_json.features.temporary_spaces, 1, 'temporary space = usage gathered') + t:is(feedback_json.features.local_spaces, 1, 'local space usage = gathered') + t:is(feedback_json.features.tree_indices, 4, 'tree index gathered') + t:is(feedback_json.features.hash_indices, 1, 'hash index gathered') + t:is(feedback_json.features.rtree_indices, 1, 'rtree index = gathered') + t:is(feedback_json.features.bitset_indices, 1, 'bitset index = gathered') + t:is(feedback_json.features.jsonpath_indices, 1, 'jsonpath index = gathered') + t:is(feedback_json.features.functional_indices, 1, 'functional = index gathered') +end) + +box.space.features_memtx:create_index('memtx_sec', {type =3D 'hash'}) + +check('old feedback received') +feedback_reset() +check('feedback with new db features received') +feedback_json =3D json.decode(feedback_save) +test:is(feedback_json.features.hash_indices, 2, 'internal cache = invalidates when schema changes') + daemon.stop() =20 box.feedback.save("feedback.json")