From: Alexander Turenko <alexander.turenko@tarantool.org> To: Ilya Konyukhov <runsfor@gmail.com> Cc: Oleg Babin <babinoleg@mail.ru>, tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org Subject: Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Date: Wed, 1 Jul 2020 03:15:44 +0300 [thread overview] Message-ID: <20200701001544.wxx275pojfj3su5o@tkn_work_nb> (raw) In-Reply-To: <ef0528997c2d120730f40af9f6d70c7bd2db08c1.1591345474.git.runsfor@gmail.com> (CCed all persons who is mentioned below.) You all know it, but I cannot keep myself from saying again that I'm not fan on the idea of the built-in/enabled-by-default reporting tool. Especially when it is desined in the way that a user don't have a guarantee that its IP address is not collected. It is not in the opensource spirit and is not kindly to our users (to ones who care about privacy). I understand why the organization need it: it give some view on community -- what is useful and what is not. It may help when we, say, prioritize problems. But this fact does not change anything: garthering IP addresses and some information without expicit agreement does not become a friendly action. And while we're on the topic, there was the idea from Mons that it would feel more comfortable if we would offer something useful for a user together with the reporting tool: say, provide information about critical / security problems in a running version if it is affected. Or feed news about important updates in functionality that is relevant for a particular application. I would add from myself that this way the reporting tool not even necessarily must be enabled default to be useful for us: there is a motivation to enable it. Win-win. Collect only from ones who explicitly said that is okay with it. We also can structurize and improve our documentation and see what topics are more popular and what are less. Collect download statistics from repositories, rocks and git clones. Here a user explicitly asked an action with a remote server and (s)he most likely know that the remote server may count it. Pasted actual version to comment it inline. Nothing critical in fact. WBR, Alexander Turenko. > "feedback_type": "version", Was it planned to report several maps with different types? If we decided to report everything within one map, then this 'feedback_type' field has no sense anymore and just confusing, right? If so, then maybe it is the time to remove it and bump 'feedback_version'? > +local function is_system_space(sp) > + return box.schema.SYSTEM_ID_MIN <= sp.id and > + sp.id <= box.schema.SYSTEM_ID_MAX > +end Nit: a space object is named 'space' below, but 'sp' above. I agree with Max and generally prefer to avoid abbreviations. > +local function is_jsonpath_index(idx) > + for _, part in pairs(idx.parts) do > + if part.path ~= nil then > + return true > + end > + end > + > + return false > +end I agree with Oleg's point: it worth to count multikey indices too. I don't know much, but what I understood from the commit message [1]: - Any multikey index is JSON path index. - A multikey index has '[*]' in 'path'. Looks quite simple to count if I don't miss anything. Please, verify and/or ask about it someone who knows more. [1]: https://github.com/tarantool/tarantool/commit/f1d9f2575c11afcf3b0368d38f44ae399cd7e045 > +local function fill_in_schema_stats_impl(schema) > + local spaces = { > + memtx = 0, > + vinyl = 0, > + temporary = 0, > + ['local'] = 0, > + } > + > + local indices = { > + hash = 0, > + tree = 0, > + rtree = 0, > + bitset = 0, > + jsonpath = 0, > + functional = 0, > + } > + > + for name, space in pairs(box.space) do > + local is_system = is_system_space(space) > + if not is_system and type(name) == 'number' then > + if space.engine == 'vinyl' then > + spaces.vinyl = spaces.vinyl + 1 > + elseif space.engine == 'memtx' then > + if space.temporary then > + spaces.temporary = spaces.temporary + 1 > + end > + spaces.memtx = spaces.memtx + 1 > + end > + if space.is_local then > + spaces['local'] = spaces['local'] + 1 > + end > + fill_in_indices_stats(space, indices) > + end > + fiber.yield() What if DDL (say, a space drop) will occur here? I think we should check schema version right after exiting from yield and skip / repeat collecting statistics if the schema version is changed. I don't sure that iteration over `box.space` over DDL is safe -- what if a current space is dropped? Is the behaviour will be like the following? | tarantool> t = {a = 1, b = 2} | --- | ... | tarantool> for k, v in pairs(t) do if k == 'a' then t['a'] = nil end end | --- | ... So, let's check it within the same iteration of the loop: just after exiting from the yield. > + end > + > + for k, v in pairs(spaces) do > + schema[k..'_spaces'] = v > + end > + > + for k, v in pairs(indices) do > + schema[k..'_indices'] = v > + end Nit: We collect those metrics into one structure and then transform it into another one. Is one better and another worse? If so, it seems logical to use one. I was triggered on this, because in fact it is using of different names for the same things: it usually does not improve code readability. Anyway, if you have understanding why nested structuring is good in the code, but bad in a report, proceed with your way. I don't have a strong preference here (but I'm sympathetic to the nesting way).
next prev parent reply other threads:[~2020-07-01 0:16 UTC|newest] Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-06-05 8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov 2020-06-05 8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov 2020-06-07 16:45 ` Vladislav Shpilevoy 2020-06-09 23:05 ` Илья Конюхов 2020-06-11 19:32 ` Vladislav Shpilevoy 2020-07-01 0:16 ` Alexander Turenko 2020-07-05 2:14 ` Alexander Turenko 2020-06-05 8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov 2020-06-07 16:45 ` Vladislav Shpilevoy 2020-06-09 23:06 ` Илья Конюхов 2020-06-11 19:32 ` Vladislav Shpilevoy 2020-06-17 8:59 ` Илья Конюхов 2020-06-17 22:53 ` Vladislav Shpilevoy 2020-06-18 15:42 ` Илья Конюхов 2020-06-18 23:02 ` Vladislav Shpilevoy 2020-06-19 14:01 ` Илья Конюхов 2020-06-19 23:49 ` Vladislav Shpilevoy 2020-06-22 8:55 ` Илья Конюхов 2020-07-01 0:15 ` Alexander Turenko [this message] 2020-07-03 12:05 ` Илья Конюхов 2020-07-05 2:10 ` Alexander Turenko 2020-06-23 21:23 ` [Tarantool-patches] [PATCH 0/2] Extend feedback module report Vladislav Shpilevoy 2020-07-13 13:47 ` Kirill Yukhin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200701001544.wxx275pojfj3su5o@tkn_work_nb \ --to=alexander.turenko@tarantool.org \ --cc=babinoleg@mail.ru \ --cc=runsfor@gmail.com \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox