Tarantool development patches archive
 help / color / mirror / Atom feed
From: Alexander Turenko <alexander.turenko@tarantool.org>
To: Ilya Konyukhov <runsfor@gmail.com>
Cc: Oleg Babin <babinoleg@mail.ru>,
	tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
Date: Wed, 1 Jul 2020 03:15:44 +0300	[thread overview]
Message-ID: <20200701001544.wxx275pojfj3su5o@tkn_work_nb> (raw)
In-Reply-To: <ef0528997c2d120730f40af9f6d70c7bd2db08c1.1591345474.git.runsfor@gmail.com>

(CCed all persons who is mentioned below.)

You all know it, but I cannot keep myself from saying again that I'm not
fan on the idea of the built-in/enabled-by-default reporting tool.
Especially when it is desined in the way that a user don't have a
guarantee that its IP address is not collected. It is not in the
opensource spirit and is not kindly to our users (to ones who care about
privacy).

I understand why the organization need it: it give some view on
community -- what is useful and what is not. It may help when we, say,
prioritize problems. But this fact does not change anything: garthering
IP addresses and some information without expicit agreement does not
become a friendly action.

And while we're on the topic, there was the idea from Mons that it would
feel more comfortable if we would offer something useful for a user
together with the reporting tool: say, provide information about
critical / security problems in a running version if it is affected. Or
feed news about important updates in functionality that is relevant for
a particular application. I would add from myself that this way the
reporting tool not even necessarily must be enabled default to be useful
for us: there is a motivation to enable it. Win-win. Collect only from
ones who explicitly said that is okay with it.

We also can structurize and improve our documentation and see what
topics are more popular and what are less. Collect download statistics
from repositories, rocks and git clones. Here a user explicitly asked an
action with a remote server and (s)he most likely know that the remote
server may count it.

Pasted actual version to comment it inline. Nothing critical in fact.

WBR, Alexander Turenko.

>       "feedback_type": "version",

Was it planned to report several maps with different types? If we
decided to report everything within one map, then this 'feedback_type'
field has no sense anymore and just confusing, right? If so, then maybe
it is the time to remove it and bump 'feedback_version'?

> +local function is_system_space(sp)
> +    return box.schema.SYSTEM_ID_MIN <= sp.id and
> +           sp.id <= box.schema.SYSTEM_ID_MAX
> +end

Nit: a space object is named 'space' below, but 'sp' above. I agree with
Max and generally prefer to avoid abbreviations.

> +local function is_jsonpath_index(idx)
> +    for _, part in pairs(idx.parts) do
> +        if part.path ~= nil then
> +            return true
> +        end
> +    end
> +
> +    return false
> +end

I agree with Oleg's point: it worth to count multikey indices too. I
don't know much, but what I understood from the commit message [1]:

- Any multikey index is JSON path index.
- A multikey index has '[*]' in 'path'.

Looks quite simple to count if I don't miss anything. Please, verify
and/or ask about it someone who knows more.

[1]: https://github.com/tarantool/tarantool/commit/f1d9f2575c11afcf3b0368d38f44ae399cd7e045

> +local function fill_in_schema_stats_impl(schema)
> +    local spaces = {
> +        memtx     = 0,
> +        vinyl     = 0,
> +        temporary = 0,
> +        ['local'] = 0,
> +    }
> +
> +    local indices = {
> +        hash       = 0,
> +        tree       = 0,
> +        rtree      = 0,
> +        bitset     = 0,
> +        jsonpath   = 0,
> +        functional = 0,
> +    }
> +
> +    for name, space in pairs(box.space) do
> +        local is_system = is_system_space(space)
> +        if not is_system and type(name) == 'number' then
> +            if space.engine == 'vinyl' then
> +                spaces.vinyl = spaces.vinyl + 1
> +            elseif space.engine == 'memtx' then
> +                if space.temporary then
> +                    spaces.temporary = spaces.temporary + 1
> +                end
> +                spaces.memtx = spaces.memtx + 1
> +            end
> +            if space.is_local then
> +                spaces['local'] = spaces['local'] + 1
> +            end
> +            fill_in_indices_stats(space, indices)
> +        end
> +        fiber.yield()

What if DDL (say, a space drop) will occur here? I think we should check
schema version right after exiting from yield and skip / repeat
collecting statistics if the schema version is changed.

I don't sure that iteration over `box.space` over DDL is safe -- what
if a current space is dropped? Is the behaviour will be like the
following?

 | tarantool> t = {a = 1, b = 2}
 | ---
 | ...
 | tarantool> for k, v in pairs(t) do if k == 'a' then t['a'] = nil end end
 | ---
 | ...

So, let's check it within the same iteration of the loop: just after
exiting from the yield.

> +    end
> +
> +    for k, v in pairs(spaces) do
> +        schema[k..'_spaces'] = v
> +    end
> +
> +    for k, v in pairs(indices) do
> +        schema[k..'_indices'] = v
> +    end

Nit: We collect those metrics into one structure and then transform it
into another one. Is one better and another worse? If so, it seems
logical to use one.

I was triggered on this, because in fact it is using of different names
for the same things: it usually does not improve code readability.

Anyway, if you have understanding why nested structuring is good in the
code, but bad in a report, proceed with your way. I don't have a strong
preference here (but I'm sympathetic to the nesting way).

  parent reply	other threads:[~2020-07-01  0:16 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
2020-06-07 16:45   ` Vladislav Shpilevoy
2020-06-09 23:05     ` Илья Конюхов
2020-06-11 19:32       ` Vladislav Shpilevoy
2020-07-01  0:16       ` Alexander Turenko
2020-07-05  2:14         ` Alexander Turenko
2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
2020-06-07 16:45   ` Vladislav Shpilevoy
2020-06-09 23:06     ` Илья Конюхов
2020-06-11 19:32       ` Vladislav Shpilevoy
2020-06-17  8:59         ` Илья Конюхов
2020-06-17 22:53           ` Vladislav Shpilevoy
2020-06-18 15:42             ` Илья Конюхов
2020-06-18 23:02               ` Vladislav Shpilevoy
2020-06-19 14:01                 ` Илья Конюхов
2020-06-19 23:49                   ` Vladislav Shpilevoy
2020-06-22  8:55                     ` Илья Конюхов
2020-07-01  0:15   ` Alexander Turenko [this message]
2020-07-03 12:05     ` Илья Конюхов
2020-07-05  2:10       ` Alexander Turenko
2020-06-23 21:23 ` [Tarantool-patches] [PATCH 0/2] Extend feedback module report Vladislav Shpilevoy
2020-07-13 13:47 ` Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200701001544.wxx275pojfj3su5o@tkn_work_nb \
    --to=alexander.turenko@tarantool.org \
    --cc=babinoleg@mail.ru \
    --cc=runsfor@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox