[Tarantool-patches] [PATCH 0/2] Extend feedback module report

Tarantool development patches archive
 help / color / mirror / Atom feed

* [Tarantool-patches] [PATCH 0/2] Extend feedback module report
@ 2020-06-05  8:35 Ilya Konyukhov
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Ilya Konyukhov @ 2020-06-05  8:35 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

Following patchset adds more information about running
tarantool instance. Firstly, it collects platform information
such as os, arch and docker environment. Secondly, it looks
through box user spaces and collects info about db engines and
indices.

Related issues:
- https://github.com/tarantool/tarantool/issues/3608
- https://github.com/tarantool/tarantool/issues/4943

PR:
- https://github.com/tarantool/tarantool/pull/5039

Branch:
- RunsFor:runsfor/gh-3608-platform-detection-for-feedback

Ilya Konyukhov (2):
  feedback: determine runtime platform info
  feedback: collect db engines and index features

 cmake/os.cmake                        |  2 +-
 src/box/lua/feedback_daemon.lua       | 94 ++++++++++++++++++++++++++-
 test/box-tap/feedback_daemon.test.lua | 45 ++++++++++++-
 3 files changed, 136 insertions(+), 5 deletions(-)

-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
@ 2020-06-05  8:35 ` Ilya Konyukhov
  2020-06-07 16:45   ` Vladislav Shpilevoy
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Ilya Konyukhov @ 2020-06-05  8:35 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

This patch detect which platform instance is running on.
It uses luajit `jit` module to get OS name and architecture.
Se more in [docs page](https://luajit.org/ext_jit.html).

Also it tries to figure out whether instance is running
inside docker container or not. It's difficult know
accurately but one of the most stable and simple ways
at the same time is to look in
[`/proc/1/cgroup`](https://stackoverflow.com/a/20012536/1881632)
file.

Closes #3608
Related to #4943
---
 cmake/os.cmake                        |  2 +-
 src/box/lua/feedback_daemon.lua       | 29 ++++++++++++++++++++++++++-
 test/box-tap/feedback_daemon.test.lua |  3 +--
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/cmake/os.cmake b/cmake/os.cmake
index 905be61df..462bdccc3 100644
--- a/cmake/os.cmake
+++ b/cmake/os.cmake
@@ -78,7 +78,7 @@ elseif (${CMAKE_SYSTEM_NAME} STREQUAL "Darwin")
 
 #
 # Default build type is None, which uses depends by Apple
-# command line tools. Also supportting install with MacPorts.
+# command line tools. Also supporting install with MacPorts.
 #
     if (NOT DARWIN_BUILD_TYPE)
         set(DARWIN_BUILD_TYPE None CACHE STRING
diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 95130d696..2ce49fb22 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -26,13 +26,40 @@ local function get_fiber_id(f)
     return fid
 end
 
-local function fill_in_feedback(feedback)
+local function detect_docker_environment()
+    local fh = fio.open('/proc/1/cgroup', {'O_RDONLY'})
+    if not fh then return false end
+
+    -- fh:read() doesn't read empty files
+    local big_enough_chunk = 4096
+    local ok = fh:read(big_enough_chunk)
+    fh:close()
+    if not ok then return false end
+
+    if not string.find(ok, 'docker') then return false end
+
+    return true
+end
+
+local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
     end
     feedback.tarantool_version = box.info.version
     feedback.server_id         = box.info.uuid
     feedback.cluster_id        = box.info.cluster.uuid
+end
+
+local function fill_in_platform_info(feedback)
+    feedback.os        = jit.os
+    feedback.arch      = jit.arch
+    feedback.is_docker = detect_docker_environment()
+end
+
+local function fill_in_feedback(feedback)
+    fill_in_base_info(feedback)
+    fill_in_platform_info(feedback)
+
     return feedback
 end
 
diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index d4adb71f1..c36b2a694 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -131,5 +131,4 @@ daemon.start()
 daemon.send_test()
 daemon.stop()
 
-test:check()
-os.exit(0)
+os.exit(test:check() and 0 or 1)
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
@ 2020-06-07 16:45   ` Vladislav Shpilevoy
  2020-06-09 23:05     ` Илья Конюхов
  0 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-07 16:45 UTC (permalink / raw)
  To: Ilya Konyukhov, tarantool-patches; +Cc: alexander.turenko

Hi! Thanks for the patch!

See 4 comments below.

On 05/06/2020 10:35, Ilya Konyukhov wrote:
> This patch detect which platform instance is running on.
> It uses luajit `jit` module to get OS name and architecture.
> Se more in [docs page](https://luajit.org/ext_jit.html).
> 
> Also it tries to figure out whether instance is running
> inside docker container or not. It's difficult know
> accurately but one of the most stable and simple ways
> at the same time is to look in
> [`/proc/1/cgroup`](https://stackoverflow.com/a/20012536/1881632)
> file.
> 
> Closes #3608
> Related to #4943
> ---
>  cmake/os.cmake                        |  2 +-
>  src/box/lua/feedback_daemon.lua       | 29 ++++++++++++++++++++++++++-
>  test/box-tap/feedback_daemon.test.lua |  3 +--
>  3 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/cmake/os.cmake b/cmake/os.cmake
> index 905be61df..462bdccc3 100644
> --- a/cmake/os.cmake
> +++ b/cmake/os.cmake
> @@ -78,7 +78,7 @@ elseif (${CMAKE_SYSTEM_NAME} STREQUAL "Darwin")
>  
>  #
>  # Default build type is None, which uses depends by Apple
> -# command line tools. Also supportting install with MacPorts.
> +# command line tools. Also supporting install with MacPorts.

1. Lets better avoid unnecessary diff, not related to the patch
subject.

>  #
>      if (NOT DARWIN_BUILD_TYPE)
>          set(DARWIN_BUILD_TYPE None CACHE STRING
> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
> index 95130d696..2ce49fb22 100644
> --- a/src/box/lua/feedback_daemon.lua
> +++ b/src/box/lua/feedback_daemon.lua
> @@ -26,13 +26,40 @@ local function get_fiber_id(f)
>      return fid
>  end
>  
> -local function fill_in_feedback(feedback)
> +local function detect_docker_environment()
> +    local fh = fio.open('/proc/1/cgroup', {'O_RDONLY'})
> +    if not fh then return false end

2. Please, write conditions and their bodies on
separate lines. In the second patch too.

> +
> +    -- fh:read() doesn't read empty files
> +    local big_enough_chunk = 4096
> +    local ok = fh:read(big_enough_chunk)
> +    fh:close()
> +    if not ok then return false end
> +
> +    if not string.find(ok, 'docker') then return false end
> +
> +    return true
> +end
> +
> +local function fill_in_base_info(feedback)
>      if box.info.status ~= "running" then
>          return nil, "not running"
>      end
>      feedback.tarantool_version = box.info.version
>      feedback.server_id         = box.info.uuid
>      feedback.cluster_id        = box.info.cluster.uuid
> +end
> +
> +local function fill_in_platform_info(feedback)
> +    feedback.os        = jit.os
> +    feedback.arch      = jit.arch
> +    feedback.is_docker = detect_docker_environment()

3. detect_docker_environment() involves file reading, up to 4KB.
On my machine it was 1KB. It would be better to avoid calling it
multiple times. I suggest you to call this function only once, and
then re-use the result. It can't change anyway.

> +end
> +
> +local function fill_in_feedback(feedback)
> +    fill_in_base_info(feedback)
> +    fill_in_platform_info(feedback)
> +
>      return feedback
>  end
>  
> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
> index d4adb71f1..c36b2a694 100755
> --- a/test/box-tap/feedback_daemon.test.lua
> +++ b/test/box-tap/feedback_daemon.test.lua
> @@ -131,5 +131,4 @@ daemon.start()
>  daemon.send_test()
>  daemon.stop()
>  
> -test:check()
> -os.exit(0)
> +os.exit(test:check() and 0 or 1)

4. Unnecessary diff, not related to the issue.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-06-07 16:45   ` Vladislav Shpilevoy
@ 2020-06-09 23:05     ` Илья Конюхов
  2020-06-11 19:32       ` Vladislav Shpilevoy
  2020-07-01  0:16       ` Alexander Turenko
  0 siblings, 2 replies; 23+ messages in thread
From: Илья Конюхов @ 2020-06-09 23:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko

Hi, thanks for your review!

> On 7 Jun 2020, at 19:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the patch!
> 
> See 4 comments below.
> 
> On 05/06/2020 10:35, Ilya Konyukhov wrote:
>> This patch detect which platform instance is running on.
>> It uses luajit `jit` module to get OS name and architecture.
>> Se more in [docs page](https://luajit.org/ext_jit.html).
>> 
>> Also it tries to figure out whether instance is running
>> inside docker container or not. It's difficult know
>> accurately but one of the most stable and simple ways
>> at the same time is to look in
>> [`/proc/1/cgroup`](https://stackoverflow.com/a/20012536/1881632)
>> file.
>> 
>> Closes #3608
>> Related to #4943
>> ---
>> cmake/os.cmake                        |  2 +-
>> src/box/lua/feedback_daemon.lua       | 29 ++++++++++++++++++++++++++-
>> test/box-tap/feedback_daemon.test.lua |  3 +--
>> 3 files changed, 30 insertions(+), 4 deletions(-)
>> 
>> diff --git a/cmake/os.cmake b/cmake/os.cmake
>> index 905be61df..462bdccc3 100644
>> --- a/cmake/os.cmake
>> +++ b/cmake/os.cmake
>> @@ -78,7 +78,7 @@ elseif (${CMAKE_SYSTEM_NAME} STREQUAL "Darwin")
>> 
>> #
>> # Default build type is None, which uses depends by Apple
>> -# command line tools. Also supportting install with MacPorts.
>> +# command line tools. Also supporting install with MacPorts.
> 
> 1. Lets better avoid unnecessary diff, not related to the patch
> subject.

Got it.

> 
>> #
>>     if (NOT DARWIN_BUILD_TYPE)
>>         set(DARWIN_BUILD_TYPE None CACHE STRING
>> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
>> index 95130d696..2ce49fb22 100644
>> --- a/src/box/lua/feedback_daemon.lua
>> +++ b/src/box/lua/feedback_daemon.lua
>> @@ -26,13 +26,40 @@ local function get_fiber_id(f)
>>     return fid
>> end
>> 
>> -local function fill_in_feedback(feedback)
>> +local function detect_docker_environment()
>> +    local fh = fio.open('/proc/1/cgroup', {'O_RDONLY'})
>> +    if not fh then return false end
> 
> 2. Please, write conditions and their bodies on
> separate lines. In the second patch too.

Done. I’ve adjusted ifs and simplified function code a bit.

> 
>> +
>> +    -- fh:read() doesn't read empty files
>> +    local big_enough_chunk = 4096
>> +    local ok = fh:read(big_enough_chunk)
>> +    fh:close()
>> +    if not ok then return false end
>> +
>> +    if not string.find(ok, 'docker') then return false end
>> +
>> +    return true
>> +end
>> +
>> +local function fill_in_base_info(feedback)
>>     if box.info.status ~= "running" then
>>         return nil, "not running"
>>     end
>>     feedback.tarantool_version = box.info.version
>>     feedback.server_id         = box.info.uuid
>>     feedback.cluster_id        = box.info.cluster.uuid
>> +end
>> +
>> +local function fill_in_platform_info(feedback)
>> +    feedback.os        = jit.os
>> +    feedback.arch      = jit.arch
>> +    feedback.is_docker = detect_docker_environment()
> 
> 3. detect_docker_environment() involves file reading, up to 4KB.
> On my machine it was 1KB. It would be better to avoid calling it
> multiple times. I suggest you to call this function only once, and
> then re-use the result. It can't change anyway.

Thanks for a suggestion about caching. I’ve splitted the function into two parts and added caching part. Now file reading will be invoked only once.

> 
>> +end
>> +
>> +local function fill_in_feedback(feedback)
>> +    fill_in_base_info(feedback)
>> +    fill_in_platform_info(feedback)
>> +
>>     return feedback
>> end
>> 
>> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
>> index d4adb71f1..c36b2a694 100755
>> --- a/test/box-tap/feedback_daemon.test.lua
>> +++ b/test/box-tap/feedback_daemon.test.lua
>> @@ -131,5 +131,4 @@ daemon.start()
>> daemon.send_test()
>> daemon.stop()
>> 
>> -test:check()
>> -os.exit(0)
>> +os.exit(test:check() and 0 or 1)
> 
> 4. Unnecessary diff, not related to the issue.

Reverted back


diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 95130d696..21e69d511 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -26,13 +26,49 @@ local function get_fiber_id(f)
     return fid
 end
 
-local function fill_in_feedback(feedback)
+local function detect_docker_environment_impl()
+    local fh = fio.open('/proc/1/cgroup', {'O_RDONLY'})
+    if not fh then
+        return false
+    end
+
+    -- fh:read() doesn't read empty "proc" files
+    local big_enough_chunk = 4096
+    local s = fh:read(big_enough_chunk)
+    fh:close()
+
+    return s and s:find('docker') and true or false
+end
+
+local cached_detect_docker_env
+
+local function detect_docker_environment()
+    if cached_detect_docker_env == nil then
+        cached_detect_docker_env = detect_docker_environment_impl()
+    end
+
+    return cached_detect_docker_env
+end
+
+local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
     end
     feedback.tarantool_version = box.info.version
     feedback.server_id         = box.info.uuid
     feedback.cluster_id        = box.info.cluster.uuid
+end
+
+local function fill_in_platform_info(feedback)
+    feedback.os        = jit.os
+    feedback.arch      = jit.arch
+    feedback.is_docker = detect_docker_environment()
+end
+
+local function fill_in_feedback(feedback)
+    fill_in_base_info(feedback)
+    fill_in_platform_info(feedback)
+
     return feedback
 end
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-06-09 23:05     ` Илья Конюхов
@ 2020-06-11 19:32       ` Vladislav Shpilevoy
  2020-07-01  0:16       ` Alexander Turenko
  1 sibling, 0 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-11 19:32 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches, alexander.turenko

Hi! Thanks for the fixes!

This commit looks fine.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-06-09 23:05     ` Илья Конюхов
  2020-06-11 19:32       ` Vladislav Shpilevoy
@ 2020-07-01  0:16       ` Alexander Turenko
  2020-07-05  2:14         ` Alexander Turenko
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander Turenko @ 2020-07-01  0:16 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches, Vladislav Shpilevoy

> +local function fill_in_platform_info(feedback)
> +    feedback.os        = jit.os
> +    feedback.arch      = jit.arch
> +    feedback.is_docker = detect_docker_environment()
> +end

Platform information may be grouped into its own category.

Nit: I would choose more English friendly name for the 'is_docker'
property. Say, 'dockerized' or at least 'is_under_docker'.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info
  2020-07-01  0:16       ` Alexander Turenko
@ 2020-07-05  2:14         ` Alexander Turenko
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Turenko @ 2020-07-05  2:14 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches, Vladislav Shpilevoy

On Wed, Jul 01, 2020 at 03:16:10AM +0300, Alexander Turenko wrote:
> > +local function fill_in_platform_info(feedback)
> > +    feedback.os        = jit.os
> > +    feedback.arch      = jit.arch
> > +    feedback.is_docker = detect_docker_environment()
> > +end
> 
> Platform information may be grouped into its own category.
> 
> Nit: I would choose more English friendly name for the 'is_docker'
> property. Say, 'dockerized' or at least 'is_under_docker'.

Whether you'll agree with grouping and renaming or not, the patch LGTM.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
@ 2020-06-05  8:35 ` Ilya Konyukhov
  2020-06-07 16:45   ` Vladislav Shpilevoy
  2020-07-01  0:15   ` Alexander Turenko
  2020-06-23 21:23 ` [Tarantool-patches] [PATCH 0/2] Extend feedback module report Vladislav Shpilevoy
  2020-07-13 13:47 ` Kirill Yukhin
  3 siblings, 2 replies; 23+ messages in thread
From: Ilya Konyukhov @ 2020-06-05  8:35 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

This patch adds basic db features to feedback report.
It collects info about what engine and which types of
indexes are setup by the user.

Here is how report may look like if all the features used:

```json
{
  "arch": "x64",
  "features": {
    "has_bitset_index": true,
    "has_jsonpath_index": true,
    "vinyl": true,
    "has_tree_index": true,
    "has_primary_index": true,
    "has_hash_index": true,
    "memtx": true,
    "has_temporary_spaces": true,
    "has_local_spaces": true,
    "has_rtree_index": true,
    "has_secondary_index": true,
    "has_functional_index": true
  },
  "server_id": "7c8490f7-61c5-4e12-a7ff-d9fed05ad8ac",
  "is_docker": false,
  "os": "OSX",
  "feedback_type": "version",
  "cluster_id": "1eb7d98e-3344-4f15-a439-c287464f09e7",
  "tarantool_version": "2.5.0-90-g27fbe6ecd",
  "feedback_version": 1
}
```

Part of #4943
---
 src/box/lua/feedback_daemon.lua       | 65 +++++++++++++++++++++++++++
 test/box-tap/feedback_daemon.test.lua | 42 ++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 2ce49fb22..0fcd8ed87 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -41,6 +41,15 @@ local function detect_docker_environment()
     return true
 end
 
+local function is_system_space(sp)
+    local sp_id = sp.id
+    if box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX then
+        return true
+    end
+
+    return false
+end
+
 local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
@@ -56,9 +65,65 @@ local function fill_in_platform_info(feedback)
     feedback.is_docker = detect_docker_environment()
 end
 
+local function fill_in_space_indices(feedback, sp)
+    if not sp.index[0] then return end
+
+    feedback.features.has_primary_index = true
+    local idx_count = 0
+    for _, idx in pairs(sp.index) do
+        for _, part in pairs(idx.parts) do
+            if part.path ~= nil then
+                feedback.features.has_jsonpath_index = true
+                break
+            end
+        end
+        if idx.func ~= nil then
+            feedback.features.has_functional_index = true
+        end
+        if idx.type == 'TREE' then
+            feedback.features.has_tree_index = true
+        elseif idx.type == 'HASH' then
+            feedback.features.has_hash_index = true
+        elseif idx.type == 'RTREE' then
+            feedback.features.has_rtree_index = true
+        elseif idx.type == 'BITSET' then
+            feedback.features.has_bitset_index = true
+        end
+        idx_count = idx_count + 1
+    end
+
+    if idx_count > 1 then
+        feedback.features.has_secondary_index = true
+    end
+end
+
+local function fill_in_features(feedback)
+    feedback.features = feedback.features or {}
+
+    local is_memtx, is_vinyl, is_temporary, is_local
+    for _, sp in pairs(box.space) do
+        local is_system = is_system_space(sp)
+        if not is_system then
+            if sp.engine == 'vinyl' then is_vinyl = true end
+            if sp.engine == 'memtx' then
+                if sp.temporary ~= nil then is_temporary = true end
+                is_memtx = true
+            end
+            if sp.is_local ~= nil then is_local = true end
+            fill_in_space_indices(feedback, sp)
+        end
+    end
+
+    feedback.features.has_temporary_spaces = is_temporary
+    feedback.features.has_local_spaces = is_local
+    feedback.features.memtx = is_memtx
+    feedback.features.vinyl = is_vinyl
+end
+
 local function fill_in_feedback(feedback)
     fill_in_base_info(feedback)
     fill_in_platform_info(feedback)
+    fill_in_features(feedback)
 
     return feedback
 end
diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index c36b2a694..e382af8e8 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -67,7 +67,7 @@ if not ok then
     os.exit(0)
 end
 
-test:plan(11)
+test:plan(16)
 
 local function check(message)
     while feedback_count < 1 do
@@ -113,6 +113,46 @@ check("feedback after start")
 daemon.send_test()
 check("feedback after feedback send_test")
 
+local feedback_json = json.decode(feedback_save)
+test:is(type(feedback_json.features), 'table', 'features field is present')
+test:isnil(next(feedback_json.features), 'features are empty at the moment')
+
+box.schema.create_space('features_vinyl', {engine = 'vinyl'})
+box.schema.create_space('features_memtx', {engine = 'memtx', is_local = true, temporary = true})
+box.space.features_memtx:create_index('vinyl_pk', {type = 'tree'})
+box.space.features_memtx:create_index('memtx_pk', {type = 'hash'})
+box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
+box.space.features_memtx:create_index('memtx_rtree', {type = 'rtree', parts = {3, 'array'}})
+box.space.features_memtx:create_index('memtx_jpath',
+        {parts = {{field=4, type='str', path='data.name'}}})
+box.schema.func.create('features_func', {
+    body = "function(tuple) return {string.sub(tuple[2],1,1)} end",
+    is_deterministic = true,
+    is_sandboxed = true})
+box.space.features_memtx:create_index('j',
+        {parts={{field = 1, type = 'number'}},func = 'features_func'})
+
+check('old feedback received')
+feedback_reset()
+check('feedback with db features received')
+
+feedback_json = json.decode(feedback_save)
+test:test('features', function(t)
+    t:plan(12)
+    t:ok(feedback_json.features.memtx, 'memtx engine usage gathered')
+    t:ok(feedback_json.features.vinyl, 'vinyl engine usage gathered')
+    t:ok(feedback_json.features.has_temporary_spaces, 'temporary space usage gathered')
+    t:ok(feedback_json.features.has_local_spaces, 'local space usage gathered')
+    t:ok(feedback_json.features.has_primary_index, 'primary index gathered')
+    t:ok(feedback_json.features.has_secondary_index, 'secondary index gathered')
+    t:ok(feedback_json.features.has_tree_index, 'tree index gathered')
+    t:ok(feedback_json.features.has_hash_index, 'hash index gathered')
+    t:ok(feedback_json.features.has_rtree_index, 'rtree index gathered')
+    t:ok(feedback_json.features.has_bitset_index, 'bitset index gathered')
+    t:ok(feedback_json.features.has_jsonpath_index, 'jsonpath index gathered')
+    t:ok(feedback_json.features.has_functional_index, 'functional index gathered')
+end)
+
 daemon.stop()
 
 box.feedback.save("feedback.json")
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
@ 2020-06-07 16:45   ` Vladislav Shpilevoy
  2020-06-09 23:06     ` Илья Конюхов
  2020-07-01  0:15   ` Alexander Turenko
  1 sibling, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-07 16:45 UTC (permalink / raw)
  To: Ilya Konyukhov, tarantool-patches; +Cc: alexander.turenko

Thanks for the patch!

Generally, I don't like having so much Lua code in
the daemon, and system space full scans. Because it
is slow and produces Lua garbage.

Also anyway it can't collect some internal things
such as whether SQL is used (it is not exposed in
any system spaces), popen, swim, etc. These things
don't register self in any global place.

I was rather thinking about keeping track of all
these modules and their statistics in C. So as collection
of the statistics would be right when it changes, in
a set of int counters. And statistics dump would cost O(1)
by time, right into a JSON string, without Lua participation
except that it would call this C dumper and put its result
into an http request.

In other words, I am not sure this commit is needed at all,
until we understand how to collect all the other features
too.

See 6 comments below.

On 05/06/2020 10:35, Ilya Konyukhov wrote:
> This patch adds basic db features to feedback report.
> It collects info about what engine and which types of
> indexes are setup by the user.
> 
> Here is how report may look like if all the features used:
> 
> ```json
> {
>   "arch": "x64",
>   "features": {
>     "has_bitset_index": true,
>     "has_jsonpath_index": true,
>     "vinyl": true,
>     "has_tree_index": true,
>     "has_primary_index": true,
>     "has_hash_index": true,
>     "memtx": true,
>     "has_temporary_spaces": true,
>     "has_local_spaces": true,
>     "has_rtree_index": true,
>     "has_secondary_index": true,
>     "has_functional_index": true
>   },
>   "server_id": "7c8490f7-61c5-4e12-a7ff-d9fed05ad8ac",
>   "is_docker": false,
>   "os": "OSX",
>   "feedback_type": "version",
>   "cluster_id": "1eb7d98e-3344-4f15-a439-c287464f09e7",
>   "tarantool_version": "2.5.0-90-g27fbe6ecd",
>   "feedback_version": 1
> }
> ```
> 
> Part of #4943
> ---
>  src/box/lua/feedback_daemon.lua       | 65 +++++++++++++++++++++++++++
>  test/box-tap/feedback_daemon.test.lua | 42 ++++++++++++++++-
>  2 files changed, 106 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
> index 2ce49fb22..0fcd8ed87 100644
> --- a/src/box/lua/feedback_daemon.lua
> +++ b/src/box/lua/feedback_daemon.lua
> @@ -41,6 +41,15 @@ local function detect_docker_environment()
>      return true
>  end
>  
> +local function is_system_space(sp)
> +    local sp_id = sp.id
> +    if box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX then
> +        return true
> +    end

1. Please, keep code lines inside 80 symbols border. Also this function
return can be simplified to

    return box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX

> +
> +    return false
> +end
> +
>  local function fill_in_base_info(feedback)
>      if box.info.status ~= "running" then
>          return nil, "not running"
> @@ -56,9 +65,65 @@ local function fill_in_platform_info(feedback)
>      feedback.is_docker = detect_docker_environment()
>  end
>  
> +local function fill_in_space_indices(feedback, sp)
> +    if not sp.index[0] then return end
> +
> +    feedback.features.has_primary_index = true

2. What is a purpose of this field? Zero-index spaces always
exist, at least because indexes are created in a separate DDL
statement.

Besides, the function and spaces iteration may be really heavy,
if space count is thousands. Or even hundreds, but with many
indexes. And there is no a yield.

In addition to yields I ask you to add caching of this function
results using schema version counter. Schema changes very rarely,
so caching would make this function practically free almost
always.

> +    local idx_count = 0
> +    for _, idx in pairs(sp.index) do
> +        for _, part in pairs(idx.parts) do
> +            if part.path ~= nil then
> +                feedback.features.has_jsonpath_index = true
> +                break
> +            end
> +        end
> +        if idx.func ~= nil then
> +            feedback.features.has_functional_index = true
> +        end
> +        if idx.type == 'TREE' then
> +            feedback.features.has_tree_index = true
> +        elseif idx.type == 'HASH' then
> +            feedback.features.has_hash_index = true
> +        elseif idx.type == 'RTREE' then
> +            feedback.features.has_rtree_index = true
> +        elseif idx.type == 'BITSET' then
> +            feedback.features.has_bitset_index = true
> +        end
> +        idx_count = idx_count + 1
> +    end
> +
> +    if idx_count > 1 then
> +        feedback.features.has_secondary_index = true

3. This does not look really useful. What is this flag going
to tell us? Secondary indexes exist almost always.

Besides, I agree with Dmitry's comment about counters instead
of flags.

> +    end
> +end
> +
> +local function fill_in_features(feedback)
> +    feedback.features = feedback.features or {}
> +
> +    local is_memtx, is_vinyl, is_temporary, is_local
> +    for _, sp in pairs(box.space) do
> +        local is_system = is_system_space(sp)
> +        if not is_system then
> +            if sp.engine == 'vinyl' then is_vinyl = true end
> +            if sp.engine == 'memtx' then
> +                if sp.temporary ~= nil then is_temporary = true end
> +                is_memtx = true
> +            end
> +            if sp.is_local ~= nil then is_local = true end
> +            fill_in_space_indices(feedback, sp)
> +        end
> +    end
> +
> +    feedback.features.has_temporary_spaces = is_temporary
> +    feedback.features.has_local_spaces = is_local
> +    feedback.features.memtx = is_memtx
> +    feedback.features.vinyl = is_vinyl

4. Why do some flags have prefix 'has_', some have 'is_',
and some are just nouns like 'memtx', 'vinyl'? Lets be
consistent and use one name template. For that type of
flags in C we would use 'has_'.

> +end
> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
> index c36b2a694..e382af8e8 100755
> --- a/test/box-tap/feedback_daemon.test.lua
> +++ b/test/box-tap/feedback_daemon.test.lua
> @@ -113,6 +113,46 @@ check("feedback after start")
>  daemon.send_test()
>  check("feedback after feedback send_test")
>  
> +local feedback_json = json.decode(feedback_save)

5. When write a test for an issue, please, mention the
issue in a comment and describe it shortly. Like this:

	--
	-- gh-####: description.
	--

> +test:is(type(feedback_json.features), 'table', 'features field is present')
> +test:isnil(next(feedback_json.features), 'features are empty at the moment')
> +
> +box.schema.create_space('features_vinyl', {engine = 'vinyl'})
> +box.schema.create_space('features_memtx', {engine = 'memtx', is_local = true, temporary = true})
> +box.space.features_memtx:create_index('vinyl_pk', {type = 'tree'})
> +box.space.features_memtx:create_index('memtx_pk', {type = 'hash'})
> +box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
> +box.space.features_memtx:create_index('memtx_rtree', {type = 'rtree', parts = {3, 'array'}})
> +box.space.features_memtx:create_index('memtx_jpath',
> +        {parts = {{field=4, type='str', path='data.name'}}})

6. Please, be consistent in the code style. Surround '=' with whitespaces,
add a whitespace after ',' (see your code below).

> +box.schema.func.create('features_func', {
> +    body = "function(tuple) return {string.sub(tuple[2],1,1)} end",
> +    is_deterministic = true,
> +    is_sandboxed = true})
> +box.space.features_memtx:create_index('j',
> +        {parts={{field = 1, type = 'number'}},func = 'features_func'})
> +
> +check('old feedback received')
> +feedback_reset()
> +check('feedback with db features received')
> +
> +feedback_json = json.decode(feedback_save)
> +test:test('features', function(t)
> +    t:plan(12)
> +    t:ok(feedback_json.features.memtx, 'memtx engine usage gathered')
> +    t:ok(feedback_json.features.vinyl, 'vinyl engine usage gathered')
> +    t:ok(feedback_json.features.has_temporary_spaces, 'temporary space usage gathered')
> +    t:ok(feedback_json.features.has_local_spaces, 'local space usage gathered')
> +    t:ok(feedback_json.features.has_primary_index, 'primary index gathered')
> +    t:ok(feedback_json.features.has_secondary_index, 'secondary index gathered')
> +    t:ok(feedback_json.features.has_tree_index, 'tree index gathered')
> +    t:ok(feedback_json.features.has_hash_index, 'hash index gathered')
> +    t:ok(feedback_json.features.has_rtree_index, 'rtree index gathered')
> +    t:ok(feedback_json.features.has_bitset_index, 'bitset index gathered')
> +    t:ok(feedback_json.features.has_jsonpath_index, 'jsonpath index gathered')
> +    t:ok(feedback_json.features.has_functional_index, 'functional index gathered')
> +end)
> +
>  daemon.stop()
>  
>  box.feedback.save("feedback.json")
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-07 16:45   ` Vladislav Shpilevoy
@ 2020-06-09 23:06     ` Илья Конюхов
  2020-06-11 19:32       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 23+ messages in thread
From: Илья Конюхов @ 2020-06-09 23:06 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko

Thanks for the detailed review!

I’ve corrected most of the comments, you highlighted. You've also mentioned we may want to collect some internal statistics in C. It seems to be more broader topic to talk about. There is still not so clear to me which stats exactly we need to collect. We should discuss it more in related GH issue.

Right now I would focus on parts we can easily reach from lua. And I still think this patch may be useful to track a distribution how spaces and indices are used. I think performance should not be a major issue here since it is now use caching and intended to run once an hour.

In general, based on your feedback I’ve decided to refactor this patch a bit and:
- it now adds a caching for space and index statistics based on schema version;
- redundant primary and secondary index flags are removed;
- flags were replaced by counters.


Also, I’ve left detailed answers under your comments below.

> On 7 Jun 2020, at 19:45, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Thanks for the patch!
> 
> Generally, I don't like having so much Lua code in
> the daemon, and system space full scans. Because it
> is slow and produces Lua garbage.
> 
> Also anyway it can't collect some internal things
> such as whether SQL is used (it is not exposed in
> any system spaces), popen, swim, etc. These things
> don't register self in any global place.
> 
> I was rather thinking about keeping track of all
> these modules and their statistics in C. So as collection
> of the statistics would be right when it changes, in
> a set of int counters. And statistics dump would cost O(1)
> by time, right into a JSON string, without Lua participation
> except that it would call this C dumper and put its result
> into an http request.
> 
> In other words, I am not sure this commit is needed at all,
> until we understand how to collect all the other features
> too.
> 
> See 6 comments below.
> 
> On 05/06/2020 10:35, Ilya Konyukhov wrote:
>> This patch adds basic db features to feedback report.
>> It collects info about what engine and which types of
>> indexes are setup by the user.
>> 
>> Here is how report may look like if all the features used:
>> 
>> ```json
>> {
>>  "arch": "x64",
>>  "features": {
>>    "has_bitset_index": true,
>>    "has_jsonpath_index": true,
>>    "vinyl": true,
>>    "has_tree_index": true,
>>    "has_primary_index": true,
>>    "has_hash_index": true,
>>    "memtx": true,
>>    "has_temporary_spaces": true,
>>    "has_local_spaces": true,
>>    "has_rtree_index": true,
>>    "has_secondary_index": true,
>>    "has_functional_index": true
>>  },
>>  "server_id": "7c8490f7-61c5-4e12-a7ff-d9fed05ad8ac",
>>  "is_docker": false,
>>  "os": "OSX",
>>  "feedback_type": "version",
>>  "cluster_id": "1eb7d98e-3344-4f15-a439-c287464f09e7",
>>  "tarantool_version": "2.5.0-90-g27fbe6ecd",
>>  "feedback_version": 1
>> }
>> ```
>> 
>> Part of #4943
>> ---
>> src/box/lua/feedback_daemon.lua       | 65 +++++++++++++++++++++++++++
>> test/box-tap/feedback_daemon.test.lua | 42 ++++++++++++++++-
>> 2 files changed, 106 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
>> index 2ce49fb22..0fcd8ed87 100644
>> --- a/src/box/lua/feedback_daemon.lua
>> +++ b/src/box/lua/feedback_daemon.lua
>> @@ -41,6 +41,15 @@ local function detect_docker_environment()
>>     return true
>> end
>> 
>> +local function is_system_space(sp)
>> +    local sp_id = sp.id
>> +    if box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX then
>> +        return true
>> +    end
> 
> 1. Please, keep code lines inside 80 symbols border. Also this function
> return can be simplified to
> 
>    return box.schema.SYSTEM_ID_MIN <= sp_id and sp_id <= box.schema.SYSTEM_ID_MAX

Ok, collapsed that part

> 
>> +
>> +    return false
>> +end
>> +
>> local function fill_in_base_info(feedback)
>>     if box.info.status ~= "running" then
>>         return nil, "not running"
>> @@ -56,9 +65,65 @@ local function fill_in_platform_info(feedback)
>>     feedback.is_docker = detect_docker_environment()
>> end
>> 
>> +local function fill_in_space_indices(feedback, sp)
>> +    if not sp.index[0] then return end
>> +
>> +    feedback.features.has_primary_index = true
> 
> 2. What is a purpose of this field? Zero-index spaces always
> exist, at least because indexes are created in a separate DDL
> statement.
> 
> Besides, the function and spaces iteration may be really heavy,
> if space count is thousands. Or even hundreds, but with many
> indexes. And there is no a yield.
> 
> In addition to yields I ask you to add caching of this function
> results using schema version counter. Schema changes very rarely,
> so caching would make this function practically free almost
> always.

- removed primary/secondary fields
- added caching. Cache invalidates when schema version updates
- added yield after each space iteration

> 
>> +    local idx_count = 0
>> +    for _, idx in pairs(sp.index) do
>> +        for _, part in pairs(idx.parts) do
>> +            if part.path ~= nil then
>> +                feedback.features.has_jsonpath_index = true
>> +                break
>> +            end
>> +        end
>> +        if idx.func ~= nil then
>> +            feedback.features.has_functional_index = true
>> +        end
>> +        if idx.type == 'TREE' then
>> +            feedback.features.has_tree_index = true
>> +        elseif idx.type == 'HASH' then
>> +            feedback.features.has_hash_index = true
>> +        elseif idx.type == 'RTREE' then
>> +            feedback.features.has_rtree_index = true
>> +        elseif idx.type == 'BITSET' then
>> +            feedback.features.has_bitset_index = true
>> +        end
>> +        idx_count = idx_count + 1
>> +    end
>> +
>> +    if idx_count > 1 then
>> +        feedback.features.has_secondary_index = true
> 
> 3. This does not look really useful. What is this flag going
> to tell us? Secondary indexes exist almost always.
> 
> Besides, I agree with Dmitry's comment about counters instead
> of flags.

- removed secondary index tracking
- introduced counters for other indices.

> 
>> +    end
>> +end
>> +
>> +local function fill_in_features(feedback)
>> +    feedback.features = feedback.features or {}
>> +
>> +    local is_memtx, is_vinyl, is_temporary, is_local
>> +    for _, sp in pairs(box.space) do
>> +        local is_system = is_system_space(sp)
>> +        if not is_system then
>> +            if sp.engine == 'vinyl' then is_vinyl = true end
>> +            if sp.engine == 'memtx' then
>> +                if sp.temporary ~= nil then is_temporary = true end
>> +                is_memtx = true
>> +            end
>> +            if sp.is_local ~= nil then is_local = true end
>> +            fill_in_space_indices(feedback, sp)
>> +        end
>> +    end
>> +
>> +    feedback.features.has_temporary_spaces = is_temporary
>> +    feedback.features.has_local_spaces = is_local
>> +    feedback.features.memtx = is_memtx
>> +    feedback.features.vinyl = is_vinyl
> 
> 4. Why do some flags have prefix 'has_', some have 'is_',
> and some are just nouns like 'memtx', 'vinyl'? Lets be
> consistent and use one name template. For that type of
> flags in C we would use 'has_'.

With counters it all now has suffixes like “_spaces” and “_indices"

> 
>> +end
>> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
>> index c36b2a694..e382af8e8 100755
>> --- a/test/box-tap/feedback_daemon.test.lua
>> +++ b/test/box-tap/feedback_daemon.test.lua
>> @@ -113,6 +113,46 @@ check("feedback after start")
>> daemon.send_test()
>> check("feedback after feedback send_test")
>> 
>> +local feedback_json = json.decode(feedback_save)
> 
> 5. When write a test for an issue, please, mention the
> issue in a comment and describe it shortly. Like this:
> 
> 	--
> 	-- gh-####: description.
> 	—
> 

Done

>> +test:is(type(feedback_json.features), 'table', 'features field is present')
>> +test:isnil(next(feedback_json.features), 'features are empty at the moment')
>> +
>> +box.schema.create_space('features_vinyl', {engine = 'vinyl'})
>> +box.schema.create_space('features_memtx', {engine = 'memtx', is_local = true, temporary = true})
>> +box.space.features_memtx:create_index('vinyl_pk', {type = 'tree'})
>> +box.space.features_memtx:create_index('memtx_pk', {type = 'hash'})
>> +box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
>> +box.space.features_memtx:create_index('memtx_rtree', {type = 'rtree', parts = {3, 'array'}})
>> +box.space.features_memtx:create_index('memtx_jpath',
>> +        {parts = {{field=4, type='str', path='data.name'}}})
> 
> 6. Please, be consistent in the code style. Surround '=' with whitespaces,
> add a whitespace after ',' (see your code below).
> 

Adjusted code style here. Thanks for pointing it out.

>> +box.schema.func.create('features_func', {
>> +    body = "function(tuple) return {string.sub(tuple[2],1,1)} end",
>> +    is_deterministic = true,
>> +    is_sandboxed = true})
>> +box.space.features_memtx:create_index('j',
>> +        {parts={{field = 1, type = 'number'}},func = 'features_func'})
>> +
>> +check('old feedback received')
>> +feedback_reset()
>> +check('feedback with db features received')
>> +
>> +feedback_json = json.decode(feedback_save)
>> +test:test('features', function(t)
>> +    t:plan(12)
>> +    t:ok(feedback_json.features.memtx, 'memtx engine usage gathered')
>> +    t:ok(feedback_json.features.vinyl, 'vinyl engine usage gathered')
>> +    t:ok(feedback_json.features.has_temporary_spaces, 'temporary space usage gathered')
>> +    t:ok(feedback_json.features.has_local_spaces, 'local space usage gathered')
>> +    t:ok(feedback_json.features.has_primary_index, 'primary index gathered')
>> +    t:ok(feedback_json.features.has_secondary_index, 'secondary index gathered')
>> +    t:ok(feedback_json.features.has_tree_index, 'tree index gathered')
>> +    t:ok(feedback_json.features.has_hash_index, 'hash index gathered')
>> +    t:ok(feedback_json.features.has_rtree_index, 'rtree index gathered')
>> +    t:ok(feedback_json.features.has_bitset_index, 'bitset index gathered')
>> +    t:ok(feedback_json.features.has_jsonpath_index, 'jsonpath index gathered')
>> +    t:ok(feedback_json.features.has_functional_index, 'functional index gathered')
>> +end)
>> +
>> daemon.stop()
>> 
>> box.feedback.save("feedback.json”)
>> 

diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 21e69d511..1f177a204 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -50,6 +50,25 @@ local function detect_docker_environment()
     return cached_detect_docker_env
 end
 
+local function is_system_space(sp)
+    return box.schema.SYSTEM_ID_MIN <= sp.id and
+            sp.id <= box.schema.SYSTEM_ID_MAX
+end
+
+local function is_jsonpath_index(idx)
+    for _, part in pairs(idx.parts) do
+        if part.path ~= nil then
+            return true
+        end
+    end
+
+    return false
+end
+
+local function is_functional_index(idx)
+    return idx.func ~= nil
+end
+
 local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
@@ -65,9 +84,98 @@ local function fill_in_platform_info(feedback)
     feedback.is_docker = detect_docker_environment()
 end
 
+local function fill_in_indices_stats(space, stats)
+    if not space.index[0] then return end
+
+    for name, idx in pairs(space.index) do
+        if type(name) == 'number' then
+            local idx_type = idx.type
+            if idx_type == 'TREE' then
+                if is_functional_index(idx) then
+                    stats.functional = stats.functional + 1
+                elseif is_jsonpath_index(idx) then
+                    stats.jsonpath = stats.jsonpath + 1
+                end
+                stats.tree = stats.tree + 1
+            elseif idx_type == 'HASH' then
+                stats.hash = stats.hash + 1
+            elseif idx_type == 'RTREE' then
+                stats.rtree = stats.rtree + 1
+            elseif idx_type == 'BITSET' then
+                stats.bitset = stats.bitset + 1
+            end
+        end
+    end
+end
+
+local function fill_in_space_stats(features)
+    local spaces = {
+        memtx     = 0,
+        vinyl     = 0,
+        temporary = 0,
+        ['local'] = 0,
+    }
+
+    local indices = {
+        hash       = 0,
+        tree       = 0,
+        rtree      = 0,
+        bitset     = 0,
+        jsonpath   = 0,
+        functional = 0,
+    }
+
+    for name, space in pairs(box.space) do
+        local is_system = is_system_space(space)
+        if not is_system and type(name) == 'number' then
+            if space.engine == 'vinyl' then
+                spaces.vinyl = spaces.vinyl + 1
+            elseif space.engine == 'memtx' then
+                if space.temporary ~= nil then
+                    spaces.temporary = spaces.temporary + 1
+                end
+                spaces.memtx = spaces.memtx + 1
+            end
+            if space.is_local == false then
+                spaces['local'] = spaces['local'] + 1
+            end
+            fill_in_indices_stats(space, indices)
+        end
+        fiber.yield()
+    end
+
+    for k, v in pairs(spaces) do
+        features[k..'_spaces'] = v
+    end
+
+    for k, v in pairs(indices) do
+        features[k..'_indices'] = v
+    end
+end
+
+local function fill_in_features_impl(features)
+    fill_in_space_stats(features)
+end
+
+local cached_schema_version = 0
+local cached_feedback_features = {}
+
+local function fill_in_features(feedback)
+    local schema_version = box.internal.schema_version()
+    if cached_schema_version < schema_version then
+        local features = {}
+        fill_in_features_impl(features)
+        cached_schema_version = schema_version
+        cached_feedback_features = features
+    end
+
+    feedback.features = cached_feedback_features
+end
+
 local function fill_in_feedback(feedback)
     fill_in_base_info(feedback)
     fill_in_platform_info(feedback)
+    fill_in_features(feedback)
 
     return feedback
 end
diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index d4adb71f1..8ef20e0d0 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -67,7 +67,7 @@ if not ok then
     os.exit(0)
 end
 
-test:plan(11)
+test:plan(19)
 
 local function check(message)
     while feedback_count < 1 do
@@ -113,6 +113,71 @@ check("feedback after start")
 daemon.send_test()
 check("feedback after feedback send_test")
 
+--
+-- gh-4943: Collect engines and indices statistics.
+--
+
+local feedback_json = json.decode(feedback_save)
+test:is(type(feedback_json.features), 'table', 'features field is present')
+local expected = {
+    memtx_spaces = 0,
+    vinyl_spaces = 0,
+    temporary_spaces = 0,
+    local_spaces = 0,
+    tree_indices = 0,
+    rtree_indices = 0,
+    hash_indices = 0,
+    bitset_indices = 0,
+    jsonpath_indices = 0,
+    functional_indices = 0,
+}
+test:is_deeply(feedback_json.features, expected, 'features are empty at the moment')
+
+box.schema.create_space('features_vinyl', {engine = 'vinyl'})
+box.schema.create_space('features_memtx',
+        {engine = 'memtx', is_local = true, temporary = true})
+box.space.features_vinyl:create_index('vinyl_pk', {type = 'tree'})
+box.space.features_memtx:create_index('memtx_pk', {type = 'tree'})
+box.space.features_memtx:create_index('memtx_hash', {type = 'hash'})
+box.space.features_memtx:create_index('memtx_bitset', {type = 'bitset'})
+box.space.features_memtx:create_index('memtx_rtree',
+        {type = 'rtree', parts = {{field = 3, type = 'array'}}})
+box.space.features_memtx:create_index('memtx_jpath',
+        {parts = {{field = 4, type = 'str', path = 'data.name'}}})
+box.schema.func.create('features_func', {
+    body = "function(tuple) return {string.sub(tuple[2], 1, 1)} end",
+    is_deterministic = true,
+    is_sandboxed = true})
+box.space.features_memtx:create_index('j',
+        {parts = {{field = 1, type = 'number'}}, func = 'features_func'})
+
+check('old feedback received')
+feedback_reset()
+check('feedback with db features received')
+
+feedback_json = json.decode(feedback_save)
+test:test('features', function(t)
+    t:plan(10)
+    t:is(feedback_json.features.memtx_spaces, 1, 'memtx engine usage gathered')
+    t:is(feedback_json.features.vinyl_spaces, 1, 'vinyl engine usage gathered')
+    t:is(feedback_json.features.temporary_spaces, 1, 'temporary space usage gathered')
+    t:is(feedback_json.features.local_spaces, 1, 'local space usage gathered')
+    t:is(feedback_json.features.tree_indices, 4, 'tree index gathered')
+    t:is(feedback_json.features.hash_indices, 1, 'hash index gathered')
+    t:is(feedback_json.features.rtree_indices, 1, 'rtree index gathered')
+    t:is(feedback_json.features.bitset_indices, 1, 'bitset index gathered')
+    t:is(feedback_json.features.jsonpath_indices, 1, 'jsonpath index gathered')
+    t:is(feedback_json.features.functional_indices, 1, 'functional index gathered')
+end)
+
+box.space.features_memtx:create_index('memtx_sec', {type = 'hash'})
+
+check('old feedback received')
+feedback_reset()
+check('feedback with new db features received')
+feedback_json = json.decode(feedback_save)
+test:is(feedback_json.features.hash_indices, 2, 'internal cache invalidates when schema changes')
+
 daemon.stop()
 
 box.feedback.save("feedback.json")

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-09 23:06     ` Илья Конюхов
@ 2020-06-11 19:32       ` Vladislav Shpilevoy
  2020-06-17  8:59         ` Илья Конюхов
  0 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-11 19:32 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches, alexander.turenko

Thanks for the fixes!

Generally fine. As a first set of simple stats. See a few
more comments below.

First is about why I wanted to collect the stat from C, in a
reactive way, rather than in a proactive way.

The main reason - C is cheaper. Also the collected stats could be
useful for users, if we could design it in a good way. And then it
could be collected more often than once in an hour.

The other reason - the schema cache should help in most of the
cases, but I once saw a customer, who used spaces as temporary
objects. From what I saw, it was code of a game, and spaces were
lobbies. In such installation the space count may be huge, and
schema change rate as well. So the cache becomes useless.

A reactive C implementation could just increment a set of int counters
in a global struct in alter.cc right when a space/index is
created/changed/dropped. It would cost nothing for everyone. And would
be a good base for wider set of statistics. Which we clearly need.
For example, Cyrill was struggling with lack of internal statistics
when tried to bench relay threads performance.

See 4 comments below.

> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
> index 21e69d511..1f177a204 100644
> --- a/src/box/lua/feedback_daemon.lua
> +++ b/src/box/lua/feedback_daemon.lua
> @@ -50,6 +50,25 @@ local function detect_docker_environment()
>      return cached_detect_docker_env
>  end
>  
> +local function is_system_space(sp)
> +    return box.schema.SYSTEM_ID_MIN <= sp.id and
> +            sp.id <= box.schema.SYSTEM_ID_MAX

1. Nit: please, align the second line by 'box' in the previous
line. Like this:

    return box.schema.SYSTEM_ID_MIN <= sp.id and
           sp.id <= box.schema.SYSTEM_ID_MAX

> +end
> +
> +local function is_jsonpath_index(idx)
> +    for _, part in pairs(idx.parts) do
> +        if part.path ~= nil then
> +            return true
> +        end
> +    end
> +
> +    return false
> +end
> +
> +local function is_functional_index(idx)
> +    return idx.func ~= nil
> +end
> +
>  local function fill_in_base_info(feedback)
>      if box.info.status ~= "running" then
>          return nil, "not running"
> @@ -65,9 +84,98 @@ local function fill_in_platform_info(feedback)
>      feedback.is_docker = detect_docker_environment()
>  end
>  
> +local function fill_in_indices_stats(space, stats)
> +    if not space.index[0] then return end

2. Looks like this condition is not necessary. The cycle anyway
shouldn't do any iterations if the table is empty.

> +
> +    for name, idx in pairs(space.index) do
> +        if type(name) == 'number' then
> +            local idx_type = idx.type
> +            if idx_type == 'TREE' then
> +                if is_functional_index(idx) then
> +                    stats.functional = stats.functional + 1
> +                elseif is_jsonpath_index(idx) then
> +                    stats.jsonpath = stats.jsonpath + 1
> +                end
> +                stats.tree = stats.tree + 1
> +            elseif idx_type == 'HASH' then
> +                stats.hash = stats.hash + 1
> +            elseif idx_type == 'RTREE' then
> +                stats.rtree = stats.rtree + 1
> +            elseif idx_type == 'BITSET' then
> +                stats.bitset = stats.bitset + 1
> +            end
> +        end
> +    end
> +end
> +
> +local function fill_in_space_stats(features)
> +    local spaces = {
> +        memtx     = 0,
> +        vinyl     = 0,
> +        temporary = 0,
> +        ['local'] = 0,
> +    }
> +
> +    local indices = {
> +        hash       = 0,
> +        tree       = 0,
> +        rtree      = 0,
> +        bitset     = 0,
> +        jsonpath   = 0,
> +        functional = 0,
> +    }
> +
> +    for name, space in pairs(box.space) do
> +        local is_system = is_system_space(space)
> +        if not is_system and type(name) == 'number' then
> +            if space.engine == 'vinyl' then
> +                spaces.vinyl = spaces.vinyl + 1
> +            elseif space.engine == 'memtx' then
> +                if space.temporary ~= nil then
> +                    spaces.temporary = spaces.temporary + 1
> +                end
> +                spaces.memtx = spaces.memtx + 1
> +            end
> +            if space.is_local == false then

3. Are you sure? Looks like a typo. Perhaps you meant:

    if space.is_local then

Without '== false'. Otherwise you count non-local spaces.
How does it pass the tests now?

> +local function fill_in_features_impl(features)
> +    fill_in_space_stats(features)
> +end
> +
> +local cached_schema_version = 0
> +local cached_feedback_features = {}

4. I would better move the cache handling into fill_in_space_stats().
Because when you will add more features, not related to the schema,
they won't relate to schema version.

fill_in_features() should not use any caches. Does not depend on schema.
fill_in_space_stats() can use the cache. Because fetches info from the
schema.

I mean, it works now, but we would need to rewrite that mostly, when
more features will be collected.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-11 19:32       ` Vladislav Shpilevoy
@ 2020-06-17  8:59         ` Илья Конюхов
  2020-06-17 22:53           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 23+ messages in thread
From: Илья Конюхов @ 2020-06-17  8:59 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi,

I’ve worked on your comments and fixed lapses. See comments below.

Besides that, are you ok with pushing this (Lua) patch forward? I understand your thoughts on implementing more robust solution in C, which would also cover some statistics gathering goals. I agree with it, but I think we need to design it first. Let me branch out an issue on this topic.

> On 11 Jun 2020, at 22:32, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Thanks for the fixes!
> 
> Generally fine. As a first set of simple stats. See a few
> more comments below.
> 
> First is about why I wanted to collect the stat from C, in a
> reactive way, rather than in a proactive way.
> 
> The main reason - C is cheaper. Also the collected stats could be
> useful for users, if we could design it in a good way. And then it
> could be collected more often than once in an hour.
> 
> The other reason - the schema cache should help in most of the
> cases, but I once saw a customer, who used spaces as temporary
> objects. From what I saw, it was code of a game, and spaces were
> lobbies. In such installation the space count may be huge, and
> schema change rate as well. So the cache becomes useless.
> 
> A reactive C implementation could just increment a set of int counters
> in a global struct in alter.cc right when a space/index is
> created/changed/dropped. It would cost nothing for everyone. And would
> be a good base for wider set of statistics. Which we clearly need.
> For example, Cyrill was struggling with lack of internal statistics
> when tried to bench relay threads performance.
> 
> See 4 comments below.
> 
>> diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
>> index 21e69d511..1f177a204 100644
>> --- a/src/box/lua/feedback_daemon.lua
>> +++ b/src/box/lua/feedback_daemon.lua
>> @@ -50,6 +50,25 @@ local function detect_docker_environment()
>>     return cached_detect_docker_env
>> end
>> 
>> +local function is_system_space(sp)
>> +    return box.schema.SYSTEM_ID_MIN <= sp.id and
>> +            sp.id <= box.schema.SYSTEM_ID_MAX
> 
> 1. Nit: please, align the second line by 'box' in the previous
> line. Like this:
> 
>    return box.schema.SYSTEM_ID_MIN <= sp.id and
>           sp.id <= box.schema.SYSTEM_ID_MAX

done.

> 
>> +end
>> +
>> +local function is_jsonpath_index(idx)
>> +    for _, part in pairs(idx.parts) do
>> +        if part.path ~= nil then
>> +            return true
>> +        end
>> +    end
>> +
>> +    return false
>> +end
>> +
>> +local function is_functional_index(idx)
>> +    return idx.func ~= nil
>> +end
>> +
>> local function fill_in_base_info(feedback)
>>     if box.info.status ~= "running" then
>>         return nil, "not running"
>> @@ -65,9 +84,98 @@ local function fill_in_platform_info(feedback)
>>     feedback.is_docker = detect_docker_environment()
>> end
>> 
>> +local function fill_in_indices_stats(space, stats)
>> +    if not space.index[0] then return end
> 
> 2. Looks like this condition is not necessary. The cycle anyway
> shouldn't do any iterations if the table is empty.

agree. fixed.

> 
>> +
>> +    for name, idx in pairs(space.index) do
>> +        if type(name) == 'number' then
>> +            local idx_type = idx.type
>> +            if idx_type == 'TREE' then
>> +                if is_functional_index(idx) then
>> +                    stats.functional = stats.functional + 1
>> +                elseif is_jsonpath_index(idx) then
>> +                    stats.jsonpath = stats.jsonpath + 1
>> +                end
>> +                stats.tree = stats.tree + 1
>> +            elseif idx_type == 'HASH' then
>> +                stats.hash = stats.hash + 1
>> +            elseif idx_type == 'RTREE' then
>> +                stats.rtree = stats.rtree + 1
>> +            elseif idx_type == 'BITSET' then
>> +                stats.bitset = stats.bitset + 1
>> +            end
>> +        end
>> +    end
>> +end
>> +
>> +local function fill_in_space_stats(features)
>> +    local spaces = {
>> +        memtx     = 0,
>> +        vinyl     = 0,
>> +        temporary = 0,
>> +        ['local'] = 0,
>> +    }
>> +
>> +    local indices = {
>> +        hash       = 0,
>> +        tree       = 0,
>> +        rtree      = 0,
>> +        bitset     = 0,
>> +        jsonpath   = 0,
>> +        functional = 0,
>> +    }
>> +
>> +    for name, space in pairs(box.space) do
>> +        local is_system = is_system_space(space)
>> +        if not is_system and type(name) == 'number' then
>> +            if space.engine == 'vinyl' then
>> +                spaces.vinyl = spaces.vinyl + 1
>> +            elseif space.engine == 'memtx' then
>> +                if space.temporary ~= nil then
>> +                    spaces.temporary = spaces.temporary + 1
>> +                end
>> +                spaces.memtx = spaces.memtx + 1
>> +            end
>> +            if space.is_local == false then
> 
> 3. Are you sure? Looks like a typo. Perhaps you meant:
> 
>    if space.is_local then
> 
> Without '== false'. Otherwise you count non-local spaces.
> How does it pass the tests now?

My bad. Thanks, fixed. Test happen to pass, because there were 2 spaces. 1 local and 1 non-local. Adjusted it.

> 
>> +local function fill_in_features_impl(features)
>> +    fill_in_space_stats(features)
>> +end
>> +
>> +local cached_schema_version = 0
>> +local cached_feedback_features = {}
> 
> 4. I would better move the cache handling into fill_in_space_stats().
> Because when you will add more features, not related to the schema,
> they won't relate to schema version.
> 
> fill_in_features() should not use any caches. Does not depend on schema.
> fill_in_space_stats() can use the cache. Because fetches info from the
> schema.
> 
> I mean, it works now, but we would need to rewrite that mostly, when
> more features will be collected.

Agree. Extracted out caching into fill_in_space_stats(), but kept it in fill_in_features(), since space_stats is semantically features.



diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 95130d696..111a70fca 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -26,13 +26,154 @@ local function get_fiber_id(f)
     return fid
 end
 
-local function fill_in_feedback(feedback)
+local function detect_docker_environment_impl()
+    local fh = fio.open('/proc/1/cgroup', {'O_RDONLY'})
+    if not fh then
+        return false
+    end
+
+    -- fh:read() doesn't read empty "proc" files
+    local big_enough_chunk = 4096
+    local s = fh:read(big_enough_chunk)
+    fh:close()
+
+    return s and s:find('docker') and true or false
+end
+
+local cached_detect_docker_env
+
+local function detect_docker_environment()
+    if cached_detect_docker_env == nil then
+        cached_detect_docker_env = detect_docker_environment_impl()
+    end
+
+    return cached_detect_docker_env
+end
+
+local function is_system_space(sp)
+    return box.schema.SYSTEM_ID_MIN <= sp.id and
+           sp.id <= box.schema.SYSTEM_ID_MAX
+end
+
+local function is_jsonpath_index(idx)
+    for _, part in pairs(idx.parts) do
+        if part.path ~= nil then
+            return true
+        end
+    end
+
+    return false
+end
+
+local function is_functional_index(idx)
+    return idx.func ~= nil
+end
+
+local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
     end
     feedback.tarantool_version = box.info.version
     feedback.server_id         = box.info.uuid
     feedback.cluster_id        = box.info.cluster.uuid
+end
+
+local function fill_in_platform_info(feedback)
+    feedback.os        = jit.os
+    feedback.arch      = jit.arch
+    feedback.is_docker = detect_docker_environment()
+end
+
+local function fill_in_indices_stats(space, stats)
+    for name, idx in pairs(space.index) do
+        if type(name) == 'number' then
+            local idx_type = idx.type
+            if idx_type == 'TREE' then
+                if is_functional_index(idx) then
+                    stats.functional = stats.functional + 1
+                elseif is_jsonpath_index(idx) then
+                    stats.jsonpath = stats.jsonpath + 1
+                end
+                stats.tree = stats.tree + 1
+            elseif idx_type == 'HASH' then
+                stats.hash = stats.hash + 1
+            elseif idx_type == 'RTREE' then
+                stats.rtree = stats.rtree + 1
+            elseif idx_type == 'BITSET' then
+                stats.bitset = stats.bitset + 1
+            end
+        end
+    end
+end
+
+local function fill_in_space_stats_impl(features)
+    local spaces = {
+        memtx     = 0,
+        vinyl     = 0,
+        temporary = 0,
+        ['local'] = 0,
+    }
+
+    local indices = {
+        hash       = 0,
+        tree       = 0,
+        rtree      = 0,
+        bitset     = 0,
+        jsonpath   = 0,
+        functional = 0,
+    }
+
+    for name, space in pairs(box.space) do
+        local is_system = is_system_space(space)
+        if not is_system and type(name) == 'number' then
+            if space.engine == 'vinyl' then
+                spaces.vinyl = spaces.vinyl + 1
+            elseif space.engine == 'memtx' then
+                if space.temporary then
+                    spaces.temporary = spaces.temporary + 1
+                end
+                spaces.memtx = spaces.memtx + 1
+            end
+            if space.is_local then
+                spaces['local'] = spaces['local'] + 1
+            end
+            fill_in_indices_stats(space, indices)
+        end
+        fiber.yield()
+    end
+
+    for k, v in pairs(spaces) do
+        features[k..'_spaces'] = v
+    end
+
+    for k, v in pairs(indices) do
+        features[k..'_indices'] = v
+    end
+end
+
+local cached_schema_version = 0
+local cached_feedback_features = {}
+
+local function fill_in_space_stats(feedback)
+    local schema_version = box.internal.schema_version()
+    if cached_schema_version < schema_version then
+        local features = {}
+        fill_in_space_stats_impl(features)
+        cached_schema_version = schema_version
+        cached_feedback_features = features
+    end
+    feedback.features = cached_feedback_features
+end
+
+local function fill_in_features(feedback)
+    fill_in_space_stats(feedback)
+end
+
+local function fill_in_feedback(feedback)
+    fill_in_base_info(feedback)
+    fill_in_platform_info(feedback)
+    fill_in_features(feedback)
+
     return feedback
 end
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-17  8:59         ` Илья Конюхов
@ 2020-06-17 22:53           ` Vladislav Shpilevoy
  2020-06-18 15:42             ` Илья Конюхов
  0 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-17 22:53 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches

Hi! Thanks for the fixes!

> I’ve worked on your comments and fixed lapses. See comments below.
> 
> Besides that, are you ok with pushing this (Lua) patch forward?

I am ok.

>>> +local function fill_in_features_impl(features)
>>> +    fill_in_space_stats(features)
>>> +end
>>> +
>>> +local cached_schema_version = 0
>>> +local cached_feedback_features = {}
>>
>> 4. I would better move the cache handling into fill_in_space_stats().
>> Because when you will add more features, not related to the schema,
>> they won't relate to schema version.
>>
>> fill_in_features() should not use any caches. Does not depend on schema.
>> fill_in_space_stats() can use the cache. Because fetches info from the
>> schema.
>>
>> I mean, it works now, but we would need to rewrite that mostly, when
>> more features will be collected.
> 
> Agree. Extracted out caching into fill_in_space_stats(), but kept it in fill_in_features(), since space_stats is semantically features.

This is a little bit better but basically is still the same. Just add
a few dummy functions like

	fill_in_swim()
	fill_in_box_update()
	fill_in_vinyl_stat()
	fill_in_netbox()

which would need to add several new fields to the feedback.features table,
and see how hard it is to fit it into this code without rewriting it
completely. Ideally addition of such new functions should not change a
single line of the existing code.

Perhaps you may need to store schema features in feedback.features.schema,
not in feedback.features. So the schema becomes isolated from the other
features. And store the others like feedback.features.swim,
feedback.features.box_update, feedback.features.vinyl, etc. Not in a huge
flat table.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-17 22:53           ` Vladislav Shpilevoy
@ 2020-06-18 15:42             ` Илья Конюхов
  2020-06-18 23:02               ` Vladislav Shpilevoy
  0 siblings, 1 reply; 23+ messages in thread
From: Илья Конюхов @ 2020-06-18 15:42 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi,

I’ve refactored code a bit according to your comment. See below.

> On 18 Jun 2020, at 01:53, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the fixes!
> 
>> I’ve worked on your comments and fixed lapses. See comments below.
>> 
>> Besides that, are you ok with pushing this (Lua) patch forward?
> 
> I am ok.
> 
>>>> +local function fill_in_features_impl(features)
>>>> +    fill_in_space_stats(features)
>>>> +end
>>>> +
>>>> +local cached_schema_version = 0
>>>> +local cached_feedback_features = {}
>>> 
>>> 4. I would better move the cache handling into fill_in_space_stats().
>>> Because when you will add more features, not related to the schema,
>>> they won't relate to schema version.
>>> 
>>> fill_in_features() should not use any caches. Does not depend on schema.
>>> fill_in_space_stats() can use the cache. Because fetches info from the
>>> schema.
>>> 
>>> I mean, it works now, but we would need to rewrite that mostly, when
>>> more features will be collected.
>> 
>> Agree. Extracted out caching into fill_in_space_stats(), but kept it in fill_in_features(), since space_stats is semantically features.
> 
> This is a little bit better but basically is still the same. Just add
> a few dummy functions like
> 
> 	fill_in_swim()
> 	fill_in_box_update()
> 	fill_in_vinyl_stat()
> 	fill_in_netbox()
> 
> which would need to add several new fields to the feedback.features table,
> and see how hard it is to fit it into this code without rewriting it
> completely. Ideally addition of such new functions should not change a
> single line of the existing code.
> 
> Perhaps you may need to store schema features in feedback.features.schema,
> not in feedback.features. So the schema becomes isolated from the other
> features. And store the others like feedback.features.swim,
> feedback.features.box_update, feedback.features.vinyl, etc. Not in a huge
> flat table.

I’ve thought about it in the first run, but decided to stick to the flat approach. Now within a context of us adding more features into a report it seems more reasonable to me. Fixed that and put schema statistics into feedback.features.schema field.



diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index 111a70fca..ee58575a4 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -106,7 +106,7 @@ local function fill_in_indices_stats(space, stats)
     end
 end
 
-local function fill_in_space_stats_impl(features)
+local function fill_in_schema_stats_impl(schema)
     local spaces = {
         memtx     = 0,
         vinyl     = 0,
@@ -143,30 +143,31 @@ local function fill_in_space_stats_impl(features)
     end
 
     for k, v in pairs(spaces) do
-        features[k..'_spaces'] = v
+        schema[k..'_spaces'] = v
     end
 
     for k, v in pairs(indices) do
-        features[k..'_indices'] = v
+        schema[k..'_indices'] = v
     end
 end
 
 local cached_schema_version = 0
-local cached_feedback_features = {}
+local cached_schema_features = {}
 
-local function fill_in_space_stats(feedback)
+local function fill_in_schema_stats(features)
     local schema_version = box.internal.schema_version()
     if cached_schema_version < schema_version then
-        local features = {}
-        fill_in_space_stats_impl(features)
+        local schema = {}
+        fill_in_schema_stats_impl(schema)
         cached_schema_version = schema_version
-        cached_feedback_features = features
+        cached_schema_features = schema
     end
-    feedback.features = cached_feedback_features
+    features.schema = cached_schema_features
 end
 
 local function fill_in_features(feedback)
-    fill_in_space_stats(feedback)
+    feedback.features = {}
+    fill_in_schema_stats(feedback.features)
 end
 
 local function fill_in_feedback(feedback)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-18 15:42             ` Илья Конюхов
@ 2020-06-18 23:02               ` Vladislav Shpilevoy
  2020-06-19 14:01                 ` Илья Конюхов
  0 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-18 23:02 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches

Hi! Thanks for the fixes!

The patch LGTM except that its test is failing here:
https://travis-ci.org/github/tarantool/tarantool/jobs/699753951

Also it fails locally on my machine.

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box-tap/feedback_daemon.test.lua                                
[001] not ok 15 - failed subtests #  
[001] Traceback:
[001] [Lua ] function 'test' at <builtin/tap.lua:249>
[001] [main] at </Users/gerold/Work/Repositories/tarantool/test/box-tap/feedback_daemon.test.lua:0>
[001] 
[001] {'failed': 2, 'planned': 10}
[001] 
[001] not ok 22 - data is equal #  
[001] Expected: {"arch":"x64","features":{"schema":{"local_spaces":1,"functional_indices":0,"hash_indices":1,"rtree_indices":0,"temporary_spaces":1,"tree_indices":1,"jsonpath_indices":0,"vinyl_spaces":0,"bitset_indices":1,"memtx_spaces":1}},"server_id":"fb2c934e-e20e-44c3-824d-2ee854863569","is_docker":false,"os":"OSX","feedback_type":"version","cluster_id":"26f54eba-e463-4248-82ed-03b1e504ec82","tarantool_version":"2.5.0-148-g15a09c910","feedback_version":1}
[001] Got:      {"arch":"x64","features":{"schema":{"local_spaces":0,"functional_indices":0,"hash_indices":0,"rtree_indices":0,"temporary_spaces":0,"tree_indices":0,"jsonpath_indices":0,"vinyl_spaces":0,"bitset_indices":0,"memtx_spaces":0}},"server_id":"fb2c934e-e20e-44c3-824d-2ee854863569","is_docker":false,"os":"OSX","feedback_type":"version","cluster_id":"26f54eba-e463-4248-82ed-03b1e504ec82","tarantool_version":"2.5.0-148-g15a09c910","feedback_version":1}
[001] Traceback:
[001] [main] at </Users/gerold/Work/Repositories/tarantool/test/box-tap/feedback_daemon.test.lua:0>
[001] 
[001] Rejected result file: box-tap/feedback_daemon.reject
[001] [ fail ]
[Main process] Got failed test; gently terminate all workers...
[001] Worker "001_box-tap" got failed test; stopping the server...
---------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-18 23:02               ` Vladislav Shpilevoy
@ 2020-06-19 14:01                 ` Илья Конюхов
  2020-06-19 23:49                   ` Vladislav Shpilevoy
  0 siblings, 1 reply; 23+ messages in thread
From: Илья Конюхов @ 2020-06-19 14:01 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi,

Got it. That was a flaky test, which was failed on busy host. Fixed by gathering feedback synchronously.

> On 19 Jun 2020, at 02:02, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the fixes!
> 
> The patch LGTM except that its test is failing here:
> https://travis-ci.org/github/tarantool/tarantool/jobs/699753951
> 
> Also it fails locally on my machine.
> 
> ======================================================================================
> WORKR TEST                                            PARAMS          RESULT
> ---------------------------------------------------------------------------------
> [001] box-tap/feedback_daemon.test.lua                                
> [001] not ok 15 - failed subtests #  
> [001] Traceback:
> [001] [Lua ] function 'test' at <builtin/tap.lua:249>
> [001] [main] at </Users/gerold/Work/Repositories/tarantool/test/box-tap/feedback_daemon.test.lua:0>
> [001] 
> [001] {'failed': 2, 'planned': 10}
> [001] 
> [001] not ok 22 - data is equal #  
> [001] Expected: {"arch":"x64","features":{"schema":{"local_spaces":1,"functional_indices":0,"hash_indices":1,"rtree_indices":0,"temporary_spaces":1,"tree_indices":1,"jsonpath_indices":0,"vinyl_spaces":0,"bitset_indices":1,"memtx_spaces":1}},"server_id":"fb2c934e-e20e-44c3-824d-2ee854863569","is_docker":false,"os":"OSX","feedback_type":"version","cluster_id":"26f54eba-e463-4248-82ed-03b1e504ec82","tarantool_version":"2.5.0-148-g15a09c910","feedback_version":1}
> [001] Got:      {"arch":"x64","features":{"schema":{"local_spaces":0,"functional_indices":0,"hash_indices":0,"rtree_indices":0,"temporary_spaces":0,"tree_indices":0,"jsonpath_indices":0,"vinyl_spaces":0,"bitset_indices":0,"memtx_spaces":0}},"server_id":"fb2c934e-e20e-44c3-824d-2ee854863569","is_docker":false,"os":"OSX","feedback_type":"version","cluster_id":"26f54eba-e463-4248-82ed-03b1e504ec82","tarantool_version":"2.5.0-148-g15a09c910","feedback_version":1}
> [001] Traceback:
> [001] [main] at </Users/gerold/Work/Repositories/tarantool/test/box-tap/feedback_daemon.test.lua:0>
> [001] 
> [001] Rejected result file: box-tap/feedback_daemon.reject
> [001] [ fail ]
> [Main process] Got failed test; gently terminate all workers...
> [001] Worker "001_box-tap" got failed test; stopping the server...
> ————————————————————————————————————————

diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index d5c0eacab..1fd5778b4 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -67,7 +67,7 @@ if not ok then
     os.exit(0)
 end
 
-test:plan(22)
+test:plan(16)
 
 local function check(message)
     while feedback_count < 1 do
@@ -113,13 +113,32 @@ check("feedback after start")
 daemon.send_test()
 check("feedback after feedback send_test")
 
+daemon.stop()
+
+actual = daemon.generate_feedback()
+box.feedback.save("feedback.json")
+local fio = require("fio")
+local fh = fio.open("feedback.json")
+test:ok(fh, "file is created")
+local file_data = fh:read()
+test:is(file_data, json.encode(actual), "data is equal")
+fh:close()
+fio.unlink("feedback.json")
+
+server:close()
+-- check it does not fail without server
+daemon = box.internal.feedback_daemon
+daemon.start()
+daemon.send_test()
+daemon.stop()
+
 --
 -- gh-4943: Collect engines and indices statistics.
 --
 
-local feedback_json = json.decode(feedback_save)
-test:is(type(feedback_json.features), 'table', 'features field is present')
-test:is(type(feedback_json.features.schema), 'table', 'schema stats are present')
+local actual = daemon.generate_feedback()
+test:is(type(actual.features), 'table', 'features field is present')
+test:is(type(actual.features.schema), 'table', 'schema stats are present')
 local expected = {
     memtx_spaces = 0,
     vinyl_spaces = 0,
@@ -132,7 +151,7 @@ local expected = {
     jsonpath_indices = 0,
     functional_indices = 0,
 }
-test:is_deeply(feedback_json.features.schema, expected,
+test:is_deeply(actual.features.schema, expected,
         'schema stats are empty at the moment')
 
 box.schema.create_space('features_vinyl', {engine = 'vinyl'})
@@ -154,12 +173,8 @@ box.schema.func.create('features_func', {
 box.space.features_memtx:create_index('j',
         {parts = {{field = 1, type = 'number'}}, func = 'features_func'})
 
-check('old feedback received')
-feedback_reset()
-check('feedback with db features received')
-
-feedback_json = json.decode(feedback_save)
-local schema_stats = feedback_json.features.schema
+actual = daemon.generate_feedback()
+local schema_stats = actual.features.schema
 test:test('features.schema', function(t)
     t:plan(10)
     t:is(schema_stats.memtx_spaces, 2, 'memtx engine usage gathered')
@@ -176,38 +191,13 @@ end)
 
 box.space.features_memtx:create_index('memtx_sec', {type = 'hash'})
 
-check('old feedback received')
-feedback_reset()
-check('feedback with new db features received')
-feedback_json = json.decode(feedback_save)
-test:is(feedback_json.features.schema.hash_indices, 2,
+actual = daemon.generate_feedback()
+test:is(actual.features.schema.hash_indices, 2,
         'internal cache invalidates when schema changes')
 
 box.space.features_vinyl:drop()
 box.space.features_memtx_empty:drop()
 box.space.features_memtx:drop()
 
-check('old feedback received')
-feedback_reset()
-check('feedback with new stats received')
-
-daemon.stop()
-
-box.feedback.save("feedback.json")
-local fio = require("fio")
-local fh = fio.open("feedback.json")
-test:ok(fh, "file is created")
-local file_data = fh:read()
-test:is(file_data, feedback_save, "data is equal")
-fh:close()
-fio.unlink("feedback.json")
-
-server:close()
--- check it does not fail without server
-local daemon = box.internal.feedback_daemon
-daemon.start()
-daemon.send_test()
-daemon.stop()
-
 test:check()
 os.exit(0)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-19 14:01                 ` Илья Конюхов
@ 2020-06-19 23:49                   ` Vladislav Shpilevoy
  2020-06-22  8:55                     ` Илья Конюхов
  0 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-19 23:49 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches

> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
> index d4adb71f1..1fd5778b4 100755
> --- a/test/box-tap/feedback_daemon.test.lua
> +++ b/test/box-tap/feedback_daemon.test.lua
> @@ -115,21 +115,89 @@ check("feedback after feedback send_test")
>  
>  daemon.stop()
>  
> +actual = daemon.generate_feedback()
>  box.feedback.save("feedback.json")
>  local fio = require("fio")
>  local fh = fio.open("feedback.json")
>  test:ok(fh, "file is created")
>  local file_data = fh:read()
> -test:is(file_data, feedback_save, "data is equal")
> +test:is(file_data, json.encode(actual), "data is equal")
>  fh:close()
>  fio.unlink("feedback.json")
>  
>  server:close()
>  -- check it does not fail without server
> -local daemon = box.internal.feedback_daemon
> +daemon = box.internal.feedback_daemon

This diff hunk is not related to the patch. I reverted it
and the test passed.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-19 23:49                   ` Vladislav Shpilevoy
@ 2020-06-22  8:55                     ` Илья Конюхов
  0 siblings, 0 replies; 23+ messages in thread
From: Илья Конюхов @ 2020-06-22  8:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi,

I’ve removed unrelated hunk, thanks for noticing.

> On 20 Jun 2020, at 02:49, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
>> diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
>> index d4adb71f1..1fd5778b4 100755
>> --- a/test/box-tap/feedback_daemon.test.lua
>> +++ b/test/box-tap/feedback_daemon.test.lua
>> @@ -115,21 +115,89 @@ check("feedback after feedback send_test")
>> 
>> daemon.stop()
>> 
>> +actual = daemon.generate_feedback()
>> box.feedback.save("feedback.json")
>> local fio = require("fio")
>> local fh = fio.open("feedback.json")
>> test:ok(fh, "file is created")
>> local file_data = fh:read()
>> -test:is(file_data, feedback_save, "data is equal")
>> +test:is(file_data, json.encode(actual), "data is equal")
>> fh:close()
>> fio.unlink("feedback.json")
>> 
>> server:close()
>> -- check it does not fail without server
>> -local daemon = box.internal.feedback_daemon
>> +daemon = box.internal.feedback_daemon
> 
> This diff hunk is not related to the patch. I reverted it
> and the test passed.
> 

diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index 1fd5778b4..330442f2e 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -115,19 +115,18 @@ check("feedback after feedback send_test")
 
 daemon.stop()
 
-actual = daemon.generate_feedback()
 box.feedback.save("feedback.json")
 local fio = require("fio")
 local fh = fio.open("feedback.json")
 test:ok(fh, "file is created")
 local file_data = fh:read()
-test:is(file_data, json.encode(actual), "data is equal")
+test:is(file_data, feedback_save, "data is equal")
 fh:close()
 fio.unlink("feedback.json")
 
 server:close()
 -- check it does not fail without server
-daemon = box.internal.feedback_daemon
+local daemon = box.internal.feedback_daemon
 daemon.start()
 daemon.send_test()
 daemon.stop()

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
  2020-06-07 16:45   ` Vladislav Shpilevoy
@ 2020-07-01  0:15   ` Alexander Turenko
  2020-07-03 12:05     ` Илья Конюхов
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander Turenko @ 2020-07-01  0:15 UTC (permalink / raw)
  To: Ilya Konyukhov; +Cc: Oleg Babin, tarantool-patches, v.shpilevoy

(CCed all persons who is mentioned below.)

You all know it, but I cannot keep myself from saying again that I'm not
fan on the idea of the built-in/enabled-by-default reporting tool.
Especially when it is desined in the way that a user don't have a
guarantee that its IP address is not collected. It is not in the
opensource spirit and is not kindly to our users (to ones who care about
privacy).

I understand why the organization need it: it give some view on
community -- what is useful and what is not. It may help when we, say,
prioritize problems. But this fact does not change anything: garthering
IP addresses and some information without expicit agreement does not
become a friendly action.

And while we're on the topic, there was the idea from Mons that it would
feel more comfortable if we would offer something useful for a user
together with the reporting tool: say, provide information about
critical / security problems in a running version if it is affected. Or
feed news about important updates in functionality that is relevant for
a particular application. I would add from myself that this way the
reporting tool not even necessarily must be enabled default to be useful
for us: there is a motivation to enable it. Win-win. Collect only from
ones who explicitly said that is okay with it.

We also can structurize and improve our documentation and see what
topics are more popular and what are less. Collect download statistics
from repositories, rocks and git clones. Here a user explicitly asked an
action with a remote server and (s)he most likely know that the remote
server may count it.

Pasted actual version to comment it inline. Nothing critical in fact.

WBR, Alexander Turenko.

>       "feedback_type": "version",

Was it planned to report several maps with different types? If we
decided to report everything within one map, then this 'feedback_type'
field has no sense anymore and just confusing, right? If so, then maybe
it is the time to remove it and bump 'feedback_version'?

> +local function is_system_space(sp)
> +    return box.schema.SYSTEM_ID_MIN <= sp.id and
> +           sp.id <= box.schema.SYSTEM_ID_MAX
> +end

Nit: a space object is named 'space' below, but 'sp' above. I agree with
Max and generally prefer to avoid abbreviations.

> +local function is_jsonpath_index(idx)
> +    for _, part in pairs(idx.parts) do
> +        if part.path ~= nil then
> +            return true
> +        end
> +    end
> +
> +    return false
> +end

I agree with Oleg's point: it worth to count multikey indices too. I
don't know much, but what I understood from the commit message [1]:

- Any multikey index is JSON path index.
- A multikey index has '[*]' in 'path'.

Looks quite simple to count if I don't miss anything. Please, verify
and/or ask about it someone who knows more.

[1]: https://github.com/tarantool/tarantool/commit/f1d9f2575c11afcf3b0368d38f44ae399cd7e045

> +local function fill_in_schema_stats_impl(schema)
> +    local spaces = {
> +        memtx     = 0,
> +        vinyl     = 0,
> +        temporary = 0,
> +        ['local'] = 0,
> +    }
> +
> +    local indices = {
> +        hash       = 0,
> +        tree       = 0,
> +        rtree      = 0,
> +        bitset     = 0,
> +        jsonpath   = 0,
> +        functional = 0,
> +    }
> +
> +    for name, space in pairs(box.space) do
> +        local is_system = is_system_space(space)
> +        if not is_system and type(name) == 'number' then
> +            if space.engine == 'vinyl' then
> +                spaces.vinyl = spaces.vinyl + 1
> +            elseif space.engine == 'memtx' then
> +                if space.temporary then
> +                    spaces.temporary = spaces.temporary + 1
> +                end
> +                spaces.memtx = spaces.memtx + 1
> +            end
> +            if space.is_local then
> +                spaces['local'] = spaces['local'] + 1
> +            end
> +            fill_in_indices_stats(space, indices)
> +        end
> +        fiber.yield()

What if DDL (say, a space drop) will occur here? I think we should check
schema version right after exiting from yield and skip / repeat
collecting statistics if the schema version is changed.

I don't sure that iteration over `box.space` over DDL is safe -- what
if a current space is dropped? Is the behaviour will be like the
following?

 | tarantool> t = {a = 1, b = 2}
 | ---
 | ...
 | tarantool> for k, v in pairs(t) do if k == 'a' then t['a'] = nil end end
 | ---
 | ...

So, let's check it within the same iteration of the loop: just after
exiting from the yield.

> +    end
> +
> +    for k, v in pairs(spaces) do
> +        schema[k..'_spaces'] = v
> +    end
> +
> +    for k, v in pairs(indices) do
> +        schema[k..'_indices'] = v
> +    end

Nit: We collect those metrics into one structure and then transform it
into another one. Is one better and another worse? If so, it seems
logical to use one.

I was triggered on this, because in fact it is using of different names
for the same things: it usually does not improve code readability.

Anyway, if you have understanding why nested structuring is good in the
code, but bad in a report, proceed with your way. I don't have a strong
preference here (but I'm sympathetic to the nesting way).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-07-01  0:15   ` Alexander Turenko
@ 2020-07-03 12:05     ` Илья Конюхов
  2020-07-05  2:10       ` Alexander Turenko
  0 siblings, 1 reply; 23+ messages in thread
From: Илья Конюхов @ 2020-07-03 12:05 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: Oleg Babin, tarantool-patches, Vladislav Shpilevoy

Hi,

Thanks for your review.

You also stated your concerns about the fact that user's IP may be tracked and it’s not good practice in open source community. It sounds valid to me. But I don’t understand how this patch is related to these concerns. It does not add IP tracking feature, nor the first feedback daemon version does. This problem should have been arisen when adding feedback daemon in the first place. If it happen to be noticed now, after feedback daemon have been merged, a possibility of ip tracking may be considered as a bug. Personally, I don’t have an idea, how to solve it in tarantool. It must be some kind of trusted external service which guarantees to hide original IP address, but I don’t know such yet. I believe the best way is to start a discussion in GitHub issues and gather peoples opinions on this.

Another question you bringing up is the design of feedback tool. You are saying we may want to rethink goals of the feedback tool (you are naming it a reporting tool) so a user can benefit from it too. This is fine idea but it is also a more broader topic to discuss and goes beyond of this patch changes. I don’t see this patch somehow interferes with you ideas.


As for another comments, see my answers below:


> On 1 Jul 2020, at 10:15, Alexander Turenko <alexander.turenko@tarantool.org> wrote:
> 
> (CCed all persons who is mentioned below.)
> 
> You all know it, but I cannot keep myself from saying again that I'm not
> fan on the idea of the built-in/enabled-by-default reporting tool.
> Especially when it is desined in the way that a user don't have a
> guarantee that its IP address is not collected. It is not in the
> opensource spirit and is not kindly to our users (to ones who care about
> privacy).
> 
> I understand why the organization need it: it give some view on
> community -- what is useful and what is not. It may help when we, say,
> prioritize problems. But this fact does not change anything: garthering
> IP addresses and some information without expicit agreement does not
> become a friendly action.
> 
> And while we're on the topic, there was the idea from Mons that it would
> feel more comfortable if we would offer something useful for a user
> together with the reporting tool: say, provide information about
> critical / security problems in a running version if it is affected. Or
> feed news about important updates in functionality that is relevant for
> a particular application. I would add from myself that this way the
> reporting tool not even necessarily must be enabled default to be useful
> for us: there is a motivation to enable it. Win-win. Collect only from
> ones who explicitly said that is okay with it.
> 
> We also can structurize and improve our documentation and see what
> topics are more popular and what are less. Collect download statistics
> from repositories, rocks and git clones. Here a user explicitly asked an
> action with a remote server and (s)he most likely know that the remote
> server may count it.
> 
> Pasted actual version to comment it inline. Nothing critical in fact.
> 
> WBR, Alexander Turenko.
> 
>>      "feedback_type": "version",
> 
> Was it planned to report several maps with different types? If we
> decided to report everything within one map, then this 'feedback_type'
> field has no sense anymore and just confusing, right? If so, then maybe
> it is the time to remove it and bump 'feedback_version'?

“feedback_type” was added as a possibility to split feedback into different launches. I.e. collect simple statistics once an hour, but “heavy” stats only once a day. This patch adds some stats collection logic, but it may only be heavy for rare corner cases. It also adds caching and yielding in order to avoid affecting performance, when feedback daemon wakes up. So I decided to not split it into different feedbacks yet. However, I agree on that “version” type became obsolete with new changes, so I removed it.

> 
>> +local function is_system_space(sp)
>> +    return box.schema.SYSTEM_ID_MIN <= sp.id and
>> +           sp.id <= box.schema.SYSTEM_ID_MAX
>> +end
> 
> Nit: a space object is named 'space' below, but 'sp' above. I agree with
> Max and generally prefer to avoid abbreviations.

Okay, since you both agree on this, I made it more verbose.

> 
>> +local function is_jsonpath_index(idx)
>> +    for _, part in pairs(idx.parts) do
>> +        if part.path ~= nil then
>> +            return true
>> +        end
>> +    end
>> +
>> +    return false
>> +end
> 
> I agree with Oleg's point: it worth to count multikey indices too. I
> don't know much, but what I understood from the commit message [1]:
> 
> - Any multikey index is JSON path index.
> - A multikey index has '[*]' in 'path'.
> 
> Looks quite simple to count if I don't miss anything. Please, verify
> and/or ask about it someone who knows more.
> 
> [1]: https://github.com/tarantool/tarantool/commit/f1d9f2575c11afcf3b0368d38f44ae399cd7e045

I've checked the code and realize, that multikey path is only valid if it contains “[*]” substring. So I've added a new metric for jsonpath multikey indices, which includes also counts as jsonpath index. Similarly, I added functional multikey index, which check if the function has "is_multikey = true” option.

> 
>> +local function fill_in_schema_stats_impl(schema)
>> +    local spaces = {
>> +        memtx     = 0,
>> +        vinyl     = 0,
>> +        temporary = 0,
>> +        ['local'] = 0,
>> +    }
>> +
>> +    local indices = {
>> +        hash       = 0,
>> +        tree       = 0,
>> +        rtree      = 0,
>> +        bitset     = 0,
>> +        jsonpath   = 0,
>> +        functional = 0,
>> +    }
>> +
>> +    for name, space in pairs(box.space) do
>> +        local is_system = is_system_space(space)
>> +        if not is_system and type(name) == 'number' then
>> +            if space.engine == 'vinyl' then
>> +                spaces.vinyl = spaces.vinyl + 1
>> +            elseif space.engine == 'memtx' then
>> +                if space.temporary then
>> +                    spaces.temporary = spaces.temporary + 1
>> +                end
>> +                spaces.memtx = spaces.memtx + 1
>> +            end
>> +            if space.is_local then
>> +                spaces['local'] = spaces['local'] + 1
>> +            end
>> +            fill_in_indices_stats(space, indices)
>> +        end
>> +        fiber.yield()
> 
> What if DDL (say, a space drop) will occur here? I think we should check
> schema version right after exiting from yield and skip / repeat
> collecting statistics if the schema version is changed.
> 
> I don't sure that iteration over `box.space` over DDL is safe -- what
> if a current space is dropped? Is the behaviour will be like the
> following?
> 
> | tarantool> t = {a = 1, b = 2}
> | ---
> | ...
> | tarantool> for k, v in pairs(t) do if k == 'a' then t['a'] = nil end end
> | ---
> | ...
> 
> So, let's check it within the same iteration of the loop: just after
> exiting from the yield.

Good point. Your suggestion has some pros, but I decided to solve this problem in other, imo more robust way. I collect all space into a table and then iterating over this table. If space is not found, just skip it. I accept the fact, that there is a little chance to have a bit inaccurate feedback numbers for spaces and indices, but more stable approach.

> 
>> +    end
>> +
>> +    for k, v in pairs(spaces) do
>> +        schema[k..'_spaces'] = v
>> +    end
>> +
>> +    for k, v in pairs(indices) do
>> +        schema[k..'_indices'] = v
>> +    end
> 
> Nit: We collect those metrics into one structure and then transform it
> into another one. Is one better and another worse? If so, it seems
> logical to use one.
> 
> I was triggered on this, because in fact it is using of different names
> for the same things: it usually does not improve code readability.
> 
> Anyway, if you have understanding why nested structuring is good in the
> code, but bad in a report, proceed with your way. I don't have a strong
> preference here (but I'm sympathetic to the nesting way).

I think this way is simpler. It’s not only readable but also more usable. This hunk simply merges two tables into one container which stores feedback schema statistics. These two tables (spaces and indices) are isolated, so name clashes won’t happen. Also if I defined final keys at the beginning, it would contain lines like "spaces.memtx_spaces = spaces.memtx_spaces + 1” and would look cumbersome as a whole.

---

diff --git a/src/box/lua/feedback_daemon.lua b/src/box/lua/feedback_daemon.lua
index ee58575a4..64fd71f86 100644
--- a/src/box/lua/feedback_daemon.lua
+++ b/src/box/lua/feedback_daemon.lua
@@ -50,14 +50,30 @@ local function detect_docker_environment()
     return cached_detect_docker_env
 end
 
-local function is_system_space(sp)
-    return box.schema.SYSTEM_ID_MIN <= sp.id and
-           sp.id <= box.schema.SYSTEM_ID_MAX
+local function is_system_space(space)
+    return box.schema.SYSTEM_ID_MIN <= space.id and
+           space.id <= box.schema.SYSTEM_ID_MAX
 end
 
-local function is_jsonpath_index(idx)
+local function jsonpaths_from_idx_parts(idx)
+    local paths = {}
+
     for _, part in pairs(idx.parts) do
-        if part.path ~= nil then
+        if type(part.path) == 'string' then
+            table.insert(paths, part.path)
+        end
+    end
+
+    return paths
+end
+
+local function is_jsonpath_index(idx)
+    return #jsonpaths_from_idx_parts(idx) > 0
+end
+
+local function is_jp_multikey_index(idx)
+    for _, path in pairs(jsonpaths_from_idx_parts(idx)) do
+        if path:find('[*]', 1, true) then
             return true
         end
     end
@@ -69,6 +85,16 @@ local function is_functional_index(idx)
     return idx.func ~= nil
 end
 
+local function is_func_multikey_index(idx)
+    if is_functional_index(idx) then
+        local fid = idx.func.fid
+        local func = fid and box.func[fid]
+        return func and func.is_multikey or false
+    end
+
+    return false
+end
+
 local function fill_in_base_info(feedback)
     if box.info.status ~= "running" then
         return nil, "not running"
@@ -91,8 +117,14 @@ local function fill_in_indices_stats(space, stats)
             if idx_type == 'TREE' then
                 if is_functional_index(idx) then
                     stats.functional = stats.functional + 1
+                    if is_func_multikey_index(idx) then
+                        stats.functional_multikey = stats.functional_multikey + 1
+                    end
                 elseif is_jsonpath_index(idx) then
                     stats.jsonpath = stats.jsonpath + 1
+                    if is_jp_multikey_index(idx) then
+                        stats.jsonpath_multikey = stats.jsonpath_multikey + 1
+                    end
                 end
                 stats.tree = stats.tree + 1
             elseif idx_type == 'HASH' then
@@ -115,31 +147,45 @@ local function fill_in_schema_stats_impl(schema)
     }
 
     local indices = {
-        hash       = 0,
-        tree       = 0,
-        rtree      = 0,
-        bitset     = 0,
-        jsonpath   = 0,
-        functional = 0,
+        hash                = 0,
+        tree                = 0,
+        rtree               = 0,
+        bitset              = 0,
+        jsonpath            = 0,
+        jsonpath_multikey   = 0,
+        functional          = 0,
+        functional_multikey = 0,
     }
 
+    local space_ids = {}
     for name, space in pairs(box.space) do
         local is_system = is_system_space(space)
         if not is_system and type(name) == 'number' then
-            if space.engine == 'vinyl' then
-                spaces.vinyl = spaces.vinyl + 1
-            elseif space.engine == 'memtx' then
-                if space.temporary then
-                    spaces.temporary = spaces.temporary + 1
-                end
-                spaces.memtx = spaces.memtx + 1
-            end
-            if space.is_local then
-                spaces['local'] = spaces['local'] + 1
+            table.insert(space_ids, name)
+        end
+    end
+
+    for _, id in pairs(space_ids) do
+        local space = box.space[id]
+        if space == nil then
+            goto continue;
+        end
+
+        if space.engine == 'vinyl' then
+            spaces.vinyl = spaces.vinyl + 1
+        elseif space.engine == 'memtx' then
+            if space.temporary then
+                spaces.temporary = spaces.temporary + 1
             end
-            fill_in_indices_stats(space, indices)
+            spaces.memtx = spaces.memtx + 1
+        end
+        if space.is_local then
+            spaces['local'] = spaces['local'] + 1
         end
+        fill_in_indices_stats(space, indices)
+
         fiber.yield()
+        ::continue::
     end
 
     for k, v in pairs(spaces) do
@@ -255,7 +301,7 @@ setmetatable(daemon, {
         end,
         -- this function is used in saving feedback in file
         generate_feedback = function()
-            return fill_in_feedback({ feedback_type = "version", feedback_version = 1 })
+            return fill_in_feedback({ feedback_version = 2 })
         end,
         start = function()
             start(daemon)
diff --git a/test/box-tap/feedback_daemon.test.lua b/test/box-tap/feedback_daemon.test.lua
index 330442f2e..1606af96d 100755
--- a/test/box-tap/feedback_daemon.test.lua
+++ b/test/box-tap/feedback_daemon.test.lua
@@ -148,7 +148,9 @@ local expected = {
     hash_indices = 0,
     bitset_indices = 0,
     jsonpath_indices = 0,
+    jsonpath_multikey_indices = 0,
     functional_indices = 0,
+    functional_multikey_indices = 0,
 }
 test:is_deeply(actual.features.schema, expected,
         'schema stats are empty at the moment')
@@ -165,27 +167,38 @@ box.space.features_memtx:create_index('memtx_rtree',
         {type = 'rtree', parts = {{field = 3, type = 'array'}}})
 box.space.features_memtx:create_index('memtx_jpath',
         {parts = {{field = 4, type = 'str', path = 'data.name'}}})
+box.space.features_memtx:create_index('memtx_multikey',
+        {parts = {{field = 5, type = 'str', path = 'data[*].name'}}})
 box.schema.func.create('features_func', {
     body = "function(tuple) return {string.sub(tuple[2], 1, 1)} end",
     is_deterministic = true,
     is_sandboxed = true})
-box.space.features_memtx:create_index('j',
+box.schema.func.create('features_func_multikey', {
+    body = "function(tuple) return {1, 2} end",
+    is_deterministic = true,
+    is_sandboxed = true,
+    opts = {is_multikey = true}})
+box.space.features_memtx:create_index('functional',
         {parts = {{field = 1, type = 'number'}}, func = 'features_func'})
+box.space.features_memtx:create_index('functional_multikey',
+        {parts = {{field = 1, type = 'number'}}, func = 'features_func_multikey'})
 
 actual = daemon.generate_feedback()
 local schema_stats = actual.features.schema
 test:test('features.schema', function(t)
-    t:plan(10)
+    t:plan(12)
     t:is(schema_stats.memtx_spaces, 2, 'memtx engine usage gathered')
     t:is(schema_stats.vinyl_spaces, 1, 'vinyl engine usage gathered')
     t:is(schema_stats.temporary_spaces, 1, 'temporary space usage gathered')
     t:is(schema_stats.local_spaces, 1, 'local space usage gathered')
-    t:is(schema_stats.tree_indices, 4, 'tree index gathered')
+    t:is(schema_stats.tree_indices, 6, 'tree index gathered')
     t:is(schema_stats.hash_indices, 1, 'hash index gathered')
     t:is(schema_stats.rtree_indices, 1, 'rtree index gathered')
     t:is(schema_stats.bitset_indices, 1, 'bitset index gathered')
-    t:is(schema_stats.jsonpath_indices, 1, 'jsonpath index gathered')
-    t:is(schema_stats.functional_indices, 1, 'functional index gathered')
+    t:is(schema_stats.jsonpath_indices, 2, 'jsonpath index gathered')
+    t:is(schema_stats.jsonpath_multikey_indices, 1, 'jsonpath multikey index gathered')
+    t:is(schema_stats.functional_indices, 2, 'functional index gathered')
+    t:is(schema_stats.functional_multikey_indices, 1, 'functional multikey index gathered')
 end)
 
 box.space.features_memtx:create_index('memtx_sec', {type = 'hash'})

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features
  2020-07-03 12:05     ` Илья Конюхов
@ 2020-07-05  2:10       ` Alexander Turenko
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Turenko @ 2020-07-05  2:10 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: Oleg Babin, tarantool-patches, Vladislav Shpilevoy

LGTM.

> > You all know it, but I cannot keep myself from saying again that I'm not
> > fan on the idea of the built-in/enabled-by-default reporting tool.
> > Especially when it is desined in the way that a user don't have a
> > guarantee that its IP address is not collected. It is not in the
> > opensource spirit and is not kindly to our users (to ones who care about
> > privacy).
> > 
> > I understand why the organization need it: it give some view on
> > community -- what is useful and what is not. It may help when we, say,
> > prioritize problems. But this fact does not change anything: garthering
> > IP addresses and some information without expicit agreement does not
> > become a friendly action.
> > 
> > And while we're on the topic, there was the idea from Mons that it would
> > feel more comfortable if we would offer something useful for a user
> > together with the reporting tool: say, provide information about
> > critical / security problems in a running version if it is affected. Or
> > feed news about important updates in functionality that is relevant for
> > a particular application. I would add from myself that this way the
> > reporting tool not even necessarily must be enabled default to be useful
> > for us: there is a motivation to enable it. Win-win. Collect only from
> > ones who explicitly said that is okay with it.
> > 
> > We also can structurize and improve our documentation and see what
> > topics are more popular and what are less. Collect download statistics
> > from repositories, rocks and git clones. Here a user explicitly asked an
> > action with a remote server and (s)he most likely know that the remote
> > server may count it.
>
> You also stated your concerns about the fact that user's IP may be
> tracked and it’s not good practice in open source community. It sounds
> valid to me. But I don’t understand how this patch is related to these
> concerns. It does not add IP tracking feature, nor the first feedback
> daemon version does. This problem should have been arisen when adding
> feedback daemon in the first place. If it happen to be noticed now,
> after feedback daemon have been merged, a possibility of ip tracking
> may be considered as a bug. Personally, I don’t have an idea, how to
> solve it in tarantool. It must be some kind of trusted external
> service which guarantees to hide original IP address, but I don’t know
> such yet. I believe the best way is to start a discussion in GitHub
> issues and gather peoples opinions on this.
> 
> Another question you bringing up is the design of feedback tool. You
> are saying we may want to rethink goals of the feedback tool (you are
> naming it a reporting tool) so a user can benefit from it too. This is
> fine idea but it is also a more broader topic to discuss and goes
> beyond of this patch changes. I don’t see this patch somehow
> interferes with you ideas.

Of course, it is not about your patch. Just grumbling around the
feedback daemon.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] Extend feedback module report
  2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
  2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
@ 2020-06-23 21:23 ` Vladislav Shpilevoy
  2020-07-13 13:47 ` Kirill Yukhin
  3 siblings, 0 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-23 21:23 UTC (permalink / raw)
  To: Ilya Konyukhov, tarantool-patches; +Cc: alexander.turenko

LGTM.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] Extend feedback module report
  2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
                   ` (2 preceding siblings ...)
  2020-06-23 21:23 ` [Tarantool-patches] [PATCH 0/2] Extend feedback module report Vladislav Shpilevoy
@ 2020-07-13 13:47 ` Kirill Yukhin
  3 siblings, 0 replies; 23+ messages in thread
From: Kirill Yukhin @ 2020-07-13 13:47 UTC (permalink / raw)
  To: Ilya Konyukhov; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko

Hello,

On 05 июн 11:35, Ilya Konyukhov wrote:
> Following patchset adds more information about running
> tarantool instance. Firstly, it collects platform information
> such as os, arch and docker environment. Secondly, it looks
> through box user spaces and collects info about db engines and
> indices.
> 
> Related issues:
> - https://github.com/tarantool/tarantool/issues/3608
> - https://github.com/tarantool/tarantool/issues/4943
> 
> PR:
> - https://github.com/tarantool/tarantool/pull/5039
> 
> Branch:
> - RunsFor:runsfor/gh-3608-platform-detection-for-feedback

I've checked your patchset into master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-07-13 13:47 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-05  8:35 [Tarantool-patches] [PATCH 0/2] Extend feedback module report Ilya Konyukhov
2020-06-05  8:35 ` [Tarantool-patches] [PATCH 1/2] feedback: determine runtime platform info Ilya Konyukhov
2020-06-07 16:45   ` Vladislav Shpilevoy
2020-06-09 23:05     ` Илья Конюхов
2020-06-11 19:32       ` Vladislav Shpilevoy
2020-07-01  0:16       ` Alexander Turenko
2020-07-05  2:14         ` Alexander Turenko
2020-06-05  8:35 ` [Tarantool-patches] [PATCH 2/2] feedback: collect db engines and index features Ilya Konyukhov
2020-06-07 16:45   ` Vladislav Shpilevoy
2020-06-09 23:06     ` Илья Конюхов
2020-06-11 19:32       ` Vladislav Shpilevoy
2020-06-17  8:59         ` Илья Конюхов
2020-06-17 22:53           ` Vladislav Shpilevoy
2020-06-18 15:42             ` Илья Конюхов
2020-06-18 23:02               ` Vladislav Shpilevoy
2020-06-19 14:01                 ` Илья Конюхов
2020-06-19 23:49                   ` Vladislav Shpilevoy
2020-06-22  8:55                     ` Илья Конюхов
2020-07-01  0:15   ` Alexander Turenko
2020-07-03 12:05     ` Илья Конюхов
2020-07-05  2:10       ` Alexander Turenko
2020-06-23 21:23 ` [Tarantool-patches] [PATCH 0/2] Extend feedback module report Vladislav Shpilevoy
2020-07-13 13:47 ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox