Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Bronnikov <sergeyb@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] serializer: serialize recursive structures
Date: Wed, 21 Apr 2021 21:20:31 +0300	[thread overview]
Message-ID: <YIBs7112r1thKf7f@root> (raw)
In-Reply-To: <092bc107-40cd-2c58-7a90-236663551249@tarantool.org>

Hi!

Thanks for benchmarks!
I like them!

But I am tormented by doubts whether it is worth keeping it in a
semi-working state in our repository. Shouldn't we first introduce the
architecture of performance tests, and then add some new benchmarks?

I'm totally for the presence of such benchmarks in our repo, but I'm
afraid to make a real mess of things.

Also, I suppose that it should be a collegeal descicion (not only 2
reviewers).

On 21.04.21, Sergey Bronnikov wrote:
> 
> On 21.04.2021 12:27, Sergey Kaplun via Tarantool-patches wrote:
> > Hi!
> >
> > Thanks for the fixes and the benchmarks!
> 
> I propose an updated version of script and propose to add benchmark as a 
> part of patch series.
> 
> --[[
> -- measure serialization time
> -- run: taskset -c 1 tarantool perf.lua
> ]]
> 
> local clock = require('clock')
> 
> local function elapsed(f, n)
>      local t0 = clock.monotonic()

I've read discussions that we should use CLOCK_PROCESS_CPUTIME_ID
instead of CLOCK_MONOTONIC. Is it one preferable than another?

>      for i = 1, n do
>          f()
>      end
>      local t1 = clock.monotonic()
>      return t1 - t0
> end

Shouldn't we still take into account the time to call the function
`clock.monotonic()`?

> 
> -- Get the mean of a table
> function calculate_mean(t)
>    local sum = 0
>    local count= 0
>    for k, v in pairs(t) do
>      if type(v) == 'number' then

2 general points:
1) use `for i = 1, #t do` instead pairs (as far as we use #t later)
2) use `assert(type(v) == 'number')` instead of if statement.

Here and below.

>        sum = sum + v
>      end
>    end
> 
>    return (sum/#t)
> end
> 
> -- Get the median of a table.
> function calculate_median(t)
>    local temp = {}
>    -- deep copy table so that when we sort it, the original is unchanged

Are we interested in unchanged original table?
May be it will be better to sort values inside before and work with
them?

>    -- also weed out any non numbers
>    for k, v in pairs(t) do
>      if type(v) == 'number' then
>        table.insert(temp, v)
>      end
>    end
>    table.sort(temp)
>    -- If we have an even number of table elements or odd.
>    if math.fmod(#temp,2) == 0 then
>      -- return mean value of middle two elements
>      return (temp[#temp/2] + temp[(#temp/2)+1]) / 2
>    else
>      -- return middle element
>      return temp[math.ceil(#temp/2)]
>    end
> end
> 
> -- Get the standard deviation of a table
> function calculate_stddev(t)
>    local vm

Sorry, don't get it: why do you choose this scope for the variable?

>    local sum = 0
>    local mean = calculate_mean(t)
>    for k, v in pairs(t) do
>      if type(v) == 'number' then
>        vm = v - mean
>        sum = sum + (vm * vm)
>      end
>    end
> 
>    return math.sqrt(sum/(#t - 1))
> end
> 
> local function timeit(f, name)
>      print('======================================')
>      print(name)
>      print('======================================')
>      local res = {}
>      local iterations = 10
>      local elapsed_time = 0
>      local repetitions = 150000

It will be nice to have an ability to configure this value (via config
or environment variables).

>      for j = 1, iterations do
>          -- warming
>          for i = 1, 100 do f() end
>          -- measurement
>          elapsed_time = elapsed(f, repetitions)
>          table.insert(res, elapsed_time)
>          print(string.format("%-2d - %f sec / %d repetitions", j, 
> elapsed_time, repetitions))
>      end
>      print(string.format("time mean   %f", calculate_mean(res)))
>      print(string.format("time median %f", calculate_median(res)))
>      print(string.format("time stddev %f", calculate_stddev(res)))
> end
> 
> local t = {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
> timeit(function()
>             local console = require('console')
>             console.set_default_output('yaml')
>             return console.eval(tostring(t))
>         end, 'serializer console yaml')

It is better to move out all this actions and create a local copy of
called functions.

For example:
===================================================================
local tstr = tostring(t)
local console = require('console')
console.set_default_output('yaml')
local eval = console.eval
timeit(function()
  return eval(tstr)
end, 'serializer console yaml')
===================================================================

It reduces time by 5% (stderr is 0.5%).

> timeit(function()
>             local console = require('console')
>             console.set_default_output('lua')
>             return console.eval(tostring(t))
>         end, 'serializer console lua')
> timeit(function()
>             local serializer = require('json')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer json')
> timeit(function()
>             local serializer = require('yaml')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer yaml')
> timeit(function()
>             local serializer = require('msgpack')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer msgpack')
> timeit(function()
>             local serializer = require('msgpackffi')
>             return serializer.encode(t)
>         end, 'serializer msgpackffi')
> 

-- 
Best regards,
Sergey Kaplun

  reply	other threads:[~2021-04-21 18:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-11  5:49 Roman Khabibov via Tarantool-patches
2021-03-14 12:42 ` Sergey Kaplun via Tarantool-patches
2021-04-05 22:05   ` Roman Khabibov via Tarantool-patches
2021-04-21  9:27     ` Sergey Kaplun via Tarantool-patches
2021-04-21 13:12       ` Sergey Bronnikov via Tarantool-patches
2021-04-21 18:20         ` Sergey Kaplun via Tarantool-patches [this message]
2021-04-13 15:54 ` Sergey Bronnikov via Tarantool-patches
2021-04-15 20:39   ` Roman Khabibov via Tarantool-patches
2021-04-21 12:34 ` Sergey Bronnikov via Tarantool-patches
2021-07-07 22:42 ` Alexander Turenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YIBs7112r1thKf7f@root \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=sergeyb@tarantool.org \
    --cc=skaplun@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] serializer: serialize recursive structures' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox