[Tarantool-patches] [PATCH] serializer: serialize recursive structures

Sergey Kaplun skaplun at tarantool.org
Wed Apr 21 21:20:31 MSK 2021


Hi!

Thanks for benchmarks!
I like them!

But I am tormented by doubts whether it is worth keeping it in a
semi-working state in our repository. Shouldn't we first introduce the
architecture of performance tests, and then add some new benchmarks?

I'm totally for the presence of such benchmarks in our repo, but I'm
afraid to make a real mess of things.

Also, I suppose that it should be a collegeal descicion (not only 2
reviewers).

On 21.04.21, Sergey Bronnikov wrote:
> 
> On 21.04.2021 12:27, Sergey Kaplun via Tarantool-patches wrote:
> > Hi!
> >
> > Thanks for the fixes and the benchmarks!
> 
> I propose an updated version of script and propose to add benchmark as a 
> part of patch series.
> 
> --[[
> -- measure serialization time
> -- run: taskset -c 1 tarantool perf.lua
> ]]
> 
> local clock = require('clock')
> 
> local function elapsed(f, n)
>      local t0 = clock.monotonic()

I've read discussions that we should use CLOCK_PROCESS_CPUTIME_ID
instead of CLOCK_MONOTONIC. Is it one preferable than another?

>      for i = 1, n do
>          f()
>      end
>      local t1 = clock.monotonic()
>      return t1 - t0
> end

Shouldn't we still take into account the time to call the function
`clock.monotonic()`?

> 
> -- Get the mean of a table
> function calculate_mean(t)
>    local sum = 0
>    local count= 0
>    for k, v in pairs(t) do
>      if type(v) == 'number' then

2 general points:
1) use `for i = 1, #t do` instead pairs (as far as we use #t later)
2) use `assert(type(v) == 'number')` instead of if statement.

Here and below.

>        sum = sum + v
>      end
>    end
> 
>    return (sum/#t)
> end
> 
> -- Get the median of a table.
> function calculate_median(t)
>    local temp = {}
>    -- deep copy table so that when we sort it, the original is unchanged

Are we interested in unchanged original table?
May be it will be better to sort values inside before and work with
them?

>    -- also weed out any non numbers
>    for k, v in pairs(t) do
>      if type(v) == 'number' then
>        table.insert(temp, v)
>      end
>    end
>    table.sort(temp)
>    -- If we have an even number of table elements or odd.
>    if math.fmod(#temp,2) == 0 then
>      -- return mean value of middle two elements
>      return (temp[#temp/2] + temp[(#temp/2)+1]) / 2
>    else
>      -- return middle element
>      return temp[math.ceil(#temp/2)]
>    end
> end
> 
> -- Get the standard deviation of a table
> function calculate_stddev(t)
>    local vm

Sorry, don't get it: why do you choose this scope for the variable?

>    local sum = 0
>    local mean = calculate_mean(t)
>    for k, v in pairs(t) do
>      if type(v) == 'number' then
>        vm = v - mean
>        sum = sum + (vm * vm)
>      end
>    end
> 
>    return math.sqrt(sum/(#t - 1))
> end
> 
> local function timeit(f, name)
>      print('======================================')
>      print(name)
>      print('======================================')
>      local res = {}
>      local iterations = 10
>      local elapsed_time = 0
>      local repetitions = 150000

It will be nice to have an ability to configure this value (via config
or environment variables).

>      for j = 1, iterations do
>          -- warming
>          for i = 1, 100 do f() end
>          -- measurement
>          elapsed_time = elapsed(f, repetitions)
>          table.insert(res, elapsed_time)
>          print(string.format("%-2d - %f sec / %d repetitions", j, 
> elapsed_time, repetitions))
>      end
>      print(string.format("time mean   %f", calculate_mean(res)))
>      print(string.format("time median %f", calculate_median(res)))
>      print(string.format("time stddev %f", calculate_stddev(res)))
> end
> 
> local t = {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
> timeit(function()
>             local console = require('console')
>             console.set_default_output('yaml')
>             return console.eval(tostring(t))
>         end, 'serializer console yaml')

It is better to move out all this actions and create a local copy of
called functions.

For example:
===================================================================
local tstr = tostring(t)
local console = require('console')
console.set_default_output('yaml')
local eval = console.eval
timeit(function()
  return eval(tstr)
end, 'serializer console yaml')
===================================================================

It reduces time by 5% (stderr is 0.5%).

> timeit(function()
>             local console = require('console')
>             console.set_default_output('lua')
>             return console.eval(tostring(t))
>         end, 'serializer console lua')
> timeit(function()
>             local serializer = require('json')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer json')
> timeit(function()
>             local serializer = require('yaml')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer yaml')
> timeit(function()
>             local serializer = require('msgpack')
>             serializer.cfg({encode_max_depth = 64})
>             return serializer.encode(t)
>         end, 'serializer msgpack')
> timeit(function()
>             local serializer = require('msgpackffi')
>             return serializer.encode(t)
>         end, 'serializer msgpackffi')
> 

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list