[PATCH 8/8] vinyl: keep track of thread pool idle ratio
vdavydov.dev at gmail.com
Thu Sep 6 14:59:57 MSK 2018
On Thu, Sep 06, 2018 at 01:57:42PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/09/06 13:53]:
> > > > > > To understand whether the disk is fully utilized or can still handle
> > > > > > more compaction load and make right decisions regarding transaction
> > > > > > throttling, we need a metric that would report how much time worker
> > > > > > threads spent being idle. So this patch adds two new metrics to global
> > > > > > statistics, disk.dump_idle_ratio and compact_idle_ratio, which show how
> > > > > > much time dump threads and compaction threads were idle, respectively.
> > > > > > The metrics are updated using the following formula:
> > > > > >
> > > > > >                          idle_time
> > > > > >   idle_ratio = --------------------------
> > > > > >                dump_period * worker_count
> > > > >
> > > > > I don't understand the formula. There can be many workers.
> > > > > Is idle time measured per worker or per entire pool?
> > > > >
> > > > > If it is measured per entire pool, how is idle time calculated if
> > > > > some workers are busy and some not?
> > > >
> > > > It is measured for entire pool - note that I divide the result by
> > > > worker_count. E.g. if there were two workers and one of them was
> > > > busy all the time between the last two dumps while the other was idle,
> > > > idle_ratio would be 0.5.
> > >
> > > This looks imprecise. Why not measure idle time of each worker and
> > > then even it out over the total number of workers?
> > That's exactly what I do. I maintain the idle time for each worker
> > thread and then divide it by the total number of workers.
> This is not what you've written in the comment, though.
It is. Quoting the commit message:
} The metrics are updated using the following formula:
}                          idle_time
}   idle_ratio = --------------------------
}                dump_period * worker_count
} where idle_time is the total amount of time workers were idle between
} the last two dumps, dump_period is the time that passed between the last
} two dumps, worker_count is the number of workers in the pool.
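To make the formula concrete, here is a minimal Python sketch of the same computation. It is an illustration only, not the vinyl code: the function name `idle_ratio` and its parameters are hypothetical, and it assumes per-worker idle times have already been collected for the window between the last two dumps.

```python
def idle_ratio(idle_times, dump_period):
    """Return the pool-wide idle ratio for one dump period.

    idle_times  -- per-worker idle time (seconds) between the last two dumps
    dump_period -- time (seconds) that passed between the last two dumps
    Both names are hypothetical; they only mirror the terms of the formula.
    """
    worker_count = len(idle_times)
    if worker_count == 0 or dump_period <= 0:
        raise ValueError("need at least one worker and a positive dump period")
    # idle_time in the formula is the sum over all workers in the pool.
    idle_time = sum(idle_times)
    return idle_time / (dump_period * worker_count)


# Two workers over a 10 second dump period: one busy the whole time,
# the other idle the whole time.
print(idle_ratio([0.0, 10.0], 10.0))  # 0.5
```

Summing the per-worker idle times and dividing by dump_period * worker_count is the same as averaging the per-worker idle ratios, which is why the two-worker case above comes out at 0.5.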
> > > Besides, once again, how do you define the window over which you
> > > measure?
> > I use dump period for the window, i.e. time between two subsequent
> > memory dumps. Respectively, I update the idle_ratio after each memory
> > dump. I'm planning to update throttle rate limit there too.
> This choice of time window looks arbitrary. Why do you think it's
> a good choice?
Dump period seems to be the only reasonable choice for a minimal time
window characterising a vinyl workload. If we choose a smaller window,
then dump_idle_ratio will jump from 1 when dump is inactive to 0 when
dump is in progress, which isn't very convenient.
At the same time, if we want to accumulate idle time statistics over a
longer period, we can still do that by averaging idle_ratio.
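As a rough sketch of what such averaging could look like, the Python snippet below combines per-dump idle_ratio samples into one value for a longer window. Weighting each sample by its dump period is an assumption made here, since dump periods need not be equal; with equal periods it reduces to a plain mean. The function name `average_idle_ratio` and the sample layout are hypothetical.

```python
def average_idle_ratio(samples):
    """Average idle_ratio over several dump periods.

    samples -- list of (idle_ratio, dump_period) pairs, one per memory dump.
    Each ratio is weighted by the length of its dump period (an assumption
    of this sketch, not something stated in the patch).
    """
    total_time = sum(period for _, period in samples)
    if total_time <= 0:
        raise ValueError("need at least one sample with a positive dump period")
    weighted = sum(ratio * period for ratio, period in samples)
    return weighted / total_time


# A 10 second period at 0.5 idle followed by a 30 second period fully idle.
print(average_idle_ratio([(0.5, 10.0), (1.0, 30.0)]))  # 0.875
```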