Tarantool development patches archive
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Nikita Pettik <korablev@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/3] box: introduce port_c
Date: Mon, 27 Apr 2020 23:28:01 +0200	[thread overview]
Message-ID: <e0a01b4c-b5cd-b097-a6c8-fa9284fef006@tarantool.org> (raw)
In-Reply-To: <20200427140955.GB30870@tarantool.org>

Hi! Thanks for the review!

>>>> +}
>>>> +
>>>> +static inline struct port_c_entry *
>>>> +port_c_new_entry(struct port_c *port)
>>>> +{
>>>> +	struct port_c_entry *e;
>>>> +	if (port->size == 0) {
>>>> +		e = &port->first_entry;
>>>
>>> I don't really get this 'first_entry' optimization. Why do you
>>> need it at all?
>>
>> Because most functions return only 1 result. And most of these
>> return it as a tuple. So having this value inside the port
>> preallocated allows to avoid any allocations/frees for these
>> cases. Zero allocation. Although I didn't really bench how
>> much it helps. I just stole it from port_tuple as a low hanging
>> fruit.
> 
> On the other hand you get memory overhead, which you fight for
> so much even using packed attribute...

There is no overhead for preallocated entries (with our allocators
and the heap), since the port always lives on the stack.

At least there is no overhead with a reasonable number of reserved
entries. Of course, if you preallocate, say, a thousand, it would
hurt stack cache locality.

> Why not then keeping 2, 3 or whatever number benchmarks say 
> entries inlined?

So yes, in theory adding 2 or 3 or a few more entries would probably
speed something up, but not the most common case, when the result is
just one, nor the case of tens, hundreds, or even thousands of results
(although thousands are unlikely, since the TX thread would have
trouble selecting that much without yields).

>>>> +	if (pe != NULL) {
>>>> +		/* 0 mp_size means the entry stores a tuple. */
>>>> +		pe->mp_size = 0;
>>>> +		pe->tuple = tuple;
>>>> +		tuple_ref(tuple);
>>>> +		return 0;
>>>> +	}
>>>> +	return -1;
>>>> +}
>>>> +
>>>> +int
>>>> +port_c_add_mp(struct port *base, const char *mp, const char *mp_end)
>>>> +{
>>>> +	struct port_c *port = (struct port_c *)base;
>>>> +	struct port_c_entry *pe;
>>>> +	uint32_t size = mp_end - mp;
>>>> +	char *dst;
>>>> +	if (size <= PORT_ENTRY_SIZE) {
>>>> +		/*
>>>> +		 * Alloc on a mempool is several times faster than
>>>> +		 * on the heap. And it perfectly fits any
>>>> +		 * MessagePack number, a short string, a boolean.
>>>> +		 */
>>>> +		dst = mempool_alloc(&port_entry_pool);
>>>
>>> Dubious optimization...I mean is this code complication really
>>> worth it?
>>
>> mempool_alloc() of this size is x4-5 times faster than malloc()
>> according to my microbenchmarks on release build. I decided not
>> to write concrete numbers x4-5, since they probably depend on
>> machine.
> 
> Is there at least one real scenario where entry allocation
> policy can turn out to be bottleneck in overall performance?

I guess we have plenty of such scenarios - pick any favorite among
our GC problems, where the solutions/sales team has to write
functions in C because of Lua's poor memory management or its high
memory usage caused by uncontrolled unpacking of everything. C stored
functions are supposed to be much, much faster than Lua and to be
called hundreds of thousands of times per second easily.

If we use the heap for each such request, we lose some real perf and
aggravate heap fragmentation for other requests with lots of small
allocations. Moreover, these allocations are all of about the same
size - exactly the pattern mempools are designed to handle.

Here are the numbers I got on my machine. I was running a C
function which returned 100 unsigned numbers, each <= 128. The
function was called just 10k times, not a big number.

With the memory pool the bench finished in 35ms. With the heap it
finished in 143-144ms. So just by allocating results on the pool
instead of the heap I won more than 100ms over 1kk mere returns. Not
selects, gets, or other non-trivial operations. Entirely on returns,
in pure TX thread time.

In summary, I consider this low-hanging perf fruit a good enough
result, worth the code complication.

Talking of memory problems, my entirely subjective guess is that part
of the reason we don't see malloc/free among the top entries in our
flamegraphs is that we avoid them whenever possible. All hot data
structures are pooled - transactions, iterators, port entries (used
not only for C functions, but for all selects too). All hot temporary
data is batched on regions (lsregion in the case of vinyl). So the
absence of malloc from the top functions does not mean we can start
using it in hot places on the grounds that it looks cheap. It
actually is not.

  reply	other threads:[~2020-04-27 21:28 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-08 17:47 [Tarantool-patches] [PATCH 0/3] box_return_mp Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 1/3] box: fix struct port_tuple.size wrong type in Lua Vladislav Shpilevoy
2020-03-10 13:42   ` Nikita Pettik
2020-03-11  0:17     ` Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 2/3] box: introduce port_c Vladislav Shpilevoy
2020-03-26 17:49   ` Nikita Pettik
2020-04-23  0:14     ` Vladislav Shpilevoy
2020-04-27 14:09       ` Nikita Pettik
2020-04-27 21:28         ` Vladislav Shpilevoy [this message]
2020-04-27 23:24           ` Nikita Pettik
2020-04-03 14:12   ` Igor Munkin
2020-04-23  0:14     ` Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 3/3] box: introduce box_return_mp() public C function Vladislav Shpilevoy
2020-03-26 17:51   ` Nikita Pettik
2020-04-03 14:13   ` Igor Munkin
2020-04-23  0:14     ` Vladislav Shpilevoy
