Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Nikita Pettik <korablev@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/3] box: introduce port_c
Date: Thu, 23 Apr 2020 02:14:36 +0200	[thread overview]
Message-ID: <7301da02-16f8-2629-b673-8d6125032111@tarantool.org> (raw)
In-Reply-To: <20200326174911.GA5718@tarantool.org>

Hi! Thanks for the review!

On 26/03/2020 18:49, Nikita Pettik wrote:
> On 08 Mar 18:47, Vladislav Shpilevoy wrote:
> 
> In general, patch is okay. Below some comments which can be
> considered as nits.
> 
> I see that your branch doesn't pass CI testing:
> https://travis-ci.org/github/tarantool/tarantool/jobs/659861432?utm_medium=notification&utm_source=github_status
> 
> I haven't managed to find box/alter_limits among existing flaky
> tests. If this test fail is not really related to your patches,
> could you please file an issue?

It is not related (at least I think so), I saw it failing on other
branches too.

Created an issue:
https://github.com/tarantool/tarantool/issues/4926

>> If you (reviewers) want me to drop port_tuple in scope of this
>> patchset, I can do that. At the moment of sending the patchset I
>> decided to postpone it.
> 
> I'd do it as a follow-up in case you have time to do it.
> Otherwise please just file an issue.

Now I have time, so it is done in a separate commit in
v2 thread.

>> diff --git a/src/box/port.c b/src/box/port.c
>> index 6e2fe3a6e..925a4b2d5 100644
>> --- a/src/box/port.c
>> +++ b/src/box/port.c
>> @@ -37,7 +37,12 @@
>>  #include <fiber.h>
>>  #include "errinj.h"
>>  
>> -static struct mempool port_tuple_entry_pool;
> 
> I'd add comment explaining that port_entry_pool holds both
> port_c and port_tuple entries now. Moreover, as an optimization
> some msgpacks stored in port_c also can be allocated in this
> pool (as far as I understand).

Ok, no problem:

====================
diff --git a/src/box/port.c b/src/box/port.c
index 3498f7af0..fea98ecc5 100644
--- a/src/box/port.c
+++ b/src/box/port.c
@@ -37,6 +37,12 @@
 #include <fiber.h>
 #include "errinj.h"
 
+/**
+ * The pools is used both by port_c and port_tuple, since their
+ * entires are almost of the same size. Also port_c can use
+ * objects from the pool to store result data in their memory,
+ * when it fits.
+ */
 static struct mempool port_entry_pool;
 
 enum {

====================

>> +static struct mempool port_entry_pool;
>> +
>> +enum {
>> +	PORT_ENTRY_SIZE = MAX(sizeof(struct port_c_entry),
>> +			      sizeof(struct port_tuple_entry)),
>> +};
>>  
>> @@ -127,28 +132,194 @@ port_tuple_dump_msgpack(struct port *base, struct obuf *out)
>>  extern void
>>  port_tuple_dump_lua(struct port *base, struct lua_State *L, bool is_flat);
>>  
>> +static inline void
>> +port_c_destroy_entry(struct port_c_entry *pe)
>> +{
> 
> I'd add a brief reference like:
> 
> /* For allocation policy see port_c_add_tuple() and port_c_add_mp(). */

Ok, done:

====================
diff --git a/src/box/port.c b/src/box/port.c
index fea98ecc5..2dfd0118d 100644
--- a/src/box/port.c
+++ b/src/box/port.c
@@ -141,6 +141,10 @@ port_tuple_dump_lua(struct port *base, struct lua_State *L, bool is_flat);
 static inline void
 port_c_destroy_entry(struct port_c_entry *pe)
 {
+	/*
+	 * See port_c_add_*() for algorithm of how and where to
+	 * store data, to understand why it is freed differently.
+	 */
 	if (pe->mp_size == 0)
 		tuple_unref(pe->tuple);
 	else if (pe->mp_size <= PORT_ENTRY_SIZE)

====================

>> +	if (pe->mp_size == 0)
>> +		tuple_unref(pe->tuple);
>> +	else if (pe->mp_size <= PORT_ENTRY_SIZE)
>> +		mempool_free(&port_entry_pool, pe->mp);
>> +	else
>> +		free(pe->mp);
>> +}
>> +
>> +static inline struct port_c_entry *
>> +port_c_new_entry(struct port_c *port)
>> +{
>> +	struct port_c_entry *e;
>> +	if (port->size == 0) {
>> +		e = &port->first_entry;
> 
> I don't really get this 'first_entry' optimization. Why do you
> need it at all?

Because most functions return only 1 result. And most of these
return it as a tuple. So having this value inside the port
preallocated allows to avoid any allocations/frees for these
cases. Zero allocation. Although I didn't really bench how
much it helps. I just stole it from port_tuple as a low hanging
fruit.

>> +		port->first = e;
>> +		port->last = e;
>> +	} else {
>> +		e = mempool_alloc(&port_entry_pool);
>> +		if (e == NULL) {
>> +			diag_set(OutOfMemory, sizeof(*e), "mempool_alloc", "e");
>> +			return NULL;
>> +		}
>> +		port->last->next = e;
>> +		port->last = e;
>> +	}
>> +	e->next = NULL;
>> +	++port->size;
>> +	return e;
>> +}
>> +
>> +int
>> +port_c_add_tuple(struct port *base, struct tuple *tuple)
>> +{
>> +	struct port_c *port = (struct port_c *)base;
>> +	struct port_c_entry *pe = port_c_new_entry(port);
> 
> I guess reverting condition would make code flow more straightforward:
> if (pe == NULL)
> 	return -1;
> ...

I am infected with the paranoia that the most likely result should lead
to the positive branch. But I don't mind to make it simpler.

====================
diff --git a/src/box/port.c b/src/box/port.c
index 2dfd0118d..2c1fadb5c 100644
--- a/src/box/port.c
+++ b/src/box/port.c
@@ -202,14 +202,13 @@ port_c_add_tuple(struct port *base, struct tuple *tuple)
 {
 	struct port_c *port = (struct port_c *)base;
 	struct port_c_entry *pe = port_c_new_entry(port);
-	if (pe != NULL) {
-		/* 0 mp_size means the entry stores a tuple. */
-		pe->mp_size = 0;
-		pe->tuple = tuple;
-		tuple_ref(tuple);
-		return 0;
-	}
-	return -1;
+	if (pe == NULL)
+		return -1;
+	/* 0 mp_size means the entry stores a tuple. */
+	pe->mp_size = 0;
+	pe->tuple = tuple;
+	tuple_ref(tuple);
+	return 0;
 }
 
 int

====================

>> +	if (pe != NULL) {
>> +		/* 0 mp_size means the entry stores a tuple. */
>> +		pe->mp_size = 0;
>> +		pe->tuple = tuple;
>> +		tuple_ref(tuple);
>> +		return 0;
>> +	}
>> +	return -1;
>> +}
>> +
>> +int
>> +port_c_add_mp(struct port *base, const char *mp, const char *mp_end)
>> +{
>> +	struct port_c *port = (struct port_c *)base;
>> +	struct port_c_entry *pe;
>> +	uint32_t size = mp_end - mp;
>> +	char *dst;
>> +	if (size <= PORT_ENTRY_SIZE) {
>> +		/*
>> +		 * Alloc on a mempool is several times faster than
>> +		 * on the heap. And it perfectly fits any
>> +		 * MessagePack number, a short string, a boolean.
>> +		 */
>> +		dst = mempool_alloc(&port_entry_pool);
> 
> Dubious optimization...I mean is this code complication really
> worth it?

mempool_alloc() of this size is x4-5 times faster than malloc()
according to my microbenchmarks on release build. I decided not
to write concrete numbers x4-5, since they probably depend on
machine.

By saying 'on this size' I mean fitting into port entry object.
And this is the perfect match for functions which return something
small and scalar like a number, a boolean, or a small string.

>> +		if (dst == NULL) {
>> +			diag_set(OutOfMemory, size, "mempool_alloc", "dst");
>> +			return -1;
>> +		}
>> +
>> +static int
>> +port_c_dump_msgpack_16(struct port *base, struct obuf *out)
>> +{
>> +	struct port_c *port = (struct port_c *)base;
>> +	struct port_c_entry *pe;
>> +	for (pe = port->first; pe != NULL; pe = pe->next) {
>> +		uint32_t size = pe->mp_size;
>> +		if (size == 0) {
>> +			if (tuple_to_obuf(pe->tuple, out) != 0)
>> +				return -1;
>> +		} else if (obuf_dup(out, pe->mp, size) != size) {
>> +			diag_set(OutOfMemory, size, "obuf_dup", "data");
>> +			return -1;
>> +		}
>> +		ERROR_INJECT(ERRINJ_PORT_DUMP, {
>> +			diag_set(OutOfMemory,
>> +				 size == 0 ? tuple_size(pe->tuple) : size,
>> +				 "obuf_dup", "data");
>> +			return -1;
>> +		});
> 
> You added error injection, but did not test it. Could you provide test
> (or point me at it please)?

In the v2 thread it is used in box/errinj.test.lua to check that
"port_dump can fail".

>> +	}
>> +	return port->size;
>> +}
>> +
>> diff --git a/src/box/port.h b/src/box/port.h
>> index 9d3d02b3c..cd40d8a15 100644
>> --- a/src/box/port.h
>> +++ b/src/box/port.h
>> @@ -130,6 +130,56 @@ void
>>  port_vdbemem_create(struct port *base, struct sql_value *mem,
>>  		    uint32_t mem_count);
>>  
>> +/**
>> + * The struct is PACKED to avoid unnecessary 4 byte padding
> 
> Is this really profitable?

-4 bytes for every object of this type by just adding one word to
the struct? Sounds good to me. Very easy to do, to prove it works
correct, and profit is obvious - the structure becomes 16% smaller.
This means 16% more objects in every slab used for port_c_entry
objects. Although I didn't measure how it affects things under heavy
load on C functions.

>> + * after mp_size. These objects are never stored on stack, neither
>> + * allocated as an array, so the padding is not needed.
>> + */
>> +struct PACKED port_c_entry {
>> +	struct port_c_entry *next;
>> +	union {
>> +		/** Valid if mp_size is 0. */
>> +		struct tuple *tuple;
>> +		/**
>> +		 * Valid if mp_size is > 0. MessagePack is
>> +		 * allocated either on heap or on the port entry
>> +		 * mempool, if it fits into a pool object.
>> +		 */
>> +		char *mp;
>> +	};
>> +	uint32_t mp_size;
>> +};
>> +
>> +/**
>> + * C port is used by C functions from the public API. They can
>> + * return tuples and arbitrary MessagePack.
>> + */
>> +struct port_c {
>> +	const struct port_vtab *vtab;
>> +	struct port_c_entry *first;
>> +	struct port_c_entry *last;
>> +	struct port_c_entry first_entry;
>> +	int size;
> 
> Why do you call it size? I'd name it entry_cnt or count
> (imho port size = sum of all entries' sizes).

For that we have term 'bsize'. But I don't have anything against
count, personally. The problem here is that it will be inconsistent
with other ports (port_lua, and port_tuple which still exists here),
and with local variables named 'size', which are assigned to
port->size member - that is the problem in src/box/lua/call.c.

When I renamed size to count, I got notable amount of unnecessary
diff. So I would like to keep size unless you insist, and in that
case I will do it in a separate patch.

  reply	other threads:[~2020-04-23  0:14 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-08 17:47 [Tarantool-patches] [PATCH 0/3] box_return_mp Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 1/3] box: fix struct port_tuple.size wrong type in Lua Vladislav Shpilevoy
2020-03-10 13:42   ` Nikita Pettik
2020-03-11  0:17     ` Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 2/3] box: introduce port_c Vladislav Shpilevoy
2020-03-26 17:49   ` Nikita Pettik
2020-04-23  0:14     ` Vladislav Shpilevoy [this message]
2020-04-27 14:09       ` Nikita Pettik
2020-04-27 21:28         ` Vladislav Shpilevoy
2020-04-27 23:24           ` Nikita Pettik
2020-04-03 14:12   ` Igor Munkin
2020-04-23  0:14     ` Vladislav Shpilevoy
2020-03-08 17:47 ` [Tarantool-patches] [PATCH 3/3] box: introduce box_return_mp() public C function Vladislav Shpilevoy
2020-03-26 17:51   ` Nikita Pettik
2020-04-03 14:13   ` Igor Munkin
2020-04-23  0:14     ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7301da02-16f8-2629-b673-8d6125032111@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/3] box: introduce port_c' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox