[tarantool-patches] Re: [PATCH 6/6] swim: introduce suspicion
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Wed Apr 24 23:28:55 MSK 2019
On 24/04/2019 20:01, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [19/04/24 18:50]:
>> + /**
>> + * Number of attempts to reach out a member via another
>> + * members when it did not answer on a regular ping .
>> + */
>> + INDIRECT_PING_COUNT = 2,
>
> As far as I can see it's not the number of attempts, it's the
> number of cluster members involved as proxies for sending an
> indirect ping. The fact that you attempt an indirect ping only
> once before switching member state to dead is secondary.
>
> This has to do with lack of a description in changeset or code
> comment of indirect pings strategy.
Sorry, you are right. They are not 'attempts' strictly
speaking. New comment:
==================================================================
*/
NO_ACKS_TO_GC = 2,
/**
- * Number of attempts to reach out a member via another
- * members when it did not answer on a regular ping .
+ * Number of pings sent indirectly to a member via other
+ * members when it did not answer on a regular ping. The
+ * messages are sent in parallel and via different
+ * members.
*/
INDIRECT_PING_COUNT = 2,
};
==================================================================
>
>> +/**
>> + * Put the member into a list of ACK waiters. @a hop_count says
>> + * how many hops from one member to another the ACK is expected to
>> + * do.
>> + */
>> static void
>> -swim_wait_ack(struct swim *swim, struct swim_member *member)
>> +swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count)
>> {
>> if (heap_node_is_stray(&member->in_wait_ack_heap)) {
>> - member->ping_deadline = swim_time() + swim->wait_ack_tick.at;
>> + double timeout = swim->wait_ack_tick.at * hop_count;
>
> Is hop_count ever greater than two? If not, I would not use int
> hop_count, but use bool is_indirect_ping, and add a comment
> explaining that in case of indirect pings the number of hops is
> exactly two, so we need to increase the timeout.
Ok, I do not mind.
==================================================================
diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index af9b59cbb..24229bf0c 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -493,10 +493,18 @@ swim_cached_round_msg_invalidate(struct swim *swim)
* do.
*/
static void
-swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count)
+swim_wait_ack(struct swim *swim, struct swim_member *member,
+ bool was_ping_indirect)
{
if (heap_node_is_stray(&member->in_wait_ack_heap)) {
- double timeout = swim->wait_ack_tick.at * hop_count;
+ double timeout = swim->wait_ack_tick.at;
+ /*
+ * Direct ping is two hops: PING + ACK.
+ * Indirect ping is four hops: PING, FORWARD PING,
+ * ACK, FORWARD ACK. This is why x2 for indirects.
+ */
+ if (was_ping_indirect)
+ timeout *= 2;
member->ping_deadline = swim_time() + timeout;
wait_ack_heap_insert(&swim->wait_ack_heap, member);
swim_ev_timer_start(loop(), &swim->wait_ack_tick);
@@ -627,7 +635,7 @@ swim_ping_task_complete(struct swim_task *task,
struct swim *swim = swim_by_scheduler(scheduler);
struct swim_member *m = container_of(task, struct swim_member,
ping_task);
- swim_wait_ack(swim, m, 1);
+ swim_wait_ack(swim, m, false);
}
/** Free member's resources. */
@@ -1082,7 +1090,7 @@ swim_complete_step(struct swim_task *task,
* dissemination and failure detection
* sections.
*/
- swim_wait_ack(swim, m, 1);
+ swim_wait_ack(swim, m, false);
swim_decrease_event_ttd(swim);
}
}
@@ -1239,7 +1247,7 @@ swim_iping_task_complete(struct swim_task *base_task,
struct swim_member *m =
swim_find_member(swim, &b->dst_uuid);
if (m != NULL)
- swim_wait_ack(swim, m, 2);
+ swim_wait_ack(swim, m, true);
}
swim_iping_task_delete(t);
}
==================================================================
>
>> +/**
>> + * When indirect pings are sent, each is represented by this
>> + * structure.
>> + */
>> +struct swim_iping_task {
>> + /** Base task used by the scheduler. */
>> + struct swim_task base;
>> + /**
>> + * Reference to a block of other indirect ping tasks sent
>> + * at the same moment. Used to decide whether it is needed
>> + * to start waiting for an ACK, and on which member.
>> + */
>> + struct swim_iping_block *block;
>
> I don't understand this comment, why do you need a reference?
>
> I would not use it at all, seems like we have a separate data
> structure when existing structures would do just fine.
>
Talking of block - we have decided to get rid of it. To
understand why, I should explain why was it needed.
1) To wait for an ACK I need a struct swim_member to put it
into ack waiter queue. But in task_complete() callback I don't
know the member nor its UUID. struct swim_task is too encapsulated
inside the transport level. Iping block stores the UUID of sender.
2) It is possible, that a first indirect ACK was sent and we got
a result faster than managed to send a next indirect ACK. In such a
case we will wait for the second ACK, even having a response on
the first one.
We've decided that block can be removed if UUID is added to
struct swim_task, and if we mark ACK and ping with timestamps.
But I found another solution for ACK/ping timing problem.
When a next indirect ping is sent, lets watch at
swim_member status. If it is not suspected anymore, it
means, that an ACK had already been received while this ping
was waiting for EV_WRITE.
I will not paste here diff of this change since it is too big.
I would rather send V2 of this patchset.
>> switch (def.type) {
>> case SWIM_FD_MSG_PING:
>> - if (! swim_task_is_scheduled(&member->ack_task))
>> + if (proxy != NULL) {
>> + if (swim_send_indirect_ack(swim, &member->addr,
>> + proxy) != 0)
>> + diag_log();
>> + } else if (! swim_task_is_scheduled(&member->ack_task)) {
>> swim_send_ack(swim, &member->ack_task, &member->addr);
>> + }
>> break;
>
> Why do you ever need to distinguish direct and indirect acks? Why not
> respond to the sender address, while preserving the routing table
> from the ping request, assuming the sender can figure out whether
> the response is addressed to him directly or it should proxy it
> further based on the received proxy section of the response?
Such a long sentence, do not know how to split it in parts. Ok.
Firstly, I do not distinguish indirect ACKs when they are received
by the pinger. This is thanks to the transport level full isolation.
I do not even have a special request type like 'PING_REQUEST' or
something, like classical implementation does. In theory, I could even
send usual anti-entropy and dissemination messages indirectly.
Secondly, to preserve routing table I need to distinguish direct
from indirect ACKs on sender side, because routing table is generated
only when proxy address is set. I can't just say
'swim_send_ack(swim, member)' and let the transport level decide
whether it should use proxy or not.
The transport level doesn't have an address-table to choose proxies,
as well as it does not know, that this message is a response to another
message, which in turn was received via proxy. Only SWIM core has this
information. So it should give the transport level a hint via
'swim_task_proxy()' method.
swim_send_ack and swim_send_indirect_ack are just wrappers around
swim_send_fd_msg(), passing different proxy values. The former sets
proxy NULL, the latter not NULL. That's all.
>
> --
> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
> http://tarantool.io - www.twitter.com/kostja_osipov
>
More information about the Tarantool-patches
mailing list