Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
@ 2021-02-12 11:25 Serge Petrenko via Tarantool-patches
  2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-12 11:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

While sending a WAL, relay only yields in `coio_write_xrow`, once it
sees the socket isn't ready for writes.
It may happen that the socket is always ready for a long period of time,
and relay doesn't yield at all while recovering a whole .xlog file. This
may take well more than a minute.
During this period of time, relay doesn't read replica's ACKs due to
relay reader fiber not being scheduled, and once the reader is finally
live it times out immediately, causing the replica to reconnect.

The problem is amplified by the fact that replica waits for
replication_timeout to pass prior to reconnecting, which lets master
pile up even more ready WALs, and effectively making it impossible for
the replica to sync.

To fix the problem let's yield explicitly in relay_send_row every
WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
serves the same purpose: to not block the event loop for too long.

Closes #5762
---
No test provided as the fix is quite obvious but rather hard to test
automatically.
https://github.com/tarantool/tarantool/issues/5762
https://github.com/tarantool/tarantool/tree/sp/gh-5762-relay-yield

 src/box/relay.cc | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index df04f8198..afc57dfbc 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
 {
 	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
 
+	static uint64_t row_cnt = 0;
 	packet->sync = relay->sync;
 	relay->last_row_time = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
+	/*
+	 * It may happen that the socket is always ready for write, so yield
+	 * explicitly every now and then to not block the event loop.
+	 */
+	row_cnt++;
+	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
+		fiber_sleep(0);
+	}
 	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:25 [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows Serge Petrenko via Tarantool-patches
@ 2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
  2021-02-12 11:46   ` Cyrill Gorcunov via Tarantool-patches
  2021-02-12 12:08   ` Serge Petrenko via Tarantool-patches
  2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-12 11:37 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: v.shpilevoy, tarantool-patches

On Fri, Feb 12, 2021 at 02:25:41PM +0300, Serge Petrenko wrote:
> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>  {
>  	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>  
> +	static uint64_t row_cnt = 0;
>  	packet->sync = relay->sync;
>  	relay->last_row_time = ev_monotonic_now(loop());
>  	coio_write_xrow(&relay->io, packet);
>  	fiber_gc();
>  
> +	/*
> +	 * It may happen that the socket is always ready for write, so yield
> +	 * explicitly every now and then to not block the event loop.
> +	 */
> +	row_cnt++;
> +	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
> +		fiber_sleep(0);
> +	}

Serge, if I'm not missing something obvious, this row counter is
related to xstream->rows? So maybe define this type as size_t instead,
to be consistent with another occurence of WAL_ROWS_PER_YIELD. Say,

	static size_t row_cnt = 0;
	...
	if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
		fiber_sleep(0);

I'm fine with uint64_t as well, just out of curiosity.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-12 11:46   ` Cyrill Gorcunov via Tarantool-patches
  2021-02-12 12:08   ` Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-12 11:46 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: v.shpilevoy, tarantool-patches

On Fri, Feb 12, 2021 at 02:37:48PM +0300, Cyrill Gorcunov wrote:
> 
> Serge, if I'm not missing something obvious, this row counter is
> related to xstream->rows? So maybe define this type as size_t instead,
> to be consistent with another occurence of WAL_ROWS_PER_YIELD. Say,
> 
> 	static size_t row_cnt = 0;
> 	...
> 	if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
> 		fiber_sleep(0);
> 
> I'm fine with uint64_t as well, just out of curiosity.

Ack for current version as well

	Cyrill

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
  2021-02-12 11:46   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-12 12:08   ` Serge Petrenko via Tarantool-patches
  2021-02-12 17:00     ` Cyrill Gorcunov via Tarantool-patches
  1 sibling, 1 reply; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-12 12:08 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: v.shpilevoy, tarantool-patches



12.02.2021 14:37, Cyrill Gorcunov пишет:
> On Fri, Feb 12, 2021 at 02:25:41PM +0300, Serge Petrenko wrote:
>> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>>   {
>>   	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>>   
>> +	static uint64_t row_cnt = 0;
>>   	packet->sync = relay->sync;
>>   	relay->last_row_time = ev_monotonic_now(loop());
>>   	coio_write_xrow(&relay->io, packet);
>>   	fiber_gc();
>>   
>> +	/*
>> +	 * It may happen that the socket is always ready for write, so yield
>> +	 * explicitly every now and then to not block the event loop.
>> +	 */
>> +	row_cnt++;
>> +	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
>> +		fiber_sleep(0);
>> +	}
> Serge, if I'm not missing something obvious, this row counter is
> related to xstream->rows? So maybe define this type as size_t instead,

Thanks for the review!
Yes, it serves the same purpose.
No problem, let it be size_t, updated.

> to be consistent with another occurence of WAL_ROWS_PER_YIELD. Say,
>
> 	static size_t row_cnt = 0;
> 	...
> 	if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
> 		fiber_sleep(0);
>
> I'm fine with uint64_t as well, just out of curiosity.

Looks fine. My bad, I didn't pay enough attention when writing the patch.
=========================================================================
diff --git a/src/box/relay.cc b/src/box/relay.cc
index afc57dfbc..1d8edf116 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -836,7 +836,7 @@ relay_send(struct relay *relay, struct xrow_header 
*packet)
  {
      ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);

-    static uint64_t row_cnt = 0;
+    static size_t row_cnt = 0;
      packet->sync = relay->sync;
      relay->last_row_time = ev_monotonic_now(loop());
      coio_write_xrow(&relay->io, packet);
@@ -846,10 +846,9 @@ relay_send(struct relay *relay, struct xrow_header 
*packet)
       * It may happen that the socket is always ready for write, so yield
       * explicitly every now and then to not block the event loop.
       */
-    row_cnt++;
-    if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
+    if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
          fiber_sleep(0);
-    }
+
      struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
      if (inj != NULL && inj->dparam > 0)
          fiber_sleep(inj->dparam);

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 12:08   ` Serge Petrenko via Tarantool-patches
@ 2021-02-12 17:00     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-12 17:00 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: v.shpilevoy, tarantool-patches

On Fri, Feb 12, 2021 at 03:08:51PM +0300, Serge Petrenko wrote:
...
> =========================================================================
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index afc57dfbc..1d8edf116 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -836,7 +836,7 @@ relay_send(struct relay *relay, struct xrow_header
> *packet)
>  {

Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:25 [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows Serge Petrenko via Tarantool-patches
  2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-12 22:25   ` Cyrill Gorcunov via Tarantool-patches
  2021-02-15  8:40   ` Serge Petrenko via Tarantool-patches
  2021-02-24 22:20 ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-26  8:41 ` Kirill Yukhin via Tarantool-patches
  3 siblings, 2 replies; 20+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-12 21:48 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Hi! Thanks for the patch!

On 12.02.2021 12:25, Serge Petrenko via Tarantool-patches wrote:
> While sending a WAL, relay only yields in `coio_write_xrow`, once it
> sees the socket isn't ready for writes.
> It may happen that the socket is always ready for a long period of time,
> and relay doesn't yield at all while recovering a whole .xlog file. This
> may take well more than a minute.
> During this period of time, relay doesn't read replica's ACKs due to
> relay reader fiber not being scheduled, and once the reader is finally
> live it times out immediately, causing the replica to reconnect.
> 
> The problem is amplified by the fact that replica waits for
> replication_timeout to pass prior to reconnecting, which lets master
> pile up even more ready WALs, and effectively making it impossible for
> the replica to sync.

I couldn't understand this part. Why is it bad? Yeah, replica waits,
but replica is applier, on another instance. How is it related? And
relay_reader does not send anything. So why is it bad?

Couldn't the problem be fixed by reading all the non-consumed data after
reading WAL?

The current solution also looks fine. Maybe even better because it
becomes consistent with local recovery. However I still want to
understand this part about replica.

> To fix the problem let's yield explicitly in relay_send_row every
> WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
> serves the same purpose: to not block the event loop for too long.
> 
> Closes #5762
> ---
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index df04f8198..afc57dfbc 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>  {
>  	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>  
> +	static uint64_t row_cnt = 0;

Relays are in threads. So this variable either should be thread-local,
or be in struct relay. Otherwise you get non-atomic updates which may
lead to some increments disappearing.

Given that thread-local variable access is not free, I would go for
having it in struct relay, but up to you.

>  	packet->sync = relay->sync;
>  	relay->last_row_time = ev_monotonic_now(loop());
>  	coio_write_xrow(&relay->io, packet);
>  	fiber_gc();
>  
> +	/*
> +	 * It may happen that the socket is always ready for write, so yield
> +	 * explicitly every now and then to not block the event loop.
> +	 */
> +	row_cnt++;
> +	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
> +		fiber_sleep(0);
> +	}

Maybe better drop {} as the if's body is just one line.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-12 22:25   ` Cyrill Gorcunov via Tarantool-patches
  2021-02-15  8:45     ` Serge Petrenko via Tarantool-patches
  2021-02-15  8:40   ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-12 22:25 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Fri, Feb 12, 2021 at 10:48:49PM +0100, Vladislav Shpilevoy wrote:
> > diff --git a/src/box/relay.cc b/src/box/relay.cc
> > index df04f8198..afc57dfbc 100644
> > --- a/src/box/relay.cc
> > +++ b/src/box/relay.cc
> > @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
> >  {
> >  	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
> >  
> > +	static uint64_t row_cnt = 0;
> 
> Relays are in threads. So this variable either should be thread-local,
> or be in struct relay. Otherwise you get non-atomic updates which may
> lead to some increments disappearing.

That's the good catch! Without lock/tls this gonna be completely
arbritrary updates.

> Given that thread-local variable access is not free, I would go for
> having it in struct relay, but up to you.

Actually tls access should be as cheap as regular memory access
except using different base register (iirc %fs on linux). But
maybe things are changed novadays.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-12 22:25   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-15  8:40   ` Serge Petrenko via Tarantool-patches
  2021-02-17 21:11     ` Vladislav Shpilevoy via Tarantool-patches
  1 sibling, 1 reply; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-15  8:40 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches



13.02.2021 00:48, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!
>
> On 12.02.2021 12:25, Serge Petrenko via Tarantool-patches wrote:
>> While sending a WAL, relay only yields in `coio_write_xrow`, once it
>> sees the socket isn't ready for writes.
>> It may happen that the socket is always ready for a long period of time,
>> and relay doesn't yield at all while recovering a whole .xlog file. This
>> may take well more than a minute.
>> During this period of time, relay doesn't read replica's ACKs due to
>> relay reader fiber not being scheduled, and once the reader is finally
>> live it times out immediately, causing the replica to reconnect.
>>
>> The problem is amplified by the fact that replica waits for
>> replication_timeout to pass prior to reconnecting, which lets master
>> pile up even more ready WALs, and effectively making it impossible for
>> the replica to sync.
> I couldn't understand this part. Why is it bad? Yeah, replica waits,
> but replica is applier, on another instance. How is it related? And
> relay_reader does not send anything. So why is it bad?

Thanks for the review!

I shouldn't have included this paragraph to the explanation probably.
I tried to explain how this bug leads to replica not being able to sync
with master when master's under load.

I reworded the commit message a bit, hope it's more clear now.

>
> Couldn't the problem be fixed by reading all the non-consumed data after
> reading WAL?

Relay does read every ack received while feeding a WAL, but it reads the 
acks only
after finishing reading WAL, so all the reads time-out.

>
> The current solution also looks fine. Maybe even better because it
> becomes consistent with local recovery. However I still want to
> understand this part about replica.
>
>> To fix the problem let's yield explicitly in relay_send_row every
>> WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
>> serves the same purpose: to not block the event loop for too long.
>>
>> Closes #5762
>> ---
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index df04f8198..afc57dfbc 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>>   {
>>   	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>>   
>> +	static uint64_t row_cnt = 0;
> Relays are in threads. So this variable either should be thread-local,
> or be in struct relay. Otherwise you get non-atomic updates which may
> lead to some increments disappearing.
>
> Given that thread-local variable access is not free, I would go for
> having it in struct relay, but up to you.

Thanks for noticing! Let it be in relay then.
Diff:

================================================
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 1d8edf116..6d9269e1d 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -117,6 +117,11 @@ struct relay {
          * is passed by the replica on subscribe.
          */
         uint32_t id_filter;
+       /**
+        * How many rows has this relay sent to the replica. Used to 
yield once
+        * in a while when reading a WAL to unblock the event loop.
+        */
+       size_t row_cnt;
         /**
          * Local vclock at the moment of subscribe, used to check
          * dataset on the other side and send missing data rows if any.
@@ -218,6 +223,7 @@ relay_start(struct relay *relay, int fd, uint64_t sync,
         coio_create(&relay->io, fd);
         relay->sync = sync;
         relay->state = RELAY_FOLLOW;
+       relay->row_cnt = 0;
         relay->last_row_time = ev_monotonic_now(loop());
  }

@@ -836,7 +842,6 @@ relay_send(struct relay *relay, struct xrow_header 
*packet)
  {
         ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);

-       static size_t row_cnt = 0;
         packet->sync = relay->sync;
         relay->last_row_time = ev_monotonic_now(loop());
         coio_write_xrow(&relay->io, packet);
@@ -846,7 +851,7 @@ relay_send(struct relay *relay, struct xrow_header 
*packet)
          * It may happen that the socket is always ready for write, so 
yield
          * explicitly every now and then to not block the event loop.
          */
-       if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
+       if (++relay->row_cnt % WAL_ROWS_PER_YIELD == 0)
                 fiber_sleep(0);

         struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);



>
>>   	packet->sync = relay->sync;
>>   	relay->last_row_time = ev_monotonic_now(loop());
>>   	coio_write_xrow(&relay->io, packet);
>>   	fiber_gc();
>>   
>> +	/*
>> +	 * It may happen that the socket is always ready for write, so yield
>> +	 * explicitly every now and then to not block the event loop.
>> +	 */
>> +	row_cnt++;
>> +	if (row_cnt % WAL_ROWS_PER_YIELD == 0) {
>> +		fiber_sleep(0);
>> +	}
> Maybe better drop {} as the if's body is just one line.
Already fixed in reply to Cyrill.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 22:25   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-15  8:45     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-15  8:45 UTC (permalink / raw)
  To: Cyrill Gorcunov, Vladislav Shpilevoy; +Cc: tarantool-patches



13.02.2021 01:25, Cyrill Gorcunov пишет:
> On Fri, Feb 12, 2021 at 10:48:49PM +0100, Vladislav Shpilevoy wrote:
>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>> index df04f8198..afc57dfbc 100644
>>> --- a/src/box/relay.cc
>>> +++ b/src/box/relay.cc
>>> @@ -836,11 +836,20 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>>>   {
>>>   	ERROR_INJECT_YIELD(ERRINJ_RELAY_SEND_DELAY);
>>>   
>>> +	static uint64_t row_cnt = 0;
>> Relays are in threads. So this variable either should be thread-local,
>> or be in struct relay. Otherwise you get non-atomic updates which may
>> lead to some increments disappearing.
> That's the good catch! Without lock/tls this gonna be completely
> arbritrary updates.

True. My bad I failed to notice this.

>
>> Given that thread-local variable access is not free, I would go for
>> having it in struct relay, but up to you.
> Actually tls access should be as cheap as regular memory access
> except using different base register (iirc %fs on linux). But
> maybe things are changed novadays.

I found some random article regarding tls access cost:
https://software.intel.com/content/www/us/en/develop/blogs/the-hidden-performance-cost-of-accessing-thread-local-variables.html

So it might be not that cheap. However I didn't dive too deep into the 
article.
I decided to implement the counter as relay member for now.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-15  8:40   ` Serge Petrenko via Tarantool-patches
@ 2021-02-17 21:11     ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-18 20:24       ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-17 21:11 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Hi! Thanks for the fixes!

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 1d8edf116..6d9269e1d 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -117,6 +117,11 @@ struct relay {
>          * is passed by the replica on subscribe.
>          */
>         uint32_t id_filter;
> +       /**
> +        * How many rows has this relay sent to the replica. Used to yield once
> +        * in a while when reading a WAL to unblock the event loop.
> +        */
> +       size_t row_cnt;

1. But it is not a size of anything, right? Maybe make it
int64_t then?

> @@ -846,7 +851,7 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>          * It may happen that the socket is always ready for write, so yield
>          * explicitly every now and then to not block the event loop.
>          */
> -       if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
> +       if (++relay->row_cnt % WAL_ROWS_PER_YIELD == 0)

2. I just found WAL_ROWS_PER_YIELD is not a power of 2. This means
'%' may be expensive. If WAL_ROWS_PER_YIELD would be a power of 2,
'%' would be turned by the compiler into '&' to cut the remainder off.

Maybe worth changing in a separate non-related to this bug commit?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-17 21:11     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-18 20:24       ` Serge Petrenko via Tarantool-patches
  2021-02-23 22:30         ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-18 20:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches



18.02.2021 00:11, Vladislav Shpilevoy пишет:
> Hi! Thanks for the fixes!

Thanks for the review!

>
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 1d8edf116..6d9269e1d 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -117,6 +117,11 @@ struct relay {
>>           * is passed by the replica on subscribe.
>>           */
>>          uint32_t id_filter;
>> +       /**
>> +        * How many rows has this relay sent to the replica. Used to yield once
>> +        * in a while when reading a WAL to unblock the event loop.
>> +        */
>> +       size_t row_cnt;
> 1. But it is not a size of anything, right? Maybe make it
> int64_t then?

uint64_t, probably?
I'm fine with it. Fixed on the branch.


>
>> @@ -846,7 +851,7 @@ relay_send(struct relay *relay, struct xrow_header *packet)
>>           * It may happen that the socket is always ready for write, so yield
>>           * explicitly every now and then to not block the event loop.
>>           */
>> -       if (++row_cnt % WAL_ROWS_PER_YIELD == 0)
>> +       if (++relay->row_cnt % WAL_ROWS_PER_YIELD == 0)
> 2. I just found WAL_ROWS_PER_YIELD is not a power of 2. This means
> '%' may be expensive. If WAL_ROWS_PER_YIELD would be a power of 2,
> '%' would be turned by the compiler into '&' to cut the remainder off.
>
> Maybe worth changing in a separate non-related to this bug commit?

Good idea, thanks!
Pushed on top of the original commit.


-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-18 20:24       ` Serge Petrenko via Tarantool-patches
@ 2021-02-23 22:30         ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-24  9:48           ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-23 22:30 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Thanks for the patch!

>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>> index 1d8edf116..6d9269e1d 100644
>>> --- a/src/box/relay.cc
>>> +++ b/src/box/relay.cc
>>> @@ -117,6 +117,11 @@ struct relay {
>>>           * is passed by the replica on subscribe.
>>>           */
>>>          uint32_t id_filter;
>>> +       /**
>>> +        * How many rows has this relay sent to the replica. Used to yield once
>>> +        * in a while when reading a WAL to unblock the event loop.
>>> +        */
>>> +       size_t row_cnt;

Name is a bit ugly, because all the other members are not
contractions. They use full words. But up to you.

>> 1. But it is not a size of anything, right? Maybe make it
>> int64_t then?
> 
> uint64_t, probably?

Nope, int64_t. It is supposed to be 'faster'. Because it does
not have defined overflow rules, and therefore the hardware does
not need to handle it.

But honestly, I didn't measure. For me it is more cargo cult. I
just use signed integers where I can assuming that the hardware
really may omit an instruction or so.

Up to you.

The patch about power of 2 worked btw. This is how it looks now:

	andq   $0x7fff, %rcx             ; imm = 0x7FFF

this is how it used to look:

	movl   $0x7d00, %ecx             ; imm = 0x7D00 
	divq   %rcx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-23 22:30         ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-24  9:48           ` Serge Petrenko via Tarantool-patches
  2021-02-24 10:15             ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-24  9:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches



24.02.2021 01:30, Vladislav Shpilevoy пишет:
> Thanks for the patch!
Thanks for the reivew!

>>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>>> index 1d8edf116..6d9269e1d 100644
>>>> --- a/src/box/relay.cc
>>>> +++ b/src/box/relay.cc
>>>> @@ -117,6 +117,11 @@ struct relay {
>>>>            * is passed by the replica on subscribe.
>>>>            */
>>>>           uint32_t id_filter;
>>>> +       /**
>>>> +        * How many rows has this relay sent to the replica. Used to yield once
>>>> +        * in a while when reading a WAL to unblock the event loop.
>>>> +        */
>>>> +       size_t row_cnt;
> Name is a bit ugly, because all the other members are not
> contractions. They use full words. But up to you.

No problem, changed to `row_count`.
>>> 1. But it is not a size of anything, right? Maybe make it
>>> int64_t then?
>> uint64_t, probably?
> Nope, int64_t. It is supposed to be 'faster'. Because it does
> not have defined overflow rules, and therefore the hardware does
> not need to handle it.
>
> But honestly, I didn't measure. For me it is more cargo cult. I
> just use signed integers where I can assuming that the hardware
> really may omit an instruction or so.
>
> Up to you.

Long story short, I'd like to leave it as is. Besides, we have an 
unsigned type
(size_t) in local recovery.

Ok, now I see what you meant.
I never thought of this, and brief googling showed no signs of a speedup
with signed arithmetic vs unsigned.

Actually, the standard says signed overflow is an undefined behaviour while
an unsigned overflow should result in a wrap (modulo 2^64 in our case).

Do you think this wrap may be costly on some architecture?
>
> The patch about power of 2 worked btw. This is how it looks now:
>
> 	andq   $0x7fff, %rcx             ; imm = 0x7FFF
>
> this is how it used to look:
>
> 	movl   $0x7d00, %ecx             ; imm = 0x7D00
> 	divq   %rcx

Yeah, that was a nice catch.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-24  9:48           ` Serge Petrenko via Tarantool-patches
@ 2021-02-24 10:15             ` Cyrill Gorcunov via Tarantool-patches
  2021-02-24 10:35               ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-24 10:15 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tarantool-patches

On Wed, Feb 24, 2021 at 12:48:07PM +0300, Serge Petrenko wrote:
...
> > > > 1. But it is not a size of anything, right? Maybe make it
> > > > int64_t then?
> > > uint64_t, probably?
> > Nope, int64_t. It is supposed to be 'faster'. Because it does
> > not have defined overflow rules, and therefore the hardware does
> > not need to handle it.
> > 
> > But honestly, I didn't measure. For me it is more cargo cult. I
> > just use signed integers where I can assuming that the hardware
> > really may omit an instruction or so.
> > 
> > Up to you.
> 
> Long story short, I'd like to leave it as is. Besides, we have an unsigned
> type (size_t) in local recovery.
> 
> Ok, now I see what you meant.
> I never thought of this, and brief googling showed no signs of a speedup
> with signed arithmetic vs unsigned.
> 
> Actually, the standard says signed overflow is an undefined behaviour while
> an unsigned overflow should result in a wrap (modulo 2^64 in our case).
> 
> Do you think this wrap may be costly on some architecture?

Addition on hardware level _always_ setup OF/CF flags so in this term it
doesn't matter which to use int64 or uint64 (iow hw treats addition as
signed integers all the time and it is up you how you gonna use this
flags in later code).

What is more interesting is that older compilers (at least gcc) generate
more efficient code for *signed* integers, ie int64_t. I've been pretty
surprised when discovered this. I dont get you a reference because I
don't remember the exact versions though. In summary: rule of thumb
is to use signed integers if you really need some addition in a cycle.
For more rare access unsigned is more that enough and latest gcc already
can handle it the same way as signed numbers.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-24 10:15             ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-24 10:35               ` Serge Petrenko via Tarantool-patches
  2021-02-24 12:07                 ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-24 10:35 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vladislav Shpilevoy, tarantool-patches



24.02.2021 13:15, Cyrill Gorcunov пишет:
> On Wed, Feb 24, 2021 at 12:48:07PM +0300, Serge Petrenko wrote:
> ...
>>>>> 1. But it is not a size of anything, right? Maybe make it
>>>>> int64_t then?
>>>> uint64_t, probably?
>>> Nope, int64_t. It is supposed to be 'faster'. Because it does
>>> not have defined overflow rules, and therefore the hardware does
>>> not need to handle it.
>>>
>>> But honestly, I didn't measure. For me it is more cargo cult. I
>>> just use signed integers where I can assuming that the hardware
>>> really may omit an instruction or so.
>>>
>>> Up to you.
>> Long story short, I'd like to leave it as is. Besides, we have an unsigned
>> type (size_t) in local recovery.
>>
>> Ok, now I see what you meant.
>> I never thought of this, and brief googling showed no signs of a speedup
>> with signed arithmetic vs unsigned.
>>
>> Actually, the standard says signed overflow is an undefined behaviour while
>> an unsigned overflow should result in a wrap (modulo 2^64 in our case).
>>
>> Do you think this wrap may be costly on some architecture?
> Addition on hardware level _always_ setup OF/CF flags so in this term it
> doesn't matter which to use int64 or uint64 (iow hw treats addition as
> signed integers all the time and it is up you how you gonna use this
> flags in later code).
>
> What is more interesting is that older compilers (at least gcc) generate
> more efficient code for *signed* integers, ie int64_t. I've been pretty
> surprised when discovered this. I dont get you a reference because I
> don't remember the exact versions though. In summary: rule of thumb
> is to use signed integers if you really need some addition in a cycle.
> For more rare access unsigned is more that enough and latest gcc already
> can handle it the same way as signed numbers.

Thanks for your answer!
What do you propose? Should I hange it to int64_t then?

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-24 10:35               ` Serge Petrenko via Tarantool-patches
@ 2021-02-24 12:07                 ` Cyrill Gorcunov via Tarantool-patches
  2021-02-24 12:14                   ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 20+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-02-24 12:07 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tarantool-patches

On Wed, Feb 24, 2021 at 01:35:49PM +0300, Serge Petrenko wrote:
> 
> Thanks for your answer!
> What do you propose? Should I hange it to int64_t then?

Yeah, better to stick with int64_t

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-24 12:07                 ` Cyrill Gorcunov via Tarantool-patches
@ 2021-02-24 12:14                   ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 20+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-02-24 12:14 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vladislav Shpilevoy, tarantool-patches



24.02.2021 15:07, Cyrill Gorcunov пишет:
> On Wed, Feb 24, 2021 at 01:35:49PM +0300, Serge Petrenko wrote:
>> Thanks for your answer!
>> What do you propose? Should I hange it to int64_t then?
> Yeah, better to stick with int64_t

Ok, no problem.

Changed to int64_t.


-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:25 [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows Serge Petrenko via Tarantool-patches
  2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
  2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-24 22:20 ` Vladislav Shpilevoy via Tarantool-patches
  2021-02-26  8:41 ` Kirill Yukhin via Tarantool-patches
  3 siblings, 0 replies; 20+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-24 22:20 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Hi! Thanks for the patch!

LGTM.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-12 11:25 [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows Serge Petrenko via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-02-24 22:20 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-26  8:41 ` Kirill Yukhin via Tarantool-patches
  2021-02-26 20:24   ` Vladislav Shpilevoy via Tarantool-patches
  3 siblings, 1 reply; 20+ messages in thread
From: Kirill Yukhin via Tarantool-patches @ 2021-02-26  8:41 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

Hello,

On 12 фев 14:25, Serge Petrenko via Tarantool-patches wrote:
> While sending a WAL, relay only yields in `coio_write_xrow`, once it
> sees the socket isn't ready for writes.
> It may happen that the socket is always ready for a long period of time,
> and relay doesn't yield at all while recovering a whole .xlog file. This
> may take well more than a minute.
> During this period of time, relay doesn't read replica's ACKs due to
> relay reader fiber not being scheduled, and once the reader is finally
> live it times out immediately, causing the replica to reconnect.
> 
> The problem is amplified by the fact that replica waits for
> replication_timeout to pass prior to reconnecting, which lets master
> pile up even more ready WALs, and effectively making it impossible for
> the replica to sync.
> 
> To fix the problem let's yield explicitly in relay_send_row every
> WAL_ROWS_PER_YIELD rows. The same is already done in local recovery, and
> serves the same purpose: to not block the event loop for too long.
> 
> Closes #5762
> ---
> No test provided as the fix is quite obvious but rather hard to test
> automatically.
> https://github.com/tarantool/tarantool/issues/5762
> https://github.com/tarantool/tarantool/tree/sp/gh-5762-relay-yield

I've checked your patch into 2.6, 2.6 and master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows
  2021-02-26  8:41 ` Kirill Yukhin via Tarantool-patches
@ 2021-02-26 20:24   ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 20+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-26 20:24 UTC (permalink / raw)
  To: Kirill Yukhin, Serge Petrenko; +Cc: tarantool-patches

We forgot the changelog file.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-02-26 20:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-12 11:25 [Tarantool-patches] [PATCH] relay: yield explicitly every N sent rows Serge Petrenko via Tarantool-patches
2021-02-12 11:37 ` Cyrill Gorcunov via Tarantool-patches
2021-02-12 11:46   ` Cyrill Gorcunov via Tarantool-patches
2021-02-12 12:08   ` Serge Petrenko via Tarantool-patches
2021-02-12 17:00     ` Cyrill Gorcunov via Tarantool-patches
2021-02-12 21:48 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-12 22:25   ` Cyrill Gorcunov via Tarantool-patches
2021-02-15  8:45     ` Serge Petrenko via Tarantool-patches
2021-02-15  8:40   ` Serge Petrenko via Tarantool-patches
2021-02-17 21:11     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-18 20:24       ` Serge Petrenko via Tarantool-patches
2021-02-23 22:30         ` Vladislav Shpilevoy via Tarantool-patches
2021-02-24  9:48           ` Serge Petrenko via Tarantool-patches
2021-02-24 10:15             ` Cyrill Gorcunov via Tarantool-patches
2021-02-24 10:35               ` Serge Petrenko via Tarantool-patches
2021-02-24 12:07                 ` Cyrill Gorcunov via Tarantool-patches
2021-02-24 12:14                   ` Serge Petrenko via Tarantool-patches
2021-02-24 22:20 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-26  8:41 ` Kirill Yukhin via Tarantool-patches
2021-02-26 20:24   ` Vladislav Shpilevoy via Tarantool-patches

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git