Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code
@ 2020-07-14 14:44 Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

A few patches while being working on #5143

branch gorcunov/gh-5143-ack

Cyrill Gorcunov (4):
  qsync: add missing functions descriptions
  qsync: drop txn_limbo_confirm_timeout
  qsync: txn_limbo_wait_complete -- fix type conversion
  qsync: don't send negative timeouts into fiber_cond_wait_timeout

 src/box/box.cc      |  2 +-
 src/box/txn_limbo.c | 20 ++++++++------------
 src/box/txn_limbo.h |  9 ++++++---
 3 files changed, 15 insertions(+), 16 deletions(-)


base-commit: a9b99f0efa4a0be6b85a0e0ba1811300cdbec726
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions
  2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout Cyrill Gorcunov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/txn_limbo.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index f88ec2bcc..918db2263 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -252,9 +252,15 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo);
 void
 txn_limbo_force_empty(struct txn_limbo *limbo, int64_t last_confirm);
 
+/**
+ * Update qsync parameters dynamically.
+ */
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo);
 
+/**
+ * Initialize qsync engine.
+ */
 void
 txn_limbo_init();
 
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout
  2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion Cyrill Gorcunov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

We already use replication_synchro_quorum in code,
no need for this wrap.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/box.cc      |  2 +-
 src/box/txn_limbo.c | 11 ++---------
 src/box/txn_limbo.h |  3 ---
 3 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 884061a81..7a7befc32 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -959,7 +959,7 @@ box_clear_synchro_queue(void)
 		return;
 
 	/* Wait until pending confirmations/rollbacks reach us. */
-	double timeout = 2 * txn_limbo_confirm_timeout(&txn_limbo);
+	double timeout = 2 * replication_synchro_timeout;
 	double start_tm = fiber_time();
 	while (!txn_limbo_is_empty(&txn_limbo)) {
 		if (fiber_time() - start_tm > timeout)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 9623b71cf..15dbe6515 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -173,7 +173,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
 	double start_time = fiber_clock();
 	while (true) {
-		double deadline = start_time + txn_limbo_confirm_timeout(limbo);
+		double deadline = start_time + replication_synchro_timeout;
 		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
 		bool timed_out = fiber_cond_wait_timeout(&limbo->wait_cond,
@@ -450,13 +450,6 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	}
 }
 
-double
-txn_limbo_confirm_timeout(struct txn_limbo *limbo)
-{
-	(void)limbo;
-	return replication_synchro_timeout;
-}
-
 /**
  * Waitpoint stores information about the progress of confirmation.
  * In the case of multimaster support, it will store a bitset
@@ -516,7 +509,7 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
 	txn_on_rollback(tle->txn, &on_rollback);
 	double start_time = fiber_clock();
 	while (true) {
-		double deadline = start_time + txn_limbo_confirm_timeout(limbo);
+		double deadline = start_time + replication_synchro_timeout;
 		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 918db2263..8cfb490c3 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -232,9 +232,6 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
 void
 txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn);
 
-double
-txn_limbo_confirm_timeout(struct txn_limbo *limbo);
-
 /**
  * Waiting for confirmation of all "sync" transactions
  * during confirm timeout or fail.
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion
  2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
  2020-07-20 19:28 ` [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Vladislav Shpilevoy
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

In txn_limbo_wait_confirm we already use proper int type
(as declared in fiber_cond_wait_timeout) thus lets do the
same here.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/txn_limbo.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 15dbe6515..d5b887d36 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -176,12 +176,11 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 		double deadline = start_time + replication_synchro_timeout;
 		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
-		bool timed_out = fiber_cond_wait_timeout(&limbo->wait_cond,
-							 timeout);
+		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
 		fiber_set_cancellable(cancellable);
 		if (txn_limbo_entry_is_complete(entry))
 			goto complete;
-		if (timed_out)
+		if (rc != 0)
 			goto do_rollback;
 	}
 
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
  2020-07-14 14:53   ` [Tarantool-patches] [PATCH v2 " Cyrill Gorcunov
  2020-07-20 19:28 ` [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Vladislav Shpilevoy
  4 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

Basically our timeout is calculated via (a - b), where
@a is a constant positive value fetched once, in turn
the @b is rather a dynamic value thus the result may
be negative. libev uses assert() call to catch such
values when passed to timers setup. Thus lets intercept
potential assert() trigger and exit early if timeout
is already expired.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/txn_limbo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d5b887d36..3ed0b6db5 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -176,6 +176,8 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 		double deadline = start_time + replication_synchro_timeout;
 		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
+		if (timeout < 0)
+			goto do_rollback;
 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
 		fiber_set_cancellable(cancellable);
 		if (txn_limbo_entry_is_complete(entry))
@@ -511,6 +513,8 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
 		double deadline = start_time + replication_synchro_timeout;
 		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
+		if (timeout < 0)
+			goto timed_out;
 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
 		fiber_set_cancellable(cancellable);
 		if (cwp.is_confirm || cwp.is_rollback)
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
@ 2020-07-14 14:53   ` Cyrill Gorcunov
  2020-07-16 22:45     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:53 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

Basically our timeout is calculated via (a - b), where
@a is a constant positive value fetched once, in turn
the @b is rather a dynamic value thus the result may
be negative. libev uses assert() call to catch such
values when passed to timers setup. Thus lets intercept
potential assert() trigger and exit early if timeout
is already expired.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
There were a typo, so I force-updated the branch

 src/box/txn_limbo.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d5b887d36..0924952b7 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	double start_time = fiber_clock();
 	while (true) {
 		double deadline = start_time + replication_synchro_timeout;
-		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
+		if (timeout < 0)
+			goto do_rollback;
+		bool cancellable = fiber_set_cancellable(false);
 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
 		fiber_set_cancellable(cancellable);
 		if (txn_limbo_entry_is_complete(entry))
@@ -509,8 +511,10 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
 	double start_time = fiber_clock();
 	while (true) {
 		double deadline = start_time + replication_synchro_timeout;
-		bool cancellable = fiber_set_cancellable(false);
 		double timeout = deadline - fiber_clock();
+		if (timeout < 0)
+			goto timed_out;
+		bool cancellable = fiber_set_cancellable(false);
 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
 		fiber_set_cancellable(cancellable);
 		if (cwp.is_confirm || cwp.is_rollback)
-- 
2.26.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-14 14:53   ` [Tarantool-patches] [PATCH v2 " Cyrill Gorcunov
@ 2020-07-16 22:45     ` Vladislav Shpilevoy
  2020-07-17  7:25       ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16 22:45 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

Thanks for the patch!

On 14.07.2020 16:53, Cyrill Gorcunov wrote:
> Basically our timeout is calculated via (a - b), where
> @a is a constant positive value fetched once, in turn
> the @b is rather a dynamic value thus the result may
> be negative. libev uses assert() call to catch such
> values when passed to timers setup. Thus lets intercept
> potential assert() trigger and exit early if timeout
> is already expired.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> There were a typo, so I force-updated the branch
> 
>  src/box/txn_limbo.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index d5b887d36..0924952b7 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  	double start_time = fiber_clock();
>  	while (true) {
>  		double deadline = start_time + replication_synchro_timeout;
> -		bool cancellable = fiber_set_cancellable(false);
>  		double timeout = deadline - fiber_clock();
> +		if (timeout < 0)
> +			goto do_rollback;
> +		bool cancellable = fiber_set_cancellable(false);

I added timeout = -1; here and tried to commit a sync transaction. I
got timed out error, no assertions. Please, tell me, how to reproduce
the assertion you mention in the commit message. Otherwise I don't see
why would we need the < 0 check. If it is done somewhere inside
fiber_cond_wait_timeout anyway.

>  		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
>  		fiber_set_cancellable(cancellable);
>  		if (txn_limbo_entry_is_complete(entry))
> @@ -509,8 +511,10 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
>  	double start_time = fiber_clock();
>  	while (true) {
>  		double deadline = start_time + replication_synchro_timeout;
> -		bool cancellable = fiber_set_cancellable(false);
>  		double timeout = deadline - fiber_clock();
> +		if (timeout < 0)
> +			goto timed_out;
> +		bool cancellable = fiber_set_cancellable(false);
>  		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
>  		fiber_set_cancellable(cancellable);
>  		if (cwp.is_confirm || cwp.is_rollback)
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-16 22:45     ` Vladislav Shpilevoy
@ 2020-07-17  7:25       ` Cyrill Gorcunov
  2020-07-17 20:03         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17  7:25 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 17, 2020 at 12:45:16AM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> On 14.07.2020 16:53, Cyrill Gorcunov wrote:
> > Basically our timeout is calculated via (a - b), where
> > @a is a constant positive value fetched once, in turn
> > the @b is rather a dynamic value thus the result may
> > be negative. libev uses assert() call to catch such
> > values when passed to timers setup. Thus lets intercept
> > potential assert() trigger and exit early if timeout
> > is already expired.
> > 
> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> > ---
> > There were a typo, so I force-updated the branch
> > 
> >  src/box/txn_limbo.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> > index d5b887d36..0924952b7 100644
> > --- a/src/box/txn_limbo.c
> > +++ b/src/box/txn_limbo.c
> > @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> >  	double start_time = fiber_clock();
> >  	while (true) {
> >  		double deadline = start_time + replication_synchro_timeout;
> > -		bool cancellable = fiber_set_cancellable(false);
> >  		double timeout = deadline - fiber_clock();
> > +		if (timeout < 0)
> > +			goto do_rollback;
> > +		bool cancellable = fiber_set_cancellable(false);
> 
> I added timeout = -1; here and tried to commit a sync transaction. I
> got timed out error, no assertions. Please, tell me, how to reproduce
> the assertion you mention in the commit message. Otherwise I don't see
> why would we need the < 0 check. If it is done somewhere inside
> fiber_cond_wait_timeout anyway.

Look, here is libev code

---
noinline
void
ev_timer_start (EV_P_ ev_timer *w) EV_THROW
{
  if (expect_false (ev_is_active (w)))
    return;

  ev_at (w) += mn_now;

  assert (("libev: ev_timer_start called with negative timer repeat value", w->repeat >= 0.));
---

To trigger this assert we have to enter idle cycle, then manually change
replication_synchro_timeout via cfg (make it less than it was initially)
which should lead to negative timeout.

To be fair I don't know how to force it without error injection. Lets
assume we have initial fiber clock 1 (the clocks are increasing everytime
libev does a new polling cycle). Thus

	// replication_synchro_timeout = 2

	double start_time = fiber_clock();	// 1
	while (true) {
		double deadline = start_time + replication_synchro_timeout;
		// => 3
		double timeout = deadline - fiber_clock();
		// => 2
		bool cancellable = fiber_set_cancellable(false);
		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
		//
		// enter into resched cycle
		// manually change replication_synchro_timeout to 1
		// next cycle starts

		double deadline = start_time + replication_synchro_timeout;
		// start_time didn't changed
		// => 1 + 1 = 2
		double timeout = deadline - fiber_clock();
		// assume several resched cycles passed, thus
		// fiber_clock returns 3, thus
		// => 2 - 3 => -1
	...

Am I missing something obvious here?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-17  7:25       ` Cyrill Gorcunov
@ 2020-07-17 20:03         ` Vladislav Shpilevoy
  2020-07-17 20:33           ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-17 20:03 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>> On 14.07.2020 16:53, Cyrill Gorcunov wrote:
>>> Basically our timeout is calculated via (a - b), where
>>> @a is a constant positive value fetched once, in turn
>>> the @b is rather a dynamic value thus the result may
>>> be negative. libev uses assert() call to catch such
>>> values when passed to timers setup. Thus lets intercept
>>> potential assert() trigger and exit early if timeout
>>> is already expired.
>>>
>>> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>>> ---
>>> There were a typo, so I force-updated the branch
>>>
>>>  src/box/txn_limbo.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
>>> index d5b887d36..0924952b7 100644
>>> --- a/src/box/txn_limbo.c
>>> +++ b/src/box/txn_limbo.c
>>> @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>>>  	double start_time = fiber_clock();
>>>  	while (true) {
>>>  		double deadline = start_time + replication_synchro_timeout;
>>> -		bool cancellable = fiber_set_cancellable(false);
>>>  		double timeout = deadline - fiber_clock();
>>> +		if (timeout < 0)
>>> +			goto do_rollback;
>>> +		bool cancellable = fiber_set_cancellable(false);
>>
>> I added timeout = -1; here and tried to commit a sync transaction. I
>> got timed out error, no assertions. Please, tell me, how to reproduce
>> the assertion you mention in the commit message. Otherwise I don't see
>> why would we need the < 0 check. If it is done somewhere inside
>> fiber_cond_wait_timeout anyway.
> 
> Look, here is libev code
> 
> ---
> noinline
> void
> ev_timer_start (EV_P_ ev_timer *w) EV_THROW
> {
>   if (expect_false (ev_is_active (w)))
>     return;
> 
>   ev_at (w) += mn_now;
> 
>   assert (("libev: ev_timer_start called with negative timer repeat value", w->repeat >= 0.));
> ---
> 
> To trigger this assert we have to enter idle cycle, then manually change
> replication_synchro_timeout via cfg (make it less than it was initially)
> which should lead to negative timeout.

Yeah, well. We set w->after. Not w->repeat.

> To be fair I don't know how to force it without error injection. Lets
> assume we have initial fiber clock 1 (the clocks are increasing everytime
> libev does a new polling cycle). Thus
> 
> 	// replication_synchro_timeout = 2
> 
> 	double start_time = fiber_clock();	// 1
> 	while (true) {
> 		double deadline = start_time + replication_synchro_timeout;
> 		// => 3
> 		double timeout = deadline - fiber_clock();
> 		// => 2
> 		bool cancellable = fiber_set_cancellable(false);
> 		int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
> 		//
> 		// enter into resched cycle
> 		// manually change replication_synchro_timeout to 1
> 		// next cycle starts
> 
> 		double deadline = start_time + replication_synchro_timeout;
> 		// start_time didn't changed
> 		// => 1 + 1 = 2
> 		double timeout = deadline - fiber_clock();
> 		// assume several resched cycles passed, thus
> 		// fiber_clock returns 3, thus
> 		// => 2 - 3 => -1
> 	...
> 
> Am I missing something obvious here?

I still don't see how to trigger this assert. Please, show a code sample with
error injection, if you think that will help. I explicitly added line
'timeout = -1;' and all worked fine.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-17 20:03         ` Vladislav Shpilevoy
@ 2020-07-17 20:33           ` Cyrill Gorcunov
  2020-07-17 20:43             ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17 20:33 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jul 17, 2020 at 10:03:49PM +0200, Vladislav Shpilevoy wrote:
> > 
> > Am I missing something obvious here?
> 
> I still don't see how to trigger this assert. Please, show a code sample with
> error injection, if you think that will help. I explicitly added line
> 'timeout = -1;' and all worked fine.
>
---
[cyrill@grain tarantool.git] git diff
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 0924952b7..50e6326b3 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -175,8 +175,9 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
        while (true) {
                double deadline = start_time + replication_synchro_timeout;
                double timeout = deadline - fiber_clock();
-               if (timeout < 0)
-                       goto do_rollback;
+               timeout = -1;
+               //if (timeout < 0)
+               //      goto do_rollback;
                bool cancellable = fiber_set_cancellable(false);
                int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
                fiber_set_cancellable(cancellable);
@@ -512,8 +513,9 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
        while (true) {
                double deadline = start_time + replication_synchro_timeout;
                double timeout = deadline - fiber_clock();
-               if (timeout < 0)
-                       goto timed_out;
+               timeout = -1;
+               //if (timeout < 0)
+               //      goto timed_out;
                bool cancellable = fiber_set_cancellable(false);
                int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
                fiber_set_cancellable(cancellable);
---
[cyrill@grain test] ./test-run.py replication/qsync_basic.test.lua 
...
[001] [Instance "master" killed by signal: 6 (SIGABRT)]
[001] Found assertion fail in the results file [/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.log]:

You did the same thing? Note that it requires debug build. 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
  2020-07-17 20:33           ` Cyrill Gorcunov
@ 2020-07-17 20:43             ` Cyrill Gorcunov
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17 20:43 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tml

On Fri, Jul 17, 2020 at 11:33:36PM +0300, Cyrill Gorcunov wrote:
> On Fri, Jul 17, 2020 at 10:03:49PM +0200, Vladislav Shpilevoy wrote:
> > > 
> > > Am I missing something obvious here?
> > 
> > I still don't see how to trigger this assert. Please, show a code sample with
> > error injection, if you think that will help. I explicitly added line
> > 'timeout = -1;' and all worked fine.
> >
> ---
> [cyrill@grain test] ./test-run.py replication/qsync_basic.test.lua 
> ...
> You did the same thing? Note that it requires debug build. 

Actually it is different issue. I think you can safely drop this
patch.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code
  2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
@ 2020-07-20 19:28 ` Vladislav Shpilevoy
  4 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-20 19:28 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

Hi! Thanks for the patchset!

First 3 commits are pushed to master.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-20 19:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
2020-07-14 14:53   ` [Tarantool-patches] [PATCH v2 " Cyrill Gorcunov
2020-07-16 22:45     ` Vladislav Shpilevoy
2020-07-17  7:25       ` Cyrill Gorcunov
2020-07-17 20:03         ` Vladislav Shpilevoy
2020-07-17 20:33           ` Cyrill Gorcunov
2020-07-17 20:43             ` Cyrill Gorcunov
2020-07-20 19:28 ` [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox