* [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout Cyrill Gorcunov
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/txn_limbo.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index f88ec2bcc..918db2263 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -252,9 +252,15 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo);
void
txn_limbo_force_empty(struct txn_limbo *limbo, int64_t last_confirm);
+/**
+ * Update qsync parameters dynamically.
+ */
void
txn_limbo_on_parameters_change(struct txn_limbo *limbo);
+/**
+ * Initialize qsync engine.
+ */
void
txn_limbo_init();
--
2.26.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion Cyrill Gorcunov
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
We already use replication_synchro_quorum in code,
no need for this wrap.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/box.cc | 2 +-
src/box/txn_limbo.c | 11 ++---------
src/box/txn_limbo.h | 3 ---
3 files changed, 3 insertions(+), 13 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 884061a81..7a7befc32 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -959,7 +959,7 @@ box_clear_synchro_queue(void)
return;
/* Wait until pending confirmations/rollbacks reach us. */
- double timeout = 2 * txn_limbo_confirm_timeout(&txn_limbo);
+ double timeout = 2 * replication_synchro_timeout;
double start_tm = fiber_time();
while (!txn_limbo_is_empty(&txn_limbo)) {
if (fiber_time() - start_tm > timeout)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 9623b71cf..15dbe6515 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -173,7 +173,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
assert(txn_has_flag(txn, TXN_WAIT_SYNC));
double start_time = fiber_clock();
while (true) {
- double deadline = start_time + txn_limbo_confirm_timeout(limbo);
+ double deadline = start_time + replication_synchro_timeout;
bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
bool timed_out = fiber_cond_wait_timeout(&limbo->wait_cond,
@@ -450,13 +450,6 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
}
}
-double
-txn_limbo_confirm_timeout(struct txn_limbo *limbo)
-{
- (void)limbo;
- return replication_synchro_timeout;
-}
-
/**
* Waitpoint stores information about the progress of confirmation.
* In the case of multimaster support, it will store a bitset
@@ -516,7 +509,7 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
txn_on_rollback(tle->txn, &on_rollback);
double start_time = fiber_clock();
while (true) {
- double deadline = start_time + txn_limbo_confirm_timeout(limbo);
+ double deadline = start_time + replication_synchro_timeout;
bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 918db2263..8cfb490c3 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -232,9 +232,6 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
void
txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn);
-double
-txn_limbo_confirm_timeout(struct txn_limbo *limbo);
-
/**
* Waiting for confirmation of all "sync" transactions
* during confirm timeout or fail.
--
2.26.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 1/4] qsync: add missing functions descriptions Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 2/4] qsync: drop txn_limbo_confirm_timeout Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
2020-07-20 19:28 ` [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Vladislav Shpilevoy
4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
In txn_limbo_wait_confirm we already use proper int type
(as declared in fiber_cond_wait_timeout) thus lets do the
same here.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/txn_limbo.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 15dbe6515..d5b887d36 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -176,12 +176,11 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
double deadline = start_time + replication_synchro_timeout;
bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
- bool timed_out = fiber_cond_wait_timeout(&limbo->wait_cond,
- timeout);
+ int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
if (txn_limbo_entry_is_complete(entry))
goto complete;
- if (timed_out)
+ if (rc != 0)
goto do_rollback;
}
--
2.26.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
` (2 preceding siblings ...)
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 3/4] qsync: txn_limbo_wait_complete -- fix type conversion Cyrill Gorcunov
@ 2020-07-14 14:44 ` Cyrill Gorcunov
2020-07-14 14:53 ` [Tarantool-patches] [PATCH v2 " Cyrill Gorcunov
2020-07-20 19:28 ` [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Vladislav Shpilevoy
4 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:44 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
Basically our timeout is calculated via (a - b), where
@a is a constant positive value fetched once, in turn
the @b is rather a dynamic value thus the result may
be negative. libev uses assert() call to catch such
values when passed to timers setup. Thus lets intercept
potential assert() trigger and exit early if timeout
is already expired.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/txn_limbo.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d5b887d36..3ed0b6db5 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -176,6 +176,8 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
double deadline = start_time + replication_synchro_timeout;
bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
+ if (timeout < 0)
+ goto do_rollback;
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
if (txn_limbo_entry_is_complete(entry))
@@ -511,6 +513,8 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
double deadline = start_time + replication_synchro_timeout;
bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
+ if (timeout < 0)
+ goto timed_out;
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
if (cwp.is_confirm || cwp.is_rollback)
--
2.26.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
@ 2020-07-14 14:53 ` Cyrill Gorcunov
2020-07-16 22:45 ` Vladislav Shpilevoy
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-14 14:53 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
Basically our timeout is calculated via (a - b), where
@a is a constant positive value fetched once, in turn
the @b is rather a dynamic value thus the result may
be negative. libev uses assert() call to catch such
values when passed to timers setup. Thus lets intercept
potential assert() trigger and exit early if timeout
is already expired.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
There were a typo, so I force-updated the branch
src/box/txn_limbo.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d5b887d36..0924952b7 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
double start_time = fiber_clock();
while (true) {
double deadline = start_time + replication_synchro_timeout;
- bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
+ if (timeout < 0)
+ goto do_rollback;
+ bool cancellable = fiber_set_cancellable(false);
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
if (txn_limbo_entry_is_complete(entry))
@@ -509,8 +511,10 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
double start_time = fiber_clock();
while (true) {
double deadline = start_time + replication_synchro_timeout;
- bool cancellable = fiber_set_cancellable(false);
double timeout = deadline - fiber_clock();
+ if (timeout < 0)
+ goto timed_out;
+ bool cancellable = fiber_set_cancellable(false);
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
if (cwp.is_confirm || cwp.is_rollback)
--
2.26.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-14 14:53 ` [Tarantool-patches] [PATCH v2 " Cyrill Gorcunov
@ 2020-07-16 22:45 ` Vladislav Shpilevoy
2020-07-17 7:25 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-16 22:45 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
Thanks for the patch!
On 14.07.2020 16:53, Cyrill Gorcunov wrote:
> Basically our timeout is calculated via (a - b), where
> @a is a constant positive value fetched once, in turn
> the @b is rather a dynamic value thus the result may
> be negative. libev uses assert() call to catch such
> values when passed to timers setup. Thus lets intercept
> potential assert() trigger and exit early if timeout
> is already expired.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> There were a typo, so I force-updated the branch
>
> src/box/txn_limbo.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index d5b887d36..0924952b7 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> double start_time = fiber_clock();
> while (true) {
> double deadline = start_time + replication_synchro_timeout;
> - bool cancellable = fiber_set_cancellable(false);
> double timeout = deadline - fiber_clock();
> + if (timeout < 0)
> + goto do_rollback;
> + bool cancellable = fiber_set_cancellable(false);
I added timeout = -1; here and tried to commit a sync transaction. I
got timed out error, no assertions. Please, tell me, how to reproduce
the assertion you mention in the commit message. Otherwise I don't see
why would we need the < 0 check. If it is done somewhere inside
fiber_cond_wait_timeout anyway.
> int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
> fiber_set_cancellable(cancellable);
> if (txn_limbo_entry_is_complete(entry))
> @@ -509,8 +511,10 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
> double start_time = fiber_clock();
> while (true) {
> double deadline = start_time + replication_synchro_timeout;
> - bool cancellable = fiber_set_cancellable(false);
> double timeout = deadline - fiber_clock();
> + if (timeout < 0)
> + goto timed_out;
> + bool cancellable = fiber_set_cancellable(false);
> int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
> fiber_set_cancellable(cancellable);
> if (cwp.is_confirm || cwp.is_rollback)
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-16 22:45 ` Vladislav Shpilevoy
@ 2020-07-17 7:25 ` Cyrill Gorcunov
2020-07-17 20:03 ` Vladislav Shpilevoy
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17 7:25 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
On Fri, Jul 17, 2020 at 12:45:16AM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> On 14.07.2020 16:53, Cyrill Gorcunov wrote:
> > Basically our timeout is calculated via (a - b), where
> > @a is a constant positive value fetched once, in turn
> > the @b is rather a dynamic value thus the result may
> > be negative. libev uses assert() call to catch such
> > values when passed to timers setup. Thus lets intercept
> > potential assert() trigger and exit early if timeout
> > is already expired.
> >
> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> > ---
> > There were a typo, so I force-updated the branch
> >
> > src/box/txn_limbo.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> > index d5b887d36..0924952b7 100644
> > --- a/src/box/txn_limbo.c
> > +++ b/src/box/txn_limbo.c
> > @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> > double start_time = fiber_clock();
> > while (true) {
> > double deadline = start_time + replication_synchro_timeout;
> > - bool cancellable = fiber_set_cancellable(false);
> > double timeout = deadline - fiber_clock();
> > + if (timeout < 0)
> > + goto do_rollback;
> > + bool cancellable = fiber_set_cancellable(false);
>
> I added timeout = -1; here and tried to commit a sync transaction. I
> got timed out error, no assertions. Please, tell me, how to reproduce
> the assertion you mention in the commit message. Otherwise I don't see
> why would we need the < 0 check. If it is done somewhere inside
> fiber_cond_wait_timeout anyway.
Look, here is libev code
---
noinline
void
ev_timer_start (EV_P_ ev_timer *w) EV_THROW
{
if (expect_false (ev_is_active (w)))
return;
ev_at (w) += mn_now;
assert (("libev: ev_timer_start called with negative timer repeat value", w->repeat >= 0.));
---
To trigger this assert we have to enter idle cycle, then manually change
replication_synchro_timeout via cfg (make it less than it was initially)
which should lead to negative timeout.
To be fair I don't know how to force it without error injection. Lets
assume we have initial fiber clock 1 (the clocks are increasing everytime
libev does a new polling cycle). Thus
// replication_synchro_timeout = 2
double start_time = fiber_clock(); // 1
while (true) {
double deadline = start_time + replication_synchro_timeout;
// => 3
double timeout = deadline - fiber_clock();
// => 2
bool cancellable = fiber_set_cancellable(false);
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
//
// enter into resched cycle
// manually change replication_synchro_timeout to 1
// next cycle starts
double deadline = start_time + replication_synchro_timeout;
// start_time didn't changed
// => 1 + 1 = 2
double timeout = deadline - fiber_clock();
// assume several resched cycles passed, thus
// fiber_clock returns 3, thus
// => 2 - 3 => -1
...
Am I missing something obvious here?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-17 7:25 ` Cyrill Gorcunov
@ 2020-07-17 20:03 ` Vladislav Shpilevoy
2020-07-17 20:33 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-17 20:03 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
>> On 14.07.2020 16:53, Cyrill Gorcunov wrote:
>>> Basically our timeout is calculated via (a - b), where
>>> @a is a constant positive value fetched once, in turn
>>> the @b is rather a dynamic value thus the result may
>>> be negative. libev uses assert() call to catch such
>>> values when passed to timers setup. Thus lets intercept
>>> potential assert() trigger and exit early if timeout
>>> is already expired.
>>>
>>> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>>> ---
>>> There were a typo, so I force-updated the branch
>>>
>>> src/box/txn_limbo.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
>>> index d5b887d36..0924952b7 100644
>>> --- a/src/box/txn_limbo.c
>>> +++ b/src/box/txn_limbo.c
>>> @@ -174,8 +174,10 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>>> double start_time = fiber_clock();
>>> while (true) {
>>> double deadline = start_time + replication_synchro_timeout;
>>> - bool cancellable = fiber_set_cancellable(false);
>>> double timeout = deadline - fiber_clock();
>>> + if (timeout < 0)
>>> + goto do_rollback;
>>> + bool cancellable = fiber_set_cancellable(false);
>>
>> I added timeout = -1; here and tried to commit a sync transaction. I
>> got timed out error, no assertions. Please, tell me, how to reproduce
>> the assertion you mention in the commit message. Otherwise I don't see
>> why would we need the < 0 check. If it is done somewhere inside
>> fiber_cond_wait_timeout anyway.
>
> Look, here is libev code
>
> ---
> noinline
> void
> ev_timer_start (EV_P_ ev_timer *w) EV_THROW
> {
> if (expect_false (ev_is_active (w)))
> return;
>
> ev_at (w) += mn_now;
>
> assert (("libev: ev_timer_start called with negative timer repeat value", w->repeat >= 0.));
> ---
>
> To trigger this assert we have to enter idle cycle, then manually change
> replication_synchro_timeout via cfg (make it less than it was initially)
> which should lead to negative timeout.
Yeah, well. We set w->after. Not w->repeat.
> To be fair I don't know how to force it without error injection. Lets
> assume we have initial fiber clock 1 (the clocks are increasing everytime
> libev does a new polling cycle). Thus
>
> // replication_synchro_timeout = 2
>
> double start_time = fiber_clock(); // 1
> while (true) {
> double deadline = start_time + replication_synchro_timeout;
> // => 3
> double timeout = deadline - fiber_clock();
> // => 2
> bool cancellable = fiber_set_cancellable(false);
> int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
> //
> // enter into resched cycle
> // manually change replication_synchro_timeout to 1
> // next cycle starts
>
> double deadline = start_time + replication_synchro_timeout;
> // start_time didn't changed
> // => 1 + 1 = 2
> double timeout = deadline - fiber_clock();
> // assume several resched cycles passed, thus
> // fiber_clock returns 3, thus
> // => 2 - 3 => -1
> ...
>
> Am I missing something obvious here?
I still don't see how to trigger this assert. Please, show a code sample with
error injection, if you think that will help. I explicitly added line
'timeout = -1;' and all worked fine.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-17 20:03 ` Vladislav Shpilevoy
@ 2020-07-17 20:33 ` Cyrill Gorcunov
2020-07-17 20:43 ` Cyrill Gorcunov
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17 20:33 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml
On Fri, Jul 17, 2020 at 10:03:49PM +0200, Vladislav Shpilevoy wrote:
> >
> > Am I missing something obvious here?
>
> I still don't see how to trigger this assert. Please, show a code sample with
> error injection, if you think that will help. I explicitly added line
> 'timeout = -1;' and all worked fine.
>
---
[cyrill@grain tarantool.git] git diff
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 0924952b7..50e6326b3 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -175,8 +175,9 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
while (true) {
double deadline = start_time + replication_synchro_timeout;
double timeout = deadline - fiber_clock();
- if (timeout < 0)
- goto do_rollback;
+ timeout = -1;
+ //if (timeout < 0)
+ // goto do_rollback;
bool cancellable = fiber_set_cancellable(false);
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
@@ -512,8 +513,9 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
while (true) {
double deadline = start_time + replication_synchro_timeout;
double timeout = deadline - fiber_clock();
- if (timeout < 0)
- goto timed_out;
+ timeout = -1;
+ //if (timeout < 0)
+ // goto timed_out;
bool cancellable = fiber_set_cancellable(false);
int rc = fiber_cond_wait_timeout(&limbo->wait_cond, timeout);
fiber_set_cancellable(cancellable);
---
[cyrill@grain test] ./test-run.py replication/qsync_basic.test.lua
...
[001] [Instance "master" killed by signal: 6 (SIGABRT)]
[001] Found assertion fail in the results file [/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.log]:
You did the same thing? Note that it requires debug build.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout
2020-07-17 20:33 ` Cyrill Gorcunov
@ 2020-07-17 20:43 ` Cyrill Gorcunov
0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2020-07-17 20:43 UTC (permalink / raw)
To: Vladislav Shpilevoy, tml
On Fri, Jul 17, 2020 at 11:33:36PM +0300, Cyrill Gorcunov wrote:
> On Fri, Jul 17, 2020 at 10:03:49PM +0200, Vladislav Shpilevoy wrote:
> > >
> > > Am I missing something obvious here?
> >
> > I still don't see how to trigger this assert. Please, show a code sample with
> > error injection, if you think that will help. I explicitly added line
> > 'timeout = -1;' and all worked fine.
> >
> ---
> [cyrill@grain test] ./test-run.py replication/qsync_basic.test.lua
> ...
> You did the same thing? Note that it requires debug build.
Actually it is different issue. I think you can safely drop this
patch.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code
2020-07-14 14:44 [Tarantool-patches] [PATCH 0/4] qsync: continue cooking the code Cyrill Gorcunov
` (3 preceding siblings ...)
2020-07-14 14:44 ` [Tarantool-patches] [PATCH 4/4] qsync: don't send negative timeouts into fiber_cond_wait_timeout Cyrill Gorcunov
@ 2020-07-20 19:28 ` Vladislav Shpilevoy
4 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-20 19:28 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml
Hi! Thanks for the patchset!
First 3 commits are pushed to master.
^ permalink raw reply [flat|nested] 12+ messages in thread