[tarantool-patches] Re: [PATCH v2 4/4] replication: use wal memory buffer to fetch rows

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Sep 24 00:51:15 MSK 2019


Thanks for the answers.

>>> +			struct vclock *send_vclock;
>>> +			if (relay->version_id < version_id(1, 7, 4))
>>> +				send_vclock = &relay->r->vclock;
>>
>> 3. I know, this is old code existed before your patch, and it will
>> be dropped, but why do you send relay->r->vclock and don't decode it
>> anywhere? Above you decoded a received vclock into relay->recv_vclock.
>> relay->r->vclock is not updated, and nonetheless you send it.
> Before 1.7.4 we used recovery vclock as replica vclock. So recovery does the 
> work.

It *did* the work, but now it doesn't. Recovery vclock was being
updated in relay_process_wal_event(). Now you relay directly
from memory, and recovery vclock is not changed until you fall
to disk. Moreover, looks like recovery object does not exist until
you fall to disk. It means, that to 1.7.4 you would either crash or
send the same vclock until a next fall to disk, wouldn't you?

>>> +	}
>>>
>>>  	vclock_copy(&relay->tx.vclock, replica_clock);
>>>  	relay->version_id = replica_version_id;
>>>
>>> @@ -612,7 +604,10 @@ static void
>>>
>>>  relay_send_row(struct xstream *stream, struct xrow_header *packet)
>>>  {
>>>  
>>>  	struct relay *relay = container_of(stream, struct relay, stream);
>>>
>>> -	assert(iproto_type_is_dml(packet->type));
>>> +	if (packet->type != IPROTO_OK) {
>>
>> 13. How can it be an error? We don't write errors to WAL, and packet here
>> is a row, written to WAL. No? What is more, if it is not 'OK', below
>> you may treat it as 'NOP' accidentally.
> IPROTO_OK  means a heartbeat message. The opposite case - a dml with verified 
> with an assert.
> 

How is it possible, that relay_send_row() is used to send a heartbeat
message? For heartbeats there is a special function relay_send_heartbeat().
As I understand, 'row' means something written to WAL. We don't write
heartbeats to WAL, as I remember.

>> 15. Why do you need each write signaled?
> To wake all relays up. And there is an error - I broadcast the condition only 
> when a whole batch was written out.

And what is wrong? WAL writes to disk in batches. 

>>> +
>>>
>>>  	tt_pthread_mutex_lock(&endpoint->mutex);
>>>  	output_was_empty = stailq_empty(&endpoint->output);
>>>  	/** Flush input */
>>>
>>> @@ -297,6 +300,7 @@ cpipe_flush_cb(ev_loop *loop, struct ev_async
>>> *watcher, int events)> 
>>>  		ev_async_send(endpoint->consumer, &endpoint->async);
>>>  	
>>>  	}
>>>
>>> +	pthread_setcancelstate(old_cancel_state, NULL);
>>>
>>>  }
>>
>> 29. I will review the tests later, when I will fully understand the
>> code. But so few tests for such a serious feature looks not enough.
>> Could you test it more rigorous?
> The issue - this patchset is not changing the tarantool behavior, so there is 
> no visible changes to test them. However, I will try to implement some hacking 
> and errinj test to show that in-memory replication is working.

If we can't implement functional tests, then perhaps we could implement stress
benchmarks with sanity checks for data.




More information about the Tarantool-patches mailing list