<HTML><BODY><br><br><br><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">
        Пятница, 13 апреля 2018, 14:13 +03:00 от Konstantin Belyavskiy <k.belyavskiy@tarantool.org>:<br><br><div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_15236180140000000492_BODY"><div class="class_1523634474">
Please take a look at newer version.<br><br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;">
        Пятница, 13 апреля 2018, 11:55 +03:00 от Vladimir Davydov <<a href="mailto:vdavydov.dev@gmail.com">vdavydov.dev@gmail.com</a>>:<br><br><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix">On Wed, Apr 11, 2018 at 07:02:27PM +0300, Konstantin Belyavskiy wrote:<br>
                                 > ticket: <a href="https://github.com/tarantool/tarantool/issues/3257" target="_blank" rel=" noopener noreferrer">https://github.com/tarantool/tarantool/issues/3257</a><br>
> branch: <a href="https://github.com/tarantool/tarantool/compare/gh-3257-fix-bug-with-ro-replica-as-a-leader" target="_blank" rel=" noopener noreferrer">https://github.com/tarantool/tarantool/compare/gh-3257-fix-bug-with-ro-replica-as-a-leader</a><br>
> <br>
> When bootstrapping a new cluster, each replica from replicaset can<br>
> be chosen as a leader, but if it is 'read-only', bootstrap will<br>
> failed with an error.<br>
> Fixed it by eliminating read-only replicas from voting by adding<br>
> access rights information to IPROTO_REQUEST_VOTE reply.<br>
> <br>
> Closes #3257<br>
> ---<br>
>  src/box/applier.cc                             |  3 +-<br>
>  src/box/applier.h                              |  2 +<br>
>  src/box/iproto.cc                              | 11 +++--<br>
>  src/box/iproto_constants.c                     |  2 +-<br>
>  src/box/iproto_constants.h                     |  1 +<br>
>  src/box/replication.cc                         |  8 ++++<br>
>  src/box/xrow.c                                 | 29 ++++++++++---<br>
>  src/box/xrow.h                                 | 54 ++++++++++++++++++-----<br>
>  test/replication/replica_uuid_ro.lua           | 33 ++++++++++++++<br>
>  test/replication/replica_uuid_ro1.lua          |  1 +<br>
>  test/replication/replica_uuid_ro2.lua          |  1 +<br>
>  test/replication/replicaset_ro_mostly.result   | 59 ++++++++++++++++++++++++++<br>
>  test/replication/replicaset_ro_mostly.test.lua | 30 +++++++++++++<br>
>  13 files changed, 212 insertions(+), 22 deletions(-)<br>
>  create mode 100644 test/replication/replica_uuid_ro.lua<br>
>  create mode 120000 test/replication/replica_uuid_ro1.lua<br>
>  create mode 120000 test/replication/replica_uuid_ro2.lua<br>
>  create mode 100644 test/replication/replicaset_ro_mostly.result<br>
>  create mode 100644 test/replication/replicaset_ro_mostly.test.lua<br>
> <br>
> diff --git a/src/box/applier.cc b/src/box/applier.cc<br>
> index 9aa951c34..b3a13260d 100644<br>
> --- a/src/box/applier.cc<br>
> +++ b/src/box/applier.cc<br>
> @@ -223,7 +223,8 @@ applier_connect(struct applier *applier)<br>
>            if (row.type != IPROTO_OK)<br>
>                    xrow_decode_error_xc(&row);<br>
>            vclock_create(&applier->vclock);<br>
> -          xrow_decode_vclock_xc(&row, &applier->vclock);<br>
> +          xrow_decode_request_vote_xc(&row, &applier->vclock,<br>
> +                                      &applier->read_only);<br>
>    }<br>
>  <br>
>    applier_set_state(applier, APPLIER_CONNECTED);<br>
> diff --git a/src/box/applier.h b/src/box/applier.h<br>
> index f25d6cb26..f47cf330f 100644<br>
> --- a/src/box/applier.h<br>
> +++ b/src/box/applier.h<br>
> @@ -95,6 +95,8 @@ struct applier {<br>
>    uint32_t version_id;<br>
>    /** Remote vclock at time of connect. */<br>
>    struct vclock vclock;<br>
> +  /** Remote access rights, true if read-only, default: false */<br><br>
The comment is misleading IMO as this variable doesn't have anything<br>
to do with access rights. It just means that the master is running in<br>
read-only mode. Please fix.</div></div></div></div></blockquote>      Fixed<p style="margin-bottom: 0px;font-stretch: normal;font-size: 11px;line-height: normal;font-family: Menlo;color: #26b41b;background-color: #fef49c;" data-mce-style="margin-bottom: 0px;font-stretch: normal;font-size: 11px;line-height: normal;font-family: Menlo;color: #26b41b;background-color: #fef49c;"><span style="font-variant-ligatures: no-common-ligatures;" data-mce-style="font-variant-ligatures: no-common-ligatures;">Remote peer mode, true if read-only, default: false</span></p><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
> +  bool read_only;<br><br>
I'd call it master_is_ro, because it's not the applier which is read<br>
only - it's the master.</div></div></div></div></blockquote>remote_is_ro<br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
>    /** Remote address */<br>
>    union {<br>
>            struct sockaddr addr;<br>
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc<br>
> index db5820806..51970bb28 100644<br>
> --- a/src/box/iproto.cc<br>
> +++ b/src/box/iproto.cc<br>
> @@ -60,6 +60,8 @@<br>
>  #include "iproto_constants.h"<br>
>  #include "rmean.h"<br>
>  #include "errinj.h"<br>
> +#include "applier.h"<br>
> +#include "cfg.h"<br>
>  <br>
>  /* The number of iproto messages in flight */<br>
>  enum { IPROTO_MSG_MAX = 768 };<br>
> @@ -1352,6 +1354,7 @@ tx_process_misc(struct cmsg *m)<br>
>            goto error;<br>
>  <br>
>    try {<br>
> +          bool read_only = false;<br>
>            switch (msg->header.type) {<br>
>            case IPROTO_AUTH:<br>
>                    box_process_auth(&msg->auth);<br>
> @@ -1363,9 +1366,11 @@ tx_process_misc(struct cmsg *m)<br>
>                                       ::schema_version);<br>
>                    break;<br>
>            case IPROTO_REQUEST_VOTE:<br>
> -                  iproto_reply_vclock_xc(out, msg->header.sync,<br>
> -                                         ::schema_version,<br>
> -                                         &replicaset.vclock);<br><br>
> +                  read_only = cfg_geti("read_only");<br><br>
We have box_is_ro() for this, which also takes into account "orphan"<br>
mode.</div></div></div></div></blockquote>No, during bootstrap it is in "orphan" mode. So this will actually crash it.<br>Also after bootstrap we should use a node with the most recent vclock.<br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
> +                  iproto_reply_request_vote_xc(out, msg->header.sync,<br>
> +                                               ::schema_version,<br>
> +                                               &replicaset.vclock,<br>
> +                                               read_only);<br><br>
Nit: you don't need an extra variable here - let's pass box_is_ro()<br>
directly to iproto_reply_request_vote_xc().</div></div></div></div></blockquote>To be honest, I don't think it's necessary.<br>iproto_reply_request_vote_xc just pass all parameters to <br>iproto_reply_request_vote<br>and it construct reply from these params.<br>So it will create a mixed logic then some params are get internally, others<br>from input.<br>Let's leave it as is.</div></div></div></div></div></blockquote>Sorry, understand, just remove a variable and make it one line shorter.<br><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_15236180140000000492_BODY"><div class="class_1523634474"><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br>
>                    break;<br>
>            default:<br>
>                    unreachable();<br>
> diff --git a/src/box/iproto_constants.c b/src/box/iproto_constants.c<br>
> index cd7b1d03b..eaba6259e 100644<br>
> --- a/src/box/iproto_constants.c<br>
> +++ b/src/box/iproto_constants.c<br>
> @@ -40,10 +40,10 @@ const unsigned char iproto_key_type[IPROTO_KEY_MAX] =<br>
>            /* 0x04 */      MP_DOUBLE, /* IPROTO_TIMESTAMP */<br>
>            /* 0x05 */      MP_UINT,   /* IPROTO_SCHEMA_VERSION */<br>
>            /* 0x06 */      MP_UINT,   /* IPROTO_SERVER_VERSION */<br>
> +          /* 0x07 */      MP_UINT,   /* IPROTO_SERVER_READONLY */<br>
>    /* }}} */<br>
>  <br>
>    /* {{{ unused */<br>
> -          /* 0x07 */      MP_UINT,<br>
>            /* 0x08 */      MP_UINT,<br>
>            /* 0x09 */      MP_UINT,<br>
>            /* 0x0a */      MP_UINT,<br>
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h<br>
> index 951842485..aae004584 100644<br>
> --- a/src/box/iproto_constants.h<br>
> +++ b/src/box/iproto_constants.h<br>
> @@ -58,6 +58,7 @@ enum iproto_key {<br>
>    IPROTO_TIMESTAMP = 0x04,<br>
>    IPROTO_SCHEMA_VERSION = 0x05,<br>
>    IPROTO_SERVER_VERSION = 0x06,<br>
> +  IPROTO_SERVER_READONLY = 0x07,<br><br>
IPROTO_SERVER_IS_RO, may be?</div></div></div></div></blockquote>Ok, fixed.<br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
>    /* Leave a gap for other keys in the header. */<br>
>    IPROTO_SPACE_ID = 0x10,<br>
>    IPROTO_INDEX_ID = 0x11,<br>
> diff --git a/src/box/replication.cc b/src/box/replication.cc<br>
> index b1c84d36c..423de2c88 100644<br>
> --- a/src/box/replication.cc<br>
> +++ b/src/box/replication.cc<br>
> @@ -685,6 +685,14 @@ replicaset_leader(void)<br>
>    replicaset_foreach(replica) {<br>
>            if (replica->applier == NULL)<br>
>                    continue;<br>
> +          /**<br>
> +           * While bootstrapping a new cluster,<br>
> +           * read-only replicas shouldn't be considered<br>
> +           * as a leader.<br>
> +           */<br>
> +          if (replica->applier->read_only &&<br>
> +              replica->applier->vclock.signature == 0)<br>
> +                  continue;<br><br>
What if you're adding a new replica to an existing cluster some nodes of<br>
which are read only? AFAIU you want to choose an rw replica in this case<br>
as well.</div></div></div></div></blockquote>No, we improve only one case: bootstrapping a new cluster (and do not change others).<br>In other cases it's more difficult and may be we need to develop a specific protocol<br> to solve it in general case. But it's a case for a new ticket.<br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
>            if (leader == NULL) {<br>
>                    leader = replica;<br>
>                    continue;<br><br>
> diff --git a/test/replication/replicaset_ro_mostly.test.lua b/test/replication/replicaset_ro_mostly.test.lua<br>
> new file mode 100644<br>
> index 000000000..539ca5a13<br>
> --- /dev/null<br>
> +++ b/test/replication/replicaset_ro_mostly.test.lua<br>
> @@ -0,0 +1,30 @@<br>
> +-- gh-3257 check bootstrap with read-only replica in cluster.<br>
> +-- Old behaviour: failed, since read-only is chosen by uuid.<br>
> +test_run = require('test_run').new()<br>
> +<br>
> +SERVERS = {'replica_uuid_ro1', 'replica_uuid_ro2'}<br>
> +<br>
> +uuid = require('uuid')<br>
> +uuid1 = uuid.new()<br>
> +uuid2 = uuid.new()<br>
> +function sort_cmp(a, b) return a.time_low > b.time_low and true or false end<br>
> +function sort(t) table.sort(t, sort_cmp) return t end<br>
> +UUID = sort({uuid1, uuid2}, sort_cmp)<br><br>
I don't think you really need to set UUID explicitly. If you don't, the<br>
test will fail quite often anyway.</div></div></div></div></blockquote>It's to make it 100% reproducible. In other cases it's a probability test (not good).<br><blockquote style="border-left:1px solid #0857A6;margin:10px;padding:0 0 0 10px;"><div id=""><div class="js-helper_mailru_css_attribute_postfix js-readmsg-msg_mailru_css_attribute_postfix"><div><div id="style_15236097120000000000_BODY_mailru_css_attribute_postfix"><br><br>
> +<br>
> +create_cluster_cmd1 = 'create server %s with script="replication/%s.lua"'<br>
> +create_cluster_cmd2 = 'start server %s with args="%s", wait_load=False, wait=False'<br>
> +<br>
> +test_run:cmd("setopt delimiter ';'")<br>
> +function create_cluster_uuid(servers, uuids)<br>
> +    for i, name in ipairs(servers) do<br>
> +        test_run:cmd(create_cluster_cmd1:format(name, name))<br>
> +        test_run:cmd(create_cluster_cmd2:format(name, uuids[i]))<br>
> +    end<br>
> +end;<br>
> +test_run:cmd("setopt delimiter ''");<br>
> +<br>
> +-- Deploy a cluster.<br>
> +create_cluster_uuid(SERVERS, UUID)<br>
> +test_run:wait_fullmesh(SERVERS)<br>
> +-- Cleanup.<br>
> +test_run:drop_cluster(SERVERS)<br><br></div></div></div></div></blockquote><br><br>Best regards,<br>Konstantin Belyavskiy<br><a href="mailto:k.belyavskiy@tarantool.org">k.belyavskiy@tarantool.org</a><br></div></div></div></div></div></blockquote>
<br>
<br>Best regards,<br>Konstantin Belyavskiy<br>k.belyavskiy@tarantool.org<br></BODY></HTML>