[Tarantool-patches] [PATCH vshard 1/5] router: backoff on some box errors

Oleg Babin olegrok at tarantool.org
Fri Dec 17 14:09:27 MSK 2021


Thanks for your patch! LGTM. But I think the tests should be extended a bit.

I see that the tests cover the AccessDenied error, but I don't see a check
for a simple error()/assert() error.

On 17.12.2021 03:25, Vladislav Shpilevoy wrote:
> Storage configuration takes time. Firstly, there is box.cfg{},
> which can be called before vshard.storage.cfg(). Secondly,
> vshard.storage.cfg() is not immediate either.
>
> During that time accessing the storage is not safe. Attempts to
> call vshard.storage functions can return weird errors, or the
> functions may not even be available yet. They need to be created
> in _func and get access rights in _priv before becoming public.
>
> Routers used to forward errors like 'access denied error' and
> 'no such function' to users as is, treating them as critical.
>
> Not only was it confusing for users, but it could also make an
> entire replicaset unavailable for requests: the connection to it
> is alive, so the router would send all requests to it and they
> would all fail, even if the replicaset has another instance which
> is perfectly functional.
>
> This patch handles such specific errors inside of the router. The
> faulty replicas are put into a 'backoff' state. They remain in it
> for some fixed time (5 seconds for now); new requests won't be
> sent to them until the time passes. The router will use other
> instances.
>
> Backoff is activated only for vshard.* functions. If the error
> is about some user's function, it is considered a regular error,
> because the router can't tell whether any side effects were done
> on the remote instance before the error happened. Hence it can't
> retry on another node.
>
> For example, if access was denied to 'vshard.storage.call', then
> the replica goes into backoff. If inside of vshard.storage.call
> the access was denied to 'user_test_func', then it does not.
>
> Of course, it all works for read-only requests exclusively,
> because for read-write requests there is just one instance: the
> master. The router has no other options, so backoff wouldn't help
> here.
>
> Part of #298
> ---
>   test/router/router2.result   | 237 +++++++++++++++++++++++++++++++++++
>   test/router/router2.test.lua |  97 ++++++++++++++
>   vshard/consts.lua            |   1 +
>   vshard/error.lua             |   8 +-
>   vshard/replicaset.lua        | 100 +++++++++++++--
>   vshard/router/init.lua       |   3 +-
>   6 files changed, 432 insertions(+), 14 deletions(-)
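
For readers skimming the thread, the backoff rule in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the patch's Lua code: the error-kind strings, the Replica class, and the helper names are all hypothetical, and the only value taken from the message is the 5-second interval.

```python
# Illustrative sketch only; every name here is hypothetical, not vshard's API.
BACKOFF_INTERVAL = 5.0  # seconds, "5 seconds for now" per the commit message

# Stand-ins for the 'access denied' and 'no such function' box errors.
ACCESS_DENIED = "access_denied"
NO_SUCH_FUNCTION = "no_such_function"


def needs_backoff(err_kind, func_name):
    """Backoff applies only when the error concerns a vshard.* function."""
    is_backoff_error = err_kind in (ACCESS_DENIED, NO_SUCH_FUNCTION)
    return is_backoff_error and func_name.startswith("vshard.")


class Replica:
    def __init__(self, name):
        self.name = name
        self.backoff_until = 0.0  # timestamp until which the replica is skipped

    def is_in_backoff(self, now):
        return now < self.backoff_until


def on_error(replica, err_kind, func_name, now):
    """Put the replica into backoff if the error qualifies; report whether it did."""
    if needs_backoff(err_kind, func_name):
        replica.backoff_until = now + BACKOFF_INTERVAL
        return True
    # An error about a user's function is a regular error: no backoff, no retry.
    return False


def pick_replica(replicas, now):
    """Return the first replica not in backoff, or None if all are backed off."""
    for replica in replicas:
        if not replica.is_in_backoff(now):
            return replica
    return None
```

With two replicas, an access-denied error on 'vshard.storage.call' at time 100 would make pick_replica skip the first replica until time 105, while the same error on 'user_test_func' would change nothing. The clock is passed in explicitly so the behavior is easy to check in a test.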

