From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org
Cc: kostja@tarantool.org
Subject: [tarantool-patches] [PATCH 1/1] rfc: describe proxy
Date: Fri, 6 Apr 2018 13:20:01 +0300
Message-ID: <d810fa84a5224ad7d27738f34da614db133fb20b.1523009887.git.v.shpilevoy@tarantool.org>
In-Reply-To: <20180405202353.GD3953@atlas>

---

Original: https://github.com/tarantool/tarantool/blob/rfc-proxy/doc/rfc/2625-proxy.md

 doc/rfc/2625-proxy.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 doc/rfc/2625-proxy.md

diff --git a/doc/rfc/2625-proxy.md b/doc/rfc/2625-proxy.md
new file mode 100644
index 000000000..1d88b7e26
--- /dev/null
+++ b/doc/rfc/2625-proxy.md
@@ -0,0 +1,118 @@
+# Proxy
+
+* **Status**: In progress
+* **Start date**: 19.03.2018
+* **Authors**: Vladislav Shpilevoy @Gerold103 \<v.shpilevoy@tarantool.org\>, Konstantin Osipov @kostja \<kostja@tarantool.org\>
+* **Issues**: #2625, #3055
+
+## Summary
+
+Proxy is a module to route requests from slaves to a master. It is built-in, configured via `netbox`, and can work both on a master and on a slave. A proxy on a slave resends all requests to a master. On a master a proxy does nothing.
+
+## Background and motivation
+
+A client should not need to know which instance is the master and which is a replica - these details must be hidden behind a proxy. A client must be able to send a request to any cluster member and get a correct result, even for write requests. This is a preparation for a standalone proxy.
+
+A killer feature of a proxy joined with a storage is that on a master it has no overhead. It works in the same process as the storage, and can send requests directly to the transaction processing thread. On a slave a proxy works in a network thread, in which it holds connections to a master.
Connection**s**, in the plural, is the common case: each user of a proxy needs its own connection so that user rights on various objects and actions stay separated. For example, if a cluster has 10 users, and 3 of them send requests to a slave, then the slave's proxy holds 3 connections to a master.
+
+Moreover, a proxy merged with a storage has access to space `_user` with password hashes, which can be used to transparently authenticate users on a master. A more detailed description is given in the next section.
+
+Besides authentication, a proxy must translate syncs for each request in a multiplexed connection. When multiple client-to-proxy connections share a single fat proxy-to-master connection, syncs in requests from different clients can be duplicated. So the proxy must change sync to its own unique value when forwarding a request to a master, and change sync back in a response.
+
+API:
+```Lua
+-- At first, turn on proxy.
+netbox = require('net.box')
+proxy = netbox.listen(...)
+-- Now the proxy accepts requests, but does not forward
+-- them.
+box.cfg({
+    -- When a proxy is up, box.cfg
+    -- does not need to care about listen.
+    listen = nil,
+    replication = { cluster_members },
+    -- Other options ...
+})
+```
+
+Proxy performs automatic failover when `box.ctl.promote()` is called.
+
+## Detailed design
+
+### Architecture
+
+```
+client, user1 ------*
+    ...              \   proxy             master
+client, user1 --------*----------* - *----------------*
+                       ---- SYNC -> proXYNC ---->
+                       <--- SYNC <- proXYNC -----
+```
+A proxy lives entirely in the IProto (network) thread. On start it creates guest connections to all cluster members. Although a proxy sends all requests to a master, it must be able to fail over quickly to one of the replicas, so it must hold connections to slaves too. Salts received from slaves and from a master are stored by the proxy and used in the authentication scheme described below.
+
+#### Authentication
+
+To understand how a proxy authenticates a user, one must recall the Tarantool authentication protocol, described below:
+```
+SERVER: salt = create_random_string()
+        send(salt)
+
+CLIENT: recv(salt)
+        hash1 = sha1(password)
+        hash2 = sha1(hash1)
+        reply = xor(hash1, sha1(salt, hash2))
+
+        send(reply)
+
+SERVER: recv(reply)
+        hash1 = xor(reply, sha1(salt, hash2))
+        candidate_hash2 = sha1(hash1)
+        check(candidate_hash2 == hash2)
+```
+
+A proxy has access to hash2, which is stored in space `_user`. The proxy replies to a newly connected user with a local salt (different from the salt of the master or of other slaves). A client responds with `reply = xor(hash1, sha1(salt, hash2))`. The proxy knows both salt and hash2 and can calculate `hash1 = xor(reply, sha1(salt, hash2))`. Now the proxy can emulate a client AUTH request to a master: `auth = xor(hash1, sha1(master_salt, hash2))`.
+
+When a new client connects to a proxy, the proxy searches for an existing connection to a master for the user specified in the client's auth request. If one is found, the new client's requests are forwarded to this connection. If no existing connection is found, the proxy establishes a new one, using the master salt, the calculated hash1 and hash2 to authenticate the new user.
+
+#### Sync translation
+
+If a proxy-to-master connection serves one client-to-proxy connection, then `sync` translation is not needed - there are no conflicts. When a proxy-to-master connection serves multiple client-to-proxy connections, the proxy stores and maintains an increasing `sync` counter. Consider the communication steps:
+1. A client sends a request to a proxy with `sync = s1`;
+2. The proxy remembers this `sync`, changes it to `sync = s2`, and sends the request to a storage;
+3. A response with `sync = s2` is received from the storage. The proxy replaces `s2` back with `s1` and sends the response to the client.
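
The three steps above can be sketched in plain Lua. This is only an illustration under stated assumptions, not the proxy implementation: `new_translator`, `forward`, `on_response`, the `is_final` flag and the client identifiers are hypothetical names, and the table stands in for whatever structure a real proxy-to-master connection would use.

```lua
-- Hypothetical sketch of sync translation for ONE proxy-to-master
-- connection. None of these names exist in Tarantool.
local function new_translator()
    local self = {
        next_sync = 0, -- increasing counter owned by the connection
        pending = {},  -- translated sync -> { client = ..., sync = ... }
    }

    -- Steps 1-2: remember the client's sync and return the value
    -- that the proxy would put into the forwarded request.
    function self.forward(client_id, client_sync)
        self.next_sync = self.next_sync + 1
        self.pending[self.next_sync] = {
            client = client_id,
            sync = client_sync,
        }
        return self.next_sync
    end

    -- Step 3: map a sync received from the master back to the client
    -- and its original sync. is_final is an assumption: it is false
    -- for a response that does not finish the request.
    function self.on_response(master_sync, is_final)
        local rec = self.pending[master_sync]
        if rec == nil then
            return nil -- unknown sync, nothing to translate
        end
        if is_final then
            self.pending[master_sync] = nil
        end
        return rec.client, rec.sync
    end

    return self
end

-- Two clients reuse the same sync; the proxy keeps them apart.
local tr = new_translator()
local s_a = tr.forward('client_a', 1)
local s_b = tr.forward('client_b', 1)
print(s_a ~= s_b)             -- translated values differ
print(tr.on_response(s_b, true)) -- owner and original sync of s_b
```

Keying the table by the translated value makes the reverse lookup on a response a single table access, at the cost of one record per in-flight request.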
+
+A proxy-to-master connection stores a hash table of original and translated syncs, and removes a record from the table when a master responds with `IPROTO_OK` or `IPROTO_ERROR`. A special case is `IPROTO_PUSH`: a push does not finish a request, so on a push the syncs table is not changed.
+
+#### Queuing
+
+Consider one proxy-to-master connection. To prevent mixing parts of multiple requests from different client-to-proxy connections, a proxy must forward requests one by one. To do it fairly, a proxy-to-master connection has a queue. The queue stores the client-to-proxy connections whose sockets are available for reading.
+
+When a client socket with no available data becomes available for reading, it is put at the end of the queue. The first client in the queue is removed from the queue after sending ONE request. If it has more requests, it is put at the end of the queue again to send them. This guarantees fairness even if one client is always available for reading.
+
+To speed up `sync` translation, it can be done right after receiving a request from a client, without waiting until the proxy-to-master connection is available for writing. Then no time is spent on `sync`s when the client reaches the front of the queue.
+
+## Rationale and alternatives
+
+Consider other ways to implement some parts of the proxy. Let's begin with authentication. Authentication on most proxies of other DBMSs is not transparent - they store user names and passwords in a local storage. First, it is not safe. Second, passwords on a proxy can get out of sync with the actual passwords, which requires reconfiguring the proxy. The Tarantool proxy authentication is more useful, since it does not require configuring any credentials.
+
+Another point is that the first Tarantool proxy version can not be separated from a storage. This is slightly ugly, but it allows transparent authentication and an overhead-free proxy on a master. A standalone proxy is the next step.
+
+The next point at issue is how the IProto thread would read auth info from the TX thread when a new client is connected. There are several alternatives:
+* Protect write access to the user hash with a mutex. The TX thread locks the mutex when writing, and does not lock it for reading, since TX is an exclusive writer. The IProto thread meanwhile locks the mutex whenever it needs to read `hash2` to establish an outgoing connection;<br>
+**Advantages:** very simple to implement. It is not a performance critical place, so a mutex looks ok;<br>
+**Shortcomings:** it is not a common practice in Tarantool to use mutexes. It looks like a crutch;
+* Maintain a copy of the user hash in the IProto thread, and propagate changes to the copy via cbus, in new on_replace triggers added to the system spaces;<br>
+**Advantages:** no locks;<br>
+**Shortcomings:** a client may connect while the hash copy in the IProto thread is stale. And it is relatively hard to implement;
+* Use a single copy of the users hash in the TX thread, but protect access to this copy by a `fiber.cond` object local to the IProto thread. Lock/unlock the cond for the IProto thread by sending a special message to it via cbus;<br>
+**Advantages:** no mutexes;<br>
+**Shortcomings:** the hash can become stale in the same scenario as in the previous variant. Hard to implement;
+* We use persistent schema versioning and update the state of the IProto thread on demand, whenever it notices that the schema version has changed;<br>
+**Advantages:** no locks, easy;<br>
+**Shortcomings:** first, it does not solve the problem - how to get the changes from the TX thread? Second, the schema version is not updated when a user's hash is changed, and it would be strange to update the version for that.
+* We kill the IProto thread altogether and forget about the problem of IProto-TX synchronization forever (it is not a big patch and it may well improve our performance results).<br>
+**Advantages:** in theory, it could speed up request processing;<br>
+**Shortcomings:** it slows down all slaves, since a proxy on a slave works entirely in the IProto thread. If a proxy is not in a separate thread, then on a slave it will occupy the CPU just to encode/decode requests, send/receive data, and do `SYNC` translation. When routing rules are introduced, it will occupy even more CPU;
+* On each new client connection get the needed data from the user hash using a special cbus message;<br>
+**Advantages:** simple, lock-free, new clients can not see stale data, proxy auth is stateless<br>
+**Shortcomings:** ?
-- 
2.14.3 (Apple Git-98)