[tarantool-patches] Re: [PATCH v2 1/1] rfc: describe a Tarantool wire protocol

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jun 29 22:07:33 MSK 2018


I have renamed chunk to push in the RFC.

On 29/06/2018 18:20, Vladislav Shpilevoy wrote:
> Part of #3328
> ---
> Source: https://github.com/tarantool/tarantool/blob/gerold103/gh-3328-new-iproto/doc/rfc/3328-wire_protocol.md
> Branch: https://github.com/tarantool/tarantool/tree/gerold103/gh-3328-new-iproto
> Issue: https://github.com/tarantool/tarantool/issues/3328
> 
> Changes in v2:
> - Renaming.
> 
>   doc/rfc/3328-wire_protocol.md       | 229 ++++++++++++++++++++++++++++++++++++
>   doc/rfc/3328-wire_protocol_img1.svg |   2 +
>   2 files changed, 231 insertions(+)
>   create mode 100644 doc/rfc/3328-wire_protocol.md
>   create mode 100644 doc/rfc/3328-wire_protocol_img1.svg
> 
> diff --git a/doc/rfc/3328-wire_protocol.md b/doc/rfc/3328-wire_protocol.md
> new file mode 100644
> index 000000000..1983d296f
> --- /dev/null
> +++ b/doc/rfc/3328-wire_protocol.md
> @@ -0,0 +1,229 @@
> +# Tarantool Wire protocol
> +
> +* **Status**: In progress
> +* **Start date**: 04-04-2018
> +* **Authors**: Vladislav Shpilevoy @Gerold103 v.shpilevoy at tarantool.org, Konstantin Osipov @kostja kostja at tarantool.org, Alexey Gadzhiev @alg1973 alg1973 at gmail.com
> +* **Issues**: [#2677](https://github.com/tarantool/tarantool/issues/2677), [#2620](https://github.com/tarantool/tarantool/issues/2620), [#2618](https://github.com/tarantool/tarantool/issues/2618)
> +
> +## Summary
> +
> +The Tarantool wire protocol is a convention for encoding and sending the execution results of SQL, Lua and C stored functions, DML (Data Manipulation Language), DDL (Data Definition Language) and DQL (Data Query Language) requests. The protocol is unified for all request types. Multiple responses of different types can be sent for a single request.
> +
> +## Background and motivation
> +
> +The Tarantool wire protocol is called **IProto**. It is used by database connectors written in different languages to access the database over the network. The protocol describes how to distinguish different message types and what data can be stored in each message. Tarantool response messages can be of the following kinds:
> +* A response which represents a single reply to a request or completes a chain of replies. There are two response types of this kind: OK and ERROR. An error response is trivial and contains an error code and a message. An OK response may carry a payload, such as result set rows or metadata.
> +* A response which is a part of a chain of replies - a so-called CHUNK message. Multiple chunk messages can be sent in response to a single request, but they never indicate the end of the reply stream: the end is always flagged by a response of the previous kind.
> +
> +Supporting this response set raises 2 main challenges:
> +1. How to unify responses;
> +2. How to support multiple messages inside a single request.
> +
> +A response which contains a payload can contain either data or metadata, or both. If it is necessary to share the same response metadata among multiple CHUNK messages, the metadata can be assigned a numeric identifier (CHUNK ID) and referenced in the stream by this identifier.
> +
> +The metadata itself can contain:
> +* affected row count, last autoincrement column value, flags (such metadata
> +  is sent in response to DML statements such as INSERT/UPDATE/DELETE);
> +* column count, names and types (used to describe result set rows that
> +  follow).
> +
> +To understand how a single request can produce multiple responses, consider the following stored procedure (the exact syntax does not matter here):
> +```SQL
> +FUNCTION my_sql_func(a1, a2, a3, a4) BEGIN
> +    SELECT my_lua_func(a1);
> +    SELECT * FROM table1;
> +    SELECT my_c_func(a2);
> +    INSERT INTO table1 VALUES (1, 2, 3);
> +    RETURN a4;
> +END;
> +```
> +where `my_lua_func()` is a function written in Lua that sends its own chunk messages:
> +```Lua
> +function my_lua_func(arg)
> +    box.session.push(arg)
> +    return arg
> +end
> +```
> +and `my_c_func()` is a function written in C that returns some raw data:
> +```C
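> +#include "module.h"
> +
> +int
> +my_c_func(box_function_ctx_t *ctx, const char *args, const char *args_end) {
> +    /* MsgPack for the one-element array [1]; any tuple data would do. */
> +    static const char data[] = "\x91\x01";
> +    box_tuple_t *tuple = box_tuple_new(box_tuple_format_default(),
> +                                       data, data + 2);
> +    if (tuple == NULL)
> +        return -1;
> +    return box_return_tuple(ctx, tuple);
> +}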
> +```
> +Consider each statement:
> +* `SELECT FROM` can split a big result set into multiple messages;
> +* `SELECT my_lua_func()` produces 2 messages: one is the chunk-message generated in `my_lua_func` and another is the result of `SELECT` itself;
> +* `INSERT` creates 1 message with metadata;
> +* `RETURN` creates a final response message.
> +
> +Of course, some of the messages, or even all of them, can be batched and sent as a single TCP packet.
> +
> +In the next section we specify code names and messages used by the protocol.
> +
> +For the protocol details - code values, all header and body keys - see Tarantool [website](https://tarantool.io/).
> +
> +## Detailed design
> +
> +A Tarantool response consists of a header and a body. The header stores the response code and some internal metainfo, such as the schema version and the request id (called **sync** in Tarantool). The body stores result data and request-dependent metainfo.
> +
> +### Header
> +
> +There are 3 response codes in the header:
> +* `IPROTO_OK` - the terminal response to a successful request;
> +* `IPROTO_ERROR | error code` - the terminal response to a request which ended with an error;
> +* `IPROTO_CHUNK` - a non-final response. One request can generate multiple CHUNK messages.
> +
> +An `IPROTO_ERROR` response is trivial, and consists just of a code and a message.
> +
> +`IPROTO_OK` and `IPROTO_CHUNK` have the same body format, but:
> +1. `IPROTO_OK` finalizes a request;
> +2. `IPROTO_CHUNK` can have an `IPROTO_CHUNK_ID` field in the header, which allows building a chain of chunks with the same `ID`. Absence of this field means that the chunk is not a part of a chain. All chunks which are part of the same chain (identified by the same id) share the chain metadata.
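
To illustrate the header rules above, here is a minimal client-side sketch in Python. It assumes that messages are already decoded into (header, body) dicts. The `code` key, the string constants standing in for the real numeric codes, and the function names are all hypothetical and not part of the RFC. The `IPROTO_ERROR | error code` packing is simplified to a bare `IPROTO_ERROR`.

```python
from typing import Iterable, List, Tuple

# Symbolic stand-ins: the RFC does not list the numeric code values.
IPROTO_OK = "IPROTO_OK"
IPROTO_ERROR = "IPROTO_ERROR"
IPROTO_CHUNK = "IPROTO_CHUNK"

Message = Tuple[dict, dict]  # (header, body), already decoded


def is_terminal(header: dict) -> bool:
    """OK and ERROR end the reply stream; CHUNK never does."""
    return header["code"] in (IPROTO_OK, IPROTO_ERROR)


def collect_reply(messages: Iterable[Message]) -> Tuple[List[Message], Message]:
    """Read the messages of one request up to and including the terminal one.

    Returns the list of non-final CHUNK messages and the terminal message.
    """
    chunks: List[Message] = []
    for header, body in messages:
        if is_terminal(header):
            return chunks, (header, body)
        chunks.append((header, body))
    raise ConnectionError("stream ended before IPROTO_OK/IPROTO_ERROR")
```

A connector would call `collect_reply()` once per request id (sync) and stop reading for that request as soon as the terminal message arrives.
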
> +
> +### Body
> +
> +The common body structure:
> +```
> ++----------------------------------------------+
> +| IPROTO_BODY: {                               |
> +|     IPROTO_METADATA: [                       |
> +|         {                                    |
> +|             IPROTO_FIELD_NAME: string,       |
> +|             IPROTO_FIELD_TYPE: number,       |
> +|             IPROTO_FIELD_FLAGS: number,      |
> +|         },                                   |
> +|         ...                                  |
> +|     ],                                       |
> +|                                              |
> +|     IPROTO_SQL_INFO: {                       |
> +|         SQL_INFO_ROW_COUNT: number,          |
> +|         SQL_INFO_LAST_ID: number,            |
> +|         ...                                  |
> +|     },                                       |
> +|                                              |
> +|     IPROTO_DATA: [                           |
> +|         tuple/scalar,                        |
> +|         ...                                  |
> +|     ]                                        |
> +| }                                            |
> ++----------------------------------------------+
> +```
> +
> +Consider how different responses use the body, and how they can be distinguished.
> +
> +_A non-formatted response_ has only the `IPROTO_DATA` key in the body. It is the result of Lua and C DML, DDL and DQL requests, stored procedure calls and chunk messages. Such a response is never linked with the next or previous messages of the same request.
> +
> +_A non-formatted response with metadata_ has only `IPROTO_SQL_INFO`, and it is always a result of DDL/DML executed via SQL. Like the previous type, this response is independent of other messages in the stream.
> +
> +_A formatted response_ always has `IPROTO_DATA`, and can have both `IPROTO_SQL_INFO` and `IPROTO_METADATA`. It is a result of SQL DQL (`SELECT`) or SQL DML (`INSERT`). The response can be part of a stream. The first message of the stream always contains `IPROTO_METADATA` in the body and, should there be multiple messages sharing the same metadata, sets `IPROTO_CHUNK_ID` in the header. All other messages in the stream contain `IPROTO_CHUNK_ID` with the same value.
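
The three body kinds and the metadata sharing described above can be sketched on the client side as follows. This is only one possible reading of the rules, written in Python over already decoded dicts. The key names are symbolic stand-ins for the numeric IProto keys, and `classify()` and `MetadataCache` are hypothetical names, not part of the RFC.

```python
from typing import Dict, List, Optional

# Symbolic stand-ins for the numeric IProto keys.
DATA, SQL_INFO, METADATA = "IPROTO_DATA", "IPROTO_SQL_INFO", "IPROTO_METADATA"
CHUNK_ID = "IPROTO_CHUNK_ID"


def classify(header: dict, body: dict) -> str:
    """Name the response kind from the keys present (one possible reading)."""
    if DATA not in body:
        if SQL_INFO in body:
            return "non-formatted with metadata"
        raise ValueError("body has neither IPROTO_DATA nor IPROTO_SQL_INFO")
    if METADATA in body or SQL_INFO in body or CHUNK_ID in header:
        return "formatted"
    return "non-formatted"


class MetadataCache:
    """Remembers the metadata announced by the first message of each chain."""

    def __init__(self) -> None:
        self._by_chunk_id: Dict[object, List[dict]] = {}

    def resolve(self, header: dict, body: dict) -> Optional[List[dict]]:
        """Return the metadata describing the rows in `body`, if any."""
        metadata = body.get(METADATA)
        chunk_id = header.get(CHUNK_ID)
        if chunk_id is None:
            return metadata  # standalone message: only its own metadata applies
        if metadata is not None:
            self._by_chunk_id[chunk_id] = metadata  # first message of a chain
            return metadata
        return self._by_chunk_id[chunk_id]  # later message: reuse shared metadata


cache = MetadataCache()
first = ({CHUNK_ID: 7}, {DATA: [[1, "a"]], METADATA: [{"name": "id"}, {"name": "val"}]})
later = ({CHUNK_ID: 7}, {DATA: [[2, "b"]]})
assert cache.resolve(*first) == cache.resolve(*later)
```

A later chain message whose first message was never seen raises `KeyError` here. A real connector would probably treat that as a protocol error.
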
> +
> +The picture below shows the state machine of the protocol:
> +![alt text](https://raw.githubusercontent.com/tarantool/tarantool/gerold103/gh-3328-new-iproto/doc/rfc/3328-wire_protocol_img1.svg?sanitize=true)
> +
> +For a call to `FUNCTION my_sql_func` the following responses are sent:
> +```
> +/* Chunk from my_lua_func(a1). */
> ++----------------------------------------------+
> +| HEADER: IPROTO_CHUNK                         |
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|     IPROTO_DATA: [ a1 ]                      |
> +| }                                            |
> ++----------------------------------------------+
> +
> +/* Result of SELECT my_lua_func(a1). */
> ++----------------------------------------------+
> +| HEADER: IPROTO_CHUNK                         |
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|     IPROTO_DATA: [ [ a1 ] ],                 |
> +|     IPROTO_METADATA: [                       |
> +|         { /* field name, type ... */ }       |
> +|     ]                                        |
> +| }                                            |
> ++----------------------------------------------+
> +
> +/* First chunk of SELECT * FROM table1. */
> ++----------------------------------------------+
> +| HEADER: IPROTO_CHUNK, IPROTO_CHUNK_ID = <id1>|
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|    IPROTO_DATA: [ tuple1, tuple2, ... ],     |
> +|    IPROTO_METADATA: [                        |
> +|        { /* field1 name, type ... */ },      |
> +|        { /* field2 name, type ... */ },      |
> +|        ...                                   |
> +|    ]                                         |
> +| }                                            |
> ++----------------------------------------------+
> +
> +    /* From second to last chunk. */
> +    +----------------------------------------------+
> +    | HEADER: IPROTO_CHUNK, IPROTO_CHUNK_ID = <id1>|
> +    +- - - - - - - - - - - - - - - - - - - - - - - +
> +    | BODY: {                                      |
> +    |    IPROTO_DATA: [ tuple1, tuple2, ... ]      |
> +    | }                                            |
> +    +----------------------------------------------+
> +
> +/* Result of SELECT my_c_func(a2). */
> ++----------------------------------------------+
> +| HEADER: IPROTO_CHUNK                         |
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|     IPROTO_DATA: [ [ tuple ] ],              |
> +|     IPROTO_METADATA: [                       |
> +|         { /* field name, type ... */ }       |
> +|     ]                                        |
> +| }                                            |
> ++----------------------------------------------+
> +
> +/* Result of INSERT INTO table1 VALUES (1, 2, 3). */
> ++----------------------------------------------+
> +| HEADER: IPROTO_CHUNK                         |
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|     IPROTO_SQL_INFO: {                       |
> +|         SQL_INFO_ROW_COUNT: number,          |
> +|         SQL_INFO_LAST_ID: number,            |
> +|     }                                        |
> +| }                                            |
> ++----------------------------------------------+
> +
> +/* Result of RETURN a4 */
> ++----------------------------------------------+
> +| HEADER: IPROTO_OK                            |
> ++- - - - - - - - - - - - - - - - - - - - - - - +
> +| BODY: {                                      |
> +|     IPROTO_DATA: [ a4 ]                      |
> +| }                                            |
> ++----------------------------------------------+
> +```
> +
> +## Rationale and alternatives
> +
> +There is another way to link chunks together, which could replace `IPROTO_CHUNK_ID`.
> +Chunks can be linked via a flag in the header: `IPROTO_FLAG_IS_CHAIN`, which would be stored in the `IPROTO_FLAGS` header value. When multiple messages form a chain, all of them except the last one contain this flag. For example:
> +```
> +IPROTO_CHUNK
> +    |
> +IPROTO_CHUNK, IS_CHAIN
> +    |
> +    +--IPROTO_CHUNK, IS_CHAIN
> +    |
> +    +--IPROTO_CHUNK, IS_CHAIN
> +    |
> +    +--IPROTO_CHUNK
> +    |
> +IPROTO_CHUNK
> +    |
> +...
> +    |
> +IPROTO_OK/ERROR
> +```
> +
> +It is slightly simpler than `IPROTO_CHUNK_ID`, but:
> +1. It does not allow mixing parts of different chains, should that ever be needed;
> +2. The last response does not contain `IS_CHAIN`, although it is actually a part of the chain. `IS_CHAIN` cannot be stored in the last response, because otherwise the chain would not be distinguishable from the next one. This could be solved by renaming `IS_CHAIN` to `HAS_NEXT_CHAIN` or similar, but `IPROTO_CHUNK_ID` seems better: it has none of these problems and is more scalable.
> 
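
To make the trade-off above concrete, here is a small Python sketch of how a client might regroup chain members under each scheme. Messages are reduced to hypothetical `(chunk_id, rows)` and `(is_chain, rows)` pairs, and both function names are made up for this illustration. Standalone messages outside any chain are not modelled.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def group_by_chunk_id(messages: Iterable[Tuple[Hashable, list]]) -> Dict[Hashable, list]:
    """Each message is (chunk_id, rows); rows with the same id are joined."""
    chains: Dict[Hashable, list] = {}
    for chunk_id, rows in messages:
        chains.setdefault(chunk_id, []).extend(rows)
    return chains


def group_by_is_chain(messages: Iterable[Tuple[bool, list]]) -> List[list]:
    """Each message is (is_chain, rows); the flag is set on all but the last."""
    chains: List[list] = []
    current: list = []
    for is_chain, rows in messages:
        current.extend(rows)
        if not is_chain:  # an unflagged message closes the current chain
            chains.append(current)
            current = []
    return chains


# Two chains, A and B, whose parts arrive interleaved.
interleaved = [("A", [1]), ("B", [10]), ("A", [2]), ("B", [20])]
by_id = group_by_chunk_id(interleaved)

# The flag alone carries no chain identity, so only a sequential stream works.
sequential = [(True, [1]), (False, [2]), (True, [10]), (False, [20])]
by_flag = group_by_is_chain(sequential)
```

With ids, the interleaved stream is still split into the two original chains. With the flag, the receiver can only cut the stream at unflagged messages, so the parts of one chain have to arrive back to back.
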