[Tarantool-patches] [PATCH] box: rfc for stacked diagnostic area

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jan 24 01:56:45 MSK 2020


Hi! Thanks for the RFC!

See 13 comments below.

On 14/01/2020 21:16, Nikita Pettik wrote:
> From: Kirill Shcherbatov <kshcherbatov at tarantool.org>
> 
> Part of #1148
> ---
> rfc in human-readable format: https://github.com/tarantool/tarantool/blob/np/gh-1148-stacked-diag/doc/rfc/1148-stacked-diagnostics.md
> Previous version of rfc: https://github.com/tarantool/tarantool/commit/c2a7e1a10732fdb231780ac04d1a4bc1618c0468
> 
> Changes in this version:
> - All savepoint routines have been removed as meaningless;
> - Removed SQL warnings mentions since they are unrelated to the subject of rfc;
> - Removed mentions about exposing box.error.new() with filename and
>   line parameters (again, it is barely related to the subject of rfc);
> - Described in details difference between diag_set() in diag_add()
>   functions as a part of C interface;
> - Other minor clean-ups.
> 
> Now RFC looks quite brief and straightforward in its implementation;
> it attempts at solving only #1148 issue.
> 
>  doc/rfc/1148-stacked-diagnostics.md | 214 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 214 insertions(+)
>  create mode 100644 doc/rfc/1148-stacked-diagnostics.md
> 
> diff --git a/doc/rfc/1148-stacked-diagnostics.md b/doc/rfc/1148-stacked-diagnostics.md
> new file mode 100644
> index 000000000..68fa6d21a
> --- /dev/null
> +++ b/doc/rfc/1148-stacked-diagnostics.md
> @@ -0,0 +1,214 @@
> +# Stacked Diagnostics
> +
> +* **Status**: In progress
> +* **Start date**: 30-07-2019
> +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov at tarantool.org,
> +               Pettik Nikita korablev at tarantool.org

1. You missed your GitHub login.

> +* **Issues**: [#1148](https://github.com/tarantool/<repository\>/issues/1148)
> +

2. You missed 'Summary' section.

> +## Background and motivation
> +
> +Support stacked diagnostics for Tarantool allows to accumulate all errors
> +occurred during request processing. It allows to better understand what
> +happened, and handle errors appropriately.
> +
> +### Current error diagnostics
> +
> +Currently Tarantool has `diag_set()` mechanism to set a diagnostic error.
> +Object representing error featuring following properties:
> + - type (string) error’s C++ class;
> + - code (number) error’s number;
> + - message (string) error’s message;
> + - file (string) Tarantool source file;
> + - line (number) line number in the Tarantool source file.
> +
> +The last error raised is exported with `box.error.last()` function.
> +
> +Type of error is represented by a few C++ classes (all are inherited from
> +Exception class):
> +```
> +ClientError
> + | LoggedError
> + | AccessDeniedError
> + | UnsupportedIndexFeature
> +
> +XlogError
> + | XlogGapError
> +
> +SystemError
> + | SocketError
> + | OutOfMemory
> + | TimedOut
> +
> +ChannelIsClosed
> +FiberIsCancelled
> +LuajitError
> +IllegalParams
> +CollationError
> +SwimError
> +CryptoError
> +```

3. I don't remember why all these errors were listed but it certainly makes no
sense. For the RFC it does not matter what a concrete error types there are.

> +
> +All codes and names of ClientError class are available in box.error.
> +User is able to create a new error instance of predefined type using
> +box.error.new() function. For example:
> +```
> +tarantool> t = box.error.new(box.error.CREATE_SPACE, "myspace", "just cause")
> +tarantool> t:unpack()
> +---
> +- type: ClientError
> +  code: 9
> +  message: 'Failed to create space ''myspace'': just because'

4. Unpack is able to even unpack individual words! cause -> because, wow!
(joke time).

> +  trace:
> +  - file: '[string "t = box.error.new(box.error.CREATE_SPACE, "my..."]'
> +    line: 1
> +```
> +
> +User is also capable of defining own errors with any code  by means of:
> +```
> +box.error.new({code = user_code, reason = user_error_msg})
> +```
> +For instance:
> +```
> +e = box.error.new({code = 500, reason = 'just cause'})
> +```
> +
> +Error cdata object has `:unpack()`, `:raise()`, `:match(...)`, `:__serialize()`
> +methods and `.type`, `.message` and `.trace` fields.
> +

5. 'Background and motivation' section is finished, but no a word about
motivation.

> +## Proposal
> +
> +In some cases a diagnostic information should be more complicated than
> +one last raised error. Consider following example: persistent Lua function
> +referenced by functional index has a bug in it's definition, Lua handler sets
> +an diag message. Then functional index extractor code setups an own, more
> +specialized error. Without stacked diagnostic area, only last error is
> +delivired to user. One way to deal with this problem is to introduce stack
> +accumulating all errors happened during request processing.
> +
> +### C API
> +
> +Let's keep existent `diag_set()` method as is. It is supposed to replace the
> +last error in diagnostic area with a new one. To add new error at the top of
> +existing one, let's introduce new method `diag_add()`. It is assumed to keep
> +an existent error message in diagnostic area (if any) and sets it as a reason

6. Reason is a different term. It is an error message of the current error.
And that name 'reason' is already occupied in box.error.new() function. Besides,
below you call it 'prev', previous error. I propose to keep the terminology the
same everywhere. 'Reason' for error message, 'previous' for an error happened
before a new one, and linked with it. No mixing of these terms.

> +error for a recently-constructed error object. Note that `diag_set()` is not
> +going to preserve pointer to previous error which is held in error to be
> +substituted. To illustrate last point consider example:> +
> +```
> +0. Errors: <NULL>
> +1. diag_set(code = 1)
> +Errors: <e1(code = 1) -> NULL>
> +2. diag_add(code = 2)
> +Errors: <e1(code = 1) -> e2(code = 2) -> NULL>
> +3. diag_set(code = 3)
> +Errors: <e3(code = 3) -> NULL>
> +```
> +
> +Hence, developer takes responsibility of placing `diag_set()` where the most
> +basic error should be raised. For instance, if during request processing
> +`diag_add()` is called before `diag_set()` then it will result in inheritance
> +of all errors from previous error raise:> +
> +```
> +-- Processing of request #1
> +1. diag_set(code = 1)
> +Errors: <e1(code = 1) -> NULL>
> +2. diag_add(code = 2)
> +Errors: <e1(code = 1) -> e2(code = 2) -> NULL>
> +-- End of execution
> +
> +-- Processing of request #2
> +1. diag_add(code = 1)
> +Errors: <e1(code = 1) -> e2(code = 2) -> e3(code = 1) -> NULL>
> +-- End of execution
> +```
> +
> +As a result, at the end of execution fo second request, three errors in

7. fo -> of.

> +stack are reported instead of one.
> +
> +Another way to resolve this issue is to erase diagnostic area before
> +request processing. However, it breaks current user-visible behaviour
> +since box.error.last() will preserve last occurred error only until execution
> +of the next request.
> +
8. I think users would rather be surprised if a fiber would have not empty diag
when a request is just started. Just like it was recently with fiber.storage.

In case I misunderstood what do you mean as a request - in my terminology
'request' is either a manually created fiber, which dies and clears error.last
anyway after its function is finished, or it is a pooled fiber, which doesn't
die, but still should not leak anything between requests. So now we have a
not-consistent behaviour. It not only *can* be fixed, but *must* be fixed.

> +The diagnostic area (now) contains (nothing but) pointer to the top error:
> +```
> +struct diag {
> +  struct error *last;
> +};
> +
> +```
> +
> +To organize errors in a list let's extend error structure with pointer to
> +the previous element. Or alternatively, add member of any data structure
> +providing list properties (struct rlist, struct stailq or whatever):
> +```
> +struct diag {
> +  struct stailq *errors;
> +};
> +
> +struct error {
> +   ...
> +   struct stailq_entry *in_errors;
> +};
> +```
> +
> +
> +### Lua API
> +
> +Tarantool returns a last-set (diag::last) error as `cdata` object from central
> +diagnostic area to Lua in case of error. User should be unable to modify it
> +(since it is considered to be a bad practice - in fact object doesn't belong
> +to user). On the other hand, user needs an ability to inspect a collected
> +diagnostic information. Hence, let's extend the `box.error` API with a function
> +which provides the way to get the previous error (reason): `:prev()` (and
> +correspondingly `.prev` field).
> +
> +```
> +-- Return a reason error object for given error
> +-- (when exists, nil otherwise).
> +box.error.prev(error) == error.prev
> +```
> +
> +Furthermore, let's extend signature of `box.error.new()` with new (optional)
> +argument - the 'reason' parent error object:

9. 'Reason' is already occupied by a error message. And below you use 'prev'
name for a parent error.

> +
> +```
> +e1 = box.error.new({code = 111, reason = "just cause"})
> +e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1})
> +```

10. Are we not going to allow to link two existing errors? I imagine
it could be simpler and more flexible for a user, than filling
one big map in error.new().

> +
> +### Binary protocol
> +
> +Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an
> +string message containing error details as a payload. There are not so many
> +options to extend current protocol wihtout breaking backward compatibility
> +(which is obviously one of implementation requirements). One way is to extend
> +existent binary protocol with a new key IPROTO_ERROR_STACK (or
> +IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2):
> +```
> +{
> +        // backward compatibility
> +        IPROTO_ERROR: "the most recent error message",
> +        // modern error message
> +        IPROTO_ERROR_STACK: {
> +                {
> +                        // the most recent error object
> +                        IPROTO_ERROR_CODE: error_code_number,
> +                        IPROTO_ERROR_REASON: error_reason_string,
> +                },
> +                ...
> +                {
> +                        // the oldest (reason) error object
> +                },
> +        }
> +}
> +```

11. The names look good. Except that maybe change IPROTO_ERROR_REASON ->
IPROTO_ERROR_MESSAGE. I don't know why it is called reason everywhere
in our public API. In struct error it is message, not reason. And it
just looks more correct, IMO.

12. Another option - send multiple IPROTO_ERROR's. In MP_MAP you can
put multiple keys with the same value, and even keep a strict order.
So we could send IPROTO_ERROR: ..., IPROTO_ERROR: ..., IPROTO_ERROR: ...
For each message in the stack. But that looks like an illegal hack,
and also does not allow to fix this thing alongside:
https://github.com/tarantool/tarantool/issues/3399

So a proper stack looks better, like you proposed.

> +
> +IPROTO_ERROR is always sent (as in older versions) in case of error.
> +IPROTO_ERROR_STACK is presented in response only if there's at least two
> +elements in diagnostic list. Map which contains error stack can be optimized
> +in terms of space, so that avoid including error which is already encoded
> +in IPROTO_ERROR.
> 

13. I don't think it is worth saving a few bytes for the cold path of error
reporting. Lets always send IPROTO_ERROR with a last error like now *and*
IPROTO_ERROR_STACK. Even if they contain the same error. It will make much
easier to remove IPROTO_ERROR later. Otherwise we will need to carry them
both forever, if IPROTO_ERROR_STACK would depend on IPROTO_ERROR.


More information about the Tarantool-patches mailing list