[Tarantool-patches] [PATCH] box: rfc for stacked diagnostic area

Mon Jan 27 14:23:38 MSK 2020

On 23 Jan 23:56, Vladislav Shpilevoy wrote:
> Hi! Thanks for the RFC!
> 
> See 13 comments below.
> 
> On 14/01/2020 21:16, Nikita Pettik wrote:
> > +* **Start date**: 30-07-2019
> > +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov at tarantool.org,
> > +               Pettik Nikita korablev at tarantool.org
> 
> 1. You missed your GitHub login.

Okay, added.
 
> > +* **Issues**: [#1148](https://github.com/tarantool/<repository\>/issues/1148)
> > +
> 
> 2. You missed 'Summary' section.

I believe it is redundant for such brief RFC. 
 
> > +### Current error diagnostics
> > +
> > +Currently Tarantool has `diag_set()` mechanism to set a diagnostic error.
> > +Object representing error featuring following properties:
> > + - type (string) error’s C++ class;
> > + - code (number) error’s number;
> > + - message (string) error’s message;
> > + - file (string) Tarantool source file;
> > + - line (number) line number in the Tarantool source file.
> > +
> > +The last error raised is exported with `box.error.last()` function.
> > +
> > +Type of error is represented by a few C++ classes (all are inherited from
> > +Exception class):
> > +```
> > +ClientError
> > + | LoggedError
> > + | AccessDeniedError
> > + | UnsupportedIndexFeature
> > +
> > +XlogError
> > + | XlogGapError
> > +
> > +SystemError
> > + | SocketError
> > + | OutOfMemory
> > + | TimedOut
> > +
> > +ChannelIsClosed
> > +FiberIsCancelled
> > +LuajitError
> > +IllegalParams
> > +CollationError
> > +SwimError
> > +CryptoError
> > +```
> 
> 3. I don't remember why all these errors were listed but it certainly makes no
> sense. For the RFC it does not matter what a concrete error types there are.

Ok, left only ClientError hierarchy as an example.

> > +tarantool> t = box.error.new(box.error.CREATE_SPACE, "myspace", "just cause")
> > +tarantool> t:unpack()
> > +---
> > +- type: ClientError
> > +  code: 9
> > +  message: 'Failed to create space ''myspace'': just because'
> 
> 4. Unpack is able to even unpack individual words! cause -> because, wow!
> (joke time).

Fixed.

> > +Error cdata object has `:unpack()`, `:raise()`, `:match(...)`, `:__serialize()`
> > +methods and `.type`, `.message` and `.trace` fields.
> > +
> 
> 5. 'Background and motivation' section is finished, but no a word about
> motivation.

Ok, moved motivating example from "Proposal" to "Background and motivation"
section.

> > +## Proposal
> > +
> > +In some cases a diagnostic information should be more complicated than
> > +one last raised error. Consider following example: persistent Lua function
> > +referenced by functional index has a bug in it's definition, Lua handler sets
> > +an diag message. Then functional index extractor code setups an own, more
> > +specialized error. Without stacked diagnostic area, only last error is
> > +delivired to user. One way to deal with this problem is to introduce stack
> > +accumulating all errors happened during request processing.
> > +
> > +### C API
> > +
> > +Let's keep existent `diag_set()` method as is. It is supposed to replace the
> > +last error in diagnostic area with a new one. To add new error at the top of
> > +existing one, let's introduce new method `diag_add()`. It is assumed to keep
> > +an existent error message in diagnostic area (if any) and sets it as a reason
> 
> 6. Reason is a different term. It is an error message of the current error.
> And that name 'reason' is already occupied in box.error.new() function. Besides,
> below you call it 'prev', previous error. I propose to keep the terminology the
> same everywhere. 'Reason' for error message, 'previous' for an error happened
> before a new one, and linked with it. No mixing of these terms.

Fair, fixed.
 
> > +-- Processing of request #2
> > +1. diag_add(code = 1)
> > +Errors: <e1(code = 1) -> e2(code = 2) -> e3(code = 1) -> NULL>
> > +-- End of execution
> > +```
> > +
> > +As a result, at the end of execution fo second request, three errors in
> 
> 7. fo -> of.

Fixed.
 
> > +Another way to resolve this issue is to erase diagnostic area before
> > +request processing. However, it breaks current user-visible behaviour
> > +since box.error.last() will preserve last occurred error only until execution
> > +of the next request.
> > +
> 8. I think users would rather be surprised if a fiber would have not empty diag
> when a request is just started. Just like it was recently with fiber.storage.
> 
> In case I misunderstood what do you mean as a request - in my terminology
> 'request' is either a manually created fiber, which dies and clears error.last
> anyway after its function is finished, or it is a pooled fiber, which doesn't
> die, but still should not leak anything between requests. So now we have a
> not-consistent behaviour. It not only *can* be fixed, but *must* be fixed.

'Request' in context I used is any box operation (e.g. space:insert(),
index:select() or cn:execute()). Also I 'member that Konstantin was against
nullifing last error before processing box.execute() and other box functions
(it was in scope of reworking SQL parser's errors).

> > +The diagnostic area (now) contains (nothing but) pointer to the top error:
> > +```
> > +struct diag {
> > +  struct error *last;
> > +};
> > +
> > +
> > +-- Return a reason error object for given error
> > +-- (when exists, nil otherwise).
> > +box.error.prev(error) == error.prev
> > +```
> > +
> > +Furthermore, let's extend signature of `box.error.new()` with new (optional)
> > +argument - the 'reason' parent error object:
> 
> 9. 'Reason' is already occupied by a error message. And below you use 'prev'
> name for a parent error.

Fixed.
 
> 10. Are we not going to allow to link two existing errors? I imagine
> it could be simpler and more flexible for a user, than filling
> one big map in error.new().

Okay, I'm not against it:

 Another way to resolve this issue is to erase diagnostic area before
@@ -173,13 +158,23 @@ box.error.prev(error) == error.prev
 ```
 
 Furthermore, let's extend signature of `box.error.new()` with new (optional)
-argument - the 'reason' parent error object:
+argument - 'prev' - previous error object:
 
 ```
 e1 = box.error.new({code = 111, reason = "just cause"})
 e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1})
 ```
 
+User may want to link already existing errors. To achieve this let's add
+`set_prev` method to error object or/and `link` to `box.error` so that one can
+join two errors:
+```
+e1 = box.error.new({code = 111, reason = "just cause"})
+e2 = box.error.new({code = 222, reason = "just cause x2"})
+...
+e2.set_prev(e1) -- e2.prev == e1
+box.error.link(e1, e2) -- e2.prev == e1
+```

> > +### Binary protocol
> > +
> > +Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an
> > +string message containing error details as a payload. There are not so many
> > +options to extend current protocol wihtout breaking backward compatibility
> > +(which is obviously one of implementation requirements). One way is to extend
> > +existent binary protocol with a new key IPROTO_ERROR_STACK (or
> > +IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2):
> > +```
> > +{
> > +        // backward compatibility
> > +        IPROTO_ERROR: "the most recent error message",
> > +        // modern error message
> > +        IPROTO_ERROR_STACK: {
> > +                {
> > +                        // the most recent error object
> > +                        IPROTO_ERROR_CODE: error_code_number,
> > +                        IPROTO_ERROR_REASON: error_reason_string,
> > +                },
> > +                ...
> > +                {
> > +                        // the oldest (reason) error object
> > +                },
> > +        }
> > +}
> > +```
> 
> 11. The names look good. Except that maybe change IPROTO_ERROR_REASON ->
> IPROTO_ERROR_MESSAGE. I don't know why it is called reason everywhere
> in our public API. In struct error it is message, not reason. And it
> just looks more correct, IMO.

On the other hand, it might look a bit inconsistent if in binary protocole
field would have different name from one in Lua API. Nevertheless, I
don't mind this renaming. Fixed in RFC.

> 12. Another option - send multiple IPROTO_ERROR's. In MP_MAP you can
> put multiple keys with the same value, and even keep a strict order.
> So we could send IPROTO_ERROR: ..., IPROTO_ERROR: ..., IPROTO_ERROR: ...
> For each message in the stack. But that looks like an illegal hack,
> and also does not allow to fix this thing alongside:
> https://github.com/tarantool/tarantool/issues/3399
> 
> So a proper stack looks better, like you proposed.
> 
> > +
> > +IPROTO_ERROR is always sent (as in older versions) in case of error.
> > +IPROTO_ERROR_STACK is presented in response only if there's at least two
> > +elements in diagnostic list. Map which contains error stack can be optimized
> > +in terms of space, so that avoid including error which is already encoded
> > +in IPROTO_ERROR.
> > 
> 
> 13. I don't think it is worth saving a few bytes for the cold path of error
> reporting. Lets always send IPROTO_ERROR with a last error like now *and*
> IPROTO_ERROR_STACK. Even if they contain the same error. It will make much
> easier to remove IPROTO_ERROR later. Otherwise we will need to carry them
> both forever, if IPROTO_ERROR_STACK would depend on IPROTO_ERROR.

Ok (it is only suggestion in scope of RFC, so I guess your comment doesn't
imply that last remark concerning optimization should be erased).

Full diff:

diff --git a/doc/rfc/1148-stacked-diagnostics.md b/doc/rfc/1148-stacked-diagnostics.md
index 68fa6d21a..0d1d07dc3 100644
--- a/doc/rfc/1148-stacked-diagnostics.md
+++ b/doc/rfc/1148-stacked-diagnostics.md
@@ -3,14 +3,17 @@
 * **Status**: In progress
 * **Start date**: 30-07-2019
 * **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov at tarantool.org,
-               Pettik Nikita korablev at tarantool.org
+               Pettik Nikita @Korablev77 korablev at tarantool.org
 * **Issues**: [#1148](https://github.com/tarantool/<repository\>/issues/1148)
 
 ## Background and motivation
 
 Support stacked diagnostics for Tarantool allows to accumulate all errors
 occurred during request processing. It allows to better understand what
-happened, and handle errors appropriately.
+happened, and handle errors appropriately. Consider following example:
+persistent Lua function referenced by functional index has a bug in it's
+definition, Lua handler sets an diag message. Then functional index extractor
+code setups an own, more specialized error.
 
 ### Current error diagnostics
 
@@ -25,28 +28,12 @@ Object representing error featuring following properties:
 The last error raised is exported with `box.error.last()` function.
 
 Type of error is represented by a few C++ classes (all are inherited from
-Exception class):
+Exception class). For instance hierarchy for ClientError is following:
 ```
 ClientError
  | LoggedError
  | AccessDeniedError
  | UnsupportedIndexFeature
-
-XlogError
- | XlogGapError
-
-SystemError
- | SocketError
- | OutOfMemory
- | TimedOut
-
-ChannelIsClosed
-FiberIsCancelled
-LuajitError
-IllegalParams
-CollationError
-SwimError
-CryptoError
 ```
 
 All codes and names of ClientError class are available in box.error.
@@ -58,7 +45,7 @@ tarantool> t:unpack()
 ---
 - type: ClientError
   code: 9
-  message: 'Failed to create space ''myspace'': just because'
+  message: 'Failed to create space ''myspace'': just cause'
   trace:
   - file: '[string "t = box.error.new(box.error.CREATE_SPACE, "my..."]'
     line: 1
@@ -78,12 +65,10 @@ methods and `.type`, `.message` and `.trace` fields.
 
 ## Proposal
 
-In some cases a diagnostic information should be more complicated than
-one last raised error. Consider following example: persistent Lua function
-referenced by functional index has a bug in it's definition, Lua handler sets
-an diag message. Then functional index extractor code setups an own, more
-specialized error. Without stacked diagnostic area, only last error is
-delivired to user. One way to deal with this problem is to introduce stack
+In some cases a diagnostic area should be more complicated than
+one last raised error to provide decent information concerning incident (see
+motivating example above). Without stacked diagnostic area, only last error is
+delivered to user. One way to deal with this problem is to introduce stack
 accumulating all errors happened during request processing.
 
 ### C API
@@ -91,7 +76,7 @@ accumulating all errors happened during request processing.
 Let's keep existent `diag_set()` method as is. It is supposed to replace the
 last error in diagnostic area with a new one. To add new error at the top of
 existing one, let's introduce new method `diag_add()`. It is assumed to keep
-an existent error message in diagnostic area (if any) and sets it as a reason
+an existent error message in diagnostic area (if any) and sets it as a previous
 error for a recently-constructed error object. Note that `diag_set()` is not
 going to preserve pointer to previous error which is held in error to be
 substituted. To illustrate last point consider example:
@@ -125,7 +110,7 @@ Errors: <e1(code = 1) -> e2(code = 2) -> e3(code = 1) -> NULL>
 -- End of execution
 ```
 
-As a result, at the end of execution fo second request, three errors in
+As a result, at the end of execution of second request, three errors in
 stack are reported instead of one.
 
 Another way to resolve this issue is to erase diagnostic area before
@@ -173,13 +158,23 @@ box.error.prev(error) == error.prev
 ```
 
 Furthermore, let's extend signature of `box.error.new()` with new (optional)
-argument - the 'reason' parent error object:
+argument - 'prev' - previous error object:
 
 ```
 e1 = box.error.new({code = 111, reason = "just cause"})
 e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1})
 ```
 
+User may want to link already existing errors. To achieve this let's add
+`set_prev` method to error object or/and `link` to `box.error` so that one can
+join two errors:
+```
+e1 = box.error.new({code = 111, reason = "just cause"})
+e2 = box.error.new({code = 222, reason = "just cause x2"})
+...
+e2.set_prev(e1) -- e2.prev == e1
+box.error.link(e1, e2) -- e2.prev == e1
+```
 ### Binary protocol
 
 Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an
@@ -197,7 +192,7 @@ IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2):
                 {
                         // the most recent error object
                         IPROTO_ERROR_CODE: error_code_number,
-                        IPROTO_ERROR_REASON: error_reason_string,
+                        IPROTO_ERROR_MESSAGE: error_message_string,
                 },
                 ...
                 {