From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 502D946970E for ; Mon, 27 Jan 2020 14:23:39 +0300 (MSK) Date: Mon, 27 Jan 2020 14:23:38 +0300 From: Nikita Pettik Message-ID: <20200127112337.GE1144@tarantool.org> References: <3f49ddc14c2e07f4ac96fc71f3912fd0c5663882.1579032293.git.korablev@tarantool.org> <26c58292-4720-49b0-0770-6bafb559936f@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <26c58292-4720-49b0-0770-6bafb559936f@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH] box: rfc for stacked diagnostic area List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org On 23 Jan 23:56, Vladislav Shpilevoy wrote: > Hi! Thanks for the RFC! > > See 13 comments below. > > On 14/01/2020 21:16, Nikita Pettik wrote: > > +* **Start date**: 30-07-2019 > > +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org, > > + Pettik Nikita korablev@tarantool.org > > 1. You missed your GitHub login. Okay, added. > > +* **Issues**: [#1148](https://github.com/tarantool//issues/1148) > > + > > 2. You missed 'Summary' section. I believe it is redundant for such brief RFC. > > +### Current error diagnostics > > + > > +Currently Tarantool has `diag_set()` mechanism to set a diagnostic error. > > +Object representing error featuring following properties: > > + - type (string) error’s C++ class; > > + - code (number) error’s number; > > + - message (string) error’s message; > > + - file (string) Tarantool source file; > > + - line (number) line number in the Tarantool source file. > > + > > +The last error raised is exported with `box.error.last()` function. > > + > > +Type of error is represented by a few C++ classes (all are inherited from > > +Exception class): > > +``` > > +ClientError > > + | LoggedError > > + | AccessDeniedError > > + | UnsupportedIndexFeature > > + > > +XlogError > > + | XlogGapError > > + > > +SystemError > > + | SocketError > > + | OutOfMemory > > + | TimedOut > > + > > +ChannelIsClosed > > +FiberIsCancelled > > +LuajitError > > +IllegalParams > > +CollationError > > +SwimError > > +CryptoError > > +``` > > 3. I don't remember why all these errors were listed but it certainly makes no > sense. For the RFC it does not matter what a concrete error types there are. Ok, left only ClientError hierarchy as an example. > > +tarantool> t = box.error.new(box.error.CREATE_SPACE, "myspace", "just cause") > > +tarantool> t:unpack() > > +--- > > +- type: ClientError > > + code: 9 > > + message: 'Failed to create space ''myspace'': just because' > > 4. Unpack is able to even unpack individual words! cause -> because, wow! > (joke time). Fixed. > > +Error cdata object has `:unpack()`, `:raise()`, `:match(...)`, `:__serialize()` > > +methods and `.type`, `.message` and `.trace` fields. > > + > > 5. 'Background and motivation' section is finished, but no a word about > motivation. Ok, moved motivating example from "Proposal" to "Background and motivation" section. > > +## Proposal > > + > > +In some cases a diagnostic information should be more complicated than > > +one last raised error. Consider following example: persistent Lua function > > +referenced by functional index has a bug in it's definition, Lua handler sets > > +an diag message. Then functional index extractor code setups an own, more > > +specialized error. Without stacked diagnostic area, only last error is > > +delivired to user. One way to deal with this problem is to introduce stack > > +accumulating all errors happened during request processing. > > + > > +### C API > > + > > +Let's keep existent `diag_set()` method as is. It is supposed to replace the > > +last error in diagnostic area with a new one. To add new error at the top of > > +existing one, let's introduce new method `diag_add()`. It is assumed to keep > > +an existent error message in diagnostic area (if any) and sets it as a reason > > 6. Reason is a different term. It is an error message of the current error. > And that name 'reason' is already occupied in box.error.new() function. Besides, > below you call it 'prev', previous error. I propose to keep the terminology the > same everywhere. 'Reason' for error message, 'previous' for an error happened > before a new one, and linked with it. No mixing of these terms. Fair, fixed. > > +-- Processing of request #2 > > +1. diag_add(code = 1) > > +Errors: e2(code = 2) -> e3(code = 1) -> NULL> > > +-- End of execution > > +``` > > + > > +As a result, at the end of execution fo second request, three errors in > > 7. fo -> of. Fixed. > > +Another way to resolve this issue is to erase diagnostic area before > > +request processing. However, it breaks current user-visible behaviour > > +since box.error.last() will preserve last occurred error only until execution > > +of the next request. > > + > 8. I think users would rather be surprised if a fiber would have not empty diag > when a request is just started. Just like it was recently with fiber.storage. > > In case I misunderstood what do you mean as a request - in my terminology > 'request' is either a manually created fiber, which dies and clears error.last > anyway after its function is finished, or it is a pooled fiber, which doesn't > die, but still should not leak anything between requests. So now we have a > not-consistent behaviour. It not only *can* be fixed, but *must* be fixed. 'Request' in context I used is any box operation (e.g. space:insert(), index:select() or cn:execute()). Also I 'member that Konstantin was against nullifing last error before processing box.execute() and other box functions (it was in scope of reworking SQL parser's errors). > > +The diagnostic area (now) contains (nothing but) pointer to the top error: > > +``` > > +struct diag { > > + struct error *last; > > +}; > > + > > + > > +-- Return a reason error object for given error > > +-- (when exists, nil otherwise). > > +box.error.prev(error) == error.prev > > +``` > > + > > +Furthermore, let's extend signature of `box.error.new()` with new (optional) > > +argument - the 'reason' parent error object: > > 9. 'Reason' is already occupied by a error message. And below you use 'prev' > name for a parent error. Fixed. > 10. Are we not going to allow to link two existing errors? I imagine > it could be simpler and more flexible for a user, than filling > one big map in error.new(). Okay, I'm not against it: Another way to resolve this issue is to erase diagnostic area before @@ -173,13 +158,23 @@ box.error.prev(error) == error.prev ``` Furthermore, let's extend signature of `box.error.new()` with new (optional) -argument - the 'reason' parent error object: +argument - 'prev' - previous error object: ``` e1 = box.error.new({code = 111, reason = "just cause"}) e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1}) ``` +User may want to link already existing errors. To achieve this let's add +`set_prev` method to error object or/and `link` to `box.error` so that one can +join two errors: +``` +e1 = box.error.new({code = 111, reason = "just cause"}) +e2 = box.error.new({code = 222, reason = "just cause x2"}) +... +e2.set_prev(e1) -- e2.prev == e1 +box.error.link(e1, e2) -- e2.prev == e1 +``` > > +### Binary protocol > > + > > +Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an > > +string message containing error details as a payload. There are not so many > > +options to extend current protocol wihtout breaking backward compatibility > > +(which is obviously one of implementation requirements). One way is to extend > > +existent binary protocol with a new key IPROTO_ERROR_STACK (or > > +IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2): > > +``` > > +{ > > + // backward compatibility > > + IPROTO_ERROR: "the most recent error message", > > + // modern error message > > + IPROTO_ERROR_STACK: { > > + { > > + // the most recent error object > > + IPROTO_ERROR_CODE: error_code_number, > > + IPROTO_ERROR_REASON: error_reason_string, > > + }, > > + ... > > + { > > + // the oldest (reason) error object > > + }, > > + } > > +} > > +``` > > 11. The names look good. Except that maybe change IPROTO_ERROR_REASON -> > IPROTO_ERROR_MESSAGE. I don't know why it is called reason everywhere > in our public API. In struct error it is message, not reason. And it > just looks more correct, IMO. On the other hand, it might look a bit inconsistent if in binary protocole field would have different name from one in Lua API. Nevertheless, I don't mind this renaming. Fixed in RFC. > 12. Another option - send multiple IPROTO_ERROR's. In MP_MAP you can > put multiple keys with the same value, and even keep a strict order. > So we could send IPROTO_ERROR: ..., IPROTO_ERROR: ..., IPROTO_ERROR: ... > For each message in the stack. But that looks like an illegal hack, > and also does not allow to fix this thing alongside: > https://github.com/tarantool/tarantool/issues/3399 > > So a proper stack looks better, like you proposed. > > > + > > +IPROTO_ERROR is always sent (as in older versions) in case of error. > > +IPROTO_ERROR_STACK is presented in response only if there's at least two > > +elements in diagnostic list. Map which contains error stack can be optimized > > +in terms of space, so that avoid including error which is already encoded > > +in IPROTO_ERROR. > > > > 13. I don't think it is worth saving a few bytes for the cold path of error > reporting. Lets always send IPROTO_ERROR with a last error like now *and* > IPROTO_ERROR_STACK. Even if they contain the same error. It will make much > easier to remove IPROTO_ERROR later. Otherwise we will need to carry them > both forever, if IPROTO_ERROR_STACK would depend on IPROTO_ERROR. Ok (it is only suggestion in scope of RFC, so I guess your comment doesn't imply that last remark concerning optimization should be erased). Full diff: diff --git a/doc/rfc/1148-stacked-diagnostics.md b/doc/rfc/1148-stacked-diagnostics.md index 68fa6d21a..0d1d07dc3 100644 --- a/doc/rfc/1148-stacked-diagnostics.md +++ b/doc/rfc/1148-stacked-diagnostics.md @@ -3,14 +3,17 @@ * **Status**: In progress * **Start date**: 30-07-2019 * **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org, - Pettik Nikita korablev@tarantool.org + Pettik Nikita @Korablev77 korablev@tarantool.org * **Issues**: [#1148](https://github.com/tarantool//issues/1148) ## Background and motivation Support stacked diagnostics for Tarantool allows to accumulate all errors occurred during request processing. It allows to better understand what -happened, and handle errors appropriately. +happened, and handle errors appropriately. Consider following example: +persistent Lua function referenced by functional index has a bug in it's +definition, Lua handler sets an diag message. Then functional index extractor +code setups an own, more specialized error. ### Current error diagnostics @@ -25,28 +28,12 @@ Object representing error featuring following properties: The last error raised is exported with `box.error.last()` function. Type of error is represented by a few C++ classes (all are inherited from -Exception class): +Exception class). For instance hierarchy for ClientError is following: ``` ClientError | LoggedError | AccessDeniedError | UnsupportedIndexFeature - -XlogError - | XlogGapError - -SystemError - | SocketError - | OutOfMemory - | TimedOut - -ChannelIsClosed -FiberIsCancelled -LuajitError -IllegalParams -CollationError -SwimError -CryptoError ``` All codes and names of ClientError class are available in box.error. @@ -58,7 +45,7 @@ tarantool> t:unpack() --- - type: ClientError code: 9 - message: 'Failed to create space ''myspace'': just because' + message: 'Failed to create space ''myspace'': just cause' trace: - file: '[string "t = box.error.new(box.error.CREATE_SPACE, "my..."]' line: 1 @@ -78,12 +65,10 @@ methods and `.type`, `.message` and `.trace` fields. ## Proposal -In some cases a diagnostic information should be more complicated than -one last raised error. Consider following example: persistent Lua function -referenced by functional index has a bug in it's definition, Lua handler sets -an diag message. Then functional index extractor code setups an own, more -specialized error. Without stacked diagnostic area, only last error is -delivired to user. One way to deal with this problem is to introduce stack +In some cases a diagnostic area should be more complicated than +one last raised error to provide decent information concerning incident (see +motivating example above). Without stacked diagnostic area, only last error is +delivered to user. One way to deal with this problem is to introduce stack accumulating all errors happened during request processing. ### C API @@ -91,7 +76,7 @@ accumulating all errors happened during request processing. Let's keep existent `diag_set()` method as is. It is supposed to replace the last error in diagnostic area with a new one. To add new error at the top of existing one, let's introduce new method `diag_add()`. It is assumed to keep -an existent error message in diagnostic area (if any) and sets it as a reason +an existent error message in diagnostic area (if any) and sets it as a previous error for a recently-constructed error object. Note that `diag_set()` is not going to preserve pointer to previous error which is held in error to be substituted. To illustrate last point consider example: @@ -125,7 +110,7 @@ Errors: e2(code = 2) -> e3(code = 1) -> NULL> -- End of execution ``` -As a result, at the end of execution fo second request, three errors in +As a result, at the end of execution of second request, three errors in stack are reported instead of one. Another way to resolve this issue is to erase diagnostic area before @@ -173,13 +158,23 @@ box.error.prev(error) == error.prev ``` Furthermore, let's extend signature of `box.error.new()` with new (optional) -argument - the 'reason' parent error object: +argument - 'prev' - previous error object: ``` e1 = box.error.new({code = 111, reason = "just cause"}) e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1}) ``` +User may want to link already existing errors. To achieve this let's add +`set_prev` method to error object or/and `link` to `box.error` so that one can +join two errors: +``` +e1 = box.error.new({code = 111, reason = "just cause"}) +e2 = box.error.new({code = 222, reason = "just cause x2"}) +... +e2.set_prev(e1) -- e2.prev == e1 +box.error.link(e1, e2) -- e2.prev == e1 +``` ### Binary protocol Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an @@ -197,7 +192,7 @@ IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2): { // the most recent error object IPROTO_ERROR_CODE: error_code_number, - IPROTO_ERROR_REASON: error_reason_string, + IPROTO_ERROR_MESSAGE: error_message_string, }, ... {