From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp32.i.mail.ru (smtp32.i.mail.ru [94.100.177.92]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 51D6846970E for ; Fri, 24 Jan 2020 01:56:49 +0300 (MSK) References: <3f49ddc14c2e07f4ac96fc71f3912fd0c5663882.1579032293.git.korablev@tarantool.org> From: Vladislav Shpilevoy Message-ID: <26c58292-4720-49b0-0770-6bafb559936f@tarantool.org> Date: Thu, 23 Jan 2020 23:56:45 +0100 MIME-Version: 1.0 In-Reply-To: <3f49ddc14c2e07f4ac96fc71f3912fd0c5663882.1579032293.git.korablev@tarantool.org> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH] box: rfc for stacked diagnostic area List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik , tarantool-patches@dev.tarantool.org, Sergey Ostanevich , Alexander Turenko Hi! Thanks for the RFC! See 13 comments below. On 14/01/2020 21:16, Nikita Pettik wrote: > From: Kirill Shcherbatov > > Part of #1148 > --- > rfc in human-readable format: https://github.com/tarantool/tarantool/blob/np/gh-1148-stacked-diag/doc/rfc/1148-stacked-diagnostics.md > Previous version of rfc: https://github.com/tarantool/tarantool/commit/c2a7e1a10732fdb231780ac04d1a4bc1618c0468 > > Changes in this version: > - All savepoint routines have been removed as meaningless; > - Removed SQL warnings mentions since they are unrelated to the subject of rfc; > - Removed mentions about exposing box.error.new() with filename and > line parameters (again, it is barely related to the subject of rfc); > - Described in details difference between diag_set() in diag_add() > functions as a part of C interface; > - Other minor clean-ups. > > Now RFC looks quite brief and straightforward in its implementation; > it attempts at solving only #1148 issue. > > doc/rfc/1148-stacked-diagnostics.md | 214 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 214 insertions(+) > create mode 100644 doc/rfc/1148-stacked-diagnostics.md > > diff --git a/doc/rfc/1148-stacked-diagnostics.md b/doc/rfc/1148-stacked-diagnostics.md > new file mode 100644 > index 000000000..68fa6d21a > --- /dev/null > +++ b/doc/rfc/1148-stacked-diagnostics.md > @@ -0,0 +1,214 @@ > +# Stacked Diagnostics > + > +* **Status**: In progress > +* **Start date**: 30-07-2019 > +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org, > + Pettik Nikita korablev@tarantool.org 1. You missed your GitHub login. > +* **Issues**: [#1148](https://github.com/tarantool//issues/1148) > + 2. You missed 'Summary' section. > +## Background and motivation > + > +Support stacked diagnostics for Tarantool allows to accumulate all errors > +occurred during request processing. It allows to better understand what > +happened, and handle errors appropriately. > + > +### Current error diagnostics > + > +Currently Tarantool has `diag_set()` mechanism to set a diagnostic error. > +Object representing error featuring following properties: > + - type (string) error’s C++ class; > + - code (number) error’s number; > + - message (string) error’s message; > + - file (string) Tarantool source file; > + - line (number) line number in the Tarantool source file. > + > +The last error raised is exported with `box.error.last()` function. > + > +Type of error is represented by a few C++ classes (all are inherited from > +Exception class): > +``` > +ClientError > + | LoggedError > + | AccessDeniedError > + | UnsupportedIndexFeature > + > +XlogError > + | XlogGapError > + > +SystemError > + | SocketError > + | OutOfMemory > + | TimedOut > + > +ChannelIsClosed > +FiberIsCancelled > +LuajitError > +IllegalParams > +CollationError > +SwimError > +CryptoError > +``` 3. I don't remember why all these errors were listed but it certainly makes no sense. For the RFC it does not matter what a concrete error types there are. > + > +All codes and names of ClientError class are available in box.error. > +User is able to create a new error instance of predefined type using > +box.error.new() function. For example: > +``` > +tarantool> t = box.error.new(box.error.CREATE_SPACE, "myspace", "just cause") > +tarantool> t:unpack() > +--- > +- type: ClientError > + code: 9 > + message: 'Failed to create space ''myspace'': just because' 4. Unpack is able to even unpack individual words! cause -> because, wow! (joke time). > + trace: > + - file: '[string "t = box.error.new(box.error.CREATE_SPACE, "my..."]' > + line: 1 > +``` > + > +User is also capable of defining own errors with any code by means of: > +``` > +box.error.new({code = user_code, reason = user_error_msg}) > +``` > +For instance: > +``` > +e = box.error.new({code = 500, reason = 'just cause'}) > +``` > + > +Error cdata object has `:unpack()`, `:raise()`, `:match(...)`, `:__serialize()` > +methods and `.type`, `.message` and `.trace` fields. > + 5. 'Background and motivation' section is finished, but no a word about motivation. > +## Proposal > + > +In some cases a diagnostic information should be more complicated than > +one last raised error. Consider following example: persistent Lua function > +referenced by functional index has a bug in it's definition, Lua handler sets > +an diag message. Then functional index extractor code setups an own, more > +specialized error. Without stacked diagnostic area, only last error is > +delivired to user. One way to deal with this problem is to introduce stack > +accumulating all errors happened during request processing. > + > +### C API > + > +Let's keep existent `diag_set()` method as is. It is supposed to replace the > +last error in diagnostic area with a new one. To add new error at the top of > +existing one, let's introduce new method `diag_add()`. It is assumed to keep > +an existent error message in diagnostic area (if any) and sets it as a reason 6. Reason is a different term. It is an error message of the current error. And that name 'reason' is already occupied in box.error.new() function. Besides, below you call it 'prev', previous error. I propose to keep the terminology the same everywhere. 'Reason' for error message, 'previous' for an error happened before a new one, and linked with it. No mixing of these terms. > +error for a recently-constructed error object. Note that `diag_set()` is not > +going to preserve pointer to previous error which is held in error to be > +substituted. To illustrate last point consider example:> + > +``` > +0. Errors: > +1. diag_set(code = 1) > +Errors: NULL> > +2. diag_add(code = 2) > +Errors: e2(code = 2) -> NULL> > +3. diag_set(code = 3) > +Errors: NULL> > +``` > + > +Hence, developer takes responsibility of placing `diag_set()` where the most > +basic error should be raised. For instance, if during request processing > +`diag_add()` is called before `diag_set()` then it will result in inheritance > +of all errors from previous error raise:> + > +``` > +-- Processing of request #1 > +1. diag_set(code = 1) > +Errors: NULL> > +2. diag_add(code = 2) > +Errors: e2(code = 2) -> NULL> > +-- End of execution > + > +-- Processing of request #2 > +1. diag_add(code = 1) > +Errors: e2(code = 2) -> e3(code = 1) -> NULL> > +-- End of execution > +``` > + > +As a result, at the end of execution fo second request, three errors in 7. fo -> of. > +stack are reported instead of one. > + > +Another way to resolve this issue is to erase diagnostic area before > +request processing. However, it breaks current user-visible behaviour > +since box.error.last() will preserve last occurred error only until execution > +of the next request. > + 8. I think users would rather be surprised if a fiber would have not empty diag when a request is just started. Just like it was recently with fiber.storage. In case I misunderstood what do you mean as a request - in my terminology 'request' is either a manually created fiber, which dies and clears error.last anyway after its function is finished, or it is a pooled fiber, which doesn't die, but still should not leak anything between requests. So now we have a not-consistent behaviour. It not only *can* be fixed, but *must* be fixed. > +The diagnostic area (now) contains (nothing but) pointer to the top error: > +``` > +struct diag { > + struct error *last; > +}; > + > +``` > + > +To organize errors in a list let's extend error structure with pointer to > +the previous element. Or alternatively, add member of any data structure > +providing list properties (struct rlist, struct stailq or whatever): > +``` > +struct diag { > + struct stailq *errors; > +}; > + > +struct error { > + ... > + struct stailq_entry *in_errors; > +}; > +``` > + > + > +### Lua API > + > +Tarantool returns a last-set (diag::last) error as `cdata` object from central > +diagnostic area to Lua in case of error. User should be unable to modify it > +(since it is considered to be a bad practice - in fact object doesn't belong > +to user). On the other hand, user needs an ability to inspect a collected > +diagnostic information. Hence, let's extend the `box.error` API with a function > +which provides the way to get the previous error (reason): `:prev()` (and > +correspondingly `.prev` field). > + > +``` > +-- Return a reason error object for given error > +-- (when exists, nil otherwise). > +box.error.prev(error) == error.prev > +``` > + > +Furthermore, let's extend signature of `box.error.new()` with new (optional) > +argument - the 'reason' parent error object: 9. 'Reason' is already occupied by a error message. And below you use 'prev' name for a parent error. > + > +``` > +e1 = box.error.new({code = 111, reason = "just cause"}) > +e2 = box.error.new({code = 222, reason = "just cause x2", prev = e1}) > +``` 10. Are we not going to allow to link two existing errors? I imagine it could be simpler and more flexible for a user, than filling one big map in error.new(). > + > +### Binary protocol > + > +Currently errors are sent as `(IPROTO_ERROR | errcode)` response with an > +string message containing error details as a payload. There are not so many > +options to extend current protocol wihtout breaking backward compatibility > +(which is obviously one of implementation requirements). One way is to extend > +existent binary protocol with a new key IPROTO_ERROR_STACK (or > +IPROTO_ERROR_REASON or simply IPROTO_ERROR_V2): > +``` > +{ > + // backward compatibility > + IPROTO_ERROR: "the most recent error message", > + // modern error message > + IPROTO_ERROR_STACK: { > + { > + // the most recent error object > + IPROTO_ERROR_CODE: error_code_number, > + IPROTO_ERROR_REASON: error_reason_string, > + }, > + ... > + { > + // the oldest (reason) error object > + }, > + } > +} > +``` 11. The names look good. Except that maybe change IPROTO_ERROR_REASON -> IPROTO_ERROR_MESSAGE. I don't know why it is called reason everywhere in our public API. In struct error it is message, not reason. And it just looks more correct, IMO. 12. Another option - send multiple IPROTO_ERROR's. In MP_MAP you can put multiple keys with the same value, and even keep a strict order. So we could send IPROTO_ERROR: ..., IPROTO_ERROR: ..., IPROTO_ERROR: ... For each message in the stack. But that looks like an illegal hack, and also does not allow to fix this thing alongside: https://github.com/tarantool/tarantool/issues/3399 So a proper stack looks better, like you proposed. > + > +IPROTO_ERROR is always sent (as in older versions) in case of error. > +IPROTO_ERROR_STACK is presented in response only if there's at least two > +elements in diagnostic list. Map which contains error stack can be optimized > +in terms of space, so that avoid including error which is already encoded > +in IPROTO_ERROR. > 13. I don't think it is worth saving a few bytes for the cold path of error reporting. Lets always send IPROTO_ERROR with a last error like now *and* IPROTO_ERROR_STACK. Even if they contain the same error. It will make much easier to remove IPROTO_ERROR later. Otherwise we will need to carry them both forever, if IPROTO_ERROR_STACK would depend on IPROTO_ERROR.