From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 382C4469719 for ; Sun, 23 Feb 2020 20:43:56 +0300 (MSK) References: From: Vladislav Shpilevoy Message-ID: Date: Sun, 23 Feb 2020 18:43:53 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 4/7] box: introduce stacked diagnostic area List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik , tarantool-patches@dev.tarantool.org Hi! Thanks for the patch, welcome to the first review iteration! Finally something good, not related to SQL/replication/application server modules. See 21 comments below tho. > box: introduce stacked diagnostic area > > In terms of implementation, now struct error objects can be organized > into double-linked lists. To achieve this pointers to the next and > previous elements have been added to struct error. It is worth > mentioning that already existing rlist and stailq list implementations > are not suitable: rlist is cycled list, as a result it is impossible to > start iteration over the list from random list entry and finish it at > the logical end of the list; stailq is single-linked list leaving no > possibility to remove elements from the middle of the list. > > As a part of C interface, box_error_add() has been introduced. In > contrast to box_error_set() it does not replace last raised error, but > instead it adds error to the list of diagnostic errors having already > been set. If error is to be deleted (its reference counter hits 0 value) > it is unlinked from the list it belongs to and destroyed. Meanwhile, > error destruction leads to decrement of reference counter of its > previous error and so on. > > To organize errors into lists in Lua, table representing error object in > Lua now has .prev field (corresponding to 'previous' error) and method > :set_prev(e). The latter accepts error object (i.e. created via > box.error.new() or box.error.last()) and nil value. Both field .prev and > :set_prev() method are implemented as ffi functions. Also note that > cycles are now allowed while organizing errors into lists: 1. Now -> not. > e1 -> e2 -> e3; e3:set_prev(e1) -- would lead to error. > > Part of #1148 > > diff --git a/extra/exports b/extra/exports > index 7b84a1452..94cbdd210 100644 > --- a/extra/exports > +++ b/extra/exports > @@ -246,6 +246,8 @@ clock_monotonic64 > clock_process64 > clock_thread64 > string_strip_helper > +error_prev > +error_set_prev 2. I would move this to other exported error functions, above. > > # Lua / LuaJIT > > diff --git a/src/box/key_list.c b/src/box/key_list.c > index 3d736b55f..a766ce0ec 100644 > --- a/src/box/key_list.c > +++ b/src/box/key_list.c > @@ -63,9 +63,9 @@ key_list_iterator_create(struct key_list_iterator *it, struct tuple *tuple, > if (rc != 0) { > /* Can't evaluate function. */ > struct space *space = space_by_id(index_def->space_id); > - diag_set(ClientError, ER_FUNC_INDEX_FUNC, index_def->name, > - space ? space_name(space) : "", > - diag_last_error(diag_get())->errmsg); > + diag_add(ClientError, ER_FUNC_INDEX_FUNC, index_def->name, > + space != NULL ? space_name(space) : "", > + "can't evaluate function"); > return -1; 3. I would split stack diag introduction and its appliance into separate commits. But up to you, it is not so critical here. > } > uint32_t key_data_sz; > diff --git a/src/lib/core/diag.c b/src/lib/core/diag.c > index c350abb4a..17d3b0d7b 100644 > --- a/src/lib/core/diag.c > +++ b/src/lib/core/diag.c > @@ -34,6 +34,56 @@ > /* Must be set by the library user */ > struct error_factory *error_factory = NULL; > > +struct error * > +error_prev(struct error *e) > +{ > + assert(e != NULL); > + return e->next; 4. error_prev() returns next? We should do something with that. It is very counter-intuitive. Maybe just change next <-> prev names in struct error. Besides, you don't need this function, especially in extern, because in Lua you can access ._next/._prev members right away, transparently, like a table member, thanks to FFI. > +} > + > +int > +error_set_prev(struct error *e, struct error *prev) > +{ > + /* > + * Make sure that adding error won't result in cycles. > + * Don't bother with sophisticated cycle-detection > + * algorithms, simple iteration is OK since as a rule > + * list contains a dozen errors at maximum. > + */ > + struct error *tmp = prev; > + while (tmp != NULL) { > + if (tmp == e) > + return -1; > + tmp = tmp->next; > + } 5. I would turn this into an assertion here, and extract into a separate function, which would be called only when an error is added from some public function like Lua API, or C public API. Up to you, this also looks ok. > + /* > + * At once error can be reason for only one error. > + * So unlink previous 'prev' node. > + * > + * +--------+ NEXT +--------+ > + * | e | ---> |old prev| > + * +--------+ +--------+ > + * ^ | > + * | PREV | > + * +-------X-------+ > + * > + */ 6. Nice picture, but it confused me a little. I don't even know why. Perhaps because it is really simple code, and the picture made me think, that there is something non trivial. I would drop it, up to you. > + if (e->next != NULL) > + e->next->prev = NULL; > + /* Set new 'prev' node. */ > + e->next = prev; 7. You don't unref old e->next value. As a result, there is a leak (maybe, if I correctly understand, that 'next' is the reference counter keeper). A test: e1 = box.error.new({code = 0, reason = 'e1'}) e2 = box.error.new({code = 0, reason = 'e2'}) e1:set_prev(e2) e3 = box.error.new({code = 0, reason = 'e3'}) e1:set_prev(e3) I added printf() to error_create() and before a call of error->destroy. Then I nullified the references, and called GC. Only e1 and e3 were destroyed. e1 = nil e2 = nil e3 = nil collectgarbage('collect') Output of all of this was: error created 0x102a01e68 # e1 created error created 0x102b93f68 # e2 created error created 0x102e2ab08 # e3 created error destroyed 0x102e2ab08 # e3 destroyed error destroyed 0x102a01e68 # e1 destroyed The bug is easy to fix. But don't know how to test it. The trick with setmetatable({}, {__mode = 'v'}) does not work, because Lua object e2 is deleted fine. C object is still alive. Another option - use error injections like I did with ERRINJ_DYN_MODULE_COUNT. > + /* > + * Unlink new 'prev' node from its old stack. > + * nil can be also passed as an argument. > + */ 8. I didn't understand the comment, because this code works fine: e1 = box.error.new({code = 0, reason = 'e1'}) e2 = box.error.new({code = 0, reason = 'e2'}) e1:set_prev(e2) e3 = box.error.new({code = 0, reason = 'e3'}) e4 = box.error.new({code = 0, reason = 'e4'}) e3:set_prev(e4) e2:set_prev(e3) >From the comment it looks like e2:set_prev(e3) should detach e3 from its old stack, so a result would look like: e1->e2->e3 e4 But really it looks like: e1->e2->e3->e4 And this is good. So the comment is misleading. You probably meant unlink it from its old 'newer' errors. > + if (prev != NULL) { > + error_unlink_tail(prev); > + prev->prev = e; > + error_ref(prev); > + } > + return 0; > +} > + > void > error_create(struct error *e, > error_f destroy, error_f raise, error_f log, > diff --git a/src/lib/core/diag.h b/src/lib/core/diag.h > index 7e1e1a174..fc7996c25 100644 > --- a/src/lib/core/diag.h > +++ b/src/lib/core/diag.h > @@ -37,6 +37,7 @@ > #include > #include > #include "say.h" > +#include "small/rlist.h" 9. An artifact from the time when you wanted to use rlist? > #if defined(__cplusplus) > extern "C" { > @@ -84,6 +85,17 @@ struct error { > char file[DIAG_FILENAME_MAX]; > /* Error description. */ > char errmsg[DIAG_ERRMSG_MAX]; > + /** > + * Link to the next and previous errors. > + * RLIST implementation is not really suitable here > + * since it is organized as circular list. In such > + * a case it is impossible to start an iteration > + * from any node and finish at the logical end of the > + * list. Double-linked list is required to allow deletion > + * from the middle of the list. > + */ > + struct error *next; > + struct error *prev; 10. Please, add an example here, and say explicitly, which error is newer, which is older. This would help a lot to read the code. Also we can make it more straight, and rename next->older, prev->newer. Or rename next->reason, prev->conseq/outcome. Don't know. But currently it is really hard to read, when you want to operate on a previous error, but use the 'next' member. Even if we swap next and prev, it is still unclear what is 'tail', what is 'reason'. I would go for reason and conseq/outcome. > }; > > static inline void > @@ -97,10 +109,60 @@ error_unref(struct error *e) > { > assert(e->refs > 0); > --e->refs; > - if (e->refs == 0) > + if (e->refs == 0) { > + /* Unlink error from the list completely. */ > + if (e->prev != NULL) > + e->prev->next = e->next; > + if (e->next != NULL) { > + e->next->prev = e->prev; > + error_unref(e->next); > + } 11. Only 'next' link keeps a reference, right? It is worth commenting in struct error. And why too (I guess because otherwise two errors linking each other as prev and next would never be deleted). > + e->next = NULL; > + e->prev = NULL; > e->destroy(e); > + } > +} 12. I think you know it was coming :D head = box.error.new({code = 0, reason = 'r'}) tail = head for i = 1, 100000 do local next = box.error.new({code = 0, reason = 'r'}) tail:set_prev(next) tail = next end tail = nil head = nil Stupid recursion test. tarantool> collectgarbage('collect') Process 58860 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x106101ff8) frame #0: 0x00000001001772e3 tarantool`error_unref(e=0x0000000106ee6d78) at diag.h:118:4 115 e->prev->next = e->next; 116 if (e->next != NULL) { 117 e->next->prev = e->prev; -> 118 error_unref(e->next); 119 } 120 e->next = NULL; 121 e->prev = NULL; Target 0: (tarantool) stopped. We need to either limit error stack size, or remove the recursion. Recursion removal should be simple. First you iterate through the list and check which errors are going to be deleted (have ref count 1). When you see a ref count 2 first time, you stop. Now you start their deletion from tail. So first you delete the error, which has prev error with ref count 2. It won't start a recursion, because prev error is not deleted. Then you delete next error. It won't start recursion, because its prev error is already null. And so on. Example: errors: e1 -> e2 -> e3 -> e4 -> e5 -> refs: 1 1 1 1 2 ... Assume you want to delete e1. Currently you delete it as recursion e1 deletes e2 deletes e3 deletes e4. With my algo above you find e4. Now you know e1 - e4 should be deleted. Then you go in a second while cycle from e4 back to e1. Each deletion won't trigger a recursion, because prev error always will be null or something with ref count 2. Also you could do it in one cycle probably. Then you would need a temporary ref. For example, when you delete e1, you give ++ref to its prev error. After e1 is deleted, you check whether --ref to the prev error would delete it too. If yes, you repeat it again - ++ to prev prev error, delete prev, check prev prev deletion, and so on. It also removes the recursion. Huge number of errors easily may happen, if a user had a bug in his code about an infinite cycle adding errors, or something like that. Also huge number of errors may affect the whole tx thread each time when we send an error to iproto, or delete it. Because we will iterate without any yields in a huge list. Don't know what to do with that. Probably nothing for now. > + > +/** > + * Unlink error from any error which point to it. For instance: > + * e1 -> e2 -> e3 -> e4 (e1:set_prev(e2); e2:set_prev(33) ...) 13. 33 -> e3. > + * unlink(e3): e1 -> e2 -> NULL; e3 -> e4 -> NULL > + */ > +static inline void > +error_unlink_tail(struct error *e) > +{ > + if (e->prev != NULL) { > + assert(e->refs > 1); > + error_unref(e); > + e->prev->next = NULL; > + } > + e->prev = NULL; > } > > +/** > + * Return previous (for given error) error. Result can be NULL > + * which means that there's no previous error. Simple getter > + * to be used as ffi method in lua/error.lua. > + */ > +struct error * > +error_prev(struct error *e); > + > +/** > + * Set previous error: remove @a prev from its current stack and 14. As I see from the code - not from its stack, but cut its previous 'tail' of newer errors. Right? > + * link to the one @a e belongs to. Note that all previous errors > + * starting from @a prev->next are transferred with it as well > + * (i.e. reasons for given error are not erased). For instance: > + * e1 -> e2 -> NULL; e3 -> e4 -> NULL; > + * e2:set_prev(e3): e1 -> e2 -> e3 -> e4 -> NULL > + * > + * @a prev can be NULL. To be used as ffi method in lua/error.lua. > + * > + * @retval -1 in case adding @a prev results in list cycles; > + * 0 otherwise. > + */ > +int > +error_set_prev(struct error *e, struct error *prev); > + > NORETURN static inline void > error_raise(struct error *e) > { > @@ -178,6 +240,25 @@ diag_set_error(struct diag *diag, struct error *e) > assert(e != NULL); > error_ref(e); > diag_clear(diag); > + error_unlink_tail(e); > + diag->last = e; > +} > + > +/** > + * Add a new error to the diagnostics area. It is added to the > + * tail, so that list forms stack. > + * @param diag Diagnostics area. > + * @param e Error to be added. > + */ > +static inline void > +diag_add_error(struct diag *diag, struct error *e) > +{ > + assert(e != NULL); > + error_ref(e); > + error_unlink_tail(e); > + e->next = diag->last; 15. I thought, that 'next' member is responsible for keeping a reference, but since you don't ref 'diag->last' here, I am not sure now. Could you please clarify? > + if (diag->last != NULL) > + diag->last->prev = e; > diag->last = e; > } > > diff --git a/src/lua/error.lua b/src/lua/error.lua > index 7f249864a..222b5e273 100644 > --- a/src/lua/error.lua > +++ b/src/lua/error.lua > @@ -11,6 +11,11 @@ enum { > > typedef void (*error_f)(struct error *e); > > +struct rlist { > + struct rlist *prev; > + struct rlist *next; > +}; 16. Nope. > + > struct error { > error_f _destroy; > error_f _raise; > @@ -95,11 +108,37 @@ local function error_errno(err) > return e > end > > +local function error_prev(err) > + local e = ffi.C.error_prev(err); > + if e ~= nil then > + return e > + else > + return nil > + end 17. I said it above, but just in case repeat it here: you can directly return err._next, because FFI allows to access struct members in Lua. Strictly speaking you can do the same below for set_prev, but it is not so trivial, and it is right to keep it in C and extern. > +end > + > +local function error_set_prev(err, prev) > + -- First argument must be error. > + if not ffi.istype('struct error', err) then > + error("Usage: error1:set_prev(error2)") > + end > + -- Second argument must be error or nil. > + if not ffi.istype('struct error', prev) and prev ~= nil then > + error("Usage: error1:set_prev(error2)") > + end > + local ok = tonumber(ffi.C.error_set_prev(err, prev)); 18. Why do you need tonumber? 'int' should be fine to compare with 0. > + if ok ~= 0 then > + error("Cycles are not allowed") > + end > + > +end > + > diff --git a/test/box/misc.result b/test/box/misc.result > index b0a81a055..1e235e8a1 100644 > --- a/test/box/misc.result > +++ b/test/box/misc.result > @@ -352,6 +352,282 @@ box.error.clear() > box.error() > --- > ... > +-- gh-1148: erros can be arranged into list (so called 19. erros -> errors. If you work in Sublime, it has nice spell checker. 20. I don't like that we have error module tests in misc.test.lua. Especially when there is already so many of them. Could you please move them all (including old tests) into a new file error.test.lua? > +-- stacked diagnostics). > +-- > +e1 = box.error.new({code = 111, reason = "cause"}) > +--- > +... > +assert(e1.prev == nil) > +--- > +- true > +... > +e1:set_prev(e1) > +--- > +- error: 'builtin/error.lua: Cycles are not allowed' > +... > +assert(e1.prev == nil) > +--- > +- true > +... > +e2 = box.error.new({code = 111, reason = "cause of cause"}) > +--- > +... > +e1:set_prev(e2) > +--- > +... > +assert(e1.prev == e2) > +--- > +- true > +... > +e2:set_prev(e1) > +--- > +- error: 'builtin/error.lua: Cycles are not allowed' > +... > +assert(e2.prev == nil) > +--- > +- true > +... > +-- At this point stack is following: e1 -> e2 > +-- Let's test following cases: > +-- 1. e3 -> e2, e1 -> NULL (e3:set_prev(e2)) > +-- 2. e1 -> e3, e2 -> NULL (e1:set_prev(e3)) > +-- 3. e3 -> e1 -> e2 (e3:set_prev(e1)) > +-- 4. e1 -> e2 -> e3 (e2:set_prev(e3)) > +-- > +e3 = box.error.new({code = 111, reason = "another cause"}) > +--- > +... > +e3:set_prev(e2) > +--- > +... > +assert(e3.prev == e2) 21. It is not really necessary to use assertion here. It will fail without assertion too in case of a problem. Assertion is needed, when you want an exception. For example, when you call a function, inside which you do tests. Instead of an assertion it is better to write something like that: e3.prev == e2 or {e3, e3.prev, e2} In that case, if the == is false, you will see not only a false in .reject file, but also actual values of e3, e3.prev, and e2. This is especially helpful, when error is not stable, and in Travis/Gitlab, because you see values right in the web console.