[Tarantool-patches] [PATCH 4/7] box: introduce stacked diagnostic area

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Feb 23 20:43:53 MSK 2020


Hi! Thanks for the patch, welcome to the first review iteration!

Finally something good, not related to SQL/replication/application server
modules.

See 21 comments below tho.

>     box: introduce stacked diagnostic area
>     
>     In terms of implementation, now struct error objects can be organized
>     into double-linked lists. To achieve this pointers to the next and
>     previous elements have been added to struct error. It is worth
>     mentioning that already existing rlist and stailq list implementations
>     are not suitable: rlist is cycled list, as a result it is impossible to
>     start iteration over the list from random list entry and finish it at
>     the logical end of the list; stailq is single-linked list leaving no
>     possibility to remove elements from the middle of the list.
>     
>     As a part of C interface, box_error_add() has been introduced. In
>     contrast to box_error_set() it does not replace last raised error, but
>     instead it adds error to the list of diagnostic errors having already
>     been set. If error is to be deleted (its reference counter hits 0 value)
>     it is unlinked from the list it belongs to and destroyed. Meanwhile,
>     error destruction leads to decrement of reference counter of its
>     previous error and so on.
>     
>     To organize errors into lists in Lua, table representing error object in
>     Lua now has .prev field (corresponding to 'previous' error) and method
>     :set_prev(e). The latter accepts error object (i.e. created via
>     box.error.new() or box.error.last()) and nil value. Both field .prev and
>     :set_prev() method are implemented as ffi functions. Also note that
>     cycles are now allowed while organizing errors into lists:

1. Now -> not.

>     e1 -> e2 -> e3; e3:set_prev(e1) -- would lead to error.
>     
>     Part of #1148
> 
> diff --git a/extra/exports b/extra/exports
> index 7b84a1452..94cbdd210 100644
> --- a/extra/exports
> +++ b/extra/exports
> @@ -246,6 +246,8 @@ clock_monotonic64
>  clock_process64
>  clock_thread64
>  string_strip_helper
> +error_prev
> +error_set_prev

2. I would move this to other exported error functions, above.

>  
>  # Lua / LuaJIT
>  
> diff --git a/src/box/key_list.c b/src/box/key_list.c
> index 3d736b55f..a766ce0ec 100644
> --- a/src/box/key_list.c
> +++ b/src/box/key_list.c
> @@ -63,9 +63,9 @@ key_list_iterator_create(struct key_list_iterator *it, struct tuple *tuple,
>  	if (rc != 0) {
>  		/* Can't evaluate function. */
>  		struct space *space = space_by_id(index_def->space_id);
> -		diag_set(ClientError, ER_FUNC_INDEX_FUNC, index_def->name,
> -			 space ? space_name(space) : "",
> -			 diag_last_error(diag_get())->errmsg);
> +		diag_add(ClientError, ER_FUNC_INDEX_FUNC, index_def->name,
> +			 space != NULL ? space_name(space) : "",
> +			 "can't evaluate function");
>  		return -1;

3. I would split stack diag introduction and its appliance into
separate commits. But up to you, it is not so critical here.

>  	}
>  	uint32_t key_data_sz;
> diff --git a/src/lib/core/diag.c b/src/lib/core/diag.c
> index c350abb4a..17d3b0d7b 100644
> --- a/src/lib/core/diag.c
> +++ b/src/lib/core/diag.c
> @@ -34,6 +34,56 @@
>  /* Must be set by the library user */
>  struct error_factory *error_factory = NULL;
>  
> +struct error *
> +error_prev(struct error *e)
> +{
> +	assert(e != NULL);
> +	return e->next;

4. error_prev() returns next? We should do something
with that. It is very counter-intuitive. Maybe just
change next <-> prev names in struct error.

Besides, you don't need this function, especially in
extern, because in Lua you can access ._next/._prev members
right away, transparently, like a table member, thanks to
FFI.

> +}
> +
> +int
> +error_set_prev(struct error *e, struct error *prev)
> +{
> +	/*
> +	 * Make sure that adding error won't result in cycles.
> +	 * Don't bother with sophisticated cycle-detection
> +	 * algorithms, simple iteration is OK since as a rule
> +	 * list contains a dozen errors at maximum.
> +	 */
> +	struct error *tmp = prev;
> +	while (tmp != NULL) {
> +		if (tmp == e)
> +			return -1;
> +		tmp = tmp->next;
> +	}

5. I would turn this into an assertion here, and extract into a
separate function, which would be called only when an error is
added from some public function like Lua API, or C public API.
Up to you, this also looks ok.

> +	/*
> +	 * At once error can be reason for only one error.
> +	 * So unlink previous 'prev' node.
> +	 *
> +	 * +--------+ NEXT +--------+
> +	 * |    e   | ---> |old prev|
> +	 * +--------+      +--------+
> +	 *     ^               |
> +	 *     |      PREV     |
> +	 *     +-------X-------+
> +	 *
> +	 */

6. Nice picture, but it confused me a little. I don't even know
why. Perhaps because it is really simple code, and the picture
made me think, that there is something non trivial. I would drop
it, up to you.

> +	if (e->next != NULL)
> +		e->next->prev = NULL;
> +	/* Set new 'prev' node. */
> +	e->next = prev;

7. You don't unref old e->next value. As a result, there is a leak
(maybe, if I correctly understand, that 'next' is the reference
counter keeper). A test:

	e1 = box.error.new({code = 0, reason = 'e1'})
	e2 = box.error.new({code = 0, reason = 'e2'})
	e1:set_prev(e2)

	e3 = box.error.new({code = 0, reason = 'e3'})
	e1:set_prev(e3)

I added printf() to error_create() and before a call of
error->destroy. Then I nullified the references, and called GC.
Only e1 and e3 were destroyed.

	e1 = nil
	e2 = nil
	e3 = nil
	collectgarbage('collect')

Output of all of this was:

	error created 0x102a01e68 # e1 created
	error created 0x102b93f68 # e2 created
	error created 0x102e2ab08 # e3 created
	error destroyed 0x102e2ab08 # e3 destroyed
	error destroyed 0x102a01e68 # e1 destroyed

The bug is easy to fix. But don't know how to test it. The trick
with setmetatable({}, {__mode = 'v'}) does not work, because Lua
object e2 is deleted fine. C object is still alive. Another
option - use error injections like I did with ERRINJ_DYN_MODULE_COUNT.

> +	/*
> +	 * Unlink new 'prev' node from its old stack.
> +	 * nil can be also passed as an argument.
> +	 */

8. I didn't understand the comment, because this code works
fine:

	e1 = box.error.new({code = 0, reason = 'e1'})
	e2 = box.error.new({code = 0, reason = 'e2'})
	e1:set_prev(e2)

	e3 = box.error.new({code = 0, reason = 'e3'})
	e4 = box.error.new({code = 0, reason = 'e4'})
	e3:set_prev(e4)

	e2:set_prev(e3)

>From the comment it looks like e2:set_prev(e3) should
detach e3 from its old stack, so a result would look
like:

	e1->e2->e3
        e4

But really it looks like:

	e1->e2->e3->e4

And this is good. So the comment is misleading. You probably
meant unlink it from its old 'newer' errors.

> +	if (prev != NULL) {
> +		error_unlink_tail(prev);
> +		prev->prev = e;
> +		error_ref(prev);
> +	}
> +	return 0;
> +}
> +
>  void
>  error_create(struct error *e,
>  	     error_f destroy, error_f raise, error_f log,
> diff --git a/src/lib/core/diag.h b/src/lib/core/diag.h
> index 7e1e1a174..fc7996c25 100644
> --- a/src/lib/core/diag.h
> +++ b/src/lib/core/diag.h
> @@ -37,6 +37,7 @@
>  #include <stdbool.h>
>  #include <assert.h>
>  #include "say.h"
> +#include "small/rlist.h"

9. An artifact from the time when you wanted to use rlist?

>  #if defined(__cplusplus)
>  extern "C" {
> @@ -84,6 +85,17 @@ struct error {
>  	char file[DIAG_FILENAME_MAX];
>  	/* Error description. */
>  	char errmsg[DIAG_ERRMSG_MAX];
> +	/**
> +	 * Link to the next and previous errors.
> +	 * RLIST implementation is not really suitable here
> +	 * since it is organized as circular list. In such
> +	 * a case it is impossible to start an iteration
> +	 * from any node and finish at the logical end of the
> +	 * list. Double-linked list is required to allow deletion
> +	 * from the middle of the list.
> +	 */
> +	struct error *next;
> +	struct error *prev;

10. Please, add an example here, and say explicitly, which error
is newer, which is older. This would help a lot to read the
code.

Also we can make it more straight, and rename next->older, prev->newer.
Or rename next->reason, prev->conseq/outcome. Don't know. But currently
it is really hard to read, when you want to operate on a previous error,
but use the 'next' member. Even if we swap next and prev, it is still
unclear what is 'tail', what is 'reason'. I would go for reason and
conseq/outcome.

>  };
>  
>  static inline void
> @@ -97,10 +109,60 @@ error_unref(struct error *e)
>  {
>  	assert(e->refs > 0);
>  	--e->refs;
> -	if (e->refs == 0)
> +	if (e->refs == 0) {
> +		/* Unlink error from the list completely. */
> +		if (e->prev != NULL)
> +			e->prev->next = e->next;
> +		if (e->next != NULL) {
> +			e->next->prev = e->prev;
> +			error_unref(e->next);
> +		}

11. Only 'next' link keeps a reference, right? It is worth
commenting in struct error. And why too (I guess because otherwise
two errors linking each other as prev and next would never be
deleted).

> +		e->next = NULL;
> +		e->prev = NULL;
>  		e->destroy(e);
> +	}
> +}

12. I think you know it was coming :D

	head = box.error.new({code = 0, reason = 'r'})
	tail = head
	for i = 1, 100000 do
	    local next = box.error.new({code = 0, reason = 'r'})
	    tail:set_prev(next)
	    tail = next
	end
	tail = nil
	head = nil

Stupid recursion test.

    tarantool> collectgarbage('collect')
    Process 58860 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x106101ff8)
        frame #0: 0x00000001001772e3 tarantool`error_unref(e=0x0000000106ee6d78) at diag.h:118:4
       115              e->prev->next = e->next;
       116          if (e->next != NULL) {
       117              e->next->prev = e->prev;
    -> 118              error_unref(e->next);
       119          }
       120          e->next = NULL;
       121          e->prev = NULL;
    Target 0: (tarantool) stopped.

We need to either limit error stack size, or remove the recursion.
Recursion removal should be simple. First you iterate through the
list and check which errors are going to be deleted (have ref
count 1). When you see a ref count 2 first time, you stop. Now
you start their deletion from tail. So first you delete the error,
which has prev error with ref count 2. It won't start a recursion,
because prev error is not deleted. Then you delete next error. It
won't start recursion, because its prev error is already null. And
so on.

Example:

    errors: e1 -> e2 -> e3 -> e4 -> e5 ->
    refs:   1     1     1     1     2     ...

Assume you want to delete e1. Currently you delete it
as recursion e1 deletes e2 deletes e3 deletes e4. With
my algo above you find e4. Now you know e1 - e4 should
be deleted.

Then you go in a second while cycle from e4 back to e1.
Each deletion won't trigger a recursion, because prev error
always will be null or something with ref count 2.

Also you could do it in one cycle probably. Then you would
need a temporary ref. For example, when you delete e1, you
give ++ref to its prev error. After e1 is deleted, you check
whether --ref to the prev error would delete it too. If yes,
you repeat it again - ++ to prev prev error, delete prev,
check prev prev deletion, and so on. It also removes the
recursion.

Huge number of errors easily may happen, if a user had a bug
in his code about an infinite cycle adding errors, or something
like that.

Also huge number of errors may affect the whole tx thread
each time when we send an error to iproto, or delete it. Because
we will iterate without any yields in a huge list. Don't know
what to do with that. Probably nothing for now.

> +
> +/**
> + * Unlink error from any error which point to it. For instance:
> + * e1 -> e2 -> e3 -> e4  (e1:set_prev(e2); e2:set_prev(33) ...)

13. 33 -> e3.

> + * unlink(e3): e1 -> e2 -> NULL; e3 -> e4 -> NULL
> + */
> +static inline void
> +error_unlink_tail(struct error *e)
> +{
> +	if (e->prev != NULL) {
> +		assert(e->refs > 1);
> +		error_unref(e);
> +		e->prev->next = NULL;
> +	}
> +	e->prev = NULL;
>  }
>  
> +/**
> + * Return previous (for given error) error. Result can be NULL
> + * which means that there's no previous error. Simple getter
> + * to be used as ffi method in lua/error.lua.
> + */
> +struct error *
> +error_prev(struct error *e);
> +
> +/**
> + * Set previous error: remove @a prev from its current stack and

14. As I see from the code - not from its stack, but cut its previous
'tail' of newer errors. Right?

> + * link to the one @a e belongs to. Note that all previous errors
> + * starting from @a prev->next are transferred with it as well
> + * (i.e. reasons for given error are not erased). For instance:
> + * e1 -> e2 -> NULL; e3 -> e4 -> NULL;
> + * e2:set_prev(e3): e1 -> e2 -> e3 -> e4 -> NULL
> + *
> + * @a prev can be  NULL. To be used as ffi method in lua/error.lua.
> + *
> + * @retval -1 in case adding @a prev results in list cycles;
> + *          0 otherwise.
> + */
> +int
> +error_set_prev(struct error *e, struct error *prev);
> +
>  NORETURN static inline void
>  error_raise(struct error *e)
>  {
> @@ -178,6 +240,25 @@ diag_set_error(struct diag *diag, struct error *e)
>  	assert(e != NULL);
>  	error_ref(e);
>  	diag_clear(diag);
> +	error_unlink_tail(e);
> +	diag->last = e;
> +}
> +
> +/**
> + * Add a new error to the diagnostics area. It is added to the
> + * tail, so that list forms stack.
> + * @param diag Diagnostics area.
> + * @param e Error to be added.
> + */
> +static inline void
> +diag_add_error(struct diag *diag, struct error *e)
> +{
> +	assert(e != NULL);
> +	error_ref(e);
> +	error_unlink_tail(e);
> +	e->next = diag->last;

15. I thought, that 'next' member is responsible for keeping a
reference, but since you don't ref 'diag->last' here, I am not
sure now. Could you please clarify?

> +	if (diag->last != NULL)
> +		diag->last->prev = e;
>  	diag->last = e;
>  }
>  
> diff --git a/src/lua/error.lua b/src/lua/error.lua
> index 7f249864a..222b5e273 100644
> --- a/src/lua/error.lua
> +++ b/src/lua/error.lua
> @@ -11,6 +11,11 @@ enum {
>  
>  typedef void (*error_f)(struct error *e);
>  
> +struct rlist {
> +   struct rlist *prev;
> +   struct rlist *next;
> +};

16. Nope.

> +
>  struct error {
>      error_f _destroy;
>      error_f _raise;
> @@ -95,11 +108,37 @@ local function error_errno(err)
>      return e
>  end
>  
> +local function error_prev(err)
> +    local e = ffi.C.error_prev(err);
> +    if e ~= nil then
> +        return e
> +    else
> +        return nil
> +    end

17. I said it above, but just in case repeat it here: you can
directly return err._next, because FFI allows to access struct
members in Lua. Strictly speaking you can do the same below
for set_prev, but it is not so trivial, and it is right to keep
it in C and extern.

> +end
> +
> +local function error_set_prev(err, prev)
> +    -- First argument must be error.
> +    if not ffi.istype('struct error', err) then
> +        error("Usage: error1:set_prev(error2)")
> +    end
> +    -- Second argument must be error or nil.
> +    if not ffi.istype('struct error', prev) and prev ~= nil then
> +        error("Usage: error1:set_prev(error2)")
> +    end
> +    local ok = tonumber(ffi.C.error_set_prev(err, prev));

18. Why do you need tonumber? 'int' should be fine to compare
with 0.

> +    if ok ~= 0 then
> +        error("Cycles are not allowed")
> +    end
> +
> +end
> +
> diff --git a/test/box/misc.result b/test/box/misc.result
> index b0a81a055..1e235e8a1 100644
> --- a/test/box/misc.result
> +++ b/test/box/misc.result
> @@ -352,6 +352,282 @@ box.error.clear()
>  box.error()
>  ---
>  ...
> +-- gh-1148: erros can be arranged into list (so called

19. erros -> errors. If you work in Sublime, it has nice spell
checker.

20. I don't like that we have error module tests in misc.test.lua.
Especially when there is already so many of them. Could you please
move them all (including old tests) into a new file error.test.lua?

> +-- stacked diagnostics).
> +--
> +e1 = box.error.new({code = 111, reason = "cause"})
> +---
> +...
> +assert(e1.prev == nil)
> +---
> +- true
> +...
> +e1:set_prev(e1)
> +---
> +- error: 'builtin/error.lua: Cycles are not allowed'
> +...
> +assert(e1.prev == nil)
> +---
> +- true
> +...
> +e2 = box.error.new({code = 111, reason = "cause of cause"})
> +---
> +...
> +e1:set_prev(e2)
> +---
> +...
> +assert(e1.prev == e2)
> +---
> +- true
> +...
> +e2:set_prev(e1)
> +---
> +- error: 'builtin/error.lua: Cycles are not allowed'
> +...
> +assert(e2.prev == nil)
> +---
> +- true
> +...
> +-- At this point stack is following: e1 -> e2
> +-- Let's test following cases:
> +-- 1. e3 -> e2, e1 -> NULL (e3:set_prev(e2))
> +-- 2. e1 -> e3, e2 -> NULL (e1:set_prev(e3))
> +-- 3. e3 -> e1 -> e2 (e3:set_prev(e1))
> +-- 4. e1 -> e2 -> e3 (e2:set_prev(e3))
> +--
> +e3 = box.error.new({code = 111, reason = "another cause"})
> +---
> +...
> +e3:set_prev(e2)
> +---
> +...
> +assert(e3.prev == e2)

21. It is not really necessary to use assertion here. It will fail
without assertion too in case of a problem. Assertion is needed,
when you want an exception. For example, when you call a function,
inside which you do tests.

Instead of an assertion it is better to write something like that:

    e3.prev == e2 or {e3, e3.prev, e2}

In that case, if the == is false, you will see not only a false in
.reject file, but also actual values of e3, e3.prev, and e2. This
is especially helpful, when error is not stable, and in Travis/Gitlab,
because you see values right in the web console.


More information about the Tarantool-patches mailing list