[Tarantool-patches] [PATCH] lua: lua_field_inspect_table without pushcfunction

Igor Munkin imun at tarantool.org
Tue Jun 2 03:19:48 MSK 2020


Sergey,

Thanks for the patch! Please consider my comments below.

On 18.05.20, Sergey Kaplun wrote:
> Currently on encoding table we push cfunction (lua_field_try_serialize)
> to lua stack with additional lightuserdata and table value and after
> pcall that function to avoid a raise of error.
> 
> In this case LuaJIT creates new object which will not live long time,

<lua_pushlightuserdata> just assigns a pointer to the top guest stack
slot. Yes, it might trigger stack reallocation, but no GC object is
created.

Yes, the reallocation can fail with either LUA_ERRMEM or LUA_ERRRUN, but
that error is raised outside the protected scope, and is not handled.
There are more corner cases we have already discussed with Vlad here[1].

> so it increase amount of dead object and also increase time and
> frequency of garbage collection inside LuaJIT.
> Also this pcall is necessary only in case when metafield __serialize
> of serilizable object has LUA_TFUNCTION type.

Typo: s/serilizable/serializable/.

> 
> So instead pushcfunction with pcall we can directly call the function
> trying to serialize an object.

Well, let's polish the commit message to make it a bit clearer. The
commit subject also looks uninformative and doesn't follow our
contribution guide[2], so I propose the following rewording:
| lua: remove excess Lua call from table encoding
|
| For safe table encoding the <lua_field_try_serialize> function is
| pushed to the Lua stack along with an auxiliary lightuserdata and the
| table object to be encoded. The subsequent protected call catches a
| Lua error if one is raised while encoding. This is only necessary
| when the object to be serialized has a __serialize field in its
| metatable and this field is a Lua function.
|
| As a result of this change the function serializing the given object
| is called without the excess protected frame and auxiliary status
| struct.
Feel free to change it on your own.

> ---
> 
> branch: https://github.com/tarantool/tarantool/tree/skaplun/no-ticket-lua-inspect-table-refactoring
> 
>  src/lua/utils.c | 132 ++++++++++++++++--------------------------------
>  1 file changed, 44 insertions(+), 88 deletions(-)
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index d410a3d03..58715a55e 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -461,91 +461,69 @@ lua_field_inspect_ucdata(struct lua_State *L, struct luaL_serializer *cfg,
>  }
>  
>  /**
> - * A helper structure to simplify safe call of __serialize method.
> - * It passes some arguments into the serializer called via pcall,
> - * and carries out some results.
> + * Call __serialize method of a table object by index if the former exists.

The comment line exceeds 66 characters. Please adjust it according to
our contribution guide[3].

> + *
> + * If __serialize not exists function does nothing and the function returns 1;
> + *
> + * If __serialize exists, is correct and is a function then

What is 'correct' in this context?

> + * a result of serialization replaces old value by the index and
> + * the function returns 0;
> + *
> + * If the serialization hints like 'array' or 'map', then field->type,
> + * field->syze and field->compact sets if necessary

Typo: s/syze/size/.

> + * and the function returns 0;

I failed to parse the sentence above. Here is what I read from the code:
| If the serialization is a hint string (like 'array' or 'map'), then
| field->type, field->size and field->compact are set if necessary and
| the function returns 0;
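
To illustrate the hint handling described above, here is a minimal
standalone sketch of how the hint string maps to a type and a compact
flag. The names parse_serialize_hint, hint_result and hint_type are
hypothetical and not part of the patch; field->size is left out since
it needs the actual Lua table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for MP_ARRAY / MP_MAP. */
enum hint_type { HINT_ARRAY, HINT_MAP };

/* Hypothetical subset of struct luaL_field affected by a hint. */
struct hint_result {
	enum hint_type type;
	bool compact;
};

/*
 * Interpret a __serialize hint string. Returns 0 and fills *out for
 * a known hint, -1 otherwise. has_compact models cfg->has_compact.
 * Unlike the patch, which only ever sets compact to true, this
 * sketch assigns it unconditionally.
 */
static int
parse_serialize_hint(const char *type, bool has_compact,
		     struct hint_result *out)
{
	if (strcmp(type, "array") == 0 || strcmp(type, "seq") == 0 ||
	    strcmp(type, "sequence") == 0) {
		out->type = HINT_ARRAY;
	} else if (strcmp(type, "map") == 0 ||
		   strcmp(type, "mapping") == 0) {
		out->type = HINT_MAP;
	} else {
		return -1;
	}
	/* Only the three-letter forms 'seq' and 'map' ask for flow mode. */
	out->compact = has_compact && type[3] == '\0';
	return 0;
}
```

Every accepted hint is at least three characters long, so reading
type[3] after a successful match stays within the string.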

> + *
> + * Otherwise it is an error, set diag and the funciton returns -1;
> + *
> + * Return values:
> + * -1 - error, set diag
> + *  0 - has serialize, success replace and have to finish
> + *  1 - hasn't serialize, need to process

Minor: I propose the following rewording to make this part a bit clearer:
| Return values:
| -1 - error occurs, diag is set, the top of guest stack is undefined.
|  0 - __serialize is set, the result value is in the origin slot,
|      encoding is finished.
|  1 - __serialize is not set, proceed with default table encoding.

>   */
> -struct lua_serialize_status {
> -	/**
> -	 * True if an attempt to call __serialize has failed. A
> -	 * diag message is set.
> -	 */
> -	bool is_error;
> -	/**
> -	 * True, if __serialize exists. Otherwise an ordinary
> -	 * default serialization is used.
> -	 */
> -	bool is_serialize_used;
> -	/**
> -	 * True, if __serialize not only exists, but also returned
> -	 * a new value which should replace the original one. On
> -	 * the contrary __serialize could be a string like 'array'
> -	 * or 'map' and do not push anything but rather say how to
> -	 * interpret the target table. In such a case there is
> -	 * nothing to replace with.
> -	 */
> -	bool is_value_returned;
> -	/** Parameters, passed originally to luaL_tofield. */
> -	struct luaL_serializer *cfg;
> -	struct luaL_field *field;
> -};
>  
> -/**
> - * Call __serialize method of a table object if the former exists.
> - * The function expects 2 values pushed onto the Lua stack: a
> - * value to serialize, and a pointer at a struct
> - * lua_serialize_status object. If __serialize exists, is correct,
> - * and is a function then one value is pushed as a result of
> - * serialization. If it is correct, but just a serialization hint
> - * like 'array' or 'map', then nothing is pushed. Otherwise it is
> - * an error. All the described outcomes can be distinguished via
> - * lua_serialize_status attributes.
> - */
>  static int
> -lua_field_try_serialize(struct lua_State *L)
> +lua_field_try_serialize(struct lua_State *L, int idx,
> +			struct luaL_serializer *cfg, struct luaL_field *field)

I looked at the surrounding code, and the other signatures use a
different argument order:
| lua_field_do_something(struct lua_State *, struct luaL_serializer *,
|                        int, struct luaL_field *);
I see no reason to violate this practice.
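
For reference, a sketch of the signature with the arguments reordered
to match the neighbouring helpers. The struct declarations are opaque
stand-ins so the fragment compiles on its own, and the stub body is a
placeholder, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the real types used in src/lua/utils.c. */
struct lua_State;
struct luaL_serializer;
struct luaL_field;

/*
 * Same order as the other lua_field_* helpers: Lua state, serializer
 * config, stack index, output field.
 */
static int
lua_field_try_serialize(struct lua_State *L, struct luaL_serializer *cfg,
			int idx, struct luaL_field *field)
{
	/* Placeholder body: behave as if __serialize is not set. */
	(void)L;
	(void)cfg;
	(void)idx;
	(void)field;
	return 1;
}
```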

>  {
> -	struct lua_serialize_status *s =
> -		(struct lua_serialize_status *) lua_touserdata(L, 2);
> -	s->is_serialize_used = (luaL_getmetafield(L, 1, LUAL_SERIALIZE) != 0);
> -	s->is_error = false;
> -	s->is_value_returned = false;
> -	if (! s->is_serialize_used)
> -		return 0;
> -	struct luaL_serializer *cfg = s->cfg;
> -	struct luaL_field *field = s->field;
> +	bool is_serialize_used = (luaL_getmetafield(L, idx,
> +						    LUAL_SERIALIZE) != 0);

I agree with Vlad here. If there is a reason to keep this variable,
has_serialize is also OK as a name, and with it the line fits in 80
chars.

> +	if (!is_serialize_used)
> +		return 1;
>  	if (lua_isfunction(L, -1)) {
>  		/* copy object itself */
> -		lua_pushvalue(L, 1);
> -		lua_call(L, 1, 1);
> -		s->is_error = (luaL_tofield(L, cfg, NULL, -1, field) != 0);
> -		s->is_value_returned = true;
> -		return 1;
> +		lua_pushvalue(L, idx);
> +		if (lua_pcall(L, 1, 1, 0) != 0) {
> +			diag_set(LuajitError, lua_tostring(L, -1));
> +			return -1;
> +		}
> +		if (luaL_tofield(L, cfg, NULL, -1, field) != 0)
> +			return -1;
> +		lua_replace(L, idx);
> +		return 0;
>  	}
>  	if (!lua_isstring(L, -1)) {
>  		diag_set(LuajitError, "invalid " LUAL_SERIALIZE " value");
> -		s->is_error = true;
> -		lua_pop(L, 1);
> -		return 0;
> +		return -1;
>  	}
>  	const char *type = lua_tostring(L, -1);
>  	if (strcmp(type, "array") == 0 || strcmp(type, "seq") == 0 ||
>  	    strcmp(type, "sequence") == 0) {
>  		field->type = MP_ARRAY; /* Override type */
> -		field->size = luaL_arrlen(L, 1);
> +		field->size = luaL_arrlen(L, idx);
>  		/* YAML: use flow mode if __serialize == 'seq' */
>  		if (cfg->has_compact && type[3] == '\0')
>  			field->compact = true;
>  	} else if (strcmp(type, "map") == 0 || strcmp(type, "mapping") == 0) {
>  		field->type = MP_MAP;   /* Override type */
> -		field->size = luaL_maplen(L, 1);
> +		field->size = luaL_maplen(L, idx);
>  		/* YAML: use flow mode if __serialize == 'map' */
>  		if (cfg->has_compact && type[3] == '\0')
>  			field->compact = true;
>  	} else {
>  		diag_set(LuajitError, "invalid " LUAL_SERIALIZE " value");
> -		s->is_error = true;
> +		return -1;
>  	}
> -	lua_pop(L, 1);
> +	lua_pop(L, 1); /* remove value was setted by luaL_getmetafield */
>  	return 0;
>  }
>  
> @@ -559,36 +537,14 @@ lua_field_inspect_table(struct lua_State *L, struct luaL_serializer *cfg,
>  
>  	if (cfg->encode_load_metatables) {
>  		int top = lua_gettop(L);
> -		struct lua_serialize_status s;
> -		s.cfg = cfg;
> -		s.field = field;
> -		lua_pushcfunction(L, lua_field_try_serialize);
> -		lua_pushvalue(L, idx);
> -		lua_pushlightuserdata(L, &s);
> -		if (lua_pcall(L, 2, 1, 0) != 0) {
> -			diag_set(LuajitError, lua_tostring(L, -1));
> -			return -1;
> -		}
> -		if (s.is_error)
> +		int res = lua_field_try_serialize(L, idx, cfg, field);
> +		if (res == -1)
>  			return -1;
> -		/*
> -		 * lua_call/pcall always returns the specified
> -		 * number of values. Even if the function returned
> -		 * less - others are filled with nils. So when a
> -		 * nil is not needed, it should be popped
> -		 * manually.
> -		 */
> -		assert(lua_gettop(L) == top + 1);
> -		(void) top;
> -		if (s.is_serialize_used) {
> -			if (s.is_value_returned)
> -				lua_replace(L, idx);
> -			else
> -				lua_pop(L, 1);
> +		assert(lua_gettop(L) == top);
> +		(void)top;
> +		if (res == 0)
>  			return 0;
> -		}
> -		assert(! s.is_value_returned);
> -		lua_pop(L, 1);
> +		/* Fall throuth with res == 1 */

Typo: s/Fall throuth/Fallthrough/.

Minor: This part can be simply rewritten the following way:
| int top = lua_gettop(L);
| (void)top;
|
| switch(lua_field_try_serialize(L, idx, cfg, field)) {
| case -1:
| 	return -1;
| case 0:
| 	assert(lua_gettop(L) == top);
| 	return 0;
| case 1:
| 	assert(lua_gettop(L) == top);
| 	/* Continue table encoding */
| 	break;
| default:
| 	/* Unreachable */
| 	assert(0);
| }
IMHO, it's more readable and also checks that nothing except the values
mentioned in the function contract is returned. Feel free to ignore.
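
The dispatch above can be exercised in isolation with a stub. This is
only a sketch: try_serialize_stub and inspect_table_sketch are
hypothetical names, and the default_path flag exists solely to make
the "continue table encoding" branch observable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for lua_field_try_serialize(): returns the
 * requested code so that every branch can be reached.
 */
static int
try_serialize_stub(int code)
{
	return code;
}

/*
 * Hypothetical caller mirroring the switch. Returns -1 on error and
 * 0 otherwise; *default_path is set when __serialize is absent and
 * the default table encoding would run.
 */
static int
inspect_table_sketch(int code, bool *default_path)
{
	*default_path = false;
	switch (try_serialize_stub(code)) {
	case -1:
		return -1;
	case 0:
		return 0;
	case 1:
		/* Continue table encoding */
		break;
	default:
		/* Unreachable by contract */
		assert(0);
		return -1;
	}
	/* The default table encoding would go here. */
	*default_path = true;
	return 0;
}
```

The explicit return after assert(0) keeps the function well-defined
when assertions are compiled out with NDEBUG.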

>  	}
>  
>  	field->type = MP_ARRAY;
> -- 
> 2.24.1
> 

Side note: I can't come up with tests except those you showed to Sasha
here[4], but it looks like they don't directly relate to the changes
you introduce with the patch. No idea for now. I guess we can return to
the question when you fix the review comments.


[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-April/015701.html
[2]: https://www.tarantool.io/en/doc/2.2/dev_guide/developer_guidelines/#how-to-write-a-commit-message
[3]: https://www.tarantool.io/en/doc/2.2/dev_guide/c_style_guide/#chapter-2-breaking-long-lines-and-strings
[4]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-May/016994.html

-- 
Best regards,
IM

