Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Implement json path access to tuple fields
@ 2018-04-10  8:31 Vladislav Shpilevoy
  2018-04-10  8:31 ` [PATCH v3 1/3] Allow gcov on Mac Vladislav Shpilevoy
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-10  8:31 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, kshcherbatov

Branch: http://github.com/tarantool/tarantool/tree/gh-1285-tuple-field-by-json-icu
Issue: https://github.com/tarantool/tarantool/issues/1285

Tuple field access by JSON path provides fast access to a field and field
internals with no unpacking an entire tuple or even the field into Lua.

This patchset is a first part of JSON introduction into Tarantool core. It
introduces the JSON path parser and tuple internals access by JSON. The next
patchsets will introduce tuple update by JSON - gh-1261, indexing by
JSON - gh-1012.

Kirill Shcherbatov (2):
  Introduce json_path_parser with Unicode support.
  Lua: implement json path access to tuple fields

Vladislav Shpilevoy (1):
  Allow gcov on Mac

 cmake/profile.cmake         |   9 --
 src/box/CMakeLists.txt      |   2 +-
 src/box/lua/tuple.c         |  63 ++++++++----
 src/box/lua/tuple.lua       |  49 +++------
 src/box/tuple.h             |  21 ++++
 src/box/tuple_format.c      | 164 +++++++++++++++++++++++++++++
 src/box/tuple_format.h      |  19 ++++
 src/lib/CMakeLists.txt      |   1 +
 src/lib/json/CMakeLists.txt |   6 ++
 src/lib/json/path.c         | 245 ++++++++++++++++++++++++++++++++++++++++++++
 src/lib/json/path.h         | 112 ++++++++++++++++++++
 test/engine/tuple.result    | 229 +++++++++++++++++++++++++++++++++++++++++
 test/engine/tuple.test.lua  |  67 ++++++++++++
 test/unit/CMakeLists.txt    |   3 +
 test/unit/json_path.c       | 172 +++++++++++++++++++++++++++++++
 test/unit/json_path.result  |  98 ++++++++++++++++++
 16 files changed, 1195 insertions(+), 65 deletions(-)
 create mode 100644 src/lib/json/CMakeLists.txt
 create mode 100644 src/lib/json/path.c
 create mode 100644 src/lib/json/path.h
 create mode 100644 test/unit/json_path.c
 create mode 100644 test/unit/json_path.result

-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/3] Allow gcov on Mac
  2018-04-10  8:31 [PATCH v3 0/3] Implement json path access to tuple fields Vladislav Shpilevoy
@ 2018-04-10  8:31 ` Vladislav Shpilevoy
  2018-04-10 10:49   ` Vladimir Davydov
       [not found] ` <f1302082b4457a03b5d2b3c44526815428de4a0e.1523349020.git.v.shpilevoy@tarantool.org>
       [not found] ` <c3b64b7b4e638970864995c895bbd5f736282560.1523349020.git.v.shpilevoy@tarantool.org>
  2 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-10  8:31 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, kshcherbatov

For this it is enough to remove HAVE_GCOV. It checks for gcov
library existance, but the library is unused anyway.

Enabled gcov can be run locally on Mac now.
---
 cmake/profile.cmake | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/cmake/profile.cmake b/cmake/profile.cmake
index 278399155..c82fc0168 100644
--- a/cmake/profile.cmake
+++ b/cmake/profile.cmake
@@ -1,14 +1,7 @@
-check_library_exists (gcov __gcov_flush  ""  HAVE_GCOV)
-
 set(ENABLE_GCOV_DEFAULT OFF)
 option(ENABLE_GCOV "Enable integration with gcov, a code coverage program" ${ENABLE_GCOV_DEFAULT})
 
 if (ENABLE_GCOV)
-    if (NOT HAVE_GCOV)
-    message (FATAL_ERROR
-         "ENABLE_GCOV option requested but gcov library is not found")
-    endif()
-
     add_compile_flags("C;CXX"
         "-fprofile-arcs"
         "-ftest-coverage"
@@ -18,8 +11,6 @@ if (ENABLE_GCOV)
     set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -ftest-coverage")
     set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fprofile-arcs")
     set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -ftest-coverage")
-
-   # add_library(gcov SHARED IMPORTED)
 endif()
 
 if (NOT CMAKE_BUILD_TYPE STREQUAL "Debug")
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/3] Allow gcov on Mac
  2018-04-10  8:31 ` [PATCH v3 1/3] Allow gcov on Mac Vladislav Shpilevoy
@ 2018-04-10 10:49   ` Vladimir Davydov
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Davydov @ 2018-04-10 10:49 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, kshcherbatov

On Tue, Apr 10, 2018 at 11:31:44AM +0300, Vladislav Shpilevoy wrote:
> For this it is enough to remove HAVE_GCOV. It checks for gcov
> library existance, but the library is unused anyway.
> 
> Enabled gcov can be run locally on Mac now.
> ---
>  cmake/profile.cmake | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/cmake/profile.cmake b/cmake/profile.cmake
> index 278399155..c82fc0168 100644
> --- a/cmake/profile.cmake
> +++ b/cmake/profile.cmake
> @@ -1,14 +1,7 @@
> -check_library_exists (gcov __gcov_flush  ""  HAVE_GCOV)
> -
>  set(ENABLE_GCOV_DEFAULT OFF)
>  option(ENABLE_GCOV "Enable integration with gcov, a code coverage program" ${ENABLE_GCOV_DEFAULT})
>  
>  if (ENABLE_GCOV)
> -    if (NOT HAVE_GCOV)
> -    message (FATAL_ERROR
> -         "ENABLE_GCOV option requested but gcov library is not found")
> -    endif()
> -

I don't understand this. We call __gcov_flush() in tarantool_free() if
ENABLE_GCOV is set so we must check that this function exists and AFAIU
the check you're suggesting to remove does that...

>      add_compile_flags("C;CXX"
>          "-fprofile-arcs"
>          "-ftest-coverage"
> @@ -18,8 +11,6 @@ if (ENABLE_GCOV)
>      set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -ftest-coverage")
>      set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fprofile-arcs")
>      set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -ftest-coverage")
> -
> -   # add_library(gcov SHARED IMPORTED)
>  endif()
>  
>  if (NOT CMAKE_BUILD_TYPE STREQUAL "Debug")

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] Introduce json_path_parser with Unicode support.
       [not found] ` <f1302082b4457a03b5d2b3c44526815428de4a0e.1523349020.git.v.shpilevoy@tarantool.org>
@ 2018-04-10 16:36   ` Vladimir Davydov
  2018-04-10 18:42     ` [tarantool-patches] " Vladislav Shpilevoy
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Davydov @ 2018-04-10 16:36 UTC (permalink / raw)
  To: kshcherbatov; +Cc: tarantool-patches, v.shpilevoy

On Tue, Apr 10, 2018 at 11:31:45AM +0300, Vladislav Shpilevoy wrote:
> diff --git a/src/lib/json/path.c b/src/lib/json/path.c
> new file mode 100644
> index 000000000..681419f90
> --- /dev/null
> +++ b/src/lib/json/path.c

> +/**
> + * Read a single symbol from a string starting from an offset.
> + * @param parser JSON path parser.
> + * @param[out] UChar32 Read symbol.
> + *
> + * @retval   0 Success.
> + * @retval > 0 1-based position of a syntax error.
> + */
> +static inline int
> +json_read_symbol(struct json_path_parser *parser, UChar32 *out)
> +{
> +	if (parser->offset == parser->src_len) {

> +		*out = U_SENTINEL;

This is not necessary AFAICS.

> +		return parser->symbol_count + 1;
> +	}
> +	U8_NEXT(parser->src, parser->offset, parser->src_len, *out);
> +	if (*out == U_SENTINEL)
> +		return parser->symbol_count + 1;
> +	++parser->symbol_count;
> +	return 0;
> +}
> +
> +/**
> + * Rollback one symbol offset.
> + * @param parser JSON path parser.
> + * @param offset Offset to the previous symbol.
> + */
> +static inline void
> +json_revert_symbol(struct json_path_parser *parser, int offset)

Not so good name IMO. Can't think of a better one though.

> +{
> +	parser->offset = offset;
> +	--parser->symbol_count;
> +}
> +
> +/** Fast forward when it is known that a symbol is 1-byte char. */
> +static inline void
> +json_propagate_char(struct json_path_parser *parser)

Bad name IMO, json_skip_char would be better.

> +{
> +	++parser->offset;
> +	++parser->symbol_count;
> +}
> +
> +/** Get a current symbol as a 1-byte char. */
> +static inline char
> +json_current_char(struct json_path_parser *parser)
> +{
> +	return *(parser->src + parser->offset);
> +}
> +
> +/**
> + * Parse string identifier in quotes. Parser either stops right
> + * after the closing quote, or returns an error position.
> + * @param parser JSON path parser.
> + * @param[out] node JSON node to store result.
> + * @param quote_type Quote by that a string must be terminated.
> + *
> + * @retval   0 Success.
> + * @retval > 0 1-based position of a syntax error.
> + */
> +static inline int
> +json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
> +		  UChar32 quote_type)
> +{
> +	assert(parser->offset < parser->src_len);

> +	assert(quote_type == json_current_char(parser));

Why pass quote_type if you know that it equals json_current_char?

> +	/* The first symbol is always char  - ' or ". */
> +	json_propagate_char(parser);
> +	int str_offset = parser->offset;
> +	UChar32 c;
> +	int rc;
> +	while ((rc = json_read_symbol(parser, &c)) == 0) {
> +		if (c == quote_type) {
> +			int len = parser->offset - str_offset - 1;
> +			if (len == 0)
> +				return parser->symbol_count;
> +			node->type = JSON_PATH_STR;
> +			node->str = parser->src + str_offset;
> +			node->len = len;
> +			return 0;
> +		}
> +	}
> +	return rc;
> +}
> +
> +/**
> + * Parse digit sequence into integer until non-digit is met.
> + * Parser stops right after the last digit.
> + * @param parser JSON parser.
> + * @param[out] node JSON node to store result.
> + *
> + * @retval   0 Success.
> + * @retval > 0 1-based position of a syntax error.
> + */
> +static inline int
> +json_parse_integer(struct json_path_parser *parser, struct json_path_node *node)
> +{
> +	const char *end = parser->src + parser->src_len;
> +	const char *pos = parser->src + parser->offset;
> +	assert(pos < end);
> +	int len = 0;
> +	uint64_t value = 0;
> +	char c = *pos;
> +	if (! isdigit(c))
> +		return parser->symbol_count + 1;
> +	do {
> +		value = value * 10 + c - (int)'0';
> +		++len;
> +	} while (++pos < end && isdigit((c = *pos)));
> +	parser->offset += len;
> +	parser->symbol_count += len;
> +	node->type = JSON_PATH_NUM;
> +	node->num = value;

strtol?

> +	return 0;
> +}
> +
> +/**
> + * Check that a symbol can be part of a JSON path not inside
> + * ["..."].
> + */
> +static inline bool
> +json_is_valid_identifier_symbol(UChar32 c)
> +{
> +	return u_isUAlphabetic(c) || c == (UChar32)'_' || u_isdigit(c);
> +}
> +
> +/**
> + * Parse identifier out of quotes. It can contain only alphas,
> + * digits and underscores. And can not contain digit at the first
> + * position. Parser is stoped right after the last non-digit,
> + * non-alpha and non-underscore symbol.
> + * @param parser JSON parser. Updates signs_read field.

What is 'signs_read'?

> + * @param[out] node JSON node to store result.
> + *
> + * @retval   0 Success.
> + * @retval > 0 1-based position of a syntax error.
> + */
> +static inline int
> +json_parse_identifier(struct json_path_parser *parser,
> +		      struct json_path_node *node)
> +{
> +	assert(parser->offset < parser->src_len);
> +	int str_offset = parser->offset;
> +	UChar32 c;
> +	int rc = json_read_symbol(parser, &c);
> +	if (rc != 0)
> +		return rc;
> +	/* First symbol can not be digit. */
> +	if (!u_isalpha(c) && c != (UChar32)'_')
> +		return parser->symbol_count;
> +	int last_offset = parser->offset;
> +	while ((rc = json_read_symbol(parser, &c)) == 0) {

> +		if (! json_is_valid_identifier_symbol(c)) {

I would fold this function, because we have to use u_isalpha right above
anyway.

> +			json_revert_symbol(parser, last_offset);
> +			break;
> +		}
> +		last_offset = parser->offset;
> +	}
> +	node->type = JSON_PATH_STR;
> +	node->str = parser->src + str_offset;
> +	node->len = parser->offset - str_offset;
> +	return 0;
> +}
> +
> +int
> +json_path_next(struct json_path_parser *parser, struct json_path_node *node)
> +{
> +	UChar32 c;
> +	int last_offset = parser->offset;
> +	int rc = json_read_symbol(parser, &c);
> +	if (rc != 0) {
> +		if (parser->offset == parser->src_len) {
> +			node->type = JSON_PATH_END;
> +			return 0;
> +		}

IMO it would be more straightforward to first check EOF and only then
proceed to reading the string, not vice versa.

> +		return rc;
> +	}
> +	switch(c) {
> +	case (UChar32)'[':
> +		/* Error for '[\0'. */
> +		if (parser->offset == parser->src_len)
> +			return parser->symbol_count;
> +		c = json_current_char(parser);
> +		if (c == '"' || c == '\'')
> +			rc = json_parse_string(parser, node, c);
> +		else
> +			rc = json_parse_integer(parser, node);
> +		if (rc != 0)
> +			return rc;
> +		/*
> +		 * Expression, started from [ must be finished
> +		 * with ] regardless of its type.
> +		 */
> +		if (parser->offset == parser->src_len ||
> +		    json_current_char(parser) != ']')
> +			return parser->symbol_count + 1;
> +		/* Skip ] - one byte char. */
> +		json_propagate_char(parser);
> +		return 0;
> +	case (UChar32)'.':
> +		if (parser->offset == parser->src_len)
> +			return parser->symbol_count + 1;
> +		return json_parse_identifier(parser, node);
> +	default:
> +		json_revert_symbol(parser, last_offset);
> +		return json_parse_identifier(parser, node);
> +	}
> +}

> diff --git a/test/unit/json_path.c b/test/unit/json_path.c
> new file mode 100644
> index 000000000..fb35bf898
> --- /dev/null
> +++ b/test/unit/json_path.c
> @@ -0,0 +1,172 @@
> +#include "json/path.h"
> +#include "unit.h"
> +#include "trivia/util.h"
> +#include <string.h>
> +
> +#define reset_to_new_path(value) \
> +	path = value; \
> +	len = strlen(value); \
> +	json_path_parser_create(&parser, path, len);
> +
> +#define is_next_index(value_len, value) \
> +	path = parser.src + parser.offset; \
> +	is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \
> +	   path); \
> +	is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \
> +	is(node.num, value, "<%." #value_len "s> is " #value, path);
> +
> +#define is_next_key(value) \
> +	len = strlen(value); \
> +	is(json_path_next(&parser, &node), 0, "parse <" value ">"); \
> +	is(node.type, JSON_PATH_STR, "<" value "> is str"); \
> +	is(node.len, len, "len is %d", len); \
> +	is(strncmp(node.str, value, len), 0, "str is " value);
> +
> +void
> +test_basic()
> +{
> +	header();
> +	plan(67);
> +	const char *path;
> +	int len;
> +	struct json_path_parser parser;
> +	struct json_path_node node;
> +
> +	reset_to_new_path("[0].field1.field2['field3'][5]");
> +	is_next_index(3, 0);
> +	is_next_key("field1");
> +	is_next_key("field2");
> +	is_next_key("field3");
> +	is_next_index(3, 5);
> +
> +	reset_to_new_path("[3].field[2].field")
> +	is_next_index(3, 3);
> +	is_next_key("field");
> +	is_next_index(3, 2);
> +	is_next_key("field");
> +
> +	reset_to_new_path("[\"f1\"][\"f2'3'\"]");
> +	is_next_key("f1");
> +	is_next_key("f2'3'");
> +
> +	/* Support both '.field1...' and 'field1...'. */

The comment is misleading - there's no test case for 'field1...'

> +	reset_to_new_path(".field1");
> +	is_next_key("field1");

Shouldn't it return an empty string before 'field1' for such an input?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v3 2/3] Introduce json_path_parser with Unicode support.
  2018-04-10 16:36   ` [PATCH v3 2/3] Introduce json_path_parser with Unicode support Vladimir Davydov
@ 2018-04-10 18:42     ` Vladislav Shpilevoy
  2018-04-11  9:20       ` Vladimir Davydov
  0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-10 18:42 UTC (permalink / raw)
  To: Vladimir Davydov, kshcherbatov; +Cc: tarantool-patches



On 10/04/2018 19:36, Vladimir Davydov wrote:
> On Tue, Apr 10, 2018 at 11:31:45AM +0300, Vladislav Shpilevoy wrote:
>> diff --git a/src/lib/json/path.c b/src/lib/json/path.c
>> new file mode 100644
>> index 000000000..681419f90
>> --- /dev/null
>> +++ b/src/lib/json/path.c
> 
>> +/**
>> + * Read a single symbol from a string starting from an offset.
>> + * @param parser JSON path parser.
>> + * @param[out] UChar32 Read symbol.
>> + *
>> + * @retval   0 Success.
>> + * @retval > 0 1-based position of a syntax error.
>> + */
>> +static inline int
>> +json_read_symbol(struct json_path_parser *parser, UChar32 *out)
>> +{
>> +	if (parser->offset == parser->src_len) {
> 
>> +		*out = U_SENTINEL;
> 
> This is not necessary AFAICS.

One of compilers prints warning, that 'UChar32 c' is not initialized, 
where 'c' is argument of json_read_symbol.
>> +/** Fast forward when it is known that a symbol is 1-byte char. */
>> +static inline void
>> +json_propagate_char(struct json_path_parser *parser)
> 
> Bad name IMO, json_skip_char would be better.

As you wish.

diff --git a/src/lib/json/path.c b/src/lib/json/path.c
index 681419f90..1fccf82d0 100644
--- a/src/lib/json/path.c
+++ b/src/lib/json/path.c
@@ -72,7 +72,7 @@ json_revert_symbol(struct json_path_parser *parser, 
int offset)

  /** Fast forward when it is known that a symbol is 1-byte char. */
  static inline void
-json_propagate_char(struct json_path_parser *parser)
+json_skip_char(struct json_path_parser *parser)
  {
         ++parser->offset;
         ++parser->symbol_count;

>> +/**
>> + * Parse string identifier in quotes. Parser either stops right
>> + * after the closing quote, or returns an error position.
>> + * @param parser JSON path parser.
>> + * @param[out] node JSON node to store result.
>> + * @param quote_type Quote by that a string must be terminated.
>> + *
>> + * @retval   0 Success.
>> + * @retval > 0 1-based position of a syntax error.
>> + */
>> +static inline int
>> +json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
>> +		  UChar32 quote_type)
>> +{
>> +	assert(parser->offset < parser->src_len);
>> +	assert(quote_type == json_current_char(parser));
> 
> Why pass quote_type if you know that it equals json_current_char?

I do not want to calculate *(src + offset) if it is already done in the 
caller function. Besides, I do not like json_current_char(), but now I 
can not find how to drop it with no simple inline.
>> +static inline int
>> +json_parse_integer(struct json_path_parser *parser, struct json_path_node *node)
>> +{
>> +	const char *end = parser->src + parser->src_len;
>> +	const char *pos = parser->src + parser->offset;
>> +	assert(pos < end);
>> +	int len = 0;
>> +	uint64_t value = 0;
>> +	char c = *pos;
>> +	if (! isdigit(c))
>> +		return parser->symbol_count + 1;
>> +	do {
>> +		value = value * 10 + c - (int)'0';
>> +		++len;
>> +	} while (++pos < end && isdigit((c = *pos)));
>> +	parser->offset += len;
>> +	parser->symbol_count += len;
>> +	node->type = JSON_PATH_NUM;
>> +	node->num = value;
> 
> strtol?

Now number value is calculated simultaneously with reading and offsets 
propagation in exactly N symbols, where N - is length of a number.

You propose to read N symbols and then read them again to build a number 
- it is 2N readings. Seems to be slower.

Moreover, strtol will check each symbol on is digit, but we already know 
that all len symbols are digits. And strtol does not have strntol 
version. So I can input invalid JSON like this: '[1234', and strtol can 
do not stop after 4, if next memory area byte looks like a digit.

First version of JSON path parser had strntoll() implementation and 2N 
readings, but the bench showed, then building a number on fly is faster.

> 
>> +	return 0;
>> +}
>> +
>> +/**
>> + * Check that a symbol can be part of a JSON path not inside
>> + * ["..."].
>> + */
>> +static inline bool
>> +json_is_valid_identifier_symbol(UChar32 c)
>> +{
>> +	return u_isUAlphabetic(c) || c == (UChar32)'_' || u_isdigit(c);
>> +}
>> +
>> +/**
>> + * Parse identifier out of quotes. It can contain only alphas,
>> + * digits and underscores. And can not contain digit at the first
>> + * position. Parser is stoped right after the last non-digit,
>> + * non-alpha and non-underscore symbol.
>> + * @param parser JSON parser. Updates signs_read field.
> 
> What is 'signs_read'?

Did not notice that on self-review(

Fixed.

@@ -166,7 +166,7 @@ json_is_valid_identifier_symbol(UChar32 c)
   * digits and underscores. And can not contain digit at the first
   * position. Parser is stoped right after the last non-digit,
   * non-alpha and non-underscore symbol.
- * @param parser JSON parser. Updates signs_read field.
+ * @param parser JSON parser.
   * @param[out] node JSON node to store result.
   *
   * @retval   0 Success.

>> +	/* First symbol can not be digit. */
>> +	if (!u_isalpha(c) && c != (UChar32)'_')
>> +		return parser->symbol_count;
>> +	int last_offset = parser->offset;
>> +	while ((rc = json_read_symbol(parser, &c)) == 0) {
> 
>> +		if (! json_is_valid_identifier_symbol(c)) {
> 
> I would fold this function, because we have to use u_isalpha right above
> anyway.

is_valid_identifier checks not only on alpha and '_' - it checks on 
digits. An identifier can contain digits on non-first place.

>> +}
>> +
>> +int
>> +json_path_next(struct json_path_parser *parser, struct json_path_node *node)
>> +{
>> +	UChar32 c;
>> +	int last_offset = parser->offset;
>> +	int rc = json_read_symbol(parser, &c);
>> +	if (rc != 0) {
>> +		if (parser->offset == parser->src_len) {
>> +			node->type = JSON_PATH_END;
>> +			return 0;
>> +		}
> 
> IMO it would be more straightforward to first check EOF and only then
> proceed to reading the string, not vice versa.

As you wish.

@@ -202,16 +202,15 @@ json_parse_identifier(struct json_path_parser *parser,
  int
  json_path_next(struct json_path_parser *parser, struct json_path_node 
*node)
  {
+       if (parser->offset == parser->src_len) {
+               node->type = JSON_PATH_END;
+               return 0;
+       }
         UChar32 c;
         int last_offset = parser->offset;
         int rc = json_read_symbol(parser, &c);
-       if (rc != 0) {
-               if (parser->offset == parser->src_len) {
-                       node->type = JSON_PATH_END;
-                       return 0;
-               }
+       if (rc != 0)
                 return rc;
-       }
         switch(c) {

>> +void
>> +test_basic()
>> +{
>> +	header();
>> +	plan(67);
>> +	const char *path;
>> +	int len;
>> +	struct json_path_parser parser;
>> +	struct json_path_node node;
>> +
>> +	reset_to_new_path("[0].field1.field2['field3'][5]");
>> +	is_next_index(3, 0);
>> +	is_next_key("field1");
>> +	is_next_key("field2");
>> +	is_next_key("field3");
>> +	is_next_index(3, 5);
>> +
>> +	reset_to_new_path("[3].field[2].field")
>> +	is_next_index(3, 3);
>> +	is_next_key("field");
>> +	is_next_index(3, 2);
>> +	is_next_key("field");
>> +
>> +	reset_to_new_path("[\"f1\"][\"f2'3'\"]");
>> +	is_next_key("f1");
>> +	is_next_key("f2'3'");
>> +
>> +	/* Support both '.field1...' and 'field1...'. */
> 
> The comment is misleading - there's no test case for 'field1...'

Thanks for the catch.

diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index fb35bf898..1d7e7d372 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -26,7 +26,7 @@ void
  test_basic()
  {
         header();
-       plan(67);
+       plan(71);
         const char *path;
         int len;
         struct json_path_parser parser;
@@ -52,6 +52,8 @@ test_basic()
         /* Support both '.field1...' and 'field1...'. */
         reset_to_new_path(".field1");
         is_next_key("field1");
+       reset_to_new_path("field1");
+       is_next_key("field1");

> 
>> +	reset_to_new_path(".field1");
>> +	is_next_key("field1");
> 
> Shouldn't it return an empty string before 'field1' for such an input?

No, here everything is ok. My JSON path parser allows to start a path 
with '.' and it is not considered as an empty string + '.'. Exception is 
'.[...]' - it is an error (see tests in test_errors()).

I allowed '.identifier' syntax to lay emphasis that it is actually 
'tuple.identifier'. The JSON path is just a suffix for a tuple.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v3 2/3] Introduce json_path_parser with Unicode support.
  2018-04-10 18:42     ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-04-11  9:20       ` Vladimir Davydov
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Davydov @ 2018-04-11  9:20 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: kshcherbatov, tarantool-patches

OK, ACK

On Tue, Apr 10, 2018 at 09:42:08PM +0300, Vladislav Shpilevoy wrote:
> On 10/04/2018 19:36, Vladimir Davydov wrote:
> > On Tue, Apr 10, 2018 at 11:31:45AM +0300, Vladislav Shpilevoy wrote:
> > > diff --git a/src/lib/json/path.c b/src/lib/json/path.c
> > > new file mode 100644
> > > index 000000000..681419f90
> > > --- /dev/null
> > > +++ b/src/lib/json/path.c
> > 
> > > +/**
> > > + * Read a single symbol from a string starting from an offset.
> > > + * @param parser JSON path parser.
> > > + * @param[out] UChar32 Read symbol.
> > > + *
> > > + * @retval   0 Success.
> > > + * @retval > 0 1-based position of a syntax error.
> > > + */
> > > +static inline int
> > > +json_read_symbol(struct json_path_parser *parser, UChar32 *out)
> > > +{
> > > +	if (parser->offset == parser->src_len) {
> > 
> > > +		*out = U_SENTINEL;
> > 
> > This is not necessary AFAICS.
> 
> One of compilers prints warning, that 'UChar32 c' is not initialized, where
> 'c' is argument of json_read_symbol.
> > > +/** Fast forward when it is known that a symbol is 1-byte char. */
> > > +static inline void
> > > +json_propagate_char(struct json_path_parser *parser)
> > 
> > Bad name IMO, json_skip_char would be better.
> 
> As you wish.
> 
> diff --git a/src/lib/json/path.c b/src/lib/json/path.c
> index 681419f90..1fccf82d0 100644
> --- a/src/lib/json/path.c
> +++ b/src/lib/json/path.c
> @@ -72,7 +72,7 @@ json_revert_symbol(struct json_path_parser *parser, int
> offset)
> 
>  /** Fast forward when it is known that a symbol is 1-byte char. */
>  static inline void
> -json_propagate_char(struct json_path_parser *parser)
> +json_skip_char(struct json_path_parser *parser)
>  {
>         ++parser->offset;
>         ++parser->symbol_count;
> 
> > > +/**
> > > + * Parse string identifier in quotes. Parser either stops right
> > > + * after the closing quote, or returns an error position.
> > > + * @param parser JSON path parser.
> > > + * @param[out] node JSON node to store result.
> > > + * @param quote_type Quote by that a string must be terminated.
> > > + *
> > > + * @retval   0 Success.
> > > + * @retval > 0 1-based position of a syntax error.
> > > + */
> > > +static inline int
> > > +json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
> > > +		  UChar32 quote_type)
> > > +{
> > > +	assert(parser->offset < parser->src_len);
> > > +	assert(quote_type == json_current_char(parser));
> > 
> > Why pass quote_type if you know that it equals json_current_char?
> 
> I do not want to calculate *(src + offset) if it is already done in the
> caller function. Besides, I do not like json_current_char(), but now I can
> not find how to drop it with no simple inline.
> > > +static inline int
> > > +json_parse_integer(struct json_path_parser *parser, struct json_path_node *node)
> > > +{
> > > +	const char *end = parser->src + parser->src_len;
> > > +	const char *pos = parser->src + parser->offset;
> > > +	assert(pos < end);
> > > +	int len = 0;
> > > +	uint64_t value = 0;
> > > +	char c = *pos;
> > > +	if (! isdigit(c))
> > > +		return parser->symbol_count + 1;
> > > +	do {
> > > +		value = value * 10 + c - (int)'0';
> > > +		++len;
> > > +	} while (++pos < end && isdigit((c = *pos)));
> > > +	parser->offset += len;
> > > +	parser->symbol_count += len;
> > > +	node->type = JSON_PATH_NUM;
> > > +	node->num = value;
> > 
> > strtol?
> 
> Now number value is calculated simultaneously with reading and offsets
> propagation in exactly N symbols, where N - is length of a number.
> 
> You propose to read N symbols and then read them again to build a number -
> it is 2N readings. Seems to be slower.
> 
> Moreover, strtol will check each symbol on is digit, but we already know
> that all len symbols are digits. And strtol does not have strntol version.
> So I can input invalid JSON like this: '[1234', and strtol can do not stop
> after 4, if next memory area byte looks like a digit.
> 
> First version of JSON path parser had strntoll() implementation and 2N
> readings, but the bench showed, then building a number on fly is faster.
> 
> > 
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * Check that a symbol can be part of a JSON path not inside
> > > + * ["..."].
> > > + */
> > > +static inline bool
> > > +json_is_valid_identifier_symbol(UChar32 c)
> > > +{
> > > +	return u_isUAlphabetic(c) || c == (UChar32)'_' || u_isdigit(c);
> > > +}
> > > +
> > > +/**
> > > + * Parse identifier out of quotes. It can contain only alphas,
> > > + * digits and underscores. And can not contain digit at the first
> > > + * position. Parser is stoped right after the last non-digit,
> > > + * non-alpha and non-underscore symbol.
> > > + * @param parser JSON parser. Updates signs_read field.
> > 
> > What is 'signs_read'?
> 
> Did not notice that on self-review(
> 
> Fixed.
> 
> @@ -166,7 +166,7 @@ json_is_valid_identifier_symbol(UChar32 c)
>   * digits and underscores. And can not contain digit at the first
>   * position. Parser is stoped right after the last non-digit,
>   * non-alpha and non-underscore symbol.
> - * @param parser JSON parser. Updates signs_read field.
> + * @param parser JSON parser.
>   * @param[out] node JSON node to store result.
>   *
>   * @retval   0 Success.
> 
> > > +	/* First symbol can not be digit. */
> > > +	if (!u_isalpha(c) && c != (UChar32)'_')
> > > +		return parser->symbol_count;
> > > +	int last_offset = parser->offset;
> > > +	while ((rc = json_read_symbol(parser, &c)) == 0) {
> > 
> > > +		if (! json_is_valid_identifier_symbol(c)) {
> > 
> > I would fold this function, because we have to use u_isalpha right above
> > anyway.
> 
> is_valid_identifier checks not only on alpha and '_' - it checks on digits.
> An identifier can contain digits on non-first place.
> 
> > > +}
> > > +
> > > +int
> > > +json_path_next(struct json_path_parser *parser, struct json_path_node *node)
> > > +{
> > > +	UChar32 c;
> > > +	int last_offset = parser->offset;
> > > +	int rc = json_read_symbol(parser, &c);
> > > +	if (rc != 0) {
> > > +		if (parser->offset == parser->src_len) {
> > > +			node->type = JSON_PATH_END;
> > > +			return 0;
> > > +		}
> > 
> > IMO it would be more straightforward to first check EOF and only then
> > proceed to reading the string, not vice versa.
> 
> As you wish.
> 
> @@ -202,16 +202,15 @@ json_parse_identifier(struct json_path_parser *parser,
>  int
>  json_path_next(struct json_path_parser *parser, struct json_path_node
> *node)
>  {
> +       if (parser->offset == parser->src_len) {
> +               node->type = JSON_PATH_END;
> +               return 0;
> +       }
>         UChar32 c;
>         int last_offset = parser->offset;
>         int rc = json_read_symbol(parser, &c);
> -       if (rc != 0) {
> -               if (parser->offset == parser->src_len) {
> -                       node->type = JSON_PATH_END;
> -                       return 0;
> -               }
> +       if (rc != 0)
>                 return rc;
> -       }
>         switch(c) {
> 
> > > +void
> > > +test_basic()
> > > +{
> > > +	header();
> > > +	plan(67);
> > > +	const char *path;
> > > +	int len;
> > > +	struct json_path_parser parser;
> > > +	struct json_path_node node;
> > > +
> > > +	reset_to_new_path("[0].field1.field2['field3'][5]");
> > > +	is_next_index(3, 0);
> > > +	is_next_key("field1");
> > > +	is_next_key("field2");
> > > +	is_next_key("field3");
> > > +	is_next_index(3, 5);
> > > +
> > > +	reset_to_new_path("[3].field[2].field")
> > > +	is_next_index(3, 3);
> > > +	is_next_key("field");
> > > +	is_next_index(3, 2);
> > > +	is_next_key("field");
> > > +
> > > +	reset_to_new_path("[\"f1\"][\"f2'3'\"]");
> > > +	is_next_key("f1");
> > > +	is_next_key("f2'3'");
> > > +
> > > +	/* Support both '.field1...' and 'field1...'. */
> > 
> > The comment is misleading - there's no test case for 'field1...'
> 
> Thanks for the catch.
> 
> diff --git a/test/unit/json_path.c b/test/unit/json_path.c
> index fb35bf898..1d7e7d372 100644
> --- a/test/unit/json_path.c
> +++ b/test/unit/json_path.c
> @@ -26,7 +26,7 @@ void
>  test_basic()
>  {
>         header();
> -       plan(67);
> +       plan(71);
>         const char *path;
>         int len;
>         struct json_path_parser parser;
> @@ -52,6 +52,8 @@ test_basic()
>         /* Support both '.field1...' and 'field1...'. */
>         reset_to_new_path(".field1");
>         is_next_key("field1");
> +       reset_to_new_path("field1");
> +       is_next_key("field1");
> 
> > 
> > > +	reset_to_new_path(".field1");
> > > +	is_next_key("field1");
> > 
> > Shouldn't it return an empty string before 'field1' for such an input?
> 
> No, here everything is ok. My JSON path parser allows to start a path with
> '.' and it is not considered as an empty string + '.'. Exception is '.[...]'
> - it is an error (see tests in test_errors()).
> 
> I allowed '.identifier' syntax to lay emphasis that it is actually
> 'tuple.identifier'. The JSON path is just a suffix for a tuple.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/3] Lua: implement json path access to tuple fields
       [not found] ` <c3b64b7b4e638970864995c895bbd5f736282560.1523349020.git.v.shpilevoy@tarantool.org>
@ 2018-04-13 11:33   ` Vladimir Davydov
  2018-04-13 21:51     ` [tarantool-patches] " Vladislav Shpilevoy
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Davydov @ 2018-04-13 11:33 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, kshcherbatov

On Tue, Apr 10, 2018 at 11:31:46AM +0300, Vladislav Shpilevoy wrote:
> From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
> 
> New tuple_field_raw_by_path and tuple_field_by_path APIs.

Would be nice to say a few words here about the new API.

> 
> Resolves #1285
> ---
>  src/box/CMakeLists.txt     |   2 +-
>  src/box/lua/tuple.c        |  63 +++++++++----
>  src/box/lua/tuple.lua      |  49 +++-------
>  src/box/tuple.h            |  21 +++++
>  src/box/tuple_format.c     | 164 ++++++++++++++++++++++++++++++++
>  src/box/tuple_format.h     |  19 ++++
>  test/engine/tuple.result   | 229 +++++++++++++++++++++++++++++++++++++++++++++
>  test/engine/tuple.test.lua |  67 +++++++++++++
>  8 files changed, 558 insertions(+), 56 deletions(-)
> 
> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
> index ef7225d10..ae156a2f5 100644
> --- a/src/box/CMakeLists.txt
> +++ b/src/box/CMakeLists.txt
> @@ -47,7 +47,7 @@ add_library(tuple STATIC
>      field_def.c
>      opt_def.c
>  )
> -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit)
> +target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit)
>  
>  add_library(xlog STATIC xlog.c)
>  target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES})
> diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
> index a0b4e4a79..c10e9b253 100644
> --- a/src/box/lua/tuple.c
> +++ b/src/box/lua/tuple.c
> @@ -41,6 +41,7 @@
>  #include "box/tuple_convert.h"
>  #include "box/errcode.h"
>  #include "box/memtx_tuple.h"
> +#include "json/path.h"
>  
>  /** {{{ box.tuple Lua library
>   *
> @@ -420,36 +421,58 @@ lbox_tuple_transform(struct lua_State *L)
>  }
>  
>  /**
> - * Find a tuple field using its name.
> + * Find a tuple field by JSON path. If a field was not found and a
> + * path contains JSON syntax errors, then an exception is raised.
>   * @param L Lua state.
> - * @param tuple 1-th argument on lua stack, tuple to get field
> + * @param tuple 1-th argument on a lua stack, tuple to get field
>   *        from.
> - * @param field_name 2-th argument on lua stack, field name to
> - *        get.
> + * @param path 2-th argument on lua stack. Can be field name,
> + *        JSON path to a field or a field number.
>   *
> - * @retval If a field was not found, return -1 and nil to lua else
> - *         return 0 and decoded field.
> + * @retval not nil Found field value.
> + * @retval     nil A field is NULL or does not exist.
>   */
>  static int
> -lbox_tuple_field_by_name(struct lua_State *L)
> +lbox_tuple_field_by_path(struct lua_State *L)
>  {
>  	struct tuple *tuple = luaT_istuple(L, 1);
>  	/* Is checked in Lua wrapper. */
>  	assert(tuple != NULL);
> -	assert(lua_isstring(L, 2));
> -	size_t name_len;
> -	const char *name = lua_tolstring(L, 2, &name_len);
> -	uint32_t name_hash = lua_hashstring(L, 2);
> -	const char *field =
> -		tuple_field_by_name(tuple, name, name_len, name_hash);
> -	if (field == NULL) {
> -		lua_pushinteger(L, -1);
> -		lua_pushnil(L);
> -		return 2;
> +	const char *field = NULL;
> +	if (lua_isnumber(L, 2)) {
> +		double dbl_index = lua_tonumber(L, 2);
> +		if (dbl_index != floor(dbl_index))
> +			goto usage_error;
> +		int index = (int) floor(dbl_index) - TUPLE_INDEX_BASE;
> +		if (index >= 0) {
> +			field = tuple_field(tuple, index);
> +			if (field == NULL) {
> +				lua_pushnil(L);
> +				return 1;
> +			}
> +		} else {
> +			lua_pushnil(L);
> +			return 1;
> +		}

Why did you move handling of integer index from Lua to C? AFAIU having
it in Lua should be faster. Besides, this new function is kinda
difficult to follow IMO as it does two things now. I'd prefer it to do
just one thing - looking up a tuple field by path.

Other than this, the patch looks good to me.

> +	} else if (lua_isstring(L, 2)) {
> +		size_t len;
> +		const char *path = lua_tolstring(L, 2, &len);
> +		if (len == 0)
> +			goto usage_error;
> +		if (tuple_field_by_path(tuple, path, (uint32_t) len,
> +					lua_hashstring(L, 2), &field) != 0) {
> +			return luaT_error(L);
> +		} else if (field == NULL) {
> +			lua_pushnil(L);
> +			return 1;
> +		}
> +	} else {
> +usage_error:
> +		return luaL_error(L, "Usage: tuple[<path> or number >= 1]");
>  	}
> -	lua_pushinteger(L, 0);
> +	assert(field != NULL);
>  	luamp_decode(L, luaL_msgpack_default, &field);
> -	return 2;
> +	return 1;
>  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v3 3/3] Lua: implement json path access to tuple fields
  2018-04-13 11:33   ` [PATCH v3 3/3] Lua: implement json path access to tuple fields Vladimir Davydov
@ 2018-04-13 21:51     ` Vladislav Shpilevoy
  2018-04-16  8:35       ` Vladimir Davydov
  0 siblings, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-13 21:51 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov; +Cc: kshcherbatov

Hello. Thanks for review.

New commit message:

     lua: implement json path access to tuple fields

     Until this moment in Lua a tuple data could be accessed in 3 ways:
     - get field by field number, decode into Lua,
     - get field by field name, decode into Lua
     - decode entire tuple into Lua.

     It was impossible to decode into Lua only a part of the field to
     avoid Lua garbage creating. For example, consider the tuple:
     `{1, 2, {key1 = value1,
              key2 = {key21 = {key211 = {key2111, key2112}}}}}`
     with field names `field1`, `field2`, `field3`.

     To get `key2112` it is necessary to decode into Lua the entire
     3-th field, including `key1`, `value1`, `key2`, `key21`,
     `key211`, `key2111` using the syntax
     `key2112 = tuple.field3.key2.key21.key211[2]`.

     Now the one can use the following syntax:
     `key2112 = tuple["field3.key2.key21.key211[2]"]`. The difference
     is in placing all the path into quotes and brackets `["..."]`.
     The Tarantool goes through this path and MessagePack tuple body,
     gets the needed tuple part and decodes into Lua only it.

     The path must be valid JSON path with one exception - in
     Tarantool a path can start with `.`. For example:
     `tuple[".field3..."]` - it is done to lay emphasis that the JSON
     path here is just a suffix for the tuple.

     At the same time tuple field names still work: `tuple["field1"]`,
     `tuple.field2`.

     If the field name looks like JSON path, for example:
     `my.field.name`, then it works too. The path at first is checked
     to be a real JSON path, and if nothing is found, then the entire
     path is considered as a field name. To combine such names and
     path elements the one can use `["..."]`. For example:
     `tuple["['my.field.name'].key1.key2.['another.key.in.map']"]`.

     Closes #1285

> 
> Why did you move handling of integer index from Lua to C? AFAIU having
> it in Lua should be faster. Besides, this new function is kinda
> difficult to follow IMO as it does two things now. I'd prefer it to do
> just one thing - looking up a tuple field by path.
> 

Fixed. I returned the tuple field by number implementation as it
was before JSON path patch.


diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index c10e9b253..e01c12a56 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -426,8 +426,8 @@ lbox_tuple_transform(struct lua_State *L)
   * @param L Lua state.
   * @param tuple 1-th argument on a lua stack, tuple to get field
   *        from.
- * @param path 2-th argument on lua stack. Can be field name,
- *        JSON path to a field or a field number.
+ * @param path 2-th argument on lua stack. Can be field name or a
+ *        JSON path to a field.
   *
   * @retval not nil Found field value.
   * @retval     nil A field is NULL or does not exist.
@@ -438,37 +438,16 @@ lbox_tuple_field_by_path(struct lua_State *L)
  	struct tuple *tuple = luaT_istuple(L, 1);
  	/* Is checked in Lua wrapper. */
  	assert(tuple != NULL);
-	const char *field = NULL;
-	if (lua_isnumber(L, 2)) {
-		double dbl_index = lua_tonumber(L, 2);
-		if (dbl_index != floor(dbl_index))
-			goto usage_error;
-		int index = (int) floor(dbl_index) - TUPLE_INDEX_BASE;
-		if (index >= 0) {
-			field = tuple_field(tuple, index);
-			if (field == NULL) {
-				lua_pushnil(L);
-				return 1;
-			}
-		} else {
-			lua_pushnil(L);
-			return 1;
-		}
-	} else if (lua_isstring(L, 2)) {
-		size_t len;
-		const char *path = lua_tolstring(L, 2, &len);
-		if (len == 0)
-			goto usage_error;
-		if (tuple_field_by_path(tuple, path, (uint32_t) len,
-					lua_hashstring(L, 2), &field) != 0) {
-			return luaT_error(L);
-		} else if (field == NULL) {
-			lua_pushnil(L);
-			return 1;
-		}
-	} else {
-usage_error:
-		return luaL_error(L, "Usage: tuple[<path> or number >= 1]");
+	assert(lua_isstring(L, 2));
+	size_t len;
+	const char *field = NULL, *path = lua_tolstring(L, 2, &len);
+	if (len == 0)
+		return 0;
+	if (tuple_field_by_path(tuple, path, (uint32_t) len,
+				lua_hashstring(L, 2), &field) != 0) {
+		return luaT_error(L);
+	} else if (field == NULL) {
+		return 0;
  	}
  	assert(field != NULL);
  	luamp_decode(L, luaL_msgpack_default, &field);
diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua
index 0f368ba9c..63ea73e43 100644
--- a/src/box/lua/tuple.lua
+++ b/src/box/lua/tuple.lua
@@ -296,20 +296,31 @@ end

  methods["__serialize"] = tuple_totable -- encode hook for 
msgpack/yaml/json

+local tuple_field = function(tuple, field_n)
+    local field = builtin.box_tuple_field(tuple, field_n - 1)
+    if field == nil then
+        return nil
+    end
+    -- Use () to shrink stack to the first return value
+    return (msgpackffi.decode_unchecked(field))
+end
+
  ffi.metatype(tuple_t, {
      __len = function(tuple)
          return builtin.box_tuple_field_count(tuple)
      end;
      __tostring = internal.tuple.tostring;
      __index = function(tuple, key)
-        if type(key) == "string" or type(key) == "number" then
-            -- Try to get a field by json path or by [index]. If
-            -- it was not found (rc ~= 0) then return a method
-            -- from the vtable. If a collision occurred, then
-            -- fields have higher priority. For example, if a
-            -- tuple T has a field with name 'bsize', then T.bsize
-            -- returns field value, not tuple_bsize function. To
-            -- access hidden methods use
+        if type(key) == "number" then
+            return tuple_field(tuple, key)
+        elseif type(key) == "string" then
+            -- Try to get a field by JSON path. If it was not
+            -- found (rc ~= 0) then return a method from the
+            -- vtable. If a collision occurred, then fields have
+            -- higher priority. For example, if a tuple T has a
+            -- field with name 'bsize', then T.bsize returns
+            -- field value, not tuple_bsize function. To access
+            -- hidden methods use
              -- 'box.tuple.<method_name>(T, [args...])'.
              local res = tuple_field_by_path(tuple, key)
              if res ~= nil then

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v3 3/3] Lua: implement json path access to tuple fields
  2018-04-13 21:51     ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-04-16  8:35       ` Vladimir Davydov
  2018-04-16 10:02         ` Vladislav Shpilevoy
  2018-04-22 14:21         ` Vladislav Shpilevoy
  0 siblings, 2 replies; 11+ messages in thread
From: Vladimir Davydov @ 2018-04-16  8:35 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, kshcherbatov

On Sat, Apr 14, 2018 at 12:51:30AM +0300, Vladislav Shpilevoy wrote:
> Hello. Thanks for review.
> 
> New commit message:
> 
>     lua: implement json path access to tuple fields
> 
>     Until this moment in Lua a tuple data could be accessed in 3 ways:
>     - get field by field number, decode into Lua,
>     - get field by field name, decode into Lua
>     - decode entire tuple into Lua.
> 
>     It was impossible to decode into Lua only a part of the field to
>     avoid Lua garbage creating. For example, consider the tuple:
>     `{1, 2, {key1 = value1,
>              key2 = {key21 = {key211 = {key2111, key2112}}}}}`
>     with field names `field1`, `field2`, `field3`.
> 
>     To get `key2112` it is necessary to decode into Lua the entire
>     3-th field, including `key1`, `value1`, `key2`, `key21`,
>     `key211`, `key2111` using the syntax
>     `key2112 = tuple.field3.key2.key21.key211[2]`.
> 
>     Now the one can use the following syntax:
>     `key2112 = tuple["field3.key2.key21.key211[2]"]`. The difference
>     is in placing all the path into quotes and brackets `["..."]`.
>     The Tarantool goes through this path and MessagePack tuple body,
>     gets the needed tuple part and decodes into Lua only it.
> 
>     The path must be valid JSON path with one exception - in
>     Tarantool a path can start with `.`. For example:
>     `tuple[".field3..."]` - it is done to lay emphasis that the JSON
>     path here is just a suffix for the tuple.
> 
>     At the same time tuple field names still work: `tuple["field1"]`,
>     `tuple.field2`.
> 
>     If the field name looks like JSON path, for example:
>     `my.field.name`, then it works too. The path at first is checked
>     to be a real JSON path, and if nothing is found, then the entire
>     path is considered as a field name. To combine such names and
>     path elements the one can use `["..."]`. For example:
>     `tuple["['my.field.name'].key1.key2.['another.key.in.map']"]`.
> 
>     Closes #1285
> 
> > 
> > Why did you move handling of integer index from Lua to C? AFAIU having
> > it in Lua should be faster. Besides, this new function is kinda
> > difficult to follow IMO as it does two things now. I'd prefer it to do
> > just one thing - looking up a tuple field by path.
> > 
> 
> Fixed. I returned the tuple field by number implementation as it
> was before JSON path patch.

Thanks. The patch looks good to me now.

There's one thing: the hunk below doesn't belong to this patch.
Obviously, it should be a part of the previous patch. Please move
it there before pushing it to the trunk.

diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index 9b4ea641..a2a2f829 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,7 +1,7 @@
 	*** main ***
 1..2
 	*** test_basic ***
-    1..67
+    1..71
     ok 1 - parse <[0]>
     ok 2 - <[0]> is num
     ok 3 - <[0]> is 0
@@ -46,29 +46,33 @@
     ok 42 - <field1> is str
     ok 43 - len is 6
     ok 44 - str is field1
-    ok 45 - parse <[1234]>
-    ok 46 - <[1234]> is num
-    ok 47 - <[1234]> is 1234
-    ok 48 - parse empty path
-    ok 49 - is str
-    ok 50 - parse <field1>
-    ok 51 - <field1> is str
-    ok 52 - len is 6
-    ok 53 - str is field1
-    ok 54 - parse <[2]>
-    ok 55 - <[2]> is num
-    ok 56 - <[2]> is 2
-    ok 57 - parse <[6]>
-    ok 58 - <[6]> is num
-    ok 59 - <[6]> is 6
-    ok 60 - parse <привет中国world>
-    ok 61 - <привет中国world> is str
-    ok 62 - len is 23
-    ok 63 - str is привет中国world
-    ok 64 - parse <中国a>
-    ok 65 - <中国a> is str
-    ok 66 - len is 7
-    ok 67 - str is 中国a
+    ok 45 - parse <field1>
+    ok 46 - <field1> is str
+    ok 47 - len is 6
+    ok 48 - str is field1
+    ok 49 - parse <[1234]>
+    ok 50 - <[1234]> is num
+    ok 51 - <[1234]> is 1234
+    ok 52 - parse empty path
+    ok 53 - is str
+    ok 54 - parse <field1>
+    ok 55 - <field1> is str
+    ok 56 - len is 6
+    ok 57 - str is field1
+    ok 58 - parse <[2]>
+    ok 59 - <[2]> is num
+    ok 60 - <[2]> is 2
+    ok 61 - parse <[6]>
+    ok 62 - <[6]> is num
+    ok 63 - <[6]> is 6
+    ok 64 - parse <привет中国world>
+    ok 65 - <привет中国world> is str
+    ok 66 - len is 23
+    ok 67 - str is привет中国world
+    ok 68 - parse <中国a>
+    ok 69 - <中国a> is str
+    ok 70 - len is 7
+    ok 71 - str is 中国a
 ok 1 - subtests
 	*** test_basic: done ***
 	*** test_errors ***

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v3 3/3] Lua: implement json path access to tuple fields
  2018-04-16  8:35       ` Vladimir Davydov
@ 2018-04-16 10:02         ` Vladislav Shpilevoy
  2018-04-22 14:21         ` Vladislav Shpilevoy
  1 sibling, 0 replies; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-16 10:02 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, kshcherbatov

Hello.

> 
> Thanks. The patch looks good to me now.
> 
> There's one thing: the hunk below doesn't belong to this patch.
> Obviously, it should be a part of the previous patch. Please move
> it there before pushing it to the trunk.

Done on the branch.

> 
> diff --git a/test/unit/json_path.result b/test/unit/json_path.result
> index 9b4ea641..a2a2f829 100644
> --- a/test/unit/json_path.result
> +++ b/test/unit/json_path.result
> @@ -1,7 +1,7 @@
>   	*** main ***
>   1..2
>   	*** test_basic ***
> -    1..67
> +    1..71
>       ok 1 - parse <[0]>
>       ok 2 - <[0]> is num
>       ok 3 - <[0]> is 0
> @@ -46,29 +46,33 @@
>       ok 42 - <field1> is str
>       ok 43 - len is 6
>       ok 44 - str is field1
> -    ok 45 - parse <[1234]>
> -    ok 46 - <[1234]> is num
> -    ok 47 - <[1234]> is 1234
> -    ok 48 - parse empty path
> -    ok 49 - is str
> -    ok 50 - parse <field1>
> -    ok 51 - <field1> is str
> -    ok 52 - len is 6
> -    ok 53 - str is field1
> -    ok 54 - parse <[2]>
> -    ok 55 - <[2]> is num
> -    ok 56 - <[2]> is 2
> -    ok 57 - parse <[6]>
> -    ok 58 - <[6]> is num
> -    ok 59 - <[6]> is 6
> -    ok 60 - parse <привет中国world>
> -    ok 61 - <привет中国world> is str
> -    ok 62 - len is 23
> -    ok 63 - str is привет中国world
> -    ok 64 - parse <中国a>
> -    ok 65 - <中国a> is str
> -    ok 66 - len is 7
> -    ok 67 - str is 中国a
> +    ok 45 - parse <field1>
> +    ok 46 - <field1> is str
> +    ok 47 - len is 6
> +    ok 48 - str is field1
> +    ok 49 - parse <[1234]>
> +    ok 50 - <[1234]> is num
> +    ok 51 - <[1234]> is 1234
> +    ok 52 - parse empty path
> +    ok 53 - is str
> +    ok 54 - parse <field1>
> +    ok 55 - <field1> is str
> +    ok 56 - len is 6
> +    ok 57 - str is field1
> +    ok 58 - parse <[2]>
> +    ok 59 - <[2]> is num
> +    ok 60 - <[2]> is 2
> +    ok 61 - parse <[6]>
> +    ok 62 - <[6]> is num
> +    ok 63 - <[6]> is 6
> +    ok 64 - parse <привет中国world>
> +    ok 65 - <привет中国world> is str
> +    ok 66 - len is 23
> +    ok 67 - str is привет中国world
> +    ok 68 - parse <中国a>
> +    ok 69 - <中国a> is str
> +    ok 70 - len is 7
> +    ok 71 - str is 中国a
>   ok 1 - subtests
>   	*** test_basic: done ***
>   	*** test_errors ***
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH v3 3/3] Lua: implement json path access to tuple fields
  2018-04-16  8:35       ` Vladimir Davydov
  2018-04-16 10:02         ` Vladislav Shpilevoy
@ 2018-04-22 14:21         ` Vladislav Shpilevoy
  1 sibling, 0 replies; 11+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-22 14:21 UTC (permalink / raw)
  To: tarantool-patches, Konstantin Osipov, Kirill Yukhin

Hello. Now it is a week after Vova responded ACK to this patchset.
I did not saw new comments, so according to our autopush-after-Vova-ack
policy I pushed the patchset into 1.10 and 2.0.

On 16/04/2018 11:35, Vladimir Davydov wrote:
> On Sat, Apr 14, 2018 at 12:51:30AM +0300, Vladislav Shpilevoy wrote:
>> Hello. Thanks for review.
>>
>> New commit message:
>>
>>      lua: implement json path access to tuple fields
> 
> Thanks. The patch looks good to me now.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-04-22 14:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10  8:31 [PATCH v3 0/3] Implement json path access to tuple fields Vladislav Shpilevoy
2018-04-10  8:31 ` [PATCH v3 1/3] Allow gcov on Mac Vladislav Shpilevoy
2018-04-10 10:49   ` Vladimir Davydov
     [not found] ` <f1302082b4457a03b5d2b3c44526815428de4a0e.1523349020.git.v.shpilevoy@tarantool.org>
2018-04-10 16:36   ` [PATCH v3 2/3] Introduce json_path_parser with Unicode support Vladimir Davydov
2018-04-10 18:42     ` [tarantool-patches] " Vladislav Shpilevoy
2018-04-11  9:20       ` Vladimir Davydov
     [not found] ` <c3b64b7b4e638970864995c895bbd5f736282560.1523349020.git.v.shpilevoy@tarantool.org>
2018-04-13 11:33   ` [PATCH v3 3/3] Lua: implement json path access to tuple fields Vladimir Davydov
2018-04-13 21:51     ` [tarantool-patches] " Vladislav Shpilevoy
2018-04-16  8:35       ` Vladimir Davydov
2018-04-16 10:02         ` Vladislav Shpilevoy
2018-04-22 14:21         ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox