Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/2] JSON field multikey crash
@ 2020-08-04 23:45 Vladislav Shpilevoy
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-04 23:45 UTC (permalink / raw)
  To: tarantool-patches, alyapunov, korablev

The patchset fixes 2 crashes related to multikey in JSON path
tuple field access code.

Also during working on this I found
https://github.com/tarantool/tarantool/issues/5226, but couldn't
find a simple solution.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5224-tuple-field-by-path-crash
Issue: https://github.com/tarantool/tarantool/issues/5224

@ChangeLog
* Fixed a crash when JSON tuple field access was used to get a multikey indexed field, and when a JSON contained [*] in the beginning;

Vladislav Shpilevoy (2):
  tuple: fix multikey field JSON access crash
  tuple: fix access by JSON path starting from '[*]'

 src/box/tuple.c                               |   3 +-
 src/box/tuple.h                               |   8 +
 test/box/gh-5224-multikey-field-access.result | 164 ++++++++++++++++++
 .../gh-5224-multikey-field-access.test.lua    |  72 ++++++++
 4 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 test/box/gh-5224-multikey-field-access.result
 create mode 100644 test/box/gh-5224-multikey-field-access.test.lua

-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
@ 2020-08-04 23:45 ` Vladislav Shpilevoy
  2020-08-06 16:00   ` Oleg Babin
                     ` (2 more replies)
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-04 23:45 UTC (permalink / raw)
  To: tarantool-patches, alyapunov, korablev

When a tuple had format with multikey indexes in it, any attempt
to get a multikey indexed field by a JSON path from Lua led to a
crash.

That was because of incorrect interpretation of offset slot value
in tuple's field map.

Tuple field map is an array stored before the tuple's MessagePack
data. Each element is a 4 byte offset to an indexed value to be
able to get it for O(1) time without MessagePack decoding of all
the previous fields.

At least it was so before multikeys. Now tuple field map is not
just an array. It is rather a 2-level array, somehow similar to
ext4 FS. Some elements of the root array are positive numbers
pointing at data. Some elements point at a second 'indirect'
array, so called 'extra', size of which is individual for each
tuple. These second arrays are used by multikey indexes to store
offsets to each multikey indexed value in a tuple.

It means, that if there is an offset slot, it can't be just used
as is. It is allowed only if the field is not multikey. Otherwise
it is neccessary to somehow get an index in the second 'indirect'
array.

This is what was happening - a multikey field was found, its
offset slot was valid, but it was pointing at an 'indirect' array,
not at the data. JSON tuple field access tried to use it as a data
offset.

The patch makes JSON field access degrade to fullscan when a field
is multikey, but no multikey array index is provided.

Closes #5224
---
 src/box/tuple.h                               |   8 +
 test/box/gh-5224-multikey-field-access.result | 155 ++++++++++++++++++
 .../gh-5224-multikey-field-access.test.lua    |  66 ++++++++
 3 files changed, 229 insertions(+)
 create mode 100644 test/box/gh-5224-multikey-field-access.result
 create mode 100644 test/box/gh-5224-multikey-field-access.test.lua

diff --git a/src/box/tuple.h b/src/box/tuple.h
index 4752323e4..2b48d6af9 100644
--- a/src/box/tuple.h
+++ b/src/box/tuple.h
@@ -628,6 +628,14 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			goto parse;
 		if (offset_slot_hint != NULL)
 			*offset_slot_hint = offset_slot;
+		/*
+		 * When the field is multikey, the offset slot points not at the
+		 * data. It points at 'extra' array of offsets for this multikey
+		 * index. That array can only be accessed if index in that array
+		 * is known.
+		 */
+		if (field->is_multikey_part && multikey_idx == MULTIKEY_NONE)
+			goto parse;
 offset_slot_access:
 		/* Indexed field */
 		offset = field_map_get_offset(field_map, offset_slot,
diff --git a/test/box/gh-5224-multikey-field-access.result b/test/box/gh-5224-multikey-field-access.result
new file mode 100644
index 000000000..014e810d0
--- /dev/null
+++ b/test/box/gh-5224-multikey-field-access.result
@@ -0,0 +1,155 @@
+-- test-run result file version 2
+--
+-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey
+-- indexed field.
+--
+format = {}
+ | ---
+ | ...
+format[1] = {name = 'f1', type = 'unsigned'}
+ | ---
+ | ...
+format[2] = {name = 'f2', type = 'array'}
+ | ---
+ | ...
+s = box.schema.create_space('test', {format = format})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].tags", type = "unsigned"}                       \
+    }                                                                           \
+})
+ | ---
+ | ...
+
+t = s:replace{1, {{tags = 2}}}
+ | ---
+ | ...
+t['[2][1].tags']
+ | ---
+ | - 2
+ | ...
+
+t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}}
+ | ---
+ | ...
+t['[2]']
+ | ---
+ | - [{'tags': 2}, {'tags': 3}, {'tags': 4}]
+ | ...
+t['[2][1]']
+ | ---
+ | - {'tags': 2}
+ | ...
+t['[2][1].tags']
+ | ---
+ | - 2
+ | ...
+t['[2][2]']
+ | ---
+ | - {'tags': 3}
+ | ...
+t['[2][2].tags']
+ | ---
+ | - 3
+ | ...
+t['[2][3]']
+ | ---
+ | - {'tags': 4}
+ | ...
+t['[2][3].tags']
+ | ---
+ | - 4
+ | ...
+
+s:truncate()
+ | ---
+ | ...
+s.index.sk:drop()
+ | ---
+ | ...
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].p1.p2", type = "unsigned"}                      \
+    }                                                                           \
+})
+ | ---
+ | ...
+
+t = s:replace{1, {{p1 = {p2 = 2}}}}
+ | ---
+ | ...
+t['[2][1].p1.p2']
+ | ---
+ | - 2
+ | ...
+
+t = s:replace{1, {                                                              \
+    {                                                                           \
+        p1 = {                                                                  \
+            p2 = 2, p3 = 3                                                      \
+        },                                                                      \
+        p4 = 4                                                                  \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 5}                                                           \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 6}                                                           \
+    }                                                                           \
+}}
+ | ---
+ | ...
+
+t['[2][1].p1.p2']
+ | ---
+ | - 2
+ | ...
+t['[2][1].p1.p3']
+ | ---
+ | - 3
+ | ...
+t['[2][1].p1']
+ | ---
+ | - {'p2': 2, 'p3': 3}
+ | ...
+t['[2][1]']
+ | ---
+ | - {'p4': 4, 'p1': {'p2': 2, 'p3': 3}}
+ | ...
+t['[2][1].p4']
+ | ---
+ | - 4
+ | ...
+t['[2][2].p1.p2']
+ | ---
+ | - 5
+ | ...
+t['[2][2].p1']
+ | ---
+ | - {'p2': 5}
+ | ...
+t['[2][2]']
+ | ---
+ | - {'p1': {'p2': 5}}
+ | ...
+t['[2][3].p1.p2']
+ | ---
+ | - 6
+ | ...
+t['[2][3].p1']
+ | ---
+ | - {'p2': 6}
+ | ...
+t['[2][3]']
+ | ---
+ | - {'p1': {'p2': 6}}
+ | ...
+
+s:drop()
+ | ---
+ | ...
diff --git a/test/box/gh-5224-multikey-field-access.test.lua b/test/box/gh-5224-multikey-field-access.test.lua
new file mode 100644
index 000000000..19765171e
--- /dev/null
+++ b/test/box/gh-5224-multikey-field-access.test.lua
@@ -0,0 +1,66 @@
+--
+-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey
+-- indexed field.
+--
+format = {}
+format[1] = {name = 'f1', type = 'unsigned'}
+format[2] = {name = 'f2', type = 'array'}
+s = box.schema.create_space('test', {format = format})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].tags", type = "unsigned"}                       \
+    }                                                                           \
+})
+
+t = s:replace{1, {{tags = 2}}}
+t['[2][1].tags']
+
+t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}}
+t['[2]']
+t['[2][1]']
+t['[2][1].tags']
+t['[2][2]']
+t['[2][2].tags']
+t['[2][3]']
+t['[2][3].tags']
+
+s:truncate()
+s.index.sk:drop()
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].p1.p2", type = "unsigned"}                      \
+    }                                                                           \
+})
+
+t = s:replace{1, {{p1 = {p2 = 2}}}}
+t['[2][1].p1.p2']
+
+t = s:replace{1, {                                                              \
+    {                                                                           \
+        p1 = {                                                                  \
+            p2 = 2, p3 = 3                                                      \
+        },                                                                      \
+        p4 = 4                                                                  \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 5}                                                           \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 6}                                                           \
+    }                                                                           \
+}}
+
+t['[2][1].p1.p2']
+t['[2][1].p1.p3']
+t['[2][1].p1']
+t['[2][1]']
+t['[2][1].p4']
+t['[2][2].p1.p2']
+t['[2][2].p1']
+t['[2][2]']
+t['[2][3].p1.p2']
+t['[2][3].p1']
+t['[2][3]']
+
+s:drop()
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]'
  2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
@ 2020-08-04 23:45 ` Vladislav Shpilevoy
  2020-08-10 17:52   ` Nikita Pettik
  2020-08-11 18:50   ` Aleksandr Lyapunov
  2020-08-10 10:10 ` [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Aleksandr Lyapunov
  2020-08-12 20:34 ` Vladislav Shpilevoy
  3 siblings, 2 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-04 23:45 UTC (permalink / raw)
  To: tarantool-patches, alyapunov, korablev

Tuple JSON field access crashed when '[*]' was used as a first
part of the JSON path. The patch makes it treated like 'field not
found'.

Follow-up #5224
---
 src/box/tuple.c                                 | 3 ++-
 test/box/gh-5224-multikey-field-access.result   | 9 +++++++++
 test/box/gh-5224-multikey-field-access.test.lua | 6 ++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/box/tuple.c b/src/box/tuple.c
index 9f0f24c64..e77753a5b 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -531,7 +531,8 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple,
 		break;
 	}
 	default:
-		assert(token.type == JSON_TOKEN_END);
+		assert(token.type == JSON_TOKEN_END ||
+		       token.type == JSON_TOKEN_ANY);
 		return NULL;
 	}
 	return tuple_field_raw_by_path(format, tuple, field_map, fieldno,
diff --git a/test/box/gh-5224-multikey-field-access.result b/test/box/gh-5224-multikey-field-access.result
index 014e810d0..20df44cc3 100644
--- a/test/box/gh-5224-multikey-field-access.result
+++ b/test/box/gh-5224-multikey-field-access.result
@@ -150,6 +150,15 @@ t['[2][3]']
  | - {'p1': {'p2': 6}}
  | ...
 
+--
+-- Multikey path part could crash when used as a first part of the path during
+-- accessing a tuple field.
+--
+t['[*]']
+ | ---
+ | - null
+ | ...
+
 s:drop()
  | ---
  | ...
diff --git a/test/box/gh-5224-multikey-field-access.test.lua b/test/box/gh-5224-multikey-field-access.test.lua
index 19765171e..6f6691d3c 100644
--- a/test/box/gh-5224-multikey-field-access.test.lua
+++ b/test/box/gh-5224-multikey-field-access.test.lua
@@ -63,4 +63,10 @@ t['[2][3].p1.p2']
 t['[2][3].p1']
 t['[2][3]']
 
+--
+-- Multikey path part could crash when used as a first part of the path during
+-- accessing a tuple field.
+--
+t['[*]']
+
 s:drop()
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
@ 2020-08-06 16:00   ` Oleg Babin
  2020-08-06 20:04     ` Vladislav Shpilevoy
  2020-08-10 16:09   ` Nikita Pettik
  2020-08-11  9:44   ` Aleksandr Lyapunov
  2 siblings, 1 reply; 15+ messages in thread
From: Oleg Babin @ 2020-08-06 16:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, alyapunov, korablev

Hi! Thanks for your patch. It's not a review but I have a question.

On 05/08/2020 02:45, Vladislav Shpilevoy wrote:
> When a tuple had format with multikey indexes in it, any attempt
> to get a multikey indexed field by a JSON path from Lua led to a
> crash.
>
> That was because of incorrect interpretation of offset slot value
> in tuple's field map.
>
> Tuple field map is an array stored before the tuple's MessagePack
> data. Each element is a 4 byte offset to an indexed value to be
> able to get it for O(1) time without MessagePack decoding of all
> the previous fields.
>
> At least it was so before multikeys. Now tuple field map is not
> just an array. It is rather a 2-level array, somehow similar to
> ext4 FS. Some elements of the root array are positive numbers
> pointing at data. Some elements point at a second 'indirect'
> array, so called 'extra', size of which is individual for each
> tuple. These second arrays are used by multikey indexes to store
> offsets to each multikey indexed value in a tuple.


Do json path updates use offsets? Is such issue relevant for them?

I tried to update poisoned tuple but seems it works fine. But maybe I've 
missed something.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-06 16:00   ` Oleg Babin
@ 2020-08-06 20:04     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-06 20:04 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, alyapunov, korablev

On 06.08.2020 18:00, Oleg Babin wrote:
> Hi! Thanks for your patch. It's not a review but I have a question.
> 
> On 05/08/2020 02:45, Vladislav Shpilevoy wrote:
>> When a tuple had format with multikey indexes in it, any attempt
>> to get a multikey indexed field by a JSON path from Lua led to a
>> crash.
>>
>> That was because of incorrect interpretation of offset slot value
>> in tuple's field map.
>>
>> Tuple field map is an array stored before the tuple's MessagePack
>> data. Each element is a 4 byte offset to an indexed value to be
>> able to get it for O(1) time without MessagePack decoding of all
>> the previous fields.
>>
>> At least it was so before multikeys. Now tuple field map is not
>> just an array. It is rather a 2-level array, somehow similar to
>> ext4 FS. Some elements of the root array are positive numbers
>> pointing at data. Some elements point at a second 'indirect'
>> array, so called 'extra', size of which is individual for each
>> tuple. These second arrays are used by multikey indexes to store
>> offsets to each multikey indexed value in a tuple.
> 
> 
> Do json path updates use offsets? Is such issue relevant for them?
> 
> I tried to update poisoned tuple but seems it works fine. But maybe I've missed something.

No, JSON updates always decode whole tuple, at least all fields <=
max affected field. So offsets are not used. I was thinking about
adding them, but so far there was no a request for it, nor benches
how would it help exactly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] JSON field multikey crash
  2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
@ 2020-08-10 10:10 ` Aleksandr Lyapunov
  2020-08-10 22:22   ` Vladislav Shpilevoy
  2020-08-12 20:34 ` Vladislav Shpilevoy
  3 siblings, 1 reply; 15+ messages in thread
From: Aleksandr Lyapunov @ 2020-08-10 10:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, korablev

Hi! thanks for the patch.
Btw, why do we call them 'JSON fields'? Is it supposed that it should 
work somehow
with Java Script Object Notation serialization format?

On 8/5/20 2:45 AM, Vladislav Shpilevoy wrote:
> The patchset fixes 2 crashes related to multikey in JSON path
> tuple field access code.
>
> Also during working on this I found
> https://github.com/tarantool/tarantool/issues/5226, but couldn't
> find a simple solution.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5224-tuple-field-by-path-crash
> Issue: https://github.com/tarantool/tarantool/issues/5224
>
> @ChangeLog
> * Fixed a crash when JSON tuple field access was used to get a multikey indexed field, and when a JSON contained [*] in the beginning;
>
> Vladislav Shpilevoy (2):
>    tuple: fix multikey field JSON access crash
>    tuple: fix access by JSON path starting from '[*]'
>
>   src/box/tuple.c                               |   3 +-
>   src/box/tuple.h                               |   8 +
>   test/box/gh-5224-multikey-field-access.result | 164 ++++++++++++++++++
>   .../gh-5224-multikey-field-access.test.lua    |  72 ++++++++
>   4 files changed, 246 insertions(+), 1 deletion(-)
>   create mode 100644 test/box/gh-5224-multikey-field-access.result
>   create mode 100644 test/box/gh-5224-multikey-field-access.test.lua
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
  2020-08-06 16:00   ` Oleg Babin
@ 2020-08-10 16:09   ` Nikita Pettik
  2020-08-11  9:44   ` Aleksandr Lyapunov
  2 siblings, 0 replies; 15+ messages in thread
From: Nikita Pettik @ 2020-08-10 16:09 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 05 Aug 01:45, Vladislav Shpilevoy wrote:
> When a tuple had format with multikey indexes in it, any attempt
> to get a multikey indexed field by a JSON path from Lua led to a
> crash.
> 
> That was because of incorrect interpretation of offset slot value
> in tuple's field map.
> 
> Tuple field map is an array stored before the tuple's MessagePack
> data. Each element is a 4 byte offset to an indexed value to be
> able to get it for O(1) time without MessagePack decoding of all
> the previous fields.
> 
> At least it was so before multikeys. Now tuple field map is not
> just an array. It is rather a 2-level array, somehow similar to
> ext4 FS. Some elements of the root array are positive numbers
> pointing at data. Some elements point at a second 'indirect'
> array, so called 'extra', size of which is individual for each
> tuple. These second arrays are used by multikey indexes to store
> offsets to each multikey indexed value in a tuple.
> 
> It means, that if there is an offset slot, it can't be just used
> as is. It is allowed only if the field is not multikey. Otherwise
> it is neccessary to somehow get an index in the second 'indirect'
> array.
> 
> This is what was happening - a multikey field was found, its
> offset slot was valid, but it was pointing at an 'indirect' array,
> not at the data. JSON tuple field access tried to use it as a data
> offset.
> 
> The patch makes JSON field access degrade to fullscan when a field
> is multikey, but no multikey array index is provided.
> 
> Closes #5224

LGTM

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]'
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
@ 2020-08-10 17:52   ` Nikita Pettik
  2020-08-11 18:50   ` Aleksandr Lyapunov
  1 sibling, 0 replies; 15+ messages in thread
From: Nikita Pettik @ 2020-08-10 17:52 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 05 Aug 01:45, Vladislav Shpilevoy wrote:
> Tuple JSON field access crashed when '[*]' was used as a first
> part of the JSON path. The patch makes it treated like 'field not
> found'.
> 
> Follow-up #5224

LGTM

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] JSON field multikey crash
  2020-08-10 10:10 ` [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Aleksandr Lyapunov
@ 2020-08-10 22:22   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-10 22:22 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches, korablev

Hi!

On 10.08.2020 12:10, Aleksandr Lyapunov wrote:
> Hi! thanks for the patch.
> Btw, why do we call them 'JSON fields'? Is it supposed that it should work somehow
> with Java Script Object Notation serialization format?

Yes. This is what JSON means. In context of this patch it is meant 'JSON field access'.
Access of a tuple field by a JSON path.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
  2020-08-06 16:00   ` Oleg Babin
  2020-08-10 16:09   ` Nikita Pettik
@ 2020-08-11  9:44   ` Aleksandr Lyapunov
  2020-08-11 21:24     ` Vladislav Shpilevoy
  2 siblings, 1 reply; 15+ messages in thread
From: Aleksandr Lyapunov @ 2020-08-11  9:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, korablev

Hi! thanks again for the patch. see one comment below.

On 8/5/20 2:45 AM, Vladislav Shpilevoy wrote:
>   			goto parse;
>   		if (offset_slot_hint != NULL)
>   			*offset_slot_hint = offset_slot;
> +		/*
> +		 * When the field is multikey, the offset slot points not at the
> +		 * data. It points at 'extra' array of offsets for this multikey
> +		 * index. That array can only be accessed if index in that array
> +		 * is known.
> +		 */
> +		if (field->is_multikey_part && multikey_idx == MULTIKEY_NONE)
> +			goto parse;
>   offset_slot_access:
>   		/* Indexed field */
>   		offset = field_map_get_offset(field_map, offset_slot,
I'm sure that your check must be moved for two lines up. I mean the check
must be done before setting *offset_slot_hint.

As I understood offset_slot_hint will contain a hint for further 
tuple_field_raw_by_path
calls with the same path. That is a kind of agreement, we may call 
tuple_field_raw_by_path
twice and must get the same results.

But in your code you set *offset_slot_hint before a check that could go 
to 'parse' label.
Meanwhile in the second call of tuple_field_raw_by_path it'll check 
*offset_slot_hint and
will go to 'offset_slot_access' label. That's wrong.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]'
  2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
  2020-08-10 17:52   ` Nikita Pettik
@ 2020-08-11 18:50   ` Aleksandr Lyapunov
  1 sibling, 0 replies; 15+ messages in thread
From: Aleksandr Lyapunov @ 2020-08-11 18:50 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, korablev

LGTM

On 8/5/20 2:45 AM, Vladislav Shpilevoy wrote:
> Tuple JSON field access crashed when '[*]' was used as a first
> part of the JSON path. The patch makes it treated like 'field not
> found'.
>
> Follow-up #5224
> ---
>   src/box/tuple.c                                 | 3 ++-
>   test/box/gh-5224-multikey-field-access.result   | 9 +++++++++
>   test/box/gh-5224-multikey-field-access.test.lua | 6 ++++++
>   3 files changed, 17 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-11  9:44   ` Aleksandr Lyapunov
@ 2020-08-11 21:24     ` Vladislav Shpilevoy
  2020-08-12 13:05       ` Aleksandr Lyapunov
  0 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-11 21:24 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches, korablev

Hi! Thanks for the review!

> On 8/5/20 2:45 AM, Vladislav Shpilevoy wrote:
>>               goto parse;
>>           if (offset_slot_hint != NULL)
>>               *offset_slot_hint = offset_slot;
>> +        /*
>> +         * When the field is multikey, the offset slot points not at the
>> +         * data. It points at 'extra' array of offsets for this multikey
>> +         * index. That array can only be accessed if index in that array
>> +         * is known.
>> +         */
>> +        if (field->is_multikey_part && multikey_idx == MULTIKEY_NONE)
>> +            goto parse;
>>   offset_slot_access:
>>           /* Indexed field */
>>           offset = field_map_get_offset(field_map, offset_slot,
> I'm sure that your check must be moved for two lines up. I mean the check
> must be done before setting *offset_slot_hint.
> 
> As I understood offset_slot_hint will contain a hint for further tuple_field_raw_by_path
> calls with the same path. That is a kind of agreement, we may call tuple_field_raw_by_path
> twice and must get the same results.
> 
> But in your code you set *offset_slot_hint before a check that could go to 'parse' label.
> Meanwhile in the second call of tuple_field_raw_by_path it'll check *offset_slot_hint and
> will go to 'offset_slot_access' label. That's wrong.

You would be right if not the fact that there is always a guarantee, that
if offset_slot_hint != NULL, then either multikey_idx != MULTIKEY_NONE or it is not
a multikey part. It is unreachable.

So it wouldn't be correct to put it 2 lines above, nor it wouldn't be incorrect -
it does not change anything.

But it is possible to put it *instead*. Into 'else' branch. Then it will be -1
condition check.

New patch for this file:

====================
diff --git a/src/box/tuple.h b/src/box/tuple.h
index 4752323e4..09ebeecf3 100644
--- a/src/box/tuple.h
+++ b/src/box/tuple.h
@@ -626,8 +626,28 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		offset_slot = field->offset_slot;
 		if (offset_slot == TUPLE_OFFSET_SLOT_NIL)
 			goto parse;
-		if (offset_slot_hint != NULL)
+		if (offset_slot_hint != NULL) {
 			*offset_slot_hint = offset_slot;
+			/*
+			 * Hint is never requested for a multikey field without
+			 * providing a concrete multikey index.
+			 */
+			assert(!field->is_multikey_part ||
+			       (multikey_idx != MULTIKEY_NONE &&
+				field->is_multikey_part));
+		} else if (field->is_multikey_part &&
+			   multikey_idx == MULTIKEY_NONE) {
+			/*
+			 * When the field is multikey, the offset slot points
+			 * not at the data. It points at 'extra' array of
+			 * offsets for this multikey index. That array can only
+			 * be accessed if index in that array is known. It is
+			 * not known when the field is accessed not in an index.
+			 * For example, in an application's Lua code by a JSON
+			 * path.
+			 */
+			goto parse;
+		}
 offset_slot_access:
 		/* Indexed field */
 		offset = field_map_get_offset(field_map, offset_slot,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-11 21:24     ` Vladislav Shpilevoy
@ 2020-08-12 13:05       ` Aleksandr Lyapunov
  2020-08-12 20:34         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 15+ messages in thread
From: Aleksandr Lyapunov @ 2020-08-12 13:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, korablev

Hi! thanks for the fix, as for me it became more understandable.

See one minor comment below and LGTM.

On 8/12/20 12:24 AM, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
>
> +		if (offset_slot_hint != NULL) {
>   			*offset_slot_hint = offset_slot;
> +			/*
> +			 * Hint is never requested for a multikey field without
> +			 * providing a concrete multikey index.
> +			 */
> +			assert(!field->is_multikey_part ||
> +			       (multikey_idx != MULTIKEY_NONE &&
> +				field->is_multikey_part));
The last '&& field->is_multikey_part' is excess.
> +		} else if (field->is_multikey_part &&
> +			   multikey_idx == MULTIKEY_NONE) {
> +			/*
> +			 * When the field is multikey, the offset slot points
> +			 * not at the data. It points at 'extra' array of
> +			 * offsets for this multikey index. That array can only
> +			 * be accessed if index in that array is known. It is
> +			 * not known when the field is accessed not in an index.
> +			 * For example, in an application's Lua code by a JSON
> +			 * path.
> +			 */
> +			goto parse;
> +		}
>   offset_slot_access:
>   		/* Indexed field */
>   		offset = field_map_get_offset(field_map, offset_slot,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
  2020-08-12 13:05       ` Aleksandr Lyapunov
@ 2020-08-12 20:34         ` Vladislav Shpilevoy
  0 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-12 20:34 UTC (permalink / raw)
  To: Aleksandr Lyapunov, tarantool-patches, korablev

>> +        if (offset_slot_hint != NULL) {
>>               *offset_slot_hint = offset_slot;
>> +            /*
>> +             * Hint is never requested for a multikey field without
>> +             * providing a concrete multikey index.
>> +             */
>> +            assert(!field->is_multikey_part ||
>> +                   (multikey_idx != MULTIKEY_NONE &&
>> +                field->is_multikey_part));
> The last '&& field->is_multikey_part' is excess.

Indeed, a stupid mistake. Somewhy I was sure it was needed yesterday.
Dropped now.

====================
 			assert(!field->is_multikey_part ||
-			       (multikey_idx != MULTIKEY_NONE &&
-				field->is_multikey_part));
+			       multikey_idx != MULTIKEY_NONE);
====================

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] JSON field multikey crash
  2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2020-08-10 10:10 ` [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Aleksandr Lyapunov
@ 2020-08-12 20:34 ` Vladislav Shpilevoy
  3 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-12 20:34 UTC (permalink / raw)
  To: tarantool-patches, alyapunov, korablev

Pushed to master, 2.5, 2.4.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-12 20:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
2020-08-04 23:45 ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Vladislav Shpilevoy
2020-08-06 16:00   ` Oleg Babin
2020-08-06 20:04     ` Vladislav Shpilevoy
2020-08-10 16:09   ` Nikita Pettik
2020-08-11  9:44   ` Aleksandr Lyapunov
2020-08-11 21:24     ` Vladislav Shpilevoy
2020-08-12 13:05       ` Aleksandr Lyapunov
2020-08-12 20:34         ` Vladislav Shpilevoy
2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
2020-08-10 17:52   ` Nikita Pettik
2020-08-11 18:50   ` Aleksandr Lyapunov
2020-08-10 10:10 ` [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Aleksandr Lyapunov
2020-08-10 22:22   ` Vladislav Shpilevoy
2020-08-12 20:34 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox