Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@dev.tarantool.org, alyapunov@tarantool.org,
	korablev@tarantool.org
Subject: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash
Date: Wed,  5 Aug 2020 01:45:18 +0200	[thread overview]
Message-ID: <f1b60f4011e19aa04c0ff7d49c82e46955a968fc.1596584571.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1596584571.git.v.shpilevoy@tarantool.org>

When a tuple had format with multikey indexes in it, any attempt
to get a multikey indexed field by a JSON path from Lua led to a
crash.

That was because of incorrect interpretation of offset slot value
in tuple's field map.

Tuple field map is an array stored before the tuple's MessagePack
data. Each element is a 4 byte offset to an indexed value to be
able to get it for O(1) time without MessagePack decoding of all
the previous fields.

At least it was so before multikeys. Now tuple field map is not
just an array. It is rather a 2-level array, somehow similar to
ext4 FS. Some elements of the root array are positive numbers
pointing at data. Some elements point at a second 'indirect'
array, so called 'extra', size of which is individual for each
tuple. These second arrays are used by multikey indexes to store
offsets to each multikey indexed value in a tuple.

It means, that if there is an offset slot, it can't be just used
as is. It is allowed only if the field is not multikey. Otherwise
it is neccessary to somehow get an index in the second 'indirect'
array.

This is what was happening - a multikey field was found, its
offset slot was valid, but it was pointing at an 'indirect' array,
not at the data. JSON tuple field access tried to use it as a data
offset.

The patch makes JSON field access degrade to fullscan when a field
is multikey, but no multikey array index is provided.

Closes #5224
---
 src/box/tuple.h                               |   8 +
 test/box/gh-5224-multikey-field-access.result | 155 ++++++++++++++++++
 .../gh-5224-multikey-field-access.test.lua    |  66 ++++++++
 3 files changed, 229 insertions(+)
 create mode 100644 test/box/gh-5224-multikey-field-access.result
 create mode 100644 test/box/gh-5224-multikey-field-access.test.lua

diff --git a/src/box/tuple.h b/src/box/tuple.h
index 4752323e4..2b48d6af9 100644
--- a/src/box/tuple.h
+++ b/src/box/tuple.h
@@ -628,6 +628,14 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			goto parse;
 		if (offset_slot_hint != NULL)
 			*offset_slot_hint = offset_slot;
+		/*
+		 * When the field is multikey, the offset slot points not at the
+		 * data. It points at 'extra' array of offsets for this multikey
+		 * index. That array can only be accessed if index in that array
+		 * is known.
+		 */
+		if (field->is_multikey_part && multikey_idx == MULTIKEY_NONE)
+			goto parse;
 offset_slot_access:
 		/* Indexed field */
 		offset = field_map_get_offset(field_map, offset_slot,
diff --git a/test/box/gh-5224-multikey-field-access.result b/test/box/gh-5224-multikey-field-access.result
new file mode 100644
index 000000000..014e810d0
--- /dev/null
+++ b/test/box/gh-5224-multikey-field-access.result
@@ -0,0 +1,155 @@
+-- test-run result file version 2
+--
+-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey
+-- indexed field.
+--
+format = {}
+ | ---
+ | ...
+format[1] = {name = 'f1', type = 'unsigned'}
+ | ---
+ | ...
+format[2] = {name = 'f2', type = 'array'}
+ | ---
+ | ...
+s = box.schema.create_space('test', {format = format})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].tags", type = "unsigned"}                       \
+    }                                                                           \
+})
+ | ---
+ | ...
+
+t = s:replace{1, {{tags = 2}}}
+ | ---
+ | ...
+t['[2][1].tags']
+ | ---
+ | - 2
+ | ...
+
+t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}}
+ | ---
+ | ...
+t['[2]']
+ | ---
+ | - [{'tags': 2}, {'tags': 3}, {'tags': 4}]
+ | ...
+t['[2][1]']
+ | ---
+ | - {'tags': 2}
+ | ...
+t['[2][1].tags']
+ | ---
+ | - 2
+ | ...
+t['[2][2]']
+ | ---
+ | - {'tags': 3}
+ | ...
+t['[2][2].tags']
+ | ---
+ | - 3
+ | ...
+t['[2][3]']
+ | ---
+ | - {'tags': 4}
+ | ...
+t['[2][3].tags']
+ | ---
+ | - 4
+ | ...
+
+s:truncate()
+ | ---
+ | ...
+s.index.sk:drop()
+ | ---
+ | ...
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].p1.p2", type = "unsigned"}                      \
+    }                                                                           \
+})
+ | ---
+ | ...
+
+t = s:replace{1, {{p1 = {p2 = 2}}}}
+ | ---
+ | ...
+t['[2][1].p1.p2']
+ | ---
+ | - 2
+ | ...
+
+t = s:replace{1, {                                                              \
+    {                                                                           \
+        p1 = {                                                                  \
+            p2 = 2, p3 = 3                                                      \
+        },                                                                      \
+        p4 = 4                                                                  \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 5}                                                           \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 6}                                                           \
+    }                                                                           \
+}}
+ | ---
+ | ...
+
+t['[2][1].p1.p2']
+ | ---
+ | - 2
+ | ...
+t['[2][1].p1.p3']
+ | ---
+ | - 3
+ | ...
+t['[2][1].p1']
+ | ---
+ | - {'p2': 2, 'p3': 3}
+ | ...
+t['[2][1]']
+ | ---
+ | - {'p4': 4, 'p1': {'p2': 2, 'p3': 3}}
+ | ...
+t['[2][1].p4']
+ | ---
+ | - 4
+ | ...
+t['[2][2].p1.p2']
+ | ---
+ | - 5
+ | ...
+t['[2][2].p1']
+ | ---
+ | - {'p2': 5}
+ | ...
+t['[2][2]']
+ | ---
+ | - {'p1': {'p2': 5}}
+ | ...
+t['[2][3].p1.p2']
+ | ---
+ | - 6
+ | ...
+t['[2][3].p1']
+ | ---
+ | - {'p2': 6}
+ | ...
+t['[2][3]']
+ | ---
+ | - {'p1': {'p2': 6}}
+ | ...
+
+s:drop()
+ | ---
+ | ...
diff --git a/test/box/gh-5224-multikey-field-access.test.lua b/test/box/gh-5224-multikey-field-access.test.lua
new file mode 100644
index 000000000..19765171e
--- /dev/null
+++ b/test/box/gh-5224-multikey-field-access.test.lua
@@ -0,0 +1,66 @@
+--
+-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey
+-- indexed field.
+--
+format = {}
+format[1] = {name = 'f1', type = 'unsigned'}
+format[2] = {name = 'f2', type = 'array'}
+s = box.schema.create_space('test', {format = format})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].tags", type = "unsigned"}                       \
+    }                                                                           \
+})
+
+t = s:replace{1, {{tags = 2}}}
+t['[2][1].tags']
+
+t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}}
+t['[2]']
+t['[2][1]']
+t['[2][1].tags']
+t['[2][2]']
+t['[2][2].tags']
+t['[2][3]']
+t['[2][3].tags']
+
+s:truncate()
+s.index.sk:drop()
+_ = s:create_index('sk', {                                                      \
+    parts = {                                                                   \
+        {field = 2, path = "[*].p1.p2", type = "unsigned"}                      \
+    }                                                                           \
+})
+
+t = s:replace{1, {{p1 = {p2 = 2}}}}
+t['[2][1].p1.p2']
+
+t = s:replace{1, {                                                              \
+    {                                                                           \
+        p1 = {                                                                  \
+            p2 = 2, p3 = 3                                                      \
+        },                                                                      \
+        p4 = 4                                                                  \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 5}                                                           \
+    },                                                                          \
+    {                                                                           \
+        p1 = {p2 = 6}                                                           \
+    }                                                                           \
+}}
+
+t['[2][1].p1.p2']
+t['[2][1].p1.p3']
+t['[2][1].p1']
+t['[2][1]']
+t['[2][1].p4']
+t['[2][2].p1.p2']
+t['[2][2].p1']
+t['[2][2]']
+t['[2][3].p1.p2']
+t['[2][3].p1']
+t['[2][3]']
+
+s:drop()
-- 
2.21.1 (Apple Git-122.3)

  reply	other threads:[~2020-08-04 23:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-04 23:45 [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Vladislav Shpilevoy
2020-08-04 23:45 ` Vladislav Shpilevoy [this message]
2020-08-06 16:00   ` [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash Oleg Babin
2020-08-06 20:04     ` Vladislav Shpilevoy
2020-08-10 16:09   ` Nikita Pettik
2020-08-11  9:44   ` Aleksandr Lyapunov
2020-08-11 21:24     ` Vladislav Shpilevoy
2020-08-12 13:05       ` Aleksandr Lyapunov
2020-08-12 20:34         ` Vladislav Shpilevoy
2020-08-04 23:45 ` [Tarantool-patches] [PATCH 2/2] tuple: fix access by JSON path starting from '[*]' Vladislav Shpilevoy
2020-08-10 17:52   ` Nikita Pettik
2020-08-11 18:50   ` Aleksandr Lyapunov
2020-08-10 10:10 ` [Tarantool-patches] [PATCH 0/2] JSON field multikey crash Aleksandr Lyapunov
2020-08-10 22:22   ` Vladislav Shpilevoy
2020-08-12 20:34 ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f1b60f4011e19aa04c0ff7d49c82e46955a968fc.1596584571.git.v.shpilevoy@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=alyapunov@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox