From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 871C6445321 for ; Wed, 5 Aug 2020 02:45:22 +0300 (MSK) From: Vladislav Shpilevoy Date: Wed, 5 Aug 2020 01:45:18 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH 1/2] tuple: fix multikey field JSON access crash List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: tarantool-patches@dev.tarantool.org, alyapunov@tarantool.org, korablev@tarantool.org When a tuple had format with multikey indexes in it, any attempt to get a multikey indexed field by a JSON path from Lua led to a crash. That was because of incorrect interpretation of offset slot value in tuple's field map. Tuple field map is an array stored before the tuple's MessagePack data. Each element is a 4 byte offset to an indexed value to be able to get it for O(1) time without MessagePack decoding of all the previous fields. At least it was so before multikeys. Now tuple field map is not just an array. It is rather a 2-level array, somehow similar to ext4 FS. Some elements of the root array are positive numbers pointing at data. Some elements point at a second 'indirect' array, so called 'extra', size of which is individual for each tuple. These second arrays are used by multikey indexes to store offsets to each multikey indexed value in a tuple. It means, that if there is an offset slot, it can't be just used as is. It is allowed only if the field is not multikey. Otherwise it is neccessary to somehow get an index in the second 'indirect' array. This is what was happening - a multikey field was found, its offset slot was valid, but it was pointing at an 'indirect' array, not at the data. JSON tuple field access tried to use it as a data offset. The patch makes JSON field access degrade to fullscan when a field is multikey, but no multikey array index is provided. Closes #5224 --- src/box/tuple.h | 8 + test/box/gh-5224-multikey-field-access.result | 155 ++++++++++++++++++ .../gh-5224-multikey-field-access.test.lua | 66 ++++++++ 3 files changed, 229 insertions(+) create mode 100644 test/box/gh-5224-multikey-field-access.result create mode 100644 test/box/gh-5224-multikey-field-access.test.lua diff --git a/src/box/tuple.h b/src/box/tuple.h index 4752323e4..2b48d6af9 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -628,6 +628,14 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, goto parse; if (offset_slot_hint != NULL) *offset_slot_hint = offset_slot; + /* + * When the field is multikey, the offset slot points not at the + * data. It points at 'extra' array of offsets for this multikey + * index. That array can only be accessed if index in that array + * is known. + */ + if (field->is_multikey_part && multikey_idx == MULTIKEY_NONE) + goto parse; offset_slot_access: /* Indexed field */ offset = field_map_get_offset(field_map, offset_slot, diff --git a/test/box/gh-5224-multikey-field-access.result b/test/box/gh-5224-multikey-field-access.result new file mode 100644 index 000000000..014e810d0 --- /dev/null +++ b/test/box/gh-5224-multikey-field-access.result @@ -0,0 +1,155 @@ +-- test-run result file version 2 +-- +-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey +-- indexed field. +-- +format = {} + | --- + | ... +format[1] = {name = 'f1', type = 'unsigned'} + | --- + | ... +format[2] = {name = 'f2', type = 'array'} + | --- + | ... +s = box.schema.create_space('test', {format = format}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... +_ = s:create_index('sk', { \ + parts = { \ + {field = 2, path = "[*].tags", type = "unsigned"} \ + } \ +}) + | --- + | ... + +t = s:replace{1, {{tags = 2}}} + | --- + | ... +t['[2][1].tags'] + | --- + | - 2 + | ... + +t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}} + | --- + | ... +t['[2]'] + | --- + | - [{'tags': 2}, {'tags': 3}, {'tags': 4}] + | ... +t['[2][1]'] + | --- + | - {'tags': 2} + | ... +t['[2][1].tags'] + | --- + | - 2 + | ... +t['[2][2]'] + | --- + | - {'tags': 3} + | ... +t['[2][2].tags'] + | --- + | - 3 + | ... +t['[2][3]'] + | --- + | - {'tags': 4} + | ... +t['[2][3].tags'] + | --- + | - 4 + | ... + +s:truncate() + | --- + | ... +s.index.sk:drop() + | --- + | ... +_ = s:create_index('sk', { \ + parts = { \ + {field = 2, path = "[*].p1.p2", type = "unsigned"} \ + } \ +}) + | --- + | ... + +t = s:replace{1, {{p1 = {p2 = 2}}}} + | --- + | ... +t['[2][1].p1.p2'] + | --- + | - 2 + | ... + +t = s:replace{1, { \ + { \ + p1 = { \ + p2 = 2, p3 = 3 \ + }, \ + p4 = 4 \ + }, \ + { \ + p1 = {p2 = 5} \ + }, \ + { \ + p1 = {p2 = 6} \ + } \ +}} + | --- + | ... + +t['[2][1].p1.p2'] + | --- + | - 2 + | ... +t['[2][1].p1.p3'] + | --- + | - 3 + | ... +t['[2][1].p1'] + | --- + | - {'p2': 2, 'p3': 3} + | ... +t['[2][1]'] + | --- + | - {'p4': 4, 'p1': {'p2': 2, 'p3': 3}} + | ... +t['[2][1].p4'] + | --- + | - 4 + | ... +t['[2][2].p1.p2'] + | --- + | - 5 + | ... +t['[2][2].p1'] + | --- + | - {'p2': 5} + | ... +t['[2][2]'] + | --- + | - {'p1': {'p2': 5}} + | ... +t['[2][3].p1.p2'] + | --- + | - 6 + | ... +t['[2][3].p1'] + | --- + | - {'p2': 6} + | ... +t['[2][3]'] + | --- + | - {'p1': {'p2': 6}} + | ... + +s:drop() + | --- + | ... diff --git a/test/box/gh-5224-multikey-field-access.test.lua b/test/box/gh-5224-multikey-field-access.test.lua new file mode 100644 index 000000000..19765171e --- /dev/null +++ b/test/box/gh-5224-multikey-field-access.test.lua @@ -0,0 +1,66 @@ +-- +-- gh-5224: tuple field access by JSON path crashed when tried to get a multikey +-- indexed field. +-- +format = {} +format[1] = {name = 'f1', type = 'unsigned'} +format[2] = {name = 'f2', type = 'array'} +s = box.schema.create_space('test', {format = format}) +_ = s:create_index('pk') +_ = s:create_index('sk', { \ + parts = { \ + {field = 2, path = "[*].tags", type = "unsigned"} \ + } \ +}) + +t = s:replace{1, {{tags = 2}}} +t['[2][1].tags'] + +t = s:replace{1, {{tags = 2}, {tags = 3}, {tags = 4}}} +t['[2]'] +t['[2][1]'] +t['[2][1].tags'] +t['[2][2]'] +t['[2][2].tags'] +t['[2][3]'] +t['[2][3].tags'] + +s:truncate() +s.index.sk:drop() +_ = s:create_index('sk', { \ + parts = { \ + {field = 2, path = "[*].p1.p2", type = "unsigned"} \ + } \ +}) + +t = s:replace{1, {{p1 = {p2 = 2}}}} +t['[2][1].p1.p2'] + +t = s:replace{1, { \ + { \ + p1 = { \ + p2 = 2, p3 = 3 \ + }, \ + p4 = 4 \ + }, \ + { \ + p1 = {p2 = 5} \ + }, \ + { \ + p1 = {p2 = 6} \ + } \ +}} + +t['[2][1].p1.p2'] +t['[2][1].p1.p3'] +t['[2][1].p1'] +t['[2][1]'] +t['[2][1].p4'] +t['[2][2].p1.p2'] +t['[2][2].p1'] +t['[2][2]'] +t['[2][3].p1.p2'] +t['[2][3].p1'] +t['[2][3]'] + +s:drop() -- 2.21.1 (Apple Git-122.3)