From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 9132570361; Wed, 10 Feb 2021 12:01:44 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 9132570361 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1612947704; bh=4NoEEsL1UyJCER3cyqdfmBq+qCh/cpbhQ9jTHzx7zAQ=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=hJtRQEADwuZoeBJuBEtn1PlzmUhBiwvcqo9Tu82mfHo68P7hDYzjfWNdeFyobTARU 9Mn0WHC5tlrXsG20ZBVyVk0t2ce/A8wWvJ3IUnQXN9g5qUV0kSSFAYz81L4AeE/dw4 cAjty6Dtvl9tMluk2yhN07NeaY1G8hNjZ8V5uWa0= Received: from smtp48.i.mail.ru (smtp48.i.mail.ru [94.100.177.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id B5440686F4 for ; Wed, 10 Feb 2021 12:01:16 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org B5440686F4 Received: by smtp48.i.mail.ru with esmtpa (envelope-from ) id 1l9lN2-0000oE-2L; Wed, 10 Feb 2021 12:01:13 +0300 To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, yaroslav.dynnikov@tarantool.org References: <259e9595aefe7a28af13eb6dd336ea8f145c2112.1612914070.git.v.shpilevoy@tarantool.org> Message-ID: <805dc520-0146-6fb0-29ec-f621bc8d2a8b@tarantool.org> Date: Wed, 10 Feb 2021 12:01:11 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <259e9595aefe7a28af13eb6dd336ea8f145c2112.1612914070.git.v.shpilevoy@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB X-7564579A: 78E4E2B564C1792B X-77F55803: 4F1203BC0FB41BD953AC099BC0052A9C55152E5B36F70094641E3870F0CF5304182A05F538085040EE530102C8D5BD0A2FC11427CC62DAA9635A3B31AD6160CE1E032F9EF391CB72 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE787D390A33DAA1DB5EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006379FED03E15F715FAA8638F802B75D45FF5571747095F342E8C7A0BC55FA0FE5FC8767A9B625D33EB455780D08376383C3F3DB80C94A75BD6F389733CBF5DBD5E913377AFFFEAFD269A417C69337E82CC2CC7F00164DA146DAFE8445B8C89999729449624AB7ADAF37F6B57BC7E64490611E7FA7ABCAF51C92176DF2183F8FC7C07E7E81EEA8A9722B8941B15DA834481F9449624AB7ADAF37BA3038C0950A5D3613377AFFFEAFD2691661749BA6B977356C4D8D31712B69767B076A6E789B0E97A8DF7F3B2552694A1E7802607F20496D49FD398EE364050F652FD71AFB96DC7D1DCA69B2CC609F68B3661434B16C20AC78D18283394535A975ECD9A6C639B01BC09775C1D3CA48CF32C16DEEF884694935872C767BF85DA22EF20D2F80756B5F40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE7F9581457D50944AB731C566533BA786A40A5AABA2AD371193C9F3DD0FB1AF5EB82E77451A5C57BD33C9F3DD0FB1AF5EB4E70A05D1297E1BBCB5012B2E24CD356 X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A24A6D60772A99906F8E1CD14B953EB46DA92548729199E262355D89D7DBCDD132 X-C1DE0DAB: 0D63561A33F958A523DD49E637DBEB419CF86E82F3F9B690C3C54E583E6C4942D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75F04B387B5D7535DE410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D341776B9FDE05FBA7A45F1E1399606E062182495C130AC65DC3B8B70F3F1BB09A1CFECE7B1C43AFB3C1D7E09C32AA3244C3B2098975B80365E30FE86D9CE8EB23424AF4FAF06DA24FDFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojmMmQ+JvDeDFPMo9RAFJHvg== X-Mailru-Sender: E02534FE7B43636DC6DBEBE9641397943AA742D895A00E46086B1F65EEDF2E84D81B476BA929688923E75C7104EB1B885DEE61814008E47C7013064206BFB89F93956FB04BA385BE9437F6177E88F7363CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Oleg Babin via Tarantool-patches Reply-To: Oleg Babin Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Thanks for your patch. Shouldn't it be added to storage "MODULE_INTERNALS" ? LGTM. One comment below. On 10/02/2021 02:46, Vladislav Shpilevoy wrote: > Lua does not have a built-in standard library for binary heaps > (also called priority queues). There is an implementation in > Tarantool core in libsalad, but it is in C. > > Heap is a perfect storage for the soon coming feature map-reduce. > In the map-reduce algorithm it will be necessary to be able to > lock an entire storage against any bucket moves for time <= > specified timeout. Number of map-reduce requests can be big, and > they can have different timeouts. > > So there is a pile of timeouts from different requests. It is > necessary to be able to quickly add new ones, be able to delete > random ones, and remove expired ones. > > One way would be a sorted array of the deadlines. Unfortunately, > it is super slow. O(N + log(N)) to add a new element (find place > for log(N) and move all next elements for N), O(N) to delete a > random one (move all next elements one cell left/right). > > Another way would be a sorted tree. But trees like RB or a dumb > binary tree require extra steps to keep them balanced and to have > access to the smallest element ASAP. > > The best way is the binary heap. It is perfectly balanced by > design meaning that all operations there have complexity at most > O(log(N)). It is possible to find the closest deadline for > constant time as it is the heap's top. > > This patch implements it. The heap is intrusive. It means it > stores index of each element right inside of the element as a > field 'index'. Having an index along with each element allows to > delete it from the heap for O(log(N)) without necessity to look > its place up first. > > Part of #147 > --- > test/unit-tap/heap.test.lua | 310 ++++++++++++++++++++++++++++++++++++ > test/unit-tap/suite.ini | 4 + > vshard/heap.lua | 226 ++++++++++++++++++++++++++ > 3 files changed, 540 insertions(+) > create mode 100755 test/unit-tap/heap.test.lua > create mode 100644 test/unit-tap/suite.ini > create mode 100644 vshard/heap.lua > > diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua > new file mode 100755 > index 0000000..8c3819f > --- /dev/null > +++ b/test/unit-tap/heap.test.lua > @@ -0,0 +1,310 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local test = tap.test("cfg") > +local heap = require('vshard.heap') > + Maybe it's better to use single brackets everywhere: test("cfg") -> test('cfg'). Or does such difference have some sense? > +-- > +-- Max number of heap to test. Number of iterations in the test > +-- grows as a factorial of this value. At 10 the test becomes > +-- too long already. > +-- > +local heap_size = 8 > + > +-- > +-- Type of the object stored in the intrusive heap. > +-- > +local function min_heap_cmp(l, r) > + return l.value < r.value > +end > + > +local function max_heap_cmp(l, r) > + return l.value > r.value > +end > + > +local function new_object(value) > + return {value = value} > +end > + > +local function heap_check_indexes(heap) > + local count = heap:count() > + local data = heap.data > + for i = 1, count do > + assert(data[i].index == i) > + end > +end > + > +local function reverse(values, i1, i2) > + while i1 < i2 do > + values[i1], values[i2] = values[i2], values[i1] > + i1 = i1 + 1 > + i2 = i2 - 1 > + end > +end > + > +-- > +-- Implementation of std::next_permutation() from C++. > +-- > +local function next_permutation(values) > + local count = #values > + if count <= 1 then > + return false > + end > + local i = count > + while true do > + local j = i > + i = i - 1 > + if values[i] < values[j] then > + local k = count > + while values[i] >= values[k] do > + k = k - 1 > + end > + values[i], values[k] = values[k], values[i] > + reverse(values, j, count) > + return true > + end > + if i == 1 then > + reverse(values, 1, count) > + return false > + end > + end > +end > + > +local function range(count) > + local res = {} > + for i = 1, count do > + res[i] = i > + end > + return res > +end > + > +-- > +-- Min heap fill and empty. > +-- > +local function test_min_heap_basic(test) > + test:plan(1) > + > + local h = heap.new(min_heap_cmp) > + assert(not h:pop()) > + assert(h:count() == 0) > + local values = {} > + for i = 1, heap_size do > + values[i] = new_object(i) > + end > + for counti = 1, heap_size do > + local indexes = range(counti) > + repeat > + for i = 1, counti do > + h:push(values[indexes[i]]) > + end > + heap_check_indexes(h) > + assert(h:count() == counti) > + for i = 1, counti do > + assert(h:top() == values[i]) > + assert(h:pop() == values[i]) > + heap_check_indexes(h) > + end > + assert(not h:pop()) > + assert(h:count() == 0) > + until not next_permutation(indexes) > + end > + > + test:ok(true, "no asserts") > +end > + > +-- > +-- Max heap fill and empty. > +-- > +local function test_max_heap_basic(test) > + test:plan(1) > + > + local h = heap.new(max_heap_cmp) > + assert(not h:pop()) > + assert(h:count() == 0) > + local values = {} > + for i = 1, heap_size do > + values[i] = new_object(heap_size - i + 1) > + end > + for counti = 1, heap_size do > + local indexes = range(counti) > + repeat > + for i = 1, counti do > + h:push(values[indexes[i]]) > + end > + heap_check_indexes(h) > + assert(h:count() == counti) > + for i = 1, counti do > + assert(h:top() == values[i]) > + assert(h:pop() == values[i]) > + heap_check_indexes(h) > + end > + assert(not h:pop()) > + assert(h:count() == 0) > + until not next_permutation(indexes) > + end > + > + test:ok(true, "no asserts") > +end > + > +-- > +-- Min heap update top element. > +-- > +local function test_min_heap_update_top(test) > + test:plan(1) > + > + local h = heap.new(min_heap_cmp) > + for counti = 1, heap_size do > + local indexes = range(counti) > + repeat > + local values = {} > + for i = 1, counti do > + values[i] = new_object(0) > + h:push(values[i]) > + end > + heap_check_indexes(h) > + for i = 1, counti do > + h:top().value = indexes[i] > + h:update_top() > + end > + heap_check_indexes(h) > + assert(h:count() == counti) > + for i = 1, counti do > + assert(h:top().value == i) > + assert(h:pop().value == i) > + heap_check_indexes(h) > + end > + assert(not h:pop()) > + assert(h:count() == 0) > + until not next_permutation(indexes) > + end > + > + test:ok(true, "no asserts") > +end > + > +-- > +-- Min heap update all elements in all possible positions. > +-- > +local function test_min_heap_update(test) > + test:plan(1) > + > + local h = heap.new(min_heap_cmp) > + for counti = 1, heap_size do > + for srci = 1, counti do > + local endv = srci * 10 + 5 > + for newv = 5, endv, 5 do > + local values = {} > + for i = 1, counti do > + values[i] = new_object(i * 10) > + h:push(values[i]) > + end > + heap_check_indexes(h) > + local obj = values[srci] > + obj.value = newv > + h:update(obj) > + assert(obj.index >= 1) > + assert(obj.index <= counti) > + local prev = -1 > + for i = 1, counti do > + obj = h:pop() > + assert(obj.index == -1) > + assert(obj.value >= prev) > + assert(obj.value >= 1) > + prev = obj.value > + obj.value = -1 > + heap_check_indexes(h) > + end > + assert(not h:pop()) > + assert(h:count() == 0) > + end > + end > + end > + > + test:ok(true, "no asserts") > +end > + > +-- > +-- Max heap delete all elements from all possible positions. > +-- > +local function test_max_heap_delete(test) > + test:plan(1) > + > + local h = heap.new(max_heap_cmp) > + local inf = heap_size + 1 > + for counti = 1, heap_size do > + for srci = 1, counti do > + local values = {} > + for i = 1, counti do > + values[i] = new_object(i) > + h:push(values[i]) > + end > + heap_check_indexes(h) > + local obj = values[srci] > + obj.value = inf > + h:remove(obj) > + assert(obj.index == -1) > + local prev = inf > + for i = 2, counti do > + obj = h:pop() > + assert(obj.index == -1) > + assert(obj.value < prev) > + assert(obj.value >= 1) > + prev = obj.value > + obj.value = -1 > + heap_check_indexes(h) > + end > + assert(not h:pop()) > + assert(h:count() == 0) > + end > + end > + > + test:ok(true, "no asserts") > +end > + > +local function test_min_heap_remove_top(test) > + test:plan(1) > + > + local h = heap.new(min_heap_cmp) > + for i = 1, heap_size do > + h:push(new_object(i)) > + end > + for i = 1, heap_size do > + assert(h:top().value == i) > + h:remove_top() > + end > + assert(h:count() == 0) > + > + test:ok(true, "no asserts") > +end > + > +local function test_max_heap_remove_try(test) > + test:plan(1) > + > + local h = heap.new(max_heap_cmp) > + local obj = new_object(1) > + assert(obj.index == nil) > + h:remove_try(obj) > + assert(h:count() == 0) > + > + h:push(obj) > + h:push(new_object(2)) > + assert(obj.index == 2) > + h:remove(obj) > + assert(obj.index == -1) > + h:remove_try(obj) > + assert(obj.index == -1) > + assert(h:count() == 1) > + > + test:ok(true, "no asserts") > +end > + > +test:plan(7) > + > +test:test('min_heap_basic', test_min_heap_basic) > +test:test('max_heap_basic', test_max_heap_basic) > +test:test('min_heap_update_top', test_min_heap_update_top) > +test:test('min heap update', test_min_heap_update) > +test:test('max heap delete', test_max_heap_delete) > +test:test('min heap remove top', test_min_heap_remove_top) > +test:test('max heap remove try', test_max_heap_remove_try) > + > +os.exit(test:check() and 0 or 1) > diff --git a/test/unit-tap/suite.ini b/test/unit-tap/suite.ini > new file mode 100644 > index 0000000..f365b69 > --- /dev/null > +++ b/test/unit-tap/suite.ini > @@ -0,0 +1,4 @@ > +[default] > +core = app > +description = Unit tests TAP > +is_parallel = True > diff --git a/vshard/heap.lua b/vshard/heap.lua > new file mode 100644 > index 0000000..78c600a > --- /dev/null > +++ b/vshard/heap.lua > @@ -0,0 +1,226 @@ > +local math_floor = math.floor > + > +-- > +-- Implementation of a typical algorithm of the binary heap. > +-- The heap is intrusive - it stores index of each element inside of it. It > +-- allows to update and delete elements in any place in the heap, not only top > +-- elements. > +-- > + > +local function heap_parent_index(index) > + return math_floor(index / 2) > +end > + > +local function heap_left_child_index(index) > + return index * 2 > +end > + > +-- > +-- Generate a new heap. > +-- > +-- The implementation is targeted on as few index accesses as possible. > +-- Everything what could be is stored as upvalue variables instead of as indexes > +-- in a table. What couldn't be an upvalue and is used in a function more than > +-- once is saved on the stack. > +-- > +local function heap_new(is_left_above) > + -- Having it as an upvalue allows not to do 'self.data' lookup in each > + -- function. > + local data = {} > + -- Saves #data calculation. In Lua it is not just reading a number. > + local count = 0 > + > + local function heap_update_index_up(idx) > + if idx == 1 then > + return false > + end > + > + local orig_idx = idx > + local value = data[idx] > + local pidx = heap_parent_index(idx) > + local parent = data[pidx] > + while is_left_above(value, parent) do > + data[idx] = parent > + parent.index = idx > + idx = pidx > + if idx == 1 then > + break > + end > + pidx = heap_parent_index(idx) > + parent = data[pidx] > + end > + > + if idx == orig_idx then > + return false > + end > + data[idx] = value > + value.index = idx > + return true > + end > + > + local function heap_update_index_down(idx) > + local left_idx = heap_left_child_index(idx) > + if left_idx > count then > + return false > + end > + > + local orig_idx = idx > + local left > + local right > + local right_idx = left_idx + 1 > + local top > + local top_idx > + local value = data[idx] > + repeat > + right_idx = left_idx + 1 > + if right_idx > count then > + top = data[left_idx] > + if is_left_above(value, top) then > + break > + end > + top_idx = left_idx > + else > + left = data[left_idx] > + right = data[right_idx] > + if is_left_above(left, right) then > + if is_left_above(value, left) then > + break > + end > + top_idx = left_idx > + top = left > + else > + if is_left_above(value, right) then > + break > + end > + top_idx = right_idx > + top = right > + end > + end > + > + data[idx] = top > + top.index = idx > + idx = top_idx > + left_idx = heap_left_child_index(idx) > + until left_idx > count > + > + if idx == orig_idx then > + return false > + end > + data[idx] = value > + value.index = idx > + return true > + end > + > + local function heap_update_index(idx) > + if not heap_update_index_up(idx) then > + heap_update_index_down(idx) > + end > + end > + > + local function heap_push(self, value) > + count = count + 1 > + data[count] = value > + value.index = count > + heap_update_index_up(count) > + end > + > + local function heap_update_top(self) > + heap_update_index_down(1) > + end > + > + local function heap_update(self, value) > + heap_update_index(value.index) > + end > + > + local function heap_remove_top(self) > + if count == 0 then > + return > + end > + data[1].index = -1 > + if count == 1 then > + data[1] = nil > + count = 0 > + return > + end > + local value = data[count] > + data[count] = nil > + data[1] = value > + value.index = 1 > + count = count - 1 > + heap_update_index_down(1) > + end > + > + local function heap_remove(self, value) > + local idx = value.index > + value.index = -1 > + if idx == count then > + data[count] = nil > + count = count - 1 > + return > + end > + value = data[count] > + data[idx] = value > + data[count] = nil > + value.index = idx > + count = count - 1 > + heap_update_index(idx) > + end > + > + local function heap_remove_try(self, value) > + local idx = value.index > + if idx and idx > 0 then > + heap_remove(self, value) > + end > + end > + > + local function heap_pop(self) > + if count == 0 then > + return > + end > + -- Some duplication from remove_top, but allows to save a few > + -- condition checks, index accesses, and a function call. > + local res = data[1] > + res.index = -1 > + if count == 1 then > + data[1] = nil > + count = 0 > + return res > + end > + local value = data[count] > + data[count] = nil > + data[1] = value > + value.index = 1 > + count = count - 1 > + heap_update_index_down(1) > + return res > + end > + > + local function heap_top(self) > + return data[1] > + end > + > + local function heap_count(self) > + return count > + end > + > + return setmetatable({ > + -- Expose the data. For testing. > + data = data, > + }, { > + __index = { > + push = heap_push, > + update_top = heap_update_top, > + remove_top = heap_remove_top, > + pop = heap_pop, > + update = heap_update, > + remove = heap_remove, > + remove_try = heap_remove_try, > + top = heap_top, > + count = heap_count, > + } > + }) > +end > + > +return { > + new = heap_new, > +}