Tarantool development patches archive
 help / color / mirror / Atom feed
From: Nikita Pettik <korablev@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] vinyl: add NULL check of xrow_upsert_execute() retval
Date: Wed, 8 Jul 2020 12:53:27 +0000	[thread overview]
Message-ID: <20200708125327.GE26461@tarantool.org> (raw)
In-Reply-To: <7445b38c-f664-ca79-bb05-73a73ddc4d6d@tarantool.org>

On 29 May 23:24, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> While the patch is obviously correct (we need to check NULL
> for sure), it solves the problem only partially, and creates
> another.

Okay, I suggest following. Let's push patch with minor test
changes as is (in scope of 1.10.7 release), but leave issue open.
As a result, we will get rid of crash, and postpone a bit
reconsideration of upsert application till 1.10.8.
We are going to rework upsert (according to the plan defined in #5107
https://github.com/tarantool/tarantool/issues/5107). Here's changed
test (https://github.com/tarantool/tarantool/tree/np/gh-4957-master):

diff --git a/test/vinyl/gh-4957-too-many-upserts.test.lua b/test/vinyl/gh-4957-too-many-upserts.test.lua
new file mode 100644
index 000000000..e5adfe41c
--- /dev/null
+++ b/test/vinyl/gh-4957-too-many-upserts.test.lua
@@ -0,0 +1,38 @@
+s = box.schema.create_space('test', {engine = 'vinyl'})
+pk = s:create_index('pk')
+s:insert{1, 1}
+box.snapshot()
+
+-- Let's test number of upserts in one transaction that exceeds
+-- the limit of operations allowed in one update.
+--
+ups_cnt = 5000
+box.begin()
+for i = 1, ups_cnt do s:upsert({1}, {{'&', 2, 1}}) end
+box.commit()
+-- Upserts are not able to squash, so scheduler will get stuck.
+-- So let's not waste much time here, just check that no crash
+-- takes place.
+--
+box.snapshot()
+
+fiber = require('fiber')
+fiber.sleep(0.01)
+
+s:drop()
+
+s = box.schema.create_space('test', {engine = 'vinyl'})
+pk = s:create_index('pk')
+
+tuple = {}
+for i = 1, ups_cnt do tuple[i] = i end
+_ = s:insert(tuple)
+box.snapshot()
+
+box.begin()
+for k = 1, ups_cnt do s:upsert({1}, {{'+', k, 1}}) end
+box.commit()
+box.snapshot()
+fiber.sleep(0.01)
+

Are you guys okay with this suggestion?
 
> We discussed that verbally, and here is a short resume of what
> is happening in the patch, and where we have a tricky problem:
> if there are 2 perfectly valid upserts, each with 2.5k operations,
> and they are merged into one, both of them are skipped, because
> after merge they become too fat - opcount > 4k.
> 
> It looks at first that this can only happen when field count > 4k,
> because otherwise all the operations would be squashed into something
> smaller or equal than field count, but it is not. There are a few
> cases, when even after squash total operation count will be bigger
> than field count:
> 
> 1) operations are complex - ':', '&', '|', '^', '#', '!'. The last
> two operations are actually used by people. These operations are not
> squashed. The last one - '!' - can't be squashed even in theory.
> 
> 2) operations have negative field number. For example, {'=', -1, ...} -
> assign a value to the last field in the tuple. But honestly I don't
> remember. Perhaps they are merged, if in both squashed upserts the
> field number is the same. But imagine this: {'=', -1, 100}, and
> {'=', 5, 100}. They look different, but if the tuple has only 5 fields,
> they operate on the same field.
> 
> That means it is not safe to drop any upsert having more than 4k
> operations. Because it can consist of many small valid upserts.
> 
> I don't know how to fix it in a simple way. The only thing I could
> come up with is probably don't squash such fat upserts. Just keep
> them all on the disk, until they eventually meet bottom of their key,
> or a terminal statement like REPLACE/INSERT/DELETE.
> 
> This is not only about disk, btw. 2 fat upserts could be inserted into
> the memory level, turn into an invalid upsert, and that will be skipped.
> 
> Here is a test. Create a tuple, and dump it on disk so as it would
> disappear from the memory level and from the cache:
> 
> 	box.cfg{}
> 	s = box.schema.create_space('test', {engine = 'vinyl'})
> 	pk = s:create_index('pk')
> 	s:insert({1, 1})
> 	box.snapshot()
> 
> Then restart (to ensure the cache is clear), and create 2 upserts:
> 
> 	box.cfg{}
> 	s = box.space.test
> 	ops = {}
> 	op = {'=', 2, 100}
> 	for i = 1, 2500 do table.insert(ops, op) end
> 	s:upsert({1}, ops)
> 	op = {'=', -1, 200}
> 	ops = {}
> 	for i = 1, 2500 do table.insert(ops, op) end
> 	s:upsert({1}, ops)
> 
> Now if I do select, I get
> 
> 	tarantool> s:select{}
> 	---
> 	- - [1, 200]
> 	...
> 
> But if I do dump + select, I get:
> 
> 	tarantool> box.snapshot()
> 	---
> 	- ok
> 	...
> 
> 	tarantool> s:select{}
> 	---
> 	- - [1, 100]
> 	...
> 
> During dump the second upsert was skipped even though it was valid.

  parent reply	other threads:[~2020-07-08 12:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-27  2:56 Nikita Pettik
2020-05-29 21:24 ` Vladislav Shpilevoy
2020-05-29 21:34   ` Vladislav Shpilevoy
2020-07-08 12:22     ` Nikita Pettik
2020-05-29 23:04   ` Konstantin Osipov
2020-07-08 12:53   ` Nikita Pettik [this message]
2020-07-09 11:56     ` Nikita Pettik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200708125327.GE26461@tarantool.org \
    --to=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] vinyl: add NULL check of xrow_upsert_execute() retval' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox