Tarantool development patches archive
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Nikita Pettik <korablev@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] vinyl: add NULL check of xrow_upsert_execute() retval
Date: Fri, 29 May 2020 23:24:26 +0200
Message-ID: <7445b38c-f664-ca79-bb05-73a73ddc4d6d@tarantool.org>
In-Reply-To: <776e8b91b93c79dabd2932b5d665236c5da313c8.1590546551.git.korablev@tarantool.org>

Hi! Thanks for the patch!

While the patch is obviously correct (we definitely need the NULL
check), it solves the problem only partially and creates another
one.

We discussed this verbally; here is a short summary of what happens
with the patch and where the tricky problem is. If there are two
perfectly valid upserts, each with 2.5k operations, and they are
merged into one, both of them are skipped, because after the merge
the result is too fat: opcount > 4k.

At first glance it seems this can only happen when the field count
exceeds 4k, because otherwise all the operations would be squashed
into no more operations than there are fields. But that is not true.
There are a few cases when, even after squashing, the total operation
count is bigger than the field count:

1) The operations are complex: ':', '&', '|', '^', '#', '!'. The last
two ('#' and '!') are actually used in practice. These operations are
not squashed, and the last one, '!', can't be squashed even in theory.

2) The operations have a negative field number. For example,
{'=', -1, ...} assigns a value to the last field of the tuple. To be
honest, I don't remember whether such operations are merged; perhaps
they are, if both squashed upserts use the same field number. But
consider {'=', -1, 100} and {'=', 5, 100}. They look different, but
if the tuple has only 5 fields, they operate on the same field.

This means it is not safe to drop an upsert having more than 4k
operations, because it may consist of many small valid upserts.

I don't know how to fix this in a simple way. The only option I could
come up with is to not squash such fat upserts at all: just keep them
all on disk until they eventually reach the bottom of their key, or a
terminal statement like REPLACE/INSERT/DELETE.

This is not only about disk, by the way. Two fat upserts can be
inserted into the memory level, get merged into an invalid upsert,
and that upsert will be skipped.

Here is a test. Create a tuple and dump it to disk, so that it
disappears from the memory level and from the cache:

	box.cfg{}
	s = box.schema.create_space('test', {engine = 'vinyl'})
	pk = s:create_index('pk')
	s:insert({1, 1})
	-- Dump the tuple from the memory level to disk.
	box.snapshot()

Then restart (to make sure the cache is clear) and create two upserts:

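	box.cfg{}
	s = box.space.test
	-- First upsert: 2500 copies of {'=', 2, 100}.
	ops = {}
	op = {'=', 2, 100}
	for i = 1, 2500 do table.insert(ops, op) end
	s:upsert({1}, ops)
	-- Second upsert: 2500 copies of {'=', -1, 200}. In the
	-- 2-field tuple [1, 1], field -1 is field 2.
	op = {'=', -1, 200}
	ops = {}
	for i = 1, 2500 do table.insert(ops, op) end
	s:upsert({1}, ops)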

Now, if I do a select, I get:

	tarantool> s:select{}
	---
	- - [1, 200]
	...

But if I do a dump and then a select, I get:

	tarantool> box.snapshot()
	---
	- ok
	...

	tarantool> s:select{}
	---
	- - [1, 100]
	...

During the dump, the second upsert was skipped even though it was valid.
