[Tarantool-patches] [PATCH 2/2] vinyl: rework upsert operation

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jul 31 02:31:52 MSK 2020


Thanks for the patch!

See 5 comments below.

On 29.07.2020 03:15, Nikita Pettik wrote:
> Previous upsert implementation had a few drawback which led to several
> bugs and issues.
> 
> Issue #5092 (redundant update operations execution)
> 
> In a nutshell, application of upsert(s) consists of two actions
> (see vy_apply_upsert()): execute and squash. Consider example:
> 
> insert({1, 1})  -- stored on disk
> upsert({1}, {{'-', 2, 20}}) -- old ups1
> upsert({1}, {{'+', 2, 10}}) -- new ups2
> 
> 'Execute' takes update operations from the new upsert and combines them
> with key of the old upsert.  {1} --> {'+', 2, 10} can't be evaluated since
> key consists of only one field. Note that in case upsert doesn't fold
> into insert the upsert's tuple and the tuple stored in index are
> different. In our particular case, tuple stored on disk has two fields,
> so upsert's update operation can be applied to it. If upsert's operation
> can't be executed using key of old upsert, we simply continue processing
> squash step. Squash is a combination of update operations: arithmetic
> operations are combined so we don't have to store actions over the same
> field; the rest operations - are merged into single array. As a result,
> we get one upsert with combined operations: upsert({1}, {{'+', 2, -10}}).
> Then vy_apply_upsert is called again to apply new upsert on the top of
> terminal statement - insert{1, 1}. Since now tuple has second field,
> update operations can be executed and corresponding result is {1, -9}
> which in turn is the final result of upsert merging procedure.
> Now imagine that we have following upserts:
> 
> upsert({1, 1}, {{'-', 2, 20}}) -- old ups1
> upsert({1}, {{'+', 2, 10}}) -- new ups2
> 
> In this case tuple execution successfully finishes and modifies upsert's

1. 'tuple execution'?

> tuple: {2, 1} --> {'+', 2, 10} == {2, 11}

2. Where did you get {2, 1}? I see {1, 1} in this example.

> However, we still have to squash/accumulate update operations since they
> should be applied on tuple stored on disk later. After all, at the we

3. 'at the we'?

> have next upsert: upsert({2, 11}, {{'+', 2, -10}}). Then it is applied
> on the top of insert({1, 1}) and we get the same result as in the first
> case - {1, -9}. The only difference is that upsert's tuple was modified.
> As one can see, execution of update operations applied to upsert's tuple
> is redundant in the case index already contains tuple with the same key
> (i.e. when upserts turns into update). Instead, we are able to
> accumulate/squash update operations only. When the last upsert is being
> applied, we can either execute all update operation on tuple fetched
> from index (i.e. upsert is update) OR on tuple specified in the first
> upsert (i.e. first upsert is insert).
> 
> Issue #5105 (upsert doesn't follow associative property)
> 
> Secondly, current approach breaks associative property: after upserts'
> update operations are merged into one array, part of them (related to
> one upsert) can be skipped, meanwhile the rest - is applied. For
> instance:
> 
> -- Index is over second field.
> i = s:create_index('pk', {parts={2, 'uint'}})
> s:replace{1, 2, 3, 'default'}
> s:upsert({2, 2, 2}, {{'=', 4, 'upserted'}})
> -- First update operation modifies primary key, so upsert must be ignored.
> s:upsert({2, 2, 2}, {{'#', 1, 1}, {'!', 3, 1}})
> 
> After merging two upserts we get the next one:
> upsert({2, 2, 2}, {{'=', 4, 'upserted'}, {'#', 1, 1}, {'!', 3, 1}}
> 
> While we executing update operations, we don't tell ones from different

4. Couldn't parse "don't tell ones from different upserts".

> upserts. Thus, if one operation fails, the rest are ignored as well. As
> a result, first upsert won't be applied, even despite the fact it is
> absolutely OK.
> 
> To resolve this issue, let's group update operations of each upsert into
> separate array. So that operations related to particular upsert are
> stored in single array. In terms of previous example we will get:
> upsert({2, 2, 2}, {{{'=', 4, 'upserted'}}, {{'#', 1, 1}, {'!', 3, 1}}}
> 
> Also note that we don't longer have to apply update operations on tuple
> in vy_apply_upsert() if we deal with two upserts: it can be done once we
> face terminal statement; or if there's no underlying statement (it is
> delete op or doesn't exist at all) we apply all update arrays except the
> first one on upsert's tuple.
> 
> Arithmetic operations still can be combined in case there's no unsigned
> fields in space format. Otherwise, result of subtraction can turn out to
> be negative and resulting tuple won't satisfy this property.

5. Non-unsigned fields also can lead to an error. For example, signed
integer overflow. It seems, squashing becomes impossible at all. Most of
the operations are arithmetic for int and uint fields, which can throw an
error both on + and -.

> This patch also introduces format check of upsert application (#1622
> issue). In case it doesn't satisfy space's format, corresponding error
> is logged and upsert is skipped.
> 
> Closes #1622
> Closes #5105
> Closes #5092
> Part of #5107

I didn't review the rest yet. Will provide more comments afterwards.


More information about the Tarantool-patches mailing list