Date: Sat, 12 May 2018 03:30:09 +0300
From: Vladimir Davydov
Subject: Re: [tarantool-patches] [PATCH 1/1] vinyl: fix crash in vinyl_iterator_secondary_next
Message-ID: <20180512003009.7wwgg3ecdsfibzcy@esperanza>
To: Vladislav Shpilevoy
Cc: tarantool-patches@freelists.org, kostja@tarantool.org

On Sat, May 12, 2018 at 02:24:41AM +0300, Vladislav Shpilevoy wrote:
> +
> +--
> +-- gh-3393: vinyl secondary index iterator must take into account
> +-- that a tuple can disappear from the primary index after the
> +-- secondary index lookup.
> +--
> +s = box.schema.create_space('test', {engine = 'vinyl'})
> +pk = s:create_index('pk')
> +sk = s:create_index('sk', {parts = {{2, 'unsigned'}, {1, 'unsigned'}}})
> +s:replace{1, 2}
> +s:replace{3, 4}
> +box.snapshot()
> +ret = nil
> +function iterate(i) ret = i.next() end
> +box.begin()
> +s:replace{5, 6} -- Start a transaction in the engine.
> +iter = create_iterator(sk, {4}, {iterator = 'GE'})
> +errinj.set("ERRINJ_VY_DELAY_PK_LOOKUP", true)
> +f = fiber.create(iterate, iter)
> +s:delete{3}
> +errinj.set("ERRINJ_VY_DELAY_PK_LOOKUP", false)
> +while not ret do fiber.sleep(0.1) end

Here's what happens here:

 1. You create an iterator in a transaction and pass it to a new fiber.
    Let us denote the current fiber as fiber A and the new fiber as
    fiber B.

 2. Fiber B uses the iterator to read the secondary key {4}, then it
    yields right before fetching the tuple {3, 4} from the primary key.

 3. In the meantime, fiber A deletes tuple {3, 4} by writing
    DELETE{3, 4} to the transaction write set.

 4. Fiber B continues execution and, since it's bound to the transaction
    corresponding to fiber A, it finds DELETE{3, 4} in the transaction
    write set and crashes.

I don't think this has anything to do with the original bug you're
pursuing. The use case is obviously quite unnatural.
I guess we should simply prohibit it (i.e. abort the read iterator if it
is passed to another fiber).

I surmise the crash our customer is experiencing is caused by a bug in
the read iterator restore procedure, which was probably brought about by
one of my recent patches.

That said, NAK for this patch.

> +ret
> +iter.iterate_over()
> +box.commit()
> +s:drop()
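
To illustrate the idea of refusing an iterator that has been handed to
another fiber, here is a minimal Lua-level sketch. It is not the vinyl
implementation (which would live in the engine itself); guard_iterator
and current_owner are hypothetical names, and plain Lua coroutines stand
in for fibers so the snippet runs without Tarantool. Inside Tarantool one
could presumably pass require('fiber').id as current_owner instead; that
is an assumption, not something the patch under review does.

```lua
-- Hypothetical helper, not part of Tarantool: wraps an iterator's step
-- function so that only the context that created it may call it.
-- `current_owner` must return an identity for the running context.
local function guard_iterator(next_fn, current_owner)
    local owner = current_owner()
    return function(...)
        if current_owner() ~= owner then
            error('iterator used outside of the fiber that created it')
        end
        return next_fn(...)
    end
end

-- Demo data: a trivial array-backed iterator.
local items = {10, 20, 30}
local pos = 0
local function raw_next()
    pos = pos + 1
    return items[pos]
end

-- Identity of the running coroutine (stand-in for a fiber id).
-- Keep only the first return value; on Lua 5.1/LuaJIT it is nil in the
-- main thread, which still compares unequal to any coroutine.
local function running()
    return (coroutine.running())
end

local guarded_next = guard_iterator(raw_next, running)

-- Same context that created the iterator: the call goes through.
local first = guarded_next()

-- A different context (the analogue of fiber B): the call is rejected
-- before the underlying iterator is touched.
local other = coroutine.create(function() return guarded_next() end)
local ok, err = coroutine.resume(other)
```

With such a guard, the scenario from the test would fail early in fiber B
with an error instead of reaching the primary index lookup.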