Date: Wed, 3 Jul 2019 23:05:37 +0300
From: Vladimir Davydov
To: Konstantin Osipov
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH 3/6] Don't take schema lock for checkpointing
Message-ID: <20190703200537.d5fieiy6vq6otepu@esperanza>
In-Reply-To: <20190703192153.GF17318@atlas>
References: <2b12ec00f416bbb2ab214c5a0ddfc373375ed038.1561922496.git.vdavydov.dev@gmail.com> <20190703192153.GF17318@atlas>

On Wed, Jul 03, 2019 at 10:21:53PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov [19/07/01 10:04]:
> > Memtx checkpointing proceeds as follows: first we open iterators over
> > the primary indexes of all spaces and save them to a list, then we
> > start a thread that uses the iterators to dump space contents to a
> > snap file. To avoid accessing freed tuples, we put the small allocator
> > into delayed free mode. However, this doesn't prevent an index from
> > being dropped, so we also take the schema lock to lock out any DDL
> > operation that could destroy a space or an index. Note, vinyl doesn't
> > need this lock, because it implements index reference counting under
> > the hood.
> >
> > Actually, we don't really need to take a lock - instead we can simply
> > postpone index destruction until checkpointing is complete, similarly
> > to how we postpone destruction of individual tuples. We even have all
> > the infrastructure for this - delayed garbage collection. So this
> > patch tweaks it a bit to defer the actual index destruction until
> > checkpointing is complete.
> >
> > This is a step towards removal of the schema lock, which stands in
> > the way of transactional DDL.
>
> Looks like you're doing this because I once said I hate reference
> counting.
>
> First, I don't mind having reference counting for memtx index objects
> now that we've approached the transactional DDL frontier.
>
> But even reference counting would be a bit cumbersome for this.
> Please take a look at how bps does it - it links all pages into a
> FIFO-like list, and a checkpoint simply sets a savepoint on that
> list. Committing a checkpoint releases the savepoint and garbage
> collects all objects that were freed while it was held.
>
> SQL will need multiple concurrent snapshots - so it will need
> multiple versions. So please consider turning the algorithm you've
> just used into a general-purpose one - any database object could
> register itself for delayed destruction; the destruction would
> happen immediately if there are no savepoints, or as soon as the
> savepoint covering the object is released.
>
> Then we can move all subsystems to this scheme.
>
> If you have a better general-purpose object garbage collection
> idea, please share/implement it too.

I thought we'd agreed that we aren't going to implement schema
multi-versioning in the foreseeable future, so this patch is a quick
fix for memtx snapshotting that gets rid of the schema lock, which
conflicts with the notion of transactional DDL.

If you think that reference counting is a better design - fine, I'm
okay with that - we can pull in the patch by Georgy that did exactly
that. It's just that introducing base index reference counting solely
to fix a memtx problem seems like overkill to me.

Regarding SQL: I don't think I follow what you're talking about.
Let's discuss it f2f.
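
For illustration, a minimal sketch of the deferred index destruction
the commit message describes: while a checkpoint is in progress,
dropped indexes are queued instead of destroyed, and the queue is
drained when the snapshot is done. All names here are hypothetical
and do not come from the actual patch:

    #include <stdbool.h>
    #include <stdlib.h>

    struct index;   /* opaque; stands in for a memtx index */
    void index_destroy(struct index *index);

    struct delayed_index {
        struct index *index;
        struct delayed_index *next;
    };

    static bool checkpoint_in_progress;
    static struct delayed_index *delayed_indexes;

    /* Called by DDL instead of index_destroy(). */
    void
    index_delayed_destroy(struct index *index)
    {
        if (!checkpoint_in_progress) {
            index_destroy(index);
            return;
        }
        struct delayed_index *d = malloc(sizeof(*d));
        d->index = index;
        d->next = delayed_indexes;
        delayed_indexes = d;
    }

    /* Called once the snapshot thread has finished. */
    void
    checkpoint_complete(void)
    {
        checkpoint_in_progress = false;
        while (delayed_indexes != NULL) {
            struct delayed_index *d = delayed_indexes;
            delayed_indexes = d->next;
            index_destroy(d->index);
            free(d);
        }
    }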
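
And a rough sketch of the general-purpose savepoint scheme suggested
above, under the same caveat that every name is hypothetical: freed
objects go into a FIFO stamped with a logical clock, a savepoint pins
the clock value at which it was taken, and an object is destroyed as
soon as no savepoint older than its stamp remains:

    #include <stdint.h>
    #include <stdlib.h>

    /* Embedded in every object that wants delayed destruction. */
    struct gc_object {
        void (*destroy)(struct gc_object *obj);
        uint64_t epoch;             /* gc_clock value when freed */
        struct gc_object *next;     /* FIFO link, oldest first */
    };

    struct gc_savepoint {
        uint64_t epoch;             /* gc_clock value when taken */
        struct gc_savepoint *next;  /* next newer savepoint */
    };

    static uint64_t gc_clock;
    static struct gc_object *gc_first, *gc_last; /* freed-object FIFO */
    static struct gc_savepoint *gc_oldest;       /* savepoints, oldest first */

    /* Destroy queued objects that no remaining savepoint can see. */
    static void
    gc_collect(void)
    {
        while (gc_first != NULL &&
               (gc_oldest == NULL ||
                gc_first->epoch <= gc_oldest->epoch)) {
            struct gc_object *obj = gc_first;
            gc_first = obj->next;
            if (gc_first == NULL)
                gc_last = NULL;
            obj->destroy(obj);
        }
    }

    /* Free an object: immediately if nothing pins it, queued otherwise. */
    void
    gc_free(struct gc_object *obj)
    {
        obj->epoch = ++gc_clock;
        if (gc_oldest == NULL) {
            obj->destroy(obj);
            return;
        }
        obj->next = NULL;
        if (gc_last != NULL)
            gc_last->next = obj;
        else
            gc_first = obj;
        gc_last = obj;
    }

    /* Take a savepoint, e.g. when a checkpoint starts. */
    struct gc_savepoint *
    gc_savepoint_new(void)
    {
        struct gc_savepoint *svp = malloc(sizeof(*svp));
        svp->epoch = gc_clock;
        svp->next = NULL;
        struct gc_savepoint **tail = &gc_oldest;
        while (*tail != NULL)
            tail = &(*tail)->next;
        *tail = svp;
        return svp;
    }

    /* Release a savepoint, e.g. when the checkpoint commits. */
    void
    gc_savepoint_release(struct gc_savepoint *svp)
    {
        struct gc_savepoint **link = &gc_oldest;
        while (*link != svp)
            link = &(*link)->next;
        *link = svp->next;
        free(svp);
        gc_collect();
    }

With no savepoints outstanding the FIFO stays empty and destruction
is immediate, so this sketch costs nothing outside checkpoints.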