[PATCH 3/6] Don't take schema lock for checkpointing

Vladimir Davydov vdavydov.dev at gmail.com
Wed Jul 3 23:05:37 MSK 2019


On Wed, Jul 03, 2019 at 10:21:53PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [19/07/01 10:04]:
> > Memtx checkpointing proceeds as follows: first we open iterators over
> > primary indexes of all spaces and save them to a list, then we start
> > a thread that uses the iterators to dump space contents to a snap file.
> > To avoid accessing a freed tuple, we put the small allocator to the
> > delayed free mode. However, this doesn't prevent an index from being
> > dropped so we also take the schema lock to lock out any DDL operation
> > that can potentially destroy a space or an index. Note, vinyl doesn't
> > need this lock, because it implements index reference counting under
> > the hood.
> > 
> > Actually, we don't really need to take a lock - instead we can simply
> > postpone index destruction until checkpointing is complete, similarly
> > to how we postpone destruction of individual tuples. We even have all
> > the infrastructure for this - it's delayed garbage collection. So this
> > patch tweaks it a bit to delay the actual index destruction to be done
> > after checkpointing is complete.
> > 
> > This is a step forward towards removal of the schema lock, which stands
> > in the way of transactional DDL.
> 
> Looks like you do it because I said once I hate reference
> counting.
> 
> First, I don't mind having reference counting for memtx index objects now
> that we've approached transactional ddl frontier.
> 
> But even reference counting would be a bit cumbersome for this.
> Please take a look at how bps does it - it links all pages into a
> fifo-like list, and a checkpoint simply sets a savepoint on that
> list. Committing a checkpoint releases the savepoint and garbage
> collects all objects that have been freed after the savepoint. 
> 
> SQL will need multiple concurrent snapshots - so it will need
> multiple versions. So please consider turning the algorithm you've
> just used a general-purpose one - so that any database object
> could add itself for delayed destruction, the delayed destruction
> would take place immediately if there are no savepoints, or
> immediately after the savepoint up until the next savepoint.
> 
> Then we can move all subsystems to this.
> 
> If you have a better general-purpose object garbage collection
> idea, please share/implement it too.

I thought we'd agreed that we aren't going to implement schema multi
versioning in foreseeable future so this patch does a quick fix for
memtx snapshotting to get rid of the schema lock which conflicts with
the notion of transactional DDL.

If you think that reference counting is a better design - fine, I'm okay
with it - we can pull the patch by Georgy that did exactly that. Just
that I think that introducing base index reference counting solely to
fix a problem in memtx seems to be a bit of an overkill for me.

Regarding SQL. I don't think I follow what you're talking about.
Let's discuss it f2f.



More information about the Tarantool-patches mailing list