[PATCH 3/6] Don't take schema lock for checkpointing

Konstantin Osipov kostja at tarantool.org
Wed Jul 3 22:21:53 MSK 2019


* Vladimir Davydov <vdavydov.dev at gmail.com> [19/07/01 10:04]:
> Memtx checkpointing proceeds as follows: first we open iterators over
> primary indexes of all spaces and save them to a list, then we start
> a thread that uses the iterators to dump space contents to a snap file.
> To avoid accessing a freed tuple, we put the small allocator to the
> delayed free mode. However, this doesn't prevent an index from being
> dropped so we also take the schema lock to lock out any DDL operation
> that can potentially destroy a space or an index. Note, vinyl doesn't
> need this lock, because it implements index reference counting under
> the hood.
> 
> Actually, we don't really need to take a lock - instead we can simply
> postpone index destruction until checkpointing is complete, similarly
> to how we postpone destruction of individual tuples. We even have all
> the infrastructure for this - it's delayed garbage collection. So this
> patch tweaks it a bit to delay the actual index destruction to be done
> after checkpointing is complete.
> 
> This is a step forward towards removal of the schema lock, which stands
> in the way of transactional DDL.
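A minimal sketch of the deferred index destruction described above. It assumes that DDL and checkpoint begin/end all run in one thread, with only the dump itself running elsewhere. The types and functions (`demo_index`, `demo_engine`, `demo_drop_index`, `demo_begin_checkpoint`, `demo_end_checkpoint`) are hypothetical and are not Tarantool's actual API. While a checkpoint is in progress, a dropped index goes onto a queue instead of being freed, and that queue is drained once the checkpoint completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical index object: only the link needed for deferral. */
struct demo_index {
	struct demo_index *next_garbage; /* link in the deferred list */
};

/* Hypothetical engine state shared by DDL and checkpointing. */
struct demo_engine {
	bool checkpoint_in_progress;
	struct demo_index *garbage; /* indexes dropped during a checkpoint */
	int destroyed;              /* number of indexes actually freed */
};

static void demo_index_destroy(struct demo_engine *e, struct demo_index *idx)
{
	free(idx);
	e->destroyed++;
}

/* Called by DDL when an index is dropped; no schema lock is taken. */
static void demo_drop_index(struct demo_engine *e, struct demo_index *idx)
{
	if (!e->checkpoint_in_progress) {
		demo_index_destroy(e, idx);
		return;
	}
	/* The checkpoint may still be iterating over idx: postpone. */
	idx->next_garbage = e->garbage;
	e->garbage = idx;
}

static void demo_begin_checkpoint(struct demo_engine *e)
{
	assert(!e->checkpoint_in_progress);
	e->checkpoint_in_progress = true;
}

/* The snapshot is written: no iterator can reference dropped indexes. */
static void demo_end_checkpoint(struct demo_engine *e)
{
	e->checkpoint_in_progress = false;
	while (e->garbage != NULL) {
		struct demo_index *idx = e->garbage;
		e->garbage = idx->next_garbage;
		demo_index_destroy(e, idx);
	}
}
```

The deferred list in this sketch is LIFO. That is harmless here, because every queued index is released at the same moment.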

It looks like you're doing this because I once said I hate reference
counting.

First, I don't mind having reference counting for memtx index objects,
now that we've approached the transactional DDL frontier.

But even reference counting would be a bit cumbersome here.
Please take a look at how bps does it: it links all pages into a
FIFO-like list, and a checkpoint simply sets a savepoint on that
list. Committing a checkpoint releases the savepoint and garbage
collects all objects that have been freed since the savepoint.

SQL will need multiple concurrent snapshots, and hence multiple
versions. So please consider turning the algorithm you've just used
into a general-purpose one, so that any database object could
register itself for delayed destruction. The delayed destruction
would take place immediately if there are no savepoints; otherwise,
right after the savepoint preceding the object's deletion is
released, together with all other objects deleted between that
savepoint and the next one.
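Here is one possible shape for such a general-purpose facility, sketched under a few assumptions. A single logical clock orders both frees and savepoints. Every deferred object carries its own destroy callback. An object is destroyed once it was freed before the oldest live savepoint. If savepoints are released oldest-first, this reduces to the rule above: releasing a savepoint destroys the objects deleted between it and the next one. All names (`dgc_*`, `count_destroy`) are hypothetical, not an existing Tarantool interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: any object wanting delayed destruction embeds this. */
struct dgc_object {
	uint64_t seq;                 /* clock value when it was freed */
	struct dgc_object *next;      /* FIFO link, ordered by seq */
	void (*destroy)(struct dgc_object *obj);
};

struct dgc_savepoint {
	uint64_t seq;                 /* clock value when it was taken */
	struct dgc_savepoint *prev, *next;
};

struct dgc {
	uint64_t clock;                          /* orders frees and savepoints */
	struct dgc_object *head, *tail;          /* deferred objects */
	struct dgc_savepoint *sp_head, *sp_tail; /* live savepoints, oldest first */
};

/* Destroy every deferred object freed before the oldest live savepoint. */
static void dgc_collect(struct dgc *gc)
{
	while (gc->head != NULL &&
	       (gc->sp_head == NULL || gc->head->seq < gc->sp_head->seq)) {
		struct dgc_object *obj = gc->head;
		gc->head = obj->next;
		if (gc->head == NULL)
			gc->tail = NULL;
		obj->destroy(obj);
	}
}

/* Called instead of destroying an object directly. */
static void dgc_free(struct dgc *gc, struct dgc_object *obj)
{
	if (gc->sp_head == NULL) {
		obj->destroy(obj);    /* no savepoint can see it */
		return;
	}
	obj->seq = gc->clock++;
	obj->next = NULL;
	if (gc->tail != NULL)
		gc->tail->next = obj;
	else
		gc->head = obj;
	gc->tail = obj;
}

static void dgc_savepoint_create(struct dgc *gc, struct dgc_savepoint *sp)
{
	sp->seq = gc->clock++;
	sp->next = NULL;
	sp->prev = gc->sp_tail;
	if (gc->sp_tail != NULL)
		gc->sp_tail->next = sp;
	else
		gc->sp_head = sp;
	gc->sp_tail = sp;
}

/* Unlink the savepoint (in any order) and collect what became unreachable. */
static void dgc_savepoint_release(struct dgc *gc, struct dgc_savepoint *sp)
{
	assert(gc->sp_head != NULL);
	if (sp->prev != NULL)
		sp->prev->next = sp->next;
	else
		gc->sp_head = sp->next;
	if (sp->next != NULL)
		sp->next->prev = sp->prev;
	else
		gc->sp_tail = sp->prev;
	dgc_collect(gc);
}

/* Illustrative destroy callback: only counts destructions. */
static int destroyed_count;
static void count_destroy(struct dgc_object *obj)
{
	(void)obj;
	destroyed_count++;
}
```

Releasing savepoints out of order is allowed but conservative: an object stays queued until every savepoint older than its deletion is gone, even if only a newer savepoint could still see it.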

Then we can move all subsystems over to it.

If you have a better idea for general-purpose object garbage
collection, please share or implement it too.


-- 
Konstantin Osipov, Moscow, Russia



More information about the Tarantool-patches mailing list