Date: Wed, 3 Jul 2019 23:05:37 +0300
From: Vladimir Davydov
To: Konstantin Osipov
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH 3/6] Don't take schema lock for checkpointing
Message-ID: <20190703200537.d5fieiy6vq6otepu@esperanza>
In-Reply-To: <20190703192153.GF17318@atlas>
References: <2b12ec00f416bbb2ab214c5a0ddfc373375ed038.1561922496.git.vdavydov.dev@gmail.com> <20190703192153.GF17318@atlas>

On Wed, Jul 03, 2019 at 10:21:53PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov [19/07/01 10:04]:
> > Memtx checkpointing proceeds as follows: first we open iterators over
> > the primary indexes of all spaces and save them to a list, then we
> > start a thread that uses the iterators to dump space contents to a
> > snap file. To avoid accessing freed tuples, we put the small allocator
> > into delayed free mode. However, this doesn't prevent an index from
> > being dropped, so we also take the schema lock to lock out any DDL
> > operation that could destroy a space or an index. Note, vinyl doesn't
> > need this lock, because it implements index reference counting under
> > the hood.
> >
> > Actually, we don't really need to take a lock - instead we can simply
> > postpone index destruction until checkpointing is complete, similarly
> > to how we postpone destruction of individual tuples. We even have all
> > the infrastructure for this - delayed garbage collection. So this
> > patch tweaks it a bit to defer the actual index destruction until
> > checkpointing is complete.
> >
> > This is a step towards removal of the schema lock, which stands in
> > the way of transactional DDL.
>
> Looks like you're doing this because I once said I hate reference
> counting.
>
> First, I don't mind having reference counting for memtx index objects
> now that we've approached the transactional DDL frontier.
>
> But even reference counting would be a bit cumbersome for this.
> Please take a look at how bps does it - it links all pages into a
> FIFO-like list, and a checkpoint simply sets a savepoint on that
> list. Committing a checkpoint releases the savepoint and garbage
> collects all objects that were freed while it was held.
>
> SQL will need multiple concurrent snapshots - so it will need
> multiple versions. So please consider turning the algorithm you've
> just used into a general-purpose one - any database object could
> register itself for delayed destruction; the destruction would
> happen immediately if there are no savepoints, or as soon as the
> savepoint covering the object is released.
>
> Then we can move all subsystems to this scheme.
>
> If you have a better general-purpose object garbage collection
> idea, please share/implement it too.

I thought we'd agreed that we aren't going to implement schema
multi-versioning in the foreseeable future, so this patch is a quick
fix for memtx snapshotting that gets rid of the schema lock, which
conflicts with the notion of transactional DDL.

If you think that reference counting is a better design - fine, I'm
okay with that - we can pull in the patch by Georgy that did exactly
that. It's just that introducing base index reference counting solely
to fix a memtx problem seems like overkill to me.

Regarding SQL: I don't think I follow what you're talking about.
Let's discuss it f2f.
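
For illustration, a minimal sketch of the deferred index destruction
the commit message describes: while a checkpoint is in progress,
dropped indexes are queued instead of destroyed, and the queue is
drained when the snapshot is done. All names here are hypothetical
and do not come from the actual patch:

    #include <stdbool.h>
    #include <stdlib.h>

    struct index;   /* opaque; stands in for a memtx index */
    void index_destroy(struct index *index);

    struct delayed_index {
        struct index *index;
        struct delayed_index *next;
    };

    static bool checkpoint_in_progress;
    static struct delayed_index *delayed_indexes;

    /* Called by DDL instead of index_destroy(). */
    void
    index_delayed_destroy(struct index *index)
    {
        if (!checkpoint_in_progress) {
            index_destroy(index);
            return;
        }
        struct delayed_index *d = malloc(sizeof(*d));
        d->index = index;
        d->next = delayed_indexes;
        delayed_indexes = d;
    }

    /* Called once the snapshot thread has finished. */
    void
    checkpoint_complete(void)
    {
        checkpoint_in_progress = false;
        while (delayed_indexes != NULL) {
            struct delayed_index *d = delayed_indexes;
            delayed_indexes = d->next;
            index_destroy(d->index);
            free(d);
        }
    }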
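
And a rough sketch of the general-purpose savepoint scheme suggested
above, under the same caveat that every name is hypothetical: freed
objects go into a FIFO stamped with a logical clock, a savepoint pins
the clock value at which it was taken, and an object is destroyed as
soon as no savepoint older than its stamp remains:

    #include <stdint.h>
    #include <stdlib.h>

    /* Embedded in every object that wants delayed destruction. */
    struct gc_object {
        void (*destroy)(struct gc_object *obj);
        uint64_t epoch;             /* gc_clock value when freed */
        struct gc_object *next;     /* FIFO link, oldest first */
    };

    struct gc_savepoint {
        uint64_t epoch;             /* gc_clock value when taken */
        struct gc_savepoint *next;  /* next newer savepoint */
    };

    static uint64_t gc_clock;
    static struct gc_object *gc_first, *gc_last; /* freed-object FIFO */
    static struct gc_savepoint *gc_oldest;       /* savepoints, oldest first */

    /* Destroy queued objects that no remaining savepoint can see. */
    static void
    gc_collect(void)
    {
        while (gc_first != NULL &&
               (gc_oldest == NULL ||
                gc_first->epoch <= gc_oldest->epoch)) {
            struct gc_object *obj = gc_first;
            gc_first = obj->next;
            if (gc_first == NULL)
                gc_last = NULL;
            obj->destroy(obj);
        }
    }

    /* Free an object: immediately if nothing pins it, queued otherwise. */
    void
    gc_free(struct gc_object *obj)
    {
        obj->epoch = ++gc_clock;
        if (gc_oldest == NULL) {
            obj->destroy(obj);
            return;
        }
        obj->next = NULL;
        if (gc_last != NULL)
            gc_last->next = obj;
        else
            gc_first = obj;
        gc_last = obj;
    }

    /* Take a savepoint, e.g. when a checkpoint starts. */
    struct gc_savepoint *
    gc_savepoint_new(void)
    {
        struct gc_savepoint *svp = malloc(sizeof(*svp));
        svp->epoch = gc_clock;
        svp->next = NULL;
        struct gc_savepoint **tail = &gc_oldest;
        while (*tail != NULL)
            tail = &(*tail)->next;
        *tail = svp;
        return svp;
    }

    /* Release a savepoint, e.g. when the checkpoint commits. */
    void
    gc_savepoint_release(struct gc_savepoint *svp)
    {
        struct gc_savepoint **link = &gc_oldest;
        while (*link != svp)
            link = &(*link)->next;
        *link = svp->next;
        free(svp);
        gc_collect();
    }

With no savepoints outstanding the FIFO stays empty and destruction
is immediate, so this sketch costs nothing outside checkpoints.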