[PATCH v2 2/4] wal: preallocate disk space before writing rows

Konstantin Osipov kostja at tarantool.org
Tue Oct 23 21:33:22 MSK 2018


* Vladimir Davydov <vdavydov.dev at gmail.com> [18/10/23 20:30]:
> This function introduces a new xlog method xlog_fallocate() that makes
> sure that the requested amount of disk space is available at the current
> write position. It does that with posix_fallocate(). The new method is
> called before writing anything to WAL, see wal_fallocate(). In order not
> to invoke the system call too often, wal_fallocate() allocates disk
> space in big chunks (1 MB).
> 
> The reason why I'm doing this is that I want to have a single and
> clearly defined point in the code to handle ENOSPC errors, where
> I could delete old WALs and retry (this is what #3397 is about).
> 
> Needed for #3397

Please consider comment about member names. 
The patch is OK to push.
> index 1d64a7bd..fc495547 100644
> --- a/src/box/journal.h
> +++ b/src/box/journal.h
> @@ -59,6 +59,10 @@ struct journal_entry {
>  	 */
>  	struct fiber *fiber;
>  	/**
> +	 * Max size the rows are going to take when encoded.
> +	 */
> +	size_t len;

Since this is not an exact value, I suggest to call it est_len.
max_len is also OK but worse than est_len.

Any name which would indicate that it's not an exact len is fine:

footprint
xlog_max_len
xlog_len

> +	/**
>  	 * The number of rows in the request.
>  	 */
>  	int n_rows;
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 17d97d76..9b465561 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -272,6 +272,7 @@ txn_write_to_wal(struct txn *txn)
>  		if (stmt->row == NULL)
>  			continue; /* A read (e.g. select) request */
>  		*row++ = stmt->row;
> +		req->len += xrow_len_max(stmt->row);

req->est_len += xrow_est_len();
req->max_len += xrow_max_len();

>  
>  struct wal_msg {
>  	struct cmsg base;
> +	/**
> +	 * Max size the committed requests are going to take when
> +	 * written to disk.
> +	 */
> +	size_t len;

Same here.

> +	/*
> +	 * The actual write size can be greater than the sum size
> +	 * of encoded rows (compression, fixheaders). Double the
> +	 * given length to get a rough upper bound estimate.
> +	 */
> +	len *= 2;
> +
> +ssize_t
> +xlog_fallocate(struct xlog *log, size_t len)
> +{
> +#ifdef HAVE_POSIX_FALLOCATE
> +	int rc = posix_fallocate(log->fd, log->offset + log->alloc_len, len);
> +	if (rc != 0) {
> +		errno = rc;
> +		diag_set(SystemError, "%s: can't allocate disk space",
> +			 log->filename);
> +		return -1;
> +	}
> +	log->alloc_len += len;

I don't like this name either, but don't know of a better one.

log->reserved perhaps? or simply log->allocated? 

> +	return 0;
> +#else
> +	(void)log;
> +	(void)len;
> +	return 0;
> +#endif /* HAVE_POSIX_FALLOCATE */
> +}
> +
> +	if (log->alloc_len > (size_t)written)
> +		log->alloc_len -= written;

Imagine the name is allocated:

 if (log->allocated > written)
      log->allocated -= written;

Seems to be more clear?

> +	else
> +		log->alloc_len = 0;
>  	log->offset += written;
>  	log->rows += log->tx_rows;
>  	log->tx_rows = 0;
> @@ -1378,6 +1403,17 @@ xlog_write_eof(struct xlog *l)
>  		diag_set(ClientError, ER_INJECTION, "xlog write injection");
>  		return -1;
>  	});
> +
> +	/*
> +	 * Free disk space preallocated with xlog_fallocate().
> +	 * Don't write the eof marker if this fails, otherwise
> +	 * we'll get "data after eof marker" error on recovery.

I would simply remove this check - I don't see how it's useful as
long as we never append to an existing xlog.

> +	 */
> +	if (l->alloc_len > 0 && ftruncate(l->fd, l->offset) < 0) {
> +		diag_set(SystemError, "ftruncate() failed");
> +		return -1;
> +	}
> +
>  	if (fio_writen(l->fd, &eof_marker, sizeof(eof_marker)) < 0) {
>  		diag_set(SystemError, "write() failed");
>  		return -1;
> @@ -1793,6 +1829,15 @@ xlog_cursor_next_tx(struct xlog_cursor *i)
>  		return -1;
>  	if (rc > 0)
>  		return 1;
> +	if (load_u32(i->rbuf.rpos) == 0) {
> +		/*
> +		 * Space preallocated with xlog_fallocate().
> +		 * Treat as eof and clear the buffer.
> +		 */
> +		i->read_offset -= ibuf_used(&i->rbuf);
> +		ibuf_reset(&i->rbuf);

Especially since you added this. 

> +		return 1;
> +	}

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov



More information about the Tarantool-patches mailing list