From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 23 Oct 2018 21:33:22 +0300 From: Konstantin Osipov Subject: Re: [PATCH v2 2/4] wal: preallocate disk space before writing rows Message-ID: <20181023183322.GB29694@chai> References: <484c921eb04a0a94563dcccf058bb5e8f96d3a49.1540314925.git.vdavydov.dev@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <484c921eb04a0a94563dcccf058bb5e8f96d3a49.1540314925.git.vdavydov.dev@gmail.com> To: Vladimir Davydov Cc: tarantool-patches@freelists.org List-ID: * Vladimir Davydov [18/10/23 20:30]: > This function introduces a new xlog method xlog_fallocate() that makes > sure that the requested amount of disk space is available at the current > write position. It does that with posix_fallocate(). The new method is > called before writing anything to WAL, see wal_fallocate(). In order not > to invoke the system call too often, wal_fallocate() allocates disk > space in big chunks (1 MB). > > The reason why I'm doing this is that I want to have a single and > clearly defined point in the code to handle ENOSPC errors, where > I could delete old WALs and retry (this is what #3397 is about). > > Needed for #3397 Please consider comment about member names. The patch is OK to push. > index 1d64a7bd..fc495547 100644 > --- a/src/box/journal.h > +++ b/src/box/journal.h > @@ -59,6 +59,10 @@ struct journal_entry { > */ > struct fiber *fiber; > /** > + * Max size the rows are going to take when encoded. > + */ > + size_t len; Since this is not an exact value, I suggest to call it est_len. max_len is also OK but worse than est_len. Any name which would indicate that it's not an exact len is fine: footprint xlog_max_len xlog_len > + /** > * The number of rows in the request. > */ > int n_rows; > diff --git a/src/box/txn.c b/src/box/txn.c > index 17d97d76..9b465561 100644 > --- a/src/box/txn.c > +++ b/src/box/txn.c > @@ -272,6 +272,7 @@ txn_write_to_wal(struct txn *txn) > if (stmt->row == NULL) > continue; /* A read (e.g. select) request */ > *row++ = stmt->row; > + req->len += xrow_len_max(stmt->row); req->est_len += xrow_est_len(); req->max_len += xrow_max_len(); > > struct wal_msg { > struct cmsg base; > + /** > + * Max size the committed requests are going to take when > + * written to disk. > + */ > + size_t len; Same here. > + /* > + * The actual write size can be greater than the sum size > + * of encoded rows (compression, fixheaders). Double the > + * given length to get a rough upper bound estimate. > + */ > + len *= 2; > + > +ssize_t > +xlog_fallocate(struct xlog *log, size_t len) > +{ > +#ifdef HAVE_POSIX_FALLOCATE > + int rc = posix_fallocate(log->fd, log->offset + log->alloc_len, len); > + if (rc != 0) { > + errno = rc; > + diag_set(SystemError, "%s: can't allocate disk space", > + log->filename); > + return -1; > + } > + log->alloc_len += len; I don't like this name either, but don't know of a better one. log->reserved perhaps? or simply log->allocated? > + return 0; > +#else > + (void)log; > + (void)len; > + return 0; > +#endif /* HAVE_POSIX_FALLOCATE */ > +} > + > + if (log->alloc_len > (size_t)written) > + log->alloc_len -= written; Imagine the name is allocated: if (log->allocated > written) log->allocated -= written; Seems to be more clear? > + else > + log->alloc_len = 0; > log->offset += written; > log->rows += log->tx_rows; > log->tx_rows = 0; > @@ -1378,6 +1403,17 @@ xlog_write_eof(struct xlog *l) > diag_set(ClientError, ER_INJECTION, "xlog write injection"); > return -1; > }); > + > + /* > + * Free disk space preallocated with xlog_fallocate(). > + * Don't write the eof marker if this fails, otherwise > + * we'll get "data after eof marker" error on recovery. I would simply remove this check - I don't see how it's useful as long as we never append to an existing xlog. > + */ > + if (l->alloc_len > 0 && ftruncate(l->fd, l->offset) < 0) { > + diag_set(SystemError, "ftruncate() failed"); > + return -1; > + } > + > if (fio_writen(l->fd, &eof_marker, sizeof(eof_marker)) < 0) { > diag_set(SystemError, "write() failed"); > return -1; > @@ -1793,6 +1829,15 @@ xlog_cursor_next_tx(struct xlog_cursor *i) > return -1; > if (rc > 0) > return 1; > + if (load_u32(i->rbuf.rpos) == 0) { > + /* > + * Space preallocated with xlog_fallocate(). > + * Treat as eof and clear the buffer. > + */ > + i->read_offset -= ibuf_used(&i->rbuf); > + ibuf_reset(&i->rbuf); Especially since you added this. > + return 1; > + } -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov