From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Cyrill Gorcunov <gorcunov@gmail.com> Cc: v.shpilevoy@tarantool.org, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 1/9] wal: enrich row's meta information with sync replication flags Date: Tue, 13 Apr 2021 16:26:07 +0300 [thread overview] Message-ID: <2ecf5ae6-abee-5d17-8620-0be4fc6a7899@tarantool.org> (raw) In-Reply-To: <YHRFvE10aHyT+zYX@grain> 12.04.2021 16:06, Cyrill Gorcunov пишет: > On Sun, Apr 11, 2021 at 08:55:56PM +0300, Serge Petrenko wrote: >> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h >> index d3738c705..f7f46088f 100644 >> --- a/src/box/iproto_constants.h >> +++ b/src/box/iproto_constants.h >> @@ -49,9 +49,14 @@ enum { >> XLOG_FIXHEADER_SIZE = 19 >> }; >> >> +/** IPROTO_FLAGS bitfield constants. */ >> enum { >> /** Set for the last xrow in a transaction. */ >> IPROTO_FLAG_COMMIT = 0x01, >> + /** Set for the last row of a tx residing in limbo. */ >> + IPROTO_FLAG_WAIT_SYNC = 0x02, >> + /** Set for the last row of a synchronous tx. */ >> + IPROTO_FLAG_WAIT_ACK = 0x04, >> }; > Serge, is there some particular reason why cant we make them 1:1 > mapping to txn flags? Ie > > IPROTO_FLAG_WAIT_SYNC = 0x10, > IPROTO_FLAG_WAIT_ACK = 0x20, > > this would allow us to eliminate branching in code. While txn flags > are not part of API and iproto flags _are_ I don't expect the former > to be changed anyhow soon? > > Not insisting though, just share the idea. Thanks for the review! Please find my answers inline and the incremental diff below. Actually, this can be done. But it would require us to change the bitfield to something like struct { bool is_commit : 1; int unused : 3; bool wait_sync : 1; bool wait_ack: 1; }; Otherwise the mapping between opt_flags and the bit fields would be lost. I'd rather stick with your other idea (map between iproto_flags and txn_flags). > >> >> enum iproto_key { >> diff --git a/src/box/journal.h b/src/box/journal.h >> index 76c70c19f..9ab8af2c1 100644 >> --- a/src/box/journal.h >> +++ b/src/box/journal.h >> @@ -63,6 +63,7 @@ struct journal_entry { >> * A journal entry completion callback argument. >> */ >> void *complete_data; >> + uint8_t opt_flags; >> /** >> * Asynchronous write completion function. >> */ >> @@ -97,6 +98,7 @@ journal_entry_create(struct journal_entry *entry, size_t n_rows, >> entry->approx_len = approx_len; >> entry->n_rows = n_rows; >> entry->res = -1; >> + entry->opt_flags = 0; >> } > Please don't ruine alignment here. I know that Vlad prefer dense style > (which is actually hard to parse by eyes at least for me) but here we > have a block which ether should be whole formatted to dense or kept > aligned. Sure, thanks for noticing! > >> >> /** >> diff --git a/src/box/txn.c b/src/box/txn.c >> index 40061ff09..d65315f58 100644 >> --- a/src/box/txn.c >> +++ b/src/box/txn.c >> @@ -76,6 +76,7 @@ txn_add_redo(struct txn *txn, struct txn_stmt *stmt, struct request *request) >> row->lsn = 0; >> row->sync = 0; >> row->tm = 0; >> + row->opt_flags = 0; >> } >> /* >> * Group ID should be set both for requests not having a >> @@ -681,6 +682,10 @@ txn_journal_entry_new(struct txn *txn) >> --req->n_rows; >> } >> >> + req->opt_flags |= >> + (txn_has_flag(txn, TXN_WAIT_SYNC) ? IPROTO_FLAG_WAIT_SYNC : 0) | >> + (txn_has_flag(txn, TXN_WAIT_ACK) ? IPROTO_FLAG_WAIT_ACK : 0); >> + > When mentioning branching I meant this code. We could use 1:1 mapping > here if we define iproto flags to be the same as txn flags. > > Or say some explicit map could be provided (for readability sake) > > static const uint8_t opt_flags_map[] = { > [TXN_WAIT_SYNC] = IPROTO_FLAG_WAIT_SYNC, > [TXN_WAIT_ACK] = IPROTO_FLAG_WAIT_ACK, > }; > > req->opt_flags |= opt_flags_map[txn->flags & IPROTO_FLAG_WAIT_SYNC]; > req->opt_flags |= opt_flags_map[txn->flags & IPROTO_FLAG_WAIT_ACK]; > > Such technique is widely used when need some mapping between flags and > instead of number of `if` one simply use read-only memory. I like this variant, applied. > > On the other hands I'm fine with branching here because this is not > that hot code and even misprediction should not harm much. So just > to share the ideas. > >> return req; >> } >> >> diff --git a/src/box/wal.c b/src/box/wal.c >> index 34af0bda6..00fcb21b4 100644 >> --- a/src/box/wal.c >> +++ b/src/box/wal.c >> @@ -962,14 +962,14 @@ out: >> */ >> static void >> wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base, >> - struct xrow_header **row, >> - struct xrow_header **end) >> + struct journal_entry *entry) >> { >> int64_t tsn = 0; >> - struct xrow_header **start = row; >> - struct xrow_header **first_glob_row = row; >> + struct xrow_header **start = entry->rows; >> + struct xrow_header **end = entry->rows + entry->n_rows; >> + struct xrow_header **first_glob_row = start; > There is no need to define another dependency, actually I think > the compiler will figure out that @first_glob_row and @start > are initialized from same @entry->rows still previously this > was more clear, iow I propose to leave `first_glob_row = entry->rows;` Ok, no problem. > >> /** Assign LSN to all local rows. */ >> - for ( ; row < end; row++) { >> + for (struct xrow_header **row = start; row < end; row++) { >> if ((*row)->replica_id == 0) { >> /* >> * All rows representing local space data >> @@ -996,7 +996,13 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base, >> first_glob_row = row; >> } >> (*row)->tsn = tsn == 0 ? (*start)->lsn : tsn; >> - (*row)->is_commit = row == end - 1; >> + if (row < end - 1) >> + continue; >> + /* Tx meta is stored in the last tx row. */ >> + if (row == end - 1) { >> + (*row)->opt_flags = entry->opt_flags; >> + (*row)->is_commit = true; > Why can't we use > (*row)->opt_flags = entry->opt_flags | IPROTO_FLAG_COMMIT; > instead? WAL doesn't depend on IPROTO_CONSTANTS and I don't want to introduce such a dependency. OTOH, I don't want it to know of `wait_ack` and `wait_sync` properties of transactions/rows, that's why I have a generic opt_flags field, which WAL blindly assigns to the row, and it's txn_journal_entry_new()'s responsibility to set wait_ack and wait_sync bits. > >> + } >> } else { >> int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id); >> if (diff <= vclock_get(vclock_diff, >> @@ -1020,7 +1026,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base, >> * the first global row. tsn was yet unknown when those >> * rows were processed. >> */ >> - for (row = start; row < first_glob_row; row++) >> + for (struct xrow_header **row = start; row < first_glob_row; row++) >> (*row)->tsn = tsn; >> } >> >> @@ -1098,8 +1104,7 @@ wal_write_to_disk(struct cmsg *msg) >> struct journal_entry *entry; >> struct stailq_entry *last_committed = NULL; >> stailq_foreach_entry(entry, &wal_msg->commit, fifo) { >> - wal_assign_lsn(&vclock_diff, &writer->vclock, >> - entry->rows, entry->rows + entry->n_rows); >> + wal_assign_lsn(&vclock_diff, &writer->vclock, entry); >> entry->res = vclock_sum(&vclock_diff) + >> vclock_sum(&writer->vclock); >> rc = xlog_write_entry(l, entry); >> @@ -1319,8 +1324,7 @@ wal_write_none_async(struct journal *journal, >> struct vclock vclock_diff; >> >> vclock_create(&vclock_diff); >> - wal_assign_lsn(&vclock_diff, &writer->vclock, entry->rows, >> - entry->rows + entry->n_rows); >> + wal_assign_lsn(&vclock_diff, &writer->vclock, entry); >> vclock_merge(&writer->vclock, &vclock_diff); >> vclock_copy(&replicaset.vclock, &writer->vclock); >> entry->res = vclock_sum(&writer->vclock); >> diff --git a/src/box/xrow.c b/src/box/xrow.c >> index bc06738ad..cc8e43ed4 100644 >> --- a/src/box/xrow.c >> +++ b/src/box/xrow.c >> @@ -183,7 +183,7 @@ error: >> break; >> case IPROTO_FLAGS: >> flags = mp_decode_uint(pos); >> - header->is_commit = flags & IPROTO_FLAG_COMMIT; >> + header->opt_flags = flags; >> break; >> default: >> /* unknown header */ >> @@ -299,6 +299,7 @@ xrow_header_encode(const struct xrow_header *header, uint64_t sync, >> * flag to find transaction boundary (last row in the >> * transaction stream). >> */ >> + uint8_t flags_to_encode = header->opt_flags & ~IPROTO_FLAG_COMMIT; >> if (header->tsn != 0) { >> if (header->tsn != header->lsn || !header->is_commit) { >> /* >> @@ -314,12 +315,14 @@ xrow_header_encode(const struct xrow_header *header, uint64_t sync, >> map_size++; >> } >> if (header->is_commit && header->tsn != header->lsn) { >> - /* Setup last row for multi row transaction. */ >> - d = mp_encode_uint(d, IPROTO_FLAGS); >> - d = mp_encode_uint(d, IPROTO_FLAG_COMMIT); >> - map_size++; >> + flags_to_encode |= IPROTO_FLAG_COMMIT; >> } >> } >> + if (flags_to_encode != 0) { >> + d = mp_encode_uint(d, IPROTO_FLAGS); >> + d = mp_encode_uint(d, flags_to_encode); >> + map_size++; >> + } >> assert(d <= data + XROW_HEADER_LEN_MAX); >> mp_encode_map(data, map_size); >> out->iov_len = d - (char *) out->iov_base; >> diff --git a/src/box/xrow.h b/src/box/xrow.h >> index fde8f9474..2a18733c0 100644 >> --- a/src/box/xrow.h >> +++ b/src/box/xrow.h >> @@ -80,14 +80,28 @@ struct xrow_header { >> * transaction. >> */ >> int64_t tsn; >> - /** >> - * True for the last row in a multi-statement transaction, >> - * or single-statement transaction. Is only encoded in the >> - * write ahead log for multi-statement transactions. >> - * Single-statement transactions do not encode >> - * tsn and is_commit flag to save space. >> - */ >> - bool is_commit; >> + /** Transaction meta flags set only in the last transaction row. */ >> + union { >> + uint8_t opt_flags; >> + struct { >> + /** >> + * Is only encoded in the write ahead log for >> + * multi-statement transactions. Single-statement >> + * transactions do not encode tsn and is_commit flag to >> + * save space. >> + */ >> + bool is_commit : 1; >> + /** >> + * True for a synchronous transaction. >> + */ >> + bool wait_sync : 1; >> + /** >> + * True for any transaction that would enter the limbo >> + * (not necessarily a synchronous one). >> + */ >> + bool wait_ack : 1; >> + }; >> + }; > Serge, I know that you mention bitfields in the mail in context of our > discussion and I assume you'll rework this otherwise this will be > just a hidden bug which might work for sometime but one day it will > trigger and we won't even be able to figure out why. As discussed verbally, let's leave this bitfield in place, and introduce some unit test which would check the assumptions I make about bit field order. Will the test help, though? What if a person builds tarantool and doesn't run the tests? Would a Cmake check be enough? > >> >> int bodycnt; >> uint32_t schema_version; >> -- >> 2.24.3 (Apple Git-128) >> > Cyrill Incremental diff: =============================== diff --git a/src/box/journal.h b/src/box/journal.h index 9ab8af2c1..3ce9c869e 100644 --- a/src/box/journal.h +++ b/src/box/journal.h @@ -98,7 +98,7 @@ journal_entry_create(struct journal_entry *entry, size_t n_rows, entry->approx_len = approx_len; entry->n_rows = n_rows; entry->res = -1; - entry->opt_flags = 0; + entry->opt_flags = 0; } /** diff --git a/src/box/txn.c b/src/box/txn.c index d65315f58..34e8ce026 100644 --- a/src/box/txn.c +++ b/src/box/txn.c @@ -682,9 +682,13 @@ txn_journal_entry_new(struct txn *txn) --req->n_rows; } - req->opt_flags |= - (txn_has_flag(txn, TXN_WAIT_SYNC) ? IPROTO_FLAG_WAIT_SYNC : 0) | - (txn_has_flag(txn, TXN_WAIT_ACK) ? IPROTO_FLAG_WAIT_ACK : 0); + static const uint8_t opt_flags_map[] = { + [TXN_WAIT_SYNC] = IPROTO_FLAG_WAIT_SYNC, + [TXN_WAIT_ACK] = IPROTO_FLAG_WAIT_ACK, + }; + + req->opt_flags |= opt_flags_map[txn->flags & TXN_WAIT_SYNC]; + req->opt_flags |= opt_flags_map[txn->flags & TXN_WAIT_ACK]; return req; } diff --git a/src/box/wal.c b/src/box/wal.c index 00fcb21b4..b7c69fc59 100644 --- a/src/box/wal.c +++ b/src/box/wal.c @@ -967,7 +967,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base, int64_t tsn = 0; struct xrow_header **start = entry->rows; struct xrow_header **end = entry->rows + entry->n_rows; - struct xrow_header **first_glob_row = start; + struct xrow_header **first_glob_row = entry->rows; /** Assign LSN to all local rows. */ for (struct xrow_header **row = start; row < end; row++) { if ((*row)->replica_id == 0) { -- Serge Petrenko
next prev parent reply other threads:[~2021-04-13 13:26 UTC|newest] Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-04-11 17:55 [Tarantool-patches] [PATCH 0/9] raft: introduce manual elections and fix a bug with re-applying rolled back transactions Serge Petrenko via Tarantool-patches 2021-04-11 17:55 ` [Tarantool-patches] [PATCH 1/9] wal: enrich row's meta information with sync replication flags Serge Petrenko via Tarantool-patches 2021-04-12 13:06 ` Cyrill Gorcunov via Tarantool-patches 2021-04-13 13:26 ` Serge Petrenko via Tarantool-patches [this message] 2021-04-12 19:21 ` Serge Petrenko via Tarantool-patches 2021-04-11 17:55 ` [Tarantool-patches] [PATCH 2/9] xrow: introduce a PROMOTE entry Serge Petrenko via Tarantool-patches 2021-04-11 17:55 ` [Tarantool-patches] [PATCH 3/9] box: actualise iproto_key_type array Serge Petrenko via Tarantool-patches 2021-04-11 17:55 ` [Tarantool-patches] [PATCH 4/9] box: make clear_synchro_queue() write a PROMOTE entry instead of CONFIRM + ROLLBACK Serge Petrenko via Tarantool-patches 2021-04-11 17:56 ` [Tarantool-patches] [PATCH 5/9] box: write PROMOTE even for empty limbo Serge Petrenko via Tarantool-patches 2021-04-11 17:56 ` [Tarantool-patches] [PATCH 6/9] raft: keep track of greatest known term and filter replication sources based on that Serge Petrenko via Tarantool-patches 2021-04-12 19:23 ` Serge Petrenko via Tarantool-patches 2021-04-11 17:56 ` [Tarantool-patches] [PATCH 7/9] replication: introduce a new election mode: "manual" Serge Petrenko via Tarantool-patches 2021-04-11 17:56 ` [Tarantool-patches] [PATCH 8/9] Support manual elections in `box.ctl.clear_synchro_queue()` Serge Petrenko via Tarantool-patches 2021-04-12 19:23 ` Serge Petrenko via Tarantool-patches 2021-04-11 17:56 ` [Tarantool-patches] [PATCH 9/9] box.ctl: rename clear_synchro_queue to promote Serge Petrenko via Tarantool-patches 2021-04-12 19:24 ` Serge Petrenko via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=2ecf5ae6-abee-5d17-8620-0be4fc6a7899@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 1/9] wal: enrich row'\''s meta information with sync replication flags' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox