[PATCH v4 1/3] box: introduce tuple_parse_iterator class

Mon Apr 29 18:41:14 MSK 2019

The subject is stale: the iterator is called tuple_format_iterator now.

Anyway, I'm not quite satisfied with the API:

 - I don't like that you pass some parameters in argument list while
   others (multikey_frame, stack) are accessed directly. This looks
   confusing.
 - Also, sometimes you use frame->type while sometimes field->token.type
   to determine if a field is array/map or a leaf. Again, inconsistent
   and hence confusing.
 - Returning MULTIKEY_POP to let the caller know that the iterator's
   done parsing a multikey entry looks kinda unnatural to me :-/
 - There's no need to pass key_part_only to each next() invocation - you
   could as well pass it once on iterator creation.

Here's what I'd try to do:

 - Fold validation in the iterator. After all, we can't reliably iterate
   over a malformed tuple, right? I'd allow to pass flags to the iterator
   constructor:

     VALIDATE // ensure the tuple conforms to the format
     KEY_PARTS_ONLY // skip fields that aren't indexed

   If VALIDATE was set, the iterator would allocate field bitmaps and
   check that all required fields were present in the tuple as it walked
   over it. It would check field types as well in this case.

   This would allow us to get rid of the awkward MULTIKEY_POP flag.

 - Rather than returning field + pos/end_pos and accessing other
   fields directly, I'd fill an auxiliary struct instead with a bit
   of commentary to make the iterator protocol more transparent:

	struct tuple_format_iterator_entry {
		// Pointer to the field data. NULL if EOF.
		void *data;
		// For leaf fields only: pointer to the data end.
		void *data_end;
		// For intermediate fields only (array/map):
		// number of child entries.
		int count;
		// Field metadata. May be NULL if the field isn't
		// present in the format.
		// (We return all child entries of an array present
		// in the format, no matter formatted or not.)
		struct tuple_field *field;
		// Parent field metadata. Cannot be NULL.
		struct tuple_field *parent;
		// Number of multikey items.
		int multikey_count;
		// Index in the multikey array.
		int multikey_idx;
	};

	int
	tuple_format_iterator_next(struct tuple_format_iterator *it,
				   struct tuple_format_iterator_entry *entry);

   Looking at this struct, a reader would quickly figure out what
   fields are used and what not. Currently, it's impossible without
   diving deep into tuple_format_iterator_next implementation.

 - Please avoid branching at top levels, because it will lead to
   code duplication between vy_stmt_new_surrogate_delete and
   tuple_field_map_create. For instance, there should be the only
   method for setting a field map offset in a builder:

	field_map_builder_set_slot(builder, offset_slot, offset,
				   multikey_idx, multikey_count);

   Simply pass multikey_idx = -1 to it for non-multikey indexes and let
   it decide what to do. You'll need to patch tuple_format_iterator_next
   so that it conforms to this protocol.

I guess we need to make vy_stmt_new_surrogate_delete() use field map
builder as well in the scope of this patch - it looks like without
fixing it we won't be able to come up with a good tuple_format_iterator
API.

BTW tuple_format_iterator_next is soo huuge. Why not move it from the
header to the C file?

Also, please look through your comments with a spell checker enabled:
there are a few typos that need to be fixed.