[Tarantool-patches] [PATCH 02/14] WIP: module api: expose box region

Alexander Turenko alexander.turenko at tarantool.org
Thu Oct 8 22:21:26 MSK 2020


On Fri, Sep 25, 2020 at 12:31:58AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 2 comments below.
> 
> > diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
> > index 16ee9f414..618ba8045 100644
> > --- a/src/lib/core/fiber.h
> > +++ b/src/lib/core/fiber.h
> > @@ -386,6 +386,107 @@ struct slab_cache;
> >  API_EXPORT struct slab_cache *
> >  cord_slab_cache(void);
> >  
> > +/**
> > + * box region allocator
> > + *
> > + * It is the region allocator from the small library. It is useful
> > + * for allocating tons of small objects and free them at once.
> > + *
> > + * Typical usage is illustrated in the sketch below.
> > + *
> > + *  | size_t region_svp = box_region_used();
> > + *  | while (<...>) {
> > + *  |     char *buf = box_region_alloc(<...>);
> > + *  |     <...>
> > + *  | }
> > + *  | box_region_truncate(region_svp);
> > + *
> > + * There are module API functions that return a result on
> > + * this region. In this case a module is responsible to free the
> > + * result:
> > + *
> > + *  | size_t region_svp = box_region_used();
> > + *  | char *buf = box_<...>(<...>);
> > + *  | <...>
> > + *  | box_region_truncate(region_svp);
> > + *
> > + * This API provides better compatibility guarantees over using
> > + * the small library directly in a module. A binary layout of
> > + * internal structures may be changed in a future, but
> > + * <box_region_*>() functions will remain API and ABI compatible.
> > + *
> > + * Data allocated on the region are guaranteed to be valid until
> > + * a fiber yield or a call of a function from the certain set:
> > + *
> > + * - Related to transactions.
> > + * - Ones that may cause box initialization (box.cfg()).
> > + * - Ones that may involve SQL execution.
> > + *
> > + * FIXME: Provide more strict list of functions, which may
> > + * invalidate the data: ones that may lead to calling of
> > + * fiber_gc().
> > + *
> > + * It is safe to call simple box APIs around tuples, key_defs and
> > + * so on -- they don't invalidate the allocated data.
> > + *
> > + * Don't assume this region as fiber local. This is an
> > + * implementation detail and may be changed in a future.
> > + *
> > + * Another useful property is that data allocated on this region
> > + * are freed eventually, so a cold path may omit
> > + * <box_region_truncate>() call: it will not lead to a memory
> > + * leak. It is not recommended to exploit this property on a hot
> > + * path, because of the risk to exhaust available memory for this
> > + * region before it'll be freed.
> 
> 1. That looks like a dangerous advice. We do truncate the region in some
> places, but that may change any moment. For example, fiber_gc() in
> the transactions may be replaced with
> region_truncate(<size_at_transaction_start>). I would better say, that
> if you allocated something, or get something allocated implicitly from
> another module.h function, you shall free it. You can't rely on it be
> freed automatically. If you use it in a hot path to allocate a small
> number of objects, and box_region_truncate() really becomes a bottleneck,
> you may need to reconsider design of your allocation policy, and use
> something else but the region.

Re the dangerous advice itself: I agree and will remove it in the next
patchset version. We cannot guarantee that fiber_gc() will be called
periodically in future and even now it highly depends on a workload.

Re 'you shall free it': it is already described at top of the
description (with examples).

Regarding 'if <...> box_region_truncate() really becomes a bottleneck':
I think that 'It is useful for allocating tons of small objects and free
them at once' at top of the description is enough.

> > +/*
> > + * Allocate storage for an object of given type with respect to
> > + * its alignment.
> > + *
> > + * @param T    type of an object
> > + * @param size where to store size of an object (size_t *)
> > + */
> > +#define box_region_alloc_object(T, size) ({					\
> > +	*(size) = sizeof(T);							\
> > +	(T *)box_region_aligned_alloc(sizeof(T), alignof(T));			\
> > +})
> > +
> > +/*
> > + * Allocate storage for array of objects of given type.
> > + *
> > + * @param T     type of an object
> > + * @param count amount of objects to allocate (size_t)
> > + * @param size  where to store size of an object (size_t *)
> > + */
> > +#define box_region_alloc_array(T, count, size) ({				\
> > +	int _tmp_ = sizeof(T) * (count);					\
> > +	*(size) = _tmp_;							\
> > +	(T *)box_region_aligned_alloc(_tmp_, alignof(T));			\
> > +})
> 
> 2. Nah, lets better drop this sugar. If somebody wants them, they can
> implement them in their own code. It is not something what needs to
> be in module.h, and IMO we need to keep here only really necessary
> things.

I looked at this once again and now I agree. My points:

1. We anyway should describe when box_region_alloc() and
   box_region_aligned_alloc() should be used, because otherwise it is
   easy to step into UB by violation of a structure alignment rules.
2. box_region_alloc_array() does not check the multiplication for
   overflow (and stores the result in int, not even size_t). But
   calloc() checks it and the expectation of a developer may be that our
   function do the same.

I'll update the comment to box_region_alloc() with the following warning
about alignment in the next patchset version:

 | /**
 |  * Allocate size bytes from the box region.
 |  *
 |  * Don't use this function to allocate a memory block for a value
 |  * or array of values of a type with alignment requirements. A
 |  * violation of alignment requrements leads to undefined
 |  * behaviour.
 |  */
 | API_EXPORT void *
 | box_region_alloc(size_t size);


More information about the Tarantool-patches mailing list