[Tarantool-patches] [PATCH 02/14] WIP: module api: expose box region
Alexander Turenko
alexander.turenko at tarantool.org
Thu Oct 8 22:21:26 MSK 2020
On Fri, Sep 25, 2020 at 12:31:58AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> See 2 comments below.
>
> > diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
> > index 16ee9f414..618ba8045 100644
> > --- a/src/lib/core/fiber.h
> > +++ b/src/lib/core/fiber.h
> > @@ -386,6 +386,107 @@ struct slab_cache;
> > API_EXPORT struct slab_cache *
> > cord_slab_cache(void);
> >
> > +/**
> > + * box region allocator
> > + *
> > + * It is the region allocator from the small library. It is useful
> > + * for allocating tons of small objects and free them at once.
> > + *
> > + * Typical usage is illustrated in the sketch below.
> > + *
> > + * | size_t region_svp = box_region_used();
> > + * | while (<...>) {
> > + * | char *buf = box_region_alloc(<...>);
> > + * | <...>
> > + * | }
> > + * | box_region_truncate(region_svp);
> > + *
> > + * There are module API functions that return a result on
> > + * this region. In this case a module is responsible to free the
> > + * result:
> > + *
> > + * | size_t region_svp = box_region_used();
> > + * | char *buf = box_<...>(<...>);
> > + * | <...>
> > + * | box_region_truncate(region_svp);
> > + *
> > + * This API provides better compatibility guarantees over using
> > + * the small library directly in a module. A binary layout of
> > + * internal structures may be changed in a future, but
> > + * <box_region_*>() functions will remain API and ABI compatible.
> > + *
> > + * Data allocated on the region are guaranteed to be valid until
> > + * a fiber yield or a call of a function from the certain set:
> > + *
> > + * - Related to transactions.
> > + * - Ones that may cause box initialization (box.cfg()).
> > + * - Ones that may involve SQL execution.
> > + *
> > + * FIXME: Provide more strict list of functions, which may
> > + * invalidate the data: ones that may lead to calling of
> > + * fiber_gc().
> > + *
> > + * It is safe to call simple box APIs around tuples, key_defs and
> > + * so on -- they don't invalidate the allocated data.
> > + *
> > + * Don't assume this region as fiber local. This is an
> > + * implementation detail and may be changed in a future.
> > + *
> > + * Another useful property is that data allocated on this region
> > + * are freed eventually, so a cold path may omit
> > + * <box_region_truncate>() call: it will not lead to a memory
> > + * leak. It is not recommended to exploit this property on a hot
> > + * path, because of the risk to exhaust available memory for this
> > + * region before it'll be freed.
>
> 1. That looks like a dangerous advice. We do truncate the region in some
> places, but that may change any moment. For example, fiber_gc() in
> the transactions may be replaced with
> region_truncate(<size_at_transaction_start>). I would better say, that
> if you allocated something, or get something allocated implicitly from
> another module.h function, you shall free it. You can't rely on it be
> freed automatically. If you use it in a hot path to allocate a small
> number of objects, and box_region_truncate() really becomes a bottleneck,
> you may need to reconsider design of your allocation policy, and use
> something else but the region.
Re the dangerous advice itself: I agree and will remove it in the next
patchset version. We cannot guarantee that fiber_gc() will be called
periodically in future and even now it highly depends on a workload.
Re 'you shall free it': it is already described at top of the
description (with examples).
Regarding 'if <...> box_region_truncate() really becomes a bottleneck':
I think that 'It is useful for allocating tons of small objects and free
them at once' at top of the description is enough.
> > +/*
> > + * Allocate storage for an object of given type with respect to
> > + * its alignment.
> > + *
> > + * @param T type of an object
> > + * @param size where to store size of an object (size_t *)
> > + */
> > +#define box_region_alloc_object(T, size) ({ \
> > + *(size) = sizeof(T); \
> > + (T *)box_region_aligned_alloc(sizeof(T), alignof(T)); \
> > +})
> > +
> > +/*
> > + * Allocate storage for array of objects of given type.
> > + *
> > + * @param T type of an object
> > + * @param count amount of objects to allocate (size_t)
> > + * @param size where to store size of an object (size_t *)
> > + */
> > +#define box_region_alloc_array(T, count, size) ({ \
> > + int _tmp_ = sizeof(T) * (count); \
> > + *(size) = _tmp_; \
> > + (T *)box_region_aligned_alloc(_tmp_, alignof(T)); \
> > +})
>
> 2. Nah, lets better drop this sugar. If somebody wants them, they can
> implement them in their own code. It is not something what needs to
> be in module.h, and IMO we need to keep here only really necessary
> things.
I looked at this once again and now I agree. My points:
1. We anyway should describe when box_region_alloc() and
box_region_aligned_alloc() should be used, because otherwise it is
easy to step into UB by violation of a structure alignment rules.
2. box_region_alloc_array() does not check the multiplication for
overflow (and stores the result in int, not even size_t). But
calloc() checks it and the expectation of a developer may be that our
function do the same.
I'll update the comment to box_region_alloc() with the following warning
about alignment in the next patchset version:
| /**
| * Allocate size bytes from the box region.
| *
| * Don't use this function to allocate a memory block for a value
| * or array of values of a type with alignment requirements. A
| * violation of alignment requrements leads to undefined
| * behaviour.
| */
| API_EXPORT void *
| box_region_alloc(size_t size);
More information about the Tarantool-patches
mailing list