[Tarantool-patches] [PATCH] Add new 'allocator' option in box.cfg

Mon Dec 21 16:20:18 MSK 2020

Hi! Thanks for the patch!

I didn't review the C++ version yet, because clearly there are some
more general design problems, which can't be hidden behind C++
virtual methods syntax sugar.

Also I remember I asked you to split this into separate commits.
One commit with the refactoring when you introduce allocator object,
vtab, and refactor the code to use it. And the other commit
introducing malloc allocator. Ideally, the second commit should not
touch any code except introducing a new allocator option in box.cfg,
and adding a second allocator_create() variant in memtx_engine_new().

See 21 comments below.

On 18.12.2020 14:58, mechanik20051988 via Tarantool-patches wrote:
> Slab allocator, which is used for tuples allocation, has a certa$

1. The line is truncated with $.

> disadvantage - it tends to unresolvable fragmentation on certain
> workloads (size migration). New option allows to select
> the appropriate allocator if necessary.
> 
> @TarantoolBot document
> Title: Add new 'allocator' option
> Add new 'allocator' option which allows to select
> the appropriate allocator for memtx tuples if necessary.
> Use box.cfg{allocator="small"} or no option to use default
> small allocator, use box.cfg{allocator="system"} to use
> libc malloc.

2. It is worth explaining here when and what should users prefer.

> Closes #5419
> ---

3. Please, read this document carefully:
https://github.com/tarantool/tarantool/wiki/Code-review-procedure

You need to provide branch and issue links.

Here I assume the branch is
mechanik20051988/gh-5419-choose-allocator-for-memtx.

>  CMakeLists.txt                           |  11 ++
>  perf/allocator_perf.test.lua             |  34 ++++
>  src/box/allocator.h                      | 200 ++++++++++++++++++++
>  src/box/box.cc                           |   1 +
>  src/box/lua/load_cfg.lua                 |   2 +
>  src/box/lua/slab.c                       |  39 +++-
>  src/box/memtx_engine.c                   |  53 ++++--
>  src/box/memtx_engine.h                   |  41 +++-
>  src/box/system_allocator.h               | 226 +++++++++++++++++++++++
>  src/trivia/config.h.cmake                |   3 +
>  test/app-tap/init_script.result          |   1 +
>  test/box/admin.result                    |   4 +-
>  test/box/cfg.result                      |   8 +-
>  test/box/choose_memtx_allocator.lua      |   9 +
>  test/box/choose_memtx_allocator.result   | 135 ++++++++++++++
>  test/box/choose_memtx_allocator.test.lua |  43 +++++
>  16 files changed, 776 insertions(+), 34 deletions(-)
>  create mode 100755 perf/allocator_perf.test.lua
>  create mode 100644 src/box/allocator.h
>  create mode 100644 src/box/system_allocator.h
>  create mode 100644 test/box/choose_memtx_allocator.lua
>  create mode 100644 test/box/choose_memtx_allocator.result
>  create mode 100644 test/box/choose_memtx_allocator.test.lua
> 
> diff --git a/CMakeLists.txt b/CMakeLists.txt
> index fa6818f8e..290cd535a 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -92,6 +92,17 @@ check_symbol_exists(posix_fadvise fcntl.h HAVE_POSIX_FADVISE)
>  check_symbol_exists(fallocate fcntl.h HAVE_FALLOCATE)
>  check_symbol_exists(mremap sys/mman.h HAVE_MREMAP)
>  
> +check_function_exists(malloc_usable_size HAVE_MALLOC_USABLE_SIZE)
> +check_symbol_exists(malloc_size malloc/malloc.h HAVE_MALLOC_SIZE_DARWIN)

4. Why does it have suffix DARWIN? I don't see if this check is done
under `if TARGET_OS_DARWIN`.

> +
> +if (HAVE_MALLOC_USABLE_SIZE)
> +    if (TARGET_OS_LINUX)
> +        set(HAVE_MALLOC_USABLE_SIZE_LINUX 1)
> +    else ()
> +        set(HAVE_MALLOC_USABLE_SIZE_BSD 1)

5. Why can't you simply leave it HAVE_MALLOC_USABLE_SIZE
without OS suffix?

> +    endif ()
> +endif ()
> +
>  check_function_exists(sync_file_range HAVE_SYNC_FILE_RANGE)
>  check_function_exists(memmem HAVE_MEMMEM)
>  check_function_exists(memrchr HAVE_MEMRCHR)
> diff --git a/perf/allocator_perf.test.lua b/perf/allocator_perf.test.lua
> new file mode 100755
> index 000000000..be270379b
> --- /dev/null
> +++ b/perf/allocator_perf.test.lua
> @@ -0,0 +1,34 @@
> +#!/usr/bin/env ../src/tarantool
> +os.execute('rm -rf *.snap *.xlog *.vylog ./512 ./513 ./514 ./515 ./516 ./517 ./518 ./519 ./520 ./521')

6. Why do you touch vinyl files if you don't use vinyl here anyway?
And why only up to 521 space id?

> +local clock = require('clock')
> +box.cfg{listen = 3301, wal_mode='none', allocator=arg[1]}
> +local space = box.schema.space.create('test')
> +space:format({ {name = 'id', type = 'unsigned'}, {name = 'year', type = 'unsigned'} })
> +space:create_index('primary', { parts = {'id'} })
> +local time_insert = 0
> +local time_replace = 0
> +local time_delete = 0
> +local cnt = 0
> +local cnt_max = 20
> +local op_max = 2500000
> +local nanosec = 1.0e9
> +while cnt < cnt_max do
> +    cnt = cnt + 1
> +    local time_before = clock.monotonic64()
> +    for key = 1, op_max do space:insert({key, key + 1000}) end
> +    local time_after = clock.monotonic64()
> +    time_insert = time_insert + (time_after - time_before)
> +    time_before = clock.monotonic64()
> +    for key = 1, op_max do space:replace({key, key + 5000}) end
> +    time_after = clock.monotonic64()
> +    time_replace = time_replace + (time_after - time_before)
> +    time_before = clock.monotonic64()
> +    for key = 1, op_max do space:delete(key) end
> +    time_after = clock.monotonic64()
> +    time_delete = time_delete + (time_after - time_before)
> +end
> +io.write("{\n")
> +io.write(string.format("  \"alloc time\": \"%.3f\"\n", tonumber(time_insert) / (nanosec * cnt_max)))
> +io.write(string.format("  \"replace time\": \"%.3f\"\n", tonumber(time_replace) / (nanosec * cnt_max)))
> +io.write(string.format("  \"delete time\": \"%.3f\"\n}\n", tonumber(time_delete) / (nanosec * cnt_max)))
> +os.exit()
> diff --git a/src/box/allocator.h b/src/box/allocator.h
> new file mode 100644
> index 000000000..3bea67f50
> --- /dev/null
> +++ b/src/box/allocator.h
> @@ -0,0 +1,200 @@
> +#pragma once
> +/*
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include <small/small.h>
> +#include <trivia/util.h>
> +#include <stdarg.h>
> +
> +#include "memtx_engine.h"
> +#include "system_allocator.h"

7. System_allocator, AFAIU, should be based on allocator API, right?
But then why is it vice versa? General allocator depends on system
allocator? Something looks wrong here.

> +#include "tuple.h"

8. Tuple is not used here.

> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +#define noop_one_arg(a)
> +#define noop_two_arg(a, b)	

9. I suggest you to use git diff to locate all trailing
tabs and whitespaces in this patch. There is a lot.

> +
> +struct allocator_stats {
> +	size_t used;
> +	size_t total;
> +};
> +
> +static inline void
> +_small_allocator_create(struct memtx_engine *memtx, va_list argptr)

10. Why do you need '_' prefix?

11. Why do you have system_allocator.h, but don't have small_allocator.h?

12. Why _system_* functions are not in system_allocator.h?

13. You don't need va_list here. By doing this you try to invent a
virtual constructor, which does not exist even in C++, due to some serious
issues with that.

Normally, when you have an object with virtual methods, you create it
using a normal non-virtual function. That function fills the object fields,
initializes its vtable. And from now on you operate on it using the
virtual methods.

The constructor is never virtual. There is a pattern though when you have
a method like MyClass::Create(parameters), and depending on parameters the
method will create the needed object. But still it will use normal create()
functions inside.

Lets not try to invent a virtual constructor here. The allocator itself
is complex enough, so at least we must ensure the code is easy to follow.

> +{
> +	uint32_t objsize_min = va_arg(argptr, uint32_t);
> +	double alloc_factor = va_arg(argptr, double);
> +	return small_alloc_create(memtx->alloc, &memtx->slab_cache, 
> +		objsize_min, (float)alloc_factor);
> +}
> +
> +static inline void
> +_system_allocator_create(struct memtx_engine *memtx, MAYBE_UNUSED va_list argptr)
> +{
> +	return system_alloc_create(memtx->alloc, &memtx->quota);
> +}
> +
> +static inline void 
> +_small_allocator_stats(struct small_alloc *alloc, struct small_stats *stats, 
> +	va_list argptr)
> +{
> +	mempool_stats_cb stats_cb = 
> +		va_arg(argptr, mempool_stats_cb);
> +	void *cb_ctx = va_arg(argptr, void  *);
> +	return small_stats(alloc, stats, stats_cb, cb_ctx);
> +}
> +
> +static inline void 
> +_system_allocator_stats(struct system_alloc *alloc, struct system_stats *stats, 
> +	MAYBE_UNUSED va_list argptr)
> +{
> +	return system_stats(alloc, stats);
> +}
> +
> +#define MEM_CHECK_FUNC(prefix, func, param)					\
> +static inline void								\
> +prefix##_mem_check(MAYBE_UNUSED struct prefix##_alloc *alloc)			\
> +{										\
> +	func(alloc->param);							\
> +}
> +MEM_CHECK_FUNC(small, slab_cache_check, cache)
> +MEM_CHECK_FUNC(system, noop_one_arg, noop)
> +
> +/**
> + * Global abstract method to memory alloc
> + */
> +typedef void *(*global_alloc)(void *alloc, size_t bytes);
> +static global_alloc memtx_global_alloc;
> +
> +/**
> + * Global abstract method to memory free
> + */
> +typedef void (*global_free)(void *alloc, void *ptr, size_t bytes);
> +static global_free memtx_global_free;
> +
> +/**
> + * Global abstract method to delayed memory free
> + */
> +typedef void (*global_free_delayed)(void *alloc, void *ptr, size_t bytes);
> +static global_free_delayed memtx_global_free_delayed;
> +
> +#define DECLARE_MEMTX_ALLOCATOR_DESTROY(prefix) 				\
> +static inline void								\
> +prefix##_allocator_destroy(struct memtx_engine *memtx)  			\
> +{										\
> +	prefix##_alloc_destroy(memtx->alloc);					\
> +}
> +DECLARE_MEMTX_ALLOCATOR_DESTROY(small)
> +DECLARE_MEMTX_ALLOCATOR_DESTROY(system)
> +
> +#define DECLARE_MEMTX_ALLOCATOR_CREATE(prefix)  				\
> +static inline void								\
> +prefix##_allocator_create(struct memtx_engine *memtx, ...)  			\
> +{										\
> +	va_list argptr;								\
> +	va_start(argptr, memtx);						\
> +	_##prefix##_allocator_create(memtx, argptr);				\
> +	va_end(argptr);								\
> +}
> +DECLARE_MEMTX_ALLOCATOR_CREATE(small)
> +DECLARE_MEMTX_ALLOCATOR_CREATE(system)
> +
> +#define DECLARE_MEMTX_ALLOCATOR_ENTER_DELAYED_FREE_MODE(prefix, PREFIX) 	\
> +static inline void								\
> +prefix##_allocator_enter_delayed_free_mode(struct memtx_engine *memtx)  	\
> +{										\
> +	return prefix##_##alloc_setopt(memtx->alloc,				\
> +		PREFIX##_##DELAYED_FREE_MODE, true);				\
> +}
> +DECLARE_MEMTX_ALLOCATOR_ENTER_DELAYED_FREE_MODE(small, SMALL)
> +DECLARE_MEMTX_ALLOCATOR_ENTER_DELAYED_FREE_MODE(system, SYSTEM)
> +
> +#define DECLARE_MEMTX_ALLOCATOR_LEAVE_DELAYED_FREE_MODE(prefix, PREFIX) 	\
> +static inline void								\
> +prefix##_allocator_leave_delayed_free_mode(struct memtx_engine *memtx)  	\
> +{										\
> +	return prefix##_##alloc_setopt(memtx->alloc,				\
> +		PREFIX##_##DELAYED_FREE_MODE, false);				\
> +}
> +DECLARE_MEMTX_ALLOCATOR_LEAVE_DELAYED_FREE_MODE(small, SMALL)
> +DECLARE_MEMTX_ALLOCATOR_LEAVE_DELAYED_FREE_MODE(system, SYSTEM)
> +
> +#define DECLARE_MEMTX_ALLOCATOR_STATS(prefix)   				\
> +static inline void								\
> +prefix##_allocator_stats(struct memtx_engine *memtx,				\
> +		struct allocator_stats *stats, ...)				\

14. Why do you need va_list here again? I thought 'stats' is the only
needed argument. I see you use some 'callback' here exclusively for
Lua. I suggest to design this method more carefully instead of using
'if' + va_list everywhere. That is not the point of virtual functions.
Does not matter what you use - C or C++.

For example, you could make this function take a callback always
instead of making it optional. So the system allocator would invoke it
only once, and small allocator for each slab. I am not sure here. But
clearly va_list+if is not a silver bullet.

> +{										\
> +	va_list argptr;								\
> +	va_start(argptr, stats);						\
> +	struct prefix##_stats data_stats;					\
> +	_##prefix##_allocator_stats(memtx->alloc, &data_stats, argptr);		\
> +	va_end(argptr);								\
> +	stats->used = data_stats.used;						\
> +	stats->total = data_stats.total;					\
> +}
> +DECLARE_MEMTX_ALLOCATOR_STATS(small)
> +DECLARE_MEMTX_ALLOCATOR_STATS(system)
> +
> +#define DECLARE_MEMTX_MEM_CHECK(prefix)  					\
> +static inline void								\
> +prefix##_allocator_mem_check(struct memtx_engine *memtx)			\
> +{										\
> +	prefix##_mem_check((struct prefix##_alloc *)(memtx->alloc));    	\
> +}
> +DECLARE_MEMTX_MEM_CHECK(small)
> +DECLARE_MEMTX_MEM_CHECK(system)
> +
> +#define DECLARE_MEMTX_ALLOCATOR_CHOICE(prefix, alloc_func, free_func,   	\
> +			free_dealyed_func)					\
> +static inline void								\
> +prefix##_memtx_allocator_choice(struct memtx_engine *memtx)			\

15. 'choice' -> 'choose'.

> +{										\
> +	memtx_global_alloc = (void *)alloc_func;				\
> +	memtx_global_free = (void *)free_func;					\
> +	memtx_global_free_delayed = (void *)free_dealyed_func;  		\
> +	memtx->alloc = &memtx->prefix##_alloc; 					\
> +	memtx->memtx_allocator_create = prefix##_allocator_create;  		\
> +	memtx->memtx_allocator_destroy = prefix##_allocator_destroy;		\
> +	memtx->memtx_enter_delayed_free_mode =  				\
> +		prefix##_allocator_enter_delayed_free_mode; 			\
> +	memtx->memtx_leave_delayed_free_mode =  				\
> +		prefix##_allocator_leave_delayed_free_mode; 			\
> +	memtx->memtx_allocator_stats = prefix##_allocator_stats;		\
> +	memtx->memtx_mem_check = prefix##_allocator_mem_check;  		\
> +}
> +DECLARE_MEMTX_ALLOCATOR_CHOICE(small, smalloc, smfree, smfree_delayed)
> +DECLARE_MEMTX_ALLOCATOR_CHOICE(system, sysalloc, sysfree, sysfree_delayed)

16. Why do you need memtx_engine here? I see you use it for quota,
slab_cache, but these should also be a part of the allocator object.

I suggest you to desing it differently and more carefully. With the
allocator-specific members stored inside of the allocator object.
And you will see how much easier it will look.

> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index db2bb2333..e70cfc35a 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -1225,7 +1245,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end)
>  	}
>  
>  	struct memtx_tuple *memtx_tuple;
> -	while ((memtx_tuple = smalloc(&memtx->alloc, total)) == NULL) {
> +	while ((memtx_tuple = memtx_global_alloc(memtx->alloc, total)) == NULL) {

17. This wasn't virtual before, and now it is. So it will hit the
performance. Moreover, it also bypasses allocator methods.

If we go for this place being virtual, it should be something like
allocator_alloc(&memtx->alloc, size). Allocator_alloc() should be
implemented like this:

	size_t
	allocator_alloc(allocator, size)
	{
		return allocator->vtab->alloc(allocator, size);
	}

This is how virtual methods usually work in C.

If we go for this being non-virtual, then you need to design how
to keep it virtual at the tuple_format level, but non-virtual
inside. AFAIU, you had an implementation of that, but quite messy.
You could try to evolve it.

>  		bool stop;
>  		memtx_engine_run_gc(memtx, &stop);
>  		if (stop)
> diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
> index 8b380bf3c..cb7d4c1ce 100644
> --- a/src/box/memtx_engine.h
> +++ b/src/box/memtx_engine.h
> @@ -135,8 +137,12 @@ struct memtx_engine {
>  	struct slab_arena arena;
>  	/** Slab cache for allocating tuples. */
>  	struct slab_cache slab_cache;
> -	/** Tuple allocator. */
> -	struct small_alloc alloc;
> +	/** Small tuple allocator. */
> +	struct small_alloc small_alloc;
> +	/** System tuple allocator */
> +	struct system_alloc system_alloc;
> +	/** Tuple allocator currently used */
> +	void *alloc;
>  	/** Slab cache for allocating index extents. */
>  	struct slab_cache index_slab_cache;
>  	/** Index extent allocator. */
> @@ -178,6 +184,31 @@ struct memtx_engine {
>  	 * memtx_gc_task::link.
>  	 */
>  	struct stailq gc_queue;
> +	/**
> +	  * Method to create memtx allocator
> +	  */

18. The comments are misaligned. Exactly the same comments are
a few lines above. I suggest you to look at them as an example.

> +	void (*memtx_allocator_create)(struct memtx_engine *memtx, ...);
> +	/**
> +	  * Method to destroy memtx allocator
> +	  */
> +	void (*memtx_allocator_destroy)(struct memtx_engine *memtx);
> +	/**
> +	  * Method to enter delayed free mode
> +	  */
> +	void (*memtx_enter_delayed_free_mode)(struct memtx_engine *memtx);
> +	/**
> +	  * Method to leave delayed free mode
> +	  */
> +	void (*memtx_leave_delayed_free_mode)(struct memtx_engine *memtx);
> +	/**
> +	  * Method to get allocation statistic
> +	  */
> +	void (*memtx_allocator_stats)(struct memtx_engine *memtx, 
> +		struct allocator_stats *stats, ...);
> +	/**
> +	  * Method to memtx memory check
> +	  */
> +	void (*memtx_mem_check)(struct memtx_engine *memtx);

19. Here is the problem. Quite clearly visible. You tried to merge 2
virtual objects into 2. Memtx_engine already is virtual, because it
can be used as 'struct engine' and has a vtab. You added *second*
vtab and *second* virtual object into there - small_alloc/system_alloc/alloc.

It is not even virtual really, because you still has *3* members to
desribe it, and you use 'if' all over the code, which of course is not
the purpose of the virtual functions. They are supposed to eliminate
the branching entirely. Except when the object is created.

There should be one object called 'struct allocator' or 'struct memtx_allocator'.
It should have all the needed members inside of it - quota, slab cache,
stats, and so on.

There also should be 2 child structs - struct allocator_small and
struct allocator_malloc. You create struct allocator as one of them
using allocator_small_create() or allocator_malloc_create(). And now
all the interactions with the allocator must use the virtual methods
stored in struct allocator_vtab. No branching.

Otherwise these 'virtual' methods could be simply deleted. Because you
always branch before using them.

>  };
> diff --git a/src/box/system_allocator.h b/src/box/system_allocator.h
> new file mode 100644
> index 000000000..8e039f8e4
> --- /dev/null
> +++ b/src/box/system_allocator.h
> @@ -0,0 +1,226 @@
> +
> +struct system_alloc {
> +	/**
> +	 * Bytes allocated by system allocator
> +	 */
> +	uint64_t used_bytes;
> +	/**
> +	 * Allocator quota
> +	 */
> +	struct quota *quota;
> +	/**
> +	 * Free mode.
> +	 */
> +	enum system_free_mode free_mode;
> +	/**
> +	  * List of pointers for delayed free.
> +	  */
> +	struct lifo delayed;> +	bool init;

20. What is 'init'?

> +};
> diff --git a/test/box/choose_memtx_allocator.result b/test/box/choose_memtx_allocator.result
> new file mode 100644
> index 000000000..dab316b93
> --- /dev/null
> +++ b/test/box/choose_memtx_allocator.result

21. You didn't test snapshots. So all the code with 'delayed' free
in system alloc is not tested it seems.