From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 1E0E24765E1 for ; Wed, 23 Dec 2020 16:15:05 +0300 (MSK) Received: by mail-lf1-f67.google.com with SMTP id o17so40031073lfg.4 for ; Wed, 23 Dec 2020 05:15:05 -0800 (PST) From: mechanik20051988 Date: Wed, 23 Dec 2020 16:14:47 +0300 Message-Id: <8f959e036941677f2e8a18a7aadb2b7e32e408d0.1608715671.git.mechanik20051988@gmail.com> In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH v2 1/3] small: implement new size class evaluation List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: v.shpilevoy@tarantool.org Cc: tarantool-patches@dev.tarantool.org From: Aleksandr Lyapunov 1. Fixed 2. Fixed 3. Fixed 4. Yes count of incremental size == (1 << effective_bits), for example: granularity = 8, alloc_factor = 1.05, min_alloc = 24 so we have: ignore_bits_count = 3, effective_bits = 4, effective_size = 16, effective_mask = 15, size_shift = 16 and we can calculate first incremental pool sizes: 24, 32, 40, 48, ... and so on 16 times 5. Change rnd to ignore bits count 6. eff means effective, change to effective 7. For faster calculation, so as not to calculate in the process of calculating the offset of the pool from the beginning of the array 8. Yes, drop it 9. The user configures the allocation factor, the actual factor may differ, but it depends directly on the parameter set by the user. User can choose the parameter so that it suits the real allocation factor. 10. Fixed 11. Fls - find last significant bit see for example https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bitops.h#L324 12. Yes you right, it's offset in the array of pools. Rename this functions small_class_calc_offset_by_size and small_class_calc_size_by_offset 13. Fixed 14. You can calculate it manually using float log2 = logf(2); sc->effective_bits = (unsigned)(logf(log2 / logf(desired_factor)) / log2 + .5); 15. Fixed 16. These are numbers from my head, i change test to make it more clear small_alloc uses a collection of mempools of different sizes. If small_alloc stores all mempools in an array then it have to determine an offset in that array where the most suitable mempool is. Let's name the offset as 'size class' and the size that the corresponding mempool allocates as 'class size'. Historically the class sizes grow incrementally up to some point and then (at some size class) grow exponentially with user-provided factor. Apart from incremental part the exponential part is not very obvious. Binary search and floating-point logarithm could be used for size class determination but both approaches seem to be too slow. This patch implements faster size class evaluation based on integral operations. Part of #5216 --- CMakeLists.txt | 4 + LICENSE | 2 +- small/small_class.c | 56 ++++++++ small/small_class.h | 218 +++++++++++++++++++++++++++++ test/CMakeLists.txt | 9 ++ test/small_class.c | 176 +++++++++++++++++++++++ test/small_class.result | 15 ++ test/small_class_branchless.result | 15 ++ 8 files changed, 494 insertions(+), 1 deletion(-) create mode 100644 small/small_class.c create mode 100644 small/small_class.h create mode 100644 test/small_class.c create mode 100644 test/small_class.result create mode 100644 test/small_class_branchless.result diff --git a/CMakeLists.txt b/CMakeLists.txt index 4c575eb..d797128 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -53,6 +53,7 @@ set(lib_headers small/rlist.h small/slab_arena.h small/slab_cache.h + small/small_class.h small/small.h small/lsregion.h small/static.h) @@ -63,6 +64,7 @@ set(lib_sources small/region.c small/mempool.c small/slab_arena.c + small/small_class.c small/small.c small/matras.c small/ibuf.c @@ -71,6 +73,7 @@ set(lib_sources small/static.c) add_library(${PROJECT_NAME} STATIC ${lib_sources}) +target_link_libraries(${PROJECT_NAME} m) enable_testing() add_subdirectory(test) @@ -87,6 +90,7 @@ if (NOT ENABLE_VALGRIND) endif() add_library(${PROJECT_NAME}_shared SHARED ${lib_sources}) +target_link_libraries(${PROJECT_NAME}_shared m) set_target_properties(${PROJECT_NAME}_shared PROPERTIES VERSION 1.0 SOVERSION 1) set_target_properties(${PROJECT_NAME}_shared PROPERTIES OUTPUT_NAME ${PROJECT_NAME}) Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following diff --git a/small/small_class.c b/small/small_class.c new file mode 100644 index 0000000..99915a8 --- /dev/null +++ b/small/small_class.c @@ -0,0 +1,56 @@ +/* + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include "small_class.h" +#include +#include + +void +small_class_create(struct small_class *sc, unsigned granularity, + float desired_factor, unsigned min_alloc) +{ + assert(granularity > 0); /* size cannot be multiple of zero. */ + assert((granularity & (granularity - 1)) == 0); /* must power of 2. */ + assert(desired_factor > 1.f); + assert(desired_factor <= 2.f); + assert(min_alloc > 0); /* Cannot allocate zero. */ + + sc->granularity = granularity; + sc->ignore_bits_count = __builtin_ctz(granularity); + float log2 = logf(2); + sc->effective_bits = (unsigned)(logf(log2 / logf(desired_factor)) / log2 + .5); + sc->effective_size = 1u << sc->effective_bits; + sc->effective_mask = sc->effective_size - 1u; + sc->size_shift = min_alloc - granularity; + sc->size_shift_plus_1 = sc->size_shift + 1; + + sc->actual_factor = powf(2, 1.f / powf(2, sc->effective_bits)); +} diff --git a/small/small_class.h b/small/small_class.h new file mode 100644 index 0000000..0677557 --- /dev/null +++ b/small/small_class.h @@ -0,0 +1,218 @@ +#pragma once +/* + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/** + * CHAR_BIT + */ +#include + +/** + * + * small_alloc uses a collection of mempools of different sizes. + * If small_alloc stores all mempools in an array then it have to determine + * an offset in that array where the most suitable mempool is. + * Let's name the offset as 'size class' and the size that the corresponding + * mempool allocates as 'class size'. + * Historically the class sizes grow incrementally up to some point and then + * (at some size class) grow exponentially with user-provided factor. + * Apart from incremental part the exponential part is not very obvious. + * Binary search and floating-point logarithm could be used for size class + * determination but both approaches seem to be too slow. + * + * This module is designed for faster size class determination. + * The idea is to use integral binary logarithm (bit search) and improve it + * in some way in order to increase precision - allow other logarithm bases + * along with 2. + * Binary integral logarithm is just a position of the most significant bit of + * a value. Let's look closer to binary representation of an allocation size + * and size class that is calculated as binary logarithm: + * size | size class + * 00001????.. | x + * 0001?????.. | x + 1 + * 001??????.. | x + 2 + * Let's take into account n lower bits just after the most significant + * in the value and divide size class into 2^n subclasses. For example if n = 2: + * size | size class + * 0000100??.. | x + * 0000101??.. | x + 1 + * 0000110??.. | x + 2 + * 0000111??.. | x + 3 + * 000100???.. | x + 4 <- here the size doubles, in 4 = 2^n steps. + * 000101???.. | x + 5 + * That gives us some kind of approximation of a logarithm with a base equal + * to pow(2, 1 / pow(2, n)). That means that for given factor 'f' of exponent + * we can choose such a number of bits 'n' that gives us an approximation of + * an exponent that is close to 'f'. + * + * Of course if the most significant bit of a value is less than 'n' we can't + * use the formula above. But it's not a problem since we can (and even would + * like to!) use an incremental size class evaluation of those sizes. + * size | size class + * 0000001 | 1 <- incremental growth. + * 0000010 | 2 + * 0000011 | 3 + * 0000100 | 4 <- here the exponential approximation starts. + * 0000101 | 5 + * 0000110 | 6 + * 0000111 | 7 + * 000100? | 8 + * 000101? | 9 + * + * There's some implementation details. Size class is zero based, and the size + * must be rounded up to the closest class size. Even more, we want to round + * up size to some granularity, we doesn't want to have incremental pools of + * sizes 1, 2, 3.., we want them to be 8, 16, 24.... All that is achieved by + * subtracting size by one and omitting several lower bits of the size. + */ + +#if defined(__cplusplus) +extern "C" { +#endif /* defined(__cplusplus) */ + +struct small_class { + /** Every class size must be a multiple of this. */ + unsigned granularity; + /** log2(granularity), ignore those number of the lowest bit of size. */ + unsigned ignore_bits_count; + /** + * A number of bits (after the most significant bit) that are used in + * size class evaluation ('n' in the Explanation above). + */ + unsigned effective_bits; + /** 1u << effective_bits. */ + unsigned effective_size; + /** effective_size - 1u. */ + unsigned effective_mask; + /** + * By default the lowest possible allocation size (aka class size of + * class 0) is granularity. If a user wants different min_alloc, we + * simply shift sizes; min_alloc = granularity + size_shift. + */ + unsigned size_shift; + /** Actually we need 'size_shift + 1', so store it. */ + unsigned size_shift_plus_1; + /** + * Exponential factor, approximation of which we managed to provide. + * It is calculated from requested_factor, it's guaranteed that + * it must be in range [requested_factor/k, requested_factor*k], + * where k = pow(requested_factor, 0.5). + */ + float actual_factor; +}; + +/** + * Create an instance of small_class evaluator. All args must meet the + * requirements, undefined behaviour otherwise (at least assert). + * @param sc - instance to create. + * @param granularity - any class size will be a multiple of this value. + * Must be a power of 2 (and thus greater than zero). + * @param desired_factor - desired factor of growth of class size. + * Must be in (1, 2] range. Actual factor can be different. + * @param min_alloc - the lowest class size, must be greater than zero. + * The good choice is the same value as granularity. + */ +void +small_class_create(struct small_class *sc, unsigned granularity, + float desired_factor, unsigned min_alloc); + +/** + * Find position of the most significant bit in a value. + * If the value is zero the behaviour is undefined. + */ +static inline unsigned +small_class_fls(unsigned value) +{ + /* + * Usually clz is implemented as bsr XOR 0x1f. + * If we add another XOR the compiler will squash 'em and leave just bsr. + */ + unsigned clz = __builtin_clz(value); + return (sizeof(value) * CHAR_BIT - 1) ^ clz; +} + +static inline unsigned +small_class_calc_offset_by_size(struct small_class *sc, unsigned size) +{ + /* + * Usually we have to decrement size in order to: + * 1)make zero base class. + * 2)round up to class size. + * Also here is a good place to shift size if a user wants the lowest + * class size to be different from granularity. + */ + unsigned checked_size = size - sc->size_shift_plus_1; + /* Check overflow. */ + size = checked_size > size ? 0 : checked_size; + /* Omit never significant bits. */ + size >>= sc->ignore_bits_count; +#ifndef SMALL_CLASS_BRANCHLESS + if (size < sc->effective_size) + return size; /* Linear approximation, faster part. */ + /* Get log2 base part of result. Effective bits are omitted. */ + unsigned log2 = small_class_fls(size >> sc->effective_bits); +#else + /* Evaluation without branching */ + /* + * Get log2 base part of result. Effective bits are omitted. + * Also note that 1u is ORed to make log2 == 0 for smaller sizes. + */ + unsigned log2 = small_class_fls((size >> sc->effective_bits) | 1u); +#endif + /* Effective bits (and leading 1?) in size, represent small steps. */ + unsigned linear_part = size >> log2; + /* Log2 part, multiplied correspondingly, represent big steps. */ + unsigned log2_part = log2 << sc->effective_bits; + /* Combine the result. */ + return linear_part + log2_part; +} + +static inline unsigned +small_class_calc_size_by_offset(struct small_class *sc, unsigned cls) +{ + ++cls; + /* Effective bits (without leading 1) in size */ + unsigned linear_part = cls & sc->effective_mask; + /* Log2 base part of the size, maybe with leading 1 of the size. */ + unsigned log2 = cls >> sc->effective_bits; + if (log2 != 0) { + /* Remove leading 1 from log2 part and add to linear part. */ + log2--; + linear_part |= sc->effective_size; + } + /* Reassemble size and add never significant bits. */ + return sc->size_shift + (linear_part << log2 << sc->ignore_bits_count); +} + +#if defined(__cplusplus) +} /* extern "C" { */ +#endif /* defined(__cplusplus) */ + diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 8317c65..5ee0821 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -31,6 +31,13 @@ set_source_files_properties(rb_rand.cc PROPERTIES add_executable(mempool.test mempool.c) target_link_libraries(mempool.test small) +add_executable(small_class.test small_class.c unit.c) +target_link_libraries(small_class.test small) + +add_executable(small_class_branchless.test small_class.c unit.c) +target_link_libraries(small_class_branchless.test small) +target_compile_definitions(small_class_branchless.test PUBLIC SMALL_CLASS_BRANCHLESS) + add_executable(small_alloc.test small_alloc.c) target_link_libraries(small_alloc.test small) @@ -64,6 +71,8 @@ add_test(region ${CMAKE_CURRENT_BUILD_DIR}/region.test) add_test(ibuf ${CMAKE_CURRENT_BUILD_DIR}/ibuf.test) add_test(obuf ${CMAKE_CURRENT_BUILD_DIR}/obuf.test) add_test(mempool ${CMAKE_CURRENT_BUILD_DIR}/mempool.test) +add_test(small_class ${CMAKE_CURRENT_BUILD_DIR}/small_class.test) +add_test(small_class_branchless ${CMAKE_CURRENT_BUILD_DIR}/small_class_branchless.test) add_test(small_alloc ${CMAKE_CURRENT_BUILD_DIR}/small_alloc.test) add_test(lf_lifo ${CMAKE_CURRENT_BUILD_DIR}/lf_lifo.test) add_test(slab_cache ${CMAKE_CURRENT_BUILD_DIR}/slab_cache.test) diff --git a/test/small_class.c b/test/small_class.c new file mode 100644 index 0000000..b56cfeb --- /dev/null +++ b/test/small_class.c @@ -0,0 +1,176 @@ +/* + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ +#include "small/small_class.h" +#include "unit.h" +#include + +#define SZR(arr) sizeof(arr) / sizeof(arr[0]) + +static void +test_class(void) +{ + header(); + plan(0); + + struct small_class sc; + small_class_create(&sc, 2, 1.2, 12); + unsigned class = small_class_calc_offset_by_size(&sc, 0); + unsigned class_size = small_class_calc_size_by_offset(&sc, class); + + for (unsigned size = 1; size <= 100; size++) { + unsigned cls = small_class_calc_offset_by_size(&sc, size); + if (size <= class_size) + fail_unless(cls == class); + if (size == class_size + 1) { + fail_unless(cls == class + 1); + class = cls; + class_size = small_class_calc_size_by_offset(&sc, class); + } + } + + check_plan(); + footer(); +} + +static void +check_expectation() +{ + header(); + + const unsigned granularity_arr[] = {1, 2, 4, 8}; + const unsigned test_sizes = 1024; + const unsigned test_classes = 1024; + /* + * We expect 4 effective bits with factor = 1.05, + * see small_class_create + */ + float factor = 1.05; + /* + * 1 << 4 (effective bits mentioned above) + */ + const unsigned eff_size = 16; + unsigned test_class_size[test_classes + eff_size]; + + plan(0); + + for (unsigned variant = 0; variant < SZR(granularity_arr); variant++) { + unsigned granularity = granularity_arr[variant]; + unsigned min_alloc = granularity + (rand () % 16); + + { + unsigned class_size = min_alloc - granularity; + /* incremental growth */ + for (unsigned i = 0; i < eff_size; i++) { + class_size += granularity; + test_class_size[i] = class_size; + } + /* exponential growth */ + unsigned growth = granularity; + for (unsigned i = eff_size; i < test_classes; i += eff_size) { + for (unsigned j = 0; j < eff_size; j++) { + class_size += growth; + test_class_size[i + j] = class_size; + } + growth *= 2; + } + } + + struct small_class sc; + small_class_create(&sc, granularity, factor, min_alloc); + fail_unless(sc.effective_size == eff_size); + + for (unsigned s = 0; s <= test_sizes; s++) { + unsigned expect_class = 0; + while (expect_class < test_classes && s > test_class_size[expect_class]) + expect_class++; + unsigned expect_class_size = test_class_size[expect_class]; + unsigned got_class = small_class_calc_offset_by_size(&sc, s); + unsigned got_class_size = small_class_calc_size_by_offset(&sc, got_class); + fail_unless(got_class == expect_class); + fail_unless(got_class_size == expect_class_size); + } + } + + check_plan(); + footer(); +} + +static void +check_factor() +{ + header(); + + plan(0); + + for (unsigned granularity = 1; granularity <= 4; granularity *= 4) { + for(float factor = 1.01; factor < 1.995; factor += 0.01) { + struct small_class sc; + small_class_create(&sc, granularity, factor, granularity); + float k = powf(factor, 0.5f); + fail_unless(sc.actual_factor >= factor / k && sc.actual_factor <= factor * k); + + float min_deviation = 1.f; + float max_deviation = 1.f; + /* Skip incremental growth. */ + for (unsigned i = sc.effective_size; i < sc.effective_size * 3; i++) { + unsigned cl_sz1 = small_class_calc_size_by_offset(&sc, i); + unsigned cl_sz2 = small_class_calc_size_by_offset(&sc, i + 1); + float real_growth = 1.f * cl_sz2 / cl_sz1; + float deviation = sc.actual_factor / real_growth; + if (deviation < min_deviation) + min_deviation = deviation; + if (deviation > max_deviation) + max_deviation = deviation; + } + float ln2 = logf(2); + fail_unless(min_deviation > ln2 && max_deviation < 2 * ln2); + } + } + + check_plan(); + footer(); +} + + +int +main(void) +{ + header(); + plan(3); + + test_class(); + check_expectation(); + check_factor(); + + int rc = check_plan(); + footer(); + return rc; +} diff --git a/test/small_class.result b/test/small_class.result new file mode 100644 index 0000000..de74f00 --- /dev/null +++ b/test/small_class.result @@ -0,0 +1,15 @@ + *** main *** +1..3 + *** test_class *** + 1..0 +ok 1 - subtests + *** test_class: done *** + *** check_expectation *** + 1..0 +ok 2 - subtests + *** check_expectation: done *** + *** check_factor *** + 1..0 +ok 3 - subtests + *** check_factor: done *** + *** main: done *** diff --git a/test/small_class_branchless.result b/test/small_class_branchless.result new file mode 100644 index 0000000..de74f00 --- /dev/null +++ b/test/small_class_branchless.result @@ -0,0 +1,15 @@ + *** main *** +1..3 + *** test_class *** + 1..0 +ok 1 - subtests + *** test_class: done *** + *** check_expectation *** + 1..0 +ok 2 - subtests + *** check_expectation: done *** + *** check_factor *** + 1..0 +ok 3 - subtests + *** check_factor: done *** + *** main: done *** -- 2.20.1