Hi,   Thanks for your review!   Sent v3 of the patch considering your comments. >Суббота, 3 октября 2020, 1:43 +03:00 от Vladislav Shpilevoy : >  >Hi! Thanks for the patch and for the good research, seriously! > >See 11 comments below. > >On 30.09.2020 02:06, Ilya Kosarev wrote: >> Tarantool codebase had at least two functions to generate random >> integer in given range and both of them had problems at least with >> integer overflow. This patch brings nice function to generate random >> int64_t in given range without overflow while preserving uniform random >> generator distribution using unbiased bitmask with rejection method. >> It is now possible to use xoshiro256++ PRNG or random bytes as a basis. >> Most relevant replacements have been made. >> >> Closes #5075 >> --- >> Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-5075-fine-random-in-range >> Issue: https://github.com/tarantool/tarantool/issues/5075 >> >> @ChangeLog: >> * Introduce overflow resistant & evenly distributed random in range (gh-5075). > >1. I wouldn't put it into the changelog. It is not a visible bugfix, >nor a public feature. Rather some refactoring. Ok, i see, removed in v3 of the patch. > >> Changes in v2: >> - implemented fine 64-bit random >> - implemented fine range bounding >> >> src/lib/core/random.c | 57 +++++++++++++++++++++++++++++++++++++++++++ >> src/lib/core/random.h | 38 +++++++++++++++++++++++++++++ >> src/lib/swim/swim.c | 26 +++----------------- >> test/unit/histogram.c | 12 +++------ > >2. Please, add some unit tests checking that, for example, >the randoms work fine even on the maximal range [INT64_MIN, INT64_MAX], >with some edge cases like [INT64_MIN, INT64_MIN], [INT64_MAX, INT64_MAX], >[1, 1], [0, 0], and some regular cases. Also the tests should check the >borders are respected, and ASAN passes (originally the issue was discovered >by ASAN which found signed integer overflows under ENABLE_UB_SANITIZER >option). All right, done in v3. > >ASAN checks for signed int overflow UB are disabled for now because >of SQL producing this UB a lot. But you can enable it by changing >compiler.cmake, and running the unit tests. If you want to be sure >it won't fail, when the overflow check will be enabled after SQL is >fixed. Yes, thanks for the instruction, checked locally. > >> 4 files changed, 102 insertions(+), 31 deletions(-) >> >> diff --git a/src/lib/core/random.c b/src/lib/core/random.c >> index f4fa75b1c..4b52408ea 100644 >> --- a/src/lib/core/random.c >> +++ b/src/lib/core/random.c> @@ -63,6 +65,8 @@ random_init(void) >> srand: >> srandom(seed); >> srand(seed); >> + >> + random_bytes((char *)state, 32); > >3. Never hardcode an array size except maybe in its declaration. >sizeof(state) will return the same, but it is safe in case the >state size ever changes. Right, fixed in v3. > >> } >> >> void >> @@ -97,3 +101,56 @@ rand: >> while (generated < size) >> buf[generated++] = rand(); >> } >> + >> +uint64_t >> +real_random(void) >> +{ >> + uint64_t result; >> + random_bytes((char *)(&result), 8); >> + return result; >> +} >> + >> +static inline >> +uint64_t rotl(const uint64_t x, int k) > >4. Sorry, we never put return type on the same line as function >name. Also why did you declare 'x' const? Why 'k' is not const >then? It seems you copy-pasted it from somewhere without >changing. Please, convert it to our code style. Yes, right, didn’t think about the difference. Fixed in v3. Well, yes, it was borrowed from here originally, as mentioned: http://prng.di.unimi.it/xoshiro256plusplus.c > >5. What is 'rotl'? Comment is added in v3: «rotate left». > >> +{ >> + return (x << k) | (x >> (64 - k)); >> +} >> + >> +uint64_t >> +pseudo_random(void) >> +{ >> + const uint64_t result = rotl(state[0] + state[3], 23) + state[0]; >> + const uint64_t t = state[1] << 17; >> + state[2] ^= state[0]; >> + state[3] ^= state[1]; >> + state[1] ^= state[2]; >> + state[0] ^= state[3]; >> + state[2] ^= t; >> + state[3] = rotl(state[3], 45);> + return result; >> +} >> + >> +static inline uint64_t >> +random_in_range(uint64_t (*rnd)(void), uint64_t range) >> +{ >> + uint64_t mask = UINT64_MAX >> __builtin_clzll(range | 1); > >6. Use bit_clz_u64(). You may not have __builtin_clzll. All right, fixed in v3. > >> + uint64_t result; >> + do { >> + result = rnd() & mask; > >7. I guess we can hope the compiler will inline rnd() virtual >call to a non-virtual call below, but I can't be 100% sure, and >nobody can. Even if now you see it inlined in the assembly. Better >duplicate these few lines or use a macro, than relying on the >compiler. A single virtual call would ruin the perf of the >algorithm you chose over mersen twister. I was quite sure it is reliable, but couldn’t find the proof. Changed in v3. > >> + } while (result > range); >> + return result; >> +} >> + >> +int64_t >> +real_random_in_range(int64_t min, int64_t max) >> +{ >> + assert(max >= min); >> + return min + random_in_range(real_random, (uint64_t) (max - min)); > >8. I guess this will fail UB checks from ASAN, because max-min will >be firstly stored into int64_t and only then cast to uint64_t. The >first step can overflow int64_t. I remember there was a simple solution, >but can't remember what it was. Yes, ASAN is fine with  ( uint64_t )max - min.  Fixed in v3. > >> +} >> + >> +int64_t >> +pseudo_random_in_range(int64_t min, int64_t max) >> +{ >> + assert(max >= min); >> + return min + random_in_range(pseudo_random, (uint64_t) (max - min)); >> +} >> diff --git a/src/lib/core/random.h b/src/lib/core/random.h >> index 26109b14b..9228df72e 100644 >> --- a/src/lib/core/random.h >> +++ b/src/lib/core/random.h >> @@ -45,6 +46,43 @@ random_free(void); >> void >> random_bytes(char *buf, size_t size); >> >> +/** >> + * Just 8 random_bytes(). >> + */ >> +uint64_t >> +real_random(void); >> + >> +/** >> + * xoshiro256++ pseudo random generator. >> + * http://prng.di.unimi.it/ >> + * State is initialized with random_bytes(). >> + */ >> +uint64_t >> +pseudo_random(void); >> + >> +/** >> + * Return a random int64_t number within given boundaries. >> + * >> + * Instead of blindly calculating a modulo, this function uses >> + * unbiased bitmask with rejection method to provide number in >> + * given boundaries while preserving uniform distribution and >> + * avoiding overflow. It uses random bytes as a basis. > >9. Are the boundaries included or not? Both included, both >excluded, first included and last excluded? Added to the comment in v3. > >> + */ >> +int64_t >> +real_random_in_range(int64_t min, int64_t max); >> + >> +/** >> + * Return a pseudo random int64_t number within given boundaries. >> + * >> + * Instead of blindly calculating a modulo, this function uses >> + * unbiased bitmask with rejection method to provide number in >> + * given boundaries while preserving uniform distribution and >> + * avoiding overflow. It uses xoshiro256++ as an uint64 pseudo >> + * random generator. >> + */ >> +int64_t >> +pseudo_random_in_range(int64_t min, int64_t max); >> + >> #if defined(__cplusplus) >> } > >10. When you changed SWIM to use the new generator, you made the >tests harder to reproduce in case they fail. SWIM tests print random >seed in the beginning, so as it could be used to repeat the test. >Now you need to print the state of your pseudo generator too, so as >it could be restored and reproduced. See swim_run_test(). All right, done in v3. > >11. Would be nice to see motivation why the chosen generator was >chosen in the comments in random.c. Right, added some arguments in comment.   -- Ilya Kosarev