From: "Ilya Kosarev" <i.kosarev@tarantool.org> To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH] core: introduce evenly distributed int64 random in range Date: Sun, 06 Sep 2020 16:19:58 +0300 [thread overview] Message-ID: <1599398398.275563907@f169.i.mail.ru> (raw) In-Reply-To: <67586d6e-46aa-e4f4-1ee7-3fb733bd80db@tarantool.org> [-- Attachment #1: Type: text/plain, Size: 5273 bytes --] Hi! Thanks for your review. See my answers & questions below. >Суббота, 5 сентября 2020, 1:39 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>: > >Hi! Thanks for the patch! > >See 2 comments below. > >On 04.09.2020 15:51, Ilya Kosarev wrote: >> Tarantool codebase had at least two functions to generate random >> integer in given range and both of them had problems at least with >> integer overflow. This patch brings nice function to generate random >> int64_t in given range without overflow while preserving uniform random >> generator distribution. Most relevant replacements have been made. >> >> Closes #5075 >> --- >> Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-5075-fine-random-in-range >> Issue: https://github.com/tarantool/tarantool/issues/5075 >> >> @ChangeLog: >> * Introduce overflow resistand & evenly distributed random in range (gh-5075). > >1. resistand -> resistant > >> >> src/lib/core/random.c | 8 ++++++++ >> src/lib/core/random.h | 12 ++++++++++++ >> src/lib/swim/swim.c | 26 ++++---------------------- >> test/unit/histogram.c | 12 +++--------- >> 4 files changed, 27 insertions(+), 31 deletions(-) >> >> diff --git a/src/lib/core/random.c b/src/lib/core/random.c >> index f4fa75b1c1..2229c5fbbc 100644 >> --- a/src/lib/core/random.c >> +++ b/src/lib/core/random.c >> @@ -97,3 +97,11 @@ rand: >> while (generated < size) >> buf[generated++] = rand(); >> } >> + >> +int64_t >> +random_in_range(int64_t min, int64_t max) >> +{ >> + assert(max >= min); >> + double drand = drand48(); >> + return (int64_t)(drand * max + (1 - drand) * min); >> +} > >2. The problem here is that you won't get an overflow, but you won't get a >completely random number either. This function is not able to return >anything > 48 bits without precision loss. So for example if I will give it >a range [INT64_MIN, INT64_MAX], it will never return lots of integers > 2^48. >Especially the ones > 2^53, since starting from this point double will loose >integer precision. * Ok, right, i see the problem here. Didn’t think it through enough. > >I expect a random function be able to return any value from the given range. >Not only certain ones. > >In other words, this solution is not much better than rand() %. It also is not >able to return the complete range. > >I was thinking about that a lot, and this is why I filed a ticket. It is not >a trivial task. > >The function should be able to return any value from 0 to 2^64-1 to cover the >entire [INT64_MIN, INT64_MAX], or [0, UINT64_MAX]. > >I was thinking we could use random_bytes() to get a value [0, UINT64_MAX], and >then scale it down to the given range. For example, if the range is [A, B], then >its length L = B - A. And the random value is R. > >Then a scaled down value would be > > S = R / UINT64_MAX * L > >The problem here is the precision loss again. When you will divide R / UINT64_MAX >to get a value from [0, 1] range, the result is likely to be garbage. Double >can't handle that precision. This is where I stopped and filed the ticket. * Well, yes, scaling doesn’t really seem to work as good as we want to. I looked through this page: https://www.pcg-random.org/posts/bounded-rands.html on generation in range and found out a bounding method which i think is the most suitable for us. It works for int64_t too and can be used both for «complete random» and «pseudo-random»: ``` uint32_t bounded_rand(rng_t& rng, uint32_t range) { uint32_t mask = ~uint32_t(0); --range; mask >>= __builtin_clz(range|1); uint32_t x; do { x = rng() & mask; } while (x > range); return x; } ``` >Another issue - random_bytes() is easy to use, but it is not pseudo-random. It is >completely random. This is actually good usually, but not for the tests. In swim >unit tests knowing random seed is the only way to reproduce a fail. That makes me >think we need two random functions: for pseudo-random values and for real random. > >Pseudo-random could be generated by concatenation of several rand() calls using ><< and |. > >Another option - implement Mersenne twister algorithm of pseudo-random numbers of >uint64_t type. > >Also I would look into what std:: has in C++. Its random functionality is much >richer than rand(). * Right, C++ has mersenne_twister int64_t implementation and some solution to get int64_t in range as following: ``` std::random_device rd; std::mt19937_64 generator(rd()); std::uniform_int_distribution<int64_t> range(min, max); return range(generator); ``` However, C++ implementation seems to be overcomplicated. I think it is better idea to implement something simple & fast. For the mersenne_twister the good idea is to adopt this implementation, as far as i see it: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html But what i think might be even better is to take something less classic while more relevant, according to this paper: https://arxiv.org/pdf/1910.06437.pdf I think the best option for us now is xoshiro256++: http://prng.di.unimi.it/ It seems to be much faster and doesn’t fail any known statistical test as far as i see. The implementation to adapt: http://prng.di.unimi.it/xoshiro256plusplus.c -- Ilya Kosarev [-- Attachment #2: Type: text/html, Size: 7483 bytes --]
next prev parent reply other threads:[~2020-09-06 13:19 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-04 13:51 Ilya Kosarev 2020-09-04 22:39 ` Vladislav Shpilevoy 2020-09-06 13:19 ` Ilya Kosarev [this message] 2020-09-17 13:56 ` Vladislav Shpilevoy -- strict thread matches above, loose matches on Subject: below -- 2020-09-04 13:20 Ilya Kosarev
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1599398398.275563907@f169.i.mail.ru \ --to=i.kosarev@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH] core: introduce evenly distributed int64 random in range' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox