From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 68F5343040D for ; Sat, 5 Sep 2020 01:39:12 +0300 (MSK) References: <20200904135149.2756-1-i.kosarev@tarantool.org> From: Vladislav Shpilevoy Message-ID: <67586d6e-46aa-e4f4-1ee7-3fb733bd80db@tarantool.org> Date: Sat, 5 Sep 2020 00:39:10 +0200 MIME-Version: 1.0 In-Reply-To: <20200904135149.2756-1-i.kosarev@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH] core: introduce evenly distributed int64 random in range List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Ilya Kosarev , alyapunov@tarantool.org Cc: tarantool-patches@dev.tarantool.org Hi! Thanks for the patch! See 2 comments below. On 04.09.2020 15:51, Ilya Kosarev wrote: > Tarantool codebase had at least two functions to generate random > integer in given range and both of them had problems at least with > integer overflow. This patch brings nice function to generate random > int64_t in given range without overflow while preserving uniform random > generator distribution. Most relevant replacements have been made. > > Closes #5075 > --- > Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-5075-fine-random-in-range > Issue: https://github.com/tarantool/tarantool/issues/5075 > > @ChangeLog: > * Introduce overflow resistand & evenly distributed random in range (gh-5075). 1. resistand -> resistant > > src/lib/core/random.c | 8 ++++++++ > src/lib/core/random.h | 12 ++++++++++++ > src/lib/swim/swim.c | 26 ++++---------------------- > test/unit/histogram.c | 12 +++--------- > 4 files changed, 27 insertions(+), 31 deletions(-) > > diff --git a/src/lib/core/random.c b/src/lib/core/random.c > index f4fa75b1c1..2229c5fbbc 100644 > --- a/src/lib/core/random.c > +++ b/src/lib/core/random.c > @@ -97,3 +97,11 @@ rand: > while (generated < size) > buf[generated++] = rand(); > } > + > +int64_t > +random_in_range(int64_t min, int64_t max) > +{ > + assert(max >= min); > + double drand = drand48(); > + return (int64_t)(drand * max + (1 - drand) * min); > +} 2. The problem here is that you won't get an overflow, but you won't get a completely random number either. This function is not able to return anything > 48 bits without precision loss. So for example if I will give it a range [INT64_MIN, INT64_MAX], it will never return lots of integers > 2^48. Especially the ones > 2^53, since starting from this point double will loose integer precision. I expect a random function be able to return any value from the given range. Not only certain ones. In other words, this solution is not much better than rand() %. It also is not able to return the complete range. I was thinking about that a lot, and this is why I filed a ticket. It is not a trivial task. The function should be able to return any value from 0 to 2^64-1 to cover the entire [INT64_MIN, INT64_MAX], or [0, UINT64_MAX]. I was thinking we could use random_bytes() to get a value [0, UINT64_MAX], and then scale it down to the given range. For example, if the range is [A, B], then its length L = B - A. And the random value is R. Then a scaled down value would be S = R / UINT64_MAX * L The problem here is the precision loss again. When you will divide R / UINT64_MAX to get a value from [0, 1] range, the result is likely to be garbage. Double can't handle that precision. This is where I stopped and filed the ticket. Another issue - random_bytes() is easy to use, but it is not pseudo-random. It is completely random. This is actually good usually, but not for the tests. In swim unit tests knowing random seed is the only way to reproduce a fail. That makes me think we need two random functions: for pseudo-random values and for real random. Pseudo-random could be generated by concatenation of several rand() calls using << and |. Another option - implement Mersenne twister algorithm of pseudo-random numbers of uint64_t type. Also I would look into what std:: has in C++. Its random functionality is much richer than rand().