From mboxrd@z Thu Jan 1 00:00:00 1970
From: imeevma@tarantool.org
Subject: [tarantool-patches] [PATCH v1 1/3] sql: disallow identical samples in statistics
Date: Thu, 21 Mar 2019 22:30:05 +0300
Message-Id: <33d7b06c804a50e5e9047fce26f53b15b7a73b0f.1553195994.git.imeevma@gmail.com>
Reply-To: tarantool-patches@freelists.org
List-Id: tarantool-patches
To: korablev@tarantool.org
Cc: tarantool-patches@freelists.org

Before this patch, the _sql_stat4 table could end up with fewer rows
of statistics than the number of samples created during ANALYZE.
This happened because some of the generated rows were identical, so
each duplicate replaced the previous one when the statistics were
inserted into _sql_stat4. This patch prevents ANALYZE from creating
identical rows of statistics. After this patch, the number of rows in
_sql_stat4 is never less than it was before.

Needed for #2843
---
 src/box/sql/analyze.c          | 18 +++++++++++++++++-
 test/sql-tap/analyze1.test.lua | 19 ++++++++++++++++++-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c
index 6ea598c..ffb7335 100644
--- a/src/box/sql/analyze.c
+++ b/src/box/sql/analyze.c
@@ -180,6 +180,13 @@ struct Stat4Accum {
 	int iGet;		/* Index of current sample accessed by stat_get() */
 	Stat4Sample *a;		/* Array of mxSample Stat4Sample objects */
 	sql *db;		/* Database connection, for malloc() */
+	/*
+	 * Count of rows with index value identical current
+	 * index value.
+	 */
+	uint64_t identical_index_value;
+	/* Row number of previous periodic sample. */
+	uint64_t previous_psample;
 };
 
 /* Reclaim memory used by a Stat4Sample
@@ -307,6 +314,8 @@ statInit(sql_context * context, int argc, sql_value ** argv)
 	p->nKeyCol = nKeyCol;
 	p->current.anDLt = (tRowcnt *) & p[1];
 	p->current.anEq = &p->current.anDLt[nColUp];
+	p->identical_index_value = 0;
+	p->previous_psample = 0;
 
 	{
 		u8 *pSpace;	/* Allocated space not yet assigned */
@@ -477,7 +486,9 @@ sampleInsert(Stat4Accum * p, Stat4Sample * pNew, int nEqZero)
 	/* Insert the new sample */
 	pSample = &p->a[p->nSample];
 	sampleCopy(p, pSample, pNew);
-	p->nSample++;
+	if (pNew->isPSample == 0 || p->previous_psample == 0 ||
+	    p->nRow - p->previous_psample > p->identical_index_value)
+		p->nSample++;
 
 	/* Zero the first nEqZero entries in the anEq[] array. */
 	memset(pSample->anEq, 0, sizeof(tRowcnt) * nEqZero);
@@ -559,6 +570,10 @@ statPush(sql_context * context, int argc, sql_value ** argv)
 	assert(p->nCol > 0);
 	/* iChng == p->nCol means that the current and previous rows are identical */
 	assert(iChng <= p->nCol);
+	if (iChng == p->nCol)
+		++p->identical_index_value;
+	else
+		p->identical_index_value = 0;
 	if (p->nRow == 0) {
 		/* This is the first call to this function. Do initialization. */
 		for (i = 0; i < p->nCol + 1; i++)
@@ -592,6 +607,7 @@ statPush(sql_context * context, int argc, sql_value ** argv)
 			p->current.iCol = 0;
 			sampleInsert(p, &p->current, p->nCol);
 			p->current.isPSample = 0;
+			p->previous_psample = p->nRow;
 		}
 		/* Update the aBest[] array. */
 		for (i = 0; i < p->nCol; i++) {
diff --git a/test/sql-tap/analyze1.test.lua b/test/sql-tap/analyze1.test.lua
index cc12593..959ea5e 100755
--- a/test/sql-tap/analyze1.test.lua
+++ b/test/sql-tap/analyze1.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(38)
+test:plan(39)
 
 --!./tcltestrunner.lua
 -- 2005 July 22
@@ -546,6 +546,23 @@ test:do_execsql_test(
         --
     })
 
+-- This test show that index with 1000 identical index values and
+-- 25 distinct ones gives max number of samples.
+test:do_test(
+    "analyze-7.1",
+    function()
+        test:execsql("CREATE TABLE t7(i INTEGER PRIMARY KEY, a INTEGER);")
+        test:execsql("CREATE INDEX i7 ON t7(a);")
+        for i = 0, 999 do test:execsql("INSERT INTO t7 VALUES("..i..", 0) ") end
+        for i = 1, 24 do test:execsql("INSERT INTO t7 VALUES(".. i + 999 .. ", ".. i ..") ") end
+        test:execsql("ANALYZE;")
+        return test:execsql([[SELECT count(*) FROM "_sql_stat4" WHERE "idx" = 'I7';]])
+    end, {
+        --
+        24
+        --
+})
+
 -- # This test corrupts the database file so it must be the last test
 -- # in the series.
 -- #
-- 
2.7.4
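
To make the new condition in sampleInsert() easier to review, here is a minimal standalone sketch of the bookkeeping. It is not the code from analyze.c. The struct accum_model type and the functions keeps_new_sample() and count_kept_psamples() are hypothetical names introduced only for illustration. The sketch also assumes that a periodic sample is taken every `period` rows, whereas the real code derives the trigger from anLt and also maintains the aBest[] samples, which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the Stat4Accum fields the patch uses. */
struct accum_model {
	uint64_t n_row;                 /* Rows seen so far (p->nRow). */
	uint64_t identical_index_value; /* Length of the current run of equal keys. */
	uint64_t previous_psample;      /* Row number of the last periodic sample, 0 if none. */
};

/*
 * Mirrors the condition added to sampleInsert(): a periodic sample
 * is kept only if the key has changed since the previous periodic
 * sample, i.e. the current run of equal keys does not reach back
 * to the row of that previous sample.
 */
static bool
keeps_new_sample(const struct accum_model *m, bool is_psample)
{
	return !is_psample || m->previous_psample == 0 ||
	       m->n_row - m->previous_psample > m->identical_index_value;
}

/*
 * Feed sorted keys through the model and count the periodic samples
 * that would be kept. Assumption: one periodic sample every `period`
 * rows, which simplifies the nLt-based trigger of the real code.
 */
static unsigned
count_kept_psamples(const int *keys, size_t n, uint64_t period)
{
	struct accum_model m = {0, 0, 0};
	unsigned kept = 0;

	assert(period > 0);
	for (size_t i = 0; i < n; i++) {
		/* Same bookkeeping as the statPush() hunk. */
		if (i > 0 && keys[i] == keys[i - 1])
			m.identical_index_value++;
		else
			m.identical_index_value = 0;
		m.n_row++;
		if (m.n_row % period == 0) {
			if (keeps_new_sample(&m, true))
				kept++;
			m.previous_psample = m.n_row;
		}
	}
	return kept;
}
```

Under these assumptions, eight equal keys with a period of 2 keep only the first periodic sample, while six distinct keys with the same period keep all three.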