* [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server
@ 2018-04-23 20:29 Nikita Pettik
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*) Nikita Pettik
  ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Nikita Pettik @ 2018-04-23 20:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

Branch: https://github.com/tarantool/tarantool/tree/np/gh-3253-move-statistics-to-server
Issue: https://github.com/tarantool/tarantool/issues/3253

This patch-set transfers the statistics contained in the SQL-specific
struct Table and struct Index to Tarantool's struct index. Unused
statistics are removed entirely.

The first patch of the series slightly optimizes compilation of a simple
'SELECT COUNT(*)' query: there is no need to deliberate over which index
to choose - always use the PK. It also allows removing one usage of
statistics from the index.

The second patch removes the obsolete SQLite calculation of the average
tuple size: it is always possible to take the space size and the tuple
count and divide one by the other.

The third patch adds a simple wrapper to fetch the tuple count of a given
table from the server.

The last and main one introduces a new member of the index's opts to
describe statistics concerning the distribution of tuples. Within this
patch, the routine used to load statistics from _sql_stat1 and _sql_stat4
has been reworked to operate directly on Tarantool's structs. It also
includes codestyle fixes.

Nikita Pettik (4):
  sql: optimize compilation of SELECT COUNT(*)
  sql: add average tuple size calculation
  sql: refactor usages of table's tuple count
  sql: move SQL statistics to server

 src/box/index_def.c            |  92 +++++
 src/box/index_def.h            |  90 +++++
 src/box/sql/analyze.c          | 787 ++++++++++++++++++++---------------------
 src/box/sql/build.c            | 107 ------
 src/box/sql/parse.y            |   2 -
 src/box/sql/pragma.c           |  25 +-
 src/box/sql/prepare.c          |   7 +-
 src/box/sql/select.c           | 210 +++++------
 src/box/sql/sqliteInt.h        | 123 +++++--
 src/box/sql/vdbe.c             |   7 +-
 src/box/sql/where.c            | 231 ++++++++----
 src/box/sql/whereexpr.c        |   4 +-
 test/sql-tap/analyze9.test.lua |  81 +++--
 test/sql-tap/eqp.test.lua      |   2 +-
 14 files changed, 954 insertions(+), 814 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 19+ messages in thread
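For orientation, the arithmetic behind the second and third patches can be
sketched in a few lines of C using the box-level wrappers index_size() and
space_bsize() that the series settles on during review (a minimal sketch of
the idea only, not the exact helper; the series itself adds
sql_index_tuple_size() in analyze.c):

	/* Tuple count of a space == tuple count of its primary index. */
	struct index *pk = space_index(space, 0);
	ssize_t tuple_count = index_size(pk);
	/* Average tuple size = space size in bytes / tuple count. */
	ssize_t space_size = space_bsize(space);
	ssize_t avg_tuple_size = tuple_count != 0 ? space_size / tuple_count : 0;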
* [tarantool-patches] [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*)
  2018-04-23 20:29 [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server Nikita Pettik
@ 2018-04-23 20:29 ` Nikita Pettik
  2018-04-24 12:51   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 2/4] sql: add average tuple size calculation Nikita Pettik
  ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Nikita Pettik @ 2018-04-23 20:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

Originally, SQLite chose the best index to perform the COUNT operation.
In Tarantool there is no considerable difference between indexes for this
purpose, so let's not spend time on this routine and always choose the
primary index in the case of a simple query 'SELECT COUNT(*) FROM <tab>'.
Also, the patch contains codestyle fixes.
---
 src/box/sql/select.c      | 172 ++++++++++++++++++----------------------------
 src/box/sql/vdbe.c        |   2 -
 test/sql-tap/eqp.test.lua |   2 +-
 3 files changed, 68 insertions(+), 108 deletions(-)

diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index d97e466b5..0df8a71d4 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -35,6 +35,8 @@
  */
 #include <box/coll.h>
 #include "sqliteInt.h"
+#include "tarantoolInt.h"
+#include "box/schema.h"
 #include "box/session.h"

 /*
@@ -4262,42 +4264,42 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
 	return eRet;
 }

-/*
- * The select statement passed as the first argument is an aggregate query.
- * The second argument is the associated aggregate-info object. This
- * function tests if the SELECT is of the form:
+/**
+ * The second argument is the associated aggregate-info object.
+ * This function tests if the SELECT is of the form:
  *
  *   SELECT count(*) FROM <tbl>
  *
- * where table is a database table, not a sub-select or view. If the query
- * does match this pattern, then a pointer to the Table object representing
- * <tbl> is returned. Otherwise, 0 is returned.
+ * where table is not a sub-select or view.
+ *
+ * @param select The select statement in form of aggregate query.
+ * @param agg_info The associated aggregate-info object.
+ * @retval Pointer to space representing the table,
+ *         if the query matches this pattern. NULL otherwise.
*/ -static Table * -isSimpleCount(Select * p, AggInfo * pAggInfo) +static struct space* +is_simple_count(struct Select *select, struct AggInfo *agg_info) { - Table *pTab; - Expr *pExpr; - - assert(!p->pGroupBy); - - if (p->pWhere || p->pEList->nExpr != 1 - || p->pSrc->nSrc != 1 || p->pSrc->a[0].pSelect) { - return 0; - } - pTab = p->pSrc->a[0].pTab; - pExpr = p->pEList->a[0].pExpr; - assert(pTab && !space_is_view(pTab) && pExpr); - if (pExpr->op != TK_AGG_FUNCTION) - return 0; - if (NEVER(pAggInfo->nFunc == 0)) - return 0; - if ((pAggInfo->aFunc[0].pFunc->funcFlags & SQLITE_FUNC_COUNT) == 0) - return 0; - if (pExpr->flags & EP_Distinct) - return 0; - - return pTab; + assert(select->pGroupBy == NULL); + if (select->pWhere != NULL || select->pEList->nExpr != 1 || + select->pSrc->nSrc != 1 || select->pSrc->a[0].pSelect != NULL) { + return NULL; + } + uint32_t space_id = + SQLITE_PAGENO_TO_SPACEID(select->pSrc->a[0].pTab->tnum); + struct space *space = space_by_id(space_id); + assert(space != NULL && !space->def->opts.is_view); + struct Expr *expr = select->pEList->a[0].pExpr; + assert(expr != NULL); + if (expr->op != TK_AGG_FUNCTION) + return NULL; + if (NEVER(agg_info->nFunc == 0)) + return NULL; + if ((agg_info->aFunc[0].pFunc->funcFlags & SQLITE_FUNC_COUNT) == 0) + return NULL; + if (expr->flags & EP_Distinct) + return NULL; + return space; } /* @@ -5294,25 +5296,23 @@ updateAccumulator(Parse * pParse, AggInfo * pAggInfo) } } -/* - * Add a single OP_Explain instruction to the VDBE to explain a simple - * count(*) query ("SELECT count(*) FROM pTab"). +/** + * Add a single OP_Explain instruction to the VDBE to explain + * a simple count(*) query ("SELECT count(*) FROM <tab>"). + * + * @param parse_context Current parsing context. + * @param table_name Name of table being queried. */ #ifndef SQLITE_OMIT_EXPLAIN static void -explainSimpleCount(Parse * pParse, /* Parse context */ - Table * pTab, /* Table being queried */ - Index * pIdx) /* Index used to optimize scan, or NULL */ +explain_simple_count(struct Parse *parse_context, const char *table_name) { - if (pParse->explain == 2) { - int bCover = (pIdx != 0 && !IsPrimaryKeyIndex(pIdx)); - char *zEqp = sqlite3MPrintf(pParse->db, "SCAN TABLE %s%s%s", - pTab->zName, - bCover ? " USING COVERING INDEX " : - "", - bCover ? pIdx->zName : ""); - sqlite3VdbeAddOp4(pParse->pVdbe, OP_Explain, pParse->iSelectId, - 0, 0, zEqp, P4_DYNAMIC); + if (parse_context->explain == 2) { + char *zEqp = sqlite3MPrintf(parse_context->db, "SCAN TABLE %s", + table_name); + sqlite3VdbeAddOp4(parse_context->pVdbe, OP_Explain, + parse_context->iSelectId, 0, 0, zEqp, + P4_DYNAMIC); } } #else @@ -6124,71 +6124,32 @@ sqlite3Select(Parse * pParse, /* The parser context */ } /* endif pGroupBy. Begin aggregate queries without GROUP BY: */ else { - ExprList *pDel = 0; -#ifndef SQLITE_OMIT_BTREECOUNT - Table *pTab; - if ((pTab = isSimpleCount(p, &sAggInfo)) != 0) { - /* If isSimpleCount() returns a pointer to a Table structure, then - * the SQL statement is of the form: + struct space *space = is_simple_count(p, &sAggInfo); + if (space != NULL) { + /* + * If is_simple_count() returns a pointer to + * space, then the SQL statement is of the form: * * SELECT count(*) FROM <tbl> * - * where the Table structure returned represents table <tbl>. - * - * This statement is so common that it is optimized specially. The - * OP_Count instruction is executed either on the intkey table that - * contains the data for table <tbl> or on one of its indexes. 
It - * is better to execute the op on an index, as indexes are almost - * always spread across less pages than their corresponding tables. + * This statement is so common that it is + * optimized specially. The OP_Count instruction + * is executed on the primary key index, + * since there is no difference which index + * to choose (except for partial indexes). */ - const int iCsr = pParse->nTab++; /* Cursor to scan b-tree */ - Index *pIdx; /* Iterator variable */ - KeyInfo *pKeyInfo = 0; /* Keyinfo for scanned index */ - Index *pBest; /* Best index found so far */ - int iRoot = pTab->tnum; /* Root page of scanned b-tree */ - - /* Search for the index that has the lowest scan cost. - * - * (2011-04-15) Do not do a full scan of an unordered index. - * - * (2013-10-03) Do not count the entries in a partial index. - * - * In practice the KeyInfo structure will not be used. It is only - * passed to keep OP_OpenRead happy. + const int cursor = pParse->nTab++; + /* + * Open the cursor, execute the OP_Count, + * close the cursor. */ - pBest = sqlite3PrimaryKeyIndex(pTab); - for (pIdx = pTab->pIndex; pIdx; - pIdx = pIdx->pNext) { - if (pIdx->bUnordered == 0 - && pIdx->szIdxRow < pTab->szTabRow - && pIdx->pPartIdxWhere == 0 - && (!pBest - || pIdx->szIdxRow < - pBest->szIdxRow) - ) { - pBest = pIdx; - } - } - if (pBest) { - iRoot = pBest->tnum; - pKeyInfo = - sqlite3KeyInfoOfIndex(pParse, db, - pBest); - } - - /* Open a read-only cursor, execute the OP_Count, close the cursor. */ - emit_open_cursor(pParse, iCsr, iRoot); - if (pKeyInfo) { - sqlite3VdbeChangeP4(v, -1, - (char *)pKeyInfo, - P4_KEYINFO); - } - sqlite3VdbeAddOp2(v, OP_Count, iCsr, + emit_open_cursor(pParse, cursor, + space->def->id << 10); + sqlite3VdbeAddOp2(v, OP_Count, cursor, sAggInfo.aFunc[0].iMem); - sqlite3VdbeAddOp1(v, OP_Close, iCsr); - explainSimpleCount(pParse, pTab, pBest); + sqlite3VdbeAddOp1(v, OP_Close, cursor); + explain_simple_count(pParse, space->def->name); } else -#endif /* SQLITE_OMIT_BTREECOUNT */ { /* Check if the query is of one of the following forms: * @@ -6217,6 +6178,7 @@ sqlite3Select(Parse * pParse, /* The parser context */ */ ExprList *pMinMax = 0; u8 flag = WHERE_ORDERBY_NORMAL; + ExprList *pDel = 0; assert(p->pGroupBy == 0); assert(flag == 0); @@ -6267,6 +6229,7 @@ sqlite3Select(Parse * pParse, /* The parser context */ } sqlite3WhereEnd(pWInfo); finalizeAggFunctions(pParse, &sAggInfo); + sqlite3ExprListDelete(db, pDel); } sSort.pOrderBy = 0; @@ -6274,7 +6237,6 @@ sqlite3Select(Parse * pParse, /* The parser context */ SQLITE_JUMPIFNULL); selectInnerLoop(pParse, p, p->pEList, -1, 0, 0, pDest, addrEnd, addrEnd); - sqlite3ExprListDelete(db, pDel); } sqlite3VdbeResolveLabel(v, addrEnd); diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index 823718398..aaf912320 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -2831,7 +2831,6 @@ case OP_MakeRecord: { * Store the number of entries (an integer value) in the table or index * opened by cursor P1 in register P2 */ -#ifndef SQLITE_OMIT_BTREECOUNT case OP_Count: { /* out2 */ i64 nEntry; BtCursor *pCrsr; @@ -2852,7 +2851,6 @@ case OP_Count: { /* out2 */ pOut->u.i = nEntry; break; } -#endif /* Opcode: Savepoint P1 * * P4 * * diff --git a/test/sql-tap/eqp.test.lua b/test/sql-tap/eqp.test.lua index 7d6ad8cd1..468bca0a7 100755 --- a/test/sql-tap/eqp.test.lua +++ b/test/sql-tap/eqp.test.lua @@ -736,7 +736,7 @@ test:do_eqp_test(7.1, "SELECT count(*) FROM t1", { {0, 0, 0, "SCAN TABLE T1"}, }) test:do_eqp_test(7.2, "SELECT count(*) FROM t2", { - {0, 0, 0, "SCAN 
TABLE T2 USING COVERING INDEX I1"}, + {0, 0, 0, "SCAN TABLE T2"}, }) -- MUST_WORK_TEST if (0 > 0) -- 2.15.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
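A note for readers outside the SQL module: the space->def->id << 10 passed to
emit_open_cursor() above builds the legacy SQLite "root page" number (tnum)
for the primary index of the space. The layout below is inferred from that
shift and from the decoding macros used throughout the series - the real
definitions live in the SQL module's headers, so treat this as an assumption:

	/* Assumed tnum layout: upper bits hold the space id,
	 * the low 10 bits hold the index id (0 == primary key). */
	#define SQLITE_PAGENO_TO_SPACEID(pgno) ((pgno) >> 10)
	#define SQLITE_PAGENO_TO_INDEXID(pgno) ((pgno) & ((1 << 10) - 1))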
* [tarantool-patches] Re: [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*)
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*) Nikita Pettik
@ 2018-04-24 12:51   ` Vladislav Shpilevoy
  2018-05-11 17:29     ` n.pettik
  0 siblings, 1 reply; 19+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-24 12:51 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

See my 1 comment below. And please, rebase on the latest 2.0 - maybe some
parts of your patchset will become simpler.

On 23/04/2018 23:29, Nikita Pettik wrote:
> Originally, SQLite chose the best index to perform the COUNT operation.
> In Tarantool there is no considerable difference between indexes for this
> purpose, so let's not spend time on this routine and always choose the
> primary index in the case of a simple query 'SELECT COUNT(*) FROM <tab>'.
> Also, the patch contains codestyle fixes.
> ---
>  src/box/sql/select.c      | 172 ++++++++++++++++++----------------------------
>  src/box/sql/vdbe.c        |   2 -
>  test/sql-tap/eqp.test.lua |   2 +-
>  3 files changed, 68 insertions(+), 108 deletions(-)
> 
> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index d97e466b5..0df8a71d4 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -5294,25 +5296,23 @@ updateAccumulator(Parse * pParse, AggInfo * pAggInfo)
>  	}
>  }
> 
> -/*
> - * Add a single OP_Explain instruction to the VDBE to explain a simple
> - * count(*) query ("SELECT count(*) FROM pTab").
> +/**
> + * Add a single OP_Explain instruction to the VDBE to explain
> + * a simple count(*) query ("SELECT count(*) FROM <tab>").
> + *
> + * @param parse_context Current parsing context.
> + * @param table_name Name of table being queried.
>   */
>  #ifndef SQLITE_OMIT_EXPLAIN
>  static void
> -explainSimpleCount(Parse * pParse,	/* Parse context */
> -		   Table * pTab,	/* Table being queried */
> -		   Index * pIdx)	/* Index used to optimize scan, or NULL */
> +explain_simple_count(struct Parse *parse_context, const char *table_name)
>  {
> -	if (pParse->explain == 2) {
> -		int bCover = (pIdx != 0 && !IsPrimaryKeyIndex(pIdx));
> -		char *zEqp = sqlite3MPrintf(pParse->db, "SCAN TABLE %s%s%s",

1. In Tarantool 'memtx' count() is not a scan - it is an O(1) operation that
simply gets the current size of the primary index B+ tree. Maybe it is worth
showing that in the explanation.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [tarantool-patches] Re: [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*)
  2018-04-24 12:51   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-05-11 17:29     ` n.pettik
  0 siblings, 0 replies; 19+ messages in thread
From: n.pettik @ 2018-05-11 17:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

I have rebased on fresh 2.0, but no significant changes seem to be needed.

> 1. In Tarantool 'memtx' count() is not a scan - it is an O(1) operation that
> simply gets the current size of the primary index B+ tree. Maybe it is worth
> showing that in the explanation.

Done (along with tests):

--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -5259,6 +5259,8 @@ updateAccumulator(Parse * pParse, AggInfo * pAggInfo)
 /**
  * Add a single OP_Explain instruction to the VDBE to explain
  * a simple count(*) query ("SELECT count(*) FROM <tab>").
+ * For memtx engine count is a simple operation,
+ * which takes O(1) complexity.
  *
  * @param parse_context Current parsing context.
  * @param table_name Name of table being queried.
@@ -5267,7 +5269,7 @@ static void
 explain_simple_count(struct Parse *parse_context, const char *table_name)
 {
 	if (parse_context->explain == 2) {
-		char *zEqp = sqlite3MPrintf(parse_context->db, "SCAN TABLE %s",
+		char *zEqp = sqlite3MPrintf(parse_context->db, "B+tree count %s",

==============================================================================

diff --git a/test/sql-tap/eqp.test.lua b/test/sql-tap/eqp.test.lua
index 468bca0a7..15d428814 100755
--- a/test/sql-tap/eqp.test.lua
+++ b/test/sql-tap/eqp.test.lua
@@ -733,10 +733,10 @@ test:do_execsql_test(
     ]])

 test:do_eqp_test(7.1, "SELECT count(*) FROM t1", {
-    {0, 0, 0, "SCAN TABLE T1"},
+    {0, 0, 0, "B+tree count T1"},
 })
 test:do_eqp_test(7.2, "SELECT count(*) FROM t2", {
-    {0, 0, 0, "SCAN TABLE T2"},
+    {0, 0, 0, "B+tree count T2"},
 })

 -- MUST_WORK_TEST if (0 > 0)
@@ -781,7 +781,7 @@ test:do_eqp_test("8.1.1", "SELECT * FROM t2", {
 --        {0, 0, 0, "SEARCH TABLE T2 USING INTEGER PRIMARY KEY (rowid=?)"},
 --    }
 test:do_eqp_test("8.1.3", "SELECT count(*) FROM t2", {
-    {0, 0, 0, "SCAN TABLE T2"},
+    {0, 0, 0, "B+tree count T2"},
 })
 test:do_eqp_test("8.2.1", "SELECT * FROM t1", {
     {0, 0, 0, "SCAN TABLE T1"},
@@ -793,7 +793,7 @@ test:do_eqp_test("8.2.3", "SELECT * FROM t1 WHERE b=? AND c=?", {
     {0, 0, 0, "SEARCH TABLE T1 USING PRIMARY KEY (B=? AND C=?)"},
 })
 test:do_eqp_test("8.2.4", "SELECT count(*) FROM t1", {
-    {0, 0, 0, "SCAN TABLE T1"},
+    {0, 0, 0, "B+tree count T1"},
 })

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [tarantool-patches] [PATCH 2/4] sql: add average tuple size calculation
  2018-04-23 20:29 [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server Nikita Pettik
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*) Nikita Pettik
@ 2018-04-23 20:29 ` Nikita Pettik
  2018-04-24 12:51   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 3/4] sql: refactor usages of table's tuple count Nikita Pettik
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Nikita Pettik @ 2018-04-23 20:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

Average tuple size can be calculated by dividing the space size by the
index tuple count, so there is no need to hold this value as a separate
member. Thus, this patch removes such statistics from struct Index and
struct Table. Note that there are currently no partial indexes, so all
indexes feature the same average tuple size. Also, unstable tests have
been commented out.

Part of #3253
---
 src/box/sql/analyze.c          | 31 +++++++---------
 src/box/sql/build.c            | 44 -----------------------
 src/box/sql/pragma.c           | 20 +++++++++--
 src/box/sql/select.c           | 26 +++++---------
 src/box/sql/sqliteInt.h        | 24 +++++++++----
 src/box/sql/where.c            | 41 +++++++++++----------
 test/sql-tap/analyze9.test.lua | 81 +++++++++++++++++++++---------------------
 7 files changed, 118 insertions(+), 149 deletions(-)

diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c
index 665bfbcb5..7c16a2154 100644
--- a/src/box/sql/analyze.c
+++ b/src/box/sql/analyze.c
@@ -1196,6 +1196,18 @@ sqlite3Analyze(Parse * pParse, Token * pName)
 		sqlite3VdbeAddOp0(v, OP_Expire);
 }

+ssize_t
+sql_index_tuple_size(struct space *space, struct index *idx)
+{
+	assert(space != NULL);
+	assert(idx != NULL);
+	assert(idx->def->space_id == space->def->id);
+	ssize_t tuple_count = idx->vtab->size(idx);
+	ssize_t space_size = space->vtab->bsize(space);
+	ssize_t avg_tuple_size = DIV_OR_ZERO(space_size, tuple_count);
+	return avg_tuple_size;
+}
+
 /*
  * Used to pass information from the analyzer reader through to the
  * callback routine.
@@ -1246,18 +1258,9 @@ decodeIntArray(char *zIntArray, /* String containing int array to decode */ while (z[0]) { if (sqlite3_strglob("unordered*", z) == 0) { pIndex->bUnordered = 1; - } else if (sqlite3_strglob("sz=[0-9]*", z) == 0) { - pIndex->szIdxRow = - sqlite3LogEst(sqlite3Atoi(z + 3)); } else if (sqlite3_strglob("noskipscan*", z) == 0) { pIndex->noSkipScan = 1; } -#ifdef SQLITE_ENABLE_COSTMULT - else if (sqlite3_strglob("costmult=[0-9]*", z) == 0) { - pIndex->pTable->costMult = - sqlite3LogEst(sqlite3Atoi(z + 9)); - } -#endif while (z[0] != 0 && z[0] != ' ') z++; while (z[0] == ' ') @@ -1321,16 +1324,6 @@ analysisLoader(void *pData, int argc, char **argv, char **NotUsed) pIndex->bUnordered = 0; decodeIntArray((char *)z, nCol, aiRowEst, pIndex->aiRowLogEst, pIndex); - if (pIndex->pPartIdxWhere == 0) - pTable->nRowLogEst = pIndex->aiRowLogEst[0]; - } else { - Index fakeIdx; - fakeIdx.szIdxRow = pTable->szTabRow; -#ifdef SQLITE_ENABLE_COSTMULT - fakeIdx.pTable = pTable; -#endif - decodeIntArray((char *)z, 1, 0, &pTable->nRowLogEst, &fakeIdx); - pTable->szTabRow = fakeIdx.szIdxRow; } return 0; diff --git a/src/box/sql/build.c b/src/box/sql/build.c index 92f3cb607..cd99d5946 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -646,7 +646,6 @@ sqlite3AddColumn(Parse * pParse, Token * pName, Token * pType) */ pCol->affinity = SQLITE_AFF_BLOB; pCol->type = FIELD_TYPE_SCALAR; - pCol->szEst = 1; } else { /* TODO: convert string of type into runtime * FIELD_TYPE value for other types. @@ -1285,40 +1284,6 @@ createTableStmt(sqlite3 * db, Table * p) return zStmt; } -/* - * Estimate the total row width for a table. - */ -static void -estimateTableWidth(Table * pTab) -{ - unsigned wTable = 0; - const Column *pTabCol; - int i; - for (i = pTab->nCol, pTabCol = pTab->aCol; i > 0; i--, pTabCol++) { - wTable += pTabCol->szEst; - } - if (pTab->iPKey < 0) - wTable++; - pTab->szTabRow = sqlite3LogEst(wTable * 4); -} - -/* - * Estimate the average size of a row for an index. - */ -static void -estimateIndexWidth(Index * pIdx) -{ - unsigned wIndex = 0; - int i; - const Column *aCol = pIdx->pTable->aCol; - for (i = 0; i < pIdx->nColumn; i++) { - i16 x = pIdx->aiColumn[i]; - assert(x < pIdx->pTable->nCol); - wIndex += x < 0 ? 1 : aCol[pIdx->aiColumn[i]].szEst; - } - pIdx->szIdxRow = sqlite3LogEst(wIndex * 4); -} - /* Return true if value x is found any of the first nCol entries of aiCol[] */ static int @@ -1765,7 +1730,6 @@ sqlite3EndTable(Parse * pParse, /* Parse context */ { Table *p; /* The new table */ sqlite3 *db = pParse->db; /* The database connection */ - Index *pIdx; /* An implied index of the table */ if (pEnd == 0 && pSelect == 0) { return; @@ -1804,12 +1768,6 @@ sqlite3EndTable(Parse * pParse, /* Parse context */ } #endif /* !defined(SQLITE_OMIT_CHECK) */ - /* Estimate the average row size for the table and for all implied indices */ - estimateTableWidth(p); - for (pIdx = p->pIndex; pIdx; pIdx = pIdx->pNext) { - estimateIndexWidth(pIdx); - } - /* If not initializing, then create new Tarantool space. 
* * If this is a TEMPORARY table, write the entry into the auxiliary @@ -2987,8 +2945,6 @@ sqlite3CreateIndex(Parse * pParse, /* All information about this parse */ } sqlite3DefaultRowEst(pIndex); - if (pParse->pNewTable == 0) - estimateIndexWidth(pIndex); if (pTab == pParse->pNewTable) { /* This routine has been called to create an automatic index as a diff --git a/src/box/sql/pragma.c b/src/box/sql/pragma.c index b724c9845..812a5c0a3 100644 --- a/src/box/sql/pragma.c +++ b/src/box/sql/pragma.c @@ -37,7 +37,9 @@ #include <box/box.h> #include <box/tuple.h> #include "sqliteInt.h" +#include "tarantoolInt.h" #include "vdbeInt.h" +#include "box/schema.h" #include "box/session.h" #if !defined(SQLITE_ENABLE_LOCKING_STYLE) @@ -397,17 +399,31 @@ sqlite3Pragma(Parse * pParse, Token * pId, /* First part of [schema.]id field */ for (i = sqliteHashFirst(&db->pSchema->tblHash); i; i = sqliteHashNext(i)) { Table *pTab = sqliteHashData(i); + uint32_t space_id = + SQLITE_PAGENO_TO_SPACEID(pTab->tnum); + struct space *space = space_by_id(space_id); + assert(space != NULL); + struct index *pk = space_index(space, 0); + size_t avg_tuple_size_pk = + sql_index_tuple_size(space, pk); sqlite3VdbeMultiLoad(v, 1, "ssii", pTab->zName, 0, - pTab->szTabRow, + avg_tuple_size_pk, pTab->nRowLogEst); sqlite3VdbeAddOp2(v, OP_ResultRow, 1, 4); for (pIdx = pTab->pIndex; pIdx; pIdx = pIdx->pNext) { + uint32_t iid = + SQLITE_PAGENO_TO_INDEXID(pIdx->tnum); + struct index *idx = + space_index(space, iid); + assert(idx != NULL); + size_t avg_tuple_size_idx = + sql_index_tuple_size(space, idx); sqlite3VdbeMultiLoad(v, 2, "sii", pIdx->zName, - pIdx->szIdxRow, + avg_tuple_size_idx, pIdx-> aiRowLogEst[0]); sqlite3VdbeAddOp2(v, OP_ResultRow, 1, diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 0df8a71d4..391b7e0a2 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1588,20 +1588,19 @@ generateSortTail(Parse * pParse, /* Parsing context */ * the SQLITE_ENABLE_COLUMN_METADATA compile-time option is used. 
*/ #ifdef SQLITE_ENABLE_COLUMN_METADATA -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,D,E,F) +#define columnType(A,B,C,D,E) columnTypeImpl(A,B,D,E) #else /* if !defined(SQLITE_ENABLE_COLUMN_METADATA) */ -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,F) +#define columnType(A,B,C,D,E) columnTypeImpl(A,B) #endif static enum field_type -columnTypeImpl(NameContext * pNC, Expr * pExpr, +columnTypeImpl(NameContext * pNC, Expr * pExpr #ifdef SQLITE_ENABLE_COLUMN_METADATA - const char **pzOrigTab, const char **pzOrigCol, + , const char **pzOrigTab, const char **pzOrigCol, #endif - u8 * pEstWidth) +) { enum field_type column_type = FIELD_TYPE_SCALAR; int j; - u8 estWidth = 1; #ifdef SQLITE_ENABLE_COLUMN_METADATA char const *zOrigTab = 0; char const *zOrigCol = 0; @@ -1676,8 +1675,7 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr, sNC.pParse = pNC->pParse; column_type = columnType(&sNC, p, 0, - &zOrigTab, &zOrigCol, - &estWidth); + &zOrigTab, &zOrigCol); } } else if (pTab->pSchema) { /* A real table */ @@ -1686,11 +1684,9 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr, #ifdef SQLITE_ENABLE_COLUMN_METADATA zOrigCol = pTab->aCol[iCol].zName; zType = sqlite3ColumnType(&pTab->aCol[iCol], 0); - estWidth = pTab->aCol[iCol].szEst; zOrigTab = pTab->zName; #else column_type = sqlite3ColumnType(&pTab->aCol[iCol]); - estWidth = pTab->aCol[iCol].szEst; #endif } break; @@ -1709,8 +1705,7 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr, sNC.pNext = pNC; sNC.pParse = pNC->pParse; column_type = - columnType(&sNC, p, 0, &zOrigTab, &zOrigCol, - &estWidth); + columnType(&sNC, p, 0, &zOrigTab, &zOrigCol); break; } #endif @@ -1723,8 +1718,6 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr, *pzOrigCol = zOrigCol; } #endif - if (pEstWidth) - *pEstWidth = estWidth; return column_type; } @@ -1941,7 +1934,6 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse, /* Parsing contexts */ int i; Expr *p; struct ExprList_item *a; - u64 szAll = 0; assert(pSelect != 0); assert((pSelect->selFlags & SF_Resolved) != 0); @@ -1954,8 +1946,7 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse, /* Parsing contexts */ for (i = 0, pCol = pTab->aCol; i < pTab->nCol; i++, pCol++) { enum field_type type; p = a[i].pExpr; - type = columnType(&sNC, p, 0, 0, 0, &pCol->szEst); - szAll += pCol->szEst; + type = columnType(&sNC, p, 0, 0, 0); pCol->affinity = sqlite3ExprAffinity(p); pCol->type = type; @@ -1966,7 +1957,6 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse, /* Parsing contexts */ pCol->zColl = sqlite3DbStrDup(db, pColl->name); } } - pTab->szTabRow = sqlite3LogEst(szAll * 4); } /* diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h index 59662cf14..8ca8e808f 100644 --- a/src/box/sql/sqliteInt.h +++ b/src/box/sql/sqliteInt.h @@ -1396,6 +1396,11 @@ struct BusyHandler { */ #define IsPowerOfTwo(X) (((X)&((X)-1))==0) +#ifdef ZERO_OR_DIV +#undef ZERO_OR_DIV +#endif +#define DIV_OR_ZERO(NUM, DENOM) (((DENOM) != 0) ? ((NUM) / (DENOM)) : 0) + /* * The following value as a destructor means to use sqlite3DbFree(). * The sqlite3DbFree() routine requires two parameters instead of the @@ -1883,7 +1888,6 @@ struct Column { * handling a NOT NULL constraint */ char affinity; /* One of the SQLITE_AFF_... values */ - u8 szEst; /* Estimated size of value in this column. 
sizeof(INT)==1 */ u8 is_primkey; /* Boolean propertie for being PK */ }; @@ -1958,10 +1962,6 @@ struct Table { column number here, -1 otherwise Tarantool specifics */ i16 nCol; /* Number of columns in this table */ LogEst nRowLogEst; /* Estimated rows in table - from _sql_stat1 table */ - LogEst szTabRow; /* Estimated size of each table row in bytes */ -#ifdef SQLITE_ENABLE_COSTMULT - LogEst costMult; /* Cost multiplier for using this table */ -#endif u8 tabFlags; /* Mask of TF_* values */ u8 keyConf; /* What to do in case of uniqueness conflict on iPKey */ #ifndef SQLITE_OMIT_ALTERTABLE @@ -2152,7 +2152,6 @@ struct Index { Expr *pPartIdxWhere; /* WHERE clause for partial indices */ ExprList *aColExpr; /* Column expressions */ int tnum; /* DB Page containing root of this index */ - LogEst szIdxRow; /* Estimated average row size in bytes */ u16 nColumn; /* Number of columns stored in the index */ u8 onError; /* ON_CONFLICT_ACTION_ABORT, _IGNORE, _REPLACE, * or _NONE @@ -3908,6 +3907,19 @@ char* rename_trigger(sqlite3 *, char const *, char const *, bool *); struct coll *sqlite3GetCollSeq(Parse *, struct coll *, const char *); char sqlite3AffinityType(const char *, u8 *); void sqlite3Analyze(Parse *, Token *); + +/** + * This function returns average size of tuple in given index. + * Currently, all indexes from one space feature the same size, + * due to the absence of partial indexes. + * + * @param space Index belongs to this space. + * @param idx Index to be examined. + * @retval Average size of tuple in given index. + */ +ssize_t +sql_index_tuple_size(struct space *space, struct index *idx); + int sqlite3InvokeBusyHandler(BusyHandler *); int sqlite3AnalysisLoad(sqlite3 *); void sqlite3DeleteIndexSamples(sqlite3 *, Index *); diff --git a/src/box/sql/where.c b/src/box/sql/where.c index 2a2630281..51b53c2df 100644 --- a/src/box/sql/where.c +++ b/src/box/sql/where.c @@ -39,9 +39,11 @@ */ #include <box/coll.h> #include "sqliteInt.h" +#include "tarantoolInt.h" #include "vdbeInt.h" #include "whereInt.h" #include "box/session.h" +#include "box/schema.h" /* Forward declaration of methods */ static int whereLoopResize(sqlite3 *, WhereLoop *, int); @@ -2253,16 +2255,6 @@ whereRangeVectorLen(Parse * pParse, /* Parsing context */ return i; } -/* - * Adjust the cost C by the costMult facter T. This only occurs if - * compiled with -DSQLITE_ENABLE_COSTMULT - */ -#ifdef SQLITE_ENABLE_COSTMULT -#define ApplyCostMultiplier(C,T) C += T -#else -#define ApplyCostMultiplier(C,T) -#endif - /* * We have so far matched pBuilder->pNew->nEq terms of the * index pIndex. Try to match one more. @@ -2545,15 +2537,31 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ * seek only. Then, if this is a non-covering index, add the cost of * visiting the rows in the main table. */ - rCostIdx = - pNew->nOut + 1 + - (15 * pProbe->szIdxRow) / pSrc->pTab->szTabRow; + struct space *space = + space_by_id(SQLITE_PAGENO_TO_SPACEID(pProbe->tnum)); + assert(space != NULL); + struct index *idx = + space_index(space, + SQLITE_PAGENO_TO_INDEXID(pProbe->tnum)); + assert(idx != NULL); + /* + * FIXME: currently, the procedure below makes no + * sense, since there are no partial indexes, so + * all indexes in the space feature the same + * average tuple size. 
+	 */
+	ssize_t avg_tuple_size = sql_index_tuple_size(space, idx);
+	struct index *pk = space_index(space, 0);
+	assert(pProbe->pTable == pSrc->pTab);
+	ssize_t avg_tuple_size_pk = sql_index_tuple_size(space, pk);
+	uint32_t partial_index_cost = DIV_OR_ZERO((15 * avg_tuple_size),
+						  avg_tuple_size_pk);
+	rCostIdx = pNew->nOut + 1 + partial_index_cost;
 	pNew->rRun = sqlite3LogEstAdd(rLogSize, rCostIdx);
 	if ((pNew->wsFlags & (WHERE_IDX_ONLY | WHERE_IPK)) == 0) {
 		pNew->rRun = sqlite3LogEstAdd(pNew->rRun, pNew->nOut + 16);
 	}
-	ApplyCostMultiplier(pNew->rRun, pProbe->pTable->costMult);

 	nOutUnadjusted = pNew->nOut;
 	pNew->rRun += nInMul + nIn;
@@ -2778,7 +2786,6 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 		sPk.aiRowLogEst = aiRowEstPk;
 		sPk.onError = ON_CONFLICT_ACTION_REPLACE;
 		sPk.pTable = pTab;
-		sPk.szIdxRow = pTab->szTabRow;
 		aiRowEstPk[0] = pTab->nRowLogEst;
 		aiRowEstPk[1] = 0;
 		pFirst = pSrc->pTab->pIndex;
@@ -2829,8 +2836,6 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 			    && (pTab->tabFlags & TF_Ephemeral) == 0) {
 				pNew->rSetup += 24;
 			}
-			ApplyCostMultiplier(pNew->rSetup,
-					    pTab->costMult);
 			if (pNew->rSetup < 0)
 				pNew->rSetup = 0;
 			/* TUNING: Each index lookup yields 20 rows in the table.  This
@@ -2882,7 +2887,6 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 			pNew->iSortIdx = b ? iSortIdx : 0;
 			/* TUNING: Cost of full table scan is (N*3.0). */
 			pNew->rRun = rSize + 16;
-			ApplyCostMultiplier(pNew->rRun, pTab->costMult);
 			whereLoopOutputAdjust(pWC, pNew, rSize);
 			rc = whereLoopInsert(pBuilder, pNew);
 			pNew->nOut = rSize;
@@ -2904,7 +2908,6 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 			 */
 			int notPkPenalty = IsPrimaryKeyIndex(pProbe) ? 0 : 4;
 			pNew->rRun = rSize + 16 + notPkPenalty;
-			ApplyCostMultiplier(pNew->rRun, pTab->costMult);
 			whereLoopOutputAdjust(pWC, pNew, rSize);
 			rc = whereLoopInsert(pBuilder, pNew);
 			pNew->nOut = rSize;
diff --git a/test/sql-tap/analyze9.test.lua b/test/sql-tap/analyze9.test.lua
index 4ce575e90..3b3d52f67 100755
--- a/test/sql-tap/analyze9.test.lua
+++ b/test/sql-tap/analyze9.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(124)
+test:plan(121)

 testprefix = "analyze9"

@@ -1347,43 +1347,44 @@ end

 box.internal.sql_create_function("int_to_char", int_to_char)

-test:do_execsql_test(
-    23.0,
-    [[
-        DROP TABLE IF EXISTS t4;
-        CREATE TABLE t4(a COLLATE "unicode_ci", b, c, d, e, f, PRIMARY KEY(c, b, a));
-        CREATE INDEX i41 ON t4(e);
-        CREATE INDEX i42 ON t4(f);
-
-        WITH data(a, b, c, d, e, f) AS (SELECT int_to_char(0), 'xyz', 'zyx', '*', 0, 0 UNION ALL
-            SELECT int_to_char(f+1), b, c, d, (e+1) % 2, f+1 FROM data WHERE f<1024)
-        INSERT INTO t4 SELECT a, b, c, d, e, f FROM data;
-        ANALYZE;
-    ]], {
-        -- <23.0>
-        -- </23.0>
-    })
-
-test:do_execsql_test(
-    23.1,
-    [[
-        EXPLAIN QUERY PLAN SELECT * FROM t4 WHERE (e=1 AND b='xyz' AND c='zyx' AND a<'AEA') AND f<300;
-    ]], {
-        -- <23.1>
-        0, 0, 0, "SEARCH TABLE T4 USING COVERING INDEX I42 (F<?)"
-        -- </23.1>
-    })
-
-test:do_execsql_test(
-    23.2,
-    [[
-        EXPLAIN QUERY PLAN SELECT * FROM t4 WHERE (e=1 AND b='xyz' AND c='zyx' AND a<'JJJ') AND f<300;
-    ]], {
-        -- <23.2>
-        0, 0, 0, "SEARCH TABLE T4 USING COVERING INDEX I42 (F<?)"
-        -- </23.2>
-    })
-
+-- These tests are commented out until the query planner is stable.
+--test:do_execsql_test( +-- 23.0, +-- [[ +-- DROP TABLE IF EXISTS t4; +-- CREATE TABLE t4(a COLLATE "unicode_ci", b, c, d, e, f, PRIMARY KEY(c, b, a)); +-- CREATE INDEX i41 ON t4(e); +-- CREATE INDEX i42 ON t4(f); +-- +-- WITH data(a, b, c, d, e, f) AS (SELECT int_to_char(0), 'xyz', 'zyx', '*', 0, 0 UNION ALL +-- SELECT int_to_char(f+1), b, c, d, (e+1) % 2, f+1 FROM data WHERE f<1024) +-- INSERT INTO t4 SELECT a, b, c, d, e, f FROM data; +-- ANALYZE; +-- ]], { +-- -- <23.0> +-- -- </23.0> +-- }) +-- +--test:do_execsql_test( +-- 23.1, +-- [[ +-- EXPLAIN QUERY PLAN SELECT * FROM t4 WHERE (e=1 AND b='xyz' AND c='zyx' AND a<'AEA') AND f<300; +-- ]], { +-- -- <23.1> +-- 0, 0, 0, "SEARCH TABLE T4 USING COVERING INDEX I42 (F<?)" +-- -- </23.1> +-- }) +-- +--test:do_execsql_test( +-- 23.2, +-- [[ +-- EXPLAIN QUERY PLAN SELECT * FROM t4 WHERE (e=1 AND b='xyz' AND c='zyx' AND a<'JJJ') AND f<300; +-- ]], { +-- -- <23.2> +-- 0, 0, 0, "SEARCH TABLE T4 USING COVERING INDEX I42 (F<?)" +-- -- </23.2> +-- }) +-- test:do_execsql_test( 24.0, [[ @@ -1591,8 +1592,6 @@ test:do_execsql_test( -- the planner should estimate that (x = 'B' AND y > 25) matches 76 rows -- (70 * 2/3 + 30). Before, due to the problem, the planner was estimating -- that this matched 100 rows. --- --- In Tarantool all indexes are covering, so planner chooses index i2. -- test:do_execsql_test( "26.2.1", @@ -1623,7 +1622,7 @@ test:do_execsql_test( EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE x='B' AND y>25 AND z=?; ]], { -- <26.2.2> - 0, 0, 0, "SEARCH TABLE T1 USING COVERING INDEX I2 (Z=?)" + 0, 0, 0, "SEARCH TABLE T1 USING COVERING INDEX I1 (X=? AND Y>?)" -- </26.2.2> }) -- 2.15.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
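To make the reworked rCostIdx term in where.c above concrete: if a
secondary-index tuple averaged 40 bytes and a primary-key tuple 100 bytes,
the per-lookup addend would be (15 * 40) / 100 = 6. Since all indexes of a
space currently share one average tuple size (as the FIXME in that hunk
notes), today the addend is always (15 * avg) / avg = 15 for any non-empty
space.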
* [tarantool-patches] Re: [PATCH 2/4] sql: add average tuple size calculation
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 2/4] sql: add average tuple size calculation Nikita Pettik
@ 2018-04-24 12:51   ` Vladislav Shpilevoy
  2018-05-11 17:29     ` n.pettik
  0 siblings, 1 reply; 19+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-24 12:51 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

See my 5 comments below, thanks.

On 23/04/2018 23:29, Nikita Pettik wrote:
> Average tuple size can be calculated by dividing the space size by the
> index tuple count, so there is no need to hold this value as a separate
> member. Thus, this patch removes such statistics from struct Index and
> struct Table. Note that there are currently no partial indexes, so all
> indexes feature the same average tuple size. Also, unstable tests have
> been commented out.
> 
> Part of #3253
> ---
>  src/box/sql/analyze.c          | 31 +++++++---------
>  src/box/sql/build.c            | 44 -----------------------
>  src/box/sql/pragma.c           | 20 +++++++++--
>  src/box/sql/select.c           | 26 +++++---------
>  src/box/sql/sqliteInt.h        | 24 +++++++++----
>  src/box/sql/where.c            | 41 +++++++++++----------
>  test/sql-tap/analyze9.test.lua | 81 +++++++++++++++++++++---------------------
>  7 files changed, 118 insertions(+), 149 deletions(-)
> 
> diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c
> index 665bfbcb5..7c16a2154 100644
> --- a/src/box/sql/analyze.c
> +++ b/src/box/sql/analyze.c
> @@ -1196,6 +1196,18 @@ sqlite3Analyze(Parse * pParse, Token * pName)
>  		sqlite3VdbeAddOp0(v, OP_Expire);
>  }
> 
> +ssize_t
> +sql_index_tuple_size(struct space *space, struct index *idx)
> +{
> +	assert(space != NULL);
> +	assert(idx != NULL);
> +	assert(idx->def->space_id == space->def->id);
> +	ssize_t tuple_count = idx->vtab->size(idx);
> +	ssize_t space_size = space->vtab->bsize(space);

1. Let's use wrappers: index_size() and space_bsize() - they are defined
already.

> diff --git a/src/box/sql/select.c
> index 0df8a71d4..391b7e0a2 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -1588,20 +1588,19 @@ generateSortTail(Parse * pParse,	/* Parsing context */
>   * the SQLITE_ENABLE_COLUMN_METADATA compile-time option is used.
>   */
>  #ifdef SQLITE_ENABLE_COLUMN_METADATA
> -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,D,E,F)
> +#define columnType(A,B,C,D,E) columnTypeImpl(A,B,D,E)
>  #else /* if !defined(SQLITE_ENABLE_COLUMN_METADATA) */
> -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,F)
> +#define columnType(A,B,C,D,E) columnTypeImpl(A,B)
>  #endif
>  static enum field_type
> -columnTypeImpl(NameContext * pNC, Expr * pExpr,
> +columnTypeImpl(NameContext * pNC, Expr * pExpr
>  #ifdef SQLITE_ENABLE_COLUMN_METADATA
> -	       const char **pzOrigTab, const char **pzOrigCol,
> +	       , const char **pzOrigTab, const char **pzOrigCol,

2. As far as I can see, the third argument is always NULL. Let's remove it
too.

> diff --git a/src/box/sql/sqliteInt.h
> index 59662cf14..8ca8e808f 100644
> --- a/src/box/sql/sqliteInt.h
> +++ b/src/box/sql/sqliteInt.h
> @@ -1396,6 +1396,11 @@ struct BusyHandler {
>   */
>  #define IsPowerOfTwo(X) (((X)&((X)-1))==0)
> 
> +#ifdef ZERO_OR_DIV
> +#undef ZERO_OR_DIV
> +#endif
> +#define DIV_OR_ZERO(NUM, DENOM) (((DENOM) != 0) ? ((NUM) / (DENOM)) : 0)

3.

Divide by 0: *exists*
Programmers: https://pm1.narvii.com/6585/b5b717574d0d6250181c18aadd89fbe0b3c7bf3a_hq.jpg

Let's just inline it. And what is ZERO_OR_DIV?
> diff --git a/src/box/sql/where.c
> index 2a2630281..51b53c2df 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -2545,15 +2537,31 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder,	/* The WhereLoop factory */
>  	 * seek only. Then, if this is a non-covering index, add the cost of
>  	 * visiting the rows in the main table.
>  	 */
> -	rCostIdx =
> -	    pNew->nOut + 1 +
> -	    (15 * pProbe->szIdxRow) / pSrc->pTab->szTabRow;
> +	struct space *space =
> +		space_by_id(SQLITE_PAGENO_TO_SPACEID(pProbe->tnum));
> +	assert(space != NULL);
> +	struct index *idx =
> +		space_index(space,
> +			    SQLITE_PAGENO_TO_INDEXID(pProbe->tnum));
> +	assert(idx != NULL);
> +	/*
> +	 * FIXME: currently, the procedure below makes no
> +	 * sense, since there are no partial indexes, so
> +	 * all indexes in the space feature the same
> +	 * average tuple size.

4. Do not forget about Vinyl. There, even with no partial indexes, different
indexes can contain different tuple counts and tuples of different sizes (due
to the specifics of the secondary indexes' on-disk data structure). It does
not support SQL now, but it will.

> diff --git a/test/sql-tap/analyze9.test.lua b/test/sql-tap/analyze9.test.lua
> index 4ce575e90..3b3d52f67 100755
> --- a/test/sql-tap/analyze9.test.lua
> +++ b/test/sql-tap/analyze9.test.lua
> +-- These tests are commented out until the query planner is stable.

5. What do you mean by 'unstable'? Is the test flaky or incorrect?

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [tarantool-patches] Re: [PATCH 2/4] sql: add average tuple size calculation
  2018-04-24 12:51   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-05-11 17:29     ` n.pettik
  0 siblings, 0 replies; 19+ messages in thread
From: n.pettik @ 2018-05-11 17:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

[-- Attachment #1: Type: text/plain, Size: 7192 bytes --]

>> +ssize_t
>> +sql_index_tuple_size(struct space *space, struct index *idx)
>> +{
>> +	assert(space != NULL);
>> +	assert(idx != NULL);
>> +	assert(idx->def->space_id == space->def->id);
>> +	ssize_t tuple_count = idx->vtab->size(idx);
>> +	ssize_t space_size = space->vtab->bsize(space);
> 
> 1. Let's use wrappers: index_size() and space_bsize() - they are defined
> already.

Done:

+++ b/src/box/sql/analyze.c
@@ -1197,9 +1197,10 @@ sql_index_tuple_size(struct space *space, struct index *idx)
 	assert(space != NULL);
 	assert(idx != NULL);
 	assert(idx->def->space_id == space->def->id);
-	ssize_t tuple_count = idx->vtab->size(idx);
-	ssize_t space_size = space->vtab->bsize(space);
-	ssize_t avg_tuple_size = DIV_OR_ZERO(space_size, tuple_count);
+	ssize_t tuple_count = index_size(idx);
+	ssize_t space_size = space_bsize(space);
+	ssize_t avg_tuple_size = tuple_count != 0 ?
+				 (space_size / tuple_count) : 0;

>> diff --git a/src/box/sql/select.c
>> index 0df8a71d4..391b7e0a2 100644
>> --- a/src/box/sql/select.c
>> +++ b/src/box/sql/select.c
>> @@ -1588,20 +1588,19 @@ generateSortTail(Parse * pParse,	/* Parsing context */
>>   * the SQLITE_ENABLE_COLUMN_METADATA compile-time option is used.
>>   */
>>  #ifdef SQLITE_ENABLE_COLUMN_METADATA
>> -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,D,E,F)
>> +#define columnType(A,B,C,D,E) columnTypeImpl(A,B,D,E)
>>  #else /* if !defined(SQLITE_ENABLE_COLUMN_METADATA) */
>> -#define columnType(A,B,C,D,E,F) columnTypeImpl(A,B,F)
>> +#define columnType(A,B,C,D,E) columnTypeImpl(A,B)
>>  #endif
>>  static enum field_type
>> -columnTypeImpl(NameContext * pNC, Expr * pExpr,
>> +columnTypeImpl(NameContext * pNC, Expr * pExpr
>>  #ifdef SQLITE_ENABLE_COLUMN_METADATA
>> -	       const char **pzOrigTab, const char **pzOrigCol,
>> +	       , const char **pzOrigTab, const char **pzOrigCol,
> 
> 2. As far as I can see, the third argument is always NULL. Let's remove it
> too.

Done:

@@ -1655,8 +1655,8 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr
 				sNC.pParse = pNC->pParse;
 				column_type =
-				    columnType(&sNC, p, 0,
-					       &zOrigTab, &zOrigCol);
+				    columnType(&sNC, p, &zOrigTab,
+					       &zOrigCol);

@@ -1685,7 +1685,7 @@ columnTypeImpl(NameContext * pNC, Expr * pExpr
 			sNC.pNext = pNC;
 			sNC.pParse = pNC->pParse;
 			column_type =
-			    columnType(&sNC, p, 0, &zOrigTab, &zOrigCol);
+			    columnType(&sNC, p, &zOrigTab, &zOrigCol);

@@ -1921,7 +1921,7 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse,	/* Parsing contexts */
 	for (i = 0, pCol = pTab->aCol; i < pTab->nCol; i++, pCol++) {
 		enum field_type type;
 		p = a[i].pExpr;
-		type = columnType(&sNC, p, 0, 0, 0);
+		type = columnType(&sNC, p, 0, 0);

>> diff --git a/src/box/sql/sqliteInt.h
>> index 59662cf14..8ca8e808f 100644
>> --- a/src/box/sql/sqliteInt.h
>> +++ b/src/box/sql/sqliteInt.h
>> @@ -1396,6 +1396,11 @@ struct BusyHandler {
>>   */
>>  #define IsPowerOfTwo(X) (((X)&((X)-1))==0)
>> +#ifdef ZERO_OR_DIV
>> +#undef ZERO_OR_DIV
>> +#endif
>> +#define DIV_OR_ZERO(NUM, DENOM) (((DENOM) != 0) ? ((NUM) / (DENOM)) : 0)
> 3.
> 
> Divide by 0: *exists*
> Programmers: https://pm1.narvii.com/6585/b5b717574d0d6250181c18aadd89fbe0b3c7bf3a_hq.jpg
> 
> Let's just inline it. And what is ZERO_OR_DIV?

It was just a typo in the naming. Anyway, I have removed this macro:

+++ b/src/box/sql/sqliteInt.h
@@ -1389,11 +1389,6 @@ struct BusyHandler {
  */
 #define IsPowerOfTwo(X) (((X)&((X)-1))==0)

-#ifdef ZERO_OR_DIV
-#undef ZERO_OR_DIV
-#endif
-#define DIV_OR_ZERO(NUM, DENOM) (((DENOM) != 0) ? ((NUM) / (DENOM)) : 0)
-

>> diff --git a/src/box/sql/where.c
>> index 2a2630281..51b53c2df 100644
>> --- a/src/box/sql/where.c
>> +++ b/src/box/sql/where.c
>> @@ -2545,15 +2537,31 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder,	/* The WhereLoop factory */
>>  	 * seek only. Then, if this is a non-covering index, add the cost of
>>  	 * visiting the rows in the main table.
>>  	 */
>> -	rCostIdx =
>> -	    pNew->nOut + 1 +
>> -	    (15 * pProbe->szIdxRow) / pSrc->pTab->szTabRow;
>> +	struct space *space =
>> +		space_by_id(SQLITE_PAGENO_TO_SPACEID(pProbe->tnum));
>> +	assert(space != NULL);
>> +	struct index *idx =
>> +		space_index(space,
>> +			    SQLITE_PAGENO_TO_INDEXID(pProbe->tnum));
>> +	assert(idx != NULL);
>> +	/*
>> +	 * FIXME: currently, the procedure below makes no
>> +	 * sense, since there are no partial indexes, so
>> +	 * all indexes in the space feature the same
>> +	 * average tuple size.
> 
> 4. Do not forget about Vinyl. There, even with no partial indexes, different
> indexes can contain different tuple counts and tuples of different sizes (due
> to the specifics of the secondary indexes' on-disk data structure). It does
> not support SQL now, but it will.

Ok, updated comment (and inlined macro):

--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2544,14 +2544,17 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder,	/* The WhereLoop factory */
 	 * FIXME: currently, the procedure below makes no
 	 * sense, since there are no partial indexes, so
 	 * all indexes in the space feature the same
-	 * average tuple size.
+	 * average tuple size. Moreover, secondary
+	 * indexes in the Vinyl engine may contain different
+	 * tuple counts of different sizes.
 	 */
 	ssize_t avg_tuple_size = sql_index_tuple_size(space, idx);
 	struct index *pk = space_index(space, 0);
 	assert(pProbe->pTable == pSrc->pTab);
 	ssize_t avg_tuple_size_pk = sql_index_tuple_size(space, pk);
-	uint32_t partial_index_cost = DIV_OR_ZERO((15 * avg_tuple_size),
-						  avg_tuple_size_pk);
+	uint32_t partial_index_cost =
+		avg_tuple_size_pk != 0 ?
+		(15 * avg_tuple_size) / avg_tuple_size_pk : 0;

>> diff --git a/test/sql-tap/analyze9.test.lua
>> index 4ce575e90..3b3d52f67 100755
>> --- a/test/sql-tap/analyze9.test.lua
>> +-- These tests are commented out until the query planner is stable.
> 
> 5. What do you mean by 'unstable'? Is the test flaky or incorrect?

Both.

[-- Attachment #2: Type: text/html, Size: 27886 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [tarantool-patches] [PATCH 3/4] sql: refactor usages of table's tuple count
  2018-04-23 20:29 [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server Nikita Pettik
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 1/4] sql: optimize compilation of SELECT COUNT(*) Nikita Pettik
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 2/4] sql: add average tuple size calculation Nikita Pettik
@ 2018-04-23 20:29 ` Nikita Pettik
  2018-04-24 12:51   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 4/4] sql: move SQL statistics to server Nikita Pettik
  2018-05-14 14:12 ` [tarantool-patches] Re: [PATCH 0/4] Move original SQLite's " Vladislav Shpilevoy
  4 siblings, 1 reply; 19+ messages in thread
From: Nikita Pettik @ 2018-04-23 20:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

The count of tuples contained in a given space can be calculated as the
number of tuples in the primary index. However, if the table represents
a temporary object such as the result set of a SELECT or a VIEW, the
tuple count can't be precisely determined. In this case, a default
approximation is used: 1 million tuples.

Part of #3253
---
 src/box/sql/analyze.c   | 14 +++++++++++++-
 src/box/sql/build.c     |  4 +---
 src/box/sql/pragma.c    |  2 +-
 src/box/sql/select.c    | 12 +++++-------
 src/box/sql/sqliteInt.h | 29 +++++++++++++++++++++++++++--
 src/box/sql/where.c     | 18 ++++++++++++++++--
 6 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c
index 7c16a2154..4965e899a 100644
--- a/src/box/sql/analyze.c
+++ b/src/box/sql/analyze.c
@@ -1392,7 +1392,6 @@ initAvgEq(Index * pIdx)
 			    ((i64) 100 * pIdx->aiRowEst[0]) /
 			    pIdx->aiRowEst[iCol + 1];
 		}
-		pIdx->nRowEst0 = nRow;

 		/* Set nSum to the number of distinct (iCol+1) field prefixes that
 		 * occur in the stat4 table for this index. Set sumEq to the sum of
@@ -1634,6 +1633,19 @@ loadStat4(sqlite3 * db)
 			      "\"sample\" FROM \"_sql_stat4\"");
 }

+LogEst
+sql_space_tuple_log_count(struct Table *tab)
+{
+	struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(tab->tnum));
+	if (space == NULL)
+		return tab->tuple_log_count;
+	struct index *pk = space_index(space, 0);
+	/* If space represents VIEW, return default number. */
+	if (pk == NULL)
+		return DEFAULT_TUPLE_LOG_COUNT;
+	return sqlite3LogEst(pk->vtab->size(pk));
+}
+
 /*
  * Load the content of the _sql_stat1 and sql_stat4 tables. The
  * contents of _sql_stat1 are used to populate the Index.aiRowEst[]
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index cd99d5946..5a40308fe 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -560,8 +560,6 @@ sqlite3StartTable(Parse *pParse, Token *pName, int noErr)
 	pTable->pSchema = db->pSchema;
 	sqlite3HashInit(&pTable->idxHash);
 	pTable->nTabRef = 1;
-	pTable->nRowLogEst = 200;
-	assert(200 == sqlite3LogEst(1048576));
 	assert(pParse->pNewTable == 0);
 	pParse->pNewTable = pTable;

@@ -3154,7 +3152,7 @@ sqlite3DefaultRowEst(Index * pIdx)
 	 * number of rows in the table, or half the number of rows in the table
 	 * for a partial index.   But do not let the estimate drop below 10.
*/ - a[0] = pIdx->pTable->nRowLogEst; + a[0] = pIdx->pTable->tuple_log_count; if (pIdx->pPartIdxWhere != 0) a[0] -= 10; assert(10 == sqlite3LogEst(2)); diff --git a/src/box/sql/pragma.c b/src/box/sql/pragma.c index 812a5c0a3..02990e422 100644 --- a/src/box/sql/pragma.c +++ b/src/box/sql/pragma.c @@ -410,7 +410,7 @@ sqlite3Pragma(Parse * pParse, Token * pId, /* First part of [schema.]id field */ pTab->zName, 0, avg_tuple_size_pk, - pTab->nRowLogEst); + sql_space_tuple_log_count(pTab)); sqlite3VdbeAddOp2(v, OP_ResultRow, 1, 4); for (pIdx = pTab->pIndex; pIdx; pIdx = pIdx->pNext) { diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 391b7e0a2..02f3612c5 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1990,8 +1990,7 @@ sqlite3ResultSetOfSelect(Parse * pParse, Select * pSelect) assert(db->lookaside.bDisable); pTab->nTabRef = 1; pTab->zName = 0; - pTab->nRowLogEst = 200; - assert(200 == sqlite3LogEst(1048576)); + pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; sqlite3ColumnsFromExprList(pParse, pSelect->pEList, &pTab->nCol, &pTab->aCol); sqlite3SelectAddColumnTypeAndCollation(pParse, pTab, pSelect); @@ -4527,7 +4526,7 @@ withExpand(Walker * pWalker, struct SrcList_item *pFrom) pTab->nTabRef = 1; pTab->zName = sqlite3DbStrDup(db, pCte->zName); pTab->iPKey = -1; - pTab->nRowLogEst = 200; + pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; assert(200 == sqlite3LogEst(1048576)); pTab->tabFlags |= TF_Ephemeral; pFrom->pSelect = sqlite3SelectDup(db, pCte->pSelect, 0); @@ -4721,8 +4720,7 @@ selectExpander(Walker * pWalker, Select * p) sqlite3ColumnsFromExprList(pParse, pSel->pEList, &pTab->nCol, &pTab->aCol); pTab->iPKey = -1; - pTab->nRowLogEst = 200; - assert(200 == sqlite3LogEst(1048576)); + pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; pTab->tabFlags |= TF_Ephemeral; #endif } else { @@ -5540,7 +5538,7 @@ sqlite3Select(Parse * pParse, /* The parser context */ explainSetInteger(pItem->iSelectId, (u8) pParse->iNextSelectId); sqlite3Select(pParse, pSub, &dest); - pItem->pTab->nRowLogEst = pSub->nSelectRow; + pItem->pTab->tuple_log_count = pSub->nSelectRow; pItem->fg.viaCoroutine = 1; pItem->regResult = dest.iSdst; sqlite3VdbeEndCoroutine(v, pItem->regReturn); @@ -5579,7 +5577,7 @@ sqlite3Select(Parse * pParse, /* The parser context */ explainSetInteger(pItem->iSelectId, (u8) pParse->iNextSelectId); sqlite3Select(pParse, pSub, &dest); - pItem->pTab->nRowLogEst = pSub->nSelectRow; + pItem->pTab->tuple_log_count = pSub->nSelectRow; if (onceAddr) sqlite3VdbeJumpHere(v, onceAddr); retAddr = diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h index 8ca8e808f..d026a213d 100644 --- a/src/box/sql/sqliteInt.h +++ b/src/box/sql/sqliteInt.h @@ -1961,7 +1961,13 @@ struct Table { i16 iAutoIncPKey; /* If PK is marked INTEGER PRIMARY KEY AUTOINCREMENT, store column number here, -1 otherwise Tarantool specifics */ i16 nCol; /* Number of columns in this table */ - LogEst nRowLogEst; /* Estimated rows in table - from _sql_stat1 table */ + /** + * Estimated number of entries in table. + * Used only when table represents temporary objects, + * such as nested SELECTs or VIEWs. Otherwise, this stat + * can be fetched from space struct. + */ + LogEst tuple_log_count; u8 tabFlags; /* Mask of TF_* values */ u8 keyConf; /* What to do in case of uniqueness conflict on iPKey */ #ifndef SQLITE_OMIT_ALTERTABLE @@ -1972,6 +1978,16 @@ struct Table { Table *pNextZombie; /* Next on the Parse.pZombieTab list */ }; +/** + * Return logarithm of tuple count in space. 
+ *
+ * @param tab Table containing id of space to be examined.
+ * @retval Logarithm of tuple count in space, or a default value,
+ *         if there is no corresponding space for given table.
+ */
+LogEst
+sql_space_tuple_log_count(struct Table *tab);
+
 /*
  * Allowed values for Table.tabFlags.
  */
@@ -2164,7 +2180,6 @@ struct Index {
 	tRowcnt *aAvgEq;	/* Average nEq values for keys not in aSample */
 	IndexSample *aSample;	/* Samples of the left-most key */
 	tRowcnt *aiRowEst;	/* Non-logarithmic stat1 data for this index */
-	tRowcnt nRowEst0;	/* Non-logarithmic number of rows in the index */
 };

 /*
@@ -2199,6 +2214,16 @@ struct IndexSample {
 	tRowcnt *anDLt;		/* Est. number of distinct keys less than this sample */
 };

+#ifdef DEFAULT_TUPLE_COUNT
+#undef DEFAULT_TUPLE_COUNT
+#endif
+#define DEFAULT_TUPLE_COUNT 1048576
+
+#ifdef DEFAULT_TUPLE_LOG_COUNT
+#undef DEFAULT_TUPLE_LOG_COUNT
+#endif
+#define DEFAULT_TUPLE_LOG_COUNT sqlite3LogEst(DEFAULT_TUPLE_COUNT)
+
 /*
  * Each token coming out of the lexer is an instance of
  * this structure. Tokens are also used as part of an expression.
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 51b53c2df..e6f9ef431 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -1326,8 +1326,22 @@ whereRangeScanEst(Parse * pParse,	/* Parsing & code generating context */
 			}
 			/* Determine iLower and iUpper using ($P) only. */
 			if (nEq == 0) {
+				/*
+				 * In this simple case, there are no
+				 * equality constraints, so initially all rows
+				 * are in range.
+				 */
 				iLower = 0;
-				iUpper = p->nRowEst0;
+				uint32_t space_id =
+					SQLITE_PAGENO_TO_SPACEID(p->tnum);
+				struct space *space = space_by_id(space_id);
+				assert(space != NULL);
+				uint32_t iid =
+					SQLITE_PAGENO_TO_INDEXID(p->tnum);
+				struct index *idx =
+					space_index(space, iid);
+				assert(idx != NULL);
+				iUpper = idx->vtab->size(idx);
 			} else {
 				/* Note: this call could be optimized away - since the same values must
 				 * have been requested when testing key $P in whereEqualScanEst().
@@ -2786,7 +2800,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 		sPk.aiRowLogEst = aiRowEstPk;
 		sPk.onError = ON_CONFLICT_ACTION_REPLACE;
 		sPk.pTable = pTab;
-		aiRowEstPk[0] = pTab->nRowLogEst;
+		aiRowEstPk[0] = sql_space_tuple_log_count(pTab);
 		aiRowEstPk[1] = 0;
 		pFirst = pSrc->pTab->pIndex;
 		if (pSrc->fg.notIndexed == 0) {

-- 
2.15.1

^ permalink raw reply	[flat|nested] 19+ messages in thread
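A note on the constant above: LogEst values encode roughly 10 * log2(x), and
the value is exact for powers of two, so DEFAULT_TUPLE_LOG_COUNT =
sqlite3LogEst(1048576) = sqlite3LogEst(2^20) = 10 * 20 = 200 - the same magic
200 that the removed assert(200 == sqlite3LogEst(1048576)) checked.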
* [tarantool-patches] Re: [PATCH 3/4] sql: refactor usages of table's tuple count
  2018-04-23 20:29 ` [tarantool-patches] [PATCH 3/4] sql: refactor usages of table's tuple count Nikita Pettik
@ 2018-04-24 12:51   ` Vladislav Shpilevoy
  2018-05-11 17:29     ` n.pettik
  0 siblings, 1 reply; 19+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-24 12:51 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

See my 3 comments below, thanks.

On 23/04/2018 23:29, Nikita Pettik wrote:
> The count of tuples contained in a given space can be calculated as the
> number of tuples in the primary index. However, if the table represents
> a temporary object such as the result set of a SELECT or a VIEW, the
> tuple count can't be precisely determined. In this case, a default
> approximation is used: 1 million tuples.
> 
> Part of #3253
> diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
> index 8ca8e808f..d026a213d 100644
> --- a/src/box/sql/sqliteInt.h
> +++ b/src/box/sql/sqliteInt.h
> @@ -2164,7 +2180,6 @@ struct Index {
>  	tRowcnt *aAvgEq;	/* Average nEq values for keys not in aSample */
>  	IndexSample *aSample;	/* Samples of the left-most key */
>  	tRowcnt *aiRowEst;	/* Non-logarithmic stat1 data for this index */
> -	tRowcnt nRowEst0;	/* Non-logarithmic number of rows in the index */
>  };
> 
>  /*
> @@ -2199,6 +2214,16 @@ struct IndexSample {
>  	tRowcnt *anDLt;		/* Est. number of distinct keys less than this sample */
>  };
> 
> +#ifdef DEFAULT_TUPLE_COUNT
> +#undef DEFAULT_TUPLE_COUNT

1. Why do you need these guards? sqliteInt.h already has #ifndef SQLITEINT_H
#define SQLITEINT_H guard. If it does not help, then it is an error and must
be fixed with no ifdef + undef for each definition.

> +#endif
> +#define DEFAULT_TUPLE_COUNT 1048576
> +
> +#ifdef DEFAULT_TUPLE_LOG_COUNT
> +#undef DEFAULT_TUPLE_LOG_COUNT
> +#endif
> +#define DEFAULT_TUPLE_LOG_COUNT sqlite3LogEst(DEFAULT_TUPLE_COUNT)

2. In the original code it was 200 to avoid calculating sqlite3LogEst(1048576)
on each temporary table creation. Let's do the same.

> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index 51b53c2df..e6f9ef431 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -1326,8 +1326,22 @@ whereRangeScanEst(Parse * pParse,	/* Parsing & code generating context */
>  			}
>  			/* Determine iLower and iUpper using ($P) only. */
>  			if (nEq == 0) {
> +				/*
> +				 * In this simple case, there are no
> +				 * equality constraints, so initially all rows
> +				 * are in range.
> +				 */
>  				iLower = 0;
> -				iUpper = p->nRowEst0;
> +				uint32_t space_id =
> +					SQLITE_PAGENO_TO_SPACEID(p->tnum);
> +				struct space *space = space_by_id(space_id);
> +				assert(space != NULL);
> +				uint32_t iid =
> +					SQLITE_PAGENO_TO_INDEXID(p->tnum);
> +				struct index *idx =
> +					space_index(space, iid);
> +				assert(idx != NULL);
> +				iUpper = idx->vtab->size(idx);

3. Let's use index_count. And it looks like this case is repeated, and not for
the first or even the second time. Let's implement a function sql_table_size()
that takes a table pointer and does PAGENO_TO... + space_by_id + space_index +
index_size. I would put it in one of the previous patches, where this case
first appears.

>  			} else {
>  				/* Note: this call could be optimized away - since the same values must
>  				 * have been requested when testing key $P in whereEqualScanEst().
> @@ -2786,7 +2800,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */ > sPk.aiRowLogEst = aiRowEstPk; > sPk.onError = ON_CONFLICT_ACTION_REPLACE; > sPk.pTable = pTab; > - aiRowEstPk[0] = pTab->nRowLogEst; > + aiRowEstPk[0] = sql_space_tuple_log_count(pTab); > aiRowEstPk[1] = 0; > pFirst = pSrc->pTab->pIndex; > if (pSrc->fg.notIndexed == 0) { > ^ permalink raw reply [flat|nested] 19+ messages in thread
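A note on the arithmetic behind comment 2: LogEst encodes a value x as an integer approximation of 10 * log2(x). A minimal illustration, using only the sqlite3LogEst() helper and the macros already introduced in this series:

	/*
	 * DEFAULT_TUPLE_COUNT == 1048576 == 2^20, hence
	 * 10 * log2(2^20) == 10 * 20 == 200, i.e. the literal 200
	 * can replace the run-time sqlite3LogEst(1048576) call.
	 * The reply below pins this equivalence with an assertion:
	 */
	assert(sqlite3LogEst(DEFAULT_TUPLE_COUNT) == DEFAULT_TUPLE_LOG_COUNT);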
* [tarantool-patches] Re: [PATCH 3/4] sql: refactor usages of table's tuple count 2018-04-24 12:51 ` [tarantool-patches] " Vladislav Shpilevoy @ 2018-05-11 17:29 ` n.pettik 0 siblings, 0 replies; 19+ messages in thread From: n.pettik @ 2018-05-11 17:29 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy >> +#ifdef DEFAULT_TUPLE_COUNT >> +#undef DEFAULT_TUPLE_COUNT > > 1. Why do you need these guards? sqliteInt.h already has #ifndef SQLITEINT_H > #define SQLITEINT_H guard. If it does not help, then it is an error and must be > fixed with no ifdef + undef for each definition. Just in case. Since you don’t like this way, I have removed them: @@ -2206,15 +2206,9 @@ struct IndexSample { tRowcnt *anDLt; /* Est. number of distinct keys less than this sample */ }; -#ifdef DEFAULT_TUPLE_COUNT -#undef DEFAULT_TUPLE_COUNT -#endif #define DEFAULT_TUPLE_COUNT 1048576 - -#ifdef DEFAULT_TUPLE_LOG_COUNT -#undef DEFAULT_TUPLE_LOG_COUNT -#endif -#define DEFAULT_TUPLE_LOG_COUNT sqlite3LogEst(DEFAULT_TUPLE_COUNT) +/** [10*log_{2}(1048576)] == 200 */ +#define DEFAULT_TUPLE_LOG_COUNT 200 > >> +#endif >> +#define DEFAULT_TUPLE_COUNT 1048576 >> + >> +#ifdef DEFAULT_TUPLE_LOG_COUNT >> +#undef DEFAULT_TUPLE_LOG_COUNT >> +#endif >> +#define DEFAULT_TUPLE_LOG_COUNT sqlite3LogEst(DEFAULT_TUPLE_COUNT) > > 2. In the original code it was 200 to avoid calculation of > sqlite3LogEst(1048576) on each temporary table creation. Let's do the same. I did it just to be able to adjust tuple count easily without caring about logarithms. M’kay, I added two independent macros and an appropriate assertion on each usage (so the release build will not be affected by calling sqlite3LogEst()): +++ b/src/box/sql/analyze.c @@ -1636,6 +1636,7 @@ sql_space_tuple_log_count(struct Table *tab) if (space == NULL) return tab->tuple_log_count; struct index *pk = space_index(space, 0); + assert(sqlite3LogEst(DEFAULT_TUPLE_COUNT) == DEFAULT_TUPLE_LOG_COUNT); /* If space represents VIEW, return default number.
*/ if (pk == NULL) return DEFAULT_TUPLE_LOG_COUNT; diff --git a/src/box/sql/select.c b/src/box/sql/select.c index db3fb3948..c02322b40 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1966,6 +1966,7 @@ sqlite3ResultSetOfSelect(Parse * pParse, Select * pSelect) pTab->nTabRef = 1; pTab->zName = 0; pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; + assert(sqlite3LogEst(DEFAULT_TUPLE_COUNT) == DEFAULT_TUPLE_LOG_COUNT); sqlite3ColumnsFromExprList(pParse, pSelect->pEList, &pTab->nCol, &pTab->aCol); sqlite3SelectAddColumnTypeAndCollation(pParse, pTab, pSelect); @@ -4495,7 +4496,8 @@ withExpand(Walker * pWalker, struct SrcList_item *pFrom) pTab->zName = sqlite3DbStrDup(db, pCte->zName); pTab->iPKey = -1; pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; - assert(200 == sqlite3LogEst(1048576)); + assert(sqlite3LogEst(DEFAULT_TUPLE_COUNT) == + DEFAULT_TUPLE_LOG_COUNT); pTab->tabFlags |= TF_Ephemeral; pFrom->pSelect = sqlite3SelectDup(db, pCte->pSelect, 0); if (db->mallocFailed) @@ -4688,6 +4690,8 @@ selectExpander(Walker * pWalker, Select * p) &pTab->nCol, &pTab->aCol); pTab->iPKey = -1; pTab->tuple_log_count = DEFAULT_TUPLE_LOG_COUNT; + assert(sqlite3LogEst(DEFAULT_TUPLE_COUNT) == + DEFAULT_TUPLE_LOG_COUNT); pTab->tabFlags |= TF_Ephemeral; > >> diff --git a/src/box/sql/where.c b/src/box/sql/where.c >> index 51b53c2df..e6f9ef431 100644 >> --- a/src/box/sql/where.c >> +++ b/src/box/sql/where.c >> @@ -1326,8 +1326,22 @@ whereRangeScanEst(Parse * pParse, /* Parsing & code generating context */ >> } >> /* Determine iLower and iUpper using ($P) only. */ >> if (nEq == 0) { >> + /* >> + * In this simple case, there are no >> + * equality constraints, so initially all rows >> + * are in range. >> + */ >> iLower = 0; >> - iUpper = p->nRowEst0; >> + uint32_t space_id = >> + SQLITE_PAGENO_TO_SPACEID(p->tnum); >> + struct space *space = space_by_id(space_id); >> + assert(space != NULL); >> + uint32_t iid = >> + SQLITE_PAGENO_TO_INDEXID(p->tnum); >> + struct index *idx = >> + space_index(space, iid); >> + assert(idx != NULL); >> + iUpper = idx->vtab->size(idx); > > 3. Let's use index_count. I guess you mean index_size()? - iUpper = idx->vtab->size(idx); + iUpper = index_size(idx); > And it looks like this case is repeated not for the first or > even the second time. Let's implement a function sql_table_size() that takes > a table pointer and does PAGENO_TO... + space_by_id + space_index + index_size. > I would put it in one of the previous patches, where this case appears first. Well, I grepped and found index_size() only in two places across the SQL module: the first is here, and the second one in sql_index_tuple_size(), which already takes space/index arguments. ^ permalink raw reply [flat|nested] 19+ messages in thread
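As a follow-up to comment 3: with index_size() agreed on above, the helper the reviewer proposed would reduce to a few lines. This is only a sketch of the suggested (and, per the grep above, ultimately not added) sql_table_size(); the name and the lookup chain come from the review, and the return type is assumed to mirror index_size():

static ssize_t
sql_table_size(struct Table *tab)
{
	/* Resolve the Tarantool space and index hidden behind tnum. */
	struct space *space =
		space_by_id(SQLITE_PAGENO_TO_SPACEID(tab->tnum));
	assert(space != NULL);
	struct index *idx =
		space_index(space, SQLITE_PAGENO_TO_INDEXID(tab->tnum));
	assert(idx != NULL);
	return index_size(idx);
}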
* [tarantool-patches] [PATCH 4/4] sql: move SQL statistics to server 2018-04-23 20:29 [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server Nikita Pettik ` (2 preceding siblings ...) 2018-04-23 20:29 ` [tarantool-patches] [PATCH 3/4] sql: refactor usages of table's tuple count Nikita Pettik @ 2018-04-23 20:29 ` Nikita Pettik 2018-04-24 12:51 ` [tarantool-patches] " Vladislav Shpilevoy 2018-05-14 14:12 ` [tarantool-patches] Re: [PATCH 0/4] Move original SQLite's " Vladislav Shpilevoy 4 siblings, 1 reply; 19+ messages in thread From: Nikita Pettik @ 2018-04-23 20:29 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik SQLite provides a kind of statistics concerning the data held in each index and space. They are used to help the query planner build an optimized plan. Such statistics don't depend on the internal index implementation (e.g. B-tree or LSM tree) and contain information concerning tuple distribution. This patch moves the members responsible for statistics from the original SQLite structs to Tarantool ones. To be more precise, now an index's opts contains statistics from the _stat1 space (arrays of integers and their logarithm approximations) and more sophisticated ones from _stat4 (an array of samples). It also contains a set of flags to turn on/off different optimizations (such as skip-scan). After the ANALYZE command is executed, statistics are saved into the _sql_stat1 and _sql_stat4 spaces (in order to persist them). However, they should also be loaded into the in-memory structs representing indexes. This patch reworks the original SQLite routine to load them directly into the newly introduced stat struct of the index. Closes #3253 --- src/box/index_def.c | 92 ++++++ src/box/index_def.h | 90 ++++++ src/box/sql/analyze.c | 742 +++++++++++++++++++++++------------------------- src/box/sql/build.c | 61 ---- src/box/sql/parse.y | 2 - src/box/sql/pragma.c | 3 +- src/box/sql/prepare.c | 7 +- src/box/sql/sqliteInt.h | 72 +++-- src/box/sql/vdbe.c | 5 +- src/box/sql/where.c | 190 ++++++++----- src/box/sql/whereexpr.c | 4 +- 11 files changed, 716 insertions(+), 552 deletions(-) diff --git a/src/box/index_def.c b/src/box/index_def.c index eda39a622..14ff94192 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -47,6 +47,7 @@ const struct index_opts index_opts_default = { /* .bloom_fpr = */ 0.05, /* .lsn = */ 0, /* .sql = */ NULL, + /* .stat = */ NULL, }; const struct opt_def index_opts_reg[] = { @@ -117,6 +118,8 @@ index_def_new(uint32_t space_id, uint32_t iid, const char *name, return NULL; } } + /* Statistics are initialized separately.
*/ + assert(opts->stat == NULL); return def; } @@ -154,9 +157,98 @@ index_def_dup(const struct index_def *def) return NULL; } } + if (def->opts.stat != NULL) { + dup->opts.stat = malloc(sizeof(*dup->opts.stat)); + if (dup->opts.stat == NULL) { + diag_set(OutOfMemory, sizeof(*dup->opts.stat), "malloc", + "dup->opts.stat"); + index_def_delete(dup); + return NULL; + } + dup->opts.stat->is_unordered = def->opts.stat->is_unordered; + dup->opts.stat->skip_scan_enabled = + def->opts.stat->skip_scan_enabled; + size_t stat_size = (def->key_def->part_count + 1) * + sizeof(uint32_t); + dup->opts.stat->tuple_stat1 = malloc(stat_size); + if (dup->opts.stat->tuple_stat1 == NULL) { + diag_set(OutOfMemory, stat_size, "malloc", + "tuple_stat1"); + index_def_delete(dup); + return NULL; + } + memcpy(dup->opts.stat->tuple_stat1, def->opts.stat->tuple_stat1, + stat_size); + dup->opts.stat->tuple_log_est = malloc(stat_size); + if (dup->opts.stat->tuple_log_est == NULL) { + diag_set(OutOfMemory, stat_size, "malloc", + "tuple_log_est"); + index_def_delete(dup); + return NULL; + } + memcpy(dup->opts.stat->tuple_log_est, + def->opts.stat->tuple_log_est, stat_size); + uint32_t sample_count = def->opts.stat->sample_count; + dup->opts.stat->sample_count = sample_count; + dup->opts.stat->sample_field_count = + def->opts.stat->sample_field_count; + if (def->opts.stat->samples == NULL) { + dup->opts.stat->samples = NULL; + dup->opts.stat->avg_eq = NULL; + return dup; + } + size_t samples_alloc_size = + /* Array of samples. */ + sample_count * sizeof(def->opts.stat->samples[0]) + + /* Arrays eq, lt, dlt for each sample. */ + def->opts.stat->sample_count * sizeof(uint32_t) * + def->opts.stat->sample_field_count * 3 + + /* Array of avg_eq. */ + (def->key_def->part_count * sizeof(uint32_t)); + dup->opts.stat->samples = malloc(samples_alloc_size); + if (dup->opts.stat->samples == NULL) { + diag_set(OutOfMemory, samples_alloc_size, "malloc", + "samples"); + index_def_delete(dup); + return NULL; + } + memcpy(dup->opts.stat->samples, def->opts.stat->samples, + samples_alloc_size); + for (uint32_t i = 0; i < def->opts.stat->sample_count; ++i) { + size_t key_size = def->opts.stat->samples[i].key_size; + /* + * Add two zero-bytes at the end in order + * to prevent buffer overread. + */ + dup->opts.stat->samples[i].sample_key = + calloc(1, key_size + 2); + if (dup->opts.stat->samples[i].sample_key == NULL) { + diag_set(OutOfMemory, key_size + 2, "calloc", + "sample_key"); + index_def_delete(dup); + return NULL; + } + memcpy(dup->opts.stat->samples[i].sample_key, + def->opts.stat->samples[i].sample_key, key_size); + } + } return dup; } +void +index_stat_destroy_samples(struct index_stat *stat) +{ + if (stat != NULL && stat->samples != NULL) { + for (uint32_t i = 0; i < stat->sample_count; ++i) { + struct index_sample *sample = &stat->samples[i]; + free(sample->sample_key); + } + free(stat->samples); + stat->sample_count = 0; + stat->samples = NULL; + } +} + /** Free a key definition. */ void index_def_delete(struct index_def *index_def) diff --git a/src/box/index_def.h b/src/box/index_def.h index 6167570b7..52e9a0548 100644 --- a/src/box/index_def.h +++ b/src/box/index_def.h @@ -58,6 +58,68 @@ enum rtree_index_distance_type { }; extern const char *rtree_index_distance_type_strs[]; +/** Simple alias to represent logarithm metrics. */ +typedef int16_t log_est; + +/** + * One sample represents an index key and three arrays with + * statistics concerning tuple distribution.
+ * Max number of samples for one index is adjusted by + * SQL_STAT4_SAMPLES macro. By default it is about 24 entries. + * Samples are chosen to be selective. + */ +struct index_sample { + /** Key of the sample. */ + void *sample_key; + /** Size of sample key. */ + size_t key_size; + /** + * List of integers: first one is the approximate number + * of entries in the index whose left-most field exactly + * matches the left-most column of the sample; + * second one - est. number of entries in the index where + * the first two columns match the first two columns of + * the sample; and so forth. + */ + uint32_t *eq; + /** The same as eq list, but key is less than sample. */ + uint32_t *lt; + /** The same as lt list, but includes only distinct keys. */ + uint32_t *dlt; +}; + +/** + * SQL statistics for an index, which are used by the query planner. + * These are general statistics, without any relation to the + * underlying engine and data structures (e.g. B-tree or LSM tree). + * Statistics appear only after executing the ANALYZE statement. + * They are loaded from the _sql_stat1 and _sql_stat4 system spaces. + */ +struct index_stat { + /** An array of samples of the left-most key. */ + struct index_sample *samples; + /** Number of samples. */ + uint32_t sample_count; + /** Number of fields in sample arrays: eq, lt and dlt. */ + uint32_t sample_field_count; + /** + * List of integers: the first is the number of tuples + * in the index; the second one is the average number of + * tuples that have the same key part in the first field + * of the index; the third - for the first two fields; + * and so forth. + */ + uint32_t *tuple_stat1; + /** Logarithms of stat1 data. */ + log_est *tuple_log_est; + /** Average eq values for keys not in samples. */ + uint32_t *avg_eq; + /** Use this index for == or IN queries only. */ + bool is_unordered; + /** Don't try to use skip-scan optimization if true. */ + bool skip_scan_enabled; +}; + /** Index options */ struct index_opts { /** @@ -99,6 +161,12 @@ struct index_opts { * SQL statement that produced this index. */ char *sql; + /** + * SQL specific statistics concerning tuple + * distribution for the query planner. It is automatically + * filled after running the ANALYZE command. + */ + struct index_stat *stat; }; extern const struct index_opts index_opts_default; @@ -113,6 +181,27 @@ index_opts_create(struct index_opts *opts) *opts = index_opts_default; } +/** + * Release memory allocated for samples, statistics they include, + * sample keys and avg_eq array. + */ +void +index_stat_destroy_samples(struct index_stat *stat); + +/** + * Destroy index stat. + */ +static inline void +destroy_stat(struct index_stat *stat) { + if (stat == NULL) + return; + free(stat->tuple_stat1); + free(stat->tuple_log_est); + index_stat_destroy_samples(stat); + TRASH(stat); + free(stat); +} + /** * Destroy index options */ @@ -120,6 +209,7 @@ static inline void index_opts_destroy(struct index_opts *opts) { free(opts->sql); + destroy_stat(opts->stat); TRASH(opts); } diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c index 4965e899a..14dfbd551 100644 --- a/src/box/sql/analyze.c +++ b/src/box/sql/analyze.c @@ -104,8 +104,8 @@ * large nEq values. * */ -#ifndef SQLITE_OMIT_ANALYZE +#include "box/box.h" #include "box/index.h" #include "box/key_def.h" #include "box/tuple_compare.h" @@ -1217,49 +1217,46 @@ struct analysisInfo { sqlite3 *db; }; -/* - * The first argument points to a nul-terminated string containing a - * list of space separated integers. Read the first nOut of these into - * the array aOut[].
+/** + * The first argument points to a nul-terminated string + * containing a list of space separated integers. Load + * the first stat_size of these into the output arrays. + * + * @param stat_string String containing array of integers. + * @param stat_size Size of output arrays. + * @param[out] stat_exact Decoded array of statistics. + * @param[out] stat_log Decoded array of stat logarithms. + * @param[out] index Handle extra flags for this index, + * if not NULL. */ static void -decodeIntArray(char *zIntArray, /* String containing int array to decode */ - int nOut, /* Number of slots in aOut[] */ - tRowcnt * aOut, /* Store integers here */ - LogEst * aLog, /* Or, if aOut==0, here */ - Index * pIndex /* Handle extra flags for this index, if not NULL */ - ) -{ - char *z = zIntArray; - int c; - int i; - tRowcnt v; - - if (z == 0) +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, LogEst *stat_log, struct index *index) { + char *z = (char *) stat_string; + if (z == NULL) z = ""; - - for (i = 0; *z && i < nOut; i++) { - v = 0; + for (int i = 0; *z && i < stat_size; i++) { + tRowcnt v = 0; + int c; while ((c = z[0]) >= '0' && c <= '9') { v = v * 10 + c - '0'; z++; } - if (aOut) - aOut[i] = v; - if (aLog) - aLog[i] = sqlite3LogEst(v); + if (stat_exact != NULL) + stat_exact[i] = v; + if (stat_log != NULL) + stat_log[i] = sqlite3LogEst(v); if (*z == ' ') z++; } - - if (pIndex) { - pIndex->bUnordered = 0; - pIndex->noSkipScan = 0; + if (index != NULL) { + index->def->opts.stat->is_unordered = false; + index->def->opts.stat->skip_scan_enabled = true; while (z[0]) { if (sqlite3_strglob("unordered*", z) == 0) { - pIndex->bUnordered = 1; + index->def->opts.stat->is_unordered = true; } else if (sqlite3_strglob("noskipscan*", z) == 0) { - pIndex->noSkipScan = 1; + index->def->opts.stat->skip_scan_enabled = false; } while (z[0] != 0 && z[0] != ' ') z++; @@ -1269,369 +1266,374 @@ decodeIntArray(char *zIntArray, /* String containing int array to decode */ } } -/* +/** * This callback is invoked once for each index when reading the * _sql_stat1 table. * * argv[0] = name of the table * argv[1] = name of the index (might be NULL) - * argv[2] = results of analysis - on integer for each column + * argv[2] = results of analysis - array of integers + * + * Entries for which argv[1]==NULL simply record the number + * of rows in the table. This routine also allocates memory + * for the stat struct itself and for statistics which are not + * related to stat4 samples. - * - * Entries for which argv[1]==NULL simply record the number of rows in - * the table. + * @param data Additional analysis info. + * @param argc Number of arguments. + * @param argv Callback arguments. + * @retval 0 on success, -1 otherwise.
*/ static int -analysisLoader(void *pData, int argc, char **argv, char **NotUsed) +analysis_loader(void *data, int argc, char **argv, char **unused) { - analysisInfo *pInfo = (analysisInfo *) pData; - Index *pIndex; - Table *pTable; - const char *z; - assert(argc == 3); - UNUSED_PARAMETER2(NotUsed, argc); - - if (argv == 0 || argv[0] == 0 || argv[2] == 0) { + UNUSED_PARAMETER(data); + UNUSED_PARAMETER2(unused, argc); + if (argv == 0 || argv[0] == 0 || argv[2] == 0) return 0; - } - pTable = sqlite3HashFind(&pInfo->db->pSchema->tblHash, argv[0]); - if (pTable == NULL) + uint32_t space_id = box_space_id_by_name(argv[0], strlen(argv[0])); + assert(space_id != BOX_ID_NIL); + struct space *space = space_by_id(space_id); + assert(space != NULL); + struct index *index; + if (space == NULL) return 0; - if (argv[1] == 0) { - pIndex = 0; + if (argv[1] == NULL) { + index = NULL; } else if (sqlite3_stricmp(argv[0], argv[1]) == 0) { - pIndex = sqlite3PrimaryKeyIndex(pTable); + index = space_index(space, 0); + assert(index != NULL); } else { - pIndex = sqlite3HashFind(&pTable->idxHash, argv[1]); + uint32_t iid = box_index_id_by_name(space_id, argv[1], + strlen(argv[1])); + assert(iid != BOX_ID_NIL); + index = space_index(space, iid); } - z = argv[2]; - - if (pIndex) { - tRowcnt *aiRowEst = 0; - int nCol = index_column_count(pIndex) + 1; - /* Index.aiRowEst may already be set here if there are duplicate - * _sql_stat1 entries for this index. In that case just clobber - * the old data with the new instead of allocating a new array. + if (index != NULL) { + /* + * Additional field is used to describe total + * count of tuples in index. Although now all + * indexes feature the same number of tuples, + * partial indexes are going to be implemented + * someday. */ - if (pIndex->aiRowEst == 0) { - pIndex->aiRowEst = - (tRowcnt *) sqlite3MallocZero(sizeof(tRowcnt) * - nCol); - if (pIndex->aiRowEst == 0) - sqlite3OomFault(pInfo->db); + int column_count = index->def->key_def->part_count + 1; + /* + * Stat arrays may already be set here if there + * are duplicate _sql_stat1 entries for this + * index. In that case just clobber the old data + * with the new instead of allocating a new array. + */ + struct index_stat *stat = index->def->opts.stat; + if (stat == NULL) { + stat = calloc(1, sizeof(*stat)); + if (stat == NULL) { + diag_set(OutOfMemory, sizeof(*stat), "calloc", + "stat"); + return -1; + } + index->def->opts.stat = stat; } - aiRowEst = pIndex->aiRowEst; - pIndex->bUnordered = 0; - decodeIntArray((char *)z, nCol, aiRowEst, pIndex->aiRowLogEst, - pIndex); - } - - return 0; -} - -/* - * If the Index.aSample variable is not NULL, delete the aSample[] array - * and its contents. 
- */ -void -sqlite3DeleteIndexSamples(sqlite3 * db, Index * pIdx) -{ - if (pIdx->aSample) { - int j; - for (j = 0; j < pIdx->nSample; j++) { - IndexSample *p = &pIdx->aSample[j]; - sqlite3DbFree(db, p->p); + if (stat->tuple_stat1 == NULL) { + stat->tuple_stat1 = + calloc(column_count, sizeof(uint32_t)); + if (stat->tuple_stat1 == NULL) { + diag_set(OutOfMemory, column_count, "calloc", + "tuple_stat1"); + return -1; + } } - sqlite3DbFree(db, pIdx->aSample); - } - if (db && db->pnBytesFreed == 0) { - pIdx->nSample = 0; - pIdx->aSample = 0; + if (stat->tuple_log_est == NULL) { + stat->tuple_log_est = calloc(column_count, + sizeof(uint32_t)); + if (stat->tuple_log_est == NULL) { + diag_set(OutOfMemory, column_count, "calloc", + "tuple_log_stat"); + return -1; + } + } + decode_stat_string(argv[2], index->def->key_def->part_count + 1, + index->def->opts.stat->tuple_stat1, + index->def->opts.stat->tuple_log_est, + index); } + return 0; } -/* - * Populate the pIdx->aAvgEq[] array based on the samples currently - * stored in pIdx->aSample[]. +/** + * Calculate avg_eq array based on the samples from index. + * Some *magic* calculations happen here. */ static void -initAvgEq(Index * pIdx) +init_avg_eq(struct index *index) { - if (pIdx) { - IndexSample *aSample = pIdx->aSample; - IndexSample *pFinal = &aSample[pIdx->nSample - 1]; - int iCol; - int nCol = 1; - if (pIdx->nSampleCol > 1) { - /* If this is stat4 data, then calculate aAvgEq[] values for all - * sample columns except the last. The last is always set to 1, as - * once the trailing PK fields are considered all index keys are - * unique. - */ - nCol = pIdx->nSampleCol - 1; - pIdx->aAvgEq[nCol] = 1; + assert(index != NULL); + assert(index->def->opts.stat != NULL); + struct index_sample *samples = index->def->opts.stat->samples; + uint32_t sample_count = index->def->opts.stat->sample_count; + uint32_t field_count = index->def->opts.stat->sample_field_count; + struct index_sample *last_sample = &samples[sample_count - 1]; + if (field_count > 1) + index->def->opts.stat->avg_eq[--field_count] = 1; + for (uint32_t i = 0; i < field_count; ++i) { + uint32_t column_count = index->def->key_def->part_count; + tRowcnt eq_sum = 0; + tRowcnt eq_avg = 0; + uint32_t tuple_count = index->vtab->size(index); + uint64_t distinct_tuple_count; + uint64_t terms_sum = 0; + if (i >= column_count || + index->def->opts.stat->tuple_stat1[i + 1] == 0) { + distinct_tuple_count = 100 * last_sample->dlt[i]; + sample_count--; + } else { + assert(index->def->opts.stat->tuple_stat1 != NULL); + distinct_tuple_count = (100 * tuple_count) / + index->def->opts.stat->tuple_stat1[i + 1]; } - for (iCol = 0; iCol < nCol; iCol++) { - int nSample = pIdx->nSample; - int i; /* Used to iterate through samples */ - tRowcnt sumEq = 0; /* Sum of the nEq values */ - tRowcnt avgEq = 0; - tRowcnt nRow; /* Number of rows in index */ - i64 nSum100 = 0; /* Number of terms contributing to sumEq */ - i64 nDist100; /* Number of distinct values in index */ - int nColumn = index_column_count(pIdx); - - if (!pIdx->aiRowEst || iCol >= nColumn - || pIdx->aiRowEst[iCol + 1] == 0) { - nRow = pFinal->anLt[iCol]; - nDist100 = (i64) 100 *pFinal->anDLt[iCol]; - nSample--; - } else { - nRow = pIdx->aiRowEst[0]; - nDist100 = - ((i64) 100 * pIdx->aiRowEst[0]) / - pIdx->aiRowEst[iCol + 1]; + for (uint32_t j = 0; j < sample_count; ++j) { + if (j == (index->def->opts.stat->sample_count - 1) || + samples[j].dlt[i] != samples[j + 1].dlt[i]) { + eq_sum += samples[j].eq[i]; + terms_sum += 100; } - - /* Set nSum to the 
number of distinct (iCol+1) field prefixes that - * occur in the stat4 table for this index. Set sumEq to the sum of - * the nEq values for column iCol for the same set (adding the value - * only once where there exist duplicate prefixes). - */ - for (i = 0; i < nSample; i++) { - if (i == (pIdx->nSample - 1) - || aSample[i].anDLt[iCol] != - aSample[i + 1].anDLt[iCol] - ) { - sumEq += aSample[i].anEq[iCol]; - nSum100 += 100; - } - } - - if (nDist100 > nSum100) { - avgEq = - ((i64) 100 * (nRow - sumEq)) / (nDist100 - - nSum100); - } - if (avgEq == 0) - avgEq = 1; - pIdx->aAvgEq[iCol] = avgEq; } + if (distinct_tuple_count > terms_sum) { + eq_avg = 100 * (tuple_count - eq_sum) / + (distinct_tuple_count - terms_sum); + } + if (eq_avg == 0) + eq_avg = 1; + index->def->opts.stat->avg_eq[i] = eq_avg; } } -/* - * Given two IndexSample arguments, compare there payloads +/** + * Given two key arguments, compare their payloads. + * This is a simple wrapper around key_compare() to support + * qsort() interface. */ static int -sampleCompareMsgPack(const void *a, const void *b, void *arg) +sample_compare(const void *a, const void *b, void *arg) { struct key_def *def = (struct key_def *)arg; - return key_compare(((IndexSample *) a)->p, ((IndexSample *) b)->p, def); + return key_compare(((struct index_sample *) a)->sample_key, + ((struct index_sample *) b)->sample_key, def); } -/* +/** * Load the content from the _sql_stat4 table - * into the relevant Index.aSample[] arrays. + * into the relevant index->stat->samples[] arrays. + * + * Arguments must point to SQL statements that return + * data equivalent to the following: * - * Arguments zSql1 and zSql2 must point to SQL statements that return - * data equivalent to the following (statements are different for stat4, - * see the caller of this function for details): + * prepare: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx; + * load: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4; * - * zSql1: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx - * zSql2: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4 + * 'prepare' statement is used to allocate enough memory for + * statistics (i.e. arrays eq, lt, dlt and avg_eq). 'load' query + * is needed for loading statistics into in-memory structs. + * + * @param db Database handler. + * @param sql_select_prepare SELECT statement, see above. + * @param sql_select_load SELECT statement, see above. + * @retval SQLITE_OK on success, something else otherwise.
*/ static int -loadStatTbl(sqlite3 * db, /* Database handle */ - Table * pStatTab, /* Stat table to load to */ - const char *zSql1, /* SQL statement 1 (see above) */ - const char *zSql2 /* SQL statement 2 (see above) */ - ) +load_stat_from_space(struct sqlite3 *db, const char *sql_select_prepare, + const char *sql_select_load) { - int rc; /* Result codes from subroutines */ - sqlite3_stmt *pStmt = 0; /* An SQL statement being run */ - Index *pPrevIdx = 0; /* Previous index in the loop */ - IndexSample *pSample; /* A slot in pIdx->aSample[] */ - int nIdxCnt = 0; /* Number of different indexes visited */ - int nIdx = 0; /* Index loop iterator */ - Index **aIndex = 0; /* Array holding all visited indexes */ - - assert(db->lookaside.bDisable); - - assert(pStatTab); - nIdxCnt = box_index_len(SQLITE_PAGENO_TO_SPACEID(pStatTab->tnum), 0); - - if (nIdxCnt > 0) { - aIndex = sqlite3DbMallocZero(db, sizeof(Index *) * nIdxCnt); - if (aIndex == 0) { + struct index **indexes = NULL; + uint32_t index_count = box_index_len(BOX_SQL_STAT4_ID, 0); + if (index_count > 0) { + size_t alloc_size = sizeof(struct index *) * index_count; + indexes = malloc(alloc_size); + if (indexes == NULL) { + diag_set(OutOfMemory, alloc_size, "malloc", "indexes"); return SQLITE_NOMEM_BKPT; } } - - rc = sqlite3_prepare(db, zSql1, -1, &pStmt, 0); + sqlite3_stmt *stmt = NULL; + int rc = sqlite3_prepare(db, sql_select_prepare, -1, &stmt, 0); if (rc) goto finalize; - - while (sqlite3_step(pStmt) == SQLITE_ROW) { - int nIdxCol = 1; /* Number of columns in stat4 records */ - - char *zTab; /* Table name */ - char *zIndex; /* Index name */ - Index *pIdx; /* Pointer to the index object */ - int nSample; /* Number of samples */ - int nByte; /* Bytes of space required */ - int i; /* Bytes of space required */ - tRowcnt *pSpace; - - zTab = (char *)sqlite3_column_text(pStmt, 0); - if (zTab == 0) + uint32_t current_idx_count = 0; + while (sqlite3_step(stmt) == SQLITE_ROW) { + const char *space_name = (char *)sqlite3_column_text(stmt, 0); + if (space_name == NULL) continue; - zIndex = (char *)sqlite3_column_text(pStmt, 1); - if (zIndex == 0) + const char *index_name = (char *)sqlite3_column_text(stmt, 1); + if (index_name == NULL) continue; - nSample = sqlite3_column_int(pStmt, 2); - pIdx = sqlite3LocateIndex(db, zIndex, zTab); - assert(pIdx == 0 || pIdx->nSample == 0); - /* Index.nSample is non-zero at this point if data has already been - * loaded from the stat4 table. + uint32_t sample_count = sqlite3_column_int(stmt, 2); + uint32_t space_id = box_space_id_by_name(space_name, + strlen(space_name)); + if (space_id == BOX_ID_NIL) + continue; + struct space *space = space_by_id(space_id); + assert(space != NULL); + uint32_t iid = box_index_id_by_name(space_id, index_name, + strlen(index_name)); + if (iid == BOX_ID_NIL) + continue; + struct index *index = space_index(space, iid); + assert(index != NULL); + assert(index->def->opts.stat != NULL || + index->def->opts.stat->sample_count == 0); + /* + * Samples array is non-zero at this point if + * data has already been loaded from the + * stat4 table. 
*/ - if (pIdx == 0 || pIdx->nSample) + if (index->def->opts.stat != NULL && + index->def->opts.stat->sample_count) continue; - - nIdxCol = index_column_count(pIdx); - pIdx->nSampleCol = nIdxCol; - nByte = sizeof(IndexSample) * nSample; - nByte += sizeof(tRowcnt) * nIdxCol * 3 * nSample; - nByte += nIdxCol * sizeof(tRowcnt); /* Space for Index.aAvgEq[] */ - - pIdx->aSample = sqlite3DbMallocZero(db, nByte); - if (pIdx->aSample == 0) { - sqlite3_finalize(pStmt); + uint32_t column_count = index->def->key_def->part_count; + index->def->opts.stat->sample_field_count = + index->def->key_def->part_count; + /* Space for samples. */ + size_t alloc_size = sizeof(struct index_sample) * sample_count; + /* Space for eq, lt and dlt stats. */ + alloc_size += sizeof(tRowcnt) * column_count * 3 * sample_count; + /* Space for Index.aAvgEq[]. */ + alloc_size += column_count * sizeof(tRowcnt); + /* + * We are trying to fit into one chunk samples, + * eq_avg and arrays of eq, lt and dlt stats. + * First comes memory for the sample structs, + * then the eq_avg array and finally the arrays + * of eq, lt and dlt stats. + */ + index->def->opts.stat->samples = malloc(alloc_size); + if (index->def->opts.stat->samples == NULL) { + diag_set(OutOfMemory, alloc_size, "malloc", "samples"); rc = SQLITE_NOMEM_BKPT; goto finalize; } - pSpace = (tRowcnt *) & pIdx->aSample[nSample]; - pIdx->aAvgEq = pSpace; - pSpace += nIdxCol; - for (i = 0; i < nSample; i++) { - pIdx->aSample[i].anEq = pSpace; - pSpace += nIdxCol; - pIdx->aSample[i].anLt = pSpace; - pSpace += nIdxCol; - pIdx->aSample[i].anDLt = pSpace; - pSpace += nIdxCol; + memset(index->def->opts.stat->samples, 0, alloc_size); + /* Marking memory manually. */ + tRowcnt *pSpace = + (tRowcnt *) & index->def->opts.stat->samples[sample_count]; + index->def->opts.stat->avg_eq = pSpace; + pSpace += column_count; + for (uint32_t i = 0; i < sample_count; i++) { + index->def->opts.stat->samples[i].eq = pSpace; + pSpace += column_count; + index->def->opts.stat->samples[i].lt = pSpace; + pSpace += column_count; + index->def->opts.stat->samples[i].dlt = pSpace; + pSpace += column_count; } - assert(((u8 *) pSpace) - nByte == (u8 *) (pIdx->aSample)); - - aIndex[nIdx] = pIdx; - assert(nIdx < nIdxCnt); - ++nIdx; + assert(((u8 *) pSpace) - alloc_size == + (u8 *) (index->def->opts.stat->samples)); + indexes[current_idx_count] = index; + assert(current_idx_count < index_count); + current_idx_count++; } - rc = sqlite3_finalize(pStmt); + rc = sqlite3_finalize(stmt); if (rc) goto finalize; - rc = sqlite3_prepare(db, zSql2, -1, &pStmt, 0); + rc = sqlite3_prepare(db, sql_select_load, -1, &stmt, 0); if (rc) goto finalize; - - while (sqlite3_step(pStmt) == SQLITE_ROW) { - char *zTab; /* Table name */ - char *zIndex; /* Index name */ - Index *pIdx; /* Pointer to the index object */ - int nCol = 1; /* Number of columns in index */ - - zTab = (char *)sqlite3_column_text(pStmt, 0); - if (zTab == 0) + struct index *prev_index = NULL; + while (sqlite3_step(stmt) == SQLITE_ROW) { + const char *space_name = (char *)sqlite3_column_text(stmt, 0); + if (space_name == NULL) continue; - zIndex = (char *)sqlite3_column_text(pStmt, 1); - if (zIndex == 0) + const char *index_name = (char *)sqlite3_column_text(stmt, 1); + if (index_name == NULL) continue; - pIdx = sqlite3LocateIndex(db, zIndex, zTab); - if (pIdx == 0) + uint32_t space_id = box_space_id_by_name(space_name, + strlen(space_name)); + if (space_id == BOX_ID_NIL) continue; - - nCol = pIdx->nSampleCol; - if (pIdx != pPrevIdx) { - initAvgEq(pPrevIdx); -
pPrevIdx = pIdx; + struct space *space = space_by_id(space_id); + assert(space != NULL); + uint32_t iid = box_index_id_by_name(space_id, index_name, + strlen(index_name)); + if (iid == BOX_ID_NIL) + continue; + struct index *index = space_index(space, iid); + assert(index != NULL); + uint32_t field_count = + index->def->opts.stat->sample_field_count; + if (index != prev_index) { + if (prev_index != NULL) + init_avg_eq(prev_index); + prev_index = index; } - pSample = &pIdx->aSample[pIdx->nSample]; - decodeIntArray((char *)sqlite3_column_text(pStmt, 2), nCol, - pSample->anEq, 0, 0); - decodeIntArray((char *)sqlite3_column_text(pStmt, 3), nCol, - pSample->anLt, 0, 0); - decodeIntArray((char *)sqlite3_column_text(pStmt, 4), nCol, - pSample->anDLt, 0, 0); - - /* Take a copy of the sample. Add two 0x00 bytes the end of the buffer. - * This is in case the sample record is corrupted. In that case, the - * sqlite3VdbeRecordCompare() may read up to two varints past the - * end of the allocated buffer before it realizes it is dealing with - * a corrupt record. Adding the two 0x00 bytes prevents this from causing - * a buffer overread. + struct index_stat *stat = index->def->opts.stat; + assert(stat != NULL); + struct index_sample *sample = + &stat->samples[stat->sample_count]; + decode_stat_string((char *)sqlite3_column_text(stmt, 2), + field_count, sample->eq, 0, 0); + decode_stat_string((char *)sqlite3_column_text(stmt, 3), + field_count, sample->lt, 0, 0); + decode_stat_string((char *)sqlite3_column_text(stmt, 4), + field_count, sample->dlt, 0, 0); + /* + * Take a copy of the sample. Add two 0x00 bytes at + * the end of the buffer. This is in case the + * sample record is corrupted. In that case, the + * sqlite3VdbeRecordCompare() may read up to two + * varints past the end of the allocated buffer + * before it realizes it is dealing with a corrupt + * record. Adding the two 0x00 bytes prevents this + * from causing a buffer overread.
*/ - pSample->n = sqlite3_column_bytes(pStmt, 5); - pSample->p = sqlite3DbMallocZero(db, pSample->n + 2); - if (pSample->p == 0) { - sqlite3_finalize(pStmt); + sample->key_size = sqlite3_column_bytes(stmt, 5); + sample->sample_key = calloc(1, sample->key_size + 2); + if (sample->sample_key == NULL) { + sqlite3_finalize(stmt); rc = SQLITE_NOMEM_BKPT; + diag_set(OutOfMemory, sample->key_size + 2, + "sample_key", "calloc"); goto finalize; } - if (pSample->n) { - memcpy(pSample->p, sqlite3_column_blob(pStmt, 5), - pSample->n); + if (sample->key_size > 0) { + memcpy(sample->sample_key, + sqlite3_column_blob(stmt, 5), + sample->key_size); } - pIdx->nSample++; + index->def->opts.stat->sample_count++; } - rc = sqlite3_finalize(pStmt); - if (rc == SQLITE_OK) - initAvgEq(pPrevIdx); - - assert(nIdx <= nIdxCnt); - for (int i = 0; i < nIdx; ++i) { - Index *pIdx = aIndex[i]; - assert(pIdx); - /* - sid, iid - * - space_index_key_def - * - key_compare - */ - int sid = SQLITE_PAGENO_TO_SPACEID(pIdx->tnum); - int iid = SQLITE_PAGENO_TO_INDEXID(pIdx->tnum); - struct space *s = space_by_id(sid); - assert(s); - struct key_def *def = space_index_key_def(s, iid); - assert(def); - qsort_arg(pIdx->aSample, - pIdx->nSample, - sizeof(pIdx->aSample[0]), sampleCompareMsgPack, def); + rc = sqlite3_finalize(stmt); + if (rc == SQLITE_OK) { + if (prev_index != NULL) + init_avg_eq(prev_index); + } + assert(current_idx_count <= index_count); + for (uint32_t i = 0; i < current_idx_count; ++i) { + struct index *index = indexes[i]; + assert(index != NULL); + qsort_arg(index->def->opts.stat->samples, + index->def->opts.stat->sample_count, + sizeof(struct index_sample), + sample_compare, index->def->key_def); } - finalize: - sqlite3DbFree(db, aIndex); + free(indexes); return rc; } -/* - * Load content from the _sql_stat4 table into - * the Index.aSample[] arrays of all indices. - */ static int -loadStat4(sqlite3 * db) +space_clear_stat(struct space *space, void *arg) { - Table *pTab = 0; /* Pointer to stat table */ - - assert(db->lookaside.bDisable); - pTab = sqlite3HashFind(&db->pSchema->tblHash, "_sql_stat4"); - /* _slq_stat4 is a system space, so it always exists. */ - assert(pTab != NULL); - return loadStatTbl(db, pTab, - "SELECT \"tbl\",\"idx\",count(*) FROM \"_sql_stat4\"" - " GROUP BY \"tbl\",\"idx\"", - "SELECT \"tbl\",\"idx\",\"neq\",\"nlt\",\"ndlt\"," - "\"sample\" FROM \"_sql_stat4\""); + (void) arg; + /* Statistics are relevant only for the SQL query planner. */ + if (space->def->opts.sql != NULL) { + for (uint32_t i = 0; i < space->index_count; ++i) + index_stat_destroy_samples(space->index[i]->def->opts.stat); + } + return 0; } +/** [10*log_{2}(x)] for x = 10, 9, 8, 7, 6, 5. */ +const log_est default_tuple_est[] = {33, 32, 30, 28, 26, 23}; LogEst sql_space_tuple_log_count(struct Table *tab) @@ -1646,84 +1648,52 @@ sql_space_tuple_log_count(struct Table *tab) return sqlite3LogEst(pk->vtab->size(pk)); } -/* - * Load the content of the _sql_stat1 and sql_stat4 tables. The - * contents of _sql_stat1 are used to populate the Index.aiRowEst[] - * arrays. The contents of sql_stat4 are used to populate the - * Index.aSample[] arrays. - * - * If the _sql_stat1 table is not present in the database, SQLITE_ERROR - * is returned. In this case, even if the sql_stat4 table is present, - * no data is read from it. - * - * If the _sql_stat4 table is not present in the database, SQLITE_ERROR - * is returned. However, in this case, data is read from the _sql_stat1 - * table (if it is present) before returning.
- * - * If an OOM error occurs, this function always sets db->mallocFailed. - * This means if the caller does not care about other errors, the return - * code may be ignored. - */ -int -sqlite3AnalysisLoad(sqlite3 * db) +log_est +index_field_tuple_est(struct Index *idx, uint32_t field) { - analysisInfo sInfo; - HashElem *i, *j; - char *zSql; - int rc = SQLITE_OK; - - /* Clear any prior statistics */ - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - pIdx->aiRowLogEst[0] = 0; - sqlite3DeleteIndexSamples(db, pIdx); - pIdx->aSample = 0; - } + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); + if (space == NULL) + return idx->aiRowLogEst[field]; + struct index *tnt_idx = + space_index(space, SQLITE_PAGENO_TO_INDEXID(idx->tnum)); + assert(tnt_idx != NULL); + assert(field <= tnt_idx->def->key_def->part_count); + if (tnt_idx->def->opts.stat == NULL) { + if (field == 0) + return DEFAULT_TUPLE_LOG_COUNT; + if (field == tnt_idx->def->key_def->part_count && + tnt_idx->def->opts.is_unique) + return 0; + return default_tuple_est[field + 1 >= 5 ? 5 : field + 1]; } + return tnt_idx->def->opts.stat->tuple_log_est[field]; +} +int +sql_analysis_load(struct sqlite3 *db) +{ + space_foreach(space_clear_stat, NULL); /* Load new statistics out of the _sql_stat1 table */ + analysisInfo sInfo; sInfo.db = db; - zSql = "SELECT \"tbl\",\"idx\",\"stat\" FROM \"_sql_stat1\""; - rc = sqlite3_exec(db, zSql, analysisLoader, &sInfo, 0); - - /* Set appropriate defaults on all indexes not in the _sql_stat1 table */ - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - if (pIdx->aiRowLogEst[0] == 0) - sqlite3DefaultRowEst(pIdx); - } - } - + const char *load_stat1 = + "SELECT \"tbl\",\"idx\",\"stat\" FROM \"_sql_stat1\""; + int rc = sqlite3_exec(db, load_stat1, analysis_loader, &sInfo, 0); + if (rc != SQLITE_OK) + return rc; + /* + * This query is used to allocate enough memory for + * statistics. Result rows are given in a form: + * <table name>, <index name>, <count of samples> + */ + const char *init_query = "SELECT \"tbl\",\"idx\",count(*) FROM " + "\"_sql_stat4\" GROUP BY \"tbl\",\"idx\""; + /* Query for loading statistics into in-memory structs. */ + const char *load_query = "SELECT \"tbl\",\"idx\",\"neq\",\"nlt\"," + "\"ndlt\",\"sample\" FROM \"_sql_stat4\""; /* Load the statistics from the _sql_stat4 table. 
*/ - if (rc == SQLITE_OK && OptimizationEnabled(db, SQLITE_Stat4)) { - db->lookaside.bDisable++; - rc = loadStat4(db); - db->lookaside.bDisable--; - } - - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - sqlite3_free(pIdx->aiRowEst); - pIdx->aiRowEst = 0; - } - } - - if (rc == SQLITE_NOMEM) { + rc = load_stat_from_space(db, init_query, load_query); + if (rc == SQLITE_NOMEM) sqlite3OomFault(db); - } return rc; } - -#endif /* SQLITE_OMIT_ANALYZE */ diff --git a/src/box/sql/build.c b/src/box/sql/build.c index 5a40308fe..962dc2aa7 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -216,13 +216,9 @@ sqlite3LocateIndex(sqlite3 * db, const char *zName, const char *zTable) static void freeIndex(sqlite3 * db, Index * p) { -#ifndef SQLITE_OMIT_ANALYZE - sqlite3DeleteIndexSamples(db, p); -#endif sql_expr_free(db, p->pPartIdxWhere, false); sqlite3ExprListDelete(db, p->aColExpr); sqlite3DbFree(db, p->zColAff); - sqlite3_free(p->aiRowEst); sqlite3DbFree(db, p); } @@ -2941,9 +2937,6 @@ sqlite3CreateIndex(Parse * pParse, /* All information about this parse */ requestedSortOrder = pListItem->sortOrder & 0; pIndex->aSortOrder[i] = (u8) requestedSortOrder; } - - sqlite3DefaultRowEst(pIndex); - if (pTab == pParse->pNewTable) { /* This routine has been called to create an automatic index as a * result of a PRIMARY KEY or UNIQUE clause on a column definition, or @@ -3121,60 +3114,6 @@ sqlite3CreateIndex(Parse * pParse, /* All information about this parse */ sqlite3DbFree(db, zName); } -/* - * Fill the Index.aiRowEst[] array with default information - information - * to be used when we have not run the ANALYZE command. - * - * aiRowEst[0] is supposed to contain the number of elements in the index. - * Since we do not know, guess 1 million. aiRowEst[1] is an estimate of the - * number of rows in the table that match any particular value of the - * first column of the index. aiRowEst[2] is an estimate of the number - * of rows that match any particular combination of the first 2 columns - * of the index. And so forth. It must always be the case that -* - * aiRowEst[N]<=aiRowEst[N-1] - * aiRowEst[N]>=1 - * - * Apart from that, we have little to go on besides intuition as to - * how aiRowEst[] should be initialized. The numbers generated here - * are based on typical values found in actual indices. - */ -void -sqlite3DefaultRowEst(Index * pIdx) -{ - /* 10, 9, 8, 7, 6 */ - LogEst aVal[] = { 33, 32, 30, 28, 26 }; - LogEst *a = pIdx->aiRowLogEst; - int nCopy = MIN(ArraySize(aVal), pIdx->nColumn); - int i; - - /* Set the first entry (number of rows in the index) to the estimated - * number of rows in the table, or half the number of rows in the table - * for a partial index. But do not let the estimate drop below 10. - */ - a[0] = pIdx->pTable->tuple_log_count; - if (pIdx->pPartIdxWhere != 0) - a[0] -= 10; - assert(10 == sqlite3LogEst(2)); - if (a[0] < 33) - a[0] = 33; - assert(33 == sqlite3LogEst(10)); - - /* Estimate that a[1] is 10, a[2] is 9, a[3] is 8, a[4] is 7, a[5] is - * 6 and each subsequent value (if any) is 5. 
- */ - memcpy(&a[1], aVal, nCopy * sizeof(LogEst)); - int col_count = index_column_count(pIdx); - for (i = nCopy + 1; i <= col_count; i++) { - a[i] = 23; - assert(23 == sqlite3LogEst(5)); - } - - assert(0 == sqlite3LogEst(1)); - if (IsUniqueIndex(pIdx)) - a[pIdx->nColumn] = 0; -} - /** * Return number of columns in given index. * If space is ephemeral, use internal diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y index 8a3b35a37..f6c67158c 100644 --- a/src/box/sql/parse.y +++ b/src/box/sql/parse.y @@ -1495,10 +1495,8 @@ cmd ::= DROP TRIGGER ifexists(NOERR) fullname(X). { /* %endif SQLITE_OMIT_REINDEX */ /////////////////////////////////// ANALYZE /////////////////////////////////// -%ifndef SQLITE_OMIT_ANALYZE cmd ::= ANALYZE. {sqlite3Analyze(pParse, 0);} cmd ::= ANALYZE nm(X). {sqlite3Analyze(pParse, &X);} -%endif //////////////////////// ALTER TABLE table ... //////////////////////////////// %ifndef SQLITE_OMIT_ALTERTABLE diff --git a/src/box/sql/pragma.c b/src/box/sql/pragma.c index 02990e422..c73103f68 100644 --- a/src/box/sql/pragma.c +++ b/src/box/sql/pragma.c @@ -424,8 +424,7 @@ sqlite3Pragma(Parse * pParse, Token * pId, /* First part of [schema.]id field */ sqlite3VdbeMultiLoad(v, 2, "sii", pIdx->zName, avg_tuple_size_idx, - pIdx-> - aiRowLogEst[0]); + index_field_tuple_est(pIdx, 0)); sqlite3VdbeAddOp2(v, OP_ResultRow, 1, 4); } diff --git a/src/box/sql/prepare.c b/src/box/sql/prepare.c index 7a16074dc..a3a56515a 100644 --- a/src/box/sql/prepare.c +++ b/src/box/sql/prepare.c @@ -186,11 +186,8 @@ sqlite3InitDatabase(sqlite3 * db) assert(db->init.busy); { rc = initData.rc; -#ifndef SQLITE_OMIT_ANALYZE - if (rc == SQLITE_OK) { - sqlite3AnalysisLoad(db); - } -#endif + if (rc == SQLITE_OK) + sql_analysis_load(db); } if (db->mallocFailed) { rc = SQLITE_NOMEM_BKPT; diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h index d026a213d..c2ea066d6 100644 --- a/src/box/sql/sqliteInt.h +++ b/src/box/sql/sqliteInt.h @@ -1473,7 +1473,6 @@ typedef struct FuncDef FuncDef; typedef struct FuncDefHash FuncDefHash; typedef struct IdList IdList; typedef struct Index Index; -typedef struct IndexSample IndexSample; typedef struct KeyClass KeyClass; typedef struct KeyInfo KeyInfo; typedef struct Lookaside Lookaside; @@ -1695,7 +1694,6 @@ struct sqlite3 { #define SQLITE_SubqCoroutine 0x0100 /* Evaluate subqueries as coroutines */ #define SQLITE_Transitive 0x0200 /* Transitive constraints */ #define SQLITE_OmitNoopJoin 0x0400 /* Omit unused tables in joins */ -#define SQLITE_Stat4 0x0800 /* Use STAT4 data */ #define SQLITE_CursorHints 0x2000 /* Add OP_CursorHint opcodes */ #define SQLITE_AllOpts 0xffff /* All optimizations */ @@ -2173,14 +2171,41 @@ struct Index { * or _NONE */ unsigned idxType:2; /* 1==UNIQUE, 2==PRIMARY KEY, 0==CREATE INDEX */ - unsigned bUnordered:1; /* Use this index for == or IN queries only */ - unsigned noSkipScan:1; /* Do not try to use skip-scan if true */ - int nSample; /* Number of elements in aSample[] */ - int nSampleCol; /* Size of IndexSample.anEq[] and so on */ - tRowcnt *aAvgEq; /* Average nEq values for keys not in aSample */ - IndexSample *aSample; /* Samples of the left-most key */ - tRowcnt *aiRowEst; /* Non-logarithmic stat1 data for this index */ }; +/** + * default_tuple_est[] array contains default information + * which is used when we don't have real space, e.g. temporary + * objects representing result set of nested SELECT or VIEW. + * + * First number is supposed to contain the number of elements + * in the index. 
Since we do not know, guess 1 million. + * Second one is an estimate of the number of rows in the + * table that match any particular value of the first column of + * the index. Third one is an estimate of the number of + * rows that match any particular combination of the first 2 + * columns of the index. And so on. It must always be true: + * + * default_tuple_est[N] <= default_tuple_est[N-1] + * default_tuple_est[N] >= 1 + * + * Apart from that, we have little to go on besides intuition + * as to how default values should be initialized. The numbers + * generated here are based on typical values found in actual + * indices. + */ +extern const log_est default_tuple_est[]; + +/** + * Fetch statistics concerning tuples to be selected. + * If there is no appropriate Tarantool's index, + * return one of default values. + * + * @param idx Index. + * @param field Number of field to be examined. + * @retval Estimate logarithm of tuples selected by given field. + */ +log_est +index_field_tuple_est(struct Index *idx, uint32_t field); /* * Allowed values for Index.idxType @@ -2201,19 +2226,6 @@ struct Index { */ #define XN_EXPR (-2) /* Indexed column is an expression */ -/* - * Each sample stored in the sql_stat4 table is represented in memory - * using a structure of this type. See documentation at the top of the - * analyze.c source file for additional information. - */ -struct IndexSample { - void *p; /* Pointer to sampled record */ - int n; /* Size of record in bytes */ - tRowcnt *anEq; /* Est. number of rows where the key equals this sample */ - tRowcnt *anLt; /* Est. number of rows where key is less than this sample */ - tRowcnt *anDLt; /* Est. number of distinct keys less than this sample */ -}; - #ifdef DEFAULT_TUPLE_COUNT #undef DEFAULT_TUPLE_COUNT #endif @@ -3946,9 +3958,19 @@ ssize_t sql_index_tuple_size(struct space *space, struct index *idx); int sqlite3InvokeBusyHandler(BusyHandler *); -int sqlite3AnalysisLoad(sqlite3 *); -void sqlite3DeleteIndexSamples(sqlite3 *, Index *); -void sqlite3DefaultRowEst(Index *); + +/** + * Load the content of the _sql_stat1 and sql_stat4 tables. The + * contents of _sql_stat1 are used to populate the tuple_stat1[] + * arrays. The contents of sql_stat4 are used to populate the + * samples[] arrays. + * + * @param db Database handler. + * @retval SQLITE_OK on success, smth else otherwise. + */ +int +sql_analysis_load(struct sqlite3 *db); + uint32_t index_column_count(const Index *); bool diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index aaf912320..c02ce745a 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -4844,7 +4844,7 @@ case OP_RenameTable: { sqlite3DbFree(db, (void*)zSqlStmt); break; } -#if !defined(SQLITE_OMIT_ANALYZE) + /* Opcode: LoadAnalysis P1 * * * * * * Read the sql_stat1 table for database P1 and load the content @@ -4853,11 +4853,10 @@ case OP_RenameTable: { */ case OP_LoadAnalysis: { assert(pOp->p1==0 ); - rc = sqlite3AnalysisLoad(db); + rc = sql_analysis_load(db); if (rc) goto abort_due_to_error; break; } -#endif /* !defined(SQLITE_OMIT_ANALYZE) */ /* Opcode: DropTable P1 * * P4 * * diff --git a/src/box/sql/where.c b/src/box/sql/where.c index e6f9ef431..975e7059f 100644 --- a/src/box/sql/where.c +++ b/src/box/sql/where.c @@ -891,7 +891,16 @@ whereKeyStats(Parse * pParse, /* Database connection */ int roundUp, /* Round up if true. 
Round down if false */ tRowcnt * aStat) /* OUT: stats written here */ { - IndexSample *aSample = pIdx->aSample; + uint32_t space_id = SQLITE_PAGENO_TO_SPACEID(pIdx->tnum); + struct space *space = space_by_id(space_id); + assert(space != NULL); + uint32_t iid = SQLITE_PAGENO_TO_INDEXID(pIdx->tnum); + struct index *idx = space_index(space, iid); + assert(idx != NULL && idx->def->opts.stat != NULL); + struct index_sample *samples = idx->def->opts.stat->samples; + assert(idx->def->opts.stat->sample_count > 0); + assert(idx->def->opts.stat->samples != NULL); + assert(idx->def->opts.stat->sample_field_count >= pRec->nField); int iCol; /* Index of required stats in anEq[] etc. */ int i; /* Index of first sample >= pRec */ int iSample; /* Smallest sample larger than or equal to pRec */ @@ -905,8 +914,7 @@ whereKeyStats(Parse * pParse, /* Database connection */ UNUSED_PARAMETER(pParse); #endif assert(pRec != 0); - assert(pIdx->nSample > 0); - assert(pRec->nField > 0 && pRec->nField <= pIdx->nSampleCol); + assert(pRec->nField > 0); /* Do a binary search to find the first sample greater than or equal * to pRec. If pRec contains a single field, the set of samples to search @@ -954,7 +962,8 @@ whereKeyStats(Parse * pParse, /* Database connection */ */ nField = pRec->nField; iCol = 0; - iSample = pIdx->nSample * nField; + uint32_t sample_count = idx->def->opts.stat->sample_count; + iSample = sample_count * nField; do { int iSamp; /* Index in aSample[] of test sample */ int n; /* Number of fields in test sample */ @@ -967,8 +976,8 @@ whereKeyStats(Parse * pParse, /* Database connection */ * fields that is greater than the previous effective sample. */ for (n = (iTest % nField) + 1; n < nField; n++) { - if (aSample[iSamp - 1].anLt[n - 1] != - aSample[iSamp].anLt[n - 1]) + if (samples[iSamp - 1].lt[n - 1] != + samples[iSamp].lt[n - 1]) break; } } else { @@ -977,15 +986,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ pRec->nField = n; res = - sqlite3VdbeRecordCompareMsgpack(aSample[iSamp].n, - aSample[iSamp].p, pRec); + sqlite3VdbeRecordCompareMsgpack(samples[iSamp].key_size, + samples[iSamp].sample_key, + pRec); if (res < 0) { iLower = - aSample[iSamp].anLt[n - 1] + aSample[iSamp].anEq[n - - 1]; + samples[iSamp].lt[n - 1] + samples[iSamp].eq[n - 1]; iMin = iTest + 1; } else if (res == 0 && n < nField) { - iLower = aSample[iSamp].anLt[n - 1]; + iLower = samples[iSamp].lt[n - 1]; iMin = iTest + 1; res = -1; } else { @@ -1003,12 +1012,12 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (pParse->db->mallocFailed == 0) { if (res == 0) { /* If (res==0) is true, then pRec must be equal to sample i. */ - assert(i < pIdx->nSample); + assert(i < (int) sample_count); assert(iCol == nField - 1); pRec->nField = nField; assert(0 == - sqlite3VdbeRecordCompareMsgpack(aSample[i].n, - aSample[i].p, + sqlite3VdbeRecordCompareMsgpack(samples[i].key_size, + samples[i].sample_key, pRec) || pParse->db->mallocFailed); } else { @@ -1016,12 +1025,12 @@ whereKeyStats(Parse * pParse, /* Database connection */ * all samples in the aSample[] array, pRec must be smaller than the * (iCol+1) field prefix of sample i. 
*/ - assert(i <= pIdx->nSample && i >= 0); + assert(i <= (int) sample_count && i >= 0); pRec->nField = iCol + 1; - assert(i == pIdx->nSample - || sqlite3VdbeRecordCompareMsgpack(aSample[i].n, - aSample[i].p, - pRec) > 0 + assert(i == (int) sample_count || + sqlite3VdbeRecordCompareMsgpack(samples[i].key_size, + samples[i].sample_key, + pRec) > 0 || pParse->db->mallocFailed); /* if i==0 and iCol==0, then record pRec is smaller than all samples @@ -1032,13 +1041,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (iCol > 0) { pRec->nField = iCol; assert(sqlite3VdbeRecordCompareMsgpack - (aSample[i].n, aSample[i].p, pRec) <= 0 + (samples[i].key_size, + samples[i].sample_key, pRec) <= 0 || pParse->db->mallocFailed); } if (i > 0) { pRec->nField = nField; assert(sqlite3VdbeRecordCompareMsgpack - (aSample[i - 1].n, aSample[i - 1].p, + (samples[i - 1].key_size, + samples[i - 1].sample_key, pRec) < 0 || pParse->db->mallocFailed); } } @@ -1048,18 +1059,18 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (res == 0) { /* Record pRec is equal to sample i */ assert(iCol == nField - 1); - aStat[0] = aSample[i].anLt[iCol]; - aStat[1] = aSample[i].anEq[iCol]; + aStat[0] = samples[i].lt[iCol]; + aStat[1] = samples[i].eq[iCol]; } else { /* At this point, the (iCol+1) field prefix of aSample[i] is the first * sample that is greater than pRec. Or, if i==pIdx->nSample then pRec * is larger than all samples in the array. */ tRowcnt iUpper, iGap; - if (i >= pIdx->nSample) { - iUpper = sqlite3LogEstToInt(pIdx->aiRowLogEst[0]); + if (i >= (int) sample_count) { + iUpper = sqlite3LogEstToInt(idx->def->opts.stat->tuple_log_est[0]); } else { - iUpper = aSample[i].anLt[iCol]; + iUpper = samples[i].lt[iCol]; } if (iLower >= iUpper) { @@ -1073,7 +1084,7 @@ whereKeyStats(Parse * pParse, /* Database connection */ iGap = iGap / 3; } aStat[0] = iLower + iGap; - aStat[1] = pIdx->aAvgEq[iCol]; + aStat[1] = idx->def->opts.stat->avg_eq[iCol]; } /* Restore the pRec->nField value before returning. */ @@ -1164,10 +1175,15 @@ whereRangeSkipScanEst(Parse * pParse, /* Parsing & code generating context */ int *pbDone) /* Set to true if at least one expr. value extracted */ { Index *p = pLoop->pIndex; + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(p->tnum)); + assert(space != NULL); + struct index *index = space_index(space, + SQLITE_PAGENO_TO_INDEXID(p->tnum)); + assert(index != NULL && index->def->opts.stat != NULL); int nEq = pLoop->nEq; sqlite3 *db = pParse->db; int nLower = -1; - int nUpper = p->nSample + 1; + int nUpper = index->def->opts.stat->sample_count + 1; int rc = SQLITE_OK; u8 aff = sqlite3IndexColumnAffinity(db, p, nEq); struct coll *pColl; @@ -1185,15 +1201,17 @@ whereRangeSkipScanEst(Parse * pParse, /* Parsing & code generating context */ if (pUpper && rc == SQLITE_OK) { rc = sqlite3Stat4ValueFromExpr(pParse, pUpper->pExpr->pRight, aff, &p2); - nUpper = p2 ? 0 : p->nSample; + nUpper = p2 ? 
0 : index->def->opts.stat->sample_count; } if (p1 || p2) { int i; int nDiff; - for (i = 0; rc == SQLITE_OK && i < p->nSample; i++) { - rc = sqlite3Stat4Column(db, p->aSample[i].p, - p->aSample[i].n, nEq, &pVal); + struct index_sample *samples = index->def->opts.stat->samples; + uint32_t sample_count = index->def->opts.stat->sample_count; + for (i = 0; rc == SQLITE_OK && i < (int) sample_count; i++) { + rc = sqlite3Stat4Column(db, samples[i].sample_key, + samples[i].key_size, nEq, &pVal); if (rc == SQLITE_OK && p1) { int res = sqlite3MemCompare(p1, pVal, pColl); if (res >= 0) @@ -1217,7 +1235,8 @@ whereRangeSkipScanEst(Parse * pParse, /* Parsing & code generating context */ */ if (nDiff != 1 || pUpper == 0 || pLower == 0) { int nAdjust = - (sqlite3LogEst(p->nSample) - sqlite3LogEst(nDiff)); + (sqlite3LogEst(sample_count) - + sqlite3LogEst(nDiff)); pLoop->nOut -= nAdjust; *pbDone = 1; WHERETRACE(0x10, @@ -1288,8 +1307,22 @@ whereRangeScanEst(Parse * pParse, /* Parsing & code generating context */ Index *p = pLoop->pIndex; int nEq = pLoop->nEq; - - if (p->nSample > 0 && nEq < p->nSampleCol) { + uint32_t space_id = SQLITE_PAGENO_TO_SPACEID(p->tnum); + struct space *space = space_by_id(space_id); + assert(space != NULL); + uint32_t iid = SQLITE_PAGENO_TO_INDEXID(p->tnum); + struct index *idx = space_index(space, iid); + assert(idx != NULL); + struct index_stat *stat = idx->def->opts.stat; + /* + * Create surrogate stat in case ANALYZE command hasn't + * been run. Simply fill it with zeros. + */ + struct index_stat surrogate_stat; + memset(&surrogate_stat, 0, sizeof(surrogate_stat)); + if (stat == NULL) + stat = &surrogate_stat; + if (stat->sample_count > 0 && nEq < (int) stat->sample_field_count) { if (nEq == pBuilder->nRecValid) { UnpackedRecord *pRec = pBuilder->pRec; tRowcnt a[2]; @@ -1332,15 +1365,6 @@ whereRangeScanEst(Parse * pParse, /* Parsing & code generating context */ * are in range.
*/ iLower = 0; - uint32_t space_id = - SQLITE_PAGENO_TO_SPACEID(p->tnum); - struct space *space = space_by_id(space_id); - assert(space != NULL); - uint32_t iid = - SQLITE_PAGENO_TO_INDEXID(p->tnum); - struct index *idx = - space_index(space, iid); - assert(idx != NULL); iUpper = idx->vtab->size(idx); } else { /* Note: this call could be optimized away - since the same values must @@ -1506,8 +1530,6 @@ whereEqualScanEst(Parse * pParse, /* Parsing & code generating context */ assert(nEq >= 1); assert(nEq <= (int)index_column_count(p)); - assert(p->aSample != 0); - assert(p->nSample > 0); assert(pBuilder->nRecValid < nEq); /* If values are not available for all fields of the index to the left @@ -1556,14 +1578,13 @@ whereInScanEst(Parse * pParse, /* Parsing & code generating context */ tRowcnt * pnRow) /* Write the revised row estimate here */ { Index *p = pBuilder->pNew->pIndex; - i64 nRow0 = sqlite3LogEstToInt(p->aiRowLogEst[0]); + i64 nRow0 = sqlite3LogEstToInt(index_field_tuple_est(p, 0)); int nRecValid = pBuilder->nRecValid; int rc = SQLITE_OK; /* Subfunction return code */ tRowcnt nEst; /* Number of rows for a single term */ tRowcnt nRowEst = 0; /* New estimate of the number of rows */ int i; /* Loop counter */ - assert(p->aSample != 0); for (i = 0; rc == SQLITE_OK && i < pList->nExpr; i++) { nEst = nRow0; rc = whereEqualScanEst(pParse, pBuilder, pList->a[i].pExpr, @@ -2323,9 +2344,25 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ WO_EQ | WO_IN | WO_GT | WO_GE | WO_LT | WO_LE | WO_ISNULL | WO_IS; } - if (pProbe->bUnordered) + struct space *space = + space_by_id(SQLITE_PAGENO_TO_SPACEID(pProbe->tnum)); + struct index *idx = NULL; + struct index_stat *stat = NULL; + if (space != NULL) { + idx = space_index(space, SQLITE_PAGENO_TO_INDEXID(pProbe->tnum)); + assert(idx != NULL); + stat = idx->def->opts.stat; + } + /* + * Create surrogate stat in case ANALYZE command hasn't + * been run. Simply fill it with zeros.
+ */ + struct index_stat surrogate_stat; + memset(&surrogate_stat, 0, sizeof(surrogate_stat)); + if (stat == NULL) + stat = &surrogate_stat; + if (stat->is_unordered) opMask &= ~(WO_GT | WO_GE | WO_LT | WO_LE); - assert(pNew->nEq < nProbeCol); saved_nEq = pNew->nEq; @@ -2339,7 +2376,7 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ pTerm = whereScanInit(&scan, pBuilder->pWC, pSrc->iCursor, saved_nEq, opMask, pProbe); pNew->rSetup = 0; - rSize = pProbe->aiRowLogEst[0]; + rSize = index_field_tuple_est(pProbe, 0); rLogSize = estLog(rSize); for (; rc == SQLITE_OK && pTerm != 0; pTerm = whereScanNext(&scan)) { u16 eOp = pTerm->eOperator; /* Shorthand for pTerm->eOperator */ @@ -2497,8 +2534,8 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ } else { tRowcnt nOut = 0; if (nInMul == 0 - && pProbe->nSample - && pNew->nEq <= pProbe->nSampleCol + && stat->sample_count + && pNew->nEq <= stat->sample_field_count && ((eOp & WO_IN) == 0 || !ExprHasProperty(pTerm->pExpr, EP_xIsSelect)) @@ -2533,8 +2570,8 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ } if (nOut == 0) { pNew->nOut += - (pProbe->aiRowLogEst[nEq] - - pProbe->aiRowLogEst[nEq - 1]); + (index_field_tuple_est(pProbe, nEq) - + index_field_tuple_est(pProbe, nEq - 1)); if (eOp & WO_ISNULL) { /* TUNING: If there is no likelihood() value, assume that a * "col IS NULL" expression matches twice as many rows @@ -2619,18 +2656,17 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ */ assert(42 == sqlite3LogEst(18)); if (saved_nEq == saved_nSkip && saved_nEq + 1U < nProbeCol && - pProbe->noSkipScan == 0 && + stat->skip_scan_enabled == true && /* TUNING: Minimum for skip-scan */ - pProbe->aiRowLogEst[saved_nEq + 1] >= 42 && + index_field_tuple_est(pProbe, saved_nEq + 1) >= 42 && (rc = whereLoopResize(db, pNew, pNew->nLTerm + 1)) == SQLITE_OK) { LogEst nIter; pNew->nEq++; pNew->nSkip++; pNew->aLTerm[pNew->nLTerm++] = 0; pNew->wsFlags |= WHERE_SKIPSCAN; - nIter = - pProbe->aiRowLogEst[saved_nEq] - - pProbe->aiRowLogEst[saved_nEq + 1]; + nIter = index_field_tuple_est(pProbe, saved_nEq) - + index_field_tuple_est(pProbe, saved_nEq + 1); pNew->nOut -= nIter; /* TUNING: Because uncertainties in the estimates for skip-scan queries, * add a 1.375 fudge factor to make skip-scan slightly less likely. @@ -2648,6 +2684,31 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ return rc; } +/** + * Check if the is_unordered flag is set for a given index. + * Since there may be no corresponding Tarantool index + * (e.g. for temporary objects such as ephemeral tables), + * a few preliminary checks are made. + * + * @param idx Index to be tested. + * @retval True if statistics exist and the unordered flag is set. + */ +static bool +index_is_unordered(struct Index *idx) +{ + assert(idx != NULL); + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); + if (space == NULL) + return false; + uint32_t iid = SQLITE_PAGENO_TO_INDEXID(idx->tnum); + struct index *tnt_idx = space_index(space, iid); + if (tnt_idx == NULL) + return false; + if (tnt_idx->def->opts.stat != NULL) + return tnt_idx->def->opts.stat->is_unordered; + return false; +} + /* * Return True if it is possible that pIndex might be useful in * implementing the ORDER BY clause in pBuilder.
@@ -2664,8 +2725,7 @@ indexMightHelpWithOrderBy(WhereLoopBuilder * pBuilder, ExprList *aColExpr; int ii, jj; int nIdxCol = index_column_count(pIndex); - - if (pIndex->bUnordered) + if (index_is_unordered(pIndex)) return 0; if ((pOB = pBuilder->pWInfo->pOrderBy) == 0) return 0; @@ -2878,7 +2938,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */ testcase(pNew->iTab != pSrc->iCursor); /* See ticket [98d973b8f5] */ continue; /* Partial index inappropriate for this query */ } - rSize = pProbe->aiRowLogEst[0]; + rSize = index_field_tuple_est(pProbe, 0); pNew->nEq = 0; pNew->nBtm = 0; pNew->nTop = 0; @@ -3276,8 +3336,8 @@ wherePathSatisfiesOrderBy(WhereInfo * pWInfo, /* The WHERE clause */ if (pLoop->wsFlags & WHERE_IPK) { pIndex = 0; nColumn = 1; - } else if ((pIndex = pLoop->pIndex) == 0 - || pIndex->bUnordered) { + } else if ((pIndex = pLoop->pIndex) == NULL || + index_is_unordered(pIndex)) { return 0; } else { nColumn = index_column_count(pIndex); diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c index ccdff4684..0bcb529bd 100644 --- a/src/box/sql/whereexpr.c +++ b/src/box/sql/whereexpr.c @@ -1297,9 +1297,7 @@ exprAnalyze(SrcList * pSrc, /* the FROM clause */ */ if (pExpr->op == TK_NOTNULL && pExpr->pLeft->op == TK_COLUMN - && pExpr->pLeft->iColumn >= 0 - && OptimizationEnabled(db, SQLITE_Stat4) - ) { + && pExpr->pLeft->iColumn >= 0) { Expr *pNewExpr; Expr *pLeft = pExpr->pLeft; int idxNew; -- 2.15.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
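The tnum-to-Tarantool lookup plus the zero-filled fallback appears in whereRangeScanEst() and whereLoopAddBtreeIndex() above in nearly identical form; condensed into one helper it would look roughly like this (index_stat_or_surrogate() is a hypothetical name, everything else comes from the diff itself):

static struct index_stat *
index_stat_or_surrogate(struct Index *probe, struct index_stat *surrogate)
{
	/*
	 * All-zero statistics are a safe fallback before ANALYZE
	 * has run: every sample-based branch sees sample_count == 0
	 * and falls through to the default estimates.
	 */
	memset(surrogate, 0, sizeof(*surrogate));
	struct space *space =
		space_by_id(SQLITE_PAGENO_TO_SPACEID(probe->tnum));
	if (space == NULL)
		return surrogate;
	struct index *idx =
		space_index(space, SQLITE_PAGENO_TO_INDEXID(probe->tnum));
	if (idx == NULL || idx->def->opts.stat == NULL)
		return surrogate;
	return idx->def->opts.stat;
}

The caller keeps struct index_stat surrogate on the stack and never frees the result, exactly as the open-coded copies in the diff do.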
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-04-23 20:29 ` [tarantool-patches] [PATCH 4/4] sql: move SQL statistics to server Nikita Pettik @ 2018-04-24 12:51 ` Vladislav Shpilevoy 2018-05-11 17:29 ` n.pettik 0 siblings, 1 reply; 19+ messages in thread From: Vladislav Shpilevoy @ 2018-04-24 12:51 UTC (permalink / raw) To: Nikita Pettik, tarantool-patches Hello. The patch is great, but see 17 comments below %) Most of them are minor except one. The key idea is to use region allocator to build statistics on analyze. See details below. On 23/04/2018 23:29, Nikita Pettik wrote: > SQLite provides kind of statistics concerning data holding in each index > and space. It is used to help query planer build optimized plan. Such > statistics don't depend on internal index implementation (e.g. B-tree or > LSM tree) and contain information concerning tuple distribution. > This patch moves members resbonsible statistics from original SQLite > structs to Tarantool ones. To be more precise, now index's opts contains > statistics from _stat1 space (arrays of integers and their loarithm 1. resbonsible -> responsible, loarithm -> logarithm. > approximation) and more sophisticated one from _stat4 (array of samples). > It also contains set of flags to turn on/off different optimizations > (such as skip-scan). > > After ANALYZE command is executed, statistics are saved into _sql_stat1 > and _sql_stat4 spaces (in order to persist them). However, it should also > be loaded into in-memory structs representing indexes. This patch > reworks original SQLite routine to load it directly to newly introduced > stat struct of index. > > Closes #3253 > --- > src/box/index_def.c | 92 ++++++ > src/box/index_def.h | 90 ++++++ > src/box/sql/analyze.c | 742 +++++++++++++++++++++++------------------------- > src/box/sql/build.c | 61 ---- > src/box/sql/parse.y | 2 - > src/box/sql/pragma.c | 3 +- > src/box/sql/prepare.c | 7 +- > src/box/sql/sqliteInt.h | 72 +++-- > src/box/sql/vdbe.c | 5 +- > src/box/sql/where.c | 190 ++++++++----- > src/box/sql/whereexpr.c | 4 +- > 11 files changed, 716 insertions(+), 552 deletions(-) > > diff --git a/src/box/index_def.c b/src/box/index_def.c > index eda39a622..14ff94192 100644 > --- a/src/box/index_def.c > +++ b/src/box/index_def.c > @@ -154,9 +157,98 @@ index_def_dup(const struct index_def *def) > return NULL; > } > } > + if (def->opts.stat != NULL) { > + dup->opts.stat = malloc(sizeof(*dup->opts.stat)); 2. Lets reduce count of mallocs and allocate all on the single huge malloc. It eliminates index_stat_destroy_samples, and memory fragmentation. See how it is done for space_def. You must define an index_stat_sizeof function, that takes what is needed to calculate summary size, calculates offsets on different parts of the result memory etc. I know, that it is built in load_stat_from_space, but seems like I found a solution using region. See the comments below. > diff --git a/src/box/index_def.h b/src/box/index_def.h > index 6167570b7..52e9a0548 100644 > --- a/src/box/index_def.h > +++ b/src/box/index_def.h > @@ -113,6 +181,27 @@ index_opts_create(struct index_opts *opts) > *opts = index_opts_default; > } > > +/** > + * Release memory allocated for samples, statistics they include, > + * sample keys and avg_eq array. > + */ > +void > +index_stat_destroy_samples(struct index_stat *stat); > + > +/** > + * Destroy index stat. > + */ > +static inline void > +destroy_stat(struct index_stat *stat) { 3. 
It will be removed after review fixes, but why you did not just inlined it in a single usage place? > + if (stat == NULL) > + return; > + free(stat->tuple_stat1); > + free(stat->tuple_log_est); > + index_stat_destroy_samples(stat); > + TRASH(stat); > + free(stat); > +} > + > /** > * Destroy index options > */ > @@ -120,6 +209,7 @@ static inline void > index_opts_destroy(struct index_opts *opts) > { > free(opts->sql); > + destroy_stat(opts->stat); 4. Will be turned into just free(). > diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c > index 4965e899a..14dfbd551 100644 > --- a/src/box/sql/analyze.c > +++ b/src/box/sql/analyze.c > @@ -104,8 +104,8 @@ > * large nEq values. > * > */ > -#ifndef SQLITE_OMIT_ANALYZE 5. The definition still exists in mkkeywordhash.c > @@ -1217,49 +1217,46 @@ struct analysisInfo { > sqlite3 *db; > }; > > -/* > - * The first argument points to a nul-terminated string containing a > - * list of space separated integers. Read the first nOut of these into > - * the array aOut[]. > +/** > + * The first argument points to a nul-terminated string > + * containing a list of space separated integers. Load > + * the first stat_size of these into the output arrays. > + * > + * @param stat_string String containing array of integers. > + * @param stat_size Size of output arrays. > + * @param[out] stat_exact Decoded array of statistics. > + * @param[out] stat_log Decoded array of stat logariphms. > + * @param[out] index Handle extra flags for this index, > + * if not NULL. > */ > static void > -decodeIntArray(char *zIntArray, /* String containing int array to decode */ > - int nOut, /* Number of slots in aOut[] */ > - tRowcnt * aOut, /* Store integers here */ > - LogEst * aLog, /* Or, if aOut==0, here */ > - Index * pIndex /* Handle extra flags for this index, if not NULL */ > - ) > -{ > - char *z = zIntArray; > - int c; > - int i; > - tRowcnt v; > - > - if (z == 0) > +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, > + LogEst *stat_log, struct index *index) { 6. Please, create a ticket to store stat_string in a stat table as map or array of stat values instead of string. It allows to extremely simplify all of this pure absolute shit using raw MessagePack array of integers iteration. > + char *z = (char *) stat_string; > + if (z == NULL) > z = ""; > - > - for (i = 0; *z && i < nOut; i++) { > - v = 0; > + for (int i = 0; *z && i < stat_size; i++) { > + tRowcnt v = 0; > + int c; > while ((c = z[0]) >= '0' && c <= '9') { > v = v * 10 + c - '0'; > z++; > } 7. sscanf? > - if (aOut) > - aOut[i] = v; > - if (aLog) > - aLog[i] = sqlite3LogEst(v); > + if (stat_exact != NULL) > + stat_exact[i] = v; > + if (stat_log != NULL) > + stat_log[i] = sqlite3LogEst(v); > if (*z == ' ') > z++; > } > - > - if (pIndex) { > - pIndex->bUnordered = 0; > - pIndex->noSkipScan = 0; > + if (index != NULL) { 8. As I can see, you use this function 3 times with index == NULL and 1 time with not NULL. Lets just inline these things where the index is not NULL, and remove the index argument. > @@ -1269,369 +1266,374 @@ decodeIntArray(char *zIntArray, /* String containing int array to decode */ > } > } > > -/* > +/** > * This callback is invoked once for each index when reading the > * _sql_stat1 table. 
> * > * argv[0] = name of the table > * argv[1] = name of the index (might be NULL) > - * argv[2] = results of analysis - on integer for each column > + * argv[2] = results of analysis - array of integers > + * > + * Entries for which argv[1]==NULL simply record the number > + * of rows in the table. This routine also allocates memory > + * for stat struct itself and statistics which are not related > + * to stat4 samples. > * > - * Entries for which argv[1]==NULL simply record the number of rows in > - * the table. > + * @param data Additional analysis info. > + * @param argc Number of arguments. > + * @param argv Callback arguments. > + * @retval 0 on success, -1 otherwise. > */ > static int > -analysisLoader(void *pData, int argc, char **argv, char **NotUsed) > +analysis_loader(void *data, int argc, char **argv, char **unused) > { > - analysisInfo *pInfo = (analysisInfo *) pData; > - Index *pIndex; > - Table *pTable; > - const char *z; > - > assert(argc == 3); > - UNUSED_PARAMETER2(NotUsed, argc); > - > - if (argv == 0 || argv[0] == 0 || argv[2] == 0) { > + UNUSED_PARAMETER(data); > + UNUSED_PARAMETER2(unused, argc); > + if (argv == 0 || argv[0] == 0 || argv[2] == 0) > return 0; > - } > - pTable = sqlite3HashFind(&pInfo->db->pSchema->tblHash, argv[0]); > - if (pTable == NULL) > + uint32_t space_id = box_space_id_by_name(argv[0], strlen(argv[0])); > + assert(space_id != BOX_ID_NIL); > + struct space *space = space_by_id(space_id); > + assert(space != NULL); > + struct index *index; > + if (space == NULL) > return 0; > - if (argv[1] == 0) { > - pIndex = 0; > + if (argv[1] == NULL) { > + index = NULL; 9. Lets return 0 here and remove 'if index != NULL' below. The code will look more compact. > } else if (sqlite3_stricmp(argv[0], argv[1]) == 0) { > - pIndex = sqlite3PrimaryKeyIndex(pTable); > + index = space_index(space, 0); > + assert(index != NULL); > } else { > - pIndex = sqlite3HashFind(&pTable->idxHash, argv[1]); > + uint32_t iid = box_index_id_by_name(space_id, argv[1], > + strlen(argv[1])); > + assert(iid != BOX_ID_NIL); > + index = space_index(space, iid); > } > - z = argv[2]; > - > - if (pIndex) { > - tRowcnt *aiRowEst = 0; > - int nCol = index_column_count(pIndex) + 1; > - /* Index.aiRowEst may already be set here if there are duplicate > - * _sql_stat1 entries for this index. In that case just clobber > - * the old data with the new instead of allocating a new array. > + if (index != NULL) { > + /* > + * Additional field is used to describe total > + * count of tuples in index. Although now all > + * indexes feature the same number of tuples, > + * partial indexes are going to be implemented > + * someday. > */ > - if (pIndex->aiRowEst == 0) { > - pIndex->aiRowEst = > - (tRowcnt *) sqlite3MallocZero(sizeof(tRowcnt) * > - nCol); > - if (pIndex->aiRowEst == 0) > - sqlite3OomFault(pInfo->db); > + int column_count = index->def->key_def->part_count + 1; > + /* > + * Stat arrays may already be set here if there > + * are duplicate _sql_stat1 entries for this > + * index. In that case just clobber the old data > + * with the new instead of allocating a new array. 
> + */ > + struct index_stat *stat = index->def->opts.stat; > + if (stat == NULL) { > + stat = calloc(1, sizeof(*stat)); > + if (stat == NULL) { > + diag_set(OutOfMemory, sizeof(*stat), "calloc", > + "stat"); > + return -1; > + } > + index->def->opts.stat = stat; > } > - aiRowEst = pIndex->aiRowEst; > - pIndex->bUnordered = 0; > - decodeIntArray((char *)z, nCol, aiRowEst, pIndex->aiRowLogEst, > - pIndex); > - } > - > - return 0; > -} > - > -/* > - * If the Index.aSample variable is not NULL, delete the aSample[] array > - * and its contents. > - */ > -void > -sqlite3DeleteIndexSamples(sqlite3 * db, Index * pIdx) > -{ > - if (pIdx->aSample) { > - int j; > - for (j = 0; j < pIdx->nSample; j++) { > - IndexSample *p = &pIdx->aSample[j]; > - sqlite3DbFree(db, p->p); > + if (stat->tuple_stat1 == NULL) { 10. Can it be NULL, if index->def->opts.stat was not NULL above? And as I can see, if analysis loader fails not on the first call, then the statistics is updated partially. I propose to make it atomic. For instance, you can create for each index a new index_stat object, fill it with new stat, and on success replace the old one. See below how to do it simple. > -/* > +/** > * Load the content from the _sql_stat4 table > - * into the relevant Index.aSample[] arrays. > + * into the relevant index->stat->samples[] arrays. > + * > + * Arguments must point to SQL statements that return > + * data equivalent to the following: > * > - * Arguments zSql1 and zSql2 must point to SQL statements that return > - * data equivalent to the following (statements are different for stat4, > - * see the caller of this function for details): > + * prepare: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx; > + * load: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4; > * > - * zSql1: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx > - * zSql2: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4 > + * 'prepare' statement is used to allocate enough memory for > + * statistics (i.e. arrays lt, dt, dlt and avg_eq). 'load' query > + * is needed for > * > + * @param db Database handler. > + * @param sql_select_prepare SELECT statement, see above. > + * @param sql_select_load SELECT statement, see above. > + * @retval SQLITE_OK on success, smth else otherwise. > */ > static int > -loadStatTbl(sqlite3 * db, /* Database handle */ > - Table * pStatTab, /* Stat table to load to */ > - const char *zSql1, /* SQL statement 1 (see above) */ > - const char *zSql2 /* SQL statement 2 (see above) */ > - ) > +load_stat_from_space(struct sqlite3 *db, const char *sql_select_prepare, > + const char *sql_select_load) 11. As I can see, it is the core function of the patchset. And now it has two major problems: * partial statistics update on OOM or a non-first prepare() fail - new samples are allocated, but not filled, for example, due to malloc() error on the next index processing; * leaks on opts.stat->samples = malloc(alloc_size); - you do not delete the old memory. These and analysis_loader problems can be solved using region. I propose to use in this function and analysis_loader region allocations only. The array of new index_stat structures is created on a region in sql_analysis_load. Analysis loader takes this array via void *data parameter, and fills index_stats with more region allocations for tuple_stat1, tuple_log_est etc. Then load_stat_from_space takes the same array and process its index_stat instead of original stats in structs index. 
At the end of load_stat_from_space you are sure, that all statistics for all indexes are completely full, and replace original index stats in structs index using index_stat_dup and its super malloc. Region can be truncated right before return from sql_analysis_load. > @@ -1646,84 +1648,52 @@ sql_space_tuple_log_count(struct Table *tab) > return sqlite3LogEst(pk->vtab->size(pk)); > } > > -/* > - * Load the content of the _sql_stat1 and sql_stat4 tables. The > - * contents of _sql_stat1 are used to populate the Index.aiRowEst[] > -*\* arrays. The contents of sql_stat4 are used to populate the > - * Index.aSample[] arrays. > - * > - * If the _sql_stat1 table is not present in the database, SQLITE_ERROR > - * is returned. In this case, even if the sql_stat4 table is present, > - * no data is read from it. > - * > - * If the _sql_stat4 table is not present in the database, SQLITE_ERROR > - * is returned. However, in this case, data is read from the _sql_stat1 > - * table (if it is present) before returning. > - * > - * If an OOM error occurs, this function always sets db->mallocFailed. > - * This means if the caller does not care about other errors, the return > - * code may be ignored. > - */ > -int > -sqlite3AnalysisLoad(sqlite3 * db) > +log_est 12. Lets name this type log_est_t. > +index_field_tuple_est(struct Index *idx, uint32_t field) > { > - analysisInfo sInfo; > - HashElem *i, *j; > - char *zSql; > - int rc = SQLITE_OK; > - > - /* Clear any prior statistics */ > - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; > - j = sqliteHashNext(j)) { > - Table *pTab = sqliteHashData(j); > - for (i = sqliteHashFirst(&pTab->idxHash); i; > - i = sqliteHashNext(i)) { > - Index *pIdx = sqliteHashData(i); > - pIdx->aiRowLogEst[0] = 0; > - sqlite3DeleteIndexSamples(db, pIdx); > - pIdx->aSample = 0; > - } > + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); > + if (space == NULL) > + return idx->aiRowLogEst[field]; > + struct index *tnt_idx = > + space_index(space, SQLITE_PAGENO_TO_INDEXID(idx->tnum)); > + assert(tnt_idx != NULL); > + assert(field <= tnt_idx->def->key_def->part_count); > + if (tnt_idx->def->opts.stat == NULL) { > + if (field == 0) 13. Why? Field 0 is an ordinary field number. And why do you return 0, if a field is equal to part_count and the index is unique? > diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h > index d026a213d..c2ea066d6 100644 > --- a/src/box/sql/sqliteInt.h > +++ b/src/box/sql/sqliteInt.h > @@ -1473,7 +1473,6 @@ typedef struct FuncDef FuncDef; > typedef struct FuncDefHash FuncDefHash; > typedef struct IdList IdList; > typedef struct Index Index; > -typedef struct IndexSample IndexSample; 14. I still see IndexSample mention in analyze.c on 553 line. > @@ -2173,14 +2171,41 @@ struct Index { > * or _NONE > */ > unsigned idxType:2; /* 1==UNIQUE, 2==PRIMARY KEY, 0==CREATE INDEX */ > - unsigned bUnordered:1; /* Use this index for == or IN queries only */ > - unsigned noSkipScan:1; /* Do not try to use skip-scan if true */ > - int nSample; /* Number of elements in aSample[] */ > - int nSampleCol; /* Size of IndexSample.anEq[] and so on */ > - tRowcnt *aAvgEq; /* Average nEq values for keys not in aSample */ > - IndexSample *aSample; /* Samples of the left-most key */ > - tRowcnt *aiRowEst; /* Non-logarithmic stat1 data for this index */ > }; > +/** > + * default_tuple_est[] array contains default information > + * which is used when we don't have real space, e.g. temporary > + * objects representing result set of nested SELECT or VIEW. 
> + * > + * First number is supposed to contain the number of elements > + * in the index. Since we do not know, guess 1 million. > + * Second one is an estimate of the number of rows in the > + * table that match any particular value of the first column of > + * the index. Third one is an estimate of the number of > + * rows that match any particular combination of the first 2 > + * columns of the index. And so on. It must always be true: > + * > + * default_tuple_est[N] <= default_tuple_est[N-1] > + * default_tuple_est[N] >= 1 > + * > + * Apart from that, we have little to go on besides intuition > + * as to how default values should be initialized. The numbers > + * generated here are based on typical values found in actual > + * indices. > + */ > +extern const log_est default_tuple_est[]; 15. Why is it extern? Looks like it is used in analyze.c only, and can be static within the file. > + > +/** > + * Fetch statistics concerning tuples to be selected. > + * If there is no appropriate Tarantool's index, > + * return one of default values. > + * > + * @param idx Index. > + * @param field Number of field to be examined. > + * @retval Estimate logarithm of tuples selected by given field. 16. All tuples? Or unique by this field? > diff --git a/src/box/sql/where.c b/src/box/sql/where.c > index e6f9ef431..975e7059f 100644 > --- a/src/box/sql/where.c > +++ b/src/box/sql/where.c > @@ -977,15 +986,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ > > pRec->nField = n; > res = > - sqlite3VdbeRecordCompareMsgpack(aSample[iSamp].n, > - aSample[iSamp].p, pRec); > + sqlite3VdbeRecordCompareMsgpack(samples[iSamp].key_size, > + samples[iSamp].sample_key, 17. Lets make sample_key has char * type instead of void *. It is common practice for raw MessagePack in Tarantool. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-04-24 12:51 ` [tarantool-patches] " Vladislav Shpilevoy @ 2018-05-11 17:29 ` n.pettik 2018-05-11 22:00 ` Vladislav Shpilevoy 0 siblings, 1 reply; 19+ messages in thread From: n.pettik @ 2018-05-11 17:29 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy I am attaching the whole patch since it has changed significantly with the region allocations. I deliberately didn't answer the comments concerning memory allocation policy one by one, but took them into consideration while reworking the patch. I hope I didn't miss anything. Overall, thanks for the ideas and suggestions! > On 24 Apr 2018, at 15:51, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hello. The patch is great, but see 17 comments below %) > > Most of them are minor except one. The key idea is to > use region allocator to build statistics on analyze. See > details below. > > On 23/04/2018 23:29, Nikita Pettik wrote: >> SQLite provides kind of statistics concerning data holding in each index >> and space. It is used to help query planer build optimized plan. Such >> statistics don't depend on internal index implementation (e.g. B-tree or >> LSM tree) and contain information concerning tuple distribution. >> This patch moves members resbonsible statistics from original SQLite >> structs to Tarantool ones. To be more precise, now index's opts contains >> statistics from _stat1 space (arrays of integers and their loarithm > > 1. resbonsible -> responsible, loarithm -> logarithm. Typos, sorry. Fixed. > 2. Lets reduce count of mallocs and allocate all on the single > huge malloc. It eliminates index_stat_destroy_samples, and memory > fragmentation. > > See how it is done for space_def. You must define an index_stat_sizeof > function, that takes what is needed to calculate summary size, > calculates offsets on different parts of the result memory etc. > > I know, that it is built in load_stat_from_space, but seems like I > found a solution using region. See the comments below. Done. See functions index_stat_dup(), index_stat_sizeof() and stat_copy() in analyze.c in the patch below. Unfortunately, it is impossible to make the dup during analyze, since stats on the region aren't placed in one chunk, so I had to write two versions of dup (in fact it is possible to place them in one chunk on the region, but that requires a lot of overhead). >> diff --git a/src/box/index_def.h b/src/box/index_def.h >> index 6167570b7..52e9a0548 100644 >> --- a/src/box/index_def.h >> +++ b/src/box/index_def.h >> @@ -113,6 +181,27 @@ index_opts_create(struct index_opts *opts) >> *opts = index_opts_default; >> } >> +/** >> + * Release memory allocated for samples, statistics they include, >> + * sample keys and avg_eq array. >> + */ >> +void >> +index_stat_destroy_samples(struct index_stat *stat); >> + >> +/** >> + * Destroy index stat. >> + */ >> +static inline void >> +destroy_stat(struct index_stat *stat) { > > 3. It will be removed after review fixes, but why you did not > just inlined it in a single usage place? Nevermind. >> + if (stat == NULL) >> + return; >> + free(stat->tuple_stat1); >> + free(stat->tuple_log_est); >> + index_stat_destroy_samples(stat); >> + TRASH(stat); >> + free(stat); >> +} >> + >> /** >> * Destroy index options >> */ >> @@ -120,6 +209,7 @@ static inline void >> index_opts_destroy(struct index_opts *opts) >> { >> free(opts->sql); >> + destroy_stat(opts->stat); > > 4. Will be turned into just free(). Done: now there is only a free() call.
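The control flow that the region idea boils down to, spelled out once (a sketch under assumptions: load_stats_on_region() stands in for the analysis_loader/load_stat_from_space pair, stat_copy() for the region-to-malloc deep copy mentioned above, and error diagnostics are omitted):

int
sql_analysis_load_sketch(struct index **indexes, uint32_t count)
{
	struct region *region = &fiber()->gc;
	size_t svp = region_used(region);
	int rc = -1;
	/* 1. Build complete statistics on the region only. */
	struct index_stat *stats =
		region_alloc(region, count * sizeof(*stats));
	if (stats == NULL || load_stats_on_region(stats, count) != 0)
		goto cleanup;
	/*
	 * 2. Nothing failed, so only now clobber the old stats:
	 * a mid-load error can never leave an index half-updated.
	 */
	for (uint32_t i = 0; i < count; ++i) {
		struct index_stat *copy = stat_copy(&stats[i]);
		if (copy == NULL)
			goto cleanup;
		free(indexes[i]->def->opts.stat);
		indexes[i]->def->opts.stat = copy;
	}
	rc = 0;
cleanup:
	region_truncate(region, svp);
	return rc;
}

Even if stat_copy() fails halfway through the loop, every index that was already updated holds a complete and valid stat, which is the atomicity the review asks for.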
> >> diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c >> index 4965e899a..14dfbd551 100644 >> --- a/src/box/sql/analyze.c >> +++ b/src/box/sql/analyze.c >> @@ -104,8 +104,8 @@ >> * large nEq values. >> * >> */ >> -#ifndef SQLITE_OMIT_ANALYZE > > 5. The definition still exists in mkkeywordhash.c Fixed: +++ b/extra/mkkeywordhash.c @@ -59,11 +59,6 @@ struct Keyword { # define ALTER 0x00000001 #endif #define ALWAYS 0x00000002 -#ifdef SQLITE_OMIT_ANALYZE -# define ANALYZE 0 -#else -# define ANALYZE 0x00000004 -#endif #ifdef SQLITE_OMIT_AUTOINCREMENT # define AUTOINCR 0 #else @@ -127,7 +122,7 @@ static Keyword aKeywordTable[] = { { "AFTER", "TK_AFTER", TRIGGER, false }, { "ALL", "TK_ALL", ALWAYS, true }, { "ALTER", "TK_ALTER", ALTER, true }, - { "ANALYZE", "TK_ANALYZE", ANALYZE, true }, + { "ANALYZE", "TK_ANALYZE", ALWAYS, true }, >> +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, >> + LogEst *stat_log, struct index *index) { > > 6. Please, create a ticket to store stat_string in a stat table as map or array of stat > values instead of string. It allows to extremely simplify all of this pure absolute shit > using raw MessagePack array of integers iteration. Done: https://github.com/tarantool/tarantool/issues/3372 > >> + char *z = (char *) stat_string; >> + if (z == NULL) >> z = ""; >> - >> - for (i = 0; *z && i < nOut; i++) { >> - v = 0; >> + for (int i = 0; *z && i < stat_size; i++) { >> + tRowcnt v = 0; >> + int c; >> while ((c = z[0]) >= '0' && c <= '9') { >> v = v * 10 + c - '0'; >> z++; >> } > > 7. sscanf? Does it really matter? It simply works, so I don’t see reason to change it. Anyway, it is going to disappear after #3372. > >> - if (aOut) >> - aOut[i] = v; >> - if (aLog) >> - aLog[i] = sqlite3LogEst(v); >> + if (stat_exact != NULL) >> + stat_exact[i] = v; >> + if (stat_log != NULL) >> + stat_log[i] = sqlite3LogEst(v); >> if (*z == ' ') >> z++; >> } >> - >> - if (pIndex) { >> - pIndex->bUnordered = 0; >> - pIndex->noSkipScan = 0; >> + if (index != NULL) { > > 8. As I can see, you use this function 3 times with index == NULL and 1 > time with not NULL. Lets just inline these things where the index is not NULL, > and remove the index argument. Done. > 9. Lets return 0 here and remove 'if index != NULL' below. The code will > look more compact. Done. > >> - */ >> -void >> -sqlite3DeleteIndexSamples(sqlite3 * db, Index * pIdx) >> -{ >> - if (pIdx->aSample) { >> - int j; >> - for (j = 0; j < pIdx->nSample; j++) { >> - IndexSample *p = &pIdx->aSample[j]; >> - sqlite3DbFree(db, p->p); >> + if (stat->tuple_stat1 == NULL) { > > 10. Can it be NULL, if index->def->opts.stat was not NULL above? And as I can see, > if analysis loader fails not on the first call, then the statistics is updated > partially. I propose to make it atomic. For instance, you can create for each index > a new index_stat object, fill it with new stat, and on success replace the old one. > See below how to do it simple. > >> -/* >> +/** >> * Load the content from the _sql_stat4 table >> - * into the relevant Index.aSample[] arrays. >> + * into the relevant index->stat->samples[] arrays. 
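For reference, the _sql_stat1 text that decode_stat_string() consumes looks like "10000 100 10 unordered": integers first, optional keywords after. A hypothetical driver using the final four-argument signature:

/* Stat string for a two-column index over roughly 10000 tuples. */
const char *stat1 = "10000 100 10 unordered";
tRowcnt exact[3];
LogEst log_est[3];
decode_stat_string(stat1, 3, exact, log_est);
/*
 * exact == {10000, 100, 10}: total tuple count, tuples per
 * duplicate of the first field, tuples per duplicate of the
 * first two fields; log_est holds sqlite3LogEst() of each.
 * The "unordered"/"noskipscan" tail is not consumed here; the
 * loader matches it afterwards with sqlite3_strglob().
 */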
>> + * >> + * Arguments must point to SQL statements that return >> + * data equivalent to the following: >> * >> - * Arguments zSql1 and zSql2 must point to SQL statements that return >> - * data equivalent to the following (statements are different for stat4, >> - * see the caller of this function for details): >> + * prepare: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx; >> + * load: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4; >> * >> - * zSql1: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx >> - * zSql2: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4 >> + * 'prepare' statement is used to allocate enough memory for >> + * statistics (i.e. arrays lt, dt, dlt and avg_eq). 'load' query >> + * is needed for >> * >> + * @param db Database handler. >> + * @param sql_select_prepare SELECT statement, see above. >> + * @param sql_select_load SELECT statement, see above. >> + * @retval SQLITE_OK on success, smth else otherwise. >> */ >> static int >> -loadStatTbl(sqlite3 * db, /* Database handle */ >> - Table * pStatTab, /* Stat table to load to */ >> - const char *zSql1, /* SQL statement 1 (see above) */ >> - const char *zSql2 /* SQL statement 2 (see above) */ >> - ) >> +load_stat_from_space(struct sqlite3 *db, const char *sql_select_prepare, >> + const char *sql_select_load) > > 11. As I can see, it is the core function of the patchset. And now it has > two major problems: > * partial statistics update on OOM or a non-first prepare() fail - new samples > are allocated, but not filled, for example, due to malloc() error on the next > index processing; > * leaks on opts.stat->samples = malloc(alloc_size); - you do not delete > the old memory. > > These and analysis_loader problems can be solved using region. I propose > to use in this function and analysis_loader region allocations only. > > The array of new index_stat structures is created on a region in > sql_analysis_load. Analysis loader takes this array via > void *data parameter, and fills index_stats with more region allocations > for tuple_stat1, tuple_log_est etc. > > Then load_stat_from_space takes the same array and process its index_stat > instead of original stats in structs index. > > At the end of load_stat_from_space you are sure, that all statistics for > all indexes are completely full, and replace original index stats in > structs index using index_stat_dup and its super malloc. > > Region can be truncated right before return from sql_analysis_load. > >> @@ -1646,84 +1648,52 @@ sql_space_tuple_log_count(struct Table *tab) >> return sqlite3LogEst(pk->vtab->size(pk)); >> } >> -/* >> - * Load the content of the _sql_stat1 and sql_stat4 tables. The >> - * contents of _sql_stat1 are used to populate the Index.aiRowEst[] >> -*\* arrays. The contents of sql_stat4 are used to populate the >> - * Index.aSample[] arrays. >> - * >> - * If the _sql_stat1 table is not present in the database, SQLITE_ERROR >> - * is returned. In this case, even if the sql_stat4 table is present, >> - * no data is read from it. >> - * >> - * If the _sql_stat4 table is not present in the database, SQLITE_ERROR >> - * is returned. However, in this case, data is read from the _sql_stat1 >> - * table (if it is present) before returning. >> - * >> - * If an OOM error occurs, this function always sets db->mallocFailed. >> - * This means if the caller does not care about other errors, the return >> - * code may be ignored. >> - */ >> -int >> -sqlite3AnalysisLoad(sqlite3 * db) >> +log_est > > 12. Lets name this type log_est_t. Ok. 
> >> +index_field_tuple_est(struct Index *idx, uint32_t field) >> { >> - analysisInfo sInfo; >> - HashElem *i, *j; >> - char *zSql; >> - int rc = SQLITE_OK; >> - >> - /* Clear any prior statistics */ >> - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; >> - j = sqliteHashNext(j)) { >> - Table *pTab = sqliteHashData(j); >> - for (i = sqliteHashFirst(&pTab->idxHash); i; >> - i = sqliteHashNext(i)) { >> - Index *pIdx = sqliteHashData(i); >> - pIdx->aiRowLogEst[0] = 0; >> - sqlite3DeleteIndexSamples(db, pIdx); >> - pIdx->aSample = 0; >> - } >> + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); >> + if (space == NULL) >> + return idx->aiRowLogEst[field]; >> + struct index *tnt_idx = >> + space_index(space, SQLITE_PAGENO_TO_INDEXID(idx->tnum)); >> + assert(tnt_idx != NULL); >> + assert(field <= tnt_idx->def->key_def->part_count); >> + if (tnt_idx->def->opts.stat == NULL) { >> + if (field == 0) > > 13. Why? Field 0 is an ordinary field number. Not quite. It is assumed that firstly comes total amount of rows in table, and only then standard statistics. Well, I just add DEFAULT_TUPLE_LOG_COUNT as first member at stat array: -/** [10*log_{2}(x)]:10, 9, 8, 7, 6, 5 */ -const log_est default_tuple_est[] = {33, 32, 30, 28, 26, 23}; +const log_est_t default_tuple_est[] = {DEFAULT_TUPLE_LOG_COUNT, +/** [10*log_{2}(x)]: 10, 9, 8, 7, 6, 5 */ + 33, 32, 30, 28, 26, 23}; > And why do you return 0, if > a field is equal to part_count and the index is unique? Sorry, I should have described it somewhere in comments. Added comment: +log_est_t index_field_tuple_est(struct Index *idx, uint32_t field) { struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); @@ -1663,12 +1664,15 @@ index_field_tuple_est(struct Index *idx, uint32_t field) assert(tnt_idx != NULL); assert(field <= tnt_idx->def->key_def->part_count); if (tnt_idx->def->opts.stat == NULL) { - if (field == 0) - return DEFAULT_TUPLE_LOG_COUNT; + /* + * Last number for unique index is always 0: + * only one tuple exists with given full key + * in unique index and log(1) == 0. + */ if (field == tnt_idx->def->key_def->part_count && tnt_idx->def->opts.is_unique) return 0; - return default_tuple_est[field + 1 >= 5 ? 5 : field + 1]; + return default_tuple_est[field + 1 >= 6 ? 6 : field]; } > >> diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h >> index d026a213d..c2ea066d6 100644 >> --- a/src/box/sql/sqliteInt.h >> +++ b/src/box/sql/sqliteInt.h >> @@ -1473,7 +1473,6 @@ typedef struct FuncDef FuncDef; >> typedef struct FuncDefHash FuncDefHash; >> typedef struct IdList IdList; >> typedef struct Index Index; >> -typedef struct IndexSample IndexSample; > > 14. I still see IndexSample mention in analyze.c on 553 line. It is slight different sample, idk why IndexSample mentioned there. Fixed comment: @@ -550,7 +550,7 @@ samplePushPrevious(Stat4Accum * p, int iChng) { int i; /* Check if any samples from the aBest[] array should be pushed - * into IndexSample.a[] at this point. + * into samples array at this point. 
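Plugging numbers into the fixed hunk makes the defaults concrete (a worked example; recall that the unit is 10*log2(x), and DEFAULT_TUPLE_LOG_COUNT == 200 as noted further down the thread):

/* default_tuple_est = {DEFAULT_TUPLE_LOG_COUNT, 33, 32, 30, 28, 26, 23} */
/* field == 0: whole index -> est 200, i.e. 2^20, the 1M-tuple guess.   */
/* field == 1: est 33 -> ~10 tuples share any one first-field value.    */
/* field == 4: est 28 -> ~7 tuples share a four-field prefix.           */
/* field >= 5: clamped to the last entry, 23 -> ~5 tuples, the floor.   */
/* Unique index and field == part_count: log(1) == 0, a single tuple.   */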
>> @@ -2173,14 +2171,41 @@ struct Index { >> * or _NONE >> */ >> unsigned idxType:2; /* 1==UNIQUE, 2==PRIMARY KEY, 0==CREATE INDEX */ >> - unsigned bUnordered:1; /* Use this index for == or IN queries only */ >> - unsigned noSkipScan:1; /* Do not try to use skip-scan if true */ >> - int nSample; /* Number of elements in aSample[] */ >> - int nSampleCol; /* Size of IndexSample.anEq[] and so on */ >> - tRowcnt *aAvgEq; /* Average nEq values for keys not in aSample */ >> - IndexSample *aSample; /* Samples of the left-most key */ >> - tRowcnt *aiRowEst; /* Non-logarithmic stat1 data for this index */ >> }; >> +/** >> + * default_tuple_est[] array contains default information >> + * which is used when we don't have real space, e.g. temporary >> + * objects representing result set of nested SELECT or VIEW. >> + * >> + * First number is supposed to contain the number of elements >> + * in the index. Since we do not know, guess 1 million. >> + * Second one is an estimate of the number of rows in the >> + * table that match any particular value of the first column of >> + * the index. Third one is an estimate of the number of >> + * rows that match any particular combination of the first 2 >> + * columns of the index. And so on. It must always be true: >> + * >> + * default_tuple_est[N] <= default_tuple_est[N-1] >> + * default_tuple_est[N] >= 1 >> + * >> + * Apart from that, we have little to go on besides intuition >> + * as to how default values should be initialized. The numbers >> + * generated here are based on typical values found in actual >> + * indices. >> + */ >> +extern const log_est default_tuple_est[]; > > 15. Why is it extern? Looks like it is used in analyze.c only, and > can be static within the file. Currently, yes - it is used only within analyse.c, but I thought it might be useful somewhere else..Well, removed extern modifier. >> + >> +/** >> + * Fetch statistics concerning tuples to be selected. >> + * If there is no appropriate Tarantool's index, >> + * return one of default values. >> + * >> + * @param idx Index. >> + * @param field Number of field to be examined. >> + * @retval Estimate logarithm of tuples selected by given field. > > 16. All tuples? Or unique by this field? It is number of tuples, which has the same first n fields. Also, it is the reason why for unique indexes this number is 0 when it comes for n == part_count. Clarified this point in comment: - * Fetch statistics concerning tuples to be selected. + * Fetch statistics concerning tuples to be selected: + * logarithm of number of tuples which has the same key as for + * the first key parts specified by second argument. + * Simple example (without logarithms): + * idx = {{1,2}, {1,2}, {2, 4}, {2, 3}, {3, 5}} + * stat[1] = (2 + 2 + 1) / 3 = 2 - two keys which start from 1, + * two - from 2, and one from 3. + * stat[2] = (2 + 1 + 1 + 1) / 4 = 1 - only one coincidence of + * keys: {1, 2}; the rest are different. + * Notice, that stat[0] is an average number of tuples in a whole + * index. By default it is DEFAULT_TUPLE_LOG_COUNT == 200. >> diff --git a/src/box/sql/where.c b/src/box/sql/where.c >> index e6f9ef431..975e7059f 100644 >> --- a/src/box/sql/where.c >> +++ b/src/box/sql/where.c >> @@ -977,15 +986,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ >> pRec->nField = n; >> res = >> - sqlite3VdbeRecordCompareMsgpack(aSample[iSamp].n, >> - aSample[iSamp].p, pRec); >> + sqlite3VdbeRecordCompareMsgpack(samples[iSamp].key_size, >> + samples[iSamp].sample_key, > > 17. 
Lets make sample_key has char * type instead of void *. It is common > practice for raw MessagePack in Tarantool. Ok: struct index_sample { /** Key of the sample. */ - void *sample_key; + char *sample_key; Branch is the same: https://github.com/tarantool/tarantool/tree/np/gh-3253-move-statistics-to-server Patch itself is below: =============================================================================== diff --git a/extra/mkkeywordhash.c b/extra/mkkeywordhash.c index 6166119e2..cf3483151 100644 --- a/extra/mkkeywordhash.c +++ b/extra/mkkeywordhash.c @@ -59,11 +59,6 @@ struct Keyword { # define ALTER 0x00000001 #endif #define ALWAYS 0x00000002 -#ifdef SQLITE_OMIT_ANALYZE -# define ANALYZE 0 -#else -# define ANALYZE 0x00000004 -#endif #ifdef SQLITE_OMIT_AUTOINCREMENT # define AUTOINCR 0 #else @@ -127,7 +122,7 @@ static Keyword aKeywordTable[] = { { "AFTER", "TK_AFTER", TRIGGER, false }, { "ALL", "TK_ALL", ALWAYS, true }, { "ALTER", "TK_ALTER", ALTER, true }, - { "ANALYZE", "TK_ANALYZE", ANALYZE, true }, + { "ANALYZE", "TK_ANALYZE", ALWAYS, true }, { "AND", "TK_AND", ALWAYS, true }, { "AS", "TK_AS", ALWAYS, true }, { "ASC", "TK_ASC", ALWAYS, true }, diff --git a/src/box/index_def.c b/src/box/index_def.c index d0d852519..1702430b3 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -47,6 +47,7 @@ const struct index_opts index_opts_default = { /* .bloom_fpr = */ 0.05, /* .lsn = */ 0, /* .sql = */ NULL, + /* .stat = */ NULL, }; const struct opt_def index_opts_reg[] = { @@ -117,6 +118,8 @@ index_def_new(uint32_t space_id, uint32_t iid, const char *name, return NULL; } } + /* Statistics are initialized separately. */ + assert(opts->stat == NULL); return def; } @@ -154,6 +157,80 @@ index_def_dup(const struct index_def *def) return NULL; } } + if (def->opts.stat != NULL) { + dup->opts.stat = index_stat_dup(def->opts.stat); + if (dup->opts.stat == NULL) { + index_def_delete(dup); + return NULL; + } + } + return dup; +} + +size_t +index_stat_sizeof(const struct index_sample *samples, uint32_t sample_count, + uint32_t field_count) +{ + /* Space for index_stat struct itself. */ + size_t alloc_size = sizeof(struct index_stat); + /* + * Space for stat1, log_est and avg_eg arrays. + * stat1 and log_est feature additional field + * to describe total count of tuples in index. + */ + alloc_size += (3 * field_count + 2) * sizeof(uint32_t); + /* Space for samples structs. */ + alloc_size += sizeof(struct index_sample) * sample_count; + /* Space for eq, lt and dlt stats. */ + alloc_size += 3 * sizeof(uint32_t) * field_count * sample_count; + /* + * Space for sample keys. Additional two bytes are + * required for '\0' guards at the end. 
+ */ + for (uint32_t i = 0; i < sample_count; ++i) + alloc_size += samples[i].key_size + 2; + return alloc_size; +} + +struct index_stat * +index_stat_dup(const struct index_stat *src) +{ + size_t size = index_stat_sizeof(src->samples, src->sample_count, + src->sample_field_count); + struct index_stat *dup = (struct index_stat *) malloc(size); + if (dup == NULL) { + diag_set(OutOfMemory, size, "malloc", "index stat"); + return NULL; + } + memcpy(dup, src, size); + uint32_t array_size = src->sample_field_count * sizeof(uint32_t); + uint32_t stat1_offset = sizeof(struct index_stat); + dup->tuple_stat1 = (uint32_t *)((char*)dup + stat1_offset); + dup->tuple_log_est = (log_est_t *)((char*)dup + stat1_offset + + array_size + sizeof(uint32_t)); + dup->avg_eq = (uint32_t *)((char*)dup + stat1_offset + + 2 * (array_size + sizeof(uint32_t))); + uint32_t samples_offset = stat1_offset + 3 * array_size + + 2 * sizeof(uint32_t); + dup->samples = (struct index_sample *)((char*)dup + samples_offset); + uint32_t current_sample_offset = samples_offset + + src->sample_count * + sizeof(struct index_sample); + for (uint32_t i = 0; i < src->sample_count; ++i) { + dup->samples[i].eq = (uint32_t *)((char*)dup + + current_sample_offset); + dup->samples[i].lt = (uint32_t *)((char*)dup + + current_sample_offset + + array_size); + dup->samples[i].dlt = (uint32_t *)((char*)dup + + current_sample_offset + + 2 * array_size); + dup->samples[i].sample_key = + (char *)((char*)dup + current_sample_offset + 3 * array_size); + current_sample_offset += 3 * array_size + + dup->samples[i].key_size + 2; + } return dup; } diff --git a/src/box/index_def.h b/src/box/index_def.h index dc8293b23..107255cc7 100644 --- a/src/box/index_def.h +++ b/src/box/index_def.h @@ -58,6 +58,68 @@ enum rtree_index_distance_type { }; extern const char *rtree_index_distance_type_strs[]; +/** Simple alias to represent logarithm metrics. */ +typedef int16_t log_est_t; + +/** + * One sample represents an index key and three arrays with + * statistics concerning tuple distribution. + * Max number of samples for one index is adjusted by the + * SQL_STAT4_SAMPLES macro. By default it is about 24 entities. + * Samples are chosen to be selective. + */ +struct index_sample { + /** Key of the sample. */ + char *sample_key; + /** Size of sample key. */ + size_t key_size; + /** + * List of integers: first one is the approximate number + * of entries in the index whose left-most field exactly + * matches the left-most column of the sample; + * second one - est. number of entries in the index where + * the first two columns match the first two columns of + * the sample; and so forth. + */ + uint32_t *eq; + /** The same as eq list, but key is less than sample. */ + uint32_t *lt; + /** The same as lt list, but includes only distinct keys. */ + uint32_t *dlt; +}; + +/** + * SQL statistics for index, which is used by the query planner. + * This is general statistics, without any relation to used + * engine and data structures (e.g. B-tree or LSM tree). + * Statistics appear only after executing ANALYZE statement. + * It is loaded from _sql_stat1 and _sql_stat4 system spaces. + */ +struct index_stat { + /** An array of samples of the left-most key. */ + struct index_sample *samples; + /** Number of samples. */ + uint32_t sample_count; + /** Number of fields in sample arrays: eq, lt and dlt.
*/ + uint32_t sample_field_count; + /** + * List of integers: the first is the number of tuples + * in the index; the second one is the average number of + * tuples that have the same key part in the first field + * of the index; the third - for the first two fields; + * and so forth. + */ + uint32_t *tuple_stat1; + /** Logarithms of stat1 data. */ + log_est_t *tuple_log_est; + /** Average eq values for keys not in samples. */ + uint32_t *avg_eq; + /** Use this index for == or IN queries only. */ + bool is_unordered; + /** Use skip-scan optimization if true. */ + bool skip_scan_enabled; +}; + /** Index options */ struct index_opts { @@ -99,6 +161,12 @@ struct index_opts { * SQL statement that produced this index. */ char *sql; + /** + * SQL specific statistics concerning tuple + * distribution for the query planner. It is + * automatically filled after running the ANALYZE + * command. + */ + struct index_stat *stat; }; extern const struct index_opts index_opts_default; @@ -120,6 +188,7 @@ static inline void index_opts_destroy(struct index_opts *opts) { free(opts->sql); + free(opts->stat); TRASH(opts); } @@ -173,6 +242,50 @@ struct index_def { struct index_def * index_def_dup(const struct index_def *def); +/** + * Calculate size of index's statistics. + * Statistics are located in memory according to the following layout: + * + * +-------------------------+ <- Allocated memory starts here + * | struct index_stat | + * |-------------------------| + * | stat1 array | ^ + * |-------------------------| | + * | log_est array | | 3 * array_size + 2 * uint_32 + * |-------------------------| | + * | avg_eq array | v + * |-------------------------| + * | samples struct array | + * |-------------------------| + * | eq | lt | dlt | key | <- Content of one sample + * +-------------------------+ + * ... <- Up to 24 samples + * | eq | lt | dlt | key | + * +-------------------------+ + * + * array_size = field_count * sizeof(uint_32) + * offset of one sample = 3 * array_size + key_size + * + * @param samples Array of samples needed for calculating + * size of keys. + * @param sample_count Count of samples in samples array. + * @param field_count Count of fields in statistics arrays. + * @retval Size needed for statistics allocation in bytes. + */ +size_t +index_stat_sizeof(const struct index_sample *samples, uint32_t sample_count, + uint32_t field_count); + +/** + * Duplicate index_stat object. + * To understand memory layout see index_stat_sizeof() function. + * + * @param src Stat to duplicate. + * @retval Copy of the @src. + */ +struct index_stat * +index_stat_dup(const struct index_stat *src); + /* Destroy and free an index_def. */ void index_def_delete(struct index_def *def); diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c index 57cac437c..4f21cd25f 100644 --- a/src/box/sql/analyze.c +++ b/src/box/sql/analyze.c @@ -104,8 +104,8 @@ * large nEq values. * */ -#ifndef SQLITE_OMIT_ANALYZE +#include "box/box.h" #include "box/index.h" #include "box/key_def.h" #include "box/tuple_compare.h" @@ -550,7 +550,7 @@ samplePushPrevious(Stat4Accum * p, int iChng) { int i; /* Check if any samples from the aBest[] array should be pushed - * into IndexSample.a[] at this point. + * into samples array at this point.
+/** + * Used to pass information from the analyzer reader through + * to the callback routine. */ -typedef struct analysisInfo analysisInfo; -struct analysisInfo { +struct analysis_index_info { + /** Database handler. */ sqlite3 *db; + /*** Array of statistics for each index. */ + struct index_stat *stats; + /** Ordinal number of index to be processed. */ + uint32_t index_count; }; -/* - * The first argument points to a nul-terminated string containing a - * list of space separated integers. Read the first nOut of these into - * the array aOut[]. +/** + * The first argument points to a nul-terminated string + * containing a list of space separated integers. Load + * the first stat_size of these into the output arrays. + * + * @param stat_string String containing array of integers. + * @param stat_size Size of output arrays. + * @param[out] stat_exact Decoded array of statistics. + * @param[out] stat_log Decoded array of stat logariphms. */ static void -decodeIntArray(char *zIntArray, /* String containing int array to decode */ - int nOut, /* Number of slots in aOut[] */ - tRowcnt * aOut, /* Store integers here */ - LogEst * aLog, /* Or, if aOut==0, here */ - Index * pIndex /* Handle extra flags for this index, if not NULL */ - ) -{ - char *z = zIntArray; - int c; - int i; - tRowcnt v; - - if (z == 0) +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, + LogEst *stat_log) { + char *z = (char *) stat_string; + if (z == NULL) z = ""; - - for (i = 0; *z && i < nOut; i++) { - v = 0; + for (int i = 0; *z && i < stat_size; i++) { + tRowcnt v = 0; + int c; while ((c = z[0]) >= '0' && c <= '9') { v = v * 10 + c - '0'; z++; } - if (aOut) - aOut[i] = v; - if (aLog) - aLog[i] = sqlite3LogEst(v); + if (stat_exact != NULL) + stat_exact[i] = v; + if (stat_log != NULL) + stat_log[i] = sqlite3LogEst(v); if (*z == ' ') z++; } - - if (pIndex) { - pIndex->bUnordered = 0; - pIndex->noSkipScan = 0; - while (z[0]) { - if (sqlite3_strglob("unordered*", z) == 0) { - pIndex->bUnordered = 1; - } else if (sqlite3_strglob("noskipscan*", z) == 0) { - pIndex->noSkipScan = 1; - } - while (z[0] != 0 && z[0] != ' ') - z++; - while (z[0] == ' ') - z++; - } - } } -/* +/** * This callback is invoked once for each index when reading the * _sql_stat1 table. * * argv[0] = name of the table * argv[1] = name of the index (might be NULL) - * argv[2] = results of analysis - on integer for each column + * argv[2] = results of analysis - array of integers + * + * Entries for which argv[1]==NULL simply record the number + * of rows in the table. This routine also allocates memory + * for stat struct itself and statistics which are not related + * to stat4 samples. * - * Entries for which argv[1]==NULL simply record the number of rows in - * the table. + * @param data Additional analysis info. + * @param argc Number of arguments. + * @param argv Callback arguments. + * @retval 0 on success, -1 otherwise. 
*/ static int -analysisLoader(void *pData, int argc, char **argv, char **NotUsed) +analysis_loader(void *data, int argc, char **argv, char **unused) { - analysisInfo *pInfo = (analysisInfo *) pData; - Index *pIndex; - Table *pTable; - const char *z; - assert(argc == 3); - UNUSED_PARAMETER2(NotUsed, argc); - - if (argv == 0 || argv[0] == 0 || argv[2] == 0) { - return 0; - } - pTable = sqlite3HashFind(&pInfo->db->pSchema->tblHash, argv[0]); - if (pTable == NULL) + UNUSED_PARAMETER2(unused, argc); + if (argv == 0 || argv[0] == 0 || argv[2] == 0) return 0; - if (argv[1] == 0) { - pIndex = 0; - } else if (sqlite3_stricmp(argv[0], argv[1]) == 0) { - pIndex = sqlite3PrimaryKeyIndex(pTable); + struct analysis_index_info *info = (struct analysis_index_info *) data; + assert(info->stats != NULL); + struct index_stat *stat = &info->stats[info->index_count++]; + uint32_t space_id = box_space_id_by_name(argv[0], strlen(argv[0])); + if (space_id == BOX_ID_NIL) + return -1; + struct space *space = space_by_id(space_id); + assert(space != NULL); + struct index *index; + uint32_t iid = box_index_id_by_name(space_id, argv[1], strlen(argv[1])); + /* + * Convention is if index's name matches with space's + * one, then it is primary index. + */ + if (iid != BOX_ID_NIL) { + index = space_index(space, iid); } else { - pIndex = sqlite3HashFind(&pTable->idxHash, argv[1]); + if (sqlite3_stricmp(argv[0], argv[1]) != 0) + return -1; + index = space_index(space, 0); } - z = argv[2]; - - if (pIndex) { - tRowcnt *aiRowEst = 0; - int nCol = index_column_count(pIndex) + 1; - /* Index.aiRowEst may already be set here if there are duplicate - * _sql_stat1 entries for this index. In that case just clobber - * the old data with the new instead of allocating a new array. - */ - if (pIndex->aiRowEst == 0) { - pIndex->aiRowEst = - (tRowcnt *) sqlite3MallocZero(sizeof(tRowcnt) * - nCol); - if (pIndex->aiRowEst == 0) - sqlite3OomFault(pInfo->db); - } - aiRowEst = pIndex->aiRowEst; - pIndex->bUnordered = 0; - decodeIntArray((char *)z, nCol, aiRowEst, pIndex->aiRowLogEst, - pIndex); + assert(index != NULL); + /* + * Additional field is used to describe total + * count of tuples in index. Although now all + * indexes feature the same number of tuples, + * partial indexes are going to be implemented + * someday. + */ + uint32_t column_count = index->def->key_def->part_count + 1; + /* + * Stat arrays may already be set here if there + * are duplicate _sql_stat1 entries for this + * index. In that case just clobber the old data + * with the new instead of allocating a new array. + */ + uint32_t stat1_size = column_count * sizeof(uint32_t); + stat->tuple_stat1 = region_alloc(&fiber()->gc, stat1_size); + if (stat->tuple_stat1 == NULL) { + diag_set(OutOfMemory, stat1_size, "region", "tuple_stat1"); + return -1; } - - return 0; -} - -/* - * If the Index.aSample variable is not NULL, delete the aSample[] array - * and its contents. 
- */ -void -sqlite3DeleteIndexSamples(sqlite3 * db, Index * pIdx) -{ - if (pIdx->aSample) { - int j; - for (j = 0; j < pIdx->nSample; j++) { - IndexSample *p = &pIdx->aSample[j]; - sqlite3DbFree(db, p->p); - } - sqlite3DbFree(db, pIdx->aSample); + stat->tuple_log_est = region_alloc(&fiber()->gc, stat1_size); + if (stat->tuple_log_est == NULL) { + diag_set(OutOfMemory, stat1_size, "region", "tuple_log_est"); + return -1; } - if (db && db->pnBytesFreed == 0) { - pIdx->nSample = 0; - pIdx->aSample = 0; + decode_stat_string(argv[2], column_count, stat->tuple_stat1, + stat->tuple_log_est); + stat->is_unordered = false; + stat->skip_scan_enabled = true; + char *z = argv[2]; + /* Position ptr at the end of stat string. */ + for (; *z == ' ' || (*z >= '0' && *z <= '9'); ++z); + while (z[0]) { + if (sqlite3_strglob("unordered*", z) == 0) { + index->def->opts.stat->is_unordered = true; + } else if (sqlite3_strglob("noskipscan*", z) == 0) { + index->def->opts.stat->skip_scan_enabled = false; + } + while (z[0] != 0 && z[0] != ' ') + z++; + while (z[0] == ' ') + z++; } + return 0; } -/* - * Populate the pIdx->aAvgEq[] array based on the samples currently - * stored in pIdx->aSample[]. +/** + * Calculate avg_eq array based on the samples from index. + * Some *magic* calculations happen here. */ static void -initAvgEq(Index * pIdx) +init_avg_eq(struct index *index, struct index_stat *stat) { - if (pIdx) { - IndexSample *aSample = pIdx->aSample; - IndexSample *pFinal = &aSample[pIdx->nSample - 1]; - int iCol; - int nCol = 1; - if (pIdx->nSampleCol > 1) { - /* If this is stat4 data, then calculate aAvgEq[] values for all - * sample columns except the last. The last is always set to 1, as - * once the trailing PK fields are considered all index keys are - * unique. - */ - nCol = pIdx->nSampleCol - 1; - pIdx->aAvgEq[nCol] = 1; + assert(stat != NULL); + struct index_sample *samples = stat->samples; + uint32_t sample_count = stat->sample_count; + uint32_t field_count = stat->sample_field_count; + struct index_sample *last_sample = &samples[sample_count - 1]; + if (field_count > 1) + stat->avg_eq[--field_count] = 1; + for (uint32_t i = 0; i < field_count; ++i) { + uint32_t column_count = index->def->key_def->part_count; + tRowcnt eq_sum = 0; + tRowcnt eq_avg = 0; + uint32_t tuple_count = index->vtab->size(index); + uint64_t distinct_tuple_count; + uint64_t terms_sum = 0; + if (i >= column_count || stat->tuple_stat1[i + 1] == 0) { + distinct_tuple_count = 100 * last_sample->dlt[i]; + sample_count--; + } else { + assert(stat->tuple_stat1 != NULL); + distinct_tuple_count = (100 * tuple_count) / + stat->tuple_stat1[i + 1]; } - for (iCol = 0; iCol < nCol; iCol++) { - int nSample = pIdx->nSample; - int i; /* Used to iterate through samples */ - tRowcnt sumEq = 0; /* Sum of the nEq values */ - tRowcnt avgEq = 0; - tRowcnt nRow; /* Number of rows in index */ - i64 nSum100 = 0; /* Number of terms contributing to sumEq */ - i64 nDist100; /* Number of distinct values in index */ - int nColumn = index_column_count(pIdx); - - if (!pIdx->aiRowEst || iCol >= nColumn - || pIdx->aiRowEst[iCol + 1] == 0) { - nRow = pFinal->anLt[iCol]; - nDist100 = (i64) 100 *pFinal->anDLt[iCol]; - nSample--; - } else { - nRow = pIdx->aiRowEst[0]; - nDist100 = - ((i64) 100 * pIdx->aiRowEst[0]) / - pIdx->aiRowEst[iCol + 1]; + for (uint32_t j = 0; j < sample_count; ++j) { + if (j == (stat->sample_count - 1) || + samples[j].dlt[i] != samples[j + 1].dlt[i]) { + eq_sum += samples[j].eq[i]; + terms_sum += 100; } - - /* Set nSum to the number of 
distinct (iCol+1) field prefixes that
- * occur in the stat4 table for this index. Set sumEq to the sum of
- * the nEq values for column iCol for the same set (adding the value
- * only once where there exist duplicate prefixes).
- */
- for (i = 0; i < nSample; i++) {
- if (i == (pIdx->nSample - 1)
- || aSample[i].anDLt[iCol] !=
- aSample[i + 1].anDLt[iCol]
- ) {
- sumEq += aSample[i].anEq[iCol];
- nSum100 += 100;
- }
- }
-
- if (nDist100 > nSum100) {
- avgEq =
- ((i64) 100 * (nRow - sumEq)) / (nDist100 -
- nSum100);
- }
- if (avgEq == 0)
- avgEq = 1;
- pIdx->aAvgEq[iCol] = avgEq;
 }
+ if (distinct_tuple_count > terms_sum) {
+ eq_avg = 100 * (tuple_count - eq_sum) /
+ (distinct_tuple_count - terms_sum);
+ }
+ if (eq_avg == 0)
+ eq_avg = 1;
+ stat->avg_eq[i] = eq_avg;
 }
 }
 
-/*
- * Given two IndexSample arguments, compare there payloads
+/**
+ * Given two key arguments, compare their payloads.
+ * This is a simple wrapper around key_compare() to support
+ * the qsort() interface.
 */
 static int
-sampleCompareMsgPack(const void *a, const void *b, void *arg)
+sample_compare(const void *a, const void *b, void *arg)
 {
 struct key_def *def = (struct key_def *)arg;
- return key_compare(((IndexSample *) a)->p, ((IndexSample *) b)->p, def);
+ return key_compare(((struct index_sample *) a)->sample_key,
+ ((struct index_sample *) b)->sample_key, def);
 }
 
-/*
+/**
 * Load the content from the _sql_stat4 table
- * into the relevant Index.aSample[] arrays.
+ * into the relevant index->stat->samples[] arrays.
+ *
+ * Arguments must point to SQL statements that return
+ * data equivalent to the following:
 *
- * Arguments zSql1 and zSql2 must point to SQL statements that return
- * data equivalent to the following (statements are different for stat4,
- * see the caller of this function for details):
+ * prepare: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx;
+ * load: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4;
 *
- * zSql1: SELECT tbl,idx,count(*) FROM _sql_stat4 GROUP BY tbl,idx
- * zSql2: SELECT tbl,idx,neq,nlt,ndlt,sample FROM _sql_stat4
+ * 'prepare' statement is used to allocate enough memory for
+ * statistics (i.e. arrays eq, lt, dlt and avg_eq). 'load' query
+ * is needed for loading statistics itself.
 *
+ * @param db Database handler.
+ * @param sql_select_prepare SELECT statement, see above.
+ * @param sql_select_load SELECT statement, see above.
+ * @param[out] stats Statistics are saved here.
+ * @retval 0 on success, -1 otherwise.
*/ static int -loadStatTbl(sqlite3 * db, /* Database handle */ - Table * pStatTab, /* Stat table to load to */ - const char *zSql1, /* SQL statement 1 (see above) */ - const char *zSql2 /* SQL statement 2 (see above) */ - ) +load_stat_from_space(struct sqlite3 *db, const char *sql_select_prepare, + const char *sql_select_load, struct index_stat *stats) { - int rc; /* Result codes from subroutines */ - sqlite3_stmt *pStmt = 0; /* An SQL statement being run */ - Index *pPrevIdx = 0; /* Previous index in the loop */ - IndexSample *pSample; /* A slot in pIdx->aSample[] */ - int nIdxCnt = 0; /* Number of different indexes visited */ - int nIdx = 0; /* Index loop iterator */ - Index **aIndex = 0; /* Array holding all visited indexes */ - - assert(db->lookaside.bDisable); - - assert(pStatTab); - nIdxCnt = box_index_len(SQLITE_PAGENO_TO_SPACEID(pStatTab->tnum), 0); - - if (nIdxCnt > 0) { - aIndex = sqlite3DbMallocZero(db, sizeof(Index *) * nIdxCnt); - if (aIndex == 0) { - return SQLITE_NOMEM_BKPT; + struct index **indexes = NULL; + uint32_t index_count = box_index_len(BOX_SQL_STAT4_ID, 0); + if (index_count > 0) { + size_t alloc_size = sizeof(struct index *) * index_count; + indexes = region_alloc(&fiber()->gc, alloc_size); + if (indexes == NULL) { + diag_set(OutOfMemory, alloc_size, "region", "indexes"); + return -1; } } - - rc = sqlite3_prepare(db, zSql1, -1, &pStmt, 0); + sqlite3_stmt *stmt = NULL; + int rc = sqlite3_prepare(db, sql_select_prepare, -1, &stmt, 0); if (rc) goto finalize; - - while (sqlite3_step(pStmt) == SQLITE_ROW) { - int nIdxCol = 1; /* Number of columns in stat4 records */ - - char *zTab; /* Table name */ - char *zIndex; /* Index name */ - Index *pIdx; /* Pointer to the index object */ - int nSample; /* Number of samples */ - int nByte; /* Bytes of space required */ - int i; /* Bytes of space required */ - tRowcnt *pSpace; - - zTab = (char *)sqlite3_column_text(pStmt, 0); - if (zTab == 0) + uint32_t current_idx_count = 0; + while (sqlite3_step(stmt) == SQLITE_ROW) { + const char *space_name = (char *)sqlite3_column_text(stmt, 0); + if (space_name == NULL) continue; - zIndex = (char *)sqlite3_column_text(pStmt, 1); - if (zIndex == 0) + const char *index_name = (char *)sqlite3_column_text(stmt, 1); + if (index_name == NULL) continue; - nSample = sqlite3_column_int(pStmt, 2); - pIdx = sqlite3LocateIndex(db, zIndex, zTab); - assert(pIdx == 0 || pIdx->nSample == 0); - /* Index.nSample is non-zero at this point if data has already been - * loaded from the stat4 table. + uint32_t sample_count = sqlite3_column_int(stmt, 2); + uint32_t space_id = box_space_id_by_name(space_name, + strlen(space_name)); + assert(space_id != BOX_ID_NIL); + struct space *space = space_by_id(space_id); + assert(space != NULL); + struct index *index; + uint32_t iid = box_index_id_by_name(space_id, index_name, + strlen(index_name)); + if (sqlite3_stricmp(space_name, index_name) == 0 && + iid == BOX_ID_NIL) + index = space_index(space, 0); + else + index = space_index(space, iid); + assert(index != NULL); + uint32_t column_count = index->def->key_def->part_count; + struct index_stat *stat = &stats[current_idx_count]; + stat->sample_field_count = column_count; + stat->sample_count = 0; + /* Space for sample structs. */ + size_t alloc_size = sizeof(struct index_sample) * sample_count; + /* Space for eq, lt and dlt stats. */ + alloc_size += sizeof(tRowcnt) * column_count * 3 * sample_count; + /* Space for avg_eq statistics. 
*/
+ alloc_size += column_count * sizeof(tRowcnt);
+ /*
+ * We are trying to fit samples, eq_avg and the
+ * arrays of eq, lt and dlt stats into one chunk.
+ * First comes memory for the sample structs,
+ * then the eq_avg array, and finally the arrays
+ * of eq, lt and dlt stats.
 */
- if (pIdx == 0 || pIdx->nSample)
- continue;
-
- nIdxCol = index_column_count(pIdx);
- pIdx->nSampleCol = nIdxCol;
- nByte = sizeof(IndexSample) * nSample;
- nByte += sizeof(tRowcnt) * nIdxCol * 3 * nSample;
- nByte += nIdxCol * sizeof(tRowcnt); /* Space for Index.aAvgEq[] */
-
- pIdx->aSample = sqlite3DbMallocZero(db, nByte);
- if (pIdx->aSample == 0) {
- sqlite3_finalize(pStmt);
- rc = SQLITE_NOMEM_BKPT;
+ stat->samples = region_alloc(&fiber()->gc, alloc_size);
+ if (stat->samples == NULL) {
+ diag_set(OutOfMemory, alloc_size, "region", "samples");
+ rc = -1;
 goto finalize;
 }
- pSpace = (tRowcnt *) & pIdx->aSample[nSample];
- pIdx->aAvgEq = pSpace;
- pSpace += nIdxCol;
- for (i = 0; i < nSample; i++) {
- pIdx->aSample[i].anEq = pSpace;
- pSpace += nIdxCol;
- pIdx->aSample[i].anLt = pSpace;
- pSpace += nIdxCol;
- pIdx->aSample[i].anDLt = pSpace;
- pSpace += nIdxCol;
+ memset(stat->samples, 0, alloc_size);
+ /* Lay out the memory manually. */
+ tRowcnt *pSpace = (tRowcnt *) & stat->samples[sample_count];
+ stat->avg_eq = pSpace;
+ pSpace += column_count;
+ for (uint32_t i = 0; i < sample_count; i++) {
+ stat->samples[i].eq = pSpace;
+ pSpace += column_count;
+ stat->samples[i].lt = pSpace;
+ pSpace += column_count;
+ stat->samples[i].dlt = pSpace;
+ pSpace += column_count;
 }
- assert(((u8 *) pSpace) - nByte == (u8 *) (pIdx->aSample));
+ assert(((u8 *) pSpace) - alloc_size == (u8 *) (stat->samples));
+ indexes[current_idx_count] = index;
+ assert(current_idx_count < index_count);
+ current_idx_count++;
 
- aIndex[nIdx] = pIdx;
- assert(nIdx < nIdxCnt);
- ++nIdx;
 }
- rc = sqlite3_finalize(pStmt);
+ rc = sqlite3_finalize(stmt);
 if (rc)
 goto finalize;
-
- rc = sqlite3_prepare(db, zSql2, -1, &pStmt, 0);
+ rc = sqlite3_prepare(db, sql_select_load, -1, &stmt, 0);
 if (rc)
 goto finalize;
-
- while (sqlite3_step(pStmt) == SQLITE_ROW) {
- char *zTab; /* Table name */
- char *zIndex; /* Index name */
- Index *pIdx; /* Pointer to the index object */
- int nCol = 1; /* Number of columns in index */
-
- zTab = (char *)sqlite3_column_text(pStmt, 0);
- if (zTab == 0)
+ struct index *prev_index = NULL;
+ current_idx_count = 0;
+ while (sqlite3_step(stmt) == SQLITE_ROW) {
+ const char *space_name = (char *)sqlite3_column_text(stmt, 0);
+ if (space_name == NULL)
 continue;
- zIndex = (char *)sqlite3_column_text(pStmt, 1);
- if (zIndex == 0)
+ const char *index_name = (char *)sqlite3_column_text(stmt, 1);
+ if (index_name == NULL)
 continue;
- pIdx = sqlite3LocateIndex(db, zIndex, zTab);
- if (pIdx == 0)
- continue;
-
- nCol = pIdx->nSampleCol;
- if (pIdx != pPrevIdx) {
- initAvgEq(pPrevIdx);
- pPrevIdx = pIdx;
+ uint32_t space_id = box_space_id_by_name(space_name,
+ strlen(space_name));
+ assert(space_id != BOX_ID_NIL);
+ struct space *space = space_by_id(space_id);
+ assert(space != NULL);
+ struct index *index;
+ uint32_t iid = box_index_id_by_name(space_id, index_name,
+ strlen(index_name));
+ if (iid != BOX_ID_NIL) {
+ index = space_index(space, iid);
+ } else {
+ if (sqlite3_stricmp(space_name, index_name) != 0)
+ return -1;
+ index = space_index(space, 0);
+ }
+ assert(index != NULL);
+ uint32_t column_count = index->def->key_def->part_count;
+ if (index != prev_index) {
+ if (prev_index != NULL) {
+ init_avg_eq(prev_index,
+
&stats[current_idx_count]);
+ current_idx_count++;
+ }
+ prev_index = index;
 }
- pSample = &pIdx->aSample[pIdx->nSample];
- decodeIntArray((char *)sqlite3_column_text(pStmt, 2), nCol,
- pSample->anEq, 0, 0);
- decodeIntArray((char *)sqlite3_column_text(pStmt, 3), nCol,
- pSample->anLt, 0, 0);
- decodeIntArray((char *)sqlite3_column_text(pStmt, 4), nCol,
- pSample->anDLt, 0, 0);
-
- /* Take a copy of the sample. Add two 0x00 bytes the end of the buffer.
- * This is in case the sample record is corrupted. In that case, the
- * sqlite3VdbeRecordCompare() may read up to two varints past the
- * end of the allocated buffer before it realizes it is dealing with
- * a corrupt record. Adding the two 0x00 bytes prevents this from causing
- * a buffer overread.
+ struct index_stat *stat = &stats[current_idx_count];
+ struct index_sample *sample =
+ &stat->samples[stats[current_idx_count].sample_count];
+ decode_stat_string((char *)sqlite3_column_text(stmt, 2),
+ column_count, sample->eq, 0);
+ decode_stat_string((char *)sqlite3_column_text(stmt, 3),
+ column_count, sample->lt, 0);
+ decode_stat_string((char *)sqlite3_column_text(stmt, 4),
+ column_count, sample->dlt, 0);
+ /*
+ * Take a copy of the sample. Add two 0x00 bytes to
+ * the end of the buffer. This is in case the
+ * sample record is corrupted. In that case, the
+ * sqlite3VdbeRecordCompare() may read up to two
+ * varints past the end of the allocated buffer
+ * before it realizes it is dealing with a corrupt
+ * record. Adding the two 0x00 bytes prevents this
+ * from causing a buffer overread.
 */
- pSample->n = sqlite3_column_bytes(pStmt, 5);
- pSample->p = sqlite3DbMallocZero(db, pSample->n + 2);
- if (pSample->p == 0) {
- sqlite3_finalize(pStmt);
- rc = SQLITE_NOMEM_BKPT;
+ sample->key_size = sqlite3_column_bytes(stmt, 5);
+ sample->sample_key = region_alloc(&fiber()->gc,
+ sample->key_size + 2);
+ if (sample->sample_key == NULL) {
+ sqlite3_finalize(stmt);
+ rc = -1;
+ diag_set(OutOfMemory, sample->key_size + 2,
+ "region", "sample_key");
 goto finalize;
 }
- if (pSample->n) {
- memcpy(pSample->p, sqlite3_column_blob(pStmt, 5),
- pSample->n);
+ if (sample->key_size > 0) {
+ memcpy(sample->sample_key,
+ sqlite3_column_blob(stmt, 5),
+ sample->key_size);
 }
- pIdx->nSample++;
+ stats[current_idx_count].sample_count++;
 }
- rc = sqlite3_finalize(pStmt);
- if (rc == SQLITE_OK)
- initAvgEq(pPrevIdx);
-
- assert(nIdx <= nIdxCnt);
- for (int i = 0; i < nIdx; ++i) {
- Index *pIdx = aIndex[i];
- assert(pIdx);
- /* - sid, iid
- * - space_index_key_def
- * - key_compare
- */
- int sid = SQLITE_PAGENO_TO_SPACEID(pIdx->tnum);
- int iid = SQLITE_PAGENO_TO_INDEXID(pIdx->tnum);
- struct space *s = space_by_id(sid);
- assert(s);
- struct key_def *def = space_index_key_def(s, iid);
- assert(def);
- qsort_arg(pIdx->aSample,
- pIdx->nSample,
- sizeof(pIdx->aSample[0]), sampleCompareMsgPack, def);
+ rc = sqlite3_finalize(stmt);
+ if (rc == SQLITE_OK && prev_index != NULL)
+ init_avg_eq(prev_index, &stats[current_idx_count]);
+ assert(current_idx_count <= index_count);
+ for (uint32_t i = 0; i < current_idx_count; ++i) {
+ struct index *index = indexes[i];
+ assert(index != NULL);
+ qsort_arg(stats[i].samples,
+ stats[i].sample_count,
+ sizeof(struct index_sample),
+ sample_compare, index->def->key_def);
 }
 finalize:
- sqlite3DbFree(db, aIndex);
 return rc;
 }
 
-/*
- * Load content from the _sql_stat4 table into
- * the Index.aSample[] arrays of all indices.
- */ static int -loadStat4(sqlite3 * db) +load_stat_to_index(struct sqlite3 *db, const char *sql_select_load, + struct index_stat **stats) { - Table *pTab = 0; /* Pointer to stat table */ - - assert(db->lookaside.bDisable); - pTab = sqlite3HashFind(&db->pSchema->tblHash, "_sql_stat4"); - /* _slq_stat4 is a system space, so it always exists. */ - assert(pTab != NULL); - return loadStatTbl(db, pTab, - "SELECT \"tbl\",\"idx\",count(*) FROM \"_sql_stat4\"" - " GROUP BY \"tbl\",\"idx\"", - "SELECT \"tbl\",\"idx\",\"neq\",\"nlt\",\"ndlt\"," - "\"sample\" FROM \"_sql_stat4\""); + assert(stats != NULL && *stats != NULL); + struct sqlite3_stmt *stmt = NULL; + if (sqlite3_prepare(db, sql_select_load, -1, &stmt, 0) != 0) + return -1; + uint32_t current_idx_count = 0; + while (sqlite3_step(stmt) == SQLITE_ROW) { + const char *space_name = (char *)sqlite3_column_text(stmt, 0); + if (space_name == NULL) + continue; + const char *index_name = (char *)sqlite3_column_text(stmt, 1); + if (index_name == NULL) + continue; + uint32_t space_id = box_space_id_by_name(space_name, + strlen(space_name)); + if (space_id == BOX_ID_NIL) + return -1; + struct space *space = space_by_id(space_id); + assert(space != NULL); + struct index *index; + uint32_t iid = box_index_id_by_name(space_id, index_name, + strlen(index_name)); + if (iid != BOX_ID_NIL) { + index = space_index(space, iid); + } else { + if (sqlite3_stricmp(space_name, index_name) != 0) + return -1; + index = space_index(space, 0); + } + assert(index != NULL); + free(index->def->opts.stat); + index->def->opts.stat = stats[current_idx_count++]; + } + return 0; } +/** + * default_tuple_est[] array contains default information + * which is used when we don't have real space, e.g. temporary + * objects representing result set of nested SELECT or VIEW. + * + * First number is supposed to contain the number of elements + * in the index. Since we do not know, guess 1 million. + * Second one is an estimate of the number of rows in the + * table that match any particular value of the first column of + * the index. Third one is an estimate of the number of + * rows that match any particular combination of the first 2 + * columns of the index. And so on. It must always be true: + * + * default_tuple_est[N] <= default_tuple_est[N-1] + * default_tuple_est[N] >= 1 + * + * Apart from that, we have little to go on besides intuition + * as to how default values should be initialized. The numbers + * generated here are based on typical values found in actual + * indices. + */ +const log_est_t default_tuple_est[] = {DEFAULT_TUPLE_LOG_COUNT, +/** [10*log_{2}(x)]: 10, 9, 8, 7, 6, 5 */ + 33, 32, 30, 28, 26, 23}; + LogEst sql_space_tuple_log_count(struct Table *tab) { @@ -1643,84 +1674,177 @@ sql_space_tuple_log_count(struct Table *tab) return sqlite3LogEst(pk->vtab->size(pk)); } -/* - * Load the content of the _sql_stat1 and sql_stat4 tables. The - * contents of _sql_stat1 are used to populate the Index.aiRowEst[] -*\* arrays. The contents of sql_stat4 are used to populate the - * Index.aSample[] arrays. - * - * If the _sql_stat1 table is not present in the database, SQLITE_ERROR - * is returned. In this case, even if the sql_stat4 table is present, - * no data is read from it. - * - * If the _sql_stat4 table is not present in the database, SQLITE_ERROR - * is returned. However, in this case, data is read from the _sql_stat1 - * table (if it is present) before returning. 
+log_est_t +index_field_tuple_est(struct Index *idx, uint32_t field) +{ + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); + if (space == NULL) + return idx->aiRowLogEst[field]; + struct index *tnt_idx = + space_index(space, SQLITE_PAGENO_TO_INDEXID(idx->tnum)); + assert(tnt_idx != NULL); + assert(field <= tnt_idx->def->key_def->part_count); + if (tnt_idx->def->opts.stat == NULL) { + /* + * Last number for unique index is always 0: + * only one tuple exists with given full key + * in unique index and log(1) == 0. + */ + if (field == tnt_idx->def->key_def->part_count && + tnt_idx->def->opts.is_unique) + return 0; + return default_tuple_est[field + 1 >= 6 ? 6 : field]; + } + return tnt_idx->def->opts.stat->tuple_log_est[field]; +} + +/** + * This function performs copy of statistics. + * In contrast to index_stat_dup(), there is no assumption + * that source statistics is allocated within chunk. But + * destination place is still one big chunk of heap memory. + * See also index_stat_sizeof() for understanding memory layout. * - * If an OOM error occurs, this function always sets db->mallocFailed. - * This means if the caller does not care about other errors, the return - * code may be ignored. + * @param dest One chunk of memory where statistics + * will be placed. + * @param src Statistics to be copied. */ -int -sqlite3AnalysisLoad(sqlite3 * db) +static void +stat_copy(struct index_stat *dest, const struct index_stat *src) { - analysisInfo sInfo; - HashElem *i, *j; - char *zSql; - int rc = SQLITE_OK; - - /* Clear any prior statistics */ - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - pIdx->aiRowLogEst[0] = 0; - sqlite3DeleteIndexSamples(db, pIdx); - pIdx->aSample = 0; - } + assert(dest != NULL); + assert(src != NULL); + assert(src->sample_count != 0); + dest->sample_count = src->sample_count; + dest->sample_field_count = src->sample_field_count; + dest->skip_scan_enabled = src->skip_scan_enabled; + dest->is_unordered = src->is_unordered; + uint32_t array_size = src->sample_field_count * sizeof(uint32_t); + uint32_t stat1_offset = sizeof(struct index_stat); + memcpy((char *)dest + stat1_offset, src->tuple_stat1, + array_size + sizeof(uint32_t)); + memcpy((char *)dest + stat1_offset + array_size + sizeof(uint32_t), + src->tuple_log_est, array_size + sizeof(uint32_t)); + memcpy((char *)dest + stat1_offset + 2 * (array_size + sizeof(uint32_t)), + src->avg_eq, array_size); + dest->tuple_stat1 = (uint32_t *)((char *)dest + stat1_offset); + dest->tuple_log_est = (log_est_t *)((char *)dest + stat1_offset + + array_size + sizeof(uint32_t)); + dest->avg_eq = (uint32_t *)((char *)dest + stat1_offset + + 2 * (array_size + sizeof(uint32_t))); + uint32_t samples_offset = stat1_offset + 3 * array_size + + 2 * sizeof(uint32_t); + dest->samples = (struct index_sample *)((char*)dest + samples_offset); + uint32_t current_sample_offset = samples_offset + dest->sample_count * + sizeof(struct index_sample); + assert(src->samples != NULL); + for (uint32_t i = 0; i < dest->sample_count; ++i) { + dest->samples[i].key_size = src->samples[i].key_size; + memcpy((char *)dest + current_sample_offset, src->samples[i].eq, + array_size); + dest->samples[i].eq = (uint32_t *)((char *)dest + + current_sample_offset); + current_sample_offset += array_size; + memcpy((char *)dest + current_sample_offset, src->samples[i].lt, + 
array_size); + dest->samples[i].lt = (uint32_t *)((char *)dest + + current_sample_offset); + current_sample_offset += array_size; + memcpy((char *)dest + current_sample_offset, + src->samples[i].dlt, array_size); + dest->samples[i].dlt = (uint32_t *)((char *)dest + + current_sample_offset); + current_sample_offset += array_size; + memcpy((char *)dest + current_sample_offset, + src->samples[i].sample_key, + src->samples[i].key_size + 2); + dest->samples[i].sample_key = (char *)dest + + current_sample_offset; + current_sample_offset += dest->samples[i].key_size + 2; } +} - /* Load new statistics out of the _sql_stat1 table */ - sInfo.db = db; - zSql = "SELECT \"tbl\",\"idx\",\"stat\" FROM \"_sql_stat1\""; - rc = sqlite3_exec(db, zSql, analysisLoader, &sInfo, 0); - - /* Set appropriate defaults on all indexes not in the _sql_stat1 table */ - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - if (pIdx->aiRowLogEst[0] == 0) - sqlite3DefaultRowEst(pIdx); - } +int +sql_analysis_load(struct sqlite3 *db) +{ + uint32_t index_count = box_index_len(BOX_SQL_STAT4_ID, 0); + if (box_txn_begin() != 0) + goto fail; + size_t stats_size = index_count * sizeof(struct index_stat); + struct index_stat *stats = region_alloc(&fiber()->gc, stats_size); + if (stats == NULL) { + diag_set(OutOfMemory, stats_size, "region", "stats"); + goto fail; } - + struct analysis_index_info info; + info.db = db; + info.stats = stats; + info.index_count = 0; + const char *load_stat1 = + "SELECT \"tbl\",\"idx\",\"stat\" FROM \"_sql_stat1\""; + /* Load new statistics out of the _sql_stat1 table. */ + if (sqlite3_exec(db, load_stat1, analysis_loader, &info, 0) != 0) + goto fail; + /* + * This query is used to allocate enough memory for + * statistics. Result rows are given in a form: + * <table name>, <index name>, <count of samples> + */ + const char *init_query = "SELECT \"tbl\",\"idx\",count(*) FROM " + "\"_sql_stat4\" GROUP BY \"tbl\",\"idx\""; + /* Query for loading statistics into in-memory structs. */ + const char *load_query = "SELECT \"tbl\",\"idx\",\"neq\",\"nlt\"," + "\"ndlt\",\"sample\" FROM \"_sql_stat4\""; /* Load the statistics from the _sql_stat4 table. */ - if (rc == SQLITE_OK && OptimizationEnabled(db, SQLITE_Stat4)) { - db->lookaside.bDisable++; - rc = loadStat4(db); - db->lookaside.bDisable--; + if (load_stat_from_space(db, init_query, load_query, stats) != 0) + goto fail; + /* + * Now we have complete statistics for each index + * allocated on the region. Time to copy it on the heap. + */ + size_t heap_stats_size = info.index_count * sizeof(struct index_stat *); + struct index_stat **heap_stats = region_alloc(&fiber()->gc, + heap_stats_size); + if (heap_stats == NULL) { + diag_set(OutOfMemory, heap_stats_size, "malloc", "heap_stats"); + goto fail; } - - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; - j = sqliteHashNext(j)) { - Table *pTab = sqliteHashData(j); - for (i = sqliteHashFirst(&pTab->idxHash); i; - i = sqliteHashNext(i)) { - Index *pIdx = sqliteHashData(i); - sqlite3_free(pIdx->aiRowEst); - pIdx->aiRowEst = 0; + /* + * We are using 'everything or nothing' policy: + * if there is no enough memory for statistics even for + * one index, then refresh it for no one. 
+ */ + for (uint32_t i = 0; i < info.index_count; ++i) { + size_t size = index_stat_sizeof(stats[i].samples, + stats[i].sample_count, + stats[i].sample_field_count); + heap_stats[i] = malloc(size); + if (heap_stats[i] == NULL) { + diag_set(OutOfMemory, size, "malloc", "heap_stats"); + for (uint32_t j = 0; j < i; ++j) + free(heap_stats[j]); + goto fail; } } - - if (rc == SQLITE_NOMEM) { - sqlite3OomFault(db); - } - return rc; + /* + * We can't use stat_dup function since statistics on + * region doesn't fit into one memory chunk. Lets + * manually copy memory chunks and mark memory. + */ + for (uint32_t i = 0; i < info.index_count; ++i) + stat_copy(heap_stats[i], &stats[i]); + /* + * Ordered query is needed to be sure that indexes come + * in the same order as in previous SELECTs. + */ + const char *order_query = "SELECT \"tbl\",\"idx\" FROM " + "\"_sql_stat4\" GROUP BY \"tbl\",\"idx\""; + if (load_stat_to_index(db, order_query, heap_stats) != 0) + goto fail; + if (box_txn_commit() != 0) + return SQL_TARANTOOL_ERROR; + return SQLITE_OK; +fail: + box_txn_rollback(); + return SQL_TARANTOOL_ERROR; } - -#endif /* SQLITE_OMIT_ANALYZE */ diff --git a/src/box/sql/build.c b/src/box/sql/build.c index f860e331e..034a8928a 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -214,13 +214,9 @@ sqlite3LocateIndex(sqlite3 * db, const char *zName, const char *zTable) static void freeIndex(sqlite3 * db, Index * p) { -#ifndef SQLITE_OMIT_ANALYZE - sqlite3DeleteIndexSamples(db, p); -#endif sql_expr_free(db, p->pPartIdxWhere, false); sqlite3ExprListDelete(db, p->aColExpr); sqlite3DbFree(db, p->zColAff); - sqlite3_free(p->aiRowEst); sqlite3DbFree(db, p); } @@ -3034,9 +3030,6 @@ sqlite3CreateIndex(Parse * pParse, /* All information about this parse */ requestedSortOrder = pListItem->sortOrder & 0; pIndex->aSortOrder[i] = (u8) requestedSortOrder; } - - sqlite3DefaultRowEst(pIndex); - if (pTab == pParse->pNewTable) { /* This routine has been called to create an automatic index as a * result of a PRIMARY KEY or UNIQUE clause on a column definition, or @@ -3213,60 +3206,6 @@ sqlite3CreateIndex(Parse * pParse, /* All information about this parse */ sqlite3DbFree(db, zName); } -/* - * Fill the Index.aiRowEst[] array with default information - information - * to be used when we have not run the ANALYZE command. - * - * aiRowEst[0] is supposed to contain the number of elements in the index. - * Since we do not know, guess 1 million. aiRowEst[1] is an estimate of the - * number of rows in the table that match any particular value of the - * first column of the index. aiRowEst[2] is an estimate of the number - * of rows that match any particular combination of the first 2 columns - * of the index. And so forth. It must always be the case that -* - * aiRowEst[N]<=aiRowEst[N-1] - * aiRowEst[N]>=1 - * - * Apart from that, we have little to go on besides intuition as to - * how aiRowEst[] should be initialized. The numbers generated here - * are based on typical values found in actual indices. - */ -void -sqlite3DefaultRowEst(Index * pIdx) -{ - /* 10, 9, 8, 7, 6 */ - LogEst aVal[] = { 33, 32, 30, 28, 26 }; - LogEst *a = pIdx->aiRowLogEst; - int nCopy = MIN(ArraySize(aVal), pIdx->nColumn); - int i; - - /* Set the first entry (number of rows in the index) to the estimated - * number of rows in the table, or half the number of rows in the table - * for a partial index. But do not let the estimate drop below 10. 
- */ - a[0] = pIdx->pTable->tuple_log_count; - if (pIdx->pPartIdxWhere != 0) - a[0] -= 10; - assert(10 == sqlite3LogEst(2)); - if (a[0] < 33) - a[0] = 33; - assert(33 == sqlite3LogEst(10)); - - /* Estimate that a[1] is 10, a[2] is 9, a[3] is 8, a[4] is 7, a[5] is - * 6 and each subsequent value (if any) is 5. - */ - memcpy(&a[1], aVal, nCopy * sizeof(LogEst)); - int col_count = index_column_count(pIdx); - for (i = nCopy + 1; i <= col_count; i++) { - a[i] = 23; - assert(23 == sqlite3LogEst(5)); - } - - assert(0 == sqlite3LogEst(1)); - if (IsUniqueIndex(pIdx)) - a[pIdx->nColumn] = 0; -} - /** * Return number of columns in given index. * If space is ephemeral, use internal diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y index bffd065fb..059e5d037 100644 --- a/src/box/sql/parse.y +++ b/src/box/sql/parse.y @@ -1484,10 +1484,8 @@ cmd ::= DROP TRIGGER ifexists(NOERR) fullname(X). { /* %endif SQLITE_OMIT_REINDEX */ /////////////////////////////////// ANALYZE /////////////////////////////////// -%ifndef SQLITE_OMIT_ANALYZE cmd ::= ANALYZE. {sqlite3Analyze(pParse, 0);} cmd ::= ANALYZE nm(X). {sqlite3Analyze(pParse, &X);} -%endif //////////////////////// ALTER TABLE table ... //////////////////////////////// %ifndef SQLITE_OMIT_ALTERTABLE diff --git a/src/box/sql/pragma.c b/src/box/sql/pragma.c index 117d5ba88..f9cf4e89a 100644 --- a/src/box/sql/pragma.c +++ b/src/box/sql/pragma.c @@ -425,8 +425,7 @@ sqlite3Pragma(Parse * pParse, Token * pId, /* First part of [schema.]id field */ sqlite3VdbeMultiLoad(v, 2, "sii", pIdx->zName, avg_tuple_size_idx, - pIdx-> - aiRowLogEst[0]); + index_field_tuple_est(pIdx, 0)); sqlite3VdbeAddOp2(v, OP_ResultRow, 1, 4); } diff --git a/src/box/sql/prepare.c b/src/box/sql/prepare.c index 2ab8751a4..322d51e09 100644 --- a/src/box/sql/prepare.c +++ b/src/box/sql/prepare.c @@ -186,11 +186,8 @@ sqlite3InitDatabase(sqlite3 * db) assert(db->init.busy); { rc = initData.rc; -#ifndef SQLITE_OMIT_ANALYZE - if (rc == SQLITE_OK) { - sqlite3AnalysisLoad(db); - } -#endif + if (rc == SQLITE_OK) + sql_analysis_load(db); } if (db->mallocFailed) { rc = SQLITE_NOMEM_BKPT; diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h index 973ae1a44..d0f6926c1 100644 --- a/src/box/sql/sqliteInt.h +++ b/src/box/sql/sqliteInt.h @@ -1460,7 +1460,6 @@ typedef struct FuncDef FuncDef; typedef struct FuncDefHash FuncDefHash; typedef struct IdList IdList; typedef struct Index Index; -typedef struct IndexSample IndexSample; typedef struct KeyClass KeyClass; typedef struct KeyInfo KeyInfo; typedef struct Lookaside Lookaside; @@ -1682,7 +1681,6 @@ struct sqlite3 { #define SQLITE_SubqCoroutine 0x0100 /* Evaluate subqueries as coroutines */ #define SQLITE_Transitive 0x0200 /* Transitive constraints */ #define SQLITE_OmitNoopJoin 0x0400 /* Omit unused tables in joins */ -#define SQLITE_Stat4 0x0800 /* Use STAT4 data */ #define SQLITE_CursorHints 0x2000 /* Add OP_CursorHint opcodes */ #define SQLITE_AllOpts 0xffff /* All optimizations */ @@ -2163,15 +2161,30 @@ struct Index { * or _NONE */ unsigned idxType:2; /* 1==UNIQUE, 2==PRIMARY KEY, 0==CREATE INDEX */ - unsigned bUnordered:1; /* Use this index for == or IN queries only */ - unsigned noSkipScan:1; /* Do not try to use skip-scan if true */ - int nSample; /* Number of elements in aSample[] */ - int nSampleCol; /* Size of IndexSample.anEq[] and so on */ - tRowcnt *aAvgEq; /* Average nEq values for keys not in aSample */ - IndexSample *aSample; /* Samples of the left-most key */ - tRowcnt *aiRowEst; /* Non-logarithmic stat1 data for this index */ }; 
+/**
+ * Fetch statistics concerning tuples to be selected:
+ * logarithm of the number of tuples which have the same key
+ * for the first key parts specified by the second argument.
+ * Simple example (without logarithms):
+ * idx = {{1,2}, {1,2}, {2, 4}, {2, 3}, {3, 5}}
+ * stat[1] = (2 + 2 + 1) / 3 ~ 2 - two keys start from 1,
+ * two from 2, and one from 3.
+ * stat[2] = (2 + 1 + 1 + 1) / 4 ~ 1 - only one coincidence of
+ * keys: {1, 2}; the rest are different.
+ * Note that stat[0] is the number of tuples in the whole
+ * index. By default it is DEFAULT_TUPLE_LOG_COUNT == 200.
+ * If there is no appropriate Tarantool index,
+ * one of the default values is returned.
+ *
+ * @param idx Index.
+ * @param field Number of field to be examined.
+ * @retval Estimated logarithm of the number of tuples
+ * selected by the given field.
+ */
+log_est_t
+index_field_tuple_est(struct Index *idx, uint32_t field);
+
 /*
 * Allowed values for Index.idxType
 */
@@ -2191,19 +2204,9 @@ struct Index {
 */
 #define XN_EXPR (-2) /* Indexed column is an expression */
 
-/*
- * Each sample stored in the sql_stat4 table is represented in memory
- * using a structure of this type. See documentation at the top of the
- * analyze.c source file for additional information.
- */
-struct IndexSample {
- void *p; /* Pointer to sampled record */
- int n; /* Size of record in bytes */
- tRowcnt *anEq; /* Est. number of rows where the key equals this sample */
- tRowcnt *anLt; /* Est. number of rows where key is less than this sample */
- tRowcnt *anDLt; /* Est. number of distinct keys less than this sample */
-};
-
+#ifdef DEFAULT_TUPLE_COUNT
+#undef DEFAULT_TUPLE_COUNT
+#endif
 #define DEFAULT_TUPLE_COUNT 1048576
 /** [10*log_{2}(1048576)] == 200 */
 #define DEFAULT_TUPLE_LOG_COUNT 200
@@ -3960,9 +3963,19 @@ ssize_t
 sql_index_tuple_size(struct space *space, struct index *idx);
 
 int sqlite3InvokeBusyHandler(BusyHandler *);
-int sqlite3AnalysisLoad(sqlite3 *);
-void sqlite3DeleteIndexSamples(sqlite3 *, Index *);
-void sqlite3DefaultRowEst(Index *);
+
+/**
+ * Load the content of the _sql_stat1 and sql_stat4 tables. The
+ * contents of _sql_stat1 are used to populate the tuple_stat1[]
+ * arrays. The contents of sql_stat4 are used to populate the
+ * samples[] arrays.
+ *
+ * @param db Database handler.
+ * @retval SQLITE_OK on success, an error code otherwise.
+ */
+int
+sql_analysis_load(struct sqlite3 *db);
+
 uint32_t
 index_column_count(const Index *);
 bool
diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index 96f9d4063..2adc17ed2 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -4830,7 +4830,7 @@ case OP_RenameTable: {
 sqlite3DbFree(db, (void*)zSqlStmt);
 break;
 }
-#if !defined(SQLITE_OMIT_ANALYZE)
+
 /* Opcode: LoadAnalysis P1 * * * *
 *
 * Read the sql_stat1 table for database P1 and load the content
@@ -4839,11 +4839,10 @@
 */
 case OP_LoadAnalysis: {
 assert(pOp->p1==0 );
- rc = sqlite3AnalysisLoad(db);
+ rc = sql_analysis_load(db);
 if (rc) goto abort_due_to_error;
 break;
 }
-#endif /* !defined(SQLITE_OMIT_ANALYZE) */
 
 /* Opcode: DropTable P1 * * P4 *
 *
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 39963b10e..e62fedddc 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -888,7 +888,16 @@ whereKeyStats(Parse * pParse, /* Database connection */
 int roundUp, /* Round up if true.
Round down if false */ tRowcnt * aStat) /* OUT: stats written here */ { - IndexSample *aSample = pIdx->aSample; + uint32_t space_id = SQLITE_PAGENO_TO_SPACEID(pIdx->tnum); + struct space *space = space_by_id(space_id); + assert(space != NULL); + uint32_t iid = SQLITE_PAGENO_TO_INDEXID(pIdx->tnum); + struct index *idx = space_index(space, iid); + assert(idx != NULL && idx->def->opts.stat != NULL); + struct index_sample *samples = idx->def->opts.stat->samples; + assert(idx->def->opts.stat->sample_count > 0); + assert(idx->def->opts.stat->samples != NULL); + assert(idx->def->opts.stat->sample_field_count >= pRec->nField); int iCol; /* Index of required stats in anEq[] etc. */ int i; /* Index of first sample >= pRec */ int iSample; /* Smallest sample larger than or equal to pRec */ @@ -902,8 +911,7 @@ whereKeyStats(Parse * pParse, /* Database connection */ UNUSED_PARAMETER(pParse); #endif assert(pRec != 0); - assert(pIdx->nSample > 0); - assert(pRec->nField > 0 && pRec->nField <= pIdx->nSampleCol); + assert(pRec->nField > 0); /* Do a binary search to find the first sample greater than or equal * to pRec. If pRec contains a single field, the set of samples to search @@ -951,7 +959,8 @@ whereKeyStats(Parse * pParse, /* Database connection */ */ nField = pRec->nField; iCol = 0; - iSample = pIdx->nSample * nField; + uint32_t sample_count = idx->def->opts.stat->sample_count; + iSample = sample_count * nField; do { int iSamp; /* Index in aSample[] of test sample */ int n; /* Number of fields in test sample */ @@ -964,8 +973,8 @@ whereKeyStats(Parse * pParse, /* Database connection */ * fields that is greater than the previous effective sample. */ for (n = (iTest % nField) + 1; n < nField; n++) { - if (aSample[iSamp - 1].anLt[n - 1] != - aSample[iSamp].anLt[n - 1]) + if (samples[iSamp - 1].lt[n - 1] != + samples[iSamp].lt[n - 1]) break; } } else { @@ -974,15 +983,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ pRec->nField = n; res = - sqlite3VdbeRecordCompareMsgpack(aSample[iSamp].n, - aSample[iSamp].p, pRec); + sqlite3VdbeRecordCompareMsgpack(samples[iSamp].key_size, + samples[iSamp].sample_key, + pRec); if (res < 0) { iLower = - aSample[iSamp].anLt[n - 1] + aSample[iSamp].anEq[n - - 1]; + samples[iSamp].lt[n - 1] + samples[iSamp].eq[n - 1]; iMin = iTest + 1; } else if (res == 0 && n < nField) { - iLower = aSample[iSamp].anLt[n - 1]; + iLower = samples[iSamp].lt[n - 1]; iMin = iTest + 1; res = -1; } else { @@ -1000,12 +1009,12 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (pParse->db->mallocFailed == 0) { if (res == 0) { /* If (res==0) is true, then pRec must be equal to sample i. */ - assert(i < pIdx->nSample); + assert(i < (int) sample_count); assert(iCol == nField - 1); pRec->nField = nField; assert(0 == - sqlite3VdbeRecordCompareMsgpack(aSample[i].n, - aSample[i].p, + sqlite3VdbeRecordCompareMsgpack(samples[i].key_size, + samples[i].sample_key, pRec) || pParse->db->mallocFailed); } else { @@ -1013,12 +1022,12 @@ whereKeyStats(Parse * pParse, /* Database connection */ * all samples in the aSample[] array, pRec must be smaller than the * (iCol+1) field prefix of sample i. 
*/ - assert(i <= pIdx->nSample && i >= 0); + assert(i <= (int) sample_count && i >= 0); pRec->nField = iCol + 1; - assert(i == pIdx->nSample - || sqlite3VdbeRecordCompareMsgpack(aSample[i].n, - aSample[i].p, - pRec) > 0 + assert(i == (int) sample_count || + sqlite3VdbeRecordCompareMsgpack(samples[i].key_size, + samples[i].sample_key, + pRec) > 0 || pParse->db->mallocFailed); /* if i==0 and iCol==0, then record pRec is smaller than all samples @@ -1029,13 +1038,15 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (iCol > 0) { pRec->nField = iCol; assert(sqlite3VdbeRecordCompareMsgpack - (aSample[i].n, aSample[i].p, pRec) <= 0 + (samples[i].key_size, + samples[i].sample_key, pRec) <= 0 || pParse->db->mallocFailed); } if (i > 0) { pRec->nField = nField; assert(sqlite3VdbeRecordCompareMsgpack - (aSample[i - 1].n, aSample[i - 1].p, + (samples[i - 1].key_size, + samples[i - 1].sample_key, pRec) < 0 || pParse->db->mallocFailed); } } @@ -1045,18 +1056,18 @@ whereKeyStats(Parse * pParse, /* Database connection */ if (res == 0) { /* Record pRec is equal to sample i */ assert(iCol == nField - 1); - aStat[0] = aSample[i].anLt[iCol]; - aStat[1] = aSample[i].anEq[iCol]; + aStat[0] = samples[i].lt[iCol]; + aStat[1] = samples[i].eq[iCol]; } else { /* At this point, the (iCol+1) field prefix of aSample[i] is the first * sample that is greater than pRec. Or, if i==pIdx->nSample then pRec * is larger than all samples in the array. */ tRowcnt iUpper, iGap; - if (i >= pIdx->nSample) { - iUpper = sqlite3LogEstToInt(pIdx->aiRowLogEst[0]); + if (i >= (int) sample_count) { + iUpper = sqlite3LogEstToInt(idx->def->opts.stat->tuple_log_est[0]); } else { - iUpper = aSample[i].anLt[iCol]; + iUpper = samples[i].lt[iCol]; } if (iLower >= iUpper) { @@ -1070,7 +1081,7 @@ whereKeyStats(Parse * pParse, /* Database connection */ iGap = iGap / 3; } aStat[0] = iLower + iGap; - aStat[1] = pIdx->aAvgEq[iCol]; + aStat[1] = idx->def->opts.stat->avg_eq[iCol]; } /* Restore the pRec->nField value before returning. */ @@ -1161,10 +1172,15 @@ whereRangeSkipScanEst(Parse * pParse, /* Parsing & code generating context */ int *pbDone) /* Set to true if at least one expr. value extracted */ { Index *p = pLoop->pIndex; + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(p->tnum)); + assert(space != NULL); + struct index *index = space_index(space, + SQLITE_PAGENO_TO_INDEXID(p->tnum)); + assert(index != NULL && index->def->opts.stat != NULL); int nEq = pLoop->nEq; sqlite3 *db = pParse->db; int nLower = -1; - int nUpper = p->nSample + 1; + int nUpper = index->def->opts.stat->sample_count + 1; int rc = SQLITE_OK; u8 aff = sqlite3IndexColumnAffinity(db, p, nEq); struct coll *pColl; @@ -1182,15 +1198,17 @@ whereRangeSkipScanEst(Parse * pParse, /* Parsing & code generating context */ if (pUpper && rc == SQLITE_OK) { rc = sqlite3Stat4ValueFromExpr(pParse, pUpper->pExpr->pRight, aff, &p2); - nUpper = p2 ? 0 : p->nSample; + nUpper = p2 ? 
0 : index->def->opts.stat->sample_count;
 }
 
 if (p1 || p2) {
 int i;
 int nDiff;
- for (i = 0; rc == SQLITE_OK && i < p->nSample; i++) {
- rc = sqlite3Stat4Column(db, p->aSample[i].p,
- p->aSample[i].n, nEq, &pVal);
+ struct index_sample *samples = index->def->opts.stat->samples;
+ uint32_t sample_count = index->def->opts.stat->sample_count;
+ for (i = 0; rc == SQLITE_OK && i < (int) sample_count; i++) {
+ rc = sqlite3Stat4Column(db, samples[i].sample_key,
+ samples[i].key_size, nEq, &pVal);
 if (rc == SQLITE_OK && p1) {
 int res = sqlite3MemCompare(p1, pVal, pColl);
 if (res >= 0)
@@ -1214,7 +1232,8 @@
 */
 if (nDiff != 1 || pUpper == 0 || pLower == 0) {
 int nAdjust =
- (sqlite3LogEst(p->nSample) - sqlite3LogEst(nDiff));
+ (sqlite3LogEst(sample_count) -
+ sqlite3LogEst(nDiff));
 pLoop->nOut -= nAdjust;
 *pbDone = 1;
 WHERETRACE(0x10,
@@ -1285,8 +1304,22 @@ whereRangeScanEst(Parse * pParse, /* Parsing & code generating context */
 
 Index *p = pLoop->pIndex;
 int nEq = pLoop->nEq;
-
- if (p->nSample > 0 && nEq < p->nSampleCol) {
+ uint32_t space_id = SQLITE_PAGENO_TO_SPACEID(p->tnum);
+ struct space *space = space_by_id(space_id);
+ assert(space != NULL);
+ uint32_t iid = SQLITE_PAGENO_TO_INDEXID(p->tnum);
+ struct index *idx = space_index(space, iid);
+ assert(idx != NULL);
+ struct index_stat *stat = idx->def->opts.stat;
+ /*
+ * Create a surrogate stat in case the ANALYZE command
+ * hasn't been run. Simply fill it with zeros.
+ */
+ struct index_stat surrogate_stat;
+ memset(&surrogate_stat, 0, sizeof(surrogate_stat));
+ if (stat == NULL)
+ stat = &surrogate_stat;
+ if (stat->sample_count > 0 && nEq < (int) stat->sample_field_count) {
 if (nEq == pBuilder->nRecValid) {
 UnpackedRecord *pRec = pBuilder->pRec;
 tRowcnt a[2];
@@ -1329,15 +1362,6 @@
 * are in range.
*/
 iLower = 0;
- uint32_t space_id =
- SQLITE_PAGENO_TO_SPACEID(p->tnum);
- struct space *space = space_by_id(space_id);
- assert(space != NULL);
- uint32_t iid =
- SQLITE_PAGENO_TO_INDEXID(p->tnum);
- struct index *idx =
- space_index(space, iid);
- assert(idx != NULL);
 iUpper = idx->vtab->size(idx);
 } else {
 /* Note: this call could be optimized away - since the same values must
@@ -1503,8 +1527,6 @@ whereEqualScanEst(Parse * pParse, /* Parsing & code generating context */
 
 assert(nEq >= 1);
 assert(nEq <= (int)index_column_count(p));
- assert(p->aSample != 0);
- assert(p->nSample > 0);
 assert(pBuilder->nRecValid < nEq);
 
 /* If values are not available for all fields of the index to the left
@@ -1553,14 +1575,13 @@ whereInScanEst(Parse * pParse, /* Parsing & code generating context */
 tRowcnt * pnRow) /* Write the revised row estimate here */
 {
 Index *p = pBuilder->pNew->pIndex;
- i64 nRow0 = sqlite3LogEstToInt(p->aiRowLogEst[0]);
+ i64 nRow0 = sqlite3LogEstToInt(index_field_tuple_est(p, 0));
 int nRecValid = pBuilder->nRecValid;
 int rc = SQLITE_OK; /* Subfunction return code */
 tRowcnt nEst; /* Number of rows for a single term */
 tRowcnt nRowEst = 0; /* New estimate of the number of rows */
 int i; /* Loop counter */
 
- assert(p->aSample != 0);
 for (i = 0; rc == SQLITE_OK && i < pList->nExpr; i++) {
 nEst = nRow0;
 rc = whereEqualScanEst(pParse, pBuilder, pList->a[i].pExpr,
@@ -2319,9 +2340,25 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */
 WO_EQ | WO_IN | WO_GT | WO_GE | WO_LT | WO_LE |
 WO_ISNULL | WO_IS;
 }
- if (pProbe->bUnordered)
+ struct space *space =
+ space_by_id(SQLITE_PAGENO_TO_SPACEID(pProbe->tnum));
+ struct index *idx = NULL;
+ struct index_stat *stat = NULL;
+ if (space != NULL) {
+ idx = space_index(space, SQLITE_PAGENO_TO_INDEXID(pProbe->tnum));
+ assert(idx != NULL);
+ stat = idx->def->opts.stat;
+ }
+ /*
+ * Create a surrogate stat in case the ANALYZE command
+ * hasn't been run. Simply fill it with zeros.
+ */ + struct index_stat surrogate_stat; + memset(&surrogate_stat, 0, sizeof(surrogate_stat)); + if (stat == NULL) + stat = &surrogate_stat; + if (stat->is_unordered) opMask &= ~(WO_GT | WO_GE | WO_LT | WO_LE); - assert(pNew->nEq < nProbeCol); saved_nEq = pNew->nEq; @@ -2335,7 +2372,7 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ pTerm = whereScanInit(&scan, pBuilder->pWC, pSrc->iCursor, saved_nEq, opMask, pProbe); pNew->rSetup = 0; - rSize = pProbe->aiRowLogEst[0]; + rSize = index_field_tuple_est(pProbe, 0); rLogSize = estLog(rSize); for (; rc == SQLITE_OK && pTerm != 0; pTerm = whereScanNext(&scan)) { u16 eOp = pTerm->eOperator; /* Shorthand for pTerm->eOperator */ @@ -2493,8 +2530,8 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ } else { tRowcnt nOut = 0; if (nInMul == 0 - && pProbe->nSample - && pNew->nEq <= pProbe->nSampleCol + && stat->sample_count + && pNew->nEq <= stat->sample_field_count && ((eOp & WO_IN) == 0 || !ExprHasProperty(pTerm->pExpr, EP_xIsSelect)) @@ -2529,8 +2566,8 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ } if (nOut == 0) { pNew->nOut += - (pProbe->aiRowLogEst[nEq] - - pProbe->aiRowLogEst[nEq - 1]); + (index_field_tuple_est(pProbe, nEq) - + index_field_tuple_est(pProbe, nEq -1)); if (eOp & WO_ISNULL) { /* TUNING: If there is no likelihood() value, assume that a * "col IS NULL" expression matches twice as many rows @@ -2618,18 +2655,17 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ */ assert(42 == sqlite3LogEst(18)); if (saved_nEq == saved_nSkip && saved_nEq + 1U < nProbeCol && - pProbe->noSkipScan == 0 && + stat->skip_scan_enabled == true && /* TUNING: Minimum for skip-scan */ - pProbe->aiRowLogEst[saved_nEq + 1] >= 42 && + index_field_tuple_est(pProbe, saved_nEq + 1) >= 42 && (rc = whereLoopResize(db, pNew, pNew->nLTerm + 1)) == SQLITE_OK) { LogEst nIter; pNew->nEq++; pNew->nSkip++; pNew->aLTerm[pNew->nLTerm++] = 0; pNew->wsFlags |= WHERE_SKIPSCAN; - nIter = - pProbe->aiRowLogEst[saved_nEq] - - pProbe->aiRowLogEst[saved_nEq + 1]; + nIter = index_field_tuple_est(pProbe, saved_nEq) - + index_field_tuple_est(pProbe, saved_nEq + 1); pNew->nOut -= nIter; /* TUNING: Because uncertainties in the estimates for skip-scan queries, * add a 1.375 fudge factor to make skip-scan slightly less likely. @@ -2647,6 +2683,31 @@ whereLoopAddBtreeIndex(WhereLoopBuilder * pBuilder, /* The WhereLoop factory */ return rc; } +/** + * Check if is_unordered flag is set for given index. + * Since now there may not be corresponding Tarantool's index + * (e.g. for temporary objects such as ephemeral tables), + * a few preliminary checks are made. + * + * @param idx Index to be tested. + * @retval True, if statistics exist and unordered flag is set. + */ +static bool +index_is_unordered(struct Index *idx) +{ + assert(idx != NULL); + struct space *space = space_by_id(SQLITE_PAGENO_TO_SPACEID(idx->tnum)); + if (space == NULL) + return false; + uint32_t iid = SQLITE_PAGENO_TO_INDEXID(idx->tnum); + struct index *tnt_idx = space_index(space, iid); + if (tnt_idx == NULL) + return false; + if (tnt_idx->def->opts.stat != NULL) + return tnt_idx->def->opts.stat->is_unordered; + return false; +} + /* * Return True if it is possible that pIndex might be useful in * implementing the ORDER BY clause in pBuilder. 
@@ -2663,8 +2724,7 @@ indexMightHelpWithOrderBy(WhereLoopBuilder * pBuilder, ExprList *aColExpr; int ii, jj; int nIdxCol = index_column_count(pIndex); - - if (pIndex->bUnordered) + if (index_is_unordered(pIndex)) return 0; if ((pOB = pBuilder->pWInfo->pOrderBy) == 0) return 0; @@ -2877,7 +2937,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */ testcase(pNew->iTab != pSrc->iCursor); /* See ticket [98d973b8f5] */ continue; /* Partial index inappropriate for this query */ } - rSize = pProbe->aiRowLogEst[0]; + rSize = index_field_tuple_est(pProbe, 0); pNew->nEq = 0; pNew->nBtm = 0; pNew->nTop = 0; @@ -3269,8 +3329,8 @@ wherePathSatisfiesOrderBy(WhereInfo * pWInfo, /* The WHERE clause */ if (pLoop->wsFlags & WHERE_IPK) { pIndex = 0; nColumn = 1; - } else if ((pIndex = pLoop->pIndex) == 0 - || pIndex->bUnordered) { + } else if ((pIndex = pLoop->pIndex) == NULL || + index_is_unordered(pIndex)) { return 0; } else { nColumn = index_column_count(pIndex); diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c index 86ee2731a..6b63b2c69 100644 --- a/src/box/sql/whereexpr.c +++ b/src/box/sql/whereexpr.c @@ -1305,9 +1305,7 @@ exprAnalyze(SrcList * pSrc, /* the FROM clause */ */ if (pExpr->op == TK_NOTNULL && pExpr->pLeft->op == TK_COLUMN - && pExpr->pLeft->iColumn >= 0 - && OptimizationEnabled(db, SQLITE_Stat4) - ) { + && pExpr->pLeft->iColumn >= 0) { Expr *pNewExpr; Expr *pLeft = pExpr->pLeft; int idxNew; -- 2.15.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
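A note on the data format the patch above revolves around: the "stat" column of _sql_stat1, parsed by decode_stat_string(), is a list of space-separated decimal counts, optionally followed by the flags "unordered" and/or "noskipscan" that analysis_loader() matches with sqlite3_strglob(). The following is a minimal standalone sketch of that format, not part of the patch: the sample values are invented, and log_est() is only a rough stand-in for sqlite3LogEst(), which returns an approximation of 10*log2(x).

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Rough stand-in for sqlite3LogEst(): ~10 * log2(v). */
static int
log_est(unsigned long long v)
{
	return v == 0 ? 0 : (int)(10.0 * log2((double)v));
}

int
main(void)
{
	/*
	 * Hypothetical stat for a two-part index: 10000 tuples in
	 * total, ~100 tuples share any given value of the first
	 * part, the full key is nearly unique, and the index is
	 * additionally marked as unordered.
	 */
	const char *stat = "10000 100 1 unordered";
	char *pos = (char *)stat;
	for (int i = 0; i < 3; i++) {
		unsigned long long v = strtoull(pos, &pos, 10);
		printf("stat1[%d] = %llu, log_est = %d\n", i, v,
		       log_est(v));
	}
	while (*pos == ' ')
		pos++;
	/* Trailing flags, handled separately from the numbers. */
	printf("flags: \"%s\"\n", pos);
	return 0;
}

Compiled with -lm, this prints stat1[0] = 10000 with log_est = 132, i.e. the logarithmic form that ends up in tuple_log_est, while the raw counts end up in tuple_stat1.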
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-05-11 17:29 ` n.pettik @ 2018-05-11 22:00 ` Vladislav Shpilevoy 2018-05-14 11:52 ` n.pettik 0 siblings, 1 reply; 19+ messages in thread From: Vladislav Shpilevoy @ 2018-05-11 22:00 UTC (permalink / raw) To: n.pettik, tarantool-patches On 11/05/2018 20:29, n.pettik wrote:
> I am attaching the whole patch since it has changed significantly with region allocations.
> I deliberately didn't answer the comments concerning memory allocation policy, but
> took them into consideration when reworking the patch. I hope I didn't miss anything.
> Overall, thanks for ideas and suggestions!

Thank you very much! The patch now looks almost perfect. See 6 comments below.

> diff --git a/src/box/index_def.c b/src/box/index_def.c
> index d0d852519..1702430b3 100644
> --- a/src/box/index_def.c
> +++ b/src/box/index_def.c
> +struct index_stat *
> +index_stat_dup(const struct index_stat *src)
> +{
> + size_t size = index_stat_sizeof(src->samples, src->sample_count,
> + src->sample_field_count);
> + struct index_stat *dup = (struct index_stat *) malloc(size);
> + if (dup == NULL) {
> + diag_set(OutOfMemory, size, "malloc", "index stat");
> + return NULL;
> + }
> + memcpy(dup, src, size);
> + uint32_t array_size = src->sample_field_count * sizeof(uint32_t);
> + uint32_t stat1_offset = sizeof(struct index_stat);
> + dup->tuple_stat1 = (uint32_t *)((char*)dup + stat1_offset);
> + dup->tuple_log_est = (log_est_t *)((char*)dup + stat1_offset +
> + array_size + sizeof(uint32_t));
> + dup->avg_eq = (uint32_t *)((char*)dup + stat1_offset +
> + 2 * (array_size + sizeof(uint32_t)));
> + uint32_t samples_offset = stat1_offset + 3 * array_size +
> + 2 * sizeof(uint32_t);
> + dup->samples = (struct index_sample *)((char*)dup + samples_offset);
> + uint32_t current_sample_offset = samples_offset +
> + src->sample_count *
> + sizeof(struct index_sample);
> + for (uint32_t i = 0; i < src->sample_count; ++i) {
> + dup->samples[i].eq = (uint32_t *)((char*)dup +
> + current_sample_offset);
> + dup->samples[i].lt = (uint32_t *)((char*)dup +
> + current_sample_offset +
> + array_size);
> + dup->samples[i].dlt = (uint32_t *)((char*)dup +
> + current_sample_offset +
> + 2 * size);

1. Typo? Maybe array_size instead of size?

I slightly refactored it. Take a look. Looks more compact, and the tests pass. I wish it were my task: formatting huge malloc layouts is very interesting.
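The refactoring below boils down to the usual single-allocation idiom: compute the total size once, malloc() one chunk, then bump a single char * cursor past each sub-array in layout order. Here is a minimal sketch of the idiom with a hypothetical two-array struct (packed_new() and its fields are invented for illustration; this is not the real index_stat):

#include <stdlib.h>
#include <string.h>

/* Hypothetical header followed by two arrays in one chunk. */
struct packed {
	size_t count;
	unsigned *eq; /* count elements. */
	unsigned *lt; /* count elements. */
};

static struct packed *
packed_new(size_t count)
{
	size_t size = sizeof(struct packed) + 2 * count * sizeof(unsigned);
	struct packed *p = malloc(size);
	if (p == NULL)
		return NULL;
	memset(p, 0, size);
	p->count = count;
	/* One cursor, bumped past each array in layout order. */
	char *pos = (char *)p + sizeof(struct packed);
	p->eq = (unsigned *)pos;
	pos += count * sizeof(unsigned);
	p->lt = (unsigned *)pos;
	return p;
}

The payoff is that a single free() releases the header and all the arrays at once, and a duplicate can be made with one memcpy() of the whole chunk followed by pointer fixup - which is exactly what index_stat_dup() does.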
diff --git a/src/box/index_def.c b/src/box/index_def.c index 1702430b3..d9035e3c5 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -205,31 +205,25 @@ index_stat_dup(const struct index_stat *src) memcpy(dup, src, size); uint32_t array_size = src->sample_field_count * sizeof(uint32_t); uint32_t stat1_offset = sizeof(struct index_stat); - dup->tuple_stat1 = (uint32_t *)((char*)dup + stat1_offset); - dup->tuple_log_est = (log_est_t *)((char*)dup + stat1_offset + - array_size + sizeof(uint32_t)); - dup->avg_eq = (uint32_t *)((char*)dup + stat1_offset + - 2 * (array_size + sizeof(uint32_t))); - uint32_t samples_offset = stat1_offset + 3 * array_size + - 2 * sizeof(uint32_t); - dup->samples = (struct index_sample *)((char*)dup + samples_offset); - uint32_t current_sample_offset = samples_offset + - src->sample_count * - sizeof(struct index_sample); + char *pos = (char *) dup + stat1_offset; + dup->tuple_stat1 = (uint32_t *) pos; + pos += array_size + sizeof(uint32_t); + dup->tuple_log_est = (log_est_t *) pos; + pos += array_size + sizeof(uint32_t); + dup->avg_eq = (uint32_t *) pos; + pos += array_size; + dup->samples = (struct index_sample *) pos; + pos += src->sample_count * sizeof(struct index_sample); + for (uint32_t i = 0; i < src->sample_count; ++i) { - dup->samples[i].eq = (uint32_t *)((char*)dup + - current_sample_offset); - dup->samples[i].lt = (uint32_t *)((char*)dup + - current_sample_offset + - array_size); - dup->samples[i].dlt = (uint32_t *)((char*)dup + - current_sample_offset + - 2 * size); - dup->samples[i].sample_key = - (char *)((char*)dup + current_sample_offset + 3 * size); - current_sample_offset += current_sample_offset + - 3 * array_size + - dup->samples[i].key_size + 2; + dup->samples[i].eq = (uint32_t *) pos; + pos += array_size; + dup->samples[i].lt = (uint32_t *) pos; + pos += array_size; + dup->samples[i].dlt = (uint32_t *) pos; + pos += array_size; + dup->samples[i].sample_key = pos; + pos += dup->samples[i].key_size + 2; } return dup; } > diff --git a/src/box/index_def.h b/src/box/index_def.h > index dc8293b23..107255cc7 100644 > --- a/src/box/index_def.h > +++ b/src/box/index_def.h > @@ -173,6 +242,50 @@ struct index_def { > struct index_def * > index_def_dup(const struct index_def *def); > > +/** > + * Calculate size of index's statistics. > + * Statistics is located in memory according to following layout: > + * > + * +-------------------------+ <- Allocated memory starts here > + * | struct index_stat | > + * |-------------------------| > + * | stat1 array | ^ > + * |-------------------------| | > + * | log_est array | | 3 * array_size + 2 * uint_32 > + * |-------------------------| | > + * | avg_eq array | v > + * |-------------------------| > + * | samples struct array | > + * |-------------------------| > + * | eq | lt | dlt | key | <- Content of one sample > + * +-------------------------+ > + * ... <- Up to 24 samples > + * | eq | lt | dlt | key | > + * +-------------------------+ Nice picture. > + * > + * array_size = field_count * sizeof(uint_32) > + * offset of one sample = 3 * array_size + key_size 2. +2, no? For zeros. (See the question about them below.) > + * > + * @param samples Array of samples needed for calculating > + * size of keys. > + * @param sample_count Count of samples in samples array. > + * @param field_count Count of fields in statistics arrays. > + * @retval Size needed for statistics allocation in bytes. 
> + */ > +size_t > +index_stat_sizeof(const struct index_sample *samples, uint32_t sample_count, > + uint32_t field_count); > + > +/** > + * Duplicate index_stat object. > + * To understand memory layout see index_stat_sizeof() function. > + * > + * @param src Stat to duplicate. > + * @retval Copy of the @src. 3. Or NULL. > diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c > index 57cac437c..4f21cd25f 100644 > --- a/src/box/sql/analyze.c > +++ b/src/box/sql/analyze.c > +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, > + LogEst *stat_log) { > + char *z = (char *) stat_string; 4. Why do you need this cast? > + /* > + * Take a copy of the sample. Add two 0x00 bytes > + * the end of the buffer. This is in case the > + * sample record is corrupted. In that case, the > + * sqlite3VdbeRecordCompare() may read up to two > + * varints past the end of the allocated buffer > + * before it realizes it is dealing with a corrupt > + * record. Adding the two 0x00 bytes prevents this > + * from causing a buffer overread. 5. Are you sure that this 0x0000 is still needed? Tarantool does not store any "corrupted" things. Maybe it is a B-tree artifact? > + > +/** > + * This function performs copy of statistics. > + * In contrast to index_stat_dup(), there is no assumption > + * that source statistics is allocated within chunk. But > + * destination place is still one big chunk of heap memory. > + * See also index_stat_sizeof() for understanding memory layout. > * > - * If an OOM error occurs, this function always sets db->mallocFailed. > - * This means if the caller does not care about other errors, the return > - * code may be ignored. > + * @param dest One chunk of memory where statistics > + * will be placed. > + * @param src Statistics to be copied. > */ > -int > -sqlite3AnalysisLoad(sqlite3 * db) > +static void > +stat_copy(struct index_stat *dest, const struct index_stat *src) > { > - analysisInfo sInfo; > - HashElem *i, *j; > - char *zSql; > - int rc = SQLITE_OK; > - > - /* Clear any prior statistics */ > - for (j = sqliteHashFirst(&db->pSchema->tblHash); j; > - j = sqliteHashNext(j)) { > - Table *pTab = sqliteHashData(j); > - for (i = sqliteHashFirst(&pTab->idxHash); i; > - i = sqliteHashNext(i)) { > - Index *pIdx = sqliteHashData(i); > - pIdx->aiRowLogEst[0] = 0; > - sqlite3DeleteIndexSamples(db, pIdx); > - pIdx->aSample = 0; > - } > + assert(dest != NULL); 6. How about this diff?
diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c index 4f21cd25f..280b9132d 100644 --- a/src/box/sql/analyze.c +++ b/src/box/sql/analyze.c @@ -1721,46 +1721,39 @@ stat_copy(struct index_stat *dest, const struct index_stat *src) dest->is_unordered = src->is_unordered; uint32_t array_size = src->sample_field_count * sizeof(uint32_t); uint32_t stat1_offset = sizeof(struct index_stat); - memcpy((char *)dest + stat1_offset, src->tuple_stat1, - array_size + sizeof(uint32_t)); - memcpy((char *)dest + stat1_offset + array_size + sizeof(uint32_t), - src->tuple_log_est, array_size + sizeof(uint32_t)); - memcpy((char *)dest + stat1_offset + 2 * (array_size + sizeof(uint32_t)), - src->avg_eq, array_size); - dest->tuple_stat1 = (uint32_t *)((char *)dest + stat1_offset); - dest->tuple_log_est = (log_est_t *)((char *)dest + stat1_offset + - array_size + sizeof(uint32_t)); - dest->avg_eq = (uint32_t *)((char *)dest + stat1_offset + - 2 * (array_size + sizeof(uint32_t))); - uint32_t samples_offset = stat1_offset + 3 * array_size + - 2 * sizeof(uint32_t); - dest->samples = (struct index_sample *)((char*)dest + samples_offset); - uint32_t current_sample_offset = samples_offset + dest->sample_count * - sizeof(struct index_sample); + char *pos = (char *) dest + stat1_offset; + memcpy(pos, src->tuple_stat1, array_size + sizeof(uint32_t)); + dest->tuple_stat1 = (uint32_t *) pos; + pos += array_size + sizeof(uint32_t); + + memcpy(pos, src->tuple_log_est, array_size + sizeof(uint32_t)); + dest->tuple_log_est = (log_est_t *) pos; + pos += array_size + sizeof(uint32_t); + + memcpy(pos, src->avg_eq, array_size); + dest->avg_eq = (uint32_t *) pos; + pos += array_size; + + dest->samples = (struct index_sample *) pos; + pos += dest->sample_count * sizeof(struct index_sample); assert(src->samples != NULL); for (uint32_t i = 0; i < dest->sample_count; ++i) { dest->samples[i].key_size = src->samples[i].key_size; - memcpy((char *)dest + current_sample_offset, src->samples[i].eq, - array_size); - dest->samples[i].eq = (uint32_t *)((char *)dest + - current_sample_offset); - current_sample_offset += array_size; - memcpy((char *)dest + current_sample_offset, src->samples[i].lt, - array_size); - dest->samples[i].lt = (uint32_t *)((char *)dest + - current_sample_offset); - current_sample_offset += array_size; - memcpy((char *)dest + current_sample_offset, - src->samples[i].dlt, array_size); - dest->samples[i].dlt = (uint32_t *)((char *)dest + - current_sample_offset); - current_sample_offset += array_size; - memcpy((char *)dest + current_sample_offset, - src->samples[i].sample_key, + memcpy(pos, src->samples[i].eq, array_size); + dest->samples[i].eq = (uint32_t *) pos; + pos += array_size; + + memcpy(pos, src->samples[i].lt, array_size); + dest->samples[i].lt = (uint32_t *) pos; + pos += array_size; + + memcpy(pos, src->samples[i].dlt, array_size); + dest->samples[i].dlt = (uint32_t *) pos; + pos += array_size; + memcpy(pos, src->samples[i].sample_key, src->samples[i].key_size + 2); - dest->samples[i].sample_key = (char *)dest + - current_sample_offset; - current_sample_offset += dest->samples[i].key_size + 2; + dest->samples[i].sample_key = pos; + pos += dest->samples[i].key_size + 2; } } ^ permalink raw reply [flat|nested] 19+ messages in thread
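The layout diagram quoted earlier pins down the whole allocation, so the size computation follows mechanically. Below is a sketch of index_stat_sizeof() reconstructed from that diagram and the copy routines in this thread -- not a verbatim quote of the patch -- including the two trailing 0x00 bytes per sample that comment 2 asks about:

size_t
index_stat_sizeof(const struct index_sample *samples, uint32_t sample_count,
		  uint32_t field_count)
{
	uint32_t array_size = field_count * sizeof(uint32_t);
	/*
	 * Header, then the stat1 and log_est arrays (one extra
	 * element each), the avg_eq array and the sample
	 * descriptors.
	 */
	size_t size = sizeof(struct index_stat) + 3 * array_size +
		      2 * sizeof(uint32_t) +
		      sample_count * sizeof(struct index_sample);
	/* Per sample: eq, lt and dlt arrays, the key, two 0x00 bytes. */
	for (uint32_t i = 0; i < sample_count; ++i)
		size += 3 * array_size + samples[i].key_size + 2;
	return size;
}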
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-05-11 22:00 ` Vladislav Shpilevoy @ 2018-05-14 11:52 ` n.pettik 2018-05-14 12:54 ` Vladislav Shpilevoy 0 siblings, 1 reply; 19+ messages in thread From: n.pettik @ 2018-05-14 11:52 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy >> + dup->samples[i].dlt = (uint32_t *)((char*)dup + >> + current_sample_offset + >> + 2 * size); > > 1. Typo? Maybe array_size instead of size? I slightly refactored it. Take a look. Looks > more compact, and the tests pass. Yep, it was definitely a typo. Anyway, let's use your variant since it seems easier to understand. > I wish it was my task, huge malloc formatting is very interesting. I wish I could understand whether this is sarcasm or not... > > diff --git a/src/box/index_def.c b/src/box/index_def.c > index 1702430b3..d9035e3c5 100644 > --- a/src/box/index_def.c > +++ b/src/box/index_def.c > @@ -205,31 +205,25 @@ index_stat_dup(const struct index_stat *src) > memcpy(dup, src, size); > uint32_t array_size = src->sample_field_count * sizeof(uint32_t); > uint32_t stat1_offset = sizeof(struct index_stat); > - dup->tuple_stat1 = (uint32_t *)((char*)dup + stat1_offset); > - dup->tuple_log_est = (log_est_t *)((char*)dup + stat1_offset + > - array_size + sizeof(uint32_t)); > - dup->avg_eq = (uint32_t *)((char*)dup + stat1_offset + > - 2 * (array_size + sizeof(uint32_t))); > - uint32_t samples_offset = stat1_offset + 3 * array_size + > - 2 * sizeof(uint32_t); > - dup->samples = (struct index_sample *)((char*)dup + samples_offset); > - uint32_t current_sample_offset = samples_offset + > - src->sample_count * > - sizeof(struct index_sample); > + char *pos = (char *) dup + stat1_offset; > + dup->tuple_stat1 = (uint32_t *) pos; > + pos += array_size + sizeof(uint32_t); > + dup->tuple_log_est = (log_est_t *) pos; > + pos += array_size + sizeof(uint32_t); > + dup->avg_eq = (uint32_t *) pos; > + pos += array_size; > + dup->samples = (struct index_sample *) pos; > + pos += src->sample_count * sizeof(struct index_sample); > + > for (uint32_t i = 0; i < src->sample_count; ++i) { > - dup->samples[i].eq = (uint32_t *)((char*)dup + > - current_sample_offset); > - dup->samples[i].lt = (uint32_t *)((char*)dup + > - current_sample_offset + > - array_size); > - dup->samples[i].dlt = (uint32_t *)((char*)dup + > - current_sample_offset + > - 2 * size); > - dup->samples[i].sample_key = > - (char *)((char*)dup + current_sample_offset + 3 * size); > - current_sample_offset += current_sample_offset + > - 3 * array_size + > - dup->samples[i].key_size + 2; > + dup->samples[i].eq = (uint32_t *) pos; > + pos += array_size; > + dup->samples[i].lt = (uint32_t *) pos; > + pos += array_size; > + dup->samples[i].dlt = (uint32_t *) pos; > + pos += array_size; > + dup->samples[i].sample_key = pos; > + pos += dup->samples[i].key_size + 2; > } > return dup; > } Ok, your way is better. >> +/** >> + * Duplicate index_stat object. >> + * To understand memory layout see index_stat_sizeof() function. >> + * >> + * @param src Stat to duplicate. >> + * @retval Copy of the @src. > > 3. Or NULL. Ok, added clarification. >> diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c >> index 57cac437c..4f21cd25f 100644 >> --- a/src/box/sql/analyze.c >> +++ b/src/box/sql/analyze.c >> +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, >> + LogEst *stat_log) { >> + char *z = (char *) stat_string; > > 4. Why do you need this cast?
Because the implicit cast discards the const modifier and leads to a compilation error. > >> + /* >> + * Take a copy of the sample. Add two 0x00 bytes >> + * the end of the buffer. This is in case the >> + * sample record is corrupted. In that case, the >> + * sqlite3VdbeRecordCompare() may read up to two >> + * varints past the end of the allocated buffer >> + * before it realizes it is dealing with a corrupt >> + * record. Adding the two 0x00 bytes prevents this >> + * from causing a buffer overread. > > 5. Are you sure that this 0x0000 is still needed? Tarantool does not store > any "corrupted" things. Maybe it is a B-tree artifact? > Well, no corruption means no zero bytes. I have removed mentions of these bytes both from the comments and the code. Everything seems to be OK. Also, the functions which are supposed to rely on this condition (i.e. the 0x00 bytes) are unused, so I will remove them in a separate commit. > > 6. How about this diff? Pretty nice, I have updated my branch with your changes. ^ permalink raw reply [flat|nested] 19+ messages in thread
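The cast question above is ordinary C const propagation; a minimal standalone illustration of the three variants discussed in this exchange:

const char *stat_string = "120 40 10";
char *a = stat_string;          /* rejected: discards the const qualifier */
char *b = (char *) stat_string; /* compiles, but needlessly casts const away */
const char *z = stat_string;    /* clean: no cast needed */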
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-05-14 11:52 ` n.pettik @ 2018-05-14 12:54 ` Vladislav Shpilevoy 2018-05-14 13:55 ` n.pettik 0 siblings, 1 reply; 19+ messages in thread From: Vladislav Shpilevoy @ 2018-05-14 12:54 UTC (permalink / raw) To: n.pettik, tarantool-patches Hello. Thanks for the fixes! On 14/05/2018 14:52, n.pettik wrote: > >> I wish it was my task, huge malloc formatting is very interesting. > > I wish I could understand whether this is sarcasm or not... Not sarcasm. > >>> diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c >>> index 57cac437c..4f21cd25f 100644 >>> --- a/src/box/sql/analyze.c >>> +++ b/src/box/sql/analyze.c >>> +decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, >>> + LogEst *stat_log) { >>> + char *z = (char *) stat_string; >> >> 4. Why do you need this cast? > > Because the implicit cast discards the const modifier and leads to a compilation error. No, I checked that. The code below compiles fine. So you can remove the cast and the z variable. diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c index 4f21cd25f..061fcc7ab 100644 --- a/src/box/sql/analyze.c +++ b/src/box/sql/analyze.c @@ -1230,7 +1230,7 @@ struct analysis_index_info { static void decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, LogEst *stat_log) { - char *z = (char *) stat_string; + const char *z = stat_string; if (z == NULL) z = ""; for (int i = 0; *z && i < stat_size; i++) { ^ permalink raw reply [flat|nested] 19+ messages in thread
* [tarantool-patches] Re: [PATCH 4/4] sql: move SQL statistics to server 2018-05-14 12:54 ` Vladislav Shpilevoy @ 2018-05-14 13:55 ` n.pettik 0 siblings, 0 replies; 19+ messages in thread From: n.pettik @ 2018-05-14 13:55 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy > No, I checked that. The code below compiles fine. So you can remove the cast and > the z variable. > > diff --git a/src/box/sql/analyze.c b/src/box/sql/analyze.c > index 4f21cd25f..061fcc7ab 100644 > --- a/src/box/sql/analyze.c > +++ b/src/box/sql/analyze.c > @@ -1230,7 +1230,7 @@ struct analysis_index_info { > static void > decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact, > LogEst *stat_log) { > - char *z = (char *) stat_string; > + const char *z = stat_string; > if (z == NULL) > z = ""; > for (int i = 0; *z && i < stat_size; i++) { Ok, I misunderstood you. I thought you meant something like: char *z = stat_string; Fixed this point as well. ^ permalink raw reply [flat|nested] 19+ messages in thread
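Putting these fixes together, a hedged reconstruction of the full decoder, modeled on SQLite's decodeIntArray(); everything past the quoted fragment (the digit-accumulation loop and the LogEst conversion) is inferred rather than quoted from the patch:

static void
decode_stat_string(const char *stat_string, int stat_size, tRowcnt *stat_exact,
		   LogEst *stat_log)
{
	const char *z = stat_string;
	if (z == NULL)
		z = "";
	for (int i = 0; *z && i < stat_size; i++) {
		tRowcnt v = 0;
		int c;
		/* Accumulate one decimal number from the stat string. */
		while ((c = z[0]) >= '0' && c <= '9') {
			v = v * 10 + c - '0';
			z++;
		}
		if (stat_exact != NULL)
			stat_exact[i] = v;
		if (stat_log != NULL)
			stat_log[i] = sqlite3LogEst(v);
		if (*z == ' ')
			z++;
	}
}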
* [tarantool-patches] Re: [PATCH 0/4] Move original SQLite's statistics to server 2018-04-23 20:29 [tarantool-patches] [PATCH 0/4] Move original SQLite's statistics to server Nikita Pettik ` (3 preceding siblings ...) 2018-04-23 20:29 ` [tarantool-patches] [PATCH 4/4] sql: move SQL statistics to server Nikita Pettik @ 2018-05-14 14:12 ` Vladislav Shpilevoy 2018-05-15 13:42 ` Kirill Yukhin 4 siblings, 1 reply; 19+ messages in thread From: Vladislav Shpilevoy @ 2018-05-14 14:12 UTC (permalink / raw) To: Nikita Pettik, tarantool-patches Now the entire patchset LGTM. On 23/04/2018 23:29, Nikita Pettik wrote: > Branch: https://github.com/tarantool/tarantool/tree/np/gh-3253-move-statistics-to-server > Issue: https://github.com/tarantool/tarantool/issues/3253 > > This patch-set is about transfering statistics containing in > SQL specific struct Table and struct Index to the struct index of > Tarantool. Unused statistics are removed at all. > > First patch of the series provides slight optimization of simple > 'SELECT COUNT(*)' query compilation. There is no need to hesistate which > index to choose: always use PK. It also allows to remove one usage of > statistics from index. > > Second patch removes obsolete SQLite calculation of average tuple size: > it is always possible to get size of space, count of tuples and divide > them. > > Third patch adds simple wrapper to fetch tuple count of given > table from server. > > The last and the main one, introduces new member of index's opts to > describe statistics concerning distribution of tuples. Within this patch, > routine used to load statistics from _sql_stat1 and _sql_stat4 has been > reworked to operate directly on Tarantool's structs. It also includes > codestyle fixes. > > > Nikita Pettik (4): > sql: optimize compilation of SELECT COUNT(*) > sql: add average tuple size calculation > sql: refactor usages of table's tuple count > sql: move SQL statistics to server > > src/box/index_def.c | 92 +++++ > src/box/index_def.h | 90 +++++ > src/box/sql/analyze.c | 787 ++++++++++++++++++++--------------------- > src/box/sql/build.c | 107 ------ > src/box/sql/parse.y | 2 - > src/box/sql/pragma.c | 25 +- > src/box/sql/prepare.c | 7 +- > src/box/sql/select.c | 210 +++++------ > src/box/sql/sqliteInt.h | 123 +++++-- > src/box/sql/vdbe.c | 7 +- > src/box/sql/where.c | 231 ++++++++---- > src/box/sql/whereexpr.c | 4 +- > test/sql-tap/analyze9.test.lua | 81 +++-- > test/sql-tap/eqp.test.lua | 2 +- > 14 files changed, 954 insertions(+), 814 deletions(-) > ^ permalink raw reply [flat|nested] 19+ messages in thread
* [tarantool-patches] Re: [PATCH 0/4] Move original SQLite's statistics to server 2018-05-14 14:12 ` [tarantool-patches] Re: [PATCH 0/4] Move original SQLite's " Vladislav Shpilevoy @ 2018-05-15 13:42 ` Kirill Yukhin 0 siblings, 0 replies; 19+ messages in thread From: Kirill Yukhin @ 2018-05-15 13:42 UTC (permalink / raw) To: tarantool-patches; +Cc: Nikita Pettik Hello Vlad, Nikita, On 14 May 17:12, Vladislav Shpilevoy wrote: > Now the entire patchset LGTM. I've checked the patch set into the 2.0 branch. -- Regards, Kirill Yukhin ^ permalink raw reply [flat|nested] 19+ messages in thread