[Tarantool-patches] [PATCH v3 1/1] sql: enable autoindex optimization
Nikita Pettik
korablev at tarantool.org
Tue Oct 13 17:08:34 MSK 2020
On 11 Oct 13:48, imeevma at tarantool.org wrote:
> From: Kirill Yukhin <kyukhin at tarantool.org>
>
> This patch enables "autoindex" optimization. This optimization uses an
> ephemeral space that contains all the fields used in the query. This
Не очень понимаю, что значит "все поля в запросе"..? Это как?:)
Ты имеешь ввиду все поля result-set'a? Или объединение всех полей
всех таблиц используемых в запросе?
> ephemeral space is called "autoindex". Autoindex may have fewer fields
> than the original space, and the order of these fields in autoindex may
Опять: original space - это какой спейс? В запросе может участвовать
много таблиц..
Еще имхо какая-то путаница выходит: называем эфемерные спейсы автоиндексами.
Я бы обозвал это как-нибудь по-другому. Это же просто еще одно использование
эфемерных таблиц, не более.
> be different from their order in the original space. In addition to the
> mentioned fields, autoindex contains a rowid.
>
> The primary key of autoindex is a covering index, i.e. it contains all
> of its fields.
В мемтиксе все индексы покрывающие. И 'covering' - свойство для каждого
запроса отдельно взятого, а не индекса как такового.
>
> Currently this optimization mainly used in cases when values SELECTED
> from more than one space and WHERE specified.
>
> At the moment, this optimization does not take into account the number
> of tuples in spaces. This should be fixed.
>
> Closes #4933
>
> Co-authored-by: Mergen Imeev <imeevma at gmail.com>
> Co-authored-by: Timur Safin <tsafin at tarantool.org>
> ---
> https://github.com/tarantool/tarantool/issues/4933
> https://github.com/tarantool/tarantool/tree/imeevma/gh-4933-autoindex
>
> @ChangeLog
> - "autoindex" optimization is now enabled (gh-4933).
МНе кажется, можно чуть больше написать про такую важную штуку
> src/box/CMakeLists.txt | 2 +-
> src/box/sql.c | 2 +-
> src/box/sql/delete.c | 28 ----
> src/box/sql/sqlInt.h | 35 -----
> src/box/sql/where.c | 238 +++++++++++++++++++++----------
> src/box/sql/wherecode.c | 13 +-
> test/sql-tap/autoindex1.test.lua | 115 +++++++++++++++
> test/sql-tap/autoindex4.test.lua | 14 +-
> test/sql-tap/whereF.test.lua | 16 ++-
> test/sql/misc.result | 35 +++++
> test/sql/misc.test.lua | 14 ++
> 11 files changed, 358 insertions(+), 154 deletions(-)
> create mode 100755 test/sql-tap/autoindex1.test.lua
>
> diff --git a/src/box/sql.c b/src/box/sql.c
> index 000246ee9..a551bffc3 100644
> --- a/src/box/sql.c
> +++ b/src/box/sql.c
> @@ -719,7 +719,7 @@ int
> tarantoolsqlIdxKeyCompare(struct BtCursor *cursor,
> struct UnpackedRecord *unpacked)
> {
> - assert(cursor->curFlags & BTCF_TaCursor);
> + assert(cursor->curFlags & (BTCF_TaCursor | BTCF_TEphemCursor));
> assert(cursor->iter != NULL);
> assert(cursor->last_tuple != NULL);
>
> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index e9e936856..6ab2be3ba 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -683,7 +683,6 @@ translateColumnToCopy(Vdbe * v, /* The VDBE containing code to translate */
> }
> +/**
> + * Generate a code that will create a tuple, which will be inserted in the
> + * autoindex. The created tuple consists of rowid and fields described in the
> + * index key description.
> + *
> + * @param parse Parsing context.
> + * @param key_def The index key description.
> + * @param cursor Cursor of space where we will get values for tuple.
> + * @param reg_out Register to which the created tuple will be placed.
> + * @param reg_eph Register holding pointer to autoindex.
> + *
> + * @return Return a register number which is the first in a block of registers
> + * that holds the elements of the created tuple. The block of registers has
> + * already been deallocated by the time this routine returns.
Не очень понял, зачем возвращать уже деаллоцированные регистры..
> + */
> +static int
> +vdbe_emit_create_autoindex_tuple(struct Parse *parse,
> + const struct key_def *key_def,
> + int cursor, int reg_out, int reg_eph)
> +{
> + assert(reg_out != 0);
> + struct Vdbe *v = parse->pVdbe;
> + int col_cnt = key_def->part_count;
> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
> + for (int j = 0; j < col_cnt; j++) {
> + uint32_t tabl_col = key_def->parts[j].fieldno;
> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
> + }
> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
> + sqlReleaseTempRange(parse, reg_base, col_cnt + 1);
> + return reg_base;
> +}
> +
> /*
> - * Generate code to construct the Index object for an automatic index
> - * and to set up the WhereLevel object pLevel so that the code generator
> - * makes use of the automatic index.
> + * Generate code to construct the ephemeral space that contains all used in
> + * query fields of one of the tables. This ephemeral space will be known as an
Опять: какой таблицы..Очень запутывающий коммент имхо
> + * "autoindex". PK of this ephemeral space is a covering index. Also, this
> + * functions set up the WhereLevel object pLevel so that the code generator
> + * makes use of the auto-index.
> */
> static void
> constructAutomaticIndex(Parse * pParse, /* The parsing context */
> }
> * Estimate the location of a particular key among all keys in an
> -#ifndef SQL_OMIT_AUTOMATIC_INDEX
> /* Automatic indexes */
> - LogEst rSize = pTab->nRowLogEst;
> + rSize = DEFAULT_TUPLE_LOG_COUNT;
> + /* Increase cost of autoindex if number of tuples in space < ~10000. */
10k как-то маловато. В скулайте, если я правильно помню, какие-то подвижки
начинаются со 100к affected rows.
> + if (!space->def->opts.is_view && sql_space_tuple_log_count(space) < 133)
> + rSize += DEFAULT_TUPLE_LOG_COUNT;
Почему для представления это не работает?
> LogEst rLogSize = estLog(rSize);
> if (!pBuilder->pOrSet && /* Not pqart of an OR optimization */
> (pWInfo->wctrlFlags & WHERE_OR_SUBCLAUSE) == 0 &&
> (pWInfo->pParse->sql_flags & SQL_AutoIndex) != 0 &&
> pSrc->pIBIndex == 0 /* Has no INDEXED BY clause */
> && !pSrc->fg.notIndexed /* Has no NOT INDEXED clause */
> &&!pSrc->fg.isCorrelated /* Not a correlated subquery */
> && !pSrc->fg.isRecursive /* Not a recursive common table expression. */
> ) {
> @@ -2829,24 +2910,10 @@ tnt_error:
> if (termCanDriveIndex(pTerm, pSrc, 0)) {
> pNew->nEq = 1;
> pNew->nSkip = 0;
> - pNew->pIndex = 0;
> + pNew->index_def = NULL;
> pNew->nLTerm = 1;
> pNew->aLTerm[0] = pTerm;
> - /* TUNING: One-time cost for computing the automatic index is
> - * estimated to be X*N*log2(N) where N is the number of rows in
> - * the table being indexed and where X is 7 (LogEst=28) for normal
> - * tables or 1.375 (LogEst=4) for views and subqueries. The value
> - * of X is smaller for views and subqueries so that the query planner
> - * will be more aggressive about generating automatic indexes for
> - * those objects, since there is no opportunity to add schema
> - * indexes on subqueries and views.
> - */
> - pNew->rSetup = rLogSize + rSize + 4;
> - if (!pTab->def->opts.is_view &&
> - pTab->def->id == 0)
> - pNew->rSetup += 24;
> - if (pNew->rSetup < 0)
> - pNew->rSetup = 0;
> + pNew->rSetup = rLogSize + rSize;
Можешь объяснить, почему тут произошли изменения? То есть почему оптимизация
стала более агрессивной?
> /* TUNING: Each index lookup yields 20 rows in the table. This
> * is more than the usual guess of 10 rows, since we have no way
> * of knowing how selective the index will ultimately be. It would
> diff --git a/src/box/sql/wherecode.c b/src/box/sql/wherecode.c
> index 6d8768865..0d58b4701 100644
> --- a/src/box/sql/wherecode.c
> +++ b/src/box/sql/wherecode.c
> @@ -219,12 +219,12 @@ sqlWhereExplainOneScan(Parse * pParse, /* Parse context */
>
> assert(!(flags & WHERE_AUTO_INDEX)
> || (flags & WHERE_IDX_ONLY));
> - if (idx_def->iid == 0) {
> + if ((flags & WHERE_AUTO_INDEX) != 0) {
> + zFmt = "AUTOMATIC COVERING INDEX";
Я бы оставил просто automatic index
> + } else if (idx_def->iid == 0) {
> if (isSearch) {
> zFmt = "PRIMARY KEY";
> }
> - } else if (flags & WHERE_AUTO_INDEX) {
> - zFmt = "AUTOMATIC COVERING INDEX";
> } else if (flags & WHERE_IDX_ONLY) {
> zFmt = "COVERING INDEX %s";
> } else {
> @@ -1047,8 +1048,6 @@ sqlWhereCodeOneLoopStart(WhereInfo * pWInfo, /* Complete information about the W
> startEq = 0;
> start_constraints = 1;
> }
> - struct index_def *idx_pk = space->index[0]->def;
> - uint32_t pk_part_count = idx_pk->key_def->part_count;
> /*
> * Tarantool's iterator over integer fields doesn't
> * tolerate floating point values. Hence, if term
> @@ -1234,6 +1233,8 @@ sqlWhereCodeOneLoopStart(WhereInfo * pWInfo, /* Complete information about the W
> if (omitTable) {
> /* pIdx is a covering index. No need to access the main table. */
> } else if (iCur != iIdxCur) {
> + struct index_def *idx_pk = space->index[0]->def;
> + uint32_t pk_part_count = idx_pk->key_def->part_count;
Этот дифф точно нужен для патча?
> int iKeyReg = sqlGetTempRange(pParse, pk_part_count);
> for (j = 0; j < (int) pk_part_count; j++) {
> k = idx_pk->key_def->parts[j].fieldno;
> diff --git a/test/sql-tap/autoindex1.test.lua b/test/sql-tap/autoindex1.test.lua
> new file mode 100755
> index 000000000..42b995dd3
> --- /dev/null
> +++ b/test/sql-tap/autoindex1.test.lua
> @@ -0,0 +1,115 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(7)
> +
> +-- 2010 April 07
> +
> +-- The author disclaims copyright to this source code. In place of
> +-- a legal notice, here is a blessing:
> +
> +-- May you do good and not evil.
> +-- May you find forgiveness for yourself and forgive others.
> +-- May you share freely, never taking more than you give.
> +
Необязательно включать эти странные хидеры в наши тесты..
> +---------------------------------------------------------------------------
> +-- This file implements regression tests for sql library. The focus of this
> +-- script is testing automatic index creation logic.
> +
> +test:execsql([[
> + CREATE TABLE t1(a INT, b INT PRIMARY KEY);
> + INSERT INTO t1 VALUES(1, 11);
> + INSERT INTO t1 VALUES(2, 22);
> + INSERT INTO t1 SELECT a + 2, b + 22 FROM t1;
> + INSERT INTO t1 SELECT a + 4, b + 44 FROM t1;
> + CREATE TABLE t2(c INT, d INT PRIMARY KEY);
> +]])
> +
> +for i = 1, 11000 do test:execsql("INSERT INTO t2 VALUES ("..i..", "..i..");") end
Это быстро работает? Может сделать тест long-run?
Было бы еще неплохо к каждой группе тестов коммент, поясняющий почему
с эфмереной табличкой будет быстрее.
> +test:do_execsql_test(
> + "autoindex-1.1", [[
> + EXPLAIN QUERY PLAN SELECT b, (SELECT d FROM t2 WHERE c = a) FROM t1;
> + ]], {
> + 0,0,0,"SCAN TABLE T1 (~1048576 rows)",
> + 0,0,0,"EXECUTE CORRELATED SCALAR SUBQUERY 1",
> + 1,0,0,"SEARCH TABLE T2 USING AUTOMATIC COVERING INDEX (C=?) (~20 rows)"
> + })
> +
> +result = test:execsql([[SELECT b, (SELECT d FROM t2 WHERE c = a) FROM t1;]])
> +
> +test:do_execsql_test(
> + "autoindex-1.2", [[
> + EXPLAIN QUERY PLAN SELECT b, d FROM t1 JOIN t2 ON a = c ORDER BY b;
> + ]], {
> + 0,0,0,"SCAN TABLE T1 (~1048576 rows)",
> + 0,1,1,"SEARCH TABLE T2 USING AUTOMATIC COVERING INDEX (C=?) (~20 rows)"
> + })
> +
> +test:do_execsql_test(
> + "autoindex-1.3", [[
> + SELECT b, d FROM t1 JOIN t2 ON a = c ORDER BY b;
> + ]], result)
> +
> +test:do_execsql_test(
> + "autoindex-1.4", [[
> + EXPLAIN QUERY PLAN SELECT b, d FROM t1 CROSS JOIN t2 ON (c = a);
> + ]], {
> + 0,0,0,"SCAN TABLE T1 (~1048576 rows)",
> + 0,1,1,"SEARCH TABLE T2 USING AUTOMATIC COVERING INDEX (C=?) (~20 rows)"
> + })
> +
> +test:do_execsql_test(
> + "autoindex-1.5", [[
> + SELECT b, d FROM t1 CROSS JOIN t2 ON (c = a);
> + ]], result)
> +
> +test:execsql([[
> + CREATE TABLE t3(i INT PRIMARY KEY, a INT, b INT);
> +]])
> +
> +for i = 1, 11000 do test:execsql("INSERT INTO t3 VALUES ("..i..", "..i..", "..(i + 1)..");") end
> +
> +test:do_execsql_test(
> + "autoindex-1.6", [[
> + SELECT count(*)
> + FROM t3 AS x1
> + JOIN t3 AS x2 ON x2.a=x1.b
> + JOIN t3 AS x3 ON x3.a=x2.b
> + JOIN t3 AS x4 ON x4.a=x3.b
> + JOIN t3 AS x5 ON x5.a=x4.b
> + JOIN t3 AS x6 ON x6.a=x5.b
> + JOIN t3 AS x7 ON x7.a=x6.b
> + JOIN t3 AS x8 ON x8.a=x7.b
> + JOIN t3 AS x9 ON x9.a=x8.b
> + JOIN t3 AS x10 ON x10.a=x9.b;
> + ]], {
> + 10991
> + })
> +
> +test:do_execsql_test(
> + "autoindex-1.7", [[
> + EXPLAIN QUERY PLAN SELECT count(*)
> + FROM t3 AS x1
> + JOIN t3 AS x2 ON x2.a=x1.b
> + JOIN t3 AS x3 ON x3.a=x2.b
> + JOIN t3 AS x4 ON x4.a=x3.b
> + JOIN t3 AS x5 ON x5.a=x4.b
> + JOIN t3 AS x6 ON x6.a=x5.b
> + JOIN t3 AS x7 ON x7.a=x6.b
> + JOIN t3 AS x8 ON x8.a=x7.b
> + JOIN t3 AS x9 ON x9.a=x8.b
> + JOIN t3 AS x10 ON x10.a=x9.b;
> + ]], {
> + 0,0,0,"SCAN TABLE T3 AS X1 (~1048576 rows)",
> + 0,1,1,"SEARCH TABLE T3 AS X2 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,2,2,"SEARCH TABLE T3 AS X3 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,3,3,"SEARCH TABLE T3 AS X4 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,4,4,"SEARCH TABLE T3 AS X5 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,5,5,"SEARCH TABLE T3 AS X6 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,6,6,"SEARCH TABLE T3 AS X7 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,7,7,"SEARCH TABLE T3 AS X8 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,8,8,"SEARCH TABLE T3 AS X9 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)",
> + 0,9,9,"SEARCH TABLE T3 AS X10 USING AUTOMATIC COVERING INDEX (A=?) (~20 rows)"
> + })
> +
> +test:finish_test()
> diff --git a/test/sql-tap/autoindex4.test.lua b/test/sql-tap/autoindex4.test.lua
> index 46488f13a..e22f49c67 100755
> --- a/test/sql-tap/autoindex4.test.lua
> +++ b/test/sql-tap/autoindex4.test.lua
> @@ -1,6 +1,6 @@
> #!/usr/bin/env tarantool
> test = require("sqltester")
> -test:plan(7)
> +test:plan(8)
>
> --!./tcltestrunner.lua
> -- 2014-10-24
> @@ -75,6 +75,18 @@ test:do_execsql_test(
> -- </autoindex4-1.4>
> })
>
> +test:do_execsql_test(
> + "autoindex4-2.0",
> + [[
> + CREATE TABLE t3(i INT PRIMARY KEY, e INT, f INT);
> + INSERT INTO t3 VALUES(1, 123,654), (2, 555,444), (3, 234,987);
> + SELECT (SELECT count(*) FROM t1, t2 WHERE a=e AND x=f), e, f, '|' FROM t3 ORDER BY i;
> + ]], {
> + -- <autoindex4-2.0>
> + 1, 123, 654, "|", 0, 555, 444, "|", 4, 234, 987, "|"
> + -- </autoindex4-2.0>
> + })
> +
> -- do_execsql_test autoindex4-2.0 {
> -- CREATE TABLE t3(e INT,f INT);
> -- INSERT INTO t3 VALUES(123,654),(555,444),(234,987);
> diff --git a/test/sql-tap/whereF.test.lua b/test/sql-tap/whereF.test.lua
> index 5a894b748..3235df437 100755
> --- a/test/sql-tap/whereF.test.lua
> +++ b/test/sql-tap/whereF.test.lua
> @@ -90,10 +90,20 @@ test:do_execsql_test(
>
> -- for _ in X(0, "X!foreach", [=[["tn sql","\n 1 \"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 2 \"SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 3 \"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n"]]=]) do
> for tn, sql in ipairs({"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
> - "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
> - "SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
> + "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
> test:do_test(
> - "2."..tn,
> + "2.1."..tn,
> + function()
> + return test:execsql("EXPLAIN QUERY PLAN "..sql)
> + end, {
> + '/SEARCH TABLE T1/',
> + '/SEARCH TABLE T2/'
> + })
> +end
> +
> +for tn, sql in ipairs({"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
> + test:do_test(
> + "2.2."..tn,
> function()
> return test:execsql("EXPLAIN QUERY PLAN "..sql)
> end, {
> diff --git a/test/sql/misc.result b/test/sql/misc.result
> index 6af11bfba..b52a83bda 100644
> --- a/test/sql/misc.result
> +++ b/test/sql/misc.result
> @@ -204,3 +204,38 @@ box.execute('SELECT field66, field68, field70 FROM test')
> box.space.TEST:drop()
> ---
> ...
> +-- gh-4933: Make sure that autoindex optimization is used.
> +box.execute('CREATE TABLE t1(i INT PRIMARY KEY, a INT);')
> +---
> +- row_count: 1
> +...
> +box.execute('CREATE TABLE t2(i INT PRIMARY KEY, b INT);')
> +---
> +- row_count: 1
> +...
> +for i = 1, 11000 do\
> + box.execute('INSERT INTO t1 VALUES ($1, $1);', {i})\
> + box.execute('INSERT INTO t2 VALUES ($1, $1);', {i})\
> +end
> +---
> +...
> +--
> +-- There is no need to insert values in the tables since planner assumes a
> +-- default number of tuples for each space, regardless of how many tuples there
> +-- actually are in those spaces. The default value is 1048576 (== 2^20).
> +--
> +box.execute('EXPLAIN QUERY PLAN SELECT a, b FROM t1, t2 WHERE a = b;')
> +---
> +- metadata:
> + - name: selectid
> + type: integer
> + - name: order
> + type: integer
> + - name: from
> + type: integer
> + - name: detail
> + type: text
> + rows:
> + - [0, 0, 0, 'SCAN TABLE T1 (~1048576 rows)']
> + - [0, 1, 1, 'SEARCH TABLE T2 USING AUTOMATIC COVERING INDEX (B=?) (~20 rows)']
> +...
> diff --git a/test/sql/misc.test.lua b/test/sql/misc.test.lua
> index 63244228c..d7673bb49 100644
> --- a/test/sql/misc.test.lua
> +++ b/test/sql/misc.test.lua
> @@ -64,3 +64,17 @@ box.execute('SELECT field70, field64 FROM test')
> pk:alter({parts = {66}})
> box.execute('SELECT field66, field68, field70 FROM test')
> box.space.TEST:drop()
> +
> +-- gh-4933: Make sure that autoindex optimization is used.
> +box.execute('CREATE TABLE t1(i INT PRIMARY KEY, a INT);')
> +box.execute('CREATE TABLE t2(i INT PRIMARY KEY, b INT);')
> +for i = 1, 11000 do\
> + box.execute('INSERT INTO t1 VALUES ($1, $1);', {i})\
> + box.execute('INSERT INTO t2 VALUES ($1, $1);', {i})\
> +end
> +--
> +-- There is no need to insert values in the tables since planner assumes a
> +-- default number of tuples for each space, regardless of how many tuples there
> +-- actually are in those spaces. The default value is 1048576 (== 2^20).
> +--
> +box.execute('EXPLAIN QUERY PLAN SELECT a, b FROM t1, t2 WHERE a = b;')
> --
> 2.25.1
>
More information about the Tarantool-patches
mailing list