> Also, even though we get a whole tuple, we actually use only needed columns
> from the tuple to create a new tuple and push it to the new ephemeral space.
Если для тапла в эфемерном спейсе хранятся не все колонки оригинального спейса,
то это не covering с точки зрения оригинального спейса.
>>> + sqlVdbeAddOp4(v, OP_OpenTEphemeral, reg_eph, nKeyCol + 1, 0,
>>> + (char *)pk_info, P4_KEYINFO);
>>> + sqlVdbeAddOp3(v, OP_IteratorOpen, pLevel->iIdxCur, 0, reg_eph);
>>> + VdbeComment((v, "for %s", space->def->name));
>>>
>>> /* Fill the automatic index with content */
>>> sqlExprCachePush(pParse);
>>> @@ -2841,12 +2885,12 @@ tnt_error:
>>> * those objects, since there is no opportunity to add schema
>>> * indexes on subqueries and views.
>>> */
>>> - pNew->rSetup = rLogSize + rSize + 4;
>>> - if (!pTab->def->opts.is_view &&
>>> - pTab->def->id == 0)
>>> - pNew->rSetup += 24;
>>> - if (pNew->rSetup < 0)
>>> - pNew->rSetup = 0;
>>> + pNew->rSetup = rLogSize + rSize;
>>> + if (!space->def->opts.is_view &&
>>> + space->def->id == 0)
>>> + pNew->rSetup += 28;
>>> + else
>>> + pNew->rSetup -= 10;
>>
>> 10. What is this? What are the numbers?
> These numbers I got from SQLite. Actually, we should find these numbers
> experementally, but I still do not have a proper way to do this.
>
> There is one more place where we should experemetally find a function
> of number of tuples in the space which allows us to decide more carefully when
> we want to use the optimization. Here it is:
> "rSize = DEFAULT_TUPLE_LOG_COUNT;"
>
> As you can see, this function is a constant for now. This is also from SQLite.
>
> In general, we should use statistics here, but we do not have it at the moment.
>
> These numbers are used to decide when to use auto-index. I think that since at
> the moment all these values are constants, auto-index is used always. This is
> obviously wrong. I plan to fix it as soon as I find a proper way to test
> performance.
Здесь нужен как минимум комментарий в коде, что это за числа. Тот комментарий, что
есть, очевидно устарел. Он говорит, что стоимость для view - 4, а у тебя - -10. То
есть для view ты считаешь, что лучше создать авто-индекс, чем не создавать, всегда.
Нет? Если так, то отрицательная стоимость выглядит не очень правильно.
>>> /* TUNING: Each index lookup yields 20 rows in the table. This
>>> * is more than the usual guess of 10 rows, since we have no way
>>> * of knowing how selective the index will ultimately be. It would
>>> @@ -4794,18 +4838,34 @@ sqlWhereEnd(WhereInfo * pWInfo)
>>> + continue;
>>> + }
>>> + /*
>>> + * In case we are using auto-indexing, we have
>>> + * to change the column number, because in
>>> + * ephemeral space columns are in the same order
>>> + * as they described in key_def. So, instead of
>>> + * the field number, we have to insert index of
>>> + * the part with this fieldno.
>>
>> 12. Why are the columns reordered?
> See above. Should I add this explanation somewhere is the patch?
Если в авто-индексе только ключевые колонки, от откуда в result set попадают
неключевые колонки, которые в WHERE не участвовали? Делается обратно лукап
в оригинальном спейсе что ли?
>>> + */
>>> + struct key_def *key_def =
>>> + pLevel->pWLoop->index_def->key_def;
>>> + uint32_t part_count = key_def->part_count;
>>> + for (uint32_t i = 0; i < part_count; ++i) {
>>> + if ((int)key_def->parts[i].fieldno == x)
>>> + pOp->p2 = i;
>>> }
>>> }
>>> }
>>> diff --git a/test/sql-tap/whereF.test.lua b/test/sql-tap/whereF.test.lua
>>> index 5a894b748..3235df437 100755
>>> --- a/test/sql-tap/whereF.test.lua
>>> +++ b/test/sql-tap/whereF.test.lua
>>> @@ -90,10 +90,20 @@ test:do_execsql_test(
>>>
>>> -- for _ in X(0, "X!foreach", [=[["tn sql","\n 1 \"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 2 \"SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 3 \"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n"]]=]) do
>>> for tn, sql in ipairs({"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>> - "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>> - "SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>> + "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>> test:do_test(
>>> - "2."..tn,
>>> + "2.1."..tn,
>>> + function()
>>> + return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>> + end, {
>>> + '/SEARCH TABLE T1/',
>>> + '/SEARCH TABLE T2/'
>>> + })
>>> +end
>>> +
>>> +for tn, sql in ipairs({"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>> + test:do_test(
>>> + "2.2."..tn,
>>> function()
>>> return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>> end, {
>>
>> 14. Shouldn't there be a test showing 'AUTOMATIC' keyword in the execution plan?
> Actually it does, in the test above.
Во всем whereF.test.lua файле слово automatic грепается только в каком-то комменте.
> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index c5f79f908..f6b533ab1 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
> #endif
>
> #ifndef SQL_OMIT_AUTOMATIC_INDEX
> +
> +/**
> + * Generate code that will assemble an index key, adds rowid and stores it in
> + * register reg_out. The key will be for index which is an
> + * index on table. cursor is the index of a cursor open on the
> + * table table and pointing to the entry that needs indexing.
> + * cursor must be the cursor of the PRIMARY KEY index.
Лимит на ширину комента расширен до 80.
Кроме того, ты читал этот коммент? Что за 'index which is an index on table'?
Какой 'table'? Что за 'cursor open on the table table'? Я как будто
контрольные фразы из Бегущего По Лезвию читаю.
Параметры у тебя описаны в двух местах - вверху и в @param. Должно быть
одно место.
> + *
> + * @param parse Parsing context.
> + * @param idx_def The index definition for which to generate a key.
> + * @param cursor Cursor number from which to take column data.
> + * @param reg_out Put the new key into this register if not NULL.
> + * @param reg_eph Register holding adress of ephemeral space.
> + *
> + * @retval Return a register number which is the first in a block
@retval принимает два аргумента - значение и описание. То есть ты описал, что
функция возвращает значение 'Return' с описанием 'a register number ...'. Для
описания без конкретного значения есть @return/@returns.
> + * of registers that holds the elements of the index key. The
> + * block of registers has already been deallocated by the time
> + * this routine returns.
> + */
> +static int
> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
Было бы неплохо в названии функции отразить, что она работает только с эфемерными
спейсами.
> + int cursor, int reg_out, int reg_eph)
> +{
> + assert(reg_out != 0);
> + struct Vdbe *v = parse->pVdbe;
> + int col_cnt = idx_def->key_def->part_count;
> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
> + for (int j = 0; j < col_cnt; j++) {
> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
> + /*
> + * If the column type is NUMBER but the number
> + * is an integer, then it might be stored in the
> + * table as an integer (using a compact
> + * representation) then converted to REAL by an
> + * OP_Realify opcode. But we are getting
> + * ready to store this value back into an index,
> + * where it should be converted by to INTEGER
> + * again. So omit the OP_Realify opcode if
> + * it is present
> + */
> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
> + }
> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
> +
> + sqlReleaseTempRange(parse, reg_base, col_cnt);
> + return reg_base;
> +}
> +
> /*
> - * Generate code to construct the Index object for an automatic index
> - * and to set up the WhereLevel object pLevel so that the code generator
> - * makes use of the automatic index.
> + * Generate code to construct the ephemeral space that contains used in query
> + * fields of one of the tables. The index of this ephemeral space will be known
> + * as an "automatic index". Also, this functions set up the WhereLevel object
> + * pLevel so that the code generator makes use of the automatic index.
> */
> static void
> constructAutomaticIndex(Parse * pParse, /* The parsing context */
> @@ -801,9 +852,11 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
> n = 0;
> idxCols = 0;
> uint32_t size = sizeof(struct key_part_def) * nKeyCol;
> - struct key_part_def *part_def = malloc(size);
> + struct region *region = &fiber()->gc;
> + size_t used = region_used(region);
> + struct key_part_def *part_def = region_alloc(region, size);
Это раздолбает с ASANом из-за отсутствия выравнивания. Нужен region_alloc_array.
Почему не используешь Parse.region? Файберный обязательно где-нибудь проебется
почистить и начнет течь.
> if (part_def == NULL) {
> - diag_set(OutOfMemory, size, "malloc", "part_def");
> + diag_set(OutOfMemory, size, "region_alloc", "part_def");
> pParse->is_aborted = true;
> return;
> }
> @@ -867,7 +919,7 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
> assert(n == nKeyCol);
>
> struct key_def *key_def = key_def_new(part_def, nKeyCol, false);
> - free(part_def);
> + region_truncate(region, used);
Смысла в этом мало. Либо регион и забить на освобожения, либо куча.
> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index e9e936856..f6b533ab1 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
> #endif
>
> #ifndef SQL_OMIT_AUTOMATIC_INDEX
> +
> +/**
> + * Generate code that will assemble an index key, adds rowid and stores it in
> + * register reg_out. The key will be for index which is an
> + * index on table. cursor is the index of a cursor open on the
> + * table table and pointing to the entry that needs indexing.
> + * cursor must be the cursor of the PRIMARY KEY index.
> + *
> + * @param parse Parsing context.
> + * @param idx_def The index definition for which to generate a key.
Походу тебе достаточно передавать key_def. Весь idx_def не нужен.
> + * @param cursor Cursor number from which to take column data.
> + * @param reg_out Put the new key into this register if not NULL.
> + * @param reg_eph Register holding adress of ephemeral space.
> + *
> + * @retval Return a register number which is the first in a block
> + * of registers that holds the elements of the index key. The
> + * block of registers has already been deallocated by the time
> + * this routine returns.
> + */
> +static int
> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
> + int cursor, int reg_out, int reg_eph)
> +{
> + assert(reg_out != 0);
> + struct Vdbe *v = parse->pVdbe;
> + int col_cnt = idx_def->key_def->part_count;
> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
> + for (int j = 0; j < col_cnt; j++) {
> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
> + /*
> + * If the column type is NUMBER but the number
> + * is an integer, then it might be stored in the
> + * table as an integer (using a compact
> + * representation) then converted to REAL by an
> + * OP_Realify opcode. But we are getting
> + * ready to store this value back into an index,
> + * where it should be converted by to INTEGER
> + * again. So omit the OP_Realify opcode if
> + * it is present
> + */
> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
> + }
> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
> +
> + sqlReleaseTempRange(parse, reg_base, col_cnt);
Выглядит как лютый хак - аллоцируется N + 1 временных регистров, но
освобождается только N, чтоб в последем че-то сохранить. То есть намеренная
утечка временного регистра на время выполнения запроса. Выглядит не очень.
Лучше бы это починить отдельным тикетом.