Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization

Tarantool development patches archive
 help / color / mirror / Atom feed

From: "Мерген Имеев" <imeevma@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: "Mergen Imeev" <imeevma@gmail.com>, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization
Date: Tue, 22 Sep 2020 08:03:06 +0300	[thread overview]
Message-ID: <1600750986.926093408@f173.i.mail.ru> (raw)
In-Reply-To: <6147fe3a-fd68-53b3-3a95-a7321de79183@tarantool.org>

[-- Attachment #1: Type: text/plain, Size: 30682 bytes --]


Привет! Спасибо за ревью! Я прошу прощения, на данный момент я могу только ответить на вопросы ревьюё Все изменения и примеры я смогу отправить завтра.
  
>Вторник, 22 сентября 2020, 0:37 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
> 
>Привет! Спасибо за патч!
>
>On 20.09.2020 23:17, Mergen Imeev wrote:
>> Hi! Thank you for the review. My answers, diff and new patch below.
>>
>> I tried to answer a few qustions, but at the end decided to not include them
>> in this patch. Here are these questions:
>
>Давай по-русски. Я че-то не понимаю ничего. Я этих вопросов не задавал. Это
>твои собственные вопросы для самопроверки, на которые ты решил ответить
>публично?

Мне казалось, что эти вопросы имеют значение для этого патча, поэтому я включил их сюда.
>
>> 1) Can we use secondary index instead of an ephemeral space?
>> This look a bit difficult, and it cannot be used with view since they do not
>> have an actual index.
>
>Это не только сложно, но и неправильно. Создавать вторичный индекс на каждый
>запрос выглядит очень жестко. Кто будет гарантировать их удаление (так как
>это изменение схемы, а значит попадает в WAL? Как сделать их невидимыми для
>юзеров, и "видеть" их только в одном запросе? Как быть с винилом? Слишком
>много вопросов.

Понял, спасибо. У меня были вопросы другого плана — как работать с этим вторичным индексом что-бы все не сломалось в SQL, но твои вопросы более глобальны. Мне казалось, что создание индекса вместо пересоздания таплов будет выгоднее, но возникает слишком много проблем. Я думаю пока стоит отказаться от этого варианта.
 
>
>Хотя в далекой перспективе звучит круто - "эфемерные индексы". Прямо таки
>название для дисера. Могло бы помочь в мемтиксе, чтоб "эфемерные индексы"
>хранили ссылки на таплы, а не копии таплов.
>
>> 2) Can we get rid of 'viaCoroutine' branches in constructAutomaticIndex()?
>> As far as I understand, this flag is always false here.
>
>Не представляю что это. И с патчем походу не связано.

Не совсем не связано. Дело в том, что если мы удалим бранч с viaCoroutine, т.к. он у нас на данный момент всегда false, у нас упростится код. Например функция generate_index_key() станет void.
 
>
>> 3) Can I get rid of OP_Realify?
>> I think I can, but decided to do this no in this patch.
>
>Ты пробовал его просто дропнуть? Ломается что-то?

Пробовал, ничего не ломается. Если посмотреть на все места использования OP_Realify (их два) и посмотреть на старый код, еще до удалаления типа Real из SQL, то там видно, что этот опкод использовался для преобразования INT в REAL. Когда REAL быз заменен на NUMBER это преобразование осталось, хотя, как мне кажется, в нем небыло нужды. После добавления DOUBLE в паре мест этот опкод все еще существует, однако используется неправильно. Я думаю этот опкод больше ненужен. Однако, мне кажется его стоит убрать отдельным патчем вне этой задачи.
>
>> On Wed, Sep 09, 2020 at 11:58:09PM +0200, Vladislav Shpilevoy wrote:
>>> Hi! Thanks for the patch!
>>>
>>> See 14 comments below.
>>>
>>> On 04.09.2020 13:53,  imeevma@tarantool.org wrote:
>>>> This patch enables the "auto-index" optimization in SQL. The auto-index
>>>> is actually an ephemeral space that contain some columns of the original
>>>> space.
>>>
>>> 1. In the patch the code calls the index 'covering' meaning that it
>>> contains all the columns. What is true?
>> Both, I think, since this index is the PK of mentioned ephemeral space.
>
>Что значит both? Индекс либо covering (содержит все колонки), либо нет (содержит
>не все).

Суть в том, что idx_def содержит не все колонки оригинального спейса, однако индекс создаваемый для эфемерного спейса содержит все колонки. При этом, создание idx_def не приводит к созданию индекса.
 
Я попробую описать весь механизм:
*  Планировщик определяет, что будет использоваться автоматический индекс.
*  Создается idx_def, который содержит все использующиеся в запросе колонки оригинального спейса. Не только те, которые используются во where. Это делается для того, что бы больше не обращаться к оригинальному спейсу, а работать только с эфемерным спейсом. Этот idx_def не используется для создания индекса.
*  Создается эфемерный спейс на основе созданного ранее idx_def. Помимо колонок оригинального спейса добавляется rowid, т.к. возможны случаи, когда значения во всех колонках совпадает в нескольких записях. При этом, колонки в эфемерном спейсе расположены в том же порядке, в каком они описаны в индексе.  Т.е. они, скорее всего, расположены не в том же порядке, в каком они расположены в оригинальном спейсе.
*  Для созданного эфемерного спейса создается индекс, которые является covering. Именно поэтому в некоторых местах написано, что создается covering index.
*  Т.к. планировщик посчитал, что будет использоваться автоиндекс, то в качестве спейса из которого будут выбраны таплы мы будем использовать созданный нами эфемерный спейс. Однако, во время построения vdbe-кода в качестве fieldno было использовано расположение колонок в оригинальном спейсе. Поэтому, в случае использования автоиндекса мы заменяем fieldno оригинального спейса в OP_Column на fieldno в эфемерном спейсе используя созданный ранее idx_def.
 
>
>> We create index definition for the original space, however this index definition
>> is not used for any actual index. This index definition is used as a connection
>> between original space and created ephemeral space.
>
>Не понял ничего. Перефразируй на русском плиз, может так понятнее станет.

Здесь я имел ввиду то, что описал в п.2 и  п.5 выше.
>
>>>> In some cases, this can speed up execution, since creating a new
>>>> index and searching using it can be more cost efficient than scanning.
>>>
>>> 2. Could you provide an example?
>> At the moment examples that can show that in some cases SQL works faster with
>> enabled optimization are quite big. One of such examples is Q13 for TPC-H.
>> Should I provide data generation and query?
>
>Нет, скрипт слишком большой наверняка и не скажет ничего. Я пытаюсь понять,
>для каких запросов это применимо. Как эти запросы описать? Как по запросу понять,
>будет там автоиндекс или нет? Конкретный пример запроса может и поможет, но я
>хз, я просто не знаю как он выглядит.

Понял. Я добавлю пример и описание того, котгда автоиндекс применяется.  На данный момент могу сказать, что одним из случаев когда он применяется — запросы с использованием join. Этот вопрос я опишу более попдробно чуть позже.
>
>Вот ты добавил "Auto-index" optimization is now enabled в changelog. Я юзер, и не
>представляю что это такое. Ты отправишь меня читать код TPC-H бенча, чтобы понять?

Понял,  исправлю.
>
>>>> Co-authored-by: Mergen Imeev < imeevma@gmail.com >
>>>> Co-authored-by: Timur Safin < tsafin@tarantool.org >
>>>> ---
>>>>  https://github.com/tarantool/tarantool/issues/4933
>>>>  https://github.com/tarantool/tarantool/tree/imeevma/gh-4933-autoindex
>>>>
>>>> @ChangeLog
>>>> - "Auto-index" optimization is now enabled (gh-4933).
>>>>
>>>> src/box/CMakeLists.txt | 2 +-
>>>> src/box/sql.c | 2 +-
>>>> src/box/sql/delete.c | 16 ++--
>>>> src/box/sql/sqlInt.h | 8 +-
>>>> src/box/sql/where.c | 170 +++++++++++++++++++++++------------
>>>> src/box/sql/wherecode.c | 13 +--
>>>> test/sql-tap/whereF.test.lua | 16 +++-
>>>> 7 files changed, 151 insertions(+), 76 deletions(-)
>>>>
>>>> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
>>>> index b8b2689d2..7e3ad0e22 100644
>>>> --- a/src/box/CMakeLists.txt
>>>> +++ b/src/box/CMakeLists.txt
>>>> @@ -217,7 +217,7 @@ add_library(box STATIC
>>>> if(CMAKE_BUILD_TYPE STREQUAL "Debug")
>>>> add_definitions(-DSQL_DEBUG=1)
>>>> endif()
>>>> -add_definitions(-DSQL_OMIT_AUTOMATIC_INDEX=1 -DSQL_TEST=1)
>>>> +add_definitions(-DSQL_TEST=1)
>>>
>>> 3. I still see SQL_OMIT_AUTOMATIC_INDEX in src/box/sql/where.c.
>> I think the original idea was to make an option to disable this optimization.
>> Since such thing is already exists, I decided to not remove it.
>
>У нас нет конфигурации опций сборки SQL. Мы их все выпиливаем, так как билды
>у нас под один конкретный набор опций. Есть runtime опции, но это ничего общего
>с макросами не имеет. Если этот макрос более не указывается, то надо его
>удалить отовсюду.

Понял, исправлю. Я думаю в этом случае я добавлю новую опцию в session_settings, которая будет отключать эту оптимизацию.
>
>>>> set(EXT_SRC_DIR ${CMAKE_SOURCE_DIR}/extra)
>>>> set(EXT_BIN_DIR ${CMAKE_BINARY_DIR}/extra)
>>>> diff --git a/src/box/sql/delete.c b/src/box/sql/delete.c
>>>> index 68abd1f58..57478c129 100644
>>>> --- a/src/box/sql/delete.c
>>>> +++ b/src/box/sql/delete.c
>>>> @@ -546,24 +546,25 @@ sql_generate_row_delete(struct Parse *parse, struct space *space,
>>>> }
>>>>
>>>> int
>>>> -sql_generate_index_key(struct Parse *parse, struct index *index, int cursor,
>>>> - int reg_out, struct index *prev, int reg_prev)
>>>> +sql_generate_index_key(struct Parse *parse, struct index_def *idx_def,
>>>> + int cursor, int reg_out, struct index *prev,
>>>> + int reg_prev, int reg_eph)
>>>
>>> 4. The function has nothing to do with ephemeral spaces. It just does not
>>> care whether its space is ephemeral. Its task is to just assemble a key,
>>> not caring about space type. Why did you make it always work with an
>>> ephemeral space? Won't this will affect normal spaces - they don't need
>>> OP_NextIdEphemeral or whatever else is related to ephemerals.
>>>
>>> In the end of the review I realized the function is never used for anything
>>> except automatic index. Moreover, prev and reg_prev are NULL and 0 always.
>>> I suggest to move this function into the file, which needs it; make it
>>> 'static'; remove 'prev' and 'reg_prev' args; rename it to something closer to
>>> what it actually does.
>> Done. I refactored this function a bit while moving. However, I decided to
>> not remove part with 'OP_Realify', even though I think this is deprecared code.
>> From what I see, this opcode is outdated and should be removed, but not in this
>> patch. I will send a new patch later, as a refactoring.
>>
>>>
>>> 6. The function talks about 'covering' index, but covering makes no
>>> sense in Tarantool. It is not possible to fetch a part of tuple. All
>>> indexes in Tarantool, from box API point of view, are covering. So
>>> why is this concept still here? Can we remove it and simplify things?
>>>
>> It is true that we can get only a whole tuple, however the main feature of the
>> covering indexes that it contains all needed information about space field.
>
>Нет, в терминологии sqlite covering означает именно наличие всех колонок. И в
>этом смысле оно в коде и осталось. Это было нужно, так как индексы в sqlite
>хранят только ключевые колонки. Если запрос имел колонки не из индекса, нужно
>было делать поиск в таблице (!= индекс). После covering индексов делать второй
>поиск было не нужно.

Используемый индекс содержит все колонки эфемерного спейса, поэтому он covering.
 
>
>Такой смысл вкладывается covering, и в таком смысле он ничего в тарантуле не
>значит, так как не-covering есть только в виниле, но
>1) публичного апи для доступа к сырым индексам винила нет;
>2) все равно нужен поиск в первичном индексе, так как вторичный индекс может
>содержать удаленный мусор в случае наших LSM-деревьев.

Ок, понял. Изучу э
>
>> For example, we do not need to look at the space definition to find field type if
>> we have covering index.
>
>Да, но это опять же не связано никак с понятием covering.
>
>> It may be not so important in Tarantool as it was in
>> SQLite, however this concept is used quite often in SQL code.
>
>Это легаси от sqlite. В тарантуле все индексы считаются covering.
>
>> I do not think
>> that we should fix this issue here.
>
>Это зависит от того, насколько это усложняет код, чтобы раздрачивать эти
>covering/не-covering. Судя по коду я так понимаю, что в эфемерный спейс
>попадают не все колонки оригинального спейса, а только нужные для индекса?
>Или для индекса нужны как раз все?

В эфемерный спейс попадают все используемые в запросе колонки, не только те, что используются во where. При этом это скорее всего не все колонки оригинального спейса.
>
>> Also, even though we get a whole tuple, we actually use only needed columns
>> from the tuple to create a new tuple and push it to the new ephemeral space.
>
>Если для тапла в эфемерном спейсе хранятся не все колонки оригинального спейса,
>то это не covering с точки зрения оригинального спейса.
>
>>>> + sqlVdbeAddOp4(v, OP_OpenTEphemeral, reg_eph, nKeyCol + 1, 0,
>>>> + (char *)pk_info, P4_KEYINFO);
>>>> + sqlVdbeAddOp3(v, OP_IteratorOpen, pLevel->iIdxCur, 0, reg_eph);
>>>> + VdbeComment((v, "for %s", space->def->name));
>>>>
>>>> /* Fill the automatic index with content */
>>>> sqlExprCachePush(pParse);
>>>> @@ -2841,12 +2885,12 @@ tnt_error:
>>>> * those objects, since there is no opportunity to add schema
>>>> * indexes on subqueries and views.
>>>> */
>>>> - pNew->rSetup = rLogSize + rSize + 4;
>>>> - if (!pTab->def->opts.is_view &&
>>>> - pTab->def->id == 0)
>>>> - pNew->rSetup += 24;
>>>> - if (pNew->rSetup < 0)
>>>> - pNew->rSetup = 0;
>>>> + pNew->rSetup = rLogSize + rSize;
>>>> + if (!space->def->opts.is_view &&
>>>> + space->def->id == 0)
>>>> + pNew->rSetup += 28;
>>>> + else
>>>> + pNew->rSetup -= 10;
>>>
>>> 10. What is this? What are the numbers?
>> These numbers I got from SQLite. Actually, we should find these numbers
>> experementally, but I still do not have a proper way to do this.
>>
>> There is one more place where we should experemetally find a function
>> of number of tuples in the space which allows us to decide more carefully when
>> we want to use the optimization. Here it is:
>> "rSize = DEFAULT_TUPLE_LOG_COUNT;"
>>
>> As you can see, this function is a constant for now. This is also from SQLite.
>>
>> In general, we should use statistics here, but we do not have it at the moment.
>>
>> These numbers are used to decide when to use auto-index. I think that since at
>> the moment all these values are constants, auto-index is used always. This is
>> obviously wrong. I plan to fix it as soon as I find a proper way to test
>> performance.
>
>Здесь нужен как минимум комментарий в коде, что это за числа. Тот комментарий, что
>есть, очевидно устарел. Он говорит, что стоимость для view - 4, а у тебя - -10. То
>есть для view ты считаешь, что лучше создать авто-индекс, чем не создавать, всегда.
>Нет? Если так, то отрицательная стоимость выглядит не очень правильно.
>
>>>> /* TUNING: Each index lookup yields 20 rows in the table. This
>>>> * is more than the usual guess of 10 rows, since we have no way
>>>> * of knowing how selective the index will ultimately be. It would
>>>> @@ -4794,18 +4838,34 @@ sqlWhereEnd(WhereInfo * pWInfo)
>>>> + continue;
>>>> + }
>>>> + /*
>>>> + * In case we are using auto-indexing, we have
>>>> + * to change the column number, because in
>>>> + * ephemeral space columns are in the same order
>>>> + * as they described in key_def. So, instead of
>>>> + * the field number, we have to insert index of
>>>> + * the part with this fieldno.
>>>
>>> 12. Why are the columns reordered?
>> See above. Should I add this explanation somewhere is the patch?
>
>Если в авто-индексе только ключевые колонки, от откуда в result set попадают
>неключевые колонки, которые в WHERE не участвовали? Делается обратно лукап
>в оригинальном спейсе что ли?
>
>>>> + */
>>>> + struct key_def *key_def =
>>>> + pLevel->pWLoop->index_def->key_def;
>>>> + uint32_t part_count = key_def->part_count;
>>>> + for (uint32_t i = 0; i < part_count; ++i) {
>>>> + if ((int)key_def->parts[i].fieldno == x)
>>>> + pOp->p2 = i;
>>>> }
>>>> }
>>>> }
>>>> diff --git a/test/sql-tap/whereF.test.lua b/test/sql-tap/whereF.test.lua
>>>> index 5a894b748..3235df437 100755
>>>> --- a/test/sql-tap/whereF.test.lua
>>>> +++ b/test/sql-tap/whereF.test.lua
>>>> @@ -90,10 +90,20 @@ test:do_execsql_test(
>>>>
>>>> -- for _ in X(0, "X!foreach", [=[["tn sql","\n 1 \"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 2 \"SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 3 \"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n"]]=]) do
>>>> for tn, sql in ipairs({"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>>> - "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>>> - "SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>> + "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>> test:do_test(
>>>> - "2."..tn,
>>>> + "2.1."..tn,
>>>> + function()
>>>> + return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>>> + end, {
>>>> + '/SEARCH TABLE T1/',
>>>> + '/SEARCH TABLE T2/'
>>>> + })
>>>> +end
>>>> +
>>>> +for tn, sql in ipairs({"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>> + test:do_test(
>>>> + "2.2."..tn,
>>>> function()
>>>> return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>>> end, {
>>>
>>> 14. Shouldn't there be a test showing 'AUTOMATIC' keyword in the execution plan?
>> Actually it does, in the test above.
>
>Во всем whereF.test.lua файле слово automatic грепается только в каком-то комменте.
>
>> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
>> index c5f79f908..f6b533ab1 100644
>> --- a/src/box/sql/where.c
>> +++ b/src/box/sql/where.c
>> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
>> #endif
>>
>> #ifndef SQL_OMIT_AUTOMATIC_INDEX
>> +
>> +/**
>> + * Generate code that will assemble an index key, adds rowid and stores it in
>> + * register reg_out. The key will be for index which is an
>> + * index on table. cursor is the index of a cursor open on the
>> + * table table and pointing to the entry that needs indexing.
>> + * cursor must be the cursor of the PRIMARY KEY index.
>
>Лимит на ширину комента расширен до 80.
>
>Кроме того, ты читал этот коммент? Что за 'index which is an index on table'?
>Какой 'table'? Что за 'cursor open on the table table'? Я как будто
>контрольные фразы из Бегущего По Лезвию читаю.
>
>Параметры у тебя описаны в двух местах - вверху и в @param. Должно быть
>одно место.
>
>> + *
>> + * @param parse Parsing context.
>> + * @param idx_def The index definition for which to generate a key.
>> + * @param cursor Cursor number from which to take column data.
>> + * @param reg_out Put the new key into this register if not NULL.
>> + * @param reg_eph Register holding adress of ephemeral space.
>> + *
>> + * @retval Return a register number which is the first in a block
>
>@retval принимает два аргумента - значение и описание. То есть ты описал, что
>функция возвращает значение 'Return' с описанием 'a register number ...'. Для
>описания без конкретного значения есть @return/@returns.
>
>> + * of registers that holds the elements of the index key. The
>> + * block of registers has already been deallocated by the time
>> + * this routine returns.
>> + */
>> +static int
>> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
>
>Было бы неплохо в названии функции отразить, что она работает только с эфемерными
>спейсами.
>
>> + int cursor, int reg_out, int reg_eph)
>> +{
>> + assert(reg_out != 0);
>> + struct Vdbe *v = parse->pVdbe;
>> + int col_cnt = idx_def->key_def->part_count;
>> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
>> + for (int j = 0; j < col_cnt; j++) {
>> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
>> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
>> + /*
>> + * If the column type is NUMBER but the number
>> + * is an integer, then it might be stored in the
>> + * table as an integer (using a compact
>> + * representation) then converted to REAL by an
>> + * OP_Realify opcode. But we are getting
>> + * ready to store this value back into an index,
>> + * where it should be converted by to INTEGER
>> + * again. So omit the OP_Realify opcode if
>> + * it is present
>> + */
>> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
>> + }
>> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
>> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
>> +
>> + sqlReleaseTempRange(parse, reg_base, col_cnt);
>> + return reg_base;
>> +}
>> +
>> /*
>> - * Generate code to construct the Index object for an automatic index
>> - * and to set up the WhereLevel object pLevel so that the code generator
>> - * makes use of the automatic index.
>> + * Generate code to construct the ephemeral space that contains used in query
>> + * fields of one of the tables. The index of this ephemeral space will be known
>> + * as an "automatic index". Also, this functions set up the WhereLevel object
>> + * pLevel so that the code generator makes use of the automatic index.
>> */
>> static void
>> constructAutomaticIndex(Parse * pParse, /* The parsing context */
>> @@ -801,9 +852,11 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
>> n = 0;
>> idxCols = 0;
>> uint32_t size = sizeof(struct key_part_def) * nKeyCol;
>> - struct key_part_def *part_def = malloc(size);
>> + struct region *region = &fiber()->gc;
>> + size_t used = region_used(region);
>> + struct key_part_def *part_def = region_alloc(region, size);
>
>Это раздолбает с ASANом из-за отсутствия выравнивания. Нужен region_alloc_array.
>
>Почему не используешь Parse.region? Файберный обязательно где-нибудь проебется
>почистить и начнет течь.
>
>> if (part_def == NULL) {
>> - diag_set(OutOfMemory, size, "malloc", "part_def");
>> + diag_set(OutOfMemory, size, "region_alloc", "part_def");
>> pParse->is_aborted = true;
>> return;
>> }
>> @@ -867,7 +919,7 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
>> assert(n == nKeyCol);
>>
>> struct key_def *key_def = key_def_new(part_def, nKeyCol, false);
>> - free(part_def);
>> + region_truncate(region, used);
>
>Смысла в этом мало. Либо регион и забить на освобожения, либо куча.
>
>> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
>> index e9e936856..f6b533ab1 100644
>> --- a/src/box/sql/where.c
>> +++ b/src/box/sql/where.c
>> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
>> #endif
>>
>> #ifndef SQL_OMIT_AUTOMATIC_INDEX
>> +
>> +/**
>> + * Generate code that will assemble an index key, adds rowid and stores it in
>> + * register reg_out. The key will be for index which is an
>> + * index on table. cursor is the index of a cursor open on the
>> + * table table and pointing to the entry that needs indexing.
>> + * cursor must be the cursor of the PRIMARY KEY index.
>> + *
>> + * @param parse Parsing context.
>> + * @param idx_def The index definition for which to generate a key.
>
>Походу тебе достаточно передавать key_def. Весь idx_def не нужен.
>
>> + * @param cursor Cursor number from which to take column data.
>> + * @param reg_out Put the new key into this register if not NULL.
>> + * @param reg_eph Register holding adress of ephemeral space.
>> + *
>> + * @retval Return a register number which is the first in a block
>> + * of registers that holds the elements of the index key. The
>> + * block of registers has already been deallocated by the time
>> + * this routine returns.
>> + */
>> +static int
>> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
>> + int cursor, int reg_out, int reg_eph)
>> +{
>> + assert(reg_out != 0);
>> + struct Vdbe *v = parse->pVdbe;
>> + int col_cnt = idx_def->key_def->part_count;
>> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
>> + for (int j = 0; j < col_cnt; j++) {
>> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
>> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
>> + /*
>> + * If the column type is NUMBER but the number
>> + * is an integer, then it might be stored in the
>> + * table as an integer (using a compact
>> + * representation) then converted to REAL by an
>> + * OP_Realify opcode. But we are getting
>> + * ready to store this value back into an index,
>> + * where it should be converted by to INTEGER
>> + * again. So omit the OP_Realify opcode if
>> + * it is present
>> + */
>> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
>> + }
>> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
>> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
>> +
>> + sqlReleaseTempRange(parse, reg_base, col_cnt);
>
>Выглядит как лютый хак - аллоцируется N + 1 временных регистров, но
>освобождается только N, чтоб в последем че-то сохранить. То есть намеренная
>утечка временного регистра на время выполнения запроса. Выглядит не очень.
>Лучше бы это починить отдельным тикетом. 
 
 
-- <br/>Мерген Имеев
 

[-- Attachment #2: Type: text/html, Size: 36957 bytes --]

next prev parent reply	other threads:[~2020-09-22  5:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-04 11:53 imeevma
2020-09-09 21:58 ` Vladislav Shpilevoy
2020-09-20 21:17   ` Mergen Imeev
2020-09-21 21:37     ` Vladislav Shpilevoy
2020-09-22  5:03       ` Мерген Имеев [this message]
2020-09-22  5:25         ` Мерген Имеев
2020-09-24 20:45           ` Vladislav Shpilevoy
2020-09-24 20:45         ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1600750986.926093408@f173.i.mail.ru \
    --to=imeevma@tarantool.org \
    --cc=imeevma@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox