Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization

Tarantool development patches archive
 help / color / mirror / Atom feed

From: "Мерген Имеев" <imeevma@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: "Mergen Imeev" <imeevma@gmail.com>, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization
Date: Tue, 22 Sep 2020 08:25:15 +0300	[thread overview]
Message-ID: <1600752315.389449454@f362.i.mail.ru> (raw)
In-Reply-To: <1600750986.926093408@f173.i.mail.ru>

[-- Attachment #1: Type: text/plain, Size: 33380 bytes --]


Прошу прощения, я случайно отправил не полностью написанное письмо.

  
>Вторник, 22 сентября 2020, 8:03 +03:00 от Мерген Имеев <imeevma@tarantool.org>:
> 
>Привет! Спасибо за ревью! Я прошу прощения, на данный момент я могу только ответить на вопросы ревьюё Все изменения и примеры я смогу отправить завтра.
>  
>>Вторник, 22 сентября 2020, 0:37 +03:00 от Vladislav Shpilevoy < v.shpilevoy@tarantool.org >:
>> 
>>Привет! Спасибо за патч!
>>
>>On 20.09.2020 23:17, Mergen Imeev wrote:
>>> Hi! Thank you for the review. My answers, diff and new patch below.
>>>
>>> I tried to answer a few qustions, but at the end decided to not include them
>>> in this patch. Here are these questions:
>>
>>Давай по-русски. Я че-то не понимаю ничего. Я этих вопросов не задавал. Это
>>твои собственные вопросы для самопроверки, на которые ты решил ответить
>>публично?
>
>Мне казалось, что эти вопросы имеют значение для этого патча, поэтому я включил их сюда.
>>
>>> 1) Can we use secondary index instead of an ephemeral space?
>>> This look a bit difficult, and it cannot be used with view since they do not
>>> have an actual index.
>>
>>Это не только сложно, но и неправильно. Создавать вторичный индекс на каждый
>>запрос выглядит очень жестко. Кто будет гарантировать их удаление (так как
>>это изменение схемы, а значит попадает в WAL? Как сделать их невидимыми для
>>юзеров, и "видеть" их только в одном запросе? Как быть с винилом? Слишком
>>много вопросов.
>
>Понял, спасибо. У меня были вопросы другого плана — как работать с этим вторичным индексом что-бы все не сломалось в SQL, но твои вопросы более глобальны. Мне казалось, что создание индекса вместо пересоздания таплов будет выгоднее, но возникает слишком много проблем. Я думаю пока стоит отказаться от этого варианта.
> 
>>
>>Хотя в далекой перспективе звучит круто - "эфемерные индексы". Прямо таки
>>название для дисера. Могло бы помочь в мемтиксе, чтоб "эфемерные индексы"
>>хранили ссылки на таплы, а не копии таплов.
>>
>>> 2) Can we get rid of 'viaCoroutine' branches in constructAutomaticIndex()?
>>> As far as I understand, this flag is always false here.
>>
>>Не представляю что это. И с патчем походу не связано.
>
>Не совсем не связано. Дело в том, что если мы удалим бранч с viaCoroutine, т.к. он у нас на данный момент всегда false, у нас упростится код. Например функция generate_index_key() станет void.
> 
>>
>>> 3) Can I get rid of OP_Realify?
>>> I think I can, but decided to do this no in this patch.
>>
>>Ты пробовал его просто дропнуть? Ломается что-то?
>
>Пробовал, ничего не ломается. Если посмотреть на все места использования OP_Realify (их два) и посмотреть на старый код, еще до удалаления типа Real из SQL, то там видно, что этот опкод использовался для преобразования INT в REAL. Когда REAL быз заменен на NUMBER это преобразование осталось, хотя, как мне кажется, в нем небыло нужды. После добавления DOUBLE в паре мест этот опкод все еще существует, однако используется неправильно. Я думаю этот опкод больше ненужен. Однако, мне кажется его стоит убрать отдельным патчем вне этой задачи.
>>
>>> On Wed, Sep 09, 2020 at 11:58:09PM +0200, Vladislav Shpilevoy wrote:
>>>> Hi! Thanks for the patch!
>>>>
>>>> See 14 comments below.
>>>>
>>>> On 04.09.2020 13:53,  imeevma@tarantool.org wrote:
>>>>> This patch enables the "auto-index" optimization in SQL. The auto-index
>>>>> is actually an ephemeral space that contain some columns of the original
>>>>> space.
>>>>
>>>> 1. In the patch the code calls the index 'covering' meaning that it
>>>> contains all the columns. What is true?
>>> Both, I think, since this index is the PK of mentioned ephemeral space.
>>
>>Что значит both? Индекс либо covering (содержит все колонки), либо нет (содержит
>>не все).
>
>Суть в том, что idx_def содержит не все колонки оригинального спейса, однако индекс создаваемый для эфемерного спейса содержит все колонки. При этом, создание idx_def не приводит к созданию индекса.
> 
>Я попробую описать весь механизм:
>*  Планировщик определяет, что будет использоваться автоматический индекс.
>*  Создается idx_def, который содержит все использующиеся в запросе колонки оригинального спейса. Не только те, которые используются во where. Это делается для того, что бы больше не обращаться к оригинальному спейсу, а работать только с эфемерным спейсом. Этот idx_def не используется для создания индекса.
>*  Создается эфемерный спейс на основе созданного ранее idx_def. Помимо колонок оригинального спейса добавляется rowid, т.к. возможны случаи, когда значения во всех колонках совпадает в нескольких записях. При этом, колонки в эфемерном спейсе расположены в том же порядке, в каком они описаны в индексе.  Т.е. они, скорее всего, расположены не в том же порядке, в каком они расположены в оригинальном спейсе.
>*  Для созданного эфемерного спейса создается индекс, которые является covering. Именно поэтому в некоторых местах написано, что создается covering index.
>*  Т.к. планировщик посчитал, что будет использоваться автоиндекс, то в качестве спейса из которого будут выбраны таплы мы будем использовать созданный нами эфемерный спейс. Однако, во время построения vdbe-кода в качестве fieldno было использовано расположение колонок в оригинальном спейсе. Поэтому, в случае использования автоиндекса мы заменяем fieldno оригинального спейса в OP_Column на fieldno в эфемерном спейсе используя созданный ранее idx_def.
> 
>>
>>> We create index definition for the original space, however this index definition
>>> is not used for any actual index. This index definition is used as a connection
>>> between original space and created ephemeral space.
>>
>>Не понял ничего. Перефразируй на русском плиз, может так понятнее станет.
>
>Здесь я имел ввиду то, что описал в п.2 и  п.5 выше.
>>
>>>>> In some cases, this can speed up execution, since creating a new
>>>>> index and searching using it can be more cost efficient than scanning.
>>>>
>>>> 2. Could you provide an example?
>>> At the moment examples that can show that in some cases SQL works faster with
>>> enabled optimization are quite big. One of such examples is Q13 for TPC-H.
>>> Should I provide data generation and query?
>>
>>Нет, скрипт слишком большой наверняка и не скажет ничего. Я пытаюсь понять,
>>для каких запросов это применимо. Как эти запросы описать? Как по запросу понять,
>>будет там автоиндекс или нет? Конкретный пример запроса может и поможет, но я
>>хз, я просто не знаю как он выглядит.
>
>Понял. Я добавлю пример и описание того, котгда автоиндекс применяется.  На данный момент могу сказать, что одним из случаев когда он применяется — запросы с использованием join. Этот вопрос я опишу более попдробно чуть позже.
>>
>>Вот ты добавил "Auto-index" optimization is now enabled в changelog. Я юзер, и не
>>представляю что это такое. Ты отправишь меня читать код TPC-H бенча, чтобы понять?
>
>Понял,  исправлю.
>>
>>>>> Co-authored-by: Mergen Imeev < imeevma@gmail.com >
>>>>> Co-authored-by: Timur Safin < tsafin@tarantool.org >
>>>>> ---
>>>>>  https://github.com/tarantool/tarantool/issues/4933
>>>>>  https://github.com/tarantool/tarantool/tree/imeevma/gh-4933-autoindex
>>>>>
>>>>> @ChangeLog
>>>>> - "Auto-index" optimization is now enabled (gh-4933).
>>>>>
>>>>> src/box/CMakeLists.txt | 2 +-
>>>>> src/box/sql.c | 2 +-
>>>>> src/box/sql/delete.c | 16 ++--
>>>>> src/box/sql/sqlInt.h | 8 +-
>>>>> src/box/sql/where.c | 170 +++++++++++++++++++++++------------
>>>>> src/box/sql/wherecode.c | 13 +--
>>>>> test/sql-tap/whereF.test.lua | 16 +++-
>>>>> 7 files changed, 151 insertions(+), 76 deletions(-)
>>>>>
>>>>> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
>>>>> index b8b2689d2..7e3ad0e22 100644
>>>>> --- a/src/box/CMakeLists.txt
>>>>> +++ b/src/box/CMakeLists.txt
>>>>> @@ -217,7 +217,7 @@ add_library(box STATIC
>>>>> if(CMAKE_BUILD_TYPE STREQUAL "Debug")
>>>>> add_definitions(-DSQL_DEBUG=1)
>>>>> endif()
>>>>> -add_definitions(-DSQL_OMIT_AUTOMATIC_INDEX=1 -DSQL_TEST=1)
>>>>> +add_definitions(-DSQL_TEST=1)
>>>>
>>>> 3. I still see SQL_OMIT_AUTOMATIC_INDEX in src/box/sql/where.c.
>>> I think the original idea was to make an option to disable this optimization.
>>> Since such thing is already exists, I decided to not remove it.
>>
>>У нас нет конфигурации опций сборки SQL. Мы их все выпиливаем, так как билды
>>у нас под один конкретный набор опций. Есть runtime опции, но это ничего общего
>>с макросами не имеет. Если этот макрос более не указывается, то надо его
>>удалить отовсюду.
>
>Понял, исправлю. Я думаю в этом случае я добавлю новую опцию в session_settings, которая будет отключать эту оптимизацию.
>>
>>>>> set(EXT_SRC_DIR ${CMAKE_SOURCE_DIR}/extra)
>>>>> set(EXT_BIN_DIR ${CMAKE_BINARY_DIR}/extra)
>>>>> diff --git a/src/box/sql/delete.c b/src/box/sql/delete.c
>>>>> index 68abd1f58..57478c129 100644
>>>>> --- a/src/box/sql/delete.c
>>>>> +++ b/src/box/sql/delete.c
>>>>> @@ -546,24 +546,25 @@ sql_generate_row_delete(struct Parse *parse, struct space *space,
>>>>> }
>>>>>
>>>>> int
>>>>> -sql_generate_index_key(struct Parse *parse, struct index *index, int cursor,
>>>>> - int reg_out, struct index *prev, int reg_prev)
>>>>> +sql_generate_index_key(struct Parse *parse, struct index_def *idx_def,
>>>>> + int cursor, int reg_out, struct index *prev,
>>>>> + int reg_prev, int reg_eph)
>>>>
>>>> 4. The function has nothing to do with ephemeral spaces. It just does not
>>>> care whether its space is ephemeral. Its task is to just assemble a key,
>>>> not caring about space type. Why did you make it always work with an
>>>> ephemeral space? Won't this will affect normal spaces - they don't need
>>>> OP_NextIdEphemeral or whatever else is related to ephemerals.
>>>>
>>>> In the end of the review I realized the function is never used for anything
>>>> except automatic index. Moreover, prev and reg_prev are NULL and 0 always.
>>>> I suggest to move this function into the file, which needs it; make it
>>>> 'static'; remove 'prev' and 'reg_prev' args; rename it to something closer to
>>>> what it actually does.
>>> Done. I refactored this function a bit while moving. However, I decided to
>>> not remove part with 'OP_Realify', even though I think this is deprecared code.
>>> From what I see, this opcode is outdated and should be removed, but not in this
>>> patch. I will send a new patch later, as a refactoring.
>>>
>>>>
>>>> 6. The function talks about 'covering' index, but covering makes no
>>>> sense in Tarantool. It is not possible to fetch a part of tuple. All
>>>> indexes in Tarantool, from box API point of view, are covering. So
>>>> why is this concept still here? Can we remove it and simplify things?
>>>>
>>> It is true that we can get only a whole tuple, however the main feature of the
>>> covering indexes that it contains all needed information about space field.
>>
>>Нет, в терминологии sqlite covering означает именно наличие всех колонок. И в
>>этом смысле оно в коде и осталось. Это было нужно, так как индексы в sqlite
>>хранят только ключевые колонки. Если запрос имел колонки не из индекса, нужно
>>было делать поиск в таблице (!= индекс). После covering индексов делать второй
>>поиск было не нужно.
>
>Используемый индекс содержит все колонки эфемерного спейса, поэтому он covering.
> 
>>
>>Такой смысл вкладывается covering, и в таком смысле он ничего в тарантуле не
>>значит, так как не-covering есть только в виниле, но
>>1) публичного апи для доступа к сырым индексам винила нет;
>>2) все равно нужен поиск в первичном индексе, так как вторичный индекс может
>>содержать удаленный мусор в случае наших LSM-деревьев.
>
>Ок, понял. Изучу э

Ок, понял. Изучу вопрос с covering в SQL подробнее.
>>
>>> For example, we do not need to look at the space definition to find field type if
>>> we have covering index.
>>
>>Да, но это опять же не связано никак с понятием covering.
>>
>>> It may be not so important in Tarantool as it was in
>>> SQLite, however this concept is used quite often in SQL code.
>>
>>Это легаси от sqlite. В тарантуле все индексы считаются covering.
>>
>>> I do not think
>>> that we should fix this issue here.
>>
>>Это зависит от того, насколько это усложняет код, чтобы раздрачивать эти
>>covering/не-covering. Судя по коду я так понимаю, что в эфемерный спейс
>>попадают не все колонки оригинального спейса, а только нужные для индекса?
>>Или для индекса нужны как раз все?
>
>В эфемерный спейс попадают все используемые в запросе колонки, не только те, что используются во where. При этом это скорее всего не все колонки оригинального спейса.
>>
>>> Also, even though we get a whole tuple, we actually use only needed columns
>>> from the tuple to create a new tuple and push it to the new ephemeral space.
>>
>>Если для тапла в эфемерном спейсе хранятся не все колонки оригинального спейса,
>>то это не covering с точки зрения оригинального спейса.

Мне кажется с точки зрения индекса это covering. Однако, скоре всего ты прав - с точки зрения индекса это covering, а с точки зрения «auto-index» в целом это будет не covering. Исправлю этот момент.
>>
>>>>> + sqlVdbeAddOp4(v, OP_OpenTEphemeral, reg_eph, nKeyCol + 1, 0,
>>>>> + (char *)pk_info, P4_KEYINFO);
>>>>> + sqlVdbeAddOp3(v, OP_IteratorOpen, pLevel->iIdxCur, 0, reg_eph);
>>>>> + VdbeComment((v, "for %s", space->def->name));
>>>>>
>>>>> /* Fill the automatic index with content */
>>>>> sqlExprCachePush(pParse);
>>>>> @@ -2841,12 +2885,12 @@ tnt_error:
>>>>> * those objects, since there is no opportunity to add schema
>>>>> * indexes on subqueries and views.
>>>>> */
>>>>> - pNew->rSetup = rLogSize + rSize + 4;
>>>>> - if (!pTab->def->opts.is_view &&
>>>>> - pTab->def->id == 0)
>>>>> - pNew->rSetup += 24;
>>>>> - if (pNew->rSetup < 0)
>>>>> - pNew->rSetup = 0;
>>>>> + pNew->rSetup = rLogSize + rSize;
>>>>> + if (!space->def->opts.is_view &&
>>>>> + space->def->id == 0)
>>>>> + pNew->rSetup += 28;
>>>>> + else
>>>>> + pNew->rSetup -= 10;
>>>>
>>>> 10. What is this? What are the numbers?
>>> These numbers I got from SQLite. Actually, we should find these numbers
>>> experementally, but I still do not have a proper way to do this.
>>>
>>> There is one more place where we should experemetally find a function
>>> of number of tuples in the space which allows us to decide more carefully when
>>> we want to use the optimization. Here it is:
>>> "rSize = DEFAULT_TUPLE_LOG_COUNT;"
>>>
>>> As you can see, this function is a constant for now. This is also from SQLite.
>>>
>>> In general, we should use statistics here, but we do not have it at the moment.
>>>
>>> These numbers are used to decide when to use auto-index. I think that since at
>>> the moment all these values are constants, auto-index is used always. This is
>>> obviously wrong. I plan to fix it as soon as I find a proper way to test
>>> performance.
>>
>>Здесь нужен как минимум комментарий в коде, что это за числа. Тот комментарий, что
>>есть, очевидно устарел. Он говорит, что стоимость для view - 4, а у тебя - -10. То
>>есть для view ты считаешь, что лучше создать авто-индекс, чем не создавать, всегда.
>>Нет? Если так, то отрицательная стоимость выглядит не очень правильно.
Изучу этот вопрос подробнее и отпишусь в следующем письме.
>>
>>>>> /* TUNING: Each index lookup yields 20 rows in the table. This
>>>>> * is more than the usual guess of 10 rows, since we have no way
>>>>> * of knowing how selective the index will ultimately be. It would
>>>>> @@ -4794,18 +4838,34 @@ sqlWhereEnd(WhereInfo * pWInfo)
>>>>> + continue;
>>>>> + }
>>>>> + /*
>>>>> + * In case we are using auto-indexing, we have
>>>>> + * to change the column number, because in
>>>>> + * ephemeral space columns are in the same order
>>>>> + * as they described in key_def. So, instead of
>>>>> + * the field number, we have to insert index of
>>>>> + * the part with this fieldno.
>>>>
>>>> 12. Why are the columns reordered?
>>> See above. Should I add this explanation somewhere is the patch?
>>
>>Если в авто-индексе только ключевые колонки, от откуда в result set попадают
>>неключевые колонки, которые в WHERE не участвовали? Делается обратно лукап
>>в оригинальном спейсе что ли?
В эферный спейс попадают все колонки, использующиеся в запросе. Поэтому нам больше нет нужды смотреть в оригинальный спейс.
>>
>>>>> + */
>>>>> + struct key_def *key_def =
>>>>> + pLevel->pWLoop->index_def->key_def;
>>>>> + uint32_t part_count = key_def->part_count;
>>>>> + for (uint32_t i = 0; i < part_count; ++i) {
>>>>> + if ((int)key_def->parts[i].fieldno == x)
>>>>> + pOp->p2 = i;
>>>>> }
>>>>> }
>>>>> }
>>>>> diff --git a/test/sql-tap/whereF.test.lua b/test/sql-tap/whereF.test.lua
>>>>> index 5a894b748..3235df437 100755
>>>>> --- a/test/sql-tap/whereF.test.lua
>>>>> +++ b/test/sql-tap/whereF.test.lua
>>>>> @@ -90,10 +90,20 @@ test:do_execsql_test(
>>>>>
>>>>> -- for _ in X(0, "X!foreach", [=[["tn sql","\n 1 \"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 2 \"SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n 3 \"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n"]]=]) do
>>>>> for tn, sql in ipairs({"SELECT * FROM t1, t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>>>> - "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>>>> - "SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>>> + "SELECT * FROM t2, t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>>> test:do_test(
>>>>> - "2."..tn,
>>>>> + "2.1."..tn,
>>>>> + function()
>>>>> + return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>>>> + end, {
>>>>> + '/SEARCH TABLE T1/',
>>>>> + '/SEARCH TABLE T2/'
>>>>> + })
>>>>> +end
>>>>> +
>>>>> +for tn, sql in ipairs({"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>>> + test:do_test(
>>>>> + "2.2."..tn,
>>>>> function()
>>>>> return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>>>> end, {
>>>>
>>>> 14. Shouldn't there be a test showing 'AUTOMATIC' keyword in the execution plan?
>>> Actually it does, in the test above.
>>
>>Во всем whereF.test.lua файле слово automatic грепается только в каком-то комменте.
Понял, исправлю и отпишу в следующем письме.
>>
>>> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
>>> index c5f79f908..f6b533ab1 100644
>>> --- a/src/box/sql/where.c
>>> +++ b/src/box/sql/where.c
>>> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
>>> #endif
>>>
>>> #ifndef SQL_OMIT_AUTOMATIC_INDEX
>>> +
>>> +/**
>>> + * Generate code that will assemble an index key, adds rowid and stores it in
>>> + * register reg_out. The key will be for index which is an
>>> + * index on table. cursor is the index of a cursor open on the
>>> + * table table and pointing to the entry that needs indexing.
>>> + * cursor must be the cursor of the PRIMARY KEY index.
>>
>>Лимит на ширину комента расширен до 80.
>>
>>Кроме того, ты читал этот коммент? Что за 'index which is an index on table'?
>>Какой 'table'? Что за 'cursor open on the table table'? Я как будто
>>контрольные фразы из Бегущего По Лезвию читаю.
Понял, исправлю и отпишусь в следующем письме.
>>
>>Параметры у тебя описаны в двух местах - вверху и в @param. Должно быть
>>одно место.
>>
>>> + *
>>> + * @param parse Parsing context.
>>> + * @param idx_def The index definition for which to generate a key.
>>> + * @param cursor Cursor number from which to take column data.
>>> + * @param reg_out Put the new key into this register if not NULL.
>>> + * @param reg_eph Register holding adress of ephemeral space.
>>> + *
>>> + * @retval Return a register number which is the first in a block
>>
>>@retval принимает два аргумента - значение и описание. То есть ты описал, что
>>функция возвращает значение 'Return' с описанием 'a register number ...'. Для
>>описания без конкретного значения есть @return/@returns.
Понял, исправлю.
>>
>>> + * of registers that holds the elements of the index key. The
>>> + * block of registers has already been deallocated by the time
>>> + * this routine returns.
>>> + */
>>> +static int
>>> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
>>
>>Было бы неплохо в названии функции отразить, что она работает только с эфемерными
>>спейсами.

Понял, исправлю.
>>
>>> + int cursor, int reg_out, int reg_eph)
>>> +{
>>> + assert(reg_out != 0);
>>> + struct Vdbe *v = parse->pVdbe;
>>> + int col_cnt = idx_def->key_def->part_count;
>>> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
>>> + for (int j = 0; j < col_cnt; j++) {
>>> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
>>> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
>>> + /*
>>> + * If the column type is NUMBER but the number
>>> + * is an integer, then it might be stored in the
>>> + * table as an integer (using a compact
>>> + * representation) then converted to REAL by an
>>> + * OP_Realify opcode. But we are getting
>>> + * ready to store this value back into an index,
>>> + * where it should be converted by to INTEGER
>>> + * again. So omit the OP_Realify opcode if
>>> + * it is present
>>> + */
>>> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
>>> + }
>>> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
>>> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
>>> +
>>> + sqlReleaseTempRange(parse, reg_base, col_cnt);
>>> + return reg_base;
>>> +}
>>> +
>>> /*
>>> - * Generate code to construct the Index object for an automatic index
>>> - * and to set up the WhereLevel object pLevel so that the code generator
>>> - * makes use of the automatic index.
>>> + * Generate code to construct the ephemeral space that contains used in query
>>> + * fields of one of the tables. The index of this ephemeral space will be known
>>> + * as an "automatic index". Also, this functions set up the WhereLevel object
>>> + * pLevel so that the code generator makes use of the automatic index.
>>> */
>>> static void
>>> constructAutomaticIndex(Parse * pParse, /* The parsing context */
>>> @@ -801,9 +852,11 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
>>> n = 0;
>>> idxCols = 0;
>>> uint32_t size = sizeof(struct key_part_def) * nKeyCol;
>>> - struct key_part_def *part_def = malloc(size);
>>> + struct region *region = &fiber()->gc;
>>> + size_t used = region_used(region);
>>> + struct key_part_def *part_def = region_alloc(region, size);
>>
>>Это раздолбает с ASANом из-за отсутствия выравнивания. Нужен region_alloc_array.
Исправлю, но мне казалочь, что это временый объект и его выравнивание нигде не проверяется.
>>
>>Почему не используешь Parse.region? Файберный обязательно где-нибудь проебется
>>почистить и начнет течь.
Понял, исправлю.
>>
>>> if (part_def == NULL) {
>>> - diag_set(OutOfMemory, size, "malloc", "part_def");
>>> + diag_set(OutOfMemory, size, "region_alloc", "part_def");
>>> pParse->is_aborted = true;
>>> return;
>>> }
>>> @@ -867,7 +919,7 @@ constructAutomaticIndex(Parse * pParse, /* The parsing context */
>>> assert(n == nKeyCol);
>>>
>>> struct key_def *key_def = key_def_new(part_def, nKeyCol, false);
>>> - free(part_def);
>>> + region_truncate(region, used);
>>
>>Смысла в этом мало. Либо регион и забить на освобожения, либо куча.

Понял, уберу после замены региона с файберного на парсерный.
>>
>>> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
>>> index e9e936856..f6b533ab1 100644
>>> --- a/src/box/sql/where.c
>>> +++ b/src/box/sql/where.c
>>> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm, /* WHERE clause term to check */
>>> #endif
>>>
>>> #ifndef SQL_OMIT_AUTOMATIC_INDEX
>>> +
>>> +/**
>>> + * Generate code that will assemble an index key, adds rowid and stores it in
>>> + * register reg_out. The key will be for index which is an
>>> + * index on table. cursor is the index of a cursor open on the
>>> + * table table and pointing to the entry that needs indexing.
>>> + * cursor must be the cursor of the PRIMARY KEY index.
>>> + *
>>> + * @param parse Parsing context.
>>> + * @param idx_def The index definition for which to generate a key.
>>
>>Походу тебе достаточно передавать key_def. Весь idx_def не нужен.
Понял, исправлю.
>>
>>> + * @param cursor Cursor number from which to take column data.
>>> + * @param reg_out Put the new key into this register if not NULL.
>>> + * @param reg_eph Register holding adress of ephemeral space.
>>> + *
>>> + * @retval Return a register number which is the first in a block
>>> + * of registers that holds the elements of the index key. The
>>> + * block of registers has already been deallocated by the time
>>> + * this routine returns.
>>> + */
>>> +static int
>>> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
>>> + int cursor, int reg_out, int reg_eph)
>>> +{
>>> + assert(reg_out != 0);
>>> + struct Vdbe *v = parse->pVdbe;
>>> + int col_cnt = idx_def->key_def->part_count;
>>> + int reg_base = sqlGetTempRange(parse, col_cnt + 1);
>>> + for (int j = 0; j < col_cnt; j++) {
>>> + uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
>>> + sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
>>> + /*
>>> + * If the column type is NUMBER but the number
>>> + * is an integer, then it might be stored in the
>>> + * table as an integer (using a compact
>>> + * representation) then converted to REAL by an
>>> + * OP_Realify opcode. But we are getting
>>> + * ready to store this value back into an index,
>>> + * where it should be converted by to INTEGER
>>> + * again. So omit the OP_Realify opcode if
>>> + * it is present
>>> + */
>>> + sqlVdbeDeletePriorOpcode(v, OP_Realify);
>>> + }
>>> + sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
>>> + sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
>>> +
>>> + sqlReleaseTempRange(parse, reg_base, col_cnt);
>>
>>Выглядит как лютый хак - аллоцируется N + 1 временных регистров, но
>>освобождается только N, чтоб в последем че-то сохранить. То есть намеренная
>>утечка временного регистра на время выполнения запроса. Выглядит не очень.
>>Лучше бы это починить отдельным тикетом.

Это не хак, а скорее баг. Я исправлю количество освождаемых mem на N + 1. А то, что используется N + 1 мем — это нужно исправить на N + 2. При этом, этот мем используется в только в бранче оператора if, который сейчас всегда пропускается т.к. viaCoroutine всегда false в этой функции. Я пока не смог проверить работоспособность этого бранча.
> 
>-- <br/>Мерген Имеев
>  
 
 
-- <br/>Мерген Имеев
 

[-- Attachment #2: Type: text/html, Size: 45100 bytes --]

next prev parent reply	other threads:[~2020-09-22  5:25 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-04 11:53 imeevma
2020-09-09 21:58 ` Vladislav Shpilevoy
2020-09-20 21:17   ` Mergen Imeev
2020-09-21 21:37     ` Vladislav Shpilevoy
2020-09-22  5:03       ` Мерген Имеев
2020-09-22  5:25         ` Мерген Имеев [this message]
2020-09-24 20:45           ` Vladislav Shpilevoy
2020-09-24 20:45         ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1600752315.389449454@f362.i.mail.ru \
    --to=imeevma@tarantool.org \
    --cc=imeevma@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox