[Tarantool-patches] [PATCH v1 1/1] sql: enable autoindex optimization

Tue Sep 22 00:37:48 MSK 2020

Привет! Спасибо за патч!

On 20.09.2020 23:17, Mergen Imeev wrote:
> Hi! Thank you for the review. My answers, diff and new patch below.
> 
> I tried to answer a few qustions, but at the end decided to not include them
> in this patch. Here are these questions:

Давай по-русски. Я че-то не понимаю ничего. Я этих вопросов не задавал. Это
твои собственные вопросы для самопроверки, на которые ты решил ответить
публично?

> 1) Can we use secondary index instead of an ephemeral space?
> This look a bit difficult, and it cannot be used with view since they do not
> have an actual index.

Это не только сложно, но и неправильно. Создавать вторичный индекс на каждый
запрос выглядит очень жестко. Кто будет гарантировать их удаление (так как
это изменение схемы, а значит попадает в WAL? Как сделать их невидимыми для
юзеров, и "видеть" их только в одном запросе? Как быть с винилом? Слишком
много вопросов.

Хотя в далекой перспективе звучит круто - "эфемерные индексы". Прямо таки
название для дисера. Могло бы помочь в мемтиксе, чтоб "эфемерные индексы"
хранили ссылки на таплы, а не копии таплов.

> 2) Can we get rid of 'viaCoroutine' branches in constructAutomaticIndex()?
> As far as I understand, this flag is always false here.

Не представляю что это. И с патчем походу не связано.

> 3) Can I get rid of OP_Realify?
> I think I can, but decided to do this no in this patch.

Ты пробовал его просто дропнуть? Ломается что-то?

> On Wed, Sep 09, 2020 at 11:58:09PM +0200, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 14 comments below.
>>
>> On 04.09.2020 13:53, imeevma at tarantool.org wrote:
>>> This patch enables the "auto-index" optimization in SQL. The auto-index
>>> is actually an ephemeral space that contain some columns of the original
>>> space.
>>
>> 1. In the patch the code calls the index 'covering' meaning that it
>> contains all the columns. What is true?
> Both, I think, since this index is the PK of mentioned ephemeral space.

Что значит both? Индекс либо covering (содержит все колонки), либо нет (содержит
не все).

> We create index definition for the original space, however this index definition
> is not used for any actual index. This index definition is used as a connection
> between original space and created ephemeral space.

Не понял ничего. Перефразируй на русском плиз, может так понятнее станет.

>>> In some cases, this can speed up execution, since creating a new
>>> index and searching using it can be more cost efficient than scanning.
>>
>> 2. Could you provide an example?
> At the moment examples that can show that in some cases SQL works faster with
> enabled optimization are quite big. One of such examples is Q13 for TPC-H.
> Should I provide data generation and query?

Нет, скрипт слишком большой наверняка и не скажет ничего. Я пытаюсь понять,
для каких запросов это применимо. Как эти запросы описать? Как по запросу понять,
будет там автоиндекс или нет? Конкретный пример запроса может и поможет, но я
хз, я просто не знаю как он выглядит.

Вот ты добавил "Auto-index" optimization is now enabled в changelog. Я юзер, и не
представляю что это такое. Ты отправишь меня читать код TPC-H бенча, чтобы понять?

>>> Co-authored-by: Mergen Imeev <imeevma at gmail.com>
>>> Co-authored-by: Timur Safin <tsafin at tarantool.org>
>>> ---
>>> https://github.com/tarantool/tarantool/issues/4933
>>> https://github.com/tarantool/tarantool/tree/imeevma/gh-4933-autoindex
>>>
>>> @ChangeLog
>>>  - "Auto-index" optimization is now enabled (gh-4933).
>>>
>>>  src/box/CMakeLists.txt       |   2 +-
>>>  src/box/sql.c                |   2 +-
>>>  src/box/sql/delete.c         |  16 ++--
>>>  src/box/sql/sqlInt.h         |   8 +-
>>>  src/box/sql/where.c          | 170 +++++++++++++++++++++++------------
>>>  src/box/sql/wherecode.c      |  13 +--
>>>  test/sql-tap/whereF.test.lua |  16 +++-
>>>  7 files changed, 151 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
>>> index b8b2689d2..7e3ad0e22 100644
>>> --- a/src/box/CMakeLists.txt
>>> +++ b/src/box/CMakeLists.txt
>>> @@ -217,7 +217,7 @@ add_library(box STATIC
>>>  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
>>>    add_definitions(-DSQL_DEBUG=1)
>>>  endif()
>>> -add_definitions(-DSQL_OMIT_AUTOMATIC_INDEX=1 -DSQL_TEST=1)
>>> +add_definitions(-DSQL_TEST=1)
>>
>> 3. I still see SQL_OMIT_AUTOMATIC_INDEX in src/box/sql/where.c.
> I think the original idea was to make an option to disable this optimization.
> Since such thing is already exists, I decided to not remove it.

У нас нет конфигурации опций сборки SQL. Мы их все выпиливаем, так как билды
у нас под один конкретный набор опций. Есть runtime опции, но это ничего общего
с макросами не имеет. Если этот макрос более не указывается, то надо его
удалить отовсюду.

>>>  set(EXT_SRC_DIR ${CMAKE_SOURCE_DIR}/extra)
>>>  set(EXT_BIN_DIR ${CMAKE_BINARY_DIR}/extra)
>>> diff --git a/src/box/sql/delete.c b/src/box/sql/delete.c
>>> index 68abd1f58..57478c129 100644
>>> --- a/src/box/sql/delete.c
>>> +++ b/src/box/sql/delete.c
>>> @@ -546,24 +546,25 @@ sql_generate_row_delete(struct Parse *parse, struct space *space,
>>>  }
>>>  
>>>  int
>>> -sql_generate_index_key(struct Parse *parse, struct index *index, int cursor,
>>> -		       int reg_out, struct index *prev, int reg_prev)
>>> +sql_generate_index_key(struct Parse *parse, struct index_def *idx_def,
>>> +		       int cursor, int reg_out, struct index *prev,
>>> +		       int reg_prev, int reg_eph)
>>
>> 4. The function has nothing to do with ephemeral spaces. It just does not
>> care whether its space is ephemeral. Its task is to just assemble a key,
>> not caring about space type. Why did you make it always work with an
>> ephemeral space? Won't this will affect normal spaces - they don't need
>> OP_NextIdEphemeral or whatever else is related to ephemerals.
>>
>> In the end of the review I realized the function is never used for anything
>> except automatic index. Moreover, prev and reg_prev are NULL and 0 always.
>> I suggest to move this function into the file, which needs it; make it
>> 'static'; remove 'prev' and 'reg_prev' args; rename it to something closer to
>> what it actually does.
> Done. I refactored this function a bit while moving. However, I decided to
> not remove part with 'OP_Realify', even though I think this is deprecared code.
> From what I see, this opcode is outdated and should be removed, but not in this
> patch. I will send a new patch later, as a refactoring.
> 
>>
>> 6. The function talks about 'covering' index, but covering makes no
>> sense in Tarantool. It is not possible to fetch a part of tuple. All
>> indexes in Tarantool, from box API point of view, are covering. So
>> why is this concept still here? Can we remove it and simplify things?
>>
> It is true that we can get only a whole tuple, however the main feature of the
> covering indexes that it contains all needed information about space field.

Нет, в терминологии sqlite covering означает именно наличие всех колонок. И в
этом смысле оно в коде и осталось. Это было нужно, так как индексы в sqlite
хранят только ключевые колонки. Если запрос имел колонки не из индекса, нужно
было делать поиск в таблице (!= индекс). После covering индексов делать второй
поиск было не нужно.

Такой смысл вкладывается covering, и в таком смысле он ничего в тарантуле не
значит, так как не-covering есть только в виниле, но
1) публичного апи для доступа к сырым индексам винила нет;
2) все равно нужен поиск в первичном индексе, так как вторичный индекс может
содержать удаленный мусор в случае наших LSM-деревьев.

> For example, we do not need to look at the space definition to find field type if
> we have covering index.

Да, но это опять же не связано никак с понятием covering.

> It may be not so important in Tarantool as it was in
> SQLite, however this concept is used quite often in SQL code.

Это легаси от sqlite. В тарантуле все индексы считаются covering.

> I do not think
> that we should fix this issue here.

Это зависит от того, насколько это усложняет код, чтобы раздрачивать эти
covering/не-covering. Судя по коду я так понимаю, что в эфемерный спейс
попадают не все колонки оригинального спейса, а только нужные для индекса?
Или для индекса нужны как раз все?

> Also, even though we get a whole tuple, we actually use only needed columns
> from the tuple to create a new tuple and push it to the new ephemeral space.

Если для тапла в эфемерном спейсе хранятся не все колонки оригинального спейса,
то это не covering с точки зрения оригинального спейса.

>>> +	sqlVdbeAddOp4(v, OP_OpenTEphemeral, reg_eph, nKeyCol + 1, 0,
>>> +		      (char *)pk_info, P4_KEYINFO);
>>> +	sqlVdbeAddOp3(v, OP_IteratorOpen, pLevel->iIdxCur, 0, reg_eph);
>>> +	VdbeComment((v, "for %s", space->def->name));
>>>  
>>>  	/* Fill the automatic index with content */
>>>  	sqlExprCachePush(pParse);
>>> @@ -2841,12 +2885,12 @@ tnt_error:
>>>  				 * those objects, since there is no opportunity to add schema
>>>  				 * indexes on subqueries and views.
>>>  				 */
>>> -				pNew->rSetup = rLogSize + rSize + 4;
>>> -				if (!pTab->def->opts.is_view &&
>>> -				    pTab->def->id == 0)
>>> -					pNew->rSetup += 24;
>>> -				if (pNew->rSetup < 0)
>>> -					pNew->rSetup = 0;
>>> +				pNew->rSetup = rLogSize + rSize;
>>> +				if (!space->def->opts.is_view &&
>>> +				    space->def->id == 0)
>>> +					pNew->rSetup += 28;
>>> +				else
>>> +					pNew->rSetup -= 10;
>>
>> 10. What is this? What are the numbers?
> These numbers I got from SQLite. Actually, we should find these numbers
> experementally, but I still do not have a proper way to do this.
> 
> There is one more place where we should experemetally find a function
> of number of tuples in the space which allows us to decide more carefully when
> we want to use the optimization. Here it is:
> "rSize = DEFAULT_TUPLE_LOG_COUNT;"
> 
> As you can see, this function is a constant for now. This is also from SQLite.
> 
> In general, we should use statistics here, but we do not have it at the moment.
> 
> These numbers are used to decide when to use auto-index. I think that since at
> the moment all these values are constants, auto-index is used always. This is
> obviously wrong. I plan to fix it as soon as I find a proper way to test
> performance.

Здесь нужен как минимум комментарий в коде, что это за числа. Тот комментарий, что
есть, очевидно устарел. Он говорит, что стоимость для view - 4, а у тебя - -10. То
есть для view ты считаешь, что лучше создать авто-индекс, чем не создавать, всегда.
Нет? Если так, то отрицательная стоимость выглядит не очень правильно.

>>>  				/* TUNING: Each index lookup yields 20 rows in the table.  This
>>>  				 * is more than the usual guess of 10 rows, since we have no way
>>>  				 * of knowing how selective the index will ultimately be.  It would
>>> @@ -4794,18 +4838,34 @@ sqlWhereEnd(WhereInfo * pWInfo)
>>> +					continue;
>>> +				}
>>> +				/*
>>> +				 * In case we are using auto-indexing, we have
>>> +				 * to change the column number, because in
>>> +				 * ephemeral space columns are in the same order
>>> +				 * as they described in key_def. So, instead of
>>> +				 * the field number, we have to insert index of
>>> +				 * the part with this fieldno.
>>
>> 12. Why are the columns reordered?
> See above. Should I add this explanation somewhere is the patch?

Если в авто-индексе только ключевые колонки, от откуда в result set попадают
неключевые колонки, которые в WHERE не участвовали? Делается обратно лукап
в оригинальном спейсе что ли?

>>> +				 */
>>> +				struct key_def *key_def =
>>> +					pLevel->pWLoop->index_def->key_def;
>>> +				uint32_t part_count = key_def->part_count;
>>> +				for (uint32_t i = 0; i < part_count; ++i) {
>>> +					if ((int)key_def->parts[i].fieldno == x)
>>> +						pOp->p2 = i;
>>>  				}
>>>  			}
>>>  		}
>>> diff --git a/test/sql-tap/whereF.test.lua b/test/sql-tap/whereF.test.lua
>>> index 5a894b748..3235df437 100755
>>> --- a/test/sql-tap/whereF.test.lua
>>> +++ b/test/sql-tap/whereF.test.lua
>>> @@ -90,10 +90,20 @@ test:do_execsql_test(
>>>  
>>>  -- for _ in X(0, "X!foreach", [=[["tn sql","\n  1 \"SELECT * FROM t1,           t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n  2 \"SELECT * FROM t2,           t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n  3 \"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e\"\n"]]=]) do
>>>  for tn, sql in ipairs({"SELECT * FROM t1,           t2 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>> -                       "SELECT * FROM t2,           t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e",
>>> -                       "SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>> +                       "SELECT * FROM t2,           t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>>      test:do_test(
>>> -        "2."..tn,
>>> +        "2.1."..tn,
>>> +        function()
>>> +            return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>> +        end, {
>>> +            '/SEARCH TABLE T1/',
>>> +            '/SEARCH TABLE T2/'
>>> +        })
>>> +end
>>> +
>>> +for tn, sql in ipairs({"SELECT * FROM t2 CROSS JOIN t1 WHERE t1.a>? AND t2.d>t1.c AND t1.b=t2.e"}) do
>>> +    test:do_test(
>>> +        "2.2."..tn,
>>>          function()
>>>              return test:execsql("EXPLAIN QUERY PLAN "..sql)
>>>          end, {
>>
>> 14. Shouldn't there be a test showing 'AUTOMATIC' keyword in the execution plan?
> Actually it does, in the test above.

Во всем whereF.test.lua файле слово automatic грепается только в каком-то комменте.

> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index c5f79f908..f6b533ab1 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm,	/* WHERE clause term to check */
>  #endif
>  
>  #ifndef SQL_OMIT_AUTOMATIC_INDEX
> +
> +/**
> + * Generate code that will assemble an index key, adds rowid and stores it in
> + * register reg_out. The key will be for index which is an
> + * index on table. cursor is the index of a cursor open on the
> + * table table and pointing to the entry that needs indexing.
> + * cursor must be the cursor of the PRIMARY KEY index.

Лимит на ширину комента расширен до 80.

Кроме того, ты читал этот коммент? Что за 'index which is an index on table'?
Какой 'table'? Что за 'cursor open on the table table'? Я как будто
контрольные фразы из Бегущего По Лезвию читаю.

Параметры у тебя описаны в двух местах - вверху и в @param. Должно быть
одно место.

> + *
> + * @param parse Parsing context.
> + * @param idx_def The index definition for which to generate a key.
> + * @param cursor Cursor number from which to take column data.
> + * @param reg_out Put the new key into this register if not NULL.
> + * @param reg_eph Register holding adress of ephemeral space.
> + *
> + * @retval Return a register number which is the first in a block

@retval принимает два аргумента - значение и описание. То есть ты описал, что
функция возвращает значение 'Return' с описанием 'a register number ...'. Для
описания без конкретного значения есть @return/@returns.

> + * of registers that holds the elements of the index key. The
> + * block of registers has already been deallocated by the time
> + * this routine returns.
> + */
> +static int
> +generate_index_key(struct Parse *parse, struct index_def *idx_def,

Было бы неплохо в названии функции отразить, что она работает только с эфемерными
спейсами.

> +		       int cursor, int reg_out, int reg_eph)
> +{
> +	assert(reg_out != 0);
> +	struct Vdbe *v = parse->pVdbe;
> +	int col_cnt = idx_def->key_def->part_count;
> +	int reg_base = sqlGetTempRange(parse, col_cnt + 1);
> +	for (int j = 0; j < col_cnt; j++) {
> +		uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
> +		sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
> +		/*
> +		 * If the column type is NUMBER but the number
> +		 * is an integer, then it might be stored in the
> +		 * table as an integer (using a compact
> +		 * representation) then converted to REAL by an
> +		 * OP_Realify opcode. But we are getting
> +		 * ready to store this value back into an index,
> +		 * where it should be converted by to INTEGER
> +		 * again.  So omit the OP_Realify opcode if
> +		 * it is present
> +		 */
> +		sqlVdbeDeletePriorOpcode(v, OP_Realify);
> +	}
> +	sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
> +	sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
> +
> +	sqlReleaseTempRange(parse, reg_base, col_cnt);
> +	return reg_base;
> +}
> +
>  /*
> - * Generate code to construct the Index object for an automatic index
> - * and to set up the WhereLevel object pLevel so that the code generator
> - * makes use of the automatic index.
> + * Generate code to construct the ephemeral space that contains used in query
> + * fields of one of the tables. The index of this ephemeral space will be known
> + * as an "automatic index". Also, this functions set up the WhereLevel object
> + * pLevel so that the code generator makes use of the automatic index.
>   */
>  static void
>  constructAutomaticIndex(Parse * pParse,			/* The parsing context */
> @@ -801,9 +852,11 @@ constructAutomaticIndex(Parse * pParse,			/* The parsing context */
>  	n = 0;
>  	idxCols = 0;
>  	uint32_t size = sizeof(struct key_part_def) * nKeyCol;
> -	struct key_part_def *part_def = malloc(size);
> +	struct region *region = &fiber()->gc;
> +	size_t used = region_used(region);
> +	struct key_part_def *part_def = region_alloc(region, size);

Это раздолбает с ASANом из-за отсутствия выравнивания. Нужен region_alloc_array.

Почему не используешь Parse.region? Файберный обязательно где-нибудь проебется
почистить и начнет течь.

>  	if (part_def == NULL) {
> -		diag_set(OutOfMemory, size, "malloc", "part_def");
> +		diag_set(OutOfMemory, size, "region_alloc", "part_def");
>  		pParse->is_aborted = true;
>  		return;
>  	}
> @@ -867,7 +919,7 @@ constructAutomaticIndex(Parse * pParse,			/* The parsing context */
>  	assert(n == nKeyCol);
>  
>  	struct key_def *key_def = key_def_new(part_def, nKeyCol, false);
> -	free(part_def);
> +	region_truncate(region, used);

Смысла в этом мало. Либо регион и забить на освобожения, либо куча.

> diff --git a/src/box/sql/where.c b/src/box/sql/where.c
> index e9e936856..f6b533ab1 100644
> --- a/src/box/sql/where.c
> +++ b/src/box/sql/where.c
> @@ -712,10 +712,61 @@ termCanDriveIndex(WhereTerm * pTerm,	/* WHERE clause term to check */
>  #endif
>  
>  #ifndef SQL_OMIT_AUTOMATIC_INDEX
> +
> +/**
> + * Generate code that will assemble an index key, adds rowid and stores it in
> + * register reg_out. The key will be for index which is an
> + * index on table. cursor is the index of a cursor open on the
> + * table table and pointing to the entry that needs indexing.
> + * cursor must be the cursor of the PRIMARY KEY index.
> + *
> + * @param parse Parsing context.
> + * @param idx_def The index definition for which to generate a key.

Походу тебе достаточно передавать key_def. Весь idx_def не нужен.

> + * @param cursor Cursor number from which to take column data.
> + * @param reg_out Put the new key into this register if not NULL.
> + * @param reg_eph Register holding adress of ephemeral space.
> + *
> + * @retval Return a register number which is the first in a block
> + * of registers that holds the elements of the index key. The
> + * block of registers has already been deallocated by the time
> + * this routine returns.
> + */
> +static int
> +generate_index_key(struct Parse *parse, struct index_def *idx_def,
> +		       int cursor, int reg_out, int reg_eph)
> +{
> +	assert(reg_out != 0);
> +	struct Vdbe *v = parse->pVdbe;
> +	int col_cnt = idx_def->key_def->part_count;
> +	int reg_base = sqlGetTempRange(parse, col_cnt + 1);
> +	for (int j = 0; j < col_cnt; j++) {
> +		uint32_t tabl_col = idx_def->key_def->parts[j].fieldno;
> +		sqlVdbeAddOp3(v, OP_Column, cursor, tabl_col, reg_base + j);
> +		/*
> +		 * If the column type is NUMBER but the number
> +		 * is an integer, then it might be stored in the
> +		 * table as an integer (using a compact
> +		 * representation) then converted to REAL by an
> +		 * OP_Realify opcode. But we are getting
> +		 * ready to store this value back into an index,
> +		 * where it should be converted by to INTEGER
> +		 * again.  So omit the OP_Realify opcode if
> +		 * it is present
> +		 */
> +		sqlVdbeDeletePriorOpcode(v, OP_Realify);
> +	}
> +	sqlVdbeAddOp2(v, OP_NextIdEphemeral, reg_eph, reg_base + col_cnt);
> +	sqlVdbeAddOp3(v, OP_MakeRecord, reg_base, col_cnt + 1, reg_out);
> +
> +	sqlReleaseTempRange(parse, reg_base, col_cnt);

Выглядит как лютый хак - аллоцируется N + 1 временных регистров, но
освобождается только N, чтоб в последем че-то сохранить. То есть намеренная
утечка временного регистра на время выполнения запроса. Выглядит не очень.
Лучше бы это починить отдельным тикетом.