From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9C4B327108
	for <tarantool-patches@freelists.org>; Wed, 18 Jul 2018 11:13:50 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id kSL6heCEOodl for <tarantool-patches@freelists.org>;
	Wed, 18 Jul 2018 11:13:50 -0400 (EDT)
Received: from smtp47.i.mail.ru (smtp47.i.mail.ru [94.100.177.107])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 2BEAB27831
	for <tarantool-patches@freelists.org>; Wed, 18 Jul 2018 11:13:50 -0400 (EDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: [tarantool-patches] Re: [PATCH] sql: xfer optimization issue
From: "n.pettik" <korablev@tarantool.org>
In-Reply-To: <CAEi+_aqG_KspmHjhSR8FGkWS1E-oUC6y6C+Jh5ccfB0z_Z3FLQ@mail.gmail.com>
Date: Wed, 18 Jul 2018 18:13:47 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <605B15EF-BD1C-4B03-8A9F-6E6225076812@tarantool.org>
References: <CAEi+_aoTp-jnAtSZXMqo4EACEDS20Rwkvw+yCsH9pKz8X05Y7w@mail.gmail.com>
 <5BB99B27-5F86-4664-AAD5-57A22ECED854@tarantool.org>
 <CAEi+_aoOeCAsdf8QdP2tPqZYRJmzfW=vOBgAtAx2+ZnROMJJZA@mail.gmail.com>
 <A120DC35-6ACA-48DF-8A68-531A31B39B8D@tarantool.org>
 <CAEi+_aos_j278_E-j9UsDcoqPm3Bd=E2oTQo_qBrFOOwx8sZUw@mail.gmail.com>
 <93E4DAEA-EF90-479D-9F62-3D1CEB3CBE3F@tarantool.org>
 <CAEi+_aqxW-RPhC5tFBPQSmPewR48SSnDMiB2_iw6Etzpqrni4Q@mail.gmail.com>
 <F68261EE-EDE3-48FA-9864-7BA2653798B0@tarantool.org>
 <CAEi+_arx_rHJGAqafDYT83fRFSUhqQZ58-C3rvYg38jxdf_Fig@mail.gmail.com>
 <20180628101839.fhnijezdpwviohop@tkn_work_nb>
 <20180709155006.fwrikbznqk23ger5@tkn_work_nb>
 <CAEi+_apzof1NH=W+k97SKP-7LFs1JwLwM7AHaW5t5-Qh61tq5g@mail.gmail.com>
 <79D03E96-0BD0-418D-9DB2-45318C734628@tarantool.org>
 <CAEi+_apqiUN+w0uN8=0qp04dACEgs0xHHqNEAaiCoc2m8oQ06Q@mail.gmail.com>
 <CAEi+_aqwyRCEMdsMUOrxgaVHCrrG4k+VG+sM8Nt6A8W_HA7kdA@mail.gmail.com>
 <8B8D5501-075D-4BEB-B282-35B0B81CD555@tarantool.org>
 <CAEi+_aqG_KspmHjhSR8FGkWS1E-oUC6y6C+Jh5ccfB0z_Z3FLQ@mail.gmail.com>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org
Cc: "N. Tatunov" <hollow653@gmail.com>

Please, add to commit message results of benchmark to indicate
that this optimisation really matters.

> On 17 Jul 2018, at 00:27, Nikita Tatunov <hollow653@gmail.com> wrote:
>=20
> diff --git a/src/box/sql.c b/src/box/sql.c
> index fdce224..398b2a6 100644
> --- a/src/box/sql.c
> +++ b/src/box/sql.c
> @@ -1636,10 +1636,12 @@ sql_debug_info(struct info_handler *h)
>  	extern int sql_search_count;
>  	extern int sql_sort_count;
>  	extern int sql_found_count;
> +	extern int sql_xferOpt_count;

Don=E2=80=99t use camel notation. Lets call it simply =
=E2=80=99sql_xfer_count=E2=80=99.

>  	info_begin(h);
>  	info_append_int(h, "sql_search_count", sql_search_count);
>  	info_append_int(h, "sql_sort_count", sql_sort_count);
>  	info_append_int(h, "sql_found_count", sql_found_count);
> +	info_append_int(h, "sql_xferOpt_count", sql_xferOpt_count);
>  	info_end(h);
>  }
> =20
> diff --git a/src/box/sql/insert.c b/src/box/sql/insert.c
> index 2c9188e..9a99bab 100644
> --- a/src/box/sql/insert.c
> +++ b/src/box/sql/insert.c
> @@ -1635,7 +1635,7 @@ sqlite3OpenTableAndIndices(Parse * pParse,	=
/* Parsing context */
>   * purposes only - to make sure the transfer optimization really
>   * is happening when it is supposed to.
>   */
> -int sqlite3_xferopt_count;
> +int sql_xferOpt_count =3D 0;
>  #endif				/* SQLITE_TEST */
> =20
>  #ifndef SQLITE_OMIT_XFER_OPT
> @@ -1658,6 +1658,8 @@ xferCompatibleIndex(Index * pDest, Index * pSrc)
>  	assert(pDest->pTable !=3D pSrc->pTable);
>  	uint32_t nDestCol =3D index_column_count(pDest);
>  	uint32_t nSrcCol =3D index_column_count(pSrc);
> +	if ((pDest->idxType !=3D pSrc->idxType))
> +		return 0;
>  	if (nDestCol !=3D nSrcCol) {
>  		return 0;	/* Different number of columns */
>  	}
> @@ -1725,9 +1727,9 @@ xferOptimization(Parse * pParse,	/* =
Parser context */
>  	int emptyDestTest =3D 0;	/* Address of test for empty =
pDest */
>  	int emptySrcTest =3D 0;	/* Address of test for empty pSrc */
>  	Vdbe *v;		/* The VDBE we are building */
> -	int destHasUniqueIdx =3D 0;	/* True if pDest has a UNIQUE =
index */
>  	int regData, regTupleid;	/* Registers holding data and =
tupleid */
>  	struct session *user_session =3D current_session();
> +	bool is_err_action_default =3D false;

Again: why do you need this flag? Default action is just synonym for =
ABORT,
so why should we care about it?

> +	struct space *src_space =3D
> +		space_by_id(SQLITE_PAGENO_TO_SPACEID(pSrc->tnum));
> +	struct space *dest_space =3D
> +		space_by_id(SQLITE_PAGENO_TO_SPACEID(pDest->tnum));

You don=E2=80=99t need to again proceed lookup: space is found few lines =
above.
Moreover, I see those lookups are executed inside =E2=80=98for' loop. =
Lets move
them outside it.

> +	struct index *src_idx =3D space_index(src_space, 0);
> +	struct index *dest_idx =3D space_index(dest_space, 0);
> +
> +	/* Xfer optimization is unable to correctly insert data
> +	 * in case there's a conflict action other than *_ABORT.
> +	 * This is the reason we want to only run it if the
> +	 * destination table is initially empty.
> +	 * That block generates code to make that determination.
> +	 */

Multi-line comment should be formatted as following:

/*
 * Comment starts here.
 * =E2=80=A6
 */

> +
> +	if (!(onError =3D=3D ON_CONFLICT_ACTION_ABORT &&
> +	    is_err_action_default =3D=3D false)) {
>  		addr1 =3D sqlite3VdbeAddOp2(v, OP_Rewind, iDest, 0);
>  		VdbeCoverage(v);
>  		emptyDestTest =3D sqlite3VdbeAddOp0(v, OP_Goto);
>  		sqlite3VdbeJumpHere(v, addr1);
> +#ifdef SQLITE_TEST
> +		if (dest_idx->vtab->count(dest_idx, ITER_ALL, NULL, 0) =
=3D=3D 0)
> +			sql_xferOpt_count++;

Actually, I don=E2=80=99t like this approach.
Look, query may be compiled and saved into cache (even thought it is =
still
not implemented yet). So it might be executed later and it might be not =
empty.
Moreover, we are going to avoid doing space lookups and use only def.
With only def you can=E2=80=99t execute count.

Personally, I wanted you to defer incrementing sql_xfer_count till
VDBE execution. For instance, you may add special flag and pass it
to OP_RowData indicating that xFer is currently processing.

> +#endif
> +	vdbe_emit_open_cursor(pParse, iSrc, 0, src_space);
> +	VdbeComment((v, "%s", src_idx->def->name));
> +	vdbe_emit_open_cursor(pParse, iDest, 0, dest_space);

I see few lines above:

sqlite3OpenTable(pParse, iDest, pDest, OP_OpenWrite);

So, basically you don=E2=80=99t need to open it again.

> +	VdbeComment((v, "%s", dest_idx->def->name));
> +	addr1 =3D sqlite3VdbeAddOp2(v, OP_Rewind, iSrc, 0);
> +	VdbeCoverage(v);
> +	sqlite3VdbeAddOp2(v, OP_RowData, iSrc, regData);
> +	sqlite3VdbeAddOp2(v, OP_IdxInsert, iDest, regData);
> +	sqlite3VdbeChangeP5(v, OPFLAG_NCHANGE);
> +	sqlite3VdbeAddOp2(v, OP_Next, iSrc, addr1 + 1);
> +	VdbeCoverage(v);
> +	sqlite3VdbeJumpHere(v, addr1);
> +	sqlite3VdbeAddOp2(v, OP_Close, iSrc, 0);
> +	sqlite3VdbeAddOp2(v, OP_Close, iDest, 0);
> +
>  	if (emptySrcTest)
>  		sqlite3VdbeJumpHere(v, emptySrcTest);
>  	sqlite3ReleaseTempReg(pParse, regTupleid);
> diff --git a/test/sql-tap/gh-3307-xfer-optimization-issue.test.lua =
b/test/sql-tap/gh-3307-xfer-optimization-issue.test.lua
> new file mode 100755
> index 0000000..e75fabc
> --- /dev/null
> +test:do_catchsql_test(
> +	"xfer-optimization-1.15",
> +	[[
> +		DROP TABLE t1;
> +		DROP TABLE t2;
> +		CREATE TABLE t1(a INTEGER PRIMARY KEY, b UNIQUE);
> +		CREATE TABLE t2(a INTEGER PRIMARY KEY, b UNIQUE);
> +		INSERT INTO t1 VALUES (2, 2), (3, 3), (5, 5);
> +		INSERT INTO t2 VALUES (1, 1), (4, 4);
> +		INSERT OR ROLLBACK INTO t2 SELECT * FROM t1;

INSERT OT ROLLBACK outside transaction works the same as ABORT and =
DEFAULT.
So, surround it with transaction and check that it really rollbacks.

> +	]], {
> +		-- <xfer-optimization-1.15>
> +		0
> +		-- <xfer-optimization-1.15>
> +	})
> +
> +test:do_execsql_test(
> +	"xfer-optimization-1.16",
> +	[[
> +		SELECT * FROM t2;
> +	]], {
> +		-- <xfer-optimization-1.16>
> +		1, 1, 2, 2, 3, 3, 4, 4, 5, 5
> +		-- <xfer-optimization-1.16>
> +	})
> +
> +-- The following tests are supposed to test if xfer-optimization is =
actually
> +-- used in the given cases (if the conflict actually occurs):
> +-- 	1.0) insert w/o explicit confl. action & w/o index replace =
action
> +-- 	1.1) insert w/o explicit confl. action & w/ index replace action =
& empty dest_table
> +-- 	1.2) insert w/o explicit confl. action & w/ index replace action =
& non-empty dest_table
> +-- 	2) insert with abort
> +-- 	3.0) insert with rollback (into empty table)
> +-- 	3.1) insert with rollback (into non-empty table)
> +-- 	4) insert with replace
> +-- 	5) insert with fail
> +-- 	6) insert with ignore
> +
> +
> +-- 1.0) insert w/o explicit confl. action & w/o index replace action
> =
+-------------------------------------------------------------------------=
------------------
> +
> +bfr =3D box.sql.debug().sql_xferOpt_count
> +
> +test:do_catchsql_test(
> +	"xfer-optimization-1.17",
> +	[[
> +		DROP TABLE t1;
> +		DROP TABLE t2;
> +		CREATE TABLE t1(a INTEGER PRIMARY KEY, b);
> +		CREATE TABLE t2(a INTEGER PRIMARY KEY, b);
> +		INSERT INTO t1 VALUES (1, 1), (3, 3), (5, 5);
> +		INSERT INTO t2 VALUES (2, 2), (3, 4);
> +		BEGIN;
> +			INSERT INTO t2 VALUES (4, 4);
> +			INSERT INTO t2 SELECT * FROM t1;
> +	]], {
> +		-- <xfer-optimization-1.17>
> +		1, "Duplicate key exists in unique index =
'sqlite_autoindex_T2_1' in space 'T2'"
> +		-- <xfer-optimization-1.17>
> +	})
> +
> +test:do_execsql_test(
> +	"xfer-optimization-1.18",
> +	[[
> +			INSERT INTO t2 VALUES (10, 10);
> +		COMMIT;
> +		SELECT * FROM t2;
> +	]], {
> +		-- <xfer-optimization-1.18>
> +		2, 2, 3, 4, 4, 4, 10, 10
> +		-- <xfer-optimization-1.18>
> +	})
> +
> +aftr =3D box.sql.debug().sql_xferOpt_count
> +
> +test:do_test(
> +	"xfer-optimization-1.19",
> +	function()
> +		if (aftr - bfr =3D=3D 1) then
> +			return {1}
> +		end
> +		if (aftr =3D=3D bfr) then
> +			return {0}
> +		end
> +		return {2}

Why do you repeat this snippet each time? You can declare it as named
function once and use it everywhere.

> +	end, {
> +		-- <xfer-optimization-1.19>
> +		0
> +		-- <xfer-optimization-1.19>
> +	})
> +
> +-- 1.1) insert w/o explicit confl. action & w/ index replace action & =
empty dest_table

Even in tests lets not exceed 80 chars (here and in other places).