From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 25C4A21A6D for ; Fri, 20 Apr 2018 11:10:12 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id O3D9HuyFw7dz for ; Fri, 20 Apr 2018 11:10:12 -0400 (EDT) Received: from mail-lf0-f47.google.com (mail-lf0-f47.google.com [209.85.215.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id A191E21A6B for ; Fri, 20 Apr 2018 11:10:11 -0400 (EDT) Received: by mail-lf0-f47.google.com with SMTP id d79-v6so5682251lfd.0 for ; Fri, 20 Apr 2018 08:10:11 -0700 (PDT) MIME-Version: 1.0 References: <1524065531-32467-1-git-send-email-hollow653@gmail.com> <08FAE06B-F6D3-49BD-9011-B5770629AA21@tarantool.org> <5BB99B27-5F86-4664-AAD5-57A22ECED854@tarantool.org> In-Reply-To: <5BB99B27-5F86-4664-AAD5-57A22ECED854@tarantool.org> From: Hollow111 Date: Fri, 20 Apr 2018 15:09:57 +0000 Message-ID: Subject: [tarantool-patches] Re: [PATCH] sql: xfer optimization issue Content-Type: multipart/alternative; boundary="f4f5e80eefa075d831056a49152a" Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: korablev@tarantool.org Cc: tarantool-patches@freelists.org --f4f5e80eefa075d831056a49152a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D0=BF=D1=82, 20 =D0=B0=D0=BF=D1=80. 2018 =D0=B3. =D0=B2 4:02, n.pettik : > > > Currently insertion from the table to another one > > with the same schema using SELECT works wrong. > > The problem lies in xfer optimization which opens cursors for > > all of the indexes and inserts data excessively. > > Now only PRIMARY KEY is used to handle insertion. > > Moreover analyzing xfer optimization I have noticed > > Nitpicking: it is not worth mentioning process of exploration, > just use (in this particular case) statement of facts. > > > diff --git a/src/box/sql/insert.c b/src/box/sql/insert.c > > index ae8dafb..ed134f4 100644 > > --- a/src/box/sql/insert.c > > +++ b/src/box/sql/insert.c > > @@ -1711,6 +1711,7 @@ xferOptimization(Parse * pParse, /* Parser > context */ > > ExprList *pEList; /* The result set of the SELECT */ > > Table *pSrc; /* The table in the FROM clause of SELECT > */ > > Index *pSrcIdx, *pDestIdx; /* Source and destination indices > */ > > + struct index *src_idx, *dest_idx; /* Source and destination > indices */ > > We don=E2=80=99t use forward var declaration: it is obsolete rule from an= cient C > standards. > Moreover, we place comments above the code to be commented. > > > + struct space *space; /* Space pointer for pDest and pSrc */ > > The same is here. > > > - assert(destHasUniqueIdx); > > - if ((pDest->iPKey < 0 && pDest->pIndex !=3D 0) /* (1) */ > > - ||destHasUniqueIdx /* (2) */ > > - || (onError !=3D ON_CONFLICT_ACTION_ABORT > > - && onError !=3D ON_CONFLICT_ACTION_ROLLBACK) /* (3) = */ > > - ) { > > - /* In some circumstances, we are able to run the xfer > optimization > > - * only if the destination table is initially empty. > > - * This block generates code to make > > - * that determination. > > - * > > - * Conditions under which the destination must be empty: > > - * > > - * (1) There is no INTEGER PRIMARY KEY but there are > indices. > > - * > > - * (2) The destination has a unique index. (The xfer > optimization > > - * is unable to test uniqueness.) > > - * > > - * (3) onError is something other than > ON_CONFLICT_ACTION_ABORT and _ROLLBACK. > > - */ > > - addr1 =3D sqlite3VdbeAddOp2(v, OP_Rewind, iDest, 0); > > - VdbeCoverage(v); > > - emptyDestTest =3D sqlite3VdbeAddOp0(v, OP_Goto); > > - sqlite3VdbeJumpHere(v, addr1); > > - } > > Why did you delete if-clause? The code which checks that > table is empty is needed only in the cases in if-clauses, > i.e. in some cases xfer optimisation can be applied even if > table is not initially empty (as far as I understand). > Yes, that's true this code is needed if any of the cases in if-clause !=3D 0. But the point is that in Tarantool there's always a PRIMARY KEY that is obviously unique which means that the second case is always !=3D 0 and the result of the boolean expression is always not 0, hence there's no reason for this if-check. > > + /* The xfer optimization is unable to test uniqueness > > + * while we have a unique PRIMARY KEY in any existing table. > > + * This is the reason we can only run it if the destination table > > Nitpicking: comments should fit into 66 chars. > > > + * is initially empty. > > + * This block generates code to make that determination. > > + */ > > + addr1 =3D sqlite3VdbeAddOp2(v, OP_Rewind, iDest, 0); > > + VdbeCoverage(v); > > + emptyDestTest =3D sqlite3VdbeAddOp0(v, OP_Goto); > > + sqlite3VdbeJumpHere(v, addr1); > > + > > + space =3D space_by_id(SQLITE_PAGENO_TO_SPACEID(pDest->tnum)); > > + dest_idx =3D space_index(space, 0 /* PK */); > > + space =3D space_by_id(SQLITE_PAGENO_TO_SPACEID(pSrc->tnum)); > > + src_idx =3D space_index(space, 0 /* PK */); > > + assert(src_idx); > > + assert(dest_idx); > > We use explicit !=3D NULL checks. > > > + pDestIdx =3D sqlite3PrimaryKeyIndex(pDest); > > + pSrcIdx =3D sqlite3PrimaryKeyIndex(pSrc); > > In fact, you don=E2=80=99t need these indexes at all. > Instead of calling emit_open_cursor() you > can simply add two opcodes: OP_LoadPtr and OP_OpenWrite > (as it happens inside that function). Or, you can use pSrc->tnum. > Table=E2=80=99s tnum exactly encodes PK space id (index id is 0). > As for KeyInfo - you are able to remove it at all, since opcodes > below don=E2=80=99t rely on key info (AFAIK, better check it yourself). > The only issue remained - how to check PK compatibility. > I would add to xferCompatibleIndex() another one check: > If first index is PK, then second also must be PK. > It is easy to implement, since Index struct contains appropriate flag. > > --f4f5e80eefa075d831056a49152a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=D0=BF= =D1=82, 20 =D0=B0=D0=BF=D1=80. 2018 =D0=B3. =D0=B2 4:02, n.pettik <korablev@tarantool.org>:

> Currently insertion from the table to another one
> with the same schema using SELECT works wrong.
> The problem lies in xfer optimization which opens cursors for
> all of the indexes and inserts data excessively.
> Now only PRIMARY KEY is used to handle insertion.
> Moreover analyzing xfer optimization I have noticed

Nitpicking: it is not worth mentioning process of exploration,
just use (in this particular case) statement of facts.

> diff --git a/src/box/sql/insert.c b/src/box/sql/insert.c
> index ae8dafb..ed134f4 100644
> --- a/src/box/sql/insert.c
> +++ b/src/box/sql/insert.c
> @@ -1711,6 +1711,7 @@ xferOptimization(Parse * pParse,=C2=A0 =C2=A0 = =C2=A0 =C2=A0 /* Parser context */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0ExprList *pEList;=C2=A0 =C2=A0 =C2=A0 =C2=A0= /* The result set of the SELECT */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0Table *pSrc;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 /* The table in the FROM clause of SELECT */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0Index *pSrcIdx, *pDestIdx;=C2=A0 =C2=A0 =C2= =A0 /* Source and destination indices */
> +=C2=A0 =C2=A0 =C2=A0struct index *src_idx, *dest_idx;=C2=A0 =C2=A0 = =C2=A0 =C2=A0/* Source and destination indices */

We don=E2=80=99t use forward var declaration: it is obsolete rule from anci= ent C standards.
Moreover, we place comments above the code to be commented.

> +=C2=A0 =C2=A0 =C2=A0struct space *space;=C2=A0 =C2=A0 /* Space pointe= r for pDest and pSrc */

The same is here.

> -=C2=A0 =C2=A0 =C2=A0assert(destHasUniqueIdx);
> -=C2=A0 =C2=A0 =C2=A0if ((pDest->iPKey < 0 && pDest->= pIndex !=3D 0)=C2=A0 =C2=A0 /* (1) */
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0||destHasUniqueIdx=C2=A0 /* (2) */<= br> > -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|| (onError !=3D ON_CONFLICT_ACTION= _ABORT
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&& onError != =3D ON_CONFLICT_ACTION_ROLLBACK)=C2=A0 =C2=A0 =C2=A0 /* (3) */
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0) {
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* In some circumstan= ces, we are able to run the xfer optimization
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * only if the destin= ation table is initially empty.
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * This block generat= es code to make
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * that determination= .
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Conditions under w= hich the destination must be empty:
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * (1) There is no IN= TEGER PRIMARY KEY but there are indices.
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * (2) The destinatio= n has a unique index.=C2=A0 (The xfer optimization
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *=C2=A0 =C2=A0 =C2= =A0is unable to test uniqueness.)
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * (3) onError is som= ething other than ON_CONFLICT_ACTION_ABORT and _ROLLBACK.
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0addr1 =3D sqlite3Vdbe= AddOp2(v, OP_Rewind, iDest, 0);
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VdbeCoverage(v);
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0emptyDestTest =3D sql= ite3VdbeAddOp0(v, OP_Goto);
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sqlite3VdbeJumpHere(v= , addr1);
> -=C2=A0 =C2=A0 =C2=A0}

Why did you delete if-clause? The code which checks that
table is empty is needed only in the cases in if-clauses,
i.e. in some cases xfer optimisation can be applied even if
table is not initially empty (as far as I understand).

Yes, that's true this code is needed if any of the=C2= =A0
cases in if-clause !=3D 0. But the point is that in Tarantool=
there's always a PRIMARY KEY that is obviously unique
which mean= s that the=C2=A0second case is always !=3D 0 and
the result of th= e boolean expression is always not 0,
hence there's no reason= for this if-check.


> +=C2=A0 =C2=A0 =C2=A0/* The xfer optimization is unable to test unique= ness
> +=C2=A0 =C2=A0 =C2=A0 * while we have a unique PRIMARY KEY in any exis= ting table.
> +=C2=A0 =C2=A0 =C2=A0 * This is the reason we can only run it if the d= estination table

Nitpicking: comments should fit into 66 chars.

> +=C2=A0 =C2=A0 =C2=A0 * is initially empty.
> +=C2=A0 =C2=A0 =C2=A0 * This block generates code to make that determi= nation.
> +=C2=A0 =C2=A0 =C2=A0 */
> +=C2=A0 =C2=A0 =C2=A0addr1 =3D sqlite3VdbeAddOp2(v, OP_Rewind, iDest, = 0);
> +=C2=A0 =C2=A0 =C2=A0VdbeCoverage(v);
> +=C2=A0 =C2=A0 =C2=A0emptyDestTest =3D sqlite3VdbeAddOp0(v, OP_Goto);<= br> > +=C2=A0 =C2=A0 =C2=A0sqlite3VdbeJumpHere(v, addr1);
> +
> +=C2=A0 =C2=A0 =C2=A0space =3D space_by_id(SQLITE_PAGENO_TO_SPACEID(pD= est->tnum));
> +=C2=A0 =C2=A0 =C2=A0dest_idx =3D space_index(space, 0 /* PK */);
> +=C2=A0 =C2=A0 =C2=A0space =3D space_by_id(SQLITE_PAGENO_TO_SPACEID(pS= rc->tnum));
> +=C2=A0 =C2=A0 =C2=A0src_idx =3D space_index(space, 0 /* PK */);
> +=C2=A0 =C2=A0 =C2=A0assert(src_idx);
> +=C2=A0 =C2=A0 =C2=A0assert(dest_idx);

We use explicit !=3D NULL checks.

> +=C2=A0 =C2=A0 =C2=A0pDestIdx =3D sqlite3PrimaryKeyIndex(pDest);
> +=C2=A0 =C2=A0 =C2=A0pSrcIdx =3D sqlite3PrimaryKeyIndex(pSrc);

In fact, you don=E2=80=99t need these indexes at all.
Instead of calling emit_open_cursor() you
can simply add two opcodes: OP_LoadPtr and OP_OpenWrite
(as it happens inside that function). Or, you can use pSrc->tnum.
Table=E2=80=99s tnum exactly encodes PK space id (index id is 0).
As for KeyInfo - you are able to remove it at all, since opcodes
below don=E2=80=99t rely on key info (AFAIK, better check it yourself).
The only issue remained - how to check PK compatibility.
I would add to xferCompatibleIndex() another one check:
If first index is PK, then second also must be PK.
It is easy to implement, since Index struct contains appropriate flag.

--f4f5e80eefa075d831056a49152a--