From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 903FC2A73E
	for <tarantool-patches@freelists.org>; Wed, 17 Apr 2019 09:19:20 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 8aJjbse6NW_1 for <tarantool-patches@freelists.org>;
	Wed, 17 Apr 2019 09:19:20 -0400 (EDT)
Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 4A4382A739
	for <tarantool-patches@freelists.org>; Wed, 17 Apr 2019 09:19:20 -0400 (EDT)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\))
Subject: [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions
 types more strict
From: "n.pettik" <korablev@tarantool.org>
In-Reply-To: <BBC9768D-9708-429B-A40C-C3E91EAB56E8@tarantool.org>
Date: Wed, 17 Apr 2019 16:19:16 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <EBECFD7F-3491-428A-B022-B91AAAF47E29@tarantool.org>
References: <cover.1554475881.git.ivan.koptelov@tarantool.org>
 <49e4ae0bc187dc02f908427692c0ddb2cc2d36a8.1554475881.git.ivan.koptelov@tarantool.org>
 <20190405194815.GH3789@chai>
 <BBC9768D-9708-429B-A40C-C3E91EAB56E8@tarantool.org>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-Help: <mailto:ecartis@freelists.org?Subject=help>
List-Unsubscribe: <mailto:tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-Subscribe: <mailto:tarantool-patches-request@freelists.org?Subject=subscribe>
List-Owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-Archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org
Cc: Ivan Koptelov <ivan.koptelov@tarantool.org>, Konstantin Osipov <kostja@tarantool.org>


> On 17 Apr 2019, at 15:50, i.koptelov <ivan.koptelov@tarantool.org> =
wrote:
>=20
> Thank you for the comments! I agree with all your suggestions and =
would
> send fixes a little bit later. Now I have one thing to discuss.=20
>=20
>> On 5 Apr 2019, at 22:48, Konstantin Osipov <kostja@tarantool.org> =
wrote:
>>=20
>> * Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
>>=20
>> Besides, I guess you can get rid of this check for most common
>> cases - averaging a column of the same type - so this is perhaps
>> better to make a separate opcode, not part of the main opcode, and
>> emit only when we're not sure the type is going to be the same
>> across all values. I don't know how hard this is to do, however -
>> perhaps should be moved into a separate patch, but I'd guess
>> detecting that the aggregate function argument has a non-mutable
>> type is not hard.=20
>>=20
>> --=20
>> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
>> http://tarantool.io - www.twitter.com/kostja_osipov
>=20
> I am not quite understand why do you use word =E2=80=98opcode=E2=80=99.
> Functions are implemented as C code.

I guess Konstantin meant adding opcode that implements
logic of primitive aggregate function. We are operating in
terms of opcodes due to our abstraction model: it is divided
into compilation stage and runtime. To invoke =E2=80=98c function=E2=80=99=
, we
should have corresponding opcode implementing initial preparation,
passing arguments to function, saving results to given registers etc.

Original SQLite routines to implement aggregates take into account
most general case (any function, any types, any number of arguements =
etc).
We can significantly simplify this procedures for several quite common =
cases.
For instance, SELECT sum(a) FROM t;

Within one opcode, we can open iterator, fetch values and quickly =
evaluate
sum (without involving OP_Next, OP_Column etc).

> Considering your suggestion (one =E2=80=98opcode=E2=80=99 for simple =
cases, another one
> - for complex) I want to do the following:
> 1) Add a bunch of INTERNAL functions, for example max_number, max_text =
and max_scalar.
> max_number and max_text would not have excess type checks, while =
max_scalar would
> have all necessary type checks. So a bunch of INTERNAL functions would =
implement one EXTERNAL function
> (just max() in this example).

Anyway, at some moment you have to dispatch call of function depending
on runtime value:

switch (mem->type) {
	case MEM_Str:
		return max_str(mem);
etc.

Otherwise, we have to add separate opcode for each pair
(aggregate, type), which in turn would decrease performance.

Nevertheless, I like this approach. For instance, it is used in =
PosgreSQL:
for each operation (like =E2=80=98+=E2=80=99) they have function with =
corresponding arguments.
When you try to add non-compatible values (like string and double), =
there=E2=80=99s
no function implementing addition with char * and double arguments, so
error is raised:  'operator does not exist: text * integer=E2=80=99. =
What is more, such
approach is quite suitable for JITing. On the other hand, we should =
carefully
benchmark changes like this. I=E2=80=99m not sure whether priority of =
this task is high
or low.

> 2) In runtime determine proper INTERNAL function (max_number, max_text =
or max_scalar) to implement
> given function. It would be done only once (not on the every step of =
aggregate function) using
> information about column type.
>=20
> For example:
> SELECT MAX(b) FROM test_table;
>=20
> If test_table.b has TEXT type we would use max_text. If test_table.b =
has SCALAR type
> we would max_scalar. =20
>=20
> If this question seem for you to be too =E2=80=98low-level=E2=80=99 I =
can just send the code
> for the next review round.
>=20
>=20