Tarantool development patches archive
 help / color / mirror / Atom feed
From: "n.pettik" <korablev@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Ivan Koptelov <ivan.koptelov@tarantool.org>,
	Konstantin Osipov <kostja@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
Date: Wed, 17 Apr 2019 16:19:16 +0300	[thread overview]
Message-ID: <EBECFD7F-3491-428A-B022-B91AAAF47E29@tarantool.org> (raw)
In-Reply-To: <BBC9768D-9708-429B-A40C-C3E91EAB56E8@tarantool.org>



> On 17 Apr 2019, at 15:50, i.koptelov <ivan.koptelov@tarantool.org> wrote:
> 
> Thank you for the comments! I agree with all your suggestions and would
> send fixes a little bit later. Now I have one thing to discuss. 
> 
>> On 5 Apr 2019, at 22:48, Konstantin Osipov <kostja@tarantool.org> wrote:
>> 
>> * Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
>> 
>> Besides, I guess you can get rid of this check for most common
>> cases - averaging a column of the same type - so this is perhaps
>> better to make a separate opcode, not part of the main opcode, and
>> emit only when we're not sure the type is going to be the same
>> across all values. I don't know how hard this is to do, however -
>> perhaps should be moved into a separate patch, but I'd guess
>> detecting that the aggregate function argument has a non-mutable
>> type is not hard. 
>> 
>> -- 
>> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
>> http://tarantool.io - www.twitter.com/kostja_osipov
> 
> I am not quite understand why do you use word ‘opcode’.
> Functions are implemented as C code.

I guess Konstantin meant adding opcode that implements
logic of primitive aggregate function. We are operating in
terms of opcodes due to our abstraction model: it is divided
into compilation stage and runtime. To invoke ‘c function’, we
should have corresponding opcode implementing initial preparation,
passing arguments to function, saving results to given registers etc.

Original SQLite routines to implement aggregates take into account
most general case (any function, any types, any number of arguements etc).
We can significantly simplify this procedures for several quite common cases.
For instance, SELECT sum(a) FROM t;

Within one opcode, we can open iterator, fetch values and quickly evaluate
sum (without involving OP_Next, OP_Column etc).

> Considering your suggestion (one ‘opcode’ for simple cases, another one
> - for complex) I want to do the following:
> 1) Add a bunch of INTERNAL functions, for example max_number, max_text and max_scalar.
> max_number and max_text would not have excess type checks, while max_scalar would
> have all necessary type checks. So a bunch of INTERNAL functions would implement one EXTERNAL function
> (just max() in this example).

Anyway, at some moment you have to dispatch call of function depending
on runtime value:

switch (mem->type) {
	case MEM_Str:
		return max_str(mem);
etc.

Otherwise, we have to add separate opcode for each pair
(aggregate, type), which in turn would decrease performance.

Nevertheless, I like this approach. For instance, it is used in PosgreSQL:
for each operation (like ‘+’) they have function with corresponding arguments.
When you try to add non-compatible values (like string and double), there’s
no function implementing addition with char * and double arguments, so
error is raised:  'operator does not exist: text * integer’. What is more, such
approach is quite suitable for JITing. On the other hand, we should carefully
benchmark changes like this. I’m not sure whether priority of this task is high
or low.

> 2) In runtime determine proper INTERNAL function (max_number, max_text or max_scalar) to implement
> given function. It would be done only once (not on the every step of aggregate function) using
> information about column type.
> 
> For example:
> SELECT MAX(b) FROM test_table;
> 
> If test_table.b has TEXT type we would use max_text. If test_table.b has SCALAR type
> we would max_scalar.  
> 
> If this question seem for you to be too ‘low-level’ I can just send the code
> for the next review round.
> 
> 

  reply	other threads:[~2019-04-17 13:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-05 14:57 [tarantool-patches] [PATCH 0/2] " Ivan Koptelov
2019-04-05 14:57 ` [tarantool-patches] [PATCH 1/2] " Ivan Koptelov
2019-04-09 14:52   ` [tarantool-patches] " n.pettik
2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
2019-04-05 19:48   ` [tarantool-patches] " Konstantin Osipov
2019-04-17 12:50     ` i.koptelov
2019-04-17 13:19       ` n.pettik [this message]
2019-04-09 14:52   ` n.pettik
2019-04-23 15:38     ` i.koptelov
2019-04-24 17:37       ` n.pettik
2019-05-06 13:24         ` i.koptelov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EBECFD7F-3491-428A-B022-B91AAAF47E29@tarantool.org \
    --to=korablev@tarantool.org \
    --cc=ivan.koptelov@tarantool.org \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox