Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Mergen Imeev <imeevma@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: introduce user-defined aggregate functions
Date: Mon, 24 Jan 2022 23:08:49 +0100	[thread overview]
Message-ID: <035cf09f-f958-a39c-89db-8e01dce22814@tarantool.org> (raw)
In-Reply-To: <20220123141755.GA103585@tarantool.org>

On 23.01.2022 15:17, Mergen Imeev wrote:
> Hi! Thank you for the review! Diff, new patch and a patch that adds a new
> utility function below.
> 
> On Tue, Jan 18, 2022 at 01:10:04AM +0100, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> What happened to the idea we had about SQL iterators? Then people would
>> be able to iterate with yields and to make any kind of aggregation without
>> having to create functions for that.
>>
> I think that this idea is actually very similar to the definition of CURSOR from
> the standard. Here's what the standard says about CURSORs:
> "A cursor is a mechanism by which the rows of a table may be acted on (e.g.,
> returned to a host programming language) one at a time."
> 
> I will try to suggest this idea.
> 
> However, current issue have quite strict deadline, which is actually quite near.
> I believe that CURSOR injection is not a problem that can be solved in a few
> weeks.
> 
> Also, about my current implementation. I don't like it, because there are too
> many changes made to BOX that are not related to BOX itself. I think there is a
> much simpler implementation. I think you noticed that the current implementation
> says that non-constant functions and C functions should define two
> implementations named "<name>" and "<name>_finalize". Do you think this rule can
> be made more general? I suggest creating two tuples in _func instead of one when
> we create an aggregate function using box.schema.func.create(). In this case,
> there will be almost no changes in alter.cc and BOX as a whole, and the logic
> will become much simpler. Also, I think we won't be adding new opcodes to the
> VDBE since we can just reuse OP_FunctionByName. That problem with this approach
> is that "<name>_finalize" will be occupied.

Another issue is that people will need to delete this func explicitly when they
drop the main one. That might be a blocker for this rework idea.

Plausible alternatives are:

1. Postpone aggregation until cursors are implemented. Move the deadline. Then the
aggregation might be not needed at all.

2. Allow to specify step and finalization functions in SQL query. For instance,

	-- SQL
	--
	SELECT AGGREGATE({STEP = AvgFunc, END = AvgEnd}, [a, b]) FROM t;

	-- Lua
	--
	function AvgFunc(ctx, a, b)
		if ctx == nil then
			return {s = a + b, n = 1}
		else
			ctx.s = ctx.s + a + b
			ctx.n = ctx.n + 1
		end
	end

	function AvgEnd(ctx)
		return ctx.s / ctx.n
	end


	box.schema.func.create('AvgFunc', ...)
	box.schema.func.create('AvgEnd', ...)

If you omit FIN, it is simply not called. The same with STEP. And later could
introduce START.

3. Force people to create '..._finalize' function. Until it is created by their
own hands, the aggregation will raise an error.

> Also, this could potentially lead to
> a different combination of STEP and FINALIZE functions, for example, STEP could
> be constant but FINALIZE could be non-persistent. However, to create such
> combinations, the user must insert tuples directly into _func. I am not sure,
> if this a feature or a bug. What do you think?

As for how the functions are implemented - I wouldn't care much. They must
exist and be callable, that could be the only requirement.

I didn't look at the new version of the patch yet. Only answering the main
question now.

      reply	other threads:[~2022-01-24 22:08 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-14 13:38 Mergen Imeev via Tarantool-patches
2022-01-18  0:10 ` Vladislav Shpilevoy via Tarantool-patches
2022-01-23 14:17   ` Mergen Imeev via Tarantool-patches
2022-01-24 22:08     ` Vladislav Shpilevoy via Tarantool-patches [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=035cf09f-f958-a39c-89db-8e01dce22814@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=imeevma@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: introduce user-defined aggregate functions' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox