[Tarantool-patches] [PATCH v1 1/1] sql: introduce user-defined aggregate functions

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Jan 25 01:08:49 MSK 2022


On 23.01.2022 15:17, Mergen Imeev wrote:
> Hi! Thank you for the review! Diff, new patch and a patch that adds a new
> utility function below.
> 
> On Tue, Jan 18, 2022 at 01:10:04AM +0100, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> What happened to the idea we had about SQL iterators? Then people would
>> be able to iterate with yields and to make any kind of aggregation without
>> having to create functions for that.
>>
> I think that this idea is actually very similar to the definition of CURSOR from
> the standard. Here's what the standard says about CURSORs:
> "A cursor is a mechanism by which the rows of a table may be acted on (e.g.,
> returned to a host programming language) one at a time."
> 
> I will try to suggest this idea.
> 
> However, current issue have quite strict deadline, which is actually quite near.
> I believe that CURSOR injection is not a problem that can be solved in a few
> weeks.
> 
> Also, about my current implementation. I don't like it, because there are too
> many changes made to BOX that are not related to BOX itself. I think there is a
> much simpler implementation. I think you noticed that the current implementation
> says that non-constant functions and C functions should define two
> implementations named "<name>" and "<name>_finalize". Do you think this rule can
> be made more general? I suggest creating two tuples in _func instead of one when
> we create an aggregate function using box.schema.func.create(). In this case,
> there will be almost no changes in alter.cc and BOX as a whole, and the logic
> will become much simpler. Also, I think we won't be adding new opcodes to the
> VDBE since we can just reuse OP_FunctionByName. That problem with this approach
> is that "<name>_finalize" will be occupied.

Another issue is that people will need to delete this func explicitly when they
drop the main one. That might be a blocker for this rework idea.

Plausible alternatives are:

1. Postpone aggregation until cursors are implemented. Move the deadline. Then the
aggregation might be not needed at all.

2. Allow to specify step and finalization functions in SQL query. For instance,

	-- SQL
	--
	SELECT AGGREGATE({STEP = AvgFunc, END = AvgEnd}, [a, b]) FROM t;

	-- Lua
	--
	function AvgFunc(ctx, a, b)
		if ctx == nil then
			return {s = a + b, n = 1}
		else
			ctx.s = ctx.s + a + b
			ctx.n = ctx.n + 1
		end
	end

	function AvgEnd(ctx)
		return ctx.s / ctx.n
	end


	box.schema.func.create('AvgFunc', ...)
	box.schema.func.create('AvgEnd', ...)

If you omit FIN, it is simply not called. The same with STEP. And later could
introduce START.

3. Force people to create '..._finalize' function. Until it is created by their
own hands, the aggregation will raise an error.

> Also, this could potentially lead to
> a different combination of STEP and FINALIZE functions, for example, STEP could
> be constant but FINALIZE could be non-persistent. However, to create such
> combinations, the user must insert tuples directly into _func. I am not sure,
> if this a feature or a bug. What do you think?

As for how the functions are implemented - I wouldn't care much. They must
exist and be callable, that could be the only requirement.

I didn't look at the new version of the patch yet. Only answering the main
question now.


More information about the Tarantool-patches mailing list