From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tsafin@tarantool.org>
Received: from smtp40.i.mail.ru (smtp40.i.mail.ru [94.100.177.100])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 16F62469710
 for <tarantool-discussions@dev.tarantool.org>;
 Wed, 25 Nov 2020 00:55:41 +0300 (MSK)
From: "Timur Safin" <tsafin@tarantool.org>
References: <0b1d01d6b9a4$aecaf500$0c60df00$@tarantool.org>
 <49e6f534-1d53-3fe1-a363-3e42fcf916bb@ocelot.ca>
In-Reply-To: <49e6f534-1d53-3fe1-a363-3e42fcf916bb@ocelot.ca>
Date: Wed, 25 Nov 2020 00:55:37 +0300
Message-ID: <07de01d6c2ac$8b80b600$a2822200$@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Content-Language: ru
Subject: Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - AST
List-Id: Tarantool development process
 <tarantool-discussions.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-discussions>, 
 <mailto:tarantool-discussions-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-discussions/>
List-Post: <mailto:tarantool-discussions@dev.tarantool.org>
List-Help: <mailto:tarantool-discussions-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-discussions>, 
 <mailto:tarantool-discussions-request@dev.tarantool.org?subject=subscribe>
To: 'Peter Gulutzan' <pgulutzan@ocelot.ca>
Cc: m.semkin@corp.mail.ru, mons@tarantool.org, tarantool-discussions@dev.tarantool.org

Thanks for your ideas, Peter!

We are not yet ready to discuss in all details distributed SQL=20
modifications we would need to introduce for cluster execution,
because we are in much earlier stages of development, and need=20
to proceed the 1st step yet - extracting AST from SQL parse tree.

But I promise, we will return back to this discussion, once=20
we would go beyond the single node. I'll return back then
more prepared.

Thanks,
Timur

: From: Peter Gulutzan <pgulutzan@ocelot.ca>
: Subject: Re: [Tarantool-discussions] RFC - distributed SQL, step #1 - =
AST
:=20
: Hi,
:=20
: I only want to address the possible SQL syntax adjustments.
:=20
: You mentioned "vshard" so I take it that we must assume use of
: =
https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/
: so I mean: what SQL adjustments can take advantage of vshard.
:=20
: If we have some appropriate things in SQL, we can:
: () make it appear that migration from other SQL DBMSs is easier
: () partly fulfill the "distributed SQL" by really doing it in SQL.
: () allow users to design tables and queries with shard awareness.
: However, if I guess something looks hard, I'll say: make it illegal.
:=20
: BUCKET_ID
:=20
: All tuples, and most queries, need a column which contains numbers
: between 1 and bucket_count, and has a non-unique index.
: More flexibility would be nice but assume that's true for a year.
: Since a typical SQL database will contain column values that have
: different definitions, we are looking for a way that an SQL user
: can declare, for each table, "here is a column that can be mapped
: to bucket_id". Thus, in SQL terms, bucket_id is a GENERATED column
: which is derived from other columns in the table, or system values.
: For example, if column_x is an unsigned primary key, perhaps
: bucket_id =3D mod(primary_key_value, bucket_count).
: Any deterministic function will do.
: Changing bucket_count will be difficult.
: Changing the source column will be illegal.
:=20
: Of course the initial assumption is that a router will handle
: distribution questions automatically but the feature might
: look better if there is clear support in the syntax.
:=20
: "Distributed SQL" should mean that you can specify something in SQL.
: There are some common attributes that can be specified,
: and the most obvious one is: there is a choice of distribution
: methods for each table.
: In other words: saying
: "It is distributed SQL but it's not in SQL and the only thing
: you can do is distribute by a hash"
: will not look like what some people think is distributed SQL.
: So I am describing some additional SQL syntax
: which is non-standard but very common,
: with slight extra functionality that I hope will not look hard.
:=20
: CREATE TABLE statement
:=20
: I am suggesting a new optional clause, and a change to an
: existing clause, that would at least indicate whether and how
: the table is sharded.
:=20
: It will show up in information_schema
: and in error messages in a way that SQL users will understand.
: Also, a syntax check can determine whether the clause is illegal
: because there are other CREATE TABLE clauses that won't work
: in combination with the new or changed clause.
: (I am thinking of foreign-key and check clauses, possibly.)
:=20
: CREATE TABLE ... PARTITION BY ...
:=20
: Should the clause be PARTITION BY?
: Of course the clause could start with the keyword SHARD, or something =
else.
: And of course we do not wish to appear to be saying that
: shards are synonymous with partitions.
: But we can decide, as
: PostgreSQL did when they "implemented sharding on top of partitioning"
: https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-
: postgresql-and-how-it-relates-to-mongodbs/
: that a partition description will affect a shard definition,
: so it is okay unless we plan to do partitioning without sharding =
later.
: And PARTITION BY is the clause that is already used in
: MySQL/MariaDB
: https://dev.mysql.com/doc/refman/8.0/en/create-table.html
: https://mariadb.com/kb/en/create-table/#partitions
: PostgreSQL:
: https://www.postgresql.org/docs/11/sql-createtable.html
: Oracle (though they add "You cannot partition a table that is part of =
a
: cluster."):
: =
https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
: =
https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#i2215406
: DB2:
: =
https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/=
db2
: z_sql_createtable.html
: So, although this is only part of what those other vendors do, I =
suggest
: PARTITION BY some-keyword, and we can add more options as time goes =
by.
:=20
: The word PARTITION is already a reserved word.
:=20
: REJECTING REDSHIFT
:=20
: To be fair, I must acknowledge the existence of other syntaxes,
: such as SQL Server's (which allows specifying functions) and most
: interestingly Amazon's. For example their ALTER TABLE options include
: https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html
: | ALTER DISTKEY column_name
: | ALTER DISTSTYLE ALL
: | ALTER DISTSTYLE EVEN
: | ALTER DISTSTYLE KEY DISTKEY column_name
: | ALTER DISTSTYLE AUTO
: where ALL means "all nodes should get a copy of the table",
: DISTKEY i.e. Distribution Key means something like partition key,
: AUTO means let the system figure it out.
: I like ALL but it would require a significant change to vshard rules.
: I like AUTO but it is nearly useless without ALL.
: Therefore, given that these options are uncommon, I ignore them.
:=20
: (By the way Redshift's "leader node" can itself have tables so it
: doesn't need to pass all queries to other nodes, maybe our "router"
: could do something similar, but that's not an issue here.)
:=20
: CREATE TABLE ... PARTITION BY HASH (partition-key-column-names);
:=20
: I am assuming that we don't need to change vshard hashing or
: add more hash algorithms. And I don't refer here to the way
: that vshard handles bucket_id, I only expect the hash here to
: provide us with a bucket_id value when partition-key-columns are =
known.
: So perhaps it is uncontroversial that HASH (primary-key-column-names)
: would be legal. But is it default and is it compulsory?
:=20
: I think the answers are 'no' and 'no' but am uncertain what does it
: mean if this clause is absent. It could mean
: "iff the whole database is sharded then assume it"
: or it could mean
: "even if the database is sharded this is local and probably =
temporary".
:=20
: It would be simplest if partition-key-column-names happened to be
: a primary-key-column, because updating a primary-key-column is
: illegal already. Anyway, updating partition-key-column-names is =
illegal.
:=20
: The list of columns in CREATE TABLE must include "bucket_id" (lower =
case).
: Perhaps we could assume that it exists and not show it with SELECT *.
: But I think it is useful to know its value and useful to be able to
: specify it anywhere in the column list. So now an example:
:=20
: CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 =
STRING)
: PARTITION BY HASH (column_2);
: INSERT INTO t VALUES (1, 'A');
: SELECT * FROM t;
: Result: 1, 41, 'A'.
: Notice assumption: when inserting, I don't need to specify =
"bucket_id".
: Notice assumption: bucket_id =3D code point of first letter of string.
: Maybe those are bad assumptions, I just want the example to be easy.
:=20
: I suppose that PARTITION BY clause comes after WITH clause,
: as in MySQL where partition options come after table options.
:=20
: CREATE TABLE ... PARTITION BY RANGE (expression)
: PARTITION partition_name VALUES LESS THAN (value) [, ... ]);
:=20
: This will be a bit more detailed than the first CREATE TABLE,
: and will allow the user more control over what bucket a row goes to,
: but will not be default and maybe will not be necessary.
:=20
: An example could be
: CREATE TABLE t (column_1 INT PRIMARY KEY, "bucket" INT, column_2 =
STRING)
: PARTITION BY RANGE (column_2)
: PARTITION p1 VALUES LESS THAN ('J'),
: PARTITION p2 VALUES LESS THAN ('Z');
: The result would be that there are only two possible bucket_id values,
: 1 for 'p1', and 2 for 'p2'.
:=20
: Notice that this definition means that column_2 will always be less
: than 'Z', so an additional clause
: CHECK (column_2 < 'Y') would be redundant
: and an additional clause
: CHECK (column_2 > '=D0=94') should be illegal.
:=20
: It may be useful that this means there is a one-to-one association
: between partition_name and bucket_id value, as we will see later.
:=20
: It may not be useful that this means a column with monotonically
: increasing values will tend to be cause re-use of the same partition.
:=20
: FOREIGN KEYS
:=20
: The good news about foreign keys is:
: if the foreign key and the referenced key are also partition keys
: and the partition clauses are the same, then rows with the same
: values will be in the same buckets. In such cases we can guarantee
: that INSERT/UPDATE/DELETE will work without needing multiple nodes,
: and equijoins will work if they are ANDED with single-row
: selections of either key.
:=20
: The bad news is:
: Otherwise there has to be at least one lookup which might
: be for a value in a different bucket, and the
: ON UPDATE | ON DELETE clauses cause changes of values in
: different buckets. That is, they are "multi-row".
:=20
: So the recommendation is: let foreign keys be illegal.
:=20
: TABLE REFERENCES
:=20
: Now that we know there is sharding, we can change
: FROM table_name
: to
: FROM table_name [PARTITION (partition_name)]
: which would override router decisions because p1 is associated
: with bucket.
:=20
: This is manual "pruning".
: "Pruning" can mean "only go to the necessary partitions".
: Usually we hope an optimizer will do it, but let's not assume so.
:=20
: In some systems one can say PARTITION (expression) instead,
: but that's unnecessary, I think.
:=20
: Table references can have PARTITION clauses in SELECT,
: INSERT, UPDATE, DELETE, REPLACE, and perhaps other statements.
: But they should be unnecessary for single-row INSERTs,
: because single-row INSERTs will always contain the partition-key
: values and therefore bucket_id is known already and therefore
: passing partition name would be redundant.
:=20
: INSERT
:=20
: As stated, INSERT should be easy because bucket_id is easy
: to calculate from the rest of the statement.
:=20
: On the other hand, INSERT ... SELECT is hard, so I suggest
: that it should be illegal in the first release.
:=20
: UPDATE, SELECT, DELETE
:=20
: Either these statements should have a partition name or they should
: have a WHERE clause that contains the exact phrase
: partition_key_value =3D literal
: so that even a very simple program can calculate bucket_id.
: Otherwise they should be illegal in the  first release.
:=20
: Updating partition-key columns is illegal too.
:=20
: ALTER
:=20
: When ALTER ... DROP is illegal, it should be illegal to drop the
: "_bucket" column
: or any column used for partitioning.
:=20
: DROP INDEX
:=20
: It should be illegal to drop indexes on the _bucket column.
: It probably should be legal to drop indexes on columns used for
: partitioning.
:=20
: INFORMATION_SCHEMA
:=20
: The _TABLES table should show the partition definitions.
: The _COLUMNS table should show whether a column is in a partition key.
: Minimally, a user or a program should be able to see enough to decide
: whether manual pruning is possible for a simple query.
:=20
: Actually I suppose this is my job, since the only implementation of
: INFORMATION_SCHEMA is the series of functions that I wrote for the
: user manual. Remember that one of those functions will crash until
: a certain server bug is fixed.
:=20
: If it's desirable, I can write other routines that put bucket
: information in INFORMATION_SCHEMA. That's easy for anyone to do,
: it merely involves reading system tables and calling Lua functions.
: However, I believe the display should be as a table -- that's what
: any SQL person would expect -- so for example suppose we had this.
: (not a system table but similar to an example in the manual):
: tarantool> cfg =3D {
:           >   servers =3D {
:           >     { uri =3D 'host1:33131', zone =3D '1' },
:           >     { uri =3D 'host2:33131', zone =3D '2' }
:           >   },
:           >   login =3D 'test_user',
:           >   password =3D 'pass',
:           >   redundancy =3D '2'
:           > }
: I want this display if I select:
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: | login     | password | redundancy | server_1_uri | server_1_zone |
: server_2_uri | server_2_zone |
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: | test_user | pass     | 2          | host:33131   | 1             |
: host2:33131  | 2             |
: | --------- + -------- + ---------- + ------------ + ------------- +
: -------------+ --------------|
: This is not a server issue, it is a client feature.
:=20
: EXPLAIN | ANALYZE
:=20
: I expect that the simple phrase "this can be done on a single node"
: is the most useful thing that can be said. More is not necessary.
:=20
: ON CLAUSE AND WHERE CLAUSE
:=20
: These are the places where we're most likely to see joins or =
subqueries.
: Suppose they require a search of more than one =
partition/bucket/node/etc.
: We could cause a warning if someday we support warnings.
: We could cause an error if the user fails to add a keyword (like =
FORCE)
: to indicate an understanding of the possible consequences.
: We could require 'admin' privilege to do slow queries of any kind.
: But maybe we don't need to do anything, it might not be a big problem.
:=20
: I believe, though, that everyone will be happier if we say:
: joins and subqueries are illegal in the first release.
:=20
: I could add more details but am unsure whether any of this is
: the sort of feedback that you expected.
:=20
: Peter Gulutzan