Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: "n.pettik" <korablev@tarantool.org>, tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
Date: Wed, 14 Nov 2018 14:52:16 +0300	[thread overview]
Message-ID: <e86922fe-baa6-4559-d64e-44c6a352936e@tarantool.org> (raw)
In-Reply-To: <03A13ECD-6BA4-448F-8952-EE8E5C753130@tarantool.org>


>>> +	/*
>>> +	 * In SQL all identifiers should be uppercased, so
>>> +	 * to avoid mess lets simple search binary (since it is
>>> +	 * sort of "special" collation) ignoring case at all.
>>> +	 */
>>
>> 2. I am not sure if it should be so special. I think, we should
>> treat it just like any other collation. If tests have BINARY
>> uppercased, then they should be fixed, I guess.
> 
> Thing is in SQLite binary can be used without quotes, like:
> 
> … COLLATE BiNaRy …
> 
> And since ids are uppercased, this thing comes as “BINARY".
> If I forced only lower case “binary”, a lot (>80 separate sub-tests,
> 12 separate file tests) of tests would fail. Moreover, I see that similar
> things take place at Lua-land (schema.cc):
> 
> local function update_index_parts(format, parts)
> ...
> elseif k == 'collation' then
>      -- find ID by name
>      local coll = box.space._collation.index.name:get{v}
>      if not coll then
>          coll = box.space._collation.index.name:get{v:lower()}
>      end
> 
> If you still think that we should allow only “binary” format,
> I will do it alongside with tests in a separate commit.

I still think that since 'binary' is even in _collation space
now, it should be just a regular collation. If you do not want
to change tests, then you can try to search any collation firstly
as is, and if not found, then lowerify and search again, regardless
of is it binary or not.

My point is that we should not treat binary differently from
others. Please, ask in the server team chat or consult Peter.
Especially, if you like the idea of lowering any collation name
if it is not found (just like we do in Lua as you decently
highlighted).

By the way, what if I write SQL like this? :

     SELECT ... WHERE  ... COLLATE NONE ...

Looks like this code:

> +       if (strcasecmp(name, "binary") == 0)
> +               p = coll_by_name("binary", strlen("binary"));
> +       else
> +               p = coll_by_name(name, strlen(name));

will find it ok, but maybe should not, is it?

> diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
> index 61df33a95..05f666f5e 100644
> --- a/test/sql/collation.test.lua
> +++ b/test/sql/collation.test.lua
> @@ -321,8 +319,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
>          for (uint32_t i = 0; i < part_count; i++) {
>                  const struct key_part_def *part = &parts[i];
>                  int count = 2;
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR)
> +               if (part->coll_id != COLL_NONE)
>                          count++;
>                  if (part->is_nullable)
>                          count++;
> @@ -332,8 +329,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
>                  assert(part->type < field_type_MAX);
>                  size += mp_sizeof_str(strlen(PART_OPT_TYPE));
>                  size += mp_sizeof_str(strlen(field_type_strs[part->type]));
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR) {
> +               if (part->coll_id != COLL_NONE) {
>                          size += mp_sizeof_str(strlen(PART_OPT_COLLATION));
>                          size += mp_sizeof_uint(part->coll_id);
>                  }
> @@ -352,8 +348,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
>          for (uint32_t i = 0; i < part_count; i++) {
>                  const struct key_part_def *part = &parts[i];
>                  int count = 2;
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR)
> +               if (part->coll_id != COLL_NONE)
>                          count++;
>                  if (part->is_nullable)
>                          count++;
> @@ -366,8 +361,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
>                  assert(part->type < field_type_MAX);
>                  const char *type_str = field_type_strs[part->type];
>                  data = mp_encode_str(data, type_str, strlen(type_str));
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR) {
> +               if (part->coll_id != COLL_NONE) {
>                          data = mp_encode_str(data, PART_OPT_COLLATION,
>                                               strlen(PART_OPT_COLLATION));
>                          data = mp_encode_uint(data, part->coll_id);
> 
> diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
> index efaa10da0..7cae436f1 100644
> --- a/src/box/lua/space.cc
> +++ b/src/box/lua/space.cc
> @@ -299,15 +299,12 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
>                          lua_pushboolean(L, key_part_is_nullable(part));
>                          lua_setfield(L, -2, "is_nullable");
>   
> -                       if (part->type == FIELD_TYPE_STRING ||
> -                           part->type == FIELD_TYPE_SCALAR) {
> -                               if (! space_is_system(space)) {
> -                                       struct coll_id *coll_id =
> -                                               coll_by_id(part->coll_id);
> -                                       assert(coll_id != NULL);
> -                                       lua_pushstring(L, coll_id->name);
> -                                       lua_setfield(L, -2, "collation");
> -                               }
> +                       if (part->coll_id != COLL_NONE) {
> +                               struct coll_id *coll_id =
> +                                       coll_by_id(part->coll_id);
> +                               assert(coll_id != NULL);
> +                               lua_pushstring(L, coll_id->name);
> +                               lua_setfield(L, -2, "collation");
>                          }
> 
> In other words, each field with type of STRING/SCALAR and mb ANY
> would have real pointer to collation. However, in turn such change
> would break too many tests, without significant refactoring of SQL
> query planner (at least I failed to do it in several hours).
> On the other hand, in this case we can remove COLL_NONE
> and unify collation interface: each field or key part MUST feature
> collation. And I mean not only valid ID (it is how it works now),
> but also valid (i.e. != NULL) pointer to collation struct. So, it seems
> that we would be able to remove any checks like:
> 
> if (coll != NULL)
> 	*use collation*
> else
> 	*process without involving collations*
> 
> Would become:
> 
> If (type_is_string_like)
> 	*always use collation*
> 
> To be honest I was sure that you would notice this pickle…
> Or current approach is not dat bad?
> 

Yeah, I understand your aspiration, but I have some remarks
about the draft patch above and about the concept as it is:

* you said that you want to get rid of COLL_NONE, but in the draft
   above you still use it. Even if you had removed COLL_NONE
   usage from key_def builders, it would have stayed in alter.cc
   as a check that you can not drop 'none' collation. I think that
   COLL_NONE as a constant equal to 0 should not be removed from
   source code until it is not used at all. But it is a minor
   comment rather about coding style: I prefer code like
   'id != UNDERSTANDABLE_NAME' to 'id != <strange_constant>'.

* the main flaw in storing a collation for each string is that it
   will dramatically slow down comparators. The awful problem is in
   virtual functions calling - it is much slower than regular
   functions. Now we have two levels of avoiding coll->cmp() calling.
   First in *_compare_create functions. If a key has no collations,
   we can use highly specified template recursive comparators
   (struct TupleCompare, struct TupleCompareWithKey) which call
   memcmp. Second, we are trying to do not call coll->cmp() in other
   comparators (see mp_compare_str).

   Yes, we still can follow you, store a collation object in each
   key_part, tuple_field, field_def, but in tuple_compare.cc we
   still will need a way how to check that a collation comparator
   can be inlined into memcmp. This way is check coll->id == COLL_NONE.
   Now it is like

       use_fast_cmp = coll == NULL

   Will be like

       use_fast_cmp = coll->id == COLL_NONE

   As a summary: we can store a collation in each *_field/part object
   to simplify SQL source code, but it does not affect necessity of
   a special COLL_NONE constant.


Also, I see now a new minor problem:

> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
> index 64f74f9d3..a9525058b 100644
> --- a/src/box/lua/upgrade.lua
> +++ b/src/box/lua/upgrade.lua
> @@ -396,8 +396,10 @@ local function create_collation_space()
>      box.space._index:insert{_collation.id, 1, 'name', 'tree', {unique = true}, {{1, 'string'}}}
>  
>      log.info("create predefined collations")
> +    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>      box.space._collation:replace{1, "unicode", ADMIN, "ICU", "", setmap{}}
>      box.space._collation:replace{2, "unicode_ci", ADMIN, "ICU", "", {strength='primary'}}
> +    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
>  
>      local _priv = box.space[box.schema.PRIV_ID]
>      _priv:insert{ADMIN, PUBLIC, 'space', _collation.id, box.priv.W}

Unfortunately, we can not change old update() functions. So the only way to add
these new collations is to add a new code to upgrade_to_2_1_0() with these two
new lines.

  reply	other threads:[~2018-11-14 11:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause Nikita Pettik
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations Nikita Pettik
2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
2018-11-13 22:37     ` n.pettik
2018-11-14 11:52       ` Vladislav Shpilevoy [this message]
2018-11-14 20:59         ` n.pettik
2018-11-15 11:04           ` Vladislav Shpilevoy
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules Nikita Pettik
2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
2018-11-13 22:37     ` n.pettik
2018-11-15 11:24 ` [tarantool-patches] Re: [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e86922fe-baa6-4559-d64e-44c6a352936e@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox