[tarantool-patches] Re: [PATCH 0/7] Expose ICU into Lua

Alexander Turenko alexander.turenko at tarantool.org
Sat Apr 28 04:55:47 MSK 2018

Hi Vlad!

Some thoughts / questions re API are below.

1. u_upper/u_lower support user-provided locales and follow the
   system-default locale by default (that decision is debatable IMHO,
   see the email re 1st patch). But u_compare/u_icompare use built-in
   collations unconditionally. Should all these functions have some
   unified approach to handling locales?

2. Should we expose these functions in the 'string' module? The module
   is very basic for the language, so maybe it is worth being
   conservative in extending it. Lua 5.3 has a separate 'utf8' module,
   for example.

3. Should we stick to some existing API to be more friendly for existing

The examples I found:

- Lua 5.3 utf8: https://www.lua.org/manual/5.3/manual.html#6.5
- lua-utf8: https://github.com/starwing/luautf8
- icu-lua: http://files.luaforge.net/releases/icu-lua/icu-lua/0.1A

On the other hand, they don't seem to expose character properties
like our u_count does, and don't provide a way to set a specific
locale. So trying to provide a transparent replacement for some parts
of these APIs does not seem to be a good idea. Just note this possible
point
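
To make the gap between the byte-oriented 'string' functions and a
code-point-aware API concrete, here is a minimal sketch in plain Lua.
The helper name codepoints() is made up for this example and is not
part of the patchset; it assumes valid UTF-8 input and only counts
code points, roughly what Lua 5.3's utf8.len reports for such input.

```lua
-- Count UTF-8 code points by counting every byte that is NOT a
-- continuation byte (0x80..0xBF). Assumes the input is valid UTF-8.
-- 'codepoints' is a hypothetical helper, not an existing API.
local function codepoints(s)
    local _, n = s:gsub("[^\128-\191]", "")
    return n
end

local s = "Привет"   -- six Cyrillic letters, two bytes each in UTF-8
print(#s)             -- '#' reports the length in bytes
print(codepoints(s))  -- number of code points
```

This says nothing about character classes or locales, which is where
ICU would come in.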

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:00AM +0300, Vladislav Shpilevoy wrote:
> Branch: http://github.com/tarantool/tarantool/tree/gh-3290-lua-icu-ucasemap
> Issue: https://github.com/tarantool/tarantool/issues/3290
> Issue: https://github.com/tarantool/tarantool/issues/3081
> Lua can not work with Unicode - in Lua it is interpreted as binary data. On such
> strings the built-in upper/lower functions, '#' (length) and comparison operators
> do not work. But Tarantool links with ICU and has comparators with collations that
> can solve these problems.
> But there is another issue - string methods must be available before box.cfg,
> so ICU and the collations must be built outside of the main 'box' static library.
> To do this, the collation-related files are moved from 'box' into the 'core' library.
> A second issue is that when box.cfg is called, it inserts built-in collations
> into the _collation space, and these insertions can conflict with the built-in
> collations created before box.cfg. Deleting from _collation can break the
> collations cache as well. The patchset solves this by checking collation
> deletions and insertions, and if they try to operate on built-in collations,
> then they are ignored - a user sees changes in _collation, but the cache is
> unchanged.
> Vladislav Shpilevoy (7):
>   lua: expose ICU upper/lower functions to Lua
>   lua: implement string.u_count
>   alter: fix assertion in collations alter
>   Move struct on_access_denied_ctx into error.h
>   Merge box_error, stat and collations into core library
>   Always store built-in collations in the cache
>   lua: expose u_compare/u_icompare into Lua
> <...>
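
The "ignore operations on built-in collations" rule described in the
cover letter can be sketched roughly as below. This is only an
illustration in Lua, not the code from the patchset: the cache layout,
the on_insert/on_delete names and the choice of "unicode" and
"unicode_ci" as the built-in entries are all assumptions made for the
example.

```lua
-- Hypothetical sketch; names and structure are illustrative only.
-- Collations that exist before box.cfg and must never leave the cache.
local builtin = { unicode = true, unicode_ci = true }

-- Cache pre-filled with the built-in collations.
local cache = {
    unicode = { name = "unicode" },
    unicode_ci = { name = "unicode_ci" },
}

-- Called when a row is inserted into the collation space.
-- Returns true if the cache was updated, false if the change was ignored.
local function on_insert(name, def)
    if builtin[name] then
        return false -- built-in: the space changes, the cache does not
    end
    cache[name] = def
    return true
end

-- Called when a row is deleted from the collation space.
local function on_delete(name)
    if builtin[name] then
        return false -- built-in: keep the cached object alive
    end
    cache[name] = nil
    return true
end
```

With this shape a user-defined collation goes in and out of the cache
as usual, while an insert or delete that targets a built-in name leaves
the cached object untouched.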
