[tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua

Alexander Turenko alexander.turenko at tarantool.org
Sat Apr 28 03:56:53 MSK 2018


Hi Vlad!

Please, consider comments below.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:01AM +0300, Vladislav Shpilevoy wrote:
> Lua can not work with unicode - in Lua it is enterpreted as a
> binary. On such string built-in upper/lower functions do not
> work. But Tarantool links with ICU that can solve the problem.
> Lets expose ICU upper/lower function into Lua to enable correct
> case transformations.
> 
> Closes #3290
> ---
>  src/lua/init.c               |   2 +-
>  src/lua/string.lua           | 140 +++++++++++++++++++++++++++++++++++++++++++
>  test/app-tap/string.test.lua |  34 ++++++++++-
>  3 files changed, 174 insertions(+), 2 deletions(-)
> 
> <...>
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 5ff64c9f6..1c7226143 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> <...>
> +--
> +-- Find cached UCaseMap for @a locale, or create a new one and
> +-- cache it.
> +-- @param locale String locale or box.NULL for default.
> +-- @retval nil Can neither get or create a UCaseMap.
> +-- @retval not nil Needed UCaseMap.
> +--

There are two special values for a locale: NULL is a 'default locale'
means system-wide one (LANG=...) and "" (empty string) means 'root'
locale that is some kind of compromise for language/country neural
applications.

Two things should be considered here:

1. Want we use system-default locale in tarantool by default? A system
   locale which is dependent of environment and can be different on a
   developer system and on a staging/production system. Tarantool is not
   end-user application, but some kind of system software.

2. We should desribe both special values in doxygen-style comment (and,
   then, in documentation) and explicitly state which one is the default
   (and why).

Side note: it seems that box.NULL will be returned in case or an error,
but not nil.

> +local function ucasemap_retrieve(locale)
> +    local ret = ucasemap_cache[locale]
> +    if not ret then
> +        ret = ffi.C.ucasemap_open(c_char_ptr(locale), 0, errcode)
> +        if ret ~= nil then
> +            ffi.gc(ret, ffi.C.ucasemap_close)
> +            ucasemap_cache[locale] = ret
> +        end
> +    end
> +    return ret
> +end

Root locale will be returned in case of non-existent locale (please note
that it can differs from 'default', i.e. system locale). Proposed to at
least document that behaviour. But maybe it would be better to give
explicit error in the case.

> +--
> +-- Check ICU options for string.u_upper/u_lower.
> +-- @param opts Options. Can contain only one option - locale.
> +-- @param usage_err What to throw if opts types are violated.
> +-- @retval String locale if found.
> +-- @retval box.NULL if locale is not found.
> +--

Proposed wording:

> box.NULL if 'locale' field is not found.

To clarify that no lookup for available locales is performed.

> +local function icu_check_case_opts(opts, usage_err)
> <...>
> diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
> index 852a7923c..004e149e9 100755
> --- a/test/app-tap/string.test.lua
> +++ b/test/app-tap/string.test.lua
> <...>
> +test:test("unicode", function(test)
> +    test:plan(12)
> +    local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'

Your implementation correctly handle the empty string. Proposed to add
corresponsing test case.

> +    local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
> +    local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
> +    local lower_res = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i i i̇ 勺#☢༺'
> +    local lower_turkish = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i ı i 勺#☢༺'
> +    local s = string.u_upper(str)
> +    test:is(s, upper_res, 'default locale upper')

The test is system-locale dependent (you can try it like `LANG=tr_TR
./src/tarantool ./test/app-tap/string.test.lua`). Proposed to set LANG
explicitly in the test.




More information about the Tarantool-patches mailing list