Tarantool development patches archive
 help / color / mirror / Atom feed
From: Alexander Turenko <alexander.turenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: kostja@tarantool.org, tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua
Date: Sat, 28 Apr 2018 03:56:53 +0300	[thread overview]
Message-ID: <20180428005653.5ssaax3llh6xzlw4@tkn_work_nb> (raw)
In-Reply-To: <4964845f82fc37f46f28b1713adf4527c219cb0d.1524698920.git.v.shpilevoy@tarantool.org>

Hi Vlad!

Please, consider comments below.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:01AM +0300, Vladislav Shpilevoy wrote:
> Lua can not work with unicode - in Lua it is enterpreted as a
> binary. On such string built-in upper/lower functions do not
> work. But Tarantool links with ICU that can solve the problem.
> Lets expose ICU upper/lower function into Lua to enable correct
> case transformations.
> 
> Closes #3290
> ---
>  src/lua/init.c               |   2 +-
>  src/lua/string.lua           | 140 +++++++++++++++++++++++++++++++++++++++++++
>  test/app-tap/string.test.lua |  34 ++++++++++-
>  3 files changed, 174 insertions(+), 2 deletions(-)
> 
> <...>
> diff --git a/src/lua/string.lua b/src/lua/string.lua
> index 5ff64c9f6..1c7226143 100644
> --- a/src/lua/string.lua
> +++ b/src/lua/string.lua
> <...>
> +--
> +-- Find cached UCaseMap for @a locale, or create a new one and
> +-- cache it.
> +-- @param locale String locale or box.NULL for default.
> +-- @retval nil Can neither get or create a UCaseMap.
> +-- @retval not nil Needed UCaseMap.
> +--

There are two special values for a locale: NULL is a 'default locale'
means system-wide one (LANG=...) and "" (empty string) means 'root'
locale that is some kind of compromise for language/country neural
applications.

Two things should be considered here:

1. Want we use system-default locale in tarantool by default? A system
   locale which is dependent of environment and can be different on a
   developer system and on a staging/production system. Tarantool is not
   end-user application, but some kind of system software.

2. We should desribe both special values in doxygen-style comment (and,
   then, in documentation) and explicitly state which one is the default
   (and why).

Side note: it seems that box.NULL will be returned in case or an error,
but not nil.

> +local function ucasemap_retrieve(locale)
> +    local ret = ucasemap_cache[locale]
> +    if not ret then
> +        ret = ffi.C.ucasemap_open(c_char_ptr(locale), 0, errcode)
> +        if ret ~= nil then
> +            ffi.gc(ret, ffi.C.ucasemap_close)
> +            ucasemap_cache[locale] = ret
> +        end
> +    end
> +    return ret
> +end

Root locale will be returned in case of non-existent locale (please note
that it can differs from 'default', i.e. system locale). Proposed to at
least document that behaviour. But maybe it would be better to give
explicit error in the case.

> +--
> +-- Check ICU options for string.u_upper/u_lower.
> +-- @param opts Options. Can contain only one option - locale.
> +-- @param usage_err What to throw if opts types are violated.
> +-- @retval String locale if found.
> +-- @retval box.NULL if locale is not found.
> +--

Proposed wording:

> box.NULL if 'locale' field is not found.

To clarify that no lookup for available locales is performed.

> +local function icu_check_case_opts(opts, usage_err)
> <...>
> diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua
> index 852a7923c..004e149e9 100755
> --- a/test/app-tap/string.test.lua
> +++ b/test/app-tap/string.test.lua
> <...>
> +test:test("unicode", function(test)
> +    test:plan(12)
> +    local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺'

Your implementation correctly handle the empty string. Proposed to add
corresponsing test case.

> +    local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺'
> +    local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺'
> +    local lower_res = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i i i̇ 勺#☢༺'
> +    local lower_turkish = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i ı i 勺#☢༺'
> +    local s = string.u_upper(str)
> +    test:is(s, upper_res, 'default locale upper')

The test is system-locale dependent (you can try it like `LANG=tr_TR
./src/tarantool ./test/app-tap/string.test.lua`). Proposed to set LANG
explicitly in the test.

  reply	other threads:[~2018-04-28  0:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy
2018-04-28  0:56   ` Alexander Turenko [this message]
2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy
2018-04-26 10:36   ` [tarantool-patches] " Vladislav Shpilevoy
2018-04-26 16:07   ` Vladislav Shpilevoy
2018-04-26 23:57   ` Vladislav Shpilevoy
2018-04-28  1:10   ` Alexander Turenko
2018-04-25 23:29 ` [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache Vladislav Shpilevoy
2018-04-25 23:29 ` [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua Vladislav Shpilevoy
2018-04-28  1:55 ` [tarantool-patches] Re: [PATCH 0/7] Expose ICU " Alexander Turenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180428005653.5ssaax3llh6xzlw4@tkn_work_nb \
    --to=alexander.turenko@tarantool.org \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='[tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox