From: Alexander Turenko <alexander.turenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: kostja@tarantool.org, tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Date: Sat, 28 Apr 2018 03:56:53 +0300 [thread overview] Message-ID: <20180428005653.5ssaax3llh6xzlw4@tkn_work_nb> (raw) In-Reply-To: <4964845f82fc37f46f28b1713adf4527c219cb0d.1524698920.git.v.shpilevoy@tarantool.org> Hi Vlad! Please, consider comments below. WBR, Alexander Turenko. On Thu, Apr 26, 2018 at 02:29:01AM +0300, Vladislav Shpilevoy wrote: > Lua can not work with unicode - in Lua it is enterpreted as a > binary. On such string built-in upper/lower functions do not > work. But Tarantool links with ICU that can solve the problem. > Lets expose ICU upper/lower function into Lua to enable correct > case transformations. > > Closes #3290 > --- > src/lua/init.c | 2 +- > src/lua/string.lua | 140 +++++++++++++++++++++++++++++++++++++++++++ > test/app-tap/string.test.lua | 34 ++++++++++- > 3 files changed, 174 insertions(+), 2 deletions(-) > > <...> > diff --git a/src/lua/string.lua b/src/lua/string.lua > index 5ff64c9f6..1c7226143 100644 > --- a/src/lua/string.lua > +++ b/src/lua/string.lua > <...> > +-- > +-- Find cached UCaseMap for @a locale, or create a new one and > +-- cache it. > +-- @param locale String locale or box.NULL for default. > +-- @retval nil Can neither get or create a UCaseMap. > +-- @retval not nil Needed UCaseMap. > +-- There are two special values for a locale: NULL is a 'default locale' means system-wide one (LANG=...) and "" (empty string) means 'root' locale that is some kind of compromise for language/country neural applications. Two things should be considered here: 1. Want we use system-default locale in tarantool by default? A system locale which is dependent of environment and can be different on a developer system and on a staging/production system. Tarantool is not end-user application, but some kind of system software. 2. We should desribe both special values in doxygen-style comment (and, then, in documentation) and explicitly state which one is the default (and why). Side note: it seems that box.NULL will be returned in case or an error, but not nil. > +local function ucasemap_retrieve(locale) > + local ret = ucasemap_cache[locale] > + if not ret then > + ret = ffi.C.ucasemap_open(c_char_ptr(locale), 0, errcode) > + if ret ~= nil then > + ffi.gc(ret, ffi.C.ucasemap_close) > + ucasemap_cache[locale] = ret > + end > + end > + return ret > +end Root locale will be returned in case of non-existent locale (please note that it can differs from 'default', i.e. system locale). Proposed to at least document that behaviour. But maybe it would be better to give explicit error in the case. > +-- > +-- Check ICU options for string.u_upper/u_lower. > +-- @param opts Options. Can contain only one option - locale. > +-- @param usage_err What to throw if opts types are violated. > +-- @retval String locale if found. > +-- @retval box.NULL if locale is not found. > +-- Proposed wording: > box.NULL if 'locale' field is not found. To clarify that no lookup for available locales is performed. > +local function icu_check_case_opts(opts, usage_err) > <...> > diff --git a/test/app-tap/string.test.lua b/test/app-tap/string.test.lua > index 852a7923c..004e149e9 100755 > --- a/test/app-tap/string.test.lua > +++ b/test/app-tap/string.test.lua > <...> > +test:test("unicode", function(test) > + test:plan(12) > + local str = 'хеЛлоу вОрЛд ё Ё я Я э Э ъ Ъ hElLo WorLd 1234 i I İ 勺#☢༺' Your implementation correctly handle the empty string. Proposed to add corresponsing test case. > + local upper_res = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 I I İ 勺#☢༺' > + local upper_turkish = 'ХЕЛЛОУ ВОРЛД Ё Ё Я Я Э Э Ъ Ъ HELLO WORLD 1234 İ I İ 勺#☢༺' > + local lower_res = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i i i̇ 勺#☢༺' > + local lower_turkish = 'хеллоу ворлд ё ё я я э э ъ ъ hello world 1234 i ı i 勺#☢༺' > + local s = string.u_upper(str) > + test:is(s, upper_res, 'default locale upper') The test is system-locale dependent (you can try it like `LANG=tr_TR ./src/tarantool ./test/app-tap/string.test.lua`). Proposed to set LANG explicitly in the test.
next prev parent reply other threads:[~2018-04-28 0:56 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-04-25 23:29 [tarantool-patches] [PATCH 0/7] Expose ICU into Lua Vladislav Shpilevoy 2018-04-25 23:29 ` [tarantool-patches] [PATCH 1/7] lua: expose ICU upper/lower functions to Lua Vladislav Shpilevoy 2018-04-28 0:56 ` Alexander Turenko [this message] 2018-04-25 23:29 ` [tarantool-patches] [PATCH 2/7] lua: implement string.u_count Vladislav Shpilevoy 2018-04-26 10:36 ` [tarantool-patches] " Vladislav Shpilevoy 2018-04-26 16:07 ` Vladislav Shpilevoy 2018-04-26 23:57 ` Vladislav Shpilevoy 2018-04-28 1:10 ` Alexander Turenko 2018-04-25 23:29 ` [tarantool-patches] [PATCH 3/7] alter: fix assertion in collations alter Vladislav Shpilevoy 2018-04-25 23:29 ` [tarantool-patches] [PATCH 4/7] Move struct on_access_denied_ctx into error.h Vladislav Shpilevoy 2018-04-25 23:29 ` [tarantool-patches] [PATCH 5/7] Merge box_error, stat and collations into core library Vladislav Shpilevoy 2018-04-25 23:29 ` [tarantool-patches] [PATCH 6/7] Always store built-in collations in the cache Vladislav Shpilevoy 2018-04-25 23:29 ` [tarantool-patches] [PATCH 7/7] lua: expose u_compare/u_icompare into Lua Vladislav Shpilevoy 2018-04-28 1:55 ` [tarantool-patches] Re: [PATCH 0/7] Expose ICU " Alexander Turenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180428005653.5ssaax3llh6xzlw4@tkn_work_nb \ --to=alexander.turenko@tarantool.org \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --cc=v.shpilevoy@tarantool.org \ --subject='[tarantool-patches] Re: [PATCH 1/7] lua: expose ICU upper/lower functions to Lua' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox