Date: Sat, 28 Apr 2018 04:55:47 +0300
From: Alexander Turenko
To: Vladislav Shpilevoy
Cc: kostja@tarantool.org, tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 0/7] Expose ICU into Lua
Message-ID: <20180428015547.stjjxx5nm67do5lb@tkn_work_nb>

Hi Vlad!

Some thoughts / questions re the API are below.

1. u_upper/u_lower support user-provided locales and fall back to the
   system-default locale when none is given (that decision is debatable
   IMHO, see the email re the 1st patch). But u_compare/u_icompare use
   built-in collations unconditionally. Should all these functions
   handle locales in some unified way?

2. Should we expose these functions in the 'string' module at all? That
   module is very basic to the language, so it may be worth being
   conservative about extending it. Lua 5.3, for example, has a
   separate 'utf8' module.

3. Should we stick to some existing API to be friendlier to existing
   users? The examples I found:

   - lua 5.3 utf8: https://www.lua.org/manual/5.3/manual.html#6.5
   - lua-utf8: https://github.com/starwing/luautf8
   - icu-lua: http://files.luaforge.net/releases/icu-lua/icu-lua/0.1A

   On the other hand, they don't seem to expose character properties
   the way our u_count does, and they don't provide a way to set a
   specific locale. So trying to provide a transparent replacement for
   parts of these APIs does not seem to be a good idea. I just note
   this as a possible point here.

WBR, Alexander Turenko.

On Thu, Apr 26, 2018 at 02:29:00AM +0300, Vladislav Shpilevoy wrote:
> Branch: http://github.com/tarantool/tarantool/tree/gh-3290-lua-icu-ucasemap
> Issue: https://github.com/tarantool/tarantool/issues/3290
> Issue: https://github.com/tarantool/tarantool/issues/3081
>
> Lua cannot work with Unicode: it interprets a string as binary data.
> On such strings the built-in upper/lower functions, '#' (length) and
> the comparison operators do not work. But Tarantool links with ICU and
> has comparators with collations that can solve these problems.
>
> But there is another issue: string methods must be available before
> box.cfg, so ICU and collations must be built outside the main 'box'
> static library. To do this, the collation-related files are moved
> from 'box' into the 'core' library.
>
> A second issue is that when box.cfg is called, it inserts built-in
> collations into the _collation space, and these insertions can
> conflict with the built-in collations created before box.cfg. Deleting
> from _collation can break the collations cache as well. The patchset
> solves this by checking collation deletions and insertions: if they
> try to operate on built-in collations, they are ignored, so a user
> sees the changes in _collation, but the cache is unchanged.
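A minimal C sketch of the cache policy described in the quoted paragraph above, assuming a fixed-size cache indexed by collation id. All names here (COLL_CACHE_SIZE, struct coll_entry, coll_cache, coll_cache_add_builtin, coll_cache_on_insert, coll_cache_on_delete) are hypothetical and do not come from the patchset; the hooks only decide whether the in-memory cache changes, while the _collation space itself is assumed to be modified regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: names and layout are illustrative only,
 * not taken from the Tarantool source tree. */
#define COLL_CACHE_SIZE 16

struct coll_entry {
	bool used;     /* slot holds a collation */
	bool builtin;  /* created before box.cfg; must survive box.cfg */
	char name[32];
};

/* Zero-initialized: every slot starts unused. */
static struct coll_entry coll_cache[COLL_CACHE_SIZE];

static bool
coll_id_valid(int id)
{
	return id >= 0 && id < COLL_CACHE_SIZE;
}

/* Register a built-in collation at startup, before box.cfg. */
static void
coll_cache_add_builtin(int id, const char *name)
{
	assert(coll_id_valid(id) && name != NULL);
	coll_cache[id].used = true;
	coll_cache[id].builtin = true;
	snprintf(coll_cache[id].name, sizeof(coll_cache[id].name), "%s", name);
}

/* Hook for an insert into _collation.
 * Returns true if the cache was modified. */
static bool
coll_cache_on_insert(int id, const char *name)
{
	if (!coll_id_valid(id) || name == NULL)
		return false;
	if (coll_cache[id].used && coll_cache[id].builtin)
		return false; /* built-in already cached: ignore the insert */
	coll_cache[id].used = true;
	coll_cache[id].builtin = false;
	snprintf(coll_cache[id].name, sizeof(coll_cache[id].name), "%s", name);
	return true;
}

/* Hook for a delete from _collation.
 * Returns true if the cache was modified. */
static bool
coll_cache_on_delete(int id)
{
	if (!coll_id_valid(id) || !coll_cache[id].used)
		return false;
	if (coll_cache[id].builtin)
		return false; /* keep built-in collations usable */
	coll_cache[id].used = false;
	return true;
}
```

With this policy, box.cfg re-inserting a built-in collation is a no-op for the cache, and deleting a built-in row from _collation leaves the cached collation usable, while user-defined collations are added and removed normally.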
>
> Vladislav Shpilevoy (7):
>   lua: expose ICU upper/lower functions to Lua
>   lua: implement string.u_count
>   alter: fix assertion in collations alter
>   Move struct on_access_denied_ctx into error.h
>   Merge box_error, stat and collations into core library
>   Always store built-in collations in the cache
>   lua: expose u_compare/u_icompare into Lua
>
> <...>
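To make the byte-versus-character problem from the quoted cover letter concrete, here is a minimal C sketch that counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx), compared with the byte count that strlen (and Lua's '#') reports. utf8_length is a hypothetical helper that assumes valid UTF-8 input; it is not string.u_count from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h> /* strlen, for comparing against the byte length */

/* Count code points in a NUL-terminated UTF-8 string.
 * Assumes valid UTF-8; no validation is performed. */
static size_t
utf8_length(const char *s)
{
	assert(s != NULL);
	size_t n = 0;
	for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
		/* Continuation bytes look like 10xxxxxx; every other
		 * byte starts a new code point. */
		if ((*p & 0xC0) != 0x80)
			n++;
	}
	return n;
}
```

For the Cyrillic word "Привет" (written as "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"), strlen returns 12 while utf8_length returns 6. Code points are still not user-perceived characters: "e" followed by U+0301 (combining acute accent) is two code points but one visible letter, which is one reason to rely on ICU rather than hand-rolled byte logic.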